CrowLingo

How a crow makes sound.

The vocal anatomy that makes the repertoire possible — two independent sound sources, fine muscular control, no need to stop breathing to call.

The syrinx, not the larynx

Birds don't produce sound with a larynx the way mammals do. Their sound source is the syrinx, located at the bottom of the trachea where it bifurcates into the two bronchi. Each side of the bifurcation has its own set of vibrating membranes, controlled by independent sets of muscles. That dual-source architecture is the single biggest reason birds can do acoustic things mammals can't.

The most striking consequence: a bird can produce two independent notes at once, one from each bronchus. Many passerines exploit this for complex harmonic structure, or for trilling at very fast rates by alternating between sides. Crows use the dual source more conservatively (most caws are single-source), but the spectral richness of rattles, complex calls, and certain mimicry behaviors comes from the two channels overlapping.

The frequency window

Crow vocal energy lies roughly between 200 Hz and 8 kHz. The fundamental of most adult caws sits around 500–1500 Hz, with strong harmonics up to about 4 kHz and weaker harmonics extending higher. Juvenile begging calls have a lower fundamental and less high-frequency energy; rattles and certain alarm calls extend up toward 8 kHz before the energy tails off.
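To make the arithmetic concrete, here is a minimal sketch that lists the integer multiples of a fundamental that land inside a given band. The function name harmonics_in_band and its parameters are illustrative assumptions, not part of CrowLingo, and real caws are not perfectly harmonic, so treat the output as a rough guide only.

```python
def harmonics_in_band(f0_hz, low_hz=200.0, high_hz=8000.0):
    """Return the harmonics n * f0_hz (n = 1, 2, ...) that fall in [low_hz, high_hz].

    Hypothetical helper for illustration; the band edges default to the
    200 Hz to 8 kHz window described in the text.
    """
    if f0_hz <= 0:
        raise ValueError("fundamental must be positive")
    harmonics = []
    n = 1
    while n * f0_hz <= high_hz:
        if n * f0_hz >= low_hz:
            harmonics.append(n * f0_hz)
        n += 1
    return harmonics


# A 500 Hz fundamental fits eight harmonics under 4 kHz ...
print(harmonics_in_band(500, high_hz=4000))
# -> [500, 1000, 1500, 2000, 2500, 3000, 3500, 4000]

# ... while a 1500 Hz fundamental fits only two.
print(harmonics_in_band(1500, high_hz=4000))
# -> [1500, 3000]
```

A lower-pitched caw therefore packs more harmonics into the strong region below 4 kHz, which is one simple way to see why calls with different fundamentals can look so different on a spectrogram.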

This window is why bioacoustic preprocessing (stage 3) bandpasses between 200 Hz and 8 kHz. Below 200 Hz is traffic and wind rumble; above 8 kHz is mostly noise for crows specifically. The bandpass preserves what the model needs to hear and drops what would only confuse it.
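As a rough illustration of what such a bandpass does, the sketch below cascades a second-order high-pass at 200 Hz with a second-order low-pass at 8 kHz, using the widely published Audio EQ Cookbook biquad formulas. This is an assumption about one reasonable way to build the filter, not a description of CrowLingo's actual stage 3; the function names (biquad_coeffs, run_biquad, crow_bandpass, tone, rms) are hypothetical, and the filter order is chosen for brevity.

```python
import math


def biquad_coeffs(kind, cutoff_hz, sample_rate_hz, q=1 / math.sqrt(2)):
    """Normalized biquad coefficients (Audio EQ Cookbook); q = 1/sqrt(2) is Butterworth."""
    w0 = 2 * math.pi * cutoff_hz / sample_rate_hz
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    if kind == "highpass":
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    elif kind == "lowpass":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    else:
        raise ValueError(f"unknown filter kind: {kind}")
    a0, a1, a2 = 1 + alpha, -2 * cos_w0, 1 - alpha
    return [x / a0 for x in b], [a1 / a0, a2 / a0]


def run_biquad(samples, b, a):
    """Filter a sequence of floats with one biquad section (direct form I)."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def crow_bandpass(samples, sample_rate_hz, low_hz=200.0, high_hz=8000.0):
    """High-pass at low_hz, then low-pass at high_hz (assumed band edges from the text)."""
    if high_hz >= sample_rate_hz / 2:
        raise ValueError("sample rate too low for the upper band edge")
    hp_b, hp_a = biquad_coeffs("highpass", low_hz, sample_rate_hz)
    lp_b, lp_a = biquad_coeffs("lowpass", high_hz, sample_rate_hz)
    return run_biquad(run_biquad(samples, hp_b, hp_a), lp_b, lp_a)


def tone(freq_hz, sample_rate_hz, duration_s):
    """Unit-amplitude sine wave as a list of floats (test signal)."""
    n = int(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]


def rms(samples):
    """Root-mean-square level of a sequence of floats."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


if __name__ == "__main__":
    sr = 32000  # assumed sample rate; must exceed 16 kHz for an 8 kHz band edge
    for freq in (50.0, 1000.0, 12000.0):
        signal = tone(freq, sr, 0.5)
        ratio = rms(crow_bandpass(signal, sr)) / rms(signal)
        print(f"{freq:>8.0f} Hz keeps {ratio:.3f} of its RMS level")
```

Running the demo should show the 50 Hz rumble and the 12 kHz tone strongly attenuated while the 1 kHz tone passes nearly unchanged. In practice a library routine such as scipy.signal.butter with scipy.signal.sosfiltfilt would be the more typical choice, with a higher filter order for steeper band edges.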

Muscular control and learned variation

The syringeal muscles are under direct neural control, which is what lets corvids learn vocalizations. Crows aren't as virtuosic at vocal learning as parrots, but they can modify their calls based on social context, can selectively imitate sounds in captivity, and demonstrably acquire some call features culturally from their family groups (the substrate for dialect).

What this means for the methods

The bandpass window, the dual sources, and the muscular fineness all map directly onto properties that the methods recover. The 200 Hz–8 kHz range is where SSL embeddings concentrate variance; harmonic structure (which dual sources enrich) is what individual-signature decoding latches onto; learned variation is what allows the dialect signal to be there in the first place.

In other words: the anatomy is the substrate of the repertoire, the repertoire is the substrate of the map, and the map is what the rest of CrowLingo is about.