What the embeddings show

Take a cluster centroid: the mean vector of all calls of one type from one family group. Compute it separately for each group in a population. The centroids differ. The differences are larger than within-group variation. They're consistent across years. In some studies they correlate with geographic distance between groups. That's the signature of dialect in the descriptive sense: shared acoustic conventions inside a group that diverge from the conventions of other groups, in ways that statistical models can detect. This isn't a 2026 finding. Mates et al. [1] showed something close in 2014 using hand-extracted features; the AI-embedding-based pipelines reproduce the result at finer resolution and with full automation. The contemporary contribution is not novelty; it's scale and rigor.
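The centroid comparison described above can be sketched in a few lines. This is a minimal illustration, not any study's actual pipeline: the function names, the 3-dimensional vectors, and the two group labels are all assumptions made for the example, and the data are synthetic stand-ins for real call embeddings. It computes one centroid per group, then divides the mean distance between centroids by the mean distance of calls from their own group's centroid.

```python
import math
import random
from itertools import combinations
from statistics import fmean


def centroid(vectors):
    """Mean vector of a list of equal-length embedding vectors."""
    return [fmean(dim) for dim in zip(*vectors)]


def separation_ratio(groups):
    """Mean between-centroid distance divided by mean within-group spread.

    `groups` maps a group label to the embeddings of one call type.
    """
    centroids = {g: centroid(vs) for g, vs in groups.items()}
    # Within-group spread: average distance of each call to its own centroid.
    within = fmean(
        math.dist(v, centroids[g]) for g, vs in groups.items() for v in vs
    )
    # Between-group separation: average distance over all centroid pairs.
    between = fmean(
        math.dist(centroids[a], centroids[b])
        for a, b in combinations(centroids, 2)
    )
    return between / within


# Synthetic stand-in data (hypothetical): two groups, 3-dimensional "embeddings".
rng = random.Random(0)
groups = {
    "group_a": [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)],
    "group_b": [[rng.gauss(5.0, 1.0) for _ in range(3)] for _ in range(50)],
}
ratio = separation_ratio(groups)
print(f"between/within ratio: {ratio:.2f}")
```

A ratio well above 1 is the pattern the text describes: groups sit farther from each other than their own calls sit from the group mean. A real analysis would also need a significance test and a check that the ratio holds across years.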

Three explanations, three weights of evidence

Explanation one: cultural transmission with local modification. Young crows learn the acoustic conventions of their family group and pass them forward. This is the explanation that earns the word dialect in the human-language sense. Strong descriptive support comes from playback studies in other corvids (notably ravens); the evidence for American crows specifically is suggestive but not airtight.

Explanation two: genetic variation correlated with geography. Population genetic structure could produce acoustic centroid differences without any learning. Support is weak: American crow populations are not strongly genetically differentiated at the relevant spatial scales, but the alternative isn't ruled out.

Explanation three: local acoustic environment shaping production. Calls that propagate well in dense forest differ from calls that propagate well in open suburbia. This is plausible for some features but doesn't account for the magnitude of the observed differences.

The honest scientific position: cultural transmission is the leading hypothesis. It is not yet proven.

Individual signatures as the easier case

Individual identity is more securely recoverable than group dialect. Inside any cluster (say, long territorial caws), if you color the points by which crow produced them, the points still sub-cluster. Each individual's caws form a sub-cluster inside the call-type cluster, separated from other individuals along consistent acoustic dimensions. The most reliable separator is harmonic emphasis: the relative loudness of the second and third harmonics versus the fundamental. Identity is acoustically stable across years for individuals who survive that long. Marzluff's neighborhood-bird recognition work [2], which trained humans by ear, found the same thing the AI now confirms at scale: crows are individuals.
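To make "harmonic emphasis" concrete, here is one way the measure could be computed. This is a sketch under stated assumptions, not the method of any cited study: it assumes the fundamental frequency is already known, it measures each harmonic with a single-bin DFT, and it is run on a synthetic tone whose frequency and amplitudes are made up for the example. Real recordings would need pitch tracking, windowing, and noise handling first.

```python
import cmath
import math


def harmonic_amplitude(samples, sample_rate, freq):
    """Amplitude of the sinusoidal component at `freq` (single-bin DFT)."""
    n = len(samples)
    acc = sum(
        x * cmath.exp(-2j * math.pi * freq * i / sample_rate)
        for i, x in enumerate(samples)
    )
    return 2 * abs(acc) / n


def harmonic_emphasis_db(samples, sample_rate, f0, harmonics=(2, 3)):
    """Level of each listed harmonic relative to the fundamental, in dB."""
    a1 = harmonic_amplitude(samples, sample_rate, f0)
    return {
        k: 20 * math.log10(harmonic_amplitude(samples, sample_rate, k * f0) / a1)
        for k in harmonics
    }


# Synthetic tone (hypothetical values): fundamental plus two weaker harmonics.
# The window holds a whole number of cycles, so the single-bin estimate is exact.
SAMPLE_RATE = 8000
F0 = 400.0
AMPLITUDES = {1: 1.0, 2: 0.5, 3: 0.25}
tone = [
    sum(a * math.sin(2 * math.pi * k * F0 * i / SAMPLE_RATE)
        for k, a in AMPLITUDES.items())
    for i in range(800)
]
emphasis = harmonic_emphasis_db(tone, SAMPLE_RATE, F0)
print({k: round(v, 2) for k, v in emphasis.items()})
```

Two birds producing the same call type at the same pitch would, on this measure, differ in the pair of dB values returned for harmonics 2 and 3.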

The dialect claim's three layers

Layer one is defensible: there is measurable inter-group acoustic variation that exceeds intra-group variation, in shared call types, in multiple studies. Layer two is suggestive: this variation likely reflects cultural transmission rather than only genetics or local-environment acoustics. Layer three is not yet science: the claim that the variation carries functional meaning to the crows, so that a crow from group A would respond measurably differently to a call from group B than to one from group A. Functional dialect claims require playback experiments with cross-group exemplars, which is exactly the kind of intervention the ethics floor of responsible bioacoustic research makes hard. The honest statement: we have strong descriptive evidence at layer one, suggestive evidence at layer two, and almost no functional evidence at layer three. Anyone claiming a definitive answer to whether crows have dialects in the strong sense is either ignoring layer three or speaking imprecisely.

Why this matters for the future

Designing playback experiments to test functional dialect is where the next decade of behavioral work lies. Playing a generic territorial caw into a territory is a different experiment from playing that territory's own caws back. The response should differ if the crows distinguish both individuals and group conventions. Whether anyone runs those experiments at the scale and with the care required depends on funding cycles, IACUC review processes, and the field's appetite for slow, careful work that doesn't generate viral headlines.
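If such an experiment were run, one simple way to ask whether responses differ between two playback conditions is a permutation test on the difference in means. The sketch below is illustrative only: the function name, the response measure, and the numbers are hypothetical placeholders, not data from any study. It also assumes independent trials, which a real design with repeated playbacks to the same birds would violate and would have to model explicitly.

```python
import random
from statistics import fmean


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Shuffle condition labels and recompute the mean difference.
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:len(a)]) - fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)


# Placeholder numbers (hypothetical): some response score per playback trial.
own_group_calls = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
other_group_calls = [21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0]
p = permutation_p_value(own_group_calls, other_group_calls)
print(f"p = {p:.4f}")
```

A small p-value would say only that the two conditions drew different responses. It would not by itself say which acoustic feature the birds were tracking.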

What we know, what we don't

We know individual crows carry acoustically distinct voices and that family groups carry distinct acoustic conventions. We know AI pipelines recover both at finer resolution than hand-engineered features. We do not know whether those conventions function as communication-shaping rules the birds themselves track, or as artifacts of learning that don't change behavior. The work to settle that question hasn't been done. CrowLingo's editorial floor: when the science is at layer two, the prose is at layer two. We name what we know, we name what we don't.