What the paper showed

Working with a long-term field-marked population in upstate New York, the authors recorded territorial caws from known individual crows, extracted a battery of hand-engineered acoustic features (mean fundamental frequency, harmonic-to-noise ratio, harmonic emphasis ratios, duration), and trained a supervised classifier to predict caller identity from features. The classifier achieved accuracy substantially above chance on held-out caws from known individuals. Two findings cascaded from this. First, individual identity is acoustically recoverable from a single caw — the classifier's success demonstrated that the acoustic features carry individual-distinctive information, not just species-distinctive information. Second, the most reliable individual discriminator turned out to be harmonic emphasis: the relative loudness of the second and third harmonics versus the fundamental. Different birds emphasize different harmonics consistently, year over year.
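To make "harmonic emphasis" concrete, here is a minimal sketch of how such a ratio could be computed. It is not the paper's procedure. It assumes the fundamental frequency is already known, measures each harmonic with a single-frequency DFT projection, and uses synthetic tones in place of real caws. The function names, the 500 Hz fundamental, and the amplitude profiles are all illustrative assumptions.

```python
import math


def harmonic_amplitude(samples, sample_rate, freq_hz):
    """Estimate the amplitude of the sinusoidal component at freq_hz.

    Projects the signal onto a cosine and a sine at that frequency
    (a single-bin DFT). Assumes the window holds a whole number of cycles.
    """
    n = len(samples)
    step = 2 * math.pi * freq_hz / sample_rate
    re = sum(s * math.cos(step * i) for i, s in enumerate(samples))
    im = sum(s * math.sin(step * i) for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n


def harmonic_emphasis_db(samples, sample_rate, f0_hz):
    """Level of harmonics 2 and 3 relative to the fundamental, in dB."""
    a1 = harmonic_amplitude(samples, sample_rate, f0_hz)
    return {
        h: 20 * math.log10(harmonic_amplitude(samples, sample_rate, h * f0_hz) / a1)
        for h in (2, 3)
    }


def synth_caw(f0_hz, harmonic_amps, sample_rate=16000, duration_s=0.1):
    """Toy stand-in for a caw: a sum of harmonics with fixed amplitudes."""
    n = int(sample_rate * duration_s)
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * f0_hz * i / sample_rate)
            for k, a in enumerate(harmonic_amps))
        for i in range(n)
    ]


# Two hypothetical birds with the same fundamental but different emphasis.
bird_a = synth_caw(500.0, [1.0, 0.5, 0.25])   # stronger second harmonic
bird_b = synth_caw(500.0, [1.0, 0.25, 0.5])   # stronger third harmonic

emph_a = harmonic_emphasis_db(bird_a, 16000, 500.0)
emph_b = harmonic_emphasis_db(bird_b, 16000, 500.0)
print({h: round(v, 2) for h, v in emph_a.items()})
print({h: round(v, 2) for h, v in emph_b.items()})
```

In this toy case bird A's second harmonic sits roughly 6 dB below its fundamental and its third roughly 12 dB below, while bird B shows the reverse pattern. A stable per-bird difference of that kind is what a classifier could pick up on. Real recordings would also need pitch tracking, windowing, and noise handling, none of which is shown here.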

Why the methods are dated and the finding isn't

Mates et al. [1] used hand-engineered features and a supervised classifier. Modern AI bioacoustics uses self-supervised audio foundation models that produce learned embeddings of 1,024 or 1,536 dimensions, which then feed downstream classifiers or go directly into similarity-search pipelines. The methods could hardly be more different, but the finding is method-independent. The individual-identity result has been reproduced, at finer resolution and with full automation, by every modern embedding-based pipeline that has looked. The acoustic signature of individual identity in American crow caws is real, and it is robust to method change. Modern methods would eventually have discovered it even if Mates et al. had never published; Mates et al. simply got there first, with the tools available in 2014.
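The similarity-search route mentioned above can be sketched in a few lines. This is an illustration under stated assumptions, not any particular pipeline's code: the 4-dimensional vectors stand in for real 1,024- or 1,536-dimensional embeddings, and the caller labels and the helper name `nearest_caller` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_caller(query, labelled):
    """Return the caller ID whose reference embedding is most similar to query."""
    best_id, _ = max(labelled, key=lambda item: cosine_similarity(query, item[1]))
    return best_id


# Hypothetical reference embeddings, one per known bird (toy 4-dim vectors).
reference = [
    ("bird_A", [0.9, 0.1, 0.3, 0.0]),
    ("bird_B", [0.1, 0.8, 0.2, 0.4]),
]

# Embedding of a new, unlabelled caw.
query = [0.8, 0.2, 0.25, 0.05]
print(nearest_caller(query, reference))
```

The design point is that no hand-engineered feature appears anywhere in this step. If identity is recoverable this way, the embedding must already encode whatever distinguishes the birds.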

Why it's the anchor for modern work

Modern AI bioacoustics methods can produce findings that are artifacts of the method rather than reflections of reality. An embedding model can produce clusters that look meaningful but are artifacts of preprocessing variance, training-data distribution, or parameter choices. Distinguishing real findings from artifacts requires anchors: established findings from independent methodologies that any new method has to reproduce or explain away. Mates et al. [1] is the canonical individual-identity anchor for American crows. When a new embedding pipeline claims to recover individual identity, the first defensible test is whether it reproduces the harmonic-emphasis finding of Mates et al. When the answer is yes (and it usually is), the new method has earned some credibility on its other claims. When the answer is no, you have a methodological problem to explain before you publish.

What it does and doesn't claim

The 2014 paper claims individual identity is acoustically recoverable. It does not claim crows themselves use individual identity in their day-to-day interactions, though that claim is plausible and supported by adjacent work (notably Marzluff's face-recognition program [2], which establishes the social substrate). It does not claim group-level dialect, though the methods could be extended to test for it (and have been, by subsequent papers). It does not claim translation; it does not claim compositional structure; it does not claim alarm-call referentiality. The paper is empirically modest and methodologically careful, which is exactly why it has aged well.

How it connects to the embedding methods

When a modern pipeline trained on American crow audio recovers individual identity from a single caw, what it has discovered is the same signal Mates et al. [1] discovered. The embedding compresses the original 24-bit, 48-kHz, multi-second waveform into a 1,536-dim vector. Within that vector, several dimensions correlate strongly with the harmonic-emphasis features Mates et al. measured by hand. The compression preserves the signal; the embedding method just got there without having to be told what to look for. From a CrowLingo perspective: every claim our atlas makes about individual identity in cluster narratives is downstream of Mates et al.'s finding. The methods are different; the empirical anchor is the same.
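One way to check the claim that embedding dimensions track harmonic emphasis is to correlate each dimension with a hand-measured feature across a set of caws. The sketch below assumes you already have one embedding and one harmonic-emphasis value per caw. All numbers are made-up toy data, the 3-dimensional vectors stand in for 1,536-dimensional ones, and `rank_dimensions` is a hypothetical helper, not part of any CrowLingo tooling.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def rank_dimensions(embeddings, feature):
    """Rank embedding dimensions by |correlation| with a per-caw feature."""
    n_dims = len(embeddings[0])
    scores = [
        (d, pearson([e[d] for e in embeddings], feature))
        for d in range(n_dims)
    ]
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


# Toy data: five caws, second-harmonic emphasis in dB (made-up values).
h2_emphasis_db = [-6.0, -3.0, -9.0, -1.0, -7.0]

# Toy 3-dim embeddings; dimension 1 is constructed to track the feature.
embeddings = [
    [0.3, 0.4, 1.0],
    [0.1, 0.7, 0.0],
    [0.2, 0.1, 1.0],
    [0.2, 0.9, 0.5],
    [0.4, 0.3, 0.0],
]

ranked = rank_dimensions(embeddings, h2_emphasis_db)
for dim, r in ranked:
    print(dim, round(r, 3))
```

On real embeddings the information is more likely spread across many dimensions than concentrated in one, so a linear probe over the whole vector would be the sturdier test. The per-dimension ranking is only the simplest version of the idea.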

Why this matters for layer-two dialect claims

Individual identity is layer one: a sender-side acoustic feature that classifiers can recover. Group-level dialect is layer two: a sender-side acoustic convention that members of one group share and that differs from the conventions of other groups. Whether dialects function for the crows themselves is layer three: a receiver-side question that requires playback experiments. The cleanest scientific position holds these layers separate. Mates et al. [1] nailed layer one for American crows. Contemporary embedding-based pipelines extend the methodology to layer two with reasonable confidence. Layer three remains open, and conflating it with the layers below it is the most common error in popular coverage of this space.