Methods · Sub-page

Traditional vs ALP.

Two ways of seeing the same animal. The older view drew lines; the newer one draws geometry.

Vertical split: six discrete labeled call tiles (1970s–2010s) versus a continuous iridescent UMAP cloud (self-supervised era). — IG · 02 · BEFORE · AFTER
Left: six discrete named call types, each a tile, joined only by a dotted boundary. Right: a continuous map of points colored by context, with bridges between regions and inline exemplar spectrograms.

The traditional regime — 1970s to mid-2010s

A trained ear listened to spectrograms. A small set of named types — caw, rattle, assembly, alarm, begging, companion — were defined by visual + acoustic criteria. Every clip was hand-coded to one of them. Inter-rater agreement was good for prototypical exemplars and poor for graded ones, so graded cases were either forced into a category or excluded as "unusual."

That approach produced everything we know about crow communication: that caws encode caller sex and identity (Mates et al. 2014); that mobbing calls recruit family; that assembly calls bring groups to a roost. It was a monumental achievement built on careful ears and patient annotation.

The ALP regime — late 2010s to now

Self-supervised audio models give you a vector per clip without any human ever labeling one. Project the vectors, cluster them, audition exemplars, name the clusters. The named types from the old regime survive — as labels of dense regions in the map, not as boundaries on the world.

The graded cases, which the old regime had to either squeeze or discard, now show up as bridges between clusters. Individual signatures show up as substructure within clusters. Dialect shows up as centroid drift between family groups. All of it visible at once, on the same map, from the same vectors.

What we gave up

Three things. Interpretability per axis: a cluster boundary is a density gradient, not a clean definition we can write down. Encoder coupling: everything is relative to the model used; change the model and the geometry shifts. The illusion of completeness: the old taxonomy felt closed. The new map is permanently open — there's always more graded structure to find.

What we gained

Scale, honesty, and detail. We can analyze tens of thousands of calls in the time it used to take to hand-code a few hundred. We can stop pretending graded variation isn't there. And we can see structure — individual signatures, dialect, hints of composition — that wasn't recoverable at all before.

Navigate CrowLingo

Traditional vs ALP.

The traditional regime — 1970s to mid-2010s

The ALP regime — late 2010s to now

What we gave up

What we gained