CrowLingo

Decoding · Sub-page

Contextual clustering — geometry meets behavior.

A cluster is just a number until you join it to what the crow was doing. The join is where geometry turns into meaning.

Figure (IG · 05 · REPERTOIRE · CONTEXT): Three-ring sunburst. Inner ring: six behavioral wedges (Territorial, Alarm/Mobbing, Recruitment, Foraging, Parent-Offspring, Affiliative). Middle ring: call-type subdivisions. Outer ring: representative waveform glyphs.

The join

The pipeline (stages 5–7) produces two tables: an audio table with cluster IDs, and a behavior table with timestamped observations. Joined by time window — usually a few-hundred-millisecond tolerance around the call — they yield a third table: per cluster, the distribution of behaviors that co-occurred.
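A minimal sketch of the join described above, using only the Python standard library. Everything named here is an assumption for illustration and not the pipeline's actual schema: the tuple layouts, the function names, the 0.3-second tolerance, and the sample rows are all hypothetical. Each call is matched to the nearest behavior observation, and the match is kept only if it falls inside the tolerance window.

```python
from bisect import bisect_left
from collections import Counter, defaultdict


def join_calls_to_behavior(calls, observations, tolerance_s=0.3):
    """Join calls to behavior observations by time window.

    calls:        iterable of (time_s, cluster_id)   -- hypothetical layout
    observations: iterable of (time_s, behavior)     -- hypothetical layout
    Returns {cluster_id: Counter({behavior: n_calls})}.
    """
    obs = sorted(observations)
    obs_times = [t for t, _ in obs]
    table = defaultdict(Counter)
    for call_time, cluster_id in calls:
        i = bisect_left(obs_times, call_time)
        # Only the observations just before and just after can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(obs)]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda j: abs(obs_times[j] - call_time))
        if abs(obs_times[nearest] - call_time) <= tolerance_s:
            table[cluster_id][obs[nearest][1]] += 1
        # Calls with no observation inside the window are left out.
    return table


def to_distribution(counts):
    """Turn per-behavior counts into proportions that sum to 1."""
    total = sum(counts.values())
    return {behavior: n / total for behavior, n in counts.items()}


# Made-up sample rows, for illustration only.
calls = [(10.00, 3), (12.40, 3), (15.10, 7), (20.00, 3), (30.00, 7)]
observations = [
    (10.10, "territorial"),
    (12.30, "territorial"),
    (15.00, "foraging"),
    (20.25, "foraging"),
    (40.00, "alarm"),
]

table = join_calls_to_behavior(calls, observations)
for cluster_id in sorted(table):
    print(cluster_id, to_distribution(table[cluster_id]))
```

The call at 30.00 s has no observation within the window, so it drops out of the third table instead of being assigned to a distant behavior. Whether unmatched calls should be dropped or counted under an "unknown" context is a design choice the text leaves open.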

The shape of that distribution is the signal. A cluster whose calls occur 80% during territorial defense and 5% during foraging is doing something different from a cluster that splits evenly across contexts. The first is interpretable; the second is either an encoding artifact or a genuinely context-generic call type (greetings, contact).
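One way to put a number on "the shape of that distribution" is normalized Shannon entropy: near 0 when a cluster's calls concentrate in one context, 1 when they spread evenly. This is a sketch of one possible summary statistic, not a measure the source names. The 80% and 5% figures come from the example above; the remaining proportions are invented so the distribution sums to 1, and the six contexts are taken from the sunburst figure.

```python
import math


def normalized_entropy(distribution, n_contexts):
    """Shannon entropy of a behavior distribution, scaled to [0, 1].

    distribution: {behavior: proportion}, proportions summing to 1
    n_contexts:   number of possible behavioral contexts (must be >= 2)
    """
    if n_contexts < 2:
        raise ValueError("need at least two contexts to normalize")
    entropy = -sum(p * math.log(p) for p in distribution.values() if p > 0)
    return entropy / math.log(n_contexts)


contexts = [
    "Territorial", "Alarm/Mobbing", "Recruitment",
    "Foraging", "Parent-Offspring", "Affiliative",
]

# Context-specific cluster: 80% territorial, 5% foraging (from the text);
# the other four values are hypothetical filler.
specific = {
    "Territorial": 0.80, "Foraging": 0.05, "Alarm/Mobbing": 0.05,
    "Recruitment": 0.05, "Parent-Offspring": 0.03, "Affiliative": 0.02,
}

# Context-generic cluster: calls split evenly across all six contexts.
even = {name: 1 / len(contexts) for name in contexts}

specific_score = normalized_entropy(specific, len(contexts))
even_score = normalized_entropy(even, len(contexts))
print(f"specific: {specific_score:.2f}, even: {even_score:.2f}")
```

A low score flags a cluster as a candidate for interpretation. A high score does not say which of the two explanations applies, encoding artifact or genuinely context-generic call; that still needs a look at the audio.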

What the wearable-logger work showed

The 2026 carrion-crow paper (Demartsev et al., bioRxiv) is the cleanest recent example. The team deployed wearable audio loggers on a cooperatively breeding crow population, capturing audio and accelerometry per individual. The behavior log was the accelerometer trace, time-aligned to the second.

When they clustered the vocal embeddings and joined them to the accelerometry-derived behavioral states, they recovered both discrete repertoire structure (clusters that map cleanly to a single behavior) and graded structure (grunts that vary continuously with motor activity). The latter is the part that was invisible before: graded variation that the old hand-labeling regime had to either squeeze into discrete categories or discard.

What this is not

It is not "the crow says X means Y." The joined distribution captures co-occurrence, not semantics. We don't know whether the call causes the behavior, describes the behavior, or simply accompanies the behavioral state.

Distinguishing those would require intervention — playback experiments with calibrated control — which the ethics floor constrains heavily and which the Respond stage only barely starts to address.

The honest interpretation

Contextual clustering gives you a probabilistic map from acoustic form to behavioral context. That map is a strong foundation for designing playback experiments, for spotting outlier calls worth investigating, and for detecting repertoire change over time. It is not, by itself, a dictionary.