FIG 0.1 — American crow · 1024-dimensional audio space

We can finally see what we couldn't hear.

A focused exploration of one species and one revolution: the American crow, and the new generation of AI audio models that turn its voice into a map. Seven hundred ninety-five recordings. Nine emergent clusters. One geometry.

Open the vocal atlas →
How the pipeline works

795 Recordings · 9 Clusters · 83 Essays · 1024 Dimensions

VOCAL ATLAS · UMAP 2D · HDBSCAN · 9 CLUSTERS

Scroll the full atlas to explore all nine clusters. Open →

FIG 0.2 — Three coordinates

Listen. Then find each call on the map.

All nine clusters →

Cluster 01 · Territorial

Perched flock — territorial caws

00:25 · Suburban pine stand, MN

Cluster 04 · Rattle

Rattle complex — affiliative

00:33 · Urban park, MN

Cluster 05 · Begging

Juvenile begging + adult exchange

00:40 · Powderhorn Park, MN

FIG 0.3 — The shift

Discrete categories gave way to a continuous map.

A fifty-year methodology change, compressed into one frame.

The change is not that we found new sounds. The change is that we stopped treating each call as a label, and started treating the whole repertoire as a geometry. Graded variation, dialect, individual signature: all of it visible at once, on the same map, in milliseconds per call.
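To make the contrast between a label and a geometry concrete, here is a minimal sketch. It assumes each call has already been embedded as a vector; toy 4-dimensional vectors stand in for the 1024-dimensional embeddings, and the call names and values are invented for illustration. Cosine distance is one common choice of measure, not necessarily the one used for the atlas.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical embeddings: name -> (legacy label, vector). Illustrative values only.
calls = {
    "caw_a": ("caw", [1.0, 0.0, 0.0, 0.0]),
    "caw_b": ("caw", [0.8, 0.6, 0.0, 0.0]),
    "rattle_a": ("rattle", [0.0, 1.0, 0.0, 0.0]),
}

def compare(name_1, name_2):
    """Return (same legacy label?, cosine distance) for two named calls."""
    label_1, vec_1 = calls[name_1]
    label_2, vec_2 = calls[name_2]
    return label_1 == label_2, cosine_distance(vec_1, vec_2)

for pair in [("caw_a", "caw_b"), ("caw_b", "rattle_a"), ("caw_a", "rattle_a")]:
    same_label, distance = compare(*pair)
    print(pair, "same label:", same_label, "distance:", round(distance, 3))
```

The label comparison can only answer same or different. The distances place `caw_b` between the other two calls, which is the kind of graded variation a label cannot express.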

The old labels survive as labels of regions in the map, not as boundaries on the world.
— CrowLingo Editorial

The 2026 carrion-crow bioRxiv preprint by Demartsev et al. used wearable loggers and this mapping discipline to recover both discrete and graded structure in grunts and caws. Territory by territory, individual by individual. The method is the message: stop sorting, start mapping.

FIG 0.4 — Honest framing

We mapped the language. We have not learned it.

Demonstrated today

What the models can do

Automatic detection and segmentation of crow vocalizations from field audio
Unsupervised category discovery — clusters emerge from geometry, not labels
Caller-identity inference at individual-bird resolution
Behavioral-context mapping across nine distinct call types
Zero-shot captioning via the NatureLM-audio foundation model
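As a sketch of what "clusters emerge from geometry, not labels" can mean in practice: a density-based algorithm groups points purely by how close they sit to one another and marks isolated points as noise. The atlas header names UMAP and HDBSCAN; the example below swaps in plain DBSCAN, written in standard-library Python, on invented 2D points standing in for a projected map. The `eps` and `min_pts` values are assumptions chosen for the toy data, and none of this is the site's actual pipeline.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN: returns one cluster id per point, with -1 meaning noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels

# Invented 2D coordinates: two dense groups and one isolated point.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (10.0, 0.0),
]
labels = dbscan(points, eps=0.3, min_pts=3)
print(labels)
```

No label is supplied at any point. The number of clusters is an outcome of the point density, which is why a count such as "nine" is something the data yields rather than something chosen in advance.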

Not yet here

What's still ahead

Compositional decoding — understanding calls as combined units, not single tokens
Real-time bidirectional dialogue between human and crow
A verified "crow dictionary" with human-readable glosses
Cross-species transfer showing which structures generalize
Field-deployable playback systems with ethical guardrails

FIG 0.5 — Where to next

Four ways in, scaled by commitment.

01 · 90 seconds

Listen first

Three real crow recordings with spectrograms and AI interpretation.


02 · 10 minutes

Open the atlas

Interactive 2D map. Nine clusters, real spectrograms, behavioral context.


03 · 30 minutes

Read the journal

Eighty-three long-form essays on AI bioacoustics and corvid cognition.


04 · An evening

Study the methods

Self-supervised audio, latent spaces, NatureLM-audio. Source by source.


Frequently asked

What people ask about this.

What is CrowLingo?

An independent editorial publication analyzing how AI audio models are changing what we know about American crow vocalization. Built on primary corvid research, real field recordings, and open-source bioacoustic models.

Is CrowLingo translating crow language?

No. The models can map, cluster, and characterize vocalizations. They cannot translate. We are explicit about this distinction throughout the site.

Who built this?

CrowLingo is a Kymata Labs publication. Not affiliated with Earth Species Project, Cornell Lab of Ornithology, Project CETI, or any specific research group.

Can I use the recordings?

Field recordings are CC-licensed by their original contributors. The site's editorial content is CC BY-NC 4.0. Source code is MIT.