PILLAR III — WHAT WE CAN DECODE

We are not translating. We are mapping.

What can today's audio AI actually tell you about a crow call? Honest capability cards: what's demonstrated, what's partial, what's still aspirational. No overclaiming.

Read the capability report →

What's still ahead

CAPABILITY CARDS

What we can do, honestly.

Demonstrated

Automatic detection

Detect and segment crow vocalizations in hours of field audio, with accuracy comparable to BirdNET.

Demonstrated

Unsupervised clustering

Discover call categories from the geometry of the calls' acoustic features, with no labels needed.

Partial

Caller identification

Infer individual identity from voice. Works on known individuals.

Partial

Behavioral context

Map calls to behavioral states across nine clusters.

Emerging

Zero-shot captioning

NatureLM-audio generates descriptions. Surprisingly accurate.

Not yet

Compositional decoding

Understanding calls as combined meaningful units. The frontier.

What's still ahead.

The honest map of the frontier.

Open the frontier →

What people ask about this.

What can AI actually decode about crow vocalizations?

Caller sex, individual identity, behavioral context (territorial, mobbing, recruitment, affiliative), and approximate intent, all from a single half-second of crow voice, though identity and behavioral context remain partial capabilities. What it cannot decode: lexical meaning, compositional syntax, or anything that would deserve the word "translation".

Do crows have grammar?

Unknown. Statistical models hint at structured composition — caw-rattle sequences are non-random — but the behavioral evidence that crows treat sequence order as meaningful is thin. This is where the next five years of corvid bioacoustic research lives.
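What "non-random" means here can be made concrete with a permutation test. The sketch below is illustrative only and is not the project's actual analysis: the call labels, the `order_statistic` measure, and the example sequence are all assumptions. It asks whether the observed order of call types concentrates on a few transitions more than shuffled versions of the same calls do. A small p-value says only that order is structured. It says nothing about whether crows treat that order as meaningful.

```python
import random
from collections import Counter


def bigram_counts(seq):
    """Count adjacent pairs (a, b) in a sequence of call-type labels."""
    return Counter(zip(seq, seq[1:]))


def order_statistic(seq):
    """Sum of squared bigram counts: larger when a few transitions dominate."""
    return sum(c * c for c in bigram_counts(seq).values())


def permutation_p_value(seq, n_shuffles=2000, seed=0):
    """Estimate how often a random reordering looks at least as structured.

    The null hypothesis keeps the same calls but treats their order as
    exchangeable, so each shuffle is one draw from that null.
    """
    rng = random.Random(seed)
    observed = order_statistic(seq)
    shuffled = list(seq)
    at_least = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if order_statistic(shuffled) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (at_least + 1) / (n_shuffles + 1)


# Hypothetical label sequence (not real data): strictly alternating calls.
calls = ["caw", "rattle"] * 20
p = permutation_p_value(calls)
print(f"p = {p:.4f}")
```

A sequence whose order carries no structure would typically yield a large p-value under the same test, which is why this kind of result counts as a hint about composition and not as evidence of grammar.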