What CETI is trying to do

Sperm whales produce stereotyped click sequences called codas. The basic acoustic structure has been documented since the 1970s. CETI's central technical bet is that contemporary AI methods — primarily sequence-modeling techniques borrowed from language modeling — can recover finer structure in coda sequences than classical methods have managed, and that the finer structure may correspond to meaning the whales themselves use. The project deployed underwater hydrophone arrays in the Caribbean to record sperm whale family groups over multiple years, accumulating vocalization datasets at a scale earlier sperm whale researchers couldn't match. The technical hope: with enough data, language-model-style approaches might surface structure analogous to phonemic or sub-lexical structure in human language.
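
To make the sequence-modeling idea concrete, here is a minimal sketch. It is not CETI's pipeline: it is a bigram model with add-one smoothing over coda-type labels. The labels "A", "B", "C", the toy sequences, and the function names are all hypothetical, and real work would start from click timings rather than pre-assigned labels. The point is only that a model which captures ordering structure finds the sequences more predictable (lower perplexity) than a uniform guess over the vocabulary would.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical start-of-sequence marker


def train_bigram(sequences, vocab):
    """Return P(next | prev) estimated from bigram counts with add-one smoothing."""
    pair_counts = Counter()
    prev_counts = Counter()
    for seq in sequences:
        prev = START
        for tok in seq:
            pair_counts[(prev, tok)] += 1
            prev_counts[prev] += 1
            prev = tok
    v = len(vocab)

    def prob(prev, tok):
        return (pair_counts[(prev, tok)] + 1) / (prev_counts[prev] + v)

    return prob


def perplexity(prob, sequences):
    """Per-token perplexity of the sequences under the bigram model."""
    log_sum, n = 0.0, 0
    for seq in sequences:
        prev = START
        for tok in seq:
            log_sum += math.log(prob(prev, tok))
            n += 1
            prev = tok
    return math.exp(-log_sum / n)


# Toy data: invented coda-type sequences, not real recordings.
vocab = {"A", "B", "C"}
sequences = [["A", "B", "A", "B"], ["A", "B", "A", "B"], ["A", "B", "C"]]

prob = train_bigram(sequences, vocab)
ppl = perplexity(prob, sequences)
print(f"bigram perplexity: {ppl:.2f} (uniform baseline: {len(vocab)})")
```

A perplexity below the vocabulary size says the ordering is non-random under this model. It says nothing about whether the ordering means anything to a whale.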

What they've found so far

The 2022 roadmap paper by Jacob Andreas[1], Roger Payne, Pratyusha Sharma, David Gruber and others, published in iScience, is the cleanest summary of CETI's framing and early findings. Sperm whale codas appear to vary along more acoustic dimensions than classical coda-type classifications recognized. Some of that variation correlates with social context; some is specific to individuals or family groups. Whether the variation carries functional meaning that the whales themselves decode remains the open question, and the team has said so explicitly. Subsequent work has refined the descriptive findings but has not settled the functional question.
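
As a rough illustration of what "correlates with social context" can mean operationally, the sketch below runs a permutation test on whether coda cluster labels line up with recorded context labels. Everything in it is an assumption made for illustration: the cluster names, the context labels "dive" and "social", the toy data, and the purity statistic. It is not the analysis used in the CETI papers.

```python
import random
from collections import Counter


def purity(clusters, contexts):
    """Fraction of items whose context is the majority context of their cluster."""
    by_cluster = {}
    for cluster, context in zip(clusters, contexts):
        by_cluster.setdefault(cluster, Counter())[context] += 1
    matched = sum(max(counts.values()) for counts in by_cluster.values())
    return matched / len(clusters)


def permutation_p_value(clusters, contexts, n_perm=2000, seed=0):
    """Compare observed purity with purity under shuffled context labels."""
    rng = random.Random(seed)
    observed = purity(clusters, contexts)
    shuffled = list(contexts)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if purity(clusters, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: 20 invented codas in two hypothetical clusters.
clusters = ["a"] * 10 + ["b"] * 10
contexts = ["dive"] * 9 + ["social"] + ["social"] * 9 + ["dive"]

observed, p_value = permutation_p_value(clusters, contexts)
print(f"purity={observed:.2f}, permutation p-value={p_value:.4f}")
```

A small p-value here supports only the sender-side claim that structure and context co-vary. Whether receivers decode that structure needs a different kind of evidence.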

What's admirable about the CETI framing

The Andreas[1] et al. 2022 roadmap paper is unusual in animal-language AI for explicitly naming the epistemological risks of the project. Translation may be the wrong frame. Meaning may not be lexical in the way it is in human language. The work may find an absence of structure where popular coverage assumed its presence. These caveats are stated in the project's defining document, not buried in appendices. The team's public communication has kept the same posture, and the project has weathered the inevitable popular-coverage exaggerations without endorsing them. That is rare and worth noting. Animal-language AI projects that don't pre-commit to this kind of epistemic humility risk credibility-damaging walkbacks that hurt the field as a whole.

The marine-vs-corvid contrast

CETI works underwater on a single species producing stereotyped click sequences in groups of known individuals across multi-year datasets. Corvid AI work, CrowLingo's project specifically, works above water on a single species producing varied vocal types in territorial groups, with comparatively shorter individual-tracking horizons. The technical methods overlap (sequence modeling, embedding-based analysis), but the empirical constraints differ substantially. CETI has fewer call types per individual but more sequence-structural data. Corvid work has more call types per individual but less sequence-structural data. Both face the same receiver-side problem: how to validate that signal structure carries meaning the receiver decodes.

What CETI's data scale suggests

CETI's multi-year, multi-million-dollar effort to assemble sperm whale vocalization data at AI-pretraining scale has produced datasets on the order of tens to hundreds of thousands of vocalizations. Sperm whale vocalizations are relatively easy to attribute to individuals (acoustic source localization works well underwater), so the data accumulation challenge is mostly logistical. Corvid vocalizations are harder to attribute, but wearable-logger work (Demartsev 2026) is closing that gap. The general lesson: even with focused, well-funded, multi-year efforts, animal-language AI data scales are several orders of magnitude smaller than human-language AI data scales. The methods have to be adapted to small-data regimes, and the claims have to be calibrated to what small-data methods can defend.
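
The size of that gap is easy to check on the back of an envelope. The sketch below assumes a human-language pretraining corpus of about one trillion tokens and treats one vocalization as one token. Both are illustrative assumptions, not figures from CETI or CrowLingo, and the second is a crude simplification.

```python
import math

# Assumption for illustration only: roughly one trillion tokens
# in a human-language pretraining corpus.
ASSUMED_HUMAN_CORPUS_TOKENS = 1e12


def orders_of_magnitude_gap(small_count, large_count):
    """Base-10 log of the ratio between two dataset sizes."""
    return math.log10(large_count / small_count)


# "Tens to hundreds of thousands" of vocalizations, treating each one
# as a single token (a crude simplification).
for n_vocalizations in (10_000, 100_000):
    gap = orders_of_magnitude_gap(n_vocalizations, ASSUMED_HUMAN_CORPUS_TOKENS)
    print(f"{n_vocalizations:>7} vocalizations: about {gap:.0f} orders of magnitude smaller")
```

Under these assumptions the gap is seven to eight orders of magnitude for the two inputs shown. A different assumed corpus size shifts the number but not the conclusion.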

The shared lesson

Both CETI and CrowLingo are doing similar work in different domains: cataloging, characterizing, mapping the geometry of a species' vocal repertoire, then carefully testing what the geometry corresponds to behaviorally. Both face the same fundamental constraint — receiver-side meaning isn't observable from sender-side audio alone. Both have explicitly chosen the careful-cataloging path over the translation-claim path. The shared lesson, stated plainly: animal-language AI in 2026 is doing good map-making. It is not doing translation. Anyone claiming otherwise — for whales, for crows, for any species — is either ignoring the receiver-side problem or speaking imprecisely. Map first; legend later.