Map versus dictionary

The single most-repeated wish: that the public would internalize the distinction between mapping a vocal repertoire and translating a language. Mapping the geometry of bird vocalizations is what the AI methods do well, and what the field has gotten dramatically better at since 2022. Translating those vocalizations into human language is what the field cannot do, will not do soon, and probably will not do with current methods at all. Both can be true. Both ARE true. The popular framing conflates them roughly every six months.
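
To make the map/dictionary distinction concrete, here is a minimal sketch, not any lab's actual pipeline. It assumes each call has already been reduced to two hypothetical features (duration and peak frequency, with made-up values) and groups the calls with a plain k-means loop. The names `calls`, `kmeans`, `call_map`, and `dictionary` are illustrative only.

```python
import math
from collections import Counter

# Hypothetical 2-D acoustic features per call: (duration in s, peak frequency in kHz).
# These values are invented for illustration; they are not measurements.
calls = [
    (0.20, 1.1), (0.22, 1.0), (0.19, 1.2),   # short, lower-pitched calls
    (0.60, 1.6), (0.63, 1.5), (0.58, 1.7),   # longer, higher-pitched calls
]

def nearest(point, centers):
    """Return the index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def kmeans(points, centers, rounds=10):
    """Plain k-means with caller-supplied starting centers."""
    for _ in range(rounds):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                # Move the center to the mean of its assigned points.
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(centers[k])
        centers = new_centers
    return [nearest(p, centers) for p in points], centers

labels, centers = kmeans(calls, centers=[calls[0], calls[3]])

call_map = Counter(labels)  # the "map": how many calls fall in each acoustic region
dictionary = {}             # the "dictionary": region id -> meaning; acoustics alone never fill it
```

The clustering yields region labels and counts, which is the map. The dictionary stays empty because nothing in the acoustics says what a region means to the bird that hears it.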

The receiver-side problem isn't a detail

The second-most-repeated wish: that popular coverage would stop treating the receiver-side decoding problem as a technicality that future research will resolve. The receiver-side problem — whether animals, hearing a call, demonstrably use the call's information to do something — is the actual rate limiter on translation claims. It's not a detail. It requires behavioral experiments that are ethically expensive and methodologically demanding. It cannot be solved by training larger models. Most AI bioacoustics speculation in popular outlets implicitly assumes the receiver-side problem is solvable by scale; the working scientists generally don't share that assumption.

Field bioacoustics is slow on purpose

Patient field observation (Heinrich's tradition, the McGowan program, the Marzluff Seattle work, the Demartsev[3] wearable-logger studies) produces findings that the AI methods need to anchor against. The slowness isn't laziness; it's calibrated to the resolution at which the questions can be answered honestly. Every time popular coverage characterizes traditional bioacoustics as 'outdated' or 'replaced by AI,' a working scientist somewhere updates their priors about how seriously to take that publication going forward. AI has accelerated some questions; it has not replaced the slow questions that the slow methods are best at.

BirdNET on a phone is a bigger deal than NatureLM-audio in a lab

Most working bioacousticians would, if pressed, name the Merlin phone app and BirdNET[1] as the most consequential AI bioacoustics deployments of the past five years, not the latest research model that wins benchmarks. Reason: deployment scale changes what the public's relationship with bird audio looks like, which changes citizen-science data flows, which changes the data infrastructure the research field can rely on for the next decade. A research model that wins on a benchmark and reaches twelve people doesn't move the needle as much as a deployed model that reaches twelve million. The hierarchy of importance from inside the field is not the hierarchy popular coverage often implies.

Open data is the bottleneck, not algorithms

Algorithm-side progress has been rapid. Data-side progress has been slow because the underlying audio corpora are licensed restrictively (Macaulay Library), encoded inconsistently (decades of audio in different formats), or simply not collected (most species' close-range and quiet vocalizations). Working bioacousticians who pay attention to where progress will come from generally bet on data more than algorithms. The Demartsev[3] wearable-logger work is interesting not for new methods but for the data it generated. The same observation applies to American crow work: when comparable wearable-logger studies happen for American crows, the field will jump forward in ways no algorithm release would deliver.

Ethics is a real constraint, not a footnote

The ethics floor in bioacoustic research (no playback within ten meters of nests, IACUC review for any vertebrate-wildlife playback, and so on) is not regulatory paperwork. It's how the field maintains the relationship with wild populations that makes the science possible. Popular coverage that implicitly assumes 'we'll just run more playback experiments' as the path to translation underestimates how constraining the ethics floor is and how seriously working scientists take it. CrowLingo doesn't deploy playback features for the same reason: the ethics floor for a public-facing site is stricter than the research-lab floor because the user base is uncontrolled.

What working scientists are actually excited about

Wearable bioacoustic loggers (the methodology that produced the Demartsev[3] paper) generalizing to more species. Open-tooling infrastructure (Voxaboxen, released model weights, the BEANS benchmark) accumulating into a shared substrate. Cross-disciplinary collaboration between behavioral ecologists and ML researchers becoming the default rather than the exception. Slow, careful work on receiver-side validation finally getting the funding it deserves. Notably absent from the working-scientist excitement list: imminent translation, AI-mediated human-animal conversation, dictionary-style decoders. These are popular-coverage staples that don't show up in the actual research community's enthusiasm.