What Merlin does
Merlin Sound ID, released by the Cornell Lab in 2021 and refined steadily since, listens through a phone's microphone, identifies birds in real time by their vocalizations, and shows the user which species are calling within hearing distance. It runs on a fairly modest phone CPU. It works offline once species packs are downloaded. It supports thousands of species across regions. It's free. As of 2025 it has been downloaded tens of millions of times globally. For most non-specialist users who interact with AI bioacoustics, Merlin is what AI bioacoustics IS. The research community's models, papers, and benchmarks are invisible to these users, and these users are most of the people the technology reaches.
Why it could happen at all
Merlin works because BirdNET[1] works. Stefan Kahl's group at the Cornell Lab and Chemnitz University of Technology released it in 2021 (with continued updates through v2.4 in 2024) as a CPU-friendly EfficientNet-B0 model trained on the Xeno-canto and Macaulay corpora. The architectural choices were deliberately conservative (fast inference, small model size, broad species coverage) because the research goal from the start was deployment, not just benchmark performance. BirdNET v1 wasn't the best-performing bioacoustic model on internal Cornell evaluations even at release; it was the one that could run on a phone, in a forest, with the screen brightness low. That choice paid off.
The citizen-science effect
Merlin doesn't just identify birds for the user; it logs the identifications, optionally aggregating them into eBird (Cornell's citizen-science database). The aggregation effect is enormous. As of 2025, eBird receives more Merlin-sourced acoustic detections per day than the manual-checklist submissions that defined the platform in the 2010s. For some species in some regions, Merlin-detected occurrences fill gaps in the observational record that field ornithologists would have taken decades to fill. Migration timing, urban-population shifts, dawn-chorus phenology — all benefit from the data scale Merlin enabled. Acoustic monitoring used to be expensive specialist work; now millions of phones are doing parts of it for free.
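To make the migration-timing point concrete, here is a minimal sketch of how logged detections could be rolled up into a first-arrival date per species per year. Everything in it is an assumption for illustration: the record layout, the species and dates, and the 0.8 confidence cutoff are invented, and none of it reflects the schema or thresholds that Merlin or eBird actually use.

```python
from datetime import date

# Hypothetical detection log: (species, site, day, model confidence).
# These rows are made up for the example; they are not real observations.
DETECTIONS = [
    ("Wood Thrush", "site_a", date(2024, 4, 28), 0.91),
    ("Wood Thrush", "site_a", date(2024, 4, 25), 0.55),  # low confidence
    ("Wood Thrush", "site_b", date(2024, 5, 2), 0.88),
    ("Wood Thrush", "site_a", date(2025, 4, 24), 0.93),
    ("Ovenbird", "site_a", date(2024, 5, 1), 0.97),
]


def first_arrivals(detections, min_confidence=0.8):
    """Return {(species, year): earliest day} over confident detections.

    min_confidence is an illustrative cutoff, not a documented default.
    """
    earliest = {}
    for species, _site, day, confidence in detections:
        if confidence < min_confidence:
            continue  # skip detections the model was unsure about
        key = (species, day.year)
        if key not in earliest or day < earliest[key]:
            earliest[key] = day
    return earliest


arrivals = first_arrivals(DETECTIONS)
for (species, year), day in sorted(arrivals.items()):
    print(species, year, day.isoformat())
```

The cutoff matters more than it looks: lowering it pulls the 2024 Wood Thrush arrival three days earlier in this toy data, which is the kind of sensitivity anyone doing real phenology on crowd-sourced detections would need to check.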
What it doesn't replace
Merlin is excellent at species identification and broadly useful for presence/absence monitoring. It is not a substitute for research-grade acoustic monitoring on questions where details matter: individual ID, within-species behavioral classification, behavioral context inference, and graded variation are all different questions, and Merlin doesn't try to answer any of them. Researchers using Merlin data are careful to recognize the species-ID layer as one signal among many, and to triangulate it against passive acoustic monitoring deployments running heavier models on better hardware. The bioacoustic research ecosystem has stratified: phone-scale species ID is solved at the user layer; everything else lives at the research layer.
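One simple form that triangulation could take is a set comparison between phone-sourced presence records and records from a passive acoustic monitoring (PAM) station covering the same site and day. The sketch below assumes both sources have already been reduced to (site, day, species) tuples; the function name, the tuple layout, and the sample records are all hypothetical, and real workflows would also have to handle detection effort and spatial mismatch.

```python
def triangulate(phone_records, pam_records):
    """Split presence records into confirmed, phone-only, and PAM-only sets.

    Each record is a (site, day, species) tuple. This is plain set
    arithmetic, not a statistical occupancy model.
    """
    phone, pam = set(phone_records), set(pam_records)
    return phone & pam, phone - pam, pam - phone


# Hypothetical records, invented for the example.
PHONE = [
    ("site_a", "2025-05-01", "Wood Thrush"),
    ("site_a", "2025-05-01", "Ovenbird"),
    ("site_b", "2025-05-01", "Wood Thrush"),
]
PAM = [
    ("site_a", "2025-05-01", "Wood Thrush"),
    ("site_b", "2025-05-01", "Veery"),
]

confirmed, phone_only, pam_only = triangulate(PHONE, PAM)
print("confirmed by both:", sorted(confirmed))
print("phone only:", sorted(phone_only))
print("PAM only:", sorted(pam_only))
```

Records in the phone-only set are not necessarily wrong; they are the ones that would need a second look before being treated as research-grade.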
What Merlin teaches about deployment
The single biggest lesson from Merlin's success is that getting bioacoustic AI into people's hands matters more than getting it onto leaderboards. For half a decade before Merlin shipped, the bioacoustic ML community had been producing increasingly sophisticated models that won benchmarks and never reached anyone outside the lab. Merlin shipped a model that wasn't SOTA, paired it with a tight UX, attached it to a free database, and changed the public's relationship with bird audio in ways academic publishing alone couldn't. CrowLingo's atlas and journal exist in this lineage of thinking: the science is necessary; the deployment is what makes the science load-bearing.
The licensing edge case
Merlin's training data — the Macaulay Library and Xeno-canto subsets — was assembled under license terms that don't permit redistribution of the underlying audio. Users get inference; they don't get the recordings. For CrowLingo's project, that licensing edge is what motivates the Wikimedia-Commons-first corpus strategy: we want a corpus that's redistributable, so contributors and other research groups can build on it. The Merlin approach is fine for an app that delivers inference; it's not fine for an open-data research substrate. Different goals, different licensing constraints.
What's next for the consumer layer
BirdNET[1] v3 (in development) is expected to extend coverage to bats, frogs, and insects — broadening the consumer-app surface beyond birds. Sound ID for mammals (terrestrial and marine) is a separate effort with different ML challenges (less open data, fewer training examples per species). Whoever ships a Merlin-equivalent for any of these domains will produce a downstream data-aggregation effect comparable to what eBird is seeing with Merlin now. The next decade of citizen-science bioacoustics will probably be defined by which consumer-scale models ship to which taxa, in what order.