What eBird does
Users submit checklists of bird observations from specific locations and times. The platform aggregates these submissions into population-level data products: range maps, abundance estimates, migration timing, distribution-by-season analyses. The data is verified by a network of regional reviewers who flag implausible submissions and resolve identification disputes. The output products are used widely in bird research, conservation, and policy — the eBird Status and Trends maps, in particular, are state-of-the-art for understanding North American bird populations at fine spatial and temporal resolution.
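As a rough illustration of the aggregation step, the sketch below computes a reporting frequency: the share of checklists in a region and month that list a given species. The Checklist record, its field names, and the sample data are assumptions made for this example, not eBird's actual schema. The real Status and Trends products rely on far more elaborate statistical modeling than a simple ratio.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Checklist:
    """Hypothetical, simplified checklist record (not eBird's real schema)."""
    region: str
    month: int
    species: frozenset


def reporting_frequency(checklists):
    """Return {(region, month, species): share of checklists reporting it}."""
    totals = defaultdict(int)  # number of checklists per (region, month)
    hits = defaultdict(int)    # number of checklists per (region, month, species)
    for c in checklists:
        totals[(c.region, c.month)] += 1
        for sp in c.species:
            hits[(c.region, c.month, sp)] += 1
    # Divide each species count by the checklist count for its region and month.
    return {key: n / totals[key[:2]] for key, n in hits.items()}


# Made-up sample data, for illustration only.
checklists = [
    Checklist("NY", 5, frozenset({"American Crow", "Wood Thrush"})),
    Checklist("NY", 5, frozenset({"American Crow"})),
    Checklist("NY", 1, frozenset({"American Crow"})),
    Checklist("NY", 1, frozenset()),
]
freq = reporting_frequency(checklists)
```

A species that appears on no checklist for a region and month is simply absent from the result, so callers should treat a missing key as a frequency of zero.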
When audio entered
eBird added audio submission capability in the early 2010s. Users could attach an audio recording (typically a short phone-recorded clip) to a checklist as documentation of an observation, particularly for species that were hard to confirm visually. The audio submissions flowed automatically into the Macaulay Library, becoming part of the world's largest wildlife audio archive. The volume grew dramatically once mobile-phone audio recording became universal, and from roughly 2015 onward, the majority of new Macaulay audio came from eBird-affiliated submissions rather than from dedicated bioacoustics researchers.
What this changed
Three things changed. First, the geographic and species coverage of bioacoustic recordings expanded dramatically: places and species that had been undersampled in the dedicated-researcher era now had citizen-science recordings flowing in. Second, recording quality became more variable, because phone recordings vary much more than dedicated-microphone field recordings. That created both an opportunity (more diverse training data for AI models) and a challenge (more noise to filter). Third, the absolute volume of bioacoustic data available to researchers grew by orders of magnitude, making large-scale modeling approaches like BirdNET[1] viable in ways they would not have been with researcher-only data.
Why BirdNET depends on this
BirdNET[1], the most-deployed bird-sound classifier, was trained on bird audio that included Macaulay's eBird-fed archive as a major component. Without the eBird-Macaulay pipeline producing millions of recordings across thousands of species, the training data would not have existed at the scale required. The species coverage BirdNET claims (6000+ species globally) is downstream of eBird's global community contributing audio at scale. The same is true, to a significant degree, for any subsequent foundation model in this space. The model architecture is what people credit; the data infrastructure is what makes the architecture useful.
The licensing complication
eBird-contributed audio carries Macaulay Library's licensing constraints (see 'The Macaulay Library and the open-research tension'). Contributors typically aren't licensing their recordings under Creative Commons for unrestricted reuse; they're licensing them to Cornell for archival use. As a result, AI models trained on this corpus carry licensing entanglements that vary by use case. Cornell has been pragmatic about research access, but the constraint is real and shapes the bioacoustic data ecosystem in ways the technical AI literature usually doesn't address explicitly. The infrastructure that enables modern bioacoustic AI is partly proprietary, and that limits what truly open downstream projects can do.
What this means for CrowLingo
CrowLingo's atlas is built on Wikimedia Commons recordings rather than eBird-Macaulay submissions, specifically to avoid the licensing constraints. The trade-off: a smaller, more variable, more geographically narrow corpus. The benefit: full open redistribution, no licensing friction, predictable rights for any downstream user. The eBird ecosystem is genuinely the dominant infrastructure for bioacoustic data globally, and most academic bioacoustics research has good reasons to use it. CrowLingo represents the small-but-open alternative — useful for a specific kind of public-reference work, not a replacement for the eBird-Macaulay model.
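To make the open-redistribution constraint concrete, here is a minimal sketch of a license allowlist applied to recording metadata. It is not CrowLingo's actual pipeline. The Recording type, the license labels in OPEN_LICENSES, and the sample catalog are all hypothetical, and a real project would need to check the license terms recorded for each file.

```python
from dataclasses import dataclass

# Assumed allowlist of license labels that permit redistribution (illustrative only).
OPEN_LICENSES = {"CC0", "CC-BY", "CC-BY-SA", "Public Domain"}


@dataclass(frozen=True)
class Recording:
    """Hypothetical metadata record for one audio file."""
    title: str
    source: str
    license: str


def redistributable(recordings, allowed=OPEN_LICENSES):
    """Keep only recordings whose license label is on the allowlist."""
    return [r for r in recordings if r.license in allowed]


# Made-up catalog entries, for illustration only.
catalog = [
    Recording("crow_call_01", "Wikimedia Commons", "CC-BY-SA"),
    Recording("crow_call_02", "Wikimedia Commons", "CC0"),
    Recording("crow_call_03", "other archive", "All rights reserved"),
]
open_subset = redistributable(catalog)
```

An allowlist errs on the side of exclusion: anything with an unrecognized or missing license label is dropped, which fits a corpus whose main selling point is predictable rights for downstream users.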