What the library is
Founded in 1929 by Albert R. Brand, the Macaulay Library is the world's largest scientific archive of natural-history media: audio, video, and photographs of wildlife species. Its growth accelerated dramatically with the integration of eBird's checklist-and-media submissions starting in the late 2000s, which transformed the library from a curated, researcher-driven collection into a crowd-sourced stream of citizen-science submissions verified through Cornell-coordinated review. The library is the canonical reference dataset for most North American bird vocalizations and increasingly for global species coverage, and most academic bioacoustics research uses it as a primary data source.
The licensing structure
Individual recordings on the Macaulay website are licensed for individual non-commercial use with attribution. Bulk downloads, programmatic access, and research-scale dataset extraction require Cornell-administered agreements that vary case by case. The terms are reasonable for academic, non-commercial research collaborations and harder to obtain for commercial uses, public data products, or external services. The licensing is not Creative Commons in the way that, for example, Wikimedia Commons licensing is: contributors license their recordings to Cornell for archival use, but the recordings are not released to the public for unrestricted reuse. This is a deliberate choice that protects contributor expectations and Cornell's stewardship role, but it constrains downstream use cases.
The eBird integration
eBird is Cornell's citizen-science bird-observation platform with over a billion observations to date. Since the integration, every eBird checklist can include audio recordings that flow directly into the Macaulay Library. This has been a massive accelerant for the archive — most newer recordings now come from eBird-affiliated contributors rather than dedicated bioacoustics researchers. The integration shifted the library from a small carefully-curated archive to a large crowd-sourced one, with all the trade-offs that implies: more coverage, more variability in recording quality, more geographic spread, harder curation work. The eBird-Macaulay relationship is the engine that has made Cornell's holdings what they are.
How this shapes the bioacoustics field
It does so in two ways. First, almost every published bioacoustic finding on North American birds depends on Macaulay data at some point, whether as the source corpus, as a validation reference, or as a comparison baseline. The field's empirical foundation is largely Cornell's archive. Second, the open-research tension constrains what the field can do publicly. Open-source bioacoustic foundation models (BirdNET[1], NatureLM-audio[2]) train on Macaulay-derived data under research agreements but typically can't redistribute the underlying audio. End-user applications built on these models can recognize species without ever exposing the training audio. This is a workable arrangement, but it means the foundational data isn't truly open in the way some open-science advocates would prefer.
What CrowLingo uses and doesn't
CrowLingo's atlas uses recordings sourced from Wikimedia Commons under Creative Commons licenses, not Macaulay Library recordings, specifically to ensure that the corpus is truly open and redistributable. This is a smaller corpus than Macaulay's, but it has the property that anyone (researchers, downstream developers, AI training pipelines) can use the same audio without licensing friction. The trade-off is real: Macaulay has far more recordings, far better geographic coverage, and far better species coverage. Wikimedia Commons has the licensing freedom. For an atlas designed to be a public reference, the licensing freedom mattered more than the dataset size.
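As a rough illustration of what "only openly licensed audio" can mean in practice, the sketch below filters recording metadata against a license allowlist and builds an attribution line for each kept item. It is not CrowLingo's actual pipeline: the `Recording` fields, the `REDISTRIBUTABLE` allowlist, and the sample entries are all hypothetical, chosen only to show the idea. The allowlist deliberately leaves out non-commercial (NC) licenses, on the assumption that the corpus should stay usable by commercial downstream developers as well.

```python
from dataclasses import dataclass

# Assumed allowlist of license identifiers treated as openly redistributable.
# Hypothetical policy for illustration; a real project would define its own.
REDISTRIBUTABLE = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0"}


@dataclass(frozen=True)
class Recording:
    """Minimal metadata record; all field names are illustrative."""
    title: str
    author: str
    license_id: str


def redistributable(recordings):
    """Keep only recordings whose license is on the allowlist."""
    return [r for r in recordings if r.license_id in REDISTRIBUTABLE]


def attribution(recording):
    """Build a simple attribution line for a recording."""
    return f'"{recording.title}" by {recording.author}, {recording.license_id}'


# Made-up sample metadata, not real archive entries.
SAMPLE = [
    Recording("American crow call", "A. Example", "CC-BY-SA-4.0"),
    Recording("Carrion crow alarm", "B. Example", "CC-BY-NC-4.0"),
    Recording("Common raven croak", "C. Example", "CC0-1.0"),
]

if __name__ == "__main__":
    for rec in redistributable(SAMPLE):
        print(attribution(rec))
```

Filtering on an explicit allowlist, rather than on a blocklist, means a recording with an unknown or misspelled license string is excluded by default, which is the safer failure mode for a corpus meant to be redistributed.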
The open-data future
Several initiatives are working on truly open bioacoustic datasets at scale. iNaturalist's audio integration is growing. The Open Bird Sounds project aims for openly licensed bird audio at species coverage approaching Macaulay's. The Earth Species Project explicitly champions open-science approaches and has begun open-sourcing aspects of its work. The bioacoustics field is in the early stages of a tension between the accumulated-archive value of Macaulay-style proprietary collections and the open-redistribution value of CC-licensed alternatives. Both have roles. Both will probably grow. The relationship between them will shape what becomes possible in animal-language AI over the next decade.