What citizen-science platforms actually want

Platforms want recordings that pass several filters: a clear primary signal (the target species' vocalization is loud enough to dominate background noise), a reasonable duration (long enough to capture multiple repetitions of the vocalization, typically 10-60 seconds in most contexts), accurate location and time metadata (usually automatic on phone-based submissions but worth verifying), and a stated level of confidence in the species identification (either confirmed by the recorder or flagged as uncertain for community verification). The platforms don't need professional-grade microphones; they need recordings that clearly document what the contributor observed. Phone recordings are perfectly acceptable in most cases, and the quality bar is more about clarity of the target signal than about technical recording quality.
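The filters above can be turned into a quick self-check before uploading. The following is a minimal sketch, not any platform's actual validation logic: the `Submission` record, its field names, and the `presubmission_notes` function are all hypothetical, and the 10-60 second window is treated as an advisory range rather than a hard limit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Submission:
    """Hypothetical metadata record; not any platform's real schema."""
    duration_s: float
    latitude: Optional[float]
    longitude: Optional[float]
    recorded_at: Optional[str]   # e.g. an ISO 8601 timestamp string
    species: Optional[str]
    id_uncertain: bool = False   # True when flagged for community verification


def presubmission_notes(sub: Submission,
                        min_s: float = 10.0,
                        max_s: float = 60.0) -> List[str]:
    """Return advisory notes; an empty list means nothing was flagged."""
    notes = []
    # The 10-60 s window is a typical target from the text, not a hard rule.
    if not (min_s <= sub.duration_s <= max_s):
        notes.append(
            f"duration {sub.duration_s:.0f}s is outside the typical "
            f"{min_s:.0f}-{max_s:.0f}s range"
        )
    if sub.latitude is None or sub.longitude is None:
        notes.append("location is missing")
    if not sub.recorded_at:
        notes.append("recording time is missing")
    # Either name the species or explicitly flag the ID as uncertain.
    if not sub.species and not sub.id_uncertain:
        notes.append("no species given and not flagged as uncertain")
    return notes
```

A recording with complete metadata and a duration inside the window yields an empty list; anything else yields one short note per issue, which the contributor can fix or consciously ignore.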

What helps the AI models

Several practical guidelines improve the value of citizen-science recordings as training data for bioacoustic AI models. Record continuously for the duration of the encounter rather than capturing only snippets of interesting moments: context audio (silence before and after vocalizations, ambient sound, other species in the background) is part of what the models learn from. Note the behavioral context if it is visible (the bird was foraging, alarming at a Cooper's hawk, calling to a nearby conspecific, etc.), because behavioral context labels make the recording substantially more valuable for AI training. Contribute multiple recordings of the same species across different conditions, locations, and behavioral contexts rather than treating recording as a one-and-done documentation exercise. Record at locations and times that the existing archive under-represents (less-recorded regions, seasons, and times of day); your contributions are most valuable where the archive is thinnest.

What to avoid

Several common mistakes reduce the value of citizen-science recordings. Don't process the audio before submission (no noise reduction, and no level normalization, which the platform's processing pipeline can do better); submit the raw recording. Don't record from so close that the microphone is overdriven and the signal clips (close enough to hear clearly is enough). Don't guess at a species identification when you're not sure; flag it as uncertain for community verification instead. Don't use audio playback to attract birds for recording (this is bad citizen-science practice and is increasingly considered ethically inappropriate; see 'No playback for wild crows' for the parallel argument).
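Clipping is one of the few mistakes above that can be checked mechanically without altering the file. The sketch below assumes the recording is available as 16-bit PCM WAV (many phones save other formats, which this does not handle) and uses only Python's standard library. The function names and the idea of reporting a "fraction of samples at full scale" are illustrative choices, not a platform requirement, and what fraction counts as too much is left to the reader's judgment.

```python
import array
import sys
import wave


def read_pcm16(path):
    """Read all samples from a 16-bit PCM WAV file (read-only)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV")
        frames = wf.readframes(wf.getnframes())
    samples = array.array("h")      # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":      # WAV data is little-endian
        samples.byteswap()
    return samples


def clipped_fraction(samples, full_scale=32767):
    """Fraction of samples sitting at (or beyond) the full-scale limit."""
    if len(samples) == 0:
        return 0.0
    clipped = sum(1 for s in samples if abs(s) >= full_scale)
    return clipped / len(samples)
```

A result well above zero suggests the microphone was overdriven; the remedy is to re-record from farther away, not to edit the file.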

Where the contribution biases are

Citizen-science contributions reflect where citizen scientists live, travel, and look. The Macaulay archive is geographically biased toward English-speaking countries (particularly the United States, the United Kingdom, and Australia), temporally biased toward weekends and good-weather seasons when people are outdoors, and species-biased toward easily detected species at well-known birding locations. Recordings that fall outside these biases (from underrepresented countries, less-popular birding seasons, or less-popular locations) are systematically more valuable for filling gaps in the archive. The archive is best served by contributors who target those gaps rather than reinforcing the biases, though doing so requires recognizing where the biases are.
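One way to recognize where the gaps are is to count existing recordings per region and month and look at the emptiest cells. This is a minimal sketch under stated assumptions: it presumes you have already obtained a list of (region, month) pairs for existing recordings from some metadata export, and the function name `thinnest_cells` and its parameters are hypothetical rather than part of any archive's tooling.

```python
from collections import Counter
from itertools import product


def thinnest_cells(records, regions, months, k=3):
    """Return the k (region, month) cells with the fewest recordings.

    records: iterable of (region, month) pairs, one per existing recording.
    regions, months: the full grid to consider, so that cells with zero
    recordings are reported instead of silently skipped.
    """
    counts = Counter(records)
    cells = [((r, m), counts.get((r, m), 0)) for r, m in product(regions, months)]
    # Sort by count first, then by cell label so ties come out in a stable order.
    cells.sort(key=lambda item: (item[1], item[0]))
    return cells[:k]
```

Passing the full grid of regions and months explicitly matters: a cell with no recordings never appears in the metadata, so counting the records alone would hide exactly the gaps worth targeting.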

How CrowLingo's corpus relates

CrowLingo's atlas uses Wikimedia Commons recordings rather than Macaulay-archived recordings, specifically because of the licensing differences. The Wikimedia Commons corpus is smaller than Macaulay's and has its own contribution biases (toward European recordings of common species, with American crow coverage relatively thin compared to the species' actual abundance). If you want to contribute to truly open bioacoustic data infrastructure, Wikimedia Commons accepts wildlife audio submissions under Creative Commons licensing. Adding good-quality American crow recordings from your local population to Wikimedia Commons directly contributes to the open-data ecosystem that projects like CrowLingo depend on. Both the atlas and the wider open-bioacoustic ecosystem would benefit from broader contribution, and contributing to open archives is one of the practical ways to support the field.