The minimum viable rig
A directional shotgun microphone, a portable recorder, and a phone running a timestamped behavior log. Anything beyond that is optimization. Anything less and you start losing data you can't recover.
For the microphone: Sennheiser ME66 or MKE600-class shotgun, or a Rode NTG-2. Off-axis rejection matters more than absolute sensitivity. Pair with a foam-and-fur windscreen ('dead cat').
For the recorder: Zoom H1n or H5 at 48 kHz / 24-bit, mono, lossless WAV. Phone-only recording is acceptable in a pinch but a real recorder gives you preamp headroom and an accurate clock.
For the behavior log: any phone app that timestamps events works; CSV or JSONL is the right output format.
Sample rate, bit depth, channels
48 kilohertz, 24-bit, mono. Crow energy lives between 200 hertz and 8 kilohertz; a 48 kHz sample rate represents frequencies up to 24 kHz, which covers that band with comfortable headroom for any downstream filter. 24-bit preserves quiet calls without quantization noise eating the spectral grain that arousal signatures live in. Mono is the right choice for almost every use case: stereo doubles your file size for no per-clip benefit, and most models embed mono anyway. If you have a stereo microphone, record stereo and let preprocessing collapse to mono; if you only have a mono mic, don't waste storage on synthetic stereo.
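A quick way to confirm a file matches these settings is to read its WAV header. The sketch below uses Python's standard-library wave module; the function names are illustrative, not part of any existing tool, and it assumes the recorder writes plain PCM WAV (some recorders write header variants that older Python versions refuse to open, in which case a different reader is needed). The second function works through the storage arithmetic behind the mono-versus-stereo point.

```python
import wave

TARGET_RATE_HZ = 48_000
TARGET_SAMPLE_WIDTH_BYTES = 3  # 24-bit PCM is 3 bytes per sample
TARGET_CHANNELS = 1            # mono


def check_wav_format(path):
    """Return a list of mismatches; an empty list means the file matches."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()

    problems = []
    if rate != TARGET_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {TARGET_RATE_HZ}")
    if width != TARGET_SAMPLE_WIDTH_BYTES:
        problems.append(f"bit depth is {width * 8}-bit, expected 24-bit")
    if channels != TARGET_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    return problems


def bytes_per_hour(rate_hz=TARGET_RATE_HZ,
                   sample_width_bytes=TARGET_SAMPLE_WIDTH_BYTES,
                   channels=TARGET_CHANNELS):
    """Uncompressed audio payload for one hour of recording, in bytes."""
    return rate_hz * sample_width_bytes * channels * 3600
```

At these settings one hour of mono audio is 48,000 × 3 × 3,600 = 518,400,000 bytes, roughly 518 MB; the same hour in stereo is twice that.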
Behavior log discipline
Keep it append-only and structured. CSV or JSONL. Each row is a timestamped observation: time, crow ID, behavior, partner, notes. Use stable crow IDs (banded if available, descriptor like 'left-eye-mark' otherwise). Resist the urge to interpret in the log — 'long caw, head low, tail flicked' is data; 'agitated' is interpretation that belongs in analysis, not in the log. Behavior logs are the asset that turns recordings from acoustically interesting into scientifically useful. Without one, the audio is an acoustic specimen with no behavioral provenance.
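As an illustration of the append-only, structured log described above, here is a minimal JSONL sketch. The field names follow the row layout in the text (time, crow ID, behavior, partner, notes); the function names and the choice of UTC ISO 8601 timestamps are assumptions for the example, not a required schema.

```python
import json
from datetime import datetime, timezone


def make_row(crow_id, behavior, partner=None, notes="", now=None):
    """Build one observation row; `now` can be passed in for testing."""
    when = now if now is not None else datetime.now(timezone.utc)
    return {
        "time": when.isoformat(),
        "crow_id": crow_id,    # stable ID: band code or a fixed descriptor
        "behavior": behavior,  # what was seen or heard, not an interpretation
        "partner": partner,
        "notes": notes,
    }


def append_row(log_path, row):
    """Append one row as a single JSON line."""
    # Mode "a" only ever adds to the end of the file, so earlier
    # observations are never rewritten.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(row, ensure_ascii=False) + "\n")
```

A row would then be logged with something like append_row("session.jsonl", make_row("left-eye-mark", "long caw, head low, tail flicked")).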
The clap-sync trick
At session start, clap once near the microphone with the phone visible. You'll align audio and log at analysis time by that clap: the recorder timestamps the audible transient, the log timestamps the visible clap event, and the difference is your offset. For sessions longer than twenty minutes, clap every twenty minutes — recorder clocks drift; logs drift too; multiple sync points let you correct for both. Forgetting to clap-sync is the single most common preventable mistake in citizen-science bioacoustic recording. The data isn't ruined, but the joining gets fuzzy.
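The alignment step can be sketched as follows. Each clap gives one sync point: the time the log recorded for it and the position of the transient in the recording, both in seconds. With a single clap the correction is a constant offset; with several, interpolating linearly between neighboring claps absorbs drift between the two clocks. The function name and the tuple layout are assumptions for this example, and it expects each clap to have a distinct log time.

```python
from bisect import bisect_right


def log_to_recorder_time(log_time, sync_points):
    """Map a log timestamp to a position in the recording, in seconds.

    sync_points is a list of (log_time_s, recorder_time_s) pairs, one per clap.
    """
    points = sorted(sync_points)
    if not points:
        raise ValueError("need at least one clap sync point")

    if len(points) == 1:
        # One clap: apply a constant offset, no drift correction possible.
        log0, rec0 = points[0]
        return log_time + (rec0 - log0)

    # Pick the pair of claps that brackets log_time; events before the first
    # clap or after the last one are extrapolated from the nearest pair.
    log_times = [p[0] for p in points]
    i = bisect_right(log_times, log_time)
    i = min(max(i, 1), len(points) - 1)
    (log_a, rec_a), (log_b, rec_b) = points[i - 1], points[i]

    fraction = (log_time - log_a) / (log_b - log_a)
    return rec_a + fraction * (rec_b - rec_a)
```

For example, with claps at log times 100 s and 1300 s found at 2.0 s and 1202.6 s in the recording, the clocks have drifted 0.6 s over twenty minutes, and an event logged at 700 s maps to about 602.3 s in the audio rather than the 602.0 s a single offset would give.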
Where you can point the microphone
Public spaces, your own backyard, parks where recording is allowed. Not within ten meters of an active nest at any time of year. Not during peak breeding-season hours (dawn to mid-morning in spring) in the first three weeks after a nest becomes active. Not anywhere that requires permits you don't have. The 'within ten meters of an active nest' rule is the most-violated one because backyard nests are tempting and recording near them feels benign; the literature on disturbance is clear that it is not. Move farther away and use a more directional mic.
What to do with the files
Three things, in order. First, archive locally with metadata embedded — ffmpeg metadata tags for location (city-coarsened, not GPS-precise), recordist name, license. Second, license the recordings CC-BY-SA 4.0 or compatible if you want them to be redistributable. Third, contribute to a public archive: Xeno-canto, Wikimedia Commons, or a research lab that takes contributions. CrowLingo accepts CC-BY-SA contributions for the v1 corpus via contact@kymatalabs.com — see the frontier/contribute page for the pipeline. Recordings sitting on a personal hard drive are scientifically inert. Recordings in a public CC-licensed archive are reusable indefinitely.
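For the first step, one possible way to embed the tags is to build an ffmpeg command that copies the audio untouched and adds metadata. The sketch below only constructs the argument list; the function name and the choice of the artist, copyright, and comment keys are assumptions for illustration, and which tags actually survive depends on the output container, so check the result on one file before tagging a whole archive.

```python
def build_tag_command(src, dst, recordist, city, license_name="CC BY-SA 4.0"):
    """Return an ffmpeg argument list that copies src to dst with tags added."""
    tags = {
        "artist": recordist,
        "copyright": license_name,
        "comment": f"Location: {city}",  # city-level only, never GPS coordinates
    }
    # -n refuses to overwrite an existing output file;
    # -c copy keeps the audio stream bit-for-bit instead of re-encoding it.
    command = ["ffmpeg", "-n", "-i", src, "-c", "copy"]
    for key, value in tags.items():
        command += ["-metadata", f"{key}={value}"]
    command.append(dst)
    return command
```

The resulting list can be passed to subprocess.run(command, check=True) on a machine with ffmpeg installed.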
The honest expectations
Most citizen-science crow recordings are conversational baseline — territorial caws, contact calls, the acoustic equivalent of small talk. Researchers have plenty of those. What's scarce and valuable is recordings of less-common contexts: juvenile begging at fledging, multi-individual mobbing sequences against a known predator, quiet grunts captured close enough to be audible, novel call types you haven't heard locally before. If you can capture any of those with synchronized behavioral notes, you're contributing data the field doesn't already have. The ten-thousandth standard caw recording is less valuable than the first careful juvenile-fledging dataset from your neighborhood.