Stowell 2022: the deep-learning bioacoustics review that grounds the field

What Stowell brought to the field

Dan Stowell^[1] at Queen Mary University of London (later Tilburg University) is one of the longest-active researchers at the intersection of audio analysis and bird vocalization research. His career spans roots in music information retrieval, computational bioacoustics, and deep-learning audio applications. By 2022 he had been working at this intersection for over a decade and had built up the synthesizing perspective that made the review paper possible. The paper is a senior researcher's review, written by someone who has watched the field develop, knows the literature deeply, and has the credibility to be both critical and constructive about methodological choices.

The core argument

The review argues that deep learning has transformed computational bioacoustics over the previous five years (2017-2022) and that the field is at a methodological inflection point. The argument has three components. First, classical signal-processing approaches (handcrafted features, fixed-rule classifiers) are being decisively outperformed by deep-learning approaches across most bioacoustic tasks. Second, the deep-learning architectures that work best have converged on a small set of patterns: CNN backbones for spectrogram inputs, transformer-based architectures for more complex tasks, and attention mechanisms for selective input processing. Third, methodological challenges remain substantial, particularly around training-data biases, evaluation protocols, and the gap between benchmark performance and field-deployment performance.

What the paper covers

The paper covers several major topic areas: spectrograms as input representations, along with their variants and alternatives (mel-spectrograms, MFCC features, raw waveforms); architectural choices for different bioacoustic tasks (species classification, individual identification, behavioral context, multi-species detection); training-data considerations, including data augmentation, transfer learning from related domains, and self-supervised pretraining; evaluation metrics and the gap between in-distribution and out-of-distribution performance; and the dataset ecosystem (Macaulay Library, BirdCLEF challenges, AudioSet, smaller specialty corpora). The paper is encyclopedic without being shallow: it provides genuine synthesis rather than mere enumeration.
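
As a rough illustration of the input representations listed above, the following Python sketch turns a raw waveform into a log-mel spectrogram using only the standard library. It is not taken from the review. The function names, the HTK-style mel formula (one common convention among several), and the parameter values (`n_fft`, `hop`, `n_mels`) are illustrative assumptions, and a real pipeline would normally use an optimized audio library instead of a naive DFT.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel (HTK-style formula, assumed here for illustration)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hann-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    max_mel = hz_to_mel(sample_rate / 2)
    mel_points = [max_mel * i / (n_mels + 1) for i in range(n_mels + 2)]
    # Filter edges expressed as (fractional) DFT bin positions.
    bin_points = [mel_to_hz(m) * n_fft / sample_rate for m in mel_points]
    filters = []
    for m in range(1, n_mels + 1):
        lo, centre, hi = bin_points[m - 1], bin_points[m], bin_points[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= centre:
                row.append((k - lo) / (centre - lo))   # rising edge
            elif centre < k < hi:
                row.append((hi - k) / (hi - centre))   # falling edge
            else:
                row.append(0.0)
        filters.append(row)
    return filters


def log_mel_spectrogram(signal, sample_rate, n_fft=256, hop=128, n_mels=16):
    """Return a list of frames, each a list of n_mels log band energies."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        spec = power_spectrum(signal[start:start + n_fft])
        frames.append([math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                       for row in bank])
    return frames


# Synthetic test tone (not real bird audio): 3 kHz sine sampled at 16 kHz.
SAMPLE_RATE = 16000
tone = [math.sin(2 * math.pi * 3000 * t / SAMPLE_RATE) for t in range(1024)]
frames = log_mel_spectrogram(tone, SAMPLE_RATE)
peak_band = max(range(len(frames[0])), key=lambda m: frames[0][m])
print(len(frames), len(frames[0]), peak_band)
```

The resulting frames-by-bands grid is the kind of image-like array that a CNN backbone would take as input. MFCC features would add a discrete cosine transform over each frame, and a raw-waveform model would skip this step entirely.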

What the paper argued was needed

The review identified several methodological agenda items as priorities for the field. Better evaluation protocols should test real-deployment performance, not just held-out test sets drawn from the training distribution. Cross-domain generalization needs more attention: a model trained on Macaulay recordings may underperform on field PAM (passive acoustic monitoring) deployments because the acoustic environments differ. Multi-species and multi-individual scenarios, which current models often struggle with, need more work. Bioacoustic AI should be integrated more closely with downstream conservation and ecology applications and treated as a means to ecological understanding rather than as a performance benchmark in its own right. Imbalanced datasets, in which rare species are systematically under-represented in the training data, need better treatment. The review was constructive about where work needed to happen, not just descriptive of what had been done.
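
Two of these priorities, imbalanced classes and the gap between archive and field performance, can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not a protocol from the review: the labels, the domain names "archive" and "field", and all the counts are invented toy data, and the helper names are hypothetical.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of clips labelled correctly; dominated by common classes."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    """Mean of per-class recall, so a rare class counts as much as a common one."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def score_by_domain(records):
    """Score each recording domain separately.

    records: iterable of (domain, true_label, predicted_label) tuples.
    """
    grouped = defaultdict(lambda: ([], []))
    for domain, t, p in records:
        grouped[domain][0].append(t)
        grouped[domain][1].append(p)
    return {
        domain: {"accuracy": accuracy(ts, ps), "macro_recall": macro_recall(ts, ps)}
        for domain, (ts, ps) in grouped.items()
    }


# Toy imbalance: 95 clips of a common species, 5 of a rare one,
# and a degenerate model that always predicts the common species.
y_true = ["common"] * 95 + ["rare"] * 5
y_pred = ["common"] * 100
print(accuracy(y_true, y_pred))      # 0.95
print(macro_recall(y_true, y_pred))  # 0.5

# Toy domain shift: perfect on archive-style clips, weaker on field clips.
records = (
    [("archive", "common", "common")] * 6
    + [("archive", "rare", "rare")] * 2
    + [("field", "common", "common")] * 4
    + [("field", "common", "rare")] * 2
    + [("field", "rare", "common")] * 2
)
scores = score_by_domain(records)
print(scores["archive"]["accuracy"], scores["field"]["accuracy"])  # 1.0 0.5
```

Plain accuracy hides the fact that the always-common model never finds the rare species, while macro-averaged recall exposes it. Reporting scores per domain, rather than pooled, is one simple way to surface a deployment gap.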

What has happened since 2022

Several developments since the review have addressed some of its priorities. Perch 2.0 (2025) explicitly worked on cross-domain generalization and produced improved performance on PAM-deployment scenarios. NatureLM-audio^[3] (2025) applied self-supervised pretraining at large scale. Self-supervised approaches in general have moved from research frontier to practical default in just a few years. The BirdCLEF competitions have continued to drive evaluation standards forward, and multi-species detection has improved substantially. Some of the gaps the review identified (training-data licensing constraints, deployment-performance gaps) remain substantial; others (architecture choices, basic methodology) have stabilized.

Why this matters for CrowLingo

The atlas's framing of bioacoustic AI methodology (what it can do, where the limits are, what current best practice is) derives partly from the synthesis that Stowell's^[1] review provided. The references to BirdNET^[2], Perch 2.0, NatureLM-audio^[3], and the broader landscape of bioacoustic foundation models are made comprehensible by the methodological context the review established. For readers who want to go deeper into how the AI actually works, Stowell 2022 is the entry-point reference: it is accessible to readers with some technical background and is freely available as an open-access paper in PeerJ Computer Science. The atlas points there as the recommended next step for technically oriented readers.

Quick answers from this piece.

What is Stowell 2022?

Dan Stowell's review paper 'Computational bioacoustics with deep learning: a review and roadmap,' published in PeerJ Computer Science in 2022. It is among the most-cited methodological references in modern bioacoustic AI work. It synthesized the state of the field, identified methodological priorities, and helped set the agenda for subsequent work. It is open-access and freely available.

What did the review argue?

The review made three main claims. First, deep learning had transformed computational bioacoustics over the prior five years and was decisively outperforming classical signal-processing approaches. Second, the deep-learning architectures that worked best had converged on a small set of patterns (CNN backbones, transformers, attention mechanisms). Third, methodological challenges remained, particularly around training-data biases, evaluation protocols, and the gap between benchmark and field-deployment performance.

Is this paper still relevant in 2026?

Yes, as the foundational synthesis reference. Some specific methodology has advanced (Perch 2.0 cross-domain work, NatureLM-audio self-supervised pretraining), but the basic framework Stowell laid out remains current. For technically-oriented readers wanting to understand how bioacoustic AI works, the review is the recommended entry point. Most subsequent papers in the field cite it as their methodological grounding.
