PILLAR II — THE METHODS
Self-supervised audio, from scratch.
The models that make CrowLingo possible: BEATs, Perch, NatureLM-audio. They never saw a label. They learned acoustic structure from millions of unlabeled wildlife recordings, then we asked them about crows.
dimensions: 1024
UMAP: 2D
density: HDBSCAN
no labels: SSL
SUB-TOPICS
Six concepts that matter.
FOUNDATION
Self-supervised audio
How contrastive pretraining turns spectrograms into embeddings without labels.
DIMENSIONALITY
Latent space & UMAP
Projecting 1024 dimensions to two without losing neighborhood structure.
CLUSTERING
HDBSCAN
Density-based clustering that finds groups without specifying how many.
MODELS
BirdNET vs Perch vs NatureLM
Three foundation models, one crow call, very different embeddings.
PARADIGM
Traditional vs ALP
From hand labels to self-supervised embeddings: the methodology shift.
MODEL
NatureLM-audio
The first wildlife audio LLM. Zero-shot captioning of animal sounds.
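The "Self-supervised audio" card above mentions contrastive pretraining. As a rough illustration of that idea only, here is a minimal sketch of an InfoNCE-style contrastive loss in plain Python. It assumes each clip yields two augmented views that have already been embedded. The function names, the temperature value, and the toy 2-D vectors are hypothetical and chosen for illustration. This is not the training code of CrowLingo or of any model named on this page.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch (illustrative, hypothetical helper).

    anchors[i] and positives[i] are embeddings of two views of the same
    clip; every other positives[j] serves as a negative for anchors[i].
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(x - top) for x in logits))
        # Cross-entropy where the correct "class" is the matching view.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (assumed values): two clips, two views each.
views_a = [[1.0, 0.0], [0.0, 1.0]]
views_b = [[0.9, 0.1], [0.1, 0.9]]

aligned = info_nce(views_a, views_b)                   # matching views paired
shuffled = info_nce(views_a, list(reversed(views_b)))  # views mismatched
```

The loss is low when matching views sit close together and high when they are mismatched. That difference is the signal that lets an encoder learn structure without any class labels.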
See the full pipeline.
Eight stages, from field recording to AI captioning.