FIG 2.1 — The Crow · Repertoire Atlas

The vocal map.

Nine emergent clusters of crow vocalization, projected from a 1,024-dim NatureLM-audio embedding with UMAP. Bright dots are real CC-licensed recordings — click one to play the audio with its spectrogram and behavioral context. Softer points illustrate cluster geometry. Hover the legend to isolate a cluster.

Inline glossary

What you're looking at, in a sentence or two apiece.

Embedding: A learned vector representation of a clip. Here, 1,024 numbers per call.
Latent space: The high-dimensional space in which embeddings live; clips that sit close together are acoustically similar.
UMAP: A non-linear dimensionality reducer that flattens 1,024 dims to 2 for inspection.
Cluster: A dense region in the embedding space. Here, found by HDBSCAN on the full 1,024-dim vectors.
Bridge point: A point lying between two clusters. It indicates graded acoustic variation rather than noise.
Context: The behavior co-occurring with the call, joined from synchronized observation logs.
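
The Context entry describes a join between recordings and observation logs. A minimal sketch of how such a join could work, assuming each log entry is a (start, end, behavior) interval on the same clock as the audio recorder. The helper context_for, the field layout, the clip names, and the behavior labels are all hypothetical and are not taken from the atlas pipeline.

```python
from bisect import bisect_right

# Hypothetical observation log: (start_s, end_s, behavior), sorted by start
# time, non-overlapping, on the same clock as the audio recorder.
OBSERVATIONS = [
    (0.0, 30.0, "foraging"),
    (30.0, 45.0, "mobbing"),
    (60.0, 90.0, "roosting"),
]
STARTS = [start for start, _, _ in OBSERVATIONS]


def context_for(call_time_s):
    """Return the behavior logged at call_time_s, or None if nothing was logged."""
    # Index of the last interval whose start is <= call_time_s.
    i = bisect_right(STARTS, call_time_s) - 1
    if i < 0:
        return None
    start, end, behavior = OBSERVATIONS[i]
    # Intervals are treated as half-open: [start, end).
    return behavior if call_time_s < end else None


# Hypothetical call onsets in seconds, keyed by clip name.
calls = {"clip_a": 12.5, "clip_b": 31.0, "clip_c": 50.0}
contexts = {clip: context_for(t) for clip, t in calls.items()}
```

A call that falls in a gap between logged intervals (clip_c here) gets no context rather than inheriting the nearest one, which keeps the join from inventing behavior that nobody observed.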

The deep methodology lives at Latent Space 101 and NatureLM-audio.

What people ask about this.

What is the vocal map?

The vocal map is a two-dimensional projection of a 1,024-dim NatureLM-audio embedding space. Each bright dot is a real CC-licensed crow recording; soft background points illustrate cluster geometry. Click any bright dot to play the audio, see its real spectrogram, and read the AI cluster narrative.

What is UMAP?

UMAP stands for Uniform Manifold Approximation and Projection, a non-linear dimensionality reducer that flattens high-dimensional embeddings to two dimensions while preserving local neighborhood structure. It is used here to make a 1,024-dim audio embedding inspectable in a scatter plot.
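
One way to make "preserving local neighborhood structure" concrete is to count how many of each point's nearest neighbors in the original space are still its nearest neighbors after projection. The sketch below is a plain-Python illustration of that check, not the atlas's own evaluation and not UMAP itself. The function knn_overlap, the toy 3-D points, and the hand-written 2-D layouts are assumptions standing in for the 1,024-dim embeddings and their projected coordinates.

```python
import math


def nearest(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return set(others[:k])


def knn_overlap(high, low, k):
    """Mean fraction of each point's k nearest neighbors shared by both spaces."""
    shared = [len(nearest(high, i, k) & nearest(low, i, k)) / k
              for i in range(len(high))]
    return sum(shared) / len(shared)


# Toy stand-ins: two tight groups in 3-D.
high = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (10, 10, 10), (10, 11, 10), (11, 10, 10)]
# A 2-D layout that keeps the two groups apart.
good = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
# A 2-D layout that swaps members between the groups, breaking local structure.
bad = [(0, 0), (5, 5), (1, 0), (0, 1), (5, 6), (6, 5)]

print(knn_overlap(high, good, k=2))
print(knn_overlap(high, bad, k=2))
```

On this toy data the first layout scores 1.0 and the second about 0.33. A score near 1.0 means the 2-D picture can be trusted for "which clips are neighbors" questions, although it says nothing about whether distances between far-apart clusters are meaningful.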

Are the cluster boundaries discovered or assigned?

Both. HDBSCAN discovers dense regions in the full embedding; the names attached to those regions come from human biologists matching exemplars against the prior descriptive vocabulary in Marzluff & Angell, Mates et al., and Verbeek et al.
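
The split between discovery and naming can be sketched as two separate data structures: integer labels produced by the clusterer, and a hand-curated table that attaches names to some of those labels. This is an illustrative assumption about how the two steps might be kept apart, not the atlas's code. The label list, the placeholder names, and the helper describe are hypothetical. The only borrowed convention is that HDBSCAN marks unclustered points with -1.

```python
# Hypothetical clusterer output: one integer label per clip; -1 marks noise,
# following the HDBSCAN convention.
discovered = [0, 0, 1, -1, 2, 1]

# Hypothetical hand-curated table: a name is attached to a cluster ID only
# after biologists have matched exemplars to the descriptive literature.
# Cluster 2 has not been named yet.
assigned = {0: "call type A", 1: "call type B"}


def describe(label):
    """Combine a discovered label with its assigned name, if any."""
    if label == -1:
        return "unclustered"
    return assigned.get(label, f"unnamed cluster {label}")


named = [describe(label) for label in discovered]
```

Keeping the name table separate means the clustering can be rerun without touching the names, and a cluster can exist in the data before anyone has agreed on what to call it.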