Methodology
This page documents the data sources, analytical procedure, classifier architecture, training set, cross-validation results, known limitations, and fair-use determination. An independent researcher should be able to read it and replicate the analysis.
This study asks whether Stephen Sondheim's compositional practice contains a measurable, distinctive vocabulary of musical techniques, and whether that vocabulary can be distinguished quantitatively from the broader Broadway tradition. We test both questions across 40 Sondheim songs and 670 comparator songs drawn from seven major composers plus ten smaller categories (17 categories in all), spanning the golden age through the contemporary era.
The analysis identifies eight candidate signatures, extracts musical features from score-level data, trains a multi-label classifier, and measures per-measure occurrence rates with statistical tests for significance. Findings are cross-validated against published musicological scholarship and Sondheim's own writings.
The core result: of eight signatures tested, only one (lyric-music misalignment) distinguishes Sondheim by rate. The remaining seven are shared with the tradition at comparable or higher rates. What is distinctive is not the vocabulary itself but the harmonic grammar in which it operates.
The corpus comprises two pools: the Sondheim catalog (40 songs for which Sondheim is the sole credited composer) and a comparator pool (670 songs across 17 categories). All scores are ingested from symbolic music formats, not audio.
| Metric | Value |
|---|---|
| Songs | 40 |
| Musical events | 50,516 |
| Measures | 6,054 |
| Shows | 13 |
| Source format: MusicXML | 23 songs |
| Source format: MSCZ (MuseScore) | 17 songs |
| Composer / Category | Songs | Notes |
|---|---|---|
| Lloyd Webber | 294 | Phantom, Cats, Evita, JCS, Joseph, Starlight, Aspects, Tell Me |
| Rodgers | 274 | Sound of Music, Oklahoma, Carousel, South Pacific, Pal Joey, King and I |
| Bernstein | 65 | West Side Story, Candide |
| Miranda | 7 | Hamilton, Moana, Encanto, In the Heights |
| Herman | 5 | Hello Dolly, Mame |
| Guettel | 5 | Light in the Piazza |
| Porter | 4 | Anything Goes, Begin the Beguine, So In Love |
| + 10 smaller categories | 16 | Six, Grease, Rent, Wicked, Chorus Line, others |
| Source | Entries | Description |
|---|---|---|
| Hooktheory | 39 | Beat-level chord and melody annotations |
| ChoCo | 795 | Chord annotations corpus |
| iRealPro | 32 + 163 | 32 parsed chord charts, 163 indexed |
| Lyrics-MIDI Dataset | 69 + 184 | 69 Broadway entries, 184 lyrics JSONs |
| Sourdough | 984 | Broadway MIDIs extracted from 5M dataset |
| Golden-age baseline | 107 | Gershwin, Porter, Kern, Berlin |
Feature extraction operates on two parallel pipelines: music21 for MusicXML sources (23 songs) and ms3 for MuseScore MSCZ sources (17 songs). Both pipelines produce the same output schema, a 72-dimensional feature vector per musical event, enabling unified downstream analysis.
Each song is decomposed into a sequence of musical events. There are 14 event types (note onset, chord change, rest, time signature change, key change, dynamic marking, articulation, phrase boundary, suspension resolution, cadence point, melodic peak, melodic trough, harmonic shift, lyric syllable).
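
The event model described above can be sketched as a small data structure. This is a minimal illustration, not the project's actual schema: the 14 event type names and the 72-value vector come from this page, while the class names, the field names (`song_id`, `measure`, `beat`), and the file-suffix mapping are assumptions introduced for the example.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Tuple

FEATURE_DIM = 72  # vector length stated in this section


class EventType(Enum):
    """The 14 event types listed in this section."""
    NOTE_ONSET = "note onset"
    CHORD_CHANGE = "chord change"
    REST = "rest"
    TIME_SIGNATURE_CHANGE = "time signature change"
    KEY_CHANGE = "key change"
    DYNAMIC_MARKING = "dynamic marking"
    ARTICULATION = "articulation"
    PHRASE_BOUNDARY = "phrase boundary"
    SUSPENSION_RESOLUTION = "suspension resolution"
    CADENCE_POINT = "cadence point"
    MELODIC_PEAK = "melodic peak"
    MELODIC_TROUGH = "melodic trough"
    HARMONIC_SHIFT = "harmonic shift"
    LYRIC_SYLLABLE = "lyric syllable"


@dataclass(frozen=True)
class MusicalEvent:
    """One event with its feature vector (field names are hypothetical)."""
    song_id: str
    measure: int
    beat: float
    event_type: EventType
    features: Tuple[float, ...]

    def __post_init__(self) -> None:
        # Both pipelines must emit the same schema, so reject other lengths.
        if len(self.features) != FEATURE_DIM:
            raise ValueError(f"expected {FEATURE_DIM} features, got {len(self.features)}")


# Assumed mapping from file suffix to the pipeline that would handle it.
PIPELINE_BY_SUFFIX = {
    ".musicxml": "music21",
    ".xml": "music21",
    ".mxl": "music21",
    ".mscz": "ms3",
}


def choose_pipeline(path: str) -> str:
    """Return the name of the extraction pipeline for a score file."""
    suffix = Path(path).suffix.lower()
    if suffix not in PIPELINE_BY_SUFFIX:
        raise ValueError(f"unsupported score format: {suffix!r}")
    return PIPELINE_BY_SUFFIX[suffix]
```

Validating the vector length at construction time is one way to enforce the shared output schema, so that downstream analysis never has to know which pipeline produced an event.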
| Extractor | Domain |
|---|---|
| dominant_pedal | Harmonic: sustained dominant bass under changing harmonies |
| lyric_music_alignment | Prosodic: stress alignment between lyric syllables and metric position |
| chromatic_ascent | Melodic: stepwise chromatic motion spanning a minor third or more |
| arrested_cadence | Harmonic: cadences that set up resolution but deflect |
| harmonic_plot_twist | Harmonic: unexpected chord substitution at structurally significant points |
| metric_displacement | Rhythmic: melodic or harmonic accents offset from the metric grid |
| melodic_refusal | Melodic: refusal to resolve stepwise to the expected target pitch |
| motivic_seed | Thematic: short intervallic cells that recur across a song or show |
| textural_reveal | Orchestral: texture changes that expose or conceal voices |
| suspension_resolver | Voice leading: suspension preparation, dissonance, and resolution |
| chord_degree_extractor | Harmonic: Roman numeral analysis relative to local key |
| transition_matrix | Harmonic: chord-to-chord transition probabilities |
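
As an illustration of the last extractor in the table, chord-to-chord transition probabilities can be estimated by counting adjacent pairs in a Roman numeral sequence. The sketch below is a plain first-order estimate under that assumption; the function name and the sample progression are hypothetical and do not come from the project's code or corpus.

```python
from collections import Counter, defaultdict


def transition_probabilities(chords):
    """Estimate first-order transition probabilities from a chord sequence."""
    counts = defaultdict(Counter)
    # Count every adjacent (current, following) chord pair.
    for current, following in zip(chords, chords[1:]):
        counts[current][following] += 1
    # Normalize each row so the outgoing probabilities sum to 1.
    return {
        current: {
            following: n / sum(followers.values())
            for following, n in followers.items()
        }
        for current, followers in counts.items()
    }


# Hypothetical progression, used only to exercise the function.
progression = ["I", "vi", "IV", "V", "I", "IV", "V", "I"]
matrix = transition_probabilities(progression)
print(matrix["I"])  # {'vi': 0.5, 'IV': 0.5}
```

Rows are keyed by the chord being left, so `matrix["V"]["I"]` reads as the probability that V moves to I in the sampled material.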
Each extractor produces typed annotations tied to beat positions within the score. The vectorizer consolidates these annotations into the 72-dimensional vector used for classifier training and statistical analysis. Dimensions encode binary presence, scalar magnitude, and categorical features.
The classifier is a multi-label feedforward network that takes the 72-dimensional feature vector and outputs independent probabilities for each of the eight signatures.
| Parameter | Value |
|---|---|
| Architecture | Feedforward, multi-label |
| Input dimension | 72 |
| Output dimension | 8 (one per signature) |
| Total parameters | 18,120 |
| Loss function | BCEWithLogitsLoss |
| Train/test split | Song-level (no event leakage across songs) |
| Training labels | Seed labels (rule-based, see Limitations) |
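
The table gives the input size, output size, and total parameter count but not the hidden layer widths. The sketch below checks one layout that is consistent with those numbers: two hidden layers of 128 and 64 units, with biases, give exactly 18,120 parameters. Those widths are an assumption, since other layouts could produce the same total. The second function writes out in plain Python the quantity that a binary cross-entropy-with-logits loss computes for independent labels, using the standard numerically stable form.

```python
import math


def dense_parameter_count(widths):
    """Weights plus biases for a fully connected network with these layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over independent labels, computed from raw logits."""
    total = 0.0
    for x, y in zip(logits, targets):
        # Stable form of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))].
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# Assumed hidden widths (128, 64); only 72, 8, and 18,120 come from the table.
HYPOTHETICAL_WIDTHS = [72, 128, 64, 8]
print(dense_parameter_count(HYPOTHETICAL_WIDTHS))  # 18120
```

Because each of the eight outputs is scored separately, an event can be labelled with several signatures at once, which is what makes the setup multi-label rather than multi-class.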
The song-level split ensures that no events from the same song appear in both training and test sets. This prevents the classifier from memorizing song-specific patterns rather than learning generalizable signature features.
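
A song-level split can be sketched by partitioning song identifiers first and assigning events second. This is a minimal illustration: the `song_id` key, the 20% test fraction, and the fixed seed are assumptions, as this page does not state the actual split ratio or procedure.

```python
import random


def song_level_split(events, test_fraction=0.2, seed=0):
    """Split events so that every song lands wholly in train or wholly in test."""
    # Partition the songs, not the events, to avoid leakage across the split.
    song_ids = sorted({event["song_id"] for event in events})
    rng = random.Random(seed)
    rng.shuffle(song_ids)
    n_test = max(1, round(len(song_ids) * test_fraction))
    test_songs = set(song_ids[:n_test])
    train = [e for e in events if e["song_id"] not in test_songs]
    test = [e for e in events if e["song_id"] in test_songs]
    return train, test


# Hypothetical data: 10 songs with 5 events each.
events = [{"song_id": f"song_{s}", "index": i} for s in range(10) for i in range(5)]
train_events, test_events = song_level_split(events)
```

Splitting at the event level instead would scatter events from one song across both sets, which is exactly the leakage the song-level design rules out.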
Training labels are generated by rule-based seed labelers, not by human annotation. This introduces circularity, because the classifier can only learn to reproduce the rules that produced its labels (see Limitations below). The classifier's primary role is to scale the analysis across the full corpus; the editorial findings rely on the rate-based statistical tests, not on individual classifier predictions.
All comparisons between Sondheim and comparator pools use per-measure normalization to mitigate format bias. Raw event counts are divided by the number of measures in each song, producing rates that are comparable across scores of different lengths and source formats.
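
The normalization itself is a single division. The sketch below applies it to the Sondheim corpus totals reported on this page (50,516 events over 6,054 measures) and to two hypothetical songs whose raw counts differ but whose per-measure rates are the same. The function name and the two-song figures are illustrative assumptions.

```python
def per_measure_rate(count, measures):
    """Divide a raw count by the number of measures it was observed in."""
    if measures <= 0:
        raise ValueError("measures must be positive")
    return count / measures


# Corpus totals from this page: roughly 8.34 events per measure.
events_per_measure = per_measure_rate(50_516, 6_054)

# Hypothetical songs: different lengths and raw counts, identical rate.
short_song = per_measure_rate(12, 48)
long_song = per_measure_rate(30, 120)
print(round(events_per_measure, 2), short_song, long_song)  # 8.34 0.25 0.25
```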
| Method | Purpose |
|---|---|
| Chi-squared test | Test whether signature occurrence rates differ significantly between Sondheim and each comparator pool |
| p-values | Significance thresholds at 0.05, 0.01, and 0.001 |
| Cohen's h | Effect size: quantifies the practical magnitude of rate differences, independent of sample size |
| 95% confidence intervals | Uncertainty bounds on per-measure occurrence rates for each signature |
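
The first and last rows of the table can be sketched in standard-library Python. The example treats each measure as a trial that either contains a signature or does not, which is an assumption about how rates are tabulated. It uses a Pearson chi-squared test on a 2x2 table without continuity correction and a Wilson score interval; this page does not say which variants the study uses, so both are stand-ins. All counts in the example are hypothetical.

```python
import math


def chi_squared_2x2(hits_a, total_a, hits_b, total_b):
    """Pearson chi-squared statistic and p-value (1 degree of freedom)."""
    observed = [[hits_a, total_a - hits_a], [hits_b, total_b - hits_b]]
    row_totals = [total_a, total_b]
    col_totals = [hits_a + hits_b, (total_a - hits_a) + (total_b - hits_b)]
    grand = total_a + total_b
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            statistic += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the chi-squared survival function is erfc(sqrt(x/2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def wilson_interval(hits, total, z=1.959963984540054):
    """Wilson score interval for a proportion (default z gives 95% coverage)."""
    p = hits / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Hypothetical counts: 30 of 100 measures versus 10 of 100 measures.
statistic, p_value = chi_squared_2x2(30, 100, 10, 100)
low, high = wilson_interval(30, 100)
```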
Per-measure normalization is the primary defense against format bias: MIDI-sourced comparators may encode musical information differently from MusicXML and MSCZ sources. Normalizing to measures ensures that a song with more encoded events (due to format verbosity) is not counted as having more signature occurrences. This mitigates but does not fully eliminate the bias (see Limitations).
Effect sizes (Cohen's h) are reported alongside p-values because large sample sizes can produce statistically significant results for trivially small differences. A finding is considered substantive only when the p-value is significant and the effect size is at least small (h ≥ 0.20).
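
Cohen's h is the difference between two arcsine-transformed proportions, so the decision rule above can be sketched directly. The 0.20 floor comes from this page; the default significance level of 0.05 is an assumption (the loosest of the three thresholds listed), and the function names and example rates are hypothetical.

```python
import math


def cohens_h(p1, p2):
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def is_substantive(p_value, h, alpha=0.05, min_h=0.20):
    """Require both statistical significance and at least a small effect."""
    return p_value < alpha and abs(h) >= min_h


# Hypothetical rates: 0.30 versus 0.10 gives h of about 0.52.
h = cohens_h(0.30, 0.10)
print(is_substantive(0.0004, h))     # True: significant and h >= 0.20
print(is_substantive(0.0004, 0.05))  # False: significant but trivially small
```

Taking the absolute value of h means the rule treats a rate that is lower than the comparator's as just as substantive as one that is higher.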
The following limitations are structural to the current version of the analysis. Each is noted where it affects specific findings in the entry.
Cross-validation compares each finding with published musicological scholarship and Sondheim's own writings. Where the model identifies a signature at a moment Sondheim discussed in his books, the entry presents both the model's finding and the composer's own description.