Methodology

Data sources, analytical procedure, classifier architecture, training set documentation, cross-validation results, known limitations, and fair-use determination. An independent researcher should be able to read this page and replicate the analysis.


§ 01 Overview

This study asks whether Stephen Sondheim's compositional practice contains a measurable, distinctive vocabulary of musical techniques, and whether that vocabulary can be distinguished quantitatively from the broader Broadway tradition. We test this across 40 Sondheim songs and 670 comparator songs drawn from seven named composers plus ten smaller categories, spanning the golden age through the contemporary era.

The analysis identifies eight candidate signatures, extracts musical features from score-level data, trains a multi-label classifier, and computes per-measure occurrence rates, which are then tested for statistical significance. Findings are cross-validated against published musicological scholarship and Sondheim's own writings.

The core result: of eight signatures tested, only one (lyric-music misalignment) distinguishes Sondheim by rate. The remaining seven are shared with the tradition at comparable or higher rates. What is distinctive is not the vocabulary itself but the harmonic grammar in which it operates.


§ 02 The Corpus
710 songs total

The corpus comprises two pools: the Sondheim catalog (40 songs for which Sondheim is the sole credited composer) and a comparator pool (670 songs across 17 categories). All scores are ingested from symbolic music formats, not audio.

Sondheim catalog

| Metric | Value |
| --- | --- |
| Songs | 40 |
| Musical events | 50,516 |
| Measures | 6,054 |
| Shows | 13 |
| Source format: MusicXML | 23 songs |
| Source format: MSCZ (MuseScore) | 17 songs |

Shows represented: A Funny Thing Happened on the Way to the Forum, Company, Follies, A Little Night Music, Pacific Overtures, Sweeney Todd, Merrily We Roll Along, Sunday in the Park with George, Into the Woods, Assassins, Passion, Anyone Can Whistle, Road Show.
Exclusions: West Side Story (composed by Bernstein, lyrics by Sondheim), Gypsy (composed by Styne), Do I Hear a Waltz? (composed by Rodgers). These shows are excluded because Sondheim served as lyricist only; the compositional vocabulary belongs to the credited composer.

Comparator pool

| Composer / Category | Songs | Notes |
| --- | --- | --- |
| Lloyd Webber | 294 | Phantom, Cats, Evita, JCS, Joseph, Starlight, Aspects, Tell Me |
| Rodgers | 274 | Sound of Music, Oklahoma, Carousel, South Pacific, Pal Joey, King and I |
| Bernstein | 65 | West Side Story, Candide |
| Miranda | 7 | Hamilton, Moana, Encanto, In the Heights |
| Herman | 5 | Hello Dolly, Mame |
| Guettel | 5 | Light in the Piazza |
| Porter | 4 | Anything Goes, Begin the Beguine, So In Love |
| + 10 smaller categories | 16 | Six, Grease, Rent, Wicked, Chorus Line, others |

Additional data sources

| Source | Entries | Description |
| --- | --- | --- |
| Hooktheory | 39 | Beat-level chord and melody annotations |
| ChoCo | 795 | Chord annotations corpus |
| iRealPro | 32 + 163 | 32 parsed chord charts, 163 indexed |
| Lyrics-MIDI Dataset | 69 + 184 | 69 Broadway entries, 184 lyrics JSONs |
| Sourdough | 984 | Broadway MIDIs extracted from 5M dataset |
| Golden-age baseline | 107 | Gershwin, Porter, Kern, Berlin |

§ 03 Feature Extraction
72-dimensional vector

Feature extraction operates on two parallel pipelines: music21 for MusicXML sources (23 songs) and ms3 for MuseScore MSCZ sources (17 songs). Both pipelines produce the same output schema, a 72-dimensional feature vector per musical event, enabling unified downstream analysis.

Each song is decomposed into a sequence of musical events. There are 14 event types (note onset, chord change, rest, time signature change, key change, dynamic marking, articulation, phrase boundary, suspension resolution, cadence point, melodic peak, melodic trough, harmonic shift, lyric syllable).

Extractors (12 modules)

| Extractor | Domain |
| --- | --- |
| dominant_pedal | Harmonic: sustained dominant bass under changing harmonies |
| lyric_music_alignment | Prosodic: stress alignment between lyric syllables and metric position |
| chromatic_ascent | Melodic: stepwise chromatic motion spanning a minor third or more |
| arrested_cadence | Harmonic: cadences that set up resolution but deflect |
| harmonic_plot_twist | Harmonic: unexpected chord substitution at structurally significant points |
| metric_displacement | Rhythmic: melodic or harmonic accents offset from the metric grid |
| melodic_refusal | Melodic: refusal to resolve stepwise to the expected target pitch |
| motivic_seed | Thematic: short intervallic cells that recur across a song or show |
| textural_reveal | Orchestral: texture changes that expose or conceal voices |
| suspension_resolver | Voice leading: suspension preparation, dissonance, and resolution |
| chord_degree_extractor | Harmonic: Roman numeral analysis relative to local key |
| transition_matrix | Harmonic: chord-to-chord transition probabilities |
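To make one of the extractor definitions concrete, the sketch below illustrates how a rule like chromatic_ascent ("stepwise chromatic motion spanning a minor third or more") could be expressed over a melody given as MIDI pitch numbers. It is an illustration under stated assumptions, not the study's extractor: the function name, the MIDI-pitch input, and the restriction to strictly ascending semitone steps are all assumptions made here.

```python
from typing import List, Tuple

MINOR_THIRD = 3  # semitones


def find_chromatic_ascents(
    pitches: List[int], min_span: int = MINOR_THIRD
) -> List[Tuple[int, int]]:
    """Return (start_index, end_index) pairs for maximal runs of ascending
    semitone steps whose total span is at least `min_span` semitones.

    Hypothetical helper: `pitches` is a melody as MIDI note numbers.
    """
    ascents = []
    start = 0
    for i in range(1, len(pitches) + 1):
        # A run ends at the end of the melody or when a step is not +1 semitone.
        if i == len(pitches) or pitches[i] - pitches[i - 1] != 1:
            if pitches[i - 1] - pitches[start] >= min_span:
                ascents.append((start, i - 1))
            start = i
    return ascents


# Toy melody: C-C#-D-D# (a chromatic minor third), then unrelated motion.
melody = [60, 61, 62, 63, 67, 66, 67, 68, 72]
ascents = find_chromatic_ascents(melody)
```

In this toy melody only the opening four notes qualify; the later 66-67-68 run spans a major second and is ignored.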

Each extractor produces typed annotations tied to beat positions within the score. The vectorizer consolidates these annotations into the 72-dimensional vector used for classifier training and statistical analysis. Dimensions encode binary presence, scalar magnitude, and categorical features.
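The consolidation step can be pictured with a small sketch. The actual layout of the 72 dimensions is not documented on this page, so the schema below is a hypothetical toy with three slots (one binary, one scalar, one categorical); only the list of 14 event types is taken from the text. The slot names and the one-hot encoding of categorical values are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# The 14 event types listed in the text.
EVENT_TYPES = (
    "note_onset", "chord_change", "rest", "time_signature_change",
    "key_change", "dynamic_marking", "articulation", "phrase_boundary",
    "suspension_resolution", "cadence_point", "melodic_peak",
    "melodic_trough", "harmonic_shift", "lyric_syllable",
)

# Hypothetical toy schema: (slot name, kind, categories). Not the real layout.
TOY_SCHEMA: Sequence[Tuple[str, str, Tuple[str, ...]]] = (
    ("dominant_pedal_present", "binary", ()),
    ("metric_displacement_beats", "scalar", ()),
    ("event_type", "categorical", EVENT_TYPES),
)


def vectorize(annotations: Dict[str, object], schema=TOY_SCHEMA) -> List[float]:
    """Flatten one event's annotations into a fixed-length numeric vector."""
    vector: List[float] = []
    for name, kind, categories in schema:
        value = annotations.get(name)
        if kind == "binary":
            vector.append(1.0 if value else 0.0)
        elif kind == "scalar":
            vector.append(float(value or 0.0))
        elif kind == "categorical":
            # One-hot encode: exactly one position is 1.0 for a known category.
            vector.extend(1.0 if value == c else 0.0 for c in categories)
        else:
            raise ValueError(f"unknown slot kind: {kind}")
    return vector


example = vectorize({
    "dominant_pedal_present": True,
    "metric_displacement_beats": 0.5,
    "event_type": "cadence_point",
})
```

The toy schema yields 16 dimensions (1 + 1 + 14); a fuller schema built the same way would reach a fixed width such as 72 regardless of which annotations a given event carries.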


§ 04 The Classifier
signature_classifier_v2.pt

The classifier is a multi-label feedforward network that takes the 72-dimensional feature vector and outputs independent probabilities for each of the eight signatures.

| Parameter | Value |
| --- | --- |
| Architecture | Feedforward, multi-label |
| Input dimension | 72 |
| Output dimension | 8 (one per signature) |
| Total parameters | 18,120 |
| Loss function | BCEWithLogitsLoss |
| Train/test split | Song-level (no event leakage across songs) |
| Training labels | Seed labels (rule-based, see Limitations) |
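The hidden-layer sizes are not stated on this page. As a sanity check on the reported parameter count, the sketch below counts weights and biases for a stack of fully connected layers. The 72-128-64-8 configuration is an assumption: it is one layout that happens to total 18,120, provided every layer has a bias and there are no other parameterized layers (for example, batch normalization). Other layouts may match as well.

```python
from typing import Sequence


def count_dense_parameters(layer_sizes: Sequence[int]) -> int:
    """Count weights plus biases for consecutive fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total


# Assumed layout: input 72, two hidden layers (hypothetical), output 8.
ASSUMED_LAYER_SIZES = (72, 128, 64, 8)
assumed_total = count_dense_parameters(ASSUMED_LAYER_SIZES)
print(assumed_total)  # 18120
```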

The song-level split ensures that no events from the same song appear in both training and test sets. This prevents the classifier from memorizing song-specific patterns rather than learning generalizable signature features.
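A song-level split can be sketched as follows: song identifiers, not individual events, are shuffled and partitioned, and each event follows its song. The `song_id` field name, the 20% test fraction, and the seed are assumptions for illustration; the page does not state the actual split ratio.

```python
import random
from typing import Dict, List, Sequence, Tuple


def song_level_split(
    events: Sequence[Dict], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Dict], List[Dict]]:
    """Split events so that all events of a song land on the same side."""
    song_ids = sorted({event["song_id"] for event in events})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(song_ids)
    n_test = max(1, round(len(song_ids) * test_fraction))
    test_songs = set(song_ids[:n_test])
    train = [e for e in events if e["song_id"] not in test_songs]
    test = [e for e in events if e["song_id"] in test_songs]
    return train, test


# Toy data: 10 hypothetical songs with 5 events each.
events = [
    {"song_id": f"song_{s}", "event_index": i}
    for s in range(10)
    for i in range(5)
]
train, test = song_level_split(events)
```

Splitting at the event level instead would let neighbouring events from one song fall on both sides, which is the leakage the song-level split is meant to avoid.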

Training labels are generated by rule-based seed labelers, not by human annotation. This introduces circularity (see Limitations below). The classifier's primary role is to scale the analysis across the full corpus; the editorial findings rely on the rate-based statistical tests, not on individual classifier predictions.


§ 05 Statistical Methods
significance testing

All comparisons between Sondheim and comparator pools use per-measure normalization to mitigate format bias. Raw event counts are divided by the number of measures in each song, producing rates that are comparable across scores of different lengths and source formats.
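As a minimal illustration of the normalization, the sketch below divides a song's signature count by its measure count. The song records and their numbers are hypothetical and are not drawn from the corpus.

```python
def per_measure_rate(occurrences: int, measures: int) -> float:
    """Signature occurrences per measure for one song."""
    if measures <= 0:
        raise ValueError("a song must have at least one measure")
    return occurrences / measures


# Hypothetical songs: same signature density, very different lengths.
songs = [
    {"title": "short_song", "occurrences": 4, "measures": 32},
    {"title": "long_song", "occurrences": 16, "measures": 128},
]
rates = {
    song["title"]: per_measure_rate(song["occurrences"], song["measures"])
    for song in songs
}
```

Both toy songs come out at the same rate even though the longer one has four times as many raw occurrences.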

Tests applied

| Method | Purpose |
| --- | --- |
| Chi-squared test | Test whether signature occurrence rates differ significantly between Sondheim and each comparator pool |
| p-values | Significance thresholds at 0.05, 0.01, and 0.001 |
| Cohen's h | Effect size: quantifies the practical magnitude of rate differences, independent of sample size |
| 95% confidence intervals | Uncertainty bounds on per-measure occurrence rates for each signature |
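The sketch below shows one way the chi-squared test and the 95% interval could be computed with the standard library alone. Several assumptions are made here that the page does not confirm: each measure is treated as a trial that either contains the signature or does not, the test is a 2x2 Pearson chi-squared without continuity correction, and the interval is a Wilson score interval. All counts in the example are hypothetical.

```python
import math
from typing import Tuple


def chi_squared_2x2(
    hits_a: int, n_a: int, hits_b: int, n_b: int
) -> Tuple[float, float]:
    """Pearson chi-squared for two proportions; returns (statistic, p_value).

    Assumes both outcome columns have nonzero totals.
    """
    observed = [[hits_a, n_a - hits_a], [hits_b, n_b - hits_b]]
    row_totals = [n_a, n_b]
    col_totals = [hits_a + hits_b, (n_a - hits_a) + (n_b - hits_b)]
    total = n_a + n_b
    statistic = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / total
            statistic += (observed[r][c] - expected) ** 2 / expected
    # With 1 degree of freedom the survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def wilson_interval(hits: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 for roughly 95%)."""
    p = hits / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Hypothetical counts: 30 of 100 measures versus 10 of 100 measures.
statistic, p_value = chi_squared_2x2(30, 100, 10, 100)
low, high = wilson_interval(30, 100)
```

If a signature can occur more than once per measure, the per-measure rate can exceed 1 and this binary framing no longer applies directly; a count-based model would be needed instead.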

Per-measure normalization is the primary defense against format bias: MIDI-sourced comparators may encode musical information differently from MusicXML and MSCZ sources. Normalizing to measures ensures that a song with more encoded events (due to format verbosity) is not counted as having more signature occurrences. This mitigates but does not fully eliminate the bias (see Limitations).

Effect sizes (Cohen's h) are reported alongside p-values because large sample sizes can produce statistically significant results for trivially small differences. A finding is considered substantive only when both the p-value is significant and the effect size is at least small (h ≥ 0.20).
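The two-part criterion can be written out as a short sketch. Cohen's h is the difference between arcsine-transformed proportions; the function names and the use of the absolute value (so that a difference in either direction counts) are assumptions made here, as is the default significance level of 0.05, the loosest of the three thresholds listed.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def is_substantive(
    p_value: float, h: float, alpha: float = 0.05, min_h: float = 0.20
) -> bool:
    """True only when the result is significant AND the effect is at least small."""
    return p_value < alpha and abs(h) >= min_h


# Hypothetical rates: 0.30 versus 0.10 occurrences per measure.
h = cohens_h(0.30, 0.10)
```

With these toy rates h is about 0.52, a medium effect by Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large).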


§ 06 Known Limitations

The following limitations are structural to the current version of the analysis. Each is noted where it affects specific findings in the entry.

1. Format bias
The Sondheim catalog is sourced from MusicXML and MSCZ files. The comparator pool is predominantly MIDI. These formats encode musical information at different levels of fidelity: MusicXML preserves articulations, dynamics, and voice leading that MIDI discards. Per-measure normalization mitigates this asymmetry but does not eliminate it. All non-lyric signature ratios carry this asterisk.
2. Classifier circularity
The seed labels that train the classifier are generated by rule-based heuristics, not by human annotation. The classifier therefore learns to replicate the rules, not to discover independent patterns. Pillar 1 (lyric-music misalignment) is computed independently of the classifier and is not affected. Non-distinctive findings are provisional until human-reviewed labels (200 to 300 events) break the circularity.
3. Guettel sample size
The Guettel comparator pool contains five songs from a single show (The Light in the Piazza). Any observations about Guettel's proximity to Sondheim's vocabulary are framed as hypotheses consistent with acknowledged mentorship, not as conclusions. Additional Guettel data (Floyd Collins, Myths and Hymns) would strengthen or weaken these observations.
4. Reharmonized return
The analysis of reharmonized returns (772 instances across 10 Sondheim shows) has no comparator baseline. We cannot determine whether Sondheim reharmonizes returns more frequently than other composers until the same analysis is run on the comparator pool. This finding is reported as a catalog observation, not as a distinctive feature.

§ 07 Cross-Validation Sources
published scholarship

Findings are cross-validated against published musicological scholarship and Sondheim's own writings. Where the model identifies a signature occurring at a moment Sondheim discussed in his books, the entry presents both the model's finding and the composer's own description.

Stephen Sondheim
Finishing the Hat (2010)
Music straightjackets a poem, whereas it liberates a lyric.

Stephen Sondheim
Paris Review interview
I'm someone who believes that the heart of music is harmony, as opposed to melody.

Stephen Schiff
The New Yorker, 1993
Don't allow a melody to plunk neatly into place; don't allow it to resolve; don't give it a home.

Adam Guettel
Library of Congress essay, November 2025
He divided melody into conversational clauses, as Stravinsky divided folk melodies into cells.

Stephen Citron
Sondheim and Lloyd Webber: The New Musical, Oxford University Press
More like a crossword puzzle...intricate cacophony of rests and melodies.

Steve Swayne
How Sondheim Found His Sound, University of Michigan Press, 2005
Harmonic language, voice leading, and structural analysis across the early and middle catalog.

Kristen Hudlow
Harmony, Voice Leading, and Drama in Three Sondheim Musicals, LSU dissertation, 2013
Systematic harmonic analysis of Company, Sweeney Todd, and Into the Woods.

Studies in Musical Theatre
Vol. 6, No. 3, 2012
The Herrmann chord in Sweeney Todd: cinematic harmonic language in theater.