Domenico Scarlatti

Domenico Scarlatti (1685-1759) is well known for his 555 keyboard sonatas. Although his work is greatly revered by many professional musicians, some claim that it does not show any compository development. That is why I, with my data science mindset, started to wonder: can I use machine learning techniques to demonstrate compository development across Scarlatti’s work?


To make the sonatas comparable, each one is represented as a normalized string over a finite alphabet, obtained by stripping its MIDI file. The sonatas are then clustered by normalized compression distance (NCD): an algorithmic similarity metric, rooted in Kolmogorov complexity (KC), that uses no musical background knowledge and captures the similarity between any two sonatas in a single number. More details about the method can be found in the publication. Before turning to Scarlatti’s sonatas, I tested the NCD-based clustering method in two experiments.
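To make the idea concrete, here is a minimal sketch of the NCD in its usual form, NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. It assumes zlib as the compressor, which may not be what the publication used, and the input strings are made-up stand-ins rather than real preprocessed sonatas.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data.

    A real compressor is only a computable stand-in for Kolmogorov
    complexity; zlib is an assumption of this sketch.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical toy strings, not real sonata data.
a = b"C E G C E G A F D A F D " * 20
b = b"C E G C E G A F D A F E " * 20  # differs from a by one note per phrase
c = bytes(range(256)) * 2             # shares almost nothing with a

print(ncd(a, b))  # expected to be the smaller of the two values
print(ncd(a, c))
```

Values near 0 mean that one string adds little new information once the other is known. Values near 1 mean the strings have almost nothing in common.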


Experiment #1: Capturing the similarity between genres

In the first experiment, twelve jazz, twelve rock and twelve classical pieces were used. The pieces cluster well by genre, although a few classical pieces were erroneously grouped with the rock songs:

NCD Dendrogram of Genre Classification

The exact reason is unknown, though I suspect it has to do with harmonic diversity. Colloquially speaking, Schumann’s Kinderszenen might be more akin to Nirvana than to Bach’s Wohltemperirte Clavier, because both consist of only a few closely related chords.

Experiment #2: Capturing the similarity between three classical music composers

The second experiment is somewhat more specific as it tries to distinguish between three classical music composers: J.S. Bach, D. Scarlatti and R. Schumann. Each composer clearly has his own cluster, suggesting the NCD works quite well:

NCD Dendrogram of Classical Classification

Scarlatti’s 555 sonatas

Finally, each of the 555 sonatas is compared with every other sonata using the NCD. The clustered dendrogram of the 555 sonatas looks somewhat intimidating, but a close inspection reveals the method recognizes musical similarities quite well. The NCD detected two highly similar sonatas from Scarlatti, K.34:

Sonata K.34

and K.40:

Sonata K.40

The NCD also identified K.149 as one of the sonatas least similar to both K.34 and K.40:

Sonata K.149

Both the sheet music and the audio files confirm the similarity claim of the NCD. Scarlatti could almost have written K.34 and K.40 in a single afternoon, much like variations on a theme or movements in a suite.
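The all-pairs comparison can be sketched as follows. For 555 sonatas it means 555 × 554 / 2 = 153,735 unordered pairs. The sketch again assumes zlib as the compressor and treats the NCD as symmetric, which is a simplification because real compressors are only approximately so. The piece names and contents are hypothetical toy data.

```python
import zlib
from itertools import combinations


def ncd(x: bytes, y: bytes) -> float:
    """NCD with zlib as the (assumed) compressor."""
    def size(data: bytes) -> int:
        return len(zlib.compress(data, 9))

    cx, cy = size(x), size(y)
    return (size(x + y) - min(cx, cy)) / max(cx, cy)


def pairwise_ncd(pieces: dict[str, bytes]) -> dict[tuple[str, str], float]:
    """NCD for every unordered pair of pieces, keyed by the pair of names."""
    return {
        (p, q): ncd(pieces[p], pieces[q])
        for p, q in combinations(pieces, 2)
    }


# Hypothetical toy strings; the names are labels only, not real sonatas.
pieces = {
    "piece_a": b"D F A D F A G E C G E C " * 20,
    "piece_b": b"D F A D F A G E C G E D " * 20,
    "piece_c": bytes(range(256)) * 2,
}

distances = pairwise_ncd(pieces)
most_similar = min(distances, key=distances.get)
least_similar = max(distances, key=distances.get)
print("most similar pair: ", most_similar)
print("least similar pair:", least_similar)
```

A dendrogram is then built from such a table of distances by a hierarchical clustering step, which is left out here.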

Compository development

The notion of “compository development” can be defined in terms of changes in style over the lifetime of the composer. If composers (or any artists, for that matter) develop their style, their later works differ from their earlier works, regardless of whether one considers the change an improvement or not. With regard to Scarlatti’s sonatas, the notion can be defined explicitly:

Compository development is encountered at sonata k if the average NCD between sonata k and each of the n sonatas composed before it is greater than a constant value c. Mathematically speaking, development occurs if the following expression evaluates to true for k − n > 0:

$$\frac{1}{n} \sum_{i=1}^{n} \mathrm{NCD}(x_k, x_{k-i}) > c$$

with x_j denoting the jth sonata in X.

Here, X is the collection of Scarlatti’s 555 sonatas in preprocessed MIDI format. Essentially, if there is little to no difference between the kth sonata and each of the n sonatas preceding it in the chronological indexing, there is no compository development across these n sonatas. The higher the value of c, the less similar a sonata must be to its predecessors to count as development. Being less similar signifies change, or compository development, though I do not venture to pass judgment on the quality of such developments.
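The criterion above can be sketched in a few lines, given a precomputed matrix of NCD values. The function names, the 0-based chronological indexing, the toy matrix and the values of n and c are all assumptions made for illustration; the post does not state the values that were actually used.

```python
def development_mark(dist, k, n):
    """Mean distance between sonata k and its n immediate predecessors.

    dist[i][j] is the NCD between sonatas i and j, indexed
    chronologically from 0 (an assumption of this sketch).
    """
    if n < 1 or k - n < 0:
        raise ValueError("sonata k needs at least n >= 1 predecessors")
    return sum(dist[k][k - i] for i in range(1, n + 1)) / n


def shows_development(dist, k, n, c):
    """True if the mark of sonata k exceeds the threshold c."""
    return development_mark(dist, k, n) > c


# Hypothetical distances: the first three pieces are alike, the fourth is not.
dist = [
    [0.00, 0.20, 0.30, 0.90],
    [0.20, 0.00, 0.25, 0.85],
    [0.30, 0.25, 0.00, 0.80],
    [0.90, 0.85, 0.80, 0.00],
]

print(shows_development(dist, k=2, n=2, c=0.5))  # mean of 0.25 and 0.30
print(shows_development(dist, k=3, n=3, c=0.5))  # mean of 0.80, 0.85 and 0.90
```

Keeping the mark separate from the threshold test makes it possible to plot the raw marks per sonata and choose c afterwards.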

To estimate whether compository development occurs throughout Scarlatti’s oeuvre, this criterion is applied to each sonata relative to a number of previously composed sonatas. The result is one mark per sonata, representing the compository development at that sonata. A least squares polynomial fit through these marks shows the trend:

Compository Development

The higher the mark, the stronger the compository development. Of particular interest are the peaks in the line around sonatas K.40, K.100 and K.500, which might indicate an increasing amount of compository development. The ‘milestone sonatas’, i.e. the sonatas that differ most from earlier work, are K.40, K.100, K.200 and K.410.


Compository development can be detected by comparing each sonata with previous compositions. Several of Scarlatti’s sonatas mark pronounced changes in style and can therefore be considered ‘milestone sonatas’.

Although artistic debates on the composition, style and execution of Scarlatti’s sonatas are likely to continue, it is now, scientifically speaking, quite hard to maintain that Scarlatti’s work shows “no progressive development in style”. That said, in a legacy as massive as Scarlatti’s, there is bound to be some repetition and similarity too.


Special thanks to Daan van den Berg & Vadim Zaytsev for their help!


The publication can be found here.