World Music's DIVERSITY and Data Visualisation's EXPRESSIVE POWER collide. A galaxy of INTERACTIVE, SCORE-DRIVEN instrument model and theory tool animations is born. Entirely Graphical Toolset Supporting World Music Teaching & Learning Via Video Chat ◦ Paradigm Change ◦ Music Visualization Greenfield ◦ Crowd Funding In Ramp-Up ◦ Please Share

Saturday, June 25, 2016


Music Notation From MusicXML: Rule-Based (Java & XSLT) vs Data-Driven (Javascript & D3.js)

Music notation has at first glance been thoroughly commoditized. Whether using browser DOM (SVG) or WebGL (Web3D), the future seems to lie higher up the value chain - in music visualization. The motivation is simple: musicians 'hear' the written note, learners seek visual models.

The move towards this graphical 'added value' appears, however, to be being hindered. In this context, the two questions explored here are:

  • Why is browser-based music visualization having such a hard time taking off?
  • Why only muted interest in web-based augmented and virtual reality (Web3D) music learning applications?
The XSLT constellation has until now been viewed as the 'go-to' technology for MusicXML processing. It's claimed clean, well-proven, reasonably quick and elegant. (Disclaimer: I've never used it -> 'popup expert').

Sluggishness of progress in both of the above areas, however, suggest a problem in the current technology stack - one or more technical disconnects somewhere among the most widely used exchange format, MusicXML, the technologies used to build SVG or Web3D models, and those used to manipulate them in the browser.

Could this obstruction lie -at least in part- in the very technology widely being used to generate notation in the browser?


XSLT forms the base of a set of software libraries widely used to transform MusicXML into SVG (a language describing sophisticated graphics of use with the browser DOM (Domain Object Model).

In this post, we try to identify any disconnects, figure out where the benefits of the XSLT technology constellation end and possible impediment begins, and, as exemplified by the banner image above, explore a powerful, data-driven alternative.

In data visualization, the terms 'transformation' and 'transition' are often (and wrongly) used interchangeably.

Transformation refers to a conversion of data container format (such as from MusicXML to SVG, our focus here), or of the end properties (position, size, shape, color, viewpoint or scale) of a graphical screen object following application of a deterministic function to each point in it's underlying data set.

Transition, on the other hand, describes an action itself (the deterministic function) used to achieve a transformation. Several transitions may be employed (chained) to achieve a transformation.

MusicXML's Reach

We looked (in separate post) at the shared role of MusicXML in the 'literal' DOM/SVG vs 'figurative' WebGL/Web3D worlds, and how imagery captured in the former could be repurposed in the latter.

Because it offers such a huge spectrum of opportunity for online social value delivery, this constellation should be the first focus of anyone working towards solutions in music teaching or learning.

MusicXML And The Browser DOM

With MusicXML as standard or default exchange format at the root of current score-building capabilities, in a browser DOM context, we currently have a choice between at least two clearly differentiated open source technology stacks.

On the one hand the dominant, specification-led, rule-based XSLT software constellation, and on the other a test-driven, fly-by-the-seat-of-your-pants javascript approach.

At our disposal on the javascript side, however, are some powerful data visualization libraries - including their flagship, D3.js.

MusicXML And WebGL

Other than for bitmap-based notation, MusicXML appears to have little WebGL / Web3D foothold whatsoever. This can perhaps be ascribed to the following:
  • no provisioning workflows for MusicXML-based information (including difficulties transferring and integrating structured musical semantics to, and imagery -as skins- with, scripted screen objects)
  • no MusicXML integrations into otherwise powerful declarative Web3D languages
  • the ad-hoc nature of Web3D interaction, which runs counter to music notation's tightly time-synchronized structure
  • the general lack (other than the expressively very limited MIDI interface) of standards-based support for haptic and gestural musical instrument sensors


The XSLT specification defines XSLT as "a language for transforming XML documents into other XML documents" - of which both MusicXML and SVG are widely-used examples.

The XSLT Programmer's Reference (2nd Edition) suggests XSLT can be used to:
  • convert one XML-based music exchange format to another, for example from MusicXML to MEI or SMDL formats
  • convert any these into XML-based visual notation (scalar vector graphics format SVG)
  • play the music on a synthesizer, by generating a MIDI (Musical Instrument Digital Interface) file
  • Perform a musical transformation, such as transposing the music into a different key or extracting parts for different instruments or voices.
  • Extract the lyrics, into HTML or into a text-only XML document
  • Capture music from non-XML formats and translate it to XML (XSLT 2.0 is especially useful here).
So, basically from more or less anything to XML, and from XML to more or less anything..

With MusicXML adopted as a W3C standard for music exchange, XSLT and it's suite of tools (XSL, XPath, XQuery etc) have attained the status of a transformational music technology. With XSLT's help, the marriage between MusicXML and SVG notation has, you might say, been made.

There are others approaches, but these tend to focus (using, for example, Domain Specific Languages, or DSLs) on expressing the underlying structures of music more succinctly, in the end offering a level of visual interactivity only marginally different from what is already available using XSLT.

MusicXML And D3.js

According to Interactive Data Visualization for the Web, D3 is intended primarily for explanatory as opposed to exploratory visualizations, the latter of which help you discover significant, meaningful patterns in data.

This is an important distinction - and one that our layered, highly flexible configuration processes for instruments and theory tools modeling is (with D3.js help) intended to overcome.

D3's main uses are:
  • loading data into the browser’s memory
  • creating data bindings
  • initial presentation
  • Client:
  • making selections
  • styling and scaling (transformations)
  • interactivity (transitioning elements between states in response to user input)
Implemented with care in an aggregator platform, to the above list we can add:
  • provisioning workflows covering all possible visual model configuration variants
  • domain-affine (classification hierarchy) storage and retrieval
  • dynamic exchange (single page application) and interworking (peer-to-peer) of visualization models
Parsing in a D3.js context is layered process concerned ultimately with the creation of direct, graphical bindings. Parsing of MusicXML, however, is a fairly complex and layered process aimed at transforming serial data into timeline-based parallel parts and voices.

Moreover, as hinted at above, we want both parsing and any resulting SVG notation generation done on the server, leaving only it's manipulation (transformation during, for example, playback) in the browser.

XSLT's Technical Disconnect

It's somehow telling that XSLT barely features in reviews of, or searches on, tools for data visualization. This is only reinforced by the fact that XSLT's role in music seems to have been capped at the level of notation.

XSLT's potential should, logically, at least extend to the generation and driving of complex, score-driven, exchangeable, time-synchronized and highly interactive on-screen visual SVG animations.

XSLT certainly facilitated the creation of beautifully typeset notations in a wide variety of fonts, but possibly at the cost of interactivity and flexibility.

Though the DOM/SVG opportunities have by no means been exhausted, the problem is only exacerbated as we move towards a WebGL/Web3D context. The following diagram may help illustrate this:
Where any instrument / theory tool disconnect might be attributed to technical complexity, MusicXML's use in the Web3D provisioning chain is still in it's infancy.

I have seen it claimed there is currently no great interest in WebGL music applications. Given the size of the music learning industry, I find this difficult to believe.

What seems more likely is that no-one has yet been able to formulate a technical concept bridging the gap between MusicXML and WebGL.

XSLT Limitations

So what are the limitations of XSLT? Mike Bostock, one of the creators of D3.js sums these up thus:

"Extensible Stylesheet Language Transformations (XSLT) is another declarative approach to document transformation. Source data is encoded as XML, then transformed into HTML using an XSLT stylesheet consisting of template rules. Each rule pattern-matches the source data, directing the corresponding structure of the output document through recursive application. XSLT’s approach is elegant, but only for simple transformations: without high-level visual abstractions, nor the flexibility of imperative programming, XSLT is cumbersome for any math-heavy visualization task (e.g., interpolation, geographic projection or statistical methods)'.

"Cumbersome". Mmm. These remarks (admittedly made before the release of XSLT version 2.0) have profound implications for music visualization, the missing 'visual abstractions' referring to the direct and locally manipulable bindings between data and (say) SVG. Interpolation weaknesses are presumably mentioned because they form the basis for smooth (jitter-free) animations.

There have of course been substantial improvements (explored further below) since then. As of 2010, for example, all leading browsers support client side XSLT transformation. The question is how close these come to the direct data bindings of D3.js.

XSLT And Synchronization

Transforming MusicXML into SVG notation according to embedded time tags is an area where XSLT implementations still seem to have the upper hand.

The XSLT Programmer's Reference suggests that one of the benefits of XSLT is "separating (decoupling) data from presentation", in practice achieved by each different stylesheet rendering data in a different way, and resulting in 'one data source, multiple resulting formats, and conversion between the formats'.

The moment data and presentation are decoupled in a music visualization context, however, we invite a number of issues: synchronization drift between audio and visual elements, less directly accessible data, data fragmentation (and hence a greater likelihood of duplication).

The best guarantee of audio-visual synchronization, on the other hand, is to keep sound and image generation intimately bound to the unique underlying data, and directly associated with hardware timers. Using D3.js, this is pretty straightforward.

XSLT And Flexible Configuration

With the XSLT approach, however, each product format -whether data or document- is effectively a silo. This may be a valid requirement in an MVC environment where the application is in control of presentation (one set of data being tailored for example to mobile vs desktop use, a conventional hierarchical web site, the peculiarities of individual browsers, or simply different user roles), but is inappropriate in a data-driven scenario, where data and it's (sometime multiple) usages are closely bound, the resulting presentation type, whether audio, music notation (essentially a scatterplot), tree, choropleth, map or bar graph is under direct user control.

This disconnect would also conflict with our layered approach to instrument and theory tool configuration, where a change at a lower level (the scale length of a stringed instrument, for example) impacts everything (tunings, fret positions, fingerboard roadmap etc) above it.

The XSLT stack, then, remains suited to rigorously targeted B2B (business-to-business) communications, allowing conversion between a variety of document formats, but less so to reusable and extendable musical animations and the ad-hoc control actions of users in a shared, P2P (peer-to-peer) music environment.

Both XSLT and D3.js rely on an XML parser to convert the XML document into a tree structure. Where XSLT and D3.js differ, however, is that XSLT is designed to navigate an own tree modeled from XML, whereas D3.js is used to directly navigate the browser's DOM tree. These models, though similar, are not the same. Indeed, the XSLT approach results in two distinct tree models on the way to an SVG display - with additional processing overhead.

In both cases, however, we might be forgiven for expecting the final results to be broadly similar.

XSLT And Rule-Based Music Glyph Placement

I have no experience with XSLT, but one thing in particular sounds an alarm: MusicXML has in recent years come to cater to hard-coded music glyph positioning information in MusicXML files.

This suggests one of two things: either XSLT does not lend itself well to scalar (algorithmic) positioning, or -however this positioning is calculated- it tends, in certain situations, to break down.

This is strange for one simple reason: best musical typesetting practice is a long established and well documented skill. Time division ('div') information contained in MusicXML should, in combination with the accumulated notes at any given position, be entirely enough to satisfy scalar/algorithmic positioning - in both the horizontal and the vertical.

XSLT vs D3.js

XSLT is one of a few close analogues of D3.js. Both are document transformers, or 'visualization kernels'.

XSLT, a declarative language (describes not how to do a task, but what to do) with templating capabilities which lend it some functional characteristics.

D3.js -a functional programming language- looks vaguely imperative - but it's behavior is feels declarative..

Many existing notation systems are built on an XSLT foundation, so in theory much of the work involved in readying music exchange files for new applications (in areas such as in-browser, SVG-based music visualization, or WebGL-based augmented or virtual reality web applications) may well already have been done.

Why only 'may'? All the above demand accurate synchronization of data, image and audio, and flexible, powerful image visualization and transformation capabilities, yet XSLT quick hits limitations in these areas. The following diagram hints at why:

So the problem may in part due to browser-technology affinities:
  • D3.js is tightly integrated with, and has gone to considerable lengths to stay abreast of developments in, the core web technologies: HTML5, CSS, SVG and especially javascript (ES6, ES2016 (ES7), ES2017..). XSLT, on the other hand, integrates these technologies more as accessories. With it's own ecosystem, XSLT could be said to be of less direct relevance to the web experience.
  • D3.js integrates well with some declarative languages used in the emerging WebGL field. XSLT has no particularly visible footprint in this area.
  • D3.js's focus is on achieving behaviors proven across most major browsers, tending to blend out any behavioral abnormalities. The XSLT ecosystem, on the other hand, tends towards a perfect result in each individual browser.
  • Lastly, working with XSLT is eased using proprietary XML authoring and execution software, whereas D3.js code can be written and debugged using any of many open-source SDKs in conjunction with the browser's inspector.
Because of their close data-to-DOM bindings, data-driven applications respond instantly and dynamically to changes in a data set. Though XSLT is already a declarative language (describes what you want to do, rather than how to do it), can it really be considered data-driven? Given it's expressive shortcomings ("cumbersome") compared to D3.js, on certain types of client-side transformation, it may be obliged to rebuild some presentation structures from scratch. Here, I wish I knew more.

Can we begin to identify some potential XSLT bottlenecks? Our first cut might be:
  • the tendency to separate data and presentation ('decoupling' as in the 'MVC' model)
  • comparatively limited and clunky selection and visual transformation capabilities
  • a tendency towards rule-based rather than directly scalar (a.k.a. 'algorithmic') SVG positioning
  • it is claimed you cannot reassign a variable's value once set during runtime. NOTE: This limitation appears to have been eased in XSLT 2.0 and 3.0, through functions - also, incidentally, easing bindings to other languages.
  • for music notation use, the product of XSLT transformations is a DOM tree. Because the notation has not been built with the express intention of applying onscreen transformations (a must for further use in Web3D apps), from a structural point of view, many html convenience classes and ids (the scoping 'handles' allowing interrogation, and to which many general transformations can be applied) are missing
  • in some cases, design choices limit the finished SVG's use as a data source - e.g. SVG overlaid with bitmapped zones allowing scoping and control (e.g. definition of looping bounds).
In effect, the technology, while suited to document generation, may not be so suited to the dynamics of data visualization - and especially where these are driven from a musical timeline. Effectively, every time an XSLT rule is fired, presentation elements are rebuilt.

This would be especially telling in an environment such as ours, where user-led dynamic configuration is an issue.

In sum, this is beginning to push us in the following direction:

Let's explore this possibility a little further..

MusicXML Parsing Using Javascript Or D3.js

Conceptually, XML parsing can be thought of at two distinct levels: the trivial level of simple XML data extraction (answering questions such as 'how many times does 'xxxx' appear in the file'), and the more complex re-ordering and positioning of elements according to some other, 'higher-level' dimension (such as MusicXML's 'div' timeline).

Any javascript / D3.js approach faces the same challenges. Indeed proprietary, time-based parsing of MusicXML using javascript is what currently distinguishes the product 'Soundslice' from other music learning environments. It sticks, however, to what are effectively single-part instrumental music, thereby avoiding the challenges of multi-voice playback.

To use D3.js to generate musical notation using SVG, then, we are obliged either to select from and transform the SVG output of existing XSLT solutions, or parse MusicXML from first principles.

With complex (and in a sense recursive) time dependencies in MusicXML, the latter is a non-trivial task.

D3.js And Control

The raw material of data visualization libraries such as D3.js are arrays, often the product of parsing a large batch of data in a text format such as JSON.

With complex data sets, these can result in array scaffolding, where the value found at a given index in one array acts as index to another. The arrays are, in this sense, interlocking.

Were we to visualize such interlocking arrays, we might see something like the following. From the contents of a MusicXML file, interlocking arrays (representing key, number of parts and voices, instrument types, number of notes in an octave and their individually assigned color codings) build a lattice ultimately spitting out the algorithmically generated position and dimensioning characteristics not just of individual notes, but of entire, populated voice, part and other sections.

Choosing a value from an array (such as number of notes or tones per octave) may resolve several dependencies. For example, having resolved the modular base and temperament or intonation (12 for midi, i.e. western or classical equal temperament, but there are many others in use worldwide), we can determine which color scales to use, the fret layout of a lute family instrument, which modal scales apply, and the spatial relationships in a lattice or tonnetz theory model.

Feeding off these arrays, D3.js builds, layer for layer on the server side, interface components (some hidden, some visible) which can be grouped and/or assigned classes and ids for subsequent (client-side) selection purposes.

Based on these selections, be they of part, voice, phrase, bar, notes at a particular time division or indeed individual notes, a wide range of visually enthralling transformations can be made. These have (potentially) huge implications for the ad-hoc, labyrinthine and on the whole more dramatic visual narratives of WebGL.

Moreover, the precision, power and flexibility of these mechanisms have impact at five distinct configuration or control levels:
  • overall (music system)
  • internal defaults (notation, instrument and theory tools)
  • 'what-if' user overrides (notation, instrument and theory tools)
  • environment propagation (shared configuration)
  • interworking (shared controls)
Even in the context of a user's local environment, we're talking of a massive flexibility boost:

Here we see two types of configuration change in action: the defaults, driven both by the musical source (here MusicXML) and user default environment settings, and any directly user-controlled and propagated 'what-if' type changes.

Already this simple scenario opens up huge potential for musical exploration: comparing instrument configurations, 'best-try' fitting of notations to instruments from a different musical (configuration) culture, comparing theory tool specializations etc. These are explored in more detail in a series of more recent posts, but the point here is that they are enabled by and built on a data-driven base.

XSLT And D3.js - Together?

XSLT is powerful tool in the parsing and transformation of MusicXML, D3.js in creating direct data-to-visualization binding and transformations. All this prompts the question whether, using XSLT, rather than decoupling, you can aggregate data into the published SVG where it can be accessed and processed by more powerful selection and transformation libraries.

D3.js has, for example proven, itself perfectly capable of integrating with Web3D declarative languages, so perhaps there is scope here for some data-driven magic.

Clearly, to integrate a library such as D3.js with the product of notation created with the XSLT suite is going on the one hand to require considerable knowledge of both domains, and on the other a presents a high analytical and naming harmonization entry bar.

Nevertheless, moves have been made to bring an easier marriage of XSLT and D3. One such is JSXML, which are a set of XSL stylesheets that transform XML into JavaScript code equivalent to JSX, decoupling view and controller code, and staying closer to a classic HTML/CSS/JavaScript workflow. JSXML is easy to parse (it uses XML) and can easily be re-targeted to different rendering engines (React, Inferno, D3, etc‥).

On the whole, because it allows direct transformations on every visual element in the DOM heirarchy, my feeling is it may be better to proceed with a purely data-driven javascript / D3.js approach. Moreover, D3 has, with, gone 'reactive'.

Perhaps the most compelling argument in D3.js's favor, though, lies in the area of P2P (peer-to-peer) interworking in online teaching and learning. D3.js provides for a level of fine-grain, highly flexible selection and transformation control difficult to achieve in any other environment. Indeed, it's selection and transformation capabilities are, in comparison to XSLT, simply spectacular.

Big, brave, open-source, non-profit, community-provisioned, cross-cultural and batshit crazy. → Like, share, back-link, pin, tweet and mail. Hashtags? For the crowdfunding: #VisualFutureOfMusic. For the future live platform: #WorldMusicInstrumentsAndTheory. Or simply register as a potential crowdfunder..

Summary: Pros & Cons In A Data Visualization Context

Stack ➛ XSLT D3.js
  • Declarative language (describes not how to do a task, but what to do)
  • clean and clear decoupling (separation of responsibilities)
  • Until recently, domain-specific to XML (taking much of the sting out of MusicXML)
  • Supports XPath/XQuery (useful in querying DOMs)
  • Powerful XML transformation capabilities
  • Highly expressive and compact
  • Templating capabilities (lending it some functional characteristics)
  • Mature production proven constellation
  • Close binding between data and usage = tight synchronization (with hardware timers) of audio and imagery
  • Reactive (using
  • W3C web standards focussed (tightly integrated with Browser DOM and technologies).
  • integrates well with declarative languages (incl. those used in the emerging WebGL field)
  • scalar (a.k.a. 'algorithmic') SVG music glyph positioning
  • open source code, open source culture
  • usage leads to an intimate knowledge of browser DOM and technologies
  • Data processing is focussed, local to user needs, lightweight and immediate
  • separation of data and presentation ('decoupling' into MVC-like silos)
  • rule-based rather than directly scalar (a.k.a. 'algorithmic') SVG positioning
  • 2-pass DOM tree generation (more overhead)
  • de-facto platform is heavyweight java-based, but net has moved on to fast/lightweight javascript (node.js)
  • more suited to publishing (music scores) than creating websites (music visualization)
  • (initially) intimidatingly complex programming constellation
  • despite open source code, a closed source culture
  • doesn't do some things many programmers take for granted
  • steep Learning Curve
  • difficult to protect both data and intellectual property
  • lack of domain-specific integration (aggregation) platforms
  • no support for legacy browser technology
  • advanced transformations can be intellectually challenging
  • debugging of chained methods
  • Relatively new arrival in production systems

In sum, the core issues with XSLT seem to be the lack of direct data bindings (direct manipulation of the document object model), XPath/XQuery is no match for D3.js's powerful selection, transformation and transition capabilities.

In a MusicXML or other exchange format context, these have impact both in a 'vertical' (immediate visualization control) and a 'lateral' sense (P2P interworking). In particular, powerful selection capabilities open up the possibility of activating different selection sets depending on learning needs: for example on the horizontal within a score, controls for playback and looping, in the vertical within a score, controls for note and interval identification.

Comments, questions and (especially) critique welcome.