Jamming (on) the Web

Anton Eliëns, Martijn van Welie, Jacco van Ossenbruggen and Bastiaan Schönhage
Vrije Universiteit, Fac. of Mathematics and Computer Sciences
De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands
email: {eliens,martijn,jrvosse,bastiaan}@cs.vu.nl


The Web has become a dominant medium for the dissemination of information and, more recently, for collaborative work as well. The focus has been mainly on textual and graphical information, and musical information has hardly been addressed. We present a framework that makes musical works accessible to Web users by offering high-level support for the display of musical material on the Web as well as for live jam sessions.

Our framework consists of a browser plug-in that supports the display and editing of scores as well as playing scores by connecting to a shared MIDI server. To participate in a jam session, clients of the MIDI server can also send data in real time, for instance by using a keyboard embedded in a Web page. We discuss the issues involved in displaying musical material on the Web and we sketch the technical architecture of our framework.


Compared to textual and graphical material, the capabilities of the Web for musical information are rather poor. The embedding of music, or sound in general, rarely goes beyond links to raw audio and MIDI files or to streamed audio connections. To display a musical work, HTML authors have to use images containing the score. All of these solutions are very low level as they basically regard music as being just sound (or a picture in the case of a score).

True score files are usually a few orders of magnitude smaller than the corresponding raw audio files, and the audio signal can be synthesized at the client side at any appropriate sample rate. Additionally, a high-level description of music provides the browser with far more information than the raw samples do. In previous work we proposed transmitting musical scores (instead of the raw samples) across the Internet and adding sound synthesis functionality to Web browsers [10], as well as using generic SGML to encode structured documents [2].
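The size difference can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from our measurements: the CD-quality audio parameters, the rate of four notes per second and the figure of roughly 30 bytes of markup per note are all assumptions chosen for illustration.

```python
# Back-of-the-envelope size comparison; every figure here is an assumption.
SAMPLE_RATE_HZ = 44_100   # CD-quality sample rate
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 2              # stereo
SECONDS = 60              # one minute of music


def raw_audio_bytes(seconds, rate=SAMPLE_RATE_HZ, width=BYTES_PER_SAMPLE,
                    channels=CHANNELS):
    """Size in bytes of uncompressed PCM audio."""
    return seconds * rate * width * channels


def score_bytes(notes, bytes_per_note=30):
    """Rough size of a textual score, assuming ~30 bytes of markup per note."""
    return notes * bytes_per_note


audio = raw_audio_bytes(SECONDS)         # 10_584_000 bytes
score = score_bytes(notes=4 * SECONDS)   # assumed four notes per second: 7_200 bytes
ratio = audio / score
print(f"audio: {audio} bytes, score: {score} bytes, ratio: {ratio:.0f}x")
```

Under these assumptions the score is about three orders of magnitude smaller than the raw audio.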

In this paper, we describe an experimental framework that offers many of the ingredients for true networked music support including facilities for editing, displaying and playing musical scores as well as facilities for high level exchange of musical material and real-time collaborative work involving music and sound. Our approach is based on traditional music notation and on MIDI for playing facilities. The framework builds upon the work done in the DejaVu project at the Software Engineering section of the Vrije Universiteit (Amsterdam), which resulted in a suite of components for developing distributed Web-aware hypermedia applications [1,2,7,8,9,10,12].

The structure of this paper is as follows. We first describe our solution to exchanging musical information. In particular, we describe our score format and the associated score editor, and how musicians may connect to a shared MIDI server to join a real-time collective improvisation. Next, we outline the architecture underlying our approach to networking music and the web components used for its realization. After sketching some applications of networked music we discuss the merits and shortcomings of our solution and indicate some directions for future research.

Scores on the Web

The most ambitious markup language for the dissemination of music on the Web is probably the Standard Music Description Language (SMDL) [4]. SMDL expresses a musical work in terms of four basic domains. The logical domain, the primary focus of SMDL, is, according to the standard, describable as "the composer's intentions with respect to pitches, rhythms, harmonies, dynamics, tempi, articulations, accents, etc.". The central element of the logical domain, the cantus element, is an abstract, one-dimensional finite coordinate space onto which musical and non-musical events can be scheduled. This allows for the inclusion of any dependent time sequences (such as automated lighting information) in a musical work. The standard uses HyTime [3] hyperlinking to specify the relations with information from the other three domains: the gestural domain, describing any number of particular performances (e.g. MIDI files or digital audio) of the work; the visual domain, describing any number of scores (printable/displayable versions) of the work; and the analytical domain, comprising any number of theoretical analyses or commentaries about the information in the three other domains. The addressing power of HyTime makes it possible to link directly into information expressed in other formats, including MIDI files, digital audio recordings or specific score notations, without modification.


Figure 1: The score displayed by the plug-in

Our approach is more modest: we deploy a much simpler SGML representation, primarily geared towards encoding printable/displayable versions of the score (i.e. SMDL's visual domain). However, the format used is sufficiently rich to be able to generate a playable MIDI representation as well. Information which is usually added by performers (in SMDL this is represented in the gestural domain), such as explicit interpretations of tempi, articulations and accents, is not supported in the current version.

    <COMPOSER>Antonio Vivaldi</COMPOSER>
      <MEASURE Sig="3,4" Key=F Clef=Gclef>
        <NOTE Pos="1,3" Stem=down>d6 4 0
        <REST Pos="3,6">C6 8 0
        <NOTE Pos="4,6" Stem=up>a5 8 0
        <NOTETUPLE Stem=down>
          <NOTE Pos="5,6">f5 8 0</NOTE>
          <NOTE Pos="6,6">a5 8 0</NOTE>

Figure 2: An SGML encoded score
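To indicate how a playable representation might be derived from such note elements, the following sketch converts the content of a NOTE element into a MIDI note number and a duration in beats. The interpretation of the three fields is an assumption made for this example only: the first field is read as a note name plus octave (with c4 mapped to MIDI note 60), the second as a note value (4 for a quarter note, 8 for an eighth note), and the third field is ignored. The function name parse_note is hypothetical and not part of the editor.

```python
# Semitone offsets of the natural note names within an octave.
SEMITONES = {"c": 0, "d": 2, "e": 4, "f": 5, "g": 7, "a": 9, "b": 11}


def parse_note(text):
    """Parse a token such as 'd6 4 0' into (midi_note, beats).

    Assumptions (not confirmed by the score format itself): the first field
    is a note name plus octave with c4 = MIDI note 60, the second field is
    the note value (4 = quarter, 8 = eighth), the third field is ignored.
    """
    pitch, value, _unknown = text.split()
    name, octave = pitch[0].lower(), int(pitch[1:])
    midi_note = 12 * (octave + 1) + SEMITONES[name]
    beats = 4 / int(value)   # length in quarter-note beats
    return midi_note, beats


print(parse_note("d6 4 0"))   # (86, 1.0)
print(parse_note("a5 8 0"))   # (81, 0.5)
```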

To support display and editing of SGML scores on the Web, we developed the Amuse score editor as a plug-in for our Web browser (see figure 1). The editor has a graphical user interface and does not require any SGML knowledge from the user. Figures 2 and 3 show, respectively, a fragment of an example score file and the associated style sheet, which has a CSS1-like syntax [5]. Both documents can be edited by the graphical score editor plug-in. Changes in the style sheet are dynamically reflected in the display of the score. A significant enlargement of the page-width parameter, for example, will allow for more measures on a single staff, and will result in a redraw of the complete score.

    margin-left : 30;
    margin-right : 30;
    margin-top : 80;
    margin-bottom : 20;
    page-height : 1000;
    page-width : 920;
    title-align : Center;
    title-font : -*-Times-Bold-R-Normal--*-240-*;
    composer-align : Center;
    composer-font : -*-Times-*-R-Normal--*-180-*;

Figure 3: An associated style sheet
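The effect of a style sheet change on the layout can be sketched as follows. The example parses a few of the properties shown in figure 3 and computes the horizontal space left for measures. That the staff width equals the page width minus the left and right margins is an assumption of this sketch, and the names parse_style and staff_width are hypothetical.

```python
STYLE = """
margin-left : 30;
margin-right : 30;
page-width : 920;
title-align : Center;
"""


def parse_style(text):
    """Parse 'name : value;' lines into a dict, converting integer values."""
    props = {}
    for line in text.splitlines():
        line = line.strip().rstrip(";")
        if not line:
            continue
        name, _, value = line.partition(":")
        value = value.strip()
        props[name.strip()] = int(value) if value.isdigit() else value
    return props


def staff_width(props):
    """Space left for measures, assuming margins are subtracted from the page width."""
    return props["page-width"] - props["margin-left"] - props["margin-right"]


props = parse_style(STYLE)
print(staff_width(props))                             # 860
print(staff_width({**props, "page-width": 1400}))     # 1340
```

A larger page-width leaves more room per staff, which is why such a change triggers a redraw of the complete score.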

Playing on the Web

The playback facilities of our framework are centered around the MIDI server. After registering as a MIDI client, the score editor is able to send the generated MIDI version of the score to the separate MIDI server. The MIDI server builds upon a socket-level client/server library and a class library that provides the basic functionality for MIDI devices, MIDI clients and the MIDI server. Note that the audio device is usually an exclusive resource; by connecting to a single MIDI server, several client applications can have simultaneous access to a single MIDI output device. Among other things, the MIDI server handles device registration: when a MIDI device is registered, a cookie is given out that may be used by a client to request the server to set up a virtual connection with that device. The cookie also prevents unauthorized clients from accessing a MIDI output device.
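The cookie mechanism can be sketched with a small in-memory model. This is an illustration only: it uses no sockets and no real MIDI devices, and the class and method names (MidiServer, register_device, connect) are hypothetical rather than those of our class library.

```python
import secrets


class MidiServer:
    """In-memory sketch of device registration; no sockets, no real MIDI."""

    def __init__(self):
        self._devices = {}   # cookie -> device name

    def register_device(self, name):
        """Register a MIDI device and hand out a cookie for it."""
        cookie = secrets.token_hex(8)
        self._devices[cookie] = name
        return cookie

    def connect(self, cookie):
        """Set up a 'virtual connection': return the device for a valid cookie."""
        if cookie not in self._devices:
            raise PermissionError("unknown cookie")
        return self._devices[cookie]


server = MidiServer()
cookie = server.register_device("synth-on-A")
print(server.connect(cookie))          # the holder of the cookie gets the device

try:
    server.connect("guessed-cookie")   # a client without a valid cookie is refused
except PermissionError as err:
    print("rejected:", err)
```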

Collective improvisation

We developed the keyboard applet as an alternative input device to be able to send "live" MIDI data to our server. Since multiple applications can have access to the MIDI server, a user can have a score edit session running, and simultaneously be playing a keyboard applet.

To engage in a jam session, the keyboard applet connects to the JamServer instead of the MIDI server. The JamServer acts as the central point of a jam session, keeping track of all clients engaged in the session.


Figure 4: The jam server

To start a jam session, all jam clients connect to a single JamServer and send it their MIDI data. The JamServer is connected to one or more MIDI servers, as depicted in figure 4. By having the JamServer separate from the MIDI server itself, the latter is relieved from the burden of jam session management. Every connected MIDI device will receive all the MIDI data submitted by the jam clients. This data is relayed to these devices by the MIDI server(s), through the virtual MIDI data stream that is created when registering as a jam client.
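The relaying of jam data can be sketched as a simple fan-out. The model below is a simplification under stated assumptions: it collapses the JamServer and the MIDI server(s) into one object that delivers directly to the devices, and all class and method names are hypothetical.

```python
class MidiDevice:
    """Stand-in for a MIDI output device; it just records what it receives."""

    def __init__(self, name):
        self.name = name
        self.received = []

    def send(self, client_id, message):
        self.received.append((client_id, message))


class JamServer:
    """Keeps track of jam clients and fans their MIDI data out to all devices."""

    def __init__(self):
        self.clients = set()
        self.devices = []

    def join(self, client_id):
        self.clients.add(client_id)

    def attach(self, device):
        self.devices.append(device)

    def submit(self, client_id, message):
        if client_id not in self.clients:
            raise KeyError(f"{client_id} has not joined the session")
        for device in self.devices:
            device.send(client_id, message)


jam = JamServer()
synth_a, synth_c = MidiDevice("synth-A"), MidiDevice("synth-C")
jam.attach(synth_a)
jam.attach(synth_c)
jam.join("keyboard-A")

NOTE_ON_MIDDLE_C = bytes([0x90, 60, 100])   # note-on, channel 1, middle C, velocity 100
jam.submit("keyboard-A", NOTE_ON_MIDDLE_C)
print(synth_a.received == synth_c.received)   # every device received the same data
```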

In figure 4 we see three jam clients connected to a single JamServer (on machine B). The MIDI server is running on the same machine as the JamServer. The clients on machines A and C have both registered a MIDI-out device (a software sound synthesis MIDI program developed for Solaris [6]) with the MIDI server on B. The user on A has additionally registered a MIDI-in device (the keyboard). Using the keyboard, the user on A can contribute to the jamming. The score editor on C is directly connected to the MIDI server and is not engaged in the jam session. The MIDI server will redirect MIDI requests from the score editor only to the MIDI device on C.


To give an indication of the speed and response times of our system, we have used a special jam client, jamping, that measures the average delay between sending a MIDI message to the JamServer and receiving the same message on a connected MIDI device. For a 486DX2-66 PC running Linux, with one client and both servers local, this resulted in a round-trip-delay time of 5.5 milliseconds. A similar setup on a Sparc-5 with Solaris resulted in 2.6 milliseconds. A similar configuration with the JamServer on a LAN gave 3.5 milliseconds average round-trip-delay time. However, with a server in Amsterdam and a client in Sweden, we obtained an average round-trip-delay time of 87 milliseconds, with a peak of 1.6 seconds. Clearly, the length and variability of round-trip-delay times may be a prohibitive factor for jamming on a global scale.
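The kind of measurement performed by jamping can be sketched as follows. This is not the jamping client itself: the sketch replaces the JamServer and MIDI device by a local echo thread on a socket pair, so the numbers it prints say nothing about the figures reported above. The function names are hypothetical.

```python
import socket
import threading
import time

NOTE_ON = bytes([0x90, 60, 100])   # a three-byte MIDI note-on message


def recv_exact(sock, size):
    """Read exactly `size` bytes from a stream socket."""
    data = b""
    while len(data) < size:
        chunk = sock.recv(size - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return data


def echo(sock, count):
    """Stand-in for server plus device: send every message straight back."""
    for _ in range(count):
        sock.sendall(recv_exact(sock, len(NOTE_ON)))


def measure(count=100):
    """Return (average, peak) round-trip delay in seconds over `count` messages."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=echo, args=(server, count))
    worker.start()
    delays = []
    for _ in range(count):
        start = time.perf_counter()
        client.sendall(NOTE_ON)
        recv_exact(client, len(NOTE_ON))
        delays.append(time.perf_counter() - start)
    worker.join()
    client.close()
    server.close()
    return sum(delays) / len(delays), max(delays)


average, peak = measure()
print(f"average {average * 1000:.3f} ms, peak {peak * 1000:.3f} ms")
```

Reporting the peak next to the average matters here, since it is the variability of the delay, not only its mean, that hampers real-time interaction.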

Architecture of the Web Components

The software described so far was developed for our SGML-based Web browser [2] as an extension to the hush class library [1]. The hush library contains the classes providing the interface to an embedded script interpreter and allows for a smooth interaction with the underlying window system. Other extensions of hush include


Figure 5: Overview of the Web components

Figure 5 shows an overview of the basic Web-related components of the hush library. The browser provides the top-level user interface for all Web components, including a viewer, a scrollbar, navigation buttons (back, forward, home, reload) and an entry box to enter URLs. The netclient, web and MIMEviewer components form the conceptual basis of our approach to connecting to the Web:

The MIMEviewer component provides an abstract interface to viewers for several MIME types. The web widget only knows about the (abstract) MIMEviewer class while the actual functionality is implemented in several concrete viewer classes, one per MIME type. Specific viewers for new MIME types can be plugged dynamically into the MIMEviewer object.

When the MIMEviewer gets the instruction to display a document of a certain MIME type, it changes its role and becomes a viewer for that particular MIME type. This dynamic role-switching idiom is discussed in more detail in [2]. As a result, the addition of new viewers can be done without changing the web widget.
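The role-switching idea can be illustrated with a short sketch. It is written in Python for brevity, whereas hush is a C++ library, and it is not the hush implementation: the class names, the method names plug_in and become, and the MIME type application/x-amuse are all hypothetical.

```python
class SGMLViewer:
    """Concrete viewer for textual documents."""

    def display(self, data):
        return f"formatted text: {data}"


class ScoreViewer:
    """Concrete viewer for score files."""

    def display(self, data):
        return f"rendered score: {data}"


class MIMEViewer:
    """One object that changes its role according to the MIME type at hand."""

    def __init__(self):
        self._viewers = {}   # MIME type -> concrete viewer class
        self._role = None    # the concrete viewer currently played

    def plug_in(self, mime_type, viewer_class):
        """Add a viewer for a new MIME type at run time."""
        self._viewers[mime_type] = viewer_class

    def become(self, mime_type):
        """Switch role: from now on behave as the viewer for mime_type."""
        self._role = self._viewers[mime_type]()

    def display(self, mime_type, data):
        if not isinstance(self._role, self._viewers[mime_type]):
            self.become(mime_type)
        return self._role.display(data)


viewer = MIMEViewer()
viewer.plug_in("text/html", SGMLViewer)
viewer.plug_in("application/x-amuse", ScoreViewer)   # hypothetical MIME type
print(viewer.display("text/html", "index page"))
print(viewer.display("application/x-amuse", "Vivaldi"))
```

The code that calls display never refers to a concrete viewer class, which is what allows new viewers to be added without changing the caller.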

The netclient component builds the bridge between the local web widget and the World Wide Web by providing an abstract and uniform interface to network (file) access and transport protocols. In the realization of the netclient components we have employed the dynamic role-switching idiom in the same way as in the implementation of the MIMEviewer components.

The web object creates a MIMEviewer object and tells it which role it should play (e.g. SGMLviewer, Amuse or VRMLviewer). This role can be changed during the lifetime of a single MIMEviewer object by calling a method to change its role. A browser typically uses only a single MIMEviewer object that changes its role according to the type of data that should be displayed. The SGMLviewer is the default viewer; it displays generic SGML documents by using a style sheet for each document type. By default, a style sheet for HTML is used [11]. Since our generic SGMLviewer is more suited for textual documents and does not offer editing support, we developed a separate viewer/editor to process our Amuse/SGML score files.

Since the MIMEviewer provides no network functionality at all, it generates events whenever it needs to retrieve data pointed to by a URL. Such events are generated as a response to user interaction (e.g. clicking an anchor) or to fetch inline data during the parsing process. These events are typically handled by the web component which plays a central role in our approach because it combines the functionality of the MIMEviewer and the netclient components. Additionally, the web component adds a history and caching mechanism to the MIMEviewer. The web component's behavior is similar to the standard widgets of the hush framework, and can be conveniently used as a part of an application's GUI. Because the web widget has both a C++ class interface and a script interface, it is easy to create, or extend, applications with Web functionality.
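The division of labour between the three components can be sketched as follows. The sketch is written in Python rather than C++ and uses hypothetical names throughout. It assumes a callback as the event mechanism and a dictionary as the cache, neither of which is claimed to match the hush implementation.

```python
class NetClient:
    """Stand-in for network access; serves documents from a dictionary."""

    def __init__(self, documents):
        self.documents = documents
        self.requests = 0

    def fetch(self, url):
        self.requests += 1
        return self.documents[url]


class Viewer:
    """Has no network code; it only raises an event when it needs a URL."""

    def __init__(self):
        self.on_fetch = None   # event handler, installed by the web component
        self.shown = None

    def click(self, url):
        self.on_fetch(url)     # e.g. the user clicked an anchor

    def display(self, data):
        self.shown = data


class Web:
    """Combines viewer and netclient, adding a cache and a history."""

    def __init__(self, viewer, netclient):
        self.viewer = viewer
        self.netclient = netclient
        self.cache = {}
        self.history = []
        viewer.on_fetch = self.load

    def load(self, url):
        if url not in self.cache:
            self.cache[url] = self.netclient.fetch(url)
        self.history.append(url)
        self.viewer.display(self.cache[url])


net = NetClient({"/score": "<MEASURE>", "/home": "<HTML>"})
view = Viewer()
web = Web(view, net)
view.click("/score")
view.click("/home")
view.click("/score")                  # served from the cache
print(net.requests, web.history)
```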


Music can significantly enhance the perception of HTML pages, especially in a commercial or educational environment. Possible application areas which might benefit from the use of high-level music encodings and networked MIDI include:

Music publishing

For publishing music, networks, and in particular the Web with hyperlinking facilities, offer the opportunity to provide a rich information context, including references to audio representations, performances, background material and discographies. As stated before, high level encoding augmented with client-side playback facilities is of critical importance for delivering high quality musical material at adequate speed.

Music education

Collaborative facilities as described in [12] may be used to realize music education programs on the Web. For example, solfège and compositional exercises may be done in a virtual classroom, where instructor and pupils are connected via the network, as described in the previous sections.

Collective Composition

Our current framework can be extended to support collaborative composition. As a demonstration for high school students, we have implemented a compositional game based on a dice game said to have been invented by Mozart in the 18th century. Pupils were requested to submit one or more measures constructed as a variation on a limited number of patterns. Both the selection of measures and the choice of a variation may be done randomly.
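The random selection can be sketched in a few lines. The data and the function name roll_piece are hypothetical, and the sketch assumes one pattern per position in the piece, with the submitted variations grouped accordingly. It is an illustration, not the game as implemented.

```python
import random

# Hypothetical submissions: for each position in the piece, the measures
# that pupils handed in as variations on that position's pattern.
SUBMISSIONS = [
    ["c5 e5 g5", "c5 g5 e5"],
    ["f5 a5 c6", "a5 f5 c6", "c6 a5 f5"],
    ["g5 b5 d6"],
    ["c5 c5 c5", "e5 c5 c5"],
]


def roll_piece(submissions, rng):
    """Pick one submitted measure per position, as a throw of the dice would."""
    return [rng.choice(candidates) for candidates in submissions]


piece = roll_piece(SUBMISSIONS, random.Random(1996))
print(" | ".join(piece))
```

Passing the random number generator as a parameter makes a particular outcome reproducible, which is convenient when a class wants to replay a piece.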

Jamming (on) the Web

At this stage, jamming on the Web is not a realistic option. First of all, on a global-scale network round-trip-delay times are too large and too variable for real interaction. Apart from technological issues, it is safe to say that real time musical collaboration is a relatively unexplored area. More research is needed, not only with respect to technological issues, but also in the area of human-computer interaction and computer-supported collaborative work.

Conclusions and Future Work

The current versions of the score editor and keyboard applet work only with our hush Web browser. We are currently developing a Netscape plug-in version of the score editor and experimenting with a Java version of the keyboard applet.

Because of the textual format of music description languages, it will be possible to employ anchoring and link facilities within musical documents as well. We plan to support both HTML and HyTime hyperlinks in future versions of our score editor.

We have described a framework offering a high-level description for the exchange of musical scores which can also be used for the generation of MIDI data. Furthermore, with our MIDI server architecture we can connect several musicians to share their music. Although both aspects of our system are in their beginning stages and need to be elaborated further, they indicate the new possibilities of music on the Web. In other words, it is time for the Web to become aware of music!


First of all, we would like to thank S.A. Megens (SAM), who developed the MIDI Jam server as part of his MSc thesis project. We would also like to thank all those students who participated in one of the jamming sessions, devoting time to our project while they were supposed to do something useful.


[1] Anton Eliëns. Hush: A C++ API for Tcl/Tk. The X Resource, (14):111-155, April 1995.

[2] Anton Eliëns, Jacco van Ossenbruggen, and Bastiaan Schönhage. Animating the Web --- An SGML-based Approach. In Proceedings of the International Conference on 3D and Multimedia on the Internet, WWW and Networks, 17-18 April 1996, Bradford. British Computer Society, April 1996.

[3] International Organization for Standardization. Information Technology --- Hypermedia/Time-based Structuring Language (HyTime), 1992. International Standard ISO 10744:1992.

[4] International Organization for Standardization/International Electrotechnical Commission. Standard Music Description Language (SMDL), 1996. Draft International Standard ISO/IEC 10743.

[5] Håkon W. Lie and Bert Bos. Cascading Style Sheets, level 1, November 1996. W3C Proposed Recommendation; Available at www.w3.org/pub/WWW/TR/.

[6] Sebastiaan A. Megens. More Music in Hush. Master's thesis, Vrije Universiteit, Amsterdam, August 1996.

[7] Matthijs van Doorn and Anton Eliëns. Integrating WWW and Applications, November 1994. ERCIM/W4G --- International Workshop on WWW Design Issues, Amsterdam.

[8] Matthijs van Doorn and Anton Eliëns. Integrating Applications and the World Wide Web. Computer Networks and ISDN Systems, 27(6):1105-1110, April 1995. Special issue: Proceedings of the Third International World-Wide Web Conference --- Technology, Tools and Applications, April 10-14, Darmstadt, Germany.

[9] Jacco van Ossenbruggen and Anton Eliëns. Music in Time-based Hypermedia. In ECHT'94, The European Conference on Hypermedia Technology, pages 224-270, September 1994.

[10] Jacco van Ossenbruggen and Anton Eliëns. Bringing Music to the Web. In Proceedings of the Fourth International World Wide Web Conference --- The Web Revolution, pages 309-314. O'Reilly and Associates, Inc., December 1995.

[11] Jacco van Ossenbruggen, Anton Eliëns, and Bastiaan Schönhage. Web Applications and SGML. In David F. Brailsford and Richard K. Furuta, editors, Proceedings of the Sixth International Conference on Electronic Publishing, Document Manipulation and Typography, 24-26 September 1996, Palo Alto, USA, volume 8(2 & 3) June and September 1995 of Electronic Publishing --- Origination, Dissemination and Design, pages 51-62, 1996. John Wiley & Sons, Ltd.

[12] Martijn van Welie and Anton Eliëns. Chatting on the Web, February 1996. ERCIM/W4G --- International Workshop on CSCW and the Web, Sankt Augustin, Germany, February 7-9, 1996.