Vocabulary for Music Informatics

Donald Byrd, School of Informatics, Indiana University

Revised mid-April 2007

 

Note: this vocabulary is far from comprehensive, but it's intended to include every term that's important for my courses. However, so far, it includes mostly terms relevant to Music Representation, Searching, and Retrieval (I545). "*" in front of an item means it's new or has had major changes recently.

For music terms only, a far more extensive set of short definitions -- hyperlinked, and with illustrations, audio pronunciation, useful appendices, and a few audio and video examples -- is available online in the Virginia Tech Multimedia Music Dictionary. In addition, the web site of IU’s Cook Music Library has a reference page with links to many online music references, including New Grove Online, the online version of the definitive, 28-volume New Grove Dictionary of Music and Musicians, plus bibliographies, music databases, tutorials, etc.

The list below includes terms from music and audio technology, including computer-technology terms used in an unusual way, etc., as well as standard information-retrieval and library-science terms.


- Acoustics: the branch of physics that studies sound; differentiated from psychoacoustics.

- ADC: analog-to-digital converter.

- AMR: Audio Music Recognition: converting a recording of music to a symbolic form -- usually MIDI -- so it can be manipulated, e.g., by a score-editing program or MIDI sequencer.

- Audio

- Argument: in mathematics and programming languages, something given to a function that tells it what to do. In the R programming language, the statement plot(m, freq) calls the plot function with m and freq as arguments.

- Audio fingerprinting

- Bag-of-features [bag-of-words, bag-of-notes, etc.] model

- Bandwidth

- Beat (acoustics): periodic change in the amplitude of a sound caused by interference, i.e., interaction between components that are close in frequency. For example, if a 440 Hz sine wave and a 442 Hz sine wave are played simultaneously, the sound will get louder and softer at a rate of 2 Hz.
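
The 440/442 Hz example can be checked numerically. This is a minimal sketch that assumes two equal-amplitude sine waves. It uses the identity sin a + sin b = 2 sin((a+b)/2) cos((a-b)/2). The factor 2|cos(pi (f1 - f2) t)| is the envelope, and it peaks |f1 - f2| times per second. The function names are illustrative only.

```python
import math

def beat_frequency(f1: float, f2: float) -> float:
    """Rate (Hz) at which the summed sound gets louder and softer."""
    return abs(f1 - f2)

def summed_sines(f1: float, f2: float, t: float) -> float:
    """Sum of two equal-amplitude sine waves at time t (seconds)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def envelope(f1: float, f2: float, t: float) -> float:
    """Amplitude envelope of the sum: the slowly varying 2|cos(...)| factor."""
    return abs(2 * math.cos(math.pi * (f1 - f2) * t))

print(beat_frequency(440, 442))          # 2
for t in (0.0, 0.25, 0.5):
    print(t, round(envelope(440, 442, t), 3))
```

With 440 and 442 Hz, the envelope is at its maximum at t = 0 and t = 0.5 s and falls to zero at t = 0.25 s. That gives two loud-soft cycles per second.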

- Best-match searching: searching in which some (necessarily inexact) method is used to decide what documents best match the user's query. With text, "best match" is almost always interpreted to mean "have the most similar meaning to". With best-match searching, the query "dog spaghetti" might find documents that don't use both the word "dog" and the word "spaghetti", but which the IR system nonetheless estimates are probably discussing dogs in connection with pasta.
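
Real best-match systems use far more sophisticated methods, especially ones that try to capture meaning. As a stand-in for the "necessarily inexact method", this sketch scores each document by the cosine similarity between word-count vectors and ranks the results. It only compares words, not meanings. Still, unlike Boolean AND, it gives a nonzero score to a document that uses only one of the query words. The toy documents are made up for illustration.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words vector: lowercase word -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity of two count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_match(query, docs):
    """Rank documents by estimated similarity to the query, best first."""
    q = vectorize(query)
    scores = [(doc_id, cosine(q, vectorize(text))) for doc_id, text in docs.items()]
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))

docs = {
    "d1": "my dog likes spaghetti",
    "d2": "spaghetti with tomato sauce",
    "d3": "a history of the violin",
}
for doc_id, score in best_match("dog spaghetti", docs):
    print(doc_id, round(score, 3))
```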

- Bibliographic information: for music, information about it (composer, title, performers, date of composition, medium, etc. etc.) rather than the content of the music itself.

- Bibliographic searching (also called metadata-based searching): searching on information about the music--composer, title, performers, date of composition, medium, etc. etc.--rather than the music itself.

- *Bitrate (sometimes written bit rate, or data rate): the number of bits transmitted or processed per unit time. In communication systems, bit rate is often used as a synonym for terms like connection speed or transfer rate. In digital multimedia, bitrate is the number of bits used per unit of time to represent audio or video after data compression. For example, MP3 files typically have bitrates between 128 and 192 kbps (kilobits per second).
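
Some arithmetic shows what these numbers mean. This minimal sketch compares uncompressed CD audio with a 128 kbps MP3. It assumes CD audio is 44,100 samples per second, 16 bits per sample, and 2 channels. It also assumes the usual convention that "kbps" means 1,000 bits per second. The function names are illustrative.

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bits per second of uncompressed (PCM) audio."""
    return sample_rate_hz * bits_per_sample * channels

def file_size_bytes(bitrate_bps: float, seconds: float) -> float:
    """Size of a stream at a given bitrate and duration (8 bits per byte)."""
    return bitrate_bps * seconds / 8

cd = pcm_bitrate(44_100, 16, 2)      # 1,411,200 bps
mp3 = 128_000                        # 128 kbps
print(cd, round(cd / mp3, 2))        # the CD stream uses about 11 times as many bits
print(file_size_bytes(cd, 60), file_size_bytes(mp3, 60))   # one minute of each, in bytes
```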

- CMN: Conventional Music Notation; the form of notation developed in the Western art-music tradition since around 1700 and now in common use, with some variation, for many kinds of music. Also called "common practice notation" (CPN), "traditional music notation" (TMN), etc. Less ethnocentric terms include the word "Western", e.g., "conventional Western music notation" (CWMN).

- Classifier

- Collection: a group of documents to be used, often for searching.

- Codec: "compressor/decompressor": hardware or software to convert uncompressed audio, video, etc., to and from a compressed form.

- Computer music: in general, applications of computers for creative musical purposes, especially sound synthesis, and to a lesser extent as aids to musical composition.

- Content-based searching: for music, searching for some or all of the music itself, whether melody, rhythm, harmony, etc., alone, or all aspects at once. Differentiated from metadata-based searching.

- Critical band (psychoacoustics): a range of frequencies within which one tone can mask another and make the second tone inaudible, if the first tone is loud enough.

- DAC: digital-to-analog converter.

- Data compression: the two forms are lossy and lossless. Lossy compression is common for audio (examples include MP3, AAC, etc.). Lossless compression (sometimes called packing) is common for text, music notation files, etc., but relatively rare for audio because standard methods (.zip and so on) save very little space, and even audio-specific techniques don't save that much.
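
This sketch illustrates the lossless case with Python's zlib module, which implements DEFLATE, the method used in .zip files. Decompression returns exactly the original bytes. Highly redundant text shrinks dramatically. Random bytes barely shrink at all. Random bytes are an extreme stand-in for data with little redundancy, not real audio, and actual recordings fall somewhere in between.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size / original size using zlib (DEFLATE), a lossless method."""
    packed = zlib.compress(data, 9)
    # Lossless: the original bytes come back exactly.
    assert zlib.decompress(packed) == data
    return len(packed) / len(data)

text = ("the quick brown fox jumps over the lazy dog. " * 200).encode("ascii")
noise = os.urandom(len(text))  # extreme case: no redundancy to exploit

print(f"repetitive text: {compression_ratio(text):.3f}")
print(f"random bytes:    {compression_ratio(noise):.3f}")
```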

- Database: (1) A collection that has been prepared for use in IR, usually by adding an index to speed up searching. (2) In less accurate but common usage, a synonym for "collection".

- Declarative representation: a representation in which the information is represented explicitly; opposed to procedural representation. (Cf. the definitions of descriptive and prescriptive notation.)

- Descriptive notation: said of a notation of music, or an aspect of notation, that describes the sounds a performer or ensemble produced in a specific performance. Opposed to prescriptive notation.

- Domains of musical information (from SMDL)

- Encoding: Precisely how the information is expressed in computer memory, in terms of bits. An encoding is much like a representation, except that it's more concrete.

- Envelope: the amplitude structure of a sound, so called because the sound just "fits" inside its envelope: this is especially clear from a zoomed-out time-domain display of the waveform. The envelopes of sounds of real acoustic instruments are complex, but they can be approximated well with piecewise-linear functions. A type of piecewise-linear envelope that has long been important in electronic music synthesizers is the "ADSR" (attack - decay - sustain - release).
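
Here is a minimal piecewise-linear ADSR envelope, assuming all times are in seconds and amplitude runs from 0 to 1. The parameter names are illustrative, not taken from any particular synthesizer. One design choice is labeled here as an assumption: if the note is released before reaching the sustain level, the release ramps down from whatever level it had reached.

```python
def _pre_release_level(t, attack, decay, sustain_level):
    """Level while the note is held: linear attack to 1.0, then linear decay to sustain."""
    if t < attack:
        return t / attack                                          # attack: 0 -> 1
    if t < attack + decay:
        return 1.0 - (1.0 - sustain_level) * (t - attack) / decay  # decay: 1 -> sustain
    return sustain_level                                           # sustain: hold

def adsr(t, attack, decay, sustain_level, release, note_off):
    """Piecewise-linear ADSR amplitude at time t.
    Assumes attack, decay, and release are all > 0; note_off is when the key is released."""
    if t < 0:
        return 0.0
    if t < note_off:
        return _pre_release_level(t, attack, decay, sustain_level)
    start = _pre_release_level(note_off, attack, decay, sustain_level)
    if t < note_off + release:
        return start * (1.0 - (t - note_off) / release)            # release: level -> 0
    return 0.0

# Attack 0.1 s, decay 0.2 s to a sustain level of 0.5, key released at 1.0 s, release 0.3 s.
for t in (0.05, 0.1, 0.2, 0.5, 1.15, 2.0):
    print(t, round(adsr(t, 0.1, 0.2, 0.5, 0.3, 1.0), 3))
```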

- Evaluation

- Event (computer technology)

- Exact-match searching (also called Boolean searching): searching in which query terms are combined with the connectives "AND", "OR", and "NOT"; the documents retrieved are those that literally satisfy the condition. With text, the query "dog OR spaghetti" would find a large number of documents, namely all that use the word "dog" plus all that use "spaghetti"; "dog AND spaghetti" would find a much smaller number of documents, namely all that use both words.
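
The Boolean connectives map directly onto set operations. In this sketch, OR is union, AND is intersection, and NOT is difference from the whole collection. Documents count as matches only if they literally contain the word. The tiny collection is invented for illustration.

```python
docs = {
    "d1": "my dog likes spaghetti",
    "d2": "spaghetti with tomato sauce",
    "d3": "walking the dog",
    "d4": "a history of the violin",
}

def containing(word):
    """IDs of documents that literally contain `word` (exact match)."""
    return {doc_id for doc_id, text in docs.items() if word in text.lower().split()}

dog, spaghetti = containing("dog"), containing("spaghetti")
print(sorted(dog | spaghetti))   # dog OR spaghetti  -> ['d1', 'd2', 'd3']
print(sorted(dog & spaghetti))   # dog AND spaghetti -> ['d1']
print(sorted(set(docs) - dog))   # NOT dog           -> ['d2', 'd4']
```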

- Explicit structure: structure that is definite and "expressed without vagueness, implication, or ambiguity; leaving no question as to meaning or intent" (from Merriam-Webster Online).

- Expressive completeness: how much of all possible music a music representation could express.

- Fair Use: under U.S. copyright law, use of copyrighted material that is allowed without the permission of the rights holder(s). Four factors determine whether a given use qualifies as Fair Use: (1) the purpose and character of the use; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the whole work; and (4) the effect of the use on the potential market for or value of the work.

- Frequency: for a periodic signal, the rate at which it repeats, usually given in Hertz (cycles per second).

- Frequency domain

- Fundamental

- *Groundtruth (also written "ground truth"): the standard by which the results of a test of any system or technique are judged. If a music-IR system decides that certain pieces of music in a collection match a given query and others don't, its accuracy can only be judged by comparing those decisions to groundtruth. An equivalent term used in some fields, e.g., medicine, is "gold standard".

- Harmonic (acoustics): a pure, sine-wave component of a note or musical sound. A harmonic has a frequency that's an integer multiple of the fundamental; cf. partial.

- Header (computer technology)

- Hierarchy

- Indexing: a method of searching that relies on first constructing an index of the collection of documents that is much like the index of a book. Searching is then done simply by looking in the index, which is usually vastly more efficient than sequential searching. Indexing is very common in text IR; it is much less common in music IR. One reason is undoubtedly that indexing is difficult for anything but monophonic music: it is not yet completely clear how it's even possible to construct a useful index of polyphonic music.
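
The common text-IR form of an index is an "inverted index". It maps each word to the documents that contain it, much like a book index maps terms to pages. This sketch assumes whitespace-separated words and uses invented documents. The index is built once. After that, each query word costs one dictionary lookup instead of a pass over every document.

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to the set of IDs of documents containing it (an inverted index)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)

docs = {
    "d1": "my dog likes spaghetti",
    "d2": "spaghetti with tomato sauce",
    "d3": "walking the dog",
}
index = build_index(docs)                    # built once, ahead of time
print(sorted(index.get("dog", set())))       # ['d1', 'd3']
print(sorted(index.get("violin", set())))    # []
```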

- Information need: the information a user wants or needs. To convey this to an IR system of whatever kind, it must be expressed concretely as a query, but the information need itself is abstract.

- IPR: Intellectual Property Rights

- IR: information retrieval.

- Known-item search: a search in which the user's information need is a specific document--a book, article, piece of music, recording, etc.--whose existence is known to them.

- *Library (computer technology): a named "package" that adds specific capabilities to a programming system or language. In R, the tuneR library adds functions for handling audio.

- Markov model (also Markov chain, Markov process)

- Markup language

- Matching: in information retrieval, the process of finding items of whatever sort -- complete text documents, recordings, scores, etc., or passages within them -- that seem to match a user's query.

- Melodic confound: an element that is believed to complicate perception of melodies because it can easily be heard in different ways or missed completely. Examples include rests, repeated notes, grace notes, and perhaps ornaments like trills.

- Metadata: "data about data", that is, information about something rather than the thing itself (or an excerpt from it). Metadata encompasses not only what is described by the standard library term bibliographic information, but additional information about the structure of the content.

- MIDI: Musical Instrument Digital Interface: the standard protocol for transmitting musical events from one synthesizer or computer to another.
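
As a concrete example of a MIDI event, this sketch encodes Note On and Note Off messages as raw bytes. A status byte (0x90 or 0x80, combined with the channel number 0-15) is followed by two data bytes, the note number and the velocity, each 0-127. Users usually see channels numbered 1-16, so channel 1 is 0 in the byte. The Note Off velocity default of 64 is an assumption of this sketch.

```python
def _check(channel: int, note: int, velocity: int) -> None:
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15; note and velocity must be 0-127")

def note_on(channel: int, note: int, velocity: int) -> bytes:
    """MIDI Note On: status byte 0x90 | channel, then note number and velocity."""
    _check(channel, note, velocity)
    return bytes([0x90 | channel, note, velocity])

def note_off(channel: int, note: int, velocity: int = 64) -> bytes:
    """MIDI Note Off: status byte 0x80 | channel, then note number and release velocity."""
    _check(channel, note, velocity)
    return bytes([0x80 | channel, note, velocity])

print(note_on(0, 60, 100).hex())   # 903c64  (middle C, channel 1)
print(note_off(0, 60).hex())       # 803c40
```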

- MIR: either Multimedia Information Retrieval, or Music Information Retrieval.

- MIREX: the Music Information Retrieval Evaluation eXchange, an annual evaluation of MIR systems. In some sense, MIREX is the TREC of the MIR world. MIREX provides a set of standard evaluations for MIR researchers to benchmark their systems, with the hope that with these evaluations we will see improvement from year to year in these systems, just as standard evaluations have helped improve the state of the art in information retrieval and speech recognition. [Paul Lamere]

- Musical idea: something like a theme or part of a theme, or perhaps a distinctive rhythm pattern, or (especially in electronic music) a timbre. Arguably, every decent piece of music is based on and expresses musical ideas, just as an essay or article is based on and expresses ideas of a different kind. Musical ideas can be thought of as the "concepts" underlying information needs in music.

- Notation

- N-gram: a sequence of consecutive symbols (in text, normally characters; in music, perhaps pitch intervals), where n is a small positive integer, typically from about 3 to 6. For example, the word "Supremes" contains the 3-grams "Sup", "upr", "pre", "rem", "eme", "mes".
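
One short function extracts n-grams from a string or from a melody. For music, this sketch first converts MIDI note numbers to pitch intervals in semitones. As a result, a melody and its transpositions produce the same n-grams. The melody is the opening of "Frère Jacques" in C, with middle C = 60.

```python
def ngrams(seq, n):
    """All runs of n consecutive items; works on strings, lists, or tuples."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

def intervals(pitches):
    """Pitch intervals in semitones between successive MIDI note numbers."""
    return tuple(b - a for a, b in zip(pitches, pitches[1:]))

print(ngrams("Supremes", 3))
# ['Sup', 'upr', 'pre', 'rem', 'eme', 'mes']

frere_jacques = [60, 62, 64, 60, 60, 62, 64, 60]
print(ngrams(intervals(frere_jacques), 3))
# [(2, 2, -4), (2, -4, 0), (-4, 0, 2), (0, 2, 2), (2, 2, -4)]
```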

- OPAC: "Online Public Access Catalog", i.e., in libraries, a system like IUCAT.

- OMR: Optical Music Recognition: converting a scanned-in image of music notation to a symbolic form so it can be manipulated, e.g., by a score-editing program or MIDI sequencer.

- Parameter: for our purposes, synonymous with argument.

- Partial: a pure, sine-wave component of a note or musical sound. The frequency of a partial may or may not be an integer multiple of the fundamental; cf. harmonic.
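
This sketch illustrates the difference between harmonics and partials. It lists the harmonics of a fundamental and tests whether a measured partial is (close to) one of them. The tolerance of 0.01 in the frequency ratio is an arbitrary assumption. Real instruments, such as pianos or bells, can have partials that are noticeably off the integer multiples.

```python
def harmonics(fundamental_hz: float, count: int) -> list:
    """The first `count` harmonics: integer multiples 1, 2, 3, ... of the fundamental."""
    return [k * fundamental_hz for k in range(1, count + 1)]

def is_harmonic(partial_hz: float, fundamental_hz: float, tolerance: float = 0.01) -> bool:
    """True if the partial's frequency is (nearly) an integer multiple >= 1 of the fundamental."""
    ratio = partial_hz / fundamental_hz
    k = round(ratio)
    return k >= 1 and abs(ratio - k) <= tolerance

print(harmonics(110, 4))        # [110, 220, 330, 440]
print(is_harmonic(330, 110))    # True: 3 x the fundamental
print(is_harmonic(345, 110))    # False: a partial, but not a harmonic
```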

- Passage-level retrieval: retrieving sections of documents rather than entire documents.

- Patch: in event-based systems such as MIDI and most synthesizers (particularly hardware synthesizers), a setting that produces a specific timbre, perhaps with additional features. The terms "voice", "timbre", and "program" are all used for the same concept; all have the potential to cause substantial confusion and should be avoided as much as possible.

- PCM: Pulse Code Modulation. A ridiculously complicated term (borrowed from communication theory, where it makes sense) for sampled audio encoded in the most straightforward way, with linear sample values and no compression.
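
"The most straightforward way" looks like this in practice. Each sample of the waveform becomes a signed integer proportional to its value, with no compression. This minimal sketch assumes 16-bit samples, a 44,100 Hz sampling rate, one channel, and little-endian byte order, the layout used in WAV files. The function name is illustrative.

```python
import math
import struct

def sine_pcm16(freq_hz: float, seconds: float, sample_rate: int = 44100,
               amplitude: float = 0.5) -> bytes:
    """Mono 16-bit linear PCM of a sine wave, as little-endian signed integers."""
    n = int(seconds * sample_rate)
    samples = [round(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
               for i in range(n)]
    return struct.pack(f"<{n}h", *samples)   # 'h' = signed 16-bit, '<' = little-endian

data = sine_pcm16(440, 0.01)
print(len(data))                                   # 441 samples x 2 bytes each
print(struct.unpack("<5h", data[:10]))             # the first five sample values
```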

- Perceptual coding: a technique in which information that is assumed, based on principles of psychoacoustics, to be inaudible or barely audible is removed from a recording; the basis of lossy compression. Examples include the MP3, WMA, and AAC file formats.

- Phase (acoustics)

- Periodic: said of a signal that repeats, i.e., it "does the same thing" over and over.

- Polyphony: as used in music information technology, any texture other than strict monophony, i.e., with more than one note sounding at a time.

- Precision: the number of relevant documents retrieved by an IR system, divided by the total number of documents retrieved. The higher the better; 1.0 is a perfect score. A related concept is that of "false positives": every retrieved document that is not relevant is a false positive.

- Prescriptive notation: said of a notation of music, or an aspect of notation, that describes the actions the performer should take rather than the sounds they are to produce. As ordinarily used, both CMN and tablature are entirely prescriptive. Opposed to descriptive notation.

- Procedural representation: a representation in which the information is represented implicitly; opposed to declarative representation. (Cf. the definitions of descriptive and prescriptive notation.)
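
This small example shows the same information represented both ways: a C major scale as MIDI note numbers, with middle C = 60. In the declarative version the pitches are simply stated. In the procedural version they are implicit in a rule (the major-scale step pattern) and appear only when the procedure is run. Both versions are illustrative, not a standard representation.

```python
# Declarative: every pitch is stated explicitly.
C_MAJOR_DECLARATIVE = [60, 62, 64, 65, 67, 69, 71, 72]

# Procedural: the pitches are implicit in a rule for generating them.
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]  # whole and half steps, in semitones

def major_scale(tonic: int) -> list:
    """Generate a one-octave major scale upward from `tonic`."""
    pitches = [tonic]
    for step in MAJOR_STEPS:
        pitches.append(pitches[-1] + step)
    return pitches

print(C_MAJOR_DECLARATIVE[4])   # read the fifth degree directly: 67
print(major_scale(60)[4])       # must run the procedure to find it: 67
```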

- Psychoacoustics: the study of how sound is perceived, a matter of psychology and, to a lesser extent, music theory; differentiated from acoustics.

- Public domain: in law, said of works that are not protected by copyright.

- Quantization: approximating something with one of a set of discrete values; for example, rounding numbers to the nearest whole number, or the nearest multiple of 1000. In audio, the term is usually applied to amplitude (e.g., quantizing samples to 16 bits, i.e., to one of a set of 2^16 = 65,536 values). With music in symbolic form, it is usually applied to timing (e.g., quantizing note attack times to the nearest 16th note).
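
In this minimal sketch, a single rounding function covers both the numeric example and rhythmic quantization. For timing, it assumes onset times measured in beats, with a quarter note = 1 beat, so a 16th-note grid has a spacing of 0.25. Python's round() resolves exact ties to the even grid point. Real sequencers may handle ties, swing, and partial-strength quantization differently.

```python
def quantize(x: float, step: float) -> float:
    """Replace x with the nearest multiple of `step`."""
    return round(x / step) * step

print(quantize(1234.5, 1000))   # nearest multiple of 1000 -> 1000

# Attack times in beats (quarter note = 1 beat), snapped to a 16th-note grid.
played = [0.02, 0.27, 0.49, 1.1, 1.2]
print([quantize(t, 0.25) for t in played])   # [0.0, 0.25, 0.5, 1.0, 1.25]
```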

- Quantization error: the amount of error produced by quantization. In general, the larger the number of values in the set used in quantizing, the smaller the quantization error. With audio, quantization error in sample values produces background noise similar to white noise.
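
This sketch shows that more values in the set means smaller error. It quantizes a test sine wave at several bit depths and measures the error. Assumptions: samples lie in the range -1 to 1, and each value is rounded to a grid with spacing 2 / 2**bits, roughly the spacing a converter with that many bits uses. The maximum error is then at most half a step. Each added bit halves the step and so roughly halves the error.

```python
import math

def quantize_sample(x: float, bits: int) -> float:
    """Round x (in -1..1) to a grid with spacing 2 / 2**bits."""
    step = 2.0 / (2 ** bits)
    return round(x / step) * step

def quantization_errors(bits: int, n: int = 10000):
    """Max and RMS quantization error over one test sine wave."""
    samples = [0.9 * math.sin(2 * math.pi * 3 * i / n) for i in range(n)]
    errs = [quantize_sample(s, bits) - s for s in samples]
    max_err = max(abs(e) for e in errs)
    rms = math.sqrt(sum(e * e for e in errs) / n)
    return max_err, rms

for bits in (4, 8, 12, 16):
    max_err, rms = quantization_errors(bits)
    print(f"{bits:2d} bits: max error {max_err:.2e}, RMS error {rms:.2e}")
```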

- Query: the concrete expression, in a form suitable for a specific information-retrieval system, of an information need.

- Ranking: arranging retrieved documents in order of how relevant an IR system estimates they are to the query.

- Recall: the number of relevant documents retrieved by an IR system, divided by the total number of relevant documents. The higher the better; 1.0 is a perfect score. A related concept is that of "false negatives": every relevant document that is not retrieved is a false negative.
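
Precision and recall fall out of two sets: what was retrieved and what is relevant, where the relevant set comes from relevance judgments, i.e., groundtruth. This sketch also lists the false positives and false negatives. Returning 0.0 when a set is empty is an assumed convention. The document IDs are made up.

```python
def precision_recall(retrieved: set, relevant: set):
    """Precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = {"a", "b", "c", "d"}
relevant = {"a", "c", "e"}
p, r = precision_recall(retrieved, relevant)
print(p, round(r, 3))                    # 0.5 0.667
print(sorted(retrieved - relevant))      # false positives: ['b', 'd']
print(sorted(relevant - retrieved))      # false negatives: ['e']
```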

- Relevance: of a document (or a passage in a document), the property of being helpful to the user in satisfying their information need. Relevant documents are the ones that make the user happy, while irrelevant ones don't :-) . This is a loose definition of relevance; some would say relevance is about satisfying queries, not information needs.

- Relevance feedback: a feature of an information-retrieval system whereby a user, after viewing the results of a search, gives the system feedback on which of the retrieved documents are in fact of interest; armed with this information to supplement the original query, the retrieval system then does a new search. Web search systems often have a simple version labeled "More like this" or some such.
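
One classic way to use the feedback is Rocchio's method, a technique not described above and shown here only as an example. It moves the query's term weights toward the documents the user marked relevant and away from those marked not relevant. The coefficients alpha = 1.0, beta = 0.75, and gamma = 0.15 are commonly cited defaults, assumed here. Real systems tune them. Dropping terms whose weight becomes non-positive is also an assumption.

```python
from collections import Counter

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style query modification on term-weight dicts:
    alpha*query + beta*mean(relevant docs) - gamma*mean(nonrelevant docs).
    Terms whose weight ends up <= 0 are dropped."""
    new = Counter()
    for term, weight in query.items():
        new[term] += alpha * weight
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                new[term] += coeff * weight / len(docs)
    return {term: w for term, w in new.items() if w > 0}

query = {"dog": 1, "spaghetti": 1}
marked_relevant = [{"spaghetti": 1, "pasta": 1}]
marked_nonrelevant = [{"dog": 1, "leash": 1}]
print(rocchio(query, marked_relevant, marked_nonrelevant))
# "pasta" is added to the query; "leash" is not
```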

- Relevance judgment: a judgment as to whether a certain document is or is not relevant to a given information need; relevance judgments are normally made by a human being, ideally the one who made up the information need. Evaluating the precision and/or recall of any IR system requires relevance judgments of some sort. Relevance judgments are a form of groundtruth.

- Representation: what information is conveyed, regardless of how it's done. A representation is much like an encoding, but it's more abstract.

- Salience: the extent to which something gets our attention, for whatever reason; how significant something is in terms of perception.

- Sample depth: in digitizing an analog signal, the number of bits per sample per channel. 16 bits is most common, e.g., for CDs. Also called bit depth, sample width, or bit width.

- Sampling rate: in digitizing an analog signal, the number of samples taken per unit time (normally a second). CDs have a sampling rate of 44,100 samples per second.
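
Sample depth and sampling rate translate into concrete numbers, shown here with the CD figures given above (16 bits, 44,100 samples per second). The helper names are illustrative.

```python
def quantization_levels(bits: int) -> int:
    """Number of distinct sample values at a given sample depth."""
    return 2 ** bits

def sample_count(seconds: float, sample_rate: int = 44100) -> int:
    """Samples per channel for a given duration."""
    return round(seconds * sample_rate)

def sample_period_us(sample_rate: int = 44100) -> float:
    """Time between successive samples, in microseconds."""
    return 1_000_000 / sample_rate

print(quantization_levels(16))          # 65536
print(sample_count(180))                # 7938000 samples per channel in 3 minutes
print(round(sample_period_us(), 2))     # 22.68
```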

- Scale

- Schema: a formal definition of what constitutes a valid document. Examples include XML DTDs and schemas and BNF (Backus-Naur Form) descriptions.

- Segmentation: the process of dividing a document into chunks that are useful for evaluating relevance; the chunks may be natural units--in text, words are the usual segments--or they may be arbitrary sequences of symbols, e.g., n-grams.

- Sequencer: a program for recording, editing and playing back music primarily in MIDI form. Originally sequencers could handle only MIDI (stored in SMFs); however, as both personal-computer hardware and software-development tools have gotten more powerful, more and more "digital audio sequencers", which integrate audio and MIDI (stored in hybrid encodings), have become available.

- Sequential searching: looking for matches by exhaustively scanning the collection of documents. Opposed to indexing and related techniques like signatures, which--when they can be applied--are usually enormously more efficient.
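
This sketch shows sequential searching for a melodic pattern. It scans every melody in a small collection, comparing pitch intervals rather than absolute pitches so that transposed occurrences also match. The melodies are MIDI note numbers made up for illustration, apart from the opening of "Frère Jacques". The cost grows with the total size of the collection, which is why indexing is attractive when it can be applied.

```python
def intervals(pitches):
    """Pitch intervals in semitones between successive MIDI note numbers."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def sequential_search(query_pitches, collection):
    """Scan every melody; return (melody_id, start_note) wherever the query's
    interval pattern occurs, i.e., transposition-invariant exact matching."""
    if len(query_pitches) < 2:
        raise ValueError("query needs at least two notes")
    q = intervals(query_pitches)
    hits = []
    for melody_id, pitches in collection.items():
        iv = intervals(pitches)
        for start in range(len(iv) - len(q) + 1):
            if iv[start:start + len(q)] == q:
                hits.append((melody_id, start))
    return hits

collection = {
    "frere_jacques": [60, 62, 64, 60, 60, 62, 64, 60],
    "scale": [60, 62, 64, 65, 67],
}
print(sequential_search([67, 69, 71], collection))   # G A B: two rising whole steps
```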

- Signal

- SMF: Standard MIDI File.

- Sonogram: a time/frequency-domain display of a signal; as used in Max/MSP, a particular representation with color representing amplitude at a given frequency.

- Spectrogram: a time/frequency-domain display of a signal; as used in Max/MSP, really a real-time display of the spectrum.
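
The numbers behind a time/frequency-domain display can be computed by splitting the signal into short frames and taking the magnitude spectrum of each one. This sketch uses a naive discrete Fourier transform for clarity. Real implementations normally use an FFT and apply a window function to each frame, and Max/MSP's displays add color, scaling, and real-time updating not shown here. The 8,000 Hz sampling rate and 64-sample frames are assumptions chosen so that 1,000 and 2,000 Hz fall exactly on DFT bins.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N/2 of one frame (naive O(N^2) DFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def spectrogram(signal, frame_size, hop):
    """One row of bin magnitudes per frame: amplitude by frequency, over time."""
    return [dft_magnitudes(signal[start:start + frame_size])
            for start in range(0, len(signal) - frame_size + 1, hop)]

SR = 8000   # assumed sampling rate (Hz)
N = 64      # frame size: bin k corresponds to k * SR / N = k * 125 Hz
# 128 samples at 1000 Hz followed by 128 samples at 2000 Hz
signal = [math.sin(2 * math.pi * (1000 if i < 128 else 2000) * i / SR) for i in range(256)]
frames = spectrogram(signal, N, hop=64)
peaks = [max(range(len(f)), key=f.__getitem__) for f in frames]
print([p * SR / N for p in peaks])   # strongest frequency in each frame: [1000.0, 1000.0, 2000.0, 2000.0]
```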

- Spectrum

- Streaming: the psychoacoustic effect in which a series of individual events (for music, a monophonic series of notes) is heard as divided into two or more simultaneous "lines".

- Structural generality: how much of the structure in any piece of music a music representation can express.

- Symbolic representation: a representation in which some structure is explicit, i.e., for music, either time-stamped events or notation. However, note that the adjective "symbolic" is sometimes reserved for notation only.

- System (as used in CMN): a set of staves containing music to be played simultaneously; this is indicated graphically at the left end of the staves with a vertical line extending from the top of the top staff to the bottom of the bottom staff.

- Time domain

- Time/frequency domain

- Time-stamped event: a musical event, i.e., something directly audible (most likely a note onset or ending), coupled with the time at which that event happens. MIDI, SMFs, and Csound score files all describe events; only the latter two include time stamps.
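
This sketch turns notes given as (onset, duration, pitch) into time-stamped note-on and note-off events. It then re-expresses those time stamps as delta times, i.e., the time since the previous event, which is how SMFs store event times. Times are in arbitrary ticks. Putting note-offs before note-ons at the same tick is a design choice assumed here, so that a repeated pitch is released before it restarts.

```python
def to_events(notes):
    """notes: (onset, duration, pitch) tuples. Returns time-stamped events sorted by time."""
    events = []
    for onset, duration, pitch in notes:
        events.append((onset, "on", pitch))
        events.append((onset + duration, "off", pitch))
    # At equal times, note-offs sort first (False < True).
    events.sort(key=lambda e: (e[0], e[1] != "off"))
    return events

def to_delta_times(events):
    """Replace each absolute time stamp with the time since the previous event."""
    deltas, previous = [], 0
    for time, kind, pitch in events:
        deltas.append((time - previous, kind, pitch))
        previous = time
    return deltas

notes = [(0, 480, 60), (480, 480, 62)]
events = to_events(notes)
print(events)                   # absolute time stamps
print(to_delta_times(events))   # delta times
```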

- Transient: a short-duration signal that represents a nonharmonic attack phase of a musical sound or spoken word. It contains a high degree of nonperiodic components and a higher magnitude of high frequencies than the harmonic content of that sound. [from Wikipedia]

- TREC: Text REtrieval Conference, a series of annual meetings sponsored by the U.S. National Institute of Standards and Technology and other government agencies, some of them involved in intelligence work. The meetings are focused on analyzing the latest results of what is essentially a contest with standard IR tasks, designed to "push" the state of the art of text information retrieval. See trec.nist.gov.

- User interface

- Voice: as used in music information technology, a single, logically distinct "line" of music within a single part. In a narrow sense, a voice must be strictly monophonic, but we'll generally use it in a looser sense, allowing chords. NB: Of course, "voice" also commonly refers to human singing, but it's usually obvious when it's used in that sense. Unfortunately, articles and manuals about MIDI sometimes use the term in a way with far more potential for confusion: cf. patch.

- Waveform: a time-domain display of a signal.

- White noise

- XML


Comments to: donbyrd(at)indiana.edu
Copyright 2005-2007, Donald Byrd