IDEAS FOR MUSIC INFORMATICS INDEPENDENT PROJECTS - 10 Jan. 2007 (DAB = Don Byrd)

Here are some ideas for semester projects for the courses I teach. NB: I don't claim
they're all _good_ ideas, though I haven't included bad ones intentionally! Also, a
few of these are much more appropriate for one of my courses than for the other, so
I reserve the right to reject even these projects for a particular course.

I'd be happy to consider anything else you think is relevant to the course; in fact,
it's better in some ways if students make their own projects up. Please feel
free to discuss any of this with me at any time.

"*" in front of an item = brand-new or with major changes recently.
"!" indicates projects I'd especially like to see someone work on.

Requiring programming...
!- Write a simple program to compute the similarity between two melodies, rhythm
	patterns, chord progressions, or even complete pieces, represented in some
	symbolic form (such a program could be used as the basis for a music-retrieval
	program). [topic: symbolic retrieval]
- Attempt to improve NightingaleSearch's music searching and evaluate the results in
	some way, preferably by MIREX or standard TREC (Cranfield model) methods.
	[topic: symbolic retrieval]
- Build a database (e.g., from MIDI files, or from an existing collection like CCARH's)
	of at least 500 music "documents"; build a suitable set of queries; and/or
	investigate how the choice of search parameters affects the results, as
	evaluated by MIREX or TREC methods. [topic: symbolic retrieval]
- Same as above, but using one of the MIREX tasks, with M2K/D2K or any other software.
- Use M2K/D2K to investigate any music-representation or searching question.
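
For the melody-similarity project (the first item in this list), one common starting point is to compare sequences of pitch intervals, which makes the comparison independent of transposition, using an edit distance. The sketch below assumes melodies are given as monophonic lists of MIDI note numbers and ignores rhythm entirely. The function names are hypothetical, and the normalization is only one of many possible choices.

```python
def intervals(pitches):
    """Convert MIDI note numbers to successive semitone intervals (transposition-invariant)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def edit_distance(a, b):
    """Levenshtein distance with unit insert, delete, and substitute costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,             # delete x
                            curr[j - 1] + 1,         # insert y
                            prev[j - 1] + (x != y))) # substitute (free if equal)
        prev = curr
    return prev[-1]


def melodic_similarity(m1, m2):
    """Similarity in [0, 1]; 1.0 means the interval sequences are identical."""
    i1, i2 = intervals(m1), intervals(m2)
    longest = max(len(i1), len(i2))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(i1, i2) / longest


# "Twinkle, Twinkle" opening in C, and the same tune transposed to G.
twinkle_c = [60, 60, 67, 67, 69, 69, 67]
twinkle_g = [67, 67, 74, 74, 76, 76, 74]
print(melodic_similarity(twinkle_c, twinkle_g))  # 1.0
```

Extending this to rhythm, chord progressions, or polyphony (and choosing substitution costs that reflect musical rather than purely symbolic differences) is where the real project work would be.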

Could be done with or without programming...
- Extend OMRAS harmonic distributions, as used in the "OMRAS Audio-degraded Music IR
	Experiment" (cf. the ISMIR 2002 paper by Pickens et al.; reference in the list of
	publications on my website).
!- Use NightingaleSearch to study something appropriate to some collection of music,
	e.g., to confirm or refute accepted wisdom about that music. The 24 Preludes and
	24 Fugues of Bach's Well-Tempered Clavier and a relative handful of other pieces
	already exist in NightingaleSearch format; otherwise, there are utilities to
	convert music from some other forms (e.g., MuseData) to its format, though they
	may not work very well. [topic: music analysis]
!- Same as above, but using the Humdrum Toolkit; far more music is already
	available in a format Humdrum can use. [topic: music analysis]
- Investigate converting between representations of the same type (e.g., one type of
	notation to another); optionally write a program to implement your ideas. [topic:
	music representation]
- Investigate converting between representations of different types (e.g., notation to
	MIDI); optionally write a program to implement your ideas. [topic: music
	representation]
!- Most music-IR research to date has been on tonal, functional-harmony Western music;
	investigate how it could be extended to other music(s).
!- It's obvious that automated Schenkerian analysis, even going just two or three
	levels down from the surface, would be incredibly valuable for music IR; however,
	a general, style-independent solution is probably not possible. But how
	about with restrictions -- e.g., only Anglo-American folksongs or 12-bar blues? Cf.
	"controllers" in David Cope's EMI system. [topic: music analysis, music perception,
	cognitive science]
!- Adapt Steve Larson's theory of musical forces to recognize similarity between
	melodies or even complete polyphonic pieces of music. [topic: music analysis]
!- Devise a way to test Steve Larson's theory of musical forces with a database of
	melodies. [topic: music analysis]
- Investigate clustering musical documents on whatever basis; this could be very
	useful for visualization, recommender or improvisation systems, etc. Cf.
	several papers from ISMIR and elsewhere, and techniques like Kohonen maps and
	spring embedding.
!- Investigate user-interface issues in music searching, either content-based or
	bibliographic. One option would be to actually design a user interface.
- Follow up on the ISMIR 2000 Mozart Variations survey: do it more scientifically, or
	at least investigate how that could be done, preferably by designing a valid
	experiment. [topic: relevance judgments]
!- DAB's Extremes of CMN list (on my website) is interesting, but _distributions_ for
	some collection of music and one or more of the features (e.g., written pitch
	or duration, or just number of augmentation dots!) showing how often various values
	occur in a significant body of music would be much more revealing; such distributions
	could be useful in statistical authorship studies, for example. Compute distributions
	for some of the items in the list. For a music collection, you could use the CCARH
	database (http://www.ccarh.org/), with kern data (http://kern.humdrum.net/) accessed
	via the Humdrum toolkit, or with MusicXML data (available from me) accessed via a
	program of your own. In any case, the programming part of this is relatively easy.
- Work on any of the topics listed in a recent ISMIR Call for Papers
	(http://www.ismir.net/).
- Investigate methods for finding music that is unplayable. Playability is usually a
	very subjective thing, but published music that is _clearly_ unplayable exists.
	For example, a Scriabin piano sonata includes a note that's above the range of
	any piano, and I believe that in one of his symphonies Beethoven asks the
	violins to play below their lowest note.
- The "V2V" offshoot of the Variations2 project was an attempt to combine content-based
	and metadata-based searching. Investigate further how the two forms of searching
	could be combined from any standpoint: user interface, ranking, etc.
!- Investigate the "Mickey Mouse Club theme" problem: to what extent is a music-
	searching program likely to find matches in inner voices that are of little or
	no interest because they're completely inaudible? (The answer may well depend
	on whether the program knows about the voices and does not look for matches that
	cross voices: see the "disastrous loss of precision" idea below.)
!- Investigate what it would take to identify 12-bar blues in a collection of, say,
	MIDI files or Humdrum/kern files, and try out your technique.
!- Investigate the extent to which a performer's chosen medium influences their
	perception of music. For example, do tuba, bassoon, and double-bass players tend
	to hear lower lines as more salient than flutists or violinists do? How about
	basses vs. sopranos? What about salience of rhythm vs. pitch, e.g., for drummers
	vs. other musicians?
- Byrd & Crawford (2002) speculate on the disastrous loss of precision they believe
	would result from taking "matches" that cross voices as seriously as those that
	stay within a voice, without considering the audibility of the matches. Investigate
	and produce evidence one way or the other.
!- Study a widely-used existing style-genre classification, e.g., that of All Music
	Guide, iTunes, Amazon.com, etc. Describe in some detail how it could be implemented
	by computer. Optionally, implement and test part of it, probably with a
	symbolic representation (audio is probably too difficult to do anything with
	in a semester). [topic: music classification]
- Study "national" style classification from either audio or CMN. What features,
	of the kind a program could realistically identify, make music sound French,
	Slavic, American, etc.? [topic: music classification]
- Propose a new task for MIREX. Why is this task significant? How could entries
	be evaluated? [topic: evaluation]
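
For the feature-distribution project on DAB's Extremes of CMN list, the programming really can be small. The sketch below counts how many notes (and rests) carry 0, 1, 2, ... augmentation dots in Humdrum **kern data, reading the text directly rather than using the Humdrum Toolkit. It assumes the file contains only **kern spines; that in data records, tokens are tab-separated, chord notes are space-separated, a lone "." is a null token, and "." inside a note token is an augmentation dot. The function name is hypothetical, and a real study would also need to handle non-kern spines, grace notes, and similar details.

```python
from collections import Counter


def dot_distribution(kern_lines):
    """Map number-of-augmentation-dots -> count of notes/rests with that many dots."""
    counts = Counter()
    for line in kern_lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments (!), interpretations (*), and barlines (=).
        if not line or line[0] in "!*=":
            continue
        for spine_token in line.split("\t"):
            if spine_token == ".":  # null token: no new event in this spine
                continue
            for note in spine_token.split(" "):  # chord members are space-separated
                if note:
                    counts[note.count(".")] += 1
    return counts


sample = [
    "**kern\t**kern",
    "*M4/4\t*M4/4",
    "4.c\t2C",
    "8d\t.",
    "=1\t=1",
    "2e 2g\t2..G",
    "*-\t*-",
]
print(dict(dot_distribution(sample)))
```

The same loop structure works for written durations or pitches. Only the per-token extraction changes.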

Almost certainly _not_ involving programming...
- Walter Hewlett and DAB have found previously unknown instances of the famous
	"B A C H" motive in the music of Bach and Douglas Hofstadter, respectively. DAB
	used NightingaleSearch. Use any other music-searching technology to find anything
	interesting in any database (the CCARH database is a good one for this purpose).
!- Investigate a basis for ranking music documents in search results. With music as
	with text, this is normally done by similarity, and justified via "relevance".
	But are these the best concepts for ranking music?
- Extend or otherwise significantly improve DAB's table of candidate music-IR testbed
	databases (on my website).
!- Extend DAB's comparison of music to text, images, etc. by adding other media, more
	details, or both.
- Test/compare existing audio music recognition programs; compare them to optical
	music recognition programs. Cf. www.music-notation.info/en/compmus/audio2midi.html .
- Compare in detail two or more music-notation encoding systems. This could be based on
	a table Natalia Minibayeva did several years ago, or could be completely new work.
	[topic: music representation]
- Investigate and compare existing metadata formats for music, and/or design a new one.
!- Extend, improve, or just evaluate DAB and Eric Isaacson's music-representation
	requirements specification (created for Variations2).
!- Annotate or otherwise significantly improve DAB's music-IR bibliography (on my
	website). When I taught music IR in 2003, someone annotated 50 entries in terms
	of how useful they would likely be to someone in the class; more of that would be
	worthwhile, or more of the type of annotations some entries have now.
- Compare classifications of music representations, e.g., DAB's, Selfridge-Field's,
	Castan's, Wiggins'; perhaps propose a new classification. [topic: music
	representation]
- Study how MIREX works. Compare it to similar undertakings in other domains (TREC
	for text IR, the standard speech-recognition and question-answering tests, etc.).
	How could MIREX be improved? [topic: evaluation]
!- List and discuss several of what you consider the most important unsolved problems
	of music IR. (I'll be glad to tell you what I think some of these problems are,
	but you're welcome to choose your own.)
!- In Sept. 2005, a former director of engineering for All Music Guide said,
	in so many words, that programs that do automatic genre classification from
	audio are probably finding _something_, and something useful, but it may not be
	genres as people understand them. Investigate and report on the accuracy of this
	statement. [topic: music classification]
!- There is very little agreement among existing style-genre classifications: the
	number of categories varies wildly (All Music Guide has 34, iTunes 37, Amazon.com
	23, etc.)--and even those numbers overestimate the agreement, since they're not
	all "flat" lists, and some confuse styles and forms. Compare at least three
	existing classifications, and comment on which seems most practical for computer
	implementation and why. [topic: music classification]
!- Study existing sets of relevance judgments for music and/or create a new set.
	[topic: relevance judgments]
!- Find a small number (at least five or six) of pieces of music, each of which is,
	in your opinion, as different as possible from all the others. Once you've
	chosen the pieces, either justify or refute your claim that each is as different
	as possible from the others on a basis that's as objective as possible, most
	likely a survey of listeners. Better, use such a measure to find the pieces in the
	first place. In either case, "as objective as possible" is not likely to be very
	objective: discuss the inherent limitations of objectivity here.
	[topic: music classification]
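
As a point of comparison for the "B A C H" project in this list, here is a minimal sketch of a transposition-invariant search for the motive (B-flat, A, C, B-natural) in a single melodic line. It assumes each voice is available separately as a list of MIDI note numbers and looks only for exact interval matches. It is not how NightingaleSearch or any other existing tool works, and the names are hypothetical. Because it searches one voice at a time, it also sidesteps the cross-voice matches discussed in the "Mickey Mouse Club theme" item.

```python
# B-flat4, A4, C5, B-natural4 as MIDI note numbers.
BACH = [70, 69, 72, 71]


def intervals(pitches):
    """Successive semitone intervals between MIDI note numbers."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def find_motive(melody, motive=BACH):
    """Return note indices where the motive starts, at any transposition."""
    target = intervals(motive)
    ivs = intervals(melody)
    n = len(target)
    return [i for i in range(len(ivs) - n + 1) if ivs[i:i + n] == target]


# F, E, G, F-sharp (starting at index 1) is BACH transposed down a fourth.
print(find_motive([60, 65, 64, 67, 66, 62]))  # [1]
```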