License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/DFU.Vol3.11041.37
URN: urn:nbn:de:0030-drops-34652

Essid, Slim ; Richard, Gaël

Fusion of Multimodal Information in Music Content Analysis


Music is often processed through its acoustic realization. This is restrictive in the sense that music is clearly a highly multimodal concept, where various types of heterogeneous information can be associated with a given piece of music (a musical score, musicians' gestures, lyrics, user-generated metadata, etc.). This has recently led researchers to approach music through its various facets, giving rise to "multimodal music analysis" studies. This article gives a synthetic overview of methods that have been successfully employed in multimodal signal analysis. In particular, their use in music content processing is discussed in more detail through five case studies that highlight different multimodal integration techniques. The case studies include an example of cross-modal correlation for music video analysis, an audiovisual drum transcription system, a description of the concept of informed source separation, a discussion of multimodal dance-scene analysis, and an example of user-interactive music analysis. In light of these case studies, some perspectives on multimodality in music processing are finally suggested.

BibTeX - Entry

  author =	{Slim Essid and Ga{\"e}l Richard},
  title =	{{Fusion of Multimodal Information in Music Content Analysis}},
  booktitle =	{Multimodal Music Processing},
  pages =	{37--52},
  series =	{Dagstuhl Follow-Ups},
  ISBN =	{978-3-939897-37-8},
  ISSN =	{1868-8977},
  year =	{2012},
  volume =	{3},
  editor =	{Meinard M{\"u}ller and Masataka Goto and Markus Schedl},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{},
  URN =		{urn:nbn:de:0030-drops-34652},
  doi =		{10.4230/DFU.Vol3.11041.37},
  annote =	{Keywords: Multimodal music processing, music signals indexing and transcription, information fusion, audio, video}

Keywords: Multimodal music processing, music signals indexing and transcription, information fusion, audio, video
Collection: Multimodal Music Processing
Issue Date: 2012
Date of publication: 27.04.2012
