Google India PhD Student, INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR
Abstract Semantic Representation of Music for Music Information Retrieval Tasks
Melody extraction is the task of automatically extracting the dominant melodic line from a polyphonic music signal. Here, polyphony refers to music in which more than one instrument may sound concurrently (e.g., piano, violin, drums, or the human singing voice), or to a single instrument capable of producing multiple notes at a given time (e.g., the violin). The word melody is a musicological term that is purely subjective in nature; hence, many definitions of melody can be found in various contexts. The melody representation adopted in my work is the one proposed by Masataka Goto: melody is the sequence of F0 (fundamental frequency, or pitch) values corresponding to the dominant instrument's perceived pitch. The dominant instrument can be either the human singing voice or any lead instrument in the polyphonic music signal. Accurate melody extraction remains a challenging and unsolved task in the research community because of its two-fold complexity. Firstly, the polyphonic music signal is a superposition of many instruments playing simultaneously, so it is hard to attribute specific frequency bands and energy levels to a particular instrument. Secondly, the sequence of pitch values that constitutes the main melody must be determined, which in turn poses three main challenges: (i) determining the melody regions in the music signal, (ii) ensuring the estimated F0 is in the correct octave range, and (iii) selecting the right melody pitch when more than one note is present at the same time. An accurately extracted melody can be used in many potential applications, such as automatic music transcription, query by humming, music de-soloing, singer identification, and many other music information retrieval tasks.
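Under Goto's representation described above, the melody is simply a per-frame F0 sequence for the dominant source. A minimal sketch of that idea, assuming a precomputed pitch-salience matrix (the names `salience`, `bin_freqs`, and the voicing threshold are illustrative assumptions, not part of any cited method):

```python
import numpy as np

def extract_melody(salience, bin_freqs, voicing_threshold=0.1):
    """Pick the most salient F0 per frame; 0.0 marks unvoiced frames.

    salience   -- (n_frames, n_bins) matrix of pitch-salience values
    bin_freqs  -- (n_bins,) frequency in Hz of each salience bin
    """
    best_bins = salience.argmax(axis=1)          # strongest bin per frame
    best_vals = salience.max(axis=1)             # its salience value
    # Frames whose best salience falls below the threshold are treated
    # as non-melody (unvoiced) regions, per challenge (i) above.
    return np.where(best_vals >= voicing_threshold,
                    bin_freqs[best_bins], 0.0)
```

This naive per-frame maximum ignores octave errors and note continuity, which is precisely why the smoothing and path-tracking methods pinned below exist.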
Abstract: A tone smoothing is performed such that a number is associated with each time section of a melody line segment: within each group of directly neighboring time sections to which the melody line segment associates the same spectral component, the numbers run from one to the number of time sections in that group. For each spectral component associated with one of the time sections of the melody line segment, the numbers of the groups to which that spectral component is associated are added up; the spectral component for which the greatest sum results is determined as the smoothing spectral component, and the melody line segment is changed by associating the determined smoothing spectral component with every time section of the segment. By this, in particular, an inadequacy of monophonic audio signals is taken into account: notes usually comprise a transient process at their beginning, so that the desired note pitch is only reached towards the end of the note.
Pub.: 04 Oct '05, Pinned: 28 Jul '17
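As I read the claim language, numbering each run of equal components 1..k and summing those numbers gives each component a score of k(k+1)/2 per run, so long stable runs outvote short transients. A sketch of that reading (the function name and list-of-components input are my own assumptions):

```python
from itertools import groupby

def smooth_segment(components):
    """Replace every frame's spectral component in a melody-line segment
    by the single component with the greatest run-weighted score.

    Each run (group of directly neighboring frames sharing one component)
    of length k contributes k*(k+1)//2 — the sum 1+2+...+k from the
    claim's numbering — so the pitch held longest wins, suppressing the
    transient at the note onset.
    """
    scores = {}
    for comp, run in groupby(components):
        k = sum(1 for _ in run)
        scores[comp] = scores.get(comp, 0) + k * (k + 1) // 2
    winner = max(scores, key=scores.get)
    return [winner] * len(components)
```

For example, a segment that starts on a transient bin and settles on the true note pitch is rewritten entirely to that pitch.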
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting, from among a collection of videos, a set of candidate videos that (i) are identified as being associated with a particular song, and (ii) are classified as a cappella video recordings; extracting, from each of the candidate videos of the set, a monophonic melody line from an audio channel of the candidate video; selecting, from among the set of candidate videos, a subset of the candidate videos based on a similarity of the monophonic melody line of the candidate videos of the subset with each other; and providing, to a recognizer that recognizes songs from sounds produced by a human voice, (i) an identifier of the particular song, and (ii) one or more of the monophonic melody lines of the candidate videos of the subset.
Pub.: 14 Apr '15, Pinned: 28 Jul '17
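The subset-selection step above — keeping the candidate videos whose monophonic melody lines agree with each other — could be sketched as below. Everything here is an illustrative assumption (equal-length F0 arrays per video, mean absolute difference as the similarity measure, a fixed subset size); the patent does not specify these details.

```python
import numpy as np

def select_consistent_melodies(melodies, keep=2):
    """From per-video monophonic melody lines (dict of equal-length F0
    arrays, an assumption for brevity), keep the `keep` videos whose
    melody is on average closest to the others."""
    ids = list(melodies)
    m = np.stack([melodies[v] for v in ids])
    # Pairwise mean absolute F0 difference between every pair of videos.
    dist = np.abs(m[:, None, :] - m[None, :, :]).mean(axis=2)
    # Average distance of each video to the other candidates.
    mean_dist = dist.sum(axis=1) / (len(ids) - 1)
    ranked = sorted(range(len(ids)), key=lambda i: mean_dist[i])
    return [ids[i] for i in ranked[:keep]]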
Abstract: An information processing apparatus is provided which includes a signal conversion unit for converting an audio signal to a pitch signal indicating a signal intensity of each pitch, a melody probability estimation unit for estimating for each frame a probability of each pitch being a melody note, based on the audio signal, and a melody line determination unit for detecting a maximum likelihood path from among paths of pitches from a start frame to an end frame of the audio signal, and for determining the maximum likelihood path as a melody line, based on the probability of each pitch being a melody note, the probability being estimated for each frame by the melody probability estimation unit.
Pub.: 31 Dec '13, Pinned: 28 Jul '17
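The maximum-likelihood path search described above is a classic dynamic-programming (Viterbi-style) problem. A sketch under stated assumptions — the abstract only specifies a maximum-likelihood path over per-frame melody probabilities, so the transition model here (a penalty proportional to the pitch-bin jump between neighboring frames) is my own illustrative choice:

```python
import numpy as np

def best_pitch_path(log_prob, jump_penalty=1.0):
    """Maximum-likelihood pitch path through an (n_frames, n_pitches)
    grid of per-frame log melody probabilities."""
    n_frames, n_pitches = log_prob.shape
    # trans[i, j]: log-score of moving from pitch bin i to bin j
    # (assumed penalty, linear in the size of the jump).
    bins = np.arange(n_pitches)
    trans = -jump_penalty * np.abs(bins[:, None] - bins[None, :])
    score = log_prob[0].copy()
    back = np.zeros((n_frames, n_pitches), dtype=int)
    for t in range(1, n_frames):
        cand = score[:, None] + trans            # (prev_bin, cur_bin)
        back[t] = cand.argmax(axis=0)            # best predecessor
        score = cand.max(axis=0) + log_prob[t]
    # Backtrack from the best final pitch bin.
    path = [int(score.argmax())]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

With the penalty at zero this degenerates to the per-frame argmax; increasing it trades per-frame likelihood for a smoother melody line.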
Abstract: Authors: Juan J. Bosch; Ricard Marxer; Emilia Gómez. Journal: Journal of New Music Research. Publication date: 2016-05-23. Article URL: http://www.tandfonline.com/doi/full/10.1080/09298215.2016.1182191?ai=z4&mi=3fqos0&af=R
Pub.: 23 May '16, Pinned: 28 Jul '17