Automated Music Track Extraction and Identification from Full-Length Movies (soundtrackID)
We are developing an end-to-end system for the automatic detection, segmentation, and identification of music in full-length movies. Audio is analyzed in one-second windows using pretrained models, merged into candidate tracks, reviewed, and submitted to external APIs; identification accuracy is then evaluated based on the API results.
Description
This research project aims to automatically extract, segment, and identify music tracks embedded in full-length movies. The system is designed as an end-to-end pipeline with a simple web-based user interface, focusing on practical feasibility and evaluation of existing music detection and identification models.
Users interact with the system through a Web UI that allows uploading a movie file. Upon upload, the audio track is automatically extracted from the video and processed independently. The audio is then segmented into fixed-length windows of one second to enable fine-grained temporal analysis.
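The windowing step described above can be sketched as follows. This is an illustrative helper, not the project's actual code; the function name and the decision to drop a trailing sub-second remainder are assumptions.

```python
def one_second_windows(samples, sample_rate):
    """Split a 1-D array of mono audio samples into consecutive
    one-second windows of exactly `sample_rate` samples each.

    A trailing remainder shorter than one second is dropped here,
    since the downstream classifier expects fixed-length inputs.
    """
    n_windows = len(samples) // sample_rate
    return [
        samples[i * sample_rate:(i + 1) * sample_rate]
        for i in range(n_windows)
    ]

# Example with a toy sample rate of 4 Hz: 14 samples = 3.5 seconds,
# which yields 3 complete one-second windows.
windows = one_second_windows(list(range(14)), 4)
```

In practice the samples would come from the audio track extracted from the uploaded movie (e.g. via a tool such as ffmpeg), resampled to the rate the detection model expects.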
Each one-second audio segment is classified in a binary fashion as either music or non-music. For this task, state-of-the-art pretrained models such as MTUCI/MusicDetection from Hugging Face are used. These models output a probability score for the presence of music, enabling robust detection across different genres and sound environments. A configurable threshold allows users to manually adjust the sensitivity of the music detection, supporting experimentation with precision–recall tradeoffs.
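The configurable threshold can be illustrated with a minimal sketch. The function below assumes the model has already produced one music probability per one-second window (as a score in [0, 1]); the function name and default threshold are illustrative, not part of the project's codebase.

```python
def label_windows(music_probs, threshold=0.5):
    """Binarize per-window music probabilities: True = music.

    Raising the threshold trades recall for precision; lowering it
    does the opposite.
    """
    return [p >= threshold for p in music_probs]

# Five one-second windows with hypothetical model scores.
probs = [0.1, 0.8, 0.9, 0.4, 0.95]
labels_default = label_windows(probs)        # default threshold 0.5
labels_strict = label_windows(probs, 0.9)    # stricter: fewer false positives
```

Exposing `threshold` in the UI is what enables the precision-recall experimentation mentioned above.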
Consecutive segments classified as music are merged into continuous regions, which are treated as candidate music tracks. Each detected track is then extracted as a standalone audio clip. These clips are made individually available in the UI for playback and inspection, allowing users to quickly verify detection quality.
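The merging of consecutive music windows into candidate tracks amounts to run-length grouping over the binary labels. A sketch, with an assumed (optional) minimum-length filter to discard very short runs:

```python
def merge_music_regions(labels, min_length=1):
    """Merge runs of consecutive music-labelled one-second windows
    into (start_sec, end_sec) regions, end-exclusive.

    Runs shorter than `min_length` seconds are discarded, which
    filters out isolated false positives.
    """
    regions = []
    start = None
    for i, is_music in enumerate(labels):
        if is_music and start is None:
            start = i                      # a music run begins
        elif not is_music and start is not None:
            if i - start >= min_length:    # run ended; keep if long enough
                regions.append((start, i))
            start = None
    if start is not None and len(labels) - start >= min_length:
        regions.append((start, len(labels)))  # run extends to the end
    return regions

# Two candidate tracks: seconds 1-3 and second 4.
regions = merge_music_regions([False, True, True, False, True])
```

Each `(start_sec, end_sec)` pair maps directly to a cut point in the extracted audio, so the standalone clips can be produced from these regions.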
For music identification, each extracted track is sent to one or more external music recognition APIs (to be determined). The system is designed to support multiple APIs in parallel to compare identification accuracy, latency, and robustness. Returned metadata—such as track title, artist, and confidence score—is collected and normalized.
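Since the recognition APIs are still to be determined, the normalization step can only be sketched against an assumed response shape. The schema below (API name, title, artist, confidence) and the raw key names are hypothetical placeholders:

```python
from dataclasses import dataclass


@dataclass
class TrackMatch:
    """Common schema for identification results across APIs."""
    api: str
    title: str
    artist: str
    confidence: float  # normalized to [0, 1]


def normalize(api_name, raw):
    """Map one raw API response (assumed keys: title, artist, score)
    onto the common schema, with safe defaults for missing fields."""
    return TrackMatch(
        api=api_name,
        title=raw.get("title", ""),
        artist=raw.get("artist", ""),
        confidence=float(raw.get("score", 0.0)),
    )


# Hypothetical response from one of the candidate APIs.
match = normalize("api_a", {"title": "Main Theme", "artist": "Composer X", "score": 0.87})
```

A per-API adapter of this kind keeps the comparison of accuracy, latency, and robustness across services in one uniform result table.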
Finally, the identified music titles are displayed in the UI alongside the extracted audio tracks. The project serves as an experimental platform to study music detection accuracy in real-world movie audio, assess API-based music identification performance, and explore how thresholding and segmentation choices affect end-to-end results.
Key data
Project lead
Deputy project lead
Project team
Project partners
Universität Zürich
Project status
ongoing, started 02/2026
Institute/Centre
Centre for Artificial Intelligence (CAI)
Funding partner
Public sector (excl. federal government)
Project budget
20'000 CHF