Mak, Wen Xuan (2020) A robust speaker-aware speech separation technique using composite speech models. Final Year Project, UTAR.
Abstract
Speech separation techniques are commonly used to selectively filter audio sources. Early works applied acoustic profiling to discriminate between multiple audio sources, while modern techniques leverage composite audio-visual cues for more precise source separation. With visual input, speakers are first recognized by their facial features and then voice-matched so that the corresponding audio signals can be filtered. However, existing speech separation techniques do not account for off-screen speakers who are actively speaking in a video. This project aims to design a robust speaker-aware speech separation pipeline that accommodates off-screen speakers. The pipeline performs speech separation sequentially: (1) audio-visual speech separation for all visible speakers, followed by (2) blind source separation on the residual audio signal to recover off-screen speech. Two independent models are designed, an audio-only model and an audio-visual model, which are then merged into a single pipeline that performs comprehensive speech separation. The outcome of the project is a data-type-agnostic speech separation technique that demonstrates robust filtering performance regardless of input type.
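The sequential structure of the pipeline can be illustrated with a minimal sketch. The class and function names below (av_model, audio_only_model, detect_faces, separate) are hypothetical placeholders for illustration, not the project's actual API; the sketch only assumes that an audio-visual model separates each visible speaker and that an audio-only model handles the residual.

import numpy as np

def separate_all_speakers(mixture, video_frames, av_model, audio_only_model):
    """Return on-screen speaker tracks plus any off-screen speech."""
    # Step 1: audio-visual separation for every visible speaker.
    face_tracks = av_model.detect_faces(video_frames)
    on_screen = [av_model.separate(mixture, face) for face in face_tracks]

    # Step 2: subtract the visible speakers' estimates from the mixture,
    # leaving a residual that should contain only off-screen sources.
    residual = mixture - np.sum(on_screen, axis=0) if on_screen else mixture

    # Step 3: blind source separation on the residual to recover
    # off-screen speech with the audio-only model.
    off_screen = audio_only_model.separate(residual)

    return on_screen, off_screen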