Choong, Ren Jun (2020) Smart Sound Recovery Using Visual Microphone. Final Year Project, UTAR.
Abstract
The visual microphone is a passive means of remote sound recovery: it recovers sound from a silent high-speed video by extracting the subtle vibrations that sound induces in objects in the video. Although the published literature reports some success, no work has yet investigated the effect of framewise image denoising as a preprocessing step before sound recovery, and only limited investigation has been done on the effect of different colour-to-grayscale conversion methods on the sound recovery process. It also remains an open question whether a high video recording framerate truly lowers the quality of the recovered sound. Furthermore, existing sound recovery techniques generally require active human intervention. This work fills those gaps. First, it establishes that performing image denoising before sound recovery can reduce the noise power by up to 47.16 %, increase the intelligibility by 9.11 % and the signal-to-noise ratio by 17.97 %. Secondly, it finds that colour-to-grayscale conversion methods based on the weighted mean of the red, green and blue primary colour channels, and on the perceptually uniform grayscale representation used in the International Commission on Illumination (CIE) 1976 L*a*b* colour space, produce the best sound recovery performance, while any form of visual enhancement degrades the recovered sound quality. Thirdly, it establishes that higher recording framerates do correspond to better recovered sound quality. Finally, the possibilities of smart sound recovery using the results of this work are briefly discussed.
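The two families of colour-to-grayscale conversion named in the abstract can be illustrated with a short sketch. This is not the thesis's implementation: the abstract does not state the channel weights or the input colour space, so the Rec. 601 luma weights, the assumption of sRGB input scaled to [0, 1], and the function names `weighted_mean_gray` and `cie_lightness` are all assumptions made here for illustration.

```python
import numpy as np


def weighted_mean_gray(rgb, weights=(0.299, 0.587, 0.114)):
    """Weighted mean of the R, G and B channels.

    The default weights are the Rec. 601 luma coefficients (an assumption;
    the abstract does not say which weights were used).
    `rgb` has shape (..., 3) with values in [0, 1].
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    w = np.asarray(weights, dtype=np.float64)
    return rgb @ (w / w.sum())  # normalise so the weights sum to 1


def cie_lightness(rgb):
    """CIE 1976 L* (lightness) channel, assuming sRGB input in [0, 1]."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Undo the sRGB transfer function to obtain linear-light values.
    linear = np.where(rgb <= 0.04045,
                      rgb / 12.92,
                      ((rgb + 0.055) / 1.055) ** 2.4)
    # Relative luminance Y (sRGB / Rec. 709 primaries, D65 white, Yn = 1).
    y = linear @ np.array([0.2126, 0.7152, 0.0722])
    # Piecewise cube-root function from the CIE L*a*b* definition.
    delta = 6.0 / 29.0
    f = np.where(y > delta ** 3,
                 np.cbrt(y),
                 y / (3.0 * delta ** 2) + 4.0 / 29.0)
    return 116.0 * f - 16.0  # L* ranges from 0 (black) to 100 (white)
```

Applied framewise, either function maps a colour frame of shape (height, width, 3) to a single-channel frame of shape (height, width), which is the form a sound recovery pipeline would then analyse for vibrations.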