UTAR Institutional Repository

Enjoyable - a multi-contexts and real-time audio description method on YouTube videos for the visual impaired

Lee, Wei Song (2025) Enjoyable - a multi-contexts and real-time audio description method on YouTube videos for the visual impaired. Final Year Project, UTAR.


    Abstract

    Video streaming platforms often lack sufficient accessibility features for visually impaired users, because generating audio descriptions (AD) manually is time-intensive and resource-heavy. This project introduces "Enjoyable," an online platform with an automated AD system. Instead of relying on external databases, the system lets content creators label clustered faces directly within videos, improving character recognition across diverse genres. A structured script-based approach integrates image captions and dialogue into a comprehensive movie script. This script is then processed by an LLM-based Single-Prompt Multiturn Multi-Agent Reasoning System (SMARS), whose agents (Investigator, Visual Validator, Context Historian, Integrator, Audio Describers, Syntax Fixers, Language Flow Expert, Word Count Checker, Fact Checker, Messenger, and Target Audience) collaboratively generate personalized ADs. This method enhances accessibility by streamlining visual-auditory data interaction and addressing limitations of existing AD systems.
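    The abstract names the SMARS agents but not their interfaces or orchestration, so the following is only a minimal sketch of one plausible design: each agent is a role plus an instruction, applied as successive passes over a shared draft. The `llm()` function, the `Agent` class, and the agent subset chosen here are all assumptions for illustration, not the project's actual implementation.

    ```python
    from dataclasses import dataclass
    from typing import Callable, List

    # Hypothetical stand-in for a real LLM call: here it just tags the
    # draft with the agent's role so the pipeline is runnable end to end.
    def llm(role: str, instruction: str, draft: str) -> str:
        return f"[{role}] {draft}"

    @dataclass
    class Agent:
        role: str
        instruction: str

        def run(self, draft: str) -> str:
            return llm(self.role, self.instruction, draft)

    # A subset of the agents listed in the abstract, chained as multiturn
    # passes over one shared draft (the "single-prompt, multiturn" idea).
    AGENTS: List[Agent] = [
        Agent("Investigator", "Extract scene facts from the movie script."),
        Agent("Audio Describer", "Draft a concise audio description."),
        Agent("Word Count Checker", "Trim the draft to fit the dialogue gap."),
        Agent("Fact Checker", "Verify the draft against the script."),
    ]

    def generate_ad(script_segment: str) -> str:
        """Run every agent in order over a single evolving draft."""
        draft = script_segment
        for agent in AGENTS:
            draft = agent.run(draft)
        return draft
    ```

    The design choice sketched here, one shared draft mutated in a fixed agent order, keeps the prompt history linear; an alternative would be parallel agents merged by the Integrator, which the abstract's agent list also permits.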

    Item Type: Final Year Project / Dissertation / Thesis (Final Year Project)
    Subjects: T Technology > T Technology (General)
    T Technology > TD Environmental technology. Sanitary engineering
    Divisions: Faculty of Information and Communication Technology > Bachelor of Computer Science (Honours)
    Depositing User: ML Main Library
    Date Deposited: 29 Aug 2025 11:26
    Last Modified: 29 Aug 2025 11:26
    URI: http://eprints.utar.edu.my/id/eprint/7318
