UTAR Institutional Repository

Speech to text with emoji

Tong, Kah Pau (2023) Speech to text with emoji. Final Year Project, UTAR.


    Abstract

    Speech transcription technology has been a significant life changer in the media, entertainment, and education fields. Transcription services have greatly simplified record-keeping, research, and note-taking by removing the need to manually transcribe protracted audio or video segments for hours at a time. However, reading plain text from a transcript cannot convey the speaker's emotion the way listening to the original audio can. Humans are wired to experience a set of basic emotions, and these fundamental emotions help us understand, connect, and communicate with others; emoticons carry those emotions within a message. With current technology, emoticons can be added to text by selecting them manually or by saying their names through speech recognition, but both approaches cause hassle and other problems. In this project, an artificial intelligence-based mobile application for emotional voice transcription was proposed to improve digital communication, increase equality for disabled persons, and boost attentiveness in online courses. The objectives of this project encompass examining the feasibility of voice recognition for emotion detection, developing an emotional voice recognition system that accurately measures various speech features to display appropriate emojis, and creating a speech-to-text solution that transcribes text with emojis at a rate comparable to the user's speech rate and emotional portrayal. Furthermore, a prototyping methodology was chosen as the project approach. It consists of a requirement analysis phase, followed by a repeatable five-step cycle of design, model training, prototyping, review, and refinement, and finally a development, test, and release phase.
    In conclusion, through incremental upgrades and adjustments during the prototyping and development phases, the final prototype achieved a processing speed of 10-15 seconds, a speech transcription accuracy of 99.5%, and an emotion identification accuracy of 80.3%. Although future enhancements remain, such as customised voice profiles, multilingual support, transcript sharing, and a move to a client-server architecture, the project is considered successful, with all objectives fulfilled.
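
    The pipeline the abstract describes — transcribe a speech segment, classify its emotion, and append a matching emoji — can be sketched as below. This is a minimal illustration, not the project's actual implementation: the `transcribe` and `classify_emotion` functions are hypothetical stubs standing in for the trained speech and emotion models, and the emotion-to-emoji mapping is an assumed example.

    ```python
    # Hedged sketch of a transcribe-then-annotate pipeline.
    # transcribe() and classify_emotion() are placeholder stubs for the
    # project's real speech-to-text and speech-emotion models.

    # Assumed mapping from emotion labels to emojis (illustrative only).
    EMOJI = {
        "happy": "\U0001F60A",   # 😊
        "sad": "\U0001F622",     # 😢
        "angry": "\U0001F620",   # 😠
        "neutral": "",           # no emoji for neutral speech
    }

    def transcribe(audio_segment: bytes) -> str:
        """Stub: would run the speech-to-text model on the audio."""
        return "I passed the exam"

    def classify_emotion(audio_segment: bytes) -> str:
        """Stub: would run the speech-emotion classifier on the audio."""
        return "happy"

    def annotate(audio_segment: bytes) -> str:
        """Transcribe one segment and append the emoji for its emotion."""
        text = transcribe(audio_segment)
        emoji = EMOJI.get(classify_emotion(audio_segment), "")
        return f"{text} {emoji}".strip()
    ```

    With real models behind the stubs, each spoken segment would yield an emotionally annotated line such as "I passed the exam 😊".
    
    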

    Item Type: Final Year Project / Dissertation / Thesis (Final Year Project)
    Subjects: Q Science > QA Mathematics > QA76 Computer software
    Divisions: Lee Kong Chian Faculty of Engineering and Science > Bachelor of Science (Honours) Software Engineering
    Depositing User: Sg Long Library
    Date Deposited: 05 Oct 2023 19:57
    Last Modified: 05 Oct 2023 19:57
    URI: http://eprints.utar.edu.my/id/eprint/5889
