UTAR Institutional Repository

An application for identifying movies from plot with word embeddings and deep learning

Kean, Soh Zhe Herng (2023) An application for identifying movies from plot with word embeddings and deep learning. Final Year Project, UTAR.


    Abstract

    Natural language processing (NLP) is a field of computer science that aims to help computers understand and process human language. Advances in NLP have improved interactions between humans and computers: through NLP, the average technology user does not have to be a computer expert to “talk” to a computer. A common NLP task is multiclass text classification, in which each document is assigned to one of several categories so that documents of similar meaning are grouped together. This paper proposes a movie identifier that implements this task through a combination of NLP and deep learning techniques. It is intended to help people identify movies they have watched in the past but whose titles they have forgotten, and to help people who have heard only bits and pieces of a movie’s plot find the movie themselves. The proposed model takes as input movie plots extracted from a dataset. The text is then preprocessed, for example by stemming and lemmatization, and stopwords are removed to discard words that carry little meaning. The movie title corresponding to each plot is encoded as an integer that serves as the target for the model to predict. The plot text is likewise tokenized and encoded into integers so that it can be interpreted by the model. Later parts of this paper review and experiment with multiple architectures. Most of them, however, learn features from the text in a similar way: the tokens are transformed into embeddings by an embedding layer, the embeddings are passed through multiple layers of a neural network, and the input text is finally classified to predict the title of the movie it refers to.
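
The pipeline described in the abstract (stopword removal, tokenization, integer encoding of plots and titles, then embedding, hidden layers, and classification) can be illustrated with a minimal sketch. This is not the author's implementation: the stopword list, the two example plots, the placeholder titles, the function names, and the randomly initialised (untrained) weights are all assumptions made for illustration, and stemming and lemmatization are omitted. Only the Python standard library is used.

```python
import math
import random
import re

# Tiny illustrative stopword list (assumption; a real system would use a fuller list).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "is", "who"}

def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def build_vocab(token_lists):
    """Map each distinct token to an integer id; 0 is reserved for padding/unknown."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab

def encode(tokens, vocab, max_len):
    """Turn tokens into a fixed-length list of ids, truncating or zero-padding."""
    ids = [vocab.get(t, 0) for t in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(ids, embeddings, weights):
    """Embedding lookup -> mean pooling -> linear layer -> softmax over titles."""
    dim = len(embeddings[0])
    vectors = [embeddings[i] for i in ids if i != 0] or [[0.0] * dim]
    pooled = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    scores = [sum(w * x for w, x in zip(row, pooled)) for row in weights]
    return softmax(scores)

# Hypothetical plots and placeholder titles (not taken from the thesis dataset).
plots = ["A young wizard attends a school of magic.",
         "A shark terrorises a small beach town."]
titles = ["Title A", "Title B"]

token_lists = [preprocess(p) for p in plots]
vocab = build_vocab(token_lists)
title_to_id = {t: i for i, t in enumerate(sorted(set(titles)))}
encoded = [encode(tokens, vocab, max_len=8) for tokens in token_lists]

# Untrained, randomly initialised parameters: the output shows shapes, not accuracy.
rng = random.Random(0)
embed_dim = 4
embeddings = [[rng.uniform(-1, 1) for _ in range(embed_dim)]
              for _ in range(len(vocab) + 1)]
weights = [[rng.uniform(-1, 1) for _ in range(embed_dim)]
           for _ in range(len(title_to_id))]

probs = forward(encoded[0], embeddings, weights)
print(encoded[0], probs)
```

In a trained system the embeddings and layer weights would be learned from the plot and title pairs, and the predicted title would be the one with the highest probability; here the probabilities are arbitrary because no training takes place.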

    Item Type: Final Year Project / Dissertation / Thesis (Final Year Project)
    Subjects: Q Science > Q Science (General)
    T Technology > T Technology (General)
    Divisions: Faculty of Information and Communication Technology > Bachelor of Computer Science (Honours)
    Depositing User: ML Main Library
    Date Deposited: 08 Sep 2023 21:36
    Last Modified: 08 Sep 2023 21:36
    URI: http://eprints.utar.edu.my/id/eprint/5519
