UTAR Institutional Repository

Machine learning for email filtering and categorising

Tan, Kai Qin (2023) Machine learning for email filtering and categorising. Final Year Project, UTAR.

    As digitalisation continues, email has become the primary communication channel for personal and business users. Owing to the unprecedented growth in email volume, businesses generally require an automated email management system for their mailboxes, with applications in customer service and internal email. This project focuses on three Natural Language Processing (NLP) tasks: 1) spam filtering, 2) categorising, and 3) summarising, using the Enron spam dataset, the AG News dataset, and the XSum dataset, respectively. For each task, the project compares classical machine learning methods, conventional neural networks, and transformers, and selects the model with the highest accuracy and F1 score. The best-performing models are Long Short-Term Memory (LSTM), Bi-directional LSTM (Bi-LSTM), and PEGASUS for spam filtering, categorising, and summarising, respectively. LSTM and Bi-LSTM achieved the highest accuracy on the filtering and categorising tasks, at 99% and 92%, respectively. Similarly, the PEGASUS transformer achieved a summary similarity score about 15% higher in all categories than the conventional neural network. The comparison concludes that limitations on training and machine specification affect transformer performance in categorisation. Under these limitations, conventional neural networks have the upper hand in text categorisation, but transformers showed better resilience in summarisation owing to their unique training method. Interestingly, neither the neural network nor the transformer could reliably distinguish between similar categories, which resulted in slightly lower accuracy. Furthermore, this project presents a web-based interface for the three tasks to demonstrate the feasibility of the selected model in each task.
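    The selection step described in the abstract (choosing the model with the highest accuracy and F1 score) can be illustrated with a minimal Python sketch. This is not the project's code: the model names `model_a` and `model_b`, the toy labels, and the rule of ranking by accuracy first and F1 second are all assumptions made for illustration.

```python
from typing import Dict, List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_binary(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """F1 score for the positive class (e.g. spam = 1)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_best(y_true: List[int], predictions: Dict[str, List[int]]) -> str:
    """Return the model name with the highest (accuracy, F1) pair.

    Assumption: accuracy is compared first and F1 breaks ties.
    """
    return max(
        predictions,
        key=lambda name: (
            accuracy(y_true, predictions[name]),
            f1_binary(y_true, predictions[name]),
        ),
    )


# Hypothetical toy data, not results from the project.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 1, 0, 0, 0, 1, 0],
    "model_b": [1, 1, 1, 1, 0, 1, 0, 0],
}

if __name__ == "__main__":
    for name, y_pred in predictions.items():
        print(name, accuracy(y_true, y_pred), f1_binary(y_true, y_pred))
    print("selected:", select_best(y_true, predictions))
```

    For a multi-class task such as news categorisation, the per-class F1 values would typically be averaged (for example, macro-averaged) before comparison; the sketch covers only the binary case.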

    Item Type: Final Year Project / Dissertation / Thesis (Final Year Project)
    Subjects: H Social Sciences > HG Finance
    Divisions: Lee Kong Chian Faculty of Engineering and Science > Bachelor of Science (Honours) Quantity Surveying
    Depositing User: Sg Long Library
    Date Deposited: 12 Dec 2023 16:24
    Last Modified: 12 Dec 2023 16:24
    URI: http://eprints.utar.edu.my/id/eprint/6154
