Koi, Chin Chong (2024) Deep learning for image classification. Final Year Project, UTAR.
Abstract
Facial emotion detection plays a vital role in human communication, as facial expressions are key indicators of emotional states and intentions. This research investigates the application of Convolutional Neural Networks (CNNs), specifically the VGG16 and ResNet50 architectures, to facial expression classification on the AffectNet dataset. The primary objective is to improve facial emotion recognition accuracy through several training strategies: full training from scratch, fine-tuning with frozen and subsequently unfrozen layers, and transfer learning from pre-trained weights. The Multi-task Cascaded Convolutional Networks (MTCNN) technique was integrated to improve face detection within the pipeline. When fully trained from scratch on AffectNet, the VGG16 model attained an accuracy of 69.95%, which improved to 69.98% when training started from frozen pre-trained layers before fine-tuning, illustrating the value of leveraging pre-trained features while refining deeper layers for the emotion recognition task. The ResNet50 model benefited most from this strategy, achieving the highest accuracy in the study, 71.72%, when its layers were initially frozen and then fine-tuned. A baseline CNN model performed moderately, with a best accuracy of 58.71% when layers were frozen during fine-tuning after full training. The integration of MTCNN with ResNet50 was particularly effective, yielding the most accurate predictions across different data splits among the tested models. These findings contribute to ongoing research in facial expression recognition and offer insight into the effectiveness of different CNN architectures and training strategies for improving emotion classification accuracy.
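To make the described pipeline concrete, the following is a minimal sketch (not the thesis's actual code) of the frozen-then-unfrozen fine-tuning strategy with an MTCNN face-cropping step, using tf.keras and the `mtcnn` package. The 8-class output head, the added dense layers, and all hyperparameters are illustrative assumptions rather than values taken from the project.

import numpy as np
from mtcnn import MTCNN
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16

NUM_CLASSES = 8  # assumption: AffectNet's eight basic expression labels

def crop_face(rgb_image: np.ndarray, detector: MTCNN) -> np.ndarray:
    """Detect the most confident face with MTCNN and crop it out."""
    detections = detector.detect_faces(rgb_image)
    if not detections:
        return rgb_image  # fall back to the full frame if no face is found
    best = max(detections, key=lambda d: d["confidence"])
    x, y, w, h = best["box"]
    x, y = max(x, 0), max(y, 0)
    return rgb_image[y:y + h, x:x + w]

# Stage 1: freeze the pre-trained VGG16 backbone and train only a new head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False

model = keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)

# Stage 2: unfreeze the backbone and fine-tune end to end at a lower
# learning rate, refining the deeper layers for emotion recognition.
base.trainable = True
model.compile(optimizer=keras.optimizers.Adam(1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)

The same two-stage recipe applies to ResNet50 by swapping the backbone; the much lower learning rate in stage 2 is the conventional safeguard against destroying pre-trained features during full fine-tuning.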