Improving hand written digit recognition using hybrid feature selection algorithm

Wong, Khye Mun (2022) Improving hand written digit recognition using hybrid feature selection algorithm. Final Year Project, UTAR.

Abstract

In machine learning, handwritten digit recognition is one of the central problems in pattern recognition and computer vision. Its applications include recognizing digits on a utility map, reading zip codes on postal mail, processing bank check amounts, and many more. Offline handwritten digits vary in traits such as size, orientation, position, and thickness, and every individual’s handwriting is unique, which makes classification more difficult. High outline similarity between certain digits and overfitting on high-dimensional data further affect computational time and cost. Many researchers have therefore developed and applied machine learning algorithms to tackle the handwritten digit recognition problem efficiently. The main objective of this report was to obtain the binary classification accuracy of handwritten digit recognition on the Multiple Features dataset (MFEAT). Minimum Redundancy and Maximum Relevance (mRMR) was used as the primary approach because, being a filter method, it has an advantage over wrapper and embedded methods: it saves computational time while considering both the relevance of the selected features and the redundancy among them. Although mRMR can identify a subset of features that is highly relevant to the target class variable, it still has the weakness of retaining some redundant features. Support Vector Machine Recursive Feature Elimination (SVM-RFE), an embedded method, was selected as an alternative approach to mRMR. SVM-RFE further selects features according to a ranking-weight criterion: insignificant features with small ranking weights are removed, and only significant features with greater influence are retained.
However, although RFE can effectively eliminate less important and redundant features, it is flawed in that the features it selects are not ranked by importance. In view of their respective strengths and deficiencies, this study combined the two methods and used a support vector machine (SVM) as the underlying classifier, anticipating that mRMR would complement SVM-RFE well. The hybrid method was demonstrated on a binary classification between the digits ‘4’ and ‘9’ from the Multiple Features dataset. The proposed hybrid method and two additional predictive models, namely mRMR and SVM-RFE, were built for comparison. With the proposed hybrid method, four significant features were shortlisted to achieve the highest accuracy, 100.00%. In addition, the proposed hybrid method achieved the highest test accuracy, 99.2%, when only one feature was included. The results showed that the hybrid mRMR+SVM-RFE outperformed both the sole SVM-mRMR and the sole SVM-RFE approaches, in that it achieved higher classification accuracy using fewer features.
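
The two-stage idea described in the abstract might be sketched as follows. This is only an illustrative sketch under several assumptions, not the implementation used in the report: the order of the stages (mRMR filter first, then SVM-RFE) is assumed, the redundancy term uses absolute Pearson correlation in place of feature-to-feature mutual information, the data are synthetic rather than MFEAT, and the helper names `mrmr_select` and `hybrid_select` as well as the sizes `n_filter=10` and `n_final=4` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def mrmr_select(X, y, k, random_state=0):
    """Greedy mRMR (difference form): relevance minus mean redundancy."""
    # Relevance: mutual information between each feature and the class label.
    relevance = mutual_info_classif(X, y, random_state=random_state)
    # Assumption: absolute Pearson correlation stands in for the
    # feature-to-feature mutual information used in the original mRMR.
    redundancy = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        scores = [relevance[j] - redundancy[j, selected].mean() for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected


def hybrid_select(X, y, n_filter, n_final):
    """Stage 1: mRMR keeps n_filter features. Stage 2: SVM-RFE keeps n_final."""
    filtered = mrmr_select(X, y, n_filter)
    # A linear SVM exposes coef_, which RFE uses as the ranking weights.
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=n_final, step=1)
    rfe.fit(X[:, filtered], y)
    # Map the RFE mask back to column indices of the original matrix.
    return [filtered[i] for i in np.flatnonzero(rfe.support_)]


# Synthetic two-class data as a stand-in for the '4' vs '9' task.
X, y = make_classification(n_samples=400, n_features=30, n_informative=5,
                           n_redundant=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

chosen = hybrid_select(X_train, y_train, n_filter=10, n_final=4)
clf = SVC(kernel="linear").fit(X_train[:, chosen], y_train)
accuracy = clf.score(X_test[:, chosen], y_test)
print("selected features:", chosen, "test accuracy:", accuracy)
```

Feature selection is fitted on the training split only, so the held-out accuracy is not influenced by the test samples. The accuracy printed here depends on the synthetic data and is not comparable to the figures reported in the abstract.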

Item Type:	Final Year Project / Dissertation / Thesis (Final Year Project)
Subjects:	H Social Sciences > HA Statistics; Q Science > Q Science (General); T Technology > T Technology (General)
Divisions:	Faculty of Science > Bachelor of Science (Honours) Statistical Computing and Operations Research
Depositing User:	ML Main Library
Date Deposited:	05 Jan 2023 22:03
Last Modified:	05 Jan 2023 22:03
URI:	http://eprints.utar.edu.my/id/eprint/4940
