Koe, Jia Chi (2020) Design of Predictive Model for TCM Tongue Diagnosis In Malaysia Using Machine Learning. Final Year Project, UTAR.
Abstract
In recent years, Traditional Chinese Medicine (TCM) has gained popularity in Malaysia. TCM uses four diagnostic methods (四诊): Inspection (望), Listening and Smelling (闻), Inquiry (问) and Palpation (切). Tongue diagnosis, which is part of Inspection, is carried out by observing the patient's tongue body and coating. However, tongue diagnosis is subjective and lacks objective evaluation criteria: the judgement rests on the TCM physician's experience, so different physicians may reach different judgements for the same patient. This lack of objectivity and of standard evaluation criteria has restricted the development of tongue diagnosis. In this project, machine learning algorithms are applied to design a predictive model for TCM tongue diagnosis. The project is divided into four parts:
1. Existing tongue image acquisition systems impose strict requirements on the light source and the camera, and these instruments are still neither portable nor widely available. An easier way of taking tongue images with a mobile camera is therefore proposed, and five important rules for taking a tongue image are established to ensure image quality.
2. Mask R-CNN is trained to segment the tongue from the image. The results show that it can segment the tongue under different illumination, even when the image is blurred or not captured directly from the front of the tongue.
3. Four tongue features (greasy tongue coating (腻苔), teeth-marks (齿痕), cracks (裂纹) and spots (点刺)) are extracted from each image. YOLO is used to extract cracks and teeth-marks, while Mask R-CNN is used to extract greasy tongue coating and spots. YOLO achieves 100% accuracy in extracting cracks and nearly 80% accuracy in extracting teeth-marks, and Mask R-CNN achieves 87.5% accuracy in extracting greasy tongue coating. However, neither Mask R-CNN nor YOLO performs well in extracting spots: although Mask R-CNN achieves 85% accuracy, its sensitivity and F1-score are only 45% and 47% respectively.
4. Six supervised machine learning algorithms (Linear Regression, Logistic Regression, K Nearest Neighbors (KNN), Decision Trees (DT), Support Vector Machine (SVM) and Random Forest) are used to perform disease prediction. Cross validation and bootstrapping are also implemented to ensure the robustness and improve the accuracy of the predictive model. Two predictions are carried out: healthy versus unhealthy, and high blood pressure versus no high blood pressure. All algorithms perform poorly in predicting healthy versus unhealthy; the highest accuracy, obtained with DT, is only 68%. For the prediction of high blood pressure, all algorithms perform very poorly without bootstrapping: their accuracies are all around 50%, while their sensitivities and F1-scores are below 20%. After bootstrapping, KNN and SVM achieve nearly 80% in accuracy, sensitivity and F1-score, and KNN reaches 90% sensitivity. In other words, KNN is able to catch most of the positive cases correctly.
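The abstract reports accuracy alongside sensitivity and F1-score, and the spot-extraction result shows why: a model can be accurate overall while still missing most positive cases. The following is a minimal sketch of how the three metrics relate, using the standard definitions. The class name, function names and confusion-matrix counts are hypothetical and chosen only for illustration; they are not data from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier (hypothetical container, not from the thesis)."""
    tp: int  # positives correctly detected
    fp: int  # negatives wrongly flagged as positive
    fn: int  # positives that were missed
    tn: int  # negatives correctly rejected


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all cases that were classified correctly."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of actual positives that were detected (also called recall)."""
    positives = c.tp + c.fn
    return c.tp / positives if positives else 0.0


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and sensitivity, written in count form."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


# Illustrative, made-up counts: 20 positive and 80 negative cases.
example = ConfusionCounts(tp=9, fp=4, fn=11, tn=76)

if __name__ == "__main__":
    print(f"accuracy    = {accuracy(example):.2f}")
    print(f"sensitivity = {sensitivity(example):.2f}")
    print(f"F1-score    = {f1_score(example):.2f}")
```

With these made-up counts the negatives dominate, so correctly rejecting them keeps accuracy high even though more than half of the positives are missed. This is the same kind of gap the abstract describes for spot extraction.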