A web-based implementation of k-means algorithms

Lee, Quan (2022) A web-based implementation of k-means algorithms. Final Year Project, UTAR.

Abstract

The K-means algorithm has been around for more than half a century. Although simple and dated, it remains widely used and taught to this day. Applying K-means to a data set requires two inputs: the value of K and a proximity measure. Choosing these inputs well, the proximity measure in particular, is essential to obtaining good results. Many proximity measures exist, each suited to different applications and types of data. Despite this, most modern data mining tools offer users only a handful of proximity measures, the most common being Euclidean distance and Manhattan distance. This limited choice can hold back the performance of the algorithm. This is where k-luster comes in. k-luster, the web application developed in this project, implements the K-means and K-means++ algorithms along with ten proximity measures, seven of which are distance measures and three of which are similarity measures. The project was planned using the Kanban development methodology and built using HTML, CSS, JavaScript, Django, NumPy and pandas. The completed web application is hosted on Heroku. k-luster allows users to upload their own data set or to choose one of three sample data sets if they only want to try out the application. Experimenting with different settings and comparing the results shows how large an impact the choice of proximity measure can have. In conclusion, the project has accomplished what it set out to achieve, although there is still room for improvement. First, k-luster could incorporate additional clustering algorithms, or even classification algorithms, in the future. Second, the web application could save users’ past work so that they can resume it at a later time.
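
To illustrate the two inputs the abstract names (the value K and a proximity measure), the following is a minimal NumPy sketch of K-means with a swappable distance function. It is not taken from k-luster: the function names `euclidean`, `manhattan` and `kmeans`, the parameters `max_iter` and `seed`, and the sample points are all assumptions made for illustration. The sketch also uses plain random initialisation rather than K-means++.

```python
import numpy as np


def euclidean(points, centroid):
    """Euclidean (L2) distance from each row of `points` to `centroid`."""
    return np.sqrt(((points - centroid) ** 2).sum(axis=1))


def manhattan(points, centroid):
    """Manhattan (L1) distance from each row of `points` to `centroid`."""
    return np.abs(points - centroid).sum(axis=1)


def kmeans(data, k, distance=euclidean, max_iter=100, seed=0):
    """Lloyd-style K-means with a pluggable distance function.

    Hypothetical helper for illustration; returns (labels, centroids).
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initialisation: pick k distinct rows as the starting centroids.
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    labels = np.zeros(len(data), dtype=int)
    for _ in range(max_iter):
        # Assignment step: distance of every point to every centroid, shape (n, k).
        dists = np.column_stack([distance(data, c) for c in centroids])
        labels = dists.argmin(axis=1)
        # Update step: mean of each cluster; an empty cluster keeps its old centroid.
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so the labels are final
        centroids = new_centroids
    return labels, centroids


# Two well-separated groups of 2-D points (made-up data).
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels_l2, _ = kmeans(points, k=2, distance=euclidean)
labels_l1, _ = kmeans(points, k=2, distance=manhattan)
```

On data this cleanly separated, both measures should recover the same two groups; differences between measures tend to appear on data with skewed scales or outliers. Note that the update step here always takes the cluster mean, which is the natural choice for Euclidean distance but only a heuristic for other measures.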

Item Type:	Final Year Project / Dissertation / Thesis (Final Year Project)
Subjects:	Q Science > QA Mathematics > QA76 Computer software
Divisions:	Lee Kong Chian Faculty of Engineering and Science > Bachelor of Science (Honours) Software Engineering
Depositing User:	Sg Long Library
Date Deposited:	26 Dec 2022 22:19
Last Modified:	26 Dec 2022 22:19
URI:	http://eprints.utar.edu.my/id/eprint/5010
