UTAR Institutional Repository

Efficient similarity index based task routing for LLM

Liew, Zheng Xian (2025) Efficient similarity index based task routing for LLM. Final Year Project, UTAR.

    Abstract

    Large Language Models (LLMs) represent a significant innovation within the field of Generative AI (GenAI), yet their practical deployment is hampered by a critical tradeoff between the high performance of large models and the cost-efficiency of smaller ones. This research addresses the tradeoff by designing and evaluating an intelligent LLM routing system that optimizes both cost and performance. Four distinct routing methods, including standalone classifiers such as XGBoost and a hybrid KNN+XGBoost model, were evaluated on their ability to dynamically select the most suitable LLM for queries across diverse benchmarks: conversational (MT-Bench), mathematical reasoning (GSM8K), and multi-domain (MMLU). The results demonstrate that the proposed routing framework achieves significant cost savings, reducing expenses by up to 2.76× on MT-Bench with minimal impact on response quality. The hybrid KNN+XGBoost model proved the most robust, particularly on the challenging MMLU benchmark, where other models failed and simpler classifiers struggled in specialized domains. Ultimately, this project validates LLM routing as a powerful strategy for cost optimization.
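
The abstract does not say how the KNN and XGBoost stages are combined, so the following is only a minimal sketch of the similarity-based half of the idea, not the author's implementation. It assumes each past query has an embedding and a label recording whether the cheaper model answered it acceptably; the k most similar past queries then vote on where a new query goes. The function names, the 2-D embeddings, the labels, and the 0.5 threshold are all hypothetical, and the XGBoost stage is omitted entirely.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(query_vec, history, k=3, threshold=0.5):
    """Return "small" or "large" for a query embedding.

    history is a list of (embedding, small_model_ok) pairs, where
    small_model_ok is 1 if the cheaper model handled that past query
    acceptably and 0 otherwise (an assumed labelling scheme).
    """
    # Rank past queries by similarity to the new one and keep the top k.
    neighbours = sorted(
        history, key=lambda item: cosine(query_vec, item[0]), reverse=True
    )[:k]
    if not neighbours:
        return "large"  # no evidence: fall back to the stronger model
    # Fraction of similar past queries that the small model handled well.
    success_rate = sum(ok for _, ok in neighbours) / len(neighbours)
    return "small" if success_rate >= threshold else "large"


# Hypothetical 2-D embeddings, invented purely for illustration.
HISTORY = [
    ([1.0, 0.1], 1),
    ([0.9, 0.2], 1),
    ([0.8, 0.0], 1),
    ([0.1, 1.0], 0),
    ([0.2, 0.9], 0),
    ([0.0, 0.8], 0),
]

print(route([1.0, 0.0], HISTORY))
print(route([0.0, 1.0], HISTORY))
```

In a sketch like this, raising the threshold sends more queries to the larger model, trading cost for quality. A hybrid design could plausibly pass the neighbour statistics to a trained classifier instead of a fixed threshold, but that is an assumption rather than something the abstract states.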

    Item Type: Final Year Project / Dissertation / Thesis (Final Year Project)
    Subjects: T Technology > T Technology (General)
    Divisions: Faculty of Information and Communication Technology > Bachelor of Computer Science (Honours)
    Depositing User: ML Main Library
    Date Deposited: 29 Dec 2025 00:03
    Last Modified: 29 Dec 2025 00:03
    URI: http://eprints.utar.edu.my/id/eprint/7131