LLM-powered active learning for cost-effective text classification
dc.contributor.advisor | Makrehchi, Masoud | |
dc.contributor.author | Rouzegar, Hamidreza | |
dc.date.accessioned | 2024-12-03T16:53:03Z | |
dc.date.available | 2024-12-03T16:53:03Z | |
dc.date.issued | 2024-10-01 | |
dc.description.abstract | This thesis presents an LLM-powered active learning framework for cost-effective text classification, addressing the challenge of potential LLM annotation errors while balancing annotation quality and model accuracy. Our methodology combines human and large language model (LLM) annotations using uncertainty sampling and confidence scoring. Starting with a small labeled seed set, the model iteratively selects the most informative data points for annotation, reducing labeling costs while maximizing performance. To simulate real-world scenarios, a dynamically updated proxy validation set mirrors the distribution of the unlabeled pool, enabling reliable performance estimation throughout training. The Performance Improvement Cost Ratio (PICR) is introduced as an objective stopping criterion to optimize the balance between costs and accuracy gains. Additionally, role-based prompting enhances annotation quality, creating a scalable framework adaptable to diverse text classification tasks. Experimental results demonstrate that the proposed approach achieves human-comparable performance at reduced costs, underscoring its potential for practical applications. | |
dc.identifier.uri | https://hdl.handle.net/10155/1867 | |
dc.language.iso | en | |
dc.subject.other | LLMs | |
dc.subject.other | Text classification | |
dc.subject.other | Active learning | |
dc.subject.other | Smart annotation | |
dc.subject.other | Role design | |
dc.title | LLM-powered active learning for cost-effective text classification | |
dc.type | Thesis | |
thesis.degree.discipline | Electrical and Computer Engineering | |
thesis.degree.grantor | University of Ontario Institute of Technology | |
thesis.degree.name | Master of Applied Science (MASc) |
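
The abstract describes selecting the most informative unlabeled items by uncertainty sampling and combining human and LLM annotations through confidence scoring. The following is a minimal sketch of that idea, not the thesis's implementation. It assumes least-confidence scoring as the uncertainty measure, and a fixed confidence threshold for deciding between LLM and human annotation. The function names `least_confidence`, `select_most_uncertain`, and `choose_annotator`, and the default threshold of 0.8, are hypothetical and do not come from the record.

```python
from typing import List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty of one prediction: 1 minus the highest class probability."""
    return 1.0 - max(probs)


def select_most_uncertain(pool_probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k pool items the classifier is least sure about."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: least_confidence(pool_probs[i]),
        reverse=True,  # most uncertain first
    )
    return ranked[:k]


def choose_annotator(llm_confidence: float, threshold: float = 0.8) -> str:
    """Assumed routing rule: keep confident LLM labels, send the rest to a human."""
    return "llm" if llm_confidence >= threshold else "human"


# Predicted class probabilities for four unlabeled texts (illustrative values).
pool_probs = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.99, 0.01]]
queried = select_most_uncertain(pool_probs, k=2)
print(queried)                 # indices of the two items closest to a coin flip
print(choose_annotator(0.95))  # confident LLM label is kept
print(choose_annotator(0.50))  # low-confidence label goes to a human
```

In a full loop, the queried items would be annotated, added to the labeled set, and the classifier retrained before the next round. The record does not define how the PICR stopping criterion is computed, so it is left out of the sketch.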