Document Type

Conference Proceeding

Publication Date

1-2-2026

Abstract

In this paper, we introduce Augmented Thematic Analysis with Large Language Models (ATA-LLM), a novel framework that integrates manifold learning algorithms and clustering techniques to support inductive thematic analysis. This qualitative method, widely used in software engineering, is essential for uncovering patterns in qualitative data and for understanding human factors and software requirements. Traditional thematic analysis involves coding the data, identifying themes, and interpreting complex narratives, which makes it labor-intensive and time-consuming. Recent advances in large language models (LLMs) offer promising opportunities to support this process; however, it remains unclear how closely LLM-based approaches match traditional human thematic analysis. To address this gap, we evaluated ATA-LLM on a validated qualitative dataset and compared its outcomes against a human-coded analysis. Our findings indicate that, within the ATA-LLM framework, DenseMAP and UMAP effectively preserve both the local and global structure of high-dimensional data, producing more coherent and meaningful themes than the other techniques evaluated. These results highlight the potential of ATA-LLM to enhance the rigor, consistency, and efficiency of inductive thematic analysis.

Comments

This is a pre-copy-editing, author-produced PDF of an article presented at SEET (Software Engineering for Emerging Technologies) 2025 and accepted for publication in Communications in Computer and Information Science, volume 2725, in 2026. The final publication may differ and is available from Springer via https://doi.org/10.1007/978-3-032-08977-9_29.

Copyright

Springer

Available for download on Saturday, January 02, 2027
