From Language Identification to Topic Modelling: A Fully Integrated Pipeline for Multilingual Aspect-Based Sentiment Analysis and Component-Level Insights

ŞEKER, ABDULKADİR

doi:10.1111/exsy.70300

Expert Systems, vol. 43, no. 7, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 43, Issue: 7
Publication Date: 2026
DOI: 10.1111/exsy.70300
Journal Name: Expert Systems
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, ABI/INFORM, Compendex, INSPEC, Library, Information Science & Technology Abstracts (LISTA), Psycinfo
Keywords: language identification, large language models, machine translation, sentiment analysis, topic modelling
Affiliated with Sivas Cumhuriyet University: Yes

Abstract

The rapid expansion of multilingual digital platforms has made the accurate analysis of user-generated content across different languages and cultural contexts increasingly essential. However, existing methods struggle to maintain consistent performance due to linguistic diversity, morphological complexity, and structural variations in text. Many studies in the literature treat the analysis stages as isolated components, which causes errors in early stages to propagate and negatively affect overall performance. To address these challenges, this study proposes an integrated multilingual aspect-based sentiment analysis pipeline that encompasses language identification, machine translation, sentiment classification, and topic modelling. The proposed approach evaluates a comprehensive range of models, from statistical methods to transformer-based architectures and Large Language Models, using the M-ABSA and MARC datasets, which comprise over one million reviews across 21 languages. The analyses not only assess the final model performance but also examine the relative contributions, limitations, and error-propagation effects of each component in the pipeline. The findings quantitatively reveal how linguistic diversity, noise, and contextual variability in real-world data influence the analysis, and they systematically compare the behaviour of generative models and traditional approaches under such challenging conditions. Overall, the study underscores the necessity of an end-to-end, integrated analysis pipeline over fragmented solutions and provides a comprehensive methodological and practical contribution to the field of multilingual text processing.