Empirical analysis of code smell prediction in mobile applications based on software quality indicators


Kekül H.

Journal of Computer Languages, vol.87, no.87, pp.101402, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 87 Issue: 87
  • Publication Date: 2026
  • DOI: 10.1016/j.cola.2026.101402
  • Journal Name: Journal of Computer Languages
  • Journal Indexes: Scopus, Science Citation Index Expanded (SCI-EXPANDED)
  • Pages: pp.101402
  • Affiliated with Sivas Cumhuriyet University: Yes

Abstract

This study presents a machine learning-based approach for detecting code smells in Android applications. It uses an open-source dataset containing size, complexity, object-oriented, and Android-specific code metrics. The dataset consists of 629 instances in total, and the prevalence rates of the eight code smells range from 26.23% to 75.36%, which indicates a moderate level of class imbalance. To evaluate the contribution of Android-specific metrics, an ablation analysis was conducted on four feature sets: classical metrics, Android-specific metrics, a configuration excluding metrics that may carry a risk of application-level information leakage, and a set including all metrics. In addition, to prevent classes belonging to the same application from appearing in both the training and test partitions, the evaluation was carried out using package-based, group-aware nested cross-validation. Eight classification algorithms were evaluated: Gaussian Naive Bayes, Extra Trees, Random Forest, Logistic Regression, Multi-Layer Perceptron, K-Nearest Neighbors, Linear Discriminant Analysis, and Quadratic Discriminant Analysis. The results show that the Random Forest model achieved the highest average F1 score for many code smells, while the Extra Trees model stood out particularly for the Swiss Army Knife smell. The average-rank and post-hoc analyses revealed that Random Forest was generally the strongest model, although its advantage was not statistically significant against every other model. The ablation results indicate that the strongest discriminative signal generally comes from classical software metrics, while Android-specific metrics provide a complementary contribution for certain smells. The findings demonstrate that machine learning techniques provide an effective and practical approach for code smell detection in Android applications.
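The package-based, group-aware nested cross-validation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the study's implementation, so everything below is an assumption made for illustration: the function names `group_k_fold` and `nested_group_splits`, the greedy fold-balancing rule, the fold counts, and the toy package labels are all hypothetical. The sketch shows only the splitting logic: every package lands entirely in one fold, and the inner folds used for model selection are drawn solely from the outer training portion.

```python
from collections import defaultdict


def group_k_fold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs in which no group spans both sides.

    Hypothetical helper: whole groups are assigned greedily, largest first,
    to whichever fold currently holds the fewest samples.
    """
    members = defaultdict(list)
    for idx, group in enumerate(groups):
        members[group].append(idx)
    if n_splits < 2 or n_splits > len(members):
        raise ValueError("n_splits must be between 2 and the number of groups")

    folds = [[] for _ in range(n_splits)]
    for group in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(n_splits), key=lambda i: len(folds[i]))
        folds[smallest].extend(members[group])

    for i in range(n_splits):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for k in range(n_splits) if k != i for j in folds[k])
        yield train_idx, test_idx


def nested_group_splits(groups, outer_splits, inner_splits):
    """Yield (outer_train, outer_test, inner_pairs) for nested evaluation.

    The inner (train, validation) pairs are built only from outer_train,
    so the outer test fold never influences hyperparameter selection.
    """
    for outer_train, outer_test in group_k_fold(groups, outer_splits):
        sub_groups = [groups[i] for i in outer_train]
        # Map positions within outer_train back to original sample indices.
        inner_pairs = [
            ([outer_train[i] for i in tr], [outer_train[i] for i in va])
            for tr, va in group_k_fold(sub_groups, inner_splits)
        ]
        yield outer_train, outer_test, inner_pairs


if __name__ == "__main__":
    # Toy package labels, one per class instance (illustrative only).
    packages = ["app.a", "app.a", "app.a", "app.b", "app.b",
                "app.c", "app.c", "app.d"]
    for train, test, inner in nested_group_splits(packages, 4, 3):
        held_out = sorted({packages[i] for i in test})
        print("held-out packages:", held_out, "| inner folds:", len(inner))
```

In practice, a library splitter such as scikit-learn's `GroupKFold` would be a natural choice for the same purpose; the pure-Python version here only makes the no-overlap guarantee explicit.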