A multiclass hybrid approach to estimating software vulnerability vectors and severity score

Kekul, HAKAN; ERGEN, BURHAN; ARSLAN, HALİL

doi:10.1016/j.jisa.2021.103028

Kekul H., ERGEN B., ARSLAN H.

JOURNAL OF INFORMATION SECURITY AND APPLICATIONS, vol. 63, 2021 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 63
Publication Date: 2021
DOI: 10.1016/j.jisa.2021.103028
Journal Name: JOURNAL OF INFORMATION SECURITY AND APPLICATIONS
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Keywords: Software security, Software vulnerability, Information security, Text analysis, Multiclass classification, TEXT CLASSIFICATION, SELECTION
Affiliated with Sivas Cumhuriyet University: Yes

Abstract

Classifying detected software vulnerabilities is an important process. However, the metric values of security vectors are determined manually, which takes time and is prone to human error. These metrics are important because of their role in the calculation of vulnerability severity. Machine learning algorithms and data mining techniques are needed to improve the quality and speed of vulnerability analysis and discovery processes, yet studies in this area are still limited. In this study, vulnerability vectors were estimated using the natural language processing techniques bag of words, term frequency-inverse document frequency, and n-gram for feature extraction, together with various multiclass classification algorithms, namely Naive Bayes, decision tree, k-nearest neighbors, multilayer perceptron, and random forest. Our experiments on a large public dataset facilitate assessment and provide a standard-compliant prediction model for classifying software vulnerability vectors. The results show that the joint use of different techniques and classification algorithms is a promising solution to a multi-probability and difficult-to-predict problem. In addition, our study fills an important gap in its field in terms of the size of the dataset used and because it covers a vulnerability scoring system version that has not yet been extensively studied.
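To make the pipeline named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes word unigrams and bigrams as the n-gram features, a smoothed TF-IDF weighting, and multinomial Naive Bayes as the classifier (one of the five algorithms listed). The six training descriptions and the three class labels (NETWORK, LOCAL, PHYSICAL) for a single vector metric are invented for illustration and do not come from the study's dataset; all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Lowercase, split on whitespace, return word n-grams for n = 1..n_max."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def fit_idf(docs):
    """Smoothed inverse document frequency for every n-gram seen in training."""
    df = Counter(term for doc in docs for term in set(ngrams(doc)))
    n_docs = len(docs)
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    """TF-IDF weights for one document; n-grams unseen in training are dropped."""
    counts = Counter(t for t in ngrams(doc) if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def train_nb(docs, labels, idf, alpha=1.0):
    """Multinomial Naive Bayes over TF-IDF weights with additive smoothing."""
    class_weight = defaultdict(Counter)
    class_count = Counter(labels)
    for doc, label in zip(docs, labels):
        class_weight[label].update(tfidf(doc, idf))
    model = {}
    for label, weights in class_weight.items():
        total = sum(weights.values()) + alpha * len(idf)
        log_prior = math.log(class_count[label] / len(labels))
        log_like = {t: math.log((weights.get(t, 0.0) + alpha) / total)
                    for t in idf}
        model[label] = (log_prior, log_like)
    return model


def predict(doc, model, idf):
    """Return the class with the highest log-posterior score."""
    feats = tfidf(doc, idf)
    scores = {label: prior + sum(w * like[t] for t, w in feats.items())
              for label, (prior, like) in model.items()}
    return max(scores, key=scores.get)


# Toy data: invented descriptions with hypothetical labels for one metric.
train_docs = [
    "remote attacker can execute arbitrary code via crafted http request",
    "remote attacker causes denial of service via crafted network packet",
    "local user can gain privileges via crafted application",
    "local user can read sensitive files via symlink attack",
    "attacker with physical access can bypass lock screen",
    "physical access to usb port allows firmware modification",
]
train_labels = ["NETWORK", "NETWORK", "LOCAL", "LOCAL", "PHYSICAL", "PHYSICAL"]

idf = fit_idf(train_docs)
model = train_nb(train_docs, train_labels, idf)

print(predict("remote attacker can cause denial of service via crafted request",
              model, idf))
```

In a full setting, one such classifier would be trained per vector metric, and the predicted metric values would then feed the scoring system's own severity formula. Swapping in any of the other listed classifiers only changes `train_nb` and `predict`; the feature extraction stays the same.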