The Determination of Distinctive Single Nucleotide Polymorphism Sets for the Diagnosis of Behcet's Disease


IŞIK Y. E., GÖRMEZ Y., AYDIN Z., Bakir-Gungor B.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, vol.19, no.3, pp.1909-1918, 2022 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 19 Issue: 3
  • Publication Date: 2022
  • DOI Number: 10.1109/tcbb.2021.3053429
  • Journal Name: IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, BIOSIS, Biotechnology Research Abstracts, Communication Abstracts, Compendex, EMBASE, INSPEC, MEDLINE, Metadex, Civil Engineering Abstracts
  • Page Numbers: pp.1909-1918
  • Keywords: Diseases, Feature extraction, Machine learning, Predictive models, Bioinformatics, Support vector machines, Radio frequency, Behcet's disease (BD), feature selection, machine learning, disease prediction, most informative SNPs, GENOME-WIDE ASSOCIATION, FEATURE-SELECTION, RISK PREDICTION, IDENTIFICATION, CLASSIFICATION, ALGORITHMS
  • Affiliated with Sivas Cumhuriyet University: Yes

Abstract

Behcet's Disease (BD) is a multi-system inflammatory disorder whose etiology remains unclear. The most probable hypothesis is that genetic predisposition and environmental factors both play roles in the development of BD. To identify the underlying causes, genetic changes in thousands of genes must be analyzed, and further analysis is needed to determine which genetic factors affect the disease. Machine learning approaches have high potential for extracting knowledge from genomic data and for selecting representative Single Nucleotide Polymorphisms (SNPs) as the most effective features for the clinical diagnosis process. In this study, we attempted to identify representative SNPs using feature selection methods that incorporate biological information, and aimed to develop a machine learning model for diagnosing Behcet's disease. By combining biological information with machine learning classifiers, up to 99.64 percent accuracy of disease prediction is achieved using only 13,611 out of 311,459 SNPs. In addition, we revealed the most distinctive SNPs by performing repeated feature selection in cross-validation experiments.
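
The idea of repeating feature selection within cross-validation can be illustrated with a small sketch. The abstract does not name the selection methods, classifiers, or biological filters used, so everything below is an assumption made for illustration: the genotype data are synthetic, the score is a plain difference in mean minor-allele count between cases and controls (a stand-in, not the authors' method), and the function names `make_toy_genotypes`, `score_snps`, and `selection_frequency` are hypothetical. The sketch selects the top-scoring SNPs on each training fold and counts how often each SNP is chosen across folds.

```python
import random
from collections import Counter


def make_toy_genotypes(n_samples=60, n_snps=20, informative=(0, 1, 2), seed=0):
    """Build a synthetic genotype matrix of 0/1/2 minor-allele counts.

    Hypothetical data: the SNPs listed in `informative` carry more minor
    alleles in cases (label 1) than in controls (label 0).
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate controls and cases
        row = []
        for j in range(n_snps):
            p = 0.9 if (label == 1 and j in informative) else 0.1
            # Two independent allele draws give a count in {0, 1, 2}.
            row.append(sum(rng.random() < p for _ in range(2)))
        X.append(row)
        y.append(label)
    return X, y


def score_snps(X, y):
    """Score each SNP by |mean allele count in cases - mean in controls|."""
    cases = [row for row, label in zip(X, y) if label == 1]
    controls = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        case_mean = sum(row[j] for row in cases) / len(cases)
        control_mean = sum(row[j] for row in controls) / len(controls)
        scores.append(abs(case_mean - control_mean))
    return scores


def selection_frequency(X, y, n_folds=5, top_k=3, seed=0):
    """Repeat top-k selection on each training fold and count selections."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    counts = Counter()
    for f in range(n_folds):
        held_out = set(folds[f])
        train = [i for i in idx if i not in held_out]
        # Selection sees only the training samples of this fold.
        scores = score_snps([X[i] for i in train], [y[i] for i in train])
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        counts.update(ranked[:top_k])
    return counts


X, y = make_toy_genotypes()
freq = selection_frequency(X, y, n_folds=5, top_k=3)
# SNPs chosen in every fold are the most stable candidates.
stable = sorted(j for j, c in freq.items() if c == 5)
print("selection counts:", dict(freq))
print("selected in all folds:", stable)
```

Running selection only on the training portion of each fold keeps the held-out samples from influencing which SNPs are chosen, and the per-SNP selection count gives a simple stability measure for ranking candidates.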