ICENTE'22, Konya, Türkiye, 17-19 November 2022, p. 55
According to data from the World Health Organization, around 17.9 million people die annually from
cardiovascular diseases. This is equivalent to approximately 32% of all global deaths. Over seventy-five
percent of these deaths occur in low- and middle-income nations. Determining the features that have the
greatest impact on the death or survival of heart patients and developing models that accurately predict patient
survival are important current problems. In recent years, machine learning has been used to predict
patients' survival during follow-up by combining their medical records with other features such as gender,
age and weight. However, a very large number of features makes it difficult for physicians to diagnose
diseases and degrades machine learning prediction performance while increasing its cost and run time. In
this regard, it is essential to keep an optimal number of features and select the most effective ones. In the
proposed study, a dataset on the survival of heart patients from the data repository at the University
of California, Irvine was used. This dataset comprises 13 patient features, which were collected from
299 different individuals. The recursive feature elimination method was used for feature selection in order to
identify the parameters that have the most impact on patient survival. The Yeo-Johnson power transformation
was then applied to bring those selected features that were not normally distributed closer to a normal
distribution. Finally, Support Vector Machines, Naive Bayes,
Random Forest, Decision Tree, Logistic Regression, XGBoost, CatBoost, and the K-Nearest Neighbor
machine learning algorithms were used to predict the survival of patients with heart disease. As a result of
the study, the number of features used to predict patient survival was reduced to six, and a confusion matrix
was produced to assess and compare the results of machine learning models in terms of accuracy, recall, and
precision. According to the results obtained, XGBoost predicted patient survival best, with an accuracy
of 90%.
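
The three metrics named above all follow from the four cells of a binary confusion matrix. The sketch below illustrates that relationship only. The labels are made-up toy values, not the study's data, and the function names `confusion_counts` and `metrics` are hypothetical helpers introduced for this example. It assumes the positive class is coded as 1.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # correctly predicted positive
        elif pred == positive and truth != positive:
            fp += 1  # predicted positive, actually negative
        elif pred != positive and truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # correctly predicted negative
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix cells."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels for illustration only (assumption: 1 marks the positive class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
scores = metrics(tp, fp, fn, tn)
print(tp, fp, fn, tn)
print(scores)
```

Accuracy alone can look favorable when one class dominates, which is one reason to report precision and recall alongside it when comparing models.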