Big data analytics for default prediction using graph theory

Yildirim, Mustafa; Okay, Feyza; Ozdemir, Suat

doi:10.1016/j.eswa.2021.114840

Yildirim M., Okay F., Ozdemir S.

EXPERT SYSTEMS WITH APPLICATIONS, vol. 176, 2021 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume Number: 176
Publication Date: 2021
DOI: 10.1016/j.eswa.2021.114840
Journal Name: EXPERT SYSTEMS WITH APPLICATIONS
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Computer & Applied Sciences, INSPEC, Metadex, Public Affairs Index, Civil Engineering Abstracts
Keywords: Big data analytics, Graph theory, Machine learning, Default prediction, SHAP value, BANKRUPTCY PREDICTION, CREDIT RISK, GENETIC ALGORITHM, FINANCIAL RATIOS, LEARNING-MODELS, NEURAL-NETWORKS, MACHINE, SELECTION, REGRESSION, CLASSIFICATION
Gazi University Affiliated: Yes

Abstract

With the unprecedented growth of data worldwide, financial-sector actors such as companies and industries try to remain competitive by transforming themselves into data-driven organizations. By analyzing large volumes of financial data, companies can obtain valuable information for strategic planning, such as risk control, crisis management, or growth management. However, as the amount of data increases dramatically, traditional data analytics platforms face difficulties in storing, managing, and analyzing it. Emerging Big Data Analytics (BDA) platforms overcome these problems by providing decentralized and distributed processing. In this study, we propose two new models for default prediction. In the first model, called DPModel-1, a statistical method (logistic regression) and machine learning methods (decision tree, random forest, gradient boosting) are employed to predict company default. Derived from the first model, we propose DPModel-2, which is based on graph theory and additionally includes new variables obtained from the trading interactions of companies. In both models, grid search optimization is used to determine the best hyperparameters, and SHapley Additive exPlanations (SHAP) values are used to make the models interpretable. By leveraging balance sheet, credit, and invoice datasets, default prediction is performed for about one million companies in Turkey for the years 2010-2018. The default rates of companies range between 3% and 6% by year. The experiments are conducted on a BDA platform. According to the DPModel-1 results, the highest AUC score, 0.87, is achieved by random forest. In addition, the results improve for each technique when the new graph-theory-based variables are added. According to the DPModel-2 results, the best AUC score, 0.89, is again achieved by random forest.
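
To illustrate what "new variables obtained from the trading interactions of companies" could look like, the sketch below builds a directed trading graph from invoice records and derives simple per-company features. The abstract does not say which graph variables DPModel-2 uses, so the function name `trading_graph_features`, the `(seller, buyer, amount)` record layout, and the four features (degree counts and traded volumes) are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def trading_graph_features(invoices):
    """Derive per-company features from a directed trading graph.

    `invoices` is an iterable of (seller, buyer, amount) tuples; each
    tuple is treated as a directed edge seller -> buyer. The record
    layout and the chosen features are hypothetical.
    """
    customers = defaultdict(set)   # seller -> distinct buyers
    suppliers = defaultdict(set)   # buyer -> distinct sellers
    sales = defaultdict(float)     # total amount invoiced by the company
    purchases = defaultdict(float) # total amount invoiced to the company

    for seller, buyer, amount in invoices:
        customers[seller].add(buyer)
        suppliers[buyer].add(seller)
        sales[seller] += amount
        purchases[buyer] += amount

    companies = set(customers) | set(suppliers)
    return {
        company: {
            "out_degree": len(customers.get(company, ())),
            "in_degree": len(suppliers.get(company, ())),
            "sales": sales.get(company, 0.0),
            "purchases": purchases.get(company, 0.0),
        }
        for company in companies
    }


# Toy invoice data (invented for the example).
invoices = [
    ("A", "B", 100.0),
    ("A", "C", 50.0),
    ("B", "C", 25.0),
    ("A", "B", 10.0),
]
features = trading_graph_features(invoices)
```

Features of this kind could be joined to the balance sheet and credit variables before model training. On a dataset of roughly one million companies, the same aggregation would more plausibly run on a distributed platform than in a single Python process.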