Decision tree-based data mining approach for the evaluation of survival in primary malignant bone tumors: A surveillance, epidemiology and end results database study


Yapar D., Yapar A., TOKGÖZ M. A., BİLGE U.

Journal of Orthopaedic Surgery, vol.31, no.2, 2023 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 31 Issue: 2
  • Publication Date: 2023
  • DOI Number: 10.1177/10225536231189780
  • Journal Name: Journal of Orthopaedic Surgery
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, CINAHL, MEDLINE, Directory of Open Access Journals
  • Keywords: data mining, decision tree, primary malignant bone tumors, SEER database
  • Gazi University Affiliated: Yes

Abstract

Purpose: This large-scale, population-based study aimed to describe the epidemiological characteristics of primary malignant bone tumors (PMBTs) and to determine their prognostic factors by using classical statistical methods and data mining methods together.

Methods: Patients were extracted from the National Cancer Institute’s Surveillance, Epidemiology and End Results (SEER) database: “Incidence-SEER Research Data, 18 Registries, Nov 2020 Sub”. Patients with unclassified or incomplete information were excluded. This selection yielded a dataset of 6234 cases. Survival analyses were performed with Kaplan-Meier curves and the log-rank test. Multivariate Cox regression analysis was used to determine the independent prognostic factors of PMBT. A decision tree-based data mining technique was used to confirm the prognostic factors.

Results: The 5-year survival rate was 63.6% and the 10-year survival rate was 55.3% in patients with PMBT. Sex, age, median household income, histology, primary site, grade, stage, metastasis, and the total number of malignant tumors were identified as independent risk factors associated with overall survival (OS) in the multivariate Cox regression analysis. The decision tree (DT) had five terminal nodes, and the prognostic factors it selected were stage, age, and grade. Stage was the most important determinant of vital status. The terminal node with the fewest surviving patients comprised 1102 patients with distant stage, of whom 801 (72.3%) died; the hazard ratio was 5.4 (95% CI: 4.9–5.9; p < .001). These patients had a median survival of only 17 months.

Conclusions: Rules extracted from DTs provide information about risk factors in specific patient groups and can be used by clinicians making decisions about individual patients. We recommend using DTs in combination with Cox regression analysis to determine risk factors and the effect of these factors on survival.
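
As an illustration of the Kaplan-Meier step named in the Methods, the following is a minimal sketch in plain Python. It is not the authors' code: the function names `kaplan_meier` and `median_survival` and the toy cohort are assumptions made for this example, and the numbers are invented rather than taken from SEER.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct death time.

    durations: follow-up time per patient (for example, months)
    events:    1 if death was observed, 0 if the patient was censored
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    # Number of observed deaths at each time point.
    deaths = Counter(t for t, e in zip(durations, events) if e)
    # Everyone (dead or censored) who leaves the risk set at each time point.
    leaving = Counter(durations)
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit update: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def median_survival(curve):
    """First time at which estimated survival is 0.5 or lower, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical toy cohort (NOT SEER data): follow-up in months, death indicator.
toy_months = [5, 8, 12, 17, 17, 24, 30, 36, 48, 60]
toy_died = [1, 1, 0, 1, 1, 1, 0, 1, 0, 0]
toy_curve = kaplan_meier(toy_months, toy_died)
toy_median = median_survival(toy_curve)
```

For this toy cohort the estimated survival first falls to 0.5 or below at 24 months, so `toy_median` is 24. Comparing such curves between groups (log-rank test), fitting the Cox model, and growing the decision tree would normally be done with a dedicated statistics package rather than by hand.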