The Effect of Outlier Detection Methods in Real Estate Valuation with Machine Learning


Creative Commons License

Çılgın C., Gökşen Y., Gökçen H.

İzmir Journal of Social Sciences, vol.5, no.1, pp.9-20, 2023 (Peer-Reviewed Journal)

  • Publication Type: Article / Full Article
  • Volume: 5 Issue: 1
  • Publication Date: 2023
  • DOI: 10.47899/ijss.1270433
  • Journal Name: İzmir Journal of Social Sciences
  • Indexes Covering the Journal: Index Copernicus
  • Pages: pp.9-20
  • Affiliated with Gazi Üniversitesi: Yes

Abstract

For those who use real estate as an investment tool, as well as those who buy and sell it, real estate prices should be predicted realistically and with the highest possible accuracy. The prediction model should be the most appropriate representation of the underlying fundamentals of the market. Otherwise, errors in real estate valuation lead to undesirable results such as inconsistent and unsound increases or decreases in property tax, excessive gains or losses in favor of some groups, and adverse effects on investors and potential real estate owners. For this reason, data-driven real estate valuation approaches are increasingly preferred for producing highly accurate and unbiased estimates. However, the consistency, precision and accuracy of models built with machine learning approaches depend directly on data quality. This study therefore investigates the effects of outlier detection on prediction performance in real estate valuation using a large data set. For this purpose, 4 different outlier detection methods were tested with 3 different machine learning approaches on a heterogeneous data set of 70,771 real estate records and 283 variables. The empirical findings reveal that different outlier detection approaches increase prediction performance to different degrees. With the best outlier detection approach, the performance increase reached 21.6% for Random Forest, with a 6.97% increase in average model performance.
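
The abstract does not name the four outlier detection methods or the error metric used, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a simple interquartile-range (Tukey fence) filter as a stand-in detector and a generic "percentage reduction in error" as the performance measure; the function names, the `price` field and the sample values are all hypothetical.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) Tukey fences for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(rows, key, k=1.5):
    """Keep only the rows whose `key` value lies inside the Tukey fences."""
    lower, upper = iqr_bounds([row[key] for row in rows], k)
    return [row for row in rows if lower <= row[key] <= upper]


def relative_improvement(baseline_error, new_error):
    """Percentage reduction in error relative to the baseline (assumed metric)."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical listings in arbitrary price units; the last one is an extreme value.
listings = [{"price": p} for p in [100, 110, 120, 130, 140, 150, 160, 5000]]

# Records outside the fences are removed before any model would be trained.
cleaned = drop_outliers(listings, "price")
```

In a setup like the one the abstract describes, a model would be trained once on `listings` and once on `cleaned`, and `relative_improvement` would then compare the two error values obtained on the same held-out data.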