Potential of kernel and tree-based machine-learning models for estimating missing data of rainfall


Sattari M. T., Falsafian K., İRVEM A., S S., Qasem S. N.

Engineering Applications of Computational Fluid Mechanics, vol.14, no.1, pp.1078-1094, 2020 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 14, Issue: 1
  • Publication Date: 2020
  • DOI: 10.1080/19942060.2020.1803971
  • Journal Name: Engineering Applications of Computational Fluid Mechanics
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, INSPEC, Directory of Open Access Journals
  • Page Numbers: pp.1078-1094
  • Keywords: Eastern Mediterranean, machine learning, missing data, rainfall, random forest, Turkey
  • Hatay Mustafa Kemal University Affiliated: Yes

Abstract

This study compares two kernel-based models, Support Vector Regression (SVR) and Gaussian Process Regression (GPR), with two tree-based models, M5 and Random Forest (RF), for estimating missing monthly precipitation data at the Antakya, Dortyol, Iskenderun and Samandag stations, which are important precipitation stations in the Eastern Mediterranean region of Turkey. First, a randomly selected 10% of the precipitation data for the period 1980-2019 was treated as missing. Second, the missing data at each station were estimated from the data of the other stations under four scenarios that combine the input data differently. For the kernel-based SVR and GPR methods, the radial basis function (RBF) kernel gave suitable results for the selected study area. Although the SVR and RF methods gave very close estimates, SVR performed relatively better than the other methods, especially in terms of minimizing error. The GPR model, which is based on a Gaussian function, generally tends to estimate missing values closer to the mean. This is the main disadvantage of the GPR model and the reason it is unsuccessful in the estimation process. Overall, the results showed that machine-learning algorithms are successful in estimating missing precipitation data.
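The evaluation protocol described in the abstract (withhold a random 10% of one station's monthly records, estimate them from the other stations, then score the error) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rainfall series is synthetic, the function names are hypothetical, all three remaining stations are used as inputs because the abstract does not list the four scenarios, and a simple least-squares slope stands in for the SVR, GPR, M5 and RF models.

```python
import math
import random

STATIONS = ["Antakya", "Dortyol", "Iskenderun", "Samandag"]
N_MONTHS = 40 * 12  # monthly records, 1980-2019


def synthetic_rainfall(seed=0):
    """Made-up monthly rainfall (mm) sharing one seasonal signal; not real data."""
    rng = random.Random(seed)
    series = {name: [] for name in STATIONS}
    for month in range(N_MONTHS):
        seasonal = 60.0 + 50.0 * math.cos(2.0 * math.pi * (month % 12) / 12.0)
        regional = max(0.0, seasonal + rng.gauss(0.0, 15.0))
        for i, name in enumerate(STATIONS):
            local = (0.8 + 0.15 * i) * regional + rng.gauss(0.0, 8.0)
            series[name].append(max(0.0, local))
    return series


def rmse(observed, estimated):
    """Root-mean-square error between two equally long sequences."""
    squared = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    return math.sqrt(squared / len(observed))


def run_experiment(target, fraction=0.10, seed=1):
    series = synthetic_rainfall()
    others = [s for s in STATIONS if s != target]

    # Step 1: treat a random 10% of the target station's months as missing.
    n_missing = round(N_MONTHS * fraction)
    missing = sorted(random.Random(seed).sample(range(N_MONTHS), n_missing))
    missing_set = set(missing)
    known = [m for m in range(N_MONTHS) if m not in missing_set]

    def neighbour_mean(m):
        return sum(series[s][m] for s in others) / len(others)

    # Step 2: fit on the known months only.
    # Stand-in model (assumption): a least-squares slope through the origin.
    slope = (sum(neighbour_mean(m) * series[target][m] for m in known)
             / sum(neighbour_mean(m) ** 2 for m in known))

    # Step 3: estimate the withheld months and score them against the truth.
    truth = [series[target][m] for m in missing]
    estimate = [slope * neighbour_mean(m) for m in missing]

    # Naive reference: fill every gap with the target's mean over known months.
    known_mean = sum(series[target][m] for m in known) / len(known)
    return {
        "n_missing": len(missing),
        "rmse_model": rmse(truth, estimate),
        "rmse_mean_fill": rmse(truth, [known_mean] * len(missing)),
    }


result = run_experiment("Antakya")
print(result)
```

Replacing the stand-in slope with a fitted regressor, and repeating the run for each station and each input combination, would give the kind of comparison the abstract reports. The mean-fill reference shows what an estimator loses when its estimates collapse toward the station mean.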