Potential of kernel and tree-based machine-learning models for estimating missing data of rainfall


Sattari M. T., Falsafian K., İRVEM A., S S., Qasem S. N.

Engineering Applications of Computational Fluid Mechanics, vol.14, no.1, pp.1078-1094, 2020 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 14 Issue: 1
  • Publication Date: 2020
  • Doi Number: 10.1080/19942060.2020.1803971
  • Journal Name: Engineering Applications of Computational Fluid Mechanics
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, INSPEC, Directory of Open Access Journals
  • Page Numbers: pp.1078-1094
  • Keywords: Eastern Mediterranean, machine learning, missing data, rainfall, random forest, Turkey
  • Hatay Mustafa Kemal University Affiliated: Yes

Abstract

This study compares two kernel-based models, Support Vector Regression (SVR) and Gaussian Process Regression (GPR), with two tree-based models, M5 and Random Forest (RF), for estimating missing monthly precipitation data at the Antakya, Dortyol, Iskenderun and Samandag stations, which are important precipitation stations in the Eastern Mediterranean region of Turkey. First, a randomly selected 10% of the precipitation data for the period 1980-2019 was treated as missing. Second, the missing data at each station were estimated from the data of the other stations under four data-combination scenarios. For the kernel-based SVR and GPR methods, the RBF kernel gave suitable results for the selected study area. Although the SVR and RF methods gave very similar estimates, SVR performed somewhat better than the other methods, particularly in terms of error minimization. The Gaussian-function-based GPR model generally tends to produce estimates close to the mean; this is its main disadvantage and the reason it was unsuccessful in the estimation process. Overall, the results showed that machine-learning algorithms are successful in estimating missing precipitation data.
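
The evaluation protocol described in the abstract (hide a random 10% of one station's monthly record, estimate it from the other stations, score the error) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation. The rainfall series are synthetic. The estimator is a plain least-squares line standing in for SVR, GPR, M5 or RF. The two station combinations are hypothetical, since the abstract does not list the four scenarios. RMSE is an assumed error measure. The helper names `hide_fraction`, `fit_simple_ols`, `rmse` and `evaluate` are invented for this sketch.

```python
import math
import random


def hide_fraction(n, fraction=0.10, seed=42):
    """Return the set of month indices treated as missing."""
    rng = random.Random(seed)
    return set(rng.sample(range(n), round(n * fraction)))


def fit_simple_ols(x, y):
    """Least-squares line y = intercept + slope * x (placeholder estimator)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def rmse(observed, estimated):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def evaluate(target, neighbours, hidden):
    """Fit on the visible months, then score estimates for the hidden months."""
    # Predictor: mean rainfall of the chosen neighbour stations in each month.
    x = [sum(month) / len(month) for month in zip(*neighbours)]
    visible = [i for i in range(len(target)) if i not in hidden]
    intercept, slope = fit_simple_ols(
        [x[i] for i in visible], [target[i] for i in visible]
    )
    held_out = sorted(hidden)
    # Assumption of this sketch: clip at zero, since rainfall cannot be negative.
    estimates = [max(0.0, intercept + slope * x[i]) for i in held_out]
    return rmse([target[i] for i in held_out], estimates)


# Synthetic monthly series: 40 years (1980-2019) x 12 months = 480 values.
rng = random.Random(0)
n_months = 480
regional = [max(0.0, rng.gauss(60.0, 40.0)) for _ in range(n_months)]
stations = {
    name: [max(0.0, scale * r + rng.gauss(0.0, 10.0)) for r in regional]
    for name, scale in [
        ("Antakya", 1.0),
        ("Dortyol", 0.9),
        ("Iskenderun", 0.7),
        ("Samandag", 0.8),
    ]
}

hidden = hide_fraction(n_months)  # 10% of Antakya's months treated as missing

# Hypothetical input combinations; the study's actual scenarios are not given here.
scenarios = {
    "one neighbour": ["Dortyol"],
    "three neighbours": ["Dortyol", "Iskenderun", "Samandag"],
}
results = {}
for label, names in scenarios.items():
    results[label] = evaluate(
        stations["Antakya"], [stations[s] for s in names], hidden
    )
    print(f"{label}: RMSE = {results[label]:.2f} mm")
```

Any regressor with a fit-and-predict interface could replace `fit_simple_ols` inside `evaluate`; the masking and scoring steps would stay the same.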