UTAR Institutional Repository

Robust Data Fusion Techniques Integrated Machine Learning Models For Estimating Reference Evapotranspiration

Chia, Min Yan (2022) Robust Data Fusion Techniques Integrated Machine Learning Models For Estimating Reference Evapotranspiration. PhD thesis, UTAR.

    Abstract

    Evapotranspiration (ET) is one of the most important hydrological processes because it strongly affects the environment’s energy balance and water budget. Accurate estimation of ET is vital for many national-level decision-making processes, including water resources allocation, irrigation scheduling and crop management. To date, many hydrological works still endorse the Penman-Monteith (PM) model as the standard for computing the reference evapotranspiration (ET0), as recommended by the Food and Agriculture Organization of the United Nations. ET0 can be converted to the actual crop ET (ETc) by including a crop-dependent factor. However, although the PM model is accepted as a universal method for determining ET0, it is often criticised for the large number of meteorological variables it requires. Many researchers have therefore resorted to machine learning models to overcome this pitfall of the PM model. Nonetheless, the literature review shows that machine learning models are data-hungry by nature, which increases the difficulty of training a model from scratch. This data hunger can be classified into two categories: qualitative hunger (the need for various features for training) and quantitative hunger (the need for a vast amount of historical data for training). This forms the major gap in the research field. The works presented in this thesis strive to solve the data hunger of machine learning models by integrating data fusion techniques, following a minimalistic approach that uses simple yet robust models. In addition, two scenarios (Scenario 2 and Scenario 3) were designed to evaluate the spatial robustness of the developed models, so that the dependency on local data can be discounted.
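To make the criticism about input requirements concrete, the following is a minimal Python sketch of the daily FAO-56 form of the PM equation, followed by the conversion from ET0 to ETc. The function and parameter names are illustrative and are not taken from the thesis; the input values and the crop coefficient of 0.9 are made-up example numbers. Several of the inputs (slope of the vapour pressure curve, psychrometric constant, vapour pressure deficit) must themselves be derived from temperature, humidity and pressure measurements, which is where the high variable count comes from.

```python
def penman_monteith_et0(delta, rn, g, gamma, t_mean, u2, vpd):
    """Daily reference evapotranspiration ET0 (mm/day), FAO-56 Penman-Monteith form.

    delta  : slope of the saturation vapour pressure curve (kPa/°C)
    rn     : net radiation at the crop surface (MJ/m²/day)
    g      : soil heat flux density (MJ/m²/day)
    gamma  : psychrometric constant (kPa/°C)
    t_mean : mean daily air temperature at 2 m height (°C)
    u2     : wind speed at 2 m height (m/s)
    vpd    : saturation vapour pressure deficit, es - ea (kPa)
    """
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * (900.0 / (t_mean + 273.0)) * u2 * vpd
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


def crop_et(et0, kc):
    """Crop evapotranspiration ETc = Kc * ET0, where Kc is the crop-dependent factor."""
    return kc * et0


# Illustrative inputs only (not data from the thesis).
et0 = penman_monteith_et0(delta=0.122, rn=13.28, g=0.0, gamma=0.0666,
                          t_mean=16.9, u2=2.078, vpd=0.589)
etc = crop_et(et0, kc=0.9)  # kc = 0.9 is a hypothetical crop coefficient
print(f"ET0 = {et0:.2f} mm/day, ETc = {etc:.2f} mm/day")
```

A machine learning model that estimates ET0 from a reduced set of variables would replace the first function, while the ETc conversion stays unchanged.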
This study was performed at 12 meteorological stations distributed across Peninsular Malaysia, where about 19.7 % of the land is covered by oil palm plantations (a major contributor to the country’s agricultural output), using meteorological data from 1st January 2000 to 31st December 2019. The multilayer perceptron (MLP), the support vector machine (SVM) and the adaptive neuro-fuzzy inference system (ANFIS) were used as the base models for obtaining the optimum input combinations as well as benchmark performances at each station. Three data fusion techniques were investigated: the data-centric bootstrap aggregating, the model-centric Bayesian modelling approach and the black-box-based non-linear neural ensemble (NNE). The results revealed that solar radiation (Rs) is the most essential variable for estimating ET0 in Peninsular Malaysia. The accuracy of the MLP, SVM and ANFIS estimations could be improved by including different complementary variables, which vary with the geographical characteristics of the meteorological stations. Bootstrap aggregating failed to enhance the performance of the MLP, SVM and ANFIS: the size of the dataset overwhelmed the problem’s dimensionality, rendering bootstrap aggregating ineffective. Bayesian model averaging (BMA) improved the estimations through ensembles of the base MLP, SVM and ANFIS, using Bayesian weight assignments to combine the favourable traits of the individual models. However, the BMA algorithm was found to be rigid: it was results-oriented and might omit some base models if their performance was significantly poorer than that of the others. This happened when the number of input meteorological variables was high, in which case the BMA effectively turned into Bayesian model selection (BMS).
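The way performance-driven weights can drop a weak base model is easy to see in a small example. The Python sketch below is a simplified stand-in for BMA, not the algorithm used in the thesis: each model's weight is taken as proportional to the Gaussian likelihood of its errors on a common dataset, with the error variance set to its maximum-likelihood value. The function names and all numbers are hypothetical.

```python
import math


def likelihood_weights(predictions, targets):
    """Return normalised model weights from Gaussian likelihoods of the errors.

    predictions: dict mapping a model name to its list of predictions
    targets:     list of observed values on the same samples
    """
    n = len(targets)
    log_likelihoods = {}
    for name, preds in predictions.items():
        sse = sum((p - t) ** 2 for p, t in zip(preds, targets))
        variance = max(sse / n, 1e-12)  # maximum-likelihood error variance (floored)
        log_likelihoods[name] = -0.5 * n * (math.log(2 * math.pi * variance) + 1.0)
    best = max(log_likelihoods.values())
    # Subtract the maximum before exponentiating to avoid underflow.
    unnormalised = {k: math.exp(v - best) for k, v in log_likelihoods.items()}
    total = sum(unnormalised.values())
    return {k: v / total for k, v in unnormalised.items()}


def weighted_average(predictions, weights):
    """Combine the model predictions sample by sample using the given weights."""
    n = len(next(iter(predictions.values())))
    return [sum(weights[k] * predictions[k][i] for k in predictions) for i in range(n)]


# Made-up ET0 values (mm/day); the "ANFIS" series is deliberately poor.
targets = [3.0, 4.0, 5.0, 4.5, 3.5, 4.2]
predictions = {
    "MLP":   [3.1, 3.9, 5.1, 4.4, 3.6, 4.1],
    "SVM":   [3.2, 3.8, 5.2, 4.3, 3.7, 4.0],
    "ANFIS": [4.0, 5.0, 6.0, 5.5, 4.5, 5.2],
}
weights = likelihood_weights(predictions, targets)
fused = weighted_average(predictions, weights)
print(weights)
```

Under this weighting the poorest model receives a weight close to zero, and with more samples the weights concentrate further on the single best model, which mirrors the shift from averaging to selection described above.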
Nevertheless, when the number of input meteorological variables was low, the BMA-based ensemble (BMA-E) produced satisfactory performance. As for the NNE, a novel meta-learner based on the stochastic-enabled extreme learning machine integrated with the whale optimisation algorithm (WOA-ELM) was developed and used in such an application for the first time. The results showed that the WOA-ELM-based ensemble (WOA-ELM-E) generally improved the performance of the base models. This was attributed to the flexibility of its structure and its ability to “look” at the target value once more during its training phase. The WOA-ELM-E was found to be the best model at most of the meteorological stations. Furthermore, when the best local models were tested at external stations (Scenario 2), only the WOA-ELM-E could produce estimations with satisfactory accuracy. The best models of other variants such as the BMA-E, MLP and ANFIS could only produce acceptable accuracy when applied in regions with similar geographical characteristics. In other words, the WOA-ELM-E can be said to have good spatial robustness, especially the one trained at Station 48620 (Sitiawan). This, in turn, could nullify the need for local model development and local data collection, consequently overcoming the qualitative and quantitative hungers of the classical machine learning models. Another scenario (Scenario 3) was designed to study the usefulness of globally pooled data in enhancing the spatial robustness of the WOA-ELM-E. The results showed that this approach produced a hybrid model with robustness similar to that of the one trained at Station 48620 (Sitiawan), and it was considered to be effective. In conclusion, the output of the research works reported in this thesis includes an approach for developing a one-for-all model to estimate ET0 across Peninsular Malaysia accurately. This can be regarded as the major contribution, as it could possibly eliminate the process of local data collection for the development or calibration of a local ET0 estimating model. Subsequently, the proposal and implementation of water resources improve the social welfare at a national level.
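The idea of a neural ensemble whose meta-learner sees the target again can be sketched as a stacked model: the base-model estimates become the inputs of a small extreme learning machine (ELM) that is fitted against the observed ET0. The Python sketch below is an assumption-laden illustration, not the thesis implementation. It omits the whale optimisation step entirely and simply draws the hidden weights at random, as in a basic ELM; the class name, the hyperparameters and the synthetic data are all hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the square system a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


class ELMMetaLearner:
    """Single-hidden-layer ELM that maps base-model estimates to a fused estimate."""

    def __init__(self, n_inputs, n_hidden=8, ridge=1e-3, seed=0):
        rng = random.Random(seed)
        # Hidden weights and biases are drawn once at random and never trained.
        self.w = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.ridge = ridge

    def _features(self, x):
        z = [(xi - m) / s for xi, m, s in zip(x, self.mean, self.std)]
        hidden = [math.tanh(sum(wi * zi for wi, zi in zip(w, z)) + b)
                  for w, b in zip(self.w, self.b)]
        return hidden + [1.0]  # constant column acts as an output bias

    def fit(self, X, y):
        # Standardise each input column so the tanh units do not saturate.
        cols = list(zip(*X))
        self.mean = [sum(c) / len(c) for c in cols]
        self.std = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
                    for c, m in zip(cols, self.mean)]
        H = [self._features(x) for x in X]
        k = len(H[0])
        # Ridge-regularised normal equations: (H^T H + ridge * I) beta = H^T y
        hth = [[sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0)
                for j in range(k)] for i in range(k)]
        hty = [sum(h[i] * t for h, t in zip(H, y)) for i in range(k)]
        self.beta = solve(hth, hty)
        return self

    def predict(self, X):
        return [sum(bj * hj for bj, hj in zip(self.beta, self._features(x))) for x in X]


# Synthetic stand-ins for base-model ET0 estimates (mm/day); not thesis data.
rng = random.Random(42)
target = [rng.uniform(2.0, 6.0) for _ in range(40)]
base_estimates = [[t + 0.4 + rng.gauss(0, 0.1),    # "MLP": biased high
                   0.85 * t + rng.gauss(0, 0.1),   # "SVM": under-scaled
                   t - 0.3 + rng.gauss(0, 0.1)]    # "ANFIS": biased low
                  for t in target]
meta = ELMMetaLearner(n_inputs=3).fit(base_estimates, target)
fused = meta.predict(base_estimates)
```

Only the output weights are solved for, so training reduces to one linear solve. Unlike a fixed weighted average, the hidden layer lets the meta-learner correct non-linear and station-specific biases of the base models, which is one plausible reading of the structural flexibility mentioned above.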

    Item Type: Final Year Project / Dissertation / Thesis (PhD thesis)
    Subjects: Q Science > QC Physics
    T Technology > T Technology (General)
    Divisions: Institute of Postgraduate Studies & Research > Lee Kong Chian Faculty of Engineering and Science (LKCFES) - Sg. Long Campus > Doctor of Philosophy in Engineering
    Depositing User: Sg Long Library
    Date Deposited: 26 Aug 2022 02:24
    Last Modified: 26 Aug 2022 02:24
    URI: http://eprints.utar.edu.my/id/eprint/4608
