Third Annual Report: Agricultural Sector Analysis in Thailand
The Thailand Agricultural Sector Analysis Program is a cooperative project between Iowa State University, the Ministry of Agriculture and Cooperatives (through its Division of Agricultural Economics) and USOM/Thailand. The project, which has now completed its third year, was initiated July 1, 1973, in response to direct requests by the Ministry of Agriculture and Cooperatives for cooperation and collaboration in the development and application of sector analysis models and methods that have practical utility in guiding future development of Thailand's agriculture at national, regional, and local levels.
The agricultural sector analysis planning activity is centered in the Division of Agricultural Economics (DAE) which is in the Office of the Under Secretary of State, Ministry of Agriculture and Cooperatives (MOAC), Royal Thai Government.
The purpose of the project is to provide Thai planners with an assessment of possible policy decisions at the Kingdom, region, or local level. The focal point of the project is the welfare of the 24 million people living in Thailand's rural households.
New variants of random forest-based methods for survival analysis and applications to biomedical datasets
Survival analysis problems involve predicting the time passed until the occurrence of an event of interest (the target variable), based on the values of some predictive features. Survival analysis is a specific type of supervised machine learning problem where the value of the target variable can be censored, meaning for some individuals, it may be known only that they survived (did not experience the event of interest) until a certain date, while it is unknown if the event of interest occurred after that date. Traditional supervised learning methods cannot directly cope with censored data, and so they need to be modified to properly address survival analysis problems.
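To make the notion of censoring concrete, the following minimal Python sketch represents each individual as a (time, event) pair and shows two naive ways of handling censored records. The data are hypothetical toy values, not drawn from the thesis datasets, and the names `Observation`, `naive_mean_time`, and `uncensored_only_mean_time` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    time: float   # follow-up time until the event or until censoring
    event: bool   # True = event observed; False = censored at `time`


# Hypothetical toy data (an assumption for illustration only).
data = [
    Observation(2.0, True),
    Observation(5.0, False),  # known only to have survived at least 5.0
    Observation(3.0, True),
    Observation(8.0, False),  # known only to have survived at least 8.0
    Observation(4.0, True),
]


def naive_mean_time(observations):
    """Treat every recorded time as an event time, ignoring censoring."""
    return sum(o.time for o in observations) / len(observations)


def uncensored_only_mean_time(observations):
    """Discard censored individuals and average the remaining event times."""
    times = [o.time for o in observations if o.event]
    return sum(times) / len(times)


print(naive_mean_time(data))            # censored times counted as events
print(uncensored_only_mean_time(data))  # censored individuals dropped
```

Both estimates are biased downwards on this toy data, because the two censored individuals are known to have survived at least as long as their recorded times. This is the kind of problem that motivates modifying standard supervised learners for survival analysis.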
This thesis focuses on the random forest algorithm, a popular and powerful supervised learning algorithm, and proposes new variants of random forest (RF) or RF-based algorithms for survival analysis.
The proposed RF or RF-based variants are evaluated on 11 survival analysis datasets created for this research. Most of these datasets were created by extracting relevant data from databases of longitudinal studies of ageing, and in each the target variable is the time passed until an individual is diagnosed with a certain age-related disease.
This thesis has three main contributions, which involve proposing three new types of variants of RF or RF-based algorithms to cope with censored data in survival analysis problems, as follows.
The first contribution is to propose new RF variants with a modified procedure for creating subsets of training data to be used for learning the decision trees in a RF. This involves replacing the censored value of a target variable by another value which is then treated as an uncensored target value, allowing the other parts of a traditional RF algorithm to be applied without modification. Experiments with the 11 survival analysis datasets have shown that the proposed RF variants improved predictive accuracy in general when compared with the standard RF and some standard statistical methods for survival analysis, with statistical significance in some cases. However, the proposed RF variants were outperformed by a standard random survival forest (RSF) algorithm, a powerful RF-based algorithm developed specifically for survival analysis.
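The general idea of replacing a censored target value with a surrogate that is then treated as uncensored can be sketched as follows. The abstract does not state which replacement value the thesis uses, so the rule below (the mean of the observed event times that exceed the censoring time) is purely a hypothetical stand-in, and the function name `impute_censored` and the toy data are assumptions.

```python
def impute_censored(times, events):
    """Return target values in which each censored time is replaced.

    Hypothetical rule (NOT the thesis's method): a censored time c is
    replaced by the mean of the observed event times greater than c.
    If no such event time exists, the censoring time is kept as-is.
    """
    event_times = [t for t, e in zip(times, events) if e]
    targets = []
    for t, e in zip(times, events):
        if e:
            targets.append(t)  # uncensored: keep the observed event time
            continue
        later = [u for u in event_times if u > t]
        targets.append(sum(later) / len(later) if later else t)
    return targets


# Hypothetical toy data: True = event observed, False = censored.
times = [2.0, 5.0, 3.0, 8.0, 4.0, 9.0, 7.0]
events = [True, False, True, False, True, True, True]

targets = impute_censored(times, events)
print(targets)
```

After this preprocessing step, `targets` contains one numeric value per individual, so an unmodified regression learner such as a standard RF could in principle be trained on it.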
Motivated by the good performance of the RSF algorithm in the previously mentioned experiments, the second contribution of this thesis is to propose several new variants of the RSF algorithm. The proposed RSF variants focus on modifying two major components of the standard RSF algorithm: the criterion used for feature selection at each node of each tree in the forest, and the procedure used for computing the target variable value predicted by each leaf node of each tree. Experiments with the 11 survival analysis datasets have shown that, although the variations in the feature-selection criterion did not lead to significant differences in predictive accuracy, one of the variations in the procedure for computing the values predicted at leaf nodes achieved in general significantly higher accuracies than the standard RSF algorithm and the popular Cox Proportional Hazards (PH) algorithm.
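For context on the leaf-node component, the standard RSF algorithm as commonly described estimates a cumulative hazard function in each leaf with the Nelson-Aalen estimator, computed from the training individuals that fall into that leaf. The sketch below illustrates that baseline only; the leaf-prediction variants proposed in the thesis are not specified in this abstract and are not reproduced here. The function name `nelson_aalen` and the toy leaf data are assumptions.

```python
def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard for the individuals in one leaf.

    Returns a list of (t, H(t)) pairs, one per distinct observed event
    time, where H(t) is the running sum of d_j / n_j: d_j events at
    time t_j divided by the n_j individuals still at risk at t_j.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    cumulative = 0.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if e and u == t)
        cumulative += n_events / at_risk
        curve.append((t, cumulative))
    return curve


# Hypothetical individuals reaching one leaf: True = event, False = censored.
leaf_times = [2.0, 3.0, 3.0, 5.0, 8.0]
leaf_events = [True, True, False, True, False]

leaf_chf = nelson_aalen(leaf_times, leaf_events)
for t, h in leaf_chf:
    print(t, round(h, 4))
```

Censored individuals never contribute an event, but they remain in the at-risk count until their censoring time, which is how the estimator uses partial information rather than discarding it.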
The third contribution is to propose several new variants of the Deep Survival Forest (DSF) algorithm, which learns a more complex survival analysis model by stacking several learned RSF models into layers, inspired by deep learning principles. The proposed DSF variants focus on the base RSF algorithm used to learn the RSF models at each layer. More precisely, the proposed DSF variants replace the standard RSF algorithm with one of the RSF variants proposed earlier in this thesis, as base learners in each layer. Experiments with the 11 survival analysis datasets have shown that one of the proposed DSF variants achieved significantly higher predictive accuracy than the popular Cox PH algorithm and somewhat higher accuracy than the standard DSF in general.
In summary, this research has proposed new variants of RF or RF-based algorithms for coping with censored data in survival analysis problems; and in general the proposed algorithm variants have been shown to be competitive with (sometimes significantly more accurate than) standard methods for survival analysis.