
    Hybrid dragonfly algorithm with neighbourhood component analysis and gradient tree boosting for crime rates modelling

    In crime studies, time series prediction of crime rates supports strategic crime prevention and decision making. Statistical models are commonly applied to predict time series crime rates. However, time series crime rate data are limited and mostly nonlinear, while statistical models are mainly linear and can only capture linear relationships. Thus, this study proposed a time series crime prediction model that can handle nonlinear components as well as limited historical crime rate data. Recently, Artificial Intelligence (AI) models have been favoured because they can handle nonlinear components and are robust to small sample sizes. Hence, the proposed crime model implemented an AI model, namely Gradient Tree Boosting (GTB), to model the crime rates. The crime rates were modelled using the United States (US) annual crime rates for eight crime types, together with nine factors that influence them. Since GTB performs no feature selection, this study proposed a hybridisation of Neighbourhood Component Analysis (NCA) and GTB (NCA-GTB) to identify the significant factors that influence the crime rates. It was also found that both NCA and GTB are sensitive to their input parameters. Thus, the DA2-NCA-eGTB model was proposed to improve the NCA-GTB model. The DA2-NCA-eGTB model hybridises a metaheuristic optimisation algorithm, namely the Dragonfly Algorithm (DA), with the NCA-GTB model to optimise the NCA and GTB parameters. In addition, the DA2-NCA-eGTB model also improved the accuracy of the NCA-GTB model by using Least Absolute Deviation (LAD) as the GTB loss function. The experimental results showed that the DA2-NCA-eGTB model outperformed existing AI models on all eight modelled crime types, as evidenced by smaller Mean Absolute Percentage Error (MAPE) values, which ranged between 2.9195 and 18.7471. In conclusion, the study showed that the DA2-NCA-eGTB model is statistically significant in representing all crime types and handles the nonlinear component in limited crime rate data well.
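    The GTB-with-LAD-loss component of the abstract above can be sketched with scikit-learn, where LAD corresponds to the built-in absolute-error loss. This is a minimal illustration on synthetic data only; the NCA feature-selection stage and the Dragonfly Algorithm parameter optimisation are not shown, and the variable names are hypothetical, not taken from the paper.

    ```python
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 9))  # 9 hypothetical influencing factors
    # synthetic "crime rate" target depending nonlinearly on two factors
    y = 50 + 5 * X[:, 0] + 3 * np.sin(X[:, 1]) + rng.normal(scale=0.5, size=40)

    # loss="absolute_error" is the Least Absolute Deviation (LAD) loss
    model = GradientBoostingRegressor(
        loss="absolute_error", n_estimators=200, learning_rate=0.05, random_state=0
    )
    model.fit(X[:30], y[:30])
    pred = model.predict(X[30:])

    # Mean Absolute Percentage Error, the metric reported in the abstract
    mape = float(np.mean(np.abs((y[30:] - pred) / y[30:])) * 100)
    print(round(mape, 4))
    ```

    In the paper, the GTB hyperparameters set by hand here (number of estimators, learning rate) are instead tuned by the Dragonfly Algorithm.
    
    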

    Learning from Heterogeneous Sources via Gradient Boosting Consensus

    Multiple data sources containing different types of features may be available for a given task. For instance, users' profiles can be used to build recommendation systems. In addition, a model can also use users' historical behaviors and social networks to infer users' interests in related products. We argue that it is desirable to collectively use any available multiple heterogeneous data sources in order to build effective learning models. We call this framework heterogeneous learning. In our proposed setting, data sources can include (i) non-overlapping features, (ii) non-overlapping instances, and (iii) multiple networks (i.e. graphs) that connect instances. In this paper, we propose a general optimization framework for heterogeneous learning, and devise a corresponding learning model from gradient boosting. The idea is to minimize the empirical loss with two constraints: (1) there should be consensus among the predictions of overlapping instances (if any) from different data sources; (2) connected instances in graph datasets should have similar predictions. The objective function is solved by stochastic gradient boosting trees. Furthermore, a weighting strategy is designed to emphasize informative data sources and de-emphasize the noisy ones. We formally prove that the proposed strategy leads to a tighter error bound. This approach consistently outperforms a standard concatenation of data sources on movie rating prediction, number recognition, and terrorist attack detection tasks. We observe that the proposed model can improve the out-of-sample error rate by as much as 80%.
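    The source-weighting idea in the abstract above can be illustrated in a much-simplified form: fit one boosted model per data view, then weight each view inversely to its validation error so that noisy sources are de-emphasised. This sketch is an assumption-laden stand-in, not the paper's full consensus optimisation, and all names and data here are synthetic.

    ```python
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(1)
    n = 120
    y = rng.normal(size=n)
    view_a = y[:, None] + rng.normal(scale=0.2, size=(n, 3))  # informative view
    view_b = rng.normal(size=(n, 3))                          # pure-noise view

    train, val = slice(0, 80), slice(80, None)
    weights, preds = [], []
    for X in (view_a, view_b):
        m = GradientBoostingRegressor(random_state=0).fit(X[train], y[train])
        p = m.predict(X[val])
        err = np.mean(np.abs(p - y[val]))
        weights.append(1.0 / (err + 1e-9))  # lower error -> higher weight
        preds.append(p)

    weights = np.array(weights) / np.sum(weights)
    consensus = sum(w * p for w, p in zip(weights, preds))
    print(weights.round(3))  # the informative view should receive most weight
    ```

    The paper's actual method additionally enforces prediction consensus across overlapping instances and smoothness over graph-connected instances inside the boosting objective, which this per-view weighting does not capture.
    
    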

    Advances in knowledge discovery and data mining Part II

    19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II