
    Hybrid dragonfly algorithm with neighbourhood component analysis and gradient tree boosting for crime rates modelling

    In crime studies, time series prediction of crime rates supports strategic crime prevention and decision making. Statistical models are commonly applied to predict time series crime rates. However, time series crime rate data are limited and mostly nonlinear, while statistical models are mainly linear and can only capture linear relationships. Thus, this study proposed a time series crime prediction model that can handle nonlinear components as well as limited historical crime rate data. Recently, Artificial Intelligence (AI) models have been favoured because they can handle nonlinear components and are robust to small sample sizes. Hence, the proposed crime model implemented an AI model, namely Gradient Tree Boosting (GTB), to model the crime rates. The crime rates were modelled using the United States (US) annual crime rates for eight crime types, together with nine factors that influence them. Since GTB performs no feature selection, this study proposed a hybridisation of Neighbourhood Component Analysis (NCA) and GTB (NCA-GTB) to identify the significant factors that influence the crime rates. It was also found that both NCA and GTB are sensitive to their input parameters. Thus, the DA2-NCA-eGTB model was proposed to improve the NCA-GTB model. The DA2-NCA-eGTB model hybridises a metaheuristic optimisation algorithm, namely the Dragonfly Algorithm (DA), with the NCA-GTB model to optimise the NCA and GTB parameters. In addition, the DA2-NCA-eGTB model also improved the accuracy of the NCA-GTB model by using Least Absolute Deviation (LAD) as the GTB loss function. The experimental results showed that the DA2-NCA-eGTB model outperformed existing AI models on all eight modelled crime types, as evidenced by smaller Mean Absolute Percentage Error (MAPE) values, which ranged between 2.9195 and 18.7471. In conclusion, the study showed that the DA2-NCA-eGTB model is statistically significant in representing all crime types and handles the nonlinear component in limited crime rate data well.
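    The GTB-with-LAD-loss component of the abstract above can be sketched with scikit-learn, where LAD corresponds to the built-in absolute-error loss. This is a minimal illustration on synthetic data only; the NCA feature-selection stage and the Dragonfly Algorithm parameter optimisation are not shown, and the variable names are hypothetical, not taken from the paper.

    ```python
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 9))  # 9 hypothetical influencing factors
    # synthetic "crime rate" target depending nonlinearly on two factors
    y = 50 + 5 * X[:, 0] + 3 * np.sin(X[:, 1]) + rng.normal(scale=0.5, size=40)

    # loss="absolute_error" is the Least Absolute Deviation (LAD) loss
    model = GradientBoostingRegressor(
        loss="absolute_error", n_estimators=200, learning_rate=0.05, random_state=0
    )
    model.fit(X[:30], y[:30])
    pred = model.predict(X[30:])

    # Mean Absolute Percentage Error, the metric reported in the abstract
    mape = float(np.mean(np.abs((y[30:] - pred) / y[30:])) * 100)
    print(round(mape, 4))
    ```

    In the paper, the GTB hyperparameters set by hand here (number of estimators, learning rate) are instead tuned by the Dragonfly Algorithm.
    
    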

    Learning from Heterogeneous Sources via Gradient Boosting Consensus

    Multiple data sources containing different types of features may be available for a given task. For instance, users' profiles can be used to build recommendation systems. In addition, a model can also use users' historical behaviors and social networks to infer users' interests in related products. We argue that it is desirable to collectively use any available multiple heterogeneous data sources in order to build effective learning models. We call this framework heterogeneous learning. In our proposed setting, data sources can include (i) non-overlapping features, (ii) non-overlapping instances, and (iii) multiple networks (i.e. graphs) that connect instances. In this paper, we propose a general optimization framework for heterogeneous learning, and devise a corresponding learning model from gradient boosting. The idea is to minimize the empirical loss with two constraints: (1) there should be consensus among the predictions of overlapping instances (if any) from different data sources; (2) connected instances in graph datasets should have similar predictions. The objective function is solved by stochastic gradient boosting trees. Furthermore, a weighting strategy is designed to emphasize informative data sources and de-emphasize the noisy ones. We formally prove that the proposed strategy leads to a tighter error bound. This approach consistently outperforms a standard concatenation of data sources on movie rating prediction, number recognition, and terrorist attack detection tasks. We observe that the proposed model can improve the out-of-sample error rate by as much as 80%.
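    The source-weighting idea in the abstract above can be illustrated in a much-simplified form: fit one boosted model per data view, then weight each view inversely to its validation error so that noisy sources are de-emphasised. This sketch is an assumption-laden stand-in, not the paper's full consensus optimisation, and all names and data here are synthetic.

    ```python
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(1)
    n = 120
    y = rng.normal(size=n)
    view_a = y[:, None] + rng.normal(scale=0.2, size=(n, 3))  # informative view
    view_b = rng.normal(size=(n, 3))                          # pure-noise view

    train, val = slice(0, 80), slice(80, None)
    weights, preds = [], []
    for X in (view_a, view_b):
        m = GradientBoostingRegressor(random_state=0).fit(X[train], y[train])
        p = m.predict(X[val])
        err = np.mean(np.abs(p - y[val]))
        weights.append(1.0 / (err + 1e-9))  # lower error -> higher weight
        preds.append(p)

    weights = np.array(weights) / np.sum(weights)
    consensus = sum(w * p for w, p in zip(weights, preds))
    print(weights.round(3))  # the informative view should receive most weight
    ```

    The paper's actual method additionally enforces prediction consensus across overlapping instances and smoothness over graph-connected instances inside the boosting objective, which this per-view weighting does not capture.
    
    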

    Advances in knowledge discovery and data mining Part II

    19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II