55 research outputs found

    Hybrid expert ensembles for identifying unreliable data in citizen science

    Citizen science utilises public resources for scientific research. BirdTrack is one such project, established in 2004 by the British Trust for Ornithology (BTO) so that the public can log their bird observations through its web or mobile applications. It has accumulated over 40 million observations. However, the veracity of these observations needs to be checked, and the current process involves time-consuming interventions by human experts. This research therefore aims to develop a more efficient system that automatically identifies unreliable observations in a large volume of records. This paper presents a novel approach: a Hybrid Expert Ensemble System (HEES) that combines an Expert System (ES) and machine-induced models to perform the intended task. The ES is built on human expertise and used as a base member of the ensemble. The other members are decision trees induced from county-based data. The HEES uses accuracy and diversity as criteria to select its members, with the aim of improving its accuracy and reliability. The experiments were carried out using the county-based data, and the results indicate that (1) the performance of the expert system is reasonable for some counties but varies considerably for others, and (2) an HEES is more accurate and reliable than the Expert System and the other individual models, with a Sensitivity of 85% for correctly identifying unreliable observations and a Specificity of 99% for reliable observations. These results demonstrate that the proposed approach can serve as an alternative or additional means of validating the observations in a timely and cost-effective manner, and that it has the potential to be applied in other citizen science projects where huge amounts of data need to be checked effectively and efficiently.
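
The Sensitivity and Specificity figures quoted in this abstract can be made concrete with a small sketch. The code below computes both metrics and combines three members by simple majority vote. The abstract does not say how the HEES combines its members, so the majority vote, the function names, and the toy labels (1 = unreliable, 0 = reliable) are all assumptions made for illustration, not the authors' method or data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = unreliable.

    Assumes both classes occur in y_true, so neither denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def majority_vote(member_preds):
    """Combine per-member prediction lists by strict majority (assumed rule)."""
    n_members = len(member_preds)
    return [1 if 2 * sum(votes) > n_members else 0 for votes in zip(*member_preds)]


# Made-up toy data: three unreliable and five reliable observations.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
expert = [1, 0, 1, 0, 0, 1, 0, 0]   # hypothetical expert-system output
tree_a = [1, 1, 0, 0, 0, 0, 0, 0]   # hypothetical county-based tree
tree_b = [0, 1, 1, 0, 1, 0, 0, 0]   # hypothetical county-based tree

ensemble = majority_vote([expert, tree_a, tree_b])
expert_scores = sensitivity_specificity(y_true, expert)
ensemble_scores = sensitivity_specificity(y_true, ensemble)
```

In this toy case each member makes different mistakes, so the vote corrects them; that is the intuition behind using diversity as a selection criterion alongside accuracy.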

    Randomness In Tree Ensemble Methods

    Tree ensembles have proven to be a popular and powerful tool for predictive modeling tasks. The theory behind several of these methods (e.g. boosting) has received considerable attention. However, other tree ensemble techniques (e.g. bagging, random forests) have attracted limited theoretical treatment. Specifically, it has remained somewhat unclear why the simple act of randomizing the tree-growing algorithm should lead to such dramatic improvements in performance. It has been suggested that a specific type of tree ensemble acts by forming a locally adaptive distance metric [Lin and Jeon, 2006]. We generalize this claim to include all tree ensemble methods and argue that this insight can help to explain the exceptional performance of tree ensemble methods. Finally, we illustrate the use of tree ensemble methods with an ecological niche modeling example involving the presence of malaria vectors in Africa.

    Essays On Random Forest Ensembles

    A random forest is a popular machine learning ensemble method that has proven successful in solving a wide range of classification problems. While other successful classifiers, such as boosting algorithms or neural networks, admit natural interpretations as maximum likelihood, a suitable statistical interpretation is much more elusive for a random forest. In the first part of this thesis, we demonstrate that a random forest is a fruitful framework in which to study AdaBoost and deep neural networks. We explore the concept and utility of interpolation, the ability of a classifier to perfectly fit its training data. In the second part of this thesis, we place a random forest on more sound statistical footing by framing it as kernel regression with the proximity kernel. We then analyze the parameters that control the bandwidth of this kernel and discuss useful generalizations
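
The kernel-regression view mentioned in this abstract can be sketched in a few lines. The code below assumes the usual definition of random forest proximity (the fraction of trees in which two points fall in the same leaf) and uses it as the weight in a weighted average of training targets. The toy "trees" are just lists of split thresholds on a single feature; they, the function names, and the data are hypothetical stand-ins, not the thesis's construction.

```python
from bisect import bisect_right


def leaf_id(thresholds, x):
    """Index of the interval (leaf) that x falls into for one toy tree."""
    return bisect_right(thresholds, x)


def proximity(forest, a, b):
    """Fraction of trees in which a and b land in the same leaf."""
    same = sum(leaf_id(tree, a) == leaf_id(tree, b) for tree in forest)
    return same / len(forest)


def kernel_predict(forest, xs, ys, query):
    """Proximity-weighted average of training targets (kernel regression)."""
    weights = [proximity(forest, query, x) for x in xs]
    total = sum(weights)
    if total == 0:
        # No training point shares a leaf with the query: fall back to the mean.
        return sum(ys) / len(ys)
    return sum(w * y for w, y in zip(weights, ys)) / total


# Hypothetical forest: each tree is a sorted list of split points on one feature.
forest = [[2.0], [3.0], [2.5, 4.0]]
xs = [1.0, 2.2, 3.5, 5.0]
ys = [0.0, 1.0, 1.0, 0.0]

prediction = kernel_predict(forest, xs, ys, 2.2)
```

In this toy setting, coarser trees (fewer splits, larger leaves) make more pairs of points share a leaf, which acts like widening the kernel's bandwidth.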

    Estudio de métodos de construcción de ensembles de clasificadores y aplicaciones (Study of methods for building classifier ensembles, and applications)

    Artificial intelligence is concerned with creating computer systems that behave intelligently. Within this area, machine learning studies the creation of systems that learn by themselves. One type of machine learning is supervised learning, in which the system is given both the inputs and the expected output and learns from these data. A system of this kind is called a classifier. Sometimes, in the set of examples the system uses to learn, the number of examples of one type is much larger than the number of examples of another type. When this happens, the dataset is said to be imbalanced. A combination of several classifiers is called an "ensemble", and it often gives better results than any of its individual members. One of the keys to good ensemble performance is diversity. This thesis focuses on the development of new ensemble construction algorithms, centred on techniques for increasing diversity and on imbalanced problems. Additionally, these techniques are applied to the solution of several industrial problems. Ministerio de Economía y Competitividad, project TIN-2011-2404

    Application of machine learning to agricultural soil data

    Agriculture is a major sector in the Indian economy. One key advantage of classifying and predicting soil parameters is that it saves the time of specialized technicians who would otherwise perform expensive chemical analyses. In this context, this PhD thesis was developed in three stages: 1. Classification for soil data: we used chemical soil measurements to classify many relevant soil parameters: village-wise fertility indices; soil pH and type; soil nutrients, in order to recommend suitable amounts of fertilizers; and preferable crop. 2. Regression for generic data: we carried out an experimental comparison of many regressors on a large collection of generic datasets selected from the University of California at Irvine (UCI) machine learning repository. 3. Regression for soil data: we applied the regressors used in stage 2 to the soil datasets, predicting their numeric values directly. The accuracy of the prediction was evaluated for the ten soil problems, as an alternative to the prediction of the quantified values (classification) developed in stage 1.

    Comparative Uncertainty Visualization for High-Level Analysis of Scalar- and Vector-Valued Ensembles

    With this thesis, I contribute to the research field of uncertainty visualization, considering parameter dependencies in multi-valued fields and the uncertainty of automated data analysis. Like uncertainty visualization in general, both of these fields are becoming more and more important due to increasing computational power, the growing importance and availability of complex models and collected data, and progress in artificial intelligence. I contribute in the following application areas: Uncertain Topology of Scalar Field Ensembles. The generalization of topology-based visualizations to multi-valued data involves many challenges. An example is the comparative visualization of multiple contour trees, complicated by the random nature of prevalent contour tree layout algorithms. I present a novel approach for the comparative visualization of contour trees: the Fuzzy Contour Tree. Uncertain Topological Features in Time-Dependent Scalar Fields. Tracking features in time-dependent scalar fields is an active field of research, where most approaches rely on the comparison of consecutive time steps. I created a more holistic visualization for time-varying scalar field topology by adapting Fuzzy Contour Trees to the time-dependent setting. Uncertain Trajectories in Vector Field Ensembles. Visitation maps are an intuitive and well-known visualization of uncertain trajectories in vector field ensembles. For large ensembles, visitation maps are not applicable, or only with extensive time requirements. I developed Visitation Graphs, a new representation and data reduction method for vector field ensembles that can be calculated in situ and is an optimal basis for the efficient generation of visitation maps. This is accomplished by moving computation time forward into pre-processing. Visually Supported Anomaly Detection in Cyber Security. Numerous cyber attacks and the increasing complexity of networks and their protection necessitate the application of automated data analysis in cyber security. Due to uncertainty in automated anomaly detection, the results need to be communicated to analysts to ensure appropriate reactions. I introduce a visualization system combining device readings and anomaly detection results: the Security in Process System. To further support analysts, I developed an application-agnostic framework that supports the integration of knowledge assistance and applied it to the Security in Process System. I present this Knowledge Rocks Framework, its application, and the results of evaluations for both the original and the knowledge-assisted Security in Process System. For all presented systems, I provide implementation details, illustrations and applications.

    Aggregating Evidence in Climate Science: Consilience, Robustness and the Wisdom of Multiple Models

    The goal of this dissertation is to contribute to the epistemology of science by addressing a set of related questions arising from current discussions in the philosophy and science of climate change: (1) Given the imperfection of computer models, how do they provide information about large and complex target systems? (2) What is the relationship between consilient reasoning and robust evidential support in the production of scientific knowledge? (3) Does taking the mean of a set of model outputs provide epistemic advantages over using the output of a single ‘best model’? Synthesizing research in philosophy and science, the thesis analyzes connections among consilient inductions, robustness analysis, and the aggregation of various sources of evidence, including computer simulations, by investigating case studies of climate change that exemplify the strength of consilient reasoning and the security of robust evidential support. It also explains the rationale and epistemic conditions for improving estimates by averaging multiple estimates, comparing a simple case of averaging estimates to practices in multi-model ensemble studies. I argue: (A) the concepts of consilience and robustness account for the strength and security of inferences that rely on imperfect computer modelling methods, (B) consilient reasoning is conducive to attaining robust evidential support, and (C) an analogy can explain why averaging the outputs of multiple models can improve estimates of a target system, given that conditions of model independence, skill and unequal weighting are taken into account
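
The "simple case of averaging estimates" mentioned in this abstract can be illustrated with a short simulation. The sketch below assumes that each model's output is the true value plus independent, zero-mean Gaussian noise and that all models are weighted equally; these are simplifying assumptions chosen for illustration (the abstract itself stresses that independence, skill and weighting must be taken into account), and all names and parameter values are hypothetical.

```python
import random
import statistics


def rmse(estimates, truth):
    """Root-mean-square error of a list of estimates against a known truth."""
    return (sum((e - truth) ** 2 for e in estimates) / len(estimates)) ** 0.5


def simulate(n_models=5, n_trials=2000, truth=3.0, noise_sd=1.0, seed=0):
    """Compare the error of one model with the error of the ensemble mean.

    Assumes independent, unbiased Gaussian errors and equal weights.
    """
    rng = random.Random(seed)
    single, averaged = [], []
    for _ in range(n_trials):
        outputs = [truth + rng.gauss(0.0, noise_sd) for _ in range(n_models)]
        single.append(outputs[0])                   # one arbitrary model
        averaged.append(statistics.fmean(outputs))  # equal-weight ensemble mean
    return rmse(single, truth), rmse(averaged, truth)


single_err, mean_err = simulate()
```

Under these assumptions the error of the mean shrinks roughly as one over the square root of the number of models. If the models' errors are correlated or share a common bias, the gain from averaging is smaller, which is why the independence condition matters.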

    Advances towards behaviour-based indoor robotic exploration

    215 p. The main contributions of this research work lie in object recognition by computer vision, on the one hand, and in robot localisation and mapping, on the other. The first contribution area of the research addresses object recognition in mobile robots. In this area, door handle recognition is of great importance, as it helps the robot to identify doors in places where the camera is not able to view the whole door. This research presents a new two-step algorithm based on feature extraction that refines the extracted features to reduce the number of superfluous keypoints to be compared, while increasing efficiency by improving accuracy and reducing computation time. In contrast to segmentation-based paradigms, the feature-extraction-based two-step method can easily be generalized to other types of handles or even to other types of objects, such as road signals. Experiments have shown very good accuracy when tested in real environments with different kinds of door handles. With respect to the second contribution, a new technique is presented for constructing a topological map during the exploration phase that a robot performs in an unseen office-like environment. First, a preliminary approach is proposed that integrates Markovian localisation into a distributed system; it requires low storage and computational resources and is suitable for dynamic environments. In the same area, a second contribution, at the terrain-inspection level of behaviour-based navigation, concerns the development of an automatic mapping method for acquiring the procedural topological map. The new approach is based on a typicality test called INCA to perform the so-called loop-closing action. The method was integrated in a behaviour-based control architecture and tested in both simulated and real robot/environment systems. The developed system also proved to be useful for localisation purposes.