2,069 research outputs found

    Log file analysis for disengagement detection in e-Learning environments

    Random Forests for Big Data

    Big Data is one of the major challenges of statistical science and has numerous consequences from algorithmic and theoretical viewpoints. Big Data always involve massive data, but they also often include online data and data heterogeneity. Recently some statistical methods have been adapted to process Big Data, such as linear regression models, clustering methods and bootstrapping schemes. Based on decision trees combined with aggregation and bootstrap ideas, random forests were introduced by Breiman in 2001. They are a powerful nonparametric statistical method that handles, in a single versatile framework, regression problems as well as two-class and multi-class classification problems. Focusing on classification problems, this paper proposes a selective review of available proposals that deal with scaling random forests to Big Data problems. These proposals rely on parallel environments or on online adaptations of random forests. We also describe how related quantities, such as out-of-bag error and variable importance, are addressed in these methods. Then, we formulate various remarks for random forests in the Big Data context. Finally, we experiment with five variants on two massive datasets (15 and 120 million observations), a simulated one as well as real-world data. One variant relies on subsampling, while three others are related to parallel implementations of random forests and involve either various adaptations of the bootstrap to Big Data or "divide-and-conquer" approaches. The fifth variant relates to online learning of random forests. These numerical experiments highlight the relative performance of the different variants, as well as some of their limitations.
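
    The out-of-bag idea mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's code: the function names `bootstrap_split` and `mean_oob_fraction` are hypothetical, and only the Python standard library is assumed. Each tree in a bagged ensemble is trained on a bootstrap sample of the n observations; the observations never drawn form that tree's out-of-bag set, and for large n this set holds roughly (1 - 1/n)^n, or about 1/e, of the data.

```python
import random


def bootstrap_split(n, rng):
    """Draw n indices with replacement; return (in_bag, out_of_bag) index lists."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    # Observations never drawn are "out-of-bag" for this bootstrap sample.
    out_of_bag = [i for i in range(n) if i not in chosen]
    return in_bag, out_of_bag


def mean_oob_fraction(n, n_trees, seed=0):
    """Average share of observations left out-of-bag over n_trees bootstrap samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        _, out_of_bag = bootstrap_split(n, rng)
        total += len(out_of_bag) / n
    return total / n_trees


if __name__ == "__main__":
    # Illustrative sizes only; the value should be close to 1/e.
    print(mean_oob_fraction(n=10_000, n_trees=50))
```

    In a forest, the out-of-bag error is obtained by predicting each observation only with the trees for which it was out-of-bag, which is why this set matters when the bootstrap is adapted or replaced by subsampling.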

    A Classification System for Diabetic Patients with Machine Learning Techniques

    Diabetes mellitus (DM) is a group of metabolic disorders characterized by high levels of blood glucose over a prolonged period. It results from defects in insulin production or from an improper response of the cells to the insulin produced. It is one of the most significant public health care challenges worldwide. Diabetes occurs when the pancreas does not produce enough of the hormone insulin or the human body is not able to use the insulin properly. The diagnosis of diabetes (diagnosis, etiopathophysiology, therapy, etc.) requires generating and processing vast amounts of data. Data mining techniques have proven their usefulness and effectiveness in uncovering unknown relationships or patterns, if any exist, in such vast data. In the present work, five techniques based on machine learning, namely AdaBoost, LogicBoost, RobustBoost, Naïve Bayes and Bagging, have been proposed for the analysis and prediction of DM patients. The proposed techniques are applied to the Pima Indians Diabetes data set. The results are found to be very accurate, with classification accuracies of 81.77% and 79.69% for the bagging and AdaBoost techniques, respectively. Hence, the proposed techniques are well suited, effective and efficient for predicting DM.

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers, since the class boundary must be defined with knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, the algorithms used and the application domains. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies, with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research.

    Statistics in the Big Data era

    It is estimated that about 90% of the currently available data have been produced over the last two years. Of these, only 0.5% is effectively analysed and used. However, these data can be a great source of wealth, the oil of the 21st century, when analysed with the right approach. In this article, we illustrate some specificities of these data and the great interest that they can represent in many fields. Then we consider some challenges to statistical analysis that emerge from their analysis, suggesting some strategies.

    Review—Machine Learning Techniques in Wireless Sensor Network Based Precision Agriculture

    The use of sensors and the Internet of Things (IoT) is key to moving the world's agriculture to a more productive and sustainable path. Recent advancements in IoT, Wireless Sensor Networks (WSN), and Information and Communication Technology (ICT) have the potential to address some of the environmental, economic, and technical challenges as well as opportunities in this sector. As the number of interconnected devices continues to grow, this generates more big data with multiple modalities and spatial and temporal variations. Intelligent processing and analysis of this big data are necessary for developing a higher level of knowledge base and insights that result in better decision making, forecasting, and reliable management of sensors. This paper is a comprehensive review of the application of different machine learning algorithms in sensor data analytics within the agricultural ecosystem. It further discusses a case study on an IoT-based, data-driven smart farm prototype as an integrated food, energy, and water (FEW) system.

    A Review of Machine Learning Approaches for Real Estate Valuation

    Real estate managers must identify the value of properties in their current market. Traditionally, this involved simple data analysis with adjustments made based on the manager's experience. Given the amount of money currently involved in these decisions, and the complexity and speed at which valuation decisions must be made, machine learning technologies provide a newer alternative for property valuation that could improve upon traditional methods. This study uses a systematic literature review methodology to identify published studies from the past two decades where specific machine learning technologies have been applied to the property valuation task. We develop a data, reasoning, usefulness (DRU) framework that provides a set of theoretical and practice-based criteria for a multi-faceted performance assessment of each system. This assessment provides the basis for identifying the current state of research in this domain as well as theoretical and practical implications and directions for future research.