2,138 research outputs found

Log file analysis for disengagement detection in e-Learning environments

Author: Cocea Mihaela
Weibelzahl S.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/10/2009

Portsmouth University Research Portal (Pure)

Comparing Machine Learning Strategies for SoH Estimation of Lithium-Ion Batteries Using a Feature-Based Approach

Author: Cristaldi L
Faifer M
Marri I
Petkovski E
Publication date: 01/01/2023

Lithium-ion batteries play a vital role in many systems and applications, making them the most commonly used battery energy storage systems. Optimizing their usage requires accurate state-of-health (SoH) estimation, which provides insight into the performance level of the battery and improves the precision of other diagnostic measures, such as state of charge. In this paper, the classical machine learning (ML) strategies of multiple linear and polynomial regression, support vector regression (SVR), and random forest are compared for the task of battery SoH estimation. These ML strategies were selected because they represent a good compromise between light computational effort, applicability, and accuracy of results. The best results were produced using SVR, followed closely by multiple linear regression. This paper also discusses the feature selection process based on the partial charging time between different voltage intervals and shows the linear dependence of these features with capacity reduction. The feature selection, parameter tuning, and performance evaluation of all models were completed using a dataset from the Prognostics Center of Excellence at NASA, considering three batteries in the dataset
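The feature described in this abstract, the partial charging time between two voltage levels, can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' pipeline: the function names `partial_charging_time` and `fit_line`, the linear interpolation between samples, and the made-up charging curve are all hypothetical. A real study would read the curves from the dataset and regress capacity on these features.

```python
from statistics import mean


def partial_charging_time(times, voltages, v_low, v_high):
    """Time spent charging from v_low to v_high.

    Uses the first upward crossing of each level, with linear
    interpolation between consecutive samples (an assumption).
    """
    def crossing(level):
        points = list(zip(times, voltages))
        for (t0, v0), (t1, v1) in zip(points, points[1:]):
            if v1 > v0 and v0 <= level <= v1:
                return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
        raise ValueError(f"voltage never crosses {level}")

    return crossing(v_high) - crossing(v_low)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b for one feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


# Made-up charging curve (seconds, volts), for illustration only.
times = [0.0, 3000.0, 6000.0]
voltages = [3.0, 3.6, 4.2]
dt = partial_charging_time(times, voltages, 3.8, 4.0)  # roughly 1000 s here
```

Computing `dt` for each charge cycle and passing the resulting feature values with the measured capacities to `fit_line` would give the single-feature version of the linear model the abstract compares against SVR and random forest.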

Archivio istituzionale della ricerca - Politecnico di Milano

Random Forests for Big Data

Author: Genuer Robin
Poggi Jean-Michel
Tuleau-Malot Christine
Villa-Vialaneix Nathalie
Publication date: 19/11/2015

Big Data is one of the major challenges of statistical science and has numerous consequences from algorithmic and theoretical viewpoints. Big Data always involve massive data, but they also often include online data and data heterogeneity. Recently, some statistical methods have been adapted to process Big Data, such as linear regression models, clustering methods and bootstrapping schemes. Based on decision trees combined with aggregation and bootstrap ideas, random forests were introduced by Breiman in 2001. They are a powerful nonparametric statistical method that handles, in a single and versatile framework, regression problems as well as two-class and multi-class classification problems. Focusing on classification problems, this paper proposes a selective review of available proposals that deal with scaling random forests to Big Data problems. These proposals rely on parallel environments or on online adaptations of random forests. We also describe how related quantities, such as out-of-bag error and variable importance, are addressed in these methods. Then, we formulate various remarks for random forests in the Big Data context. Finally, we experiment with five variants on two massive datasets (15 and 120 million observations), one simulated and one of real-world data. One variant relies on subsampling, while three others are related to parallel implementations of random forests and involve either various adaptations of the bootstrap to Big Data or "divide-and-conquer" approaches. The fifth variant relies on online learning of random forests. These numerical experiments highlight the relative performance of the different variants, as well as some of their limitations.
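The bootstrap and out-of-bag ideas mentioned in this abstract can be illustrated with a small sketch. It shows only how one bootstrap sample leaves some observations "out of bag"; it is not any of the five variants the paper evaluates, and the helper names `bootstrap_indices` and `out_of_bag` are hypothetical.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n indices with replacement from range(n)."""
    return [rng.randrange(n) for _ in range(n)]


def out_of_bag(n, sample):
    """Indices never drawn in the bootstrap sample."""
    drawn = set(sample)
    return [i for i in range(n) if i not in drawn]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 10_000
sample = bootstrap_indices(n, rng)
oob = out_of_bag(n, sample)

# For large n the out-of-bag share approaches 1/e (about 0.368),
# which is why each tree can be checked on data it never saw.
oob_fraction = len(oob) / n
```

In a forest, each tree would be grown on its own `sample` and the out-of-bag error would be computed by predicting each observation only with the trees whose `oob` list contains it.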

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

ProdInra

Hal-Diderot

A Classification System for Diabetic Patients with Machine Learning Techniques

Author: Rawat Vandana
Suryakant Suryakant
Publication venue: 'International Journal of Mathematical, Engineering and Management Sciences plus Mangey Ram'
Publication date: 01/06/2019

Diabetes mellitus (DM) is a group of metabolic disorders characterized by high blood glucose levels sustained over a prolonged period. It results from a defect in insulin production or from an improper response of the cells to the insulin produced. It is one of the most significant public health care challenges worldwide. Diabetes occurs when the pancreas does not produce enough of the hormone insulin or when the body is unable to use insulin properly. The study of diabetes (diagnosis, etiopathophysiology, therapy, etc.) requires generating and processing vast amounts of data. Data mining techniques have proven useful and effective for uncovering unknown relationships or patterns, if any exist, in such data. In the present work, five machine learning techniques, namely AdaBoost, LogicBoost, RobustBoost, Naïve Bayes and Bagging, are proposed for the analysis and prediction of DM patients. The proposed techniques are applied to the Pima Indians Diabetes data set. The results are found to be very accurate, with classification accuracies of 81.77% and 79.69% for the bagging and AdaBoost techniques, respectively. Hence, the proposed techniques are effective and efficient for predicting DM.

One-Class Classification: Taxonomy of Study and Review of Techniques

Author: Khan Shehroz S.
Madden Michael G.
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 29/11/2013

One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining class boundary just with the knowledge of positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, algorithms used and the application domains applied. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research.

Comment: 24 pages + 11 pages of references, 8 figures
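To make the one-class setting concrete, here is a minimal sketch of a classifier whose boundary is learned from positive examples only. It is a deliberately simple stand-in (a mean and standard-deviation threshold on a single feature), not one of the techniques surveyed in the paper; the class name `ZScoreOneClass`, the parameter `k`, and the sample values are hypothetical.

```python
from statistics import mean, pstdev


class ZScoreOneClass:
    """Accept a value if it lies within k standard deviations
    of the mean of the positive training examples."""

    def fit(self, positives, k=3.0):
        # Only positive-class data is used; no negatives are needed.
        self.mu = mean(positives)
        self.sigma = pstdev(positives)
        self.k = k
        return self

    def predict(self, x):
        """True means 'looks like the positive class'."""
        return abs(x - self.mu) <= self.k * self.sigma


# Made-up positive-class measurements, for illustration only.
model = ZScoreOneClass().fit([9.8, 10.1, 10.0, 9.9, 10.2])
inlier = model.predict(10.1)
outlier = model.predict(15.0)
```

The design choice to show here is that `fit` never sees a negative example: anything far from the positive data is rejected, which is the outlier/novelty detection reading of OCC mentioned in the abstract.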

arXiv.org e-Print Archive

Crossref

Access to Research at National University of Ireland, Galway

Review—Machine Learning Techniques in Wireless Sensor Network Based Precision Agriculture

Author: Bhansali Shekhar
Burton Lamar
Mekonnen Yemeserach
Namuduri Srikanth
Sarwat Arif I.
Publication venue: FIU Digital Commons
Publication date: 19/12/2019

The use of sensors and the Internet of Things (IoT) is key to moving the world's agriculture to a more productive and sustainable path. Recent advancements in IoT, Wireless Sensor Networks (WSN), and Information and Communication Technology (ICT) have the potential to address some of the environmental, economic, and technical challenges as well as opportunities in this sector. As the number of interconnected devices continues to grow, this generates more big data with multiple modalities and spatial and temporal variations. Intelligent processing and analysis of this big data are necessary for developing a higher level of knowledge base and insights that result in better decision making, forecasting, and reliable management of sensors. This paper is a comprehensive review of the application of different machine learning algorithms in sensor data analytics within the agricultural ecosystem. It further discusses a case study on an IoT based data-driven smart farm prototype as an integrated food, energy, and water (FEW) system.

DigitalCommons@Florida International University

Statistics in the Big Data era

Author: DI CIACCIO AGOSTINO
GIORGI Giovanni Maria
Publication venue: CLEUP
Publication date: 01/01/2016

It is estimated that about 90% of the currently available data have been produced over the last two years. Of these, only 0.5% is effectively analysed and used. However, these data can be a great source of wealth, the oil of the 21st century, when analysed with the right approach. In this article, we illustrate some specific features of these data and the great interest that they can hold for many fields. We then consider some challenges for statistical analysis that emerge from them and suggest some strategies.

Archivio della ricerca- Università di Roma La Sapienza

A Review of Machine Learning Approaches for Real Estate Valuation

Author: Huang Yu-Hsiang (John)
ROOT THOMAS H
Strader Troy J
Publication venue: AIS Electronic Library (AISeL)
Publication date: 18/07/2023

Real estate managers must identify the value for properties in their current market. Traditionally, this involved simple data analysis with adjustments made based on manager’s experience. Given the amount of money currently involved in these decisions, and the complexity and speed at which valuation decisions must be made, machine learning technologies provide a newer alternative for property valuation that could improve upon traditional methods. This study utilizes a systematic literature review methodology to identify published studies from the past two decades where specific machine learning technologies have been applied to the property valuation task. We develop a data, reasoning, usefulness (DRU) framework that provides a set of theoretical and practice-based criteria for a multi-faceted performance assessment for each system. This assessment provides the basis for identifying the current state of research in this domain as well as theoretical and practical implications and directions for future research

AIS Electronic Library (AISeL)