2,477 research outputs found

Machine Learning Algorithms for Smart Data Analysis in Internet of Things Environment: Taxonomies and Research Trends

Author: Anabi Hilary Kelechi
Khalid Yahya
Mohammed H. Alsharif
Shehzad Ashraf Chaudhry
Publication date: 01/01/2020

Machine learning techniques will contribute towards making Internet of Things (IoT) symmetric applications among the most significant sources of new data in the future. In this context, network systems are endowed with the capacity to access varieties of experimental symmetric data across a plethora of network devices, study the data, obtain knowledge, and make informed decisions based on the dataset at their disposal. This study is limited to supervised and unsupervised machine learning (ML) techniques, regarded as the bedrock of IoT smart data analysis. It reviews and discusses substantial issues related to supervised and unsupervised machine learning techniques, highlights the advantages and limitations of each algorithm, and discusses the research trends and recommendations for further study

Covenant University Repository

Machine Learning for Detection of Cognitive Impairment

Author: Diaz Valeria
Rodríguez Guillermo Horacio
Publication venue: Budapest Tech
Publication date: 01/03/2022

Detecting cognitive problems, especially in the early stages, is critical, yet diagnosis is manual and depends on one or more specialist doctors, who make it as the cognitive decline escalates into the early stage of dementia, e.g., Alzheimer's disease (AD). The early stages of AD are very similar to Mild Cognitive Impairment (MCI); it is essential to identify the possible factors associated with the disease. This research aims to demonstrate that automated models can differentiate and classify MCI and AD in the early stages. The present research used a combination of Machine Learning (ML) algorithms to identify AD, using gene expressions. The algorithms used to classify subjects with cognitive problems and healthy people (control) were: Linear Regression, Decision Trees (DT), Naïve Bayes (NB) and Deep Learning (DL). The results of this research show that ML algorithms can identify AD in its early stages with 80% accuracy, using a Deep Learning (DL) algorithm.
Affiliation: Diaz, Valeria. Universidad de Palermo. Facultad de Ingeniería; Argentina
Affiliation: Rodríguez, Guillermo Horacio. Universidad Nacional del Centro de la Provincia de Buenos Aires. Facultad de Ciencias Exactas. Instituto de Sistemas Tandil; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentin

CONICET Digital

A Study of Text Mining Framework for Automated Classification of Software Requirements in Enterprise Systems

Publication date: 01/01/2016

Text Classification is a rapidly evolving area of Data Mining, while Requirements Engineering is a less-explored area of Software Engineering which deals with the process of defining, documenting and maintaining a software system's requirements. When researchers decided to blend these two streams, research emerged on automating the classification of software requirements statements into categories easily comprehensible for developers, for faster development and delivery; until now this was mostly done manually by software engineers, which is indeed a tedious job. However, most of the research focused on the classification of non-functional requirements pertaining to intangible features such as security, reliability, quality and so on. It is indeed a challenging task to automatically classify functional requirements, those pertaining to how the system will function, especially those belonging to different and large enterprise systems. This requires exploitation of text mining capabilities. This thesis aims to investigate the results of text classification applied to functional software requirements by creating a framework in R and making use of algorithms and techniques like k-nearest neighbors, support vector machine, and many others like boosting, bagging, maximum entropy, neural networks and random forests in an ensemble approach. The study was conducted by collecting and visualizing relevant enterprise data that had previously been classified manually and was subsequently used for training the model. Key components for training included the frequency of terms in the documents and the level of cleanliness of the data. The model was applied to test data and validated for analysis by studying and comparing parameters like precision, recall and accuracy.
Dissertation/Thesis. Masters Thesis Engineering 201
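The abstract above names precision, recall and accuracy as the validation parameters. As a minimal sketch of how those three figures are computed for one requirement category, assuming plain Python lists of true and predicted labels (the function name `classification_metrics` and the labels `functional` / `non-functional` are hypothetical and not taken from the thesis, which built its framework in R):

```python
def classification_metrics(y_true, y_pred, positive):
    """Return (precision, recall, accuracy) for one positive class.

    Hypothetical helper for illustration only; not the thesis's R framework.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Made-up labels for five requirement statements (illustrative data only).
truth = ["functional", "functional", "functional", "non-functional", "non-functional"]
predicted = ["functional", "functional", "non-functional", "functional", "non-functional"]
precision, recall, accuracy = classification_metrics(truth, predicted, "functional")
```

Precision penalizes statements wrongly assigned to the category, recall penalizes statements of that category that were missed, and accuracy counts all correct assignments regardless of class.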

ASU Digital Repository

Big data analytics for preventive medicine

Author: Imran M
Razzak MI
Xu G
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

© 2019, Springer-Verlag London Ltd., part of Springer Nature. Medical data is one of the most rewarding and yet most complicated kinds of data to analyze. How can healthcare providers use modern data analytics tools and technologies to analyze and create value from complex data? Data analytics promises to efficiently discover valuable patterns by analyzing large amounts of unstructured, heterogeneous, non-standard and incomplete healthcare data. It not only forecasts but also helps in decision making, and is increasingly regarded as a breakthrough in ongoing advancement, with the goal of improving the quality of patient care and reducing healthcare costs. The aim of this study is to provide a comprehensive and structured overview of extensive research on the advancement of data analytics methods for disease prevention. This review first introduces disease prevention and its challenges, followed by traditional prevention methodologies. We summarize state-of-the-art data analytics algorithms used for classification of disease, clustering (unusually high incidence of a particular disease), anomaly detection (detection of disease) and association, as well as their respective advantages, drawbacks and guidelines for selecting a specific model, followed by a discussion of recent developments and successful applications of disease prevention methods. The article concludes with open research challenges and recommendations

Deakin Research Online

OPUS - University of Technology Sydney

Federation ResearchOnline

Data Mining in Internet of Things Systems: A Literature Review

Author: M. Sarhan Amany
Publication venue: Arab Journals Platform
Publication date: 05/10/2023

The Internet of Things (IoT) and cloud technologies have been the main focus of recent research, allowing for the accumulation of a vast amount of data generated from this diverse environment. These data undoubtedly contain priceless knowledge, if it can be correctly discovered and correlated in an efficient manner. Data mining algorithms can be applied to the IoT to extract hidden information from the massive amounts of data that it generates and that are thought to have high business value. This paper covers the most important data mining approaches: classification, clustering, association analysis, time series analysis, and outlier analysis. Additionally, a survey of recent work in this direction is included. Other significant challenges in the field are collecting, storing, and managing the large number of devices along with their associated features. This paper takes a deep look at data mining for IoT platforms, concentrating on real applications found in the literatur

Arab Journals Platform

Abstraction, aggregation and recursion for generating accurate and simple classifiers

Author: Kang Dae-Ki
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2006

An important goal of inductive learning is to generate accurate and compact classifiers from data. In a typical inductive learning scenario, instances in a data set are simply represented as ordered tuples of attribute values. In our research, we explore three methodologies to improve the accuracy and compactness of the classifiers: abstraction, aggregation, and recursion. Firstly, abstraction is aimed at the design and analysis of algorithms that generate and deal with taxonomies for the construction of compact and robust classifiers. In many applications of the data-driven knowledge discovery process, taxonomies have been shown to be useful in constructing compact, robust, and comprehensible classifiers. However, in many application domains, human-designed taxonomies are unavailable. We introduce algorithms for automated construction of taxonomies inductively from both structured (such as UCI Repository) and unstructured (such as text and biological sequences) data. We introduce AVT-Learner, an algorithm for automated construction of attribute value taxonomies (AVT) from data, and Word Taxonomy Learner (WTL), an algorithm for automated construction of word taxonomy from text and sequence data. We describe experiments on the UCI data sets and compare the performance of AVT-NBL (an AVT-guided Naive Bayes Learner) with that of the standard Naive Bayes Learner (NBL). Our results show that the AVTs generated by AVT-Learner are competitive with human-generated AVTs (in cases where such AVTs are available). AVT-NBL using AVTs generated by AVT-Learner achieves classification accuracies that are comparable to or higher than those obtained by NBL; and the resulting classifiers are significantly more compact than those generated by NBL.
Similarly, our experimental results of WTL and WTNBL on protein localization sequences and Reuters newswire text categorization data sets show that the proposed algorithms can generate Naive Bayes classifiers that are more compact and often more accurate than those produced by standard Naive Bayes learner for the Multinomial Model. Secondly, we apply aggregation to construct features as a multiset of values for the intrusion detection task. For this task, we propose a bag of system calls representation for system call traces and describe misuse and anomaly detection results on the University of New Mexico (UNM) and MIT Lincoln Lab (MIT LL) system call sequences with the proposed representation. With the feature representation as input, we compare the performance of several machine learning techniques for misuse detection and show experimental results on anomaly detection. The results show that standard machine learning and clustering techniques using the simple bag of system calls representation, based on the system call traces generated by the operating system's kernel, are effective and often perform better than approaches that use foreign contiguous sequences in detecting intrusive behaviors of compromised processes. Finally, we construct a set of classifiers by recursive application of the Naive Bayes learning algorithms. Naive Bayes (NB) classifier relies on the assumption that the instances in each class can be described by a single generative model. This assumption can be restrictive in many real world classification tasks. We describe recursive Naive Bayes learner (RNBL), which relaxes this assumption by constructing a tree of Naive Bayes classifiers for sequence classification, where each individual NB classifier in the tree is based on an event model (one model for each class at each node in the tree).
In our experiments on protein sequences, Reuters newswire documents and UC-Irvine benchmark data sets, we observe that RNBL substantially outperforms NB classifier. Furthermore, our experiments on the protein sequences and the text documents show that RNBL outperforms C4.5 decision tree learner (using tests on sequence composition statistics as the splitting criterion) and yields accuracies that are comparable to those of support vector machines (SVM) using similar information
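The abstract above describes representing a system call trace as a multiset ("bag") of calls, so that ordering is discarded and only counts remain. A minimal sketch of that idea, assuming a trace is a list of call names and the vocabulary is a fixed, ordered list (the function name `bag_of_system_calls` and the sample call names are hypothetical, not taken from the dissertation):

```python
from collections import Counter


def bag_of_system_calls(trace, vocabulary):
    """Map a system call trace to a count vector over a fixed vocabulary.

    Order within the trace is ignored; calls outside the vocabulary are dropped.
    Hypothetical helper for illustration only.
    """
    counts = Counter(trace)
    return [counts[name] for name in vocabulary]


# Made-up trace and vocabulary (illustrative data only).
vocabulary = ["open", "read", "write", "close"]
trace = ["open", "read", "read", "close"]
features = bag_of_system_calls(trace, vocabulary)
```

A fixed-length count vector of this kind can be fed to standard classifiers or clustering methods, which is the setting the abstract compares against approaches based on contiguous call sequences.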

Digital Repository @ Iowa State University (ISU)

Text Classification: A Review, Empirical, and Experimental Evaluation

Author: Taha Aya
Taha Kamal
Yeun Chan
Yoo Paul D.
Publication date: 11/01/2024

The explosive and widespread growth of data necessitates the use of text classification to extract crucial information from vast amounts of data. Consequently, there has been a surge of research in both classical and deep learning text classification methods. Despite the numerous methods proposed in the literature, there is still a pressing need for a comprehensive and up-to-date survey. Existing survey papers categorize algorithms for text classification into broad classes, which can lead to the misclassification of unrelated algorithms and incorrect assessments of their qualities and behaviors using the same metrics. To address these limitations, our paper introduces a novel methodological taxonomy that classifies algorithms hierarchically into fine-grained classes and specific techniques. The taxonomy includes methodology categories, methodology techniques, and methodology sub-techniques. Our study is the first survey to utilize this methodological taxonomy for classifying algorithms for text classification. Furthermore, our study also conducts empirical evaluation and experimental comparisons and rankings of different algorithms that employ the same specific sub-technique, different sub-techniques within the same technique, different techniques within the same category, and categorie

arXiv.org e-Print Archive

Evaluation of Various Algorithms' Performance in Supervised Binary Classification for Occupant Detection Using a Dataset from a Residential Building

Author: Andersen Kamilla Heimar
Heiselberg Per Kvols
Johra Hicham
Marszal-Pomianowska Anna
O'Brien William
Schaffer Markus
Publication venue: Department of the Built Environment, Aalborg University
Publication date: 30/09/2023

This technical report describes the evaluation process of various machine learning algorithms' performance used for supervised binary classification for occupant detection, using a dataset from a residential building in the North of Denmark

VBN