43,103 research outputs found
Multi-Sensor Event Detection using Shape Histograms
Vehicular sensor data consists of multiple time-series arising from a number
of sensors. Using such multi-sensor data we would like to detect occurrences of
specific events that vehicles encounter, e.g., corresponding to particular
maneuvers that a vehicle makes or conditions that it encounters. Events are
characterized by similar waveform patterns re-appearing within one or more
sensors. Further such patterns can be of variable duration. In this work, we
propose a method for detecting such events in time-series data using a novel
feature descriptor motivated by similar ideas in image processing. We define
the shape histogram: a constant dimension descriptor that nevertheless captures
patterns of variable duration. We demonstrate the efficacy of using shape
histograms as features to detect events in an SVM-based, multi-sensor,
supervised learning scenario, i.e., multiple time-series are used to detect an
event. We present results on real-life vehicular sensor data and show that our
technique performs better than available pattern detection implementations on
our data, and that it can also be used to combine features from multiple
sensors resulting in better accuracy than using any single sensor. Since
previous work on pattern detection in time-series has been in the single series
context, we also present results using our technique on multiple standard
time-series datasets and show that it is the most versatile in terms of how it
ranks compared to other published results
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers by
defining class boundary just with the knowledge of positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data,
algorithms used and the application domains applied. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure
A systematic review of data quality issues in knowledge discovery tasks
Hay un gran crecimiento en el volumen de datos porque las organizaciones capturan permanentemente la cantidad colectiva de datos para lograr un mejor proceso de toma de decisiones. El desafĂo mas fundamental es la exploraciĂłn de los grandes volĂşmenes de datos y la extracciĂłn de conocimiento Ăştil para futuras acciones por medio de tareas para el descubrimiento del conocimiento; sin embargo, muchos datos presentan mala calidad. Presentamos una revisiĂłn sistemática de los asuntos de calidad de datos en las áreas del descubrimiento de conocimiento y un estudio de caso aplicado a la enfermedad agrĂcola conocida como la roya del cafĂ©.Large volume of data is growing because the organizations are continuously capturing the collective amount of data for better decision-making process. The most fundamental challenge is to explore the large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks, nevertheless many data has poor quality. We presented a systematic review of the data quality issues in knowledge discovery tasks and a case study applied to agricultural disease named coffee rust
Machine learning from coronas using parametrization of images
We were interested to develop an algorithm for detection of coronas of people in altered states of consciousness (two-classes problem). Such coronas are known to have rings (double coronas), special branch-like structure of streamers and/or curious spots. We used several approaches to parametrization of images and various machine learning algorithms. We compared results of computer algorithms with the human expert’s accuracy. Results show that computer algorithms can achieve the same or even better accuracy than that of human experts
Probabilistic Inference from Arbitrary Uncertainty using Mixtures of Factorized Generalized Gaussians
This paper presents a general and efficient framework for probabilistic
inference and learning from arbitrary uncertain information. It exploits the
calculation properties of finite mixture models, conjugate families and
factorization. Both the joint probability density of the variables and the
likelihood function of the (objective or subjective) observation are
approximated by a special mixture model, in such a way that any desired
conditional distribution can be directly obtained without numerical
integration. We have developed an extended version of the expectation
maximization (EM) algorithm to estimate the parameters of mixture models from
uncertain training examples (indirect observations). As a consequence, any
piece of exact or uncertain information about both input and output values is
consistently handled in the inference and learning stages. This ability,
extremely useful in certain situations, is not found in most alternative
methods. The proposed framework is formally justified from standard
probabilistic principles and illustrative examples are provided in the fields
of nonparametric pattern classification, nonlinear regression and pattern
completion. Finally, experiments on a real application and comparative results
over standard databases provide empirical evidence of the utility of the method
in a wide range of applications
A traffic classification method using machine learning algorithm
Applying concepts of attack investigation in IT industry, this idea has been developed to design
a Traffic Classification Method using Data Mining techniques at the intersection of Machine
Learning Algorithm, Which will classify the normal and malicious traffic. This classification will
help to learn about the unknown attacks faced by IT industry. The notion of traffic classification
is not a new concept; plenty of work has been done to classify the network traffic for
heterogeneous application nowadays. Existing techniques such as (payload based, port based
and statistical based) have their own pros and cons which will be discussed in this
literature later, but classification using Machine Learning techniques is still an open field to explore and has provided very promising results up till now
Disease modeling using Evolved Discriminate Function
Precocious diagnosis increases the survival time and patient quality of life. It is a binary classification, exhaustively studied in the literature. This paper innovates proposing the application of genetic programming to obtain a discriminate function. This function contains the disease dynamics used to classify the patients with as little false negative diagnosis as possible. If its value is greater than zero then it means that the patient is ill, otherwise healthy. A graphical representation is proposed to show the influence of each dataset attribute in the discriminate function. The experiment deals with Breast Cancer and Thrombosis & Collagen diseases diagnosis. The main conclusion is that the discriminate function is able to classify the patient using numerical clinical data, and the graphical representation displays patterns that allow understanding of the model
- …