76,347 research outputs found
How Short is a Piece of String?: the Impact of Text Length and Text Augmentation on Short-text Classification Accuracy
Recent increases in the use and availability of short messages have created opportunities to harvest vast amounts of information through machine-based classification. However, traditional classification methods have failed to yield accuracies comparable to classification accuracies on longer texts. Several approaches have previously been employed to extend traditional methods to overcome this problem, including the enhancement of the original texts through the construction of associations with external data supplementation sources. Existing literature does not precisely describe the impact of text length on classification performance. This work quantitatively examines the changes in accuracy of a small selection of classifiers using a variety of enhancement methods, as text length progressively decreases. Findings, based on ANOVA testing at a 95% confidence interval, suggest that the performance of classifiers using simple enhancements decreases with decreasing text length, but that the use of more sophisticated enhancements risks over-supplementation of the text and consequent concept drift and classification performance decrease as text length increases
Dynamic Metric Learning from Pairwise Comparisons
Recent work in distance metric learning has focused on learning
transformations of data that best align with specified pairwise similarity and
dissimilarity constraints, often supplied by a human observer. The learned
transformations lead to improved retrieval, classification, and clustering
algorithms due to the better adapted distance or similarity measures. Here, we
address the problem of learning these transformations when the underlying
constraint generation process is nonstationary. This nonstationarity can be due
to changes in either the ground-truth clustering used to generate constraints
or changes in the feature subspaces in which the class structure is apparent.
We propose Online Convex Ensemble StrongLy Adaptive Dynamic Learning (OCELAD),
a general adaptive, online approach for learning and tracking optimal metrics
as they change over time that is highly robust to a variety of nonstationary
behaviors in the changing metric. We apply the OCELAD framework to an ensemble
of online learners. Specifically, we create a retro-initialized composite
objective mirror descent (COMID) ensemble (RICE) consisting of a set of
parallel COMID learners with different learning rates, demonstrate RICE-OCELAD
on both real and synthetic data sets and show significant performance
improvements relative to previously proposed batch and online distance metric
learning algorithms.Comment: to appear Allerton 2016. arXiv admin note: substantial text overlap
with arXiv:1603.0367
Reservoir of Diverse Adaptive Learners and Stacking Fast Hoeffding Drift Detection Methods for Evolving Data Streams
The last decade has seen a surge of interest in adaptive learning algorithms
for data stream classification, with applications ranging from predicting ozone
level peaks, learning stock market indicators, to detecting computer security
violations. In addition, a number of methods have been developed to detect
concept drifts in these streams. Consider a scenario where we have a number of
classifiers with diverse learning styles and different drift detectors.
Intuitively, the current 'best' (classifier, detector) pair is application
dependent and may change as a result of the stream evolution. Our research
builds on this observation. We introduce the \mbox{Tornado} framework that
implements a reservoir of diverse classifiers, together with a variety of drift
detection algorithms. In our framework, all (classifier, detector) pairs
proceed, in parallel, to construct models against the evolving data streams. At
any point in time, we select the pair which currently yields the best
performance. We further incorporate two novel stacking-based drift detection
methods, namely the \mbox{FHDDMS} and \mbox{FHDDMS}_{add} approaches. The
experimental evaluation confirms that the current 'best' (classifier, detector)
pair is not only heavily dependent on the characteristics of the stream, but
also that this selection evolves as the stream flows. Further, our
\mbox{FHDDMS} variants detect concept drifts accurately in a timely fashion
while outperforming the state-of-the-art.Comment: 42 pages, and 14 figure
Exploiting Evolution for an Adaptive Drift-Robust Classifier in Chemical Sensing
Gas chemical sensors are strongly affected by drift, i.e., changes in sensors' response with time, that may turn statistical models commonly used for classification completely useless after a period of time. This paper presents a new classifier that embeds an adaptive stage able to reduce drift effects. The proposed system exploits a state-of-the-art evolutionary strategy to iteratively tweak the coefficients of a linear transformation able to transparently transform raw measures in order to mitigate the negative effects of the drift. The system operates continuously. The optimal correction strategy is learnt without a-priori models or other hypothesis on the behavior of physical-chemical sensors. Experimental results demonstrate the efficacy of the approach on a real problem
Classification of Systematic Measurement Errors within the Framework of Robust Data Reconciliation
A robust data reconciliation strategy provides unbiased variable estimates in the presence of a moderate quantity of atypical measurements. However, estimates get worse if systematic measurement errors that persist in time (e.g., biases and drifts) are undetected and the breakdown point of the robust strategy is surpassed. The detection and classification of those errors allow taking corrective actions on the inputs of the robust data reconciliation that preserve the instrumentation system redundancy while the faulty sensor is repaired. In this work, a new methodology for variable estimation and systematic error classification, which is based on the concepts of robust statistics, is presented. It has been devised to be part of the real-time optimization loop of an industrial plant; therefore, it runs for processes operating under steady-state conditions. The robust measurement test is proposed in this article and used to detect the presence of sporadic and continuous systematic errors. Also, the robust linear regression of the data contained in a moving window is applied to classify the continuous errors as biases or drifts. Results highlight the performance of the proposed methodology to detect and classify outliers, biases, and drifts for linear and nonlinear benchmarks.Fil: Llanos, Claudia Elizabeth. Consejo Nacional de Investigaciones CientĂficas y TĂ©cnicas. Centro CientĂfico TecnolĂłgico Conicet - BahĂa Blanca. Planta Piloto de IngenierĂa QuĂmica. Universidad Nacional del Sur. Planta Piloto de IngenierĂa QuĂmica; ArgentinaFil: Sanchez, Mabel Cristina. Consejo Nacional de Investigaciones CientĂficas y TĂ©cnicas. Centro CientĂfico TecnolĂłgico Conicet - BahĂa Blanca. Planta Piloto de IngenierĂa QuĂmica. Universidad Nacional del Sur. Planta Piloto de IngenierĂa QuĂmica; ArgentinaFil: Maronna, Ricardo Antonio. Universidad Nacional de La Plata. Facultad de Ciencias Exactas. Departamento de Matemáticas; Argentin
Online Tool Condition Monitoring Based on Parsimonious Ensemble+
Accurate diagnosis of tool wear in metal turning process remains an open
challenge for both scientists and industrial practitioners because of
inhomogeneities in workpiece material, nonstationary machining settings to suit
production requirements, and nonlinear relations between measured variables and
tool wear. Common methodologies for tool condition monitoring still rely on
batch approaches which cannot cope with a fast sampling rate of metal cutting
process. Furthermore they require a retraining process to be completed from
scratch when dealing with a new set of machining parameters. This paper
presents an online tool condition monitoring approach based on Parsimonious
Ensemble+, pENsemble+. The unique feature of pENsemble+ lies in its highly
flexible principle where both ensemble structure and base-classifier structure
can automatically grow and shrink on the fly based on the characteristics of
data streams. Moreover, the online feature selection scenario is integrated to
actively sample relevant input attributes. The paper presents advancement of a
newly developed ensemble learning algorithm, pENsemble+, where online active
learning scenario is incorporated to reduce operator labelling effort. The
ensemble merging scenario is proposed which allows reduction of ensemble
complexity while retaining its diversity. Experimental studies utilising
real-world manufacturing data streams and comparisons with well known
algorithms were carried out. Furthermore, the efficacy of pENsemble was
examined using benchmark concept drift data streams. It has been found that
pENsemble+ incurs low structural complexity and results in a significant
reduction of operator labelling effort.Comment: this paper has been published by IEEE Transactions on Cybernetic
- …