3,563 research outputs found
Speeding up the decision making of Support Vector Classifier
Abstract In this paper, we propose a new approach for speeding up the decision making of Support Vector Classifiers (SVC
Learning Dynamic Feature Selection for Fast Sequential Prediction
We present paired learning and inference algorithms for significantly
reducing computation and increasing speed of the vector dot products in the
classifiers that are at the heart of many NLP components. This is accomplished
by partitioning the features into a sequence of templates which are ordered
such that high confidence can often be reached using only a small fraction of
all features. Parameter estimation is arranged to maximize accuracy and early
confidence in this sequence. Our approach is simpler and better suited to NLP
than other related cascade methods. We present experiments in left-to-right
part-of-speech tagging, named entity recognition, and transition-based
dependency parsing. On the typical benchmarking datasets we can preserve POS
tagging accuracy above 97% and parsing LAS above 88.5% both with over a
five-fold reduction in run-time, and NER F1 above 88 with more than 2x increase
in speed.Comment: Appears in The 53rd Annual Meeting of the Association for
Computational Linguistics, Beijing, China, July 201
Machine learning assists the classification of reports by citizens on disease-carrying mosquitoes
Mosquito Alert (www.mosquitoalert.com/en) is an expert-validated citizen science platform for tracking and controlling disease-carrying mosquitoes. Citizens download a free app and use their phones to send reports of presumed sightings of two world-wide disease vector
mosquito species (the Asian Tiger and the Yellow Fever mosquito). These reports are then supervised by a team of entomologists and, once validated, added to a database. As the platform prepares to scale to much larger geographical areas and user bases, the expert validation by entomologists becomes the main bottleneck. In this paper we describe the use of machine learning on the citizen reports to automatically validate a fraction of them, therefore allowing the entomologists either to deal with larger report streams or to concentrate on those that are more strategic, such as reports from new areas (so that early warning protocols are activated) or from areas with high epidemiological risks (so that control actions to reduce mosquito populations are activated). The current prototype flags a third of the reports as “almost certainly positive” with high confidence. It is currently being integrated into the main workflow of the Mosquito Alert platform.Postprint (published version
On The Stability of Interpretable Models
Interpretable classification models are built with the purpose of providing a
comprehensible description of the decision logic to an external oversight
agent. When considered in isolation, a decision tree, a set of classification
rules, or a linear model, are widely recognized as human-interpretable.
However, such models are generated as part of a larger analytical process. Bias
in data collection and preparation, or in model's construction may severely
affect the accountability of the design process. We conduct an experimental
study of the stability of interpretable models with respect to feature
selection, instance selection, and model selection. Our conclusions should
raise awareness and attention of the scientific community on the need of a
stability impact assessment of interpretable models
Active Learning for Dialogue Act Classification
Active learning techniques were employed for classification of dialogue acts over two dialogue corpora, the English human-human Switchboard corpus and the Spanish human-machine Dihana corpus. It is shown clearly that active learning improves on a baseline obtained through a passive learning approach to tagging the same data sets. An error reduction of 7% was obtained on Switchboard, while a factor 5 reduction in the amount of labeled data needed for classification was achieved on Dihana. The passive Support Vector Machine learner used as baseline in itself significantly improves the state of the art in dialogue act classification on both corpora. On Switchboard it gives a 31% error reduction compared to the previously best reported result
- …