Exploiting context when learning to classify
This paper addresses the problem of classifying observations when
features are context-sensitive, specifically when the testing set involves a context
that is different from the training set. The paper begins with a precise definition of
the problem, then general strategies are presented for enhancing the performance
of classification algorithms on this type of problem. These strategies are tested on
two domains. The first domain is the diagnosis of gas turbine engines. The
problem is to diagnose a faulty engine in one context, such as warm weather,
when the fault has previously been seen only in another context, such as cold
weather. The second domain is speech recognition. The problem is to recognize
words spoken by a new speaker, not represented in the training set. For both
domains, exploiting context results in substantially more accurate classification.
Robust classification with context-sensitive features
This paper addresses the problem of classifying observations when features are context-sensitive, especially when the testing set involves a context that is different from the training set. The paper begins with a precise definition of the problem, then general strategies are presented for enhancing the performance of classification algorithms on this type of problem. These strategies are tested on three domains. The first domain is the diagnosis of gas turbine engines. The problem is to diagnose a faulty engine in one context, such as warm weather, when the fault has previously been seen only in another context, such as cold weather. The second domain is speech recognition. The context is given by the identity of the speaker. The problem is to recognize words spoken by a new speaker, not represented in the training set. The third domain is medical prognosis. The problem is to predict whether a patient with hepatitis will live or die. The context is the age of the patient. For all three domains, exploiting context results in substantially more accurate classification.
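One way to picture "exploiting context" is to rescale each feature relative to the context in which it was observed, so that a reading taken in warm weather becomes comparable to one taken in cold weather. The abstract does not spell out its strategies, so the following is only a minimal sketch of per-context normalization; the function name `normalize_by_context` and the temperature readings are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_context(samples):
    """Return z-scores computed separately within each context.

    samples: list of (context, value) pairs.
    Assumption: a zero standard deviation is replaced by 1.0 to avoid
    division by zero.
    """
    by_context = defaultdict(list)
    for context, value in samples:
        by_context[context].append(value)

    # Per-context mean and (population) standard deviation.
    stats = {c: (mean(v), pstdev(v) or 1.0) for c, v in by_context.items()}
    return [(value - stats[c][0]) / stats[c][1] for c, value in samples]


# Hypothetical sensor readings from the same engine in two weather contexts.
readings = [("cold", 500.0), ("cold", 520.0), ("warm", 600.0), ("warm", 620.0)]
normalized = normalize_by_context(readings)
```

After normalization, the low and high readings in each context map to the same relative positions, so a classifier trained on one context sees similarly scaled inputs in the other.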
PerfWeb: How to Violate Web Privacy with Hardware Performance Events
The browser history reveals highly sensitive information about users, such as
financial status, health conditions, or political views. Private browsing modes
and anonymity networks are consequently important tools to preserve the privacy
not only of regular users but in particular of whistleblowers and dissidents.
Yet, in this work we show how a malicious application can infer opened websites
from Google Chrome in Incognito mode and from Tor Browser by exploiting
hardware performance events (HPEs). In particular, we analyze the browsers'
microarchitectural footprint with the help of advanced Machine Learning
techniques: k-Nearest Neighbors, Decision Trees, Support Vector Machines,
and in contrast to previous literature also Convolutional Neural Networks. We
profile 40 different websites, 30 of the top Alexa sites and 10 whistleblowing
portals, on two machines featuring an Intel and an ARM processor. By monitoring
retired instructions, cache accesses, and bus cycles for at most 5 seconds, we
manage to classify the selected websites with a success rate of up to 86.3%.
The results show that hardware performance events can clearly undermine the
privacy of web users. We therefore propose mitigation strategies that impede
our attacks and still allow legitimate use of HPEs.
Quantum-inspired Machine Learning on high-energy physics data
Tensor Networks, a numerical tool originally designed for simulating quantum
many-body systems, have recently been applied to solve Machine Learning
problems. Exploiting a tree tensor network, we apply a quantum-inspired machine
learning technique to a very important and challenging big data problem in high
energy physics: the analysis and classification of data produced by the Large
Hadron Collider at CERN. In particular, we present how to effectively classify
so-called b-jets, jets originating from b-quarks from proton-proton collisions
in the LHCb experiment, and how to interpret the classification results. We
exploit the Tensor Network approach to select important features and adapt the
network geometry based on information acquired in the learning process.
Finally, we show how to adapt the tree tensor network to achieve optimal
precision or fast response in time without the need of repeating the learning
process. These results pave the way to the implementation of high-frequency
real-time applications, a key ingredient needed among others for current and
future LHCb event classification able to trigger events at the tens of MHz
scale.
Comment: 13 pages, 4 figures
Distributed Online Big Data Classification Using Context Information
Distributed, online data mining systems have emerged as a result of
applications requiring analysis of large amounts of correlated and
high-dimensional data produced by multiple distributed data sources. We propose
a distributed online data classification framework where data is gathered by
distributed data sources and processed by a heterogeneous set of distributed
learners which learn online, at run-time, how to classify the different data
streams either by using their locally available classification functions or by
helping each other by classifying each other's data. Importantly, since the
data is gathered at different locations, sending the data to another learner to
process incurs additional costs such as delays, and hence this will be only
beneficial if the benefits obtained from a better classification will exceed
the costs. We model the problem of joint classification by the distributed and
heterogeneous learners from multiple data sources as a distributed contextual
bandit problem where each data instance is characterized by a specific context. We
develop a distributed online learning algorithm for which we can prove
sublinear regret. Compared to prior work in distributed online data mining, our
work is the first to provide analytic regret results characterizing the
performance of the proposed algorithm.
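The trade-off described above, classifying locally versus paying a cost to forward data to another learner, can be illustrated with a very small contextual bandit. This is a simplified greedy sketch and not the algorithm from the paper: it carries no regret guarantee, and the class name `ContextualGreedyLearner`, the arm names, and the accuracy and cost figures are all hypothetical.

```python
from collections import defaultdict


class ContextualGreedyLearner:
    """Tracks the mean net reward (reward minus cost) per (context, arm)."""

    def __init__(self, arms, explore_rounds=1):
        self.arms = list(arms)
        self.explore_rounds = explore_rounds
        self.counts = defaultdict(int)    # (context, arm) -> number of pulls
        self.means = defaultdict(float)   # (context, arm) -> mean net reward

    def choose(self, context):
        # Try every arm a fixed number of times in each context first.
        for arm in self.arms:
            if self.counts[(context, arm)] < self.explore_rounds:
                return arm
        # Then pick the arm with the best estimated net reward.
        return max(self.arms, key=lambda a: self.means[(context, a)])

    def update(self, context, arm, reward, cost):
        key = (context, arm)
        self.counts[key] += 1
        net = reward - cost
        # Incremental running-mean update.
        self.means[key] += (net - self.means[key]) / self.counts[key]


# Hypothetical expected accuracy of each option and forwarding cost per context.
accuracy = {"local": 0.6, "remote": 0.9}
cost = {("a", "local"): 0.0, ("a", "remote"): 0.1,
        ("b", "local"): 0.0, ("b", "remote"): 0.5}

learner = ContextualGreedyLearner(["local", "remote"])
for _ in range(5):
    for ctx in ("a", "b"):
        arm = learner.choose(ctx)
        learner.update(ctx, arm, accuracy[arm], cost[(ctx, arm)])

choice_a = learner.choose("a")
choice_b = learner.choose("b")
```

In this toy setting the learner forwards data only in the context where the accuracy gain outweighs the forwarding cost, which mirrors the condition stated in the abstract.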
The management of context-sensitive features: A review of strategies
In this paper, we review five heuristic strategies for handling context-sensitive features in supervised machine learning from examples. We discuss two methods for recovering lost (implicit) contextual information. We mention some evidence that hybrid strategies can have a synergetic effect. We then show how the work of several machine learning researchers fits into this framework. While we do not claim that these strategies exhaust the possibilities, it appears that the framework includes all of the techniques that can be found in the published literature on context-sensitive learning.