Hybrid Approaches for Classification Under Information Acquisition Cost Constraint
The practical use of classification systems may be limited because current classification systems do not allow decision makers to incorporate a cost constraint. For example, in several financial applications (loan approval, credit scoring, etc.) an applicant is asked to submit a processing fee with the application (Mookerjee and Mannino 1997). The processing fee may be used to validate the information entered in the application. From an economic standpoint, it is important that the cost of validating the information not exceed the processing fee. Traditional classification systems do not allow the decision maker to incorporate an information acquisition cost constraint. We term the problem of designing a classification system where information acquisition costs are considered the problem of classification with an information acquisition cost constraint (CIACC). The CIACC problem is NP-hard and very difficult to solve to optimality.
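Since the exact CIACC problem is NP-hard, a heuristic is the natural first approach. The sketch below (not from the paper; attribute names, gains, and costs are invented) shows a greedy, knapsack-style selection: acquire the attribute with the best information-gain-per-cost ratio until the budget, e.g. the processing fee, is exhausted.

```python
# Hypothetical sketch of classification under an information-acquisition
# budget. The greedy ratio rule is a standard knapsack heuristic, not the
# paper's method; all numbers below are illustrative.

def select_attributes(gains, costs, budget):
    """Greedily pick attributes by gain-per-cost until the budget runs out."""
    chosen, spent = [], 0.0
    # Rank attributes by information gain per unit acquisition cost.
    ranked = sorted(gains, key=lambda a: gains[a] / costs[a], reverse=True)
    for attr in ranked:
        if spent + costs[attr] <= budget:
            chosen.append(attr)
            spent += costs[attr]
    return chosen, spent

# Invented example: validating a loan application under a $25 processing fee.
gains = {"income_check": 0.40, "credit_history": 0.35, "employment_call": 0.15}
costs = {"income_check": 12.0, "credit_history": 8.0, "employment_call": 15.0}
print(select_attributes(gains, costs, budget=25.0))
```

The greedy rule gives no optimality guarantee here; it only illustrates how a cost budget prunes the set of attributes a classifier may acquire.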
Wireless Data Acquisition for Edge Learning: Data-Importance Aware Retransmission
By deploying machine-learning algorithms at the network edge, edge learning
can leverage the enormous real-time data generated by billions of mobile
devices to train AI models, which enable intelligent mobile applications. In
this emerging research area, one key direction is to efficiently utilize radio
resources for wireless data acquisition to minimize the latency of executing a
learning task at an edge server. Along this direction, we consider the specific
problem of retransmission decision in each communication round to ensure both
reliability and quantity of those training data for accelerating model
convergence. To solve the problem, a new retransmission protocol called
data-importance aware automatic-repeat-request (importance ARQ) is proposed.
Unlike the classic ARQ focusing merely on reliability, importance ARQ
selectively retransmits a data sample based on its uncertainty which helps
learning and can be measured using the model under training. Underpinning the
proposed protocol is a derived elegant communication-learning relation between
two corresponding metrics, i.e., signal-to-noise ratio (SNR) and data
uncertainty. This relation facilitates the design of a simple threshold based
policy for importance ARQ. The policy is first derived based on the classic
classifier model of support vector machine (SVM), where the uncertainty of a
data sample is measured by its distance to the decision boundary. The policy is
then extended to the more complex model of convolutional neural networks (CNN)
where data uncertainty is measured by entropy. Extensive experiments have been
conducted for both the SVM and CNN using real datasets with balanced and
imbalanced distributions. Experimental results demonstrate that importance ARQ
effectively copes with channel fading and noise in wireless data acquisition to
achieve faster model convergence than the conventional channel-aware ARQ.
Comment: This is an updated version: 1) extension to general classifiers; 2) consideration of imbalanced classification in the experiments. Submitted to an IEEE journal for possible publication.
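The core of importance ARQ, as the abstract describes it, is a threshold decision combining channel quality (SNR) with data uncertainty (distance to the SVM decision boundary, or prediction entropy for a CNN). A minimal sketch of that decision rule, with illustrative threshold values not taken from the paper:

```python
import math

def entropy(probs):
    """Data uncertainty for a general classifier: entropy of the
    predicted class-probability vector (the CNN case in the abstract)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_retransmit(snr, uncertainty, snr_target, u_threshold):
    """Threshold policy sketch: spend radio resources on a retransmission
    only when the sample is both important for learning (high uncertainty)
    and unreliably received (SNR below target)."""
    return uncertainty > u_threshold and snr < snr_target

# A confidently classified sample is not worth retransmitting even on a
# noisy channel; an uncertain one is. Thresholds here are illustrative.
print(should_retransmit(snr=3.0, uncertainty=entropy([0.98, 0.01, 0.01]),
                        snr_target=10.0, u_threshold=0.5))  # False
print(should_retransmit(snr=3.0, uncertainty=entropy([0.4, 0.35, 0.25]),
                        snr_target=10.0, u_threshold=0.5))  # True
```

This captures the contrast drawn with classic ARQ, which would retransmit both samples purely on reliability grounds.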
A Machine Learning Approach to POS Tagging
We have applied inductive learning of statistical decision trees
and relaxation labelling to the Natural Language Processing (NLP)
task of morphosyntactic disambiguation (Part Of Speech Tagging).
The learning process is supervised and obtains a language
model oriented to resolve POS ambiguities. This model consists
of a set of statistical decision trees expressing distribution of
tags and words in some relevant contexts.
The acquired language models are complete enough to be directly
used as sets of POS disambiguation rules, and include more complex
contextual information than simple collections of n-grams usually
used in statistical taggers.
We have implemented a quite simple and fast tagger that has been
tested and evaluated on the Wall Street Journal (WSJ) corpus with
a remarkable accuracy.
However, better results can be obtained by translating the trees
into rules to feed a flexible relaxation labelling based tagger.
In this direction we describe a tagger which is able to use
information of any kind (n-grams, automatically acquired constraints,
linguistically motivated manually written constraints, etc.), and in
particular to incorporate the machine learned decision trees.
Simultaneously, we address the problem of tagging when only
small training material is available, which is crucial in any process
of constructing, from scratch, an annotated corpus. We show that quite
high accuracy can be achieved with our system in this situation.
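The abstract's decision trees encode contextual disambiguation rules. A toy, hand-built node (not induced from data as in the paper; tags follow the WSJ/Penn Treebank convention) shows the kind of rule such a tree expresses for a word ambiguous between noun and verb:

```python
# Illustrative one-node "decision tree" for the NN/VB ambiguity class,
# conditioned on the preceding tag. The paper induces such trees
# statistically from annotated corpora; this rule is hand-written.

def disambiguate_nn_vb(prev_tag):
    """Resolve a noun/verb ambiguity (e.g. 'plan') from the previous tag."""
    if prev_tag in {"DT", "JJ"}:   # after a determiner or adjective: noun
        return "NN"
    if prev_tag in {"TO", "MD"}:   # after 'to' or a modal: verb
        return "VB"
    return "NN"                    # fallback to the majority class

print(disambiguate_nn_vb("DT"))  # NN  ("the plan")
print(disambiguate_nn_vb("TO"))  # VB  ("to plan")
```

A learned tree differs only in that its questions and leaf distributions are estimated from the training corpus rather than written by hand, which is what lets the same machinery work with small training sets.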
Synergy-Based Hand Pose Sensing: Optimal Glove Design
In this paper we study the problem of improving human hand pose sensing
device performance by exploiting the knowledge on how humans most frequently
use their hands in grasping tasks. In a companion paper we studied the problem
of maximizing the reconstruction accuracy of the hand pose from partial and
noisy data provided by any given pose sensing device (a sensorized "glove")
taking into account statistical a priori information. In this paper we consider
the dual problem of how to design pose sensing devices, i.e. how and where to
place sensors on a glove, to get maximum information about the actual hand
posture. We study the continuous case, whereas individual sensing elements in
the glove measure a linear combination of joint angles, the discrete case,
whereas each measure corresponds to a single joint angle, and the most general
hybrid case, whereas both continuous and discrete sensing elements are
available. The objective is to provide, for given a priori information and
fixed number of measurements, the optimal design minimizing in average the
reconstruction error. Solutions relying on the geometrical synergy definition
as well as gradient flow-based techniques are provided. Simulations of
reconstruction performance show the effectiveness of the proposed optimal
design.
Comment: Submitted to the International Journal of Robotics Research 201
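For the continuous case described above, a standard result (assuming a Gaussian prior on the joint angles and noiseless linear measurements; this is a sketch of the idea, not the paper's derivation) is that the expected reconstruction error is minimized by measuring along the top eigenvectors of the prior covariance, i.e. the dominant hand synergies. The covariance below is synthetic:

```python
import numpy as np

# Sketch: optimal linear "glove" rows as the top-m eigenvectors of the
# prior covariance over joint angles. Covariance is synthetic (rank 5),
# standing in for statistics of recorded human grasps.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
prior_cov = A @ A.T            # 20 joint angles, 5 dominant synergies

def optimal_measurement_matrix(cov, m):
    """Rows = eigenvectors of the prior covariance with largest eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending order
    return eigvecs[:, ::-1][:, :m].T          # take the top-m, as rows

H = optimal_measurement_matrix(prior_cov, m=3)
print(H.shape)  # (3, 20): three sensing elements over twenty joints
```

The discrete and hybrid cases in the abstract constrain the rows of this matrix (e.g. to unit vectors), which is what makes gradient-flow and combinatorial techniques necessary there.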
Towards Cancer Hybrid Automata
This paper introduces Cancer Hybrid Automata (CHAs), a formalism to model the
progression of cancers through discrete phenotypes. The classification of
cancer progression using discrete states like stages and hallmarks has become
common in the biology literature, but primarily as an organizing principle, and
not as an executable formalism. The precise computational model developed here
aims to exploit this untapped potential, namely, through automatic verification
of progression models (e.g., consistency, causal connections, etc.),
classification of unreachable or unstable states and computer-generated
(individualized or universal) therapy plans. The paper builds on a
phenomenological approach, and as such does not need to assume a model for the
biochemistry of the underlying natural progression. Rather, it abstractly
models transition timings between states as well as the effects of drugs and
clinical tests, and thus allows formalization of temporal statements about the
progression as well as notions of timed therapies. The model proposed here is
ultimately based on hybrid automata, and we show how existing controller
synthesis algorithms can be generalized to CHA models, so that therapies can be
generated automatically. Throughout this paper we use cancer hallmarks to
represent the discrete states through which cancer progresses, but other
notions of discretely or continuously varying state formalisms could also be
used to derive similar therapies.
Comment: In Proceedings HSB 2012, arXiv:1208.315
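The abstract's core objects, discrete progression states with timed transitions that drugs can block, can be caricatured in a few lines. This is a deliberately simplified toy (state and drug names are invented; a real CHA also carries continuous clock dynamics), showing the kind of reachability question a verifier or therapy synthesizer would answer:

```python
# Toy progression model: hallmark-like states, transitions with timing
# bounds, and a drug that disables a transition. All names are invented.

transitions = {
    # (from, to): (min_time, max_time, blocked_by_drug)
    ("normal", "proliferative"): (2.0, 10.0, "drug_A"),
    ("proliferative", "angiogenic"): (1.0, 5.0, "drug_B"),
}

def reachable(state, goal, administered):
    """Can the progression still reach `goal` given the drugs administered?"""
    frontier, seen = [state], {state}
    while frontier:
        s = frontier.pop()
        if s == goal:
            return True
        for (src, dst), (_lo, _hi, blocker) in transitions.items():
            if src == s and blocker not in administered and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return False

print(reachable("normal", "angiogenic", administered=set()))       # True
print(reachable("normal", "angiogenic", administered={"drug_A"}))  # False
```

Controller synthesis over a CHA generalizes exactly this: instead of checking one fixed drug set, it searches for a (timed) schedule of drugs under which the undesirable states become unreachable.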