Learning from Noisy Label Distributions
In this paper, we consider a novel machine learning problem, that is,
learning a classifier from noisy label distributions. In this problem, each
instance with a feature vector belongs to at least one group. Then, instead of
the true label of each instance, we observe the label distribution of the
instances associated with a group, where the label distribution is distorted by
an unknown noise. Our goals are to (1) estimate the true label of each
instance, and (2) learn a classifier that predicts the true label of a new
instance. We propose a probabilistic model that considers true label
distributions of groups and parameters that represent the noise as hidden
variables. The model can be learned based on a variational Bayesian method. In
numerical experiments, we show that the proposed model outperforms existing
methods in terms of the estimation of the true labels of instances.
Comment: Accepted in ICANN201
Parallel ion strings in linear multipole traps
Additional radio-frequency (rf) potentials applied to linear multipole traps
create extra field nodes in the radial plane which allow one to confine single
ions, or strings of ions, in totally rf field-free regions. The number of nodes
depends on the order of the applied multipole potentials and their relative
distance can be easily tuned by the amplitude variation of the applied
voltages. Simulations using molecular dynamics show that strings of ions can be
laser cooled down to the Doppler limit in all directions of space. Once cooled,
organized systems can be moved with very limited heating, even if the cooling
process is turned off.
A class of Hamilton-Jacobi equations on Banach-Finsler manifolds
The concept of subdifferentiability is studied in the context of
Finsler manifolds (modeled on a Banach space with a Lipschitz bump
function). A class of Hamilton-Jacobi equations defined on Finsler
manifolds is studied and several results related to the existence and
uniqueness of viscosity solutions are obtained.
Comment: 24 page
On Using Active Learning and Self-Training when Mining Performance Discussions on Stack Overflow
Abundant data is the key to successful machine learning. However, supervised
learning requires annotated data that are often hard to obtain. In a
classification task with limited resources, Active Learning (AL) promises to
guide annotators to examples that bring the most value for a classifier. AL can
be successfully combined with self-training, i.e., extending a training set
with the unlabelled examples for which a classifier is the most certain. We
report our experiences on using AL in a systematic manner to train an SVM
classifier for Stack Overflow posts discussing performance of software
components. We show that the training examples deemed as the most valuable to
the classifier are also the most difficult for humans to annotate. Despite
carefully evolved annotation criteria, we report low inter-rater agreement, but
we also propose mitigation strategies. Finally, based on one annotator's work,
we show that self-training can improve the classification accuracy. We conclude
the paper by discussing implications for future text miners aspiring to use AL
and self-training.
Comment: Preprint of paper accepted for the Proc. of the 21st International
Conference on Evaluation and Assessment in Software Engineering, 201
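The self-training loop described in this abstract (extending the training set with the unlabelled examples the classifier is most certain about) can be sketched as follows. This is only an illustration under stated assumptions: a nearest-centroid classifier on one-dimensional features stands in for the paper's SVM on Stack Overflow posts, and the names `centroids`, `predict_with_confidence`, `self_train`, `per_round` and `rounds` are hypothetical.

```python
from typing import Dict, List, Tuple

Labelled = List[Tuple[float, str]]


def centroids(labelled: Labelled) -> Dict[str, float]:
    """Mean feature value per class (a stand-in for training an SVM)."""
    sums: Dict[str, Tuple[float, int]] = {}
    for x, y in labelled:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}


def predict_with_confidence(cents: Dict[str, float], x: float) -> Tuple[str, float]:
    """Return the nearest class and a margin; a larger margin means more certain.

    Assumes at least two classes are present in `cents`.
    """
    ranked = sorted(cents, key=lambda y: abs(x - cents[y]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(x - cents[runner_up]) - abs(x - cents[best])
    return best, margin


def self_train(labelled: Labelled, unlabelled: List[float],
               per_round: int = 1, rounds: int = 3) -> Tuple[Labelled, List[float]]:
    """Repeatedly pseudo-label the most confident unlabelled examples."""
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(rounds):
        if not pool:
            break
        cents = centroids(labelled)  # retrain on the current training set
        # Most confident predictions first.
        scored = sorted(pool,
                        key=lambda x: predict_with_confidence(cents, x)[1],
                        reverse=True)
        for x in scored[:per_round]:
            label, _ = predict_with_confidence(cents, x)
            labelled.append((x, label))  # add the pseudo-labelled example
            pool.remove(x)
    return labelled, pool


seed = [(0.0, "neg"), (10.0, "pos")]
grown, remaining = self_train(seed, [1.0, 9.0, 5.5], per_round=1, rounds=2)
```

In this toy run the two clear-cut points are absorbed first and the ambiguous point near the class boundary stays in the pool. That ambiguous region is where active learning, as opposed to self-training, would ask a human annotator.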
The Potential of Restarts for ProbSAT
This work analyses the potential of restarts for probSAT, a quite successful
algorithm for k-SAT, by estimating its runtime distributions on random 3-SAT
instances that are close to the phase transition. We estimate an optimal
restart time from empirical data, reaching a potential speedup factor of 1.39.
Calculating restart times from fitted probability distributions reduces this
factor to a maximum of 1.30. A spin-off result is that the Weibull distribution
approximates the runtime distribution for over 93% of the used instances well.
A machine learning pipeline is presented to compute a restart time for a
fixed-cutoff strategy to exploit this potential. The main components of the
pipeline are a random forest for determining the distribution type and a neural
network for the distribution's parameters. ProbSAT performs statistically
significantly better than Luby's restart strategy and the policy without
restarts when using the presented approach. The structure is particularly
advantageous on hard problems.
Comment: Eurocast 201
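For a fixed-cutoff strategy with cutoff t, the expected runtime is E[min(X, t)] / F(t), where X is the runtime without restarts and F is its distribution function. The sketch below evaluates this on an empirical sample and picks the best cutoff. It is a hedged illustration of the general idea of estimating a restart time from empirical data, not the paper's pipeline (which fits distributions and uses a random forest and a neural network). The function names and the sample runtimes are made up for the example.

```python
from typing import Sequence, Tuple


def expected_runtime_with_restarts(runtimes: Sequence[float], cutoff: float) -> float:
    """Empirical E[min(X, t)] / F(t) for a fixed restart cutoff t."""
    solved = sum(1 for r in runtimes if r <= cutoff)
    if solved == 0:
        return float("inf")  # no run finishes before the cutoff
    capped_total = sum(min(r, cutoff) for r in runtimes)
    # Both numerator and denominator would be divided by len(runtimes),
    # so the sample size cancels.
    return capped_total / solved


def best_cutoff(runtimes: Sequence[float]) -> Tuple[float, float]:
    """Try every observed runtime as a cutoff and return the best one."""
    candidates = sorted(set(runtimes))
    best = min(candidates,
               key=lambda t: expected_runtime_with_restarts(runtimes, t))
    return best, expected_runtime_with_restarts(runtimes, best)


# Hypothetical runtimes with one heavy-tailed outlier.
sample = [1.0, 2.0, 3.0, 50.0]
cutoff, restarted = best_cutoff(sample)
speedup = (sum(sample) / len(sample)) / restarted
```

Only observed runtimes need to be tried as cutoffs, because the empirical expectation can only improve at those points. The large speedup in this toy sample comes from the single outlier and says nothing about the factors reported in the abstract.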
ExplainIt! -- A declarative root-cause analysis engine for time series data (extended version)
We present ExplainIt!, a declarative, unsupervised root-cause analysis engine
that uses time series monitoring data from large complex systems such as data
centres. ExplainIt! empowers operators to succinctly specify a large number of
causal hypotheses to search for causes of interesting events. ExplainIt! then
ranks these hypotheses, reducing the number of causal dependencies from
hundreds of thousands to a handful for human understanding. We show how a
declarative language, such as SQL, can be effective in declaratively
enumerating hypotheses that probe the structure of an unknown probabilistic
graphical causal model of the underlying system. Our thesis is that databases
are in a unique position to enable users to rapidly explore the possible causal
mechanisms in data collected from diverse sources. We empirically demonstrate
how ExplainIt! has helped us resolve over 30 performance issues in a commercial
product since late 2014, of which we discuss a few cases in detail.
Comment: SIGMOD Industry Track 201
Biosurfactant-mediated biodegradation of straight and methyl-branched alkanes by Pseudomonas aeruginosa ATCC 55925
Accidental oil spills and waste disposal are important sources of environmental pollution. We investigated the biodegradation of alkanes by Pseudomonas aeruginosa ATCC 55925 in relation to a rhamnolipid surfactant produced by the same bacterial strain. Results showed that degradation of the linear C11-C21 compounds in a heating oil sample ranged from 6% to 100%, whereas the iso-alkanes tended to be recalcitrant unless they were exposed to the biosurfactant; under such conditions total biodegradation was achieved. Only the biodegradation of the commercial C12-C19 alkanes could be demonstrated, ranging from 23% to 100%, depending on the experimental conditions. Pristane (a C19 branched alkane) only biodegraded when present alone with the biosurfactant and when included in an artificial mixture even without the biosurfactant. In all cases the biosurfactant significantly enhanced biodegradation. Scanning electron microscopy showed that cells displayed several adaptations to growth on hydrocarbons, such as biopolymeric spheres with embedded cells distributed over different layers on the spherical surfaces and cells linked to each other by extracellular appendages. Transmission electron microscopy revealed transparent inclusions, which were associated with cells from hydrocarbon-based cultures. These patterns of hydrocarbon biodegradation and cell adaptations depended on the substrate bioavailability, type and length of hydrocarbon
Spectral Graph Convolutions for Population-based Disease Prediction
Exploiting the wealth of imaging and non-imaging information for disease
prediction tasks requires models capable of representing, at the same time,
individual features as well as data associations between subjects from
potentially large populations. Graphs provide a natural framework for such
tasks, yet previous graph-based approaches focus on pairwise similarities
without modelling the subjects' individual characteristics and features. On the
other hand, relying solely on subject-specific imaging feature vectors fails to
model the interaction and similarity between subjects, which can reduce
performance. In this paper, we introduce the novel concept of Graph
Convolutional Networks (GCN) for brain analysis in populations, combining
imaging and non-imaging data. We represent populations as a sparse graph whose
vertices are associated with image-based feature vectors and whose edges
encode phenotypic information. This structure was used to train a GCN model on
partially labelled graphs, aiming to infer the classes of unlabelled nodes from
the node features and pairwise associations between subjects. We demonstrate
the potential of the method on the challenging ADNI and ABIDE databases, as a
proof of concept of the benefit from integrating contextual information in
classification tasks. This has a clear impact on the quality of the
predictions, leading to 69.5% accuracy for ABIDE (outperforming the current
state of the art of 66.8%) and 77% for ADNI for prediction of MCI conversion,
significantly outperforming standard linear classifiers where only individual
features are considered.
Comment: International Conference on Medical Image Computing and
Computer-Assisted Interventions (MICCAI) 201
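To make the idea of combining node features with pairwise associations concrete, here is a minimal sketch of one graph-convolution layer. It uses the widely known first-order propagation rule H' = ReLU(D^(-1/2) (A + I) D^(-1/2) X W), which is an assumption for illustration and may differ from the spectral filters used in the paper. The tiny graph, features and weights are made up.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Add self-loops, then apply symmetric degree normalisation."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One layer: average features over neighbours, project, apply ReLU."""
    propagated = matmul(normalized_adjacency(adj), features)
    projected = matmul(propagated, weights)
    return [[max(0.0, v) for v in row] for row in projected]


# Two subjects joined by one edge (e.g. shared phenotypic information),
# each with a single imaging-derived feature.
adjacency = [[0.0, 1.0],
             [1.0, 0.0]]
node_features = [[1.0],
                 [3.0]]
layer_weights = [[1.0]]
output = gcn_layer(adjacency, node_features, layer_weights)
```

With these inputs each node ends up with the average of its own and its neighbour's feature. This shows how edges let information from connected subjects influence a node's representation.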
Analyzing First-Person Stories Based on Socializing, Eating and Sedentary Patterns
First-person stories can be analyzed by means of egocentric pictures acquired
throughout the whole active day with wearable cameras. This manuscript presents
an egocentric dataset with more than 45,000 pictures from four people in
different environments such as working or studying. All the images were
manually labeled to identify three patterns of interest regarding people's
lifestyle: socializing, eating and sedentary. Additionally, two different
approaches are proposed to classify egocentric images into one of the 12 target
categories defined to characterize these three patterns. The approaches are
based on machine learning and deep learning techniques, including traditional
classifiers and state-of-the-art convolutional neural networks. The experimental
results obtained when applying these methods to the egocentric dataset
demonstrated their adequacy for the problem at hand.
Comment: Accepted at First International Workshop on Social Signal Processing
and Beyond, 19th International Conference on Image Analysis and Processing
(ICIAP), September 201
Community Aliveness: Discovering Interaction Decay Patterns in Online Social Communities
Online Social Communities (OSCs) provide a medium for connecting people,
sharing news, eliciting information, and finding jobs, among others. The
dynamics of the interaction among the members of OSCs is not always growth
dynamics. Instead, a or dynamics often
happens, which makes an OSC obsolete. Understanding the behavior and the
characteristics of the members of an inactive community helps to sustain the
growth dynamics of these communities and, possibly, prevents them from being
out of service. In this work, we provide two prediction models for predicting
the interaction decay of community members, namely: a Simple Threshold Model
(STM) and a supervised machine learning classification framework. We conducted
evaluation experiments for our prediction models supported by a of decayed communities extracted from the StackExchange platform. The
results of the experiments revealed that it is possible, with satisfactory
prediction performance in terms of the F1-score and the accuracy, to predict
the decay of the activity of the members of these communities using
network-based attributes and network-exogenous attributes of the members. The
upper bound of the prediction performance of the methods we used is and
for the F1-score and the accuracy, respectively. These results indicate
that network-based attributes are correlated with the activity of the members
and that we can find decay patterns in terms of these attributes. The results
also showed that the structure of the decayed communities can be used to
support the alive communities by discovering inactive members.
Comment: pre-print for the 4th European Network Intelligence Conference -
11-12 September 2017 Duisburg, German