1,884 research outputs found

Learning from Noisy Label Distributions

Author: A Culotta
CM Bishop
F Pedregosa
TG Dietterich
Publication date: 10/08/2017

In this paper, we consider a novel machine learning problem, that is, learning a classifier from noisy label distributions. In this problem, each instance with a feature vector belongs to at least one group. Then, instead of the true label of each instance, we observe the label distribution of the instances associated with a group, where the label distribution is distorted by an unknown noise. Our goals are to (1) estimate the true label of each instance, and (2) learn a classifier that predicts the true label of a new instance. We propose a probabilistic model that considers true label distributions of groups and parameters that represent the noise as hidden variables. The model can be learned based on a variational Bayesian method. In numerical experiments, we show that the proposed model outperforms existing methods in terms of the estimation of the true labels of instances. Comment: Accepted in ICANN201

arXiv.org e-Print Archive

Crossref

Parallel ion strings in linear multipole traps

Author: A. Calisti
C. Champenois
D. Gerlich
J. Pedregosa-Gutierrez
M. Knoop
M. Marciante
Publication venue: 'American Physical Society (APS)'
Publication date: 18/02/2011

Additional radio-frequency (rf) potentials applied to linear multipole traps create extra field nodes in the radial plane which allow one to confine single ions, or strings of ions, in totally rf field-free regions. The number of nodes depends on the order of the applied multipole potentials and their relative distance can be easily tuned by the amplitude variation of the applied voltages. Simulations using molecular dynamics show that strings of ions can be laser cooled down to the Doppler limit in all directions of space. Once cooled, organized systems can be moved with very limited heating, even if the cooling process is turned off.

arXiv.org e-Print Archive

A class of Hamilton-Jacobi equations on Banach-Finsler manifolds

Author: Jaramillo J. A.
Jimenez-Sevilla M.
Rodenas-Pedregosa J. L.
Sanchez-Gonzalez L.
Publication venue: 'Elsevier BV'
Publication date: 02/12/2014

The concept of subdifferentiability is studied in the context of C^1 Finsler manifolds (modeled on a Banach space with a Lipschitz C^1 bump function). A class of Hamilton-Jacobi equations defined on C^1 Finsler manifolds is studied and several results related to the existence and uniqueness of viscosity solutions are obtained. Comment: 24 page

arXiv.org e-Print Archive

CiteSeerX

On Using Active Learning and Self-Training when Mining Performance Discussions on Stack Overflow

Author: Allamanis M.
Chowdhury S.
Cicchetti A.
Lin Y.
Pedregosa F.
Settles B.
Soliman M.
Ying A.
Publication date: 01/01/2017

Abundant data is the key to successful machine learning. However, supervised learning requires annotated data that are often hard to obtain. In a classification task with limited resources, Active Learning (AL) promises to guide annotators to examples that bring the most value for a classifier. AL can be successfully combined with self-training, i.e., extending a training set with the unlabelled examples for which a classifier is the most certain. We report our experiences on using AL in a systematic manner to train an SVM classifier for Stack Overflow posts discussing performance of software components. We show that the training examples deemed as the most valuable to the classifier are also the most difficult for humans to annotate. Despite carefully evolved annotation criteria, we report low inter-rater agreement, but we also propose mitigation strategies. Finally, based on one annotator's work, we show that self-training can improve the classification accuracy. We conclude the paper by discussing implications for future text miners aspiring to use AL and self-training. Comment: Preprint of paper accepted for the Proc. of the 21st International Conference on Evaluation and Assessment in Software Engineering, 201

arXiv.org e-Print Archive

Lund University Publications

Crossref

Swedish Institute of Computer Science Publications Database
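
The combination of active learning and self-training described in the Stack Overflow entry above can be illustrated with a minimal sketch. It assumes a binary classifier that yields a probability for each unlabelled post; the function name `split_by_confidence`, the post identifiers, and the threshold value are hypothetical and do not come from the paper, which trained an SVM. The least certain posts go to a human annotator (uncertainty sampling), while posts the classifier is very sure about receive pseudo-labels (self-training).

```python
def split_by_confidence(probabilities, n_query, self_train_threshold):
    """Split unlabelled examples into an annotation queue and pseudo-labels.

    probabilities: dict mapping example id -> P(positive class),
                   assumed to come from some already-trained classifier.
    n_query: how many of the least certain examples to send to a human.
    self_train_threshold: minimum confidence for accepting a pseudo-label.
    """
    # Confidence of the predicted class for a binary problem.
    confidence = {k: max(p, 1.0 - p) for k, p in probabilities.items()}

    # Least confident first; the id breaks ties so the order is deterministic.
    ranked = sorted(confidence, key=lambda k: (confidence[k], k))

    # Active learning: ask the annotator about the most uncertain examples.
    to_annotate = ranked[:n_query]

    # Self-training: trust the classifier only where it is very confident.
    pseudo_labelled = {
        k: int(probabilities[k] >= 0.5)
        for k in ranked[n_query:]
        if confidence[k] >= self_train_threshold
    }
    return to_annotate, pseudo_labelled


if __name__ == "__main__":
    scores = {"post-a": 0.51, "post-b": 0.97, "post-c": 0.05, "post-d": 0.60}
    queue, pseudo = split_by_confidence(scores, n_query=1, self_train_threshold=0.9)
    print(queue)
    print(pseudo)
```

In this sketch the two selections pull from opposite ends of the same confidence ranking, which matches the abstract's observation that the examples most valuable to the classifier are also the hardest ones for humans to label.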

The Potential of Restarts for ProbSAT

Author: A Arbelaez
A Balint
A Biere
F Pedregosa
G Völkel
J-H Lorenz
M Hollander
M Luby
S Haim
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/04/2019

This work analyses the potential of restarts for probSAT, a quite successful algorithm for k-SAT, by estimating its runtime distributions on random 3-SAT instances that are close to the phase transition. We estimate an optimal restart time from empirical data, reaching a potential speedup factor of 1.39. Calculating restart times from fitted probability distributions reduces this factor to a maximum of 1.30. A spin-off result is that the Weibull distribution approximates the runtime distribution for over 93% of the used instances well. A machine learning pipeline is presented to compute a restart time for a fixed-cutoff strategy to exploit this potential. The main components of the pipeline are a random forest for determining the distribution type and a neural network for the distribution's parameters. ProbSAT performs statistically significantly better than Luby's restart strategy and the policy without restarts when using the presented approach. The structure is particularly advantageous on hard problems. Comment: Eurocast 201

arXiv.org e-Print Archive

Crossref
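
To make the fixed-cutoff restart idea from the ProbSAT entry above concrete, the following sketch evaluates the standard expected-runtime formula for a restart strategy with cutoff t, E[T_t] = (∫_0^t (1 − F(x)) dx) / F(t), for a Weibull runtime distribution F. It also generates the Luby sequence that the abstract uses as a baseline. The parameter values and function names are illustrative assumptions, and this is not the paper's pipeline (which predicts the distribution type and parameters with a random forest and a neural network).

```python
import math


def weibull_cdf(x, shape, scale):
    """CDF of a Weibull distribution: F(x) = 1 - exp(-(x / scale) ** shape)."""
    return 1.0 - math.exp(-((x / scale) ** shape))


def weibull_mean(shape, scale):
    """Expected runtime without restarts."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def expected_runtime_with_restarts(cutoff, shape, scale, steps=20000):
    """Expected total runtime when restarting after every `cutoff` time units.

    Uses E[T] = (integral_0^cutoff (1 - F(x)) dx) / F(cutoff); the integral is
    approximated with the midpoint rule.
    """
    h = cutoff / steps
    integral = h * sum(
        1.0 - weibull_cdf((i + 0.5) * h, shape, scale) for i in range(steps)
    )
    return integral / weibull_cdf(cutoff, shape, scale)


def luby(i):
    """i-th term (1-indexed) of the Luby sequence 1, 1, 2, 1, 1, 2, 4, ..."""
    k = 1
    while (1 << k) - 1 < i:
        k += 1
    if i == (1 << k) - 1:
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)


if __name__ == "__main__":
    # Heavy-tailed case (shape < 1): a short cutoff beats running without restarts.
    print(weibull_mean(0.5, 1.0))
    print(expected_runtime_with_restarts(0.1, 0.5, 1.0))
    print([luby(i) for i in range(1, 8)])
```

For shape 1 the Weibull distribution reduces to the exponential distribution, which is memoryless, so the formula returns the plain mean for every cutoff; restarts can only pay off when the fitted shape is below 1.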

ExplainIt! -- A declarative root-cause analysis engine for time series data (extended version)

Author: Benjamini Y.
Cohen I.
Jeyakumar V.
Pedregosa F.
Seth A. K.
Shimizu S.
Tenenbaum J. B.
Wang Y.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 22/03/2019

We present ExplainIt!, a declarative, unsupervised root-cause analysis engine that uses time series monitoring data from large complex systems such as data centres. ExplainIt! empowers operators to succinctly specify a large number of causal hypotheses to search for causes of interesting events. ExplainIt! then ranks these hypotheses, reducing the number of causal dependencies from hundreds of thousands to a handful for human understanding. We show how a declarative language, such as SQL, can be effective in declaratively enumerating hypotheses that probe the structure of an unknown probabilistic graphical causal model of the underlying system. Our thesis is that databases are in a unique position to enable users to rapidly explore the possible causal mechanisms in data collected from diverse sources. We empirically demonstrate how ExplainIt! has helped us resolve over 30 performance issues in a commercial product since late 2014, of which we discuss a few cases in detail. Comment: SIGMOD Industry Track 201

arXiv.org e-Print Archive

Crossref

Biosurfactant-mediated biodegradation of straight and methyl-branched alkanes by Pseudomonas aeruginosa ATCC 55925

Author: Laborda Fernando
Pedregosa Ana M
Rocha Carlos A
Publication venue: Springer
Publication date: 01/01/2011

Accidental oil spills and waste disposal are important sources for environmental pollution. We investigated the biodegradation of alkanes by Pseudomonas aeruginosa ATCC 55925 in relation to a rhamnolipid surfactant produced by the same bacterial strain. Results showed that the linear C11-C21 compounds in a heating oil sample degraded from 6% to 100%, whereas the iso-alkanes tended to be recalcitrant unless they were exposed to the biosurfactant; under such condition total biodegradation was achieved. Only the biodegradation of the commercial C12-C19 alkanes could be demonstrated, ranging from 23% to 100%, depending on the experimental conditions. Pristane (a C19 branched alkane) only biodegraded when present alone with the biosurfactant and when included in an artificial mixture even without the biosurfactant. In all cases the biosurfactant significantly enhanced biodegradation. The electron scanning microscopy showed that cells depicted several adaptations to growth on hydrocarbons, such as biopolymeric spheres with embedded cells distributed over different layers on the spherical surfaces and cells linked to each other by extracellular appendages. Electron transmission microscopy revealed transparent inclusions, which were associated with hydrocarbon based-culture cells. These patterns of hydrocarbon biodegradation and cell adaptations depended on the substrate bioavailability, type and length of hydrocarbon.

Crossref

PubMed Central

Spectral Graph Convolutions for Population-based Disease Prediction

Author: A Abraham
A Martino Di
C Ledig
DI Shuman
F Pedregosa
M Havaei
R Wolz
RS Desikan
S Parisot
T Brosch
T Tong
Publication date: 16/05/2017

Exploiting the wealth of imaging and non-imaging information for disease prediction tasks requires models capable of representing, at the same time, individual features as well as data associations between subjects from potentially large populations. Graphs provide a natural framework for such tasks, yet previous graph-based approaches focus on pairwise similarities without modelling the subjects' individual characteristics and features. On the other hand, relying solely on subject-specific imaging feature vectors fails to model the interaction and similarity between subjects, which can reduce performance. In this paper, we introduce the novel concept of Graph Convolutional Networks (GCN) for brain analysis in populations, combining imaging and non-imaging data. We represent populations as a sparse graph where its vertices are associated with image-based feature vectors and the edges encode phenotypic information. This structure was used to train a GCN model on partially labelled graphs, aiming to infer the classes of unlabelled nodes from the node features and pairwise associations between subjects. We demonstrate the potential of the method on the challenging ADNI and ABIDE databases, as a proof of concept of the benefit from integrating contextual information in classification tasks. This has a clear impact on the quality of the predictions, leading to 69.5% accuracy for ABIDE (outperforming the current state of the art of 66.8%) and 77% for ADNI for prediction of MCI conversion, significantly outperforming standard linear classifiers where only individual features are considered. Comment: International Conference on Medical Image Computing and Computer-Assisted Interventions (MICCAI) 201

arXiv.org e-Print Archive

Crossref

Spiral - Imperial College Digital Repository
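
As a rough illustration of how a graph convolution mixes each subject's features with those of its neighbours, as in the population-graph entry above, the sketch below implements one layer of the widely used first-order propagation rule ReLU(D^{-1/2} (A + I) D^{-1/2} H W). This is a simplification chosen for brevity and may differ from the spectral filters used in the paper; the tiny graph, the feature values, and the function names are all illustrative assumptions.

```python
import math


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric adjacency matrix A."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, features, weights):
    """One graph-convolution layer followed by a ReLU non-linearity.

    adj: n x n edge weights (e.g. a similarity between subjects),
    features: n x d node feature vectors,
    weights: d x d_out weight matrix (fixed here, learned in practice).
    """
    propagated = matmul(normalized_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]


if __name__ == "__main__":
    adjacency = [[0.0, 1.0], [1.0, 0.0]]   # two connected subjects
    node_features = [[1.0], [3.0]]         # one feature per subject
    print(gcn_layer(adjacency, node_features, [[1.0]]))
```

With two connected nodes the layer averages their features, whereas with no edges it reduces to an ordinary per-node linear layer, which is the "individual features only" baseline the abstract contrasts against.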

Analyzing First-Person Stories Based on Socializing, Eating and Sedentary Patterns

Author: A Cartas
A Natekin
A Torralba
AR Doherty
BC Russell
CJ Burges
E Talavera
F Pedregosa
M Bolanos
M Dimiccoli
N Srivastava
O Kramer
O Russakovsky
Publication date: 25/07/2017

First-person stories can be analyzed by means of egocentric pictures acquired throughout the whole active day with wearable cameras. This manuscript presents an egocentric dataset with more than 45,000 pictures from four people in different environments such as working or studying. All the images were manually labeled to identify three patterns of interest regarding people's lifestyle: socializing, eating and sedentary. Additionally, two different approaches are proposed to classify egocentric images into one of the 12 target categories defined to characterize these three patterns. The approaches are based on machine learning and deep learning techniques, including traditional classifiers and state-of-the-art convolutional neural networks. The experimental results obtained when applying these methods to the egocentric dataset demonstrated their adequacy for the problem at hand. Comment: Accepted at First International Workshop on Social Signal Processing and Beyond, 19th International Conference on Image Analysis and Processing (ICIAP), September 201

arXiv.org e-Print Archive

Crossref

Community Aliveness: Discovering Interaction Decay Patterns in Online Social Communities

Author: A Capocci
A-L Barabási
C Cortes
DJ Watts
EM Jin
F Pedregosa
G Kossinets
H Ebel
ME Newman
Mohammed Abufouda
S. N Dorogovtsev
Publication date: 14/07/2017

Online Social Communities (OSCs) provide a medium for connecting people, sharing news, eliciting information, and finding jobs, among others. The dynamics of the interaction among the members of OSCs is not always growth dynamics. Instead, a decay or inactivity dynamics often happens, which makes an OSC obsolete. Understanding the behavior and the characteristics of the members of an inactive community helps to sustain the growth dynamics of these communities and, possibly, prevent them from being out of service. In this work, we provide two prediction models for predicting the interaction decay of community members, namely: a Simple Threshold Model (STM) and a supervised machine learning classification framework. We conducted evaluation experiments for our prediction models supported by a ground truth of decayed communities extracted from the StackExchange platform. The results of the experiments revealed that it is possible, with satisfactory prediction performance in terms of the F1-score and the accuracy, to predict the decay of the activity of the members of these communities using network-based attributes and network-exogenous attributes of the members. The upper bound of the prediction performance of the methods we used is 0.91 and 0.83 for the F1-score and the accuracy, respectively. These results indicate that network-based attributes are correlated with the activity of the members and that we can find decay patterns in terms of these attributes. The results also showed that the structure of the decayed communities can be used to support the alive communities by discovering inactive members. Comment: pre-print for the 4th European Network Intelligence Conference - 11-12 September 2017 Duisburg, German

arXiv.org e-Print Archive

Crossref