172,719 research outputs found

Noiseless Independent Factor Analysis with mixing constraints in a semi-supervised framework. Application to railway device fault diagnosis.

Author: Aknin Patrice
Côme Etienne
Denoeux Thierry
Oukhellou Latifa
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/09/2009

In Independent Factor Analysis (IFA), latent components (or sources) are recovered from only their observed linear mixtures. Both the mixing process and the source densities (which are assumed to be generated according to mixtures of Gaussians) are learned from observed data. This paper investigates the possibility of estimating the IFA model in its noiseless setting when two kinds of prior information are incorporated: constraints on the mixing process and partial knowledge of the cluster membership of some examples. Semi-supervised or partially supervised learning frameworks can thus be handled. These two proposals were initially motivated by a real-world application that concerns fault diagnosis of a railway device. Results from this application are provided to demonstrate the ability of our approach to enhance estimation accuracy and remove indeterminacies commonly encountered in unsupervised IFA, such as source permutations.
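The generative side of the noiseless model described in this abstract can be sketched as follows. This is only an illustrative sketch, not the authors' estimation procedure: each source is drawn from its own mixture of Gaussians, and the observation is the linear mixture x = A s with no additive noise. The function names, the example mixing matrix, and the mixture parameters are all hypothetical; the zero entry in `A` merely illustrates the kind of mixing constraint the abstract mentions.

```python
import random

def sample_source(rng, weights, means, stds):
    """Draw one value from a 1-D mixture of Gaussians.
    Returns the sample and the index of the component (cluster) that produced it."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], stds[k]), k

def mix(A, s):
    """Noiseless linear mixing: x = A s (A given as a list of rows)."""
    return [sum(a * v for a, v in zip(row, s)) for row in A]

# Hypothetical setup: two sources, two observed variables.
# The zero in the first row is an assumed constraint:
# source 2 is taken to have no influence on observation 1.
A = [[1.0, 0.0],
     [0.5, 2.0]]

# Hypothetical mixture parameters (weights, means, standard deviations) per source.
source_params = [
    ([0.7, 0.3], [0.0, 4.0], [1.0, 0.5]),
    ([0.5, 0.5], [-2.0, 2.0], [0.8, 0.8]),
]

rng = random.Random(0)
draws = [sample_source(rng, *params) for params in source_params]
s = [value for value, _ in draws]
labels = [k for _, k in draws]   # cluster memberships; known only for some examples
x = mix(A, s)                    # the only quantity observed in the unsupervised case
print(x, labels)
```

In a semi-supervised setting, some of the `labels` would be available alongside `x`, which is the partial cluster-membership knowledge the abstract refers to.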

Hal-Diderot

The Dawn of Open Access to Phylogenetic Data

Author: A Gelman
A Stoltzfus
AA Alsheikh-Ali
AJ Drummond
AJ Moore
Andrew F. Magee
BP Blackburne
Brian R. Moore
BT Drew
C Notredame
CJ Savage
D Rabosky
DA Morrison
DG Roche
E Evangelou
HA Piwowar
J Hughes
J Leebens-Mack
JD Thompson
JM Wicherts
KM Wong
L Rieseberg
M Plummer
MA Suchard
MAF Noor
MC Whitlock
MD Rausher
Michael R. May
MJ Donoghue
MJ Sanderson
MK Uyenoyama
OG Pybus
RM O'brien
S Kullback
SJ Ceci
SP Brooks
T Vines
TH Vines
TJ Vision
William J. Murphy
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014

The scientific enterprise depends critically on the preservation of and open access to published data. This basic tenet applies acutely to phylogenies (estimates of evolutionary relationships among species). Increasingly, phylogenies are estimated from increasingly large, genome-scale datasets using increasingly complex statistical methods that require increasing levels of expertise and computational investment. Moreover, the resulting phylogenetic data provide an explicit historical perspective that critically informs research in a vast and growing number of scientific disciplines. One such use is the study of changes in rates of lineage diversification (speciation - extinction) through time. As part of a meta-analysis in this area, we sought to collect phylogenetic data (comprising nucleotide sequence alignment and tree files) from 217 studies published in 46 journals over a 13-year period. We document our attempts to procure those data (from online archives and by direct request to corresponding authors), and report results of analyses (using Bayesian logistic regression) to assess the impact of various factors on the success of our efforts. Overall, complete phylogenetic data for ~60% of these studies are effectively lost to science. Our study indicates that phylogenetic data are more likely to be deposited in online archives and/or shared upon request when: (1) the publishing journal has a strong data-sharing policy; (2) the publishing journal has a higher impact factor; and (3) the data are requested from faculty rather than students. Although the situation appears dire, our analyses suggest that it is far from hopeless: recent initiatives by the scientific community, including policy changes by journals and funding agencies, are improving the state of affairs.

arXiv.org e-Print Archive

CiteSeerX

Directory of Open Access Journals

eScholarship - University of California

The Complexity of Partial Function Extension for Coverage Functions

Author: Bhaskar Umang
Kumar Gunjan
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)
Publication date: 01/01/2019

Coverage functions are an important subclass of submodular functions, finding applications in machine learning, game theory, social networks, and facility location. We study the complexity of partial function extension to coverage functions. That is, given a partial function consisting of a family of subsets of [m] and a value at each point, does there exist a coverage function defined on all subsets of [m] that extends this partial function? Partial function extension has previously been studied for other function classes, including boolean functions and convex functions, and is useful in many fields, such as obtaining bounds on learning these function classes. We show that determining extendibility of a partial function to a coverage function is NP-complete, establishing in the process that there is a polynomial-sized certificate of extendibility. The hardness also gives us a lower bound for learning coverage functions. We then study two natural notions of approximate extension, to account for errors in the data set. The two notions correspond roughly to multiplicative point-wise approximation and additive L_1 approximation. We show upper and lower bounds for both notions of approximation. In the second case we obtain nearly tight bounds.
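To make the extension question concrete, here is a minimal sketch using the standard definition of a weighted coverage function: each ground element i in [m] covers a set of universe items, and f(S) is the total weight of the items covered by at least one i in S. The code only checks whether one given candidate agrees with a partial function; it does not decide extendibility, which the abstract states is NP-complete, and it is not the paper's algorithm or certificate format. All names and the example instance are hypothetical.

```python
def coverage_value(chosen, covers, weights):
    """f(S): total weight of universe items covered by at least one chosen index."""
    covered = set()
    for i in chosen:
        covered |= covers[i]
    return sum(weights[u] for u in covered)

def extends(partial, covers, weights):
    """True if the candidate coverage function matches every point of the partial function."""
    return all(coverage_value(S, covers, weights) == v for S, v in partial.items())

# Hypothetical candidate on ground set {0, 1, 2}: which items each index covers,
# and the weight of each item.
covers = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c"}}
weights = {"a": 1.0, "b": 2.0, "c": 1.5}

# Hypothetical partial function: values given on only three subsets.
partial = {
    frozenset({0}): 3.0,      # a + b
    frozenset({0, 1}): 4.5,   # a + b + c
    frozenset({2}): 1.5,      # c
}

print(extends(partial, covers, weights))
```

Verifying a single candidate like this is cheap; the difficulty lies in searching over all possible candidates.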

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server