Noiseless Independent Factor Analysis with mixing constraints in a semi-supervised framework. Application to railway device fault diagnosis.
In Independent Factor Analysis (IFA), latent components (or sources) are recovered from only their linear observed mixtures. Both the mixing process and the source densities (that are assumed to be generated according to mixtures of Gaussians) are learned from observed data. This paper investigates the possibility of estimating the IFA model in its noiseless setting when two kinds of prior information are incorporated: constraints on the mixing process and partial knowledge on the cluster membership of some examples. Semi-supervised or partially supervised learning frameworks can thus be handled. These two proposals have been initially motivated by a real-world application that concerns fault diagnosis of a railway device. Results from this application are provided to demonstrate the ability of our approach to enhance estimation accuracy and remove indeterminacy commonly encountered in unsupervised IFA such as source permutations.
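For orientation, the noiseless IFA model described in this abstract is commonly written as below. This is a sketch in generic notation, not the paper's own: the symbols x, s, A, W, and the mixture parameters \pi_{jk}, \mu_{jk}, \sigma_{jk} are assumptions introduced here, and the mixing matrix is assumed to be square and invertible.

```latex
% Noiseless linear mixing: observations x are an invertible linear
% transform of the latent sources s (assumed notation).
\[
  x = A s, \qquad s = W x, \qquad W = A^{-1}
\]
% Each source s_j has a mixture-of-Gaussians density with K_j components.
\[
  p_j(s_j) = \sum_{k=1}^{K_j} \pi_{jk}\,
             \mathcal{N}\!\left(s_j;\ \mu_{jk},\ \sigma_{jk}^{2}\right),
  \qquad \sum_{k=1}^{K_j} \pi_{jk} = 1
\]
% With independent sources, the change of variables gives the density of x.
\[
  p(x) = \lvert \det W \rvert \prod_{j} p_j\!\left((W x)_j\right)
\]
```

In this notation, the two kinds of prior information mentioned in the abstract would act on A (constraints on the mixing process) and on the component indicator k for some examples (partial knowledge of cluster membership).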
The Dawn of Open Access to Phylogenetic Data
The scientific enterprise depends critically on the preservation of and open
access to published data. This basic tenet applies acutely to phylogenies
(estimates of evolutionary relationships among species). Increasingly,
phylogenies are estimated from increasingly large, genome-scale datasets using
increasingly complex statistical methods that require increasing levels of
expertise and computational investment. Moreover, the resulting phylogenetic
data provide an explicit historical perspective that critically informs
research in a vast and growing number of scientific disciplines. One such use
is the study of changes in rates of lineage diversification (speciation -
extinction) through time. As part of a meta-analysis in this area, we sought to
collect phylogenetic data (comprising nucleotide sequence alignment and tree
files) from 217 studies published in 46 journals over a 13-year period. We
document our attempts to procure those data (from online archives and by direct
request to corresponding authors), and report results of analyses (using
Bayesian logistic regression) to assess the impact of various factors on the
success of our efforts. Overall, complete phylogenetic data for ~60% of these
studies are effectively lost to science. Our study indicates that phylogenetic
data are more likely to be deposited in online archives and/or shared upon
request when: (1) the publishing journal has a strong data-sharing policy; (2)
the publishing journal has a higher impact factor; and (3) the data are
requested from faculty rather than students. Although the situation appears
dire, our analyses suggest that it is far from hopeless: recent initiatives by
the scientific community -- including policy changes by journals and funding
agencies -- are improving the state of affairs.
The Complexity of Partial Function Extension for Coverage Functions
Coverage functions are an important subclass of submodular functions, finding applications in machine learning, game theory, social networks, and facility location. We study the complexity of partial function extension to coverage functions. That is, given a partial function consisting of a family of subsets of [m] and a value at each point, does there exist a coverage function defined on all subsets of [m] that extends this partial function? Partial function extension has previously been studied for other function classes, including boolean functions and convex functions, and is useful in many fields, such as obtaining bounds on learning these function classes.
We show that determining extendibility of a partial function to a coverage function is NP-complete, establishing in the process that there is a polynomial-sized certificate of extendibility. The hardness also gives us a lower bound for learning coverage functions. We then study two natural notions of approximate extension, to account for errors in the data set. The two notions correspond roughly to multiplicative point-wise approximation and additive L_1 approximation. We show upper and lower bounds for both notions of approximation. In the second case we obtain nearly tight bounds.
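To make the definitions concrete, here is a minimal Python sketch of a coverage function and of what it means for one to extend a partial function. It is an illustration only, not the paper's method: the instance (`sets`, `weights`, `partial`) and all function names are hypothetical, and the code only verifies a given candidate extension. It does not decide whether some extension exists, which is the problem the abstract shows to be NP-complete.

```python
from itertools import combinations


def make_coverage_function(sets, weights):
    """Return f(S) = total weight of the elements covered by the sets indexed by S."""
    def f(S):
        covered = set()
        for i in S:
            covered |= sets[i]
        return sum(weights[u] for u in covered)
    return f


def all_subsets(m):
    """Yield every subset of {0, ..., m-1} as a frozenset."""
    for r in range(m + 1):
        for combo in combinations(range(m), r):
            yield frozenset(combo)


def is_submodular(f, m):
    """Brute-force check of f(A) + f(B) >= f(A | B) + f(A & B) for all A, B."""
    subsets = list(all_subsets(m))
    return all(f(A) + f(B) >= f(A | B) + f(A & B)
               for A in subsets for B in subsets)


def is_extended_by(partial, f):
    """True if f agrees with the partial function on every point where it is defined."""
    return all(f(S) == value for S, value in partial.items())


# Hypothetical instance with m = 3 sets over the weighted universe {a, b, c, d}.
sets = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
weights = {"a": 1, "b": 2, "c": 1, "d": 3}
f = make_coverage_function(sets, weights)

# A partial function: values given on only three of the eight subsets of [m].
partial = {frozenset({0}): 3, frozenset({1}): 3, frozenset({0, 1}): 4}

print(is_extended_by(partial, f))  # does this particular f extend the partial function?
print(is_submodular(f, 3))         # coverage functions are submodular
```

Checking a candidate is easy once the weighted universe is known; the difficulty lies in searching over all possible universes and weights, which is why a polynomial-sized certificate of extendibility is a meaningful result.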