69 research outputs found

    Learning Output Embeddings in Structured Prediction

    A powerful and flexible approach to structured prediction consists in embedding the structured objects to be predicted into a feature space of possibly infinite dimension by means of output kernels, and then solving a regression problem in this output space. A prediction in the original space is computed by solving a pre-image problem. In such an approach, the embedding, linked to the target loss, is defined prior to the learning phase. In this work, we propose to jointly learn a finite approximation of the output embedding and the regression function into the new feature space. For that purpose, we leverage a priori information on the outputs and also unexploited unsupervised output data, which are both often available in structured prediction problems. We prove that the resulting structured predictor is a consistent estimator, and derive an excess risk bound. Moreover, the novel structured prediction tool enjoys a significantly smaller computational complexity than former output kernel methods. Tested empirically on various structured prediction problems, the approach proves versatile and able to handle large datasets.
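The baseline pipeline this abstract starts from (a fixed output kernel, a regression into the output feature space, then a pre-image step) can be sketched roughly as below. This is only an illustration under stated assumptions, not the joint embedding-learning method the paper proposes: it assumes kernel ridge regression with a Gaussian input kernel and a linear output kernel, and it solves the pre-image problem by searching a finite candidate set. All function names and parameters (`gaussian_kernel`, `okr_predict`, `lam`, `gamma`) are hypothetical.

```python
import math

def gaussian_kernel(a, b, gamma=1.0):
    """Input kernel k_X (assumed Gaussian for this sketch)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def linear_kernel(a, b):
    """Output kernel k_Y (assumed linear for this sketch)."""
    return sum(ai * bi for ai, bi in zip(a, b))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def okr_predict(X_train, Y_train, x_new, candidates, lam=1e-3):
    """Predict a structured output for x_new from a finite candidate set."""
    n = len(X_train)
    # Regularized input Gram matrix K_X + lam * I.
    K = [[gaussian_kernel(X_train[i], X_train[j]) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_x = [gaussian_kernel(xi, x_new) for xi in X_train]
    # Ridge weights: h(x_new) = sum_i alpha_i * psi(y_i) in the output space.
    alpha = solve(K, k_x)

    def distance(y):
        # ||h(x_new) - psi(y)||^2 up to the constant ||h(x_new)||^2,
        # computed with output-kernel evaluations only (kernel trick).
        inner = sum(a * linear_kernel(yi, y) for a, yi in zip(alpha, Y_train))
        return linear_kernel(y, y) - 2.0 * inner

    # Pre-image step: pick the candidate closest to h(x_new).
    return min(candidates, key=distance)

# Toy data: three inputs on a line, one-hot vectors standing in for outputs.
X_train = [(0.0,), (1.0,), (2.0,)]
Y_train = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
prediction = okr_predict(X_train, Y_train, (1.0,), candidates=Y_train)
```

Note that the output embedding is never built explicitly here: the pre-image search only needs output-kernel values between the candidates and the training outputs. The candidate-set search is a simplification chosen for brevity; real pre-image problems can be much harder.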

    Protein-protein interaction network inference with semi-supervised Output Kernel Regression

    In this work, we address the problem of protein-protein interaction network inference as a semi-supervised output kernel learning problem. Using the kernel trick in the output space allows one to reduce the problem of learning from pairs to learning a single-variable function with values in a Hilbert space. We turn to the theory of reproducing kernel Hilbert spaces of vector-valued functions, which provides a general framework for output kernel regression. In this framework, we propose a novel method that extends Output Kernel Regression to semi-supervised learning. We study the relevance of this approach on transductive link prediction using artificial data and a protein-protein interaction network of S. cerevisiae, with a very low percentage of labeled data.

    An ant-plant mutualism induces shifts in the protist community structure of a tank-bromeliad

    Although ants may induce community-wide effects via changes in physical habitats in terrestrial environments, their influence on aquatic communities living in plant-held waters remains largely underexplored. The neotropical tank-bromeliad Aechmea mertensii (Bromeliaceae) occurs along forest edges in ant-gardens initiated by Camponotus femoratus or by Pachycondyla goeldii. Its leaves form wells that hold rainwater and provide suitable habitats for many aquatic organisms. We postulated that these ant-plant mutualisms indirectly affect the microbial community structure via changes in the environmental conditions experienced by the plants. To test this hypothesis, we analyzed the protist communities from 63 tank-bromeliads associated with either C. femoratus or P. goeldii (hereafter Cf-Aechmea and Pg-Aechmea) along a forest edge in French Guiana. For each plant, a large number of environmental variables (including habitat structure, food resources, incident radiation and the presence of aquatic invertebrates) were quantified to determine their relative importance in driving any observed differences across ant-associated plants. Pg-Aechmea are located in sun-exposed areas and hold low volumes of water and low amounts of detritus, whereas Cf-Aechmea are located in partially shaded areas and accumulate higher amounts of water and detritus. Protists (i.e., protozoa and algae) inhabiting Cf-Aechmea exhibit greater richness and abundances than those in Pg-Aechmea. Variations in detritus content, number of leaves, incident radiation, and the epiphyte richness of the ant-garden were the main factors explaining the variation in protist richness. A shift in the functional group composition of protists between bromeliads tended by different ant species suggested that mutualistic ants indirectly mediate changes in the microbial food web

    Food-web structure in relation to environmental gradients and predator-prey ratios in tank-bromeliad ecosystems

    Little is known of how linkage patterns between species change along environmental gradients. The small, spatially discrete food webs inhabiting tank-bromeliads provide an excellent opportunity to analyse patterns of community diversity and food-web topology (connectance, linkage density, nestedness) in relation to key environmental variables (habitat size, detrital resource, incident radiation) and predator:prey ratios. We sampled 365 bromeliads in a wide range of understorey environments in French Guiana and used gut contents of invertebrates to draw the corresponding 365 connectance webs. At the bromeliad scale, habitat size (water volume) determined the number of species that constitute food-web nodes, the proportion of predators, and food-web topology. The number of species as well as the proportion of predators within bromeliads declined from open to forested habitats, where the volume of water collected by bromeliads was generally lower because of rainfall interception by the canopy. A core group of microorganisms and generalist detritivores remained relatively constant across environments. This suggests that (i) a highly connected core ensures food-web stability and key ecosystem functions across environments, and (ii) larger deviations in food-web structures can be expected following disturbance if detritivores share traits that determine responses to environmental changes. While linkage density and nestedness were lower in bromeliads in the forest than in open areas, experiments are needed to confirm a trend for lower food-web stability in the understorey of primary forests.

    Environmental determinants of macroinvertebrate diversity in small water bodies: insights from tank-bromeliads

    The interlocking leaves of tank-forming bromeliads (Bromeliaceae) collect rainwater and detritus, thus creating a freshwater habitat for specialized organisms. Their abundance and the possibility of quantifying communities with accuracy give us unparalleled insight into how changes in local to regional environments influence community diversity in small water bodies. We sampled 365 bromeliads (365 invertebrate communities) along a southeastern to northwestern range in French Guiana. Geographic locality determined the species pool for bromeliad invertebrates, and local environments determined the abundance patterns through the selection of traits that are best adapted to the bromeliad habitats. Patterns in community structure mostly emerged from patterns of predator species occurrence and abundance across local-regional environments, while the set of detritivores remained constant. Water volume had a strong positive correlation with invertebrate diversity, making it a biologically relevant measure of the pools' carrying capacity. The significant effects of incoming detritus and incident light show that changes in local environments (e.g., the conversion of forest to cropping systems) strongly influence freshwater communities. Because changes in local environments do not affect detritivores and predators equally, one may expect functional shifts as sets of invertebrates with particular traits are replaced or complemented by other sets with different traits

    Are Algae Relevant to the Detritus-Based Food Web in Tank-Bromeliads?

    We assessed the occurrence of algae in five species of tank-bromeliads found in contrasting environmental sites in a Neotropical, primary rainforest around the Nouragues Research Station, French Guiana. The distributions of both algal abundance and biomass were examined based on physical parameters, the morphological characteristics of bromeliad species and with regard to the structure of other aquatic microbial communities held in the tanks. Algae were retrieved in all of the bromeliad species with mean densities ranging from ~10² to 10⁴ cells/mL. Their biomass was positively correlated to light exposure and bacterial biomass. Algae represented a tiny component of the detrital food web in shaded bromeliads but accounted for up to 30 percent of the living microbial carbon in the tanks of Catopsis berteroniana, located in a highly exposed area. Thus, while nutrient supplies are believed to originate from wind-borne particles and trapped insects (i.e., allochthonous organic matter), our results indicate that primary producers (i.e., autochthonous organic matter) are present in this insectivorous bromeliad. Using a 24-h incubation of size-fractionated and manipulated samples from this plant, we evaluated the impact of mosquito foraging on algae, other microorganisms and rotifers. The prey assemblages were greatly altered by the predation of mosquito larvae. Grazing losses indicated that the dominant algal taxon, Bumilleriopsis sp., like protozoa and rotifers, is a significant part of the diet of mosquito larvae. We conclude that algae are a relevant functional community of the aquatic food web in C. berteroniana and might form the basis of a complementary non-detrital food web.

    Improved Small Molecule Identification through Learning Combinations of Kernel Regression Models

    In small molecule identification from tandem mass (MS/MS) spectra, input–output kernel regression (IOKR) currently provides the state-of-the-art combination of fast training and prediction and high identification rates. The IOKR approach can be simply understood as predicting a fingerprint vector from the MS/MS spectrum of the unknown molecule, and solving a pre-image problem to find the molecule with the most similar fingerprint. In this paper, we bring forward the following improvements to the IOKR framework. First, we formulate the IOKRreverse model, which can be understood as mapping molecular structures into the MS/MS feature space and solving a pre-image problem to find the molecule whose predicted spectrum is the closest to the input MS/MS spectrum. Second, we introduce an approach, called IOKRfusion, to combine several IOKR and IOKRreverse models computed from different input and output kernels. The method is based on minimizing a structured hinge loss of the combined model using mini-batch stochastic subgradient optimization. Our experiments show a consistent improvement of top-k accuracy in both positive and negative ionization mode data.

    Protein-protein interaction network inference using statistical learning

    The aim of this thesis is to develop tools for predicting interactions between proteins that can be applied to the human proteins forming a network with the CFTR protein. This protein, when defective, is involved in cystic fibrosis. The development of in silico prediction methods can be useful for suggesting new interaction targets to biologists. We propose a new method to solve the link prediction problem. To benefit from the information in unlabeled data, we place ourselves in the semi-supervised learning framework. Link prediction is addressed as an output kernel learning task, referred to as Output Kernel Regression. An output kernel is assumed to encode the proximities of nodes in the target graph, and the goal is to approximate this kernel by using appropriate input features. Using the kernel trick in the output space allows one to reduce the problem of learning from pairs to learning a single-variable function with output values in a Hilbert space. By choosing candidates for regression functions in a reproducing kernel Hilbert space with operator-valued kernels, we develop tools for regularization as for scalar-valued functions. We establish representer theorems in the supervised and semi-supervised cases and use them to define new regression models for different cost functions. We first tested the developed approach on transductive link prediction using artificial data, benchmark data and a protein-protein interaction network of yeast, and obtained very good results. We then applied it to the prediction of protein interactions in a network built around the CFTR protein.