
    Clustering and Classification Methods for Gene Expression Data Analysis

    Efficient use of the large data sets generated by gene expression microarray experiments requires computerized data analysis approaches. In this chapter we briefly describe and illustrate two broad families of commonly used data analysis methods: class discovery and class prediction methods. A wide range of alternative approaches for clustering and classification of gene expression data are available. While differences in efficiency do exist, none of the well established approaches is uniformly superior to others. Choosing an approach requires consideration of the goals of the analysis, the background knowledge, and the specific experimental constraints. The quality of an algorithm is important, but is not in itself a guarantee of the quality of a specific data analysis. Uncertainty, sensitivity analysis and, in the case of classifiers, external validation or cross-validation should be used to support the legitimacy of results of microarray data analyses.

    MHC polymorphism in a songbird : Fitness, mate choice, and sexual conflict

    Sex differences in immune responses have been observed across a wide range of animal species, with the general tendency that males have weaker immune responses than females. These differences are at least partly caused by immune-regulating effects of sex hormones, and have been associated with an increased prevalence of autoimmune disorders in females and with a general tendency for males to be parasitized more often than females. Because of these differences, male and female phenotypes may be regarded as different immunological environments; however, it has not previously been investigated whether sex differences in immune responses may lead to sexually antagonistic selection on immune system genes. The first chapter of this thesis presents a literature study of the effects of sex hormones on the strength of immune responses in vertebrates, based on which we propose the hypothesis that sexual selection may drive sexually antagonistic selection on genes associated with the immune system. In the following chapters, we investigated this hypothesis using major histocompatibility complex class I (MHC-I) genes in a species subject to strong sexual selection, a socially polygynous songbird, the great reed warbler Acrocephalus arundinaceus. MHC genes play an important role in vertebrate adaptive immunity where they enable recognition and elimination of pathogens. Due to an ongoing co-evolution between hosts and their pathogens, the MHC genes are among the most variable genes known in vertebrates. It has been hypothesized that hosts benefit from having high MHC diversity, because this confers protection against a wider range of pathogens.
Interestingly, we found evidence for a sexual conflict and for sexually differential selection on MHC-I diversity in our great reed warbler study population. It has been predicted that genes associated with disease resistance should be advantageous in terms of sexual selection, and this prediction is central to our hypothesis that sexual selection may drive sexually antagonistic selection on genes associated with the immune system. We therefore investigated whether MHC-I genes were associated with sexual selection in great reed warblers. Our results indicated that MHC-I diversity (i) conveys a ‘good genes’ benefit to females that select older males and males with large song repertoires, and (ii) affects the ability of males to acquire attractive territories. These results confirmed that MHC-I genes are associated with sexual selection, and thereby corroborated our hypothesis that sexual selection may be driving the observed sexual conflict over MHC-I diversity via immune-regulating effects of sex hormones. Finally, we investigated the source of MHC-I genotypic variation in great reed warblers by analyzing segregation of MHC-I haplotypes and performing phylogenetic reconstruction on MHC-I alleles. We identified five distinct clades in a phylogenetic tree that indicate the presence of several divergent MHC-I loci in the great reed warbler genome. Analyses of positive selection implied that each putative MHC-I locus may have evolved slightly different functions.
Importantly, variation in MHC-I diversity between haplotypes was largely explained by variation within two specific clades, suggesting that the sexual conflict over MHC-I diversity may be caused by sexually antagonistic effects associated with alleles from these clades in particular. Our results suggest that sexually antagonistic selection is an important force in the evolution of vertebrate adaptive immunity, which may be important for a comprehensive, evolutionary understanding of autoimmune diseases and other costs associated with immune responses in vertebrates. The results presented in this thesis invite further studies that investigate the generality of sexually antagonistic selection over immune system genes in other species, as well as more detailed studies of the mechanisms underlying such sexual conflicts.

    Chemical Communication in Songbirds

    Avian chemical communication has been understudied due to the misconception that olfaction is unimportant or even lacking in birds. Early work focused on the olfactory foraging capabilities of seabirds because of their ecology (open ocean foraging) and large olfactory bulbs. In contrast, olfaction in passerine birds, comprising over half of all extant avian taxa, was long overlooked due to their relatively small olfactory bulbs. It is now well established that passerines can smell, and their olfactory acuity is comparable to that of macrosmatic mammals such as rats. Much of our theory on communication and mate choice has involved studying visual and acoustic signals in birds, especially passerines. However, there is mounting evidence that chemical cues are a previously overlooked but important element of avian communication and mate choice. I used gas chromatography to explore sources of variation in song sparrow (Melospiza melodia) preen oil. I then performed behavioural experiments to test whether song sparrows are capable of discriminating among preen oil odour cues. Finally, I explored the hypothesis that major histocompatibility complex (MHC) genotype underlies variation in preen gland microbiota and that this contributes to variation in preen oil chemical composition, providing a potential mechanism for MHC-based mate assessment. Preen oil differed between birds experimentally infected with haemosporidian malaria parasites (Plasmodium sp.) and sham-inoculated controls; between populations, ages, sexes, and breeding versus postbreeding seasons; and with MHC genotype. Song sparrows used preen oil odour to discriminate between the sexes, and to discriminate the MHC similarity and diversity of potential mates. Preen gland microbes differed between populations and sexes, and covaried with MHC genotype but not with preen oil composition. 
Collectively, my thesis establishes that preen oil is information-rich and that birds use preen oil odour cues in ecologically relevant contexts. I provide some of the first evidence that pathogen exposure alters chemical cues in birds, that birds use odour cues to discriminate the MHC genotype of potential mates, and that MHC genotype is positively correlated with both preen gland microbes and preen oil chemical composition.

    PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding

    We are now witnessing significant progress of deep learning methods in a variety of tasks (or datasets) of proteins. However, there is a lack of a standard benchmark to evaluate the performance of different methods, which hinders the progress of deep learning in this field. In this paper, we propose such a benchmark called PEER, a comprehensive and multi-task benchmark for Protein sEquence undERstanding. PEER provides a set of diverse protein understanding tasks including protein function prediction, protein localization prediction, protein structure prediction, protein-protein interaction prediction, and protein-ligand interaction prediction. We evaluate different types of sequence-based methods for each task including traditional feature engineering approaches, different sequence encoding methods as well as large-scale pre-trained protein language models. In addition, we also investigate the performance of these methods under the multi-task learning setting. Experimental results show that large-scale pre-trained protein language models achieve the best performance for most individual tasks, and jointly training multiple tasks further boosts the performance. The datasets and source codes of this benchmark are all available at https://github.com/DeepGraphLearning/PEER_Benchmark. Comment: Accepted by NeurIPS 2022 Dataset and Benchmark Track. arXiv v2: source code released; arXiv v1: release all benchmark results.

    A mixture model with a reference-based automatic selection of components for disease classification from protein and/or gene expression levels

    Background: Bioinformatics data analysis often uses a linear mixture model that represents samples as additive mixtures of components. Properly constrained blind matrix factorization methods extract those components using mixture samples only. However, automatic selection of the extracted components to be retained for classification analysis remains an open issue. Results: The method proposed here is applied to well-studied protein and genomic datasets of ovarian, prostate and colon cancers to extract components for disease prediction. It achieves average sensitivities of 96.2% (sd=2.7%), 97.6% (sd=2.8%) and 90.8% (sd=5.5%) and average specificities of 93.6% (sd=4.1%), 99% (sd=2.2%) and 79.4% (sd=9.8%) in 100 independent two-fold cross-validations. Conclusions: We propose an additive mixture model of a sample for feature extraction using, in principle, sparseness-constrained factorization on a sample-by-sample basis. By contrast, existing methods factorize the complete dataset simultaneously. The sample model is composed of a reference sample representing control and/or case (disease) groups and a test sample. Each sample is decomposed into two or more components that are selected automatically (without using label information) as control specific, case specific and not differentially expressed (neutral). The number of components is determined by cross-validation. Automatic assignment of features (m/z ratios or genes) to a particular component is based on thresholds estimated from each sample directly. Due to the locality of the decomposition, the strength of the expression of each feature can vary across the samples. Yet the features will still be allocated to the related disease and/or control specific component. Since label information is not used in the selection process, case and control specific components can be used for classification. That is not the case with standard factorization methods.
Moreover, the component selected by the proposed method as disease specific can be interpreted as a sub-mode and retained for further analysis to identify potential biomarkers. As opposed to standard matrix factorization methods, this can be achieved on a sample (experiment)-by-sample basis. Postulating one or more components with indifferent features enables their removal from disease and control specific components on a sample-by-sample basis. This yields selected components with reduced complexity and generally increases prediction accuracy.
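
The evaluation protocol reported above (average sensitivity and specificity over 100 independent two-fold cross-validations) can be sketched as follows. This is a minimal illustration, not the authors' method: the nearest-class-mean classifier, the synthetic one-feature data, and all function names are assumptions made for the example, and collecting one score per test fold before averaging is likewise an assumption.

```python
import random
from statistics import mean, stdev


def sens_spec(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = case, 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def two_fold_split(labels, rng):
    """Stratified split into two folds so each fold contains both classes."""
    folds = ([], [])
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        half = len(idx) // 2
        folds[0].extend(idx[:half])
        folds[1].extend(idx[half:])
    return folds


def predict_nearest_mean(x, y, train, test):
    """Hypothetical stand-in classifier: assign each test sample to the closer class mean."""
    m0 = mean(x[i] for i in train if y[i] == 0)
    m1 = mean(x[i] for i in train if y[i] == 1)
    return [1 if abs(x[i] - m1) < abs(x[i] - m0) else 0 for i in test]


def repeated_two_fold(x, y, repeats=100, seed=0):
    """Average sensitivity and specificity over `repeats` independent two-fold CVs."""
    rng = random.Random(seed)
    sens, spec = [], []
    for _ in range(repeats):
        a, b = two_fold_split(y, rng)
        for train, test in ((a, b), (b, a)):
            pred = predict_nearest_mean(x, y, train, test)
            s, p = sens_spec([y[i] for i in test], pred)
            sens.append(s)
            spec.append(p)
    return mean(sens), stdev(sens), mean(spec), stdev(spec)


def make_toy_data(seed=1, n_per_class=40):
    """Synthetic data (an assumption): controls ~ N(0, 1), cases ~ N(2, 1)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n_per_class)]
    x += [rng.gauss(2, 1) for _ in range(n_per_class)]
    y = [0] * n_per_class + [1] * n_per_class
    return x, y


if __name__ == "__main__":
    x, y = make_toy_data()
    m_sens, sd_sens, m_spec, sd_spec = repeated_two_fold(x, y)
    print(f"sensitivity {m_sens:.3f} (sd={sd_sens:.3f}), "
          f"specificity {m_spec:.3f} (sd={sd_spec:.3f})")
```

The stratified split keeps cases and controls in both folds, so neither ratio can divide by zero on the toy data.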

    Identifying a few foot-and-mouth disease virus signature nucleotide strings for computational genotyping

    Background: Serotypes of foot-and-mouth disease viruses (FMDVs) have generally been determined by biological experiments. Computational genotyping is not well studied even with the availability of whole viral genomes, due to uneven evolution among genes as well as frequent genetic recombination. Naively using sequence comparison for genotyping achieves only limited success. Results: We used 129 FMDV strains with known serotype as training strains to select as many as 140 most serotype-specific nucleotide strings. We then constructed a linear-kernel Support Vector Machine classifier using these 140 strings. Under the leave-one-out cross-validation scheme, this classifier was able to assign the correct serotype to 127 of these 129 strains, achieving 98.45% accuracy. It also assigned the serotype correctly to an independent test set of 83 other FMDV strains downloaded separately from NCBI GenBank. Conclusion: Computational genotyping is much faster and much cheaper than wet-lab based biological experiments, provided that detailed molecular sequences are available. The high accuracy of our proposed method suggests the potential of utilizing a few signature nucleotide strings instead of whole genomes to determine the serotypes of novel FMDV strains.
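
The general idea described above (pick nucleotide strings that are specific to one serotype, then score a classifier by leave-one-out cross-validation) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the abstract's linear-kernel Support Vector Machine is swapped for a simple count of signature hits, the selection rule (strings seen in exactly one serotype) is a guess at what "serotype-specific" could mean, and the sequences, labels, and function names are all made up for the example.

```python
from collections import defaultdict


def kmers(seq, k):
    """All distinct length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def select_signatures(train, k):
    """Map each serotype to the k-mers seen only in that serotype's training strains."""
    seen = defaultdict(set)
    for seq, label in train:
        seen[label] |= kmers(seq, k)
    signatures = {}
    for label, own in seen.items():
        others = set().union(*(s for other, s in seen.items() if other != label))
        signatures[label] = own - others
    return signatures


def classify(seq, signatures, k):
    """Assign the serotype whose signature strings occur most often in seq.

    Stand-in for the linear SVM in the abstract; ties go to the first label
    in sorted order.
    """
    found = kmers(seq, k)
    hits = {label: len(found & sigs) for label, sigs in signatures.items()}
    return max(sorted(hits), key=hits.get)


def leave_one_out_accuracy(data, k):
    """Hold out each strain in turn, reselect signatures, and classify it."""
    correct = 0
    for i, (seq, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if classify(seq, select_signatures(train, k), k) == label:
            correct += 1
    return correct / len(data)


# Made-up toy sequences and labels, for illustration only.
TOY_STRAINS = [
    ("ACGTACGTGGA", "O"),
    ("ACGTACGTGGC", "O"),
    ("ACGTACGAGGA", "O"),
    ("TTGCATTGCAA", "A"),
    ("TTGCATTGCAT", "A"),
    ("TTGCATAGCAA", "A"),
]

if __name__ == "__main__":
    print("toy leave-one-out accuracy:", leave_one_out_accuracy(TOY_STRAINS, k=4))
    # The accuracy reported in the abstract is 127 correct out of 129 strains.
    print("reported accuracy (%):", round(127 / 129 * 100, 2))
```

Signatures are reselected inside every leave-one-out round, so the held-out strain never influences the strings used to classify it.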

    Bases comportementales, moléculaires et cellulaires gouvernant l'apprentissage ambigu et la mémoire chez la drosophile

    Extracting the predictive relationships within an environment makes it possible to grasp the logical structure of the world. This is the basis of learning, which establishes associative links between events in our surroundings. Every natural environment contains a great diversity of compound stimuli (i.e., stimuli integrating several elements). How these compound stimuli are perceived and associated with a possible reinforcement (i.e., a pleasant or aversive event) is a fundamental topic in associative learning. In theory, a compound stimulus AB can be learnt as the sum of its components (A+B), known as elemental processing; as a stimulus in its own right (configural processing, AB=X); or as an entity that carries both certain characteristics of its components and unique properties (a unique cue, AB = A+B+u). The latter two theories notably explain the resolution of ambiguous problems such as Negative Patterning (NP), in which the components of the stimulus AB are reinforced when presented alone but not when presented as a compound. Although the neural networks involved in elemental associative learning are well known, the mechanisms that allow non-elemental learning tasks to be solved are still poorly understood. In this study, we demonstrate for the first time that Drosophila is capable of non-elemental learning of the NP type. Behavioural study of how flies solve NP shows that it relies on repeated conditioning cycles that lead to a change in the representation of the mixture AB, which gradually moves away from the representation of its components A and B.
We then develop a computational model based on in vivo data on the architecture and function of the neural networks of olfactory learning in Drosophila, which allows us to propose a theoretical mechanism that explains NP learning and whose validity can be tested using neurogenetic tools. During NP learning, flies first acquire an associative link between the components A and B and the reinforcement, which at the same time creates an ambiguity with their mixture AB, presented without reinforcement. Over the conditioning cycles, the representations of A and B relative to AB are modulated differentially, progressively inhibiting the neural response to the non-reinforced stimulus while strengthening the response to the reinforced stimuli. This modulation increases the contrast between A, B and AB and allows the flies to solve the NP task. We identify the APL (Anterior Paired Lateral) neurons as a plausible implementation of this mechanism, because engagement of their inhibitory activity specifically during the presentation of AB is necessary to acquire NP, without altering the flies' learning abilities in non-ambiguous tasks. We then explore the involvement of APL neurons in a more general context of resolving ambiguous learning. In conclusion, our work establishes Drosophila as a model for studying non-elemental learning and offers a first exploration of the underlying neural networks using tools unique to this model. It opens the way to numerous projects dedicated to understanding the neural mechanisms that allow animals to extract robust associative links in a complex environment.
Animals' survival heavily relies on their ability to establish causal relationships within their environment.
That is made possible through learning experiences during which animals build associative links between the events they are exposed to. Most of the encountered stimuli are actually compounds, the constituents of which may have been reinforced (i.e., associated with a pleasant or unpleasant stimulus) in a different, sometimes opposed way. How compounds are perceived and processed is a central topic in the field of associative learning. In theory, a given compound AB may be learnt as the sum of its components (A+B), which is referred to as "Elemental learning", but it may also be learnt as a distinct stimulus (which is called "Configural learning"). Finally, AB may bear both constituent-related and compound-specific features called "Unique Cues" (AB = A+B+u). Configural and unique cue processing enable the resolution of ambiguous tasks such as Negative Patterning (NP), during which A and B are reinforced when presented alone but not in a compound AB. Although neural correlates of simple associative learning are well described, those involved in non-elemental learning remain unclear. In this project, we rework a typical olfactory conditioning protocol based on semi-automated olfactory/electric shock association, allowing us to demonstrate for the first time that Drosophila is able to solve NP tasks. Behavioural study of NP solving shows that its resolution relies on training repetition leading to a gradual change in the representation of the compound AB, which shifts away from its constituents and thus becomes easier to distinguish. Next, we develop a computational model of olfactory associative learning in Drosophila based on structural and functional in vivo data. Exploratory simulations of the model allow us to identify a theoretical mechanism enabling NP acquisition, the validity of which can be tested in vivo using neurogenetic tools only available in Drosophila.
We propose that during NP training, flies first acquire associative links between A, B and their reinforcement, which induces an ambiguity as the compound AB is presented without reinforcement. However, over the course of training cycles, the representation of non-reinforced stimuli is inhibited while the representation of reinforced stimuli is consolidated. This differential modulation eventually leads to a shift in odour representation, allowing flies to better distinguish between the constituents and their compound and thus facilitating NP resolution. We identify APL (Anterior Paired Lateral) neurons as a plausible implementation of this theoretical mechanism, as APL inhibitory activity is specifically engaged during the non-reinforced stimulus presentation, which is necessary for NP acquisition but dispensable for non-ambiguous forms of learning. Lastly, we explore the role of APL neurons in a broader context of ambiguity resolution. In conclusion, our work validates Drosophila as a robust model to investigate non-elemental learning, and presents a promising model of the underlying neural mechanisms using a combination of behaviour, modelling and neurogenetic tools. We believe this opens the way to numerous interesting projects focused on understanding how animals extract robust associations in a complex world.
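
The abstract's point that a unique cue (AB = A+B+u) makes Negative Patterning solvable, while purely elemental summation (A+B) does not, can be illustrated with a small delta-rule learner. This is a sketch under stated assumptions: the Rescorla-Wagner-style update, the learning rate, the number of epochs, and the names `train`, `response`, `ELEMENTAL` and `UNIQUE_CUE` are illustrative choices, not the computational model developed in the thesis.

```python
def train(trials, cues, epochs=2000, rate=0.1):
    """Delta-rule learning (assumed Rescorla-Wagner-style update).

    trials: list of (set of cues present, reinforcement 0 or 1).
    Returns the learnt associative weight of each cue.
    """
    weights = {c: 0.0 for c in cues}
    for _ in range(epochs):
        for present, reward in trials:
            prediction = sum(weights[c] for c in present)
            error = reward - prediction
            # Every cue present on the trial shares the same prediction error.
            for c in present:
                weights[c] += rate * error
    return weights


def response(weights, present):
    """Predicted reinforcement for a stimulus made of the given cues."""
    return sum(weights[c] for c in present)


# Negative Patterning: A and B are reinforced alone, the compound AB is not.
ELEMENTAL = [({"A"}, 1), ({"B"}, 1), ({"A", "B"}, 0)]
# Same task, but the compound also activates a compound-specific cue u.
UNIQUE_CUE = [({"A"}, 1), ({"B"}, 1), ({"A", "B", "u"}, 0)]

if __name__ == "__main__":
    w_el = train(ELEMENTAL, ["A", "B"])
    w_uc = train(UNIQUE_CUE, ["A", "B", "u"])
    print("elemental  A:", response(w_el, {"A"}),
          "AB:", response(w_el, {"A", "B"}))
    print("unique cue A:", response(w_uc, {"A"}),
          "AB:", response(w_uc, {"A", "B", "u"}))
```

In the elemental setting the summed response to AB stays above the response to A alone, so the learner cannot withhold its response to the compound. With the extra cue u, the weight on u can become negative and cancel the summed prediction for AB.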