
    Phenotyping in the era of genomics: MaTrics—a digital character matrix to document mammalian phenotypic traits

    A new and uniquely structured matrix of mammalian phenotypes, MaTrics (Mammalian Traits for Comparative Genomics), is presented in digital form. By focussing on mammalian species for which genome assemblies are available, MaTrics provides an interface between mammalogy and comparative genomics. MaTrics was developed within a project that aims to find genetic causes of phenotypic traits of mammals using Forward Genomics. This approach requires genomes together with comprehensive, recorded information on homologous phenotypes that are coded as discrete categories in a matrix. MaTrics is an evolving online resource providing information on phenotypic traits in numeric code; traits are coded either as absent/present or, where there are several states, as multistate. The state record for each species is linked to at least one reference (e.g., literature, photographs, histological sections, CT scans, or museum specimens), so MaTrics contributes to the digitalization of museum collections. Currently, MaTrics covers 147 mammalian species and includes 231 characters related to structure, morphology, physiology, ecology, and ethology, and is available in a machine-actionable NEXUS format. Filling MaTrics revealed substantial knowledge gaps, highlighting the need for phenotyping efforts. Studies based on selected data from MaTrics and using Forward Genomics identified associations between genes and certain phenotypes ranging from lifestyles (e.g., aquatic) to dietary specializations (e.g., herbivory, carnivory). These findings motivate the expansion of phenotyping in MaTrics by filling research gaps and by adding taxa and traits. Only databases like MaTrics will provide machine-actionable information on phenotypic traits, the lack of which is an important limitation for genomics. MaTrics is available within the data repository Morph·D·Base (www.morphdbase.de).
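The coding scheme described in this abstract (absent/present or multistate characters, with undocumented cells left as gaps) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the species and character names, the toy values, and the helper functions are hypothetical and are not taken from MaTrics or Morph·D·Base. The use of `?` for an unscored cell follows the common NEXUS convention for missing data.

```python
from typing import Dict, List, Optional

# Hypothetical toy matrix: rows are species, columns are characters.
# 0/1 encode absent/present; larger integers encode multistate characters;
# None marks a cell with no documented state (a knowledge gap).
CHARACTERS: List[str] = ["character_1_binary", "character_2_multistate"]
MATRIX: Dict[str, List[Optional[int]]] = {
    "species_a": [1, 2],
    "species_b": [0, None],
    "species_c": [None, None],
}


def coverage(matrix: Dict[str, List[Optional[int]]]) -> float:
    """Return the fraction of cells that have a documented state."""
    cells = [state for row in matrix.values() for state in row]
    scored = sum(1 for state in cells if state is not None)
    return scored / len(cells)


def to_row_string(states: List[Optional[int]], missing: str = "?") -> str:
    """Render one species row as a symbol string, using `missing` for gaps.

    This sketch only handles single-digit states (0-9).
    """
    return "".join(missing if s is None else str(s) for s in states)


if __name__ == "__main__":
    print(f"coverage: {coverage(MATRIX):.2f}")
    for species, states in MATRIX.items():
        print(species, to_row_string(states))
```

A low coverage value in a matrix like this is one simple way to quantify the kind of knowledge gap the abstract mentions.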

    Molecular neuroanatomy: mouse-human homologies and the landscape of genes implicated in language disorders

    The distinctiveness of brain structures and circuits depends on interacting gene products, yet the organization of these molecules (the "transcriptome") within and across brain areas remains unclear. High-throughput, neuroanatomically specific gene expression datasets such as the Allen Human Brain Atlas (AHBA) and Allen Mouse Brain Atlas (AMBA) have recently become available, providing unprecedented opportunities to quantify molecular neuroanatomy. This dissertation seeks to clarify how transcriptomic organization relates to conventional neuroanatomy within and across species, and to introduce the use of gene expression data as a bridge between genotype and phenotype in complex behavioral disorders. The first part of this work examines large-scale, regional transcriptomic organization separately in the mouse and human brain. Dimensionality reduction methods and cross-sample correlations both revealed greater similarity between samples drawn from the same brain region. Sample profiles and differentially expressed genes across regions in the human brain also showed consistent anatomical specificity in a second human dataset with distinct sampling properties. The frequent use of mouse models in clinical research points to the importance of comparing molecular neuroanatomical organization across species. The second part of this dissertation describes three comparative approaches. First, at genome scale, expression profiles within homologous brain regions tended to show higher similarity than those from non-homologous regions, with substantial variability across regions. Second, gene subsets (defined using co-expression relationships or shared annotations) that provide region-specific, cross-species molecular signatures were identified. Finally, brain-wide expression patterns of orthologous genes were compared. Neuron and oligodendrocyte markers were more correlated than expected by chance, while astrocyte markers were less so. The localization and co-expression of genes reflect functional relationships that may underlie high-level functions. The final part of this dissertation describes a database of genes that have been implicated in speech and language disorders, and identifies brain regions where they are preferentially expressed or co-expressed. Several brain structures with functions relevant to four speech and language disorders showed co-expression of genes associated with these disorders. In particular, genes associated with persistent developmental stuttering showed stronger preferential co-expression in the basal ganglia, a structure of known importance in this disorder.

    Contextual Analysis of Large-Scale Biomedical Associations for the Elucidation and Prioritization of Genes and their Roles in Complex Disease

    A vast number of biomedical associations are easily accessible in public resources, spanning gene-disease associations, tissue-specific gene expression, gene function and pathway annotations, and many other data types. Despite this mass of data, the information most relevant to the study of a particular disease remains loosely coupled and difficult to incorporate into ongoing research. Current public databases are difficult to navigate and do not interoperate well because of the plethora of interfaces and the varying biomedical concept identifiers they use. Because no coherent display of data within a specific problem domain is available, finding the latent relationships associated with a disease of interest is impractical. This research describes a method for extracting the contextual relationships embedded within associations relevant to a disease of interest. After the method is applied to a small test data set, a large-scale integrated association network is constructed, to which a network propagation technique is applied to help uncover more distant latent relationships. Together these methods are adept at uncovering highly relevant relationships without any a priori knowledge of the disease of interest. The combined contextual search and relevance methods power a tool that makes pertinent biomedical associations easier to find, easier to assimilate into ongoing work, and more prominent than they are in currently available databases. Increasing the accessibility of current information is an important component of understanding high-throughput experimental results and surviving the data deluge.

    A Theory of Conceptual Advance: Explaining Conceptual Change in Evolutionary, Molecular, and Evolutionary Developmental Biology

    The theory of concepts advanced in the dissertation aims at accounting for a) how a concept makes successful practice possible, and b) how a scientific concept can be subject to rational change in the course of history. Traditional accounts in the philosophy of science have usually studied concepts in terms only of their reference; their concern is to establish a stability of reference in order to address the incommensurability problem. My discussion, in contrast, suggests that each scientific concept consists of three components of content: 1) reference, 2) inferential role, and 3) the epistemic goal pursued with the concept's use. I argue that in the course of history a concept can change in any of these three components, and that change in one component, including change of reference, can be accounted for as being rational relative to other components, in particular a concept's epistemic goal. This semantic framework is applied to two cases from the history of biology: the homology concept as used in 19th and 20th century biology, and the gene concept as used in different parts of the 20th century. The homology case study argues that the advent of Darwinian evolutionary theory, despite introducing a new definition of homology, did not bring about a new homology concept (distinct from the pre-Darwinian concept) in the 19th century. Nowadays, however, distinct homology concepts are used in systematics/evolutionary biology, in evolutionary developmental biology, and in molecular biology. The emergence of these different homology concepts is explained as occurring in a rational fashion. The gene case study argues that conceptual progress occurred with the transition from the classical to the molecular gene concept, despite a change in reference. In the last two decades, change occurred internal to the molecular gene concept, so that nowadays this concept's usage and reference vary from context to context. I argue that this situation emerged rationally and that the current variation in usage and reference is conducive to biological practice. The dissertation uses ideas and methodological tools from the philosophy of mind and language, the philosophy of science, the history of science, and the psychology of concepts.

    A benchmark for biomedical knowledge graph based similarity

    Master's thesis in Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2020. Biomedical knowledge graphs are crucial to support data-intensive applications in the life sciences and healthcare. One of the most common applications of knowledge graphs in the life sciences is to support the comparison of entities in the graph through their ontological descriptions. These descriptions support the calculation of semantic similarity between two entities, and finding their similarities and differences is a cornerstone technique for several applications, ranging from the prediction of protein-protein interactions to the discovery of associations between diseases and genes and the prediction of the cellular localization of proteins, among others. In the last decade there has been a considerable effort in developing semantic similarity measures for biomedical knowledge graphs, but the research in this area has so far focused on the comparison of relatively small sets of entities. Given the wide range of applications for semantic similarity measures, it is essential to support the large-scale evaluation of these measures. However, this is not trivial, since there is no gold standard for biological entity similarity. One possible solution is to compare these measures to other measures or proxies of similarity. Biological entities can be compared through different lenses, for instance the sequence and structural similarity of two proteins or the metabolic pathways affected by two diseases. These measures relate to relevant characteristics of the underlying entities, so they can help to understand how well semantic similarity approaches capture entity similarity. The goal of this work is to develop a benchmark for semantic similarity measures, composed of data sets and automated evaluation methods. This benchmark should support the large-scale evaluation of semantic similarity measures for biomedical entities, based on their correlation with different properties of biological entities. To achieve this goal, a methodology for the development of benchmark data sets for semantic similarity was developed and applied to two knowledge graphs: proteins annotated with the Gene Ontology and genes annotated with the Human Phenotype Ontology. This benchmark explores proxies of similarity calculated from protein sequence similarity, protein molecular function similarity, protein-protein interactions, and phenotype-based gene similarity, and provides semantic similarity computations with representative state-of-the-art measures, for a comparative evaluation of the measures. This resulted in a benchmark made up of a collection of 21 benchmark data sets of varying sizes, covering four different species at different levels of annotation completion, together with evaluation techniques fitted to the characteristics of the data sets.
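The evaluation idea in this abstract, scoring a semantic similarity measure by how well it correlates with a proxy of similarity over the same entity pairs, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which correlation coefficient the benchmark uses, so Pearson's coefficient is assumed here, and the pair identifiers and scores are invented toy values rather than benchmark data.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: one row per entity pair, holding a semantic
# similarity score and a proxy score (e.g. sequence similarity).
PAIRS = [
    ("pair_1", 0.91, 0.85),
    ("pair_2", 0.40, 0.35),
    ("pair_3", 0.10, 0.20),
    ("pair_4", 0.65, 0.50),
]

if __name__ == "__main__":
    semantic = [row[1] for row in PAIRS]
    proxy = [row[2] for row in PAIRS]
    print(f"correlation with proxy: {pearson(semantic, proxy):.3f}")
```

Under this reading, a measure that tracks the proxy more closely receives a higher correlation, which gives a way to compare measures in the absence of a gold standard.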

    What Could Cognition Be, If Not Human Cognition?: Individuating cognitive abilities in the light of evolution

    I argue that an explicit distinction between cognitive characters and cognitive phenotypes is needed for empirical progress in the cognitive sciences and their integration with evolution-guided sciences. I elaborate on what ontological commitment to characters involves and how such a commitment would clarify ongoing debates about the relations between human and nonhuman cognition and the extent of cognitive abilities across biological species. I use theoretical proposals in episodic memory, language, and sociocultural bases of cognition to illustrate how cognitive characters are being introduced in scientific practice.

    A computational approach to candidate gene prioritization for X-linked mental retardation using annotation-based binary filtering and motif-based linear discriminatory analysis

    Background: Several computational candidate gene selection and prioritization methods have recently been developed. These in silico selection and prioritization techniques are usually based on two central approaches: the examination of similarities to known disease genes and/or the evaluation of the functional annotation of genes. Each of these approaches has its own caveats. Here we employ a previously described method of candidate gene prioritization based mainly on gene annotation, together with a technique based on the evaluation of pertinent sequence motifs or signatures, in an attempt to refine the gene prioritization approach. We apply this approach to X-linked mental retardation (XLMR), a group of heterogeneous disorders for which some of the underlying genetics is known. Results: The gene annotation-based binary filtering method yielded a ranked list of putative XLMR candidate genes with good plausibility of being associated with the development of mental retardation. In parallel, a motif-finding approach based on linear discriminatory analysis (LDA) was employed to identify short sequence patterns that may discriminate XLMR from non-XLMR genes. High rates (>80%) of correct classification were achieved, suggesting that the identification of these motifs effectively captures genomic signals associated with XLMR vs. non-XLMR genes. The computational tools developed for the motif-based LDA are integrated into the freely available genomic analysis portal Galaxy (http://main.g2.bx.psu.edu/). Nine genes (APLN, ZC4H2, MAGED4, MAGED4B, RAP2C, FAM156A, FAM156B, TBL1X, and UXT) were highlighted as highly ranked XLMR candidates. Conclusions: The combination of gene annotation information and sequence motif-orientated computational candidate gene prediction methods highlights an added benefit in generating a list of plausible candidate genes, as has been demonstrated for XLMR. Reviewers: This article was reviewed by Dr Barbara Bardoni (nominated by Prof Juergen Brosius); Prof Neil Smalheiser and Dr Dustin Holloway (nominated by Prof Charles DeLisi).

    Multidisciplinary approaches in evolutionary linguistics

    The study of language evolution has seen a resurgence in modern scientific research. In this revived field, approaches from a number of disciplines other than linguistics, including (paleo)anthropology and archaeology, animal behavior, genetics, neuroscience, computer simulation, and psychological experimentation, have been adopted, and a wide range of topics has been examined in one way or another, covering not only the world's languages, but also human behaviors, brains, and cultural products, as well as nonhuman primates and other species remote from humans. In this paper, together with a survey of recent findings based on these many approaches, we evaluate how this multidisciplinary perspective yields important insights into a comprehensive understanding of language, its evolution, and human cognition.