Gaussianisation for fast and accurate inference from cosmological data
We present a method to transform multivariate unimodal non-Gaussian posterior
probability densities into approximately Gaussian ones via non-linear mappings,
such as Box-Cox transformations and generalisations thereof. This permits an
analytical reconstruction of the posterior from a point sample, like a Markov
chain, and simplifies the subsequent joint analysis with other experiments.
This way, a multivariate posterior density can be reported efficiently, by
compressing the information contained in MCMC samples. Further, the model
evidence integral (i.e. the marginal likelihood) can be computed analytically.
This method is analogous to the search for normal parameters in the cosmic
microwave background, but is more general. The search for the optimally
Gaussianising transformation is performed computationally through a
maximum-likelihood formalism; its quality can be judged by how well the
credible regions of the posterior are reproduced. We demonstrate that our
method outperforms kernel density estimates in this objective. Further, we
select marginal posterior samples from Planck data with several distinct
strongly non-Gaussian features, and verify the reproduction of the marginal
contours. To demonstrate evidence computation, we Gaussianise the joint
distribution of data from weak lensing and baryon acoustic oscillations (BAO),
for different cosmological models, and find a preference for flat ΛCDM.
Comparing to values computed with the Savage-Dickey density ratio and
Population Monte Carlo, we find good agreement of our method within the spread
of the other two.
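The transformation at the heart of the method can be sketched in a few lines: Box-Cox-transform each parameter of a point sample and fit a multivariate Gaussian to the result, with scipy's boxcox supplying the maximum-likelihood search for the transformation parameter. This per-axis version with an ad hoc positivity shift is only an illustration under assumed inputs, not the authors' generalised multivariate transformation.

```python
import numpy as np
from scipy.stats import boxcox, multivariate_normal

def gaussianise(samples):
    """Box-Cox transform each parameter of an MCMC sample (shape (n, d))
    and fit a multivariate Gaussian to the transformed points.
    Per-axis sketch only; the paper searches over more general maps."""
    transformed = np.empty_like(samples, dtype=float)
    params = []
    for j in range(samples.shape[1]):
        x = samples[:, j]
        shift = 1e-6 - x.min() if x.min() <= 0 else 0.0  # Box-Cox needs x > 0
        y, lam = boxcox(x + shift)   # maximum-likelihood Box-Cox parameter
        transformed[:, j] = y
        params.append((lam, shift))
    mean = transformed.mean(axis=0)
    cov = np.cov(transformed, rowvar=False)
    return multivariate_normal(mean, cov), params

# usage: a banana-shaped 2-D point sample standing in for an MCMC chain
rng = np.random.default_rng(0)
x = rng.normal(1.0, 0.3, 5000)
chain = np.column_stack([x, x**2 + rng.normal(0.0, 0.1, 5000)])
gauss, params = gaussianise(chain)
print(gauss.mean, params)
```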
Quantitative comparison of mapping methods between Human and Mammalian Phenotype Ontology
Researchers use animal studies to better understand human diseases. In recent years, large-scale phenotype studies such as Phenoscape and EuroPhenome have been initiated to identify genetic causes of a species' phenome. Species-specific phenotype ontologies are required to capture and report all findings and to automatically infer results relevant to human diseases. The integration of the different phenotype ontologies into a coherent framework is necessary to achieve interoperability for cross-species research. Here, we investigate the quality and completeness of two different methods to align the Human Phenotype Ontology and the Mammalian Phenotype Ontology. The first method combines lexical matching with inference over the ontologies' taxonomic structures, while the second method uses a mapping algorithm based on the formal definitions of the ontologies. Neither method could map all concepts. Although the formal definitions method provides mappings for more concepts than the lexical matching method does, it does not outperform the lexical matching in a biological use case. Our results suggest that combining both approaches will yield better mappings in terms of completeness, specificity and application purposes.
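As a rough illustration of the first method's lexical component, the sketch below matches concepts whose normalised labels or synonyms coincide. The input format (dicts from concept IDs to label lists) and the example identifiers are assumptions for illustration; the actual alignment additionally exploits the ontologies' taxonomic structures.

```python
def normalise(label):
    """Reduce a label to lower-case alphanumeric tokens so that
    trivially different spellings still match."""
    cleaned = "".join(ch for ch in label.lower() if ch.isalnum() or ch.isspace())
    return tuple(cleaned.split())

def lexical_mappings(hp, mp):
    """Map HPO concepts to MPO concepts whose normalised labels or
    synonyms coincide. hp and mp are assumed dicts from concept ID
    to a list of labels/synonyms."""
    index = {}
    for mp_id, labels in mp.items():
        for label in labels:
            index.setdefault(normalise(label), set()).add(mp_id)
    mappings = {}
    for hp_id, labels in hp.items():
        hits = set().union(*(index.get(normalise(l), set()) for l in labels))
        if hits:
            mappings[hp_id] = sorted(hits)
    return mappings

# illustrative identifiers and labels, not authoritative ontology content
hp = {"HP:0001631": ["Atrial septal defect", "ASD"]}
mp = {"MP:0010403": ["atrial septal defect"]}
print(lexical_mappings(hp, mp))  # {'HP:0001631': ['MP:0010403']}
```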
Improving Disease Gene Prioritization by Comparing the Semantic Similarity of Phenotypes in Mice with Those of Human Diseases
Despite considerable progress in understanding the molecular origins of hereditary human diseases, the molecular basis of several thousand genetic diseases still remains unknown. High-throughput phenotype studies are underway to systematically assess the phenotype outcome of targeted mutations in model organisms. Thus, comparing the similarity between experimentally identified phenotypes and the phenotypes associated with human diseases can be used to suggest causal genes underlying a disease. In this manuscript, we present a method for disease gene prioritization based on comparing phenotypes of mouse models with those of human diseases. For this purpose, either human disease phenotypes are “translated” into a mouse-based representation (using the Mammalian Phenotype Ontology), or mouse phenotypes are “translated” into a human-based representation (using the Human Phenotype Ontology). We apply a measure of semantic similarity and rank experimentally identified phenotypes in mice with respect to their phenotypic similarity to human diseases. Our method is evaluated on manually curated and experimentally verified gene–disease associations for human and for mouse. We evaluate our approach using a Receiver Operating Characteristic (ROC) analysis and obtain an area under the ROC curve of up to . Furthermore, we are able to confirm previous results that the Vax1 gene is involved in Septo-Optic Dysplasia and suggest Gdf6 and Marcks as further potential candidates. Our method significantly outperforms previous phenotype-based approaches to prioritizing gene–disease associations. To enable the adaptation of our method to the analysis of other phenotype data, our software and prioritization results are freely available under a BSD licence at http://code.google.com/p/phenomeblast/wiki/CAMP. Furthermore, our method has been integrated into PhenomeNET and the results can be explored using the PhenomeBrowser at http://phenomebrowser.net
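The ranking step can be illustrated with a deliberately simplified similarity: the sketch below scores candidate genes by the overlap of their mouse-model phenotype sets with a disease's phenotype set. Plain set overlap is only a crude stand-in for the ontology-aware semantic similarity measure used in the paper, and all identifiers in the example are illustrative.

```python
def jaccard(a, b):
    """Crude stand-in for semantic similarity: overlap of two
    phenotype annotation sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def prioritise(disease_phenotypes, mouse_models):
    """Rank candidate genes by how similar their mouse-model phenotypes
    are to the phenotypes of a human disease (both sets of MP terms)."""
    scores = {gene: jaccard(disease_phenotypes, phenotypes)
              for gene, phenotypes in mouse_models.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# hypothetical annotations; the MP identifiers are illustrative only
disease = {"MP:0001265", "MP:0002864", "MP:0000470"}
models = {
    "Vax1":   {"MP:0001265", "MP:0002864", "MP:0003012"},
    "Gdf6":   {"MP:0002864", "MP:0009342"},
    "Marcks": {"MP:0000470"},
}
print(prioritise(disease, models))  # Vax1 ranks first
```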
Relations as patterns: bridging the gap between OBO and OWL.
BACKGROUND: Most biomedical ontologies are represented in the OBO Flatfile Format, which is an easy-to-use graph-based ontology language. The semantics of the OBO Flatfile Format 1.2 enforces a strict predetermined interpretation of relationship statements between classes. It does not allow flexible specifications that provide better approximations of the intuitive understanding of the considered relations. If relations cannot be accurately expressed, then ontologies built upon them may contain false assertions and hence lead to false inferences. Ontologies in the OBO Foundry must formalize the semantics of relations according to the OBO Relationship Ontology (RO). Therefore, being able to accurately express the intended meaning of relations is of crucial importance. Since the Web Ontology Language (OWL) is an expressive language with a formal semantics, it is suitable to define the meaning of relations accurately. RESULTS: We developed a method to provide definition patterns for relations between classes using OWL and describe a novel implementation of the RO based on this method. We implemented our extension in software that converts ontologies in the OBO Flatfile Format to OWL, and also provide a prototype to extract relational patterns from OWL ontologies using automated reasoning. The conversion software is freely available at http://bioonto.de/obo2owl, and can be accessed via a web interface. CONCLUSIONS: Explicitly defining relations permits their use in reasoning software and leads to a more flexible and powerful way of representing biomedical ontologies. Using the extended language and semantics avoids several mistakes commonly made in formalizing biomedical ontologies, and can be used to automatically detect inconsistencies. The use of our method enables the use of graph-based ontologies in OWL, and makes complex OWL ontologies accessible in a graph-based form. Thereby, our method provides the means to gradually move the representation of biomedical ontologies into formal knowledge representation languages that incorporate an explicit semantics. Our method facilitates the use of OWL-based software in the back-end while ontology curators may continue to develop ontologies with an OBO-style front-end.
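The default OBO-to-OWL translation of a relationship statement is an existential restriction; a minimal sketch of that one pattern, emitted in OWL functional-style syntax, is below. The paper's definition patterns go further, covering relations for which this default shape is not the intended meaning.

```python
def obo_relationship_to_owl(subject, relation, target):
    """Render the common OBO-to-OWL pattern: a 'relationship: R T'
    statement on class S becomes SubClassOf(S, ObjectSomeValuesFrom(R, T)).
    This is only the default existential pattern; other relations need
    different axiom shapes, which is what definition patterns address."""
    return (f"SubClassOf({subject} "
            f"ObjectSomeValuesFrom({relation} {target}))")

# 'mitochondrion part_of cell' from an OBO file, as an OWL axiom
print(obo_relationship_to_owl("GO:0005739", "BFO:0000050", "GO:0005623"))
# SubClassOf(GO:0005739 ObjectSomeValuesFrom(BFO:0000050 GO:0005623))
```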
Ontology design patterns to disambiguate relations between genes and gene products in GENIA
MOTIVATION: Annotated reference corpora play an important role in biomedical information extraction. A semantic annotation of the natural language texts in these reference corpora using formal ontologies is challenging due to the inherent ambiguity of natural language. The provision of formal definitions and axioms for semantic annotations offers the means for ensuring consistency as well as enables the development of verifiable annotation guidelines. Consistent semantic annotations facilitate the automatic discovery of new information through deductive inferences. RESULTS: We provide a formal characterization of the relations used in the recent GENIA corpus annotations. For this purpose, we both select existing axiom systems based on the desired properties of the relations within the domain and develop new axioms for several relations. To apply this ontology of relations to the semantic annotation of text corpora, we implement two ontology design patterns. In addition, we provide a software application to convert annotated GENIA abstracts into OWL ontologies by combining both the ontology of relations and the design patterns. As a result, the GENIA abstracts become available as OWL ontologies and are amenable to automated verification, deductive inferences and other knowledge-based applications. AVAILABILITY: Documentation, implementation and examples are available from http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/
Interoperability between biomedical ontologies through relation expansion, upper-level ontologies and automatic reasoning
Researchers design ontologies as a means to accurately annotate and integrate experimental data across heterogeneous and disparate databases and knowledge bases. Formal ontologies make the semantics of terms and relations explicit such that automated reasoning can be used to verify the consistency of knowledge. However, many biomedical ontologies do not sufficiently formalize the semantics of their relations and are therefore limited with respect to automated reasoning for large-scale data integration and knowledge discovery. We describe a method to improve automated reasoning over biomedical ontologies and identify several thousand contradictory class definitions. Our approach aligns terms in biomedical ontologies with foundational classes in a top-level ontology and formalizes composite relations as class expressions. We describe the semi-automated repair of contradictions and demonstrate expressive queries over interoperable ontologies. Our work forms an important cornerstone for data integration, automatic inference and knowledge discovery based on formal representations of knowledge. Our results and analysis software are available at http://bioonto.de/pmwiki.php/Main/ReasonableOntologies
Evaluation and cross-comparison of lexical entities of biological interest (LexEBI)
MOTIVATION:
Biomedical entities, their identifiers and names, are essential in the representation of biomedical facts and knowledge. In the same way, the complete set of biomedical and chemical terms, i.e. the biomedical "term space" (the "Lexeome"), forms a key resource for achieving the full integration of the scientific literature with biomedical data resources: any identified named entity can immediately be normalized to the correct database entry. Achieving this goal not only requires awareness of all existing terms, but also profits from knowing all their senses and their semantic interpretation (ambiguities, nestedness).
RESULT:
This study compiles a resource for lexical terms of biomedical interest in a standard format (called "LexEBI"), determines the overall number of terms, their reuse in different resources and the nestedness of terms. LexEBI comprises references for protein and gene entries and their term variants and chemical entities, amongst other terms. In addition, disease terms have been identified from Medline and PubmedCentral and added to LexEBI. Our analysis demonstrates that the baseforms of terms from the different semantic types show little polysemous use. Nonetheless, the term variants of protein and gene names (PGNs) frequently contain species mentions, which should have been avoided according to protein annotation guidelines. Furthermore, the protein and gene entities as well as the chemical entities both comprise enzymes, leading to hierarchical polysemy, and a large portion of PGNs makes reference to a chemical entity. Altogether, according to our analysis based on the Medline distribution, 401,869 unique PGNs in the documents contain a reference to 25,022 chemical entities, 3,125 disease terms or 1,576 species mentions.
CONCLUSION:
LexEBI delivers the complete biomedical and chemical Lexeome in a standardized representation (http://www.ebi.ac.uk/Rebholz-srv/LexEBI/). The resource provides the disease terms as open-source content, and fully interlinks terms across resources.
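A toy version of the reuse and polysemy counts described above could look like the following; the input format is a hypothetical simplification of the LexEBI records, not its actual schema.

```python
from collections import defaultdict

def term_statistics(resources):
    """Count term reuse across resources and simple polysemy (one
    baseform carrying more than one semantic type).
    resources: {resource_name: [(term, semantic_type), ...]}"""
    found_in = defaultdict(set)   # baseform -> resources containing it
    types = defaultdict(set)      # baseform -> semantic types observed
    for name, entries in resources.items():
        for term, sem_type in entries:
            base = term.lower()
            found_in[base].add(name)
            types[base].add(sem_type)
    reused = {t for t, r in found_in.items() if len(r) > 1}
    polysemous = {t for t, s in types.items() if len(s) > 1}
    return reused, polysemous

# 'insulin' appears both as a protein/gene name and as a chemical entity
resources = {
    "genes":     [("insulin", "protein/gene")],
    "chemicals": [("insulin", "chemical")],
}
print(term_statistics(resources))  # ({'insulin'}, {'insulin'})
```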
Evaluating gold standard corpora against gene/protein tagging solutions and lexical resources
MOTIVATION: The identification of protein and gene names (PGNs) from the scientific literature requires semantic resources: terminological and lexical resources deliver the term candidates to PGN tagging solutions, and the gold standard corpora (GSC) train them to identify term parameters and contextual features. Ideally all three resources, i.e. corpora, lexica and taggers, cover the same domain knowledge, and thus support identification of the same types of PGNs and cover all of them. Unfortunately, none of the three serves as a predominant standard, and for this reason it is worth exploring how these three resources comply with each other. We systematically compare different PGN taggers against publicly available corpora and analyze the impact of the included lexical resource on their performance. In particular, we determine the performance gains through false positive filtering, which contributes to the disambiguation of identified PGNs. RESULTS: In general, machine learning approaches (ML-Tag) for PGN tagging show higher F1-measure performance against the BioCreative-II and Jnlpba GSCs (exact matching), whereas the lexicon-based approaches (LexTag) in combination with disambiguation methods show better results on FsuPrge and PennBio. The ML-Tag solutions balance precision and recall, whereas the LexTag solutions have different precision and recall profiles at the same F1-measure across all corpora. Higher recall is achieved with larger lexical resources, which also introduce more noise (false positive results). The ML-Tag solutions certainly perform best if the test corpus is from the same GSC as the training corpus. As expected, the false negative errors characterize the test corpora while, on the other hand, the profiles of the false positive mistakes characterize the tagging solutions. LexTag solutions that are based on a large terminological resource in combination with false positive filtering produce better results, which, in addition, provide concept identifiers from a knowledge source, in contrast to ML-Tag solutions. CONCLUSION: The standard ML-Tag solutions achieve high performance, but not across all corpora, and thus should be trained using several different corpora to reduce possible biases. The LexTag solutions have different profiles for their precision and recall performance, but with similar F1-measure. This result is surprising and suggests that they cover a portion of the most common naming standards, but cope differently with the term variability across the corpora. The false positive filtering applied to LexTag solutions does improve the results by increasing their precision without significantly compromising their recall. The harmonisation of the annotation schemes, in combination with standardized lexical resources in the tagging solutions, will enable their comparability and will pave the way for a shared standard.
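For concreteness, the evaluation quantities and the effect of false positive filtering can be sketched as follows; the filter predicate is a schematic stand-in for the disambiguation methods discussed above, and the example annotations are made up.

```python
def prf(gold, predicted):
    """Exact-match precision, recall and F1 over sets of annotations."""
    tp = len(gold & predicted)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def filter_false_positives(predicted, is_valid):
    """Schematic false positive filter: keep only predictions passing a
    disambiguation test, trading a possible loss of recall for precision."""
    return {m for m in predicted if is_valid(m)}

# annotations as (surface form, offset) pairs; 'can' is a spurious hit
gold = {("p53", 10), ("BRCA1", 42)}
pred = {("p53", 10), ("BRCA1", 42), ("can", 7)}
print(prf(gold, pred))                       # precision 0.67, recall 1.0
filtered = filter_false_positives(pred, lambda m: m[0] != "can")
print(prf(gold, filtered))                   # precision 1.0, recall 1.0
```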
Home-buyer Sentiment and Hurricane Landfalls
The researchers examined how hurricanes affect real estate markets and home-buyer sentiment. Sentiment is related to the perception of risk by investors in the securities markets but is not directly quantifiable, so the researchers developed proxies for it. They used three proxies to determine which was most meaningful: the spread between listing and selling prices, the average number of days a house spends on the market, and the number of single-family houses sold per month. They studied home-buyer sentiment in the Cape Fear region from 1995 to 2002 and the impact of Hurricanes Fran, Bonnie, and Floyd on the market. Looking at prices and days on the market, they found no difference in sentiment after Bonnie, some difference after Fran, and a larger difference after Floyd. The proxy affected most was the number of days a home spent on the market. The researchers concluded that the market suffers after successive hurricane landfalls, but that sentiment recovers a year or more after a hurricane.