403 research outputs found
Benchmarking some Portuguese S&T system research units: 2nd Edition
The increasing use of productivity and impact metrics for evaluation and
comparison, not only of individual researchers but also of institutions,
universities and even countries, has prompted the development of bibliometrics.
Currently, metrics are becoming widely accepted as an easy and balanced way to
assist the peer review and evaluation of scientists and/or research units,
provided they have adequate precision and recall.
This paper presents a benchmarking study of a selected list of representative
Portuguese research units, based on a fairly complete set of parameters:
bibliometric parameters, number of competitive projects and number of PhDs
produced. The study aimed at collecting productivity and impact data from the
selected research units under comparable conditions, i.e., using objective metrics
based on public information, retrievable on-line and/or from official sources
and thus verifiable and repeatable. The study therefore focused on the activity
of the 2003-06 period, for which such data were available from the latest official
evaluation.
The main advantage of our study was the application of automatic tools,
achieving relevant results at a reduced cost. Moreover, the results over the
selected units suggest that this kind of analysis will be very useful to
benchmark scientific productivity and impact, and assist peer review.
Comment: 26 pages, 20 figures. F. Couto, D. Faria, B. Tavares, P.
Gonçalves, and P. Verissimo, Benchmarking some Portuguese S&T system
research units: 2nd edition, DI/FCUL TR 13-03, Department of Informatics,
University of Lisbon, February 201
Disjunctive shared information between ontology concepts: application to Gene Ontology
Background: The large-scale effort in developing, maintaining and making biomedical ontologies available motivates the application of similarity measures to compare ontology concepts or, by extension, the entities described therein. A common approach, known as semantic similarity, compares ontology concepts through the information content they share in the ontology. However, different disjunctive ancestors in the ontology are frequently neglected, or not properly explored, by semantic similarity measures.
Results: This paper proposes a novel method, dubbed DiShIn, that effectively exploits the multiple inheritance relationships present in many biomedical ontologies. DiShIn calculates the shared information content of two ontology concepts, based on the information content of the disjunctive common ancestors of the concepts being compared. DiShIn identifies these disjunctive ancestors through the number of distinct paths from the concepts to their common ancestors.
Conclusions: DiShIn was applied to Gene Ontology and its performance was evaluated against state-of-the-art measures using CESSM, a publicly available evaluation platform of protein similarity measures. By modifying the way traditional semantic similarity measures calculate the shared information content, DiShIn was able to obtain a statistically significant higher correlation between semantic and sequence similarity. Moreover, the incorporation of DiShIn in existing applications that exploit multiple inheritance would reduce their execution time.
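The path-counting idea in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper's code: the toy ontology `PARENTS`, the descendant-based information content in `ic`, and the rule of grouping common ancestors by the difference in path counts and averaging the best information content of each group.

```python
import math
from collections import defaultdict
from functools import lru_cache

# Hypothetical toy ontology: concept -> direct parents (a DAG with multiple inheritance).
PARENTS = {
    "root": [],
    "metal": ["root"],
    "precious": ["metal"],
    "coinage": ["metal"],
    "gold": ["precious", "coinage"],
    "silver": ["precious", "coinage"],
    "platinum": ["precious"],
    "copper": ["coinage"],
}


@lru_cache(maxsize=None)
def path_counts(concept):
    """Map each ancestor (the concept itself included) to the number of distinct paths reaching it."""
    counts = defaultdict(int)
    counts[concept] = 1
    for parent in PARENTS[concept]:
        for ancestor, n in path_counts(parent).items():
            counts[ancestor] += n
    return dict(counts)


def ic(concept):
    """Assumed intrinsic information content: -log of the fraction of concepts subsumed by `concept`."""
    subsumed = sum(1 for c in PARENTS if concept in path_counts(c))
    return -math.log(subsumed / len(PARENTS))


def shared_ic_mica(c1, c2):
    """Baseline: information content of the most informative common ancestor."""
    common = path_counts(c1).keys() & path_counts(c2).keys()
    return max(ic(a) for a in common)


def shared_ic_disjunctive(c1, c2):
    """Sketch of a DiShIn-style shared information content.

    Common ancestors are grouped by the absolute difference in the number of
    paths from each concept; the most informative ancestor of each group is
    kept, and the kept values are averaged.
    """
    p1, p2 = path_counts(c1), path_counts(c2)
    best = {}
    for ancestor in p1.keys() & p2.keys():
        diff = abs(p1[ancestor] - p2[ancestor])
        best[diff] = max(best.get(diff, 0.0), ic(ancestor))
    return sum(best.values()) / len(best)


if __name__ == "__main__":
    for pair in [("gold", "silver"), ("gold", "platinum")]:
        print(pair, shared_ic_mica(*pair), shared_ic_disjunctive(*pair))
```

In this toy example both pairs have the same most informative common ancestor, so the baseline cannot tell them apart. The disjunctive variant gives gold and platinum a lower value, because gold reaches `metal` through an extra parent that platinum does not share.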
Semantic Similarity in Cheminformatics
Similarity in chemistry has been applied to a variety of problems: to predict biochemical properties of molecules, to disambiguate chemical compound references in natural language, to understand the evolution of metabolic pathways, to predict drug-drug interactions, to predict therapeutic substitution of antibiotics, to estimate whether a compound is harmful, etc. While measures of similarity have been created that make use of the structural properties of the molecules, some ontologies (the Chemical Entities of Biological Interest (ChEBI) being one of the most relevant) capture chemistry knowledge in machine-readable formats and can be used to improve our notions of molecular similarity. Ontologies in the biomedical domain have been extensively used to compare entities of biological interest, a technique known as ontology-based semantic similarity. This has been applied to various biologically relevant entities, such as genes, proteins, diseases, and anatomical structures, as well as in the chemical domain. This chapter introduces the fundamental concepts of ontology-based semantic similarity, its application in cheminformatics, its relevance in previous studies, and future potential. It also discusses the existing challenges in this area, tracing a parallel with other domains, particularly genomics, where this technique has been used more often and for longer
Complex associations between genetic variants and clinical profiles in autism spectrum disorder patients: an integrative systems biology approach
The genetic and clinical complexity that characterizes Autism Spectrum Disorder
(ASD) has hindered the development of biomarkers for early diagnosis
and reliable prognosis, as well as a personalized approach to therapeutic intervention.
This study aimed to develop an integrative approach for clinical presentation
prediction based on Copy Number Variants (CNVs), with clinical application
for diagnosis and prognosis of ASD. For this purpose, machine learning techniques
were applied to a dataset of 2446 patients with ASD, recruited by the
Autism Genome Project. Clustering analysis of multidimensional clinical data
allowed the definition of two patient subgroups in this population, with clinical
profiles differing significantly in verbal ability, cognitive level, disease severity
and adaptive behavior. In the same subjects, analysis of CNVs specifically
affecting brain-expressed genes identified 15 biological processes enriched
for the disrupted genes. A machine learning algorithm was trained and tested
to classify patients with more dysfunctional clinical presentation based on
altered biological processes. The results showed that correlations between
clinical phenotype and underlying biology can be established in ASD and that,
for datasets with sufficiently informative data, there is a reasonable predictive
power. Further studies with more complete clinical and genomic data are
needed to implement this concept in clinical practice.
A Silver Standard Corpus of Human Phenotype-Gene Relations
Human phenotype-gene relations are fundamental to fully understand the origin
of some phenotypic abnormalities and their associated diseases. Biomedical
literature is the most comprehensive source of these relations; however, we
need Relation Extraction tools to automatically recognize them. Most of these
tools require an annotated corpus and to the best of our knowledge, there is no
corpus available annotated with human phenotype-gene relations. This paper
presents the Phenotype-Gene Relations (PGR) corpus, a silver standard corpus of
human phenotype and gene annotations and their relations. The corpus consists
of 1712 abstracts, 5676 human phenotype annotations, 13835 gene annotations,
and 4283 relations. We generated this corpus using Named-Entity Recognition
tools, whose results were partially evaluated by eight curators, obtaining a
precision of 87.01%. By using the corpus we were able to obtain promising
results with two state-of-the-art deep learning tools, namely a precision of
78.05%. The PGR corpus was made publicly available to the research
community.
Comment: NAACL 201
Exploiting disjointness axioms to improve semantic similarity measures
Motivation: Representing domain knowledge in biology has traditionally been accomplished by creating simple hierarchies of classes with textual annotations. Recently, expressive ontology languages, such as Web Ontology Language, have become more widely adopted, supporting axioms that express logical relationships other than class-subclass, e.g. disjointness. This is improving the coverage and validity of the knowledge contained in biological ontologies. However, current semantic tools still need to adapt to this more expressive information. In this article, we propose a method to integrate disjointness axioms, which are being incorporated in real-world ontologies, such as the Gene Ontology and the chemical entities of biological interest ontology, into semantic similarity, the measure that estimates the closeness in meaning between classes.
Results: We present a modification of the measure of shared information content, which extends the base measure to allow the incorporation of disjointness information. To evaluate our approach, we applied it to several randomly selected datasets extracted from the chemical entities of biological interest ontology. In 93.8% of these datasets, our measure performed better than the base measure of shared information content. This supports the idea that semantic similarity is more accurate if it extends beyond the hierarchy of classes of the ontology.
Supplementary information: Supplementary data are available at Bioinformatics online.