4,085 research outputs found

    Infectious Disease Ontology

    Technological developments have resulted in tremendous increases in the volume and diversity of the data and information that must be processed in the course of biomedical and clinical research and practice. Researchers are at the same time under ever greater pressure to share data and to take steps to ensure that data resources are interoperable. The use of ontologies to annotate data has proven successful in supporting these goals and in providing new possibilities for the automated processing of data and information. In this chapter, we describe different types of vocabulary resources and emphasize those features of formal ontologies that make them most useful for computational applications. We describe current uses of ontologies and discuss future goals for ontology-based computing, focusing on its use in the field of infectious diseases. We review the largest and most widely used vocabulary resources relevant to the study of infectious diseases and conclude with a description of the Infectious Disease Ontology (IDO) suite of interoperable ontology modules that together cover the entire infectious disease domain

    Concept-based query expansion for retrieving gene related publications from MEDLINE

    Background: Advances in biotechnology and in high-throughput methods for gene analysis have contributed to an exponential increase in the number of scientific publications in these fields of study. While much of the data and results described in these articles are entered and annotated in the various existing biomedical databases, the scientific literature is still the major source of information. There is, therefore, a growing need for text mining and information retrieval tools to help researchers find the relevant articles for their study. To tackle this, several tools have been proposed to provide alternative solutions for specific user requests. Results: This paper presents QuExT, a new PubMed-based document retrieval and prioritization tool that, from a given list of genes, searches for the most relevant results from the literature. QuExT follows a concept-oriented query expansion methodology to find documents containing concepts related to the genes in the user input, such as protein and pathway names. The retrieved documents are ranked according to user-definable weights assigned to each concept class. By changing these weights, users can modify the ranking of the results in order to focus on documents dealing with a specific concept. The method's performance was evaluated using data from the 2004 TREC genomics track, producing a mean average precision of 0.425, with an average of 4.8 and 31.3 relevant documents within the top 10 and 100 retrieved abstracts, respectively. Conclusions: QuExT implements a concept-based query expansion scheme that leverages gene-related information available on a variety of biological resources. The main advantage of the system is to give the user control over the ranking of the results by means of a simple weighting scheme. Using this approach, researchers can effortlessly explore the literature regarding a group of genes and focus on the different aspects relating to these genes.
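    The weighted ranking described in this abstract can be illustrated with a minimal sketch. It assumes each document is summarised by a count of matched concepts per concept class, and that the score is a plain weighted sum; the function name `rank_documents`, the class names, and the scoring formula are hypothetical and are not taken from QuExT itself.

```python
from typing import Dict, List, Tuple


def rank_documents(doc_concepts: Dict[str, Dict[str, int]],
                   weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank documents by a weighted sum of per-class concept match counts.

    Hypothetical scoring rule (an assumption, not the published method):
    score(doc) = sum over classes of weight[class] * matches[class].
    """
    scored = []
    for doc_id, counts in doc_concepts.items():
        # Classes without a user-defined weight contribute nothing.
        score = sum(weights.get(cls, 0.0) * n for cls, n in counts.items())
        scored.append((doc_id, score))
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Illustrative data only: two documents with made-up match counts.
docs = {
    "doc_a": {"gene": 3, "protein": 1},
    "doc_b": {"gene": 1, "pathway": 4},
}
equal = rank_documents(docs, {"gene": 1.0, "protein": 1.0, "pathway": 1.0})
gene_focused = rank_documents(docs, {"gene": 2.0, "protein": 1.0, "pathway": 0.5})
```

    With equal weights the pathway-heavy document ranks first; raising the gene weight and lowering the pathway weight reverses the order, which is the kind of user control the abstract describes.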

    Combining global and local semantic contexts for improving biomedical information retrieval

    Presented at the European Conference on Information Retrieval 2011. In the context of biomedical information retrieval (IR), this paper explores the relationship between the document's global context and the query's local context in an attempt to overcome the term mismatch problem between the user query and documents in the collection. Most solutions to this problem have focused on expanding the query by discovering its context, either global or local. In a global strategy, all documents in the collection are used to examine word occurrences and relationships in the corpus as a whole, and this information is used to expand the original query. In a local strategy, the top-ranked documents retrieved for a given query are examined to determine terms for query expansion. We propose to combine the document's global context and the query's local context in an attempt to increase the term overlap between the user query and documents in the collection via document expansion (DE) and query expansion (QE). The DE technique is based on a statistical (IR-based) method that extracts the most appropriate concepts (global context) from each document. The QE technique is based on a blind feedback approach using the top-ranked documents (local context) obtained in the first retrieval stage. A comparative experiment on the TREC 2004 Genomics collection demonstrates that the combination of the document's global context and the query's local context yields a significant improvement over the baseline. The MAP is significantly raised from 0.4097 to 0.4532, an improvement of +10.62% over the baseline. The IR performance of the combined method in terms of MAP is also superior to that of the official runs submitted to TREC 2004 Genomics and is comparable to the performance of the best run (0.4075).
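    The local strategy in this abstract (blind feedback over the top-ranked documents) can be sketched as follows. This is a simplified illustration under stated assumptions: documents are already tokenised, expansion terms are chosen by raw frequency, and the name `expand_query` with its parameters `top_k` and `n_terms` is hypothetical. The paper's own term-selection method is not specified here and may differ.

```python
from collections import Counter
from typing import List


def expand_query(query: List[str], ranked_docs: List[List[str]],
                 top_k: int = 2, n_terms: int = 2) -> List[str]:
    """Blind (pseudo-relevance) feedback: add frequent terms from the
    top_k first-stage results to the original query.

    Assumption: raw term frequency is used to pick expansion terms.
    """
    counts: Counter = Counter()
    for doc in ranked_docs[:top_k]:
        # Only terms not already in the query are candidates.
        counts.update(term for term in doc if term not in query)
    new_terms = [term for term, _ in counts.most_common(n_terms)]
    return query + new_terms


# Illustrative first-stage ranking (made-up tokens).
first_stage = [
    ["brca1", "breast", "cancer"],
    ["breast", "cancer", "tumor"],
    ["yeast"],
]
expanded = expand_query(["brca1"], first_stage)
```

    Only the two top-ranked documents contribute candidate terms, so the third document's vocabulary is ignored regardless of its content.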

    TDR Targets 6: driving drug discovery for human pathogens through intensive chemogenomic data integration

    The volume of biological, chemical and functional data deposited in the public domain is growing rapidly, thanks to next generation sequencing and highly-automated screening technologies. These datasets represent invaluable resources for drug discovery, particularly for less studied neglected disease pathogens. To leverage these datasets, smart and intensive data integration is required to guide computational inferences across diverse organisms. The TDR Targets chemogenomics resource integrates genomic data from human pathogens and model organisms along with information on bioactive compounds and their annotated activities. This report highlights the latest updates on the available data and functionality in TDR Targets 6. Based on chemogenomic network models providing links between inhibitors and targets, the database now incorporates network-driven target prioritizations, and novel visualizations of network subgraphs displaying chemical- and target-similarity neighborhoods along with associated target-compound bioactivity links. Available data can be browsed and queried through a new user interface that allows users to perform prioritizations of protein targets and chemical inhibitors. As such, TDR Targets now facilitates the investigation of drug repurposing against pathogen targets, which can potentially help in identifying candidate targets for bioactive compounds with previously unknown targets.
    Affiliation: Urán Landaburu, Héctor Lionel. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas. - Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Biotecnológicas; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas; Argentina
    Affiliation: Berenstein, Ariel José. Fundación Instituto Leloir; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
    Affiliation: Videla, Santiago. Fundación Instituto Leloir; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
    Affiliation: Maru, Parag. National Chemical Laboratory; India. Academy of Scientific and Innovative Research; India
    Affiliation: Shanmugam, Dhanasekaran. Academy of Scientific and Innovative Research; India. National Chemical Laboratory; India
    Affiliation: Chernomoretz, Ariel. Fundación Instituto Leloir; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Física de Buenos Aires. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Física de Buenos Aires; Argentina
    Affiliation: Agüero, Fernan Gonzalo. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas. - Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Biotecnológicas; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Investigaciones Biológicas y Tecnológicas. Universidad Nacional de Córdoba. Facultad de Ciencias Exactas, Físicas y Naturales. Instituto de Investigaciones Biológicas y Tecnológicas; Argentina

    A cooperative framework for molecular biology database integration using image object selection

    The concept of 'Molecular Biology Database Integration', and the problems associated with it, motivated this Ph.D. research. Available technologies make it possible to analyse data independently and discretely, but they fail to integrate the data resources to yield more meaningful information. This, along with other integration issues, defined the scope of the research. The research reviews 'database interoperability' problems and suggests a framework for integrating molecular biology databases. The framework proposes a cooperative environment in which molecular biology databases share information on the basis of a common purpose. The research also reviews other implementation and interoperability issues for laboratory-based, dedicated and target-specific databases. It addresses the following issues: the diversity of molecular biology database schemas, schema constructs and schema implementation; multi-database query using image object keying; database integration technologies using a context graph; and automated navigation among these databases. The thesis introduces a new approach to database implementation: an interoperable component database concept for initiating multidatabase queries on gene mutation data. A number of data models are proposed for gene mutation data, which form the basis for integrating the target-specific component database with the federated information system. The proposed data models cover genetic trait analysis, classification of gene mutation data, pathological lesion data and laboratory data. The main feature of this component database is its non-overlapping attributes, and it follows the non-redundant integration approach explained in the thesis. This is achieved by storing attributes that have no union or intersection with attributes that exist in public domain molecular biology databases. Unlike data warehousing techniques, this feature is novel. The component database will be integrated with other biological data sources for sharing information in a cooperative environment. This involves developing new tools. The thesis explains the role of these tools, which are a metadata extractor, a mapping linker, a query generator and a result interpreter. These tools are used for transparent integration without creating any global schema of the participating databases. The thesis also establishes the concept of image object keying for multidatabase query and proposes an algorithm for matching protein spots in gel electrophoresis images. A spot in a gel electrophoresis image initiates the query when it is selected by the user, and the selected spot is matched against similar spots in other resource databases. This image object keying method is an alternative to conventional multidatabase querying, which requires writing complex SQL scripts. The method also resolves the semantic conflicts that exist among molecular biology databases. The research proposes a new framework, based on the context of web data, for interacting with different biological data resources. A formal description of the resource context is given in the thesis. Implementing the context in the Resource Description Framework (RDF) increases interoperability by providing a description of the resources and a navigation plan for accessing the web-based databases. A higher-level construct (has, provide and access) is developed to implement the context in RDF for web interactions. Interactions among the resources are achieved by using an integration domain to extract the required information in a single instance, without writing any query scripts. The integration domain makes it possible to navigate and to execute the query plan within the resource databases. An extractor module collects elements from different target websites and unifies them as a whole object on a single page. The proposed framework is tested by finding specific information, e.g. information on Alzheimer's disease, in public domain biology resources such as the Protein Data Bank, Genome Data Bank and Online Mendelian Inheritance in Man, and in a local database. Finally, the thesis sets out further propositions and plans for future work.

    Analysis of biomedical and health queries: Lessons learned from TREC and CLEF evaluation benchmarks

    BACKGROUND: Inherited ichthyoses represent a group of rare skin disorders characterized by scaling, hyperkeratosis and inconstant erythema, involving most of the tegument. Epidemiology remains poorly described. This study aims to evaluate the prevalence of inherited ichthyosis (excluding very mild forms) and its different clinical forms in France. METHODS: The capture-recapture method was used for this study. According to statistical requirements, 3 different lists (reference/competence centres, French association of patients with ichthyosis and internet network) were used to record such patients. The study was conducted in 5 areas during a closed period. RESULTS: The prevalence was estimated at 13.3 per million people (/M) (95% CI [10.9 - 17.6]). With regard to autosomal recessive congenital ichthyosis, the prevalence was estimated at 7/M (95% CI [5.7 - 9.2]), with a prevalence of lamellar ichthyosis and congenital ichthyosiform erythroderma of 4.5/M (95% CI [3.7 - 5.9]) and 1.9/M (95% CI [1.6 - 2.6]), respectively. Prevalence of keratinopathic forms was estimated at 1.1/M (95% CI [0.9 - 1.5]). Prevalence of syndromic forms (all clinical forms together) was estimated at 1.9/M (95% CI [1.6 - 2.6]). CONCLUSIONS: Our results constitute a crucial basis for properly sizing the health measures required to improve patient care and for designing further clinical studies.

    A Bioinformatics Approach for Evaluating Evolutionary Convergence of Gene Family Size in Hematophagous Insects

    The act of blood-feeding can be nutritionally rewarding for blood-feeding arthropods. However, blood digestion can release pro-oxidant molecules such as heme and iron at potentially harmful levels. If left uncontrolled, this heme/iron can cause oxidative damage and eventually cell death. This has led to the evolution of various adaptations that protect blood-feeding arthropods against iron- and heme-associated damage. Here I postulate that the signature of this adaptation can be observed in patterns of gene family size. To test this hypothesis, I explore convergent evolutionary expansions and contractions of gene families in distinct lineages of hematophagous insects. Specifically, I compare the gene content present in available genomes from blood-feeding and non-blood-feeding arthropods (including outgroup taxa in the Lepidoptera [moths & butterflies]) to identify possible changes in gene family size in the blood-feeding taxa. Of the 206 heme/iron-associated genes identified from the model insect, Drosophila melanogaster, five were overrepresented (potentially duplicated) in the blood-feeding taxa: spook (cyp307A1), spookier (cyp307A2), cytochrome P450 12e1 (cyp12e1), hormone receptor 51 (Hr51) and NADH dehydrogenase (ubiquinone) B16.6 subunit (ND-B16.6); seven were underrepresented (potentially lost). However, when only Dipteran (fly) and Siphonaptera (flea) genomes were included in the analysis, just one iron gene and one heme gene (NADH dehydrogenase (ubiquinone) PDSW subunit (ND-PDSW)) were overrepresented in the blood-feeding taxa. Interestingly, the expanded cytochrome genes are known detoxifiers of many compounds, including heme and iron. More broadly, the analytical approach I employ here could be used to evaluate functional convergence for other phenotypic traits, conditional on the availability of annotated genomic data.
