Search CORE

1,514 research outputs found

Gene prioritization and clustering by multi-view text mining

Author: De Moor Bart
Moreau Yves
Tranchevent Leon-Charles
Yu Shi
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Abstract Background Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effect of disease-gene identification varies in different text mining models. Thus, the idea of incorporating more text mining models may be beneficial to obtain more refined and accurate knowledge. However, how to effectively combine these models still remains a challenging question in machine learning. In particular, it is a non-trivial issue to guarantee that the integrated model performs better than the best individual model. Results We present a multi-view approach to retrieve biomedical knowledge using different controlled vocabularies. These controlled vocabularies are selected on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result specified by a vocabulary is considered as a view and the obtained multiple views are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach demonstrates significantly better performance than other comparing methods. Conclusions In practical research, the relevance of specific vocabulary pertaining to the task is usually unknown. In such case, multi-view text mining is a superior and promising strategy for text-based disease gene identification.</p

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The impact of sequence database choice on metaproteomic results in gut microbiota studies

Author: Addis Maria Filippa
Deligios Massimo
Fraumene Cristina
Manghina Valeria
Martens Lennart
Muth Thilo
Pagnozzi Daniela
Palomba Antonio
Rapp Erdmann
Tanca Alessandro
Uzzau Sergio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

Background: Elucidating the role of gut microbiota in physiological and pathological processes has recently emerged as a key research aim in life sciences. In this respect, metaproteomics, the study of the whole protein complement of a microbial community, can provide a unique contribution by revealing which functions are actually being expressed by specific microbial taxa. However, its wide application to gut microbiota research has been hindered by challenges in data analysis, especially related to the choice of the proper sequence databases for protein identification. Results: Here, we present a systematic investigation of variables concerning database construction and annotation and evaluate their impact on human and mouse gut metaproteomic results. We found that both publicly available and experimental metagenomic databases lead to the identification of unique peptide assortments, suggesting parallel database searches as a mean to gain more complete information. In particular, the contribution of experimental metagenomic databases was revealed to be mandatory when dealing with mouse samples. Moreover, the use of a "merged" database, containing all metagenomic sequences from the population under study, was found to be generally preferable over the use of sample-matched databases. We also observed that taxonomic and functional results are strongly database-dependent, in particular when analyzing the mouse gut microbiota. As a striking example, the Firmicutes/Bacteroidetes ratio varied up to tenfold depending on the database used. Finally, assembling reads into longer contigs provided significant advantages in terms of functional annotation yields. Conclusions: This study contributes to identify host- and database-specific biases which need to be taken into account in a metaproteomic experiment, providing meaningful insights on how to design gut microbiota studies and to perform metaproteomic data analysis. In particular, the use of multiple databases and annotation tools has to be encouraged, even though this requires appropriate bioinformatic resources

AIR Universita degli studi di Milano

Ghent University Academic Bibliography

PubMed Central

MPG.PuRe

Exploration of User Groups in VEXUS

Author: Amer-Yahia Sihem
Comba Joao
Moreira Viviane
Omidvar-Tehrani Behrooz
Zegarra Fabian Colque
Publication venue
Publication date: 10/12/2017
Field of study

We introduce VEXUS, an interactive visualization framework for exploring user data to fulfill tasks such as finding a set of experts, forming discussion groups and analyzing collective behaviors. User data is characterized by a combination of demographics like age and occupation, and actions such as rating a movie, writing a paper, following a medical treatment or buying groceries. The ubiquity of user data requires tools that help explorers, be they specialists or novice users, acquire new insights. VEXUS lets explorers interact with user data via visual primitives and builds an exploration profile to recommend the next exploration steps. VEXUS combines state-of-the-art visualization techniques with appropriate indexing of user data to provide fast and relevant exploration

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?

Author: Bastien Olivier
Birkholtz Lyn-Marie
Breton Vincent
Grando Delphine
Hofmann-Apitius Martin
Jacq Nicolas
Joubert Fourie
Kasam Vinod
Louw Abraham I
Maréchal Eric
Ortet Philippe
Roy Sylvaine
Saïdani Nadia
Wells Gordon
Zimmermann Marc
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006
Field of study

The organization and mining of malaria genomic and post-genomic data is highly motivated by the necessity to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space designed from the genomic data from Plasmodium falciparum, but using also the millions of genomic data from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should therefore be as reliable and versatile as possible. In this context, we examined five aspects of the organization and mining of malaria genomic and post-genomic data: 1) the comparison of protein sequences including compositionally atypical malaria sequences, 2) the high throughput reconstruction of molecular phylogenies, 3) the representation of biological processes particularly metabolic pathways, 4) the versatile methods to integrate genomic data, biological representations and functional profiling obtained from X-omic experiments after drug treatments and 5) the determination and prediction of protein structures and their molecular docking with drug candidate structures. Progresses toward a grid-enabled chemogenomic knowledge space are discussed.Comment: 43 pages, 4 figures, to appear in Malaria Journa

Hal - Université Grenoble Alpes

HAL AMU

Fraunhofer-ePrints

HAL Clermont Université

HAL Descartes

HAL-CEA

ProdInra

arXiv.org e-Print Archive

HAL-IN2P3

Springer - Publisher Connector

PubMed Central

UPSpace at the University of Pretoria

An Enhancement Method for Japanese-English Automated Translation

Author: Winiwarter Werner
Wloka Bartholomäus
Publication venue: 'Adam Mickiewicz University Poznan'
Publication date: 15/09/2010
Field of study

We present a method for improving existing statistical machine translation methods using a knowledge base compiled from a bilingual corpus as well as sequence alignment and pattern matching techniques from the area of machine learning and bioinformatics. An alignment algorithm identifies similar sentences, which are then used to construct a better word order for the translation. Our preliminary test results indicate a significant improvement of the translation quality.

Biblioteka Nauki - repozytorium artykuÅÃ³w

Investigationes Linguisticae

Accurate prediction of protein relative solvent accessibility using a balanced model

Author: B Lee
B Petersen
C Chothia
C Fan
C Mirabello
CN Magnan
DT Jones
E Faraggi
G Pollastri
G Pollastri
G Wang
I Walsh
J Eickholt
J Eickholt
J Hoskins
J Lafferty
J Lafferty
J Sim
J Sun
J Zhang
JY Wang
K Joo
KI Cho
L Wang
P Cong
Peisheng Cong
PW Rose
R Adamczak
S Liu
S Wang
S Wang
S Wang
SF Altschul
Tonghua Li
W Kabsch
W Li
Wei Wu
WR Atchley
Z Tang
Zhiheng Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

Biomedical word sense disambiguation with ontologies and metadata: automation meets accuracy

Author: Alexopoulou Dimitra
Andreopoulos Bill
Dietze Heiko
Doms Andreas
Gandon Fabien
Hakenberg Jörg
Khelif Khaled
Schroeder Michael
Wächter Thomas
Publication venue: BioMed Central
Publication date: 01/01/2009
Field of study

Abstract Background Ontology term labels can be ambiguous and have multiple senses. While this is no problem for human annotators, it is a challenge to automated methods, which identify ontology terms in text. Classical approaches to word sense disambiguation use co-occurring words or terms. However, most treat ontologies as simple terminologies, without making use of the ontology structure or the semantic similarity between terms. Another useful source of information for disambiguation are metadata. Here, we systematically compare three approaches to word sense disambiguation, which use ontologies and metadata, respectively. Results The 'Closest Sense' method assumes that the ontology defines multiple senses of the term. It computes the shortest path of co-occurring terms in the document to one of these senses. The 'Term Cooc' method defines a log-odds ratio for co-occurring terms including co-occurrences inferred from the ontology structure. The 'MetaData' approach trains a classifier on metadata. It does not require any ontology, but requires training data, which the other methods do not. To evaluate these approaches we defined a manually curated training corpus of 2600 documents for seven ambiguous terms from the Gene Ontology and MeSH. All approaches over all conditions achieve 80% success rate on average. The 'MetaData' approach performed best with 96%, when trained on high-quality data. Its performance deteriorates as quality of the training data decreases. The 'Term Cooc' approach performs better on Gene Ontology (92% success) than on MeSH (73% success) as MeSH is not a strict is-a/part-of, but rather a loose is-related-to hierarchy. The 'Closest Sense' approach achieves on average 80% success rate. Conclusion Metadata is valuable for disambiguation, but requires high quality training data. Closest Sense requires no training, but a large, consistently modelled ontology, which are two opposing conditions. Term Cooc achieves greater 90% success given a consistently modelled ontology. Overall, the results show that well structured ontologies can play a very important role to improve disambiguation. Availability The three benchmark datasets created for the purpose of disambiguation are available in Additional file <supplr sid="S1">1</supplr>. <suppl id="S1"> <title> Additional file 1 </title> <text> Benchmark datasets used in the experiments. The three corpora (High quality/Low quantity corpus; Medium quality/Medium quantity corpus; Low quality/High quantity corpus) are given in the form of PubMed identifiers (PMID) for True/False cases for the 7 ambiguous terms examined (GO/MeSH/UMLS identifiers are also given). </text> <file name="1471-2105-10-28-S1.txt"> Click here for file </file> </suppl

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

INRIA a CCSD electronic archive server

PubMed Central

SJSU ScholarWorks

7th German Conference on Chemoinformatics: 25 CIC-Workshop : Goslar, Germany, 6 - 8 November 2011 ; meeting abstracts / Edited by Frank Oellien, Uli Fechner and Thomas Engel

Author: Engel Thomas
Fechner Uli
Oellien Frank
Publication venue
Publication date: 01/05/2012
Field of study

Hochschulschriftenserver - Universität Frankfurt am Main