
    Combining syntactic and ontological knowledge to extract biologically relevant relations from scientific papers

    Bringing biologists and text miners closer together is a key step towards the wider use of literature mining tools. Our contribution to this aim is an end-user tool for extracting problem-specific, biologically relevant relations. Development efforts focus on easy-to-use text mining workflows that include commonly available entity recognisers and syntactic processors, and on building a user-friendly environment that lets biologists tailor the tool to specific problems. Funding: Fundação para a Ciência e a Tecnologia (FCT); Systems Biology as a Driver for Industrial Biotechnology (SYSINBIO).
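The abstract does not say how the workflow links recognised entities into relations. As a rough illustration only, the sketch below chains a naive sentence splitter, a toy dictionary lookup (standing in for an entity recogniser) and a trigger-word rule (standing in for a syntactic processor) to pull entity-trigger-entity triples out of text. The entity list, trigger words and all function names are hypothetical; a real workflow would plug in an actual recogniser and parser.

```python
import re
from typing import List, Set, Tuple

# Hypothetical toy dictionary standing in for an entity recogniser.
ENTITIES: Set[str] = {"LacI", "lacZ", "lactose"}
# Hypothetical trigger words that signal a relation of interest.
TRIGGERS: Set[str] = {"activates", "binds", "inhibits", "regulates"}


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    """Keep alphanumeric tokens only; punctuation is dropped."""
    return re.findall(r"[A-Za-z0-9]+", sentence)


def extract_relations(text: str) -> List[Tuple[str, str, str]]:
    """Return (entity, trigger, entity) triples found within single sentences."""
    relations = []
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        ents = [(i, t) for i, t in enumerate(tokens) if t in ENTITIES]
        trigs = [(i, t.lower()) for i, t in enumerate(tokens) if t.lower() in TRIGGERS]
        for a, (i, left) in enumerate(ents):
            for j, right in ents[a + 1:]:
                # Keep the pair only if a trigger word lies strictly between the two entities.
                trigger = next((t for k, t in trigs if i < k < j), None)
                if trigger is not None:
                    relations.append((left, trigger, right))
    return relations


print(extract_relations("LacI inhibits lacZ expression. lactose binds LacI."))
# [('LacI', 'inhibits', 'lacZ'), ('lactose', 'binds', 'LacI')]
```

Swapping the dictionary for a trained recogniser, or the trigger rule for a dependency-path check, changes only the two lookups inside `extract_relations`, which is the kind of problem-specific tailoring the abstract describes.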

    Network Analysis of Obesity Expression Data

    There are numerous genetic and environmental factors significantly associated with obesity, which could serve as potential diagnostic biomarkers. Data on molecular mechanisms, development, differentiation and disease gene expression provide crucial insights, as these differentially expressed genes could have major effects on diet-induced obesity, and such effects are not seen in animals. Genomics and proteomics are major branches for understanding the normal function of tissues and their interactions with the environment: characterizing the tissues in which newly discovered genes are expressed helps in understanding tissue development, ageing mechanisms and the signalling routes that enable tissues to function, and also reveals the similarity, correspondence and other relationships between two or more genes. It has long been known that hypothalamic and brain stem centres are involved in the regulation of food intake and energy balance, but information on the associated regulatory factors and their genes was scarce until the last decade; these genes have since been found to be strongly expressed in a variety of tissues. NPY plays a notable role in anxiety, stress, obesity and energy homeostasis through activation of NPY-Y1 receptors (Y1Rs) in the brain. The NPY1R gene is the protein counterpart of genes used as models in mice as well as in humans. Using various bioinformatics tools, NPY1R can be compared at both the gene and protein levels to assess it as a biomarker of obesity. Systems biology studies thus aim to predict an obesity gene that could serve as a biomarker in humans, by comparison with genes already used as markers in model organisms.

    Clique-based data mining for related genes in a biomedical database

    Background: Progress in the life sciences cannot be made without integrating biomedical knowledge on numerous genes to help formulate hypotheses about the genetic mechanisms behind various biological phenomena, including diseases. There is thus a strong need for a way to search biomedical databases automatically and comprehensively for related genes, such as genes in the same families and genes encoding components of the same pathways. Here we address the extraction of related genes by searching for densely connected subgraphs, modeled as cliques, in a biomedical relational graph. Results: We constructed a graph whose nodes were gene or disease pages and whose edges were the hyperlinks between those pages in the Online Mendelian Inheritance in Man (OMIM) database. We obtained over 20,000 sets of related genes (called 'gene modules') by enumerating cliques computationally. The modules included genes in the same family, genes for proteins that form a complex, and genes for components of the same signaling pathway. Experiments using gene modules related to 'metabolic syndrome' show that the gene modules can provide a coherent, holistic picture that helps in interpreting relations among genes. Conclusion: We presented a data mining approach that extracts related genes by enumerating cliques. The extracted gene sets provide a holistic picture useful for comprehending complex disease mechanisms.
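The abstract says modules were found by enumerating cliques but does not name the algorithm. Bron-Kerbosch with pivoting is one standard way to list all maximal cliques, and it is used here as an assumption. The toy graph and node names are hypothetical, not OMIM data; the `min_size` filter is likewise an illustrative choice for discarding trivial modules.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]


def add_edge(g: Graph, u: str, v: str) -> None:
    """Add an undirected edge (e.g. a hyperlink between two entry pages)."""
    g.setdefault(u, set()).add(v)
    g.setdefault(v, set()).add(u)


def bron_kerbosch(g: Graph, r: Set[str], p: Set[str], x: Set[str], out: List[Set[str]]) -> None:
    """Bron-Kerbosch with pivoting: report every maximal clique extending r."""
    if not p and not x:
        out.append(r)
        return
    # Pivot on the vertex with the most neighbours in p to prune redundant branches.
    pivot = max(p | x, key=lambda u: len(g[u] & p))
    for v in list(p - g[pivot]):
        bron_kerbosch(g, r | {v}, p & g[v], x & g[v], out)
        p = p - {v}
        x = x | {v}


def gene_modules(g: Graph, min_size: int = 3) -> List[Set[str]]:
    """Enumerate maximal cliques and keep those with at least min_size nodes."""
    cliques: List[Set[str]] = []
    bron_kerbosch(g, set(), set(g), set(), cliques)
    return [c for c in cliques if len(c) >= min_size]


g: Graph = {}
for u, v in [("geneA", "geneB"), ("geneA", "geneC"), ("geneB", "geneC"), ("geneC", "geneD")]:
    add_edge(g, u, v)
print(sorted(sorted(m) for m in gene_modules(g)))  # [['geneA', 'geneB', 'geneC']]
```

Maximal-clique enumeration is exponential in the worst case, but link graphs of this kind are sparse, which is where pivoting keeps the search tractable.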

    Open Biomedical Ontology-based Medline exploration

    Background: Effective Medline database exploration is critical for understanding high-throughput experimental results and developing novel hypotheses about the mechanisms underlying the targeted biological processes. Existing solutions enhance Medline exploration through approaches such as document clustering, network presentations of underlying conceptual relationships, and the mapping of search results to MeSH and Gene Ontology trees; we believe that using multiple ontologies from the Open Biomedical Ontologies (OBO) can further help researchers explore the literature from different perspectives and quickly locate the most relevant Medline records for further investigation. Results: We developed PubOnto, an ontology-based interactive Medline exploration solution that enables interactive exploration and filtering of search results using multiple ontologies from the OBO Foundry. PubOnto is a rich internet application based on the FLEX platform. It contains a number of interactive tools, visualization capabilities, an open service architecture and a customizable user interface. It is freely accessible at http://brainarray.mbni.med.umich.edu/brainarray/prototype/pubonto (full text: http://deepblue.lib.umich.edu/bitstream/2027.42/112693/1/12859_2009_Article_3295.pd)
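The abstract does not describe how PubOnto filters results internally. A common pattern for ontology-driven filtering, assumed here, is that selecting a term keeps every record annotated with that term or with any of its descendants in the is_a hierarchy. The hierarchy, annotations and function names below are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical is_a hierarchy (parent -> children), loosely modelled on an OBO ontology.
CHILDREN: Dict[str, List[str]] = {
    "cell death": ["apoptotic process", "necrotic cell death"],
    "apoptotic process": ["neuron apoptotic process"],
}
# Hypothetical annotations: search-result record ID -> ontology terms mapped to it.
ANNOTATIONS: Dict[str, Set[str]] = {
    "rec1": {"apoptotic process"},
    "rec2": {"necrotic cell death"},
    "rec3": {"neuron apoptotic process"},
    "rec4": {"cell proliferation"},
}


def descendants(term: str, children: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first walk collecting the term and everything below it."""
    seen = {term}
    queue = deque([term])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def filter_records(term: str) -> List[str]:
    """Records annotated with the selected term or any of its descendants."""
    allowed = descendants(term, CHILDREN)
    return sorted(rec for rec, terms in ANNOTATIONS.items() if terms & allowed)


print(filter_records("apoptotic process"))  # ['rec1', 'rec3']
```

Selecting a broader term ("cell death") widens the filter to every record under that branch, which is what lets a user move between perspectives by walking up or down a tree.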

    Doctor of Philosophy

    Dissertation. The objective of this work is to examine the efficacy of natural language processing (NLP) in summarizing bibliographic text for multiple purposes. Researchers have noted the accelerating growth of bibliographic databases. Information seekers who use traditional information retrieval techniques to search large bibliographic databases are often overwhelmed by excessive, irrelevant data. Scientists have applied natural language processing technologies to improve retrieval. Text summarization, a natural language processing approach, simplifies bibliographic data while filtering it to address a user's need. Traditional text summarization can require multiple software applications to accommodate diverse processing refinements known as "points-of-view." A new, statistical approach to text summarization can transform this process. Combo, a statistical algorithm composed of three individual metrics, determines which elements within input data are relevant to a user's specified information need, thus enabling a single software application to summarize text for many points-of-view. In this dissertation, I describe this algorithm and the research process used in developing and testing it. Four studies made up the research process. The goal of the first study was to create a conventional schema accommodating a genetic disease etiology point-of-view, and an evaluative reference standard. This was accomplished by simulating the task of secondary genetic database curation. The second study addressed the development and initial evaluation of the algorithm, comparing its performance to the conventional schema using the previously established reference standard, again within the task of secondary genetic database curation. The third and fourth studies evaluated the algorithm's performance in accommodating additional points-of-view in a simulated clinical decision support task.
The third study explored prevention, while the fourth evaluated performance for prevention and drug treatment, comparing results to a conventional treatment schema's output. Both summarization methods identified data that were salient to their tasks. The conventional genetic disease etiology and treatment schemas located salient information for database curation and decision support, respectively. The Combo algorithm located salient genetic disease etiology, treatment, and prevention data for the associated tasks. Dynamic text summarization could potentially serve additional purposes, such as consumer health information delivery, systematic review creation, and primary research. This technology may benefit many user groups.
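The abstract does not name Combo's three metrics or say how they are combined. Purely to illustrate scoring candidate text with several metrics inside one application, the sketch min-max normalises three hypothetical per-sentence scores, averages them and keeps the sentences that reach a threshold. The equal weighting, the 0.5 threshold, the metric values and the function names are all assumptions, not the dissertation's algorithm.

```python
from typing import List, Sequence


def min_max(xs: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant metric contributes 0 everywhere."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def combined_scores(m1: Sequence[float], m2: Sequence[float], m3: Sequence[float]) -> List[float]:
    """Average the three normalised metrics for each candidate sentence."""
    return [(a + b + c) / 3 for a, b, c in zip(min_max(m1), min_max(m2), min_max(m3))]


def summarize(sentences: List[str], m1: Sequence[float], m2: Sequence[float],
              m3: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Keep the sentences whose combined score reaches the threshold."""
    scores = combined_scores(m1, m2, m3)
    return [s for s, score in zip(sentences, scores) if score >= threshold]


sentences = ["s1", "s2", "s3"]
print(summarize(sentences, [1, 2, 3], [10, 0, 5], [0, 0, 1]))  # ['s3']
```

Under this framing, a different point-of-view only changes the metric inputs (for example, which vocabulary a metric rewards), while the selection code stays the same.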

    NAMED ENTITY RECOGNITION FROM BIOMEDICAL TEXT: AN INFORMATION EXTRACTION TASK

    Biomedical text mining (BioTM) targets the extraction of significant information from biomedical archives. BioTM encompasses information retrieval (IR) and information extraction (IE). IR retrieves the relevant biomedical literature from repositories such as PubMed and MEDLINE based on a search query, and ends with a corpus of the relevant documents retrieved from the publication databases. The IE task comprises document preprocessing, named entity recognition (NER) and relationship extraction, drawing on natural language processing, data mining techniques and machine learning algorithms. Preprocessing includes tokenization, stop word removal, shallow parsing and part-of-speech tagging. The NER phase recognizes well-defined objects such as genes, proteins or cell lines, and leads to the next phase, relationship extraction. This work is based on the conditional random field (CRF) machine learning algorithm.
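The abstract names a conditional random field but gives no feature set. A linear-chain CRF labels a token sequence by finding the label path that maximises the summed emission (feature) and transition weights; the sketch below shows only that decoding step, the Viterbi algorithm, over a BIO tag set. The features, the weights and the names `EMIT`, `TRANS` and `features` are hand-set, hypothetical placeholders; a real CRF learns its weights from an annotated corpus.

```python
import re
from typing import Dict, List, Tuple

LABELS = ["O", "B-GENE", "I-GENE"]

# Hypothetical hand-set weights; a trained CRF would learn these from annotated text.
EMIT: Dict[Tuple[str, str], float] = {
    ("has_digit", "B-GENE"): 2.0,
    ("has_upper", "B-GENE"): 1.0,
    ("has_digit", "I-GENE"): 0.5,
    ("lower", "O"): 1.5,
}
TRANS: Dict[Tuple[str, str], float] = {
    ("START", "I-GENE"): -10.0,  # a sentence should not begin inside an entity
    ("O", "I-GENE"): -10.0,      # I-GENE should follow B-GENE or I-GENE
    ("B-GENE", "I-GENE"): 1.0,
}


def features(token: str) -> List[str]:
    """Simple orthographic features of one token."""
    feats = []
    if re.search(r"\d", token):
        feats.append("has_digit")
    if any(c.isupper() for c in token):
        feats.append("has_upper")
    if token.islower():
        feats.append("lower")
    return feats


def viterbi(tokens: List[str]) -> List[str]:
    """Return the highest-scoring label sequence under the weights above."""
    if not tokens:
        return []
    score: List[Dict[str, float]] = []
    back: List[Dict[str, str]] = []
    for i, tok in enumerate(tokens):
        feats = features(tok)
        score.append({})
        back.append({})
        for y in LABELS:
            emit = sum(EMIT.get((f, y), 0.0) for f in feats)
            if i == 0:
                score[0][y] = TRANS.get(("START", y), 0.0) + emit
                continue
            # Best previous label for a path ending in label y at position i.
            prev = max(LABELS, key=lambda p: score[i - 1][p] + TRANS.get((p, y), 0.0))
            score[i][y] = score[i - 1][prev] + TRANS.get((prev, y), 0.0) + emit
            back[i][y] = prev
    # Trace back from the best final label.
    path = [max(LABELS, key=lambda y: score[-1][y])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


print(viterbi(["the", "BRCA1", "gene", "is", "mutated"]))
# ['O', 'B-GENE', 'O', 'O', 'O']
```

The large negative transition weights encode the BIO constraint softly; training would also learn how strongly each orthographic feature votes for each tag.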

    Semantic models as metrics for kernel-based interaction identification

    Automatic detection of protein-protein interactions (PPIs) in biomedical publications is vital for efficient biological research. It also presents a host of new challenges for pattern recognition methodologies, some of which are addressed by the research in this thesis. Proteins are the principal means of communication within a cell; hence, this area of research is strongly motivated by the needs of biologists investigating the sub-cellular functions of organisms, diseases and treatments. These researchers rely on the collaborative efforts of the entire field and communicate through experimental results published in peer-reviewed biomedical journals. The substantial number of interactions detected by automated large-scale PPI experiments, combined with easy access to digitised publications, has increased the number of results made available each day. The ultimate aim of this research is to provide tools and mechanisms that help biologists and database curators locate relevant information. Toward this objective, this thesis proposes, studies and develops new methodologies that go some way towards meeting this grand challenge. Pattern recognition methodologies are one approach that can be used to locate PPI sentences; however, most accurate pattern recognition methods require a set of labelled examples to train on. For this particular task, collecting and labelling training data is highly expensive. On the other hand, digital publications provide a plentiful source of unlabelled data. The unlabelled data is used, along with word co-occurrence models, to improve classification with Gaussian processes, a probabilistic alternative to state-of-the-art support vector machines. This thesis presents and systematically assesses novel methods that exploit the knowledge implicitly encoded in biomedical texts, and shows an improvement over current approaches to PPI sentence detection.
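The abstract does not give the exact formulation that links unlabelled text, co-occurrence models and Gaussian processes. One simple way to turn co-occurrence statistics into a kernel, assumed here, is a semantic kernel k(d1, d2) = Σ_w Σ_v tf1(w) · tf2(v) · sim(w, v), where sim is the cosine similarity of the words' co-occurrence vectors counted on unlabelled sentences. Because sim is an inner product of unit-normalised vectors, the kernel stays positive semidefinite and could be handed to a Gaussian process or an SVM. The toy corpus and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List


def cooccurrence(sentences: List[str]) -> Dict[str, Counter]:
    """counts[w][v] = number of unlabelled sentences containing both w and v."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sent in sentences:
        for w, v in combinations(sorted(set(sent.lower().split())), 2):
            counts[w][v] += 1
            counts[v][w] += 1
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(n * b[k] for k, n in a.items())
    norm = math.sqrt(sum(n * n for n in a.values()) * sum(n * n for n in b.values()))
    return dot / norm if norm else 0.0


def word_sim(w: str, v: str, counts: Dict[str, Counter]) -> float:
    """Identical words match fully; otherwise compare co-occurrence profiles."""
    if w == v:
        return 1.0
    return cosine(counts.get(w, Counter()), counts.get(v, Counter()))


def semantic_kernel(doc1: str, doc2: str, counts: Dict[str, Counter]) -> float:
    """k(d1, d2) = sum over word pairs of tf1(w) * tf2(v) * sim(w, v)."""
    tf1, tf2 = Counter(doc1.lower().split()), Counter(doc2.lower().split())
    return sum(c1 * c2 * word_sim(w, v, counts)
               for w, c1 in tf1.items() for v, c2 in tf2.items())


unlabelled = ["protein binds receptor", "protein interacts receptor", "weather is sunny"]
counts = cooccurrence(unlabelled)
print(semantic_kernel("binds", "interacts", counts))  # 1.0
print(semantic_kernel("binds", "sunny", counts))      # 0.0
```

A plain bag-of-words kernel would score "binds" against "interacts" as 0; the co-occurrence similarity learned from unlabelled sentences lets the two related verbs match, which is the benefit the thesis attributes to unlabelled data.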

    Development of a prediction method of protein-molecule interaction sites from primary structure

    Nicolas Delsaux (2008). Development of a prediction method of protein-molecule interaction sites from primary structure (doctoral thesis, in French). Gembloux, Belgium, Faculté Universitaire des Sciences Agronomiques (Gembloux Agricultural University), 204 pp., 20 tables, 81 figures. Summary: The objective of this work is to improve knowledge about interfaces and to assist the characterization of proteins of unknown function and structure. To this aim, we constructed databases of three-dimensional structures of complexes and analyzed their interfaces at the atomic scale. The in-depth analysis of interfaces confirms the important role of aromatic amino acids and of arginine, and identifies residue pairs that are significantly favored at interfaces. Moreover, it highlights the importance of the volume of residues neighboring the interaction sites and of nucleic acid conformation. The main variables correlated with interaction sites are: three interaction propensities, residue type and its position in the sequence, predictions of accessibility and of secondary structure, prediction as a Receptor Binding Domain, and the presence of certain protein motifs. Finally, a method for predicting interaction sites was developed. This method is one of the few that use only information directly accessible from the protein sequence, and it gives promising results: the achieved specificity is sufficient to improve experimental results obtained by site-directed mutagenesis.
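The thesis combines many sequence-derived variables, and the summary does not give the final model. The sketch below isolates just one ingredient, a sliding-window average of per-residue interaction propensities computed from the sequence alone. The propensity values, window size, threshold and function names are placeholders chosen for illustration, not results from the thesis; only the choice to favour aromatic residues and arginine follows the summary.

```python
from typing import Dict, List

# Placeholder propensities, NOT values from the thesis: the residues the summary reports as
# favoured at interfaces (aromatics F, W, Y and arginine R) get higher scores; others get 0.0.
PROPENSITY: Dict[str, float] = {"F": 1.0, "W": 1.2, "Y": 1.0, "R": 0.9}


def window_scores(seq: str, half_window: int = 2) -> List[float]:
    """Mean propensity over a window centred on each residue (truncated at the ends)."""
    n = len(seq)
    scores = []
    for i in range(n):
        window = seq[max(0, i - half_window): min(n, i + half_window + 1)]
        scores.append(sum(PROPENSITY.get(aa, 0.0) for aa in window) / len(window))
    return scores


def predict_sites(seq: str, threshold: float = 0.5, half_window: int = 2) -> List[int]:
    """0-based positions whose window score reaches the threshold."""
    return [i for i, s in enumerate(window_scores(seq, half_window)) if s >= threshold]


print(predict_sites("AAAWRYAAA", half_window=1))  # [3, 4, 5]
```

Averaging over a window lets neighbouring residues contribute, echoing the summary's point that the residues surrounding an interaction site matter; in the full method such a score would be one input among the accessibility, secondary-structure and motif predictions.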