
Adapting a relation extraction pipeline for the BioCreAtIvE II task

Author: Grover Claire
Haddow Barry
Klein Ewan
Matthews Michael
Nielsen Leif Arda
Tobin Richard
Wang Xinglong
Publication date: 01/01/2007

Building a semantically annotated corpus of clinical texts

Author: Andrea Setzer
Angus Roberts
George Demetriou
Ian Roberts
Mark Hepple
Robert Gaizauskas
Yikun Guo
Publication venue: 'Elsevier BV'
Publication date: 01/10/2009

In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, and its value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains.

Elsevier - Publisher Connector

Crossref

White Rose Research Online

Texture Analysis Methods for Medical Image Characterisation

Author: William Henry Nailon
Publication venue: 'IntechOpen'
Publication date: 01/03/2010

IntechOpen

Crossref

Playing hide and seek on the genomic playground: unveiling biological function from literature

Author: Van Landeghem Sofie
Publication venue: Ghent University. Faculty of Sciences
Publication date: 01/01/2012

Ghent University Academic Bibliography

Recommended from our members

Development and validation of blood-based proteomic biomarker-sociodemographic diagnostic prediction models to identify major depressive disorder among symptomatic individuals

Author: Han Sung Yeon
Publication venue: University of Cambridge
Publication date: 01/09/2020

Major depressive disorder (MDD) is a highly prevalent and disabling condition with a complex pathophysiology that has not been fully elucidated to date. While the socioeconomic burden of the disease is significant, many individuals remain undiagnosed or misdiagnosed. This is largely because the current diagnostic approach that relies on clinical evaluations of signs and symptoms can be subjective, and time and resources tend to be rather limited in primary care where the majority seek help for depression. Therefore, there is a significant and pressing need for an objective, reliable and readily accessible diagnostic test to enable earlier and more accurate diagnosis of MDD. In particular, as individuals experiencing subthreshold levels of depressive symptoms have an increased risk of developing MDD, it would be clinically relevant for such a diagnostic test to be able to identify depressed patients and/or individuals with high risks of incident MDD among symptomatic individuals. This thesis sought to develop risk prediction models that could potentially be utilised within a clinical setting to facilitate earlier and more accurate diagnosis of MDD. Such models were used to obtain probability estimates of the investigated individuals having or developing MDD based on their blood-based proteomic profiles and other characteristics, including sociodemographic and lifestyle factors. A targeted mass spectrometry approach was used to measure the abundances of a panel of peptides representing proteins, many of which have been previously associated with psychiatric disorders. Biomarkers were investigated in serum samples, which are widely used for blood-based biomarker discovery, as well as in dried blood spot samples, which are relatively novel in the field and carry several advantages. 
Importantly, this thesis focused on adopting appropriate statistical methods to ensure that the diagnostic predictions made by the models were accurate and reproducible, by addressing problems of model overfitting and model selection uncertainty. A particularly significant aspect of this was the development and application of a multimodel-based approach combining feature extraction and model averaging, which resulted in improved model predictive performance and generalisability. Diagnostic prediction models based on serum proteomic, sociodemographic/lifestyle and clinical data were shown to be able to differentiate between subthreshold symptomatic individuals who developed MDD and those who did not. Additionally, diagnostic prediction models based on dried blood spot proteomic and digital mental health assessment data were shown to be able to identify currently depressed patients without an existing MDD diagnosis as well as currently not depressed patients with an existing MDD diagnosis among subthreshold symptomatic individuals. These results clearly demonstrate the potential of such prediction models to be used as an aid to the diagnosis of MDD in clinical practice, especially within the primary care setting. Moreover, MDD was found to be associated with several blood-based proteomic biomarkers, which mainly represented an immune/inflammatory profile, as well as with various other patient features, most notably body mass index and childhood trauma. Although further investigations are needed, these associations reveal disturbances in the stress response pathways involving the hypothalamic-pituitary-adrenal axis in the pathophysiology of depression.

Apollo (Cambridge)

Integrated mining of feature spaces for bioinformatics domain discovery

Author: Chowriappa Pradeep
Publication venue: Louisiana Tech Digital Commons
Publication date: 01/10/2008

One of the major challenges in the field of bioinformatics is the elucidation of protein folding for the functional annotation of proteins. The factors that govern protein folding include the chemical, physical, and environmental conditions of the protein's surroundings, which can be measured and exploited for computational discovery purposes. These conditions enable the protein to transform from a sequence of amino acids to a globular three-dimensional structure. Information concerning the folded state of a protein has significant potential to explain biochemical pathways and their involvement in disorders and diseases. This information impacts the ways in which genetic diseases are characterized and cured and in which designer drugs are created. With the exponential growth of protein databases and the limitations of experimental protein structure determination, sophisticated computational methods have been developed and applied to search for, detect, and compare protein homology. Most computational tools developed for protein structure prediction are primarily based on sequence similarity searches. These approaches have improved the prediction accuracy of high sequence similarity proteins but have failed to perform well with proteins of low sequence similarity. Data mining offers unique algorithmic computational approaches that have been used widely in the development of automatic protein structure classification and prediction. In this dissertation, we present a novel approach for the integration of physico-chemical properties and effective feature extraction techniques for the classification of proteins. Our approaches overcome one of the major obstacles of data mining in protein databases, the encapsulation of different hydrophobicity residue properties into a much reduced feature space that possesses high degrees of specificity and sensitivity in protein structure classification.
We have developed three unique computational algorithms for coherent feature extraction on selected scale properties of the protein sequence. When plagued by the problem of the unequal cardinality of proteins, our proposed integration scheme effectively handles the varied sizes of proteins and scales well with increasing dimensionality of these sequences. We also detail a two-fold methodology for protein functional annotation. First, we exhibit our success in creating an algorithm that provides a means to integrate multiple physico-chemical properties in the form of a multi-layered abstract feature space, with each layer corresponding to a physico-chemical property. Second, we discuss a wavelet-based segmentation approach that efficiently detects regions of property conservation across all layers of the created feature space. Finally, we present a unique graph-theory-based algorithmic framework for the identification of conserved hydrophobic residue interaction patterns using identified scales of hydrophobicity. We report that these discriminatory features are specific to a family of proteins, which consist of conserved hydrophobic residues that are then used for structural classification. We also present our rigorously tested validation schemes, which report significant degrees of accuracy to show that homologous proteins exhibit the conservation of physico-chemical properties along the protein backbone. We conclude our discussion by summarizing our results and contributions and by listing our goals for future research.

Louisiana Tech Digital Commons

Microarray Data Mining and Gene Regulatory Network Analysis

Author: Li Ying
Publication venue: The Aquila Digital Community
Publication date: 01/05/2011

The novel molecular biological technology, microarray, makes it feasible to obtain quantitative measurements of expression of thousands of genes present in a biological sample simultaneously. Genome-wide expression data generated from this technology are promising to uncover the implicit, previously unknown biological knowledge. In this study, several problems about microarray data mining techniques were investigated, including feature (gene) selection, classifier gene identification, generation of a reference genetic interaction network for non-model organisms, and gene regulatory network (GRN) reconstruction using time-series gene expression data. Most of the existing computational models employed to infer gene regulatory networks are limited by either low accuracy or computational complexity. To overcome such limitations, the following strategies were proposed to integrate bioinformatics data mining techniques with existing GRN inference algorithms, which enables the discovery of novel biological knowledge. An integrated statistical and machine learning (ISML) pipeline was developed for feature selection and classifier gene identification to address the curse of dimensionality as well as the huge search space. Using the selected classifier genes as seeds, a scale-up technique is applied to search through major databases of genetic interaction networks, metabolic pathways, etc. By curating relevant genes and blasting genomic sequences of non-model organisms against well-studied genetic model organisms, a reference gene regulatory network for less-studied organisms was built and used both as prior knowledge and model validation for GRN reconstructions. Networks of gene interactions were inferred using a Dynamic Bayesian Network (DBN) approach and were analyzed for elucidating the dynamics caused by perturbations.
Our proposed pipelines were applied to investigate molecular mechanisms for chemical-induced reversible neurotoxicity.

Aquila Digital Community

Sparse Proteomics Analysis - A compressed sensing-based approach for feature selection and classification of high-dimensional proteomics mass spectrometry data

Author: Conrad Tim
Cvetkovic Nada
Genzel Martin
Kutyniok Gitta
Leichtle Alexander
Schütte Christof
Vybiral Jan
Wulkow Niklas
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/11/2016

Background: High-throughput proteomics techniques, such as mass spectrometry (MS)-based approaches, produce very high-dimensional data-sets. In a clinical setting one is often interested in how mass spectra differ between patients of different classes, for example spectra from healthy patients vs. spectra from patients having a particular disease. Machine learning algorithms are needed to (a) identify these discriminating features and (b) classify unknown spectra based on this feature set. Since the acquired data is usually noisy, the algorithms should be robust against noise and outliers, while the identified feature set should be as small as possible. Results: We present a new algorithm, Sparse Proteomics Analysis (SPA), based on the theory of compressed sensing that allows us to identify a minimal discriminating set of features from mass spectrometry data-sets. We show (1) how our method performs on artificial and real-world data-sets, (2) that its performance is competitive with standard (and widely used) algorithms for analyzing proteomics data, and (3) that it is robust against random and systematic noise. We further demonstrate the applicability of our algorithm to two previously published clinical data-sets.

arXiv.org e-Print Archive

Institutional Repository of the Freie Universität Berlin

DepositOnce

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

PubMed Central

Bern Open Repository and Information System (BORIS)