270 research outputs found

    Semi-automated literature mining to identify putative biomarkers of disease from multiple biofluids

    Background: Computational methods for mining of biomedical literature can be useful in augmenting manual searches of the literature using keywords for disease-specific biomarker discovery from biofluids. In this work, we develop and apply a semi-automated literature mining method to mine abstracts obtained from PubMed to discover putative biomarkers of breast and lung cancers in specific biofluids. Methodology: A positive set of abstracts was defined by the terms 'breast cancer' and 'lung cancer' in conjunction with 14 separate 'biofluids' (bile, blood, breastmilk, cerebrospinal fluid, mucus, plasma, saliva, semen, serum, synovial fluid, stool, sweat, tears, and urine), while a negative set of abstracts was defined by the terms '(biofluid) NOT breast cancer' or '(biofluid) NOT lung cancer.' More than 5.3 million total abstracts were obtained from PubMed and examined for biomarker-disease-biofluid associations (34,296 positive and 2,653,396 negative for breast cancer; 28,355 positive and 2,595,034 negative for lung cancer). Biological entities such as genes and proteins were tagged using ABNER, and processed using Python scripts to produce a list of putative biomarkers. Z-scores were calculated, ranked, and used to determine significance of putative biomarkers found. Manual verification of relevant abstracts was performed to assess our method's performance. Results: Biofluid-specific markers were identified from the literature, assigned relevance scores based on frequency of occurrence, and validated using known biomarker lists and/or databases for lung and breast cancer [NCBI's On-line Mendelian Inheritance in Man (OMIM), Cancer Gene annotation server for cancer genomics (CAGE), NCBI's Genes & Disease, NCI's Early Detection Research Network (EDRN), and others]. The specificity of each marker for a given biofluid was calculated, and the performance of our semi-automated literature mining method assessed for breast and lung cancer. 
Conclusions: We developed a semi-automated process for determining a list of putative biomarkers for breast and lung cancer. New knowledge is presented in the form of biomarker lists; ranked, newly discovered biomarker-disease-biofluid relationships; and biomarker specificity across biofluids.
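The abstract states that Z-scores were calculated and ranked but does not give the formula. As a minimal sketch, assuming a two-proportion z-test that compares how often a tagged term occurs in the positive versus the negative abstract set, the ranking step might look like the following. The function names and the term counts are hypothetical; only the corpus sizes (34,296 positive and 2,653,396 negative abstracts for breast cancer) come from the abstract.

```python
import math


def two_proportion_z(pos_hits, pos_total, neg_hits, neg_total):
    """Z-score for the difference between a term's frequency in the
    positive and negative abstract sets (pooled two-proportion test).
    This is an assumed scoring rule, not necessarily the authors' one."""
    p_pos = pos_hits / pos_total
    p_neg = neg_hits / neg_total
    pooled = (pos_hits + neg_hits) / (pos_total + neg_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / pos_total + 1 / neg_total))
    if se == 0:
        return 0.0  # term never (or always) occurs, so there is no signal
    return (p_pos - p_neg) / se


def rank_terms(counts, pos_total, neg_total):
    """Return (term, z) pairs sorted from highest to lowest z-score.
    `counts` maps a term to (hits in positive set, hits in negative set)."""
    scored = [
        (term, two_proportion_z(pos, pos_total, neg, neg_total))
        for term, (pos, neg) in counts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical counts for two placeholder terms; corpus sizes from the abstract.
example_counts = {"TERM_A": (500, 1000), "TERM_B": (10, 5000)}
ranked = rank_terms(example_counts, pos_total=34296, neg_total=2653396)
for term, z in ranked:
    print(f"{term}\t{z:.2f}")
```

A term enriched in the positive set receives a large positive score, while a term that is more common in the negative set receives a negative score and falls to the bottom of the ranking.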

    LITERATURE MINING SUSTAINS AND ENHANCES KNOWLEDGE DISCOVERY FROM OMIC STUDIES

    Genomic, proteomic and other experimentally generated data from studies of biological systems aiming to discover disease biomarkers are currently analyzed without sufficient supporting evidence from the literature due to complexities associated with automated processing. Extracting prior knowledge about markers associated with biological sample types and disease states from the literature is tedious, and little research has been performed to understand how to use this knowledge to inform the generation of classification models from ‘omic’ data. Using pathway analysis methods to better understand the underlying biology of complex diseases such as breast and lung cancers is state-of-the-art. However, the problem of how to combine literature-mining evidence with pathway analysis evidence is an open problem in biomedical informatics research. This dissertation presents a novel semi-automated framework, named Knowledge Enhanced Data Analysis (KEDA), which incorporates the following components: 1) literature mining of text; 2) classification modeling; and 3) pathway analysis. This framework aids researchers in assigning literature-mining-based prior knowledge values to genes and proteins associated with disease biology. It incorporates prior knowledge into the modeling of experimental datasets, enriching the development process with current findings from the scientific community. New knowledge is presented in the form of lists of known disease-specific biomarkers and their accompanying scores obtained through literature mining of millions of lung and breast cancer abstracts. These scores can subsequently be used as prior knowledge values in Bayesian modeling and pathway analysis. Ranked, newly discovered biomarker-disease-biofluid relationships which identify biomarker specificity across biofluids are presented. A novel method of identifying biomarker relationships is discussed that examines the attributes from the best-performing models. 
Pathway analysis results obtained after adding prior information ultimately provide more robust evidence for pathway involvement in diseases of interest, based on statistically significant standard measures of impact factor and p-values. The outcome of implementing the KEDA framework is enhanced modeling and pathway analysis findings. Enhanced knowledge discovery analysis leads to new disease-specific entities and relationships that otherwise would not have been identified. Increased disease understanding, as well as identification of biomarkers for disease diagnosis, treatment, or therapy targets, should ultimately lead to validation and clinical implementation.

    Toward a Standardized Strategy of Clinical Metabolomics for the Advancement of Precision Medicine

    Despite tremendous success, pitfalls have been observed in every step of a clinical metabolomics workflow, and these impede the internal validity of a study. Furthermore, the demand for logistics, instrumentation, and computational resources for metabolic phenotyping studies has far exceeded our expectations. In this conceptual review, we comprehensively cover the barriers facing a metabolomics-based clinical study and suggest potential solutions in the hope of enhancing study robustness, usability, and transferability. The importance of quality assurance and quality control procedures is discussed, followed by a practical rule containing five phases, including two additional "pre-pre-" and "post-post-" analytical steps. We also elucidate the potential involvement of machine learning and demonstrate that the need for automated data mining algorithms to improve the quality of future research is undeniable. Consequently, we propose a comprehensive metabolomics framework, along with an appropriate checklist refined from current guidelines and our previously published assessment, in an attempt to accurately translate achievements in metabolomics into clinical and epidemiological research. Furthermore, the integration of multifaceted multi-omics approaches, with metabolomics as the pillar member, is urgently needed. Combined with other social or nutritional factors, these approaches allow complete omics profiles to be gathered for a particular disease. Our discussion reflects the current obstacles and potential solutions toward the progressing trend of utilizing metabolomics in clinical research to create the next-generation healthcare system.

    Towards a comprehensive characterisation of the human internal chemical exposome: Challenges and perspectives

    The holistic characterisation of the human internal chemical exposome using high-resolution mass spectrometry (HRMS) would be a step forward to investigate the environmental aetiology of chronic diseases with an unprecedented precision. HRMS-based methods are currently operational to reproducibly profile thousands of endogenous metabolites as well as externally-derived chemicals and their biotransformation products in a large number of biological samples from human cohorts. These approaches provide a solid ground for the discovery of unrecognised biomarkers of exposure and metabolic effects associated with many chronic diseases. Nevertheless, some limitations remain and have to be overcome so that chemical exposomics can provide unbiased detection of chemical exposures affecting disease susceptibility in epidemiological studies. Some of these limitations include (i) the lack of versatility of analytical techniques to capture the wide diversity of chemicals; (ii) the lack of analytical sensitivity that prevents the detection of exogenous (and endogenous) chemicals occurring at (ultra) trace levels from restricted sample amounts, and (iii) the lack of automation of the annotation/identification process. In this article, we discuss a number of technological and methodological limitations hindering applications of HRMS-based methods and propose initial steps to push towards a more comprehensive characterisation of the internal chemical exposome. We also discuss other challenges including the need for harmonisation and the difficulty inherent in assessing the dynamic nature of the internal chemical exposome, as well as the need for establishing a strong international collaboration, high level networking, and sustainable research infrastructure. 
A great amount of research, technological development and innovative bio-informatics tools are still needed to profile and characterise the "invisible" (not profiled), "hidden" (not detected) and "dark" (not annotated) components of the internal chemical exposome and concerted efforts across numerous research fields are paramount.

    Statistical correlation based methods for enhanced interpretation of and information recovery from NMR metabolic data sets

    Owing to its ability to capture a systemic and temporal metabolic description of an organism’s response to a treatment, metabonomics is a well-established and valuable approach in elucidating the effects and mechanisms of a given perturbation. However, to optimise information recovery from the complex datasets generated, chemometric methods are essential. The work presented in this thesis focuses on the development of novel methods, and the use of existing methods in new applications to ease data interpretation and enhance information recovery from 1H Nuclear Magnetic Resonance (NMR) metabonomic datasets using correlation based methods. Although the methods here are largely applied to toxicological data, they could be equally valuable in the analysis of any metabonomic dataset, and indeed potentially to other ‘omics’ data presenting similar analytical challenges. The first two methodological approaches relate to novel extensions of Statistical Total Correlation Spectroscopy (STOCSY), a valuable tool in elucidation of both inter- and intra-metabolite spectral intensity correlations in NMR metabonomic datasets. In the first, STOCSY is utilised in STOCSY-editing, a method for the selective identification and downscaling of the peaks from unwanted metabolites such as those arising from xenobiotics. Structurally correlated peaks from drug metabolites are first identified using STOCSY, and the returned correlation information utilised to scale the spectra across these regions, producing a modified set of spectra in which drug metabolite contributions are reduced, endogenous peaks reconstructed and thus, analysis by pattern recognition methods without drug metabolite interferences facilitated. 
In the second, the STOCSY approach is extended in Iterative-STOCSY, where metabolic associations are followed over several rounds of STOCSY through calculation of correlation coefficients initially from a driver spectral peak of interest, and subsequently from all peaks identified as correlating above a set threshold to peaks picked in the previous round. The condensation of putatively structurally related peaks into single nodes, and representation of the otherwise complex network in a fully interactive plot of node-to-node connections and corresponding spectral data, allows the ready exploration of both inter- and intra-metabolite relationships and a more directed approach to the identification of biomarkers of the studied perturbation. Finally, various clustering methods are investigated with the aim of providing improved structural (intra-metabolite) versus non-structural (inter-metabolite) assignment. Thus, this thesis presents a framework for the enhanced identification, recovery and characterisation of inter- and intra-metabolite relationships and how these are affected by metabonomic perturbation.
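The driver-peak correlation and the round-by-round expansion described above can be illustrated with a small sketch. It assumes Pearson correlation across samples, a threshold applied to positive correlations only, and a tiny made-up intensity matrix (rows are samples, columns are spectral variables). The function names are hypothetical, and the peak condensation and interactive plotting described in the thesis are not reproduced.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def stocsy(spectra, driver):
    """Correlate the driver variable with every spectral variable."""
    columns = list(zip(*spectra))
    return [pearson(columns[driver], col) for col in columns]


def iterative_stocsy(spectra, driver, threshold, rounds):
    """Follow correlations outward from the driver for up to `rounds` rounds.
    Returns one sorted list of newly reached variable indices per round."""
    columns = list(zip(*spectra))
    seen = {driver}
    frontier = [driver]
    found_per_round = []
    for _ in range(rounds):
        new_peaks = set()
        for peak in frontier:
            for j, col in enumerate(columns):
                if j not in seen and pearson(columns[peak], col) > threshold:
                    new_peaks.add(j)
        if not new_peaks:
            break  # nothing new correlates above the threshold
        seen |= new_peaks
        frontier = sorted(new_peaks)
        found_per_round.append(frontier)
    return found_per_round


# Made-up intensities: 5 samples (rows) by 4 spectral variables (columns).
toy_spectra = [
    [1, 1, 2, 5],
    [2, 3, 3, 4],
    [3, 2, 1, 3],
    [4, 5, 5, 2],
    [5, 4, 4, 1],
]
print(stocsy(toy_spectra, driver=0))
print(iterative_stocsy(toy_spectra, driver=0, threshold=0.75, rounds=3))
```

In this toy matrix, variable 2 does not pass the threshold against the driver directly but is reached in the second round through variable 1, which is the kind of indirect association the iterative scheme is meant to follow.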

    Discovery of discriminative LC-MS and 1H NMR metabolomics markers

    There is a growing trend to look for novel markers of altered phenotype that are not associated with existing biological knowledge. This exploratory approach has led to greater emphasis on generating and analyzing large amounts of data simultaneously. Discovery of metabolic markers through analysis of non-targeted, high-throughput data is a challenging, time-consuming process. Two of the most popular analytical techniques in metabolic profiling are 1H Nuclear Magnetic Resonance (NMR) spectroscopy and Liquid Chromatography (LC)-Mass Spectrometry (MS). There are many challenges associated with the interpretation of these complex metabolomic datasets, and automated methods are critical for extracting biologically meaningful information from them. This work describes the development and application of several novel approaches for the analysis and interpretation of NMR and LC-MS data. A weighted, constrained least-squares algorithm which uses a linear mixture of reference standard data to model complex urine NMR spectra is discussed. This method was evaluated through applications on simulated and experimental datasets. The evaluation of this method suggests that the weighted least-squares approach is effective for identifying biochemical discriminators of varying physiological states. Next, a method for clustering MS instrumental artifacts and a stochastic local search algorithm for the automated assignment of large, complex MS-based metabolomic datasets are presented. Instrumental clusters, peaks grouped together by shared peak shape in the temporal domain, serve as a guide for the number of assignments necessary to completely explain a given dataset. Mass-only assignments are then refined through the intersection of peak correlation pairs with a database of biochemically relevant interaction pairs. Further refinement is achieved through a stochastic local search optimization algorithm that selects individual assignments for each instrumental cluster. 
The algorithm works by choosing the peak assignment that maximally explains the connectivity of a given cluster. The findings indicate that this methodology provides a significant advantage over standard methods for the assignment of metabolites in an LC-MS dataset. Finally, a multi-platform (NMR, LC-MS, microarray) investigation of metabolic disturbances associated with the leptin receptor defective (db/db) mouse model of type 2 diabetes using the developed methodologies is described. Several urinary metabolites were found to be associated with diabetes and/or diabetes progression and confirmed in both NMR and LC-MS datasets. The confirmed metabolites were trimethylamine-N-oxide (TMAO), creatine, carnitine, and phenylalanine. Additionally, many metabolic markers were found by either NMR or LC-MS, but could not be found in both, due to instrumental limitations. This indicates that the combined use of NMR and LC-MS instrumentation provides complementary information that would be otherwise unattainable. Pathway analyses of urinary metabolites and liver, muscle, and adipose tissue transcripts from the db/db model were also performed. Metabolite and liver transcript levels associated with the TCA cycle and steroid processes were altered in db/db mice, as was gene expression in muscle and liver associated with fatty acid processing. The findings implicate a number of processes known to be associated with diabetes and reveal tissue-specific responses to the condition. When studying metabolic disorders such as diabetes, platform-integrated profiling of metabolite alterations in biofluids can provide important insight into the processes underlying the disease. Ph.D., Biomedical Science -- Drexel University, 200
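The abstract does not say which constraint or solver the weighted, constrained least-squares method uses. As a rough sketch only, assuming a non-negativity constraint on the mixture coefficients and a plain projected gradient solver, fitting a spectrum as a linear mixture of reference spectra could look like this. The function name, the reference vectors, and the weights are all hypothetical toy values.

```python
def weighted_nnls(references, spectrum, weights, iterations=500):
    """Fit spectrum ~ sum_k c_k * references[k] with every c_k >= 0 by
    minimising sum_i weights[i] * residual_i**2 with projected gradient
    descent. The constraint and the solver are assumptions for illustration."""
    n_refs = len(references)
    n_points = len(spectrum)
    # Weighted Gram matrix G = R W R^T and right-hand side b = R W y.
    gram = [
        [
            sum(weights[i] * references[a][i] * references[b][i]
                for i in range(n_points))
            for b in range(n_refs)
        ]
        for a in range(n_refs)
    ]
    rhs = [
        sum(weights[i] * references[a][i] * spectrum[i] for i in range(n_points))
        for a in range(n_refs)
    ]
    # trace(G) bounds the largest eigenvalue of G, so this step size is safe.
    trace = sum(gram[a][a] for a in range(n_refs))
    if trace == 0:
        return [0.0] * n_refs
    step = 1.0 / (2.0 * trace)
    coeffs = [0.0] * n_refs
    for _ in range(iterations):
        grad = [
            2.0 * (sum(gram[a][b] * coeffs[b] for b in range(n_refs)) - rhs[a])
            for a in range(n_refs)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        coeffs = [max(0.0, c - step * g) for c, g in zip(coeffs, grad)]
    return coeffs


# Toy reference "spectra" and a mixture built as 2.0 * ref_1 + 0.5 * ref_2.
toy_references = [[1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
toy_mixture = [2.0, 0.5, 4.0, 0.5]
toy_weights = [1.0, 1.0, 1.0, 0.5]  # down-weight the last spectral point
print(weighted_nnls(toy_references, toy_mixture, toy_weights))
```

The weights let less reliable spectral regions contribute less to the fit, and the projection step keeps every coefficient non-negative so that no reference compound is assigned a negative concentration.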

    Incorporating standardised drift-tube ion mobility to enhance non-targeted assessment of the wine metabolome (LC×IM-MS)

    Liquid chromatography with drift-tube ion mobility spectrometry-mass spectrometry (LC×IM-MS) is emerging as a powerful addition to existing LC-MS workflows for addressing a diverse range of metabolomics-related questions [1,2]. Importantly, excellent precision under repeatability and reproducibility conditions of drift-tube IM separations [3] supports the development of non-targeted approaches for complex metabolome assessment such as wine characterisation [4]. In this work, fundamentals of this new analytical metabolomics approach are introduced and application to the analysis of 90 authentic red and white wine samples originating from Macedonia is presented. Following measurements, intersample alignment of metabolites using non-targeted extraction and three-dimensional alignment of molecular features (retention time, collision cross section, and high-resolution mass spectra) provides confidence for metabolite identity confirmation. Applying a fingerprinting metabolomics workflow allows statistical assessment of the influence of geographic region, variety, and age. This approach is a state-of-the-art tool to assess wine chemodiversity and is particularly beneficial for the discovery of wine biomarkers and establishing product authenticity based on development of fingerprint libraries.

    Dolphin and whale: development, evaluation and application of novel bioinformatics tools for metabolite profiling in high throughput 1H-NMR analysis

    Metabolite profiling is the most challenging task in NMR spectral analysis. It aims to understand the biological processes occurring at a given moment by identifying and quantifying the metabolites present in complex NMR mixtures. An NMR spectrum is composed of resonances from a very large number of metabolites, and these resonances often overlap, shift position depending on the sample pH, and can be masked by macromolecule signals. All these drawbacks hinder metabolite identification and quantification, so obtaining a curated metabolite profile of a sample can be a major challenge even for expert users. In this context, the motivation of this thesis was to provide automation and user-friendly interactive functions for NMR metabolite profiling, improving the quality of the results and reducing analysis time. To do so, several algorithms were implemented and embedded into two software packages, Dolphin and Whale.