19 research outputs found

    An effective disease risk indicator tool

    Each mixture of deficient molecular families of a specific disease induces the disease at a different time frame in the future. Based on this, we propose a novel methodology for personalizing a person’s level of future susceptibility to a specific disease by inferring the mixture of his/her molecular families whose combined deficiencies are likely to induce the disease. We implemented the methodology in a working system called DRIT, which consists of the following components: logic inferencer, information extractor, risk indicator, and interrelationship between molecular families modeler. The information extractor takes advantage of the exponential increase of biomedical literature to extract the common biomarkers that test positive among most patients with a specific disease. The interrelationship between molecular families modeler models the hierarchical interrelationships between the molecular families whose biomarkers were extracted by the information extractor. The logic inferencer transforms these hierarchical interrelationships into rule-based specifications. It then employs the specification rules and the inference rules of predicate logic to infer as many probable deficient molecular families as possible for a person, based on the few molecular families whose biomarkers tested positive in medical screening. The risk indicator outputs a risk indicator value that reflects a person’s level of future susceptibility to the disease. We evaluated DRIT by comparing it experimentally with a comparable method. Results revealed marked improvement.

    Semi-automated literature mining to identify putative biomarkers of disease from multiple biofluids

    Background: Computational methods for mining of biomedical literature can be useful in augmenting manual searches of the literature using keywords for disease-specific biomarker discovery from biofluids. In this work, we develop and apply a semi-automated literature mining method to mine abstracts obtained from PubMed to discover putative biomarkers of breast and lung cancers in specific biofluids. Methodology: A positive set of abstracts was defined by the terms 'breast cancer' and 'lung cancer' in conjunction with 14 separate 'biofluids' (bile, blood, breastmilk, cerebrospinal fluid, mucus, plasma, saliva, semen, serum, synovial fluid, stool, sweat, tears, and urine), while a negative set of abstracts was defined by the terms '(biofluid) NOT breast cancer' or '(biofluid) NOT lung cancer.' More than 5.3 million total abstracts were obtained from PubMed and examined for biomarker-disease-biofluid associations (34,296 positive and 2,653,396 negative for breast cancer; 28,355 positive and 2,595,034 negative for lung cancer). Biological entities such as genes and proteins were tagged using ABNER, and processed using Python scripts to produce a list of putative biomarkers. Z-scores were calculated, ranked, and used to determine significance of putative biomarkers found. Manual verification of relevant abstracts was performed to assess our method's performance. Results: Biofluid-specific markers were identified from the literature, assigned relevance scores based on frequency of occurrence, and validated using known biomarker lists and/or databases for lung and breast cancer [NCBI's On-line Mendelian Inheritance in Man (OMIM), Cancer Gene annotation server for cancer genomics (CAGE), NCBI's Genes & Disease, NCI's Early Detection Research Network (EDRN), and others]. The specificity of each marker for a given biofluid was calculated, and the performance of our semi-automated literature mining method assessed for breast and lung cancer. 
Conclusions: We developed a semi-automated process for determining a list of putative biomarkers for breast and lung cancer. New knowledge is presented in the form of biomarker lists; ranked, newly discovered biomarker-disease-biofluid relationships; and biomarker specificity across biofluids
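
The abstract above does not say which Z-score formulation was used. As a minimal sketch, assuming a two-proportion z-test that compares how often a tagged entity appears in the positive abstract set versus the negative set, the ranking step could look like the following. The entity names and per-entity counts are hypothetical placeholders; only the two set sizes (34,296 positive and 2,653,396 negative abstracts for breast cancer) come from the abstract.

```python
import math


def two_proportion_z(pos_hits, pos_total, neg_hits, neg_total):
    """Z-score for the difference between two mention rates.

    Assumption: a pooled two-proportion z-test. The paper's exact
    formula is not given in the abstract.
    """
    p_pos = pos_hits / pos_total
    p_neg = neg_hits / neg_total
    pooled = (pos_hits + neg_hits) / (pos_total + neg_total)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / pos_total + 1 / neg_total))
    # With no mentions at all the standard error is zero; report no signal.
    return (p_pos - p_neg) / std_err if std_err > 0 else 0.0


def rank_entities(counts, pos_total, neg_total):
    """Return (entity, z) pairs sorted from highest to lowest z-score."""
    scored = [
        (name, two_proportion_z(pos_hits, pos_total, neg_hits, neg_total))
        for name, (pos_hits, neg_hits) in counts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Set sizes reported in the abstract for breast cancer.
POS_TOTAL = 34_296
NEG_TOTAL = 2_653_396

# Hypothetical counts: entity -> (mentions in positive set, mentions in negative set).
hypothetical_counts = {
    "ENTITY_A": (900, 4000),
    "ENTITY_B": (50, 4000),
    "ENTITY_C": (10, 800),
}

ranking = rank_entities(hypothetical_counts, POS_TOTAL, NEG_TOTAL)
for name, z in ranking:
    print(f"{name}\t{z:.2f}")
```

An entity that is over-represented in the positive set relative to the negative set receives a large positive score, which is the property a frequency-based relevance ranking needs.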

    A text mining based approach for biomarker discovery

    Master's dissertation in Bioinformatics. Biomarkers have long been heralded as potential motivators for the emergence of new treatment and diagnostic procedures for disease conditions. However, for many years, biomarker discovery could only be achieved through experimental means, which deterred wider adoption because the usually large number of candidates made discovery costly and time-consuming. The increase in computational capabilities has led to a change in the paradigm of biomarker discovery, migrating from the clinical laboratory to in silico environments. Furthermore, text mining, the act of automatically extracting information from text through computational means, has seen a rise in popularity in the biomedical fields. The number of studies and clinical trials in these fields has greatly increased in the past years, making the task of manually examining and annotating them, at the very least, incredibly cumbersome. In addition, even though the development of efficient and thorough natural language processing is still ongoing, the potential for discovering commonly reported and hidden behaviours in the scientific literature is too high to be ignored. Several tools, technologies, pipelines and frameworks already exist that can at least give a glimpse of how analysing the available body of scientific literature can pave the way for novel medical techniques that might help in the prevention, diagnosis and treatment of diseases.
As such, this work presents a novel approach to biomarker discovery, one that integrates both gene-disease associations extracted from current biomedical literature and RNA-Seq gene expression data in an L1-regularized mixed-integer linear programming (MILP) model for identifying potential biomarkers, potentially providing an optimal and robust genetic signature for disease diagnosis and helping identify novel biomarker candidates. This analysis was carried out on five publicly available RNA-Seq datasets obtained from the Genomic Data Commons (GDC) Data Portal, related to breast, colon, lung and prostate cancer, and head and neck squamous cell carcinoma. Hyperparameter optimization was also performed for this approach, and the performance of the optimal set of parameters was compared against other machine learning methods.

    Personalizing the prediction of future susceptibility to a specific disease

    A traceable biomarker is a member of a disease’s molecular pathway. A disease may be associated with several molecular pathways. Each different combination of these molecular pathways, to which detected traceable biomarkers belong, may serve as an indicator of the elicitation of the disease at a different time frame in the future. Based on this notion, we introduce a novel methodology for personalizing an individual’s degree of future susceptibility to a specific disease. We implemented the methodology in a working system called Susceptibility Degree to a Disease Predictor (SDDP). For a specific disease d, let S be the set of molecular pathways to which traceable biomarkers detected from most patients of d belong. For the same disease d, let S′ be the set of molecular pathways to which traceable biomarkers detected from a certain individual belong. SDDP is able to infer the subset S′′ ⊆ {S − S′} of undetected molecular pathways for the individual. Thus, SDDP can infer undetected molecular pathways of a disease for an individual based on the few molecular pathways detected from the individual. SDDP can also help in inferring the combination of molecular pathways in the set {S′ + S′′} whose traceable biomarkers are collectively indicative of the disease. SDDP is composed of the following four components: information extractor, interrelationship between molecular pathways modeler, logic inferencer, and risk indicator. The information extractor takes advantage of the exponential increase of biomedical literature to automatically extract the common traceable biomarkers for a specific disease. The interrelationship between molecular pathways modeler models the hierarchical interrelationships between the molecular pathways of the traceable biomarkers. The logic inferencer transforms the hierarchical interrelationships between the molecular pathways into rule-based specifications.
It employs the specification rules and the inference rules of predicate logic to infer as many undetected molecular pathways of a disease as possible for an individual. The risk indicator outputs a risk indicator value that reflects the individual’s degree of future susceptibility to the disease. We evaluated SDDP by comparing it experimentally with other methods. Results revealed marked improvement.
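
The set relationship S′′ ⊆ S − S′ described above can be illustrated with a small forward-chaining loop. This is a sketch under stated assumptions, not SDDP's actual rule engine: the pathway names `P1`..`P5` and the rules are hypothetical, and each rule is simplified to "if all premise pathways are present, the conclusion pathway is present".

```python
def infer_undetected(all_pathways, detected, rules):
    """Forward-chain over simple rules to infer undetected pathways.

    all_pathways: the set S for the disease.
    detected:     the set S' observed for one individual.
    rules:        list of (premises, conclusion) pairs; hypothetical
                  stand-ins for the rule-based specifications.
    Returns S'', which is always a subset of S - S'.
    """
    known = set(detected) & set(all_pathways)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if (
                conclusion in all_pathways
                and conclusion not in known
                and set(premises) <= known
            ):
                known.add(conclusion)
                changed = True
    return known - set(detected)


# Hypothetical example data.
S = {"P1", "P2", "P3", "P4", "P5"}
S_detected = {"P1", "P2"}
rules = [
    ({"P1"}, "P3"),        # P1 implies P3
    ({"P2", "P3"}, "P4"),  # P2 and P3 together imply P4
    ({"P5"}, "P1"),        # never fires: P5 is not detected
]

S_inferred = infer_undetected(S, S_detected, rules)
print(sorted(S_inferred))               # pathways inferred for the individual
print(sorted(S_detected | S_inferred))  # the combined set S' + S''
```

The loop repeats until no rule adds a new pathway, so a conclusion reached in one pass (here `P3`) can serve as a premise in the next (yielding `P4`).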

    Clustering of Alzheimer's and Parkinson's disease based on genetic burden of shared molecular mechanisms

    One of the visions of precision medicine has been to re-define disease taxonomies based on molecular characteristics rather than on phenotypic evidence. However, achieving this goal is highly challenging, specifically in neurology. Our contribution is a machine-learning based joint molecular subtyping of Alzheimer’s (AD) and Parkinson’s Disease (PD), based on the genetic burden of 15 molecular mechanisms comprising 27 proteins (e.g. APOE) that have been described in both diseases. We demonstrate that our joint AD/PD clustering using a combination of sparse autoencoders and sparse non-negative matrix factorization is reproducible and can be associated with significant differences of AD and PD patient subgroups on a clinical, pathophysiological and molecular level. Hence, clusters are disease-associated. To our knowledge this work is the first demonstration of a mechanism based stratification in the field of neurodegenerative diseases. Overall, we thus see this work as an important step towards a molecular mechanism-based taxonomy of neurological disorders, which could help in developing better targeted therapies in the future by going beyond classical phenotype based disease definitions

    A Knowledge-based Integrative Modeling Approach for In-Silico Identification of Mechanistic Targets in Neurodegeneration with Focus on Alzheimer’s Disease

    Dementia is the progressive decline in cognitive function due to damage or disease in the body beyond what might be expected from normal aging. Based on neuropathological and clinical criteria, dementia includes a spectrum of diseases, namely Alzheimer's dementia, Parkinson's dementia, Lewy Body disease, Alzheimer's dementia with Parkinson's, Pick's disease, Semantic dementia, and large and small vessel disease. It is thought that these disorders result from a combination of genetic and environmental risk factors. Despite accumulating knowledge that has been gained about pathophysiological and clinical characteristics of the disease, no coherent and integrative picture of molecular mechanisms underlying neurodegeneration in Alzheimer’s disease is available. Existing drugs only offer symptomatic relief to the patients and lack any efficient disease-modifying effects. The present research proposes a knowledge-based rationale towards integrative modeling of disease mechanism for identifying potential candidate targets and biomarkers in Alzheimer’s disease. Integrative disease modeling is an emerging knowledge-based paradigm in translational research that exploits the power of computational methods to collect, store, integrate, model and interpret accumulated disease information across different biological scales from molecules to phenotypes. It prepares the ground for transitioning from ‘descriptive’ to “mechanistic” representation of disease processes. The proposed approach was used to introduce an integrative framework, which integrates, on one hand, extracted knowledge from the literature using semantically supported text-mining technologies and, on the other hand, primary experimental data such as gene/protein expression or imaging readouts. 
The aim of such a hybrid integrative modeling approach was not only to provide a consolidated systems view on the disease mechanism as a whole but also to increase specificity and sensitivity of the mechanistic model by providing disease-specific context. This approach was successfully used for correlating clinical manifestations of the disease to their corresponding molecular events and led to the identification and modeling of three important mechanistic components underlying Alzheimer’s dementia, namely the CNS, the immune system and the endocrine components. These models were validated using a novel in-silico validation method, namely biomarker-guided pathway analysis and a pathway-based target identification approach was introduced, which resulted in the identification of the MAPK signaling pathway as a potential candidate target at the crossroad of the triad components underlying disease mechanism in Alzheimer’s dementia

    Identification of potential serum protein biomarkers for recurrence in gastric cancer patients using a quantitative multiple reaction monitoring approach

    Doctoral thesis (Ph.D.), Seoul National University, Graduate School of Convergence Science and Technology, Department of Molecular Medicine and Biopharmaceutical Sciences, August 2017. 이유진. Gastric cancer (GC) is one of the most common cancers and the second leading cause of cancer-related mortality. Despite improvements in clinical therapies for GC, the recurrence rate remains high (~55%) in patients with advanced-stage disease. It is therefore essential to understand GC recurrence mechanisms, which would support effective clinical application in GC diagnosis and prognosis. Here, we aimed to identify potential serum biomarkers for recurrence in gastric cancer with an established quantitative multiple reaction monitoring (MRM) approach using GC patient serum samples. To build a serum biomarker development platform, we first generated serum biomarker candidates through a comprehensive proteomic approach. By employing both preliminary MRM and automated detection of inaccurate and imprecise transitions (AuDIT) analysis with stable isotope–labeled internal standard (SIS) peptides using pooled GC patient serum samples, we established a quantitative MRM analysis of 94 proteins as the final MRM target proteins. To establish the multi-biomarker panel for identification of GC recurrence, we conducted the quantitative MRM analysis as a training set with 180 individual patients divided into two groups, a response group (n=133) and a recurrence group (n=47), all of whom received chemotherapy after D2 lymph node dissection. Through stringent statistical analysis of the quantitative MRM data from the individual training-set samples, a 6-marker panel was constructed, consisting of alpha-1-antichymotrypsin (SERPINA3), apolipoprotein A-II (APOA2), apolipoprotein C-I (APOC1), clusterin (CLU), inter-alpha-trypsin inhibitor heavy chain H4 (ITIH4), and leucine-rich alpha-2-glycoprotein (LRG1).
These proteins were differentially expressed (p-value < 0.05) between the two groups, with an area under the curve (AUC) value of 0.810 and high prediction rates in both groups (95.5% and 61.7% in the response and recurrence groups, respectively). To verify the 6-marker panel, we further applied MRM analysis to independent patient samples (n=64), comprising a response group (n=43) and a recurrence group (n=21), as a test set. We demonstrated that the 6 marker proteins showed expression patterns correlated with those in the training set, with statistical significance (p-value < 0.05). We propose that these proteins can serve as diagnostic signatures to identify recurrence in GC patients, and that our quantitative MRM assay-based serum biomarker development platform could serve as a valuable tool in the clinical biomarker discovery-verification process.
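
For readers unfamiliar with the AUC figure quoted above, the following minimal sketch shows how an AUC can be computed from panel scores using the rank-based (Mann-Whitney) definition: the probability that a randomly chosen recurrence sample scores higher than a randomly chosen response sample. The scores below are invented for illustration and are not data from the thesis; how the thesis combined the six markers into a single score is not stated in the abstract.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive sample scores higher; ties count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical panel scores (higher = more likely to recur).
recurrence_scores = [0.9, 0.7, 0.6, 0.4]
response_scores = [0.8, 0.5, 0.3, 0.2, 0.1]

value = auc(recurrence_scores, response_scores)
print(f"AUC = {value:.3f}")
```

An AUC of 0.5 corresponds to a panel that cannot separate the two groups, and 1.0 to perfect separation.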