Search CORE

115 research outputs found

Extracting biomedical relations from biomedical literature

Author: Maldonado Tânia Sofia Guerreiro
Publication venue
Publication date: 01/01/2018
Field of study

Tese de mestrado em Bioinformática e Biologia Computacional, apresentada à Universidade de Lisboa, através da Faculdade de Ciências, em 2018A ciência, e em especial o ramo biomédico, testemunham hoje um crescimento de conhecimento a uma taxa que clínicos, cientistas e investigadores têm dificuldade em acompanhar. Factos científicos espalhados por diferentes tipos de publicações, a riqueza de menções etiológicas, mecanismos moleculares, pontos anatómicos e outras terminologias biomédicas que não se encontram uniformes ao longo das várias publicações, para além de outros constrangimentos, encorajaram a aplicação de métodos de text mining ao processo de revisão sistemática. Este trabalho pretende testar o impacto positivo que as ferramentas de text mining juntamente com vocabulários controlados (enquanto forma de organização de conhecimento, para auxílio num posterior momento de recolha de informação) têm no processo de revisão sistemática, através de um sistema capaz de criar um modelo de classificação cujo treino é baseado num vocabulário controlado (MeSH), que pode ser aplicado a uma panóplia de literatura biomédica. Para esse propósito, este projeto divide-se em duas tarefas distintas: a criação de um sistema, constituído por uma ferramenta que pesquisa a base de dados PubMed por artigos científicos e os grava de acordo com etiquetas pré-definidas, e outra ferramenta que classifica um conjunto de artigos; e a análise dos resultados obtidos pelo sistema criado, quando aplicado a dois casos práticos diferentes. O sistema foi avaliado através de uma série de testes, com recurso a datasets cuja classificação era conhecida, permitindo a confirmação dos resultados obtidos. Posteriormente, o sistema foi testado com recurso a dois datasets independentes, manualmente curados por investigadores cuja área de investigação se relaciona com os dados. Esta forma de avaliação atingiu, por exemplo, resultados de precisão cujos valores oscilam entre os 68% e os 81%. Os resultados obtidos dão ênfase ao uso das tecnologias e ferramentas de text mining em conjunto com vocabulários controlados, como é o caso do MeSH, como forma de criação de pesquisas mais complexas e dinâmicas que permitam melhorar os resultados de problemas de classificação, como são aqueles que este trabalho retrata.Science, and the biomedical field especially, is witnessing a growth in knowledge at a rate at which clinicians and researchers struggle to keep up with. Scientific evidence spread across multiple types of scientific publications, the richness of mentions of etiology, molecular mechanisms, anatomical sites, as well as other biomedical terminology that is not uniform across different writings, among other constraints, have encouraged the application of text mining methods in the systematic reviewing process. This work aims to test the positive impact that text mining tools together with controlled vocabularies (as a way of organizing knowledge to aid, at a later time, to collect information) have on the systematic reviewing process, through a system capable of creating a classification model which training is based on a controlled vocabulary (MeSH) that can be applied to a variety of biomedical literature. For that purpose, this project was divided into two distinct tasks: the creation a system, consisting of a tool that searches the PubMed search engine for scientific articles and saves them according to pre-defined labels, and another tool that classifies a set of articles; and the analysis of the results obtained by the created system when applied to two different practical cases. The system was evaluated through a series of tests, using datasets whose classification results were previously known, allowing the confirmation of the obtained results. Afterwards, the system was tested by using two independently-created datasets which were manually curated by researchers working in the field of study. This last form of evaluation achieved, for example, precision scores as low as 68%, and as high as 81%. The results obtained emphasize the use of text mining tools, along with controlled vocabularies, such as MeSH, as a way to create more complex and comprehensive queries to improve the performance scores of classification problems, with which the theme of this work relates

Universidade de Lisboa: Repositório.UL

Novel Extensions of Label Propagation for Biomarker Discovery in Genomic Data

Author: Stokes Matthew
Publication venue
Publication date: 25/09/2014
Field of study

One primary goal of analyzing genomic data is the identification of biomarkers which may be causative of, correlated with, or otherwise biologically relevant to disease phenotypes. In this work, I implement and extend a multivariate feature ranking algorithm called label propagation (LP) for biomarker discovery in genome-wide single-nucleotide polymorphism (SNP) data. This graph-based algorithm utilizes an iterative propagation method to efficiently compute the strength of association between a SNP and a phenotype. I developed three extensions to the LP algorithm, with the goal of tailoring it to genomic data. The first extension is a modification to the LP score which yields a variable-level score for each SNP, rather than a score for each SNP genotype. The second extension incorporates prior biological knowledge that is encoded as a prior value for each SNP. The third extension enables the combination of rankings produced by LP and another feature ranking algorithm. The LP algorithm, its extensions, and two control algorithms (chi squared and sparse logistic regression) were applied to 11 genomic datasets, including a synthetic dataset, a semi-synthetic dataset, and nine genome-wide association study (GWAS) datasets covering eight diseases. The quality of each feature ranking algorithm was evaluated by using a subset of top-ranked SNPs to construct a classifier, whose predictive power was evaluated in terms of the area under the Receiver Operating Characteristic curve. Top-ranked SNPs were also evaluated for prior evidence of being associated with disease using evidence from the literature. The LP algorithm was found to be effective at identifying predictive and biologically meaningful SNPs. The single-score extension performed significantly better than the original algorithm on the GWAS datasets. The prior knowledge extension did not improve on the feature ranking results, and in some cases it reduced the predictive power of top-ranked variants. The ranking combination method was effective for some pairs of algorithms, but not for others. Overall, this work’s main results are the formulation and evaluation of several algorithmic extensions of LP for use in the analysis of genomic data, as well as the identification of several disease-associated SNPs

D-Scholarship@Pitt

The Future of Data Analysis in the Neurosciences

Author: Bzdok Danilo
Yeo B.T. Thomas
Publication venue: HAL CCSD
Publication date: 05/08/2016
Field of study

Neuroscience is undergoing faster changes than ever before. Over 100 years our field qualitatively described and invasively manipulated single or few organisms to gain anatomical, physiological, and pharmacological insights. In the last 10 years neuroscience spawned quantitative big-sample datasets on microanatomy, synaptic connections, optogenetic brain-behavior assays, and high-level cognition. While growing data availability and information granularity have been amply discussed, we direct attention to a routinely neglected question: How will the unprecedented data richness shape data analysis practices? Statistical reasoning is becoming more central to distill neurobiological knowledge from healthy and pathological brain recordings. We believe that large-scale data analysis will use more models that are non-parametric, generative, mixing frequentist and Bayesian aspects, and grounded in different statistical inferences

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-CEA

Practical Strategies for Extreme Missing Data Imputation in Dementia Diagnosis

Author: Bucholc Magda
Ding Xuemei
Finn David P.
Liu Shuo
McClean Paula L.
McCombe Niamh
Prasad Girijesh
Todd Stephen
Wong-Lin KongFatt
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/07/2021
Field of study

Ulster University's Research Portal

A practical computerized decision support system for predicting the severity of Alzheimer's disease of an individual

Author: Bjourson Anthony
Bucholc Magda
Ding Xuemei
Finn David
Glass David H.
Maguire Liam
McClean Paula
Prasad Girijesh
Todd Stephen
Wang H.
Wang Haiying / HY
Wong-Lin KongFatt
Publication venue: 'Elsevier BV'
Publication date: 15/09/2019
Field of study

Ulster University's Research Portal

On quantitative issues pertaining to the detection of epistatic genetic architectures

Author: Woodward Carleigh
Publication venue
Publication date: 25/01/2021
Field of study

Converging empirical evidence portrays epistasis (i.e., gene-gene interaction) as a ubiquitous property of genetic architectures and protagonist in complex trait variability. While researchers employ sophisticated technologies to detect epistasis, the scarcity of robust instances of detection in human populations is striking. To evaluate the empirical issues pertaining to epistatic detection, we analytically characterize the statistical detection problem and elucidate two candidate explanations. The first examines whether population-level manifestations of epistasis arising in nature are small; consequently, for sample-sizes employed in research, the power delivered by detectors may be disadvantageously small. The second considers whether gene-environmental association generates bias in estimates of genotypic values diminishing the power of detection. By simulation study, we adjudicate the merits of both explanations and the power to detect epistasis under four digenic architectures. In agreement with both explanations, our findings implicate small epistatic effect-sizes and gene-environmental association as mechanisms that obscure the detection of epistasis

Simon Fraser University Institutional Repository

Development of machine learning based computational tools for stratified healthcare and personalised medicine

Author: Prasad Bodhayan
Publication venue
Publication date: 01/06/2023
Field of study

Ulster University's Research Portal

Artificial intelligence for dementia research methods optimization

Author: Badhwar AmanPreet
Bucholc Magda
Clarke Natasha
Dehsarvi Amir
James Charlotte
Khleifat Ahmad Al
Llewellyn David J.
Lourida Ilianna
Madan Christopher R.
Marzi Sarah J.
Ranson Janice M.
Schilder Brian M.
Shand Cameron
Tamburin Stefano
Tantiangco Hanz M.
Publication venue
Publication date: 01/01/2023
Field of study

Artificial intelligence (AI) and machine learning (ML) approaches are increasingly being used in dementia research. However, several methodological challenges exist that may limit the insights we can obtain from high-dimensional data and our ability to translate these findings into improved patient outcomes. To improve reproducibility and replicability, researchers should make their well-documented code and modeling pipelines openly available. Data should also be shared where appropriate. To enhance the acceptability of models and AI-enabled systems to users, researchers should prioritize interpretable methods that provide insights into how decisions are generated. Models should be developed using multiple, diverse datasets to improve robustness, generalizability, and reduce potentially harmful bias. To improve clarity and reproducibility, researchers should adhere to reporting guidelines that are co-produced with multiple stakeholders. If these methodological challenges are overcome, AI and ML hold enormous promise for changing the landscape of dementia research and care. HIGHLIGHTS: Machine learning (ML) can improve diagnosis, prevention, and management of dementia. Inadequate reporting of ML procedures affects reproduction/replication of results. ML models built on unrepresentative datasets do not generalize to new datasets. Obligatory metrics for certain model structures and use cases have not been defined. Interpretability and trust in ML predictions are barriers to clinical translation

Catalogo dei prodotti della ricerca

Ulster University's Research Portal

King's Research Portal

Recommended from our members

Using molecular QTLs to identify cell types and causal variants for complex traits

Author: Schwartzentruber Jeremy Andrew
Publication venue: University of Cambridge
Publication date: 29/01/2018
Field of study

Genetic associations have been discovered for many human complex traits, and yet for most associated loci the causal variants and molecular mechanisms remain unknown. Studies mapping quantitative trait loci (QTLs) for molecular phenotypes, such as gene expression, RNA splicing, and chromatin accessibility, provide rich data that can link variant effects in specific cell types with complex traits. These genetic effects can also now be modeled in vitro by differentiating human induced pluripotent stem cells (iPSCs) into specific cell types, including inaccessible cell types such as those of the brain. In this thesis, I explore a range of approaches for using QTLs to identify causal variants and to link these with molecular functions and complex traits. In Chapter 2, I describe QTL mapping in 123 sensory neuronal cell lines differentiated from human iPSCs. I observed that gene expression was highly variable across iPSC-derived neuronal cultures in specific gene categories, and that a portion of this variability was explained by commonly used iPSC culture conditions, which influenced differentiation efficiency. A number of QTLs overlapped with common disease associations; however, using simulations I showed that identifying causal regulatory variants with a recall-by- genotype approach in iPSC-derived neurons is likely to require large sample sizes, even for variants with moderately large effect sizes. In Chapter 3, I developed a computational model that uses publicly available gene expression QTL data, along with molecular annotations, to generate cell type-specific probability of regulatory function (PRF) scores for each variant. I found that predictive power was improved when the model was modified to use the quantitative value of annotations. PRF scores outperformed other genome-wide scores, including CADD and GWAVA, in identifying likely causal eQTL variants. In Chapter 4, I used PRF scores to identify relevant cell types and to fine map potential causal variants using summary association statistics in six complex traits. By examining individual loci in detail, I showed how the enrichments contributing to a high PRF score are transparent, which can help to distinguish plausible causal variant predictions from model misspecification.Wellcome Trust Sanger Institut

Apollo (Cambridge)