Random walks on mutual microRNA-target gene interaction network improve the prediction of disease-associated microRNAs
Background: MicroRNAs (miRNAs) have been shown to play an important role in pathological initiation, progression and maintenance. Because identifying disease-related miRNAs in the laboratory is not straightforward, numerous network-based methods have been developed to predict novel disease-related miRNAs in silico. Homogeneous networks (in which every node is a miRNA) based on the targets shared between miRNAs have been widely used to predict their role in disease phenotypes. Although such homogeneous networks can predict potential disease-associated miRNAs, they do not consider the roles of the miRNAs' target genes. Here, we introduce a novel method based on a heterogeneous network that considers not only miRNAs but also their target genes. Results: Instead of constructing homogeneous miRNA networks, we built heterogeneous miRNA networks consisting of both miRNAs and their target genes, using databases of known miRNA-target gene interactions. In addition, as recent studies have demonstrated reciprocal regulatory relations between miRNAs and their target genes, we considered these heterogeneous miRNA networks to be undirected, assuming mutual miRNA-target interactions. Next, we introduced a novel method (RWRMTN) operating on these mutual heterogeneous miRNA networks to rank candidate disease-related miRNAs using an algorithm based on random walk with restart (RWR). Using both known disease-associated miRNAs and their target genes as seed nodes, the method can identify additional miRNAs involved in the disease phenotype. Experiments indicated that RWRMTN outperformed two existing state-of-the-art methods: RWRMDA, a network-based method that also uses an RWR on homogeneous (rather than heterogeneous) miRNA networks, and RLSMDA, a machine learning-based method. Interestingly, we could relate this performance gain to the emergence of "disease modules" in the heterogeneous miRNA networks used as input for the algorithm.
Moreover, we could demonstrate that RWRMTN is stable, performing well with both experimentally validated and predicted miRNA-target gene interaction data for network construction. Finally, using RWRMTN, we identified 76 novel miRNAs associated with 23 disease phenotypes that were present in a recent database of known disease-miRNA associations. Conclusions: In summary, random walks on mutual miRNA-target networks improve the prediction of novel disease-associated miRNAs because of the existence of "disease modules" in these networks.
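The ranking step can be illustrated with a minimal, hypothetical sketch of a random walk with restart on an undirected (mutual) miRNA-target graph. This is not the published RWRMTN code: the restart probability of 0.7, the convergence tolerance and the toy node names are assumptions chosen for illustration. The walker restarts at the seed set (known disease miRNAs plus their target genes), and candidate miRNAs are ranked by their steady-state visiting probability.

```python
from collections import defaultdict


def rwr(edges, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart on an undirected graph.

    edges   -- iterable of (u, v) pairs, treated as mutual (undirected) links
    seeds   -- restart nodes, e.g. known disease miRNAs and their target genes
    restart -- probability of jumping back to the seed set at each step
               (0.7 is an illustrative assumption, not RWRMTN's setting)
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = sorted(neighbors)
    seeds = {s for s in seeds if s in neighbors}  # ignore seeds absent from the graph
    if not seeds:
        raise ValueError("at least one seed must be present in the network")
    # Restart vector p0: uniform probability over the seed nodes.
    p0 = {n: 0.0 for n in nodes}
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = dict(p0)
    for _ in range(max_iter):
        # One step of the walk: each node passes its probability in equal
        # shares to its neighbours (column-normalised adjacency matrix).
        spread = {n: 0.0 for n in nodes}
        for n in nodes:
            share = p[n] / len(neighbors[n])
            for m in neighbors[n]:
                spread[m] += share
        p_next = {n: (1 - restart) * spread[n] + restart * p0[n] for n in nodes}
        converged = sum(abs(p_next[n] - p[n]) for n in nodes) < tol
        p = p_next
        if converged:
            break
    return p


# Toy heterogeneous network (hypothetical names): miRNAs and target genes.
edges = [("miR-A", "G1"), ("miR-A", "G2"), ("miR-B", "G2"),
         ("miR-B", "G3"), ("miR-C", "G4")]
scores = rwr(edges, seeds={"miR-A", "G1", "G2"})
candidates = sorted((n for n in scores if n.startswith("miR") and n != "miR-A"),
                    key=scores.get, reverse=True)
print(candidates)
```

In this toy example miR-B, which shares target G2 with the seed miRNA, receives a positive score, while miR-C sits in a separate component and keeps a score of zero.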
Prediction of miRNA-disease associations with a vector space model
MicroRNAs play critical roles in many physiological processes. Their
dysregulation is also closely related to the development and progression of
various human diseases, including cancer. Therefore, identifying new microRNAs
that are associated with diseases contributes to a better understanding of
pathogenicity mechanisms. MicroRNAs also represent a tremendous opportunity in
biotechnology for early diagnosis. To date, several in silico methods have been
developed to address the issue of microRNA-disease association prediction.
However, these methods have various limitations. In this study, we investigate
the hypothesis that information attached to miRNAs and diseases can be revealed
by distributional semantics. Our basic approach is to represent distributional
information on miRNAs and diseases in a high-dimensional vector space and to
define associations between miRNAs and diseases in terms of their vector
similarity. Cross-validation on a dataset of known miRNA-disease
associations demonstrates the excellent performance of our method. Moreover, a
case study focused on breast cancer confirms the ability of our method to
discover new disease-miRNA associations and to identify putative false
associations reported in databases.
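As a rough illustration of the vector-space idea (not the authors' model), the sketch below represents each miRNA and disease as a count vector over shared context terms and scores associations by cosine similarity. Raw co-occurrence counts and cosine similarity are assumptions here; the abstract does not specify the weighting scheme or the similarity measure, and the context terms are invented.

```python
import math
from collections import Counter


def count_vector(context_terms, vocabulary):
    """Bag-of-words vector: how often each vocabulary term occurs in the contexts."""
    counts = Counter(context_terms)
    return [counts[term] for term in vocabulary]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_mirnas(disease_terms, mirna_terms):
    """Rank miRNAs by cosine similarity of their context vectors to the disease's."""
    vocabulary = sorted(set(disease_terms).union(*mirna_terms.values()))
    disease_vec = count_vector(disease_terms, vocabulary)
    scores = {name: cosine(disease_vec, count_vector(terms, vocabulary))
              for name, terms in mirna_terms.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical context terms (e.g. words co-occurring with each entity in abstracts).
disease = ["tumor", "metastasis", "proliferation", "tumor"]
mirnas = {
    "miR-21": ["tumor", "proliferation", "apoptosis"],
    "miR-X": ["neuron", "synapse"],
}
ranking = rank_mirnas(disease, mirnas)
print(ranking)
```

A real system would use far larger vocabularies and a term-weighting scheme, but the ranking logic stays the same: the closer the vectors, the stronger the predicted association.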
Untargeted sequencing of circulating microRNAs in a healthy and diseased older population
We performed untargeted profiling of circulating microRNAs (miRNAs) in a well-characterized cohort of older adults to verify associations of health- and disease-related biomarkers with systemic miRNA expression. Differential expression analysis revealed 30 miRNAs that differed significantly between healthy active individuals, healthy sedentary individuals and sedentary patients with cardiovascular risk. Increased expression of miR-193b-5p, miR-122-5p, miR-885-3p, miR-193a-5p, miR-34a-5p, miR-505-3p, miR-194-5p, miR-27b-3p, miR-885-5p, miR-23b-5p, miR-365a-3p, miR-365b-3p and miR-22-5p was associated with a higher metabolic risk profile, unfavourable macro- and microvascular health, and lower physical activity (PA) and cardiorespiratory fitness (CRF) levels. Increased expression of miR-342-3p, miR-1-3p, miR-92b-5p, miR-454-3p, miR-190a-5p and miR-375-3p was associated with a lower metabolic risk profile, favourable macro- and microvascular health, and higher PA and CRF. Of note, the first two principal components explained as much as 20% and 11% of the data variance, respectively. miRNAs and their potential target genes appear to mediate disease- and health-related physiological and pathophysiological adaptations, which need to be validated and supported by further downstream analyses in future studies.
Clinical Trial Registration: ClinicalTrials.gov: NCT02796976 ( https://clinicaltrials.gov/ct2/show/NCT02796976 )
A network-based approach to uncover microRNA-mediated disease comorbidities and potential pathobiological implications
Disease-disease relationships (e.g., disease comorbidities) play crucial roles in the pathobiological manifestations of diseases and in personalized approaches to managing those conditions. In this study, we develop a network-based methodology, termed the meta-path-based Disease Network (mpDisNet) capturing algorithm, to infer disease-disease relationships by assembling four biological networks: disease-miRNA, miRNA-gene, disease-gene, and the human protein-protein interactome. mpDisNet uses meta-path-based random walks to reconstruct the heterogeneous neighbors of a given node, and a heterogeneous skip-gram model to learn network representations of the nodes. We find that mpDisNet achieves high performance in inferring clinically reported disease-disease relationships, outperforming traditional gene/miRNA-overlap approaches. In addition, mpDisNet identifies network-based comorbidities for pulmonary diseases driven by underlying miRNA-mediated pathobiological pathways (i.e., hsa-let-7a- or hsa-let-7b-mediated airway epithelial apoptosis and pro-inflammatory cytokine pathways), as derived from human interactome network analysis. mpDisNet offers a powerful tool for the network-based identification of disease-disease relationships and their miRNA-mediated pathobiological pathways.
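mpDisNet's walks are guided by meta-paths, i.e. prescribed sequences of node types. The sketch below, in the style of metapath2vec, generates one walk that cycles through a symmetric meta-path such as disease-miRNA-disease; the resulting node sequences would then be fed to a heterogeneous skip-gram model (not shown). The meta-path, node names and uniform neighbour sampling are illustrative assumptions, not mpDisNet's published configuration.

```python
import random


def metapath_walk(adj, node_type, start, metapath, walk_length, rng):
    """Generate one random walk that follows a symmetric meta-path.

    adj        -- dict: node -> list of neighbouring nodes
    node_type  -- dict: node -> type label (e.g. "D" disease, "M" miRNA, "G" gene)
    metapath   -- list of type labels whose first and last entries match,
                  e.g. ["D", "M", "D"]; the walk cycles through it
    """
    if metapath[0] != metapath[-1]:
        raise ValueError("meta-path must start and end with the same node type")
    if node_type[start] != metapath[0]:
        raise ValueError("start node must have the meta-path's first type")
    pattern = metapath[:-1]  # e.g. ["D", "M"] for D-M-D
    walk, position = [start], 0
    while len(walk) < walk_length:
        position = (position + 1) % len(pattern)
        wanted = pattern[position]
        # Only neighbours of the type required by the meta-path are eligible.
        candidates = [n for n in adj[walk[-1]] if node_type[n] == wanted]
        if not candidates:
            break  # dead end: stop this walk early
        walk.append(rng.choice(candidates))
    return walk


# Hypothetical toy graph: two diseases linked through a shared miRNA.
node_type = {"D1": "D", "D2": "D", "let-7a": "M", "G1": "G"}
adj = {"D1": ["let-7a", "G1"], "D2": ["let-7a"],
       "let-7a": ["D1", "D2", "G1"], "G1": ["D1", "let-7a"]}
walk = metapath_walk(adj, node_type, "D1", ["D", "M", "D"], 5, random.Random(0))
print(walk)
```

Restricting each step to the node type demanded by the meta-path is what lets the walk encode "diseases sharing a miRNA" rather than arbitrary proximity; the gene node G1 is never visited under this meta-path.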
Network Analytics for the miRNA Regulome and miRNA-Disease Interactions
miRNAs are non-coding RNAs approximately 22 nucleotides in length that inhibit gene expression at the post-transcriptional level. Through this gene regulation mechanism, miRNAs play a critical role in several biological processes and pathophysiological conditions, including cancers. miRNA behavior results from a multi-level, complex interaction network involving miRNA-mRNA, TF-miRNA-gene, and miRNA-chemical interactions; hence, the precise patterns through which a miRNA regulates a given disease remain elusive. Herein, I developed an integrative genomics pipeline to (i) build a miRNA regulomics and data analytics repository and (ii) model these interactions as networks, applying optimization techniques, motif-based analyses, network inference strategies and influence diffusion concepts to predict miRNA regulation and its role in diseases, especially cancers. With these methods, we are able to determine the regulatory behavior of miRNAs, potential causal miRNAs in specific diseases, and potential biomarkers and targets for drug and medicinal therapeutics.
TLHNMDA: Triple Layer Heterogeneous Network Based Inference for MiRNA-Disease Association Prediction
In recent years, microRNAs (miRNAs) have been confirmed to be involved in many important biological processes and associated with various complex human diseases. Therefore, predicting potential associations between miRNAs and diseases from the large number of verified heterogeneous biological datasets will provide a new perspective for disease therapy. In this article, we developed a novel computational model, Triple Layer Heterogeneous Network based inference for MiRNA-Disease Association prediction (TLHNMDA), which integrates experimentally verified miRNA-disease associations, miRNA-long noncoding RNA (lncRNA) interactions, miRNA functional similarity information, disease semantic similarity information and Gaussian interaction profile kernel similarity for lncRNAs into a triple-layer heterogeneous network to predict new miRNA-disease associations. As a result, the AUCs of TLHNMDA are 0.8795 and 0.8795 ± 0.0010 based on leave-one-out cross validation (LOOCV) and 5-fold cross validation, respectively. Furthermore, TLHNMDA was applied to three complex human diseases to evaluate its predictive ability: 84% (kidney neoplasms), 78% (lymphoma) and 76% (prostate neoplasms) of the top 50 predicted miRNAs for these diseases could be verified by biological experiments. In addition, based on the HMDD v1.0 database, 98% of the top 50 potential esophageal neoplasm-associated miRNAs were confirmed by experimental reports. TLHNMDA is expected to be a useful model for predicting potential miRNA-disease associations with high accuracy and stability.
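The Gaussian interaction profile (GIP) kernel similarity mentioned above is commonly defined as K(l_i, l_j) = exp(-γ‖IP(l_i) − IP(l_j)‖²), where IP(l_i) is the binary vector of known miRNA interactions of lncRNA l_i and γ = γ′ / ((1/n) Σ_k ‖IP(l_k)‖²). The sketch below implements that standard definition; γ′ = 1 and the toy profiles are assumptions, since TLHNMDA's exact parameters are not given here.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile (GIP) kernel similarity.

    profiles    -- list of binary interaction profiles, one per lncRNA
                   (1 if the lncRNA interacts with a given miRNA, else 0)
    gamma_prime -- bandwidth before normalisation (1.0 is a common default)
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm  # bandwidth normalised by mean profile norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Hypothetical lncRNA x miRNA interaction profiles (rows = lncRNAs).
profiles = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
K = gip_kernel(profiles)
print(K)
```

Identical profiles get similarity 1, and similarity decays with the number of differing interactions, so lncRNAs that interact with the same miRNAs end up strongly linked in the network layer.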
Discovering lesser known molecular players and mechanistic patterns in Alzheimer's disease using an integrative disease modelling approach
Convergence of exponentially advancing technologies is driving medical research towards life-changing discoveries. By contrast, repeated failures of high-profile drugs against Alzheimer's disease (AD) have made it one of the least successful therapeutic areas. This pattern of failure has prompted researchers to grapple with their beliefs about Alzheimer's aetiology. The growing realisation that amyloid-β and tau are not 'the' but rather 'one of the' factors necessitates a reassessment of pre-existing data to add new perspectives. To enable a holistic view of the disease, integrative modelling approaches are emerging as a powerful technique. Combining data at different scales and modes could considerably increase the predictive power of an integrative model by filling biological knowledge gaps. However, the reliability of the derived hypotheses largely depends on the completeness, quality, consistency, and context-specificity of the data. Thus, there is a need for agile methods and approaches that efficiently interrogate and utilise existing public data. This thesis presents the development of novel approaches and methods that address intrinsic issues of data integration and analysis in AD research. It aims to prioritise lesser-known AD candidates using highly curated and precise knowledge derived from integrated data, with particular emphasis on quality, reliability, and context-specificity. This thesis showcases the benefit of integrating well-curated and disease-specific heterogeneous data in a semantic web-based framework for mining actionable knowledge. Furthermore, it addresses the challenges encountered while harvesting information from literature and transcriptomic resources. A state-of-the-art text-mining methodology is developed to extract miRNAs and their regulatory roles in diseases and genes from the biomedical literature.
To enable meta-analysis of biologically related transcriptomic data, a highly curated metadata database has been developed, which explicates annotations specific to human and animal models. Finally, to corroborate common mechanistic patterns, embedded with novel candidates, across large-scale AD transcriptomic data, a new approach to generating gene regulatory networks has been developed. The work presented here has demonstrated its capability to identify testable mechanistic hypotheses containing previously unknown or emerging knowledge from public data in two major publicly funded projects on Alzheimer's disease, Parkinson's disease and epilepsy.
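Literature-based miRNA extraction usually starts by recognising miRNA mentions in text. The thesis's text-mining method is not described here, so the following is only a simple, hypothetical regular-expression baseline for finding miRNA names such as hsa-let-7a or miR-21-5p; the species prefixes and naming variants it covers are assumptions and far from exhaustive.

```python
import re

# Illustrative pattern (an assumption, not the thesis's method): optional
# species prefix (hsa-, mmu-, rno-), then miR/mir/let, a number, an optional
# letter, an optional numeric suffix and an optional -3p/-5p arm.
MIRNA_PATTERN = re.compile(
    r"\b(?:(?:hsa|mmu|rno)-)?(?:miR|mir|let)-\d+[a-z]?(?:-\d+)?(?:-[35]p)?\b"
)


def find_mirna_mentions(text):
    """Return every miRNA-like mention in `text`, in order of appearance."""
    return MIRNA_PATTERN.findall(text)


sentence = "hsa-let-7a and miR-21-5p regulate TP53; mmu-mir-155 was not tested."
print(find_mirna_mentions(sentence))
```

A dictionary- or model-based tagger would still be needed to normalise synonyms and to link each mention to the disease or gene it regulates; the pattern only finds candidate names.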
Computational analysis of the colonic transcriptome & in vitro biomarker analysis using a novel microfluidic quantum dot linked immunoassay
DNA microarray technology facilitates the high-throughput analysis of transcriptional disease regulation by measuring the relative expression levels of transcripts present within a tissue. While such approaches have been used to study the genetic regulation of a variety of illnesses, these studies often suffer from inadequate patient sample sizes and low statistical power, resulting in conflicting results and lab-specific bias. To overcome these limitations and fully utilize the wealth of publicly available genomic data, an integrated microarray analysis method was used to analyze and interpret microarray data in the context of colonic diseases, including colorectal carcinoma (CRC) and inflammatory bowel disease (IBD).
The results of this work indicate widespread genetic perturbations related to IBD, implicating a variety of cell types, including resident host enterocytes, innate and adaptive immune cells, and native luminal microflora. Our work has identified subtle genetic differences between IBD phenotypes that could enable disease-specific therapeutic treatments, as well as novel diagnostic biomarkers. Furthermore, our analysis revealed significant overlap in the genetic regulation of, and predisposition to, IBD, lupus, type 1 diabetes, Graves' disease and rheumatoid arthritis, providing the first genetic link between the enteropathic disease symptoms associated with IBD. Druggable pathways involved in these diseases, as well as known therapeutic drug targets, were also analyzed for the potential repositioning of existing therapeutics for the treatment of IBD.
IBD patients are known to be at elevated risk of developing colorectal carcinoma, with risk increasing with disease duration.
To better understand the shift from IBD to cancerous phenotypes, integrated microarray analysis was used to identify gene signatures, implicated pathways and novel discriminatory biomarkers for differentiating between IBD and CRC phenotypes. Our diagnostic panels accurately differentiated between phenotypes in an independent validation dataset.
To transition the identified biomarkers to the clinic for diagnostic use, a novel microfluidic quantum dot linked immunosorbent assay platform was developed with enhanced surface chemistry and reaction kinetics. The prototype is capable of multiplexed biomarker detection in clinically relevant samples for the stratification of disease phenotypes. To validate the design, human samples spiked with the fecal IBD biomarker lactoferrin were analyzed. Results indicate higher sensitivity and signal-to-noise ratios than our predicate device, along with a lower limit of detection. This proof-of-concept device shows great promise as a portable bedside diagnostic device for multiplexed biomarker analysis in a clinical setting.
Ph.D., Biomedical Engineering -- Drexel University, 201
Discovery of tissue specific network properties associated with cancer driver genes
Master's thesis in Biochemistry, Faculdade de Ciências, Universidade de Lisboa, 2022
Using the notion of disease modules, network medicine has effectively identified disease-associated
genes in recent years. In biological networks, genes linked to a particular illness tend to
interact closely [1]. These networks allow both physical and functional connections between
biomolecules to be identified, resulting in a map of the cell components and processes that constitute
biological systems [2]. Not all disease-associated genes, however, have a major impact on the disease
phenotype.
The discovery of key genes able to produce or change the disease phenotype paves the way
to new therapies and a personalized medicine strategy. Recent research has found that the topological
features of a biological network alone can accurately predict perturbation effects in a dynamical model
of the system, with 65-80% accuracy [3, 4].
Biological networks differ depending on the tissue or cell type being studied. As a
result, each gene's topological features and ability to affect the system may change [5].
The main goal of this thesis is to discover network topological parameters associated with
influential cancer driver genes using context specific networks. In order to achieve this, we evaluated
local network features around each driver gene across multiple tissue specific networks, including
tissues that are affected in the disease and others where the gene perturbation has no significant effect.
We aimed to identify the topological parameters, and their characteristics, that contribute to a cancer
driver gene’s influential role.
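Degree and Closeness are the two parameters highlighted in the results below. As a hedged illustration (the thesis does not say which software computed them), the sketch computes both for an unweighted, undirected tissue-specific network using breadth-first search. Closeness is taken here as (number of reachable nodes) / (sum of their shortest-path distances), which for a connected graph matches the usual (n − 1) / Σd definition; the gene and tissue names are hypothetical.

```python
from collections import deque


def degree_and_closeness(edges):
    """Degree and closeness centrality for an undirected, unweighted network."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    closeness = {}
    for source in adj:
        # Breadth-first search gives shortest-path distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        reachable = len(dist) - 1
        closeness[source] = reachable / total if total > 0 else 0.0
    return degree, closeness


# Hypothetical tissue-specific networks for one driver gene ("GENE_X").
networks = {
    "tissue_with_cancer": [("GENE_X", "A"), ("GENE_X", "B"), ("GENE_X", "C"), ("A", "B")],
    "tissue_without_cancer": [("GENE_X", "A"), ("A", "B"), ("B", "C")],
}
for tissue, edges in networks.items():
    degree, closeness = degree_and_closeness(edges)
    print(tissue, degree["GENE_X"], round(closeness["GENE_X"], 3))
```

Computing the same parameters for the same gene in each tissue network is what allows the "driver tissue versus non-driver tissue" comparison described in the thesis.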
The results of this dissertation indicate that several topological parameters can be used to
identify cancer “driver” genes. We found that these genes have higher values of topological
parameters, such as Degree or Closeness, in tissues where they tend to cause cancer. We also found that
this difference is present in both oncogenes and tumor suppressor genes. Another factor that we found to
influence topological parameter values is the number of tissues in which these genes cause the
disease: parameter values tend to increase with the number of tissues in which a gene causes cancer.
Together, these results support a significant association between topological parameters such as
Degree and the influential role of a driver gene in cancer.
To achieve our goals, we began by building and optimizing our tissue-specific networks. Each
tissue-specific network was built using four different databases of protein-protein interactions,
signaling pathways and transcription factors. We tried four different network construction methods,
including filtering by gene expression levels above 0.1 and 5 transcripts per million (TPM) in each
tissue. We also built a matrix associating cancer driver genes (taken from an online database of cancer
driver genes) with the tissues in which they cause the disease. Each driver gene was assigned to one of
six categories according to the number of tissues in which it causes cancer, category six comprising
the genes that cause the disease in six or more tissues. We began by comparing the topological
parameter values of genes in tissues where they cause the disease with their values in tissues where
they do not. These values were also compared with those of a list of cancer-associated genes that are
not cancer drivers (taken from an online database of disease-associated genes) and of a list of genes
not associated with any disease. This analysis was carried out for each of the four network
construction methods.
We then examined how the topological parameters differed at the tissue level. In each tissue, we
compared the topological parameter values of driver genes that cause the disease in that tissue with
those of driver genes that do not. After comparing topological parameter values with all driver genes
pooled into one global group, we checked whether the difference between their values in tissues where
they cause cancer and in tissues where they do not was also present within each category of the number
of tissues in which driver genes cause cancer, and how these values increase or decrease across those
categories.
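The network construction and categorization steps above can be sketched as follows. The exact filtering rule is not stated, so this sketch assumes that an interaction is kept in a tissue only when both genes exceed the TPM threshold (0.1 or 5 in the thesis); the helper names and toy data are hypothetical. Category 6 collects genes that cause cancer in six or more tissues, as described above.

```python
def tissue_network(edges, tpm, threshold):
    """Keep interactions whose two genes are both expressed above `threshold` TPM.

    edges -- list of (gene_a, gene_b) interactions from the pooled databases
    tpm   -- dict: gene -> expression (TPM) in one tissue; missing genes count as 0
    """
    return [(a, b) for a, b in edges
            if tpm.get(a, 0.0) > threshold and tpm.get(b, 0.0) > threshold]


def tissue_count_category(driver_tissues, max_category=6):
    """Assign each driver gene to category 1..6 by the number of tissues in
    which it causes cancer; category 6 means six or more tissues.
    Genes with no associated tissue are left out."""
    return {gene: min(len(tissues), max_category)
            for gene, tissues in driver_tissues.items() if tissues}


# Hypothetical toy data for one tissue.
edges = [("A", "B"), ("B", "C"), ("C", "D")]
tpm = {"A": 10.0, "B": 6.0, "C": 3.0, "D": 0.05}
for threshold in (0.1, 5.0):
    print(threshold, tissue_network(edges, tpm, threshold))

drivers = {"GENE_1": {"lung", "skin"},
           "GENE_2": {f"tissue_{i}" for i in range(8)}}
print(tissue_count_category(drivers))
```

Raising the threshold from 0.1 to 5 TPM prunes interactions involving weakly expressed genes, which is one way the construction methods can change each gene's topology.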
We then evaluated the combined effect, on the topological parameter values of cancer driver genes
(selecting the “Degree” parameter), of tissue type (tissues where they cause the disease versus tissues
where they do not) and of the six categories of the number of tissues in which they cause cancer, using
a Generalized Linear Model (GLM) to assess the interaction between these factors.
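The thesis does not state which GLM family or software was used. Assuming a Gaussian family with identity link, the model Degree ~ tissue type × tissue-number category with treatment coding is saturated, so its fitted values are the cell means and every coefficient is a simple contrast of cell means. The sketch below computes these coefficients directly. It omits the standard errors and p-values that a statistics package would report, and it requires every (tissue type, category) cell to contain data.

```python
from collections import defaultdict
from statistics import mean


def saturated_gaussian_glm(records, ref_category=1):
    """Treatment-coded coefficients of Degree ~ tissue_type * category.

    records -- iterable of (is_driver_tissue, category, degree) tuples
    Assumes a Gaussian GLM with identity link (i.e. ordinary least squares).
    With both factors and their interaction the model is saturated, so the
    fitted value of each cell is the cell mean and each coefficient is a
    contrast of cell means. Every (tissue type, category) cell needs data.
    """
    cells = defaultdict(list)
    for is_driver, category, degree in records:
        cells[(is_driver, category)].append(degree)
    m = {cell: mean(values) for cell, values in cells.items()}
    base = m[(False, ref_category)]                  # non-driver tissue, reference category
    driver_gap_ref = m[(True, ref_category)] - base  # driver effect in reference category
    coef = {"intercept": base, "driver_tissue": driver_gap_ref}
    for category in sorted({c for _, c in m}):
        if category == ref_category:
            continue
        coef[f"category_{category}"] = m[(False, category)] - base
        # Interaction: how much the driver/non-driver gap changes in this category.
        coef[f"driver_tissue:category_{category}"] = (
            (m[(True, category)] - m[(False, category)]) - driver_gap_ref)
    return coef


# Hypothetical Degree values: (causes cancer in this tissue?, category, Degree).
records = [(False, 1, 10.0), (False, 1, 12.0), (True, 1, 9.0),
           (False, 6, 20.0), (True, 6, 30.0)]
print(saturated_gaussian_glm(records))
```

In this invented example the driver/non-driver gap is negative in category 1 and positive in category 6, so the interaction coefficient is large; this is the kind of pattern a significant interaction term captures.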
From the database that provided the list of cancer driver genes, we also obtained a list of oncogenes
and tumor suppressor genes, which we used to evaluate differences in their topological parameter values
in tissues where they cause cancer versus tissues where they do not. To evaluate other variables that
might have an impact beyond the topological parameters, and that might also differ with the number of
tissues in which driver genes cause the disease, we used information from the same database on the
number of interactions each driver gene establishes with different miRNAs and on the number of protein
complexes these genes belong to. We also evaluated the impact of gene expression across the
tissue-number categories. Finally, we performed functional enrichment of the cancer driver genes using
two different methods. In the first method, we used the genes with the largest topological difference
(here using only the “Degree” parameter) between tissues where they do and do not cause cancer. We
classified each gene as positive, negative or non-significant based on the difference between its mean
“Degree” in tissues where it causes cancer and its value in tissues where it does not. The second method
was enrichment of the cancer driver genes according to the number of tissues in which they cause cancer,
carried out over the different tissue-number categories.
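The positive/negative/non-significant classification can be sketched as below. The thesis later refers to a “Degree” z-score difference above 1 or below -1, so this sketch z-scores Degree within each tissue network and uses ±1 as the cut-off; computing the z-score over all genes of a tissue is an assumption, and the gene and tissue names are hypothetical.

```python
from statistics import mean, pstdev


def zscore_within_tissue(degree_by_tissue):
    """Convert Degree to z-scores separately within each tissue network.

    degree_by_tissue -- dict: tissue -> {gene: Degree}
    """
    z = {}
    for tissue, degrees in degree_by_tissue.items():
        values = list(degrees.values())
        mu, sd = mean(values), pstdev(values)
        z[tissue] = {gene: (d - mu) / sd if sd > 0 else 0.0
                     for gene, d in degrees.items()}
    return z


def classify_driver(gene, z, driver_tissues, threshold=1.0):
    """Label a driver gene by the difference between its mean Degree z-score in
    tissues where it causes cancer and in tissues where it does not."""
    inside = [z[t][gene] for t in z if gene in z[t] and t in driver_tissues]
    outside = [z[t][gene] for t in z if gene in z[t] and t not in driver_tissues]
    if not inside or not outside:
        return None  # difference undefined without both kinds of tissue
    diff = mean(inside) - mean(outside)
    if diff > threshold:
        return "positive"
    if diff < -threshold:
        return "negative"
    return "non-significant"


# Hypothetical Degree values for three genes in two tissue networks.
degree_by_tissue = {"lung": {"G": 10, "A": 2, "B": 0},
                    "skin": {"G": 1, "A": 5, "B": 3}}
z = zscore_within_tissue(degree_by_tissue)
print(classify_driver("G", z, driver_tissues={"lung"}))
```

Z-scoring within each tissue removes differences in overall network size and density between tissues, so the comparison reflects a gene's relative position rather than raw Degree.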
Overall, our results suggest that topological parameter values (for example, “Degree” and “Closeness”)
tend to be highest in tissues where cancer driver genes cause the disease (“TissueDrivers”), followed by
the values of cancer genes that are not drivers but are associated with the development of the disease
(“Disease Genes”), then by the values of driver genes in tissues where they do not cause cancer
(“NonTissueDrivers”), and finally, with the lowest values, by genes not associated with any disease.
The difference in topological parameter values between “TissueDrivers” and “NonTissueDrivers” is
statistically significant for most of the topological parameters tested and for the different network
methods used, except for the “JustHuRiTPM5Zminmax” method (which uses only the HuRI database). When we
analyzed topological parameter values in each tissue, we saw that “Degree” values tend to be higher for
driver genes that cause cancer in that tissue than for driver genes that do not. This difference is
statistically significant in many of the tissues analyzed.
Regarding how topological parameter values behave across the categories of the number of tissues in
which driver genes cause cancer, we found that for driver genes that cause disease in only one or two
tissues, the “Degree” value in tissues where they cause cancer is lower than in tissues where they do
not. We observed the opposite trend for driver genes that cause cancer in six or more tissues (the
“Degree” value is higher in tissues where they cause cancer). We also observed that the “Degree” value
increases gradually with the tissue-number category, reaching its highest value in category six
(driver genes that cause cancer in six or more tissues).
In the generalized linear model (GLM), we observed a combined effect of the tissue-type variable
(whether or not the driver gene causes cancer in the tissue, with a statistically significant
difference between these two situations) and of the variable for the number of tissues in which driver
genes cause cancer (also with statistically significant differences between categories). The
interaction between these two factors was also statistically significant.
We also observed statistically different “Degree” values for tumor suppressor driver genes between
tissues where they cause cancer (higher values) and tissues where they do not. We saw the same
difference in oncogenes, but with lower significance. “Degree” values of tumor suppressor genes were
lower than those of oncogenes.
We also saw a clear trend of correlation between an increasing number of tissues and an increasing
number of complexes that cancer driver genes belong to. The same behavior was observed for the number
of miRNAs with which driver genes interact.
Regarding mRNA expression across the tissue-number categories, we saw a statistically significant
difference in categories two and three between the values of driver genes (with respect to the
“Degree” topological parameter) in tissues where they cause cancer versus tissues where they do not.
Finally, in the functional enrichment study, we saw that the biological processes, molecular functions
and cellular components enriched with the tissue-number category method are much more closely related
to cancer processes described in the literature (“hallmarks of cancer”). We could not find a clear
division between the enriched biological functions of genes with a “Degree” z-score difference above 1
and those with a difference below -1: we found no relevant functional enrichment in either of these two
gene groups that could distinguish them from each other.
- …