423 research outputs found

    Optimizing Alzheimer's disease prediction using the nomadic people algorithm

    The problem with using microarray technology to detect diseases is that not every gene is analytically necessary. Non-essential gene data adds computational load to the detection method. The purpose of this study is therefore to reduce the size of the high-dimensional data by determining the most critical genes involved in Alzheimer's disease progression. The study also aims to identify patients from a subset of genes that cause Alzheimer's disease. This paper uses feature selection techniques such as information gain (IG) together with a novel swarm-based metaheuristic optimization technique derived from the behavior of nomadic people (NPO). The suggested method mirrors the structure of these people's lives, their movements, and their search for new food sources. It is largely a multi-swarm method: there are several clans, each seeking the best foraging opportunities. After the informative genes are selected, prediction is carried out with a support vector machine (SVM), which is frequently used in a variety of prediction tasks. Prediction accuracy was used to evaluate the suggested system's performance. The results indicate that the NPO algorithm with the SVM model returns high accuracy based on the gene subset obtained from the IG and NPO methods.
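
The information-gain step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the median split, the function names, and the toy expression values are all assumptions made for illustration, and the NPO search and SVM classifier are not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """IG of one gene after splitting its expression values at the median (assumed split rule)."""
    ordered = sorted(values)
    median = ordered[len(ordered) // 2]
    low = [y for x, y in zip(values, labels) if x < median]
    high = [y for x, y in zip(values, labels) if x >= median]
    # Weighted entropy of the two partitions; empty partitions are skipped.
    conditional = sum(len(part) / len(labels) * entropy(part) for part in (low, high) if part)
    return entropy(labels) - conditional


def top_genes(expression, labels, k):
    """Rank genes (dict: name -> expression values) by IG and return the k best names."""
    ranked = sorted(expression, key=lambda g: information_gain(expression[g], labels), reverse=True)
    return ranked[:k]


# Hypothetical toy data, not taken from the paper.
labels = ["AD", "AD", "AD", "CTRL", "CTRL", "CTRL"]
expression = {
    "gene_a": [5.1, 4.9, 5.3, 1.0, 1.2, 0.9],  # cleanly separates the two classes
    "gene_b": [2.0, 3.0, 2.5, 2.1, 3.1, 2.4],  # carries little class information
}
selected = top_genes(expression, labels, k=1)
```

In a pipeline like the one described, the genes kept by such a filter would then be passed to a further search step and a classifier.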

    Identification of Novel Cancer-Related Genes with a Prognostic Role Using Gene Expression and Protein-Protein Interaction Network Data

    Early cancer diagnosis and prognosis prediction are necessary for cancer patients. Effective identification of cancer-related genes and biomarkers, together with survival prediction, would facilitate personalized treatment. This study aimed to investigate a method for integrating gene expression data and protein-protein interaction networks to identify cancer-related prognostic genes via the random walk with restart algorithm and survival analysis. Known cancer-related genes in protein-protein interaction networks were used as seed genes, and the random walk algorithm was used to identify candidate cancer-related genes. Thereafter, gene expression data were screened with the univariate Cox regression model to identify survival-related genes. Candidate genes and survival-related genes were then screened to identify cancer-related prognostic genes. Finally, the effectiveness of the method was verified through gene function analysis and survival prediction. The results indicate that the identified cancer-related genes can be considered prognostic cancer biomarkers and provide a basis for cancer diagnosis.
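
The random walk with restart step named in this abstract can be sketched as follows. This is a generic textbook formulation, not the authors' code: the restart probability of 0.3, the function name, and the four-node toy network are assumptions for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until convergence.

    adjacency: square list-of-lists for an undirected network (assumed input format).
    seeds: set of node indices treated as known disease genes.
    """
    n = len(adjacency)
    # Column-normalise the adjacency matrix so each column sums to 1.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # The walker starts (and restarts) uniformly on the seed nodes.
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(transition[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical path network 0 - 1 - 2 - 3 with node 0 as the only seed gene.
network = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(network, seeds={0})
```

Nodes closer to the seed receive higher steady-state scores, which is what allows candidate genes to be ranked by network proximity to known cancer-related genes.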

    Aproximaciones bioinformáticas para identificación de perfiles epigenéticos en procesos neuropatológicos

    Degenerative neurological diseases, such as Alzheimer's disease, multiple sclerosis, or Huntington disease, are not well understood, yet they have a significant impact on patients' quality of life and survival. The focus of this dissertation is finding biomarkers for the identification of these diseases, ideally in a rapid and reliable manner. The analysis was carried out using DNA CpG methylation data. In recent years there have been very significant technological improvements: it is now possible to obtain the methylation levels of hundreds of thousands of CpGs in a patient quickly and reliably. Analyzing this volume of new data is, however, challenging. A reasonable way to tackle this issue is to use machine learning techniques that have proven useful in many other fields. In this dissertation I developed a nonlinear approach to identifying combinations of CpG DNA methylation data as biomarkers for Alzheimer's disease (AD). It will be shown that this approach increases the accuracy of detection in patients with AD compared with directly using all the data available. I also analyzed the case of Huntington disease (HD). Using nonlinear techniques, I was able to reduce the number of CpGs considered from hundreds of thousands to 237. It will be shown that using only these 237 CpGs with nonlinear techniques such as artificial neural networks makes it possible to accurately differentiate between control and HD patients. Additionally, I present a technique, based on the concept of Shannon entropy, to select CpGs as inputs for nonlinear classification algorithms. It will be shown that this approach generates accurate classifications that are a statistically significant improvement over using all the data available or randomly selecting the same number of CpGs.
The results seem to clearly illustrate that the analysis of DNA methylation data for identifying patients with the degenerative neurological diseases mentioned above needs to be carried out carefully. Being able to analyze the levels of hundreds of thousands of CpGs does not necessarily translate into better results, as some of these levels might be unrelated to the disease and only add noise to the analysis. It will be shown that the proposed algorithms generate accurate results while decreasing the number of CpGs used. For instance, in the case of Alzheimer's disease the proposed algorithm yields a sensitivity of 0.9007 and a specificity of 0.9485. One of the underlying expectations is that in the future there will be curative treatments for these illnesses, which do not currently exist. It is also assumed that early detection, as with many other diseases, might be important when such treatments appear. With current technology it is relatively simple to analyze DNA methylation data, and hence it can become an interesting biomarker in the context of these illnesses.
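
For readers unfamiliar with the two metrics reported in this abstract, the sketch below shows the standard definitions of sensitivity and specificity. The function name and the toy labels are assumptions for illustration; they do not reproduce the dissertation's data or its reported values.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN): share of true patients that are detected.
    specificity = TN / (TN + FP): share of controls correctly left unflagged.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels: 1 = patient, 0 = control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```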

    Transcription factor networks play a key role in human brain evolution and disorders

    Although the human brain has been studied over the past decades at morphological and histological levels, much remains unknown about its molecular and genetic mechanisms. Furthermore, when compared with that of our closest relative, the chimpanzee, the human brain shows striking morphological changes that have often been associated with our cognitive specializations and skills. Nevertheless, such drastic changes in the human brain may have arisen not only through morphological changes but also through changes in the expression levels of genes and transcripts. Gene regulatory networks are complex, large-scale sets of protein interactions that play a fundamental role at the core of cellular and tissue functions. Among the most important players in such regulatory networks are transcription factors (TFs) and the transcriptional circuitries in which TFs are the central nodes. Over the past decades, several studies have focused on the functional characterization of brain-specific TFs, highlighting their pathways, interactions, and target genes implicated in brain development and often in disorders. However, one of the main limitations of such studies is the data collection, which is generally based on an individual experiment using a single TF. To understand how TFs might contribute to human-specific cognitive abilities, it is necessary to integrate the TFs into a system-level network to emphasize their potential pathways and circuitry. This thesis proceeds with a novel systems biology approach to infer the evolution of these networks. Using human, chimpanzee, and rhesus macaque, we spanned circa 35 million years of evolution to infer ancestral TF networks and the TF-TF interactions that are conserved or shared in important brain regions. Additionally, we developed a novel method to integrate multiple TF networks derived from human frontal lobe next-generation sequencing data into a high-confidence consensus network.
In this study, we also integrated a manually curated list of TFs important for brain function and disorders. Interestingly, such “Brain-TFs” are important hubs of the consensus network, emphasizing their biological role in TF circuitry in the human frontal lobe. This thesis describes two major studies in which DNA microarray and RNA-sequencing (RNA-seq) datasets have been mined, placing the TFs and their potential target genes into co-expression networks in human and non-human primate brain genome-wide expression datasets. In a third study we functionally characterized ZEB2, a TF implicated in brain development and linked with Mowat-Wilson syndrome, using human, chimpanzee, and orangutan cell lines. This work introduces not only an accurate analysis of ZEB2 targets, but also an analysis of the evolution of ZEB2 binding sites and the regulatory network controlled by ZEB2 in great apes, spanning circa 16 million years of evolution. In summary, these studies demonstrate the critical role of TFs in the gene regulatory networks underlying human frontal lobe evolution and function, emphasizing the potential relationships between TF circuitries and the cognitive skills that make humans unique.

    Machine Learning Application in Genomic, Exercise, and Vital Datasets

    PURPOSE: Machine learning (ML) refers to newly developed computer algorithms that improve through iterative experience. ML applications are expected to assist humans in analyzing large amounts of data. This review outlines the application of ML to analyzing variable vital data such as walking steps, exercise intensity, heart rate, sleeping hours, sleep quality, resting heart rate, blood pressure, and daily calorie consumption. Vital data consist of different variables that are closely related to genomic or exercise data. The prediction of health-related traits from a vital dataset has become a necessity in personalized medicine. METHODS: Considerations and repeated tasks in supervised, semi-supervised, and unsupervised ML methods are presented. ML methods such as artificial neural networks, Bayesian networks, support vector machines, and decision trees have been widely used in biomedical studies to develop predictive models. Using vital data, these models can support effective and accurate decision-making for a healthier life. RESULTS: Models based on genomic, exercise, and vital datasets can support a healthy lifestyle through regular exercise. We have provided guidelines to help in the selection of these ML methods and their practical application to variable vital data analysis. CONCLUSIONS: Our guidelines could serve as a foundation for implementing both participatory medicine and data-driven exercise science.

    Exploring the Epigenome of Neurons and Glia in Vitro to Determine their Utility as a Model for Alzheimer's Disease

    Alzheimer’s disease is a progressive neurodegenerative condition that is characterised by distinct neuropathological changes. Within the last decade, post mortem human brain samples have been used to show that robust epigenetic changes occur in the brain during disease. However, as these samples are collected shortly after death, they reflect only the very end stages of disease. Through the exposure of differentiated adult cells to exogenous reprogramming factors, it is now possible to generate induced pluripotent stem cells, which have the potential to differentiate into any cell type in the body. Over recent years research has moved towards using these stem cells to generate neurons or microglia in order to study diseases of ageing such as Alzheimer’s disease. However, relatively few epigenetic studies have been undertaken using induced pluripotent stem cells. As global cellular epigenetic changes occur during the induction of pluripotency and re-differentiation, it is critical to understand the DNA methylation changes that occur during normal neuronal differentiation before using these cells as a model of Alzheimer’s disease or other diseases of ageing. The aim of this thesis is first to characterise the DNA methylation changes that occur in neuronal and microglial models subjected to AD-relevant exposures such as differentiation and maturation, drug treatment and immune challenge. This will largely be achieved by measuring DNA methylation using the Illumina Infinium HumanMethylationEPIC BeadChip array, which provides information on DNA methylation levels at over 850,000 loci across the genome.
    Alzheimer's Research UK; Alzheimer's Society

    Regulatory genomic consequences of polygenic risk burden for Alzheimer’s disease

    Dementia is an umbrella term used to describe a group of symptoms associated with global cognitive impairment and is a major contributor to the global burden of disease; currently over 50 million individuals are affected worldwide. Due to the ageing population and the lack of effective disease-modifying treatments, this number is expected to triple by 2050. Dementia encompasses a number of neurological diseases, including Alzheimer’s disease (AD), which accounts for 60-80% of cases. There is a well-established genetic component to AD, and genome-wide association studies have identified >75 variants robustly associated with disease. Little is known about the functional mechanisms by which risk variants mediate disease susceptibility; as the majority of these variants do not index coding variants affecting protein structure, they are hypothesised to influence gene regulation, a view supported by the observation that they are enriched in regulatory domains including enhancers. The primary aim of this thesis was to assess whether genetic liability for AD is associated with regulatory genomic variation (i.e. epigenetic and transcriptomic) in whole blood and the human cortex. Epigenome-wide association studies and multi-omic methods were utilised to explore the molecular mechanisms leading to disease. The results from this thesis indicate that epigenetic mechanisms are involved in AD pathogenesis and provide further support for several established AD pathways such as lipid and cholesterol metabolism, Aβ, tau and APP processing, as well as a role for the immune system. The analyses incorporating AD genetic variation with DNA methylation suggest that there are both direct cis genetic effects and indirect polygenic effects on regulatory processes involved in the aetiology of AD.
Although there were consistencies at some loci across the whole blood and cortex analyses, there was also evidence for heterogeneity across tissues, which might represent tissue-specific effects in areas primarily affected in AD (e.g. the cortex) in comparison to peripheral tissues. In summary, using multiple approaches, I characterised the complex relationship between genetic and epigenetic variation, enabling the exploration of molecular genomic mechanisms driving AD pathogenesis in both peripheral and brain tissues, and prioritised genes which could be targeted in future functional studies.

    Methods in and Applications of the Sequencing of Short Non-Coding RNAs

    Short non-coding RNAs are important for all domains of life. With the advent of modern molecular biology, their applicability to medicine has become apparent in settings ranging from diagnostic biomarkers to therapeutics and in fields ranging from oncology to neurology. In addition, a critical recent technological development is high-throughput sequencing of nucleic acids. The convergence of modern biotechnology with developments in RNA biology presents opportunities in both basic research and medical settings. Here I present two novel methods for leveraging high-throughput sequencing in the study of short non-coding RNAs, as well as a study in which they are applied to Alzheimer's disease (AD). The computational methods presented here include High-throughput Annotation of Modified Ribonucleotides (HAMR), which enables researchers to detect post-transcriptional covalent modifications to RNAs in a high-throughput manner. In addition, I describe Classification of RNAs by Analysis of Length (CoRAL), a computational method that allows researchers to characterize the pathways responsible for short non-coding RNA biogenesis. Lastly, I present an application of the study of non-coding RNAs to Alzheimer's disease. In AD, several classes of non-coding RNAs, particularly tRNAs and tRNA fragments, show striking changes in the dorsolateral prefrontal cortex of affected human brains. Interestingly, the nature of these changes differs between mitochondrial and nuclear tRNAs, suggesting an association between Alzheimer's disease and perturbation of mitochondrial function. In addition, by combining known genetic factors of AD with genes that are differentially expressed and with targets of differentially expressed regulatory RNAs, I construct a network of genes that are potentially relevant to the pathogenesis of the disease.
By combining genetics data with novel results from the study of non-coding RNAs, we can further elucidate the molecular mechanisms that underlie Alzheimer's disease pathogenesis.

    Examining epigenetic variation in the brain in mental illness

    Mental health represents one of the most significant and increasing burdens on global public health. Depression and schizophrenia, among other mental illnesses, constitute strong risk factors for suicidality, which results in over 800,000 deaths every year. The majority of suicides worldwide are related to psychiatric diseases. A growing body of genetic, epigenetic and epidemiological evidence suggests that psychiatric disorders are highly complex phenotypes originating from the multilevel interplay between a strong genetic component and a range of environmental and psychosocial factors. Deeper understanding of the biology of the genome has led to increased interest in the role of non-sequence-based variation in the etiology of neuropsychiatric phenotypes, including suicidality. Epigenetic alterations and gene expression dysregulation have been repeatedly reported in the post-mortem brains of individuals who died by suicide. To date, however, studies characterizing disease-associated methylomic and transcriptomic variation in the brain have been limited by screening performed in bulk tissue and by the assessment of a single marker at a time. The main aim of this thesis was to investigate DNA methylation and miRNA expression differences in post-mortem brain associated with suicidality, and to unravel the complexity of epigenetic signals in a heterogeneous tissue like the human brain by developing a method to profile genomic variation at the resolution of individual neural cell types. The results reported here provide further support for a suicide-specific epigenetic signature, independent of comorbidity with other psychiatric phenotypes, and confirm the strong bias introduced by bulk tissue studies, hence the need to examine genomic variation in purified cell types.
In summary, this thesis has identified a) a suicide-specific signal in two different epigenetic markers (DNA methylation and miRNA expression) and b) a protocol to simultaneously profile DNA methylation levels across three purified cell types in the healthy brain, highlighting the utility of cell sorting for identifying cell type-driven epigenetic differences associated with etiological variation in complex psychiatric phenotypes.
1) ARUK-PPG2018A-010 - “Developing approaches to address neural cell heterogeneity in genomic studies of Alzheimer's disease”. 2) SBF001\1011 - “Using functional epigenomics to dissect the molecular architecture of schizophrenia