9 research outputs found

    Predicting Parkinson's Disease Genes Based on Node2vec and Autoencoder

    Identifying genes associated with Parkinson's disease plays an extremely important role in the diagnosis and treatment of the disease. In recent years, many methods based on the guilt-by-association hypothesis have been proposed to predict disease-related genes, but few of them are designed for, or applied to, Parkinson's disease gene prediction. In this paper, we propose a novel method for Parkinson's disease gene prediction, named N2A-SVM. N2A-SVM consists of three parts: extracting gene features from a network, reducing their dimensionality with a deep neural network, and predicting Parkinson's disease genes with a machine learning method. Evaluation tests show that N2A-SVM performs better than existing methods. Furthermore, we evaluate the significance of each step in the N2A-SVM algorithm and the influence of the hyper-parameters on the result. In addition, we train N2A-SVM on a recent dataset and use it to predict Parkinson's disease genes; the top-ranked predicted genes can be verified through literature study.
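The middle step of the pipeline described above (dimensionality reduction with an autoencoder) can be sketched in miniature. The following is a hedged illustration, not the authors' implementation: the embedding matrix is random stand-in data for node2vec output, and the layer sizes, learning rate and iteration count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for node2vec output: 200 genes embedded in 64 dimensions.
X = rng.normal(size=(200, 64))

# Single-hidden-layer autoencoder: compress 64 -> 16 -> 64, trained to
# reconstruct its input; the 16-d code is the reduced representation.
d_in, d_code = X.shape[1], 16
W1 = rng.normal(scale=0.1, size=(d_in, d_code))   # encoder weights
W2 = rng.normal(scale=0.1, size=(d_code, d_in))   # decoder weights
lr, losses = 0.01, []
for _ in range(200):
    H = np.tanh(X @ W1)                # encoder activations (the code)
    X_hat = H @ W2                     # linear decoder
    err = X_hat - X                    # reconstruction error
    losses.append(float((err ** 2).mean()))
    gW2 = H.T @ err / len(X)           # gradient w.r.t. decoder weights
    gH = err @ W2.T * (1 - H ** 2)     # backprop through tanh
    gW1 = X.T @ gH / len(X)            # gradient w.r.t. encoder weights
    W1 -= lr * gW1
    W2 -= lr * gW2

codes = np.tanh(X @ W1)                # features an SVM would consume
```

In the paper's setting, `X` would come from node2vec run on a gene network, and `codes` would be passed with gene labels to the SVM step.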

    Network-Based Methods for Approaching Human Pathologies from a Phenotypic Point of View

    Network and systemic approaches to studying human pathologies are helping us to gain insight into the molecular mechanisms of, and potential therapeutic interventions for, human diseases, especially complex diseases in which large numbers of genes are involved. The complex human pathological landscape is traditionally partitioned into discrete “diseases”; however, that partition is sometimes problematic, as diseases are highly heterogeneous and can differ greatly from one patient to another. Moreover, for many pathological states, the set of symptoms (phenotypes) manifested by the patient is not enough to diagnose a particular disease. Phenotypes, by contrast, are directly observable by definition and can be closer to the molecular basis of the pathology. These clinical phenotypes are also important for personalised medicine, as they can help stratify patients and design personalised interventions. For these reasons, network and systemic approaches to pathologies are gradually incorporating phenotypic information. This review covers the current landscape of phenotype-centred network approaches to studying different aspects of human diseases.
    This work was partially funded by the Spanish Ministry of Economy and Competitiveness with the European Regional Development Fund [grant numbers PID2019-108096RB-C21 and PID2019-108096RB-C22]; the European Food Safety Authority [grant number GP/EFSA/ENCO/2020/02]; the Andalusian Government with the European Regional Development Fund [grant numbers UMA18-FEDERJA-102 and PAIDI 2020:PY20-00372]; Fundación Progreso y Salud [grant number PI-0075-2017], also from the Andalusian Government; the Ramón Areces Foundation, which funds a project for the investigation of rare diseases (National call for research on life and material sciences, XIX edition); the University of Málaga (Ayudas del I Plan Propio); and the Institute of Health Carlos III, which funds the IMPaCT-Data project. The CIBERER is an initiative of the Institute of Health Carlos III. The conclusions, findings and opinions expressed in this scientific paper reflect only the views of the authors and not the official position of the European Food Safety Authority. Partial funding for open access charge: Universidad de Málaga.

    k-Skip-n-Gram-RF: A Random Forest Based Method for Alzheimer's Disease Protein Identification

    In this paper, a computational method based on machine learning for identifying Alzheimer's disease genes is proposed. Most existing machine-learning-based methods predict Alzheimer's disease genes using structural magnetic resonance imaging (MRI); they attain acceptable results, but imaging is expensive and time consuming. We therefore propose a computational method that identifies Alzheimer's disease genes from the sequence information of proteins alone and classifies the feature vectors with a random forest. In the proposed method, the protein information is extracted as adaptive k-skip-n-gram features. The experimental results demonstrate that the proposed method attains an accuracy of 85.5% on the selected UniProt dataset.
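A simplified skip-gram counter illustrates the kind of feature the abstract refers to. This sketch counts fixed n-grams whose residues may be separated by up to k skipped positions; it is a generic illustration of the k-skip-n-gram notion, not the authors' adaptive variant:

```python
from collections import Counter
from itertools import combinations

def skip_ngrams(seq, n=2, k=1):
    """Count n-grams whose residues may span up to k skipped positions.

    For each starting position, every ordered choice of the remaining
    n - 1 residues from the next n + k - 1 positions forms one gram.
    """
    counts = Counter()
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n + k]
        for rest in combinations(range(1, len(window)), n - 1):
            counts[window[0] + "".join(window[j] for j in rest)] += 1
    return counts

# Bigrams of a toy peptide with at most one skipped residue between them.
print(skip_ngrams("MKVL", n=2, k=1))
```

A feature vector for the classifier would then be the (normalized) counts over a fixed gram vocabulary; with k = 0 this reduces to ordinary n-gram counting.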

    Exposing the Causal Effect of Body Mass Index on the Risk of Type 2 Diabetes Mellitus: A Mendelian Randomization Study

    Introduction: High body mass index (BMI) is a phenotype positively associated with type 2 diabetes mellitus (T2DM), and abundant studies have observed this from a clinical perspective. With the rapid increase in genetic variants identified by genome-wide association studies (GWAS), common SNPs of BMI and T2DM have been identified as a genetic basis for understanding their association; however, whether that association is causal remains unclear.
    Materials and Methods: To clarify this, Mendelian randomisation (MR), which uses genetic instrumental variables (IVs) to explore the causal effect of an intermediate phenotype on disease, was applied to test the effect of BMI on the risk of T2DM. MR was carried out on GWAS data using 52 independent BMI SNPs as IVs. The pooled odds ratio (OR) of these SNPs was calculated using the inverse-variance weighted method to assess the effect of a 5 kg/m2 higher BMI on the risk of T2DM. Leave-one-out validation was conducted to identify the effect of individual SNPs, and MR-Egger regression was used to detect potential pleiotropic bias of the variants.
    Results: We obtained a high OR (1.470; 95% CI 1.170 to 1.847; P = 0.001), a low MR-Egger intercept (0.004, P = 0.661), and small fluctuations of the OR in leave-one-out validation, ranging from -0.039 [(1.412 - 1.470) / 1.470] to 0.075 [(1.568 - 1.470) / 1.470].
    Conclusion: We validate the causal effect of high BMI on the risk of T2DM. The low intercept indicates no pleiotropic bias of the IVs, and the small alterations of the OR when removing individual SNPs show that no single SNP drives our estimate.
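The inverse-variance weighted pooling and leave-one-out check used above can be sketched as follows. The per-SNP effect estimates and standard errors below are synthetic illustrations, not the study's data:

```python
import math

def ivw(betas, ses):
    """Inverse-variance weighted pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

# Hypothetical per-SNP effects of BMI on T2DM risk (log odds) and SEs.
betas = [0.07, 0.09, 0.05, 0.11, 0.08]
ses = [0.02, 0.03, 0.02, 0.04, 0.03]

pooled, se = ivw(betas, ses)
or_pooled = math.exp(pooled)  # pooled odds ratio

# Leave-one-out: re-estimate with each SNP removed in turn, to check
# whether any single variant drives the pooled estimate.
loo = [ivw(betas[:i] + betas[i + 1:], ses[:i] + ses[i + 1:])[0]
      for i in range(len(betas))]
```

In the study itself, 52 SNP-level estimates would enter `ivw`, and the relative change of each leave-one-out OR against the full OR yields the reported fluctuation range.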

    Efficient Network Domination for Life Science Applications

    With the ever-increasing size of data available to researchers, traditional methods of analysis often cannot scale to match the problems being studied. Often only a subset of variables can be utilized or studied further, motivating the need for techniques that can prioritize variable selection. This dissertation describes the development and application of graph-theoretic techniques, particularly the notion of domination, for this purpose. In the first part of this dissertation, algorithms for vertex prioritization in the field of network controllability are studied. Here, the number of solutions to which a vertex belongs is used to classify that vertex and determine its suitability for controlling a network. Novel, efficient, scalable algorithms are developed and analyzed, and empirical tests demonstrate their improvement over those already established in the literature. The second part of this dissertation concerns the prioritization of genes for loss-of-function allele studies in mice. The International Mouse Phenotyping Consortium leads the initiative to develop a loss-of-function allele for each protein-coding gene in the mouse genome, but only a small proportion of untested genes can be selected for further study. To address the need to prioritize genes, a generalizable data science strategy is developed. This strategy models genes as a gene-similarity graph and from it selects a subset that will be further characterized. Empirical tests demonstrate the method’s utility over that of pseudorandom selection and less computationally demanding methods. Finally, part three addresses the important task of preprocessing in the context of noisy public health data. Many public health databases have been developed to collect, curate, and store a variety of environmental measurements. Idiosyncrasies in these measurements, however, introduce noise to the data found in these databases in several ways, including missing, incorrect, outlying, and incompatible data.
    Beyond noisy data, multiple measurements of similar variables can introduce problems of multicollinearity. Domination is again employed in a novel graph method to handle this autocorrelation. Empirical results using the Public Health Exposome dataset are reported. Together, these three parts demonstrate the utility of subset selection via domination when applied to a multitude of data sources from a variety of disciplines in the life sciences.
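The notion of domination used throughout the dissertation can be illustrated with the standard greedy approximation for a minimum dominating set. This is a textbook sketch over an adjacency-set graph representation, not one of the dissertation's algorithms:

```python
def greedy_dominating_set(adj):
    """Greedy approximation of a minimum dominating set.

    adj maps each vertex to the set of its neighbours.  Repeatedly pick
    the vertex that covers the most still-undominated vertices (itself
    plus its neighbours) until every vertex is dominated.
    """
    undominated = set(adj)
    dom = set()
    while undominated:
        v = max(adj, key=lambda u: len(({u} | adj[u]) & undominated))
        dom.add(v)
        undominated -= {v} | adj[v]
    return dom

# A star graph is dominated by its centre alone.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(greedy_dominating_set(star))  # {0}
```

In the variable-selection setting, vertices are variables, edges connect similar variables, and the dominating set is the prioritized subset: every variable is either selected or adjacent to a selected one.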

    Bioinformatics applied to human genomics and proteomics: development of algorithms and methods for the discovery of molecular signatures derived from omic data and for the construction of co-expression and interaction networks

    The present PhD dissertation develops and applies bioinformatic methods and tools to address key current problems in the analysis of human omic data. The work is organised by its main objectives into four chapters, focused on: (i) development of an algorithm for the analysis of changes and heterogeneity in large-scale omic data; (ii) development of a method for non-parametric feature selection; (iii) integration and analysis of human protein-protein interaction networks; and (iv) integration and analysis of human co-expression networks derived from tissue expression data and evolutionary profiles of proteins. In the first chapter, we develop and test a new robust algorithm in R, called DECO, for the discovery of subgroups of features and samples within large-scale omic datasets, exploring all possible heterogeneity in feature differences through the integration of both data dispersion and predictor-response information in a new statistical parameter called h (heterogeneity score). In the second chapter, we present a simple non-parametric statistic to measure the cohesiveness of categorical variables along any quantitative variable, applicable to feature selection in all types of big datasets. In the third chapter, we describe an analysis of the human interactome integrating two global datasets from high-quality proteomics technologies: HuRI (a human protein-protein interaction network generated by a systematic experimental screen based on yeast two-hybrid technology) and Cell-Atlas (a comprehensive map of the subcellular localization of human proteins generated by antibody imaging). This analysis aims to create a framework for characterising subcellular localization supported by the human protein-protein interactome.
    In the fourth chapter, we develop a full integration of three high-quality proteome-wide resources (Human Protein Atlas, OMA and TimeTree) to generate a robust human co-expression network across tissues, placing each human protein along the evolutionary timeline. In this way, we investigate how evolutionarily old the different human proteins are and how correlated they are, and we place them all in a common interaction network. Overall, the work presented in this PhD uses and develops a wide variety of bioinformatic and statistical tools for the analysis, integration and elucidation of molecular signatures and biological networks from human omic data. Most of these data correspond to sample cohorts generated in recent biomedical studies of specific human diseases.

    Measuring phenotype-phenotype similarity through the interactome

    No full text
    Background: Measuring phenotype similarity has recently begun to play an important role in disease diagnosis, and researchers have begun to develop phenotype similarity measurements. However, existing methods ignore the interactions between phenotype-associated proteins, which may lead to inaccurate phenotype similarity. Results: We propose a network-based method, PhenoNet, to calculate the similarity between phenotypes. We localize phenotypes in the network and calculate the similarity between phenotype-associated modules by modeling both inter- and intra-module similarity. Conclusions: PhenoNet was evaluated on two independent evaluation datasets, gene ontology and gene expression data; the results show that PhenoNet performs better than state-of-the-art methods on all evaluation tests.
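One simple way to score the closeness of two phenotype-associated modules in an interactome, sketched below, is to turn the average shortest-path distance between the two protein sets into a similarity. This is a hedged illustration of the inter-module idea only, not PhenoNet's actual scoring function:

```python
from collections import deque

def bfs_dist(adj, src):
    """Breadth-first shortest-path distances from src in an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def module_similarity(adj, mod_a, mod_b):
    """Map the average inter-module distance into a (0, 1] similarity."""
    dists = []
    for a in mod_a:
        d = bfs_dist(adj, a)
        dists += [d[b] for b in mod_b if b in d]
    return 1.0 / (1.0 + sum(dists) / len(dists))

# Toy interactome: a path a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(module_similarity(path, {"a"}, {"b"}))  # 0.5
```

Modules whose proteins sit closer together in the network score higher, which is the intuition behind combining inter- and intra-module similarity.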

    Identifying Relevant Evidence for Systematic Reviews and Review Updates

    Systematic reviews identify, assess and synthesise the evidence available to answer complex research questions. They are essential in healthcare, where the volume of evidence in scientific research publications is vast and cannot feasibly be identified or analysed by individual clinicians or decision makers. However, the process of creating a systematic review is time consuming and expensive, and the pace of scientific publication in medicine and related fields means that evidence bases are continually changing and review conclusions can quickly become out of date. Developing methods to support the creation and updating of reviews is therefore essential to reduce the workload required and to ensure that reviews remain up to date. This research aims to support systematic reviews, and thus improve healthcare, through natural language processing and information retrieval techniques. More specifically, this thesis aims to support the process of identifying relevant evidence for systematic reviews and review updates, reducing the workload required of researchers. The research proposes methods to improve the ranking of studies for systematic reviews. In addition, this thesis describes a dataset of systematic review updates in the field of medicine created from 25 Cochrane reviews, and develops an algorithm that automatically refines Boolean queries to improve the identification of relevant studies for review updates. The research demonstrates that automating the process of identifying relevant evidence can reduce the workload of conducting and updating systematic reviews.