18 research outputs found
GeneFishing to reconstruct context specific portraits of biological processes.
Rapid advances in genomic technologies have led to a wealth of diverse data, from which novel discoveries can be gleaned through the application of robust statistical and computational methods. Here, we describe GeneFishing, a semisupervised computational approach to reconstruct context-specific portraits of biological processes by leveraging gene-gene coexpression information. GeneFishing incorporates multiple high-dimensional statistical ideas, including dimensionality reduction, clustering, subsampling, and results aggregation, to produce robust results. To illustrate the power of our method, we applied it using 21 genes involved in cholesterol metabolism as "bait" to "fish out" (or identify) genes not previously identified as being connected to cholesterol metabolism. Using simulation and real datasets, we found that the results obtained through GeneFishing were more interesting for our study than those provided by related gene prioritization methods. In particular, application of GeneFishing to the GTEx liver RNA sequencing (RNAseq) data not only reidentified many known cholesterol-related genes, but also pointed to glyoxalase I (GLO1) as a gene implicated in cholesterol metabolism. In a follow-up experiment, we found that GLO1 knockdown in human hepatoma cell lines increased levels of cellular cholesterol ester, validating a role for GLO1 in cholesterol metabolism. In addition, we performed pantissue analysis by applying GeneFishing on various tissues and identified many potential tissue-specific cholesterol metabolism-related genes. GeneFishing appears to be a powerful tool for identifying related components of complex biological systems and may be used across a wide range of applications.
IDENTIFICATION OF GENETIC DEFECTS IN X-LINKED MENTAL RETARDATION
Background: X-linked mental retardation (XLMR) has been the focus of MR research because of a 40% excess of males with MR. Genetic defects are estimated to account for 50% of MR cases. There are still 56 non-syndromic (MRX) and 35 syndromic XLMR (MRXS) loci with unknown causative genes.
Aims: To identify the genetic defects in XLMR families.
Methods: Four MRXS and 6 MRX families were studied. Clinical dysmorphologic examination and conventional cytogenetic analysis were performed, followed by Fragile-X exclusion. Linkage analysis was conducted with highly polymorphic STR markers on the X chromosome, followed by LOD score calculation. An FMR1 X-inactivation assay was performed in 15 females from all families, followed by the AR method if the result was uninformative for FMR1. Candidate genes were selected in the linkage interval and mutation analysis was performed.
Results: Gross numerical chromosomal abnormalities and Fragile-X were excluded in all 10 families. The ten XLMR families showed linkage intervals varying from 20 Mb to 121 Mb. Family W92-053 (mental retardation and hypomyelination) showed no mutation in HSD17B10, UBQLN2, SYP, ARGHEF. Two families with MR and congenital hydrocephalus (P03-0452 and 13753/HC) showed no mutations in SLITRK2 and SLITRK4. Family DF27004 (MR and overgrowth features) showed no mutations in GPC3. Families W092-053, PO3-0452, DF27004, and W08-2152 showed skewed X-inactivation in the obligate carrier female.
Conclusions: Identification of genetic defects in ten families showed linkage intervals varying from 20 Mb to 121 Mb with LOD scores varying from 0.17 to 3.3, skewed X-inactivation in 4 families, and no mutation in the candidate genes. STR marker analysis was useful in determining linkage intervals, narrowing down the region of interest for further studies, and genetic counselling.
Keywords: X-linked mental retardation, genetic defects, linkage analysis, mutation analysis
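The LOD scores reported in this abstract can be illustrated with a minimal sketch. It assumes the simplest textbook case, phase-known and fully informative meioses, where the LOD score at recombination fraction theta compares the likelihood of the observed recombinants under linkage against free recombination (theta = 0.5). The function names and the grid search are illustrative assumptions, not the procedure or software used in the study.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10( theta^r * (1 - theta)^(n - r) / 0.5^n )
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    non_recombinants = meioses - recombinants
    likelihood_linked = theta ** recombinants * (1.0 - theta) ** non_recombinants
    likelihood_unlinked = 0.5 ** meioses
    if likelihood_linked == 0.0:
        # A recombinant was observed but theta = 0 forbids recombination.
        return float("-inf")
    return math.log10(likelihood_linked / likelihood_unlinked)


def max_lod(recombinants, meioses, steps=50):
    """Grid search over theta in [0, 0.5]; returns (best LOD, best theta)."""
    thetas = [0.5 * i / steps for i in range(steps + 1)]
    return max((lod_score(recombinants, meioses, t), t) for t in thetas)
```

For example, ten informative meioses with no recombinants give a LOD of 10 x log10(2), about 3.01, at theta = 0, which is close to the conventional significance threshold of 3 for autosomal linkage. Small families with few informative meioses cannot reach such values, which is consistent with the wide range of scores reported above.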
Knowledge Discovery in Biological Databases for Revealing Candidate Genes Linked to Complex Phenotypes
Genetics and “omics” studies designed to uncover genotype to phenotype relationships often identify large numbers of potential candidate genes, among which the causal genes are hidden. Scientists generally lack the time and technical expertise to review all relevant information available from the literature, from key model species and from a potentially wide range of related biological databases in a variety of data formats with variable quality and coverage. Computational tools are needed for the integration and evaluation of heterogeneous information in order to prioritise candidate genes and components of interaction networks that, if perturbed through potential interventions, have a positive impact on the biological outcome in the whole organism without producing negative side effects. Here we review several bioinformatics tools and databases that play an important role in biological knowledge discovery and candidate gene prioritization. We conclude with several key challenges that need to be addressed in order to facilitate biological knowledge discovery in the future. 
Comorbidity of asthma and hypertension may be mediated by shared genetic dysregulation and drug side effects
Zolotareva O, Saik OV, Königs C, et al. Comorbidity of asthma and hypertension may be mediated by shared genetic dysregulation and drug side effects. Scientific Reports. 2019;9(1): 16302.
Asthma and hypertension are complex diseases coinciding more frequently than expected by chance. Unraveling the mechanisms of comorbidity of asthma and hypertension is necessary for choosing the most appropriate treatment plan for patients with this comorbidity. Since both diseases have a strong genetic component, in this article we aimed to find and study genes simultaneously associated with asthma and hypertension. We identified 330 shared genes and found that they form six modules on the interaction network. A strong overlap between genes associated with asthma and hypertension was found on the level of eQTL regulated genes and between targets of drugs relevant for asthma and hypertension. This suggests that the phenomenon of comorbidity of asthma and hypertension may be explained by altered genetic regulation or result from drug side effects. In this work we also demonstrate that not only drug indications but also contraindications provide an important source of molecular evidence helpful to uncover disease mechanisms. These findings give a clue to the possible mechanisms of comorbidity and highlight the direction for future research.
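Whether an overlap between two gene sets is larger than expected by chance is commonly assessed with a one-sided hypergeometric test. The sketch below shows that general idea only; the abstract does not state which test the authors used, and the function name and parameters are illustrative assumptions.

```python
from math import comb


def overlap_p_value(universe_size, set_a_size, set_b_size, overlap):
    """One-sided hypergeometric p-value: P(X >= overlap).

    Assumes set B is a uniformly random draw of set_b_size genes from a
    universe of universe_size genes, set_a_size of which belong to set A.
    """
    max_overlap = min(set_a_size, set_b_size)
    total = comb(universe_size, set_b_size)
    tail = sum(
        comb(set_a_size, k) * comb(universe_size - set_a_size, set_b_size - k)
        for k in range(max(overlap, 0), max_overlap + 1)
    )
    return tail / total
```

With a toy universe of 10 genes, a first set of 4 and a second set of 3, an overlap of at least 2 genes has probability 40/120 = 1/3 under random sampling, so such an overlap would not count as evidence of shared biology. The choice of universe (for example, all annotated genes versus all genes tested) strongly affects the result and has to be made explicit in a real analysis.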
New locus underlying auriculocondylar syndrome (ARCND): 430 kb duplication involving TWIST1 regulatory elements
Background: Auriculocondylar syndrome (ARCND) is a rare genetic disease that affects structures derived from the first and second pharyngeal arches, mainly resulting in micrognathia and auricular malformations. To date, pathogenic variants have been identified in three genes involved in the EDN1-DLX5/6 pathway (PLCB4, GNAI3 and EDN1) and some cases remain unsolved. Here we studied a large unsolved four-generation family. Methods: We performed linkage analysis, resequencing and Capture-C to investigate the causative variant of this family. To test the pathogenicity of the CNV found, we modelled the disease in patient craniofacial progenitor cells, including induced pluripotent stem cell (iPSC)-derived neural crest and mesenchymal cells. Results: This study highlights a fourth locus causative of ARCND, represented by a tandem duplication of 430 kb in a candidate region on chromosome 7 defined by linkage analysis. This duplication segregates with the disease in the family (LOD score=2.88) and includes HDAC9, which is located over 200 kb telomeric to the top candidate gene TWIST1. Notably, Capture-C analysis revealed multiple cis interactions between the TWIST1 promoter and possible regulatory elements within the duplicated region. Modelling of the disease revealed an increased expression of HDAC9 and its neighbouring gene, TWIST1, in neural crest cells. We also identified decreased migration of iPSC-derived neural crest cells together with dysregulation of osteogenic differentiation in iPSC-affected mesenchymal stem cells. Conclusion: Our findings support the hypothesis that the 430 kb duplication is causative of the ARCND phenotype in this family and that deregulation of TWIST1 expression during craniofacial development can contribute to the phenotype. Molecular Technology and Informatics for Personalised Medicine and Health
Integration and visualisation of clinical-omics datasets for medical knowledge discovery
In recent decades, the rise of various omics fields has flooded life sciences with unprecedented amounts of high-throughput data, which have transformed the way biomedical research is conducted. This trend will only intensify in the coming decades, as the cost of data acquisition will continue to decrease. Therefore, there is a pressing need to find novel ways to turn this ocean of raw data into waves of information and finally distil those into drops of translational medical knowledge. This is particularly challenging because of the incredible richness of these datasets, the humbling complexity of biological systems and the growing abundance of clinical metadata, which makes the integration of disparate data sources even more difficult.
Data integration has proven to be a promising avenue for knowledge discovery in biomedical research. Multi-omics studies allow us to examine a biological problem through different lenses using more than one analytical platform. These studies not only present tremendous opportunities for the deep and systematic understanding of health and disease, but they also pose new statistical and computational challenges. The work presented in this thesis aims to alleviate this problem with a novel pipeline for omics data integration.
Modern omics datasets are extremely feature rich and in multi-omics studies this complexity is compounded by a second or even third dataset. However, many of these features might be completely irrelevant to the studied biological problem or redundant in the context of others. Therefore, in this thesis, clinical metadata driven feature selection is proposed as a viable option for narrowing down the focus of analyses in biomedical research.
Our visual cortex has been fine-tuned through millions of years to become an outstanding pattern recognition machine. To leverage this incredible resource of the human brain, we need to develop advanced visualisation software that enables researchers to explore these vast biological datasets through illuminating charts and interactivity. Accordingly, a substantial portion of this PhD was dedicated to implementing truly novel visualisation methods for multi-omics studies.
Identifying disease-associated genes based on artificial intelligence
Identifying disease-gene associations can help improve the understanding of disease mechanisms, which has a variety of applications, such as early diagnosis and drug development. Although experimental techniques, such as linkage analysis and genome-wide association studies (GWAS), have identified a large number of associations, identifying disease genes is still challenging since experimental methods are usually time-consuming and expensive. To solve these issues, computational methods are proposed to predict disease-gene associations.
Based on the characteristics of existing computational algorithms in the literature, we can roughly divide them into three categories: network-based methods, machine learning-based methods, and other methods. No matter what models are used to predict disease genes, the proper integration of multi-level biological data is the key to improving prediction accuracy. This thesis addresses some limitations of the existing computational algorithms, and integrates multi-level data via artificial intelligence techniques. The thesis starts with a comprehensive review of computational methods, databases, and evaluation methods used in predicting disease-gene associations, followed by one network-based method and four machine learning-based methods.
The first chapter introduces the background information, objectives of the studies and structure of the thesis. After that, a comprehensive review is provided in the second chapter to discuss the existing algorithms as well as the databases and evaluation methods used in existing studies. With the objectives and future directions established, the thesis then presents five computational methods for predicting disease-gene associations.
The first method proposed in Chapter 3 considers the issue of non-disease gene selection. A shortest path-based strategy is used to select reliable non-disease genes from a disease gene network and a differential network. The selected genes are then used by a network-energy model to improve its performance. The second method proposed in Chapter 4 constructs sample-based networks for case samples and uses them to predict disease genes. This strategy improves the quality of protein-protein interaction (PPI) networks, which further improves the prediction accuracy. Chapter 5 presents a generic model which applies multimodal deep belief nets (DBN) to fuse different types of data. Network embeddings extracted from PPI networks and gene ontology (GO) data are fused with the multimodal DBN to obtain cross-modality representations. Chapter 6 presents another deep learning model which uses a convolutional neural network (CNN) to integrate gene similarities with other types of data. Finally, the fifth method proposed in Chapter 7 is a nonnegative matrix factorization (NMF)-based method. This method maps diseases and genes onto a lower-dimensional manifold, and the geodesic distance between diseases and genes is used to predict their associations. The method can predict disease genes even if the disease under consideration has no known associated genes.
In summary, this thesis has proposed several artificial intelligence-based computational algorithms to address the typical issues existing in computational algorithms. Experimental results have shown that the proposed methods can improve the accuracy of disease-gene prediction.
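The shortest path-based selection of non-disease genes mentioned for Chapter 3 can be pictured with a small sketch. The abstract does not give the actual rule, so the version below assumes one plausible reading: a gene is treated as a reliable negative when its shortest-path distance to every known disease gene in the network is at least some threshold. The function names, the `min_distance` parameter and the toy network are all hypothetical.

```python
from collections import deque


def distance_to_nearest_disease_gene(network, disease_genes):
    """Multi-source BFS: hops from each gene to its closest known disease gene.

    `network` maps each gene to the set of its neighbours (unweighted graph).
    Genes that cannot be reached from any disease gene are absent from the result.
    """
    distances = {gene: 0 for gene in disease_genes if gene in network}
    queue = deque(distances)
    while queue:
        gene = queue.popleft()
        for neighbour in network.get(gene, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[gene] + 1
                queue.append(neighbour)
    return distances


def select_non_disease_genes(network, disease_genes, min_distance=3):
    """Assumed rule: keep genes at least `min_distance` hops from all disease genes.

    Unreachable genes are treated as infinitely far away and are kept.
    """
    distances = distance_to_nearest_disease_gene(network, disease_genes)
    return sorted(
        gene
        for gene in network
        if distances.get(gene, float("inf")) >= min_distance
    )


# Hypothetical toy network: a chain A-B-C-D-E plus an isolated gene F.
toy_network = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
    "F": set(),
}
```

With `A` as the only known disease gene and the default threshold, `select_non_disease_genes(toy_network, {"A"})` keeps `D`, `E` and the isolated gene `F`. Whether disconnected genes should count as reliable negatives is a design choice; in sparse interaction networks they may simply be poorly studied.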
The molecular genetics of familial cardiomyopathy
Introduction: The cardiomyopathies are responsible for approximately 5.9 of 100,000 deaths in the general global population, and in sub-Saharan Africa (SSA) these myocardial diseases are observed in 21.4% of patients with heart failure. The precise etiology of the cardiomyopathies is currently not well known. Through our research we aim to contribute to the genetic landscape and bridge the gaps in knowledge for the different cardiomyopathies, as SSA could provide important insights into the cardiomyopathies and identify other possible disease mechanisms. Methods: Through next generation sequencing techniques such as whole exome sequencing and targeted resequencing we studied three South African families with severe cardiomyopathy. Clinical diagnosis and recruitment of cardiomyopathy patients into the study was done at Groote Schuur Hospital, Cape Town by a panel of experts. Next generation sequencing data was analysed and filtered through various stringent criteria and the final list of variants was validated through Sanger sequencing. Results: In the first multi-generational family with severe dilated cardiomyopathy (DCM) (DCM 334), we identified a pathogenic DMPK c.1067C>T(p.P356L) variant in the proband and her affected father. We also screened a cohort of 542 cardiomyopathy probands through Sanger sequencing of the DMPK gene and identified the DMPK c.1477C>T(p.R493C) variant as a variant of unknown significance. We then investigated a three-generation family with four family members affected with severe DCM (DCM343). We used whole exome sequencing and identified the pathogenic BAG3 c.925C>T (p.R309Ter) variant as the cause of disease within this family. Viral infection, anti-hypertensive medication and genetic modifiers in RYR1 and NEB contributed to the variable phenotype among the individuals with the BAG3 variant.
Through targeted resequencing we also identified the same pathogenic BAG3 variant in 2 of the 634 cardiomyopathy probands screened. In the third family, we investigated a South African family affected with severe arrhythmogenic cardiomyopathy (ACM). We used whole exome sequencing and targeted resequencing in combination and identified the pathogenic PKP2 c.2197_2202InsGdelCACACC (p.H733Afs*8) variant as the cause of disease in the proband and his father. We also present evidence of the ALPK3 c.2701C>T(p.Q901Ter) variant modifying the phenotypic manifestation, which correlates with the variable penetrance that is seen among ACM families. Conclusion: Through this project, we have identified many firsts. To the best of our knowledge, we are the first to show that DMPK is associated with primary DCM in severely affected young patients. As a first for South Africa, we not only identified the pathogenic BAG3 variant in a family with severe DCM, but we also identified the same variant in two additional probands, raising the possibility of a founder effect. In the third and final family with ACM, we identified the pathogenic PKP2 variant as the cause of disease within this family, with the novel ALPK3 variant acting as a possible modifier. Our research has added to what is currently known about the cardiomyopathies in Africa, but there is still much work to be done, as we believe we have only scratched the surface.