An innovative approach for testing bioinformatics programs using metamorphic testing
Background: Recent advances in experimental and computational technologies have fueled the development of many sophisticated bioinformatics programs. The correctness of such programs is crucial, as incorrectly computed results may lead to wrong biological conclusions or misguide downstream experimentation. Common software testing procedures involve executing the target program with a set of test inputs and then verifying the correctness of the test outputs. However, due to the complexity of many bioinformatics programs, it is often difficult to verify the correctness of the test outputs. Therefore, our ability to perform systematic software testing is greatly hindered.
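To illustrate the metamorphic testing idea named in the title: instead of checking a single output against a known correct answer, one checks that outputs from related inputs satisfy a relation that must hold. The sketch below is only an illustration under stated assumptions. The program under test (`gc_content`) and both relations are hypothetical examples chosen here, not taken from the paper.

```python
import math

# Hypothetical "program under test": fraction of G/C bases in a DNA sequence.
def gc_content(seq: str) -> float:
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def check_metamorphic_relations(seq: str) -> bool:
    """Check two illustrative metamorphic relations for gc_content.

    No expected output is needed; only the relation between the
    source output and the follow-up outputs is verified.
    """
    source = gc_content(seq)
    # MR1: the reverse complement swaps G<->C and A<->T, so the GC fraction is unchanged.
    mr1 = math.isclose(source, gc_content(reverse_complement(seq)))
    # MR2: concatenating the sequence with itself leaves the GC fraction unchanged.
    mr2 = math.isclose(source, gc_content(seq + seq))
    return mr1 and mr2
```

A violated relation signals a fault in the program even when the correct output for the original input is unknown.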
A model selection approach to discover age-dependent gene expression patterns using quantile regression models
Background: It has been a long-standing biological challenge to understand the molecular regulatory mechanisms behind mammalian ageing. Harnessing the availability of many ageing microarray datasets, a number of studies have shown that it is possible to identify genes that have age-dependent differential expression (DE) or differential variability (DV) patterns. The majority of the studies identify "interesting" genes using a linear regression approach, which is known to perform poorly in the presence of outliers or if the underlying age-dependent pattern is non-linear. Clearly, a more robust and flexible approach is needed to identify genes with various age-dependent gene expression patterns. Results: Here we present a novel model selection approach to discover genes with linear or non-linear age-dependent gene expression patterns from microarray data. To identify DE genes, our method fits three quantile regression models (constant, linear and piecewise linear models) to the expression profile of each gene, and selects the least complex model that best fits the available data. Similarly, DV genes are identified by fitting and comparing two quantile regression models (non-DV and the DV models) to the expression profile of each gene. We show that our approach is much more robust than the standard linear regression approach in discovering age-dependent patterns. We also applied our approach to analyze two human brain ageing datasets and found many biologically interesting gene expression patterns, including some very interesting DV patterns, that have been overlooked in the original studies. Furthermore, we propose that our model selection approach can be extended to discover DE and DV genes from microarray datasets with discrete class labels, by considering different quantile regression models. Conclusion: In this paper, we present a novel application of quantile regression models to identify genes that have interesting linear or non-linear age-dependent expression patterns. One important contribution of this paper is to introduce a model selection approach to DE and DV gene identification, which is most commonly tackled by null hypothesis testing approaches. We show that our approach is robust in analyzing real and simulated datasets. We believe that our approach is applicable in many ageing or time-series data analysis tasks.
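The model selection idea described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It compares only the constant and linear median-regression models (the piecewise linear model is omitted for brevity) and fits them by brute force. The selection score is a BIC-style penalty on the check loss and is an assumption made here, since the abstract does not state the criterion used. All function names are hypothetical.

```python
import math
from itertools import combinations

def check_loss(residuals, tau=0.5):
    """Quantile (pinball) loss summed over residuals."""
    return sum(r * (tau - (r < 0)) for r in residuals)

def fit_constant(x, y, tau=0.5):
    """Constant model: the best level is one of the observed values."""
    best = min(y, key=lambda c: check_loss([yi - c for yi in y], tau))
    return (best, 0.0), check_loss([yi - best for yi in y], tau)

def fit_linear(x, y, tau=0.5):
    """Linear model by brute force: an optimal quantile-regression line
    passes through at least two data points, so try every pair."""
    best_params, best_loss = None, math.inf
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = check_loss(
            [yk - (intercept + slope * xk) for xk, yk in zip(x, y)], tau
        )
        if loss < best_loss:
            best_params, best_loss = (intercept, slope), loss
    return best_params, best_loss

def select_model(x, y, tau=0.5):
    """Pick the model with the lowest penalized check loss.

    The score n*log(loss/n) + k*log(n) is an assumed BIC-style
    criterion, with k the number of fitted parameters.
    """
    n = len(y)
    fits = {
        "constant": (1, fit_constant(x, y, tau)),
        "linear": (2, fit_linear(x, y, tau)),
    }

    def score(name):
        k, (_, loss) = fits[name]
        return n * math.log(max(loss, 1e-12) / n) + k * math.log(n)

    name = min(fits, key=score)
    return name, fits[name][1][0]
```

Because the check loss grows linearly rather than quadratically with the residual, a single outlying sample barely moves the fitted line, which is the robustness property the abstract contrasts with ordinary linear regression.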
Binding of Transcription Factor GabR to DNA Requires Recognition of DNA Shape at a Location Distinct from its Cognate Binding Site
Mechanisms for transcription factor recognition of specific DNA base sequences are well characterized, and recent studies demonstrate that the shape of these cognate binding sites is also important. Here, we uncover a new mechanism where the transcription factor GabR simultaneously recognizes two cognate binding sites and the shape of a 29 bp DNA sequence that bridges these sites. Small-angle X-ray scattering and multi-angle laser light scattering are consistent with a model where the DNA undergoes a conformational change to bend around GabR during binding. In silico predictions suggest that the bridging DNA sequence is likely to be bendable in one direction, and kinetic analysis of mutant DNA sequences with biolayer interferometry allowed independent quantification of the relative contributions of DNA base and shape recognition to the GabR–DNA interaction. These results indicate that the two cognate binding sites, as well as the bendability of the DNA sequence between them, are required to form a stable complex. The mechanism of GabR–DNA interaction provides an example where the correct shape of DNA, at a location clearly distinct from the cognate binding site, is required for transcription factor binding, and has implications for bioinformatics searches for novel binding sites.
Genetic screening reveals phospholipid metabolism as a key regulator of the biosynthesis of the redox-active lipid coenzyme Q.
Mitochondrial energy production and function rely on optimal concentrations of the essential redox-active lipid, coenzyme Q (CoQ). CoQ deficiency results in mitochondrial dysfunction associated with increased mitochondrial oxidative stress and a range of pathologies. What drives CoQ deficiency in many of these pathologies is unknown, just as there currently is no effective therapeutic strategy to overcome CoQ deficiency in humans. To date, large-scale studies aimed at systematically interrogating endogenous systems that control CoQ biosynthesis and their potential utility to treat disease have not been carried out. Therefore, we developed a quantitative high-throughput method to determine CoQ concentrations in yeast cells. Applying this method to the Yeast Deletion Collection as a genome-wide screen, 30 genes not known previously to regulate cellular concentrations of CoQ were discovered. In combination with untargeted lipidomics and metabolomics, phosphatidylethanolamine N-methyltransferase (PEMT) deficiency was confirmed as a positive regulator of CoQ synthesis, the first identified to date. Mechanistically, PEMT deficiency alters mitochondrial concentrations of one-carbon metabolites, characterized by an increase in the S-adenosylmethionine to S-adenosylhomocysteine (SAM-to-SAH) ratio that reflects mitochondrial methylation capacity, drives CoQ synthesis, and is associated with a decrease in mitochondrial oxidative stress. The newly described regulatory pathway appears evolutionarily conserved, as ablation of PEMT using antisense oligonucleotides increases mitochondrial CoQ in mouse-derived adipocytes, which translates to improved glucose utilization by these cells and protection of mice from high-fat diet-induced insulin resistance. Our studies reveal a previously unrecognized relationship between two spatially distinct lipid pathways with potential implications for the treatment of CoQ deficiencies, mitochondrial oxidative stress/dysfunction, and associated diseases.
ChIP-chip versus ChIP-seq: Lessons for experimental design and data analysis
Background: Chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) allows genome-wide discovery of protein-DNA interactions such as transcription factor bindings and histone modifications. Previous reports only compared a small number of profiles, and little has been done to compare histone modification profiles generated by the two technologies or to assess the impact of input DNA libraries in ChIP-seq analysis. Here, we performed a systematic analysis of a modENCODE dataset consisting of 31 pairs of ChIP-chip/ChIP-seq profiles of the coactivator CBP, RNA polymerase II (RNA PolII), and six histone modifications across four developmental stages of Drosophila melanogaster. Results: Both technologies produce highly reproducible profiles within each platform, but ChIP-seq generally produces profiles with a better signal-to-noise ratio and allows detection of more peaks and narrower peaks. The set of peaks identified by the two technologies can be significantly different, but the extent to which they differ varies depending on the factor and the analysis algorithm. Importantly, we found that there is a significant variation among multiple sequencing profiles of input DNA libraries and that this variation most likely arises from both differences in experimental condition and sequencing depth. We further show that using an inappropriate input DNA profile can impact the average signal profiles around genomic features and peak calling results, highlighting the importance of having high quality input DNA data for normalization in ChIP-seq analysis. Conclusions: Our findings highlight the biases present in each of the platforms, show the variability that can arise from both technology and analysis methods, and emphasize the importance of obtaining high quality and deeply sequenced input DNA libraries for ChIP-seq analysis.
A genetic ensemble approach for gene-gene interaction identification
Background: It has now become clear that gene-gene interactions and gene-environment interactions are ubiquitous and fundamental mechanisms for the development of complex diseases. Though considerable effort has been put into developing statistical models and algorithmic strategies for identifying such interactions, the accurate identification of those genetic interactions has proven to be very challenging. Methods: In this paper, we propose a new approach for identifying such gene-gene and gene-environment interactions underlying complex diseases. This is a hybrid algorithm that combines a genetic algorithm (GA) and an ensemble of classifiers (called genetic ensemble). Using this approach, the original problem of SNP interaction identification is converted into a data mining problem of combinatorial feature selection. By collecting various single nucleotide polymorphism (SNP) subsets as well as environmental factors generated in multiple GA runs, patterns of gene-gene and gene-environment interactions can be extracted using a simple combinatorial ranking method. Also considered in this study is the idea of combining identification results obtained from multiple algorithms. A novel formula based on pairwise double fault is designed to quantify the degree of complementarity. Conclusions: Our simulation study demonstrates that the proposed genetic ensemble algorithm has comparable identification power to Multifactor Dimensionality Reduction (MDR) and is slightly better than Polymorphism Interaction Analysis (PIA), which are the two most popular methods for gene-gene interaction identification. More importantly, the identification results generated by using our genetic ensemble algorithm are highly complementary to those obtained by PIA and MDR. Experimental results from our simulation studies and real world data application also confirm the effectiveness of the proposed genetic ensemble algorithm, as well as the potential benefits of combining identification results from different algorithms.
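For context on the pairwise double fault mentioned in this abstract: the standard double-fault measure from the classifier-ensemble literature is the fraction of cases that two predictors both get wrong. The sketch below computes only that standard measure. The paper's own complementarity formula is not given in the abstract and is not reproduced here, and the function name is hypothetical.

```python
def double_fault(y_true, pred_a, pred_b):
    """Fraction of cases misclassified by both predictors.

    A lower value means the two predictors rarely fail on the same
    cases, i.e. they are more complementary.
    """
    if not y_true or not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("inputs must be non-empty and of equal length")
    both_wrong = sum(
        1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b != t
    )
    return both_wrong / len(y_true)
```

For example, `double_fault([1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 1])` counts only the second case as a joint failure.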
A voting approach to identify a small number of highly predictive genes using multiple classifiers
Background: Microarray gene expression profiling has provided extensive datasets that can describe characteristics of cancer patients. An important challenge for this type of data is the discovery of gene sets which can be used as the basis of developing a clinical predictor for cancer. It is desirable that such gene sets be compact, give accurate predictions across many classifiers, be biologically relevant and have good biological process coverage. Results: By using a new type of multiple classifier voting approach, we have identified gene sets that can predict breast cancer prognosis accurately, for a range of classification algorithms. Unlike a wrapper approach, our method is not specialised towards a single classification technique. Experimental analysis demonstrates higher prediction accuracies for our sets of genes compared to previous work in the area. Moreover, our sets of genes are generally more compact than those previously proposed. Taking a biological viewpoint, from the literature, most of the genes in our sets are known to be strongly related to cancer. Conclusion: We show that it is possible to obtain superior classification accuracy with our approach and obtain a compact gene set that is also biologically relevant and has good coverage of different biological processes.