52 research outputs found

    Wide-ranging functions of E2F4 in transcriptional activation and repression revealed by genome-wide analysis

    The E2F family of transcription factors has important roles in cell cycle progression. E2F4 is an E2F family member that has been proposed to be primarily a repressor of transcription, but the scope of its binding activity and functions in transcriptional regulation is not fully known. We used ChIP sequencing (ChIP-seq) to identify around 16 000 E2F4 binding sites, which potentially regulate 7346 downstream target genes with wide-ranging functions in DNA repair, cell cycle regulation, apoptosis, and other processes. While just over half of all E2F4 binding sites (56%) occurred near transcription start sites (TSSs), ∼20% of sites occurred more than 20 kb away from any annotated TSS. These distal sites showed histone modifications suggesting that E2F4 may function as a long-range regulator, which we confirmed by functional experimental assays on a subset. Overexpression of E2F4, its transcriptional cofactors of the retinoblastoma (Rb) family, and its binding partner DP-1 revealed that E2F4 acts as an activator as well as a repressor. E2F4 binding sites also occurred near regulatory elements for miRNAs such as let-7a and mir-17, suggesting that E2F4 regulates miRNAs. Taken together, our genome-wide analysis provided evidence of versatile roles of E2F4 and insights into its functions.

    Shape-based peak identification for ChIP-Seq

    We present a new algorithm for the identification of bound regions from ChIP-seq experiments. Our method for identifying statistically significant peaks from read coverage is inspired by the notion of persistence in topological data analysis and provides a non-parametric approach that is robust to noise in experiments. Specifically, our method reduces the peak calling problem to the study of tree-based statistics derived from the data. We demonstrate the accuracy of our method on existing datasets, and we show that it can discover previously missed regions and can more clearly discriminate between multiple binding events. The software T-PIC (Tree shape Peak Identification for ChIP-Seq) is available at http://math.berkeley.edu/~vhower/tpic.html. Comment: 12 pages, 6 figures.
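The notion of persistence mentioned in this abstract can be illustrated on a toy coverage track. The sketch below is not the T-PIC algorithm (which the abstract says relies on tree-based statistics); it is a minimal, assumed illustration of 0-dimensional persistence for a 1D signal, and the function name `peak_persistence` and the toy coverage values are hypothetical.

```python
def peak_persistence(coverage):
    """Map the index of each local maximum to its topological persistence.

    Sweeps a threshold from the highest value downward (superlevel sets)
    and tracks connected components with a union-find forest.
    """
    parent = {}  # union-find forest; each root is its component's highest peak

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    persistence = {}
    # Visit positions from highest to lowest coverage (stable on ties).
    for i in sorted(range(len(coverage)), key=lambda k: -coverage[k]):
        parent[i] = i
        roots = {find(j) for j in (i - 1, i + 1) if j in parent}
        if not roots:
            continue  # no neighbour seen yet: a new component is born here
        # The component with the highest peak survives the merge.
        best = max(roots, key=lambda r: (coverage[r], -r))
        parent[i] = best
        for r in roots - {best}:
            # The lower peak dies at the current height.
            persistence[r] = coverage[r] - coverage[i]
            parent[r] = best
    if coverage:
        top = find(0)  # the global maximum never dies; cap at the minimum
        persistence[top] = coverage[top] - min(coverage)
    return persistence


# Hypothetical read-coverage values, for illustration only.
toy_coverage = [0, 3, 1, 5, 2, 4, 0]
toy_result = peak_persistence(toy_coverage)
```

In a sketch like this, peaks with low persistence would be the natural candidates to treat as noise, while high-persistence peaks stand out from their surroundings.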

    Multi Agent System for Machine Learning Under Uncertainty in Cyber Physical Manufacturing System

    Recent advances in predictive machine learning have led to its application in various use cases in manufacturing. Most research has focused on maximising predictive accuracy without addressing the uncertainty associated with it. While accuracy is important, focusing primarily on it poses an overfitting danger, exposing manufacturers to risk and ultimately hindering the adoption of these techniques. In this paper, we determine the sources of uncertainty in machine learning and establish the success criteria for a machine learning system to function well under uncertainty in a cyber-physical manufacturing system (CPMS) scenario. Then, we propose a multi-agent system architecture which leverages probabilistic machine learning as a means of achieving such criteria. We propose possible scenarios for which our architecture is useful and discuss future work. Experimentally, we implement Bayesian Neural Networks for multi-task classification on a public dataset for the real-time condition monitoring of a hydraulic system and demonstrate the usefulness of the system by evaluating the probability of a prediction being accurate given its uncertainty. We deploy these models using our proposed agent-based framework and integrate web visualisation to demonstrate its real-time feasibility.

    Novel epigenetic clock for fetal brain development predicts prenatal age for cellular stem cell models and derived neurons

    Induced pluripotent stem cells (iPSCs) and their differentiated neurons (iPSC-neurons) are a widely used cellular model in the research of the central nervous system. However, it is unknown how well they capture age-associated processes, particularly given that pluripotent cells are only present during the earliest stages of mammalian development. Epigenetic clocks utilize coordinated age-associated changes in DNA methylation to make predictions that correlate strongly with chronological age. It has been shown that the induction of pluripotency rejuvenates predicted epigenetic age. As existing clocks are not optimized for the study of brain development, we developed the fetal brain clock (FBC), a bespoke epigenetic clock trained in human prenatal brain samples, in order to investigate more precisely the epigenetic age of iPSCs and iPSC-neurons. The FBC was tested in two independent validation cohorts across a total of 194 samples, confirming that the FBC outperforms other established epigenetic clocks in fetal brain cohorts. We applied the FBC to DNA methylation data from iPSCs and embryonic stem cells and their derived neuronal precursor cells and neurons, finding that these cell types are epigenetically characterized as having an early fetal age. Furthermore, while differentiation from iPSCs to neurons significantly increases epigenetic age, iPSC-neurons are still predicted as being fetal. Together, our findings reiterate the need to better understand the limitations of existing epigenetic clocks for answering biological research questions and highlight a limitation of iPSC-neurons as a cellular model of age-related diseases.

    A Genome-Wide Screen for Genetic Variants That Modify the Recruitment of REST to Its Target Genes

    Increasing numbers of human diseases are being linked to genetic variants, but our understanding of the mechanistic links leading from DNA sequence to disease phenotype is limited. The majority of disease-causing nucleotide variants fall within the non-protein-coding portion of the genome, making it likely that they act by altering gene regulatory sequences. We hypothesised that SNPs within the binding sites of the transcriptional repressor REST alter the degree of repression of target genes. Given that changes in the effective concentration of REST contribute to several pathologies (various cancers, Huntington's disease, cardiac hypertrophy, vascular smooth muscle proliferation), these SNPs should alter disease susceptibility in carriers. We devised a strategy to identify SNPs that affect the recruitment of REST to target genes through the alteration of its DNA recognition element, the RE1. A multi-step screen combining genetic, genomic, and experimental filters yielded 56 polymorphic RE1 sequences with robust and statistically significant differences of affinity between alleles. These SNPs have a considerable effect on the functional recruitment of REST to DNA in a range of in vitro, reporter gene, and in vivo analyses. Furthermore, we observe allele-specific biases in deeply sequenced chromatin immunoprecipitation data, consistent with predicted differences in RE1 affinity. Amongst the targets of polymorphic RE1 elements are important disease genes including NPPA, PTPRT, and CDH4. Thus, considerable genetic variation exists in the DNA motifs that connect gene regulatory networks. Recently available ChIP-seq data allow the annotation of human genetic polymorphisms with regulatory information to generate prior hypotheses about their disease-causing mechanism.

    Mutagenesis Objective Search and Selection Tool (MOSST): an algorithm to predict structure-function related mutations in proteins

    Background: Functionally relevant artificial or natural mutations are difficult to assess or predict if no structure-function information is available for a protein. This is especially important for correctly identifying functionally significant non-synonymous single nucleotide polymorphisms (nsSNPs) or for designing a site-directed mutagenesis strategy for a target protein. A new methodology is proposed to guide these two decision strategies, based only on conservation rules of physicochemical properties of amino acids extracted from a multiple alignment of the protein family to which the target protein belongs, with no need for explicit structure-function relationships.
    Results: A statistical analysis is performed over each amino acid position in the multiple protein alignment, based on different amino acid physical or chemical characteristics, including hydrophobicity, side-chain volume, charge and protein conformational parameters. The variances of each of these properties at each position are combined to obtain a global statistical indicator of the conservation degree of each property. Different types of physicochemical conservation are defined to characterize relevant and irrelevant positions. The differences between statistical variances are taken together as the basis of hypothesis tests at each position to search for functionally significant mutable sites and to identify specific mutagenesis targets. The outcome is used to statistically predict physicochemical consensus sequences based on different properties and to calculate the amino acid propensities at each position in a given protein. Hence, amino acid positions are identified that are putatively responsible for function, specificity, stability or binding interactions in a family of proteins. Once these key functional positions are identified, position-specific statistical distributions are applied to divide the 20 common protein amino acids at each position of the protein's primary sequence into a group of functionally non-disruptive amino acids and a second group of functionally deleterious amino acids.
    Conclusions: With this approach, not only can conserved amino acid positions in a protein family be labeled as functionally relevant, but non-conserved amino acid positions can also be identified as having a physicochemically meaningful functional effect. These results become a discriminative tool in the selection and elaboration of rational mutagenesis strategies for the protein. They can also be used to predict whether a given nsSNP, identified, for instance, in a genomic-scale analysis, can have a functional implication for a particular protein and which nsSNPs are most likely to be functionally silent for a protein. This analytical tool could be used to rapidly and automatically discard any irrelevant nsSNP and guide the research focus toward functionally significant mutations. Based on preliminary results and applications, this technique shows promising performance as a valuable bioinformatics tool to aid in the development of new protein variants and in the understanding of function-structure relationships in proteins.
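The per-position variance analysis described in this abstract can be pictured with a small example. The sketch below is not the MOSST implementation: it is an assumed, minimal illustration that computes the variance of a single property (the Kyte-Doolittle hydropathy scale, chosen here as a stand-in for "hydrophobicity") at each column of a toy alignment. The function name `column_variances` and the toy sequences are hypothetical.

```python
from statistics import pvariance

# Kyte-Doolittle hydropathy index (an assumed choice of hydrophobicity scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_variances(alignment, scale=KYTE_DOOLITTLE):
    """Population variance of a property at each alignment column.

    Gap characters and unknown symbols are skipped; a column with no
    scorable residues yields None.
    """
    variances = []
    for column in zip(*alignment):
        values = [scale[aa] for aa in column if aa in scale]
        variances.append(pvariance(values) if values else None)
    return variances


# Hypothetical four-sequence alignment, for illustration only.
toy_alignment = ["MKLV", "MRIV", "MKLA", "M-VV"]
toy_variances = column_variances(toy_alignment)
```

A low variance marks a column where the property is conserved even if the residues differ (for example K and R in the second column), which is the kind of physicochemical conservation the abstract contrasts with plain sequence identity.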

    Genomic sequencing in clinical trials

    Human genome sequencing is the process by which the exact order of nucleic acid base pairs in the 24 human chromosomes is determined. Since the completion of the Human Genome Project in 2003, genomic sequencing has rapidly become a major part of our translational research efforts to understand and improve human health and disease. This article reviews the current and future directions of clinical research with respect to genomic sequencing, a technology that is just beginning to find its way into clinical trials both nationally and worldwide. We highlight the currently available types of genomic sequencing platforms, outline the advantages and disadvantages of each, and compare first- and next-generation techniques with respect to capabilities, quality, and cost. We describe the current geographical distributions and types of disease conditions in which these technologies are used, and how next-generation sequencing is strategically being incorporated into new and existing studies. Lastly, recent major breakthroughs and the ongoing challenges of using genomic sequencing in clinical research are discussed.

    Screening of conditions controlling spectrophotometric sequential injection analysis

    Background: Despite its potential benefits over univariate optimization, chemometrics is rarely utilized for optimizing sequential injection analysis (SIA) methods. Specifically, in previous vis-spectrophotometric SIA methods, chemometrically optimized conditions were confined to flow rate and reagent concentrations, while other conditions were ignored.
    Results: The current manuscript reports, for the first time, a comprehensive screening of conditions controlling vis-spectrophotometric SIA. A new diclofenac assay method was adopted. The method was based on oxidizing diclofenac by permanganate (a major reagent) with sulfuric acid (a minor reagent). The reaction produced a spectrophotometrically detectable diclofenac form. A 2^6 full-factorial design was utilized to study the effect of the volumes of reagents and sample, in addition to flow rate and concentrations of reagents. The main effects and the interaction effects of all orders on method performance, namely sensitivity, rapidity and reagent consumption, were determined. The method was validated and applied to pharmaceutical formulations (tablets, injection and gel).
    Conclusions: Although the 64 experiments conducted in the current study were cumbersome, the results obtained would reduce effort and time when developing similar SIA methods in the future. It is recommended to critically optimize effective and interacting conditions using other optimization tools such as fractional-factorial design, response surface and simplex, rather than the full-factorial design that is used at an initial optimization stage. In vis-spectrophotometric SIA methods that involve developing reactions with two reagents (major and minor), conditions affecting method performance are in the following order: sample volume > flow rate ≈ major reagent concentration >> major reagent volume ≈ minor reagent concentration >> minor reagent volume.
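The link between the 2^6 full-factorial design and the 64 experiments in this abstract can be made concrete with a short sketch. It enumerates every combination of low (-1) and high (+1) levels for the six factors named in the abstract and shows how a main effect is estimated as the difference between mean responses at the two levels. The coded levels, the helper names `full_factorial` and `main_effect`, and the response values are assumptions for illustration; the responses are synthetic and are not data from the study.

```python
from itertools import product

# The six conditions named in the abstract.
FACTORS = [
    "sample volume",
    "flow rate",
    "major reagent concentration",
    "major reagent volume",
    "minor reagent concentration",
    "minor reagent volume",
]


def full_factorial(factors, levels=(-1, +1)):
    """Every combination of coded levels: len(levels) ** len(factors) runs."""
    return [
        dict(zip(factors, combo))
        for combo in product(levels, repeat=len(factors))
    ]


def main_effect(runs, responses, factor):
    """Mean response at the high level minus mean response at the low level."""
    high = [y for run, y in zip(runs, responses) if run[factor] == +1]
    low = [y for run, y in zip(runs, responses) if run[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


runs = full_factorial(FACTORS)

# Synthetic responses (NOT measured data): only flow rate matters here.
synthetic_responses = [2 * run["flow rate"] for run in runs]
flow_effect = main_effect(runs, synthetic_responses, "flow rate")
sample_effect = main_effect(runs, synthetic_responses, "sample volume")
```

Because the design is balanced, each factor sits at its high level in exactly half of the 64 runs, so the effect of one factor is estimated without interference from the others.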

    A User's Guide to the Encyclopedia of DNA Elements (ENCODE)

    The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome. National Human Genome Research Institute (U.S.); National Institutes of Health (U.S.)

    Replication Fork Polarity Gradients Revealed by Megabase-Sized U-Shaped Replication Timing Domains in Human Cell Lines

    In higher eukaryotes, replication program specification in different cell types remains to be fully understood. We show for seven human cell lines that about half of the genome is divided into domains that display a characteristic U-shaped replication timing profile with early initiation zones at borders and late replication at centers. Significant overlap is observed between U-domains of different cell lines and also with germline replication domains exhibiting an N-shaped nucleotide compositional skew. From the demonstration that the average fork polarity is directly reflected by both the compositional skew and the derivative of the replication timing profile, we argue that the N-shape of this derivative in U-domains supports the existence of large-scale gradients of replication fork polarity in somatic and germline cells. Analysis of chromatin interaction (Hi-C) and chromatin marker data reveals that U-domains correspond to high-order chromatin structural units. We discuss possible models for replication origin activation within U/N-domains. The compartmentalization of the genome into replication U/N-domains provides new insights into the organization of the replication program in the human genome.