24 research outputs found

    A unified encyclopedia of human functional DNA elements through fully automated annotation of 164 human cell types [preprint]

    Semi-automated genome annotation methods such as Segway enable understanding of chromatin activity. Here we present chromatin state annotations of 164 human cell types using 1,615 genomics data sets. To produce these annotations, we developed a fully automated annotation strategy in which we train separate unsupervised annotation models on each cell type and use a machine learning classifier to automate the state interpretation step. Using these annotations, we developed a measure of the functional importance of each genomic position called the functionality score, which allows us to aggregate information across cell types into a multi-cell type view. This score provides a measure of importance directly attributable to a specific activity in a specific set of cell types. In contrast to evolutionary conservation, this measure is not biased to detect only elements shared with related species. Using the functionality score, we combined all our annotations into a single cell type-agnostic encyclopedia that catalogs all human functional regulatory elements, enabling easy and intuitive interpretation of the effect of genome variants on phenotype, such as in disease-associated, evolutionarily conserved or positively selected loci. These resources, including cell type-specific annotations, the encyclopedia, and a visualization server, are available at http://noble.gs.washington.edu/proj/encyclopedia

    Phase transition in Random Circuit Sampling

    Quantum computers hold the promise of executing tasks beyond the capability of classical computers. Noise competes with coherent evolution and destroys long-range correlations, making it an outstanding challenge to fully leverage the computation power of near-term quantum processors. We report Random Circuit Sampling (RCS) experiments where we identify distinct phases driven by the interplay between quantum dynamics and noise. Using cross-entropy benchmarking, we observe phase boundaries which can define the computational complexity of noisy quantum evolution. We conclude by presenting an RCS experiment with 70 qubits at 24 cycles. We estimate the computational cost against improved classical methods and demonstrate that our experiment is beyond the capabilities of existing classical supercomputers

    Realizing topologically ordered states on a quantum processor

    The discovery of topological order has revolutionized the understanding of quantum matter in modern physics and provided the theoretical foundation for many quantum error correcting codes. Realizing topologically ordered states has proven to be extremely challenging in both condensed matter and synthetic quantum systems. Here, we prepare the ground state of the toric code Hamiltonian using an efficient quantum circuit on a superconducting quantum processor. We measure a topological entanglement entropy near the expected value of ln 2, and simulate anyon interferometry to extract the braiding statistics of the emergent excitations. Furthermore, we investigate key aspects of the surface code, including logical state injection and the decay of the non-local order parameter. Our results demonstrate the potential for quantum processors to provide key insights into topological quantum matter and quantum error correction. Comment: 6 pages, 4 figures, plus supplementary material

    Graphical models and automatic speech recognition

    Graphical models provide a promising paradigm to study both existing and novel techniques for automatic speech recognition. This paper first provides a brief overview of graphical models and their uses as statistical models. It is then shown that the statistical assumptions behind many pattern recognition techniques commonly used as part of a speech recognition system can be described by a graph – this includes Gaussian distributions, mixture models, decision trees, factor analysis, principal component analysis, linear discriminant analysis, and hidden Markov models. Moreover, this paper shows that many advanced models for speech recognition and language processing can also be simply described by a graph, including many at the acoustic-, pronunciation-, and language-modeling levels. A number of speech recognition techniques born directly out of the graphical-models paradigm are also surveyed. Additionally, this paper includes a novel graphical analysis regarding why derivative (or delta) features improve hidden Markov model-based speech recognition by improving structural discriminability. It also includes an example where a graph can be used to represent language model smoothing constraints. As will be seen, the space of models describable by a graph is quite large. A thorough exploration of this space should yield techniques that ultimately will supersede the hidden Markov model.

    Submodular Selection of Assays v0.1

    Submodular-Selection-of-Assays (SSA). Please see the following manuscript for more details: Kai Wei*, Maxwell W. Libbrecht*, Jeffrey A. Bilmes, William S. Noble. "Evaluation and selection of panels of genomics assays." Submitted. Get the most recent version on GitHub: https://github.com/melodi-lab/Submodular-Selection-of-Assays. Abstract: Due to the high cost of sequencing-based genomics assays such as ChIP-seq and DNase-seq, epigenomic characterization of a cell type is typically carried out using a small panel of assay types. Deciding a priori which assays to perform is thus a critical step in many studies. We present submodular selection of assays (SSA), a method for choosing a diverse panel of genomic assays that leverages methods from the field of submodular optimization. More generally, this application serves as a model for how submodular optimization can be applied to other discrete problems in biology.
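
    As a rough illustration of what choosing a diverse panel by submodular optimization can look like, the sketch below greedily maximizes a facility-location objective under a panel-size limit. The facility-location objective, the pairwise similarity matrix, and the names facility_location and greedy_select are assumptions made for this example; the abstract does not state which objective or algorithm SSA actually uses.

```python
from typing import List, Sequence


def facility_location(similarity: Sequence[Sequence[float]], panel: List[int]) -> float:
    """Sum, over all assays, of the best similarity to any assay in the panel."""
    if not panel:
        return 0.0
    return sum(max(row[j] for j in panel) for row in similarity)


def greedy_select(similarity: Sequence[Sequence[float]], k: int) -> List[int]:
    """Pick up to k assays, each time adding the one with the largest marginal gain."""
    n = len(similarity)
    panel: List[int] = []
    for _ in range(min(k, n)):
        current = facility_location(similarity, panel)
        best_gain, best_idx = float("-inf"), -1
        for j in range(n):
            if j in panel:
                continue
            gain = facility_location(similarity, panel + [j]) - current
            if gain > best_gain:  # strict comparison: ties go to the lower index
                best_gain, best_idx = gain, j
        panel.append(best_idx)
    return panel


# Hypothetical similarity matrix: assays 0 and 1 are near-duplicates, as are 2 and 3.
SIMILARITY = [
    [1.0, 0.75, 0.125, 0.125],
    [0.75, 1.0, 0.125, 0.125],
    [0.125, 0.125, 1.0, 0.5],
    [0.125, 0.125, 0.5, 1.0],
]

if __name__ == "__main__":
    print(greedy_select(SIMILARITY, 2))
```

    With this toy matrix, a panel of size two takes one assay from each group of near-duplicates, which is the diversity behavior the abstract describes. For monotone submodular objectives such as this one, the greedy rule is known to reach at least a (1 - 1/e) fraction of the optimal value.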

    Additional file 2 of Choosing panels of genomics assays using submodular optimization

    List of all assays used. File is in gzipped, tab-delimited format. Columns correspond to: (1) assay type, (2) cell type, (3) file name of file on original server, and (4) URL of server that the file was downloaded from. (TAB 300 kb)
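
    A file with this layout could be read along the following lines. This is a minimal sketch: the column names in COLUMNS, the function name read_assay_list, and the assumption that the file has no header row are illustrative and are not taken from the file itself.

```python
import csv
import gzip
from typing import Dict, Iterator

# Assumed names for the four columns described in the listing.
COLUMNS = ("assay_type", "cell_type", "file_name", "server_url")


def read_assay_list(path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per row of a gzipped, tab-delimited assay list."""
    with gzip.open(path, mode="rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) != len(COLUMNS):
                continue  # skip blank or malformed lines
            yield dict(zip(COLUMNS, row))
```

    Each yielded record can then be grouped by assay_type or cell_type using ordinary dictionary operations.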