62 research outputs found

    Classifying and Generating Repetitive Elements in the Genome Using Deep Learning

    Repetitive elements are sequence patterns in the genome that are duplicated many times. They serve important functions in both genomic preservation and evolution, which creates a need for fast and accurate classification. The current gold standard for repeat identification is to match a given query sequence against a well-annotated library of repetitive elements. However, annotation quality is highly variable across species, so de novo methods must be used for genomes whose repeats are poorly annotated. A common de novo approach is to first check the sequence for protein domain conservation. The presence and order of these protein domains are then used as features for an expert-crafted rule-based system or an optimized machine learning classifier. Although de novo approaches have achieved modest success, two problems remain. First, they require lengthy consensus sequences, which take time to assemble and may not be representative of the true diversity of repetitive elements in the sample. Second, they rely heavily on hand-picking a comprehensive set of protein domains, which may need to be adjusted constantly as new repetitive elements are discovered. In this thesis I show that deep learning models are competitive with pattern-matching-based approaches at the level of a shotgun sequencing strand for de novo classification of repeat elements. I also explore ways of embedding sequences using deep learning models. Finally, I make these tools available through a web-based interface. Bachelor of Science

    A Dataflow Graphical Language for Database Applications

    In this paper we discuss a graphical language for information retrieval and processing. A lot of recent activity has occurred in the area of improving access to database systems. However, current results are restricted to simple interfacing of database systems. We propose a graphical language for specifying complex applications

    PACE: Pattern Accurate Computationally Efficient Bootstrapping for Timely Discovery of Cyber-Security Concepts

    Public disclosure of important security information, such as knowledge of vulnerabilities or exploits, often occurs in blogs, tweets, mailing lists, and other online sources months before proper classification into structured databases. In order to facilitate timely discovery of such knowledge, we propose a novel semi-supervised learning algorithm, PACE, for identifying and classifying relevant entities in text sources. The main contribution of this paper is an enhancement of the traditional bootstrapping method for entity extraction by employing a time-memory trade-off that simultaneously circumvents a costly corpus search while strengthening pattern nomination, which should increase accuracy. An implementation in the cyber-security domain is discussed as well as challenges to Natural Language Processing imposed by the security domain. Comment: 6 pages, 3 figures, ieeeTran conference. International Conference on Machine Learning and Applications 201

    Virial Coefficients of 3-Flavor Fermionic Systems with Three-Body Contact Interactions in the Lattice Semi-Classical Approximation at Leading Order

    Many of the interactions in classical and quantum systems take the form of two-body forces, or sums of these forces. There also exist a number of systems with non-additive forces, or three-body interactions. These arise naturally in some effective field theories when two-body interactions alone are insufficient, or in cases such as Efimov states, where the two-body interaction is suppressed and only bound trimers exist. Further, the thermal properties of these systems are critical to understanding experimental data and verifying models. The virial expansion is a correction to the ideal gas law that relates thermal properties to the quantum statistical properties of Fermi gases. In this work, the calculations of virial coefficients up to order 8 of a Fermi gas using a lattice semi-classical approximation are described. A number of computational tools were used, such as the Python graph theory library NetworkX and the high-energy physics algebra code FORM. Bachelor of Science
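For orientation, the virial expansion mentioned in this abstract is often written in the following form. This is a sketch of one common convention, not necessarily the normalization used in the thesis; the symbols $\mathcal{Z}$ (grand-canonical partition function), $Q_1$ (one-particle partition function), $z$ (fugacity), $\beta$ (inverse temperature), $\mu$ (chemical potential), and $b_n^{(0)}$ (noninteracting coefficient) are assumptions introduced here for illustration.

```latex
\ln \mathcal{Z} = Q_1 \sum_{n=1}^{\infty} b_n z^n ,
\qquad z = e^{\beta \mu} ,
\qquad \Delta b_n = b_n - b_n^{(0)} .
```

In this convention $b_1 = 1$, and $\Delta b_n$ isolates the interaction-induced change in the $n$-th virial coefficient.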

    Virial coefficients of trapped and un-trapped three-component fermions with three-body forces in arbitrary spatial dimensions

    Using a coarse temporal lattice approximation, we calculate the first few terms of the virial expansion of a three-species fermion system with a three-body contact interaction in $d$ spatial dimensions, both in homogeneous space as well as in a harmonic trapping potential of frequency $\omega$. Using the three-body problem to renormalize, we report analytic results for the change in the fourth- and fifth-order virial coefficients $\Delta b_4$ and $\Delta b_5$ as functions of $\Delta b_3$. Additionally, we argue that in the $\omega \to 0$ limit the relationship $b_n^\text{T} = n^{-d/2} b_n$ holds between the trapped (T) and homogeneous coefficients for arbitrary temperature and coupling strength (not merely in scale-invariant regimes). Finally, we point out an exact, universal (coupling- and frequency-independent) relationship between $\Delta b_3^\text{T}$ in 1D with three-body forces and $\Delta b_2^\text{T}$ in 2D with two-body forces. Comment: 7 pages, 2 figures
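The trapped-to-homogeneous relationship quoted in this abstract, $b_n^\text{T} = n^{-d/2} b_n$ in the $\omega \to 0$ limit, can be illustrated with a minimal Python sketch. The function name `trapped_virial_coefficient` is hypothetical, and the input value `1.0` is a placeholder chosen only to show the scaling; it is not a result from the paper.

```python
def trapped_virial_coefficient(b_n: float, n: int, d: int) -> float:
    """Return b_n^T = n**(-d/2) * b_n, the relation quoted for the omega -> 0 limit.

    b_n : homogeneous virial coefficient of order n (placeholder input)
    n   : order of the virial coefficient (n >= 1)
    d   : number of spatial dimensions (d >= 1)
    """
    if n < 1 or d < 1:
        raise ValueError("n and d must be positive integers")
    return n ** (-d / 2) * b_n


if __name__ == "__main__":
    # Placeholder b_n = 1.0 for every order: prints only the scaling factor n**(-d/2).
    for n in (2, 3, 4, 5):
        print(n, trapped_virial_coefficient(1.0, n, d=2))
```

Because the prefactor depends only on $n$ and $d$, the same function applies to the interaction-induced changes $\Delta b_n$ whenever the relation holds for both the interacting and noninteracting coefficients.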