Classifying and Generating Repetitive Elements in the Genome Using Deep Learning
Repetitive elements are sequence patterns in the genome that are duplicated in large numbers. They serve important functions in both genomic preservation and evolution, which creates a need for fast and accurate classification. The current gold standard for repeat identification is to match a given query sequence against a well-annotated library of repetitive elements. However, annotation quality is highly variable across species, so for genomes whose repeats are poorly annotated, de novo methods must be used. A common de novo approach is to first check the sequence for protein domain conservation. The presence and order of these protein domains are then used as features for an expert-crafted rule-based system or an optimized machine learning classifier. Although de novo approaches have achieved modest success, two problems remain. First, they require lengthy consensus sequences, which take time to assemble and may not be representative of the true diversity of repetitive elements in the sample. Second, these approaches rely heavily on hand-picking a comprehensive set of protein domains, which may need to be adjusted constantly as new repetitive elements are discovered. In this thesis I show that deep learning models are competitive with pattern-matching-based approaches at the level of a shotgun sequencing strand for de novo classification of repeat elements. I also explore ways of embedding sequences using deep learning models. Finally, I made these tools available through a web-based interface. Bachelor of Science
A Dataflow Graphical Language for Database Applications
In this paper we discuss a graphical language for information retrieval and processing. A lot of recent activity has occurred in the area of improving access to database systems. However, current results are restricted to simple interfacing of database systems. We propose a graphical language for specifying complex applications
PACE: Pattern Accurate Computationally Efficient Bootstrapping for Timely Discovery of Cyber-Security Concepts
Public disclosure of important security information, such as knowledge of
vulnerabilities or exploits, often occurs in blogs, tweets, mailing lists, and
other online sources months before proper classification into structured
databases. In order to facilitate timely discovery of such knowledge, we
propose a novel semi-supervised learning algorithm, PACE, for identifying and
classifying relevant entities in text sources. The main contribution of this
paper is an enhancement of the traditional bootstrapping method for entity
extraction by employing a time-memory trade-off that simultaneously circumvents
a costly corpus search while strengthening pattern nomination, which should
increase accuracy. An implementation in the cyber-security domain is discussed
as well as challenges to Natural Language Processing imposed by the security
domain. Comment: 6 pages, 3 figures, ieeeTran conference. International Conference on
Machine Learning and Applications 201
Virial Coefficients of 3-Flavor Fermionic Systems with Three-Body Contact Interactions in the Lattice Semi-Classical Approximation at Leading Order
Many of the interactions in classical and quantum systems take the form of two-body forces, or sums of these forces. There also exist a number of systems with non-additive forces, or three-body interactions. These arise naturally in some effective field theories when two-body interactions alone are insufficient, or in cases like Efimov states, where the two-body interaction is suppressed and only bound trimers exist. Further, the thermal properties of these systems are critical to understanding experimental data and verifying models. The virial expansion is a correction to the ideal gas law that relates thermal properties to the quantum statistical properties of Fermi gases. In this work, the calculations of virial coefficients up to order 8 of a Fermi gas using a lattice semi-classical approximation are described. A number of computational tools were used, such as the Python graph theory library NetworkX and the high-energy physics algebra code FORM. Bachelor of Science
Virial coefficients of trapped and un-trapped three-component fermions with three-body forces in arbitrary spatial dimensions
Using a coarse temporal lattice approximation, we calculate the first few
terms of the virial expansion of a three-species fermion system with a
three-body contact interaction in $d$ spatial dimensions, both in homogeneous
space as well as in a harmonic trapping potential of frequency $\omega$. Using
the three-body problem to renormalize, we report analytic results for the
change in the fourth- and fifth-order virial coefficients $\Delta b_4$ and
$\Delta b_5$ as functions of $\Delta b_3$. Additionally, we argue that in the
$\omega \to 0$ limit the relationship $b_n^\text{T} = n^{-d/2} b_n$ holds
between the trapped (T) and homogeneous coefficients for arbitrary temperature
and coupling strength (not merely in scale-invariant regimes). Finally, we
point out an exact, universal (coupling- and frequency-independent)
relationship between $\Delta b_3^\text{T}$ in 1D with three-body forces and
$\Delta b_2^\text{T}$ in 2D with two-body forces. Comment: 7 pages, 2 figures