
381 research outputs found

Regular expression constrained sequence alignment revisited

Author: Kucherov Gregory
Pinhas Tamar
Ziv-Ukelson Michal
Publication venue: 'Mary Ann Liebert Inc'
Publication date: 01/01/2011

Imposing constraints in the form of a finite automaton or a regular expression is an effective way to incorporate additional a priori knowledge into sequence alignment procedures. With this motivation, the Regular Expression Constrained Sequence Alignment Problem was introduced, together with an O(n^2t^4) time and O(n^2t^2) space algorithm for solving it, where n is the length of the input strings and t is the number of states in the input non-deterministic automaton. A faster O(n^2t^3) time algorithm for the same problem was subsequently proposed. In this article, we further speed up the algorithms for Regular Language Constrained Sequence Alignment by reducing their worst-case time complexity bound to O(n^2t^3/log t). This is done by establishing an optimal bound on the size of Straight-Line Programs solving the maxima computation subproblem of the basic dynamic programming algorithm. We also study another solution based on a Steiner Tree computation. While it does not improve the worst case, our simulations show that both approaches are efficient in practice, especially when the input automata are dense

HAL - Lille 3

CiteSeerX

INRIA a CCSD electronic archive server

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

GrapeTree : visualization of core genomic relationships among 100,000 bacterial pathogens

Author: Zhou Zhemin
Alikhan Nabil-Fareed
Sergeant Martin J.
Luhmann Nina
Vaz Catia
Francisco Alexandre P.
Carrico Joao Andre
Achtman Mark
Publication venue: Cold Spring Harbour
Publication date: 09/11/2017

Current methods struggle to reconstruct and visualise the genomic relationships of ≥100,000 bacterial genomes. GrapeTree facilitates the analyses of allelic profiles from 10,000s of core genomes within a web browser window. GrapeTree implements a novel minimum spanning tree algorithm to reconstruct genetic relationships despite missing data, together with a static "GrapeTree Layout" algorithm to render interactive visualisations of large trees. GrapeTree is a stand-alone package for investigating Newick trees plus associated metadata and is also integrated into EnteroBase to facilitate cutting-edge navigation of genomic relationships among >160,000 genomes from bacterial pathogens. The GrapeTree package was released under the GPL v3.0 Licence
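
For orientation, the idea of building a minimum spanning tree over allelic profiles with missing calls can be sketched as follows. This is a minimal illustration using the classic Prim algorithm, not GrapeTree's own novel algorithm, which the abstract does not describe in detail. The names `allelic_distance` and `prim_mst`, the use of `None` for a missing allele, and the choice to simply ignore loci with a missing call are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

# A profile is one allele number per locus; None marks a missing call (assumption).
Profile = List[Optional[int]]


def allelic_distance(a: Profile, b: Profile) -> int:
    """Count loci where both profiles have a call and the calls differ."""
    return sum(
        1 for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )


def prim_mst(profiles: List[Profile]) -> List[Tuple[int, int, int]]:
    """Return MST edges as (parent, child, distance) using Prim's algorithm, O(n^2)."""
    n = len(profiles)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [float("inf")] * n   # cheapest known link into the tree
    best_parent = [-1] * n           # tree node providing that link
    best_dist[0] = 0
    edges: List[Tuple[int, int, int]] = []
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] != -1:
            edges.append((best_parent[u], u, int(best_dist[u])))
        # Update attachment costs of the remaining nodes via u.
        for v in range(n):
            if not in_tree[v]:
                d = allelic_distance(profiles[u], profiles[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges


example_profiles: List[Profile] = [
    [1, 1, 1, 1],
    [1, 1, 1, 2],
    [1, 1, 2, 2],
    [1, None, 2, 3],   # second locus has no call
]
example_edges = prim_mst(example_profiles)
```

Ignoring loci with a missing call is the simplest possible treatment and can understate distances for sparse profiles, so the sketch is a baseline rather than a stand-in for the method in the paper.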

Crossref

Warwick Research Archives Portal Repository

FigShare

GrapeTree : visualization of core genomic relationships among 100,000 bacterial pathogens

Author: Achtman Mark
Alikhan Nabil-Fareed
Carrico Joao Andre
Francisco Alexandre P.
Luhmann Nina
Sergeant Martin J.
Vaz Catia
Zhou Zhemin
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 14/11/2017

Repositório Científico do Instituto Politécnico de Lisboa

Crossref

Warwick Research Archives Portal Repository

Review of Extreme Multilabel Classification

Author: Das Shrutimoy
Dasgupta Arpan
Katyan Siddhant
Kumar Pawan
Publication date: 26/03/2023

Extreme multilabel classification, or XML, is an active area of interest in machine learning. Compared to traditional multilabel classification, the number of labels here is extremely large, hence the name. Classical one-versus-all classification won't scale in this case due to the large number of labels, and the same is true for other classifiers. Embedding the labels as well as the features into a smaller label space is an essential first step. Other issues include the existence of head and tail labels, where tail labels are labels that occur in a relatively small number of the given samples. The existence of tail labels creates issues during embedding. This area has invited a wide range of approaches, including bit compression motivated by compressed sensing, tree-based embeddings, deep-learning-based latent space embeddings (including ones using attention weights), linear-algebra-based embeddings such as SVD, clustering, and hashing, to name a few. The community has come up with a useful set of metrics to correctly identify predictions for head or tail labels. Comment: 46 pages, 13 figures

arXiv.org e-Print Archive

Topological inference in graphs and images

Author: Vandaele Robin
Publication venue: Universiteit Gent. Faculteit Ingenieurswetenschappen en Architectuur
Publication date: 01/01/2020

Ghent University Academic Bibliography

Subject index volumes 1–92

Publication venue: Published by Elsevier B.V.

Elsevier - Publisher Connector

Recommended from our members

Network and Algebraic Topology of Influenza Evolution

Author: Chan Joseph
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2013

Evolution is a force that has molded human existence since its divergence from chimpanzees about 5.4 million years ago. In that same amount of time, an influenza virus, which replicates every six hours, would have undergone an equivalent number of generations over only a hundred years. The fast replication times of influenza, coupled with its high mutation rate, make the virus a perfect model to study real-time evolution at a mega-Darwin scale, more than a million times faster than human evolution. While recent developments in high-throughput sequencing provide an optimal opportunity to dissect their genetic evolution, a concurrent growth in computational tools is necessary to analyze the large influx of complex genomic data. In my thesis, I present novel computational methods to examine different aspects of influenza evolution. I first focus on seasonal influenza, particularly the problems that hamper public health initiatives to combat the virus. I introduce two new approaches: 1. The q2-coefficient, a method of quantifying pathogen surveillance, and 2. FluGraph, a technique that employs network topology to track the spread of seasonal influenza around the world. The second chapter of my thesis examines how mutations and reassortment combine to alter the course of influenza evolution towards pandemic formation. I highlight inherent deficiencies in the current phylogenetic paradigm for analyzing evolution and offer a novel methodology based on algebraic topology that comprehensively reconstructs both vertical and horizontal evolutionary events. I apply this method to viruses, with emphasis on influenza, but foresee broader application to cancer cells, bacteria, eukaryotes, and other taxa

Columbia University Academic Commons

Algorithmic Approaches to the Steiner Problem in Networks

Author: Vahdati Daneshmand Siavash
Publication venue: Universität Mannheim
Publication date: 01/01/2003

The Steiner problem in networks is the problem of connecting a given set of vertices in a weighted graph at minimum cost. It is a classical NP-hard problem and a fundamental problem in network optimization with many practical applications. We attack this problem by various means: relaxations, which loosen the feasibility constraints in order to approximate an optimal solution; heuristics, to find good but not provably optimal solutions; and reductions, to simplify problem instances without destroying an optimal solution. In all cases we examine and improve existing methods, introduce new ones, and evaluate them experimentally. We integrate these building blocks into an exact algorithm that represents the state of the art for solving this problem optimally. Many of the presented methods can also be useful for related problems
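
As a rough illustration of what a heuristic for this problem can look like, the sketch below grows a tree from one terminal and repeatedly attaches the nearest remaining terminal along a shortest path. This is a generic shortest-path construction heuristic from the wider literature, not a reproduction of the methods in the thesis; the function name `shortest_path_heuristic`, the adjacency-dict graph format, and the use of string vertex labels are assumptions made for this example.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Undirected weighted graph as adjacency dicts: graph[u][v] == graph[v][u] (assumption).
Graph = Dict[str, Dict[str, float]]


def shortest_path_heuristic(
    graph: Graph, terminals: List[str]
) -> Tuple[Set[Tuple[str, str]], float]:
    """Build a (not necessarily optimal) Steiner tree; return (edges, total cost)."""
    tree_nodes: Set[str] = {terminals[0]}
    remaining: Set[str] = set(terminals[1:])
    tree_edges: Set[Tuple[str, str]] = set()
    total = 0.0
    while remaining:
        # Multi-source Dijkstra: every node already in the tree starts at distance 0.
        dist: Dict[str, float] = {v: 0.0 for v in tree_nodes}
        prev: Dict[str, str] = {}
        heap = [(0.0, v) for v in tree_nodes]
        heapq.heapify(heap)
        target = None
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue  # stale heap entry
            if u in remaining:
                target = u  # closest terminal not yet connected
                break
            for v, w in graph[u].items():
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(heap, (nd, v))
        if target is None:
            raise ValueError("terminals are not connected")
        # Walk the shortest path back to the tree, adding its edges and vertices.
        node = target
        while node not in tree_nodes:
            parent = prev[node]
            tree_edges.add((parent, node))
            total += graph[parent][node]
            tree_nodes.add(node)
            node = parent
        remaining.discard(target)
    return tree_edges, total


# Three terminals joined cheaply through a non-terminal hub "s".
example_graph: Graph = {
    "s": {"a": 1.0, "b": 1.0, "c": 1.0},
    "a": {"s": 1.0, "b": 3.0, "c": 3.0},
    "b": {"s": 1.0, "a": 3.0, "c": 3.0},
    "c": {"s": 1.0, "a": 3.0, "b": 3.0},
}
example_edges, example_cost = shortest_path_heuristic(example_graph, ["a", "b", "c"])
```

In this small example the heuristic routes all terminals through the hub vertex, which illustrates why allowing extra non-terminal vertices can beat a spanning tree over the terminals alone. In general the result carries no optimality guarantee beyond the usual approximation bounds for such heuristics.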

CiteSeerX

MAnnheim DOCument Server