Search CORE

362 research outputs found

Research on Pattern Matching with Wildcards and Length Constraints: Methods and Completeness

Author: Hu Xuegang
Wang Haiping
Xiang Taining
Publication venue: 'IntechOpen'
Publication date: 28/11/2012
Field of study

DESQ: Frequent Sequence Mining with Subsequence Constraints

Author: Beedkar Kaustubh
Gemulla Rainer
Publication venue
Publication date: 01/01/2016
Field of study

Frequent sequence mining methods often make use of constraints to control which subsequences should be mined. A variety of such subsequence constraints has been studied in the literature, including length, gap, span, regular-expression, and hierarchy constraints. In this paper, we show that many subsequence constraints---including and beyond those considered in the literature---can be unified in a single framework. A unified treatment allows researchers to study jointly many types of subsequence constraints (instead of each one individually) and helps to improve usability of pattern mining systems for practitioners. In more detail, we propose a set of simple and intuitive "pattern expressions" to describe subsequence constraints and explore algorithms for efficiently mining frequent subsequences under such general constraints. Our algorithms translate pattern expressions to compressed finite state transducers, which we use as computational model, and simulate these transducers in a way suitable for frequent sequence mining. Our experimental study on real-world datasets indicates that our algorithms---although more general---are competitive to existing state-of-the-art algorithms.Comment: Long version of the paper accepted at the IEEE ICDM 2016 conferenc

arXiv.org e-Print Archive

Crossref

MAnnheim DOCument Server

Sim4cc: a cross-species spliced alignment program

Author: Altschul
Arthur L. Delcher
Buhler
Burge
Burset
Chao
Consortium
Cover
Delcher
Dogan
Florea
Florea
Florea
Gerhard
Gibbs
Holt
Jackson
Kan
Keich
Kent
Lee
Leming Zhou
Lewis
Liliana Florea
Ma
Mihaela Pertea
Mott
Nei
Okazaki
Pertea
Pruitt
Schwartz
Slater
Strausberg
Tuskan
Usuka
Volfovsky
Warren
Waterston
Wheelan
Wheeler
Wilming
Wu
Zhang
Zhang
Zhou
Zhou
Zhou
Zhu
Publication venue: Oxford University Press
Publication date
Field of study

Advances in sequencing technologies have accelerated the sequencing of new genomes, far outpacing the generation of gene and protein resources needed to annotate them. Direct comparison and alignment of existing cDNA sequences from a related species is an effective and readily available means to determine genes in the new genomes. Current spliced alignment programs are inadequate for comparing sequences between different species, owing to their low sensitivity and splice junction accuracy. A new spliced alignment tool, sim4cc, overcomes problems in the earlier tools by incorporating three new features: universal spaced seeds, to increase sensitivity and allow comparisons between species at various evolutionary distances, and powerful splice signal models and evolutionarily-aware alignment techniques, to improve the accuracy of gene models. When tested on vertebrate comparisons at diverse evolutionary distances, sim4cc had significantly higher sensitivity compared to existing alignment programs, more than 10% higher than the closest competitor for some comparisons, while being comparable in speed to its predecessor, sim4. Sim4cc can be used in one-to-one or one-to-many comparisons of genomic and cDNA sequences, and can also be effectively incorporated into a high-throughput annotation engine, as demonstrated by the mapping of 64 000 Fagus grandifolia 454 ESTs and unigenes to the poplar genome

CiteSeerX

Crossref

PubMed Central

多様なポストゲノムデータのためのアラインメントフリーなアルゴリズムの構造

Author: Onodera Taku
小野寺拓
Publication venue: 情報理工学系研究科コンピュータ科学専攻
Publication date: 11/11/2015
Field of study

学位の種別: 課程博士審査委員会委員 : （主査）東京大学教授今井浩, 東京大学教授小林直樹, 東京大学教授五十嵐健夫, 東京大学教授杉山将, 東京大学講師笠原雅弘University of Tokyo(東京大学

31th International Symposium on Theoretical Aspects of Computer Science: STACS '14, March 5th to March 8th, 2014, Lyon, France

Author: STACS <31 2014, Lyon>
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/03/2014
Field of study

Digitale Bibliothek Thüringen

Combinatorial Algorithms for Subsequence Matching: A Survey

Author: Kosche Maria
Koß Tore
Manea Florin
Siemer Stefan
Publication venue
Publication date: 10/10/2022
Field of study

In this paper we provide an overview of a series of recent results regarding algorithms for searching for subsequences in words or for the analysis of the sets of subsequences occurring in a word.Comment: This is a revised version of the paper with the same title which appeared in the Proceedings of NCMA 2022, EPTCS 367, 2022, pp. 11-27 (DOI: 10.4204/EPTCS.367.2). The revision consists in citing a series of relevant references which were not covered in the initial version, and commenting on how they relate to the results we survey. arXiv admin note: text overlap with arXiv:2206.1389

arXiv.org e-Print Archive

Energy efficient hardware acceleration of multimedia processing tools

Author: Kinane Andrew
Publication venue: Dublin City University. School of Electronic Engineering
Publication date: 01/01/2006
Field of study

The world of mobile devices is experiencing an ongoing trend of feature enhancement and generalpurpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being their limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks Based on the survey that this thesis presents on modem video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at algorithmic level in order to design re-usable optimised hardware acceleration cores. To prove these conclusions, the work m this thesis is focused on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high level techniques such as redundant computation elimination, parallelism and low switching computation structures. Both architectures compare favourably against the relevant pnor art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation - namely, both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions .Results show an improvement on state of the art algorithms with future potential for even greater savings

Irish Universities

DCU Online Research Access Service

Scalable frequent sequence mining with flexible subsequence constraints

Author: Bertsch Mattias
Gemulla Rainer
Renz-Wieland Alexander
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

We study scalable algorithms for frequent sequence mining under flexible subsequence constraints. Such constraints enable applications to specify concisely which patterns are of interest and which are not. We focus on the bulk synchronous parallel model with one round of communication; this model is suitable for platforms such as MapReduce or Spark. We derive a general framework for frequent sequence mining under this model and propose the D-SEQ and D-CAND algorithms within this framework. The algorithms differ in what data are communicated and how computation is split up among workers. To the best of our knowledge, D-SEQ and D-CAND are the first scalable algorithms for frequent sequence mining with flexible constraints. We conducted an experimental study on multiple real-world datasets that suggests that our algorithms scale nearly linearly, outperform common baselines, and offer acceptable generalization overhead over existing, less general mining algorithms

Crossref

MAnnheim DOCument Server