Twelve times faster yet accurate: a new state‐of‐the‐art in radiation schemes via performance and spectral optimization
Radiation schemes are critical components of Earth system models that need to be both efficient and accurate. Despite the use of approximations such as 1D radiative transfer, radiation can account for a large share of the runtime of expensive climate simulations. Here we seek a new state‐of‐the‐art in speed and accuracy by combining code optimization with improved algorithms. To fully benefit from new spectrally reduced gas optics schemes, we restructure code to avoid short vectorized loops where possible by collapsing the spectral and vertical dimensions. Our main focus is the ecRad radiation scheme, where this requires batching of adjacent cloudy layers, trading some simplicity for improved vectorization and instruction‐level parallelism. When combined with common optimization techniques for serial code and porting widely used two‐stream kernels fully to single precision, we find that ecRad with the TripleClouds solver becomes 12 times faster than the operational radiation scheme in ECMWF's Integrated Forecast System (IFS) cycle 47r3, which uses a less accurate gas optics model (RRTMG) and a noisier solver (McICA). After applying the spectral reduction and extensive optimizations to the more sophisticated SPARTACUS solver, we find that it is 2.5 times faster than the IFS cy47r3 radiation scheme, making cloud 3D radiative effects affordable to compute in large‐scale models. The code optimization itself gave a threefold speedup for both solvers. While SPARTACUS is still under development, preliminary experiments show slightly improved medium‐range forecasts of 2‐m temperature in the tropics, and in year‐long coupled atmosphere‐ocean simulations the 3D effects warm the surface substantially.
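To illustrate the restructuring the abstract describes, here is a minimal NumPy sketch of collapsing the spectral and vertical dimensions into one long vectorized operation (the actual ecRad code is Fortran; the array names and sizes below are invented):

```python
# Sketch of the loop-collapsing idea from the abstract, in NumPy rather than
# the Fortran of ecRad; array names and dimensions are illustrative only.
import numpy as np

n_spec, n_lev = 32, 137            # spectral intervals x model levels (hypothetical)
rng = np.random.default_rng(0)
od = rng.random((n_lev, n_spec))   # optical depth per layer and spectral interval

# Version 1: short vectorized loops over the spectral dimension, one layer at
# a time -- each operation touches only n_spec elements.
trans_loop = np.empty_like(od)
for k in range(n_lev):
    trans_loop[k, :] = np.exp(-od[k, :])

# Version 2: collapse spectral and vertical dimensions into one long vector,
# giving the hardware a single long stream to vectorize.
trans_flat = np.exp(-od.reshape(-1)).reshape(n_lev, n_spec)

assert np.allclose(trans_loop, trans_flat)
```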
Scheduling Jobs in Flowshops with the Introduction of Additional Machines in the Future
The problem of scheduling jobs to minimize total weighted tardiness in flowshops, with the possibility of evolving into hybrid flowshops in the future, is investigated in this paper. As this research is guided by a real problem in industry, the flowshop considered has considerable flexibility, which stimulated the development of an innovative methodology for this research. Each stage of the flowshop currently has one or several identical machines. However, the manufacturing company is planning to introduce additional machines with different capabilities in different stages in the near future. Thus, the algorithm proposed and developed for the problem is capable of solving not only the current flow line configuration but also the potential new configurations that may result in the future. A meta-heuristic search algorithm based on Tabu search is developed to solve this NP-hard, industry-guided problem. Six different initial solution finding mechanisms are proposed. A carefully planned nested split-plot design is performed to test the significance of different factors and their impact on the performance of the different algorithms. To the best of our knowledge, this research is the first of its kind that attempts to solve an industry-guided problem with concern for future developments.
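As a rough illustration of the search strategy, here is a minimal Tabu-search sketch for total weighted tardiness, reduced to a single machine for brevity (the paper treats flowshops with parallel machines; all data and parameters below are invented):

```python
# Minimal Tabu-search sketch for total weighted tardiness, on a single machine
# for brevity (the paper treats flowshops); all data and parameters are made up.
jobs = {  # job: (processing_time, due_date, weight)
    0: (4, 6, 2), 1: (3, 5, 1), 2: (5, 8, 3), 3: (2, 4, 2),
}

def weighted_tardiness(seq):
    t, total = 0, 0
    for j in seq:
        p, d, w = jobs[j]
        t += p
        total += w * max(0, t - d)
    return total

def tabu_search(seq, iters=50, tenure=3):
    best, best_cost = list(seq), weighted_tardiness(seq)
    cur, tabu = list(seq), []
    for _ in range(iters):
        # Neighborhood: all adjacent swaps whose resulting pair is not tabu.
        moves = [(i, i + 1) for i in range(len(cur) - 1)
                 if (cur[i], cur[i + 1]) not in tabu]
        if not moves:
            break
        def cost_of(m):
            n = list(cur)
            n[m[0]], n[m[1]] = n[m[1]], n[m[0]]
            return weighted_tardiness(n), n
        # Move to the best admissible neighbor, even if it is worse (this is
        # what lets Tabu search escape local optima).
        (cost, neighbor), move = min((cost_of(m), m) for m in moves)
        cur = neighbor
        tabu.append((cur[move[0]], cur[move[1]]))  # forbid undoing this swap
        tabu = tabu[-tenure:]                      # short-term memory
        if cost < best_cost:
            best, best_cost = list(cur), cost
    return best, best_cost

print(tabu_search([0, 1, 2, 3]))
```

The six initial-solution mechanisms the paper proposes would each supply a different starting `seq` to such a loop.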
Efficient LZ78 factorization of grammar compressed text
We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context-free grammar in Chomsky normal form that generates a single string. Given an SLP of size n representing a text S of length N, our algorithm computes the LZ78 factorization of S in O(n√N + m log N) time and O(n√N + m) space, where m is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the n√N term in the time and space complexities becomes either nL, where L is the length of the longest LZ78 factor, or N − α, where α ≥ 0 is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of S of a certain length. Since m = O(N/log_σ N), where σ is the alphabet size, the latter is asymptotically at least as fast as a linear time algorithm which runs on the uncompressed string when σ is constant, and can be more efficient when the text is compressible, i.e. when m and n are small. (SPIRE 2012)
On the suitability of suffix arrays for Lempel-Ziv data compression
Lossless compression algorithms of the Lempel-Ziv (LZ) family are widely used nowadays. Regarding time and memory requirements, LZ encoding is much more demanding than decoding. In order to speed up the encoding process, efficient data structures, like suffix trees, have been used. In this paper, we explore the use of suffix arrays to hold the dictionary of the LZ encoder, and propose an algorithm to search over it. We show that the resulting encoder attains roughly the same compression ratios as those based on suffix trees. However, the amount of memory required by the suffix array is fixed, and much lower than the variable amount of memory used by encoders based on suffix trees (which depends on the text to encode). We conclude that suffix arrays, when compared to suffix trees in terms of the trade-off among time, memory, and compression ratio, may be preferable in scenarios (e.g., embedded systems) where memory is at a premium and high speed is not critical.
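As a sketch of the core operation such an encoder needs, here is a longest-prefix-match query answered by binary search over a suffix array (the naive construction below is only for illustration, and this is not the paper's own search algorithm; requires Python 3.10+ for `bisect` with `key`):

```python
# Core LZ-encoder query: how long is the longest prefix of `pattern` that
# occurs somewhere in `text`? Answered by binary search on a suffix array.
# The O(n^2 log n) construction below is deliberately naive.
import bisect

def suffix_array(text):
    return sorted(range(len(text)), key=lambda i: text[i:])

def longest_match(text, sa, pattern):
    best = 0
    lo, hi = 0, len(sa)
    for k in range(1, len(pattern) + 1):
        p = pattern[:k]
        # Narrow the SA interval of suffixes starting with p; the interval
        # for length k is always nested inside the one for length k - 1.
        lo = bisect.bisect_left(sa, p, lo, hi, key=lambda i: text[i:i + k])
        hi = bisect.bisect_right(sa, p, lo, hi, key=lambda i: text[i:i + k])
        if lo == hi:       # no suffix starts with this prefix; stop extending
            break
        best = k
    return best

text = "abracadabra"
sa = suffix_array(text)
print(longest_match(text, sa, "abrax"))  # 4 ("abra")
```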
Efficient Algorithms for String-Based Negative Selection
String-based negative selection is an immune-inspired classification scheme: Given a self-set S of strings, generate a set D of detectors that do not match any element of S. Then, use these detectors to partition a monitor set M into self and non-self elements. Implementations of this scheme are often impractical because they need exponential time in the size of S to construct D. Here, we consider r-chunk and r-contiguous detectors, two common implementations that suffer from this problem, and show that compressed representations of D are constructible in polynomial time for any given S and r. Since these representations can themselves be used to classify the elements in M, the worst-case running time of r-chunk and r-contiguous detector based negative selection is reduced from exponential to polynomial.
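A minimal sketch of r-chunk negative selection over a binary alphabet, with all data invented; note that the brute-force enumeration below is exponential in r, which is exactly the cost the paper's compressed detector representations avoid:

```python
# Brute-force r-chunk negative selection over a binary alphabet; the paper's
# compressed representations of the detector set D are not shown here.
from itertools import product

def rchunk_detectors(self_set, r, alphabet="01"):
    # Assumes all self strings have the same length, as usual in this setting.
    length = len(next(iter(self_set)))
    self_chunks = {(i, s[i:i + r])
                   for s in self_set for i in range(length - r + 1)}
    # A detector (i, d) matches x iff x[i:i+r] == d; keep every (i, d) that
    # matches no self string.  This enumeration is |alphabet|**r per position.
    return {(i, "".join(d))
            for i in range(length - r + 1)
            for d in product(alphabet, repeat=r)
            if (i, "".join(d)) not in self_chunks}

def is_nonself(x, detectors, r):
    return any(x[i:i + r] == d for i, d in detectors)

S = {"0011", "0111"}                 # self set (invented)
D = rchunk_detectors(S, r=2)
print(is_nonself("0011", D, 2))      # False: self strings trigger no detector
print(is_nonself("1100", D, 2))      # True: classified as non-self
```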
MOODS: fast search for position weight matrix matches in DNA sequences
Summary: MOODS (MOtif Occurrence Detection Suite) is a software package for matching position weight matrices against DNA sequences. MOODS implements state-of-the-art online matching algorithms, achieving considerably faster scanning speeds than a simple brute-force search. MOODS is written in C++, with bindings for the popular BioPerl and Biopython toolkits. It can easily be adapted for different purposes and integrated into existing workflows. It can also be used as a C++ library.
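For contrast with MOODS's optimized algorithms, here is the naive brute-force scan in NumPy, showing what "matching position weight matrices against DNA sequences" means (toy matrix and threshold invented):

```python
# Naive position-weight-matrix scan for reference; MOODS implements much
# faster online algorithms than this brute-force scoring loop.
import numpy as np

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}

def scan(seq, pwm, threshold):
    """Return (position, score) pairs where the PWM score meets the threshold.

    pwm: array of shape (4, m) of log-odds scores, rows ordered A, C, G, T.
    """
    m = pwm.shape[1]
    idx = np.fromiter((BASES[c] for c in seq), dtype=int)
    hits = []
    for i in range(len(seq) - m + 1):
        # Sum one score per motif column, picked by the base at that column.
        score = pwm[idx[i:i + m], np.arange(m)].sum()
        if score >= threshold:
            hits.append((i, score))
    return hits

# Toy 4 x 3 log-odds matrix favoring the motif "ACG" (values made up).
pwm = np.array([[ 1.0, -1.0, -1.0],
                [-1.0,  1.0, -1.0],
                [-1.0, -1.0,  1.0],
                [-1.0, -1.0, -1.0]])
print(scan("TTACGTT", pwm, threshold=2.5))  # hit at position 2, score 3.0
```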
Genome Biol.
With genome analysis expanding from the study of genes to the study of gene regulation, 'regulatory genomics' utilizes sequence information, evolution and functional genomics measurements to unravel how regulatory information is encoded in the genome.