    Subsequences and Supersequences of Strings

    Stringology - the study of strings - is a branch of algorithmics which has been the subject of mounting interest in recent years. Very recently, two books [M. Crochemore and W. Rytter, Text Algorithms, Oxford University Press, 1995] and [G. Stephen, String Searching Algorithms, World Scientific, 1994] have been published on the subject, and at least two others are known to be in preparation. Problems on strings arise in information retrieval, version control, automatic spelling correction, and many other domains. However, the greatest motivation for recent work in stringology has come from the field of molecular biology. String problems occur, for example, in genetic sequence construction, genetic sequence comparison, and phylogenetic tree construction. In this thesis we study a variety of string problems from a theoretical perspective. In particular, we focus on problems involving subsequences and supersequences of strings.
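
    The two central relations studied here can be made concrete in a few lines of code. The sketch below (function name illustrative, not taken from the thesis) checks whether one string is a subsequence of another, which is the same as checking that the second is a supersequence of the first.

    def is_subsequence(s: str, t: str) -> bool:
        """True if s is a subsequence of t, i.e. t is a supersequence of s:
        the characters of s appear in t in order, not necessarily contiguously."""
        it = iter(t)
        return all(ch in it for ch in s)

    assert is_subsequence("ace", "abcde")       # a, c, e appear in order
    assert not is_subsequence("aec", "abcde")   # order matters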

    Towards a better solution to the shortest common supersequence problem: the deposition and reduction algorithm

    BACKGROUND: The problem of finding a Shortest Common Supersequence (SCS) of a set of sequences is an important problem with applications in many areas. It is a key problem in biological sequence analysis. The SCS problem is well known to be NP-complete. Many heuristic algorithms have been proposed. Some heuristics work well on a few long sequences (as in sequence comparison applications); others work well on many short sequences (as in oligo-array synthesis). Unfortunately, most do not work well on large SCS instances where there are many, long sequences. RESULTS: In this paper, we present a Deposition and Reduction (DR) algorithm for solving large SCS instances of biological sequences. There are two processes in our DR algorithm: a deposition process and a reduction process. The deposition process is responsible for generating a small set of common supersequences; the reduction process shortens these common supersequences by removing some characters while preserving the common supersequence property. Our evaluation on simulated data and real DNA and protein sequences shows that our algorithm consistently produces the best results compared to many well-known heuristic algorithms, especially on large instances. CONCLUSION: Our DR algorithm provides a partial answer to the open problem of designing an efficient heuristic algorithm for the SCS problem on many long sequences. Our algorithm has a bounded approximation ratio. The algorithm is efficient in both running time and space complexity, and our evaluation shows that it is practical even for SCS problems on many long sequences.
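
    As a rough illustration of the two phases described above (a simplified sketch, not the paper's DR algorithm), the code below first deposits a common supersequence with a simple majority-merge rule and then reduces it by deleting characters whenever the result is still a supersequence of every input. All names are illustrative.

    from collections import Counter

    def is_supersequence(t, s):
        it = iter(t)
        return all(ch in it for ch in s)

    def deposit(seqs):
        """Deposition (simplified): repeatedly append the character heading the most
        sequences, then advance every sequence whose front matches it."""
        fronts = [list(s) for s in seqs]
        out = []
        while any(fronts):
            c = Counter(f[0] for f in fronts if f).most_common(1)[0][0]
            out.append(c)
            for f in fronts:
                if f and f[0] == c:
                    f.pop(0)
        return "".join(out)

    def reduce_sup(sup, seqs):
        """Reduction: drop a character only if the string stays a common supersequence."""
        i = 0
        while i < len(sup):
            cand = sup[:i] + sup[i + 1:]
            if all(is_supersequence(cand, s) for s in seqs):
                sup = cand
            else:
                i += 1
        return sup

    seqs = ["ACGT", "AGT", "CGA"]
    scs = reduce_sup(deposit(seqs), seqs)
    assert all(is_supersequence(scs, s) for s in seqs)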

    Average-case analysis via incompressibility

    Expected length of longest common subsequences

    A longest common subsequence of two sequences is a sequence that is a subsequence of both the given sequences and has largest possible length. It is known that the expected length of a longest common subsequence is proportional to the length of the given sequences. The proportion, denoted by γk, depends on the alphabet size k, and the exact value of this proportion is not known even for a binary alphabet. To obtain lower bounds for the constants γk, finite state machines computing a common subsequence of the inputs are built. Analysing the behaviour of the machines for random inputs, we get lower bounds for the constants γk. The analysis of the machines is based on the theory of Markov chains. An algorithm for automated production of lower bounds is described. To obtain upper bounds for the constants γk, collations - pairs of sequences with a marked common subsequence - are defined. Upper bounds for the number of collations of ‘small size’ can be easily transformed to upper bounds for the constants γk. Combinatorial analysis is used to bound the number of collations. The methods used for producing bounds on the expected length of a common subsequence of two sequences are also used for other problems, namely a longest common subsequence of several sequences, a shortest common supersequence and a maximal adaptability.
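
    The proportionality claim is easy to probe empirically. The sketch below (a crude Monte Carlo illustration, not the thesis's machinery of finite state machines and collations) estimates γk by averaging the LCS length of random length-n strings over a k-letter alphabet, using the standard dynamic program.

    import random

    def lcs_length(a, b):
        """Standard O(|a|*|b|) dynamic program for the length of an LCS."""
        prev = [0] * (len(b) + 1)
        for x in a:
            cur = [0]
            for j, y in enumerate(b, 1):
                cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
            prev = cur
        return prev[-1]

    def estimate_gamma(k=2, n=500, trials=20, seed=0):
        """Monte Carlo estimate of gamma_k: average LCS length of two uniformly
        random length-n strings over a k-letter alphabet, divided by n."""
        rng = random.Random(seed)
        total = 0
        for _ in range(trials):
            a = [rng.randrange(k) for _ in range(n)]
            b = [rng.randrange(k) for _ in range(n)]
            total += lcs_length(a, b)
        return total / (trials * n)

    print(estimate_gamma(k=2))  # for the binary alphabet this comes out near 0.8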

    Algorithms for the Analysis of Spatio-Temporal Data from Team Sports

    Modern object tracking systems are able to simultaneously record trajectories—sequences of time-stamped location points—for large numbers of objects with high frequency and accuracy. The availability of trajectory datasets has resulted in a consequent demand for algorithms and tools to extract information from these data. In this thesis, we present several contributions intended to do this, and in particular, to extract information from trajectories tracking football (soccer) players during matches. Football player trajectories have particular properties that both facilitate and present challenges for the algorithmic approaches to information extraction. The key property that we look to exploit is that the movement of the players reveals information about their objectives through cooperative and adversarial coordinated behaviour, and this, in turn, reveals the tactics and strategies employed to achieve the objectives. While the approaches presented here naturally deal with the application-specific properties of football player trajectories, they also apply to other domains where objects are tracked, for example behavioural ecology, traffic and urban planning.
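
    For concreteness, a trajectory in the sense used above can be represented as below; the type names and units are illustrative assumptions, not taken from the thesis.

    import math
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Point:
        t: float  # timestamp in seconds
        x: float  # pitch coordinates in metres
        y: float

    Trajectory = List[Point]

    def distance_covered(traj: Trajectory) -> float:
        """Total distance travelled along a trajectory (sum of straight-line steps)."""
        return sum(math.hypot(b.x - a.x, b.y - a.y) for a, b in zip(traj, traj[1:]))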

    Website Fingerprinting: Attacks and Defenses

    Website fingerprinting attacks allow a local, passive eavesdropper to determine a client's web activity by leveraging features from her packet sequence. These attacks break the privacy expected by users of privacy technologies, including low-latency anonymity networks such as proxies, VPNs, or Tor. As a discipline, website fingerprinting is an application of machine learning techniques to the diverse field of privacy. To perform a website fingerprinting attack, the eavesdropping attacker passively records the time, direction, and size of the client's packets. Then, he uses a machine learning algorithm to classify the packet sequence so as to determine the web page it came from. In this work we construct and evaluate three new website fingerprinting attacks: Wa-OSAD, an attack using a modified edit distance as the kernel of a Support Vector Machine, achieving greater accuracy than attacks before it; Wa-FLev, an attack that quickly approximates an edit distance computation, allowing a low-resource attacker to deanonymize many clients at once; and Wa-kNN, the current state-of-the-art attack, which is effective and fast, with a very low false positive rate in the open-world scenario. While our new attacks perform well in theoretical scenarios, there are significant differences between the situation in the wild and in the laboratory. Specifically, we tackle concerns regarding the freshness of the training set, splitting packet sequences so that each part corresponds to one web page access (for easy classification), and removing misleading noise from the packet sequence. To defend ourselves against such attacks, we need defenses that are both efficient and provable. We rigorously define and motivate the notion of a provable defense in this work, and we present three new provable defenses: Tamaraw, which is a relatively efficient way to flood the channel with fixed-rate packet scheduling; Supersequence, which uses smallest common supersequences to save on bandwidth overhead; and Walkie-Talkie, which uses half-duplex communication to significantly reduce both bandwidth and time overhead, allowing a truly efficient yet provable defense.
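
    The raw input to the attacks described above is a packet sequence of (time, direction, size) records. The sketch below shows the general shape of such a trace and a few simple summary features for a generic classifier; this feature set is an assumption for illustration and is not the feature set of Wa-OSAD, Wa-FLev, or Wa-kNN.

    from typing import List, Tuple

    # (timestamp in seconds, direction: +1 outgoing / -1 incoming, size in bytes)
    Packet = Tuple[float, int, int]

    def extract_features(trace: List[Packet]) -> List[float]:
        """Summarise a packet sequence into simple features for a generic classifier."""
        outgoing = sum(1 for _, d, _ in trace if d > 0)
        incoming = len(trace) - outgoing
        total_bytes = sum(s for _, _, s in trace)
        duration = trace[-1][0] - trace[0][0] if trace else 0.0
        return [len(trace), outgoing, incoming, total_bytes, duration]

    trace = [(0.00, +1, 565), (0.05, -1, 1448), (0.06, -1, 1448), (0.31, +1, 80)]
    print(extract_features(trace))  # [4, 2, 2, 3541, 0.31]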

    Solution Biases and Pheromone Representation Selection in Ant Colony Optimisation.

    Combinatorial optimisation problems (COPs) pervade human society: scheduling, design, layout, distribution, timetabling, resource allocation and project management all feature problems where the solution is some combination of elements, the overall value of which needs to be either maximised or minimised (i.e., optimised), typically subject to a number of constraints. Thus, techniques to efficiently solve such problems are an important area of research. A popular group of optimisation algorithms are the metaheuristics, approaches that specify how to search the space of solutions in a problem-independent way so that high quality solutions are likely to result in a reasonable amount of computational time. Although metaheuristic algorithms are specified in a problem-independent manner, they must be tailored to suit each particular problem to which they are applied. This thesis investigates a number of aspects of the application of the relatively new Ant Colony Optimisation (ACO) metaheuristic to different COPs. The standard ACO metaheuristic is a constructive algorithm loosely based on the foraging behaviour of ant colonies, which are able to find the shortest path to a food source by indirect communication through pheromones. ACO's artificial pheromone represents a model of the solution components that its artificial ants use to construct solutions. Developing an appropriate pheromone representation is a key aspect of the application of ACO to a problem. An examination of existing ACO applications and the constructive approach more generally reveals how the metaheuristic can be applied more systematically across a range of COPs. The two main issues addressed in this thesis are biases inherent in the constructive process and the systematic selection of pheromone representations. The systematisation of ACO should lead to more consistently high performance of the algorithm across different problems. Additionally, it supports the creation of a generalised ACO system, capable of adapting itself to suit many different combinatorial problems without the need for manual intervention.
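
    A generic, textbook-style illustration of the constructive process and an edge-based pheromone representation is sketched below for a tiny symmetric TSP; it is not the systematised ACO proposed in the thesis, and the parameter values are illustrative.

    import random

    def aco_tsp(dist, n_ants=10, n_iters=50, alpha=1.0, beta=2.0, rho=0.1, seed=0):
        """Minimal ACO for a symmetric TSP: pheromone lives on city-to-city edges."""
        rng = random.Random(seed)
        n = len(dist)
        tau = [[1.0] * n for _ in range(n)]  # pheromone representation: one value per edge
        best_tour, best_len = None, float("inf")

        def tour_length(tour):
            return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

        for _ in range(n_iters):
            tours = []
            for _ in range(n_ants):
                tour = [rng.randrange(n)]
                unvisited = set(range(n)) - {tour[0]}
                while unvisited:
                    i = tour[-1]
                    # weight each candidate city by pheromone strength and inverse distance
                    weights = [(j, (tau[i][j] ** alpha) * ((1.0 / dist[i][j]) ** beta))
                               for j in unvisited]
                    r = rng.random() * sum(w for _, w in weights)
                    for j, w in weights:
                        r -= w
                        if r <= 0:
                            break
                    tour.append(j)
                    unvisited.remove(j)
                tours.append(tour)
            # evaporation, then reinforcement: shorter tours deposit more pheromone
            for i in range(n):
                for j in range(n):
                    tau[i][j] *= (1.0 - rho)
            for tour in tours:
                length = tour_length(tour)
                if length < best_len:
                    best_tour, best_len = tour, length
                for i in range(n):
                    a, b = tour[i], tour[(i + 1) % n]
                    tau[a][b] += 1.0 / length
                    tau[b][a] += 1.0 / length
        return best_tour, best_len

    dist = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 3], [10, 4, 3, 0]]
    print(aco_tsp(dist))  # finds a shortest tour of length 18 for this small instance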

    Beam search for the longest common subsequence problem

    The longest common subsequence problem is a classical string problem that concerns finding the common part of a set of strings. It has several important applications, for example in pattern recognition or computational biology. Most research efforts up to now have focused on solving this problem optimally. In comparison, only a few works exist that deal with heuristic approaches. In this work we present a deterministic beam search algorithm. The results show that our algorithm outperforms classical approaches as well as recent metaheuristic approaches.
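
    A generic beam search for this problem can be sketched as follows; the beam width and the optimistic scoring rule (letters matched so far plus the shortest remaining suffix) are illustrative choices, not necessarily those of the paper.

    def lcs_beam(strings, beam_width=50):
        alphabet = set().union(*strings)
        # a state is (positions consumed in each string, subsequence built so far)
        beam = [((0,) * len(strings), "")]
        best = ""
        while beam:
            children = []
            for pos, sub in beam:
                for c in alphabet:
                    nxt = []
                    for s, p in zip(strings, pos):
                        i = s.find(c, p)
                        if i == -1:
                            break
                        nxt.append(i + 1)
                    else:
                        children.append((tuple(nxt), sub + c))
            if not children:
                break
            # optimistic score: letters already matched plus shortest remaining suffix
            children.sort(key=lambda st: len(st[1]) +
                          min(len(s) - p for s, p in zip(strings, st[0])),
                          reverse=True)
            beam = children[:beam_width]
            best = max(best, beam[0][1], key=len)
        return best

    print(lcs_beam(["ACCGGTCG", "GTCGTCGGA", "ATCGTCG"]))  # a common subsequence, e.g. "CGTCG"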