2 research outputs found

    Finding conserved patterns in biological sequences, networks and genomes

    Get PDF
    Biological patterns are widely used for identifying biologically interesting regions within macromolecules, classifying biological objects, predicting functions and studying evolution. Good pattern finding algorithms will help biologists to formulate and validate hypotheses in an attempt to obtain important insights into the complex mechanisms of living things. In this dissertation, we aim to improve and develop algorithms for five biological pattern finding problems. For the multiple sequence alignment problem, we propose an alternative formulation in which a final alignment is obtained by preserving pairwise alignments specified by edges of a given tree. In contrast with traditional NPhard formulations, our preserving alignment formulation can be solved in polynomial time without using a heuristic, while having very good accuracy. For the path matching problem, we take advantage of the linearity of the query path to reduce the problem to finding a longest weighted path in a directed acyclic graph. We can find k paths with top scores in a network from the query path in polynomial time. As many biological pathways are not linear, our graph matching approach allows a non-linear graph query to be given. Our graph matching formulation overcomes the common weakness of previous approaches that there is no guarantee on the quality of the results. For the gene cluster finding problem, we investigate a formulation based on constraining the overall size of a cluster and develop statistical significance estimates that allow direct comparisons of clusters of different sizes. We explore both a restricted version which requires that orthologous genes are strictly ordered within each cluster, and the unrestricted problem that allows paralogous genes within a genome and clusters that may not appear in every genome. We solve the first problem in polynomial time and develop practical exact algorithms for the second one. In the gene cluster querying problem, based on a querying strategy, we propose an efficient approach for investigating clustering of related genes across multiple genomes for a given gene cluster. By analyzing gene clustering in 400 bacterial genomes, we show that our algorithm is efficient enough to study gene clusters across hundreds of genomes

    Parameterized complexity and polynomial-time approximation schemes

    Get PDF
    According to the theory of NPcompleteness, many problems that have important realworld applications are NPhard. This excludes the possibility of solving them in polynomial time unless P=NP. A number of approaches have been proposed in dealing with NPhard problems, among them are approximation algorithms and parameterized algorithms. The study of approximation algorithms tries to find good enough solutions instead of optimal solutions in polynomial time, while parameterized algorithms try to give exact solutions when a natural parameter is small. In this thesis, we study the structural properties of parameterized computation and approximation algorithms for NP optimization problems. In particular, we investigate the relationship between parameterized complexity and polynomialtime approximation scheme (PTAS) for NP optimization problems. We give nice characterizations for two important subclasses in PTAS: Fully Polynomial Time Approximation Scheme (FPTAS) and Effcient Polynomial Time Approximation Scheme (EPTAS), using the theory of parameterized complexity. Our characterization of the class FPTAS has its advantages over the former characterizations, and our characterization of EPTAS is the first systematic investigation of this new but important approximation class. We develop new techniques to derive strong computational lower bounds for certain parameterized problems based on the theory of parameterized complexity. For example, we prove that unless an unlikely collapse occurs in parameterized complexity theory, the clique problem could not be solved in time O(f (k)no(k)) for any function f . This lower bound matches the upper bound of the trivial algorithm that simply enumerates and checks all subsets of k vertices in the given graph of n vertices. We then extend our techniques to derive computational lower bounds for PTAS and EPTAS algorithms of NP optimization problems. We prove that certain NP optimization problems with known PTAS algorithms have no PTAS algorithms of running time O(f (1/Epsilon)no(1/Epsilon)) for any function f . Therefore, for these NP optimization problems, although theoretically they can be approximated in polynomial time to an arbitrarily small error bound Epsilon, they have no practically effective approximation algorithms for small error bound Epsilon. To our knowledge, this is the first time such lower bound results have been derived for PTAS algorithms. This seems to open a new direction for the study of computational lower bounds on the approximability of NP optimization problems
    corecore