113 research outputs found

    Using Avida to test the effects of natural selection on phylogenetic reconstruction methods

    Phylogenetic trees group organisms by their ancestral relationships. There are a number of distinct algorithms used to reconstruct these trees from molecular sequence data, but different methods sometimes give conflicting results. Since there are few precisely known phylogenies, simulations are typically used to test the quality of reconstruction algorithms. These simulations randomly evolve strings of symbols to produce a tree, and the reconstruction algorithms are then run with the tree leaves as inputs. Here we use Avida to test two widely used reconstruction methods, which gives us the chance to observe the effect of natural selection on tree reconstruction. We find that if the organisms undergo natural selection between branch points, the methods are successful even on very large time scales. However, these algorithms often falter when selection is absent.
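
    The following is a minimal, hypothetical sketch of this style of simulation-based testing (it does not use Avida): symbol strings are evolved down a known 4-taxon tree ((A,B),(C,D)) under a uniform random-mutation model, and the four-point condition on Hamming distances is used to check whether the true split is recovered. The alphabet, sequence length, and per-branch mutation counts are illustrative assumptions.

```python
# Minimal sketch (not Avida): evolve symbol strings down a known 4-taxon tree
# ((A,B),(C,D)) with a uniform random-mutation model, then check whether
# pairwise Hamming distances recover the true split via the four-point condition.
import random

ALPHABET = "ACGT"      # illustrative symbol set
LENGTH = 200           # characters per string (assumed)
MUT_PER_BRANCH = 20    # mutations applied on each branch (assumed)

def mutate(seq, n_mut):
    s = list(seq)
    for _ in range(n_mut):
        i = random.randrange(len(s))
        s[i] = random.choice(ALPHABET)
    return "".join(s)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

root = "".join(random.choice(ALPHABET) for _ in range(LENGTH))
left, right = mutate(root, MUT_PER_BRANCH), mutate(root, MUT_PER_BRANCH)
A, B = mutate(left, MUT_PER_BRANCH), mutate(left, MUT_PER_BRANCH)
C, D = mutate(right, MUT_PER_BRANCH), mutate(right, MUT_PER_BRANCH)

# The true split AB|CD should yield the smallest sum of within-pair distances.
pairings = {
    "AB|CD": hamming(A, B) + hamming(C, D),
    "AC|BD": hamming(A, C) + hamming(B, D),
    "AD|BC": hamming(A, D) + hamming(B, C),
}
print(pairings, "->", min(pairings, key=pairings.get))
```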

    Maximizing Network Topology Lifetime Using Mobile Node Rotation


    The effect of natural selection on the performance of maximum parsimony

    Background: Maximum parsimony is one of the most commonly used and extensively studied phylogeny reconstruction methods. While current evaluation methodologies such as computer simulations provide insight into how well maximum parsimony reconstructs phylogenies, they tell us little about how well it performs on taxa drawn from populations of organisms that evolved subject to natural selection in addition to the random factors of drift and mutation. It is clear that natural selection has a significant impact on Among Site Rate Variation (ASRV) and the rate of accepted substitutions; that is, accepted mutations do not occur with uniform probability along the genome, and some substitutions are more likely to occur than others. However, little is known about how ASRV and non-uniform character substitutions affect the performance of reconstruction methods such as maximum parsimony. To gain insight into these issues, we study how well maximum parsimony performs on data generated by Avida, a digital life platform where populations of digital organisms evolve subject to natural selective pressures. Results: We first identify conditions where natural selection does affect maximum parsimony's reconstruction accuracy. In general, as we increase the probability that a significant adaptation occurs in an intermediate ancestor, the performance of maximum parsimony improves. In fact, maximum parsimony can correctly reconstruct small 4-taxon trees on data that have received surprisingly many mutations if the intermediate ancestor has received a significant adaptation. We demonstrate that this improved performance is attributable more to ASRV than to non-uniform character substitutions. Conclusion: Maximum parsimony, as well as most other phylogeny reconstruction methods, may perform significantly better on actual biological data than computer simulation studies currently suggest, because of natural selection. This is largely due to specific sites that perform functions associated with improved fitness becoming fixed in the genome.
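
    As a concrete illustration of the parsimony criterion itself (not of the Avida experiments), the sketch below scores candidate 4-taxon topologies with Fitch's small-parsimony algorithm; the sequences and topologies are invented for the example.

```python
# Minimal illustration of the parsimony criterion (Fitch's small-parsimony
# algorithm) on candidate 4-taxon topologies; not the Avida setup. The
# sequences below are invented for the example.

def fitch_site(tree, chars):
    """Return (state_set, cost) for one site on a nested-tuple tree."""
    if isinstance(tree, str):                  # leaf: its observed character
        return {chars[tree]}, 0
    (lset, lcost), (rset, rcost) = (fitch_site(t, chars) for t in tree)
    inter = lset & rset
    if inter:                                  # children agree: no extra change
        return inter, lcost + rcost
    return lset | rset, lcost + rcost + 1      # disagreement: one substitution

def parsimony_score(tree, seqs):
    length = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, {k: v[i] for k, v in seqs.items()})[1]
               for i in range(length))

seqs = {"A": "AAGT", "B": "AAGC", "C": "CAGT", "D": "CAAT"}
for topology in [(("A", "B"), ("C", "D")),
                 (("A", "C"), ("B", "D")),
                 (("A", "D"), ("B", "C"))]:
    print(topology, parsimony_score(topology, seqs))   # lowest score is preferred
```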

    SRPT optimally utilizes faster machines to minimize flow time


    A Unified Analysis of Paging and Caching

    Paging (caching) is the problem of managing a two-level memory hierarchy in order to minimize the time required to process a sequence of memory accesses. To measure this quantity, we define the system parameter miss penalty to represent the extra time required to access slow memory. In the context of paging, the miss penalty is large, so most previous studies of on-line paging have implicitly set miss penalty = 1 in order to simplify the model. We show that this seemingly insignificant simplification substantially alters the precision of derived results. Consequently, we reintroduce the miss penalty to the paging problem and present a more accurate analysis of on-line paging (and caching). We validate the use of this more accurate model by deriving intuitively appealing results for the paging problem which cannot be derived using the simplified model. 1 Introduction. Over the past decade, competitive analysis has been extensively used to analyze the performance of paging algorithms [20..
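
    A minimal sketch of a miss-penalty cost model, assuming an LRU policy and an invented request sequence: each access costs 1 and each fault pays an additional miss penalty. The exact accounting in the paper may differ; this is an assumed model for illustration only.

```python
# Minimal sketch of a miss-penalty cost model (assumed, not necessarily the
# paper's exact accounting): simulate LRU paging, charging 1 per access plus
# an extra `miss_penalty` on each fault. Inputs below are illustrative.
from collections import OrderedDict

def lru_cost(requests, cache_size, miss_penalty):
    cache = OrderedDict()              # keys kept in LRU order (oldest first)
    cost = 0
    for page in requests:
        cost += 1                      # every access touches fast memory
        if page in cache:
            cache.move_to_end(page)    # refresh recency
        else:
            cost += miss_penalty       # fault: fetch from slow memory
            if len(cache) >= cache_size:
                cache.popitem(last=False)   # evict least recently used page
            cache[page] = True
    return cost

requests = [1, 2, 3, 1, 4, 2, 5, 1, 2, 3]
for p in (1, 10, 100):
    print(f"miss penalty {p:>3}: total cost {lru_cost(requests, 3, p)}")
```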

    Applying Extra-Resource Analysis to Load Balancing

    Previously, extra-resource analysis has been used to argue that certain on-line algorithms are good choices for solving specific problems because they perform well with respect to the optimal off-line algorithm when given extra resources. We now introduce a new application for extra-resource analysis: deriving a qualitative divergence between off-line and on-line algorithms. We do this for the load balancing problem, the problem of assigning a list of jobs to m identical machines so as to minimize the makespan, the maximum load on any machine. We analyze the worst-case performance of on-line and off-line approximation algorithms relative to the performance of the optimal off-line algorithm when the approximation algorithms have k extra machines. Our main results are the following: the Longest-Processing-Time (LPT) algorithm produces a schedule with makespan no larger than that of the optimal off-line algorithm if LPT has at least (4m − 1)/3 machines while the optimal off-line algorithm has m machines. In contrast, no on-line algorithm can guarantee the same with any number of extra machines.
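
    To make the setting concrete, here is a minimal sketch of the LPT rule: sort jobs in decreasing order and greedily place each on the currently least-loaded machine. The job sizes are invented, and the (4m − 1)/3 value simply mirrors the machine count in the bound quoted above.

```python
# Minimal sketch of Longest-Processing-Time (LPT) scheduling: sort jobs in
# decreasing order and greedily place each on the currently least-loaded
# machine. The job sizes below are illustrative, not from the paper.
import heapq

def lpt_makespan(jobs, machines):
    loads = [0] * machines
    heapq.heapify(loads)                       # min-heap of machine loads
    for job in sorted(jobs, reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + job)
    return max(loads)

jobs = [7, 7, 6, 6, 5, 4, 4, 3, 2, 2]
m = 4
extra = (4 * m - 1) // 3                       # machine count from the quoted bound
print("LPT on m machines:       ", lpt_makespan(jobs, m))
print("LPT on (4m-1)/3 machines:", lpt_makespan(jobs, extra))
```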

    Non-Omniscient Scheduling

    The goal of this research is to derive practically meaningful theoretical results that will aid the practitioner in the design and implementation of software and hardware systems. This leads to an immediate subgoal of finding models that are complex enough to include all the relevant attributes of a particular system yet simple enough to be tractable to analysis. One attribute of many real systems that is often ignored by theoreticians is that they must operate with only partial information about the input instance. To model the many forms of partial information that systems must cope with, we present several non-omniscient models of computation which generalize the previously used on-line model of computation, which models lack of knowledge of the future. Within these models, we use competitive analysis to analyze and evaluate scheduling algorithms that arise in the design and implementation of three specific dynamic systems. For two systems, we offer new..

    Comparing fixed sampling with minimizer sampling when using k-mer indexes to find maximal exact matches.

    Bioinformatics applications and pipelines increasingly use k-mer indexes to search for similar sequences. The major problem with k-mer indexes is that they require a large amount of memory, so sampling is often used to reduce index size and query time. Most applications use one of two major types of sampling: fixed sampling and minimizer sampling. It is well known that fixed sampling produces a smaller index, typically by roughly a factor of two, whereas it is generally assumed that minimizer sampling produces faster query times since query k-mers can also be sampled. However, no direct comparison of fixed and minimizer sampling has been performed to verify these assumptions. We systematically compare fixed and minimizer sampling using the human genome as our database. We use the resulting k-mer indexes for fixed sampling and minimizer sampling to find all maximal exact matches between our database, the human genome, and three separate query sets: the mouse genome, the chimp genome, and an NGS data set. We reach the following conclusions. First, using larger k-mers reduces query time for both fixed sampling and minimizer sampling at the cost of requiring more space. If we use the same k-mer size for both methods, fixed sampling typically requires half as much space, whereas minimizer sampling processes queries only slightly faster. If we are allowed to use any k-mer size for each method, then we can choose a k-mer size such that fixed sampling both uses less space and processes queries faster than minimizer sampling. The reason is that although minimizer sampling is able to sample query k-mers, the number of shared k-mer occurrences that must be processed is much larger for minimizer sampling than for fixed sampling. In conclusion, we argue that for any application where each shared k-mer occurrence must be processed, fixed sampling is the right sampling method.
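
    A minimal sketch of the two sampling schemes on a toy sequence, assuming lexicographic ordering for minimizers and illustrative values of k and the window size w; real indexes typically hash k-mers and store occurrence lists, which is omitted here.

```python
# Minimal sketch contrasting the two sampling schemes for a k-mer index:
# fixed sampling keeps k-mers at every w-th position, while minimizer sampling
# keeps the smallest k-mer (here, lexicographically) in each window of w
# consecutive k-mers. k, w, and the toy sequence are illustrative assumptions.

def kmers(seq, k):
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]

def fixed_sample(seq, k, w):
    return [(i, km) for i, km in kmers(seq, k) if i % w == 0]

def minimizer_sample(seq, k, w):
    all_kmers = kmers(seq, k)
    picked = set()
    for start in range(len(all_kmers) - w + 1):
        window = all_kmers[start:start + w]
        picked.add(min(window, key=lambda x: x[1]))   # smallest k-mer in window
    return sorted(picked)

seq = "ACGTTGCATGACGTACGTTG"
k, w = 5, 4
print("fixed:    ", fixed_sample(seq, k, w))
print("minimizer:", minimizer_sample(seq, k, w))
```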

    The k-client problem

    Virtually all previous research in on-line algorithms has focused on single-threaded systems where only a single sequence of requests competes for system resources. To model multi-threaded on-line systems, we define and analyze the k-client problem, a dual of the well-studied k-server problem. In the basic k-client problem, there is a single server and k clients, each of which generates a sequence of requests for service in a metric space. The crux of the problem is deciding which client's request the single server should service, rather than which server should be used to service the current request. We also consider variations where requests have non-zero processing times and where there are multiple servers as well as multiple clients. We evaluate the performance of algorithms using several cost functions, including maximum completion time and average completion time. Two of the main results we derive are tight bounds on the performance of several commonly studied disk scheduling algorithms, and lower bounds of (lg k)/2 + 1 on the competitive ratio of any on-line algorithm for the maximum completion time and average completion time cost functions when k is a power of 2. Most of our results are essentially identical for the maximum completion time and average completion time cost functions.
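
    To illustrate the setting (not the paper's algorithms), the sketch below puts one server and k clients on a line, serves whichever client's next pending request is nearest, and reports the maximum and average completion times; the positions and the greedy policy are assumptions made for this example.

```python
# Minimal illustration of the k-client setting (not the paper's algorithms):
# one server moves on a line and k clients each hold a queue of request
# positions. A greedy policy serves whichever client's next request is
# nearest; completion time of a request is the total distance traveled so far.
# All positions below are made up.

def greedy_serve(queues, start=0):
    pos, time = start, 0
    completion = []                        # completion time of each request
    pending = [list(q) for q in queues]
    while any(pending):
        # distance to the next request of each client that still has work
        candidates = [(abs(q[0] - pos), ci) for ci, q in enumerate(pending) if q]
        dist, ci = min(candidates)
        time += dist
        pos = pending[ci].pop(0)
        completion.append(time)
    return completion

queues = [[3, 9, 12], [5, 6], [1, 14]]     # k = 3 clients
times = greedy_serve(queues)
print("max completion time:", max(times))
print("avg completion time:", sum(times) / len(times))
```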