23,340 research outputs found

    A Running Time Improvement for Two Thresholds Two Divisors Algorithm

    Get PDF
    Chunking algorithms play an important role in data de-duplication systems. The Basic Sliding Window (BSW) algorithm is the first prototype of the content-based chunking algorithm which can handle most types of data. The Two Thresholds Two Divisors (TTTD) algorithm was proposed to improve the BSW algorithm in terms of controlling the variations of the chunk-size. In this project, we investigate and compare the BSW algorithm and TTTD algorithm from different factors by a series of systematic experiments. Up to now, no paper conducts these experimental evaluations for these two algorithms. This is the first value of this paper. According to our analyses and the results of experiments, we provide a running time improvement for the TTTD algorithm. Our new solution reduces about 7 % of the total running time and also reduces about 50 % of the large-sized chunks while comparing with the original TTTD algorithm and make average chunk-size closer to the expected chunk-size. These significant results are the second important value of this project

    Minimum Cuts in Near-Linear Time

    Full text link
    We significantly improve known time bounds for solving the minimum cut problem on undirected graphs. We use a ``semi-duality'' between minimum cuts and maximum spanning tree packings combined with our previously developed random sampling techniques. We give a randomized algorithm that finds a minimum cut in an m-edge, n-vertex graph with high probability in O(m log^3 n) time. We also give a simpler randomized algorithm that finds all minimum cuts with high probability in O(n^2 log n) time. This variant has an optimal RNC parallelization. Both variants improve on the previous best time bound of O(n^2 log^3 n). Other applications of the tree-packing approach are new, nearly tight bounds on the number of near minimum cuts a graph may have and a new data structure for representing them in a space-efficient manner

    TRECVID 2008 - goals, tasks, data, evaluation mechanisms and metrics

    Get PDF
    The TREC Video Retrieval Evaluation (TRECVID) 2008 is a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in content-based exploitation of digital video via open, metrics-based evaluation. Over the last 7 years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. In 2008, 77 teams (see Table 1) from various research organizations --- 24 from Asia, 39 from Europe, 13 from North America, and 1 from Australia --- participated in one or more of five tasks: high-level feature extraction, search (fully automatic, manually assisted, or interactive), pre-production video (rushes) summarization, copy detection, or surveillance event detection. The copy detection and surveillance event detection tasks are being run for the first time in TRECVID. This paper presents an overview of TRECVid in 2008

    Predicting Intermediate Storage Performance for Workflow Applications

    Full text link
    Configuring a storage system to better serve an application is a challenging task complicated by a multidimensional, discrete configuration space and the high cost of space exploration (e.g., by running the application with different storage configurations). To enable selecting the best configuration in a reasonable time, we design an end-to-end performance prediction mechanism that estimates the turn-around time of an application using storage system under a given configuration. This approach focuses on a generic object-based storage system design, supports exploring the impact of optimizations targeting workflow applications (e.g., various data placement schemes) in addition to other, more traditional, configuration knobs (e.g., stripe size or replication level), and models the system operation at data-chunk and control message level. This paper presents our experience to date with designing and using this prediction mechanism. We evaluate this mechanism using micro- as well as synthetic benchmarks mimicking real workflow applications, and a real application.. A preliminary evaluation shows that we are on a good track to meet our objectives: it can scale to model a workflow application run on an entire cluster while offering an over 200x speedup factor (normalized by resource) compared to running the actual application, and can achieve, in the limited number of scenarios we study, a prediction accuracy that enables identifying the best storage system configuration

    Datacenter Traffic Control: Understanding Techniques and Trade-offs

    Get PDF
    Datacenters provide cost-effective and flexible access to scalable compute and storage resources necessary for today's cloud computing needs. A typical datacenter is made up of thousands of servers connected with a large network and usually managed by one operator. To provide quality access to the variety of applications and services hosted on datacenters and maximize performance, it deems necessary to use datacenter networks effectively and efficiently. Datacenter traffic is often a mix of several classes with different priorities and requirements. This includes user-generated interactive traffic, traffic with deadlines, and long-running traffic. To this end, custom transport protocols and traffic management techniques have been developed to improve datacenter network performance. In this tutorial paper, we review the general architecture of datacenter networks, various topologies proposed for them, their traffic properties, general traffic control challenges in datacenters and general traffic control objectives. The purpose of this paper is to bring out the important characteristics of traffic control in datacenters and not to survey all existing solutions (as it is virtually impossible due to massive body of existing research). We hope to provide readers with a wide range of options and factors while considering a variety of traffic control mechanisms. We discuss various characteristics of datacenter traffic control including management schemes, transmission control, traffic shaping, prioritization, load balancing, multipathing, and traffic scheduling. Next, we point to several open challenges as well as new and interesting networking paradigms. At the end of this paper, we briefly review inter-datacenter networks that connect geographically dispersed datacenters which have been receiving increasing attention recently and pose interesting and novel research problems.Comment: Accepted for Publication in IEEE Communications Surveys and Tutorial

    A geometric method for model reduction of biochemical networks with polynomial rate functions

    Full text link
    Model reduction of biochemical networks relies on the knowledge of slow and fast variables. We provide a geometric method, based on the Newton polytope, to identify slow variables of a biochemical network with polynomial rate functions. The gist of the method is the notion of tropical equilibration that provides approximate descriptions of slow invariant manifolds. Compared to extant numerical algorithms such as the intrinsic low dimensional manifold method, our approach is symbolic and utilizes orders of magnitude instead of precise values of the model parameters. Application of this method to a large collection of biochemical network models supports the idea that the number of dynamical variables in minimal models of cell physiology can be small, in spite of the large number of molecular regulatory actors

    A geometric method for model reduction of biochemical networks with polynomial rate functions

    No full text
    corecore