
    Likelihood-based inference of B-cell clonal families

    The human immune system depends on a highly diverse collection of antibody-making B cells. B cell receptor sequence diversity is generated by a random recombination process called "rearrangement" forming progenitor B cells, followed by a Darwinian process of lineage diversification and selection called "affinity maturation." The resulting receptors can be sequenced in high throughput for research and diagnostics. Such a collection of sequences contains a mixture of various lineages, each of which may be quite numerous or may consist of only a single member. As a step toward understanding the process and result of this diversification, one may wish to reconstruct lineage membership, i.e. to cluster sampled sequences according to which came from the same rearrangement events. We call this clustering problem "clonal family inference." In this paper we describe and validate a likelihood-based framework for clonal family inference based on a multi-hidden Markov Model (multi-HMM) framework for B cell receptor sequences. We describe an agglomerative algorithm to find a maximum likelihood clustering, two approximate algorithms with various trade-offs of speed versus accuracy, and a third, fast algorithm for finding specific lineages. We show that under simulation these algorithms greatly improve upon existing clonal family inference methods, and that they also give significantly different clusters than previous methods when applied to two real data sets.
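
    The paper's multi-HMM likelihood is beyond an abstract, but the agglomerative step it describes can be sketched generically: start from singleton clusters and greedily apply the merge with the best likelihood ratio until no merge helps. In the sketch below, `log_lik` is a hypothetical stand-in for the paper's multi-HMM cluster likelihood, not its actual model.

```python
# Generic likelihood-based agglomerative clustering sketch: repeatedly merge
# the pair of clusters whose merge most improves the clustering likelihood.
# `log_lik` is a hypothetical scoring hook (the paper scores clusters with a
# multi-HMM over B cell receptor sequences).

def agglomerate(sequences, log_lik, threshold=0.0):
    """Greedy maximum-likelihood agglomeration over sequence clusters."""
    clusters = [frozenset([s]) for s in sequences]  # start from singletons
    while len(clusters) > 1:
        # Score each candidate merge by its log likelihood ratio:
        # log P(merged) - log P(a) - log P(b).
        def gain(pair):
            a, b = pair
            return log_lik(a | b) - log_lik(a) - log_lik(b)
        best = max(
            ((a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]),
            key=gain,
        )
        if gain(best) < threshold:      # no merge improves the likelihood enough
            break
        a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters
```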

    Scaling Similarity Joins over Tree-Structured Data

    Given a large collection of tree-structured objects (e.g., XML documents), the similarity join finds the pairs of objects that are similar to each other, based on a similarity threshold and a tree edit distance measure. The state-of-the-art similarity join methods compare simpler approximations of the objects (e.g., strings), in order to prune pairs that cannot be part of the similarity join result based on distance bounds derived by the approximations. In this paper, we propose a novel similarity join approach, which is based on the dynamic decomposition of the tree objects into subgraphs, according to the similarity threshold. Our technique avoids computing the exact distance between two tree objects, if the objects do not share at least one common subgraph. In order to scale up the join, the computed subgraphs are managed in a two-layer index. Our experimental results on real and synthetic data collections show that our approach outperforms the state-of-the-art methods by up to an order of magnitude.
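
    As a rough illustration of the filter-and-verify pattern such joins rely on (not the paper's dynamic decomposition or its two-layer index), the sketch below indexes each tree under a set of signature keys, generates candidate pairs only for trees sharing a key, and verifies survivors with the exact distance. `signatures` and `ted` are hypothetical hooks.

```python
from collections import defaultdict
from itertools import combinations

# Filter-and-verify similarity join sketch: a pair is a candidate only if the
# two trees share at least one signature; the expensive exact tree edit
# distance `ted` is computed only for candidates.

def similarity_join(trees, tau, signatures, ted):
    index = defaultdict(set)            # signature key -> ids of trees containing it
    for i, t in enumerate(trees):
        for sig in signatures(t, tau):
            index[sig].add(i)
    candidates = set()
    for ids in index.values():          # pairs sharing at least one signature
        candidates.update(combinations(sorted(ids), 2))
    # Verification step: exact distance only for surviving pairs.
    return [(i, j) for i, j in candidates if ted(trees[i], trees[j]) <= tau]
```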

    Optimal Decomposition Strategy For Tree Edit Distance

    An ordered labeled tree is a tree where the left-to-right order among siblings is significant. Given two ordered labeled trees, the edit distance between them is the minimum cost of a sequence of edit operations that converts one tree into the other. In this thesis, we present an algorithm for the tree edit distance problem using an optimal tree decomposition strategy. By combining vertical compression of trees with optimal decomposition, we can significantly reduce the running time of the algorithm. We compare our method with other methods both theoretically and experimentally. The test results show that our strategy on compressed trees is by far the best decomposition strategy, creating the fewest relevant sub-problems.
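
    To make the underlying recursion concrete, here is a minimal memoized tree edit distance with unit costs, using the classic forest decomposition (delete, insert, or match the rightmost roots). It illustrates the sub-problem structure that decomposition strategies optimize; it does not implement the thesis's optimal strategy or vertical compression.

```python
from functools import lru_cache

# Minimal memoized tree edit distance with unit costs. Trees are
# (label, children) tuples; forests are tuples of trees.

def tree_edit_distance(t1, t2):
    def size(forest):
        return sum(1 + size(children) for _, children in forest)

    @lru_cache(maxsize=None)
    def d(f, g):
        if not f and not g:
            return 0
        if not f:
            return size(g)              # insert everything remaining in g
        if not g:
            return size(f)              # delete everything remaining in f
        (l1, c1), (l2, c2) = f[-1], g[-1]
        return min(
            d(f[:-1] + c1, g) + 1,                      # delete rightmost root of f
            d(f, g[:-1] + c2) + 1,                      # insert rightmost root of g
            d(c1, c2) + d(f[:-1], g[:-1]) + (l1 != l2)  # match the two roots
        )

    return d((t1,), (t2,))

# Example: one relabel plus one deletion.
a = ("f", (("a", ()), ("b", ())))
b = ("f", (("c", ()),))
print(tree_edit_distance(a, b))         # 2
```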

    A* Orthogonal Matching Pursuit: Best-First Search for Compressed Sensing Signal Recovery

    Compressed sensing is a developing field aiming at the reconstruction of sparse signals acquired in reduced dimensions, which makes the recovery process under-determined. The required solution is the one with minimum ℓ0 norm due to sparsity; however, it is not practical to solve the ℓ0 minimization problem. Commonly used techniques include ℓ1 minimization, such as Basis Pursuit (BP), and greedy pursuit algorithms such as Orthogonal Matching Pursuit (OMP) and Subspace Pursuit (SP). This manuscript proposes a novel semi-greedy recovery approach, namely A* Orthogonal Matching Pursuit (A*OMP). A*OMP performs an A* search to look for the sparsest solution on a tree whose paths grow similarly to the Orthogonal Matching Pursuit (OMP) algorithm. Paths on the tree are evaluated according to a cost function, which should compensate for different path lengths. For this purpose, three different auxiliary structures are defined, including novel dynamic ones. A*OMP also incorporates pruning techniques which enable practical applications of the algorithm. Moreover, the adjustable search parameters provide means for a complexity-accuracy trade-off. We demonstrate the reconstruction ability of the proposed scheme on both synthetically generated data and images using Gaussian and Bernoulli observation matrices, where A*OMP yields less reconstruction error and higher exact recovery frequency than BP, OMP, and SP. Results also indicate that the novel dynamic cost functions provide improved results as compared to a conventional choice.
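
    For reference, plain OMP, the greedy baseline whose selection steps A*OMP expands as tree paths, can be sketched in a few lines of NumPy. A*OMP would instead keep several such paths alive at once, ranked by an auxiliary cost function; this sketch shows only the single greedy path.

```python
import numpy as np

# Plain Orthogonal Matching Pursuit: greedily grow the support by the column
# most correlated with the residual, then re-fit by least squares.

def omp(A, y, k):
    """Recover a k-sparse x with y ≈ A @ x."""
    m, n = A.shape
    residual, support = y.astype(float), []
    for _ in range(k):
        # Column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        support.append(j)
        # Re-fit on the enlarged support and update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(n)
    x[support] = coef
    return x

# Usage: random Gaussian measurements of a 3-sparse signal.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 100))
x_true = np.zeros(100)
x_true[[5, 17, 42]] = [1.0, -2.0, 0.5]
x_hat = omp(A, A @ x_true, k=3)
print(np.linalg.norm(x_hat - x_true))   # recovery error, typically ~0
```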

    Crafting Concurrent Data Structures

    Concurrent data structures lie at the heart of modern parallel programs. The design and implementation of concurrent data structures can be challenging due to the demand for good performance (low latency and high scalability) and strong progress guarantees. In this dissertation, we enrich the knowledge of concurrent data structure design by proposing new implementations, as well as general techniques to improve the performance of existing ones. The first part of the dissertation presents an unordered linked list implementation that supports nonblocking insert, remove, and lookup operations. The algorithm is based on a novel "enlist" technique that greatly simplifies the task of achieving wait-freedom. The value of our technique is also demonstrated in the creation of other wait-free data structures such as stacks and hash tables. The second data structure presented is a nonblocking hash table implementation which solves a long-standing design challenge by permitting the hash table to dynamically adjust its size in a nonblocking manner. Additionally, our hash table offers strong theoretical properties such as supporting unbounded memory. In our algorithm, we introduce a new "freezable set" abstraction which allows us to achieve atomic migration of keys during a resize. The freezable set abstraction also enables highly efficient implementations which maximally exploit processor cache locality. In experiments, we found our lock-free hash table performs consistently better than state-of-the-art implementations, such as the split-ordered list. The third data structure we present is a concurrent priority queue called the "mound". Our implementations include nonblocking and lock-based variants. The mound employs randomization to reduce contention on concurrent insert operations, and decomposes a remove operation into smaller atomic operations so that multiple remove operations can execute in parallel within a pipeline. In experiments, we show that the mound can provide excellent latency at low thread counts. Lastly, we discuss how hardware transactional memory (HTM) can be used to accelerate existing nonblocking concurrent data structure implementations. We propose optimization techniques that can significantly improve the performance (1.5x to 3x speedups) of a variety of important concurrent data structures, such as binary search trees and hash tables. The optimizations also preserve the strong progress guarantees of the original implementations.
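
    The "freezable set" idea can be illustrated conceptually: a bucket that refuses further mutations once frozen, so a stable snapshot of it can be migrated during a resize. The sketch below uses a lock purely for illustration; the dissertation's version is nonblocking and CAS-based, which plain Python does not expose.

```python
import threading

# Conceptual freezable-set sketch: once frozen, inserts fail, so the frozen
# snapshot can be safely rehashed into a new, larger table. A failed insert
# tells the caller to retry against the new table.

class FreezableSet:
    def __init__(self):
        self._items, self._frozen = set(), False
        self._lock = threading.Lock()

    def insert(self, x):
        with self._lock:
            if self._frozen:
                return False            # mutation refused: bucket is migrating
            self._items.add(x)
            return True

    def freeze(self):
        with self._lock:
            self._frozen = True
            return frozenset(self._items)   # stable snapshot for migration
```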

    Constraint Programming based Local Search for the Vehicle Routing Problem with Time Windows

    The project focuses on the Vehicle Routing Problem with Time Windows. It explores and tests a method based on a formulation of the problem in terms of constraint programming, and implements a local search method with the ability to make large moves, known as "Large Neighbourhood Search".
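
    A generic destroy-and-repair LNS skeleton looks like the following; `destroy`, `repair`, and `cost` are hypothetical problem-specific hooks (in this project, repair would be driven by the constraint-programming model).

```python
# Large Neighbourhood Search skeleton: repeatedly "destroy" part of the
# current solution (remove some customers from their routes) and "repair" it
# (reinsert them), keeping improvements.

def lns(initial, destroy, repair, cost, iterations=1000, destroy_size=5):
    best = current = initial
    for _ in range(iterations):
        partial, removed = destroy(current, destroy_size)  # the large move
        candidate = repair(partial, removed)               # e.g. CP-based reinsertion
        if candidate is not None and cost(candidate) < cost(current):
            current = candidate
            if cost(current) < cost(best):
                best = current
    return best
```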

    Fast network configuration in Software Defined Networking

    Software Defined Networking (SDN) provides a framework to dynamically adjust and re-program the data plane with the use of flow rules. The realization of highly adaptive SDNs with the ability to respond to changing demands or recover after a network failure in a short period of time hinges on efficient updates of flow rules. We model the time to deploy a set of flow rules by the update time at the bottleneck switch, and formulate the problem of selecting paths to minimize the deployment time under feasibility constraints as a mixed integer linear program (MILP). To reduce the computation time of determining flow rules, we propose efficient heuristics designed to approximate the minimum-deployment-time solution by relaxing the MILP or selecting the paths sequentially. Through extensive simulations we show that our algorithms outperform current shortest-path-based solutions, reducing the total network configuration time by up to 55% while incurring similar packet loss in the considered scenarios. We also demonstrate that in a networked environment with a certain fraction of failed links, our algorithms reduce the average time to reestablish disrupted flows by 40%.
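
    The sequential heuristic idea can be sketched as follows: since deployment time is modeled by the update time at the bottleneck switch, each flow greedily takes the candidate path that keeps the maximum per-switch rule count lowest. The data layout and names are illustrative, not the paper's exact formulation.

```python
from collections import Counter

# Sequential path selection sketch: `candidate_paths` maps each flow to a
# list of candidate paths, each path a list of switch ids. For every flow we
# pick the path minimizing the resulting bottleneck rule count.

def select_paths(candidate_paths):
    load = Counter()                    # flow rules already assigned per switch
    chosen = {}
    for flow, paths in candidate_paths.items():
        def bottleneck(path):           # bottleneck if this flow takes `path`
            return max(load[s] + 1 for s in path)
        best = min(paths, key=bottleneck)
        chosen[flow] = best
        for s in best:
            load[s] += 1
    return chosen, max(load.values())   # selected paths and final bottleneck

# Usage:
paths = {"f1": [["s1", "s2"], ["s1", "s3"]],
         "f2": [["s2", "s4"], ["s3", "s4"]]}
print(select_paths(paths))              # f2 avoids s2, keeping the bottleneck at 1
```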

    Disassembly Planning and Costing Through Petri Net Approach

    In the current consumer-oriented environment, many new products appear in the market almost daily. Lured by advertisements and tempted by new product features, customers are constantly purchasing newer products. Acquiring newer products often leads to throwing out older ones, but it is a totally different story for manufacturers: they need to consider the best way to reuse a product, both for economic purposes and for environmental protection. Considerations for them often include how to minimize total disassembly cost, how to achieve the lowest total disassembly time at each processing step, and how to separate valuable parts from hazardous parts as early as possible during the disassembly procedure. In this paper, we use a Disassembly Petri Net (DPN) to generate the Disassembly Process Plan (DPP). This plan is a sequence of disassembly tasks from the initial stage of the whole product to the final stage, where each part is separated from the others. This disassembly plan is very valuable for product recycling or remanufacturing. Prior to building the DPN, we apply an algorithm to generate a Disassembly Precedence Matrix (DPM), aided by the construction steps recorded in SolidWorks™, the solid modeling software used to create the part in the first place. From the DPN, we find all feasible paths and generate the corresponding disassembly costs based upon tool changes, changes in the direction of movement, and individual part characteristics (e.g., hazardous and recyclable components). Cost data were extracted from previously published studies by Boothroyd et al. to obtain handling and disassembly times. From these, we develop the optimal or near-optimal DPP for the best time- and cost-based disassembly options. In summary, this paper presents a systematic method to disassemble a product into its individual components and provides a cost figure for doing so. This contrasts with many studies reported in the literature, which concentrate either on a measure of disassembly complexity or, even when cost is presumably the driving force, use arbitrary costs based on pre-selected values for such things as tool-change penalties, disassembly-direction-change penalties, or penalties for delaying removal of hazardous materials. In this paper, we use disassembly times based on experimental work and/or industrial experience. Given the correct labor rate, our cost evaluation yields a realistic cost value.
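
    As a toy illustration of costing feasible disassembly paths with tool-change and reorientation penalties (not the paper's DPN or its experimentally derived times), one can run a cheapest-path search over disassembly states:

```python
import heapq
from itertools import count

# Dijkstra over disassembly states (stage, current tool, current direction).
# Each edge is a task with a base time plus penalties for switching tool or
# direction relative to the previous task. All data below are made-up
# placeholders, not the paper's costs.

def cheapest_plan(tasks, start, goal):
    # tasks: {stage: [(next_stage, tool, direction, base_cost), ...]}
    tick = count()                      # tie-breaker so the heap never compares states
    pq = [(0.0, next(tick), start, None, None, [])]
    seen = set()
    while pq:
        cost, _, stage, tool, direction, plan = heapq.heappop(pq)
        if stage == goal:
            return cost, plan
        if (stage, tool, direction) in seen:
            continue
        seen.add((stage, tool, direction))
        for nxt, t, d, base in tasks.get(stage, []):
            step = base
            step += 2.0 if tool is not None and t != tool else 0.0            # tool change
            step += 1.0 if direction is not None and d != direction else 0.0  # reorientation
            heapq.heappush(pq, (cost + step, next(tick), nxt, t, d, plan + [nxt]))
    return None

tasks = {"whole": [("A", "driver", "+z", 3.0), ("B", "wrench", "-x", 2.0)],
         "A": [("done", "wrench", "+z", 2.0)],
         "B": [("done", "driver", "+z", 4.0)]}
print(cheapest_plan(tasks, "whole", "done"))   # (7.0, ['A', 'done'])
```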