49,638 research outputs found

    Peptide mass fingerprinting using field-programmable gate arrays

    Get PDF
    The reconfigurable computing paradigm, which exploits the flexibility and versatility of field-programmable gate arrays (FPGAs), has emerged as a powerful solution for speeding up time-critical algorithms. This paper describes a reconfigurable computing solution for processing raw mass spectrometric data generated by MALDI-TOF instruments. The hardware-implemented algorithms for denoising, baseline correction, peak identification, and deisotoping, running on a Xilinx Virtex-2 FPGA at 180 MHz, generate a mass fingerprint that is over 100 times faster than an equivalent algorithm written in C, running on a Dual 3-GHz Xeon server. The results obtained using the FPGA implementation are virtually identical to those generated by a commercial software package MassLynx

    Graph Sample and Hold: A Framework for Big-Graph Analytics

    Full text link
    Sampling is a standard approach in big-graph analytics; the goal is to efficiently estimate the graph properties by consulting a sample of the whole population. A perfect sample is assumed to mirror every property of the whole population. Unfortunately, such a perfect sample is hard to collect in complex populations such as graphs (e.g. web graphs, social networks etc), where an underlying network connects the units of the population. Therefore, a good sample will be representative in the sense that graph properties of interest can be estimated with a known degree of accuracy. While previous work focused particularly on sampling schemes used to estimate certain graph properties (e.g. triangle count), much less is known for the case when we need to estimate various graph properties with the same sampling scheme. In this paper, we propose a generic stream sampling framework for big-graph analytics, called Graph Sample and Hold (gSH). To begin, the proposed framework samples from massive graphs sequentially in a single pass, one edge at a time, while maintaining a small state. We then show how to produce unbiased estimators for various graph properties from the sample. Given that the graph analysis algorithms will run on a sample instead of the whole population, the runtime complexity of these algorithm is kept under control. Moreover, given that the estimators of graph properties are unbiased, the approximation error is kept under control. Finally, we show the performance of the proposed framework (gSH) on various types of graphs, such as social graphs, among others

    An algorithm for cross-lingual sense-clustering tested in a MT evaluation setting

    Get PDF
    Unsupervised sense induction methods offer a solution to the problem of scarcity of semantic resources. These methods automatically extract semantic information from textual data and create resources adapted to specific applications and domains of interest. In this paper, we present a clustering algorithm for cross-lingual sense induction which generates bilingual semantic inventories from parallel corpora. We describe the clustering procedure and the obtained resources. We then proceed to a large-scale evaluation by integrating the resources into a Machine Translation (MT) metric (METEOR). We show that the use of the data-driven sense-cluster inventories leads to better correlation with human judgments of translation quality, compared to precision-based metrics, and to improvements similar to those obtained when a handcrafted semantic resource is used

    Simpler and Better Algorithms for Minimum-Norm Load Balancing

    Get PDF
    Recently, Chakrabarty and Swamy (STOC 2019) introduced the minimum-norm load-balancing problem on unrelated machines, wherein we are given a set J of jobs that need to be scheduled on a set of m unrelated machines, and a monotone, symmetric norm; We seek an assignment sigma: J -> [m] that minimizes the norm of the resulting load vector load_{sigma} in R_+^m, where load_{sigma}(i) is the load on machine i under the assignment sigma. Besides capturing all l_p norms, symmetric norms also capture other norms of interest including top-l norms, and ordered norms. Chakrabarty and Swamy (STOC 2019) give a (38+epsilon)-approximation algorithm for this problem via a general framework they develop for minimum-norm optimization that proceeds by first carefully reducing this problem (in a series of steps) to a problem called min-max ordered load balancing, and then devising a so-called deterministic oblivious LP-rounding algorithm for ordered load balancing. We give a direct, and simple 4+epsilon-approximation algorithm for the minimum-norm load balancing based on rounding a (near-optimal) solution to a novel convex-programming relaxation for the problem. Whereas the natural convex program encoding minimum-norm load balancing problem has a large non-constant integrality gap, we show that this issue can be remedied by including a key constraint that bounds the "norm of the job-cost vector." Our techniques also yield a (essentially) 4-approximation for: (a) multi-norm load balancing, wherein we are given multiple monotone symmetric norms, and we seek an assignment respecting a given budget for each norm; (b) the best simultaneous approximation factor achievable for all symmetric norms for a given instance
    corecore