49,638 research outputs found
Peptide mass fingerprinting using field-programmable gate arrays
The reconfigurable computing paradigm, which exploits the flexibility and versatility of field-programmable gate arrays (FPGAs), has emerged as a powerful solution for speeding up time-critical algorithms. This paper describes a reconfigurable computing solution for processing raw mass spectrometric data generated by MALDI-TOF instruments. The hardware-implemented algorithms for denoising, baseline correction, peak identification, and deisotoping, running on a Xilinx Virtex-2 FPGA at 180 MHz, generate a mass fingerprint that is over 100 times faster than an equivalent algorithm written in C, running on a Dual 3-GHz Xeon server. The results obtained using the FPGA implementation are virtually identical to those generated by a commercial software package MassLynx
Graph Sample and Hold: A Framework for Big-Graph Analytics
Sampling is a standard approach in big-graph analytics; the goal is to
efficiently estimate the graph properties by consulting a sample of the whole
population. A perfect sample is assumed to mirror every property of the whole
population. Unfortunately, such a perfect sample is hard to collect in complex
populations such as graphs (e.g. web graphs, social networks etc), where an
underlying network connects the units of the population. Therefore, a good
sample will be representative in the sense that graph properties of interest
can be estimated with a known degree of accuracy. While previous work focused
particularly on sampling schemes used to estimate certain graph properties
(e.g. triangle count), much less is known for the case when we need to estimate
various graph properties with the same sampling scheme. In this paper, we
propose a generic stream sampling framework for big-graph analytics, called
Graph Sample and Hold (gSH). To begin, the proposed framework samples from
massive graphs sequentially in a single pass, one edge at a time, while
maintaining a small state. We then show how to produce unbiased estimators for
various graph properties from the sample. Given that the graph analysis
algorithms will run on a sample instead of the whole population, the runtime
complexity of these algorithm is kept under control. Moreover, given that the
estimators of graph properties are unbiased, the approximation error is kept
under control. Finally, we show the performance of the proposed framework (gSH)
on various types of graphs, such as social graphs, among others
An algorithm for cross-lingual sense-clustering tested in a MT evaluation setting
Unsupervised sense induction methods offer a solution to the
problem of scarcity of semantic resources. These methods
automatically extract semantic information from textual data
and create resources adapted to specific applications and domains of interest. In this paper, we present a clustering algorithm for cross-lingual sense induction which generates
bilingual semantic inventories from parallel corpora. We describe the clustering procedure and the obtained resources. We then proceed to a large-scale evaluation by integrating the resources into a Machine Translation (MT) metric (METEOR). We show that the use of the data-driven sense-cluster inventories leads to better correlation with human judgments of translation quality, compared to precision-based metrics, and to improvements similar to those obtained when a handcrafted semantic resource is used
Simpler and Better Algorithms for Minimum-Norm Load Balancing
Recently, Chakrabarty and Swamy (STOC 2019) introduced the minimum-norm load-balancing problem on unrelated machines, wherein we are given a set J of jobs that need to be scheduled on a set of m unrelated machines, and a monotone, symmetric norm; We seek an assignment sigma: J -> [m] that minimizes the norm of the resulting load vector load_{sigma} in R_+^m, where load_{sigma}(i) is the load on machine i under the assignment sigma. Besides capturing all l_p norms, symmetric norms also capture other norms of interest including top-l norms, and ordered norms. Chakrabarty and Swamy (STOC 2019) give a (38+epsilon)-approximation algorithm for this problem via a general framework they develop for minimum-norm optimization that proceeds by first carefully reducing this problem (in a series of steps) to a problem called min-max ordered load balancing, and then devising a so-called deterministic oblivious LP-rounding algorithm for ordered load balancing.
We give a direct, and simple 4+epsilon-approximation algorithm for the minimum-norm load balancing based on rounding a (near-optimal) solution to a novel convex-programming relaxation for the problem. Whereas the natural convex program encoding minimum-norm load balancing problem has a large non-constant integrality gap, we show that this issue can be remedied by including a key constraint that bounds the "norm of the job-cost vector." Our techniques also yield a (essentially) 4-approximation for: (a) multi-norm load balancing, wherein we are given multiple monotone symmetric norms, and we seek an assignment respecting a given budget for each norm; (b) the best simultaneous approximation factor achievable for all symmetric norms for a given instance
- …