54,579 research outputs found
Informed baseline subtraction of proteomic mass spectrometry data aided by a novel sliding window algorithm
Proteomic matrix-assisted laser desorption/ionisation (MALDI) linear
time-of-flight (TOF) mass spectrometry (MS) may be used to produce protein
profiles from biological samples with the aim of discovering biomarkers for
disease. However, the raw protein profiles suffer from several sources of bias
or systematic variation which need to be removed via pre-processing before
meaningful downstream analysis of the data can be undertaken. Baseline
subtraction, an early pre-processing step that removes the non-peptide signal
from the spectra, is complicated by two factors: (i) each spectrum has, on
average, wider peaks for peptides with higher mass-to-charge ratios (m/z), and
(ii) optimising the baseline subtraction input arguments is a time-consuming
and error-prone trial-and-error process. To address these
complications, we present an automated pipeline that includes (i) a novel
`continuous' line segment algorithm that efficiently operates over data with a
transformed m/z-axis to remove the relationship between peptide mass and peak
width, and (ii) an input-free algorithm to estimate peak widths on the
transformed m/z scale. The automated baseline subtraction method was deployed
on six publicly available proteomic MS datasets using six different m/z-axis
transformations. Optimality of the automated baseline subtraction pipeline was
assessed quantitatively using the mean absolute scaled error (MASE) when
compared to a gold-standard baseline subtracted signal. Near-optimal baseline
subtraction was achieved using the automated pipeline. The advantages of the
proposed pipeline include informed, data-specific input arguments for
baseline subtraction methods, the avoidance of time-intensive and subjective
piecewise baseline subtraction, and the ability to automate baseline
subtraction completely. Moreover, individual steps can be adopted as
stand-alone routines.
Comment: 50 pages, 19 figures
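The abstract does not spell out the `continuous' line segment algorithm itself, but the general idea it builds on — estimating the baseline under each point with a fixed-width sliding window over the transformed m/z axis and subtracting it — can be sketched minimally as follows. The function names, the window-minimum baseline estimate, and the fixed `half_width` argument are illustrative assumptions, not the authors' method; the point of the paper's m/z transformation is precisely that one window width becomes appropriate across the whole spectrum.

```python
import numpy as np

def rolling_min_baseline(intensity, half_width):
    """Estimate the baseline at each point as the minimum intensity within
    a window of +/- half_width points. The signal is assumed to already be
    on a transformed m/z axis, so a single window width suffices."""
    n = len(intensity)
    baseline = np.empty(n)
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        baseline[i] = intensity[lo:hi].min()
    return baseline

def subtract_baseline(intensity, half_width):
    """Return the baseline-subtracted signal. It is non-negative by
    construction, since the window minimum never exceeds the point's
    own intensity."""
    return intensity - rolling_min_baseline(intensity, half_width)
```

On a slowly drifting signal with sharp peaks, the window minimum tracks the drift while ignoring the peaks, so subtraction leaves the peaks on a near-zero baseline.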
Efficient Summing over Sliding Windows
This paper considers the problem of maintaining statistic aggregates over the
last W elements of a data stream. First, the problem of counting the number of
1's in the last W bits of a binary stream is considered. A lower bound of
Ω(1/ε + log W) memory bits for Wε-additive approximations is derived. This is
followed by an algorithm whose memory consumption is O(1/ε + log W) bits,
indicating that the algorithm is optimal and that the bound is tight. Next,
the more general problem of maintaining a sum of the last W integers, each in
the range {0, 1, ..., R}, is addressed. The paper shows that approximating the
sum within an additive error of RWε can also be done using Θ(1/ε + log W) bits
for ε = Ω(1/W). For ε = o(1/W), we present a succinct algorithm which uses
B(1 + o(1)) bits, where B = Θ(W log(1/(Wε))) is the derived lower bound. We
show that all lower bounds generalize to randomized algorithms as well. All
algorithms process new elements and answer queries in O(1) worst-case time.
Comment: A shorter version appears in SWAT 201
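The paper's optimal construction is not reproduced in the abstract; as an illustrative baseline, the standard block-decomposition idea behind Wε-additive sliding-window counting can be sketched as follows. It stores roughly 1/ε per-block counts and incurs additive error O(Wε) from the partially expired oldest block — a simpler (and less memory-efficient, counter-level rather than bit-level) relative of what the paper achieves.

```python
from collections import deque

class ApproxWindowCounter:
    """Approximate the number of 1's among the last W stream bits to within
    an additive O(W * eps), using the textbook block decomposition (not the
    paper's optimal algorithm): the window is split into ~1/eps blocks of
    ~W*eps bits each, and only per-block 1-counts are retained, so the
    oldest, partially expired block is the sole source of error."""

    def __init__(self, W, eps):
        self.block_size = max(1, int(W * eps))
        self.num_blocks = -(-W // self.block_size)   # ceil(W / block_size)
        self.blocks = deque([0] * self.num_blocks)   # closed-block 1-counts
        self.current = 0   # 1-count of the block being filled
        self.filled = 0    # bits seen in the current block
        self.total = 0     # running sum over the retained closed blocks

    def push(self, bit):
        self.current += bit
        self.filled += 1
        if self.filled == self.block_size:
            # Close the current block and expire the oldest one.
            self.total += self.current - self.blocks.popleft()
            self.blocks.append(self.current)
            self.current = 0
            self.filled = 0

    def query(self):
        # Over-counts by at most the expired portion of the oldest block
        # plus the partial current block, i.e. O(W * eps) ones.
        return self.total + self.current
```

Each push and query takes O(1) worst-case time, matching the abstract's time claim; the memory, however, is Θ(1/ε) counters rather than the optimal Θ(1/ε + log W) bits.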
Element Distinctness, Frequency Moments, and Sliding Windows
We derive new time-space tradeoff lower bounds and algorithms for exactly
computing statistics of input data, including frequency moments, element
distinctness, and order statistics, that are simple to calculate for sorted
data. We develop a randomized algorithm for the element distinctness problem
whose time T and space S satisfy T ∈ O(n^{3/2}/S^{1/2}), smaller than
previous lower bounds for comparison-based algorithms, showing that element
distinctness is strictly easier than sorting for randomized branching
programs. This algorithm is based on a new time- and space-efficient
algorithm for finding all collisions of a function f from a finite set to
itself that are reachable by iterating f from a given set of starting points.
We further show that our element distinctness algorithm can be extended at
only a polylogarithmic factor cost to solve the element distinctness problem
over sliding windows, where the task is to take an input of length 2n-1 and
produce an output for each window of length n, giving n outputs in total. In
contrast, we show a time-space tradeoff lower bound of T ∈ Ω(n^2/S) for
randomized branching programs to compute the number of distinct elements over
sliding windows. The same lower bound holds for computing the low-order bit
of F_0 and computing any frequency moment F_k, k ≠ 1. This shows that those
frequency moments and the decision problem F_0 mod 2 are strictly harder than
element distinctness. We complement this lower bound with a T ∈ O(n^2/S)
comparison-based deterministic RAM algorithm for exactly computing F_k over
sliding windows, nearly matching both our lower bound for the sliding-window
version and the comparison-based lower bounds for the single-window version.
We further exhibit a quantum algorithm for F_0 over sliding windows with
T ∈ O(n^{3/2}/S^{1/2}). Finally, we consider the computation of order
statistics over sliding windows.
Comment: arXiv admin note: substantial text overlap with arXiv:1212.437
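The collision-finding primitive the abstract describes — locating points x ≠ y with f(x) = f(y) reachable by iterating f from a start point — has a classical space-efficient building block in Floyd's tortoise-and-hare cycle detection, sketched below. This is the textbook single-trajectory version, not the paper's algorithm, which runs walks from many starting points to recover all reachable collisions within its time-space budget.

```python
def find_collision(f, start):
    """Find a pair (x, y) with x != y and f(x) == f(y) on the rho-shaped
    trajectory obtained by iterating f from `start`, using O(1) extra space.
    Returns None if `start` already lies on a cycle (no tail, hence no
    collision on this trajectory)."""
    # Phase 1: Floyd's cycle detection -- the slow pointer advances one
    # application of f per step, the fast pointer two; they must meet on
    # the cycle.
    slow, fast = f(start), f(f(start))
    while slow != fast:
        slow, fast = f(slow), f(f(fast))
    # Phase 2: walk from `start` and from the meeting point in lockstep;
    # they first coincide at the cycle's entry point. The two predecessors
    # of that entry (one on the tail, one on the cycle) form a collision.
    x, y = start, slow
    if x == y:
        return None  # start is on the cycle itself
    while f(x) != f(y):
        x, y = f(x), f(y)
    return x, y
```

For example, with f mapping 0→1→2→3→4 and 4→2, iterating from 0 yields the collision pair (1, 4), since f(1) = f(4) = 2; iterating from 2 (which sits on the cycle) yields none.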