
    Informed baseline subtraction of proteomic mass spectrometry data aided by a novel sliding window algorithm

    Proteomic matrix-assisted laser desorption/ionisation (MALDI) linear time-of-flight (TOF) mass spectrometry (MS) may be used to produce protein profiles from biological samples with the aim of discovering biomarkers for disease. However, the raw protein profiles suffer from several sources of bias or systematic variation which need to be removed via pre-processing before meaningful downstream analysis of the data can be undertaken. Baseline subtraction, an early pre-processing step that removes the non-peptide signal from the spectra, is complicated by the following: (i) each spectrum has, on average, wider peaks for peptides with higher mass-to-charge ratios (m/z), and (ii) the time-consuming and error-prone trial-and-error process for optimising the baseline subtraction input arguments. With reference to the aforementioned complications, we present an automated pipeline that includes (i) a novel 'continuous' line segment algorithm that efficiently operates over data with a transformed m/z-axis to remove the relationship between peptide mass and peak width, and (ii) an input-free algorithm to estimate peak widths on the transformed m/z scale. The automated baseline subtraction method was deployed on six publicly available proteomic MS datasets using six different m/z-axis transformations. Optimality of the automated baseline subtraction pipeline was assessed quantitatively using the mean absolute scaled error (MASE) when compared to a gold-standard baseline-subtracted signal. Near-optimal baseline subtraction was achieved using the automated pipeline. The advantages of the proposed pipeline include informed and data-specific input arguments for baseline subtraction methods, the avoidance of time-intensive and subjective piecewise baseline subtraction, and the ability to automate baseline subtraction completely. Moreover, individual steps can be adopted as stand-alone routines.
    Comment: 50 pages, 19 figures
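    To make the idea concrete, here is a minimal sketch of window-based baseline subtraction on a transformed m/z axis. It assumes NumPy, uses a log transform as one plausible axis transformation (the paper evaluates six and does not prescribe this one), and substitutes a simple rolling-minimum baseline for the paper's 'continuous' line segment algorithm; the function names and the default half_window of 50 grid points are illustrative choices, not values from the paper.

```python
import numpy as np

def rolling_min_baseline(y, half_window):
    """Crude baseline estimate: the minimum intensity inside a sliding
    window centred on each point. A simplified stand-in for the paper's
    'continuous' line segment algorithm, not a reimplementation of it."""
    n = len(y)
    baseline = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        baseline[i] = y[lo:hi].min()
    return baseline

def baseline_subtract(mz, intensity, half_window=50):
    """Resample the spectrum onto a uniform grid in log(m/z) so that a
    fixed window width in grid points tracks the widening of peaks at
    higher m/z, estimate the baseline on that grid, and subtract it.
    Assumes mz is sorted in increasing order; half_window is arbitrary."""
    log_mz = np.log(mz)
    grid = np.linspace(log_mz[0], log_mz[-1], len(mz))
    y = np.interp(grid, log_mz, intensity)
    corrected = np.clip(y - rolling_min_baseline(y, half_window), 0.0, None)
    return np.exp(grid), corrected   # resampled m/z axis and corrected signal
```

    Resampling onto a uniform grid in log(m/z) is what lets a single window width behave sensibly across the whole mass range; the paper's input-free peak-width estimation, not reproduced here, is what would choose that width automatically.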

    Efficient Summing over Sliding Windows

    This paper considers the problem of maintaining statistical aggregates over the last W elements of a data stream. First, the problem of counting the number of 1's in the last W bits of a binary stream is considered. A lower bound of Ω(1/ε + log W) memory bits for Wε-additive approximations is derived. This is followed by an algorithm whose memory consumption is O(1/ε + log W) bits, indicating that the algorithm is optimal and that the bound is tight. Next, the more general problem of maintaining a sum of the last W integers, each in the range {0, 1, ..., R}, is addressed. The paper shows that approximating the sum within an additive error of RWε can also be done using Θ(1/ε + log W) bits for ε = Ω(1/W). For ε = o(1/W), we present a succinct algorithm which uses B(1 + o(1)) bits, where B = Θ(W log(1/(Wε))) is the derived lower bound. We show that all lower bounds generalize to randomized algorithms as well. All algorithms process new elements and answer queries in O(1) worst-case time.
    Comment: A shorter version appears in SWAT 201
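    As a rough illustration of how an RWε additive error can be achieved with far fewer than one counter per element, the sketch below splits the window into roughly 1/ε fixed-size blocks, keeps one exact partial sum per block, and expires whole blocks as they fall out of the window. It is not the paper's bit-optimal construction (it stores full counters per block rather than a compact encoding); the class name and interface are assumptions.

```python
from collections import deque

class ApproxWindowSum:
    """Approximate sum of the last W stream elements, each in 0..R, with
    additive error below R * block_size <= R*W*eps.

    Illustrative block construction only: the window is split into roughly
    1/eps blocks of about W*eps elements; each block keeps one exact partial
    sum and the oldest block is dropped whole when it expires. This is not
    the paper's bit-optimal (Theta(1/eps + log W)-bit) scheme.
    """

    def __init__(self, W, eps):
        self.block_size = max(1, int(W * eps))       # elements per block
        self.num_blocks = max(1, W // self.block_size)
        self.blocks = deque()                        # sums of completed blocks
        self.open_sum = 0                            # sum of the open block
        self.open_count = 0                          # elements in the open block

    def add(self, x):
        """Append one element (0 <= x <= R) to the stream."""
        self.open_sum += x
        self.open_count += 1
        if self.open_count == self.block_size:       # close the current block
            self.blocks.append(self.open_sum)
            self.open_sum = 0
            self.open_count = 0
            if len(self.blocks) > self.num_blocks:
                self.blocks.popleft()                # expire the oldest block

    def query(self):
        """Window-sum estimate; differs from the truth by < R * block_size."""
        return sum(self.blocks) + self.open_sum
```

    With W = 10,000 and eps = 0.01, for example, the structure keeps about 100 block sums instead of 10,000 elements, and any query is off by less than 100R. The paper's contribution is achieving the same error scale with Θ(1/ε + log W) bits rather than Θ(1/ε) full counters.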

    Element Distinctness, Frequency Moments, and Sliding Windows

    We derive new time-space tradeoff lower bounds and algorithms for exactly computing statistics of input data, including frequency moments, element distinctness, and order statistics, that are simple to calculate for sorted data. We develop a randomized algorithm for the element distinctness problem whose time T and space S satisfy T ∈ O(n^{3/2}/S^{1/2}), smaller than previous lower bounds for comparison-based algorithms, showing that element distinctness is strictly easier than sorting for randomized branching programs. This algorithm is based on a new time- and space-efficient algorithm for finding all collisions of a function f from a finite set to itself that are reachable by iterating f from a given set of starting points. We further show that our element distinctness algorithm can be extended at only a polylogarithmic factor cost to solve the element distinctness problem over sliding windows, where the task is to take an input of length 2n-1 and produce an output for each window of length n, giving n outputs in total. In contrast, we show a time-space tradeoff lower bound of T ∈ Ω(n^2/S) for randomized branching programs to compute the number of distinct elements over sliding windows. The same lower bound holds for computing the low-order bit of F_0 and computing any frequency moment F_k, k ≠ 1. This shows that those frequency moments and the decision problem F_0 mod 2 are strictly harder than element distinctness. We complement this lower bound with a T ∈ O(n^2/S) comparison-based deterministic RAM algorithm for exactly computing F_k over sliding windows, nearly matching both our lower bound for the sliding-window version and the comparison-based lower bounds for the single-window version. We further exhibit a quantum algorithm for F_0 over sliding windows with T ∈ O(n^{3/2}/S^{1/2}). Finally, we consider the computations of order statistics over sliding windows.
    Comment: arXiv admin note: substantial text overlap with arXiv:1212.437
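    The element distinctness algorithm described above rests on a small-space routine for finding collisions of a function f mapping a finite set to itself, reachable by iterating f from given start points. The sketch below shows only that primitive for a single start point, using Floyd's tortoise-and-hare cycle detection in O(1) extra space; it is a simplified illustration in the spirit of the abstract, not the paper's algorithm (which handles many start points and the reduction from the input to f), and the function name and interface are assumptions.

```python
def find_collision(f, x0):
    """Return two distinct inputs a, b with f(a) == f(b), found by iterating
    f from x0 in O(1) extra space (Floyd's tortoise-and-hare), or None if x0
    itself lies on the cycle (no collision reachable from x0 alone)."""
    # Phase 1: the slow (one step) and fast (two step) pointers must meet
    # somewhere inside the cycle of the eventually periodic sequence
    # x0, f(x0), f(f(x0)), ... over a finite set.
    slow, fast = f(x0), f(f(x0))
    while slow != fast:
        slow = f(slow)
        fast = f(f(fast))
    # Phase 2: walk one pointer from x0 and one from the meeting point in
    # lockstep; they first coincide at the cycle's entry point. The elements
    # one step behind each pointer are then two preimages of that entry.
    a, b = x0, fast
    prev_a = prev_b = None
    while a != b:
        prev_a, a = a, f(a)
        prev_b, b = b, f(b)
    if prev_a is None:          # x0 was already on the cycle: no tail exists
        return None
    return prev_a, prev_b       # f(prev_a) == f(prev_b) == a, prev_a != prev_b
```

    For example, with f(x) = (x*x + 1) mod 1_000_003 and x0 = 2, the routine returns a colliding pair (a, b) with a ≠ b and f(a) == f(b), or None if the start point happens to sit on the cycle.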