
    Selection and sorting with limited storage

    When selecting from, or sorting, a file stored on a read-only tape and the internal storage is rather limited, several passes of the input tape may be required. We study the relation between the amount of internal storage available and the number of passes required to select the Kth highest of N inputs. We show, for example, that to find the median in two passes requires at least Ω(N^{1/2}) and at most O(N^{1/2} log N) internal storage. For probabilistic methods, Θ(N^{1/2}) internal storage is necessary and sufficient for a single-pass method which finds the median with arbitrarily high probability.
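
    The flavor of the two-pass bound can be illustrated with a sampling sketch: a first pass over the tape draws roughly √N random elements and uses them to pick filters that bracket the median with high probability, so a second pass only needs to retain the few elements falling between the filters. The Python sketch below is illustrative only; the stream_factory rewind helper, the slack margin, and the retry behavior are assumptions of this sketch, not the paper's construction.

    import random

    def two_pass_median(stream_factory, n):
        # Illustrative two-pass median selection with ~sqrt(n) working storage.
        # stream_factory() must return a fresh iterator over the same n items,
        # modeling a rewindable read-only tape (an assumption of this sketch).
        k = (n + 1) // 2                      # rank of the median
        s = max(1, int(n ** 0.5))             # sample size ~ sqrt(n)

        # Pass 1: reservoir-sample s elements, then choose filters lo, hi
        # whose sample ranks bracket the median's expected rank.
        sample = []
        for i, x in enumerate(stream_factory()):
            if len(sample) < s:
                sample.append(x)
            else:
                j = random.randrange(i + 1)
                if j < s:
                    sample[j] = x
        sample.sort()
        slack = int(3 * s ** 0.5) + 1         # safety margin in sample ranks
        mid = (len(sample) * k) // n
        lo = sample[max(0, mid - slack)]
        hi = sample[min(len(sample) - 1, mid + slack)]

        # Pass 2: count elements below lo and keep only those in [lo, hi];
        # with high probability only a ~sqrt(n)-sized subset survives.
        below, kept = 0, []
        for x in stream_factory():
            if x < lo:
                below += 1
            elif x <= hi:
                kept.append(x)
        kept.sort()
        idx = k - below - 1
        if 0 <= idx < len(kept):
            return kept[idx]
        raise RuntimeError("filters missed the median; retry with larger slack")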

    Systematic reduction of Hyperspectral Images for high-throughput Plastic Characterization

    Hyperspectral Imaging (HSI) combines microscopy and spectroscopy to assess the spatial distribution of spectroscopically active compounds in objects, and has diverse applications in food quality control, pharmaceutical processes, and waste sorting. However, due to the large size of HSI datasets, it can be challenging to analyze and store them within a reasonable digital infrastructure, especially in waste sorting, where speed and data storage resources are limited. Additionally, as with most spectroscopic data, there is significant redundancy, making pixel and variable selection crucial for retaining chemical information. Recent developments in chemometrics enable automated and evidence-based data reduction, which can substantially enhance the speed and performance of Non-Negative Matrix Factorization (NMF), a widely used algorithm for chemical resolution of HSI data. By recovering the pure contribution maps and spectral profiles of distributed compounds, NMF can provide evidence-based sorting decisions for efficient waste management. To improve the quality and efficiency of data analysis on HSI data, we apply a convex-hull method to select essential pixels and wavelengths and remove uninformative and redundant information. This process minimizes computational strain and effectively eliminates highly mixed pixels. By reducing data redundancy, data investigation and analysis become more straightforward, as demonstrated in both simulated and real HSI data for plastic sorting.
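
    To make the pixel-selection side of this concrete, the Python sketch below projects the pixel spectra into a low-dimensional score space, keeps only the pixels on the convex hull of that projection as the least-mixed candidates, and fits NMF on the reduced set. The two-component PCA projection, the component count, and all parameter choices are assumptions made for illustration, not the authors' pipeline (which also selects wavelengths).

    from scipy.spatial import ConvexHull
    from sklearn.decomposition import NMF, PCA

    def reduce_and_unmix(cube, n_components=3):
        # cube: (rows, cols, wavelengths) NumPy array of non-negative intensities.
        r, c, w = cube.shape
        X = cube.reshape(r * c, w)           # pixels as rows, wavelengths as columns

        # Project pixel spectra to a 2-D score space and keep only the pixels
        # on the convex hull: the extreme, least-mixed spectra.
        scores = PCA(n_components=2).fit_transform(X)
        essential = ConvexHull(scores).vertices

        # Resolve pure spectral profiles from the reduced pixel set only.
        model = NMF(n_components=n_components, init="nndsvda", max_iter=500)
        model.fit(X[essential])
        spectra = model.components_          # (n_components, wavelengths)

        # Project the full image onto the recovered profiles to obtain
        # per-compound contribution maps.
        maps = model.transform(X).reshape(r, c, n_components)
        return maps, spectra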

    Histogram-Aware Sorting for Enhanced Word-Aligned Compression in Bitmap Indexes

    Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. These techniques are sensitive to the order of the rows: a simple lexicographical sort can divide the index size by 9 and make indexes several times faster. We investigate reordering heuristics based on computed attribute-value histograms. Simply permuting the columns of the table based on these histograms can increase the sorting efficiency by 40%.

    Comment: To appear in proceedings of DOLAP 200
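
    One of the simpler heuristics in this family can be sketched in a few lines of Python: compute a histogram per column, put low-cardinality columns first, and sort the rows lexicographically so that RLE sees long runs of identical values. The cardinality-based ordering below is just one plausible histogram heuristic, assumed for illustration; the paper evaluates several.

    from collections import Counter

    def histogram_aware_sort(rows):
        # Reorder columns so low-cardinality attributes come first, then sort
        # rows lexicographically; long runs of equal values compress well
        # under RLE-based schemes such as WAH.
        ncols = len(rows[0])
        hist = [Counter(row[c] for row in rows) for c in range(ncols)]
        order = sorted(range(ncols), key=lambda c: len(hist[c]))
        return sorted(tuple(row[c] for c in order) for row in rows), order

    def count_runs(rows, col):
        # Number of value runs in one column; fewer runs mean better RLE.
        return 1 + sum(rows[i][col] != rows[i - 1][col] for i in range(1, len(rows)))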

    Selection from read-only memory with limited workspace

    Given an unordered array of $N$ elements drawn from a totally ordered set and an integer $k$ in the range from $1$ to $N$, in the classic selection problem the task is to find the $k$-th smallest element in the array. We study the complexity of this problem in the space-restricted random-access model: the input array is stored on read-only memory, and the algorithm has access to a limited amount of workspace. We prove that the linear-time prune-and-search algorithm, presented in most textbooks on algorithms, can be modified to use $\Theta(N)$ bits instead of $\Theta(N)$ words of extra space. Prior to our work, the best known algorithm, due to Frederickson, could perform the task with $\Theta(N)$ bits of extra space in $O(N \lg^* N)$ time. Our result separates the space-restricted random-access model and the multi-pass streaming model, since we can surpass the $\Omega(N \lg^* N)$ lower bound known for the latter model. We also generalize our algorithm for the case when the size of the workspace is $\Theta(S)$ bits, where $\lg^3 N \leq S \leq N$. The running time of our generalized algorithm is $O(N \lg^*(N/S) + N (\lg N) / \lg S)$, slightly improving over the $O(N \lg^*(N (\lg N)/S) + N (\lg N) / \lg S)$ bound of Frederickson's algorithm. To obtain the improvements mentioned above, we developed a new data structure, called the wavelet stack, that we use for repeated pruning. We expect the wavelet stack to be a useful tool in other applications as well.

    Comment: 16 pages, 1 figure. Preliminary version appeared in COCOON-201
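
    The bits-versus-words distinction can be conveyed with a much cruder relative of the paper's algorithm: a quickselect that never moves input elements and instead keeps a single "alive" flag per element. The Python sketch below is only meant to convey that flavor; the actual algorithm and its wavelet stack are far more refined.

    import random

    def select_readonly(a, k):
        # Return the k-th smallest (1-based) element of the read-only list a,
        # using one flag per element (conceptually a single bit) instead of
        # partitioning the input in place.
        n = len(a)
        alive = bytearray([1]) * n           # Theta(n) flags of extra space
        remaining = n
        while remaining > 1:
            t = random.randrange(remaining)  # pick a random alive pivot
            for i in range(n):
                if alive[i]:
                    if t == 0:
                        pivot = a[i]
                        break
                    t -= 1
            less = sum(1 for i in range(n) if alive[i] and a[i] < pivot)
            equal = sum(1 for i in range(n) if alive[i] and a[i] == pivot)
            if k <= less:                    # answer is strictly below the pivot
                for i in range(n):
                    if alive[i] and a[i] >= pivot:
                        alive[i] = 0
                remaining = less
            elif k <= less + equal:          # the pivot itself is the answer
                return pivot
            else:                            # answer is strictly above the pivot
                for i in range(n):
                    if alive[i] and a[i] <= pivot:
                        alive[i] = 0
                k -= less + equal
                remaining -= less + equal
        return next(a[i] for i in range(n) if alive[i])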

    Write-limited sorts and joins for persistent memory

    To mitigate the impact of the widening gap between the memory needs of CPUs and what standard memory technology can deliver, system architects have introduced a new class of memory technology termed persistent memory. Persistent memory is byte-addressable, but exhibits asymmetric I/O: writes are typically one order of magnitude more expensive than reads. Byte addressability combined with I/O asymmetry render the performance profile of persistent memory unique. Thus, it becomes imperative to find new ways to seamlessly incorporate it into database systems. We do so in the context of query processing. We focus on the fundamental operations of sort and join processing. We introduce the notion of write-limited algorithms that effectively minimize the I/O cost. We give a high-level API that enables the system to dynamically optimize the workflow of the algorithms, or, alternatively, allows the developer to tune the write profile of the algorithms. We present four different techniques to incorporate persistent memory into the database processing stack in light of this API. We have implemented and extensively evaluated all our proposals. Our results show that the algorithms deliver on their promise of I/O-minimality and tunable performance. We showcase the merits and deficiencies of each implementation technique, thus taking a solid first step towards incorporating persistent memory into query processing.
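
    A deliberately extreme caricature of a write-limited algorithm, sketched below in Python, writes each element to the persistent-memory output exactly once at the price of O(n^2) cheap reads. The out buffer standing in for a persistent-memory region is an assumption of this sketch; the paper's algorithms trade reads for writes far more adaptively.

    def write_limited_sort(a, out):
        # Selection-sort variant: each element is written to the (expensive)
        # persistent-memory output exactly once; all other state lives in
        # cheap DRAM, where reads dominate.
        n = len(a)
        used = [False] * n                  # bookkeeping in DRAM
        for pos in range(n):
            m = None
            for i in range(n):              # reads are ~10x cheaper than writes
                if not used[i] and (m is None or a[i] < a[m]):
                    m = i
            used[m] = True
            out[pos] = a[m]                 # the single persistent-memory write
        return out

    # Usage: write_limited_sort([3, 1, 2], [None] * 3)  ->  [1, 2, 3]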

    Optical Micromanipulation Techniques Combined with Microspectroscopic Methods

    The subject of the presented Ph.D. thesis is a combination of optical micromanipulation and microspectroscopic methods. We used laser tweezers to transport and sort various living microorganisms, such as microalgal or yeast cells. We employed Raman microspectroscopy to analyze the chemical composition of individual cells, and we used this information to automatically select cells of interest. We combined pulsed amplitude modulation fluorescence microspectroscopy, optical micromanipulation, and other techniques to map the stress response of cells to various laser wavelengths, intensities, and durations of optical trapping. We fabricated microfluidic chips of various designs and constructed a Raman-tweezers sorter of micro-objects, such as living cells, on a microfluidic platform.

    A Database for Fast Access to Particle-Gated Event Data

    In nuclear physics experiments involving in-flight fragmentation of ions, a large number of different nuclei is usually produced, and various detection systems are employed to identify the species event by event, e.g. by measuring their specific energy loss and time-of-flight. For such cases (not necessarily limited to nuclear physics), where subsets of a large dataset can be identified using a small number of measured signals, software for fast access to varying subsets of such a dataset has been developed. The software has been used successfully in the analysis of a one-neutron knock-out experiment at GANIL.
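
    The underlying idea admits a compact Python sketch: one scan classifies each event by its gating signals and records the matching event offsets per species, so later analyses touch only their subset. The identify function, the dE/tof field names, and the species label are hypothetical placeholders for illustration; the actual software is described in the paper.

    from collections import defaultdict

    def build_gate_index(events, identify):
        # One pass over the data: classify each event by its measured
        # signals (e.g. energy loss and time of flight) and remember the
        # offsets of all events belonging to each species.
        index = defaultdict(list)
        for offset, event in enumerate(events):
            index[identify(event["dE"], event["tof"])].append(offset)
        return index

    # Usage: offsets = build_gate_index(events, identify)["some-species"]
    #        subset = (events[i] for i in offsets)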

    An In-Place Sorting with O(n log n) Comparisons and O(n) Moves

    We present the first in-place algorithm for sorting an array of size n that performs, in the worst case, at most O(n log n) element comparisons and O(n) element transports. This solves a long-standing open problem, stated explicitly, e.g., in [J.I. Munro and V. Raman, Sorting with minimum data movement, J. Algorithms, 13, 374-93, 1992], of whether there exists a sorting algorithm that matches the asymptotic lower bounds on all computational resources simultaneously.
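
    For contrast with the classical state of affairs: selection sort already achieves O(n) element moves but needs Θ(n^2) comparisons, while heapsort achieves O(n log n) comparisons at the cost of Θ(n log n) moves; the cited result is the first in-place algorithm to meet both bounds at once. The counting convention in the Python sketch below (three transports per swap) is an assumption for illustration.

    def selection_sort_counts(a):
        # Selection sort with resource counters: Theta(n^2) comparisons,
        # but at most 3n element transports (three per swap).
        a = list(a)
        comparisons = moves = 0
        n = len(a)
        for i in range(n):
            m = i
            for j in range(i + 1, n):
                comparisons += 1
                if a[j] < a[m]:
                    m = j
            if m != i:
                a[i], a[m] = a[m], a[i]
                moves += 3
        return a, comparisons, moves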