    Error Correction for DNA Sequencing via Disk Based Index and Box Queries

    The vast increase in DNA sequencing capacity over the last decade has quickly turned biology into a data-intensive science. Nevertheless, current sequencers such as the Illumina HiSeq have high random per-base error rates, which makes sequencing error correction an indispensable requirement for many sequence analysis applications. Most existing methods for error correction demand large amounts of expensive memory, which limits their scalability to large datasets. In this thesis, we introduce a new disk-based method, called DiskBQcor, for sequencing error correction. DiskBQcor stores k-mers from sequenced genome data, along with their associated metadata, in a disk-based index tree called the BoND-tree, and uses the index to efficiently process specially designed box queries that retrieve relevant k-mers and their occurrence frequencies. It takes an input read and locates the potential errors in the sequence. It then applies a comprehensive voting mechanism and, where applicable, an efficient binary-encoding-based assembly technique to verify and correct an erroneous base in a genome sequence under various conditions. To avoid the waste of computing resources inherent in an offline approach such as DiskBQcor while DNA sequencing is in progress, we propose an online approach to correcting sequencing errors. The online processing strategies and accuracy measures are discussed. An algorithm for deleting indexed k-mers from the BoND-tree, a stepping stone toward online sequencing error correction, is also introduced. Our experiments demonstrate that the proposed methods are quite promising for correcting errors in sequenced genome data on disk. The resulting BoND-tree with correct k-mers can also be used for sequence analysis applications such as variant detection.

    Master of Science. Computer and Information Science, College of Engineering and Computer Science, University of Michigan-Dearborn. https://deepblue.lib.umich.edu/bitstream/2027.42/136615/3/Thesis_YarongGu_519_corrected.pd
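    A minimal sketch of the voting idea follows, assuming a plain in-memory hash table of k-mer counts in place of the thesis's disk-based BoND-tree index and box queries; the k-mer length and solidity threshold here are hypothetical choices for illustration, not values from the thesis.

        # Illustrative k-mer voting for substitution-error correction (Python).
        # A Counter of k-mer frequencies stands in for DiskBQcor's disk index.
        from collections import Counter

        K = 15       # hypothetical k-mer length
        SOLID = 3    # k-mers seen at least this often are trusted

        def kmer_counts(reads):
            """Count every k-mer occurring in the input reads."""
            counts = Counter()
            for read in reads:
                for i in range(len(read) - K + 1):
                    counts[read[i:i + K]] += 1
            return counts

        def correct_read(read, counts):
            """Flag a position as suspect when none of its covering k-mers
            is solid, then vote: each candidate base scores the number of
            covering k-mers that the substitution makes solid."""
            read = list(read)
            for pos in range(len(read)):
                lo = max(0, pos - K + 1)
                hi = min(len(read) - K, pos)
                window = range(lo, hi + 1)
                if any(counts[''.join(read[i:i + K])] >= SOLID for i in window):
                    continue  # some covering k-mer is trusted; keep the base
                votes = {}
                for base in 'ACGT':
                    trial = read[:pos] + [base] + read[pos + 1:]
                    votes[base] = sum(
                        counts[''.join(trial[i:i + K])] >= SOLID for i in window)
                best = max(votes, key=votes.get)
                if votes[best] > 0:
                    read[pos] = best
            return ''.join(read)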

    Low power VLSI implementation schemes for DCT-based image compression

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    NASA Tech Briefs, Winter 1980

    Topics include: NASA TU Services: Technology Utilization services that can assist you in learning about and applying NASA technology; New Product Ideas: a summary of selected innovations of value to manufacturers for the development of new products; Electronic Components and Circuits; Electronic Systems; Physical Sciences; Materials; Life Sciences; Mechanics; Machinery; Fabrication Technology; Mathematics and Information Sciences.

    Efficient Algorithms for Similarity and Skyline Summary on Multidimensional Datasets.

    Efficient management of large multidimensional datasets has attracted much attention in the database research community. Such large multidimensional datasets are common, and efficient algorithms are needed for analyzing them for a variety of applications. In this thesis, we focus our study on two very common classes of analysis: similarity and skyline summarization. We first focus on similarity when one of the dimensions in the multidimensional dataset is temporal. We then develop algorithms for evaluating skyline summaries effectively for both temporal and low-cardinality attribute domain datasets and propose different methods for improving the effectiveness of the skyline summary operation. This thesis begins by studying similarity measures for time-series datasets and efficient algorithms for time-series similarity evaluation. The first contribution of this thesis is a new algorithm which can be used to evaluate similarity methods whose matching criterion is bounded by a specified threshold value. The second contribution of this thesis is the development of a new time-interval skyline operator, which continuously computes the current skyline over a data stream. We present a new algorithm called LookOut for evaluating such queries efficiently, and empirically demonstrate the scalability of this algorithm. Current skyline evaluation techniques follow a common paradigm that eliminates data elements from skyline consideration by finding other elements in the dataset that dominate them. The performance of such techniques is heavily influenced by the underlying data distribution. The third contribution of this thesis is a novel technique called the Lattice Skyline Algorithm (LS) that is built around a new paradigm for skyline evaluation on datasets with attributes that are drawn from low-cardinality domains. The utility of the skyline as a data summarization technique is often diminished by the volume of points in the skyline. The final contribution of this thesis is a novel scheme which remedies the skyline volume problem by ranking the elements of the skyline based on their importance to the skyline summary. Collectively, the techniques described in this thesis present efficient methods for two common and computationally intensive analysis operations on large multidimensional datasets.

    Ph.D. Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/57643/2/mmorse_1.pd
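    For intuition about the dominance paradigm these techniques build on, here is a minimal block-nested-loops-style skyline sketch in Python; it is not the LookOut or Lattice Skyline (LS) algorithm from the thesis, and the minimize-every-dimension convention and the example data are illustrative assumptions.

        # Baseline skyline via pairwise dominance checks (Python).
        def dominates(a, b):
            """a dominates b: no worse on every dimension, strictly better on one."""
            return (all(x <= y for x, y in zip(a, b))
                    and any(x < y for x, y in zip(a, b)))

        def skyline(points):
            result = []
            for p in points:
                if any(dominates(q, p) for q in result):
                    continue                              # p is dominated; skip
                result = [q for q in result if not dominates(p, q)]
                result.append(p)                          # p enters the skyline
            return result

        # Example: points as (price, distance), both minimized.
        print(skyline([(50, 8), (80, 2), (60, 5), (90, 1), (55, 9)]))
        # -> [(50, 8), (80, 2), (60, 5), (90, 1)]

    Because dominance is transitive, evicting dominated points as new ones arrive still yields the exact skyline; the cost of all these pairwise checks is precisely what data-distribution-sensitive techniques like LS avoid.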

    Sonic and Photonic Crystals

    Sonic/phononic crystals, termed acoustic/sonic band-gap media, are elastic analogues of photonic crystals and have recently received renewed attention in many acoustic applications. Photonic crystals have a periodic dielectric modulation with a spatial scale on the order of the optical wavelength. The design and optimization of photonic crystals can serve many applications by combining factors such as the choice of intermixed materials, lattice symmetry, lattice constant, filling factor, shape of the scattering object, and thickness of a structural layer. Through the publications and discussions of the research on sonic/phononic crystals, researchers can obtain effective and valuable results and advance future development in related fields. Devices based on these crystals can be utilized in mechanical and physical applications and can also be designed for novel applications based on the investigations in this Special Issue.

    Cumulative index to NASA Tech Briefs, 1986-1990, volumes 10-14

    Tech Briefs are short announcements of new technology derived from the R&D activities of the National Aeronautics and Space Administration. These briefs emphasize information considered likely to be transferable across industrial, regional, or disciplinary lines and are issued to encourage commercial application. This cumulative index of Tech Briefs contains abstracts and four indexes (subject, personal author, originating center, and Tech Brief number) and covers the period 1986 to 1990. The abstract section is organized by the following subject categories: electronic components and circuits, electronic systems, physical sciences, materials, computer programs, life sciences, mechanics, machinery, fabrication technology, and mathematics and information sciences.

    Scalable parallel simulation of small-scale structures in cold dark matter

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Physics, 2005. Includes bibliographical references (p. 179-181).

    We present a parallel implementation of the particle-particle/particle-mesh (P³M) algorithm for distributed-memory clusters. The llp3m-hc code uses a hybrid method for both computation and domain decomposition. Long-range forces are computed using a Fourier-transform gravity solver on a regular mesh; the mesh is distributed across parallel processes using a static one-dimensional slab domain decomposition. Short-range forces are computed by direct summation of close pairs; particles are distributed using a dynamic domain decomposition based on a space-filling Hilbert curve. A nearly optimal method was devised to dynamically repartition the particle distribution so as to maintain load balance even for extremely inhomogeneous mass distributions. Tests using 800³ simulations on a 40-processor Beowulf cluster showed good load balance and scalability up to 80 processes. We discuss the limits on scalability imposed by communication and extreme clustering and suggest how they may be removed by extending our algorithm to include a new adaptive P³M technique, which we then introduce and present as the new llap3m-hc code. We optimize the free parameters of adaptive P³M to minimize force errors and the time required to compute short-range forces. We apply our codes to simulate small-scale structure of the universe at redshift z > 50. We observe and analyze the formation of caustics in the structure and compare it with the predictions of semi-analytic models of structure formation. The current limits on neutralino detection experiments assume a Maxwell-Boltzmann velocity distribution and a smooth spatial distribution of dark matter. It is shown in this thesis that an inhomogeneous distribution of dark matter on small scales significantly changes the predicted event rates in direct-detection dark matter experiments. The effect of spatial inhomogeneity weakens the upper limits on the neutralino cross section produced in the Cryogenic Dark Matter Search Experiment.

    by Alexander V. Shirokov. Ph.D.
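    The Hilbert-curve decomposition admits a compact sketch. The Python below is a hypothetical 2D illustration of the general idea only (the simulations and the llp3m-hc code are 3D, and its dynamic repartitioning balances work rather than raw particle counts): map each particle to its index along a Hilbert curve, sort, and cut the ordering into contiguous, roughly equal segments, one per process. Because nearby indices on the curve correspond to nearby cells in space, each segment is a spatially compact region, which keeps short-range pair summation local.

        # Hilbert-curve domain decomposition, 2D illustration (Python).
        def hilbert_index(n, x, y):
            """Position of cell (x, y) along the Hilbert curve filling an
            n x n grid, n a power of two (standard bit-twiddling form)."""
            d, s = 0, n // 2
            while s > 0:
                rx = 1 if x & s else 0
                ry = 1 if y & s else 0
                d += s * s * ((3 * rx) ^ ry)
                if ry == 0:                # rotate/flip the quadrant so
                    if rx == 1:            # the curve stays continuous
                        x, y = n - 1 - x, n - 1 - y
                    x, y = y, x
                s //= 2
            return d

        def decompose(particles, nprocs, n=1024):
            """Sort particles by Hilbert index, then cut the ordering into
            nprocs contiguous, roughly equal-count segments."""
            ordered = sorted(particles,
                             key=lambda p: hilbert_index(n, p[0], p[1]))
            chunk = -(-len(ordered) // nprocs)   # ceiling division
            return [ordered[i * chunk:(i + 1) * chunk]
                    for i in range(nprocs)]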