
    Shrinking generators and statistical leakage

    Shrinking is a newly proposed technique for combining a pair of pseudorandom binary sequences, (a, s), to form a new sequence, z, with better randomness, where randomness here stands for difficulty of prediction. The ones in the second sequence s point out the bits of a to be included in z. The generator that performs this process is known as the shrinking generator (SG). In this paper, it is shown that, under the existing combining method, deviation from randomness in the statistics of a leads to the leakage of those statistics into z. We also show that it is sufficient for constructing a statistically balanced SG to have at least one statistically balanced generator. A new shrinking rule that yields statistically balanced output, even if a and s are not balanced, is then proposed. Self-shrinking, in which a single pseudorandom bit generator (PRBG) shrinks itself, is also investigated, and a modification of the existing shrinking rule is proposed. Simulation results show the robustness of the proposed methods. For self-shrinking in particular, results show that the proposed shrinking rule yields sequences with balanced statistics even for extremely biased generators. This suggests possible application of the new rule to strengthen running-key generators.
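
    To make the combining step concrete, the sketch below implements the classic shrinking rule described above and illustrates the statistical leakage it causes. The toy biased generators and their probabilities are illustrative stand-ins, not the paper's PRBGs, and the proposed balanced rule is not reproduced here.

```python
import random

def shrink(a, s):
    """Classic shrinking rule: a bit of `a` is kept in the output z
    whenever the corresponding bit of the selection sequence `s` is 1."""
    return [ai for ai, si in zip(a, s) if si == 1]

# Illustrative stand-ins for real pseudorandom bit generators.
random.seed(1)
a = [int(random.random() < 0.7) for _ in range(100_000)]  # biased toward 1
s = [int(random.random() < 0.5) for _ in range(100_000)]  # balanced selector

z = shrink(a, s)
# The bias of `a` leaks directly into z: both fractions come out near 0.7.
print(sum(a) / len(a), sum(z) / len(z))
```

    With a balanced selector s, the classic rule simply subsamples a, so any imbalance in a survives in z; this is the leakage the paper's new rule is designed to remove.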

    MICRO$EC: Cost Effective, Whole-Genome Sequencing

    While the feasibility of whole human genome sequencing was proven by the success of the Human Genome Project several years ago, the prevalence of personal genome sequencing in the medical industry is still elusive due to its unrealistic cost and time requirements. Micro$eq is a startup company with the goal of overcoming these limitations by sequencing a minimum of 12 complete human genomes per day at an error rate of less than ten parts per million, at a profitable market price of less than US$1000 per genome. To overcome the technology bottlenecks hindering current biotech companies from achieving these target throughput, error rate, and market price goals, Micro$eq has developed an innovative sequencing technique that uses short-read fragments with high coverage on a microfluidics platform. Short, amplified DNA fragments are generated from an input of customer saliva. 6 base pair (bp) sequence hybridization is used for sequencing each of the DNA fragments individually. The resulting hybridization reads are then assembled via de Bruijn graph theory, and the graphical reconstructions of each fragment's sequence are then assembled into a complete genome via shotgun sequencing, with an expected error rate of less than 1 in 100,000 bp. Upon completion of the financial analysis, both a small-scale business model producing 72 genomes per day at US$999 per genome and a large-scale business model producing 52.2 genomes per year at a market price of US$299 per genome were found to be profitable, yielding Micro$eq investors return margins of ~90% and 300% for the small- and large-scale models, respectively. With this market price, Micro$eq offers personal genome sequencing at one-tenth of its nearest potential competitor's cost. Additionally, its ability for bulk sequencing allows it to profitably venture into the previously untapped pharmaceutical industry market sector, enabling the creation of large-scale genome databases, which are the next step forward in the quest for truly personalized medicine.
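
    As a rough illustration of the assembly idea described above, the following sketch builds a de Bruijn graph from short fixed-length reads (here 6-mers, as in the hybridization step) and reconstructs a fragment by walking it. This is a toy model under simplifying assumptions (error-free reads, no repeated 5-mers); a production assembler would additionally handle sequencing errors, coverage, and repeats.

```python
from collections import defaultdict

def de_bruijn_edges(reads, k=6):
    """Build de Bruijn graph edges: each k-mer read contributes an edge
    from its (k-1)-prefix node to its (k-1)-suffix node."""
    graph = defaultdict(list)
    for r in reads:
        graph[r[:k - 1]].append(r[1:])
    return graph

def greedy_walk(graph, start):
    """Follow edges greedily, extending the sequence one base per edge.
    (Real assemblers find Eulerian paths and resolve repeats/errors.)"""
    seq, node = start, start
    while graph[node]:
        node = graph[node].pop()
        seq += node[-1]
    return seq

# Hypothetical example: 6-mer reads tiling a short fragment.
fragment = "ACGTTGCATGACG"
reads = [fragment[i:i + 6] for i in range(len(fragment) - 5)]
graph = de_bruijn_edges(reads, k=6)
print(greedy_walk(graph, reads[0][:5]))  # reconstructs the fragment
```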

    OPTIMIZING THE ACCURACY OF LIGHTWEIGHT METHODS FOR SHORT READ ALIGNMENT AND QUANTIFICATION

    The analysis of high-throughput sequencing (HTS) data involves a number of computational steps, ranging from transcriptome assembly and mapping or alignment of reads to existing or assembled sequences, to estimating the abundance of sequenced molecules, performing differential or comparative analyses between samples, and even inferring dynamics of interest from snapshot data. Many methods have been developed for these tasks, offering various trade-offs between accuracy and speed, since accuracy and robustness typically come at the expense of speed and vice versa. In this work, I focus on the problems of alignment and quantification of RNA-seq data and review different aspects of the available methods for these problems. I explore finding a reasonable balance between these competing goals and introduce methods that provide accurate results without sacrificing speed. Alignment of sequencing reads to known reference sequences is a challenging computational step in the RNA-seq pipeline, mainly because of the large size of the sample data and reference sequences and the highly repetitive nature of the references. Recently, the concept of lightweight alignment was introduced to accelerate the mapping step of abundance estimation. I collaborated with my colleagues to explore some of the shortcomings of lightweight alignment methods and to address them with a new approach called selective alignment. Moreover, we introduce an aligner, Puffaligner, which benefits from both the indexing approach introduced by the Pufferfish index and selective alignment to produce accurate alignments in a short amount of time compared to other popular aligners. To improve the speed of RNA-seq quantification given a collection of alignments, some tools group fragments (reads) into equivalence classes, which are sets of fragments that are compatible with the same subset of reference sequences. Summarizing the fragments into equivalence classes factorizes the likelihood function being optimized and increases the speed of the typical optimization algorithms deployed. I explore how this factorization affects the accuracy of abundance estimates and propose a new factorization approach that demonstrates higher fidelity to the non-approximate model. Finally, estimating the posterior distribution of transcript expression is a crucial step in finding robust and reliable estimates of transcript abundance in the presence of high levels of multi-mapping. To assess the accuracy of their point estimates, quantification tools generate inferential replicates using techniques such as bootstrap sampling and Gibbs sampling. The utility of inferential replicates has been demonstrated in different downstream RNA-seq applications, e.g., performing differential expression analysis. I explore how sampling from both observed and unobserved data points (reads) improves the accuracy of bootstrap sampling. I demonstrate the utility of this approach in estimating allelic expression with RNA-seq reads, where the absence of uniquely mapping reads to reference transcripts is a major obstacle to calculating robust estimates.
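
    The equivalence-class factorization mentioned above can be sketched in a few lines. The example below groups reads by the exact set of transcripts they are compatible with and runs a plain EM update over the class counts; it ignores effective transcript lengths and uses the standard factorization, not the higher-fidelity variant proposed in this work, and all names and toy data are hypothetical.

```python
from collections import Counter

def build_equivalence_classes(alignments):
    """Group reads by the exact set of transcripts they align to.
    `alignments` maps read id -> iterable of compatible transcript ids."""
    return Counter(frozenset(txps) for txps in alignments.values())

def em_abundances(eq_classes, transcripts, n_iter=100):
    """Plain EM over equivalence-class counts: each class's reads are
    apportioned among its transcripts in proportion to current abundances."""
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        alloc = {t: 0.0 for t in transcripts}
        for txps, count in eq_classes.items():
            denom = sum(theta[t] for t in txps)
            for t in txps:
                alloc[t] += count * theta[t] / denom
        total = sum(alloc.values())
        theta = {t: alloc[t] / total for t in transcripts}
    return theta

# Hypothetical toy data: three transcripts, some multi-mapping reads.
alignments = {
    "r1": ["t1"], "r2": ["t1", "t2"], "r3": ["t2", "t3"],
    "r4": ["t1", "t2"], "r5": ["t3"],
}
eqc = build_equivalence_classes(alignments)
print(em_abundances(eqc, ["t1", "t2", "t3"]))
```

    Because the EM loop iterates over a handful of equivalence classes instead of every read, the per-iteration cost scales with the number of distinct compatibility patterns rather than the number of fragments, which is the speed-up the factorization buys.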

    Computational methods for analyzing NGS data to discover clinically relevant mutations

    The advent of Next Generation Sequencing platforms started a new era of genomics in which affordable genome-wide sequencing is available to everyone. These technologies are capable of generating huge amounts of raw sequence data, creating an urgent demand for new computational analysis tools and methods. Even the simplest NGS study requires many analysis steps, and each step has unique challenges and ambiguities. Efficiently processing raw NGS data and eliminating false-positive signals have become the most challenging issues in genomics. It has been shown that NGS is very effective at identifying disease-causing mutations if the data are processed and interpreted properly. In this dissertation, we present an effective whole-genome/exome analysis strategy that has successfully identified novel disease-causing mutations for Cerebrofaciothoracic Dysplasia, Klippel-Feil Syndrome, Spastic Paraplegia, and Northern Epilepsy. We also present a k-mer based method for finely mapping genomic structural variations by utilizing de novo assembly and local alignment. Compared to the mapping-based read extraction method, the k-mer based method improved detection of all types of structural variations; in particular, the detection rate of insertions increased by 21%. Moreover, our method is capable of resolving the complete structures of complex rearrangements, which had not been accomplished before.
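
    The k-mer based read extraction underlying the fine-mapping approach can be illustrated with a minimal sketch: instead of relying on read mappings, reads are selected for local de novo assembly whenever they share a k-mer with the region of interest, so reads spanning novel insertions are not lost to failed alignments. The function names, the small k, and the toy sequences below are hypothetical; the dissertation's actual pipeline is more involved.

```python
def kmers(seq, k):
    """Return the set of all k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def extract_reads_by_kmer(reads, region_seq, k=31):
    """Keep any read that shares at least one k-mer with the region of
    interest; the selected reads feed a local de novo assembly step."""
    region_kmers = kmers(region_seq, k)
    return [r for r in reads if kmers(r, k) & region_kmers]

# Hypothetical toy example with a small k for readability.
region = "ACGTACGTTTGCA"
reads = ["ACGTACGT", "TTTTTTTT", "CGTTTGCAAAAA"]
print(extract_reads_by_kmer(reads, region, k=8))  # keeps the 1st and 3rd reads
```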

    Foundations of Software Science and Computation Structures

    This open access book constitutes the proceedings of the 25th International Conference on Foundations of Software Science and Computation Structures, FOSSACS 2022, which was held during April 4-6, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 23 regular papers presented in this volume were carefully reviewed and selected from 77 submissions. They deal with research on theories and methods to support the analysis, integration, synthesis, transformation, and verification of programs and software systems.