
    On Binary de Bruijn Sequences from LFSRs with Arbitrary Characteristic Polynomials

    We propose a construction of de Bruijn sequences by the cycle joining method from linear feedback shift registers (LFSRs) with an arbitrary characteristic polynomial f(x). We study in detail the cycle structure of the set Ω(f(x)) that contains all sequences produced by a specific LFSR on distinct inputs, and provide a fast way to find a state of each cycle. This leads to an efficient algorithm to find all conjugate pairs between any two cycles, yielding the adjacency graph. The approach is practical for generating a large class of de Bruijn sequences up to order n ≈ 20. Many previously proposed constructions of de Bruijn sequences are shown to be special cases of our construction.
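
As a rough illustration of the objects involved (a sketch, not the paper's cycle-joining algorithm), the code below enumerates the cycle structure underlying Ω(f(x)) by brute force over all 2^n LFSR states. The example polynomial f(x) = x^4 + x + 1 and its tap positions are assumptions chosen for illustration; the brute-force approach is feasible only for small n, whereas the paper derives the cycle structure analytically.

```python
# Minimal sketch (not the paper's method): partition all 2^n states of
# an LFSR into cycles by brute force. Feasible only for small n.
from itertools import product

def lfsr_step(state, taps):
    # state = (s_t, ..., s_{t+n-1}); the new bit is the XOR of the
    # tapped positions, per the linear recurrence defined by f(x).
    fb = 0
    for t in taps:
        fb ^= state[t]
    return state[1:] + (fb,)

def cycle_structure(taps, n):
    """Return the cycles of the LFSR state graph (a permutation of
    the state space whenever position 0 is tapped, i.e. f(0) != 0)."""
    seen, cycles = set(), []
    for start in product((0, 1), repeat=n):
        if start in seen:
            continue
        cycle, s = [], start
        while s not in seen:
            seen.add(s)
            cycle.append(s)
            s = lfsr_step(s, taps)
        cycles.append(cycle)
    return cycles

# f(x) = x^4 + x + 1 (primitive): expect the all-zero fixed point plus
# one cycle through the remaining 15 states.
for c in cycle_structure(taps=(0, 1), n=4):
    print(len(c), "starting at", c[0])
```

Joining such cycles at conjugate pairs, organised by the adjacency graph, is what ultimately yields a de Bruijn sequence.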

    Large Genomes Assembly Using MapReduce Framework

    Knowing the genome sequence of an organism is the essential step toward understanding its genomic and genetic characteristics. Currently, whole-genome shotgun (WGS) sequencing is the most widely used technique to determine the entire DNA sequence of an organism. Recent advances in next-generation sequencing (NGS) techniques have enabled biologists to generate large volumes of DNA sequence in a high-throughput, low-cost way. However, the assembly of NGS reads faces significant challenges due to short read lengths and an enormously high volume of data. Despite recent progress in genome assembly, current NGS assemblers cannot generate high-quality results or efficiently handle large genomes with billions of reads. In this research, we proposed a new Genome Assembler based on MapReduce (GAMR), which tackles both limitations. GAMR is based on a bi-directed de Bruijn graph and is implemented using the MapReduce framework. We designed a distributed algorithm for each step in GAMR, making it scalable for assembling large-scale genomes. We also proposed novel gap-filling algorithms that improve assembly accuracy and contiguity. We evaluated the assembly performance of GAMR using benchmark data and compared it against other NGS assemblers. We also demonstrated the scalability of GAMR by using it to assemble the loblolly pine genome (~22 Gbp). The results showed that GAMR finished the assembly much faster and with a much lower requirement of computing resources.
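
As a loose sketch of the central data structure (not GAMR's actual implementation), the snippet below builds the edge list of a de Bruijn graph from reads in a map/shuffle/reduce pattern. The value k = 4 and the toy reads are invented for illustration; a real assembler would distribute both phases across a cluster and use a bi-directed graph to handle both DNA strands.

```python
# Minimal sketch (not GAMR): de Bruijn graph construction from reads
# in a map/shuffle/reduce pattern, run locally for illustration.
from collections import defaultdict

def map_phase(read, k):
    # Emit one edge (prefix -> suffix) per k-mer in the read.
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        yield kmer[:-1], kmer[1:]

def reduce_phase(pairs):
    # Group edges by source node, counting multiplicities (coverage).
    graph = defaultdict(lambda: defaultdict(int))
    for src, dst in pairs:
        graph[src][dst] += 1
    return graph

reads = ["ACGTAC", "CGTACG", "GTACGT"]  # toy reads, not real data
edges = (e for r in reads for e in map_phase(r, k=4))
for src, dsts in reduce_phase(edges).items():
    print(src, "->", dict(dsts))
```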

    Searching for patterns in Conway's Game of Life

    Conway’s Game of Life (Life) is a simple cellular automaton, discovered by John Conway in 1970, that exhibits complex emergent behavior. Life enthusiasts have been searching for building blocks with specific properties (patterns) to answer unsolved problems in Life for the past five decades. Finding patterns in Life is difficult due to the large search space. Current search algorithms use an explorative approach based on the rules of the game, but this can only sample a small fraction of the search space. More recently, people have used SAT solvers to search for patterns. These solvers are not specifically tuned to this problem and thus waste a lot of time processing Life’s rules in an engine that does not understand them. We propose a novel SAT-based approach that replaces the binary tree used by traditional SAT solvers with a grid-based approach, complemented by an injection of Game-of-Life-specific knowledge. This leads to a significant speedup in searching. As a fortunate side effect, our solver can be generalized to solve general SAT problems. Because it is grid-based, all manipulations are embarrassingly parallel, allowing implementation on massively parallel hardware.
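
For a sense of the search problem (this is the naive baseline, not the grid-based solver the abstract proposes), the sketch below brute-forces all 2^9 candidate patterns on a 3×3 grid and keeps the still lifes, i.e. patterns fixed by Life's rule. Real pattern searches involve vastly larger grids and multiple generations, which is why SAT-style methods are attractive.

```python
# Minimal sketch (not the paper's solver): brute-force search for
# 3x3 still lifes, checked on a padded 5x5 grid with dead boundary.
from itertools import product

def step(cells, w, h):
    # One Life generation; cells is the set of live (x, y) coordinates.
    nxt = set()
    for x in range(w):
        for y in range(h):
            n = sum((nx, ny) in cells
                    for nx in (x - 1, x, x + 1)
                    for ny in (y - 1, y, y + 1)
                    if (nx, ny) != (x, y))
            # Birth on 3 neighbours; survival on 2 or 3.
            if n == 3 or (n == 2 and (x, y) in cells):
                nxt.add((x, y))
    return nxt

for bits in product((0, 1), repeat=9):
    cells = {(i % 3, i // 3) for i, b in enumerate(bits) if b}
    # Shift into the centre of a 5x5 grid so births outside the 3x3
    # box are detected and such patterns are rejected.
    shifted = {(x + 1, y + 1) for (x, y) in cells}
    if cells and step(shifted, 5, 5) == shifted:
        print(sorted(cells))
```

Even at this toy size the search visits every candidate; the speedups claimed above come from pruning that space with Life-specific knowledge instead.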

    Practical implementation of a dependently typed functional programming language

    Types express a program's meaning, and checking types ensures that a program has the intended meaning. In a dependently typed programming language, types are predicated on values, leading to the possibility of expressing invariants of a program's behaviour in its type. Dependent types allow us to give more detailed meanings to programs, and hence to be more confident of their correctness. This thesis considers the practical implementation of a dependently typed programming language, using the Epigram notation defined by McBride and McKinna. Epigram is a high-level notation for dependently typed functional programming, elaborating to a core type theory based on Luo's UTT and using Dybjer's inductive families and elimination rules to implement pattern matching. This gives us a rich framework for reasoning about programs. However, a naive implementation introduces several run-time overheads, since the type system blurs the distinction between types and values; these overheads include the duplication of values and the storage of redundant information and explicit proofs. A practical implementation of any programming language should be as efficient as possible; in this thesis we see how the apparent efficiency problems of dependently typed programming can be overcome, and that in many cases the richer type information allows us to apply optimisations which are not directly available in traditional languages. I introduce three storage optimisations on inductive families: forcing, detagging and collapsing. I further introduce a compilation scheme from the core type theory to G-machine code, including a pattern matching compiler for elimination rules and a compilation scheme for efficient run-time implementation of Peano's natural numbers. We also see some low-level optimisations for the removal of identity functions, unused arguments and impossible case branches. As a result, we see that a dependent type theory is an effective base on which to build a feasible programming language.
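
To make the Peano-number overhead concrete (a toy sketch in Python, standing in for the compiled G-machine representation the thesis actually targets), compare a naive constructor-chain encoding of the naturals with the machine-integer representation that such a compilation scheme produces.

```python
# Minimal sketch of the overhead a Peano-number optimisation removes.
# A naive compiled representation stores n as a chain of n "S" nodes;
# an optimised scheme (only mimicked here in Python) represents the
# same inductive type by a machine integer.

class Z:                      # zero
    pass

class S:                      # successor
    def __init__(self, pred):
        self.pred = pred

def plus(m, n):
    # Structural recursion: one heap allocation per unit of m.
    return n if isinstance(m, Z) else S(plus(m.pred, n))

def to_int(m):
    c = 0
    while isinstance(m, S):
        m, c = m.pred, c + 1
    return c

three = S(S(S(Z())))
print(to_int(plus(three, three)))   # 6, via six heap nodes
print(3 + 3)                        # 6, via one machine addition
```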

    Ontology based model framework for conceptual design of treatment flow sheets

    The primary objective of wastewater treatment is the removal of pollutants to meet given legal effluent standards. To further reduce operating costs, additional recovery of resources and energy is desired in industrial and municipal wastewater treatment. Hence, the objective in the early stage of planning treatment facilities lies in the identification and evaluation of promising configurations of treatment units. This early stage of planning may best be supported by software tools able to deal with a variety of different treatment configurations. In chemical process engineering, various design tools are available that automatically identify feasible process configurations for obtaining desired products from given educts. In contrast, the adaptation of these design tools for the automatic generation of treatment unit configurations (process chains) that achieve preset effluent standards is hampered for three reasons. First, pollutants in wastewater are usually defined not as chemical substances but by compound parameters grouping constituents with equal properties (e.g. all particulate matter). Consequently, the variation of a single compound parameter leads to a change in related parameters (e.g. the relation between Chemical Oxygen Demand and Total Suspended Solids). Furthermore, mathematical models of treatment processes are tailored towards fractions of compound parameters. This hampers the generic representation of these process models, which in turn is essential for the automatic identification of treatment configurations. Second, treatment technologies for wastewater treatment rely on a variety of chemical, biological, and physical phenomena. Approaches to mathematically describe these phenomena cover a wide range of modeling techniques, including stochastic, conceptual, and deterministic approaches, and differ in the temporal and spatial resolutions considered. This again hampers a generic representation of process models. Third, the automatic identification of treatment configurations may be achieved either by the use of design rules or by permutation of all possible combinations of units stored within a database of treatment units. The first approach depends on past experience translated into design rules, so no innovative new treatment configurations can be identified. The second approach, identifying all possible configurations, collapses under the extremely high number of treatment configurations that cannot be mastered, a consequence of combinatorial explosion. It follows that an appropriate planning algorithm should function without the need for additional design rules and should identify feasible configurations directly while discarding impractical ones. This work presents a planning tool for the identification and evaluation of treatment configurations that tackles the problems addressed above. The planning tool comprises two major parts: an external declarative knowledge base and the actual planning tool, which includes a goal-oriented planning algorithm. The knowledge base describes parameters for wastewater characterization (the material model) and a set of treatment units represented by process models (the process model). The knowledge base is formalized in the Web Ontology Language (OWL).
The developed data model, the organizational structure of the knowledge base, describes relations between wastewater parameters and process models to enable a generic representation of process models. Through this, parameters for wastewater characterization as well as treatment units can be altered or added to the knowledge base without the need to synchronize already included parameter representations or process models. Furthermore, the knowledge base describes relations between parameters and properties of water constituents. This allows tracking the changes of all wastewater parameters that result from modeling the removal efficiency of applied treatment units. So far, two generic treatment units have been represented within the knowledge base: separation and conversion units. These two raw types have been applied to represent different types of clarifiers and biological treatment units. The developed planning algorithm is based on Means-Ends Analysis (MEA), a goal-oriented search algorithm that posts goals derived from the wastewater state and limit-value restrictions in order to select only those treatment units that are likely to solve the treatment problem. To this end, all treatment units are qualified by postconditions that describe the effect of each unit. In addition, units are characterized by preconditions that state the application range of each unit. The planning algorithm furthermore allows the identification of simple cycles to account for moving-bed reactor systems (e.g. the functional unit of aeration tank and clarifier). Identified treatment configurations are evaluated by the total estimated cost of each configuration. The planning tool has been tested on five use cases, some of which contained multiple sources and sinks. This demonstrated the ability to identify water reuse capabilities as well as solutions that go beyond end-of-pipe treatment. Beyond its original area of application, the planning tool may be used for advanced investigations: the knowledge base and planning algorithm may be further developed to identify configurations for any type of material and energy recovery.
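
As a hedged toy (not the thesis's planner), the sketch below shows the flavour of goal-directed unit selection with pre- and postconditions. The unit names, removal fractions, application ranges, and effluent limits are all hypothetical values invented for illustration.

```python
# Toy sketch of goal-directed unit selection in the spirit of
# Means-Ends Analysis (not the thesis's planner). All unit names,
# removal fractions, and limits are hypothetical illustration values.

UNITS = {
    # name: (precondition on influent, fractional removal per parameter)
    "primary_clarifier": (lambda w: w["TSS"] > 50,
                          {"TSS": 0.6, "COD": 0.3}),
    "activated_sludge":  (lambda w: w["COD"] > 30,
                          {"COD": 0.85, "TSS": 0.5}),
}

def plan(water, limits, chain=()):
    # Goal reached: every parameter at or below its limit.
    if all(water[p] <= lim for p, lim in limits.items()):
        return list(chain)
    for name, (precond, removal) in UNITS.items():
        if name in chain or not precond(water):
            continue  # already used, or outside the application range
        # MEA flavour: only consider units whose postcondition reduces
        # a currently violated parameter (a "difference" to remove).
        if not any(water[p] > lim and removal.get(p, 0) > 0
                   for p, lim in limits.items()):
            continue
        effluent = {p: v * (1 - removal.get(p, 0.0))
                    for p, v in water.items()}
        result = plan(effluent, limits, chain + (name,))
        if result is not None:
            return result
    return None  # no feasible configuration found

print(plan({"COD": 400.0, "TSS": 200.0}, {"COD": 60.0, "TSS": 50.0}))
# -> ['primary_clarifier', 'activated_sludge']
```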

    Foundations of Software Science and Computation Structures

    This open access book constitutes the proceedings of the 25th International Conference on Foundations of Software Science and Computation Structures, FOSSACS 2022, which was held during April 4-6, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 23 regular papers presented in this volume were carefully reviewed and selected from 77 submissions. They deal with research on theories and methods to support the analysis, integration, synthesis, transformation, and verification of programs and software systems.