2,628 research outputs found
Flowcharting with D-charts
A D-Chart is a style of flowchart using control symbols highly appropriate to modern structured programming languages. The intent of a D-Chart is to provide a clear and concise one-for-one mapping of control symbols to high-level language constructs for purposes of design and documentation. The notation lends itself to both high-level and code-level algorithmic description. The various issues that may arise when representing, in D-Chart style, algorithms expressed in the more popular high-level languages are addressed. In particular, the peculiarities of mapping control constructs for Ada, PASCAL, FORTRAN 77, C, PL/I, Jovial J73, HAL/S, and Algol are discussed
An Introduction to Programming for Bioscientists: A Python-based Primer
Computing has revolutionized the biological sciences over the past several
decades, such that virtually all contemporary research in the biosciences
utilizes computer programs. The computational advances have come on many
fronts, spurred by fundamental developments in hardware, software, and
algorithms. These advances have influenced, and even engendered, a phenomenal
array of bioscience fields, including molecular evolution and bioinformatics;
genome-, proteome-, transcriptome- and metabolome-wide experimental studies;
structural genomics; and atomistic simulations of cellular-scale molecular
assemblies as large as ribosomes and intact viruses. In short, much of
post-genomic biology is increasingly becoming a form of computational biology.
The ability to design and write computer programs is among the most
indispensable skills that a modern researcher can cultivate. Python has become
a popular programming language in the biosciences, largely because (i) its
straightforward semantics and clean syntax make it a readily accessible first
language; (ii) it is expressive and well-suited to object-oriented programming,
as well as other modern paradigms; and (iii) the many available libraries and
third-party toolkits extend the functionality of the core language into
virtually every biological domain (sequence and structure analyses,
phylogenomics, workflow management systems, etc.). This primer offers a basic
introduction to coding, via Python, and it includes concrete examples and
exercises to illustrate the language's usage and capabilities; the main text
culminates with a final project in structural bioinformatics. A suite of
Supplemental Chapters is also provided. Starting with basic concepts, such as
that of a 'variable', the Chapters methodically advance the reader to the point
of writing a graphical user interface to compute the Hamming distance between
two DNA sequences.Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables,
numerous exercises, and 19 pages of Supporting Information; currently in
press at PLOS Computational Biolog
The Application of Hybridized Genetic Algorithms to the Protein Folding Problem
The protein folding problem consists of attempting to determine the native conformation of a protein given its primary structure. This study examines various methods of hybridizing a genetic algorithm implementation in order to minimize an energy function and predict the conformation (structure) of Met-enkephalin. Genetic Algorithms are semi-optimal algorithms designed to explore and exploit a search space. The genetic algorithm uses selection, recombination, and mutation operators on populations of strings which represent possible solutions to the given problem. One step in solving the protein folding problem is the design of efficient energy minimization techniques. A conjugate gradient minimization technique is described and tested with different replacement frequencies. Baidwinian, Lamarckian, and probabilistic Lamarckian evolution are all tested. Another extension of simple genetic algorithms can be accomplished with niching. Niching works by de-emphasizing solutions based on their proximity to other solutions in the space. Several variations of niching are tested. Experiments are conducted to determine the benefits of each hybridization technique versus each other and versus the genetic algorithm by itself. The experiments are geared toward trying to find the lowest possible energy and hence the minimum conformation of Met-enkephalin. In the experiments, probabilistic Lamarckian strategies were successful in achieving energies below that of the published minimum in QUANTA
Parallel machine architecture and compiler design facilities
The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of Delta project (which objective is to provide a facility to allow rapid prototyping of parallelized compilers that can target toward different machine architectures) is summarized. Included are the surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role
Logarithmic SAT Solution with Membrane Computing
P systems have been known to provide efficient polynomial (often linear) deterministic solutions to hard problems. In particular, cP systems have been shown to provide very crisp and efficient solutions to such problems, which are typically linear with small coefficients. Building on a recent result by Henderson et al., which solves SAT in square-root-sublinear time, this paper proposes an orders-of-magnitude-faster solution, running in logarithmic time, and using a small fixed-sized alphabet and ruleset (25 rules). To the best of our knowledge, this is the fastest deterministic solution across all extant P system variants. Like all other cP solutions, it is a complete solution that is not a member of a uniform family (and thus does not require any preprocessing). Consequently, according to another reduction result by Henderson et al., cP systems can also solve k-colouring and several other NP-complete problems in logarithmic time
The Fine-Grained Complexity of CFL Reachability
Many problems in static program analysis can be modeled as the context-free
language (CFL) reachability problem on directed labeled graphs. The CFL
reachability problem can be generally solved in time , where is the
number of vertices in the graph, with some specific cases that can be solved
faster. In this work, we ask the following question: given a specific CFL, what
is the exact exponent in the monomial of the running time? In other words, for
which cases do we have linear, quadratic or cubic algorithms, and are there
problems with intermediate runtimes? This question is inspired by recent
efforts to classify classic problems in terms of their exact polynomial
complexity, known as {\em fine-grained complexity}. Although recent efforts
have shown some conditional lower bounds (mostly for the class of combinatorial
algorithms), a general picture of the fine-grained complexity landscape for CFL
reachability is missing.
Our main contribution is lower bound results that pinpoint the exact running
time of several classes of CFLs or specific CFLs under widely believed lower
bound conjectures (Boolean Matrix Multiplication and -Clique). We
particularly focus on the family of Dyck- languages (which are strings with
well-matched parentheses), a fundamental class of CFL reachability problems. We
present new lower bounds for the case of sparse input graphs where the number
of edges is the input parameter, a common setting in the database
literature. For this setting, we show a cubic lower bound for Andersen's
Pointer Analysis which significantly strengthens prior known results.Comment: Appeared in POPL 2023. Please note the erratum on the first pag
Fully-Functional Suffix Trees and Optimal Text Searching in BWT-runs Bounded Space
Indexing highly repetitive texts - such as genomic databases, software
repositories and versioned text collections - has become an important problem
since the turn of the millennium. A relevant compressibility measure for
repetitive texts is r, the number of runs in their Burrows-Wheeler Transforms
(BWTs). One of the earliest indexes for repetitive collections, the Run-Length
FM-index, used O(r) space and was able to efficiently count the number of
occurrences of a pattern of length m in the text (in loglogarithmic time per
pattern symbol, with current techniques). However, it was unable to locate the
positions of those occurrences efficiently within a space bounded in terms of
r. In this paper we close this long-standing problem, showing how to extend the
Run-Length FM-index so that it can locate the occ occurrences efficiently
within O(r) space (in loglogarithmic time each), and reaching optimal time, O(m
+ occ), within O(r log log w ({\sigma} + n/r)) space, for a text of length n
over an alphabet of size {\sigma} on a RAM machine with words of w =
{\Omega}(log n) bits. Within that space, our index can also count in optimal
time, O(m). Multiplying the space by O(w/ log {\sigma}), we support count and
locate in O(dm log({\sigma})/we) and O(dm log({\sigma})/we + occ) time, which
is optimal in the packed setting and had not been obtained before in compressed
space. We also describe a structure using O(r log(n/r)) space that replaces the
text and extracts any text substring of length ` in almost-optimal time
O(log(n/r) + ` log({\sigma})/w). Within that space, we similarly provide direct
access to suffix array, inverse suffix array, and longest common prefix array
cells, and extend these capabilities to full suffix tree functionality,
typically in O(log(n/r)) time per operation.Comment: submitted version; optimal count and locate in smaller space: O(r log
log_w(n/r + sigma)
- …