3,579 research outputs found
An Introduction to Programming for Bioscientists: A Python-based Primer
Computing has revolutionized the biological sciences over the past several
decades, such that virtually all contemporary research in the biosciences
utilizes computer programs. The computational advances have come on many
fronts, spurred by fundamental developments in hardware, software, and
algorithms. These advances have influenced, and even engendered, a phenomenal
array of bioscience fields, including molecular evolution and bioinformatics;
genome-, proteome-, transcriptome- and metabolome-wide experimental studies;
structural genomics; and atomistic simulations of cellular-scale molecular
assemblies as large as ribosomes and intact viruses. In short, much of
post-genomic biology is increasingly becoming a form of computational biology.
The ability to design and write computer programs is among the most
indispensable skills that a modern researcher can cultivate. Python has become
a popular programming language in the biosciences, largely because (i) its
straightforward semantics and clean syntax make it a readily accessible first
language; (ii) it is expressive and well-suited to object-oriented programming,
as well as other modern paradigms; and (iii) the many available libraries and
third-party toolkits extend the functionality of the core language into
virtually every biological domain (sequence and structure analyses,
phylogenomics, workflow management systems, etc.). This primer offers a basic
introduction to coding, via Python, and it includes concrete examples and
exercises to illustrate the language's usage and capabilities; the main text
culminates with a final project in structural bioinformatics. A suite of
Supplemental Chapters is also provided. Starting with basic concepts, such as
that of a 'variable', the Chapters methodically advance the reader to the point
of writing a graphical user interface to compute the Hamming distance between
two DNA sequences.Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables,
numerous exercises, and 19 pages of Supporting Information; currently in
press at PLOS Computational Biolog
PList-based Divide and Conquer Parallel Programming
This paper details an extension of a Java parallel programming framework – JPLF. The JPLF framework is a programming framework that helps programmers build parallel programs using existing building blocks. The framework is based on {em PowerLists} and PList Theories and it naturally supports multi-way Divide and Conquer. By using this framework, the programmer is exempted from dealing with all the complexities of writing parallel programs from scratch. This extension to the JPLF framework adds PLists support to the framework and so, it enlarges the applicability of the framework to a larger set of parallel solvable problems. Using this extension, we may apply more flexible data division strategies. In addition, the length of the input lists no longer has to be a power of two – as required by the PowerLists theory. In this paper we unveil new applications that emphasize the new class of computations that can be executed within the JPLF framework. We also give a detailed description of the data structures and functions involved in the PLists extension of the JPLF, and extended performance experiments are described and analyzed
Estimating the Algorithmic Complexity of Stock Markets
Randomness and regularities in Finance are usually treated in probabilistic
terms. In this paper, we develop a completely different approach in using a
non-probabilistic framework based on the algorithmic information theory
initially developed by Kolmogorov (1965). We present some elements of this
theory and show why it is particularly relevant to Finance, and potentially to
other sub-fields of Economics as well. We develop a generic method to estimate
the Kolmogorov complexity of numeric series. This approach is based on an
iterative "regularity erasing procedure" implemented to use lossless
compression algorithms on financial data. Examples are provided with both
simulated and real-world financial time series. The contributions of this
article are twofold. The first one is methodological : we show that some
structural regularities, invisible with classical statistical tests, can be
detected by this algorithmic method. The second one consists in illustrations
on the daily Dow-Jones Index suggesting that beyond several well-known
regularities, hidden structure may in this index remain to be identified
Performance Improvements of Common Sparse Numerical Linear Algebra Computations
Manufacturers of computer hardware are able to continuously sustain an unprecedented pace of progress in computing speed of their products, partially due to increased clock rates but also because of ever more complicated chip designs. With new processor families appearing every few years, it is increasingly harder to achieve high performance rates in sparse matrix computations. This research proposes new methods for sparse matrix factorizations and applies in an iterative code generalizations of known concepts from related disciplines. The proposed solutions and extensions are implemented in ways that tend to deliver efficiency while retaining ease of use of existing solutions. The implementations are thoroughly timed and analyzed using a commonly accepted set of test matrices. The tests were conducted on modern processors that seem to have gained an appreciable level of popularity and are fairly representative for a wider range of processor types that are available on the market now or in the near future. The new factorization technique formally introduced in the early chapters is later on proven to be quite competitive with state of the art software currently available. Although not totally superior in all cases (as probably no single approach could possibly be), the new factorization algorithm exhibits a few promising features. In addition, an all-embracing optimization effort is applied to an iterative algorithm that stands out for its robustness. This also gives satisfactory results on the tested computing platforms in terms of performance improvement. The same set of test matrices is used to enable an easy comparison between both investigated techniques, even though they are customarily treated separately in the literature. Possible extensions of the presented work are discussed. They range from easily conceivable merging with existing solutions to rather more evolved schemes dependent on hard to predict progress in theoretical and algorithmic research
Resource Bounded Immunity and Simplicity
Revisiting the thirty years-old notions of resource-bounded immunity and
simplicity, we investigate the structural characteristics of various immunity
notions: strong immunity, almost immunity, and hyperimmunity as well as their
corresponding simplicity notions. We also study limited immunity and
simplicity, called k-immunity and feasible k-immunity, and their simplicity
notions. Finally, we propose the k-immune hypothesis as a working hypothesis
that guarantees the existence of simple sets in NP.Comment: This is a complete version of the conference paper that appeared in
the Proceedings of the 3rd IFIP International Conference on Theoretical
Computer Science, Kluwer Academic Publishers, pp.81-95, Toulouse, France,
August 23-26, 200
Causality, Information and Biological Computation: An algorithmic software approach to life, disease and the immune system
Biology has taken strong steps towards becoming a computer science aiming at
reprogramming nature after the realisation that nature herself has reprogrammed
organisms by harnessing the power of natural selection and the digital
prescriptive nature of replicating DNA. Here we further unpack ideas related to
computability, algorithmic information theory and software engineering, in the
context of the extent to which biology can be (re)programmed, and with how we
may go about doing so in a more systematic way with all the tools and concepts
offered by theoretical computer science in a translation exercise from
computing to molecular biology and back. These concepts provide a means to a
hierarchical organization thereby blurring previously clear-cut lines between
concepts like matter and life, or between tumour types that are otherwise taken
as different and may not have however a different cause. This does not diminish
the properties of life or make its components and functions less interesting.
On the contrary, this approach makes for a more encompassing and integrated
view of nature, one that subsumes observer and observed within the same system,
and can generate new perspectives and tools with which to view complex diseases
like cancer, approaching them afresh from a software-engineering viewpoint that
casts evolution in the role of programmer, cells as computing machines, DNA and
genes as instructions and computer programs, viruses as hacking devices, the
immune system as a software debugging tool, and diseases as an
information-theoretic battlefield where all these forces deploy. We show how
information theory and algorithmic programming may explain fundamental
mechanisms of life and death.Comment: 30 pages, 8 figures. Invited chapter contribution to Information and
Causality: From Matter to Life. Sara I. Walker, Paul C.W. Davies and George
Ellis (eds.), Cambridge University Pres
- …