11,684 research outputs found
MatchPy: A Pattern Matching Library
Pattern matching is a powerful tool for symbolic computations, based on the
well-defined theory of term rewriting systems. Application domains include
algebraic expressions, abstract syntax trees, and XML and JSON data.
Unfortunately, no lightweight implementation of pattern matching as general and
flexible as Mathematica exists for Python Mathics,MacroPy,patterns,PyPatt.
Therefore, we created the open source module MatchPy which offers similar
pattern matching functionality in Python using a novel algorithm which finds
matches for large pattern sets more efficiently by exploiting similarities
between patterns.Comment: arXiv admin note: substantial text overlap with arXiv:1710.0007
Enhancing R with Advanced Compilation Tools and Methods
I describe an approach to compiling common idioms in R code directly to
native machine code and illustrate it with several examples. Not only can this
yield significant performance gains, but it allows us to use new approaches to
computing in R. Importantly, the compilation requires no changes to R itself,
but is done entirely via R packages. This allows others to experiment with
different compilation strategies and even to define new domain-specific
languages within R. We use the Low-Level Virtual Machine (LLVM) compiler
toolkit to create the native code and perform sophisticated optimizations on
the code. By adopting this widely used software within R, we leverage its
ability to generate code for different platforms such as CPUs and GPUs, and
will continue to benefit from its ongoing development. This approach
potentially allows us to develop high-level R code that is also fast, that can
be compiled to work with different data representations and sources, and that
could even be run outside of R. The approach aims to both provide a compiler
for a limited subset of the R language and also to enable R programmers to
write other compilers. This is another approach to help us write high-level
descriptions of what we want to compute, not how.Comment: Published in at http://dx.doi.org/10.1214/13-STS462 the Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
An Introduction to Programming for Bioscientists: A Python-based Primer
Computing has revolutionized the biological sciences over the past several
decades, such that virtually all contemporary research in the biosciences
utilizes computer programs. The computational advances have come on many
fronts, spurred by fundamental developments in hardware, software, and
algorithms. These advances have influenced, and even engendered, a phenomenal
array of bioscience fields, including molecular evolution and bioinformatics;
genome-, proteome-, transcriptome- and metabolome-wide experimental studies;
structural genomics; and atomistic simulations of cellular-scale molecular
assemblies as large as ribosomes and intact viruses. In short, much of
post-genomic biology is increasingly becoming a form of computational biology.
The ability to design and write computer programs is among the most
indispensable skills that a modern researcher can cultivate. Python has become
a popular programming language in the biosciences, largely because (i) its
straightforward semantics and clean syntax make it a readily accessible first
language; (ii) it is expressive and well-suited to object-oriented programming,
as well as other modern paradigms; and (iii) the many available libraries and
third-party toolkits extend the functionality of the core language into
virtually every biological domain (sequence and structure analyses,
phylogenomics, workflow management systems, etc.). This primer offers a basic
introduction to coding, via Python, and it includes concrete examples and
exercises to illustrate the language's usage and capabilities; the main text
culminates with a final project in structural bioinformatics. A suite of
Supplemental Chapters is also provided. Starting with basic concepts, such as
that of a 'variable', the Chapters methodically advance the reader to the point
of writing a graphical user interface to compute the Hamming distance between
two DNA sequences.Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables,
numerous exercises, and 19 pages of Supporting Information; currently in
press at PLOS Computational Biolog
PonyGE2: Grammatical Evolution in Python
Grammatical Evolution (GE) is a population-based evolutionary algorithm,
where a formal grammar is used in the genotype to phenotype mapping process.
PonyGE2 is an open source implementation of GE in Python, developed at UCD's
Natural Computing Research and Applications group. It is intended as an
advertisement and a starting-point for those new to GE, a reference for
students and researchers, a rapid-prototyping medium for our own experiments,
and a Python workout. As well as providing the characteristic genotype to
phenotype mapping of GE, a search algorithm engine is also provided. A number
of sample problems and tutorials on how to use and adapt PonyGE2 have been
developed.Comment: 8 pages, 4 figures, submitted to the 2017 GECCO Workshop on
Evolutionary Computation Software Systems (EvoSoft
Gower as Data: Exploring the Application of Machine Learning to Gowerâs Middle English Corpus
Distant reading, a digital humanities method in wide use, involves processing and analyzing a large amount of text through computer programs. In treating texts as data, these methods can highlight trends in diction, themes, and linguistic patterns that individual readers may miss or critical traditions may obscure. Though several scholars have undertaken projects using topic models and text mining on Middle English texts, the nonstandard orthography of Middle English makes this process more challenging than for our counterparts in later literature.
This collaborative project uses Gowerâs Confessio Amantis as a small, fixed corpus for analysis. We employ natural language processing to reexamine the Confessioâs themes, adding data analysis to the more traditional close reading strategies of Gower scholarship. We use Gowerâs work as a case study both to help reduce the potential variants across textual versions and to more deeply investigate the corpus than distant reading normally allows.
Here, we share our initial findings as well as our methodologies. We hope to share resources that will allow other scholars to engage in similar types of projects
BlogForever: D2.5 Weblog Spam Filtering Report and Associated Methodology
This report is written as a first attempt to define the BlogForever spam detection strategy. It comprises a survey of weblog spam technology and approaches to their detection. While the report was written to help identify possible approaches to spam detection as a component within the BlogForver software, the discussion has been extended to include observations related to the historical, social and practical value of spam, and proposals of other ways of dealing with spam within the repository without necessarily removing them. It contains a general overview of spam types, ready-made anti-spam APIs available for weblogs, possible methods that have been suggested for preventing the introduction of spam into a blog, and research related to spam focusing on those that appear in the weblog context, concluding in a proposal for a spam detection workflow that might form the basis for the spam detection component of the BlogForever software
- âŠ