29 research outputs found
Executable Pseudocode for Graph Algorithms
Algorithms are typically presented in pseudocode, yet the implementation of an
algorithm in a conventional, imperative programming language is often
scattered over hundreds of lines of code, obscuring its essence. This can make
the code difficult to understand or verify, and adapting or varying the
original algorithm can be laborious.
We present a case study showing the use of Common Lisp macros to provide an
embedded, domain-specific language for graph algorithms. This allows these
algorithms to be presented in Lisp in a form directly comparable to their
pseudocode, allowing rapid prototyping at the algorithm level.
As a proof of concept, we implement Brandes' algorithm for computing the
betweenness centrality of a graph and show that our implementation compares
favourably with state-of-the-art implementations in imperative programming
languages, not only in clarity and verisimilitude to the pseudocode, but also
in execution speed.
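To give a concrete sense of the algorithm in question, here is a compact Python rendering of Brandes' algorithm for unweighted graphs (a generic sketch, not the paper's Lisp DSL), staying close to the published pseudocode:

```python
from collections import deque

def brandes_betweenness(graph):
    """Betweenness centrality via Brandes' algorithm (unweighted graphs).

    graph: dict mapping each node to an iterable of its neighbours.
    Returns unnormalized centrality scores (pairs counted in both
    directions for undirected graphs).
    """
    cb = {v: 0.0 for v in graph}
    for s in graph:
        # Phase 1: single-source shortest paths by BFS, as in the pseudocode.
        stack = []
        pred = {v: [] for v in graph}   # predecessors on shortest paths
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:              # w found for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # shortest path to w via v
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Phase 2: back-propagate dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb
```

On the path graph a-b-c, the middle node mediates both ordered pairs of endpoints and so scores 2.0, while the endpoints score 0.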
Discovering Motifs in Real-World Social Networks
We built a framework for analyzing the contents of large social
networks, based on the approximate counting technique developed
by Gonen and Shavitt. Our toolbox was used on data from a large
forum, boards.ie, the most prominent community website in Ireland.
For the purpose of this experiment, we were granted access to 10 years of
forum data. This is the first time the approximate counting
technique has been tested on real-world social network data.
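The Gonen-Shavitt estimator itself is more involved, but the flavour of approximate motif counting can be conveyed by a simple sampling estimate for the smallest motif, the triangle (an illustrative stand-in, not the technique used in the paper):

```python
import random

def approx_triangle_count(graph, samples=10000, seed=0):
    """Estimate the number of triangles by sampling length-2 paths.

    graph: dict mapping each node to a list of its neighbours (undirected).
    Sample a random wedge u-centre-w (weighted by how many wedges each
    node centres) and check whether its endpoints are adjacent; each
    triangle contains exactly 3 closed wedges.
    """
    rng = random.Random(seed)
    nodes = [v for v in graph if len(graph[v]) >= 2]
    # Number of wedges centred at v is C(deg(v), 2).
    weights = [len(graph[v]) * (len(graph[v]) - 1) // 2 for v in nodes]
    total_wedges = sum(weights)
    if total_wedges == 0:
        return 0.0
    hits = 0
    for _ in range(samples):
        centre = rng.choices(nodes, weights=weights)[0]
        u, w = rng.sample(graph[centre], 2)
        if w in graph[u]:
            hits += 1
    return (hits / samples) * total_wedges / 3
```

On a complete graph of three nodes every sampled wedge is closed, so the estimate is exact.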
A multiphysics and multiscale software environment for modeling astrophysical systems
We present MUSE, a software framework for combining existing computational
tools for different astrophysical domains into a single multiphysics,
multiscale application. MUSE facilitates the coupling of existing codes written
in different languages by providing inter-language tools and by specifying an
interface between each module and the framework that represents a balance
between generality and computational efficiency. This approach allows
scientists to use combinations of codes to solve highly-coupled problems
without the need to write new codes for other domains or significantly alter
their existing codes. MUSE currently incorporates the domains of stellar
dynamics, stellar evolution and stellar hydrodynamics for studying generalized
stellar systems. We have now reached a "Noah's Ark" milestone, with (at least)
two available numerical solvers for each domain. MUSE can treat multi-scale and
multi-physics systems in which the time- and size-scales are well separated,
like simulating the evolution of planetary systems, small stellar associations,
dense stellar clusters, galaxies and galactic nuclei.
In this paper we describe three examples calculated using MUSE: the merger of
two galaxies, the merger of two evolving stars, and a hybrid N-body simulation.
In addition, we demonstrate an implementation of MUSE on a distributed computer
which may also include special-purpose hardware, such as GRAPEs or GPUs, to
accelerate computations. The current MUSE code base is publicly available as
open source at http://muse.li
Comment: 24 pages. To appear in New Astronomy. Source code available at
http://muse.l
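The module contract described above can be sketched as follows; the class and method names here are illustrative assumptions, not MUSE's actual API. Each solver exposes a small uniform evolve/query interface, so a scheduler can advance several codes in lock-step, which is the simplest valid coupling when their time scales are well separated:

```python
from abc import ABC, abstractmethod

class PhysicsModule(ABC):
    """Hypothetical interface in the spirit of MUSE's module contract:
    the framework couples codes through this surface without knowing
    their internals (names illustrative, not MUSE's real API)."""

    @abstractmethod
    def evolve(self, t_end: float) -> None:
        """Advance the module's internal state to time t_end."""

    @abstractmethod
    def time(self) -> float:
        """Return the module's current model time."""

class LockStepScheduler:
    """Advance all registered modules together to a common end time."""

    def __init__(self, modules):
        self.modules = modules

    def evolve(self, t_end, dt):
        t = min(m.time() for m in self.modules)
        while t < t_end:
            t = min(t + dt, t_end)
            for m in self.modules:
                m.evolve(t)

class DummyModule(PhysicsModule):
    """Trivial stand-in solver used to exercise the scheduler."""

    def __init__(self):
        self._t = 0.0

    def evolve(self, t_end):
        self._t = t_end

    def time(self):
        return self._t
```

A real framework would add unit conversion and data exchange between modules at each synchronization point; this sketch shows only the time-stepping contract.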
Extracting causal relations on HIV drug resistance from literature
Background: In HIV treatment it is critical to have up-to-date resistance data for the applicable drugs, since HIV has a very high mutation rate. These data are made available through scientific publications and must be extracted manually by experts before they can be used by virologists and medical doctors. There is therefore an urgent need for a tool that partially automates this process and can retrieve relations between drugs and virus mutations from the literature.
Results: In this work we present a novel method to extract and combine relationships between HIV drugs and mutations in viral genomes. Our extraction method is based on natural language processing (NLP), which produces grammatical relations to which a set of rules is applied. We applied our method to a relevant set of PubMed abstracts and obtained 2,434 extracted relations with an estimated F-score of 84%. We then combined the extracted relations using logistic regression to generate resistance values for each <drug, mutation> pair. The combined relations show more than 85% agreement with the Stanford HIVDB for the ten most frequently occurring mutations. The system is used in 5 hospitals from the Virolab project (http://www.virolab.org) to preselect the most relevant novel resistance data from the literature and present it to virologists and medical doctors for further evaluation.
Conclusions: The proposed relation extraction and combination method performs well on extracting HIV drug resistance data and can be used in large-scale relation extraction experiments. The developed methods can also be applied to extract other types of relations, such as gene-protein, gene-disease, and disease-mutation.
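The paper's grammar-rule pipeline is far richer than this, but a toy co-occurrence baseline conveys the shape of the task. The drug list and the mutation pattern below (wild-type amino acid, codon position, mutant amino acid, as in "M184V") are illustrative assumptions, not the paper's rule set:

```python
import re

# Illustrative pattern for HIV mutation notation such as "M184V":
# one-letter wild-type residue, codon position, one-letter mutant residue.
MUTATION = re.compile(
    r'\b([ACDEFGHIKLMNPQRSTVWY])(\d{1,3})([ACDEFGHIKLMNPQRSTVWY])\b')

def extract_pairs(sentence, drugs):
    """Toy relation extraction: pair every known drug mentioned in a
    sentence with every mutation token found in it. A crude
    co-occurrence baseline, far simpler than rule-based extraction
    over grammatical relations."""
    found_drugs = [d for d in drugs if d.lower() in sentence.lower()]
    mutations = [''.join(groups) for groups in MUTATION.findall(sentence)]
    return [(d, m) for d in found_drugs for m in mutations]
```

A real system would use the parse structure to decide whether the drug and mutation actually stand in a resistance relation, rather than mere co-occurrence.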
Comparison of HIV-1 Genotypic Resistance Test Interpretation Systems in Predicting Virological Outcomes Over Time
Background: Several decision support systems have been developed to interpret HIV-1 drug resistance genotyping results. This study compares the ability of the most commonly used systems (ANRS, Rega, and Stanford's HIVdb) to predict virological outcome at 12, 24, and 48 weeks. Methodology/Principal Findings: Included were 3763 treatment-change episodes (TCEs) for which an HIV-1 genotype was available at the time of changing treatment, with at least one follow-up viral load measurement. Genotypic susceptibility scores for the active regimens were calculated using scores defined by each interpretation system. Using logistic regression, we determined the association between the genotypic susceptibility score and the proportion of TCEs having an undetectable viral load (<50 copies/ml) at 12 (8-16) weeks (2152 TCEs), 24 (16-32) weeks (2570 TCEs), and 48 (44-52) weeks (1083 TCEs). The area under the ROC curve was calculated using 10-fold cross-validation to compare the interpretation systems' sensitivity and specificity for predicting undetectable viral load. The mean genotypic susceptibility score was slightly smaller for HIVdb, at 1.92±1.17, compared to Rega and ANRS, at 2.22±1.09 and 2.23±1.05, respectively. However, similar odds ratios were found for the association between each unit increase in genotypic susceptibility score and undetectable viral load at week 12: 1.6 [95% confidence interval 1.5-1.7] for HIVdb, 1.7 [1.5-1.8] for ANRS, and 1.7 [1.6-1.9] for Rega. Odds ratios increased over time, but remained comparable (odds ratios ranging between 1.9-2.1 at 24 weeks and 1.9-2.
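The area under the ROC curve used in this comparison can be computed directly as a Mann-Whitney probability: the chance that a randomly chosen positive case outranks a randomly chosen negative one, with ties counting one half. The sketch below is that generic computation, not the study's cross-validated procedure:

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: predicted scores (e.g. genotypic susceptibility scores);
    labels: 1 for a positive outcome (undetectable viral load), 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive outranks negative
            elif p == n:
                wins += 0.5      # tie counts one half
    return wins / (len(pos) * len(neg))
```

Perfectly separated scores give an AUC of 1.0; scores carrying no information about the label give 0.5.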
A niche width model of optimal specialization
Niche width theory, a part of organizational ecology, predicts whether “specialist” or “generalist” forms of organizations have higher “fitness” in a continually changing environment. To this end, niche width theory uses a mathematical model borrowed from biology. In this paper, we first loosen the specialist-generalist dichotomy, so that we can predict the optimal degree of specialization. Second, we generalize the model to a larger class of environmental conditions, on the basis of the model’s underlying assumptions. Third, we criticize the way the biological model is treated in sociological theory. Two of the model’s dimensions, trait and environment, seem to be confused; the predicted optimal specialization is a property of individual organizations, not of populations; and the distinction between “fine-grained” and “coarse-grained” environments is superfluous.
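A minimal numeric sketch, under assumptions of our own rather than the paper's model, shows the trade-off behind optimal specialization: performance in an environmental state falls off as a Gaussian around the organization's trait, and a wider niche buys tolerance at the cost of peak performance (peak scaled as 1/width, keeping total capacity fixed):

```python
import math

def fitness(width, env_states, trait=0.0):
    """Illustrative niche model (not the paper's): average performance
    over the environmental states an organization encounters.
    Narrow niche = high peak, little tolerance; wide niche = the reverse."""
    peak = 1.0 / width
    return sum(peak * math.exp(-((x - trait) ** 2) / (2 * width ** 2))
               for x in env_states) / len(env_states)

def optimal_width(env_states, widths):
    """Return the candidate niche width with the highest average fitness."""
    return max(widths, key=lambda w: fitness(w, env_states))
```

In a stable environment the narrowest (most specialist) niche wins, while a variable environment favours an intermediate width, which is the "optimal degree of specialization" the abstract refers to.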
Steven de Rooij — Methods of Statistical Data Compression
Data compression is important not only for conserving resources; it also has applications in cryptography and it can be used as an estimator for redundancy in the data: this has many applications, such as prediction, classification and other difficult problems in machine learning. We study algorithms that perform lossless statistical data compression. Statistical data compression is attractive because it allows for separation of the problems of modelling and coding, both of which will be treated here. It seems safe to say that with the development of arithmetic coding in 1976, the problem of coding has been solved satisfactorily, while the problem of modelling remains very difficult to this day. We will restrict ourselves to online modelling. In chapter 2 we study the theoretical background of statistical data compression, relating results of information theory and probability theory to coding and modelling. Then we focus on more concrete issues: in chapter 3 we treat an adaptation of Ukkonen’s algorithm for the online construction of suffix trees, whic
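The separation of modelling and coding described above can be made concrete: an adaptive model supplies a probability for each symbol, and an arithmetic coder (assumed ideal here, its bit-level mechanics omitted) would spend about -log2 p bits on that symbol. A minimal online sketch with Laplace (add-one) smoothing:

```python
import math

class LaplaceModel:
    """Adaptive zero-order model with add-one smoothing: the
    'modelling' half of statistical compression."""

    def __init__(self, alphabet):
        self.counts = {a: 1 for a in alphabet}  # one virtual count each
        self.total = len(alphabet)

    def prob(self, symbol):
        return self.counts[symbol] / self.total

    def update(self, symbol):
        self.counts[symbol] += 1
        self.total += 1

def ideal_code_length(data, alphabet):
    """Total bits an ideal arithmetic coder would spend, coding each
    symbol with the model's current prediction before updating it
    (the online regime discussed in the abstract)."""
    model = LaplaceModel(alphabet)
    bits = 0.0
    for s in data:
        bits += -math.log2(model.prob(s))  # code under current prediction
        model.update(s)                    # then learn from the symbol
    return bits
```

For the string "aaaa" over alphabet {a, b} the successive predictions are 1/2, 2/3, 3/4, 4/5, totalling log2(5) ≈ 2.32 bits, already below the 4 bits of a uniform code, which is the adaptive model earning its keep.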