The Locus Algorithm III: A Grid Computing system to generate catalogues of optimised pointings for Differential Photometry
This paper discusses the hardware and software components of the Grid
Computing system used to implement the Locus Algorithm to identify optimum
pointings for differential photometry of 61,662,376 stars and 23,799 quasars.
The scale of the data, together with initial operational assessments, demanded a
High Performance Computing (HPC) system to complete the data analysis. Grid
computing was chosen as the HPC solution because it was the optimum approach
available within this project. The physical and logical structure of the
National Grid computing Infrastructure informed the approach taken: a layered
separation of the different project components, to enable maximum flexibility
and extensibility.
Interdependent binary choices under social influence: phase diagram for homogeneous unbiased populations
Coupled Ising models are studied in a discrete choice theory framework, where
they can be understood to represent interdependent choice making processes for
homogeneous populations under social influence. Two different coupling schemes
are considered. The nonlocal or group interdependence model is used to study
two interrelated groups making the same binary choice. The local or individual
interdependence model represents a single group where agents make two binary
choices which depend on each other. For both models, phase diagrams, and their
implications in socioeconomic contexts, are described and compared in the
absence of private deterministic utilities (zero opinion fields).
Comment: 17 pages, 3 figures. This is the pre-peer reviewed version of the
following article: Ana Fernández del Río, Elka Korutcheva and Javier de
la Rubia, Interdependent binary choices under social influence, Wiley's
Complexity, 2012; which has been published in final form at
http://onlinelibrary.wiley.com/doi/10.1002/cplx.21397/abstrac
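As a minimal illustration of the kind of dynamics the abstract describes, the sketch below simulates two interdependent binary choices for a homogeneous, fully connected population with Glauber dynamics and zero opinion fields. The parameter names J (social influence within a choice) and K (interdependence between the two choices), and all numeric values, are illustrative assumptions, not taken from the paper.

```python
import math
import random

def simulate(n=200, J=1.0, K=0.5, beta=1.0, steps=20000, seed=0):
    """Glauber dynamics for two coupled binary choices s, t in {-1, +1}
    on a fully connected population with no private deterministic utilities.
    Returns the mean value of each choice (the two 'opinions')."""
    rng = random.Random(seed)
    s = [rng.choice((-1, 1)) for _ in range(n)]
    t = [rng.choice((-1, 1)) for _ in range(n)]
    for _ in range(steps):
        layer = s if rng.random() < 0.5 else t   # pick one of the two choices
        other = t if layer is s else s
        i = rng.randrange(n)
        m = (sum(layer) - layer[i]) / (n - 1)    # mean choice of the peers
        h = J * m + K * other[i]                 # social field + interdependence
        p_up = 1.0 / (1.0 + math.exp(-2.0 * beta * h))
        layer[i] = 1 if rng.random() < p_up else -1
    return sum(s) / n, sum(t) / n
```

At low noise (large beta) both choices order together, which is the kind of ordered phase the phase diagrams in the paper characterize.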
Kinetic multi-layer model of gas-particle interactions in aerosols and clouds (KM-GAP): linking condensation, evaporation and chemical reactions of organics, oxidants and water
We present a novel kinetic multi-layer model for gas-particle interactions in aerosols and clouds (KM-GAP) that treats explicitly all steps of mass transport and chemical reaction of semi-volatile species partitioning between gas phase, particle surface and particle bulk. KM-GAP is based on the PRA model framework (Pöschl-Rudich-Ammann, 2007), and it includes gas phase diffusion, reversible adsorption, surface reactions, bulk diffusion and reaction, as well as condensation, evaporation and heat transfer. The size change of atmospheric particles and the temporal evolution and spatial profile of the concentration of individual chemical species can be modelled along with gas uptake and accommodation coefficients. Depending on the complexity of the investigated system, unlimited numbers of semi-volatile species, chemical reactions, and physical processes can be treated, and the model shall help to bridge gaps in the understanding and quantification of multiphase chemistry and microphysics in atmospheric aerosols and clouds.
In this study we demonstrate how KM-GAP can be used to analyze, interpret and design experimental investigations of changes in particle size and chemical composition in response to condensation, evaporation, and chemical reaction. For the condensational growth of water droplets, our kinetic model results provide a direct link between laboratory observations and molecular dynamic simulations, confirming that the accommodation coefficient of water at 270 K is close to unity. Literature data on the evaporation of dioctyl phthalate as a function of particle size and time can be reproduced, and the model results suggest that changes in the experimental conditions like aerosol particle concentration and chamber geometry may influence the evaporation kinetics and can be optimized for efficient probing of specific physical effects and parameters. With regard to oxidative aging of organic aerosol particles, we illustrate how the formation and evaporation of volatile reaction products like nonanal can cause a decrease in the size of oleic acid particles exposed to ozone.
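To give a flavour of the kinetic treatment, the sketch below evaporates a single droplet using a Hertz-Knudsen-type surface flux. It is a drastically simplified, single-layer toy (no bulk layers, no gas-phase diffusion, zero far-field vapour), not KM-GAP itself; all parameter values are illustrative assumptions.

```python
import math

def evaporate(r0=1e-7, alpha=1.0, psat=1e-2, T=298.0, M=0.390, rho=980.0,
              dt=1e-4, t_end=1.0):
    """Toy droplet evaporation via a Hertz-Knudsen flux (illustrative only).
    r0: initial radius (m), alpha: evaporation/accommodation coefficient,
    psat: vapour pressure (Pa), M: molar mass (kg/mol), rho: density (kg/m^3).
    Returns the radius after t_end seconds."""
    R = 8.314
    # mean thermal speed of vapour molecules
    v = math.sqrt(8.0 * R * T / (math.pi * M))
    r, t = r0, 0.0
    while t < t_end and r > 1e-9:
        flux = alpha * psat * v / (4.0 * R * T)  # mol m^-2 s^-1 leaving the surface
        r = max(r - flux * M / rho * dt, 0.0)    # convert molar flux to radius change
        t += dt
    return r
```

With these placeholder values the droplet shrinks by roughly half its radius over one second; the full model adds the surface and bulk layers and the coupled chemistry that this sketch omits.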
Evolution of statistical analysis in empirical software engineering research: Current state and steps forward
Software engineering research is evolving and papers are increasingly based
on empirical data from a multitude of sources, using statistical tests to
determine if and to what degree empirical evidence supports their hypotheses.
To investigate the practices and trends of statistical analysis in empirical
software engineering (ESE), this paper presents a review of a large pool of
papers from top-ranked software engineering journals. First, we manually
reviewed 161 papers; in the second phase of our method, we conducted a more
extensive semi-automatic classification of 5,196 papers spanning the years
2001--2015. Results from both review steps were used to: i) identify and
analyze the predominant practices in ESE (e.g., using t-test or ANOVA), as well
as relevant trends in usage of specific statistical methods (e.g.,
nonparametric tests and effect size measures) and, ii) develop a conceptual
model for a statistical analysis workflow with suggestions on how to apply
different statistical methods as well as guidelines to avoid pitfalls. Lastly,
we confirm existing claims that current ESE practices lack a standard to report
practical significance of results. We illustrate how practical significance can
be discussed in terms of both the statistical analysis and in the
practitioner's context.
Comment: journal submission, 34 pages, 8 figures
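The pairing the review recommends, a significance test alongside an effect-size measure that speaks to practical significance, can be sketched in a few lines. The data below are made up for illustration.

```python
import math
import statistics as st

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent samples
    (does not assume equal variances)."""
    ma, mb = st.mean(a), st.mean(b)
    va, vb = st.variance(a), st.variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation: an effect-size measure
    that addresses practical, not just statistical, significance."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * st.variance(a) + (nb - 1) * st.variance(b))
                       / (na + nb - 2))
    return (st.mean(a) - st.mean(b)) / pooled

# Hypothetical build times (s) under two tool configurations.
a = [12.1, 11.8, 12.4, 12.0, 12.3, 11.9]
b = [11.2, 11.5, 11.0, 11.4, 11.3, 11.6]
t, df = welch_t(a, b)
d = cohens_d(a, b)
```

Reporting d alongside t lets a practitioner judge whether a statistically significant difference is large enough to matter in their context.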
topicmodels: An R Package for Fitting Topic Models
Topic models allow the probabilistic modeling of term frequency occurrences in documents. The fitted model can be used to estimate the similarity between documents as well as between a set of specified keywords using an additional layer of latent variables which are referred to as topics. The R package topicmodels provides basic infrastructure for fitting topic models based on data structures from the text mining package tm. The package includes interfaces to two algorithms for fitting topic models: the variational expectation-maximization algorithm provided by David M. Blei and co-authors and an algorithm using Gibbs sampling by Xuan-Hieu Phan and co-authors.
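One of the two fitting algorithms the package interfaces to, Gibbs sampling, can be sketched in pure Python. This is a minimal collapsed Gibbs sampler for LDA written for illustration; it is not the package's code (topicmodels is R with compiled backends), and the hyperparameters alpha and beta are placeholder values.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, k=2, alpha=0.1, beta=0.1, iters=100, seed=0):
    """Minimal collapsed Gibbs sampler for LDA.
    docs: list of token lists. Returns per-document topic counts, from which
    the document-topic proportions (the latent 'topics' layer) follow."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})                 # vocabulary size
    z = [[rng.randrange(k) for _ in d] for d in docs]     # topic assignments
    ndk = [[0] * k for _ in docs]                         # doc-topic counts
    nkw = [defaultdict(int) for _ in range(k)]            # topic-word counts
    nk = [0] * k
    for di, d in enumerate(docs):
        for wi, w in enumerate(d):
            t = z[di][wi]
            ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
    for _ in range(iters):
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                t = z[di][wi]
                ndk[di][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                # full conditional p(z = t | everything else), up to a constant
                weights = [(ndk[di][t2] + alpha) * (nkw[t2][w] + beta)
                           / (nk[t2] + V * beta) for t2 in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[di][wi] = t
                ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
    return ndk
```

The package wraps production implementations of this sampler (Phan et al.) and of variational EM (Blei et al.) behind a common interface built on tm's document-term matrices.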
In and out of Madagascar : dispersal to peripheral islands, insular speciation and diversification of Indian Ocean daisy trees (Psiadia, Asteraceae)
This study was supported by the European Union’s HOTSPOTS Training Network (MEST-2005-020561)Madagascar is surrounded by archipelagos varying widely in origin, age and structure. Although small and geologically young, these archipelagos have accumulated disproportionate numbers of unique lineages in comparison to Madagascar, highlighting the role of waif-dispersal and rapid in situ diversification processes in generating endemic biodiversity. We reconstruct the evolutionary and biogeographical history of the genus Psiadia (Asteraceae), a plant genus with near equal numbers of species in Madagascar and surrounding islands. Analyzing patterns and processes of diversification, we explain species accumulation on peripheral islands and aim to offer new insights on the origin and potential causes for diversification in the Madagascar and Indian Ocean Islands biodiversity hotspot. Our results provide support for an African origin of the group, with strong support for non-monophyly. Colonization of the Mascarenes took place by two evolutionary distinct lineages from Madagascar, via two independent dispersal events, each unique for their spatial and temporal properties. Significant shifts in diversification rate followed regional expansion, resulting in co-occurring and phenotypically convergent species on high-elevation volcanic slopes. Like other endemic island lineages, Psiadia have been highly successful in dispersing to and radiating on isolated oceanic islands, typified by high habitat diversity and dynamic ecosystems fuelled by continued geological activity. Results stress the important biogeographical role for Rodrigues in serving as an outlying stepping stone from which regional colonization took place. We discuss how isolated volcanic islands contribute to regional diversity by generating substantial numbers of endemic species on short temporal scales. 
Factors pertaining to the mode and tempo of archipelago formation and its geographical isolation strongly govern evolutionary pathways available for species diversification, and the potential for successful diversification of dispersed lineages, therefore, appears highly dependent on the timing of arrival, as habitat and resource properties change dramatically over the course of oceanic island evolution.Publisher PDFPeer reviewe
Honoring the Lion: A Festschrift for Jan de Leeuw
This special volume celebrates the 20th anniversary of the Journal of Statistical Software (JSS) and is a Festschrift for its founding editor Jan de Leeuw. Jan recently retired from his long-held position as founding chair of the Department of Statistics at the University of California, Los Angeles. The contributions to this special volume look back at some of his research interests and accomplishments during the half-century that he has been active in psychometrics and statistics. In this introduction, the guest editors also reminisce on their own first encounters with Jan, ten years ago. Since that time JSS has solidified its place as a leading journal of computational statistics, a fact that has a lot to do with Jan's stewardship. We include a brief history of JSS.
ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R
We introduce the C++ application and R package ranger. The software is a fast
implementation of random forests for high dimensional data. Ensembles of
classification, regression and survival trees are supported. We describe the
implementation, provide examples, validate the package with a reference
implementation, and compare runtime and memory usage with other
implementations. The new software proves to scale best with the number of
features, samples, trees, and features tried for splitting. Finally, we show
that ranger is the fastest and most memory-efficient implementation of random
forests for analyzing data on the scale of a genome-wide association study.
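The core idea ranger implements, bagging randomized trees and voting, can be caricatured with one-level trees. The sketch below is a toy bagged-stump ensemble for illustration only; ranger itself grows full classification, regression and survival trees in C++.

```python
import random

def fit_forest(X, y, n_trees=25, seed=0):
    """Tiny 'forest' of decision stumps: each tree sees a bootstrap sample of
    the rows and one randomly chosen feature (mtry = 1), and picks the
    threshold with the best training accuracy on its sample."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        f = rng.randrange(p)                         # random feature
        best = None
        for cut in sorted({X[i][f] for i in idx}):
            left = [y[i] for i in idx if X[i][f] <= cut]
            right = [y[i] for i in idx if X[i][f] > cut]
            if not left or not right:
                continue
            ml = max(set(left), key=left.count)      # majority label per side
            mr = max(set(right), key=right.count)
            acc = (sum(v == ml for v in left) + sum(v == mr for v in right)) / len(idx)
            if best is None or acc > best[0]:
                best = (acc, f, cut, ml, mr)
        if best:
            forest.append(best[1:])
    return forest

def predict(forest, x):
    """Majority vote over the stumps."""
    votes = [ml if x[f] <= cut else mr for f, cut, ml, mr in forest]
    return max(set(votes), key=votes.count)
```

Real random forests recurse this split within each tree and subsample features at every node, which is where the scaling behaviour measured in the paper comes from.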
Purposeful Searching for Citations of Scholarly Publications
Citation data contains the citations among scholarly publications. The data can be used to find relevant sources during research, identify emerging trends and research areas, compute metrics for comparing authors or journals, or for thematic clustering. Manual administration of citation data is limited by the large number of publications. In this work, we therefore lay the foundations for the automatic search for scientific citations. The unique characteristic is a purposeful search for citations of a specified set of publications (e.g., of an author or an institute). To this end, search strategies are developed and evaluated that reduce the cost of analyzing documents which contain no citations to the given set of publications. In our experiments, for authors with more than 100 publications, about 75% of the citations were found. The purposeful strategy thereby examined only 1.5% of the 120 million publications in the data set used.
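The shape of such a purposeful strategy can be sketched as follows: instead of scanning the whole corpus, rank candidate documents by a cheap relevance heuristic and examine only the top of the ranking. The heuristic here (word overlap with the target titles) and the data layout are invented for illustration, not the strategies evaluated in the paper.

```python
def purposeful_search(targets, corpus, budget):
    """Purposeful citation search sketch.
    targets: {title: set of title words} for the publications of interest.
    corpus: list of (document_words, cited_titles) pairs.
    budget: number of documents we can afford to examine in depth.
    Returns the set of target titles found to be cited."""
    target_words = set().union(*targets.values())
    # Cheap heuristic: documents sharing more words with the targets first.
    ranked = sorted(corpus,
                    key=lambda doc: len(doc[0] & target_words),
                    reverse=True)
    found = set()
    for words, cited in ranked[:budget]:     # examine only `budget` documents
        found |= set(cited) & set(targets)
    return found
```

The trade-off the paper quantifies is exactly this one: how much of the citation recall survives when the examination budget is a small fraction of the corpus.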
spam: A Sparse Matrix R Package with Emphasis on MCMC Methods for Gaussian Markov Random Fields
spam is an R package for sparse matrix algebra with emphasis on a Cholesky factorization of sparse positive definite matrices. The implementation of spam is based on the competing philosophical maxims to be competitively fast compared to existing tools and to be easy to use, modify and extend. The first is addressed by using fast Fortran routines and the second by assuring S3 and S4 compatibility. One of the features of spam is to exploit the algorithmic steps of the Cholesky factorization and hence to perform only a fraction of the workload when factorizing matrices with the same sparsity structure. Simulations show that exploiting this break-down of the factorization results in a speed-up of about a factor 5 and memory savings of about a factor 10 for large matrices and slightly smaller factors for huge matrices. The article is motivated by Markov chain Monte Carlo methods for Gaussian Markov random fields, but many other statistical applications are mentioned that profit from an efficient Cholesky factorization as well.
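The structural point, that the nonzero pattern of the Cholesky factor is fixed by the sparsity structure, so refactorization only needs the numeric updates, is easiest to see in the banded case. The pure-Python sketch below factors a symmetric positive definite tridiagonal matrix (a common GMRF precision structure); it is an illustration of the idea, not spam's Fortran code.

```python
import math

def chol_tridiag(diag, off):
    """Cholesky factor L of an SPD tridiagonal matrix given its diagonal and
    sub-diagonal. L is bidiagonal: its nonzero pattern is fixed by the
    sparsity structure, so refactorizing a matrix with the same structure
    repeats only these O(n) numeric updates."""
    n = len(diag)
    ld = [0.0] * n            # diagonal of L
    lo = [0.0] * (n - 1)      # sub-diagonal of L
    ld[0] = math.sqrt(diag[0])
    for i in range(1, n):
        lo[i - 1] = off[i - 1] / ld[i - 1]
        ld[i] = math.sqrt(diag[i] - lo[i - 1] ** 2)
    return ld, lo

def solve(ld, lo, b):
    """Solve A x = b via forward then back substitution with the factor L,
    reusing one factorization for many right-hand sides (as in MCMC)."""
    n = len(ld)
    y = [0.0] * n
    y[0] = b[0] / ld[0]
    for i in range(1, n):
        y[i] = (b[i] - lo[i - 1] * y[i - 1]) / ld[i]
    x = [0.0] * n
    x[-1] = y[-1] / ld[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (y[i] - lo[i] * x[i + 1]) / ld[i]
    return x
```

In an MCMC loop over a GMRF, the precision matrix changes numerically at every iteration but keeps its sparsity structure, which is precisely the situation spam's factorization reuse targets.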