
    The Locus Algorithm III: A Grid Computing system to generate catalogues of optimised pointings for Differential Photometry

    This paper discusses the hardware and software components of the Grid Computing system used to implement the Locus Algorithm to identify optimum pointings for differential photometry of 61,662,376 stars and 23,799 quasars. The scale of the data, together with initial operational assessments, demanded a High Performance Computing (HPC) system to complete the data analysis. Grid computing was chosen as the best HPC solution available within this project. The physical and logical structure of the National Grid computing Infrastructure informed the approach taken: a layered separation of the different project components to enable maximum flexibility and extensibility.

    Interdependent binary choices under social influence: phase diagram for homogeneous unbiased populations

    Coupled Ising models are studied in a discrete choice theory framework, where they can be understood to represent interdependent choice-making processes for homogeneous populations under social influence. Two different coupling schemes are considered. The nonlocal or group interdependence model is used to study two interrelated groups making the same binary choice. The local or individual interdependence model represents a single group where agents make two binary choices which depend on each other. For both models, phase diagrams, and their implications in socioeconomic contexts, are described and compared in the absence of private deterministic utilities (zero opinion fields). Comment: 17 pages, 3 figures. This is the pre-peer reviewed version of the following article: Ana Fernández del Río, Elka Korutcheva and Javier de la Rubia, Interdependent binary choices under social influence, Wiley's Complexity, 2012; which has been published in final form at http://onlinelibrary.wiley.com/doi/10.1002/cplx.21397/abstrac

    Evolution of statistical analysis in empirical software engineering research: Current state and steps forward

    Software engineering research is evolving and papers are increasingly based on empirical data from a multitude of sources, using statistical tests to determine if and to what degree empirical evidence supports their hypotheses. To investigate the practices and trends of statistical analysis in empirical software engineering (ESE), this paper presents a review of a large pool of papers from top-ranked software engineering journals. First, we manually reviewed 161 papers; in the second phase of our method, we conducted a more extensive semi-automatic classification of 5,196 papers spanning the years 2001–2015. Results from both review steps were used to: i) identify and analyze the predominant practices in ESE (e.g., using t-test or ANOVA), as well as relevant trends in usage of specific statistical methods (e.g., nonparametric tests and effect size measures), and ii) develop a conceptual model for a statistical analysis workflow with suggestions on how to apply different statistical methods as well as guidelines to avoid pitfalls. Lastly, we confirm existing claims that current ESE practices lack a standard to report practical significance of results. We illustrate how practical significance can be discussed in terms of both the statistical analysis and the practitioner's context. Comment: journal submission, 34 pages, 8 figures

    topicmodels: An R Package for Fitting Topic Models

    Topic models allow the probabilistic modeling of term frequency occurrences in documents. The fitted model can be used to estimate the similarity between documents as well as between a set of specified keywords using an additional layer of latent variables which are referred to as topics. The R package topicmodels provides basic infrastructure for fitting topic models based on data structures from the text mining package tm. The package includes interfaces to two algorithms for fitting topic models: the variational expectation-maximization algorithm provided by David M. Blei and co-authors and an algorithm using Gibbs sampling by Xuan-Hieu Phan and co-authors.

    In and out of Madagascar : dispersal to peripheral islands, insular speciation and diversification of Indian Ocean daisy trees (Psiadia, Asteraceae)

    This study was supported by the European Union’s HOTSPOTS Training Network (MEST-2005-020561). Madagascar is surrounded by archipelagos varying widely in origin, age and structure. Although small and geologically young, these archipelagos have accumulated disproportionate numbers of unique lineages in comparison to Madagascar, highlighting the role of waif-dispersal and rapid in situ diversification processes in generating endemic biodiversity. We reconstruct the evolutionary and biogeographical history of the genus Psiadia (Asteraceae), a plant genus with near equal numbers of species in Madagascar and surrounding islands. Analyzing patterns and processes of diversification, we explain species accumulation on peripheral islands and aim to offer new insights on the origin and potential causes for diversification in the Madagascar and Indian Ocean Islands biodiversity hotspot. Our results provide support for an African origin of the group, with strong support for non-monophyly. Colonization of the Mascarenes took place by two evolutionarily distinct lineages from Madagascar, via two independent dispersal events, each unique in its spatial and temporal properties. Significant shifts in diversification rate followed regional expansion, resulting in co-occurring and phenotypically convergent species on high-elevation volcanic slopes. Like other endemic island lineages, Psiadia have been highly successful in dispersing to and radiating on isolated oceanic islands, typified by high habitat diversity and dynamic ecosystems fuelled by continued geological activity. Results stress the important biogeographical role for Rodrigues in serving as an outlying stepping stone from which regional colonization took place. We discuss how isolated volcanic islands contribute to regional diversity by generating substantial numbers of endemic species on short temporal scales.
    Factors pertaining to the mode and tempo of archipelago formation and its geographical isolation strongly govern evolutionary pathways available for species diversification, and the potential for successful diversification of dispersed lineages, therefore, appears highly dependent on the timing of arrival, as habitat and resource properties change dramatically over the course of oceanic island evolution.

    Honoring the Lion: A Festschrift for Jan de Leeuw

    This special volume celebrates the 20th anniversary of the Journal of Statistical Software (JSS) and is a Festschrift for its founding editor Jan de Leeuw. Jan recently retired from his long-held position as founding chair of the Department of Statistics at the University of California, Los Angeles. The contributions to this special volume look back at some of his research interests and accomplishments during the half-century that he has been active in psychometrics and statistics. In this introduction, the guest editors also reminisce on their own first encounters with Jan, ten years ago. Since that time JSS has solidified its place as a leading journal of computational statistics, a fact that has a lot to do with Jan's stewardship. We include a brief history of JSS

    ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R

    We introduce the C++ application and R package ranger. The software is a fast implementation of random forests for high dimensional data. Ensembles of classification, regression and survival trees are supported. We describe the implementation, provide examples, validate the package with a reference implementation, and compare runtime and memory usage with other implementations. The new software proves to scale best with the number of features, samples, trees, and features tried for splitting. Finally, we show that ranger is the fastest and most memory efficient implementation of random forests to analyze data on the scale of a genome-wide association study.

    Purposeful Searching for Citations of Scholarly Publications

    Citation data contains the citations among scholarly publications. The data can be used to find relevant sources during research, identify emerging trends and research areas, compute metrics for comparing authors or journals, or for thematic clustering. Manual administration of citation data is limited due to the large number of publications. In this work, we hence lay the foundations for the automatic search for scientific citations. The distinguishing characteristic is a purposeful search for citations of a specified set of publications (e.g., those of an author or an institute). To this end, search strategies are developed and evaluated in order to reduce the cost of analyzing documents that contain no citations to the given set of publications. In our experiments, about 75% of the citations were found for authors with more than 100 publications. In doing so, the purposeful strategy examined only 1.5% of the 120 million publications in the data set used.

    spam: A Sparse Matrix R Package with Emphasis on MCMC Methods for Gaussian Markov Random Fields

    spam is an R package for sparse matrix algebra with emphasis on a Cholesky factorization of sparse positive definite matrices. The implementation of spam is based on the competing philosophical maxims to be competitively fast compared to existing tools and to be easy to use, modify and extend. The first is addressed by using fast Fortran routines and the second by assuring S3 and S4 compatibility. One of the features of spam is to exploit the algorithmic steps of the Cholesky factorization and hence to perform only a fraction of the workload when factorizing matrices with the same sparsity structure. Simulations show that exploiting this break-down of the factorization results in a speed-up of about a factor of 5 and memory savings of about a factor of 10 for large matrices, and slightly smaller factors for huge matrices. The article is motivated with Markov chain Monte Carlo methods for Gaussian Markov random fields, but many other statistical applications are mentioned that profit from an efficient Cholesky factorization as well.