46,711 research outputs found

    Tangos: the agile numerical galaxy organization system

    Get PDF
    We present Tangos, a Python framework and web interface for database-driven analysis of numerical structure formation simulations. To understand the role that such a tool can play, consider constructing a history for the absolute magnitude of each galaxy within a simulation. The magnitudes must first be calculated for all halos at all timesteps and then linked using a merger tree; folding the required information into a final analysis can entail significant effort. Tangos is a generic solution to this information organization problem, aiming to free users from the details of data management. At the querying stage, our example of gathering properties over history is reduced to a few clicks or a simple, single-line Python command. The framework is highly extensible; in particular, users are expected to define their own properties which tangos will write into the database. A variety of parallelization options are available and the raw simulation data can be read using existing libraries such as pynbody or yt. Finally, tangos-based databases and analysis pipelines can easily be shared with collaborators or the broader community to ensure reproducibility. User documentation is provided separately.Comment: Clarified various points and further improved code performance; accepted for publication in ApJS. Tutorials (including video) at http://tiny.cc/tango

    A Survey on Array Storage, Query Languages, and Systems

    Full text link
    Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is a renewed interest in array data processing in the context of Big Data. To the best of our knowledge, a unified resource that summarizes and analyzes array processing research over its long existence is currently missing. In this survey, we provide a guide for past, present, and future research in array processing. The survey is organized along three main topics. Array storage discusses all the aspects related to array partitioning into chunks. The identification of a reduced set of array operators to form the foundation for an array query language is analyzed across multiple such proposals. Lastly, we survey real systems for array processing. The result is a thorough survey on array data storage and processing that should be consulted by anyone interested in this research topic, independent of experience level. The survey is not complete though. We greatly appreciate pointers towards any work we might have forgotten to mention.Comment: 44 page

    Towards Loosely-Coupled Programming on Petascale Systems

    Full text link
    We have extended the Falkon lightweight task execution framework to make loosely coupled programming on petascale systems a practical and useful programming model. This work studies and measures the performance factors involved in applying this approach to enable the use of petascale systems by a broader user community, and with greater ease. Our work enables the execution of highly parallel computations composed of loosely coupled serial jobs with no modifications to the respective applications. This approach allows a new-and potentially far larger-class of applications to leverage petascale systems, such as the IBM Blue Gene/P supercomputer. We present the challenges of I/O performance encountered in making this model practical, and show results using both microbenchmarks and real applications from two domains: economic energy modeling and molecular dynamics. Our benchmarks show that we can scale up to 160K processor-cores with high efficiency, and can achieve sustained execution rates of thousands of tasks per second.Comment: IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SuperComputing/SC) 200

    PF-OLA: A High-Performance Framework for Parallel On-Line Aggregation

    Full text link
    Online aggregation provides estimates to the final result of a computation during the actual processing. The user can stop the computation as soon as the estimate is accurate enough, typically early in the execution. This allows for the interactive data exploration of the largest datasets. In this paper we introduce the first framework for parallel online aggregation in which the estimation virtually does not incur any overhead on top of the actual execution. We define a generic interface to express any estimation model that abstracts completely the execution details. We design a novel estimator specifically targeted at parallel online aggregation. When executed by the framework over a massive 8TB8\text{TB} TPC-H instance, the estimator provides accurate confidence bounds early in the execution even when the cardinality of the final result is seven orders of magnitude smaller than the dataset size and without incurring overhead.Comment: 36 page

    The Family of MapReduce and Large Scale Data Processing Systems

    Full text link
    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program such as issues on data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several followup works after its introduction. This article provides a comprehensive survey for a family of approaches and mechanisms of large scale data processing mechanisms that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both research and industrial communities. We also cover a set of introduced systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions.Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author

    Open Access Scientometrics and the UK Research Assessment Exercise

    No full text
    Scientometric predictors of research performance need to be validated by showing that they have a high correlation with the external criterion they are trying to predict. The UK Research Assessment Exercise (RAE) -- together with the growing movement toward making the full-texts of research articles freely available on the web -- offer a unique opportunity to test and validate a wealth of old and new scientometric predictors, through multiple regression analysis: Publications, journal impact factors, citations, co-citations, citation chronometrics (age, growth, latency to peak, decay rate), hub/authority scores, h-index, prior funding, student counts, co-authorship scores, endogamy/exogamy, textual proximity, download/co-downloads and their chronometrics, etc. can all be tested and validated jointly, discipline by discipline, against their RAE panel rankings in the forthcoming parallel panel-based and metric RAE in 2008. The weights of each predictor can be calibrated to maximize the joint correlation with the rankings. Open Access Scientometrics will provide powerful new means of navigating, evaluating, predicting and analyzing the growing Open Access database, as well as powerful incentives for making it grow faster

    Proteomics of Cytochrome c Oxidase-Negative versus -Positive Muscle Fiber Sections in Mitochondrial Myopathy

    Get PDF
    The mosaic distribution of cytochrome c oxidase(+) (COX+) and COX - muscle fibers in mitochondrial disorders allows the sampling of fibers with compensated and decompensated mitochondrial function from the same individual. We apply laser capture microdissection to excise individual COX+ and COX- fibers from the biopsies of mitochondrial myopathy patients. Using mass spectrometry-based proteomics, we quantify >4,000 proteins per patient. While COX+ fibers show a higher expression of respiratory chain components, COX- fibers display protean adaptive responses, including upregulation of mitochondrial ribosomes, translation proteins, and chaperones. Upregulated proteins include C1QBP, required for mitoribosome formation and protein synthesis, and STOML2, which organizes cardiolipin-enriched microdomains and the assembly of respiratory supercomplexes. Factoring in fast/slow fiber type, COX (-) slow fibers show a compensatory upregulation of beta-oxidation, the AAA(+) protease AFG3L1, and the OPA1-dependent cristae remodeling program. These findings reveal compensatory mechanisms in muscle fibers struggling with energy shortage and metabolic stress
    • …
    corecore