24,740 research outputs found

    Data Compression in the Petascale Astronomy Era: a GERLUMPH case study

    Full text link
    As the volume of data grows, astronomers are increasingly faced with choices on what data to keep -- and what to throw away. Recent work evaluating the JPEG2000 (ISO/IEC 15444) standards as a future data format standard in astronomy has shown promising results on observational data. However, there is still a need to evaluate its potential on other types of astronomical data, such as data from numerical simulations. GERLUMPH (the GPU-Enabled High Resolution cosmological MicroLensing parameter survey) represents an example of a data-intensive project in theoretical astrophysics. In the next phase of processing, the ~27 terabyte GERLUMPH dataset is set to grow by a factor of 100 -- well beyond the current storage capabilities of the supercomputing facility on which it resides. In order to minimise bandwidth usage, file transfer time, and storage space, this work evaluates several data compression techniques. Specifically, we investigate off-the-shelf and custom lossless compression algorithms as well as the lossy JPEG2000 compression format. Results of lossless compression algorithms on GERLUMPH data products show small compression ratios (1.35:1 to 4.69:1 of input file size) that vary with the nature of the input data. Our results suggest that JPEG2000 could be suitable for other numerical datasets stored as gridded or volumetric data. When approaching lossy data compression, one should keep in mind the intended purposes of the data to be compressed, and evaluate the effect of the loss on future analysis. In our case study, lossy compression and a high compression ratio do not significantly compromise the intended use of the data for constraining quasar source profiles from cosmological microlensing.
    Comment: 15 pages, 9 figures, 5 tables. Published in the Special Issue of Astronomy & Computing on the future of astronomical data formats
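
    The kind of lossless comparison described above can be illustrated with a short sketch (not taken from the paper; the synthetic grid and the choice of zlib and bz2 codecs are assumptions for illustration) that compresses a gridded array with two off-the-shelf codecs and reports the input-to-output ratio in the same N:1 form:

```python
# Sketch: compare off-the-shelf lossless codecs on a gridded dataset and
# report compression ratios as N:1 of the input size. The synthetic array
# stands in for a GERLUMPH-style gridded data product; a real evaluation
# would read the actual files from disk.
import bz2
import zlib

import numpy as np

rng = np.random.default_rng(0)
# A spatially correlated 2D grid compresses better than pure noise,
# which is why the ratio varies with the nature of the input data.
grid = np.cumsum(rng.normal(size=(1024, 1024)), axis=0).astype(np.float32)
raw = grid.tobytes()

for name, codec in (("zlib", zlib.compress), ("bz2", bz2.compress)):
    ratio = len(raw) / len(codec(raw))
    print(f"{name}: {ratio:.2f}:1")
```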

    The Family of MapReduce and Large Scale Data Processing Systems

    Full text link
    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architecture and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as issues of data distribution, scheduling, and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several follow-up works after its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both the research and industrial communities. We also cover a set of systems that have been introduced to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions.
    Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other authors
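
    The programming model the survey builds on can be illustrated with a minimal, single-process word-count sketch (an illustration only, not code from the article; a real MapReduce system distributes these phases across a cluster and handles data distribution, scheduling, and fault tolerance on the application's behalf):

```python
# Minimal single-process illustration of the MapReduce programming model:
# a map phase emits (key, value) pairs, a shuffle groups values by key,
# and a reduce phase folds each group into a final result.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc: str) -> Iterator[Tuple[str, int]]:
    # Emit one (word, 1) pair per token.
    for word in doc.split():
        yield (word.lower(), 1)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key; the framework does this between phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: Iterable[int]) -> Tuple[str, int]:
    # Fold the grouped values into a single count per word.
    return key, sum(values)


docs = ["map reduce maps and reduces", "reduce the map"]
pairs = (pair for doc in docs for pair in map_phase(doc))
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(counts)
```

    The user supplies only the two functions; everything between them (grouping by key, and in a distributed setting partitioning, scheduling, and recovery) is the framework's responsibility.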

    Special Libraries, September 1946

    Get PDF
    Volume 37, Issue 7
    https://scholarworks.sjsu.edu/sla_sl_1946/1006/thumbnail.jp

    The Real Value of What Students Do In College: College Completion Series: Part One

    Get PDF
    This report takes a look at how government officials have pressed college accreditors to focus more on "student outcomes" -- quantifiable indicators of knowledge acquired, skills learned, degrees attained, and so on. It then argues that it is not these enumerated outcomes that offer the best way to hold colleges accountable; rather, it is the evidence of students' engagement with the curriculum -- their papers, written examinations, projects, and presentations -- that holds the most promise for spurring improvement in higher education. Furthermore, this engagement is also a key factor in keeping students in school all the way to graduation. The report concludes that reformers seeking to enhance college performance and accountability should focus not on fabricated outcome measures but instead on the actual outputs of students' academic engagement, the best indicators of whether a college is providing the quality teaching, financial aid, and supportive environment that make higher learning possible, especially for the disadvantaged.

    This report is the first in a series from The Century Foundation, sponsored by Pearson. The views and opinions expressed in this paper are those of the authors and do not necessarily reflect the views or position of Pearson. The series grew out of an August 2014 conference at which researchers and several university presidents explored new paths to diversity in higher education in light of emerging legal constraints on race-based affirmative action. As participants discussed ideas to ensure access for low-income and minority students, university leaders were equally concerned about how to improve rates of college graduation among disadvantaged students.

    Machine Readable Race: Constructing Racial Information in the Third Reich

    Get PDF
    This paper examines how informational processing drove new structures of racial classification in the Third Reich. The Deutsche Hollerith-Maschinen Gesellschaft mbH (Dehomag) worked closely with the government in designing and integrating punch-card informational systems. As a German subsidiary of IBM, Dehomag initially deployed its technology for a census in order to provide a more detailed racial analysis of the population. However, the racial data was not detailed enough. The Nuremberg Race Laws provided a more precise and procedural definition of Jewishness that could be rendered machine-readable. As the volume and velocity of information in the Reich increased, Dehomag’s technology was adopted by other agencies such as the Race and Settlement Office, culminating in the vision of a single machinic number for each citizen. Through the lens of these proto-technologies, the paper demonstrates the historical interplay between race and information. Yet if the indexing and sorting of race anticipates big-data analytics, contemporary power is more sophisticated and subtle. The complexity of modern algorithmic regimes diffuses obvious racial markers, engendering a racism without race.