
    Transformative Effects of NDIIPP, the Case of the Henry A. Murray Archive

    This article reflects on the changes to the Henry A. Murray Research Archive catalyzed by involvement in the National Digital Information Infrastructure and Preservation Program (NDIIPP) partnership and by the accompanying introduction of next-generation digital library software. Founded in 1976 at Radcliffe, the Henry A. Murray Research Archive is the endowed, permanent repository for quantitative and qualitative research data at the Institute for Quantitative Social Science at Harvard University. The Murray preserves in perpetuity data of all types of interest to the research community, including numerical data, video, audio, and interview notes. The center is unique among data archives in the United States in the extent of its holdings in quantitative, qualitative, and mixed quantitative-qualitative research. The Murray took part in Data-PASS, an NDIIPP-funded collaboration with four other archival partners, to identify and acquire data at risk and to jointly develop best practices for shared stewardship, preservation, and exchange of these data. During this time, the Dataverse Network (DVN) software was introduced, facilitating the creation of virtual archives. The combination of institutional collaboration and new technology led the Murray to re-engineer its entire acquisition process; completely rewrite its ingest, dissemination, and other licensing agreements; and adopt a new model for ingest, discovery, access, and presentation of its collections. Through the Data-PASS project, the Murray has acquired a number of important data collections. The resulting changes within the Murray have been dramatic, including a fourfold increase in its overall rate of acquisitions and far more rapid dissemination of acquisitions. Furthermore, the new licensing and processing procedures allow a previously undreamed-of level of interoperability and collaboration with partner archives, facilitating integrated discovery and presentation services and joint stewardship of collections.

    Nineteen Ways of Looking at Statistical Software

    We identify principles and practices for writing and publishing statistical software with maximum benefit to the scholarly community.

    What Are Judicially Manageable Standards For Redistricting? Evidence From History

    In the 1960s the courts adopted population-equality measures as a means to limit gerrymandering. In recent cases, the courts have begun to use geographical compactness standards for the same purpose. In this research note, I argue that, unlike population-equality measures, compactness standards are not easily manageable by the judiciary. I use a variety of compactness measures and population-equality measures to evaluate 349 district plans, comprising the 3390 U.S. Congressional districts created between 1789 and 1913. I find that different population-equality measures, even those with poor theoretical properties, produce very similar evaluations of plans. On the other hand, different compactness measures fail to agree about the compactness of most districts and plans. In effect, the courts can use any convenient measure of population equality and obtain similar results, while the choice of compactness measure will significantly change the evaluations in each case. Since there is no generally accepted single measure of compactness, this disagreement among measures raises concerns about whether compactness is a readily operationalizable notion, to use a social scientific formulation, or a judicially manageable one, to employ terms from the law.
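
    To make the contrast concrete, the short Python sketch below computes one widely used compactness measure (Polsby-Popper, defined as 4*pi*A/P^2) and one simple population-equality measure (the maximum relative deviation from the ideal district population) for a hypothetical three-district plan. The measures, function names, and figures here are illustrative assumptions only and are not necessarily the specific measures or data evaluated in the article.

        import math

        def polsby_popper(area, perimeter):
            """Polsby-Popper compactness: 4*pi*A / P**2; 1.0 for a circle, near 0 for contorted shapes."""
            return 4 * math.pi * area / perimeter ** 2

        def max_population_deviation(populations):
            """Largest relative deviation of any district from the ideal (mean) population."""
            ideal = sum(populations) / len(populations)
            return max(abs(p - ideal) / ideal for p in populations)

        # Hypothetical three-district plan: (area, perimeter, population) for each district.
        plan = [(100.0, 40.0, 710_000), (100.0, 90.0, 705_000), (120.0, 55.0, 715_000)]

        compactness = [polsby_popper(a, p) for a, p, _ in plan]
        population_deviation = max_population_deviation([pop for _, _, pop in plan])

        print("Polsby-Popper scores:", [round(c, 3) for c in compactness])
        print("Max population deviation:", round(population_deviation, 4))

    As in the article's finding, swapping in a different compactness formula (say, Reock or convex-hull based scores) can reorder these districts, while most reasonable population-equality measures would rank the plan's deviation similarly.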

    BARD: Better Automated Redistricting

    BARD is the first (and, at the time of writing, only) open-source software package for general redistricting and redistricting analysis. BARD provides methods to create, display, compare, edit, automatically refine, evaluate, and profile political districting plans. BARD aims to provide a framework for scientific analysis of redistricting plans and to facilitate wider public participation in the creation of new plans. BARD supports map creation and refinement through command-line, graphical user interface, and automatic methods. Because redistricting is a computationally complex partitioning problem not amenable to exact optimization, BARD implements a variety of selectable metaheuristics that can be used to refine existing or randomly generated redistricting plans according to user-determined criteria. Furthermore, BARD supports automated generation of redistricting plans and profiling of plans by assigning different weights to various criteria, such as district compactness or equality of population. This functionality permits exploration of trade-offs among criteria. The intent of a redistricting authority may be explored by examining these trade-offs and inferring which reasonably observable plans were not adopted. Redistricting is a computationally intensive problem even for modest-sized states, so performance is an important consideration in BARD's design and implementation. The program implements performance enhancements such as evaluation caching, explicit memory management, and distributed computing across snow clusters.
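
    BARD itself is distributed as an R package; the Python sketch below is only a rough illustration of the general idea described above, refining a plan against a user-weighted combination of criteria (population equality and a compactness proxy) with a simple greedy hill-climbing metaheuristic. The data, weights, and scoring function are hypothetical and much simpler than BARD's actual criteria, algorithms, and interface.

        import random
        from statistics import mean

        # Hypothetical problem: assign 40 small geographic units to two districts.
        # Each unit is (x, y, population); data are randomly generated for illustration.
        random.seed(1)
        units = [(random.random(), random.random(), random.randint(50, 150)) for _ in range(40)]

        def score(assignment, w_pop=1.0, w_compact=1.0):
            """Weighted plan score (lower is better): population imbalance plus geographic dispersion."""
            districts = {d: [units[i] for i, a in enumerate(assignment) if a == d] for d in set(assignment)}
            pops = [sum(u[2] for u in members) for members in districts.values()]
            ideal = sum(pops) / len(pops)
            imbalance = max(abs(p - ideal) / ideal for p in pops)
            dispersion = []
            for members in districts.values():
                cx, cy = mean(u[0] for u in members), mean(u[1] for u in members)
                dispersion.append(mean(((u[0] - cx) ** 2 + (u[1] - cy) ** 2) ** 0.5 for u in members))
            return w_pop * imbalance + w_compact * mean(dispersion)

        # Greedy hill climbing: flip one unit's district at a time, keeping only improving moves.
        assignment = [random.randint(0, 1) for _ in units]
        best = score(assignment)
        for _ in range(2000):
            i = random.randrange(len(units))
            old = assignment[i]
            assignment[i] = 1 - old
            new = score(assignment) if len(set(assignment)) == 2 else float("inf")  # keep both districts non-empty
            if new < best:
                best = new
            else:
                assignment[i] = old

        print("Refined plan score:", round(best, 4))

    Changing the weights w_pop and w_compact and re-running the refinement is the kind of trade-off exploration the abstract describes, though BARD applies it to real geographic data with many more criteria.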

    accuracy: Tools for Accurate and Reliable Statistical Computing

    Most empirical social scientists are surprised that low-level numerical issues in software can have deleterious effects on the estimation process. Statistical analyses that appear to be perfectly successful can be invalidated by concealed numerical problems. We have developed a set of tools, contained in accuracy, a package for R and S-PLUS, to diagnose problems stemming from numerical and measurement error and to improve the accuracy of inferences. The package includes a framework for gauging the computational stability of model results, tools for comparing model results, optimization diagnostics, and tools for collecting entropy for true random number generation.
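
    The accuracy package targets R and S-PLUS; as a rough illustration of the kind of perturbation-based stability check described above, the Python sketch below re-estimates a simple least-squares model many times after adding small random noise to the data and reports how much the coefficient estimates move. The data, noise scale, and model are hypothetical and are not drawn from the package itself.

        import numpy as np

        rng = np.random.default_rng(0)

        # Hypothetical data: y = 2 + 3*x + noise, estimated by ordinary least squares.
        n = 200
        x = rng.normal(size=n)
        y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=n)

        def ols(design, response):
            """Least-squares coefficient estimates."""
            return np.linalg.lstsq(design, response, rcond=None)[0]

        baseline = ols(np.column_stack([np.ones(n), x]), y)

        # Perturbation test: repeatedly add tiny uniform noise (mimicking measurement or
        # rounding error) to the data and re-estimate; unstable results move a lot.
        scale = 0.01
        estimates = []
        for _ in range(500):
            x_perturbed = x + rng.uniform(-scale, scale, size=n)
            y_perturbed = y + rng.uniform(-scale, scale, size=n)
            estimates.append(ols(np.column_stack([np.ones(n), x_perturbed]), y_perturbed))
        estimates = np.array(estimates)

        print("baseline coefficients:", np.round(baseline, 3))
        print("std. dev. under perturbation:", np.round(estimates.std(axis=0), 4))

    If small, plausible perturbations of the inputs produce large swings in the estimates, the reported results should not be trusted at the precision they appear to have.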

    A Grand Challenges-Based Research Agenda for Scholarly Communication and Information Science [MIT Grand Challenge PubPub Participation Platform]

    Identifying Grand Challenges: A global and multidisciplinary community of stakeholders came together in March 2018 to identify, scope, and prioritize a common vision for specific grand research challenges related to the fields of information science and scholarly communication. The participants included domain researchers in academia, practitioners, and those who aim to democratize scholarship. An explicit goal of the summit was to identify research needs related to barriers in the development of scalable, interoperable, socially beneficial, and equitable systems for scholarly information, and to explore the development of non-market approaches to governing the scholarly knowledge ecosystem. To spur discussion and exploration, grand challenge provocations were suggested by participants and framed within one of three areas: scholarly discovery, digital curation and preservation, and open scholarship. A few people participated in all three segments, but most attended discussions around a single topic.

    To create the guest list for the three workshop target areas, we invited participants representing a distribution of expertise and providing diversity across several facets. In addition to expertise in the specific focus area, we aimed for the participants in each track to be diverse across sectors, disciplines, and regions of the world. Each track had approximately 20-25 people from different parts of the world, including the United States, European Union, South Africa, and India. Domain researchers brought perspectives from a range of scientific disciplines, while practitioners brought perspectives from different roles, drawn from the commercial, non-profit, and governmental sectors. Nevertheless, we were constrained by our social networks and by the location of the workshop in Cambridge, Massachusetts, and most of the participants were affiliated with US and European institutions.

    During our discussions, it quickly became clear that the grand challenges themselves cannot be neatly categorized into discovery, curation and preservation, and open scholarship, or even, for that matter, limited to library and information science. Several cross-cutting themes emerged: a strong need to include underrepresented voices and communities outside of mainstream publishing and academic institutions; a need to identify incentives that will motivate people to change their own approaches and processes toward a more open and trusted framework; and a need to identify collaborators and partners from multiple disciplines in order to build strong programs. The discussions were full of energy, insights, and enthusiasm for inclusive participation, and concluded with a desire for a global call to action to spark changes that will enable more equitable and open scholarship.

    Some important and productive tensions surfaced in our discussions, particularly around the best paths forward on the challenges we identified. On many core topics, however, there was widespread agreement among participants, especially on the urgent need to address the exclusion of so many people around the globe from knowledge production and access, and the troubling overrepresentation in the scholarly record of white, male, English-language voices. Ultimately, all agreed that we have an obligation to enrich and greatly expand this space so that our communities can be catalysts for change.
    The report is organized as follows. Towards a more inclusive, open, equitable, and sustainable scholarly knowledge ecosystem: Vision; Broadest impacts; Recommendations for broad impact. Research landscape: Challenges, threats, and barriers; Challenges to participation in the research community; Restrictions on forms of knowledge; Threats to integrity and trust; Threats to the durability of knowledge; Threats to individual agency; Incentives to sustain a scholarly knowledge ecosystem that is inclusive, equitable, trustworthy, and sustainable; Grand Challenges research areas; Recommendations for research areas and programs. Targeted research questions and research challenges: Legal, economic, policy, and organizational design for enduring, equitable, open scholarship; Measuring, predicting, and adapting to use and utility across scholarly communities; Designing and governing algorithms in the scholarly knowledge ecosystem to support accountability, credibility, and agency; Integrating oral and tacit knowledge into the scholarly knowledge ecosystem. Integrating research, practice, and policy: The need for leadership to coordinate research, policy, and practice initiatives; Role of libraries and archives as advocates and collaborators; Incorporating values of openness, sustainability, and equity into scholarly infrastructure and practice; Funders, catalysts, and coordinators; Recommendations for integrating research, practice, and policy.

    Selecting Efficient and Reliable Preservation Strategies

    This article addresses the problem of formulating efficient and reliable operational preservation policies that ensure bit-level information integrity over long periods and in the presence of a diverse range of real-world technical, legal, organizational, and economic threats. We develop a systematic, quantitative prediction framework that combines formal modeling, discrete-event simulation, and hierarchical modeling, and then use empirically calibrated sensitivity analysis to identify effective strategies. Specifically, the framework formally defines an objective function for preservation that maps a set of preservation policies and a risk profile to a set of preservation costs and an expected collection-loss distribution. In this framework, a curator's objective is to select optimal policies that minimize expected loss subject to budget constraints. To estimate preservation loss under different policy conditions, we develop a statistical hierarchical risk model that includes four sources of risk: the storage hardware; the physical environment; the curating institution; and the global environment. We then employ a general discrete-event simulation framework to evaluate the expected loss and the cost of employing varying preservation strategies under specific parameterizations of risk. The framework offers flexibility for modeling a wide range of preservation policies and threats. Since the framework is open source and easily deployed in a cloud computing environment, it can be used to produce analyses based on independent estimates of scenario-specific costs, reliability, and risks. We present results summarizing hundreds of thousands of simulations using this framework. This exploratory analysis points to a number of robust and broadly applicable preservation strategies, provides novel insights into specific preservation tactics, and provides evidence that challenges received wisdom.
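
    As a rough illustration of the kind of policy comparison such a framework supports, the Python sketch below runs a heavily simplified Monte Carlo simulation: each replica of an item fails independently with some annual probability, periodic audits repair failed replicas from a surviving copy, and the item is lost if every replica fails between audits. All parameter values (failure rate, audit interval, storage cost) are invented for illustration; the sketch omits the hierarchical risk structure and empirical calibration described in the article.

        import random

        def probability_of_loss(n_replicas, years=50, p_fail=0.02, audit_every=1, trials=20000):
            """Monte Carlo estimate of the chance that an item is lost within the horizon.

            Each replica fails independently with annual probability p_fail; an audit every
            audit_every years restores failed replicas as long as at least one copy survives.
            The item is lost once every replica has failed between audits.
            """
            losses = 0
            for _ in range(trials):
                alive = n_replicas
                for year in range(1, years + 1):
                    alive -= sum(random.random() < p_fail for _ in range(alive))
                    if alive == 0:
                        losses += 1
                        break
                    if year % audit_every == 0:
                        alive = n_replicas  # repair missing replicas from a surviving copy
            return losses / trials

        random.seed(0)
        for replicas in (1, 2, 3):
            loss = probability_of_loss(replicas)
            storage_cost = replicas * 50  # illustrative: one cost unit per replica-year over 50 years
            print(f"replicas={replicas}  P(loss over 50y)~{loss:.4f}  storage cost={storage_cost}")

    Comparing the estimated loss probabilities against the (linearly growing) storage cost is a toy version of the curator's problem stated above: minimizing expected loss subject to a budget constraint.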

    Selecting efficient and reliable preservation strategies: modeling long-term information integrity using large-scale hierarchical discrete event simulation

    This article addresses the problem of formulating efficient and reliable operational preservation policies that ensure bit-level information integrity over long periods and in the presence of a diverse range of real-world technical, legal, organizational, and economic threats. We develop a systematic, quantitative prediction framework that combines formal modeling, discrete-event simulation, and hierarchical modeling, and then use empirically calibrated sensitivity analysis to identify effective strategies. The framework offers flexibility for modeling a wide range of preservation policies and threats. Since the framework is open source and easily deployed in a cloud computing environment, it can be used to produce analyses based on independent estimates of scenario-specific costs, reliability, and risks. Comment: Forthcoming, IDCC 202