150,482 research outputs found

    Is a Dataframe Just a Table?

    Get PDF
    Querying data is core to databases and data science. However, the two communities have seemingly different concepts and use cases. As a result, both designers and users of the query languages disagree on whether the core abstractions - dataframes (data science) and tables (databases) - and the operations are the same. To investigate the difference from a PL-HCI perspective, we identify the basic affordances provided by tables and dataframes and how programming experiences over tables and dataframes differ. We show that the data structures nudge programmers to query and store their data in different ways. We hope the case study could clarify confusions, dispel misinformation, increase cross-pollination between the two communities, and identify open PL-HCI questions

    The Development of Web-Based Interface to Census Interaction Data

    Get PDF
    This project involves the development of a Web interface to origin-destination statistics from the 1991 Census (in a form that will be compatible with planned 2001 outputs). It provides the user with a set of screen-based tools for setting the parameters governing each data extraction (data set, areas, variables) in the form of a query. Traffic light icons are used to signal what the user has set so far and what remains to be done. There are options to extract different types of flow data and to generate output in different formats. The system can now be used to access the interaction flow data contained in the 1991 Special Migration Statistics Sets 1 and 2 and Special Workplace Statistics Set C. WICID has been demonstrated at the Origin-Destination Statistics Roadshows organised by GRO Scotland and held during May/June 2000 and the Census Offices have expressed interest in using the software in the Census Access Project

    The NASA Astrophysics Data System: Architecture

    Full text link
    The powerful discovery capabilities available in the ADS bibliographic services are possible thanks to the design of a flexible search and retrieval system based on a relational database model. Bibliographic records are stored as a corpus of structured documents containing fielded data and metadata, while discipline-specific knowledge is segregated in a set of files independent of the bibliographic data itself. The creation and management of links to both internal and external resources associated with each bibliography in the database is made possible by representing them as a set of document properties and their attributes. To improve global access to the ADS data holdings, a number of mirror sites have been created by cloning the database contents and software on a variety of hardware and software platforms. The procedures used to create and manage the database and its mirrors have been written as a set of scripts that can be run in either an interactive or unsupervised fashion. The ADS can be accessed at http://adswww.harvard.eduComment: 25 pages, 8 figures, 3 table

    On the accuracy of language trees

    Get PDF
    Historical linguistics aims at inferring the most likely language phylogenetic tree starting from information concerning the evolutionary relatedness of languages. The available information are typically lists of homologous (lexical, phonological, syntactic) features or characters for many different languages. From this perspective the reconstruction of language trees is an example of inverse problems: starting from present, incomplete and often noisy, information, one aims at inferring the most likely past evolutionary history. A fundamental issue in inverse problems is the evaluation of the inference made. A standard way of dealing with this question is to generate data with artificial models in order to have full access to the evolutionary process one is going to infer. This procedure presents an intrinsic limitation: when dealing with real data sets, one typically does not know which model of evolution is the most suitable for them. A possible way out is to compare algorithmic inference with expert classifications. This is the point of view we take here by conducting a thorough survey of the accuracy of reconstruction methods as compared with the Ethnologue expert classifications. We focus in particular on state-of-the-art distance-based methods for phylogeny reconstruction using worldwide linguistic databases. In order to assess the accuracy of the inferred trees we introduce and characterize two generalizations of standard definitions of distances between trees. Based on these scores we quantify the relative performances of the distance-based algorithms considered. Further we quantify how the completeness and the coverage of the available databases affect the accuracy of the reconstruction. Finally we draw some conclusions about where the accuracy of the reconstructions in historical linguistics stands and about the leading directions to improve it.Comment: 36 pages, 14 figure

    A Symbolic Execution Algorithm for Constraint-Based Testing of Database Programs

    Full text link
    In so-called constraint-based testing, symbolic execution is a common technique used as a part of the process to generate test data for imperative programs. Databases are ubiquitous in software and testing of programs manipulating databases is thus essential to enhance the reliability of software. This work proposes and evaluates experimentally a symbolic ex- ecution algorithm for constraint-based testing of database programs. First, we describe SimpleDB, a formal language which offers a minimal and well-defined syntax and seman- tics, to model common interaction scenarios between pro- grams and databases. Secondly, we detail the proposed al- gorithm for symbolic execution of SimpleDB models. This algorithm considers a SimpleDB program as a sequence of operations over a set of relational variables, modeling both the database tables and the program variables. By inte- grating this relational model of the program with classical static symbolic execution, the algorithm can generate a set of path constraints for any finite path to test in the control- flow graph of the program. Solutions of these constraints are test inputs for the program, including an initial content for the database. When the program is executed with respect to these inputs, it is guaranteed to follow the path with re- spect to which the constraints were generated. Finally, the algorithm is evaluated experimentally using representative SimpleDB models.Comment: 12 pages - preliminary wor

    The NASA Astrophysics Data System: The Search Engine and its User Interface

    Get PDF
    The ADS Abstract and Article Services provide access to the astronomical literature through the World Wide Web (WWW). The forms based user interface provides access to sophisticated searching capabilities that allow our users to find references in the fields of Astronomy, Physics/Geophysics, and astronomical Instrumentation and Engineering. The returned information includes links to other on-line information sources, creating an extensive astronomical digital library. Other interfaces to the ADS databases provide direct access to the ADS data to allow developers of other data systems to integrate our data into their system. The search engine is a custom-built software system that is specifically tailored to search astronomical references. It includes an extensive synonym list that contains discipline specific knowledge about search term equivalences. Search request logs show the usage pattern of the various search system capabilities. Access logs show the world-wide distribution of ADS users. The ADS can be accessed at http://adswww.harvard.eduComment: 23 pages, 18 figures, 11 table

    A Query Integrator and Manager for the Query Web

    Get PDF
    We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions
    • …
    corecore