4,738 research outputs found

    Broadening the Scope of Nanopublications

    Full text link
    In this paper, we present an approach for extending the existing concept of nanopublications --- tiny entities of scientific results in RDF representation --- to broaden their application range. The proposed extension uses English sentences to represent informal and underspecified scientific claims. These sentences follow a syntactic and semantic scheme that we call AIDA (Atomic, Independent, Declarative, Absolute), which provides a uniform and succinct representation of scientific assertions. Such AIDA nanopublications are compatible with the existing nanopublication concept and enjoy most of its advantages such as information sharing, interlinking of scientific findings, and detailed attribution, while being more flexible and applicable to a much wider range of scientific results. We show that users are able to create AIDA sentences for given scientific results quickly and at high quality, and that it is feasible to automatically extract and interlink AIDA nanopublications from existing unstructured data sources. To demonstrate our approach, a web-based interface is introduced, which also exemplifies the use of nanopublications for non-scientific content, including meta-nanopublications that describe other nanopublications.Comment: To appear in the Proceedings of the 10th Extended Semantic Web Conference (ESWC 2013

    Dynamic Analysis can be Improved with Automatic Test Suite Refactoring

    Full text link
    Context: Developers design test suites to automatically verify that software meets its expected behaviors. Many dynamic analysis techniques are performed on the exploitation of execution traces from test cases. However, in practice, there is only one trace that results from the execution of one manually-written test case. Objective: In this paper, we propose a new technique of test suite refactoring, called B-Refactoring. The idea behind B-Refactoring is to split a test case into small test fragments, which cover a simpler part of the control flow to provide better support for dynamic analysis. Method: For a given dynamic analysis technique, our test suite refactoring approach monitors the execution of test cases and identifies small test cases without loss of the test ability. We apply B-Refactoring to assist two existing analysis tasks: automatic repair of if-statements bugs and automatic analysis of exception contracts. Results: Experimental results show that test suite refactoring can effectively simplify the execution traces of the test suite. Three real-world bugs that could previously not be fixed with the original test suite are fixed after applying B-Refactoring; meanwhile, exception contracts are better verified via applying B-Refactoring to original test suites. Conclusions: We conclude that applying B-Refactoring can effectively improve the purity of test cases. Existing dynamic analysis tasks can be enhanced by test suite refactoring

    Inferring Concise Specifications of APIs

    Get PDF
    Modern software relies on libraries and uses them via application programming interfaces (APIs). Correct API usage as well as many software engineering tasks are enabled when APIs have formal specifications. In this work, we analyze the implementation of each method in an API to infer a formal postcondition. Conventional wisdom is that, if one has preconditions, then one can use the strongest postcondition predicate transformer (SP) to infer postconditions. However, SP yields postconditions that are exponentially large, which makes them difficult to use, either by humans or by tools. Our key idea is an algorithm that converts such exponentially large specifications into a form that is more concise and thus more usable. This is done by leveraging the structure of the specifications that result from the use of SP. We applied our technique to infer postconditions for over 2,300 methods in seven popular Java libraries. Our technique was able to infer specifications for 75.7% of these methods, each of which was verified using an Extended Static Checker. We also found that 84.6% of resulting specifications were less than 1/4 page (20 lines) in length. Our technique was able to reduce the length of SMT proofs needed for verifying implementations by 76.7% and reduced prover execution time by 26.7%

    Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus

    Get PDF
    The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning

    Approach for testing the extract-transform-load process in data warehouse systems, An

    Get PDF
    2018 Spring.Includes bibliographical references.Enterprises use data warehouses to accumulate data from multiple sources for data analysis and research. Since organizational decisions are often made based on the data stored in a data warehouse, all its components must be rigorously tested. In this thesis, we first present a comprehensive survey of data warehouse testing approaches, and then develop and evaluate an automated testing approach for validating the Extract-Transform-Load (ETL) process, which is a common activity in data warehousing. In the survey we present a classification framework that categorizes the testing and evaluation activities applied to the different components of data warehouses. These approaches include both dynamic analysis as well as static evaluation and manual inspections. The classification framework uses information related to what is tested in terms of the data warehouse component that is validated, and how it is tested in terms of various types of testing and evaluation approaches. We discuss the specific challenges and open problems for each component and propose research directions. The ETL process involves extracting data from source databases, transforming it into a form suitable for research and analysis, and loading it into a data warehouse. ETL processes can use complex one-to-one, many-to-one, and many-to-many transformations involving sources and targets that use different schemas, databases, and technologies. Since faulty implementations in any of the ETL steps can result in incorrect information in the target data warehouse, ETL processes must be thoroughly validated. In this thesis, we propose automated balancing tests that check for discrepancies between the data in the source databases and that in the target warehouse. Balancing tests ensure that the data obtained from the source databases is not lost or incorrectly modified by the ETL process. First, we categorize and define a set of properties to be checked in balancing tests. We identify various types of discrepancies that may exist between the source and the target data, and formalize three categories of properties, namely, completeness, consistency, and syntactic validity that must be checked during testing. Next, we automatically identify source-to-target mappings from ETL transformation rules provided in the specifications. We identify one-to-one, many-to-one, and many-to-many mappings for tables, records, and attributes involved in the ETL transformations. We automatically generate test assertions to verify the properties for balancing tests. We use the source-to-target mappings to automatically generate assertions corresponding to each property. The assertions compare the data in the target data warehouse with the corresponding data in the sources to verify the properties. We evaluate our approach on a health data warehouse that uses data sources with different data models running on different platforms. We demonstrate that our approach can find previously undetected real faults in the ETL implementation. We also provide an automatic mutation testing approach to evaluate the fault finding ability of our balancing tests. Using mutation analysis, we demonstrated that our auto-generated assertions can detect faults in the data inside the target data warehouse when faulty ETL scripts execute on mock source data

    A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web

    Full text link
    Over the past decade, rapid advances in web technologies, coupled with innovative models of spatial data collection and consumption, have generated a robust growth in geo-referenced information, resulting in spatial information overload. Increasing 'geographic intelligence' in traditional text-based information retrieval has become a prominent approach to respond to this issue and to fulfill users' spatial information needs. Numerous efforts in the Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the Linking Open Data initiative have converged in a constellation of open knowledge bases, freely available online. In this article, we survey these open knowledge bases, focusing on their geospatial dimension. Particular attention is devoted to the crucial issue of the quality of geo-knowledge bases, as well as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic Network, is outlined as our contribution to this area. Research directions in information integration and Geographic Information Retrieval (GIR) are then reviewed, with a critical discussion of their current limitations and future prospects
    • …
    corecore