
    EntiTables: Smart Assistance for Entity-Focused Tables

    Tables are among the most powerful and practical tools for organizing and working with data. Our motivation is to equip spreadsheet programs with smart assistance capabilities. We concentrate on one particular family of tables, namely, tables with an entity focus. We introduce and focus on two specific tasks: populating rows with additional instances (entities) and populating columns with new headings. We develop generative probabilistic models for both tasks. To estimate the components of these models, we use a knowledge base as well as a large table corpus. Our experimental evaluation simulates the various stages of a user entering content into an actual table. A detailed analysis of the results shows that the models' components are complementary and that our methods outperform existing approaches from the literature. Comment: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '17), 201

    Reasoning with Data Flows and Policy Propagation Rules

    Data-oriented systems and applications are at the centre of current developments of the World Wide Web. In these scenarios, assessing which policies propagate from the licenses of data sources to the output of a given data-intensive system is an important problem. Both policies and data flows can be described with Semantic Web languages. Although it is possible to define Policy Propagation Rules (PPR) by associating policies with data flow steps, this activity results in a huge number of rules to be stored and managed. In a recent paper, we introduced strategies for reducing the size of a PPR knowledge base by using an ontology of the possible relations between data objects, the Datanode ontology, and applying the (A)AAAA methodology, a knowledge engineering approach that exploits Formal Concept Analysis (FCA). In this article, we investigate whether this reasoning is feasible and how it can be performed. For this purpose, we study the impact of compressing a rule base associated with an inference mechanism on the performance of the reasoning process. Moreover, we report on an extension of the (A)AAAA methodology that includes a coherency check algorithm, which makes this reasoning possible. We show that this compression, in addition to being beneficial to the management of the knowledge base, also has a positive impact on the performance and resource requirements of the reasoning process for policy propagation.

    Linked open government data: lessons from Data.gov.uk

    The movement to publish government data is an opportunity to populate the linked data Web with data of good provenance. The benefits range from transparency and public service improvement to citizen engagement and the creation of social and economic value. Many challenges must be met before this vision is realized, and this paper describes the efforts of the EnAKTing project to extract value from data.gov.uk through the stages of locating data sources, integrating data into the linked data Web, and browsing and querying it.

    Yavaa: supporting data workflows from discovery to visualization

    Recent years have witnessed an increasing number of data silos being opened up, both within organizations and to the general public. Scientists publish their raw data as supplements to articles, or even as standalone artifacts, to enable others to verify and extend their work. Governments pass laws to open up formerly protected data treasures to improve accountability and transparency as well as to enable new business ideas based on this public good. Even companies share structured information about their products and services to advertise their use and thus increase revenue. Exploiting this wealth of information holds many challenges for users, though. Data is often provided as tables whose seemingly endless rows of daunting numbers are barely accessible. Information visualization (InfoVis) can help bridge this gap. However, the visualization options offered are generally very limited, and next to no support is given in applying any of them. The same holds true for data wrangling: very few options exist to adjust the data to the current needs, and barely any safeguards are in place to prevent even the most obvious mistakes. When it comes to data from multiple providers, the situation gets even bleaker. Only recently have tools emerged that make it reasonably easy to search for datasets across institutional borders. Easy-to-use ways to combine these datasets are still missing, though. Finally, results generally lack proper documentation of their provenance, so even the most compelling visualizations can be called into question when it remains unclear how they came about. The foundations for a lively exchange and exploitation of open data are set, but the barrier to entry remains relatively high, especially for non-expert users. This thesis aims to lower that barrier by providing tools and assistance that reduce the amount of prior experience and skill required. It covers the whole workflow, from identifying suitable datasets, through possible transformations, to exporting the result in the form of suitable visualizations.

    Approaches to ontology development by non ontology experts

    Users untrained in ontology development are challenged by the formal representation languages that underlie the most common ontology editing tools. To lower that barrier, much effort has gone into the creation of Controlled Languages (CL) that can be translated into ontology structures. However, CLs fall short of addressing a more profound problem: selecting the most appropriate ontology modelling component for a given modelling problem, regardless of the underlying representation paradigm. To address the difficulties that non-ontology experts face in selecting the most appropriate modelling solution, we propose a Natural Language (NL) guided approach based on a repository of Lexico-Syntactic Patterns associated with consensual modelling solutions, i.e., Ontology Design Patterns. By relying on this repository, untrained users can formulate in NL what they want to model in the ontology and obtain the corresponding design pattern for the modelling issue.

    Towards data grids for microarray expression profiles

    The UK DTI-funded Biomedical Research Informatics Delivered by Grid Enabled Services (BRIDGES) project developed a Grid infrastructure through which research into the genetic causes of hypertension could be supported by scientists within the large Wellcome Trust-funded Cardiovascular Functional Genomics project. The BRIDGES project focused on developing a compute Grid and a data Grid infrastructure with security at its heart. Building on the work within BRIDGES, the BBSRC-funded Grid enabled Microarray Expression Profile Search (GEMEPS) project plans to provide an enhanced data Grid infrastructure to support the richer queries needed for the discovery and analysis of microarray data sets, also based upon a fine-grained security infrastructure. This paper outlines the experiences gained within BRIDGES and describes the status of the GEMEPS project, the open challenges that remain, and plans for the future.

    User Review Analysis for Requirement Elicitation: Thesis and the framework prototype's source code

    Online reviews are an important channel for requirement elicitation. However, requirement engineers face challenges when analysing online user reviews, such as data volumes, lack of technical support, limitations of existing techniques, and legal barriers. Juan Wang proposes a framework that solves user review analysis problems for the purpose of requirement elicitation by setting up a channel from downloading user reviews to producing structured analysis data. The main contributions of her work are: (1) the thesis proposes a framework to solve the user review analysis problem for requirement elicitation; (2) the prototype of this framework proves its feasibility; (3) the experiments prove the effectiveness and efficiency of this framework. This resource contains the latest version of Juan Wang's PhD thesis "User Review Analysis for Requirement Elicitation" and all the source code of the framework prototype produced as part of her thesis.