
    Statistical Claim Checking: StatCheck in Action


    Exploration orientée utilisateur des lacs de données hautement hétérogènes (User-oriented exploration of highly heterogeneous data lakes)

    The proliferation of digital data sources and formats has led to the emergence of data lakes: systems where numerous data sources coexist, with less (or no) control and coordination among the sources than was previously practised in enterprise databases and data warehouses. While most data lakes are designed for very large numbers of tables, ConnectionLens [2,3] is a data lake system for structured, semi-structured, and unstructured data, which it integrates into a single graph; the graph can be explored via graph queries with keyword search [4] and entity path enumeration [5]. In this paper, we describe ConnectionStudio, a user-friendly platform leveraging ConnectionLens and integrating feedback from non-expert users, in particular journalists. Our main insights are: (i) improve and entice exploration by giving a first global view; (ii) facilitate tabular exports from the integrated graph; (iii) provide interactive means to improve the graph construction. The insights can be used to further advance the exploration and usage of data lakes for non-IT users.
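The idea of integrating heterogeneous sources into a single graph and exploring it by keyword can be illustrated with a minimal sketch. This is not the ConnectionLens implementation: the `Graph` class, its methods, and the sample records are all hypothetical, and the path search is a plain breadth-first search standing in for the system's own keyword search and path enumeration.

```python
from collections import defaultdict, deque


class Graph:
    """Toy integrated graph: one node per distinct label, undirected edges."""

    def __init__(self):
        self.labels = {}                # node id -> text label
        self.index = {}                 # text label -> node id (deduplication)
        self.edges = defaultdict(set)   # node id -> neighbouring node ids

    def add_node(self, label):
        # Reuse the node if the same label was already seen in another source.
        if label not in self.index:
            self.index[label] = len(self.labels)
            self.labels[self.index[label]] = label
        return self.index[label]

    def add_edge(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def add_record(self, name, fields):
        # Link a record node to one node per field value (field names are dropped).
        record = self.add_node(name)
        for value in fields.values():
            self.add_edge(record, self.add_node(str(value)))
        return record

    def find(self, keyword):
        # Case-insensitive substring match over node labels.
        kw = keyword.lower()
        return [n for n, label in self.labels.items() if kw in label.lower()]

    def shortest_path(self, start, goal):
        # Breadth-first search; returns a list of node ids, or None if unreachable.
        previous = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path = []
                while node is not None:
                    path.append(node)
                    node = previous[node]
                return path[::-1]
            for neighbour in self.edges[node]:
                if neighbour not in previous:
                    previous[neighbour] = node
                    queue.append(neighbour)
        return None


# Hypothetical records from two differently structured sources.
g = Graph()
g.add_record("table.csv row 1", {"name": "Alice Martin", "company": "Acme"})
g.add_record("doc.json", {"author": "Alice Martin", "topic": "budget"})

start = g.find("acme")[0]
goal = g.find("budget")[0]
path_labels = [g.labels[n] for n in g.shortest_path(start, goal)]
```

In this sketch the two sources become connected only because they share the value "Alice Martin", which is the kind of cross-source link a keyword query can then traverse.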


    Fact-checking Multidimensional Statistic Claims in French

    To strengthen public trust and counter disinformation, computational fact-checking, which leverages digital data sources, attracts interest from journalists and the computer science community. A particularly interesting class of data sources comprises statistics, that is, numerical data compiled mostly by governments, administrations, and international organizations. Statistics are often multidimensional datasets, where multiple dimensions characterize one value, and the dimensions may be organized in hierarchies. This paper describes STATCHECK, a statistic fact-checking system jointly developed by the authors, who are either computer science researchers or fact-checking journalists working for a French-language media outlet with a daily audience of more than 15 million (aud, 2022). The technical novelty of STATCHECK is twofold: (i) we focus on multidimensional, complex-structure statistics, which have received little attention so far, despite their practical importance; and (ii) we introduce novel statistical claim extraction modules for French, an area where few resources exist. We validate the efficiency and quality of our system on large statistic datasets (hundreds of millions of facts), including the complete INSEE (French) and Eurostat (European Union) datasets, as well as French presidential election debates.
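To make the notion of a multidimensional statistic concrete, here is a minimal sketch in which several dimension values identify one number and one dimension is organized in a hierarchy. It is not the STATCHECK pipeline: the names `FACTS`, `PARENT`, `Claim`, `ancestors`, and `check`, the tolerance, and all values are hypothetical placeholders, not figures from INSEE or Eurostat.

```python
import math
from dataclasses import dataclass

# Hypothetical facts: (indicator, area, year) -> value. Placeholder numbers only.
FACTS = {
    ("indicator_x", "Region A", "2020"): 10.0,
    ("indicator_x", "Region B", "2020"): 5.0,
}

# Hypothetical hierarchy on the area dimension: child -> parent.
PARENT = {"Region A": "Country", "Region B": "Country", "Country": "World"}


@dataclass
class Claim:
    indicator: str
    area: str
    year: str
    value: float


def ancestors(area):
    """Return the parents of `area` in the hierarchy, nearest first."""
    chain = []
    while area in PARENT:
        area = PARENT[area]
        chain.append(area)
    return chain


def check(claim, tolerance=0.05):
    """Compare a claimed value with the stored fact.

    Uses math.isclose with a relative tolerance, so the allowed gap is
    `tolerance` times the larger of the two magnitudes.
    Returns "supported", "contradicted", or "no data".
    """
    reference = FACTS.get((claim.indicator, claim.area, claim.year))
    if reference is None:
        return "no data"
    if math.isclose(claim.value, reference, rel_tol=tolerance):
        return "supported"
    return "contradicted"
```

A claim extracted from text would first have to be mapped to such a dimension tuple, which is the step the abstract attributes to the claim extraction modules; the sketch assumes that mapping has already been done.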