
    Property Label Stability in Wikidata

    Stability in Wikidata's schema is essential for the reuse of its data. In this paper, we analyze the stability of the data based on changes to the labels of properties in six languages. We find that the schema is overall stable, making it a reliable resource for external usage.
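
    A minimal sketch of the sort of measurement involved, assuming a simplified revision log (records with pid, lang, old_label, and new_label fields are hypothetical stand-ins, not Wikidata's actual edit-history schema): the share of properties whose label in a language was ever rewritten is a direct instability indicator.

        def label_change_rate(revisions, language):
            """Fraction of properties whose label in `language` was ever rewritten.

            revisions: iterable of dicts with keys pid, lang, old_label, new_label.
            """
            seen, changed = set(), set()
            for rev in revisions:
                if rev["lang"] != language:
                    continue
                seen.add(rev["pid"])
                # Replacing an existing label counts as instability;
                # adding a first label does not.
                if rev["old_label"] is not None and rev["old_label"] != rev["new_label"]:
                    changed.add(rev["pid"])
            return len(changed) / len(seen) if seen else 0.0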

    Discovering Implicational Knowledge in Wikidata

    Knowledge graphs have recently become the state-of-the-art tool for representing the diverse and complex knowledge of the world. Examples include the proprietary knowledge graphs of companies such as Google, Facebook, IBM, or Microsoft, but also freely available ones such as YAGO, DBpedia, and Wikidata. A distinguishing feature of Wikidata is that the knowledge is collaboratively edited and curated. While this greatly enhances the scope of Wikidata, it also makes it impossible for a single individual to grasp complex connections between properties or understand the global impact of edits in the graph. We apply Formal Concept Analysis to efficiently identify comprehensible implications that are implicitly present in the data. Although the complex structure of data modelling in Wikidata is not amenable to a direct approach, we overcome this limitation by extracting contextual representations of parts of Wikidata in a systematic fashion. We demonstrate the practical feasibility of our approach through several experiments and show that the results may lead to the discovery of interesting implicational knowledge. Besides providing a method for obtaining large real-world data sets for FCA, we sketch potential applications in offering semantic assistance for editing and curating Wikidata.
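
    To make the FCA idea concrete, here is a minimal sketch (not the authors' code): a formal context maps each item to the set of properties it uses, and an implication premise -> conclusion holds when every item carrying all premise properties also carries all conclusion properties. The item and property ids below are illustrative.

        def implication_holds(context, premise, conclusion):
            """context: dict mapping item id -> set of property ids it uses."""
            premise, conclusion = set(premise), set(conclusion)
            for props in context.values():
                # A single item with the premise but not the conclusion refutes it.
                if premise <= props and not conclusion <= props:
                    return False
            return True

        context = {
            "Q1": {"P17", "P131"},
            "Q2": {"P17", "P131", "P625"},
            "Q3": {"P17"},
        }
        # Every item with P131 also has P17 in this toy context.
        print(implication_holds(context, {"P131"}, {"P17"}))  # True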

    What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization

    Knowledge graphs (KGs) store highly heterogeneous information about the world in the structure of a graph, and are useful for tasks such as question answering and reasoning. However, they often contain errors and are missing information. Vibrant research in KG refinement has worked to resolve these issues, tailoring techniques to either detect specific types of errors or complete a KG. In this work, we introduce a unified solution to KG characterization by formulating the problem as unsupervised KG summarization with a set of inductive, soft rules, which describe what is normal in a KG, and thus can be used to identify what is abnormal, whether it be strange or missing. Unlike first-order logic rules, our rules are labeled, rooted graphs, i.e., patterns that describe the expected neighborhood around a (seen or unseen) node, based on its type and the information in the KG. Stepping away from the traditional support/confidence-based rule mining techniques, we propose KGist, Knowledge Graph Inductive SummarizaTion, which learns a summary of inductive rules that best compress the KG according to the Minimum Description Length principle, a formulation that we are the first to use in the context of KG rule mining. We apply our rules to three large KGs (NELL, DBpedia, and Yago), and tasks such as compression, various types of error detection, and identification of incomplete information. We show that KGist outperforms task-specific, supervised and unsupervised baselines in error detection and incompleteness identification (identifying the location of up to 93% of missing entities, over 10% more than baselines), while also being efficient for large knowledge graphs.
    Comment: 10 pages, plus 2 pages of references. 5 figures. Accepted at The Web Conference 2020.
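
    A toy rendering of the MDL objective described above, under assumed encodings (a flat per-rule cost and a uniform per-edge penalty; KGist's actual encoding is more refined): the best rule set minimizes the bits needed for the rules plus the bits for the edges they leave unexplained.

        import math

        def description_length(rules, edges, rule_cost=8.0):
            """Total bits: encode the rules, then every edge no rule explains."""
            explained = set()
            for rule in rules:                 # here a rule is a predicate over edges
                explained |= {e for e in edges if rule(e)}
            unexplained = len(edges) - len(explained)
            return rule_cost * len(rules) + unexplained * math.log2(1 + len(edges))

        def greedy_summary(candidates, edges):
            """Greedily add whichever rule shrinks the description length most."""
            chosen = []
            best = description_length(chosen, edges)
            improved = True
            while improved:
                improved = False
                for rule in candidates:
                    if rule in chosen:
                        continue
                    cost = description_length(chosen + [rule], edges)
                    if cost < best:
                        chosen, best, improved = chosen + [rule], cost, True
            return chosen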

    Non-parametric class completeness estimators for collaborative knowledge graphs — the case of Wikidata

    Collaborative Knowledge Graph platforms allow humans and automated scripts to collaborate in creating, updating and interlinking entities and facts. To ensure both the completeness of the data as well as a uniform coverage of the different topics, it is crucial to identify underrepresented classes in the Knowledge Graph. In this paper, we tackle this problem by developing statistical techniques for class cardinality estimation in collaborative Knowledge Graph platforms. Our method is able to estimate the completeness of a class—as defined by a schema or ontology—hence can be used to answer questions such as “Does the knowledge base have a complete list of all {Beer Brands—Volcanos—Video Game Consoles}?” As a use-case, we focus on Wikidata, which poses unique challenges in terms of the size of its ontology, the number of users actively populating its graph, and its extremely dynamic nature. Our techniques are derived from species estimation and data-management methodologies, and are applied to the case of graphs and collaborative editing. In our empirical evaluation, we observe that i) the number and frequency of unique class instances drastically influence the performance of an estimator, ii) bursts of inserts cause some estimators to overestimate the true size of the class if they are not properly handled, and iii) one can effectively measure the convergence of a class towards its true size by considering the stability of an estimator against the number of available instances.
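
    One concrete member of the species-estimation family the paper builds on is the Chao1 estimator, which extrapolates the true class size from how often each instance has been observed (e.g. across edits); whether this exact variant is among those the paper evaluates is an assumption here.

        from collections import Counter

        def chao1(observations):
            """observations: list of observed instance ids, with repetitions."""
            counts = Counter(observations)
            s_obs = len(counts)                              # distinct instances seen
            f1 = sum(1 for c in counts.values() if c == 1)   # seen exactly once
            f2 = sum(1 for c in counts.values() if c == 2)   # seen exactly twice
            if f2 == 0:
                return s_obs + f1 * (f1 - 1) / 2.0           # bias-corrected variant
            return s_obs + f1 * f1 / (2.0 * f2)

        # 4 distinct instances, 2 singletons, 2 doubletons: 4 + 2*2/(2*2) = 5.0
        print(chao1(["a", "a", "b", "c", "c", "d"]))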

    Height growth curves from satellite images and national forest inventory measurements for various tree species

    Data corresponding to height growth curves from satellite images and national forest inventory measurements for various tree species. The heights are in meters and the ages in years.
    Growth_curve_NFI corresponds to French National Forest Inventory data (IGN - National Forest Inventory of France Raw Data, Annual Campaigns from 2005 onwards, IGN – Inventaire forestier national français, accessed 1 June 2023).
    Growth_curve_EO gives the 5th, 25th, 50th, 75th, and 95th percentiles of height per year. The tree height, age, and tree species information were retrieved from satellite observations.
    Schwartz M, Ciais P, De Truchis A, et al (2023) FORMS: Forest Multiple Source height, wood volume, and biomass maps in France at 10 to 30 m resolution based on Sentinel-1, Sentinel-2, and GEDI data with a deep learning approach. Earth Syst Sci Data Discuss 1–28. https://doi.org/10.5194/essd-2023-196
    Senf C, Seidl R (2021) Mapping the forest disturbance regimes of Europe. Nat Sustain 4:63–70. https://doi.org/10.1038/s41893-020-00609-y
    IGN (2018b) BD Foret® version 2, accessed 5 May 2023
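
    A hypothetical sketch of how the per-year percentiles in Growth_curve_EO could be recomputed from raw (age, height) observations; the function and its inputs are illustrative, not the dataset's actual schema.

        import numpy as np

        def height_percentiles(ages, heights, quantiles=(5, 25, 50, 75, 95)):
            """ages in years, heights in meters; returns {age: percentile tuple}."""
            ages, heights = np.asarray(ages), np.asarray(heights)
            return {
                int(a): tuple(np.percentile(heights[ages == a], quantiles))
                for a in np.unique(ages)
            }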

    Kinetic discrimination of reversibly photoswitchable fluorescent markers

    Multiplexing, i.e. simultaneously imaging tens of chemical species in a cell, is a challenge for quantitative biology. To this aim, two imaging protocols, LIGHTNING and HIGHLIGHT, exploiting the kinetics of rich photocycles including photochemical and thermal steps, are introduced and applied to reversibly photoswitchable fluorescent proteins (RSFPs). In LIGHTNING, four characteristic times defining the kinetic signature of an RSFP are extracted from the fluorescence evolution in four constant illumination regimes granting access to four independent dynamics. Reduced chemical mechanisms derived for the four illumination regimes qualitatively account for the fluorescence evolution. Two RSFPs can be distinguished provided that the distance between their 4-D kinetic signatures is larger than a cutoff distance related to experimental accuracy. 20 of the 22 investigated RSFPs are discriminated. In HIGHLIGHT, reversibly photoactivatable fluorophores are submitted to sine-wave illumination. At each harmonic, either the in-phase or the quadrature Fourier amplitude of the fluorescence oscillations exhibits a resonance in the space of control parameters formed by the excitation frequency and the mean light intensities. Resonance conditions relating the control parameters and the parameters characterizing the kinetics are made explicit. Tuning the control parameters to target a given fluorophore optimizes its fluorescence response and nearly eliminates the responses of fluorophores with different kinetic properties. Both protocols have complementary merits: LIGHTNING has a larger discriminatory power, and HIGHLIGHT provides better-quality images due to lock-in amplification.
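
    The LIGHTNING discrimination criterion, as described, reduces to a distance test in the 4-D space of characteristic times. A minimal sketch with illustrative values follows; in practice the cutoff is set by experimental accuracy.

        import math

        def distinguishable(sig_a, sig_b, cutoff):
            """sig_a, sig_b: 4-tuples of characteristic times; cutoff from accuracy."""
            return math.dist(sig_a, sig_b) > cutoff

        # Illustrative signatures 0.5 apart, with a cutoff of 0.2.
        print(distinguishable((1.0, 0.2, 3.0, 0.8), (1.3, 0.2, 3.4, 0.8), 0.2))  # True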

    Knowledge base maintenance using constraints

    Knowledge bases are huge collections of primarily encyclopedic facts. They are widely used in entity recognition, structured search, question answering, and other tasks. These knowledge bases have to be curated, and this is a crucial but costly task. In this thesis, we are concerned with curating knowledge bases automatically using constraints. Our first contribution aims at discovering constraints automatically. We improve standard rule mining approaches by using (in-)completeness meta-information. We show that this information can increase the quality of the learned rules significantly. Our second contribution is the creation of a knowledge base, YAGO 4, where we statically enforce a set of constraints by removing the facts that do not comply with them. Our last contribution is a method to correct constraint violations automatically. Our method uses the edit history of the knowledge base to see how users corrected violations in the past, in order to propose corrections for the present.
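
    A sketch of the static-enforcement step described for YAGO 4, not the project's actual code: constraints are modelled here as predicates over (subject, predicate, object) triples, and facts violating any of them are dropped. The example constraint and facts are illustrative.

        def enforce(facts, constraints):
            """Keep only the facts that satisfy every constraint."""
            return [f for f in facts if all(c(f) for c in constraints)]

        # Example: a domain constraint saying P569 (date of birth) applies
        # only to entities known to be human.
        humans = {"Q42"}
        dob_domain = lambda f: f[1] != "P569" or f[0] in humans

        facts = [("Q42", "P569", "1952-03-11"), ("Q1", "P569", "nonsense")]
        print(enforce(facts, [dob_domain]))  # only the first fact survives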

    The Labour Theory of Value and Social Justice. The Teachings of Social Catholic Criticisms of Bastiat's Doctrine

    Social Catholic criticisms of Frédéric Bastiat's thinking, notably Charles Périn's, clarify the link between the labour theory of value and the demands for social justice. Claiming that Bastiat's theory of value rests on a sophism, Périn rejects his view that competition is the solution to the social question. Contrary to Bastiat, indeed, he accepts the labour theory of value and apparently makes it a standard of justice: according to him, rents sanction an injustice. Social Catholics, particularly René de La Tour du Pin, follow in his tracks, insisting that a specific course must be taken so as to set right the injustice of rents, the course of what will later be called social justice.