Geographic information extraction from texts
A large volume of unstructured text containing valuable geographic information is available online. This information, provided implicitly or explicitly, is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although substantial progress has been made in geographic information extraction from texts, unsolved challenges and issues remain, ranging from methods, systems, and data to applications and privacy. This workshop therefore provides a timely opportunity to discuss recent advances, new ideas, and concepts, and to identify research gaps in geographic information extraction.
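As a concrete illustration of the task, here is a minimal geoparsing sketch in Python; spaCy and its pretrained en_core_web_sm model are assumptions chosen for illustration, standing in for any NER pipeline, and the label set (GPE, LOC, FAC) is spaCy's convention.

```python
# Minimal toponym extraction with off-the-shelf NER (a sketch, not a
# full geoparser: no disambiguation or coordinate resolution).
import spacy

nlp = spacy.load("en_core_web_sm")  # pretrained English pipeline (assumed installed)

def extract_toponyms(text: str) -> list[str]:
    """Return entity spans tagged as places: GPE (countries, cities),
    LOC (mountains, rivers), FAC (buildings, landmarks)."""
    doc = nlp(text)
    return [ent.text for ent in doc.ents if ent.label_ in {"GPE", "LOC", "FAC"}]

print(extract_toponyms("The workshop was held in Vienna, near the Danube."))
# e.g. ['Vienna', 'Danube']  (exact output depends on the model version)
```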
Enriching open-world knowledge graphs with expressive negative statements
Machine knowledge about entities and their relationships has been a long-standing goal for AI researchers. Over the last 15 years, thousands of public knowledge graphs have been automatically constructed from various web sources. They are crucial for use cases such as search engines. Yet existing web-scale knowledge graphs focus on collecting positive statements and store very few or no negatives. Due to their incompleteness, the truth of absent information remains unknown, which compromises the usability of the knowledge graph. In this dissertation: First, I make the case for selective materialization of salient negative statements in open-world knowledge graphs. Second, I present our methods to automatically infer them from encyclopedic and commonsense knowledge graphs, by locally inferring closed-world topics from reference comparable entities. I then discuss our evaluation findings on metrics such as correctness and salience. Finally, I conclude with open challenges and future opportunities.
Knowledge graphs about entities and their attributes are an important component of many AI applications. Web-scale knowledge graphs store almost exclusively positive statements and overlook negative ones. Because open-world knowledge graphs are incomplete, missing statements are treated as unknown rather than false. This dissertation argues for enriching knowledge graphs with informative statements that do not hold, thereby improving their value for applications such as question answering and entity summarization. With potentially billions of candidate negative statements, we tackle four main challenges:
1. Correctness (or plausibility) of negative statements: Under the open-world assumption (OWA), it is not enough to check that a negative candidate is not explicitly stated as positive in the knowledge graph, since it may simply be a missing statement. Methods for checking large sets of candidates and eliminating false positives are essential.
2. Salience of negative statements: The set of correct negative statements is very large but full of trivial or nonsensical ones, e.g., "A cat cannot store data." Methods for quantifying the informativeness of negatives are needed.
3. Topic coverage: Depending on the data source and the candidate retrieval methods, some topics or entities in the knowledge graph may receive no negative candidates. Methods must be able to discover negatives for almost any existing entity.
4. Complex negative statements: In some cases, expressing a negation requires more than one knowledge-graph triple. For example, "Einstein received no education" is an incorrect negation, but "Einstein received no education at a US university" is correct. Methods for generating complex negations are needed.
This dissertation addresses these challenges as follows:
1. We first argue for the selective materialization of negative statements about entities in encyclopedic (well-canonicalized) open-world knowledge graphs, and formally define three kinds of negative statements: grounded, universally absent, and conditional negative statements. We introduce the peer-based negation inference method to produce lists of salient negations about entities. The method computes relevant peers for a given input entity and uses their positive properties to set expectations for the input entity. An expectation that is not met is an immediate negative candidate and is then scored with frequency, importance, and unexpectedness metrics (see the sketch after this abstract).
2. We propose the pattern-based query-log extraction method to extract salient negations from large text sources. This method extracts salient negations about an entity by mining large corpora, e.g., search-engine query logs, using a few handcrafted patterns with negative keywords.
3. We introduce the UnCommonsense method to generate salient negative phrases about everyday concepts in less-canonicalized commonsense knowledge graphs. This method is designed for negation inference, checking, and ranking of short natural-language phrases. It computes comparable concepts for a given target concept, infers negations by comparing their positive candidates, and checks these candidates against the knowledge graph itself as well as with language models (LMs) as an external knowledge source. Finally, the candidates are ranked using semantic-similarity and frequency measures.
4. To ease exploration of our methods and their results, we implement two prototype systems. Wikinegata presents the peer-based method: users can explore negative statements about 500K entities from 11 classes and adjust the parameters of the peer-based inference method; they can also query the knowledge graph through a search form with negated predicates. In the UnCommonsense system, users can inspect what the method produces at each step and browse negations about 8K everyday concepts. In addition, using the peer-based negation inference method, we create the first large-scale dataset on demographics and outliers in communities of interest and show its usefulness in use cases such as identifying underrepresented groups.
5. We release all datasets and source code created in these projects at https://www.mpi-inf.mpg.de/negation-in-kbs and https://www.mpi-inf.mpg.de/Uncommonsense
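To make the peer-based negation inference concrete, here is a toy Python sketch under simplifying assumptions: the knowledge graph is an in-memory dictionary of statement sets, the peers are given rather than computed, and only the frequency metric is applied (importance and unexpectedness are omitted); the entities and statements are illustrative, not drawn from the actual datasets.

```python
# Peer-based negation inference, reduced to its core loop: the peers'
# positive statements set expectations; expectations the target entity
# does not meet become candidate negations, ranked by peer frequency.
from collections import Counter

kg = {  # entity -> set of (predicate, object) positive statements (toy data)
    "StephenHawking": {("occupation", "physicist"), ("award", "Copley Medal")},
    "AlbertEinstein": {("occupation", "physicist"), ("award", "Nobel Prize in Physics")},
    "RichardFeynman": {("occupation", "physicist"), ("award", "Nobel Prize in Physics")},
}

def infer_negations(target: str, peers: list[str]) -> list[tuple[float, tuple[str, str]]]:
    """Score unmet expectations by the fraction of peers asserting them."""
    expectations = Counter()
    for peer in peers:
        expectations.update(kg[peer])
    candidates = [stmt for stmt in expectations if stmt not in kg[target]]
    return sorted(((expectations[s] / len(peers), s) for s in candidates), reverse=True)

for score, (pred, obj) in infer_negations("StephenHawking", ["AlbertEinstein", "RichardFeynman"]):
    print(f"{score:.2f}  StephenHawking: NOT {pred} {obj}")
# 1.00  StephenHawking: NOT award Nobel Prize in Physics
```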
An environment for the analysis and quality assessment of big and linked data
Linking and publishing data in the Linked Open Data format increases the interoperability and discoverability of resources over the Web. To accomplish this, the process comprises several design decisions based on the Linked Data principles, which on the one hand recommend using standards for the representation of and access to data on the Web, and on the other hand setting hyperlinks between data from different sources.
Despite the efforts of the World Wide Web Consortium (W3C), the main international standards organization for the World Wide Web, there is no single tailored formula for publishing data as Linked Data. In addition, the quality of the published Linked Open Data (LOD) is a fundamental issue that has yet to be thoroughly managed and considered.
The main objective of this doctoral thesis is to design and implement a novel framework for selecting, analyzing, converting, interlinking, and publishing data from diverse sources, while paying close attention to quality assessment throughout all steps and modules of the framework. The goal is to examine whether, and to what extent, Semantic Web technologies are applicable for merging data from different sources and enabling end users to obtain additional information that was not available in the individual datasets, in addition to integration into the Semantic Web community space. The thesis also aims to validate the applicability of the process in a specific and demanding use case, namely creating and publishing an Arabic Linked Drug Dataset based on open drug datasets from selected Arab countries, and to discuss the quality issues observed across the linked data life cycle. To that end, a Semantic Data Lake was established in the pharmaceutical domain that allows further integration and the development of different business services on top of the integrated data sources. Through data representation in an open, machine-readable format, the approach offers an optimal solution for information and data dissemination, for building domain-specific applications, and for enriching and gaining value from the original dataset. This thesis showcases how the pharmaceutical domain benefits from evolving research trends for building competitive advantages. However, as elaborated in this thesis, a better understanding of the specifics of the Arabic language is required to extend the use of linked data technologies in the targeted Arabic organizations.
Linking and publishing data in the Linked Open Data format increases the interoperability and discoverability of resources over the Web. The process is based on the Linked Data principles (W3C, 2006), which on the one hand elaborate standards for representing and accessing data on the Web (RDF, OWL, SPARQL) and on the other hand suggest setting hyperlinks between data from different sources.
Despite the efforts of the W3C (the main international standards organization for the Web), there is no single formula for implementing the process of publishing data in the Linked Data format. Considering that the quality of published linked open data is decisive for the future development of the Web, the main goals of this doctoral dissertation are (1) the design and implementation of an innovative framework for the selection, analysis, conversion, interlinking, and publishing of data from different sources, and (2) an analysis of the application of this approach in the pharmaceutical domain.
The dissertation investigates in detail the quality of big and linked data ecosystems (Linked Data Ecosystems), taking into account the reusability of open data. The work is motivated by the need to enable researchers from Arab countries to use Semantic Web technologies to link their data with open data such as DBpedia. The goal is to examine whether open data from Arab countries enable end users to obtain additional information that is not available in the individual datasets, in addition to integration into the Semantic Web space.
The dissertation proposes a methodology for developing Linked Data applications and implements a software solution that enables querying of a consolidated dataset on drugs from selected Arab countries. The consolidated dataset is implemented in the form of a Semantic Data Lake.
This thesis shows how the pharmaceutical industry benefits from the application of innovative technologies and research trends from the field of semantic technologies. However, as elaborated in the thesis, a better understanding of the specifics of the Arabic language is required to implement Linked Data tools and apply them to data from Arab countries.
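To illustrate the interlinking step at the heart of this process, here is a minimal Python sketch using rdflib; the http://example.org/drug/ namespace, the triples, and the labels are invented placeholders, not the schema of the actual Arabic Linked Drug Dataset.

```python
# Represent a drug record as RDF and link it to the corresponding
# DBpedia resource with owl:sameAs, the hyperlink between datasets
# that the Linked Data principles call for.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

DRUG = Namespace("http://example.org/drug/")      # hypothetical dataset namespace
DBR = Namespace("http://dbpedia.org/resource/")

g = Graph()
g.bind("owl", OWL)

paracetamol = DRUG["Paracetamol500"]
g.add((paracetamol, RDF.type, DRUG.Drug))
g.add((paracetamol, RDFS.label, Literal("Paracetamol 500 mg", lang="en")))
g.add((paracetamol, DRUG.activeIngredient, Literal("paracetamol")))
g.add((paracetamol, OWL.sameAs, DBR["Paracetamol"]))  # link to the open dataset

# consumers can then query the merged graph with SPARQL
query = """
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    SELECT ?drug ?same WHERE { ?drug owl:sameAs ?same . }
"""
for row in g.query(query):
    print(row.drug, "->", row.same)

print(g.serialize(format="turtle"))  # publish in an open machine-readable format
```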
Internet of Things Applications - From Research and Innovation to Market Deployment
The book aims to provide a broad overview of various Internet of Things topics, from research, innovation, and development priorities to enabling technologies, nanoelectronics, cyber-physical systems, architecture, interoperability, and industrial applications. It is intended to be a standalone book in a series that covers the Internet of Things activities of the IERC (Internet of Things European Research Cluster), from technology to international cooperation and the global "state of play". The book builds on the ideas put forward by the European Research Cluster on the Internet of Things Strategic Research Agenda and presents global views and state-of-the-art results on the challenges facing the research, development, and deployment of IoT at the global level.
The Internet of Things is creating a revolutionary new paradigm, with opportunities in every industry, from Health Care, Pharmaceuticals, Food and Beverage, Agriculture, Computers, Electronics, Telecommunications, Automotive, Aeronautics, Transportation, and Energy to Retail, to apply the massive potential of the IoT to achieving real-world solutions. The beneficiaries will also include semiconductor companies, device and product companies, infrastructure software companies, application software companies, consulting companies, and telecommunication and cloud service providers. IoT will create new annual revenues for these stakeholders and potentially cause substantial market-share shakeups due to increased technology competition. The IoT will fuel technology innovation by creating the means for machines to communicate many different types of information with one another, while contributing to the increased value of information created by the number of interconnections among things and the transformation of the processed information into knowledge shared within the Internet of Everything.
The success of IoT depends strongly on enabling technology development, market acceptance, and standardization, which provides interoperability, compatibility, reliability, and effective operations on a global scale. The connected devices are part of ecosystems connecting people, processes, data, and things that communicate in the cloud, using increased storage and computing power and pushing for the standardization of communication and metadata. In this context, security, privacy, safety, and trust have to be addressed by product manufacturers throughout the life cycle of their products, from design to the support processes. The IoT developments address the whole IoT spectrum, from devices at the edge to cloud and datacentres on the backend and everything in between, through ecosystems created by industry, research, and application stakeholders that enable real-world use cases, accelerate the Internet of Things, and establish open interoperability standards and common architectures for IoT solutions. Enabling technologies such as nanoelectronics, sensors/actuators, cyber-physical systems, intelligent device management, smart gateways, telematics, smart network infrastructure, cloud computing, and software technologies will create new products, new services, and new interfaces by creating smart environments and smart spaces, with applications ranging from smart cities, transport, buildings, energy, and grid to smart health and life.
Technical topics discussed in the book include:
• Introduction
• Internet of Things Strategic Research and Innovation Agenda
• Internet of Things in the industrial context: time for deployment
• Integration of heterogeneous smart objects, applications, and services
• Evolution from device to semantic and business interoperability
• Software-defined networking and virtualization of network resources
• Innovation through interoperability and standardisation when everything is connected anytime at anyplace
• Dynamic, context-aware, scalable, and trust-based IoT security and privacy framework
• Federated cloud service management and the Internet of Things
• Internet of Things applications
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020
On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC), which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.
RDF graph validation using rule-based reasoning
The correct functioning of Semantic Web applications requires that given RDF graphs adhere to an expected shape. This shape depends on the RDF graph and on the application's supported entailments over that graph. During validation, RDF graphs are assessed against sets of constraints, and the violations found help refine the RDF graphs. However, existing validation approaches cannot always explain the root causes of violations (inhibiting refinement) and cannot fully match the entailments supported during validation with those supported by the application. As a result, these approaches either cannot accurately validate RDF graphs or must combine multiple systems, deteriorating the validator's performance. In this paper, we present an alternative validation approach using rule-based reasoning, capable of fully customizing the inferencing steps used. We compare it to existing approaches, and present a formal grounding and a practical implementation, "Validatrr", based on N3Logic and the EYE reasoner. Our approach, which supports a number of constraint types equivalent to the state of the art, better explains the root cause of violations thanks to the reasoner's generated logical proof, and returns an accurate number of violations thanks to the customizable set of inferencing rules. A performance evaluation shows that Validatrr is performant for smaller datasets and scales linearly with the RDF graph size. The detailed root-cause explanations can guide future validation report description specifications, and the fine-grained level of configuration can be employed to support different constraint languages. This foundation allows further research into handling recursion, validating RDF graphs based on their generation description, and providing automatic refinement suggestions.
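To convey the flavor of this approach, here is a toy Python sketch, not the actual N3Logic/EYE-based Validatrr implementation: a customizable inference rule (RDFS subclass entailment) is run to a fixpoint first, and the constraint is itself a rule that derives violation facts carrying the triples that triggered them.

```python
# Rule-based validation in miniature: inference first, then a
# constraint rule that derives explainable violation facts.
Triple = tuple[str, str, str]

def subclass_closure(facts: set[Triple]) -> set[Triple]:
    """One configurable inferencing step: RDFS subclass entailment,
    applied until no new facts are derived (a fixpoint)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        new = {(s, "a", c2)
               for (s, p, c1) in facts if p == "a"
               for (c1b, q, c2) in facts
               if q == "subClassOf" and c1b == c1}
        if not new <= facts:
            facts |= new
            changed = True
    return facts

def check_domain(facts: set[Triple], prop: str, cls: str) -> list[str]:
    """Constraint as a rule: every subject of `prop` must have type `cls`.
    Each violation reports the triggering triple (a mini 'proof')."""
    return [f"violation: {s} {prop} {o}, but ({s} a {cls}) not derivable"
            for (s, p, o) in facts
            if p == prop and (s, "a", cls) not in facts]

data = {("alice", "a", "Student"), ("Student", "subClassOf", "Person"),
        ("alice", "worksFor", "acme"), ("bob", "worksFor", "acme")}
inferred = subclass_closure(data)  # only the entailments the application supports
print(check_domain(inferred, "worksFor", "Person"))
# ['violation: bob worksFor acme, but (bob a Person) not derivable']
```

Because validation runs after a chosen, customizable rule set, alice passes the check via the inferred (alice a Person) fact, while bob is reported with the exact triple that caused the violation, which mirrors how a reasoner's proof explains root causes.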
- …