320 research outputs found

    Automatic Generation of Complex Ontology Alignments

    Get PDF
    The Linked Open Data (LOD) cloud is composed of many data repositories. The data in these repositories are described by vocabularies, also called ontologies. Each ontology has its own terminology and its own modelling, which makes ontologies heterogeneous. To make the ontologies and the data they describe interoperable, ontology alignments establish correspondences, or links, between their entities. There are many ontology matching systems that generate simple correspondences, i.e., they link one entity to another. However, to overcome ontology heterogeneity, more expressive correspondences are sometimes needed. Finding this kind of correspondence is a tedious task that calls for automation. In this thesis, an automatic complex matching approach based on a user's knowledge needs and common instances is proposed. The complex alignment field is still young and little work addresses the evaluation of such alignments. To fill this gap, we propose an automatic complex alignment evaluation system based on instance comparison. A well-known alignment evaluation dataset on the conference domain has been extended for this evaluation
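
    The sketch below is a rough illustration (not the thesis system) of the instance-based evaluation idea: a correspondence is scored by the overlap between the instance sets of its source and target expressions. All entity and instance names are hypothetical.

```python
# Hypothetical illustration: score a correspondence by comparing the instance
# sets of its source and target expressions over a shared instance space.

def instance_overlap(source_instances, target_instances):
    """Return (precision, recall) of a correspondence w.r.t. shared instances."""
    source, target = set(source_instances), set(target_instances)
    if not source or not target:
        return 0.0, 0.0
    common = source & target
    return len(common) / len(target), len(common) / len(source)

# Simple correspondence: ont1:Paper <-> ont2:Article
simple = instance_overlap({"p1", "p2", "p3"}, {"p1", "p2", "p3", "p4"})

# Complex correspondence: ont1:AcceptedPaper <-> ont2:Paper AND hasDecision = Accept
complex_ = instance_overlap({"p1", "p2"}, {"p1", "p2"})

print(simple, complex_)   # (0.75, 1.0) (1.0, 1.0)
```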

    Alignment Distance of Regular Tree Languages

    Get PDF

    Business Intelligence on Non-Conventional Data

    Get PDF
    The revolution in digital communications witnessed over the last decade has had a significant impact on the world of Business Intelligence (BI). In the big data era, the amount and diversity of data that can be collected and analyzed for the decision-making process transcends the restricted and structured set of internal data that BI systems are conventionally limited to. This thesis investigates the unique challenges imposed by three specific categories of non-conventional data: social data, linked data and schemaless data. Social data comprises the user-generated content published through websites and social media, which can provide a fresh and timely perception of people's tastes and opinions. In Social BI (SBI), the analysis focuses on topics, understood as specific concepts of interest within the subject area. In this context, this thesis proposes the meta-star, an alternative strategy to the traditional star schema for modeling hierarchies of topics to enable OLAP analyses. The thesis also presents an architectural framework of a real SBI project and a cross-disciplinary benchmark for SBI. Linked data employs the Resource Description Framework (RDF) to provide a public network of interlinked, structured, cross-domain knowledge. In this context, this thesis proposes an interactive and collaborative approach to build aggregation hierarchies from linked data. Schemaless data refers to the storage of data in NoSQL databases that do not force a predefined schema, but let database instances embed their own local schemata. In this context, this thesis proposes an approach to determine the schema profile of a document-based database; the goal is to help users in a schema-on-read analysis process understand the rules that drove the usage of the different schemata. A final and complementary contribution of this thesis is an innovative technique in the field of recommendation systems to overcome user disorientation in the analysis of a large and heterogeneous wealth of data
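
    As a minimal sketch of the schema-profile idea described above (not the thesis approach), the snippet below counts, for every field path, how many documents of a schemaless collection actually use it; the sample documents are hypothetical.

```python
from collections import Counter

def field_paths(doc, prefix=""):
    """Yield the dotted field paths of a (possibly nested) JSON-like document."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        yield path
        if isinstance(value, dict):
            yield from field_paths(value, prefix=path + ".")

def schema_profile(collection):
    """Count how many documents use each field path."""
    profile = Counter()
    for doc in collection:
        profile.update(set(field_paths(doc)))
    return profile

docs = [
    {"user": "ann", "geo": {"city": "Bologna"}},
    {"user": "bob", "topic": "BI"},
]
print(schema_profile(docs).most_common())
# [('user', 2), ('geo', 1), ('geo.city', 1), ('topic', 1)]
```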

    Intelligent Information Access to Linked Data - Weaving the Cultural Heritage Web

    Get PDF
    The subject of the dissertation is an information alignment experiment of two cultural heritage information systems (ALAP): The Perseus Digital Library and Arachne. In modern societies, information integration is gaining importance for many tasks such as business decision making or even catastrophe management. It is beyond doubt that the information available in digital form can offer users new ways of interaction. Also, in the humanities and cultural heritage communities, more and more information is being published online. But in many situations the way that information has been made publicly available is disruptive to the research process due to its heterogeneity and distribution. Therefore, integrated information will be a key factor in pursuing successful research, and the need for information alignment is widely recognized. ALAP is an attempt to integrate information from Perseus and Arachne, not only at the schema level, but also by performing entity resolution. To that end, technical peculiarities and philosophical implications of the concepts of identity and co-reference are discussed. Multiple approaches to information integration and entity resolution are discussed and evaluated. The methodology used to implement ALAP is mainly rooted in the fields of information retrieval and knowledge discovery. First, an exploratory analysis was performed on both information systems to get a first impression of the data. After that, (semi-)structured information from both systems was extracted and normalized. Then, a clustering algorithm was used to reduce the number of entity comparisons needed. Finally, a thorough matching was performed on the different clusters. ALAP helped identify the challenges and highlighted the opportunities that arise when attempting to align cultural heritage information systems
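
    The sketch below illustrates the blocking-then-matching pattern mentioned above (clustering records to reduce the number of comparisons, then matching within each block); it is not the ALAP implementation, and the records, blocking key and similarity threshold are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def block(records, key):
    """Group records by a cheap blocking key so comparisons stay inside blocks."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key(rec)].append(rec)
    return blocks

def match(blocks, threshold=0.75):
    """Yield candidate co-references whose title similarity exceeds the threshold."""
    for recs in blocks.values():
        for a, b in combinations(recs, 2):
            score = SequenceMatcher(None, a["title"], b["title"]).ratio()
            if score >= threshold:
                yield a["id"], b["id"], score

perseus = {"id": "perseus:42", "title": "Statue of Athena", "place": "Athens"}
arachne = {"id": "arachne:99", "title": "Statue of Athena Parthenos", "place": "Athens"}
candidates = block([perseus, arachne], key=lambda r: r["place"])
print(list(match(candidates)))   # [('perseus:42', 'arachne:99', 0.76...)]
```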

    Survey over Existing Query and Transformation Languages

    Get PDF
    A widely acknowledged obstacle for realizing the vision of the Semantic Web is the inability of many current Semantic Web approaches to cope with data available in such diverging representation formalisms as XML, RDF, or Topic Maps. A common query language is the first step to allow transparent access to data in any of these formats. To further the understanding of the requirements and approaches proposed for query languages in the conventional as well as the Semantic Web, this report surveys a large number of query languages for accessing XML, RDF, or Topic Maps. This is the first systematic survey to consider query languages from all these areas. From the detailed survey of these query languages, a common classification scheme is derived that is useful for understanding and differentiating languages within and among all three areas
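
    As a small illustration of the heterogeneity the survey addresses (this example is not taken from the report), the sketch below asks the same question of XML data via ElementTree's limited XPath support and of RDF data via SPARQL; the data is made up and the RDF part assumes the third-party rdflib package.

```python
import xml.etree.ElementTree as ET
from rdflib import Graph   # third-party; assumed available

xml_data = "<books><book><author>Melville</author><title>Moby-Dick</title></book></books>"
root = ET.fromstring(xml_data)
xml_titles = [b.findtext("title") for b in root.findall("book")
              if b.findtext("author") == "Melville"]

rdf_data = """
@prefix ex: <http://example.org/> .
ex:mobydick ex:author "Melville" ; ex:title "Moby-Dick" .
"""
g = Graph().parse(data=rdf_data, format="turtle")
rdf_titles = [str(row.t) for row in g.query(
    'SELECT ?t WHERE { ?b <http://example.org/author> "Melville" ; '
    '<http://example.org/title> ?t }')]

print(xml_titles, rdf_titles)   # ['Moby-Dick'] ['Moby-Dick']
```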

    Content Based Image Retrieval (CBIR) in Remote Clinical Diagnosis and Healthcare

    Full text link
    Content-Based Image Retrieval (CBIR) locates, retrieves and displays images similar to one given as a query, using a set of features. It requires accessible data in medical archives and from medical equipment, from which meaning is inferred after some processing. Retrieved cases that are similar in some sense to the target image can aid clinicians. CBIR complements text-based retrieval and improves evidence-based diagnosis, administration, teaching, and research in healthcare. It facilitates visual/automatic diagnosis and decision-making in real-time remote consultation/screening, store-and-forward tests, home care assistance and overall patient surveillance. Metrics help compare visual data and improve diagnosis. Specially designed architectures can take advantage of the application scenario. CBIR use calls for file storage standardization, querying procedures, efficient image transmission, realistic databases, global availability, access simplicity, and Internet-based structures. This chapter reviews the important and complex aspects required to handle visual content in healthcare. Comment: 28 pages, 6 figures, book chapter from "Encyclopedia of E-Health and Telemedicine"
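
    A generic sketch of the retrieval loop described above (not the chapter's method): each image is reduced to a feature vector, here a grey-level histogram, and the archive is ranked by distance to the query's features. It assumes NumPy, and the "images" are small synthetic arrays.

```python
import numpy as np

def features(image, bins=16):
    """Normalised grey-level histogram as a crude feature vector."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / hist.sum()

def retrieve(query, archive, k=3):
    """Rank archive entries by Euclidean distance in feature space."""
    q = features(query)
    scored = [(name, float(np.linalg.norm(q - features(img))))
              for name, img in archive.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]

rng = np.random.default_rng(0)
archive = {
    "dark":   rng.integers(0, 100, size=(64, 64)),
    "mid":    rng.integers(80, 180, size=(64, 64)),
    "bright": rng.integers(150, 256, size=(64, 64)),
}
query = rng.integers(0, 100, size=(64, 64))   # another "dark" image
print(retrieve(query, archive, k=1))          # the "dark" entry ranks first
```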

    Methodology and process to accompany a virtual geotechnical database for the St. Louis metropolitan area

    Get PDF
    The St. Louis metropolitan (STL) area, which spans Missouri and Illinois, currently has no overarching organization of geospatial data (geodata). The two states maintain their own geodata using their own individual standards. The development of a virtual geotechnical database (VGDB) encompassing the entire STL area on both sides of the Mississippi River is a solution to this area's data sharing and standardization problem. The VGDB integrates geodata from disparate sources into a geographic information systems (GIS) database accessible via the internet. The creation of the STL area VGDB consisted of three parts: 1) compiling the geodata; 2) formatting the geodata in GIS and Extensible Markup Language (XML) formats; 3) making the VGDB viewable in an internet browser
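
    A small sketch of the second step above, formatting a compiled geodata record as XML so it can be loaded into GIS tools or viewed in a browser; the record fields and element names are hypothetical, not the VGDB schema.

```python
import xml.etree.ElementTree as ET

record = {"site_id": "STL-001", "latitude": 38.627, "longitude": -90.199,
          "depth_to_bedrock_m": 12.4, "soil_class": "CL"}

borehole = ET.Element("borehole", id=record["site_id"])
ET.SubElement(borehole, "location",
              lat=str(record["latitude"]), lon=str(record["longitude"]))
ET.SubElement(borehole, "depth_to_bedrock", units="m").text = str(record["depth_to_bedrock_m"])
ET.SubElement(borehole, "soil_class").text = record["soil_class"]

print(ET.tostring(borehole, encoding="unicode"))
```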

    Accelerating data retrieval steps in XML documents

    Get PDF

    Efficient Schema Extraction from a Collection of XML Documents

    Get PDF
    The eXtensible Markup Language (XML) has become the standard format for data exchange on the Internet, providing interoperability between different business applications. Such wide use results in large volumes of heterogeneous XML data, i.e., XML documents conforming to different schemas. Although schemas are important in many business applications, they are often missing in XML documents. In this thesis, we present a suite of algorithms that are effective in extracting schema information from a large collection of XML documents. We propose using the cost of NFA simulation to compute the Minimum Description Length (MDL) and rank the inferred schemas. We also study using the frequencies of the sample inputs to improve the precision of the schema extraction. Furthermore, we propose an evaluation framework to quantify the quality of the extracted schema. Experimental studies are conducted on various data sets to demonstrate the efficiency and efficacy of our approach
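
    A minimal sketch of the basic extraction idea (not the thesis algorithms, and without the MDL-based ranking): scan a collection of XML documents and record, for every element name, which child elements occur under it and how often, as a crude inferred content model. The sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

def infer_content_models(documents):
    """Map each element name to a count of the child element names seen under it."""
    models = defaultdict(Counter)
    for doc in documents:
        root = ET.fromstring(doc)
        for elem in root.iter():
            models[elem.tag].update(child.tag for child in elem)
    return models

docs = [
    "<order><item/><item/><total/></order>",
    "<order><item/><note/></order>",
]
for tag, children in infer_content_models(docs).items():
    print(tag, dict(children))
# order {'item': 3, 'total': 1, 'note': 1}
# item {}
# total {}
# note {}
```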