
    Knowledge Graphs and Large Language Models for Intelligent Applications in the Tourism Domain

    In the current era of big data, the World Wide Web is transitioning from being merely a repository of content to a complex web of data. Two pivotal technologies underpinning this shift are Knowledge Graphs (KGs) and Data Lakes. Concurrently, Artificial Intelligence has emerged as a potent means to leverage data, creating knowledge and pioneering new tools across various sectors. Among these advancements, Large Language Models (LLMs) stand out as transformative technologies in many domains. This thesis delves into an integrative exploration, juxtaposing the structured world of KGs and the raw data reservoirs of Data Lakes, with a focus on harnessing LLMs to derive meaningful insights in the domain of tourism. Starting with an exposition on the importance of KGs in the present digital milieu, the thesis delineates the creation and management of KGs that use entities and their relations to represent intricate data patterns within the tourism sector. In this context, we introduce a semi-automatic methodology for generating a Tourism Knowledge Graph (TKG) and a novel Tourism Analytics Ontology (TAO). By integrating information from enterprise data lakes with public knowledge graphs, the thesis illustrates the creation of a comprehensive semantic layer built upon the raw data, demonstrating versatility and scalability. Subsequently, we present an in-depth investigation into transformer-based language models, emphasizing their potential and limitations. Addressing the exigency for domain-specific knowledge enrichment, we conduct a methodical study on knowledge enhancement strategies for transformer-based language models. The culmination of this thesis is the presentation of an innovative method that fuses large language models with domain-specific knowledge graphs, targeting the optimisation of hospitality offers. This approach integrates domain KGs with feature engineering, enriching data representation in LLMs. 
Our scientific contributions span multiple dimensions: from devising methodologies for KG construction, especially in tourism, to the design and implementation of a novel ontology; from the analysis and comparison of techniques for enriching LLMs with specialized knowledge, to deploying such methods in a novel framework that effectively combines LLMs and KGs within the context of the tourism domain. In our research, we explore the potential benefits and challenges arising from the integration of knowledge engineering and artificial intelligence, with a specific emphasis on the tourism sector. We believe our findings offer a promising avenue and serve as a foundational platform for subsequent studies and practical implementations for the academic community and the tourism industry alike.
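The entity-and-relation representation described in this abstract can be illustrated with a toy subject-predicate-object store. The hotel and destination names below are invented for this sketch; the actual TKG would use ontology terms (e.g., from TAO) and standard RDF tooling rather than plain tuples.

```python
# Toy knowledge-graph triple store for tourism-style facts.
# Entities and relations are hypothetical, chosen only to illustrate
# how a KG encodes data as (subject, predicate, object) triples.

triples = [
    ("Hotel_Belvedere", "locatedIn", "Venice"),
    ("Hotel_Belvedere", "hasAmenity", "Spa"),
    ("Hotel_Belvedere", "hasRating", "4.5"),
    ("Venice", "partOf", "Italy"),
]

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the given (optional) pattern parts."""
    return [
        (s, p, o) for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# All facts about one entity, retrieved by pattern matching:
facts = query(subject="Hotel_Belvedere")
print(facts)
```

This pattern-matching `query` is the essence of what a SPARQL basic graph pattern does over a real RDF graph, here reduced to a list comprehension.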

    Business intelligence-centered software as the main driver to migrate from spreadsheet-based analytics

    Internship Report presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. Nowadays, companies are handling and managing data in a way that they weren't ten years ago. As a consequence, the data deluge has become their constant day-to-day challenge: creating agile and scalable data solutions to tackle this reality. The main goal of this project was to support the decision-making process of a customer-centered marketing team (called Customer Voice) in Company X by developing a complete, holistic Business Intelligence solution that goes all the way from ETL processes to data visualizations based on that team's business needs. With this context in mind, the focus of the internship was to use BI and ETL techniques to migrate the data stored in spreadsheets, where the team performed its data analysis, and to shift the way they see the data into a more dynamic, sophisticated, and suitable form, helping them make data-driven strategic decisions. To ensure credibility throughout the development of this project and its resulting solution, an exhaustive literature review was necessary to frame the project in a realistic and logical way. To that end, this report draws on scientific literature explaining the evolution of ETL workflows, tools, and limitations across different time periods and generations; how ETL was transformed from manual to real-time data tasks together with data warehouses; the importance of data quality; and, finally, the relevance of ETL process optimization and new ways of approaching data integration using modern cloud architectures.
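The spreadsheet-to-BI migration described above follows the classic extract-transform-load pattern. As a rough, self-contained sketch, with field names and the quality rule invented for illustration:

```python
# Toy ETL flow: extract rows from a CSV (standing in for a spreadsheet
# export), transform them (type casting, dropping incomplete records),
# and load them into a relational store that a BI tool could query.

import csv
import io
import sqlite3

# Extract: in practice this would read the team's exported spreadsheet.
raw = io.StringIO("customer,score\nAlice,8\nBob,\nCarol,9\n")
rows = list(csv.DictReader(raw))

# Transform: cast types and drop records with missing scores
# (a basic data-quality rule).
clean = [
    {"customer": r["customer"], "score": int(r["score"])}
    for r in rows if r["score"].strip()
]

# Load: insert into SQLite, then run an analytical query over it.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE feedback (customer TEXT, score INTEGER)")
con.executemany("INSERT INTO feedback VALUES (:customer, :score)", clean)
avg = con.execute("SELECT AVG(score) FROM feedback").fetchone()[0]
print(avg)
```

The point of the migration is the last two lines: once data lives in a queryable store rather than a spreadsheet, aggregations and visualizations can be driven by SQL instead of manual formulas.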

    Dataset search: a survey

    Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to encourage data sharing and reuse, from scientific publishers asking authors to submit data alongside manuscripts, to data marketplaces, open data portals and data communities. Google recently beta-released a search service for datasets, which allows users to discover data stored in various online repositories via keyword queries. These developments foreshadow an emerging research field around dataset search or retrieval that broadly encompasses frameworks, methods and tools that help match a user's data need against a collection of datasets. Here, we survey the state of the art of research and commercial systems in dataset retrieval. We identify what makes dataset search a research field in its own right, with unique challenges and methods, and highlight open problems. We look at approaches and implementations from related areas that dataset search draws upon, including information retrieval, databases, and entity-centric and tabular search, in order to identify possible paths to resolve these open problems as well as immediate next steps that will take the field forward.
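Keyword-based dataset retrieval of the kind described above can be sketched, in heavily simplified form, as an inverted index over dataset metadata. The dataset names and descriptions below are invented for illustration:

```python
# Minimal keyword search over dataset metadata: build an inverted index
# from tokens to dataset identifiers, then rank candidates by how many
# query terms their metadata match.

from collections import defaultdict

datasets = {
    "air-quality-2020": "hourly air quality measurements for European cities",
    "city-bike-trips": "bike sharing trip records for major cities",
    "gdp-by-country": "annual GDP figures by country and region",
}

# Index each token to the datasets whose descriptions mention it.
index = defaultdict(set)
for name, description in datasets.items():
    for token in description.lower().split():
        index[token].add(name)

def search(query):
    """Rank datasets by the number of matching query terms."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for name in index.get(term, ()):
            scores[name] += 1
    return sorted(scores, key=scores.get, reverse=True)

print(search("air quality cities"))
```

Real dataset search engines add much more (schema-aware matching, metadata standards such as schema.org/Dataset, ranking signals), but the match-query-terms-against-metadata core is the same.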

    Strategies for Managing Linked Enterprise Data

    Data, information and knowledge have become key assets of our 21st-century economy. As a result, data and knowledge management have become key tasks with regard to sustainable development and business success. Often, knowledge is not explicitly represented, residing instead in the minds of people or scattered among a variety of data sources. Knowledge is inherently associated with semantics that convey its meaning to a human or machine agent. The Linked Data concept facilitates the semantic integration of heterogeneous data sources. However, we still lack an effective knowledge integration strategy applicable to enterprise scenarios, one that balances large amounts of data stored in legacy information systems and data lakes with tailored domain-specific ontologies that formally describe real-world concepts. In this thesis we investigate strategies for managing linked enterprise data, analyzing how actionable knowledge can be derived from enterprise data by leveraging knowledge graphs. Actionable knowledge provides valuable insights, supports decision makers with clear, interpretable arguments, and keeps its inference processes explainable. The benefits of employing actionable knowledge and a coherent strategy for managing it span from a holistic semantic representation layer of enterprise data, i.e., representing numerous data sources as one consistent, integrated knowledge source, to unified interaction mechanisms with other systems that can effectively and efficiently leverage such actionable knowledge. Several challenges have to be addressed on different conceptual levels in pursuit of this goal, i.e., means for representing knowledge, semantic integration of raw data sources and subsequent knowledge extraction, communication interfaces, and implementation. To tackle these challenges, we present the concept of Enterprise Knowledge Graphs (EKGs) and describe their characteristics and advantages compared to existing approaches. 
We study each challenge with regard to using EKGs and demonstrate their efficiency. In particular, EKGs are able to reduce the semantic data integration effort when processing large-scale heterogeneous datasets. Then, having built a consistent logical integration layer that hides heterogeneity behind the scenes, EKGs unify query processing and enable effective communication interfaces for other enterprise systems. The achieved results allow us to conclude that strategies for managing linked enterprise data based on EKGs exhibit reasonable performance, comply with enterprise requirements, and ensure integrated data and knowledge management throughout the data life cycle.
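The core idea of a consistent integration layer over heterogeneous sources can be caricatured in a few lines. The source schemas and property names here are invented; a real EKG would rely on RDF, shared ontologies, and a proper triple store rather than Python tuples:

```python
# Toy illustration of a unified semantic layer: records from two
# hypothetical legacy systems (a CRM and an ERP) are lifted into one
# shared triple representation and queried through a single interface.

crm_records = [{"cust_id": "C1", "cust_name": "ACME GmbH"}]
erp_records = [{"customer": "C1", "open_orders": 3}]

graph = set()
for r in crm_records:  # lift CRM rows into triples
    graph.add((r["cust_id"], "hasName", r["cust_name"]))
for r in erp_records:  # lift ERP rows into the same graph
    graph.add((r["customer"], "openOrders", r["open_orders"]))

def describe(entity):
    """One uniform query interface over both source systems."""
    return {p: o for (s, p, o) in graph if s == entity}

print(describe("C1"))
```

The consumer of `describe` never sees that the name and the order count came from different systems with different schemas, which is the "heterogeneity behind the scenes" property the abstract describes.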

    Extracting and Cleaning RDF Data

    The RDF data model has become a prevalent format for representing heterogeneous data because of its versatility. The capability of dismantling information from its native formats and representing it in triple format offers a simple yet powerful way of modelling data obtained from multiple sources. In addition, the triple format and schema constraints of the RDF model make RDF data easy to process as labeled, directed graphs. This graph representation of RDF data supports higher-level analytics by enabling querying using different techniques and query languages, e.g., SPARQL. Analytics that require structured data are supported by transforming the graph data on the fly to populate the target schema needed for downstream analysis. These target schemas are defined by downstream applications according to their information needs. The flexibility of RDF data brings two main challenges. First, the extraction of RDF data is a complex task that may involve domain expertise about the information required for different applications. Second, a significant aspect of analyzing RDF data is its quality, which depends on multiple factors, including the reliability of the data sources and the accuracy of the extraction systems. The quality of any analysis depends mainly on the quality of the underlying data; therefore, evaluating and improving the quality of RDF data has a direct effect on the correctness of downstream analytics. This work presents multiple approaches related to the extraction and quality evaluation of RDF data. To cope with the large amounts of data that need to be extracted, we present DSTLR, a scalable framework to extract RDF triples from semi-structured and unstructured data sources. For rare entities that fall on the long tail of information, there may not be enough signals to support high-confidence extraction. To address this problem, we present an approach to estimate property values for long-tail entities. 
We also present multiple algorithms and approaches that focus on the quality of RDF data. These include discovering quality constraints from RDF data and utilizing machine learning techniques to repair errors in RDF data.
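One family of quality checks in this spirit is validating constraints against the triples themselves. As a minimal, hypothetical example (the data and the choice of constraint are invented, not taken from the thesis), here is a check for violations of a functional-property constraint, i.e., each subject should have at most one value for the property:

```python
# Detect functional-property violations in a set of RDF-style triples:
# subjects that carry more than one distinct value for a property that
# should be single-valued (e.g., a birth year) are quality errors.

from collections import defaultdict

triples = [
    ("e1", "birthYear", "1970"),
    ("e1", "birthYear", "1972"),  # conflicting value: a quality error
    ("e2", "birthYear", "1985"),
]

def functional_violations(triples, prop):
    """Return subjects with more than one value for `prop`."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[s].add(o)
    return {s: vals for s, vals in values.items() if len(vals) > 1}

print(functional_violations(triples, "birthYear"))
```

Constraint discovery inverts this check: instead of being given the constraint, a system mines the data for properties that are almost always single-valued and proposes them as candidate constraints, flagging the exceptions for repair.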

    A Feminist Dialogic Encounter with Refugee Women

    This study sought to (1) examine how refugee women who have experienced violent displacement manage the resettlement process and negotiate new identities in unfamiliar settings, and (2) explore ways in which social work practice can be involved in refugee women's lives more effectively and sensitively, in accordance with feminist dialogism. Although extensive research on violent displacements exists, little is known about the women fleeing sub-Saharan Africa (SSA). Self-reported effects of violent displacement on women, and how to address its consequences adequately, have yet to be given much attention in social work research. My dissertation pioneers a theorization that builds bridges across knowledge systems by mediating between European and certain aspects of African perspectives to facilitate the resettlement of traumatized and vulnerable refugee women from the Great Lakes region of Africa. Informed by a feminist dialogical approach to qualitative research, my dissertation presents a detailed analysis of ten in-depth, loosely structured interviews with eight participants, in addition to my extensive field notes. As power shifted from my voice, the researcher's, to the diverse voices of the participants, the process necessitated the adoption of a qualitative approach. I make the case for an approach that views the world as multi-voiced, takes into account participants' perspectives, transcends fixed assumptions and embraces points of view that embody collective voices. A feminist dialogical approach to social work research and practice regards the face-to-face encounter as a site of ethical responsibility towards the other. Such a theorization implies that the relationship between self and other underscores responsibility as central to a justice-oriented practice. Using Bakhtin's concept of otherness and Levinas's infinite other, I created dialogic spaces to foreground my ethical responsibility to the other. Notions of the infinite other and otherness allowed me to pay attention to silent voices while acknowledging the limitations of my conceptions and knowledge claims. The proposed approach is well suited to working with diverse communities that include various underrepresented others, such as the African other, woman as other and the refugee other from the Great Lakes region. The methodology can also be used to understand particular experiences of displacement and identity reconstruction for women who fled other non-European conflict zones, particularly sub-Saharan Africa, which shares many characteristics with the Great Lakes region.

    BIG DATA AND ANALYTICS AS A NEW FRONTIER OF ENTERPRISE DATA MANAGEMENT

    Big Data and Analytics (BDA) promises significant value generation opportunities across industries. Even though companies are increasing their investments, their BDA initiatives fall short of expectations, and they struggle to guarantee a return on investment. To create business value from BDA, companies must build and extend their data-related capabilities. While BDA literature has emphasized the capabilities needed to analyze the increasing volumes of data from heterogeneous sources, Enterprise Data Management (EDM) researchers have suggested organizational capabilities to improve data quality. However, to date, little is known about how companies actually orchestrate the allocated resources, especially regarding the quality and use of data, to create value from BDA. Considering these gaps, this thesis investigates, through five interrelated essays, how companies adapt their EDM capabilities to create additional business value from BDA. The first essay lays the foundation of the thesis by investigating how companies extend their Business Intelligence and Analytics (BI&A) capabilities to build more comprehensive enterprise analytics platforms. The second and third essays contribute to fundamental reflections on how organizations are changing and designing data governance in the context of BDA. The fourth and fifth essays look at how companies provide high-quality data to an increasing number of users with innovative EDM tools, namely machine learning (ML) and enterprise data catalogs (EDCs). The thesis outcomes show that BDA has profound implications for EDM practices. In the past, operational data processing and analytical data processing were two "worlds" that were managed separately from each other. With BDA, these "worlds" are becoming increasingly interdependent, and organizations must manage the lifecycles of data and analytics products in close coordination. Also, with BDA, data have become the long-expected, strategically relevant resource. 
As such, data must now be viewed as a distinct value driver, separate from IT, as it requires specific mechanisms to foster value creation from BDA. BDA thus extends data governance goals: in addition to data quality and regulatory compliance, governance should facilitate data use by broadening data availability and enabling data monetization. Accordingly, companies establish comprehensive data governance designs, including structural, procedural, and relational mechanisms, to enable a broad network of employees to work with data. Existing EDM practices therefore need to be rethought to meet the emerging BDA requirements. While ML is a promising solution for improving data quality in a scalable and adaptable way, EDCs help companies democratize data to a broader range of employees.

    New Data Formats and Interface Frameworks for Twinscan System Interfaces
