Building Semantic Knowledge Graphs from (Semi-)Structured Data: A Review
Knowledge graphs have, for the past decade, been a hot topic both in public and private domains, typically used for large-scale integration and analysis of data using graph-based data models. One of the central concepts in this area is the Semantic Web, with the vision of providing a well-defined meaning to information and services on the Web through a set of standards. Particularly, linked data and ontologies have been quite essential for data sharing, discovery, integration, and reuse. In this paper, we provide a systematic literature review on knowledge graph creation from structured and semi-structured data sources using Semantic Web technologies. The review takes into account four prominent publication venues, namely, Extended Semantic Web Conference, International Semantic Web Conference, Journal of Web Semantics, and Semantic Web Journal. The review highlights the tools, methods, types of data sources, ontologies, and publication methods, together with the challenges, limitations, and lessons learned in the knowledge graph creation processes.
An agent-based approach to customer crowd-shipping
Thesis (MEng)--Stellenbosch University, 2022.
ENGLISH SUMMARY: The challenge of effective last-mile deliveries is progressively becoming more important with the acceleration in the e-commerce industry that is accompanied by a growing number of doorstep deliveries. Crowd logistics provides innovative solutions whereby ordinary people become involved in the execution of logistics operations. A particular crowd logistics initiative, referred to as customer crowd-shipping, recently gained interest from researchers after initial implementations thereof had been performed by companies such as Walmart and Amazon. The approach involves the use of a retailer's in-store customers, in addition to regular delivery vehicles, for delivering orders to online customers. Such in-store customers, referred to as occasional drivers, are offered incentives to deliver orders on their way home after visiting the retailer.
In this thesis, an agent-based simulation model is proposed for studying the highly dynamic working of the customer crowd-shipping initiative. The model encompasses a traditional last-mile delivery system, complemented by the ability to utilise autonomous occasional drivers. The modelled traditional last-mile delivery system consists of a dedicated fleet of delivery vehicles serving online customers from a single depot. The execution of deliveries is formulated as a vehicle routing problem and subsequently solved by means of well-known vehicle routing heuristics. In addition, the occasional drivers are modelled as autonomous agents who have the ability to act outside of the control of the retailer. Rather than being assigned to particular orders, occasional drivers are presented with potential orders from which they may select an order suitable for them to deliver. Their decision to participate is modelled based on self-interest, where an occasional driver agent aims to maximise the difference between the incentive offered and his or her perceived value of the additional time required to deliver the order.
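The self-interested participation decision described above can be sketched as a simple utility rule: take the order with the largest positive surplus of incentive over perceived time cost, or decline if no order yields a positive surplus. This is a minimal illustration, not the thesis's actual agent logic; the order attributes and the per-minute value of time are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Order:
    order_id: str
    incentive: float       # monetary reward offered by the retailer
    detour_minutes: float  # extra travel time needed to deliver this order

def choose_order(orders, value_of_time_per_minute):
    """Return the order maximising (incentive - perceived time cost),
    or None if no order yields a positive surplus (driver declines)."""
    best, best_surplus = None, 0.0
    for order in orders:
        surplus = order.incentive - value_of_time_per_minute * order.detour_minutes
        if surplus > best_surplus:
            best, best_surplus = order, surplus
    return best
```

For example, a driver who values time at 0.2 per minute would pick a 4.00 incentive with a 10-minute detour over a 5.00 incentive with a 30-minute detour, and would decline both if the value of time rose high enough.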
An integrated approach to customer crowd-shipping is developed in order to consider the benefits
for both the retailer and occasional drivers. This includes an incentive scheme and a method for
identifying online customers as candidates for crowd-shipping. The latter involves the dynamic
calculation of the company’s cost to serve an individual customer, which is determined for all
online customers. Finally, user-friendly access to the agent-based simulation model is facilitated
by a graphical user interface.
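The dynamic cost-to-serve calculation can be illustrated with a cheapest-insertion estimate: the marginal cost of an online customer is the smallest detour cost of inserting that customer into an existing route. This is a minimal sketch under assumed Euclidean travel costs, not the thesis's actual formulation.

```python
import math

def dist(a, b):
    """Euclidean distance between two (x, y) locations."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def route_cost(route, cost_per_km):
    """Total travel cost of a route given as a list of (x, y) stops."""
    return cost_per_km * sum(dist(route[i], route[i + 1]) for i in range(len(route) - 1))

def cost_to_serve(route, customer, cost_per_km):
    """Marginal cost of one customer: the cheapest-insertion detour cost
    of adding that customer to an existing depot-to-depot route."""
    best_increase = float("inf")
    for i in range(len(route) - 1):
        increase = (dist(route[i], customer) + dist(customer, route[i + 1])
                    - dist(route[i], route[i + 1])) * cost_per_km
        best_increase = min(best_increase, increase)
    return best_increase
```

A customer lying on an existing leg has a marginal cost of zero, while one far off the route is expensive to serve; ranking online customers by this value is one plausible way to shortlist crowd-shipping candidates.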
The proposed model is subjected to systematic verification, ensuring the correct functioning and
integration of its subcomponents. Moreover, the model is evaluated under various operating
conditions to gain a deeper understanding of the crowd-shipping initiative, while simultaneously
validating the model as adequate. In particular, parameter variation, sensitivity analyses, and
scenario analyses are conducted, followed by face validation by subject matter experts.
The results of the various analyses indicate that customer crowd-shipping may successfully function as an extension to an existing last-mile delivery system, with the potential of reducing both the total delivery cost and customer waiting time. These benefits are, however, shown to be influenced by the incentive scheme and the strategy by which online customers are selected as crowd-shipping candidates. Finally, it is deduced that the maturity of the customer crowd-shipping system and the occasional driver population's perceived value of time influence the performance of the customer crowd-shipping model.
Learning regulatory compliance data for data governance in financial services industry by machine learning models
While regulatory compliance data has been governed in the financial services industry for a long time to identify, assess, remediate and prevent risks, improving data governance (“DG”) has emerged as a new paradigm that uses machine learning models to enhance the level of data management.
In the literature, there is a research gap. Machine learning models have not been extensively applied to DG processes by a) predicting data quality (“DQ”) in supervised learning and taking temporal sequences and correlations of data noise into account in DQ prediction; b) predicting DQ in unsupervised learning and learning the importance of data noise jointly with temporal sequences and correlations of data noise in DQ prediction; c) analyzing DQ prediction at a granular level; d) measuring network run-time saving in DQ prediction; and e) predicting information security compliance levels.
Our main research focus is whether our ML models accurately predict DQ and information security compliance levels during DG processes of financial institutions by learning regulatory compliance data from both theoretical and experimental perspectives.
We propose five machine learning models including a) a DQ prediction sequential learning model in supervised learning; b) a DQ prediction sequential learning model with an attention mechanism in unsupervised learning; c) a DQ prediction analytical model; d) a DQ prediction network efficiency improvement model; and e) an information security compliance prediction model.
Experimental results demonstrate the effectiveness of these models by accurately predicting DQ in supervised learning, precisely predicting DQ in unsupervised learning, analyzing DQ prediction by divergent dimensions such as risk types and business segments, saving significant network run-time in DQ prediction for improving the network efficiency, and accurately predicting information security compliance levels.
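As a loose illustration of the general idea of predicting data quality from a temporal sequence of noise indicators, the sketch below uses an exponentially weighted score in which recent noise weighs more heavily. It stands in for, and is far simpler than, the sequential learning models the abstract describes; the threshold and decay parameters are hypothetical.

```python
def dq_score(noise_flags, decay=0.8):
    """Exponentially weighted data-quality score over a temporal sequence
    of per-period noise indicators (1 = noise observed, 0 = clean), with
    the newest period last. Returns a value in [0, 1]; 1 means no recent
    noise."""
    weight, weighted_noise, total = 1.0, 0.0, 0.0
    for flag in reversed(noise_flags):  # walk from newest to oldest
        weighted_noise += weight * flag
        total += weight
        weight *= decay
    return 1.0 - weighted_noise / total if total else 1.0

def predict_dq(noise_flags, threshold=0.7, decay=0.8):
    """Binary DQ prediction: 'pass' if the weighted score clears the
    threshold, else 'fail'."""
    return "pass" if dq_score(noise_flags, decay) >= threshold else "fail"
```

Note that the same set of noise observations yields different predictions depending on when the noise occurred, which is the point of taking temporal sequence into account: noise in the oldest period can still pass, while the same noise in the newest period fails.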
Our models strengthen the DG capabilities of financial institutions by improving DQ, data risk management, bank-wide risk management, and information security based on regulatory requirements in the financial services industry, including Basel Committee on Banking Supervision Standard Number 239, Australian Prudential Regulation Authority ("APRA") Standard Number CPG 235, and APRA Standard Number CPG 234. These models are part of DG programs under the DG framework of financial institutions.
Enabling Human-Robot Collaboration via Holistic Human Perception and Partner-Aware Control
As robotic technology advances, the barriers to the coexistence of humans and robots are slowly coming down. Application domains like elderly care, collaborative manufacturing, and collaborative manipulation are considered the need of the hour, and progress in robotics holds the potential to address many societal challenges. Future socio-technical systems will consist of a blended workforce with a symbiotic relationship between human and robot partners working collaboratively. This thesis attempts to address some of the research challenges in enabling human-robot collaboration. In particular, the challenge of holistic perception of a human partner, to continuously communicate his or her intentions and needs in real time to a robot partner, is crucial for the successful realization of a collaborative task. Towards that end, we present a holistic human perception framework for real-time monitoring of whole-body human motion and dynamics. On the other hand, leveraging assistance from a human partner can lead to improved human-robot collaboration. In this direction, we attempt to methodically define what constitutes assistance from a human partner and propose partner-aware robot control strategies to endow robots with the capacity to meaningfully engage in a collaborative task.
Cultural Contact in Early Roman Spain through Linked Open Data
The study of the Roman colonisation of the western provinces has produced much literature, especially about the processes of assimilation of Roman culture by indigenous communities and the cultural changes these communities experienced under Roman influence. In Spain, traditional scholarship has looked mainly at the literary evidence for these processes, and therefore at the 'Roman' perspective of the conquest; current schools of thought argue for a new reading of the cultural processes rooted in theory and a contextualised analysis of archaeological data.
Traditional methods lacked the tools capable of making effective relationships within large amounts of data. Linked Open Data (hereafter LOD) technologies provide the means to resolve this deadlock. In the last decade, a number of projects have made available large amounts of data leading to a burgeoning of resources that rely on LOD technologies. The number of databases collecting information from Hispania is also continuously increasing. While these resources provide a vast amount of material, most of them do not meet open-access requirements, becoming information silos that hinder information accessibility and interoperability.
This research applies LOD technologies to align and connect web-exposed datasets (that follow, or can be integrated to follow, LOD standards) together with enhanced and aggregated information to investigate the dynamics of cultural interaction in the southern area of Spain between the 4th century BCE and the 1st century CE on the basis of epigraphic, monetary, and sculptural evidence. Ultimately, this thesis examines the extent to which the application of LOD technologies can improve the way archaeological information is accessed, retrieved, and analysed, by means of a LOD dataset (ERUB) and the Cultural Contact Ontology (CuCoO).
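The core LOD alignment step, linking resources that describe the same real-world item across two datasets, can be sketched as follows. Datasets are represented here as plain subject-predicate-object triples; the `ex:id` key predicate and the URIs are hypothetical, and a real pipeline would use an RDF library and richer matching criteria than exact identifier equality.

```python
# Triples as (subject, predicate, object) tuples; URIs as plain strings.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def align(dataset_a, dataset_b, key_predicate):
    """Emit owl:sameAs links between resources in two triple sets that
    share the same value for key_predicate (e.g. a corpus inscription ID)."""
    index = {}
    for s, p, o in dataset_a:
        if p == key_predicate:
            index.setdefault(o, []).append(s)
    links = []
    for s, p, o in dataset_b:
        if p == key_predicate:
            for match in index.get(o, []):
                links.append((match, OWL_SAME_AS, s))
    return links
```

Once such links exist, a federated query can traverse both datasets as one graph, which is precisely what makes isolated databases interoperable rather than information silos.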
A knowledge-based framework for information extraction and exploration
Harnessing insights from the colossal amount of online information requires the computerised processing of unstructured text in order to satisfy the information need of particular applications such as recommender systems and sentiment analysis. The increasing availability of online documents that describe domain-specific information provides an opportunity in employing a knowledge-based approach in extracting information from Web data.
In this thesis, a novel comprehensive knowledge-based framework is proposed to construct and exploit a domain-specific semantic knowledgebase. The proposed framework introduces a methodology for linking several components of different techniques and tools. It focuses on providing reusable and configurable data and application templates, which allow developers to apply it in a diversity of domains. The objectives of this framework are: extracting information from unstructured data; constructing a semantic knowledgebase from the extracted information; enriching the resultant semantic knowledgebase by sourcing appropriate semi-structured and structured datasets; and consuming the resultant semantic knowledgebase to facilitate the intelligent exploration and search of information. For the purpose of investigating the challenges of extracting and modelling information in a specific domain, the financial domain was employed as a use case in the context of a stock investment motivating scenario.
The developed knowledge-based approach exploits the semantic and syntactic characteristics of the problem domain knowledge in implementing a hybrid approach of rule-based and machine learning based relation classification. The rule-based approach is adopted in the Natural Language Processing tasks associated with linguistic and structural features, Named Entity Recognition, instance labelling, and feature generation. The results of these tasks are used to classify the relations between the named entities by means of machine learning based relation classification. In addition, the domain knowledge is analysed to support knowledge modelling by translating the domain's key concepts into a formal ontology. This ontology is employed in constructing a semantic knowledgebase from unstructured online data of a specific domain, enriching the resulting semantic knowledgebase by sourcing semi-structured and structured online data sources, and applying classification and inference technologies to infer new facts that improve decision-making and intelligent exploration activities. However, because of the specific characteristics of the problem domain knowledge, most relations are non-binary; hence an appropriate N-ary relation pattern technique was adopted and investigated.
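The hybrid pipeline, rule-based NER and feature generation feeding an ML relation classifier, can be sketched as below. The entity patterns, the feature set, and the keyword rule standing in for the trained classifier (an SVM in the thesis's experiments) are all illustrative, not the thesis's actual rules or model.

```python
import re

# Rule-based NER: toy patterns standing in for the NLP pipeline.
NER_PATTERNS = {
    "COMPANY": re.compile(r"\b[A-Z][a-zA-Z]+ (?:Inc|Corp|Ltd)\b"),
    "PRICE": re.compile(r"\$\d+(?:\.\d+)?"),
}

def extract_entities(sentence):
    """Return (surface form, label) pairs found by the rule patterns."""
    entities = []
    for label, pattern in NER_PATTERNS.items():
        for match in pattern.finditer(sentence):
            entities.append((match.group(), label))
    return entities

def features(sentence, e1, e2):
    """Feature generation for a candidate entity pair: the entity types
    and the text between the two mentions."""
    between = sentence[sentence.find(e1[0]) + len(e1[0]):sentence.find(e2[0])]
    return {"types": (e1[1], e2[1]), "between": between.strip().lower()}

def classify_relation(feats):
    """Stand-in for the trained relation classifier: one keyword rule."""
    if feats["types"] == ("COMPANY", "PRICE") and "trades at" in feats["between"]:
        return "HAS_SHARE_PRICE"
    return "NO_RELATION"
```

In the real framework the last step is a learned model over a much richer feature space; the point of the sketch is only the division of labour, with rules producing entities and features, and a classifier deciding the relation.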
A series of novel experiments was conducted to implement and configure machine learning based relation classification. The experimental evaluation evidenced that the developed knowledge-assisted ML relation classification model, further boosted by our implementation of genetic algorithms (GAs) to reduce the feature space, resulted in a significant improvement in the process of relation extraction. The experimental results also indicate that, amongst the implemented ML algorithms, SVM exhibited the best relation classification accuracy in the majority of the training datasets, while retaining acceptable levels of accuracy in the remaining training datasets.
Web Ontology Language (OWL) reasoning and rule-based reasoning on the resultant semantic knowledgebase were applied to derive stock investment specific recommendations. In addition, SPARQL query language was employed to explore the semantic knowledgebase. Moreover, taking into consideration the problem domain's requirements for modelling non-binary relations, a relation-as-class N-ary relations pattern was implemented, and the reasoning axioms and query language were adjusted to fit the intermediate resources in the N-ary relations requirements.
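The relation-as-class N-ary pattern mentioned above reifies a non-binary relation as an intermediate resource linked to each participant, since RDF triples are inherently binary. A minimal sketch, with illustrative `ex:` names rather than the thesis's actual ontology:

```python
def nary_relation(relation_id, relation_class, **participants):
    """Relation-as-class: materialise an n-ary relation as an intermediate
    resource of type relation_class, with one triple per participant role."""
    triples = [(relation_id, "rdf:type", relation_class)]
    for role, value in participants.items():
        triples.append((relation_id, f"ex:{role}", value))
    return triples

def query(triples, subject=None, predicate=None):
    """Minimal triple-pattern match, standing in for a SPARQL query that
    traverses the intermediate resource."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)]
```

A query for, say, the stock involved in a rating must go through the intermediate resource rather than a direct binary property, which is why reasoning axioms and queries need adjusting to accommodate these intermediate resources.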
This thesis also summarises the experience of addressing the challenges of implementing the proposed knowledge-based framework for constructing and exploiting a semantic knowledgebase. Domain experts and knowledge engineers can draw on this experience as a methodology for employing Semantic Web technologies to enable knowledge users to intelligently exploit knowledge in similar problem domains.
The evaluation of knowledge accessibility by utilising Semantic Web technologies in the developed application includes the ability to retrieve either all or a portion of the data from the semantic knowledgebase for a particular use-case scenario. Investigating the tasks of reasoning over, accessing, and querying the semantic knowledgebase evidences that Semantic Web technologies can support accurate and complex knowledge representation to share knowledge from a diversity of data sources, and improve both the decision-making process and the intelligent exploration of the semantic knowledgebase.
Matching Startup Founders to Investors: a Tool and a Study
The process of matching startup founders with venture capital investors is a
necessary first step for many modern technology companies, yet there have been
few attempts to study the characteristics of the two parties and their
interactions. Surprisingly little has been shown quantitatively about the
process, and many of the common assumptions are based on anecdotal evidence. In
this thesis, we aim to learn more about the matching component of the startup
fundraising process. We begin with a tool (VCWiz), created from the current set
of best-practices to help inexperienced founders navigate the founder-investor
matching process. The goal of this tool is to increase efficiency and
equitability, while collecting data to inform further studies. We use this
data, combined with public data on venture investments in the USA, to draw
conclusions about the characteristics of venture financing rounds. Finally, we
explore the communication data contributed to the tool by founders who are
actively fundraising, and use it to learn which social attributes are most
beneficial for individuals to possess when soliciting investments.
Comment: MIT Master of Engineering in Computer Science thesis, June 2018. 152 pages.
Linked data wrapper curation: A platform perspective
Linked Data Wrappers (LDWs) turn Web APIs into RDF end-points, leveraging the LOD cloud with current data. This potential is frequently undervalued, with LDWs regarded as mere by-products of larger endeavors, e.g. developing mashup applications. However, LDWs are mainly data-driven, not contaminated by application semantics, and hence have important potential for reuse. If LDWs could be decoupled from their breakout projects, this would increase the chances of LDWs becoming true RDF end-points. But this vision is still under threat from LDW fragility upon API upgrades and the risk of unmaintained LDWs. LDW curation might help. Similar to dataset curation, LDW curation aims to clean up datasets; in this case, however, the dataset is implicitly described by the LDW definition, and 'stains' are not limited to those related to dataset quality but also include those related to the underlying API. This requires the existence of LDW platforms that leverage existing code repositories with additional functionality catering for LDW definition, deployment, and curation. This dissertation contributes to this vision through: (1) identifying a set of requirements for LDW platforms; (2) instantiating these requirements in SYQL, a platform built upon Yahoo's YQL; (3) evaluating SYQL through a fully-developed proof of concept; and (4) validating the extent to which this approach facilitates LDW curation.