Luzzu - A Framework for Linked Data Quality Assessment
With the increasing adoption and growth of the Linked Open Data cloud [9],
with RDFa, Microformats and other ways of embedding data into ordinary Web
pages, and with initiatives such as schema.org, the Web is currently being
complemented with a Web of Data. This Web of Data shares many
characteristics with the original Web of Documents, whose quality likewise
varies. This heterogeneity makes it challenging to determine the quality of
the data published on the Web and to subsequently make this information
explicit to data consumers. The main contribution of this article is LUZZU, a
quality assessment framework for Linked Open Data. Apart from providing quality
metadata and quality problem reports that can be used for data cleaning, LUZZU
is extensible: third-party metrics can easily be plugged into the framework. The
framework does not rely on SPARQL endpoints, and is thus free of all the
problems that come with them, such as query timeouts. Another advantage over
SPARQL-based quality assessment frameworks is that metrics implemented in
LUZZU can have more complex functionality than triple matching. Using the
framework, we performed a quality assessment of a number of statistical linked
datasets that are available on the LOD cloud. For this evaluation, 25 metrics
from ten different dimensions were implemented.
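Luzzu itself is implemented in Java; purely as an illustration of what a pluggable streaming metric looks like (all names below are invented for this sketch, not Luzzu's actual API), a metric can observe triples one at a time, with no SPARQL endpoint involved:

```python
from abc import ABC, abstractmethod

# Hypothetical plug-in interface: a metric observes triples one at a time
# (no SPARQL endpoint needed) and reports a value in [0, 1].
class StreamingQualityMetric(ABC):
    @abstractmethod
    def compute(self, subject, predicate, obj):
        """Observe a single triple."""

    @abstractmethod
    def value(self):
        """Return the final metric value."""

# Example third-party metric: fraction of distinct subjects with an rdf:type.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

class TypedSubjectsMetric(StreamingQualityMetric):
    def __init__(self):
        self.subjects, self.typed = set(), set()

    def compute(self, subject, predicate, obj):
        self.subjects.add(subject)
        if predicate == RDF_TYPE:
            self.typed.add(subject)

    def value(self):
        return len(self.typed) / len(self.subjects) if self.subjects else 1.0

triples = [
    ("ex:a", RDF_TYPE, "ex:City"),
    ("ex:a", "ex:population", "1000"),
    ("ex:b", "ex:population", "2000"),
]
metric = TypedSubjectsMetric()
for s, p, o in triples:
    metric.compute(s, p, o)
print(metric.value())  # 0.5: one of two subjects is typed
```

Because the metric only streams over triples, it can implement logic richer than a single triple-pattern match, which is the advantage the abstract claims over SPARQL-based assessment.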
Linked Data Quality Assessment and its Application to Societal Progress Measurement
In recent years, the Linked Data (LD) paradigm has emerged as a simple mechanism for employing the Web as a medium for data and knowledge integration, where both documents and data are linked. Moreover, the semantics and structure of the underlying data are kept intact, making this the Semantic Web. LD essentially entails a set of best practices for publishing and connecting structured data on the Web, which allows publishing and exchanging information in an interoperable and reusable fashion. Many different communities on the Internet, such as geographic, media, life sciences and government, have already adopted these LD principles. This is confirmed by the dramatically growing Linked Data Web, where currently more than 50 billion facts are represented.
With the emergence of the Web of Linked Data, several use cases become possible thanks to the rich and disparate data integrated into one global information space. Linked Data, in these cases, not only assists in building mashups by interlinking heterogeneous and dispersed data from multiple sources but also empowers the uncovering of meaningful and impactful relationships. These discoveries have paved the way for scientists to explore the existing data and uncover meaningful outcomes that they might not have been aware of previously.
In all these use cases utilizing LD, one crippling problem is the underlying data quality. Incomplete, inconsistent or inaccurate data affects the end results gravely, thus making them unreliable. Data quality is commonly conceived as fitness for use, be it for a certain application or use case. There are cases in which datasets that contain quality problems are still useful for certain applications; it depends on the use case at hand. Thus, LD consumption has to deal with the problem of getting the data into a state in which it can be exploited for real use cases. Insufficient data quality can be caused by the LD publication process or can be intrinsic to the data source itself.
A key challenge is to assess the quality of datasets published on the Web and make this quality information explicit. Assessing data quality is particularly challenging in LD, as the underlying data stems from a set of multiple, autonomous and evolving data sources. Moreover, the dynamic nature of LD makes assessing the quality crucial for measuring how accurately the real-world data is represented. On the document Web, data quality can only be indirectly or vaguely defined, but there is a requirement for more concrete and measurable data quality metrics for LD. Such data quality metrics include correctness of facts with respect to the real world, adequacy of semantic representation, quality of interlinks, interoperability, timeliness or consistency with regard to implicit information. Even though data quality is an important concept in LD, there are few methodologies proposed to assess the quality of these datasets.
Thus, in this thesis, we first unify 18 data quality dimensions and provide a total of 69 metrics for the assessment of LD. The first methodology includes the employment of LD experts for the assessment. This assessment is performed with the help of the TripleCheckMate tool, which was developed specifically to assist LD experts in assessing the quality of a dataset, in this case DBpedia. The second methodology is a semi-automatic process, in which the first phase involves the detection of common quality problems by the automatic creation of an extended schema for DBpedia. The second phase involves the manual verification of the generated schema axioms. Thereafter, we employ the wisdom of the crowd, i.e., workers on online crowdsourcing platforms such as Amazon Mechanical Turk (MTurk), to assess the quality of DBpedia. We then compare the two approaches (the previous assessment by LD experts and the assessment by MTurk workers in this study) in order to measure the feasibility of each type of user-driven data quality assessment methodology.
Additionally, we evaluate another semi-automated methodology for LD quality assessment, which also involves human judgement. In this semi-automated methodology, selected metrics are formally defined and implemented as part of a tool, namely R2RLint. The user is provided not only with the results of the assessment but also with the specific entities that cause the errors, which helps users understand the quality issues and fix them. Finally, we take into account a domain-specific use case that consumes LD and depends on data quality. In particular, we identify four LD sources, assess their quality using the R2RLint tool and then utilize them in building the Health Economic Research (HER) Observatory. We show the advantages of this semi-automated assessment over the other types of quality assessment methodologies discussed earlier. The Observatory aims at evaluating the impact of research development on the economic and healthcare performance of each country per year. We illustrate the usefulness of LD in this use case and the importance of quality assessment for any data analysis.
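To make an expert-versus-crowd comparison of this kind concrete, one common option is to score inter-rater agreement on the same triples, for example with Cohen's kappa. The statistic and the verdicts below are illustrative assumptions, not taken from the thesis:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two raters, corrected for chance agreement."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical verdicts ("ok" / "error") of an LD expert and an MTurk
# majority vote on the same ten DBpedia triples (invented data).
expert = ["ok", "ok", "error", "ok", "error", "ok", "ok", "error", "ok", "ok"]
crowd  = ["ok", "ok", "error", "ok", "ok",    "ok", "ok", "error", "ok", "error"]
print(round(cohens_kappa(expert, crowd), 3))  # 0.524
```

A kappa well above zero would indicate that the cheaper crowd assessment tracks the expert judgments beyond chance, which is one way to quantify the "feasibility" comparison the abstract describes.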
Completeness and Consistency Analysis for Evolving Knowledge Bases
Assessing the quality of an evolving knowledge base is a challenging task, as
it often requires identifying appropriate quality assessment procedures.
Since data is often derived from autonomous and increasingly large data
sources, it is impractical to curate the data manually, and challenging to
continuously and automatically assess its quality.
In this paper, we explore two main areas of quality assessment related to
evolving knowledge bases: (i) identification of completeness issues using
knowledge base evolution analysis, and (ii) identification of consistency
issues based on integrity constraints, such as minimum and maximum cardinality,
and range constraints.
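A minimal sketch of the three constraint kinds named above, with property names and bounds invented for illustration (not taken from the paper):

```python
# Integrity constraints of the three kinds named above, checked per subject.
# Property names and bounds are illustrative, not from the paper.
constraints = {
    "dbo:birthDate": {"min": 1, "max": 1, "range": "xsd:date"},
    "dbo:child":     {"min": 0, "max": 20, "range": "dbo:Person"},
}

def check_subject(subject, triples, value_types):
    """Return a list of violation messages for one subject's triples."""
    violations = []
    for prop, c in constraints.items():
        objs = [o for s, p, o in triples if s == subject and p == prop]
        if len(objs) < c["min"]:
            violations.append(f"{subject} {prop}: minimum cardinality {c['min']} violated")
        if len(objs) > c["max"]:
            violations.append(f"{subject} {prop}: maximum cardinality {c['max']} violated")
        for o in objs:
            if value_types.get(o) != c["range"]:
                violations.append(f"{subject} {prop}: range {c['range']} violated by {o}")
    return violations

triples = [
    ("dbr:Alice", "dbo:birthDate", '"1970-01-01"'),
    ("dbr:Alice", "dbo:child", "dbr:Bob"),
]
value_types = {'"1970-01-01"': "xsd:date", "dbr:Bob": "dbo:Person"}
print(check_subject("dbr:Alice", triples, value_types))  # []
```

A subject with no `dbo:birthDate` at all would trip the minimum-cardinality check, mirroring the paper's notion of a consistency issue detectable from profiled constraints.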
For completeness analysis, we use data profiling information from consecutive
knowledge base releases to estimate completeness measures that allow predicting
quality issues. Then, we perform consistency checks to validate the results of
the completeness analysis using integrity constraints and learning models.
The approach has been tested both quantitatively and qualitatively by using a
subset of datasets from both DBpedia and 3cixty knowledge bases. The
performance of the approach is evaluated using precision, recall, and F1 score.
From completeness analysis, we observe a 94% precision for the English DBpedia
KB and 95% precision for the 3cixty Nice KB. We also assessed the performance
of our consistency analysis by using five learning models over three sub-tasks,
namely minimum cardinality, maximum cardinality, and range constraint. We
observed that the best performing model in our experimental setup is the Random
Forest, reaching an F1 score greater than 90% for minimum and maximum
cardinality, and 84% for range constraints.
Comment: Accepted for the Journal of Web Semantics.
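The completeness signal described above can be sketched as follows (all counts invented): profile per-property counts in two consecutive releases, predict an issue wherever a count shrinks, then score the predictions against manually verified labels with precision, recall and F1:

```python
# Per-property subject counts profiled from two consecutive KB releases
# (invented numbers, for illustration only).
release_n  = {"dbo:birthDate": 1_200_000, "dbo:deathDate": 310_000, "dbo:spouse": 95_000}
release_n1 = {"dbo:birthDate": 1_250_000, "dbo:deathDate": 250_000, "dbo:spouse": 96_000}

# A property whose count shrinks between releases is predicted incomplete.
predicted = {p for p in release_n if release_n1.get(p, 0) < release_n[p]}

# Gold labels from manual inspection (again invented).
actual = {"dbo:deathDate"}

tp = len(predicted & actual)
precision = tp / len(predicted) if predicted else 0.0
recall = tp / len(actual) if actual else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
print(predicted, precision, recall, f1)
```

Here only `dbo:deathDate` shrinks, so it alone is flagged; precision, recall and F1 then quantify how well such flags match the verified quality issues, which is the shape of the evaluation the abstract reports.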
Mitigating linked data quality issues in knowledge-intense information extraction methods
Advances in research areas such as named entity linking and sentiment analysis have triggered the emergence of knowledge-intensive information extraction methods that combine classical information extraction with background knowledge from the Web. Despite data quality concerns, linked data sources such as DBpedia, GeoNames and Wikidata, which encode facts in a standardized structured format, are particularly attractive for such applications. This paper addresses the problem of data quality by introducing a framework that elaborates on linked data quality issues relevant to different stages of the background knowledge acquisition process, their impact on information extraction performance and applicable mitigation strategies. Applying this framework to named entity linking and data enrichment demonstrates the potential of the introduced mitigation strategies to lessen the impact of different kinds of data quality problems. An industrial use case that aims at the automatic generation of image metadata from image descriptions illustrates the successful deployment of knowledge-intensive information extraction in real-world applications and the constraints introduced by data quality concerns.
Evaluación del nivel de confianza de los recursos LOD en instancias CKAN (Evaluation of the trust level of LOD resources in CKAN instances)
Linked Open Data has been an initiative aimed at offering principles for the interconnection of data through machine-readable structures and knowledge representation schemes. At present, there are platforms that allow consuming LOD resources, CKAN being one of the most relevant, with a large community made up of governmental organizations, NGOs and others. However, the consumption of these resources lacks minimum criteria for determining their validity, such as the level of trust, quality, linkage and usability of the data; these aspects require a prior systematic analysis of the set of published data. To support this process of analysis and determination of the mentioned criteria, this paper presents a method for analyzing the current state of the datasets obtained from the different instances published in CKAN, with the aim of evaluating the levels of trust that they can offer from their sources. Finally, it presents results, conclusions and future work from the use of the tool for consuming datasets belonging to certain instances attached to the CKAN platform.
Automated Knowledge Base Quality Assessment and Validation based on Evolution Analysis
In recent years, numerous efforts have been put towards sharing Knowledge Bases (KB) in the Linked Open Data (LOD) cloud. These KBs are being used for various tasks, including performing data analytics or building question answering systems. Such KBs evolve continuously: their data (instances) and schemas can be updated, extended, revised and refactored. However, unlike in more controlled types of knowledge bases, the evolution of KBs exposed in the LOD cloud is usually unrestrained, which may cause data to suffer from a variety of quality issues, both at a semantic level and at a pragmatic level. This situation negatively affects data stakeholders (consumers, curators, etc.). Data quality is commonly related to the perception of fitness for use, for a certain application or use case. Therefore, ensuring the quality of the data of a knowledge base that evolves is vital. Since data is derived from autonomous, evolving, and increasingly large data providers, it is impractical to do manual data curation, and at the same time, it is very challenging to do a continuous automatic assessment of data quality. Ensuring the quality of a KB is a non-trivial task, since KBs are based on a combination of structured information supported by models, ontologies, and vocabularies, as well as queryable endpoints, links, and mappings. Thus, in this thesis, we explored two main areas in assessing KB quality: (i) quality assessment using KB evolution analysis, and (ii) validation using machine learning models. The evolution of a KB can be analyzed using fine-grained "change" detection at a low level or using the "dynamics" of a dataset at a high level. In this thesis, we present a novel knowledge base quality assessment approach using evolution analysis. The proposed approach uses data profiling on consecutive knowledge base releases to compute quality measures that allow detecting quality issues.
However, the first step in building the quality assessment approach was to identify the quality characteristics. Using high-level change detection as measurement functions, in this thesis we present four quality characteristics: Persistency, Historical Persistency, Consistency and Completeness. The persistency and historical persistency measures concern the degree of change and the lifespan of any entity type. The consistency and completeness measures identify properties with incomplete information and contradictory facts. The approach has been assessed both quantitatively and qualitatively on a series of releases from two knowledge bases: eleven releases of DBpedia and eight releases of 3cixty Nice. However, high-level changes, being coarse-grained, cannot capture all possible quality issues. In this context, we present a validation strategy whose rationale is twofold: first, use manual validation from qualitative analysis to identify the causes of quality issues; then, use RDF data profiling information to generate integrity constraints. The validation approach relies on the idea of inducing RDF shapes by exploiting SHACL constraint components. In particular, this approach learns which integrity constraints can be applied to a large KB through a process of statistical analysis followed by a learning model. We illustrate the performance of our validation approach by using five learning models over three sub-tasks, namely minimum cardinality, maximum cardinality, and range constraint. The techniques of quality assessment and validation developed during this work are automatic and can be applied to different knowledge bases independently of the domain. Furthermore, the measures are based on simple statistical operations that make the solution both flexible and scalable.
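One plausible reading of the persistency characteristic (a simplification for illustration, not the thesis's exact formula): an entity type is persistent if its instance count never drops across consecutive releases, since a drop suggests entities were lost:

```python
def persistency(counts):
    """1 if the instance count never drops across consecutive releases, else 0.

    `counts` are instance counts of one entity type over ordered KB releases.
    A drop suggests entities disappeared, i.e. a potential completeness issue.
    (Simplified reading of the measure, not the thesis's exact formula.)
    """
    return int(all(b >= a for a, b in zip(counts, counts[1:])))

print(persistency([100, 120, 125]))  # 1: monotone growth across releases
print(persistency([100, 120, 90]))   # 0: entities disappeared in release 3
```

Such a coarse signal is cheap to compute from profiling data alone, which is why, as the abstract notes, it must be complemented by finer-grained constraint-based validation.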
Scalable Quality Assessment of Linked Data
In a world where the information economy is booming, poor data quality can lead to adverse consequences, including social and economic problems such as a decrease in revenue. Furthermore, data-driven industries are not just relying on their own (proprietary) data silos, but are also continuously aggregating data from different sources. This aggregation could then be re-distributed back to "data lakes". However, this data (including Linked Data) is not necessarily checked for its quality prior to its use. Large volumes of data are being exchanged in a standard and interoperable format between organisations and published as Linked Data to facilitate their re-use. Some organisations, such as government institutions, take a step further and open their data. The Linked Open Data Cloud is a witness to this. However, similar to data in data lakes, it is challenging to determine the quality of this heterogeneous data, and subsequently to make this information explicit to data consumers. Despite the availability of a number of tools and frameworks to assess Linked Data quality, the current solutions do not offer a holistic approach that both enables the assessment of datasets and provides consumers with quality results that can then be used to find, compare and rank datasets' fitness for use. In this thesis we investigate methods to assess the quality of (possibly large) linked datasets with the intent that data consumers can then use the assessment results to find datasets that are fit for use, that is, finding the right dataset for the task at hand.
Moreover, the benefits of quality assessment are two-fold: (1) data consumers do not need to rely blindly on subjective measures to choose a dataset, but can base their choice on multiple factors such as the intrinsic structure of the dataset, thereby fostering trust and reputation between publishers and consumers on more objective foundations; and (2) data publishers can be encouraged to improve their datasets so that they can be re-used more. Furthermore, our approach scales for large datasets. In this regard, we also look into improving the efficiency of quality metrics using various approximation techniques. However, the trade-off is that consumers will not get the exact quality value, but a very close estimate, which still provides the required guidance towards fitness for use. The central point of this thesis is not data quality improvement; nonetheless, we still need to understand what data quality means to the consumers who are searching for potential datasets. This thesis looks into the challenges faced in detecting quality problems in linked datasets, presenting quality results in a standardised, machine-readable and interoperable format that agents can make sense of in order to help human consumers identify datasets fit for use. Our proposed approach is more consumer-centric, in that it looks into (1) making the assessment of quality as easy as possible, that is, allowing stakeholders, possibly non-experts, to identify and easily define quality metrics and to initiate the assessment; and (2) making results (quality metadata and quality reports) easy for stakeholders to understand, or at least interoperable with other systems to facilitate a possible data quality pipeline. Finally, our framework is used to assess the quality of a number of heterogeneous (large) linked datasets, where each assessment returns a quality metadata graph that can be consumed by agents as Linked Data.
In turn, these agents can intelligently interpret a dataset's quality with regard to multiple dimensions and observations, and thus provide further insight to consumers regarding its fitness for use.
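The approximation idea mentioned above can be illustrated with plain uniform sampling: estimate a ratio-style metric on a random sample of triples instead of a full scan, trading exactness for speed. This is a generic sketch under that assumption, not the thesis's specific estimators:

```python
import random

def estimate_metric(triples, predicate_fn, sample_size, seed=42):
    """Estimate the fraction of triples satisfying `predicate_fn`
    from a uniform random sample, instead of a full scan."""
    rng = random.Random(seed)
    sample = rng.sample(triples, min(sample_size, len(triples)))
    return sum(predicate_fn(t) for t in sample) / len(sample)

# Synthetic dataset: 100,000 triples, 10% of which carry a "bad" object.
triples = [("s", "p", "bad" if i % 10 == 0 else "ok") for i in range(100_000)]

exact = sum(t[2] == "ok" for t in triples) / len(triples)          # 0.9, full scan
approx = estimate_metric(triples, lambda t: t[2] == "ok", 2_000)   # close to 0.9
print(exact, round(approx, 2))
```

With a sample of 2,000 the estimate's standard error is below one percentage point here, so the consumer gets a close (not exact) value far faster, which is precisely the trade-off the abstract describes.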
Окружење за анализу и оцену квалитета великих и повезаних података (A framework for the analysis and quality assessment of big and linked data)
Linking and publishing data in the Linked Open Data format increases the interoperability
and discoverability of resources over the Web. To accomplish this, the process comprises
several design decisions based on the Linked Data principles, which, on the one hand,
recommend using standards for the representation of and access to data on the Web, and,
on the other hand, recommend setting hyperlinks between data from different sources.
Despite the efforts of the World Wide Web Consortium (W3C), the main international
standards organization for the World Wide Web, there is no single tailored formula for
publishing data as Linked Data. In addition, the quality of the published Linked Open Data
(LOD) is a fundamental issue, and it has yet to be thoroughly managed and considered.
In this doctoral thesis, the main objective is to design and implement a novel framework for
selecting, analyzing, converting, interlinking, and publishing data from diverse sources,
simultaneously paying great attention to quality assessment throughout all steps and modules
of the framework. The goal is to examine whether and to what extent Semantic Web
technologies are applicable for merging data from different sources and enabling end-users to
obtain additional information that was not available in the individual datasets, in addition to the
integration into the Semantic Web community space. Additionally, the Ph.D. thesis intends to
validate the applicability of the process in the specific and demanding use case, i.e. for creating
and publishing an Arabic Linked Drug Dataset, based on open drug datasets from selected
Arabic countries and to discuss the quality issues observed in the linked data life-cycle. To that
end, in this doctoral thesis, a Semantic Data Lake was established in the pharmaceutical domain
that allows further integration and developing different business services on top of the
integrated data sources. Through data representation in an open machine-readable format, the
approach offers an optimum solution for information and data dissemination, for building
domain-specific applications, and for enriching and gaining value from the original dataset. This thesis
showcases how the pharmaceutical domain benefits from the evolving research trends for
building competitive advantages. However, as it is elaborated in this thesis, a better
understanding of the specifics of the Arabic language is required to extend linked data
technologies utilization in targeted Arabic organizations.
Investointipäätökseen vaikuttavat faktat ja tuntemukset maaliikenteessä (Facts and feelings influencing investment decisions in land transport)
The conventional approach in economics has assumed business decisions to follow rational and logical reasoning. Management accounting (MA) has been defined as information designed to enable rational decision makers to make optimal decisions. However, the recent MA literature has shown that managerial decisions also rely on feeling and intuition. Institutional theory has been applied to illustrate the observed bounded rationality in organizations, but it does not explain why and when investment discussions culminate in investment actions. More research is needed to understand the decision-making process in organizations and how the intertwined facts and feelings of the decision affect the process. When the influence of fact- and feeling-driven criteria in decisions is recognized, the actual role of MA in decision making must be reconsidered. Alternative fuel vehicle investments offer a complex and appealing context to study the topic, as the investment discussions are influenced by a diverse set of facts and feelings ranging from fuel cost savings to environmental values.
The thesis focused particularly on the engagement of facts and feelings related to investment decisions on natural gas and biogas vehicles. The topic was covered by creating a framework explaining the interaction of facts and feelings in the decision-making process. It also described how the discussions eventually lead to actual investment behaviour and how to recognize the role of MA in the process. The research material was gathered through an interventionist case study setting by creating MA tools for a case company and interviewing B2B customers about their natural gas vehicle investment decisions.
The findings contributed to the discussion on factors affecting investment decisions in road transportation. The study suggests that an investment decision requires both a factual grounding and support from the decision maker's values in order to form a real investment possibility. The fact- and feeling-driven decision criteria vary with the nature of the investment. Therefore, the role of MA also differs depending on the hierarchy of the criteria. Calculations alone do not determine the outcome of the investment decision in the road transportation context.