
    Data-driven conceptual modeling: how some knowledge drivers for the enterprise might be mined from enterprise data

    As organizations conduct their business, they analyze, design and manage a variety of processes, represented in models of differing scope and complexity. Specifying these processes requires a certain level of modeling competence, yet this requirement is often not matched by the capability of the person(s) responsible for defining and modeling an organization's or enterprise's operations. On the other hand, an enterprise typically records the events that occur during the operation of its processes. Records such as the start and end of tasks in a process instance, the state transitions of objects affected by process execution, and the messages exchanged during execution are maintained in enterprise repositories as various logs: event logs, process logs, effect logs, message logs, etc. Furthermore, the volume of data generated by enterprise process execution has grown manyfold in just a few years. Models, in turn, are often considered the dashboard view of an enterprise: they represent an abstraction of its underlying reality and serve as knowledge drivers through which the enterprise can be managed. Data-driven extraction offers the capability to mine these knowledge drivers from enterprise data and to leverage the mined models to establish the set of enterprise data that conforms with the desired behaviour. This thesis aims to generate models, or knowledge drivers, from enterprise data, enabling a kind of dashboard view of the enterprise to support analysts. The rationale is stated as the requirement to improve an existing process or to create a new one; models can also serve as a collection of effectors through which an organization or enterprise can be managed. The enterprise data referred to above are identified as process logs, effect logs, message logs, and invocation logs. The approach taken in this thesis is to mine these logs to generate process, requirements, and enterprise architecture models, and to determine how goals are fulfilled based on collected operational data. The research question is formulated as follows: is it possible to derive the knowledge drivers from the enterprise data that represent the running operation of the enterprise, or, in other words, is it possible to use the data available in the enterprise repository to generate the knowledge drivers? Chapter 2 presents a review of the literature that provides the background needed to explore this research question. Chapter 3 presents how process semantics can be mined. Chapter 4 suggests a way to extract a requirements model. Chapter 5 presents a way to discover the underlying enterprise architecture, and Chapter 6 presents a way to mine how goals get orchestrated. The overall findings are discussed in Chapter 7 to derive some conclusions.
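
    The abstract does not reproduce the thesis's mining algorithms, so as a minimal, hypothetical illustration of what mining process semantics from an event log can look like, the sketch below derives a directly-follows relation, a common starting point of process-discovery algorithms, from a toy log. The log format, case identifiers, and activity names are all assumptions.

```python
from collections import defaultdict

# Toy event log: (case_id, activity) pairs, assumed already ordered by
# timestamp within each case. All names here are illustrative.
event_log = [
    ("c1", "receive_order"), ("c1", "check_stock"), ("c1", "ship"),
    ("c2", "receive_order"), ("c2", "check_stock"),
    ("c2", "back_order"), ("c2", "ship"),
]

def mine_directly_follows(log):
    """Count how often activity a is directly followed by activity b
    within the same case -- the relation many discovery algorithms
    (e.g. the alpha algorithm) build on."""
    traces = defaultdict(list)
    for case_id, activity in log:
        traces[case_id].append(activity)
    df = defaultdict(int)
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            df[(a, b)] += 1
    return df

for (a, b), n in sorted(mine_directly_follows(event_log).items()):
    print(f"{a} -> {b}: {n}")
```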

    Linked Data Entity Summarization

    On the Web, the amount of structured and Linked Data about entities is constantly growing. Descriptions of single entities often include thousands of statements, and it becomes difficult to comprehend the data unless a selection of the most relevant facts is provided. This doctoral thesis addresses the problem of Linked Data entity summarization. The contributions comprise two entity summarization approaches, a common API for entity summarization, and an approach for entity data fusion.
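
    As a rough, hypothetical illustration of statement ranking for entity summarization (not the approaches contributed by the thesis), the sketch below scores an entity's statements by predicate rarity and keeps the top k. The predicates, frequency counts, and the scoring heuristic are all assumptions.

```python
import math

# Toy description of one entity as (predicate, object) pairs; real Linked
# Data descriptions may hold thousands of such statements.
statements = [
    ("type", "Person"), ("type", "Scientist"),
    ("birthPlace", "Ulm"), ("field", "Physics"),
    ("award", "Nobel Prize in Physics"), ("knownFor", "Relativity"),
]

# Assumed global predicate frequencies over the whole dataset: rarer
# predicates are treated as more informative, one simple heuristic among
# many possible notions of relevance.
predicate_freq = {"type": 1000, "birthPlace": 400, "field": 150,
                  "award": 40, "knownFor": 25}

def summarize(stmts, freq, k=3):
    """Return the k statements whose predicates are rarest globally."""
    total = sum(freq.values())
    score = lambda s: -math.log(freq[s[0]] / total)
    return sorted(stmts, key=score, reverse=True)[:k]

print(summarize(statements, predicate_freq))
```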

    Qualitative robot planning of object moving by pushing

    The thesis belongs to the fields of Artificial Intelligence, robotics and qualitative reasoning. The purpose of the work is to use a qualitative simulator for planning the qualitative actions of a robot. Our modification of the well-known QSIM algorithm generates a state space, which we search with the heuristic search algorithm A*. Implementations of all algorithms are written in the programming language Prolog. Some machine learning algorithms induce qualitative models using QDE constraints that are not defined in the original QSIM algorithm. One of these QDE constraints is monotonicity in multiple variables; this constraint was implemented and tested on an artificial domain. The generated robot plans have been tested on an object-pushing simulator based on the Box2D engine. For this purpose, an algorithm for plan execution was developed. This plan execution algorithm communicates through an interface, also developed as part of the thesis, which is responsible for converting numerical data into qualitative states and for executing qualitative actions on the simulator. The plans developed by the proposed algorithm have been tested in two object-pushing domains: pushing a vertical cylinder and pushing a block. A hand-built qualitative model was constructed for each domain. The thesis concludes with an examination of the achieved objectives, a review of potential challenges in the implementation of the algorithms, and a review of ideas for further research.
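
    The thesis's implementations are in Prolog and not shown in the abstract; as a language-neutral sketch, the Python below illustrates two ingredients it mentions: a consistency check for a monotonicity-in-multiple-variables QDE constraint, and a generic A* search over a discrete state space such as the one a QSIM-style simulator generates. The direction encoding and function signatures are assumptions.

```python
import heapq
from itertools import count

def consistent_m_plus(dy, dxs):
    """Check qualitative directions of change against y = M+(x1, ..., xn),
    i.e. y monotonically increasing in every argument. Directions are
    encoded as -1 (decreasing), 0 (steady), +1 (increasing)."""
    if all(d == 0 for d in dxs):
        return dy == 0            # all inputs steady -> y steady
    if all(d >= 0 for d in dxs):
        return dy == 1            # some input rising, none falling
    if all(d <= 0 for d in dxs):
        return dy == -1           # some input falling, none rising
    return True                   # mixed directions: ambiguous, allowed

def a_star(start, is_goal, successors, heuristic):
    """Generic A* over a discrete state space such as the qualitative
    states produced by a QSIM-style simulator. successors(state) yields
    (action, next_state, step_cost) triples."""
    tie = count()                 # tie-breaker so states are never compared
    frontier = [(heuristic(start), next(tie), 0, start, [])]
    visited = set()
    while frontier:
        _, _, g, state, plan = heapq.heappop(frontier)
        if is_goal(state):
            return plan
        if state in visited:
            continue
        visited.add(state)
        for action, nxt, cost in successors(state):
            if nxt not in visited:
                heapq.heappush(frontier, (g + cost + heuristic(nxt),
                                          next(tie), g + cost, nxt,
                                          plan + [action]))
    return None                   # goal unreachable in this state space
```

    With an admissible heuristic the returned plan is cost-optimal; with a zero heuristic the search degenerates to uniform-cost search.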

    A Framework for Semantic Similarity Measures to enhance Knowledge Graph Quality

    Precisely determining similarity values among real-world entities is a building block for data-driven tasks, e.g., ranking, relation discovery or integration. Semantic Web and Linked Data initiatives have promoted the publication of large semi-structured datasets in the form of knowledge graphs. Knowledge graphs encode semantics that describe resources in terms of several aspects or resource characteristics, e.g., neighbors, class hierarchies or attributes. Existing similarity measures take these aspects into account in isolation, which may prevent them from delivering accurate similarity values. In this thesis, the resource characteristics relevant to accurately determining similarity values are identified and considered cumulatively in a framework of four similarity measures. Additionally, the impact of considering these resource characteristics during the computation of similarity values is analyzed in three data-driven tasks for the enhancement of knowledge graph quality. First, new similarity measures able to combine two or more of the identified resource characteristics are described. In total, four similarity measures are presented in evolutionary order: while the first three, OnSim, IC-OnSim and GADES, combine the resource characteristics according to a human-defined aggregation function, the last one, GARUM, uses a machine learning regression approach to determine the relevance of each resource characteristic during the computation of the similarity. Second, the suitability of each measure for real-time applications is studied by means of a theoretical and an empirical comparison. The theoretical comparison consists of a study of the worst-case computational complexity of each similarity measure. The empirical comparison is based on the execution times of the different similarity measures in two third-party benchmarks involving the comparison of semantically annotated entities. Ultimately, the impact of the described similarity measures is shown in three data-driven tasks for the enhancement of knowledge graph quality: relation discovery, dataset integration and evolution analysis of annotation datasets. Empirical results show that the relation discovery and dataset integration tasks obtain better results when the semantics encoded in semantic similarity measures are considered. Further, using semantic similarity measures in the evolution analysis task allows for defining new informative metrics that give an overview of the evolution of the whole annotation set, rather than of individual annotations as in state-of-the-art evolution analysis frameworks.
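
    As a small, hypothetical sketch of the cumulative idea (in the spirit of GADES, not its actual definition), the code below aggregates three resource characteristics, neighbors, class hierarchy, and attributes, with a fixed weighted sum; GARUM would instead learn the relevance of each characteristic with a regression model. The weights, set-based representation, and example entities are assumptions.

```python
def jaccard(a, b):
    """Set overlap in [0, 1]; defined as 1.0 when both sets are empty."""
    return len(a & b) / len(a | b) if (a | b) else 1.0

def combined_similarity(r1, r2, weights=(0.4, 0.3, 0.3)):
    """Aggregate several resource characteristics into one similarity
    value. Each resource is a dict with 'neighbors', 'classes' (ancestors
    in the class hierarchy) and 'attributes' sets; the weights are
    illustrative, not learned."""
    w_n, w_c, w_a = weights
    return (w_n * jaccard(r1["neighbors"], r2["neighbors"])
            + w_c * jaccard(r1["classes"], r2["classes"])
            + w_a * jaccard(r1["attributes"], r2["attributes"]))

e1 = {"neighbors": {"Berlin", "Physics"}, "classes": {"Person", "Scientist"},
      "attributes": {"name", "birthDate"}}
e2 = {"neighbors": {"Physics", "Princeton"}, "classes": {"Person", "Scientist"},
      "attributes": {"name", "deathDate"}}
print(combined_similarity(e1, e2))  # about 0.533 for this toy pair
```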

    Exploiting general-purpose background knowledge for automated schema matching

    The schema matching task is an integral part of the data integration process; it is usually the first step in integrating data. Schema matching is typically very complex and time-consuming and is, therefore, to the largest part, carried out by humans. One reason for the low degree of automation is that schemas are often defined with deep background knowledge that is not itself present within the schemas. Overcoming the problem of missing background knowledge is a core challenge in automating the data integration process. In this dissertation, the task of matching semantic models, so-called ontologies, with the help of external background knowledge is investigated in depth in Part I. Throughout the thesis, the focus lies on large, general-purpose resources, since domain-specific resources are rarely available for most domains. Besides new knowledge resources, the thesis also explores new strategies for exploiting such resources. A technical base for the development and comparison of matching systems is presented in Part II. The framework introduced there allows for simple, modularized matcher development (with background knowledge sources) and for extensive evaluations of matching systems. Among the largest structured sources of general-purpose background knowledge are knowledge graphs, which have grown significantly in size in recent years. However, exploiting such graphs is not trivial. In Part III, knowledge graph embeddings are explored, analyzed, and compared, and multiple improvements to existing approaches are presented. In Part IV, numerous concrete matching systems that exploit general-purpose background knowledge are presented, and exploitation strategies and resources are analyzed and compared. The dissertation closes with a perspective on real-world applications.
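
    As a hypothetical illustration of one exploitation strategy in the spirit of Parts III and IV, the sketch below aligns two schemas by the cosine similarity of pre-computed knowledge graph embeddings of their concepts, extracting a greedy one-to-one alignment. The vectors, concept labels, and threshold are made up; the dissertation's actual matchers and resources differ.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equally sized vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def match_concepts(source, target, threshold=0.8):
    """Greedy 1:1 alignment of two schemas whose concepts were embedded
    beforehand (e.g. with knowledge graph embeddings trained on a large
    general-purpose resource)."""
    candidates = sorted(
        ((cosine(u, v), s, t) for s, u in source.items()
                              for t, v in target.items()),
        reverse=True)
    used_s, used_t, alignment = set(), set(), []
    for score, s, t in candidates:
        if score >= threshold and s not in used_s and t not in used_t:
            alignment.append((s, t, round(score, 3)))
            used_s.add(s)
            used_t.add(t)
    return alignment

src = {"Author": [0.9, 0.1, 0.2], "Paper": [0.1, 0.8, 0.3]}
tgt = {"Writer": [0.85, 0.15, 0.25], "Article": [0.2, 0.75, 0.35]}
print(match_concepts(src, tgt))
```

    Greedy extraction is only one way to turn a similarity matrix into an alignment; stable-marriage or Hungarian-style assignment are common alternatives.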

    Data Science and Knowledge Discovery

    Data Science (DS) is gaining significant importance in the decision process because it combines several areas, including Computer Science, Machine Learning, Mathematics and Statistics, domain/business knowledge, software development, and traditional research. In the business field, applying DS allows the use of scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data to support the decision process. After the data have been collected, it is crucial to discover the knowledge they contain. In this step, Knowledge Discovery (KD) tasks are used to create knowledge from structured and unstructured sources (e.g., text, data, and images). The output needs to be in a readable and interpretable format, and it must represent knowledge in a manner that facilitates inferencing. KD is applied in several areas, such as education, health, accounting, energy, and public administration. This book includes fourteen excellent articles that discuss this trending topic and present innovative solutions to show the importance of Data Science and Knowledge Discovery to researchers, managers, industry, society, and other communities. The chapters address several topics, including Data Mining, Deep Learning, Data Visualization and Analytics, Semantic Data, Geospatial and Spatio-Temporal Data, Data Augmentation, and Text Mining.