    Supporting software processes analysis and decision-making using provenance data

    Data provenance can be defined as the description of the origins of a piece of data and the process by which it arrived in a database. Provenance has been used successfully in the health sciences, the chemical industries, and scientific computing, since these areas require a comprehensive traceability mechanism. Moreover, companies have been increasing the amount of data they collect from their systems and processes, as the cost of memory and storage technologies has dropped in recent years. This thesis therefore investigates whether provenance models and techniques can support the analysis of software process execution and data-driven decision-making, given the growing availability of process data in companies. A provenance model for software processes was developed and evaluated by experts in the process and provenance areas. The work also includes an approach and supporting tooling for capturing and storing software process provenance data, inferring implicit information from it, and analyzing and visualizing it. In addition, a case study using data from industrial processes was conducted to evaluate the approach, with a discussion of several specific analysis and data-driven decision-making possibilities.
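The following minimal Python sketch makes the "inferring implicit information" step concrete. It is not the thesis's model or tooling. It assumes provenance is stored as (relation, subject, object) triples that use W3C PROV relation names (`used`, `wasGeneratedBy`, `wasAssociatedWith`), and it applies two rules until no new facts appear. The second rule derives `wasInformedBy` in the spirit of PROV's notion of communication between activities. The first rule and its `contributedTo` relation are hypothetical, as are all activity, artifact, and agent names.

```python
# Provenance facts as (relation, subject, object) triples. Relation names follow
# W3C PROV; all activities, artifacts and agents are hypothetical examples.
records = {
    ("used", "code_review", "patch_42"),               # activity used entity
    ("wasGeneratedBy", "patch_42", "implementation"),  # entity generated by activity
    ("wasGeneratedBy", "review_report", "code_review"),
    ("wasAssociatedWith", "implementation", "alice"),  # activity associated with agent
    ("wasAssociatedWith", "code_review", "bob"),
}


def infer(records):
    """Apply the inference rules repeatedly until a fixpoint is reached."""
    facts = set(records)
    while True:
        new = set()
        for (r1, x, y) in facts:
            for (r2, u, v) in facts:
                if y != u:  # both rules join on a shared middle term
                    continue
                # Hypothetical rule: wasGeneratedBy(e, a) and wasAssociatedWith(a, ag)
                # => contributedTo(ag, e)
                if r1 == "wasGeneratedBy" and r2 == "wasAssociatedWith":
                    new.add(("contributedTo", v, x))
                # used(a2, e) and wasGeneratedBy(e, a1) => wasInformedBy(a2, a1)
                if r1 == "used" and r2 == "wasGeneratedBy":
                    new.add(("wasInformedBy", x, v))
        new -= facts
        if not new:
            return facts
        facts |= new


# Print only the implicit facts that inference added.
for fact in sorted(infer(records) - records):
    print(fact)
```

Running the rules to a fixpoint means that facts derived in one pass can trigger further rules in the next pass once the rule set grows.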

    A framework for integrating syntax, semantics and pragmatics for computer-aided professional practice: With application of costing in construction industry

    Producing a bill of quantities is a knowledge-based, dynamic, and collaborative process that evolves with variances and current evidence. However, within the context of information system practice in building information modelling (BIM), cost estimation knowledge has not been represented, nor has it been integrated into BIM-based processes. This paper aims to establish an innovative means of taking data from the BIM model linked to a project and using it to create the items needed for a bill of quantities, so that cost estimation can be undertaken for the project. Our framework is founded on the premise that three components are necessary to gain a full understanding of the domain being computerised: the type of information to be assessed for compatibility (syntax), the definition of the pricing domain (semantics), and the precise implementation environment for the standards being taken into account (pragmatics). To achieve this, a prototype was created that automatically generates cost items for the bill of quantities by means of a semantic web ontology and a forward-chaining algorithm. Within this paper, ‘cost items’ are the elements included in a bill of quantities, including their description, quantity, and price. To validate the process, the authors applied it to the production of cost items, and the items created were compared with those produced by specialists. The framework thus introduces a new means of applying semantic web ontologies and forward chaining to professional practice in construction, resulting in automatic cost estimation. These key outcomes demonstrate that decoupling professional practice into the three key components of syntax, semantics, and pragmatics can provide tangible benefits in domain use.
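The abstract names a forward-chaining algorithm over a semantic web ontology but does not give its rules. The Python sketch below only illustrates the general technique under stated assumptions:

- BIM data is reduced to (element, property, value) triples.
- Each rule is a function that derives new triples.
- Rules fire repeatedly until nothing new is produced.

The element IDs, the `BrickWalling` work section, and the unit rate are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

# BIM-derived facts as (element, property, value) triples. All IDs, classes,
# quantities and rates below are hypothetical.
bim_facts = {
    ("wall_1", "type", "Wall"), ("wall_1", "material", "Brick"), ("wall_1", "area_m2", 12.0),
    ("wall_2", "type", "Wall"), ("wall_2", "material", "Concrete"), ("wall_2", "area_m2", 8.0),
}

UNIT_RATES = {"BrickWalling": 45.0}  # hypothetical price per m2


def rule_brick_walling(facts):
    """A Wall made of Brick belongs to the BrickWalling work section."""
    return {(s, "workSection", "BrickWalling")
            for (s, p, o) in facts
            if p == "type" and o == "Wall" and (s, "material", "Brick") in facts}


def rule_unit_rate(facts):
    """Attach a unit rate to every element whose work section has a known rate."""
    return {(s, "unitRate", UNIT_RATES[o])
            for (s, p, o) in facts
            if p == "workSection" and o in UNIT_RATES}


def forward_chain(facts, rules):
    """Fire every rule until no rule derives a fact that is not already known."""
    facts = set(facts)
    while True:
        new = set().union(*(rule(facts) for rule in rules)) - facts
        if not new:
            return facts
        facts |= new


@dataclass
class CostItem:
    element: str
    description: str
    quantity: float
    price: float


def value(facts, element, prop):
    """Return the first value of `prop` for `element`, or None if absent."""
    return next((o for (s, p, o) in facts if s == element and p == prop), None)


def bill_of_quantities(facts):
    """Build one cost item (description, quantity, price) per rated element."""
    items = []
    for element in sorted({s for (s, p, _) in facts if p == "unitRate"}):
        qty = value(facts, element, "area_m2")
        rate = value(facts, element, "unitRate")
        items.append(CostItem(element, value(facts, element, "workSection"), qty, qty * rate))
    return items


for item in bill_of_quantities(forward_chain(bim_facts, [rule_unit_rate, rule_brick_walling])):
    print(item)
```

The rules are deliberately listed in the "wrong" order. `rule_unit_rate` finds nothing on the first pass and fires only after `rule_brick_walling` has derived the work section. Repeating the passes is what makes this forward chaining rather than a single rule application. The concrete wall receives no cost item because no rule covers it.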

    Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure

    Big data research has attracted great attention in science, technology, industry, and society. It is developing alongside the evolving scientific paradigm, the fourth industrial revolution, and transformational technological innovation. However, its nature and fundamental challenge have not yet been recognized, and it has not yet formed its own methodology. This paper explores and answers the following questions: What is big data? What are the basic methods for representing, managing, and analyzing big data? What is the relationship between big data and knowledge? Can we find a mapping from big data into knowledge space? What kind of infrastructure is required to support not only big data management and analysis but also knowledge discovery, sharing, and management? What is the relationship between big data and the science paradigm? What is the nature and fundamental challenge of big data computing? A multi-dimensional perspective is presented toward a methodology of big data computing. (Comment: 59 pages)

    Big Data and Analytics: Issues and Challenges for the Past and Next Ten Years

    In this paper we continue the minitrack series of papers recognizing issues and challenges identified in the field of Big Data and Analytics, looking both at the past and forward. As this field has evolved, it has begun to encompass other analytical regimes, notably AI/ML systems. In this paper we focus on two areas. The first is continuing major issues on which some progress has been made. The second is new and emerging issues that we believe form the basis for near-term and future research in Big Data and Analytics. The Bottom Line: Big Data and Analytics is healthy, is growing in scope and evolving in capability, and is finding applicability in more problem domains than ever before.

    Defining and Conceptualizing Actionable Insight: A Conceptual Framework for Decision-centric Analytics

    Although actionable insight is widely recognized as the outcome of data analytics, there is no systematic, commonly agreed definition of the term. More importantly, existing definitions are generally too abstract to inform the design of data analytics systems. This study proposes a definition of actionable insight as a multi-component concept comprising analytic insight, synergic insight, and prognostic insight. This definition is informed by a conceptual framework, which can also be used to systematically understand actionable insight at both the concept level and the component level. Each component is explained from the analytical, cognitive, and computational perspectives, and relevant design considerations are suggested. We hope this study is a first step toward the realization of decision-centric data analytics that can deliver the promised actionable insight.

    Collaborative Decision Support and Documentation in Chemical Safety with KnowSEC

    To protect human health and the environment, the European Union implemented the REACH regulation for chemical substances. REACH is an acronym for Registration, Evaluation, Authorization, and Restriction of Chemicals. Under REACH, the authorities are tasked with assessing chemical substances, especially those that might pose a risk to human health or the environment. Work under REACH is a scientifically, technically, and procedurally complex and knowledge-intensive task. It is performed jointly by the European Chemicals Agency and the member state authorities in Europe. At the German Environment Agency, the assessment of substances under REACH is supported by the knowledge-based system KnowSEC, which is used for screening, documentation, and decision support when working on chemical substances. KnowSEC integrates advanced semantic technologies and strong problem-solving methods, and it supports collaborative work on substances in the context of the European REACH regulation. We discuss the applied methods and process models, and we report on experiences with the implementation and use of the system.

    Methodological approaches and techniques for designing ontologies in information systems requirements engineering

    Doctoral programme in Information Systems and Technology. The way we interact with the world around us is changing as new challenges arise. Organizations are embracing innovative business models, rethinking their structure and processes to maximize results, and evolving their change management. Judging by the projects carried out, current methodologies do not fully meet companies' needs. On the one hand, organizations are not familiar with the languages used in information systems; on the other, they are often unable to validate requirements or business models. These are some of the difficulties that led us to formulate a new approach. The state of the art presented in this document therefore includes a study of the models involved in the software development process, covering both traditional methods and the competing agile methods. In addition, we survey ontologies and the existing methods for conceiving, transforming, and representing them. After analyzing several of the possibilities currently available, we evolved a method and developed an approach for designing ontologies. The evolved and adapted method derives terminologies from a specific domain and aggregates them to facilitate the construction of a catalog of terminologies. The ontology design approach then enables the construction of a domain-specific ontology. In the first instance, this approach integrates and stores data from the different information systems of a given organization. In the second, it defines the rules for mapping and building the ontology database. Finally, we propose a technological architecture that maps an ontology by constructing complex networks, which map and relate terminologies.
This doctoral work encompasses numerous Research & Development (R&D) projects in different domains, such as the Software Industry, Textile Industry, Robotic Industry, and Smart Cities. Finally, we present a critical and descriptive analysis of the work done and point out perspectives for future work.
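The abstract does not specify how terminologies are aggregated into a catalog or related through complex networks. The following Python sketch is purely illustrative and is not the thesis's method; the system names, records, and normalization rule are all hypothetical. Terms from several information systems are normalized into a catalog that records which systems use each term. They are then linked in a co-occurrence network whose edge weights count how often two terms appear in the same record.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records from two information systems of one organization;
# each record lists the domain terms it mentions.
sources = {
    "erp": [["Order", "Customer", "Invoice"], ["Order", "Product"]],
    "crm": [["customer", "Contact"], ["Customer", "order"]],
}


def normalize(term):
    """Hypothetical normalization: trim whitespace and lower-case."""
    return term.strip().lower()


def build_catalog(sources):
    """Aggregate normalized terms into a catalog: term -> set of source systems."""
    catalog = defaultdict(set)
    for system, records in sources.items():
        for record in records:
            for term in record:
                catalog[normalize(term)].add(system)
    return dict(catalog)


def build_network(sources):
    """Relate terms that co-occur in a record; edge weight = co-occurrence count."""
    edges = defaultdict(int)
    for records in sources.values():
        for record in records:
            terms = sorted({normalize(t) for t in record})
            for a, b in combinations(terms, 2):  # undirected edge, stored as sorted pair
                edges[(a, b)] += 1
    return dict(edges)


for (a, b), weight in sorted(build_network(sources).items()):
    print(a, "--", b, weight)
```

Normalization is what lets "Customer" in one system and "customer" in another collapse into a single catalog entry. The heavier `customer -- order` edge reflects that the two terms co-occur in both systems.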

    Semantically defined Analytics for Industrial Equipment Diagnostics

    In this age of digitalization, industries everywhere accumulate massive amounts of data, to the point that data has become the lifeblood of the global economy. This data may come from various heterogeneous systems, equipment, components, sensors, and applications, in many varieties (diversity of sources), velocities (high rate of change), and volumes (sheer data size). Despite significant advances in the ability to collect, store, manage, and filter data, the real value lies in the analytics. Raw data is meaningless unless it is properly processed into actionable (business) insights. Those who know how to harness data effectively have a decisive competitive advantage. They raise performance by making faster and smarter decisions, improve short- and long-term strategic planning, offer more user-centric products and services, and foster innovation. Two distinct paradigms can be discerned in analytics practice: semantic-driven (deductive) and data-driven (inductive). The first uses logic to represent domain knowledge encoded in rules or ontologies, which are often carefully curated and maintained. However, these models are often highly complex and require intensive knowledge-processing capabilities. Data-driven analytics employs machine learning (ML) to learn a model directly from the data with minimal human intervention. However, these models are tuned to their training data and context, which makes them difficult to adapt. Industries that want to create value from data today must master both paradigms in combination. There is therefore a great need in data analytics to seamlessly combine semantic-driven and data-driven processing techniques. This combination must sit in an efficient and scalable architecture that allows actionable insights to be extracted from an extreme variety of data.
In this thesis, we address these needs by providing:
• A unified representation of domain-specific and analytical semantics in the form of ontology models, called the TechOnto Ontology Stack. It is a highly expressive, platform-independent formalism for capturing the conceptual semantics of industrial systems and their analytical functional semantics. Examples of such conceptual semantics are technical system hierarchies and component partonomies.
• A new ontology language, the Semantically defined Analytical Language (SAL), built on top of the ontology model. It extends DatalogMTL (a Horn fragment of Metric Temporal Logic) with analytical functions as first-class citizens.
• A method for generating semantic workflows using SAL, which helps in authoring, reusing, and maintaining complex analytical tasks and workflows in an abstract fashion.
• A multi-layer architecture that fuses knowledge-driven and data-driven analytics into a federated, distributed solution.
To our knowledge, this thesis is one of the first works to introduce and investigate semantically defined analytics in an ontology-based data access setting for industrial analytical applications. We focus our work and evaluation on industrial data for two reasons. The first is (i) the general adoption of semantic technology by industry. The second is (ii) the common need, in the literature and in practice, to let domain expertise drive data analytics over semantically interoperable sources while still harnessing the power of analytics to enable real-time data insights. Based on the evaluation results of three use-case studies, our approach surpasses state-of-the-art approaches in most application scenarios.
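The abstract does not give SAL's concrete syntax. The Python sketch below is only a rough illustration of the two ideas SAL combines, under several explicit assumptions. The first idea is a DatalogMTL-style rule `Overheating <- ⊟[0,10] HighTemp`, where ⊟ ("box minus") requires the body to hold throughout the past interval. The second is an analytical function (a windowed mean) used directly in a rule body. Time is approximated by discrete sensor samples rather than DatalogMTL's dense time. The readings, thresholds, and predicate names are hypothetical.

```python
from statistics import mean

# Hypothetical sensor readings for one component: (timestamp in s, temperature in C).
readings = [(0, 78.0), (5, 81.0), (10, 83.0), (15, 85.0), (20, 79.0), (25, 86.0)]


def window(readings, t, width):
    """Values sampled in the closed past window [t - width, t]."""
    return [v for (ts, v) in readings if t - width <= ts <= t]


def box_minus(pred, readings, t, width):
    """Discrete approximation of DatalogMTL's box-minus operator: pred held at
    every sample in [t - width, t], and there was at least one sample."""
    vals = window(readings, t, width)
    return bool(vals) and all(pred(v) for v in vals)


def overheating(readings, width=10, limit=80.0):
    """Timestamps where 'Overheating <- box-minus[0, width] HighTemp' fires,
    with the hypothetical HighTemp meaning temperature > limit."""
    return [t for (t, _) in readings
            if box_minus(lambda v: v > limit, readings, t, width)]


def avg_alert(readings, width=10, limit=82.0):
    """Timestamps where the windowed mean exceeds a limit: an analytical
    function used directly inside a (hypothetical) rule body."""
    return [t for (t, _) in readings if mean(window(readings, t, width)) > limit]


print("Overheating at:", overheating(readings))
print("Average alert at:", avg_alert(readings))
```

The two rules react differently to the single cooler sample at t = 20. The box-minus rule stops firing because the condition must hold at every point in the window. The mean-based rule keeps firing because the aggregate smooths the dip. Treating analytical functions as first-class rule ingredients is meant to make exactly this kind of choice expressible.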