
    Approximate Data Mining Techniques on Clinical Data

    The past two decades have witnessed an explosion in the number of medical and healthcare datasets available to researchers and healthcare professionals. This abundance of data calls for appropriate data mining techniques and tools that can automatically extract relevant information from the data and thereby provide insight into the clinical behaviors and processes it captures. Since these tools should support the decision-making activities of medical experts, all the extracted information must be represented in a human-friendly way, that is, in a concise and easy-to-understand form. To this end, we propose a new framework that brings together several new mining techniques and tools. These techniques focus on two main aspects: the temporal one and the predictive one. All of them were applied to clinical data, in particular ICU data from the MIMIC-III database, demonstrating the flexibility of the framework, which can retrieve different kinds of outcomes from the same dataset. The first two techniques rely on the concept of Approximate Temporal Functional Dependencies (ATFDs). ATFDs, with their suitable treatment of temporal information, have been proposed as a methodological tool for mining clinical data. An example of the knowledge derivable through such dependencies is "within 15 days, patients with the same diagnosis and the same therapy usually receive the same daily amount of drug". However, current ATFD models do not analyze the temporal evolution of the data, as in "for most patients with the same diagnosis, the same drug is prescribed after the same symptom". To address this, we propose a new kind of ATFD called Approximate Pure Temporally Evolving Functional Dependencies (APEFDs). A further limitation of such dependencies is that they cannot deal with quantitative data when some tolerance is allowed for numerical values. This limitation arises in particular in clinical data warehouses, where analysis and mining have to consider one or more quantitative measures (such as lab test results and vital signs) with respect to multiple dimensional (alphanumeric) attributes (such as patient, hospital, physician, and diagnosis) and time dimensions (such as the day since hospitalization and the calendar date). For this scenario, we introduce another new kind of ATFD, named Multi-Approximate Temporal Functional Dependency (MATFD), which considers dependencies between dimensions and quantitative measures in temporal clinical data. These new dependencies may provide knowledge such as "within 15 days, patients with the same diagnosis and the same therapy receive a daily amount of drug within a fixed range".
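
    As a minimal illustration of the idea behind such dependency checks, the following pandas sketch tests whether rows sharing the same diagnosis and therapy within a 15-day window have (approximately) the same daily drug amount; the column names, tolerance, and data are hypothetical, and this is not the thesis's actual algorithm.

```python
# Sketch: checking a dependency of the form "within 15 days, patients with
# the same diagnosis and therapy receive (nearly) the same daily amount of
# drug". Column names and data are hypothetical.
import pandas as pd

def atfd_violation_ratio(df, window_days=15, tolerance=0.0):
    """Fraction of rows violating (diagnosis, therapy) -> daily_amount
    within a sliding time window. A tolerance > 0 turns the strict ATFD
    check into a MATFD-style range check."""
    violations = 0
    for _, group in df.groupby(["diagnosis", "therapy"]):
        group = group.sort_values("date")
        for _, row in group.iterrows():
            # rows for the same diagnosis/therapy within `window_days`
            near = group[(group["date"] - row["date"]).abs()
                         <= pd.Timedelta(days=window_days)]
            # violated if amounts in the window differ beyond the tolerance
            if near["daily_amount"].max() - near["daily_amount"].min() > tolerance:
                violations += 1
    return violations / len(df)

records = pd.DataFrame({
    "diagnosis": ["J18", "J18", "J18"],
    "therapy": ["T1", "T1", "T1"],
    "date": pd.to_datetime(["2019-01-01", "2019-01-05", "2019-01-10"]),
    "daily_amount": [500.0, 500.0, 750.0],
})
print(atfd_violation_ratio(records))                   # 1.0: strict ATFD violated
print(atfd_violation_ratio(records, tolerance=300.0))  # 0.0: amounts within range
```
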
    The other techniques are based on pattern mining, which has also been proposed as a methodological tool for mining clinical data. However, many of the methods proposed so far focus on mining temporal rules that describe relationships between data sequences or instantaneous events, without considering the presence of more complex temporal patterns in the dataset. Such patterns, for example the trend of a particular vital sign, are often highly relevant for clinicians. Moreover, it is of great interest to discover whether some kind of event, such as a drug administration, can change these trends, and how. To address this, we propose a new kind of temporal pattern, called Trend-Event Patterns (TEPs), which focuses on events and their influence on trends retrievable from measures such as vital signs. With TEPs we can express concepts such as "the administration of paracetamol to a patient with an increasing temperature leads to a decreasing trend in temperature after the administration occurs". We also analyze another pattern mining technique that includes prediction. This technique discovers a compact set of patterns that describe the condition (or class) of interest. Our framework relies on a classification model that considers and combines various predictive pattern candidates and selects only those that improve the overall class prediction performance. We show that our classification approach achieves a significant reduction in the number of extracted patterns compared to state-of-the-art methods based on minimal predictive pattern mining, while preserving the overall classification accuracy of the model. For each technique described above, we developed a tool that retrieves the corresponding kind of rule. All results were obtained by pre-processing and mining clinical data, in particular ICU data from the MIMIC-III database.
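
    The trend/event idea behind TEPs can be illustrated with a small sketch that classifies a vital sign's trend before and after an event from the sign of a fitted slope. The slope-based classification, thresholds, and data are illustrative assumptions, not the mining algorithm developed in the thesis.

```python
# Sketch: deriving a trend-event pattern such as
# "increasing temperature -> paracetamol -> decreasing temperature".
# The slope-based trend classification is an illustrative simplification.
import numpy as np

def trend(values):
    """Classify a measurement series as increasing/decreasing/stable
    from the sign of a least-squares slope."""
    slope = np.polyfit(range(len(values)), values, deg=1)[0]
    if slope > 0.05:
        return "increasing"
    if slope < -0.05:
        return "decreasing"
    return "stable"

def trend_event_pattern(series, event_index, event_label, width=4):
    """Pair the trends immediately before and after an event."""
    before = trend(series[max(0, event_index - width):event_index])
    after = trend(series[event_index:event_index + width])
    return (before, event_label, after)

temperature = [37.0, 37.4, 37.9, 38.3, 38.6, 38.2, 37.8, 37.3, 36.9]
# hypothetical paracetamol administration at sample index 4
print(trend_event_pattern(temperature, 4, "paracetamol"))
# -> ('increasing', 'paracetamol', 'decreasing')
```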

    Semantically defined Analytics for Industrial Equipment Diagnostics

    In this age of digitalization, industries everywhere accumulate massive amounts of data, which have become the lifeblood of the global economy. This data may come from various heterogeneous systems, equipment, components, sensors, and applications in many varieties (diversity of sources), velocities (high rate of change) and volumes (sheer data size). Despite significant advances in the ability to collect, store, manage and filter data, the real value lies in the analytics. Raw data is meaningless unless it is properly processed into actionable (business) insights. Those who know how to harness data effectively have a decisive competitive advantage: they raise performance by making faster and smarter decisions, improve short- and long-term strategic planning, offer more user-centric products and services, and foster innovation. Two distinct paradigms can be discerned in the practice of analytics: semantic-driven (deductive) and data-driven (inductive). The first emphasizes logic as a way of representing domain knowledge encoded in rules or ontologies, which are often carefully curated and maintained. However, these models are often highly complex and require intensive knowledge processing capabilities. Data-driven analytics employ machine learning (ML) to learn a model directly from the data with minimal human intervention. However, these models are tuned to the data and context they were trained on, making them difficult to adapt. Industries that want to create value from data must master both paradigms in combination. There is thus a great need in data analytics to seamlessly combine semantic-driven and data-driven processing techniques in an efficient and scalable architecture that allows extracting actionable insights from an extreme variety of data. In this thesis, we address these needs by providing:
    • A unified representation of domain-specific and analytical semantics, in the form of ontology models called the TechOnto Ontology Stack. It is a highly expressive, platform-independent formalism that captures the conceptual semantics of industrial systems (such as technical system hierarchies and component partonomies) together with their analytical functional semantics.
    • A new ontology language, Semantically defined Analytical Language (SAL), on top of the ontology model, which extends DatalogMTL (a Horn fragment of Metric Temporal Logic) with analytical functions as first-class citizens.
    • A method to generate semantic workflows using our SAL language. It helps in authoring, reusing and maintaining complex analytical tasks and workflows in an abstract fashion.
    • A multi-layer architecture that fuses knowledge-driven and data-driven analytics into a federated and distributed solution.
    To our knowledge, this thesis is among the first to introduce and investigate semantically defined analytics in an ontology-based data access setting for industrial analytical applications. We focus our work and evaluation on industrial data because of (i) the adoption of semantic technology by industry in general, and (ii) the common need, in the literature and in practice, to let domain expertise drive data analytics over semantically interoperable sources while still harnessing the power of analytics to enable real-time data insights. Given the evaluation results of three use-case studies, our approach surpasses state-of-the-art approaches in most application scenarios.
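
    Since SAL extends DatalogMTL, a flavour of the underlying semantics can be given with a small sketch: the metric-temporal "box over the past window" operator checks that a condition held throughout a past interval. The Python below is an illustrative, sample-based simplification with invented names; it is neither SAL syntax nor the thesis's implementation.

```python
# Illustrative (invented) reading of a DatalogMTL-style rule such as
#   Overheating(eq) <- box_minus[0,10min] HighTemp(eq)
# "Overheating holds now if HighTemp held at every observed sample
# throughout the past 10 minutes." Real DatalogMTL semantics are over
# continuous time; this sample-based check is a simplification.
from datetime import datetime, timedelta

# hypothetical sensor log: equipment id -> [(timestamp, temperature)]
readings = {
    "pump-1": [(datetime(2021, 1, 1, 12, m), 91.0 + 0.1 * m) for m in range(15)],
}

def holds_box_minus(series, now, window, predicate):
    """True iff `predicate` holds at every sample in (now - window, now]."""
    in_window = [v for t, v in series if now - window < t <= now]
    return bool(in_window) and all(predicate(v) for v in in_window)

now = datetime(2021, 1, 1, 12, 14)
for eq, series in readings.items():
    if holds_box_minus(series, now, timedelta(minutes=10),
                       lambda v: v > 90.0):        # HighTemp := temp > 90
        print(f"Overheating({eq}) holds at {now:%H:%M}")
```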

    Representing and Reasoning about Temporal Granularities


    Dagstuhl News January - December 2001

    "Dagstuhl News" is a publication edited especially for the members of the Foundation "Informatikzentrum Schloss Dagstuhl" to thank them for their support. The News give a summary of the scientific work being done in Dagstuhl. Each Dagstuhl Seminar is presented by a small abstract describing the contents and scientific highlights of the seminar as well as the perspectives or challenges of the research topic

    A Generic Approach to Supporting the Management of Computerised Clinical Guidelines and Protocols

    Clinical guidelines or protocols (CGPs) are statements that are systematically developed to guide the clinician and the patient in making decisions about appropriate healthcare for specific clinical problems. Using CGPs is one of the most effective and proven ways of attaining improved quality, optimised resource utilisation, cost containment and reduced variation in healthcare practice. CGPs exist mainly as paper-based natural language statements, but are increasingly being computerised. Supporting computerised CGPs in a healthcare environment, so that they are incorporated into clinicians' daily routine, is complex and presents major information management challenges. This thesis contends that the management of computerised CGPs should incorporate their manipulation (operations and queries), in addition to their specification and execution, as part of a single unified management framework. The thesis applies modern advanced database technology to the task of managing computerised CGPs. The event-condition-action (ECA) rule paradigm is recognised to have huge potential for supporting computerised CGPs. In this thesis, a unified generic framework, called SpEM, and an approach, called MonCooS, were developed that enable computerised CGPs to be specified using a specification language, called PLAN, which follows the ECA rule paradigm; executed using a software mechanism based on the ECA mechanism within a modern database system; and manipulated using a manipulation language, called TOPSQL. The MonCooS approach focuses on assisting clinicians in monitoring and coordinating clinical interventions while leaving the reasoning task to domain experts. A proof-of-concept system, TOPS, was developed to show that CGP management can be readily attained within the SpEM framework by using the MonCooS approach. TOPS is used to evaluate the framework and approach in a case study managing a microalbuminuria protocol for diabetic patients. SpEM and MonCooS were found to be promising in supporting the full-scale management of information and knowledge for computerised clinical protocols. Active capability within modern DBMSs still has significant limitations in supporting some requirements of this application domain; these limitations point to further improvements needed in database management system (DBMS) functionality for ECA rule support. The main contributions of this thesis are: a generic and unified framework for the management of CGPs; a general platform and an advanced software mechanism for the manipulation of information and knowledge in computerised CGPs; requirements for further development of the active functionality within modern DBMSs; and a case study of the computer-based management of microalbuminuria in diabetes patients.
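
    A minimal Python sketch of the ECA idea underlying such protocol support appears below. The rule content, threshold, and all names are hypothetical illustrations, not the PLAN language or the TOPS system from the thesis, and not clinical guidance.

```python
# Sketch: an event-condition-action (ECA) rule for protocol monitoring,
# e.g. "on a new lab result (event), if the albumin/creatinine ratio is
# elevated (condition), schedule a follow-up test (action)".
# Rule content and threshold are illustrative only.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class EcaRule:
    event: str                         # event type that triggers the rule
    condition: Callable[[dict], bool]  # predicate over the event payload
    action: Callable[[dict], None]     # what to do when the condition holds

@dataclass
class EcaEngine:
    rules: list = field(default_factory=list)

    def publish(self, event_type, payload):
        # fire every rule whose event matches and whose condition holds
        for rule in self.rules:
            if rule.event == event_type and rule.condition(payload):
                rule.action(payload)

engine = EcaEngine()
engine.rules.append(EcaRule(
    event="lab_result",
    condition=lambda p: p["acr_mg_per_mmol"] > 3.0,   # hypothetical threshold
    action=lambda p: print(f"schedule repeat ACR test for {p['patient']}"),
))
engine.publish("lab_result", {"patient": "p42", "acr_mg_per_mmol": 4.2})
```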

    Relaxed Functional Dependencies - A Survey of Approaches

    Recently, there has been renewed interest in functional dependencies due to the possibility of employing them in several advanced database operations, such as data cleaning, query relaxation, record matching, and so forth. In particular, the constraints defined for canonical functional dependencies have been relaxed to capture inconsistencies in real data, patterns of semantically related data, or semantic relationships in complex data types. In this paper, we survey 35 such functional dependencies, providing classification criteria, motivating examples, and a systematic analysis of them.
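
    A concrete example of such a relaxation is the classical g3 error measure, which quantifies how far a dependency X -> Y is from holding exactly as the minimum fraction of tuples that must be removed for it to hold. A minimal pandas sketch with toy data (the measure is standard; the data are illustrative):

```python
# Sketch: the g3 error measure for an approximate functional dependency
# X -> Y: the minimum fraction of rows to delete so that X -> Y holds
# exactly. Within each X-group, all rows except those carrying the most
# frequent Y-value must be removed.
import pandas as pd

def g3_error(df, lhs, rhs):
    keep = sum(group[rhs].value_counts().iloc[0]   # size of the majority Y-value
               for _, group in df.groupby(lhs))
    return 1.0 - keep / len(df)

data = pd.DataFrame({
    "zip":  ["10001", "10001", "10001", "20002"],
    "city": ["NYC",   "NYC",   "Newark", "DC"],
})
print(g3_error(data, ["zip"], "city"))  # 0.25: drop one row and zip -> city holds
```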

    Applying Process-Oriented Data Science to Dentistry

    Background: Healthcare services now often follow evidence-based principles, so technologies such as process and data mining will help inform their drive towards optimal service delivery. Process mining (PM) can help the monitoring and reporting of this service delivery, measure compliance with guidelines, and assess effectiveness. In this research, PM extracts information about clinical activity recorded in dental electronic health records (EHRs) and converts it into process models, providing stakeholders with unique insights into the dental treatment process. This thesis addresses a gap in prior research by demonstrating how process analytics can enhance our understanding of these processes and of the effects of changes in strategy and policy over time. It also emphasises the importance of a rigorous and documented methodological approach, often missing from the published literature.
    Aim: To apply the emerging technology of PM to an oral health dataset, illustrating the value of the data in the dental repository and demonstrating how it can be presented in a useful and actionable manner to address public health questions. A subsidiary aim is to present the methodology used in this research in a way that provides useful guidance to future applications of dental PM.
    Objectives: Review the dental and healthcare PM literature to establish the state of the art. Evaluate existing PM methods and their applicability to this research's dataset. Extend existing PM methods to achieve the aims of this research. Apply PM methods to the research dataset to address public health questions. Document and present this research's methodology. Apply data mining, PM, and data visualisation to provide insights into the variable pathways leading to different outcomes. Identify the data needed for PM of a dental EHR. Identify challenges to PM of dental EHR data.
    Methods: Extend existing PM methods to facilitate PM research in public health by detailing how data extracts from a dental EHR can be effectively managed, prepared, and used for PM. Use existing dental EHR and PM standards to generate a data reference model for effective PM. Develop a data-quality management framework.
    Results: Comparing the outputs of PM to established care pathways showed that the dataset supported the generation of high-level pathways but was less suitable for detailed guidelines. PM was used to identify the care pathway preceding a dental extraction under general anaesthetic, providing unique insights into this pathway and into the effects of policy decisions around school dental screenings.
    Conclusions: The research showed that PM and data mining techniques can be applied to dental EHR data, leading to fresh insights about dental treatment processes. This emerging technology, along with established data mining techniques, should provide valuable insights to policy makers, such as principal and chief dental officers, to inform care pathways and policy decisions.
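
    At the core of PM is the reconstruction of process models from event logs of (case, activity, timestamp) records. The sketch below computes the directly-follows relation that most discovery algorithms build on; the dental activity names and log are hypothetical, and a real analysis would use a dedicated process mining library rather than this toy code.

```python
# Sketch: computing the directly-follows graph (DFG) that most process
# discovery algorithms start from. Events are (case, activity, timestamp)
# triples; activity names are hypothetical dental-pathway examples.
from collections import Counter

events = [
    ("case1", "assessment", 1), ("case1", "radiograph", 2),
    ("case1", "extraction", 3),
    ("case2", "assessment", 1), ("case2", "extraction", 2),
]

def directly_follows(events):
    dfg = Counter()
    by_case = {}
    # rebuild each case's trace in timestamp order
    for case, activity, ts in sorted(events, key=lambda e: (e[0], e[2])):
        by_case.setdefault(case, []).append(activity)
    # count each adjacent activity pair
    for trace in by_case.values():
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

for (a, b), n in directly_follows(events).items():
    print(f"{a} -> {b}: {n}")
# assessment -> radiograph: 1, radiograph -> extraction: 1,
# assessment -> extraction: 1
```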

    Developing a distributed electronic health-record store for India

    The DIGHT project addresses the problem of building a scalable and highly available information store for the Electronic Health Records (EHRs) of the over one billion citizens of India.

    Tackling Different Business Process Perspectives

    Business Process Management (BPM) has emerged as a discipline to design, control, analyze, and optimize business operations. Conceptual models lie at the core of BPM. In particular, business process models have been taken up by organizations as a means to describe the main activities that are performed to achieve a specific business goal. Process models generally cover different perspectives that underlie separate yet interrelated representations for analyzing and presenting process information. Being primarily driven by process improvement objectives, traditional business process modeling languages focus on capturing the control-flow perspective of business processes, that is, the temporal and logical coordination of activities. Such approaches are usually characterized as "activity-centric". Nowadays, activity-centric process modeling languages, such as the Business Process Model and Notation (BPMN) standard, are still the most used in practice and benefit from industrial tool support. Nevertheless, evidence shows that such process modeling languages still lack support for modeling non-control-flow perspectives, such as the temporal, informational, and decision perspectives, among others. This thesis centres on the BPMN standard and addresses the modeling of the temporal, informational, and decision perspectives of process models, with particular attention to processes enacted in healthcare domains. Despite being partially interrelated, the main contributions of this thesis may be partitioned according to the modeling perspective they concern. The temporal perspective deals with the specification, management, and formal verification of temporal constraints. In this thesis, we address the specification and run-time management of temporal constraints in BPMN by taking advantage of process modularity and of the event handling mechanisms included in the standard. Then, we propose three different mappings from BPMN to formal models, to validate the behavior of the proposed process models and to check whether they are dynamically controllable. The informational perspective represents the information entities consumed, produced or manipulated by a process. This thesis focuses on the conceptual connection between processes and data, borrowing concepts from the database domain to enable the representation of which parts of a database schema are accessed by a given process activity. This novel conceptual view is then employed to detect potential data inconsistencies arising when the same data are accessed erroneously by different process activities. The decision perspective encompasses the modeling of the decision-making related to a process, considering where decisions are made in the process and how decision outcomes affect process execution. In this thesis, we investigate the use of the Decision Model and Notation (DMN) standard in conjunction with BPMN, starting from a pattern-based approach to ease the derivation of DMN decision models from the data represented in BPMN processes. Besides, we propose a methodology that focuses on the integrated use of BPMN and DMN for modeling decision-intensive care pathways in a real-world application domain.
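
    As a small illustration of the temporal perspective, run-time management of temporal constraints amounts to checking bounds such as "activity B must occur at most k minutes after activity A" against executed traces. The sketch below is an illustrative simplification with hypothetical activity names; it is not the thesis's BPMN mapping or dynamic controllability check.

```python
# Sketch: run-time monitoring of a simple temporal constraint between two
# process activities, e.g. "administer antibiotics at most 60 minutes
# after triage". Activity names and the bound are illustrative.
from datetime import datetime, timedelta

def check_max_delay(trace, source, target, bound):
    """True iff `target` occurs after `source` and within `bound` of it."""
    times = {activity: ts for activity, ts in trace}
    if source not in times or target not in times:
        return False  # constraint violated if either activity is missing
    return timedelta(0) <= times[target] - times[source] <= bound

trace = [("triage", datetime(2021, 1, 1, 10, 0)),
         ("antibiotics", datetime(2021, 1, 1, 10, 45))]
print(check_max_delay(trace, "triage", "antibiotics",
                      timedelta(minutes=60)))  # True: 45 min <= 60 min
```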