397 research outputs found

    Movement Analytics: Current Status, Application to Manufacturing, and Future Prospects from an AI Perspective

    Full text link
    Data-driven decision making is becoming an integral part of manufacturing companies. Data is collected and commonly used to improve efficiency and produce high quality items for the customers. IoT-based and other forms of object tracking are an emerging tool for collecting movement data of objects/entities (e.g. human workers, moving vehicles, trolleys etc.) over space and time. Movement data can provide valuable insights like process bottlenecks, resource utilization, effective working time etc. that can be used for decision making and improving efficiency. Turning movement data into valuable information for industrial management and decision making requires analysis methods. We refer to this process as movement analytics. The purpose of this document is to review the current state of work for movement analytics both in manufacturing and more broadly. We survey relevant work from both a theoretical perspective and an application perspective. From the theoretical perspective, we put an emphasis on useful methods from two research areas: machine learning, and logic-based knowledge representation. We also review their combinations in view of movement analytics, and we discuss promising areas for future development and application. Furthermore, we touch on constraint optimization. From an application perspective, we review applications of these methods to movement analytics in a general sense and across various industries. We also describe currently available commercial off-the-shelf products for tracking in manufacturing, and we overview main concepts of digital twins and their applications

    Declarative techniques for modeling and mining business processes..

    Get PDF
    Organisaties worden vandaag de dag geconfronteerd met een schijnbare tegenstelling. Hoewel ze aan de ene kant veel geld geïnvesteerd hebben in informatiesystemen die hun bedrijfsprocessen automatiseren, lijken ze hierdoor minder in staat om een goed inzicht te krijgen in het verloop van deze processen. Een gebrekkig inzicht in de bedrijfsprocessen bedreigt hun flexibiliteit en conformiteit. Flexibiliteit is belangrijk, omdat organisaties door continu wijzigende marktomstandigheden gedwongen worden hun bedrijfsprocessen snel en soepel aan te passen. Daarnaast moeten organisaties ook kunnen garanderen dan hun bedrijfsvoering conform is aan de wetten, richtlijnen, en normen die hun opgelegd worden. Schandalen zoals de recent aan het licht gekomen fraude bij de Franse bank Société Générale toont het belang aan van conformiteit en flexibiliteit. Door het afleveren van valse bewijsstukken en het omzeilen van vaste controlemomenten, kon één effectenhandelaar een risicoloze arbitragehandel op prijsverschillen in futures omtoveren tot een risicovolle, speculatieve handel in deze financiële derivaten. De niet-ingedekte, niet-geautoriseerde posities bleven lange tijd verborgen door een gebrekkige interne controle, en tekortkomingen in de IT beveiliging en toegangscontrole. Om deze fraude in de toekomst te voorkomen, is het in de eerste plaats noodzakelijk om inzicht te verkrijgen in de operationele processen van de bank en de hieraan gerelateerde controleprocessen. In deze tekst behandelen we twee benaderingen die gebruikt kunnen worden om het inzicht in de bedrijfsprocessen te verhogen: procesmodellering en procesontginning. In het onderzoek is getracht technieken te ontwikkelen voor procesmodellering en procesontginning die declaratief zijn. Procesmodellering process modeling is de manuele constructie van een formeel model dat een relevant aspect van een bedrijfsproces beschrijft op basis van informatie die grotendeels verworven is uit interviews. Procesmodellen moeten adequate informatie te verschaffen over de bedrijfsprocessen om zinvol te kunnen worden gebruikt bij hun ontwerp, implementatie, uitvoering, en analyse. De uitdaging bestaat erin om nieuwe talen voor procesmodellering te ontwikkelen die adequate informatie verschaffen om deze doelstelling realiseren. Declaratieve procestalen maken de informatie omtrent bedrijfsbekommernissen expliciet. We karakteriseren en motiveren declaratieve procestalen, en nemen we een aantal bestaande technieken onder de loep. Voorts introduceren we een veralgemenend raamwerk voor declaratieve procesmodellering waarbinnen bestaande procestalen gepositioneerd kunnen worden. Dit raamwerk heet het EM-BrA�CE raamwerk, en staat voor `Enterprise Modeling using Business Rules, Agents, Activities, Concepts and Events'. Het bestaat uit een formele ontolgie en een formeel uitvoeringsmodel. Dit raamwerk legt de ontologische basis voor de talen en technieken die verder in het doctoraat ontwikkeld worden. Procesontginning process mining is de automatische constructie van een procesmodel op basis van de zogenaamde event logs uit informatiesystemen. Vandaag de dag worden heel wat processen door informatiesystemen in event logs geregistreerd. In event logs vindt men in chronologische volgorde terug wie, wanneer, welke activiteit verricht heeft. De analyse van event logs kan een accuraat beeld opleveren van wat er zich in werkelijkheid afspeelt binnen een organisatie. Om bruikbaar te zijn, moeten de ontgonnen procesmodellen voldoen aan criteria zoals accuraatheid, verstaanbaarheid, en justifieerbaarheid. Bestaande technieken voor procesontginning focussen vooral op het eerste criterium: accuraatheid. Declaratieve technieken voor procesontginning richten zich ook op de verstaanbaarheid en justifieerbaarheid van de ontgonnen modellen. Declaratieve technieken voor procesontginning zijn meer verstaanbaar omdat ze pogen procesmodellen voor te stellen aan de hand van declaratieve voorstellingsvormen. Daarenboven verhogen declaratieve technieken de justifieerbaarheid van de ontgonnen modellen. Dit komt omdat deze technieken toelaten de apriori kennis, inductieve bias, en taal bias van een leeralgoritme in te stellen. Inductief logisch programmeren (ILP) is een leertechniek die inherent declaratief is. In de tekst tonen we hoe proces mining voorgesteld kan worden als een ILP classificatieprobleem, dat de logische voorwaarden leert waaronder gebeurtenis plaats vindt (positief event) of niet plaatsvindt (een negatief event). Vele event logs bevatten van nature geen negatieve events die aangeven dat een bepaalde activiteit niet kon plaatsvinden. Om aan dit probleem tegemoet te komen, beschrijven we een techniek om artificiële negatieve events te genereren, genaamd AGNEs (process discovery by Artificially Generated Negative Events). De generatie van artificiële negatieve events komt neer op een configureerbare inductieve bias. De AGNEs techniek is geïmplementeerd als een mining plugin in het ProM raamwerk. Door process discovery voor te stellen als een eerste-orde classificatieprobleem op event logs met artificiële negatieve events, kunnen de traditionele metrieken voor het kwantificeren van precisie (precision) en volledigheid (recall) toegepast worden voor het kwantificeren van de precisie en volledigheid van een procesmodel ten opzicht van een event log. In de tekst stellen we twee nieuwe metrieken voor. Deze nieuwe metrieken, in combinatie met bestaande metrieken, werden gebruikt voor een uitgebreide evaluatie van de AGNEs techniek voor process discovery in zowel een experimentele als een praktijkopstelling.

    Scalable statistical learning for relation prediction on structured data

    Get PDF
    Relation prediction seeks to predict unknown but potentially true relations by revealing missing relations in available data, by predicting future events based on historical data, and by making predicted relations retrievable by query. The approach developed in this thesis can be used for a wide variety of purposes, including to predict likely new friends on social networks, attractive points of interest for an individual visiting an unfamiliar city, and associations between genes and particular diseases. In recent years, relation prediction has attracted significant interest in both research and application domains, partially due to the increasing volume of published structured data and background knowledge. In the Linked Open Data initiative of the Semantic Web, for instance, entities are uniquely identified such that the published information can be integrated into applications and services, and the rapid increase in the availability of such structured data creates excellent opportunities as well as challenges for relation prediction. This thesis focuses on the prediction of potential relations by exploiting regularities in data using statistical relational learning algorithms and applying these methods to relational knowledge bases, in particular in Linked Open Data in particular. We review representative statistical relational learning approaches, e.g., Inductive Logic Programming and Probabilistic Relational Models. While logic-based reasoning can infer and include new relations via deduction by using ontologies, machine learning can be exploited to predict new relations (with some degree of certainty) via induction, purely based on the data. Because the application of machine learning approaches to relation prediction usually requires handling large datasets, we also discuss the scalability of machine learning as a solution to relation prediction, as well as the significant challenge posed by incomplete relational data (such as social network data, which is often much more extensive for some users than others). The main contribution of this thesis is to develop a learning framework called the Statistical Unit Node Set (SUNS) and to propose a multivariate prediction approach used in the framework. We argue that multivariate prediction approaches are most suitable for dealing with large, sparse data matrices. According to the characteristics and intended application of the data, the approach can be extended in different ways. We discuss and test two extensions of the approach--kernelization and a probabilistic method of handling complex n-ary relationships--in empirical studies based on real-world data sets. Additionally, this thesis contributes to the field of relation prediction by applying the SUNS framework to various domains. We focus on three applications: 1. In social network analysis, we present a combined approach of inductive and deductive reasoning for recommending movies to users. 2. In the life sciences, we address the disease gene prioritization problem. 3. In the recommendation system, we describe and investigate the back-end of a mobile app called BOTTARI, which provides personalized location-based recommendations of restaurants.Die Beziehungsvorhersage strebt an, unbekannte aber potenziell wahre Beziehungen vorherzusagen, indem fehlende Relationen in verfügbaren Daten aufgedeckt, zukünftige Ereignisse auf der Grundlage historischer Daten prognostiziert und vorhergesagte Relationen durch Anfragen abrufbar gemacht werden. Der in dieser Arbeit entwickelte Ansatz lässt sich für eine Vielzahl von Zwecken einschließlich der Vorhersage wahrscheinlicher neuer Freunde in sozialen Netzen, der Empfehlung attraktiver Sehenswürdigkeiten für Touristen in fremden Städten und der Priorisierung möglicher Assoziationen zwischen Genen und bestimmten Krankheiten, verwenden. In den letzten Jahren hat die Beziehungsvorhersage sowohl in Forschungs- als auch in Anwendungsbereichen eine enorme Aufmerksamkeit erregt, aufgrund des Zuwachses veröffentlichter strukturierter Daten und von Hintergrundwissen. In der Linked Open Data-Initiative des Semantischen Web werden beispielsweise Entitäten eindeutig identifiziert, sodass die veröffentlichten Informationen in Anwendungen und Dienste integriert werden können. Diese rapide Erhöhung der Verfügbarkeit strukturierter Daten bietet hervorragende Gelegenheiten sowie Herausforderungen für die Beziehungsvorhersage. Diese Arbeit fokussiert sich auf die Vorhersage potenzieller Beziehungen durch Ausnutzung von Regelmäßigkeiten in Daten unter der Verwendung statistischer relationaler Lernalgorithmen und durch Einsatz dieser Methoden in relationale Wissensbasen, insbesondere in den Linked Open Daten. Wir geben einen Überblick über repräsentative statistische relationale Lernansätze, z.B. die Induktive Logikprogrammierung und Probabilistische Relationale Modelle. Während das logikbasierte Reasoning neue Beziehungen unter der Nutzung von Ontologien ableiten und diese einbeziehen kann, kann maschinelles Lernen neue Beziehungen (mit gewisser Wahrscheinlichkeit) durch Induktion ausschließlich auf der Basis der vorliegenden Daten vorhersagen. Da die Verarbeitung von massiven Datenmengen in der Regel erforderlich ist, wenn maschinelle Lernmethoden in die Beziehungsvorhersage eingesetzt werden, diskutieren wir auch die Skalierbarkeit des maschinellen Lernens sowie die erhebliche Herausforderung, die sich aus unvollständigen relationalen Daten ergibt (z. B. Daten aus sozialen Netzen, die oft für manche Benutzer wesentlich umfangreicher sind als für Anderen). Der Hauptbeitrag der vorliegenden Arbeit besteht darin, ein Lernframework namens Statistical Unit Node Set (SUNS) zu entwickeln und einen im Framework angewendeten multivariaten Prädiktionsansatz einzubringen. Wir argumentieren, dass multivariate Vorhersageansätze am besten für die Bearbeitung von großen und dünnbesetzten Datenmatrizen geeignet sind. Je nach den Eigenschaften und der beabsichtigten Anwendung der Daten kann der Ansatz auf verschiedene Weise erweitert werden. In empirischen Studien werden zwei Erweiterungen des Ansatzes--ein kernelisierter Ansatz sowie ein probabilistischer Ansatz zur Behandlung komplexer n-stelliger Beziehungen-- diskutiert und auf realen Datensätzen untersucht. Ein weiterer Beitrag dieser Arbeit ist die Anwendung des SUNS Frameworks auf verschiedene Bereiche. Wir konzentrieren uns auf drei Anwendungen: 1. In der Analyse sozialer Netze stellen wir einen kombinierten Ansatz von induktivem und deduktivem Reasoning vor, um Benutzern Filme zu empfehlen. 2. In den Biowissenschaften befassen wir uns mit dem Problem der Priorisierung von Krankheitsgenen. 3. In den Empfehlungssystemen beschreiben und untersuchen wir das Backend einer mobilen App "BOTTARI", das personalisierte ortsbezogene Empfehlungen von Restaurants bietet

    Scalable statistical learning for relation prediction on structured data

    Get PDF
    Relation prediction seeks to predict unknown but potentially true relations by revealing missing relations in available data, by predicting future events based on historical data, and by making predicted relations retrievable by query. The approach developed in this thesis can be used for a wide variety of purposes, including to predict likely new friends on social networks, attractive points of interest for an individual visiting an unfamiliar city, and associations between genes and particular diseases. In recent years, relation prediction has attracted significant interest in both research and application domains, partially due to the increasing volume of published structured data and background knowledge. In the Linked Open Data initiative of the Semantic Web, for instance, entities are uniquely identified such that the published information can be integrated into applications and services, and the rapid increase in the availability of such structured data creates excellent opportunities as well as challenges for relation prediction. This thesis focuses on the prediction of potential relations by exploiting regularities in data using statistical relational learning algorithms and applying these methods to relational knowledge bases, in particular in Linked Open Data in particular. We review representative statistical relational learning approaches, e.g., Inductive Logic Programming and Probabilistic Relational Models. While logic-based reasoning can infer and include new relations via deduction by using ontologies, machine learning can be exploited to predict new relations (with some degree of certainty) via induction, purely based on the data. Because the application of machine learning approaches to relation prediction usually requires handling large datasets, we also discuss the scalability of machine learning as a solution to relation prediction, as well as the significant challenge posed by incomplete relational data (such as social network data, which is often much more extensive for some users than others). The main contribution of this thesis is to develop a learning framework called the Statistical Unit Node Set (SUNS) and to propose a multivariate prediction approach used in the framework. We argue that multivariate prediction approaches are most suitable for dealing with large, sparse data matrices. According to the characteristics and intended application of the data, the approach can be extended in different ways. We discuss and test two extensions of the approach--kernelization and a probabilistic method of handling complex n-ary relationships--in empirical studies based on real-world data sets. Additionally, this thesis contributes to the field of relation prediction by applying the SUNS framework to various domains. We focus on three applications: 1. In social network analysis, we present a combined approach of inductive and deductive reasoning for recommending movies to users. 2. In the life sciences, we address the disease gene prioritization problem. 3. In the recommendation system, we describe and investigate the back-end of a mobile app called BOTTARI, which provides personalized location-based recommendations of restaurants.Die Beziehungsvorhersage strebt an, unbekannte aber potenziell wahre Beziehungen vorherzusagen, indem fehlende Relationen in verfügbaren Daten aufgedeckt, zukünftige Ereignisse auf der Grundlage historischer Daten prognostiziert und vorhergesagte Relationen durch Anfragen abrufbar gemacht werden. Der in dieser Arbeit entwickelte Ansatz lässt sich für eine Vielzahl von Zwecken einschließlich der Vorhersage wahrscheinlicher neuer Freunde in sozialen Netzen, der Empfehlung attraktiver Sehenswürdigkeiten für Touristen in fremden Städten und der Priorisierung möglicher Assoziationen zwischen Genen und bestimmten Krankheiten, verwenden. In den letzten Jahren hat die Beziehungsvorhersage sowohl in Forschungs- als auch in Anwendungsbereichen eine enorme Aufmerksamkeit erregt, aufgrund des Zuwachses veröffentlichter strukturierter Daten und von Hintergrundwissen. In der Linked Open Data-Initiative des Semantischen Web werden beispielsweise Entitäten eindeutig identifiziert, sodass die veröffentlichten Informationen in Anwendungen und Dienste integriert werden können. Diese rapide Erhöhung der Verfügbarkeit strukturierter Daten bietet hervorragende Gelegenheiten sowie Herausforderungen für die Beziehungsvorhersage. Diese Arbeit fokussiert sich auf die Vorhersage potenzieller Beziehungen durch Ausnutzung von Regelmäßigkeiten in Daten unter der Verwendung statistischer relationaler Lernalgorithmen und durch Einsatz dieser Methoden in relationale Wissensbasen, insbesondere in den Linked Open Daten. Wir geben einen Überblick über repräsentative statistische relationale Lernansätze, z.B. die Induktive Logikprogrammierung und Probabilistische Relationale Modelle. Während das logikbasierte Reasoning neue Beziehungen unter der Nutzung von Ontologien ableiten und diese einbeziehen kann, kann maschinelles Lernen neue Beziehungen (mit gewisser Wahrscheinlichkeit) durch Induktion ausschließlich auf der Basis der vorliegenden Daten vorhersagen. Da die Verarbeitung von massiven Datenmengen in der Regel erforderlich ist, wenn maschinelle Lernmethoden in die Beziehungsvorhersage eingesetzt werden, diskutieren wir auch die Skalierbarkeit des maschinellen Lernens sowie die erhebliche Herausforderung, die sich aus unvollständigen relationalen Daten ergibt (z. B. Daten aus sozialen Netzen, die oft für manche Benutzer wesentlich umfangreicher sind als für Anderen). Der Hauptbeitrag der vorliegenden Arbeit besteht darin, ein Lernframework namens Statistical Unit Node Set (SUNS) zu entwickeln und einen im Framework angewendeten multivariaten Prädiktionsansatz einzubringen. Wir argumentieren, dass multivariate Vorhersageansätze am besten für die Bearbeitung von großen und dünnbesetzten Datenmatrizen geeignet sind. Je nach den Eigenschaften und der beabsichtigten Anwendung der Daten kann der Ansatz auf verschiedene Weise erweitert werden. In empirischen Studien werden zwei Erweiterungen des Ansatzes--ein kernelisierter Ansatz sowie ein probabilistischer Ansatz zur Behandlung komplexer n-stelliger Beziehungen-- diskutiert und auf realen Datensätzen untersucht. Ein weiterer Beitrag dieser Arbeit ist die Anwendung des SUNS Frameworks auf verschiedene Bereiche. Wir konzentrieren uns auf drei Anwendungen: 1. In der Analyse sozialer Netze stellen wir einen kombinierten Ansatz von induktivem und deduktivem Reasoning vor, um Benutzern Filme zu empfehlen. 2. In den Biowissenschaften befassen wir uns mit dem Problem der Priorisierung von Krankheitsgenen. 3. In den Empfehlungssystemen beschreiben und untersuchen wir das Backend einer mobilen App "BOTTARI", das personalisierte ortsbezogene Empfehlungen von Restaurants bietet

    Advances in knowledge discovery and data mining Part II

    Get PDF
    19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II</p

    Formal Methods Specification and Analysis Guidebook for the Verification of Software and Computer Systems

    Get PDF
    This guidebook, the second of a two-volume series, is intended to facilitate the transfer of formal methods to the avionics and aerospace community. The 1st volume concentrates on administrative and planning issues [NASA-95a], and the second volume focuses on the technical issues involved in applying formal methods to avionics and aerospace software systems. Hereafter, the term "guidebook" refers exclusively to the second volume of the series. The title of this second volume, A Practitioner's Companion, conveys its intent. The guidebook is written primarily for the nonexpert and requires little or no prior experience with formal methods techniques and tools. However, it does attempt to distill some of the more subtle ingredients in the productive application of formal methods. To the extent that it succeeds, those conversant with formal methods will also nd the guidebook useful. The discussion is illustrated through the development of a realistic example, relevant fragments of which appear in each chapter. The guidebook focuses primarily on the use of formal methods for analysis of requirements and high-level design, the stages at which formal methods have been most productively applied. Although much of the discussion applies to low-level design and implementation, the guidebook does not discuss issues involved in the later life cycle application of formal methods

    First IJCAI International Workshop on Graph Structures for Knowledge Representation and Reasoning (GKR@IJCAI'09)

    Get PDF
    International audienceThe development of effective techniques for knowledge representation and reasoning (KRR) is a crucial aspect of successful intelligent systems. Different representation paradigms, as well as their use in dedicated reasoning systems, have been extensively studied in the past. Nevertheless, new challenges, problems, and issues have emerged in the context of knowledge representation in Artificial Intelligence (AI), involving the logical manipulation of increasingly large information sets (see for example Semantic Web, BioInformatics and so on). Improvements in storage capacity and performance of computing infrastructure have also affected the nature of KRR systems, shifting their focus towards representational power and execution performance. Therefore, KRR research is faced with a challenge of developing knowledge representation structures optimized for large scale reasoning. This new generation of KRR systems includes graph-based knowledge representation formalisms such as Bayesian Networks (BNs), Semantic Networks (SNs), Conceptual Graphs (CGs), Formal Concept Analysis (FCA), CPnets, GAI-nets, all of which have been successfully used in a number of applications. The goal of this workshop is to bring together the researchers involved in the development and application of graph-based knowledge representation formalisms and reasoning techniques

    Approximate Data Mining Techniques on Clinical Data

    Get PDF
    The past two decades have witnessed an explosion in the number of medical and healthcare datasets available to researchers and healthcare professionals. Data collection efforts are highly required, and this prompts the development of appropriate data mining techniques and tools that can automatically extract relevant information from data. Consequently, they provide insights into various clinical behaviors or processes captured by the data. Since these tools should support decision-making activities of medical experts, all the extracted information must be represented in a human-friendly way, that is, in a concise and easy-to-understand form. To this purpose, here we propose a new framework that collects different new mining techniques and tools proposed. These techniques mainly focus on two aspects: the temporal one and the predictive one. All of these techniques were then applied to clinical data and, in particular, ICU data from MIMIC III database. It showed the flexibility of the framework, which is able to retrieve different outcomes from the overall dataset. The first two techniques rely on the concept of Approximate Temporal Functional Dependencies (ATFDs). ATFDs have been proposed, with their suitable treatment of temporal information, as a methodological tool for mining clinical data. An example of the knowledge derivable through dependencies may be "within 15 days, patients with the same diagnosis and the same therapy usually receive the same daily amount of drug". However, current ATFD models are not analyzing the temporal evolution of the data, such as "For most patients with the same diagnosis, the same drug is prescribed after the same symptom". To this extent, we propose a new kind of ATFD called Approximate Pure Temporally Evolving Functional Dependencies (APEFDs). Another limitation of such kind of dependencies is that they cannot deal with quantitative data when some tolerance can be allowed for numerical values. In particular, this limitation arises in clinical data warehouses, where analysis and mining have to consider one or more measures related to quantitative data (such as lab test results and vital signs), concerning multiple dimensional (alphanumeric) attributes (such as patient, hospital, physician, diagnosis) and some time dimensions (such as the day since hospitalization and the calendar date). According to this scenario, we introduce a new kind of ATFD, named Multi-Approximate Temporal Functional Dependency (MATFD), which considers dependencies between dimensions and quantitative measures from temporal clinical data. These new dependencies may provide new knowledge as "within 15 days, patients with the same diagnosis and the same therapy receive a daily amount of drug within a fixed range". The other techniques are based on pattern mining, which has also been proposed as a methodological tool for mining clinical data. However, many methods proposed so far focus on mining of temporal rules which describe relationships between data sequences or instantaneous events, without considering the presence of more complex temporal patterns into the dataset. These patterns, such as trends of a particular vital sign, are often very relevant for clinicians. Moreover, it is really interesting to discover if some sort of event, such as a drug administration, is capable of changing these trends and how. To this extent, we propose a new kind of temporal patterns, called Trend-Event Patterns (TEPs), that focuses on events and their influence on trends that can be retrieved from some measures, such as vital signs. With TEPs we can express concepts such as "The administration of paracetamol on a patient with an increasing temperature leads to a decreasing trend in temperature after such administration occurs". We also decided to analyze another interesting pattern mining technique that includes prediction. This technique discovers a compact set of patterns that aim to describe the condition (or class) of interest. Our framework relies on a classification model that considers and combines various predictive pattern candidates and selects only those that are important to improve the overall class prediction performance. We show that our classification approach achieves a significant reduction in the number of extracted patterns, compared to the state-of-the-art methods based on minimum predictive pattern mining approach, while preserving the overall classification accuracy of the model. For each technique described above, we developed a tool to retrieve its kind of rule. All the results are obtained by pre-processing and mining clinical data and, as mentioned before, in particular ICU data from MIMIC III database

    Providing Information by Resource- Constrained Data Analysis

    Get PDF
    The Collaborative Research Center SFB 876 (Providing Information by Resource-Constrained Data Analysis) brings together the research fields of data analysis (Data Mining, Knowledge Discovery in Data Bases, Machine Learning, Statistics) and embedded systems and enhances their methods such that information from distributed, dynamic masses of data becomes available anytime and anywhere. The research center approaches these problems with new algorithms respecting the resource constraints in the different scenarios. This Technical Report presents the work of the members of the integrated graduate school

    Parallel Methods for Mining Frequent Sequential patterns

    Get PDF
    The explosive growth of data and the rapid progress of technology have led to a huge amount of data that is collected every day. In that data volume contains much valuable information. Data mining is the emerging field of applying statistical and artificial intelligence techniques to the problem of finding novel, useful and non-trivial patterns from large databases. It is the task of discovering interesting patterns from large amounts of data. This is achieved by determining both implicit and explicit unidentified patterns in data that can direct the process of decision making. There are many data mining tasks, such as classification, clustering, association rule mining and sequential pattern mining. In that, sequential pattern mining is an important problem in data mining. It provides an effective way to analyze the sequence data. The goal of sequential pattern mining is to discover interesting, unexpected and useful patterns from sequence databases. This task is used in many wide applications such as financial data analysis of banks, retail industry, customer shopping history, goods transportation, consumption and services, telecommunication industry, biological data analysis, scientific applications, network intrusion detection, scientific research, etc. Different types of sequential pattern mining can be performed, they are sequential patterns, maximal sequential patterns, closed sequences, constraint based and time interval based sequential patterns. Sequential pattern mining refers to the identification of frequent subsequences in sequence databases as patterns. In the last two decades, researchers have proposed many techniques and algorithms for extracting the frequent sequential patterns, in which the downward closure property plays a fundamental role. Sequential pattern is a sequence of itemsets that frequently occur in a specific order, where all items in the same itemsets are supposed to have the same transaction time value. One of the challenges for sequential pattern mining is the computational costs beside that is the potentially huge number of extracted patterns. In this thesis, we present an overview of the work done for sequential pattern mining and develop parallel methods for mining frequent sequential patterns in sequence databases that can tackle emerging data processing workloads while coping with larger and larger scales.The explosive growth of data and the rapid progress of technology have led to a huge amount of data that is collected every day. In that data volume contains much valuable information. Data mining is the emerging field of applying statistical and artificial intelligence techniques to the problem of finding novel, useful and non-trivial patterns from large databases. It is the task of discovering interesting patterns from large amounts of data. This is achieved by determining both implicit and explicit unidentified patterns in data that can direct the process of decision making. There are many data mining tasks, such as classification, clustering, association rule mining and sequential pattern mining. In that, sequential pattern mining is an important problem in data mining. It provides an effective way to analyze the sequence data. The goal of sequential pattern mining is to discover interesting, unexpected and useful patterns from sequence databases. This task is used in many wide applications such as financial data analysis of banks, retail industry, customer shopping history, goods transportation, consumption and services, telecommunication industry, biological data analysis, scientific applications, network intrusion detection, scientific research, etc. Different types of sequential pattern mining can be performed, they are sequential patterns, maximal sequential patterns, closed sequences, constraint based and time interval based sequential patterns. Sequential pattern mining refers to the identification of frequent subsequences in sequence databases as patterns. In the last two decades, researchers have proposed many techniques and algorithms for extracting the frequent sequential patterns, in which the downward closure property plays a fundamental role. Sequential pattern is a sequence of itemsets that frequently occur in a specific order, where all items in the same itemsets are supposed to have the same transaction time value. One of the challenges for sequential pattern mining is the computational costs beside that is the potentially huge number of extracted patterns. In this thesis, we present an overview of the work done for sequential pattern mining and develop parallel methods for mining frequent sequential patterns in sequence databases that can tackle emerging data processing workloads while coping with larger and larger scales.460 - Katedra informatikyvyhově
    corecore