11 research outputs found

    Comparison Between Hotdeck Method and Regression Method in Handling Health Science Missing Data

    Introduction: Missing data (missing values) is information that is not available for a subject (case). It occurs when some information about the subject was not provided, is difficult to find, or does not exist. Ignoring missing data makes it difficult to obtain highly accurate classification results, even when the most reliable classification algorithm is used. One way to handle the missing data problem is imputation. Several imputation methods can be used to replace missing data: a constant value, hot deck, regression, expectation maximization, and multiple imputation. Purpose: To analyze and compare the hot deck and regression methods and determine which is the better imputation method for missing data. Materials and Methods: The data come from respondents who practice family planning in the town of Pasuruan, East Java, Indonesia; the age variable is used. Values of the age variable are removed to simulate missing data and are then imputed by hot deck or regression. The original data are compared with the imputed data using the t-test, Pearson correlation, and root mean square error (RMSE). Results: Imputation of the simulated missing age values shows that the regression method is better than the hot deck method in handling missing data in health science. Conclusion: The best method is judged by a non-significant P value, an r value close to +1, and the smallest RMSE. The hot deck method gave a non-significant P value at 5% missing data, but its r values were small, even negative, and its RMSE values were large. The regression method gave non-significant P values at 5% and 10% missing data. Besides these results, the assessment also looks at the consistency of the P, r, and RMSE values of the three methods across repetitions
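The comparison described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the random-donor form of hot deck, the single-predictor least-squares line, the synthetic `parity` predictor, and the choice to compute RMSE and r over the whole series are all assumptions made for the example, and the t-test step is omitted.

```python
import math
import random


def hot_deck_impute(values, rng):
    """Fill each None with a value drawn at random from the observed donors."""
    donors = [v for v in values if v is not None]
    return [v if v is not None else rng.choice(donors) for v in values]


def regression_impute(values, predictor):
    """Fill each None with the prediction of a least-squares line fitted
    on the complete (predictor, value) pairs."""
    pairs = [(x, y) for x, y in zip(predictor, values) if y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y if y is not None else intercept + slope * x
            for x, y in zip(predictor, values)]


def rmse(original, imputed):
    """Root mean square error between the original and the imputed series."""
    return math.sqrt(
        sum((o - i) ** 2 for o, i in zip(original, imputed)) / len(original)
    )


def pearson_r(a, b):
    """Pearson correlation coefficient of two equally long series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: 'parity' (number of children) as a predictor of age.
    parity = [rng.randint(0, 5) for _ in range(200)]
    age = [20 + 3 * p + rng.gauss(0, 2) for p in parity]
    # Simulate 10% missing values in the age variable.
    missing = set(rng.sample(range(len(age)), k=20))
    observed = [None if i in missing else a for i, a in enumerate(age)]
    results = {
        "hot deck": hot_deck_impute(observed, rng),
        "regression": regression_impute(observed, parity),
    }
    for name, imputed in results.items():
        print(name, round(rmse(age, imputed), 3), round(pearson_r(age, imputed), 3))
```

Under the criteria stated in the abstract, the method with the smaller RMSE and an r closer to +1 would be preferred; how the two methods rank on real data depends on how well the chosen predictor explains the missing variable.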

    Automated anomaly recognition in real time data streams for oil and gas industry.

    There is a growing demand for computer-assisted real-time anomaly detection - from the identification of suspicious activities in cyber security, to the monitoring of engineering data for various applications across the oil and gas, automotive and other engineering industries. To reduce the reliance on field experts' knowledge for identification of these anomalies, this thesis proposes a deep-learning anomaly-detection framework that can help to create an effective real-time condition-monitoring framework. The aim of this research is to develop a real-time and re-trainable generic anomaly-detection framework, which is capable of predicting and identifying anomalies with a high level of accuracy - even when a specific anomalous event has no precedent. Machine-based condition monitoring is preferable in many practical situations where fast data analysis is required, and where there are harsh climates or otherwise life-threatening environments. For example, automated conditional monitoring systems are ideal in deep sea exploration studies, offshore installations and space exploration. This thesis firstly reviews studies about anomaly detection using machine learning. It then adopts the best practices from those studies in order to propose a multi-tiered framework for anomaly detection with heterogeneous input sources, which can deal with unseen anomalies in a real-time dynamic problem environment. The thesis then applies the developed generic multi-tiered framework to two fields of engineering: data analysis and malicious cyber attack detection. Finally, the framework is further refined based on the outcomes of those case studies and is used to develop a secure cross-platform API, capable of re-training and data classification on a real-time data feed

    Graph-based Analysis of Dynamic Systems

    The analysis of dynamic systems provides insights into their time-dependent characteristics. This enables us to monitor, evaluate, and improve systems from various areas. They are often represented as graphs that model the system's components and their relations. The analysis of the resulting dynamic graphs yields great insights into the system's underlying structure, its characteristics, as well as properties of single components. The interpretation of these results can help us understand how a system works and how parameters influence its performance. This knowledge supports the design of new systems and the improvement of existing ones. The main issue in this scenario is the performance of analyzing the dynamic graph to obtain relevant properties. While various approaches have been developed to analyze dynamic graphs, it is not always clear which one performs best for the analysis of a specific graph. The runtime also depends on many other factors, including the size and topology of the graph, the frequency of changes, and the data structures used to represent the graph in memory. While the benefits and drawbacks of many data structures are well-known, their runtime is hard to predict when used for the representation of dynamic graphs. Hence, tools are required to benchmark and compare different algorithms for the computation of graph properties and data structures for the representation of dynamic graphs in memory. Based on deeper insights into their performance, new algorithms can be developed and efficient data structures can be selected. In this thesis, we present four contributions to tackle these problems: A benchmarking framework for dynamic graph analysis, novel algorithms for the efficient analysis of dynamic graphs, an approach for the parallelization of dynamic graph analysis, and a novel paradigm to select and adapt graph data structures. 
In addition, we present three use cases from the areas of social, computer, and biological networks to illustrate the great insights provided by their graph-based analysis. We present a new benchmarking framework for the analysis of dynamic graphs, the Dynamic Network Analyzer (DNA). It provides tools to benchmark and compare different algorithms for the analysis of dynamic graphs as well as the data structures used to represent them in memory. DNA supports the development of new algorithms and the automatic verification of their results. Its visualization component provides different ways to represent dynamic graphs and the results of their analysis. We introduce three new stream-based algorithms for the analysis of dynamic graphs. We evaluate their performance on synthetic as well as real-world dynamic graphs and compare their runtimes to snapshot-based algorithms. Our results show great performance gains for all three algorithms. The new stream-based algorithm StreaM_k, which counts the frequencies of k-vertex motifs, achieves speedups up to 19,043 x for synthetic and 2882 x for real-world datasets. We present a novel approach for the distributed processing of dynamic graphs, called parallel Dynamic Graph Analysis (pDNA). To analyze a dynamic graph, the work is distributed by a partitioner that creates subgraphs and assigns them to workers. They compute the properties of their respective subgraph using standard algorithms. Their results are used by the collator component to merge them to the properties of the original graph. We evaluate the performance of pDNA for the computation of five graph properties on two real-world dynamic graphs with up to 32 workers. Our approach achieves great speedups, especially for the analysis of complex graph measures. We introduce two novel approaches for the selection of efficient graph data structures. 
The compile-time approach estimates the workload of an analysis after an initial profiling phase and recommends efficient data structures based on benchmarking results. It achieves speedups of up to 5.4 x over baseline data structure configurations for the analysis of real-world dynamic graphs. The run-time approach monitors the workload during analysis and exchanges the graph representation if it finds a configuration that promises to be more efficient for the current workload. Compared to baseline configurations, it achieves speedups of up to 7.3 x for the analysis of a synthetic workload. Our contributions provide novel approaches for the efficient analysis of dynamic graphs and tools to further investigate the trade-offs between different factors that influence the performance. Contents: 1 Introduction; 2 Notation and Terminology; 3 Related Work; 4 DNA - Dynamic Network Analyzer; 5 Algorithms; 6 Parallel Dynamic Network Analysis; 7 Selection of Efficient Graph Data Structures; 8 Use Cases; 9 Conclusion; A DNA - Dynamic Network Analyzer; B Algorithms; C Selection of Efficient Graph Data Structures; D Parallel Dynamic Network Analysis; E Graph-based Intrusion Detection System; F Molecular Dynamic
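The difference between stream-based and snapshot-based analysis mentioned in this abstract can be illustrated with a small Python sketch. It is not StreaM_k and not part of DNA: it covers only triangles (3-vertex motifs) on an undirected graph, and the class and function names are hypothetical. The stream-based counter updates its result on every edge change, while the snapshot-based function recomputes the count from the full graph.

```python
from itertools import combinations


class StreamTriangleCounter:
    """Maintains the triangle count incrementally as edges are added or removed."""

    def __init__(self):
        self.adj = {}        # node -> set of neighbours
        self.triangles = 0   # current number of triangles

    def add_edge(self, u, v):
        if u == v or v in self.adj.get(u, set()):
            return  # ignore self-loops and edges that already exist
        nu = self.adj.setdefault(u, set())
        nv = self.adj.setdefault(v, set())
        # Every common neighbour of u and v closes one new triangle.
        self.triangles += len(nu & nv)
        nu.add(v)
        nv.add(u)

    def remove_edge(self, u, v):
        if v not in self.adj.get(u, set()):
            return  # nothing to remove
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        # Every common neighbour of u and v loses one triangle.
        self.triangles -= len(self.adj[u] & self.adj[v])


def snapshot_triangle_count(adj):
    """Recomputes the triangle count from scratch for one graph snapshot."""
    count = 0
    for a, b, c in combinations(sorted(adj), 3):
        if b in adj[a] and c in adj[a] and c in adj[b]:
            count += 1
    return count
```

An update in the stream-based counter touches only the neighbourhoods of the two endpoints, whereas the snapshot version inspects every vertex triple, which gives an intuition for why incremental updates can be much faster when changes are frequent and small.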

    Proceedings of the ECMLPKDD 2015 Doctoral Consortium

    ECMLPKDD 2015 Doctoral Consortium was organized for the second time as part of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD), organised in Porto during September 7-11, 2015. The objective of the doctoral consortium is to provide an environment for students to exchange their ideas and experiences with peers in an interactive atmosphere and to get constructive feedback from senior researchers in machine learning, data mining, and related areas. These proceedings collect together and document all the contributions of the ECMLPKDD 2015 Doctoral Consortium

    Online Inference Model for Traffic Pattern Analysis and Anomaly Detection

    Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, February 2015. 최진영. In this thesis, we propose a method for modeling trajectory patterns with both regional and velocity observations through a probabilistic inference model. By embedding Gaussian models into the discrete topic model framework, our method uses continuous velocity as well as regional observations, unlike existing approaches. In addition, the proposed framework, combined with a Hidden Markov Model, can cover the temporal transition of the scene state, which is useful for checking violations of the rule that conflicting topics (e.g. two cross-traffic patterns) should not occur at the same time. To achieve online learning despite the complexity of the proposed model, we suggest a novel learning scheme in place of collapsed Gibbs sampling. The proposed two-stage greedy learning scheme is not only efficient at reducing the search space but also accurate, in that the accuracy of online learning is no worse than that of batch learning. To validate the performance of our method, experiments were conducted on various datasets. 
Experimental results show that our model satisfactorily explains the trajectory patterns with respect to scene understanding, anomaly detection, and prediction. Contents: Abstract; Chapter 1 Introduction; 1.1 Statement of Problem; 1.2 Related Works; 1.2.1 Motion Pattern Analysis Using Trajectory; 1.2.2 Motion Pattern Analysis Using Local Motions; 1.3 Contributions; 1.4 Thesis Organization; Chapter 2 Preliminaries; 2.1 Latent Dirichlet Allocation (LDA); 2.1.1 Probabilistic Graphical Model; 2.1.2 LDA Property & Formulation; 2.2 Inference of LDA; 2.2.1 Collapsed Gibbs Sampling; 2.2.2 Variational Inference; Chapter 3 Proposed Approach; 3.1 Probabilistic Inference Model; 3.2 Model Learning; 3.2.1 Online Trajectory Clustering; 3.2.2 Spatio-Temporal Dependency of Activities; 3.2.3 Velocity Learning; 3.3 Anomaly Detection; 3.4 Summary of the Proposed Method; Chapter 4 Experiments; 4.1 Result of Traffic Pattern Understanding; 4.2 Applications in Anomaly Detection; 4.3 Prediction Task; 4.4 Comparison with Sampling; Chapter 5 Conclusion; 5.1 Concluding Remarks; 5.2 Future Works; Abstract (in Korean); Docto

    Shallow Representations, Profound Discoveries : A methodological study of game culture in social media

    This thesis explores the potential of representation learning techniques in game studies, highlighting their effectiveness and addressing challenges in data analysis. The primary focus of this thesis is shallow representation learning, which uses simpler model architectures yet yields effective modeling results. This thesis investigates the following research objectives: disentangling the dependencies of data, modeling temporal dynamics, learning multiple representations, and learning from heterogeneous data. To address these objectives, the thesis contributes from two perspectives: empirical analysis and methodology development. Chapters 1 and 2 provide a thorough introduction, motivation, and necessary background information for the thesis, framing the research and setting the stage for the subsequent publications. Chapters 3 to 5 summarize the contributions of the six publications, each of which helps demonstrate the effectiveness of representation learning techniques in addressing various analytical challenges. Chapters 1 and 2 also motivate and describe the research objects and questions. In particular, they introduce the primary application field, game studies, and highlight the connections between data analysis and game culture. The basic notion of representation learning and canonical techniques such as probabilistic principal component analysis, topic modeling, and embedding models are described. Analytical challenges and data types are also described to motivate the research of this thesis. Chapter 3 presents two empirical analyses, conducted in Publications I and II, on player typologies and the temporal dynamics of player perceptions. The first empirical analysis takes advantage of a factor model to offer a flexible player typology analysis. The results and the analytical framework are particularly useful for personalized gamification. 
The second empirical analysis uses topic modeling to analyze the temporal dynamics of player perceptions of the game No Man’s Sky in relation to game changes. The results reflect a variety of player perceptions, including general gaming activities and game mechanics. Moreover, a set of underlying topics directly related to game updates and changes is extracted, and their temporal dynamics show that players respond differently to different updates and changes. Chapter 4 presents two method developments related to factor models. The first method, DNBGFA, developed in Publication III, is a matrix factorization model for modeling the temporal dynamics of non-negative matrices from multiple sources. The second method, CFTM, developed in Publication IV, introduces a factor model into a topic model to handle sophisticated document-level covariates. The methods developed in Chapter 4 are also demonstrated on text data analysis. Chapter 5 summarizes Publications V and VI, which develop embedding models. Publication V introduces Bayesian non-parametrics into a graph embedding model to learn multiple representations for nodes. Publication VI uses a Gaussian copula model to deal with heterogeneous data in representation learning. The methods developed in Chapter 5 are also demonstrated on data analysis tasks in the context of online communities. Lastly, Chapter 6 presents discussions and conclusions. The contributions of this thesis are highlighted, and limitations, ongoing challenges, and potential future research directions are discussed

    30th International Conference on Information Modelling and Knowledge Bases

    Information modelling is becoming an increasingly important topic for researchers, designers, and users of information systems. The amount and complexity of information itself, the number of abstraction levels of information, and the size of databases and knowledge bases are continuously growing. Conceptual modelling is one of the sub-areas of information modelling. The aim of this conference is to bring together experts from different areas of computer science and other disciplines who have a common interest in understanding and solving problems in information modelling and knowledge bases, as well as in applying the results of research to practice. We also aim to recognize and study new areas of modelling and knowledge bases to which more attention should be paid. Therefore philosophy and logic, cognitive science, knowledge management, linguistics, and management science are relevant areas, too. The conference will have three categories of presentations: full papers, short papers, and position papers

    Shortest Route at Dynamic Location with Node Combination-Dijkstra Algorithm

    Abstract: Online transportation has become a basic need of the general public, supporting everyday travel to work, to school, or to holiday sights. Transportation services compete to provide the best service so that consumers feel comfortable using it, which means paying attention to every activity involved, including the search for the shortest route when picking up a customer or delivering to a destination. The node combination method minimizes memory usage and performs better than A* and Ant Colony in shortest-route search in the manner of Dijkstra's algorithm, but it cannot store the history of the nodes that have been passed. The node combination algorithm is therefore very good at finding the shortest distance, but not the shortest route. This paper modifies the node combination algorithm to find the shortest route to a dynamic location obtained from the transport fleet by displaying the nodes that lie on the shortest distance; the result will be implemented in a geographic information system in the form of a map to make the system easier to use. Keywords: Shortest Path, Dijkstra Algorithm, Node Combination, Dynamic Location
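The distinction this abstract draws between the shortest distance and the shortest route can be illustrated with a short Python sketch. It shows plain Dijkstra with a predecessor map, not the node combination variant or the paper's modification; the function name and the graph layout (a dict of dicts with non-negative weights) are assumptions made for the example. Keeping the predecessor of each node is what allows the route, and not only its length, to be recovered.

```python
import heapq


def dijkstra_route(graph, source, target):
    """Return (distance, route) from source to target.

    graph maps each node to a dict {neighbour: non-negative weight}.
    If target is unreachable, returns (inf, []).
    """
    dist = {source: 0.0}   # best known distance to each node
    prev = {}              # predecessor of each node on its best path
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale queue entry
        visited.add(node)
        if node == target:
            break
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                prev[neighbour] = node  # remember how we reached it
                heapq.heappush(heap, (candidate, neighbour))
    if target not in dist:
        return float("inf"), []
    # Walk the predecessor chain backwards to rebuild the route.
    route = [target]
    while route[-1] != source:
        route.append(prev[route[-1]])
    route.reverse()
    return dist[target], route
```

For a dynamic pick-up location, one simple option would be to rerun the search whenever the fleet reports a new position; whether that matches the paper's approach is not stated in the abstract.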