131 research outputs found

    An intelligent information forwarder for healthcare big data systems with distributed wearable sensors

    Get PDF
    © 2016 IEEE. An increasing number of the elderly population wish to live an independent lifestyle, rather than rely on intrusive care programmes. A big data solution is presented using wearable sensors capable of carrying out continuous monitoring of the elderly, alerting the relevant caregivers when necessary and forwarding pertinent information to a big data system for analysis. A challenge for such a solution is the development of context-awareness through the multidimensional, dynamic and nonlinear sensor readings that have a weak correlation with observable human behaviours and health conditions. To address this challenge, a wearable sensor system with an intelligent data forwarder is discussed in this paper. The forwarder adopts a Hidden Markov Model for human behaviour recognition. Locality sensitive hashing is proposed as an efficient mechanism to learn sensor patterns. A prototype solution is implemented to monitor health conditions of dispersed users. It is shown that the intelligent forwarders can provide the remote sensors with context-awareness. They transmit only important information to the big data server for analytics when certain behaviours happen and avoid overwhelming communication and data storage. The system functions unobtrusively, whilst giving the users peace of mind in the knowledge that their safety is being monitored and analysed

    On the Use of Software Tracing and Boolean Combination of Ensemble Classifiers to Support Software Reliability and Security Tasks

    Get PDF
    In this thesis, we propose an approach that relies on Boolean combination of multiple one-class classification methods based on Hidden Markov Models (HMMs), which are pruned using weighted Kappa coefficient to select and combine accurate and diverse classifiers. Our approach, called WPIBC (Weighted Pruning Iterative Boolean Combination) works in three phases. The first phase selects a subset of the available base diverse soft classifiers by pruning all the redundant soft classifiers based on a weighted version of Cohen’s kappa measure of agreement. The second phase selects a subset of diverse and accurate crisp classifiers from the base soft classifiers (selected in Phase1) based on the unweighted kappa measure. The selected complementary crisp classifiers are then combined in the final phase using Boolean combinations. We apply the proposed approach to two important problems in software security and reliability: The detection of system anomalies and the prediction of the reassignment of bug report fields. Detecting system anomalies at run-time is a critical component of system reliability and security. Studies in this area focus mainly on the effectiveness of the proposed approaches -the ability to detect anomalies with high accuracy. Less attention was given to false alarm and efficiency. Although ensemble approaches for the detection of anomalies that use Boolean combination of classifier decisions have been shown to be useful in reducing the false alarm rate over that of a single classifier, existing methods rely on an exponential number of combinations making them impractical even for a small number of classifiers. Our approach is not only able to maintain and even improve the accuracy of existing Boolean combination techniques, but also significantly reduce the combination time and the number of classifiers selected for combination. The second application domain of our approach is the prediction of the reassignment of bug report fields. Bug reports contain a wealth of information that is used by triaging and development teams to understand the causes of bugs in order to provide fixes. The problem is that, for various reasons, it is common to have bug reports with missing or incorrect information, hindering the bug resolution process. To address this problem. researchers have turned to machine learning techniques. The common practice is to build models that leverage historical bug reports to automatically predict when a given bug report field should be reassigned. Existing approaches have mainly relied upon classifiers that make use of natural language in the title and description of the bug reports. They fail to take advantage of the richly detailed sequential information that is present in stack traces included in bug reports. To address this, we propose an approach called EnHMM which uses WPIBC and stack traces to predict the reassignment of bug report fields. Another contribution of this thesis is an approach to improve the efficiency of WPIBC by leveraging the Hadoop framework and the MapReduce programming model. We also show how WPIBC can be extended to support heterogenous classifiers

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Get PDF
    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

    Bioinformatics

    Get PDF
    This book is divided into different research areas relevant in Bioinformatics such as biological networks, next generation sequencing, high performance computing, molecular modeling, structural bioinformatics, molecular modeling and intelligent data analysis. Each book section introduces the basic concepts and then explains its application to problems of great relevance, so both novice and expert readers can benefit from the information and research works presented here

    Detecting Anomalies From Big Data System Logs

    Get PDF
    Nowadays, big data systems (e.g., Hadoop and Spark) are being widely adopted by many domains for offering effective data solutions, such as manufacturing, healthcare, education, and media. A common problem about big data systems is called anomaly, e.g., a status deviated from normal execution, which decreases the performance of computation or kills running programs. It is becoming a necessity to detect anomalies and analyze their causes. An effective and economical approach is to analyze system logs. Big data systems produce numerous unstructured logs that contain buried valuable information. However manually detecting anomalies from system logs is a tedious and daunting task. This dissertation proposes four approaches that can accurately and automatically analyze anomalies from big data system logs without extra monitoring overhead. Moreover, to detect abnormal tasks in Spark logs and analyze root causes, we design a utility to conduct fault injection and collect logs from multiple compute nodes. (1) Our first method is a statistical-based approach that can locate those abnormal tasks and calculate the weights of factors for analyzing the root causes. In the experiment, four potential root causes are considered, i.e., CPU, memory, network, and disk I/O. The experimental results show that the proposed approach is accurate in detecting abnormal tasks as well as finding the root causes. (2) To give a more reasonable probability result and avoid ad-hoc factor weights calculating, we propose a neural network approach to analyze root causes of abnormal tasks. We leverage General Regression Neural Network (GRNN) to identify root causes for abnormal tasks. The likelihood of reported root causes is presented to users according to the weighted factors by GRNN. (3) To further improve anomaly detection by avoiding feature extraction, we propose a novel approach by leveraging Convolutional Neural Networks (CNN). Our proposed model can automatically learn event relationships in system logs and detect anomaly with high accuracy. Our deep neural network consists of logkey2vec embeddings, three 1D convolutional layers, a dropout layer, and max pooling. According to our experiment, our CNN-based approach has better accuracy compared to other approaches using Long Short-Term Memory (LSTM) and Multilayer Perceptron (MLP) on detecting anomaly in Hadoop DistributedFile System (HDFS) logs. (4) To analyze system logs more accurately, we extend our CNN-based approach with two attention schemes to detect anomalies in system logs. The proposed two attention schemes focus on different features from CNN\u27s output. We evaluate our approaches with several benchmarks, and the attention-based CNN model shows the best performance among all state-of-the-art methods

    A Review of Movie Recommendation System : Limitations, Survey and Challenges

    Get PDF
    Recommendation System is a major area which is very popular and useful for people to take proper decision. It is a method that helps user to find out the information which is beneficial for the user from variety of data available. When it comes to Movie Recommendation System, recommendation is done based on similarity between users (Collaborative Filtering) or by considering particular user's activity (Content Based Filtering) which he wants to engage with. So to overcome the limitations of collaborative and content based filtering generally, combination of collaborative and content based filtering is used so that a better recommendation system can be developed. Also various similarity measures are used to find out similarity between users for recommendation. In this paper, we have reviewed different similarity measures. Various companies like face book which recommends friends, LinkedIn which recommends job, Pandora recommends music, Netflix recommends movies, Amazon recommends products etc. use recommendation system to increase their profit and also benefit their customers. This paper mainly concentrates on the brief review of the different techniques and its methods for movie recommendation, so that research in recommendation system can be explored

    Artificial intelligence (AI) methods in optical networks: A comprehensive survey

    Get PDF
    Producción CientíficaArtificial intelligence (AI) is an extensive scientific discipline which enables computer systems to solve problems by emulating complex biological processes such as learning, reasoning and self-correction. This paper presents a comprehensive review of the application of AI techniques for improving performance of optical communication systems and networks. The use of AI-based techniques is first studied in applications related to optical transmission, ranging from the characterization and operation of network components to performance monitoring, mitigation of nonlinearities, and quality of transmission estimation. Then, applications related to optical network control and management are also reviewed, including topics like optical network planning and operation in both transport and access networks. Finally, the paper also presents a summary of opportunities and challenges in optical networking where AI is expected to play a key role in the near future.Ministerio de Economía, Industria y Competitividad (Project EC2014-53071-C3-2-P, TEC2015-71932-REDT

    Real-time performance diagnosis and evaluation of big data systems in cloud datacenters

    Get PDF
    PhD ThesisModern big data processing systems are becoming very complex in terms of largescale, high-concurrency and multiple talents. Thus, many failures and performance reductions only happen at run-time and are very difficult to capture. Moreover, some issues may only be triggered when some components are executed. To analyze the root cause of these types of issues, we have to capture the dependencies of each component in real-time. Big data processing systems, such as Hadoop and Spark, usually work in large-scale, highly-concurrent, and multi-tenant environments that can easily cause hardware and software malfunctions or failures, thereby leading to performance degradation. Several systems and methods exist to detect big data processing systems’ performance degradation, perform root-cause analysis, and even overcome the issues causing such degradation. However, these solutions focus on specific problems such as stragglers and inefficient resource utilization. There is a lack of a generic and extensible framework to support the real-time diagnosis of big data systems. Performance diagnosis and prediction of big data systems are highly complex as these frameworks are typically deployed in cloud data centers that are large-scale, highly concurrent, and follows a multi-tenant model. Several factors, including hardware heterogeneity, stochastic networks and application workloads may impact the performance of big data systems. The current state-of-the-art does not sufficiently address the challenge of determining complex, usually stochastic and hidden relationships between these factors. To handle performance diagnosis and evaluation of big data systems in cloud environments, this thesis proposes multilateral research towards monitoring and performance diagnosis and prediction in cloud-based large-scale distributed systems by involving a novel combination of an effective and efficient deployment pipeline.The key contributions of this dissertation are listed below: - i - • Designing a real-time big data monitoring system called SmartMonit that efficiently collects the runtime system information including computing resource utilization and job execution information and then interacts the collected information with the Execution Graph modeled as directed acyclic graphs (DAGs). • Developing AutoDiagn, an automated real-time diagnosis framework for big data systems, that automatically detects performance degradation and inefficient resource utilization problems, while providing an online detection and semi-online root-cause analysis for a big data system. • Designing a novel root-cause analysis technique/system called BigPerf for big data systems that analyzes and characterizes the performance of big data applications by incorporating Bayesian networks to determine uncertain and complex relationships between performance related factors. The key contributions of this dissertation are listed below: - i - • Designing a real-time big data monitoring system called SmartMonit that efficiently collects the runtime system information including computing resource utilization and job execution information and then interacts the collected information with the Execution Graph modeled as directed acyclic graphs (DAGs). • Developing AutoDiagn, an automated real-time diagnosis framework for big data systems, that automatically detects performance degradation and inefficient resource utilization problems, while providing an online detection and semi-online root-cause analysis for a big data system. • Designing a novel root-cause analysis technique/system called BigPerf for big data systems that analyzes and characterizes the performance of big data applications by incorporating Bayesian networks to determine uncertain and complex relationships between performance related factors. The key contributions of this dissertation are listed below: - i - • Designing a real-time big data monitoring system called SmartMonit that efficiently collects the runtime system information including computing resource utilization and job execution information and then interacts the collected information with the Execution Graph modeled as directed acyclic graphs (DAGs). • Developing AutoDiagn, an automated real-time diagnosis framework for big data systems, that automatically detects performance degradation and inefficient resource utilization problems, while providing an online detection and semi-online root-cause analysis for a big data system. • Designing a novel root-cause analysis technique/system called BigPerf for big data systems that analyzes and characterizes the performance of big data applications by incorporating Bayesian networks to determine uncertain and complex relationships between performance related factors.State of the Republic of Turkey and the Turkish Ministry of National Educatio
    corecore