212,206 research outputs found

    Situation recognition using soft computing techniques

    Includes bibliographical references. The last decades have witnessed the emergence of a large number of devices pervasively launched into our daily lives as systems producing and collecting data from a variety of information sources to provide different services to different users via a variety of applications. These include infrastructure management, business process monitoring, crisis management and many other system-monitoring activities. Because they are processed in real time, these information production and collection activities raise interest in live performance monitoring, analysis and reporting, and call for data-mining methods for recognizing, predicting, reasoning about and controlling the performance of these systems by tracking changes in the system and/or deviations from normal operation. In recent years, soft computing methods and algorithms have been applied to data mining to identify patterns and provide new insight into data. This thesis revisits the issue of situation recognition for systems producing massive datasets by assessing the relevance of using soft computing techniques for finding hidden patterns in these systems.

    Failure prediction for high-performance computing systems

    The failure rate in high-performance computing (HPC) systems continues to escalate as the number of components in these systems increases. This affects the scalability and the performance of parallel applications in large-scale HPC systems. Fault tolerance (FT) mechanisms help mitigate the impact of failures on parallel applications. However, utilizing such mechanisms incurs additional overhead, and overusing them results in unnecessarily large overhead in the parallel applications. Knowing when and where failures will occur can greatly reduce this excessive overhead. As such, failure prediction is critical in order to utilize FT mechanisms effectively. It also helps in system administration and management, as a predicted failure can be handled beforehand with limited impact on the running systems. This dissertation proposes new proficiency metrics for failure prediction, based on failure impact in HPC environments, that the existing proficiency metrics are unable to reflect. Furthermore, an efficient log message clustering algorithm is proposed for system event log data preprocessing and analysis. Then, two novel association rule mining approaches are introduced and employed for HPC failure prediction. Finally, the performances of the existing and the proposed association rule mining methods are compared and analyzed.
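The abstract above does not describe its two association rule mining approaches, so the following is only a minimal sketch of the standard support and confidence measures that rule mining generally builds on. The event names and the idea of grouping log events into time windows are hypothetical and chosen purely for illustration.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Estimate P(consequent | antecedent) from the transactions."""
    ante = frozenset(antecedent)
    ante_support = support(transactions, ante)
    if ante_support == 0:
        return 0.0
    return support(transactions, ante | frozenset(consequent)) / ante_support


# Hypothetical event-log windows; each set holds the event types seen in one window.
windows = [
    frozenset({"ecc_error", "fan_warning", "node_failure"}),
    frozenset({"ecc_error", "node_failure"}),
    frozenset({"fan_warning"}),
    frozenset({"ecc_error"}),
]

rule_support = support(windows, {"ecc_error", "node_failure"})
rule_confidence = confidence(windows, {"ecc_error"}, {"node_failure"})
```

A rule such as `ecc_error -> node_failure` would typically be kept only when both values exceed user-chosen thresholds.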

    Autoencoder for clinical data analysis and classification : data imputation, dimensional reduction, and pattern recognition

    Over the last decade, research has focused on machine learning and data mining to develop frameworks that can improve data analysis and output performance, and to build accurate decision support systems that benefit from real-life datasets. This leads to the field of clinical data analysis, which has attracted a significant amount of interest in the computing, information systems, and medical fields. Machine learning algorithms need data of a particular form in order to build an efficient model. Clinical datasets pose several issues that can affect the classification of the dataset: missing values, high dimensionality, and class imbalance. In order to build a framework for mining the data, it is necessary first to preprocess the data by eliminating patients’ records that have too many missing values, imputing missing values, addressing high dimensionality, and classifying the data for decision support. This thesis investigates a real clinical dataset to address these challenges. An autoencoder is employed as a tool that can compress the data mining methodology by extracting features and classifying data in one model. The first step in the data mining methodology is to impute missing values, so several imputation methods are analysed and employed. Then high dimensionality is demonstrated and used to discard irrelevant and redundant features, in order to improve prediction accuracy and reduce computational complexity. Class imbalance is manipulated to investigate the effect on feature selection algorithms and classification algorithms. The first stage of analysis is to investigate the role of the missing values. The results show that techniques based on class separation outperform other techniques in predictive ability. The next stage is to investigate high dimensionality and class imbalance.
    A small set of features was found that can improve the classification performance; however, class balancing does not affect the performance as much as class imbalance does.

    Crime prediction and monitoring in Porto, Portugal, using machine learning, spatial and text analytics

    Crimes are a common societal concern impacting quality of life and economic growth. Despite the global decrease in crime statistics, specific types of crime and feelings of insecurity have often increased, leaving safety and security agencies with the need to apply novel approaches and advanced systems to better predict and prevent occurrences. The use of geospatial technologies, combined with data mining and machine learning techniques, allows for significant advances in the criminology of place. In this study, official police data from Porto, Portugal, between 2016 and 2018 were georeferenced and treated using spatial analysis methods, which allowed the identification of spatial patterns and relevant hotspots. Then, machine learning processes were applied for space-time pattern mining. Using lasso regression analysis, significant crime variables were identified, with random forest and decision tree models supporting the selection of important variables. Lastly, tweets related to insecurity were collected, and topic modeling and sentiment analysis were performed. Together, these methods assist interpretation of patterns, prediction and, ultimately, the performance of both police and planning professionals.

    Predicting student performance using data mining and learning analysis technique in Libyan Higher Education

    Technology has an increasing impact on all areas of life, including the education sector, and requires developing countries to emulate developed countries and integrate technology into their education systems. Schools in Libya currently struggle to understand why students perform poorly in certain subjects and how to anticipate their performance in those subjects in coming semesters. Several methods have been proposed to predict student performance using data mining techniques. This paper plans to build a Data Mining Techniques in Education (DME) prediction model, based on clustering, classification and association rule mining, for use in many universities and schools in order to provide students and teachers with the most advanced platform. Although relatively late, the Libyan government finally responded to this challenge by investing heavily in rebuilding the education system and launching a national plan. The presented method predicts students’ performance based on their grades in Math and English. The results are divided into three main sections: clustering analysis using the k-means algorithm; classification analysis, done in two rounds, first using Gain Ratio evaluation to find the top attributes, which are then used by the J48 algorithm in the second round; and association rule analysis using the Apriori algorithm. Association rule analysis is applied to the clusters generated by the clustering analysis to produce the rules associated with each cluster. For each section, a list of inputs is presented with the scale used for the values, followed by the results of the algorithm and an explanation of the findings.
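The abstract above ranks attributes by gain ratio before classification. As a rough illustration of that measure only (information gain divided by split information, for a categorical attribute), here is a small sketch. The attribute names and toy records are hypothetical and are not taken from the paper's dataset.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence


def entropy(items: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the value distribution in `items`."""
    total = len(items)
    return -sum((c / total) * log2(c / total) for c in Counter(items).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of splitting `labels` by `values`, normalised by split information."""
    total = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records: two candidate attributes and a pass/fail outcome.
attendance = ["high", "high", "low", "low"]
study_group = ["a", "b", "a", "b"]
outcome = ["pass", "pass", "fail", "fail"]

attendance_score = gain_ratio(attendance, outcome)
study_group_score = gain_ratio(study_group, outcome)
```

In this toy data `attendance` separates the outcomes perfectly while `study_group` carries no information, so a ranking by gain ratio would keep the former and discard the latter.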

    Dynamic adversarial mining - effectively applying machine learning in adversarial non-stationary environments.

    While the understanding of machine learning and data mining is still in its budding stages, its engineering applications have found immense acceptance and success. Cybersecurity applications such as intrusion detection systems, spam filtering, and CAPTCHA authentication have all begun adopting machine learning as a viable technique to deal with large-scale adversarial activity. However, the naive usage of machine learning in an adversarial setting is prone to reverse engineering and evasion attacks, as most of these techniques were designed primarily for a static setting. The security domain is a dynamic landscape, with an ongoing, never-ending arms race between the system designer and the attackers. Any solution designed for such a domain needs to take into account an active adversary and needs to evolve over time in the face of emerging threats. We term this the ‘Dynamic Adversarial Mining’ problem, and the presented work provides the foundation for this new interdisciplinary area of research, at the crossroads of Machine Learning, Cybersecurity, and Streaming Data Mining. We start with a white hat analysis of the vulnerabilities of classification systems to exploratory attack. The proposed ‘Seed-Explore-Exploit’ framework provides characterization and modeling of attacks, ranging from simple random evasion attacks to sophisticated reverse engineering. It is observed that even systems having prediction accuracy close to 100% can be easily evaded with more than 90% precision. This evasion can be performed without any information about the underlying classifier, training dataset, or the domain of application. Attacks on machine learning systems cause the data to exhibit non-stationarity (i.e., the training and the testing data have different distributions). It is necessary to detect these changes in distribution, called concept drift, as they could cause the prediction performance of the model to degrade over time.
    However, the detection cannot overly rely on labeled data to compute performance explicitly and monitor a drop, as labeling is expensive and time-consuming, and at times may not be possible at all. As such, we propose the ‘Margin Density Drift Detection (MD3)’ algorithm, which can reliably detect concept drift from unlabeled data only. MD3 provides high detection accuracy with a low false alarm rate, making it suitable for cybersecurity applications, where excessive false alarms are expensive and can lead to loss of trust in the warning system. Additionally, MD3 is designed as a classifier-independent, streaming algorithm for usage in a variety of continuous, never-ending learning systems. We then propose a ‘Dynamic Adversarial Mining’ based learning framework, for learning in non-stationary and adversarial environments, which provides ‘security by design’. The proposed ‘Predict-Detect’ classifier framework aims to provide robustness against attacks, ease of attack detection using unlabeled data, and swift recovery from attacks. Ideas of feature hiding and obfuscation of feature importance are proposed as strategies to enhance the learning framework’s security. Metrics for evaluating the dynamic security of a system and its recoverability after an attack are introduced to provide a practical way of measuring the efficacy of dynamic security strategies. The framework is developed as a streaming data methodology, capable of continually functioning with limited supervision and effectively responding to adversarial dynamics. The developed ideas, methodology, algorithms, and experimental analysis aim to provide a foundation for future work in the area of ‘Dynamic Adversarial Mining’, wherein a holistic approach to machine learning based security is motivated.
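The abstract above does not spell out how MD3 computes its drift signal. The sketch below illustrates one plausible reading of the name, under stated assumptions: margin density is taken to be the fraction of unlabeled samples whose classifier decision score falls inside a margin around the decision boundary, and drift is flagged when that fraction moves away from a reference value by more than a threshold. The function names, the margin width, the threshold and the scores are all hypothetical.

```python
from typing import Sequence


def margin_density(scores: Sequence[float], margin: float = 1.0) -> float:
    """Fraction of samples whose decision score lies within +/- `margin` of the boundary.

    No labels are needed: only the classifier's raw decision scores are used.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    inside = sum(1 for s in scores if abs(s) <= margin)
    return inside / len(scores)


def drift_suspected(reference_md: float, current_md: float, threshold: float) -> bool:
    """Flag drift when margin density deviates from the reference by more than `threshold`."""
    return abs(current_md - reference_md) > threshold


# Hypothetical decision scores from a linear classifier on two unlabeled batches.
reference_scores = [2.1, -1.8, 0.4, -2.5, 1.9, -0.3, 2.2, -2.0]
current_scores = [0.6, -0.2, 0.9, -2.4, 0.1, -0.7, 1.8, -0.5]

ref_md = margin_density(reference_scores)
cur_md = margin_density(current_scores)
alarm = drift_suspected(ref_md, cur_md, threshold=0.3)
```

Because the check uses only decision scores, it could run on every incoming batch, with labels requested only after an alarm is raised.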

    An agent-based service oriented architecture for risk mining

    University of Technology, Sydney. Faculty of Engineering and Information Technology. Risk Mining (RM) is the process of analyzing data that includes risk information using data mining methods, with the mining results used for risk prevention. In the last few years, some researchers have proposed the combination of data mining and agent technology (agent mining) to improve the performance of data mining methodology in heterogeneous business environments. However, problems remain open for further research on the application of risk mining systems in real industry environments, including the robustness of the system architecture, dynamic business processes and model accuracy. Therefore, in this thesis we present an Agent-based Service-oriented Risk Mining Architecture (ABSORM), which has been designed to facilitate the development of agent mining systems to address the above issues. This thesis focuses on developing the following strategies:
    • The integration of agent technology with web services. In this framework, we propose a new and easier method, by which the system functions are not integrated into the structure of the agents but are instead modeled as distributed services and applications that are invoked by the agents acting as controllers and coordinators. Therefore, techniques developed in this framework can improve the interoperability between different modules, the distribution of resources, and independence from programming languages.
    • The integration of agent technology with business process management. In this work, we develop autonomous agents that can collaborate in a business flow, which not only increases the reusability of the system but also eases system development through reuse of the computational resources. A group of agents solves problems in the following way: each agent solves the problem individually, and the agents then interact with each other to finalize a business process.
    • The integration of agent technology with ensemble learning methods. In this thesis, we are interested in developing agent-based ensemble learning strategies for risk mining: each ensemble agent individually gathers the evidence about model evaluation, and then ensemble learning methods such as bagging and boosting are used to obtain a prediction from the individually gathered evidence. Agent-based ensemble learning can provide a critical boost to risk mining where predictive accuracy is more vital than model interpretability.
    The proposed architecture has been evaluated by building an online banking fraud detection system and a student risk management system. These two applications have proven to be sophisticated, yet user-friendly, risk analysis and management tools. They are modular, interactive, dynamic and globally oriented.
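The ensemble strategy described above combines evidence gathered by individual agents. The thesis abstract does not specify the combination rule, so the sketch below shows only the simplest aggregation step used in bagging-style ensembles, a majority vote over per-agent predictions. The agent outputs and label names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(agent_predictions: Sequence[Hashable]) -> Hashable:
    """Return the label predicted by the most agents.

    Ties are resolved in favour of the label that appears first in the input.
    """
    if not agent_predictions:
        raise ValueError("at least one prediction is required")
    return Counter(agent_predictions).most_common(1)[0][0]


# Hypothetical outputs of three agents scoring the same transaction.
votes = ["fraud", "legitimate", "fraud"]
decision = majority_vote(votes)
```

Boosting would instead weight each agent's vote by its estimated accuracy; that variant is not shown here.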