    Supervised Machine Learning in SAS Viya: Development of a Supervised Machine Learning pipeline in SAS Viya for comparison with a pipeline developed in Python

    Internship Report presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics. This internship report details the development of a supervised ML pipeline in SAS Viya, a cloud-based environment composed of several solutions for importing, managing and transforming data and for building and deploying predictive models into production environments. As a practical case study, this report showcases the SAS Viya features and capabilities that can be offered to the end user. A comparison with a similar supervised ML pipeline in Python was made to highlight the advantages and disadvantages of both tools. To this end, analytical tasks were carried out to demonstrate which supervised ML techniques can be used in each technology. Furthermore, it was shown that, depending on the experience and knowledge of the end user, both SAS Viya and Jupyter Notebook/Python are able to produce satisfactory results, the latter being better suited to data scientists with some experience in programming and ML. SAS Viya, by contrast, is better suited to employees who are getting started in the ML field, due to its point-and-click user interface. Building a supervised ML pipeline in SAS Viya can also be more straightforward than in Jupyter Notebook/Python, since the code is already developed, the process is automated and pipeline templates are made available to the user. However, due to its open-source nature, Python offers more supervised ML techniques for use in Jupyter Notebook. This report shows that the two solutions can complement each other: SAS Viya offers good visualizations for data exploration, while Jupyter Notebook/Python can be dedicated to data transformation and predictive model development.
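    The report's own Python code is not reproduced here, so the following is only a minimal sketch of what a supervised ML pipeline (feature scaling followed by a classifier, evaluated on a held-out test split) looks like in Python. In a Jupyter Notebook one would normally reach for scikit-learn's `Pipeline`; to stay self-contained, this sketch re-implements tiny hypothetical `StandardScaler`, `NearestCentroid` and `Pipeline` classes using only the standard library, and the synthetic data are invented for illustration.

```python
import random
from statistics import mean, pstdev


class StandardScaler:
    """Scale each feature to zero mean and unit variance, learned from training data only."""
    def fit(self, X, y=None):
        cols = list(zip(*X))
        self.means = [mean(c) for c in cols]
        # A constant column has zero spread; use 1.0 to avoid dividing by zero.
        self.stds = [pstdev(c) or 1.0 for c in cols]
        return self

    def transform(self, X):
        return [[(v - m) / s for v, m, s in zip(row, self.means, self.stds)] for row in X]


class NearestCentroid:
    """Predict the class whose mean feature vector is closest (squared Euclidean distance)."""
    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [mean(c) for c in zip(*rows)]
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda k: dist(x, self.centroids[k])) for x in X]


class Pipeline:
    """Chain transformers and a final estimator, in the spirit of scikit-learn's Pipeline."""
    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)


def train_test_split(X, y, test_ratio=0.25, seed=0):
    """Shuffle indices reproducibly and split rows into train and test parts."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_ratio))
    tr, te = idx[:cut], idx[cut:]
    return [X[i] for i in tr], [X[i] for i in te], [y[i] for i in tr], [y[i] for i in te]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic two-class data on very different feature scales.
    X = [[rng.gauss(0, 1), rng.gauss(0, 10)] for _ in range(100)] + \
        [[rng.gauss(3, 1), rng.gauss(30, 10)] for _ in range(100)]
    y = [0] * 100 + [1] * 100
    X_tr, X_te, y_tr, y_te = train_test_split(X, y)
    model = Pipeline([StandardScaler(), NearestCentroid()]).fit(X_tr, y_tr)
    preds = model.predict(X_te)
    acc = sum(p == t for p, t in zip(preds, y_te)) / len(y_te)
    print(f"test accuracy: {acc:.2f}")
```

    Fitting the scaler inside the pipeline means its means and standard deviations are learned from the training split only, which keeps test-set information out of preprocessing.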

    Using Visual Analytics to Discover Bot Traffic

    With the advance of technology, the Internet has become a medium for many malicious activities. Bot traffic has increased greatly, causing significant problems for businesses and organisations; examples include spam bots, scraper bots, distributed denial-of-service bots and adaptive bots that aim to exploit a website’s vulnerabilities. Discriminating bot traffic from legitimate flash crowds remains an open challenge. To address these issues and enhance security awareness, this thesis proposes an interactive visual analytics system for discovering bot traffic. The system provides an interactive visualisation with details-on-demand capabilities, which enables knowledge discovery from very large datasets and lets an analyst examine comprehensive details without being constrained by dataset size. Its dashboard view represents legitimate and bot traffic using a Quadtree data structure and Voronoi diagrams. The main contribution of this thesis is a novel visual analytics system capable of discovering bot traffic. This research conducted a literature review to gain a systematic understanding of the research area, and then used experiment and simulation approaches. The experiment involved capturing website traffic, identifying browser fingerprints, simulating bot attacks and analysing participants’ mouse dynamics, such as movements and events. Data were captured as participants performed a list of tasks, such as responding to a banner. The data collection is transparent to the participants and only requires JavaScript to be enabled on the client side. The study involved 10 participants familiar with the Internet. To analyse the data, Weka 3.6.10 was used to perform classification based on a training dataset, and the test dataset of all participants was evaluated using a built-in decision tree algorithm. The results were promising: the model was able to identify ten participants and six simulated bot attacks with an accuracy of 86.67%. Finally, the visual analytics design was formulated to assist an analyst in discovering bot presence.
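    The abstract does not say how the Quadtree is used beyond the dashboard view, so the sketch below rests on an assumption: a generic point-region quadtree over 2D points (for example, hypothetical screen coordinates from captured mouse events, each tagged `"human"` or `"bot"`). Every node keeps a running `count`, so a coarse overview can be drawn from `summary(level)` and an analyst can drill into a region with `query(rect)`, which is one way to provide details on demand over a large dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned, half-open rectangle [x0, x1) x [y0, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, px, py):
        return self.x0 <= px < self.x1 and self.y0 <= py < self.y1

    def intersects(self, other):
        return not (other.x0 >= self.x1 or other.x1 <= self.x0 or
                    other.y0 >= self.y1 or other.y1 <= self.y0)


class QuadTree:
    def __init__(self, bounds, capacity=4, depth=0, max_depth=12):
        self.bounds = bounds
        self.capacity = capacity      # points a leaf holds before it splits
        self.depth = depth
        self.max_depth = max_depth    # stop splitting for stacks of identical points
        self.points = []              # (x, y, label) tuples, stored in leaves only
        self.children = None
        self.count = 0                # number of points anywhere in this subtree

    def insert(self, px, py, label=None):
        if not self.bounds.contains(px, py):
            return False
        self.count += 1
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((px, py, label))
                return True
            self._split()
        # The four children partition this node, so exactly one accepts the point.
        return any(child.insert(px, py, label) for child in self.children)

    def _split(self):
        b = self.bounds
        mx, my = (b.x0 + b.x1) / 2, (b.y0 + b.y1) / 2
        quads = [Rect(b.x0, b.y0, mx, my), Rect(mx, b.y0, b.x1, my),
                 Rect(b.x0, my, mx, b.y1), Rect(mx, my, b.x1, b.y1)]
        self.children = [QuadTree(q, self.capacity, self.depth + 1, self.max_depth)
                         for q in quads]
        # Push the leaf's existing points down into the new children.
        for p in self.points:
            for child in self.children:
                if child.insert(*p):
                    break
        self.points = []

    def query(self, rect, found=None):
        """Details on demand: return every (x, y, label) point inside rect."""
        if found is None:
            found = []
        if not self.bounds.intersects(rect):
            return found
        if self.children is None:
            found.extend(p for p in self.points if rect.contains(p[0], p[1]))
        else:
            for child in self.children:
                child.query(rect, found)
        return found

    def summary(self, level):
        """Overview: (cell, count) pairs, stopping at depth `level` or at leaves."""
        if self.children is None or self.depth >= level:
            return [(self.bounds, self.count)]
        return [cell for child in self.children for cell in child.summary(level)]
```

    Because each node stores its subtree count, a shallow overview costs time proportional to the number of cells drawn rather than the number of points; the `max_depth` cap stops runaway splitting when many events share identical coordinates.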

    Learning log-based automatic group formation: system design and classroom implementation study

    Collaborative learning in the form of group work is becoming increasingly significant in education, since interpersonal skills count in modern society. However, teachers are often overwhelmed by the logistics involved in conducting group work, so proper support for executing and managing such activities in a timely and informed manner becomes imperative. This research introduces an intelligent system for group formation, consisting of a parameter-setting module and a group member visualization panel in which the created groups are shown to the user and can be graded. The system supports teachers by applying algorithms to actual learning log data, thereby simplifying the group formation process and saving them time. A pilot study in a primary school mathematics class showed a positive effect on students’ engagement and affect while participating in group activities based on the system-generated groups, thus providing empirical evidence for the practice of Computer-Supported Collaborative Learning (CSCL) systems.
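    The abstract does not name the grouping algorithm. As one hypothetical possibility, the sketch below forms heterogeneous groups from a per-student score (imagined here as a summary of learning-log activity) using serpentine, or "snake-draft", assignment: students are ranked, dealt to groups in order, and the dealing direction reverses every round so that each group receives a mix of high- and low-ranked members.

```python
def form_groups(scores, group_size):
    """Hypothetical heterogeneous grouping by serpentine ("snake-draft") assignment.

    scores maps a student id to a numeric score (an assumed summary of
    learning-log activity). Returns ceil(n / group_size) groups of ids,
    and at least one group.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    n_groups = max(1, -(-len(scores) // group_size))  # ceiling division
    # Highest score first; ties broken by id so the result is deterministic.
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    groups = [[] for _ in range(n_groups)]
    for i, student in enumerate(ranked):
        rnd, pos = divmod(i, n_groups)
        # Deal left-to-right on even rounds and right-to-left on odd rounds.
        g = pos if rnd % 2 == 0 else n_groups - 1 - pos
        groups[g].append(student)
    return groups
```

    For example, `form_groups({"a": 90, "b": 80, "c": 70, "d": 60, "e": 50, "f": 40}, 2)` returns `[['a', 'f'], ['b', 'e'], ['c', 'd']]`, and each group's scores sum to 130. When the class size is not a multiple of `group_size`, group sizes differ by at most one.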

    Using machine learning to support better and intelligent visualisation for genomic data

    Massive amounts of genomic data have been generated since the advent of Next Generation Sequencing technologies. Great technological advances in methods for characterising human diseases, including genetic and environmental factors, create a great opportunity to understand these diseases and to find new diagnoses and treatments. Medical data are becoming ever richer, and translating them ever more challenging. Visualisation can greatly aid the processing and integration of complex data. Genomic data visual analytics is evolving rapidly alongside advances in high-throughput technologies and in fields such as Artificial Intelligence (AI) and Virtual Reality (VR). Personalised medicine requires new genomic visualisation tools that can extract knowledge from genomic data efficiently and speed up expert decisions about the best treatment for an individual patient’s needs. However, meaningful visual analysis of such large genomic data remains a serious challenge: visualising complex genomic data requires more than simply plotting it, and should also lead to better decisions. Machine learning can make predictions and aid decision-making. Machine learning and visualisation are both effective ways to deal with big data, but they serve different purposes. Machine learning applies statistical learning techniques to identify patterns in data automatically and make highly accurate predictions, while visualisation leverages the human perceptual system to interpret and uncover hidden patterns in big data. Clinicians, experts and researchers intend to use both visualisation and machine learning to analyse their complex genomic data, but understanding and trusting machine learning models is a serious challenge for them in the high-stakes medical domain. The main goal of this thesis is to study the feasibility of intelligent, interactive visualisation combined with machine learning algorithms for medical data analysis. A prototype has also been developed to illustrate the concept that visualising genomic data from childhood cancers in meaningful and dynamic ways could lead to better decisions. Machine learning algorithms are applied and illustrated while visualising the cancer genomic data in order to provide highly accurate predictions. This research could open a new and exciting path to discovery in disease diagnostics and therapies.

    Interactive gesture controller for a motorised wheelchair

    This paper explores in detail the design and testing of a gesture controller for a motorised wheelchair. For some people, motorised wheelchairs are part of everyday life. Those who depend on a motorised wheelchair do so for a vast range of reasons; it is therefore reasonable to assume that modifying and improving upon the standard joystick controller can benefit their way of life significantly. The design of the gesture controller is heavily based on the user’s needs, so as to benefit users and complement their strengths by giving them more control. For individuals with limited movement and dexterity, the user interface, system responsiveness, ergonomics and safety were all considered when engineering a system intended for people to use. A device capable of recognising hand gestures was carefully chosen; the technology readily available for this application is relatively new and not extensively documented. The LEAP motion sensor was chosen as the hand gesture recognition device to serve as the wheelchair controller. This device includes hand recognition software, but that software lacks the predictability and accuracy required for a motorised wheelchair controller. Through testing, the controller’s accuracy was improved. Although this controller is adequate for a laboratory environment, further testing and development will be required for this alternative wheelchair controller to evolve into a commercial product. The gesture-triggered controller was designed around the capabilities of the developer’s hand, but the method outlined in this paper is transferable to any individual’s hand size and, more importantly, to the limitations of their hand gestures. The outcome of this thesis is a customised, non-invasive hand gesture controller for a motorised wheelchair that can be fully tailored to a person’s capability without losing its responsiveness or accuracy.
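    The paper does not give its mapping from hand position to motor commands, so the following is a hedged sketch of one common approach, not the author's controller. It assumes the gesture sensor has already produced a palm offset in millimetres from a calibrated neutral point (`forward_mm` and `lateral_mm` are hypothetical inputs, not a LEAP motion API), applies a dead zone to reject tremor, scales linearly up to `full_scale_mm`, smooths with an exponential moving average, and stops immediately when the hand is lost. Tailoring the controller to an individual hand then amounts to adjusting the hypothetical `ControllerConfig` values.

```python
from dataclasses import dataclass


@dataclass
class ControllerConfig:
    """Hypothetical per-user tuning values; adjust for each person's hand."""
    dead_zone_mm: float = 20.0    # offsets this small are treated as "no command"
    full_scale_mm: float = 100.0  # offset that maps to full speed or full turn
    smoothing: float = 0.3        # EMA weight of each new reading, 0 < smoothing <= 1

    def __post_init__(self):
        if self.full_scale_mm <= self.dead_zone_mm:
            raise ValueError("full_scale_mm must exceed dead_zone_mm")
        if not 0 < self.smoothing <= 1:
            raise ValueError("smoothing must be in (0, 1]")


def axis_command(offset_mm, cfg):
    """Map a signed offset in mm to a command in [-1, 1] with a dead zone."""
    magnitude = abs(offset_mm)
    if magnitude <= cfg.dead_zone_mm:
        return 0.0
    scaled = (magnitude - cfg.dead_zone_mm) / (cfg.full_scale_mm - cfg.dead_zone_mm)
    return min(scaled, 1.0) if offset_mm > 0 else -min(scaled, 1.0)


class GestureController:
    def __init__(self, cfg=None):
        self.cfg = cfg or ControllerConfig()
        self.speed = 0.0  # -1 (full reverse) .. 1 (full forward)
        self.turn = 0.0   # -1 (full left) .. 1 (full right)

    def update(self, forward_mm, lateral_mm, hand_visible=True):
        """Return smoothed (speed, turn) for one sensor frame."""
        if not hand_visible:
            # Fail safe: stop at once when tracking is lost, without smoothing.
            self.speed = self.turn = 0.0
            return self.speed, self.turn
        weight = self.cfg.smoothing
        self.speed += weight * (axis_command(forward_mm, self.cfg) - self.speed)
        self.turn += weight * (axis_command(lateral_mm, self.cfg) - self.turn)
        return self.speed, self.turn
```

    Smoothing trades a little responsiveness for predictability, while the loss-of-tracking branch deliberately bypasses it so that safety never waits on the filter.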

    DeforestVis: Behavior Analysis of Machine Learning Models with Surrogate Decision Stumps

    As machine learning (ML) models grow more complex and their applications in different (and critical) domains increase, there is a strong demand for more interpretable and trustworthy ML. One straightforward, model-agnostic way to interpret complex ML models is to train surrogate models, such as rule sets and decision trees, that sufficiently approximate the original ones while being simpler and easier to explain. Yet rule sets can become very lengthy, with many if-else statements, and decision tree depth grows rapidly when accurately emulating complex ML models. In such cases, both approaches can fail to meet their core goal of providing users with model interpretability. We tackle this by proposing DeforestVis, a visual analytics tool that offers a user-friendly summarization of the behavior of complex ML models by providing surrogate decision stumps (one-level decision trees) generated with the adaptive boosting (AdaBoost) technique. Our solution helps users explore the complexity vs. fidelity trade-off by incrementally generating more stumps, create attribute-based explanations with weighted stumps to justify decision making, and analyze the impact of rule overriding on the allocation of training instances between one or more stumps. An independent test set allows users to monitor the effectiveness of manual rule changes and form hypotheses based on case-by-case investigations. We show the applicability and usefulness of DeforestVis with two use cases and expert interviews with data analysts and model developers. Comment: This manuscript is currently under review.
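    To make surrogate decision stumps concrete, here is a minimal from-scratch sketch of discrete AdaBoost with one-level stumps, trained on a black-box model's predictions rather than on ground truth, with fidelity measured as the share of instances on which the surrogate agrees with the black box. It is not DeforestVis's implementation; the function names and the toy black box are hypothetical, and labels are assumed to be in {-1, +1}.

```python
import math


def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best one-level tree."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds: below the minimum, then midpoints between distinct values.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[f] > t else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    return best


def stump_predict(stump, row):
    _, f, t, polarity = stump
    return polarity if row[f] > t else -polarity


def adaboost_surrogate(X, y_blackbox, n_stumps):
    """Discrete AdaBoost fitted to the black-box model's labels (in {-1, +1})."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_stumps):
        stump = fit_stump(X, y_blackbox, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight instances this stump misclassifies, down-weight the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y_blackbox, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def surrogate_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


def fidelity(ensemble, X, y_blackbox):
    """Fraction of instances on which the surrogate agrees with the black box."""
    agree = sum(surrogate_predict(ensemble, row) == yb for row, yb in zip(X, y_blackbox))
    return agree / len(X)


if __name__ == "__main__":
    # Hypothetical black box with a diagonal boundary that no single stump can match.
    X = [[i, j] for i in range(10) for j in range(10)]
    y_bb = [1 if i + j > 9 else -1 for i, j in X]
    for k in (1, 5, 20):
        print(k, "stumps -> fidelity", round(fidelity(adaboost_surrogate(X, y_bb, k), X, y_bb), 2))
```

    Calling `adaboost_surrogate` with increasing `n_stumps` and comparing `fidelity` is a small-scale version of the complexity vs. fidelity trade-off; each `(alpha, stump)` pair is one weighted if-then rule a user could inspect.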

    Open Data

    Open data is freely usable, reusable, or redistributable by anybody, provided there are safeguards in place that protect the data’s integrity and transparency. This book describes how data retrieved from public open data repositories can improve the learning qualities of digital networking, particularly performance and reliability. Chapters address such topics as knowledge extraction, Open Government Data (OGD), public dashboards, intrusion detection, and artificial intelligence in healthcare.

    Comparative Analysis of Building Insurance Prediction Using Some Machine Learning Algorithms

    In finance and management, insurance is a product that reduces or eliminates, wholly or partially, the loss caused by different risks. Various factors affect house insurance claims, some of which, including specific features of the house, contribute to formulating insurance policies. Applying Machine Learning (ML) in the field of insurance would enable insurance policies to be formulated seamlessly and with better performance, while also saving time. Various classification algorithms have been used, since they have a long history and have been modified over time for optimum functionality. To illustrate the performance of each of the ML algorithms used here, we analyzed an insurance dataset drawn from a Zindi Africa competition, which is said to come from Olusola Insurance Company in Lagos, Nigeria. This study therefore compares the performance of Logistic Regression (LR), Decision Tree (DT), K-Nearest Neighbor (KNN), Kernel Support Vector Machine (kSVM), Naïve Bayes (NB) and Random Forest (RF) classifiers on this dataset, evaluating them not only with accuracy and precision but also with recall and F1 score, all based on the confusion matrix. In terms of accuracy, logistic regression and kernel SVM both achieved 78%, but kSVM outperformed LR in precision (70.8% versus 64.8%), showing that kSVM offered the best result.
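    The four metrics named above all come from the same confusion-matrix counts. The sketch below shows the arithmetic for a binary task, assuming label `1` (for example, "a claim was made") is the positive class and, as a convention chosen here, reporting 0.0 when a ratio is undefined; the toy labels are invented and are not the Zindi data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary task, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted claims, share that were real
    recall = tp / (tp + fn) if tp + fn else 0.0     # of real claims, share that were caught
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

    For example, `classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 0, 0, 1, 0, 0, 1])` gives accuracy 0.625, precision about 0.667, recall 0.5 and F1 about 0.571. Because accuracy pools both classes while precision looks only at positive predictions, two models can tie on accuracy yet differ in precision, as LR and kSVM do in this study.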