
    An approach towards standardising vulnerability categories

    Computer vulnerabilities are design flaws, implementation errors or configuration errors that provide a means of exploiting a system or network that would not otherwise be available. The recent growth in the number of vulnerability scanning (VS) tools and independent vulnerability databases points to an apparent need for further means to protect computer systems from compromise. To use these tools and databases effectively, it is important that they can interpret, correlate and exchange a large amount of information about computer vulnerabilities. However, this goal is hard to achieve because current VS products differ extensively both in the way that they detect vulnerabilities and in the number of vulnerabilities that they can detect. Each tool or database represents, identifies and classifies vulnerabilities in its own way, making them difficult to study and compare. Although the list of Common Vulnerabilities and Exposures (CVE) provides a means of resolving the disparity in vulnerability names used in different VS products, it does not standardise vulnerability categories. This dissertation highlights the importance of having a standard vulnerability category set and outlines an approach towards achieving this goal by categorising the CVE repository using a data-clustering algorithm. Prototypes are presented to verify the concept of standardising vulnerability categories and to show how this can be used as the basis for comparing VS products and improving scan reports. Dissertation (MSc (Computer Science)), University of Pretoria, 2008.
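
    The abstract does not specify which data-clustering algorithm is used over the CVE repository, so the following is only a minimal sketch of the general idea, assuming CVE summaries are available as plain text and using k-means over TF-IDF vectors as one plausible instantiation; the variable names and example summaries are invented for illustration.

```python
# Sketch: cluster CVE summaries into candidate vulnerability categories.
# Assumes scikit-learn; k-means over TF-IDF vectors is an illustrative choice,
# not necessarily the algorithm used in the dissertation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

cve_summaries = [
    "Buffer overflow in the FTP service allows remote code execution",
    "SQL injection in the login form allows authentication bypass",
    "Default administrator password permits unauthorized access",
    # ... one summary per CVE entry
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(cve_summaries)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(vectors)

# Each cluster label is a candidate standard vulnerability category.
for summary, label in zip(cve_summaries, kmeans.labels_):
    print(label, summary)
```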

    Valve Health Identification Using Sensors and Machine Learning Methods

    Predictive maintenance models attempt to identify developing issues with industrial equipment before they become critical. In this paper, we describe both supervised and unsupervised approaches to predictive maintenance for subsea valves in the oil and gas industry. The supervised approach is appropriate for valves for which a long history of operation, along with manual assessments of the state of the valves, exists, while the unsupervised approach is suitable for addressing the cold start problem when new valves, for which we do not have an operational history, come online. For the supervised prediction problem, we attempt to distinguish between healthy and unhealthy valve actuators using sensor data measuring hydraulic pressures and flows during valve opening and closing events. Unlike previous approaches that rely solely on raw sensor data, we derive frequency and time domain features, and experiment with a range of classification algorithms and different feature subsets. The best performing models for the supervised approach were found to be Adaboost and Random Forest ensembles. In the unsupervised approach, the goal is to detect sudden, abrupt changes in valve behaviour by comparing the sensor readings from consecutive opening or closing events. Our novel methodology essentially works by comparing the sequences of sensor readings captured during these events, using the raw sensor readings as well as normalised and first derivative versions of the sequences. We evaluate the effectiveness of a number of well-known time series similarity measures and find that using discrete Fréchet distance or dynamic time warping leads to the best results, with the Bray-Curtis similarity measure leading to only marginally poorer change detection while requiring considerably less computational effort.
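
    As a rough illustration of the unsupervised change-detection idea, the sketch below compares the sensor trace of one valve opening event with the trace of the previous event using dynamic time warping, one of the similarity measures evaluated in the paper. The pressure traces and the alarm threshold are invented for illustration and are not taken from the paper.

```python
# Sketch: flag an abrupt change in valve behaviour when the DTW distance
# between consecutive opening-event traces exceeds a (hypothetical) threshold.
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

previous_event = np.array([0.0, 1.2, 3.5, 5.0, 5.1, 4.9, 0.3])  # hydraulic pressure trace
current_event = np.array([0.0, 1.1, 3.4, 5.2, 5.0, 4.8, 0.2])

distance = dtw_distance(previous_event, current_event)
if distance > 2.0:  # threshold would be tuned per valve in practice
    print("Abrupt change in valve behaviour:", distance)
```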

    Distance, Time and Terms in First Story Detection

    First Story Detection (FSD) is an important application of online novelty detection within Natural Language Processing (NLP). Given a stream of documents, or stories, about news events in chronological order, the goal of FSD is to identify the very first story for each event. While a variety of NLP techniques have been applied to the task, FSD remains challenging because it is still not clear what the most crucial factor is in defining “story novelty”. Given these challenges, the thesis addressed in this dissertation is that the notion of novelty in FSD is multi-dimensional. To address this, the work presented has adopted a three-dimensional analysis of the relative qualities of FSD systems and gone on to propose a specific method that we argue significantly improves understanding and performance of FSD. FSD is of course not a new problem type; therefore, our first dimension of analysis consists of a systematic study of detection models for first story detection and the distances that are used in the detection models for defining novelty. This analysis presents a tripartite categorisation of the detection models based on the end points of the distance calculation. The study also considers issues of document representation explicitly, and shows that even in a world driven by distributed representations, the nearest neighbour detection model with TF-IDF document representations still achieves the state-of-the-art performance for FSD. We provide analysis of this important result and suggest potential causes and consequences. Events are introduced and change at a relatively slow rate relative to the frequency at which words come in and out of usage on a document by document basis. Therefore we argue that the second dimension of analysis should focus on the temporal aspects of FSD. Here we are concerned with not only the temporal nature of the detection process, e.g., the time/history window over the stories in the data stream, but also the processes that drive the representational updates that underpin FSD. Through a systematic investigation of static representations, and also dynamic representations with both low and high update frequencies, we show that while a dynamic model unsurprisingly outperforms static models, the dynamic model in fact stops improving and stays steady once the update frequency rises above a threshold. Our third dimension of analysis moves across to the particulars of lexical content, and critically the effect of terms in the definition of story novelty. We provide a specific analysis of how terms are represented for FSD, including the distinction between static and dynamic document representations, and the effect of out-of-vocabulary terms and the specificity of a word in the calculation of the distance. Our investigation showed that term distributional similarity, rather than the scale of common terms across the background and target corpora, is the most important factor in selecting background corpora for document representations in FSD. More crucially, in this work the simple idea of new terms emerged as a vital factor in defining novelty for the first story.
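
    For concreteness, the sketch below shows the nearest-neighbour FSD model with TF-IDF document representations that the dissertation reports as state of the art: a story is flagged as a first story when its highest cosine similarity to any earlier story falls below a threshold. The stream, the threshold and the batch (rather than incremental) vectorisation are simplifications for illustration.

```python
# Sketch: nearest-neighbour first story detection over a small toy stream.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

stream = [
    "Earthquake strikes off the coast of Japan",
    "Strong aftershocks follow the Japanese earthquake",
    "Central bank unexpectedly raises interest rates",
]

vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(stream)  # a real system would update representations online

threshold = 0.2  # illustrative novelty threshold
for i, story in enumerate(stream):
    if i == 0:
        print("FIRST STORY:", story)
        continue
    nearest = cosine_similarity(vectors[i], vectors[:i]).max()
    if nearest < threshold:
        print("FIRST STORY:", story)
```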

    Real time detection of malicious webpages using machine learning techniques

    In today's Internet, online content and especially webpages have increased exponentially. Alongside this huge rise, the number of users has also grown considerably in the past two decades. Most responsible institutions such as banks and governments follow specific rules and regulations regarding conduct and security, but most websites are designed and developed with few restrictions on these issues, which is why it is important to protect users from harmful webpages. Previous research has looked at detecting harmful webpages by running machine learning models on a remote server. The problem with this approach is that detection is slow, because of the need to handle a large number of webpages. There is a gap in knowledge around which machine learning algorithms are capable of detecting harmful web applications in real time on a local machine. The conventional method of detecting malicious webpages is to go through a black list and check whether a webpage is listed. A black list is a list of webpages which are classified as malicious from a user's point of view. These black lists are created by trusted organisations and volunteers, and are then used by modern web browsers such as Chrome, Firefox, Internet Explorer, etc. However, black lists are ineffective because of the frequently changing nature of webpages, the growing number of webpages that poses scalability issues, and the crawlers' inability to visit intranet webpages that require computer operators to log in as authenticated users. The thesis proposes to use various machine learning algorithms, both supervised and unsupervised, to categorise webpages based on parsing their features such as content (which played the most important role in this thesis), URL information, URL links and screenshots of webpages. The features were then converted to a format understandable by machine learning algorithms, which analysed these features to make one important decision: whether a given webpage is malicious or not, using commonly available software and hardware. Prototype tools were developed to compare and analyse the efficiency of these machine learning techniques. The supervised techniques include Support Vector Machine, Naïve Bayes, Random Forest, Linear Discriminant Analysis, Quadratic Discriminant Analysis and Decision Tree. The unsupervised techniques are Self-Organising Map, Affinity Propagation and K-Means. Self-Organising Maps were used in place of Neural Networks, and the research suggests that Deep Learning would be well suited to future work on this problem. The supervised algorithms performed better than the unsupervised algorithms, and the best of all these techniques is SVM, which achieves 98% accuracy. The result was validated by a Chrome extension which used the classifier in real time. The unsupervised algorithms came close to the supervised ones, which is surprising given that they do not have access to class information beforehand.
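
    As a minimal sketch of the best performing supervised setup described above, the example below classifies webpage text content with a linear SVM. The tiny training set is invented, and only the content feature is used; the thesis also draws on URL information, links and screenshots, and its exact feature pipeline is not reproduced here.

```python
# Sketch: content-based malicious/benign webpage classification with an SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

pages = [
    "win a free prize click here to claim your reward now",
    "enter your bank password to verify your account immediately",
    "quarterly report on departmental spending and budget forecasts",
    "university library opening hours and study room bookings",
]
labels = [1, 1, 0, 0]  # 1 = malicious, 0 = benign (illustrative)

classifier = make_pipeline(TfidfVectorizer(), LinearSVC())
classifier.fit(pages, labels)

print(classifier.predict(["click here to claim your free reward"]))
```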

    Behaviour Profiling using Wearable Sensors for Pervasive Healthcare

    In recent years, sensor technology has advanced in terms of hardware sophistication and miniaturisation. This has led to the incorporation of unobtrusive, low-power sensors into networks centred on human participants, called Body Sensor Networks. Amongst the most important applications of these networks is their use in healthcare and healthy living. The technology has the potential to decrease the burden on healthcare systems by providing care at home, enabling early detection of symptoms, monitoring recovery remotely, and avoiding serious chronic illnesses by promoting healthy living through objective feedback. In this thesis, machine learning and data mining techniques are developed to estimate medically relevant parameters from a participant's activity and behaviour, derived from simple, body-worn sensors. The first abstraction from raw sensor data is the recognition and analysis of activity. Machine learning analysis is applied to a study of activity profiling to detect impaired limb and torso mobility. One of the contributions of this thesis to activity recognition research is the application of machine learning to the analysis of 'transitional activities': the transient activity that occurs as people change from one activity to another. A framework is proposed for the detection and analysis of transitional activities. To demonstrate the utility of transition analysis, we apply the algorithms to a study of participants undergoing and recovering from surgery. We demonstrate that it is possible to see meaningful changes in the transitional activity as the participants recover. Assuming long-term monitoring, we expect a large historical database of activity to accumulate quickly. We develop algorithms to mine temporal associations to activity patterns, giving an outline of the user's routine. Methods for visual and quantitative analysis of routine using this summary data structure are proposed and validated. The activity and routine mining methodologies developed for specialised sensors are adapted to a smartphone application, enabling large-scale use. Validation of the algorithms is performed using datasets collected in laboratory settings and free living scenarios. Finally, future research directions and potential improvements to the techniques developed in this thesis are outlined.
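
    The thesis's routine-mining data structure is not described in the abstract, so the following is only a minimal sketch of the general idea of mining temporal associations from recognised activities: build an hour-of-day profile that exposes when each activity typically occurs. The activity log and timestamps are invented for illustration.

```python
# Sketch: summarise a recognised-activity log into an hour-of-day routine profile.
from collections import Counter, defaultdict
from datetime import datetime

activity_log = [
    ("2013-05-01 08:05", "walking"),
    ("2013-05-01 09:10", "sitting"),
    ("2013-05-01 18:02", "walking"),
    ("2013-05-02 08:12", "walking"),
]

profile = defaultdict(Counter)
for timestamp, activity in activity_log:
    hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M").hour
    profile[activity][hour] += 1

for activity, hours in profile.items():
    print(activity, "most often at hour", hours.most_common(1)[0][0])
```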

    Use of Artificial Intelligence in Healthcare Delivery

    In recent years, there has been an amplified focus on the use of artificial intelligence (AI) in various domains to resolve complex issues. Likewise, the adoption of AI in healthcare is growing while radically changing the face of healthcare delivery. AI is being employed in a myriad of settings including hospitals, clinical laboratories, and research facilities. AI approaches employing machines to sense and comprehend data like humans have opened up previously unavailable or unrecognised opportunities for clinical practitioners and health service organisations. Some examples include utilising AI approaches to analyse unstructured data such as photos, videos and physician notes to enable clinical decision making; the use of intelligent interfaces to enhance patient engagement and compliance with treatment; and predictive modelling to manage patient flow and hospital capacity/resource allocation. Yet there is an incomplete understanding of AI, and even confusion as to what it actually is. It is also not completely clear what the implications of using AI are, in general and for clinicians in particular. This chapter aims to cover these topics and also introduce the reader to the concept of AI, the theories behind AI programming and the various applications of AI in the medical domain.

    Text mining and natural language processing for the early stages of space mission design

    Final thesis submitted December 2021; degree awarded in 2022. A considerable amount of data related to space mission design has been accumulated since artificial satellites started to venture into space in the 1950s. This data has today become an overwhelming volume of information, triggering a significant knowledge reuse bottleneck at the early stages of space mission design. Meanwhile, virtual assistants, text mining and Natural Language Processing techniques have become pervasive in our daily lives. The work presented in this thesis is one of the first attempts to bridge the gap between the worlds of space systems engineering and text mining. Several novel models are thus developed and implemented here, targeting the structuring of accumulated data through an ontology, but also tasks commonly performed by systems engineers such as requirement management and heritage analysis. A first collection of documents related to space systems is gathered for the training of these methods. Ultimately, this work aims to pave the way towards the development of a Design Engineering Assistant (DEA) for the early stages of space mission design. It is also hoped that this work will actively contribute to the integration of text mining and Natural Language Processing methods in the field of space mission design, enhancing current design processes.
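
    The abstract does not detail how heritage analysis or requirement management are modelled, so the sketch below only illustrates one plausible reading of such a task: retrieving the most similar past requirement for a new requirement using TF-IDF cosine similarity. The requirement texts are invented for illustration and do not come from the thesis.

```python
# Sketch: retrieve the closest "heritage" requirement for a new requirement.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

past_requirements = [
    "The spacecraft shall provide 500 W of power to the payload in sunlight",
    "The downlink shall support a data rate of at least 2 Mbps in X-band",
    "The thermal subsystem shall keep the battery between 0 and 40 degrees C",
]
new_requirement = "The power subsystem shall deliver 450 W to the payload"

vectorizer = TfidfVectorizer()
corpus_vectors = vectorizer.fit_transform(past_requirements)
query_vector = vectorizer.transform([new_requirement])

scores = cosine_similarity(query_vector, corpus_vectors).ravel()
best = scores.argmax()
print("Closest heritage requirement:", past_requirements[best], scores[best])
```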

    A machine learning-based investigation of cloud service attacks

    In this thesis, the security challenges of cloud computing are investigated at the Infrastructure as a Service (IaaS) layer, as security is one of the major concerns related to Cloud services. As IaaS security covers a range of areas, the research has been further narrowed down to focus on Network Layer Security. A review of existing research revealed that several types of attacks and threats can affect cloud security; therefore, there is a need for intrusion defence implementations to protect cloud services. Intrusion Detection (ID) is one of the most effective solutions for reacting to cloud network attacks. [Continues.]

    Detection of Software Vulnerability Communication in Expert Social Media Channels: A Data-driven Approach

    Conceptually, a vulnerability is "a flaw or weakness in a system's design, implementation, or operation and management that could be exploited to violate the system's security policy". Some of these flaws can go undetected and be exploited for long periods of time after software release. Although some software providers are making efforts to avoid this situation, inevitably, users are still exposed to vulnerabilities that allow criminal hackers to take advantage. These vulnerabilities are constantly discussed in specialised forums on social media. Therefore, from a cyber security standpoint, the information found in these places can be used for countermeasure actions against malicious exploitation of software. However, manual inspection of the vast quantity of shared content in social media is impractical. For this reason, in this thesis, we analyse the real applicability of supervised classification models to automatically detect software vulnerability communication in expert social media channels. We cover the following three principal aspects. Firstly, we investigate the applicability of classification models on a range of 5 different datasets collected from 3 Internet domains: Dark Web, Deep Web and Surface Web. Since supervised models require labelled data, we have provided a systematic labelling process using multiple annotators to guarantee accurate labels for carrying out experiments. Using these datasets, we have investigated the classification models with different combinations of learning-based algorithms and traditional feature representations. Also, by oversampling the positive instances, we have achieved an increase of 5% in Positive Recall (on average) in these models. On top of that, we have applied Feature Reduction, Feature Extraction and Feature Selection techniques, which provided a reduction in the dimensionality of these models without damaging the accuracy, thus providing computationally efficient models. Furthermore, in addition to traditional feature representations, we have investigated the effect of robust language models, such as Word Embedding (WEMB) and Sentence Embedding (SEMB), on the accuracy of classification models. Regarding WEMB, our experiment has shown that this model trained with a small security-vocabulary dataset provides comparable results with WEMB trained on a very large general-vocabulary dataset. Regarding the SEMB model, our experiment has shown that its use outperforms the WEMB model in detecting vulnerability communication, recording 8% of Avg. Class Accuracy and 74% of Positive Recall. In addition, we investigate two Deep Learning algorithms as classifiers, text CNN (Convolutional Neural Network) and RNN (Recurrent Neural Network)-based algorithms, which have improved our model, resulting in the best overall performance for our task.
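
    To make the oversampling step concrete, the sketch below duplicates positive (vulnerability-related) posts until the classes are balanced, trains a classifier, and reports recall on the positive class. The posts, labels and choice of a linear SVM are illustrative; the thesis evaluates several feature representations and learners, and its datasets are not reproduced here.

```python
# Sketch: oversample the positive class, train a classifier, measure positive recall.
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import recall_score

posts = [
    "new remote code execution exploit for the router firmware released",
    "proof of concept for the SQL injection in the forum software",
    "selling cheap game accounts, message me",
    "anyone recommend a good VPN provider?",
    "looking for a graphics card under 200 dollars",
    "how do I change my profile picture?",
]
labels = [1, 1, 0, 0, 0, 0]  # 1 = vulnerability communication (illustrative)

# Oversample the minority (positive) class by sampling with replacement.
positives = [(p, l) for p, l in zip(posts, labels) if l == 1]
negatives = [(p, l) for p, l in zip(posts, labels) if l == 0]
balanced = negatives + [random.choice(positives) for _ in range(len(negatives))]
train_texts, train_labels = zip(*balanced)

vectorizer = TfidfVectorizer()
clf = LinearSVC().fit(vectorizer.fit_transform(train_texts), train_labels)

predictions = clf.predict(vectorizer.transform(posts))
print("Positive recall:", recall_score(labels, predictions, pos_label=1))
```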