11 research outputs found

    Survey on Classification Algorithms for Data Mining: Comparison and Evaluation

    Data mining is growing fast in popularity. It is a technology involving methods at the intersection of artificial intelligence, machine learning, statistics and database systems, and the main goal of the data mining process is to extract information from large data sets into a form that can be understood for further use. Some data mining algorithms are used to solve classification problems in databases. In this paper, a comparison among three classification algorithms is studied: the k-nearest neighbour classifier, decision trees and Bayesian networks. The paper demonstrates the strength and accuracy of each algorithm for classification in terms of performance efficiency and the time complexity required. For model validation purposes, a twenty-four-month data analysis is conducted on a mock-up basis. Keywords: Decision tree, Bayesian network, k-nearest neighbour classifier
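The three algorithms compared in the paper are all standard supervised classifiers. As an illustration of the simplest of them, here is a minimal plain-Python k-nearest-neighbour classifier; the toy dataset, the point coordinates and the value of k are invented for the example and are not the paper's data:

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of (features, label) pairs; distance is Euclidean."""
    dists = sorted((math.dist(x, query), label) for x, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy two-class dataset: points near (0, 0) are "a", points near (5, 5) are "b".
train = [((0, 0), "a"), ((1, 0), "a"), ((0, 1), "a"),
         ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
print(knn_predict(train, (0.5, 0.5)))  # → a
print(knn_predict(train, (5.5, 5.5)))  # → b
```

Decision trees and Bayesian networks trade this lazy, instance-based scheme for an explicit trained model, which is exactly the efficiency/accuracy trade-off the paper compares.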

    Monitoring of Complex Processes with Bayesian Networks

    This chapter is about multivariate process monitoring (detection and diagnosis) with Bayesian networks, which allow dedicated monitoring methods such as multivariate control charts or discriminant analysis to be unified in a single tool (a Bayesian network). After the context introduction, we develop in section 2 the principles of process monitoring, namely fault detection and fault diagnosis, and present classical statistical techniques for achieving these tasks. In section 3, after a presentation of Bayesian networks (with discrete and Gaussian nodes), we propose the modeling of the two tasks (detection and diagnosis) in the Bayesian network framework, unifying the two steps of process monitoring in a sole tool, the Bayesian network. An application is given in section 4 in order to demonstrate the effectiveness of the proposed approach. This application is a benchmark problem in process monitoring: the Tennessee Eastman Process. The efficiency of the network is evaluated for detection and for diagnosis. Finally, we give conclusions on the proposed approach and outlooks concerning the use of Bayesian networks for process monitoring.
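The diagnosis side of such a network can be illustrated on a toy discrete model. The sketch below is not the chapter's model (which targets the Tennessee Eastman Process); it assumes a hypothetical network with one binary fault node and two binary sensor children, and computes the fault posterior by direct enumeration:

```python
# Hypothetical two-sensor diagnosis network: Fault -> {s1, s2}.
p_fault = 0.01  # prior probability of a fault (invented for the example)
p_high_given_fault = {  # P(sensor reads "high" | fault state)
    "s1": {True: 0.9, False: 0.05},
    "s2": {True: 0.8, False: 0.10},
}

def posterior_fault(evidence):
    """P(Fault = True | sensor evidence), by enumeration over the fault node.
    `evidence` maps sensor name -> bool ("high" reading observed or not)."""
    def joint(fault):
        p = p_fault if fault else 1 - p_fault
        for sensor, high in evidence.items():
            p_high = p_high_given_fault[sensor][fault]
            p *= p_high if high else 1 - p_high
        return p
    num = joint(True)
    return num / (num + joint(False))

# Two alarming sensors raise the fault probability from 1% to about 59%.
print(round(posterior_fault({"s1": True, "s2": True}), 3))  # → 0.593
```

Real monitoring networks add many more nodes (one per fault class and per monitored variable, possibly Gaussian), but the inference idea is the same.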

    The Impact of Software Team Project Measurements on Students' Performance in Software Engineering Education

    It is essential for software engineering instructors to monitor students' performance in their course projects. Detecting the key measures of a software engineering project helps to assess students' performance better, resolve the difficulties of low-expectation teams, and consequently improve the overall learning outcomes. Several studies have attempted to present the important measures of a software project, but they only captured the early phases of the whole project period. This paper introduces a hybrid approach of classification and feature selection techniques, which aims to comprehensively cover all phases of software development by investigating all product and process measures of a software project. Experiments were conducted using five classifiers and two feature selection techniques. The results show the significant process and product measures for software engineering team projects, which primarily improve the students' performance assessment. The performance prediction of our proposed assessment model outperforms that of the previous models. Keywords: Assessment, Classification, Feature selection, Software engineering education, Software team DOI: 10.7176/JEP/11-31-02 Publication date: November 30th 2020
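The feature-selection half of such a hybrid pipeline is often a simple filter: score each candidate measure against the outcome and keep the strongest. The sketch below assumes a correlation-based filter with an invented toy table of per-team measures; it is a generic illustration, not the paper's actual technique or data:

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def select_top_k(rows, labels, k):
    """Filter-style feature selection: rank feature columns by |correlation|
    with the class label and return the indices of the k strongest."""
    n_feats = len(rows[0])
    scores = [(abs(pearson([r[j] for r in rows], labels)), j)
              for j in range(n_feats)]
    return [j for _, j in sorted(scores, reverse=True)[:k]]

# Invented measures: column 0 tracks the outcome, column 1 is noise.
rows = [(1, 7), (2, 3), (3, 9), (4, 1)]
labels = [0, 0, 1, 1]
print(select_top_k(rows, labels, 1))  # → [0]
```

The surviving columns would then be fed to the classifiers, which is the second stage of the hybrid approach.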

    Defining Generic Attributes for IDS Classification

    The detection accuracy of an Intrusion Detection System (IDS) depends on classifying network traffic based on data features. Using all features for classification consumes more computation time and computer resources. Some of these features may be redundant and irrelevant; therefore, they affect the detection of traffic anomalies and the overall performance of the IDS. The literature has proposed different algorithms and techniques to define the most relevant sets of features of KDD Cup 1999 that can achieve high detection accuracy and maintain the same performance as the total data features. However, these algorithms and techniques did not produce optimal solutions, even when they utilized the same datasets. In this paper, a new approach is proposed to analyze the research conducted on KDD Cup 1999 for feature selection, in order to determine whether effective generic features of the common KDD Cup 1999 dataset can be defined for constructing an efficient classification model. The approach does not rely on algorithms, which shortens the computational cost and reduces the computer resources required. The essence of the approach is to select the most frequent features of each class and of all classes across all studies, and then apply a threshold to define the most significant generic features. The results revealed two sets of features containing 7 and 8 features. The classification accuracy using eight features is almost the same as using all dataset features.
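The frequency-plus-threshold idea can be sketched directly: count how often each feature appears across the surveyed studies and keep those above a cut-off. The study feature sets and the threshold value below are invented placeholders, not the paper's surveyed results:

```python
from collections import Counter

# Hypothetical feature subsets reported by three studies on KDD Cup 1999.
studies = [
    {"duration", "src_bytes", "dst_bytes", "count"},
    {"src_bytes", "dst_bytes", "serror_rate", "count"},
    {"src_bytes", "count", "flag"},
]

def generic_features(studies, threshold):
    """Keep features selected in at least `threshold` fraction of studies."""
    freq = Counter(f for study in studies for f in study)
    cutoff = threshold * len(studies)
    return sorted(f for f, c in freq.items() if c >= cutoff)

# With a 2/3 threshold, only the features named by >= 2 of 3 studies survive.
print(generic_features(studies, 2 / 3))  # → ['count', 'dst_bytes', 'src_bytes']
```

Because this is pure counting, it costs far less than re-running a wrapper or embedded selection algorithm over the full dataset, which is the efficiency argument the paper makes.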

    Data visualisation in digital forensics

    As digital crimes have risen, so has the need for digital forensics. Numerous state-of-the-art tools have been developed to assist digital investigators in conducting proper investigations into digital crimes. However, digital investigations are becoming increasingly complex and time consuming because of the amount of data involved, and digital investigators can find themselves unable to conduct them in an appropriately efficient and effective manner. This situation has prompted the need for new tools capable of handling such large, complex investigations. Data mining is one such potential tool. It is still relatively unexplored from a digital forensics perspective, but the purpose of data mining is to discover new knowledge from data whose dimensionality, complexity or volume is prohibitively large for manual analysis. This study assesses the self-organising map (SOM), a neural network model and data mining technique that could potentially offer tremendous benefits to digital forensics. The focus of this study is to demonstrate how the SOM can help digital investigators to make better decisions and conduct the forensic analysis process more efficiently and effectively during a digital investigation. The SOM's visualisation capabilities can not only be used to reveal interesting patterns, but can also serve as a platform for further, interactive analysis. Dissertation (MSc (Computer Science))--University of Pretoria, 2007.
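To make the technique concrete, here is a minimal SOM training loop in plain Python: each sample pulls its best-matching unit (and that unit's grid neighbours) towards it, with a decaying learning rate and shrinking neighbourhood. The grid size, decay schedules and toy data are all invented; this is a sketch of the algorithm, not the tool examined in the study:

```python
import math
import random

def train_som(data, grid_w=4, grid_h=4, epochs=200, lr0=0.5, radius0=2.0):
    """Train a small self-organising map on 2-D points (plain-Python sketch)."""
    random.seed(0)
    nodes = {(i, j): [random.random(), random.random()]
             for i in range(grid_w) for j in range(grid_h)}
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)          # decaying learning rate
        radius = radius0 * (1 - frac)  # shrinking neighbourhood
        x = random.choice(data)
        # Best-matching unit: the node whose weights are closest to the sample.
        winner = min(nodes, key=lambda n: math.dist(nodes[n], x))
        for n, w in nodes.items():
            d = math.dist(n, winner)   # distance on the grid, not in data space
            if d <= radius:
                h = math.exp(-d * d / (2 * max(radius, 1e-9) ** 2))
                for k in range(2):
                    w[k] += lr * h * (x[k] - w[k])
    return nodes

# Two well-separated toy clusters end up mapped to different grid regions,
# which is the visual separation an investigator would look for.
data = [(0.1, 0.1), (0.2, 0.1), (0.9, 0.9), (0.8, 0.9)]
som = train_som(data)
bmu = lambda x: min(som, key=lambda n: math.dist(som[n], x))
print(bmu((0.1, 0.1)), bmu((0.9, 0.9)))
```

In a forensic setting the inputs would be high-dimensional file or event features rather than 2-D points, and the trained grid would be rendered as a map for interactive exploration.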

    A feasibility study on the use of agent-based image recognition on a desktop computer for the purpose of quality control in a production environment

    Thesis (M. Tech.)--Central University of Technology, Free State, 2006. A multi-threaded, multi-agent image recognition software application called RecMaster has been developed specifically for the purpose of quality control in a production environment. This entails using the system as a monitor to identify invalid objects moving on a conveyor belt and to pass the relevant information on to an attached device, such as a robotic arm, which will remove the invalid object. The main purpose of developing this system was to prove that a desktop computer could run an image recognition system efficiently, without the need for high-end, high-cost, specialised computer hardware. The programme operates by assigning each agent a task in the recognition process and then waiting for resources to become available. Tasks related to edge detection, colour inversion, image binarisation and perimeter determination were assigned to individual agents. Each agent is loaded onto its own processing thread, with some of the agents delegating their subtasks to other processing threads. This enables the application to utilise the available system resources more efficiently. The application is very limited in its scope, as it requires a uniform image background as well as little to no variance in camera zoom levels and object-to-lens distance. This study focused solely on the development of the application software, and not on the setting up of the actual imaging hardware. The imaging device on which the system was tested was a webcam capable of a 640 x 480 resolution. As such, all image capture and processing was done on images with a horizontal resolution of 640 pixels and a vertical resolution of 480 pixels, so as not to distort image quality. The application locates objects on an image feed - which can be in the format of a still image, a video file or a camera feed - and compares these objects to a model of the object that was created previously.
The coordinates of the object are calculated and translated into coordinates on the conveyor system. These coordinates are then passed on to an external recipient, such as a robotic arm, via a serial link. The system has been applied to the model of a DVD, and tested against a variety of similar and dissimilar objects to determine its accuracy. The tests were run on both an AMD- and an Intel-based desktop computer system, with the results indicating that both systems are capable of efficiently running the application. On average, the AMD-based system tended to be 81% faster at matching objects in still images, and 100% faster at matching objects in moving images. The system made matches within an average time frame of 250 ms, making the process fast enough to be used on an actual conveyor system. On still images, the results showed an 87% success rate for the AMD-based system, and 73% for the Intel-based system. For moving images, however, both systems showed a 100% success rate.
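The one-agent-per-task, one-thread-per-agent design can be sketched with a thread pool. The stub "agents" below only mirror the task names from the text (colour inversion, binarisation, perimeter determination) on a tiny greyscale image; they are invented placeholders, not RecMaster's code:

```python
from concurrent.futures import ThreadPoolExecutor

# Stub agents: each takes a 2-D list of greyscale pixels (0-255) and
# returns a derived result.
def invert(img):
    return [[255 - p for p in row] for row in img]

def binarise(img, threshold=128):
    return [[1 if p >= threshold else 0 for p in row] for row in img]

def perimeter(img, threshold=128):
    """Count foreground pixels with at least one background 4-neighbour."""
    binary = binarise(img, threshold)
    h, w = len(binary), len(binary[0])
    def on_edge(y, x):
        if not binary[y][x]:
            return False
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w) or not binary[ny][nx]:
                return True
        return False
    return sum(on_edge(y, x) for y in range(h) for x in range(w))

img = [[0,   0,   0,   0],
       [0, 200, 200,   0],
       [0, 200, 200,   0],
       [0,   0,   0,   0]]

# Run the agents concurrently, one task per worker thread, as in the text.
with ThreadPoolExecutor() as pool:
    inv, binary, perim = pool.map(lambda agent: agent(img), (invert, binarise, perimeter))
print(perim)  # → 4 (all four foreground pixels touch background)
```

A production version would feed each agent frames from the camera and forward the matched object's coordinates over the serial link, as the text describes.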

    Computerized cancer malignancy grading of fine needle aspirates

    According to the World Health Organization, breast cancer is a leading cause of death among middle-aged women. Precise diagnosis and correct treatment significantly reduce the high number of deaths caused by breast cancer. Success in treatment relies strictly on the diagnosis - specifically, the accuracy of the diagnosis and the stage at which the cancer was diagnosed. Precise and early diagnosis has a major impact on the survival rate, which indicates how many patients will live after treatment. For many years, researchers in the medical and computer science fields have been working together to find approaches for precise diagnosis. For this thesis, precise diagnosis means finding a cancer at as early a stage as possible by developing new computer-aided diagnostic tools. These tools differ depending on the type of cancer and the type of examination used for diagnosis. This work concentrates on cytological images of breast cancer that are produced during fine needle aspiration biopsy examination. This kind of examination allows pathologists to estimate the malignancy of the cancer with very high accuracy. Malignancy estimation is very important when assessing a patient's survival rate and the type of treatment. To achieve precise malignancy estimation, a classification framework is presented. This framework is able to classify breast cancer malignancy into two malignancy classes and is based on features calculated according to the Bloom-Richardson grading scheme. This scheme is commonly used by pathologists when grading breast cancer tissue. In the Bloom-Richardson scheme, two types of features are assessed depending on the magnification. Low-magnification images are used for examining the dispersion of the cells in the image, while high-magnification images are used for precise analysis of the cells' nuclear features.
In this thesis, different types of segmentation algorithms were compared to find the algorithm that allows for relatively fast and accurate nuclear segmentation. Based on that segmentation, a set of 34 features was extracted for further malignancy classification. For classification purposes, 6 different classifiers were compared. From all of the tests, a set of the best-performing features was chosen. The presented system is able to classify images of fine needle aspiration biopsy slides with high accuracy.
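The grading scheme underlying the features is simple to state in code: three factors (tubule formation, nuclear pleomorphism, mitotic count) are each scored 1-3, and the sum maps to an overall grade with the standard cut-offs 3-5 for grade I, 6-7 for grade II and 8-9 for grade III. A small sketch of the scoring rule, independent of the thesis's image-derived features:

```python
def bloom_richardson_grade(tubule, pleomorphism, mitoses):
    """Map three Bloom-Richardson factor scores (each 1-3) to a grade.
    Standard cut-offs: sum 3-5 -> I, 6-7 -> II, 8-9 -> III."""
    for score in (tubule, pleomorphism, mitoses):
        if score not in (1, 2, 3):
            raise ValueError("each factor scores 1, 2 or 3")
    total = tubule + pleomorphism + mitoses
    return "I" if total <= 5 else "II" if total <= 7 else "III"

print(bloom_richardson_grade(1, 2, 1))  # → I
print(bloom_richardson_grade(3, 3, 3))  # → III
```

The thesis's contribution is automating the factor assessment from segmented nuclei; the final mapping from scores to a malignancy class follows this rule.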

    Cyber-physical solution for building management supported by smart devices and ambient intelligence models

    The use of internet-connected devices and ambient intelligence models in building management systems has been gaining notoriety in recent years, and their application in buildings is becoming more and more common. These concepts, the internet of things and ambient intelligence, provide a means to automate and optimise building management operations, leading to greater efficiency in the use of resources, reduced costs and increased user comfort. However, many existing solutions lack interoperability and intelligent models that consider the unique needs and requirements of individual buildings and the dynamic preferences and needs of users. As its main objective, this dissertation proposes the design, implementation, testing and validation of a robust solution that integrates ambient intelligence models and controlled data access mechanisms. The proposed solution includes the use of sensors and devices connected to the internet for real-time data collection and analysis, which are later used to create forecasting models for the behaviour of the building and its users. For the identification of patterns and contexts, machine learning algorithms and data analysis techniques were designed.
Data access in the proposed solution is handled by a secure and efficient access mechanism, following the guidelines of the national and European General Data Protection Regulation (GDPR). To support the use of the proposed solution, a graphical interface was designed and implemented to allow building managers and users to monitor and control operations in real time, giving them the ability to respond quickly to current conditions and make informed decisions. This web-based graphical interface also allows consulting historical data and interacting with the support models that were developed. The proposed solution was evaluated through case studies executed in a realistic environment. The results of these studies were used to evaluate the effectiveness of the proposed solution in improving building performance. The studies conclude that the use of smart devices and ambient intelligence models in building management is a promising approach that can culminate in significant improvements in the performance and operation of smart buildings. This dissertation contributes to the domain of intelligent buildings by providing a comprehensive solution that integrates internet-connected devices and ambient intelligence models to improve building performance and user comfort.

    Novel approaches for hierarchical classification with case studies in protein function prediction

    A very large amount of research in the data mining, machine learning, statistical pattern recognition and related research communities has focused on flat classification problems. However, many problems in the real world, such as hierarchical protein function prediction, have their classes naturally organised into hierarchies. The task of hierarchical classification, however, needs to be better defined, as researchers in one application domain are often unaware of similar efforts developed in other research areas. The first contribution of this thesis is to survey the task of hierarchical classification across different application domains and present a unifying framework for the task. After clearly defining the problem, we explore novel approaches to the task. Based on the understanding gained by surveying the task of hierarchical classification, there are three major approaches to dealing with hierarchical classification problems. The first approach is to use one of the many existing flat classification algorithms to predict only the leaf classes in the hierarchy. Note that, in the training phase, this approach completely ignores the hierarchical class relationships, i.e. the parent-child and sibling class relationships, but in the testing phase the ancestral classes of an instance can be inferred from its predicted leaf classes. The second approach is to build a set of local models by training one flat classification algorithm for each local view of the hierarchy. The two main variations of this approach are: (a) training a local flat multi-class classifier at each non-leaf class node, where each classifier discriminates among the child classes of its associated class; or (b) training a local flat binary classifier at each node of the class hierarchy, where each classifier predicts whether or not a new instance has the classifier's associated class.
In both these variations, in the testing phase a procedure is used to combine the predictions of the set of local classifiers in a coherent way, avoiding inconsistent predictions. The third approach is to use a global-model hierarchical classification algorithm, which builds one single classification model by taking into account all the hierarchical class relationships in the training phase. In the context of this categorization of hierarchical classification approaches, the other contributions of this thesis are as follows. The second contribution of this thesis is a novel algorithm based on the local-classifier-per-parent-node approach: the selective representation approach, which automatically selects the best protein representation to use at each non-leaf class node. The third contribution is a global-model hierarchical classification extension of the well-known naive Bayes algorithm. Given the good predictive performance of the global-model hierarchical-classification naive Bayes algorithm, we relax naive Bayes' assumption that attributes are independent of each other given the class by using the concept of k dependencies. Hence, we extend the flat-classification k-Dependence Bayesian network classifier to the task of hierarchical classification, which is the fourth contribution of this thesis. Both the proposed global-model hierarchical classification naive Bayes classifier and the proposed global-model hierarchical k-Dependence Bayesian network classifier achieved predictive accuracies that were, overall, significantly higher than the predictive accuracies obtained by their corresponding local hierarchical classification versions, across a number of datasets for the task of hierarchical protein function prediction.
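The local-classifier-per-parent-node variation can be sketched as a top-down walk: each non-leaf class owns a classifier that picks one of its children, so ancestor and descendant predictions are consistent by construction. The two-level hierarchy, the feature names and the stand-in rule-based classifiers below are invented for illustration only:

```python
# Invented two-level class hierarchy (parent -> list of child classes).
hierarchy = {
    "root": ["enzyme", "transporter"],
    "enzyme": ["hydrolase", "ligase"],
}

# Hypothetical per-parent classifiers: each maps an instance to one child.
# In practice these would be trained flat multi-class classifiers.
classifiers = {
    "root": lambda x: "enzyme" if x["catalytic"] else "transporter",
    "enzyme": lambda x: "hydrolase" if x["uses_water"] else "ligase",
}

def predict_path(x, node="root"):
    """Return the predicted class path from the root down to a leaf."""
    path = []
    while node in hierarchy:                 # non-leaf: ask its local classifier
        children = hierarchy[node]
        node = classifiers[node](x)
        if node not in children:             # guard against inconsistent output
            raise ValueError(f"{node!r} is not a child of the current class")
        path.append(node)
    return path

print(predict_path({"catalytic": True, "uses_water": False}))  # → ['enzyme', 'ligase']
```

A global-model algorithm, by contrast, would replace this per-parent ensemble with one model trained over the whole hierarchy at once, which is the direction the thesis's naive Bayes and k-Dependence extensions take.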