34 research outputs found

    Data Mining and Machine Learning for Software Engineering

    Get PDF
    Software engineering is one of the most utilizable research areas for data mining. Developers have attempted to improve software quality by mining and analyzing software data. In any phase of software development life cycle (SDLC), while huge amount of data is produced, some design, security, or software problems may occur. In the early phases of software development, analyzing software data helps to handle these problems and lead to more accurate and timely delivery of software projects. Various data mining and machine learning studies have been conducted to deal with software engineering tasks such as defect prediction, effort estimation, etc. This study shows the open issues and presents related solutions and recommendations in software engineering, applying data mining and machine learning techniques

    AN ENHANCEMENT ON TARGETED PHISHING ATTACKS IN THE STATE OF QATAR

    Get PDF
    The latest report by Kaspersky on Spam and Phishing, listed Qatar as one of the top 10 countries by percentage of email phishing and targeted phishing attacks. Since the Qatari economy has grown exponentially and become increasingly global in nature, email phishing and targeted phishing attacks have the capacity to be devastating to the Qatari economy, yet there are no adequate measures put in place such as awareness training programmes to minimise these threats to the state of Qatar. Therefore, this research aims to explore targeted attacks in specific organisations in the state of Qatar by presenting a new technique to prevent targeted attacks. This novel enterprise-wide email phishing detection system has been used by organisations and individuals not only in the state of Qatar but also in organisations in the UK. This detection system is based on domain names by which attackers carefully register domain names which victims trust. The results show that this detection system has proven its ability to reduce email phishing attacks. Moreover, it aims to develop email phishing awareness training techniques specifically designed for the state of Qatar to complement the presented technique in order to increase email phishing awareness, focused on targeted attacks and the content, and reduce the impact of phishing email attacks. This research was carried out by developing an interactive email phishing awareness training website that has been tested by organisations in the state of Qatar. The results of this training programme proved to get effective results by training users on how to spot email phishing and targeted attacks

    Application of ant based routing and intelligent control to telecommunications network management

    Get PDF
    This thesis investigates the use of novel Artificial Intelligence techniques to improve the control of telecommunications networks. The approaches include the use of Ant-Based Routing and software Agents to encapsulate learning mechanisms to improve the performance of the Ant-System and a highly modular approach to network-node configuration and management into which this routing system can be incorporated. The management system uses intelligent Agents distributed across the nodes of the network to automate the process of network configuration. This is important in the context of increasingly complex network management, which will be accentuated with the introduction of IPv6 and QoS-aware hardware. The proposed novel solution allows an Agent, with a Neural Network based Q-Learning capability, to adapt the response speed of the Ant-System - increasing it to counteract congestion, but reducing it to improve stability otherwise. It has the ability to adapt its strategy and learn new ones for different network topologies. The solution has been shown to improve the performance of the Ant-System, as well as outperform a simple non-learning strategy which was not able to adapt to different networks. This approach has a wide region of applicability to such areas as road-traffic management, and more generally, positioning of learning techniques into complex domains. Both Agent architectures are Subsumption style, blending short-term responses with longer term goal-driven behaviour. It is predicted that this will be an important approach for the application of AI, as it allows modular design of systems in a similar fashion to the frameworks developed for interoperability of telecommunications systems

    Serial analysis of genes expressed in normal human glomerular mesangial cells

    Get PDF
    Advances in sequencing based genomics like the Human Genome Mapping Project (HGMP) have meant that the majority of the estimated human genes have been at least partially sequenced. The variation in expression of a set of essentially identical genes will provide information on the molecular basis of phenotype. Serial analysis of gene expression (SAGE) is based on the ability to assign an individual transcript to a ten base pair 'tag', and the technology facilitating rapid sampling of such tags. Glomerular mesangial cells (MC) are considered to play a major role in the development of renal disease and in vitro culturing of MC's has become a model system with which to study the molecular mechanisms of glomerular pathology. To this end, a SAGE project was undertaken to identify genes expressed in normal human mesangial cells (NHMC). Primary normal human mesangial cells were cultured for periods up to 96 hrs. A total of 46,219 tags were sampled (14,953 unique tags). Tags were mapped to 20,382 sequences. Of these 79% of tags mapped to characterised cDNAs, 16% tags mapped to ESTs. 5% of tags failed to match any database entry. The most abundant tags mapped to ribosomal genes or genes associated with the cytoskeleton. Represented in the top ten tags were the matricellular genes transgelin (1.2%), SPARC (1%) type IV collagen (0.5%) and fibronectin (0.53%), which support the notion that the MC is a producer and re-modeller of the glomerular extracellular matrix (ECM). The contractile nature of MC was apparent with the high abundance of contractile proteins like myosins and tropomyosins. Also apparent in the transcriptome were lineage specific isoforms of several genes, supporting the myoblastoid linage of MC. Comparing the transcriptomes of the MC to other libraries revealed a high correlation between cells in the same lineage as MC, such as astrocytes, smooth muscle cells and fibroblasts when compared to libraries sampled from heart, liver and various other unrelated cell lines. Understanding gene expression in the mesangial cell facilitates a greater understanding of its role in renal pathology

    Artificial Intelligence Applications to Critical Transportation Issues

    Full text link

    Combining SOA and BPM Technologies for Cross-System Process Automation

    Get PDF
    This paper summarizes the results of an industry case study that introduced a cross-system business process automation solution based on a combination of SOA and BPM standard technologies (i.e., BPMN, BPEL, WSDL). Besides discussing major weaknesses of the existing, custom-built, solution and comparing them against experiences with the developed prototype, the paper presents a course of action for transforming the current solution into the proposed solution. This includes a general approach, consisting of four distinct steps, as well as specific action items that are to be performed for every step. The discussion also covers language and tool support and challenges arising from the transformation

    Machine learning for network based intrusion detection: an investigation into discrepancies in findings with the KDD cup '99 data set and multi-objective evolution of neural network classifier ensembles from imbalanced data.

    Get PDF
    For the last decade it has become commonplace to evaluate machine learning techniques for network based intrusion detection on the KDD Cup '99 data set. This data set has served well to demonstrate that machine learning can be useful in intrusion detection. However, it has undergone some criticism in the literature, and it is out of date. Therefore, some researchers question the validity of the findings reported based on this data set. Furthermore, as identified in this thesis, there are also discrepancies in the findings reported in the literature. In some cases the results are contradictory. Consequently, it is difficult to analyse the current body of research to determine the value in the findings. This thesis reports on an empirical investigation to determine the underlying causes of the discrepancies. Several methodological factors, such as choice of data subset, validation method and data preprocessing, are identified and are found to affect the results significantly. These findings have also enabled a better interpretation of the current body of research. Furthermore, the criticisms in the literature are addressed and future use of the data set is discussed, which is important since researchers continue to use it due to a lack of better publicly available alternatives. Due to the nature of the intrusion detection domain, there is an extreme imbalance among the classes in the KDD Cup '99 data set, which poses a significant challenge to machine learning. In other domains, researchers have demonstrated that well known techniques such as Artificial Neural Networks (ANNs) and Decision Trees (DTs) often fail to learn the minor class(es) due to class imbalance. However, this has not been recognized as an issue in intrusion detection previously. This thesis reports on an empirical investigation that demonstrates that it is the class imbalance that causes the poor detection of some classes of intrusion reported in the literature. An alternative approach to training ANNs is proposed in this thesis, using Genetic Algorithms (GAs) to evolve the weights of the ANNs, referred to as an Evolutionary Neural Network (ENN). When employing evaluation functions that calculate the fitness proportionally to the instances of each class, thereby avoiding a bias towards the major class(es) in the data set, significantly improved true positive rates are obtained whilst maintaining a low false positive rate. These findings demonstrate that the issues of learning from imbalanced data are not due to limitations of the ANNs; rather the training algorithm. Moreover, the ENN is capable of detecting a class of intrusion that has been reported in the literature to be undetectable by ANNs. One limitation of the ENN is a lack of control of the classification trade-off the ANNs obtain. This is identified as a general issue with current approaches to creating classifiers. Striving to create a single best classifier that obtains the highest accuracy may give an unfruitful classification trade-off, which is demonstrated clearly in this thesis. Therefore, an extension of the ENN is proposed, using a Multi-Objective GA (MOGA), which treats the classification rate on each class as a separate objective. This approach produces a Pareto front of non-dominated solutions that exhibit different classification trade-offs, from which the user can select one with the desired properties. The multi-objective approach is also utilised to evolve classifier ensembles, which yields an improved Pareto front of solutions. Furthermore, the selection of classifier members for the ensembles is investigated, demonstrating how this affects the performance of the resultant ensembles. This is a key to explaining why some classifier combinations fail to give fruitful solutions

    A quality model considering program architecture

    Full text link
    Mémoire numérisé par la Direction des bibliothèques de l'Université de Montréal
    corecore