34 research outputs found
Data Mining and Machine Learning for Software Engineering
Software engineering is one of the research areas where data mining is most applicable. Developers have attempted to improve software quality by mining and analyzing software data. In every phase of the software development life cycle (SDLC), a huge amount of data is produced, and design, security, or other software problems may occur. In the early phases of software development, analyzing software data helps to handle these problems and leads to more accurate and timely delivery of software projects. Various data mining and machine learning studies have been conducted to deal with software engineering tasks such as defect prediction and effort estimation. This study identifies the open issues in software engineering and presents related solutions and recommendations that apply data mining and machine learning techniques.
Surety of the nation's critical infrastructures: The challenge restructuring poses to the telecommunications sector
The telecommunications sector plays a pivotal role in the system of increasingly connected and interdependent networks that make up national infrastructure. An assessment of the probable structure and function of the bit-moving industry in the twenty-first century must include issues associated with the surety of telecommunications. The term surety, as used here, means confidence in the acceptable behavior of a system in both intended and unintended circumstances. This paper outlines various engineering approaches to surety in systems, generally, and in the telecommunications infrastructure, specifically. It uses the experience and expectations of the telecommunications system of the US as an example of the global challenges. The paper examines the principal factors underlying the change to more distributed systems in this sector, assesses surety issues associated with these changes, and suggests several possible strategies for mitigation. It also studies the ramifications of what could happen if this sector became a target for those seeking to compromise a nation's security and economic well-being. Experts in this area generally agree that the U.S. telecommunications sector will eventually respond in a way that meets market demands for surety. Questions remain open, however, about confidence in the telecommunications sector and the nation's infrastructure during unintended circumstances, such as those posed by information warfare or by cascading software failures. Resolution of these questions is complicated by the lack of clear accountability of the private and the public sectors for the surety of telecommunications
AN ENHANCEMENT ON TARGETED PHISHING ATTACKS IN THE STATE OF QATAR
The latest report by Kaspersky on Spam and Phishing listed Qatar as one of the top 10 countries by percentage of email phishing and targeted phishing attacks. Since the Qatari economy has grown exponentially and become increasingly global in nature, email phishing and targeted phishing attacks have the capacity to be devastating to the Qatari economy, yet no adequate measures, such as awareness training programmes, have been put in place to minimise these threats to the state of Qatar. Therefore, this research aims to explore targeted attacks in specific organisations in the state of Qatar by presenting a new technique to prevent targeted attacks. This novel enterprise-wide email phishing detection system has been used by organisations and individuals not only in the state of Qatar but also in organisations in the UK. The detection system is based on domain names, since attackers carefully register domain names that victims are likely to trust. The results show that the detection system reduces email phishing attacks. Moreover, the research aims to develop email phishing awareness training techniques specifically designed for the state of Qatar to complement the presented technique, in order to increase email phishing awareness, focused on targeted attacks and their content, and to reduce the impact of phishing email attacks. This part of the research was carried out by developing an interactive email phishing awareness training website that has been tested by organisations in the state of Qatar. The results show that the training programme was effective in training users to spot email phishing and targeted attacks
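The abstract does not describe how the domain-name check works, so the following is only a minimal sketch of one common way to flag look-alike sender domains: compare the sender's domain against a list of trusted domains by edit distance. The function names, the trusted-domain list, and the distance threshold are all assumptions made for illustration and are not taken from the research.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def sender_domain(address: str) -> str:
    """Return the lower-cased part of an email address after the last '@'."""
    return address.rsplit("@", 1)[-1].strip().lower()


def is_lookalike(address: str, trusted: Iterable[str], max_distance: int = 2) -> bool:
    """Flag a sender whose domain is close to, but not equal to, a trusted domain.

    Hypothetical rule: the threshold of 2 edits is an assumption, not a
    value reported in the abstract.
    """
    domain = sender_domain(address)
    trusted = [t.lower() for t in trusted]
    if domain in trusted:
        return False
    return any(edit_distance(domain, t) <= max_distance for t in trusted)
```

With `trusted = ["example.com"]`, a message from `ceo@examp1e.com` would be flagged, while messages from `ceo@example.com` or from an unrelated domain would not.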
Application of ant based routing and intelligent control to telecommunications network management
This thesis investigates the use of novel Artificial Intelligence techniques to improve the control of telecommunications networks. The approaches include the use of Ant-Based Routing and software Agents to encapsulate learning mechanisms to improve the performance of the Ant-System and a highly modular approach to network-node configuration and management into which this routing system can be incorporated. The management system uses intelligent Agents distributed across the nodes of the network to automate the process of network configuration. This is important in the context of increasingly complex network management, which will be accentuated with the introduction of IPv6 and QoS-aware hardware. The proposed novel solution allows an Agent, with a Neural Network based Q-Learning capability, to adapt the response speed of the Ant-System - increasing it to counteract congestion, but reducing it to improve stability otherwise. It has the ability to adapt its strategy and learn new ones for different network topologies. The solution has been shown to improve the performance of the Ant-System, as well as outperform a simple non-learning strategy which was not able to adapt to different networks. This approach has a wide region of applicability to such areas as road-traffic management, and more generally, positioning of learning techniques into complex domains. Both Agent architectures are Subsumption style, blending short-term responses with longer term goal-driven behaviour. It is predicted that this will be an important approach for the application of AI, as it allows modular design of systems in a similar fashion to the frameworks developed for interoperability of telecommunications systems
Adoptive Transfer of HIV-Specific Cytotoxic T Lymphocytes
Several independent observations suggest that cytotoxic T lymphocytes (CTL) are critical for the control of HIV infection. We have studied the adoptive transfer of CTL in three patients with acquired immunodeficiency syndrome (AIDS). In the first patient, we examined the CD8+ T cell repertoire before and after the transfer of syngeneic lymphocytes from his uninfected sibling to confirm whether aberrations exist in the CTL repertoire during advanced HIV infection and to determine whether adoptive immunotherapy with lymphocytes can lead to sustained expansions of CD8+ cells. Repertoire analysis revealed baseline expansions in some TCR subsets. Following cell transfer, there were new changes in two V-beta families, one at 24 hours post-infusion and the other after 28 days post-infusion. This study demonstrated that expansion and transient restoration of both CD4+ and CD8+ T cells can occur in vivo following syngeneic cell transfer and that lymphocyte expansion appears to be maximal around 4 weeks post-infusion. In the second and third patients, we studied the adoptive transfer of HIV-specific CTL clones. Despite substantial HIV-specific lytic activity in vitro, there were no significant changes in the virus load of patients following adoptive transfer. In one patient, we traced the fate of an infused clone using soluble MHC-peptide complexes and showed that cells were rapidly eliminated within hours of infusion, probably through apoptosis. The use of CTL adoptive therapy in AIDS needs to be re-examined in light of these findings. Further trials of adoptive transfer of CTL should take into account the susceptibility of infused cells to in vivo apoptosis
Serial analysis of genes expressed in normal human glomerular mesangial cells
Advances in sequencing-based genomics like the Human Genome Mapping Project (HGMP) have meant that the majority of the estimated human genes have been at least partially sequenced. The variation in expression of a set of essentially identical genes will provide information on the molecular basis of phenotype. Serial analysis of gene expression (SAGE) is based on the ability to assign an individual transcript to a ten base pair 'tag', and on technology that facilitates rapid sampling of such tags. Glomerular mesangial cells (MC) are considered to play a major role in the development of renal disease, and in vitro culturing of MCs has become a model system with which to study the molecular mechanisms of glomerular pathology. To this end, a SAGE project was undertaken to identify genes expressed in normal human mesangial cells (NHMC). Primary normal human mesangial cells were cultured for periods up to 96 hrs. A total of 46,219 tags were sampled (14,953 unique tags). Tags were mapped to 20,382 sequences. Of these, 79% of tags mapped to characterised cDNAs, 16% mapped to ESTs, and 5% failed to match any database entry. The most abundant tags mapped to ribosomal genes or genes associated with the cytoskeleton. Represented in the top ten tags were the matricellular genes transgelin (1.2%), SPARC (1%), type IV collagen (0.5%) and fibronectin (0.53%), which support the notion that the MC is a producer and re-modeller of the glomerular extracellular matrix (ECM). The contractile nature of MC was apparent with the high abundance of contractile proteins like myosins and tropomyosins. Also apparent in the transcriptome were lineage-specific isoforms of several genes, supporting the myoblastoid lineage of MC.
Comparing the transcriptome of the MC to other libraries revealed a high correlation between cells in the same lineage as MC, such as astrocytes, smooth muscle cells and fibroblasts, when compared to libraries sampled from heart, liver and various other unrelated cell lines. Understanding gene expression in the mesangial cell facilitates a greater understanding of its role in renal pathology
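The tag-based counting described in this abstract can be illustrated with a small sketch. The abstract states only that each transcript is assigned a ten base pair tag; taking the tag immediately downstream of the 3'-most CATG site follows the usual SAGE convention and is an assumption here, as are the function names and the toy sequences in the usage note.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

ANCHOR = "CATG"   # assumption: anchoring-enzyme site, not named in the abstract
TAG_LENGTH = 10   # ten base pair tag, as stated in the abstract


def extract_tag(transcript: str) -> Optional[str]:
    """Return the 10-base tag following the 3'-most anchor site, or None."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)
    if pos == -1:
        return None  # no anchor site, so no tag can be assigned
    start = pos + len(ANCHOR)
    tag = seq[start:start + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None


def tag_abundance(transcripts: Iterable[str]) -> Dict[str, float]:
    """Percentage of all sampled tags accounted for by each distinct tag."""
    counts = Counter(t for t in map(extract_tag, transcripts) if t is not None)
    total = sum(counts.values())
    return {tag: 100.0 * n / total for tag, n in counts.items()}
```

Percentages such as those quoted for transgelin or SPARC are abundances of this kind: the count of one tag divided by the total number of tags sampled.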
Combining SOA and BPM Technologies for Cross-System Process Automation
This paper summarizes the results of an industry case study that introduced a cross-system business process automation solution based on a combination of SOA and BPM standard technologies (i.e., BPMN, BPEL, WSDL). Besides discussing major weaknesses of the existing, custom-built, solution and comparing them against experiences with the developed prototype, the paper presents a course of action for transforming the current solution into the proposed solution. This includes a general approach, consisting of four distinct steps, as well as specific action items that are to be performed for every step. The discussion also covers language and tool support and challenges arising from the transformation
Machine learning for network based intrusion detection: an investigation into discrepancies in findings with the KDD cup '99 data set and multi-objective evolution of neural network classifier ensembles from imbalanced data.
For the last decade it has become commonplace to evaluate machine learning techniques for network based intrusion detection on the KDD Cup '99 data set. This data set has served well to demonstrate that machine learning can be useful in intrusion detection. However, it has undergone some criticism in the literature, and it is out of date. Therefore, some researchers question the validity of the findings reported based on this data set. Furthermore, as identified in this thesis, there are also discrepancies in the findings reported in the literature. In some cases the results are contradictory. Consequently, it is difficult to analyse the current body of research to determine the value in the findings. This thesis reports on an empirical investigation to determine the underlying causes of the discrepancies. Several methodological factors, such as choice of data subset, validation method and data preprocessing, are identified and are found to affect the results significantly. These findings have also enabled a better interpretation of the current body of research. Furthermore, the criticisms in the literature are addressed and future use of the data set is discussed, which is important since researchers continue to use it due to a lack of better publicly available alternatives. Due to the nature of the intrusion detection domain, there is an extreme imbalance among the classes in the KDD Cup '99 data set, which poses a significant challenge to machine learning. In other domains, researchers have demonstrated that well known techniques such as Artificial Neural Networks (ANNs) and Decision Trees (DTs) often fail to learn the minor class(es) due to class imbalance. However, this has not been recognized as an issue in intrusion detection previously. This thesis reports on an empirical investigation that demonstrates that it is the class imbalance that causes the poor detection of some classes of intrusion reported in the literature. An alternative approach to training ANNs is proposed in this thesis, using Genetic Algorithms (GAs) to evolve the weights of the ANNs, referred to as an Evolutionary Neural Network (ENN). When employing evaluation functions that calculate the fitness proportionally to the instances of each class, thereby avoiding a bias towards the major class(es) in the data set, significantly improved true positive rates are obtained whilst maintaining a low false positive rate. These findings demonstrate that the issues of learning from imbalanced data are not due to limitations of the ANNs, but rather to the training algorithm. Moreover, the ENN is capable of detecting a class of intrusion that has been reported in the literature to be undetectable by ANNs. One limitation of the ENN is a lack of control of the classification trade-off the ANNs obtain. This is identified as a general issue with current approaches to creating classifiers. Striving to create a single best classifier that obtains the highest accuracy may give an unfruitful classification trade-off, which is demonstrated clearly in this thesis. Therefore, an extension of the ENN is proposed, using a Multi-Objective GA (MOGA), which treats the classification rate on each class as a separate objective. This approach produces a Pareto front of non-dominated solutions that exhibit different classification trade-offs, from which the user can select one with the desired properties. The multi-objective approach is also utilised to evolve classifier ensembles, which yields an improved Pareto front of solutions. Furthermore, the selection of classifier members for the ensembles is investigated, demonstrating how this affects the performance of the resultant ensembles. This is a key to explaining why some classifier combinations fail to give fruitful solutions
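Two ideas in this abstract lend themselves to a short illustration: a fitness that weights every class equally regardless of its size, and the selection of non-dominated solutions when each class's classification rate is a separate objective. The sketch below is not the thesis's implementation; the function names, the class labels, and the use of a plain mean of per-class rates are assumptions chosen for clarity.

```python
from typing import Dict, List, Sequence, Tuple


def per_class_rates(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Fraction of the instances of each true class that were classified correctly."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        if t == p:
            hits[t] = hits.get(t, 0) + 1
    return {c: hits.get(c, 0) / n for c, n in totals.items()}


def balanced_fitness(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of the per-class rates, so a rare class counts as much as a common one."""
    rates = per_class_rates(y_true, y_pred)
    return sum(rates.values()) / len(rates)


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the candidates that no other candidate dominates (maximisation)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]
```

For a data set with eight instances of one class and two of another, a classifier that always predicts the larger class scores 0.8 on plain accuracy but only 0.5 on `balanced_fitness`, which is the bias the abstract describes. Given per-class rate pairs such as `(0.9, 0.1)`, `(0.5, 0.5)`, `(0.4, 0.4)` and `(0.1, 0.9)`, `pareto_front` drops only `(0.4, 0.4)`, leaving three different trade-offs to choose from.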
A quality model considering program architecture
Thesis digitized by the Direction des bibliothèques de l'Université de Montréal