    SecREP: A Framework for Automating the Extraction and Prioritization of Security Requirements Using Machine Learning and NLP Techniques

    Gathering and extracting security requirements adequately requires extensive effort, experience, and time, as large amounts of data need to be analyzed. While many manual and academic approaches have been developed to tackle the discipline of Security Requirements Engineering (SRE), a need still exists for automating the SRE process, mainly because traditional, manual frameworks are difficult, error-prone, and time-consuming. Machine learning techniques have been widely used to facilitate and automate the extraction of useful information from software requirements documents and artifacts, and such approaches can yield beneficial results in automating the extraction and elicitation of security requirements. However, extraction alone leaves software engineers with yet another tedious task: prioritizing the most critical security requirements. The competitive, fast-paced nature of software development, together with resource constraints, makes security requirements prioritization crucial for making educated decisions in risk and trade-off analysis. To that end, this thesis presents an automated pipeline for extracting and prioritizing security requirements. The proposed Security Requirements Extraction and Prioritization Framework (SecREP) consists of two parts. Part 1 proposes a machine learning approach for identifying and extracting security requirements from natural language software requirements artifacts (e.g., Software Requirements Specification (SRS) documents). Part 2 proposes a scheme for prioritizing the security requirements identified in the first part. For Part 1, three machine learning models (SVM, Naive Bayes, and Random Forest) were trained on an enhanced dataset, the "SecREP Dataset," created as part of this work. Each model was validated using resampling (80% for training and 20% for validation) and 5-fold cross-validation. For Part 2, a prioritization scheme was established with the aid of NLP techniques. The scheme analyzes each security requirement using part-of-speech (POS) tagging and named entity recognition to extract assets, security attributes, and threats, and, using a text similarity method, compares each security requirement to a super-sentence defined from the STRIDE threat model. The scheme was applied to the list of security requirements extracted in the Part 1 case study, and the priority score for each requirement was calculated and showcased.
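    A minimal sketch of the Part 1 training setup, assuming scikit-learn, TF-IDF features, and a labelled CSV of requirement sentences; the file name, column names, and hyperparameters are illustrative assumptions, not details from the thesis:

```python
# Hypothetical sketch: train the three SecREP Part 1 classifiers with an
# 80/20 resampling split and 5-fold cross-validation.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder input: one requirement sentence per row, label 1 = security-related.
df = pd.read_csv("secrep_dataset.csv")  # assumed columns: "text", "is_security"
X, y = df["text"], df["is_security"]

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y)

models = {
    "SVM": LinearSVC(),
    "Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=200),
}
for name, clf in models.items():
    pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
    pipe.fit(X_train, y_train)
    holdout = pipe.score(X_val, y_val)                    # 80/20 validation
    cv = cross_val_score(pipe, X, y, cv=5, scoring="f1")  # 5-fold CV
    print(f"{name}: holdout acc={holdout:.3f}, 5-fold F1={cv.mean():.3f}")
```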
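    And a sketch of the Part 2 analysis step, assuming spaCy for POS tagging, NER, and vector similarity; the STRIDE super-sentence wording below is a placeholder, not the one defined in the thesis:

```python
# Hypothetical sketch: analyze one security requirement with POS/NER and
# score it against a STRIDE-derived super-sentence via vector similarity.
import spacy

nlp = spacy.load("en_core_web_md")  # medium model ships word vectors

# Placeholder super-sentence naming the six STRIDE threat categories.
STRIDE = nlp("spoofing tampering repudiation information disclosure "
             "denial of service elevation of privilege")

requirement = nlp("The system shall encrypt patient records to prevent "
                  "unauthorized disclosure.")

assets = [tok.text for tok in requirement if tok.pos_ == "NOUN"]  # candidate assets
entities = [(ent.text, ent.label_) for ent in requirement.ents]   # named entities
priority = requirement.similarity(STRIDE)                         # cosine similarity

print(f"assets={assets}, entities={entities}, priority={priority:.2f}")
```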

    Optimisation Method for Training Deep Neural Networks in Classification of Non-functional Requirements

    Non-functional requirements (NFRs) are regarded as critical to a software system's success. Most NFR detection and classification solutions rely on supervised machine learning models, which are hindered by the lack of labelled training data and necessitate significant time spent on feature engineering. In this work, we explore emerging deep learning techniques to reduce the burden of feature engineering. The goal of this study is to develop an autonomous system that can classify NFRs into multiple classes based on a labelled corpus. In the first part of the thesis, we standardise the NFR ontology and annotations to produce a corpus based on five attributes: usability, reliability, efficiency, maintainability, and portability. In the second part, we design and implement four neural networks (an artificial neural network, a convolutional neural network, a long short-term memory network, and a gated recurrent unit) to classify NFRs. These models necessitate a large corpus. To overcome this limitation, we propose a new paradigm for data augmentation: a sort-and-concatenate strategy that combines two phrases from the same class, yielding a two-fold increase in data size while keeping the domain vocabulary intact. We compared our method to a baseline (no augmentation) and to an existing approach, Easy Data Augmentation (EDA), with pre-trained word embeddings. All training was performed under two data settings: augmentation of the entire dataset before the train/validation split versus augmentation of the training set only. Our findings show that, compared to EDA and the baseline, the NFR classification models improved greatly, with the CNN performing best when trained using our technique in the first setting. We saw only a slight boost in the second setting, with train set augmentation alone, so we conclude that augmentation of the validation set is required to achieve acceptable results with our approach. We hope that our ideas will inspire new data augmentation techniques, whether generic or task-specific, and it would also be useful to apply this strategy to other languages.
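    A minimal sketch of the sort-and-concatenate idea, under the assumption that same-class sentences are sorted and adjacent pairs joined; the abstract does not spell out the exact pairing rule, so this detail is illustrative:

```python
# Hypothetical sketch: roughly double an NFR corpus by concatenating pairs
# of same-class sentences, keeping the domain vocabulary intact.
from collections import defaultdict

def sort_and_concatenate(samples):
    """samples: list of (sentence, label) -> original + augmented samples."""
    by_class = defaultdict(list)
    for text, label in samples:
        by_class[label].append(text)

    augmented = list(samples)
    for label, texts in by_class.items():
        texts = sorted(texts)               # deterministic pairing via sort
        for a, b in zip(texts, texts[1:]):  # adjacent same-class pairs
            augmented.append((a + " " + b, label))
    return augmented

corpus = [("The UI shall be intuitive.", "usability"),
          ("Menus shall be reachable in two clicks.", "usability"),
          ("Uptime shall exceed 99.9%.", "reliability"),
          ("The system shall recover within 5 seconds.", "reliability")]
print(len(sort_and_concatenate(corpus)))  # 4 original + 2 augmented -> 6
```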

    Design and Deployment of an Access Control Module for Data Lakes

    Nowadays big data is considered an extremely valuable asset for companies, which are discovering new avenues to use it for business profit. However, an organization's ability to effectively extract valuable information from data depends on its knowledge management infrastructure, so most organizations are transitioning from data warehouse (DW) storage to data lake (DL) infrastructures, from which further insights are derived. The present work was carried out as part of a cybersecurity project in a financial institution that manages a vast volume and variety of data kept in a data lake. Although the DL is presented as the answer to the current big data scenario, the infrastructure presents certain flaws in authentication and access control. Preceding work on DL access control points out that the main goal is to avoid fraudulent behaviors derived from users' access, such as secondary use, that could result in business data being exposed to third parties. To mitigate this risk, traditional mechanisms attempt to identify such behaviors based on rules; however, they cannot reveal all kinds of fraud because they only look for known patterns of misuse. The present work proposes a novel access control system for data lakes, assisted by Oracle's database audit trail and based on anomaly detection mechanisms, that automatically looks for events that do not conform to normal or expected behavior. The overall aim of the project is to develop and deploy an automated system for identifying abnormal accesses to the DL, which can be separated into four subgoals: explore the different technologies that could be applied in the domain of anomaly detection, design the solution, deploy it, and evaluate the results. To that end, feature engineering is performed, and four different unsupervised ML models are built and evaluated. Based on the quality of the results, the best model is finally productionized with Docker. Although anomaly detection has been a long-standing yet active research area, some unique problem complexities and challenges remain, leaving the way open for the proposed solution to be further improved.
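    A minimal sketch of the anomaly detection core, assuming features have already been engineered from the audit trail; the file name, feature columns, and the choice of Isolation Forest are illustrative assumptions (the project evaluated four unsupervised models, not named in the abstract):

```python
# Hypothetical sketch: flag abnormal data lake accesses from audit-trail
# features with an unsupervised Isolation Forest.
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# Placeholder features engineered from the audit trail, one row per access
# event, e.g. hour of day, rows read, distinct tables touched.
events = pd.read_csv("audit_features.csv")
X = StandardScaler().fit_transform(events)

model = IsolationForest(contamination=0.01, random_state=0)  # ~1% anomalies assumed
labels = model.fit_predict(X)             # -1 = anomaly, 1 = normal
events["anomaly"] = labels == -1
print(events[events["anomaly"]].head())   # accesses flagged for review
```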

    Dispatcher3 D1.1 - Technical resources and problem definition

    This deliverable builds on the Dispatcher3 proposal and incorporates the work produced during the first five months of the project: activities in the different work packages, interaction with the Topic Manager and Project Officer, and input received during the first Advisory Board meeting and follow-up consultations. It presents the definition of the Dispatcher3 concept and methodology, including the high-level requirements of the prototype, preliminary data requirements, preliminary technical infrastructure requirements, a preliminary identification of data processing and analytic techniques, and a preliminary definition of scenarios. The deliverable aims to capture the consortium's view of the project at this early stage, incorporating the feedback obtained from the Advisory Board and highlighting the further activities required to define some aspects of the project.

    On the Intersection of Explainable and Reliable AI for physical fatigue prediction

    In the era of Industry 4.0, the use of Artificial Intelligence (AI) in occupational settings is widespread. Since human safety is at stake, the explainability and trustworthiness of AI are even more important than high accuracy. This paper investigates eXplainable AI (XAI) for detecting physical fatigue during a simulated manual material handling task. Besides comparing global rule-based XAI models (LLM and DT) to black-box models (NN, SVM, XGBoost) in terms of performance, we also compare global models with local ones (LIME over XGBoost). Surprisingly, the global and local approaches reach similar conclusions in terms of feature importance. Moreover, an expansion from local rules to global rules is designed for Anchors by posing an appropriate optimization problem, enlarging Anchors coverage from an originally low 11% up to 43%. As far as trustworthiness is concerned, rule sensitivity analysis drives the identification of optimized regions in the feature space where physical fatigue is predicted with zero statistical error. The discovery of such "non-fatigue regions" helps certify organizational and clinical decision making.
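    A minimal sketch of the local-explanation step (LIME over XGBoost) on tabular data; the feature names, synthetic labels, and model settings are placeholders, not the paper's actual fatigue features or results:

```python
# Hypothetical sketch: explain a single fatigue prediction of an XGBoost
# classifier with LIME on tabular features.
import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
feature_names = ["lift_rate", "load_kg", "trunk_angle", "heart_rate"]  # placeholders
X_train = rng.normal(size=(500, 4))
y_train = (X_train.sum(axis=1) > 0).astype(int)  # synthetic fatigue labels

model = XGBClassifier(n_estimators=100).fit(X_train, y_train)

explainer = LimeTabularExplainer(
    X_train, feature_names=feature_names,
    class_names=["no fatigue", "fatigue"], mode="classification")
exp = explainer.explain_instance(X_train[0], model.predict_proba, num_features=4)
print(exp.as_list())  # per-feature contributions to this one prediction
```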

    Effective Methods and Tools for Mining App Store Reviews

    Research on mining user reviews in mobile application (app) stores has advanced noticeably in the past few years. The main objective is to extract useful information that app developers can use to build more sustainable apps. In general, existing research on app store mining can be classified into three genres: classifying user feedback into different types of software maintenance requests (e.g., bug reports and feature requests), building practical tools that are readily available for developers to use, and proposing visions for enhanced mobile app stores that integrate multiple sources of user feedback to ensure app survivability. Despite these major advances, existing tools and techniques still suffer from several drawbacks. Specifically, the majority of techniques rely on the textual content of user reviews for classification. However, due to the inherently diverse and unstructured nature of user-generated online reviews, text-based review mining techniques often produce excessively complicated models that are prone to overfitting. Furthermore, most proposed techniques focus on extracting and classifying the functional requirements in mobile app reviews, providing little or no support for extracting and synthesizing the non-functional requirements (NFRs) raised in user feedback (e.g., security, reliability, and usability). In terms of tool support, existing tools are still far from adequate for practical applications; in general, there is a lack of off-the-shelf tools that researchers and practitioners can use to accurately mine user reviews. Motivated by these observations, this dissertation explores several research directions aimed at addressing these shortcomings in app store review mining. In particular, we introduce a novel semantically aware approach for mining and classifying functional requirements from app store reviews that reduces the dimensionality of the data and enhances the predictive capabilities of the classifier. We then present a two-phase study aimed at automatically capturing the NFRs in user reviews. We also introduce MARC, a tool that enables developers to extract, classify, and summarize user reviews.
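    One way to picture the dimensionality-reduction idea is to map sparse TF-IDF review vectors into a small latent semantic space before classification. The sketch below uses LSA via TruncatedSVD as an illustrative stand-in; it is not the dissertation's actual semantically aware approach, and the reviews and labels are toy examples:

```python
# Hypothetical sketch: classify app reviews as bug report vs feature request,
# shrinking sparse TF-IDF vectors to a dense semantic space first.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

reviews = ["App crashes whenever I open the camera",
           "Please add a dark mode option",
           "Login fails after the latest update",
           "Would love offline playlists"]
labels = ["bug", "feature", "bug", "feature"]

pipe = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    TruncatedSVD(n_components=2),  # low-dimensional semantic space (toy size)
    LinearSVC())
pipe.fit(reviews, labels)
print(pipe.predict(["The app freezes on startup"]))  # expected: ['bug']
```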

    Extracting Non-Functional Requirements from Unstructured Text

    Non-functional requirements (NFRs) of a software system describe desired quality attributes rather than specific user-visible features; NFRs model stakeholder expectations about pervasive system properties such as performance, security, reliability, and usability. Failing to meet these expectations can result in systems that, while functionally complete, lead to user dissatisfaction and ultimately to the failure of the product. While NFRs may be documented, tracked, and evaluated in a variety of ways during development, there is no single common approach for doing so. In this work, we investigate extracting information about NFRs from source code and source code comments, since these are often considered the ultimate source of ground truth about a software system. Specifically, we examine how often NFRs are mentioned explicitly or implicitly in source code comments using natural language processing (NLP) techniques, and we evaluate how effectively they can be identified using machine learning (ML). We model the problem as text classification, in which the goal is to identify comments about NFRs, and we evaluate the classifiers on example systems from the electronic health records (EHR) domain. The best performance was achieved by an SVM classifier, with an F1 measure of 0.86. Our results indicate that supervised methods outperform unsupervised methods that try to find common NFR patterns in comments. Comparing our results to previous studies shows that NFRs can be extracted more accurately from source code comments than from other software artifacts (e.g., SRS or RFP documents). Moreover, we found that bag-of-words features are more effective than more complicated features (i.e., doc2vec) for extracting NFRs from source code comments.
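    A minimal sketch contrasting the two feature types compared in this work, bag-of-words and doc2vec, on labelled code comments; the comments, labels, gensim settings, and tiny corpus size are illustrative assumptions:

```python
# Hypothetical sketch: represent labelled code comments two ways
# (bag-of-words and doc2vec) and train the same SVM on each.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

comments = ["// retry with backoff so the service stays responsive",
            "// sanitize input to block SQL injection",
            "// cache results to keep latency under 100 ms",
            "// encrypt tokens before writing them to disk"]
labels = ["performance", "security", "performance", "security"]

# Bag-of-words features.
bow = CountVectorizer().fit_transform(comments)
svm_bow = LinearSVC().fit(bow, labels)

# doc2vec features (toy settings; real corpora need far more data).
tagged = [TaggedDocument(c.split(), [i]) for i, c in enumerate(comments)]
d2v = Doc2Vec(tagged, vector_size=16, min_count=1, epochs=50)
vecs = [d2v.infer_vector(c.split()) for c in comments]
svm_d2v = LinearSVC().fit(vecs, labels)
# On real data, compare per-class F1 of the two models; this work reports
# bag-of-words coming out ahead.
```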

    Understanding and Managing Non-functional Requirements for Machine Learning Systems

    Background: Machine Learning (ML) systems learn from big data and solve a wide range of prediction and decision-making problems that would be difficult to solve with traditional systems. However, the increasing use of ML in complex and safety-critical systems has raised concerns about quality requirements, defined as non-functional requirements (NFRs). Many NFRs, such as fairness, transparency, explainability, and safety, are critical to the success and acceptance of ML systems. Yet many NFRs for ML systems are not well understood (e.g., maintainability); some known NFRs may become more important (e.g., fairness) while others may become irrelevant in the ML context (e.g., modularity); some new NFRs may come into play (e.g., retrainability); and defining and measuring the scope of NFRs in ML systems is itself a challenging task.
    Objective: The research project focuses on addressing and managing issues related to NFRs for ML systems. The objective is to identify current practices and challenges related to NFRs in an ML context, and to develop solutions to manage NFRs for ML systems.
    Method: We use design science as the basis of the research method. We carried out different empirical methodologies, including interviews, a survey, and part of a systematic mapping study, to collect data and explore the problem space. To gain in-depth insights into the collected data, we performed thematic analysis on qualitative data and used descriptive statistics to analyze quantitative data. We are working towards proposing a quality framework as an artifact to identify, define, specify, and manage NFRs for ML systems.
    Findings: We found that NFRs are crucial to the success of ML systems, yet there is a research gap in this area, and managing NFRs for ML systems is challenging. To address the research objectives, we identified important NFRs for ML systems, along with challenges related to NFRs and their measurement. We also identified a preliminary NFR definition and measurement scope, and RE-related challenges in different example contexts.
    Conclusion: Although NFRs are very important for ML systems, it is complex and difficult to define, allocate, specify, and measure them, and industry and research currently lack specific, well-organized solutions for managing NFRs for ML systems because of unintended bias, the non-deterministic behavior of ML, and expensive, time-consuming exhaustive testing. We are currently developing a quality framework to manage NFRs (e.g., identifying important NFRs, scoping and measuring them) in the ML systems development process.