
    Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation

    Large language models (LLMs) have brought significant advancements to code generation, benefiting both novice and experienced developers. However, their training on unsanitized data from open-source repositories, such as GitHub, introduces the risk of inadvertently propagating security vulnerabilities. To mitigate this concern, this paper presents a comprehensive study focused on evaluating and enhancing code LLMs from a software security perspective. We introduce SecuCoGen (uploaded as supplemental material; to be made publicly available after publication), a meticulously curated dataset targeting 21 critical vulnerability types. SecuCoGen comprises 180 samples and serves as the foundation for experiments on three crucial code-related tasks: code generation, code repair, and vulnerability classification, with a strong emphasis on security. Our experimental results reveal that existing models often overlook security concerns during code generation, leading to vulnerable code. To address this, we propose effective approaches to mitigate security vulnerabilities and enhance the overall robustness of code generated by LLMs. Moreover, our study identifies weaknesses in existing models' ability to repair vulnerable code, even when provided with vulnerability information. Additionally, certain vulnerability types pose challenges for the models, hindering their performance in vulnerability classification. Based on these findings, we believe our study will have a positive impact on the software engineering community, inspiring the development of improved methods for training and utilizing LLMs and thereby leading to safer and more trustworthy model deployment.

    DBBRBF- Convalesce optimization for software defect prediction problem using hybrid distribution base balance instance selection and radial basis Function classifier

    Software is becoming an indispensable part of human life, and the rapid development of software engineering demands that software be highly reliable. Reliability can be checked through efficient software testing methods that use historical defect prediction data to develop a quality software system. Machine learning plays a vital role in optimizing the prediction of defect-prone modules in real-life software. However, software defect prediction data suffer from a class imbalance problem, with a low ratio of defective to non-defective instances, which degrades the performance of standard classifiers and calls for an efficient machine learning classification technique. To alleviate this problem, this paper introduces a novel hybrid instance-based classification model (DBBRBF) that combines distribution-base-balance instance selection with a radial basis function neural network classifier to obtain better predictions than existing research. Class-imbalanced data sets from NASA, PROMISE, and SOFTLAB were used for the experimental analysis. The experimental results in terms of accuracy, F-measure, AUC, recall, precision, and balance show the effectiveness of the proposed approach. Finally, statistical significance tests were carried out to assess the suitability of the proposed model.
    Comment: 32 pages, 24 tables, 8 figures
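    To make the DBBRBF idea concrete, here is a minimal sketch that pairs a simplified undersampler with a small radial basis function network (K-means centres, Gaussian activations, logistic output). It is a hedged illustration only: `balance_by_distribution` is a hypothetical stand-in for the paper's exact distribution-base-balance criterion, and `RBFNet` is a generic RBF network, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def balance_by_distribution(X, y):
    """Hypothetical, simplified stand-in for DBB instance selection:
    undersample the majority class, keeping instances closest to the
    majority-class centroid."""
    maj = 0 if (y == 0).sum() > (y == 1).sum() else 1
    mino = 1 - maj
    X_maj, X_min = X[y == maj], X[y == mino]
    dist = np.linalg.norm(X_maj - X_maj.mean(axis=0), axis=1)
    keep = np.argsort(dist)[: len(X_min)]      # closest-to-centroid subset
    Xb = np.vstack([X_maj[keep], X_min])
    yb = np.array([maj] * len(keep) + [mino] * len(X_min))
    return Xb, yb

class RBFNet:
    """Minimal RBF network: K-means centres, Gaussian hidden layer,
    logistic-regression output."""
    def __init__(self, n_centers=20, gamma=1.0):
        self.km = KMeans(n_clusters=n_centers, n_init=10, random_state=0)
        self.gamma = gamma
        self.out = LogisticRegression(max_iter=1000)

    def _phi(self, X):
        # Gaussian activation of each sample w.r.t. each centre.
        d = np.linalg.norm(X[:, None, :] - self.km.cluster_centers_[None], axis=2)
        return np.exp(-self.gamma * d ** 2)

    def fit(self, X, y):
        self.km.fit(X)
        self.out.fit(self._phi(X), y)
        return self

    def predict(self, X):
        return self.out.predict(self._phi(X))

# Toy imbalanced "defect" data (~10% positive class).
X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
Xb, yb = balance_by_distribution(X, y)
Xtr, Xte, ytr, yte = train_test_split(Xb, yb, stratify=yb, random_state=0)
print("F1:", f1_score(yte, RBFNet().fit(Xtr, ytr).predict(Xte)))
```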

    Understanding the relationship between Kano model’s customer satisfaction scores and self-stated requirements importance

    Customer satisfaction is the result of product quality and viability. The perceived satisfaction of users/customers with a software product cannot be neglected, especially in today's competitive market environment, as it drives customer loyalty and promotes high profitability and return on investment. It is therefore worth understanding the importance of requirements as it relates to the satisfaction users/customers feel when their requirements are met. Specifically, it is necessary to know the relationship between customer satisfaction when requirements are met (or dissatisfaction when they are unmet) and the stated importance of those requirements. Much work has been carried out on customer satisfaction in connection with requirements importance, but the relationship between the customer satisfaction scores (coefficients) of the Kano model and users/customers' self-stated requirements importance has not been sufficiently explored. In this study, an attempt is made to unravel the underlying relationship between the Kano model's customer satisfaction indexes and users/customers' self-reported requirements importance. The results indicate some interesting associations between the considered variables. The bivariate associations reveal that the customer satisfaction index (SI) and the average satisfaction coefficient (ASC), as well as the customer dissatisfaction index (DI) and the ASC, are highly correlated (r = 0.96), and thus ASC can be used in place of either SI or DI to represent customer satisfaction scores. Each of these Kano model satisfaction variables (SI, DI, and ASC) is also associated with self-stated requirements importance (IMP). Further analysis indicates that the value customers or users place on requirements that are met, or on features that are incorporated into a product, influences the level of satisfaction they derive from the product. The worth of a product feature is indicated by the perceived satisfaction customers get from its inclusion in the product's design and development. The satisfaction users/customers derive when a requirement is fulfilled or a feature is included (SI or ASC) is strongly influenced by the value they place on that requirement/feature when met (IMP). However, the dissatisfaction users/customers experience when a requirement is not met or a feature is not incorporated (DI), even though related to self-stated importance (IMP), does not have a strong effect on the perceived importance/worth (IMP) of that requirement/feature. Therefore, since customer satisfaction is proportionally related to perceived requirements importance (worth), adequate attention should be given to user/customer-satisfying requirements (features) from elicitation through design to final implementation. Incorporating such requirements in product design is of great worth to the product's future users and customers.
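    For readers unfamiliar with the Kano coefficients, the sketch below computes SI and DI from per-requirement classification counts using the standard Berger-style formulas and correlates them with self-stated importance. The ASC definition used here (the mean of SI and |DI|) is an assumption, the study may define ASC differently, and the counts and importance scores are toy data.

```python
import numpy as np

def kano_coefficients(a, o, m, i):
    """a, o, m, i = counts of Attractive, One-dimensional, Must-be,
    and Indifferent classifications for one requirement."""
    total = a + o + m + i
    si = (a + o) / total        # satisfaction index, 0..1
    di = -(o + m) / total       # dissatisfaction index, -1..0
    asc = (si + abs(di)) / 2    # assumed: average of the two magnitudes
    return si, di, asc

# Toy per-requirement Kano counts and mean self-stated importance (1-7 scale).
counts = [(30, 40, 10, 20), (5, 15, 60, 20), (50, 20, 5, 25)]
importance = np.array([6.1, 5.4, 6.8])

si, di, asc = np.array([kano_coefficients(*c) for c in counts]).T
for name, values in [("SI", si), ("DI", di), ("ASC", asc)]:
    r = np.corrcoef(values, importance)[0, 1]
    print(f"corr({name}, IMP) = {r:+.2f}")
```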

    Evidence based medical query system on large scale data

    Thesis (M.S.), School of Computing and Engineering, University of Missouri--Kansas City, 2014. Advisor: Yugyung Lee. Includes bibliographical references (pages 62-65).
    As huge amounts of data are created rapidly, the demand for the integration and analysis of such data has been growing steadily. Retrieving relevant and accurate evidence is especially essential in healthcare and biomedical research. Even though query systems based on ontologies, Medical Subject Headings (MeSH), or keyword searches are available, query systems based on evidence and effective retrieval from large collections of clinical data are not sufficiently available. This thesis proposes a novel approach to analyze big data sets collected from clinical trials research and discover significant evidence and association patterns with respect to conditions, treatments, and medication side effects. Our approach makes use of machine learning techniques in the Apache Hadoop framework with support from MetaMap and RxNorm. A heuristic measure of empirical evidence was newly designed that considers the association degree of conditions, treatments, and medication side effects and the percentage of people affected. The Apriori algorithm was used to discover strong positive association rules under various measures, including support and confidence. We examined a large and complex data set (12,327 study results) from clinicaltrials.gov and identified 8,291 strong association rules and 59,228 combinations covering 432,841 subjects, 1,761 conditions, 2,836 drugs, and 27 side effects. The significance of these association patterns was evaluated in terms of an impact factor representing the percentage of the population with a high rate of side effects. Using these association rules and combination strengths, an evidence-based query system was implemented to answer several integral questions. The query system also provides an interface to retrieve relevant publications from PubMed, and its search outcomes are compared with those of a PubMed search based on Medical Subject Headings.
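    The rule-mining step can be illustrated with a minimal support/confidence search over toy (condition, drug, side-effect) transactions. This is only a sketch of the Apriori measures the thesis relies on; the real pipeline runs on Hadoop over clinicaltrials.gov data, and the author's empirical-evidence heuristic is not reproduced here.

```python
from itertools import combinations

# Toy transactions: each set mixes conditions, drugs, and side effects.
transactions = [
    {"cond:diabetes", "drug:metformin", "se:nausea"},
    {"cond:diabetes", "drug:metformin"},
    {"cond:diabetes", "drug:insulin", "se:hypoglycemia"},
    {"cond:hypertension", "drug:lisinopril", "se:cough"},
    {"cond:hypertension", "drug:lisinopril"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

min_support, min_conf = 0.2, 0.6
items = sorted(set().union(*transactions))
for k in (2, 3):
    for cand in map(frozenset, combinations(items, k)):
        s = support(cand)
        if s < min_support:
            continue
        for r in range(1, k):                    # every antecedent split
            for ante in map(frozenset, combinations(cand, r)):
                conf = s / support(ante)
                if conf >= min_conf:
                    print(f"{set(ante)} -> {set(cand - ante)} "
                          f"(support={s:.2f}, confidence={conf:.2f})")
```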

    Discovering business process simulation models in the presence of multitasking and availability constraints

    Business process simulation is a versatile technique for quantitative analysis of business processes. A well-known limitation of process simulation is that the accuracy of the simulation results is limited by the faithfulness of the process model and simulation parameters given as input to the simulator. To tackle this limitation, various authors have proposed to discover simulation models from process execution logs, so that the resulting simulation models more closely match reality. However, existing techniques in this field make certain assumptions about resource behavior that do not typically hold in practice, including: (i) that each resource performs one task at a time; and (ii) that resources are continuously available (24/7). In reality, resources may engage in multitasking behavior and they work only during certain periods of the day or the week. This article proposes an approach to discover process simulation models from execution logs in the presence of multitasking and availability constraints. To account for multitasking, we adjust the processing times of tasks in such a way that executing the multitasked tasks sequentially with the adjusted times is equivalent to executing them concurrently with the original times. Meanwhile, to account for availability constraints, we use an algorithm for discovering calendar expressions from collections of time-points to infer resource timetables from an execution log. We then adjust the parameters of this algorithm to maximize the similarity between the simulated log and the original one. We evaluate the approach using real-life and synthetic datasets. The results show that the approach improves the accuracy of simulation models discovered from execution logs both in the presence of multitasking and availability constraints.
    Funding: European Research Council PIX 834141; Ministerio de Ciencia, Innovación y Universidades OPHELIA RTI2018-101204-B-C22; Junta de Andalucía EKIPMENTPLUS (P18-FR-2895).
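    The multitasking adjustment can be sketched as follows: split each resource's timeline at task start/end points and credit every active task 1/k of each segment in which k of that resource's tasks overlap, so the adjusted durations sum to the resource's real busy time. The function below is a minimal illustration of this idea under that assumed reading, not the authors' implementation; task tuples and time units are toy values.

```python
from collections import defaultdict

def adjust_durations(tasks):
    """tasks: list of (task_id, start, end) for one resource.
    Returns adjusted durations such that replaying the tasks
    sequentially reproduces the concurrent workload."""
    points = sorted({p for _, s, e in tasks for p in (s, e)})
    adjusted = defaultdict(float)
    for seg_start, seg_end in zip(points, points[1:]):
        # Tasks active anywhere inside this elementary segment.
        active = [tid for tid, s, e in tasks if s < seg_end and e > seg_start]
        if active:
            share = (seg_end - seg_start) / len(active)
            for tid in active:
                adjusted[tid] += share
    return dict(adjusted)

# Two tasks fully overlapping for 4 time units each get 2 units apiece.
print(adjust_durations([("A", 0, 4), ("B", 0, 4), ("C", 4, 6)]))
# {'A': 2.0, 'B': 2.0, 'C': 2.0}
```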

    Shopping Assistant App For People With Visual Impairment: An Acceptance Evaluation

    Visual impairment refers to the loss of part or all of a person's ability to see. People with visual impairment face many limitations, including on the freedom to do grocery shopping independently. They have difficulty reading ingredient or dietary information, which is usually printed on products in small font. This information is important for making informed purchase decisions. Therefore, this research was conducted to investigate the need for a grocery shopping assistant app for people with visual impairment and their level of acceptance of it. An empirical investigation method was adopted, and data were collected based on the Technology Acceptance Model (TAM). The evaluation results indicate that people with visual impairment are positively inclined towards using the shopping assistant app because the technology is easy to use and they can therefore obtain benefit from it, suggesting that Perceived Ease of Use is a better indicator of the attitude towards using the shopping assistant app.

    Distributed RDF query processing and reasoning for big data / linked data

    Thesis (M.S.), School of Computing and Engineering, University of Missouri--Kansas City, 2014. Advisor: Yugyung Lee. Includes bibliographical references (pages 61-65).
    The Linked Data movement aims to convert the unstructured and semi-structured data in documents into semantically connected documents called the "web of data." It is based on the Resource Description Framework (RDF), which represents semantic data; a collection of RDF statements forms an RDF graph. SPARQL is a query language designed specifically to query RDF data. Linked Data faces the same challenge that Big Data does, and together they lead the way to a new paradigm that identifies massive amounts of data in connected form. Indeed, Linked Data and Big Data continue to be in high demand, so a scalable and accessible query system is needed to keep existing web data reusable and available. However, existing SPARQL query systems are not sufficiently scalable for Big Data and Linked Data. In this thesis, we address how to improve the scalability and performance of query processing over Big Data / Linked Data. Our aim is to evaluate and assess presently available SPARQL query engines and to develop an effective model for querying RDF data that is scalable and has reasoning capabilities. We designed an efficient, distributed SPARQL engine using MapReduce (parallel and distributed processing for large data sets on a cluster) and the Apache Cassandra database (a scalable and highly available peer-to-peer distributed database system). We evaluated the existing in-memory ARQ engine provided by the Jena framework and found that it cannot handle large datasets, as it relies solely on the system's in-memory capacity. The proposed model was shown to have powerful reasoning capabilities and to deal efficiently with big datasets.
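    As a concrete illustration of the workload such an engine must serve, the sketch below runs a small SPARQL query with rdflib over an in-memory graph. rdflib stands in only to keep the example self-contained and runnable; the thesis's engine would execute queries like this over MapReduce and Cassandra instead, and the namespace and data are invented for the example.

```python
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/")
g = Graph()
# A tiny RDF graph: two people, one "knows" edge, one literal name.
g.add((EX.alice, RDF.type, EX.Person))
g.add((EX.alice, EX.knows, EX.bob))
g.add((EX.bob, RDF.type, EX.Person))
g.add((EX.bob, EX.name, Literal("Bob")))

query = """
PREFIX ex: <http://example.org/>
SELECT ?p ?friend WHERE {
    ?p a ex:Person ;
       ex:knows ?friend .
}
"""
for row in g.query(query):
    print(row.p, row.friend)
```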