Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation
Large language models (LLMs) have brought significant advancements to code
generation, benefiting both novice and experienced developers. However, their
training using unsanitized data from open-source repositories, like GitHub,
introduces the risk of inadvertently propagating security vulnerabilities. To
effectively mitigate this concern, this paper presents a comprehensive study
focused on evaluating and enhancing code LLMs from a software security
perspective. We introduce SecuCoGen (uploaded as supplemental material; it will
be made publicly available after publication), a meticulously curated dataset
targeting 21 critical vulnerability types.
SecuCoGen comprises 180 samples and serves as the foundation for conducting
experiments on three crucial code-related tasks: code generation, code repair
and vulnerability classification, with a strong emphasis on security. Our
experimental results reveal that existing models often overlook security
concerns during code generation, leading to the generation of vulnerable code.
To address this, we propose effective approaches to mitigate the security
vulnerabilities and enhance the overall robustness of code generated by LLMs.
Moreover, our study identifies weaknesses in existing models' ability to repair
vulnerable code, even when provided with vulnerability information.
Additionally, certain vulnerability types pose challenges for the models,
hindering their performance in vulnerability classification. Based on these
findings, we believe our study will have a positive impact on the software
engineering community, inspiring the development of improved methods for
training and utilizing LLMs, thereby leading to safer and more trustworthy
model deployment.
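The abstract does not enumerate the 21 vulnerability types SecuCoGen covers. As one illustrative case only, the snippet below contrasts a vulnerable query pattern (CWE-89, SQL injection) with a repaired, parameterized version: the kind of before/after pair a security-focused code-generation or code-repair task operates on. The table, data, and function names are hypothetical.

```python
import sqlite3

def find_user_unsafe(conn, name):
    # Vulnerable pattern (CWE-89): user input is concatenated directly
    # into the SQL string, so a crafted name alters the query syntax.
    return conn.execute(
        "SELECT id FROM users WHERE name = '%s'" % name).fetchall()

def find_user_safe(conn, name):
    # Repaired version: a parameterized query keeps the input as data,
    # the kind of fix a security-aware repair task should produce.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (name,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])

payload = "x' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # 2 -- every row leaks
print(len(find_user_safe(conn, payload)))    # 0 -- payload treated as data
```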
DBBRBF- Convalesce optimization for software defect prediction problem using hybrid distribution base balance instance selection and radial basis Function classifier
Software is becoming an integral part of human life with the rapid development
of software engineering, which demands that software be highly reliable.
Reliability can be checked by efficient software testing methods that use
historical software prediction data to develop a quality software system.
Machine learning plays a vital role in optimizing the prediction of
defect-prone modules in real-life software. However, software defect
prediction data suffers from a class imbalance problem, with a low ratio of
the defective class to the non-defective class, which degrades classification
performance and calls for an efficient machine learning classification
technique. To alleviate this problem, this paper introduces a novel hybrid
instance-based classifier (DBBRBF) that combines distribution base balance
based instance selection with a radial basis function neural network, to
obtain better predictions than existing research. Class-imbalanced data sets
from NASA, Promise and Softlab were used for the experimental analysis. The
experimental results in terms of Accuracy, F-measure, AUC, Recall, Precision,
and Balance show the effectiveness of the proposed approach. Finally,
statistical significance tests are carried out to assess the suitability of
the proposed model.
Comment: 32 pages, 24 tables, 8 figures
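The abstract does not give the DBB instance-selection or RBF training details. The sketch below pairs a simple majority-class undersampling step (a stand-in for distribution base balance instance selection, not the paper's algorithm) with a minimal Gaussian RBF network trained by least squares, just to show the overall shape of such a hybrid. All data and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def balance_by_undersampling(X, y):
    """Stand-in for distribution base balance instance selection:
    undersample the majority class to match the minority count."""
    pos = np.where(y == 1)[0]
    neg = np.where(y == 0)[0]
    if len(pos) < len(neg):
        neg = rng.choice(neg, size=len(pos), replace=False)
    else:
        pos = rng.choice(pos, size=len(neg), replace=False)
    idx = np.concatenate([pos, neg])
    return X[idx], y[idx]

def rbf_features(X, centers, gamma):
    """Gaussian RBF activation for every (sample, center) pair."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Toy imbalanced defect data: 180 non-defective, 20 defective modules.
X = np.vstack([rng.normal(0.0, 1.0, (180, 2)),
               rng.normal(3.0, 1.0, (20, 2))])
y = np.array([0] * 180 + [1] * 20)

Xb, yb = balance_by_undersampling(X, y)
centers = Xb[rng.choice(len(Xb), size=10, replace=False)]
Phi = rbf_features(Xb, centers, gamma=1.0)

# Output layer: least-squares weights, a common RBF-network training step.
w, *_ = np.linalg.lstsq(Phi, yb, rcond=None)
pred = (rbf_features(X, centers, 1.0) @ w > 0.5).astype(int)
recall = (pred[y == 1] == 1).mean()  # defective-class recall on the full set
```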
Understanding the relationship between Kano model’s customer satisfaction scores and self-stated requirements importance
Customer satisfaction is the result of product quality and viability. The perceived satisfaction of users/customers with a software product cannot be neglected, especially in today's competitive market environment, as it drives customer loyalty and promotes high profitability and return on investment. It is therefore worth understanding the importance of requirements as it relates to the satisfaction of users/customers when their requirements are met. Specifically, it is necessary to know the relationship between customer satisfaction when requirements are met (or dissatisfaction when they are unmet) and the importance of those requirements. Much work has been carried out on customer satisfaction in connection with requirements importance, but the relationship between the customer satisfaction scores (coefficients) of the Kano model and users'/customers' self-stated requirements importance has not been sufficiently explored.
In this study, an attempt is made to unravel the underlying relationship between the Kano model's customer satisfaction indexes and users'/customers' self-reported requirements importance. The results indicate some interesting associations between the considered variables. The bivariate associations reveal that the customer satisfaction index (SI) and the average satisfaction coefficient (ASC), and the customer dissatisfaction index (DI) and the average satisfaction coefficient (ASC), are highly correlated (r = 0.96), and thus ASC can be used in place of either SI or DI to represent customer satisfaction scores. Each of the Kano model's customer satisfaction variables (SI, DI, and ASC) is also associated with self-stated requirements importance (IMP). Further analysis indicates that the value customers or users place on requirements that are met, or on features that are incorporated into a product, influences the level of satisfaction they derive from the product. The worth of a product feature is indicated by the perceived satisfaction customers get from its inclusion in the product design and development. The satisfaction users/customers derive when a requirement is fulfilled or a feature is included in the product (SI or ASC) is strongly influenced by the value they place on that requirement/feature when it is met (IMP). However, the dissatisfaction users/customers feel when a requirement is not met or a feature is not incorporated into the product (DI), although related to self-stated requirements importance (IMP), does not have a strong effect on the perceived importance/worth (IMP) of that requirement/feature. Therefore, since customer satisfaction is proportionally related to perceived requirements importance (worth), adequate attention should be given to user/customer-satisfying requirements (features) from elicitation through design to final implementation. Incorporating user- or customer-satisfying requirements in product design is of great worth to the product's future users or customers.
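Berger-style Kano satisfaction coefficients are computed from the counts of respondents who classify a feature as Attractive (A), One-dimensional (O), Must-be (M), or Indifferent (I). A minimal sketch follows; the ASC averaging step is an assumption of this illustration, since the study's exact formula is not given in the abstract.

```python
def kano_indices(a, o, m, i):
    """Berger-style coefficients from Kano category counts:
    Attractive, One-dimensional, Must-be, Indifferent."""
    total = a + o + m + i
    si = (a + o) / total        # satisfaction index, in [0, 1]
    di = -(o + m) / total       # dissatisfaction index, in [-1, 0]
    asc = (si + abs(di)) / 2    # assumed averaging step, not from the paper
    return si, di, asc

# Hypothetical survey counts for one feature.
si, di, asc = kano_indices(a=40, o=30, m=20, i=10)
print(si, di, asc)  # 0.7 -0.5 0.6
```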
Evidence based medical query system on large scale data
Title from PDF of title page, viewed on July 30, 2014. Thesis advisor: Yugyung Lee. Includes vita and bibliographical references (pages 62-65). Thesis (M.S.)--School of Computing and Engineering, University of Missouri--Kansas City, 2014.
As huge amounts of data are created rapidly, the demand for the integration and analysis of such data has been growing steadily. It is especially essential to retrieve relevant and accurate evidence in healthcare and biomedical research. Although query systems based on ontologies, Medical Subject Headings (MeSH), or keyword searches are available, query systems based on evidence and effective retrieval from large collections of clinical data are not sufficiently available. This thesis proposes a novel approach to analyze big data sets collected from clinical trials research and discover significant evidence and association patterns with respect to conditions, treatments, and medication side effects. Our approach makes use of machine learning techniques in the Apache Hadoop framework with support from MetaMap and RxNorm. A heuristic measure of empirical evidence was newly designed, considering the association degree of conditions, treatments, and medication side effects and the percentage of people affected. The Apriori algorithm was used to discover strong positive association rules with various measures, including support and confidence. We examined a large and complex data set (12,327 study results) from clinicaltrials.gov and identified 8,291 strong association rules and 59,228 combinations involving 432,841 subjects, 1,761 conditions, 2,836 drugs, and 27 side effects. The significance of these association patterns was evaluated in terms of an impact factor representing the percentage of the population with a high rate of side effects. Using these association rules and combination strengths, an evidence-based query system was implemented to answer some integral questions. The query system also provides an interface to retrieve relevant publications from PubMed; its search outcomes are compared with those of a PubMed search based on Medical Subject Headings.
Contents: Abstract -- Illustrations -- Tables -- Introduction -- Related work -- Evidence based medical query model -- Implementation -- Results & evaluation -- Conclusion and future work -- References
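The Apriori step described above can be sketched as a support/confidence filter over co-occurring medical terms. The version below mines only single-antecedent rules from pair counts, a minimal illustration rather than the thesis's Hadoop pipeline, and the trial records are hypothetical.

```python
from itertools import combinations

def apriori_rules(transactions, min_support, min_confidence):
    """Mine one-antecedent association rules A -> B that pass
    support and confidence thresholds (minimal Apriori-style sketch)."""
    n = len(transactions)
    counts = {}
    # Count every single item and every item pair across transactions.
    for t in transactions:
        for r in (1, 2):
            for combo in combinations(sorted(t), r):
                counts[combo] = counts.get(combo, 0) + 1
    rules = []
    for combo, c in counts.items():
        if len(combo) != 2 or c / n < min_support:
            continue
        for a, b in ((combo[0], combo[1]), (combo[1], combo[0])):
            conf = c / counts[(a,)]       # P(B | A)
            if conf >= min_confidence:
                rules.append((a, b, c / n, conf))
    return rules

# Hypothetical records of condition / treatment / side-effect tokens.
data = [
    {"diabetes", "metformin"},
    {"diabetes", "metformin", "nausea"},
    {"diabetes", "insulin"},
    {"hypertension", "metformin"},
]
rules = apriori_rules(data, min_support=0.5, min_confidence=0.6)
# Both directions of the diabetes/metformin pair survive the thresholds.
```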
Discovering business process simulation models in the presence of multitasking and availability constraints
Business process simulation is a versatile technique for quantitative analysis of business
processes. A well-known limitation of process simulation is that the accuracy of the simulation
results is limited by the faithfulness of the process model and simulation parameters given as
input to the simulator. To tackle this limitation, various authors have proposed to discover
simulation models from process execution logs, so that the resulting simulation models more
closely match reality. However, existing techniques in this field make certain assumptions
about resource behavior that do not typically hold in practice, including: (i) that each resource
performs one task at a time; and (ii) that resources are continuously available (24/7). In reality,
resources may engage in multitasking behavior and they work only during certain periods
of the day or the week. This article proposes an approach to discover process simulation
models from execution logs in the presence of multitasking and availability constraints. To
account for multitasking, we adjust the processing times of tasks in such a way that executing
the multitasked tasks sequentially with the adjusted times is equivalent to executing them
concurrently with the original times. Meanwhile, to account for availability constraints, we
use an algorithm for discovering calendar expressions from collections of time-points to infer
resource timetables from an execution log. We then adjust the parameters of this algorithm
to maximize the similarity between the simulated log and the original one. We evaluate the
approach using real-life and synthetic datasets. The results show that the approach improves
the accuracy of simulation models discovered from execution logs both in the presence of
multitasking and availability constraints.
Funding: European Research Council PIX 834141; Ministerio de Ciencia, Innovación y Universidades OPHELIA RTI2018-101204-B-C22; Junta de Andalucía EKIPMENTPLUS (P18–FR–2895).
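The multitasking adjustment described in the abstract can be sketched directly: split each unit of wall-clock time evenly among the tasks a resource is running at that moment, so that the adjusted durations, executed sequentially, occupy exactly the resource's real busy time. This is a minimal sketch; the paper's exact adjustment may differ.

```python
def adjusted_durations(intervals):
    """Given one resource's task intervals (start, end), return adjusted
    durations that divide each overlapped time segment evenly among the
    tasks active in it."""
    # Sweep over the distinct time points where concurrency can change.
    points = sorted({t for s, e in intervals for t in (s, e)})
    adjusted = [0.0] * len(intervals)
    for a, b in zip(points, points[1:]):
        active = [i for i, (s, e) in enumerate(intervals)
                  if s <= a and b <= e]
        for i in active:
            adjusted[i] += (b - a) / len(active)
    return adjusted

# Two tasks overlap on [2, 6): each is credited half of the shared
# 4 time units, so both end up with an adjusted duration of 4.
print(adjusted_durations([(0, 6), (2, 8)]))  # [4.0, 4.0]
```

Note that the adjusted durations always sum to the resource's total busy time (here 8 units), which is what makes sequential replay with the adjusted times equivalent to the original concurrent execution.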
Shopping Assistant App For People With Visual Impairment: An Acceptance Evaluation
Visual impairment refers to the loss of part or all of the ability to see. People with visual impairment face many limitations, including on the freedom to do grocery shopping independently. They have difficulty reading ingredient or dietary information, which is usually printed in small font on products, yet this information is important for making informed purchase decisions. Therefore, this research investigates the need for a grocery shopping assistant app for people with visual impairment and their level of acceptance of such an app. An empirical investigation method was adopted and data were collected based on the Technology Acceptance Model (TAM). The evaluation results indicate that people with visual impairment are positively inclined towards using a shopping assistant app because the technology is easy to use and they can therefore benefit from it, suggesting that Perceived Ease of Use is a better indicator of attitude towards using the shopping assistant app.
Distributed RDF query processing and reasoning for big data / linked data
Title from PDF of title page, viewed on August 27, 2014. Thesis advisor: Yugyung Lee. Includes vita and bibliographical references (pages 61-65). Thesis (M.S.)--School of Computing and Engineering, University of Missouri--Kansas City, 2014.
The Linked Data movement aims to convert unstructured and semi-structured
data in documents into semantically connected documents called the "web of
data." It is based on the Resource Description Framework (RDF), which
represents semantic data; a collection of RDF statements forms an RDF graph.
SPARQL is a query language designed specifically to query RDF data. Linked
Data faces the same challenge that Big Data does: both paradigms involve
massive amounts of data in connected form, and demand for utilizing them
continues to grow. We therefore need a scalable and accessible query system
for the reusability and availability of existing web data. However, existing
SPARQL query systems are not sufficiently scalable for Big Data and Linked Data.
In this thesis, we address an issue of how to improve the scalability and performance
of query processing with Big Data / Linked Data. Our aim is to evaluate and assess presently
available SPARQL query engines and develop an effective model to query RDF data that
should be scalable with reasoning capabilities. We designed an efficient and distributed
SPARQL engine using MapReduce (parallel and distributed processing for large data sets on
a cluster) and the Apache Cassandra database (scalable and highly available peer to peer distributed database system). We evaluated an existing in-memory based ARQ engine
provided by Jena framework and found that it cannot handle large datasets, as it only works
based on the in-memory feature of the system. It was shown that the proposed
model had powerful reasoning capabilities and dealt efficiently with big datasets.
Contents: Abstract -- Illustrations -- Tables -- Introduction -- Background and related work -- Graph-store based SPARQL model -- Graph-store based SPARQL model implementation -- Results and evaluation -- Conclusion and future work -- References
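As background for what such an engine evaluates, a SPARQL basic graph pattern reduces to joining triple patterns over variable bindings. The in-memory sketch below shows only the matching semantics, nothing like the thesis's MapReduce/Cassandra implementation; the graph and pattern are hypothetical.

```python
def match_bgp(triples, patterns):
    """Match a SPARQL-like basic graph pattern against a list of
    (subject, predicate, object) triples. Terms starting with '?' are
    variables; each solution binds every variable consistently."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                b = dict(binding)  # extend a copy of the current binding
                ok = True
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        if b.setdefault(term, value) != value:
                            ok = False   # variable already bound elsewhere
                            break
                    elif term != value:
                        ok = False       # constant term does not match
                        break
                if ok:
                    next_solutions.append(b)
        solutions = next_solutions
    return solutions

# Tiny RDF graph.
g = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", "30"),
]
# "Whom does someone Alice knows, know?"
print(match_bgp(g, [("alice", "knows", "?x"), ("?x", "knows", "?y")]))
# [{'?x': 'bob', '?y': 'carol'}]
```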