
    Probabilistic SynSet Based Concept Location

    Concept location is a common task in program comprehension, essential in many approaches to software maintenance and software evolution. An important goal of this process is to discover a mapping between source code and human-oriented concepts. Although programs are written in a strict and formal language, natural language terms and sentences, such as identifiers (variable or function names), constant strings, or comments, can still be found embedded in programs. Using concepts from terminology and natural language processing techniques, these terms can be exploited to discover clues about which real-world concepts the source code is addressing. This work extends symbol tables built by compilers with ontology-driven constructs, and extends synonym sets as defined in linguistics with Probabilistic SynSets created automatically from software-domain parallel corpora. Using a relational algebra, it then creates semantic bridges between program elements and human-oriented concepts to enhance concept location tasks.
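
    The abstract gives no code, so the following is only a minimal sketch of the core idea: split an identifier into terms and score candidate real-world concepts by probability-weighted synonym matches. All synsets, probabilities, and concept vocabularies below are invented for illustration, not the paper's actual Probabilistic SynSets.

        # Illustrative sketch only: synsets, probabilities, and concepts are
        # made up, not the paper's actual Probabilistic SynSets.
        import re

        # A probabilistic synset maps a term to weighted synonyms.
        PROB_SYNSETS = {
            "acct": {"account": 0.9, "accounting": 0.1},
            "bal":  {"balance": 0.8, "ball": 0.2},
        }

        # Human-oriented concepts described by their characteristic terms.
        CONCEPTS = {
            "banking": {"account", "balance", "deposit"},
            "sports":  {"ball", "team", "score"},
        }

        def split_identifier(ident):
            """Split camelCase / snake_case identifiers into lowercase terms."""
            parts = re.sub(r"([a-z])([A-Z])", r"\1 \2", ident).replace("_", " ")
            return parts.lower().split()

        def score_concepts(ident):
            """Score each concept by summed synonym probabilities of the identifier's terms."""
            scores = {c: 0.0 for c in CONCEPTS}
            for term in split_identifier(ident):
                for synonym, p in PROB_SYNSETS.get(term, {term: 1.0}).items():
                    for concept, vocab in CONCEPTS.items():
                        if synonym in vocab:
                            scores[concept] += p
            return scores

        print(score_concepts("acct_bal"))  # banking dominates: ~1.7 vs ~0.2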

    REmail - Integrating e-mail Communication in the Eclipse IDE

    Developers of software systems have to communicate about the project they are building. Especially when working in a distributed development team, such as on open source projects, developers must use asynchronous means of communication. Studies tell us that e-mails are by far the most used means of communication during distributed development, as opposed to instant messaging, commit comments, or code comments. We can therefore imagine that archives containing development e-mails enclose essential information concerning various source code entities. Unfortunately, such information gets lost with time, since relevant e-mails are hard to retrieve. We have developed REmail, an Eclipse plug-in that integrates e-mail communication into the IDE. It allows developers to seamlessly handle source code entities and the e-mails discussing them, without ever exiting the IDE. Using lightweight linking techniques, REmail retrieves all the e-mails relevant to the chosen source code entities and makes them available to the developer.
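
    The abstract does not detail REmail's linking techniques; one plausible lightweight approach is whole-word matching of entity names in e-mail text. The sketch below illustrates that idea only, with invented e-mails and entity names.

        # Minimal sketch of lightweight e-mail-to-entity linking; REmail's
        # real techniques are not described in the abstract. Data is made up.
        import re

        emails = [
            {"subject": "Bug in parser", "body": "JsonParser.parse() chokes on nested arrays."},
            {"subject": "Release notes", "body": "Shipping v2.1 next week."},
        ]

        def emails_for_entity(entity, emails):
            """Return e-mails whose subject or body mentions the entity as a whole word."""
            pattern = re.compile(r"\b%s\b" % re.escape(entity))
            return [m for m in emails
                    if pattern.search(m["subject"]) or pattern.search(m["body"])]

        for m in emails_for_entity("JsonParser", emails):
            print(m["subject"])  # -> Bug in parser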

    On construction, performance, and diversification for structured queries on the semantic desktop

    [no abstract]

    ETEASH - An Enhanced Tiny Encryption Algorithm for Secured Smart Home

    The proliferation of the "Internet of Things" (IoT) and its applications has affected every aspect of human endeavor, from smart manufacturing, agriculture, healthcare, and transportation to homes. The smart home is vulnerable to malicious attacks due to memory constraints that inhibit the use of traditional anti-malware and antivirus software and make the application of traditional cryptography for its security impractical. This work aimed at securing smart home devices by developing an enhanced Tiny Encryption Algorithm (TEA). The enhancement removes TEA's vulnerability to related-key attacks and its weakness of predictable keys through entropy shifting, stretching, and mixing techniques, making it usable for securing smart devices. The Enhanced Tiny Encryption Algorithm for Smart Home devices (ETEASH) was benchmarked against the original TEA using the runs test and the avalanche effect. ETEASH passed the runs test at a significance level of 0.05 for the null hypothesis, and it achieved an avalanche effect of 58.44% against 52.50% for TEA. These results show that ETEASH is more secure than standard TEA for securing smart home devices.
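
    The abstract does not specify ETEASH's modified round function, so the sketch below implements only the publicly documented baseline TEA cipher together with the avalanche-effect measurement used for benchmarking; the key and plaintext values are arbitrary.

        # Standard TEA block cipher (64-bit block, 128-bit key) plus an
        # avalanche-effect measurement. The ETEASH enhancements themselves are
        # not given in the abstract and are not reproduced here.
        MASK = 0xFFFFFFFF
        DELTA = 0x9E3779B9

        def tea_encrypt(block, key):
            """Encrypt a 64-bit block (v0, v1) with a 128-bit key (k0..k3) over 32 cycles."""
            v0, v1 = block
            k0, k1, k2, k3 = key
            s = 0
            for _ in range(32):
                s = (s + DELTA) & MASK
                v0 = (v0 + ((((v1 << 4) + k0) & MASK) ^ ((v1 + s) & MASK) ^ (((v1 >> 5) + k1) & MASK))) & MASK
                v1 = (v1 + ((((v0 << 4) + k2) & MASK) ^ ((v0 + s) & MASK) ^ (((v0 >> 5) + k3) & MASK))) & MASK
            return v0, v1

        def avalanche(block, key, bit):
            """Fraction of ciphertext bits that flip when one plaintext bit is flipped."""
            c0, c1 = tea_encrypt(block, key)
            if bit < 32:
                flipped = (block[0] ^ (1 << bit), block[1])
            else:
                flipped = (block[0], block[1] ^ (1 << (bit - 32)))
            d0, d1 = tea_encrypt(flipped, key)
            diff = ((c0 ^ d0) << 32) | (c1 ^ d1)
            return bin(diff).count("1") / 64.0

        key = (0x11111111, 0x22222222, 0x33333333, 0x44444444)
        print(avalanche((0xDEADBEEF, 0x01234567), key, bit=0))  # ideally close to 0.5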

    Discovering Loners and Phantoms in Commit and Issue Data

    The interlinking of commit and issue data has become a de facto standard in software development. Modern issue tracking systems, such as JIRA, automatically interlink commits and issues by extracting identifiers (e.g., issue keys) from commit messages. However, the conventions governing interlinking vary between software projects. For example, some projects enforce the use of identifiers for every commit while others have less restrictive conventions. In this work, we introduce a model called PaLiMod to enable the analysis of interlinking characteristics in commit and issue data. We surveyed 15 Apache projects to investigate differences and commonalities between linked and non-linked commits and issues. Based on the gathered information, we created a set of heuristics to interlink the residual of non-linked commits and issues. We present the characteristics of Loners and Phantoms in commit and issue data. The results of our evaluation indicate that the proposed PaLiMod model and heuristics enable automatic interlinking and can indeed reduce the residual of non-linked commits and issues in software projects.
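
    The identifier-extraction step the abstract mentions typically amounts to matching JIRA-style issue keys in commit messages. The sketch below shows only that baseline step, not the PaLiMod heuristics; all commit data is invented.

        # Minimal sketch of extracting JIRA-style issue keys from commit
        # messages to separate linked from non-linked (residual) commits.
        import re

        ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")  # e.g. HADOOP-1234

        commits = [
            {"sha": "a1b2c3", "message": "HADOOP-1234: fix NPE in scheduler"},
            {"sha": "d4e5f6", "message": "refactor config loading"},  # non-linked
        ]

        def link_commits(commits):
            """Partition commits into linked (with issue keys) and the residual."""
            linked, residual = [], []
            for c in commits:
                keys = ISSUE_KEY.findall(c["message"])
                (linked if keys else residual).append((c["sha"], keys))
            return linked, residual

        linked, residual = link_commits(commits)
        print(linked)    # [('a1b2c3', ['HADOOP-1234'])]
        print(residual)  # [('d4e5f6', [])]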

    SoK: Prudent Evaluation Practices for Fuzzing

    Fuzzing has proven to be a highly effective approach to uncovering software bugs over the past decade. After AFL popularized the groundbreaking concept of lightweight coverage feedback, the field of fuzzing has seen a vast amount of scientific work proposing new techniques, improving methodological aspects of existing strategies, or porting existing methods to new domains. All such work must demonstrate its merit by showing its applicability to a problem, measuring its performance, and often showing its superiority over existing works in a thorough, empirical evaluation. Yet fuzzing is highly sensitive to its target, environment, and circumstances, e.g., randomness in the testing process. After all, relying on randomness is one of the core principles of fuzzing, governing many aspects of a fuzzer's behavior. Combined with an environment that is often highly difficult to control, the reproducibility of experiments is a crucial concern and requires a prudent evaluation setup. To address these threats to validity, several works, most notably "Evaluating Fuzz Testing" by Klees et al., have outlined how a carefully designed evaluation setup should be implemented, but it remains unknown to what extent their recommendations have been adopted in practice. In this work, we systematically analyze the evaluation of 150 fuzzing papers published at top venues between 2018 and 2023. We study how existing guidelines are implemented and observe potential shortcomings and pitfalls. We find a surprising disregard of the existing guidelines regarding statistical tests and systematic errors in fuzzing evaluations. For example, when investigating reported bugs, we find that the search for vulnerabilities in real-world software leads to authors requesting and receiving CVEs of questionable quality. Extending our literature analysis to the practical domain, we attempt to reproduce the claims of eight fuzzing papers. These case studies allow us to assess the practical reproducibility of fuzzing research and identify archetypal pitfalls in evaluation design. Unfortunately, our reproduced results reveal several deficiencies in the studied papers, and we are unable to fully support and reproduce the respective claims. To help the field of fuzzing move toward a scientifically reproducible evaluation strategy, we propose updated guidelines for conducting a fuzzing evaluation that future work should follow.
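
    Among the guidelines from Klees et al. that this paper checks for is the use of statistical tests over repeated trials, typically a Mann-Whitney U test plus the Vargha-Delaney A12 effect size. A minimal sketch of such a comparison follows; the coverage numbers are invented.

        # Minimal sketch of a statistically sound fuzzer comparison:
        # multiple independent trials, Mann-Whitney U test, and the
        # Vargha-Delaney A12 effect size. Trial values below are invented.
        from scipy.stats import mannwhitneyu

        # Branch coverage after 24h, one value per independent trial.
        fuzzer_a = [1510, 1498, 1523, 1507, 1534, 1512, 1501, 1529, 1516, 1520]
        fuzzer_b = [1482, 1495, 1479, 1503, 1488, 1491, 1476, 1500, 1485, 1493]

        def a12(x, y):
            """Vargha-Delaney A12: probability that a random x beats a random y."""
            greater = sum(1 for xi in x for yi in y if xi > yi)
            equal = sum(1 for xi in x for yi in y if xi == yi)
            return (greater + 0.5 * equal) / (len(x) * len(y))

        stat, p = mannwhitneyu(fuzzer_a, fuzzer_b, alternative="two-sided")
        print(f"p-value = {p:.4f}, A12 = {a12(fuzzer_a, fuzzer_b):.2f}")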

    Unsupervised Green Object Tracker (GOT) without Offline Pre-training

    Supervised trackers trained on labeled data dominate the single object tracking field for superior tracking accuracy. The labeling cost and the huge computational complexity hinder their applications on edge devices. Unsupervised learning methods have also been investigated to reduce the labeling cost but their complexity remains high. Aiming at lightweight high-performance tracking, feasibility without offline pre-training, and algorithmic transparency, we propose a new single object tracking method, called the green object tracker (GOT), in this work. GOT conducts an ensemble of three prediction branches for robust box tracking: 1) a global object-based correlator to predict the object location roughly, 2) a local patch-based correlator to build temporal correlations of small spatial units, and 3) a superpixel-based segmentator to exploit the spatial information of the target frame. GOT offers competitive tracking accuracy with state-of-the-art unsupervised trackers, which demand heavy offline pre-training, at a lower computation cost. GOT has a tiny model size (<3k parameters) and low inference complexity (around 58M FLOPs per frame). Since its inference complexity is between 0.1%-10% of DL trackers, it can be easily deployed on mobile and edge devices
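
    The abstract names the three branches but not how their outputs are combined. As a purely illustrative sketch, one simple fusion scheme is a confidence-weighted average of the predicted boxes; the boxes and confidences below are invented, and GOT's actual fusion logic may differ entirely.

        # Purely illustrative fusion of three tracking-branch box predictions
        # by confidence-weighted averaging; not GOT's actual fusion logic.
        # Boxes are (x, y, w, h); confidences are invented.
        def fuse_boxes(predictions):
            """Confidence-weighted average of (box, confidence) predictions."""
            total = sum(conf for _, conf in predictions)
            return tuple(
                sum(box[i] * conf for box, conf in predictions) / total
                for i in range(4)
            )

        branches = [
            ((100, 50, 40, 60), 0.9),  # global object-based correlator
            ((104, 52, 38, 58), 0.7),  # local patch-based correlator
            ((98, 49, 42, 62), 0.5),   # superpixel-based segmentation branch
        ]
        print(fuse_boxes(branches))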

    Improving Software Project Health Using Machine Learning

    In recent years, systems that would previously live on different platforms have been integrated under a single umbrella. The increased use of GitHub, which offers pull requests, issue tracking, and version history, and its integration with other solutions such as Gerrit or Travis, as well as the response from competitors, created development environments that favour agile methodologies by increasingly automating non-coding tasks: automated build systems, automated issue triaging, etc. In essence, source-code hosting platforms shifted to continuous integration/continuous delivery (CI/CD) as a service. This facilitated a shift in development paradigms; adherents of agile methodology can now adopt a CI/CD infrastructure more easily. This has also created large, publicly accessible sources of source code together with related project artefacts: GHTorrent and similar datasets now offer programmatic access to the whole of GitHub. Project health encompasses traceability, documentation, and adherence to coding conventions: tasks that reduce maintenance costs and increase accountability, but may not directly impact features. Overfocus on health can slow velocity (new feature delivery), so the Agile Manifesto suggests developers should travel light, forgoing tasks focused on project health in favour of higher feature velocity. Obviously, injudiciously following this suggestion can undermine a project's chances for success. Simultaneously, this shift to CI/CD has allowed the proliferation of Natural Language, or Natural Language and Formal Language, textual artefacts that are programmatically accessible: GitHub and its competitors allow API access to their infrastructure to enable the creation of CI/CD bots. This suggests that approaches from Natural Language Processing and Machine Learning are now feasible and indeed desirable. This thesis aims to (semi-)automate tasks for this new paradigm and its attendant infrastructure by bringing to the foreground the relevant NLP and ML techniques. Under this umbrella, I focus on three synergistic tasks from this domain: (1) improving issue-pull-request traceability, which can aid existing systems to automatically curate the issue backlog as pull requests are merged; (2) untangling commits in a version history, which can aid the aforementioned traceability task as well as improve the usability of determining a fault-introducing commit, or cherry-picking via tools such as git bisect; (3) mixed-text parsing, to allow better API mining and open new avenues for project-specific code-recommendation tools.
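
    For the first of the three tasks, issue-pull-request traceability, a common textual baseline is TF-IDF cosine similarity between issue and pull-request descriptions. The sketch below shows that baseline only, with invented texts; the thesis's actual ML models are more sophisticated.

        # Minimal baseline for issue-to-pull-request traceability via TF-IDF
        # cosine similarity. All issue and PR texts below are invented.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        issues = ["Crash when parsing empty config file",
                  "Add dark mode to settings page"]
        pull_requests = ["Fix parser crash on empty configuration input",
                         "Implement dark theme toggle in settings"]

        vectorizer = TfidfVectorizer()
        matrix = vectorizer.fit_transform(issues + pull_requests)
        issue_vecs, pr_vecs = matrix[:len(issues)], matrix[len(issues):]

        # For each issue, rank pull requests by textual similarity.
        similarity = cosine_similarity(issue_vecs, pr_vecs)
        for i, issue in enumerate(issues):
            best = similarity[i].argmax()
            print(f"{issue!r} -> PR {best}: {similarity[i][best]:.2f}")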

    Biologically Inspired Intrusion Prevention and Self-Healing System for Critical Services Network

    With the explosive development of critical services network systems and the Internet, the need for network security systems has become even more critical with the expansion of information technology in everyday life. An Intrusion Prevention System (IPS) provides an in-line mechanism focused on identifying and blocking malicious network activity in real time. This thesis presents a new intrusion prevention and self-healing (SH) system for critical services network security. The design features of the proposed system are inspired by the human immune system, integrated with a nonlinear pattern recognition classification algorithm and machine learning. Firstly, current intrusion prevention systems, biological innate and adaptive immune systems, autonomic computing, and self-healing mechanisms are studied and analyzed. The importance of intrusion prevention suggests that artificial immune systems (AIS) should incorporate abstraction models from the innate and adaptive immune systems, pattern recognition, machine learning, and self-healing mechanisms to provide an autonomous IPS with fast, highly accurate detection and prevention performance and survivability for critical services network systems. Secondly, a specification language, system design, and mathematical and computational models for the IPS and SH system are established, based upon nonlinear classification, prevention predictability trust, analysis, self-adaptation, and self-healing algorithms. Finally, the system is validated through simulation tests, measurement, benchmarking, and comparative studies. New benchmarking metrics for detection capability, prevention predictability trust, and self-healing reliability are introduced as contributions to the measurement and validation of the IPS and SH system. The software system, design theories, AIS features, the new nonlinear classification algorithm, and the self-healing system show how the presented systems can ensure safety for critical services networks and heal the damage caused by intrusions. This autonomous system improves on the performance of current intrusion prevention systems and maintains system continuity through its self-healing mechanism.
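
    The abstract does not give the thesis's concrete algorithms; as an illustration of the immune-system inspiration alone, the classic negative-selection scheme from the AIS literature is sketched below. All patterns, sizes, and thresholds are invented.

        # Classic negative-selection sketch from artificial immune systems:
        # detectors are generated so they do NOT match normal ("self") traffic
        # patterns, then anything a detector matches is flagged as non-self.
        # Illustrates the AIS inspiration only, not the thesis's algorithms.
        import random

        random.seed(42)
        L, THRESHOLD, N_DETECTORS = 12, 9, 50

        def random_pattern():
            return tuple(random.randint(0, 1) for _ in range(L))

        def matches(detector, pattern):
            """Match if detector and pattern agree on at least THRESHOLD bits."""
            return sum(d == p for d, p in zip(detector, pattern)) >= THRESHOLD

        self_set = [random_pattern() for _ in range(20)]  # normal traffic profiles

        # Negative selection: keep only detectors that match no self pattern.
        detectors = []
        while len(detectors) < N_DETECTORS:
            d = random_pattern()
            if not any(matches(d, s) for s in self_set):
                detectors.append(d)

        def is_intrusion(pattern):
            return any(matches(d, pattern) for d in detectors)

        print(is_intrusion(random_pattern()))  # flags patterns unlike the self set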