129 research outputs found

    SQL query log analysis for identifying user interests and query recommendations

    Get PDF
    In the sciences and elsewhere, the use of relational databases has become ubiquitous. To get maximum profit from a database, one should have in-depth knowledge in both SQL and a domain (data structure and meaning that a database contains). To assist inexperienced users in formulating their needs, SQL query recommendation system (SQL QRS) has been proposed. It utilizes the experience of previous users captured by SQL query log as well as the user query history to suggest. When constructing such a system, one should solve related problems: (1) clean the query log and (2) define appropriate query similarity functions. These two tasks are not only necessary for building SQL QRS, but they apply to other problems. In what follows, we describe three scenarios of SQL query log analysis: (1) cleaning an SQL query log, (2) SQL query log clustering when testing SQL query similarity functions and (3) recommending SQL queries. We also explain how these three branches are related to each other. Scenario 1. Cleaning SQL query log as a general pre-processing step The raw query log is often not suitable for query log analysis tasks such as clustering, giving recommendations. That is because it contains antipatterns and robotic data downloads, also known as Sliding Window Search (SWS). An antipattern in software engineering is a special case of a pattern. While a pattern is a standard solution, an antipattern is a pattern with a negative effect. When it comes to SQL query recommendation, leaving such artifacts in the log during analysis results in a wrong suggestion. Firstly, the behaviour of "mortal" users who need a recommendation is different from robots, which perform SWS. Secondly, one does not want to recommend antipatterns, so they need to be excluded from the query pool. Thirdly, the bigger a log is, the slower a recommendation engine operates. Thus, excluding SWS and antipatterns from the input data makes the recommendation better and faster. The effect of SWS and antipatterns on query log clustering depends on the chosen similarity function. The result can either (1) do not change or (2) add clusters which cover a big part of data. In any case, having antipatterns and SWS in an input log increases only the time one need to cluster and do not increase the quality of results. Scenario 2. Identifying User Interests via Clustering To identify the hot spots of user interests, one clusters SQL queries. In a scientific domain, it exposes research trends. In business, it points to popular data slices which one might want to refactor for better accessibility. A good clustering result must be precise (match ground truth) and interpretable. Query similarity relies on SQL query representation. There are three strategies to represent an SQL query. FB (feature-based) query representation sees a query as structure, not considering the data, a query accesses. WB (witness-based) approach treat a query as a set of tuples in the result set. AAB (access area-based) representation considers a query as an expression in relational algebra. While WB and FB query similarity functions are straightforward (Jaccard or cosine similarities), AAB query similarity requires additional definition. We proposed two variants of AAB similarity measure – overlap (AABovl) and closeness (AABcl). In AABovl, the similarity of two queries is the overlap of their access areas. AABcl relies on the distance between two access areas in the data space – two queries may be similar even if their access areas do not overlap. The extensive experiments consist of two parts. The first one is clustering a rather small dataset with ground truth. This experiment serves to study the precision of various similarity functions by comparing clustering results to supervised insights. The second experiment aims to investigate on the interpretability of clustering results with different similarity functions. It clusters a big real-world query log. The domain expert then evaluates the results. Both experiments show that AAB similarity functions produce better results in both precision and interpretability. Scenario 3. SQL Query Recommendation A sound SQL query recommendation system (1) provides a query which can be run directly, (2) supports comparison operators and various logical operators, (3) is scalable and has low response times, (4) provides recommendations of high quality. The existing approaches fail to fulfill all the requirements. We proposed DASQR, scalable and data-aware query recommendation to meet all four needs. In a nutshell, DASQR is a hybrid (collaborative filtering + content-based) approach. Its variations utilize all similarity functions, which we define or find in the related work. Measuring the quality of SQL query recommendation system (QRS) is particularly challenging since there is no standard way approaching it. Previous studies have evaluated the results using quality metrics which only rely on the query representations used in these studies. It is somewhat subjective since a similarity function and a quality metric are dependent. We propose AAB quality metrics and then evaluate each approach based on all the metrics. The experiments test DASQR approaches and competitors. Both performance and runtime experiments indicate that DASQR approaches outperform the existing ones

    Experiences on Managing Technical Debt with Code Smells and AntiPatterns

    Get PDF
    Technical debt has become a common metaphor for the accumulation of software design and implementation choices that seek fast initial gains but that are under par and counterproductive in the long run. However, as a metaphor, technical debt does not offer actionable advice on how to get rid of it. To get to a practical level in solving problems, more focused mechanisms are needed. Commonly used approaches for this include identifying code smells as quick indications of possible problems in the codebase and detecting the presence of AntiPatterns that refer to overt, recurring problems in design. There are known remedies for both code smells and AntiPatterns. In paper, our goal is to show how to effectively use common tools and the existing body of knowledge on code smells and AntiPatterns to detect technical debt and pay it back. We present two main results: (i) How a combination of static code analysis and manual inspection was used to detect code smells in a codebase leading to the discovery of AntiPatterns; and (ii) How AntiPatterns were used to identify, characterize, and fix problems in the software. The experiences stem from a private company and its long-lasting software product development effort.Peer reviewe

    Detailed Overview of Software Smells

    Get PDF
    This document provides an overview of literature concerning software smells covering various dimensions of smells along with their corresponding references

    Refactoring of Security Antipatterns in Distributed Java Components

    Get PDF
    The importance of JAVA as a programming and execution environment has grown steadily over the past decade. Furthermore, the IT industry has adapted JAVA as a major building block for the creation of new middleware as well as a technology facilitating the migration of existing applications towards web-driven environments. Parallel in time, the role of security in distributed environments has gained attention, as a large amount of middleware applications has replaced enterprise-level mainframe systems. The protection of confidentiality, integrity and availability are therefore critical for the market success of a product. The vulnerability level of every product is determined by the weakest embedded component, and selling vulnerable products can cause enormous economic damage to software vendors. An important goal of this work is to create the awareness that the usage of a programming language, which is designed as being secure, is not sufficient to create secure and trustworthy distributed applications. Moreover, the incorporation of the threat model of the programming language improves the risk analysis by allowing a better definition of the attack surface of the application. The evolution of a programming language leads towards common patterns for solutions for recurring quality aspects. Suboptimal solutions, also known as ´antipatterns´, are typical causes for quality weaknesses such as security vulnerabilities. Moreover, the exposure to a specific environment is an important parameter for threat analysis, as code considered secure in a specific scenario can cause unexpected risks when switching the environment. Antipatterns are a well-established means on the abstractional level of system modeling to inform about the effects of incomplete solutions, which are also important in the later stages of the software development process. Especially on the implementation level, we see a deficit of helpful examples, that would give programmers a better and holistic understanding. In our basic assumption, we link the missing experience of programmers regarding the security properties of patterns within their code to the creation of software vulnerabilities. Traditional software development models focus on security properties only on the meta layer. To transfer these efficiently to the practical level, we provide a three-stage approach: First, we focus on typical security problems within JAVA applications, and develop a standardized catalogue of ´antipatterns´ with examples from standard software products. Detecting and avoiding these antipatterns positively influences software quality. We therefore focus, as second element of our methodology, on possible enhancements to common models for the software development process. These help to control and identify the occurrence of antipatterns during development activities, i. e. during the coding phase and during the phase of component assembly, integrating one´s own and third party code. Within the third part, and emphasizing the practical focus of this research, we implement prototypical tools for support of the software development phase. The practical findings of this research helped to enhance the security of the standard JAVA platforms and JEE frameworks. We verified the relevance of our methods and tools by applying these to standard software products leading to a measurable reduction of vulnerabilities and an information exchange with middleware vendors (Sun Microsystems, JBoss) targeting runtime security. Our goal is to enable software architects and software developers developing end-user applications to apply our findings with embedded standard components on their environments. From a high-level perspective, software architects profit from this work through the projection of the quality-of-service goals to protection details. This supports their task of deriving security requirements when selecting standard components. In order to give implementation-near practitioners a helpful starting point to benefit from our research we provide tools and case-studies to achieve security improvements within their own code base.Die Bedeutung der Programmiersprache JAVA als Baustein für Softwareentwicklungs- und Produktionsinfrastrukturen ist im letzten Jahrzehnt stetig gestiegen. JAVA hat sich als bedeutender Baustein für die Programmierung von Middleware-Lösungen etabliert. Ebenfalls evident ist die Verwendung von JAVA-Technologien zur Migration von existierenden Arbeitsplatz-Anwendungen hin zu webbasierten Einsatzszenarien. Parallel zu dieser Entwicklung hat sich die Rolle der IT-Sicherheit nicht zuletzt aufgrund der Verdrängung von mainframe-basierten Systemen hin zu verteilten Umgebungen verstärkt. Der Schutz von Vertraulichkeit, Integrität und Verfügbarkeit ist seit einigen Jahren ein kritisches Alleinstellungsmerkmal für den Markterfolg von Produkten. Verwundbarkeiten in Produkten wirken mittlerweile indirekt über kundenseitigen Vertrauensverlust negativ auf den wirtschaftlichen Erfolg der Softwarehersteller, zumal der Sicherheitsgrad eines Systems durch die verwundbarste Komponente bestimmt wird. Ein zentrales Ziel dieser Arbeit ist die Erkenntnis zu vermitteln, dass die alleinige Nutzung einer als ´sicher´ eingestuften Programmiersprache nicht als alleinige Grundlage zur Erstellung von sicheren und vertrauenswürdigen Anwendungen ausreicht. Vielmehr führt die Einbeziehung des Bedrohungsmodells der Programmiersprache zu einer verbesserten Risikobetrachtung, da die Angriffsfläche einer Anwendung detaillierter beschreibbar wird. Die Entwicklung und fortschreitende Akzeptanz einer Programmiersprache führt zu einer Verbreitung von allgemein anerkannten Lösungsmustern zur Erfüllung wiederkehrender Qualitätsanforderungen. Im Bereich der Dienstqualitäten fördern ´Gegenmuster´, d.h. nichtoptimale Lösungen, die Entstehung von Strukturschwächen, welche in der Domäne der IT-Sicherheit ´Verwundbarkeiten´ genannt werden. Des Weiteren ist die Einsatzumgebung einer Anwendung eine wichtige Kenngröße, um eine Bedrohungsanalyse durchzuführen, denn je nach Beschaffenheit der Bedrohungen im Zielszenario kann eine bestimmte Benutzeraktion eine Bedrohung darstellen, aber auch einen erwarteten Anwendungsfall charakterisieren. Während auf der Modellierungsebene ein breites Angebot von Beispielen zur Umsetzung von Sicherheitsmustern besteht, fehlt es den Programmierern auf der Implementierungsebene häufig an ganzheitlichem Verständnis. Dieses kann durch Beispiele, welche die Auswirkungen der Verwendung von ´Gegenmustern´ illustrieren, vermittelt werden. Unsere Kernannahme besteht darin, dass fehlende Erfahrung der Programmierer bzgl. der Sicherheitsrelevanz bei der Wahl von Implementierungsmustern zur Entstehung von Verwundbarkeiten führt. Bei der Vermittlung herkömmlicher Software-Entwicklungsmodelle wird die Integration von praktischen Ansätzen zur Umsetzung von Sicherheitsanforderungen zumeist nur in Meta-Modellen adressiert. Zur Erweiterung des Wirkungsgrades auf die praktische Ebene wird ein dreistufiger Ansatz präsentiert. Im ersten Teil stellen wir typische Sicherheitsprobleme von JAVA-Anwendungen in den Mittelpunkt der Betrachtung, und entwickeln einen standardisierten Katalog dieser ´Gegenmuster´. Die Relevanz der einzelnen Muster wird durch die Untersuchung des Auftretens dieser in Standardprodukten verifiziert. Der zweite Untersuchungsbereich widmet sich der Integration von Vorgehensweisen zur Identifikation und Vermeidung der ´Sicherheits-Gegenmuster´ innerhalb des Software-Entwicklungsprozesses. Hierfür werden zum einen Ansätze für die Analyse und Verbesserung von Implementierungsergebnissen zur Verfügung gestellt. Zum anderen wird, induziert durch die verbreitete Nutzung von Fremdkomponenten, die arbeitsintensive Auslieferungsphase mit einem Ansatz zur Erstellung ganzheitlicher Sicherheitsrichtlinien versorgt. Da bei dieser Arbeit die praktische Verwendbarkeit der Ergebnisse eine zentrale Anforderung darstellt, wird diese durch prototypische Werkzeuge und nachvollziehbare Beispiele in einer dritten Perspektive unterstützt. Die Relevanz der Anwendung der entwickelten Methoden und Werkzeuge auf Standardprodukte zeigt sich durch die im Laufe der Forschungsarbeit entdeckten Sicherheitsdefizite. Die Rückmeldung bei führenden Middleware-Herstellern (Sun Microsystems, JBoss) hat durch gegenseitigen Erfahrungsaustausch im Laufe dieser Forschungsarbeit zu einer messbaren Verringerung der Verwundbarkeit ihrer Middleware-Produkte geführt. Neben den erreichten positiven Auswirkungen bei den Herstellern der Basiskomponenten sollen Erfahrungen auch an die Architekten und Entwickler von Endprodukten, welche Standardkomponenten direkt oder indirekt nutzen, weitergereicht werden. Um auch dem praktisch interessierten Leser einen möglichst einfachen Einstieg zu bieten, stehen die Werkzeuge mit Hilfe von Fallstudien in einem praktischen Gesamtzusammenhang. Die für das Tiefenverständnis notwendigen Theoriebestandteile bieten dem Software-Architekten die Möglichkeit sicherheitsrelevante Auswirkungen einer Komponentenauswahl frühzeitig zu erkennen und bei der Systemgestaltung zu nutzen

    Defining linguistic antipatterns towards the improvement of source code quality

    Get PDF
    Previous studies showed that linguistic aspect of source code is a valuable source of information that can help to improve program comprehension. The proposed research work focuses on supporting quality improvement of source code by identifying, specifying, and studying common negative practices (i.e., linguistic antipatterns) with respect to linguistic information. We expect the definition of linguistic antipatterns to increase the awareness of the existence of such bad practices and to discourage their use. We also propose to study the relation between negative practices in linguistic information (i.e., linguistic antipatterns) and negative practices in structural information (i.e., design antipatterns) with respect to comprehension effort and fault/change proneness. We discuss the proposed methodology and some preliminary results

    Software Evolution for Industrial Automation Systems. Literature Overview

    Get PDF
    corecore