524 research outputs found

SQLCheck: Automated Detection and Diagnosis of SQL Anti-Patterns

Author: Arulraj Joy
Dintyala Visweswara Sai Prashanth
Narechania Arpit
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/04/2020

The emergence of database-as-a-service platforms has made deploying database applications easier than before. Now, developers can quickly create scalable applications. However, designing performant, maintainable, and accurate applications is challenging. Developers may unknowingly introduce anti-patterns in the application's SQL statements. These anti-patterns are design decisions that are intended to solve a problem, but often lead to other problems by violating fundamental design principles. In this paper, we present SQLCheck, a holistic toolchain for automatically finding and fixing anti-patterns in database applications. We introduce techniques for automatically (1) detecting anti-patterns with high precision and recall, (2) ranking the anti-patterns based on their impact on performance, maintainability, and accuracy of applications, and (3) suggesting alternative queries and changes to the database design to fix these anti-patterns. We demonstrate the prevalence of these anti-patterns in a large collection of queries and databases collected from open-source repositories. We introduce an anti-pattern detection algorithm that augments query analysis with data analysis. We present a ranking model for characterizing the impact of frequently occurring anti-patterns. We discuss how SQLCheck suggests fixes for high-impact anti-patterns using rule-based query refactoring techniques. Our experiments demonstrate that SQLCheck enables developers to create more performant, maintainable, and accurate applications.
Comment: 18 pages (14 page paper, 1 page references, 2 page Appendix), 12 figures, Conference: SIGMOD'2

arXiv.org e-Print Archive

Crossref

Active caching for recommender systems

Author: Qasim Muhammad Umar
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2011

Web users are often overwhelmed by the amount of information available while carrying out browsing and searching tasks. Recommender systems substantially reduce the information overload by suggesting a list of similar documents that users might find interesting. However, generating these ranked lists requires an enormous amount of resources that often results in access latency. Caching frequently accessed data has been a useful technique for reducing stress on limited resources and improving response time. Traditional passive caching techniques, where the focus is on answering queries based on temporal locality or popularity, achieve a very limited performance gain. In this dissertation, we propose an ‘active caching’ technique for recommender systems as an extension of the caching model. In this approach, estimation is used to generate an answer for queries whose results are not explicitly cached, where the estimation makes use of the partial order lists cached for related queries. By answering non-cached queries along with cached queries, the active caching system acts as a form of query processor and offers substantial improvement over traditional caching methodologies. Test results for several data sets and recommendation techniques show substantial improvement in the cache hit rate, byte hit rate and CPU costs, while achieving reasonable recall rates. To improve the performance of the proposed active caching solution, a shared neighbor similarity measure is introduced which improves the recall rates by eliminating the dependence on monotonicity in the partial order lists. Finally, a greedy balancing cache selection policy is also proposed to select the most appropriate data objects for the cache, which helps to improve the cache hit rate and recall further.

Digital Commons @ New Jersey Institute of Technology (NJIT)

Controlled experiments on the web: survey and practical guide

Author: C Hopkins
Dan Sommerfield
DD Boos
H Manning
M Burns
OL Davies
Randal M. Henne
RL Plackett
Roger Longbotham
Ron Kohavi
Publication venue: 'Springer Science and Business Media LLC'

Crossref

On-the-fly recommendation retrieval from linked open data repositories

Author: Wenige Lisa
Publication date: 01/01/2018

Some recommender systems (RS) utilize Linked Open Data (LOD) to enhance the item descriptions in the local database. However, these systems do not yet take full advantage of the potential of RDF data for personalized retrieval. The work describes the strengths of LOD repositories as well as the challenges of RDF processing for recommendation tasks. Against the background of these characteristics, a recommendation engine, called SKOSRecommender (SKOSRec), was developed. The system utilizes SKOS annotations to determine similar items and provides a graph-based query language for on-the-fly retrieval from SPARQL endpoints. This enables novel retrieval approaches. For instance, the SKOSRec language facilitates the representation of individual user preferences as query-based statements. Hence, it is possible to generate a user profile with the help of a SPARQL-like request (preference querying). Additionally, the language enables subquerying with recommendation results and the usage of graph-based query patterns to formulate powerful filter conditions for result lists (expressive constraint-based queries). Besides, the language allows flexible combinations of graph- and search-based query patterns (i.e., advanced recommendation requests). Examples of such requests are rollup retrieval patterns or cross-domain queries. The novel approaches were evaluated in a series of offline and online experiments in different domains (travel RS, multimedia RS and scientific publication retrieval). The results show that most of the developed methods improve the quality of existing recommendation methods. Effects predominantly occurred in the performance dimensions of recall, novelty, and diversity. The positive evaluation results demonstrate the effectiveness of the new methods. Thus, the work can contribute to the advancement of personalized search techniques, which can be applied for semantic retrieval in LOD repositories as well as for typical recommendation tasks

Digitale Bibliothek Thüringen


Understanding Flaws in the Deployment and Implementation of Web Encryption

Author: Sivakorn Suphannee
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2018

In recent years, the web has switched from using the unencrypted HTTP protocol to using encrypted communications. Primarily, this resulted in increasing deployment of TLS to mitigate information leakage over the network. This development has led many web service operators to mistakenly think that migrating from HTTP to HTTPS will magically protect them from information leakage without any additional effort on their end to guarantee the desired security properties. In reality, despite the fact that there exists enough infrastructure in place and the protocols have been “tested” (by virtue of being in wide, but not ubiquitous, use for many years), deploying HTTPS is a highly challenging task due to the technical complexity of its underlying protocols (i.e., HTTP, TLS) as well as the complexity of the TLS certificate ecosystem and that of popular client applications such as web browsers. For example, we found that many websites still avoid ubiquitous encryption and force only critical functionality and sensitive data access over encrypted connections while allowing more innocuous functionality to be accessed over HTTP. In practice, this approach is prone to flaws that can expose sensitive information or functionality to third parties. Thus, it is crucial for developers to verify the correctness of their deployments and implementations. In this dissertation, in an effort to improve users’ privacy, we highlight semantic flaws in the implementations of both web servers and clients, caused by the improper deployment of web encryption protocols. First, we conduct an in-depth assessment of major websites and explore what functionality and information is exposed to attackers that have hijacked a user’s HTTP cookies. We identify a recurring pattern across websites with partially deployed HTTPS, namely, that service personalization inadvertently results in the exposure of private information.
The separation of functionality across multiple cookies with different scopes and inter-dependencies further complicates matters, as imprecise access control renders restricted account functionality accessible to non-secure cookies. Our cookie hijacking study reveals a number of severe flaws; for example, attackers can obtain the user’s saved address and visited websites from e.g., Google, Bing, and Yahoo allow attackers to extract the contact list and send emails from the user’s account. To estimate the extent of the threat, we run measurements on a university public wireless network for a period of 30 days and detect over 282K accounts exposing the cookies required for our hijacking attacks. Next, we explore and study security mechanisms purposed to eliminate this problem by enforcing encryption such as HSTS and HTTPS Everywhere. We evaluate each mechanism in terms of its adoption and effectiveness. We find that all mechanisms suffer from implementation flaws or deployment issues and argue that, as long as servers continue to not support ubiquitous encryption across their entire domain, no mechanism can effectively protect users from cookie hijacking and information leakage. Finally, as the security guarantees of TLS (in turn HTTPS), are critically dependent on the correct validation of X.509 server certificates, we study hostname verification, a critical component in the certificate validation process. We develop HVLearn, a novel testing framework to verify the correctness of hostname verification implementations and use HVLearn to analyze a number of popular TLS libraries and applications. To this end, we found 8 unique violations of the RFC specifications. Several of these violations are critical and can render the affected implementations vulnerable to man-in-the-middle attacks

Columbia University Academic Commons

Graph Processing in Main-Memory Column Stores

Author: Paradies Marcus
Publication date: 03/02/2017

Increasingly, novel and traditional business applications leverage the advantages of a graph data model, such as the offered schema flexibility and an explicit representation of relationships between entities. As a consequence, companies are confronted with the challenge of storing, manipulating, and querying terabytes of graph data for enterprise-critical applications. Although these business applications operate on graph-structured data, they still require direct access to the relational data and typically rely on an RDBMS to keep a single source of truth and access. Existing solutions performing graph operations on business-critical data either use a combination of SQL and application logic or employ a graph data management system. For the first approach, relying solely on SQL results in poor execution performance caused by the functional mismatch between typical graph operations and the relational algebra. Worse, graph algorithms expose a tremendous variety in structure and functionality caused by their often domain-specific implementations and therefore can hardly be integrated into a database management system other than with custom coding. Since the majority of these enterprise-critical applications exclusively run on relational DBMSs, employing a specialized system for storing and processing graph data is typically not sensible. Besides the maintenance overhead for keeping the systems in sync, combining graph and relational operations is hard to realize as it requires data transfer across system boundaries. Traversal operations are a basic ingredient of graph queries and algorithms and a fundamental component of any database management system that aims at storing, manipulating, and querying graph data. Well-established graph traversal algorithms are standalone implementations relying on optimized data structures.
The integration of graph traversals as an operator into a database management system requires a tight integration into the existing database environment and the development of new components, such as a graph topology-aware optimizer and accompanying graph statistics, graph-specific secondary index structures to speed up traversals, and an accompanying graph query language. In this thesis, we introduce and describe GRAPHITE, a hybrid graph-relational data management system. GRAPHITE is a performance-oriented graph data management system as part of an RDBMS that allows graph data and relational data to be processed seamlessly in the same system. We propose a columnar storage representation for graph data to leverage the already existing and mature data management and query processing infrastructure of relational database management systems. At the core of GRAPHITE we propose an execution engine solely based on set operations and graph traversals. Our design is driven by the observation that different graph topologies expose different algorithmic requirements to the design of a graph traversal operator. We derive two graph traversal implementations targeting the most common graph topologies and demonstrate how graph-specific statistics can be leveraged to select the optimal physical traversal operator. To accelerate graph traversals, we devise a set of graph-specific, updateable secondary index structures to improve the performance of vertex neighborhood expansion. Finally, we introduce a domain-specific language with an intuitive programming model to extend graph traversals with custom application logic at runtime. We use the LLVM compiler framework to generate efficient code that tightly integrates the user-specified application logic with our highly optimized built-in graph traversal operators.
Our experimental evaluation shows that GRAPHITE can outperform native graph management systems by several orders of magnitude while providing all the features of an RDBMS, such as transaction support, backup and recovery, security and user management, effectively providing a promising alternative to specialized graph management systems that lack many of these features and require expensive data replication and maintenance processes

Technische Universität Dresden: Qucosa

Online Product Search with Focus on Customers' Needs

Author: Custódio Igor
Siqueira Sean
Publication venue: AIS Electronic Library (AISeL)
Publication date: 11/08/2016

The success of e-commerce depends on the Information Systems that support it. Currently, the most used approach in e-commerce systems is faceted search, which requires customers to be familiar with the technical specification in order to find the products that best meet their needs. The aim of this research is to evaluate a novel proposal to improve the online product search. Our solution will automatically map products' features to less technical criteria, which will replace the filters in a faceted search. To achieve this goal, we have adopted a multi-criteria analysis method to rank the results. The proposed solution was evaluated through an empirical experiment with some product categories, using as data set the reviews of experts retrieved from the web. Results showed a strong rank correlation between our solution and the expert reviews, proving its feasibility and effectiveness.

AIS Electronic Library (AISeL)

Web page performance analysis

Author: Chiew Thiam Kian
Publication date: 01/01/2009

Computer systems play an increasingly crucial and ubiquitous role in human endeavour by carrying out or facilitating tasks and providing information and services. How much work these systems can accomplish, within a certain amount of time, using a certain amount of resources, characterises the systems’ performance, which is a major concern when the systems are planned, designed, implemented, deployed, and evolve. As one of the most popular computer systems, the Web is inevitably scrutinised in terms of performance analysis that deals with its speed, capacity, resource utilisation, and availability. Performance analyses for the Web are normally done from the perspective of the Web servers and the underlying network (the Internet). This research, on the other hand, approaches Web performance analysis from the perspective of Web pages. The performance metric of interest here is response time. Response time is studied as an attribute of Web pages, instead of being considered purely a result of network and server conditions. A framework that consists of measurement, modelling, and monitoring (3Ms) of Web pages that revolves around response time is adopted to support the performance analysis activity. The measurement module enables Web page response time to be measured and is used to support the modelling module, which in turn provides references for the monitoring module. The monitoring module estimates response time. The three modules are used in the software development lifecycle to ensure that developed Web pages deliver at worst satisfactory response time (within a maximum acceptable time), or preferably much better response time, thereby maximising the efficiency of the pages. The framework proposes a systematic way to understand response time as it is related to specific characteristics of Web pages and explains how individual Web page response time can be examined and improved

Glasgow Theses Service

OpenGrey Repository