
    Taint and Information Flow Analysis Using Sweet.js Macros

    JavaScript has been the primary language for application development in browsers, and with the advent of JIT compilers it is becoming increasingly popular for server-side development as well. However, JavaScript suffers from vulnerabilities such as cross-site scripting and malicious advertisement code on the client side, and SQL injection on the server side. In this paper, we present a dynamic approach to efficiently track information flow and detect taint, to aid in the mitigation and prevention of such attacks, using JavaScript-based hygienic macros. We use Sweet.js and object proxies to override built-in JavaScript operators in order to track information flow and detect tainted values. We also demonstrate taint detection and information flow analysis using our technique in a REST service running on Node.js. Finally, we present cross-browser compatibility and performance metrics of our solution using the popular SunSpider benchmark on Safari, Chrome, and Firefox, and suggest some performance improvement techniques.
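    The core mechanism, overriding built-in operators so that taint propagates through every derived value, can be sketched outside Sweet.js as well. The following minimal Python sketch illustrates the general idea only, not the paper's macro-based JavaScript implementation: a Tainted wrapper propagates its mark through string concatenation, and a hypothetical sink refuses tainted values.

```python
class Tainted(str):
    """A string subclass that carries a taint mark and propagates it
    through concatenation, the role the paper's operator-overriding
    macros and object proxies play for JavaScript."""

    def __add__(self, other):
        return Tainted(str.__add__(self, str(other)))

    def __radd__(self, other):
        return Tainted(str(other) + str(self))


def sink(query):
    """A hypothetical security-sensitive sink (e.g. an SQL executor)
    that refuses tainted input instead of executing it."""
    if isinstance(query, Tainted):
        raise ValueError("tainted value reached a sensitive sink")
    print("executing:", query)


user_input = Tainted("'; DROP TABLE users; --")           # untrusted source
query = "SELECT * FROM users WHERE name = " + user_input  # taint propagates
try:
    sink(query)
except ValueError as err:
    print("blocked:", err)
```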

    Cost-Based Optimization of Integration Flows

    Integration flows are increasingly used to specify and execute data-intensive integration tasks between heterogeneous systems and applications. There are many application areas, such as real-time ETL and data synchronization between operational systems. Due to the increasing amount of data, highly distributed IT infrastructures, and high requirements for data consistency and the up-to-dateness of query results, many instances of integration flows are executed over time. Under this high load, and with blocking synchronous source systems, the performance of the central integration platform is crucial for the IT infrastructure as a whole. To meet these high performance requirements, we introduce the concept of cost-based optimization of imperative integration flows, which relies on incremental statistics maintenance and inter-instance plan re-optimization. As a foundation, we introduce periodical re-optimization, including novel cost-based optimization techniques that are tailor-made for integration flows. Furthermore, we refine periodical re-optimization into on-demand re-optimization in order to overcome the problems of many unnecessary re-optimization steps and of adaptation delays, during which optimization opportunities are missed. This approach ensures low optimization overhead and fast workload adaptation.
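    As a rough illustration of the on-demand re-optimization idea, the sketch below re-optimizes a flow's plan only when incrementally maintained execution statistics drift past a threshold from the estimates the current plan was chosen under. All names here (OnDemandReoptimizer, the optimizer callback, the 0.5 drift threshold) are assumptions for illustration, not the thesis's actual components.

```python
import statistics

class OnDemandReoptimizer:
    """Trigger plan re-optimization only when incrementally maintained
    statistics drift from the estimates the current plan was chosen
    under (illustrative sketch, not the thesis's implementation)."""

    def __init__(self, flow, optimizer, threshold=0.5):
        self.flow = flow
        self.optimizer = optimizer   # (flow, stats) -> (plan, estimated_cost)
        self.threshold = threshold   # tolerated relative cost drift
        self.samples = []            # incremental execution-time statistics
        self.plan, self.estimated_cost = optimizer(flow, [])

    def record(self, execution_time):
        """Called after each flow instance; re-optimizes on demand."""
        self.samples.append(execution_time)
        observed = statistics.mean(self.samples)
        if abs(observed - self.estimated_cost) / self.estimated_cost > self.threshold:
            self.plan, self.estimated_cost = self.optimizer(self.flow, self.samples)
            self.samples.clear()     # statistics restart under the new plan
        return self.plan

# Toy usage: an 'optimizer' that simply averages the observed statistics.
def toy_optimizer(flow, stats):
    cost = statistics.mean(stats) if stats else 10.0
    return (f"plan({flow}, cost={cost:.1f})", cost)

reopt = OnDemandReoptimizer("load-orders-flow", toy_optimizer)
for t in [11, 12, 30, 32, 31]:       # the workload shifts after two runs
    print(reopt.record(t))
```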

    From Business Understanding to Deployment: An application of Machine Learning Algorithms to Forecast Customer Visits per Hour to a Fast-Casual Restaurant in Dublin

    This research project identifies the significant factors that affect the number of customer visits to a fast-casual restaurant each hour and develops several machine learning models to forecast those visits. The core value proposition of fast-casual restaurants is quality food delivered at speed, which means meals must be prepared in advance of customers' visits. The problem with this approach lies in forecasting future demand: underestimating demand can lead to inadequate meal preparation, leaving customers unsatisfied, while overestimating demand can lead to wastage, especially since restaurants must comply with food safety regulations whereby heated food not consumed within 90 minutes has to be discarded. Hourly forecasting of demand, as opposed to monthly or even daily forecasting, is important to help the manager of the fast-casual restaurant optimize resources and reduce wastage. Approaches to forecasting demand can be broadly categorized into qualitative and quantitative methods, and quantitative methods can be further divided into time series and regression-based methods. The regression-based approach used in this study enabled the researcher to gather data on several factors hypothesized to affect the number of hourly customer visits, to carry out an experiment testing the significance of these factors, and to develop several predictive machine learning models capable of predicting the number of customer visits every hour. The results of the experiments show that hour, day, public holidays, temperature, humidity, rain, and windspeed are significant factors in predicting the number of hourly customer visits. Multiple linear regression, regression tree, random forest, and gradient boosting algorithms were trained to predict the number of customer visits, with the gradient boosting algorithm achieving the lowest Mean Absolute Percentage Error (MAPE) of 18.82%.
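    A minimal sketch of the regression-based pipeline described above, using scikit-learn's gradient boosting regressor and the MAPE metric the study reports. The file name hourly_visits.csv and the exact feature encodings are placeholders, not the project's actual dataset.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split

# Placeholder dataset: one row per hour with the factors the study
# found significant, plus the observed number of customer visits.
df = pd.read_csv("hourly_visits.csv")  # hypothetical file
features = ["hour", "day", "public_holiday", "temperature",
            "humidity", "rain", "windspeed"]

# Keep chronological order when splitting time-indexed data.
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["visits"], test_size=0.2, shuffle=False)

model = GradientBoostingRegressor(random_state=42)
model.fit(X_train, y_train)

# MAPE is the metric the study reports (18.82% for its best model).
mape = mean_absolute_percentage_error(y_test, model.predict(X_test))
print(f"MAPE: {mape:.2%}")
```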

    An Automated Negotiation System for eCommerce Store Owners to Enable Flexible Product Pricing

    If a store owner wishes to sell a product online, they traditionally have two options for deciding on a price. They can sell the product at a fixed price, as on sites like Amazon, or they can put the product in an auction and let customer demand drive the final sales price, as on sites like eBay. Both options have their pros and cons. An alternative way to decide on a final sales price is to enable negotiation on the product. This makes the price dynamic: each customer can negotiate with the store owner, allowing the final sales price to change both over time and on a customer-by-customer basis. The issue with enabling negotiation in the context of eCommerce is the time investment required from the store owner: a store owner cannot negotiate every time an offer comes in from a potential customer, as the time investment would not be acceptable. Using software agents to automate the seller's side of the negotiation is a potential solution for enabling negotiation in eCommerce for store owners. In this research, such a system is developed in a way that mirrors real-life negotiations more closely and, after evaluation, is found to be a viable means of enabling negotiation in eCommerce.
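    One standard way to automate the seller's side, shown below as a generic sketch rather than the system developed in this research, is a time-dependent concession strategy: the agent starts near its target price and concedes toward its reserve price as the deadline approaches, accepting any buyer offer that meets its current threshold.

```python
def seller_threshold(target, reserve, round_no, deadline, beta=2.0):
    """Time-dependent concession: start at the target price and
    concede toward the reserve price as the deadline nears.
    beta > 1 concedes slowly at first (a 'boulware'-style tactic)."""
    t = min(round_no / deadline, 1.0)
    return reserve + (target - reserve) * (1 - t ** beta)

def negotiate(buyer_offers, target, reserve, deadline):
    """Accept the first buyer offer that meets the current threshold,
    otherwise counter-offer with that threshold."""
    for round_no, offer in enumerate(buyer_offers, start=1):
        threshold = seller_threshold(target, reserve, round_no, deadline)
        if offer >= threshold:
            return ("accept", offer)
        if round_no >= deadline:
            return ("reject", None)
        print(f"round {round_no}: buyer offered {offer:.2f}, "
              f"countering {threshold:.2f}")
    return ("reject", None)

# A buyer who raises their offer each round; the agent accepts in round 4.
print(negotiate([60, 70, 80, 90], target=100, reserve=75, deadline=5))
```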

    Architecture and implementation of online communities

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references. By Philip Greenspun.

    Merging Queries in OLTP Workloads

    OLTP applications are usually executed by a high number of clients in parallel and typically face high throughput demands as well as strict latency requirements for individual statements. In enterprise scenarios, they often have to deal with overload spikes resulting from events such as Cyber Monday or Black Friday. The traditional way to avoid running out of resources, and thus to cope with such spikes, is significant over-provisioning of the underlying infrastructure. In this thesis, we analyze real enterprise OLTP workloads with respect to statement types, complexity, and hot-spot statements. Interestingly, our findings reveal that workloads are often read-heavy and comprise similar query patterns, which provides the potential to share work among statements belonging to different transactions. In the past, resource sharing has been extensively studied for OLAP workloads. Naturally, the question arises why such studies mainly focus on OLAP rather than OLTP workloads. At first sight, OLTP queries often consist of simple calculations, such as index look-ups, with little sharing potential; due to their short execution times, such queries may not offer enough sharing potential to justify the additional overhead. In addition, OLTP workloads do not only execute read operations but also updates, so sharing work needs to obey transactional semantics, such as the given isolation level and read-your-own-writes. This thesis presents THE LEVIATHAN, a novel batching scheme for OLTP workloads that merges read statements within interactively submitted multi-statement transactions consisting of reads and updates. Our main idea is to merge the execution of statements by merging their plans, making it possible to merge not only complex calculations but also simple ones, such as the aforementioned index look-up. We identify mergeable statements by pattern matching of prepared statement plans, which comes with low overhead. To obey the isolation-level properties and provide read-your-own-writes, we first define a formal framework for merging transactions running under a given isolation level and then provide insights into a prototypical implementation of merging within a commercial database system. Our experimental evaluation shows that, depending on the isolation level, the load in the system, and the read share of the workload, an improvement of transaction throughput by up to a factor of 2.5x is possible without compromising transactional semantics.
    Another interesting effect we show is that, with our strategy, we can increase the throughput of a real enterprise workload by 20%.

    Outline:
    1 INTRODUCTION
      1.1 Summary of Contributions
      1.2 Outline
    2 WORKLOAD ANALYSIS
      2.1 Analyzing OLTP Benchmarks
        2.1.1 YCSB
        2.1.2 TATP
        2.1.3 TPC Benchmark Scenarios
        2.1.4 Summary
      2.2 Analyzing OLTP Workloads from Open Source Projects
        2.2.1 Characteristics of Workloads
        2.2.2 Summary
      2.3 Analyzing Enterprise OLTP Workloads
        2.3.1 Overview of Reports about OLTP Workload Characteristics
        2.3.2 Analysis of SAP Hybris Workload
        2.3.3 Summary
      2.4 Conclusion
    3 RELATED WORK ON QUERY MERGING
      3.1 Merging the Execution of Operators
      3.2 Merging the Execution of Subplans
      3.3 Merging the Results of Subplans
      3.4 Merging the Execution of Full Plans
      3.5 Miscellaneous Works on Merging
      3.6 Discussion
    4 MERGING STATEMENTS IN MULTI STATEMENT TRANSACTIONS
      4.1 Overview of Our Approach
        4.1.1 Examples
        4.1.2 Why Naïve Merging Fails
      4.2 THE LEVIATHAN Approach
      4.3 Formalizing THE LEVIATHAN Approach
        4.3.1 Transaction Theory
        4.3.2 Merging Under MVCC
      4.4 Merging Reads Under Different Isolation Levels
        4.4.1 Read Uncommitted
        4.4.2 Read Committed
        4.4.3 Repeatable Read
        4.4.4 Snapshot Isolation
        4.4.5 Serializable
        4.4.6 Discussion
      4.5 Merging Writes Under Different Isolation Levels
        4.5.1 Read Uncommitted
        4.5.2 Read Committed
        4.5.3 Snapshot Isolation
        4.5.4 Serializable
        4.5.5 Handling Dependencies
        4.5.6 Discussion
    5 SYSTEM MODEL
      5.1 Definition of the Term “Overload”
      5.2 Basic Queuing Model
        5.2.1 Option (1): Replacement with a Merger Thread
        5.2.2 Option (2): Adding Merger Thread
        5.2.3 Using Multiple Merger Threads
        5.2.4 Evaluation
      5.3 Extended Queue Model
        5.3.1 Option (1): Replacement with a Merger Thread
        5.3.2 Option (2): Adding Merger Thread
        5.3.3 Evaluation
    6 IMPLEMENTATION
      6.1 Background: SAP HANA
      6.2 System Design
        6.2.1 Read Committed
        6.2.2 Snapshot Isolation
      6.3 Merger Component
        6.3.1 Overview
        6.3.2 Dequeuing
        6.3.3 Merging
        6.3.4 Sending
        6.3.5 Updating MTx State
      6.4 Challenges in the Implementation of Merging Writes
        6.4.1 SQL String Implementation
        6.4.2 Update Count
        6.4.3 Error Propagation
        6.4.4 Abort and Rollback
    7 EVALUATION
      7.1 Benchmark Settings
      7.2 System Settings
        7.2.1 Experiment I: End-to-end Response Time Within a SAP Hybris System
        7.2.2 Experiment II: Dequeuing Strategy
        7.2.3 Experiment III: Merging Improvement on Different Statement, Transaction and Workload Types
        7.2.4 Experiment IV: End-to-End Latency in YCSB
        7.2.5 Experiment V: Breakdown of Execution in YCSB
        7.2.6 Discussion of System Settings
      7.3 Merging in Interactive Transactions
        7.3.1 Experiment VI: Merging TATP in Read Uncommitted
        7.3.2 Experiment VII: Merging TATP in Read Committed
        7.3.3 Experiment VIII: Merging TATP in Snapshot Isolation
      7.4 Merging Queries in Stored Procedures
        Experiment IX: Merging TATP Stored Procedures in Read Committed
      7.5 Merging SAP Hybris
        7.5.1 Experiment X: CPU-time Breakdown on HANA Components
        7.5.2 Experiment XI: Merging Media Query in SAP Hybris
        7.5.3 Discussion of our Results in Comparison with Related Work
    8 CONCLUSION
      8.1 Summary
      8.2 Future Research Directions
    REFERENCES
    A UML CLASS DIAGRAM
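    The simplest instance of the merging idea, point look-ups from different transactions that share the same prepared-statement plan, can be sketched as rewriting a queue of single-key reads into one IN-list query plus a routing map that fans results back out to the requesting transactions. This is an illustrative sketch of the concept, not THE LEVIATHAN's plan-level implementation inside SAP HANA.

```python
from collections import defaultdict

def merge_point_lookups(pending):
    """Merge queued point look-ups that share a prepared plan (same
    table and key column) into one IN-list statement, with a routing
    map that fans each result row back to the requesting transactions.

    pending: list of (txn_id, table, key_column, key_value)"""
    groups = defaultdict(list)
    for txn_id, table, column, value in pending:
        groups[(table, column)].append((txn_id, value))

    merged = []
    for (table, column), requests in groups.items():
        keys = sorted({value for _, value in requests})
        placeholders = ", ".join("?" for _ in keys)
        sql = f"SELECT * FROM {table} WHERE {column} IN ({placeholders})"
        routing = defaultdict(list)
        for txn_id, value in requests:
            routing[value].append(txn_id)  # which transactions want this key
        merged.append((sql, keys, dict(routing)))
    return merged

# Three reads from different transactions collapse into one statement.
batch = [(1, "users", "id", 7), (2, "users", "id", 9), (3, "users", "id", 7)]
for sql, keys, routing in merge_point_lookups(batch):
    print(sql, keys, routing)
```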

    Pattern Masking for Dictionary Matching: Theory and Practice

    Data masking is a common technique for sanitizing sensitive data maintained in database systems, and it is becoming increasingly important in various application areas, such as record linkage of personal data. This work formalizes the Pattern Masking for Dictionary Matching (PMDM) problem: given a dictionary D of d strings, each of length ℓ, a query string q of length ℓ, and a positive integer z, we are asked to compute a smallest set K ⊆ {1, 
, ℓ} such that if q[i] is replaced by a wildcard for all i ∈ K, then q matches at least z strings from D. Solving PMDM allows providing data utility guarantees, as opposed to existing approaches. We first show, through a reduction from the well-known k-Clique problem, that a decision version of the PMDM problem is NP-complete, even for binary strings. We thus approach the problem from a more practical perspective. We show a combinatorial O((dℓ)^{|K|/3} + dℓ)-time and O(dℓ)-space algorithm for PMDM for |K| = O(1). In fact, we show that we cannot hope for a faster combinatorial algorithm unless the combinatorial k-Clique hypothesis fails (Abboud et al. in SIAM J Comput 47:2527–2555, 2018; Lincoln et al., in: 29th ACM-SIAM Symposium on Discrete Algorithms (SODA), 2018). Our combinatorial algorithm, executed with small |K|, is the backbone of a greedy heuristic that we propose. Our experiments on real-world and synthetic datasets show that our heuristic finds near-optimal solutions in practice and is also very efficient. We also generalize this algorithm to the problem of masking multiple query strings simultaneously so that every string has at least z matches in D. PMDM can be viewed as a generalization of the decision version of the dictionary matching with mismatches problem: by querying a PMDM data structure with string q and z = 1, one obtains the minimal number of mismatches of q with any string from D. The query time or space of all known data structures for the more restricted problem of dictionary matching with at most k mismatches incurs some exponential factor with respect to k. A simple exact algorithm for PMDM runs in O(2^ℓ d) time. We present a data structure for PMDM that answers queries over D in O(2^{ℓ/2} (2^{ℓ/2} + τ) ℓ) time and requires O(2^ℓ d²/τ² + 2^{ℓ/2} d) space, for any parameter τ ∈ [1, d]. We complement our results by showing a two-way polynomial-time reduction between PMDM and the Minimum Union problem [Chlamtáč et al., ACM-SIAM Symposium on Discrete Algorithms (SODA) 2017]. This gives a polynomial-time O(d^{1/4+ε})-approximation algorithm for PMDM, which is tight under a plausible complexity conjecture. This is an extended version of a paper presented at the International Symposium on Algorithms and Computation (ISAAC) 2021.
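    The simple exact algorithm mentioned in the abstract, running in O(2^ℓ d) time, can be stated directly: enumerate candidate position sets K in order of increasing size and return the first one whose wildcards make q match at least z dictionary strings. The sketch below implements that brute-force baseline, not the paper's faster combinatorial algorithm or its greedy heuristic.

```python
from itertools import combinations

def pmdm_exact(dictionary, q, z):
    """Smallest set K of positions in q to replace with wildcards so
    that q matches at least z dictionary strings. Brute force over
    all 2^l position sets, smallest first: O(2^l * d) up to
    polynomial factors. Positions are 0-indexed here."""
    positions = range(len(q))
    for size in range(len(q) + 1):
        for K in combinations(positions, size):
            masked = set(K)
            matches = sum(
                all(q[i] == s[i] for i in positions if i not in masked)
                for s in dictionary
            )
            if matches >= z:
                return set(K)
    return None  # unreachable when z <= len(dictionary)

D = ["abab", "abba", "acba"]
print(pmdm_exact(D, "abaa", 2))  # {1, 2}: 'a**a' matches abba and acba
```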
