The Role of Quasi-identifiers in k-Anonymity Revisited
The concept of k-anonymity, used in the recent literature to formally
evaluate the privacy preservation of published tables, was introduced based on
the notion of quasi-identifiers (or QI for short). The process of obtaining
k-anonymity for a given private table is first to recognize the QIs in the
table, and then to anonymize the QI values, the latter being called
k-anonymization. While k-anonymization is usually rigorously validated by the
authors, the definition of QI remains mostly informal, and different authors
seem to have different interpretations of the concept of QI. The purpose of
this paper is to provide a formal underpinning of QI and examine the
correctness and incorrectness of various interpretations of QI in our formal
framework. We observe that in cases where the concept has been used correctly,
its application has been conservative; this note provides a formal
understanding of the conservative nature in such cases. Comment: 17 pages. Submitted for publication
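As a rough illustration of the property this abstract discusses, the sketch below checks whether a table is k-anonymous with respect to a chosen set of quasi-identifiers, using the common reading that every combination of QI values must occur in at least k rows. The function name `is_k_anonymous`, the column names, and the sample rows are hypothetical and are not taken from the paper; the paper's own formal definition of QI may differ from this simple check.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every combination of QI values occurs in at least k rows.

    rows: list of dicts (one per record); quasi_identifiers: list of column names.
    This is the usual textbook check, not the paper's formal framework.
    """
    groups = Counter(tuple(row[attr] for attr in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

# Hypothetical, already-generalized table (values invented for illustration).
table = [
    {"zip": "130**", "age": "20-29", "disease": "flu"},
    {"zip": "130**", "age": "20-29", "disease": "cold"},
    {"zip": "148**", "age": "30-39", "disease": "flu"},
    {"zip": "148**", "age": "30-39", "disease": "asthma"},
]

print(is_k_anonymous(table, ["zip", "age"], 2))  # each QI group has 2 rows
print(is_k_anonymous(table, ["zip", "age"], 3))  # no QI group has 3 rows
```

Which columns are passed as `quasi_identifiers` determines the outcome, which is why an informal or inconsistent notion of QI matters in practice.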
Evaluation of Range Queries with Predicates on Moving Objects
A well-studied query type on moving objects is the continuous range query. An interesting and practical situation is that, instead of being continuously evaluated, the query may be evaluated at different degrees of continuity, e.g., every 2 seconds (close to continuous), every 10 minutes, or at irregular time intervals (close to snapshot). Furthermore, the range query may be stacked under predicates applied to the returned objects. An example is the count predicate, which requires the number of objects in the range to be at least γ. The conjecture is that these two practical considerations can help reduce communication costs. We propose a safe region-based solution that exploits these two practical considerations. An extensive experimental study shows that our solution can reduce communication costs by a factor of 9.5 compared to an existing state-of-the-art system.
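To make the count predicate concrete, here is a minimal sketch of a single snapshot evaluation: it tests whether at least γ objects currently lie inside a rectangular range. It is a naive baseline only and does not implement the safe region-based solution the abstract proposes; the function names, the rectangle encoding `(xmin, ymin, xmax, ymax)`, and the sample positions are assumptions made for illustration.

```python
def in_range(point, rect):
    """True if point (x, y) lies inside the axis-aligned rectangle rect."""
    x, y = point
    xmin, ymin, xmax, ymax = rect
    return xmin <= x <= xmax and ymin <= y <= ymax

def count_predicate(positions, rect, gamma):
    """True if at least gamma objects are currently inside rect.

    positions: dict mapping object id -> (x, y) at the evaluation time.
    """
    inside = sum(1 for p in positions.values() if in_range(p, rect))
    return inside >= gamma

# Hypothetical snapshot of three moving objects.
positions = {"a": (1.0, 1.0), "b": (2.0, 3.0), "c": (9.0, 9.0)}
query_rect = (0.0, 0.0, 5.0, 5.0)

print(count_predicate(positions, query_rect, gamma=2))  # two objects are inside
print(count_predicate(positions, query_rect, gamma=3))  # only two, so not satisfied
```

In this naive form every object must report its position at each evaluation, which is the communication cost the abstract aims to reduce.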
Metasql: A Generate-then-Rank Framework for Natural Language to SQL Translation
The Natural Language Interface to Databases (NLIDB) empowers non-technical
users with database access through intuitive natural language (NL)
interactions. Advanced approaches, utilizing neural sequence-to-sequence models
or large-scale language models, typically employ auto-regressive decoding to
generate unique SQL queries sequentially. While these translation models have
greatly improved the overall translation accuracy, surpassing 70% on NLIDB
benchmarks, the use of auto-regressive decoding to generate single SQL queries
may result in sub-optimal outputs, potentially leading to erroneous
translations. In this paper, we propose Metasql, a unified generate-then-rank
framework that can be flexibly incorporated with existing NLIDBs to
consistently improve their translation accuracy. Metasql introduces query
metadata to control the generation of better SQL query candidates and uses
learning-to-rank algorithms to retrieve globally optimized queries.
Specifically, Metasql first breaks down the meaning of the given NL query into
a set of possible query metadata, representing the basic concepts of the
semantics. These metadata are then used as language constraints to steer the
underlying translation model toward generating a set of candidate SQL queries.
Finally, Metasql ranks the candidates to identify the best matching one for the
given NL query. Extensive experiments are performed to study Metasql on two
public NLIDB benchmarks. The results show that the performance of the
translation models can be effectively improved using Metasql.
Nature-inspired three-dimensional surface serration topologies enable silent flight by suppressing airfoil-turbulence interaction noise
As natural predators, owls fly with astonishing stealth due to the
sophisticated serrated surface morphology of their feathers that produces
advantageous flow characteristics and favorable boundary layer structures.
Traditionally, these serrations are tailored for airfoil edges with simple
two-dimensional patterns, limiting their effect on overall noise reduction
while negotiating tradeoffs in aerodynamic performance. Here, we formulate new
design strategies that can mitigate tradeoffs between noise reduction and
aerodynamic performance by merging owl feather and cicada insect wing
geometries to create a three-dimensional topology that features silent and
efficient flight. Aeroacoustics and aerodynamics experimental results show that
the application of our hybrid topology yields a reduction in overall sound
pressure levels by up to 9.93% and an increase in propulsive efficiency by over
48.14% compared to benchmark designs. Computational fluid dynamics simulations
reveal that the three-dimensional, owl-inspired surface serrations can enhance
surface vorticity. The produced coherent vortex structures serve to suppress
the source strength of dipole and quadrupole pressure sources at various
Reynolds numbers, resulting in a universal noise reduction effect. Our work
demonstrates how a bioinspired three-dimensional serration topology refines the
turbulence-airfoil interaction mode and improves multiple functionalities of an
aerodynamic surface to enable quieter and more fuel-efficient aerial vehicles. Comment: 33 pages
Vetting undesirable behaviors in android apps with permission use analysis
The Android platform adopts permissions to protect sensitive resources from untrusted apps. However, after permissions are granted by users at install time, apps can use these permissions (sensitive resources) with no further restrictions. Thus, recent years have witnessed an explosion of undesirable behaviors in Android apps. An important part of the defense is the accurate analysis of Android apps. However, traditional syscall-based analysis techniques are not well suited for Android, because they cannot capture critical interactions between the application and the Android system. This paper presents VetDroid, a dynamic analysis platform for reconstructing sensitive behaviors in Android apps from a novel permission use perspective. VetDroid features a systematic framework to effectively construct permission use behaviors, i.e., how applications use permissions to access (sensitive) system resources, and how these acquired permission-sensitive resources are further utilized by the application. With permission use behaviors, security analysts can easily examine the internal sensitive behaviors of an app. Using real-world Android malware, we show that VetDroid can clearly reconstruct fine-grained malicious behaviors to ease malware analysis. We further apply VetDroid to 1,249 top free apps in Google Play. VetDroid can assist in finding more information leaks than TaintDroid [24], a state-of-the-art technique. In addition, we show how we can use VetDroid to analyze fine-grained causes of information leaks that TaintDroid cannot reveal. Finally, we show that VetDroid can help identify subtle vulnerabilities in some (top free) applications that are otherwise hard to detect.
Generating Market Basket Data with Temporal Information
This paper presents a synthetic data generator that outputs timestamped transactional data with embedded temporal patterns controlled by a set of input parameters. In particular, a calendar schema, which is determined by a hierarchy of input time granularities, is used as a framework of possible temporal patterns. An example of a calendar schema is (year, month, day), which provides a framework for calendar-based temporal patterns of the form $\langle d_3, d_2, d_1 \rangle$, where each $d_i$ is either an integer or the symbol $*$. For example, $\langle 2000, *, 16 \rangle$ is such a pattern, which corresponds to the time intervals consisting of all the 16th days of all months in year 2000. This paper also evaluates the data generator through a series of experiments. The synthetic data generator is intended to support the data mining community in evaluating various aspects (especially the temporal aspects and the scalability) of data mining algorithms.
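The calendar-based pattern in this abstract can be sketched as a simple matcher over the (year, month, day) schema. The sketch assumes the wildcard symbol is written `*` and reproduces the abstract's example of all 16th days of all months in year 2000; the names `WILDCARD` and `matches` are hypothetical, and this is not the paper's generator.

```python
from datetime import date, timedelta

WILDCARD = "*"  # assumed notation for "any value" in a pattern position

def matches(pattern, d):
    """True if date d fits a (year, month, day) pattern of integers or WILDCARD."""
    fields = (d.year, d.month, d.day)
    return all(p == WILDCARD or p == v for p, v in zip(pattern, fields))

# The abstract's example: every 16th day of every month in year 2000.
pattern = (2000, WILDCARD, 16)

print(matches(pattern, date(2000, 3, 16)))  # 16th of a month in 2000
print(matches(pattern, date(2000, 3, 17)))  # wrong day
print(matches(pattern, date(2001, 3, 16)))  # wrong year

# Enumerate all days in 2000 covered by the pattern.
day = date(2000, 1, 1)
covered = []
while day.year == 2000:
    if matches(pattern, day):
        covered.append(day)
    day += timedelta(days=1)
print(len(covered))  # one matching day per month
```

A generator along these lines could embed a pattern by boosting the frequency of chosen itemsets on the covered days, though how the paper does this is not stated in the abstract.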
PURPLE: Making a Large Language Model a Better SQL Writer
Large Language Model (LLM) techniques play an increasingly important role in
Natural Language to SQL (NL2SQL) translation. LLMs trained on extensive corpora
have strong natural language understanding and basic SQL generation abilities
without additional tuning specific to NL2SQL tasks. Existing LLM-based NL2SQL
approaches try to improve the translation by enhancing the LLMs with an
emphasis on user intention understanding. However, LLMs sometimes fail to
generate appropriate SQL due to their lack of knowledge in organizing complex
logical operator composition. A promising method is to provide the LLMs with
demonstrations, which include known NL2SQL translations from various databases.
LLMs can learn to organize operator compositions from the input demonstrations
for the given task. In this paper, we propose PURPLE (Pre-trained models
Utilized to Retrieve Prompts for Logical Enhancement), which improves accuracy
by retrieving demonstrations containing the requisite logical operator
composition for the NL2SQL task at hand, thereby guiding LLMs to produce better
SQL translation. PURPLE achieves a new state-of-the-art performance of 80.5%
exact-set match accuracy and 87.8% execution match accuracy on the validation
set of the popular NL2SQL benchmark Spider. PURPLE maintains high accuracy
across diverse benchmarks, budgetary constraints, and various LLMs, showing
robustness and cost-effectiveness. Comment: 12 pages, accepted by ICDE 2024 (40th IEEE International Conference
on Data Engineering)