735 research outputs found

The Role of Quasi-identifiers in k-Anonymity Revisited

Author: Bettini Claudio
Jajodia Sushil
Wang X. Sean
Publication date: 01/01/2006

The concept of k-anonymity, used in the recent literature to formally evaluate the privacy preservation of published tables, was introduced based on the notion of quasi-identifiers (or QI for short). The process of obtaining k-anonymity for a given private table is first to recognize the QIs in the table, and then to anonymize the QI values, the latter being called k-anonymization. While k-anonymization is usually rigorously validated by the authors, the definition of QI remains mostly informal, and different authors seem to have different interpretations of the concept of QI. The purpose of this paper is to provide a formal underpinning of QI and examine the correctness and incorrectness of various interpretations of QI in our formal framework. We observe that in cases where the concept has been used correctly, its application has been conservative; this note provides a formal understanding of the conservative nature in such cases. Comment: 17 pages. Submitted for publication.
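As a rough illustration of the k-anonymity condition this abstract refers to, the following minimal sketch checks whether every combination of quasi-identifier values occurs in at least k rows. The function name, the table contents, and the choice of QI columns are all hypothetical; which attributes count as quasi-identifiers is precisely what the paper treats as the open question, so here that set is simply supplied by the caller.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if each combination of QI values appears in at least k rows.

    `quasi_identifiers` is supplied by the caller (an assumption of this sketch).
    """
    groups = Counter(tuple(row[attr] for attr in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


# Hypothetical, already generalized table.
table = [
    {"zip": "220**", "age": "30-39", "disease": "flu"},
    {"zip": "220**", "age": "30-39", "disease": "asthma"},
    {"zip": "221**", "age": "40-49", "disease": "flu"},
]

print(is_k_anonymous(table, ["zip", "age"], 2))      # third row is alone in its group
print(is_k_anonymous(table[:2], ["zip", "age"], 2))  # both rows share one group
```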

arXiv.org e-Print Archive

AIR Universita degli studi di Milano

Evaluation of Range Queries with Predicates on Moving Objects

Author: Mitzi McCarthy
X Sean Wang
Zhen He
Publication date: 02/04/2020

A well-studied query type on moving objects is the continuous range query. An interesting and practical situation is that instead of being continuously evaluated, the query may be evaluated at different degrees of continuity, e.g. every 2 seconds (close to continuous), every 10 minutes or at irregular time intervals (close to snapshot). Furthermore, the range query may be stacked under predicates applied to the returned objects. An example is the count predicate that requires the number of objects in the range to be at least γ. The conjecture is that these two practical considerations can help reduce communication costs. We propose a safe region-based solution that exploits these two practical considerations. An extensive experimental study shows that our solution can reduce communication costs by a factor of 9.5 compared to an existing state-of-the-art system.
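To make the count predicate concrete, here is a minimal sketch of a naive snapshot evaluation: a rectangular range query whose result is tested against the threshold γ. This is not the safe region-based solution the abstract proposes; the function names, the rectangle encoding, and the sample positions are assumptions made for illustration only.

```python
def in_range(point, rect):
    """Check whether an (x, y) point lies inside rect = (xmin, ymin, xmax, ymax)."""
    x, y = point
    xmin, ymin, xmax, ymax = rect
    return xmin <= x <= xmax and ymin <= y <= ymax


def count_predicate(positions, rect, gamma):
    """Return True if at least `gamma` objects currently lie inside `rect`."""
    inside = sum(1 for p in positions.values() if in_range(p, rect))
    return inside >= gamma


# Hypothetical snapshot of object positions keyed by object id.
positions = {"a": (1.0, 1.0), "b": (2.5, 3.0), "c": (9.0, 9.0)}
query_rect = (0.0, 0.0, 5.0, 5.0)

print(count_predicate(positions, query_rect, 2))  # objects a and b are in range
print(count_predicate(positions, query_rect, 3))  # object c is outside the range
```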

CiteSeerX

Metasql: A Generate-then-Rank Framework for Natural Language to SQL Translation

Author: Fan Yuankai
He Zhenying
Huang Can
Jing Yinan
Ren Tonghui
Wang X. Sean
Zhang Kai
Publication date: 26/02/2024

The Natural Language Interface to Databases (NLIDB) empowers non-technical users with database access through intuitive natural language (NL) interactions. Advanced approaches, utilizing neural sequence-to-sequence models or large-scale language models, typically employ auto-regressive decoding to generate unique SQL queries sequentially. While these translation models have greatly improved the overall translation accuracy, surpassing 70% on NLIDB benchmarks, the use of auto-regressive decoding to generate single SQL queries may result in sub-optimal outputs, potentially leading to erroneous translations. In this paper, we propose Metasql, a unified generate-then-rank framework that can be flexibly incorporated with existing NLIDBs to consistently improve their translation accuracy. Metasql introduces query metadata to control the generation of better SQL query candidates and uses learning-to-rank algorithms to retrieve globally optimized queries. Specifically, Metasql first breaks down the meaning of the given NL query into a set of possible query metadata, representing the basic concepts of the semantics. These metadata are then used as language constraints to steer the underlying translation model toward generating a set of candidate SQL queries. Finally, Metasql ranks the candidates to identify the best matching one for the given NL query. Extensive experiments are performed to study Metasql on two public NLIDB benchmarks. The results show that the performance of the translation models can be effectively improved using Metasql.

arXiv.org e-Print Archive

Nature-inspired three-dimensional surface serration topologies enable silent flight by suppressing airfoil-turbulence interaction noise

Author: Chennuri Naga
Demir Kahraman
Farris Sean
Gu Grace X.
Horii Maya
Shinsato Stara
Wang Ningping
Wang Stanley
Wei Zixiao
Publication date: 17/08/2023

As natural predators, owls fly with astonishing stealth due to the sophisticated serrated surface morphology of their feathers that produces advantageous flow characteristics and favorable boundary layer structures. Traditionally, these serrations are tailored for airfoil edges with simple two-dimensional patterns, limiting their effect on overall noise reduction while negotiating tradeoffs in aerodynamic performance. Here, we formulate new design strategies that can mitigate tradeoffs between noise reduction and aerodynamic performance by merging owl feather and cicada insect wing geometries to create a three-dimensional topology that features silent and efficient flight. Aeroacoustics and aerodynamics experimental results show that the application of our hybrid topology yields a reduction in overall sound pressure levels by up to 9.93% and an increase in propulsive efficiency by over 48.14% compared to benchmark designs. Computational fluid dynamics simulations reveal that the three-dimensional, owl-inspired surface serrations can enhance surface vorticity. The produced coherent vortex structures serve to suppress the source strength of dipole and quadrupole pressure sources at various Reynolds numbers, resulting in a universal noise reduction effect. Our work demonstrates how a bioinspired three-dimensional serration topology refines the turbulence-airfoil interaction mode and improves multiple functionalities of an aerodynamic surface to enable quieter and more fuel-efficient aerial vehicles. Comment: 33 pages

arXiv.org e-Print Archive

Vetting undesirable behaviors in android apps with permission use analysis

Author: Bingquan Xu
Guofei Gu
Min Yang
Peng Ning
Binyu Zang
X. Sean Wang
Yuan Zhang
Zhemin Yang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013

The Android platform adopts permissions to protect sensitive resources from untrusted apps. However, after permissions are granted by users at install time, apps could use these permissions (sensitive resources) with no further restrictions. Thus, recent years have witnessed the explosion of undesirable behaviors in Android apps. An important part in the defense is the accurate analysis of Android apps. However, traditional syscall-based analysis techniques are not well-suited for Android, because they could not capture critical interactions between the application and the Android system. This paper presents VetDroid, a dynamic analysis platform for reconstructing sensitive behaviors in Android apps from a novel permission use perspective. VetDroid features a systematic framework to effectively construct permission use behaviors, i.e., how applications use permissions to access (sensitive) system resources, and how these acquired permission-sensitive resources are further utilized by the application. With permission use behaviors, security analysts can easily examine the internal sensitive behaviors of an app. Using real-world Android malware, we show that VetDroid can clearly reconstruct fine-grained malicious behaviors to ease malware analysis. We further apply VetDroid to 1,249 top free apps in Google Play. VetDroid can assist in finding more information leaks than TaintDroid [24], a state-of-the-art technique. In addition, we show how we can use VetDroid to analyze fine-grained causes of information leaks that TaintDroid cannot reveal. Finally, we show that VetDroid can help identify subtle vulnerabilities in some (top free) applications otherwise hard to detect.

CiteSeerX

Crossref

Generating Market Basket Data with Temporal Information

Author: JAJODIA Sushil
LI Yingjiu
NING Peng
WANG X. Sean
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2001

This paper presents a synthetic data generator that outputs timestamped transactional data with embedded temporal patterns controlled by a set of input parameters. In particular, calendar schema, which is determined by a hierarchy of input time granularities, is used as a framework of possible temporal patterns. An example of calendar schema is (year, month, day), which provides a framework for calendar-based temporal patterns of the form ⟨d3, d2, d1⟩, where each di is either an integer or the symbol *. For example, ⟨2000, *, 16⟩ is such a pattern, which corresponds to the time intervals consisting of all the 16th days of all months in year 2000. This paper also evaluates the data generator through a series of experiments. The synthetic data generator is intended to provide support for the data mining community in evaluating various aspects (especially the temporal aspects and the scalability) of data mining algorithms.
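The calendar-based pattern described in this abstract (all the 16th days of all months in year 2000, under the schema (year, month, day)) can be sketched as a simple matcher. This is only an illustrative sketch, not the paper's generator: the tuple encoding, the `WILDCARD` constant, and the `matches` function are assumptions introduced here.

```python
from datetime import date

WILDCARD = "*"  # assumed stand-in for the "any value" symbol in a pattern


def matches(pattern, day):
    """Check a date against a (year, month, day) pattern.

    Each pattern field is either an integer or WILDCARD.
    """
    fields = (day.year, day.month, day.day)
    return all(p == WILDCARD or p == f for p, f in zip(pattern, fields))


# Every 16th day of every month in year 2000.
pattern = (2000, WILDCARD, 16)

print(matches(pattern, date(2000, 3, 16)))  # True
print(matches(pattern, date(2000, 3, 17)))  # False
print(matches(pattern, date(2001, 3, 16)))  # False
```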

CiteSeerX

Institutional Knowledge at Singapore Management University

Enhancing Profiles for Anomaly Detection Using Time Granularities

Author: JAJODIA Sushil
LI Yingjiu
WANG X. Sean
WU Ningning
Publication venue: 'IOS Press'
Publication date: 01/01/2002

Crossref

Institutional Knowledge at Singapore Management University

PURPLE: Making a Large Language Model a Better SQL Writer

Author: Dai Jiaqi
Fan Yuankai
He Zhenying
Huang Can
Huang Ren
Jing Yinan
Ren Tonghui
Wang X. Sean
Yang Yifan
Zhang Kai
Publication date: 29/03/2024

Large Language Model (LLM) techniques play an increasingly important role in Natural Language to SQL (NL2SQL) translation. LLMs trained by extensive corpora have strong natural language understanding and basic SQL generation abilities without additional tuning specific to NL2SQL tasks. Existing LLMs-based NL2SQL approaches try to improve the translation by enhancing the LLMs with an emphasis on user intention understanding. However, LLMs sometimes fail to generate appropriate SQL due to their lack of knowledge in organizing complex logical operator composition. A promising method is to input the LLMs with demonstrations, which include known NL2SQL translations from various databases. LLMs can learn to organize operator compositions from the input demonstrations for the given task. In this paper, we propose PURPLE (Pre-trained models Utilized to Retrieve Prompts for Logical Enhancement), which improves accuracy by retrieving demonstrations containing the requisite logical operator composition for the NL2SQL task on hand, thereby guiding LLMs to produce better SQL translation. PURPLE achieves a new state-of-the-art performance of 80.5% exact-set match accuracy and 87.8% execution match accuracy on the validation set of the popular NL2SQL benchmark Spider. PURPLE maintains high accuracy across diverse benchmarks, budgetary constraints, and various LLMs, showing robustness and cost-effectiveness. Comment: 12 pages, accepted by ICDE 2024 (40th IEEE International Conference on Data Engineering)

arXiv.org e-Print Archive