14 research outputs found

Nonadaptive Mastermind Algorithms for String and Vector Databases, with Case Studies

Author: Arthur U. Asuncion
Michael T. Goodrich
Publication date: 01/01/2010

In this paper, we study sparsity-exploiting Mastermind algorithms for attacking the privacy of an entire database of character strings or vectors, such as DNA strings, movie ratings, or social network friendship data. Based on reductions to nonadaptive group testing, our methods are able to take advantage of minimal amounts of privacy leakage, such as that contained in a single bit that indicates if two people in a medical database have any common genetic mutations, or if two people have any common friends in an online social network. We analyze our Mastermind attack algorithms using theoretical characterizations that provide sublinear bounds on the number of queries needed to clone the database, as well as experimental tests on genomic information, collaborative filtering data, and online social networks. By taking advantage of the generally sparse nature of these real-world databases and modulating a parameter that controls query sparsity, we demonstrate that relatively few nonadaptive queries are needed to recover a large majority of each database.
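The reduction to nonadaptive group testing mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a generic random pooling design with the standard COMP decoder, and every name here (`make_pools`, `run_tests`, `comp_decode`, the `density` parameter) is hypothetical. Each query returns a single bit saying whether the pool shares any element with a hidden sparse set, and all pools are fixed before any answer is seen.

```python
import random

def make_pools(n_items, n_tests, density, rng):
    """Fix every pool up front (nonadaptive): each item joins a pool
    independently with probability `density` (an assumed sparsity knob)."""
    return [[i for i in range(n_items) if rng.random() < density]
            for _ in range(n_tests)]

def run_tests(pools, hidden):
    """One-bit answers: does the pool intersect the hidden set?"""
    return [any(i in hidden for i in pool) for pool in pools]

def comp_decode(n_items, pools, outcomes):
    """COMP decoding: any item seen in a negative pool is cleared;
    everything left is reported as a candidate member."""
    candidates = set(range(n_items))
    for pool, positive in zip(pools, outcomes):
        if not positive:
            candidates.difference_update(pool)
    return candidates

# Illustrative run on synthetic data (sizes chosen arbitrarily).
rng = random.Random(0)
n_items = 200
hidden = set(rng.sample(range(n_items), 4))
pools = make_pools(n_items, 60, 0.2, rng)
outcomes = run_tests(pools, hidden)
recovered = comp_decode(n_items, pools, outcomes)
```

COMP never clears a true member, so `recovered` always contains `hidden`. How many false candidates remain depends on the number of pools and on `density`.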

arXiv.org e-Print Archive

CiteSeerX

Software Traceability with Topic Modeling

Author: Arthur U. Asuncion
Hazeline U. Asuncion
Richard N. Taylor
Publication date: 01/01/2010

Software traceability is a fundamentally important task in software engineering. The need for automated traceability increases as projects become more complex and as the number of artifacts increases. We propose an automated technique that combines traceability with a machine learning technique known as topic modeling. Our approach automatically records traceability links during the software development process and learns a probabilistic topic model over artifacts. The learned model allows for the semantic categorization of artifacts and the topical visualization of the software system. To test our approach, we have implemented several tools: an artifact search tool combining keyword-based search and topic modeling, a recording tool that performs prospective traceability, and a visualization tool that allows one to navigate the software architecture and view semantic topics associated with relevant artifacts and architectural components. We apply our approach to several data sets and discuss how topic modeling enhances software traceability, and vice versa.

CiteSeerX

Crossref

A Dynamic Relational Infinite Feature Model for Longitudinal Social Networks

Author: Arthur U. Asuncion
Carter T. Butts
Christopher Dubois
James Foulds
Padhraic Smyth
Publication date: 01/01/2011

Real-world relational data sets, such as social networks, often involve measurements over time. We propose a Bayesian nonparametric latent feature model for such data, where the latent features for each actor in the network evolve according to a Markov process, extending recent work on similar models for static networks. We show how the number of features and their trajectories for each actor can be inferred simultaneously and demonstrate the utility of this model on prediction tasks using synthetic and real-world data.
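To make the idea of latent features evolving by a Markov process concrete, here is a small generative sketch. It is a simplification under stated assumptions, not the model from the paper: the number of features is fixed rather than inferred nonparametrically, and the transition probabilities `p_on` and `p_stay`, the logistic link function, and all function names are hypothetical.

```python
import math
import random

def step_features(z, p_on, p_stay, rng):
    """Advance one actor's binary features by one Markov step:
    an active feature stays on with prob p_stay, an inactive one
    switches on with prob p_on."""
    return [1 if rng.random() < (p_stay if bit else p_on) else 0 for bit in z]

def link_probability(z_i, z_j, weights, bias):
    """Logistic link probability from interactions between the
    features active on actor i and those active on actor j."""
    score = bias + sum(weights[a][b]
                       for a, za in enumerate(z_i) if za
                       for b, zb in enumerate(z_j) if zb)
    return 1.0 / (1.0 + math.exp(-score))

# Illustrative simulation: 5 actors, 3 features, 4 time steps.
rng = random.Random(1)
n_actors, n_features, n_steps = 5, 3, 4
weights = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
           for _ in range(n_features)]
features = [[rng.randint(0, 1) for _ in range(n_features)]
            for _ in range(n_actors)]
history = [features]
for _ in range(n_steps - 1):
    features = [step_features(z, 0.1, 0.8, rng) for z in features]
    history.append(features)

# Link probability between actors 0 and 1 at each time step.
probs = [link_probability(f[0], f[1], weights, -1.0) for f in history]
```

Inference runs in the opposite direction: given observed networks over time, one estimates the feature trajectories (and, in the nonparametric setting the abstract describes, how many features there are).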

CiteSeerX