    Local Differentially Private Heavy Hitter Detection in Data Streams with Bounded Memory

    Top-k frequent items detection is a fundamental task in data stream mining. Many promising solutions have been proposed to improve memory efficiency while still maintaining high accuracy in detecting the Top-k items. Beyond the memory-efficiency concern, users could suffer privacy loss when participating in the task without proper protection, since their contributed local data streams may continually leak sensitive individual information. However, most existing works address either the memory-efficiency problem or the privacy concerns, but seldom both jointly, and so cannot achieve a satisfactory tradeoff between memory efficiency, privacy protection, and detection accuracy. In this paper, we present a novel framework, HG-LDP, that achieves accurate Top-k item detection at bounded memory expense while providing rigorous local differential privacy (LDP) protection. Specifically, we identify two key challenges naturally arising in the task, which reveal that directly applying existing LDP techniques leads to an inferior "accuracy-privacy-memory efficiency" tradeoff. We therefore instantiate three advanced schemes under the framework by designing novel LDP randomization methods that address the hurdles caused by the large size of the item domain and the limited memory space. We conduct comprehensive experiments on both synthetic and real-world datasets, showing that the proposed schemes achieve a superior "accuracy-privacy-memory efficiency" tradeoff, saving 2300× memory over baseline methods when the item domain size is 41,270. Our code is open-sourced via the link.
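
    For context on the LDP side: below is a minimal sketch (an illustration, not HG-LDP's actual randomizers) of generalized randomized response (GRR), a basic LDP frequency oracle. Its estimation error grows with the size of the item domain, which is exactly the hurdle the paper's new randomization methods are designed to overcome.

    ```python
    # Minimal GRR sketch (illustrative; not the paper's method). Each user
    # reports their true item with probability p and a uniform other item
    # otherwise; the server debiases the noisy counts.
    import math, random

    def grr_report(item, domain, epsilon):
        p = math.exp(epsilon) / (math.exp(epsilon) + len(domain) - 1)
        if random.random() < p:
            return item
        return random.choice([d for d in domain if d != item])

    def grr_estimate(reports, domain, epsilon):
        n, d = len(reports), len(domain)
        p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
        q = (1 - p) / (d - 1)
        counts = {v: 0 for v in domain}
        for r in reports:
            counts[r] += 1
        # Invert the randomization to obtain unbiased frequency estimates.
        return {v: (c - n * q) / (p - q) for v, c in counts.items()}

    # Toy usage: 10,000 users, a small domain, epsilon = 2.
    domain = list(range(8))
    truth = [random.choice([0, 0, 0, 1, 2]) for _ in range(10_000)]
    reports = [grr_report(x, domain, epsilon=2.0) for x in truth]
    estimates = grr_estimate(reports, domain, epsilon=2.0)
    print(sorted(estimates.items(), key=lambda kv: -kv[1])[:3])
    ```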

    Time-aware topic recommendation based on micro-blogs

    Topic recommendation can help users deal with information overload in micro-blogging communities. This paper proposes to use the implicit information network formed by the multiple relationships among users, topics, and micro-blogs, together with the temporal information of micro-blogs, to find semantically and temporally relevant topics for each topic and to profile users' time-drifting topic interests. Content-based, nearest-neighborhood-based, and matrix factorization models are used to make personalized recommendations. The effectiveness of the proposed approaches is demonstrated in experiments conducted on a real-world dataset collected from Twitter.com.
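
    As a pointer to how the last of those model families works: a minimal matrix factorization sketch trained by stochastic gradient descent on hypothetical user-topic interest scores (the paper's actual models, features, and data differ).

    ```python
    # Minimal matrix factorization sketch (illustrative; not the paper's
    # models). User-topic scores and hyperparameters are toy assumptions.
    import numpy as np

    def train_mf(interactions, n_users, n_topics, k=8, lr=0.01, reg=0.05, epochs=50):
        """Factorize the user-topic matrix into U (n_users x k) and T (n_topics x k)."""
        rng = np.random.default_rng(0)
        U = rng.normal(scale=0.1, size=(n_users, k))
        T = rng.normal(scale=0.1, size=(n_topics, k))
        for _ in range(epochs):
            for u, t, r in interactions:      # (user, topic, interest score)
                err = r - U[u] @ T[t]         # prediction error on one observation
                U[u] += lr * (err * T[t] - reg * U[u])
                T[t] += lr * (err * U[u] - reg * T[t])
        return U, T

    # Toy usage: 3 users, 4 topics, a few observed interest scores.
    obs = [(0, 0, 5.0), (0, 1, 3.0), (1, 1, 4.0), (2, 3, 5.0)]
    U, T = train_mf(obs, n_users=3, n_topics=4)
    print("predicted interest of user 0 in topic 3:", U[0] @ T[3])
    ```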

    ‘Where else is the money? A study of innovation in online business models at newspapers in Britain’s 66 cities’

    Much like their counterparts in the United States and elsewhere, British newspaper publishers have seen a sharp decline in revenues from traditional sources, print advertising and copy sales, and many are intensifying efforts to generate new income by expanding their online offerings. A study of the largest-circulation newspapers in the 66 cities of England, Scotland, Wales and Northern Ireland showed that while only a small minority lacked companion websites, many of the publishers with an online presence have simply transferred familiar revenue models. It has also been recognised that income from these sources is not enough to sustain current operations, and innovative publishers have diversified into additional broad categories of Web business models. Significantly, this study not only compared the approaches of various news publishers with each other, but also considered how active newspaper publishers were in taking advantage of the variety of business models generally employed on the Web, and which opportunities were ignored.

    CamFlow: Managed Data-sharing for Cloud Services

    A model of cloud services is emerging whereby a few trusted providers manage the underlying hardware and communications while many companies build on this infrastructure to offer higher-level, cloud-hosted PaaS services and/or SaaS applications. From the start, strong isolation between cloud tenants was seen as being of paramount importance, provided first by virtual machines (VMs) and later by containers, which share the operating system (OS) kernel. Increasingly, applications also require facilities to effect isolation and protection of the data they manage, as well as flexible data sharing with other applications, often across traditional cloud-isolation boundaries; for example, when government provides many related services for its citizens on a common platform. Similar considerations apply to the end-users of applications. In particular, the incorporation of cloud services within 'Internet of Things' architectures is driving the requirements for both protection and cross-application data sharing. These concerns relate to the management of data. Traditional access control is application- and principal/role-specific, applied at policy enforcement points, after which there is no subsequent control over where data flows; a crucial issue once data has left its owner's control and is handled by cloud-hosted applications and cloud services. Information Flow Control (IFC), in addition, offers system-wide, end-to-end flow control based on the properties of the data. We discuss the potential of cloud-deployed IFC for enforcing owners' dataflow policy with regard to protection and sharing, as well as for safeguarding against malicious or buggy software. In addition, the audit log associated with IFC provides transparency, giving configurable system-wide visibility over data flows. [...] Comment: 14 pages, 8 figures
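
    To make the IFC idea concrete, here is a minimal sketch of a label-based flow check (an assumed toy model, not CamFlow's implementation): a flow is permitted only if it sheds no secrecy tags and forges no integrity tags.

    ```python
    # Toy IFC flow check (illustrative; CamFlow's label model is richer).
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Entity:
        name: str
        secrecy: frozenset = frozenset()    # tags that must not be shed
        integrity: frozenset = frozenset()  # tags vouching for provenance

    def can_flow(src: Entity, dst: Entity) -> bool:
        """Permit src -> dst only if secrecy is preserved and integrity not gained."""
        return src.secrecy <= dst.secrecy and dst.integrity <= src.integrity

    # Toy usage: a citizen's medical record may flow to the health service,
    # but not to an analytics app that lacks the 'medical' secrecy tag.
    record = Entity("record", secrecy=frozenset({"medical"}))
    health = Entity("health-svc", secrecy=frozenset({"medical"}))
    stats = Entity("analytics")
    print(can_flow(record, health))  # True
    print(can_flow(record, stats))   # False: the 'medical' tag would be shed
    ```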

    DUET: A Generic Framework for Finding Special Quadratic Elements in Data Streams

    Finding special items, like heavy hitters, top-k, and persistent items, has always been a central problem in data stream processing for web analysis. While data streams nowadays are usually high-dimensional, most prior works focus on special items along a single primary dimension and yield little insight into the correlations between dimensions. We therefore propose to find special quadratic elements that reveal close correlations. Building on the items mentioned above, we extend the problem to three applications related to heavy hitters, top-k, and persistent items, and design a generic framework, DUET, to process them. We also analyze the error bound of our algorithm and conduct extensive experiments on four datasets. Our experimental results show that DUET can achieve 3.5 times higher throughput and three orders of magnitude lower average relative error than cutting-edge algorithms.
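
    For intuition, a minimal sketch (assumed for illustration, not DUET's actual data structure) of tracking quadratic elements, i.e. value pairs across two dimensions, with a Count-Min sketch keyed on the pair:

    ```python
    # Toy quadratic-element counter (illustrative; not DUET). A Count-Min
    # sketch keyed on (dim1, dim2) pairs; width/depth are toy assumptions.
    import hashlib

    class PairCountMin:
        def __init__(self, width=1024, depth=4):
            self.width, self.depth = width, depth
            self.table = [[0] * width for _ in range(depth)]

        def _buckets(self, pair):
            key = repr(pair).encode()
            for row in range(self.depth):
                h = hashlib.blake2b(key, digest_size=8, salt=bytes([row])).digest()
                yield row, int.from_bytes(h, "big") % self.width

        def add(self, pair):
            for row, col in self._buckets(pair):
                self.table[row][col] += 1

        def estimate(self, pair):
            # Count-Min only overestimates, so take the minimum across rows.
            return min(self.table[row][col] for row, col in self._buckets(pair))

    # Toy usage on a stream of (src_ip, url) records.
    cm = PairCountMin()
    stream = [("10.0.0.1", "/home")] * 50 + [("10.0.0.2", "/login")] * 3
    for rec in stream:
        cm.add(rec)
    print(cm.estimate(("10.0.0.1", "/home")))  # ~50: a heavy quadratic element
    ```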

    On Frequency Estimation and Detection of Heavy Hitters in Data Streams

    A stream can be thought of as a very large, possibly infinite, sequence of data that arrives sequentially and must be processed without the possibility of being stored in full. The memory available to the algorithm is limited, so the stream cannot be stored in its entirety; instead it is scanned upon arrival and summarized in a succinct data structure that maintains only the information of interest. Two of the main tasks in data stream processing are frequency estimation and heavy hitter detection. Frequency estimation requires estimating the frequency of each item, that is, the number of times (or the total weight with which) it appears in the stream, while heavy hitter detection requires reporting all items whose frequency exceeds a fixed threshold. In this work we design and analyze ACMSS, an algorithm for frequency estimation and heavy hitter detection, and compare it against the state-of-the-art ASKETCH algorithm. We show that, given the same budgeted amount of memory, our algorithm outperforms ASKETCH in accuracy for the task of frequency estimation. Furthermore, we show that, under the assumptions stated by its authors, ASKETCH may fail to report all of the heavy hitters, whilst ACMSS will, with high probability, report the full list of heavy hitters.
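
    For context, a minimal sketch of the classic Space-Saving algorithm, a standard bounded-memory baseline for heavy hitter detection (ACMSS's actual data structure is described in the paper, not here):

    ```python
    # Classic Space-Saving sketch (illustrative baseline; not ACMSS).
    # Tracks at most k counters regardless of stream length.
    def space_saving(stream, k):
        counters = {}  # item -> estimated frequency
        for item in stream:
            if item in counters:
                counters[item] += 1
            elif len(counters) < k:
                counters[item] = 1
            else:
                # Evict the minimum counter; the newcomer inherits its count + 1,
                # so estimates never undercount true frequencies.
                victim = min(counters, key=counters.get)
                counters[item] = counters.pop(victim) + 1
        return counters

    # Toy usage: report items whose estimated frequency exceeds a threshold.
    stream = list("aaaaabbbccde")
    est = space_saving(stream, k=3)
    print({x: c for x, c in est.items() if c >= 3})
    ```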