4,280 research outputs found

    Local Differentially Private Heavy Hitter Detection in Data Streams with Bounded Memory

    Top-k frequent items detection is a fundamental task in data stream mining. Many promising solutions have been proposed to improve memory efficiency while maintaining high accuracy in detecting the Top-k items. Beyond memory efficiency, users may suffer privacy loss if they participate in the task without proper protection, since their contributed local data streams may continually leak sensitive individual information. However, most existing works address either the memory-efficiency problem or the privacy concern, but seldom both jointly, and therefore cannot achieve a satisfactory tradeoff between memory efficiency, privacy protection, and detection accuracy. In this paper, we present a novel framework, HG-LDP, that achieves accurate Top-k item detection at bounded memory expense while providing rigorous local differential privacy (LDP) protection. Specifically, we identify two key challenges that naturally arise in the task, which reveal that directly applying existing LDP techniques leads to an inferior "accuracy-privacy-memory efficiency" tradeoff. We therefore instantiate three advanced schemes under the framework by designing novel LDP randomization methods that address the hurdles caused by the large size of the item domain and by the limited memory space. We conduct comprehensive experiments on both synthetic and real-world datasets showing that the proposed advanced schemes achieve a superior "accuracy-privacy-memory efficiency" tradeoff, saving 2300× memory over baseline methods when the item domain size is 41,270. Our code is open-sourced via the link
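
To make the two hurdles concrete, here is a minimal sketch of a naive baseline, not HG-LDP itself. It assumes each user reports a single item perturbed with generalized randomized response (GRR, a standard ε-LDP mechanism), and that the collector feeds the noisy reports into a Space-Saving summary with a fixed number of counters before debiasing the surviving counts. The function names, the one-item-per-user simplification and the parameter values are hypothetical. Note how the keep-probability p = e^ε / (e^ε + d − 1) shrinks as the domain size d grows, which is one reason a large item domain hurts accuracy.

```python
import math
import random


def grr_perturb(item, domain_size, epsilon, rng):
    """Generalized randomized response over items 0..domain_size-1 (epsilon-LDP)."""
    p = math.exp(epsilon) / (math.exp(epsilon) + domain_size - 1)
    if rng.random() < p:
        return item  # report the true item
    other = rng.randrange(domain_size - 1)  # uniform over the d - 1 other items
    return other if other < item else other + 1


class SpaceSaving:
    """Bounded-memory frequent-items summary with at most `capacity` counters."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}

    def update(self, item):
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
        else:
            # Replace the minimum counter; the newcomer inherits its count + 1.
            victim = min(self.counts, key=self.counts.get)
            self.counts[item] = self.counts.pop(victim) + 1


def estimate_top_k(true_items, domain_size, epsilon, capacity, k, seed=0):
    """Hypothetical baseline: GRR per report, Space-Saving on the noisy stream, debias."""
    rng = random.Random(seed)
    e = math.exp(epsilon)
    p = e / (e + domain_size - 1)  # P[report = true item]
    q = 1 / (e + domain_size - 1)  # P[report = one specific other item]
    sketch = SpaceSaving(capacity)
    n = 0
    for item in true_items:
        sketch.update(grr_perturb(item, domain_size, epsilon, rng))
        n += 1
    # Standard GRR frequency estimator (c - n*q) / (p - q), applied to sketch counts.
    estimates = {x: (c - n * q) / (p - q) for x, c in sketch.counts.items()}
    return sorted(estimates.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    stream = [7] * 4000 + [3] * 2500 + [i % 100 for i in range(3500)]
    rng.shuffle(stream)
    print(estimate_top_k(stream, domain_size=100, epsilon=4.0, capacity=50, k=2, seed=1))
```

Because Space-Saving counts are upper bounds, debiasing them is only approximate; a straightforward combination like this is the kind of approach whose accuracy-privacy-memory tradeoff the abstract describes as inferior.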

    When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Processing

    Carefully balancing load in distributed stream processing systems has a fundamental impact on execution latency and throughput. Load balancing is challenging because real-world workloads are skewed: some tuples in the stream are associated with keys that are significantly more frequent than others. Skew is markedly more problematic in large deployments: more workers imply fewer keys per worker, so it becomes harder to "average out" the cost of hot keys with cold keys. We propose a novel load balancing technique that uses a heavy hitter algorithm to efficiently identify the hottest keys in the stream. These hot keys are assigned to d ≥ 2 choices to ensure a balanced load, where d is tuned automatically to minimize the memory and computation cost of operator replication. The technique works online and does not require routing tables. Our extensive evaluation shows that our technique can balance real-world workloads on large deployments, and improves throughput and latency by 150% and 60%, respectively, over the previous state of the art when deployed on Apache Storm. Comment: 12 pages, 14 figures; this paper is accepted and will be published at ICDE 201
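
The routing idea above (detect hot keys with a heavy hitter summary, then give them more than two candidate workers) can be sketched as follows. This is a hypothetical simplification, not the paper's implementation: `d_hot` is fixed rather than tuned automatically, cold keys are assumed to use two hash-based choices as in two-choice schemes, a key is treated as hot when its estimated share of the tuples seen so far reaches `hot_fraction`, and each tuple goes to the least-loaded candidate according to load counters the router tracks locally.

```python
import hashlib
import random


class SpaceSaving:
    """Bounded-memory heavy hitter summary; stored counts are upper bounds."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}

    def update(self, key):
        if key in self.counts:
            self.counts[key] += 1
        elif len(self.counts) < self.capacity:
            self.counts[key] = 1
        else:
            # Evict the smallest counter; the newcomer inherits its count + 1.
            victim = min(self.counts, key=self.counts.get)
            self.counts[key] = self.counts.pop(victim) + 1


def candidate_workers(key, d, num_workers):
    """d candidate workers from d differently seeded hashes (duplicates possible)."""
    workers = []
    for seed in range(d):
        digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
        workers.append(int.from_bytes(digest, "big") % num_workers)
    return workers


class HotKeyRouter:
    """Hypothetical router: hot keys get d_hot choices, all other keys get 2."""

    def __init__(self, num_workers, d_hot, capacity=100, hot_fraction=0.05):
        self.num_workers = num_workers
        self.d_hot = d_hot
        self.hot_fraction = hot_fraction
        self.sketch = SpaceSaving(capacity)
        self.loads = [0] * num_workers  # tuples sent to each worker so far
        self.seen = 0

    def route(self, key):
        self.seen += 1
        self.sketch.update(key)
        # Hot if the estimated share so far reaches hot_fraction (very early in
        # the stream almost every key qualifies; a real system would warm up).
        is_hot = self.sketch.counts[key] >= self.hot_fraction * self.seen
        d = self.d_hot if is_hot else 2
        candidates = candidate_workers(key, d, self.num_workers)
        worker = min(candidates, key=lambda w: self.loads[w])  # least-loaded candidate
        self.loads[worker] += 1
        return worker


def max_load(stream, d_hot, num_workers=20):
    router = HotKeyRouter(num_workers, d_hot)
    for key in stream:
        router.route(key)
    return max(router.loads)


if __name__ == "__main__":
    rng = random.Random(1)
    stream = ["hot"] * 3000 + [f"k{i % 1000}" for i in range(7000)]
    rng.shuffle(stream)
    print("max load, d_hot=2:", max_load(stream, d_hot=2))
    print("max load, d_hot=8:", max_load(stream, d_hot=8))
```

Because candidates are derived from seeded hashes, any router instance can recompute a key's candidate set without consulting a routing table. With only two choices, a key carrying 30% of the tuples necessarily puts at least 15% on a single worker, which is the imbalance that more choices for hot keys is meant to remove.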

    Flower consumption, ambient temperature and rainfall modulate drinking behavior in a folivorous-frugivorous arboreal mammal

    Water is vital for the survival of any species because of its key role in most physiological processes. However, little is known about the non-food-related water sources exploited by arboreal mammals, the seasonality of their drinking behavior and its potential drivers, including diet composition, temperature, and rainfall. We investigated this subject in 14 wild groups of brown howler monkeys (Alouatta guariba clamitans) inhabiting small, medium, and large Atlantic Forest fragments in southern Brazil. We found a wide variation in the mean rate of drinking among groups (range = 0–16 records/day). Streams (44% of 1,258 records) and treeholes (26%) were the major types of water sources, followed by bromeliads in the canopy (16%), pools (11%), and rivers (3%). The type of source influenced whether howlers used a hand to access the water or not. Drinking tended to be evenly distributed throughout the year, except for a slightly lower number of records in the spring than in the other seasons, but it was unevenly distributed during the day. It increased in the afternoon in all groups, particularly during temperature peaks around 15:00 and 17:00. We found via generalized linear mixed modelling that the daily frequency of drinking was mainly influenced negatively by flower consumption and positively by weekly rainfall and ambient temperature, whereas fragment size and the consumption of fruit and leaves played negligible roles. Overall, we confirm the importance of preformed water in flowers to satisfy the howler’s water needs, whereas the influence of the climatic variables is compatible with the ‘thermoregulation/dehydration-avoiding hypothesis’. 
In sum, we found that irrespective of habitat characteristics, brown howlers seem to seek a positive water balance by complementing the water present in the diet with drinking water, even when it is associated with a high predation risk in terrestrial sources.
Coordination for the Improvement of Higher Education Personnel (CAPES), Brazil [2755/2010]; National Council for Scientific and Technological Development (CNPq), Brazil [303306/2013-0]; National Council for Scientific and Technological Development (CNPq), Brazil [304475/2018-1]; National Council for Scientific and Technological Development (CNPq), Brazil [140641/2016-5]; UCR::Vicerrectoría de Docencia::Ciencias Básicas::Facultad de Ciencias::Escuela de Biologí

    DUET: A Generic Framework for Finding Special Quadratic Elements in Data Streams

    Finding special items, such as heavy hitters, top-k, and persistent items, has long been a hot topic in data stream processing for web analysis. While data streams nowadays are usually high-dimensional, most prior works find special items with respect to a single primary dimension and yield little insight into the correlations between dimensions. We therefore propose finding special quadratic elements to reveal close correlations. Building on the items mentioned above, we extend the problem to three applications, related to heavy hitters, top-k, and persistent items, and design a generic framework, DUET, to process them. In addition, we analyze the error bound of our algorithm and conduct extensive experiments on four data sets. Our experimental results show that DUET achieves 3.5 times higher throughput and three orders of magnitude lower average relative error compared with cutting-edge algorithms.
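
As a reference point for what the three applications compute (not how DUET computes them), the sketch below counts quadratic elements exactly. It assumes a quadratic element is the pair of values a record carries in two chosen dimensions, and that persistence is measured over non-overlapping count-based windows; the dimension names, window size and thresholds are hypothetical. DUET relies on a compact bounded-memory structure that this exact version, whose memory grows with the number of distinct pairs, does not attempt to reproduce.

```python
from collections import Counter


def special_pairs(records, dim_a, dim_b, window_size, k, freq_threshold, persist_threshold):
    """Exact heavy, top-k and persistent quadratic elements (value pairs of two dimensions).

    Persistence = number of non-overlapping windows of `window_size` records
    in which the pair occurs at least once.
    """
    freq = Counter()
    persistence = Counter()
    last_window = {}  # last window index in which each pair was already counted
    for t, record in enumerate(records):
        pair = (record[dim_a], record[dim_b])
        freq[pair] += 1
        window = t // window_size
        if last_window.get(pair) != window:
            persistence[pair] += 1
            last_window[pair] = window
    heavy = {p for p, c in freq.items() if c >= freq_threshold}
    top_k = [p for p, _ in freq.most_common(k)]
    persistent = {p for p, c in persistence.items() if c >= persist_threshold}
    return heavy, top_k, persistent


if __name__ == "__main__":
    values = [("a", "x"), ("a", "x"), ("a", "x"), ("b", "y"), ("a", "x"), ("b", "y"),
              ("c", "z"), ("c", "z"), ("c", "z"), ("c", "z"), ("b", "y"), ("a", "x")]
    records = [{"src": s, "dst": d} for s, d in values]
    print(special_pairs(records, "src", "dst", window_size=4, k=1,
                        freq_threshold=4, persist_threshold=3))
```

In this toy stream, ("c", "z") is heavy but not persistent, while ("b", "y") is persistent but not heavy, which illustrates why the applications are handled as separate queries over the same pairs.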

    Spatial predictive risk mapping of lymphatic filariasis residual hotspots in American Samoa using demographic and environmental factors

    Background: American Samoa successfully completed seven rounds of mass drug administration (MDA) for lymphatic filariasis (LF) from 2000–2006. The territory passed the school-based transmission assessment surveys in 2011 and 2015 but failed in 2016. One of the key challenges after the implementation of MDA is the identification of any residual hotspots of transmission. Method: Based on data collected in a 2016 community survey in persons aged ≥8 years, Bayesian geostatistical models were developed for LF antigen (Ag), and Wb123, Bm14, Bm33 antibodies (Abs) to predict spatial variation in infection markers using demographic and environmental factors (including land cover, elevation, rainfall, distance to the coastline and distance to streams). Results: In the Ag model, females had a 26.8% (95% CrI: 11.0–39.8%) lower risk of being Ag-positive than males. There was a 2.4% (95% CrI: 1.8–3.0%) increase in the odds of Ag positivity for every year of age. Also, the odds of Ag positivity increased by 0.4% (95% CrI: 0.1–0.7%) for each 1% increase in tree cover. The models for Wb123, Bm14 and Bm33 Abs showed similar significant associations as the Ag model for sex, age and tree coverage. After accounting for the effect of covariates, the radii of the clusters were larger for Bm14 and Bm33 Abs compared to Ag and Wb123 Ab. The predictive maps showed that Ab-positivity was more widespread across the territory, while Ag-positivity was more confined to villages in the north-west of the main island. Conclusion: The findings may facilitate more specific targeting of post-MDA surveillance activities by prioritising those areas at higher risk of ongoing transmission.
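
Assuming the Ag model uses a logit link with age and tree cover entering linearly, the reported percentages correspond to odds ratios of 1.024 per year of age and 1.004 per percentage point of tree cover, and these per-unit effects compound multiplicatively. A worked example for a 10-unit difference, using the point estimates only:

```latex
\begin{aligned}
\mathrm{OR}_{\mathrm{age}}(\Delta) &= 1.024^{\Delta}, &
\mathrm{OR}_{\mathrm{age}}(10) &= e^{10\ln 1.024} \approx e^{0.237} \approx 1.27,\\
\mathrm{OR}_{\mathrm{tree}}(\Delta) &= 1.004^{\Delta}, &
\mathrm{OR}_{\mathrm{tree}}(10) &= e^{10\ln 1.004} \approx e^{0.040} \approx 1.04.
\end{aligned}
```

Under this assumption, a person 10 years older has roughly 27% higher odds of being Ag-positive, all else equal; the credible-interval endpoints transform in the same way.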

    Assessment of the performance of alternative aviation fuel in a modern air-spray combustor (MAC)

    Recent concerns over energy security and environmental considerations have highlighted the importance of finding alternative aviation fuels. It is expected that coal- and biomass-derived fuels will fulfil a substantial part of these energy requirements. However, because of the physical and chemical differences in the composition of these fuels, there are potential problems associated with the efficiency and the emissions of the combustion process. Over the past 25 years, Computational Fluid Dynamics (CFD) has become increasingly popular in the gas turbine industry as a design tool for establishing and optimising key system parameters prior to starting expensive trials. In this paper, the performance of a typical aviation fuel (kerosene), an alternative aviation fuel (biofuel) and a blend has been examined using CFD modelling. A good knowledge of the reaction kinetics of bio-aviation fuels at both high and low temperatures is necessary to perform reliable simulations of ignition, combustion and emissions in aero-engines. A novel detailed reaction mechanism was used to represent aviation fuel oxidation. Fuel combustion is calculated with a 3D commercial solver using a mixture fraction/pdf approach. First, the study demonstrates that CFD predictions compare favourably with experimental data obtained by QinetiQ for a Modern Airspray Combustor (MAC) when used with traditional jet fuel (kerosene). Furthermore, the 3D CFD model has been refined to use the laminar flamelet model (LFM) approach, which incorporates recently developed chemical reaction mechanisms for the bio-aviation fuel. This has enabled predictions for the bio-aviation fuel to be made. The blended fuel has been shown to perform very similarly to 100% kerosene, confirming that aircraft running on 20% blended fuel should have no significant reduction in performance. 
It was also found that, for the given operating conditions, there is a significant reduction in performance when 100% biofuel is used. Additionally, interesting predictions were obtained for NOx emissions from the blend and from 100% biofuel.

    Conditional heavy hitters : detecting interesting correlations in data streams

    The notion of heavy hitters—items that make up a large fraction of the population—has been successfully used in a variety of applications across sensor and RFID monitoring, network data analysis, event mining, and more. Yet this notion often fails to capture the semantics we desire when we observe data in the form of correlated pairs. Here, we are interested in items that are conditionally frequent: when a particular item is frequent within the context of its parent item. In this work, we introduce and formalize the notion of conditional heavy hitters to identify such items, with applications in network monitoring and Markov chain modeling. We explore the relationship between conditional heavy hitters and other related notions in the literature, and show analytically and experimentally the usefulness of our approach. We introduce several algorithm variations that allow us to efficiently find conditional heavy hitters for input data with very different characteristics, and provide analytical results for their performance. Finally, we perform experimental evaluations with several synthetic and real datasets to demonstrate the efficacy of our methods and to study the behavior of the proposed algorithms for different types of data
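
The notion of a conditional heavy hitter can be illustrated with exact counters, using one natural formalization (an assumption here): a child c of parent p is reported when count(p, c) / count(p) ≥ φ. The threshold name `phi`, the optional `min_parent_count` support filter and the exact, unbounded-memory counting are hypothetical choices for illustration; the paper's algorithm variations are not reproduced. Treating consecutive symbols of a sequence as (parent, child) pairs gives the Markov-chain reading mentioned above, where the ratio is the empirical transition probability.

```python
from collections import Counter


def conditional_heavy_hitters(pairs, phi, min_parent_count=1):
    """Return {(parent, child): count(parent, child) / count(parent)} for ratios >= phi.

    Parents seen fewer than min_parent_count times are skipped to avoid noisy ratios.
    """
    parent_counts = Counter()
    pair_counts = Counter()
    for parent, child in pairs:
        parent_counts[parent] += 1
        pair_counts[(parent, child)] += 1
    result = {}
    for (parent, child), count in pair_counts.items():
        support = parent_counts[parent]
        if support >= min_parent_count and count / support >= phi:
            result[(parent, child)] = count / support
    return result


if __name__ == "__main__":
    sequence = "ABABABAC"
    transitions = list(zip(sequence, sequence[1:]))  # (previous symbol, next symbol)
    print(conditional_heavy_hitters(transitions, phi=0.7))
```

Adding a single pair such as ("Z", "Q") shows the catch: a parent seen once yields a conditional frequency of 1.0, so some minimum parent support is usually needed before a ratio is trusted.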

    Tracing the evolution of digitalisation research in business and management fields: Bibliometric analysis, topic modelling and deep learning trend forecasting

    Research on digitalisation trends and digital topics has become one of the most prolific streams of research within the fields of business and management over the past few years. The purpose of this study is to provide a general picture of the intellectual structure and the conceptual space of this research realm. To this end, 6067 publications related to digital topics, indexed in the business and management categories of Web of Science (WoS) and dated from 1990 to 2020, are explored using bibliometric analysis, topic modelling and trend forecasting. The results of the bibliometric analysis comprise insights into the publication and citation structure, the most productive authors, universities, countries and journals, the most cited studies, and the most prevalent themes and sub-themes on digitalisation in business and management. In addition, the outcomes of the topic modelling give new knowledge on the latent topical structure along with the rising, falling and fluctuating trends of this literature. Finally, the results of the trend forecasting give readers a glimpse of how the underlying trends of the literature will probably change over the years up to 2025. These results provide guidance and orientation for both academics and practitioners who are initiating or currently developing their efforts in this discipline.