Stochastic Modeling of Hybrid Cache Systems
In recent years, there has been increasing demand for big-memory systems to
perform large-scale data analytics. Since DRAM is expensive, some researchers
have suggested using other memory technologies, such as non-volatile memory
(NVM), to build large-memory computing systems. However, whether NVM can be a
viable alternative (both economically and technically) to DRAM remains an open
question. To answer this question, it is important to consider how to design a
memory system from a "system perspective", that is, incorporating the
different performance characteristics and price ratios of hybrid memory
devices.
This paper presents an analytical model of a "hybrid page cache system" to
understand the diverse design space and performance impact of a hybrid cache
system. We consider (1) various architectural choices, (2) design strategies,
and (3) configurations of different memory devices. Using this model, we
provide guidelines on how to design a hybrid page cache that reaches a good
trade-off between high system throughput (in I/O per second, or IOPS) and fast
cache reactivity, which we define as the time to fill the cache. We also show
how one can configure the DRAM capacity and NVM capacity under a fixed budget.
We pick PCM as an example of NVM and conduct numerical analysis. Our analysis
indicates that incorporating PCM in a page cache system significantly improves
system performance, and that allocating more PCM to the page cache yields
larger benefits in some cases. Moreover, for the common performance-price
ratio of PCM, a "flat architecture" is the better choice, but a "layered
architecture" outperforms it if PCM write performance can be significantly
improved in the future.
Comment: 14 pages; mascots 201
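The budget-constrained DRAM/NVM split described above can be illustrated with a toy model. Everything below — the saturating hit-rate curve, the device latencies, and the per-GB prices — is a hypothetical placeholder for illustration, not the paper's analytical model:

```python
# Toy sketch of splitting a fixed budget between DRAM and PCM in a "flat"
# hybrid page cache, estimating throughput (IOPS) for each split.
# All latencies, prices, and the hit-rate curve are illustrative assumptions.

def hit_rate(capacity_gb, working_set_gb=512):
    """Toy hit-rate curve: grows linearly with cache size, saturating at 1."""
    return min(1.0, capacity_gb / working_set_gb)

def iops(dram_gb, pcm_gb, t_dram_us=0.1, t_pcm_us=1.0, t_disk_us=100.0):
    """Flat architecture: DRAM and PCM are siblings; a cache miss goes to disk."""
    h = hit_rate(dram_gb + pcm_gb)
    # Assume hits spread across DRAM and PCM in proportion to capacity.
    frac_dram = dram_gb / (dram_gb + pcm_gb)
    t_hit = frac_dram * t_dram_us + (1 - frac_dram) * t_pcm_us
    t_avg = h * t_hit + (1 - h) * t_disk_us
    return 1e6 / t_avg  # accesses per second

def best_split(budget, price_dram=8.0, price_pcm=2.0, step_gb=16):
    """Enumerate DRAM/PCM splits under a fixed budget; return (iops, dram, pcm)."""
    best = None
    dram = step_gb
    while dram * price_dram < budget:
        pcm = (budget - dram * price_dram) / price_pcm
        cand = (iops(dram, pcm), dram, pcm)
        best = max(best, cand) if best else cand
        dram += step_gb
    return best
```

With a cheaper-but-slower PCM tier, sweeping the split like this makes the throughput-versus-capacity trade-off of the flat architecture concrete.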
A Hybrid Recommender System for Patient-Doctor Matchmaking in Primary Care
We partner with a leading European healthcare provider and design a mechanism
to match patients with family doctors in primary care. We define the
matchmaking process for several distinct use cases given different levels of
available information about patients. Then, we adopt a hybrid recommender
system to present each patient with a list of family doctor recommendations.
In particular, we model patient trust in family doctors using a large-scale
dataset of consultation histories, while accounting for the temporal dynamics
of their relationships. Our proposed approach shows higher predictive accuracy
than both a heuristic baseline and a collaborative filtering approach, and the
proposed trust measure further improves model performance.
Comment: This paper is accepted at DSAA 2018 as a full paper, Proc. of the 5th
IEEE International Conference on Data Science and Advanced Analytics (DSAA),
Turin, Ital
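A trust measure with temporal dynamics, as the abstract describes, might look like the following sketch. The exponential decay, the blend weight, and the popularity prior are all hypothetical choices of ours, not the paper's actual formulation:

```python
import math
from collections import defaultdict

# Hypothetical sketch: score family doctors for a patient by blending a
# popularity prior with a trust measure built from consultation histories,
# where older consultations are discounted by exponential decay.
# The half-life and blend weight alpha are illustrative, not from the paper.

def trust(consultation_days, now, half_life_days=365.0):
    """Trust of a patient in a doctor: time-decayed count of past visits."""
    lam = math.log(2) / half_life_days
    return sum(math.exp(-lam * (now - day)) for day in consultation_days)

def recommend(patient_history, doctor_popularity, now, alpha=0.7, k=3):
    """Rank doctors by alpha * trust + (1 - alpha) * popularity."""
    scores = defaultdict(float)
    for doctor, pop in doctor_popularity.items():
        scores[doctor] = (1 - alpha) * pop
    for doctor, days in patient_history.items():
        scores[doctor] += alpha * trust(days, now)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The popularity term lets the sketch fall back to a non-personalized ranking when a patient has no consultation history, mirroring the cold-start cases the paper's distinct use cases address.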
Hybrid Black-box Solar Analytics and their Privacy Implications
The aggregate solar capacity in the U.S. is rising rapidly due to continuing decreases in the cost of solar modules. For example, the installed cost per Watt (W) for residential photovoltaics (PVs) decreased by 6X from 2009 to 2018 (from 1.2/W), resulting in the installed aggregate solar capacity increasing 128X from 2009 to 2018 (from 435 megawatts to 55.9 gigawatts). This increasing solar capacity is imposing operational challenges on utilities in balancing electricity's real-time supply and demand, as solar generation is more stochastic and less predictable than aggregate demand.
To address this problem, both academia and utilities have shown strong interest in solar analytics that accurately monitor, predict, and react to variations in intermittent solar power. Prior solar analytics are mostly white-box approaches that are based on site-specific information and require expert knowledge, and thus do not scale; recent research instead focuses on black-box approaches that use training data to automatically learn a custom machine learning (ML) model. Unfortunately, this approach requires months to years of training data and often does not incorporate well-known physical models of solar generation, which reduces its accuracy. Instead, in this dissertation, we present a hybrid black-box approach that brings the best of both worlds to solar analytics. Our hypothesis is that the hybrid black-box approach can enable a wide range of accurate solar analytics, including modeling, disaggregation, and localization, with limited training data and without knowledge of key system parameters, by integrating black-box machine learning approaches with white-box physical models. In evaluating our hypothesis, we make the following contributions:
(Mostly) ML black-box Solar Modeling. To get the benefits of both ML and physical approaches, we present a configurable hybrid black-box ML approach that combines well-known relationships from physical models with unknown relationships learned via ML. Rather than manually determining values for physical model parameters, our approach automatically calibrates them by finding the values that best fit the data. This calibration requires much less data (as few as 2 datapoints) than training an ML model. We show that our hybrid approach significantly improves solar modeling accuracy.
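The few-datapoint calibration idea can be sketched as follows. The model form (output linear in irradiance, derated above 25 C) and the temperature coefficient are textbook PV assumptions of ours, not the dissertation's exact equations:

```python
# Hypothetical sketch: a physical PV model with one unknown size/efficiency
# parameter C, calibrated from as few as two observed
# (irradiance, temperature, power) datapoints by closed-form least squares.

def predict(C, irradiance, temp_c, gamma=0.004):
    """Toy physical model: output scales with irradiance, derated above 25 C."""
    return C * irradiance * (1 - gamma * max(0.0, temp_c - 25.0))

def calibrate(observations, gamma=0.004):
    """Least-squares fit of C from (irradiance, temp_c, power) tuples."""
    num = den = 0.0
    for irr, temp, power in observations:
        x = irr * (1 - gamma * max(0.0, temp - 25.0))
        num += x * power   # accumulate x*y for the normal equation
        den += x * x       # accumulate x*x
    return num / den
```

Because the model is linear in its single unknown, two datapoints already pin it down, which is the contrast with data-hungry pure-ML models that the abstract draws.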
(Mostly) Physical black-box Solar Modeling. The physical model used in the hybrid model above performs significantly worse than other approaches. To determine the primary source of this inaccuracy, we conduct a large-scale data analysis and show that the only weather metrics that affect solar output are temperature and cloud cover, and we derive a new physical model that accurately quantifies cloud cover's effect on solar generation at all sites. We then enhance our physical model with an ML model that learns each site's unique shading effect, and we show that this hybrid model yields higher accuracy than current state-of-the-art ML approaches. We also identify a universal weather-solar effect that has not been articulated before and is broadly applicable to other solar analytics.
Solar Disaggregation. Solar forecast models require historical solar generation data for training. Unfortunately, pure solar generation data is often not available, as the vast majority of small-scale residential solar deployments (<10 kW) are Behind the Meter (BTM), such that the smart meter data exposed to utilities represents only the net of a building's solar generation and its energy consumption. To address this problem, we design SunDance, a "black-box" system that leverages the clear-sky maximum solar generation model and the universal weather-solar effect from the hybrid black-box models above. We show that SunDance can accurately disaggregate solar generation from net meter data without access to a building's pure solar generation data for training.
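The disaggregation arithmetic is simple once generation is estimated: net meter power is consumption minus generation. The linear cloud attenuation below is a deliberate simplification of ours; SunDance's actual weather-solar model is richer:

```python
# Hypothetical sketch of BTM disaggregation: recover a building's consumption
# and solar generation from a single net meter reading, given an estimate of
# the site's clear-sky maximum generation and the current cloud-cover fraction.

def estimate_generation(clear_sky_max_kw, cloud_cover):
    """Attenuate clear-sky maximum generation by cloud cover in [0, 1]."""
    return clear_sky_max_kw * (1.0 - cloud_cover)

def disaggregate(net_kw, clear_sky_max_kw, cloud_cover):
    """Split a net reading into (consumption, generation) estimates.

    net = consumption - generation, so consumption = net + generation.
    """
    gen = estimate_generation(clear_sky_max_kw, cloud_cover)
    return net_kw + gen, gen
```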
Solar-based Localization. The energy data produced by solar-powered homes is considered anonymous and is usually publicly available if it is not associated with identifying account information, e.g., a name and address. Our key insight is that solar energy data is not anonymous: every location on Earth has a unique solar signature, and this signature embeds detailed location information. We design SunSpot to localize the source of solar generation data, and show that SunSpot is able to localize a solar-powered home within a 500-meter radius for per-second data and a 28-kilometer radius for per-minute data.
Weather-based Localization. However, solar-based localization has a fundamental limit due to Earth's rotation. To localize further, towards a specific home, we identify another key insight: every location on Earth has a distinct weather signature that uniquely identifies it. Interestingly, we find that localizing coarse (one-hour resolution) solar data using its weather signature is more accurate than localizing fine-grained (one-minute or one-second resolution) solar data using its solar signature. Both SunSpot and Weatherman expose a serious new privacy threat from energy data that has not been presented in the past.
Reconsidering big data security and privacy in cloud and mobile cloud systems
Large-scale distributed systems, in particular cloud and mobile cloud deployments, provide great services, improving people's quality of life and organizational efficiency. In order to match performance needs, cloud computing engages with the perils of peer-to-peer (P2P) computing and brings up P2P cloud systems as an extension of the federated cloud. Having a decentralized architecture built on independent nodes and resources, without any specific central control and monitoring, these cloud deployments are able to handle resource provisioning at very low cost. Hence, we see a vast number of mobile applications and services that are ready to scale to billions of mobile devices painlessly. Among these, data-driven applications are the most successful in terms of popularity or monetization. However, data-rich applications expose other problems to consider, including storage, big data processing, and the crucial task of protecting private or sensitive information. In this work, first, we go through the existing layered cloud architectures and present a solution addressing big data storage. Secondly, we explore the use of P2P Cloud Systems (P2PCS) for big data processing and analytics. Thirdly, we propose an efficient hybrid mobile cloud computing model based on the cloudlets concept, and we apply this model to health care systems as a case study. The model is then simulated using the Mobile Cloud Computing Simulator (MCCSIM). According to the experimental power and delay results, the hybrid cloud model performs up to 75% better than traditional cloud models. Lastly, we enhance our proposals by presenting and analyzing security and privacy countermeasures against possible attacks.
Integrated Face Analytics Networks through Cross-Dataset Hybrid Training
Face analytics benefits many multimedia applications. It consists of a number
of tasks, such as facial emotion recognition and face parsing, and most
existing approaches treat these tasks independently, which limits their
deployment in real scenarios. In this paper we propose an integrated Face
Analytics Network (iFAN), which is able to perform multiple face analytics
tasks jointly, with a novel, carefully designed network architecture that
fully facilitates informative interaction among the different tasks. The
proposed integrated network explicitly models the interactions between tasks
so that the correlations between them can be fully exploited to boost
performance. In addition, to overcome the absence of datasets with
comprehensive training data for all tasks, we propose a novel cross-dataset
hybrid training strategy. It allows "plug-in and play" of multiple datasets
annotated for different tasks, without requiring a fully labeled common
dataset for all the tasks. We experimentally show that the proposed iFAN
achieves state-of-the-art performance on multiple face analytics tasks using a
single integrated model. Specifically, iFAN achieves an overall F-score of
91.15% on the Helen dataset for face parsing, a normalized mean error of 5.81%
on the MTFL dataset for facial landmark localization, and an accuracy of
45.73% on the BNU dataset for emotion recognition with a single model.
Comment: 10 page
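The scheduling logic behind cross-dataset hybrid training can be sketched without any deep learning framework. Each dataset carries labels for one task only, so each step updates the shared backbone plus only that task's head. The round-robin schedule and the stand-in step functions are our assumptions, not the paper's implementation:

```python
import itertools

# Hypothetical sketch of a cross-dataset hybrid training schedule: datasets
# annotated for different tasks are "plugged in", and training round-robins
# over them. shared_step/head_step are stand-ins for gradient updates to the
# shared backbone and to a single task-specific head, respectively.

def hybrid_train(datasets, shared_step, head_step, steps=9):
    """datasets: {task_name: iterator of batches}. Returns the task schedule."""
    log = []
    for _, task in zip(range(steps), itertools.cycle(datasets)):
        batch = next(datasets[task])
        shared_step(batch)       # update shared backbone on this batch
        head_step(task, batch)   # update only the head that has labels here
        log.append(task)
    return log

# Usage: three datasets, each annotated for a different face-analytics task.
data = {t: itertools.cycle([f"{t}_batch"])
        for t in ("parsing", "landmarks", "emotion")}
schedule = hybrid_train(data, shared_step=lambda b: None,
                        head_step=lambda t, b: None, steps=6)
```

The point of the sketch is that no batch ever needs labels for all tasks at once, which is what removes the requirement of a fully labeled common dataset.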
Exploring Application Performance on Emerging Hybrid-Memory Supercomputers
Next-generation supercomputers will feature more hierarchical and
heterogeneous memory systems, with different memory technologies working
side-by-side. A critical question is whether, at large scale, existing HPC
applications and emerging data-analytics workloads will see performance
improvements or degradation on these systems. We propose a systematic and fair
methodology to identify the trend of application performance on emerging
hybrid-memory systems. We model the memory system of next-generation
supercomputers as a combination of "fast" and "slow" memories. We then analyze
the performance and dynamic execution characteristics of a variety of
workloads, from traditional scientific applications to emerging data
analytics, to compare traditional and hybrid-memory systems. Our results show
that data analytics applications can clearly benefit from the new system
design, especially at large scale. Moreover, hybrid-memory systems do not
penalize traditional scientific applications, which may also show performance
improvements.
Comment: 18th International Conference on High Performance Computing and
Communications, IEEE, 201
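The fast/slow abstraction lends itself to a first-order latency model. The latency numbers below are placeholders we chose for illustration, not measurements from the paper:

```python
# Toy model of a hybrid-memory system as a "fast" and a "slow" tier:
# average access latency, and slowdown relative to an all-fast system,
# as a function of the fraction of accesses served from fast memory.
# The 80 ns / 300 ns figures are illustrative placeholders.

def avg_latency(frac_fast, t_fast_ns=80.0, t_slow_ns=300.0):
    """Average memory access latency for a given fast-memory access fraction."""
    return frac_fast * t_fast_ns + (1.0 - frac_fast) * t_slow_ns

def slowdown(frac_fast, t_fast_ns=80.0, t_slow_ns=300.0):
    """Slowdown of the hybrid system relative to an all-fast-memory system."""
    return avg_latency(frac_fast, t_fast_ns, t_slow_ns) / t_fast_ns
```

Sweeping `frac_fast` over a workload's measured access mix is one simple way to project whether it lands on the improvement or degradation side of the question the abstract poses.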
Unleashing the Power of Hashtags in Tweet Analytics with Distributed Framework on Apache Storm
Twitter is a popular social network platform where users can interact and post
texts of up to 280 characters, called tweets. Hashtags, hyperlinked words in
tweets, have increasingly become crucial for tweet retrieval and search. Using
hashtags for tweet topic classification is a challenging problem because of
context dependence among words, slang, abbreviations, and emoticons in short
tweets, along with the evolving use of hashtags. Since Twitter generates
millions of tweets daily, tweet analytics is a fundamental Big Data streaming
problem that often requires real-time distributed processing. This paper
proposes a distributed online approach to tweet topic classification with
hashtags. Implemented on Apache Storm, a distributed real-time framework, our
approach incrementally identifies and updates a set of strong predictors in
the Naïve Bayes model for classifying each incoming tweet instance.
Preliminary experiments show promising results, with up to 97% accuracy and a
37% increase in throughput on eight processors.
Comment: IEEE International Conference on Big Data 201
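The incremental-update idea can be sketched as a plain multinomial Naïve Bayes classifier whose counts are folded in one tweet at a time, as a Storm bolt might do. This is a standard formulation with Laplace smoothing, not the paper's strong-predictor selection scheme:

```python
import math
from collections import defaultdict

# Hypothetical sketch: streaming Naive Bayes for hashtag-topic classification.
# Running word counts per topic are updated per tweet; classification uses
# the current counts with Laplace (add-one) smoothing.

class IncrementalNB:
    def __init__(self):
        self.word_counts = defaultdict(lambda: defaultdict(int))
        self.topic_counts = defaultdict(int)
        self.vocab = set()

    def update(self, topic, words):
        """Fold one labeled tweet into the running counts."""
        self.topic_counts[topic] += 1
        for w in words:
            self.word_counts[topic][w] += 1
            self.vocab.add(w)

    def classify(self, words):
        """Return the most likely topic under the current counts."""
        total = sum(self.topic_counts.values())
        best, best_lp = None, float("-inf")
        for topic, n in self.topic_counts.items():
            lp = math.log(n / total)  # log prior
            denom = sum(self.word_counts[topic].values()) + len(self.vocab)
            for w in words:
                lp += math.log((self.word_counts[topic][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = topic, lp
        return best
```

In a Storm topology, `update` would live in a training bolt consuming labeled tweets and `classify` in a downstream bolt, which is what makes the approach both online and distributed.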