Stochastic Modeling of Hybrid Cache Systems
In recent years, there has been increasing demand for big-memory systems to
perform large-scale data analytics. Since DRAM is expensive, some researchers
have suggested using other memory technologies, such as non-volatile memory
(NVM), to build large-memory computing systems. However, whether NVM can be a
viable alternative (both economically and technically) to DRAM remains an open
question. To answer this question, it is important to consider how to design a
memory system from a "system perspective", that is, incorporating the
different performance characteristics and price ratios of hybrid memory
devices.
This paper presents an analytical model of a "hybrid page cache system" to
understand the diverse design space and performance impact of a hybrid cache
system. We consider (1) various architectural choices, (2) design strategies,
and (3) configurations of different memory devices. Using this model, we
provide guidelines on how to design a hybrid page cache that reaches a good
trade-off between high system throughput (in I/O per second, or IOPS) and fast
cache reactivity, which we define as the time to fill the cache. We also show
how one can configure the DRAM capacity and NVM capacity under a fixed budget.
We pick PCM as an example of NVM and conduct numerical analysis. Our analysis
indicates that incorporating PCM in a page cache system significantly improves
system performance, and that allocating more PCM to the page cache yields
larger benefits in some cases. Moreover, for the common performance-price
ratio of PCM, a "flat architecture" is the better choice, but a "layered
architecture" outperforms it if PCM write performance can be significantly
improved in the future.
Comment: 14 pages; mascots 201
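The budget-constrained DRAM/NVM split described above can be illustrated with a toy model. Everything below — the saturating hit-rate curve, the device latencies, and the per-GB prices — is a hypothetical placeholder for illustration, not the paper's analytical model:

```python
# Toy sketch of splitting a fixed budget between DRAM and PCM in a "flat"
# hybrid page cache, estimating throughput (IOPS) for each split.
# All latencies, prices, and the hit-rate curve are illustrative assumptions.

def hit_rate(capacity_gb, working_set_gb=512):
    """Toy hit-rate curve: grows linearly with cache size, saturating at 1."""
    return min(1.0, capacity_gb / working_set_gb)

def iops(dram_gb, pcm_gb, t_dram_us=0.1, t_pcm_us=1.0, t_disk_us=100.0):
    """Flat architecture: DRAM and PCM are siblings; a cache miss goes to disk."""
    h = hit_rate(dram_gb + pcm_gb)
    # Assume hits spread across DRAM and PCM in proportion to capacity.
    frac_dram = dram_gb / (dram_gb + pcm_gb)
    t_hit = frac_dram * t_dram_us + (1 - frac_dram) * t_pcm_us
    t_avg = h * t_hit + (1 - h) * t_disk_us
    return 1e6 / t_avg  # accesses per second

def best_split(budget, price_dram=8.0, price_pcm=2.0, step_gb=16):
    """Enumerate DRAM/PCM splits under a fixed budget; return (iops, dram, pcm)."""
    best = None
    dram = step_gb
    while dram * price_dram < budget:
        pcm = (budget - dram * price_dram) / price_pcm
        cand = (iops(dram, pcm), dram, pcm)
        best = max(best, cand) if best else cand
        dram += step_gb
    return best
```

With a cheaper-but-slower PCM tier, sweeping the split like this makes the throughput-versus-capacity trade-off of the flat architecture concrete.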
A Hybrid Recommender System for Patient-Doctor Matchmaking in Primary Care
We partner with a leading European healthcare provider and design a mechanism
to match patients with family doctors in primary care. We define the
matchmaking process for several distinct use cases given different levels of
available information about patients. Then, we adopt a hybrid recommender
system to present each patient with a list of family doctor recommendations.
In particular, we model patient trust in family doctors using a large-scale
dataset of consultation histories, while accounting for the temporal dynamics
of their relationships. Our proposed approach shows higher predictive accuracy
than both a heuristic baseline and a collaborative filtering approach, and the
proposed trust measure further improves model performance.
Comment: This paper is accepted at DSAA 2018 as a full paper, Proc. of the 5th
IEEE International Conference on Data Science and Advanced Analytics (DSAA),
Turin, Ital
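A trust measure with temporal dynamics, as the abstract describes, might look like the following sketch. The exponential decay, the blend weight, and the popularity prior are all hypothetical choices of ours, not the paper's actual formulation:

```python
import math
from collections import defaultdict

# Hypothetical sketch: score family doctors for a patient by blending a
# popularity prior with a trust measure built from consultation histories,
# where older consultations are discounted by exponential decay.
# The half-life and blend weight alpha are illustrative, not from the paper.

def trust(consultation_days, now, half_life_days=365.0):
    """Trust of a patient in a doctor: time-decayed count of past visits."""
    lam = math.log(2) / half_life_days
    return sum(math.exp(-lam * (now - day)) for day in consultation_days)

def recommend(patient_history, doctor_popularity, now, alpha=0.7, k=3):
    """Rank doctors by alpha * trust + (1 - alpha) * popularity."""
    scores = defaultdict(float)
    for doctor, pop in doctor_popularity.items():
        scores[doctor] = (1 - alpha) * pop
    for doctor, days in patient_history.items():
        scores[doctor] += alpha * trust(days, now)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The popularity term lets the sketch fall back to a non-personalized ranking when a patient has no consultation history, mirroring the cold-start cases the paper's distinct use cases address.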
Hybrid Black-box Solar Analytics and their Privacy Implications
The aggregate solar capacity in the U.S. is rising rapidly due to continuing decreases in the cost of solar modules. For example, the installed cost per Watt (W) for residential photovoltaics (PVs) decreased by 6X from 2009 to 2018 (from 1.2/W), resulting in the installed aggregate solar capacity increasing 128X from 2009 to 2018 (from 435 megawatts to 55.9 gigawatts). This increasing solar capacity is imposing operational challenges on utilities in balancing electricity's real-time supply and demand, as solar generation is more stochastic and less predictable than aggregate demand.
To address this problem, both academia and utilities have shown strong interest in solar analytics that accurately monitor, predict, and react to variations in intermittent solar power. Prior solar analytics are mostly white-box approaches that are based on site-specific information and require expert knowledge, and thus do not scale; recent research instead focuses on black-box approaches that use training data to automatically learn a custom machine learning (ML) model. Unfortunately, this approach requires months to years of training data and often does not incorporate well-known physical models of solar generation, which reduces its accuracy. Instead, in this dissertation, we present a hybrid black-box approach that brings the best of both worlds to solar analytics. Our hypothesis is that the hybrid black-box approach can enable a wide range of accurate solar analytics, including modeling, disaggregation, and localization, with limited training data and without knowledge of key system parameters, by integrating black-box machine learning approaches with white-box physical models. In evaluating our hypothesis, we make the following contributions:
(Mostly) ML black-box Solar Modeling. To get the benefits of both ML and physical approaches, we present a configurable hybrid black-box ML approach that combines well-known relationships from physical models with unknown relationships learned via ML. Rather than manually determining values for physical model parameters, our approach automatically calibrates them by finding the values that best fit the data. This calibration requires much less data (as few as 2 datapoints) than training an ML model. We show that our hybrid approach significantly improves solar modeling accuracy.
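The few-datapoint calibration idea can be sketched as follows. The model form (output linear in irradiance, derated above 25 C) and the temperature coefficient are textbook PV assumptions of ours, not the dissertation's exact equations:

```python
# Hypothetical sketch: a physical PV model with one unknown size/efficiency
# parameter C, calibrated from as few as two observed
# (irradiance, temperature, power) datapoints by closed-form least squares.

def predict(C, irradiance, temp_c, gamma=0.004):
    """Toy physical model: output scales with irradiance, derated above 25 C."""
    return C * irradiance * (1 - gamma * max(0.0, temp_c - 25.0))

def calibrate(observations, gamma=0.004):
    """Least-squares fit of C from (irradiance, temp_c, power) tuples."""
    num = den = 0.0
    for irr, temp, power in observations:
        x = irr * (1 - gamma * max(0.0, temp - 25.0))
        num += x * power   # accumulate x*y for the normal equation
        den += x * x       # accumulate x*x
    return num / den
```

Because the model is linear in its single unknown, two datapoints already pin it down, which is the contrast with data-hungry pure-ML models that the abstract draws.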
(Mostly) Physical black-box Solar Modeling. The physical model used in the hybrid model above performs significantly worse than other approaches. To determine the primary source of this inaccuracy, we conduct a large-scale data analysis and show that the only weather metrics that affect solar output are temperature and cloud cover, and we derive a new physical model that accurately quantifies cloud cover's effect on solar generation at all sites. We then enhance our physical model with an ML model that learns each site's unique shading effect, and we show that this hybrid model yields higher accuracy than current state-of-the-art ML approaches. We also identify a universal weather-solar effect that has not been articulated before and is broadly applicable to other solar analytics.
Solar Disaggregation. Solar forecast models require historical solar generation data for training. Unfortunately, pure solar generation data is often not available, as the vast majority of small-scale residential solar deployments (<10 kW) are Behind the Meter (BTM), such that the smart meter data exposed to utilities represents only the net of a building's solar generation and its energy consumption. To address this problem, we design SunDance, a "black-box" system that leverages the clear-sky maximum solar generation model and the universal weather-solar effect from the hybrid black-box models above. We show that SunDance can accurately disaggregate solar generation from net meter data without access to a building's pure solar generation data for training.
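The disaggregation arithmetic is simple once generation is estimated: net meter power is consumption minus generation. The linear cloud attenuation below is a deliberate simplification of ours; SunDance's actual weather-solar model is richer:

```python
# Hypothetical sketch of BTM disaggregation: recover a building's consumption
# and solar generation from a single net meter reading, given an estimate of
# the site's clear-sky maximum generation and the current cloud-cover fraction.

def estimate_generation(clear_sky_max_kw, cloud_cover):
    """Attenuate clear-sky maximum generation by cloud cover in [0, 1]."""
    return clear_sky_max_kw * (1.0 - cloud_cover)

def disaggregate(net_kw, clear_sky_max_kw, cloud_cover):
    """Split a net reading into (consumption, generation) estimates.

    net = consumption - generation, so consumption = net + generation.
    """
    gen = estimate_generation(clear_sky_max_kw, cloud_cover)
    return net_kw + gen, gen
```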
Solar-based Localization. The energy data produced by solar-powered homes is considered anonymous and is usually publicly available if it is not associated with identifying account information, e.g., a name and address. Our key insight is that solar energy data is not anonymous: every location on Earth has a unique solar signature, and this signature embeds detailed location information. We design SunSpot to localize the source of solar generation data, and show that SunSpot is able to localize a solar-powered home within a 500-meter radius for per-second data and a 28-kilometer radius for per-minute data.
Weather-based Localization. However, solar-based localization has a fundamental limit due to Earth's rotation. To localize further, towards a specific home, we identify another key insight: every location on Earth has a distinct weather signature that uniquely identifies it. Interestingly, we find that localizing coarse (one-hour resolution) solar data using its weather signature is more accurate than localizing fine-grained (one-minute or one-second resolution) solar data using its solar signature. Both SunSpot and Weatherman expose a serious new privacy threat from energy data that has not been presented in the past.
Reconsidering big data security and privacy in cloud and mobile cloud systems
Large-scale distributed systems, in particular cloud and mobile cloud deployments, provide great services, improving people's quality of life and organizational efficiency. In order to match performance needs, cloud computing engages with the perils of peer-to-peer (P2P) computing and brings up P2P cloud systems as an extension of the federated cloud. Having a decentralized architecture built on independent nodes and resources, without any specific central control and monitoring, these cloud deployments are able to handle resource provisioning at very low cost. Hence, we see a vast number of mobile applications and services that are ready to scale to billions of mobile devices painlessly. Among these, data-driven applications are the most successful in terms of popularity or monetization. However, data-rich applications expose other problems to consider, including storage, big data processing, and the crucial task of protecting private or sensitive information. In this work, first, we go through the existing layered cloud architectures and present a solution addressing big data storage. Secondly, we explore the use of P2P Cloud Systems (P2PCS) for big data processing and analytics. Thirdly, we propose an efficient hybrid mobile cloud computing model based on the cloudlets concept, and we apply this model to health care systems as a case study. The model is then simulated using the Mobile Cloud Computing Simulator (MCCSIM). According to the experimental power and delay results, the hybrid cloud model performs up to 75% better than traditional cloud models. Lastly, we enhance our proposals by presenting and analyzing security and privacy countermeasures against possible attacks.
Integrated Face Analytics Networks through Cross-Dataset Hybrid Training
Face analytics benefits many multimedia applications. It consists of a number
of tasks, such as facial emotion recognition and face parsing, and most
existing approaches treat these tasks independently, which limits their
deployment in real scenarios. In this paper we propose an integrated Face
Analytics Network (iFAN), which is able to perform multiple face analytics
tasks jointly, with a novel, carefully designed network architecture that
fully facilitates informative interaction among the different tasks. The
proposed integrated network explicitly models the interactions between tasks
so that the correlations between them can be fully exploited to boost
performance. In addition, to overcome the absence of datasets with
comprehensive training data for all tasks, we propose a novel cross-dataset
hybrid training strategy. It allows "plug-in and play" of multiple datasets
annotated for different tasks, without requiring a fully labeled common
dataset for all the tasks. We experimentally show that the proposed iFAN
achieves state-of-the-art performance on multiple face analytics tasks using a
single integrated model. Specifically, iFAN achieves an overall F-score of
91.15% on the Helen dataset for face parsing, a normalized mean error of 5.81%
on the MTFL dataset for facial landmark localization, and an accuracy of
45.73% on the BNU dataset for emotion recognition with a single model.
Comment: 10 page
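The scheduling logic behind cross-dataset hybrid training can be sketched without any deep learning framework. Each dataset carries labels for one task only, so each step updates the shared backbone plus only that task's head. The round-robin schedule and the stand-in step functions are our assumptions, not the paper's implementation:

```python
import itertools

# Hypothetical sketch of a cross-dataset hybrid training schedule: datasets
# annotated for different tasks are "plugged in", and training round-robins
# over them. shared_step/head_step are stand-ins for gradient updates to the
# shared backbone and to a single task-specific head, respectively.

def hybrid_train(datasets, shared_step, head_step, steps=9):
    """datasets: {task_name: iterator of batches}. Returns the task schedule."""
    log = []
    for _, task in zip(range(steps), itertools.cycle(datasets)):
        batch = next(datasets[task])
        shared_step(batch)       # update shared backbone on this batch
        head_step(task, batch)   # update only the head that has labels here
        log.append(task)
    return log

# Usage: three datasets, each annotated for a different face-analytics task.
data = {t: itertools.cycle([f"{t}_batch"])
        for t in ("parsing", "landmarks", "emotion")}
schedule = hybrid_train(data, shared_step=lambda b: None,
                        head_step=lambda t, b: None, steps=6)
```

The point of the sketch is that no batch ever needs labels for all tasks at once, which is what removes the requirement of a fully labeled common dataset.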
Exploring Application Performance on Emerging Hybrid-Memory Supercomputers
Next-generation supercomputers will feature more hierarchical and
heterogeneous memory systems, with different memory technologies working
side-by-side. A critical question is whether, at large scale, existing HPC
applications and emerging data-analytics workloads will see performance
improvements or degradation on these systems. We propose a systematic and fair
methodology to identify the trend of application performance on emerging
hybrid-memory systems. We model the memory system of next-generation
supercomputers as a combination of "fast" and "slow" memories. We then analyze
the performance and dynamic execution characteristics of a variety of
workloads, from traditional scientific applications to emerging data
analytics, to compare traditional and hybrid-memory systems. Our results show
that data analytics applications can clearly benefit from the new system
design, especially at large scale. Moreover, hybrid-memory systems do not
penalize traditional scientific applications, which may also show performance
improvements.
Comment: 18th International Conference on High Performance Computing and
Communications, IEEE, 201
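The fast/slow abstraction lends itself to a first-order latency model. The latency numbers below are placeholders we chose for illustration, not measurements from the paper:

```python
# Toy model of a hybrid-memory system as a "fast" and a "slow" tier:
# average access latency, and slowdown relative to an all-fast system,
# as a function of the fraction of accesses served from fast memory.
# The 80 ns / 300 ns figures are illustrative placeholders.

def avg_latency(frac_fast, t_fast_ns=80.0, t_slow_ns=300.0):
    """Average memory access latency for a given fast-memory access fraction."""
    return frac_fast * t_fast_ns + (1.0 - frac_fast) * t_slow_ns

def slowdown(frac_fast, t_fast_ns=80.0, t_slow_ns=300.0):
    """Slowdown of the hybrid system relative to an all-fast-memory system."""
    return avg_latency(frac_fast, t_fast_ns, t_slow_ns) / t_fast_ns
```

Sweeping `frac_fast` over a workload's measured access mix is one simple way to project whether it lands on the improvement or degradation side of the question the abstract poses.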
Unleashing the Power of Hashtags in Tweet Analytics with Distributed Framework on Apache Storm
Twitter is a popular social network platform where users can interact and post
texts of up to 280 characters, called tweets. Hashtags, hyperlinked words in
tweets, have increasingly become crucial for tweet retrieval and search. Using
hashtags for tweet topic classification is a challenging problem because of
context dependence among words, slang, abbreviations, and emoticons in short
tweets, along with the evolving use of hashtags. Since Twitter generates
millions of tweets daily, tweet analytics is a fundamental Big Data streaming
problem that often requires real-time distributed processing. This paper
proposes a distributed online approach to tweet topic classification with
hashtags. Implemented on Apache Storm, a distributed real-time framework, our
approach incrementally identifies and updates a set of strong predictors in
the Naïve Bayes model for classifying each incoming tweet instance.
Preliminary experiments show promising results, with up to 97% accuracy and a
37% increase in throughput on eight processors.
Comment: IEEE International Conference on Big Data 201
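The incremental-update idea can be sketched as a plain multinomial Naïve Bayes classifier whose counts are folded in one tweet at a time, as a Storm bolt might do. This is a standard formulation with Laplace smoothing, not the paper's strong-predictor selection scheme:

```python
import math
from collections import defaultdict

# Hypothetical sketch: streaming Naive Bayes for hashtag-topic classification.
# Running word counts per topic are updated per tweet; classification uses
# the current counts with Laplace (add-one) smoothing.

class IncrementalNB:
    def __init__(self):
        self.word_counts = defaultdict(lambda: defaultdict(int))
        self.topic_counts = defaultdict(int)
        self.vocab = set()

    def update(self, topic, words):
        """Fold one labeled tweet into the running counts."""
        self.topic_counts[topic] += 1
        for w in words:
            self.word_counts[topic][w] += 1
            self.vocab.add(w)

    def classify(self, words):
        """Return the most likely topic under the current counts."""
        total = sum(self.topic_counts.values())
        best, best_lp = None, float("-inf")
        for topic, n in self.topic_counts.items():
            lp = math.log(n / total)  # log prior
            denom = sum(self.word_counts[topic].values()) + len(self.vocab)
            for w in words:
                lp += math.log((self.word_counts[topic][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = topic, lp
        return best
```

In a Storm topology, `update` would live in a training bolt consuming labeled tweets and `classify` in a downstream bolt, which is what makes the approach both online and distributed.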