88 research outputs found

User Privacy on Spotify: Predicting Personal Data from Music Preferences

Author: YE JIANCHENG
Publication date: 20/12/2022

The way we listen to music has changed drastically in the past decade. Now we can play any kind of music from various artists around the world through our smart devices. Many music streaming providers, if not most, are built with systems to track users’ music preferences and suggest new content. The music we listen to reveals a great deal about who we are. In general, people share their playlists and songs of their favorite artists on the music platform, find people who like the same genres, and connect with them. It is not always easy to make friends with unknown people, but music is a good way to accomplish that. In spite of that, we must also look at the other side of the coin from a security perspective. Is it a good idea to share music interests with others, or will it compromise our privacy? According to privacy experts and developers, there is no purposeless data. Everything can be used to infer private information, even a single like on social media, which seems meaningless at first sight but can reveal more information than it promises. If our musical tastes reveal our information, we may be profiled for targeted advertisement or by surveillance agencies or, in general, become potential victims of malicious activities. Since music is part of our daily lives, and there are many providers that let us listen to music, we are even more at risk of being profiled and having our data sold. In this research, we demonstrate the feasibility of inferring personal data based on playlists and songs people publicly shared on Spotify. Through an online survey, we collected a new dataset containing the private information of 750 Spotify users and we downloaded around 402,999 songs extracted from a total of 8777 playlists. Our statistical analysis shows significant correlations between users’ music preferences (e.g., music genre) and private information (e.g., age, gender, economic status). As a consequence of significant correlations, we built several machine-learning models to infer private information and our results demonstrated that such inference is possible, posing a real privacy threat to all music listeners. In particular, we accurately predicted the gender (71.7% f1-score), and several other private attributes, such as whether a person drinks (62.8% f1-score) or smokes (60.2% f1-score) regularly. The purpose of this project is to raise awareness about how seemingly purposeless data can reveal personal information and educate users about how to better protect their privacy.

Padua Thesis and Dissertation Archive

Characteristics and patterns of retention in hypertension care in primary care settings from the Hypertension Treatment in Nigeria Program

Author: et al.
Huffman Mark D
Ye Jiancheng
Publication venue: Digital Commons@Becker
Publication date: 01/09/2022

Background: More than 1.2 billion adults worldwide have hypertension. High retention in clinical care is essential for long-term management of hypertension, but 1-year retention rates are less than 50% in many resource-limited settings. Objective: To evaluate short-term retention rates and associated factors among patients with hypertension in primary health care centers in the Federal Capital Territory of Nigeria. Design, Setting, and Participants: In this cohort study, data were collected by trained study staff from adults aged 18 years or older at 60 public, primary health care centers in Nigeria between January 2020 and July 2021 as part of the Hypertension Treatment in Nigeria (HTN) Program. Patients with hypertension were registered. Exposures: Follow-up visit for hypertension care within 37 days of the registration visit. Main Outcomes and Measures: The main outcome was the 3-month rolling average 37-day retention rate in hypertension care, calculated by dividing the number of patients who had a follow-up visit within 37 days of their first (ie, registration) visit in the program by the total number of registered patients with hypertension during multiple consecutive 3-month periods. Interrupted time series analyses evaluated trends in retention rates before and after the intervention phase of the HTN Program. Mixed-effects, multivariable regression models evaluated associations between patient-, site-, and area council-level factors, hypertension treatment and control status, and 37-day retention rate. Results: In total, 10 686 patients (68.3% female; mean [SD] age, 48.8 [12.7] years) were included in the analysis. During the study period, the 3-month rolling average 37-day retention rate was 41% (95% CI, 37%-46%), with wide variability among sites. 
The retention rate was higher among patients who were older (adjusted odds ratio [aOR], 1.01 per year; 95% CI, 1.01-1.02 per year), were female (aOR, 1.11; 95% CI, 1.01-1.23), had a higher body mass index (aOR, 1.01; 95% CI, 1.00-1.02), were in the Kuje vs the Abaji area council (aOR, 2.25; 95% CI, 1.25-4.04), received hypertension treatment at the registration visit (aOR, 1.27; 95% CI, 1.07-1.50), and were registered during the postintervention period (aOR, 1.16; 95% CI, 1.06-1.26). Conclusions and Relevance: The findings suggest that retention in hypertension care is suboptimal in primary health care centers in Nigeria, although large variability among sites was found. Potentially modifiable and nonmodifiable factors associated with retention were identified and may inform multilevel, contextualized implementation strategies to improve retention.
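The outcome defined in this abstract (patients with a follow-up visit within 37 days of registration, divided by all registered patients, pooled over consecutive 3-month periods) can be illustrated with a minimal Python sketch. This is not the study's analysis code. The function names, the record layout, and the sample visits are hypothetical, and two simplifying assumptions are made: a same-day visit does not count as follow-up, and every calendar month in the range has at least one registration.

```python
from datetime import date, timedelta


def retention_rate(registrations, window_days=37):
    """Share of patients with a follow-up visit within `window_days` of registration.

    `registrations` is a list of (registration_date, first_follow_up_date or None).
    Assumption: a visit on the registration day itself is not a follow-up.
    """
    if not registrations:
        raise ValueError("no registered patients")
    limit = timedelta(days=window_days)
    retained = sum(
        1
        for registered, follow_up in registrations
        if follow_up is not None and timedelta(0) < follow_up - registered <= limit
    )
    return retained / len(registrations)


def rolling_retention(registrations, months=3, window_days=37):
    """Pooled retention rate for each run of `months` consecutive registration months.

    Assumption: every calendar month in the range has at least one registration,
    so consecutive dictionary keys are consecutive months.
    """
    by_month = {}
    for registered, follow_up in registrations:
        key = (registered.year, registered.month)
        by_month.setdefault(key, []).append((registered, follow_up))
    keys = sorted(by_month)
    rates = {}
    for i in range(len(keys) - months + 1):
        pooled = [rec for key in keys[i:i + months] for rec in by_month[key]]
        # Label each window by its last month.
        rates[keys[i + months - 1]] = retention_rate(pooled, window_days)
    return rates


# Hypothetical visits, for illustration only.
sample_visits = [
    (date(2020, 1, 5), date(2020, 2, 1)),   # 27 days: retained
    (date(2020, 1, 10), None),              # never returned
    (date(2020, 2, 3), date(2020, 3, 20)),  # 46 days: too late
    (date(2020, 3, 1), date(2020, 3, 30)),  # 29 days: retained
    (date(2020, 4, 2), date(2020, 5, 9)),   # 37 days: retained
]
sample_rates = rolling_retention(sample_visits)
```

Pooling the numerator and denominator within each window, rather than averaging three monthly rates, follows the wording of the abstract; the study itself may have handled month boundaries differently.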

Digital Commons@Becker

Personalized Federated Learning with Hidden Information on Personalized Prior

Author: Lv Jiancheng
Shi Mingjia
Ye Qing
Zhou Yuhao
Publication date: 24/11/2022

Federated learning (FL for simplification) is a distributed machine learning technique that utilizes global servers and collaborative clients to achieve privacy-preserving global model training without direct data sharing. However, the heterogeneous data problem, as one of FL's main problems, makes it difficult for the global model to perform effectively on each client's local data. Thus, personalized federated learning (PFL for simplification) aims to improve the performance of the model on local data as much as possible. Bayesian learning, where the parameters of the model are seen as random variables with a prior assumption, is a feasible solution to the heterogeneous data problem because the more local data the model uses, the more it focuses on the local data; otherwise it focuses on the prior. When Bayesian learning is applied to PFL, the global model provides global knowledge as a prior to the local training process. In this paper, we employ Bayesian learning to model PFL by assuming a prior in the scaled exponential family, and therefore propose pFedBreD, a framework to solve the problem we model using Bregman divergence regularization. Empirically, our experiments show that, under the prior assumption of the spherical Gaussian and the first order strategy of mean selection, our proposal significantly outcompetes other PFL algorithms on multiple public benchmarks. Comment: 19 pages, 6 figures, 3 tables

arXiv.org e-Print Archive

DBS: Dynamic Batch Size For Distributed Deep Neural Network Training

Author: Lv Jiancheng
Shi Mingjia
Sun Yanan
Ye Qing
Zhou Yuhao
Publication date: 03/11/2022

Synchronous strategies with data parallelism, such as the Synchronous Stochastic Gradient Descent (S-SGD) and the model averaging methods, are widely utilized in distributed training of Deep Neural Networks (DNNs), largely owing to its easy implementation yet promising performance. Particularly, each worker of the cluster hosts a copy of the DNN and an evenly divided share of the dataset with the fixed mini-batch size, to keep the training of DNNs convergence. In the strategies, the workers with different computational capability, need to wait for each other because of the synchronization and delays in network transmission, which will inevitably result in the high-performance workers wasting computation. Consequently, the utilization of the cluster is relatively low. To alleviate this issue, we propose the Dynamic Batch Size (DBS) strategy for the distributed training of DNNs. Specifically, the performance of each worker is evaluated first based on the fact in the previous epoch, and then the batch size and dataset partition are dynamically adjusted in consideration of the current performance of the worker, thereby improving the utilization of the cluster. To verify the effectiveness of the proposed strategy, extensive experiments have been conducted, and the experimental results indicate that the proposed strategy can fully utilize the performance of the cluster, reduce the training time, and have good robustness with disturbance by irrelevant tasks. Furthermore, rigorous theoretical analysis has also been provided to prove the convergence of the proposed strategy. Comment: The latest version of this article has been accepted by IEEE TETC
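The idea of resizing each worker's batch according to its performance in the previous epoch can be sketched as follows. This is a minimal illustration, not the DBS update rule from the paper, which the abstract does not spell out. It assumes that performance is measured as throughput (samples per second), that the global batch size stays fixed, and that shares are proportional to throughput. The function name `rebalance_batch_sizes` and the sample measurements are hypothetical.

```python
def rebalance_batch_sizes(global_batch, samples_done, seconds):
    """Split `global_batch` across workers in proportion to measured throughput.

    samples_done[i] and seconds[i] describe worker i's previous epoch.
    Assumption: proportional allocation; the paper's actual rule may differ.
    """
    if len(samples_done) != len(seconds) or not seconds:
        raise ValueError("need one measurement per worker")
    if any(s <= 0 for s in seconds):
        raise ValueError("epoch times must be positive")

    throughput = [n / s for n, s in zip(samples_done, seconds)]
    total = sum(throughput)
    if total <= 0:
        raise ValueError("no worker processed any samples")

    # Ideal (fractional) share of the global batch for each worker.
    shares = [global_batch * t / total for t in throughput]
    sizes = [int(share) for share in shares]

    # Largest-remainder rounding so the sizes add up to global_batch exactly.
    leftover = global_batch - sum(sizes)
    by_fraction = sorted(
        range(len(shares)), key=lambda i: shares[i] - sizes[i], reverse=True
    )
    for i in by_fraction[:leftover]:
        sizes[i] += 1
    return sizes


# Hypothetical measurements: four workers, the last two slowed by other tasks.
new_sizes = rebalance_batch_sizes(
    global_batch=256,
    samples_done=[1000, 1000, 1000, 1000],
    seconds=[10.0, 10.0, 20.0, 40.0],
)
```

Under the same assumption, the dataset partition for the next epoch would be split with the same proportions, so that all workers finish their share at roughly the same time.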

arXiv.org e-Print Archive

A two-stage framework for short-term wind power forecasting using different feature-learning models

Author: Hua Li
Jiancheng Qin
Jin Yang
Qiang Ye
Ying Chen
Publication date: 30/05/2020

With the growing dependence on wind power generation, improving the accuracy of short-term forecasting has become increasingly important for ensuring continued economical and reliable system operations. In the wind power forecasting field, ensemble-based forecasting models have been studied extensively; however, few of them considered learning the features from both historical wind data and NWP data. In addition, the exploration of the multiple-input and multiple-output learning structures is lacking in the wind power forecasting literature. Therefore, this study uses the NWP and historical wind data as input and proposes a two-stage forecasting framework built on a moving window algorithm. Specifically, at the first stage, four forecasting models are constructed with deep neural networks considering the multiple-input and multiple-output structures; at the second stage, an ensemble model is developed using the ridge regression method to reduce the extrapolation error. The experiments are conducted on three existing wind farms to examine the 2-h ahead forecasting point. The results demonstrate that 1) the single-input-multiple-output (SIMO) structure leads to better forecasting accuracy than the other three; 2) the ridge regression method yields a better ensemble model, one that further improves the forecasting accuracy, than the other machine learning methods; 3) the proposed two-stage forecasting framework is likely to generate more accurate and stable results than the other existing algorithms.
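The second stage described in this abstract, a ridge regression that combines the first-stage forecasts, can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the base forecasts are taken as given numbers rather than deep neural network outputs, no intercept is fitted, and the names `solve_linear`, `fit_ridge_ensemble`, and `combine`, as well as the sample data, are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_ridge_ensemble(base_preds, targets, alpha=1.0):
    """Ridge weights w = (P^T P + alpha * I)^-1 P^T y, one weight per base model.

    base_preds[t][k] is model k's forecast at time t; targets[t] is the observed value.
    Assumption: no intercept term.
    """
    k = len(base_preds[0])
    gram = [
        [
            sum(row[i] * row[j] for row in base_preds) + (alpha if i == j else 0.0)
            for j in range(k)
        ]
        for i in range(k)
    ]
    moment = [
        sum(row[i] * y for row, y in zip(base_preds, targets)) for i in range(k)
    ]
    return solve_linear(gram, moment)


def combine(weights, preds):
    """Ensemble forecast: weighted sum of the base-model forecasts."""
    return sum(w * p for w, p in zip(weights, preds))


# Hypothetical forecasts from two base models and the observed power values.
history = [[1.0, 3.0], [2.0, 2.0], [4.0, 1.0]]
observed = [2.0, 2.0, 2.5]
plain_weights = fit_ridge_ensemble(history, observed, alpha=0.0)
shrunk_weights = fit_ridge_ensemble(history, observed, alpha=10.0)
```

The penalty `alpha` shrinks the weights toward zero, which is one way a ridge ensemble can limit the influence of any single base model when forecasts fall outside the range seen during fitting.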

arXiv.org e-Print Archive

Directory of Open Access Journals

Eight-Year Surveillance of Antimicrobial Resistance among Enterobacter Cloacae Isolated in the First Bethune Hospital

Author: Wang Ailin
Xu Jiancheng
Yuan Ye
Zhang Man
Zhou Qi
Publication venue: Published by Elsevier B.V.
Publication date: 31/12/2012

This study investigated the antimicrobial resistance of Enterobacter cloacae isolated over 8 consecutive years in the First Bethune Hospital. The disk diffusion test was used to study the antimicrobial resistance. The data were analyzed with WHONET 5 software according to Clinical and Laboratory Standards Institute (CLSI) guidelines. Of the 683 strains of Enterobacter cloacae collected during the 8 years, most came from sputum (410, 60.0%), secretions and pus (105, 15.4%), and urine (69, 10.1%). No Enterobacter cloacae was resistant to imipenem and meropenem in the First Bethune Hospital. The antimicrobial resistance of Enterobacter cloacae increased over the 8 years. Changes in antimicrobial resistance should be monitored in order to guide rational drug use in the clinic and prevent drug-resistant bacterial strains from being transmitted.

Elsevier - Publisher Connector