88 research outputs found
User Privacy on Spotify: Predicting Personal Data from Music Preferences
The way we listen to music has changed drastically in the past decade. Now we can play any
kind of music from various artists around the world through our smart devices. Many music
streaming providers, if not most, are built with systems to track users’ music preferences and
suggest new content.
The music we listen to reveals a great deal about who we are. People commonly share
playlists and songs by their favorite artists on music platforms, find others who like the same
genres, and connect with them. Making friends with strangers is not always easy, and music
is a good way to do it. Even so, we must also look at the other side of the coin
from a security perspective. Is it a good idea to share music interests with
others, or will it compromise our privacy? According to privacy experts and developers, there
is no purposeless data. Anything can be used to infer private information; even a single like
on social media, which seems meaningless at first sight, can reveal more than it
appears to. If our musical tastes reveal personal information, we may be profiled
for targeted advertising or by surveillance agencies, or more generally become victims of
malicious activities. Since music is part of our daily lives and many providers let
us listen to it, we are even more at risk of being profiled and having our data sold.
In this research, we demonstrate the feasibility of inferring personal data based on playlists
and songs people publicly shared on Spotify. Through an online survey, we collected a new
dataset containing the private information of 750 Spotify users and we downloaded around
402,999 songs extracted from a total of 8777 playlists. Our statistical analysis shows significant
correlations between users’ music preferences (e.g., music genre) and private information (e.g.,
age, gender, economic status).
Given these significant correlations, we built several machine-learning models to
infer private information, and our results demonstrated that such inference is possible, posing
a real privacy threat to all music listeners. In particular, we predicted gender
(71.7% F1-score) and several other private attributes, such as whether a person drinks (62.8%
F1-score) or smokes (60.2% F1-score) regularly.
The purpose of this project is to raise awareness about how seemingly purposeless data can
reveal personal information and educate users about how to better protect their privacy.
Characteristics and patterns of retention in hypertension care in primary care settings from the Hypertension Treatment in Nigeria Program
Background: More than 1.2 billion adults worldwide have hypertension. High retention in clinical care is essential for long-term management of hypertension, but 1-year retention rates are less than 50% in many resource-limited settings.
Objective: To evaluate short-term retention rates and associated factors among patients with hypertension in primary health care centers in the Federal Capital Territory of Nigeria.
Design, Setting, and Participants: In this cohort study, data were collected by trained study staff from adults aged 18 years or older at 60 public, primary health care centers in Nigeria between January 2020 and July 2021 as part of the Hypertension Treatment in Nigeria (HTN) Program. Patients with hypertension were registered.
Exposures: Follow-up visit for hypertension care within 37 days of the registration visit.
Main Outcomes and Measures: The main outcome was the 3-month rolling average 37-day retention rate in hypertension care, calculated by dividing the number of patients who had a follow-up visit within 37 days of their first (ie, registration) visit in the program by the total number of registered patients with hypertension during multiple consecutive 3-month periods. Interrupted time series analyses evaluated trends in retention rates before and after the intervention phase of the HTN Program. Mixed-effects, multivariable regression models evaluated associations between patient-, site-, and area council-level factors, hypertension treatment and control status, and 37-day retention rate.
Results: In total, 10 686 patients (68.3% female; mean [SD] age, 48.8 [12.7] years) were included in the analysis. During the study period, the 3-month rolling average 37-day retention rate was 41% (95% CI, 37%-46%), with wide variability among sites. The retention rate was higher among patients who were older (adjusted odds ratio [aOR], 1.01 per year; 95% CI, 1.01-1.02 per year), were female (aOR, 1.11; 95% CI, 1.01-1.23), had a higher body mass index (aOR, 1.01; 95% CI, 1.00-1.02), were in the Kuje vs the Abaji area council (aOR, 2.25; 95% CI, 1.25-4.04), received hypertension treatment at the registration visit (aOR, 1.27; 95% CI, 1.07-1.50), and were registered during the postintervention period (aOR, 1.16; 95% CI, 1.06-1.26).
Conclusions and Relevance: The findings suggest that retention in hypertension care is suboptimal in primary health care centers in Nigeria, although large variability among sites was found. Potentially modifiable and nonmodifiable factors associated with retention were identified and may inform multilevel, contextualized implementation strategies to improve retention.
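The retention measure described under Main Outcomes and Measures can be illustrated with a minimal sketch. It computes the rate for a single period only (not the 3-month rolling average), treats "within 37 days" as inclusive of day 37, and uses a hypothetical function name and made-up example dates; none of these details are taken from the study's own analysis code.

```python
from datetime import date

def retention_rate(visits, window_days=37):
    """Share of registered patients with a follow-up visit within window_days.

    visits: list of (registration_date, follow_up_date or None) pairs.
    Assumption: a follow-up on exactly day 37 counts as retained.
    """
    if not visits:
        raise ValueError("no registered patients in this period")
    retained = sum(
        1
        for registered, follow_up in visits
        if follow_up is not None
        and 0 < (follow_up - registered).days <= window_days
    )
    return retained / len(visits)

# Hypothetical patients, for illustration only.
patients = [
    (date(2020, 1, 6), date(2020, 2, 3)),   # follow-up after 28 days: retained
    (date(2020, 1, 6), date(2020, 3, 2)),   # follow-up after 56 days: too late
    (date(2020, 1, 7), None),               # never returned
    (date(2020, 1, 8), date(2020, 2, 14)),  # follow-up after 37 days: retained
]
rate = retention_rate(patients)
print(rate)
```

A rolling average would repeat this calculation over consecutive 3-month registration windows and average the results.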
Personalized Federated Learning with Hidden Information on Personalized Prior
Federated learning (FL) is a distributed machine learning
technique that uses global servers and collaborative clients to achieve
privacy-preserving global model training without direct data sharing. However,
the heterogeneous data problem, one of FL's main problems, makes it difficult
for the global model to perform effectively on each client's local data. Thus,
personalized federated learning (PFL) aims to improve the
performance of the model on local data as much as possible. Bayesian learning,
where the parameters of the model are seen as random variables with a prior
assumption, is a feasible solution to the heterogeneous data problem because
the more local data the model uses, the more it focuses on the
local data; otherwise it focuses on the prior. When Bayesian learning is applied
to PFL, the global model provides global knowledge as a prior to the local
training process. In this paper, we employ Bayesian learning to model PFL by
assuming a prior in the scaled exponential family, and therefore propose
pFedBreD, a framework to solve the problem we model using Bregman divergence
regularization. Empirically, our experiments show that, under the prior
assumption of the spherical Gaussian and the first order strategy of mean
selection, our proposal significantly outperforms other PFL algorithms on
multiple public benchmarks.
Comment: 19 pages, 6 figures, 3 tables
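For readers unfamiliar with the regularizer named in this abstract, the standard textbook definition of a Bregman divergence is reproduced here. The symbols (a strictly convex, differentiable generator \varphi, and points x and y) are generic notation chosen for this note and are not taken from the paper.

```latex
D_{\varphi}(x, y) = \varphi(x) - \varphi(y) - \langle \nabla \varphi(y),\, x - y \rangle

% Special case: \varphi(x) = \tfrac{1}{2}\lVert x \rVert^2 gives
D_{\varphi}(x, y)
  = \tfrac{1}{2}\lVert x \rVert^2 - \tfrac{1}{2}\lVert y \rVert^2 - \langle y,\, x - y \rangle
  = \tfrac{1}{2}\lVert x - y \rVert^2
```

The special case is the squared Euclidean distance, which equals the negative log-density of a spherical Gaussian centered at y up to a scale factor and an additive constant. This is one way to see why a spherical Gaussian prior pairs naturally with a quadratic penalty; how pFedBreD actually instantiates the regularizer is specified in the paper itself, not here.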
DBS: Dynamic Batch Size For Distributed Deep Neural Network Training
Synchronous strategies with data parallelism, such as Synchronous
Stochastic Gradient Descent (S-SGD) and model averaging methods, are widely
used in distributed training of Deep Neural Networks (DNNs), largely owing
to their easy implementation and promising performance. In particular, each worker
in the cluster hosts a copy of the DNN and an evenly divided share of the
dataset with a fixed mini-batch size, to keep the training of the DNN
convergent. In these strategies, workers with different computational
capabilities need to wait for each other because of synchronization and
delays in network transmission, which inevitably results in the
high-performance workers wasting computation. Consequently, the utilization of
the cluster is relatively low. To alleviate this issue, we propose the Dynamic
Batch Size (DBS) strategy for the distributed training of DNNs. Specifically,
the performance of each worker is first evaluated based on the
previous epoch, and then the batch size and dataset partition are dynamically
adjusted in consideration of the current performance of the worker, thereby
improving the utilization of the cluster. To verify the effectiveness of the
proposed strategy, extensive experiments have been conducted, and the
experimental results indicate that the proposed strategy can fully utilize the
performance of the cluster, reduce the training time, and remain
robust to disturbance by irrelevant tasks. Furthermore, a rigorous
theoretical analysis has also been provided to prove the convergence of the
proposed strategy.
Comment: The latest version of this article has been accepted by IEEE TETC
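The idea of adjusting each worker's batch size from its measured performance in the previous epoch can be sketched as follows. This is only an illustration under a simple assumption, namely that batch sizes are made proportional to each worker's throughput; the abstract does not state the paper's exact update rule, and the function name and example numbers are hypothetical.

```python
def rebalance_batch_sizes(samples_done, seconds, global_batch):
    """Split global_batch across workers in proportion to last-epoch throughput.

    samples_done[i] / seconds[i] is worker i's measured speed in the previous
    epoch. Assumption: proportional allocation; not the paper's exact rule.
    """
    speeds = [n / s for n, s in zip(samples_done, seconds)]
    total_speed = sum(speeds)
    # Floor each share, then hand the few leftover samples to the fastest workers.
    sizes = [int(global_batch * v / total_speed) for v in speeds]
    leftover = global_batch - sum(sizes)
    fastest_first = sorted(range(len(speeds)), key=lambda i: speeds[i], reverse=True)
    for i in fastest_first[:leftover]:
        sizes[i] += 1
    return sizes

# Hypothetical cluster: worker 0 is twice as fast as worker 1, four times worker 2.
sizes = rebalance_batch_sizes(
    samples_done=[6000, 3000, 3000],
    seconds=[10, 10, 20],
    global_batch=210,
)
print(sizes)
```

Keeping the global batch size fixed while shifting work toward faster workers shortens the time slower workers hold everyone at the synchronization barrier. The dataset partition could be resized with the same proportions.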
A two-stage framework for short-term wind power forecasting using different feature-learning models
With the growing dependence on wind power generation, improving the accuracy
of short-term forecasting has become increasingly important for ensuring
continued economical and reliable system operations. In the wind power
forecasting field, ensemble-based forecasting models have been studied
extensively; however, few of them considered learning the features from both
historical wind data and numerical weather prediction (NWP) data. In addition, the exploration of
multiple-input and multiple-output learning structures is lacking in the wind
power forecasting literature. Therefore, this study uses NWP and
historical wind data as input and proposes a two-stage forecasting framework
built on a moving window algorithm. Specifically, in the first stage, four
forecasting models are constructed with deep neural networks considering the
multiple-input and multiple-output structures; in the second stage, an ensemble
model is developed using ridge regression to reduce the extrapolation
error. The experiments are conducted on three existing wind farms, examining
the 2-h ahead forecasting point. The results demonstrate that 1) the
single-input-multiple-output (SIMO) structure leads to better forecasting
accuracy than the other three; 2) ridge regression yields a better
ensemble model, one that further improves forecasting accuracy, than
the other machine learning methods; 3) the proposed two-stage forecasting
framework is likely to generate more accurate and stable results than the other
existing algorithms.
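The second-stage ensemble can be written in the standard ridge regression form. The notation below is assumed for illustration and is not the paper's: F is the matrix whose four columns hold the first-stage forecasts on the training window, y is the observed wind power, w is the vector of ensemble weights, and \lambda > 0 is the regularization strength.

```latex
\hat{w} = \arg\min_{w} \; \lVert y - F w \rVert_2^2 + \lambda \lVert w \rVert_2^2
        = \left( F^{\top} F + \lambda I \right)^{-1} F^{\top} y,
\qquad
\hat{y}_{\text{new}} = f_{\text{new}}^{\top} \hat{w}
```

Here f_new is the vector of the four first-stage forecasts for a new time step. The penalty shrinks the weights, which keeps the combination stable when the first-stage forecasts are highly correlated. Whether the authors include an intercept or how they choose \lambda is not stated in the abstract.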
Eight-Year Surveillance of Antimicrobial Resistance among Enterobacter Cloacae Isolated in the First Bethune Hospital
This study investigated the antimicrobial resistance of Enterobacter cloacae isolated over 8 consecutive years in the First Bethune Hospital. The disk diffusion test was used to study antimicrobial resistance. The data were analyzed with WHONET 5 software according to Clinical and Laboratory Standards Institute (CLSI) criteria. Of the 683 strains of Enterobacter cloacae collected during the 8 years, most came from sputum (410, 60.0%), secretions and pus (105, 15.4%), and urine (69, 10.1%). No Enterobacter cloacae isolate was resistant to imipenem or meropenem in the First Bethune Hospital. The antimicrobial resistance of Enterobacter cloacae increased over the 8 years. Changes in antimicrobial resistance should be monitored in order to guide rational drug use in the clinic and prevent the transmission of drug-resistant strains.