IoT Data Analytics in Dynamic Environments: From An Automated Machine Learning Perspective
With the wide spread of sensors and smart devices in recent years, the data
generation speed of the Internet of Things (IoT) systems has increased
dramatically. In IoT systems, massive volumes of data must be processed,
transformed, and analyzed on a frequent basis to enable various IoT services
and functionalities. Machine Learning (ML) approaches have shown their capacity
for IoT data analytics. However, applying ML models to IoT data analytics tasks
still faces many difficulties and challenges, specifically, effective model
selection, design/tuning, and updating, which have brought massive demand for
experienced data scientists. Additionally, the dynamic nature of IoT data may
introduce concept drift issues, causing model performance degradation. To
reduce human efforts, Automated Machine Learning (AutoML) has become a popular
field that aims to automatically select, construct, tune, and update machine
learning models to achieve the best performance on specified tasks. In this
paper, we conduct a review of existing methods in the model selection, tuning,
and updating procedures in the area of AutoML in order to identify and
summarize the optimal solutions for every step of applying ML algorithms to IoT
data analytics. To justify our findings and help industrial users and
researchers better implement AutoML approaches, a case study of applying AutoML
to IoT anomaly detection problems is conducted in this work. Lastly, we discuss
and classify the challenges and research directions for this domain.
Comment: Published in Engineering Applications of Artificial Intelligence
(Elsevier, IF:7.8); code and an AutoML tutorial are available on GitHub:
https://github.com/Western-OC2-Lab/AutoML-Implementation-for-Static-and-Dynamic-Data-Analytic
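The concept drift issue mentioned in the abstract above can be illustrated with a minimal sketch. This is not the method used in the paper; it is a simple sliding-window heuristic, and the class name `WindowDriftMonitor`, the helper `first_drift_index`, and the `window` and `margin` parameters are all hypothetical. It compares the error rate of a deployed model over a recent window against the error rate over an initial reference window, and flags possible drift when the gap exceeds a margin.

```python
from collections import deque


class WindowDriftMonitor:
    """Flags possible concept drift when the recent error rate exceeds
    the reference error rate by more than `margin` (illustrative only)."""

    def __init__(self, window=50, margin=0.2):
        self.window = window
        self.margin = margin
        # The first `window` observations form a fixed reference window.
        self.reference = []
        # Later observations go into a sliding window of the same size.
        self.recent = deque(maxlen=window)

    def update(self, error):
        """`error` is 1 if the model misclassified the sample, else 0.
        Returns True when drift is suspected."""
        if len(self.reference) < self.window:
            self.reference.append(error)
            return False
        self.recent.append(error)
        if len(self.recent) < self.window:
            return False
        ref_rate = sum(self.reference) / self.window
        recent_rate = sum(self.recent) / self.window
        return recent_rate - ref_rate > self.margin


def first_drift_index(errors, window=50, margin=0.2):
    """Returns the index of the first sample at which drift is flagged,
    or None if drift is never flagged."""
    monitor = WindowDriftMonitor(window, margin)
    for i, err in enumerate(errors):
        if monitor.update(err):
            return i
    return None
```

In a full AutoML pipeline, a flag like this would typically trigger model retraining or reselection; the fixed threshold here is an assumption made for brevity, and established drift detectors use statistical tests rather than a hand-picked margin.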
Streaming Feature Grouping and Selection (Sfgs) For Big Data Classification
Real-time data has always been an essential element for organizations whose businesses depend on fast data delivery. Today, organizations understand the importance of real-time data analysis for continuing to benefit from the data they generate. Real-time data analysis is also known as real-time analytics, streaming analytics, real-time streaming analytics, and event processing. Stream processing is the key to getting results in real time: it allows us to process the data stream as it arrives. The concept of streaming data means the data are generated dynamically, and the full stream is unknown or even infinite. This data becomes massive and diverse and forms what is known as a big data challenge. In machine learning, streaming feature selection has always been a preferred method in the preprocessing of streaming data. Recently, feature grouping, which can measure the hidden information between selected features, has begun gaining attention. This dissertation’s main contribution is in solving the issue of the extremely high dimensionality of streaming big data by delivering a streaming feature grouping and selection algorithm. Also, the literature review presents a comprehensive review of the current streaming feature selection approaches and highlights the state-of-the-art algorithms trending in this area. The proposed algorithm is designed with the idea of grouping together similar features to reduce redundancy and handle the stream of features in an online fashion. This algorithm has been implemented and evaluated using benchmark datasets against state-of-the-art streaming feature selection algorithms and feature grouping techniques. The results showed better prediction accuracy than the state-of-the-art algorithms.
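The idea of grouping similar features as they arrive can be sketched as follows. This is not the SFGS algorithm from the dissertation, whose details are not given in the abstract. It is a minimal illustration that assumes absolute Pearson correlation as the similarity measure and a fixed threshold; the function names `pearson` and `group_streaming_features` and the `threshold` parameter are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.
    Returns 0.0 when either sequence is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def group_streaming_features(feature_stream, threshold=0.9):
    """Consumes (name, values) pairs one at a time. Each new feature joins
    the first group whose representative it correlates with strongly
    (|r| >= threshold); otherwise it starts a new group."""
    groups = []
    for name, values in feature_stream:
        for group in groups:
            if abs(pearson(values, group["values"])) >= threshold:
                group["members"].append(name)
                break
        else:
            # No sufficiently similar group exists: this feature becomes
            # the representative of a new group.
            groups.append({"representative": name,
                           "values": values,
                           "members": [name]})
    return groups
```

Keeping only each group's representative is one simple way to reduce redundancy; because each feature is seen once and compared only against current representatives, the full feature set never needs to be known in advance.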
User Privacy on Spotify: Predicting Personal Data from Music Preferences
The way we listen to music has changed drastically in the past decade. Now we can play any
kind of music from various artists around the world through our smart devices. Many music
streaming providers, if not most, are built with systems to track users’ music preferences and
suggest new content.
The music we listen to reveals a great deal about who we are. In general, people share their
playlists and songs of their favorite artists on the music platform; find people with common
music genres and connect with them. It is not always easy to make friends with unknown
people, but music is a good way to accomplish that. Even so, we must also look at the other
side of the coin from a security perspective. Is it a good idea to share music interests with
others or will it compromise our privacy? According to privacy experts and developers, there
is no purposeless data. Everything can be used to infer private information, even a single like
on social media, which seems, at first sight, meaningless, but it can reveal more information
than it promises. In the case that our musical tastes reveal our information, we may be profiled
for targeted advertisement, by surveillance agencies, or in general, become potential victims of
malicious activities. Since music is part of our daily lives, and there are many providers that let
us listen to music, we are even more at risk of being profiled and having our data sold.
In this research, we demonstrate the feasibility of inferring personal data based on playlists
and songs people publicly shared on Spotify. Through an online survey, we collected a new
dataset containing the private information of 750 Spotify users and we downloaded around
402,999 songs extracted from a total of 8777 playlists. Our statistical analysis shows significant
correlations between users’ music preferences (e.g., music genre) and private information (e.g.,
age, gender, economic status).
As a consequence of significant correlations, we built several machine-learning models to
infer private information and our results demonstrated that such inference is possible, posing
a real privacy threat to all music listeners. In particular, we predicted gender
(71.7% F1-score) and several other private attributes, such as whether a person drinks (62.8%
F1-score) or smokes (60.2% F1-score) regularly.
The purpose of this project is to raise awareness about how seemingly purposeless data can
reveal personal information and educate users about how to better protect their privacy.
- …