8,012 research outputs found

    IoT Data Analytics in Dynamic Environments: From An Automated Machine Learning Perspective

    With the wide spread of sensors and smart devices in recent years, the data generation speed of the Internet of Things (IoT) systems has increased dramatically. In IoT systems, massive volumes of data must be processed, transformed, and analyzed on a frequent basis to enable various IoT services and functionalities. Machine Learning (ML) approaches have shown their capacity for IoT data analytics. However, applying ML models to IoT data analytics tasks still faces many difficulties and challenges, specifically, effective model selection, design/tuning, and updating, which have brought massive demand for experienced data scientists. Additionally, the dynamic nature of IoT data may introduce concept drift issues, causing model performance degradation. To reduce human efforts, Automated Machine Learning (AutoML) has become a popular field that aims to automatically select, construct, tune, and update machine learning models to achieve the best performance on specified tasks. In this paper, we conduct a review of existing methods in the model selection, tuning, and updating procedures in the area of AutoML in order to identify and summarize the optimal solutions for every step of applying ML algorithms to IoT data analytics. To justify our findings and help industrial users and researchers better implement AutoML approaches, a case study of applying AutoML to IoT anomaly detection problems is conducted in this work. Lastly, we discuss and classify the challenges and research directions for this domain. Comment: Published in Engineering Applications of Artificial Intelligence (Elsevier, IF:7.8); Code/An AutoML tutorial is available at GitHub link: https://github.com/Western-OC2-Lab/AutoML-Implementation-for-Static-and-Dynamic-Data-Analytic

    Streaming Feature Grouping and Selection (Sfgs) For Big Data Classification

    Real-time data has always been an essential element for organizations when the quickness of data delivery is critical to their businesses. Today, organizations understand the importance of real-time data analysis to maintain benefits from their generated data. Real-time data analysis is also known as real-time analytics, streaming analytics, real-time streaming analytics, and event processing. Stream processing is the key to getting results in real-time. It allows us to process the data stream in real-time as it arrives. The concept of streaming data means the data are generated dynamically, and the full stream is unknown or even infinite. This data becomes massive and diverse and forms what is known as a big data challenge. In machine learning, streaming feature selection has always been a preferred method in the preprocessing of streaming data. Recently, feature grouping, which can measure the hidden information between selected features, has begun gaining attention. This dissertation’s main contribution is in solving the issue of the extremely high dimensionality of streaming big data by delivering a streaming feature grouping and selection algorithm. Also, the literature review presents a comprehensive review of the current streaming feature selection approaches and highlights the state-of-the-art algorithms trending in this area. The proposed algorithm is designed with the idea of grouping together similar features to reduce redundancy and handle the stream of features in an online fashion. This algorithm has been implemented and evaluated using benchmark datasets against state-of-the-art streaming feature selection algorithms and feature grouping techniques. The results showed better prediction accuracy than the state-of-the-art algorithms.

    User Privacy on Spotify: Predicting Personal Data from Music Preferences

    The way we listen to music has changed drastically in the past decade. Now we can play any kind of music from various artists around the world through our smart devices. Many music streaming providers, if not most, are built with systems to track users’ music preferences and suggest new content. The music we listen to reveals a great deal about who we are. In general, people share their playlists and songs of their favorite artists on the music platform; find people with common music genres and connect with them. It is not always easy to make friends with unknown people, but music is a good way to accomplish that. In spite of that, we must also look at other sides of the coin from a security perspective. Is it a good idea to share music interests with others or will it compromise our privacy? According to privacy experts and developers, there is no purposeless data. Everything can be used to infer private information, even a single like on social media, which seems, at first sight, meaningless, but it can reveal more information than it promises. In the case that our musical tastes reveal our information, we may be profiled for targeted advertisement, by surveillance agencies, or in general, become potential victims of malicious activities. Since music is part of our daily lives, and there are many providers that let us listen to music, we are even more at risk of being profiled and having our data sold. In this research, we demonstrate the feasibility of inferring personal data based on playlists and songs people publicly shared on Spotify. Through an online survey, we collected a new dataset containing the private information of 750 Spotify users and we downloaded around 402,999 songs extracted from a total of 8777 playlists. Our statistical analysis shows significant correlations between users’ music preferences (e.g., music genre) and private information (e.g., age, gender, economic status). As a consequence of significant correlations, we built several machine-learning models to infer private information and our results demonstrated that such inference is possible, posing a real privacy threat to all music listeners. In particular, we accurately predicted the gender (71.7% f1-score), and several other private attributes, such as whether a person drinks (62.8% f1-score) or smokes (60.2% f1-score) regularly. The purpose of this project is to raise awareness about how seemingly purposeless data can reveal personal information and educate users about how to better protect their privacy.