
15,383 research outputs found


Indexing Multivariate Mobile Data through Spatio-Temporal Event Detection and Clustering.

Authors: Akbari Mohammad; Dobbins Chelsea; Pazzani Michael; Rawassizadeh Reza
Publication venue: eScholarship, University of California
Publication date: 22/01/2019

Mobile and wearable devices are capable of quantifying user behaviors based on their contextual sensor data. However, few indexing and annotation mechanisms are available, due to difficulties inherent in raw multivariate data types and the relative sparsity of sensor data. These issues have slowed the development of higher level human-centric searching and querying mechanisms. Here, we propose a pipeline of three algorithms. First, we introduce a spatio-temporal event detection algorithm. Then, we introduce a clustering algorithm based on mobile contextual data. Our spatio-temporal clustering approach can be used as an annotation on raw sensor data. It improves information retrieval by reducing the search space and is based on searching only the related clusters. To further improve behavior quantification, the third algorithm identifies contrasting events within a cluster content. Two large real-world smartphone datasets have been used to evaluate our algorithms and demonstrate the utility and resource efficiency of our approach to search.

eScholarship - University of California

Semantic Enrichment of Mobile Phone Data Records Using Background Knowledge

Authors: Antonelli Fabrizio; Dashdorj Zolzaya; Ratti Carlo; Serafini Luciano; Sobolevsky Stanislav
Publication venue: Elsevier BV
Publication date: 22/04/2015

Every day, billions of mobile network events (i.e. CDRs) are generated by cellular phone operator companies. Latent in this data are inspiring insights about human actions and behaviors, the discovery of which is important because context-aware applications and services hold the key to user-driven, intelligent services, which can enhance our everyday lives in areas such as social and economic development, urban planning, and health prevention. The major challenge in this area is that interpreting such a big stream of data requires a deep understanding of mobile network events' context through available background knowledge. This article addresses the issues in context awareness given heterogeneous and uncertain data of mobile network events missing reliable information on the context of this activity. The contribution of this research is a model, built from a combination of logical and statistical reasoning standpoints, for enabling human activity inference in qualitative terms from open geographical data, aimed at improving the quality of human behavior recognition tasks from CDRs. We use open geographical data, OpenStreetMap (OSM), as a proxy for predicting the content of human activity in the area. The user study performed in Trento shows that predicted human activities (top level) match the survey data with around 93% overall accuracy. An extensive validation for predicting a more specific economic type of human activity was performed in Barcelona by employing credit card transaction data. The analysis identifies that appropriately normalized data on points of interest (POI) is a good proxy for predicting human economic activities, with 84% accuracy on average. So the model is proven to be efficient for predicting the context of human activity when its total level can be efficiently observed from cell phone data records that are, however, missing contextual information. Comment: 40 pages, 34 figures

arXiv.org e-Print Archive

Full-scale Cascade Dynamics Prediction with a Local-First Approach

Authors: Chen Leiting; Guo Yuxiao; Wu Tao; Xian Xingping
Publication date: 28/12/2015

Information cascades are ubiquitous in various social networking web sites. What mechanisms drive information diffusion in the networks? How do the structure and size of the cascades evolve in time? When and which users will adopt a certain message? Approaching these questions can considerably deepen our understanding of information cascades and facilitate various vital applications, including viral marketing, rumor prevention and even link prediction. Most previous works focus only on the final cascade size prediction. Moreover, they are always cascade-graph-dependent methods, which biases them towards predicting large cascades and leads to the criticism that cascades may only be predictable after they have already grown large. In this paper, we study a fundamental problem: full-scale cascade dynamics prediction. That is, how to predict when and which users are activated at any time point of a cascading process. Here we propose a unified framework, FScaleCP, to solve the problem. Given historical cascades, we first model the local spreading behaviors as a classification problem. Through data-driven learning, we recognize the common patterns by measuring the driving mechanisms of cascade dynamics. After that we present an intuitive asynchronous propagation method for full-scale cascade dynamics prediction by effectively aggregating the local spreading behaviors. Extensive experiments on social network data suggest that the proposed method performs noticeably better than other state-of-the-art baselines.

arXiv.org e-Print Archive

Time-aware Analysis and Ranking of Lurkers in Social Networks

Authors: Interdonato Roberto; Tagarelli Andrea
Publication venue: Springer Science and Business Media LLC
Publication date: 07/09/2015

Mining the silent members of an online community, also called lurkers, has been recognized as an important problem that accompanies the extensive use of online social networks (OSNs). Existing solutions to the ranking of lurkers can aid in understanding the lurking behaviors in an OSN. However, they are limited to using only structural properties of the static network graph, thus ignoring any relevant information concerning the time dimension. Our goal in this work is to push forward research in lurker mining in a twofold manner: (i) to provide an in-depth analysis of temporal aspects that aims to unveil the behavior of lurkers and their relations with other users, and (ii) to enhance existing methods for ranking lurkers by integrating different time-aware properties concerning information-production and information-consumption actions. Network analysis and ranking evaluation performed on Flickr, FriendFeed and Instagram networks allowed us to draw interesting remarks on both the understanding of lurking dynamics and on transient and cumulative scenarios of time-aware ranking. Comment: 23 pages, 9 figures, 7 tables

arXiv.org e-Print Archive

Leveraging Machine Learning and Big Data for Smart Buildings: A Comprehensive Survey

Authors: Al-Fuqaha Ala; Alwajidi Safaa; Benhaddou Driss; Fong Alvis C.; Gupta Ajay; Qadir Junaid; Qolomany Basheer
Publication date: 19/05/2019

Future buildings will offer new convenience, comfort, and efficiency possibilities to their residents. Changes will occur to the way people live as technology becomes involved in people's lives and information processing is fully integrated into their daily living activities and objects. The future expectation of smart buildings includes making the residents' experience as easy and comfortable as possible. The massive streaming data generated and captured by smart building appliances and devices contains valuable information that needs to be mined to facilitate timely actions and better decision making. Machine learning and big data analytics will undoubtedly play a critical role in enabling the delivery of such smart services. In this paper, we survey the area of smart buildings with a special focus on the role of techniques from machine learning and big data analytics. This survey also reviews the current trends and challenges faced in the development of smart building services.

arXiv.org e-Print Archive

The Survey of Data Mining Applications And Feature Scope

Authors: Mishra Dr. Pragnyaban; Padhy Neelamadhab; Panigrahi Rasmita
Publication venue: Academy and Industry Research Collaboration Center (AIRCC)
Publication date: 24/11/2012

In this paper we focus on a variety of techniques, approaches and different areas of research which are helpful and marked as important in the field of data mining technologies. Many multinational companies and large organizations operate in different places in different countries. Each place of operation may generate large volumes of data. Corporate decision makers require access to all such sources in order to take strategic decisions. The data warehouse delivers significant business value by improving the effectiveness of managerial decision-making. In an uncertain and highly competitive business environment, the value of strategic information systems such as these is easily recognized; however, in today's business environment, efficiency or speed is not the only key to competitiveness. Such huge amounts of data, available on the scale of tera- to peta-bytes, have drastically changed the areas of science and engineering. To analyze, manage and make decisions from such huge amounts of data we need techniques called data mining, which is transforming many fields. This paper presents a number of applications of data mining and also focuses on the scope of data mining, which will be helpful in further research. Comment: International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.2, No.3, June 2012, 16 pages, 1 table

arXiv.org e-Print Archive

Game Data Mining Competition on Churn Prediction and Survival Analysis using Commercial Game Log Data

Authors: Bertens Paul; Chen Pei Pei; Guitart Anna; Hadiji Fabian; Hwang Inchon; Jang Yoonjae; Jeon JiHoon; Joo Youngjun; Kim Dae-Wook; Kim Kyung-Joong; Lee EunJo; Lee Jiyeon; Lee Sang-Kwang; Müller Marc; Periáñez África; Yang Seong-il; Yoon DuMim
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 18/12/2018

Game companies avoid sharing their game data with external researchers. Only a few research groups have been granted limited access to game data so far. The reluctance of these companies to make data publicly available limits the wide use and development of data mining techniques and artificial intelligence research specific to the game industry. In this work, we developed and implemented an international competition on game data mining using commercial game log data from one of the major game companies in South Korea: NCSOFT. Our approach enabled researchers to develop and apply state-of-the-art data mining techniques to game log data by making the data open. For the competition, data were collected from Blade & Soul, an action role-playing game, from NCSOFT. The data comprised approximately 100 GB of game logs from 10,000 players. The main aim of the competition was to predict whether a player would churn and when the player would churn during two periods between which the business model was changed to a free-to-play model from a monthly subscription. The results of the competition revealed that highly ranked competitors used deep learning, tree boosting, and linear regression. Comment: IEEE Transactions on Games

arXiv.org e-Print Archive

Social Status and Communication Behavior in an Evolving Social Network

Author: Akbari Sahand
Publication date: 23/10/2018

The degree to which individuals can exert influence on the propagation of information and opinion dynamics in online communities is highly dependent on their social status. Therefore, there is a high demand for identifying influential users in a community by predicting their social position in that community. Moreover, understanding how people of differing social status behave can shed light on the dynamics of interaction in social networks. In this paper, I study an evolving online social network originating from an online community for university students and I tackle the problem of forecasting users' social status, represented as their PageRank, based on the frequency of recurring temporal sequences of observed behavior, i.e. behavioral motifs. I show that individuals with different values of PageRank exhibit different behavior even in the early weeks after the online community's inception and that it is possible to forecast future PageRank values with high accuracy given the frequency of behavioral motifs.

arXiv.org e-Print Archive

Tensor Embedding: A Supervised Framework for Human Behavioral Data Mining and Prediction

Authors: Ferrara Emilio; Ghasemian Amir; Hosseinmardi Homa; Lerman Kristina; Narayanan Shrikanth
Publication date: 31/08/2018

Today's densely instrumented world offers tremendous opportunities for continuous acquisition and analysis of multimodal sensor data providing temporal characterization of an individual's behaviors. Is it possible to efficiently couple such rich sensor data with predictive modeling techniques to provide contextual and insightful assessments of individual performance and wellbeing? Prediction of different aspects of human behavior from these noisy, incomplete, and heterogeneous bio-behavioral temporal data is a challenging problem, beyond unsupervised discovery of latent structures. We propose a Supervised Tensor Embedding (STE) algorithm for high-dimensional multimodal data with joint decomposition of input and target variables. Furthermore, we show that feature selection helps to reduce contamination in the prediction and increase performance. The efficiency of the methods was tested on two different real-world datasets.

arXiv.org e-Print Archive

Privacy in Social Media: Identification, Mitigation and Applications

Authors: Beigi Ghazaleh; Liu Huan
Publication date: 06/08/2018

The increasing popularity of social media has attracted a huge number of people to participate in numerous activities on a daily basis. This results in tremendous amounts of rich user-generated data. This data provides opportunities for researchers and service providers to study and better understand users' behaviors and further improve the quality of personalized services. Publishing user-generated data risks exposing individuals' privacy. Users' privacy in social media is an emerging task and has attracted increasing attention in recent years. These works study privacy issues in social media from two different points of view: identification of vulnerabilities, and mitigation of privacy risks. Recent research has shown the vulnerability of user-generated data against two general types of attacks, identity disclosure and attribute disclosure. These privacy issues mandate social media data publishers to protect users' privacy by sanitizing user-generated data before publishing it. Consequently, various protection techniques have been proposed to anonymize user-generated social media data. There is a vast literature on privacy of users in social media from many perspectives. In this survey, we review the key achievements of user privacy in social media. In particular, we review and compare the state-of-the-art algorithms in terms of the privacy leakage attacks and anonymization algorithms. We overview the privacy risks from different aspects of social media and categorize the relevant works into five groups: 1) graph data anonymization and de-anonymization, 2) author identification, 3) profile attribute disclosure, 4) user location and privacy, and 5) recommender systems and privacy issues. We also discuss open problems and future research directions for user privacy issues in social media. Comment: This survey is currently under review

arXiv.org e-Print Archive