Indexing Multivariate Mobile Data through Spatio-Temporal Event Detection and Clustering.
Mobile and wearable devices are capable of quantifying user behaviors based on their contextual sensor data. However, few indexing and annotation mechanisms are available, due to difficulties inherent in raw multivariate data types and the relative sparsity of sensor data. These issues have slowed the development of higher level human-centric searching and querying mechanisms. Here, we propose a pipeline of three algorithms. First, we introduce a spatio-temporal event detection algorithm. Then, we introduce a clustering algorithm based on mobile contextual data. Our spatio-temporal clustering approach can be used as an annotation on raw sensor data. It improves information retrieval by reducing the search space, since only the related clusters are searched. To further improve behavior quantification, the third algorithm identifies contrasting events within a cluster's content. Two large real-world smartphone datasets have been used to evaluate our algorithms and demonstrate the utility and resource efficiency of our approach to search
Semantic Enrichment of Mobile Phone Data Records Using Background Knowledge
Every day, billions of mobile network events (i.e. CDRs) are generated by
cellular phone operator companies. Latent in this data are inspiring insights
about human actions and behaviors, the discovery of which is important because
context-aware applications and services hold the key to user-driven,
intelligent services, which can enhance our everyday lives such as social and
economic development, urban planning, and health prevention. The major
challenge in this area is that interpreting such a big stream of data requires
a deep understanding of mobile network events' context through available
background knowledge. This article addresses the issues in context awareness
given heterogeneous and uncertain data of mobile network events missing
reliable information on the context of this activity. The contribution of this
research is a model that combines logical and statistical reasoning to enable
human activity inference in qualitative terms from open geographical data,
aimed at improving the quality of human behavior recognition tasks from CDRs.
We use open geographical data, OpenStreetMap (OSM), as a proxy for predicting
the content of human activity in the area. The
user study performed in Trento shows that predicted human activities (top
level) match the survey data with around 93% overall accuracy. An extensive
validation for predicting a more specific economic type of human activity was
performed in Barcelona using credit card transaction data. The analysis
identifies that appropriately normalized data on points of interest (POI) is a
good proxy for predicting human economic activities, with 84% accuracy on
average. The model thus proves efficient for predicting the context of human
activity when its total level can be efficiently observed from cell phone data
records that nevertheless lack contextual information.
Comment: 40 pages, 34 figures
Full-scale Cascade Dynamics Prediction with a Local-First Approach
Information cascades are ubiquitous in various social networking web sites.
What mechanisms drive information diffusion in the networks? How do the
structure and size of the cascades evolve in time? When and which users will
adopt a certain message? Approaching these questions can considerably deepen
our understanding of information cascades and facilitate various vital
applications, including viral marketing, rumor prevention and even link
prediction. Most previous works focus only on the final cascade size
prediction. Moreover, they are always cascade-graph-dependent methods, which
biases them toward predicting large cascades and leads to the criticism that
cascades may only be predictable after they have already grown large. In this
paper, we study a fundamental problem: full-scale cascade dynamics prediction.
That is, how to predict when and which users are activated at any time point of
a cascading process. Here we propose a unified framework, FScaleCP, to solve
the problem. Given historical cascades, we first model the local spreading
behaviors as a classification problem. Through data-driven learning, we
recognize the common patterns by measuring the driving mechanisms of cascade
dynamics. After that we present an intuitive asynchronous propagation method
for full-scale cascade dynamics prediction by effectively aggregating the local
spreading behaviors. Extensive experiments on a social network data set suggest
that the proposed method performs noticeably better than other state-of-the-art
baselines.
Time-aware Analysis and Ranking of Lurkers in Social Networks
Mining the silent members of an online community, also called lurkers, has
been recognized as an important problem that accompanies the extensive use of
online social networks (OSNs). Existing solutions to the ranking of lurkers can
aid in understanding the lurking behaviors in an OSN. However, they are limited
to using only structural properties of the static network graph, thus ignoring any
relevant information concerning the time dimension. Our goal in this work is to
push forward research in lurker mining in a twofold manner: (i) to provide an
in-depth analysis of temporal aspects that aims to unveil the behavior of
lurkers and their relations with other users, and (ii) to enhance existing
methods for ranking lurkers by integrating different time-aware properties
concerning information-production and information-consumption actions. Network
analysis and ranking evaluation performed on Flickr, FriendFeed and Instagram
networks allowed us to draw interesting remarks on both the understanding of
lurking dynamics and on transient and cumulative scenarios of time-aware
ranking.
Comment: 23 pages, 9 figures, 7 tables
Leveraging Machine Learning and Big Data for Smart Buildings: A Comprehensive Survey
Future buildings will offer new convenience, comfort, and efficiency
possibilities to their residents. Changes will occur to the way people live as
technology becomes involved in people's lives and information processing is fully
integrated into their daily living activities and objects. The future
expectation of smart buildings includes making the residents' experience as
easy and comfortable as possible. The massive streaming data generated and
captured by smart building appliances and devices contains valuable information
that needs to be mined to facilitate timely actions and better decision making.
Machine learning and big data analytics will undoubtedly play a critical role
in enabling the delivery of such smart services. In this paper, we survey the
area of smart buildings with a special focus on the role of techniques from
machine learning and big data analytics. This survey also reviews the current
trends and challenges faced in the development of smart building services.
The Survey of Data Mining Applications And Feature Scope
In this paper we focus on a variety of techniques, approaches, and research
areas that are helpful and regarded as important in the field of data mining
technologies. Many multinational companies and large organizations operate in
different places in different countries. Each place of operation may generate
large volumes of data. Corporate decision makers require access to all such
sources in order to take strategic decisions. The data warehouse delivers
significant business value by improving the effectiveness of managerial
decision-making. In an uncertain and highly competitive business environment,
the value of strategic information systems such as these is easily recognized;
however, in today's business environment, efficiency or speed is not the only
key to competitiveness. Huge amounts of data of this kind, on the order of
terabytes to petabytes, have drastically changed the areas of science and
engineering. To analyze, manage, and make decisions from such huge amounts of
data, we need techniques called data mining, which are transforming many
fields. This paper presents a number of applications of data mining and also
focuses on the scope of data mining, which will be helpful in further
research.
Comment: International Journal of Computer Science, Engineering and
Information Technology (IJCSEIT), Vol.2, No.3, June 2012, 16 pages, 1 table
Game Data Mining Competition on Churn Prediction and Survival Analysis using Commercial Game Log Data
Game companies avoid sharing their game data with external researchers. Only
a few research groups have been granted limited access to game data so far. The
reluctance of these companies to make data publicly available limits the wide
use and development of data mining techniques and artificial intelligence
research specific to the game industry. In this work, we developed and
implemented an international competition on game data mining using commercial
game log data from one of the major game companies in South Korea: NCSOFT. Our
approach enabled researchers to develop and apply state-of-the-art data mining
techniques to game log data by making the data open. For the competition, data
were collected from Blade & Soul, an action role-playing game, from NCSOFT. The
data comprised approximately 100 GB of game logs from 10,000 players. The main
aim of the competition was to predict whether a player would churn and when the
player would churn during two periods between which the business model was
changed to a free-to-play model from a monthly subscription. The results of the
competition revealed that highly ranked competitors used deep learning, tree
boosting, and linear regression.
Comment: IEEE Transactions on Games
Social Status and Communication Behavior in an Evolving Social Network
The degree to which individuals can exert influence on propagation of
information and opinion dynamics in online communities is highly dependent on
their social status. Therefore, there is a high demand for identifying
influential users in a community by predicting their social position in that
community. Moreover, understanding how people of various social statuses
behave can shed light on the dynamics of interaction in social networks. In
this paper, I study an evolving online social network that originated from an
online community for university students, and I tackle the problem of
forecasting users' social status, represented as their PageRank, based on the
frequency of recurring temporal sequences of observed behavior, i.e. behavioral
motifs. I show that individuals with different values of PageRank exhibit
different behavior even in the early weeks after the online community's
inception, and that it is possible to forecast future PageRank values with high
accuracy given the frequency of behavioral motifs.
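For readers unfamiliar with the social-status measure named in this abstract, the following is a minimal sketch of PageRank computed by power iteration. It is the standard textbook formulation, not the author's implementation: the abstract does not say how PageRank was computed, and the function name `pagerank`, the damping factor of 0.85, the iteration limit, and the tolerance are all assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank on a directed graph given as (source, target) pairs.

    Illustrative only: damping, iterations and tol are assumed defaults,
    not values taken from the paper.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution

    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every other node splits its damped rank equally among its targets.
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical interaction graph: an edge (u, v) means u directs attention to v.
    example_edges = [("a", "c"), ("b", "c"), ("c", "a")]
    for user, score in sorted(pagerank(example_edges).items()):
        print(user, round(score, 4))
```

In this toy graph the user who receives links from both others ends up with the highest score, which is the sense in which PageRank can stand in for social position.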
Tensor Embedding: A Supervised Framework for Human Behavioral Data Mining and Prediction
Today's densely instrumented world offers tremendous opportunities for
continuous acquisition and analysis of multimodal sensor data providing
temporal characterization of an individual's behaviors. Is it possible to
efficiently couple such rich sensor data with predictive modeling techniques to
provide contextual and insightful assessments of individual performance and
wellbeing? Prediction of different aspects of human behavior from these noisy,
incomplete, and heterogeneous bio-behavioral temporal data is a challenging
problem, beyond unsupervised discovery of latent structures. We propose a
Supervised Tensor Embedding (STE) algorithm for high-dimensional multimodal data
with joint decomposition of input and target variables. Furthermore, we show that
feature selection helps to reduce the contamination in the prediction and
increase the performance. The efficiency of the methods was tested on two
different real-world datasets.
Privacy in Social Media: Identification, Mitigation and Applications
The increasing popularity of social media has attracted a huge number of
people to participate in numerous activities on a daily basis. This results in
tremendous amounts of rich user-generated data. This data provides
opportunities for researchers and service providers to study and better
understand users' behaviors and further improve the quality of the personalized
services. Publishing user-generated data risks exposing individuals' privacy.
Protecting users' privacy in social media is an emerging task that has attracted
increasing attention in recent years. These works study privacy issues in social
media from two different points of view: identification of vulnerabilities and
mitigation of privacy risks. Recent research has shown the vulnerability of
user-generated data to two general types of attack, identity
disclosure and attribute disclosure. These privacy issues require social media
data publishers to protect users' privacy by sanitizing user-generated data
before publishing it. Consequently, various protection techniques have been
proposed to anonymize user-generated social media data. There is a vast
literature on privacy of users in social media from many perspectives. In this
survey, we review the key achievements in user privacy in social media. In
particular, we review and compare the state-of-the-art algorithms in terms of
the privacy leakage attacks and anonymization algorithms. We give an overview of
the privacy risks from different aspects of social media and categorize the
relevant works into five groups: 1) graph data anonymization and
de-anonymization, 2) author identification, 3) profile attribute disclosure, 4)
user location and privacy, and 5) recommender systems and privacy issues. We
also discuss open problems and future research directions for user privacy
issues in social media.
Comment: This survey is currently under review