15,383 research outputs found

    Semantic Enrichment of Mobile Phone Data Records Using Background Knowledge

    Every day, billions of mobile network events (i.e., CDRs) are generated by cellular phone operator companies. Latent in this data are insights about human actions and behaviors. Discovering them is important because context-aware applications hold the key to user-driven, intelligent services that can enhance everyday life in areas such as social and economic development, urban planning, and health prevention. The major challenge in this area is that interpreting such a big stream of data requires a deep understanding of the context of mobile network events through available background knowledge. This article addresses context awareness given heterogeneous and uncertain mobile network event data that lacks reliable information on the context of the underlying activity. The contribution of this research is a model that combines logical and statistical reasoning to infer human activity in qualitative terms from open geographical data, with the aim of improving the quality of human behavior recognition from CDRs. We use open geographical data, OpenStreetMap (OSM), as a proxy for predicting the content of human activity in the area. The user study performed in Trento shows that predicted human activities (top level) match the survey data with around 93% overall accuracy. An extensive validation for predicting a more specific economic type of human activity was performed in Barcelona using credit card transaction data. The analysis identifies that appropriately normalized data on points of interest (POI) is a good proxy for predicting human economic activities, with 84% accuracy on average. The model is thus shown to be efficient for predicting the context of human activity when the total level of activity can be observed from cell phone data records that otherwise lack contextual information. Comment: 40 pages, 34 figures

    Full-scale Cascade Dynamics Prediction with a Local-First Approach

    Information cascades are ubiquitous on social networking web sites. What mechanisms drive information diffusion in the networks? How do the structure and size of the cascades evolve in time? When and which users will adopt a certain message? Approaching these questions can considerably deepen our understanding of information cascades and facilitate various vital applications, including viral marketing, rumor prevention and even link prediction. Most previous works focus only on predicting the final cascade size. Moreover, they are cascade-graph-dependent methods, which biases them toward predicting large cascades and leads to the criticism that cascades may only be predictable after they have already grown large. In this paper, we study a fundamental problem: full-scale cascade dynamics prediction. That is, how to predict when and which users are activated at any time point of a cascading process. We propose a unified framework, FScaleCP, to solve the problem. Given historical cascades, we first model the local spreading behaviors as a classification problem. Through data-driven learning, we recognize the common patterns by measuring the driving mechanisms of cascade dynamics. We then present an intuitive asynchronous propagation method for full-scale cascade dynamics prediction that effectively aggregates the local spreading behaviors. Extensive experiments on social network data suggest that the proposed method performs noticeably better than other state-of-the-art baselines.

    Time-aware Analysis and Ranking of Lurkers in Social Networks

    Mining the silent members of an online community, also called lurkers, has been recognized as an important problem that accompanies the extensive use of online social networks (OSNs). Existing solutions to the ranking of lurkers can aid in understanding lurking behaviors in an OSN. However, they are limited to using only structural properties of the static network graph, thus ignoring any relevant information concerning the time dimension. Our goal in this work is to push forward research in lurker mining in a twofold manner: (i) to provide an in-depth analysis of temporal aspects that aims to unveil the behavior of lurkers and their relations with other users, and (ii) to enhance existing methods for ranking lurkers by integrating different time-aware properties concerning information-production and information-consumption actions. Network analysis and ranking evaluation performed on Flickr, FriendFeed and Instagram networks allowed us to draw interesting remarks on both the understanding of lurking dynamics and on transient and cumulative scenarios of time-aware ranking. Comment: 23 pages, 9 figures, 7 tables

    Leveraging Machine Learning and Big Data for Smart Buildings: A Comprehensive Survey

    Future buildings will offer new convenience, comfort, and efficiency possibilities to their residents. The way people live will change as technology becomes part of people's lives and information processing is fully integrated into their daily living activities and objects. Smart buildings are expected to make the residents' experience as easy and comfortable as possible. The massive streaming data generated and captured by smart building appliances and devices contains valuable information that needs to be mined to facilitate timely actions and better decision making. Machine learning and big data analytics will undoubtedly play a critical role in enabling the delivery of such smart services. In this paper, we survey the area of smart buildings with a special focus on the role of techniques from machine learning and big data analytics. This survey also reviews the current trends and challenges faced in the development of smart building services.

    The Survey of Data Mining Applications And Feature Scope

    In this paper we focus on a variety of techniques, approaches and research areas that are helpful and important in the field of data mining technologies. Many multinational companies and large organizations operate in different places in different countries. Each place of operation may generate large volumes of data. Corporate decision makers require access to all such sources in order to take strategic decisions. The data warehouse delivers significant business value by improving the effectiveness of managerial decision-making. In an uncertain and highly competitive business environment, the value of strategic information systems such as these is easily recognized; however, in today's business environment, efficiency or speed is not the only key to competitiveness. Such huge amounts of data, on the order of tera- to petabytes, have drastically changed the areas of science and engineering. To analyze, manage and make decisions from such huge amounts of data, we need techniques called data mining, which is transforming many fields. This paper presents a number of applications of data mining and also discusses the scope of data mining, which will be helpful in further research. Comment: International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.2, No.3, June 2012, 16 pages, 1 table

    Game Data Mining Competition on Churn Prediction and Survival Analysis using Commercial Game Log Data

    Game companies avoid sharing their game data with external researchers. Only a few research groups have been granted limited access to game data so far. The reluctance of these companies to make data publicly available limits the wide use and development of data mining techniques and artificial intelligence research specific to the game industry. In this work, we developed and implemented an international competition on game data mining using commercial game log data from one of the major game companies in South Korea: NCSOFT. Our approach enabled researchers to develop and apply state-of-the-art data mining techniques to game log data by making the data open. For the competition, data were collected from Blade & Soul, an action role-playing game, from NCSOFT. The data comprised approximately 100 GB of game logs from 10,000 players. The main aim of the competition was to predict whether a player would churn and when the player would churn during two periods between which the business model was changed to a free-to-play model from a monthly subscription. The results of the competition revealed that highly ranked competitors used deep learning, tree boosting, and linear regression. Comment: IEEE Transactions on Games

    Social Status and Communication Behavior in an Evolving Social Network

    The degree to which individuals can exert influence on the propagation of information and on opinion dynamics in online communities is highly dependent on their social status. Therefore, there is a high demand for identifying influential users in a community by predicting their social position in that community. Moreover, understanding how people of different social status behave can shed light on the dynamics of interaction in social networks. In this paper, I study an evolving online social network that originated from an online community for university students, and I tackle the problem of forecasting users' social status, represented as their PageRank, based on the frequency of recurring temporal sequences of observed behavior, i.e., behavioral motifs. I show that individuals with different values of PageRank exhibit different behavior even in the early weeks after the online community's inception, and that it is possible to forecast future PageRank values with high accuracy given the frequency of behavioral motifs.

    Tensor Embedding: A Supervised Framework for Human Behavioral Data Mining and Prediction

    Today's densely instrumented world offers tremendous opportunities for continuous acquisition and analysis of multimodal sensor data providing temporal characterization of an individual's behaviors. Is it possible to efficiently couple such rich sensor data with predictive modeling techniques to provide contextual and insightful assessments of individual performance and wellbeing? Prediction of different aspects of human behavior from these noisy, incomplete, and heterogeneous bio-behavioral temporal data is a challenging problem, beyond unsupervised discovery of latent structures. We propose a Supervised Tensor Embedding (STE) algorithm for high-dimensional multimodal data with joint decomposition of input and target variables. Furthermore, we show that feature selection helps to reduce contamination in the prediction and increases performance. The efficiency of the methods was tested on two different real-world datasets.

    Privacy in Social Media: Identification, Mitigation and Applications

    The increasing popularity of social media has attracted a huge number of people to participate in numerous activities on a daily basis. This results in tremendous amounts of rich user-generated data. This data provides opportunities for researchers and service providers to study and better understand users' behaviors and further improve the quality of personalized services. Publishing user-generated data, however, risks exposing individuals' privacy. User privacy in social media is an emerging research topic and has attracted increasing attention in recent years. Existing works study privacy issues in social media from two different points of view: identification of vulnerabilities, and mitigation of privacy risks. Recent research has shown the vulnerability of user-generated data to two general types of attacks, identity disclosure and attribute disclosure. These privacy issues oblige social media data publishers to protect users' privacy by sanitizing user-generated data before publishing it. Consequently, various protection techniques have been proposed to anonymize user-generated social media data. There is a vast literature on the privacy of users in social media from many perspectives. In this survey, we review the key achievements in user privacy in social media. In particular, we review and compare the state-of-the-art algorithms in terms of privacy leakage attacks and anonymization algorithms. We give an overview of the privacy risks from different aspects of social media and categorize the relevant works into five groups: 1) graph data anonymization and de-anonymization, 2) author identification, 3) profile attribute disclosure, 4) user location and privacy, and 5) recommender systems and privacy issues. We also discuss open problems and future research directions for user privacy issues in social media. Comment: This survey is currently under review