256 research outputs found
On the Feature Discovery for App Usage Prediction in Smartphones
As the number of mobile Apps grows, Apps have become closely
integrated into daily life. In this paper, we develop a framework to predict
mobile Apps that are most likely to be used given the current device status
of a smartphone. Such an App usage prediction framework is a crucial
prerequisite for fast App launching, intelligent user experience, and power
management of smartphones. By analyzing real App usage log data, we discover
two kinds of features: the Explicit Feature (EF) from readings of
built-in sensors, and the Implicit Feature (IF) from App usage relations. The
IF is derived by constructing the proposed App Usage Graph (abbreviated
as AUG) that models App usage transitions. In light of AUG, we are able to
discover usage relations among Apps. Since users may have different usage
behaviors on their smartphones, we further propose one personalized feature
selection algorithm. We explore minimum description length (MDL) from the
training data and select those features which need less length to describe the
training data. The personalized feature selection can successfully reduce the
log size and the prediction time. Finally, we adopt the kNN classification
model to predict App usage. Because only the features selected by the
proposed personalized feature selection algorithm need to be kept, the
prediction time is reduced and the curse of dimensionality is avoided
when using the kNN classifier. We conduct a comprehensive
experimental study based on a real mobile App usage dataset. The results
demonstrate the effectiveness of the proposed framework and show the predictive
capability for App usage prediction. Comment: 10 pages, 17 figures, ICDM 2013 short paper
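The abstract above says the App Usage Graph models App usage transitions but gives no construction details. The following is a minimal sketch of one plausible reading, assuming the log is simply an ordered list of launched Apps and that edge weights are raw transition counts. The function names and the sample log are hypothetical, and this is not the authors' AUG definition.

```python
from collections import Counter, defaultdict

def build_usage_graph(log):
    """Count transitions between consecutive App launches in a usage log."""
    graph = defaultdict(Counter)
    for prev_app, next_app in zip(log, log[1:]):
        graph[prev_app][next_app] += 1
    return graph

def transition_probabilities(graph, app):
    """Normalize the outgoing edge counts of `app` into probabilities."""
    counts = graph.get(app, Counter())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {nxt: c / total for nxt, c in counts.items()}

def predict_next(graph, app, k=1):
    """Return the k Apps most often launched right after `app`."""
    counts = graph.get(app, Counter())
    return [nxt for nxt, _ in counts.most_common(k)]

# Hypothetical usage log (assumption: one entry per App launch, in order).
log = ["mail", "browser", "mail", "maps", "mail", "browser", "camera"]
graph = build_usage_graph(log)
print(predict_next(graph, "mail"))
print(transition_probabilities(graph, "mail"))
```

In this toy log, "mail" is followed by "browser" twice and by "maps" once, so "browser" ranks first. A fuller pipeline would combine such transition features with sensor readings before classification, as the abstract describes.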
Graph based Anomaly Detection and Description: A Survey
Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, with graph data becoming ubiquitous, techniques for structured graph data have come into focus recently. As objects in graphs have long-range correlations, a suite of novel techniques has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms categorized under various settings: unsupervised vs. (semi-)supervised approaches, for static vs. dynamic graphs, for attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. What is more, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the ‘why’, of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion on open theoretical and practical challenges in the field.
SemEval-2016 task 5 : aspect based sentiment analysis
This paper describes the SemEval 2016 shared task on Aspect Based Sentiment Analysis (ABSA), a continuation of the respective tasks of 2014 and 2015. In its third year, the task provided 19 training and 20 testing datasets for 8 languages and 7 domains, as well as a common evaluation procedure. From these datasets, 25 were for sentence-level and 14 for text-level ABSA; the latter was introduced for the first time as a subtask in SemEval. The task attracted 245 submissions from 29 teams.
Autoencoders for strategic decision support
In the majority of executive domains, a notion of normality is involved in
most strategic decisions. However, few data-driven tools that support strategic
decision-making are available. We introduce and extend the use of autoencoders
to provide strategically relevant granular feedback. A first experiment
indicates that experts are inconsistent in their decision making, highlighting
the need for strategic decision support. Furthermore, using two large
industry-provided human resources datasets, the proposed solution is evaluated
in terms of ranking accuracy, synergy with human experts, and dimension-level
feedback. This three-point scheme is validated using (a) synthetic data, (b)
the perspective of data quality, (c) blind expert validation, and (d)
transparent expert evaluation. Our study confirms several principal weaknesses
of human decision-making and stresses the importance of synergy between a model
and humans. Moreover, unsupervised learning and in particular the autoencoder
are shown to be valuable tools for strategic decision-making.
A Survey on Automated Fact-Checking
Fact-checking has become increasingly important due to the speed with which both information and misinformation can spread in the modern media ecosystem. Therefore, researchers have been exploring how fact-checking can be automated, using techniques based on natural language processing, machine learning, knowledge representation, and databases to automatically predict the veracity of claims. In this paper, we survey automated fact-checking stemming from natural language processing, and discuss its connections to related tasks and disciplines. In this process, we present an overview of existing datasets and models, aiming to unify the various definitions given and identify common concepts. Finally, we highlight challenges for future research.
Virus Propagation in Multiple Profile Networks
Suppose we have a virus or one competing idea/product that propagates over a
multiple profile (e.g., social) network. Can we predict what proportion of the
network will actually get "infected" (e.g., spread the idea or buy the
competing product), when the nodes of the network appear to have different
sensitivity based on their profile? For example, if there are two profiles A
and B in a network, and the nodes of profile A and profile B are susceptible
to a highly spreading virus with probabilities p_A and p_B
respectively, what percentage of both profiles will actually get infected from
the virus at the end? To reverse the question, what are the necessary
conditions so that a predefined percentage of the network is infected? We
assume that nodes of different profiles can infect one another and we prove
that under realistic conditions, apart from the weak profile (great
sensitivity), the stronger profile (low sensitivity) will get infected as well.
First, we focus on cliques with the goal to provide exact theoretical results
as well as to get some intuition as to how a virus affects such a multiple
profile network. Then, we move to the theoretical analysis of arbitrary
networks. We provide bounds on certain properties of the network based on the
probabilities of infection of each node in it when it reaches the steady state.
Finally, we provide extensive experimental results that verify our theoretical
results and, at the same time, provide more insight into the problem.
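The abstract does not spell out the propagation model, so the following is only a toy Monte Carlo sketch under stated assumptions. It assumes a clique with two profiles (called A and B here), one initially infected node of profile A, no recovery, and that in every round each still-susceptible node becomes infected with its profile's probability. The function name, parameters (`p_a`, `p_b`, `rounds`, `seed`) and the model itself are hypothetical and are not taken from the paper.

```python
import random

def simulate_clique(n_a, n_b, p_a, p_b, rounds=50, seed=0):
    """Return the infected fraction of each profile after `rounds` steps.

    Toy model (assumption): nodes never recover, and because the network
    is a clique, every susceptible node is adjacent to an infected one.
    """
    rng = random.Random(seed)
    # Start with a single infected node of profile A.
    infected_a, infected_b = 1, 0
    for _ in range(rounds):
        # Each susceptible node is infected with its profile's probability.
        new_a = sum(rng.random() < p_a for _ in range(n_a - infected_a))
        new_b = sum(rng.random() < p_b for _ in range(n_b - infected_b))
        infected_a += new_a
        infected_b += new_b
    return infected_a / n_a, infected_b / n_b

frac_a, frac_b = simulate_clique(50, 50, p_a=0.5, p_b=0.05, rounds=200)
print(frac_a, frac_b)
```

Under this toy model, any profile with a non-zero infection probability keeps accumulating infections over time, which echoes the abstract's claim about the less sensitive profile only loosely. The paper's actual conditions and bounds are not reproduced here.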