4,439 research outputs found
Personal life event detection from social media
Creating video clips out of personal content from social media is on the rise. MuseumOfMe, Facebook Lookback, and Google Awesome are some popular examples. One core challenge to the creation of such life summaries is the identification of personal events, and their time frame. Such videos can greatly benefit from automatically distinguishing between social media content that is about someone's own wedding from that week, to an old wedding, or to that of a friend. In this paper, we describe our approach for identifying a number of common personal life events from social media content (in this paper we have used Twitter for our test), using multiple feature-based classifiers. Results show that combination of linguistic and social interaction features increases overall classification accuracy of most of the events while some events are relatively more difficult than others (e.g. new born with mean precision of .6 from all three models)
DSL: Discriminative Subgraph Learning via Sparse Self-Representation
The goal in network state prediction (NSP) is to classify the global state
(label) associated with features embedded in a graph. This graph structure
encoding feature relationships is the key distinctive aspect of NSP compared to
classical supervised learning. NSP arises in various applications: gene
expression samples embedded in a protein-protein interaction (PPI) network,
temporal snapshots of infrastructure or sensor networks, and fMRI coherence
network samples from multiple subjects to name a few. Instances from these
domains are typically ``wide'' (more features than samples), and thus, feature
sub-selection is required for robust and generalizable prediction. How to best
employ the network structure in order to learn succinct connected subgraphs
encompassing the most discriminative features becomes a central challenge in
NSP. Prior work employs connected subgraph sampling or graph smoothing within
optimization frameworks, resulting in either large variance of quality or weak
control over the connectivity of selected subgraphs.
In this work we propose an optimization framework for discriminative subgraph
learning (DSL) which simultaneously enforces (i) sparsity, (ii) connectivity
and (iii) high discriminative power of the resulting subgraphs of features. Our
optimization algorithm is a single-step solution for the NSP and the associated
feature selection problem. It is rooted in the rich literature on
maximal-margin optimization, spectral graph methods and sparse subspace
self-representation. DSL simultaneously ensures solution interpretability and
superior predictive power (up to 16% improvement in challenging instances
compared to baselines), with execution times up to an hour for large instances.Comment: 9 page
Using Robust PCA to estimate regional characteristics of language use from geo-tagged Twitter messages
Principal component analysis (PCA) and related techniques have been
successfully employed in natural language processing. Text mining applications
in the age of the online social media (OSM) face new challenges due to
properties specific to these use cases (e.g. spelling issues specific to texts
posted by users, the presence of spammers and bots, service announcements,
etc.). In this paper, we employ a Robust PCA technique to separate typical
outliers and highly localized topics from the low-dimensional structure present
in language use in online social networks. Our focus is on identifying
geospatial features among the messages posted by the users of the Twitter
microblogging service. Using a dataset which consists of over 200 million
geolocated tweets collected over the course of a year, we investigate whether
the information present in word usage frequencies can be used to identify
regional features of language use and topics of interest. Using the PCA pursuit
method, we are able to identify important low-dimensional features, which
constitute smoothly varying functions of the geographic location
- …