Unsupervised Social Event Detection via Hybrid Graph Contrastive Learning and Reinforced Incremental Clustering
Detecting events from social media data streams is attracting growing research
interest. The innate challenge in detecting events is to extract
discriminative information from social media data and thereby assign the data
to different events. Due to the excessive diversity and high update
frequency of social data, using supervised approaches to detect events from
social messages is hardly feasible. To this end, recent works explore learning
discriminative information from social messages by leveraging graph contrastive
learning (GCL) and embedding clustering in an unsupervised manner. However, two
intrinsic issues exist in benchmark methods. First, conventional GCL can only
roughly explore partial attributes, thereby insufficiently learning the
discriminative information of social messages. Second, the learned embeddings
are clustered in the latent space using specific prior knowledge, which
conflicts with the principle of the unsupervised learning
paradigm. In this paper, we propose a novel unsupervised social media event
detection method via hybrid graph contrastive learning and reinforced
incremental clustering (HCRC), which uses hybrid graph contrastive learning to
comprehensively learn semantic and structural discriminative information from
social messages and reinforced incremental clustering to perform efficient
clustering in a solidly unsupervised manner. We conduct comprehensive
experiments to evaluate HCRC on the Twitter and Maven datasets. The
experimental results demonstrate that our approach yields consistent
and significant performance gains. In the traditional incremental setting, the
semi-supervised incremental setting, and the solidly unsupervised setting, the
model achieves maximum improvements of 53%, 45%, and 37%,
respectively.
Comment: Accepted by Knowledge-Based Systems
Name Disambiguation from link data in a collaboration graph using temporal and topological features
In a social community, multiple persons may share the same name, phone number
or some other identifying attributes. This, along with other phenomena, such as
name abbreviation, name misspelling, and human error, leads to erroneous
aggregation of records of multiple persons under a single reference. Such
mistakes affect the performance of document retrieval, web search, database
integration, and more importantly, improper attribution of credit (or blame).
The task of entity disambiguation partitions the records belonging to multiple
persons with the objective that each decomposed partition is composed of
records of a unique person. Existing solutions to this task use either
biographical attributes, or auxiliary features that are collected from external
sources, such as Wikipedia. However, for many scenarios, such auxiliary
features are not available, or they are costly to obtain. Besides, attempting
to collect biographical or external data carries the risk of privacy
violation. In this work, we propose a method for solving the entity
disambiguation task using link information obtained from a collaboration
network. Our method does not intrude on privacy, as it uses only the
time-stamped graph topology of an
anonymized network. Experimental results on two real-life academic
collaboration networks show that the proposed method has satisfactory
performance.
Comment: The short version of this paper has been accepted to ASONAM 201
Typical Phone Use Habits: Intense Use Does Not Predict Negative Well-Being
Not all smartphone owners use their device in the same way. In this work, we
uncover broad, latent patterns of mobile phone use behavior. We conducted a
study where, via a dedicated logging app, we collected daily mobile phone
activity data from a sample of 340 participants for a period of four weeks.
Through an unsupervised learning approach and a methodologically rigorous
analysis, we reveal five generic phone use profiles, each describing at least
10% of the participants: limited use, business use, power use,
personality-induced problematic use, and externally induced problematic use.
We provide evidence that
intense mobile phone use alone does not predict negative well-being. Instead,
our approach automatically revealed two groups with tendencies for lower
well-being, which are characterized by nightly phone use sessions.
Comment: 10 pages, 6 figures, conference pape
Unsupervised improvement of named entity extraction in short informal context using disambiguation clues
Short context messages (like tweets and SMS messages) are a potentially rich source of continuously and instantly updated information. The shortness and informality of such messages pose challenges for Natural Language Processing tasks. Most efforts in this direction rely on machine learning techniques, which are expensive in terms of data collection and training. In this paper we present an unsupervised Semantic Web-driven approach to improve the extraction process by using clues from the disambiguation process. For extraction we used a simple Knowledge-Base matching technique combined with a clustering-based approach for disambiguation. Experimental results on a self-collected set of tweets (as an example of short context messages) show improvement in extraction results when using unsupervised feedback from the disambiguation process.
Semantics-driven event clustering in Twitter feeds
Detecting events using social media such as Twitter has many useful applications in real-life situations. Many algorithms, each using different information sources (textual, temporal, geographic or community features), have been developed to achieve this task. Semantic information is often added at the end of the event detection to classify events into semantic topics. But semantic information can also be used to drive the actual event detection, which is less covered by academic research. We therefore supplemented an existing baseline event clustering algorithm with semantic information about the tweets in order to improve its performance. This paper lays out the details of the semantics-driven event clustering algorithms developed, discusses a novel method to aid in the creation of a ground truth for event detection purposes, and analyses how well the algorithms improve over the baseline. We find that assigning semantic information to every individual tweet results in worse performance in F1 measure than the baseline. If, however, semantics are assigned at a coarser, hashtag level, the improvement over the baseline is substantial and significant in both precision and recall.