Named Entity Recognition in Twitter using Images and Text
Named Entity Recognition (NER) is an important subtask of information
extraction that seeks to locate and recognise named entities. Despite recent
achievements, we still face limitations with correctly detecting and
classifying entities, prominently in short and noisy text, such as Twitter. An
important negative aspect of most NER approaches is the high dependency on
hand-crafted features and domain-specific knowledge, necessary to achieve
state-of-the-art results. Thus, devising models to deal with such
linguistically complex contexts is still challenging. In this paper, we propose
a novel multi-level architecture that does not rely on any specific linguistic
resource or encoded rule. Unlike traditional approaches, we use features
extracted from images and text to classify named entities. Experimental tests
against state-of-the-art NER for Twitter on the Ritter dataset present
competitive results (0.59 F-measure), indicating that this approach may lead
towards better NER models. Comment: The 3rd International Workshop on Natural Language Processing for
Informal Text (NLPIT 2017), 8 pages
What you say and how you say it : joint modeling of topics and discourse in microblog conversations
This paper presents an unsupervised framework for jointly modeling topic content and discourse behavior in microblog conversations. Concretely, we propose a neural model to discover word clusters indicating what a conversation concerns (i.e., topics) and those reflecting how participants voice their opinions (i.e., discourse). Extensive experiments show that our model can yield both coherent topics and meaningful discourse behavior. Further study shows that our topic and discourse representations can benefit the classification of microblog messages, especially when they are jointly trained with the classifier.
Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions
Translating verbose information needs into crisp search queries is a
phenomenon that is ubiquitous but hardly understood. Insights into this process
could be valuable in several applications, including synthesizing large
privacy-friendly query logs from public Web sources which are readily available
to the academic research community. In this work, we take a step towards
understanding query formulation by tapping into the rich potential of community
question answering (CQA) forums. Specifically, we sample natural language (NL)
questions spanning diverse themes from the Stack Exchange platform, and conduct
a large-scale conversion experiment where crowdworkers submit search queries
they would use when looking for equivalent information. We provide a careful
analysis of this data, accounting for possible sources of bias during
conversion, along with insights into user-specific linguistic patterns and
search behaviors. We release a dataset of 7,000 question-query pairs from this
study to facilitate further research on query understanding. Comment: ECIR 2020 Short Paper
Detecting New, Informative Propositions in Social Media
The ever-growing quantity of online text produced makes it increasingly challenging to find new important or useful information. This is especially so when topics of potential interest are not known a priori, such as in “breaking news stories”. This thesis examines techniques for detecting the emergence of new, interesting information in Social Media. It sets the investigation in the context of a hypothetical knowledge discovery and acquisition system, and addresses two objectives. The first objective is the detection of new topics. The second is the filtering of non-informative text from Social Media. A rolling time-slicing approach is proposed for discovery, in which daily frequencies of nouns, named entities, and multiword expressions are compared to their expected daily frequencies, as estimated from previous days using a Poisson model. Trending features in Social Media, those showing a significant surge in use, are potentially interesting. Features that have not shown a similar recent surge in News are selected as indicative of new information. It is demonstrated that surges in nouns and news entities can be detected that predict corresponding surges in mainstream news. Co-occurring trending features are used to create clusters of potentially topic-related documents. Those formed from co-occurrences of named entities are shown to be the most topically coherent.
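The Poisson-based surge test described above can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the significance threshold `alpha`, and the additive `smoothing` term are all assumptions made for illustration. The expected daily rate of each feature is estimated as its mean count over the previous days, and a feature is flagged as trending when today's count would be very unlikely under a Poisson distribution with that rate.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), summing terms in log space."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    cdf = sum(
        math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        for i in range(k)
    )
    return max(0.0, 1.0 - cdf)


def trending_features(history, today, alpha=0.001, smoothing=0.5):
    """Flag features whose count today is a significant surge.

    history: list of dicts (one per previous day) mapping feature -> count.
    today:   dict mapping feature -> count for the current day.
    alpha and smoothing are illustrative assumptions, not values from the source.
    """
    if not history:
        raise ValueError("at least one previous day is required")
    n_days = len(history)
    flagged = []
    for feature, count in today.items():
        total = sum(day.get(feature, 0) for day in history)
        # Smoothing keeps the expected rate positive for unseen features.
        expected = (total + smoothing) / n_days
        p_value = poisson_sf(count, expected)
        if p_value < alpha:
            flagged.append((feature, count, expected, p_value))
    # Most surprising surges first.
    return sorted(flagged, key=lambda item: item[3])


history = [
    {"election": 10, "earthquake": 0},
    {"election": 12, "earthquake": 1},
    {"election": 11},
]
today = {"election": 12, "earthquake": 15}
surges = trending_features(history, today)
```

In this toy data, "election" is frequent but stable, so only "earthquake" is flagged. Comparing the flagged set against the same test run on a News stream would give the "new information" selection step the abstract describes.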
Machine learning based filtering models are proposed for finding informative text in Social Media. News/Non-News and Dialogue Act models are explored using the News annotated Redites corpus of Twitter messages. A simple 5-act Dialogue scheme, used to annotate a small sample thereof, is presented. For both News/Non-News and Informative/Non-Informative classification tasks, using non-lexical message features produces more discriminative and robust classification models than using message terms alone. The combination of all investigated features yields the most accurate models.
Rumor Detection with Diverse Counterfactual Evidence
The growth in social media has exacerbated the threat of fake news to
individuals and communities. This draws increasing attention to developing
efficient and timely rumor detection methods. The prevailing approaches resort
to graph neural networks (GNNs) to exploit the post-propagation patterns of the
rumor-spreading process. However, these methods lack inherent interpretation of
rumor detection due to the black-box nature of GNNs. Moreover, these methods
suffer from less robust results as they employ all the propagation patterns for
rumor detection. In this paper, we address the above issues with the proposed
Diverse Counterfactual Evidence framework for Rumor Detection (DCE-RD). Our
intuition is to exploit the diverse counterfactual evidence of an event graph
to serve as multi-view interpretations, which are further aggregated for robust
rumor detection results. Specifically, our method first designs a subgraph
generation strategy to efficiently generate different subgraphs of the event
graph. We constrain the removal of these subgraphs to cause a change in rumor
detection results. Thus, these subgraphs naturally serve as counterfactual
evidence for rumor detection. To achieve multi-view interpretation, we design a
diversity loss inspired by Determinantal Point Processes (DPP) to encourage
diversity among the counterfactual evidence. A GNN-based rumor detection model
further aggregates the diverse counterfactual evidence discovered by the
proposed DCE-RD to achieve interpretable and robust rumor detection results.
Extensive experiments on two real-world datasets show the superior performance
of our method. Our code is available at https://github.com/Vicinity111/DCE-RD
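
The abstract does not give the form of the diversity loss, so the following is only a sketch of one common DPP-inspired formulation, not the DCE-RD implementation. It assumes each counterfactual subgraph is summarised by a non-zero embedding vector (the function names and the `eps` jitter are hypothetical). The loss is the negative log-determinant of the cosine-similarity kernel over the embeddings: the determinant shrinks as embeddings become more similar, so minimising the loss pushes them apart.

```python
import math


def _normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def _log_det(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][j] = math.sqrt(s)
                log_det += 2.0 * math.log(chol[i][j])
            else:
                chol[i][j] = s / chol[j][j]
    return log_det


def dpp_diversity_loss(embeddings, eps=1e-6):
    """Negative log-determinant of the cosine-similarity kernel.

    embeddings: list of equal-length, non-zero vectors, one per subgraph.
    eps: small diagonal jitter (an assumption) to keep the kernel invertible.
    """
    unit = [_normalize(v) for v in embeddings]
    m = len(unit)
    kernel = [
        [
            sum(a * b for a, b in zip(unit[i], unit[j])) + (eps if i == j else 0.0)
            for j in range(m)
        ]
        for i in range(m)
    ]
    return -_log_det(kernel)


diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
redundant = [[1.0, 0.0, 0.0], [1.0, 0.01, 0.0], [1.0, 0.0, 0.01]]
loss_diverse = dpp_diversity_loss(diverse)
loss_redundant = dpp_diversity_loss(redundant)
```

Mutually orthogonal embeddings give a loss close to zero, while near-duplicate embeddings give a much larger loss. In a trained model this term would be computed on differentiable tensors and added to the detection objective.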