759 research outputs found
Overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims. Task 1: Check-Worthiness
We present an overview of the CLEF-2018 CheckThat! Lab on Automatic
Identification and Verification of Political Claims, with focus on Task 1:
Check-Worthiness. The task asks to predict which claims in a political debate
should be prioritized for fact-checking. In particular, given a debate or a
political speech, the goal was to produce a ranked list of its sentences based
on their worthiness for fact checking. We offered the task in both English and
Arabic, based on debates from the 2016 US Presidential Campaign, as well as on
some speeches during and after the campaign. A total of 30 teams registered to
participate in the Lab and seven teams actually submitted systems for Task 1.
The most successful approaches used by the participants relied on recurrent and
multi-layer neural networks, as well as on combinations of distributional
representations, on matching claims' vocabulary against lexicons, and on
measures of syntactic dependency. The best systems achieved mean average
precision of 0.18 and 0.15 on the English and on the Arabic test datasets,
respectively. This leaves large room for further improvement, and thus we
release all datasets and the scoring scripts, which should enable further
research in check-worthiness estimation. Comment: Computational journalism, Check-worthiness, Fact-checking, Veracit
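The ranking evaluation described above can be illustrated with a minimal sketch of mean average precision. It assumes each debate yields a ranked list of sentence IDs and a gold set of check-worthy IDs; the function names and data layout are illustrative and are not taken from the lab's released scoring scripts.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a gold set of IDs."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, sentence_id in enumerate(ranked, start=1):
        if sentence_id in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant position
    return total / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], gold: Dict[str, Set[str]]
) -> float:
    """Mean of per-debate average precision (debate IDs are hypothetical keys)."""
    scores = [average_precision(runs[d], gold.get(d, set())) for d in runs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    runs = {"debate1": ["s1", "s2", "s3", "s4"], "debate2": ["a", "b"]}
    gold = {"debate1": {"s1", "s3"}, "debate2": {"b"}}
    print(round(mean_average_precision(runs, gold), 4))
```

In this sketch, check-worthy sentences that never appear in the ranking still count in the denominator, so a system is penalized for missing them.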
Analysis of Second-Order Thrust Bearing Coefficients Considering Misalignment Effect
Peer reviewed
Bifurcation analysis of rotor/bearing system using third-order journal bearing stiffness and damping coefficients
The authors declare that they did not receive any funds from any organization for this research. Peer reviewed. Publisher PD
Fatigue Failure in Polymeric Materials : Insights from Experimental Testing
Open access via the Springer agreement. The authors wish to express their profound gratitude to Prof. Mokhtar Omar of Cairo University, who sadly passed away on December 22, 2023. His invaluable guidance during the design of the fatigue tester was greatly appreciated. His kind and encouraging words will always be remembered. The authors would like to acknowledge the financial support provided by the Faculty of Engineering at Mataria and the Arab Organization for Industrialization for funding the manufacturing of the test rig. Their contributions are greatly appreciated. Peer reviewed
IDENTITY RESOLUTION IN EMAIL COLLECTIONS
Access to historically significant email collections poses challenges that arise less often in personal collections. Most notably, people exploring a large collection of emails that they neither sent nor received may not be very familiar with the discussions in that collection. They would need not only to understand the topical content of those discussions, but would also find it useful to understand who the people sending, receiving, or mentioned in these discussions were.
In this dissertation, the problem of resolving personal identity in the context of large email collections is tackled. In such collections, a common name (e.g., John) might easily refer to any one of several hundred people; when one of these people is mentioned in an email, the question then arises: "who is that John?"
To "resolve identity" of people in an email collection, two problems need to be solved: (1) modeling the identity of the participants in that collection, and (2) resolving name-mentions (that appear in the body of the messages) to these identities. To tackle the first problem, a simple computational model of identity is presented; it is built by extracting unambiguous references to people (e.g., full names from headers, or nicknames from free-text signatures) from the whole collection. To tackle the second problem, a generative probabilistic approach that leverages the model of identity to resolve mentions is presented. The approach is motivated by intuitions about the way people might refer to others in an email; it expands the context surrounding a mention in four directions: the message where the mention was observed, the thread that includes that message, topically-related messages, and messages sent or received by the original communicating parties. It relies on less ambiguous references (e.g., email addresses or full names) that are observed in some context of a given mention to rank potential referents of that mention.
In order to jointly resolve all mentions in the collection, a parallel implementation is presented using the MapReduce distributed-programming framework. The implementation decomposes the structure of the resolution process into subcomponents that fit the MapReduce task model well. At the heart of that implementation, a parallel algorithm for efficient computation of pairwise document similarity in large collections is proposed as a general solution that can be used for scalable context expansion of all mentions and other applications as well.
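One common way to decompose pairwise document similarity into map and reduce steps is sketched below in a single process. It is a sketch under stated assumptions, not the dissertation's implementation: documents are assumed to be sparse term-weight dictionaries, similarity is assumed to be an inner product, and all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def pairwise_similarity(
    docs: Dict[str, Dict[str, float]]
) -> Dict[Tuple[str, str], float]:
    """Inner-product similarity for every document pair sharing a term."""
    # Step 1 (indexing, "map" then group by term): build term -> postings.
    postings: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for doc_id, weights in docs.items():
        for term, weight in weights.items():
            postings[term].append((doc_id, weight))

    # Step 2 ("map" over terms, "reduce" by document pair): each term
    # contributes a partial product to every pair in its postings list,
    # and the partial products are summed per pair.
    scores: Dict[Tuple[str, str], float] = defaultdict(float)
    for plist in postings.values():
        for (d1, w1), (d2, w2) in combinations(sorted(plist), 2):
            scores[(d1, d2)] += w1 * w2
    return dict(scores)


if __name__ == "__main__":
    docs = {
        "m1": {"budget": 1.0, "john": 2.0},
        "m2": {"john": 1.0, "meeting": 3.0},
        "m3": {"budget": 2.0},
    }
    print(pairwise_similarity(docs))
```

Pairs that share no term never produce a partial product, so they are never materialized. In a distributed setting, each of the two steps would run as a separate job.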
The resolution approach compares favorably with previously-reported techniques on small test collections (sets of mention-queries that were manually resolved beforehand) that were used to evaluate the task in the literature. However, the mention-queries in those collections, besides being relatively few in number, are limited in that all refer to people for whom a substantial amount of evidence would be expected to be available in the collection, thus omitting the "long tail" of the identity distribution for which less evidence is available. This motivated the development of a new test collection that is now the largest and best-balanced test collection available for the task. To build this collection, a user study was conducted that also provided some insight into the difficulty of the task, how time-consuming it is for humans to perform, and the reliability of their performance. The study revealed that at least 80% of the 584 annotated mentions were resolvable to people who had sent or received email within the same collection.
The new test collection was used to experimentally evaluate the resolution system. The results highlight the importance of the social context (which includes messages sent or received by the original communicating parties) when resolving mentions in email. Moreover, the results show that combining evidence from multiple types of contexts yields better resolution than what can be achieved using any individual context. The one-best selection is correct 74% of the time when tested on the full set of the mention-queries, and 51% of the time when tested on the mention-queries labeled as "hard" by the annotators. Experiments run with iterative reformulation of the resolution algorithm resulted in modest gains only for the second iteration in the social context expansion.
EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets
This article introduces a new language-independent approach for creating a
large-scale high-quality test collection of tweets that supports multiple
information retrieval (IR) tasks without running a shared-task campaign. The
adopted approach (demonstrated over Arabic tweets) designs the collection
around significant (i.e., popular) events, which enables the development of
topics that represent frequent information needs of Twitter users for which
rich content exists. That inherently facilitates the support of multiple tasks
that generally revolve around events, namely event detection, ad-hoc search,
timeline generation, and real-time summarization. The key highlights of the
approach include diversifying the judgment pool via interactive search and
multiple manually-crafted queries per topic, collecting high-quality
annotations via crowd-workers for relevancy and in-house annotators for
novelty, filtering out low-agreement topics and inaccessible tweets, and
providing multiple subsets of the collection for better availability. Applying
our methodology on Arabic tweets resulted in EveTAR, the first
freely-available tweet test collection for multiple IR tasks. EveTAR includes a
crawl of 355M Arabic tweets and covers 50 significant events for which about
62K tweets were judged with substantial average inter-annotator agreement
(Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating
existing algorithms in the respective tasks. Results indicate that the new
collection can support reliable ranking of IR systems that is comparable to
similar TREC collections, while providing strong baseline results for future
studies over Arabic tweets.
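The abstract reports a Kappa value without saying which variant was used. As an illustration only, the sketch below computes Cohen's kappa for two annotators labeling the same items; the labels and function name are assumptions, and a multi-annotator statistic such as Fleiss' kappa may be what the authors actually used.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators over the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's label distribution.
    count_a, count_b = Counter(a), Counter(b)
    labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[label] * count_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    first = ["rel", "rel", "non", "non"]
    second = ["rel", "non", "non", "non"]
    print(cohen_kappa(first, second))
```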
Adaptive Method for Following Dynamic Topics on Twitter
Many social research studies of public response on social media require following (i.e., tracking) topics on Twitter for long periods of time. The current approaches rely on streaming tweets based on some hashtags or keywords, or on following some Twitter accounts. Such approaches lead to limited coverage of on-topic tweets. In this paper, we introduce a novel technique for following such topics in a more effective way. A topic is defined as a set of well-prepared queries that cover the static side of the topic. We propose an automatic approach that adapts to emerging aspects of a tracked broad topic over time. We tested our tracking approach on three broad dynamic topics that are hot in different categories: Egyptian politics, Syrian conflict, and international sports. We measured the effectiveness of our approach over four full days spanning a period of four months to ensure consistency in effectiveness. Experimental results showed that, on average, our approach achieved over 100% increase in recall relative to the baseline Boolean approach, while maintaining an acceptable precision of 83%.
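To make the reported measures concrete, the following sketch computes set-based precision and recall and the relative recall increase over a baseline. The tweet IDs are invented toy data, and the function names are illustrative rather than taken from the paper.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[int], relevant: Set[int]) -> Tuple[float, float]:
    """Set-based precision and recall of retrieved tweets."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def relative_increase(new: float, baseline: float) -> float:
    """Relative change of `new` over `baseline` (1.0 means a 100% increase)."""
    if baseline == 0.0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


if __name__ == "__main__":
    relevant = set(range(1, 11))                   # toy on-topic tweet IDs
    baseline_run = {1, 2, 3, 4}                    # hypothetical Boolean baseline
    adaptive_run = set(range(1, 10)) | {11, 12}    # hypothetical adaptive run
    _, base_recall = precision_recall(baseline_run, relevant)
    adapt_precision, adapt_recall = precision_recall(adaptive_run, relevant)
    print(adapt_precision, relative_increase(adapt_recall, base_recall))
```

A recall that rises from 0.4 to 0.9, as in the toy data, is a 125% relative increase, which is how a gain of "over 100%" can arise even though recall itself cannot exceed 1.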
bigIR at TREC 2019: Graph-based Analysis for News Background Linking
Nowadays, it is very rare to find an online news article that is self-contained with everything a reader would want to know about the article's story. Therefore, it has become vital for any article to contain links to other articles or resources that provide the background and contextual knowledge required to conceptualize the article's story. However, finding useful background and contextual links can be a challenging problem. In this paper, we address this problem in the context of the participation of the bigIR team at Qatar University in the news background linking task of the TREC 2019 news track. Our methods mainly relied on a graph-based analysis of the query article's text to extract its most representative and influential keywords, and then used these keywords as a search query to retrieve the article's background links from a collection of news articles. All of our submitted runs outperformed the TREC hypothetical run that achieved a median effectiveness over all queries. Moreover, our best submitted run was ranked second among 28 runs submitted to the task, indicating the potential effectiveness of our approach. This work was made possible by NPRP grant# NPRP 11S-1204-170060 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors. Scopu
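The abstract does not specify which graph analysis was used. As a rough illustration of the general idea, the sketch below builds a word co-occurrence graph over a sliding window and ranks words by weighted degree. The window size, the scoring, and the function name are all assumptions, and the team's actual method may differ.

```python
from collections import defaultdict
from typing import Dict, List


def top_keywords(tokens: List[str], window: int = 2, k: int = 3) -> List[str]:
    """Rank words by weighted degree in a co-occurrence graph."""
    degree: Dict[str, float] = defaultdict(float)
    for i, word in enumerate(tokens):
        # Link `word` to each distinct word within `window` positions after it.
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                degree[word] += 1.0
                degree[other] += 1.0
    # Highest degree first; ties broken alphabetically for determinism.
    return sorted(degree, key=lambda w: (-degree[w], w))[:k]


if __name__ == "__main__":
    tokens = ["election", "fraud", "claims", "election", "results", "election"]
    # The selected keywords would then be joined into a search query.
    print(" ".join(top_keywords(tokens, window=1, k=2)))
```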
