1,217 research outputs found
EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets
This article introduces a new language-independent approach for creating a
large-scale high-quality test collection of tweets that supports multiple
information retrieval (IR) tasks without running a shared-task campaign. The
adopted approach (demonstrated over Arabic tweets) designs the collection
around significant (i.e., popular) events, which enables the development of
topics that represent frequent information needs of Twitter users for which
rich content exists. That inherently facilitates the support of multiple tasks
that generally revolve around events, namely event detection, ad-hoc search,
timeline generation, and real-time summarization. The key highlights of the
approach include diversifying the judgment pool via interactive search and
multiple manually-crafted queries per topic, collecting high-quality
annotations via crowd-workers for relevancy and in-house annotators for
novelty, filtering out low-agreement topics and inaccessible tweets, and
providing multiple subsets of the collection for better availability. Applying
our methodology on Arabic tweets resulted in EveTAR , the first
freely-available tweet test collection for multiple IR tasks. EveTAR includes a
crawl of 355M Arabic tweets and covers 50 significant events for which about
62K tweets were judged with substantial average inter-annotator agreement
(Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating
existing algorithms in the respective tasks. Results indicate that the new
collection can support reliable ranking of IR systems that is comparable to
similar TREC collections, while providing strong baseline results for future
studies over Arabic tweets
The scholarly impact of TRECVid (2003-2009)
This paper reports on an investigation into the scholarly impact of the TRECVid (TREC Video Retrieval Evaluation) benchmarking conferences between 2003 and 2009. The contribution of TRECVid to research in video retrieval is assessed by analyzing publication content to show the development of techniques and approaches over time and by analyzing publication impact through publication numbers and citation analysis. Popular conference and journal venues for TRECVid publications are identified in terms of number of citations received. For a selection of participants at different career stages, the relative importance of TRECVid publications in terms of citations vis a vis their other publications is investigated. TRECVid, as an evaluation conference, provides data on which research teams âscoredâ highly against the evaluation criteria and the relationship between âtop scoringâ teams at TRECVid and the âtop scoringâ papers in terms of citations is analysed. A strong relationship was found between âsuccessâ at TRECVid and âsuccessâ at citations both for high scoring and low scoring teams. The implications of the study in terms of the value of TRECVid as a research activity, and the value of bibliometric analysis as a research evaluation tool, are discussed
Applications of Mining Arabic Text: A Review
Since the appearance of text mining, the Arabic language gained some interest in applying several text mining tasks over a text written in the Arabic language. There are several challenges faced by the researchers. These tasks include Arabic text summarization, which is one of the challenging open areas for research in natural language processing (NLP) and text mining fields, Arabic text categorization, and Arabic sentiment analysis. This chapter reviews some of the past and current researches and trends in these areas and some future challenges that need to be tackled. It also presents some case studies for two of the reviewed approaches
Arabic Text Summarization Challenges using Deep Learning Techniques: A Review
Text summarization is a challenging field in Natural Language Processing due to language modelisation and used techniques to give concise summaries. Dealing with Arabic language does increase the challenge while taking into consideration the many features of the Arabic language, the lack of tools and resources for Arabic, and the Algorithms adaptation and modelisation. In this paper, we present several researches dealing with Arabic Text summarization applying different Algorithms on several Datasets. We then compare all these researches and we give a conclusion to guide researchers on their further work
ON RELEVANCE FILTERING FOR REAL-TIME TWEET SUMMARIZATION
Real-time tweet summarization systems (RTS) require mechanisms for capturing relevant tweets, identifying novel tweets, and capturing timely tweets. In this thesis, we tackle the RTS problem with a main focus on the relevance filtering. We experimented with different traditional retrieval models.
Additionally, we propose two extensions to alleviate the sparsity and topic drift challenges that affect the relevance filtering. For the sparsity, we propose leveraging word embeddings in Vector Space model (VSM) term weighting to empower the system to use semantic similarity alongside the lexical matching. To mitigate the effect of topic drift, we exploit explicit relevance feedback to enhance profile representation to cope with its development in the stream over time.
We conducted extensive experiments over three standard English TREC test collections that were built specifically for RTS. Although the extensions do not generally exhibit better performance, they are comparable to the baselines used.
Moreover, we extended an event detection Arabic tweets test collection, called EveTAR, to support tasks that require novelty in the system's output. We collected novelty judgments using in-house annotators and used the collection to test our RTS system. We report preliminary results on EveTAR using different models of the RTS system.This work was made possible by NPRP grants # NPRP 7-1313-1-245 and # NPRP 7-1330-2-483 from the Qatar National Research Fund (a member of Qatar Foundation)
Automatic Arabic Text Summarization System (AATSS) Based on Semantic Feature Extraction
Recently, one of the problems arisen due to the amount of information and itâs availability on the web, is the increased need for effective and powerful tool to automatically summarize text. For English and European languages an intensive works have been done with high performance and nowadays they look forward to multi-document and multi-language summarization. However, Arabic language still suffers from the little attentions and research done in this filed. In our research we propose a model to automatically summarize Arabic text using text extraction. Various steps are involved in the approach: preprocessing text, extract set of feature from sentences, classify sentence based on scoring method, ranking sentences and finally generate an extract summary. The main difference between our proposed system and other Arabic summarization systems are the consideration of semantics, entity objects such as names and places, and similarity factors in our proposed system. The proposed system has been applied on news domain using a dataset obtained from Falesteen newspaper. Manual evaluation techniques are used to evaluate and test the system. The results obtained by the proposed method achieve 86.5% similarity between the system and human summarization. A comparative study between our proposed system and Sakhr Arabic online summarization system has been conducted. The results show that our proposed system outperforms the Shakr system
- âŠ