1,844 research outputs found
Engineering Crowdsourced Stream Processing Systems
A crowdsourced stream processing system (CSP) is a system that incorporates
crowdsourced tasks in the processing of a data stream. This can be seen as
enabling crowdsourcing work to be applied on a sample of large-scale data at
high speed, or equivalently, enabling stream processing to employ human
intelligence. It also leads to a substantial expansion of the capabilities of
data processing systems. Engineering a CSP system requires the combination of
human and machine computation elements. From a general systems theory
perspective, this means taking into account inherited as well as emerging
properties from both these elements. In this paper, we position CSP systems
within a broader taxonomy, outline a series of design principles and evaluation
metrics, present an extensible framework for their design, and describe several
design patterns. We showcase the capabilities of CSP systems by performing a
case study that applies our proposed framework to the design and analysis of a
real system (AIDR) that classifies social media messages during time-critical
crisis events. Results show that compared to a pure stream processing system,
AIDR can achieve a higher data classification accuracy, while compared to a
pure crowdsourcing solution, the system makes better use of human workers by
requiring much less manual work effort.
Neural approaches to sequence labeling for information extraction
An important aspect of artificial intelligence (AI) is the interpretation of human language expressed in textual (written) form: natural language processing (NLP) matters because textual information is useful for many applications. Yet understanding it (so-called natural language understanding, NLU) is a challenge, given the unstructured form of text, whose meaning is often ambiguous and context-dependent. In this thesis, we introduce solutions to shortcomings of related work in handling fundamental tasks in natural language processing, such as named entity recognition (i.e., identifying the entities that occur in a sentence) and relation extraction (identifying relations between entities). Starting from a specific problem (namely, identifying the structure of a house from a textual property advertisement), we build, step by step, a complete (automated) solution for the above tasks, based on neural network architectures. Our solutions are generally applicable across different application domains and languages. In addition, we consider the task of identifying relevant happenings during an event (e.g., a goal during a football match) in information streams on Twitter. More specifically, we formulate this problem as the labeling of word sequences (similar to named entity recognition), exploiting the chronological relation between consecutive tweets.
DancingLines: An Analytical Scheme to Depict Cross-Platform Event Popularity
Nowadays, events usually burst and are propagated online through multiple
modern media like social networks and search engines. Various studies
discuss event dissemination trends on individual media, while
few studies focus on event popularity analysis from a cross-platform
perspective. Challenges come from the vast diversity of events and media,
limited access to aligned datasets across different media and a great deal of
noise in the datasets. In this paper, we design DancingLines, an innovative
scheme that captures and quantitatively analyzes event popularity between
pairwise text media. It contains two models: TF-SW, a semantic-aware popularity
quantification model, based on an integrated weight coefficient leveraging
Word2Vec and TextRank; and wDTW-CD, a pairwise event popularity time series
alignment model matching different event phases adapted from Dynamic Time
Warping. We also propose three metrics to interpret event popularity trends
between pairwise social platforms. Experimental results on eighteen real-world
event datasets from an influential social network and a popular search engine
validate the effectiveness and applicability of our scheme. DancingLines is
shown to have broad application potential for discovering knowledge about
various aspects of events and different media.
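The wDTW-CD alignment model in the abstract above is described as adapted from Dynamic Time Warping. As a minimal sketch of the classic, unweighted DTW distance between two popularity time series (not the authors' wDTW-CD variant; the function name `dtw_distance` and the absolute-difference local cost are assumptions made for illustration):

```python
from math import inf


def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    Illustrative only: uses |x - y| as the local cost and no
    weighting or windowing, unlike the wDTW-CD model in the abstract.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] matched again with b[j-1]'s predecessor step
                cost[i][j - 1],      # b[j-1] matched again
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Hypothetical popularity curves for one event on two platforms;
# the second curve is the first with one time step repeated.
social = [0, 1, 2, 1, 0]
search = [0, 0, 1, 2, 1, 0]
print(dtw_distance(social, search))
```

Because DTW allows one point to be matched with several points of the other series, curves that differ only by a time lag or stretch align at low cost, which is the property an alignment of event phases across platforms relies on.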
Tweet-to-Act: Towards Tweet-Mining Framework for Extracting Terrorist Attack-related Information and Reporting
The widespread popularity of social networking is leading to the adoption of Twitter as an information dissemination tool. Existing research has shown that information dissemination over Twitter has a much broader reach than traditional media and can be used for effective post-incident measures. People use informal language on Twitter, including acronyms, misspelled words, synonyms, transliteration, and ambiguous terms. This makes incident-related information extraction a non-trivial task. However, this information can be valuable for public safety organizations that need to respond in an emergency. This paper proposes an early event-related information extraction and reporting framework that monitors Twitter streams, synthesizes event-specific information, e.g., a terrorist attack, and alerts law enforcement, emergency services, and media outlets. Specifically, the proposed framework, Tweet-to-Act (T2A), employs word embedding to transform tweets into a vector space model and then utilizes the Word Mover’s Distance (WMD) to cluster tweets for the identification of incidents. To extract reliable and valuable information from a large dataset of short and informal tweets, the proposed framework employs sequence labeling with bidirectional Long Short-Term Memory based Recurrent Neural Networks (bLSTM-RNN). Extensive experimental results suggest that our proposed framework, T2A, outperforms other state-of-the-art methods that use vector space modeling and distance calculation techniques, e.g., Euclidean and Cosine distance. T2A achieves an accuracy of 96% and an F1-score of 86.2% on real-life datasets.
Unsupervised Detection of Sub-events in Large Scale Disasters
Social media plays a major role during and after major natural disasters
(e.g., hurricanes, large-scale fires, etc.), as people "on the ground" post
useful information on what is actually happening. Given the large number of
posts, a major challenge is identifying the information that is useful and
actionable. Emergency responders are largely interested in finding out what
events are taking place so they can properly plan and deploy resources. In this
paper we address the problem of automatically identifying important sub-events
(within a large-scale emergency "event", such as a hurricane). In particular,
we present a novel, unsupervised learning framework to detect sub-events in
Tweets for retrospective crisis analysis. We first extract noun-verb pairs and
phrases from raw tweets as sub-event candidates. Then, we learn a semantic
embedding of extracted noun-verb pairs and phrases, and rank them against a
crisis-specific ontology. We filter out noisy and irrelevant information, then
cluster the noun-verb pairs and phrases so that the top-ranked ones describe
the most important sub-events. Through quantitative experiments on two large
crisis data sets (Hurricane Harvey and the 2015 Nepal Earthquake), we
demonstrate the effectiveness of our approach over the state-of-the-art. Our
qualitative evaluation shows better performance compared to our baseline.
Comment: AAAI-20 Social Impact Trac