Event Detection from Social Media Stream: Methods, Datasets and Opportunities
Social media streams contain a large and diverse amount of information, ranging
from daily-life stories to the latest global and local events and news.
Twitter, especially, allows a fast spread of events happening in real time, and
enables individuals and organizations to stay informed of the events happening
now. Event detection from social media data poses different challenges from
traditional text and is a research area that has attracted much attention in
recent years. In this paper, we survey a wide range of event detection methods
for the Twitter data stream, helping readers understand recent developments in
this area. We present the datasets available to the public. Furthermore, a few
research opportunities are discussed.
Water Data Science: Data Driven Techniques, Training, and Tools for Improved Management of High Frequency Water Resources Data
Electronic sensors can measure water and climate conditions at high frequency and generate large quantities of observed data. This work addresses data management challenges associated with the volume and complexity of high frequency water data. We developed techniques for automatically reviewing data, created materials for training water data managers, and explored existing and emerging technologies for sensor data management.
Data collected by sensors often include errors due to sensor failure or environmental conditions that need to be removed, labeled, or corrected before the data can be used for analysis. Manual review and correction of these data can be tedious and time consuming. To help automate these tasks, we developed a computer program that automatically checks the data for mistakes and attempts to fix them. This tool has the potential to save time and effort and is available to scientists and practitioners who use sensors to monitor water.
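The abstract describes, but does not show, the automated review step. As a hedged illustration of what automated checking and correction of high-frequency sensor data can look like (this is not the authors' actual tool; `flag_spikes`, `repair`, the window size, and the threshold are all invented for this sketch):

```python
# Illustrative sketch of automated sensor-data review: flag spikes that
# deviate from a rolling median, then repair them by linear interpolation.
# Window size and threshold are assumptions chosen for the example.

def flag_spikes(values, window=5, threshold=3.0):
    """Return indices whose deviation from the rolling median exceeds threshold."""
    flagged = []
    half = window // 2
    for i, v in enumerate(values):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        neighborhood = sorted(values[lo:hi])
        median = neighborhood[len(neighborhood) // 2]
        if abs(v - median) > threshold:
            flagged.append(i)
    return flagged

def repair(values, flagged):
    """Replace flagged points by linear interpolation between good neighbors."""
    cleaned = list(values)
    for i in flagged:
        left = next((j for j in range(i - 1, -1, -1) if j not in flagged), None)
        right = next((j for j in range(i + 1, len(values)) if j not in flagged), None)
        if left is not None and right is not None:
            t = (i - left) / (right - left)
            cleaned[i] = values[left] + t * (values[right] - values[left])
    return cleaned

readings = [12.1, 12.3, 12.2, 55.0, 12.4, 12.5, 12.3]  # 55.0 is a sensor glitch
bad = flag_spikes(readings)
print(bad)  # [3]
print(repair(readings, bad)[3])
```

Real tools of this kind also label the corrected points and keep the raw values for provenance; this sketch only shows the detect-and-correct core.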
Scientists may lack skillsets for working with sensor data because traditional engineering or science courses do not address how to work with complex data using modern technology. We surveyed and interviewed instructors who teach courses related to “hydroinformatics” or “water data science” to understand challenges in incorporating data science techniques and tools into water resources teaching. Based on their feedback, we created educational materials that demonstrate how the articulated challenges can be effectively addressed to provide high-quality instruction. These materials are available online for students and teachers.
In addition to skills for working with sensor data, scientists and engineers need tools for storing, managing, and sharing these data. Hydrologic information systems (HIS) help manage the data collected using sensors. HIS make sure that data can be effectively used by providing the computer infrastructure to get data from sensors in the field to secure data storage and then into the hands of scientists and others who use them. This work describes the evolution of software and standards that comprise HIS. We present the main components of HIS, describe currently available systems and gaps in technology or functionality, and then discuss opportunities for improved infrastructure that would make sensor data easier to collect, manage, and use.
In short, we are trying to make sure that sensor data are good and useful; we are helping instructors teach prospective data collectors and users about water and data; and we are making sure that the systems that enable collection, storage, management, and use of the data work smoothly.
Advances in statistical script learning
When humans encode information into natural language, they do so with the
clear assumption that the reader will be able to seamlessly make inferences
based on world knowledge. For example, given the sentence "Mrs. Dalloway said
she would buy the flowers herself," one can make a number of probable
inferences based on event co-occurrences: she bought flowers, she went to a
store, she took the flowers home, and so on.
Observing this, it is clear that many different useful natural language
end-tasks could benefit from models of events as they typically co-occur
(so-called script models).
Robust question-answering systems must be able to infer highly-probable implicit
events from what is explicitly stated in a text, as must robust
information-extraction systems that map from unstructured text to formal
assertions about relations expressed in the text. Coreference resolution
systems, semantic role labeling, and even syntactic parsing systems could, in
principle, benefit from event co-occurrence models.
To this end, we present a number of contributions related to statistical
event co-occurrence models. First, we investigate a method of incorporating
multiple entities into events in a count-based co-occurrence model. We find that
modeling multiple entities interacting across events allows for improved
empirical performance on the task of modeling sequences of events in documents.
Second, we give a method of applying Recurrent Neural Network sequence models
to the task of predicting held-out predicate-argument structures from documents.
This model allows us to easily incorporate entity noun information, and can
allow for more complex, higher-arity events than a count-based co-occurrence
model. We find the neural model improves performance considerably over the
count-based co-occurrence model.
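As a toy illustration of the count-based event co-occurrence idea (not the dissertation's actual model; the event notation and document chains below are invented for this sketch), one can count co-occurring events across chains and rank candidate inferences by their co-occurrence counts:

```python
# Minimal count-based script-model sketch: count how often events co-occur
# in document event chains, then score candidate inferred events by their
# total co-occurrence with the observed events.
from collections import Counter
from itertools import combinations

chains = [
    ["say(buy)", "go(store)", "buy(flowers)", "take_home(flowers)"],
    ["go(store)", "buy(flowers)", "arrange(flowers)"],
    ["say(buy)", "go(store)", "buy(groceries)"],
]

pair_counts = Counter()
for chain in chains:
    for a, b in combinations(chain, 2):   # unordered co-occurrence
        pair_counts[frozenset((a, b))] += 1

def predict_next(observed, candidates):
    """Pick the candidate with the highest total co-occurrence with observed events."""
    def score(c):
        return sum(pair_counts[frozenset((o, c))] for o in observed)
    return max(candidates, key=score)

best = predict_next(["say(buy)", "go(store)"], ["buy(flowers)", "arrange(flowers)"])
print(best)  # buy(flowers)
```

A real system would smooth these counts or use pointwise mutual information rather than raw sums; the sketch only shows the counting backbone that the neural models in the dissertation improve upon.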
Third, we investigate the performance of a sequence-to-sequence encoder-decoder
neural model on the task of predicting held-out predicate-argument events from
text. This model does not explicitly model any external syntactic information,
and does not require a parser. We find the text-level model to be competitive in
predictive performance with an event level model directly mediated by an
external syntactic analysis.
Finally, motivated by this result, we investigate incorporating features derived
from these models into a baseline noun coreference resolution system. We find
that, while our additional features do not appreciably improve top-level
performance, we can nonetheless provide empirical improvement on a number of
restricted classes of difficult coreference decisions.
Real-time Event Detection on Social Data Streams
Social networks are quickly becoming the primary medium for discussing what
is happening around real-world events. The information that is generated on
social platforms like Twitter can produce rich data streams for immediate
insights into ongoing matters and the conversations around them. To tackle the
problem of event detection, we model events as a list of clusters of trending
entities over time. We describe a real-time system for discovering events that
is modular in design and novel in scale and speed: it applies clustering on a
large stream with millions of entities per minute and produces a dynamically
updated set of events. In order to assess clustering methodologies, we build an
evaluation dataset derived from a snapshot of the full Twitter Firehose and
propose novel metrics for measuring clustering quality. Through experiments and
system profiling, we highlight key results from the offline and online
pipelines. Finally, we visualize a high profile event on Twitter to show the
importance of modeling the evolution of events, especially those detected from
social data streams. Accepted as a full paper at KDD 2019.
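A hedged sketch of the general idea of clustering trending entities within a time window (this is not Twitter's production system; `cluster_window`, the entity names, and the co-occurrence threshold are assumptions for illustration):

```python
# Illustrative sketch: group trending entities into event clusters by
# co-occurrence within a time window, using a simple union-find.
from collections import Counter
from itertools import combinations

class UnionFind:
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def cluster_window(posts, min_cooccur=2):
    """Cluster entities that co-occur in at least min_cooccur posts of the window."""
    cooccur = Counter()
    for entities in posts:
        for a, b in combinations(sorted(set(entities)), 2):
            cooccur[(a, b)] += 1
    uf = UnionFind()
    for (a, b), n in cooccur.items():
        if n >= min_cooccur:
            uf.union(a, b)
    clusters = {}
    for e in {e for post in posts for e in post}:
        clusters.setdefault(uf.find(e), set()).add(e)
    return list(clusters.values())

window = [
    {"#earthquake", "USGS"},
    {"#earthquake", "USGS", "California"},
    {"#earthquake", "California"},
    {"#oscars", "red carpet"},
]
print(cluster_window(window))
```

The real system runs this kind of grouping incrementally over millions of entities per minute and maintains the clusters as events evolve; the sketch shows only a single-window batch version.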
Event Detection in Twitter Using Multi Timing Chained Windows
Twitter is a popular microblogging and social networking service. Twitter posts are continuously generated and well suited for knowledge discovery using different data mining techniques. We present a novel near real-time approach for processing tweets and detecting events. The proposed method, Multi Timing Chained Windows (MTCW), is independent of the language of the tweets. MTCW defines several timing windows and links them to each other like a chain: in this chain, the input of each larger window is the output of the smaller one before it. Using MTCW, events can be detected within a few minutes. To evaluate this idea, the required dataset was collected using the Twitter API. The evaluation results show the accuracy and effectiveness of our approach compared with other state-of-the-art methods for event detection in Twitter.
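The abstract gives no implementation details, so the following sketch only illustrates the chained-windows intuition: each larger timing window consumes the output of the smaller one before it, so only terms that stay bursty across scales survive. The function name, chain sizes, and promotion threshold are assumptions, not the paper's actual method:

```python
# Hedged sketch of chained timing windows: per-minute term counts are merged
# into successively larger windows, and only terms that remain frequent at
# each level are promoted to the next one.
from collections import Counter

def chain_windows(batches, chain=(3, 9), min_count=2):
    """batches: per-minute term Counters. Returns terms surviving every level."""
    survivors = batches
    for size in chain:
        merged = []
        for i in range(0, len(survivors), size):
            total = Counter()
            for c in survivors[i:i + size]:
                total.update(c)
            # promote only terms frequent enough within this window
            merged.append(Counter({t: n for t, n in total.items() if n >= min_count}))
        survivors = merged
    return set().union(*[set(c) for c in survivors]) if survivors else set()

# A term repeated every minute survives the chain; one-off terms are dropped.
minutes = [Counter({"goal": 1, "noise%d" % i: 1}) for i in range(9)]
print(chain_windows(minutes))  # {'goal'}
```

In the paper's setting the surviving terms at the largest window would be reported as detected events; language independence falls out of operating on raw token counts.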
Data Challenges and Data Analytics Solutions for Power Systems
The abstract is in the attachment.
Event Detection and Tracking: Detection of Dangerous Events on Social Media
Online social media platforms have become essential tools for communication and information exchange in our lives. They are used for connecting with people and sharing information.
This phenomenon has been intensively studied in the past decade to investigate users’ sentiments for different scenarios and purposes. As the technology advanced and popularity increased, it led to the use of different terms referring to similar topics, which often results in confusion. We study such trends and intend to propose a uniform solution that deals with the subject clearly. We gather all these ambiguous terms under the umbrella of the most recent and popular terms to reach a concise verdict. Many events have been addressed in recent works that cover only specific types and domains of events. For the sake of keeping things simple and practical, events that are extreme, negative, and dangerous are grouped under the name Dangerous Events (DE). These dangerous events are further divided into the three main categories of action-based, scenario-based, and sentiment-based dangerous events to specify their characteristics. We then propose deep-learning-based models to detect events that are dangerous in nature. The deep-learning models, which include BERT, RoBERTa, and XLNet, provide valuable results that can effectively help solve the issue of detecting dangerous events along various dimensions. Even though the models perform well, their main constraint, namely the scarcity of available event datasets and the lower quality of certain event data, affects their performance and can be tackled by addressing the data issue accordingly.

Online social media platforms have become essential tools for communication, connection with others, and information exchange in our lives. This phenomenon has been intensively studied over the past decade to investigate users’ sentiments in different scenarios and for various purposes. However, the use of social media has become a more complex and broader phenomenon due to the involvement of multiple actors, such as companies, groups, and other organizations. As the technology advanced and its popularity grew, the use of different terms referring to similar topics created confusion. In other words, models are trained on information from specific terms and scopes; standardization is therefore imperative. The goal of this work is to unify the different terms in use under broader, standardized terms. Danger can be a threat such as social violence, natural disasters, intellectual or community harm, contagion, social unrest, economic loss, or simply the spread of hateful and violent ideologies. We study these different events and classify them into topics so that a topic-based detection technique can be designed and integrated under the term Dangerous Events (DE). Accordingly, we define the proposed term “Dangerous Events” and divide it into three main categories in order to specify their characteristics, these being called Dangerous Events, higher-level Dangerous Events, and lower-level Dangerous Events. The MAVEN dataset was used to obtain the datasets for the experiment. These datasets were manually filtered by event type to separate dangerous events from general events. The transformer models BERT, RoBERTa, and XLNet were used to classify text data into the respective Dangerous Events category. The results showed that BERT outperforms the other models and can be used effectively for the Dangerous Events detection task. Notably, the dataset-splitting approach significantly increased the models’ performance.
Several methods have been proposed for event detection. Event detection (ED) methods are mostly classified as supervised or unsupervised. Supervised methods include support vector machines (SVM), conditional random fields (CRF), decision trees (DT), and Naive Bayes (NB), among others, while the unsupervised category includes query-based, statistical-based, probabilistic-based, clustering-based, and graph-based methods. Two approaches are in use for event detection, called document-pivot and feature-pivot; the difference between them lies mostly in the clustering approach, the way documents are used to build feature vectors, and the similarity metric used to decide whether two documents correspond to the same event. Beyond event detection, event prediction is an important but complicated problem spanning several dimensions. Many of these events are difficult to predict before they become visible and occur; for example, natural disasters cannot be anticipated and are only detectable after they happen. There is a limited number of resources in terms of event datasets; ACE 2005, MAVEN, and EVIN are some examples of datasets available for event detection.
Recent work has shown that transformer-based pre-trained models (PTMs) can achieve state-of-the-art performance on various NLP tasks. These models are pre-trained on large amounts of text. They learn embeddings for the words of a language, that is, vector representations in which related words cluster together in the vector space. A total of three different transformers, namely BERT, RoBERTa, and XLNet, are used to conduct the experiment and draw conclusions by comparing these models.
The transformer-based models are fine-tuned using a 70/30 split of the datasets for training and testing/validation. The hyperparameter tuning uses 10 epochs, a batch size of 16, and the AdamW optimizer with a learning rate of 2e-5 for BERT and RoBERTa and 3e-5 for XLNet. For dangerous events, BERT achieves 60% overall accuracy and RoBERTa 59%, while XLNet provides only 54%. For the high-level event experiments, BERT and XLNet give 71% and 70%, with RoBERTa outperforming the other models at 74% accuracy. For action-based, scenario-based, and sentiment-based DE, BERT gives 62%, 85%, and 81% accuracy respectively; RoBERTa 61%, 83%, and 71%; and XLNet 52%, 81%, and 77%.
There is a need to clarify the ambiguity among different works that address similar problems using different terms. The proposed idea of referring to specific occurrences as dangerous events makes it easier to approach the problem at hand. However, the scarcity of event datasets limits the models’ performance and progress on the detection tasks; the availability of more information related to dangerous events could improve the performance of the existing models. It is evident that the use of deep learning models such as BERT, RoBERTa, and XLNet can help detect and classify dangerous events efficiently. Overall, BERT outperforms RoBERTa and XLNet in detecting dangerous events. It is equally important to track events after their detection. Therefore, as future work, we propose implementing techniques that handle space and time in order to monitor how events emerge over time.
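The evaluation protocol above (a 70/30 split with accuracy reported per category) can be sketched without the transformers themselves. The following stand-in uses a toy keyword rule in place of BERT/RoBERTa/XLNet; the function names, labels, and example texts are all invented for illustration:

```python
# Illustrative stand-in for the thesis's evaluation protocol: a 70/30
# train/test split and an accuracy computation. The keyword "classifier"
# is a toy baseline, not a transformer model.
import random

def split_70_30(examples, seed=0):
    """Shuffle deterministically and split 70% train / 30% test."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def keyword_classifier(text):
    """Toy rule: any danger keyword -> 'dangerous', else 'general'."""
    keywords = {"attack", "flood", "riot", "outbreak"}
    return "dangerous" if keywords & set(text.lower().split()) else "general"

def accuracy(examples):
    correct = sum(keyword_classifier(t) == y for t, y in examples)
    return correct / len(examples)

data = [
    ("flood warnings issued downtown", "dangerous"),
    ("riot broke out after the match", "dangerous"),
    ("new cafe opened on main street", "general"),
    ("outbreak reported at the fair", "dangerous"),
    ("concert tickets on sale friday", "general"),
    ("attack near the station", "dangerous"),
    ("library hours extended", "general"),
    ("sunny weekend ahead", "general"),
    ("museum exhibit opens", "general"),
    ("city marathon this sunday", "general"),
]
train, test = split_70_30(data)
print("test accuracy: %.2f" % accuracy(test))
```

In the thesis the classifier slot is filled by a fine-tuned BERT, RoBERTa, or XLNet, and accuracy is reported separately per Dangerous Events category; the sketch only fixes the split-and-score scaffolding around it.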
Machine learning methods in finance: Recent applications and prospects
We study how researchers can apply machine learning (ML) methods in finance. We first establish that the two major categories of ML (supervised and unsupervised learning) address fundamentally different problems than traditional econometric approaches. Then, we review the current state of research on ML in finance and identify three archetypes of applications: (i) the construction of superior and novel measures, (ii) the reduction of prediction error, and (iii) the extension of the standard econometric toolset. With this taxonomy, we give an outlook on potential future directions for both researchers and practitioners. Our results suggest many benefits of ML methods compared to traditional approaches and indicate that ML holds great potential for future research in finance.
Machine Learning
Machine learning can be defined in various ways; broadly, it is a scientific domain concerned with the design and development of theoretical and implementation tools that allow building systems with some human-like intelligent behavior. More specifically, machine learning addresses the ability to improve automatically through experience.