Using Twitter to Understand Public Interest in Climate Change: The case of Qatar
Climate change has received extensive attention from public opinion in the
last couple of years, after being considered for decades an exclusively
scientific debate. Governments and worldwide organizations such as the United
Nations are working more than ever on raising and maintaining public awareness
of this global issue. In the present study, we examine and analyze climate
change conversations in Qatar's Twittersphere, and gauge public awareness
of this global and shared problem in general, and of its various related
topics in particular. Such topics include, but are not limited to, politics,
the economy, disasters, energy, and sandstorms. To this end, we collect
and analyze a large dataset of 109 million tweets posted by 98K distinct users
living in Qatar -- one of the largest per-capita emitters of CO2 worldwide. We
use a taxonomy of climate change topics created as part of the United Nations
Pulse project to capture the climate change discourse in more than 36K tweets.
We also examine which topics people refer to when they discuss climate change,
and perform different analyses to understand the temporal dynamics of public
interest in these topics.
Comment: Will appear in the proceedings of the International Workshop on
Social Media for Environment and Ecological Monitoring (SWEEM'16).
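As a rough illustration of the taxonomy-matching step described in this abstract, the sketch below tags tweets with topics via keyword lookup. The topic names and keywords are invented placeholders, not the actual United Nations taxonomy used in the paper.

```python
import re
from collections import Counter

# Illustrative placeholder taxonomy: topic -> trigger keywords.
# These terms are invented for the sketch, not taken from the paper.
TAXONOMY = {
    "energy": ["solar", "renewable", "gas emissions"],
    "disasters": ["flood", "drought", "sandstorm"],
    "politics": ["climate policy", "paris agreement"],
}

def match_topics(tweet):
    """Return the set of taxonomy topics whose keywords appear in the tweet."""
    text = tweet.lower()
    return {
        topic
        for topic, keywords in TAXONOMY.items()
        if any(re.search(r"\b" + re.escape(kw) + r"\b", text) for kw in keywords)
    }

tweets = [
    "Another sandstorm hit Doha today, is this climate change?",
    "Qatar invests in solar power to cut gas emissions.",
]
topic_counts = Counter(t for tweet in tweets for t in match_topics(tweet))
print(topic_counts)  # e.g. Counter({'disasters': 1, 'energy': 1})
```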
Making the End-User a Priority in Benchmarking: OrionBench for Unsupervised Time Series Anomaly Detection
Time series anomaly detection is a prevalent problem in many application
domains, such as patient monitoring in healthcare, forecasting in finance, and
predictive maintenance in energy. This has led to the emergence of a plethora
of anomaly detection methods, including, more recently, deep learning based
methods. Although several benchmarks have been proposed to compare newly
developed models, they usually rely on one-time execution over a limited set of
datasets, and the comparison is restricted to a few models. We propose
OrionBench -- a user-centric, continuously maintained benchmark for
unsupervised time series anomaly detection. The framework provides universal
abstractions to represent models, extensibility to add new pipelines and
datasets, hyperparameter standardization, pipeline verification, and frequent
releases with published benchmarks. We demonstrate the usage of OrionBench,
and the progression of pipelines across 15 releases published over the course
of three years. Moreover, we walk through two real scenarios we experienced
with OrionBench that highlight the importance of continuous benchmarks in
unsupervised time series anomaly detection.
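To make the idea of a universal pipeline abstraction concrete, here is a minimal, hypothetical sketch; the class and function names are assumptions for illustration, not OrionBench's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Pipeline:
    """A detector wrapped behind a uniform interface (hypothetical)."""
    name: str
    hyperparameters: dict = field(default_factory=dict)

    def fit(self, train_signal):
        pass  # fit the underlying model; a no-op in this stub

    def detect(self, signal):
        return []  # return detected anomalous intervals; empty in this stub

def run_benchmark(pipelines, datasets, metric):
    """Evaluate every pipeline on every dataset with one shared metric."""
    results = {}
    for pipeline in pipelines:
        for dataset_name, (signal, ground_truth) in datasets.items():
            pipeline.fit(signal)
            detected = pipeline.detect(signal)
            results[(pipeline.name, dataset_name)] = metric(ground_truth, detected)
    return results  # one comparable score per (pipeline, dataset) pair
```

Re-running such a loop on every release, with standardized hyperparameters, is what turns a one-time comparison into a continuously maintained benchmark.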
The SABRE Project: From Ontology to Inference
The SABRE project aims to develop educational software to facilitate the learning of military behaviors in the training schools of the French Armée de Terre. The general military training of officer cadets relies on a corpus of reference texts and on concrete cases drawn from after-action feedback (records structured in XML), which instructors use to run teaching sessions aimed at the internalization of military behaviors. The design of the system began with the creation of an ontology, a prerequisite to building a syntactic parser that extracts inference rules from the XML documents.
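As a rough illustration of the final step, extracting inference-rule candidates from XML feedback records, the sketch below assumes an invented schema; the element names are not from the SABRE project.

```python
# Minimal sketch: pulling condition/action pairs out of structured XML
# feedback records as inference-rule candidates. The element names
# (<case>, <situation>, <expected_behavior>) are invented for illustration.
import xml.etree.ElementTree as ET

def extract_rules(xml_text):
    root = ET.fromstring(xml_text)
    for case in root.iter("case"):
        situation = case.findtext("situation", default="").strip()
        behavior = case.findtext("expected_behavior", default="").strip()
        if situation and behavior:
            yield (situation, behavior)  # read as: IF situation THEN behavior

sample = """
<cases>
  <case>
    <situation>patrol takes indirect fire</situation>
    <expected_behavior>report contact and take cover</expected_behavior>
  </case>
</cases>
"""
for condition, action in extract_rules(sample):
    print(f"IF {condition} THEN {action}")
```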
AER: Auto-Encoder with Regression for Time Series Anomaly Detection
Anomaly detection on time series data is increasingly common across various
industrial domains that monitor metrics in order to prevent potential accidents
and economic losses. However, a scarcity of labeled data and ambiguous
definitions of anomalies can complicate these efforts. Recent unsupervised
machine learning methods have made remarkable progress in tackling this problem
using either single-timestamp predictions or time series reconstructions. While
traditionally considered separately, these methods are not mutually exclusive
and can offer complementary perspectives on anomaly detection. This paper first
highlights the successes and limitations of prediction-based and
reconstruction-based methods with visualized time series signals and anomaly
scores. We then propose AER (Auto-encoder with Regression), a joint model that
combines a vanilla auto-encoder and an LSTM regressor to incorporate the
successes and address the limitations of each method. Our model can produce
bi-directional predictions while simultaneously reconstructing the original
time series by optimizing a joint objective function. Furthermore, we propose
several ways of combining the prediction and reconstruction errors through a
series of ablation studies. Finally, we compare the performance of the AER
architecture against two prediction-based methods and three
reconstruction-based methods on 12 well-known univariate time series datasets
from NASA, Yahoo, Numenta, and UCR. The results show that AER has the highest
averaged F1 score across all datasets (a 23.5% improvement compared to ARIMA)
while retaining a runtime similar to its vanilla auto-encoder and regressor
components. Our model is available in Orion, an open-source benchmarking tool
for time series anomaly detection.
Comment: This work is accepted by IEEE BigData 2022. The paper contains 10
pages, 6 figures, and 4 tables.
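The ablation over ways of combining prediction and reconstruction errors can be illustrated with a toy scoring function. The convex and product combinations below are assumptions in the spirit of the abstract, not the exact formulas from the paper.

```python
import numpy as np

def combined_score(pred_err, recon_err, alpha=0.5, mode="convex"):
    """Fuse per-timestamp prediction and reconstruction errors into one score."""
    # Min-max normalize each signal so the two error scales are comparable.
    p = (pred_err - pred_err.min()) / (pred_err.max() - pred_err.min() + 1e-8)
    r = (recon_err - recon_err.min()) / (recon_err.max() - recon_err.min() + 1e-8)
    if mode == "convex":   # weighted average of the two signals
        return alpha * p + (1 - alpha) * r
    if mode == "product":  # emphasizes points where both methods agree
        return p * r
    raise ValueError(f"unknown mode: {mode}")

pred = np.array([0.1, 0.2, 0.1, 3.0])
recon = np.array([0.2, 0.1, 0.2, 2.5])
scores = combined_score(pred, recon)
anomalies = scores > scores.mean() + scores.std()  # simple illustrative threshold
```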
Sintel: A Machine Learning Framework to Extract Insights from Signals
The detection of anomalies in time series data is a critical task with many
monitoring applications. Existing systems often fail to encompass an end-to-end
detection process, to facilitate comparative analysis of various anomaly
detection methods, or to incorporate human knowledge to refine output. This
precludes current methods from being used in real-world settings by
practitioners who are not ML experts. In this paper, we introduce Sintel, a
machine learning framework for end-to-end time series tasks such as anomaly
detection. The framework uses state-of-the-art approaches to support all steps
of the anomaly detection process. Sintel logs the entire anomaly detection
journey, providing detailed documentation of anomalies over time. It enables
users to analyze signals, compare methods, and investigate anomalies through an
interactive visualization tool, where they can annotate, modify, create, and
remove events. Using these annotations, the framework leverages human knowledge
to improve the anomaly detection pipeline. We demonstrate the usability,
efficiency, and effectiveness of Sintel through a series of experiments on
three public time series datasets, as well as one real-world use case involving
spacecraft experts tasked with anomaly analysis. Sintel's framework,
code, and datasets are open-sourced at https://github.com/sintel-dev/.
Comment: This work is accepted by the ACM SIGMOD/PODS International Conference
on Management of Data (SIGMOD 2022).
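A minimal sketch of the annotation feedback step described in this abstract: expert verdicts are merged back into detected events so that rejected ones can be filtered out. The data structures are assumptions for illustration, not Sintel's actual schema.

```python
from dataclasses import dataclass

@dataclass
class Event:
    start: int                 # start timestamp of the flagged interval
    end: int                   # end timestamp
    score: float               # detector's anomaly score
    label: str = "unreviewed"  # set by an expert: "confirmed" / "rejected"

def apply_annotations(events, annotations):
    """Attach expert verdicts to detected events, dropping rejected ones."""
    verdicts = {(a["start"], a["end"]): a["label"] for a in annotations}
    for event in events:
        event.label = verdicts.get((event.start, event.end), event.label)
    return [e for e in events if e.label != "rejected"]

events = [Event(100, 140, 0.91), Event(500, 520, 0.40)]
annotations = [{"start": 500, "end": 520, "label": "rejected"}]
print(apply_annotations(events, annotations))  # keeps only the first event
```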
Sailing the Information Ocean with Awareness of Currents: Discovery and Application of Source Dependence
The Web has made a huge amount of useful information available, but it has
also made it easier to spread false information and rumors across multiple
sources, making it hard to distinguish what is true from what is not. Recent
examples include the premature Steve Jobs obituary, the second bankruptcy of
United Airlines, and the creation of black holes by the operation of the
Large Hadron Collider. Since it is important to permit the expression
of dissenting and conflicting opinions, it would be a fallacy to try to ensure
that the Web provides only consistent information. However, to help in
separating the wheat from the chaff, it is essential to be able to determine
dependence between sources. Given the huge number of data sources and the vast
volume of conflicting data available on the Web, doing so in a scalable manner
is extremely challenging and has not yet been addressed by existing work.
In this paper, we present a set of research problems and propose preliminary
solutions to the issues involved in discovering dependence between sources. We
also discuss how this knowledge can benefit a variety of technologies, such as
data integration and Web 2.0, that help users manage and access the totality
of the available information from various sources.
Comment: CIDR 2009.
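The core intuition behind detecting source dependence can be sketched with a toy score: sources that agree on many false values are suspicious, since independent sources tend to err differently. This is an illustrative toy, not the paper's actual model.

```python
# Toy dependence score: the fraction of two sources' shared claims that are
# wrong. Independent sources rarely make the same mistakes, so agreement on
# false values hints at copying. An illustrative assumption, not the paper's model.
def dependence_score(claims_a, claims_b, truth):
    """claims_*: dict item -> claimed value; truth: dict item -> true value."""
    shared = [k for k in claims_a if k in claims_b and claims_a[k] == claims_b[k]]
    if not shared:
        return 0.0
    shared_false = sum(1 for k in shared if claims_a[k] != truth.get(k))
    return shared_false / len(shared)

truth = {"ceo": "jobs_alive", "airline": "solvent"}
a = {"ceo": "jobs_dead", "airline": "bankrupt"}  # source with two errors
b = {"ceo": "jobs_dead", "airline": "bankrupt"}  # agrees on both errors
c = {"ceo": "jobs_alive", "airline": "solvent"}  # independent, accurate
print(dependence_score(a, b, truth))  # 1.0 -> suspiciously dependent
print(dependence_score(a, c, truth))  # 0.0 -> no shared claims at all
```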
A State of the Art on Data Quality
- …