10 research outputs found
Env-Aware Anomaly Detection: Ignore Style Changes, Stay True to Content!
We introduce a formalization and benchmark for the unsupervised anomaly
detection task in the distribution-shift scenario. Our work builds upon the
iWildCam dataset, and, to the best of our knowledge, we are the first to
propose such an approach for visual data. We empirically validate that environment-aware methods outperform the basic Empirical Risk Minimization (ERM) baseline in such cases. We then propose an extension for generating positive samples for contrastive methods that takes the environment labels into account during training, improving the ERM baseline score by 8.7%.
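The environment-aware positive sampling described above can be sketched as follows. This is a minimal illustration under assumptions of our own: we pair each sample with another of the same (pseudo-)label drawn from a different environment when one exists; the paper's actual pairing strategy may differ, and the function name and signature are hypothetical.

```python
import numpy as np

def env_aware_positives(labels, envs, rng=None):
    """For each sample, pick a positive with the same (pseudo-)label,
    preferring one from a different environment so the contrastive loss
    pulls together same-content pairs across environments.
    Hypothetical sketch; not the paper's exact procedure."""
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    envs = np.asarray(envs)
    positives = np.empty(len(labels), dtype=int)
    for i in range(len(labels)):
        same = (labels == labels[i]) & (np.arange(len(labels)) != i)
        cross_env = same & (envs != envs[i])
        # fall back to same-environment positives if no cross-env one exists
        pool = np.flatnonzero(cross_env if cross_env.any() else same)
        positives[i] = rng.choice(pool) if len(pool) else i
    return positives
```

The fallback keeps the sampler usable when a label occurs in only one environment.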
AnoShift: A Distribution Shift Benchmark for Unsupervised Anomaly Detection
Analyzing the distribution shift of data is a growing research direction in today's Machine Learning (ML), leading to new benchmarks that focus on providing suitable scenarios for studying the generalization properties of ML models. Existing benchmarks focus on supervised learning and, to the best of our knowledge, there is none for unsupervised learning. Therefore,
we introduce an unsupervised anomaly detection benchmark with data that shifts
over time, built over Kyoto-2006+, a traffic dataset for network intrusion
detection. This type of data meets the premise of a shifting input distribution: it covers a large time span ( years), with naturally occurring changes over time (e.g., users modifying their behavior patterns, or software updates). We first highlight the non-stationary nature of the data,
using a basic per-feature analysis, t-SNE, and an Optimal Transport approach
for measuring the overall distribution distances between years. Next, we
propose AnoShift, a protocol that splits the data into IID, NEAR, and FAR testing
splits. We validate the performance degradation over time with diverse models,
ranging from classical approaches to deep learning. Finally, we show that by
acknowledging the distribution shift problem and properly addressing it, performance can be improved over classical training, which assumes independent and identically distributed data (on average, by up to for
our approach). Dataset and code are available at
https://github.com/bit-ml/AnoShift/
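The abstract mentions measuring overall distribution distances between years with an Optimal Transport approach. A simplified sketch of that idea, under our own assumptions: we average per-feature 1-D Wasserstein-1 distances between a reference year and each later year. The paper's actual OT computation over the full joint distribution may differ, and `yearly_shift` is a hypothetical helper name.

```python
import numpy as np

def w1(x, y):
    # 1-D Wasserstein-1 distance for equal-sized samples:
    # the mean absolute difference between sorted values.
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

def yearly_shift(features_by_year):
    """Distance between the first year and every later year, averaged
    over per-feature 1-D Wasserstein distances. A marginal-only
    simplification of an Optimal Transport comparison."""
    years = sorted(features_by_year)
    ref = features_by_year[years[0]]
    return {
        y: float(np.mean([w1(ref[:, j], features_by_year[y][:, j])
                          for j in range(ref.shape[1])]))
        for y in years[1:]
    }
```

Monotonically growing distances from the reference year would indicate the NEAR/FAR degradation the benchmark is built around.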
Adaptive Distributed Data Storage for Context-Aware Applications, Journal of Telecommunications and Information Technology, 2013, no. 4
Context-aware computing is a paradigm that relies on the active use of information coming from a variety of sources, ranging from smartphones to sensors. The paradigm usually leads to storing large volumes of data that need to be processed to derive higher-level context information. The paper presents a cloud-based storage layer for managing sensitive context data. Clouds are strong candidates for handling the storage and aggregation of context data for context-aware applications, but a Cloud platform for context-aware computing needs to cope with several requirements: high concurrent access (all data needs to be available to a potentially large number of users), mobility support (such a platform should actively use the caches on mobile devices whenever possible, but also cope with storage size limitations), real-time access guarantees (local caches should be located closer to the end-user whenever possible), and persistency (for traceability, a history of the context data should remain available). BlobSeer, a framework for Cloud data storage, is well suited for storing context data for large-scale applications, offering persistency, concurrency support, and flexible storage schemas. On top of BlobSeer, Context Aware Framework is designed as an extension that offers context-aware data management to higher-level applications and enables scalable, high-throughput access under high concurrency. On a logical level, the most important capabilities offered by Context Aware Framework are transparency, support for mobility, real-time guarantees, and support for access based on meta-information. On the physical layer, the most important capability is persistent Cloud storage.
Rethinking the Authorship Verification Experimental Setups
One of the main drivers of the recent advances in authorship verification is
the PAN large-scale authorship dataset. Despite generating significant progress
in the field, inconsistent performance differences between the closed and open
test sets have been reported. To this end, we improve the experimental setup by
proposing five new public splits over the PAN dataset, specifically designed to
isolate and identify biases related to the text topic and to the author's
writing style. We evaluate several BERT-like baselines on these splits, showing
that such models are competitive with authorship verification state-of-the-art
methods. Furthermore, using explainable AI, we find that these baselines are
biased towards named entities. We show that models trained without the named
entities obtain better results and generalize better when tested on DarkReddit,
our new dataset for authorship verification. Comment: Accepted as a short paper at the EMNLP 2022 conference. 10 pages, 5 figures, 9 tables.
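Training without named entities, as the abstract describes, requires masking them out of the text first. A minimal sketch under our own assumptions: in practice the entity list would come from an NER tagger, and the function name, placeholder token, and hand-made list here are all hypothetical.

```python
import re

def mask_entities(text, entities):
    """Replace every occurrence of a known named entity with a
    placeholder token so a verification model cannot latch onto it.
    Longest entities are replaced first so that multi-word names
    (e.g. "New York") are not partially masked."""
    for ent in sorted(entities, key=len, reverse=True):
        text = re.sub(re.escape(ent), "[ENT]", text)
    return text
```

The masked text can then be fed to any BERT-like verification baseline in place of the raw input.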
Environment-biased Feature Ranking for Novelty Detection Robustness
We tackle the problem of robust novelty detection, where we aim to detect
novelties in terms of semantic content while being invariant to changes in
other, irrelevant factors. Specifically, we operate in a setup with multiple environments, where we determine the set of features that are associated more with the environments than with the content relevant for the task. Thus, we propose a method that starts from a pretrained embedding and a multi-environment setup and ranks the features by their environment focus. First, we compute a per-feature score based on the variance of the feature's distribution between environments. Next, we show that by dropping the highest-scoring features, we manage to remove spurious correlations and improve the overall performance by up to 6%, both in covariate and sub-population shift cases, and on both a real and a synthetic benchmark that we introduce for this task. Comment: The updated, long version of the paper is available at arXiv:2310.0373
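The two steps described above (score per-feature environment variance, then drop the highest-scoring features) can be sketched as follows. This is an illustration under our own assumptions: we score each feature by the variance of its per-environment mean, which is one plausible reading of "feature distribution variance between environments", not necessarily the paper's exact scoring function; the function names are hypothetical.

```python
import numpy as np

def env_focus_scores(emb, envs):
    """Score each embedding feature by how much its mean varies across
    environments; high scores flag environment-focused features."""
    envs = np.asarray(envs)
    means = np.stack([emb[envs == e].mean(axis=0) for e in np.unique(envs)])
    return means.var(axis=0)  # per-feature variance of the env means

def drop_top_k(emb, scores, k):
    # Keep the (n_features - k) lowest-scoring features, preserving order.
    keep = np.sort(np.argsort(scores)[: emb.shape[1] - k])
    return emb[:, keep]
```

Downstream, a novelty detector would run on the reduced embedding, which is where the reported robustness gain would come from.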
Distributed Data Storage in Support for Context-Aware Applications
Context-aware computing is a new paradigm that relies on large amounts of data collected from a variety of sources, ranging from smartphones to sensors, to automatically make smart decisions. This usually leads to large volumes of data that need to be further processed to derive higher-level context information. Clouds have recently emerged as interesting candidates to support the storage and aggregation of such data for large-scale context-aware applications. However, specific extensions to support context-aware data need to be designed in order to fully exploit the clouds' potential. In this paper we introduce such a cloud-based system, designed to support real-time processing and persistent storage of context data. Context Aware Framework is designed as an extension of the BlobSeer storage system, building a context-aware layer on top of it to enable scalable, high-throughput access under high concurrency for big context data. Our experimental evaluation validates the transparency, mobility, and real-time guarantees provided by our approach to context-aware applications.