Search CORE

94 research outputs found

Data Mining and Official Statistics: The Past, the Present and the Future

Author: Hassani H.
Saporta G.
Silva E.S.
Publication venue: 'Mary Ann Liebert Inc'
Publication date: 01/01/2014
Field of study

Along with the increasing availability of large databases under the purview of National Statistical Institutes, the application of data mining techniques to official statistics is now a hot topic that is far more important at present than it was ever before. Presented in this article is a thorough review of published work to date on the application of data mining in official statistics, and on identification of the techniques that have been explored. In addition, the importance of data mining to official statistics is flagged and a summary of the challenges that have hindered its development over the course of the last two decades is presented

International Evaluation of Research and Doctoral Training at the University of Helsinki 2005-2010 : RC-Specific Evaluation of ALKO - Algorithms and Data Analysis

Author
Publication venue
Publication date: 01/01/2012
Field of study

Helsingin yliopiston digitaalinen arkisto

Data Mining and Official Statistics: The Past, the Present and the Future

Author: Appice A
Brito P
Cecere W
Cheung P
Cohen SB
Diday E
Drummond C
Earp M
Fayyad U
Friedman JH
Frutos S
Garber SC
Gilary A
Glasson M
Hand DJ
Hassani H
Hassani H
Hassani H
Hipp J
Karlberg M
Klosgen W
Klucik M
Leventhal B
Malerba D
McCarthy J
McCarthy JS
McCarthy JS
Miller D
Nithya A
Nordbotten S
Paaβ G
Saporta G
Siebes A
Soares C
Sund R
Yeganegi MR
Publication venue: 'Mary Ann Liebert Inc'
Publication date
Field of study

Crossref

Distribuované dolovanie dát a dátový sklad

Author: Landryová Lenka
Szappanos Tibor
Zolotová Iveta
Publication venue: Vysoká škola báňská - Technická univerzita Ostrava
Publication date: 01/01/2005
Field of study

Tento článok popisuje distribuované dolovanie dát z dátového skladu. Sú tu diskutované problémy a vývoj distribuovaného dolovania dát v priemeslných podmínkách. Diskutujeme o problémoch učenia sa z distribuovaných dát– algoritmus rozhodovacieho stromu.This article deals briefly about distributed data mining in data warehouse. It further considers the possibilities of applications under industrial conditions, perhaps in connection with modifications of some Distributed Decision Tree Algorithm

DSpace at VSB Technical University of Ostrava

Tiedonlouhinta televerkkojen lokien analysoinnin tukena

Author: Hätönen Kimmo
Publication venue: 'University of Helsinki Libraries'
Publication date: 30/01/2009
Field of study

Telecommunications network management is based on huge amounts of data that are continuously collected from elements and devices from all around the network. The data is monitored and analysed to provide information for decision making in all operation functions. Knowledge discovery and data mining methods can support fast-pace decision making in network operations. In this thesis, I analyse decision making on different levels of network operations. I identify the requirements decision-making sets for knowledge discovery and data mining tools and methods, and I study resources that are available to them. I then propose two methods for augmenting and applying frequent sets to support everyday decision making. The proposed methods are Comprehensive Log Compression for log data summarisation and Queryable Log Compression for semantic compression of log data. Finally I suggest a model for a continuous knowledge discovery process and outline how it can be implemented and integrated to the existing network operations infrastructure.Tiedonlouhintamenetelmillä analysoidaan suuria tietomääriä, jotka on kerätty esimerkiksi vähittäiskaupan asiakkaista, televerkkojen laitteista, prosessiteollisuuden tuotantolaitoksista, tai erotettu geeneistä tai muista tutkitusta kohteista. Menetelmät havaitsevat tehokkaasti asioiden välisiä yhteyksiä kuten käyttäytymis- ja toimintamalleja ja poikkeamia niistä. Menetelmillä tuotettua tietoa käytetään liike-elämässä ja teollisuudessa toimintojen tehostamiseen sekä tieteessä uusien tutkimustulosten etsimiseen. Tiedonlouhinnan menetelmien ongelmana on niiden monimutkaisuus ja vaikeakäyttöisyys. Pystyäkseen käyttämään menetelmiä, tulee hallita niiden teoreettiset perusteet ja kyetä asettamaan kohdalleen useita kymmeniä tuloksiin vaikuttavia syötearvoja. Tämä on hankalaa käytännön tehtävissä, kuten televerkkojen valvonnassa, joissa seurattavat datamäärät ovat valtavia ja aikaa päätöksen tekoon on vähän: pikemminkin minuutteja kuin tunteja. Minkälaisia tiedonlouhintamenetelmien tulisi olla, jotta ne voitaisiin liittää esimerkiksi osaksi televerkon valvojan työkaluja? Selvittääkseni tiedonlouhintamenetelmille asetettavat vaatimukset tarkastelen väitöskirjassani päätöksentekoa televerkon operoinnin ja ylläpidon eri vaiheissa ja tasoilla. Luon päätöksenteosta mallin ja tarkastelen sitä tukevia tiedonlouhinnan tehtäviä ja niiden tarvitsemia lähtötietoja. Kuvaan teollisessa käyttöympäristössä saatavilla olevan asiantuntemuksen, resurssit ja työvälineet, joiden avulla tiedonlouhintamenetelmiä käytetään ja johdan vaatimuslistan päätöksenteon tukena käytettäville tiedonlouhintamenetelmille. Tutkimuksessani esittelen kaksi menetelmää laajojen tapahtumia sisältävien lokitietokantojen analysointiin. CLC-menetelmä luo ilman etukäteisoppimista tai -määritelmiä annetusta laajasta tapahtumajoukosta tiivistelmän havaitsemalla ja kuvaamalla usein samankaltaisina toistuvat tapahtumat ja tapahtumien ketjut. Menetelmä jättää lokiin asiantuntijan tarkasteltavaksi yksittäiset ja harvoin esiintyvät tapahtumat. QLC-menetelmää puolestaan käytetään lokien tiiviiseen tallennukseen. Sen avulla voidaan lokit tallentaa joissain tapauksissa kolmanneksen pienempään tilaan yleisesti käytettyihin tiivistysmenetelmiin verrattuna. Lisäksi QLC-menetelmän etuna on, että sen avulla tiivistettyihin lokitiedostoihin voidaan kohdistaa kyselyjä ilman, että tiivistystä täytyy erikseen ensin purkaa. Sekä CLC- että QLC-menetelmä täyttää hyvin havaitut tiedonlouhintamenetelmille asetetut vaatimukset. Tutkimuksen lopuksi esitän teollista päätöksentekoa tukevaa jatkuvaa tiedonlouhintaa kuvaavan prosessimallin ja hahmottelen kuinka tiedonlouhintamenetelmät ja -prosessi voidaan yhdistää yrityksen tietojärjestelmään. Olen käyttänyt televerkkojen ylläpitoa tutkimusympäristönä, mutta sekä havaitsemani tiedonlouhintamenetelmille asetettavat vaatimukset että kehittämäni menetelmät ovat sovellettavissa muissa vastaavissa ympäristöissä, joissa tarkkaillaan ja analysoidaan jatkuvaa lokitapahtumien virtaa. Näille ympäristöille on yhteistä, että niissä on jatkuvasti tehtävä päätöksiä, joita ei pystytä tapahtumien ja prosessin tilojen harvinaisuuden tai moniselitteisyyden takia automatisoimaan. Tällaisia ovat muun muassa tietoturvalokit, verkkopalvelujen käytön seuranta, teollisten prosessien ylläpito, sekä laajojen logistiikkapalveluiden seuranta

Helsingin yliopiston digitaalinen arkisto

Detecting Attacks Against Deep Reinforcement Learning for Autonomous Driving

Author: AMBRA DEMONTIS
ANGELO SOTGIU
BATTISTA BIGGIO
CHENGFANG FANG
HSIAO-YING LIN
LUCA DEMETRIO
MAURA PINTOR
Publication venue
Publication date: 01/01/2023
Field of study

With the advent of deep reinforcement learning, we witness the spread of novel autonomous driving agents that learn how to drive safely among humans. However, skilled attackers might steer the decision-making process of these agents through minimal perturbations applied to the readings of their hardware sensors. These force the behavior of the victim agent to change unexpectedly, increasing the likelihood of crashes by inhibiting its braking capability or coercing it into constantly changing lanes. To counter these phenomena, we propose a detector that can be mounted on autonomous driving cars to spot the presence of ongoing attacks. The detector first profiles the agent's behavior without attacks by looking at the representation learned during training. Once deployed, the detector discards all the decisions that deviate from the regular driving pattern. We empirically highlight the detection capabilities of our work by testing it against unseen attacks deployed with increasing strength

Archivio istituzionale della ricerca - Università di Cagliari

Anomaly Detection with Variance Stabilized Density Estimation

Author: Battash Barak
Li Henry
Lindenbaum Ofir
Rozner Amit
Wolf Lior
Publication venue
Publication date: 01/06/2023
Field of study

Density estimation based anomaly detection schemes typically model anomalies as examples that reside in low-density regions. We propose a modified density estimation problem and demonstrate its effectiveness for anomaly detection. Specifically, we assume the density function of normal samples is uniform in some compact domain. This assumption implies the density function is more stable (with lower variance) around normal samples than anomalies. We first corroborate this assumption empirically using a wide range of real-world data. Then, we design a variance stabilized density estimation problem for maximizing the likelihood of the observed samples while minimizing the variance of the density around normal samples. We introduce an ensemble of autoregressive models to learn the variance stabilized distribution. Finally, we perform an extensive benchmark with 52 datasets demonstrating that our method leads to state-of-the-art results while alleviating the need for data-specific hyperparameter tuning.Comment: 12 pages, 6 figure

arXiv.org e-Print Archive