Search CORE

37,647 research outputs found

An efficient closed frequent itemset miner for the MOA stream mining system

Author: Bifet Figuerol Albert Carles
Gavaldà Mestre Ricard
Quadrana Massimo
Publication venue
Publication date: 01/01/2013
Field of study

Mining itemsets is a central task in data mining, both in the batch and the streaming paradigms. While robust, efficient, and well-tested implementations exist for batch mining, hardly any publicly available equivalent exists for the streaming scenario. The lack of an efficient, usable tool for the task hinders its use by practitioners and makes it difficult to assess new research in the area. To alleviate this situation, we review the algorithms described in the literature, and implement and evaluate the IncMine algorithm by Cheng, Ke, and Ng (2008) for mining frequent closed itemsets from data streams. Our implementation works on top of the MOA (Massive Online Analysis) stream mining framework to ease its use and integration with other stream mining tasks. We provide a PAC-style rigorous analysis of the quality of the output of IncMine as a function of its parameters; this type of analysis is rare in pattern mining algorithms. As a by-product, the analysis shows how one of the user-provided parameters in the original description can be removed entirely while retaining the performance guarantees. Finally, we experimentally confirm both on synthetic and real data the excellent performance of the algorithm, as reported in the original paper, and its ability to handle concept drift.Postprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Archiving the Relaxed Consistency Web

Author: Jordan Ramiro
Liu Jinyang
Van de Sompel Herbert
van Reenen Johann
Xie Zhiwu
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013
Field of study

The historical, cultural, and intellectual importance of archiving the web has been widely recognized. Today, all countries with high Internet penetration rate have established high-profile archiving initiatives to crawl and archive the fast-disappearing web content for long-term use. As web technologies evolve, established web archiving techniques face challenges. This paper focuses on the potential impact of the relaxed consistency web design on crawler driven web archiving. Relaxed consistent websites may disseminate, albeit ephemerally, inaccurate and even contradictory information. If captured and preserved in the web archives as historical records, such information will degrade the overall archival quality. To assess the extent of such quality degradation, we build a simplified feed-following application and simulate its operation with synthetic workloads. The results indicate that a non-trivial portion of a relaxed consistency web archive may contain observable inconsistency, and the inconsistency window may extend significantly longer than that observed at the data store. We discuss the nature of such quality degradation and propose a few possible remedies.Comment: 10 pages, 6 figures, CIKM 201

arXiv.org e-Print Archive

Crossref

A Transaction Model for Executions of Compositions of Internet of Things Services

Author: Vidyasankar K.
Publication venue: The Author(s). Published by Elsevier B.V.
Publication date: 31/12/2016
Field of study

AbstractInternet of Things (IoT) is about making “things” smart in some functionality, and connecting and enabling them to perform complex tasks by themselves. The functionality can be encapsulated as services and the task executed by composing the services. Two noteworthy functionalities of IoT services are monitoring and actuation. Monitoring implies continuous executions, and actuation is by triggering. Continuous executions typically involve stream processing. Stream input data are accumulated into batches and each batch is subjected to a sequence of computations, structured as a dataflow graph. The composition may be processing several batches simultaneously. Additionally, some non-stream OLTP transactions may also be executing concurrently. Thus, several composite transactions may be executing concurrently. This is in contrast to a typical Web services composition, where just one composite transaction is executed on each invocation. Therefore, defining transactional properties for executions of IoT service compositions is much more complex than for those of conventional Web service compositions. In this paper, we propose a transaction model and a correctness criterion for executions of IoT service compositions. Our proposal defines relaxed atomicity and isolation properties for transactions in a flexible manner and can be adapted for a variety of IoT applications

Elsevier - Publisher Connector

Towards Building a Knowledge Base of Monetary Transactions from a News Collection

Author: Balog Krisztian
Benetka Jan R.
Nørvåg Kjetil
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 17/09/2017
Field of study

We address the problem of extracting structured representations of economic events from a large corpus of news articles, using a combination of natural language processing and machine learning techniques. The developed techniques allow for semi-automatic population of a financial knowledge base, which, in turn, may be used to support a range of data mining and exploration tasks. The key challenge we face in this domain is that the same event is often reported multiple times, with varying correctness of details. We address this challenge by first collecting all information pertinent to a given event from the entire corpus, then considering all possible representations of the event, and finally, using a supervised learning method, to rank these representations by the associated confidence scores. A main innovative element of our approach is that it jointly extracts and stores all attributes of the event as a single representation (quintuple). Using a purpose-built test set we demonstrate that our supervised learning approach can achieve 25% improvement in F1-score over baseline methods that consider the earliest, the latest or the most frequent reporting of the event.Comment: Proceedings of the 17th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '17), 201

arXiv.org e-Print Archive

Crossref

A randomized trial to determine the impact on compliance of a psychophysical peripheral cue based on the Elaboration Likelihood Model

Author: Abu-Hasaballah
Antoinette Minniti
Bloom
Boudes
Byrom
Claxton
Cramer
Damian McEntegart
Edwards
Eisen
Eysenck
Gold
Hansson Scherman
Iskedjian
Joffe
Lee
Levy
Lien
McEntegart
Michell
Patel
Petty
Rachael Jane Horton
Richter
Stern
Stewart Mireylees
Stone
Stone
Toll
Updegraff
Van Der Wal
van Dijk
Wetzels
Wogen
Publication venue: 'Elsevier BV'
Publication date: 01/01/2008
Field of study

Objective: Non-compliance in clinical studies is a significant issue, but causes remain unclear. Utilizing the Elaboration Likelihood Model of persuasion, this study assessed the psychophysical peripheral cue ‘Interactive Voice Response System (IVRS) call frequency’ on compliance. Methods: 71 participants were randomized to once daily (OD), twice daily (BID) or three times daily (TID) call schedules over two weeks. Participants completed 30-item cognitive function tests at each call. Compliance was defined as proportion of expected calls within a narrow window (± 30 min around scheduled time), and within a relaxed window (− 30 min to + 4 h). Data were analyzed by ANOVA and pairwise comparisons adjusted by the Bonferroni correction. Results: There was a relationship between call frequency and compliance. Bonferroni adjusted pairwise comparisons showed significantly higher compliance (p = 0.03) for the BID (51.0%) than TID (30.3%) for the narrow window; for the extended window, compliance was higher (p = 0.04) with OD (59.5%), than TID (38.4%). Conclusion: The IVRS psychophysical peripheral cue call frequency supported the ELM as a route to persuasion. The results also support OD strategy for optimal compliance. Models suggest specific indicators to enhance compliance with medication dosing and electronic patient diaries to improve health outcomes and data integrity respectively

Crossref

Nottingham Trent Institutional Repository (IRep)