37,647 research outputs found

    An efficient closed frequent itemset miner for the MOA stream mining system

    Get PDF
    Mining itemsets is a central task in data mining, both in the batch and the streaming paradigms. While robust, efficient, and well-tested implementations exist for batch mining, hardly any publicly available equivalent exists for the streaming scenario. The lack of an efficient, usable tool for the task hinders its use by practitioners and makes it difficult to assess new research in the area. To alleviate this situation, we review the algorithms described in the literature, and implement and evaluate the IncMine algorithm by Cheng, Ke, and Ng (2008) for mining frequent closed itemsets from data streams. Our implementation works on top of the MOA (Massive Online Analysis) stream mining framework to ease its use and integration with other stream mining tasks. We provide a PAC-style rigorous analysis of the quality of the output of IncMine as a function of its parameters; this type of analysis is rare in pattern mining algorithms. As a by-product, the analysis shows how one of the user-provided parameters in the original description can be removed entirely while retaining the performance guarantees. Finally, we experimentally confirm both on synthetic and real data the excellent performance of the algorithm, as reported in the original paper, and its ability to handle concept drift.Postprint (published version

    Archiving the Relaxed Consistency Web

    Full text link
    The historical, cultural, and intellectual importance of archiving the web has been widely recognized. Today, all countries with high Internet penetration rate have established high-profile archiving initiatives to crawl and archive the fast-disappearing web content for long-term use. As web technologies evolve, established web archiving techniques face challenges. This paper focuses on the potential impact of the relaxed consistency web design on crawler driven web archiving. Relaxed consistent websites may disseminate, albeit ephemerally, inaccurate and even contradictory information. If captured and preserved in the web archives as historical records, such information will degrade the overall archival quality. To assess the extent of such quality degradation, we build a simplified feed-following application and simulate its operation with synthetic workloads. The results indicate that a non-trivial portion of a relaxed consistency web archive may contain observable inconsistency, and the inconsistency window may extend significantly longer than that observed at the data store. We discuss the nature of such quality degradation and propose a few possible remedies.Comment: 10 pages, 6 figures, CIKM 201

    A Transaction Model for Executions of Compositions of Internet of Things Services

    Get PDF
    AbstractInternet of Things (IoT) is about making “things” smart in some functionality, and connecting and enabling them to perform complex tasks by themselves. The functionality can be encapsulated as services and the task executed by composing the services. Two noteworthy functionalities of IoT services are monitoring and actuation. Monitoring implies continuous executions, and actuation is by triggering. Continuous executions typically involve stream processing. Stream input data are accumulated into batches and each batch is subjected to a sequence of computations, structured as a dataflow graph. The composition may be processing several batches simultaneously. Additionally, some non-stream OLTP transactions may also be executing concurrently. Thus, several composite transactions may be executing concurrently. This is in contrast to a typical Web services composition, where just one composite transaction is executed on each invocation. Therefore, defining transactional properties for executions of IoT service compositions is much more complex than for those of conventional Web service compositions. In this paper, we propose a transaction model and a correctness criterion for executions of IoT service compositions. Our proposal defines relaxed atomicity and isolation properties for transactions in a flexible manner and can be adapted for a variety of IoT applications

    Towards Building a Knowledge Base of Monetary Transactions from a News Collection

    Full text link
    We address the problem of extracting structured representations of economic events from a large corpus of news articles, using a combination of natural language processing and machine learning techniques. The developed techniques allow for semi-automatic population of a financial knowledge base, which, in turn, may be used to support a range of data mining and exploration tasks. The key challenge we face in this domain is that the same event is often reported multiple times, with varying correctness of details. We address this challenge by first collecting all information pertinent to a given event from the entire corpus, then considering all possible representations of the event, and finally, using a supervised learning method, to rank these representations by the associated confidence scores. A main innovative element of our approach is that it jointly extracts and stores all attributes of the event as a single representation (quintuple). Using a purpose-built test set we demonstrate that our supervised learning approach can achieve 25% improvement in F1-score over baseline methods that consider the earliest, the latest or the most frequent reporting of the event.Comment: Proceedings of the 17th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '17), 201

    A randomized trial to determine the impact on compliance of a psychophysical peripheral cue based on the Elaboration Likelihood Model

    Get PDF
    Objective: Non-compliance in clinical studies is a significant issue, but causes remain unclear. Utilizing the Elaboration Likelihood Model of persuasion, this study assessed the psychophysical peripheral cue ‘Interactive Voice Response System (IVRS) call frequency’ on compliance. Methods: 71 participants were randomized to once daily (OD), twice daily (BID) or three times daily (TID) call schedules over two weeks. Participants completed 30-item cognitive function tests at each call. Compliance was defined as proportion of expected calls within a narrow window (± 30 min around scheduled time), and within a relaxed window (− 30 min to + 4 h). Data were analyzed by ANOVA and pairwise comparisons adjusted by the Bonferroni correction. Results: There was a relationship between call frequency and compliance. Bonferroni adjusted pairwise comparisons showed significantly higher compliance (p = 0.03) for the BID (51.0%) than TID (30.3%) for the narrow window; for the extended window, compliance was higher (p = 0.04) with OD (59.5%), than TID (38.4%). Conclusion: The IVRS psychophysical peripheral cue call frequency supported the ELM as a route to persuasion. The results also support OD strategy for optimal compliance. Models suggest specific indicators to enhance compliance with medication dosing and electronic patient diaries to improve health outcomes and data integrity respectively
    corecore