Search CORE

350 research outputs found

Discontinuities in pattern inference

Author: Daniel Reidenbach (1256598)
Publication date: 01/01/2008

This paper deals with the inferrability of classes of E-pattern languages (also referred to as extended or erasing pattern languages) from positive data in Gold’s model of identification in the limit. The first main part of the paper shows that the recently presented negative result on terminal-free E-pattern languages over binary alphabets does not hold for other alphabet sizes, so that the full class of these languages is inferrable from positive data if and only if the corresponding terminal alphabet does not consist of exactly two distinct letters. The second main part yields the insight that the positive result on terminal-free E-pattern languages over alphabets with three or four letters cannot be extended to the class of general E-pattern languages. With regard to larger alphabets, the extensibility remains open. The proof methods developed for these main results do not directly discuss the (non-)existence of appropriate learning strategies, but they deal with structural properties of classes of E-pattern languages, and, in particular, with the problem of finding telltales for these languages. It is shown that the inferrability of classes of E-pattern languages is closely connected to some problems on the ambiguity of morphisms, so that the technical contributions of the paper largely consist of combinatorial insights into morphisms in word monoids.
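For orientation only (this is not taken from the paper): a word belongs to the E-pattern language of a pattern if it can be obtained by replacing every variable, consistently, with a terminal word that may be empty. The following is a minimal brute-force sketch of that membership test, restricted to terminal-free patterns. The function name and the list-of-variable-names representation are assumptions made for illustration, and the search is exponential in the worst case.

```python
from typing import Dict, Sequence


def in_e_pattern_language(pattern: Sequence[str], word: str) -> bool:
    """Check whether `word` lies in the E-pattern language of `pattern`.

    `pattern` is a terminal-free pattern given as a sequence of variable
    names, e.g. ["x", "y", "x"]. Every occurrence of a variable must be
    replaced by the same terminal word, and that word may be empty
    (the "erasing" case).
    """

    def match(i: int, pos: int, subst: Dict[str, str]) -> bool:
        # All variables consumed: succeed only if the whole word is covered.
        if i == len(pattern):
            return pos == len(word)
        var = pattern[i]
        if var in subst:
            # Variable already bound: its image must occur at this position.
            image = subst[var]
            if word.startswith(image, pos):
                return match(i + 1, pos + len(image), subst)
            return False
        # Unbound variable: try every image, starting with the empty word.
        for end in range(pos, len(word) + 1):
            subst[var] = word[pos:end]
            if match(i + 1, end, subst):
                return True
            del subst[var]
        return False

    return match(0, 0, {})
```

For example, the pattern `["x", "x"]` generates exactly the squares over the terminal alphabet, including the empty word, whereas `["x", "y", "x"]` generates every word, because `x` may be erased.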

Loughborough University Institutional Repository

Discontinuities in pattern inference

Author: Angluin
Angluin
Baliga
Choffrut
Daniel Reidenbach
Ehrenfeucht
Filè
Freydenberger
Gold
Harju
Head
Jain
Jiang
Jiang
Lange
Lange
Lange
Lange
Lipponen
Lothaire
Makanin
Mateescu
Ohlebusch
Reidenbach
Reidenbach
Reidenbach
Reidenbach
Reidenbach
Reischuk
Rogers
Rossmanith
Rozenberg
Salomaa
Salomaa
Shinohara
Shinohara
Thue
Wiehagen
Zeugmann
Publication date: 01/01/2008

Loughborough University Institutional Repository

Elsevier - Publisher Connector

Crossref

A discontinuity in pattern inference

Author: Daniel Reidenbach (1256598)
Publication date: 01/01/2004

This paper examines the learnability of a major subclass of E-pattern languages – also known as erasing or extended pattern languages – in Gold’s learning model: We show that the class of terminal-free E-pattern languages is inferrable from positive data if the corresponding terminal alphabet consists of three or more letters. Consequently, the recently presented negative result for binary alphabets is unique.

CiteSeerX

Loughborough University Institutional Repository

PMP: Privacy-Aware Matrix Profile against Sensitive Pattern Inference

Author: Ding Jiahao
Gao Yifeng
Lin Jessica
Zhang Li
Publication venue: ScholarWorks @ UTRGV
Publication date: 01/01/2023

Recent rapid development of sensor technology has allowed massive fine-grained time series (TS) data to be collected and has set the foundation for the development of data-driven services and applications. During the process, data sharing is often involved to allow third-party modelers to perform specific time series data mining (TSDM) tasks based on the needs of the data owner. The high resolution of TS brings new challenges in protecting privacy. While meaningful information in high-resolution TS shifts from concrete point values to local shape-based segments, numerous studies have found that long shape-based patterns could contain more sensitive information and may potentially be extracted and misused by a malicious third party. However, the privacy issue for TS patterns is surprisingly seldom explored in the privacy-preserving literature. In this work, we consider a new privacy-preserving problem: preventing malicious inference on long shape-based patterns while preserving short segment information for the utility task performance. To mitigate the challenge, we investigate an alternative approach by sharing Matrix Profile (MP), which is a non-linear transformation of the original data and a versatile data structure that supports many data mining tasks. We found that while MP can prevent concrete shape leakage, the canonical correlation in the MP index can still reveal the location of sensitive long patterns. Based on this observation, we design two attacks named Location Attack and Entropy Attack to extract the pattern location from MP. To further protect MP from these two attacks, we propose a Privacy-Aware Matrix Profile (PMP) via perturbing the local correlation and breaking the canonical correlation in the MP index vector. We evaluate our proposed PMP against baseline noise-adding methods through quantitative analysis and real-world case studies to show the effectiveness of the proposed method.
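For readers unfamiliar with the data structure: a Matrix Profile records, for every length-m subsequence of a series, the z-normalized Euclidean distance to its nearest non-trivial neighbor, together with that neighbor's position (the MP index). The naive quadratic sketch below illustrates the plain Matrix Profile only, not the PMP perturbation or the two attacks from the paper. The function names and the exclusion zone of `m // 2` are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def znorm(seq: Sequence[float]) -> List[float]:
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def matrix_profile(ts: Sequence[float], m: int) -> Tuple[List[float], List[int]]:
    """Naive O(n^2 * m) Matrix Profile of `ts` for subsequence length `m`.

    Returns (profile, index): profile[i] is the distance from subsequence i
    to its nearest neighbor, and index[i] is that neighbor's start position.
    """
    n_sub = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(n_sub)]
    # Assumed exclusion zone: skip neighbors that overlap heavily with i,
    # otherwise every subsequence would trivially match its own shift.
    excl = max(1, m // 2)
    profile: List[float] = []
    index: List[int] = []
    for i in range(n_sub):
        best, best_j = math.inf, -1
        for j in range(n_sub):
            if abs(i - j) < excl:
                continue
            d = math.dist(subs[i], subs[j])
            if d < best:
                best, best_j = d, j
        profile.append(best)
        index.append(best_j)
    return profile, index
```

In a series that contains the same shape twice, the two occurrences point at each other in the index vector. This is the kind of location information the abstract says can still leak from a shared MP.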

Scholarworks@UTRGV Univ. of Texas RioGrande Valley

Quootstrap: Scalable Unsupervised Extraction of Quotation-Speaker Pairs from Large News Corpora via Bootstrapping

Author: Pavllo Dario
Piccardi Tiziano
West Robert
Publication date: 07/04/2018

We propose Quootstrap, a method for extracting quotations, as well as the names of the speakers who uttered them, from large news corpora. Whereas prior work has addressed this problem primarily with supervised machine learning, our approach follows a fully unsupervised bootstrapping paradigm. It leverages the redundancy present in large news corpora, more precisely, the fact that the same quotation often appears across multiple news articles in slightly different contexts. Starting from a few seed patterns, such as ["Q", said S.], our method extracts a set of quotation-speaker pairs (Q, S), which are in turn used for discovering new patterns expressing the same quotations; the process is then repeated with the larger pattern set. Our algorithm is highly scalable, which we demonstrate by running it on the large ICWSM 2011 Spinn3r corpus. Validating our results against a crowdsourced ground truth, we obtain 90% precision at 40% recall using a single seed pattern, with significantly higher recall values for more frequently reported (and thus likely more interesting) quotations. Finally, we showcase the usefulness of our algorithm's output for computational social science by analyzing the sentiment expressed in our extracted quotations.

Comment: Accepted at the 12th International Conference on Web and Social Media (ICWSM), 201
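The bootstrapping loop described in the abstract (apply patterns, collect quotation-speaker pairs, induce new patterns from other sentences that contain the same pairs, repeat) can be sketched roughly as follows. This is a toy illustration under strong assumptions, not the authors' implementation: the `$Q`/`$S` template syntax, the capitalized-name heuristic for speakers, whole-sentence matching, and all function names are hypothetical, and there is no pattern scoring or filtering.

```python
import re
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (quotation, speaker)


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a template such as '"$Q", said $S.' into a regex."""
    parts = re.split(r"(\$Q|\$S)", pattern)
    pieces: List[str] = []
    for part in parts:
        if part == "$Q":
            pieces.append(r'(?P<Q>[^"]+)')  # quotation: anything without quotes
        elif part == "$S":
            # Assumed speaker heuristic: one or more capitalized words.
            pieces.append(r"(?P<S>[A-Z]\w*(?: [A-Z]\w*)*)")
        else:
            pieces.append(re.escape(part))  # literal context
    return re.compile("".join(pieces))


def bootstrap(corpus: Iterable[str], seeds: Iterable[str],
              rounds: int = 3) -> Tuple[Set[Pair], Set[str]]:
    """Alternate between extracting pairs and inducing new patterns."""
    sentences = list(corpus)
    patterns: Set[str] = set(seeds)
    pairs: Set[Pair] = set()
    for _ in range(rounds):
        # Step 1: apply every known pattern to every sentence.
        for pattern in patterns:
            regex = pattern_to_regex(pattern)
            for sentence in sentences:
                found = regex.fullmatch(sentence)
                if found:
                    pairs.add((found.group("Q"), found.group("S")))
        # Step 2: any sentence containing a known pair yields a new pattern.
        induced: Set[str] = set()
        for quote, speaker in pairs:
            for sentence in sentences:
                if sentence.count(quote) == 1 and sentence.count(speaker) == 1:
                    induced.add(
                        sentence.replace(quote, "$Q").replace(speaker, "$S"))
        if induced <= patterns:
            break  # nothing new was learned, so stop early
        patterns |= induced
    return pairs, patterns
```

The redundancy the abstract relies on shows up in step 2: a quotation found with the seed pattern in one article reappears in a differently worded sentence elsewhere, and that sentence becomes a new pattern that can match quotations the seed alone would miss.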

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

FixMiner: Mining Relevant Fix Patterns for Automated Program Repair

Author: Bissyandé Tegawendé F.
Kim Dongsun
Klein Jacques
Koyuncu Anil
Liu Kui
Monperrus Martin
Traon Yves Le
Publication venue: Springer Science and Business Media LLC
Publication date: 30/09/2019

Patching is a common activity in software development. It is generally performed on a source code base to address bugs or add new functionalities. In this context, given the recurrence of bugs across projects, the associated similar patches can be leveraged to extract generic fix actions. While the literature includes various approaches leveraging similarity among patches to guide program repair, these approaches often do not yield fix patterns that are tractable and reusable as actionable input to APR systems. In this paper, we propose a systematic and automated approach to mining relevant and actionable fix patterns based on an iterative clustering strategy applied to atomic changes within patches. The goal of FixMiner is thus to infer separate and reusable fix patterns that can be leveraged in other patch generation systems. Our technique, FixMiner, leverages Rich Edit Script, which is a specialized tree structure of the edit scripts that captures the AST-level context of the code changes. FixMiner uses different tree representations of Rich Edit Scripts for each round of clustering to identify similar changes. These are abstract syntax trees, edit actions trees, and code context trees. We have evaluated FixMiner on thousands of software patches collected from open source projects. Preliminary results show that we are able to mine accurate patterns, efficiently exploiting change information in Rich Edit Scripts. We further integrated the mined patterns into an automated program repair prototype, PARFixMiner, with which we are able to correctly fix 26 bugs of the Defects4J benchmark. Beyond this quantitative performance, we show that the mined fix patterns are sufficiently relevant to produce patches with a high probability of correctness: 81% of PARFixMiner's generated plausible patches are correct.

Comment: 31 pages, 11 figures

arXiv.org e-Print Archive

Open Repository and Bibliography - Luxembourg

Omics analysis in Caenorhabditis elegans: pattern inference and interpretation

Author: Yang Wentao
Publication date: 01/01/2017

High-throughput molecular technologies have greatly enhanced our understanding of biological processes by characterizing expression changes of genes (microarray and RNA-Seq data) and proteins (proteomics data), or transcription factor targets and epigenetic states (ChIP-chip and ChIP-Seq data). Among them, transcriptome studies based on microarrays or RNA-Seq have the ability to identify genes involved in the response to environmental change or specific stressors, thereby helping us to infer the underlying biological processes. During my PhD, I mainly focused on transcriptomic data analysis, using in most cases the nematode Caenorhabditis elegans as a model taxon. In particular, I have addressed seven specific projects: i) development of ABSSeq, an improved approach for detecting differential gene expression in RNA-Seq data; ii) development of aFold, a method to fully moderate fold-change of RNA-Seq data and to improve gene ranking and visualization; iii) development of WormExp, a knowledge-based approach for interpreting gene sets in C. elegans; iv) exploration of the regulation of the C. elegans immune system using curated data sets from WormExp; v) characterization of putative major effectors (GATA transcription factors) in the C. elegans innate immune system; vi) comparison of the immune response of C. elegans at protein and transcript level. In general, our work facilitates high-throughput data analysis via improving pattern inference and interpretation, which in practice provides new insights into the immune system of C. elegans.

MACAU: Open Access Repository of Kiel University

Interpreting CNN Knowledge via an Explanatory Graph

Author: Cao Ruiming
Shi Feng
Wu Ying Nian
Zhang Quanshi
Zhu Song-Chun
Publication date: 21/11/2017

This paper learns a graphical model, namely an explanatory graph, which reveals the knowledge hierarchy hidden inside a pre-trained CNN. Considering that each filter in a conv-layer of a pre-trained CNN usually represents a mixture of object parts, we propose a simple yet efficient method to automatically disentangle different part patterns from each filter and construct an explanatory graph. In the explanatory graph, each node represents a part pattern, and each edge encodes co-activation relationships and spatial relationships between patterns. More importantly, we learn the explanatory graph for a pre-trained CNN in an unsupervised manner, i.e., without the need to annotate object parts. Experiments show that each graph node consistently represents the same object part through different images. We transfer part patterns in the explanatory graph to the task of part localization, and our method significantly outperforms other approaches.

Comment: in AAAI 201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications