GCG: Mining Maximal Complete Graph Patterns from Large Spatial Data

Author: Al-Naymat Ghazi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 13/12/2013

Recent research on pattern discovery has progressed from mining frequent patterns and sequences to mining structured patterns, such as trees and graphs. Graphs, as a general data structure, can model complex relations among data, with wide applications in web exploration and social networks. However, mining large graph patterns is challenging because of the large number of subgraphs. In this paper, we aim to mine only frequent complete graph patterns. A graph g in a database is complete if every pair of distinct vertices is connected by a unique edge. Grid Complete Graph (GCG) is a mining algorithm developed to explore interesting pruning techniques for extracting maximal complete graphs from the large spatial dataset in the Sloan Digital Sky Survey (SDSS). Using a divide-and-conquer strategy, GCG shows high efficiency, especially in the presence of a large number of patterns. In this paper, we describe how GCG can mine not only simple co-location spatial patterns but also complex ones. To the best of our knowledge, this is the first algorithm to use the extraction of maximal complete graphs in the process of mining complex co-location patterns in large spatial datasets.
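The definition of a complete graph given in this abstract can be illustrated with a small sketch. The code below is not the GCG algorithm (the abstract does not describe its pruning or grid steps); it uses the textbook Bron-Kerbosch enumeration as a stand-in to show what "maximal complete graph" means. The toy graph and all function names are hypothetical.

```python
from itertools import combinations


def is_complete(vertices, edges):
    """Return True if every pair of distinct vertices is joined by an edge."""
    return all(frozenset(pair) in edges for pair in combinations(vertices, 2))


def maximal_cliques(adj):
    """Enumerate maximal complete subgraphs with basic Bron-Kerbosch (no pivoting)."""
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: vertices already tried
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


# Hypothetical toy graph: a triangle A-B-C plus a pendant edge C-D.
edge_list = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
edges = {frozenset(e) for e in edge_list}
adj = {}
for u, v in edge_list:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

cliques = maximal_cliques(adj)
```

On this toy input the triangle and the pendant edge are the only maximal complete subgraphs; the subset {A, B} is complete but not maximal, because it can be extended with C.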

arXiv.org e-Print Archive

Crossref

Object Discovery From a Single Unlabeled Image by Mining Frequent Itemset With Multi-scale Features

Author: Guan Qingji
Huang Yaping
Ling Haibin
Pu Mengyang
Zhang Jian
Zhang Runsheng
Zou Qi
Publication date: 08/08/2020

The goal of our work is to discover dominant objects in a very general setting where only a single unlabeled image is given. This is far more challenging than typical co-localization or weakly-supervised localization tasks. To tackle this problem, we propose a simple but effective pattern mining-based method, called Object Location Mining (OLM), which exploits the advantages of data mining and the feature representation of pre-trained convolutional neural networks (CNNs). Specifically, we first convert the feature maps from a pre-trained CNN model into a set of transactions, and then discover frequent patterns from the transaction database through pattern mining techniques. We observe that the discovered patterns, i.e., co-occurrence highlighted regions, typically hold appearance and spatial consistency. Motivated by this observation, we can easily discover and localize possible objects by merging relevant meaningful patterns. Extensive experiments on a variety of benchmarks demonstrate that OLM achieves competitive localization performance compared with state-of-the-art methods. We also compare our approach with unsupervised saliency detection methods and achieve competitive results on seven benchmark datasets. Moreover, we conduct experiments on fine-grained classification to show that our proposed method can locate the entire object and its parts accurately, which can significantly improve the classification results.
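The step of converting feature maps into transactions and mining frequent patterns might look roughly like the sketch below. It is an illustration only: the abstract does not say how OLM builds transactions, so the thresholding rule, the fixed itemset size, the toy activations, and every name here are assumptions.

```python
from collections import Counter
from itertools import combinations


def to_transactions(feature_maps, threshold):
    """Assumed rule: one transaction per spatial position, holding the
    indices of the channels whose activation exceeds the threshold."""
    channels = len(feature_maps)
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return {
        (r, c): frozenset(
            k for k in range(channels) if feature_maps[k][r][c] > threshold
        )
        for r in range(height)
        for c in range(width)
    }


def frequent_itemsets(transactions, min_support, size):
    """Count itemsets of a fixed size and keep those meeting min_support."""
    counts = Counter()
    for items in transactions.values():
        for combo in combinations(sorted(items), size):
            counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


# Hypothetical activations: 3 channels over a 2x2 spatial grid.
feature_maps = [
    [[0.9, 0.8], [0.1, 0.7]],
    [[0.7, 0.9], [0.2, 0.6]],
    [[0.1, 0.0], [0.9, 0.1]],
]
transactions = to_transactions(feature_maps, threshold=0.5)
patterns = frequent_itemsets(transactions, min_support=3, size=2)

# Positions supporting the pattern {0, 1}: a crude stand-in for a
# "co-occurrence highlighted region".
region = sorted(pos for pos, items in transactions.items() if {0, 1} <= items)
```

Channels 0 and 1 fire together at three of the four positions, so their pair is the only frequent pattern, and the positions that support it form the candidate region.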

arXiv.org e-Print Archive

PfAlbas constitute a new eukaryotic DNA/RNA-binding protein family in malaria parasites

Author: Arnaud Chêne
Artur Scherf
Aurelie Claes
Christine Scheidig-Benatar
Hiroshi Sakamoto
José Juan Lopez-Rubio
Loïc Rivière
Rosaura Hernandez-Rivas
Shruthi S. Vembar
T. Nicolai Siegel
Publication venue: Oxford University Press
Publication date: 31/03/2012

In Plasmodium falciparum, perinuclear subtelomeric chromatin conveys monoallelic expression of virulence genes. However, proteins that directly bind to chromosome ends are poorly described. Here we identify a novel DNA/RNA-binding protein family that bears homology to the archaeal protein Alba (Acetylation lowers binding affinity). We isolated three of the four PfAlba paralogs as part of a molecular complex that is associated with the P. falciparum-specific TARE6 (Telomere-Associated Repetitive Elements 6) subtelomeric region and showed in electromobility shift assays (EMSAs) that the PfAlbas bind to TARE6 repeats. In early blood stages, the PfAlba proteins were enriched at the nuclear periphery and partially co-localized with PfSir2, a TARE6-associated histone deacetylase linked to the process of antigenic variation. The nuclear location changed at the onset of parasite proliferation (trophozoite-schizont), where the PfAlba proteins were also detectable in the cytoplasm in a punctate pattern. Using single-stranded RNA (ssRNA) probes in EMSAs, we found that PfAlbas bind to ssRNA, albeit with different binding preferences. We demonstrate for the first time in eukaryotes that Alba-like proteins bind to both DNA and RNA and that their intracellular location is developmentally regulated. Discovery of the PfAlbas may provide a link between the previously described subtelomeric non-coding RNA and the regulation of antigenic variation.

Crossref

PubMed Central

HAL-Pasteur

Subjectively Interesting Subgroup Discovery on Real-valued Targets

Author: De Bie Tijl
Duivesteijn Wouter
Kang Bo
Lijffijt Jefrey
Oikarinen Emilia
Puolamäki Kai
Publication date: 01/01/2017

Deriving insights from high-dimensional data is one of the core problems in data mining. The difficulty mainly stems from the fact that there are exponentially many variable combinations to consider, and infinitely many if we allow weighted combinations, even when these are restricted to linear ones. Hence, an obvious question is whether we can automate the search for interesting patterns and visualizations. In this paper, we consider the setting where a user wants to learn as efficiently as possible about real-valued attributes. For example, a user may want to understand the distribution of crime rates in different geographic areas in terms of other (numerical, ordinal and/or categorical) variables that describe the areas. We introduce a method to find subgroups in the data that are maximally informative (in the formal Information Theoretic sense) with respect to a single real-valued target attribute or a set of them. The subgroup descriptions are in terms of a succinct set of arbitrarily-typed other attributes. The approach is based on the Subjective Interestingness framework FORSIED to enable the use of prior knowledge when finding the most informative non-redundant patterns, and hence the method also supports iterative data mining.
Comment: 12 pages, 10 figures, 2 tables, conference submission
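To make the idea of scoring subgroups against a real-valued target concrete, here is a minimal sketch. It does not implement the information-theoretic measure of the FORSIED framework named in the abstract; it substitutes a classical mean-shift quality, sqrt(n) times the difference between the subgroup mean and the overall mean. The data, attribute names, and candidate descriptions are all hypothetical.

```python
from math import sqrt
from statistics import mean


def subgroup_quality(rows, target, condition):
    """Classical mean-shift quality (a stand-in, not FORSIED):
    sqrt(subgroup size) * (subgroup mean - overall mean)."""
    members = [row[target] for row in rows if condition(row)]
    if not members:
        return 0.0
    overall = mean(row[target] for row in rows)
    return sqrt(len(members)) * (mean(members) - overall)


# Hypothetical areas with one descriptive attribute and a real-valued target.
rows = [
    {"urban": True, "crime_rate": 8.0},
    {"urban": True, "crime_rate": 6.0},
    {"urban": False, "crime_rate": 2.0},
    {"urban": False, "crime_rate": 4.0},
]

# Candidate subgroup descriptions, expressed as predicates over a row.
candidates = {
    "urban": lambda row: row["urban"],
    "rural": lambda row: not row["urban"],
}
scores = {
    name: subgroup_quality(rows, "crime_rate", cond)
    for name, cond in candidates.items()
}
best = max(scores, key=scores.get)
```

A measure like this ranks subgroups only by deviation from the overall mean; it takes no account of what the user already knows, which is the gap the subjective-interestingness approach in the abstract is meant to address.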

arXiv.org e-Print Archive

Repository TU/e

Crossref

Pure OAI Repository

Ghent University Academic Bibliography

Aaltodoc Publication Archive

Hypermedia-based discovery for source selection using low-cost linked data interfaces

Author: Colpaert Pieter
Dimou Anastasia
Mannens Erik
Vander Sande Miel
Verborgh Ruben
Publication venue: IGI Global
Publication date: 01/01/2016

Evaluating federated Linked Data queries requires consulting multiple sources on the Web. Before a client can execute queries, it must discover data sources and determine which ones are relevant. Federated query execution research focuses on the actual execution, while data source discovery is often only marginally discussed, even though it has a strong impact on selecting sources that contribute to the query results. Therefore, the authors introduce a discovery approach for Linked Data interfaces based on hypermedia links and controls, and apply it to federated query execution with Triple Pattern Fragments. In addition, the authors identify quantitative metrics to evaluate this discovery approach. This article describes generic evaluation measures and results for their concrete approach. With low-cost data summaries as seed, interfaces to eight large real-world datasets can discover each other within 7 minutes. Hypermedia-based client-side querying shows a promising gain of up to 50% in execution time, but demands algorithms that visit a higher number of interfaces to improve result completeness.
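Link-based discovery of this kind can be pictured as a traversal that starts from a seed interface and follows the links each interface advertises. The sketch below is a generic breadth-first traversal over an in-memory link table, not the authors' implementation; the interface names and the `links` mapping are hypothetical stand-ins for hypermedia controls fetched over HTTP.

```python
from collections import deque


def discover(seed, links):
    """Breadth-first traversal: return every interface reachable from the seed
    by following advertised links."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        for target in links.get(current, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical link table: which interfaces each interface points to.
links = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a"],
    "d": ["a"],  # "d" links out, but nothing links to it
}
reachable = discover("a", links)
```

In this toy table, interface "d" is never found from seed "a" because no other interface links to it, which shows why the choice of seed affects how complete the discovered set is.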

Ghent University Academic Bibliography