5,987 research outputs found

XML Schema Clustering with Semantic and Hierarchical Similarity Measures

Authors: Iryadi Wina; Nayak Richi
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007

With the growing popularity of XML as a data representation language, collections of XML data have exploded in number. Methods are required to manage these collections and discover useful information from them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistics and context of the elements but also their hierarchical structural similarity. We support our findings with experiments and analysis.

Crossref

Queensland University of Technology ePrints Archive

Anveshak - A Groundtruth Generation Tool for Foreground Regions of Document Images

Authors: B Yanikoglu; DH Douglas; FM Wahl; L Wenyin; L Yang; LC Ha; N Otsu; RC Gonzalez; S Dey; S Suzuki
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 09/08/2017

In this paper, we propose a groundtruth generation tool based on a graphical user interface. Annotation of an input document image is done based on the foreground pixels. Foreground pixels are grouped together with user interaction to form labeling units. These units are then labeled by the user with user-defined labels. The output produced by the tool is an image with an XML file containing its metadata information. This annotated data can be further used in different applications of document image analysis. Comment: Accepted in DAR 201

arXiv.org e-Print Archive

Crossref

Grids and the Virtual Observatory

Author: Williams Roy
Publication venue: 'Royal College of Obstetricians & Gynaecologists (RCOG)'
Publication date: 01/01/2003

We consider several projects from astronomy that benefit from the Grid paradigm and associated technology, many of which involve either massive datasets or the federation of multiple datasets. We cover image computation (mosaicking, multi-wavelength images, and synoptic surveys); database computation (representation through XML, data mining, and visualization); and semantic interoperability (publishing, ontologies, directories, and service descriptions).

Caltech Authors

FRIOD: a deeply integrated feature-rich interactive system for effective and efficient outlier detection

Authors: Chang Liang; Fournier-Viger Philippe; Li Hongzhou; Lin Jerry Chun-Wei; Zhang Ji; Zhu Xiaodong
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 08/11/2017

In this paper, we propose a novel interactive outlier detection system called feature-rich interactive outlier detection (FRIOD), which features a deep integration of human interaction to improve detection performance and greatly streamline the detection process. A user-friendly interactive mechanism is developed to allow easy and intuitive user interaction in all the major stages of the underlying outlier detection algorithm, which include dense cell selection, location-aware distance thresholding, and final top outlier validation. By doing so, we can mitigate the major difficulty that competing outlier detection methods face in specifying key parameter values, such as the density and distance thresholds. An innovative optimization approach is also proposed to optimize the grid-based space partitioning, which is a critical step of FRIOD. This optimization fully considers the high-quality outliers detected with the aid of human interaction. The experimental evaluation demonstrates that FRIOD can improve the quality of the detected outliers and make the detection process more intuitive, effective, and efficient.

University of Southern Queensland ePrints

A Column Styled Composable Schema Matcher for Semantic Data-types

Authors: Bottelier J.; Liao X.; Zhao Z.
Publication venue: 'Ubiquity Press, Ltd.'
Publication date: 24/06/2019

International Migration, Integration and Social Cohesion online publications

Privacy Violation and Detection Using Pattern Mining Techniques

Authors: Bhattacharya Jaijit; Chakraborti Debamitro; Dass Rajanish; Gupta S K; Kapoor Vishal

Privacy, its violation, and techniques to avoid privacy violation have taken centre stage in both academia and industry in recent months. Corporations worldwide have become conscious of the implications of privacy violation and its impact on them and on other stakeholders. Moreover, nations across the world are introducing privacy-protecting legislation to prevent data privacy violations. Such legislation, however, exposes organizations to the issue of intentional or unintentional violation of privacy data. A violation by either malicious external hackers or internal employees can expose organizations to costly litigation. In this paper, we propose PRIVDAM, a data-mining-based intelligent architecture for a Privacy Violation Detection and Monitoring system whose purpose is to detect possible privacy violations and to prevent them in the future. Experimental evaluations show that our approach is scalable and robust and that it can detect privacy violations, or the likelihood of violations, quite accurately. Please contact the author for the full text.

Research Papers in Economics

Deep Extreme Multi-label Learning

Authors: Wang Xiangfeng; Yan Junchi; Zha Hongyuan; Zhang Wenjie
Publication date: 08/06/2018

Extreme multi-label learning (XML) or classification has been a practical and important problem since the boom of big data. The main challenge lies in the exponential label space, which involves 2^L possible label sets, especially when the label dimension L is huge, e.g., in the millions for Wikipedia labels. This paper is motivated to better explore the label space by originally establishing an explicit label graph. Meanwhile, deep learning has been widely studied and used in various classification problems, including multi-label classification; however, it has not been properly introduced to XML, where the label space can be as large as millions. In this paper, we propose a practical deep embedding method for extreme multi-label classification, which harvests the ideas of non-linear embedding and graph priors-based label space modeling simultaneously. Extensive experiments on public datasets for XML show that our method performs competitively against state-of-the-art results.

arXiv.org e-Print Archive

Crossref