5,987 research outputs found
XML Schema Clustering with Semantic and Hierarchical Similarity Measures
With the growing popularity of XML as the data representation language, collections of the XML data are exploded in numbers. The methods are required to manage and discover the useful information from them for the improved document handling. We present a schema clustering process by organising the heterogeneous XML schemas into various groups. The methodology considers not only the linguistic and the context of the elements but also the hierarchical structural similarity. We support our findings with experiments and analysis
Anveshak - A Groundtruth Generation Tool for Foreground Regions of Document Images
We propose a graphical user interface based groundtruth generation tool in
this paper. Here, annotation of an input document image is done based on the
foreground pixels. Foreground pixels are grouped together with user interaction
to form labeling units. These units are then labeled by the user with the user
defined labels. The output produced by the tool is an image with an XML file
containing its metadata information. This annotated data can be further used in
different applications of document image analysis.Comment: Accepted in DAR 201
Grids and the Virtual Observatory
We consider several projects from astronomy that benefit from the Grid paradigm and
associated technology, many of which involve either massive datasets or the federation
of multiple datasets. We cover image computation (mosaicking, multi-wavelength
images, and synoptic surveys); database computation (representation through XML,
data mining, and visualization); and semantic interoperability (publishing, ontologies,
directories, and service descriptions)
FRIOD: a deeply integrated feature-rich interactive system for effective and efficient outlier detection
In this paper, we propose an novel interactive outlier detection system called feature-rich interactive outlier detection (FRIOD), which features a deep integration of human interaction to improve detection performance and greatly streamline the detection process. A user-friendly interactive mechanism is developed to allow easy and intuitive user interaction in all the major stages of the underlying outlier detection algorithm which includes dense cell selection, location-aware distance thresholding, and final top outlier validation. By doing so, we can mitigate the major difficulty of the competitive outlier detection methods in specifying the key parameter values, such as the density and distance thresholds. An innovative optimization approach is also proposed to optimize the grid-based space partitioning, which is a critical step of FRIOD. Such optimization fully considers the high-quality outliers it detects with the aid of human interaction. The experimental evaluation demonstrates that FRIOD can improve the quality of the detected outliers and make the detection process more intuitive, effective, and efficient
Privacy Violation and Detection Using Pattern Mining Techniques
Privacy, its violations and techniques to bypass privacy violation have grabbed the centre-stage of both academia and industry in recent months. Corporations worldwide have become conscious of the implications of privacy violation and its impact on them and to other stakeholders. Moreover, nations across the world are coming out with privacy protecting legislations to prevent data privacy violations. Such legislations however expose organizations to the issues of intentional or unintentional violation of privacy data. A violation by either malicious external hackers or by internal employees can expose the organizations to costly litigations. In this paper, we propose PRIVDAM; a data mining based intelligent architecture of a Privacy Violation Detection and Monitoring system whose purpose is to detect possible privacy violations and to prevent them in the future. Experimental evaluations show that our approach is scalable and robust and that it can detect privacy violations or chances of violations quite accurately. Please contact the author for full text at [email protected]
Deep Extreme Multi-label Learning
Extreme multi-label learning (XML) or classification has been a practical and
important problem since the boom of big data. The main challenge lies in the
exponential label space which involves possible label sets especially
when the label dimension is huge, e.g., in millions for Wikipedia labels.
This paper is motivated to better explore the label space by originally
establishing an explicit label graph. In the meanwhile, deep learning has been
widely studied and used in various classification problems including
multi-label classification, however it has not been properly introduced to XML,
where the label space can be as large as in millions. In this paper, we propose
a practical deep embedding method for extreme multi-label classification, which
harvests the ideas of non-linear embedding and graph priors-based label space
modeling simultaneously. Extensive experiments on public datasets for XML show
that our method performs competitive against state-of-the-art result
- …