322 research outputs found
XML Schema Clustering with Semantic and Hierarchical Similarity Measures
With the growing popularity of XML as a data representation language, collections of XML data have exploded in number. Methods are required to manage these collections and discover useful information in them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic and contextual properties of the elements but also their hierarchical structural similarity. We support our findings with experiments and analysis.
Data Mining for Web-Enabled Electronic Business Applications
Web-enabled electronic business is generating massive amounts of data on customer purchases, browsing patterns, usage times, and preferences at an increasing rate. Data mining techniques can be applied to the collected data to obtain useful information. This chapter presents issues associated with data mining for Web-enabled electronic business. Copyright Idea Group Inc.
XML documents clustering using a tensor space model
The traditional Vector Space Model (VSM) cannot represent both the structure and the content of XML documents. This paper introduces a novel method of representing XML documents in a Tensor Space Model (TSM) and then using that representation for clustering. Empirical analysis shows that the proposed method scales to large datasets; in addition, the factorized matrices it produces help to improve cluster quality through an enriched document representation of both structure and content information.
Improving Recommendation Novelty Based on Topic Taxonomy
Clustering has been widely applied to improve the computational efficiency of collaborative filtering based recommendation systems. Many techniques have been suggested to discover the item-to-item, user-to-user, and item-to-user associations within user clusters. However, few systems utilize cluster-based topic-to-topic associations to make recommendations. This paper suggests a taxonomy-based recommender system that utilizes cluster-based topic-to-topic associations to improve its recommendation quality and novelty.
The Hidden Web, XML and Semantic Web: A Scientific Data Management Perspective
The World Wide Web no longer consists just of HTML pages. Our work sheds
light on a number of trends on the Internet that go beyond simple Web pages.
The hidden Web provides a wealth of data in semi-structured form, accessible
through Web forms and Web services. These services, as well as numerous other
applications on the Web, commonly use XML, the eXtensible Markup Language. XML
has become the lingua franca of the Internet that allows customized markups to
be defined for specific domains. On top of XML, the Semantic Web grows as a
common structured data source. In this work, we first explain each of these
developments in detail. Using real-world examples from scientific domains of
great interest today, we then demonstrate how these new developments can assist
in managing, harvesting, and organizing data on the Web. Along the way, we
also illustrate the current research avenues in these domains. We believe that
this effort would help bridge multiple database tracks, thereby attracting
researchers with a view to extend database technology. Comment: EDBT - Tutorial (2011)
ALGAN: Time Series Anomaly Detection with Adjusted-LSTM GAN
Anomaly detection in time series data, to identify points that deviate from
normal behaviour, is a common problem in various domains such as manufacturing,
medical imaging, and cybersecurity. Recently, Generative Adversarial Networks
(GANs) have been shown to be effective in detecting anomalies in time series data.
The neural network architecture of GANs (i.e. Generator and Discriminator) can
significantly improve anomaly detection accuracy. In this paper, we propose a
new GAN model, named Adjusted-LSTM GAN (ALGAN), which adjusts the output of an
LSTM network for improved anomaly detection in both univariate and multivariate
time series data in an unsupervised setting. We evaluate the performance of
ALGAN on 46 real-world univariate time series datasets and a large multivariate
dataset that spans multiple domains. Our experiments demonstrate that ALGAN
outperforms traditional, neural network-based, and other GAN-based methods for
anomaly detection in time series data.