CURE: Flexible Categorical Data Representation by Hierarchical Coupling Learning
Authors
L Cao
H Gao
S Jian
K Lu
G Pang
Publication date
1 May 2019
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Abstract
The representation of categorical data with hierarchical value coupling relationships (i.e., various value-to-value cluster interactions) is critical yet challenging for capturing complex data characteristics in learning tasks. This paper proposes a novel and flexible coupled unsupervised categorical data representation (CURE) framework, which not only captures the hierarchical couplings but is also flexible enough to be instantiated for contrasting learning tasks such as clustering and outlier detection. CURE first learns value clusters of different granularities based on multiple value coupling functions and then learns the value representation from the couplings between the obtained value clusters. With two complementary value coupling functions, CURE is instantiated into two models: coupled data embedding (CDE) for clustering and coupled outlier scoring of high-dimensional data (COSH) for outlier detection. These two models show that CURE flexibly supports value clustering and coupling learning between value clusters for different learning tasks. CDE embeds categorical data into a new space in which features are independent and semantics are rich. COSH represents data with respect to an outlying vector to capture complex outlying behaviors of objects in high-dimensional data. Extensive experiments show that CDE significantly outperforms three popular unsupervised encoding methods and three state-of-the-art similarity measures, and that COSH performs significantly better than five state-of-the-art outlier detection methods on high-dimensional data. CDE and COSH are scalable and stable: their cost is linear in data size and quadratic in the number of features, and they are insensitive to their parameters.
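The abstract describes CURE only at a high level, so the following is a minimal Python sketch of the general two-stage idea (couple values, cluster them at several granularities, then represent objects through those clusters), written under explicit assumptions. It is not the authors' CDE or COSH implementation. It assumes one hypothetical coupling function (the conditional co-occurrence probability P(u | v) between values), uses plain k-means with several cluster counts `ks` as the "different granularities", and represents each value by its concatenated cluster indicators. The paper's further step of learning from couplings between value clusters, and the COSH outlier scoring, are not reproduced. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def value_coupling_matrix(records):
    """Hypothetical coupling function (an assumption, not the paper's):
    the row for value v holds P(u | v), the fraction of records that
    contain v and also contain u. Values are (feature_index, value) pairs."""
    values = sorted({(j, x) for r in records for j, x in enumerate(r)})
    freq, co = Counter(), Counter()
    for r in records:
        vs = list(enumerate(r))
        freq.update(vs)
        for a, b in combinations(vs, 2):  # co-occurrence within one record
            co[(a, b)] += 1
            co[(b, a)] += 1
    matrix = [[(freq[v] if u == v else co[(v, u)]) / freq[v] for u in values]
              for v in values]
    return values, matrix


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain k-means; deterministic init from the first k distinct points."""
    centers = []
    for p in points:
        if p not in centers:
            centers.append(list(p))
        if len(centers) == k:
            break
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (ties go to the lower index).
        labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, len(centers)


def cure_style_embedding(records, ks=(2, 3)):
    """Cluster values at several granularities (one k-means run per k),
    represent each value by its concatenated one-hot cluster indicators,
    and embed each record by concatenating its values' representations."""
    values, matrix = value_coupling_matrix(records)
    reps = {v: [] for v in values}
    for k in ks:
        labels, n_clusters = kmeans(matrix, k)
        for v, lab in zip(values, labels):
            one_hot = [0] * n_clusters
            one_hot[lab] = 1
            reps[v].extend(one_hot)
    return [[bit for j, x in enumerate(r) for bit in reps[(j, x)]]
            for r in records]
```

For example, with the records `("a", "x")` and `("b", "y")` each repeated twice and `ks=(1, 2)`, the values `a` and `x` always co-occur, so they share a cluster at both granularities; both copies of `("a", "x")` then receive identical embeddings that differ from those of `("b", "y")`. Deterministic initialization is chosen here only so that the sketch gives reproducible output.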
Available Versions
Institutional Knowledge at Singapore Management University
oai:ink.library.smu.edu.sg:sis...
Last updated on 17/05/2022
OPUS - University of Technology Sydney
oai:opus.lib.uts.edu.au:10453/...
Last updated on 18/10/2019