626 research outputs found

Cross-Domain Labeled LDA for Cross-Domain Text Classification

Author: Jing Baoyu
Lu Chenwei
Niu Cheng
Wang Deqing
Zhuang Fuzhen
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/09/2018

Cross-domain text classification aims at building a classifier for a target domain which leverages data from both the source and target domains. One promising idea is to minimize the feature distribution differences of the two domains. Most existing studies explicitly minimize such differences by an exact alignment mechanism (aligning features by one-to-one feature alignment, projection matrices, etc.). Such exact alignment, however, will restrict models' learning ability and will further impair models' performance on classification tasks when the semantic distributions of different domains are very different. To address this problem, we propose a novel group alignment, which aligns the semantics at the group level. In addition, to help the model learn better semantic groups and semantics within these groups, we also propose partial supervision for the model's learning in the source domain. To this end, we embed the group alignment and partial supervision into a cross-domain topic model, and propose a Cross-Domain Labeled LDA (CDL-LDA). On the standard 20Newsgroup and Reuters datasets, extensive quantitative (classification, perplexity, etc.) and qualitative (topic detection) experiments are conducted to show the effectiveness of the proposed group alignment and partial supervision. Comment: ICDM 201

arXiv.org e-Print Archive

Crossref

Learning with Partial Supervision for Clustering and Classification

Author: Pei Yuanli
Publication venue: 'Oregon State University'

In the field of machine learning, clustering and classification are two fundamental tasks. Traditionally, clustering is an unsupervised method, where no supervision about the data is available for learning; classification is a supervised task, where fully-labeled data are collected for training a classifier. In some scenarios, however, we may not have the full labels but only partial supervision about the data, such as instance similarities or incomplete label assignments. In such cases, traditional clustering and classification methods do not directly apply. To address such problems, this thesis focuses on the task of learning from partial supervision for clustering and classification tasks. For clustering with partial supervision, we investigate three problems: a) constrained clustering in multi-instance multi-label learning, where the goal is to group instances into clusters that respect the background knowledge given by the bag-level labels; b) clustering with constraints, where the partial supervision is expressed as "pairwise constraints" or "relative constraints", regarding similarities about instance pairs and triplets respectively; c) active learning of pairwise constraints for clustering, where the goal is to improve the clustering with minimum human effort by iteratively querying the most informative pairs to an oracle. For classification with partial supervision, we address the problem of multi-label learning where data is associated with a latent label hierarchy and incomplete label assignments, and the goal is to simultaneously discover the latent hierarchy as well as to learn a multi-label classifier that is consistent with the hierarchy. Keywords: Classification, Partial Supervision, Active Learning, Clusterin

ScholarsArchive@OSU

On using partial supervision for text categorization

Author: Charu C Aggarwal
Philip S Yu
Stephen C Gates
Publication date: 01/01/2004

In this paper, we discuss the merits of building text categorization systems by using supervised clustering techniques. Traditional approaches for document classification on a predefined set of classes are often unable to provide sufficient accuracy because of the difficulty of fitting a manually categorized collection of documents in a given classification model. This is especially the case for heterogeneous collections of Web documents which have varying styles, vocabulary, and authorship. Hence, this paper investigates the use of clustering in order to create the set of categories and its use for classification of documents. Completely unsupervised clustering has the disadvantage that it has difficulty in isolating sufficiently fine-grained classes of documents relating to a coherent subject matter. In this paper, we use the information from a preexisting taxonomy in order to supervise the creation of a set of related clusters, though with some freedom in defining and creating the classes. We show that the advantage of using partially supervised clustering is that it is possible to have some control over the range of subjects that one would like the categorization system to address, but with a precise mathematical definition of how each category is defined. An extremely effective way then to categorize documents is to use this a priori knowledge of the definition of each category. We also discuss a new technique to help the classifier distinguish better among closely related clusters.

CiteSeerX

Label-Set Loss Functions for Partial Supervision: Application to Fetal Brain 3D MRI Parcellation

Author: Aertsen M
David AL
Demaerel P
Deprest J
Deprest T
Emam D
Fidon L
Guffens F
Melbourne A
Mufti N
Ourselin S
Vercauteren T
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 21/09/2021

Deep neural networks have increased the accuracy of automatic segmentation; however, their accuracy depends on the availability of a large number of fully segmented images. Methods to train deep neural networks using images for which some, but not all, regions of interest are segmented are necessary to make better use of partially annotated datasets. In this paper, we propose the first axiomatic definition of label-set loss functions, that is, the loss functions that can handle partially segmented images. We prove that there is one and only one method to convert a classical loss function for fully segmented images into a proper label-set loss function. Our theory also allows us to define the leaf-Dice loss, a label-set generalisation of the Dice loss particularly suited for partial supervision with only missing labels. Using the leaf-Dice loss, we set a new state of the art in partially supervised learning for fetal brain 3D MRI segmentation. We achieve a deep neural network able to segment white matter, ventricles, cerebellum, extra-ventricular CSF, cortical gray matter, deep gray matter, brainstem, and corpus callosum based on fetal brain 3D MRI of fetuses that are anatomically normal or have open spina bifida. Our implementation of the proposed label-set loss functions is available at https://github.com/LucasFidon/label-set-loss-functions

UCL Discovery
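
As a rough illustration of what a loss for partially annotated data can look like, the sketch below scores a prediction against a set of admissible labels instead of a single label, by summing the predicted probability over the set. This marginalisation is only one simple way to handle label-set annotations and is an assumption of this example; it is not claimed to be the conversion proved unique in the abstract above, nor the leaf-Dice loss (the authors' repository linked in the abstract holds their implementation). The function name `marginal_cross_entropy` and the toy probabilities are hypothetical.

```python
import math


def marginal_cross_entropy(probs, label_sets):
    """Mean negative log of the probability mass placed on each allowed label set.

    probs      : list of per-sample class probability lists (each sums to 1).
    label_sets : list of sets of admissible class indices, one per sample.
                 A singleton set is a full annotation; a larger set means
                 "one of these classes, exact class unknown".
    """
    losses = []
    for p, allowed in zip(probs, label_sets):
        # Probability that the prediction falls anywhere inside the allowed set.
        mass = sum(p[c] for c in allowed)
        # Clamp to avoid log(0) and rounding slightly above 1.
        mass = min(max(mass, 1e-12), 1.0)
        losses.append(-math.log(mass))
    return sum(losses) / len(losses)


# Toy example: sample 0 is fully labelled, sample 1 is only known to be class 1 or 2.
toy_probs = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
toy_sets = [{0}, {1, 2}]
toy_loss = marginal_cross_entropy(toy_probs, toy_sets)
```

With singleton sets this reduces to the ordinary cross-entropy, and a set containing every class contributes (numerically) zero loss, which matches the intuition that an uninformative annotation should not penalise any prediction.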

Label-Set Loss Functions for Partial Supervision: Application to Fetal Brain 3D MRI Parcellation

Author: Aertsen M
David AL
Demaerel P
Deprest J
Deprest T
Emam D
Fidon L
Guffens F
Melbourne A
Mufti N
Ourselin S
Vercauteren T
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 09/07/2021

arXiv.org e-Print Archive

UCL Discovery

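The Effect of Work Facilities, Discipline and Supervision on Employee Performance at Dinas Perikanan Kota Tanjungbalai (original title: PENGARUH FASILITAS KERJA, KEDISIPLINAN DAN PENGAWASAN TERHADAP KINERJA PEGAWAI DINAS PERIKANAN KOTA TANJUNGBALAI)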

Author: . Isminingsih
. Nurdin
Manurung Elfina
Metia Tengku Anggi
Nura Eko Topan Prihatin
Publication venue: 'Universitas Islam Sumatera Utara'
Publication date: 11/03/2021

The research question of this study is: how do work facilities, discipline and supervision influence employee performance at Dinas Perikanan Kota Tanjungbalai? The study aims to determine the effect of work facilities, discipline and supervision on employee performance. With a total sampling technique, the sample in this study was 41 people. The results showed that: work facilities partially have a positive effect on employee performance; discipline partially has a positive effect on employee performance; supervision partially has a positive effect on employee performance; and work facilities, discipline and supervision together have a positive and significant effect on employee performance.

Jurnal Online Universitas Islam Sumatera Utara

Active Labeling: Streaming Stochastic Gradients

Author: Bach Francis
Cabannes Vivien
Perchet Vianney
Rudi Alessandro
Publication date: 01/11/2022

The workhorse of machine learning is stochastic gradient descent. To access stochastic gradients, it is common to iterate over the input/output pairs of a training dataset. Interestingly, it appears that one does not need full supervision to access stochastic gradients, which is the main motivation of this paper. After formalizing the "active labeling" problem, which focuses on active learning with partial supervision, we provide a streaming technique that provably minimizes the ratio of generalization error over the number of samples. We illustrate our technique in depth for robust regression. Comment: 38 pages (9 main pages), 9 figure

arXiv.org e-Print Archive
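
The claim in the abstract above, that stochastic gradients can be obtained without full supervision, can be illustrated with a small sketch. For the absolute loss |theta - y|, the subgradient with respect to theta is just the sign of theta - y, so a one-bit answer to "is the label below the current estimate?" is enough to take a descent step. The code below estimates a median this way. It is a minimal illustration under that assumption, not the streaming algorithm of the paper; the function name `median_from_comparisons`, the step size 1/sqrt(t), and the iterate averaging are choices made for this example.

```python
import math
import random


def median_from_comparisons(sample_label, n_steps=20000, theta0=0.0):
    """Estimate a median by SGD on |theta - y| using only one-bit answers.

    sample_label : zero-argument callable drawing a hidden label y.
    The learner never uses the value of y, only whether y < theta.
    """
    theta = theta0
    running_mean = 0.0
    for t in range(1, n_steps + 1):
        y = sample_label()
        # Partial supervision: the oracle reveals a single bit about y.
        label_is_below = y < theta
        # Subgradient of |theta - y| with respect to theta is sign(theta - y).
        grad = 1.0 if label_is_below else -1.0
        theta -= grad / math.sqrt(t)
        # Average the iterates to damp the oscillation around the median.
        running_mean += (theta - running_mean) / t
    return running_mean


# Usage: labels drawn from a Gaussian with median 3.0 (seeded for repeatability).
rng = random.Random(0)
estimate = median_from_comparisons(lambda: rng.gauss(3.0, 1.0))
```

The estimate should settle near the true median even though no label value is ever observed directly, which is the sense in which a comparison query can stand in for a fully labelled sample here.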