15,113 research outputs found
CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise
In this paper, we study the problem of learning image classification models
with label noise. Existing approaches depending on human supervision are
generally not scalable as manually identifying correct or incorrect labels is
time-consuming, whereas approaches not relying on human supervision are
scalable but less effective. To reduce the amount of human supervision for
label noise cleaning, we introduce CleanNet, a joint neural embedding network,
which only requires a fraction of the classes being manually verified to
provide the knowledge of label noise that can be transferred to other classes.
We further integrate CleanNet and conventional convolutional neural network
classifier into one framework for image classification learning. We demonstrate
the effectiveness of the proposed algorithm on both of the label noise
detection task and the image classification on noisy data task on several
large-scale datasets. Experimental results show that CleanNet can reduce label
noise detection error rate on held-out classes where no human supervision
available by 41.5% compared to current weakly supervised methods. It also
achieves 47% of the performance gain of verifying all images with only 3.2%
images verified on an image classification task. Source code and dataset will
be available at kuanghuei.github.io/CleanNetProject.Comment: Accepted to CVPR 201
FAME: Face Association through Model Evolution
We attack the problem of learning face models for public faces from
weakly-labelled images collected from web through querying a name. The data is
very noisy even after face detection, with several irrelevant faces
corresponding to other people. We propose a novel method, Face Association
through Model Evolution (FAME), that is able to prune the data in an iterative
way, for the face models associated to a name to evolve. The idea is based on
capturing discriminativeness and representativeness of each instance and
eliminating the outliers. The final models are used to classify faces on novel
datasets with possibly different characteristics. On benchmark datasets, our
results are comparable to or better than state-of-the-art studies for the task
of face identification.Comment: Draft version of the stud
BlogForever D2.6: Data Extraction Methodology
This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform
- âŠ