5 research outputs found

    Joint Intermodal and Intramodal Label Transfers for Extremely Rare or Unseen Classes

    In this paper, we present a label transfer model from texts to images for image classification tasks. Image classification is often much more challenging than text classification. On the one hand, labeled text data is more widely available than labeled images for classification tasks. On the other hand, text data tends to have natural semantic interpretability and is often more directly related to class labels, whereas image features are not directly related to the concepts inherent in class labels. One of our goals in this paper is to develop a model that reveals the functional relationships between text and image features so as to directly transfer intermodal and intramodal labels to annotate images. This is implemented by learning a transfer function as a bridge to propagate labels between two multimodal spaces. However, intermodal label transfer can be undermined by blindly transferring the labels of noisy texts to annotate images. To mitigate this problem, we present an intramodal label transfer process, which complements the intermodal label transfer by transferring image labels instead when relevant text is absent from the source corpus. In addition, we generalize the intermodal label transfer to the zero-shot learning scenario, where only text examples are available to label unseen classes of images, without any positive image examples. We evaluate our algorithm on an image classification task and show its effectiveness with respect to the other compared algorithms. Comment: The paper has been accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence. It will appear in a future issue.
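The label transfer idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's model: the bilinear form of the transfer function, the matrix `S` (fixed here, learned in the paper), the inner-product image similarity, and the names `transfer` and `label_score` are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def transfer(text_vec: Vector, image_vec: Vector, S: List[List[float]]) -> float:
    """Hypothetical bilinear transfer function T(x, z) = x^T S z.

    S maps between the text and image feature spaces. Here it is supplied
    by the caller; this sketch does not learn it.
    """
    return sum(x_i * dot(row, image_vec) for x_i, row in zip(text_vec, S))


def label_score(
    image_vec: Vector,
    texts: Sequence[Vector],
    text_labels: Sequence[int],
    S: List[List[float]],
    images: Sequence[Vector] = (),
    image_labels: Sequence[int] = (),
) -> float:
    """Score a query image for one class; labels are +1 or -1."""
    # Intermodal term: labeled texts vote, weighted by the transfer function.
    inter = sum(y * transfer(x, image_vec, S) for x, y in zip(texts, text_labels))
    # Intramodal term: labeled images vote, weighted by feature similarity.
    # This is the only term left when no relevant text is available.
    intra = sum(y * dot(z, image_vec) for z, y in zip(images, image_labels))
    return inter + intra


# Toy usage: two labeled texts, an identity transfer matrix, one query image.
S = [[1.0, 0.0], [0.0, 1.0]]
texts = [[1.0, 0.0], [0.0, 1.0]]
text_labels = [1, -1]
query = [0.9, 0.1]
score = label_score(query, texts, text_labels, S)
predicted = 1 if score >= 0 else -1
```

In the zero-shot setting described in the abstract, only the intermodal term would contribute, since no positive image examples exist for the unseen class.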

    Transferring a generic pedestrian detector towards specific scenes.

    In recent years, significant progress has been made in learning generic pedestrian detectors from publicly available, manually labeled, large-scale training datasets. However, when a generic pedestrian detector is applied to a specific, previously unseen scene where the testing data (target examples) do not match the training data (source examples) because of variations in viewpoint, resolution, illumination and background, its accuracy may decrease greatly.

    In this thesis, a new framework is proposed for automatically adapting a pre-trained generic pedestrian detector to a specific traffic scene. The framework has two phases. In the first phase, scene-specific cues in the video surveillance sequence are explored. Utilizing this multi-cue information, both confident positive and confident negative examples from the target scene are selected to re-train the detector iteratively until convergence. In the second phase, a new machine learning framework is proposed that incorporates not only example labels but also example confidences. Source and target examples are re-weighted according to their confidence, optimizing the performance of the final classifier. Both methods belong to semi-supervised learning and require very little human intervention.

    The proposed approaches significantly improve the accuracy of the generic pedestrian detector. Their results are comparable with those of a detector trained using a large number of manually labeled frames from the target scene. Comparison with other existing approaches tackling similar problems shows that the proposed approaches outperform many contemporary methods. The works were published at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) in 2011 and 2012, respectively.

    Detailed summary in vernacular field only.
    Wang, Meng.
    Thesis (M.Phil.)--Chinese University of Hong Kong, 2012.
    Includes bibliographical references (leaves 42-45).
    Abstracts also in Chinese.

    Chapter 1 --- Introduction
    Chapter 1.1 --- Pedestrian Detection
    Chapter 1.1.1 --- Overview
    Chapter 1.1.2 --- Statistical Learning
    Chapter 1.1.3 --- Object Representation
    Chapter 1.1.4 --- Supervised Statistical Learning in Object Detection
    Chapter 1.2 --- Pedestrian Detection in Video Surveillance
    Chapter 1.2.1 --- Problem Setting
    Chapter 1.2.2 --- Challenges
    Chapter 1.2.3 --- Motivations and Contributions
    Chapter 1.3 --- Related Work
    Chapter 1.4 --- Organizations of Chapters
    Chapter 2 --- Label Inferring by Multi-Cues
    Chapter 2.1 --- Data Set
    Chapter 2.2 --- Method
    Chapter 2.2.1 --- Confident Positive Examples of Pedestrians
    Chapter 2.2.2 --- Confident Negative Examples from the Background
    Chapter 2.2.3 --- Confident Negative Examples from Vehicles
    Chapter 2.2.4 --- Final Scene Specific Pedestrian Detector
    Chapter 2.3 --- Experiment Results
    Chapter 3 --- Transferring a Detector by Confidence Propagation
    Chapter 3.1 --- Method
    Chapter 3.1.1 --- Overview
    Chapter 3.1.2 --- Initial Estimation of Confidence Scores
    Chapter 3.1.3 --- Re-weighting Source Samples
    Chapter 3.1.4 --- Confidence-Encoded SVM
    Chapter 3.2 --- Experiments
    Chapter 3.2.1 --- Datasets
    Chapter 3.2.2 --- Parameter Setting
    Chapter 3.2.3 --- Results
    Chapter 4 --- Conclusions and Future Work
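The second phase described in this abstract, where source and target examples are re-weighted by confidence before training the final classifier, can be sketched roughly as follows. This is not the thesis's Confidence-Encoded SVM: it is a generic linear classifier trained by subgradient descent on a confidence-weighted hinge loss, and the function names, hyperparameters and toy data are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def train_weighted_linear(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    confidence: Sequence[float],
    lr: float = 0.1,
    reg: float = 0.01,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Train a linear classifier with a confidence-weighted hinge loss.

    Labels are +1 or -1. Each example's confidence in [0, 1] scales its
    hinge-loss update, so low-confidence examples move the boundary less.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i, c_i in zip(X, y, confidence):
            margin = y_i * (sum(wj * xj for wj, xj in zip(w, x_i)) + b)
            # L2 regularization: shrink the weights a little at every step.
            w = [wj * (1.0 - lr * reg) for wj in w]
            if margin < 1.0:
                # Margin violated: step toward this example, scaled by confidence.
                w = [wj + lr * c_i * y_i * xj for wj, xj in zip(w, x_i)]
                b += lr * c_i * y_i
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 or -1 according to the side of the decision boundary."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy usage: four examples with hypothetical confidence scores.
X = [[2.0, 1.0], [1.5, 2.0], [-2.0, -1.0], [-1.0, -2.5]]
y = [1, 1, -1, -1]
confidence = [1.0, 0.8, 1.0, 0.6]
w, b = train_weighted_linear(X, y, confidence)
predictions = [predict(w, b, x) for x in X]
```

In a transfer setting like the one the abstract describes, the confidence values would come from an earlier estimation step rather than being set by hand as they are here.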

    Towards Cross-Category Knowledge Propagation for Learning Visual Concepts

    In recent years, knowledge transfer algorithms have become one of the most active research areas in learning visual concepts. Most of the existing learning algorithms focus on leveraging the knowledge transfer process which i