
    Representation Learning with Fine-grained Patterns

    With the development of computational power and data-collection techniques, deep learning demonstrates superior performance over most existing algorithms on benchmark data sets. Many efforts have been devoted to studying the mechanism of deep learning. One important observation is that deep learning can learn discriminative patterns directly from raw data in a task-dependent manner. Therefore, the representations obtained by deep learning significantly outperform hand-crafted features. However, those patterns are often learned from super-class labels due to the limited availability of fine-grained labels, while fine-grained patterns are desired in many real-world applications such as visual search in online shopping. To mitigate this challenge, we propose an algorithm that learns fine-grained patterns sufficiently when only super-class labels are available. The effectiveness of our method is guaranteed by theoretical analysis. Extensive experiments on real-world data sets demonstrate that the proposed method can significantly improve performance on target tasks corresponding to fine-grained classes when only super-class information is available for training.
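    A minimal sketch of the experimental setting this abstract describes, not of the proposed algorithm itself: embeddings are trained with super-class labels only, then evaluated on a target task defined over fine-grained classes, e.g. via nearest-neighbor classification in the learned embedding space. All names below are illustrative.

        # Fine-grained labels are used only for evaluation; the embeddings come
        # from a model trained with super-class labels alone.
        from sklearn.neighbors import KNeighborsClassifier

        def evaluate_fine_grained(train_embeds, train_fine_labels,
                                  test_embeds, test_fine_labels, k=5):
            knn = KNeighborsClassifier(n_neighbors=k, metric="cosine")
            knn.fit(train_embeds, train_fine_labels)
            return knn.score(test_embeds, test_fine_labels)  # fine-grained accuracy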

    Intra-Modal Proxy Learning for Zero-Shot Visual Categorization with CLIP

    Vision-language pre-training methods, e.g., CLIP, demonstrate impressive zero-shot performance on visual categorization with the class proxy from the text embedding of the class name. However, the modality gap between the text and vision spaces can result in sub-optimal performance. We theoretically show that the gap cannot be reduced sufficiently by minimizing the contrastive loss in CLIP, and that the optimal proxy for vision tasks may reside only in the vision space. Therefore, given unlabeled target vision data, we propose to learn the vision proxy directly with help from the text proxy for zero-shot transfer. Moreover, according to our theoretical analysis, strategies are developed to further refine the pseudo labels obtained by the text proxy to facilitate intra-modal proxy learning (InMaP) for vision. Experiments on extensive downstream tasks confirm the effectiveness and efficiency of our proposal. Concretely, InMaP can obtain the vision proxy within one minute on a single GPU while improving the zero-shot accuracy from 77.02% to 80.21% on ImageNet with ViT-L/14@336 pre-trained by CLIP. Code is available at \url{https://github.com/idstcv/InMaP}. Comment: accepted by NeurIPS'2
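    A minimal sketch of the core idea described above, assuming access to precomputed CLIP embeddings: unlabeled image features are pseudo-labeled with the text proxies, and vision-space proxies are then formed from those pseudo-labels. The temperature and function names are illustrative and do not reflect the paper's exact refinement strategies.

        import torch
        import torch.nn.functional as F

        def learn_vision_proxies(image_feats, text_proxies, temperature=0.01):
            # image_feats: (N, d) CLIP image embeddings of unlabeled target data
            # text_proxies: (C, d) text embeddings of the class names
            image_feats = F.normalize(image_feats, dim=-1)
            text_proxies = F.normalize(text_proxies, dim=-1)
            # Soft pseudo-labels from the text proxies.
            pseudo = (image_feats @ text_proxies.t() / temperature).softmax(dim=-1)
            # Vision proxy per class: pseudo-label-weighted mean of image features,
            # so the proxy stays entirely in the vision space.
            vision_proxies = pseudo.t() @ image_feats
            return F.normalize(vision_proxies, dim=-1)

        def zero_shot_predict(query_feats, vision_proxies):
            query_feats = F.normalize(query_feats, dim=-1)
            return (query_feats @ vision_proxies.t()).argmax(dim=-1)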

    Exact and Consistent Interpretation for Piecewise Linear Neural Networks: A Closed Form Solution

    Strong intelligent machines powered by deep neural networks are increasingly deployed as black boxes to make decisions in risk-sensitive domains such as finance and medicine. To reduce potential risk and build trust with users, it is critical to interpret how such machines make their decisions. Existing works interpret a pre-trained neural network by analyzing hidden neurons, mimicking pre-trained models, or approximating local predictions. However, these methods do not provide a guarantee on the exactness and consistency of their interpretation. In this paper, we propose an elegant closed-form solution named OpenBox to compute exact and consistent interpretations for the family of Piecewise Linear Neural Networks (PLNN). The major idea is to first transform a PLNN into a mathematically equivalent set of linear classifiers, then interpret each linear classifier by the features that dominate its prediction. We further apply OpenBox to demonstrate the effectiveness of non-negative and sparse constraints on improving the interpretability of PLNNs. Extensive experiments on both synthetic and real-world data sets clearly demonstrate the exactness and consistency of our interpretation. Comment: KDD 201
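    A minimal sketch of the transformation described above for a ReLU network (one member of the PLNN family): within the linear region containing an input, the network is exactly an affine function, and its effective weights follow by composing the layer weights masked by the active ReLU pattern. These weights define the local linear classifier whose dominant features explain the prediction. Function and variable names are illustrative.

        import numpy as np

        def local_linear_classifier(weights, biases, x):
            # weights: list of (out, in) matrices; biases: list of (out,) vectors.
            # Returns (W_eff, b_eff) such that model(x) == W_eff @ x + b_eff
            # for every input in the same linear region as x.
            W_eff, b_eff, h = np.eye(len(x)), np.zeros(len(x)), x.copy()
            for W, b in zip(weights[:-1], biases[:-1]):
                pre = W @ h + b
                mask = (pre > 0).astype(pre.dtype)      # active ReLU pattern at x
                W_eff = (W * mask[:, None]) @ W_eff     # fold the mask into the weights
                b_eff = mask * (W @ b_eff + b)
                h = np.maximum(pre, 0.0)
            W_last, b_last = weights[-1], biases[-1]
            return W_last @ W_eff, W_last @ b_eff + b_last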

    AugDMC: Data Augmentation Guided Deep Multiple Clustering

    Clustering aims to group similar objects together while separating dissimilar ones. Structures hidden in the data can then be identified to help understand the data in an unsupervised manner. Traditional clustering methods such as k-means provide only a single clustering for one data set. Deep clustering methods, such as auto-encoder based ones, have shown better performance but still provide a single clustering. However, a given dataset might have multiple clustering structures, each representing a unique perspective of the data. Therefore, some multiple clustering methods have been developed to discover multiple independent structures hidden in data. Although deep multiple clustering methods provide better performance, how to efficiently capture the alternative perspectives in data remains a problem. In this paper, we propose AugDMC, a novel data Augmentation guided Deep Multiple Clustering method, to tackle this challenge. Specifically, AugDMC leverages data augmentations to automatically extract features related to a certain aspect of the data using self-supervised prototype-based representation learning, where different aspects of the data can be preserved under different data augmentations. Moreover, a stable optimization strategy is proposed to alleviate the instability arising from different augmentations. Thereafter, multiple clusterings based on different aspects of the data can be obtained. Experimental results on three real-world datasets, compared with state-of-the-art methods, validate the effectiveness of the proposed method.
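    A minimal sketch of the overall pipeline described above: one representation per augmentation family, followed by one clustering per representation. For brevity it uses a frozen ImageNet encoder on augmented views rather than the paper's self-supervised prototype-based training; augmentation choices and names are illustrative.

        import torch
        from torchvision import transforms, models
        from sklearn.cluster import KMeans

        # Each augmentation family is meant to preserve a different aspect of the data.
        aug_families = {
            "color": transforms.Compose([transforms.Resize((224, 224)),
                                         transforms.ColorJitter(0.8, 0.8, 0.8, 0.2),
                                         transforms.ToTensor()]),
            "shape": transforms.Compose([transforms.Resize((224, 224)),
                                         transforms.Grayscale(num_output_channels=3),
                                         transforms.ToTensor()]),
        }

        encoder = models.resnet18(weights="IMAGENET1K_V1")
        encoder.fc = torch.nn.Identity()
        encoder.eval()

        def multiple_clusterings(pil_images, n_clusters=10):
            clusterings = {}
            for name, aug in aug_families.items():
                with torch.no_grad():
                    feats = torch.stack([encoder(aug(img).unsqueeze(0)).squeeze(0)
                                         for img in pil_images])
                clusterings[name] = KMeans(n_clusters=n_clusters,
                                           n_init=10).fit_predict(feats.numpy())
            return clusterings  # one clustering per augmentation family / data aspect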

    Enhancing Peak Network Traffic Prediction via Time-Series Decomposition

    For network administration and maintenance, it is critical to anticipate when networks will receive peak volumes of traffic so that adequate resources can be allocated to service requests made to servers. If sufficient resources are not allocated to servers, they can become prone to failure and security breaches. On the other hand, always allocating the maximum amount of resources would be wasteful. Therefore, anticipating peak volumes in network traffic becomes an important problem. However, popular forecasting models such as Autoregressive Integrated Moving Average (ARIMA) forecast time-series data in general and thus fall short in predicting peak volumes in these time-series. More often than not, a time-series is a combination of different components, which may include but are not limited to 1) trend, the general movement of the traffic volume; 2) seasonality, patterns repeated over certain time periods (e.g., daily and monthly); and 3) noise, random changes in the data. Considering that seasonal fluctuations can be harmful to trend and peak prediction, we propose to extract seasonalities to facilitate peak volume prediction in the time domain. Experiments on both synthetic and real network traffic data demonstrate the effectiveness of the proposed method.
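    A minimal sketch of the decomposition step described above, using the classical additive decomposition from statsmodels rather than the paper's specific procedure; the hourly sampling and daily period of 24 are assumptions.

        import pandas as pd
        from statsmodels.tsa.seasonal import seasonal_decompose

        def decompose_traffic(traffic: pd.Series, period: int = 24):
            # traffic: hourly traffic volumes indexed by timestamp.
            parts = seasonal_decompose(traffic, model="additive", period=period)
            # Removing the repeating daily pattern leaves trend plus noise, on which
            # a peak-oriented forecaster (e.g., ARIMA) can focus; the seasonal
            # component is added back after forecasting.
            deseasonalized = traffic - parts.seasonal
            return parts.trend, parts.seasonal, deseasonalized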

    NPRL: Nightly Profile Representation Learning for Early Sepsis Onset Prediction in ICU Trauma Patients

    Sepsis is a syndrome that develops in response to the presence of infection. It is characterized by severe organ dysfunction and is one of the leading causes of mortality in Intensive Care Units (ICUs) worldwide. These complications can be reduced through early application of antibiotics; hence the ability to anticipate the onset of sepsis early is crucial to the survival and well-being of patients. Current machine learning algorithms deployed inside medical infrastructures have demonstrated poor performance and are insufficient for anticipating sepsis onset early. In recent years, deep learning methodologies have been proposed to predict sepsis, but some fail to capture the time of onset (e.g., classifying patients' entire visits as developing sepsis or not) and others are unrealistic to deploy in medical facilities (e.g., creating training instances using a fixed time to onset, where the time of onset needs to be known a priori). Therefore, in this paper, we first propose a novel yet realistic prediction framework that predicts each morning whether sepsis onset will occur within the next 24 hours, using the most recent data collected at night, when patient-to-provider ratios are higher due to cross-coverage, resulting in limited observation of each patient. However, as the prediction frequency increases to daily, the number of negative instances increases while the number of positive instances remains the same. This creates a severe class-imbalance problem, making it hard for a machine learning model to capture rare sepsis cases. To address this problem, we propose nightly profile representation learning (NPRL) for each patient. We prove theoretically that NPRL can alleviate the rare-event problem. Our empirical study using data from a level-1 trauma center further demonstrates the effectiveness of our proposal.
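    A minimal sketch of the daily prediction setup described above (not of NPRL itself): one instance per patient-morning, built from the preceding night's measurements and labeled positive if sepsis onset occurs within the next 24 hours. Column names, the 6 a.m. morning cutoff, and the 8-hour night window are assumptions for illustration.

        import pandas as pd

        def build_daily_instances(vitals: pd.DataFrame, onsets: dict, horizon_hours=24):
            # vitals: columns [patient_id, timestamp, heart_rate, resp_rate, ...], hourly.
            # onsets: patient_id -> sepsis onset timestamp, or None if never septic.
            rows = []
            for pid, df in vitals.groupby("patient_id"):
                df = df.drop(columns="patient_id").sort_values("timestamp") \
                       .set_index("timestamp")
                mornings = pd.date_range(df.index.min().normalize() + pd.Timedelta("1D"),
                                         df.index.max().normalize(),
                                         freq="D") + pd.Timedelta("6h")
                for morning in mornings:
                    night = df[(df.index >= morning - pd.Timedelta("8h"))
                               & (df.index < morning)]
                    if night.empty:
                        continue
                    onset = onsets.get(pid)
                    label = int(onset is not None and morning <= onset
                                < morning + pd.Timedelta(hours=horizon_hours))
                    feats = night.mean(numeric_only=True).add_prefix("night_mean_")
                    rows.append(pd.concat([pd.Series({"patient_id": pid,
                                                      "morning": morning,
                                                      "label": label}), feats]))
            return pd.DataFrame(rows)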

    Improved Visual Fine-tuning with Natural Language Supervision

    Fine-tuning a visual pre-trained model can leverage the semantic information from large-scale pre-training data and mitigate the over-fitting problem on downstream vision tasks with limited training examples. While the problem of catastrophic forgetting in the pre-trained backbone has been extensively studied for fine-tuning, the potential bias from the corresponding pre-training task and data attracts less attention. In this work, we investigate this problem by demonstrating that the classifier obtained after fine-tuning will be close to that induced by the pre-trained model. To reduce this bias in the classifier effectively, we introduce a reference distribution obtained from a fixed text classifier, which can help regularize the learned vision classifier. The proposed method, Text Supervised fine-tuning (TeS), is evaluated with diverse pre-trained vision models including ResNet and ViT, and text encoders including BERT and CLIP, on 11 downstream tasks. The consistent improvement by a clear margin across distinct scenarios confirms the effectiveness of our proposal. Code is available at \url{https://github.com/idstcv/TeS}. Comment: accepted by ICCV'2
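    A minimal sketch of a regularized fine-tuning objective in the spirit described above: standard cross-entropy plus a term pulling the vision classifier's predictive distribution toward a reference distribution from a frozen text classifier built from class-name embeddings. The KL form, temperature, and weight lam are assumptions, not necessarily the paper's exact loss.

        import torch
        import torch.nn.functional as F

        def text_supervised_loss(vision_logits, image_feats, text_class_embeds,
                                 labels, lam=1.0, tau=0.07):
            # vision_logits: (B, C) from the trainable vision classifier.
            # image_feats: (B, d) image features; text_class_embeds: (C, d), frozen.
            ce = F.cross_entropy(vision_logits, labels)
            with torch.no_grad():
                ref_logits = (F.normalize(image_feats, dim=-1)
                              @ F.normalize(text_class_embeds, dim=-1).t()) / tau
                ref = ref_logits.softmax(dim=-1)       # reference distribution
            reg = F.kl_div(vision_logits.log_softmax(dim=-1), ref,
                           reduction="batchmean")
            return ce + lam * reg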

    Multi-Subset Approach to Early Sepsis Prediction

    Sepsis is a life-threatening organ malfunction caused by the host's inability to fight infection, which can lead to death without proper and immediate treatment. Therefore, early diagnosis and medical treatment of sepsis in critically ill populations at high risk for sepsis and sepsis-associated mortality are vital to providing patients with rapid therapy. Studies show that advancing sepsis detection by 6 hours leads to earlier administration of antibiotics, which is associated with improved mortality. However, clinical scores like the Sequential Organ Failure Assessment (SOFA) are not applicable for early prediction, while machine learning algorithms can help capture the progression pattern for early prediction. Therefore, we aim to develop a machine learning algorithm that predicts sepsis onset 6 hours before it is suspected clinically. Although some machine learning algorithms have been applied to sepsis prediction, many of them do not account for the fact that six hours is not a small gap. To overcome this big-gap challenge, we explore a multi-subset approach in which the likelihood of sepsis occurring earlier than 6 hours is output by a previous subset and fed to the target subset as an additional feature. Moreover, we use hourly sampled data, such as vital signs, within an observation window to derive temporal change trends as further assistance, which is often ignored by previous studies. Our empirical study shows that both the multi-subset approach to alleviating the 6-hour gap and the added temporal trend features help improve the performance of sepsis-related early prediction.
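    A minimal sketch of the cascaded idea and the trend features described above: a model trained on an earlier-horizon subset produces a sepsis likelihood that is appended as a feature for the 6-hour-ahead model, alongside simple per-vital slopes over the observation window. The model choice and names are illustrative, and the sketch assumes both subsets share the same base feature layout.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def trend_features(hourly: np.ndarray) -> np.ndarray:
            # hourly: (n_patients, n_hours, n_vitals) observation-window measurements.
            # Returns the linear slope of each vital as a (n_patients, n_vitals) array.
            n_patients, n_hours, n_vitals = hourly.shape
            y = hourly.transpose(1, 0, 2).reshape(n_hours, -1)   # one series per column
            slopes = np.polyfit(np.arange(n_hours), y, deg=1)[0]
            return slopes.reshape(n_patients, n_vitals)

        def cascaded_models(X_early, y_early, X_target, y_target):
            # Earlier-horizon subset -> likelihood feature for the 6-hour-ahead subset.
            early_model = LogisticRegression(max_iter=1000).fit(X_early, y_early)
            early_prob = early_model.predict_proba(X_target)[:, 1:]
            target_model = LogisticRegression(max_iter=1000).fit(
                np.hstack([X_target, early_prob]), y_target)
            return early_model, target_model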