3 research outputs found

    Improving interpretability and regularization in deep learning

    Deep learning approaches yield state-of-the-art performance in a range of tasks, including automatic speech recognition. However, the highly distributed representation in a deep neural network (DNN) or its variants is difficult to analyze, which makes parameter interpretation and regularization challenging. This paper presents a regularization scheme that acts on the activation function outputs to improve network interpretability and regularization. The proposed approach, referred to as activation regularization, encourages activation function outputs to satisfy a target pattern. By defining appropriate target patterns, different learning concepts can be imposed on the network. This method can aid network interpretability and also has the potential to reduce overfitting. The scheme is evaluated on several continuous speech recognition tasks: the Wall Street Journal continuous speech recognition task, eight conversational telephone speech tasks from the IARPA Babel program, and a U.S. English broadcast news task. On all the tasks, activation regularization achieved consistent performance gains over the standard DNN baselines.
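As a rough illustration of the activation regularization idea in the abstract above, the sketch below adds a penalty that pulls a layer's activation outputs towards a target pattern. The squared-error form of the penalty, the function names, the example target, and the weight `lam` are all assumptions made for illustration; the abstract does not state the paper's actual penalty or target patterns.

```python
import math


def sigmoid(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def activation_penalty(activations, target):
    """Mean squared distance between activation outputs and a target pattern.

    The squared-error form is an assumption for this sketch, not the
    paper's stated penalty.
    """
    if len(activations) != len(target):
        raise ValueError("activations and target must have the same length")
    squared = [(a - t) ** 2 for a, t in zip(activations, target)]
    return sum(squared) / len(squared)


def regularized_loss(task_loss, activations, target, lam=0.1):
    """Task loss plus a weighted activation penalty (lam is hypothetical)."""
    return task_loss + lam * activation_penalty(activations, target)


# Hypothetical hidden layer with four units and a hand-picked target pattern.
pre_activations = [2.0, -1.0, 0.5, -3.0]
hidden = [sigmoid(z) for z in pre_activations]
target_pattern = [1.0, 0.0, 1.0, 0.0]

loss = regularized_loss(task_loss=0.7, activations=hidden, target=target_pattern)
```

In this sketch the penalty is zero only when the activations match the target exactly, so a different choice of `target_pattern` would steer the layer towards a different behaviour.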

    Object Detection and Classification based on Hierarchical Semantic Features and Deep Neural Networks

    Feature learning, semantic understanding, cognitive reasoning, and model generalization are persistent goals of current deep learning-based computer vision. A variety of network structures and algorithms have been proposed to learn effective features, extract contextual and semantic information, deduce the relationships between objects and scenes, and achieve robust, generalizable models. Nevertheless, these challenges are still not well addressed. One issue is inefficient feature learning and propagation together with static, single-dimension semantic memorization, which makes it difficult to handle challenging situations such as small objects, occlusion, and illumination. The other issue is robustness and generalization, especially when the data source has a diverse feature distribution. This study explores classification and detection models based on hierarchical semantic features ("transverse semantic" and "longitudinal semantic"), network architectures, and regularization algorithms, with the aim of mitigating or solving these issues. (1) A detector model is proposed to make full use of the "transverse semantic", the semantic information in the spatial scene, emphasizing the effectiveness of deep features produced in high-level layers for better detection of small and occluded objects. (2) We also explore the anchor-based detector algorithm and propose location-aware reasoning (LAAR), in which both the location and classification confidences are considered in the bounding-box quality criterion, so that the best-qualified boxes are selected in Non-Maximum Suppression (NMS). (3) A semantic clustering-based deduction learning method is proposed that explores the "longitudinal semantic", performing high-level clustering in the semantic space and enabling the model to deduce the relations among classes, with the aim of better classification performance.
(4) We propose near-orthogonality regularization, which introduces an implicit self-regularization that pushes the mean and variance of the filter angles in a network towards 90° and 0°, respectively, and show that it helps stabilize training, speed up convergence, and improve robustness. (5) Motivated by research showing that self-attention networks possess a strong inductive bias that leads to a loss of feature expression power, a transformer architecture with a mitigatory attention mechanism is proposed and applied to state-of-the-art detectors, verifying that it enhances detection.
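To make item (4) more concrete, the following sketch measures the pairwise angles between flattened filters and penalizes a mean angle away from 90° and a nonzero variance. It is only one possible reading of the abstract: the explicit penalty `(mean - 90)^2 + variance`, the function names, and the toy filters are assumptions, and the abstract describes the authors' regularization as implicit rather than giving a formula.

```python
import math
from itertools import combinations


def angle_deg(u, v):
    """Angle in degrees between two nonzero filter vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos))


def filter_angle_stats(filters):
    """Mean and population variance of all pairwise filter angles."""
    if len(filters) < 2:
        raise ValueError("need at least two filters")
    angles = [angle_deg(u, v) for u, v in combinations(filters, 2)]
    mean = sum(angles) / len(angles)
    var = sum((a - mean) ** 2 for a in angles) / len(angles)
    return mean, var


def near_orthogonality_penalty(filters):
    """Hypothetical penalty: zero when every pair of filters is at 90 degrees."""
    mean, var = filter_angle_stats(filters)
    return (mean - 90.0) ** 2 + var


# Toy example: three flattened filters that are mutually orthogonal.
orthogonal_filters = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
penalty = near_orthogonality_penalty(orthogonal_filters)
```

For mutually orthogonal filters the penalty is (up to rounding) zero, while filters pointing in similar directions produce a large value, which is the behaviour the abstract's description suggests.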