12,266 research outputs found

    Iteratively Optimized Patch Label Inference Network for Automatic Pavement Disease Detection

    We present a novel deep learning framework named the Iteratively Optimized Patch Label Inference Network (IOPLIN) for automatically detecting various pavement diseases, not limited to specific types such as cracks and potholes. IOPLIN can be iteratively trained with only the image label via the Expectation-Maximization Inspired Patch Label Distillation (EMIPLD) strategy, and accomplishes this task well by inferring the labels of patches from the pavement images. IOPLIN has several desirable properties compared with state-of-the-art single-branch CNN models such as GoogLeNet and EfficientNet. It can handle images of different resolutions and makes fuller use of image information, particularly in high-resolution images, since IOPLIN extracts visual features from unrevised image patches instead of the resized entire image. Moreover, it can roughly localize the pavement distress without using any prior localization information in the training phase. To better evaluate the effectiveness of our method in practice, we construct a large-scale Bituminous Pavement Disease Detection dataset named CQU-BPDD, consisting of 60,059 high-resolution pavement images acquired from different areas at different times. Extensive results on this dataset demonstrate the superiority of IOPLIN over state-of-the-art image classification approaches in automatic pavement disease detection. The source code of IOPLIN is released at https://github.com/DearCaat/ioplin. Comment: Revision on IEEE Trans on IT
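    As a rough illustration of the patch-based idea described in this abstract (not the authors' IOPLIN implementation, which lives in the linked repository), the sketch below cuts an image into fixed-size patches without resizing it and takes the highest patch score as the image-level score. The names `split_into_patches`, `image_score`, and `mean_darkness`, the patch size, and the max-aggregation rule are all assumptions made for illustration; the toy scorer stands in for a learned patch classifier.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel intensities in [0, 1]


def split_into_patches(image: Image, size: int) -> List[Image]:
    """Cut the image into non-overlapping size x size patches (no resizing).

    Border regions that do not fill a whole patch are dropped in this sketch.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches


def image_score(image: Image, size: int, patch_scorer: Callable[[Image], float]) -> float:
    """Assumed aggregation rule: the image scores as high as its worst patch."""
    return max(patch_scorer(p) for p in split_into_patches(image, size))


def mean_darkness(patch: Image) -> float:
    """Toy stand-in for a learned patch classifier: mean of (1 - intensity)."""
    pixels = [v for row in patch for v in row]
    return sum(1.0 - v for v in pixels) / len(pixels)


# A 4x4 bright image with one dark 2x2 region in the lower-right corner.
image = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
patches = split_into_patches(image, 2)
scores = [mean_darkness(p) for p in patches]
overall = image_score(image, 2, mean_darkness)
```

    In this toy case only the last patch scores high, so the per-patch scores also give a coarse location for the dark region, which mirrors the rough localization the abstract mentions.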

    Receptor uptake arrays for vitamin B12, siderophores and glycans shape bacterial communities

    Molecular variants of vitamin B12, siderophores and glycans occur. To take up variant forms, bacteria may express an array of receptors. The gut microbe Bacteroides thetaiotaomicron has three different receptors to take up variants of vitamin B12 and 88 receptors to take up various glycans. The design of receptor arrays reflects key processes that shape cellular evolution. Competition may focus each species on a subset of the available nutrient diversity. Some gut bacteria can take up only a narrow range of carbohydrates, whereas species such as B. thetaiotaomicron can digest many different complex glycans. Comparisons of different nutrients, habitats, and genomes provide an opportunity to test hypotheses about the breadth of receptor arrays. Another important process concerns fluctuations in nutrient availability. Such fluctuations enhance the value of cellular sensors, which gain information about environmental availability and adjust receptor deployment. Bacteria often adjust receptor expression in response to fluctuations of particular carbohydrate food sources. Some species may adjust expression of uptake receptors for specific siderophores. How do cells use sensor information to control the response to fluctuations? That question about regulatory wiring relates to problems that arise in control theory and artificial intelligence. Control theory clarifies how to analyze environmental fluctuations in relation to the design of sensors and response systems. Recent advances in deep learning studies of artificial intelligence focus on the architecture of regulatory wiring and the ways in which complex control networks represent and classify environmental states. I emphasize the similar design problems that arise in cellular evolution, control theory, and artificial intelligence. I connect those broad concepts to testable hypotheses for bacterial uptake of B12, siderophores and glycans. Comment: Added many new references, edited throughout

    Discriminative feature learning for multimodal classification

    The purpose of this thesis is to tackle two related topics: multimodal classification and objective functions that improve the discriminative power of features. First, I worked on image and text classification tasks and performed many experiments to show the effectiveness of different approaches available in the literature. Then, I introduced a novel methodology that classifies multimodal documents with single-modal classifiers by merging textual and visual information into images, and a novel loss function that improves separability between samples of a dataset. Results show that exploiting multimodal data improves performance on classification tasks compared with traditional single-modality methods. Moreover, the introduced GIT loss function is able to enhance the discriminative power of features, lowering intra-class distance and raising inter-class distance between samples of a multiclass dataset.
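    To make the two quantities named at the end of this abstract concrete, the sketch below measures intra-class and inter-class distance on a toy feature set. It is not the GIT loss itself, whose formula the abstract does not give; the function names `class_centers` and `intra_inter_distance`, the use of class means as centers, and the Euclidean metric are assumptions for illustration only.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def class_centers(features: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> Dict[int, List[float]]:
    """Mean feature vector of each class."""
    grouped = defaultdict(list)
    for feature, label in zip(features, labels):
        grouped[label].append(feature)
    return {
        label: [sum(column) / len(members) for column in zip(*members)]
        for label, members in grouped.items()
    }


def intra_inter_distance(features: Sequence[Sequence[float]],
                         labels: Sequence[int]) -> Tuple[float, float]:
    """Return (mean sample-to-own-center distance, mean center-to-center distance)."""
    centers = class_centers(features, labels)
    intra = sum(
        math.dist(feature, centers[label]) for feature, label in zip(features, labels)
    ) / len(features)
    keys = sorted(centers)
    pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
    inter = sum(math.dist(centers[a], centers[b]) for a, b in pairs) / len(pairs)
    return intra, inter


# Two classes of 2-D features; each class has two samples.
features = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
labels = [0, 0, 1, 1]
intra, inter = intra_inter_distance(features, labels)
```

    A loss with the stated goal would push `intra` down and `inter` up during training; here the values simply describe how separable the toy features already are.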

    Tracking Gaze and Visual Focus of Attention of People Involved in Social Interaction

    The visual focus of attention (VFOA) has been recognized as a prominent conversational cue. We are interested in estimating and tracking the VFOAs associated with multi-party social interactions. We note that in this type of situation the participants either look at each other or at an object of interest; therefore their eyes are not always visible. Consequently, both gaze and VFOA estimation cannot be based on eye detection and tracking. We propose a method that exploits the correlation between eye gaze and head movements. Both VFOA and gaze are modeled as latent variables in a Bayesian switching state-space model. The proposed formulation leads to a tractable learning procedure and to an efficient algorithm that simultaneously tracks gaze and visual focus. The method is tested and benchmarked using two publicly available datasets that contain typical multi-party human-robot and human-human interactions. Comment: 15 pages, 8 figures, 6 tables