Iteratively Optimized Patch Label Inference Network for Automatic Pavement Disease Detection
We present a novel deep learning framework named the Iteratively Optimized
Patch Label Inference Network (IOPLIN) for automatically detecting various
pavement diseases that are not solely limited to specific ones, such as cracks
and potholes. IOPLIN can be iteratively trained with only the image label via
the Expectation-Maximization Inspired Patch Label Distillation (EMIPLD)
strategy, and accomplish this task well by inferring the labels of patches from
the pavement images. IOPLIN enjoys many desirable properties over the
state-of-the-art single branch CNN models such as GoogLeNet and EfficientNet.
It is able to handle images in different resolutions, and sufficiently utilize
image information particularly for the high-resolution ones, since IOPLIN
extracts the visual features from unrevised image patches instead of the
resized entire image. Moreover, it can roughly localize the pavement distress
without using any prior localization information in the training phase. In
order to better evaluate the effectiveness of our method in practice, we
construct a large-scale Bituminous Pavement Disease Detection dataset named
CQU-BPDD consisting of 60,059 high-resolution pavement images, which are
acquired from different areas at different times. Extensive results on this
dataset demonstrate the superiority of IOPLIN over the state-of-the-art image
classification approaches in automatic pavement disease detection. The source
code of IOPLIN is released at https://github.com/DearCaat/ioplin.
Comment: Revision on IEEE Trans on IT
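The patch-based idea in the abstract above (score unresized patches, then derive an image-level decision and a rough localization from them) can be illustrated with a minimal sketch. This is not the IOPLIN implementation: the function names, the max-over-patches aggregation rule, the 0.5 threshold, and the dropping of partial border tiles are all assumptions made for illustration, and the patch scores would in practice come from a trained classifier.

```python
from typing import List, Sequence

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def split_into_patches(image: Image, patch: int) -> List[Image]:
    """Cut an image into non-overlapping patch x patch tiles without resizing.

    Assumption: rows and columns that do not fill a whole tile are dropped.
    Tiles are returned in row-major order.
    """
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            tiles.append([row[left:left + patch]
                          for row in image[top:top + patch]])
    return tiles


def image_score(patch_scores: Sequence[float]) -> float:
    """Hypothetical aggregation: the image is as diseased as its worst patch."""
    return max(patch_scores)


def localize(patch_scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return indices of patches whose score reaches the (assumed) threshold."""
    return [i for i, score in enumerate(patch_scores) if score >= threshold]
```

Because each tile keeps its original pixels, a high-resolution image simply yields more tiles rather than being downsampled, and the indices returned by `localize` map back to tile positions in row-major order.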
Receptor uptake arrays for vitamin B12, siderophores and glycans shape bacterial communities
Molecular variants of vitamin B12, siderophores and glycans occur. To take up
variant forms, bacteria may express an array of receptors. The gut microbe
Bacteroides thetaiotaomicron has three different receptors to take up variants
of vitamin B12 and 88 receptors to take up various glycans. The design of
receptor arrays reflects key processes that shape cellular evolution.
Competition may focus each species on a subset of the available nutrient
diversity. Some gut bacteria can take up only a narrow range of carbohydrates,
whereas species such as B. thetaiotaomicron can digest many different complex
glycans. Comparison of different nutrients, habitats, and genomes provides an
opportunity to test hypotheses about the breadth of receptor arrays. Another
important process concerns fluctuations in nutrient availability. Such
fluctuations enhance the value of cellular sensors, which gain information
about environmental availability and adjust receptor deployment. Bacteria often
adjust receptor expression in response to fluctuations of particular
carbohydrate food sources. Some species may adjust expression of uptake
receptors for specific siderophores. How do cells use sensor information to
control the response to fluctuations? That question about regulatory wiring
relates to problems that arise in control theory and artificial intelligence.
Control theory clarifies how to analyze environmental fluctuations in relation
to the design of sensors and response systems. Recent advances in deep learning
studies of artificial intelligence focus on the architecture of regulatory
wiring and the ways in which complex control networks represent and classify
environmental states. I emphasize the similar design problems that arise in
cellular evolution, control theory, and artificial intelligence. I connect
those broad concepts to testable hypotheses for bacterial uptake of B12,
siderophores and glycans.
Comment: Added many new references, edited throughout
Discriminative feature learning for multimodal classification
The purpose of this thesis is to tackle two related topics: multimodal classification and objective functions that improve the discriminative power of features.
First, I worked on image and text classification tasks and performed many experiments to show the effectiveness of different approaches available in the literature.
Then, I introduced a novel methodology that classifies multimodal documents with single-modal classifiers by merging textual and visual information into images, together with a novel loss function that improves separability between the samples of a dataset.
Results show that exploiting multimodal data improves performance on classification tasks compared with traditional single-modality methods.
Moreover, the introduced GIT loss function is able to enhance the discriminative power of features, lowering intra-class distance and raising inter-class distance between samples of a multiclass dataset.
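The two quantities named in the abstract above, intra-class and inter-class distance, can be made concrete with a small sketch. This is not the GIT loss itself, whose formula is not given here; it only measures the two distances such a loss aims to lower and raise. The centroid-based definitions and all function names are assumptions chosen for illustration.

```python
import math
from collections import defaultdict
from itertools import combinations


def class_centroids(features, labels):
    """Return the mean feature vector of each class."""
    groups = defaultdict(list)
    for vector, label in zip(features, labels):
        groups[label].append(vector)
    return {label: [sum(column) / len(vectors) for column in zip(*vectors)]
            for label, vectors in groups.items()}


def intra_class_distance(features, labels):
    """Mean Euclidean distance from each sample to its own class centroid."""
    centroids = class_centroids(features, labels)
    total = sum(math.dist(vector, centroids[label])
                for vector, label in zip(features, labels))
    return total / len(features)


def inter_class_distance(features, labels):
    """Mean Euclidean distance between every pair of class centroids.

    Requires at least two distinct classes.
    """
    centroids = class_centroids(features, labels)
    pairs = list(combinations(centroids.values(), 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)
```

Under these definitions, features are more discriminative when `intra_class_distance` is small relative to `inter_class_distance`.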
Tracking Gaze and Visual Focus of Attention of People Involved in Social Interaction
The visual focus of attention (VFOA) has been recognized as a prominent
conversational cue. We are interested in estimating and tracking the VFOAs
associated with multi-party social interactions. We note that in this type of
situation the participants look either at each other or at an object of
interest; therefore their eyes are not always visible. Consequently, neither gaze
nor VFOA estimation can be based on eye detection and tracking. We propose a
method that exploits the correlation between eye gaze and head movements. Both
VFOA and gaze are modeled as latent variables in a Bayesian switching
state-space model. The proposed formulation leads to a tractable learning
procedure and to an efficient algorithm that simultaneously tracks gaze and
visual focus. The method is tested and benchmarked using two publicly available
datasets that contain typical multi-party human-robot and human-human
interactions.
Comment: 15 pages, 8 figures, 6 tables
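Tracking a discrete latent variable such as the VFOA over time rests on recursive Bayesian filtering. The sketch below shows one predict-and-update step of a plain discrete Bayes filter. It is a deliberately simplified stand-in, not the switching state-space model of the paper: it omits the continuous gaze variable, and the function name, the transition matrix, and the likelihood values are hypothetical.

```python
def filter_step(belief, transition, likelihood):
    """One step of a discrete Bayes filter over candidate attention targets.

    belief[i]        : prior probability that the target is i
    transition[i][j] : assumed probability of switching from target i to j
    likelihood[j]    : assumed probability of the current observation
                       (e.g. a head pose) given target j
    Returns the normalized posterior over targets.
    """
    n = len(belief)
    # Predict: propagate the prior belief through the transition model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Update: weight each target by how well it explains the observation.
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    return [value / total for value in unnormalized]
```

Calling `filter_step` once per frame, feeding each posterior back in as the next prior, yields a running estimate of which target is being attended to.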