A lightweight network for photovoltaic cell defect detection in electroluminescence images based on neural architecture search and knowledge distillation
The rapid development of photovoltaic (PV) power stations requires
increasingly reliable maintenance and fault diagnosis of PV modules in the
field. Owing to their effectiveness, convolutional neural networks (CNNs) have
been widely used for automatic defect detection in PV cells. However, these
CNN-based models have very large parameter counts, which demand substantial
hardware resources and make them difficult to apply in actual
industrial projects. To solve these problems, we propose a novel lightweight
high-performance model for automatic defect detection of PV cells in
electroluminescence (EL) images based on neural architecture search and
knowledge distillation. To auto-design an effective lightweight model, we
introduce neural architecture search to the field of PV cell defect
classification for the first time. Since defects can be of any size, we design
a suitable network search structure to better exploit multi-scale
characteristics. To improve the overall performance of the searched lightweight
model, we further transfer the knowledge learned by the existing pre-trained
large-scale model based on knowledge distillation. Different kinds of knowledge
are exploited and transferred, including attention information, feature
information, logit information and task-oriented information. Experiments have
demonstrated that the proposed model achieves state-of-the-art performance
on the public PV cell dataset of EL images under online data augmentation, with
an accuracy of 91.74% and 1.85M parameters. The proposed lightweight
high-performance model can be easily deployed to end devices in actual
industrial projects while retaining its accuracy.
Comment: 12 pages, 7 figures
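The abstract above lists logit information among the kinds of knowledge transferred from the large pre-trained model. As a rough illustration only, the following is a minimal sketch of the standard temperature-scaled logit distillation term (KL divergence between softened teacher and student distributions). It is not taken from the paper: the function names, the default temperature of 4.0, and the T^2 scaling are assumptions, and the authors' actual loss may differ.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, stabilised by subtracting the max."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper for illustration; the temperature value is an
    assumption, not a setting reported in the abstract.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

In practice this term would typically be added, with some weight, to the ordinary classification loss on the ground-truth labels.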
Object-Oriented Dynamics Learning through Multi-Level Abstraction
Object-based approaches for learning action-conditioned dynamics have
demonstrated promise for generalization and interpretability. However, existing
approaches suffer from structural limitations and optimization difficulties for
common environments with multiple dynamic objects. In this paper, we present a
novel self-supervised learning framework, called Multi-level Abstraction
Object-oriented Predictor (MAOP), which employs a three-level learning
architecture that enables efficient object-based dynamics learning from raw
visual observations. We also design a spatial-temporal relational reasoning
mechanism for MAOP to support instance-level dynamics learning and handle
partial observability. Our results show that MAOP significantly outperforms
previous methods in terms of sample efficiency and generalization over novel
environments for learning environment models. We also demonstrate that learned
dynamics models enable efficient planning in unseen environments, comparable to
true environment models. In addition, MAOP learns semantically and visually
interpretable disentangled representations.
Comment: Accepted to the Thirty-Fourth AAAI Conference on Artificial
Intelligence (AAAI), 202
QuadraNet: Improving High-Order Neural Interaction Efficiency with Hardware-Aware Quadratic Neural Networks
Recent progress in computer vision-oriented neural network designs is mostly
driven by capturing high-order neural interactions among inputs and features.
A variety of approaches have emerged to accomplish this, such as
Transformers and their variants. However, these interactions generate a large
amount of intermediate state and/or strong data dependency, leading to
considerable memory consumption and computing cost, and therefore compromising
the overall runtime performance. To address this challenge, we rethink the
high-order interactive neural network design with a quadratic computing
approach. Specifically, we propose QuadraNet -- a comprehensive model design
methodology from neuron reconstruction to structural block and eventually to
the overall neural network implementation. Leveraging quadratic neurons'
intrinsic high-order advantages and dedicated computation optimization schemes,
QuadraNet can effectively achieve optimal cognition and computation
performance. Incorporating state-of-the-art hardware-aware neural architecture
search and system integration techniques, QuadraNet also generalizes well
across different hardware constraint settings and deployment scenarios.
Experiments show that QuadraNet achieves up to 1.5× throughput, 30%
less memory footprint, and similar cognition performance compared with
state-of-the-art high-order approaches.
Comment: ASP-DAC 2024 Best Paper Nomination
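The abstract above relies on quadratic neurons to capture high-order interactions but does not state their formula. As a hedged illustration, the sketch below uses one common factorised form from the quadratic-network literature, y = (w_a · x)(w_b · x) + w_c · x + b. The function and parameter names are hypothetical, and this form is an assumption rather than QuadraNet's confirmed definition.

```python
def dot(weights, inputs):
    """Plain dot product of two equal-length sequences."""
    return sum(w * x for w, x in zip(weights, inputs))

def quadratic_neuron(x, w_a, w_b, w_c, bias=0.0):
    """Second-order neuron: product of two linear maps plus a linear term.

    The product dot(w_a, x) * dot(w_b, x) introduces pairwise (x_i * x_j)
    interactions without materialising a full quadratic weight matrix.
    Assumed form for illustration only.
    """
    return dot(w_a, x) * dot(w_b, x) + dot(w_c, x) + bias
```

For example, with x = [1, 2], w_a = [1, 0], w_b = [0, 1], w_c = [1, 1] and bias 0.5, the quadratic term contributes 1 * 2 and the linear term contributes 3. Setting w_a to zeros reduces the unit to an ordinary linear neuron.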
An End-to-End Trainable Neural Network Model with Belief Tracking for Task-Oriented Dialog
We present a novel end-to-end trainable neural network model for
task-oriented dialog systems. The model is able to track dialog state, issue
API calls to a knowledge base (KB), and incorporate structured KB query results
into system responses to successfully complete task-oriented dialogs. The
proposed model produces well-structured system responses by jointly learning
belief tracking and KB result processing conditioned on the dialog history. We
evaluate the model in a restaurant search domain using a dataset that is
converted from the second Dialog State Tracking Challenge (DSTC2) corpus.
Experimental results show that the proposed model can robustly track dialog state
given the dialog history. Moreover, our model demonstrates promising results in
producing appropriate system responses, outperforming prior end-to-end
trainable neural network models on per-response accuracy evaluation metrics.
Comment: Published at Interspeech 201
Graph-guided Architecture Search for Real-time Semantic Segmentation
Designing a lightweight semantic segmentation network often requires
researchers to find a trade-off between performance and speed, which is always
empirical due to the limited interpretability of neural networks. To
free researchers from these tedious mechanical trials, we propose a
Graph-guided Architecture Search (GAS) pipeline to automatically search
real-time semantic segmentation networks. Unlike previous works that use a
simplified search space and stack a repeatable cell to form a network, we
introduce a novel search mechanism with a new search space where a lightweight
model can be effectively explored through cell-level diversity and a
latency-oriented constraint. Specifically, to produce cell-level diversity,
the cell-sharing constraint is eliminated in a cell-independent manner.
Then a graph convolution network (GCN) is seamlessly integrated as a
communication mechanism between cells. Finally, a latency-oriented constraint
is imposed on the search process to balance speed and performance.
Extensive experiments on the Cityscapes and CamVid datasets demonstrate that GAS
achieves a new state-of-the-art trade-off between accuracy and speed. In
particular, on the Cityscapes dataset, GAS achieves a new best performance of
73.5% mIoU at a speed of 108.4 FPS on a Titan Xp.
Comment: CVPR202
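The abstract above mentions a latency-oriented constraint in the search process without giving its form. One way such a constraint is often expressed in differentiable architecture search is as an expected-latency penalty added to the task loss. The sketch below illustrates that general idea only. All names, the per-operation latency table, and the penalty weight are assumptions, and GAS's actual formulation may differ.

```python
import math

def softmax(scores):
    """Softmax over architecture scores, stabilised by subtracting the max."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def expected_latency(arch_scores, op_latencies_ms):
    """Latency of one searchable edge, averaged over candidate operations.

    arch_scores: one score per candidate operation (hypothetical values).
    op_latencies_ms: measured latency of each candidate (assumed lookup table).
    """
    probs = softmax(arch_scores)
    return sum(p * lat for p, lat in zip(probs, op_latencies_ms))

def latency_aware_loss(task_loss, arch_scores, op_latencies_ms, weight=0.01):
    """Task loss plus a weighted latency penalty; weight is an assumed knob."""
    return task_loss + weight * expected_latency(arch_scores, op_latencies_ms)
```

A larger weight pushes the search toward faster operations at the cost of accuracy, which is the trade-off the abstract describes.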