CNN based Learning using Reflection and Retinex Models for Intrinsic Image Decomposition
Most traditional work on intrinsic image decomposition relies on
deriving priors about scene characteristics. Recent research, on the other
hand, uses deep learning models as in-and-out black boxes and does not consider
the well-established, traditional image formation process as the basis of the
intrinsic learning process. As a consequence, although current deep learning
approaches show superior performance when considering quantitative benchmark
results, traditional approaches are still dominant in achieving high
qualitative results. In this paper, the aim is to exploit the best of the two
worlds. A method is proposed that (1) is empowered by deep learning
capabilities, (2) considers a physics-based reflection model to steer the
learning process, and (3) exploits the traditional approach to obtain intrinsic
images by exploiting reflectance and shading gradient information. The proposed
model is fast to compute and allows for the integration of all intrinsic
components. To train the new model, object-centered large-scale datasets
with intrinsic ground-truth images are created. The evaluation results
demonstrate that the new model outperforms existing methods. Visual inspection
shows that the image formation loss function augments color reproduction and
the use of gradient information produces sharper edges. Datasets, models and
higher resolution images are available at https://ivi.fnwi.uva.nl/cv/retinet.
Comment: CVPR 201
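The physics-based reflection model referred to above is the classic intrinsic formulation I = R × S (image = reflectance × shading). A minimal sketch of an image formation loss built on that model, using hypothetical names rather than the authors' released code:

```python
import numpy as np

def image_formation_loss(image, reflectance, shading):
    # Reconstruct the image as the pixel-wise product of reflectance
    # (albedo) and shading, per the intrinsic model I = R * S, and
    # penalize the mean squared reconstruction error.
    reconstruction = reflectance * shading
    return float(np.mean((image - reconstruction) ** 2))

# Toy check: a perfect decomposition reconstructs the image exactly.
R = np.full((4, 4, 3), 0.5)                     # flat albedo
S = np.linspace(0.2, 1.0, 16).reshape(4, 4, 1)  # smooth shading ramp
I = R * S
print(image_formation_loss(I, R, S))  # 0.0
```

In a learning setting this term would be added to the supervised reflectance/shading losses so the network's outputs stay consistent with the observed image.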
WEBCAM-BASED LASER DOT DETECTION TECHNIQUE IN COMPUTER REMOTE CONTROL
ABSTRACT: In this paper, the authors propose a method to detect the laser dot in an interactive system using laser pointers. The method is designed for presenters who need to interact with the computer during a presentation by using the laser pointer. The detection technique uses a camera to capture the presentation screen and processes every frame transferred to the computer. This paper focuses on the detection and tracking of laser dots, based on their characteristics, to distinguish a laser dot from other areas in the captured frames. Experimental results showed that the proposed method could reduce the rate of misdetections caused by light noise by a factor of 10 and achieve an average detection accuracy of 82% in normal presentation environments. The results point out that the better way to describe the laser dots' features, based on visual concepts, is to use the HSI color space instead of the normal RGB space.
Keywords: laser pointer; laser dot/spot; laser pointer interaction; control; mouse; computer screen/display
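The HSI-versus-RGB conclusion can be illustrated with a per-pixel heuristic. The sketch below uses Python's standard-library HSV conversion as a stand-in for HSI, with illustrative thresholds (not the published ones): the bright core of a projected laser dot has very high intensity but low saturation, which separates it from ordinary saturated red scene content.

```python
import colorsys

def is_laser_pixel(r, g, b, i_min=0.85, s_max=0.35):
    # Convert an 8-bit RGB pixel to HSV; a laser-dot core saturates the
    # sensor, so it shows up as near-maximum value (intensity) with low
    # saturation, unlike an equally bright but saturated red object.
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return v >= i_min and s <= s_max

# Washed-out core of a red laser dot: accepted.
print(is_laser_pixel(255, 230, 225))  # True
# Saturated red object at similar brightness: rejected.
print(is_laser_pixel(230, 30, 25))    # False
```

A real detector would apply such a test per frame and then track the surviving blobs over time, as the paper describes.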
Three for one and one for three: Flow, Segmentation, and Surface Normals
Optical flow, semantic segmentation, and surface normals represent different
information modalities, yet together they bring better cues for scene
understanding problems. In this paper, we study the influence between the three
modalities: how one impacts on the others and their efficiency in combination.
We employ a modular approach using a convolutional refinement network that is
trained in a supervised manner but isolated from RGB images to enforce joint modality
features. To assist the training process, we create a large-scale synthetic
outdoor dataset that supports dense annotation of semantic segmentation,
optical flow, and surface normals. The experimental results show positive
influence among the three modalities, especially for objects' boundaries,
region consistency, and scene structures.
Comment: BMVC 201
EDEN: Multimodal Synthetic Dataset of Enclosed GarDEN Scenes
Multimodal large-scale datasets for outdoor scenes are mostly designed for
urban driving problems. The scenes are highly structured and semantically
different from scenarios seen in nature-centered scenes such as gardens or
parks. To promote machine learning methods for nature-oriented applications,
such as agriculture and gardening, we propose the multimodal synthetic dataset
for Enclosed garDEN scenes (EDEN). The dataset features more than 300K images
captured from more than 100 garden models. Each image is annotated with various
low/high-level vision modalities, including semantic segmentation, depth,
surface normals, intrinsic colors, and optical flow. Experimental results on
the state-of-the-art methods for semantic segmentation and monocular depth
prediction, two important tasks in computer vision, show the positive impact of
pre-training deep networks on our dataset for unstructured natural scenes. The
dataset and related materials will be available at
https://lhoangan.github.io/eden.
Comment: Accepted for publication at WACV 202
Benchmarking Jetson Edge Devices with an End-to-end Video-based Anomaly Detection System
Innovative enhancement in embedded system platforms, specifically hardware
accelerations, significantly influence the application of deep learning in
real-world scenarios. These innovations translate human labor efforts into
automated intelligent systems employed in various areas such as autonomous
driving, robotics, Internet-of-Things (IoT), and numerous other impactful
applications. NVIDIA's Jetson platform is one of the pioneers in offering
optimal performance regarding energy efficiency and throughput in the execution
of deep learning algorithms. Previously, most benchmarking analysis was based
on 2D images with a single deep learning model for each comparison result. In
this paper, we implement an end-to-end video-based crime-scene anomaly
detection system that takes surveillance videos as input and is deployed to
operate entirely on multiple Jetson edge devices (Nano, AGX Xavier, Orin
Nano). The comparison analysis includes the integration of Torch-TensorRT, a
software development kit from NVIDIA, for model performance optimisation. The
system is built on the PySlowFast open-source project from Facebook as the
coding template. The end-to-end pipeline comprises the camera video input,
data preprocessing, feature extraction, and anomaly detection. We share our
experience of deploying an AI-based system on various Jetson edge devices
with Docker technology. As the anomaly detector, a weakly supervised
video-based deep learning model called Robust Temporal Feature Magnitude
Learning (RTFM) is applied in the system. The system reaches an inference
speed of 47.56 frames per second (FPS) on a Jetson edge device with a total
of only 3.11 GB RAM usage. We also find that on the most promising Jetson
device, the AI system achieves 15% better performance than the previous
generation of Jetson devices while consuming 50% less energy.
Comment: 18 pages, 7 figures, 5 table
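FPS figures like those quoted above come from timing the end-to-end processing loop. A generic sketch of such a throughput measurement (the callable and frame list are placeholders, not the paper's pipeline):

```python
import time

def measure_fps(process_frame, frames, warmup=2):
    # Run a few warm-up iterations first (JIT compilation, caches,
    # engine initialization), then time only the steady-state loop.
    for f in frames[:warmup]:
        process_frame(f)
    start = time.perf_counter()
    for f in frames[warmup:]:
        process_frame(f)
    elapsed = time.perf_counter() - start
    return (len(frames) - warmup) / elapsed

# Usage with a dummy 2 ms-per-frame workload standing in for the model.
fps = measure_fps(lambda frame: time.sleep(0.002), list(range(12)))
```

On real hardware the same loop would wrap preprocessing, feature extraction, and the anomaly detector so the reported FPS reflects the whole system, not a single model.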
ShadingNet: Image Intrinsics by Fine-Grained Shading Decomposition
In general, intrinsic image decomposition algorithms interpret shading as one
unified component including all photometric effects. As shading transitions are
generally smoother than reflectance (albedo) changes, these methods may fail in
distinguishing strong photometric effects from reflectance variations.
Therefore, in this paper, we propose to decompose the shading component into
direct (illumination) and indirect shading (ambient light and shadows)
subcomponents. The aim is to distinguish strong photometric effects from
reflectance variations. An end-to-end deep convolutional neural network
(ShadingNet) is proposed that operates in a fine-to-coarse manner with a
specialized fusion and refinement unit exploiting the fine-grained shading
model. It is designed to learn specific reflectance cues separated from
specific photometric effects to analyze the disentanglement capability. A
large-scale dataset of scene-level synthetic images of outdoor natural
environments is provided with fine-grained intrinsic image ground-truths. Large
scale experiments show that our approach using fine-grained shading
decompositions outperforms state-of-the-art algorithms utilizing unified
shading on NED, MPI Sintel, GTA V, IIW, MIT Intrinsic Images, 3DRMS and SRD
datasets.
Comment: Submitted to International Journal of Computer Vision (IJCV
Trends in socioeconomic inequalities in full vaccination coverage among vietnamese children aged 12–23 months, 2000–2014: Evidence for mitigating disparities in vaccination
There has been no report on the situation of socioeconomic inequalities in full vaccination coverage among Vietnamese children. This study aims to assess the trends and changes in socioeconomic inequalities in full vaccination coverage among Vietnamese children aged 12–23 months from 2000 to 2014. Data were drawn from the Multiple Indicator Cluster Surveys (2000, 2006, 2011, and 2014). The concentration index (CCI) and concentration curve (CC) were applied to quantify the degree of socioeconomic inequality in full immunization coverage. The prevalence of children fully receiving the recommended vaccines improved significantly during 2000–2014, yet full coverage was still not achieved. The total CCI of full vaccination coverage gradually decreased from 2000 to 2014 (CCI: from 0.241 to 0.009). The CC moved increasingly close to the equality line through the survey period, indicating a narrowing gap in full childhood immunization between the poor and the rich. Vietnam witnessed a sharp decrease in socioeconomic inequality in full vaccination coverage over more than a decade. Future policies targeting children from vulnerable populations (ethnic minority groups, those living in rural areas, and those whose mothers have low education) belonging to lower socioeconomic groups may further mitigate socioeconomic inequalities in full vaccination coverage. © 2019 by the authors. Licensee MDPI, Basel, Switzerland.
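For reference, the concentration index used above can be computed from outcome values and a wealth ranking. A small illustrative implementation (variable names are ours, not the study's code):

```python
import numpy as np

def concentration_index(health, wealth):
    # Wagstaff-style concentration index: 2*cov(h, r)/mean(h), where r is
    # each individual's fractional rank in the wealth distribution.
    # Values near zero indicate equality; positive values mean the
    # outcome is concentrated among the better-off.
    order = np.argsort(wealth)
    h = np.asarray(health, dtype=float)[order]
    n = len(h)
    r = (np.arange(1, n + 1) - 0.5) / n  # fractional wealth rank
    return 2.0 * np.cov(h, r, bias=True)[0, 1] / h.mean()

# Coverage for everyone: index is 0 (perfect equality).
print(concentration_index([1, 1, 1, 1], [10, 20, 30, 40]))  # 0.0
# Coverage only among the richer half: positive (pro-rich) index.
print(concentration_index([0, 0, 1, 1], [10, 20, 30, 40]))  # 0.5
```

A shrinking index over successive surveys, as reported for 2000–2014, corresponds to the concentration curve moving toward the equality line.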
Overview of the VLSP 2022 -- Abmusu Shared Task: A Data Challenge for Vietnamese Abstractive Multi-document Summarization
This paper reports the overview of the VLSP 2022 - Vietnamese abstractive
multi-document summarization (Abmusu) shared task for Vietnamese News. This
task is hosted at the 9th annual workshop on Vietnamese Language and
Speech Processing (VLSP 2022). The goal of the Abmusu shared task is to develop
summarization systems that could create abstractive summaries automatically for
a set of documents on a topic. The model input is multiple news documents on
the same topic, and the corresponding output is a related abstractive summary.
In the scope of the Abmusu shared task, we only focus on Vietnamese news
summarization and build a human-annotated dataset of 1,839 documents in 600
clusters, collected from Vietnamese news in 8 categories. Participating models
are evaluated and ranked in terms of the \texttt{ROUGE2-F1} score, the most common
evaluation metric for the document summarization problem.
Comment: VLSP 202
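ROUGE2-F1 scores bigram overlap between a candidate summary and a reference. A simplified sketch with whitespace tokenization (official evaluations use the standard ROUGE toolkit, with proper tokenization for Vietnamese):

```python
from collections import Counter

def rouge2_f1(candidate, reference):
    # Count bigram multisets on both sides, take their overlap, and
    # combine precision and recall as an F1 score.
    def bigrams(text):
        toks = text.split()
        return Counter(zip(toks, toks[1:]))
    c, r = bigrams(candidate), bigrams(reference)
    overlap = sum((c & r).values())
    if not c or not r or overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

print(rouge2_f1("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(rouge2_f1("a b c", "x y z"))                                    # 0.0
```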
Machine Learning Models for Inferring the Axial Strength in Short Concrete-Filled Steel Tube Columns Infilled with Various Strength Concrete
Concrete-filled steel tube (CFST) columns are used in the construction industry because of their high strength, ductility, stiffness, and fire resistance. This paper developed machine learning techniques for inferring the axial strength of short CFST columns infilled with various-strength concrete. Additive Random Forests (ARF) and Artificial Neural Networks (ANNs) models were developed and tested using a large set of experimental data. These data-driven models enable us to infer the axial strength of CFST columns from the diameter, tube thickness, steel yield stress, concrete strength, column length, and diameter-to-tube-thickness ratio. The analytical results showed that the ARF obtained high accuracy, with a mean absolute percentage error (MAPE) of 6.39% and a mean absolute error (MAE) of 211.31 kN. The ARF significantly outperformed the ANNs, with improvement rates of 84.1% in MAPE and 65.4% in MAE. In comparison with design codes such as EC4 and AISC, the ARF improved predictive accuracy by 36.9% in MAPE and 22.3% in MAE. The comparison results confirmed that the ARF was the most effective machine learning model among the investigated approaches. As a contribution, this study proposes a machine learning model for accurately inferring the axial strength of short CFST columns
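The MAPE and MAE figures above follow the standard definitions; a small sketch with illustrative (not the paper's) data:

```python
import numpy as np

def mape(y_true, y_pred):
    # Mean absolute percentage error, in percent.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * float(np.mean(np.abs((y_true - y_pred) / y_true)))

def mae(y_true, y_pred):
    # Mean absolute error, in the target's units (kN for axial strength).
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Toy axial-strength predictions in kN (illustrative values only).
actual    = [1000.0, 2000.0, 4000.0]
predicted = [ 950.0, 2100.0, 3900.0]
print(round(mape(actual, predicted), 2))  # 4.17
print(round(mae(actual, predicted), 2))   # 83.33
```

MAPE is scale-free, which is why it is the headline comparison figure, while MAE keeps the error in physical units.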