
    CNN based Learning using Reflection and Retinex Models for Intrinsic Image Decomposition

    Most of the traditional work on intrinsic image decomposition relies on deriving priors about scene characteristics. Recent research, on the other hand, uses deep learning models as an in-and-out black box and does not consider the well-established, traditional image formation process as the basis of the intrinsic learning process. As a consequence, although current deep learning approaches show superior performance on quantitative benchmarks, traditional approaches are still dominant in achieving high-quality qualitative results. In this paper, the aim is to exploit the best of the two worlds. A method is proposed that (1) is empowered by deep learning capabilities, (2) considers a physics-based reflection model to steer the learning process, and (3) exploits the traditional approach to obtain intrinsic images by exploiting reflectance and shading gradient information. The proposed model is fast to compute and allows for the integration of all intrinsic components. To train the new model, an object-centered large-scale dataset with intrinsic ground-truth images is created. The evaluation results demonstrate that the new model outperforms existing methods. Visual inspection shows that the image formation loss function improves color reproduction and the use of gradient information produces sharper edges. Datasets, models, and higher-resolution images are available at https://ivi.fnwi.uva.nl/cv/retinet. Comment: CVPR 201
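The image formation loss mentioned in this abstract builds on the Lambertian model, in which an image is the per-pixel product of reflectance and shading, I = R ⊙ S. A minimal sketch of such a reconstruction loss, assuming a plain MSE form and illustrative variable names (not the paper's exact formulation):

```python
import numpy as np

def image_formation_loss(image, reflectance, shading):
    """Penalize deviation from the Lambertian model I = R * S.

    `image` and `reflectance` are H x W x 3 float arrays; `shading` is an
    H x W grayscale array. The plain MSE form is an illustrative stand-in
    for the paper's image formation loss.
    """
    reconstruction = reflectance * shading[..., None]
    return float(np.mean((image - reconstruction) ** 2))

# A perfect decomposition reconstructs the image exactly, so the loss is 0.
rng = np.random.default_rng(0)
R = rng.uniform(0.0, 1.0, size=(4, 4, 3))  # reflectance (albedo)
S = rng.uniform(0.0, 1.0, size=(4, 4))     # shading
I = R * S[..., None]                       # rendered image
print(image_formation_loss(I, R, S))       # -> 0.0
```

Training against such a term steers the network toward decompositions that are physically consistent with the input image, rather than treating the two outputs as unrelated regression targets.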

    WEBCAM-BASED LASER DOT DETECTION TECHNIQUE IN COMPUTER REMOTE CONTROL

    ABSTRACT: In this paper, the authors propose a method to detect the laser dot in an interactive system using laser pointers. The method is designed for presenters who need to interact with the computer during a presentation by using the laser pointer. The detection technique uses a camera to capture the presentation screen and processes every frame transferred to the computer. This paper focuses on the detection and tracking of laser dots, based on characteristics that distinguish a laser dot from other areas in the captured frames. Experimental results showed that the proposed method could reduce the rate of misdetection caused by light noise by a factor of 10 and achieve an average detection accuracy of 82% in normal presentation environments. The results point out that the better way to describe the laser dot's visual features is to use the HSI color space instead of the usual RGB space. Keywords: laser pointer; laser dot/spot; laser pointer interaction; control; mouse; computer screen/display
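The abstract's key observation is that a laser dot is easier to characterize in the HSI space than in RGB: its core is extremely bright (high intensity) and washed out (low saturation). A minimal sketch of that idea with a simplified HSI conversion; the thresholds and toy frame are illustrative assumptions, not the paper's tuned values:

```python
import numpy as np

def rgb_to_intensity_saturation(rgb):
    """Return the intensity and saturation planes of an RGB float image
    in [0, 1], using the simple HSI definitions I = mean(R, G, B) and
    S = 1 - min(R, G, B) / I."""
    i = rgb.mean(axis=-1)
    m = rgb.min(axis=-1)
    s = np.where(i > 0, 1.0 - m / np.maximum(i, 1e-12), 0.0)
    return i, s

def detect_laser_dot(rgb, i_thresh=0.9, s_thresh=0.3):
    """Mask of candidate laser-dot pixels: a very bright, nearly
    desaturated core. Thresholds are illustrative guesses."""
    i, s = rgb_to_intensity_saturation(rgb)
    return (i > i_thresh) & (s < s_thresh)

# Toy frame: dark background with one near-white dot at row 2, column 3.
frame = np.zeros((5, 5, 3))
frame[2, 3] = [1.0, 0.98, 0.95]
mask = detect_laser_dot(frame)
print(np.argwhere(mask))  # -> [[2 3]]
```

In RGB, the same dot is just three large channel values and is hard to separate from bright projector content; splitting off intensity and saturation makes the "bright but colorless core" criterion a pair of simple thresholds.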

    Three for one and one for three: Flow, Segmentation, and Surface Normals

    Optical flow, semantic segmentation, and surface normals represent different information modalities, yet together they provide better cues for scene understanding problems. In this paper, we study the influence among the three modalities: how each one impacts the others and how efficient they are in combination. We employ a modular approach using a convolutional refinement network which is trained with supervision but isolated from RGB images to enforce joint modality features. To assist the training process, we create a large-scale synthetic outdoor dataset that supports dense annotation of semantic segmentation, optical flow, and surface normals. The experimental results show positive influence among the three modalities, especially for objects' boundaries, region consistency, and scene structures. Comment: BMVC 201

    EDEN: Multimodal Synthetic Dataset of Enclosed GarDEN Scenes

    Multimodal large-scale datasets for outdoor scenes are mostly designed for urban driving problems. The scenes are highly structured and semantically different from scenarios seen in nature-centered scenes such as gardens or parks. To promote machine learning methods for nature-oriented applications, such as agriculture and gardening, we propose the multimodal synthetic dataset for Enclosed garDEN scenes (EDEN). The dataset features more than 300K images captured from more than 100 garden models. Each image is annotated with various low/high-level vision modalities, including semantic segmentation, depth, surface normals, intrinsic colors, and optical flow. Experimental results with state-of-the-art methods for semantic segmentation and monocular depth prediction, two important tasks in computer vision, show the positive impact of pre-training deep networks on our dataset for unstructured natural scenes. The dataset and related materials will be available at https://lhoangan.github.io/eden. Comment: Accepted for publishing at WACV 202

    Benchmarking Jetson Edge Devices with an End-to-end Video-based Anomaly Detection System

    Innovative enhancements in embedded system platforms, specifically hardware accelerators, significantly influence the application of deep learning in real-world scenarios. These innovations translate human labor into automated intelligent systems employed in areas such as autonomous driving, robotics, the Internet of Things (IoT), and numerous other impactful applications. NVIDIA's Jetson platform is one of the pioneers in offering optimal performance regarding energy efficiency and throughput for the execution of deep learning algorithms. Previously, most benchmarking analyses were based on 2D images, with a single deep learning model for each comparison result. In this paper, we implement an end-to-end video-based crime-scene anomaly detection system that takes surveillance videos as input; the system is deployed and operates completely on multiple Jetson edge devices (Nano, AGX Xavier, Orin Nano). The comparison analysis includes the integration of Torch-TensorRT, a software development kit from NVIDIA, for model performance optimisation. The system is built on the PySlowfast open-source project from Facebook as the coding template. The end-to-end pipeline comprises video capture from the camera, data preprocessing, feature extraction, and anomaly detection. We share the experience of deploying an AI-based system on various Jetson edge devices with Docker technology. As the anomaly detector, a weakly supervised video-based deep learning model called Robust Temporal Feature Magnitude Learning (RTFM) is applied in the system. The system reaches an inference speed of 47.56 frames per second (FPS) on a Jetson edge device with a total RAM usage of only 3.11 GB. We also find that on the most promising Jetson device, the AI system achieves 15% better performance than the previous generation of Jetson devices while consuming 50% less energy. Comment: 18 pages, 7 figures, 5 table
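The headline FPS figure in this abstract is an end-to-end throughput measurement. A generic sketch of such a timing harness, where `infer` stands in for the deployed model (e.g. an RTFM detector behind a Torch-TensorRT engine); the harness itself is an assumption, not the paper's benchmark code:

```python
import time

def measure_fps(infer, frames, warmup=2):
    """Average end-to-end inference throughput (frames per second).

    `infer` is any callable taking one frame; `frames` is a list of inputs.
    A few warm-up calls are run first and excluded from the timing, since
    the first iterations (lazy initialization, cache warming, or engine
    build on accelerators) are typically much slower than steady state.
    """
    for f in frames[:warmup]:
        infer(f)
    start = time.perf_counter()
    for f in frames:
        infer(f)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

# Toy stand-in workload: summing a list plays the role of the model.
fps = measure_fps(lambda f: sum(f), [list(range(100))] * 50)
print(f"{fps:.1f} FPS")
```

Measuring over the whole pipeline (capture, preprocessing, feature extraction, detection) rather than the model call alone is what makes a number like 47.56 FPS meaningful for deployment decisions.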

    ShadingNet: Image Intrinsics by Fine-Grained Shading Decomposition

    In general, intrinsic image decomposition algorithms interpret shading as one unified component including all photometric effects. As shading transitions are generally smoother than reflectance (albedo) changes, these methods may fail to distinguish strong photometric effects from reflectance variations. Therefore, in this paper, we propose to decompose the shading component into direct (illumination) and indirect (ambient light and shadows) shading subcomponents. The aim is to distinguish strong photometric effects from reflectance variations. An end-to-end deep convolutional neural network (ShadingNet) is proposed that operates in a fine-to-coarse manner with a specialized fusion and refinement unit exploiting the fine-grained shading model. It is designed to learn specific reflectance cues separated from specific photometric effects in order to analyze its disentanglement capability. A large-scale dataset of scene-level synthetic images of outdoor natural environments is provided with fine-grained intrinsic image ground truths. Large-scale experiments show that our approach using fine-grained shading decompositions outperforms state-of-the-art algorithms utilizing unified shading on the NED, MPI Sintel, GTA V, IIW, MIT Intrinsic Images, 3DRMS and SRD datasets. Comment: Submitted to International Journal of Computer Vision (IJCV

    Trends in socioeconomic inequalities in full vaccination coverage among vietnamese children aged 12–23 months, 2000–2014: Evidence for mitigating disparities in vaccination

    There has been no report on the situation of socioeconomic inequalities in full vaccination coverage among Vietnamese children. This study aims to assess the trends and changes in socioeconomic inequalities in full vaccination coverage among Vietnamese children aged 12–23 months from 2000 to 2014. Data were drawn from Multiple Indicator Cluster Surveys (2000, 2006, 2011, and 2014). The concentration index (CCI) and concentration curve (CC) were applied to quantify the degree of socioeconomic inequality in full immunization coverage. The prevalence of children fully receiving the recommended vaccines improved significantly during 2000–2014, yet full coverage was still not achieved. The total CCI of full vaccination coverage gradually decreased from 2000 to 2014 (CCI: from 0.241 to 0.009). The CC became increasingly close to the equality line over the survey period, indicating a narrowing gap in child full immunization between the poor and the rich. Vietnam witnessed a sharp decrease in socioeconomic inequality in full vaccination coverage over more than a decade. Future policies targeting children from vulnerable populations (ethnic minority groups, those living in rural areas, and those with a mother with low education) belonging to lower socioeconomic groups may mitigate socioeconomic inequalities in full vaccination coverage. © 2019 by the authors. Licensee MDPI, Basel, Switzerland. Please note that there are multiple authors for this article; only the names of the first five, including Federation University Australia affiliate "Huy Nguyen", are provided in this record.
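The concentration index reported above (falling from 0.241 to 0.009) can be computed with the standard covariance formula C = 2·cov(h, r) / mean(h), where h is the health outcome and r is the fractional socioeconomic rank. A minimal sketch on synthetic data; the variable names and the toy numbers are illustrative, not the study's data:

```python
import numpy as np

def concentration_index(outcome, ses):
    """Concentration index C = 2 * cov(h, r) / mean(h), where r is the
    fractional rank of individuals ordered from poorest to richest by
    socioeconomic status (ses). C > 0 means the outcome is concentrated
    among the better-off; C = 0 means no socioeconomic gradient."""
    order = np.argsort(ses)                      # poorest first
    h = np.asarray(outcome, dtype=float)[order]
    n = len(h)
    r = (np.arange(1, n + 1) - 0.5) / n          # fractional rank in (0, 1)
    return 2.0 * np.cov(h, r, bias=True)[0, 1] / h.mean()

# Coverage rising with wealth -> positive (pro-rich) index.
ses = np.array([1, 2, 3, 4, 5])
coverage = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
print(round(concentration_index(coverage, ses), 3))  # -> 0.267
```

An index near zero, as in the 2014 figure, means coverage is almost evenly distributed across wealth ranks, which is what the concentration curve approaching the equality line expresses graphically.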

    Overview of the VLSP 2022 -- Abmusu Shared Task: A Data Challenge for Vietnamese Abstractive Multi-document Summarization

    This paper reports an overview of the VLSP 2022 Vietnamese abstractive multi-document summarization (Abmusu) shared task for Vietnamese news. The task is hosted at the 9th annual workshop on Vietnamese Language and Speech Processing (VLSP 2022). The goal of the Abmusu shared task is to develop summarization systems that can automatically create abstractive summaries for a set of documents on a topic. The model input is multiple news documents on the same topic, and the corresponding output is a related abstractive summary. In the scope of the Abmusu shared task, we focus only on Vietnamese news summarization and build a human-annotated dataset of 1,839 documents in 600 clusters, collected from Vietnamese news in 8 categories. Participating models are evaluated and ranked in terms of the ROUGE2-F1 score, the most typical evaluation metric for the document summarization problem. Comment: VLSP 202
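ROUGE2-F1, the ranking metric named above, is the F1 score over overlapping bigrams between a system summary and a reference. A minimal sketch of the metric for whitespace-tokenized text; this is an illustrative reimplementation, not the shared task's official scorer:

```python
from collections import Counter

def rouge2_f1(candidate, reference):
    """ROUGE-2 F1: harmonic mean of bigram precision and recall,
    where overlap counts are clipped via the Counter intersection."""
    def bigrams(text):
        toks = text.split()
        return Counter(zip(toks, toks[1:]))

    cand, ref = bigrams(candidate), bigrams(reference)
    if not cand or not ref:
        return 0.0
    overlap = sum((cand & ref).values())   # clipped bigram matches
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(rouge2_f1("the cat sat on the mat", "the cat sat on a mat"))  # -> 0.6
```

For a real evaluation of Vietnamese text, tokenization (e.g. syllable- vs word-level segmentation) would need to match the task's official preprocessing, which this sketch leaves out.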

    Machine Learning Models for Inferring the Axial Strength in Short Concrete-Filled Steel Tube Columns Infilled with Various Strength Concrete

    Concrete-filled steel tube (CFST) columns are used in the construction industry because of their high strength, ductility, stiffness, and fire resistance. This paper developed machine learning techniques for inferring the axial strength of short CFST columns infilled with various-strength concrete. Additive Random Forests (ARF) and Artificial Neural Networks (ANNs) models were developed and tested using a large set of experimental data. These data-driven models enable us to infer the axial strength of CFST columns based on the diameter, tube thickness, steel yield stress, concrete strength, column length, and diameter-to-tube-thickness ratio. The analytical results showed that the ARF achieved high accuracy, with a mean absolute percentage error (MAPE) of 6.39% and a mean absolute error (MAE) of 211.31 kN. The ARF significantly outperformed the ANNs, with improvement rates of 84.1% in MAPE and 65.4% in MAE. In comparison with design codes such as EC4 and AISC, the ARF improved predictive accuracy by 36.9% in MAPE and 22.3% in MAE. The comparison results confirmed that the ARF was the most effective machine learning model among the investigated approaches. As a contribution, this study proposed a machine learning model for accurately inferring the axial strength of short CFST columns.
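The two error measures used to compare the models above are straightforward to define. A minimal sketch of MAPE and MAE on synthetic axial strengths; the numbers are illustrative, not the paper's experimental data:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.
    Assumes no true value is zero (axial strengths are strictly positive)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * float(np.mean(np.abs((y_true - y_pred) / y_true)))

def mae(y_true, y_pred):
    """Mean absolute error, here in kN for axial strength."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Synthetic measured vs predicted axial strengths (kN).
y_true = [1000.0, 2000.0, 4000.0]
y_pred = [950.0, 2100.0, 3900.0]
print(mape(y_true, y_pred), mae(y_true, y_pred))
```

MAPE is scale-free, which is why it is useful when the dataset mixes columns whose strengths span a wide range, while MAE keeps the error in physical units (kN), matching how the paper reports both 6.39% and 211.31 kN.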