26 research outputs found

    3D Textured Model Encryption via 3D Lu Chaotic Mapping

    In the coming Virtual/Augmented Reality (VR/AR) era, 3D content will become as widespread as images and videos are today, so its security and privacy must be taken into consideration. 3D content comprises surface models and solid models; surface models include point clouds, meshes, and textured models. Previous work mainly focuses on the encryption of solid models, point clouds, and meshes. This work focuses on the most complicated case, the 3D textured model. We propose an encryption method for 3D textured models based on the 3D Lu chaotic mapping. We encrypt the vertices, the polygons, and the textures of 3D models separately using the 3D Lu chaotic mapping. The encrypted vertices, edges, and texture maps are then composited to form the final encrypted 3D textured model. The experimental results show that our method encrypts and decrypts 3D textured models correctly. In addition, our method can resist several attacks, such as brute-force and statistical attacks. Comment: 13 pages, 7 figures, under review of SCI
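
    As a rough illustration of the idea in the abstract above, the sketch below derives a keystream and a permutation from a Lu chaotic trajectory and applies them to texture bytes and to a vertex list. It is not the authors' scheme. The system parameters (a=36, b=3, c=20), the RK4 step size, the burn-in length, the byte quantisation, and every function name are assumptions chosen only for illustration.

```python
def lu_step(state, dt, a, b, c):
    """Advance the Lu system one RK4 step: x'=a(y-x), y'=-xz+cy, z'=xy-bz."""
    def f(s):
        x, y, z = s
        return (a * (y - x), -x * z + c * y, x * y - b * z)

    def shift(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = f(state)
    k2 = f(shift(state, k1, dt / 2))
    k3 = f(shift(state, k2, dt / 2))
    k4 = f(shift(state, k3, dt))
    return tuple(
        s + dt / 6 * (p + 2 * q + 2 * r + w)
        for s, p, q, r, w in zip(state, k1, k2, k3, k4)
    )


def lu_samples(n, key, a=36.0, b=3.0, c=20.0, dt=0.005, burn_in=1000):
    """Return n x-coordinates of the trajectory started at key = (x0, y0, z0).

    The parameter values are assumed, commonly quoted chaotic settings.
    """
    state = key
    samples = []
    for i in range(burn_in + n):
        state = lu_step(state, dt, a, b, c)
        if i >= burn_in:  # discard the transient before sampling
            samples.append(state[0])
    return samples


def xor_texture(data, key):
    """XOR texture bytes with a chaotic keystream; applying it twice decrypts."""
    stream = (int(abs(v) * 1e6) % 256 for v in lu_samples(len(data), key))
    return bytes(d ^ s for d, s in zip(data, stream))


def permute_vertices(vertices, key):
    """Reorder vertices by the sort order of the chaotic samples."""
    samples = lu_samples(len(vertices), key)
    order = sorted(range(len(vertices)), key=samples.__getitem__)
    return [vertices[i] for i in order], order


def unpermute_vertices(shuffled, order):
    """Invert permute_vertices given the same order."""
    original = [None] * len(shuffled)
    for pos, i in enumerate(order):
        original[i] = shuffled[pos]
    return original
```

    Here the initial condition acts as the secret key: a receiver holding the same key regenerates the same trajectory, then calls `xor_texture` again and `unpermute_vertices` to recover the data. A real scheme would also need to handle floating-point reproducibility across platforms, which this sketch ignores.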

    Efficient high-resolution video compression scheme using background and foreground layers

    Video coding using a dynamic background frame achieves better compression than traditional techniques by encoding the background and foreground separately. This process significantly reduces the coding bits for the overall frame; however, encoding the background still requires many bits, which can be compressed further to achieve better coding efficiency. The cuboid coding framework has been proven to be one of the most effective methods of image compression: it exploits homogeneous pixel correlation within a frame and aligns better with object boundaries than traditional block-based coding. In a video sequence, the cuboid-based frame partitioning varies with changes in the foreground. However, since the background remains static for a group of pictures, cuboid coding can better exploit spatial pixel homogeneity there. In this work, the impact of cuboid coding on the background frame for high-resolution videos (Ultra-High-Definition (UHD) and 360-degree videos) is investigated using the multilayer framework of SHVC. After the cuboid partitioning, the method of coarse frame generation is improved by retaining information to which the human visual system is sensitive. Unlike the traditional SHVC scheme, the proposed method encodes the cuboid-coded background and the foreground in separate layers in an implicit manner. Simulation results show that the proposed video coding method achieves an average BD-Rate reduction of 26.69% and a BD-PSNR gain of 1.51 dB against SHVC, with a significant reduction in encoding time, for both UHD and 360-degree videos. It also achieves an average BD-Rate reduction of 13.88% and a BD-PSNR gain of 0.78 dB compared to the existing relevant method proposed by X. Hoang Van.

    Effects of Ground Manifold Modeling on the Accuracy of Stixel Calculations

    This paper highlights the role of ground manifold modeling for stixel calculations; stixels are medium-level data representations used for the development of computer vision modules for self-driving cars. When single-disparity maps and simplified ground manifold models are used, calculated stixels may suffer from noise, inconsistency, and false-detection rates for obstacles, especially in challenging datasets. Stixel calculations can be improved with respect to accuracy and robustness by using more adaptive ground manifold approximations. A comparative study of stixel results, obtained for different ground-manifold models (e.g., plane-fitting, line-fitting in v-disparities or polynomial approximation, and graph cut), forms the main part of this paper. This paper also considers the use of trinocular stereo vision and shows that it provides options to enhance stixel results compared with binocular recording. Comprehensive experiments are performed on two publicly available challenging datasets. We also use a novel way of comparing calculated stixels with ground truth: we compare depth information, as given by extracted stixels, with ground-truth depth provided by a highly accurate LiDAR range sensor (as available in one of the public datasets). We evaluate the accuracy of four different ground-manifold methods. The experimental results also include quantitative evaluations of the tradeoff between accuracy and run time. Overall, trinocular recording combined with graph-cut estimation of ground manifolds appears to be the recommended approach, including under challenging weather and lighting conditions.

    Ensemble Modeling for Multimodal Visual Action Recognition

    In this work, we propose an ensemble modeling approach for multimodal action recognition. We independently train individual modality models using a variant of focal loss tailored to handle the long-tailed distribution of the MECCANO [21] dataset. Based on the underlying principle of focal loss, which captures the relationship between tail (scarce) classes and their prediction difficulties, we propose an exponentially decaying variant of focal loss for our current task. It initially emphasizes learning from the hard misclassified examples and gradually adapts to the entire range of examples in the dataset. This annealing process encourages the model to strike a balance between focusing on the sparse set of hard samples, while still leveraging the information provided by the easier ones. Additionally, we opt for the late fusion strategy to combine the resultant probability distributions from RGB and Depth modalities for final action prediction. Experimental evaluations on the MECCANO dataset demonstrate the effectiveness of our approach. Comment: 22nd International Conference on Image Analysis and Processing Workshops - Multimodal Action Recognition on the MECCANO Dataset, 202
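
    One plausible reading of the loss described above is a focal loss whose focusing parameter decays exponentially over training, so that it approaches plain cross-entropy. The sketch below follows that reading together with a weighted-average late fusion. The schedule `gamma0 * exp(-decay_rate * epoch)`, the default values, the equal fusion weights, and all function names are assumptions, not details taken from the paper.

```python
import math


def decayed_gamma(epoch, gamma0=2.0, decay_rate=0.1):
    """Assumed schedule: the focusing parameter shrinks exponentially per epoch."""
    return gamma0 * math.exp(-decay_rate * epoch)


def focal_loss(p_true, gamma):
    """Focal loss for one sample, given the predicted probability of its true class.

    With gamma = 0 this reduces to the usual cross-entropy term -log(p_true).
    """
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


def late_fusion(p_rgb, p_depth, w_rgb=0.5):
    """Fuse two per-class probability vectors by a weighted average.

    Returns the predicted class index and the fused distribution.
    """
    fused = [w_rgb * r + (1.0 - w_rgb) * d for r, d in zip(p_rgb, p_depth)]
    return max(range(len(fused)), key=fused.__getitem__), fused


# Early in training a hard sample (p=0.1) dominates an easy one (p=0.9) far
# more strongly than it does late in training, when gamma is close to zero.
for epoch in (0, 10, 50):
    g = decayed_gamma(epoch)
    ratio = focal_loss(0.1, g) / focal_loss(0.9, g)
    print(f"epoch {epoch:2d}: gamma={g:.3f}, hard/easy loss ratio={ratio:.1f}")
```

    Under this assumed schedule the relative weight of hard examples falls as training proceeds, which matches the annealing behaviour the abstract describes; other decay forms would fit the description equally well.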

    Optimizing Echo State Networks for Static Pattern Recognition

    Static pattern recognition requires a machine to classify an object on the basis of a combination of attributes and is typically performed using machine learning techniques such as support vector machines and multilayer perceptrons. Unusually, in this study, we applied a successful time-series processing neural network architecture, the echo state network (ESN), to a static pattern recognition task. The networks were presented with clamped input data patterns, but in this work they were allowed to run until their output units delivered a stable set of output activations, in a similar fashion to previous work that focused on the behaviour of ESN reservoir units. Our aim was to see if the short-term memory developed by the reservoir and the clamped inputs could deliver improved overall classification accuracy. The study utilized a challenging, high-dimensional, real-world plant species spectroradiometry classification dataset with the objective of accurately detecting one of the world’s top 100 invasive plant species. Surprisingly, the ESNs performed equally well with both unsettled and settled reservoirs. Delivering a classification accuracy of 96.60%, the clamped ESNs outperformed three widely used machine learning techniques, namely support vector machines, extreme learning machines and multilayer perceptrons. Contrary to past work, where inputs were clamped until reservoir stabilization, we found that similar classification accuracy (96.49%) could be obtained by clamping the input patterns for just two repeats. The chief contribution of this work is that a recurrent architecture can achieve good classification accuracy even while the reservoir is still in an unstable state.
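
    The clamping procedure described above can be sketched as follows: a fixed input pattern is presented to a random tanh reservoir for a chosen number of repeats, and the resulting reservoir state would then feed a trained readout. This is a minimal sketch, not the study's configuration. The reservoir size, the weight ranges, and the function names are assumptions, and the recurrent weights are scaled by their infinity norm (a simple sufficient condition for the state to settle) rather than by spectral radius.

```python
import math
import random


def make_reservoir(n_in, n_res, scale=0.9, seed=0):
    """Build random input and recurrent weights for a small reservoir.

    The recurrent matrix is rescaled so its infinity norm equals `scale` (< 1),
    which makes the state update a contraction. This is an assumed shortcut.
    """
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_res)]
    w = [[rng.uniform(-1, 1) for _ in range(n_res)] for _ in range(n_res)]
    norm = max(sum(abs(v) for v in row) for row in w)
    w = [[scale * v / norm for v in row] for row in w]
    return w_in, w


def clamp_pattern(pattern, w_in, w, repeats):
    """Present the same static pattern `repeats` times; return the final state."""
    state = [0.0] * len(w)
    for _ in range(repeats):
        state = [
            math.tanh(
                sum(a * u for a, u in zip(w_in[j], pattern))
                + sum(b * s for b, s in zip(w[j], state))
            )
            for j in range(len(w))
        ]
    return state


w_in, w = make_reservoir(n_in=4, n_res=20)
pattern = [0.2, -0.5, 0.9, 0.1]  # hypothetical attribute vector
unsettled = clamp_pattern(pattern, w_in, w, repeats=2)
settled = clamp_pattern(pattern, w_in, w, repeats=200)
gap = max(abs(a - b) for a, b in zip(unsettled, settled))
print(f"largest state difference, 2 vs 200 repeats: {gap:.4f}")
```

    Either state vector could be passed to a linear readout, for example one fitted by ridge regression; the abstract's finding is that the readout trained on the two-repeat state performed about as well as one trained on the settled state.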

    Efficient Model-Based Object Pose Estimation Based on Multi-Template Tracking and PnP Algorithms

    Three-dimensional (3D) object pose estimation plays a crucial role in computer vision because it is an essential function in many practical applications. In this paper, we propose a real-time model-based object pose estimation algorithm that integrates template matching and Perspective-n-Point (PnP) pose estimation methods to address this problem efficiently. The proposed method first extracts and matches keypoints of the scene image and the object reference image. Based on the matched keypoints, a two-dimensional (2D) planar transformation between the reference image and the detected object can be formulated as a homography matrix, which can initialize a template tracking algorithm efficiently. Based on the template tracking result, the correspondence between image features and control points of the Computer-Aided Design (CAD) model of the object can be determined efficiently, leading to a fast 3D pose tracking result. Finally, the 3D pose of the object with respect to the camera is estimated by a PnP solver based on the tracked 2D-3D correspondences, which improves the accuracy of the pose estimation. Experimental results show that the proposed method not only achieves real-time performance in tracking multiple objects, but also provides accurate pose estimation results. These advantages make the proposed method suitable for many practical applications, such as augmented reality.

    Structural health monitoring of a footbridge using Echo State Networks and NARMAX

    Echo State Networks (ESNs) and a Nonlinear Auto-Regressive Moving Average model with eXogenous inputs (NARMAX) have been applied to multi-sensor time-series data arising from a test footbridge which has been subjected to multiple potentially damaging interventions. The aim of the work was to automatically classify known potentially damaging events, while also allowing engineers to observe and localise any long-term damage trends. The techniques reported here used data from ten temperature sensors as inputs and were tasked with predicting the output signal from eight tilt sensors embedded at various points over the bridge. Initially, interventions were identified by both ESNs and NARMAX. In addition, training ESNs using data up to the first event, and determining the ESNs’ subsequent predictions, allowed inferences to be made not only about when and where the interventions occurred, but also about the level of damage caused, without requiring any prior data pre-processing or extrapolation. Finally, ESNs were successfully used as classifiers to characterise the different types of intervention that had taken place.