Learning Depth from Focus in the Wild
For better photography, most recent commercial cameras, including smartphones,
have either adopted large-aperture lenses to collect more light or used a burst
mode to take multiple images within a short time. These features lead us to
examine depth from focus/defocus.
In this work, we present convolutional neural network-based depth
estimation from a single focal stack. Our method differs from related
state-of-the-art works in three unique features. First, our method allows
depth maps to be inferred in an end-to-end manner even with image alignment.
Second, we propose a sharp region detection module to reduce blur ambiguities
in subtle focus changes and weakly textured regions. Third, we design an
effective downsampling module to ease the flow of focal information during
feature extraction. In addition, for the generalization of the proposed network, we
develop a simulator to realistically reproduce the features of commercial
cameras, such as changes in field of view, focal length and principal points.
By effectively incorporating these three unique features, our network
achieves the top rank in the DDFF 12-Scene benchmark on most metrics. We also
demonstrate the effectiveness of the proposed method on various quantitative
evaluations and real-world images taken from various off-the-shelf cameras
compared with state-of-the-art methods. Our source code is publicly available
at https://github.com/wcy199705/DfFintheWild
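As a rough illustration of the underlying principle (a classical baseline, not the network described above; the function name is mine), depth from focus can be approximated by picking, per pixel, the focal slice with the highest local sharpness:

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def depth_from_focus(stack, window=9):
    """Pick, per pixel, the focal slice with the highest local sharpness.

    stack: (S, H, W) grayscale focal stack; the slice index serves as a
    coarse depth proxy (slice s was captured with focus at depth s).
    """
    sharpness = np.stack([
        # squared Laplacian response, averaged over a local window
        uniform_filter(laplace(s.astype(np.float64)) ** 2, size=window)
        for s in stack
    ])
    return np.argmax(sharpness, axis=0)  # (H, W) index of sharpest slice
```

In-focus regions carry strong high-frequency content, so the Laplacian energy peaks at the correctly focused slice; the learned network replaces this hand-crafted measure with its sharp region detection module.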
EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting
Capturing high-dimensional social interactions and feasible futures is
essential for predicting trajectories. To address this complex nature, several
attempts have been devoted to reducing the dimensionality of the output
variables via parametric curve fitting, such as the Bézier curve and B-spline
function. However, these functions, which originate in computer graphics,
are not suited to account for socially acceptable human dynamics. In
this paper, we present EigenTrajectory (ET), a trajectory prediction
approach that uses a novel trajectory descriptor to form a compact space, termed
ET space here, in place of Euclidean space, for representing
pedestrian movements. We first reduce the complexity of the trajectory
descriptor via a low-rank approximation. We transform the pedestrians' history
paths into our ET space, represented by spatio-temporal principal
components, and feed them into off-the-shelf trajectory forecasting models. The
inputs and outputs of the models, as well as the social interactions, are all
gathered and aggregated in the corresponding ET space. Lastly, we
propose a trajectory anchor-based refinement method to cover all possible
futures in the proposed ET space. Extensive experiments demonstrate
that our EigenTrajectory predictor can significantly improve both the
prediction accuracy and reliability of existing trajectory forecasting models
on public benchmarks, indicating that the proposed descriptor is suited to
represent pedestrian behaviors. Code is publicly available at
https://github.com/inhwanbae/EigenTrajectory. Comment: Accepted at ICCV 2023.
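The low-rank idea can be sketched with a truncated SVD over flattened trajectories. This is an illustrative approximation of such a descriptor, not the authors' implementation (function names are mine):

```python
import numpy as np

def fit_descriptor(trajs, k=4):
    """Learn k spatio-temporal principal components from (N, T, 2) paths."""
    N, T, _ = trajs.shape
    X = trajs.reshape(N, -1).T                 # (2T, N): one column per path
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]                            # (2T, k) basis of the compact space

def encode(trajs, U):
    """Project trajectories onto the low-rank basis -> (N, k) coefficients."""
    return trajs.reshape(trajs.shape[0], -1) @ U

def decode(coeffs, U, T):
    """Map coefficients back to full (N, T, 2) trajectories."""
    return (coeffs @ U.T).reshape(-1, T, 2)
```

Forecasting models can then operate on the k coefficients instead of the full 2T-dimensional path, which is the dimensionality reduction the abstract describes.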
EPINET: A Fully-Convolutional Neural Network Using Epipolar Geometry for Depth from Light Field Images
Light field cameras capture both the spatial and the angular properties of
light rays in space. Due to this property, one can compute the depth from light
fields in uncontrolled lighting environments, which is a big advantage over
active sensing devices. Depth computed from light fields can be used for many
applications including 3D modelling and refocusing. However, light field images
from hand-held cameras have very narrow baselines with noise, making the depth
estimation difficult. Many approaches have been proposed to overcome these
limitations for the light field depth estimation, but there is a clear
trade-off between the accuracy and the speed in these methods. In this paper,
we introduce a fast and accurate light field depth estimation method based on a
fully-convolutional neural network. Our network is designed by considering the
light field geometry and we also overcome the lack of training data by
proposing light field specific data augmentation methods. We achieved the top
rank in the HCI 4D Light Field Benchmark on most metrics, and we also
demonstrate the effectiveness of the proposed method on real-world light-field
images. Comment: Accepted to CVPR 2018, 10 pages.
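For intuition, a classical (non-learning) light-field depth baseline shears the sub-aperture views by candidate disparities and picks the most photo-consistent one per pixel. This is illustrative only and not EPINET itself (names are mine):

```python
import numpy as np

def lf_depth_1d(views, disparities):
    """views: (A, H, W) horizontal sub-aperture images.

    For each candidate disparity d, shift view a by (center - a) * d pixels
    and measure the variance across views; at the true disparity, all views
    agree, so the variance is minimal (photo-consistency).
    """
    A, H, W = views.shape
    c = A // 2
    costs = []
    for d in disparities:
        sheared = np.stack([
            np.roll(views[a], int(round((c - a) * d)), axis=1)
            for a in range(A)
        ])
        costs.append(sheared.var(axis=0))       # (H, W) cost per pixel
    best = np.argmin(np.stack(costs), axis=0)   # winner-take-all
    return np.asarray(disparities)[best]
```

The narrow baselines mentioned in the abstract make these variance valleys shallow and noisy, which is exactly what motivates a learned, geometry-aware network instead.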
High-fidelity 3D Human Digitization from Single 2K Resolution Images
High-quality 3D human body reconstruction requires high-fidelity and
large-scale training data and appropriate network design that effectively
exploits the high-resolution input images. To tackle these problems, we propose
a simple yet effective 3D human digitization method called 2K2K, which
constructs a large-scale 2K human dataset and infers 3D human models from 2K
resolution images. The proposed method separately recovers the global shape of
a human and its details. The low-resolution depth network predicts the global
structure from a low-resolution image, and the part-wise image-to-normal
network predicts the details of the 3D human body structure. The
high-resolution depth network merges the global 3D shape and the detailed
structures to infer the high-resolution front and back side depth maps.
Finally, an off-the-shelf mesh generator reconstructs the full 3D human model,
which is available at https://github.com/SangHunHan92/2K2K. In addition, we
also provide 2,050 3D human models, including texture maps, 3D joints, and SMPL
parameters for research purposes. In experiments, we demonstrate competitive
performance over recent works on various datasets. Comment: Code page:
https://github.com/SangHunHan92/2K2K, Accepted to CVPR 2023 (Highlight).
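The step of fusing front- and back-side depth maps can be pictured with a toy orthographic sketch (not the paper's mesh-generation pipeline; the function and its conventions are mine):

```python
import numpy as np

def depth_pair_to_points(front, back, mask):
    """Merge front/back orthographic depth maps into one 3D point cloud.

    front, back: (H, W) depth maps of the two sides of the body.
    mask: (H, W) boolean silhouette; each foreground pixel contributes one
    point from the front surface and one from the back surface.
    """
    ys, xs = np.nonzero(mask)
    pts_front = np.stack([xs, ys, front[ys, xs]], axis=1)
    pts_back = np.stack([xs, ys, back[ys, xs]], axis=1)
    return np.concatenate([pts_front, pts_back], axis=0)  # (2 * P, 3)
```

A mesh generator (e.g. via screened Poisson reconstruction) would then turn such a two-sided point set into the closed surface the abstract refers to.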
Facile and versatile ligand analysis method of colloidal quantum dot
Colloidal quantum-dots (QDs) are highly attractive materials for various optoelectronic applications owing to their easy maneuverability, high functionality, wide applicability, and low cost of mass production. QDs usually consist of two components: the inorganic nano-crystalline particle and the organic ligands that passivate the surface of the inorganic particle. The organic component is critical for tuning the electronic properties of QDs as well as for solubilizing QDs in various solvents. However, despite extensive effort to understand the chemistry of ligands, it has been challenging to develop an efficient and reliable method for identifying and quantifying ligands on the QD surface. Herein, we develop a novel method for analyzing ligands in a mild yet accurate fashion. We found that oxidizing agents, acting as a heterogeneous catalyst in a different phase from the QDs, can efficiently disrupt the interaction between the inorganic particle and the organic ligands, and that a subsequent simple phase-fractionation step can isolate the ligand-containing phase from the oxidizer-containing phase and the insoluble precipitates. Our analysis procedure minimizes the exposure of ligand molecules to oxidizing agents and yields homogeneous samples that can be readily analyzed by diverse analytical techniques, such as nuclear magnetic resonance spectroscopy and gas chromatography mass spectrometry. © 2021, The Author(s).
Disentangled Multi-Relational Graph Convolutional Network for Pedestrian Trajectory Prediction
Pedestrian trajectory prediction is one of the important tasks required for autonomous navigation and social robots in human environments. Previous studies focused on estimating social forces among individual pedestrians. However, they did not consider the social forces of groups on pedestrians, which results in over-collision-avoidance problems. To address this problem, we present a Disentangled Multi-Relational Graph Convolutional Network (DMRGCN) for socially entangled pedestrian trajectory prediction. We first introduce a novel disentangled multi-scale aggregation to better represent social interactions among pedestrians on a weighted graph. For the aggregation, we construct multi-relational weighted graphs based on distances and relative displacements among pedestrians. In the prediction step, we propose a global temporal aggregation to alleviate accumulated errors for pedestrians changing their directions. Finally, we apply DropEdge to our DMRGCN to avoid overfitting on relatively small pedestrian trajectory datasets. Through the effective incorporation of these three parts within an end-to-end framework, DMRGCN achieves state-of-the-art performance on a variety of challenging trajectory prediction benchmarks.
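The multi-relational weighted graphs can be sketched as distance-binned adjacency matrices modulated by relative motion. This is a plausible illustration of the idea, not the authors' exact formulation (names and weighting choices are mine):

```python
import numpy as np

def multi_relational_graphs(pos, disp, dist_bins=(1.0, 2.0, 4.0)):
    """Build one weighted adjacency matrix per distance range ('relation').

    pos:  (N, 2) current pedestrian positions.
    disp: (N, 2) recent displacement vectors (motion direction).
    A pair (i, j) is connected in relation r when their distance falls into
    bin r; the weight decays with distance and is modulated by how aligned
    their motion directions are.
    """
    d = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)   # (N, N)
    u = disp / (np.linalg.norm(disp, axis=-1, keepdims=True) + 1e-8)
    cos = u @ u.T                                              # heading similarity
    graphs, lo = [], 0.0
    for hi in dist_bins:
        rel = ((d > lo) & (d <= hi)).astype(float)             # disentangled bin
        w = rel * np.exp(-d) * 0.5 * (1.0 + cos)
        np.fill_diagonal(w, 0.0)
        graphs.append(w)
        lo = hi
    return np.stack(graphs)                                    # (R, N, N)
```

Separating interactions by range like this is one way to keep near-field collision avoidance from being drowned out by weaker far-field group influences.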
A Set of Control Points Conditioned Pedestrian Trajectory Prediction
Predicting the trajectories of pedestrians in crowded conditions is an important task for applications like autonomous navigation systems. Previous studies have tackled this problem using two strategies. They (1) infer all future steps recursively, or (2) predict the potential destinations of pedestrians at once and interpolate the intermediate steps to arrive there. However, these strategies often suffer from the accumulated errors of the recursive inference, or from restrictive assumptions about social relations along the intermediate path. In this paper, we present a graph convolutional network-based trajectory prediction method. First, we propose a control point prediction that divides the future path into three sections and infers the intermediate destinations of pedestrians to reduce the accumulated error. To do this, we construct multi-relational weighted graphs to account for the pedestrians' physical and complex social relations. We then introduce a trajectory refinement step based on a spatio-temporal and multi-relational graph. By considering the social interactions between neighbors, better prediction results are achievable. In experiments, the proposed network achieves state-of-the-art performance on various real-world trajectory prediction benchmarks.
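The control-point strategy of predicting a few section endpoints and filling in the steps between them can be sketched as follows (a minimal linear-interpolation illustration, not the paper's refinement network; names are mine):

```python
import numpy as np

def interpolate_through_controls(start, controls, steps_per_section):
    """Interpolate a full future path through predicted control points.

    start:    (2,) current position.
    controls: list of (2,) section endpoints, e.g. three control points
              dividing the future path into three sections.
    steps_per_section: number of timesteps to fill in within each section.
    Predicting a few anchors and interpolating avoids the compounding
    one-step errors of fully recursive inference.
    """
    path, prev = [], np.asarray(start, dtype=float)
    for cp, n in zip(controls, steps_per_section):
        cp = np.asarray(cp, dtype=float)
        for s in range(1, n + 1):
            path.append(prev + (cp - prev) * s / n)   # linear fill-in
        prev = cp
    return np.array(path)
```

A learned refinement step would then adjust these interpolated positions using the social-interaction graph rather than keeping each section strictly linear.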
Task-Specific Scene Structure Representations
Understanding the informative structures of scenes is essential for low-level vision tasks. Unfortunately, it is difficult to obtain a concrete visual definition of the informative structures because influences of visual features are task-specific. In this paper, we propose a single general neural network architecture for extracting task-specific structure guidance for scenes.
To do this, we first analyze traditional spectral clustering methods, which compute a set of eigenvectors to model a segmented graph forming small compact structures on image domains. We then unfold the traditional graph-partitioning problem into a learnable network, named Scene Structure Guidance Network (SSGNet), to represent the task-specific informative structures. SSGNet yields a set of coefficients of eigenvectors that produces explicit feature representations of image structures. In addition, SSGNet is lightweight (56K parameters) and can be used as a plug-and-play module for off-the-shelf architectures. We optimize SSGNet without any supervision by proposing two novel training losses that enforce task-specific scene structure generation during training. Our main contribution is to show that such a simple network can achieve state-of-the-art results for several low-level vision applications, including joint upsampling and image denoising. We also demonstrate that SSGNet generalizes well to unseen datasets, compared with existing methods that use structural embedding frameworks. Our source code is available at https://github.com/jsshin98/SSGNet.
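The spectral-clustering starting point can be illustrated with a tiny unnormalized graph-Laplacian example on a pixel affinity graph (illustrative only; SSGNet itself is a learned network, and the function here is mine):

```python
import numpy as np

def structure_eigenvectors(img, k=2, sigma=0.1):
    """Return the k smallest Laplacian eigenvectors of a pixel affinity graph.

    A 4-neighbor graph is built with Gaussian intensity affinities; the
    low-frequency eigenvectors vary smoothly inside coherent regions and
    change across structural boundaries, which is the classical basis for
    spectral image segmentation.
    """
    H, W = img.shape
    n = H * W
    A = np.zeros((n, n))
    idx = lambda y, x: y * W + x
    for y in range(H):
        for x in range(W):
            for dy, dx in ((0, 1), (1, 0)):       # right and down neighbors
                yy, xx = y + dy, x + dx
                if yy < H and xx < W:
                    w = np.exp(-((img[y, x] - img[yy, xx]) ** 2) / sigma**2)
                    A[idx(y, x), idx(yy, xx)] = w
                    A[idx(yy, xx), idx(y, x)] = w
    L = np.diag(A.sum(1)) - A                     # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)                # ascending eigenvalues
    return vecs[:, :k].reshape(H, W, k)           # per-pixel k-dim embedding
```

SSGNet can be read as replacing this expensive, fixed eigendecomposition with a small network that predicts eigenvector coefficients directly, conditioned on the downstream task.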