Learning Depth from Focus in the Wild
For better photography, most recent commercial cameras, including smartphones,
have either adopted large-aperture lenses to collect more light or used a burst
mode to take multiple images within a short time. These interesting features
lead us to examine depth from focus/defocus.
In this work, we present a convolutional neural network-based method for depth
estimation from single focal stacks. Our method differs from relevant
state-of-the-art works in three unique ways. First, our method allows
depth maps to be inferred in an end-to-end manner even with image alignment.
Second, we propose a sharp region detection module to reduce blur ambiguities
arising from subtle focus changes and weakly textured regions. Third, we design an
effective downsampling module to ease the flow of focal information during
feature extraction. In addition, to improve the generalization of the proposed network, we
develop a simulator to realistically reproduce the features of commercial
cameras, such as changes in field of view, focal length and principal points.
By effectively incorporating these three unique features, our network
achieves the top rank in the DDFF 12-Scene benchmark on most metrics. We also
demonstrate the effectiveness of the proposed method on various quantitative
evaluations and real-world images taken from various off-the-shelf cameras
compared with state-of-the-art methods. Our source code is publicly available
at https://github.com/wcy199705/DfFintheWild
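The classical principle behind depth from focus is that, per pixel, the focal slice with the strongest local sharpness indexes the depth. A minimal sketch using a Laplacian focus measure (a toy baseline for intuition, not the paper's learned network; all names are illustrative):

```python
import numpy as np

def depth_from_focus(focal_stack, focus_distances):
    """Estimate a depth map from a focal stack by picking, per pixel,
    the slice with the strongest local contrast (a Laplacian focus
    measure). A classical baseline, not the paper's learned network."""
    focus_measure = []
    for img in focal_stack:
        # Discrete Laplacian as a simple sharpness measure.
        lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
               np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4 * img)
        focus_measure.append(np.abs(lap))
    # The sharpest slice at each pixel indexes its focus distance.
    best = np.argmax(np.stack(focus_measure), axis=0)
    return np.asarray(focus_distances)[best]
```

Such a per-pixel winner-take-all rule is exactly where the blur ambiguities mentioned above arise, which the sharp region detection module is designed to mitigate.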
EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting
Capturing high-dimensional social interactions and feasible futures is
essential for predicting trajectories. To address this complex nature, several
attempts have been devoted to reducing the dimensionality of the output
variables via parametric curve fitting such as the B\'ezier curve and B-spline
function. However, these functions, which originate in the computer graphics
field, are not well suited to capturing socially acceptable human dynamics. In
this paper, we present EigenTrajectory (ET), a trajectory prediction
approach that uses a novel trajectory descriptor to form a compact space,
referred to here as ET space, in place of Euclidean space, for representing
pedestrian movements. We first reduce the complexity of the trajectory
descriptor via a low-rank approximation. We transform the pedestrians' history
paths into our ET space, represented by spatio-temporal principal
components, and feed them into off-the-shelf trajectory forecasting models. The
inputs and outputs of the models as well as social interactions are all
gathered and aggregated in the corresponding ET space. Lastly, we
propose a trajectory anchor-based refinement method to cover all possible
futures in the proposed ET space. Extensive experiments demonstrate
that our EigenTrajectory predictor can significantly improve both the
prediction accuracy and reliability of existing trajectory forecasting models
on public benchmarks, indicating that the proposed descriptor is suited to
represent pedestrian behaviors. Code is publicly available at
https://github.com/inhwanbae/EigenTrajectory
Comment: Accepted at ICCV 2023
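The low-rank descriptor idea, representing each trajectory by its coefficients over a few spatio-temporal principal components, can be illustrated with a plain PCA sketch (hypothetical function names; not the authors' exact formulation):

```python
import numpy as np

def fit_trajectory_basis(trajectories, k):
    """Learn k spatio-temporal principal components from a set of paths
    and describe each path by its k coefficients.
    trajectories: (N, T, 2) array of N paths with T timesteps."""
    N, T, _ = trajectories.shape
    flat = trajectories.reshape(N, T * 2)
    mean = flat.mean(axis=0)
    # SVD of the centered data yields the principal components.
    _, _, Vt = np.linalg.svd(flat - mean, full_matrices=False)
    basis = Vt[:k]                       # (k, T*2) basis vectors
    coeffs = (flat - mean) @ basis.T     # (N, k) low-rank descriptors
    return basis, mean, coeffs

def reconstruct(basis, mean, coeffs):
    """Map low-rank descriptors back to trajectory coordinates."""
    flat = coeffs @ basis + mean
    return flat.reshape(len(coeffs), -1, 2)
```

With k = 4, straight-line paths are reconstructed exactly, since after centering they lie in a four-dimensional linear subspace; this is the sense in which pedestrian motion admits a compact low-rank description.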
Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction
Language models have demonstrated impressive ability in context understanding
and generative performance. Inspired by the recent success of language
foundation models, in this paper, we propose LMTraj (Language-based Multimodal
Trajectory predictor), which recasts the trajectory prediction task into a sort
of question-answering problem. Departing from traditional numerical regression
models, which treat the trajectory coordinate sequence as continuous signals,
we consider them as discrete signals, like text prompts. Specifically, we first
transform an input space for the trajectory coordinate into the natural
language space. Here, the entire time-series trajectories of pedestrians are
converted into a text prompt, and scene images are described as text
information through image captioning. The transformed numerical and image data
are then wrapped into the question-answering template for use in a language
model. Next, to guide the language model in understanding and reasoning
high-level knowledge, such as scene context and social relationships between
pedestrians, we introduce auxiliary multi-task question answering. We
then train a numerical tokenizer with the prompt data. We encourage the
tokenizer to separate the integer and decimal parts well, and leverage it to
capture correlations between the consecutive numbers in the language model.
Lastly, we train the language model using the numerical tokenizer and all of
the question-answer prompts. Here, we propose a beam-search-based most-likely
prediction and a temperature-based multimodal prediction to implement both
deterministic and stochastic inferences. Applying our LMTraj, we show that the
language-based model can be a powerful pedestrian trajectory predictor and
outperforms existing numerical-based predictors. Code is publicly
available at https://github.com/inhwanbae/LMTrajectory
Comment: Accepted at CVPR 2024
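The core recasting step, serializing coordinate sequences into a question-answering prompt, might look like the following sketch (the template and two-decimal formatting are assumptions for illustration, not LMTraj's actual prompt design):

```python
def trajectory_to_prompt(history):
    """Serialize observed pedestrian coordinates into a question-answering
    prompt so a language model can treat forecasting as text generation.
    The template and formatting here are illustrative assumptions."""
    coords = " ".join(f"({x:.2f}, {y:.2f})" for x, y in history)
    return ("Question: A pedestrian walked through the points "
            f"{coords}. Where will the pedestrian go next?\nAnswer:")
```

Keeping a fixed number of decimal places matters here: it lets a numerical tokenizer split integer and decimal parts consistently, which is the property the abstract's tokenizer training step encourages.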
SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model
There are five types of trajectory prediction tasks: deterministic,
stochastic, domain adaptation, momentary observation, and few-shot. These
associated tasks are defined by various factors, such as the length of input
paths, data split and pre-processing methods. Interestingly, even though they
commonly take sequential coordinates of observations as input and infer future
paths in the same coordinates as output, designing specialized architectures
for each task is still necessary. When a model built for one task is applied to
another, generality issues can lead to sub-optimal performance. In this paper, we propose SingularTrajectory,
a diffusion-based universal trajectory prediction framework to reduce the
performance gap across the five tasks. The core of SingularTrajectory is to
unify a variety of human dynamics representations on the associated tasks. To
do this, we first build a Singular space to project all types of motion
patterns from each task into one embedding space. We next propose an adaptive
anchor working in the Singular space. Unlike traditional fixed anchor methods
that sometimes yield unacceptable paths, our adaptive anchor corrects anchors
that are placed in a wrong location, based on a traversability map.
Finally, we adopt a diffusion-based predictor to further enhance the prototype
paths using a cascaded denoising process. Our unified framework ensures the
generality across various benchmark settings, such as input modality and
trajectory length. Extensive experiments on five public benchmarks demonstrate
that SingularTrajectory substantially outperforms existing models, highlighting
its effectiveness in estimating general dynamics of human movements. Code is
publicly available at https://github.com/inhwanbae/SingularTrajectory
Comment: Accepted at CVPR 2024
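The adaptive-anchor correction can be pictured as snapping invalid anchor endpoints to nearby traversable cells. A minimal grid-based sketch, assuming integer anchor coordinates and a boolean traversability map (both simplifications of the paper's setup):

```python
import numpy as np

def correct_anchors(anchors, traversable):
    """Snap anchor endpoints that fall on non-traversable cells to the
    nearest traversable cell. A toy stand-in for the adaptive-anchor
    idea; the grid snapping is an assumption of this sketch.
    anchors: (K, 2) integer grid endpoints; traversable: (H, W) bool."""
    free = np.argwhere(traversable)          # coordinates of free cells
    out = anchors.copy()
    for i, (r, c) in enumerate(anchors):
        if not traversable[r, c]:
            # Move to the closest traversable cell (squared L2 distance).
            d = np.sum((free - (r, c)) ** 2, axis=1)
            out[i] = free[np.argmin(d)]
    return out
```

In the full framework, the corrected anchors serve as prototype paths that the cascaded denoising process then refines.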
EPINET: A Fully-Convolutional Neural Network Using Epipolar Geometry for Depth from Light Field Images
Light field cameras capture both the spatial and the angular properties of
light rays in space. Due to this property, one can compute the depth from light
fields in uncontrolled lighting environments, which is a big advantage over
active sensing devices. Depth computed from light fields can be used for many
applications including 3D modelling and refocusing. However, light field images
from hand-held cameras have very narrow baselines and suffer from noise, making
depth estimation difficult. Many approaches have been proposed to overcome these
limitations for the light field depth estimation, but there is a clear
trade-off between the accuracy and the speed in these methods. In this paper,
we introduce a fast and accurate light field depth estimation method based on a
fully-convolutional neural network. Our network is designed by considering the
light field geometry and we also overcome the lack of training data by
proposing light field specific data augmentation methods. We achieved the top
rank in the HCI 4D Light Field Benchmark on most metrics, and we also
demonstrate the effectiveness of the proposed method on real-world light-field
images.
Comment: Accepted to CVPR 2018, total 10 pages
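The epipolar structure EPINET exploits, where a point at disparity d shifts by d times the angular offset between sub-aperture views, can be demonstrated with a brute-force shift-and-compare baseline (purely illustrative; the network learns this matching end-to-end):

```python
import numpy as np

def lf_disparity(views, offsets, candidates):
    """Brute-force disparity from light field sub-aperture images:
    shift each view toward the centre view by d * (angular offset) and
    keep, per pixel, the disparity with the lowest cross-view variance.
    views: list of (H, W) images; offsets: (du, dv) angular offsets
    from the centre view; candidates: iterable of disparity values."""
    candidates = list(candidates)
    costs = []
    for d in candidates:
        warped = [np.roll(v, (int(round(d * du)), int(round(d * dv))),
                          axis=(0, 1))
                  for v, (du, dv) in zip(views, offsets)]
        # Photo-consistency: the correct disparity aligns all views.
        costs.append(np.var(np.stack(warped), axis=0))
    best = np.argmin(np.stack(costs), axis=0)
    return np.asarray(candidates)[best]
```

With the narrow baselines of hand-held light field cameras, this variance cost becomes very flat across candidates, which is precisely the accuracy-versus-speed difficulty the learned approach addresses.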
High-fidelity 3D Human Digitization from Single 2K Resolution Images
High-quality 3D human body reconstruction requires high-fidelity and
large-scale training data and appropriate network design that effectively
exploits the high-resolution input images. To tackle these problems, we propose
a simple yet effective 3D human digitization method called 2K2K, which
constructs a large-scale 2K human dataset and infers 3D human models from 2K
resolution images. The proposed method separately recovers the global shape of
a human and its details. The low-resolution depth network predicts the global
structure from a low-resolution image, and the part-wise image-to-normal
network predicts the details of the 3D human body structure. The
high-resolution depth network merges the global 3D shape and the detailed
structures to infer the high-resolution front and back side depth maps.
Finally, an off-the-shelf mesh generator reconstructs the full 3D human model;
our code is available at https://github.com/SangHunHan92/2K2K. We also
provide 2,050 3D human models, including texture maps, 3D joints, and SMPL
parameters for research purposes. In experiments, we demonstrate competitive
performance over the recent works on various datasets.
Comment: code page: https://github.com/SangHunHan92/2K2K, accepted to CVPR 2023 (Highlight)
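The final merging step, fusing front and back side depth maps into one 3D model, can be pictured with an orthographic toy version that emits a point cloud per valid pixel (camera intrinsics and the mesh-generation stage are omitted; names are illustrative):

```python
import numpy as np

def depth_maps_to_points(front_depth, back_depth, mask):
    """Fuse front and back side depth maps into a single 3D point
    cloud, one point per valid pixel per side. An orthographic toy
    version of the final merging step; intrinsics are ignored.
    front_depth, back_depth: (H, W) floats; mask: (H, W) bool."""
    ys, xs = np.nonzero(mask)
    front = np.stack([xs, ys, front_depth[ys, xs]], axis=1)
    back = np.stack([xs, ys, back_depth[ys, xs]], axis=1)
    return np.concatenate([front, back], axis=0)
```

A mesh generator such as the off-the-shelf one mentioned in the abstract would then close the surface between the two depth layers.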
Facile and versatile ligand analysis method of colloidal quantum dot
Colloidal quantum dots (QDs) are highly attractive materials for various optoelectronic applications owing to their easy maneuverability, high functionality, wide applicability, and low cost of mass production. QDs usually consist of two components: the inorganic nano-crystalline particle and the organic ligands that passivate the surface of the inorganic particle. The organic component is also critical for tuning the electronic properties of QDs as well as for solubilizing QDs in various solvents. However, despite extensive effort to understand the chemistry of ligands, it has been challenging to develop an efficient and reliable method for identifying and quantifying ligands on the QD surface. Herein, we develop a novel method of analyzing ligands in a mild yet accurate fashion. We found that oxidizing agents, acting as a heterogeneous catalyst in a different phase from the QDs, can efficiently disrupt the interaction between the inorganic particle and the organic ligands, and that a subsequent simple phase-fractionation step can isolate the ligand-containing phase from the oxidizer-containing phase and the insoluble precipitates. Our analysis procedure minimizes the exposure of ligand molecules to oxidizing agents and prepares homogeneous samples that can be readily analyzed by diverse analytical techniques, such as nuclear magnetic resonance spectroscopy and gas-chromatography mass-spectrometry. © 2021, The Author(s).
Disentangled Multi-Relational Graph Convolutional Network for Pedestrian Trajectory Prediction
Pedestrian trajectory prediction is one of the important tasks required for autonomous navigation and social robots in human environments. Previous studies focused on estimating social forces among individual pedestrians. However, they did not consider the social forces of groups on pedestrians, which results in over-collision avoidance problems. To address this problem, we present a Disentangled Multi-Relational Graph Convolutional Network (DMRGCN) for socially entangled pedestrian trajectory prediction. We first introduce a novel disentangled multi-scale aggregation to better represent social interactions among pedestrians on a weighted graph. For the aggregation, we construct multi-relational weighted graphs based on distances and relative displacements among pedestrians. In the prediction step, we propose a global temporal aggregation to alleviate accumulated errors for pedestrians changing their directions. Finally, we apply DropEdge to our DMRGCN to avoid over-fitting on relatively small pedestrian trajectory datasets. Through the effective incorporation of these three parts within an end-to-end framework, DMRGCN achieves state-of-the-art performance on a variety of challenging trajectory prediction benchmarks.
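The multi-relational graph construction can be sketched by building one inverse-distance-weighted adjacency matrix per distance range (a simplified reading of the construction described above, not DMRGCN's exact formulation):

```python
import numpy as np

def build_relation_graphs(positions, distance_bins):
    """Build multi-relational weighted adjacency matrices from pedestrian
    positions at one timestep: one graph per distance range, with edges
    weighted by inverse distance. A simplified illustrative sketch.
    positions: (N, 2) floats; distance_bins: list of (lo, hi) ranges."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    graphs = []
    for lo, hi in distance_bins:
        # Keep pairs whose separation falls in this range; no self-loops.
        in_bin = (dist >= lo) & (dist < hi) & (dist > 0)
        adj = np.where(in_bin, 1.0 / np.maximum(dist, 1e-8), 0.0)
        graphs.append(adj)
    return graphs
```

Separating near and far interactions into distinct graphs is what lets a graph convolution weigh close-range collision avoidance differently from loose group-level influence.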