The Metaverse: Survey, Trends, Novel Pipeline Ecosystem & Future Directions
The Metaverse offers a second world beyond reality, where boundaries are
non-existent, and possibilities are endless through engagement and immersive
experiences using virtual reality (VR) technology. Many disciplines can
benefit from the advancement of the Metaverse when accurately developed,
including the fields of technology, gaming, education, art, and culture.
Nevertheless, developing the Metaverse environment to its full potential is an
ambiguous task that needs proper guidance and directions. Existing surveys on
the Metaverse focus only on a specific aspect and discipline of the Metaverse
and lack a holistic view of the entire process. To this end, a more holistic,
multi-disciplinary, in-depth, and academic and industry-oriented review is
required to provide a thorough study of the Metaverse development pipeline. To
address these issues, we present in this survey a novel multi-layered pipeline
ecosystem composed of (1) the Metaverse computing, networking, communications
and hardware infrastructure, (2) environment digitization, and (3) user
interactions. For every layer, we discuss the components that detail the steps
of its development. Also, for each of these components, we examine the impact
of a set of enabling technologies and empowering domains (e.g., Artificial
Intelligence, Security & Privacy, Blockchain, Business, Ethics, and Social) on
its advancement. In addition, we explain the importance of these technologies
to support decentralization, interoperability, user experiences, interactions,
and monetization. Our presented study highlights the existing challenges for
each component, followed by research directions and potential solutions. To the
best of our knowledge, this survey is the most comprehensive to date and allows
users, scholars, and entrepreneurs to gain an in-depth understanding of the
Metaverse ecosystem and to find opportunities for contribution.
NF-Atlas: Multi-Volume Neural Feature Fields for Large Scale LiDAR Mapping
LiDAR Mapping has been a long-standing problem in robotics. Recent progress
in neural implicit representation has brought new opportunities to robotic
mapping. In this paper, we propose the multi-volume neural feature fields,
called NF-Atlas, which bridge the neural feature volumes with pose graph
optimization. By regarding the neural feature volume as pose graph nodes and
the relative pose between volumes as pose graph edges, the entire neural
feature field becomes both locally rigid and globally elastic. Locally, the
neural feature volume employs a sparse feature Octree and a small MLP to encode
the submap SDF with an option of semantics. Learning the map using this
structure allows for end-to-end solving of maximum a posteriori (MAP) based
probabilistic mapping. Globally, the map is built volume by volume
independently, avoiding catastrophic forgetting when mapping incrementally.
Furthermore, when a loop closure occurs, with the elastic pose graph based
representation, only updating the origin of neural volumes is required without
remapping. Finally, these functionalities of NF-Atlas are validated. Thanks to
the sparsity and the optimization based formulation, NF-Atlas shows competitive
performance in terms of accuracy, efficiency and memory usage on both
simulated and real-world datasets.
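The loop-closure behaviour described in the abstract above (only the origins of the volumes are updated, with no remapping) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes 2D poses `(x, y, theta)` instead of full 3D poses, replaces the neural feature volumes with plain lists of local points, and uses hypothetical names (`to_world`, `world_map`, `volumes`).

```python
import math

def to_world(pose, point):
    """Map a 2D point from a volume's local frame into the world frame."""
    x, y, theta = pose
    px, py = point
    c, s = math.cos(theta), math.sin(theta)
    return (x + c * px - s * py, y + s * px + c * py)

def world_map(vols):
    """Assemble the global map by placing every volume at its current origin."""
    return {
        name: [to_world(v["pose"], p) for p in v["local_points"]]
        for name, v in vols.items()
    }

# Each volume (a pose graph node) keeps its content in local coordinates
# together with a single origin pose. All values here are made up.
volumes = {
    "A": {"pose": (0.0, 0.0, 0.0), "local_points": [(1.0, 0.0), (2.0, 0.0)]},
    "B": {"pose": (5.0, 0.0, 0.0), "local_points": [(0.0, 1.0)]},
}

before = world_map(volumes)

# Hypothetical loop-closure correction: only the origin of volume B changes.
# The locally stored content of B is left untouched (no remapping).
volumes["B"]["pose"] = (5.0, 0.5, math.pi / 2)

after = world_map(volumes)
```

In this sketch the world-frame position of volume B's content changes while its stored local content does not, which is the sense in which such a map is locally rigid and globally elastic.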
Corporate Social Responsibility: the institutionalization of ESG
Understanding the impact of Corporate Social Responsibility (CSR) on firm performance as it relates to industries reliant on technological innovation is a complex and perpetually evolving challenge. To thoroughly investigate this topic, this dissertation will adopt an economics-based structure to address three primary hypotheses. This structure allows each hypothesis to be essentially a standalone empirical paper, unified by an overall analysis of the nature of the impact that ESG has on firm performance. The first hypothesis explores whether the evolution of CSR into the modern, quantified iteration of ESG has led to the institutionalization and standardization of the CSR concept. The second hypothesis fills gaps in the existing literature testing the relationship between firm performance and ESG by finding that the relationship is significantly positive for long-term, strategic metrics (ROA and ROIC) and that there is no correlation for short-term metrics (ROE and ROS). Finally, the third hypothesis states that if a firm has a long-term strategic ESG plan, as proxied by the publication of CSR reports, then it is more resilient to damage from controversies. This is supported by the finding that pro-ESG firms consistently fared better than their counterparts in both financial and ESG performance, even in the event of a controversy. However, firms with consistent reporting are also held to a higher standard than their nonreporting peers, suggesting a higher-risk, higher-reward dynamic. These findings support the theory of good management, in that long-term strategic planning is both immediately economically beneficial and serves as a means of risk management and social impact mitigation. Overall, this contributes to the literature by filling gaps in the nature of the impact that ESG has on firm performance, particularly from a management perspective.
Transverse Velocity Field Measurement in High-Resolution Solar Images Based on Deep Learning
To address the problem of the low accuracy of transverse velocity field
measurements for small targets in high-resolution solar images, we propose a
novel velocity field measurement method for high-resolution solar images based
on PWCNet. This method transforms the transverse velocity field measurement
into an optical flow field prediction problem. We evaluated the performance of
the proposed method using the Hα and TiO datasets obtained from New Vacuum
Solar Telescope (NVST) observations. The experimental results show that our
method predicts the optical flow of small targets in images more effectively
than several typical machine- and deep-learning methods. On the Hα
dataset, the proposed method improves the image structure similarity from
0.9182 to 0.9587 and reduces the mean of residuals from 24.9931 to 15.2818; on
the TiO dataset, the proposed method improves the image structure similarity
from 0.9289 to 0.9628 and reduces the mean of residuals from 25.9908 to
17.0194. The optical flow predicted using the proposed method can provide
accurate data for the atmospheric motion information of solar images. The code
implementing the proposed method is available at
https://github.com/lygmsy123/transverse-velocity-field-measurement.
Comment: 14 pages, 10 figures, 4 tables. Accepted for publication in Research
in Astronomy and Astrophysics
Accurate and Interpretable Solution of the Inverse Rig for Realistic Blendshape Models with Quadratic Corrective Terms
We propose a new model-based algorithm solving the inverse rig problem in
facial animation retargeting, exhibiting higher accuracy of the fit and
sparser, more interpretable weight vector compared to SOTA. The proposed method
targets a specific subdomain of human face animation - highly-realistic
blendshape models used in the production of movies and video games. In this
paper, we formulate an optimization problem that takes into account all the
requirements of targeted models. Our objective goes beyond a linear blendshape
model and employs the quadratic corrective terms necessary for correctly
fitting fine details of the mesh. We show that the solution to the proposed
problem yields highly accurate mesh reconstruction even when general-purpose
solvers, like SQP, are used. The results obtained using SQP are highly accurate
in the mesh space but do not exhibit favorable qualities in terms of weight
sparsity and smoothness, and for this reason, we further propose a novel
algorithm relying on a majorization-minimization (MM) technique. The algorithm is specifically suited for
solving the proposed objective, yielding a high-accuracy mesh fit while
respecting the constraints and producing a sparse and smooth set of weights
that is easy for artists to manipulate and interpret. Our algorithm is
benchmarked against SOTA approaches and shows overall superior results, yielding a
smooth animation reconstruction with a relative improvement up to 45 percent in
root mean squared mesh error while keeping the cardinality comparable with
benchmark methods. This paper gives a comprehensive set of evaluation metrics
that cover different aspects of the solution, including mesh accuracy, sparsity
of the weights, and smoothness of the animation curves, as well as the
appearance of the produced animation, which human experts evaluated.
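A blendshape model with quadratic corrective terms, as mentioned in the abstract above, is often written as the neutral mesh plus a weighted sum of blendshape offsets plus pairwise corrective offsets scaled by products of weights. The sketch below assumes that common form. It is not taken from the paper, the function names and toy data are hypothetical, and it only evaluates the forward rig and a root mean squared mesh error; it does not solve the inverse problem.

```python
import math

def blendshape_mesh(neutral, deltas, correctives, weights):
    """Evaluate a blendshape rig with pairwise quadratic corrective terms.

    neutral:     flat list of vertex coordinates of the neutral face
    deltas:      deltas[i] is the offset vector of blendshape i
    correctives: dict mapping an index pair (i, j) to a corrective offset
    weights:     activation weight of each blendshape
    """
    mesh = list(neutral)
    # Linear part: sum_i w_i * b_i
    for i, delta in enumerate(deltas):
        for k, d in enumerate(delta):
            mesh[k] += weights[i] * d
    # Quadratic part: sum_(i,j) w_i * w_j * b_ij
    for (i, j), corr in correctives.items():
        for k, c in enumerate(corr):
            mesh[k] += weights[i] * weights[j] * c
    return mesh

def rmse(a, b):
    """Root mean squared error between two flat coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Toy data, not from the paper: one vertex, two blendshapes, one corrective.
neutral = [0.0, 0.0, 0.0]
deltas = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
correctives = {(0, 1): [0.0, 0.0, 0.5]}

target = blendshape_mesh(neutral, deltas, correctives, [0.5, 1.0])
linear_only = blendshape_mesh(neutral, deltas, {}, [0.5, 1.0])
error_without_correctives = rmse(target, linear_only)
```

An inverse-rig solver would search for the weights that minimize such a mesh error for a given target mesh, typically with additional sparsity and smoothness terms.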
MaPLe: Multi-modal Prompt Learning
Pre-trained vision-language (V-L) models such as CLIP have shown excellent
generalization ability to downstream tasks. However, they are sensitive to the
choice of input text prompts and require careful selection of prompt templates
to perform well. Inspired by the Natural Language Processing (NLP) literature,
recent CLIP adaptation approaches learn prompts as the textual inputs to
fine-tune CLIP for downstream tasks. We note that using prompting to adapt
representations in a single branch of CLIP (language or vision) is sub-optimal
since it does not allow the flexibility to dynamically adjust both
representation spaces on a downstream task. In this work, we propose
Multi-modal Prompt Learning (MaPLe) for both vision and language branches to
improve alignment between the vision and language representations. Our design
promotes strong coupling between the vision-language prompts to ensure mutual
synergy and discourages learning independent uni-modal solutions. Further, we
learn separate prompts across different early stages to progressively model the
stage-wise feature relationships to allow rich context learning. We evaluate
the effectiveness of our approach on three representative tasks of
generalization to novel classes, new target datasets and unseen domain shifts.
Compared with the state-of-the-art method Co-CoOp, MaPLe exhibits favorable
performance and achieves an absolute gain of 3.45% on novel classes and 2.72%
on overall harmonic-mean, averaged over 11 diverse image recognition datasets.
Our code and pre-trained models are available at
https://github.com/muzairkhattak/multimodal-prompt-learning.
Comment: Accepted at CVPR202
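The "overall harmonic-mean" reported in the abstract above can be illustrated as follows, assuming it is the harmonic mean of the accuracy on base classes and the accuracy on novel classes, which is the usual convention in this line of work. The function name and the input values are hypothetical and are not results from the paper.

```python
def harmonic_mean(base_acc, novel_acc):
    """Harmonic mean of two accuracies; it is low if either accuracy is low."""
    if base_acc <= 0 or novel_acc <= 0:
        return 0.0
    return 2 * base_acc * novel_acc / (base_acc + novel_acc)

# Illustrative values only: a model that is strong on base classes but
# weaker on novel classes scores below the arithmetic mean of 70.0.
hm_unbalanced = harmonic_mean(80.0, 60.0)
hm_balanced = harmonic_mean(50.0, 50.0)
```

Because the harmonic mean penalizes imbalance, a gain on novel classes tends to raise it more than an equal gain on already strong base classes.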
Perceptual Requirements for World-Locked Rendering in AR and VR
Stereoscopic, head-tracked display systems can show users realistic,
world-locked virtual objects and environments. However, discrepancies between
the rendering pipeline and physical viewing conditions can lead to perceived
instability in the rendered content resulting in reduced realism, immersion,
and, potentially, visually-induced motion sickness. The requirements to achieve
perceptually stable world-locked rendering are unknown due to the challenge of
constructing a wide field of view, distortion-free display with highly accurate
head- and eye-tracking. In this work we introduce new hardware and software
built upon recently introduced hardware and present a system capable of
rendering virtual objects over real-world references without perceivable drift
under such constraints. The platform is used to study acceptable errors in
render camera position for world-locked rendering in augmented and virtual
reality scenarios, where we find an order of magnitude difference in perceptual
sensitivity between them. We conclude by comparing study results with an
analytic model which examines changes to apparent depth and visual heading in
response to camera displacement errors. We identify visual heading as an
important consideration for world-locked rendering alongside depth errors from
incorrect disparity.
Examples of works to practice staccato technique in clarinet instrument
The stages of strengthening the clarinet's staccato technique were applied through work on
musical pieces. Rhythm and nuance exercises that speed up staccato passages are included. The
most important aim of the study is not only staccato practice but also attention to the
precision of simultaneous finger-tongue coordination. To make staccato practice more efficient,
etude work is included within the work on the pieces. Careful attention to the exercises,
together with the inspiring effect of staccato practice, added a new dimension to musical
identity. Every stage of the eight original pieces is described. Each stage is designed to
strengthen the performance and technique of the next. The study reports the areas in which the
staccato technique is used and the results that were obtained. How the notes are shaped by
finger and tongue coordination, and the practice discipline within which this takes place, were
planned. Reed, note, diaphragm, finger, tongue, nuance, and discipline were found to form an
inseparable whole in staccato technique. A literature review was carried out to survey studies
on staccato. The review found that few pieces are used for staccato practice in clarinet
technique. The survey of method books found that etude studies are more common. Accordingly,
exercises for speeding up and strengthening the clarinet's staccato technique are presented. It
was observed that interspersing work on pieces among staccato etude practice relaxes the mind
and further increases motivation. Choosing the right reed for staccato practice is also
addressed. It was found that the right reed increases tongue speed, which is needed to practice
the staccato technique correctly. Choosing the right reed depends on how easily the reed
produces sound. It is emphasized that if the reed does not provide tonguing power, a more
suitable reed must be chosen. Interpreting a piece from beginning to end can be difficult in
staccato practice. In this respect, the study showed that following the given musical nuances
eases tonguing performance. Passing on the acquired knowledge and experience to future
generations in a way that fosters development is encouraged. How forthcoming pieces can be
worked out and how the staccato technique can be mastered are explained. The aim is to resolve
the staccato technique in a shorter time. It is as important to commit the exercises to memory
as it is to teach the fingers their positions. The work that emerges from the determination and
patience shown will raise success to even higher levels.
Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes
Humans have long been recorded in a variety of forms since antiquity. For
example, sculptures and paintings were the primary media for depicting human
beings before the invention of cameras. However, most current human-centric
computer vision tasks like human pose estimation and human image generation
focus exclusively on natural images in the real world. Artificial humans, such
as those in sculptures, paintings, and cartoons, are commonly neglected, making
existing models fail in these scenarios. As an abstraction of life, art
incorporates humans in both natural and artificial scenes. We take advantage of
it and introduce the Human-Art dataset to bridge related tasks in natural and
artificial scenarios. Specifically, Human-Art contains 50k high-quality images
with over 123k person instances from 5 natural and 15 artificial scenarios,
which are annotated with bounding boxes, keypoints, self-contact points, and
text information for humans represented in both 2D and 3D. It is, therefore,
comprehensive and versatile for various downstream tasks. We also provide a
rich set of baseline results and detailed analyses for related tasks, including
human detection, 2D and 3D human pose estimation, image generation, and motion
transfer. As a challenging dataset, we hope Human-Art can provide insights for
relevant research and open up new research questions.
Comment: CVPR202
DiffRF: Rendering-Guided 3D Radiance Field Diffusion
We introduce DiffRF, a novel approach for 3D radiance field synthesis based
on denoising diffusion probabilistic models. While existing diffusion-based
methods operate on images, latent codes, or point cloud data, we are the first
to directly generate volumetric radiance fields. To this end, we propose a 3D
denoising model which directly operates on an explicit voxel grid
representation. However, as radiance fields generated from a set of posed
images can be ambiguous and contain artifacts, obtaining ground truth radiance
field samples is non-trivial. We address this challenge by pairing the
denoising formulation with a rendering loss, enabling our model to learn a
deviated prior that favours good image quality instead of trying to replicate
fitting errors like floating artifacts. In contrast to 2D-diffusion models, our
model learns multi-view consistent priors, enabling free-view synthesis and
accurate shape generation. Compared to 3D GANs, our diffusion-based approach
naturally enables conditional generation such as masked completion or
single-view 3D synthesis at inference time.
Comment: Project page: https://sirwyver.github.io/DiffRF/ Video:
https://youtu.be/qETBcLu8SUk - CVPR 2023 Highlight - updated evaluations
after fixing initial data mapping error on all methods