9,330 research outputs found
The Metaverse: Survey, Trends, Novel Pipeline Ecosystem & Future Directions
The Metaverse offers a second world beyond reality, where boundaries are
non-existent, and possibilities are endless through engagement and immersive
experiences using the virtual reality (VR) technology. Many disciplines can
benefit from the advancement of the Metaverse when accurately developed,
including the fields of technology, gaming, education, art, and culture.
Nevertheless, developing the Metaverse environment to its full potential is an
ambiguous task that needs proper guidance and directions. Existing surveys on
the Metaverse focus only on a specific aspect and discipline of the Metaverse
and lack a holistic view of the entire process. To this end, a more holistic,
multi-disciplinary, in-depth, and academic and industry-oriented review is
required to provide a thorough study of the Metaverse development pipeline. To
address these issues, we present in this survey a novel multi-layered pipeline
ecosystem composed of (1) the Metaverse computing, networking, communications
and hardware infrastructure, (2) environment digitization, and (3) user
interactions. For every layer, we discuss the components that detail the steps
of its development. Also, for each of these components, we examine the impact
of a set of enabling technologies and empowering domains (e.g., Artificial
Intelligence, Security & Privacy, Blockchain, Business, Ethics, and Social) on
its advancement. In addition, we explain the importance of these technologies
to support decentralization, interoperability, user experiences, interactions,
and monetization. Our presented study highlights the existing challenges for
each component, followed by research directions and potential solutions. To the
best of our knowledge, this survey is the most comprehensive and allows users,
scholars, and entrepreneurs to get an in-depth understanding of the Metaverse
ecosystem to find their opportunities and potentials for contribution
Recommended from our members
Ensuring Access to Safe and Nutritious Food for All Through the Transformation of Food Systems
CoRe-Sleep: A Multimodal Fusion Framework for Time Series Robust to Imperfect Modalities
Sleep abnormalities can have severe health consequences. Automated sleep
staging, i.e. labelling the sequence of sleep stages from the patient's
physiological recordings, could simplify the diagnostic process. Previous work
on automated sleep staging has achieved great results, mainly relying on the
EEG signal. However, often multiple sources of information are available beyond
EEG. This can be particularly beneficial when the EEG recordings are noisy or
even missing completely. In this paper, we propose CoRe-Sleep, a Coordinated
Representation multimodal fusion network that is particularly focused on
improving the robustness of signal analysis on imperfect data. We demonstrate
how appropriately handling multimodal information can be the key to achieving
such robustness. CoRe-Sleep tolerates noisy or missing modalities segments,
allowing training on incomplete data. Additionally, it shows state-of-the-art
performance when testing on both multimodal and unimodal data using a single
model on SHHS-1, the largest publicly available study that includes sleep stage
labels. The results indicate that training the model on multimodal data does
positively influence performance when tested on unimodal data. This work aims
at bridging the gap between automated analysis tools and their clinical
utility.Comment: 10 pages, 4 figures, 2 tables, journa
Corporate Social Responsibility: the institutionalization of ESG
Understanding the impact of Corporate Social Responsibility (CSR) on firm performance as it relates to industries reliant on technological innovation is a complex and perpetually evolving challenge. To thoroughly investigate this topic, this dissertation will adopt an economics-based structure to address three primary hypotheses. This structure allows for each hypothesis to essentially be a standalone empirical paper, unified by an overall analysis of the nature of impact that ESG has on firm performance. The first hypothesis explores the evolution of CSR to the modern quantified iteration of ESG has led to the institutionalization and standardization of the CSR concept. The second hypothesis fills gaps in existing literature testing the relationship between firm performance and ESG by finding that the relationship is significantly positive in long-term, strategic metrics (ROA and ROIC) and that there is no correlation in short-term metrics (ROE and ROS). Finally, the third hypothesis states that if a firm has a long-term strategic ESG plan, as proxied by the publication of CSR reports, then it is more resilience to damage from controversies. This is supported by the finding that pro-ESG firms consistently fared better than their counterparts in both financial and ESG performance, even in the event of a controversy. However, firms with consistent reporting are also held to a higher standard than their nonreporting peers, suggesting a higher risk and higher reward dynamic. These findings support the theory of good management, in that long-term strategic planning is both immediately economically beneficial and serves as a means of risk management and social impact mitigation. Overall, this contributes to the literature by fillings gaps in the nature of impact that ESG has on firm performance, particularly from a management perspective
Copy-paste data augmentation for domain transfer on traffic signs
City streets carry a lot of information that can be exploited to improve the quality of the services the citizens receive. For example, autonomous vehicles need to act accordingly to all the element that are nearby the vehicle itself, like pedestrians, traffic signs and other vehicles. It is also possible to use such information for smart city applications, for example to predict and analyze the traffic or pedestrian flows.
Among all the objects that it is possible to find in a street, traffic signs are very important because of the information they carry. This information can in fact be exploited both for autonomous driving and for smart city applications. Deep learning and, more generally, machine learning models however need huge quantities to learn. Even though modern models are very good at gener- alizing, the more samples the model has, the better it can generalize between different samples.
Creating these datasets organically, namely with real pictures, is a very tedious task because of the wide variety of signs available in the whole world and especially because of all the possible light, orientation conditions and con- ditions in general in which they can appear. In addition to that, it may not be easy to collect enough samples for all the possible traffic signs available, cause some of them may be very rare to find.
Instead of collecting pictures manually, it is possible to exploit data aug- mentation techniques to create synthetic datasets containing the signs that are needed. Creating this data synthetically allows to control the distribution and the conditions of the signs in the datasets, improving the quality and quantity of training data that is going to be used. This thesis work is about using copy-paste data augmentation to create synthetic data for the traffic sign recognition task
Neural Architecture Search: Insights from 1000 Papers
In the past decade, advances in deep learning have resulted in breakthroughs
in a variety of areas, including computer vision, natural language
understanding, speech recognition, and reinforcement learning. Specialized,
high-performing neural architectures are crucial to the success of deep
learning in these areas. Neural architecture search (NAS), the process of
automating the design of neural architectures for a given task, is an
inevitable next step in automating machine learning and has already outpaced
the best human-designed architectures on many tasks. In the past few years,
research in NAS has been progressing rapidly, with over 1000 papers released
since 2020 (Deng and Lindauer, 2021). In this survey, we provide an organized
and comprehensive guide to neural architecture search. We give a taxonomy of
search spaces, algorithms, and speedup techniques, and we discuss resources
such as benchmarks, best practices, other surveys, and open-source libraries
Hardware Acceleration of Neural Graphics
Rendering and inverse-rendering algorithms that drive conventional computer
graphics have recently been superseded by neural representations (NR). NRs have
recently been used to learn the geometric and the material properties of the
scenes and use the information to synthesize photorealistic imagery, thereby
promising a replacement for traditional rendering algorithms with scalable
quality and predictable performance. In this work we ask the question: Does
neural graphics (NG) need hardware support? We studied representative NG
applications showing that, if we want to render 4k res. at 60FPS there is a gap
of 1.5X-55X in the desired performance on current GPUs. For AR/VR applications,
there is an even larger gap of 2-4 OOM between the desired performance and the
required system power. We identify that the input encoding and the MLP kernels
are the performance bottlenecks, consuming 72%,60% and 59% of application time
for multi res. hashgrid, multi res. densegrid and low res. densegrid encodings,
respectively. We propose a NG processing cluster, a scalable and flexible
hardware architecture that directly accelerates the input encoding and MLP
kernels through dedicated engines and supports a wide range of NG applications.
We also accelerate the rest of the kernels by fusing them together in Vulkan,
which leads to 9.94X kernel-level performance improvement compared to un-fused
implementation of the pre-processing and the post-processing kernels. Our
results show that, NGPC gives up to 58X end-to-end application-level
performance improvement, for multi res. hashgrid encoding on average across the
four NG applications, the performance benefits are 12X,20X,33X and 39X for the
scaling factor of 8,16,32 and 64, respectively. Our results show that with
multi res. hashgrid encoding, NGPC enables the rendering of 4k res. at 30FPS
for NeRF and 8k res. at 120FPS for all our other NG applications
Pretrained Embeddings for E-commerce Machine Learning: When it Fails and Why?
The use of pretrained embeddings has become widespread in modern e-commerce
machine learning (ML) systems. In practice, however, we have encountered
several key issues when using pretrained embedding in a real-world production
system, many of which cannot be fully explained by current knowledge.
Unfortunately, we find that there is a lack of a thorough understanding of how
pre-trained embeddings work, especially their intrinsic properties and
interactions with downstream tasks. Consequently, it becomes challenging to
make interactive and scalable decisions regarding the use of pre-trained
embeddings in practice.
Our investigation leads to two significant discoveries about using pretrained
embeddings in e-commerce applications. Firstly, we find that the design of the
pretraining and downstream models, particularly how they encode and decode
information via embedding vectors, can have a profound impact. Secondly, we
establish a principled perspective of pre-trained embeddings via the lens of
kernel analysis, which can be used to evaluate their predictability,
interactively and scalably. These findings help to address the practical
challenges we faced and offer valuable guidance for successful adoption of
pretrained embeddings in real-world production. Our conclusions are backed by
solid theoretical reasoning, benchmark experiments, as well as online testings
LMDA-Net:A lightweight multi-dimensional attention network for general EEG-based brain-computer interface paradigms and interpretability
EEG-based recognition of activities and states involves the use of prior
neuroscience knowledge to generate quantitative EEG features, which may limit
BCI performance. Although neural network-based methods can effectively extract
features, they often encounter issues such as poor generalization across
datasets, high predicting volatility, and low model interpretability. Hence, we
propose a novel lightweight multi-dimensional attention network, called
LMDA-Net. By incorporating two novel attention modules designed specifically
for EEG signals, the channel attention module and the depth attention module,
LMDA-Net can effectively integrate features from multiple dimensions, resulting
in improved classification performance across various BCI tasks. LMDA-Net was
evaluated on four high-impact public datasets, including motor imagery (MI) and
P300-Speller paradigms, and was compared with other representative models. The
experimental results demonstrate that LMDA-Net outperforms other representative
methods in terms of classification accuracy and predicting volatility,
achieving the highest accuracy in all datasets within 300 training epochs.
Ablation experiments further confirm the effectiveness of the channel attention
module and the depth attention module. To facilitate an in-depth understanding
of the features extracted by LMDA-Net, we propose class-specific neural network
feature interpretability algorithms that are suitable for event-related
potentials (ERPs) and event-related desynchronization/synchronization
(ERD/ERS). By mapping the output of the specific layer of LMDA-Net to the time
or spatial domain through class activation maps, the resulting feature
visualizations can provide interpretable analysis and establish connections
with EEG time-spatial analysis in neuroscience. In summary, LMDA-Net shows
great potential as a general online decoding model for various EEG tasks.Comment: 20 pages, 7 Figure
Procedure-Aware Pretraining for Instructional Video Understanding
Our goal is to learn a video representation that is useful for downstream
procedure understanding tasks in instructional videos. Due to the small amount
of available annotations, a key challenge in procedure understanding is to be
able to extract from unlabeled videos the procedural knowledge such as the
identity of the task (e.g., 'make latte'), its steps (e.g., 'pour milk'), or
the potential next steps given partial progress in its execution. Our main
insight is that instructional videos depict sequences of steps that repeat
between instances of the same or different tasks, and that this structure can
be well represented by a Procedural Knowledge Graph (PKG), where nodes are
discrete steps and edges connect steps that occur sequentially in the
instructional activities. This graph can then be used to generate pseudo labels
to train a video representation that encodes the procedural knowledge in a more
accessible form to generalize to multiple procedure understanding tasks. We
build a PKG by combining information from a text-based procedural knowledge
database and an unlabeled instructional video corpus and then use it to
generate training pseudo labels with four novel pre-training objectives. We
call this PKG-based pre-training procedure and the resulting model Paprika,
Procedure-Aware PRe-training for Instructional Knowledge Acquisition. We
evaluate Paprika on COIN and CrossTask for procedure understanding tasks such
as task recognition, step recognition, and step forecasting. Paprika yields a
video representation that improves over the state of the art: up to 11.23%
gains in accuracy in 12 evaluation settings. Implementation is available at
https://github.com/salesforce/paprika.Comment: CVPR 202
- …