The Metaverse: Survey, Trends, Novel Pipeline Ecosystem & Future Directions
The Metaverse offers a second world beyond reality, where boundaries are
non-existent and possibilities are endless, through engaging and immersive
experiences enabled by virtual reality (VR) technology. Many disciplines can
benefit from the advancement of the Metaverse when accurately developed,
including the fields of technology, gaming, education, art, and culture.
Nevertheless, developing the Metaverse environment to its full potential is an
ambiguous task that needs proper guidance and direction. Existing surveys on
the Metaverse each focus on a single aspect or discipline of the Metaverse
and lack a holistic view of the entire process. To this end, a more holistic,
multi-disciplinary, in-depth, and academic and industry-oriented review is
required to provide a thorough study of the Metaverse development pipeline. To
address these issues, we present in this survey a novel multi-layered pipeline
ecosystem composed of (1) the Metaverse computing, networking, communications
and hardware infrastructure, (2) environment digitization, and (3) user
interactions. For every layer, we discuss the components that detail the steps
of its development. Also, for each of these components, we examine the impact
of a set of enabling technologies and empowering domains (e.g., Artificial
Intelligence, Security & Privacy, Blockchain, Business, Ethics, and Social) on
its advancement. In addition, we explain the importance of these technologies
to support decentralization, interoperability, user experiences, interactions,
and monetization. Our presented study highlights the existing challenges for
each component, followed by research directions and potential solutions. To the
best of our knowledge, this survey is the most comprehensive one and allows users,
scholars, and entrepreneurs to gain an in-depth understanding of the Metaverse
ecosystem and to identify their opportunities and potential for contribution.
Audio-Visual Automatic Speech Recognition Towards Education for Disabilities
Education is a fundamental right that enriches everyone’s life. However, physically challenged people are often excluded from the general and advanced education systems. An Audio-Visual Automatic Speech Recognition (AV-ASR) based system can improve the education of physically challenged people by providing hands-free computing: they can communicate with the learning system through AV-ASR. However, tracking the lips accurately for the visual modality is challenging. This paper therefore uses appearance-based visual features together with a co-occurrence statistical measure for visual speech recognition. Local Binary Pattern-Three Orthogonal Planes (LBP-TOP) and the Grey-Level Co-occurrence Matrix (GLCM) are proposed for extracting visual speech information. The experimental results show that the proposed system achieves 76.60% accuracy for visual speech recognition and 96.00% accuracy for audio speech recognition.
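The GLCM named above is a standard texture descriptor. As a rough illustration only, the Python sketch below counts co-occurring gray levels at one pixel offset in an already-quantized grayscale patch and derives the Haralick contrast statistic from the normalized matrix. The offset, the number of gray levels, the function names, and the choice of contrast as the derived feature are assumptions, not the paper's settings, and LBP-TOP is not shown.

```python
# Generic gray-level co-occurrence matrix (GLCM) sketch; offsets, quantization
# and the derived feature are illustrative assumptions, not the paper's setup.

def glcm(image, dx, dy, levels, symmetric=False, normalize=True):
    """Count how often gray level i is followed by gray level j at offset (dx, dy).

    image: list of rows of ints, each value in range(levels).
    """
    P = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:  # skip pairs that fall off the image
                i, j = image[r][c], image[r2][c2]
                P[i][j] += 1
                if symmetric:
                    P[j][i] += 1
    if normalize:
        total = sum(sum(row) for row in P)
        if total > 0:
            P = [[v / total for v in row] for row in P]  # joint probabilities
    return P


def glcm_contrast(P):
    """Haralick contrast: sum over i, j of P[i][j] * (i - j)^2."""
    n = len(P)
    return sum(P[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


# Horizontal neighbours (dx=1, dy=0) in a tiny 2x3 patch with two gray levels.
patch = [[0, 0, 1],
         [0, 1, 1]]
print(glcm(patch, dx=1, dy=0, levels=2, normalize=False))
print(glcm_contrast(glcm(patch, dx=1, dy=0, levels=2)))
```

In a lip-reading setting, such statistics would typically be computed per mouth-region frame and concatenated into a feature vector; how the paper combines them with LBP-TOP is not specified here.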
Likelihood Asymptotics in Nonregular Settings: A Review with Emphasis on the Likelihood Ratio
This paper reviews the most common situations where one or more regularity
conditions which underlie classical likelihood-based parametric inference fail.
We identify three main classes of problems: boundary problems, indeterminate
parameter problems -- which include non-identifiable parameters and singular
information matrices -- and change-point problems. The review focuses on the
large-sample properties of the likelihood ratio statistic. We emphasize
analytical solutions and acknowledge software implementations where available.
We furthermore give summary insight into the possible tools to derive the
key results. Other approaches to hypothesis testing and connections to
estimation are listed in the annotated bibliography of the Supplementary
Material.
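A classical boundary problem, chosen here for illustration and not taken from the review, makes the failure of the usual chi-square limit concrete. For iid N(mu, 1) data, testing H0: mu = 0 against mu >= 0 puts the null value on the boundary of the parameter space. The likelihood ratio statistic is then n * max(xbar, 0)^2, whose null distribution is a 50:50 mixture of a point mass at 0 and a chi-square with one degree of freedom, so the naive chi-square(1) p-value is twice too large. The function names are hypothetical.

```python
import math


def lrt_boundary_normal_mean(xs):
    """LRT statistic for H0: mu = 0 vs H1: mu >= 0, with iid N(mu, 1) data."""
    n = len(xs)
    xbar = sum(xs) / n
    mu_hat = max(xbar, 0.0)  # MLE restricted to the parameter space [0, inf)
    # 2 * (loglik(mu_hat) - loglik(0)) simplifies to n * mu_hat^2 for unit variance
    return n * mu_hat ** 2


def chibar_pvalue(t):
    """p-value under the mixture 0.5 * (point mass at 0) + 0.5 * chi^2_1."""
    if t <= 0:
        return 1.0
    # P(chi^2_1 > t) = erfc(sqrt(t / 2)); only half the mass lies above 0
    return 0.5 * math.erfc(math.sqrt(t / 2.0))


def naive_chi2_1_pvalue(t):
    """p-value that wrongly assumes the regular chi^2_1 limit."""
    return math.erfc(math.sqrt(max(t, 0.0) / 2.0))


t = lrt_boundary_normal_mean([0.9, 1.2, 0.8, 1.1])
print(t, chibar_pvalue(t), naive_chi2_1_pvalue(t))
```

In this normal-mean case the mixture distribution is exact for every n. In other boundary problems it typically holds only asymptotically, and the mixing weights can differ from 50:50.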
Offline and Online Models for Learning Pairwise Relations in Data
Pairwise relations between data points are essential for numerous machine learning algorithms. Many representation learning methods consider pairwise relations to identify the latent features and patterns in the data. This thesis investigates the learning of pairwise relations from two different perspectives: offline learning and online learning.
The first part of the thesis focuses on offline learning. It starts with an investigation of the performance modeling of a synchronization method in concurrent programming, using a Markov chain whose state transition matrix models pairwise relations between the cores involved in a computer process. The thesis then focuses on a particular pairwise distance measure, the minimax distance, and explores memory-efficient approaches to computing it by proposing a hierarchical representation of the data whose memory requirement is linear in the number of data points and from which the exact pairwise minimax distances can be derived in a memory-efficient manner. Next, a memory-efficient sampling method is proposed that follows this hierarchical representation and samples the data points such that the minimax distances between all data points are maximally preserved. Finally, the thesis proposes a practical non-parametric clustering of vehicle motion trajectories to annotate traffic scenarios based on transitive relations between trajectories in an embedded space.
The second part of the thesis takes an online learning perspective. It starts by presenting an online learning method for identifying bottlenecks in a road network by extracting the minimax path, where bottlenecks are the road segments with the highest cost, e.g., in terms of travel time. Inspired by real-world road networks, the thesis assumes a stochastic traffic environment in which the road-specific probability distribution of travel time is unknown. The parameters of this distribution must therefore be learned from observations, which is done by modeling the bottleneck identification task as a combinatorial semi-bandit problem. The proposed approach takes prior knowledge into account and follows a Bayesian approach to update the parameters. Moreover, the thesis develops a combinatorial variant of Thompson Sampling and derives an upper bound on the corresponding Bayesian regret. Furthermore, it proposes an approximate algorithm to address the associated computational intractability. Finally, the thesis considers contextual information about road network segments by extending the proposed model to a contextual combinatorial semi-bandit framework, and investigates and develops various algorithms for this contextual combinatorial setting.
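The minimax distance between two nodes of a weighted graph is the smallest, over all connecting paths, of the largest edge weight on the path. The path attaining it is the minimax path, and its heaviest edge is the bottleneck. A standard way to obtain all pairwise values is Kruskal-style merging: process edges in increasing weight, and when an edge of weight w first joins two components, every pair split across them has minimax distance w. The Python sketch below stores a dense N x N matrix for clarity, so it is the memory-hungry baseline rather than the thesis's linear-memory representation. The sequence of N - 1 merges with their weights is, however, a tree-shaped summary from which any pairwise value can be read off. The function name and the (weight, u, v) edge format are assumptions.

```python
def minimax_distances(n, edges):
    """All-pairs minimax distances for a connected undirected graph.

    n: number of nodes, labelled 0..n-1.
    edges: list of (weight, u, v) tuples.
    """
    inf = float("inf")
    D = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    root = list(range(n))                   # component id of every node
    members = {i: [i] for i in range(n)}    # component id -> its nodes
    for w, u, v in sorted(edges):           # increasing edge weight
        ru, rv = root[u], root[v]
        if ru == rv:
            continue                        # edge inside a component: not on the MST
        # This edge is the first link between the two components, so it is the
        # bottleneck of the best path for every cross-component pair.
        for a in members[ru]:
            for b in members[rv]:
                D[a][b] = D[b][a] = w
        if len(members[ru]) < len(members[rv]):
            ru, rv = rv, ru                 # relabel the smaller component
        for b in members[rv]:
            root[b] = ru
        members[ru].extend(members.pop(rv))
    return D


# Path 0-1-2 has bottleneck 5, which beats the direct edge 0-2 of cost 10.
D = minimax_distances(3, [(1.0, 0, 1), (5.0, 1, 2), (10.0, 0, 2)])
print(D[0][2])
```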
Open Set Classification of GAN-based Image Manipulations via a ViT-based Hybrid Architecture
Classification of AI-manipulated content, that is, distinguishing between different
types of manipulation, is receiving great attention. Most of the methods developed
so far fail in the open-set scenario, that is, when the algorithm used for the
manipulation is not represented in the training set. In this paper, we focus on
the classification of synthetic face generation and manipulation in open-set
scenarios, and propose a method for classification with a rejection option. The
proposed method combines the use of Vision Transformers (ViT) with a hybrid
approach for simultaneous classification and localization. Feature map
correlation is exploited by the ViT module, while a localization branch is
employed as an attention mechanism to force the model to learn per-class
discriminative features associated with the forgery when the manipulation is
performed locally in the image. Rejection is performed by considering several
strategies and analyzing the model output layers. The effectiveness of the
proposed method is assessed for the task of classification of facial attribute
editing and GAN attribution.
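The abstract says rejection strategies are derived from the model output layers but does not say which ones. One common and simple strategy, shown here as an assumption rather than as the paper's method, thresholds the maximum softmax probability. If the most likely known class is not confident enough, the sample is rejected as coming from an unseen manipulation. The threshold value and the function names are hypothetical; in practice the threshold would be tuned on validation data.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)                      # subtract the max to avoid overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_rejection(logits, threshold):
    """Return the predicted class index, or -1 to reject the sample as unknown."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else -1


print(classify_with_rejection([10.0, 0.0, 0.0], threshold=0.5))  # confident: accepted
print(classify_with_rejection([0.1, 0.0, 0.0], threshold=0.5))   # ambiguous: rejected
```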
Four Lectures on the Random Field Ising Model, Parisi-Sourlas Supersymmetry, and Dimensional Reduction
Numerical evidence suggests that the Random Field Ising Model loses
Parisi-Sourlas SUSY and the dimensional reduction property somewhere between 4
and 5 dimensions, while a related model of branched polymers retains these
features in any dimension. These notes give a leisurely introduction to a recent
theory, developed jointly with A. Kaviraj and E. Trevisani, which aims to
explain these facts. Based on the lectures given in Cortona and at the IHES in
2022.
Comment: 55 pages, 11 figures; v2: minor changes, mentioned forthcoming work
by Fytas et al.
Eigen-Factors an Alternating Optimization for Back-end Plane SLAM of 3D Point Clouds
Modern depth sensors can generate a huge number of 3D points in a few seconds,
to be later processed by localization and mapping algorithms. Ideally, these
algorithms should efficiently handle large point clouds, under the
assumption that using more points implies more available information. Eigen
Factors (EF) is a new algorithm that solves SLAM by using planes as the main
geometric primitive. To do so, EF exhaustively calculates the error of all
points at complexity O(1), thanks to the Summation matrix of
homogeneous points.
The solution of EF is highly efficient: i) the only state variables are the
sensor poses (the trajectory), while the plane parameters are estimated beforehand
in closed form, and ii) the EF alternating optimization uses a Newton-Raphson method
with a direct analytical calculation of the gradient and the Hessian, which turns
out to be a block diagonal matrix. Since we need to differentiate with respect to
eigenvalues and matrix elements, we have developed an intuitive methodology to
calculate partial derivatives in the manifold of rigid body transformations
SE(3), which could be applied to unrelated problems that require analytical
derivatives of a certain complexity.
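The Summation-matrix idea can be illustrated with a short NumPy sketch. It assumes that S is the sum of outer products of homogeneous points, which is our reading of the text and not the authors' code. Under that assumption, the best-fit plane and its total squared point-to-plane error follow in closed form from the smallest eigenvalue of the centered scatter matrix, and a rigid pose T updates S as T S T^T without revisiting the points. This is why evaluating a plane's error costs the same however many points were accumulated. The function names are hypothetical, and the authors' repository contains the actual implementation.

```python
import numpy as np


def summation_matrix(points):
    """S = sum_i q_i q_i^T with homogeneous q_i = [x_i, y_i, z_i, 1]; points is (N, 3)."""
    q = np.hstack([points, np.ones((points.shape[0], 1))])
    return q.T @ q  # 4x4 regardless of N


def plane_fit_from_S(S):
    """Closed-form plane (unit normal, offset d) and its sum of squared errors."""
    n = S[3, 3]                                # number of points
    mean = S[:3, 3] / n                        # centroid
    C = S[:3, :3] - n * np.outer(mean, mean)   # centered scatter matrix
    eigvals, eigvecs = np.linalg.eigh(C)       # eigenvalues in ascending order
    normal = eigvecs[:, 0]                     # direction of least spread
    d = -normal @ mean                         # plane passes through the centroid
    return eigvals[0], normal, d               # smallest eigenvalue = total squared error


def transform_S(S, T):
    """Summation matrix of the points after applying the 4x4 rigid transform T."""
    return T @ S @ T.T


rng = np.random.default_rng(0)
pts = rng.normal(size=(500, 3)) * np.array([2.0, 3.0, 0.01])  # noisy plane z = 0
err, normal, d = plane_fit_from_S(summation_matrix(pts))
print(err, normal, d)
```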
We evaluate EF and other state-of-the-art plane SLAM back-end algorithms in a
synthetic environment. The evaluation is extended to the ICL (RGB-D) and
KITTI (LiDAR) datasets. Code is publicly available at
https://github.com/prime-slam/EF-plane-SLAM
Model Diagnostics meets Forecast Evaluation: Goodness-of-Fit, Calibration, and Related Topics
Principled forecast evaluation and model diagnostics are vital in fitting probabilistic models and forecasting outcomes of interest. A common principle is that fitted or predicted distributions ought to be calibrated, ideally in the sense that the outcome is indistinguishable from a random draw from the posited distribution. Much of this thesis is centered on calibration properties of various types of forecasts.
In the first part of the thesis, a simple algorithm for exact multinomial goodness-of-fit tests is proposed. The algorithm computes exact p-values based on various test statistics, such as the log-likelihood ratio and Pearson's chi-square. A thorough analysis shows improvement over extant methods. However, the runtime of the algorithm grows exponentially in the number of categories, which limits its use.
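What an exact multinomial p-value means can be shown with a brute-force Python sketch. This is not the thesis's algorithm. It enumerates every possible outcome of n trials over k categories, C(n + k - 1, k - 1) of them, and sums the null probabilities of outcomes whose test statistic is at least as extreme as the observed one. The number of outcomes grows rapidly with k, which illustrates why exact tests become costly. The function names and the tie tolerance are assumptions, and all cell probabilities are assumed positive.

```python
import math
from itertools import combinations


def compositions(n, k):
    """All k-tuples of non-negative integers summing to n (stars and bars)."""
    for bars in combinations(range(n + k - 1), k - 1):
        parts, prev = [], -1
        for b in bars:
            parts.append(b - prev - 1)
            prev = b
        parts.append(n + k - 2 - prev)
        yield tuple(parts)


def log_pmf(x, p):
    """Multinomial log-probability of counts x under cell probabilities p (all > 0)."""
    n = sum(x)
    return (math.lgamma(n + 1) - sum(math.lgamma(xi + 1) for xi in x)
            + sum(xi * math.log(pi) for xi, pi in zip(x, p)))


def llr_stat(x, p):
    """Log-likelihood ratio statistic G = 2 * sum x_i * log(x_i / (n * p_i))."""
    n = sum(x)
    return 2.0 * sum(xi * math.log(xi / (n * pi)) for xi, pi in zip(x, p) if xi > 0)


def pearson_stat(x, p):
    """Pearson's chi-square statistic sum (x_i - n p_i)^2 / (n p_i)."""
    n = sum(x)
    return sum((xi - n * pi) ** 2 / (n * pi) for xi, pi in zip(x, p))


def exact_pvalue(x, p, stat=llr_stat, tol=1e-9):
    """P(stat(Y) >= stat(x)) under H0, by enumerating every possible outcome Y."""
    n, k = sum(x), len(x)
    observed = stat(x, p)
    return sum(math.exp(log_pmf(y, p))
               for y in compositions(n, k)
               if stat(y, p) >= observed - tol)  # tol absorbs floating-point ties


print(exact_pvalue((4, 0), (0.5, 0.5)))
print(exact_pvalue((3, 1, 0), (0.2, 0.3, 0.5), stat=pearson_stat))
```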
In the second part, a framework rooted in probability theory is developed, which gives rise to hierarchies of calibration and applies to both predictive distributions and stand-alone point forecasts. Based on a general notion of conditional T-calibration, the thesis introduces population versions of T-reliability diagrams and revisits a score decomposition into measures of miscalibration, discrimination, and uncertainty. Stable and efficient estimators of T-reliability diagrams and score components arise via nonparametric isotonic regression and the pool-adjacent-violators algorithm. For in-sample model diagnostics, a universal coefficient of determination is introduced that nests and reinterprets the classical R² in least squares regression.
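For the special case of probability forecasts of a binary event scored by the Brier score, the isotonic-regression route can be sketched in Python as follows. The forecasts are recalibrated by the pool-adjacent-violators (PAV) algorithm, with tied forecasts pooled first. Plotting forecast against recalibrated value gives a reliability diagram. The score components are then MCB = S(forecast) - S(recalibrated), DSC = S(climatology) - S(recalibrated), and UNC = S(climatology), so the mean score equals MCB - DSC + UNC. The restriction to binary outcomes and the Brier score, and the function names, are assumptions; the thesis treats general T-calibration.

```python
def pav(y, w):
    """Pool-adjacent-violators: weighted least-squares nondecreasing fit to y."""
    blocks = []  # each block: [weighted mean, total weight, number of entries]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # merge backwards while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


def brier_decomposition(x, y):
    """Mean Brier score of forecasts x for binary outcomes y, plus MCB, DSC, UNC."""
    n = len(x)
    levels = sorted(set(x))                    # pool tied forecast values
    sizes = [sum(1 for xi in x if xi == u) for u in levels]
    means = [sum(yi for xi, yi in zip(x, y) if xi == u) / s
             for u, s in zip(levels, sizes)]
    recal_of = dict(zip(levels, pav(means, sizes)))
    recal = [recal_of[xi] for xi in x]         # recalibrated (isotonic) forecasts

    def brier(f):
        return sum((fi - yi) ** 2 for fi, yi in zip(f, y)) / n

    ybar = sum(y) / n                          # climatological forecast
    score = brier(x)
    mcb = score - brier(recal)                 # miscalibration
    dsc = brier([ybar] * n) - brier(recal)     # discrimination
    unc = brier([ybar] * n)                    # uncertainty
    return score, mcb, dsc, unc


print(brier_decomposition([0.1, 0.4, 0.4, 0.8], [0, 1, 0, 1]))
```

Because the recalibrated forecast minimizes squared error over all nondecreasing functions of the forecast, which include both the identity and the constant climatology, MCB and DSC are non-negative.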
In the third part, probabilistic top lists are proposed as a novel type of prediction in classification, which bridges the gap between single-class predictions and predictive distributions. The probabilistic top list functional is elicited by strictly consistent evaluation metrics, based on symmetric proper scoring rules, which admit comparison of various types of predictions.
Visualisation of Fundamental Movement Skills (FMS): An Iterative Process Using an Overarm Throw
Fundamental Movement Skills (FMS) are precursor gross motor skills to more complex or specialised skills and are recognised as important indicators of physical competence, a key component of physical literacy. FMS are predominantly assessed using pre-defined manual methodologies, most commonly the various iterations of the Test of Gross Motor Development. However, such assessments are time-consuming and often require a basic level of training to conduct. Therefore, the overall aim of this thesis was to utilise accelerometry to develop a visualisation concept, as part of a feasibility study, to support the learning and assessment of FMS by reducing subjectivity and the overall time taken to conduct a gross motor skill assessment. The overarm throw, an important fundamental movement skill, was specifically selected for the visualisation development because it is an acyclic movement with a distinct initiation and conclusion. Thirteen children (14.8 ± 0.3 years; 9 boys) wore an ActiGraph GT9X Link Inertial Measurement Unit device on the dominant wrist whilst performing a series of overarm throws. This thesis illustrates how the visualisation concept was developed using raw accelerometer data, which were processed and manipulated using MATLAB 2019b software to obtain and depict key throw performance data, including the trajectory and velocity of the wrist during the throw. Overall, this thesis found that the developed visualisation concept can provide strong indicators of throw competency based on the shape of the throw trajectory. Future research should seek to utilise a larger, more diverse population and incorporate machine learning. Finally, further work is required to translate this concept to other gross motor skills.
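The abstract states that wrist velocity and trajectory were derived from raw accelerometer data in MATLAB 2019b but gives no processing details. As a hedged illustration of the general idea only, the Python sketch below performs naive double integration with the cumulative trapezoidal rule. It assumes the accelerations have already been rotated into a fixed frame and had gravity removed, which a raw wrist-worn IMU signal does not provide on its own. In practice such integration drifts quickly, so it only suits short, bounded movements like a single throw, usually with additional drift correction. The function names are hypothetical.

```python
def cumtrapz(values, dt, initial=0.0):
    """Cumulative trapezoidal integral of uniformly sampled values."""
    out = [initial]
    for a, b in zip(values[:-1], values[1:]):
        out.append(out[-1] + 0.5 * (a + b) * dt)
    return out


def integrate_acceleration(acc, dt):
    """Velocity and position from (ax, ay, az) samples in m/s^2.

    Assumes a fixed (world) frame with gravity already removed, and zero
    initial velocity and position.
    """
    axes = [list(axis) for axis in zip(*acc)]        # split into x, y, z series
    vel_axes = [cumtrapz(axis, dt) for axis in axes]
    pos_axes = [cumtrapz(v, dt) for v in vel_axes]
    return list(zip(*vel_axes)), list(zip(*pos_axes))


# Constant 1 m/s^2 along x for 1 s sampled at 10 Hz: v = t, x = t^2 / 2.
vel, pos = integrate_acceleration([(1.0, 0.0, 0.0)] * 11, dt=0.1)
print(vel[-1], pos[-1])
```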
Learning disentangled speech representations
A variety of informational factors are contained within the speech signal, and a single short recording of speech reveals much more than the spoken words. The best method to extract and represent informational factors from the speech signal ultimately depends on which informational factors are desired and how they will be used. In addition, some methods capture more than one informational factor at the same time, such as speaker identity, spoken content, and speaker prosody.
The goal of this dissertation is to explore different ways to deconstruct the speech signal into abstract representations that can be learned and later reused in various speech technology tasks. This task of deconstruction, also known as disentanglement, is a form of distributed representation learning. As a general approach to disentanglement, there are some guiding principles that specify what a learned representation should contain and how it should function. In particular, learned representations should contain all of the requisite information in a more compact manner, be interpretable, remove nuisance factors of irrelevant information, be useful in downstream tasks, and be independent of the task at hand. The learned representations should also be able to answer counterfactual questions.
In some cases, learned speech representations can be re-assembled in different ways according to the requirements of downstream applications. For example, in a voice conversion task, the speech content is retained while the speaker identity is changed. In a content-privacy task, some targeted content may be concealed without affecting how the surrounding words sound. While there is no single best method to disentangle all types of factors, some end-to-end approaches demonstrate a promising degree of generalization to diverse speech tasks.
This thesis explores a variety of use-cases for disentangled representations, including phone recognition, speaker diarization, linguistic code-switching, voice conversion, and content-based privacy masking. Speech representations can also be utilised for automatically assessing the quality and authenticity of speech, such as automatic MOS ratings or detecting deep fakes. The meaning of the term "disentanglement" is not well defined in previous work, and it has acquired several meanings depending on the domain (e.g. image vs. speech). Sometimes the term "disentanglement" is used interchangeably with the term "factorization". This thesis proposes that disentanglement of speech is distinct, and offers a viewpoint of disentanglement that can be considered both theoretically and practically.