Implementation of Digital Technologies on Beverage Fermentation
In the food and beverage industries, implementing novel methods based on digital technologies such as artificial intelligence (AI), sensors, robotics, computer vision, machine learning (ML), and sensory analysis using augmented reality (AR) has become critical to maintaining and increasing product quality traits and international competitiveness, especially within the past five years. Fermented beverages are among the most researched products for applying these technologies to assess composition and improve production processes and product quality. This Special Issue (SI) focuses on the latest research on the application of digital technologies to beverage fermentation monitoring and the improvement of processing performance, product quality, and sensory acceptability.
Multimodal Learning for Audio and Visual Processing
The world contains vast amounts of information which can be sensed and captured in a variety of ways and formats. Virtual environments also lend themselves to endless possibilities and diversity of data. Often our experiences draw from these separate but complementary parts, which can be combined to provide a comprehensive representation of events. Multimodal learning focuses on these kinds of combinations. By fusing multiple modalities, multimodal learning can improve results beyond individual-mode performance. However, many of today's state-of-the-art techniques in computer vision, robotics, and machine learning rely solely or primarily on visual inputs, even when the visual data is obtained from video where corresponding audio is readily available to augment learning. Vision-only approaches can struggle with highly reflective, transparent, or occluded objects and scenes, where audio, used alone or in conjunction with vision, may improve task performance. To address these challenges, this thesis explores coupling multimodal information to enhance task performance through learning-based methods for audio and visual processing using real and synthetic data. Physically-based graphics pipelines can naturally be extended for audio and visual synthetic data generation. To enhance the rigid-body sound synthesis pipeline for objects containing a liquid, I used an added-mass operator for fluid-structure coupling as a pre-processing step. My method is fast and practical for use in interactive 3D systems where live sound synthesis is desired. By fusing audio and visual data from real and synthetic videos, we also demonstrate enhanced processing and performance for object classification, tracking, and reconstruction tasks. As has been shown in visual question answering and other related work, multiple modalities can complement one another and outperform single-modality systems.
To the best of my knowledge, I introduced the first use of audio-visual neural networks to analyze liquid pouring sequences by classifying their weight, liquid, and receiving container. Prior work often required predefined source weights or visual data. My contribution was to use the sound from a pouring sequence (a liquid being poured into a target container) to train multimodal convolutional neural networks (CNNs) that fuse mel-scaled spectrograms as audio inputs with corresponding visual data based on video images. I described the first use of an audio-visual neural network for tracking tabletop-sized objects and enhancing visual object trackers. Like object detection of reflective surfaces, object trackers can also run into challenges when objects collide, occlude, appear similar, or come close to one another. By using the impact sounds of the objects during collision, my audio-visual object tracking (AVOT) neural network can correct trackers that drift from the objects they were assigned before collision. Reflective and textureless surfaces are not only difficult to detect and classify; they are also often poorly reconstructed and filled with depth discontinuities and holes. I proposed the first use of an audio-visual method that uses the reflections of sound to aid in geometry and audio reconstruction, referred to as "Echoreconstruction". The mobile phone prototype emits pulsed audio while recording video for RGB-based 3D reconstruction and audio-visual classification. Reflected sound and images from the video are input into our audio (EchoCNN-A) and audio-visual (EchoCNN-AV) convolutional neural networks for surface and sound source detection, depth estimation, and material classification. EchoCNN inferences from these classifications enhance 3D reconstructions of scenes containing open spaces and reflective surfaces by depth filtering, inpainting, and placement of unmixed sound sources in the scene.
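The audio-visual fusion described above can be illustrated with a minimal late-fusion sketch. This is not the thesis' actual network: the pooling "feature extractors" and the single linear classifier are stand-in assumptions for the CNN branches, showing only how a spectrogram embedding and a video embedding are concatenated before classification.

```python
import numpy as np

def audio_branch(spectrogram):
    # Hypothetical audio feature extractor: pool the mel-scaled
    # spectrogram over time to get a fixed-length embedding.
    return spectrogram.mean(axis=1)

def visual_branch(frames):
    # Hypothetical visual feature extractor: average frame pixels
    # per channel (a stand-in for learned CNN features).
    return frames.mean(axis=(0, 1, 2))

def late_fusion_logits(spectrogram, frames, w, b):
    # Concatenate the modality embeddings and apply a linear
    # classifier, mimicking the fusion layer of an audio-visual CNN.
    fused = np.concatenate([audio_branch(spectrogram), visual_branch(frames)])
    return fused @ w + b

rng = np.random.default_rng(0)
spec = rng.random((64, 100))       # 64 mel bands x 100 time frames
vid = rng.random((8, 32, 32, 3))   # 8 frames of 32x32 RGB video
w = rng.random((64 + 3, 5))        # 5 output classes (e.g. liquid types)
b = np.zeros(5)
logits = late_fusion_logits(spec, vid, w, b)
print(logits.shape)  # (5,)
```

In a trained system the pooling functions would be replaced by convolutional branches and the weights learned end-to-end; the fusion-by-concatenation step is the same.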
In addition to enhancing scene reconstructions, I proposed a multimodal single- and multi-frame LSTM autoencoder for 3D reconstruction using audio-visual inputs. Our neural network produces high-quality 3D reconstructions using a voxel representation. It is the first audio-visual reconstruction neural network for 3D geometry and material representation. Contributions of this thesis include new neural network designs, new enhancements to real and synthetic audio-visual datasets, and prototypes that demonstrate audio and audio-augmented performance for sound synthesis, inference, and reconstruction.
Doctor of Philosophy
Inferring Complex Activities for Context-aware Systems within Smart Environments
The rising ageing population worldwide and the prevalence of age-related conditions such as physical fragility, mental impairments, and chronic diseases have significantly impacted quality of life and caused a shortage of health and care services. Over-stretched healthcare providers are driving a paradigm shift in public healthcare provisioning. Thus, Ambient Assisted Living (AAL) using Smart Home (SH) technologies has been rigorously investigated to help address the aforementioned problems.
Human Activity Recognition (HAR) is a critical component in AAL systems which enables applications such as just-in-time assistance, behaviour analysis, anomaly detection, and emergency notifications. This thesis investigates the challenges faced in accurately recognising Activities of Daily Living (ADLs) performed by single or multiple inhabitants within smart environments. Specifically, it explores five complementary research challenges in HAR. The first study contributes to knowledge by developing a semantic-enabled data segmentation approach with user preferences. The second study takes the segmented sensor data and investigates recognising human ADLs at multiple granularities of action: coarse-grained and fine-grained. At the coarse-grained level, semantic relationships between sensors, objects, and ADLs are deduced, whereas at the fine-grained level, object usage above a satisfactory threshold, with evidence fused from multimodal sensor data, is leveraged to verify the intended actions. Moreover, to handle imprecise or vague interpretations of multimodal sensors and the challenges of data fusion, fuzzy set theory and the fuzzy Web Ontology Language (fuzzy-OWL) are leveraged. The third study focuses on incorporating uncertainty in HAR caused by factors such as technological failure, object malfunction, and human error. Existing uncertainty theories and approaches are analysed and, based on the findings, a probabilistic ontology (PR-OWL) based HAR approach is proposed. The fourth study extends the first three to distinguish activities conducted by more than one inhabitant in a shared smart environment, using discriminative sensor-based techniques and time-series pattern analysis. The final study investigates a suitable system architecture for a real-time smart environment tailored to AAL systems and proposes a microservices architecture with off-the-shelf and bespoke sensor-based sensing methods.
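The fine-grained verification step described above can be sketched with a tiny fuzzy-rule evaluation. The membership shapes, the two sensor signals, and the 0.6 firing threshold are illustrative assumptions, not the thesis' actual rule base; the sketch shows only how fuzzified evidence from two modalities is combined with a fuzzy AND (min) and compared against a threshold.

```python
def trapezoid(x, a, b, c, d):
    # Trapezoidal fuzzy membership: rises over [a, b], is 1 on
    # [b, c], and falls over [c, d]; 0 outside [a, d].
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def rule_fire(pressure, motion):
    # One illustrative fuzzy rule: the object is "in use" IF contact
    # pressure is high AND hand motion near the object is moderate.
    # Fuzzy AND is taken as the minimum of the memberships.
    mu = min(trapezoid(pressure, 0.3, 0.5, 1.0, 1.2),
             trapezoid(motion, 0.1, 0.2, 0.6, 0.8))
    return mu, mu >= 0.6  # assumed satisfaction threshold

mu, in_use = rule_fire(pressure=0.7, motion=0.4)
print(round(mu, 2), in_use)  # 1.0 True
```

A real deployment would hold many such rules (the evaluation below mentions 30 and 153), which is exactly where the rule-creation scalability problem noted in the results arises.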
The initial semantic-enabled data segmentation study achieved 100% and 97.8% accuracy in segmenting sensor events under single- and mixed-activity scenarios, respectively. However, the average classification time taken to segment each sensor event was 3971 ms and 62183 ms for the single- and mixed-activity scenarios, respectively. The second study, detecting fine-grained user actions, was evaluated with 30 and 153 fuzzy rules to detect two fine-grained movements, using a dataset pre-collected from the real-time smart environment. The results of the second study indicate good average accuracies of 83.33% and 100%, but with high average durations of 24648 ms and 105318 ms, posing further challenges for the scalability of fusion rule creation. The third study was evaluated by integrating the PR-OWL ontology with ADL ontologies and the Semantic Sensor Network (SSN) ontology to define four types of uncertainty present in a kitchen-based activity. The fourth study presented a case study extending single-user HAR to multi-user HAR by combining discriminative sensors (RFID tags and fingerprint sensors) to identify and associate user actions with the aid of time-series analysis. The last study responds to the computational and performance requirements of the four studies by analysing and proposing a microservices-based system architecture for AAL systems. Future research towards adopting fog/edge computing paradigms from cloud computing is discussed, targeting higher availability, reduced network traffic/energy, lower cost, and a decentralised system.
As a result of the five studies, this thesis develops a knowledge-driven framework to estimate and recognise multi-user activities at the level of fine-grained user actions. This framework integrates three complementary ontologies to conceptualise factual, fuzzy, and uncertain knowledge about the environment/ADLs, time-series analysis, and the discriminative sensing environment. Moreover, a distributed software architecture, multimodal sensor-based hardware prototypes, and other supporting utility tools, such as a simulator and a synthetic ADL data generator, were developed to support the evaluation of the proposed approaches. The distributed system is platform-independent and is currently supported by an Android mobile application and web-browser-based client interfaces for retrieving information such as live sensor events and HAR results.
Architectures for online simulation-based inference applied to robot motion planning
Robotic systems have enjoyed significant adoption in industrial and field applications
in structured environments, where clear specifications of the task and observations are
available. Deploying robots in unstructured and dynamic environments remains a
challenge, being addressed through emerging advances in machine learning. The key
open issues in this area include the difficulty of achieving coverage of all factors of
variation in the domain of interest, satisfying safety constraints, etc. One tool that has
played a crucial role in addressing these issues is simulation - which is used to generate
data, and sometimes as a world representation within the decision-making loop.
When physical simulation modules are used in this way, a number of computational
problems arise. Firstly, a suitable simulation representation and fidelity is required
for the specific task of interest. Secondly, we need to perform parameter inference of
physical variables being used in the simulation models. Thirdly, there is the need for
data assimilation, which must be achieved in real-time if the resulting model is to be
used within the online decision-making loop. These are the motivating problems for
this thesis.
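The second motivating problem, inferring the physical parameters used inside a simulation model, can be illustrated with a minimal simulation-based inference sketch. The toy free-fall simulator, the gravity prior, and the rejection tolerance are all assumptions for illustration (the thesis' actual models are far richer); the sketch shows only the core loop of Approximate Bayesian Computation: draw a parameter, simulate, keep it if the simulated output lies close to the observations.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(g, n=20):
    # Toy physical simulator: distance fallen under gravity g at fixed
    # times, with observation noise. A stand-in for the heavier
    # simulation modules discussed in the text.
    t = np.linspace(0.1, 2.0, n)
    return 0.5 * g * t**2 + rng.normal(0, 0.05, n)

observed = simulate(9.81)  # synthetic "sensor" data

def abc_rejection(observed, n_draws=5000, eps=1.0):
    # ABC rejection sampling: sample parameters from a broad prior,
    # run the simulator, and accept draws whose outputs land within
    # eps (Euclidean distance) of the observed data.
    accepted = []
    for _ in range(n_draws):
        g = rng.uniform(5.0, 15.0)  # assumed prior over gravity
        if np.linalg.norm(simulate(g) - observed) < eps:
            accepted.append(g)
    return np.array(accepted)

posterior = abc_rejection(observed)
print(len(posterior), round(posterior.mean(), 2))
```

The accepted draws approximate the posterior over the simulator parameter; the real-time data-assimilation requirement in the text is precisely what makes naive loops like this one too slow, motivating the amortised schemes developed later in the thesis.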
In the first section of the thesis, we tackle the inference problem with respect to
a fluid simulation model, where a sensorised UAV performs path planning with the
objective of acquiring data including gas concentration/identity and IMU-based wind
estimation readings. The task for the UAV is to localise the source of a gas leak, while
accommodating the subsequent dispersion of the gas in windy conditions. We present
a formulation of this problem that allows us to perform online and real-time active
inference efficiently through problem-specific simplifications.
In the second section of the thesis, we explore the problem of robot motion planning
when the true state is not fully observable, and actions influence how much of the
state is subsequently observed. This is motivated by the practical problem of a robot
performing suction in the surgical automation setting. The objective is the efficient
removal of liquid while respecting a safety constraint - to not touch the underlying
tissue if possible. If the problem were represented in full generality, as one of planning
under uncertainty and hidden state, it could be hard to find computationally efficient
solutions. Once again, we make problem-specific simplifications. Crucially, instead of
reasoning in general about fluid flows and arbitrary surfaces, we exploit the observations
that the decision can be informed by the contour tree skeleton of the volume, and the
configurations in which the fluid would come to rest if unperturbed. This allows us
to address the problem as one of iterative shortest path computation, whose costs are
informed by a model estimating the shape of the underlying surface.
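The iterative shortest-path formulation above can be sketched with a standard Dijkstra search over a cost grid. The grid, the 4-connectivity, and the cost values are illustrative assumptions; in the surgical setting the per-cell costs would come from the model estimating the underlying surface shape, penalising moves that risk touching tissue.

```python
import heapq

def shortest_path(cost, start, goal):
    # Dijkstra over a 4-connected grid; entering a cell adds that
    # cell's cost. Cells are (row, col) tuples.
    rows, cols = len(cost), len(cost[0])
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, node = heapq.heappop(pq)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = node
                    heapq.heappush(pq, (nd, (nr, nc)))
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1], dist[goal]

# High cost marks cells where the estimated surface is close to tissue.
grid = [[1, 1, 1],
        [9, 9, 1],
        [1, 1, 1]]
path, total = shortest_path(grid, (0, 0), (2, 0))
print(path, total)  # routes around the high-cost cells, total 6.0
```

In the iterative scheme described above, the costs would be re-estimated as liquid is removed and the search re-run, rather than solved once.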
In the third and final section of the thesis, we propose a model for real-time parameter
estimation directly from raw pixel observations. Through the use of a Variational
Recurrent Neural Network model, where the latent space is further structured by
penalising for fit to data from a physical simulation, we devise an efficient online
inference scheme. This is first shown in the context of a representative dynamic
manipulation task for a robot. This task involves reasoning about a bouncing ball that it
must catch, using as input the raw video from an environment-mounted camera and
accommodating noise and variations in the object and environmental conditions. We
then show that the same architecture lends itself to solving inference problems involving
more complex dynamics, by applying this to measurement inversion of ultrafast X-Ray
scattering data to infer molecular geometry.
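The role of the physical simulator in the bouncing-ball task can be illustrated with a minimal sketch: a toy restitution model scored against observed heights. The simulator, the candidate grid, and the noise level are assumptions for illustration; the thesis' VRNN instead learns an amortised online inference scheme, but the underlying idea, matching simulated dynamics to observations to recover a physical parameter, is the same.

```python
import numpy as np

def simulate_ball(e, h0=1.0, dt=0.01, steps=300, g=9.81):
    # Toy bouncing-ball simulator: vertical drop from height h0 with
    # coefficient of restitution e (velocity flips and shrinks by e
    # at each impact). Semi-implicit Euler integration.
    y, v, traj = h0, 0.0, []
    for _ in range(steps):
        v -= g * dt
        y += v * dt
        if y < 0.0:
            y, v = 0.0, -e * v
        traj.append(y)
    return np.array(traj)

# "Observed" heights, standing in for tracks extracted from raw video.
true_e = 0.8
observed = simulate_ball(true_e) + np.random.default_rng(2).normal(0, 0.005, 300)

# Infer restitution by scoring candidate simulations against the data.
candidates = np.linspace(0.5, 0.95, 46)
errors = [np.mean((simulate_ball(e) - observed) ** 2) for e in candidates]
best = candidates[int(np.argmin(errors))]
print(round(best, 2))  # 0.8
```

A grid search like this is far too slow for closed-loop catching; the VRNN approach described above amortises the inference so that parameter estimates are available in real time from the pixel stream.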