DILF: Differentiable Rendering-Based Multi-View Image-Language Fusion for Zero-Shot 3D Shape Understanding
Zero-shot 3D shape understanding aims to recognize "unseen" 3D categories that are not present in the training data. Recently, Contrastive Language-Image Pre-training (CLIP) has shown promising open-world performance in zero-shot 3D shape understanding tasks through information fusion between the language and 3D modalities. It first renders 3D objects into multiple 2D image views and then learns the semantic relationships between the textual descriptions and images, enabling the model to generalize to new and unseen categories. However, existing studies in zero-shot 3D shape understanding rely on predefined rendering parameters, resulting in repetitive, redundant, and low-quality views. This limitation hinders the model's ability to fully comprehend 3D shapes and adversely impacts the text-image fusion in a shared latent space. To this end, we propose a novel approach called Differentiable rendering-based multi-view Image-Language Fusion (DILF) for zero-shot 3D shape understanding. Specifically, DILF leverages large-scale language models (LLMs) to generate textual prompts enriched with 3D semantics and designs a differentiable renderer with learnable rendering parameters to produce representative multi-view images. These rendering parameters can be iteratively updated using a text-image fusion loss, which aids the regression of the parameters and allows the model to determine the optimal viewpoint positions for each 3D object. Then a group-view mechanism is introduced to model interdependencies across views, enabling efficient information fusion to achieve a more comprehensive 3D shape understanding. Experimental results demonstrate that DILF outperforms state-of-the-art methods for zero-shot 3D classification while maintaining competitive performance for standard 3D classification. The code is available at https://github.com/yuzaiyang123/DILP
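The zero-shot step described in this abstract (render several views, embed them, compare against text prompts) can be illustrated with a minimal sketch. It is not the DILF implementation: the toy three-dimensional vectors stand in for encoder outputs, mean pooling stands in for the group-view mechanism, and the names `mean_pool` and `zero_shot_classify` are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_pool(views):
    """Average per-view embeddings into one shape embedding (assumed fusion rule)."""
    n = len(views)
    return [sum(col) / n for col in zip(*views)]

def zero_shot_classify(view_embeddings, text_embeddings):
    """Pick the label whose text embedding has the highest cosine similarity."""
    shape = normalize(mean_pool([normalize(v) for v in view_embeddings]))
    scores = {
        label: sum(a * b for a, b in zip(shape, normalize(t)))
        for label, t in text_embeddings.items()
    }
    return max(scores, key=scores.get), scores

# Toy 3-d embeddings standing in for image/text encoder outputs (made-up numbers).
views = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [1.0, 0.0, 0.1]]
prompts = {"chair": [1.0, 0.0, 0.0], "lamp": [0.0, 1.0, 0.0]}
label, scores = zero_shot_classify(views, prompts)
print(label)
```

Because no label is seen during training, adding a new category only requires adding a new text prompt to `prompts`.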
Robustness, Heterogeneity and Structure Capturing for Graph Representation Learning and its Application
Graph neural networks (GNNs) are potent methods for graph representation learning (GRL), which extract knowledge from complicated (graph) structured data in various real-world scenarios. However, GRL still faces many challenges. Firstly, GNN-based node classification may deteriorate substantially by overlooking the possibility of noisy data in graph structures, as models wrongly process the relations among nodes in the input graphs as the ground truth. Secondly, nodes and edges have different types in the real world, and it is essential to capture this heterogeneity in graph representation learning. Next, relations among nodes are not restricted to pairwise relations, and it is necessary to capture the complex relations accordingly. Finally, the absence of structural encodings, such as positional information, deteriorates the performance of GNNs. This thesis proposes novel methods to address the aforementioned problems:
1. Bayesian Graph Attention Network (BGAT): Developed for situations with scarce data, this method addresses the influence of spurious edges. Incorporating Bayesian principles into the graph attention mechanism enhances robustness, leading to competitive performance against benchmarks (Chapter 3).
2. Neighbour Contrastive Heterogeneous Graph Attention Network (NC-HGAT): By enhancing a cutting-edge self-supervised heterogeneous graph neural network model (HGAT) with neighbour contrastive learning, this method addresses heterogeneity and uncertainty simultaneously. Extra attention to edge relations in heterogeneous graphs also aids in subsequent classification tasks (Chapter 4).
3. A novel ensemble learning framework is introduced for predicting stock price movements. It adeptly captures both group-level and pairwise relations, leading to notable advancements over the existing state-of-the-art. The integration of hypergraph and graph models, coupled with the utilisation of auxiliary data via GNNs before a recurrent neural network (RNN), provides a deeper understanding of long-term dependencies between similar entities in multivariate time series analysis (Chapter 5).
4. A novel framework for graph structure learning is introduced, segmenting graphs into distinct patches. By harnessing the capabilities of transformers and integrating other position encoding techniques, this approach robustly captures intricate structural information within a graph. This results in a more comprehensive understanding of its underlying patterns (Chapter 6).
Online semi-supervised learning in non-stationary environments
Existing Data Stream Mining (DSM) algorithms assume the availability of labelled and
balanced data, immediately or after some delay, to extract worthwhile knowledge from the
continuous and rapid data streams. However, in many real-world applications such as
Robotics, Weather Monitoring, Fraud Detection Systems, Cyber Security, and Computer
Network Traffic Flow, an enormous amount of high-speed data is generated by Internet of
Things sensors and real-time data on the Internet. Manual labelling of these data streams
is not practical due to time consumption and the need for domain expertise. Another
challenge is learning under Non-Stationary Environments (NSEs), which occurs due to
changes in the data distributions in a set of input variables and/or class labels. The problem
of Extreme Verification Latency (EVL) under NSEs is referred to as Initially Labelled Non-Stationary Environment (ILNSE). This is a challenging task because the learning algorithms
have no access to the true class labels directly when the concept evolves. Several approaches
exist that deal with NSE and EVL in isolation. However, few algorithms address both issues
simultaneously. This research responds directly to the ILNSE challenge by proposing two
novel algorithms: "Predictor for Streaming Data with Scarce Labels" (PSDSL) and the
Heterogeneous Dynamic Weighted Majority (HDWM) classifier. PSDSL is an Online Semi-Supervised Learning (OSSL) method for real-time DSM and is closely related to label
scarcity issues in online machine learning.
The key capabilities of PSDSL include learning from a small amount of labelled data in an
incremental or online manner and being available to predict at any time. To achieve this,
PSDSL utilises both labelled and unlabelled data to train the prediction models, meaning it
continuously learns from incoming data and updates the model as new labelled or
unlabelled data becomes available over time. Furthermore, it can predict under NSE
conditions under the scarcity of class labels. PSDSL is built on top of the HDWM classifier,
which preserves the diversity of the classifiers. PSDSL and HDWM can intelligently switch
and adapt to the conditions. PSDSL switches its learning state between self-learning,
micro-clustering and CGC, choosing whichever approach is beneficial given the characteristics of
the data stream. HDWM makes use of "seed" learners of different types in an ensemble to
maintain its diversity. The ensembles are simply the combination of predictive models
grouped to improve the predictive performance of a single classifier.
PSDSL is empirically evaluated against COMPOSE, LEVELIW, SCARGC and MClassification
on benchmarks, NSE datasets as well as Massive Online Analysis (MOA) data streams and real-world datasets. The results showed that PSDSL performed significantly better than
existing approaches on most real-time data streams including randomised data instances.
PSDSL performed significantly better than "Static", i.e. a classifier that is not updated after it is
trained with the first examples in the data streams. When applied to MOA-generated data
streams, PSDSL ranked highest (1.5) and thus performed significantly better than SCARGC,
while SCARGC performed the same as the Static. PSDSL achieved better average prediction
accuracies in a short time than SCARGC.
The HDWM algorithm is evaluated on artificial and real-world data streams against existing
well-known approaches such as the heterogeneous WMA and the homogeneous DWM
algorithm. The results showed that HDWM performed significantly better than WMA
and DWM. Also, when recurring concept drifts were present, the predictive performance of
HDWM showed an improvement over DWM. In both drift and real-world streams,
significance tests and post hoc comparisons found significant differences between
algorithms: HDWM performed significantly better than DWM and WMA when applied to
MOA data streams and four real-world datasets (Electric, Spam, Sensor and Forest Cover). The
seeding mechanism and dynamic inclusion of new base learners in the HDWM algorithm
benefit from both forgetting and retaining models. The algorithm also
provides the freedom to select the optimal base classifier for its ensemble depending
on the problem.
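The ensemble idea behind the weighted-majority family that HDWM belongs to can be sketched as follows. This is the classic multiplicative weight update only, under stated assumptions: the three rule-based "seed" learners, the penalty `beta=0.5` and the toy stream are all hypothetical, and the dynamic creation and removal of learners described in the abstract is left out.

```python
from collections import defaultdict

class WeightedMajority:
    """Weighted vote over heterogeneous experts; wrong experts are down-weighted."""

    def __init__(self, experts, beta=0.5):
        self.experts = list(experts)              # callables: x -> label
        self.weights = [1.0] * len(self.experts)
        self.beta = beta                          # multiplicative penalty in (0, 1)

    def predict(self, x):
        """Return the label with the largest total weight behind it."""
        votes = defaultdict(float)
        for expert, weight in zip(self.experts, self.weights):
            votes[expert(x)] += weight
        return max(votes, key=votes.get)

    def update(self, x, y_true):
        """Penalise every expert that mispredicts x, then rescale so the top weight is 1."""
        for i, expert in enumerate(self.experts):
            if expert(x) != y_true:
                self.weights[i] *= self.beta
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]

# Hypothetical seed learners of different types, reduced to fixed rules.
def low_threshold(x):
    return int(x > 0.5)

def high_threshold(x):
    return int(x > 0.75)

def always_zero(x):
    return 0

ensemble = WeightedMajority([low_threshold, high_threshold, always_zero])
stream = [(0.9, 1), (0.1, 0), (0.7, 1), (0.3, 0), (0.8, 1)]
for x, y in stream:
    ensemble.update(x, y)
print(ensemble.weights)
```

Learners that keep making mistakes lose influence on the vote without being retrained, which is what lets such an ensemble follow a drifting stream.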
A new approach, Envelope-Clustering, is introduced to resolve cluster overlap conflicts
during the cluster labelling process. In this process, PSDSL transforms the centroids'
information of micro-clusters into micro-instances and generates new clusters called
Envelopes. The nearest envelope clusters assist the conflicted micro-clusters and
successfully guide the cluster labelling process after concept drifts in the absence of true
class labels. PSDSL has been evaluated on the real-world problem of "keystroke dynamics", and
the results show that PSDSL achieved higher prediction accuracy (85.3%) than SCARGC
(81.6%), while the performance of Static (49.0%) degrades significantly due to changes in
the users' typing patterns. Furthermore, the predictive accuracy of SCARGC is found
to fluctuate widely (41.1% to 81.6%) depending on the value of the parameter "k"
(number of clusters), while PSDSL automatically determines the best value for this
parameter.
Protecting Privacy in Indian Schools: Regulating AI-based Technologies' Design, Development and Deployment
Education is one of the priority areas for the Indian government, where Artificial Intelligence (AI) technologies are touted to bring digital transformation. Several Indian states have also started deploying facial recognition-enabled CCTV cameras, emotion recognition technologies, fingerprint scanners, and radio frequency identification (RFID) tags in their schools, not only to provide personalised recommendations, ensure student security, and predict student drop-out rates, but also to provide 360-degree information about each student. Further, integrating Aadhaar (a digital identity card that works on biometric data) across AI technologies and learning management systems (LMS) renders schools a "panopticon".
Certain technologies or systems like Aadhaar, CCTV cameras, GPS systems, RFID tags, and learning management systems are used primarily for continuous data collection, storage, and retention purposes. Though they cannot be termed AI technologies per se, they are fundamental for designing and developing AI systems like facial, fingerprint, and emotion recognition technologies. The large amount of student data collected speedily through the former technologies is used to create algorithms for the latter AI systems. Once algorithms are processed using machine learning (ML) techniques, they learn correlations between multiple datasets, predicting each student's identity, decisions, grades, learning growth, tendency to drop out, and other behavioural characteristics. Such autonomous and repetitive collection, processing, storage, and retention of student data without effective data protection legislation endangers student privacy.
The algorithmic predictions by AI technologies are an avatar of the data fed into the system. An AI technology is only as good as the person collecting the data, processing it for a relevant and valuable output, and regularly evaluating the inputs going into an AI model. An AI model can produce inaccurate predictions if the person overlooks any relevant data. However, the belief of the state, school administrations and parents in AI technologies as a panacea for student security and educational development overlooks the context in which "data practices" are conducted. A right to privacy in an AI age is inextricably connected to data practices where data gets "cooked". Thus, data protection legislation operating without understanding and regulating such data practices will remain ineffective in safeguarding privacy.
The thesis undertakes interdisciplinary research that enables a better understanding of the interplay between the data practices of AI technologies and the social practices of an Indian school, which the present Indian data protection legislation overlooks, endangering students' privacy from the design and development to the deployment stages of an AI model. The thesis recommends that the Indian legislature frame better legislation equipped for the AI/ML age and advises the Indian judiciary on evaluating the legality and reasonability of designing, developing, and deploying such technologies in schools.
Reliable Sensor Intelligence in Resource Constrained and Unreliable Environment
The objective of this research is to design sensor intelligence that is reliable in a resource-constrained, unreliable environment. Various sources of variation and uncertainty are involved in intelligent sensor systems, so it is critical to build reliable sensor intelligence. Many prior works seek to design reliable sensor intelligence by developing robust and reliable tasks. This thesis suggests that, along with improving the task itself, early warning based on task reliability quantification can further improve sensor intelligence. A DNN-based early warning generator quantifies task reliability based on spatiotemporal characteristics of the input, and the early warning controls sensor parameters and avoids system failure. This thesis presents an early warning generator that predicts task failure due to sensor hardware-induced input corruption and controls the sensor operation. Moreover, a lightweight uncertainty estimator is presented to take account of DNN model uncertainty in task reliability quantification without the prohibitive computation of a stochastic DNN. Cross-layer uncertainty estimation is also discussed to consider the effect of PIM variations.
The Application of Data Analytics Technologies for the Predictive Maintenance of Industrial Facilities in Internet of Things (IoT) Environments
In industrial production environments, the maintenance of equipment has a decisive influence on costs and on the plannability of production capacities. In particular, unplanned failures during production times cause high costs, unplanned downtimes and possibly additional collateral damage. Predictive Maintenance starts here and tries to predict a possible failure and its cause so early that its prevention can be prepared and carried out in time. In order to be able to predict malfunctions and failures, the industrial plant with its characteristics, as well as wear and ageing processes, must be modelled. Such modelling can be done by replicating its physical properties. However, this is very complex and requires enormous expert knowledge about the plant and about wear and ageing processes of each individual component. Neural networks and machine learning make it possible to train such models using data and offer an alternative, especially when very complex and non-linear behaviour is evident.
In order for models to make predictions, as much data as possible about the condition of a plant and its environment and production planning data is needed. In Industrial Internet of Things (IIoT) environments, the amount of available data is constantly increasing. Intelligent sensors and highly interconnected production facilities produce a steady stream of data. The sheer volume of data, but also the steady stream in which data is transmitted, place high demands on the data processing systems. If a participating system wants to perform live analyses on the incoming data streams, it must be able to process the incoming data at least as fast as the continuous data stream delivers it. If this is not the case, the system falls further and further behind in processing and thus in its analyses. This also applies to Predictive Maintenance systems, especially if they use complex and computationally intensive machine learning models. If sufficiently scalable hardware resources are available, this may not be a problem at first. However, if this is not the case or if the processing takes place on decentralised units with limited hardware resources (e.g. edge devices), the runtime behaviour and resource requirements of the type of neural network used can become an important criterion.
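The throughput condition stated in this paragraph (a live analysis must process data at least as fast as the stream delivers it) can be made concrete with a little arithmetic. The sketch below assumes constant arrival and processing rates; the function names and the example figures are hypothetical and not taken from the thesis.

```python
def backlog_after(seconds, arrival_rate, service_rate, initial_backlog=0.0):
    """Unprocessed samples after `seconds`, for constant rates in samples/second."""
    growth = (arrival_rate - service_rate) * seconds
    return max(0.0, initial_backlog + growth)

def keeps_up(arrival_rate, service_rate):
    """A live analysis keeps pace only if it processes at least as fast as data arrives."""
    return service_rate >= arrival_rate

# Hypothetical edge device: 1,000 samples/s arrive, the model handles 800 samples/s.
lag = backlog_after(60, arrival_rate=1000, service_rate=800)
print(lag)
```

With these numbers the device is 12,000 samples behind after one minute, and the gap keeps growing linearly for as long as the rates stay the same.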
This thesis addresses Predictive Maintenance systems in IIoT environments using neural networks and Deep Learning, where the runtime behaviour and the resource requirements are relevant. The question is whether it is possible to achieve better runtimes with similar result quality using a new type of neural network. The focus is on reducing the complexity of the network and improving its parallelisability. Inspired by projects in which complexity was distributed to less complex neural subnetworks by upstream measures, two hypotheses presented in this thesis emerged: a) the distribution of complexity into simpler subnetworks leads to faster processing overall, despite the overhead this creates, and b) if a neural cell has a deeper internal structure, this leads to a less complex network. Within the framework of a qualitative study, an overall impression of Predictive Maintenance applications in IIoT environments using neural networks was developed. Based on the findings, a novel model layout was developed, named Sliced Long Short-Term Memory Neural Network (SlicedLSTM). The SlicedLSTM implements the assumptions made in the aforementioned hypotheses in its inner model architecture.
Within the framework of a quantitative study, the runtime behaviour of the SlicedLSTM was compared with that of a reference model in the form of laboratory tests. The study uses synthetically generated data from a NASA project to predict failures of modules of aircraft gas turbines. The dataset contains 1,414 multivariate time series with 104,897 samples of test data and 160,360 samples of training data.
As a result, it could be proven for the specific application and the data used that the SlicedLSTM delivers faster processing times with similar result accuracy and thus clearly outperforms the reference model in this respect. The hypotheses about the influence of complexity in the internal structure of the neural cells were confirmed by the study carried out in the context of this thesis.
Surface EMG-Based Inter-Session/Inter-Subject Gesture Recognition by Leveraging Lightweight All-ConvNet and Transfer Learning
Gesture recognition using low-resolution instantaneous HD-sEMG images opens
up new avenues for the development of more fluid and natural muscle-computer
interfaces. However, the data variability between inter-session and
inter-subject scenarios presents a great challenge. The existing approaches
employed very large and complex deep ConvNet or 2SRNN-based domain adaptation
methods to approximate the distribution shift caused by these inter-session and
inter-subject data variability. Hence, these methods also require learning over
millions of training parameters and a large pre-trained and target domain
dataset in both the pre-training and adaptation stages. As a result, they are bound to
high-end resources and computationally very expensive to deploy in
real-time applications. To overcome this problem, we propose a lightweight
All-ConvNet+TL model that leverages lightweight All-ConvNet and transfer
learning (TL) for the enhancement of inter-session and inter-subject gesture
recognition performance. The All-ConvNet+TL model consists solely of
convolutional layers, a simple yet efficient framework for learning invariant
and discriminative representations to address the distribution shifts caused by
inter-session and inter-subject data variability. Experiments on four datasets
demonstrate that our proposed methods outperform the most complex existing
approaches by a large margin and achieve state-of-the-art results on
inter-session and inter-subject scenarios and perform on par or competitively
on intra-session gesture recognition. These performance gaps increase even more
when a tiny amount (e.g., a single trial) of data is available on the target
domain for adaptation. These outstanding experimental results provide evidence
that the current state-of-the-art models may be overparameterized for
sEMG-based inter-session and inter-subject gesture recognition tasks.
Posthuman Creative Styling: can a creative writer's style of writing be described as procedural?
This thesis is about creative styling: the styling a creative writer might use to make their writing
unique. It addresses the question as to whether such styling can be described as procedural. Creative
styling is part of the technique a creative writer uses when writing. It is how they make the text more
"lively" by use of tips and tricks they have either learned or discovered. In essence these are rules, ones
the writer accrues over time by their practice. The thesis argues that the use and invention of these
rules can be set as procedures, and so describes creative styling as procedural.
The thesis follows from questioning why it is that machines or algorithms have, so far, been
incapable of producing creative writing which has value. Machine-written novels do not abound on
the bookshelves and writing styled by computers is, on the whole, dull in comparison to human-crafted
literature. It came about by thinking how it would be possible to reach a point where writing by people
and procedural writing are considered to have equal value. For this reason the thesis is set in a
posthuman context, where the differences between machines and people are erased.
The thesis uses practice to inform an original conceptual space model, based on quality dimensions
and dynamic inter-operation of spaces. This model gives an example of the procedures which a
posthuman creative writer uses when engaged in creative styling. It suggests an original formulation
for the conceptual blending of conceptual spaces, based on the casting of qualities from one space to
another. In support of and informing its arguments are ninety-nine examples of creative writing
practice which show the procedures by which style has been applied, created and assessed. It provides
a route forward for further joint research into both computational and human-coded creative writing.
Novel Neural Network Applications to Mode Choice in Transportation: Estimating Value of Travel Time and Modelling Psycho-Attitudinal Factors
Whenever researchers wish to study the behaviour of individuals choosing among a set of alternatives, they usually rely on models based on random utility theory, which postulates that individuals adjust their behaviour so as to maximise their utility. These models, often identified as discrete choice models (DCMs), usually require the definition of the utilities for each alternative, by first identifying the variables influencing the decisions. Traditionally, DCMs focused on observable variables and treated users as optimizing tools with predetermined needs. However, such an approach is in contrast with the results from studies in social sciences which show that choice behaviour can be influenced by psychological factors such as attitudes and preferences. Recently there have been formulations of DCMs which include latent constructs for capturing the impact of subjective factors. These are called hybrid choice models or integrated choice and latent variable (ICLV) models. However, DCMs are not free of issues, such as the fact that researchers have to choose the variables to include and their relations to define the utilities. This is probably one of the reasons which has recently led to an influx of numerous studies using machine learning (ML) methods to study mode choice, in which researchers tried to find alternative methods to analyse travellers' choice behaviour. An ML algorithm is any generic method that uses the data itself to understand and build a model, improving its performance the more it is allowed to learn. This means such algorithms do not require any a priori input or hypotheses on the structure and nature of the relationships between the variables used as their inputs.
ML models are usually considered black-box methods, but whenever researchers felt the need for interpretability of ML results, they tried to find alternative ways to use ML methods, like building them by using some a priori knowledge to induce specific constraints. Some researchers also transformed the outputs of ML algorithms so that they could be interpreted from an economic point of view, or built hybrid ML-DCM models. The aim of this thesis is to investigate the benefits and the disadvantages deriving from adopting either DCMs or ML methods to study the phenomenon of mode choice in transportation. The strongest feature of DCMs is the fact that they produce very precise and descriptive results, allowing for a thorough interpretation of their outputs. On the other hand, ML models offer a substantial benefit by being truly data-driven methods and thus learning most relations from the data itself. As a first contribution, we tested an alternative method for calculating the value of travel time (VTT) through the results of ML algorithms. VTT is a very informative parameter to consider, since the time consumed by individuals whenever they need to travel normally represents an undesirable factor, thus they are usually willing to exchange their money to reduce travel times. The method proposed is independent of the mode-choice functions, so it can be applied to econometric models and ML methods equally, if they allow the estimation of individual-level probabilities. Another contribution of this thesis is a neural network (NN) for the estimation of choice models with latent variables as an alternative to DCMs. This contribution arose from wanting to include in ML models not only level-of-service variables of the alternatives and socio-economic attributes of the individuals, but also psycho-attitudinal indicators, to better describe the influence of psychological factors on choice behaviour. The results were estimated by using two different datasets.
Since NN results are dependent on the values of their hyper-parameters and on their initialization, several NNs were estimated by using different hyper-parameters to find the optimal values, which were used to verify the stability of the results with different initializations.
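One common way to obtain a value of travel time from any model that outputs individual-level choice probabilities is the ratio of the marginal effects of time and cost on that probability. The sketch below illustrates this idea with finite differences; it is an assumption about how such a model-independent calculation could look, not the method of the thesis. The binary logit `car_probability` and its coefficients are made up, and any probability function, including a trained ML model, could be passed in its place.

```python
import math

def car_probability(time_min, cost, beta_time=-0.05, beta_cost=-0.20):
    """Binary logit probability of choosing car; coefficients are made-up values."""
    utility_car = beta_time * time_min + beta_cost * cost
    return 1.0 / (1.0 + math.exp(-utility_car))

def value_of_travel_time(prob_fn, time_min, cost, eps=1e-4):
    """VTT = (dP/dtime) / (dP/dcost), estimated with central finite differences."""
    dp_dtime = (prob_fn(time_min + eps, cost) - prob_fn(time_min - eps, cost)) / (2 * eps)
    dp_dcost = (prob_fn(time_min, cost + eps) - prob_fn(time_min, cost - eps)) / (2 * eps)
    return dp_dtime / dp_dcost

# Money units per minute for a hypothetical 30-minute trip costing 5 money units.
vtt = value_of_travel_time(car_probability, time_min=30.0, cost=5.0)
print(round(vtt, 4))
```

For a logit model with linear utility the result reduces to `beta_time / beta_cost`, here 0.25 money units per minute, which gives a simple check on the numerical estimate.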
Enhancing the forensic comparison process of common trace materials through the development of practical and systematic methods
Ongoing advancement in forensic trace evidence has driven the development of new and objective methods for comparing various materials. While many standard guides have been published for use in trace laboratories, several areas require a more comprehensive understanding of error rates and urgently need harmonized methods of examination and interpretation. Two critical areas are the forensic examination of physical fits and the comparison of spectral data, which depend highly on the examiner's judgment.
The long-term goal of this study is to advance and modernize the comparative process of physical fit examinations and spectral interpretation. This goal is fulfilled through several avenues: 1) improvement of quantitative-based methods for various trace materials, 2) scrutiny of the methods through interlaboratory exercises, and 3) addressing fundamental aspects of the discipline using large experimental datasets, computational algorithms, and statistical analysis.
A substantial new body of knowledge has been established by analyzing population sets of nearly 4,000 items representative of casework evidence. First, this research identifies material-specific relevant features for duct tapes and automotive polymers. Then, this study develops reporting templates to facilitate thorough and systematic documentation of an analyst's decision-making process and minimize risks of bias. It also establishes criteria for utilizing a quantitative edge similarity score (ESS) for tapes and automotive polymers that yield relatively high accuracy (85% to 100%) and, notably, no false positives. Finally, the practicality and performance of the ESS method for duct tape physical fits are evaluated by forensic practitioners through two interlaboratory exercises. Across these studies, accuracy using the ESS method ranges from 95% to 99%, and again no false positives are reported. The practitioners' feedback demonstrates the method's potential to assist in training and improve peer verifications.
This research also develops and trains computational algorithms to support analysts making decisions on sample comparisons. The automated algorithms in this research show the potential to provide objective and probabilistic support for determining a physical fit and demonstrate accuracy comparable to that of the analyst. Furthermore, additional models are developed to extract feature edge information from the systematic comparison templates of tapes and textiles to provide insight into the relative importance of each comparison feature. A decision tree model is developed to assist physical fit examinations of duct tapes and textiles and demonstrates performance comparable to that of trained analysts. The computational tools also evaluate the suitability of partial sample comparisons that simulate situations where portions of the item are lost or damaged.
Finally, an objective approach to interpreting complex spectral data is presented. A comparison metric consisting of spectral angle contrast ratios (SCAR) is used as a model to assess more than 94 different-source and 20 same-source electrical tape backings. The SCAR metric results in a discrimination power of 96% and demonstrates the capacity to capture information on the variability between different-source samples and the variability within same-source samples. Application of the random-forest model allows for the automatic detection of primary differences between samples. The developed threshold could assist analysts with making decisions on the spectral comparison of chemically similar samples.
This research provides the forensic science community with novel approaches to comparing materials commonly seen in forensic laboratories. The outcomes of this study are anticipated to offer forensic practitioners new and accessible tools for incorporation into current workflows to facilitate systematic and objective analysis and interpretation of forensic materials and support analysts' opinions.
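The spectral comparison in this abstract builds on the spectral angle between two spectra. A minimal sketch of that building block is given below, assuming each spectrum is a list of intensities sampled at the same wavenumbers; the contrast-ratio step of the SCAR metric is not reproduced because the abstract does not define it, and the intensity values are invented for illustration.

```python
import math

def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors of intensities."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))  # guard against rounding
    return math.acos(cosine)

# Made-up intensity values at four wavenumbers.
reference = [0.10, 0.80, 0.30, 0.05]
scaled_copy = [2 * v for v in reference]   # same shape, different overall intensity
different = [0.70, 0.10, 0.05, 0.60]

same_angle = spectral_angle(reference, scaled_copy)
diff_angle = spectral_angle(reference, different)
print(same_angle < diff_angle)
```

The angle ignores overall intensity scaling, so two measurements of the same material at different signal strengths stay close to zero while spectra with different band shapes separate.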