
    Self-supervised learning for transferable representations

    Machine learning has undeniably achieved remarkable advances thanks to large labelled datasets and supervised learning. However, this progress is constrained by the labour-intensive annotation process: it is not feasible to create extensive labelled datasets for every problem we aim to address. Consequently, there has been a notable recent shift toward approaches that leverage only raw data. Among these, self-supervised learning has emerged as a particularly powerful approach, offering scalability to massive datasets and considerable potential for effective knowledge transfer. This thesis investigates self-supervised representation learning with a strong focus on computer vision applications. We provide a comprehensive survey of self-supervised methods across various modalities, introducing a taxonomy that categorises them into four distinct families while also highlighting practical considerations for real-world implementation. We then focus on the computer vision modality, where we perform a comprehensive benchmark evaluation of state-of-the-art self-supervised models on a diverse set of downstream transfer tasks. Our findings reveal that self-supervised models often outperform supervised learning across a spectrum of tasks, although correlations weaken as tasks move beyond classification, particularly for datasets with distribution shifts. Digging deeper, we investigate the influence of data augmentation on the transferability of contrastive learners, uncovering a trade-off between spatial and appearance-based invariances that generalises to real-world transformations. This begins to explain the differing empirical performance of self-supervised learners on different downstream tasks, and it showcases the advantage of specialised representations produced with tailored augmentation. Finally, we introduce a novel self-supervised pre-training algorithm for object detection that aligns pre-training with the downstream architecture and objectives, leading to reduced localisation errors and improved label efficiency. In conclusion, this thesis contributes a comprehensive understanding of self-supervised representation learning and its role in enabling effective transfer across computer vision tasks.
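    To make the augmentation trade-off concrete, here is a minimal sketch of the two kinds of invariance-inducing pipelines contrasted above, written against torchvision-style transforms; the exact transform sets and parameters are illustrative assumptions, not the thesis's configuration.

        # Sketch: two contrastive augmentation pipelines, one emphasising
        # spatial invariance, one appearance invariance. Parameters are
        # illustrative, not the configuration studied in the thesis.
        import torchvision.transforms as T

        spatial_aug = T.Compose([          # invariance to position/scale
            T.RandomResizedCrop(224, scale=(0.2, 1.0)),
            T.RandomHorizontalFlip(p=0.5),
            T.ToTensor(),
        ])

        appearance_aug = T.Compose([       # invariance to photometric changes
            T.Resize(256), T.CenterCrop(224),
            T.RandomApply([T.ColorJitter(0.8, 0.8, 0.8, 0.2)], p=0.8),
            T.RandomGrayscale(p=0.2),
            T.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0)),
            T.ToTensor(),
        ])

        def two_views(img, aug):
            # A contrastive learner is trained to map both views of the
            # same image to nearby embeddings.
            return aug(img), aug(img)

    Training one encoder per pipeline and comparing their downstream transfer is one simple way to expose the spatial/appearance trade-off the abstract describes.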

    Synthetic Aperture Radar (SAR) Meets Deep Learning

    This reprint focuses on applications that combine synthetic aperture radar (SAR) with deep learning, aiming to further advance intelligent SAR image interpretation. A synthetic aperture radar is an important active microwave imaging sensor whose all-day, all-weather imaging capability gives it an important place in the remote sensing community. Since the United States launched the first SAR satellite, SAR has received much attention in remote sensing, e.g., in geological exploration, topographic mapping, disaster forecasting, and traffic monitoring. It is therefore valuable and meaningful to study SAR-based remote sensing applications. In recent years, deep learning, represented by convolutional neural networks, has driven significant progress in the computer vision community, e.g., in face recognition, autonomous driving, and the Internet of Things (IoT). Deep learning enables computational models with multiple processing layers to learn data representations at multiple levels of abstraction, which can greatly improve the performance of various applications. This reprint provides a platform for researchers to tackle these significant challenges and present innovative, cutting-edge research results on applying deep learning to SAR in various manuscript types, e.g., articles, letters, reviews, and technical reports.

    Geometric Data Analysis: Advancements of the Statistical Methodology and Applications

    Data analysis has become fundamental to our society and comes in multiple facets and approaches. Nevertheless, in research and applications, the focus has primarily been on data from Euclidean vector spaces. Consequently, the majority of methods applied today are not suited to more general data types. Driven by needs from fields like image processing, (medical) shape analysis, and network analysis, more attention has recently been given to data from non-Euclidean spaces, particularly (curved) manifolds. This has led to the field of geometric data analysis, whose methods explicitly take the structure (for example, the topology and geometry) of the underlying space into account. This thesis contributes to the methodology of geometric data analysis by generalizing several fundamental notions from multivariate statistics to manifolds. We focus on two different viewpoints. First, we use Riemannian structures to derive a novel regression scheme for general manifolds that relies on splines of generalized Bézier curves. It can accurately model non-geodesic relationships, for example, time-dependent trends with saturation effects or cyclic trends. Since Bézier curves can be evaluated with the constructive de Casteljau algorithm, working with data from manifolds of high dimensions (for example, a hundred thousand or more) is feasible. Relying on the regression, we further develop a hierarchical statistical model for an adequate analysis of longitudinal data in manifolds, and a method to control for confounding variables. Second, we focus on data that is not only manifold- but even Lie group-valued, which is frequently the case in applications. This can only be achieved by endowing the group with an affine connection structure that is generally not Riemannian. Utilizing it, we derive generalizations of several well-known dissimilarity measures between data distributions that can be used for various tasks, including hypothesis testing. Invariance under data translations is proven, and a connection to continuous distributions is given for one measure. A further central contribution of this thesis is that it shows use cases for all notions in real-world applications, particularly in problems from shape analysis in medical imaging and archaeology. We replicate or further quantify several known findings for shape changes of the femur and the right hippocampus under osteoarthritis and Alzheimer's disease, respectively. Furthermore, in an archaeological application, we obtain new insights into the construction principles of ancient sundials. Last but not least, we use the geometric structure underlying human brain connectomes to predict cognitive scores; utilizing a sample selection procedure, we obtain state-of-the-art results.
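    Since the abstract leans on the constructive de Casteljau algorithm, the following hedged sketch shows the core idea for manifold-valued Bézier curves: replace straight-line interpolation with geodesic interpolation. The unit sphere with slerp geodesics is used here as an illustrative stand-in for the general manifolds treated in the thesis.

        # Sketch: de Casteljau evaluation of a Bezier curve on the unit
        # sphere S^2, with linear interpolation replaced by geodesics.
        import numpy as np

        def geodesic(p, q, t):
            """Point at parameter t on the sphere geodesic from p to q."""
            omega = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
            if omega < 1e-12:                     # p and q coincide
                return p
            return (np.sin((1 - t) * omega) * p
                    + np.sin(t * omega) * q) / np.sin(omega)

        def de_casteljau(control_points, t):
            """Evaluate a manifold-valued Bezier curve at t."""
            pts = [p / np.linalg.norm(p) for p in control_points]
            while len(pts) > 1:                   # repeated pairwise geodesics
                pts = [geodesic(pts[i], pts[i + 1], t)
                       for i in range(len(pts) - 1)]
            return pts[0]

        # Cubic example: four control points on the sphere
        ctrl = [np.array([1.0, 0.0, 0.0]), np.array([0.7, 0.7, 0.0]),
                np.array([0.0, 0.7, 0.7]), np.array([0.0, 0.0, 1.0])]
        print(de_casteljau(ctrl, 0.5))

    The same recursion works on any manifold for which a geodesic (or an approximation of it) can be evaluated, which is what makes the scheme feasible in high dimensions.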

    Analysis of mass spectrometry proteomics data and their integration with other clinical and molecular data in clinical samples and cancer cell lines

    Molecular subtypes of a disease are often associated with differences in survival or disease progression, and sometimes with the response to a specific therapy. Over the last decade, molecular classification studies of urothelial cancer have focused mainly on the muscle-invasive type of the disease (~20% of patients at primary diagnosis), which is characterised by a high risk of metastasis and low five-year survival rates. These studies enabled the identification of multiple genomic and transcriptomic subtypes that differ radically in their molecular profiles, forming two major categories: basal and luminal tumours. The former appear to be associated with more aggressive cancers, yet include a substantial proportion of patients who respond to the standard chemotherapy regimen. The latter (luminal) were initially described as less aggressive, but subsequent studies revealed substantial molecular heterogeneity that is reflected in clinical parameters. Today, muscle-invasive bladder cancer is believed to comprise six main subtypes, but the available evidence supporting the adoption of these subtypes in clinical practice is incomplete and inconsistent. On the other hand, the non-muscle-invasive type of the disease (~80% of cases at primary diagnosis) is characterised by high rates of recurrence and progression to a higher stage, as well as by a considerable public health cost due to the frequent surveillance it requires. The molecular profile of non-muscle-invasive cancer has been studied considerably less than that of the invasive type, and to date two studies have attempted its classification into molecular subtypes: the first based on the transcriptome, the second on copy-number variation. The proteomic profile of both invasive and non-muscle-invasive bladder cancer, however, remains rudimentarily studied. The aim of the present study is to investigate the existence of proteomic subtypes of non-muscle-invasive urothelial cancer, to characterise them molecularly, to relate them to previous classification systems, and to identify deregulated proteins and pathways with potential prognostic value. To this end, 117 tumour tissue samples from patients newly diagnosed with urothelial cancer (98 non-muscle-invasive, 19 muscle-invasive) were collected; their total proteome was isolated and initially quantified with the Bradford method. After trypsin digestion, the peptides were separated on a chromatographic column coupled to an Orbitrap mass spectrometer. The peptide spectra were analysed with Proteome Discoverer at an FDR (False Discovery Rate) < 0.01 and mapped to protein identities. Protein quantification used the three most abundant unique peptides per protein; after processing, the proteomics data were subjected to a series of computational analyses: unsupervised k-means clustering, principal component analysis, statistical significance analysis of proteins, protein pathways, biological functions, and gene expression, as well as the modelling of a Random Forest molecular classifier. Maximum clustering stability was achieved for k = 3 groups, indicating the existence of three proteomic subtypes in the data.
Group 1 was the smallest (17/98), contained mostly high-stage, high-grade, high-risk cancers, and presented an immune-infiltration molecular phenotype with high levels of the transcription factors STAT1, STAT3, and SND1, as well as antigen-presentation proteins, suggesting active communication between immune and cancer cells. It was also characterised by higher amounts of proteins involved in the cell cycle and in stress signalling (unfolded protein response and DNA damage repair). Group 2 gathered patients with varied clinical characteristics who shared elevated amounts of extracellular (stromal) proteins and low epithelial signals. Patients in group 3 presented a more differentiated molecular phenotype, with higher levels of UPKs, KRT20, and CDH1, consistent with their clinical characteristics, since most were diagnosed with low-stage, low-risk cancers. Analysis of activated protein pathways showed that group 1 patients had active signalling for biosynthetic processes and interferon-γ, and increased activity of the transcription factors MYC and E2F, positive regulators of the cell cycle. Group 3 patients, in contrast, were associated with activation of metabolic pathways such as glutathione-mediated detoxification and glycogenolysis-glycolysis, as well as apoptosis. Comparing the proteomic profiles of non-muscle-invasive and muscle-invasive patients by principal component analysis revealed a close relationship between group 1 and patients with muscle-invasive urothelial cancer, and conversely a distant relationship between group 3 and the latter. Group 2 showed wide dispersion, overlapping the regions of the other two groups. To validate the proteomic results, data from transcriptomic studies (UROMOL and LUND) were retrospectively analysed. The UROMOL study had likewise identified three subtypes, one of which gathered most patients progressing to a higher stage (the poor-prognosis subtype). Comparative analysis between the three proteomic groups and the three UROMOL subtypes with the statistical tool GSEA showed statistically significant phenotypic similarities between proteomic group 1 and the UROMOL poor-prognosis subtype, and between proteomic group 3 and the good-prognosis subtype. Using a Random Forest molecular classifier, the high-risk and low-risk phenotypes of proteomic groups 1 and 3 were confirmed by the classification of the patients into the poor- and good-prognosis UROMOL subtypes, respectively. Statistically significant proteins distinguishing these two extreme proteomic groups, and at the same time muscle-invasive from non-muscle-invasive cancer, were also found to differ significantly at the transcriptome level between the poor- and good-prognosis groups in two independent studies (UROMOL and LUND). These molecules participate in biological functions key to the development of non-muscle-invasive cancer, such as the induction of protein-stability responses, cytokine and interferon signalling, antigen presentation, pre-mRNA processing, post-translational modifications, and cell growth pathways.
Overall, the present study identifies three proteomic subtypes of non-muscle-invasive cancer and, through a comparative analysis with two independent transcriptomic studies, provides groups of molecules that may drive cancer progression and that require further validation in clinical practice.
DNA/RNA-based classification of Bladder Cancer (BC) supports the existence of multiple molecular subtypes, while investigations at the protein level are scarce. The purpose of this study was to investigate if Non-Muscle Invasive Bladder Cancer (NMIBC) can be stratified to biologically meaningful proteomic groups, to establish associations between the proteomics subtypes and previous transcriptomics classification systems and to characterize the continuum of transcriptomics alterations observed in the different stages of the disease. Subsequently, tissue specimens from 117 patients at primary diagnosis (98 with NMIBC and 19 with MIBC) were processed for high-resolution LC-MS/MS analysis. Protein quantification was conducted by utilizing the mean abundance of the top three most abundant unique peptides per protein. The proteomics output was subjected to unsupervised consensus clustering, principal component analysis (PCA), and investigation of subtype-specific features, pathways, and gene sets, as well as the construction and validation of a Random Forest-based classifier. NMIBC patients were optimally stratified to 3 proteomic subtypes (classes), differing in size, clinico-pathological, and molecular backgrounds: Class 1 (mostly high stage/grade/risk samples) was the smallest in size (17/98) and expressed an immune/inflammatory phenotype, along with features involved in cell proliferation, unfolded protein response and DNA damage response, whereas class 2 (mixed stage/grade/risk composition) presented with an infiltrated/mesenchymal profile. Class 3 was rich in luminal/differentiation markers, in line with its pathological composition (mostly low stage/grade/risk samples). PCA revealed the close proximity of class 1, and conversely the remoteness of class 3, to the proteome of MIBC. Samples from class 2 were distributed more widely in the rotated space. Comparative analysis with GSEA between the three proteomic classes and the three UROMOL subtypes indicated statistically significant associations between proteomics class 1 and UROMOL subtype 2 (the subtype with a bad prognosis) and also between proteomics class 3 and UROMOL subtype 1 (the subtype with the best prognosis). Utilizing a Random Forest-based classifier, the predicted high- and low-risk phenotypes for proteomic class 1 and class 3 were further supported by their classification into the "progressed" and "non-progressed" subtypes of the UROMOL study, respectively. Statistically significant proteins distinguishing these two extreme classes (1 and 3) and also MIBC from NMIBC samples were found to consistently differ at the mRNA level between NMIBC "Progressors" and "Non-Progressors" groups of the UROMOL and LUND cohorts. Functional assessment of the observed molecular de-regulations suggested severe pathway alterations in unfolded protein response, cytokine and interferon-γ signaling, antigen presentation, mRNA processing, post-translational modifications, and cell growth/division.
Collectively, this study identifies three proteomic NMIBC subtypes and, following a cross-omics analysis using transcriptomic data from two independent cohorts, shortlists molecular features potentially driving non-invasive carcinogenesis, meriting further validation in clinical trials.
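    As a concrete illustration of the quantification step described above ("mean abundance of the top three most abundant unique peptides per protein"), here is a hedged pandas sketch; the column names and toy values are hypothetical.

        # Sketch: "top-3" protein quantification from a peptide table.
        import pandas as pd

        peptides = pd.DataFrame({
            "protein":   ["STAT1"] * 4 + ["KRT20"] * 2,
            "peptide":   ["p1", "p2", "p3", "p4", "q1", "q2"],
            "abundance": [5.0, 9.0, 2.0, 7.0, 4.0, 6.0],
        })

        protein_quant = (
            peptides.sort_values("abundance", ascending=False)
                    .groupby("protein")["abundance"]
                    .apply(lambda x: x.head(3).mean())  # mean of top 3
        )
        print(protein_quant)
        # KRT20 -> mean(6, 4) = 5.0 (fewer than three peptides available)
        # STAT1 -> mean(9, 7, 5) = 7.0

    The resulting protein-by-sample matrix is what downstream steps such as consensus clustering and PCA would operate on.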

    Towards Object-Centric Scene Understanding

    Visual perception for autonomous agents continues to attract community attention due to disruptive technologies and the wide applicability of such solutions. Autonomous Driving (AD), a major application in this domain, promises to revolutionize our approach to mobility while bringing critical advantages in limiting accident fatalities. Fueled by recent advances in Deep Learning (DL), more computer vision tasks are being addressed within a learning paradigm. Deep Neural Networks (DNNs) have consistently pushed performance to unprecedented levels, demonstrating the ability of such approaches to generalize to an increasing number of difficult problems, such as 3D vision tasks. In this thesis, we address two main challenges arising from current approaches: the computational complexity of multi-task pipelines, and the increasing need for manual annotations. On the one hand, AD systems need to perceive the surrounding environment at different levels of detail and subsequently take timely actions; this multitasking further limits the time available for each perception task. On the other hand, the need for such systems to generalize universally to massively diverse situations requires large-scale datasets covering long-tailed cases. This requirement renders traditional supervised approaches, despite the data readily available in the AD domain, unsustainable in terms of annotation costs, especially for 3D tasks. Driven by the nature of the AD environment, whose complexity (unlike indoor scenes) is dominated by the presence of other scene elements (mainly cars and pedestrians), we focus on these challenges in object-centric tasks. We then situate our contributions appropriately in the fast-paced literature, supporting our claims with extensive experimental analysis leveraging up-to-date state-of-the-art results and community-adopted benchmarks.

    Novel neural architectures & algorithms for efficient inference

    In the last decade, the machine learning universe embraced deep neural networks (DNNs) wholeheartedly with the advent of neural architectures such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and transformers. These models have empowered many applications, such as ChatGPT and Imagen, and have achieved state-of-the-art (SOTA) performance on many vision, speech, and language modeling tasks. However, SOTA performance comes with various costs, such as large model size, compute-intensive training, increased inference latency, and higher working memory. This thesis aims to improve the resource efficiency of neural architectures, i.e., to significantly reduce the computational, storage, and energy consumption of a DNN without any significant loss in performance. Towards this goal, we explore novel neural architectures as well as training algorithms that allow low-capacity models to achieve near-SOTA performance. We divide this thesis into two dimensions: Efficient Low-Complexity Models, and Input Hardness Adaptive Models. Along the first dimension, Efficient Low-Complexity Models, we improve DNN performance by addressing instabilities in existing architectures and training methods. We propose novel neural architectures inspired by ordinary differential equations (ODEs) to reinforce input signals and attend to salient feature regions. In addition, we show that carefully designed training schemes improve the performance of existing neural networks. We divide this exploration into two parts. (a) Efficient Low-Complexity RNNs: we improve RNN resource efficiency by addressing poor gradients, noise amplification, and issues with Backward Propagation Through Time (BPTT) training. First, we improve RNNs by solving ODEs that eliminate vanishing and exploding gradients during training. To do so, we present Incremental Recurrent Neural Networks (iRNNs), which keep track of increments in the equilibrium surface. Next, we propose Time Adaptive RNNs, which mitigate noise propagation in RNNs by modulating the time constants in the ODE-based transition function. We empirically demonstrate the superiority of ODE-based neural architectures over existing RNNs. Finally, we propose the Forward Propagation Through Time (FPTT) algorithm for training RNNs and show that FPTT yields significant gains over the more conventional BPTT scheme. (b) Efficient Low-Complexity CNNs: we improve CNN architectures by reducing their resource usage. CNNs require great depth to generate high-level features, resulting in computationally expensive models. We design a novel residual block, the Global layer, that constrains the input and output features by approximately solving partial differential equations (PDEs). It yields better receptive fields than traditional convolutional blocks and thus results in shallower networks. Further, we reduce the model footprint by enforcing a novel inductive bias that formulates the output of a residual block as a spatial interpolation between high-compute anchor pixels and low-compute cheaper pixels. This results in spatially interpolated convolutional blocks (SI-CNNs) with better compute/performance trade-offs. Finally, we propose an algorithm that enforces various distributional constraints during training in order to achieve better generalization; we refer to this scheme as distributionally constrained learning (DCL).
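    The ODE-inspired recurrent cells mentioned above share a simple backbone: the hidden state follows a discretised differential equation, so each step is a small residual update. The sketch below shows that generic Euler-discretised form only; it is not the thesis's iRNN or Time Adaptive RNN.

        # Sketch: an Euler-discretised, ODE-inspired recurrent cell,
        #   dh/dt = tanh(W x + U h)  ->  h_{t+1} = h_t + dt * tanh(...)
        import torch
        import torch.nn as nn

        class ODECell(nn.Module):
            def __init__(self, input_dim, hidden_dim, dt=0.1):
                super().__init__()
                self.W = nn.Linear(input_dim, hidden_dim)
                self.U = nn.Linear(hidden_dim, hidden_dim, bias=False)
                self.dt = dt

            def forward(self, x, h):
                # The residual (Euler) step keeps successive states close,
                # which helps against vanishing/exploding gradients.
                return h + self.dt * torch.tanh(self.W(x) + self.U(h))

        cell = ODECell(input_dim=8, hidden_dim=16)
        h = torch.zeros(1, 16)
        for x in torch.randn(5, 1, 8):   # unroll over a length-5 sequence
            h = cell(x, h)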
In the second dimension, Input Hardness Adaptive Models, we introduce the notion of the hardness of an input relative to an architecture. Along the first dimension, a neural network allocates the same resources, such as compute, storage, and working memory, to all inputs, inherently assuming that all examples are equally hard for a model. Here we challenge this assumption, reasoning that some inputs are relatively easy for a network to predict compared to others. Input hardness enables us to create selective classifiers wherein a low-capacity network handles simple inputs while abstaining from prediction on complex inputs. Next, we create hybrid models that route the hard inputs from the low-capacity abstaining network to a high-capacity expert model, and we design various architectures that adhere to this hybrid inference style. Further, input hardness enables us to selectively distill the knowledge of a high-capacity model into a low-capacity model by discarding hard inputs during the distillation procedure. Finally, we conclude this thesis by sketching out various interesting future research directions that emerge as extensions of the ideas explored in this work.
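    A minimal sketch of the hybrid inference style described above: a low-capacity model answers easy inputs and defers to a high-capacity expert whenever its confidence falls below a threshold. The confidence criterion and the two models here are placeholder assumptions, not the specific architectures of the thesis.

        # Sketch: hardness-adaptive hybrid inference via confidence routing.
        import torch

        @torch.no_grad()
        def hybrid_predict(x, small_model, expert_model, threshold=0.9):
            probs = torch.softmax(small_model(x), dim=-1)
            conf, pred = probs.max(dim=-1)
            hard = conf < threshold          # inputs the small model abstains on
            if hard.any():                   # defer only the hard inputs
                pred[hard] = expert_model(x[hard]).argmax(dim=-1)
            return pred, hard.float().mean() # predictions, fraction deferred

    The same hardness signal can gate distillation: examples deemed too hard are discarded before distilling the expert into the small model.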

    Scene representation and matching for visual localization in hybrid camera scenarios

    Scene representation and matching are crucial steps in a variety of tasks, ranging from 3D reconstruction to virtual/augmented/mixed reality and robotics applications. While approaches exist that tackle these tasks, they mostly overlook the efficiency of the scene representation, which is fundamental in resource-constrained systems and for increasing computing speed. They also normally assume projective cameras, so performance on systems based on other camera geometries remains suboptimal. This dissertation contributes a new, efficient scene representation method that dramatically reduces the number of 3D points. The approach sets up an optimization problem for the automated selection of the most relevant points to retain, leading to a constrained quadratic program that is solved optimally with a newly introduced variant of the sequential minimal optimization method. In addition, a new initialization approach is introduced for fast convergence. Extensive experimentation on public benchmark datasets demonstrates that the approach quickly produces a compressed scene representation while delivering accurate pose estimates. The dissertation also contributes new methods for scene matching that go beyond projective cameras. Alternative camera geometries, like fisheye cameras, produce images with very high distortion, making current image feature point detectors and descriptors less effective, since they are designed for projective cameras. New methods based on deep learning are introduced to address this problem, where feature detectors and descriptors overcome distortion effects and more effectively match features between pairs of fisheye images, and also between hybrid pairs of fisheye and perspective images. Due to the limited availability of fisheye-perspective image datasets, three datasets were collected for training and testing the methods. The results demonstrate increased detection and matching rates that outperform current state-of-the-art methods.
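    To make the point-selection formulation concrete, here is a hedged, generic sketch of one SMO-style pairwise update for a budgeted quadratic program; the dissertation's actual objective, constraints, SMO variant, and initialization differ, so treat this purely as an illustration of the mechanism.

        # Sketch: one SMO-style update for
        #   minimise 0.5 x^T Q x - c^T x   s.t.  sum(x) = k, 0 <= x <= 1,
        # where x_i is a relaxed indicator for keeping 3D point i.
        import numpy as np

        def smo_pair_update(x, Q, c, i, j):
            """Optimise coordinates (i, j) while keeping sum(x) fixed."""
            g = Q @ x - c                           # objective gradient
            curv = Q[i, i] + Q[j, j] - 2 * Q[i, j]  # curvature along e_i - e_j
            if curv <= 1e-12:
                return x                            # flat direction: skip
            t = -(g[i] - g[j]) / curv               # unconstrained best step
            t = np.clip(t, max(-x[i], x[j] - 1.0),  # keep both coordinates
                           min(1.0 - x[i], x[j]))   # inside [0, 1]
            x[i] += t
            x[j] -= t                               # sum(x) is preserved
            return x

    Sweeping over coordinate pairs until no update improves the objective yields the solver loop, with the budget constraint maintained exactly at every step.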

    Undergraduate and Graduate Course Descriptions, 2023 Spring

    Wright State University undergraduate and graduate course descriptions from Spring 2023.

    Towards Video Transformers for Automatic Human Analysis

    With the aim of creating artificial systems capable of mirroring the nuanced understanding and interpretative powers inherent to human cognition, this thesis embarks on an exploration of the intersection between human analysis and Video Transformers. The objective is to harness the potential of Transformers, a promising architectural paradigm, to comprehend the intricacies of human interaction, thus paving the way for the development of empathetic and context-aware intelligent systems. To do so, we explore the whole Computer Vision pipeline, from data gathering to in-depth analysis of recent developments, through model design and experimentation. Central to this study is the creation of UDIVA, an expansive multi-modal, multi-view dataset capturing dyadic face-to-face human interactions. Comprising 147 participants across 188 sessions, UDIVA integrates audio-visual recordings, heart-rate measurements, personality assessments, socio-demographic metadata, and conversational transcripts, establishing itself as the largest dataset for dyadic human interaction analysis to date. This dataset provides a rich context for probing the capabilities of Transformers within complex environments. To validate its utility, as well as to elucidate Transformers' ability to assimilate diverse contextual cues, we focus on the challenge of personality regression within interaction scenarios. We first adapt an existing Video Transformer to handle multiple contextual sources and conduct rigorous experimentation. We empirically observe a progressive enhancement in model performance as more context is added, reinforcing the potential of Transformers to decode intricate human dynamics. Building upon these findings, the Dyadformer emerges as a novel architecture adept at long-range modeling of dyadic interactions. By jointly modeling both participants in the interaction, as well as embedding multi-modal integration into the model itself, the Dyadformer surpasses the baseline and other concurrent approaches, underscoring Transformers' aptitude for deciphering multifaceted, noisy, and challenging tasks such as the analysis of human personality in interaction. Nonetheless, these experiments unveil the ubiquitous challenges of training Transformers, particularly in managing overfitting due to their demand for extensive datasets. Consequently, we conclude this thesis with a comprehensive investigation into Video Transformers, analyzing topics ranging from architectural designs and training strategies to input embedding and tokenization, traversing multi-modality and specific applications. Across these, we highlight trends that optimally harness spatio-temporal representations to handle video redundancy and high dimensionality. A culminating performance comparison is conducted in the realm of video action classification, spotlighting strategies that exhibit superior efficacy, even compared to traditional CNN-based methods.
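    As one concrete example of the input-embedding choices surveyed above, the following hedged sketch shows a common tokenization scheme for Video Transformers: a 3D convolution that chops a clip into spatio-temporal "tubelet" tokens. The sizes are illustrative, not those of any specific model discussed in the thesis.

        # Sketch: tubelet embedding for a Video Transformer.
        import torch
        import torch.nn as nn

        class TubeletEmbed(nn.Module):
            def __init__(self, dim=768, t=2, p=16):
                super().__init__()
                # Kernel == stride: non-overlapping t x p x p tubelets.
                self.proj = nn.Conv3d(3, dim, kernel_size=(t, p, p),
                                      stride=(t, p, p))

            def forward(self, video):                  # (B, 3, T, H, W)
                tokens = self.proj(video)              # (B, dim, T/t, H/p, W/p)
                return tokens.flatten(2).transpose(1, 2)  # (B, N, dim)

        clip = torch.randn(1, 3, 8, 224, 224)          # 8-frame RGB clip
        print(TubeletEmbed()(clip).shape)              # torch.Size([1, 784, 768])

    Tokenizing jointly over time and space like this is one standard way to tame video redundancy and high dimensionality before the attention layers.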

    A Cookbook of Self-Supervised Learning

    Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, successfully training an SSL method involves a dizzying set of choices, from the pretext tasks to training hyper-parameters. Our goal is to lower the barrier to entry into SSL research by laying out the foundations and latest SSL recipes in the style of a cookbook. We hope to empower the curious researcher to navigate the terrain of methods, understand the role of the various knobs, and gain the know-how required to explore how delicious SSL can be.
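    In the spirit of a recipe, here is a hedged sketch of one familiar ingredient, an InfoNCE-style contrastive objective. The simplification that z1[i] and z2[i] are embeddings of two augmented views of the same image (all other pairs being negatives) is an assumption of this sketch, not a prescription from the cookbook.

        # Sketch: a minimal InfoNCE-style contrastive loss.
        import torch
        import torch.nn.functional as F

        def info_nce(z1, z2, temperature=0.1):
            z1 = F.normalize(z1, dim=1)
            z2 = F.normalize(z2, dim=1)
            logits = z1 @ z2.t() / temperature   # (N, N) cosine similarities
            targets = torch.arange(z1.size(0))   # positives on the diagonal
            return F.cross_entropy(logits, targets)

        loss = info_nce(torch.randn(32, 128), torch.randn(32, 128))

    The temperature is one of the "knobs" the cookbook alludes to: it controls how sharply hard negatives are weighted.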