1,835 research outputs found

    Unsupervised Deep Learning of Visual Representations

    Interpreting visual signals from complex imagery and video data with few or no human annotations is challenging yet essential for realising the true value of deep learning techniques in real-world scenarios. During the past decade, deep learning has achieved unprecedented breakthroughs in many computer vision fields. Nonetheless, because a large number of parameters in deep neural networks must be optimised to derive complex mappings from an input visual space to a discriminative feature representational space, the success of deep learning relies heavily on a massive amount of human-annotated training data. Collecting such manual annotations is labour-intensive, especially at the large scale that has proven critical to learning generalisable models applicable to new and unseen data. This dramatically limits the usability and scalability of deep learning in practice. This thesis aims to reduce the reliance of deep neural networks on exhaustive human annotations by proposing novel algorithms that learn the underlying visual semantics from insufficient or inadequate manual labels, a setting denoted as generalised unsupervised learning. Based on different assumptions about the available sources of knowledge used for learning, this thesis studies generalised unsupervised deep learning from four perspectives: learning without any labels by aggregating knowledge from local data structure, learning without any labels by discovering knowledge from global data structure, transferring knowledge from relevant labels, and propagating knowledge from incomplete labels. Specifically, novel methods are introduced to address unresolved challenges in these problems as follows.
    Chapter 3: The first problem is aggregating knowledge from local data structure, which assumes that apparent visual similarities (pixel intensity) among images are encoded in local neighbourhoods in a feature representational space, which partially reveal the underlying semantic relationships among samples.
    This thesis studies discriminative representation learning in this problem, aiming to derive visual features that are discriminative with respect to images' semantic class memberships. This problem is challenging because, without ground-truth labels, it is scarcely possible to accurately determine reliable neighbourhoods encoding the same underlying class concepts, given the arbitrarily complex appearance patterns and variations both within and across classes. Existing methods that learn from hypothetical inter-sample relationships tend to propagate errors, as incorrect pairwise supervision is prone to accumulate during training and degrade the learned representations. To that end, this thesis proposes to progressively discover sample-anchored (centred) neighbourhoods to reason about and learn the underlying semantic relationships among samples iteratively and accumulatively. Moreover, a novel progressive affinity diffusion process is presented to propagate reliable inter-sample relationships across adjacent neighbourhoods, so as to further distinguish within-class visual variation from between-class similarity and bridge the gap between low-level imagery appearance (e.g. pixel intensity) and high-level semantic concepts (e.g. object class memberships).
    Chapter 4: The second problem is discovering knowledge from global data structure, which assumes that visual similarity among samples of the same semantic class is generally higher than that among samples of different classes. This thesis investigates deep clustering for solving this problem, which simultaneously learns visual features and data grouping without any labels. Existing unsupervised deep learning algorithms fail to benefit from jointly learning representations and partitions, either by overlooking global class memberships (e.g. contrastive representation learning) or by relying on unreliable pseudo labels estimated from evolving feature representations, which are subject to error propagation during training. To benefit clustering of images from discriminative visual features derived by a representation learning process, a Semantic Contrastive Learning method is proposed in this thesis, which concurrently optimises both instance visual similarities and cluster decision boundaries to reason about the hypotheses of semantic classes by their consensus. Furthermore, based on the observation that assigning visually similar samples to different clusters implicitly reduces both intra-cluster compactness and inter-cluster diversity and leads to lower partition confidence, this thesis presents an online deep clustering method named PartItion Confidence mAximisation. It is built on the idea of learning the most semantically plausible data separation by maximising the "global" partition confidence of the clustering solution using a novel differentiable partition uncertainty index.
    Chapter 5: The third problem is transferring knowledge from relevant labels, which assumes the availability of manual labels in relevant domains and the existence of common knowledge shared across domains. This thesis studies transfer clustering in this problem, which aims at learning the semantic class memberships of the unlabelled target data in a novel (target) domain by knowledge transfer from a labelled source domain. Whilst enormous efforts have been made on data annotation during the past decade, accumulating knowledge from existing labelled data to help understand the continually emerging unlabelled data is intuitively more efficient than exhaustively annotating new data.
    However, given the unpredictable, changing nature of imagery data distributions, the accumulated pre-learned knowledge does not transfer well without strong assumptions about the learned source and the novel target domains, e.g. those made in settings from domain adaptation to zero-shot and few-shot learning. To address this problem and effectively transfer knowledge between domains that differ in both data distributions and label spaces, this thesis proposes a self-SUPervised REMEdy method to align the knowledge of domains by learning jointly from the intrinsically available relative (pairwise) imagery information in the unlabelled target domain and the prior knowledge learned from the labelled source domain, so as to benefit from both transfer and self-supervised learning.
    Chapter 6: The last problem is propagating knowledge from incomplete labels, with the assumption that incomplete labels (e.g. collective or inexact) are usually easier to collect but tend to be less reliable. This thesis investigates video activity localisation in this problem, which locates a short moment (video segment) in an untrimmed and unstructured video according to a natural language query. To derive discriminative representations of video segments that accurately match sentences, temporal annotations of the precise start and end frame indices of each target moment are usually required. However, such temporal labels are not only harder to collect than video-sentence pairings, as they require carefully going through videos frame by frame, but are also subject to labelling uncertainty due to the intrinsic ambiguity of a video activity's boundary. To reduce the annotation cost of deriving universal visual-textual correlations, a Cross-sentence Relations Mining method is introduced in this thesis to align video segments and query sentences when only a paragraph description of the activities in a video (a collective label) is available, without per-sentence temporal labels. This is accomplished by exploring cross-sentence relationships in a paragraph as constraints to better interpret and match complex moment-wise temporal and semantic relationships in videos. Moreover, this thesis also studies the problem of propagating knowledge while avoiding the negative impacts of inexact labels. To that end, an Elastic Moment Bounding method is proposed, which accommodates flexible and adaptive temporal boundaries of activities towards modelling universal video-text correlations with tolerance to the underlying temporal uncertainties in pre-fixed human annotations.

    Learning Meta-features for AutoML

    This paper tackles the AutoML problem, which aims to automatically select the ML algorithm and hyper-parameter configuration most appropriate to the dataset at hand. The proposed approach, MetaBu, learns new meta-features via an Optimal Transport procedure, aligning the manually designed meta-features with the space of distributions on the hyper-parameter configurations. MetaBu meta-features, learned once and for all, induce a topology on the set of datasets that is exploited to define a distribution of promising hyper-parameter configurations amenable to AutoML. Experiments on the OpenML CC-18 benchmark demonstrate that using MetaBu meta-features boosts the performance of state-of-the-art AutoML systems, Auto-Sklearn (Feurer et al. 2015) and Probabilistic Matrix Factorization (Fusi et al. 2018). Furthermore, inspecting the MetaBu meta-features gives some hints as to when an ML algorithm does well. Finally, the topology based on MetaBu meta-features makes it possible to estimate the intrinsic dimensionality of the OpenML benchmark w.r.t. a given ML algorithm or pipeline.
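
    As a rough illustration of what an optimal transport computation looks like, the sketch below runs entropy-regularised OT (Sinkhorn iterations) between two small histograms. It is a generic textbook procedure, not the MetaBu alignment itself; the function name `sinkhorn`, the regularisation strength `eps` and the toy histograms and cost matrix are all assumptions made for this example.

```python
import math

def sinkhorn(a, b, cost, eps=0.5, n_iters=500):
    """Entropy-regularised transport plan between histograms a and b."""
    n, m = len(a), len(b)
    # Gibbs kernel: cheap moves get large weights, expensive moves small ones.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy data (hypothetical): two source bins, two target bins.
a = [0.5, 0.5]
b = [0.25, 0.75]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(a, b, cost)
```

    Each entry `plan[i][j]` is the mass moved from source bin `i` to target bin `j`; the rows of the plan sum to `a` and the columns to `b`.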

    Automatic Landmarking for Non-cooperative 3D Face Recognition

    This thesis describes a new framework for 3D surface landmarking and evaluates its performance for feature localisation on human faces. This framework has two main parts that can be designed and optimised independently. The first is a keypoint detection system that returns positions of interest for a given mesh surface by using a learnt dictionary of local shapes. The second is a labelling system that uses model fitting approaches to establish a one-to-one correspondence between the set of unlabelled input points and a learnt representation of the class of object to detect. Our keypoint detection system returns local maxima over score maps that are generated from an arbitrarily large set of local shape descriptors. The distributions of these descriptors (scalars or histograms) are learnt for known landmark positions on a training dataset in order to generate a model. The similarity between the input descriptor value for a given vertex and a model shape is used as a descriptor-related score. Our labelling system can make use of both hypergraph matching techniques and rigid registration techniques to reduce the ambiguity attached to unlabelled input keypoints for which a list of model landmark candidates has been seeded. The soft matching techniques use multi-attributed hyperedges to reduce ambiguity, while the registration techniques use a scale-adapted rigid transformation computed from three or more points in order to obtain one-to-one correspondences. Our final system achieves results that are better than or comparable to the state of the art (depending on the metric) while being more generic. It does not require pre-processing such as cropping, spike removal or hole filling, and is more robust to occlusion of salient local regions, such as those near the nose tip and inner eye corners. It is also fully pose invariant and can be used with kinds of objects other than faces, provided that labelled training data is available.

    A cost-sensitive decision tree learning algorithm based on a multi-armed bandit framework

    This paper develops a new algorithm for inducing cost-sensitive decision trees that is inspired by the multi-armed bandit problem, in which a player in a casino has to decide which slot machine (bandit) from a selection of slot machines is likely to pay out the most. Game Theory proposes a solution to this multi-armed bandit problem by using a process of exploration and exploitation in which reward is maximized. This paper uses these concepts to develop a new algorithm by viewing the rewards as a reduction in costs, and applying the exploration and exploitation techniques so that a compromise between decisions based on accuracy and decisions based on costs can be found. The algorithm employs the notion of lever pulls in the multi-armed bandit game to select the attributes during decision tree induction, using a look-ahead methodology to explore potential attributes and exploit the attributes that maximize the reward. The new algorithm is evaluated on fifteen datasets and compared to six well-known algorithms: J48, EG2, MetaCost, AdaCostM1, ICET and ACT. The results show that the new multi-armed bandit based algorithm can produce more cost-effective trees without compromising accuracy. The paper also includes a critical appraisal of the limitations of the new algorithm and proposes avenues for further research.
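
    The exploration/exploitation trade-off described above can be sketched with UCB1, a standard bandit strategy. This is not the paper's tree-induction algorithm: the function name `ucb1`, the idea of treating each candidate attribute as an arm, and the fixed `cost_saving` values are assumptions made purely for illustration.

```python
import math

def ucb1(pull, n_arms, n_rounds):
    """Run UCB1; return (number of pulls per arm, mean reward per arm)."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # initial exploration: pull every arm once
        else:
            # Exploit a high mean reward, plus a bonus for rarely tried arms.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts, means

# Hypothetical rewards: the cost reduction obtained by splitting on each
# of three candidate attributes (deterministic here to keep the demo simple).
cost_saving = [0.2, 0.5, 0.8]
counts, means = ucb1(lambda arm: cost_saving[arm], n_arms=3, n_rounds=500)
```

    In this toy run the arm with the largest cost reduction receives most of the pulls, while the weaker arms are still sampled occasionally.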

    Persistent Homology Tools for Image Analysis

    Topological Data Analysis (TDA) is a new field of mathematics that has emerged rapidly since the first decade of the century from various works in algebraic topology and geometry. The goal of TDA and its main tool, persistent homology (PH), is to provide topological insight into complex and high-dimensional datasets. We take this premise on board to gain more topological insight from digital image analysis and to quantify tiny low-level distortions that are undetectable except possibly by highly trained persons. Such image distortion could be caused intentionally (e.g. by morphing and steganography) or arise naturally in scan images of abnormal human tissue or organs as a result of the onset of cancer or other diseases. The main objective of this thesis is to design new image analysis tools based on persistent homological invariants representing simplicial complexes on sets of pixel landmarks over a sequence of distance resolutions. We start by proposing innovative automatic techniques to select image pixel landmarks to build a variety of simplicial topologies from a single image. The effectiveness of each image landmark selection technique is demonstrated by testing on different image tampering problems such as morphed face detection, steganalysis and breast tumour detection. Vietoris-Rips simplicial complexes are constructed on the image landmarks at an increasing distance threshold, and topological (homological) features are computed at each threshold and summarised in a form known as persistent barcodes. We vectorise the space of persistent barcodes using a technique known as persistent binning, whose strength we demonstrate for various image analysis purposes. Different machine learning approaches are adopted to develop automatic detection of tiny texture distortion in many image analysis applications. The homological invariants used in this thesis are the 0- and 1-dimensional Betti numbers.
    We developed an innovative approach to designing persistent homology (PH) based algorithms for automatic detection of the types of image distortion described above. In particular, we developed the first PH-detector of morphing attacks on passport face biometric images. We shall demonstrate significant accuracy of 2 such morph detection algorithms with 4 types of automatically extracted image landmarks: Local Binary Patterns (LBP), 8-neighbour super-pixels (8NSP), Radial-LBP (R-LBP) and centre-symmetric LBP (CS-LBP). Using any of these techniques yields several persistent barcodes that summarise persistent topological features and help gain insights into complex hidden structures not amenable to other image analysis methods. We shall also demonstrate significant success of a similarly developed PH-based universal steganalysis tool capable of detecting secret messages hidden inside digital images. We also argue through a pilot study that building PH records from digital mammographic images can differentiate malignant breast tumours from benign ones. The research presented in this thesis creates new opportunities to build real applications based on TDA and reveals many research challenges in a variety of image processing/analysis tasks. For example, we describe a TDA-based exemplar image inpainting technique (TEBI), superior to existing exemplar algorithms, for the reconstruction of missing image regions.
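
    To make the barcode idea concrete, the sketch below computes the 0-dimensional persistence bars of a Vietoris-Rips filtration on a small point set: every point is born as its own component at threshold 0, and a bar ends at the distance where two components merge. The names `h0_barcode` and `betti0_at`, the toy landmark coordinates and the simple thresholded count are assumptions for this example; the thesis's landmark selection and persistent binning may differ.

```python
import math
from itertools import combinations

def h0_barcode(points):
    """0-dimensional bars (birth, death) for a non-empty point set."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first: the order they enter the filtration.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    bars = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components, so one of them dies
            parent[ri] = rj
            bars.append((0.0, dist))
    bars.append((0.0, math.inf))  # one component survives at every threshold
    return bars

def betti0_at(bars, threshold):
    """Number of connected components alive at the given distance threshold."""
    return sum(1 for birth, death in bars if birth <= threshold < death)

# Hypothetical landmarks: two well-separated pairs of pixels on a line.
landmarks = [(0, 0), (1, 0), (5, 0), (6, 0)]
bars = h0_barcode(landmarks)
betti_curve = [betti0_at(bars, t) for t in (0.5, 2.0, 4.5)]
```

    Sampling the number of live bars at a fixed grid of thresholds, as in `betti_curve`, is one simple way to turn a barcode into a fixed-length feature vector for a classifier.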

    The role of public space in post-war reconstruction: the case of the redevelopment of Beirut city centre, Lebanon

    This research emerges from the author's observations and from concerns shared by many local and international architects, urban designers and planners about the policies and strategies adopted to reconstruct the city centre of Beirut following the Lebanese civil war (1975-1990). Post-war reconstruction needs to be seen as a process that carefully restores and preserves the urban fabric as well as culture and heritage, and it should not be perceived as a continuation of war by different means. A major postulate of this thesis is that post-war reconstruction is not just a physical phenomenon and needs to follow a holistic perspective that fulfils people's needs, perceptions and values. In other words, it is the unification of four attributes: the physical, socio-cultural, perceptual and functional attributes. Public space is imbued with these attributes, whose interrelationships, perceived through the transactional perspective, make it a holistic phenomenon. Space can be used to guide the ongoing process of post-war reconstruction, as well as the natural evolution and transformation of the environment. The research assumes that shared identity, cultural continuity and collective memory can be achieved through the transaction of people in the space. To fulfil the thesis objectives, theories and principles on public space are reviewed and examined. A contextual review of the war and post-war period of the city centre of Beirut uncovers major concerns regarding its reconstruction policies and strategies. Public opinion and preferences are elicited using an open-ended questionnaire. Cognitive mapping is also used to examine the collective memory of people about the city centre and its spaces. A comparative spatial analysis is also employed to identify changes in accessibility and integration levels between the pre-war and post-war spaces. The research outcome confirms that public space, through the transaction of people, provides the principles, qualities and meanings that respond to the authentic cultural forces and shared values of people, and the civic character of the city, which existed before the war and can still be seen shaping life today. The thesis follows a logical progression of four interrelated parts:
    PART ONE includes two chapters. Chapter One reviews a wide spectrum of literature on urban design principles. Chapter Two introduces attributes of public space.
    PART TWO comprises two chapters. Chapter Three focuses on reviewing the historical evolution of the old settlement of Beirut and its spaces, while Chapter Four outlines the implications of the civil war and the post-war reconstruction.
    PART THREE introduces the empirical work of the research in three chapters. Chapter Five reviews and analyses the questionnaire survey responses and results of 37 respondents. Chapter Six analyses the cognitive maps of the respondents using Lynch's five elements of The Image of the City. Chapter Seven presents the spatial analysis of the city centre of Beirut using space syntax (visibility graph analysis technique).
    PART FOUR is the concluding chapter. Chapter Eight examines the research findings and restates the thesis approach by proposing a framework for implementation and outlining its major characteristics.

    A retinal vasculature tracking system guided by a deep architecture

    Many diseases, such as diabetic retinopathy (DR) and cardiovascular diseases, show their early signs in the retinal vasculature. Analysing the vasculature in fundus images may provide a tool for ophthalmologists to diagnose eye-related diseases and to monitor their progression. These analyses may also facilitate the discovery of new relations between changes in the retinal vasculature and the existence or progression of related diseases, or help validate known relations. In this thesis, a data-driven method, namely a Translational Deep Belief Net (TDBN), is adapted to vasculature segmentation. The segmentation performance of the TDBN on low-resolution images was found to be comparable to that of the best-performing methods. This network is then used to implement super-resolution for the segmentation of high-resolution images. This approach accelerates segmentation by an amount related to the down-sampling ratio of the input fundus image. Finally, the TDBN is extended to generate probability maps for the existence of vessel parts, namely vessel interior, centreline, boundary and crossing/bifurcation patterns in centrelines. These probability maps are used to guide a probabilistic vasculature tracking system. Although segmentation can indicate where vasculature exists in a fundus image, it does not give quantifiable measures of the vasculature, which have more practical value in medical clinics. In the second half of the thesis, a retinal vasculature tracking system is presented. This system uses Particle Filters to describe vessel morphology and topology. Unlike previous studies, the guidance for tracking is provided by a combination of the probability maps generated by the TDBN. The experiments on a publicly available dataset, REVIEW, showed that the consistency of vessel widths predicted by the proposed method was better than that obtained from observers.
    Moreover, very noisy and low-contrast vessel boundaries, which were hardly identifiable to the naked eye, were accurately estimated by the proposed tracking system. Also, almost all bifurcation/crossing locations were detected during the course of tracking. Considering these promising initial results, future work involves analysing the performance of the tracking system on automatic detection of complete vessel networks in fundus images.

    Metropolitan landscape and urban art. The case of Tehran

    The urban landscape of Tehran currently presents significant problems connected with the rapid development of contemporary urban art. It therefore seemed necessary to analyse in depth the role that targeted urban art projects can play in overcoming these problems and contributing to the improvement of the urban landscape as a whole. In this thesis, urban art is therefore not regarded as a decorative factor but as part of the urban landscape project, involving both the subjective dimension (the positive impact on citizens) and the objective dimension (the quality of the urban landscape), studied through the urban and social context of the area or neighbourhood. This research therefore aims to define an urban art project explicitly intended to improve the metropolitan landscape. In particular, the thesis begins with an overview of urban art and its background in relation to metropolitan landscapes, in order to frame the scientific problem. A closer look at the cases of Berlin and Rome showed that there are urban art projects that bring significant benefits to metropolitan landscapes. Indeed, over the last twenty years, urban art projects have been formulated as a stimulus to cultural growth in the case of Rome, and urban art has been introduced as a genuine metropolitan development strategy in the case of Berlin. In parallel, the research was conducted by collecting and analysing data useful for highlighting the most critical problems of Tehran's urban landscape. Within this framework, the objective of the work was to define a plan for determining a method, classifying the problems and making targeted comparisons, also on the basis of the data provided by Rome, Berlin and Tehran.
    The work then proceeded by describing the various problems of Tehran and assessing how to use the lessons learned from the analysis of urban art projects in Rome and Berlin. The result of the work includes strategies, guidelines and operational recommendations. The strategies show how urban art could help overcome the problems of Tehran's urban landscape and how it could be used. The strategies are divided into: 1- Urban regeneration: abandoned urban space, degraded areas and lost spaces: revival of urban identity and legibility; 2- Creation of dynamic and creative open space; 3- Communication between citizen and city; 4- Cultural and social function: urban crime hotspots and spatial inequality/spatial injustice; 5- Economic and tourism benefits. The guidelines describe how to use urban artworks and refer to: 1- Systematic approach to placement; 2- Communication characteristics; 3- Project quality and selection procedures; 4- Administrative requirements: competitions and commissions; 5- Creation of identity; 6- Implementation of conservation and maintenance. The proposed suggestions are two methods that could support the practice of urban art in Tehran: 1- location of legal walls; 2- co-creation project. Finally, a simulation of an urban art project was also developed for the studied neighbourhood of Pamenar in Tehran, built with reference to the knowledge acquired.