460 research outputs found

    SSL Framework for Causal Inconsistency between Structures and Representations

    The cross-pollination of deep learning and causal discovery has catalyzed a burgeoning field of research seeking to elucidate causal relationships within non-statistical data forms like images, videos, and text. Such data, often called `indefinite data', exhibit a challenge that is uncommon in conventional data forms: inconsistency between causal structure and representation. To tackle this issue, we theoretically develop intervention strategies suitable for indefinite data and derive a causal consistency condition (CCC). Moreover, we design a self-supervised learning (SSL) framework that treats interventions as `views' and the CCC as a `philosophy', with two implementation examples on Supervised Specialized Models (SSMs) and Large Language Models (LLMs), respectively. To evaluate pure inconsistency manifestations, we have prepared the first high-quality causal dialogue dataset, Causalogue. Evaluations are also performed on three other downstream tasks. Extensive experimentation substantiates the efficacy of our methodology and illustrates how the CCC could play an influential role in various fields.
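    A minimal sketch of how interventions could serve as SSL `views' under a consistency objective, assuming a contrastive (InfoNCE-style) formulation; the encoder, the two intervention functions and the temperature below are illustrative placeholders, not the paper's actual implementation:

```python
import torch
import torch.nn.functional as F

def causal_view_consistency_loss(encoder, x, intervene_a, intervene_b, temperature=0.1):
    # Two different interventions on the same batch act as two "views".
    z_a = F.normalize(encoder(intervene_a(x)), dim=-1)
    z_b = F.normalize(encoder(intervene_b(x)), dim=-1)
    # InfoNCE-style objective: representations of the two interventions on the
    # same sample should agree more than those of different samples.
    logits = z_a @ z_b.t() / temperature
    targets = torch.arange(x.size(0), device=x.device)
    return F.cross_entropy(logits, targets)
```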

    On orthogonal projections for dimension reduction and applications in augmented target loss functions for learning problems

    The use of orthogonal projections on high-dimensional input and target data in learning frameworks is studied. First, we investigate the relations between two standard objectives in dimension reduction: preservation of variance and of pairwise relative distances. Investigations of their asymptotic correlation, as well as numerical experiments, show that a projection usually does not satisfy both objectives at once. In a standard classification problem, we determine projections on the input data that balance the two objectives and compare the subsequent results. Next, we extend our application of orthogonal projections to deep learning tasks and introduce a general framework of augmented target loss functions. These loss functions integrate additional information via transformations and projections of the target data. In two supervised learning problems, clinical image segmentation and music information classification, the application of our proposed augmented target loss functions increases accuracy.
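    A minimal sketch of the augmented-target-loss idea, assuming a mean-squared base loss and a list of target transformations (for example an orthogonal projection); the function name, transforms and weights are illustrative, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def augmented_target_loss(pred, target, transforms, weights):
    # Standard data term on the raw targets...
    loss = F.mse_loss(pred, target)
    # ...plus extra terms measured after transforming/projecting both the
    # prediction and the target (e.g. with an orthogonal projection).
    for transform, w in zip(transforms, weights):
        loss = loss + w * F.mse_loss(transform(pred), transform(target))
    return loss

# Example with a random orthogonal projection P onto a k-dimensional subspace:
# P = torch.linalg.qr(torch.randn(target_dim, k)).Q   # columns are orthonormal
# loss = augmented_target_loss(pred, target, [lambda y: y @ P], [0.5])
```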

    Unsupervised learning of Arabic non-concatenative morphology

    Unsupervised approaches to learning the morphology of a language play an important role in computer processing of language, from both a practical and a theoretical perspective, due to their minimal reliance on manually produced linguistic resources and human annotation. Such approaches have been widely researched for the problem of concatenative affixation, but less attention has been paid to the intercalated (non-concatenative) morphology exhibited by Arabic and other Semitic languages. The aim of this research is to learn the root and pattern morphology of Arabic with accuracy comparable to manually built morphological analysis systems. The approach is kept free from human supervision or manual parameter settings, assuming only that roots and patterns intertwine to form a word. Promising results were obtained by applying a technique adapted from previous work in concatenative morphology learning, which uses machine learning to determine relatedness between words. The output, with probabilistic relatedness values between words, was then used to rank all possible roots and patterns to form a lexicon. Analysis using triliteral roots resulted in correct root identification accuracy of approximately 86% for inflected words. Although the machine-learning-based approach is effective, it is conceptually complex, so an alternative, simpler and more computationally efficient approach was devised that obtains morpheme scores from comparative counts of roots and patterns. In this approach, root and pattern scores are defined in terms of each other in a mutually recursive relationship, converging to an optimized morpheme ranking. This technique gives slightly better accuracy while being conceptually simpler and more efficient. The approach, after further enhancements, was evaluated on a version of the Quranic Arabic Corpus, attaining a final accuracy of approximately 93%. A comparative evaluation shows this to be superior to two existing, widely used manually built Arabic stemmers, demonstrating the practical feasibility of unsupervised learning of non-concatenative morphology.
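    A toy sketch of the mutually recursive scoring idea under an assumed normalisation scheme: a root is scored by the patterns it co-occurs with and vice versa, iterated until the ranking settles. The function and data layout are hypothetical, not the thesis' exact equations:

```python
def score_roots_and_patterns(word_analyses, n_iter=20):
    # word_analyses: {word: [(root, pattern), ...]} listing all candidate
    # analyses of each word; both score tables start from a uniform value of 1.
    root_score, pattern_score = {}, {}
    for _ in range(n_iter):
        new_root, new_pattern = {}, {}
        for analyses in word_analyses.values():
            for root, pattern in analyses:
                # A root is supported by the scores of the patterns it occurs
                # with, and vice versa (the mutually recursive definition).
                new_root[root] = new_root.get(root, 0.0) + pattern_score.get(pattern, 1.0)
                new_pattern[pattern] = new_pattern.get(pattern, 0.0) + root_score.get(root, 1.0)
        # Normalise so the two rankings stay on a comparable scale.
        for table, new in ((root_score, new_root), (pattern_score, new_pattern)):
            total = sum(new.values()) or 1.0
            table.clear()
            table.update({k: v / total for k, v in new.items()})
    return root_score, pattern_score
```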

    Information Extraction and Modeling from Remote Sensing Images: Application to the Enhancement of Digital Elevation Models

    To deal with high-complexity data such as remote sensing images with metric resolution over large areas, an innovative, fast and robust image processing system is presented. Modeling of increasing levels of information is used to extract, represent and link image features to semantic content. The potential of the proposed techniques is demonstrated with an application to enhance and regularize digital elevation models based on information collected from remote sensing images.

    Holistic interpretation of visual data based on topology: semantic segmentation of architectural facades

    The work presented in this dissertation is a step towards effectively incorporating contextual knowledge in the task of semantic segmentation. To date, the use of context has been largely confined to the genre of the scene, with a few exceptions in the field; research has instead been directed towards enhancing appearance descriptors. While this is unarguably important, recent studies show that computer vision has reached a near-human level of performance when relying on these descriptors, provided objects have stable, distinctive surface properties and imaging conditions are favourable. When these conditions are not met, humans exploit their knowledge of the intrinsic geometric layout of the scene to make local decisions; computer vision lags behind when it comes to this asset. For this reason, we aim to bridge the gap by presenting algorithms for semantic segmentation of building facades that make use of topological aspects of the scene. We provide a classification scheme that carries out segmentation and recognition simultaneously: the algorithm solves a single optimization function and yields a semantic interpretation of facades, relying on the modeling power of probabilistic graphs and efficient discrete combinatorial optimization tools. We also tackle semantic facade segmentation with a neural-network approach, attaining accuracy on par with the state of the art in a fully automated pipeline: pixelwise classifications obtained via Convolutional Neural Networks (CNNs) are structurally validated through a cascade of Restricted Boltzmann Machines (RBMs) and a Multi-Layer Perceptron (MLP) that regenerates the most likely layout. In the domain of architectural modeling, we address geometric multi-model fitting: we introduce a novel guided sampling algorithm based on Minimum Spanning Trees (MSTs), which surpasses other propagation techniques in terms of robustness to noise. We make a number of additional contributions, such as a measure of model deviation that captures variations among fitted models.
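    A rough sketch of MST-guided hypothesis sampling for multi-model fitting under simplifying assumptions (Euclidean point distances, random walks along tree edges); the dissertation's actual algorithm may score and propagate samples differently:

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_guided_samples(points, sample_size, n_samples, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # Minimum spanning tree over pairwise point distances.
    mst = minimum_spanning_tree(cdist(points, points)).toarray()
    neighbours = [np.flatnonzero((mst[i] > 0) | (mst[:, i] > 0)) for i in range(len(points))]
    samples = []
    for _ in range(n_samples):
        # Grow each minimal sample set along tree edges from a random seed, so
        # co-sampled points tend to lie on the same geometric model.
        seed = int(rng.integers(len(points)))
        chosen, frontier = {seed}, {int(j) for j in neighbours[seed]}
        while len(chosen) < sample_size and frontier:
            nxt = int(rng.choice(sorted(frontier)))
            chosen.add(nxt)
            frontier |= {int(j) for j in neighbours[nxt]} - chosen
            frontier.discard(nxt)
        samples.append(sorted(chosen))
    return samples
```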

    Learning-based Segmentation for Connectomics

    Recent advances in electron microscopy techniques make it possible to acquire high-resolution, isotropic volume images of neural circuitry. In connectomics, neuroscientists seek to obtain the circuit diagram involving all neurons and synapses in such a volume image. Mapping neuron connectivity requires tracing each and every neural process through terabytes of image data. Due to the size and complexity of these volume images, fully automated analysis methods are desperately needed. In this thesis, I consider automated, machine-learning-based neurite segmentation approaches based on simultaneous merge decisions over adjacent supervoxels.
    - Given a learned likelihood of merging adjacent supervoxels, Chapter 4 adapts a probabilistic graphical model which ensures that merge decisions are consistent and the surfaces of final segments are closed. This model can be posed as a multicut optimization problem and is solved with the cutting-plane method. In order to scale to large datasets, a fast search for (and a good choice of) violated cycle constraints is crucial. Quantitative experiments show that the proposed closed-surface regularization significantly improves segmentation performance.
    - In Chapter 5, I investigate whether the edge weights of the previous model can be chosen to minimize the loss with respect to non-local segmentation quality measures (e.g. the Rand index). Suitable weights are obtained from a structured learning approach. In the Structured Support Vector Machine formulation, a novel fast enumeration scheme is used to find the most violated constraint. Quantitative experiments show that structured learning can improve upon unstructured methods. Furthermore, I introduce a new approximate, hierarchical and blockwise optimization approach for large-scale multicut segmentation. Using this method, high-quality approximate solutions for large problem instances are found quickly.
    - Chapter 6 introduces another novel approximate scheme for multicut segmentation, Cut, Glue & Cut, which is based on the move-making paradigm. First, the graph is recursively partitioned into small regions (cut phase). Then, for any two adjacent regions, alternative cuts of these two regions define possible moves (glue & cut phase). The proposed algorithm finds segmentations that are, as measured by a loss function, as close to the ground truth as the global optimum found by exact solvers, while being significantly faster than existing methods.
    - In order to jointly label the resulting segments as well as the boundaries between segments, Chapter 7 proposes the Asymmetric Multi-way Cut model, a variant of Multi-way Cut. In this new model, within-class cuts are allowed for some labels while being forbidden for others. Qualitative experiments show when such a formulation can be beneficial. In particular, an application to joint neurite and cell organelle labeling in EM volume images is discussed.
    - Custom software tools that can cope with the large data volumes common in the field of connectomics are a prerequisite for the implementation and evaluation of novel segmentation techniques. Chapter 3 presents version 1.0 of ilastik, a joint effort of multiple researchers; I have co-written its volume viewing component, volumina. ilastik provides an interactive pixel classification workflow on larger-than-RAM datasets as well as a semi-automated segmentation module useful for acquiring gold-standard segmentations. Furthermore, I describe new software for dealing with hierarchies of cell complexes as well as for blockwise image processing operations on large datasets.
    The different segmentation methods presented in this thesis provide a promising direction towards reaching the required reliability as well as the data throughput necessary for connectomics applications.
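    A toy illustration of the multicut consistency condition behind the closed-surface regularization: a merge/cut labelling of the supervoxel adjacency graph is consistent iff no cycle contains exactly one cut edge. The helper below only detects violated edges, which a cutting-plane solver would turn into new cycle inequalities; it is a sketch, not the thesis' optimizer:

```python
import networkx as nx

def violated_cut_edges(graph, cut):
    # graph: undirected supervoxel adjacency graph.
    # cut:   dict mapping each edge frozenset({u, v}) to True (cut) / False (merge).
    merged = nx.Graph()
    merged.add_nodes_from(graph.nodes)  # keep isolated supervoxels
    merged.add_edges_from(e for e in graph.edges if not cut[frozenset(e)])
    # Two supervoxels belong to the same segment iff uncut edges connect them.
    comp = {}
    for i, cc in enumerate(nx.connected_components(merged)):
        comp.update({v: i for v in cc})
    # A cut edge whose endpoints remain connected via merge edges closes a
    # cycle with exactly one cut edge, i.e. a violated cycle constraint.
    return [e for e in graph.edges
            if cut[frozenset(e)] and comp[e[0]] == comp[e[1]]]
```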

    Object-based video representations: shape compression and object segmentation

    Object-based video representations are considered useful for easing the process of multimedia content production and enhancing user interactivity in multimedia productions. Object-based video presents several new technical challenges, however. Firstly, as with conventional video representations, compression of the video data is a requirement. For object-based representations, it is necessary to compress the shape of each video object as it moves in time; this amounts to the compression of moving binary images. It is achieved by the use of a technique called context-based arithmetic encoding. The technique is applied to rectangular pixel blocks and as such is consistent with the standard tools of video compression. The block-based application also facilitates the exploitation of temporal redundancy in the sequence of binary shapes. For the first time, context-based arithmetic encoding is used in conjunction with motion compensation to provide inter-frame compression. The method, described in this thesis, has been thoroughly tested throughout the MPEG-4 core experiment process and, due to favourable results, has been adopted as part of the MPEG-4 video standard. The second challenge lies in the acquisition of the video objects. Under normal conditions, a video sequence is captured as a sequence of frames, and there is no inherent information about what objects are in the sequence, let alone information relating to the shape of each object. Some means of segmenting semantic objects from general video sequences is required. For this purpose, several image analysis tools may be of help, and in particular it is believed that video object tracking algorithms will be important. A new tracking algorithm is developed based on piecewise polynomial motion representations and statistical estimation tools, e.g. the expectation-maximisation method and the minimum description length principle.
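    A small sketch of the context-modelling step behind context-based arithmetic encoding of binary shapes: each pixel's probability is estimated from a causal template of already-coded neighbours. The 4-pixel template and Laplace-smoothed counts are simplifying assumptions (the MPEG-4 CAE templates are larger), and the arithmetic coder itself is omitted:

```python
import numpy as np

# Illustrative 4-pixel causal template (offsets relative to the current pixel).
TEMPLATE = [(-1, 0), (-1, -1), (0, -1), (-1, 1)]

def context_index(alpha, y, x):
    # Pack the already-coded neighbouring shape pixels into a context number;
    # pixels outside the block are treated as 0 in this sketch.
    ctx = 0
    for dy, dx in TEMPLATE:
        yy, xx = y + dy, x + dx
        inside = 0 <= yy < alpha.shape[0] and 0 <= xx < alpha.shape[1]
        ctx = (ctx << 1) | (int(alpha[yy, xx]) if inside else 0)
    return ctx

def adaptive_probabilities(alpha):
    # Estimate, per context, the probability that the shape pixel is 1; an
    # arithmetic coder would code each pixel with its context's current
    # probability and update the counts on the fly.
    counts = np.ones((1 << len(TEMPLATE), 2))  # Laplace-smoothed counts
    for y in range(alpha.shape[0]):
        for x in range(alpha.shape[1]):
            counts[context_index(alpha, y, x), int(alpha[y, x])] += 1
    return counts[:, 1] / counts.sum(axis=1)
```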