
    Preference-based Representation Learning for Collections

    In this thesis, I contribute to the development of representation learning in the setting of external constraints and noisy supervision. A setting of external constraints is one in which the learner must output a latent representation of the given data points while enforcing particular conditions. These conditions can be geometric constraints, for example forcing the vector embeddings to be close to each other based on particular relations, forcing the embedding vectors to lie on a particular manifold (such as the manifold of vectors whose elements sum to 1), or even more complex constraints. The objects of interest in this thesis are elements of a collection X in an abstract space endowed with a similarity function that quantifies how similar two objects are. A collection is defined as a set of items in which the order is ignored but the multiplicity is relevant. Various types of collections are used as inputs or outputs in machine learning; the most common are perhaps sequences and sets. Besides studying representation learning in the presence of external constraints, this thesis tackles the case in which the similarity function cannot be evaluated directly. In recent years, the machine learning setting in which only binary answers to comparisons between tuples of elements are available has gained interest. Learning good representations when explicit distance information cannot be obtained is of fundamental importance. This setting is the opposite of the standard machine learning setting, in which the similarity function between elements can be evaluated directly. Moreover, we tackle the case in which the learner receives noisy supervision signals, where each label may be incorrect with a certain probability.
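The sum-to-one constraint mentioned above can be illustrated with a short sketch. It assumes embeddings are plain Python lists and that the constraint is enforced by projecting back onto the constraint set after each gradient step; projected gradient descent is our assumption here, not necessarily the method used in the thesis. The closest vector whose elements sum to 1 is obtained by spreading the violation evenly over all coordinates. `project_sum_to_one` and `projected_gradient_step` are hypothetical names, and non-negativity (the probability simplex) is deliberately not enforced.

```python
from typing import List


def project_sum_to_one(v: List[float]) -> List[float]:
    """Euclidean projection of v onto the affine set {x : sum(x) == 1}.

    The residual (1 - sum(v)) is distributed equally over all coordinates,
    so the correction is orthogonal to the constraint set.
    """
    shift = (1.0 - sum(v)) / len(v)
    return [x + shift for x in v]


def projected_gradient_step(v: List[float], grad: List[float], lr: float = 0.1) -> List[float]:
    """Hypothetical update: take a plain gradient step, then project back."""
    stepped = [x - lr * g for x, g in zip(v, grad)]
    return project_sum_to_one(stepped)


if __name__ == "__main__":
    embedding = [0.5, 0.9, -0.2]
    projected = project_sum_to_one(embedding)
    print(projected, sum(projected))  # the sum equals 1 up to floating-point error
```

Projection keeps every iterate feasible, which is one simple way to "force" a geometric constraint; penalty terms in the loss are a common alternative when exact feasibility is not required.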
Another research question studied in this thesis is how to assess the quality of the learned representations and how a learner can convey its uncertainty about them. After the introductory Chapter 1, the thesis is structured in three main parts. The first part presents representation learning results for data points that are sequences, focusing on two particular types of sequences: sentences and permutations. Its first contribution enforces analogical relations between sentences; the second learns appropriate representations for permutations, which are particular mathematical objects, using neural networks. The second part tackles the question of learning perceptual embeddings from binary and noisy comparisons, a problem referred to in machine learning as the ordinal embedding problem. This part contains two chapters that elaborate two different aspects of the problem: appropriately conveying the uncertainty of the representation, and learning the embeddings from aggregated and noisy feedback. Finally, the third part of the thesis contains applications of the findings of the previous part, namely unsupervised alignment of clouds of embedding vectors and entity set extension.
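The ordinal embedding setting can be sketched under a few stated assumptions: comparisons are triplets (i, j, k) read as "i is closer to j than to k", each answer is flipped independently with probability `flip_prob`, and embedding quality is measured as the fraction of triplets an embedding satisfies (one minus the triplet error). The function names and the noise model are illustrative assumptions, not the exact setup of the thesis.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]
Triplet = Tuple[int, int, int]  # (i, j, k): "i is closer to j than to k"


def noisy_triplets(points: List[Point], n: int, flip_prob: float, seed: int = 0) -> List[Triplet]:
    """Simulate n binary comparisons whose answer is flipped with probability flip_prob."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        i, j, k = rng.sample(range(len(points)), 3)
        # Order the pair so the triplet agrees with the true distances.
        if math.dist(points[i], points[j]) > math.dist(points[i], points[k]):
            j, k = k, j
        # Label noise: the oracle reports the wrong answer.
        if rng.random() < flip_prob:
            j, k = k, j
        out.append((i, j, k))
    return out


def triplet_accuracy(embedding: List[Point], triplets: List[Triplet]) -> float:
    """Fraction of triplets satisfied by the embedding (1 - triplet error)."""
    satisfied = sum(
        math.dist(embedding[i], embedding[j]) < math.dist(embedding[i], embedding[k])
        for i, j, k in triplets
    )
    return satisfied / len(triplets)


if __name__ == "__main__":
    rng = random.Random(42)
    truth = [(rng.random(), rng.random()) for _ in range(30)]
    observed = noisy_triplets(truth, 500, flip_prob=0.1)
    print("accuracy of the true points on noisy triplets:", triplet_accuracy(truth, observed))
```

On clean triplets the ground-truth points score 1.0; with label noise even the true configuration is expected to score about 1 - flip_prob, so noisy triplet accuracy alone understates how good an embedding is.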

    Pseudo-Random Bit Generator Using Chaotic Seed for Cryptographic Algorithm in Data Protection of Electric Power Consumption

    Cryptographic algorithms play an important role in information security and privacy protection. Many types of chaotic cryptosystems have been proposed in the literature; these systems encode information by exploiting the orbital instability and ergodicity of chaotic dynamics. In this work, a pseudo-random cryptographic generator algorithm with a symmetric key, based on chaotic functions, is proposed. The algorithm exploits dynamic simplicity and synchronization to generate encryption sub-keys from unpredictable seeds extracted from a chaotic zone, in order to increase their level of randomness. It is applied to a simulated electrical energy consumption signal and implemented on a prototype with low hardware resources that measures physical variables; the degree of unpredictability is then analyzed statistically using the resulting cryptogram. The pseudo-random sequences produced by the key generator are shown to have acceptable randomness properties, which are validated in this paper using the National Institute of Standards and Technology (NIST) statistical tests. To complement the evaluation of the encrypted data, the Lena image is encrypted and its metrics are compared with those reported in the literature, yielding some useful results.
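To make the idea of a chaos-based bit generator and a NIST randomness check concrete, here is a minimal sketch. It assumes the logistic map x <- r*x*(1 - x) with r = 3.99 as the chaotic function and a 0.5 threshold to extract bits; these are illustrative choices, not the paper's algorithm, which additionally relies on synchronization and chaotic seed extraction. The check shown is the frequency (monobit) test from NIST SP 800-22, only one test of that suite. A floating-point logistic-map generator like this should not be treated as cryptographically secure on its own.

```python
import math
from typing import List


def logistic_bits(seed: float, n: int, r: float = 3.99, burn_in: int = 1000) -> List[int]:
    """Generate n bits by thresholding logistic-map iterates at 0.5.

    r = 3.99 lies in the map's chaotic regime; the first burn_in iterates
    are discarded so the output is not trivially tied to the seed.
    """
    if not 0.0 < seed < 1.0:
        raise ValueError("seed must lie in (0, 1)")
    x = seed
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    bits = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        bits.append(1 if x > 0.5 else 0)
    return bits


def monobit_p_value(bits: List[int]) -> float:
    """NIST SP 800-22 frequency (monobit) test; p >= 0.01 is the usual pass level."""
    s = sum(2 * b - 1 for b in bits)          # map bits to -1/+1 and sum
    s_obs = abs(s) / math.sqrt(len(bits))     # normalized partial sum
    return math.erfc(s_obs / math.sqrt(2))


if __name__ == "__main__":
    stream = logistic_bits(seed=0.123456, n=10_000)
    print("monobit p-value:", monobit_p_value(stream))
```

Passing the monobit test only rules out a gross imbalance between zeros and ones; the full NIST suite and cryptanalysis of the key schedule are needed before drawing conclusions about security.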

    Molecular mechanisms of pathogenesis in Drosophila models of C9orf72 mutation associated ALS/FTD

    A GGGGCC hexanucleotide repeat expansion within the C9orf72 gene is the most common genetic cause of both amyotrophic lateral sclerosis and frontotemporal dementia (ALS/FTD). Toxicity has been proposed to arise either from loss of function of the gene or from a toxic gain of function, mediated either by the transcription of repetitive sense and antisense RNA molecules or by translation of that RNA into five dipeptide repeat proteins (DPRs) via repeat-associated non-ATG initiated translation. To fully assess the role of sense and antisense RNA in vivo, Drosophila models were created in which expression of sense or antisense RNA was induced while the formation of DPRs was suppressed. Despite the formation of cardinal pathological features (intranuclear RNA foci sequestering RNA-binding proteins), toxicity was not observed in these models, suggesting that repeat RNA plays a limited role in disease pathogenesis. When individual DPRs are expressed in Drosophila neurons, the arginine-containing DPRs (poly-GR and poly-PR) induce strong toxicity. To gain insight into the mechanism(s) of this toxicity, the protein interactome of these DPRs was investigated in vivo using novel transgenic Drosophila that inducibly express affinity-tagged DPR constructs, with interacting proteins identified by mass spectrometry. In parallel, inclusions of dipeptide repeat proteins were laser-capture microdissected from patient brain tissue and the enriched proteins were identified by mass spectrometry. The overlap of these datasets suggested that translation may be impaired by the arginine-containing DPRs, and methods were adapted to assess the rate of translation in adult Drosophila brains. In parallel, enzyme-linked immunosorbent assays (ELISAs) were developed against poly-GR and against an abundant non-toxic DPR, poly-GP.
Measurement of these proteins was performed in various model systems (transfected immortalised cell lines, induced pluripotent stem cell derived neurons, and Drosophila models) to confirm the validity of the assays and the potential therapeutic value of interventions.

    CP-nets: From Theory to Practice

    Conditional preference networks (CP-nets) exploit the power of ceteris paribus rules to represent preferences over combinatorial decision domains compactly. CP-nets have much appeal, but their study has not yet advanced sufficiently for widespread use in real-world applications. Known algorithms for deciding dominance (whether one outcome is better than another with respect to a CP-net) require exponential time. Data for CP-nets are difficult to obtain: human-subject data over combinatorial domains are not readily available, and earlier work on random generation is also problematic. In addition, much of the research on CP-nets makes strong, often unrealistic assumptions, such as that decision variables must be binary or that only strict preferences are permitted. In this thesis, I address such limitations to make CP-nets more useful. I show how to generate CP-nets uniformly at random, how to limit search depth in dominance testing given expectations about sets of CP-nets, and how to use local search to learn restricted classes of CP-nets from choice data.
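Dominance testing can be illustrated with a sketch for binary, acyclic CP-nets, where one outcome dominates another exactly when a sequence of improving flips (each changing one variable to its preferred value given its parents, all else held fixed) leads from the worse outcome to the better one. The breadth-first search may visit all 2^n outcomes, which shows why naive dominance testing is exponential. The `CPNet` class, the `dominates` function, and the two-variable example net are hypothetical; the depth-limited search developed in the thesis is not reproduced here.

```python
from collections import deque
from typing import Dict, Iterator, List, Tuple

Outcome = Tuple[int, ...]  # one binary value (0 or 1) per variable


class CPNet:
    """Binary CP-net: each variable's preferred value depends only on its parents."""

    def __init__(self, parents: List[List[int]], cpt: List[Dict[Tuple[int, ...], int]]):
        self.parents = parents  # parents[v]: indices of v's parent variables
        self.cpt = cpt          # cpt[v][parent values]: preferred value of v

    def preferred(self, outcome: Outcome, v: int) -> int:
        key = tuple(outcome[p] for p in self.parents[v])
        return self.cpt[v][key]

    def improving_flips(self, outcome: Outcome) -> Iterator[Outcome]:
        """Outcomes reachable by switching one variable to its preferred value,
        with every other variable unchanged (the ceteris paribus reading)."""
        for v in range(len(outcome)):
            best = self.preferred(outcome, v)
            if outcome[v] != best:
                yield outcome[:v] + (best,) + outcome[v + 1:]


def dominates(net: CPNet, better: Outcome, worse: Outcome) -> bool:
    """True iff an improving flipping sequence leads from `worse` to `better`."""
    if better == worse:
        return False
    seen = {worse}
    queue = deque([worse])
    while queue:
        current = queue.popleft()
        for nxt in net.improving_flips(current):
            if nxt == better:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


if __name__ == "__main__":
    # Hypothetical net: A=1 is preferred unconditionally; B should match A.
    net = CPNet(parents=[[], [0]], cpt=[{(): 1}, {(0,): 0, (1,): 1}])
    print(dominates(net, (1, 1), (0, 0)))  # True: (0,0) -> (1,0) -> (1,1)
    print(dominates(net, (0, 0), (1, 1)))  # False: (1,1) has no improving flip
```

Bounding the search depth, as the thesis proposes, trades completeness for speed: a search that stops early can miss a long improving sequence.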

    On Information-centric Resiliency and System-level Security in Constrained, Wireless Communication

    The Internet of Things (IoT) interconnects many heterogeneous embedded devices, either locally with each other or globally with the Internet. These things are resource-constrained (e.g., battery-powered) and typically communicate via low-power and lossy wireless links. Communication needs to be secured and relies on crypto-operations that are often resource-intensive and in conflict with the device constraints. These challenging operating conditions (the cheapest hardware possible, unreliable wireless transmission, and the need for protection against common threats of the inter-network) impose severe challenges on IoT networks. In this thesis, we advance the current state of the art in two dimensions. Part I assesses Information-centric networking (ICN) for the IoT, a network paradigm that promises enhanced reliability for data retrieval in constrained edge networks. ICN lacks a lower-layer definition, which is key to enabling device sleep cycles and exclusive wireless media access. This part of the thesis designs and evaluates an effective media access strategy for ICN that reduces energy consumption and wireless interference on constrained IoT nodes. Part II examines the performance of hardware and software crypto-operations executed on off-the-shelf IoT platforms. A novel system design makes crypto-hardware accessible and auto-configurable through an operating system. One main focus is the generation of random numbers in the IoT. This part of the thesis further designs and evaluates Physical Unclonable Functions (PUFs) as novel randomness sources that generate highly unpredictable secrets on low-cost devices lacking hardware-based security features. This thesis takes a practical view of the constrained IoT and is accompanied by real-world implementations and measurements.
We contribute open source software, automation tools, a simulator, and reproducible measurement results from real IoT deployments using off-the-shelf hardware. The large-scale experiments in an open access testbed provide a direct starting point for future research.
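PUF responses are commonly characterized with fractional Hamming distances: responses from different devices should differ in about half of their bits (uniqueness), while repeated readouts of the same device should stay close to its enrolled reference (reliability). The sketch below computes both metrics, assuming responses are lists of 0/1 bits; the function names are hypothetical, and the evaluation methodology in the thesis may differ.

```python
from itertools import combinations
from typing import List, Sequence


def fractional_hd(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of bit positions in which two equal-length responses differ."""
    if len(a) != len(b):
        raise ValueError("responses must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def uniqueness(responses: List[Sequence[int]]) -> float:
    """Mean pairwise distance between different devices; ideally close to 0.5."""
    pairs = list(combinations(responses, 2))
    return sum(fractional_hd(a, b) for a, b in pairs) / len(pairs)


def reliability(reference: Sequence[int], readouts: List[Sequence[int]]) -> float:
    """1 minus the mean distance of repeated readouts to the reference; ideally 1.0."""
    return 1.0 - sum(fractional_hd(reference, r) for r in readouts) / len(readouts)


if __name__ == "__main__":
    devices = [[0, 1, 1, 0, 1, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1], [0, 0, 1, 1, 1, 1, 0, 0]]
    print("uniqueness:", uniqueness(devices))
    print("reliability:", reliability(devices[0], [[0, 1, 1, 0, 1, 0, 1, 1]]))
```

Uniqueness near 0.5 suggests responses are hard to predict across devices, while imperfect reliability is why PUF-derived keys usually need error correction before use.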

    Scalable multimedia indexing and similarity search in high dimensionality

    Advisor: Ricardo da Silva Torres. Dissertation (master's), Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: The spread of large collections of images, videos, and music has increased the demand for indexing methods and multimedia information retrieval systems. For images, the most promising search engines are content-based: instead of textual annotations, they use feature vectors that represent visual properties such as color, texture, and shape. Feature vectors of the query image and of the database images are matched by similarity search, whose most common form is k-nearest-neighbor search: finding the k vectors closest to the query vector. In large image databases, an index structure is essential to speed up these queries. The problem is that feature vectors may have many dimensions, which seriously degrades the performance of indexing methods. Above 10 dimensions, it is often necessary to resort to approximate methods, trading effectiveness for speed. Among the many proposed solutions is an approach based on fractal curves known as space-filling curves. These curves map a multidimensional space onto a single dimension so that points close on the curve correspond to points close in space. The main problem with this approach is the existence of discontinuity regions on the curves: points close to each other in those regions are not mapped close together on the curve. The main contribution of this dissertation is an indexing method for high-dimensional feature vectors that uses a single space-filling curve and multiple surrogates for each data point. This method, called MONORAIL, generates surrogates by exploiting the geometric properties of the curve, which improves the effectiveness of similarity search compared with the baseline method. Another non-trivial contribution of this work is its rigorous experimental design: the experiments were carefully designed to ensure statistically sound results. The scalability of MONORAIL is tested on three databases of different sizes, the largest with more than 130 million vectors.
    Degree: Master's in Computer Science.
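The space-filling-curve idea can be sketched with a Z-order (Morton) curve, which maps a grid point to one dimension by interleaving coordinate bits. To soften the discontinuity problem, each point in this sketch also gets one key per randomly shifted copy of the curve, and a query inspects a small window of curve neighbors in every copy before re-ranking the candidates by exact distance. The Z-order curve, the random shifts, the 8-bit integer coordinates, and the `window` parameter are all assumptions of this sketch; MONORAIL uses a single curve and derives its surrogates from the curve's geometric properties, which this code does not reproduce.

```python
import bisect
import random
from typing import List, Sequence

BITS = 8             # hypothetical: coordinates are integers in [0, 2**BITS)
KEY_BITS = BITS + 1  # shifted coordinates can reach 2**(BITS + 1) - 2


def morton_key(point: Sequence[int]) -> int:
    """Position of a grid point on the Z-order curve (bits interleaved, MSB first)."""
    key = 0
    for bit in range(KEY_BITS - 1, -1, -1):
        for coord in point:
            key = (key << 1) | ((coord >> bit) & 1)
    return key


def dist2(a: Sequence[int], b: Sequence[int]) -> int:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_knn(points: List[Sequence[int]], q: Sequence[int], k: int) -> List[int]:
    """Exact k-NN baseline: rank every point by its distance to q."""
    return sorted(range(len(points)), key=lambda i: (dist2(points[i], q), i))[:k]


class CurveIndex:
    """Approximate k-NN with one key per point in each shifted copy of the curve."""

    def __init__(self, points, n_shifts=3, window=4, seed=0):
        rng = random.Random(seed)
        dim = len(points[0])
        self.points = points
        self.window = window
        # Unshifted curve plus random offsets: a pair split by a discontinuity
        # in one copy is unlikely to be split in every copy.
        self.shifts = [(0,) * dim] + [
            tuple(rng.randrange(2 ** BITS) for _ in range(dim)) for _ in range(n_shifts - 1)
        ]
        self.tables = [
            sorted((morton_key(self._shift(p, s)), i) for i, p in enumerate(points))
            for s in self.shifts
        ]

    @staticmethod
    def _shift(p, s):
        return tuple(c + o for c, o in zip(p, s))

    def query(self, q, k):
        candidates = set()
        for s, table in zip(self.shifts, self.tables):
            pos = bisect.bisect_left(table, (morton_key(self._shift(q, s)), -1))
            for _, i in table[max(0, pos - self.window): pos + self.window]:
                candidates.add(i)
        # Re-rank the small candidate set with the exact distance.
        return sorted(candidates, key=lambda i: (dist2(self.points[i], q), i))[:k]


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [(rng.randrange(256), rng.randrange(256)) for _ in range(1000)]
    index = CurveIndex(pts)
    q = (120, 45)
    approx, exact = index.query(q, 5), brute_force_knn(pts, q, 5)
    print("recall@5:", len(set(approx) & set(exact)) / 5)
```

Recall against `brute_force_knn` on the same queries measures the effectiveness lost to the approximation; raising `n_shifts` or `window` trades speed for recall.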