Preference-based Representation Learning for Collections
In this thesis, I make some contributions to the development of representation learning in the setting of external constraints and noisy supervision. A setting of external constraints refers to the scenario in which the learner is forced to output a latent representation of the given data points while enforcing some particular conditions. These conditions can be geometrical constraints, for example forcing the vector embeddings to be close to each other based on particular relations, or forcing the embedding vectors to lie in a particular manifold, such as the manifold of vectors whose elements sum to 1, or even more complex constraints. The objects of interest in this thesis are elements of a collection X in an abstract space that is endowed with a similarity function which quantifies how similar two objects are. A collection is defined as a set of items in which the order is ignored but the multiplicity is relevant. Various types of collections are used as inputs or outputs in the machine learning field. The most common are perhaps sequences and sets.
Besides studying representation learning approaches in the presence of external constraints, in this thesis we tackle the case in which the evaluation of this similarity function is not directly possible. In recent years, the machine learning setting of having only binary answers to some comparisons for tuples of elements has gained interest. Learning good representations from a scenario in which clear distance information cannot be obtained is of fundamental importance. This setting contrasts with the standard machine learning setting, where the similarity function between elements can be directly evaluated. Moreover, we tackle the case in which the learner is given noisy supervision signals, with a certain probability for the label to be incorrect. Another research question that was studied in this thesis is how to assess the quality of the learned representations and how a learner can convey the uncertainty about this representation.
After the introductory Chapter 1, the thesis is structured in three main parts. In the first part, I present the results of representation learning based on data points that are sequences. The focus in this part is on sentences and permutations, which are particular types of sequences. The first contribution of this part consists in enforcing analogical relations between sentences, and the second is learning appropriate representations for permutations, which are particular mathematical objects, while using neural networks. The second part of this thesis tackles the question of learning perceptual embeddings from binary and noisy comparisons. In machine learning, this problem is referred to as the ordinal embedding problem. This part contains two chapters which elaborate on two different aspects of the problem: appropriately conveying the uncertainty of the representation and learning the embeddings from aggregated and noisy feedback. Finally, the third part of the thesis contains applications of the findings of the previous part, namely unsupervised alignment of clouds of embedding vectors and entity set extension.
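The comparison-based setting described above can be illustrated with a small sketch. It is not the method of the thesis: the function names, the triplet format (i, j, k, answer) and the flip probability are assumptions made only for illustration. The first function simulates a binary answer to "is i closer to j than to k?" that is wrong with a given probability, and the second measures how many such answers a candidate embedding reproduces.

```python
import random


def noisy_triplet_answer(dist, i, j, k, flip_prob, rng):
    """Answer 'is i closer to j than to k?', flipping the answer with probability flip_prob.

    `dist` is a hidden distance function that the learner cannot evaluate directly.
    """
    truth = dist(i, j) < dist(i, k)
    return (not truth) if rng.random() < flip_prob else truth


def triplet_agreement(embedding, triplets):
    """Fraction of (i, j, k, answer) triplets that the embedding reproduces."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(embedding[a], embedding[b]))

    hits = sum((sq_dist(i, j) < sq_dist(i, k)) == answer for i, j, k, answer in triplets)
    return hits / len(triplets)


# Hypothetical hidden configuration: four items on a line.
hidden = {0: (0.0,), 1: (1.0,), 2: (3.0,), 3: (7.0,)}


def hidden_dist(a, b):
    return abs(hidden[a][0] - hidden[b][0])


def collect_triplets(flip_prob, seed=0):
    """Query every ordered triplet of distinct items once."""
    rng = random.Random(seed)
    items = list(hidden)
    return [
        (i, j, k, noisy_triplet_answer(hidden_dist, i, j, k, flip_prob, rng))
        for i in items for j in items for k in items
        if len({i, j, k}) == 3
    ]
```

With `flip_prob=0` the hidden configuration agrees with every collected answer, and the agreement of any fixed embedding tends to drop as the flip probability grows. An ordinal embedding method would search for coordinates that maximise this kind of agreement.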
Pseudo-Random Bit Generator Using Chaotic Seed for Cryptographic Algorithm in Data Protection of Electric Power Consumption
Cryptographic algorithms have played an important role in information security for protecting privacy. Many types of chaotic cryptosystems have been proposed in the literature. These chaotic systems encode information to obviate its orbital instability and ergodicity. In this work, a pseudo-random cryptographic generator algorithm with a symmetric key, based on chaotic functions, is proposed. Moreover, the algorithm exploits dynamic simplicity and synchronization to generate encryption sub-keys using unpredictable seeds, extracted from a chaotic zone, in order to increase their level of randomness. It is also applied to a simulated electrical energy consumption signal and implemented on a prototype, using low hardware resources, to measure physical variables; the degree of unpredictability was then statistically analyzed using the resulting cryptogram. It is shown that the pseudo-random sequences produced by the cryptographic key generator have acceptable properties with respect to randomness, which are validated in this paper using National Institute of Standards and Technology (NIST) statistical tests. To complement the evaluation of the encrypted data, the Lena image is coded and its metrics are compared with those reported in the literature, yielding some useful results.
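As a rough illustration of how a chaotic function can drive a bit generator, the sketch below iterates the logistic map and thresholds each iterate at 0.5. The abstract does not name the chaotic function or the sub-key scheme, so the choice of the logistic map, the parameter r = 3.99, the burn-in length and all function names are assumptions. The result is a toy keystream for illustration only and should not be treated as cryptographically secure.

```python
def logistic_bits(seed, n_bits, r=3.99, burn_in=100):
    """Return n_bits bits obtained by thresholding logistic-map iterates at 0.5.

    seed is the initial condition and must lie strictly between 0 and 1.
    r = 3.99 is an assumed parameter inside the chaotic regime of the map.
    """
    if not 0.0 < seed < 1.0:
        raise ValueError("seed must lie strictly between 0 and 1")
    x = seed
    for _ in range(burn_in):  # discard the transient iterates
        x = r * x * (1.0 - x)
    bits = []
    for _ in range(n_bits):
        x = r * x * (1.0 - x)
        bits.append(1 if x >= 0.5 else 0)
    return bits


def keystream_bytes(seed, n_bytes):
    """Pack the generated bits into bytes, most significant bit first."""
    bits = logistic_bits(seed, 8 * n_bytes)
    out = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def xor_with_keystream(data, seed):
    """Symmetric toy cipher: XOR the data with the chaotic keystream."""
    key = keystream_bytes(seed, len(data))
    return bytes(d ^ k for d, k in zip(data, key))
```

Because XOR is its own inverse, applying `xor_with_keystream` twice with the same seed recovers the input, which is the symmetric-key property the abstract refers to. Statistical quality would still have to be checked separately, for example with the NIST tests mentioned above.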
Molecular mechanisms of pathogenesis in Drosophila models of C9orf72 mutation associated ALS/FTD
A GGGGCC hexanucleotide repeat expansion within the C9orf72 gene is the most common genetic cause of both amyotrophic lateral sclerosis and frontotemporal dementia (ALS/FTD). Toxicity has been proposed to be due to loss of function of the gene, or to a toxic gain of function, mediated either by the transcription of repetitive sense and antisense RNA molecules, or by translation of RNA into five repetitive dipeptide proteins (DPRs) via repeat-associated non-ATG initiated translation. In order to fully assess the role of sense and antisense RNA in vivo, Drosophila models were created where expression of sense or antisense RNA was induced whilst suppressing the formation of DPRs. Despite the formation of cardinal pathological features (RNA binding protein sequestering intranuclear RNA foci), toxicity was not observed in these models, suggesting that repeat RNA plays a limited role in disease pathogenesis. When individual DPRs are expressed in Drosophila neurons, a strong toxicity is induced by the arginine-containing DPRs (poly-GR and poly-PR). To gain insight into the mechanism(s) by which this toxicity occurs, the protein interactome of these DPRs was investigated in vivo using novel transgenic Drosophila that inducibly express affinity-tagged DPR constructs, with identification of interacting proteins using mass spectrometry. In parallel, inclusions of dipeptide proteins were laser-capture microdissected from patient brain tissue and enriched proteins identified by mass spectrometry. The overlap of these datasets suggested that translation may be impaired by the arginine-containing DPRs, and methods were adapted to assess the rate of translation in adult Drosophila brains. In parallel, enzyme-linked immunosorbent assays (ELISAs) were developed against poly-GR and an abundant non-toxic DPR, poly-GP. Measurement of these proteins was performed in various model systems (transfected immortalised cell lines, induced pluripotent stem cell derived neurons, Drosophila models) to confirm the validity of the assays and the potential therapeutic value of interventions.
Updating the PECAS Modeling Framework to Include Energy Use Data for Buildings
This study investigates the consumption of electricity and natural gas for building operations for several categories of residential and non-residential buildings. The study updates the Production Exchange Consumption Allocation System (PECAS) land use modeling framework to include energy components. An energy database was assembled to study energy consumption in buildings. The authors conducted statistical analysis of utility data and estimated linear regression models to predict energy consumption in buildings. Results are validated using data from independent sources, including the California Residential Appliance Saturation Study (RASS) and the Commercial End-Use Survey (CEUS). Results are used to update PECAS and form part of the baseline study to estimate energy and greenhouse gas balances in an urban metabolism framework for the analysis of the environmental impacts of complex urban regions. Applied to the total residential and commercial building inventory in the region, the results also allow the total energy consumption and greenhouse gas emissions for residential and commercial building operations to be estimated. These results are then useful for the evaluation of possible energy savings in buildings.
CP-nets: From Theory to Practice
Conditional preference networks (CP-nets) exploit the power of ceteris paribus rules to represent preferences over combinatorial decision domains compactly. CP-nets have much appeal. However, their study has not yet advanced sufficiently for their widespread use in real-world applications. Known algorithms for deciding dominance, that is, whether one outcome is better than another with respect to a CP-net, require exponential time. Data for CP-nets are difficult to obtain: human subjects data over combinatorial domains are not readily available, and earlier work on random generation is also problematic. Also, much of the research on CP-nets makes strong, often unrealistic assumptions, such as that decision variables must be binary or that only strict preferences are permitted. In this thesis, I address such limitations to make CP-nets more useful. I show how: to generate CP-nets uniformly at random; to limit search depth in dominance testing given expectations about sets of CP-nets; and to use local search for learning restricted classes of CP-nets from choice data.
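To make dominance testing concrete, the sketch below searches for a sequence of improving flips in a tiny CP-net, with an optional depth limit in the spirit of the depth-limited search mentioned above. The two-variable network, its preference tables and the function names are hypothetical, and the breadth-first search is a generic illustration rather than the algorithm developed in the thesis.

```python
from collections import deque

# Hypothetical CP-net: A has no parents; the preference over B depends on A.
# Each table maps a tuple of parent values to the variable's values, best first.
PARENTS = {"A": (), "B": ("A",)}
CPT = {
    "A": {(): ["a1", "a0"]},
    "B": {("a1",): ["b1", "b0"], ("a0",): ["b0", "b1"]},
}


def improving_flips(outcome):
    """Yield outcomes reached by changing one variable to a more preferred value,
    all other variables being held fixed (ceteris paribus)."""
    for var, parents in PARENTS.items():
        order = CPT[var][tuple(outcome[p] for p in parents)]
        rank = order.index(outcome[var])
        for better_value in order[:rank]:
            yield {**outcome, var: better_value}


def dominates(better, worse, max_depth=None):
    """Return True if some improving flip sequence leads from `worse` to `better`.

    max_depth bounds the number of flips explored; None means no bound.
    """
    goal = tuple(sorted(better.items()))
    start = tuple(sorted(worse.items()))
    if start == goal:
        return False  # an outcome does not strictly dominate itself
    seen = {start}
    queue = deque([(worse, 0)])
    while queue:
        outcome, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for nxt in improving_flips(outcome):
            key = tuple(sorted(nxt.items()))
            if key == goal:
                return True
            if key not in seen:
                seen.add(key)
                queue.append((nxt, depth + 1))
    return False
```

In this toy network the outcome with A = a1 and B = b1 dominates the one with A = a0 and B = b0 through two flips, so a search limited to one flip fails to find it. The number of outcomes grows exponentially with the number of variables, which is why bounding the search depth matters in practice.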
On Information-centric Resiliency and System-level Security in Constrained, Wireless Communication
The Internet of Things (IoT) interconnects many heterogeneous embedded devices, either locally with each other or globally with the Internet. These things are resource-constrained, e.g., powered by battery, and typically communicate via low-power and lossy wireless links. Communication needs to be secured and relies on crypto-operations that are often resource-intensive and in conflict with the device constraints. These challenging operational conditions on the cheapest hardware possible, the unreliable wireless transmission, and the need for protection against common threats of the inter-network impose severe challenges on IoT networks. In this thesis, we advance the current state of the art in two dimensions.
Part I assesses Information-centric networking (ICN) for the IoT, a network paradigm that promises enhanced reliability for data retrieval in constrained edge networks. ICN lacks a lower layer definition, which, however, is the key to enable device sleep cycles and exclusive wireless media access. This part of the thesis designs and evaluates an effective media access strategy for ICN to reduce the energy consumption and wireless interference on constrained IoT nodes.
Part II examines the performance of hardware and software crypto-operations, executed on off-the-shelf IoT platforms. A novel system design enables the accessibility and auto-configuration of crypto-hardware through an operating system. One main focus is the generation of random numbers in the IoT. This part of the thesis further designs and evaluates Physical Unclonable Functions (PUFs) to provide novel randomness sources that generate highly unpredictable secrets, on low-cost devices that lack hardware-based security features.
This thesis takes a practical view of the constrained IoT and is accompanied by real-world implementations and measurements. We contribute open source software, automation tools, a simulator, and reproducible measurement results from real IoT deployments using off-the-shelf hardware. The large-scale experiments in an open access testbed provide a direct starting point for future research.
Scalable multimedia indexing and similarity search in high dimensionality
Advisor: Ricardo da Silva Torres. Dissertation (Master's), Universidade Estadual de Campinas, Instituto de Computação.
Abstract: The spread of large collections of images, videos and music has increased the demand for indexing methods and multimedia information retrieval systems. For images, the most promising search engines are content-based: instead of using textual annotations, they use feature vectors to represent visual properties such as color, texture, and shape. The matching of the feature vectors of the query image and the database images is implemented by similarity search. Its most common form is the k nearest neighbors search, which aims to find the k closest vectors to the query vector. In large image databases, an index structure is essential to speed up those queries. The problem is that the feature vectors may have many dimensions, which seriously affects the performance of indexing methods. For more than 10 dimensions, it is often necessary to use approximate methods that trade off effectiveness for speed. Among the several solutions proposed, there is an approach based on fractal curves known as space-filling curves. Those curves allow the mapping of a multidimensional space onto a single dimension, so that points that are close on the curve correspond to points that are close in the space. The main problem with that alternative is the existence of discontinuity regions on the curves: points that are close to each other near those regions are not mapped close together on the curve. The main contribution of this dissertation is an indexing method for high-dimensional feature vectors, using a single space-filling curve and multiple surrogates for each data point. That method, called MONORAIL, generates surrogates by exploiting the geometric properties of the curve. The result is a gain in the effectiveness of similarity search when compared to the baseline method. Another non-trivial contribution of this work is the rigorous experimental design used for the comparisons. The experiments were carefully designed to ensure statistically sound results. The scalability of MONORAIL is tested with three databases of different sizes, the largest one with more than 130 million vectors.
Master's degree; Computer Science; Master in Computer Science.
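The general idea of indexing with a space-filling curve can be sketched as follows. This is not MONORAIL and it does not generate surrogates: the Z-order (Morton) curve is chosen only because it is short to implement, and the function names, the 8-bit integer coordinates and the search window are assumptions. Points are mapped to a one-dimensional key, sorted by that key, and a query inspects only the entries whose keys lie near its own before re-ranking them by true distance.

```python
import bisect


def morton_key(point, bits=8):
    """Interleave the bits of non-negative integer coordinates (Z-order curve)."""
    key = 0
    for bit in range(bits - 1, -1, -1):
        for coord in point:
            key = (key << 1) | ((coord >> bit) & 1)
    return key


def build_index(points, bits=8):
    """Sort the points by their position on the curve."""
    return sorted((morton_key(p, bits), p) for p in points)


def approx_knn(index, query, k, window=8, bits=8):
    """Approximate k nearest neighbours: inspect only entries whose curve keys
    are close to the query's key, then re-rank them by squared Euclidean distance."""
    keys = [key for key, _ in index]
    pos = bisect.bisect_left(keys, morton_key(query, bits))
    lo, hi = max(0, pos - window), min(len(index), pos + window)
    candidates = [p for _, p in index[lo:hi]]
    candidates.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, query)))
    return candidates[:k]
```

A small `window` makes the search fast but can miss true neighbours that fall on the other side of a discontinuity of the curve, which is the weakness the abstract describes. Enlarging the window trades speed back for effectiveness.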
The Black Mountain phase occupation at Old Town: an examination of social and technological organization in the Mimbres Valley of southwestern New Mexico, ca. A.D. 1150 - 1300
The Black Mountain phase of the Mimbres Mogollon cultural tradition, dating from around A.D. 1150 through A.D. 1300, is perhaps the most poorly understood time period of the entire Mimbres sequence. During that time, people inhabiting the Mimbres Valley of southwestern New Mexico adopted new ceramic sequences, ceased producing Black-on-white pottery, adopted new architectural styles, and possibly changed mortuary patterns. These changes have been interpreted in a multitude of ways that can be reduced to models of continuity and discontinuity. Unfortunately, these models and interpretations rest on a very limited set of data that comes largely from three moderately tested Black Mountain phase sites in the Mimbres Valley proper: Montoya, Old Town, and Walsh. Thus, arguments for or against either model based on the presence or absence of particular traits are necessarily limited by the modest data from these three sites. It was in this context of opposing interpretations that other aspects of the lifeways of Black Mountain phase peoples were analyzed. Specifically, I look at the ways lithic and ceramic technologies were organized to assess whether the changes that occurred during the Black Mountain phase also represent changes in the ways social systems were organized. I believe that certain aspects of material culture, such as ceramic or architectural style, are easily changed, whereas the social mechanisms responsible for their production are more resistant to change. The results of these analyses demonstrate that there are more similarities than differences with respect to the manner in which technologies were organized during the time periods traditionally accepted as representing “Mimbres” manifestations and the Black Mountain phase. Thus, the social mechanisms dictating the processes of production, distribution, transmission, and reproduction appear to be similar from the Pithouse periods through the Black Mountain phase. This research adds to the growing body of evidence that suggests continuity between the Classic period inhabitants of the Mimbres area and later Black Mountain phase peoples.
Anthropology