
    Hashing for Similarity Search: A Survey

    Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are the smallest. Various methods have been developed to address this problem, and recently a lot of effort has been devoted to approximate search. In this paper, we present a survey on one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measure, and search scheme in the hash coding space.
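    To make the distinction concrete, below is a minimal, illustrative Python sketch of one classic locality sensitive hashing scheme (sign random projections for cosine similarity); all function and variable names are ours, not the survey's, and real systems use many tables and multi-probe search.

```python
# Minimal sketch of locality sensitive hashing via random hyperplanes
# (sign random projections for cosine similarity). Illustrative only.
import numpy as np

def make_hash_function(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes; each bit is the sign of a projection."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, dim))
    def hash_vector(x):
        # Items with small angular distance tend to agree on many bits.
        return tuple((planes @ x > 0).astype(int))
    return hash_vector

# Usage: bucket database items by code, then compare the query only
# against candidates that fall into the same bucket.
h = make_hash_function(dim=128, n_bits=16)
database = np.random.randn(1000, 128)
buckets = {}
for i, v in enumerate(database):
    buckets.setdefault(h(v), []).append(i)
query = np.random.randn(128)
candidates = buckets.get(h(query), [])
```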

    Research on large-scale graph data analysis using eigenvalue decomposition and tensor decomposition (固有値分解とテンソル分解を用いた大規模グラフデータ分析に関する研究)

    筑波大学 (University of Tsukuba), 201

    Validation of structural heterogeneity in cryo-EM data by cluster ensembles (Validação de heterogeneidade estrutural em dados de Crio-ME por comitês de agrupadores)

    Single Particle Analysis is a technique that allows the study of the three-dimensional structure of proteins and other macromolecular assemblies of biological interest. Its primary data consist of transmission electron microscopy images of multiple copies of the molecule in random orientations. Such images are very noisy due to the low electron dose employed. 3D reconstructions can be obtained by averaging many images of particles in similar orientations and estimating their relative angles. However, heterogeneous conformational states often coexist in the sample, because the molecular complexes can be flexible and may also interact with other particles. Heterogeneity poses a challenge to the reconstruction of reliable 3D models and degrades their resolution. Among the most popular algorithms used for structural classification are k-means clustering, hierarchical clustering, self-organizing maps, and maximum-likelihood estimators. Such approaches are usually interlaced with the reconstruction of the 3D models. Nevertheless, recent works indicate that it is possible to infer information about the structure of the molecules directly from the set of 2D projections. Among these findings is the relationship between structural variability and manifolds in a multidimensional feature space. This dissertation investigates whether an ensemble of unsupervised classification algorithms is able to separate these "conformational manifolds". Ensemble or "consensus" methods tend to provide more accurate classification and may achieve satisfactory performance across a wide range of datasets, compared with individual algorithms. We investigate the behavior of six clustering algorithms, both individually and combined in ensembles, for the task of structural heterogeneity classification. The approach was tested on synthetic and real datasets containing a mixture of projection images of the Mm-cpn chaperonin in the "open" and "closed" states. It is shown that cluster ensembles can provide useful information for validating structural partitionings independently of 3D reconstruction methods. Advisors: Fernando José Von Zuben, Rodrigo Villares Portugal. Master's dissertation (Engenharia de Computação), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.
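    For illustration, the Python sketch below shows one common way to build a cluster ensemble, via an evidence-accumulation (co-association) matrix over several base clusterings. It is a simplified stand-in under our own assumptions (random stand-in features, two clusters), not the dissertation's actual pipeline or its six algorithms.

```python
# Minimal sketch of a cluster ensemble via a co-association (consensus)
# matrix. Illustrative only; not the dissertation's exact method.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

def co_association(labelings, n_items):
    """Fraction of base clusterings in which each pair of items lands in the same cluster."""
    C = np.zeros((n_items, n_items))
    for labels in labelings:
        C += (labels[:, None] == labels[None, :]).astype(float)
    return C / len(labelings)

# Base clusterings (e.g. k-means with different seeds, plus hierarchical clustering).
X = np.random.randn(200, 50)  # stand-in for features extracted from 2D projections
labelings = [KMeans(n_clusters=2, n_init=10, random_state=s).fit_predict(X)
             for s in range(5)]
labelings.append(AgglomerativeClustering(n_clusters=2).fit_predict(X))

# Consensus partition: cluster the items using (1 - co-association) as a
# precomputed distance matrix.
D = 1.0 - co_association(labelings, X.shape[0])
consensus = AgglomerativeClustering(n_clusters=2, metric="precomputed",
                                    linkage="average").fit_predict(D)
```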

    Using hypergraph theory to model coexistence management and coordinated spectrum allocation for heterogeneous wireless networks operating in shared spectrum

    Electromagnetic waves in the Radio Frequency (RF) spectrum are used to convey wireless transmissions from one radio antenna to another. The spectrum utilisation factor, which refers to how readily a given spectrum can be reused across space and time while maintaining an acceptable level of transmission errors, is used to measure how efficiently a unit of frequency spectrum can be allocated to a specified number of users. The demand for wireless applications is increasing exponentially, hence there is a need for efficient management of the RF spectrum. However, spectrum usage studies have shown that the spectrum is under-utilised in space and time. A regulatory shift from static spectrum assignment to Dynamic Spectrum Access (DSA) is one way of addressing this. Licence exemption policy has also been advanced in DSA systems to spur wireless innovation and universal access to the internet. Furthermore, there is a shift from homogeneous to heterogeneous radio access and usage of the same spectrum band. These three shifts from traditional spectrum management have led to the challenge of coexistence among heterogeneous wireless networks which access the spectrum using DSA techniques. Cognitive radios are capable of spectrum agility based on spectrum conditions. However, in the presence of multiple heterogeneous networks and without spectrum coordination, there is a challenge in switching between available channels to minimise interference and maximise spectrum allocation. This thesis therefore focuses on the design of a framework for coexistence management and spectrum coordination, with the objective of maximising spectrum utilisation across geographical space and across time. The geographical area in which a frequency can be used is optimised through frequency reuse while ensuring that harmful interference is minimised. The time during which spectrum is occupied is increased through time-sharing of the same spectrum by two or more networks, while ensuring that spectrum is shared only by networks that can coexist in the same spectrum and that the total channel load is not excessive, to prevent spectrum starvation. Conventionally, a graph is used to model relationships between entities, such as interference relationships among networks. However, the concept of an edge in a graph is not sufficient to model relationships that involve more than two entities, such as more than two networks that are able to share the same channel in the time domain, because an edge can only connect two entities. A hypergraph, on the other hand, is a generalisation of an undirected graph in which a hyperedge can connect more than two entities. Therefore, this thesis investigates the use of hypergraph theory to model the RF environment and the spectrum allocation scheme. The hypergraph model was applied to an algorithm for spectrum sharing among 100 heterogeneous wireless networks, whose geo-locations were randomly and independently generated in a 50 km by 50 km area. Simulation results for spectrum utilisation performance show that the hypergraph-based model allocated channels, on average, to 8% more networks than the graph-based model. The results also show that, for the same RF environment, the hypergraph model requires up to 36% fewer channels than the graph model to achieve, on average, 100% operational networks. The running time of the hypergraph-based algorithm grows quadratically with the input size, the same asymptotic complexity as the graph-based algorithm. Thus, the hypergraph model achieves better performance at no additional time complexity.
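    As a rough illustration of why hyperedges help here, the Python sketch below groups networks that can time-share a channel into hyperedges and allocates one channel per hyperedge with a simple greedy rule. The network names, hyperedge sets, and greedy strategy are invented for the example; they are not the thesis's actual data or algorithm.

```python
# Minimal sketch of the hypergraph idea: a hyperedge groups several networks
# that can time-share one channel, whereas a plain graph edge can only relate
# two networks at a time. Illustrative only.
from itertools import count

# Coexistence hyperedges: each hyperedge is a set of networks whose combined
# channel load is assumed low enough to share a single channel.
networks = {"A", "B", "C", "D", "E"}
hyperedges = [{"A", "B", "C"},   # A, B, C can time-share one channel
              {"D", "E"},        # D and E can share another
              {"C", "D"}]

def greedy_allocate(networks, hyperedges):
    """Assign one channel per hyperedge, preferring larger hyperedges for reuse."""
    channel_of = {}
    channel_ids = count(1)
    for edge in sorted(hyperedges, key=len, reverse=True):
        uncovered = edge - channel_of.keys()
        if uncovered:
            ch = next(channel_ids)
            for net in edge:
                channel_of.setdefault(net, ch)
    return channel_of

print(greedy_allocate(networks, hyperedges))
# e.g. {'A': 1, 'B': 1, 'C': 1, 'D': 2, 'E': 2} -- five networks, two channels
```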

    High-Quality Hypergraph Partitioning

    This dissertation focuses on computing high-quality solutions for the NP-hard balanced hypergraph partitioning problem: given a hypergraph and an integer k, partition its vertex set into k disjoint blocks of bounded size, while minimizing an objective function over the hyperedges. Here, we consider the two most commonly used objectives: the cut-net metric and the connectivity metric. Since the problem is computationally intractable, heuristics are used in practice - the most prominent being the three-phase multi-level paradigm: during coarsening, the hypergraph is successively contracted to obtain a hierarchy of smaller instances. After applying an initial partitioning algorithm to the smallest hypergraph, contraction is undone and, at each level, refinement algorithms try to improve the current solution. With this work, we give a brief overview of the field and present several algorithmic improvements to the multi-level paradigm. Instead of using a logarithmic number of levels like traditional algorithms, we present two coarsening algorithms that create a hierarchy of (nearly) n levels, where n is the number of vertices. This makes consecutive levels as similar as possible and provides many opportunities for refinement algorithms to improve the partition. This approach is made feasible in practice by tailoring all algorithms and data structures to the n-level paradigm, and by developing lazy-evaluation techniques, caching mechanisms, and early stopping criteria to speed up the partitioning process. Furthermore, we propose a sparsification algorithm based on locality-sensitive hashing that improves the running time for hypergraphs with large hyperedges, and show that incorporating global information about the community structure into the coarsening process improves quality. Moreover, we present a portfolio-based initial partitioning approach, and propose three refinement algorithms. Two are based on the Fiduccia-Mattheyses (FM) heuristic, but perform a highly localized search at each level. While one is designed for two-way partitioning, the other is the first FM-style algorithm that can be efficiently employed in the multi-level setting to directly improve k-way partitions. The third algorithm uses max-flow computations on pairs of blocks to refine k-way partitions. Finally, we present the first memetic multi-level hypergraph partitioning algorithm for an extensive exploration of the global solution space. All contributions are made available through our open-source framework KaHyPar. In a comprehensive experimental study, we compare KaHyPar with hMETIS, PaToH, Mondriaan, Zoltan-AlgD, and HYPE on a wide range of hypergraphs from several application areas. Our results indicate that KaHyPar, already without the memetic component, computes better solutions than all competing algorithms for both the cut-net and the connectivity metric, while being faster than Zoltan-AlgD and as fast as hMETIS. Moreover, KaHyPar compares favorably with the current best graph partitioning system KaFFPa - both in terms of solution quality and running time.
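    For reference, the two objectives can be computed for a given k-way partition as in the short Python sketch below. This is an illustrative toy example with unit net weights, not KaHyPar code, and the small hypergraph and partition are invented for the demonstration.

```python
# Minimal sketch of the cut-net and connectivity objectives for a k-way
# hypergraph partition (unit net weights). Illustrative only.
def cut_net(hyperedges, block_of):
    """Number of hyperedges that span more than one block."""
    return sum(1 for net in hyperedges
               if len({block_of[v] for v in net}) > 1)

def connectivity(hyperedges, block_of):
    """Sum over nets of (lambda - 1), where lambda is the number of blocks a net touches."""
    return sum(len({block_of[v] for v in net}) - 1 for net in hyperedges)

# Example: 6 vertices, 3 hyperedges, a 2-way partition.
hyperedges = [(0, 1, 2), (2, 3, 4), (4, 5)]
block_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(cut_net(hyperedges, block_of))       # 1: only net (2, 3, 4) is cut
print(connectivity(hyperedges, block_of))  # 1: the cut net touches 2 blocks
```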