673 research outputs found

    Diversity in similarity joins

    Get PDF
    With the increasing ability of current applications to produce and consume more complex data, such as images and geographic information, the similarity join has attracted considerable attention. However, this operator does not consider the relationship among the elements in the answer, generating results with many pairs similar among themselves, which does not add value to the final answer. Result diversification methods are intended to retrieve elements similar enough to satisfy the similarity conditions, but also considering the diversity among the elements in the answer, producing a more heterogeneous result with smaller cardinality, which improves the meaning of the answer. Still, diversity have been studied only when applied to unary operations. In this paper, we introduce the concept of diverse similarity joins: a similarity join operator that ensures a smaller, more diversified and useful answers. The experiments performed on real and synthetic datasets show that our proposal allows exploiting diversity in similarity joins without diminish their performance whereas providing elements that cover the same data space distribution of the non-diverse answers.FAPESPCNPQCAPESRescuer (EU Commission Grant 614154 and CNPQ/MCTI Grant 490084/2013-3)International Conference on Similarity Search and Applications - SISAP (8. 2015 Glasgow

    Braneworld dynamics with the BraneCode

    Full text link
    We give a full nonlinear numerical treatment of time-dependent 5d braneworld geometry, which is determined self-consistently by potentials for the scalar field in the bulk and at two orbifold branes, supplemented by boundary conditions at the branes. We describe the BraneCode, an algorithm which we designed to solve the dynamical equations numerically. We applied the BraneCode to braneworld models and found several novel phenomena of the brane dynamics. Starting with static warped geometry with de Sitter branes, we found numerically that this configuration is often unstable due to a tachyonic mass of the radion during inflation. If the model admits other static configurations with lower values of de Sitter curvature, this effect causes a violent re-structuring towards them, flattening the branes, which appears as a lowering of the 4d effective cosmological constant. Braneworld dynamics can often lead to brane collisions. We found that in the presence of the bulk scalar field, the 5d geometry between colliding branes approaches a universal, homogeneous, anisotropic strong gravity Kasner-like asymptotic, irrespective of the bulk/brane potentials. The Kasner indices of the brane directions are equal to each other but different from that of the extra dimension.Comment: 38 pages, 10 figure

    Implementation for spatial data of the shared nearest neighbour with metric data structures

    Get PDF
    Dissertação para obtenção do Grau de Mestre em Engenharia Informátic

    Towards a Framework for DHT Distributed Computing

    Get PDF
    Distributed Hash Tables (DHTs) are protocols and frameworks used by peer-to-peer (P2P) systems. They are used as the organizational backbone for many P2P file-sharing systems due to their scalability, fault-tolerance, and load-balancing properties. These same properties are highly desirable in a distributed computing environment, especially one that wants to use heterogeneous components. We show that DHTs can be used not only as the framework to build a P2P file-sharing service, but as a P2P distributed computing platform. We propose creating a P2P distributed computing framework using distributed hash tables, based on our prototype system ChordReduce. This framework would make it simple and efficient for developers to create their own distributed computing applications. Unlike Hadoop and similar MapReduce frameworks, our framework can be used both in both the context of a datacenter or as part of a P2P computing platform. This opens up new possibilities for building platforms to distributed computing problems. One advantage our system will have is an autonomous load-balancing mechanism. Nodes will be able to independently acquire work from other nodes in the network, rather than sitting idle. More powerful nodes in the network will be able use the mechanism to acquire more work, exploiting the heterogeneity of the network. By utilizing the load-balancing algorithm, a datacenter could easily leverage additional P2P resources at runtime on an as needed basis. Our framework will allow MapReduce-like or distributed machine learning platforms to be easily deployed in a greater variety of contexts

    An Unsupervised Cluster: Learning Water Customer Behavior Using Variation of Information on a Reconstructed Phase Space

    Get PDF
    The unsupervised clustering algorithm described in this dissertation addresses the need to divide a population of water utility customers into groups based on their similarities and differences, using only the measured flow data collected by water meters. After clustering, the groups represent customers with similar consumption behavior patterns and provide insight into ‘normal’ and ‘unusual’ customer behavior patterns. This research focuses upon individually metered water utility customers and includes both residential and commercial customer accounts serviced by utilities within North America. The contributions of this dissertation not only represent a novel academic work, but also solve a practical problem for the utility industry. This dissertation introduces a method of agglomerative clustering using information theoretic distance measures on Gaussian mixture models within a reconstructed phase space. The clustering method accommodates a utility’s limited human, financial, computational, and environmental resources. The proposed weighted variation of information distance measure for comparing Gaussian mixture models places emphasis upon those behaviors whose statistical distributions are more compact over those behaviors with large variation and contributes a novel addition to existing comparison options

    Identifying Online Sexual Predators Using Support Vector Machine

    Get PDF
    A two-stage classification model is built in the research for online sexual predator identification. The first stage identifies the suspicious conversations that have predator participants. The second stage identifies the predators in suspicious conversations. Support vector machines are used with word and character n-grams, combined with behavioural features of the authors to train the final classifier. The unbalanced dataset is downsampled to test the performance of re-balancing an unbalanced dataset. An age group classification model is also constructed to test the feasibility of extracting the age profile of the authors, which can be used as features for classifier training. The e↵ect of re-balancing the unbalanced dataset resulted in a better performance of the classifier. Testing the two-stage classification model on the unseen test set, 171 out of 254 predators are successfully identified giving a precision of 0.85, recall of 0.67 and f-score of 0.807. Comparing the classification performance with and without the behavioural feature, it can be seen the n-gram contributed the most to the performance of the classifier, while the behavioural features do not contribute significantly to the performance

    Enhancement of Query processing on XML data

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Contributions to security and privacy protection in recommendation systems

    Get PDF
    A recommender system is an automatic system that, given a customer model and a set of available documents, is able to select and offer those documents that are more interesting to the customer. From the point of view of security, there are two main issues that recommender systems must face: protection of the users' privacy and protection of other participants of the recommendation process. Recommenders issue personalized recommendations taking into account not only the profile of the documents, but also the private information that customers send to the recommender. Hence, the users' profiles include personal and highly sensitive information, such as their likes and dislikes. In order to have a really useful recommender system and improve its efficiency, we believe that users shouldn't be afraid of stating their preferences. The second challenge from the point of view of security involves the protection against a new kind of attack. Copyright holders have shifted their targets to attack the document providers and any other participant that aids in the process of distributing documents, even unknowingly. In addition, new legislation trends such as ACTA or the ¿Sinde-Wert law¿ in Spain show the interest of states all over the world to control and prosecute these intermediate nodes. we proposed the next contributions: 1.A social model that captures user's interests into the users' profiles, and a metric function that calculates the similarity between users, queries and documents. This model represents profiles as vectors of a social space. Document profiles are created by means of the inspection of the contents of the document. Then, user profiles are calculated as an aggregation of the profiles of the documents that the user owns. Finally, queries are a constrained view of a user profile. This way, all profiles are contained in the same social space, and the similarity metric can be used on any pair of them. 2.Two mechanisms to protect the personal information that the user profiles contain. The first mechanism takes advantage of the Johnson-Lindestrauss and Undecomposability of random matrices theorems to project profiles into social spaces of less dimensions. Even if the information about the user is reduced in the projected social space, under certain circumstances the distances between the original profiles are maintained. The second approach uses a zero-knowledge protocol to answer the question of whether or not two profiles are affine without leaking any information in case of that they are not. 3.A distributed system on a cloud that protects merchants, customers and indexers against legal attacks, by means of providing plausible deniability and oblivious routing to all the participants of the system. We use the term DocCloud to refer to this system. DocCloud organizes databases in a tree-shape structure over a cloud system and provide a Private Information Retrieval protocol to avoid that any participant or observer of the process can identify the recommender. This way, customers, intermediate nodes and even databases are not aware of the specific database that answered the query. 4.A social, P2P network where users link together according to their similarity, and provide recommendations to other users in their neighborhood. We defined an epidemic protocol were links are established based on the neighbors similarity, clustering and randomness. Additionally, we proposed some mechanisms such as the use SoftDHT to aid in the identification of affine users, and speed up the process of creation of clusters of similar users. 5.A document distribution system that provides the recommended documents at the end of the process. In our view of a recommender system, the recommendation is a complete process that ends when the customer receives the recommended document. We proposed SCFS, a distributed and secure filesystem where merchants, documents and users are protectedEste documento explora c omo localizar documentos interesantes para el usuario en grandes redes distribuidas mediante el uso de sistemas de recomendaci on. Se de fine un sistema de recomendaci on como un sistema autom atico que, dado un modelo de cliente y un conjunto de documentos disponibles, es capaz de seleccionar y ofrecer los documentos que son m as interesantes para el cliente. Las caracter sticas deseables de un sistema de recomendaci on son: (i) ser r apido, (ii) distribuido y (iii) seguro. Un sistema de recomendaci on r apido mejora la experiencia de compra del cliente, ya que una recomendaci on no es util si es que llega demasiado tarde. Un sistema de recomendaci on distribuido evita la creaci on de bases de datos centralizadas con informaci on sensible y mejora la disponibilidad de los documentos. Por ultimo, un sistema de recomendaci on seguro protege a todos los participantes del sistema: usuarios, proveedores de contenido, recomendadores y nodos intermedios. Desde el punto de vista de la seguridad, existen dos problemas principales a los que se deben enfrentar los sistemas de recomendaci on: (i) la protecci on de la intimidad de los usuarios y (ii) la protecci on de los dem as participantes del proceso de recomendaci on. Los recomendadores son capaces de emitir recomendaciones personalizadas teniendo en cuenta no s olo el per l de los documentos, sino tambi en a la informaci on privada que los clientes env an al recomendador. Por tanto, los per les de usuario incluyen informaci on personal y altamente sensible, como sus gustos y fobias. Con el n de desarrollar un sistema de recomendaci on util y mejorar su e cacia, creemos que los usuarios no deben tener miedo a la hora de expresar sus preferencias. Para ello, la informaci on personal que est a incluida en los per les de usuario debe ser protegida y la privacidad del usuario garantizada. El segundo desafi o desde el punto de vista de la seguridad implica un nuevo tipo de ataque. Dado que la prevenci on de la distribuci on ilegal de documentos con derechos de autor por medio de soluciones t ecnicas no ha sido efi caz, los titulares de derechos de autor cambiaron sus objetivos para atacar a los proveedores de documentos y cualquier otro participante que ayude en el proceso de distribuci on de documentos. Adem as, tratados y leyes como ACTA, la ley SOPA de EEUU o la ley "Sinde-Wert" en España ponen de manfi esto el inter es de los estados de todo el mundo para controlar y procesar a estos nodos intermedios. Los juicios recientes como MegaUpload, PirateBay o el caso contra el Sr. Pablo Soto en España muestran que estas amenazas son una realidad

    On the p-Laplace operator on Riemannian manifolds

    Get PDF
    This thesis covers different aspects of the p-Laplace operators on Riemannian manifolds. Chapter 2. Potential theoretic aspects: the Khasmkinskii condition. Chapter 3: sharp eigenvalue estimates with Ricci curvature lower bounds. Chapter 4: Critical sets of (2-)harmonic functions.Comment: PhD Thesis: Contains results obtained in collaboration with other mathematicians, see section 1.4 for details. ADDED IN THIS VERSION: correction of few typos, and added a reference brought to our attention by an anonymous referee. Details in the introduction, end of section 1.
    corecore