Search CORE

673 research outputs found

Diversity in similarity joins

Author: Carvalho Luiz Olmes
Oliveira Willian Dener de
Santos Lúcio Fernandes Dutra
Traina Junior Caetano
Traina Agma Juci Machado
Publication venue: Cham

With the increasing ability of current applications to produce and consume more complex data, such as images and geographic information, the similarity join has attracted considerable attention. However, this operator does not consider the relationships among the elements in the answer, so it generates results containing many pairs that are similar to one another, which adds no value to the final answer. Result diversification methods aim to retrieve elements that are similar enough to satisfy the similarity conditions while also considering the diversity among the elements in the answer, producing a more heterogeneous result with smaller cardinality and a more meaningful answer. Still, diversity has been studied only for unary operations. In this paper, we introduce the concept of diverse similarity joins: a similarity join operator that ensures smaller, more diverse, and more useful answers. Experiments on real and synthetic datasets show that our proposal allows exploiting diversity in similarity joins without diminishing their performance, while providing elements that cover the same data space distribution as the non-diverse answers.
Funding: FAPESP, CNPQ, CAPES, Rescuer (EU Commission Grant 614154 and CNPQ/MCTI Grant 490084/2013-3). International Conference on Similarity Search and Applications - SISAP (8th, 2015, Glasgow)
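To make the difference between a plain similarity join and a diversified one concrete, here is a minimal Python sketch. It is not the authors' operator: it assumes Euclidean 2-D points, a range join with threshold `xi`, and a hypothetical greedy rule in which a candidate pair is discarded when both of its elements lie within `radius` of the corresponding elements of a pair already kept. The function names are illustrative only.

```python
import math
from itertools import product

Point = tuple[float, float]
Pair = tuple[Point, Point]


def range_join(R: list[Point], S: list[Point], xi: float) -> list[Pair]:
    """Plain similarity (range) join: every pair (r, s) with dist(r, s) <= xi."""
    return [(r, s) for r, s in product(R, S) if math.dist(r, s) <= xi]


def pair_distance(p: Pair, q: Pair) -> float:
    """Hypothetical pair-to-pair distance: small only if BOTH sides are close."""
    return max(math.dist(p[0], q[0]), math.dist(p[1], q[1]))


def diverse_range_join(R: list[Point], S: list[Point], xi: float, radius: float) -> list[Pair]:
    """Greedy diversification of the join result (a sketch, not the paper's operator)."""
    # Visit candidate pairs from most to least similar, so relevance is kept.
    candidates = sorted(range_join(R, S, xi), key=lambda pair: math.dist(*pair))
    kept: list[Pair] = []
    for pair in candidates:
        # Keep the pair only if no already-kept pair is a near-duplicate of it.
        if all(pair_distance(pair, k) > radius for k in kept):
            kept.append(pair)
    return kept
```

For example, with `R = [(0, 0), (0.1, 0), (5, 5)]`, `S = [(0, 0.5), (5, 5.4)]` and `xi = 1.0`, the plain join returns three pairs, while `diverse_range_join(R, S, 1.0, 0.5)` keeps two: `((0.1, 0), (0, 0.5))` is dropped as a near-duplicate of `((0, 0), (0, 0.5))`. The nested-loop join is only for clarity; a practical implementation would rely on a metric index.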

ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms

Author: Alexander Faithfull
Alexandr Andoni
Amsaleg
Bentley
Christiani
Ciaccia
Curtin
Edel
Erik Bernhardsson
Heo
Herlocker
Houle
Hyvönen
Iwasaki
Johnson
Kirner
Kriegel
Laarhoven
LeCun
Levina
Malkov
Martin Aumüller
Pálmason
Van Rijn
Wang
Williams
Zezula
Publication venue: Elsevier BV
Publication date: 01/01/2019

Crossref

The IT University of Copenhagen's Repository

Braneworld dynamics with the BraneCode

Author: A. Lukas
A. Mennim
A.V. Frolov
Andrei V. Frolov
C. Csaki
C. Herdeiro
C.P. Burgess
D. Langlois
E.E. Flanagan
G.N. Felder
G.R. Dvali
G.W. Gibbons
Gary N. Felder
H.A. Chamblin
J. de Boer
J.E. Lidsey
Johannes Martin
K.i. Maeda
L. Randall
Lev A. Kofman
Marco Peloso
N. Arkani-Hamed
O. DeWolfe
P. Binetruy
P. Bowcock
P. Horava
P. Kanti
P. Kraus
P.J. Steinhardt
S. Förste
S. Kachru
S. Mukohyama
T. Damour
T. Shiromizu
T. Tanaka
U. Gien
V.A. Belinsky
V.A. Rubakov
W.D. Goldberger
Publication venue: American Physical Society (APS)
Publication date: 01/01/2003

We give a full nonlinear numerical treatment of time-dependent 5d braneworld geometry, which is determined self-consistently by potentials for the scalar field in the bulk and at two orbifold branes, supplemented by boundary conditions at the branes. We describe the BraneCode, an algorithm we designed to solve the dynamical equations numerically. We applied the BraneCode to braneworld models and found several novel phenomena of brane dynamics. Starting with a static warped geometry with de Sitter branes, we found numerically that this configuration is often unstable due to a tachyonic mass of the radion during inflation. If the model admits other static configurations with lower values of the de Sitter curvature, this effect causes a violent restructuring towards them, flattening the branes, which appears as a lowering of the 4d effective cosmological constant. Braneworld dynamics can often lead to brane collisions. We found that, in the presence of the bulk scalar field, the 5d geometry between colliding branes approaches a universal, homogeneous, anisotropic, strong-gravity Kasner-like asymptotic regime, irrespective of the bulk/brane potentials. The Kasner indices of the brane directions are equal to each other but different from that of the extra dimension. Comment: 38 pages, 10 figures
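For orientation, the textbook Kasner form with a homogeneous massless scalar is written below for the index pattern the abstract describes: three equal exponents $p$ for the brane directions and a different exponent $q$ for the extra dimension $y$. This is a standard sketch, not the paper's derivation; the time coordinate $t$ (measured to the collision) and a normalization of $\phi$ that gives the constraint exactly the form shown are assumptions.

```latex
ds^2 = -dt^2 + t^{2p}\left(dx_1^2 + dx_2^2 + dx_3^2\right) + t^{2q}\,dy^2,
\qquad \phi \simeq p_\phi \ln t,
\\[4pt]
3p + q = 1, \qquad 3p^2 + q^2 = 1 - p_\phi^2 .
```

Without the scalar ($p_\phi = 0$), substituting $q = 1 - 3p$ gives $12p^2 - 6p = 0$, so apart from the flat case $p = 0$, $q = 1$ the only solution is $p = 1/2$, $q = -1/2$. A nonzero $p_\phi$ opens a one-parameter family of exponents.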

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server

Archivio istituzionale della ricerca - Università di Padova

Implementation for spatial data of the shared nearest neighbour with metric data structures

Author: Faustino Bruno Filipe Fernandes Simões Salgueiro
Publication venue: Faculdade de Ciências e Tecnologia
Publication date: 01/01/2012

Dissertation submitted to obtain the Master's degree in Informatics Engineering

Repositório da Universidade Nova de Lisboa

Towards a Framework for DHT Distributed Computing

Author: Rosen Andrew
Publication venue: ScholarWorks @ Georgia State University
Publication date: 12/08/2016

Distributed Hash Tables (DHTs) are protocols and frameworks used by peer-to-peer (P2P) systems. They serve as the organizational backbone for many P2P file-sharing systems because of their scalability, fault tolerance, and load-balancing properties. These same properties are highly desirable in a distributed computing environment, especially one that wants to use heterogeneous components. We show that DHTs can be used not only as the framework for a P2P file-sharing service but also as a P2P distributed computing platform. We propose creating a P2P distributed computing framework using distributed hash tables, based on our prototype system ChordReduce. This framework would make it simple and efficient for developers to create their own distributed computing applications. Unlike Hadoop and similar MapReduce frameworks, our framework can be used both in a datacenter and as part of a P2P computing platform. This opens up new possibilities for building platforms for distributed computing problems. One advantage of our system will be an autonomous load-balancing mechanism: nodes will be able to independently acquire work from other nodes in the network rather than sitting idle. More powerful nodes will be able to use this mechanism to acquire more work, exploiting the heterogeneity of the network. Using the load-balancing algorithm, a datacenter could easily leverage additional P2P resources at runtime on an as-needed basis. Our framework will allow MapReduce-like or distributed machine learning platforms to be deployed easily in a greater variety of contexts.
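The abstract relies on two DHT ideas: keys (here, tasks) are placed on nodes by hashing, and nodes can autonomously acquire work from others. The sketch below is a simplified, Chord-style consistent-hashing ring in Python; it is not ChordReduce's code. The names `ring_id`, `successor`, `assign`, and `join_to_steal` are hypothetical, the 16-bit ring is an assumption, and the balancing rule (an idle node takes an ID inside the busiest node's range so that it inherits about half of its tasks) is only one plausible reading of "acquire work from other nodes".

```python
import bisect
import hashlib

M = 16                 # assumed ring size in bits; Chord itself uses 160-bit SHA-1 IDs
RING = 2 ** M


def ring_id(name: str) -> int:
    """Hash a node or task name onto the identifier ring [0, 2**M)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % RING


def successor(node_ids: list[int], key: int) -> int:
    """First node ID at or clockwise after `key`; `node_ids` must be sorted."""
    i = bisect.bisect_left(node_ids, key)
    return node_ids[i % len(node_ids)]  # wrap past the largest ID back to the smallest


def assign(node_ids: list[int], task_keys: list[int]) -> dict[int, list[int]]:
    """Each task is owned by the successor of its key, as in a Chord-style DHT."""
    owners: dict[int, list[int]] = {n: [] for n in node_ids}
    for key in task_keys:
        owners[successor(node_ids, key)].append(key)
    return owners


def join_to_steal(node_ids: list[int], task_keys: list[int]) -> list[int]:
    """Hypothetical balancing rule: a new/idle node picks an ID inside the busiest
    node's range (predecessor, busiest] so that it inherits about half its tasks."""
    owners = assign(node_ids, task_keys)
    busiest = max(owners, key=lambda n: len(owners[n]))
    tasks = owners[busiest]
    if len(tasks) < 2:
        return node_ids  # nothing worth splitting
    pred = node_ids[node_ids.index(busiest) - 1]  # index -1 wraps to the last node
    # Order the busiest node's keys clockwise, starting just after its predecessor.
    tasks.sort(key=lambda k: (k - pred - 1) % RING)
    new_id = tasks[len(tasks) // 2 - 1]  # the new node will own (pred, new_id]
    if new_id in node_ids:
        return node_ids  # ID already taken; this sketch simply gives up
    return sorted(node_ids + [new_id])
```

With nodes `[10, 20, 30]` and task keys `[11, 12, 13, 14, 15, 16, 25]`, node 20 owns six tasks; `join_to_steal` returns `[10, 13, 20, 30]`, after which the new node 13 owns keys 11 to 13 and node 20 keeps 14 to 16. Because each node owns the arc (predecessor, itself], moving work only requires choosing a new ID, with no central scheduler involved.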

ScholarWorks @ Georgia State University

An Unsupervised Cluster: Learning Water Customer Behavior Using Variation of Information on a Reconstructed Phase Space

Author: Malinowski Michele Rae Bizub
Publication venue: e-Publications@Marquette
Publication date: 01/04/2018

The unsupervised clustering algorithm described in this dissertation addresses the need to divide a population of water utility customers into groups based on their similarities and differences, using only the flow data measured by water meters. After clustering, the groups represent customers with similar consumption behavior patterns and provide insight into ‘normal’ and ‘unusual’ customer behavior. This research focuses on individually metered water utility customers and includes both residential and commercial accounts served by utilities within North America. The contributions of this dissertation not only represent novel academic work but also solve a practical problem for the utility industry. This dissertation introduces a method of agglomerative clustering that applies information-theoretic distance measures to Gaussian mixture models within a reconstructed phase space. The clustering method accommodates a utility’s limited human, financial, computational, and environmental resources. The proposed weighted variation of information distance measure for comparing Gaussian mixture models places more emphasis on behaviors whose statistical distributions are compact than on behaviors with large variation, and it is a novel addition to the existing comparison options.
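The method starts from a reconstructed phase space and an information-theoretic distance. As a minimal pure-Python illustration of those two building blocks, the sketch below shows (a) a standard time-delay embedding of a meter's flow series and (b) the classic variation of information between two hard clusterings, VI(A, B) = H(A) + H(B) - 2I(A; B). It does not reproduce the dissertation's weighted VI between Gaussian mixture models; the embedding dimension `dim`, lag `tau`, and function names are illustrative assumptions.

```python
import math
from collections import Counter


def delay_embed(series: list[float], dim: int, tau: int) -> list[list[float]]:
    """Time-delay embedding: each point stacks `dim` samples spaced `tau` steps apart."""
    n = len(series) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this dim/tau")
    return [[series[t + j * tau] for j in range(dim)] for t in range(n)]


def entropy(labels: list[int]) -> float:
    """Shannon entropy (natural log) of a hard cluster assignment."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def variation_of_information(a: list[int], b: list[int]) -> float:
    """Classic VI between two partitions of the same items: H(A) + H(B) - 2 I(A; B)."""
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    mutual = sum(
        c / n * math.log((c / n) / ((ca[x] / n) * (cb[y] / n)))
        for (x, y), c in Counter(zip(a, b)).items()
    )
    return entropy(a) + entropy(b) - 2 * mutual
```

For instance, `delay_embed([1, 2, 3, 4, 5], dim=2, tau=2)` returns `[[1, 3], [2, 4], [3, 5]]`. VI is 0 for identical partitions and equals 2 ln 2 for the independent partitions `[0, 0, 1, 1]` and `[0, 1, 0, 1]`, which is why it behaves as a distance between clusterings.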

epublications@Marquette

Identifying Online Sexual Predators Using Support Vector Machine

Author: Li Yifan
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2020

A two-stage classification model is built for identifying online sexual predators. The first stage identifies suspicious conversations that involve predator participants; the second stage identifies the predators within those suspicious conversations. Support vector machines are trained on word and character n-grams, combined with behavioural features of the authors, to build the final classifier. The unbalanced dataset is downsampled to test the effect of re-balancing. An age group classification model is also constructed to test the feasibility of extracting the authors' age profile, which can be used as a feature for classifier training. Re-balancing the unbalanced dataset improved the classifier's performance. On the unseen test set, the two-stage model successfully identifies 171 of 254 predators, giving a precision of 0.85, a recall of 0.67, and an F-score of 0.807. Comparing classification performance with and without the behavioural features shows that the n-grams contributed most to the classifier's performance, while the behavioural features did not contribute significantly.
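The reported figures can be checked with a little arithmetic. Recall is 171/254, about 0.67, matching the abstract. With precision 0.85 and recall 0.67, F1 would be about 0.749, whereas F0.5 (which weights precision more heavily than recall) gives about 0.807, so the reported F-score appears to be F0.5. The abstract does not state β, so treat this as an inference; `f_beta` is an illustrative helper.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


recall = 171 / 254   # predators identified / predators in the test set
precision = 0.85     # as reported in the abstract

print(round(recall, 2))                             # 0.67
print(round(f_beta(precision, 0.67, beta=1.0), 3))  # 0.749 (F1)
print(round(f_beta(precision, 0.67, beta=0.5), 3))  # 0.807 (F0.5)
```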

Arrow@TUDublin

Enhancement of Query processing on XML data

Author: YANG RUI
Publication date: 10/07/2007

Ph.D. (Doctor of Philosophy)

ScholarBank@NUS

Contributions to security and privacy protection in recommendation systems

Author: Vera del Campo Juan
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2012

A recommender system is an automatic system that, given a customer model and a set of available documents, is able to select and offer the documents that are most interesting to the customer. From the point of view of security, recommender systems face two main issues: protecting the users' privacy and protecting the other participants in the recommendation process. Recommenders issue personalized recommendations by taking into account not only the profile of the documents but also the private information that customers send to the recommender. Hence, user profiles include personal and highly sensitive information, such as likes and dislikes. To have a truly useful recommender system and improve its efficiency, we believe that users should not be afraid of stating their preferences. The second security challenge involves protection against a new kind of attack. Copyright holders have shifted their targets to attack the document providers and any other participant that aids in the process of distributing documents, even unknowingly. In addition, new legislative trends such as ACTA or the "Sinde-Wert law" in Spain show the interest of states all over the world in controlling and prosecuting these intermediate nodes. We propose the following contributions:
1. A social model that captures users' interests in their profiles, and a metric function that calculates the similarity between users, queries, and documents. This model represents profiles as vectors of a social space. Document profiles are created by inspecting the contents of the document. User profiles are then calculated as an aggregation of the profiles of the documents that the user owns. Finally, queries are a constrained view of a user profile. This way, all profiles are contained in the same social space, and the similarity metric can be used on any pair of them.
2. Two mechanisms to protect the personal information contained in user profiles. The first mechanism takes advantage of the Johnson-Lindenstrauss and undecomposability-of-random-matrices theorems to project profiles into social spaces with fewer dimensions. Even though the projected social space holds less information about the user, under certain circumstances the distances between the original profiles are preserved. The second approach uses a zero-knowledge protocol to answer whether two profiles are affine without leaking any information if they are not.
3. A distributed system on a cloud that protects merchants, customers, and indexers against legal attacks by providing plausible deniability and oblivious routing to all participants. We call this system DocCloud. DocCloud organizes databases in a tree-shaped structure over a cloud system and provides a Private Information Retrieval protocol so that no participant or observer of the process can identify the recommender. This way, customers, intermediate nodes, and even databases are not aware of which specific database answered the query.
4. A social P2P network where users link together according to their similarity and provide recommendations to other users in their neighborhood. We defined an epidemic protocol where links are established based on neighbor similarity, clustering, and randomness. Additionally, we proposed mechanisms such as the use of SoftDHT to help identify affine users and to speed up the creation of clusters of similar users.
5. A document distribution system that delivers the recommended documents at the end of the process. In our view, recommendation is a complete process that ends when the customer receives the recommended document. We proposed SCFS, a distributed and secure filesystem where merchants, documents, and users are protected.

This document explores how to locate documents of interest to the user in large distributed networks by means of recommender systems. A recommender system is defined as an automatic system that, given a customer model and a set of available documents, is able to select and offer the documents that are most interesting to the customer. The desirable characteristics of a recommender system are: (i) being fast, (ii) distributed, and (iii) secure. A fast recommender system improves the customer's shopping experience, since a recommendation is not useful if it arrives too late. A distributed recommender system avoids the creation of centralized databases holding sensitive information and improves the availability of documents. Finally, a secure recommender system protects all participants of the system: users, content providers, recommenders, and intermediate nodes. From the point of view of security, recommender systems must face two main problems: (i) protecting users' privacy and (ii) protecting the other participants in the recommendation process. Recommenders can issue personalized recommendations by taking into account not only the profile of the documents but also the private information that customers send to the recommender. Therefore, user profiles include personal and highly sensitive information, such as likes and dislikes. To develop a useful recommender system and improve its effectiveness, we believe that users should not be afraid to express their preferences. To this end, the personal information included in user profiles must be protected and user privacy guaranteed.
The second challenge from the point of view of security involves a new kind of attack. Since preventing the illegal distribution of copyrighted documents through technical solutions has not been effective, copyright holders have shifted their targets to attack document providers and any other participant that helps in the process of distributing documents. In addition, treaties and laws such as ACTA, the SOPA law in the US, or the "Sinde-Wert" law in Spain highlight the interest of states all over the world in controlling and prosecuting these intermediate nodes. Recent trials such as MegaUpload, PirateBay, or the case against Mr. Pablo Soto in Spain show that these threats are real.
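Contribution 2 relies on the Johnson-Lindenstrauss idea that a random linear map into fewer dimensions approximately preserves distances between profiles. A minimal pure-Python sketch of a Gaussian random projection follows. The dimensions, seeds, synthetic profiles, and function names are assumptions; the thesis's use of the undecomposability result is not modeled, and the projection alone is not presented here as a privacy guarantee.

```python
import math
import random


def random_projection_matrix(d: int, k: int, seed: int = 0) -> list[list[float]]:
    """k x d matrix with i.i.d. N(0, 1/k) entries (a standard Johnson-Lindenstrauss map)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix: list[list[float]], v: list[float]) -> list[float]:
    """Map a d-dimensional profile vector to k dimensions."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]


def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Illustrative use with synthetic (made-up) profiles of dimension 500.
rng = random.Random(1)
u = [rng.random() for _ in range(500)]
v = [rng.random() for _ in range(500)]
R = random_projection_matrix(d=500, k=300, seed=42)
ratio = sq_dist(project(R, u), project(R, v)) / sq_dist(u, v)
print(f"squared-distance ratio after projection: {ratio:.3f}")  # close to 1
```

Each projected coordinate is a sum of d Gaussian terms with variance 1/k, so the expected squared distance after projection equals the original one, and the relative spread around it shrinks roughly as sqrt(2/k).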

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura

On the p-Laplace operator on Riemannian manifolds

Author: Giulio Setti
Id Number R
Supervisor Alberto
Publication date: 26/02/2013

This thesis covers different aspects of the p-Laplace operator on Riemannian manifolds. Chapter 2: potential-theoretic aspects, the Khas'minskii condition. Chapter 3: sharp eigenvalue estimates with Ricci curvature lower bounds. Chapter 4: critical sets of (2-)harmonic functions. Comment: PhD thesis; contains results obtained in collaboration with other mathematicians, see Section 1.4 for details. Added in this version: correction of a few typos, and a reference brought to our attention by an anonymous referee. Details in the introduction, end of Section 1.
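For readers outside the area, the standard definitions behind the chapter list are collected below: the p-Laplacian (which reduces to the Laplace-Beltrami operator when $p = 2$, the "2-harmonic" case), its eigenvalue equation, and the usual variational characterization of the first nonzero eigenvalue on a compact manifold without boundary or with Neumann conditions. These are textbook definitions, not results from the thesis.

```latex
\Delta_p u = \operatorname{div}\!\left(|\nabla u|^{p-2}\,\nabla u\right), \qquad 1 < p < \infty,
\\[4pt]
\Delta_p u = -\lambda\,|u|^{p-2}u,
\qquad
\lambda_{1,p} = \inf\left\{ \frac{\int_M |\nabla u|^p\,dV}{\int_M |u|^p\,dV}
\;:\; u \in W^{1,p}(M)\setminus\{0\},\ \int_M |u|^{p-2}u\,dV = 0 \right\}.
```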

arXiv.org e-Print Archive

CiteSeerX

AIR Università degli Studi di Milano