93 research outputs found

    When Hashes Met Wedges: A Distributed Algorithm for Finding High Similarity Vectors

    Finding similar user pairs is a fundamental task in social networks, with numerous applications in ranking and personalization tasks such as link prediction and tie strength detection. A common manifestation of user similarity is based upon network structure: each user is represented by a vector that represents the user's network connections, where pairwise cosine similarity among these vectors defines user similarity. The predominant task for user similarity applications is to discover all similar pairs that have a pairwise cosine similarity value larger than a given threshold τ. In contrast to previous work where τ is assumed to be quite close to 1, we focus on recommendation applications where τ is small, but still meaningful. The all-pairs cosine similarity problem is computationally challenging on networks with billions of edges, and especially so for settings with small τ. To the best of our knowledge, there is no practical solution for computing all user pairs with, say, τ = 0.2 on large social networks, even using the power of distributed algorithms. Our work directly addresses this challenge by introducing a new algorithm, WHIMP, that solves this problem efficiently in the MapReduce model. The key insight in WHIMP is to combine the "wedge-sampling" approach of Cohen-Lewis for approximate matrix multiplication with the SimHash random projection techniques of Charikar. We provide a theoretical analysis of WHIMP, proving that it has near-optimal communication costs while maintaining computation cost comparable with the state of the art. We also empirically demonstrate WHIMP's scalability by computing all highly similar pairs on four massive data sets, and show that it accurately finds high similarity pairs. In particular, we note that WHIMP successfully processes the entire Twitter network, which has tens of billions of edges.
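    To make the SimHash ingredient concrete, here is a minimal sketch of Charikar's random-hyperplane signatures, which WHIMP combines with wedge sampling. It is illustrative only, not the WHIMP pipeline itself (no wedge sampling or MapReduce distribution), and all names and parameters are ours. Two signatures agree on a bit with probability 1 − θ/π for angle θ, which lets us estimate cosine similarity from bit agreement.

```python
import numpy as np

def simhash_signatures(vectors, num_bits=256, seed=0):
    """Random-hyperplane signatures: each bit is the sign of a random
    projection, so two rows agree on a bit with probability 1 - angle/pi."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((vectors.shape[1], num_bits))
    return (vectors @ planes) >= 0  # boolean matrix: one signature per row

def estimated_cosine(sig_u, sig_v):
    """Estimate cos(angle) from the fraction of agreeing signature bits."""
    agree = np.mean(sig_u == sig_v)
    return np.cos(np.pi * (1.0 - agree))

# Toy usage: two correlated vectors should score well above tau = 0.2.
u = np.random.default_rng(1).standard_normal(1000)
v = u + 0.5 * np.random.default_rng(2).standard_normal(1000)
sigs = simhash_signatures(np.vstack([u, v]))
print(estimated_cosine(sigs[0], sigs[1]))  # close to the true cosine, ~0.89
```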

    Zero-Shot Hashing via Transferring Supervised Knowledge

    Hashing has shown its efficiency and effectiveness in facilitating large-scale multimedia applications. Supervised knowledge (e.g., semantic labels or pair-wise relationships) associated with data can significantly improve the quality of hash codes and hash functions. However, confronted with the rapid growth of newly-emerging concepts and multimedia data on the Web, existing supervised hashing approaches may easily suffer from the scarcity and validity of supervised information due to the expensive cost of manual labelling. In this paper, we propose a novel hashing scheme, termed zero-shot hashing (ZSH), which compresses images of "unseen" categories to binary codes with hash functions learned from limited training data of "seen" categories. Specifically, we project independent data labels (i.e., 0/1-form label vectors) into a semantic embedding space, where semantic relationships among all the labels can be precisely characterized and thus seen supervised knowledge can be transferred to unseen classes. Moreover, in order to cope with the semantic shift problem, we rotate the embedded space to better align the embedded semantics with the low-level visual feature space, thereby alleviating the influence of the semantic gap. In the meantime, to exert positive effects on learning high-quality hash functions, we further propose to preserve local structural properties and the discrete nature of binary codes. In addition, we develop an efficient alternating algorithm to solve the ZSH model. Extensive experiments conducted on various real-life datasets show the superior zero-shot image retrieval performance of ZSH as compared to several state-of-the-art hashing methods.
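    The rotation step can be pictured with an orthogonal Procrustes fit, a standard way to align two embedding spaces. The sketch below is our illustration under that assumption; the paper's actual objective, constraints, and variable names may differ.

```python
import numpy as np

def procrustes_rotation(semantic, visual):
    """Orthogonal Procrustes: find the orthogonal R minimizing
    ||semantic @ R - visual||_F, one standard way to align a semantic
    embedding space with a visual feature space (illustrative only)."""
    u, _, vt = np.linalg.svd(semantic.T @ visual)
    return u @ vt

# Toy usage: recover a known orthogonal transform from paired embeddings.
rng = np.random.default_rng(0)
true_rot = np.linalg.qr(rng.standard_normal((16, 16)))[0]
sem = rng.standard_normal((500, 16))
vis = sem @ true_rot
print(np.allclose(procrustes_rotation(sem, vis), true_rot))  # True
```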

    Non-surgical spinal decompression therapy: does the scientific literature support efficacy claims made in the advertising media?

    Background: Traction therapy has been utilized in the treatment of low back pain for decades. The most recent incarnation of traction therapy is non-surgical spinal decompression therapy, which can cost over $100,000. This form of therapy has been heavily marketed to manual therapy professions and subsequently to the consumer. The purpose of this paper is to initiate a debate pertaining to the relationship between marketing claims and the scientific literature on non-surgical spinal decompression.
    Discussion: Only one small randomized controlled trial and several lower-level efficacy studies have been performed on spinal decompression therapy. In general the quality of these studies is questionable. Many of the studies were performed using the VAX-D® unit, which places the patient in a prone position. Companies often cite this research in their marketing even though their own units place the patient in the supine position.
    Summary: Only limited evidence is available to warrant the routine use of non-surgical spinal decompression, particularly when many other well-investigated, less expensive alternatives are available.

    Trioctahedral entities in palygorskite: Near-infrared evidence for sepiolite-palygorskite polysomatism

    The mixed dioctahedral-trioctahedral character of Mg-rich palygorskite has been previously described by the formula y[Mg5Si8O20(OH)2(OH2)4] · (1–y)[xMg2Fe2(1–x)Mg2Al2]Si8O20(OH)2(OH2)4, where y is the trioctahedral fraction of this two-chain ribbon mineral, with an experimentally determined upper limit of y = 0.5, and x is the Fe(III) content in the M2 sites of the dioctahedral component. Ideal trioctahedral (y = 1) palygorskite is elusive, although sepiolite, Mg8Si12O30(OH)4(OH2)4, with a similar composition, a three-chain ribbon structure and a distinct XRD pattern, is common. A set of 22 samples identified by XRD as palygorskite and with variable composition (0 < x < 0.7, 0 < y < 0.5) were studied to extrapolate the structure of an ideal trioctahedral (y = 1) palygorskite and to compare this structure to sepiolite. Near-infrared spectroscopy was used to study the influence of octahedral composition on the structure of the TOT ribbons, H2O in the tunnels and surface silanols of palygorskite, as well as their response to loss of zeolitic H2O. All spectroscopic evidence suggests that palygorskite consists of discrete dioctahedral and trioctahedral entities. The dioctahedral entities have variable structure determined solely by x = Fe(III)/(Al + Fe(III)), and their content is proportional to (1–y). In contrast, the trioctahedral entities have fixed octahedral composition and ribbon structure and are spectroscopically identical to sepiolite. The value of d200 in palygorskite follows the regression d200 (Å) = 6.362 + 0.129x(1–y) + 0.305y, with R² = 0.96 and σ = 0.013 Å. When extrapolated to y = 1, d200 is identical to that of sepiolite. Based on this analysis, we propose that palygorskite samples with non-zero trioctahedral character should be considered as members of a polysomatic series of sepiolite and (dioctahedral) palygorskite, described by the new formula y'[Mg8Si12O30(OH)4(OH2)4] · (1–y')[x'Mg2Fe2(1–x')Mg2Al2]Si8O20(OH)2(OH2)4, with 0 < x' = x < 0.7 and 0 < y' = y/(2–y) < 0.33.
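    To see what the reported regression implies, the snippet below simply evaluates the abstract's formula for d200; the example inputs are ours, chosen to show a typical dioctahedral sample and the trioctahedral extrapolation the authors identify with sepiolite.

```python
def d200_angstrom(x, y):
    """Regression from the abstract: d200 (Å) = 6.362 + 0.129*x*(1 - y) + 0.305*y,
    where x is the Fe(III) fraction and y the trioctahedral fraction."""
    return 6.362 + 0.129 * x * (1 - y) + 0.305 * y

# A typical dioctahedral sample (x = 0.3, y = 0) sits near 6.401 Å;
# extrapolating to the ideal trioctahedral end-member (y = 1) gives
# 6.667 Å, the value the authors report as matching sepiolite.
print(d200_angstrom(0.3, 0.0))  # 6.4007
print(d200_angstrom(0.0, 1.0))  # 6.667
```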

    De-identifying a public use microdata file from the Canadian national discharge abstract database

    Background: The Canadian Institute for Health Information (CIHI) collects hospital discharge abstract data (DAD) from Canadian provinces and territories. There are many demands for the disclosure of these data for research and analysis to inform policy making. To expedite the disclosure of data for some of these purposes, the construction of a DAD public use microdata file (PUMF) was considered. Such purposes include: confirming some published results, providing broader feedback to CIHI to improve data quality, training students and fellows, providing an easily accessible data set for researchers to prepare for analyses on the full DAD data set, and serving as a large health data set for computer scientists and statisticians to evaluate analysis and data mining techniques. The objective of this study was to measure the probability of re-identification for records in a PUMF, and to de-identify a national DAD PUMF consisting of 10% of records.
    Methods: Plausible attacks on a PUMF were evaluated. Based on these attacks, the 2008-2009 national DAD was de-identified. A new algorithm was developed to minimize the amount of suppression while maximizing the precision of the data. The acceptable threshold for the probability of correct re-identification of a record was set at between 0.04 and 0.05. Information loss was measured in terms of the extent of suppression and entropy.
    Results: Two different PUMF files were produced, one with geographic information, and one with no geographic information but more clinical information. At a threshold of 0.05, the maximum proportion of records with the diagnosis code suppressed was 20%, but these suppressions represented only 8-9% of all values in the DAD. Our suppression algorithm incurs less information loss than a more traditional approach to suppression. Smaller regions, patients with longer stays, and age groups that are infrequently admitted to hospitals tend to be the ones with the highest rates of suppression.
    Conclusions: The strategies we used to maximize data utility and minimize information loss can result in a PUMF that would be useful for the specific purposes noted earlier. However, to create a more detailed file with less information loss, suitable for more complex health services research, the risk would need to be mitigated by requiring the data recipient to commit to a data sharing agreement.
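    As a rough picture of threshold-based suppression, the toy sketch below treats a record's re-identification risk as the reciprocal of its equivalence-class size on the quasi-identifiers and suppresses those values when the risk exceeds the threshold. This is our simplified stand-in, not CIHI's actual algorithm, and the field names are hypothetical.

```python
from collections import Counter

def suppress_to_threshold(records, quasi_ids, threshold=0.05):
    """Toy de-identification sketch: approximate a record's risk of correct
    re-identification as 1 / (size of its equivalence class on quasi_ids)
    and suppress the quasi-identifier values where risk > threshold."""
    key = lambda r: tuple(r[q] for q in quasi_ids)
    sizes = Counter(key(r) for r in records)
    out = []
    for r in records:
        if 1.0 / sizes[key(r)] > threshold:
            r = {**r, **{q: None for q in quasi_ids}}  # suppress the values
        out.append(r)
    return out

# Toy usage: at threshold 0.05, classes smaller than 20 records get suppressed.
recs = [{"region": "A", "age_group": "80+", "diag": "I21"} for _ in range(3)]
print(suppress_to_threshold(recs, ["region", "age_group"]))
```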