Solving k-means on High-dimensional Big Data
In recent years, there have been major efforts to develop data stream
algorithms that process inputs in one pass over the data with little memory
requirement. For the k-means problem, this has led to the development of
several (1+ε)-approximations (under the assumption that k is a
constant), but also to the design of algorithms that are extremely fast in
practice and compute solutions of high accuracy. However, when not only the
length of the stream but also the dimensionality of the input points is high,
current methods reach their limits.
We propose two algorithms, piecy and piecy-mr, based on the recently
developed data stream algorithm BICO, that process high dimensional data in
one pass and output a solution of high quality. While piecy is suited for high
dimensional data with a medium number of points, piecy-mr is meant for high
dimensional data that comes in a very long stream. We provide an extensive
experimental study to evaluate piecy and piecy-mr that shows the strength of
the new algorithms.
Comment: 23 pages, 9 figures, published at the 14th International Symposium on
Experimental Algorithms - SEA 201
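The one-pass, low-memory setting this abstract describes can be illustrated with a much simpler streaming heuristic than BICO itself. The following sketch is a generic MacQueen-style online k-means, not the piecy or BICO algorithms; it only shows what "one pass with O(k·d) memory" means in practice.

```python
import numpy as np

def online_kmeans(stream, k, dim):
    """One-pass (MacQueen-style) k-means: each point moves its nearest
    center by the running-mean rule.  Memory is O(k * dim), independent
    of the stream length."""
    centers = np.empty((k, dim))
    counts = np.zeros(k, dtype=int)
    for i, x in enumerate(stream):
        x = np.asarray(x, dtype=float)
        if i < k:                                  # seed with the first k points
            centers[i], counts[i] = x, 1
        else:
            j = int(np.argmin(((centers - x) ** 2).sum(axis=1)))
            counts[j] += 1
            centers[j] += (x - centers[j]) / counts[j]   # running mean update
    return centers

# Demo: a stream alternating between two well-separated clusters.
rng = np.random.default_rng(0)
stream = []
for _ in range(200):
    stream.append(rng.normal(0.0, 0.5, 2))    # cluster A around (0, 0)
    stream.append(rng.normal(10.0, 0.5, 2))   # cluster B around (10, 10)
centers = online_kmeans(stream, k=2, dim=2)
```

Unlike coreset-based methods such as BICO, this heuristic carries no quality guarantee; it is only a baseline for the streaming model the abstract refers to.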
Privacy Preserving Multi-Server k-means Computation over Horizontally Partitioned Data
The k-means clustering is one of the most popular clustering algorithms in
data mining. Recently, much research has focused on settings in which the
dataset is divided among multiple parties or is too large to be handled by the
data owner. In the latter case, servers are usually hired to perform the
clustering: the dataset is divided by the data
owner among the servers who together perform the k-means and return the cluster
labels to the owner. The major challenge in this method is to prevent the
servers from gaining substantial information about the actual data of the
owner. Several algorithms have been designed in the past that provide
cryptographic solutions to perform privacy preserving k-means. We provide a new
method to perform k-means over a large dataset using multiple servers. Our
technique avoids heavy cryptographic computations and instead we use a simple
randomization technique to preserve the privacy of the data. The k-means
computed has exactly the same efficiency and accuracy as the k-means computed
over the original dataset without any randomization. We argue that our
algorithm is secure against an honest-but-curious, passive adversary.
Comment: 19 pages, 4 tables. International Conference on Information Systems
Security. Springer, Cham, 201
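The abstract does not spell out the randomization scheme, but the property it claims (identical accuracy on masked and original data) is exactly what any distance-preserving transformation provides. As a generic illustration, not necessarily the authors' technique, rotating the data by a random orthogonal matrix leaves every pairwise Euclidean distance, and hence every Lloyd iteration, unchanged:

```python
import numpy as np

def lloyd(X, centers, iters=20):
    """Plain Lloyd's k-means with a fixed set of initial centers."""
    centers = centers.copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(len(centers)):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 3)),      # cluster around 0
               rng.normal(6, 1, (50, 3))])     # cluster around 6
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix

init = X[[0, 50]]                              # same init in both views
labels_plain  = lloyd(X, init, 20)             # k-means on the raw data
labels_masked = lloyd(X @ Q, init @ Q, 20)     # k-means on the masked data
```

Because ||xQ - yQ|| = ||x - y|| for orthogonal Q, the server clustering the masked data recovers the exact same labels while never seeing the raw coordinates; the actual paper's protocol additionally handles horizontal partitioning across multiple servers.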
Algorithms for Stable Matching and Clustering in a Grid
We study a discrete version of a geometric stable marriage problem originally
proposed in a continuous setting by Hoffman, Holroyd, and Peres, in which
points in the plane are stably matched to cluster centers, as prioritized by
their distances, so that each cluster center is apportioned a set of points of
equal area. We show that, for a discretization of the problem to an n × n
grid of pixels with k centers, the problem can be solved in time O(n^2 log^5 n), and we experiment with two slower but more practical algorithms and
a hybrid method that switches from one of these algorithms to the other to gain
greater efficiency than either algorithm alone. We also show how to combine
geometric stable matchings with a k-means clustering algorithm, so as to
provide a geometric political-districting algorithm that views distance in
economic terms, and we experiment with weighted versions of stable k-means in
order to improve the connectivity of the resulting clusters.
Comment: 23 pages, 12 figures. To appear (without the appendices) at the 18th
International Workshop on Combinatorial Image Analysis, June 19-21, 2017,
Plovdiv, Bulgaria
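Because both pixels and centers rank their partners by the same Euclidean distance, the stable matching can be computed by a simple greedy scan of all (pixel, center) pairs in increasing distance order. This is far slower than the algorithms in the paper, but it is a compact way to see what the matching looks like:

```python
from math import hypot

def stable_grid_matching(n, centers):
    """Match the pixels of an n x n grid to centers so that every center
    receives an equal quota of pixels.  With both sides ranking partners
    by Euclidean distance, assigning (pixel, center) pairs greedily in
    increasing distance order yields the stable matching."""
    k = len(centers)
    assert (n * n) % k == 0
    quota = n * n // k
    pairs = sorted(
        (hypot(px - cx, py - cy), (px, py), c)
        for px in range(n) for py in range(n)
        for c, (cx, cy) in enumerate(centers)
    )
    owner, load = {}, [0] * k
    for _, pixel, c in pairs:
        if pixel not in owner and load[c] < quota:
            owner[pixel] = c
            load[c] += 1
    return owner

# 4 x 4 grid, two centers in opposite corners: each is apportioned 8 pixels.
match = stable_grid_matching(4, [(0, 0), (3, 3)])
```

The greedy scan costs O(n^2 k log(n^2 k)) time, which makes the paper's near-quadratic bound for the grid case a substantial improvement.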
Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm
Over the past five decades, k-means has become the clustering algorithm of
choice in many application domains primarily due to its simplicity, time/space
efficiency, and invariance to the ordering of the data points. Unfortunately,
the algorithm's sensitivity to the initial selection of the cluster centers
remains its most serious drawback. Numerous initialization methods have
been proposed to address this drawback. Many of these methods, however, have
time complexity superlinear in the number of data points, which makes them
impractical for large data sets. On the other hand, linear methods are often
random and/or sensitive to the ordering of the data points. These methods are
generally unreliable in that the quality of their results is unpredictable.
Therefore, it is common practice to perform multiple runs of such methods and
take the output of the run that produces the best results. Such a practice,
however, greatly increases the computational requirements of the otherwise
highly efficient k-means algorithm. In this chapter, we investigate the
empirical performance of six linear, deterministic (non-random), and
order-invariant k-means initialization methods on a large and diverse
collection of data sets from the UCI Machine Learning Repository. The results
demonstrate that two relatively unknown hierarchical initialization methods due
to Su and Dy outperform the remaining four methods with respect to two
objective effectiveness criteria. In addition, a recent method due to Erisoglu
et al. performs surprisingly poorly.
Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms
(Springer, 2014). arXiv admin note: substantial text overlap with
arXiv:1304.7465, arXiv:1209.196
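The hierarchical methods of Su and Dy studied in the chapter are not reproduced here, but the three properties in the title are easy to make concrete. The following is a hypothetical maximin-style seeder, not one of the six methods evaluated: it is deterministic (no randomness), order-invariant up to exact ties (the grand centroid and farthest-point steps do not depend on input order), and linear in the number of points per seed.

```python
import numpy as np

def deterministic_maximin_init(X, k):
    """Deterministic, order-invariant seeding sketch: start from the
    point closest to the grand centroid, then repeatedly add the point
    farthest from its nearest seed.  O(n * k) distance computations."""
    X = np.asarray(X, dtype=float)
    first = int(np.argmin(((X - X.mean(0)) ** 2).sum(1)))
    seeds = [first]
    d = ((X - X[first]) ** 2).sum(1)          # squared dist to nearest seed
    for _ in range(k - 1):
        nxt = int(np.argmax(d))               # farthest remaining point
        seeds.append(nxt)
        d = np.minimum(d, ((X - X[nxt]) ** 2).sum(1))
    return X[seeds]

# Three well-separated blobs: the seeder should land one seed in each.
rng = np.random.default_rng(2)
blob_centers = [(0, 0), (10, 0), (0, 10)]
X = np.vstack([rng.normal(c, 0.5, (40, 2)) for c in blob_centers])
seeds = deterministic_maximin_init(X, 3)
```

Running the same data in any shuffled order returns the same seed set, which is exactly the reliability property the chapter argues random linear initializers lack.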
Use of Oral Cholera Vaccines in an Outbreak in Vietnam: A Case Control Study
Simple measures such as adequate sanitation and clean water stop the spread of cholera; however, in areas where these are not available, cholera spreads quickly and may lead to death within a few hours if treatment is not initiated immediately. The use of life-saving rehydration therapy is the mainstay of cholera control; however, the rapidity of the disease and the limited access to appropriate healthcare units in far-flung areas together result in an unacceptable number of deaths. The WHO has recommended the use of oral cholera vaccines as a preventive measure against cholera outbreaks since 2001, but this was recently updated so that vaccine use may also be considered once a cholera outbreak has begun. The findings from this study suggest that reactive use of killed oral cholera vaccines provides protection against the disease and may be a potential tool in times of outbreaks. Further studies must be conducted to confirm these findings
Identifying Prototypical Components in Behaviour Using Clustering Algorithms
Quantitative analysis of animal behaviour is required to understand the task-solving strategies of animals and the underlying control mechanisms. The identification of repeatedly occurring behavioural components is therefore a key element of a structured quantitative description. However, the complexity of most behaviours makes the identification of such behavioural components a challenging problem. We propose an automatic and objective approach for determining and evaluating prototypical behavioural components. Behavioural prototypes are identified using clustering algorithms and finally evaluated with respect to their ability to represent the whole behavioural data set. The prototypes allow for a meaningful segmentation of behavioural sequences. We applied our clustering approach to identify prototypical movements of the head of blowflies during cruising flight. The results confirm the previously established saccadic gaze strategy, with the set of prototypes being divided into predominantly translational or predominantly rotational movements. The prototypes reveal additional details about the saccadic and intersaccadic flight sections that could not previously be unravelled. Successful application of the proposed approach to behavioural data shows its ability to automatically identify prototypical behavioural components within a large and noisy database and to evaluate these with respect to their quality and stability. Hence, this approach might be applied to a broad range of behavioural and neural data obtained from different animals and in different contexts
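The pipeline the abstract describes (segment a behavioural trace, cluster the segments, read off prototypes) can be sketched on synthetic data. The windowing, the single mean-absolute-velocity feature, and the saccade-like bursts below are all illustrative assumptions, not the paper's actual features or blowfly data:

```python
import numpy as np

def window_prototypes(velocity, win, k, iters=30):
    """Cut a velocity trace into windows, describe each window by its
    mean absolute velocity, and cluster the windows with 1-D k-means.
    Returns the prototype values and each window's prototype label."""
    v = np.asarray(velocity, dtype=float)
    feats = np.abs(v[: len(v) // win * win]).reshape(-1, win).mean(1)
    protos = np.linspace(feats.min(), feats.max(), k)  # deterministic init
    for _ in range(iters):
        labels = np.abs(feats[:, None] - protos[None, :]).argmin(1)
        for j in range(k):
            if (labels == j).any():
                protos[j] = feats[labels == j].mean()
    return protos, labels

# Synthetic "head yaw velocity": low-amplitude noise with brief,
# high-velocity saccade-like bursts every 100 samples.
rng = np.random.default_rng(3)
v = rng.normal(0.0, 0.1, 1000)
for s in range(50, 1000, 100):
    v[s:s + 5] += 5.0
protos, labels = window_prototypes(v, win=10, k=2)
```

Here the two recovered prototypes separate the burst windows from the smooth ones, which is the 1-D analogue of the translational/rotational split the study reports.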
Gray zones around diffuse large B cell lymphoma. Conclusions based on the workshop of the XIV meeting of the European Association for Hematopathology and the Society of Hematopathology in Bordeaux, France
The term “gray-zone” lymphoma has been used to denote a group of lymphomas with overlapping histological, biological, and clinical features between various types of lymphomas. It has been used in the context of Hodgkin lymphomas (HL) and non-Hodgkin lymphomas (NHL), including classical HL (CHL), and primary mediastinal large B cell lymphoma, cases with overlapping features between nodular lymphocyte predominant Hodgkin lymphoma and T-cell/histiocyte-rich large B cell lymphoma, CHL, and Epstein–Barr-virus-positive lymphoproliferative disorders, and peripheral T cell lymphomas simulating CHL. A second group of gray-zone lymphomas includes B cell NHL with intermediate features between diffuse large B cell lymphoma and classical Burkitt lymphoma. In order to review controversial issues in gray-zone lymphomas, a joint Workshop of the European Association for Hematopathology and the Society for Hematopathology was held in Bordeaux, France, in September 2008. The panel members reviewed and discussed 145 submitted cases and reached consensus diagnoses. This Workshop summary is focused on the most controversial aspects of gray-zone lymphomas and describes the panel’s proposals regarding diagnostic criteria, terminology, and new prognostic and diagnostic parameters
In quest of a systematic framework for unifying and defining nanoscience
This article proposes a systematic framework for unifying and defining nanoscience based on historic first principles and step logic that led to a “central paradigm” (i.e., unifying framework) for traditional elemental/small-molecule chemistry. As such, a Nanomaterials classification roadmap is proposed, which divides all nanomatter into Category I: discrete, well-defined and Category II: statistical, undefined nanoparticles. We consider only Category I, well-defined nanoparticles which are >90% monodisperse as a function of Critical Nanoscale Design Parameters (CNDPs) defined according to: (a) size, (b) shape, (c) surface chemistry, (d) flexibility, and (e) elemental composition. Classified as either hard (H) (i.e., inorganic-based) or soft (S) (i.e., organic-based) categories, these nanoparticles were found to manifest pervasive atom mimicry features that included: (1) a dominance of zero-dimensional (0D) core–shell nanoarchitectures, (2) the ability to self-assemble or chemically bond as discrete, quantized nanounits, and (3) exhibited well-defined nanoscale valencies and stoichiometries reminiscent of atom-based elements. These discrete nanoparticle categories are referred to as hard or soft particle nanoelements. Many examples describing chemical bonding/assembly of these nanoelements have been reported in the literature. We refer to these hard:hard (H-n:H-n), soft:soft (S-n:S-n), or hard:soft (H-n:S-n) nanoelement combinations as nanocompounds. Due to their quantized features, many nanoelement and nanocompound categories are reported to exhibit well-defined nanoperiodic property patterns. These periodic property patterns are dependent on their quantized nanofeatures (CNDPs) and dramatically influence intrinsic physicochemical properties (i.e., melting points, reactivity/self-assembly, sterics, and nanoencapsulation), as well as important functional/performance properties (i.e., magnetic, photonic, electronic, and toxicologic properties). 
We propose this perspective as a modest first step toward more clearly defining synthetic nanochemistry as well as providing a systematic framework for unifying nanoscience. With further progress, one should anticipate the evolution of future nanoperiodic table(s) suitable for predicting important risk/benefit boundaries in the field of nanoscience
Typhoid Fever and Its Association with Environmental Factors in the Dhaka Metropolitan Area of Bangladesh: A Spatial and Time-Series Approach
Typhoid fever is a major cause of death worldwide with a major part of the disease burden in developing regions such as the Indian sub-continent. Bangladesh is part of this highly endemic region, yet little is known about the spatial and temporal distribution of the disease at a regional scale. This research used a Geographic Information System to explore, spatially and temporally, the prevalence of typhoid in Dhaka Metropolitan Area (DMA) of Bangladesh over the period 2005-9. This paper provides the first study of the spatio-temporal epidemiology of typhoid for this region. The aims of the study were: (i) to analyse the epidemiology of cases from 2005 to 2009; (ii) to identify spatial patterns of infection based on two spatial hypotheses; and (iii) to determine the hydro-climatological factors associated with typhoid prevalence. Case occurrence data were collected from 11 major hospitals in DMA, geocoded to census tract level, and used in a spatio-temporal analysis with a range of demographic, environmental and meteorological variables. Analyses revealed distinct seasonality as well as age and gender differences, with males and very young children being disproportionately infected. The male-female ratio of typhoid cases was found to be 1.36, and the median age of the cases was 14 years. Typhoid incidence was higher in the male population than the female (χ² = 5.88, p < 0.05). A statistically significant inverse association was found between typhoid incidence and distance to major waterbodies. Spatial pattern analysis showed that there was a significant clustering of typhoid distribution in the study area. Moran's I was highest (0.879; p < 0.01) in 2008 and lowest (0.075; p < 0.05) in 2009. Incidence rates were found to form three large, multi-centred, spatial clusters with no significant difference between urban and rural rates. Temporally, typhoid incidence was seen to increase with temperature, rainfall and river level at time lags ranging from three to five weeks. 
For example, for a 0.1 metre rise in river levels, the number of typhoid cases increased by 4.6% (95% CI: 2.4-2.8) above the threshold of 4.0 metres (95% CI: 2.4-4.3). On the other hand, with a 1°C rise in temperature, the number of typhoid cases could increase by 14.2% (95% CI: 4.4-25.0)
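The clustering statistic quoted above, Moran's I, is straightforward to compute. A minimal sketch with a toy rook-contiguity weights matrix (the study's actual census-tract weights are not reproduced here):

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I: positive when similar values cluster in space,
    negative when neighbours tend to differ.  w is a spatial weights
    matrix with w[i, j] > 0 for neighbouring regions and zero diagonal."""
    z = np.asarray(x, dtype=float) - np.mean(x)
    return len(z) / w.sum() * (w * np.outer(z, z)).sum() / (z ** 2).sum()

# Six regions in a row; neighbours share an edge (rook contiguity).
w = np.zeros((6, 6))
for i in range(5):
    w[i, i + 1] = w[i + 1, i] = 1

clustered = [1, 1, 1, 0, 0, 0]     # high incidence grouped together
alternating = [1, 0, 1, 0, 1, 0]   # high incidence dispersed
```

On the clustered pattern I is positive, and on the alternating pattern it is negative, matching the interpretation of the positive values (0.879, 0.075) the study reports for 2008 and 2009; significance testing would additionally require a permutation or normal-approximation null.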