620 research outputs found

Matrix constructions of divisible designs.

Author: Arasu K.T.
Haemers W.H.
Jungnickel D.
Pott A.

Scalability and Total Recall with Fast CoveringLSH

Author: Arasu A.
Gionis A.
Minsky M.
Norouzi M.
Shrivastava A.
Weber R.
Yu F. X.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

Locality-sensitive hashing (LSH) has emerged as the dominant algorithmic technique for similarity search with strong performance guarantees in high-dimensional spaces. A drawback of traditional LSH schemes is that they may have false negatives, i.e., the recall is less than 100%. This limits the applicability of LSH in settings requiring precise performance guarantees. Building on the recent theoretical "CoveringLSH" construction that eliminates false negatives, we propose a fast and practical covering LSH scheme for Hamming space called Fast CoveringLSH (fcLSH). Inheriting the design benefits of CoveringLSH, our method avoids false negatives and always reports all near neighbors. Compared to CoveringLSH, we achieve an asymptotic improvement to the hash function computation time from $\mathcal{O}(dL)$ to $\mathcal{O}(d + L\log{L})$, where $d$ is the dimensionality of data and $L$ is the number of hash tables. Our experiments on synthetic and real-world data sets demonstrate that fcLSH is comparable (and often superior) to traditional hashing-based approaches for search radius up to 20 in high-dimensional Hamming space. Comment: Short version appears in Proceedings of CIKM 201
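To illustrate why traditional LSH in Hamming space can miss near neighbors, here is a minimal sketch of classic bit-sampling LSH. It is not the fcLSH or CoveringLSH construction from the paper; the function names (`make_tables`, `hash_keys`, `collides`) and the parameter `k` (bits sampled per table) are hypothetical and chosen only for illustration.

```python
import random


def make_tables(d, k, L, seed=0):
    """Pick k random bit positions (out of d) for each of L hash tables."""
    rng = random.Random(seed)
    return [rng.sample(range(d), k) for _ in range(L)]


def hash_keys(x, tables):
    """Hash a binary vector x: one key per table, made of the sampled bits."""
    return [tuple(x[i] for i in positions) for positions in tables]


def collides(x, y, tables):
    """True if x and y share a bucket in at least one table."""
    return any(a == b for a, b in zip(hash_keys(x, tables), hash_keys(y, tables)))


tables = make_tables(d=16, k=4, L=3)
query = [0, 1] * 8
print(collides(query, query, tables))  # identical vectors always collide
```

A pair at small Hamming distance is missed (a false negative) whenever every table happens to sample at least one bit where the two vectors differ. Covering schemes choose the sampled positions so that this cannot happen within the search radius.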

arXiv.org e-Print Archive

Crossref

The IT University of Copenhagen's Repository

LINVIEW: Incremental View Maintenance for Complex Analytical Queries

Author: Abadi D.
Arasu A.
Deng L.
Grama A.
Kamvar S.
Kraska T.
McSherry F.
Motwani R.
Press W.
Seeger M.
Stonebraker M.
Venkataraman S.
Whaley C.
Zaharia M.
Zhang Y.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 09/05/2014

Many analytics tasks and machine learning problems can be naturally expressed by iterative linear algebra programs. In this paper, we study the incremental view maintenance problem for such complex analytical queries. We develop a framework, called LINVIEW, for capturing deltas of linear algebra programs and understanding their computational cost. Linear algebra operations tend to cause an avalanche effect where even very local changes to the input matrices spread out and infect all of the intermediate results and the final view, causing incremental view maintenance to lose its performance benefit over re-evaluation. We develop techniques based on matrix factorizations to contain such epidemics of change. As a consequence, our techniques make incremental view maintenance of linear algebra practical and usually substantially cheaper than re-evaluation. We show, both analytically and experimentally, the usefulness of these techniques when applied to standard analytics tasks. Our evaluation demonstrates the efficiency of LINVIEW in generating parallel incremental programs that outperform re-evaluation techniques by more than an order of magnitude. Comment: 14 pages, SIGMO
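The idea of keeping a delta in factored form can be sketched for the simplest case: a materialized view V = A·B where A receives a rank-one change u·vᵀ. This is an illustrative assumption, not LINVIEW's actual API or generated code; `matmul` and `apply_rank_one_update` are hypothetical helper names.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows (full re-evaluation)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_rank_one_update(view, u, v, b):
    """Update view = A @ B in place after the change A += outer(u, v).

    The delta of the view is outer(u, v^T B), so only a vector-matrix
    product and one outer product are needed instead of a full A @ B.
    """
    w = [sum(v[k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
    for i, ui in enumerate(u):
        for j, wj in enumerate(w):
            view[i][j] += ui * wj
    return view


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
view = matmul(a, b)
apply_rank_one_update(view, u=[1, 0], v=[0, 2], b=b)
print(view)
```

For n×n matrices the update touches O(n²) entries, whereas recomputing the product from scratch with the naive algorithm costs O(n³).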

arXiv.org e-Print Archive

Crossref

Applying semantic web technologies to knowledge sharing in aerospace engineering

Author: A. Arasu
A. Chakravarthy
A.-S. Dadzie
A.H.F. Laender
B. Rosenfeld
C. Manning
C. Preisach
D. Petrelli
F. Ciravegna
J. Broekstra
J. Hendler
J. Iria
J. Magalhães
J. Xu
M.R. Naphade
R. Bhagdev
S. Chapman
S. Gupta
V. Lanfranchi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

This paper details an integrated methodology to optimise Knowledge reuse and sharing, illustrated with a use case in the aeronautics domain. It uses Ontologies as a central modelling strategy for the Capture of Knowledge from legacy documents via automated means, or directly in systems interfacing with Knowledge workers, via user-defined, web-based forms. The domain ontologies used for Knowledge Capture also guide the retrieval of the Knowledge extracted from the data using a Semantic Search System that provides support for multiple modalities during search. This approach has been applied and evaluated successfully within the aerospace domain, and is currently being extended for use in other domains on an increasingly large scale

CiteSeerX

Crossref

White Rose Research Online

Exploration of PET and MRI radiomic features for decoding breast cancer phenotypes and prognosis.

Author: Arasu Vignesh A
Behr Spencer C
Copeland Timothy P
Esserman Laura
Franc Benjamin L
Harnish Roy J
Huang Shih-Ying
Hylton Nola M
Jones Ella F
Kornak John
Liu Gengbo
Mitra Debasis
Price Elissa R
Seo Youngho
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

Radiomics is an emerging technology for imaging biomarker discovery and disease-specific personalized treatment management. This paper aims to determine the benefit of using multi-modality radiomics data from PET and MR images in the characterization of breast cancer phenotype and prognosis. Eighty-four features were extracted from PET and MR images of 113 breast cancer patients. Unsupervised clustering based on PET and MRI radiomic features created three subgroups. These derived subgroups were statistically significantly associated with tumor grade (p = 2.0 × 10⁻⁶), tumor overall stage (p = 0.037), breast cancer subtypes (p = 0.0085), and disease recurrence status (p = 0.0053). The PET-derived first-order statistics and gray level co-occurrence matrix (GLCM) textural features were discriminative of breast cancer tumor grade, which was confirmed by the results of L2-regularization logistic regression (with repeated nested cross-validation) with an estimated area under the receiver operating characteristic curve (AUC) of 0.76 (95% confidence interval (CI) = [0.62, 0.83]). The results of ElasticNet logistic regression indicated that PET and MR radiomics distinguished recurrence-free survival, with a mean AUC of 0.75 (95% CI = [0.62, 0.88]) and 0.68 (95% CI = [0.58, 0.81]) for 1 and 2 years, respectively. The MRI-derived GLCM inverse difference moment normalized (IDMN) and the PET-derived GLCM cluster prominence were among the key features in the predictive models for recurrence-free survival. In conclusion, radiomic features from PET and MR images could be helpful in deciphering breast cancer phenotypes and may have potential as imaging biomarkers for prediction of breast cancer recurrence-free survival

eScholarship - University of California

Elucidating the role of Staphylococcus epidermidis serine-aspartate repeat protein G in platelet activation.

Author: Arasu S
Brennan Marian P
Chubb Anthony J
Cox Dermot
Devocelle Marc
Foster T J
Loughman A
Publication venue: e-publications@RCSI
Publication date: 01/08/2009

BACKGROUND: Staphylococcus epidermidis is a commensal of the human skin that has been implicated in infective endocarditis and infections involving implanted medical devices. S. epidermidis induces platelet aggregation by an unknown mechanism. The fibrinogen-binding protein serine-aspartate repeat protein G (SdrG) is present in 67-91% of clinical strains. OBJECTIVES: To determine whether SdrG plays a role in platelet activation, and if so to investigate the role of fibrinogen in this mechanism. METHODS: SdrG was expressed in a surrogate host, Lactococcus lactis, in order to investigate its role in the absence of other staphylococcal components. Platelet adhesion and platelet aggregation assays were employed. RESULTS: L. lactis expressing SdrG stimulated platelet aggregation (lag time: 2.9 +/- 0.5 min), whereas the L. lactis control did not. L. lactis SdrG-induced aggregation was inhibited by alpha(IIb)beta3 antagonists and aspirin. Aggregation was dependent on both fibrinogen and IgG, and the platelet IgG receptor FcgammaRIIa. Preincubation of the bacteria with Bbeta-chain fibrinopeptide inhibited aggregation (delaying the lag time six-fold), suggesting that fibrinogen acts as a bridging molecule. Platelets adhered to L. lactis SdrG in the absence of fibrinogen. Adhesion was inhibited by alpha(IIb)beta3 antagonists, suggesting that this direct interaction involves alpha(IIb)beta3. Investigation using purified fragments of SdrG revealed a direct interaction with the B-domains. Adhesion to the A-domain involved both a fibrinogen and an IgG bridge. CONCLUSION: SdrG alone is sufficient to support platelet adhesion and aggregation through both direct and indirect mechanisms

RCSI Repository

On correctness in RDF stream processor benchmarking

Author: A. Arasu
C. Bizer
C. Gutierrez
D. Le-Phuoc
D.F. Barbieri
I. Botan
J.-P. Calbimonte
J.P. Calbimonte
Y. Guo
Y. Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Two complementary benchmarks have been proposed so far for the evaluation and continuous improvement of RDF stream processors: SRBench and LSBench. They put a special focus on different features of the evaluated systems, including coverage of the streaming extensions of SPARQL supported by each processor, query processing throughput, and an early analysis of query evaluation correctness, based on comparing the results obtained by different processors for a set of queries. However, neither of them has analysed the operational semantics of these processors in order to assess the correctness of query evaluation results. In this paper, we propose a characterization of the operational semantics of RDF stream processors, adapting well-known models used in the stream processing engine community: CQL and SECRET. Through this formalization, we address correctness in RDF stream processor benchmarks, making it possible to determine the multiple answers that systems should provide. Finally, we present CSRBench, an extension of SRBench to address query result correctness verification using an automatic method

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Archivo Digital UPM (Univ. Politécnica de Madrid)

SRBench: A streaming RDF/SPARQL benchmark

Author: A. Arasu
A. Bolles
A.P. Sheth
C. Bizer
D. Le-Phuoc
D.F. Barbieri
E. Bouillet
E. Valle Della
J. Pérez
J.-P. Calbimonte
K. Whitehouse
M. Balazinska
O. Corcho
Y. Guo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

We introduce SRBench, a general-purpose benchmark primarily designed for streaming RDF/SPARQL engines, completely based on real-world data sets from the Linked Open Data cloud. With the increasing problem of too much streaming data but not enough tools to gain knowledge from them, researchers have set out for solutions in which Semantic Web technologies are adapted and extended for publishing, sharing, analysing and understanding streaming data. To help researchers and users compare streaming RDF/SPARQL (strRS) engines in a standardised application scenario, we have designed SRBench, with which one can assess the abilities of a strRS engine to cope with a broad range of use cases typically encountered in real-world scenarios. The data sets used in the benchmark have been carefully chosen, such that they represent a realistic and relevant usage of streaming data. The benchmark defines a concise, yet comprehensive set of queries that cover the major aspects of strRS processing. Finally, our work is complemented with a functional evaluation on three representative strRS engines: SPARQLStream, C-SPARQL and CQELS. The presented results are meant to give a first baseline and illustrate the state-of-the-art

Crossref

Archivo Digital UPM (Univ. Politécnica de Madrid)

Space-optimal Heavy Hitters with Strong Error Bounds

Author: Arasu A.
Berinde R.
Bonnet P.
Bose P.
Breslau L.
Chakrabarti A.
Charikar M.
Cormode G.
Demaine E.
Fang M.
Graham Cormode
Lucchese C.
Manku G.
Martin J. Strauss
Piotr Indyk
Radu Berinde
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2009

The problem of finding heavy hitters and approximating the frequencies of items is at the heart of many problems in data stream analysis. It has been observed that several proposed solutions to this problem can outperform their worst-case guarantees on real data. This leads to the question of whether some stronger bounds can be guaranteed. We answer this in the positive by showing that a class of "counter-based algorithms" (including the popular and very space-efficient FREQUENT and SPACESAVING algorithms) provide much stronger approximation guarantees than previously known. Specifically, we show that errors in the approximation of individual elements do not depend on the frequencies of the most frequent elements, but only on the frequency of the remaining "tail." This shows that counter-based methods are the most space-efficient (in fact, space-optimal) algorithms having this strong error bound. This tail guarantee allows these algorithms to solve the "sparse recovery" problem. Here, the goal is to recover a faithful representation of the vector of frequencies, $f$. We prove that using space $O(k)$, the algorithms construct an approximation $f^*$ to the frequency vector $f$ so that the L1 error $\|f - f^*\|_1$ is close to the best possible error $\min_{f'} \|f' - f\|_1$, where $f'$ ranges over all vectors with at most $k$ non-zero entries. This improves the previously best known space bound of about $O(k \log n)$ for streams without element deletions (where $n$ is the size of the domain from which stream elements are drawn). Other consequences of the tail guarantees are results for skewed (Zipfian) data, and guarantees for accuracy of merging multiple summarized streams. David & Lucile Packard Foundation (Fellowship); Center for Massive Data Algorithmics (MADALGO); National Science Foundation (U.S.) (Grant number CCF-0728645)
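As a concrete instance of a counter-based algorithm, the following is a minimal sketch of FREQUENT (also known as the Misra-Gries summary) with at most k counters. The function name `frequent` and the example stream are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def frequent(stream, k):
    """FREQUENT (Misra-Gries) summary using at most k counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


stream = ["a"] * 50 + ["b"] * 30 + list("cdefghij") * 2
summary = frequent(stream, k=3)
exact = Counter(stream)
for item in ("a", "b"):
    print(item, exact[item], summary.get(item, 0))
```

Each stored count never exceeds the true frequency, and the classical worst-case bound says it falls short by at most n/(k+1) for a stream of length n. The paper's contribution is a stronger bound in which the error depends only on the tail of the frequency distribution.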

CiteSeerX

DSpace@MIT

Crossref

Warwick Research Archives Portal Repository

A Nonexistence Result for Abelian Menon Difference Sets Using Perfect Binary Arrays

Author: Arasu K. T.
Davis James A
Jedwab Jonathan
Publication venue: UR Scholarship Repository
Publication date: 01/09/1995

A Menon difference set has the parameters $(4N^2, 2N^2-N, N^2-N)$. In the abelian case it is equivalent to a perfect binary array, which is a multi-dimensional matrix with elements ±1 such that all out-of-phase periodic autocorrelation coefficients are zero. Suppose that the abelian group $H \times K \times \mathbb{Z}_{p^\alpha}$ contains a Menon difference set, where $p$ is an odd prime, $|K| = p^\alpha$, and $p^j \equiv -1 \pmod{\exp(H)}$ for some $j$. Using the viewpoint of perfect binary arrays we prove that $K$ must be cyclic. A corollary is that there exists a Menon difference set in the abelian group $H \times K \times \mathbb{Z}_{3^\alpha}$, where $\exp(H) = 2$ or $4$ and $|K| = 3^\alpha$, if and only if $K$ is cyclic
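The definition of a perfect binary array can be checked directly for small cases. The sketch below handles only the two-dimensional case and uses the 2×2 array [[1, 1], [1, -1]] as an example, which corresponds to N = 1, i.e. parameters (4, 1, 0); the function names are hypothetical.

```python
from itertools import product


def periodic_autocorrelations(a):
    """Map each cyclic shift (u, v) to the periodic autocorrelation of a."""
    rows, cols = len(a), len(a[0])
    return {
        (u, v): sum(
            a[i][j] * a[(i + u) % rows][(j + v) % cols]
            for i in range(rows)
            for j in range(cols)
        )
        for u, v in product(range(rows), range(cols))
    }


def is_perfect(a):
    """True if every out-of-phase autocorrelation coefficient is zero."""
    return all(
        value == 0
        for shift, value in periodic_autocorrelations(a).items()
        if shift != (0, 0)
    )


array = [[1, 1], [1, -1]]
print(is_perfect(array))
```

The positions holding -1 form the corresponding difference set; here there is a single such position, matching 2N² - N = 1 for N = 1.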

University of Richmond