Search CORE

198 research outputs found

Query recommendation in the information domain of children

Author: Bilal
Gao
Haveliwala
Publication venue: 'Wiley'

BagMinHash - Minwise Hashing Algorithm for Weighted Sets

Author: Alonso O.
Broder A. Z.
Chum O.
Dahlgaard S.
Haveliwala T.
Li P.
Luo C.
Shrivastava A.
Wu W.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 22/07/2018

Minwise hashing has become a standard tool to calculate signatures which allow direct estimation of Jaccard similarities. While very efficient algorithms already exist for the unweighted case, the calculation of signatures for weighted sets is still a time-consuming task. BagMinHash is a new algorithm that can be orders of magnitude faster than current state of the art without any particular restrictions or assumptions on weights or data dimensionality. Applied to the special case of unweighted sets, it represents the first efficient algorithm producing independent signature components. A series of tests finally verifies the new algorithm and also reveals limitations of other approaches published in the recent past. Comment: 10 pages, KDD 201
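As a rough illustration of how minwise-hashing signatures estimate Jaccard similarity, the sketch below implements the classic unweighted scheme, not BagMinHash itself. The function names, the use of salted `blake2b` as a stand-in for independent hash functions, and the signature length are all assumptions made for this example.

```python
import hashlib


def minhash_signature(items, num_hashes=128):
    """Classic unweighted MinHash: one minimum hash value per salted hash function.

    Assumes `items` is a non-empty collection of strings.
    """
    signature = []
    for seed in range(num_hashes):
        # A distinct salt per component simulates an independent hash function.
        salt = seed.to_bytes(16, "little")
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(
                        item.encode("utf-8"), digest_size=8, salt=salt
                    ).digest(),
                    "big",
                )
                for item in items
            )
        )
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature components, an estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity of two sets, for comparison."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    a = {f"w{i}" for i in range(100)}
    b = {f"w{i}" for i in range(50, 150)}
    sig_a = minhash_signature(a, 256)
    sig_b = minhash_signature(b, 256)
    print("exact:", exact_jaccard(a, b))
    print("estimate:", estimate_jaccard(sig_a, sig_b))
```

Each signature component costs one pass over the set here, so the total cost grows with the signature length times the set size. The weighted case addressed by the paper needs a different construction that this sketch does not attempt.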

arXiv.org e-Print Archive

Crossref

GEMRec: A graph-based emotion-aware music recommendation approach

Author: B Han
B Yuan
H Wu
K Wang
M Kaminskas
M Shan
S Deng
T Pettijohn II
TH Haveliwala
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

© Springer International Publishing AG 2016. Music recommendation has gained substantial attention in recent times. As one of the most important context features, user emotion has great potential to improve recommendations, but this has not yet been sufficiently explored due to the difficulty of emotion acquisition and incorporation. This paper proposes a graph-based emotion-aware music recommendation approach (GEMRec) by simultaneously taking a user’s music listening history and emotion into consideration. The proposed approach models the relations between user, music, and emotion as a three-element tuple (user, music, emotion), upon which an Emotion Aware Graph (EAG) is built, and then a relevance propagation algorithm based on random walk is devised to rank the relevance of music items for recommendation. Evaluation experiments are conducted based on a real dataset collected from a Chinese microblog service in comparison to baselines. The results show that the emotional context from a user’s microblogs contributes to improving the performance of music recommendation in terms of hit rate, precision, recall, and F1 score.

Crossref

OPUS - University of Technology Sydney

Asymptotic analysis for personalized Web search

Author: Andersen
Barabási
Bingham
Boldi
Fortunato
Haveliwala
Jeh
Kamvar
Kraaij
Langville
Nelly Litvak
Page
Ponte
Resnick
Richardson
Volkovich
Yana Volkovich
Publication venue: 'Applied Probability Trust'

Crossref

HelpfulMed: Intelligent searching for medical information over the internet

Author: Bates
Brin
Chen
Chen
Chen
Chen
Cho
Cimino
Crouch
Deerwester
Eysenbach
Fallis
Furnas
Guntzer
Haveliwala
Hearst
Hopfield
Houston
Janes
Kohonen
Lyman
Mechkour
Roussinov
Salton
Salton
Salton
Srinivasan
Tolle
van Rijsbergen
Vélez
Woolf
Wu
Zamir
Publication venue: 'Wiley'
Publication date: 01/01/2003

Crossref

A single source k-shortest paths algorithm to infer regulatory pathways in a gene network

Author: Ashburner
Bader
Bebek
Beyer
Chan
Doyle
Froehlich
Gao
Hahn
Han
Haveliwala
Hershberger
Hughes
Jeong
Jin
Malviya
Mering
Missiuro
Paccanaro
Riedel
Scott
Srinivasan Parthasarathy
Stark
Stojmirović
Stojmirović
Stojmirović
Suthram
Tu
Vaske
Voevodski
Wei
Yen
Yu-Keng Shih
Publication venue: Oxford University Press

Motivation: Inferring the underlying regulatory pathways within a gene interaction network is a fundamental problem in Systems Biology to help understand the complex interactions and the regulation and flow of information within a system-of-interest. Given a weighted gene network and a gene in this network, the goal of an inference algorithm is to identify the potential regulatory pathways passing through this gene

Crossref

PubMed Central

Information Discovery on Electronic Health Records Using Authority Flow Techniques

Author: A Balmin
A Singhal
AK Sehgal
CJ McDonald
DL Shepelyansky
F Farfán
H Hwang
J Savoy
JF Fontaine
L Guo
M Brinkmeier
MG Weiner
MI Lieberman
Michael Weiner
Paul Biondich
R Moskovitch
R Motwani
R Varadarajan
Ramakrishna R Varadarajan
RM Podowski
S Agrawal
S Brin
SE Robertson
T Haveliwala
T Matsunaga
V Hristidis
Vagelis Hristidis
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: As the use of electronic health records (EHRs) becomes more widespread, so does the need to search and provide effective information discovery within them. Querying by keyword has emerged as one of the most effective paradigms for searching. Most work in this area is based on traditional Information Retrieval (IR) techniques, where each document is compared individually against the query. We compare the effectiveness of two fundamentally different techniques for keyword search of EHRs. Methods: We built two ranking systems. The traditional BM25 system exploits the EHRs' content without regard to association among entities within. The Clinical ObjectRank (CO) system exploits the entities' associations in EHRs using an authority-flow algorithm to discover the most relevant entities. BM25 and CO were deployed on an EHR dataset of the cardiovascular division of Miami Children's Hospital. Using sequences of keywords as queries, sensitivity and specificity were measured by two physicians for a set of 11 queries related to congenital cardiac disease. Results: Our pilot evaluation showed that CO outperforms BM25 in terms of sensitivity (65% vs. 38%) by 71% on average, while maintaining the specificity (64% vs. 61%). The evaluation was done by two physicians. Conclusions: Authority-flow techniques can greatly improve the detection of relevant information in EHRs and hence deserve further study.
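The reported 71% figure is consistent with a relative improvement computed from the two average sensitivities. This reading is an assumption; the abstract does not say whether the figure was derived from the averages or averaged per query.

```latex
\[
\frac{\text{sensitivity}_{\mathrm{CO}} - \text{sensitivity}_{\mathrm{BM25}}}{\text{sensitivity}_{\mathrm{BM25}}}
= \frac{0.65 - 0.38}{0.38}
= \frac{0.27}{0.38}
\approx 0.71
\]
```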

Crossref

IUPUIScholarWorks

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

DigitalCommons@Florida International University

Mining gene functional networks to improve mass-spectrometry-based protein identification

Author: Berriz
Bowers
Brunner
Chi
Choi
Christine Vogel
Craig
Daniel P. Miranker
de Godoy
Deng
Dennis
Edward M. Marcotte
Fawcett
Futcher
Ghaemmaghami
Giaever
Graumann
Guan
Haveliwala
Kall
Kall
Keller
Kim
Langville Meyer
Lee
Lee
Lee
Li
Lu
Luiz O. Penalva
Marcotte
Nash
Nesvizhskii
Newman
Ogata
Page
Paley
Park
Pena-Castillo
Peng
Planta
Prince
Ramakrishnan
Robinson
Shannon
Smriti R. Ramakrishnan
Storey
Tabb
Tabb
Taejoon Kwon
von Mering
Washburn
Wei Pan
Zybailov
Publication venue: Oxford University Press
Publication date: 04/11/2015

Motivation: High-throughput protein identification experiments based on tandem mass spectrometry (MS/MS) often suffer from low sensitivity and low-confidence protein identifications. In a typical shotgun proteomics experiment, it is assumed that all proteins are equally likely to be present. However, there is often other evidence to suggest that a protein is present and confidence in individual protein identification can be updated accordingly

Crossref

PubMed Central

ScholarWorks@UNIST

Streaming histogram sketching for rapid microbiome analytics

Author: A Sczyrba
AG Shaw
AL Greninger
AP Carrieri
B Grüning
BD Ondov
C Alcon-Giner
C Kakkanatt
D Yang
DB Rusch
F Pedregosa
G Benoit
G Cormode
H Mulcahy-O’Grady
Human Microbiome Project Consortium
I Koychev
JD Forbes
K Sim
LP Coelho
LR Thompson
M Bawa
MW Libbrecht
Q Zhang
R Bovee
S Ioffe
S Seth
SY Anvar
T Brown
T Haveliwala
VB Dubinkina
W Wu
XC Morgan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2019

Background: The growth in publicly available microbiome data in recent years has yielded an invaluable resource for genomic research, allowing for the design of new studies, augmentation of novel datasets and reanalysis of published works. This vast amount of microbiome data, as well as the widespread proliferation of microbiome research and the looming era of clinical metagenomics, means there is an urgent need to develop analytics that can process huge amounts of data in a short amount of time. To address this need, we propose a new method for the compact representation of microbiome sequencing data using similarity-preserving sketches of streaming k-mer spectra. These sketches allow for dissimilarity estimation, rapid microbiome catalogue searching and classification of microbiome samples in near real time. Results: We apply streaming histogram sketching to microbiome samples as a form of dimensionality reduction, creating a compressed ‘histosketch’ that can efficiently represent microbiome k-mer spectra. Using public microbiome datasets, we show that histosketches can be clustered by sample type using the pairwise Jaccard similarity estimation, consequently allowing for rapid microbiome similarity searches via a locality sensitive hashing indexing scheme. Furthermore, we use a ‘real life’ example to show that histosketches can train machine learning classifiers to accurately label microbiome samples. Specifically, using a collection of 108 novel microbiome samples from a cohort of premature neonates, we trained and tested a random forest classifier that could accurately predict whether the neonate had received antibiotic treatment (97% accuracy, 96% precision) and could subsequently be used to classify microbiome data streams in less than 3 s. Conclusions: Our method offers a new approach to rapidly process microbiome data streams, allowing samples to be rapidly clustered, indexed and classified. We also provide our implementation, Histosketching Using Little K-mers (HULK), which can histosketch a typical 2 GB microbiome in 50 s on a standard laptop using four cores, with the sketch occupying 3000 bytes of disk space.

University of Liverpool Repository

Crossref

University of Birmingham Research Portal

Directory of Open Access Journals

Spiral - Imperial College Digital Repository

University of East Anglia digital repository

Markov Chain Ontology Analysis (MCOA)

Author: A Alexa
A Newton
A Subramanian
Alexa T McCray
C Pesquita
D Sean
DW Huang
E Almaas
E Glaab
F Baader
G Alterovitz
GA Pavlopoulos
GK Smyth
H Robert Frost
J Wang
J Wang
JG Kemeny
JH Moore
K Bade
LB Moran
LMA Oliveira
M Ashburner
M Gnädinger
M Gupta
M Invernizzi
M Kanehisa
M Vidal
MA Sartor
NF Noy
NH Shah
O Bodenreider
P Borghammer
P Cimiano
P Resnik
R Tirrell
RZN Vêncio
S Bauer
S Brin
S Falcon
S Grossmann
S Pappatà
T Barrett
TH Haveliwala
V Brochard
Y Lu
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: Biomedical ontologies have become an increasingly critical lens through which researchers analyze the genomic, clinical and bibliographic data that fuels scientific research. Of particular relevance are methods, such as enrichment analysis, that quantify the importance of ontology classes relative to a collection of domain data. Current analytical techniques, however, remain limited in their ability to handle many important types of structural complexity encountered in real biological systems including class overlaps, continuously valued data, inter-instance relationships, non-hierarchical relationships between classes, semantic distance and sparse data. Results: In this paper, we describe a methodology called Markov Chain Ontology Analysis (MCOA) and illustrate its use through a MCOA-based enrichment analysis application based on a generative model of gene activation. MCOA models the classes in an ontology, the instances from an associated dataset and all directional inter-class, class-to-instance and inter-instance relationships as a single finite ergodic Markov chain. The adjusted transition probability matrix for this Markov chain enables the calculation of eigenvector values that quantify the importance of each ontology class relative to other classes and the associated data set members. On both controlled Gene Ontology (GO) data sets created with Escherichia coli, Drosophila melanogaster and Homo sapiens annotations and real gene expression data extracted from the Gene Expression Omnibus (GEO), the MCOA enrichment analysis approach provides the best performance of comparable state-of-the-art methods. Conclusion: A methodology based on Markov chain models and network analytic metrics can help detect the relevant signal within large, highly interdependent and noisy data sets and, for applications such as enrichment analysis, has been shown to generate superior performance on both real and simulated data relative to existing state-of-the-art approaches.
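To make the eigenvector idea concrete, the sketch below finds the stationary distribution of a small ergodic Markov chain by power iteration. This is the left eigenvector of the transition matrix with eigenvalue 1, which ranks states by long-run importance. It is a generic illustration and not the MCOA implementation: the function name, tolerance and two-state matrix are assumptions, and the abstract does not say how MCOA adjusts its transition matrix.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration for the stationary distribution of a row-stochastic matrix P.

    Assumes the chain is ergodic, so the iteration converges to a unique
    distribution regardless of the starting vector.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of the chain: pi_next[j] = sum_i pi[i] * P[i][j]
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


if __name__ == "__main__":
    # Hypothetical two-state chain used only for illustration.
    P = [
        [0.9, 0.1],
        [0.5, 0.5],
    ]
    print(stationary_distribution(P))
```

For this toy matrix the balance condition `pi[0] * 0.1 == pi[1] * 0.5` gives a stationary distribution of (5/6, 1/6), so state 0 would rank as the more important of the two.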

Crossref

Harvard University - DASH

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central