16 research outputs found

Top-K Aggregation Queries over Large Networks

Author: Bin He
Feida Zhu
Jiawei Han
Xifeng Yan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010

CiteSeerX

Crossref

Institutional Knowledge at Singapore Management University

An Incremental Threshold Method for Continuous Text Search Queries

Author: MOURATIDIS Kyriakos
PANG Hwee Hwa
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009

A text filtering system monitors a stream of incoming documents to identify those that match the interest profiles of its users. The user interests are registered at a server as continuous text search queries. The server constantly maintains for each query a ranked result list, comprising the recent documents (drawn from a sliding window) with the highest similarity to the query. Such a system underlies many text monitoring applications that need to cope with heavy document traffic, such as news and email monitoring. In this paper, we propose the first solution for processing continuous text queries efficiently. Our objective is to support a large number of user queries while sustaining high document arrival rates. Our solution indexes the streamed documents with a structure based on the principles of the inverted file, and processes document arrival and expiration events with an incremental threshold-based method. Using a stream of real documents, we experimentally verify the efficiency of our approach, which is at least an order of magnitude faster than a competitor constructed from existing techniques.

CiteSeerX

Crossref

Institutional Knowledge at Singapore Management University

Efficient Evaluation of Continuous Text Search Queries

Author: MOURATIDIS Kyriakos
PANG Hwee Hwa
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011

Consider a text filtering server that monitors a stream of incoming documents for a set of users, who register their interests in the form of continuous text search queries. The task of the server is to constantly maintain for each query a ranked result list, comprising the recent documents (drawn from a sliding window) with the highest similarity to the query. Such a system underlies many text monitoring applications that need to cope with heavy document traffic, such as news and email monitoring. In this paper, we propose the first solution for processing continuous text queries efficiently. Our objective is to support a large number of user queries while sustaining high document arrival rates. Our solution indexes the streamed documents in main memory with a structure based on the principles of the inverted file, and processes document arrival and expiration events with an incremental threshold-based method. We distinguish between two versions of the monitoring algorithm, an eager and a lazy one, which differ in how aggressively they manage the thresholds on the inverted index. Using benchmark queries over a stream of real documents, we experimentally verify the efficiency of our methodology; both of its versions are at least an order of magnitude faster than a competitor constructed from existing techniques, with the lazy version being the best approach overall.
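
As a rough illustration of the setting this abstract describes, the following Python sketch keeps a count-based sliding window of documents in an in-memory inverted index and ranks them against a query by cosine similarity. It is a naive baseline that recomputes each result from scratch; it does not implement the incremental threshold method or the eager and lazy variants from the paper. The class name `SlidingWindowIndex`, the whitespace tokenizer, and the raw term-frequency weighting are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict, deque


class SlidingWindowIndex:
    """Count-based sliding window of documents with an in-memory inverted index.

    Hypothetical baseline: results are recomputed per call, not maintained
    incrementally with thresholds as in the paper.
    """

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()             # document ids in arrival order
        self.docs = {}                    # document id -> term-frequency Counter
        self.postings = defaultdict(set)  # term -> ids of window documents

    def add(self, doc_id, text):
        """Index an arriving document; return the expired document id, if any."""
        tf = Counter(text.lower().split())
        self.docs[doc_id] = tf
        self.window.append(doc_id)
        for term in tf:
            self.postings[term].add(doc_id)
        if len(self.window) > self.window_size:
            return self._expire()
        return None

    def _expire(self):
        """Drop the oldest document from the window and from every posting list."""
        old = self.window.popleft()
        for term in self.docs.pop(old):
            self.postings[term].discard(old)
        return old

    def top_k(self, query, k):
        """Return the ids of the k window documents most similar to the query."""
        q = Counter(query.lower().split())
        q_norm = math.sqrt(sum(w * w for w in q.values()))
        # Only documents sharing at least one query term can score above zero.
        candidates = set()
        for term in q:
            candidates |= self.postings.get(term, set())
        scored = []
        for doc_id in candidates:
            tf = self.docs[doc_id]
            dot = sum(w * tf[term] for term, w in q.items())
            d_norm = math.sqrt(sum(w * w for w in tf.values()))
            scored.append((dot / (q_norm * d_norm), doc_id))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [doc_id for _, doc_id in scored[:k]]


index = SlidingWindowIndex(window_size=2)
index.add("d1", "stock market news")
index.add("d2", "football match report")
first = index.top_k("market news", k=1)
expired = index.add("d3", "market crash news today")  # pushes d1 out of the window
second = index.top_k("market news", k=1)
```

Here the arrival of the third document expires the first, so the result for the same query changes between the two calls. Handling such arrival and expiration events without recomputing every result is the problem the abstract addresses.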

CiteSeerX

Crossref

Institutional Knowledge at Singapore Management University

Building and Maintaining Halls of Fame over a Database

Author: Alvanaki Foteini
Michel Sebastian
Stupar Aleksandar
Publication date: 01/01/2012

Halls of Fame are fascinating constructs. They represent the elite of an often very large number of entities: persons, companies, products, countries, etc. Beyond their practical use as static rankings, changes to them are particularly interesting, whether for decision-making processes, as input to common media or novel narrative science applications, or simply for consumption by users. In this work, we aim at detecting, in an automated way, events that can be characterized by changes to a Hall of Fame ranking. We describe how the schema and data of a database can be used to generate Halls of Fame. In this database scenario, by Hall of Fame we refer to distinguished tuples, that is, entities whose characteristics set them apart from the majority. We define every Hall of Fame as one specific instance of an SQL query, such that a change in its result is considered a noteworthy event. Identified changes (i.e., events) are ranked using lexicographic tradeoffs over event and query properties and presented to users or fed into higher-level applications. We have implemented a full-fledged prototype system that uses either database triggers or a Java-based middleware for event identification. We report on an experimental evaluation using a real-world dataset of basketball statistics.
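
The idea of treating a Hall of Fame as an SQL query whose result change counts as an event can be sketched in a few lines of Python with the standard `sqlite3` module. This is only a minimal illustration under assumed names: the `players` table, the top-2 query, and the helper `apply_and_detect` are hypothetical, and the sketch simply re-runs the query before and after each update rather than using the triggers or middleware the abstract mentions.

```python
import sqlite3

# Hypothetical Hall of Fame: the two highest-scoring players.
HALL_OF_FAME_SQL = "SELECT name FROM players ORDER BY points DESC, name LIMIT 2"


def hall_of_fame(conn):
    """Evaluate the Hall of Fame query and return the ranked names."""
    return [row[0] for row in conn.execute(HALL_OF_FAME_SQL)]


def apply_and_detect(conn, sql, params=()):
    """Run an update; return (before, after) if the ranking changed, else None."""
    before = hall_of_fame(conn)
    conn.execute(sql, params)
    after = hall_of_fame(conn)
    return None if before == after else (before, after)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT PRIMARY KEY, points INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)", [("A", 30), ("B", 25), ("C", 20)]
)

update = "UPDATE players SET points = ? WHERE name = ?"
event = apply_and_detect(conn, update, (28, "C"))     # C overtakes B
no_event = apply_and_detect(conn, update, (26, "B"))  # B stays below C
```

The first update changes the query result and is reported as an event; the second changes the data but leaves the ranking untouched, so nothing is reported.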

arXiv.org e-Print Archive

MPG.PuRe

Computing Immutable Regions for Subspace Top-k Queries

Author: Baeza-Yates R.
Börzsönyi S.
Chang K. C.-C.
Chang Y.-C.
Hua M.
Li J.
Mouratidis K.
Nutanong S.
Pei J.
Persin M.
Soliman M. A.
Song Z.
Tsaparas P.
Vlachou A.
Yi K.
Zhang J.
Publication venue: 'VLDB Endowment'
Publication date: 01/01/2012

National Research Foundation (NRF) Singapore under International Research Centre @ Singapore Funding Initiativ

CiteSeerX

Crossref

Institutional Knowledge at Singapore Management University

Region clustering based evaluation of multiple top-N selection queries

Author: Andrade
Bruno
Chakrabarti
Chaudhuri
Chunnian Liu
Hristidis
Ilyas
Liang Zhu
Marian
Meng
Motro
O’Neil
Sellis
Silberschatz
Stoica
Weiyi Meng
Wenzhu Yang
Zhu
Publication venue: 'Elsevier BV'

Crossref

Processing top-N relational queries by learning

Author: A. Marian
A. Motro
A. Silberschatz
B. L. Bowerman
Chunnian Liu
Dazhong Liu
I. Ilyas
K. Zhao
L. Zhu
Liang Zhu
M. Zhu
N. Bruno
S. Chaudhuri
S.-W. Hwang
V. Hristidis
W. Fleming
Weiyi Meng
Wenzhu Yang
Y. Chen
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Design and analysis of algorithms for similarity search based on intrinsic dimension

Author: Ma Xiguo
Publication venue: Digital Commons @ NJIT
Publication date: 31/01/2015

One of the most fundamental operations employed in data mining tasks such as classification, cluster analysis, and anomaly detection is that of similarity search. It has been used in numerous fields of application such as multimedia, information retrieval, recommender systems, and pattern recognition. Specifically, a similarity query aims to retrieve from the database the objects most similar to a query object, where the underlying similarity measure is usually expressed as a distance function. The cost of processing similarity queries has typically been assessed in terms of the representational dimension of the data involved, that is, the number of features used to represent individual data objects. It is generally the case that a high representational dimension results in a significant increase in the processing cost of similarity queries. This relation is often attributed to an amalgamation of phenomena, collectively referred to as the curse of dimensionality. However, the observed effects of dimensionality in practice may not be as severe as expected. This has led to the development of models quantifying the complexity of data in terms of some measure of the intrinsic dimensionality. The generalized expansion dimension (GED) is one such model, which estimates the intrinsic dimension in the vicinity of a query point q through the observation of the ranks and distances of pairs of neighbors with respect to q. This dissertation is mainly concerned with the design and analysis of search algorithms based on the GED model. In particular, three variants of the similarity search problem are considered: adaptive similarity search, flexible aggregate similarity search, and subspace similarity search. The good practical performance of the proposed algorithms demonstrates the effectiveness of dimensionality-driven design of search algorithms.
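
To make the rank-and-distance idea concrete, the Python sketch below estimates an expansion dimension around a query point from a pair of neighbors. It assumes the estimate takes the form log(k2/k1) / log(r2/r1), where r1 and r2 are the distances from q to its k1-th and k2-th nearest neighbors; the abstract does not state the formula, so this form and the function name `expansion_dimension` are assumptions, not the dissertation's definition.

```python
import math


def expansion_dimension(distances, k1, k2):
    """Estimate local intrinsic dimension from two neighbor ranks.

    distances: distances from the query point q to the data objects.
    k1, k2:    1-based neighbor ranks with k1 < k2.
    Assumed form: log(k2 / k1) / log(r2 / r1).
    """
    if not 1 <= k1 < k2 <= len(distances):
        raise ValueError("need 1 <= k1 < k2 <= number of distances")
    ordered = sorted(distances)
    r1, r2 = ordered[k1 - 1], ordered[k2 - 1]
    if not 0 < r1 < r2:
        raise ValueError("need distinct, positive neighbor distances")
    return math.log(k2 / k1) / math.log(r2 / r1)


# Synthetic check: if the k-th neighbor lies at distance sqrt(k), the number of
# neighbors grows with the square of the radius, as for uniform data in a plane.
distances = [math.sqrt(k) for k in range(1, 101)]
estimate = expansion_dimension(distances, k1=10, k2=40)
```

On this synthetic input the estimate comes out at about 2, matching the planar growth rate regardless of how many features the objects might be stored with.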

Digital Commons @ New Jersey Institute of Technology (NJIT)