878 research outputs found

Ranking Bias in Deep Web Size Estimation Using Capture Recapture Method

Author: Lu Jianguo
Publication venue: Scholarship at UWindsor
Publication date: 01/01/2010

Many deep web data sources are ranked data sources, i.e., they rank the matched documents and return at most the top k results even though more than k documents match the query. When estimating the size of such a ranked deep web data source, it is well known that there is a ranking bias: traditional methods tend to underestimate the size when queries overflow (match more documents than the return limit). Numerous estimation methods have been proposed to overcome the ranking bias, such as avoiding overflowing queries during the sampling process, or adjusting the initial estimate using a fixed function. We observe that the overflow rate has a direct impact on the accuracy of the estimate. Under certain conditions, the actual size is close to the estimate obtained by the unranked model multiplied by the overflow rate. Based on this result, this paper proposes a method that allows overflowing queries in the sampling process.
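
The relationship described in this abstract can be sketched roughly in Python. The sketch assumes the "unranked model" is the classic Lincoln-Petersen capture-recapture estimator, and it assumes the overflow rate is the total number of matched documents divided by the total number of returned documents under a return limit k. The abstract does not spell out either definition, so both, along with all function names, are illustrative assumptions rather than the paper's method.

```python
def lincoln_petersen(n1: int, n2: int, overlap: int) -> float:
    """Classic capture-recapture estimate of population size: n1 * n2 / overlap.

    n1, n2  -- number of distinct documents in the first and second sample
    overlap -- number of documents that appear in both samples
    """
    if overlap <= 0:
        raise ValueError("need at least one document captured in both samples")
    return n1 * n2 / overlap


def overflow_rate(match_counts, k: int) -> float:
    """ASSUMED definition: total matches / total documents actually returned,
    when each query returns at most k documents."""
    matched = sum(match_counts)
    returned = sum(min(count, k) for count in match_counts)
    if returned == 0:
        raise ValueError("no documents were returned by any query")
    return matched / returned


def adjusted_estimate(n1, n2, overlap, match_counts, k) -> float:
    """Unranked estimate scaled by the (assumed) overflow rate."""
    return lincoln_petersen(n1, n2, overlap) * overflow_rate(match_counts, k)
```

With no overflowing queries the assumed overflow rate is 1, so the adjusted estimate reduces to the plain capture-recapture estimate.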

Scholarship at UWindsor

Understanding the Evolution of Landscape Planning Strategy in China: From Fragmented Urban Green Space System to Regional Greenway Network across Cities

Authors: Li Zhiming, Lu Di, Lu Jianguo
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2013

In China, an urban green space system (UGSS) is defined as a network of all sorts of green spaces in a city's built-up area that supports ecological and recreational functions (Wang, 2009). The implementation of UGSS has revealed several common problems, such as overemphasizing green spaces in the built-up area of the city, losing stability and rationality in spatial patterns, and mismatching the progress of ecological restoration cycles (Liu & Wen, 2007; Wang, 2009). Greenways represent a distinctly strategic approach to landscape planning through combinations of spatially and functionally compatible land uses within a network (Ahern, 1995). Specifically, four principal strategies (protective, defensive, offensive, and opportunistic) are recognized as an overall planning strategy for greenways (Ahern, 1995). Inspired by the greenway concept, China has constructed 2,372 kilometers of greenway network in the Pearl River Delta (PRD) in order to maintain regional ecological safety, improve regional livability, stimulate economic growth, and protect cultural and historic resources (He et al., 2010). Meanwhile, various cities in China have initiated their own greenway network planning for implementation. This indicates a potential greenway movement in the country during the next few years, following the global interest in greenways as a sustainable landscape planning strategy. Through a historical review of the urban green space system in China and a case study of the PRD greenway network, this research attempts to answer the following questions: (1) How are contemporary greenway networks planned and implemented in China? (2) How have Ahern's four principal strategies (protective, defensive, offensive, and opportunistic) been applied within the PRD regional greenway network as a landscape planning strategy? The purpose of this research is to provide a holistic perspective on greenway planning and development in China.
Specifically, this paper will (1) present the evolution of UGSS planning and recent greenway development in China; (2) discuss the practice of implementing a greenway network as a landscape planning strategy; and (3) discuss future greenway development in China.

ScholarWorks@UMass Amherst

Determination of Damage Yield Loss Relationships and Economic Injury Levels of Oedaleus asiaticus (Orthoptera: Acrididae) in Steppe of China

Authors: Han Jianguo, Lu Hui
Publication venue: UKnowledge
Publication date: 14/06/2021

University of Kentucky

Web service search: who, when, what, and how

Authors: Lu Jianguo, Yu Yijun
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2007

Web service search is an important problem in service-oriented architecture that has attracted widespread attention from academia as well as industry. Web service searching can be performed by various stakeholders, in different situations, using different forms of queries. These combinations result in radically different implementations. Using a real-world web service composition example, this paper describes when, what, and how to search for web services from the service assemblers' point of view, where the semantics of web services are not explicitly described. The example outlines an approach to implementing a web service broker that can recommend useful services to service assemblers.

CiteSeerX

Open Research Online (The Open University)

Higher order generalization and its application in program verification

Authors: Hagiya Masami, Harao Masateru, Lu Jianguo, Mylopoulos John
Publication venue: Scholarship at UWindsor
Publication date: 01/01/2000

Generalization is a fundamental operation of inductive inference. While first-order syntactic generalization (anti-unification) is well understood, its various extensions are often needed in applications. This paper discusses syntactic higher-order generalization in a higher-order language λ2 [1]. Based on the application ordering, we prove that a least general generalization exists for any two terms and is unique up to renaming. An algorithm to compute the least general generalization is also presented. To illustrate its usefulness, we propose a program verification system based on higher-order generalization that can reuse the proofs of similar programs.
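
For readers unfamiliar with the first-order case that this abstract calls well understood, the following Python sketch shows a minimal first-order anti-unification (least general generalization) procedure. It is not the paper's higher-order algorithm for λ2. The term encoding (tuples of the form `(function_symbol, arg1, ...)` with plain strings as constants) and the generated variable names `X0`, `X1`, ... are assumptions made for illustration, and the sketch assumes no constant is already named like a generated variable.

```python
def anti_unify(s, t, table=None):
    """Return a least general generalization of two first-order terms.

    Terms are either strings (constants) or tuples (symbol, arg1, ..., argN).
    `table` maps each mismatching pair of subterms to one shared variable,
    so the same pair is always generalized by the same variable.
    """
    if table is None:
        table = {}
    # Identical terms generalize to themselves.
    if s == t:
        return s
    # Same function symbol and arity: generalize argument by argument.
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        args = tuple(anti_unify(a, b, table) for a, b in zip(s[1:], t[1:]))
        return (s[0],) + args
    # Otherwise replace the mismatching pair with a (reused) fresh variable.
    if (s, t) not in table:
        table[(s, t)] = f"X{len(table)}"
    return table[(s, t)]
```

For example, generalizing f(a, g(a)) and f(b, g(b)) reuses one variable for the repeated pair (a, b), giving f(X0, g(X0)) rather than the less specific f(X0, g(X1)).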

Scholarship at UWindsor

Beyond network centrality: Individual-level behavioral traits for predicting information superspreaders in social media

Authors: Liu Jianguo, Lu Linyuan, Mariani Manuel Sebastian, Zhou Fang
Publication venue: Oxford University Press
Publication date: 13/06/2024

Understanding the heterogeneous role of individuals in large-scale information spreading is essential to manage online behavior as well as its potential offline consequences. To this end, most existing studies from diverse research domains focus on the disproportionate role played by highly connected “hub” individuals. However, we demonstrate here that information superspreaders in online social media are best understood and predicted by simultaneously considering two individual-level behavioral traits: influence and susceptibility. Specifically, we derive a nonlinear network-based algorithm to quantify individuals’ influence and susceptibility from multiple spreading event data. By applying the algorithm to large-scale data from Twitter and Weibo, we demonstrate that individuals’ estimated influence and susceptibility scores enable predictions of future superspreaders above and beyond network centrality, and reveal new insights into the network position of the superspreaders.

ZORA

Optimal algorithms for selecting top-k combinations of attributes : theory and applications

Authors: Lin Chunbin, Lu Jiaheng, Wang Jianguo, Wei Zhewei, Xiao Xiaokui
Publication venue
Publication date: 01/01/2017

Traditional top-k algorithms, e.g., TA and NRA, have been successfully applied in many areas such as information retrieval, data mining and databases. They are designed to discover k objects, e.g., top-k restaurants, with the highest overall scores aggregated from different attributes, e.g., price and location. However, emerging applications like query recommendation require providing the best combinations of attributes, instead of objects. The straightforward extension of existing top-k algorithms is prohibitively expensive for answering top-k combination queries because it needs to enumerate all possible combinations, whose number is exponential in the number of attributes. In this article, we formalize a novel type of top-k query, called top-k, m, which aims to find top-k combinations of attributes based on the overall scores of the top-m objects within each combination, where m is the number of objects forming a combination. We propose a family of efficient top-k, m algorithms with different data access methods, i.e., sorted accesses and random accesses, and different query certainties, i.e., exact query processing and approximate query processing. Theoretically, we prove that our algorithms are instance optimal and analyze the bound of the depth of accesses. We further develop optimizations for efficient query evaluation to reduce the computational and memory costs and the number of accesses. We provide a case study on real applications of top-k, m queries for an online biomedical search engine. Finally, we perform comprehensive experiments to demonstrate the scalability and efficiency of top-k, m algorithms on multiple real-life datasets.
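
To make the cost of the straightforward extension concrete, the following Python sketch implements a brute-force baseline of the kind the abstract describes as prohibitively expensive. It is not one of the paper's instance-optimal algorithms. It assumes that attributes are organized into groups, that a combination picks one attribute from each group, that an object's score in a combination is the sum of its scores under the chosen attributes, and that a combination's score is the sum of its top-m object scores. These modelling choices, the data layout, and the function name are illustrative assumptions, since the abstract does not fix them.

```python
import heapq
from itertools import product


def top_k_m_bruteforce(groups, k, m):
    """Enumerate every combination and keep the k best.

    groups -- list of dicts, each mapping attribute name -> {object_id: score}
    k      -- number of combinations to return
    m      -- number of top objects whose scores define a combination's score
    Returns a list of (combination_score, attribute_names), best first.
    """
    scored = []
    # The number of combinations is the product of the group sizes,
    # which grows exponentially with the number of groups.
    for combo in product(*(group.items() for group in groups)):
        names = tuple(name for name, _ in combo)
        tables = [scores for _, scores in combo]
        # Only objects scored under every chosen attribute are considered.
        common = set(tables[0]).intersection(*tables[1:])
        totals = [sum(table[obj] for table in tables) for obj in common]
        scored.append((sum(heapq.nlargest(m, totals)), names))
    return heapq.nlargest(k, scored)
```

Every combination is fully scored before any can be ranked, which is the enumeration cost the abstract refers to.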

Crossref

Helsingin yliopiston digitaalinen arkisto

DR-NTU (Digital Repository of NTU)