54 research outputs found

    Harvesting all matching information to a given query from a deep website

    This paper addresses harvesting all documents that match a given (entity) query from a deep web source. The objective is to retrieve all information about, for instance, "Denzel Washington", "Iran Nuclear Deal", or "FC Barcelona" from data hidden behind web forms. Web search engines' policies usually do not allow access to all results matching a given query: they limit both the number of returned documents and the number of user requests. In this work, we propose a new approach that automatically collects information related to a given query from a search engine, within the search engine's limitations. The approach minimizes the number of queries that need to be sent by applying information from a large external corpus. When tested on Google, the new approach outperforms existing approaches, as measured by the total number of unique documents found per query.

    Perceived Islamic work ethics and organisational commitment among Muslim engineers in Perak Tengah and Manjung district

    Many have argued that the productivity and quality of work of Muslim engineers are lower than those of their non-Muslim counterparts, and that Islamic Work Ethics (IWE) is the main barrier to higher productivity. The study aims to obtain the views of Muslim engineers in Perak Tengah and Manjung Districts on whether IWE contributes to lower productivity and quality of work among Muslim professionals. Questionnaires were distributed to 50 Muslim engineers. The preliminary findings show that IWE enhances Muslim engineers’ commitment to their organisations as well as their work productivity and quality. Thus, the findings reject the claim that IWE is a barrier to productivity and work quality. Nevertheless, the study found that the “theomorphic potential” of most Muslim engineers in Perak Tengah and Manjung is not fully realized. This weakness reduces their consciousness of the need to be careful and thoughtful in producing quality work. The study suggests that Muslim engineers should enhance the cognitive (aql’), affective (nafs’) and normative (syariat) aspects of work with Qur’anic-based Islamic values as demonstrated by Prophet Muhammad P.B.U.H. Future studies should cross-examine professionals from other sectors with a larger sample size.

    Why we need an independent index of the Web

    Greater diversity, as we have seen, cannot be achieved by merely hoping for a new search engine, nor will government support for a single alternative achieve this goal. What is instead required is to create the conditions that will make establishing such a search engine possible in the first place. I describe how building and maintaining a proprietary index is the greatest deterrent to such an undertaking. We must first overcome this obstacle. Doing so will still not solve the problem of the lack of diversity in the search engine marketplace, but it may establish the conditions necessary to achieve that desired end.

    The Power of an Example: Hidden Set Size Approximation Using Group Queries and Conditional Sampling

    We study a basic problem of approximating the size of an unknown set $S$ in a known universe $U$. We consider two versions of the problem. In both versions the algorithm can specify subsets $T \subseteq U$. In the first version, which we refer to as the group query or subset query version, the algorithm is told whether $T \cap S$ is non-empty. In the second version, which we refer to as the subset sampling version, if $T \cap S$ is non-empty, then the algorithm receives a uniformly selected element from $T \cap S$. We study the difference between these two versions under different conditions on the subsets that the algorithm may query/sample, and in both the case that the algorithm is adaptive and the case where it is non-adaptive. In particular we focus on a natural family of allowed subsets, which correspond to intervals, as well as variants of this family.
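
    The two access models in this abstract can be illustrated with a minimal sketch. It assumes an integer universe, and the function names (`group_query`, `subset_sample`, `interval`) are hypothetical, chosen only for illustration. It shows the oracles themselves, not the paper's approximation algorithms.

```python
import random
from typing import Optional, Set


def group_query(S: Set[int], T: Set[int]) -> bool:
    """Group (subset) query: reveal only whether T intersects the hidden set S."""
    return bool(S & T)


def subset_sample(S: Set[int], T: Set[int], rng: random.Random) -> Optional[int]:
    """Subset sampling: return a uniformly chosen element of T ∩ S, or None if it is empty."""
    common = sorted(S & T)  # sorted so the choice depends only on the rng state
    return rng.choice(common) if common else None


def interval(lo: int, hi: int) -> Set[int]:
    """An interval subset {lo, ..., hi} of an integer universe (assumed for illustration)."""
    return set(range(lo, hi + 1))
```

    With a hidden set such as `{3, 7, 8}`, a group query on `interval(0, 2)` reports only that the intersection is empty, whereas sampling from `interval(7, 9)` returns one of the hidden elements 7 or 8.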

    A parallel view for search engines

    Engineering a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the Web, very little academic research has been done on them. Furthermore, due to rapid advances in technology and web proliferation, creating a web search engine today is very different from years ago. In most papers the index simply "is", without discussion of how it was created. But for an indexing scheme to be useful, it must be possible to construct the index in a reasonable amount of time, so papers describing complex indexing methods should also describe and analyze a mechanism whereby the index can be built. Scalability is of concern during index construction as well as during query processing. This paper describes the cooperative work between the Crawler, the Indexer and the Searcher.
    VI Workshop de Procesamiento Distribuido y Paralelo (WPDP), Red de Universidades con Carreras en Informática (RedUNCI)

    A Comparison of Source Distribution and Result Overlap in Web Search Engines

    When it comes to search engines, users generally prefer Google. Our study aims to find the differences between the results found in Google and those of other search engines. We compared the top 10 results from Google, Bing, DuckDuckGo, and Metager, using 3,537 queries generated from Google Trends from Germany and the US. Google displays more unique domains in the top results than its competitors. Wikipedia and news websites are the most popular sources overall. With some top sources dominating search results, the distribution of domains is also consistent across all search engines. The overlap between Google and Bing is always under 32%, while Metager has a higher overlap with Bing than DuckDuckGo, going up to 78%. This study shows that the use of another search engine, especially in addition to Google, provides a wider variety of sources and might lead the user to find new perspectives.
    Comment: Submitted to the 85th Annual Meeting of the Association for Information Science & Technology and will be published in the conference proceedings.
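
    The kind of comparison described in this abstract can be sketched as follows. The abstract does not define its overlap measure, so the definition below (the share of one engine's top results that also appear in another's) is an assumption. The function names and URLs are hypothetical.

```python
from typing import Iterable, Set
from urllib.parse import urlparse


def overlap(results_a: Iterable[str], results_b: Iterable[str]) -> float:
    """Share of distinct URLs in results_a that also appear in results_b (assumed definition)."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a:
        return 0.0
    return len(set_a & set_b) / len(set_a)


def unique_domains(results: Iterable[str]) -> Set[str]:
    """Distinct host names among a list of result URLs."""
    return {urlparse(url).netloc for url in results}
```

    Applied per query to each engine's top 10 URLs, `overlap` gives a pairwise figure that can be averaged over queries, and `len(unique_domains(...))` gives the number of distinct sources in a result list.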