Harvesting all matching information to a given query from a deep website
In this paper, the goal is to harvest all documents matching a given (entity) query from a deep web source. The objective is to retrieve all information about, for instance, "Denzel Washington", "Iran Nuclear Deal", or "FC Barcelona" from data hidden behind web forms. Policies of web search engines usually do not allow access to all of the results matching a given query: they limit the number of returned documents and the number of user requests. In this work, we propose a new approach that automatically collects information related to a given query from a search engine within these limitations. The approach minimizes the number of queries that need to be sent by using information from a large external corpus. When tested on Google, the new approach outperforms existing approaches, measured by the total number of unique documents found per query.
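The abstract does not spell out how the external corpus is used, so the following is only a minimal sketch of the general idea under stated assumptions. It simulates a hypothetical deep-web source that returns at most `k` matches per query (`make_capped_search`), ranks candidate refinement terms by how often they co-occur with the query in an external corpus (an assumed heuristic, `rank_expansion_terms`), and sends refined queries "query AND term" until a query budget is spent. All function names and the co-occurrence ranking are illustrative, not the paper's method.

```python
from collections import Counter


def make_capped_search(collection, k):
    """Hypothetical deep-web source: ids of documents containing all terms, capped at k."""
    def search(terms):
        hits = [doc_id for doc_id, text in sorted(collection.items())
                if all(t.lower() in text.lower().split() for t in terms)]
        return hits[:k]  # the cap hides every match beyond the first k
    return search


def rank_expansion_terms(query, external_corpus):
    """Rank terms by how often they co-occur with all query terms in an external corpus."""
    query_terms = set(query.lower().split())
    counts = Counter()
    for text in external_corpus:
        words = set(text.lower().split())
        if query_terms <= words:
            counts.update(words - query_terms)
    # Most frequent co-occurring terms first; ties broken alphabetically for determinism.
    return sorted(counts, key=lambda term: (-counts[term], term))


def harvest(query, search, external_corpus, max_queries):
    """Collect unique documents matching `query`, sending at most `max_queries` requests."""
    base = query.lower().split()
    found = set(search(base))
    sent = 1
    for term in rank_expansion_terms(query, external_corpus):
        if sent >= max_queries:
            break
        # A refined query can surface matches that the cap hid from the plain query.
        found.update(search(base + [term]))
        sent += 1
    return found
```

For example, with a cap of `k=2` the plain query "denzel" below sees only documents 0 and 1, while two refined queries ("denzel oscar", "denzel interview") recover documents 3 and 2 as well:

```python
collection = {0: "denzel film oscar", 1: "denzel film award", 2: "denzel interview",
              3: "denzel oscar speech", 4: "barcelona match"}
search = make_capped_search(collection, k=2)
external = ["denzel oscar winner", "denzel oscar night", "denzel interview today"]
print(sorted(harvest("denzel", search, external, max_queries=3)))  # [0, 1, 2, 3]
```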
Perceived Islamic work ethics and organisational commitment among Muslim engineers in Perak Tengah and Manjung district
Many have argued that the productivity and quality of work of Muslim engineers are lower than those of their non-Muslim counterparts, and Islamic Work Ethics (IWE) is argued to be the main barrier to higher productivity. The study aims to obtain the views of Muslim engineers in the Perak Tengah and Manjung Districts on whether IWE contributes to lower productivity and quality of work among Muslim professionals. The study distributed questionnaires to 50 Muslim engineers. The preliminary findings show that IWE enhances Muslim engineers’ commitment to their organisations as well as their work productivity and quality. Thus, the findings reject the claim that IWE is the barrier to productivity and work quality. Nevertheless, the study found that the “theomorphic potential” of most Muslim engineers in Perak Tengah and Manjung is not fully realized. This weakness reduces their awareness of the need to be careful and thoughtful in producing quality work. The study suggests that Muslim engineers should enhance the cognitive (aql’), affective (nafs’) and normative (syariat) aspects of work with Qur’anic-based Islamic values as demonstrated by Prophet Muhammad P.B.U.H. Future studies should cross-examine professionals from other sectors with a larger sample size.
Why we need an independent index of the Web
The path to greater diversity, as we have seen, cannot be achieved merely by
hoping for a new search engine, nor will government support for a single
alternative achieve this goal. What is required instead is to create the
conditions that will make establishing such a search engine possible in the
first place. I describe how building and maintaining a proprietary index is the
greatest deterrent to such an undertaking. We must first overcome this
obstacle. Doing so will still not solve the problem of the lack of diversity in
the search engine marketplace, but it may establish the conditions necessary to
achieve that desired end.
The Power of an Example: Hidden Set Size Approximation Using Group Queries and Conditional Sampling
We study a basic problem of approximating the size of an unknown set $S$ in a
known universe $U$. We consider two versions of the problem. In both versions
the algorithm can specify subsets $T \subseteq U$. In the first version, which
we refer to as the group query or subset query version, the algorithm is told
whether $T \cap S$ is non-empty. In the second version, which we refer to as the
subset sampling version, if $T \cap S$ is non-empty, then the algorithm receives
a uniformly selected element from $T \cap S$. We study the difference between
these two versions under different conditions on the subsets that the algorithm
may query/sample, and in both the case that the algorithm is adaptive and the
case where it is non-adaptive. In particular, we focus on a natural family of
allowed subsets, which correspond to intervals, as well as variants of this
family.
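To make the group-query version concrete, here is a minimal sketch of a standard non-adaptive estimator; it is not necessarily the algorithm studied in the paper, and it uses arbitrary random subsets rather than the intervals the paper focuses on. If each element of $U$ is placed in $T$ independently with probability $p$, then $\Pr[T \cap S \neq \emptyset] = 1 - (1-p)^{|S|}$, so an observed fraction $q$ of "non-empty" answers gives the estimate $|S| \approx \ln(1-q)/\ln(1-p)$. The names `group_query` and `estimate_size`, the halving schedule for $p$, and the trial count are illustrative assumptions.

```python
import math
import random


def group_query(T, S):
    """Group-query oracle: reveals only whether T ∩ S is non-empty."""
    return not S.isdisjoint(T)


def estimate_size(universe, oracle, trials=300, seed=0):
    """Estimate |S| using only group queries on random subsets T of `universe`.

    For each density p = 1/2, 1/4, ..., the fraction q of "non-empty" answers
    estimates 1 - (1 - p)**|S|. We invert this at the density whose q is
    closest to 1/2, i.e. away from the uninformative values 0 and 1.
    """
    rng = random.Random(seed)
    n = len(universe)
    best = None  # (distance of q from 1/2, size estimate)
    q = 0.0
    p = 0.5
    while p * n >= 0.5:
        hits = 0
        for _ in range(trials):
            T = {x for x in universe if rng.random() < p}
            hits += oracle(T)
        q = hits / trials
        if 0 < q < 1:
            estimate = math.log(1 - q) / math.log(1 - p)
            if best is None or abs(q - 0.5) < best[0]:
                best = (abs(q - 0.5), estimate)
        p /= 2
    if best is None:
        # Every density gave q == 0 (S looks empty) or q == 1 (S looks huge).
        return 0.0 if q == 0 else float(n)
    return best[1]
```

A usage example with a hidden set of 50 elements in a universe of 512; the result is a randomized estimate, so only its rough magnitude is meaningful:

```python
universe = list(range(512))
S = set(range(0, 500, 10))
print(estimate_size(universe, lambda T: group_query(T, S)))  # roughly 50
```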
A parallel view for search engines
Engineering a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of n-tillions of queries every day. Despite the importance of large-scale search engines on the Web, very little academic research has been done on them. Furthermore, due to rapid advances in technology and the proliferation of the Web, creating a web search engine today is very different from creating one years ago. In most papers the index simply "is", without discussion of how it was created. But for an indexing scheme to be useful, it must be possible to construct the index in a reasonable amount of time, so papers describing complex indexing methods should also describe and analyze a mechanism by which the index can be built.
Scalability is a concern during index construction as well as during query processing. This paper describes how the Crawler, the Indexer, and the Searcher cooperate.
VI Workshop de Procesamiento Distribuido y Paralelo (WPDP), Red de Universidades con Carreras en Informática (RedUNCI)
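The abstract argues that an index is only useful if it can be built in reasonable time. As a hedged illustration of that point, and not the paper's architecture, the sketch below builds a document-partitioned inverted index: the crawled documents (assumed already fetched, so the Crawler is omitted) are split into shards, each shard is indexed independently, and the partial postings lists are merged for the Searcher, which answers conjunctive (AND) queries. The function names and the sharding scheme are assumptions. Threads keep the sketch self-contained; in CPython they give little speed-up for this CPU-bound work, so a real system would use separate processes or machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def index_shard(docs):
    """Indexer: build a partial inverted index (term -> doc ids) for one shard."""
    partial = defaultdict(list)
    for doc_id, text in docs:
        for term in set(text.lower().split()):
            partial[term].append(doc_id)
    return partial


def build_index(docs, n_shards=4):
    """Index shards independently, then merge their postings lists."""
    shards = [docs[i::n_shards] for i in range(n_shards)]
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        partials = list(pool.map(index_shard, shards))
    index = defaultdict(list)
    for partial in partials:
        for term, postings in partial.items():
            index[term].extend(postings)
    for postings in index.values():
        postings.sort()  # keep each postings list in doc-id order
    return index


def search_index(index, query):
    """Searcher: ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)
```

For instance:

```python
docs = [(0, "web search engine"), (1, "parallel index construction"),
        (2, "search engine index"), (3, "web crawler")]
index = build_index(docs, n_shards=2)
print(search_index(index, "search engine"))  # [0, 2]
```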
A Comparison of Source Distribution and Result Overlap in Web Search Engines
When it comes to search engines, users generally prefer Google. Our study
aims to find the differences between the results found in Google and those of
other search engines. We compared the top 10 results from Google, Bing,
DuckDuckGo, and Metager, using 3,537 queries generated from Google Trends for
Germany and the US. Google displays more unique domains in the top results than
its competitors. Wikipedia and news websites are the most popular sources
overall. With some top sources dominating search results, the distribution of
domains is also consistent across all search engines. The overlap between
Google and Bing is always under 32%, while Metager has a higher overlap with
Bing than DuckDuckGo, going up to 78%. This study shows that using another
search engine, especially in addition to Google, provides a wider variety of
sources and might lead users to new perspectives.
Comment: Submitted to the 85th Annual Meeting of the Association for
Information Science & Technology and will be published in the conference
proceedings.
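The abstract reports unique domains and pairwise overlap of top-10 results without defining the overlap measure, so the following is a minimal sketch of one plausible definition, not the study's own code. Overlap is taken here as the share of the `k` top slots of engine A whose exact URL also appears in engine B's top `k`, which assumes each engine returned a full list of `k` results. The "www." stripping in `domain` is an assumed normalization.

```python
from urllib.parse import urlparse


def domain(url):
    """Host of a result URL, lowercased, with a leading "www." removed (assumed normalization)."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def unique_domains(results, k=10):
    """Number of distinct domains among an engine's top-k results."""
    return len({domain(url) for url in results[:k]})


def overlap(results_a, results_b, k=10):
    """Share of the k top slots of engine A whose URL also appears in engine B's top k."""
    shared = set(results_a[:k]) & set(results_b[:k])
    return len(shared) / k
```

A toy example with three results per engine:

```python
google = ["https://en.wikipedia.org/wiki/A", "https://www.nytimes.com/x", "https://example.com/1"]
bing = ["https://en.wikipedia.org/wiki/A", "https://example.com/2", "https://www.example.com/3"]
print(unique_domains(google, k=3), unique_domains(bing, k=3))  # 3 2
print(overlap(google, bing, k=3))  # 0.333...
```

Comparing exact URLs is the strictest choice; comparing `domain(url)` values instead would yield higher overlap figures, so the definition matters when reading such percentages.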