Crossing the academic ocean? Judit Bar-Ilan's oeuvre on search engines studies
The main objective of this work is to analyse the contributions of Judit Bar-Ilan to search engine studies. To this end, two complementary approaches were taken. The first is a systematic literature review of 47 publications on this topic authored or co-authored by Judit. The second is an interdisciplinarity analysis, carried out through Scopus, of the cited references (publications cited by Judit) and the citing documents (publications that cite Judit's work). The systematic literature review reveals a large number of search engines studied (43) and indicators measured (especially technical precision, overlap and fluctuation over time). It also detects an evolution over the years from descriptive statistical studies towards empirical user studies that mix quantitative and qualitative methods. The interdisciplinarity analysis, in turn, shows that a significant portion of Judit's oeuvre was intellectually founded on computer science, while achieving a significant, though not exclusive, impact on library and information science.
Orduña-Malea, E. (2020). Crossing the academic ocean? Judit Bar-Ilan's oeuvre on search engines studies. Scientometrics, 123(3), 1317-1340. https://doi.org/10.1007/s11192-020-03450-4
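Two of the indicators named in the abstract above, overlap and technical precision, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names, the sample URLs and page texts, the use of a Jaccard-style ratio for overlap, and the reading of "technical precision" as the share of retrieved pages that actually contain the query term. The reviewed studies may define these indicators differently.

```python
def overlap(results_a, results_b):
    """Share of distinct URLs returned by both engines (Jaccard-style ratio)."""
    a, b = set(results_a), set(results_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def technical_precision(pages, term):
    """Fraction of retrieved pages whose text contains the query term."""
    if not pages:
        return 0.0
    hits = sum(1 for text in pages.values() if term.lower() in text.lower())
    return hits / len(pages)


# Hypothetical result lists for one query submitted to two engines.
engine_a = ["u1", "u2", "u3", "u4"]
engine_b = ["u3", "u4", "u5"]

# Hypothetical page texts for the URLs retrieved by engine A.
pages_a = {
    "u1": "Paul Erdos biography",
    "u2": "graph theory notes",
    "u3": "Erdos number project",
    "u4": "unrelated page",
}

print(overlap(engine_a, engine_b))            # 2 shared URLs out of 5 distinct
print(technical_precision(pages_a, "Erdos"))  # 2 of 4 pages contain the term
```

Repeating the same measurements on later dates would give a simple view of fluctuation over time.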
Evaluating the retrieval effectiveness of Web search engines using a representative query sample
Search engine retrieval effectiveness studies are usually small-scale, using
only limited query samples. Furthermore, queries are selected by the
researchers. We address these issues by taking a random representative sample
of 1,000 informational and 1,000 navigational queries from a major German
search engine and comparing Google's and Bing's results based on this sample.
Jurors were found through crowdsourcing, and data were collected using
specialised software, the Relevance Assessment Tool (RAT). We found that while
Google outperforms Bing in both query types, the difference in performance for
informational queries was rather small. However, for navigational queries, Google
found the correct answer in 95.3 per cent of cases whereas Bing only found the
correct answer 76.6 per cent of the time. We conclude that search engine
performance on navigational queries is of great importance, as users in this
case can clearly identify queries that have returned correct results. So,
performance on this query type may contribute to explaining user satisfaction
with search engines.
Human-Level Performance on Word Analogy Questions by Latent Relational Analysis
This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, machine translation, and information retrieval. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason/stone is analogous to the pair carpenter/wood; the relations between mason and stone are highly similar to the relations between carpenter and wood. Past work on semantic similarity measures has mainly been concerned with attributional similarity. For instance, Latent Semantic Analysis (LSA) can measure the degree of similarity between two words, but not between two relations. Recently the Vector Space Model (VSM) of information retrieval has been adapted to the task of measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus (they are not predefined), (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data (it is also used this way in LSA), and (3) automatically generated synonyms are used to explore reformulations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying noun-modifier relations, LRA achieves similar gains over the VSM, while using a smaller corpus
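The vector-based idea described above (each word pair represented by a vector of pattern frequencies, with analogous pairs having similar vectors) can be sketched as follows. This is only an illustration under stated assumptions: the pattern list and all counts are invented, cosine is assumed as the similarity measure, and the three LRA extensions (automatic pattern derivation, SVD smoothing, synonym reformulations) are deliberately left out.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length frequency vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical joining patterns and made-up corpus counts per word pair.
patterns = ["X cuts Y", "X works with Y", "X eats Y"]
pair_vectors = {
    ("mason", "stone"): [12, 30, 0],
    ("carpenter", "wood"): [15, 28, 1],
    ("cat", "mouse"): [0, 1, 20],
}


def answer_analogy(stem, choices):
    """Pick the choice pair whose pattern vector is closest to the stem pair."""
    return max(choices, key=lambda pair: cosine(pair_vectors[stem], pair_vectors[pair]))


best = answer_analogy(("mason", "stone"), [("carpenter", "wood"), ("cat", "mouse")])
print(best)
```

With these invented counts, the mason/stone vector points in nearly the same direction as carpenter/wood, so that pair is selected.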
Constructing experimental indicators for Open Access documents
The ongoing paradigm change in the scholarly publication system ('science is
turning to e-science') makes it necessary to construct alternative evaluation
criteria/metrics which appropriately take into account the unique
characteristics of electronic publications and other research output in digital
formats. Today, major parts of scholarly Open Access (OA) publications and the
self-archiving area are not well covered in the traditional citation and
indexing databases. The growing share and importance of freely accessible
research output demands new approaches/metrics for measuring and evaluating
these new types of scientific publications. In this paper we propose a
simple quantitative method which establishes indicators by measuring the
access/download pattern of OA documents and other web entities of a single web
server. The experimental indicators (search engine, backlink and direct access
indicator) are constructed based on standard local web usage data. This new
type of web-based indicator is developed to model the specific demand for
better study/evaluation of the accessibility, visibility and interlinking of
openly accessible documents. We conclude that e-science will need new stable
e-indicators.
Comment: 9 pages, 3 figures
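One way to picture the three experimental indicators (search engine, backlink and direct access) is to classify each request in a server's usage data by its referrer. The sketch below is an assumption throughout: the classification rules, the list of search engine host markers, the host names and the sample hits are hypothetical, and the abstract does not say how the authors actually compute their indicators.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed markers for recognising search engine referrers (not from the paper).
SEARCH_ENGINE_MARKERS = ("google.", "bing.", "yahoo.")


def classify_hit(referrer, own_host):
    """Assign one request to an access type based on its referrer field."""
    if not referrer or referrer == "-":
        return "direct_access"      # no referrer: typed URL, bookmark, etc.
    host = urlparse(referrer).netloc.lower()
    if host == own_host:
        return "internal"           # navigation within the same server
    if any(marker in host for marker in SEARCH_ENGINE_MARKERS):
        return "search_engine"
    return "backlink"               # link from another external site


def usage_indicators(hits, own_host):
    """Share of external accesses per type; internal navigation is ignored."""
    counts = Counter(classify_hit(referrer, own_host) for _path, referrer in hits)
    kinds = ("search_engine", "backlink", "direct_access")
    external = sum(counts[kind] for kind in kinds)
    if external == 0:
        return {}
    return {kind: counts[kind] / external for kind in kinds}


# Hypothetical (path, referrer) pairs extracted from a web server log.
hits = [
    ("/paper1.pdf", "https://www.google.com/search?q=oa"),
    ("/paper1.pdf", "-"),
    ("/paper2.pdf", "https://example.org/links.html"),
    ("/paper2.pdf", "https://repo.example.net/index.html"),
    ("/paper1.pdf", ""),
]
shares = usage_indicators(hits, own_host="repo.example.net")
print(shares)
```

Grouping the hits by path before calling `usage_indicators` would give per-document values instead of server-wide ones.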
Efficient Diversification of Web Search Results
In this paper we analyze the efficiency of various search results
diversification methods. While efficacy of diversification approaches has been
deeply investigated in the past, response time and scalability issues have been
rarely addressed. A unified framework for studying performance and feasibility
of result diversification solutions is thus proposed. First we define a new
methodology for detecting when, and how, query results need to be diversified.
To this end, we rely on the concept of "query refinement" to estimate the
probability that a query is ambiguous. Then, relying on this novel ambiguity
detection method, we deploy and compare, on a standard test set, three different
diversification methods: IASelect, xQuAD, and OptSelect. While the first two
are recent state-of-the-art proposals, the last is an original algorithm
introduced in this paper. We evaluate both the efficiency and the effectiveness
of our approach against its competitors by using the standard TREC Web
diversification track testbed. Results show that OptSelect is able to run two
orders of magnitude faster than the two other state-of-the-art approaches and
to obtain comparable figures in diversification effectiveness.
Comment: VLDB201
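To make the general idea of result diversification concrete, here is a generic greedy heuristic that trades relevance against coverage of query interpretations not yet represented in the result list. It is not IASelect, xQuAD or OptSelect, whose details are not given in the abstract. The function name, the `trade_off` parameter, the scoring formula, the intent probabilities (which the paper estimates from query refinements) and the sample data are all assumptions.

```python
def diversify(candidates, intent_probs, k, trade_off=0.5):
    """Greedily pick k documents, rewarding interpretations not yet covered.

    candidates:   doc_id -> (relevance score, set of interpretations it covers)
    intent_probs: interpretation -> estimated probability for the query
    """
    selected = []
    covered = set()
    remaining = dict(candidates)

    def gain(doc_id):
        relevance, intents = remaining[doc_id]
        novelty = sum(intent_probs.get(i, 0.0) for i in intents - covered)
        return (1 - trade_off) * relevance + trade_off * novelty

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (lowest doc_id wins).
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        covered |= remaining.pop(best)[1]
    return selected


# Hypothetical ambiguous query "jaguar" with two interpretations.
intent_probs = {"car": 0.6, "animal": 0.4}
candidates = {
    "d1": (0.9, {"car"}),
    "d2": (0.8, {"car"}),
    "d3": (0.5, {"animal"}),
    "d4": (0.4, {"animal", "car"}),
}

print(diversify(candidates, intent_probs, k=2))                 # mixes both senses
print(diversify(candidates, intent_probs, k=2, trade_off=0.0))  # relevance only
```

Each round rescans all remaining candidates, so the cost grows with the number of candidates times k. This per-query cost is the kind of efficiency concern the paper examines.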
Distributional semantics beyond words: Supervised learning of analogy and paraphrase
There have been several efforts to extend distributional semantics beyond
individual words, to measure the similarity of word pairs, phrases, and
sentences (briefly, tuples; ordered sets of words, contiguous or
noncontiguous). One way to extend beyond words is to compare two tuples using a
function that combines pairwise similarities between the component words in the
tuples. A strength of this approach is that it works with both relational
similarity (analogy) and compositional similarity (paraphrase). However, past
work required hand-coding the combination function for different tasks. The
main contribution of this paper is that combination functions are generated by
supervised learning. We achieve state-of-the-art results in measuring
relational similarity between word pairs (SAT analogies and SemEval 2012 Task
2) and measuring compositional similarity between noun-modifier phrases and
unigrams (multiple-choice paraphrase questions).
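The notion of comparing two tuples through pairwise similarities between their component words can be sketched as below. The toy word vectors, the function names and the hand-coded averaging rule are assumptions for illustration. In the paper the combination function is learned by supervised learning, so `pairwise_features` only shows the kind of input such a learner might receive, not the authors' actual feature set.

```python
import math
from itertools import product

# Hypothetical low-dimensional word vectors (invented values).
toy_vectors = {
    "mason": [1.0, 0.0, 0.2],
    "carpenter": [0.9, 0.1, 0.3],
    "stone": [0.0, 1.0, 0.1],
    "wood": [0.1, 0.9, 0.2],
    "cat": [0.1, 0.1, 1.0],
    "mouse": [0.2, 0.0, 0.9],
}


def word_sim(w1, w2):
    """Cosine similarity between the toy vectors of two words."""
    u, v = toy_vectors[w1], toy_vectors[w2]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


def pairwise_features(tuple_a, tuple_b):
    """All word-to-word similarities across two tuples, as a feature vector."""
    return [word_sim(x, y) for x, y in product(tuple_a, tuple_b)]


def hand_coded_score(tuple_a, tuple_b):
    """One possible hand-coded combination: mean similarity of aligned words."""
    sims = [word_sim(x, y) for x, y in zip(tuple_a, tuple_b)]
    return sum(sims) / len(sims)


stem = ("mason", "stone")
print(pairwise_features(stem, ("carpenter", "wood")))
print(hand_coded_score(stem, ("carpenter", "wood")))
print(hand_coded_score(stem, ("cat", "mouse")))
```

A supervised learner would take vectors like the output of `pairwise_features`, together with labelled examples, in place of the fixed averaging rule.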