70 research outputs found

    Crossing the academic ocean? Judit Bar-Ilan's oeuvre on search engines studies

    Full text link
    [EN] The main objective of this work is to analyse Judit Bar-Ilan's contributions to search engine studies. To do this, two complementary approaches have been carried out. First, a systematic literature review of 47 publications authored or co-authored by Judit and devoted to this topic. Second, an interdisciplinarity analysis, carried out through Scopus, of the cited references (publications cited by Judit) and citing documents (publications that cite Judit's work). The systematic literature review reveals a large number of search engines studied (43) and indicators measured (especially technical precision, overlap, and fluctuation over time). In addition, an evolution over the years is detected, from descriptive statistical studies towards empirical user studies mixing quantitative and qualitative methods. For its part, the interdisciplinarity analysis shows that a significant portion of Judit's oeuvre was intellectually founded on computer science, achieving a significant, but not exclusive, impact on library and information science.
    Orduña-Malea, E. (2020). Crossing the academic ocean? Judit Bar-Ilan's oeuvre on search engines studies. Scientometrics, 123(3), 1317-1340. https://doi.org/10.1007/s11192-020-03450-4
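    Two of the indicators named above, overlap between engines and technical precision, are simple set computations. The sketch below is illustrative only (not Bar-Ilan's code); the result identifiers and relevance judgments are invented for the example.

```python
def overlap(results_a, results_b):
    """Jaccard-style overlap between two engines' result sets."""
    a, b = set(results_a), set(results_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def technical_precision(results, relevant):
    """Fraction of retrieved results judged relevant."""
    return sum(1 for r in results if r in relevant) / len(results) if results else 0.0

# Hypothetical result lists from two engines for the same query.
engine_a = ["u1", "u2", "u3", "u4"]
engine_b = ["u3", "u4", "u5", "u6"]
relevant = {"u1", "u3", "u5"}

print(overlap(engine_a, engine_b))             # 2 shared of 6 distinct -> ~0.33
print(technical_precision(engine_a, relevant)) # 2 of 4 retrieved -> 0.5
```

    Fluctuation over time can then be measured by re-running the same query later and computing the overlap of an engine's result set with its own earlier one.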

    Online social networks: Measurement, analysis, and applications to distributed information systems

    Get PDF
    Recently, online social networking sites have exploded in popularity. Numerous sites are dedicated to finding and maintaining contacts and to locating and sharing different types of content. Online social networks represent a new kind of information network that differs significantly from existing networks like the Web. For example, in the Web, hyperlinks between content form a graph that is used to organize, navigate, and rank information. The properties of the Web graph have been studied extensively and have led to useful algorithms such as PageRank. In contrast, few links exist between content in online social networks; instead, links exist between content and users, and between users themselves. However, little is known in the research community about the properties of online social network graphs at scale, the factors that shape their structure, or the ways they can be leveraged in information systems. In this thesis, we use novel measurement techniques to study online social networks at scale, and use the resulting insights to design innovative new information systems. First, we examine the structure and growth patterns of online social networks, focusing on how users connect to one another. We conduct the first large-scale measurement study of multiple online social networks, capturing information about over 50 million users and 400 million links. Our analysis identifies a common structure across multiple networks, characterizes the underlying processes that shape the network structure, and exposes the rich community structure. Second, we leverage our understanding of the properties of online social networks to design new information systems. Specifically, we build two distinct applications that leverage different properties of online social networks. We present and evaluate Ostra, a novel system for preventing unwanted communication that leverages the difficulty of establishing and maintaining relationships in social networks. We also present, deploy, and evaluate PeerSpective, a system for enhancing Web search using the natural community structure in social networks. Each of these systems has been evaluated on data from real online social networks or in a deployment with real users.
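    The PageRank algorithm the abstract contrasts with social-network link structure can be sketched in a few lines of power iteration. The toy graph below is a made-up example, not data from the thesis.

```python
def pagerank(links, damping=0.85, iters=100):
    """links: dict node -> list of out-neighbours (a tiny Web graph)."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for v, outs in links.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for u in outs:  # each out-link passes an equal share of rank
                    new[u] += share
            else:  # dangling node: spread its rank uniformly
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # "c": it is linked from both "a" and "b"
```

    In a social graph, the edges would instead connect users to users, which is why content-ranking algorithms of this kind do not transfer directly.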

    Information extraction with network centralities : finding rumor sources, measuring influence, and learning community structure

    Get PDF
    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Includes bibliographical references (p. 193-197). Network centrality is a function that takes a network graph as input and assigns a score to each node. In this thesis, we investigate the potential of network centralities for addressing inference questions arising in the context of large-scale networked data. These questions are particularly challenging because they require algorithms which are extremely fast and simple so as to be scalable, while at the same time they must perform well. It is this tension between scalability and performance that this thesis aims to resolve by using appropriate network centralities. Specifically, we solve three important network inference problems using network centrality: finding rumor sources, measuring influence, and learning community structure. We develop a new network centrality called rumor centrality to find rumor sources in networks. We give a linear time algorithm for calculating rumor centrality, demonstrating its practicality for large networks. Rumor centrality is proven to be an exact maximum likelihood rumor source estimator for random regular graphs (under an appropriate probabilistic rumor spreading model). For a wide class of networks and rumor spreading models, we prove that it is an accurate estimator. To establish the universality of rumor centrality as a source estimator, we utilize techniques from the classical theory of generalized Pólya urns and branching processes. Next we use rumor centrality to measure influence in Twitter. We develop an influence score based on rumor centrality which can be calculated in linear time. To justify the use of rumor centrality as the influence score, we use it to develop a new network growth model called topological network growth.
    We find that this model accurately reproduces two important features observed empirically in Twitter retweet networks: a power-law degree distribution and a superstar node with very high degree. Using these results, we argue that rumor centrality correctly quantifies the influence of users on Twitter. These scores form the basis of a dynamic influence tracking engine called Trumor, which allows one to measure the influence of users in Twitter or, more generally, in any networked data. Finally, we investigate learning the community structure of a network. Using arguments based on social interactions, we determine that the network centrality known as degree centrality can be used to detect communities. We use this to develop the leader-follower algorithm (LFA), which can learn the overlapping community structure in networks. The LFA runtime is linear in the network size. It is also non-parametric, in the sense that it can learn both the number and size of communities naturally from the network structure without requiring any input parameters. We prove that it is very robust and learns accurate community structure for a broad class of networks. We find that the LFA does a better job of learning community structure on real social and biological networks than more common algorithms such as spectral clustering. By Tauhid R. Zaman, Ph.D.
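    On a tree, rumor centrality has a closed form: R(v) = n! divided by the product, over all nodes u, of the size of the subtree rooted at u when the tree is rooted at v. The sketch below computes this directly; it is an illustration of the published formula, not code from the thesis, and the star graph is a toy example.

```python
import math

def rumor_centrality(tree, v):
    """tree: dict node -> list of neighbours (an undirected tree); v: candidate source."""
    n = len(tree)
    sizes = []

    def subtree_size(node, parent):
        size = 1
        for nb in tree[node]:
            if nb != parent:
                size += subtree_size(nb, node)
        sizes.append(size)  # record the size of the subtree rooted at `node`
        return size

    subtree_size(v, None)  # root the tree at v and collect all subtree sizes
    return math.factorial(n) // math.prod(sizes)

# In a star, the centre is the likeliest rumor source.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(rumor_centrality(star, 0))  # 4! / (4*1*1*1) = 6
print(rumor_centrality(star, 1))  # 4! / (4*3*1*1) = 2
```

    The count is the number of rumor-spreading orders that start at v and stay connected, which is why the maximiser serves as a maximum likelihood source estimate on regular trees.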

    Recommender systems in industrial contexts

    Full text link
    This thesis consists of four parts: - An analysis of the core functions and prerequisites for recommender systems in an industrial context: we identify four core functions for recommender systems: Help to Decide, Help to Compare, Help to Explore, Help to Discover. The implementation of these functions has implications for the choices at the heart of algorithmic recommender systems. - A state of the art covering the main techniques used in automated recommender systems: the two most commonly used algorithmic methods, the K-Nearest-Neighbor (KNN) methods and the fast factorization methods, are detailed. The state of the art also presents purely content-based methods, hybridization techniques, and the classical performance metrics used to evaluate recommender systems. It then gives an overview of several systems, both from academia and industry (Amazon, Google ...). - An analysis of the performance and implications of a recommendation system developed during this thesis: this system, Reperio, is a hybrid recommender engine using KNN methods. We study the performance of the KNN methods, including the impact of the similarity functions used. We then study the performance of the KNN method in critical use cases, such as cold-start situations. - A methodology for analyzing the performance of recommender systems in an industrial context: this methodology assesses the added value of algorithmic strategies and recommender systems according to their core functions.
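    A minimal sketch of the item-based KNN approach described above (not Reperio itself): cosine similarity between item rating vectors, then a similarity-weighted average over the k most similar items the user has rated. The ratings matrix is invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two rating vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

# Rows: users, columns: items; 0 means "not rated".
ratings = [
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
]

def predict(user, item, k=2):
    """Predict ratings[user][item] from the k most similar items the user rated."""
    cols = list(zip(*ratings))  # item column vectors
    sims = [(cosine(cols[item], cols[j]), j)
            for j in range(len(cols))
            if j != item and ratings[user][j] > 0]
    sims.sort(reverse=True)
    top = sims[:k]
    num = sum(s * ratings[user][j] for s, j in top)
    den = sum(abs(s) for s, _ in top)
    return num / den if den else 0.0

print(round(predict(user=0, item=2), 2))  # low: user 0 dislikes similar items
```

    The cold-start problem mentioned above appears here directly: with no co-rated items, `sims` is empty and no meaningful prediction can be made, which is where hybrid content-based signals help.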

    Data fusion by using machine learning and computational intelligence techniques for medical image analysis and classification

    Get PDF
    Data fusion is the process of integrating information from multiple sources to produce specific, comprehensive, unified data about an entity. Data fusion is categorized as low level, feature level, and decision level. This research is focused on both investigating and developing feature- and decision-level data fusion for automated image analysis and classification. The common procedure for solving these problems can be described as: 1) process the image for region-of-interest detection, 2) extract features from the region of interest, and 3) create a learning model based on the feature data. Image processing techniques such as edge detection, histogram thresholding, and a color drop algorithm were used to determine the region of interest. The extracted features were low-level features, including textural, color, and symmetry features. For image analysis and classification, feature- and decision-level data fusion techniques are investigated for model learning, using and integrating computational intelligence and machine learning techniques. These techniques include artificial neural networks, evolutionary algorithms, particle swarm optimization, decision trees, clustering algorithms, fuzzy logic inference, and voting algorithms. This work presents both the investigation and development of data fusion techniques for the application areas of dermoscopy skin lesion discrimination, content-based image retrieval, and graphic image type classification --Abstract, page v
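    The two fusion levels investigated above can be sketched in a few lines: feature-level fusion concatenates per-source feature vectors before learning, while decision-level fusion combines classifier outputs, here by unweighted majority vote. The labels and classifier outputs are invented for illustration.

```python
from collections import Counter

def fuse_features(feature_vectors):
    """Feature-level fusion: concatenate the feature vectors from each source."""
    return [x for vec in feature_vectors for x in vec]

def fuse_decisions(predictions):
    """Decision-level fusion: majority vote over per-classifier labels."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical features from two sources (e.g. textural and color extractors).
combined = fuse_features([[0.3, 0.7], [0.1, 0.9, 0.2]])
print(combined)  # [0.3, 0.7, 0.1, 0.9, 0.2]

# Hypothetical votes from three classifiers (e.g. ANN, decision tree, fuzzy inference).
votes = ["lesion", "benign", "lesion"]
print(fuse_decisions(votes))  # "lesion" wins 2-1
```

    Weighted voting, where each classifier's vote is scaled by its validation accuracy, is a natural refinement of the same scheme.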

    Understanding Google: Search Engines and the Changing Nature of Access, Thought and Knowledge within a Global Context

    Get PDF
    This thesis explores the impact of search engines within contemporary digital culture and, in particular, focuses on the social, cultural, and philosophical influence of Google. Search engines are deeply enmeshed with other recent developments in digital culture; therefore, in addressing their impact these intersections must be recognised, while highlighting the technological and social specificity of search engines. Also important is acknowledging the way that certain institutions, in particular Google, have shaped the web and wider culture around a particular set of economic incentives that have far-reaching consequences for contemporary digital culture. This thesis argues that to understand search engines requires a recognition of its contemporary context, while also acknowledging that Google’s quest to “organize the world's information and make it universally accessible and useful” is part of a much older and broader discourse. Balancing these two viewpoints is important; Google is shaping public discourse on a global scale with unprecedentedly extensive consequences. However, many of the issues addressed by this thesis would remain centrally important even if Google declared bankruptcy or if search engines were abandoned for a different technology. Search engines are a specific technological response to a particular cultural environment; however, their social function and technical operation are embedded within a historical relationship to enquiry and inscription that stretches back to antiquity. This thesis addresses the following broad research questions, while at each stage specifically addressing the role and influence of search engines: how do individuals interrogate and navigate the world around them? How do technologies and social institutions facilitate how we think and remember? How culturally situated is knowledge; are there epistemological truths that transcend social environments? 
    How does technological expansion fit within wider questions of globalisation? How do technological discourses shape the global flows of information and capital? These five questions map directly onto the five chapters of this thesis. Much of the existing study of search engines has been focused on small-scale evaluation, which either addresses Google’s day-by-day algorithmic changes or poses relatively isolated disciplinary questions. Therefore, not only is the number of academics, technicians, and journalists attending to search engines relatively small, given the centrality of search engines to digital culture, but much of the knowledge that is produced becomes outdated with algorithmic changes or the shifting strategies of companies. This thesis ties these focused concerns to wider issues, with a view to encouraging and facilitating further enquiry.
    This thesis explores the impact of Google’s search engine within contemporary digital culture. Search engines have been studied in various disciplines, for example information retrieval, computer science, law, and new media, yet much of this work remains fixed within disciplinary boundaries. The approach of this thesis is to draw on work from a number of areas in order to link a technical understanding of how search engines function with a wider cultural and philosophical context. In particular, this thesis draws on critical theory in order to attend to the convergence of language, programming, and culture on a global scale. The chapter outline is as follows. Chapter one compares search engine queries to traditional questions. The chapter draws from information retrieval research to provide a technical framework that is brought into contact with philosophy and critical theory, including Plato and Hans-Georg Gadamer. Chapter two investigates search engines as memory aids, deploying a history of memory and exploring practices within oral cultures and mnemonic techniques such as the Ars Memoria. This places search engines within a longer historical context, while drawing on contemporary insights from the philosophy and science of cognition. Chapter three addresses Google’s Autocomplete functionality, and chapter four explores the contextual nature of results in order to highlight how different characteristics of users are used to personalise access to the web. These chapters address Google’s role within a global context and the implications for identity and community online. Finally, chapter five explores how Google’s method of generating revenue, through advertising, has a social impact on the web as a whole, particularly when considered through the lens of contemporary Post-Fordist accounts of capitalism. Throughout, this thesis develops a framework for attending to algorithmic cultures and outlines the specific influence that Google has had on the web and continues to have at a global scale. Arts and Humanities Research Council

    Structure-oriented prediction in complex networks

    Get PDF
    Complex systems are extremely hard to predict due to their highly nonlinear interactions and rich emergent properties. Thanks to the rapid development of network science, our understanding of the structure of real complex systems, and of the dynamics on them, has deepened remarkably, which in turn has stimulated the growth of effective prediction approaches on these systems. In this article, we aim to review different network-related prediction problems, summarize and classify relevant prediction methods, analyze their advantages and disadvantages, and point out the forefront as well as critical challenges of the field.

    A comparison of near-infrared and visible imaging for surveillance applications

    Get PDF
    A computer vision approach is investigated which has low computational complexity and which compares near-infrared and visible image systems. The target application is a surveillance system for pedestrian and vehicular traffic. Near-infrared light has potential benefits including non-visible illumination requirements. Image-processing and intelligent classification algorithms for monitoring pedestrians are implemented in outdoor and indoor environments with frequent traffic. The image set collected consists of persons walking in the presence of foreground as well as background objects at different times during the day. Image sets with nonperson objects, e.g. bicycles and vehicles, are also considered. The complex, cluttered environments are highly variable, e.g. shadows and moving foliage. The system performance for near-infrared images is compared to that of traditional visible images. The approach consists of thresholding an image and creating a silhouette of new objects in the scene. Filtering is used to eliminate noise. Twenty-four features are calculated by MATLAB® code for each identified object. These features are analyzed for usefulness in object discrimination. Minimal combinations of features are proposed and explored for effective automated discrimination. Features were used to train and test a variety of classification architectures. The results show that the algorithm can effectively manipulate near-infrared images and that effective object classification is possible even in the presence of system noise and environmental clutter. The potential for automated surveillance based on near-infrared imaging and automated feature processing are discussed --Abstract, page iii
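    The threshold-then-silhouette pipeline described above can be sketched with plain Python lists standing in for image arrays (the thesis used MATLAB; the frame values and the two shape features below are illustrative inventions, not the twenty-four features of the thesis).

```python
def threshold(image, t):
    """Binary silhouette mask: 1 where a pixel exceeds the threshold."""
    return [[1 if px > t else 0 for px in row] for row in image]

def area(mask):
    """Silhouette area in pixels -- one simple shape feature."""
    return sum(sum(row) for row in mask)

def bounding_box_aspect(mask):
    """Height/width of the silhouette's bounding box -- a crude person-vs-vehicle cue."""
    ys = [i for i, row in enumerate(mask) for v in row if v]
    xs = [j for row in mask for j, v in enumerate(row) if v]
    if not ys:
        return 0.0
    return (max(ys) - min(ys) + 1) / (max(xs) - min(xs) + 1)

# A tiny made-up frame: a tall bright blob on a dark background.
frame = [
    [0, 9, 0, 0],
    [0, 9, 0, 0],
    [0, 9, 9, 0],
]
mask = threshold(frame, t=5)
print(area(mask), bounding_box_aspect(mask))  # 4 pixels, taller than wide
```

    Feature vectors of this kind, computed per detected object, are what get fed to the classification architectures the abstract mentions.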

    Mining and modeling graphs using patterns and priors

    No full text