
    Hybrid focused crawling on the Surface and the Dark Web

    Focused crawlers enable the automatic discovery of Web resources about a given topic by navigating through the Web link structure and selecting the hyperlinks to follow based on their estimated relevance to the topic of interest. This work proposes a generic focused crawling framework for discovering resources on any given topic that reside on the Surface or the Dark Web. The proposed crawler is able to seamlessly navigate through the Surface Web and several darknets present in the Dark Web (i.e., Tor, I2P, and Freenet) during a single crawl by automatically adapting its crawling behavior and its classifier-guided hyperlink selection strategy based on the destination network type and the strength of the local evidence present in the vicinity of a hyperlink. It investigates 11 hyperlink selection methods, among which is a novel strategy based on the dynamic linear combination of a link-based and a parent Web page classifier. This hybrid focused crawler is demonstrated for the discovery of Web resources containing recipes for producing homemade explosives. The evaluation experiments indicate the effectiveness of the proposed focused crawler both for the Surface and the Dark Web.
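
    A minimal sketch (not from the paper) of the kind of classifier-guided link selection described above: a link-based score and a parent-page score are combined with a weight that shifts toward the parent-page classifier when the local evidence around a hyperlink is weak. The classifier interface, the anchor-text evidence heuristic, and the weighting rule are illustrative assumptions, not the authors' method.

```python
# Illustrative sketch of classifier-guided hyperlink selection for a focused
# crawler. The two "classifiers" are stand-ins: any object exposing
# predict_relevance(text) -> float in [0, 1] would do.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Link:
    priority: float                      # negated score: heapq is a min-heap
    url: str = field(compare=False)

def combined_score(link_clf, page_clf, anchor_text: str, parent_text: str) -> float:
    """Dynamically weighted linear combination of a link-based and a
    parent-page classifier: the weight on the link classifier grows with the
    amount of local evidence (here, crudely, anchor-text length)."""
    evidence = min(len(anchor_text.split()) / 10.0, 1.0)   # assumed heuristic
    link_score = link_clf.predict_relevance(anchor_text)
    page_score = page_clf.predict_relevance(parent_text)
    return evidence * link_score + (1.0 - evidence) * page_score

def enqueue_links(frontier: list, links, link_clf, page_clf, parent_text: str):
    """Push outgoing (url, anchor_text) pairs onto the crawl frontier,
    ordered by their estimated relevance to the crawl topic."""
    for url, anchor_text in links:
        score = combined_score(link_clf, page_clf, anchor_text, parent_text)
        heapq.heappush(frontier, Link(priority=-score, url=url))
```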

    The use of web analytics on an academic library website

    Ph.D. thesis, University of Missouri--Columbia, 2009. Thesis advisor: Dr. Sanda Erdelez. Academic libraries need to evaluate their electronic services in order to ensure their users' satisfaction. Libraries have traditionally used evidence-based evaluation, but these practices have their limitations: they are time consuming and rely on outdated literature. Human Information Behavior (HIB) studies offer a rich literature on users' behavior that could provide evidence of usage; nevertheless, HIB studies are centered on smaller samples and hence lack generalizability. Web analytics can address the gap in library service evaluation and HIB studies by providing quick access to aggregate information on real users' data collected unobtrusively. This study was conducted in an academic library setting and addressed two topics: the use of analytics for library decision-making and generalization in HIB studies. The Library's web usability group was interviewed and its Google Analytics data were reviewed; qualitative analyses were conducted on the data obtained from the interviews and from Google Analytics. There were concrete findings on the use of web analytics for Library decision-making that indicated its utility for enhancing the Library's online services and for improving navigation. However, there were noteworthy factors that could affect decision-making indirectly: the respondents' curiosity about users' behavior and the Library's management practices could influence decision-making along the way. Visitor trending data in Google Analytics further revealed important aspects of online users' behavior: graphs indicated irregular patterns in users' behavior over the course of a semester, and further instances illustrated differences in users' behavior based on their choice of sources. Visitors' technology preferences indicated factors that could influence users' information seeking. Finally, analytics can provide information on the Library's primary resources used. Includes bibliographical references.

    Model checking: Correct Web page navigations with browser behavior.

    While providing better performance, transparency, and expressiveness, the main features of web technologies, such as web caching, sessions and cookies, and dynamically generated web pages, may also affect the correct understanding of the web applications running on top of them. From the viewpoint of formal verification and specification-based testing, this suggests that the formal model of a web application used for static analysis or test case generation should contain the abstract behavior of the underlying web application environment. Here we consider the automated generation of such a model, in terms of extended finite state machines, from a given abstract description of a web application by incorporating the abstract behavioral model of web browsers in the presence of sessions/cookies and dynamically generated web pages. The derived model can serve as the formal basis for both model checking and specification-based testing of web applications, where we take into account the effect of the internal caching mechanism on the correct accessibility of web pages, which can be quite sensitive to the security of the information they carry. In order to check the correctness of the derived model against required properties, we provide an automated translation of the model into Promela. By applying SPIN to the Promela models, we present experimental results on the evaluation of the proposed modeling in terms of scalability. Thesis (M.Sc.), University of Windsor (Canada), 2004. Adviser: Jessica Chen. Dept. of Computer Science.
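
    A small illustrative sketch (not the thesis's tool) of the kind of extended-finite-state-machine abstraction described above: states carry the current page together with the browsing history, the browser cache, and the session status, and transitions cover both application links and browser actions such as Back, so a property like "a protected page is never redisplayed from cache after logout" can be checked by exhaustive exploration. The pages, transitions, and checked property are assumptions for the sketch; the actual models in the thesis are written in Promela and checked with SPIN.

```python
# Illustrative EFSM-style exploration of web navigation combined with browser
# behavior (Back button, page cache, login state).
from collections import deque

LINKS = {"login": ["account"], "account": ["logout"], "logout": ["login"]}

def successors(state):
    page, history, cache, logged_in = state
    # Application transitions: follow a link; the target page enters the cache,
    # and navigating login -> account / account -> logout toggles the session.
    if len(history) < 4:                      # bound history to keep the toy model finite
        for nxt in LINKS[page]:
            new_login = True if nxt == "account" else (False if nxt == "logout" else logged_in)
            yield (nxt, history + (page,), cache | frozenset([nxt]), new_login)
    # Browser transition: Back redisplays the previous page (possibly from cache).
    if history:
        yield (history[-1], history[:-1], cache, logged_in)

def violates(state):
    page, _, cache, logged_in = state
    # Assumed property: the "account" page must never be shown from cache after logout.
    return page == "account" and not logged_in and page in cache

def explore(initial=("login", (), frozenset(), False)):
    seen, queue = {initial}, deque([initial])
    while queue:
        state = queue.popleft()
        if violates(state):
            return state                      # counterexample: an unsafe navigation exists
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None                               # property holds in this toy model

print(explore())
```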

    Auditing Search Engines for Differential Satisfaction Across Demographics

    Many online services, such as search engines, social media platforms, and digital marketplaces, are advertised as being available to any user, regardless of their age, gender, or other demographic factors. However, there are growing concerns that these services may systematically underserve some groups of users. In this paper, we present a framework for internally auditing such services for differences in user satisfaction across demographic groups, using search engines as a case study. We first explain the pitfalls of naïvely comparing the behavioral metrics that are commonly used to evaluate search engines. We then propose three methods for measuring latent differences in user satisfaction from observed differences in evaluation metrics. To develop these methods, we drew on ideas from the causal inference literature and the multilevel modeling literature. Our framework is broadly applicable to other online services, and provides general insight into interpreting their evaluation metrics. Comment: 8 pages. Accepted at WWW 201
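
    A minimal sketch (not the paper's estimators) of one pitfall the abstract points to: a raw comparison of a behavioral metric across demographic groups is confounded by the groups issuing different queries, so a fairer comparison stratifies by query before averaging. The column names, the satisfaction proxy, and the weighting scheme are assumptions; the paper's actual methods draw on causal inference and multilevel modeling.

```python
# Naive vs. query-stratified comparison of a satisfaction proxy across two groups.
import pandas as pd

def naive_gap(df: pd.DataFrame, metric: str = "sat_proxy") -> float:
    """Raw difference in mean metric between groups A and B
    (confounded by differences in what each group searches for)."""
    means = df.groupby("group")[metric].mean()
    return float(means["A"] - means["B"])

def stratified_gap(df: pd.DataFrame, metric: str = "sat_proxy") -> float:
    """Average within-query difference between groups, weighted by query
    frequency; queries seen by only one group are dropped."""
    per_query = df.pivot_table(index="query", columns="group",
                               values=metric, aggfunc="mean").dropna()
    weights = df["query"].value_counts().reindex(per_query.index)
    diffs = per_query["A"] - per_query["B"]
    return float((diffs * weights).sum() / weights.sum())

# Toy usage: impressions with a binary satisfaction proxy per (query, group).
logs = pd.DataFrame({
    "query": ["jobs", "jobs", "weather", "weather", "weather"],
    "group": ["A", "B", "A", "A", "B"],
    "sat_proxy": [1, 0, 1, 1, 1],
})
print(naive_gap(logs), stratified_gap(logs))   # the two estimates disagree
```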

    Using Search Engine Technology to Improve Library Catalogs

    This chapter outlines how search engine technology can be used in online public access library catalogs (OPACs) to help improve users' experiences, how users' intentions can be identified, and how sophisticated ranking criteria can be applied to the online library catalog. A review of the literature and of current OPAC developments forms the basis of recommendations on how to improve OPACs. The findings were that the major shortcomings of current OPACs are that they are not sufficiently user-centered and that their results presentations lack sophistication; further, these shortcomings are not addressed in current 2.0 developments. It is argued that OPAC development should be made search-centered before additional features are applied. While the recommendations on ranking functionality and the use of user intentions are only conceptual and not yet applied to a library catalog, practitioners will find recommendations for developing better OPACs in this chapter. In short, readers will find a systematic view of how search engines' strengths can be applied to improving libraries' online catalogs.
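
    A minimal sketch (not from the chapter) of what search-engine-style ranking criteria beyond plain text matching could look like in an OPAC: a text-match score blended with non-textual signals such as circulation (popularity) and publication recency. The record fields, weights, and the crude term-overlap scorer are assumptions for illustration only.

```python
# Toy blended ranking of catalog records by text match, popularity, and freshness.
import math
from datetime import date

def text_score(query: str, title: str) -> float:
    """Crude term overlap between query and title (stand-in for a real text model)."""
    q, t = set(query.lower().split()), set(title.lower().split())
    return len(q & t) / len(q) if q else 0.0

def rank(query: str, records: list, w_text: float = 0.6,
         w_pop: float = 0.25, w_fresh: float = 0.15) -> list:
    """Order catalog records by a weighted blend of ranking criteria."""
    max_circ = max((r["circulation"] for r in records), default=1) or 1
    def score(r):
        popularity = math.log1p(r["circulation"]) / math.log1p(max_circ)
        freshness = max(0.0, 1.0 - (date.today().year - r["year"]) / 50.0)
        return w_text * text_score(query, r["title"]) + w_pop * popularity + w_fresh * freshness
    return sorted(records, key=score, reverse=True)

catalog = [
    {"title": "Introduction to Information Retrieval", "year": 2008, "circulation": 120},
    {"title": "Information Retrieval Systems", "year": 1979, "circulation": 5},
]
print([r["title"] for r in rank("information retrieval", catalog)])
```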

    Express: a web-based technology to support human and computational experimentation

    Experimental cognitive psychology has been greatly assisted by the development of general computer-based experiment presentation packages. Typically, however, such packages provide little support for running participants on different computers. It is left to the experimenter to ensure that group sizes are balanced between conditions and to merge data gathered on different computers once the experiment is complete. Equivalent issues arise in the evaluation of parameterized computational models, where it is frequently necessary to test a model's behavior over a range of parameter values (which amount to between-subjects factors) and where such testing can be sped up significantly by the use of multiple processors. This article describes Express, a Web-based technology for coordinating "clients" (human participants or computational models) and collating client data. The technology provides an experiment design editor, client coordination facilities (e.g., automated randomized assignment of clients to groups so that group sizes are balanced), general data collation and tabulation facilities, a range of basic statistical functions (which are constrained by the specified experimental design), and facilities to export data to standard statistical packages (such as SPSS). We report case studies demonstrating the utility of Express in both human and computational experiments. Express may be freely downloaded from the Express Web site (http://express.psyc.bbk.ac.uk/).
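
    A minimal sketch (not Express itself) of one facility the abstract mentions: randomized assignment of incoming clients to between-subjects groups while keeping group sizes balanced. The in-memory counter and class interface are assumptions; the real system coordinates clients over the Web and collates their data centrally.

```python
# Randomized but size-balanced assignment of clients to experimental groups.
import random
from collections import defaultdict

class BalancedAssigner:
    def __init__(self, groups):
        self.counts = defaultdict(int, {g: 0 for g in groups})

    def assign(self, client_id: str) -> str:
        """Pick uniformly at random among the groups with the fewest clients so far."""
        fewest = min(self.counts.values())
        candidates = [g for g, n in self.counts.items() if n == fewest]
        group = random.choice(candidates)
        self.counts[group] += 1
        return group

assigner = BalancedAssigner(["control", "condition_a", "condition_b"])
assignments = {f"client{i}": assigner.assign(f"client{i}") for i in range(9)}
print(assignments, dict(assigner.counts))   # group sizes end up 3/3/3
```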

    Reflecting on E-Recruiting Research Using Grounded Theory

    This paper presents a systematic review of the e-Recruiting literature through a grounded theory lens. The large number and increasing diversity of publications on e-Recruiting research, the most studied area within e-HRM (Electronic Human Resource Management), call for a synthesis of e-Recruiting research. We show interconnections between achievements, research gaps, and future research directions in order to advance both e-Recruiting research and practice. Moreover, we provide a definition of e-Recruiting. The use of grounded theory enabled us to reach across sub-disciplines, methods used, perspectives studied, themes discussed, and stakeholders involved. We demonstrate that the grounded theory approach led to a better understanding of the interconnections that lie buried in the disparate e-Recruiting literature.