
    A Brief History of Web Crawlers

    Web crawlers visit internet applications, collect data, and learn about new web pages from the pages they have visited. Web crawlers have a long and interesting history. Early web crawlers collected statistics about the web. In addition to collecting statistics about the web and indexing applications for search engines, modern crawlers can be used to perform accessibility and vulnerability checks on an application. The rapid expansion of the web and the complexity added to web applications have made crawling a very challenging process. Throughout the history of web crawling, many researchers and industrial groups have addressed the different issues and challenges that web crawlers face, and different solutions have been proposed to reduce the time and cost of crawling. Performing an exhaustive crawl remains a challenging problem, and capturing the model of a modern web application and extracting data from it automatically is another open question. What follows is a brief history of the different techniques and algorithms used from the early days of crawling up to the present. We introduce criteria to evaluate the relative performance of web crawlers and, based on these criteria, plot the evolution of web crawlers and compare their performance.
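
    The opening description is essentially the classic frontier-based crawl loop. The Python below is a minimal, illustrative sketch of that loop only, not any surveyed system's implementation: it fetches a page, extracts its links, and queues unseen URLs breadth-first. A real crawler would add politeness (robots.txt, rate limiting), URL canonicalization, and a persistent frontier.

```python
# Minimal frontier-based crawler sketch (illustrative only).
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects href targets from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=50):
    frontier, seen, fetched = deque([seed]), {seed}, 0
    while frontier and fetched < max_pages:
        url = frontier.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except OSError:
            continue  # skip unreachable or malformed pages
        fetched += 1
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)         # learn new pages from visited ones
                frontier.append(absolute)  # breadth-first expansion
    return seen
```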

    Identifying task-based sessions in search engine query logs


    Automated Analysis of Natural-Language Requirements Using Natural Language Processing

    Natural Language (NL) is arguably the most common vehicle for specifying requirements. This dissertation devises automated assistance for some important tasks that requirements engineers need to perform in order to structure, manage, and elaborate NL requirements in a sound and effective manner. The key enabling technology underlying the work in this dissertation is Natural Language Processing (NLP). All the solutions presented herein have been developed and empirically evaluated in close collaboration with industrial partners. The dissertation addresses four different facets of requirements analysis:
    • Checking conformance to templates. Requirements templates are an effective tool for improving the structure and quality of NL requirements statements. When templates are used for specifying requirements, an important quality assurance task is to ensure that the requirements conform to the intended templates. We develop an automated solution for checking the conformance of requirements to templates.
    • Extraction of glossary terms. Requirements glossaries (dictionaries) improve the understandability of requirements, and mitigate vagueness and ambiguity. We develop an automated solution for supporting requirements analysts in the selection of glossary terms and their related terms.
    • Extraction of domain models. By providing a precise representation of the main concepts in a software project and the relationships between these concepts, a domain model serves as an important artifact for systematic requirements elaboration. We propose an automated approach for domain model extraction from requirements. The extraction rules in our approach encompass both the rules already described in the literature as well as a number of important extensions developed in this dissertation.
    • Identifying the impact of requirements changes. Uncontrolled change in requirements presents a major risk to the success of software projects. We address two different dimensions of requirements change analysis in this dissertation: first, we develop an automated approach for predicting how a change to one requirement impacts other requirements; next, we consider the propagation of change from requirements to design, developing an automated approach for predicting how the design of a system is impacted by changes made to the requirements.
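
    Of the four facets above, glossary term extraction is commonly built on noun-phrase chunking. The sketch below shows only that generic baseline using spaCy's noun_chunks (assuming the standard en_core_web_sm model is installed); it is not the dissertation's actual solution, which additionally clusters related terms and applies domain-specific filtering.

```python
# Noun-phrase baseline for glossary term candidates (illustrative sketch).
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

def candidate_glossary_terms(requirements):
    """Count noun phrases across requirement statements; frequent
    noun phrases are candidate glossary entries."""
    counts = Counter()
    for req in requirements:
        for chunk in nlp(req).noun_chunks:
            # Drop leading determiners ("the system" -> "system").
            term = " ".join(t.text.lower() for t in chunk if t.pos_ != "DET")
            if term:
                counts[term] += 1
    return counts.most_common()

reqs = ["The system shall encrypt all stored user data.",
        "The system shall log every access to user data."]
print(candidate_glossary_terms(reqs))  # e.g. [('system', 2), ...]
```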

    Network analysis of the cellular circuits of memory

    Intuitively, memory is conceived as a collection of static images that we accumulate as we experience the world. But actually, memories are constantly changing throughout our lives, shaped by our ongoing experiences. Assimilating new knowledge without corrupting pre-existing memories is thus a critical brain function. However, learning and memory interact: prior knowledge can proactively influence learning, and new information can retroactively modify memories of past events. The hippocampus is a brain region essential for learning and memory, but the network-level operations that underlie the continuous integration of new experiences into memory, segregating them as discrete traces while enabling their interaction, are unknown. Here I show a network mechanism by which two distinct memories interact. Hippocampal CA1 neuron ensembles were monitored in mice as they explored a familiar environment before and after forming a new place-reward memory in a different environment. By employing a network science representation of the co-firing relationships among principal cells, I first found that new associative learning modifies the topology of the cells' co-firing patterns representing the unrelated familiar environment. I further observed that these neuronal co-firing graphs evolved along three functional axes: the first segregated novelty; the second distinguished individual novel behavioural experiences; while the third revealed cross-memory interaction. Finally, I found that during this process, high activity principal cells rapidly formed the core representation of each memory, whereas low activity principal cells gradually joined co-activation motifs throughout individual experiences, enabling cross-memory interactions. These findings reveal an organizational principle of brain networks where high and low activity cells are differentially recruited into coactivity motifs as building blocks for the flexible integration and interaction of memories. Lastly, I employ a set of manifold learning and related approaches to explore and characterise the complex neural population dynamics within CA1 that underlie simple exploration.
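
    The co-firing graphs described here can be illustrated with a toy construction: correlate binned spike counts between cell pairs and keep strongly correlated pairs as edges. The sketch below is a generic illustration on assumed toy data, not the thesis's actual analysis pipeline.

```python
# Toy co-firing graph from binned spike counts (illustrative sketch).
import numpy as np
import networkx as nx

def cofiring_graph(spikes, threshold=0.3):
    """spikes: (cells x time_bins) array of spike counts.
    Edges link cell pairs whose activity correlates above threshold."""
    corr = np.corrcoef(spikes)  # pairwise Pearson correlation
    n_cells = corr.shape[0]
    g = nx.Graph()
    g.add_nodes_from(range(n_cells))
    for i in range(n_cells):
        for j in range(i + 1, n_cells):
            if corr[i, j] > threshold:  # keep strongly co-firing pairs
                g.add_edge(i, j, weight=corr[i, j])
    return g

rng = np.random.default_rng(0)
toy_spikes = rng.poisson(2.0, size=(20, 1000))  # 20 cells, 1000 bins
print(cofiring_graph(toy_spikes).number_of_edges())
```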

    Ontology matching: state of the art and future challenges

    After years of research on ontology matching, it is reasonable to consider several questions: is the field of ontology matching still making progress? Is this progress significant enough to pursue further research? If so, what are the particularly promising directions? To answer these questions, we review the state of the art of ontology matching and analyze the results of recent ontology matching evaluations. These results show a measurable improvement in the field, albeit one whose pace is slowing down. We conjecture that significant improvements can be obtained only by addressing important challenges for ontology matching. We present such challenges with insights on how to approach them, thereby aiming to direct research into the most promising tracks and to facilitate the progress of the field.
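
    At the most basic level, the matchers evaluated in such campaigns build on element-level string comparison of concept labels. The sketch below shows only that baseline, using Python's standard difflib; production matchers combine it with structural, semantic, and instance-based evidence.

```python
# Element-level label matching baseline (illustrative sketch).
from difflib import SequenceMatcher

def match_labels(onto_a, onto_b, threshold=0.8):
    """Return (label_a, label_b, score) pairs whose lexical
    similarity meets the threshold."""
    alignment = []
    for a in onto_a:
        for b in onto_b:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                alignment.append((a, b, round(score, 2)))
    return alignment

# Toy concept labels from two hypothetical ontologies.
print(match_labels(["Person", "Document", "Review"],
                   ["person", "documents", "evaluation"]))
```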

    Using contextual information to understand searching and browsing behavior

    There is a great imbalance between the richness of information on the web and the succinctness and poverty of web users' search requests, which makes their queries only a partial description of the underlying complex information needs. Finding ways to better leverage contextual information and make search context-aware holds the promise of dramatically improving the search experience of users. We conducted a series of studies to discover, model, and utilize contextual information in order to understand and improve users' searching and browsing behavior on the web. Our results capture important aspects of context under the realistic conditions of different online search services, aiming to ensure that our scientific insights and solutions transfer to the operational settings of real-world applications.

    Inferring User Needs and Tasks from User Interactions

    The need for search often arises from a broad range of complex information needs or tasks (such as booking travel, buying a house, etc.) which lead to lengthy search processes characterised by distinct stages and goals. While existing search systems are adept at handling simple information needs, they offer limited support for tackling complex tasks. Accurate task representations could be useful in aptly placing users in the task-subtask space and could enable systems to contextually target the user, provide better query suggestions, personalisation, and recommendations, and help in gauging satisfaction. The major focus of this thesis is to work towards task-based information retrieval systems: search systems that are adept at understanding, identifying, and extracting tasks, as well as supporting users' complex search task missions. The thesis focuses on two major themes: (i) developing efficient algorithms for understanding and extracting search tasks from user logs, and (ii) leveraging the extracted task information to better serve the user via different applications. Based on log analysis of terabyte-scale data from a real-world search engine, a detailed analysis is provided of user interactions with search engines. On the task extraction side, two Bayesian non-parametric methods are proposed to extract subtasks from a complex task and to recursively extract hierarchies of tasks and subtasks. A novel coupled matrix-tensor factorization model is proposed that represents users based on their topical interests and task behaviours. Beyond personalisation, the thesis demonstrates that task information provides better context to learn from, and proposes a novel neural task context embedding architecture to learn query representations. Finally, the thesis examines implicit signals of user interactions and considers the problem of predicting user satisfaction when engaged in complex search tasks. A unified multi-view deep sequential model is proposed to make query- and task-level satisfaction predictions.
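
    For intuition, task extraction is often contrasted with a simple lexical baseline: group a user's queries into tasks by term overlap rather than by time gaps alone. The sketch below implements only that baseline (greedy Jaccard clustering); the Bayesian non-parametric models proposed in the thesis go well beyond it.

```python
# Lexical task-grouping baseline for query logs (illustrative sketch).
def jaccard(a, b):
    """Term-set overlap between two query strings."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b) if a | b else 0.0

def group_into_tasks(queries, threshold=0.2):
    """Greedy single-pass clustering: attach each query to the first
    task whose most recent query is lexically similar enough,
    otherwise start a new task."""
    tasks = []
    for q in queries:
        for task in tasks:
            if jaccard(q, task[-1]) >= threshold:
                task.append(q)
                break
        else:
            tasks.append([q])
    return tasks

log = ["cheap flights paris", "paris hotels",
       "cheap flights paris june", "python list comprehension"]
print(group_into_tasks(log))  # travel queries group; python query is alone
```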

    Query Log Mining to Enhance User Experience in Search Engines

    The Web is the biggest repository of documents humans have ever built, and it keeps growing in size every day. Users rely on Web search engines (WSEs) for finding information on the Web. By submitting a textual query expressing their information need, WSE users obtain a list of documents that are highly relevant to the query. Moreover, WSEs store this huge amount of user activity in "query logs". Query log mining is the set of techniques aiming at extracting valuable knowledge from query logs, and this knowledge represents one of the most used ways of enhancing users' search experience. According to this vision, in this thesis we first show that the knowledge extracted from query logs suffers from aging effects, and we propose a solution to this phenomenon. Second, we propose new algorithms for query recommendation that overcome the aging problem, and we study new query recommendation techniques for efficiently producing recommendations for rare queries. Finally, we study the problem of diversifying Web search engine results: we define a methodology, based on the knowledge derived from query logs, for detecting when and how query results need to be diversified, and we develop an efficient algorithm for diversifying search results.
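
    A classic example of the knowledge mined from query logs is a successor model for query recommendation: suggest the queries that most often followed the current one in past sessions. The sketch below shows only this co-occurrence baseline on toy data; the aging problem studied in the thesis could be addressed by, for example, decaying counts from old sessions.

```python
# Co-occurrence query recommendation baseline (illustrative sketch).
from collections import Counter, defaultdict

def build_successor_model(sessions):
    """sessions: list of query lists, each in submission order.
    Counts how often each query follows another within a session."""
    successors = defaultdict(Counter)
    for session in sessions:
        for q, nxt in zip(session, session[1:]):
            successors[q][nxt] += 1
    return successors

def recommend(model, query, k=3):
    """Top-k most frequent follow-up queries for `query`."""
    return [q for q, _ in model[query].most_common(k)]

logs = [["new york", "new york hotels", "jfk airport"],
        ["new york", "new york weather"],
        ["new york", "new york hotels"]]
model = build_successor_model(logs)
print(recommend(model, "new york"))  # ['new york hotels', 'new york weather']
```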