1,420 research outputs found

    Information retrieval in the workplace: A comparison of professional search practices

    Legal researchers, recruitment professionals, healthcare information professionals, and patent analysts all undertake work tasks where search forms a core part of their duties. In these instances, the search task is often complex and time-consuming and requires specialist expertise to identify relevant documents and insights within large domain-specific repositories and collections. Several studies have investigated the search practices of professionals such as these, but few have attempted to directly compare their professional practices, so it remains unclear to what extent insights and approaches from one domain can be applied to another. In this paper we describe the results of a survey of a purposive sample of 108 legal researchers, 64 recruitment professionals and 107 healthcare information professionals. Their responses are compared with results from a previous survey of 81 patent analysts. The survey investigated their search practices and preferences, the types of functionality they value, and their requirements for future information retrieval systems. The results reveal that these professions share many fundamental needs and face similar challenges, in particular a continuing preference to formulate queries as Boolean expressions, the need to manage, organise and re-use search strategies and results, and an ambivalence toward the use of relevance ranking. The results stress the importance of recall and coverage for the healthcare and patent professionals, while precision and recency were more important to the legal and recruitment professionals. The results also highlight the need to ensure that search systems give confidence to the professional searcher, so trust, explainability and accountability remain a significant challenge when developing such systems. The findings suggest that translational research between the different areas could benefit professionals across domains.

    Query refinement for patent prior art search

    A patent is a contract between the inventor and the state, granting a limited time period to the inventor to exploit his invention. In exchange, the inventor must put a detailed description of his invention in the public domain. Patents can encourage innovation and economic growth, but at a time of economic crisis patents can hamper such growth. The long duration of the application process is a big obstacle that needs to be addressed to maximize the benefit of patents on innovation and the economy. This time can be significantly improved by changing the way we search the patent and non-patent literature. Despite the recent advancement of general information retrieval and the revolution of Web search engines, there is still a huge gap between the technologies emerging from research labs and adopted by major Internet search engines, and the systems which are in use by the patent search communities. In this thesis we investigate the problem of patent prior art search in patent retrieval with the goal of finding documents which describe the idea of a query patent. A query patent is a full patent application composed of hundreds of terms which does not represent a single focused information need. Other relevance evidence (e.g. classification tags and bibliographical data) provides additional details about the underlying information need of the query patent. The first goal of this thesis is to estimate a unigram query model from the textual fields of a query patent. We then improve the initial query representation using noun phrases extracted from the query patent. We show that expansion in a query-dependent manner is useful. The second contribution of this thesis is to address the term mismatch problem from a query formulation point of view by integrating multiple sources of relevance evidence associated with the query patent. To do this, we enhance the initial representation of the query with the term distribution of the community of inventors related to the topic of the query patent. We then build a lexicon using classification tags and show that query expansion using this lexicon and considering proximity information (between query and expansion terms) can improve the retrieval performance. We perform an empirical evaluation of our proposed models on two patent datasets. The experimental results show that our proposed models can achieve significantly better results than the baseline and other enhanced models.

    Expert search strategies: the information retrieval practices of healthcare information professionals

    Background: Healthcare information professionals play a key role in closing the knowledge gap between medical research and clinical practice. Their work involves meticulous searching of literature databases using complex search strategies that can consist of hundreds of keywords, operators, and ontology terms. This process is prone to error and can lead to inefficiency and bias if performed incorrectly. Objective: The aim of this study was to investigate the search behavior of healthcare information professionals, uncovering their needs, goals, and requirements for information retrieval systems. Methods: A survey was distributed to healthcare information professionals via professional association email discussion lists. It investigated the search tasks they undertake, their techniques for search strategy formulation, their approaches to evaluating search results, and their preferred functionality for searching library-style databases. The popular literature search system PubMed was then evaluated to determine the extent to which their needs were met. Results: The 107 respondents indicated that their information retrieval process relied on the use of complex, repeatable, and transparent search strategies. On average it took 60 minutes to formulate a search strategy, with a search task taking 4 hours and consisting of 15 strategy lines. Respondents reviewed a median of 175 results per search task, far more than they would ideally like (100). The most desired features of a search system were merging search queries and combining search results. Conclusions: Healthcare information professionals routinely address some of the most challenging information retrieval problems of any profession. However, their needs are not fully supported by current literature search systems and there is demand for improved functionality, in particular regarding the development and management of search strategies.

    The Search as Learning Spaceship: Toward a Comprehensive Model of Psychological and Technological Facets of Search as Learning

    Using a Web search engine is one of today’s most frequent activities. Exploratory search activities which are carried out in order to gain knowledge are conceptualized and denoted as Search as Learning (SAL). In this paper, we introduce a novel framework model which incorporates the perspectives of both psychology and computer science to describe the search as learning process by reviewing recent literature. The main entities of the model are the learner who is surrounded by a specific learning context, the interface that mediates between the learner and the information environment, the information retrieval (IR) backend which manages the processes between the interface and the set of Web resources, that is, the collective Web knowledge represented in resources of different modalities. First, we provide an overview of the current state of the art with regard to the five main entities of our model, before we outline areas of future research to improve our understanding of search as learning processes. Copyright © 2022 von Hoyer, Hoppe, Kammerer, Otto, Pardi, Rokicki, Yu, Dietze, Ewerth and Holtz.

    Data-Driven Design-by-Analogy: State of the Art and Future Directions

    Design-by-Analogy (DbA) is a design methodology wherein new solutions, opportunities or designs are generated in a target domain based on inspiration drawn from a source domain; it can benefit designers in mitigating design fixation and improving design ideation outcomes. Recently, the increasingly available design databases and rapidly advancing data science and artificial intelligence technologies have presented new opportunities for developing data-driven methods and tools for DbA support. In this study, we survey existing data-driven DbA studies and categorize individual studies according to the data, methods, and applications in four categories, namely, analogy encoding, retrieval, mapping, and evaluation. Based on both nuanced organic review and structured analysis, this paper elucidates the state of the art of data-driven DbA research to date and benchmarks it with the frontier of data science and AI research to identify promising research opportunities and directions for the field. Finally, we propose a future conceptual data-driven DbA system that integrates all propositions. Comment: A Preprint Version

    On Term Selection Techniques for Patent Prior Art Search

    A patent is a set of exclusive rights granted to an inventor to protect his invention for a limited period of time. Patent prior art search involves finding previously granted patents, scientific articles, product descriptions, or any other published work that may be relevant to a new patent application. Many well-known information retrieval (IR) techniques (e.g., typical query expansion methods), which are proven effective for ad hoc search, are unsuccessful for patent prior art search. In this thesis, we mainly investigate the reasons that generic IR techniques are not effective for prior art search on the CLEF-IP test collection. First, we analyse the errors caused due to data curation and experimental settings like applying International Patent Classification codes assigned to the patent topics to filter the search results. Then, we investigate the influence of term selection on retrieval performance on the CLEF-IP prior art test collection, starting with the description section of the reference patent and using language models (LM) and BM25 scoring functions. We find that an oracular relevance feedback system, which extracts terms from the judged relevant documents far outperforms the baseline (i.e., 0.11 vs. 0.48) and performs twice as well on mean average precision (MAP) as the best participant in CLEF-IP 2010 (i.e., 0.22 vs. 0.48). We find a very clear term selection value threshold for use when choosing terms. We also notice that most of the useful feedback terms are actually present in the original query and hypothesise that the baseline system can be substantially improved by removing negative query terms. We try four simple automated approaches to identify negative terms for query reduction but we are unable to improve on the baseline performance with any of them. 
However, we show that a simple, minimal-feedback interactive approach, where terms are selected from only the first retrieved relevant document, outperforms the best result from CLEF-IP 2010, suggesting the promise of interactive methods for term selection in patent prior art search.
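The abstract above reports its comparisons in mean average precision (MAP). As a reading aid, the following is a minimal sketch of how MAP is conventionally computed from ranked result lists and relevance judgments. It is not the thesis's evaluation code; the function names and the toy document identifiers are assumptions made for illustration.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant doc appears.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of per-query average precision over all queries in `runs`."""
    if not runs:
        return 0.0
    total = sum(
        average_precision(ranking, qrels.get(query_id, set()))
        for query_id, ranking in runs.items()
    )
    return total / len(runs)


# Hypothetical toy data: two queries with hand-made judgments.
toy_runs = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5", "d6"]}
toy_qrels = {"q1": {"d1", "d3"}, "q2": {"d6"}}
toy_map = mean_average_precision(toy_runs, toy_qrels)
```

For the toy data, query q1 scores (1/1 + 2/3) / 2 and query q2 scores 1/2, so `toy_map` is their mean, roughly 0.667.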

    NLP Driven Models for Automatically Generating Survey Articles for Scientific Topics.

    This thesis presents new methods that use natural language processing (NLP) driven models for summarizing research in scientific fields. Given a topic query in the form of a text string, we present methods for finding research articles relevant to the topic as well as summarization algorithms that use lexical and discourse information present in the text of these articles to generate coherent and readable extractive summaries of past research on the topic. In addition to summarizing prior research, good survey articles should also forecast future trends. With this motivation, we present work on forecasting future impact of scientific publications using NLP driven features. PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113407/1/rahuljha_1.pd

    Overexpression and Characterization of a Laccase from Geobacillus Thermoglucosidasius

    The use of enzymes as industrial oxidants has become popular due to their high substrate specificity and mild reaction conditions. Specifically, laccases are multi-copper oxidases that can oxidize a disparate range of organic substrates using oxygen and producing water as a byproduct without requirement for additional reactive compounds. Currently, all laccases used in industrial processes are fungal in origin. Although fungal laccases have high activities under near-ambient conditions, their use is limited at higher temperatures. Also, expression of fungal laccases in heterologous hosts is limited due to incorrect glycosylation. Bacterial laccases are much easier to express heterologously and are more active and stable at high temperatures, pH and salt concentrations. Geobacillus is a genus of gram-positive thermophilic bacteria, many of which have been found to naturally secrete proteins at high levels. A novel laccase has been predicted to be present in multiple Geobacillus strains using comparative genomics. This laccase is approximately half the size of those found in other gram-positives or fungi, making it a better candidate for lignocellulosic biomass degradation because of easier access to the substrate. In this work, we seek to isolate and characterize this laccase, and determine the types of substrates it can oxidize. We then want to compare the activity of our laccase with that of a fungal laccase at different temperatures. A plasmid was successfully constructed for the overexpression of laccase in Geobacillus thermoglucosidasius 95A1 and Escherichia coli DH5α. The novel laccase was isolated and purified from E. coli. The laccase was characterized by determining the activity for 5 substrates at a range of pHs and temperatures. Finally, the thermal stability of our laccase was compared with that from a fungal source, Trametes versicolor. Laccase from G. thermoglucosidasius demonstrated a 20-fold higher initial activity than Trametes laccase at 80°C, and was superior to the latter in terms of thermal stability and activity at high temperatures.

    Mapping Nanomedicine Terminology in the Regulatory Landscape

    A common terminology is essential in any field of science and technology for a mutual understanding among different communities of experts and regulators, harmonisation of policy actions, standardisation of quality procedures and experimental testing, and the communication to the general public. It also allows effective revision of information for policy making and optimises research fund allocation. In particular, in emerging scientific fields with a high innovation potential, new terms, descriptions and definitions are quickly generated, which are then ambiguously used by stakeholders having diverse interests, coming from different scientific disciplines and/or from various regions. The application of nanotechnology in health, often called nanomedicine, is considered one such emerging and multidisciplinary field, with growing interest from various communities. In order to support a better understanding of terms used in the regulatory domain, the Nanomedicines Working Group of the International Pharmaceutical Regulators Forum (IPRF) has prioritised the need to map, compile and discuss the currently used terminology of regulatory scientists coming from different geographic areas. The JRC has taken the lead to identify and compile frequently used terms in the field by using web crawling and text mining tools as well as the manual extraction of terms. Websites of 13 regulatory authorities and clinical trial registries globally involved in regulating nanomedicines have been crawled. The compilation and analysis of extracted terms demonstrated sectorial and geographical differences in the frequency and type of nanomedicine related terms used in a regulatory context. Finally, 31 relevant and most frequently used terms deriving from various agencies have been compiled, discussed and analysed for their similarities and differences. These descriptions will support the development of harmonised use of terminology in the future. The report provides necessary background information to advance the discussion among stakeholders. It will strengthen activities aiming to develop harmonised standards in the field of nanomedicine, which is an essential factor to stimulate innovation and industrial competitiveness. JRC.F.2-Consumer Products Safety