5 research outputs found

    Question Answering with distilled BERT models: A case study for Biomedical Data

    Get PDF
In the healthcare industry today, 80% of data is unstructured (Razzak et al., 2019). This poses a challenge for healthcare providers, who rely on unstructured data to inform their decision-making. Although Electronic Health Records (EHRs) exist to integrate patient data, healthcare providers still struggle to search for information and answers contained within unstructured data. Prior NLP and deep learning research has shown that these methods can improve information extraction from unstructured medical documents. This research expands upon those studies by developing a Question Answering system using distilled BERT models. Healthcare providers can use this system on their local computers to search for and receive answers to specific questions about patients. This paper's best TinyBERT and TinyBioBERT models achieved Mean Reciprocal Ranks (MRRs) of 0.522 and 0.284, respectively. Based on these findings, this paper concludes that TinyBERT performed better than TinyBioBERT on BioASQ Task 9b data.
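
    For reference, the MRR metric reported above has a standard definition: the average, over questions, of the reciprocal rank of the first correct answer. A minimal sketch follows; the data structures are hypothetical illustrations, not the paper's own code.

```python
# Minimal sketch of Mean Reciprocal Rank (MRR), the metric reported above.
# `ranked_answers` is a hypothetical structure: for each question, a list of
# candidate answers ordered by model score; `gold_answers` holds the sets of
# correct answers per question.

def mean_reciprocal_rank(ranked_answers, gold_answers):
    """Average of 1/rank of the first correct answer per question (0 if none)."""
    total = 0.0
    for candidates, gold in zip(ranked_answers, gold_answers):
        for rank, answer in enumerate(candidates, start=1):
            if answer in gold:
                total += 1.0 / rank
                break
    return total / len(ranked_answers)

# Example: first correct answer at rank 2 and rank 1 -> (0.5 + 1.0) / 2 = 0.75
print(mean_reciprocal_rank([["a", "b"], ["c", "d"]], [{"b"}, {"c"}]))
```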

    RD-Suite: A Benchmark for Ranking Distillation

    Full text link
The distillation of ranking models has become an important topic in both academia and industry. In recent years, several advanced methods have been proposed to tackle this problem, often leveraging ranking information from teacher rankers that is absent in traditional classification settings. To date, there is no well-established consensus on how to evaluate this class of models. Moreover, inconsistent benchmarking across a wide range of tasks and datasets makes it difficult to assess or invigorate advances in this field. This paper first examines representative prior art on ranking distillation and raises three questions to be answered around methodology and reproducibility. To that end, we propose a systematic and unified benchmark, Ranking Distillation Suite (RD-Suite), a suite of tasks with 4 large real-world datasets encompassing two major modalities (textual and numeric) and two applications (standard distillation and distillation transfer). RD-Suite consists of benchmark results that challenge some of the common wisdom in the field, and the release of datasets with teacher scores and evaluation scripts for future research. RD-Suite paves the way towards a better understanding of ranking distillation, facilitates more research in this direction, and presents new challenges.
    Comment: 15 pages, 2 figures. arXiv admin note: text overlap with arXiv:2011.04006 by other authors
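
    As background, one common formulation of ranking distillation (not necessarily the exact objectives benchmarked in RD-Suite) has the student match a softmax distribution over the teacher's per-query candidate scores. A hedged PyTorch sketch:

```python
# Sketch of a listwise ranking-distillation objective: KL divergence between
# softmax distributions over teacher and student scores for each query's
# candidate list. Illustrative only; RD-Suite's exact losses may differ.
import torch
import torch.nn.functional as F

def listwise_distillation_loss(student_scores, teacher_scores, temperature=1.0):
    """student_scores, teacher_scores: [batch, list_size] raw ranking scores."""
    teacher_probs = F.softmax(teacher_scores / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_scores / temperature, dim=-1)
    # KL(teacher || student), averaged over queries in the batch.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

# Toy usage: 2 queries, 4 candidate documents each.
teacher = torch.randn(2, 4)
student = torch.randn(2, 4, requires_grad=True)
loss = listwise_distillation_loss(student, teacher)
loss.backward()
```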

    On Elastic Language Models

    Full text link
Large-scale pretrained language models have achieved compelling performance in a wide range of language understanding and information retrieval tasks. Knowledge distillation offers an opportunity to compress a large language model into a small one in order to reach a reasonable latency-performance tradeoff. However, for scenarios where the number of requests (e.g., queries submitted to a search engine) is highly variable, the static tradeoff attained by the compressed language model might not always fit. Once a model is assigned a static tradeoff, it can be inadequate: the latency is too high when the number of requests is large, or the performance is too low when the number of requests is small. To this end, we propose an elastic language model (ElasticLM) that elastically adjusts the tradeoff according to the request stream. The basic idea is to introduce compute elasticity to the compressed language model, so that the tradeoff can vary on the fly along scalable and controllable compute. Specifically, we impose an elastic structure to equip ElasticLM with compute elasticity and design an elastic optimization to learn ElasticLM under compute elasticity. To serve ElasticLM, we apply an elastic schedule. Considering the specificity of information retrieval, we adapt ElasticLM to dense retrieval and reranking, presenting ElasticDenser and ElasticRanker respectively. Offline evaluation is conducted on the language understanding benchmark GLUE and on several information retrieval tasks, including Natural Questions, TriviaQA, and MS MARCO. The results show that ElasticLM, along with ElasticDenser and ElasticRanker, performs correctly and competitively compared with an array of static baselines. Furthermore, an online simulation with concurrency is also carried out. The results demonstrate that ElasticLM can provide elastic tradeoffs with respect to varying request streams.
    Comment: 27 pages, 11 figures, 9 tables
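
    The abstract does not spell out the serving mechanics, but one plausible illustration of an elastic schedule is a policy that selects a shallower (faster) sub-model as the pending request queue grows. All names and thresholds below are hypothetical, not taken from the paper.

```python
# Illustrative sketch only: a serving-side policy that trades quality for
# latency under load by choosing a sub-model depth from the current request
# queue length. Every identifier and threshold here is hypothetical.

def pick_depth(queue_length, depths=(4, 8, 12), thresholds=(32, 8)):
    """Use shallower (faster) sub-models as the pending request queue grows."""
    if queue_length >= thresholds[0]:
        return depths[0]   # heavy load: fastest, lowest-quality sub-model
    if queue_length >= thresholds[1]:
        return depths[1]   # moderate load: middle tradeoff
    return depths[2]       # light load: full depth, best quality

for load in (50, 10, 2):
    print(load, "pending requests ->", pick_depth(load), "layers")
```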

    Handling Change in a Production TaskBot: Efficiently Managing the Growth of TWIZ, an Alexa Assistant

    Get PDF
    A Conversational Agent aims to converse with users, with a focus on natural behaviour and responses. Such agents can be extremely complex: they are composed of several parts, with several courses of action and infinite possible inputs. Behaviour checking is therefore essential, especially in a production context, where incorrect behaviour can have serious consequences. Nevertheless, developing a robust and correctly behaving TaskBot should not hinder research, and must allow for continuous improvement of state-of-the-art solutions. Manual testing of such a complex system is bound to run into limits, either in the extent of the testing or in the time it consumes from developers. We therefore propose a tool that automatically tests these highly sophisticated systems with a much broader test surface. We introduce a solution that leverages replay and mimicking of past conversations to generate synthetic ones, allowing for time savings in quality assurance and better change handling. A key part of a Conversational Agent is the retrieval component, which is responsible for retrieving information that is useful to the user. In task-guiding assistants, the retrieval element should not narrow the user's behaviour by omitting tasks that could be relevant. However, achieving perfect information matching to a user's query is arduous, since there is a plethora of ways a user might phrase a request to accomplish a given objective. To tackle this, we use a semantic retrieval algorithm, adapting it to this domain by generating a synthetic dataset.
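
    The abstract leaves the retrieval details open; a minimal illustration of the semantic-retrieval idea is to embed the user query and each task title as vectors and rank tasks by cosine similarity. The toy hashing encoder below is only a runnable stand-in for a trained sentence encoder.

```python
# Hedged sketch of semantic retrieval for task-guiding assistants: embed the
# query and each candidate task, then rank by cosine similarity. The hashing
# bag-of-words encoder is a stand-in; a real system would use a trained model.
import math
from collections import Counter

def embed(text, dim=64):
    """Toy bag-of-words embedding via feature hashing (stand-in encoder)."""
    vec = [0.0] * dim
    for token, count in Counter(text.lower().split()).items():
        vec[hash(token) % dim] += count
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

tasks = ["bake a chocolate cake", "fix a leaking tap", "plant tomato seedlings"]
query = "how do I make a cake"
q = embed(query)
ranked = sorted(tasks, key=lambda t: cosine(q, embed(t)), reverse=True)
print(ranked[0])  # expected: the baking task ranks first
```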

    Pretrained Transformers for Text Ranking: BERT and Beyond

    Get PDF
    The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This survey provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing (NLP), information retrieval (IR), and beyond. In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems, and for researchers who wish to pursue work in this area. We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage architectures, and dense retrieval techniques that perform ranking directly. Two themes pervade our survey: techniques for handling long documents, beyond typical sentence-by-sentence processing in NLP, and techniques for addressing the tradeoff between effectiveness (i.e., result quality) and efficiency (e.g., query latency, model and index size). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, many open research questions remain, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this survey also attempts to prognosticate where the field is heading.
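
    To make the multi-stage reranking pattern concrete, here is a minimal sketch using a publicly available cross-encoder checkpoint. It assumes the Hugging Face transformers library and network access to download the model; it is illustrative, not code from the survey.

```python
# Minimal sketch of "reranking in a multi-stage architecture": a first-stage
# retriever supplies candidates, and a pretrained cross-encoder scores each
# (query, passage) pair jointly. Assumes `transformers` and `torch` installed.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

query = "what causes rain"
candidates = [  # stand-ins for first-stage (e.g., BM25) retrieval results
    "Rain forms when water vapour condenses into droplets that fall.",
    "The capital of France is Paris.",
]
inputs = tokenizer([query] * len(candidates), candidates,
                   padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    scores = model(**inputs).logits.squeeze(-1)  # one relevance score per pair
reranked = [c for _, c in sorted(zip(scores.tolist(), candidates), reverse=True)]
print(reranked[0])  # the on-topic passage should rank first
```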