
    Distributional models in the task of hypernym discovery

    We describe an approach to the first task of automatic taxonomy construction for the Russian language. This task consists of matching unknown input words with hypernyms from an existing taxonomy. We show that useful results can be attained using pre-trained distributional models without additional training. © Springer Nature Switzerland AG 2020

    Anomaly detection for short texts: Identifying whether your chatbot should switch from goal-oriented conversation to chit-chatting

    Goal-oriented conversational agents are systems able to converse with humans in natural language to help them reach a certain goal. The number of goals (or domains) about which an agent can converse is limited, and one of the issues is identifying whether a user is talking about an unknown domain (in order to report a misunderstanding or switch to chit-chat mode). We argue that this issue can be resolved by treating it as an anomaly detection task, a problem from the field of machine learning. The scientific community has developed a broad range of methods for this task, but their applicability to short text data has not been investigated before. The aim of this work is to compare the performance of 6 different anomaly detection methods on Russian and English short texts modeling conversational utterances, proposing the first evaluation framework for this task. As a result of the study, we find that a simple threshold on cosine similarity works better than the other methods for both of the considered languages. © Springer Nature Switzerland AG 2018

    Fast and Accurate Patent Classification in Search Engines

    This article presents a new approach to large-scale patent classification. The need to classify documents often arises in professional information retrieval systems. In this paper we describe our approach, based on linguistically supported k-nearest neighbors (KNN). We experimentally evaluate it on Russian and English datasets and compare it with the modern classification technique fastText. We show that KNN is a viable alternative to traditional text classifiers, achieving comparable accuracy while using fewer additional hardware resources. © Published under licence by IOP Publishing Ltd