8 research outputs found

    Mechanisms of Adaptation of Precedent Statements in Chinese Internet Communication

    Modern texts contain many precedent statements, and the Internet space is no exception, Chinese Internet communication in particular. However, not all Internet users can detect precedent statements in texts, assess the features of their use, and identify their sources. This article attempts to study precedent statements as a source of modern Chinese Internet phraseology, to analyse the formal and semantic ways of their adaptation, and to examine phraseosyntactic schemes (phraseoschemes) based on precedent statements. Utterances of well-known figures, famous TV hosts, and politicians, as well as posts or comments of Internet users, may become precedent statements in Chinese Internet communication. Statements that acquire new meanings without changing their form and begin to function as precedent statements can be considered syntactic phraseological units, while statements whose lexical components are replaced can be considered phraseoschemes underlying some sentences. Quite frequently, the use of syntactic phraseological units and phraseoschemes based on precedent statements reflects a desire to add a humorous tone to an utterance; however, when an utterance touches upon, for example, a political aspect, the use of precedent statements is also a method of bypassing China's rather rigid Internet censorship. The article examines the mechanisms by which precedent statements adapt to secondary usage. According to the author, such mechanisms include structural transformations of the original statement, which provide, among other things, its semantic and syntactic variability, as well as secondary semantisation of the statement. The analysis allows the author to demonstrate the diversity of types of precedent statements used in modern Chinese Internet communication and to formulate questions that require further study.

    Strict baselines for Covid-19 forecasting and ML perspective for USA and Russia

    Currently, the evolution of Covid-19 allows researchers to gather the datasets accumulated over two years and to use them in predictive analysis. In turn, this makes it possible to assess the efficiency potential of more complex predictive models, including neural networks with different forecast horizons. In this paper, we present the results of a consistent comparative study of different types of methods for predicting the dynamics of the spread of Covid-19 based on regional data for two countries: the United States and Russia. We used well-known statistical methods (e.g., exponential smoothing), a "tomorrow-as-today" approach, and a set of classic machine learning models trained on data from individual regions. Along with them, we considered a neural network model based on Long Short-Term Memory (LSTM) layers, whose training samples aggregate data from all regions of both countries. Efficiency was evaluated using cross-validation according to the Mean Absolute Percentage Error (MAPE) metric. It is shown that for complicated periods characterized by a large increase in the number of confirmed daily cases, the best results are achieved by the LSTM model trained on all regions of both countries, with an average MAPE of 18%, 30%, and 37% for Russia and 31%, 41%, and 50% for the US at forecast horizons of 14, 28, and 42 days, respectively.
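The two ingredients named above, the naive "tomorrow-as-today" baseline and the MAPE metric, can be sketched as follows; the case counts here are made up for illustration and do not come from the paper's data.

```python
# Minimal sketch of the "tomorrow-as-today" baseline and the MAPE metric.
# The daily-case numbers below are illustrative, not the paper's data.

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    assert len(actual) == len(predicted) and len(actual) > 0
    return 100.0 * sum(
        abs(a - p) / abs(a) for a, p in zip(actual, predicted)
    ) / len(actual)

def tomorrow_as_today(series, horizon):
    """Naive forecast: repeat the last observed value over the horizon."""
    return [series[-1]] * horizon

daily_cases = [120, 135, 150, 160, 158, 170, 180]
forecast = tomorrow_as_today(daily_cases[:5], horizon=2)  # [158, 158]
error = mape(daily_cases[5:], forecast)
```

Any model that cannot beat this naive baseline under the same cross-validation split adds no predictive value, which is why the paper treats such baselines as "strict".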

    Effective calculations on neuromorphic hardware based on spiking neural network approaches

    The availability of neural networks designed for power-efficient neuromorphic computing architectures raises the question of applying spiking neural networks to practical machine learning tasks. A spiking network can be used for classification after mapping the synaptic weights of a trained formal neural network onto a spiking network of the same topology. We show the applicability of this approach to practical tasks and investigate the influence of spiking neural network parameters on classification accuracy. The obtained results demonstrate that such mapping, followed by tuning of the spiking network's parameters, may improve classification accuracy.
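The weight-mapping idea can be illustrated with a common conversion heuristic: rescaling each layer's weights by the peak activations observed in the formal network so that spiking firing rates stay in a valid range. This is a generic sketch of that technique, not the authors' exact procedure; the scaling rule and the nested-list weight layout are assumptions.

```python
# Hypothetical sketch of mapping trained formal-network weights onto a
# spiking network of the same topology via activation-based rescaling.
# The scaling rule is a common conversion heuristic, assumed here,
# not necessarily the procedure used in the paper.

def map_weights_to_snn(ann_weights, layer_max_activations):
    """Rescale each layer's weights by the ratio of the previous layer's
    peak activation to the current layer's, keeping rates in range.
    ann_weights: list of layers, each a list of per-neuron weight rows."""
    snn_weights = []
    prev_max = 1.0  # inputs assumed normalized to [0, 1]
    for w, cur_max in zip(ann_weights, layer_max_activations):
        factor = prev_max / cur_max
        snn_weights.append([[x * factor for x in row] for row in w])
        prev_max = cur_max
    return snn_weights

# one layer, two neurons, two inputs each (illustrative values)
ann_weights = [[[2.0, 4.0], [1.0, 3.0]]]
snn_weights = map_weights_to_snn(ann_weights, layer_max_activations=[2.0])
```

After mapping, spiking-specific parameters (thresholds, time constants) would still be tuned, which is the step the abstract reports as improving accuracy.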

    Modeling the Dynamics of Spiking Networks with Memristor-Based STDP to Solve Classification Tasks

    The problem of training spiking neural networks (SNNs) is relevant due to the ultra-low power consumption these networks could exhibit when implemented in neuromorphic hardware. The ongoing progress in the fabrication of memristors, a prospective basis for analogue synapses, makes it relevant to study the possibility of SNN learning on the basis of synaptic plasticity models obtained by fitting experimental measurements of memristor conductance change. The dynamics of memristor conductances are necessarily nonlinear, because conductance changes depend on the spike timings, which neurons emit in an all-or-none fashion. The ability to solve classification tasks was previously shown for spiking network models based on the bio-inspired local learning mechanism of spike-timing-dependent plasticity (STDP), as well as with plasticity that models the conductance change of nanocomposite (NC) memristors. Input data were presented to the network encoded into the intensities of Poisson input spike sequences. This work considers another approach for encoding input data into the input spike sequences presented to the network: temporal encoding, in which an input vector is transformed into the relative timing of individual input spikes. Since temporal encoding uses fewer input spikes, the processing of each input vector by the network can be faster and more energy-efficient. The aim of the current work is to show the applicability of temporal encoding to training spiking networks with three synaptic plasticity models: STDP, an NC memristor approximation, and a PPX memristor approximation. We assess the accuracy of the proposed approach on several benchmark classification tasks: Fisher's Iris, Wisconsin breast cancer, and the pole-balancing task (CartPole). The accuracies achieved by SNNs with memristor plasticity and conventional STDP are comparable and on par with classic machine learning approaches.

    Analysis of the Full-Size Russian Corpus of Internet Drug Reviews with Complex NER Labeling Using Deep Learning Neural Networks and Language Models

    The paper presents the full-size Russian corpus of Internet users' reviews on medicines with complex named entity recognition (NER) labeling of pharmaceutically relevant entities. We evaluate the accuracy levels reached on this corpus by a set of advanced deep learning neural networks for extracting mentions of these entities. The corpus markup includes mentions of the following entities: medication (33,005 mentions), adverse drug reaction (1778), disease (17,403), and note (4490). Two of them, medication and disease, include a set of attributes. A part of the corpus has a coreference annotation with 1560 coreference chains in 300 documents. A multi-label model based on a language model and a set of features has been developed for recognizing the entities of the presented corpus. We analyze how the choice of different model components affects the entity recognition accuracy. These components include methods for the vector representation of words, types of language models pre-trained for the Russian language, ways of text normalization, and other pre-processing methods. The sufficient size of our corpus allows us to study the effects of annotation particularities and entity balancing. We compare our corpus to existing ones by the occurrences of entities of different types and show that balancing the corpus by the number of texts with and without adverse drug event (ADR) mentions improves ADR recognition accuracy with no notable decline in the accuracy of detecting entities of other types. As a result, the state of the art for the pharmacological entity extraction task for the Russian language is established on a full-size labeled corpus. For the ADR entity type, the accuracy achieved is 61.1% by the F1-exact metric, which is on par with the accuracy level for other language corpora with similar characteristics and ADR representativeness. The accuracy of coreference relation extraction evaluated on our corpus is 71%, which is higher than the results achieved on other Russian-language corpora.
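The F1-exact metric reported above (strict span matching, as opposed to partial-overlap scoring) can be sketched as follows; the entity tuples are made-up examples, not corpus data.

```python
# Minimal sketch of the F1-exact metric: a predicted entity counts as
# correct only if its type and span boundaries match a gold annotation
# exactly. Entities are (type, start, end) tuples; values are illustrative.

def f1_exact(gold, predicted):
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)  # exact type+boundary matches
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)

gold = [("ADR", 10, 18), ("Medication", 0, 7)]
pred = [("ADR", 10, 18), ("Medication", 0, 6)]  # one boundary is off
score = f1_exact(gold, pred)
```

Under this metric a single off-by-one boundary error costs the whole entity, which is why F1-exact scores such as the 61.1% for ADR run lower than lenient-match scores on the same predictions.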

    Adverse Drug Reaction Concept Normalization in Russian-Language Reviews of Internet Users

    Mapping pharmaceutically significant entities expressed in natural language to standardized terms/concepts is a key task in the development of systems for pharmacovigilance, marketing, and detecting off-label drug use. This work estimates the accuracy of mapping adverse reaction mentions to concepts from the Medical Dictionary for Regulatory Activities (MedDRA), where the adverse reactions are extracted from reviews on the use of pharmaceutical products by Russian-speaking Internet users (the normalization task). The solution we propose is based on a neural network approach using two neural network models: the first for encoding concepts, and the second for encoding mentions. Both models are pre-trained language models, but the second is additionally tuned for the normalization task using both the Russian Drug Reviews (RDRS) corpus and a set of open English-language corpora automatically translated into Russian. This additional tuning increases the mapping accuracy for adverse drug reaction mentions by 3% on the RDRS corpus. The resulting accuracy for mapping adverse reaction mentions to the preferred terms of MedDRA in RDRS is 70.9% F1-micro. The paper analyzes the factors that affect the accuracy of the task based on a comparison of the RDRS and the CSIRO Adverse Drug Event Corpus (CADEC) corpora. It is shown that the composition of the MedDRA concepts and the number of examples per concept play a key role in solving the task. The proposed model shows a comparable accuracy of 87.5% F1-micro on subsamples of the RDRS and CADEC datasets with the same set of MedDRA preferred terms.
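The two-encoder normalization step described above can be sketched as nearest-concept retrieval: both MedDRA concepts and extracted mentions are encoded into vectors, and each mention is mapped to the concept with the highest cosine similarity. The toy two-dimensional vectors below stand in for real language-model embeddings and are purely illustrative.

```python
# Hypothetical sketch of embedding-based concept normalization: pick the
# MedDRA preferred term whose embedding is closest to the mention's
# embedding by cosine similarity. The 2-D vectors are toy stand-ins for
# real language-model embeddings.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def normalize_mention(mention_vec, concept_vecs):
    """Return the concept (preferred term) with the closest embedding."""
    return max(concept_vecs, key=lambda term: cosine(mention_vec, concept_vecs[term]))

concepts = {"Headache": [0.9, 0.1], "Nausea": [0.1, 0.9]}
best = normalize_mention([0.8, 0.3], concepts)  # closest to "Headache"
```

In this setup, tuning only the mention encoder (as the abstract describes) lets the concept embeddings be precomputed once for the whole MedDRA vocabulary.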

    Accuracy Analysis of the End-to-End Extraction of Related Named Entities from Russian Drug Review Texts by Modern Approaches Validated on English Biomedical Corpora

    The extraction of significant information from Internet sources is an important task of pharmacovigilance due to the need for post-clinical drug monitoring. This research considers the task of end-to-end recognition of pharmaceutically significant named entities and their relations in natural language texts. "End-to-end" means that both tasks are performed within a single process on "raw", unannotated text. The study is based on the current version of the Russian Drug Review Corpus, a dataset of 3800 review texts from the Russian segment of the Internet. Currently, this is the only Russian-language corpus appropriate for research of this type. We estimated the accuracy of recognizing pharmaceutically significant entities and their relations with two approaches based on neural-network language models. The first approach solves the tasks of named-entity recognition and relation extraction sequentially (the sequential approach); the second solves both tasks simultaneously with a single neural network (the joint approach). The study includes a comparison of both approaches, along with hyperparameter selection to maximize the resulting accuracy. It is shown that both approaches solve the target task at the same level of accuracy: 52-53% macro-averaged F1-score, which is the current level of accuracy for end-to-end tasks in the Russian language. Additionally, the paper presents results for the open English-language datasets ADE and DDI based on the joint approach, with hyperparameter selection for modern domain-specific language models. The achieved accuracies of 84.2% (ADE) and 73.3% (DDI) are comparable to or better than other published results for these datasets.