11 research outputs found

    Метод визначення семантичної зв’язності [A Method for Determining Semantic Relatedness]

    This work addresses the problem of measuring the semantic relatedness of English-language concepts using text corpora. We begin with a brief overview of existing approaches to the problem and consider the main expert-annotated benchmark corpora. We then describe our own method and the main classes of hypotheses on which it is based. The paper proposes and describes more than 70 hypotheses that can be used in computing semantic relatedness, together with a new, high-performance relatedness measure model based on machine learning and the proposed hypotheses. The model allows flexible selection of subsets of hypotheses and shows high effectiveness on different benchmark test sets.

    Word Embeddings: A Survey

    This work lists and describes the main recent strategies for building fixed-length, dense and distributed representations for words, based on the distributional hypothesis. These representations are now commonly called word embeddings and, in addition to encoding surprisingly good syntactic and semantic information, have proven useful as extra features in many downstream NLP tasks. Comment: 10 pages, 2 tables, 1 image

    Low-rank Tensor Assisted K-space Generative Model for Parallel Imaging Reconstruction

    Although recent deep learning methods, especially generative models, have shown good performance in fast magnetic resonance imaging, there is still much room for improvement in high-dimensional generation. Considering that internal dimensions in score-based generative models have a critical impact on estimating the gradient of the data distribution, we present a new idea, the low-rank tensor assisted k-space generative model (LR-KGM), for parallel imaging reconstruction. This means that we transform original prior information into high-dimensional prior information for learning. More specifically, the multi-channel data is constructed into a large Hankel matrix, and the matrix is subsequently folded into a tensor for prior learning. In the testing phase, the low-rank rotation strategy is utilized to impose low-rank constraints on the tensor output of the generative network. Furthermore, we alternately use traditional generative iterations and low-rank high-dimensional tensor iterations for reconstruction. Experimental comparisons with state-of-the-art methods demonstrate that the proposed LR-KGM method achieves better performance.

    Feature Selection by Singular Value Decomposition for Reinforcement Learning

    Solving reinforcement learning problems using value function approximation requires having good state features, but constructing them manually is often difficult or impossible. We propose Fast Feature Selection (FFS), a new method for automatically constructing good features in problems with high-dimensional state spaces but low-rank dynamics. Such problems are common when, for example, controlling simple dynamic systems using direct visual observations with states represented by raw images. FFS relies on domain samples and singular value decomposition to construct features that can be used to approximate the optimal value function well. Compared with earlier methods, such as LFD, FFS is simpler and enjoys better theoretical performance guarantees. Our experimental results show that our approach is also more stable, computes better solutions, and can be faster than prior work.

    A Generative Word Embedding Model and its Low Rank Positive Semidefinite Solution


    Métodos de Deteção Automática de Plágio Extrínseco em Textos de Grande Dimensão [Methods for Automatic Detection of Extrinsic Plagiarism in Large Texts]

    Plagiarism in documents, books, and art in general has serious consequences for society. The existence of dishonest people in academia, industry, and the press who appropriate the intellectual property of others has led some organizations to produce norms to combat plagiarism and to adopt technological means to confront it and prevent its spread. Plagiarism Automatic Detection (PAD) systems are undoubtedly the main means used to identify cases of plagiarism in text documents available on the Web. To try to obfuscate the fraud (hide the plagiarism) in a large text document, plagiarists sometimes extract short sentences, which are then manipulated and transformed from active to passive voice and vice versa, with words replaced by synonyms and antonyms [ASA12, AIAA15, ASI+17]. On the other hand, with larger text pairs, the text alignment process is tedious, which makes it less efficient and even less effective, especially if obfuscation has been attempted. This work aimed to propose less complex PAD methods that make the Detailed Analysis process more efficient and more effective. To this end, we developed two PAD methods. The first is a plagiarism detection method that recursively segments the source document into three blocks in order to identify both small and large plagiarized segments containing paraphrases, effectively and with a high level of time efficiency. The second proposed method is Plagiarism Search by Vector Scanning. This method uses word embeddings (word2vec) without resorting to matrix calculations, and can detect both small and large plagiarized segments, even with a high level of obfuscation, efficiently and with a high level of effectiveness. The results presented in Chapter 4 demonstrate the effectiveness and efficiency of the methods proposed in this dissertation.