
    The HTM cortical model and its application to lexical knowledge

    The problem addressed by this research is to find a neurocomputational model for representing and understanding lexical knowledge, using the HTM cortical algorithm, which models the mechanism by which information is processed in the human neocortex. Automatic natural language understanding requires machines to have deep knowledge of natural language, something that is currently far from being achieved. In general, computational models for Natural Language Processing (NLP), in both their analysis-and-comprehension and generation variants, use algorithms grounded in mathematical and linguistic models that try to emulate the way language has traditionally been processed, for example by obtaining the implicit hierarchical structure of sentences or the inflectional endings of words. These models are useful for building concrete applications such as data extraction, text classification, and opinion mining. However, despite their utility, machines do not really understand what they are doing with any of these models. The question addressed in this work is therefore whether it is actually possible to computationally model the human neocortical processes that govern the treatment of the semantic information of the lexicon. This research question constitutes the first level towards understanding natural language processing at higher linguistic levels.
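HTM-style models encode lexical items as sparse distributed representations (SDRs), where semantic relatedness shows up as overlap between active bits. As a minimal illustration only (the words and bit indices are invented, and real HTM encoders are far richer than this):

```python
def sdr_overlap(a, b):
    """Overlap score between two sparse distributed representations,
    each given as a set of active bit indices."""
    return len(a & b)

# Toy SDRs for three lexical concepts: higher overlap suggests the
# encoding treats the two concepts as semantically closer.
dog = {3, 17, 42, 101, 256}
wolf = {3, 17, 42, 99, 300}
cat = {5, 17, 88, 120, 301}

print(sdr_overlap(dog, wolf))  # shares 3 active bits
print(sdr_overlap(dog, cat))   # shares 1 active bit
```

In a full HTM system the SDRs are produced by a spatial pooler over thousands of bits and compared across a temporal-memory layer; the overlap count above is only the comparison primitive.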

    Resolving XML Semantic Ambiguity

    XML semantic-aware processing has become a motivating and important challenge in Web data management, data processing, and information retrieval. While XML data is semi-structured, it remains prone to lexical ambiguity and thus requires dedicated semantic analysis and sense disambiguation processes to assign well-defined meaning to XML elements and attributes. This becomes crucial in an array of applications ranging over semantic-aware query rewriting, semantic document clustering and classification, schema matching, and blog analysis and event detection in social networks and tweets. Most existing approaches in this context: i) ignore the problem of identifying ambiguous XML nodes, ii) only partially consider their structural relations/context, iii) use syntactic information in processing XML data regardless of the semantics involved, and iv) are static in adopting fixed disambiguation constraints, thus limiting user involvement. In this paper, we provide a new XML Semantic Disambiguation Framework, titled XSDF, designed to address each of the above limitations, taking as input an XML document and a general-purpose semantic network, and producing as output a semantically augmented XML tree made of unambiguous semantic concepts. Experiments demonstrate the effectiveness of our approach in comparison with alternative methods. General Terms: Algorithms, Measurement, Performance, Design, Experimentation. Keywords: XML semantic-aware processing, ambiguity degree, sphere neighborhood, XML context vector, semantic network, semantic disambiguation
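The disambiguation step can be sketched as scoring each candidate sense of an ambiguous node label by its overlap with the node's structural context (neighbouring element and attribute names). This is a toy sketch under invented data, not XSDF itself; the sense identifiers, glosses, and context are all hypothetical:

```python
def disambiguate(label, context, senses):
    """Pick the sense of an ambiguous XML node label whose gloss
    overlaps most with the node's structural context."""
    def score(gloss):
        return len(set(gloss.split()) & set(context))
    # senses[label] is a list of (sense_id, gloss) pairs.
    return max(senses[label], key=lambda s: score(s[1]))[0]

# Hypothetical senses for the ambiguous element name "title".
senses = {"title": [
    ("title#book", "name of a book work publication"),
    ("title#honorific", "formal rank honorific name of a person"),
]}
# Structural context: sibling/parent node names in the XML tree.
context = ["author", "publisher", "book"]
print(disambiguate("title", context, senses))  # → title#book
```

XSDF's actual pipeline weighs context by distance within a "sphere neighborhood" and draws senses from a full semantic network; the gloss-overlap score here stands in for that machinery.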

    Christmas Crackers

    The collected issues of "Christmas Crackers" issued in 1990-1997, containing selections of logological diversions and verbal amuse-bouches.

    Adjective antonymy in the dictionary, in context, and in the cognitive system

    The subject of this research is adjective antonyms, and its objectives are threefold: theoretical, methodological, and practical-lexicographical. The theoretical objectives, in addition to broadening current lexicological and psycholinguistic knowledge of antonymy, include establishing a more objective way of classifying antonyms into better and worse examples. As for the methodological objectives, we are interested in the ways in which antonymy can be examined, what each method can offer, and to what extent the findings obtained by different methods are comparable and congruent. The practical-lexicographical objectives include explicating how antonyms are treated and what functions they serve in dictionaries, examining the validity of that treatment, and analysing the feasibility of a future dictionary of antonyms of the Serbian language. We used a variety of research methods and techniques: a meta-analysis of lexicographical text, a comparison of data from different descriptive, antonym, and associative dictionaries, a controlled association test, cluster analysis, judgement scales, and corpus measures of the joint use of words in text.
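One of the corpus measures mentioned, the joint use of a word pair in text, can be sketched as a sentence-level co-occurrence count; good antonym pairs tend to co-occur in the same sentence far more often than chance. The corpus below is invented for illustration; real studies use association measures over large corpora:

```python
def cooccurrence(corpus, w1, w2):
    """Number of sentences in which both words appear: a simple
    corpus measure of the joint use of a word pair."""
    return sum(1 for s in corpus
               if w1 in s.split() and w2 in s.split())

corpus = [
    "the water was hot not cold",
    "a hot day",
    "served cold",
    "hot or cold drinks are available",
]
print(cooccurrence(corpus, "hot", "cold"))  # → 2
```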

    Social work with airports passengers

    Social work at the airport consists in offering social services to passengers. The main methodological premise is that passengers are under stress, which manifests in a characteristic set of features of appearance and behaviour. In such circumstances a passenger's actions attract attention. Only a person whom the passenger trusts can help, whether with documents or psychologically.

    The development of a framework for semantic similarity measures for the Arabic language

    This thesis presents a novel framework for developing an Arabic Short Text Semantic Similarity (STSS) measure, namely NasTa. STSS measures are developed for short texts 10-25 words long. The algorithm calculates STSS based on part of speech (POS), Arabic Word Sense Disambiguation (WSD), semantic nets, and corpus statistics. The proposed framework is founded on word similarity measures. Firstly, a novel Arabic noun similarity measure is created using information sources extracted from a lexical database known as Arabic WordNet. Secondly, a novel verb similarity algorithm is created based on the assumption that words sharing a common root usually have related meanings, a central characteristic of the Arabic language. Two Arabic word benchmark datasets, one for nouns and one for verbs, are created to evaluate the measures; these are the first of their kind for Arabic. Their creation methodologies use the best available experimental techniques to create materials and collect human ratings from representative samples of the Arabic-speaking population. Experimental evaluation indicates that the Arabic noun and verb measures performed well, correlating well with average human performance on the noun and verb benchmark datasets respectively. Specific features of the Arabic language are addressed. A new Arabic WSD algorithm is created to address the ambiguity caused by missing diacritics in the contemporary Arabic writing system. The algorithm disambiguates all words (nouns and verbs) in Arabic short texts without requiring any manual training data. Moreover, a novel algorithm is presented to identify the similarity score between two words belonging to different POS, a noun paired with a verb. This algorithm performs Arabic WSD based on the concept of noun semantic similarity. Important benchmark datasets for text similarity are presented: ASTSS-68 and ASTSS-21. Experimental results indicate that the Arabic STSS algorithm correlated well with average human performance on ASTSS-68, and the correlation was statistically significant.
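The root-sharing assumption behind the verb measure can be sketched as follows. This is a simplification: the thesis combines root information with other knowledge sources, and the transliterated roots and fallback score below are invented for illustration:

```python
def verb_similarity(root1, root2, fallback=0.0):
    """Root-based verb similarity: two verbs sharing a common
    (e.g. triliteral) root are assumed strongly related, a central
    characteristic of Arabic; otherwise a fallback score from some
    other knowledge source is used."""
    return 1.0 if root1 == root2 else fallback

# 'kataba' (to write) and 'kaataba' (to correspond) share root k-t-b,
# so they score maximally; k-t-b vs d-r-s (to study) falls back.
print(verb_similarity("k-t-b", "k-t-b"))       # → 1.0
print(verb_similarity("k-t-b", "d-r-s", 0.3))  # → 0.3
```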

    Semantic sentence similarity incorporating linguistic concepts

    A natural language allows a set of simpler ideas to be combined to communicate much more complex ideas. This ability gives language the potential for use as a highly intuitive method of human interaction. However, this freedom of expression makes interpreting language with automation extremely challenging. Semantic sentence similarity is an approach in which knowledge of how to compare simpler units, such as words, is used to obtain a measure of similarity between two sentences. This similarity allows existing knowledge to be applied to new situations. The objective of this research is to show that a sentence similarity model can be improved through the inclusion of linguistic concepts, with the aim of producing a more accurate model. This presents the challenges of adapting the human-focused rules of linguistics to sentence similarity and of evaluating the effects of individual components in isolation. The research overcame these barriers through the development of an extensible modular framework and the construction of a new mathematical model for this framework, called SARUMAN. The core contribution of the research resulted from gradually incorporating fundamental linguistic components into SARUMAN, including disambiguation by part of speech, treating the sentence as clauses, and advanced word interaction to handle cases where meanings merge; the most advanced version is called SCAWIT. In experiments on a small dataset, each of these introduced concepts showed a statistically significant improvement in Pearson's correlation (0.05 or more) over the previous version. The resulting models were capable of processing several hundred sentence pairs per second on a single processor. A further significant advance to the field of sentence similarity was the introduction of opposites; this was conceptually beyond the pre-existing models and showed strong results for an extension of SCAWIT, called SANO. Further novel contributions were automated word sense disambiguation from WordNet definitions and the use of a word-properties model. Some of these changes show potential but did not yield significant improvement with the current knowledge base.
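A common baseline for the word-to-sentence step such frameworks build on is average best-match word similarity: each word in one sentence is paired with its most similar word in the other, and the scores are averaged. The word-similarity function and synonym pairs below are invented placeholders, not SARUMAN's actual components:

```python
def sentence_similarity(s1, s2, word_sim):
    """Average best-match word similarity between two sentences."""
    words1, words2 = s1.split(), s2.split()
    best = [max(word_sim(a, b) for b in words2) for a in words1]
    return sum(best) / len(best)

# Toy word similarity: exact match scores 1.0, listed synonym
# pairs score 0.8, everything else 0.0.
SYNONYMS = {("car", "automobile"), ("fast", "quick")}

def word_sim(a, b):
    if a == b:
        return 1.0
    return 0.8 if (a, b) in SYNONYMS or (b, a) in SYNONYMS else 0.0

score = sentence_similarity("the fast car", "the quick automobile", word_sim)
print(round(score, 2))  # → 0.87
```

Models like those described refine this baseline with POS disambiguation, clause structure, and opposite handling, each of which changes how the word-pair scores are produced and combined.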

    Image summarisation: human action description from static images

    Master's dissertation in Natural Language Processing and Language Industries, Faculdade de Ciências Humanas e Sociais, Universidade do Algarve, 2014. The object of this master's thesis is image summarisation and, more specifically, the automatic description of human actions from static images. The work was organised into three main phases: data collection, system implementation, and system evaluation. The dataset consists of 1,287 images depicting human activities belonging to four semantic categories: "walking a dog", "riding a bike", "riding a horse", and "playing the guitar". The images were manually annotated using an approach based on the idea of crowd-sourcing, each annotation taking the form of one or two simple sentences. The system is composed of two parts, a content-based image retrieval part and a natural language processing part. Given a query image, the first part retrieves a set of images perceived as visually similar, and the second part processes the annotations attached to each of those images in order to extract common information, using a technique that merges the dependency graphs of the annotated sentences. An optimal path consisting of a subject-verb-complement relation is extracted and transformed into a proper sentence by applying a set of surface processing rules. The system was evaluated in three different ways. Firstly, the content-based image retrieval subsystem was evaluated in terms of precision and recall and compared to a baseline classification system based on randomness. To evaluate the natural language processing subsystem, the image summarisation task was treated as a machine translation task and therefore evaluated in terms of BLEU score: given images corresponding to the same semantic category as a query image, the system output was compared, in terms of BLEU, to the reference summary provided during the annotation phase. Finally, the whole system was evaluated qualitatively by means of a questionnaire. The evaluation concludes that even if the system does not always capture the right human action or the subjects and objects involved in it, it produces summaries that are understandable and linguistically adequate. Erasmus Mundus.
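The merging idea, extracting the dominant subject-verb-complement relation from the annotations of the retrieved images, can be sketched over bare triples. This is a simplification of merging full dependency graphs, and the annotations below are invented:

```python
from collections import Counter

def summarise(triples):
    """Merge subject-verb-complement triples from the annotations of
    visually similar images: keep the most frequent value in each
    slot, then realise the result as a simple sentence."""
    subj, verb, comp = (
        Counter(t[i] for t in triples).most_common(1)[0][0]
        for i in range(3)
    )
    # Minimal surface-processing rule: wrap the S-V-C path in a
    # progressive sentence frame.
    return f"A {subj} is {verb} a {comp}."

# Hypothetical annotations of three retrieved images.
annotations = [
    ("man", "riding", "bike"),
    ("person", "riding", "bike"),
    ("man", "riding", "bicycle"),
]
print(summarise(annotations))  # → A man is riding a bike.
```

The system described works on full dependency graphs and richer surface rules; the per-slot majority vote above only mimics the effect of choosing an optimal path through the merged graph.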

    From Semantic to Emotional Space in Sense Sentiment Analysis

    Ph.D. thesis (Doctor of Philosophy).

    Presences of the Infinite: J.M. Coetzee and Mathematics

    This thesis articulates the resonances between J. M. Coetzee's lifelong engagement with mathematics and his practice as a novelist, critic, and poet. Though the critical discourse surrounding Coetzee's literary work continues to flourish, and though the basic details of his background in mathematics are now widely acknowledged, his inheritance from that background has not yet been the subject of a comprehensive and mathematically-literate account. In providing such an account, I propose that these two strands of his intellectual trajectory not only developed in parallel, but together engendered several of the characteristic qualities of his finest work. The structure of the thesis is essentially thematic, but is also broadly chronological. Chapter 1 focuses on Coetzee's poetry, charting the increasing involvement of mathematical concepts and methods in his practice and poetics between 1958 and 1979. Chapter 2 situates his master's thesis alongside archival materials from the early stages of his academic career, and thus traces the development of his philosophical interest in the migration of quantificatory metaphors into other conceptual domains. Concentrating on his doctoral thesis and a series of contemporaneous reviews, essays, and lecture notes, Chapter 3 details the calculated ambivalence with which he therein articulates, adopts, and challenges various statistical methods designed to disclose objective truth. Chapter 4 explores the thematisation of several mathematical concepts in Dusklands and In the Heart of the Country. Chapter 5 considers Waiting for the Barbarians and Foe in the context provided by Coetzee's interest in the attempts of Isaac Newton to bridge the gap between natural language and the supposedly transparent language of mathematics. Finally, Chapter 6 locates in Elizabeth Costello and Diary of a Bad Year a cognitive approach to the use of mathematical concepts in ethics, politics, and aesthetics, and, by analogy, a central aspect of the challenge Coetzee's late fiction poses to the contemporary literary landscape.