
    Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

    This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them. Comment: Published in the Journal of AI Research (JAIR), volume 61, pp. 75-170. 118 pages, 8 figures, 1 table

    Application of fuzzy sets in data-to-text systems

    This PhD dissertation addresses the convergence of two distinct paradigms: fuzzy sets and natural language generation. The object of study is the integration of fuzzy set-derived techniques that model imprecision and uncertainty in human language into systems that generate textual information from numeric data, commonly known as data-to-text systems. This dissertation covers an extensive state-of-the-art review, potential convergence points, two real data-to-text applications that integrate fuzzy sets (in the meteorology and learning analytics domains), and a model that encompasses the most relevant elements in the linguistic description of data discipline and provides a framework for building and integrating fuzzy set-based approaches into natural language generation/data-to-text systems.
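    A minimal sketch of the basic idea behind fuzzy set-based linguistic description of data, assuming a toy temperature partition and output template that are not taken from the dissertation: fuzzy membership functions map a numeric reading to linguistic terms, and the best-matching term is realised as a short textual description (Python).

    def trapezoid(x, a, b, c, d):
        """Trapezoidal membership function with support [a, d] and core [b, c]."""
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)

    # Hypothetical fuzzy partition of daily temperature (degrees Celsius) into linguistic terms.
    TEMPERATURE_TERMS = {
        "cold": lambda t: trapezoid(t, -40.0, -35.0, 5.0, 12.0),
        "mild": lambda t: trapezoid(t, 8.0, 14.0, 20.0, 25.0),
        "warm": lambda t: trapezoid(t, 20.0, 26.0, 40.0, 50.0),
    }

    def describe_temperature(value):
        """Pick the term with the highest membership degree and realise a short sentence."""
        degrees = {term: mu(value) for term, mu in TEMPERATURE_TERMS.items()}
        best = max(degrees, key=degrees.get)
        return f"The day will be {best} ({value} C, membership {degrees[best]:.2f})."

    print(describe_temperature(23.5))  # -> "The day will be warm (23.5 C, membership 0.58)."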

    Automatic tagging and geotagging in video collections and communities

    Automatically generated tags and geotags hold great promise for improving access to video collections and online communities. We give an overview of three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, its definition and the data set released. For each task, a reference algorithm used within MediaEval 2010 is presented, and comments are included on lessons learned. The Tagging Task, Professional, involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task, Wild Wild Web, involves automatically predicting the tags that users assign to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information, including user-generated metadata, speech recognition transcripts, audio, and visual features.
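    As an illustration of how Placing Task output can be scored, the Python sketch below computes the great-circle distance between predicted and ground-truth coordinates and reports the fraction of videos placed within a few distance radii; the radius values and the example videos are illustrative assumptions rather than the official MediaEval settings.

    import math

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance in kilometres between two (lat, lon) points."""
        r = 6371.0  # mean Earth radius in km
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dphi = math.radians(lat2 - lat1)
        dlmb = math.radians(lon2 - lon1)
        a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    def placing_accuracy(predictions, ground_truth, thresholds_km=(1, 10, 100, 1000)):
        """predictions and ground_truth map video_id -> (lat, lon); returns accuracy per radius."""
        distances = [haversine_km(*predictions[vid], *ground_truth[vid]) for vid in ground_truth]
        return {t: sum(d <= t for d in distances) / len(distances) for t in thresholds_km}

    # Example with two hypothetical videos: one placed about 1.3 km off, one on another continent.
    truth = {"v1": (52.37, 4.90), "v2": (40.71, -74.01)}
    preds = {"v1": (52.36, 4.89), "v2": (48.86, 2.35)}
    print(placing_accuracy(preds, truth))  # -> {1: 0.0, 10: 0.5, 100: 0.5, 1000: 0.5}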

    Sentiment classification with case-based approach

    The growth of social networks, blogs, and user review sites makes the Internet a huge source of data, especially about how people think, feel, and act toward different issues. These days, people's opinions play an important role in politics, industry, education, and other areas, so governments, large and small industries, academic institutes, companies, and individuals are looking for automatic techniques to extract the information they need from large amounts of data. Sentiment analysis is a direct answer to this need. It is an application of natural language processing and computational linguistics that uses advanced techniques, such as machine learning and language-model approaches, to capture evaluations as positive, negative, or neutral, with or without their strength, from plain text. In this thesis we study a cross-domain, case-based approach to sentiment analysis at the document level. Our case-based algorithm generates a binary classifier that uses a set of processed cases and five different sentiment lexicons to extract the polarity, along with the corresponding scores, from the reviews. Since sentiment analysis is inherently a domain-dependent task, which makes it problematic and expensive, we use a cross-domain approach, training our classifier on six different domains instead of limiting it to one. To improve the accuracy of the classifier, we add negation detection as part of our algorithm. Moreover, to improve the performance of our approach, some innovative modifications are applied. It is worth mentioning that our approach allows for further development by adding more sentiment lexicons and data sets in the future.
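    A minimal sketch of the lexicon-plus-negation idea described above, assuming a toy lexicon, a small set of negation cues and a one-word negation window (the thesis combines five full lexicons with a case base): each sentiment word contributes its lexicon score, the score is flipped when the word follows a negation cue, and the sign of the total decides the class (Python).

    NEGATION_CUES = {"not", "no", "never", "n't"}

    # Toy lexicon standing in for the five real sentiment lexicons used in the thesis.
    TOY_LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0, "boring": -1.0}

    def classify_review(text, lexicon=TOY_LEXICON):
        """Return ('positive' | 'negative', score) for a plain-text review."""
        score, negate = 0.0, False
        for token in text.lower().split():
            if token in NEGATION_CUES:
                negate = True                # flip the polarity of the next sentiment word
                continue
            if token in lexicon:
                score += -lexicon[token] if negate else lexicon[token]
                negate = False
        return ("positive" if score >= 0 else "negative"), score

    print(classify_review("the plot was not good but the acting was great"))
    # -> ('positive', 1.0): "not good" contributes -1.0 and "great" contributes +2.0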

    Visual Question Answering: A Survey of Methods and Datasets

    Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires reasoning over visual elements of the image and general knowledge to infer the correct answer. In the first part of this survey, we examine the state of the art by comparing modern approaches to the problem. We classify methods by their mechanism to connect the visual and textual modalities. In particular, we examine the common approach of combining convolutional and recurrent neural networks to map images and questions to a common feature space. We also discuss memory-augmented and modular architectures that interface with structured knowledge bases. In the second part of this survey, we review the datasets available for training and evaluating VQA systems. The various datasets contain questions at different levels of complexity, which require different capabilities and types of reasoning. We examine in depth the question/answer pairs from the Visual Genome project, and evaluate the relevance of the structured annotations of images with scene graphs for VQA. Finally, we discuss promising future directions for the field, in particular the connection to structured knowledge bases and the use of natural language processing models. Comment: 25 pages
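    A minimal PyTorch sketch of the joint-embedding baseline mentioned above: an LSTM encodes the question, a precomputed CNN feature vector represents the image, both are projected into a common space, fused by an elementwise product and classified over a fixed answer vocabulary. The dimensions, fusion operator and answer-vocabulary size are illustrative assumptions, not the settings of any particular surveyed system.

    import torch
    import torch.nn as nn

    class JointEmbeddingVQA(nn.Module):
        def __init__(self, vocab_size, num_answers, img_dim=2048,
                     embed_dim=300, hidden_dim=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.img_proj = nn.Linear(img_dim, hidden_dim)    # CNN features -> common space
            self.classifier = nn.Linear(hidden_dim, num_answers)

        def forward(self, image_feats, question_ids):
            # question_ids: (batch, seq_len) token indices; image_feats: (batch, img_dim)
            _, (h, _) = self.lstm(self.embed(question_ids))
            q = h[-1]                                    # last hidden state encodes the question
            v = torch.tanh(self.img_proj(image_feats))   # project the image features
            fused = q * v                                # elementwise-product fusion
            return self.classifier(fused)                # scores over the candidate answers

    # Example: a batch of 4 questions (length 10) with precomputed 2048-d CNN image features.
    model = JointEmbeddingVQA(vocab_size=10000, num_answers=1000)
    logits = model(torch.randn(4, 2048), torch.randint(1, 10000, (4, 10)))
    print(logits.shape)  # torch.Size([4, 1000])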

    Mining information from app user reviews to facilitate software development activities

    For app developers, it is important to continuously evaluate the needs and expectations of their users in order to improve app quality. User reviews submitted to app marketplaces are a useful information source for reassessing evolving user needs. The large volume of user reviews received every day requires automatic methods to find such information. Text classification models can be used to categorise review information into types such as feature requests and bug reports, while automatic app feature extraction from user reviews can help summarise users' sentiments at the level of individual app features. For classifying review information, we perform experiments comparing the performance of simple models that use only lexical features with models that use rich linguistic features and models built on deep learning architectures, i.e. a Convolutional Neural Network (CNN). To investigate factors influencing the performance of automatic app feature extraction methods (rule-based and supervised machine learning), we first establish a baseline in a single experimental setting and then compare performance across different experimental settings (i.e., varying annotated datasets and evaluation methods). Since the performance of supervised feature extraction methods is more sensitive than that of rule-based methods to (1) the guidelines used to annotate app features in user reviews and (2) the size of the annotated data, we investigate their impact on the performance of supervised feature extraction models and suggest new annotation guidelines that have the potential to improve feature extraction performance. To make the research results of the thesis project applicable for non-experts as well, we developed a proof-of-concept tool for comparing competing apps. The tool combines review classification and app feature extraction methods and has been evaluated by ten developers from industry, who perceived it as useful for improving app quality. https://www.ester.ee/record=b529379
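    A minimal scikit-learn sketch of the simple lexical-feature baseline mentioned above: TF-IDF unigram/bigram features of review sentences feed a linear classifier that labels them as bug reports or feature requests. The toy training examples and the two-label scheme are illustrative assumptions, not the thesis's datasets or label set.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Tiny, made-up training set: review sentences labelled as bug reports or feature requests.
    train_reviews = [
        "the app crashes every time I open the camera",
        "it freezes on startup after the last update",
        "please add a dark mode option",
        "would love an export to PDF feature",
    ]
    train_labels = ["bug", "bug", "feature", "feature"]

    classifier = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), lowercase=True),  # lexical unigram/bigram features
        LogisticRegression(max_iter=1000),
    )
    classifier.fit(train_reviews, train_labels)

    print(classifier.predict(["the app keeps crashing when I upload photos"]))  # expected: ['bug']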