661 research outputs found

    A Theme-Rewriting Approach for Generating Algebra Word Problems

    Texts present coherent stories that have a particular theme or overall setting, for example science fiction or western. In this paper, we present a text generation method called rewriting that edits existing human-authored narratives to change their theme without changing the underlying story. We apply the approach to math word problems, where it might help students stay more engaged by quickly transforming all of their homework assignments to the theme of their favorite movie without changing the math concepts that are being taught. Our rewriting method uses a two-stage decoding process, which proposes new words from the target theme and scores the resulting stories according to a number of factors defining aspects of syntactic, semantic, and thematic coherence. Experiments demonstrate that the final stories typically represent the new theme well while still testing the original math concepts, outperforming a number of baselines. We also release a new dataset of human-authored rewrites of math word problems in several themes. Comment: To appear EMNLP 201

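    The Use of Partially Automated Methods for Identifying Direct Anglicisms in Finnish-Language Corpus Data (Osittain automatisoitujen menetelmien käyttö suorien anglismien tunnistamiseen suomenkielisissä korpusaineistoissa)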

    The goal of this thesis is to investigate methods that could help with harvesting neologisms, and more specifically anglicisms (i.e. English-sourced borrowings), in the Finnish language. The work is partially motivated by the Global Anglicism Database project, which gathers anglicisms from various languages and can serve both as an anglicism dictionary and as a source of information for researchers studying language contact and borrowing, either in depth for a specific language or cross-linguistically. A systematic way of harvesting anglicisms in current Finnish from a suitable corpus is devised. The research examines what kinds of data sources suitable for this goal are available and what the criteria for a useful data source would be; how to use such a data source to prepare a good list of anglicism candidates that contains as little irrelevant material as possible without losing any anglicisms in the process; and how the candidates could be scored so that the more probable anglicisms appear closer to the top of the candidate list. Several of the Language Bank's monolingual Finnish corpora are considered. The most important criteria are identified as the size and genre of the corpus and its annotation. The criteria are explored through the corpus descriptions on the Language Bank's website, the available literature, and hands-on examination of the data. Other important measures of corpus suitability are the amount of unannotated foreign-language material, the amount of noise, and the proportion of potential anglicisms in the corpora. This information is gained through meticulous exploration of random samples of the corpora's neologism candidate lists and through evaluation against a previously gathered set of anglicisms. A combination of two corpora with good coverage of known anglicisms and a relatively low amount of noise is chosen as the dataset for the next phase of the anglicism identification process.
Anglicism candidate lists are prepared by removing tokens that are irrelevant for anglicism harvesting. These include an identifiable part of the foreign-language material in the corpus, formally recognizable noise, and known lemmas (together with their inflected forms) of words that were present in Finnish just before the major influx of English borrowings began. Several methods of scoring candidates are devised that assign better scores to tokens with a higher probability of being an anglicism. The score is based on a token's frequency in the corpus and on the relative frequency of the token's character-level n-grams in representative purely English and purely Finnish corpora. The tokens in the candidate list are scored and ordered, and the resulting list is evaluated based on the ranking of a set of previously identified anglicisms. The method proves somewhat effective: the resulting average ranking of known anglicisms is better than it would be in a randomly sorted candidate list.
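
The abstract does not give the scoring formula, so the following is only a minimal sketch of one way a character n-gram comparison could work. It assumes trigrams, add-one smoothing, and a length-normalized log-probability difference between an English and a Finnish model; the function names and the tiny word lists in the usage note are hypothetical, and the corpus-frequency component mentioned in the abstract is left out.

```python
from collections import Counter
import math


def char_ngrams(token, n=3):
    """Return the character n-grams of a token, padded with boundary markers."""
    padded = f"^{token.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_model(words, n=3):
    """Count character n-grams over a word list; return (counts, total)."""
    counts = Counter(g for w in words for g in char_ngrams(w, n))
    return counts, sum(counts.values())


def avg_log_prob(token, model, n=3):
    """Length-normalized log-probability of a token with add-one smoothing."""
    counts, total = model
    vocab = len(counts) + 1  # one extra slot for unseen n-grams
    grams = char_ngrams(token, n)
    logs = [math.log((counts[g] + 1) / (total + vocab)) for g in grams]
    return sum(logs) / max(len(logs), 1)


def anglicism_score(token, en_model, fi_model, n=3):
    """Positive when the token looks more English than Finnish (assumed scoring)."""
    return avg_log_prob(token, en_model, n) - avg_log_prob(token, fi_model, n)


def rank_candidates(candidates, en_model, fi_model, n=3):
    """Order candidates so the most English-looking tokens come first."""
    return sorted(
        candidates,
        key=lambda t: anglicism_score(t, en_model, fi_model, n),
        reverse=True,
    )
```

With toy models built from a handful of words, for example `ngram_model(["street", "weekend", "meeting"])` for English and `ngram_model(["kirja", "kissa", "talo"])` for Finnish, `rank_candidates(["kirjasto", "streetwear"], en, fi)` places `streetwear` first. Real use would need models trained on large representative corpora, as the abstract describes.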

    Making a Third Space for Student Voices in Two Academic Libraries

    When we think of voices in the library, we have tended to think of them as disruptive, something to control and manage for the sake of the total library environment. The stereotype of the shushing librarian pervades public perception, creating expectations about the kinds of spaces libraries want to create. Voices are not always disruptive, however. Indeed, developing an academic voice is one of the main challenges facing incoming university students, and libraries can play an important role in helping these students find their academic voices. Two initiatives at two different academic libraries are explored here: a Secrets Wall, where students are invited to write and share a secret during exam time while seeing, reading, and commenting on the secrets of others; and a course called History on the Web, team-taught by a librarian and a historian, which brings together information literacy and the study of history in the digital age. This article examines both projects and considers how critical perspectives on voice and identity might guide our instructional practices, helping students learn to write themselves into the university. Further, it describes how both the Secrets Wall and the History on the Web projects intentionally create a kind of “Third Space” designed specifically so students can enter it, negotiate with it, interrogate it, and eventually come to be part of it.

    ORÁCULO: Detection of Spatiotemporal Hot Spots of Conflict-Related Events Extracted from Online News Sources

    Dissertation presented as the partial requirement for obtaining a Master's degree in Geographic Information Systems and Science. Achieving situational awareness in peace operations requires understanding where and when conflict-related activity is most intense. However, the irregular nature of most factions hinders the use of remote sensing, while winning the trust of the host populations to allow the collection of wide-ranging human intelligence is a slow process. Thus, our proposed solution, ORÁCULO, is an information system that detects spatiotemporal hot spots of conflict-related activity by analyzing the patterns of events extracted from online news sources, allowing immediate situational awareness. To do so, it combines a closed-domain supervised event extractor with emerging hot spot analysis of event space-time cubes. The prototype of ORÁCULO was tested on tweets scraped from the Twitter accounts of local and international news sources covering the Central African Republic Civil War. The test results show near state-of-the-art event extraction performance, significant overlap with a reference event dataset, and strong correlation with the hot spot space-time cube generated from the reference event dataset, demonstrating the viability of the proposed solution. Future work will focus on improving the event extraction performance and on testing ORÁCULO in cooperation with peacekeeping organizations. Keywords: event extraction, natural language understanding, spatiotemporal analysis, peace operations, open-source intelligence.
    Abstract (translated from Portuguese): Achieving and maintaining situational awareness in peace operations requires knowing when and where conflict-related activity is most intense. However, the irregular nature of most factions hinders the use of remote sensing, and winning the trust of the populations to allow the collection of intelligence is a slow process. Thus, our proposed solution, ORÁCULO, is an information system that detects spatiotemporal hot spots of conflict-related activity by analyzing the patterns of events extracted from online news sources (including social networks), allowing immediate situational awareness. To that end, the solution combines a closed-domain event extractor based on supervised learning with emerging hot spot analysis of event space-time cubes. The ORÁCULO prototype was tested on tweets collected from local and international news sources covering the Central African Republic Civil War. The test results show event extraction performance close to the state of the art, significant overlap with a reference event set, and strong correlation with the hot spot space-time cube generated from that reference set, confirming the viability of the proposed solution. In light of these results, future work will focus on improving event extraction performance and on testing ORÁCULO in cooperation with organizations that conduct peace operations.
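
To make the idea of an event space-time cube concrete, here is a minimal sketch that bins geolocated, dated events into grid cells and time slices. Everything in it is an assumption for illustration: the cell size, the slice length, the function names, and especially the z-score flagging, which is a crude stand-in and not the emerging hot spot analysis the dissertation uses.

```python
from collections import Counter
from statistics import mean, pstdev


def build_cube(events, cell_deg=0.5, days_per_slice=7):
    """Bin (lat, lon, day_index) events into a sparse space-time cube.

    Keys are (lat_cell, lon_cell, time_slice); values are event counts.
    """
    cube = Counter()
    for lat, lon, day in events:
        key = (int(lat // cell_deg), int(lon // cell_deg), day // days_per_slice)
        cube[key] += 1
    return cube


def crude_hot_bins(cube, z_threshold=2.0):
    """Flag non-empty bins whose count is far above the mean of non-empty bins.

    Simplified illustration only: it ignores spatial and temporal neighbors.
    """
    counts = list(cube.values())
    if not counts:
        return []
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:
        return []
    return [key for key, c in cube.items() if (c - mu) / sigma >= z_threshold]
```

A call such as `crude_hot_bins(build_cube(events))` returns the keys of bins with unusually many events. A fuller analysis would also compare each bin with its neighbors in space and time and test for trends across time slices.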

    Making a Third Space for Student Voices in Two Academic Libraries

    The article examines two initiatives: the Secrets Wall, an activity in which students write down a secret during exam time, and History on the Web, a course team-taught by a librarian and a historian. Topics discussed include the creation of a third space for student voices, the Secrets Wall offered at the University of Iowa Main Library to help students during final exams, and the Secrets Wall as a third space that gives students an outlet for authentic self-expression and dialogic information.

    Revealing textual polarity patterns with a browser extension

    We describe a new method that combines sentiment analysis and web augmentation in a browser-based platform, enabling visualization of a web document's opinionated expressions and patterns of polarity. The Augmentator extension helps the reader recognize keywords and paragraphs that carry sentiment polarity. The idea is that by moving part of the problem of text analysis from statistics and data mining into the realm of human vision and recognition, non-professionals can benefit more easily from powerful analysis and visualization tools.

    How we draw texts: a review of approaches to text visualization and exploration

    This paper presents a review of approaches to text visualization and exploration. Text visualization and exploration, we argue, constitute a subfield of data visualization, and are fuelled by the advances being made in text analysis research and by the growing amount of accessible data in text format. We propose an original classification for a total of 49 cases based on the visual features of the approaches adopted, identified using an inductive process of analysis. We group the cases (published between 1994 and 2013) in two categories: single-text visualizations and text-collection visualizations, both of which can be explored and compared online