A Theme-Rewriting Approach for Generating Algebra Word Problems
Texts present coherent stories that have a particular theme or overall setting, for example science fiction or western. In this paper, we present a text generation method called "rewriting" that edits existing human-authored narratives to change their theme without changing the underlying story. We apply the approach to math word problems, where it might help students stay more engaged by quickly transforming all of their homework assignments to the theme of their favorite movie without changing the math concepts that are being taught. Our rewriting method uses a two-stage decoding process, which proposes new words from the target theme and scores the resulting stories according to a number of factors defining aspects of syntactic, semantic, and thematic coherence. Experiments demonstrate that the final stories typically represent the new theme well while still testing the original math concepts, outperforming a number of baselines. We also release a new dataset of human-authored rewrites of math word problems in several themes.
Comment: To appear EMNLP 201
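The two-stage idea (propose theme words, then score whole stories) can be illustrated with a small sketch. It is not the authors' decoder: it enumerates every combination exhaustively, and the two scorers (`thematic_fit`, `consistency`) are hypothetical stand-ins for the paper's syntactic, semantic, and thematic factors. The example story, proposal table, and theme words are likewise invented. Only the positions listed in `proposals` may change, so the numbers, and hence the math, are preserved.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Sequence, Set, Tuple

# A scorer rates a candidate rewrite against the original story (higher is better).
Scorer = Callable[[Sequence[str], Sequence[str]], float]


def propose(story: Sequence[str], proposals: Dict[int, List[str]]) -> Iterator[List[str]]:
    """Stage 1: each editable position keeps its word or takes a proposed theme word."""
    options = [[tok] + proposals.get(i, []) for i, tok in enumerate(story)]
    for combo in product(*options):
        yield list(combo)


def thematic_fit(theme_words: Set[str]) -> Scorer:
    """Hypothetical thematic factor: share of tokens drawn from the target theme."""
    def score(original: Sequence[str], candidate: Sequence[str]) -> float:
        return sum(tok in theme_words for tok in candidate) / len(candidate)
    return score


def consistency(original: Sequence[str], candidate: Sequence[str]) -> float:
    """Hypothetical coherence factor: penalize one original word mapped to several replacements."""
    mapping: Dict[str, Set[str]] = {}
    for old, new in zip(original, candidate):
        mapping.setdefault(old, set()).add(new)
    return -float(sum(len(news) - 1 for news in mapping.values()))


def rewrite(story: Sequence[str], proposals: Dict[int, List[str]],
            scorers: List[Scorer], weights: List[float]) -> Tuple[List[str], float]:
    """Stage 2: score every proposed story with a weighted sum and keep the best."""
    best, best_score = list(story), float("-inf")
    for cand in propose(story, proposals):
        total = sum(w * s(story, cand) for s, w in zip(scorers, weights))
        if total > best_score:  # strict '>' keeps the first of equally good stories
            best, best_score = cand, total
    return best, best_score


story = "Mary has 5 apples and gives 2 apples to Tom".split()
proposals = {0: ["Luke"], 3: ["blasters", "droids"], 7: ["blasters", "droids"], 9: ["Han"]}
theme = {"Luke", "Han", "blasters", "droids"}
best, best_score = rewrite(story, proposals, [thematic_fit(theme), consistency], [1.0, 1.0])
print(" ".join(best))  # Luke has 5 blasters and gives 2 blasters to Han
```

Exhaustive enumeration grows exponentially with the number of editable slots, so a real system would need a pruned or beam-style search. The sketch only shows how a weighted multi-factor score picks a consistent, on-theme rewrite.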
Using Partially Automated Methods to Identify Direct Anglicisms in Finnish-Language Corpus Data (Osittain automatisoitujen menetelmien käyttö suorien anglismien tunnistamiseen suomenkielisissä korpusaineistoissa)
The goal of this thesis is to investigate methods that could help with harvesting neologisms, and more specifically anglicisms (i.e. English-sourced borrowings), in the Finnish language. The work is partially motivated by the Global Anglicism Database project, which gathers anglicisms from various languages and can serve both as an anglicism dictionary and as a source of information for researchers studying language contact and borrowing, either in depth for a specific language or cross-linguistically.
A systematic way of harvesting anglicisms in current Finnish from a suitable corpus is devised. The research examines what kinds of suitable data sources are available and what the criteria for a useful data source would be; how to use such a data source to prepare a good list of anglicism candidates that contains as little irrelevant material as possible without losing any anglicisms in the process; and how the candidates could be scored so that the more probable anglicisms appear closer to the top of the candidate list.
Several of the Language Bank's monolingual Finnish corpora are considered. The most important criteria are identified as the size and genre of the corpus and its annotation. These criteria are assessed using the corpus descriptions on the Language Bank's website, the available literature, and hands-on examination of the data. Other important measures of corpus suitability are the amount of unannotated foreign-language material, the amount of noise, and the potential proportion of anglicisms in the corpora. This information is obtained by carefully examining random samples of the corpora's neologism candidate lists and by evaluating against a previously compiled set of anglicisms. A combination of two corpora with good coverage of known anglicisms and a relatively low amount of noise is chosen as the dataset for the next phase of the anglicism identification process.
Anglicism candidate lists are prepared by removing tokens that are irrelevant for anglicism harvesting. These include identifiable foreign-language material in the corpus, formally recognizable noise, and the known lemmas of words that were already present in Finnish just before the major influx of English borrowings began, together with their inflected forms.
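The filtering step described above can be sketched as a simple token filter. This is an illustration under stated assumptions, not the thesis code: the regular expression standing in for "formally recognizable noise" is a hypothetical rule, and `foreign` and `pre_influx_forms` are hypothetical precomputed sets (foreign-language tokens from the corpus annotation, and older Finnish lemmas already expanded to their inflected forms). The example tokens are invented.

```python
import re
from typing import Iterable, List, Set

# Hypothetical noise rule: a candidate must be lowercase letters (Finnish å, ä, ö
# included), optionally joined by hyphens; digits, URLs and symbols are dropped.
WORD_RE = re.compile(r"[a-zåäö]+(?:-[a-zåäö]+)*")


def filter_candidates(tokens: Iterable[str], foreign: Set[str],
                      pre_influx_forms: Set[str]) -> List[str]:
    """Drop noise, foreign-language tokens and pre-influx Finnish word forms."""
    seen: Set[str] = set()
    kept: List[str] = []
    for raw in tokens:
        tok = raw.lower()
        if tok in seen:
            continue  # each form is kept at most once
        seen.add(tok)
        if not WORD_RE.fullmatch(tok):
            continue  # formally recognizable noise
        if tok in foreign:
            continue  # identifiable foreign-language material
        if tok in pre_influx_forms:
            continue  # lemma or inflected form already in Finnish before the influx
        kept.append(tok)
    return kept


tokens = ["Talo", "talossa", "meili", "2019", "http://example.com", "the", "chattailla", "meili"]
foreign = {"the"}
pre_influx = {"talo", "talossa"}
candidates = filter_candidates(tokens, foreign, pre_influx)
print(candidates)  # ['meili', 'chattailla']
```

Ordering the checks from cheapest (set lookup for duplicates) to the lexicon lookups is a design convenience; the result does not depend on the order of the three removal rules.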
Several scoring methods are devised that assign better scores to tokens with a higher probability of being anglicisms. The score is based on a token's frequency in the corpus and on the relative frequencies of the token's character-level n-grams in representative purely English and purely Finnish corpora. The tokens in the candidate list are scored and ordered, and the resulting list is evaluated by the ranking of a set of previously identified anglicisms. The method proves somewhat effective: the average rank of known anglicisms is better than it would be in a randomly sorted candidate list.
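One plausible way to turn character n-gram statistics into an "Englishness" score, and to evaluate the ordering by the mean rank of known anglicisms, is sketched below. The exact formula used in the thesis is not given here, so the add-one-smoothed log ratio, the trigram size, and the `alpha`-weighted frequency bonus are all assumptions; the tiny English and Finnish word lists and the candidate frequencies are invented stand-ins for the representative corpora.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set


def char_ngrams(word: str, n: int = 3) -> List[str]:
    """Character n-grams with boundary markers, e.g. '<ch', 'cha', 'hat', 'at>' for 'chat'."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_counts(words: Iterable[str], n: int = 3) -> Counter:
    counts: Counter = Counter()
    for w in words:
        counts.update(char_ngrams(w, n))
    return counts


def englishness(word: str, en: Counter, fi: Counter, n: int = 3) -> float:
    """Mean log ratio of add-one-smoothed n-gram relative frequencies (English vs Finnish)."""
    en_total, fi_total = sum(en.values()), sum(fi.values())
    vocab = len(set(en) | set(fi)) + 1  # +1 leaves room for unseen n-grams
    grams = char_ngrams(word, n)
    log_ratio = 0.0
    for g in grams:
        p_en = (en[g] + 1) / (en_total + vocab)
        p_fi = (fi[g] + 1) / (fi_total + vocab)
        log_ratio += math.log(p_en / p_fi)
    return log_ratio / len(grams)


def score(word: str, freq: int, en: Counter, fi: Counter, alpha: float = 0.1) -> float:
    """Hypothetical combination: n-gram evidence plus a small corpus-frequency bonus."""
    return englishness(word, en, fi) + alpha * math.log(freq)


def mean_rank(ranked: List[str], known: Set[str]) -> float:
    """Average 1-based rank of previously identified anglicisms in the ordered list."""
    ranks = [i + 1 for i, w in enumerate(ranked) if w in known]
    return sum(ranks) / len(ranks)


en = ngram_counts(["chat", "email", "weekend", "show", "check"])
fi = ngram_counts(["talo", "kissa", "koira", "sauna", "metsä", "järvi"])
freqs: Dict[str, int] = {"chattailla": 12, "talossa": 40, "checkata": 5, "kissoja": 20}
known = {"chattailla", "checkata"}

ranked = sorted(freqs, key=lambda w: score(w, freqs[w], en, fi), reverse=True)
random_expectation = (len(ranked) + 1) / 2  # mean rank under a random ordering
print(ranked, mean_rank(ranked, known), random_expectation)
```

Comparing the mean rank against `(len + 1) / 2`, the expected rank of any item in a random permutation, mirrors the evaluation described above: a useful scorer should push known anglicisms noticeably above that baseline.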
Making a Third Space for Student Voices in Two Academic Libraries
When we think of voices in the library, we have tended to think of them as disruptive, something to control and manage for the sake of the overall library environment. The stereotype of the shushing librarian pervades public perception, creating expectations about the kinds of spaces libraries want to create. Voices are not always disruptive, however. Indeed, developing an academic voice is one of the main challenges facing incoming university students, and libraries can play an important role in helping these students find their academic voices. Two initiatives at two different academic libraries are explored here: a Secrets Wall, where students are invited to write and share a secret during exam time while seeing, reading, and commenting on the secrets of others; and History on the Web, a course team-taught by a librarian and a historian that brings together information literacy and the study of history in the digital age. This article examines both projects and considers how critical perspectives on voice and identity might guide our instructional practices, helping students learn to write themselves into the university. Further, it describes how both the Secrets Wall and the History on the Web projects intentionally create a kind of “Third Space” designed specifically so students can enter it, negotiate with it, interrogate it, and eventually come to be part of it.
ORÁCULO: Detection of Spatiotemporal Hot Spots of Conflict-Related Events Extracted from Online News Sources
Dissertation presented as partial requirement for obtaining the Master's degree in Geographic Information Systems and Science.
Achieving situational awareness in peace operations requires understanding where and when conflict-related activity is most intense. However, the irregular nature of most factions hinders the use of remote sensing, while winning enough trust from host populations to allow wide-ranging collection of human intelligence is a slow process. Our proposed solution, ORÁCULO, is therefore an information system that detects spatiotemporal hot spots of conflict-related activity by analyzing the patterns of events extracted from online news sources, allowing immediate situational awareness. To do so, it combines a closed-domain supervised event extractor with emerging hot spot analysis of event space-time cubes. The ORÁCULO prototype was tested on tweets scraped from the Twitter accounts of local and international news sources covering the Central African Republic Civil War. The test results show that it achieved near-state-of-the-art event extraction performance, significant overlap with a reference event dataset, and strong correlation with the hot spot space-time cube generated from that reference dataset, demonstrating the viability of the proposed solution. Future work will focus on improving event extraction performance and on testing ORÁCULO in cooperation with peacekeeping organizations.
Keywords: event extraction, natural language understanding, spatiotemporal analysis, peace operations, open-source intelligence.
Achieving and maintaining situational awareness in peace operations requires knowing when and where conflict-related activity is most intense. However, the irregular nature of most factions hinders the use of remote sensing, and winning the trust of local populations to allow intelligence collection is a slow process. Our proposed solution, ORÁCULO, is thus an information system that detects spatiotemporal hot spots of conflict-related activity by analyzing the patterns of events extracted from online news sources (including social networks), enabling immediate situational awareness. To this end, our solution combines a closed-domain event extractor based on supervised learning with emerging hot spot analysis of event space-time cubes. The ORÁCULO prototype was tested on tweets collected from local and international news sources covering the Central African Republic Civil War. Its test results show event extraction performance close to the state of the art, significant overlap with a reference event dataset, and strong correlation with the hot spot space-time cube generated from that reference dataset, demonstrating the viability of the proposed solution. In light of these results, future work will focus on improving event extraction performance and on testing ORÁCULO in cooperation with organizations that conduct peace operations.
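Emerging hot spot analysis, as commonly implemented (for example in the ArcGIS Pro tool of that name), computes a Getis-Ord Gi* statistic for each space-time bin and then applies a Mann-Kendall trend test to each location's series of z-scores. The sketch below covers only the Gi* part on a single time slice, using binary 3x3 neighborhoods that include the cell itself. It is a generic illustration, not ORÁCULO's implementation, and the grid of event counts is invented.

```python
import math
from typing import List

Grid = List[List[float]]


def gi_star(grid: Grid) -> Grid:
    """Getis-Ord Gi* z-score per cell, with binary 3x3 (queen) weights including the cell."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / n)  # population std. deviation
    z = [[0.0] * cols for _ in range(rows)]
    if n < 2 or s <= 1e-12:
        return z  # a single cell or a uniform grid has no hot or cold spots
    for r in range(rows):
        for c in range(cols):
            window = [grid[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            w = len(window)  # sum of binary weights, which also equals the sum of squared weights
            denom = s * math.sqrt((n * w - w * w) / (n - 1))
            if denom > 0:
                z[r][c] = (sum(window) - mean * w) / denom
    return z


# Hypothetical event counts per grid cell for one time slice.
counts = [
    [0, 0, 0, 0, 0],
    [0, 5, 6, 0, 0],
    [0, 7, 9, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
z = gi_star(counts)
print([round(v, 2) for v in z[2]])
```

Cells inside the cluster get z-scores above 1.96 (roughly the 95% level), while the empty corner cells come out negative. Repeating this per time step and testing each cell's z-score series for a monotonic trend is what lets the emerging variant separate new, intensifying, or fading hot spots.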
Making a Third Space for Student Voices in Two Academic Libraries
The article examines two initiatives: the Secrets Wall, an activity in which students write down secrets during exam time, and History on the Web, a course team-taught by a librarian and a historian. Topics discussed include the creation of a third space for student voices, the Secrets Wall offered at the University of Iowa Main Library to help students during final exams, and the Secrets Wall as a third space that offers students an outlet for authentic self-expression and dialogic information.
Revealing textual polarity patterns with a browser extension
We describe a new method to combine sentiment analysis and web augmentation into a browser-based platform enabling visualization of a web document's opinionated expressions and patterns of polarity.
The Augmentator extension helps the reader recognize keywords and paragraphs that carry polar sentiment. The premise is that by moving part of the text-analysis problem from statistics and data mining into the realm of human vision and recognition, non-professionals may benefit more easily from powerful analysis and visualization tools.
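The abstract does not specify which sentiment analysis the Augmentator uses, so the following is only a generic, lexicon-based stand-in showing the kind of per-paragraph output such an extension could visualize: a polarity label plus the keywords that produced it. The tiny `LEXICON`, the `NEGATORS` set, and the one-word negation rule are hypothetical simplifications; in a browser the labels would map to highlight colors.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy lexicon; a real tool would use a full sentiment lexicon or classifier.
LEXICON: Dict[str, int] = {"good": 1, "great": 2, "excellent": 2,
                           "bad": -1, "poor": -1, "terrible": -2}
NEGATORS = {"not", "never", "no"}


def paragraph_polarity(text: str) -> Tuple[int, List[str]]:
    """Sum lexicon scores, flipping a word's sign when it directly follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score, keywords = 0, []
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            value = LEXICON[tok]
            if i > 0 and tokens[i - 1] in NEGATORS:
                value = -value
            score += value
            keywords.append(tok)
    return score, keywords


def annotate(paragraphs: List[str]) -> List[Tuple[str, List[str]]]:
    """Label each paragraph so an extension could map labels to highlight colors."""
    result = []
    for p in paragraphs:
        score, keywords = paragraph_polarity(p)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        result.append((label, keywords))
    return result


paragraphs = [
    "The plot was great and the acting excellent.",
    "The ending was not good, frankly terrible.",
    "It runs two hours.",
]
for label, keywords in annotate(paragraphs):
    print(label, keywords)
```

Returning the matched keywords alongside the label supports both levels the abstract mentions: keyword highlighting inside a paragraph and coloring of whole paragraphs by polarity.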
How we draw texts: a review of approaches to text visualization and exploration
This paper presents a review of approaches to text visualization and exploration. Text visualization and exploration, we argue, constitute a subfield of data visualization, fuelled by the advances being made in text analysis research and by the growing amount of accessible data in text format. We propose an original classification of 49 cases based on the visual features of the approaches adopted, identified through an inductive process of analysis. We group the cases (published between 1994 and 2013) into two categories, single-text visualizations and text-collection visualizations, both of which can be explored and compared online.