Can Subcategorisation Probabilities Help a Statistical Parser?
Research into the automatic acquisition of lexical information from corpora
is starting to produce large-scale computational lexicons containing data on
the relative frequencies of subcategorisation alternatives for individual
verbal predicates. However, the empirical question of whether this type of
frequency information can in practice improve the accuracy of a statistical
parser has not yet been answered. In this paper we describe an experiment with
a wide-coverage statistical grammar and parser for English and
subcategorisation frequencies acquired from ten million words of text which
shows that this information can significantly improve parse accuracy.
Comment: 9 pages, uses colacl.st
Vocabulary Through Affixes and Word Families - A Computer-Assisted Language Learning Program for Adult ELL Students
Vocabulary plays an important role in language learning for English Language Learner (ELL) students. This work discusses the importance of metalinguistic awareness in teaching vocabulary to adult English Language Learners at an intermediate or advanced level of English proficiency, with an emphasis on learning vocabulary through word families and increased morphological awareness. The main contribution is a computer-based program that guides users through a series of interactive reading and vocabulary practice exercises. These exercises let users explore how certain words are connected through word families and how some of the most common affixes in English can affect the meaning and grammatical function of words. Unlike most existing Computer-Assisted Language Learning systems, the program can produce an unlimited number of vocabulary practice exercises and can analyze an unlimited range of reading materials, including text supplied by the users.
An investigation into lemmatization in Southern Sotho
Lemmatization refers to the process whereby a lexicographer
assigns a specific place in a dictionary to a word which he
regards as the most basic form amongst other related forms. The
fact that in Bantu languages formative elements can be added to
one another in an often seemingly interminable series until quite
long words are produced evokes curiosity as far as lemmatization
is concerned. Given the productive nature of Southern
Sotho, it is interesting to observe how lexicographers go about
handling the question of morphological complexities they are
normally faced with in the process of arranging lexical items.
This study has shown that some difficulties are encountered as
far as adhering to the traditional method of alphabetization is
concerned. It does not aim at proposing solutions but does point
out some considerations which should be borne in mind in the
process of lemmatization.
African Languages. M.A. (African Languages)
Investigar a variação linguística em inglês com a Internet e Corpora (Investigating Linguistic Variation in English with the Internet and Corpora)
Master's in English Studies (Mestrado em Estudos Ingleses)
ABSTRACT: This dissertation is concerned with tracking language changes which are seen
to have come about largely through the use of the internet. They also involve
cultural changes which are expressed through new word forms. The processes
of word formation are examined in order to analyse the changes presented in
the case studies carried out as exemplified by Aitchison (1994). There is also a
discussion of computer corpora and recourse to the production of specific
corpora in order to examine language change. The results of the research
carried out have implications for both teachers and language learners who
need to be able to find out about modern English usage. The results go much
further than this, however, and show the wider educational implications involved
in the language changes studied here.
Baltic Journal of English Language, Literature and Culture, Vol.10
Office premises are generally in use for about 2,500 of the year's 8,760 hours. A common problem in office premises is the thermal climate: it is too warm, too cold, or draughty. High temperatures, above about 26 °C, contribute to fatigue and reduced concentration and make the air feel less fresh. The large variation in load between day and night can also result in the premises being over-ventilated at night and under-ventilated during the day. The aim of this degree project was to investigate and compare Ecoclime's comfort ceiling solution with various other heating and cooling systems in office premises, and to determine what advantages, if any, Ecoclime's comfort ceilings offer with regard to comfort, cooling, ventilation and energy use. The simulation program IDA ICE was used to simulate comfort and room temperatures for an office and a conference room in a building located in central Umeå. The simulation results indicate that Ecoclime's comfort ceilings lower the operative temperature and raise comfort, giving a smaller percentage of dissatisfied occupants in the room than other systems at the same room temperature. The comfort indices PMV (Predicted Mean Vote) and PPD (Predicted Percentage Dissatisfied) were used to assess the percentage of dissatisfied occupants in a room. The high passive capacity also contributes to lower energy use by ventilation fans if a VAV system with room temperature control is used. In addition, a sensitivity analysis was carried out on the comfort ceilings, examining how the cooling capacity affects cooling times, temperature and comfort. The sensitivity analysis shows that increasing or decreasing the cooling capacity by 10% affects the results most on a very hot day compared with a normally warm day. The difference in comfort was small, however, only 0.2 percentage points from the base case.
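As an illustration of how the two comfort indices named in this abstract relate, the sketch below computes PPD from a given PMV value using the standard relation from ISO 7730. The function name `ppd_from_pmv` is hypothetical, and the abstract does not say how IDA ICE evaluates these indices internally; this is only a minimal sketch of the published formula.

```python
import math


def ppd_from_pmv(pmv: float) -> float:
    """Return the Predicted Percentage Dissatisfied (in %) for a PMV value.

    Uses the ISO 7730 relation
        PPD = 100 - 95 * exp(-0.03353 * PMV**4 - 0.2179 * PMV**2).
    The function name is an assumption made for this illustration.
    """
    return 100.0 - 95.0 * math.exp(-0.03353 * pmv**4 - 0.2179 * pmv**2)


if __name__ == "__main__":
    # Print PPD for a few PMV values on the seven-point thermal sensation scale.
    for pmv in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(f"PMV {pmv:+.1f} -> PPD {ppd_from_pmv(pmv):.1f} %")
```

Under this relation PPD never falls below 5 %, even at a neutral PMV of 0, which helps put a small difference between two systems, such as the one reported above, into context.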
D6.1: Technologies and Tools for Lexical Acquisition
This report describes the technologies and tools to be used for Lexical Acquisition in PANACEA. It includes descriptions of existing technologies and tools which can be built on and improved within PANACEA, as well as of new technologies and tools to be developed and integrated into the PANACEA platform. The report also specifies the Lexical Resources to be produced. Four main areas of lexical acquisition are included: Subcategorization frames (SCFs), Selectional Preferences (SPs), Lexical-semantic Classes (LCs), for both nouns and verbs, and Multi-Word Expressions (MWEs).
D7.1. Criteria for evaluation of resources, technology and integration.
This deliverable defines how evaluation is carried out at each integration cycle in the PANACEA project. As PANACEA aims at producing large-scale resources, evaluation becomes a critical and challenging issue. It is critical because it is important to assess the quality of the results that are to be delivered to users. It is challenging because we are exploring rather new areas, and doing so through a technical platform: some new methodologies will have to be explored and old ones adapted.
Research in the Language, Information and Computation Laboratory of the University of Pennsylvania
This report takes its name from the Computational Linguistics Feedback Forum (CLiFF), an informal discussion group for students and faculty. However, the scope of the research covered in this report is broader than the title might suggest; this is the yearly report of the LINC Lab, the Language, Information and Computation Laboratory of the University of Pennsylvania.
It may at first be hard to see the threads that bind together the work presented here, work by faculty, graduate students and postdocs in the Computer Science and Linguistics Departments, and the Institute for Research in Cognitive Science. It includes prototypical Natural Language fields such as Combinatory Categorial Grammars, Tree Adjoining Grammars, syntactic parsing and the syntax-semantics interface, but it extends to statistical methods, plan inference, instruction understanding, intonation, causal reasoning, free word order languages, geometric reasoning, medical informatics, connectionism, and language acquisition.
Naturally, this introduction cannot spell out all the connections between these abstracts; we invite you to explore them on your own. In fact, with this issue it’s easier than ever to do so: this document is accessible on the “information superhighway”. Just call up http://www.cis.upenn.edu/~cliff-group/94/cliffnotes.html
In addition, you can find many of the papers referenced in the CLiFF Notes on the net. Most can be obtained by following links from the authors’ abstracts in the web version of this report.
The abstracts describe the researchers’ many areas of investigation, explain their shared concerns, and present some interesting work in Cognitive Science. We hope the new online format makes the CLiFF Notes a more useful and interesting guide to Computational Linguistics activity at Penn.