40 research outputs found
Scalable succinct indexing for large text collections
Self-indexes save space by emulating operations of traditional data structures using basic operations on bitvectors. Succinct text indexes provide full-text search functionality which is traditionally provided by suffix trees and suffix arrays for a given text, while using space equivalent to the compressed representation of the text. Succinct text indexes can therefore provide full-text search functionality over inputs much larger than what is viable using traditional uncompressed suffix-based data structures. Fields such as Information Retrieval involve the processing of massive text collections. However, the in-memory space requirements of succinct text indexes during construction have hampered their adoption for large text collections. One promising approach to support larger data sets is to avoid constructing the full suffix array by using alternative indexing representations. This thesis focuses on several aspects related to the scalability of text indexes to larger data sets. We identify practical improvements in the core building blocks of all succinct text indexing algorithms, and subsequently improve the index performance on large data sets. We evaluate our findings using several standard text collections and demonstrate: (1) the practical applications of our improved indexing techniques; and (2) that succinct text indexes are a practical alternative to inverted indexes for a variety of top-k ranked document retrieval problems.
Applied Weaving: Mapping Creativity Into Every Strand of a Curriculum
The purpose of this paper is to explore the value of adding creativity skills into curriculum mapping documents at the elementary and middle school level, with the goal of gaining some clarity regarding the intrinsic value and ubiquity of teaching for creativity. The language for the maps was taken from Weaving Creativity Into Every Strand of Your Curriculum (Burnett & Figliotti, 2015). The maps were developed after observations of and meetings with classroom teachers in order to ensure their accuracy and authenticity. Because the intended purpose of a curriculum map is to provide a sweeping view of targeted content and skills, the maps are non-specific by their nature. As a result, a major challenge of this project continues to be how to make these maps meaningful to the greater community, which is largely unaware of the meaning and value of creativity. For the purposes of this project, creativity skills were added to the curriculum maps of grades five through eight at the Elmwood Franklin School in Buffalo, New York.
English for science and technology: a computer corpus-based analysis of English science and technology texts for application in higher education
Doutoramento em Linguística (Doctorate in Linguistics)
This thesis presents two analyses: first the analysis of computer
corpora from undergraduate textbooks to isolate the (American) English
language of science and technology they present; secondly an analysis of
the English language competence of undergraduates starting their
university studies in science and technology. These two analyses are
contrasted in order to apply the results to the design of an English
language syllabus for first year undergraduates.
A frequency and range word list was produced using a large
baseline corpus to contrast with the main corpora taken from physics
and chemistry textbooks on the students’ bibliographies as a resource for
syllabus design. Secondly, four corpora, two main and two sub-corpora
produced from the physics and chemistry textbooks on the bibliographies
of the undergraduates were analysed using Biber’s (1988) algorithms and
functions for variation across speech and writing.
The student intake was tested over five years and the results of
those tests analysed. It was found that there was considerable variation
in the students’ levels of language competence. However, there was a
close correlation between the students’ competence and the number of
years they had studied English in secondary school. Nevertheless there
were students with extremely advanced competence and some with little
or no competence in English amongst the undergraduates.
Comprehension of scientific texts was generally found to correlate with
more advanced competence and more years of study.
The frequency and range word list showed the contexts which are
appropriate for materials to be used with these students and
demonstrated variation from many of the accepted views of the language
of science and technology. The computer corpora analyses varied from
Biber’s academic prose category. The sub-corpora demonstrated greatest
variation which is believed to be as a result of specific cultural and/or
literary material in the analogies used in the textbooks.
The heavy load of cultural background knowledge which the reader
would need in order to work with the textbooks adequately was also
found in the exercises the students were supposed to use for practice on
the topic presented in the chapter. This and the interpretation of visuals
in the textbooks were considered to be two principal factors that needed
to be emphasised in a syllabus for first year undergraduates. However,
given the time constraints on language teaching for science and
technology students, a methodology which would lead to greater student
autonomy is suggested using computer corpus-based studies - data-driven
learning and computer-supported distance communications and learning.
The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction
With the advent of deep learning, research in many areas of machine learning is converging towards the same set of methods and models. For example, long short-term memory networks are not only popular for various tasks in natural language processing (NLP) such as speech recognition, machine translation, handwriting recognition, syntactic parsing, etc., but they are also applicable to seemingly unrelated fields such as robot control, time series prediction, and bioinformatics. Recent contextual word embeddings such as BERT boast state-of-the-art results on 11 NLP tasks with the same model. Before deep learning, a speech recognizer and a syntactic parser used to have little in common, as systems were much more tailored towards the task at hand.
At the core of this development is the tendency to view each task as yet another data mapping problem, neglecting the particular characteristics and (soft) requirements tasks often have in practice. This often goes along with a sharp break between deep learning methods and previous research in the specific area. This work can be understood as an antithesis to this paradigm. We show how traditional symbolic statistical machine translation models can still improve neural machine translation (NMT) while reducing the risk of common pathologies of NMT such as hallucinations and neologisms. Other external symbolic models such as spell checkers and morphology databases help neural grammatical error correction. We also focus on language models, which often do not play a role in vanilla end-to-end approaches, and apply them in different ways to word reordering, grammatical error correction, low-resource NMT, and document-level NMT. Finally, we demonstrate the benefit of hierarchical models in sequence-to-sequence prediction. Hand-engineered covering grammars are effective in preventing catastrophic errors in neural text normalization systems. Our operation sequence model for interpretable NMT represents translation as a series of actions that modify the translation state, and can also be seen as a derivation in a formal grammar.
EPSRC grant EP/L027623/1
EPSRC Tier-2 capital grant EP/P020259/
Verbal interaction in mathematics lessons in Anglophone Cameroon.
In 2 vols. Available from the British Library Document Supply Centre (BLDSC), DSC:DX177236. SIGLE. GB. United Kingdom.
Think Big: an English language textbook for students of the Faculty of Philosophy
This textbook was developed for students of the Faculty of Philosophy and contains authentic texts from foreign and domestic books, textbooks, and electronic sources on the core disciplines taught in all departments of the Faculty of Philosophy of Kazan University: philosophy, political science, religious studies, conflict studies, and cultural studies. The texts cover socio-political aspects of the development of human society from the standpoint of their history and current state. The materials have been trialled in student groups and can be used both for classroom work and for independent study by bachelor's degree students.
The languages of Malta
The purpose of this volume is to present a snapshot of the state of the art of
research on the languages of the Maltese islands, which include spoken
Maltese, Maltese English and Maltese Sign Language. Malta is a tiny, but
densely populated country, with over 422,000 inhabitants spread over only 316
square kilometers. It is a bilingual country, with Maltese and English
enjoying the status of official languages. Maltese is a descendant of Arabic,
but due to the history of the island, it has borrowed extensively from
Sicilian, Italian and English. Furthermore, local dialects still coexist
alongside the official standard language. The status of English as a second
language dates back to British colonial rule, and just as in other former
British colonies, a characteristic Maltese variety of English has developed.
To these languages must be added Maltese Sign Language, which is the language
of the Maltese Deaf community. This was recently recognised as Malta’s third
official language by an act of Parliament in 2016. While a volume such as the
present one can hardly do justice to all aspects of a diverse and complex
linguistic situation, even in a small community like that of Malta, our aim in
editing this book was to shed light on the main strands of research being
undertaken in the Maltese linguistic context. Six of the contributions in this
book focus on Maltese and explore a broad range of topics including:
historical changes in the Maltese sound system; syllabification strategies;
the interaction of prosody and gesture; the constraints regulating
/t/-insertion; the productivity of derivational suffixes; and raising
phenomena. The study of Maltese English, especially with the purpose of
establishing the defining characteristics of this variety of English, is a
relatively new area of research. Three of the papers in this volume deal with
Maltese English, which is explored from the different perspectives of rhythm,
the syntax of nominal phrases, and lexical choice. The last contribution
discusses the way in which Maltese Sign Language (LSM) has evolved alongside
developments in LSM research. In summary, we believe the present volume has
the potential to present a unique snapshot of a complex linguistic situation
in a geographically restricted area. Given the nature and range of topics
proposed, the volume will likely be of interest to researchers in both
theoretical and comparative linguistics, as well as those working with
experimental and corpus-based methodologies. Our hope is that the studies
presented here will also serve to pave the way for further research on the
languages of Malta, encouraging researchers to also take new directions,
including the exploration of variation and sociolinguistic factors which,
while often raised as explanatory constructs in the papers presented here,
remain under-researched.
Proceedings of the 22nd Conference on Formal Methods in Computer-Aided Design – FMCAD 2022
The Conference on Formal Methods in Computer-Aided Design (FMCAD) is an annual conference on the theory and applications of formal methods in hardware and system verification. FMCAD provides a leading forum to researchers in academia and industry for presenting and discussing groundbreaking methods, technologies, theoretical results, and tools for reasoning formally about computing systems. FMCAD covers formal aspects of computer-aided system design including verification, specification, synthesis, and testing
Health Sciences undergraduate handbook
2003 undergraduate handbook for the Faculty of Health Sciences.