40 research outputs found

Scalable succinct indexing for large text collections

Author: Petri M
Publication venue: RMIT University
Publication date: 01/01/2013

Self-indexes save space by emulating operations of traditional data structures using basic operations on bitvectors. Succinct text indexes provide full-text search functionality which is traditionally provided by suffix trees and suffix arrays for a given text, while using space equivalent to the compressed representation of the text. Succinct text indexes can therefore provide full-text search functionality over inputs much larger than what is viable using traditional uncompressed suffix-based data structures. Fields such as Information Retrieval involve the processing of massive text collections. However, the in-memory space requirements of succinct text indexes during construction have hampered their adoption for large text collections. One promising approach to support larger data sets is to avoid constructing the full suffix array by using alternative indexing representations. This thesis focuses on several aspects related to the scalability of text indexes to larger data sets. We identify practical improvements in the core building blocks of all succinct text indexing algorithms, and subsequently improve the index performance on large data sets. We evaluate our findings using several standard text collections and demonstrate: (1) the practical applications of our improved indexing techniques; and (2) that succinct text indexes are a practical alternative to inverted indexes for a variety of top-k ranked document retrieval problems

RMIT Research Repository

Applied Weaving: Mapping Creativity Into Every Strand of a Curriculum

Author: Garra Jonathan P
Publication venue: Digital Commons at Buffalo State
Publication date: 01/05/2017

The purpose of this paper is to explore the value of adding creativity skills to curriculum mapping documents at the elementary and middle school level, with the goal of gaining some clarity regarding the intrinsic value and ubiquity of teaching for creativity. The language for the maps was taken from Weaving Creativity Into Every Strand of Your Curriculum (Burnett & Figliotti, 2015). The maps were developed after observations of and meetings with classroom teachers in order to ensure their accuracy and authenticity. Because the intended purpose of a curriculum map is to provide a sweeping view of targeted content and skills, the maps are non-specific by nature. As a result, a major challenge of this project continues to be finding ways to make these maps meaningful to the greater community, which is largely unaware of the meaning and value of creativity. For the purposes of this project, creativity skills were added to the curriculum maps of grades five through eight at the Elmwood Franklin School in Buffalo, New York

Digital Commons at Buffalo State

English for science and technology: a computer corpus-based analysis of English science and technology texts for application in higher education

Author: Howcroft Susan Jean
Publication venue: Universidade de Aveiro
Publication date: 01/01/1999

Doctorate in Linguistics. This thesis presents two analyses: first, an analysis of computer corpora from undergraduate textbooks to isolate the (American) English language of science and technology they present; second, an analysis of the English language competence of undergraduates starting their university studies in science and technology. These two analyses are contrasted in order to apply the results to the design of an English language syllabus for first-year undergraduates. A frequency and range word list was produced using a large baseline corpus to contrast with the main corpora taken from physics and chemistry textbooks on the students’ bibliographies as a resource for syllabus design. Next, four corpora (two main corpora and two sub-corpora) produced from the physics and chemistry textbooks on the undergraduates’ bibliographies were analysed using Biber’s (1988) algorithms and functions for variation across speech and writing. The student intake was tested over five years and the results of those tests were analysed. It was found that there was considerable variation in the students’ levels of language competence. However, there was a close correlation between the students’ competence and the number of years they had studied English in secondary school. Nevertheless, there were students with extremely advanced competence and some with little or no competence in English amongst the undergraduates. Comprehension of scientific texts was generally found to correlate with more advanced competence and more years of study. The frequency and range word list showed the contexts which are appropriate for materials to be used with these students and demonstrated variation from many of the accepted views of the language of science and technology. The computer corpora analyses varied from Biber’s academic prose category. 
The sub-corpora demonstrated the greatest variation, which is believed to result from specific cultural and/or literary material in the analogies used in the textbooks. The heavy load of cultural background knowledge which the reader would need in order to work with the textbooks adequately was also found in the exercises the students were supposed to use for practice on the topic presented in the chapter. This and the interpretation of visuals in the textbooks were considered to be two principal factors that needed to be emphasised in a syllabus for first-year undergraduates. However, given the time constraints on language teaching for science and technology students, a methodology which would lead to greater student autonomy is suggested, using computer corpus-based studies (data-driven learning) and computer-supported distance communications and learning.

Repositório Institucional da Universidade de Aveiro

The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction

Author: Stahlberg Felix
Publication venue: University of Cambridge
Publication date: 17/02/2020

With the advent of deep learning, research in many areas of machine learning is converging towards the same set of methods and models. For example, long short-term memory networks are not only popular for various tasks in natural language processing (NLP) such as speech recognition, machine translation, handwriting recognition, syntactic parsing, etc., but they are also applicable to seemingly unrelated fields such as robot control, time series prediction, and bioinformatics. Recent advances in contextual word embeddings like BERT boast state-of-the-art results on 11 NLP tasks with the same model. Before deep learning, a speech recognizer and a syntactic parser used to have little in common, as systems were much more tailored towards the task at hand. At the core of this development is the tendency to view each task as yet another data mapping problem, neglecting the particular characteristics and (soft) requirements tasks often have in practice. This often goes along with a sharp break between deep learning methods and previous research in the specific area. This work can be understood as an antithesis to this paradigm. We show how traditional symbolic statistical machine translation models can still improve neural machine translation (NMT) while reducing the risk of common pathologies of NMT such as hallucinations and neologisms. Other external symbolic models such as spell checkers and morphology databases help neural grammatical error correction. We also focus on language models, which often do not play a role in vanilla end-to-end approaches, and apply them in different ways to word reordering, grammatical error correction, low-resource NMT, and document-level NMT. Finally, we demonstrate the benefit of hierarchical models in sequence-to-sequence prediction. Hand-engineered covering grammars are effective in preventing catastrophic errors in neural text normalization systems. 
Our operation sequence model for interpretable NMT represents translation as a series of actions that modify the translation state, and can also be seen as derivation in a formal grammar. Funding: EPSRC grant EP/L027623/1; EPSRC Tier-2 capital grant EP/P020259/

Apollo (Cambridge)

Verbal interaction in mathematics lessons in Anglophone Cameroon.

Author: Breet Felicity Grace
Publication date: 01/01/1993

In 2 vols. Available from the British Library Document Supply Centre (BLDSC), DSC:DX177236. SIGLE. GB. United Kingdom

Durham e-Theses

OpenGrey Repository

Think Big: an English-language textbook for students of the Faculty of Philosophy

Author: Арсланова Г. А. (Compiler)
Мельникова О. К. (Compiler)
Сосновская Г. И. (Compiler)
Тябина Д. В. (Compiler)
Publication date: 01/01/2012

This textbook is designed for students of the Faculty of Philosophy and contains authentic texts from foreign and domestic books, textbooks, and electronic sources on the core disciplines taught in all departments of the Faculty of Philosophy of Kazan University: philosophy, political science, religious studies, conflict studies, and cultural studies. The texts cover socio-political aspects of the development of human society in terms of their history and current state. The materials have been trialled in student groups and can be used for both classroom work and independent work by students

Kazan Federal University Digital Repository

The languages of Malta

Publication date: 01/01/2018

The purpose of this volume is to present a snapshot of the state of the art of research on the languages of the Maltese islands, which include spoken Maltese, Maltese English and Maltese Sign Language. Malta is a tiny, but densely populated country, with over 422,000 inhabitants spread over only 316 square kilometers. It is a bilingual country, with Maltese and English enjoying the status of official languages. Maltese is a descendant of Arabic, but due to the history of the island, it has borrowed extensively from Sicilian, Italian and English. Furthermore, local dialects still coexist alongside the official standard language. The status of English as a second language dates back to British colonial rule, and just as in other former British colonies, a characteristic Maltese variety of English has developed. To these languages must be added Maltese Sign Language, which is the language of the Maltese Deaf community. This was recently recognised as Malta’s third official language by an act of Parliament in 2016. While a volume such as the present one can hardly do justice to all aspects of a diverse and complex linguistic situation, even in a small community like that of Malta, our aim in editing this book was to shed light on the main strands of research being undertaken in the Maltese linguistic context. Six of the contributions in this book focus on Maltese and explore a broad range of topics including: historical changes in the Maltese sound system; syllabification strategies; the interaction of prosody and gesture; the constraints regulating /t/-insertion; the productivity of derivational suffixes; and raising phenomena. The study of Maltese English, especially with the purpose of establishing the defining characteristics of this variety of English, is a relatively new area of research. Three of the papers in this volume deal with Maltese English, which is explored from the different perspectives of rhythm, the syntax of nominal phrases, and lexical choice. 
The last contribution discusses the way in which Maltese Sign Language (LSM) has evolved alongside developments in LSM research. In summary, we believe the present volume has the potential to present a unique snapshot of a complex linguistic situation in a geographically restricted area. Given the nature and range of topics proposed, the volume will likely be of interest to researchers in both theoretical and comparative linguistics, as well as those working with experimental and corpus-based methodologies. Our hope is that the studies presented here will also serve to pave the way for further research on the languages of Malta, encouraging researchers to also take new directions, including the exploration of variation and sociolinguistic factors which, while often raised as explanatory constructs in the papers presented here, remain under-researched

Institutional Repository of the Freie Universität Berlin

Proceedings of the 22nd Conference on Formal Methods in Computer-Aided Design – FMCAD 2022

Publication venue: TU Wien Academic Press
Publication date: 18/10/2022

The Conference on Formal Methods in Computer-Aided Design (FMCAD) is an annual conference on the theory and applications of formal methods in hardware and system verification. FMCAD provides a leading forum to researchers in academia and industry for presenting and discussing groundbreaking methods, technologies, theoretical results, and tools for reasoning formally about computing systems. FMCAD covers formal aspects of computer-aided system design including verification, specification, synthesis, and testing

Directory of Open Access Books (DOAB)