10,841 research outputs found

    Towards a Universal Wordnet by Learning from Combined Evidence

    Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification.

    Using the beat histogram for speech rhythm description and language identification

    In this paper we present a novel approach for the description of speech rhythm and the extraction of rhythm-related features for automatic language identification (LID). Previous methods have extracted speech rhythm through the calculation of features based on salient elements of speech such as consonants, vowels and syllables. We show how an automatic rhythm extraction method borrowed from music information retrieval, the beat histogram, can be adapted for the analysis of speech rhythm by defining the most relevant novelty functions in the speech signal and extracting features describing their periodicities. We have evaluated those features in a rhythm-based LID task for two multilingual speech corpora using support vector machines, including feature selection methods to identify the most informative descriptors. Results suggest that the method is successful in describing speech rhythm and provides LID classification accuracy comparable to or better than that of other approaches, without the need for a preceding segmentation or annotation of the speech signal. Concerning rhythm typology, the rhythm class hypothesis in its original form seems to be only partly confirmed by our results.

    Developing a distributed electronic health-record store for India

    The DIGHT project is addressing the problem of building a scalable and highly available information store for the Electronic Health Records (EHRs) of the over one billion citizens of India.

    Fluctuations in Learners’ Willingness to Communicate During Communicative Task Performance: Conditions and Tendencies

    A person’s willingness to communicate (WTC), believed to stem from a combination of proximal and distal variables comprising psychological, linguistic, educational and communicative dimensions of language, appears to be a significant predictor of success in language learning. The ability to communicate is both a means and an end of language education, since, on the one hand, being able to express the intended meanings in the target language is generally perceived as the main purpose of any language course and, on the other, linguistic development proceeds in the course of language use. However, MacIntyre (2007, p. 564) observes that some learners, despite extensive study, may never become successful L2 speakers. The inability or unwillingness to sustain contacts with more competent language users may influence the way learners are evaluated in various social contexts. Establishing social networks as a result of frequent communication with target language users is believed to foster linguistic development. WTC, initially considered a stable personality trait and then a result of context-dependent influences, has recently been viewed as a dynamic phenomenon changing its intensity within one communicative event (MacIntyre and Legatto, 2011; MacIntyre et al., 2011). The study whose results are reported here attempts to tap into factors that shape one’s willingness to speak during a communicative task. The measures employed to collect the data, self-ratings and surveys, allow looking at the issue from a number of perspectives.

    Pivot-based Hybrid Machine Translation to Support Multilingual Communication

    Machine Translation (MT) is very useful in supporting multicultural communication. Existing approaches are hard to apply to low-resource languages: Statistical Machine Translation (SMT) requires corpora of high quality and quantity, and Rule-Based Machine Translation (RBMT) requires bilingual dictionaries and morphological, syntactic, and semantic analyzers, all of which are scarce for such languages. Due to this lack of language resources, it is difficult to create MT from high-resource languages to low-resource languages like Indonesian ethnic languages. Nevertheless, the characteristics of Indonesian ethnic languages motivate us to introduce a Pivot-Based Hybrid Machine Translation (PHMT) that combines SMT and RBMT with Indonesian as a pivot, which we further utilize in a multilingual communication support system. We evaluate PHMT translation quality with fluency and adequacy as metrics and then evaluate the usability of the system. Despite the medium average translation quality (3.05 fluency score and 3.06 adequacy score), the average usability score of 3.71 indicates that the system is useful to support multilingual collaboration.

    Developing language in the primary school: literacy and primary languages (National strategies: primary)


    European Language Grid

    This open access book provides an in-depth description of the EU project European Language Grid (ELG). Its motivation lies in the fact that Europe is a multilingual society with 24 official European Union Member State languages and dozens of additional languages including regional and minority languages. The only meaningful way to enable multilingualism and to benefit from this rich linguistic heritage is through Language Technologies (LT) including Natural Language Processing (NLP), Natural Language Understanding (NLU), Speech Technologies and language-centric Artificial Intelligence (AI) applications. The European Language Grid provides a single umbrella platform for the European LT community, including research and industry, effectively functioning as a virtual home, marketplace, showroom, and deployment centre for all services, tools, resources, products and organisations active in the field. Today the ELG cloud platform already offers access to more than 13,000 language processing tools and language resources. It enables all stakeholders to deposit, upload and deploy their technologies and datasets. The platform also supports the long-term objective of establishing digital language equality in Europe by 2030 – to create a situation in which all European languages enjoy equal technological support. This is the very first book dedicated to Language Technology and NLP platforms. Cloud technology has only recently matured enough to make the development of a platform like ELG feasible on a larger scale. The book comprehensively describes the results of the ELG project. Following an introduction, the content is divided into four main parts: (I) ELG Cloud Platform; (II) ELG Inventory of Technologies and Resources; (III) ELG Community and Initiative; and (IV) ELG Open Calls and Pilot Projects.

    Reviews

    Europe In the Round CD‐ROM, Guildford, Vocational Technologies, 1994