
    Passage retrieval in legal texts

    Legal texts usually comprise many kinds of texts, such as contracts, patents and treaties. These texts usually include a huge quantity of unstructured information written in natural language. Thanks to automatic analysis and Information Retrieval (IR) techniques, it is possible to filter out information that is not relevant and, therefore, to reduce the number of documents that users need to browse to find the information they are looking for. In this paper we adapted the JIRS passage retrieval system to work with three kinds of legal texts: treaties, patents and contracts, studying the issues related to the processing of this kind of information. In particular, we studied how a passage retrieval system might be linked up to automated analysis based on logic and algebraic programming for the detection of conflicts in contracts. In our set-up, a contract is translated into formal clauses, which are analysed by means of a model checking tool; then, the passage retrieval system is used to extract conflicting sentences from the original contract text. © 2011 Elsevier Inc. All rights reserved. We thank the MICINN (Plan I+D+i) TEXT-ENTERPRISE 2.0 (TIN2009-13391-C04-03) research project. The work of the second author has been possible thanks to a scholarship funded by Maat Gknowledge in the framework of the project with the Universidad Politécnica de Valencia Módulo de servicios semánticos de la plataforma G. Rosso, P.; Correa García, S.; Buscaldi, D. (2011). Passage retrieval in legal texts. Journal of Logic and Algebraic Programming. 80(3-5):139-153. doi:10.1016/j.jlap.2011.02.001

    Valuing All Languages in Europe

    The VALEUR project (2004-2007) took as its focus the 'additional' languages of Europe. These are defined as all languages in use in contexts where they are not 'national', 'official', or 'dominant' languages. They include 'migrant' languages, 'regional/minority' languages, sign languages and 'non-territorial' languages of diasporas such as Yiddish and Romani. The project team brought together a range of expertise in sociolinguistics and language pedagogy, planning and research from Finland, the Netherlands, Poland, Spain and the UK. We took as our starting point Council of Europe policies on plurilingualism and the desirability of promoting linguistic diversity both for individual citizenship and for social cohesion in Europe. Our aim was to map provision for additional languages in Europe, in a more systematic and inclusive way than ever before. We looked at provision at school level for different languages in different contexts in order to identify good practices to be shared. In order to achieve our objectives we drew on the good will and enthusiasm of workshop participants, who provided a wealth of information and insights from 21 of the Council of Europe member states. Our work is not definitive: its purpose is to raise awareness and to stimulate further activity to support the learning of all Europe's languages.

    Are Passages Enough? The MIRACLE Team Participation at QA@CLEF2009

    Proceedings of: 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009. Took place September 30 - October 2, 2009, in Corfu, Greece. The event Web site is http://www.clef-campaign.org/2009.html. This paper summarizes the participation of the MIRACLE team in the Multilingual Question Answering Track at CLEF 2009. In this campaign, we took part in the monolingual Spanish task at ResPubliQA and submitted two runs. We have adapted our QA system to the new JRC-Acquis collection and the legal domain. We tested answer filtering and ranking techniques against a baseline system using passage retrieval, without success. The run using question analysis and passage retrieval obtained a global accuracy of 0.33, while the addition of answer filtering resulted in 0.29. We provide an analysis of the results for different question types to investigate why it is difficult to leverage previous QA techniques. Another focus of our work has been the application of temporal management to QA. Finally, we include some discussion of the problems found with the new collection and the complexities of the domain. This work has been partially supported by the Research Network MAVIR (S-0505/TIC/000267) and by the project BRAVO (TIN2007-67407-C3-01).

    Machine translation as an underrated ingredient? : solving classification tasks with large language models for comparative research

    While large language models have revolutionised computational text analysis methods, the field is still tilted towards English language resources. Although there are pre-trained models for some "smaller" languages, the coverage is far from universal, and pre-training large language models is an expensive and complicated task. This uneven language coverage limits comparative social research in terms of its geographical and linguistic scope. We propose a solution that sidesteps these issues by leveraging transfer learning and open-source machine translation. We use English as a bridge language between Hungarian and Polish bills and laws to solve a classification task related to the Comparative Agendas Project (CAP) coding scheme. Using the Hungarian corpus as training data for model fine-tuning, we categorise the Polish laws into 20 CAP categories. In doing so, we compare the performance of Transformer-based deep learning models (monolinguals, such as BERT, and multilinguals such as XLM-RoBERTa) and machine learning algorithms (e.g., SVM). Results show that the fine-tuned large language models outperform the traditional supervised learning benchmarks but are themselves surpassed by the machine translation approach. Overall, the proposed solution demonstrates a viable option for applying a transfer learning framework for low-resource languages and achieving state-of-the-art results without requiring expensive pre-training.

    Unsupervised cross-lingual scaling of political texts


    Towards a more complex language identity? An investigation of opinions on Scots in a sample of policy makers and others


    Evaluation of contextual embeddings on less-resourced languages

    The current dominance of deep neural networks in natural language processing is based on contextual embeddings such as ELMo, BERT, and BERT derivatives. Most existing work focuses on English; in contrast, we present here the first multilingual empirical comparison of two ELMo and several monolingual and multilingual BERT models using 14 tasks in nine languages. In monolingual settings, our analysis shows that monolingual BERT models generally dominate, with a few exceptions such as the dependency parsing task, where they are not competitive with ELMo models trained on large corpora. In cross-lingual settings, BERT models trained on only a few languages mostly do best, closely followed by massively multilingual BERT models.

    Decolonial Potential in a Multilingual FYC

    Scholars in rhetoric and composition have questioned to what extent the field can be decolonial because of the gatekeeping role that writing plays in the university. This article examines the decolonial potential of implementing multilingual practices in first-year composition (fyc), enacting what Walter Mignolo calls “epistemic disobedience” by complicating the primacy of English as the language of knowledge-building. I describe a Spanish-English “bilingual” fyc course offered at a private university with a Jesuit Catholic heritage. The course is characterized by a translanguaging approach in which Spanish is presented as a valid language for academic writing. The students’ writing highlights the enduring influence of colonialism in the form of monolingual ideology within the linguistically diverse geographical context of Silicon Valley, where the potential of decolonial practices is tempered by the economic power of the tech industry and its hiring practices, which have resulted in a low number of employed women and minorities in comparison to both national employment levels and diversity within the region.

    English-Only policy and belief in the United States

    English-Only initiatives are commonplace in the United States. Proponents of Official English would like to make English the official language of the United States despite the prestige English already has there. The motivations behind this movement are varied and have substantial effects on the opinion of the American population. This paper examines a group of American residents in the Northeast, aged 18 and older. States considered Northeastern in this study are Vermont, Maine, New Hampshire, Massachusetts, New York, New Jersey, Rhode Island, Connecticut and Pennsylvania. The survey distributed contains questions on the topic of English-only issues, languages in general, and the role of language in participants’ personal lives. This survey tested assumptions about English-only attitudes and language use against the data contributed by participants. The findings confirm that English-only attitudes are pervasive in American society, that education is necessary to further compete against prevailing negative ideologies and beliefs, and that continued survey work can advance research in this area of study.

    Image Pivoting for Learning Multilingual Multimodal Representations

    In this paper we propose a model to learn multimodal multilingual representations for matching images and sentences in different languages, with the aim of advancing multilingual versions of image search and image understanding. Our model learns a common representation for images and their descriptions in two different languages (which need not be parallel) by considering the image as a pivot between two languages. We introduce a new pairwise ranking loss function which can handle both symmetric and asymmetric similarity between the two modalities. We evaluate our models on image-description ranking for German and English, and on semantic textual similarity of image descriptions in English. In both cases we achieve state-of-the-art performance. Comment: 7 pages, EMNLP 201