
    Multi-language transfer learning for low-resource legal case summarization

    Analyzing and evaluating legal case reports are labor-intensive tasks for judges and lawyers, who usually base their decisions on report abstracts, legal principles, and commonsense reasoning. Thus, summarizing legal documents is time-consuming and requires considerable human expertise. Moreover, public legal corpora for specific languages are scarcely available. This paper proposes a transfer learning approach with extractive and abstractive techniques to cope with the lack of labeled legal summarization datasets, namely a low-resource scenario. In particular, we conducted extensive multi- and cross-language experiments. The proposed work outperforms the state-of-the-art results for extractive summarization on the Australian Legal Case Reports dataset and sets a new baseline for abstractive summarization. Finally, syntactic and semantic metrics were used to evaluate the accuracy and the factual consistency of the machine-generated legal summaries.

    Building a Text Collection for Urdu Information Retrieval

    Urdu is widely spoken in the Indian subcontinent and has over 300 million speakers worldwide. However, linguistic advancements in Urdu are rare compared with those in European and other Asian languages. Therefore, following Text Retrieval Conference (TREC) standards, we constructed an extensive text collection of 85 304 documents from diverse categories, covering over 52 topics, with relevance judgment sets built at a pool depth of 100. We also present several applications to demonstrate the effectiveness of our collection. Although this collection is primarily intended for text retrieval, it can also be used, with suitable modifications, for named entity recognition, text summarization, and other linguistic applications. Ours is the most extensive existing collection for the Urdu language, and it will be freely available for future research and academic education.
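The relevance judgments described above follow TREC-style pooling. The sketch below shows the usual pooling step under that assumption: for one topic, the pool is the union of the top-`depth` documents returned by each participating retrieval run, and only pooled documents are judged. The function name, run contents, and document IDs are hypothetical, and the authors' exact procedure may differ.

```python
def build_pool(runs, depth=100):
    """Return the set of documents to judge for one topic.

    `runs` is a list of ranked document-ID lists, one per retrieval system.
    Each run contributes its top `depth` documents (TREC-style pooling).
    """
    pool = set()
    for ranking in runs:
        pool.update(ranking[:depth])
    return pool


# Hypothetical rankings from two retrieval systems for a single topic.
runs = [
    ["d3", "d1", "d7", "d2"],
    ["d1", "d4", "d3", "d9"],
]

print(sorted(build_pool(runs, depth=2)))  # ['d1', 'd3', 'd4']
```

With `depth=100`, as in the collection above, every document ranked in the top 100 by any run enters the judgment pool; documents outside the pool are usually treated as non-relevant.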

    The Nature of attachment: An Australian experience

    Throughout the world, protected area management regimes typically separate cultural and natural heritage in legislation, policy, administrative structures, disciplinary expertise, and on-ground practice. Within settler colonial nations, including Australia, cultural heritage is itself habitually separated into indigenous heritage and 'historic' (or non-indigenous) heritage. A consequence of these multiple binaries and disconnected regimes is that they work across rather than with one another. In this chapter, I use the frame of place-attachment to consider issues arising from the separation of natural and cultural heritage in the management of protected areas. The case examples are homestead gardens within protected areas, and my concern is for the recognition of Anglo-Australian place-attachment to domestic gardens.

    PRILJ: an efficient two-step method based on embedding and clustering for the identification of regularities in legal case judgments

    In an era characterized by fast technological progress that introduces new, unpredictable scenarios every day, working in the legal field may appear very difficult if not supported by the right tools. In this respect, several systems based on Artificial Intelligence methods have been proposed in the literature to support various tasks in the legal sector. Following this line of research, in this paper we propose a novel method, called PRILJ, that identifies paragraph regularities in legal case judgments to support legal experts while drafting legal documents. Methodologically, PRILJ adopts a two-step approach that first groups documents into clusters according to their semantic content and then identifies regularities in the paragraphs of each cluster. Embedding-based methods are adopted to represent documents and paragraphs in a semantic numerical feature space, and an Approximate Nearest Neighbor Search method is adopted to efficiently retrieve the paragraphs most similar to those of a document under preparation. Our extensive experimental evaluation, performed on a real-world dataset provided by EUR-Lex, demonstrates the effectiveness and the efficiency of the proposed method. In particular, its ability to model different topics of legal documents and to capture the semantics of the textual content appears very beneficial for the considered task, and makes PRILJ very robust to possible noise in the data.
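To make the two-step idea concrete, here is a minimal sketch assuming documents and paragraphs have already been embedded and documents already clustered; the centroids, paragraph vectors, and function names are hypothetical toy values. It routes a draft document to its closest cluster and searches only that cluster's paragraphs, using exact cosine search in place of the Approximate Nearest Neighbor Search method. It is not PRILJ's actual implementation, whose embedding and clustering choices are not specified in this abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_cluster(doc_vec, centroids):
    """Step 1: route the draft document to its most similar cluster centroid."""
    return max(range(len(centroids)), key=lambda c: cosine(doc_vec, centroids[c]))


def similar_paragraphs(doc_vec, par_vecs, centroids, cluster_pars, k=1):
    """Step 2: for each draft paragraph, search only inside the chosen cluster.

    Brute-force search stands in for an approximate nearest neighbor index.
    """
    c = nearest_cluster(doc_vec, centroids)
    matches = []
    for p in par_vecs:
        ranked = sorted(cluster_pars[c], key=lambda item: cosine(p, item[1]), reverse=True)
        matches.append([pid for pid, _ in ranked[:k]])
    return c, matches


# Hypothetical 2-D embeddings; real systems would use learned, high-dimensional ones.
centroids = [[1.0, 0.0], [0.0, 1.0]]
cluster_pars = {
    0: [("p1", [0.9, 0.1]), ("p2", [0.6, 0.4])],
    1: [("p3", [0.1, 0.9])],
}
draft_doc = [0.8, 0.2]
draft_pars = [[0.7, 0.3], [1.0, 0.0]]

print(similar_paragraphs(draft_doc, draft_pars, centroids, cluster_pars))
# (0, [['p2'], ['p1']])
```

Restricting the search to one cluster shrinks the candidate set; at scale, an ANN index built per cluster would replace the `sorted` call.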

    Detecting offensive speech in conversational code-mixed dialogue on social media: A contextual dataset and benchmark experiments

    The spread of Hate Speech on online platforms is a severe issue for societies and requires platforms to identify offensive content. Research has modeled Hate Speech recognition as a text classification problem that predicts the class of a message based on the text of that message alone. However, context plays a huge role in communication. In particular, for short messages, the text of the preceding tweets can completely change the interpretation of a message within a discourse. This work extends previous efforts to classify Hate Speech by considering the current and previous tweets jointly. In particular, we introduce a clearly defined way of extracting context. We present the development of the first dataset for conversation-based Hate Speech classification in code-mixed Hindi (the ICHCL dataset), together with an approach for collecting context from long conversations. Overall, our benchmark experiments show that including context can improve classification performance over a baseline. Furthermore, we develop a novel pipeline for processing the context. The best-performing pipeline uses a fine-tuned SentBERT paired with an LSTM classifier and achieves a macro F1 score of 0.892 on the ICHCL test dataset. Another pipeline, based on KNN, SentBERT, and ABC weighting, yields a macro F1 score of 0.807, the best result among traditional classifiers. Thus, even a KNN model performs better with an optimized BERT than with a vanilla BERT model.
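The scores above are macro-averaged F1 values. As a reminder of what that average does (the standard definition, not code from this work), the sketch computes F1 separately for each class and takes their unweighted mean, so every class weighs equally regardless of how many examples it has. The function name, labels, and predictions are invented toy data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (each class counts equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions for four tweets.
gold = ["offensive", "not", "not", "offensive"]
pred = ["offensive", "not", "offensive", "offensive"]
print(round(macro_f1(gold, pred), 3))  # 0.733
```

Here the "offensive" class scores F1 = 0.8 and the "not" class scores F1 = 2/3, so the macro average is their plain mean, 11/15.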

    Native language identification of fluent and advanced non-native writers

    This is an accepted manuscript of an article published by ACM in ACM Transactions on Asian and Low-Resource Language Information Processing in April 2020, available online: https://doi.org/10.1145/3383202. The accepted version of the publication may differ from the final published version. Native Language Identification (NLI) aims at identifying the native languages of authors by analyzing their text samples written in a non-native language. Most existing studies investigate this task for educational applications such as second language acquisition and require learner corpora. This article performs NLI in the challenging context of user-generated content (UGC), where authors are fluent and advanced non-native speakers of a second language. Existing NLI studies with UGC (i) rely on content-specific or social-network features that may not generalize to other domains and datasets, (ii) are unable to capture variations in language-usage patterns within a text sample, and (iii) lack any outlier-handling mechanism. Moreover, since a sizable number of people have acquired non-English second languages due to economic and immigration policies, there is a need to gauge the applicability of NLI with UGC to other languages. Unlike existing solutions, we define a topic-independent feature space, which makes our solution generalizable to other domains and datasets. Based on this feature space, we present a solution that mitigates the effect of outliers in the data and helps capture variations in language-usage patterns within a text sample. Specifically, we represent each text sample as a point set and identify the top-k stylistically similar text samples (SSTs) from the corpus. We then apply the probabilistic k-nearest neighbors classifier to the identified top-k SSTs to predict the native languages of the authors.
To conduct experiments, we create three new corpora, each written in a different language, namely English, French, and German. Our experimental studies show that our solution outperforms competitive methods and achieves more than 80% accuracy across languages. Research funded by Higher Education Commission, and Grants for Development of New Faculty Staff at Chulalongkorn University | Digital Economy Promotion Agency (# MP-62-0003) | Thailand Research Funds (MRG6180266 and MRG6280175). Published version
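The final classification step can be illustrated with a deliberately simplified stand-in. The sketch below assumes similarity scores between a query text and labeled corpus texts are already available (computing them from point sets is not shown), keeps the top-k most similar samples, and estimates each native language's probability as its share of those k neighbors. This vote-fraction estimate is a plain k-NN approximation, not the probabilistic k-nearest neighbors formulation used in the article; the function names, scores, and labels are hypothetical.

```python
from collections import Counter


def knn_label_probs(scored_samples, k):
    """Estimate P(native language) from the k most similar corpus samples.

    `scored_samples` holds (similarity, native_language) pairs; the similarity
    is assumed to come from comparing point-set representations of texts.
    """
    top_k = sorted(scored_samples, key=lambda s: s[0], reverse=True)[:k]
    counts = Counter(label for _, label in top_k)
    return {label: n / len(top_k) for label, n in counts.items()}


def predict_native_language(scored_samples, k):
    """Pick the label with the highest estimated probability."""
    probs = knn_label_probs(scored_samples, k)
    return max(probs, key=probs.get)


# Hypothetical similarity scores between one query text and five corpus texts.
scored = [(0.91, "Thai"), (0.85, "Urdu"), (0.80, "Thai"), (0.40, "Urdu"), (0.35, "Urdu")]
print(knn_label_probs(scored, k=3))  # {'Thai': 0.6666666666666666, 'Urdu': 0.3333333333333333}
print(predict_native_language(scored, k=3))  # Thai
```

With `k=5` the same data yields "Urdu" instead, which shows how sensitive the prediction can be to the choice of k.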

    Gendered economies of extraction: seeking permanence amidst the rubble of Bengaluru’s construction industry

    Bengaluru, the capital of Karnataka state in southern India, has undergone rapid transformation in recent decades, from ‘Pensioner’s paradise’ to ‘The city of the future’. At present, its kinetic landscape reflects the competing aspirations of an array of global investors, entrepreneurs, local politicians and would-be real estate moguls. But what of those who lay the foundations for this possibility? By focusing on the lives and labour of women and their families working in construction, this thesis sheds light on a frequently overlooked demographic of co-contributors to Bengaluru’s growth. Attending to the precarity experienced by interlocutors, this thesis situates women’s endeavours to establish familial forms of permanence through an ethic of pragmatism. Illustrating how such projects are pursued, it examines the cultivation of pragmatism to navigate various aspects of the city, and beyond. By acknowledging the resources long-term resident communities may acquire in the city, this thesis also examines the contrasting liminality experienced by migrant workers, who have scant access to these. In doing so, it attends to the ways in which urban precarity is shaped and harnessed by real estate developers seeking to maximise profit and devolve the financial risks of industry speculation. Illuminating how hegemonic masculinity informs these actions and, subsequently, who is able to speculate, this thesis attends to the gendered relations that belie economies of extraction. Making visible employer strategies to maintain flexible labour, it also explores workers’ efforts to counter precarity via the state and their own forms of collective organisation. Utilising ethnographic data collected during fieldwork from October 2014 to May 2016 and from August 2016 to February 2017, this thesis provides a nuanced perspective on the gendered relations of production and social reproduction, and on the political life that unfolds between them.
Attending to the intersectionality of precarious labour conditions, it contributes to the overlapping fields of the anthropology of work, gender, and economics.