5 research outputs found

    A large corpus of product reviews in Portuguese: tackling out-of-vocabulary words

    Web 2.0 has enabled a previously unimagined communication boom. With the widespread use of computational and mobile devices, anyone, in practically any language, may post comments on the web. As such, formal language is not necessarily used. In fact, in these communicative situations, language is marked by the absence of more complex syntactic structures and the presence of internet slang, with missing diacritics, repetitions of vowels, and the use of chat-speak style abbreviations, emoticons and colloquial expressions. Such language use poses severe new challenges for Natural Language Processing (NLP) tools and applications, which, so far, have focused on well-written texts. In this work, we report the construction of a large web corpus of product reviews in Brazilian Portuguese and the analysis of its lexical phenomena, which support the development of a lexical normalization tool that, in future work, is intended to support the use of standard NLP products for web opinion mining and summarization purposes. University of São Paulo; Samsung Eletrônica da Amazônia Ltda; FAPESP; CNP

    Towards the Classification of the Finnish Internet Parsebank: Detecting Translations and Informality

    This paper presents the first results on detecting informality and machine and human translations in the Finnish Internet Parsebank, a project developing a large-scale, web-based corpus with full morphological and syntactic analyses. The paper aims at classifying the Parsebank according to these criteria, as well as studying the linguistic characteristics of the classes. The features used include both lexical and morpho-syntactic properties, such as syntactic n-grams. The results are practically applicable, with an AUC range of 85-85% for the human translated texts, ~98% for the machine translated texts and 73% for the informal texts. While word-based classification performs well for the in-domain experiments, delexicalized methods with morpho-syntactic features prove to be more tolerant to variation caused by genre or source language. In addition, the results show that the features used in the classification provide interesting pointers for further, more detailed studies on the linguistic characteristics of these texts.

    Contract Meta-Interpretation

    This Article provides a general framework for resolving contract law’s ambivalence between textualism and contextualism, one of the most difficult questions in modern contract interpretation. Simply put, the Article’s argument is that courts need to determine the parties’ preferences as to how their contracts should be interpreted; this “meta-interpretive” inquiry can then direct the court’s interpretation or construction of the parties’ substantive rights and duties. Moreover, the Article argues that while contextualist interpretation is not, and should not be, mandatory for all interpretive questions under contract law, contextualism is necessary to resolve the initial “meta-interpretive” question: What interpretive regime do the parties prefer? Recognizing this distinction, and applying this two-step inquiry, can resolve some of the academic and practical debates between textualists and contextualists, and it can also explain some features of modern contract law.
