1,512 research outputs found

    Using Conjunctions and Adverbs for Author Verification

    Abstract: Linguistics and stylistics have long been investigated for author identification, but recently we have witnessed an impressive growth in the frequency with which lawyers and courts call upon the expertise of linguists in cases of disputed authorship. This motivates computer science researchers to look at the problem of author identification from a different perspective. In this work, we propose a stylometric feature set based on Portuguese conjunctions and adverbs to address the problem of author identification. Two different classification approaches were considered. The first, called writer-independent, reduces the pattern recognition problem to a single model with two classes, which makes it possible to build a robust system even when few genuine samples per writer are available. The second, called the personal model or writer-dependent approach, very often performs better but needs a larger number of samples per writer. Experiments on a database of short articles from 30 different authors, using a Support Vector Machine (SVM) as the classifier, demonstrate that the proposed strategy can produce results comparable to those reported in the literature
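    A minimal sketch of the writer-independent idea, assuming the common "dichotomy" formulation: each document becomes a vector of function-word frequencies, and every pair of documents becomes one absolute-difference vector labelled "same author" or "different author", so a single two-class model (an SVM in the abstract, not shown here) can serve all writers. The word list FUNCTION_WORDS, the helper names, and the sample texts are hypothetical illustrations, not the paper's feature set.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy subset of Portuguese conjunctions and adverbs; the
# paper's actual feature list is not given in the abstract.
FUNCTION_WORDS = ["e", "mas", "porém", "portanto", "não", "muito", "também", "já"]


def feature_vector(text: str) -> list[float]:
    """Relative frequency of each function word among the text's tokens."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def dichotomy_pairs(samples: list[tuple[str, list[float]]]) -> list[tuple[list[float], int]]:
    """Writer-independent transform: every pair of documents becomes one
    absolute-difference vector, labelled 1 (same author) or 0 (different)."""
    pairs = []
    for (author_a, vec_a), (author_b, vec_b) in combinations(samples, 2):
        diff = [abs(x - y) for x, y in zip(vec_a, vec_b)]
        pairs.append((diff, int(author_a == author_b)))
    return pairs


# Usage with made-up snippets and author names.
docs = [("ana", "E não é muito, mas basta."),
        ("ana", "E não foi também."),
        ("rui", "Porém, já era tarde.")]
samples = [(author, feature_vector(text)) for author, text in docs]
pairs = dichotomy_pairs(samples)
```

    The resulting (difference vector, label) pairs would then train one binary classifier shared by all writers, which is why this formulation copes with few samples per author.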

    Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews

    This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations (e.g., "subtle nuances") and a negative semantic orientation when it has bad associations (e.g., "very cavalier"). In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word "excellent" minus the mutual information between the given phrase and the word "poor". A review is classified as recommended if the average semantic orientation of its phrases is positive. The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations). The accuracy ranges from 84% for automobile reviews to 66% for movie reviews
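    Since PMI(p, w) = log2(P(p, w) / (P(p) P(w))), the difference PMI(p, "excellent") - PMI(p, "poor") cancels P(p) and reduces to log2(P(p, excellent) P(poor) / (P(p, poor) P(excellent))); with counts used as probability estimates, the corpus size cancels too. The sketch below computes that quantity from co-occurrence counts and classifies a review by the sign of the average. The function names, the smoothing constant, and the example counts are assumptions of this sketch, not values from the paper.

```python
import math


def semantic_orientation(near_excellent: float, near_poor: float,
                         hits_excellent: float, hits_poor: float,
                         smoothing: float = 0.01) -> float:
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor"),
    estimated from co-occurrence counts. `smoothing` (an assumption of
    this sketch) keeps a zero co-occurrence count from breaking the log."""
    return math.log2(((near_excellent + smoothing) * hits_poor)
                     / ((near_poor + smoothing) * hits_excellent))


def classify_review(phrase_orientations: list[float]) -> str:
    """Recommended if the average semantic orientation of the phrases is positive."""
    average = sum(phrase_orientations) / len(phrase_orientations)
    return "recommended" if average > 0 else "not recommended"


# Usage with invented counts for two phrases extracted from one review.
example = [semantic_orientation(40, 10, 1000, 1000),   # leans positive
           semantic_orientation(8, 16, 1000, 1000)]    # leans negative
verdict = classify_review(example)
```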

    Unsupervised Learning of Semantic Orientation from a Hundred-Billion-Word Corpus

    The evaluative character of a word is called its semantic orientation. A positive semantic orientation implies desirability (e.g., "honest", "intrepid") and a negative semantic orientation implies undesirability (e.g., "disturbing", "superfluous"). This paper introduces a simple algorithm for unsupervised learning of semantic orientation from extremely large corpora. The method involves issuing queries to a Web search engine and using pointwise mutual information to analyse the results. The algorithm is empirically evaluated using a training corpus of approximately one hundred billion words — the subset of the Web that is indexed by the chosen search engine. Tested with 3,596 words (1,614 positive and 1,982 negative), the algorithm attains an accuracy of 80%. The 3,596 test words include adjectives, adverbs, nouns, and verbs. The accuracy is comparable with the results achieved by Hatzivassiloglou and McKeown (1997), using a complex four-stage supervised learning algorithm that is restricted to determining the semantic orientation of adjectives
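    Pointwise mutual information, the association measure named above, can be estimated from raw counts. In the Web setting, the counts would be search-engine hit counts and the total an estimate of the number of indexed documents; how co-occurrence is defined in the queries is not stated in the abstract, so the function below (a hypothetical helper) simply takes the counts as given.

```python
import math


def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """PMI(x, y) = log2(P(x, y) / (P(x) P(y))), with each probability
    estimated as count / total. Rearranged to
    log2(count_xy * total / (count_x * count_y)) so the integer products
    are formed before the single division."""
    if count_xy == 0:
        raise ValueError("PMI is undefined when x and y never co-occur")
    return math.log2(count_xy * total / (count_x * count_y))
```

    A positive value means the two terms co-occur more often than independence would predict; a negative value means less often.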

    Measuring praise and criticism: Inference of semantic orientation from association

    The evaluative character of a word is called its semantic orientation. Positive semantic orientation indicates praise (e.g., "honest", "intrepid") and negative semantic orientation indicates criticism (e.g., "disturbing", "superfluous"). Semantic orientation varies in both direction (positive or negative) and degree (mild to strong). An automated system for measuring semantic orientation would have application in text classification, text filtering, tracking opinions in online discussions, analysis of survey responses, and automated chat systems (chatbots). This paper introduces a method for inferring the semantic orientation of a word from its statistical association with a set of positive and negative paradigm words. Two instances of this approach are evaluated, based on two different statistical measures of word association: pointwise mutual information (PMI) and latent semantic analysis (LSA). The method is experimentally tested with 3,596 words (including adjectives, adverbs, nouns, and verbs) that have been manually labeled positive (1,614 words) and negative (1,982 words). The method attains an accuracy of 82.8% on the full test set, but the accuracy rises above 95% when the algorithm is allowed to abstain from classifying mild words
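    The paradigm-word method can be sketched independently of the association measure: a word's score is its summed association with the positive paradigm words minus its summed association with the negative ones, and abstention is a threshold on the magnitude of that score. Here `assoc` stands in for either PMI or LSA similarity; the names `so_a` and `label`, the toy paradigm lists, and the toy scores are hypothetical.

```python
from typing import Callable, Iterable, Optional

Assoc = Callable[[str, str], float]


def so_a(word: str, positive: Iterable[str], negative: Iterable[str], assoc: Assoc) -> float:
    """Association with positive paradigm words minus association with negative ones."""
    return sum(assoc(word, p) for p in positive) - sum(assoc(word, n) for n in negative)


def label(word: str, positive: Iterable[str], negative: Iterable[str],
          assoc: Assoc, threshold: float = 0.0) -> Optional[str]:
    """Return "positive"/"negative", or None (abstain) when |score| <= threshold."""
    score = so_a(word, positive, negative, assoc)
    if abs(score) <= threshold:
        return None  # treat the word as too mild to classify
    return "positive" if score > 0 else "negative"


# Toy association table standing in for PMI or LSA similarity scores.
TOY_SCORES = {("honest", "good"): 2.0, ("honest", "bad"): 0.5,
              ("okay", "good"): 0.3, ("okay", "bad"): 0.2}


def toy_assoc(word: str, paradigm: str) -> float:
    return TOY_SCORES.get((word, paradigm), 0.0)


POSITIVE, NEGATIVE = ["good"], ["bad"]
```

    Raising `threshold` trades coverage for accuracy, which matches the abstract's observation that accuracy rises when mild words may be left unclassified.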

    An automatic method for reporting the quality of thesauri

    Thesauri are knowledge models commonly used for information classification and retrieval, whose structure is defined by standards such as ISO 25964. However, when creators do not correctly follow the specifications, they build models with inadequate concepts or relations that offer limited usability. This paper describes a process that automatically analyzes thesaurus properties and relations with respect to the ISO 25964 specification and suggests corrections for potential problems. It performs a lexical and syntactic analysis of the concept labels, and a structural and semantic analysis of the relations. The process has been tested with the Urbamet and Gemet thesauri, and the results have been analyzed to determine how well the proposed process works
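    As an illustration of what a structural check on relations might look like, the sketch below runs two checks: every broader-term (BT) link should have a reciprocal narrower-term (NT) link, and the BT hierarchy should contain no cycles. These two rules, the dictionary representation, and the concept names are assumptions of this sketch; the abstract does not list the paper's actual rule set.

```python
from collections import defaultdict


def missing_reciprocals(broader: dict[str, set[str]],
                        narrower: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Pairs (concept, broader_concept) where the concept lists a broader
    term, but that broader term does not list the concept as narrower."""
    return sorted((c, b) for c, bs in broader.items() for b in bs
                  if c not in narrower.get(b, set()))


def has_cycle(broader: dict[str, set[str]]) -> bool:
    """Depth-first search over BT links; reaching a concept that is still
    on the current path (GREY) means the hierarchy loops back on itself."""
    WHITE, GREY, BLACK = 0, 1, 2
    state: defaultdict[str, int] = defaultdict(int)  # every concept starts WHITE

    def visit(concept: str) -> bool:
        state[concept] = GREY
        for parent in broader.get(concept, ()):
            if state[parent] == GREY:
                return True
            if state[parent] == WHITE and visit(parent):
                return True
        state[concept] = BLACK
        return False

    return any(state[c] == WHITE and visit(c) for c in list(broader))


# Usage with hypothetical concepts: "cats" lacks its reciprocal NT link.
BT = {"cats": {"mammals"}, "mammals": {"animals"}}
NT = {"animals": {"mammals"}}
```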

    English-Arabic Translator Education Through Systemic Functional Linguistics: Analysis of Cohesive Devices in Investopedia Business Texts

    In translation courses, students are asked to practice translation skills by translating a source text (ST) in a specific field. While teachers usually select texts based on topic and language accuracy, some such texts lack the rich textual features that help students practice and improve their translation skills. This study analyzed the cohesive features of business texts collected from “Investopedia” to assess their suitability as STs for practicing translation in the field of finance and administration. It was framed by Halliday’s (1978) systemic functional linguistics (SFL) approach to language and Halliday and Hasan’s (1976) cohesion analysis scheme. The findings showed that lexical cohesion was the most frequent cohesive device, followed by reference and conjunction, while ellipsis and substitution were rarely used. The intensive use of lexical cohesion, across its various subcategories, can help students build background knowledge of financial terminology and gain a communicative understanding of the ST while practicing a range of textual features. The study demonstrates the value of SFL in selecting coherent and cohesive STs that meet the needs of translation instructors and students in finance and administration. Other SFL tools could be employed to provide a better understanding of these texts