5 research outputs found

    Characterization of Prose by Rhetorical Structure for Machine Learning Classification

    Measures of classical rhetorical structure in text can improve accuracy in certain types of stylistic classification tasks such as authorship attribution. This research augments the relatively scarce work in the automated identification of rhetorical figures and uses the resulting statistics to characterize an author's rhetorical style. These characterizations of style can then become part of the feature set of various classification models. Our Rhetorica software identifies 14 classical rhetorical figures in free English text, with generally good precision and recall, and provides summary measures to use in descriptive or classification tasks. Classification models trained on Rhetorica's rhetorical measures paired with lexical features typically performed better at authorship attribution than either set of features used individually. The rhetorical measures also provide new stylistic quantities for describing texts, authors, genres, etc.

    Automated Annotation and Visualization of Rhetorical Figures

    Linguistic annotation provides additional information asserted with a particular purpose in a document or other piece of information. It is widely used in various fields, from computing and bioinformatics, through imaging, to law and linguistics. There is also a clear distinction between what is communicated through written or spoken natural language and how it is communicated. A new problem in linguistic annotation is the annotation of classical rhetorical figures: patterns of text in which a characteristic syntactic form modifies the standard meanings of words and leads to a change or an extension of meaning. Rhetoric studies the effectiveness of language comprehensively, including its emotional impact as much as its propositional content. The annotation of rhetorical figures is therefore important not only from a linguistic point of view, but also for discovering different styles of writing, the purpose and effect of written documents, and for better natural language understanding in general. The purpose of this thesis is the automated annotation of rhetorical figures. We focus primarily on the figures of repetition, which include the repetition of words, phrases, and clauses. We also describe our work on the detection and annotation of figures of parallelism, as well as of figures that pertain more to semantics than to syntax or positioning. We have developed a rhetorical figure annotation tool dubbed JANTOR (Java ANnotation Tool Of Rhetoric), which enables manual and automated annotation of files in HTML format. We have applied a lexicalized probabilistic context-free grammar parser for the recognition of the figures of repetition. We also describe a simple parse tree distance used for calculating the difference between similarly structured phrases, which is necessary for the recognition of some of the figures of parallelism.
Moreover, we have applied the semantic relationships contained in the WordNet lexical database and an extended Porter stemmer algorithm for finding derivationally related words. Finally, we present a method for finding pairs of words that are ordinarily contradictory, which is crucial for detecting one particularly interesting figure of speech, the oxymoron. For this purpose, typed dependency grammars are used together with WordNet. The experiments we have conducted on the detection of a selected subset of rhetorical figures have yielded very promising results. Lastly, we present a visualization of the occurrences of the figures and a comparison between the inaugural addresses of 14 American presidents, including the most recent one by President Barack Obama. The provocative results of this comparison show that automated analysis of meaningful rhetorical information is possible and tractable, and they help us understand what makes a successful orator.
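To make the idea of a figure of repetition concrete, here is a minimal Python sketch that flags two such figures using plain tokenization only. It is not the JANTOR implementation: the thesis relies on a lexicalized probabilistic context-free grammar parser, whereas this sketch swaps in a simple regular-expression tokenizer, and the function names `tokenize`, `find_epizeuxis`, and `find_anaphora` are hypothetical. Epizeuxis is taken here as the immediate repetition of a word, and anaphora as the same word opening successive sentences.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and return its word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def find_epizeuxis(text: str) -> List[Tuple[int, str]]:
    """Return (token_index, word) for every word immediately followed by itself."""
    tokens = tokenize(text)
    return [
        (i, tok)
        for i, (tok, nxt) in enumerate(zip(tokens, tokens[1:]))
        if tok == nxt
    ]


def find_anaphora(sentences: List[str]) -> List[Tuple[int, str]]:
    """Return (sentence_index, word) where a sentence and the next one
    start with the same word. Assumption: sentences are already split."""
    hits = []
    for i in range(len(sentences) - 1):
        first = tokenize(sentences[i])
        second = tokenize(sentences[i + 1])
        if first and second and first[0] == second[0]:
            hits.append((i, first[0]))
    return hits
```

A detector of this kind will over-report, for example on function words such as "the" opening two consecutive sentences, which is one reason a parser-based approach like the one described in the abstract is attractive.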

    Bell inequalities in cardinality-based similarity measurement

    In this thesis, a parametric family of cardinality-based similarity measures for ordinary sets (on a finite universe), harbouring numerous well-known similarity measures, is introduced. The Lukasiewicz- and product-transitive members of this family are characterized. Their importance derives from their one-to-one correspondence with pseudo-metrics. A parametric family of cardinality-based inclusion measures for ordinary sets (on a finite universe) is also introduced, and its Lukasiewicz- and product-transitivity properties are studied. Fuzzification schemes based on a commutative quasi-copula are then used to transform these similarity and inclusion measures for ordinary sets into similarity and inclusion measures for fuzzy sets on a finite universe, rendering them applicable to graded feature set representations of objects. One of the main results of this thesis is that transitivity, and hence the corresponding dual metrical interpretation (for similarity measures only), is preserved along this fuzzification process. Remarkably, checking these transitivity properties repeatedly leads to the same inequalities, which are known as the Bell inequalities. All Bell-type inequalities regarding at most four random events, of which not more than two are intersected at the same time, are presented in this work and are reformulated in the context of fuzzy scalar cardinalities, leading to related inequalities on commutative conjunctors. It is proven that some of these inequalities are fulfilled for commutative (quasi-)copulas. For the most important families of Archimedean t-norms and for each of the inequalities, the parameter values for which the corresponding t-norms satisfy the inequality are identified.
Meta-theorems are presented that state general conditions under which certain inequalities for cardinalities of ordinary sets are preserved under fuzzification when a scalar approach to fuzzy set cardinality is adopted. The conditions pertain to the commutative conjunctor used for modeling fuzzy set intersection. In particular, this conjunctor should fulfill a number of Bell-type inequalities. The advantage of these meta-theorems is that repetitious calculations (for example, when checking the transitivity properties of fuzzy similarity measures) can be avoided.
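As a rough illustration of fuzzifying a cardinality-based similarity measure, the Python sketch below computes a Jaccard-style coefficient on membership vectors. Several choices are assumptions rather than details from the abstract: the Jaccard coefficient stands in for the parametric family (whose exact form is not given here), the scalar cardinality is taken to be the sum of membership degrees, and the intersection is modelled by a t-norm chosen by the caller. All function names are hypothetical.

```python
from typing import Callable, Sequence

TNorm = Callable[[float, float], float]


def t_min(a: float, b: float) -> float:
    """Minimum t-norm."""
    return min(a, b)


def t_product(a: float, b: float) -> float:
    """Product t-norm."""
    return a * b


def t_lukasiewicz(a: float, b: float) -> float:
    """Lukasiewicz t-norm."""
    return max(0.0, a + b - 1.0)


def fuzzy_jaccard(a: Sequence[float], b: Sequence[float],
                  t_norm: TNorm = t_min) -> float:
    """Jaccard-style similarity |A n B| / |A u B| on membership vectors.

    Assumptions: cardinality is the sum of membership degrees, the
    intersection uses `t_norm` elementwise, and the union cardinality is
    obtained as |A| + |B| - |A n B|.
    """
    if len(a) != len(b):
        raise ValueError("membership vectors must share one universe")
    inter = sum(t_norm(x, y) for x, y in zip(a, b))
    union = sum(a) + sum(b) - inter
    # Two empty sets are treated as identical by convention.
    return 1.0 if union == 0 else inter / union
```

On crisp sets (memberships restricted to 0 and 1) all three t-norms agree and the function reduces to the ordinary Jaccard coefficient. On graded memberships the choice of t-norm changes the value, which is why the conditions on the conjunctor matter.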