566 research outputs found

Polarity analysis of reviews based on the omission of asymmetric sentences

Author: Martí Antonin M. Antònia
Roberto Rodríguez John Alexander
Salamó Llorente Maria
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural (SEPLN)
Publication date: 25/02/2019

In this paper, we present a novel approach to polarity analysis of product reviews which detects and removes sentences whose polarity is opposite to that of the entire document (asymmetric sentences) as a preliminary step to identifying positive and negative reviews. We postulate that asymmetric sentences are morpho-syntactically more complex than symmetric ones (sentences with the same polarity as the entire document) and that the detection of the polarity orientation of reviews can be improved by removing asymmetric sentences from the text. To validate this hypothesis, we measured the syntactic complexity of both types of sentences in a multi-domain corpus of product reviews and contrasted three relevant data configurations based on the inclusion and omission of asymmetric sentences from the reviews.

Diposit Digital de la Universitat de Barcelona

Using Data Analytics to Filter Insincere Posts from Online Social Networks A case study: Quora Insincere Questions

Author: Al-Ramahi Mohammad
Alsmadi Izzat
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2020

The internet in general, and Online Social Networks (OSNs) in particular, continue to play a significant role in our lives, with information uploaded and exchanged on a massive scale. Given this importance and attention, abuse of these communication media for various purposes is common. Driven by goals such as marketing and financial gain, some users use OSNs to post misleading or insincere content. In this context, we used a real-world dataset published by Quora on Kaggle.com to evaluate different mechanisms and algorithms for filtering insincere and spam content. We evaluated different preprocessing and analysis models. Moreover, we analyzed the cognitive effort users made in writing their posts and whether it can improve prediction accuracy. We report the best models in terms of insincerity prediction accuracy.

Crossref

Digital Commons @ Texas A&M University-San Antonio

ScholarSpace at University of Hawai'i at Manoa

AIS Electronic Library (AISeL)

Sociolinguistic perception of lexical and syntactic variation among Persian-English bilinguals

Author: Méndez Kline Tyler
Publication venue: Linguistic Society of America
Publication date: 27/04/2023

This study examines the relationship between sociolinguistic perception and Persian language variation. Prior work has shown that preconceived notions about how speakers use language and what kind of language they produce can affect listeners’ perceptions (D’Onofrio 2016; Hansen Edwards et al. 2019; Mack & Munson 2012; Niedzielski 1999). However, many questions remain unanswered regarding how social meaning is applied in contact situations, especially among self-identified native and heritage speakers. Within Persian language studies, some work has observed linguistic practices among both native and non-native speakers, finding that both vary significantly in their production patterns of certain syntactic and lexical features (Megerdoomian 2020). I ask whether Persian-English bilinguals associate non-standard forms with certain social personae categorized by linguistic background. Sixteen bilingual Persian-English speakers participated in an online survey with the task of matching standard and non-standard written productions to a pre-defined linguistic persona. Results so far suggest that Persian-English bilinguals actively construct associations between language use and speaker personae, with specific grammatical categories appearing more likely to index a non-native speaking identity. This brings up further questions about how bilinguals navigate sociolinguistic ideologies tied to speaker identity, and how heritage speakers and learners approach these notions. This study adds to the growing literature on bilingualism and sociolinguistic perception, with implications for critical discussions surrounding the various ideologies that place communities of multilingual speakers into strict social categories

Proceedings Published by the LSA (Linguistic Society of America)

Triggered essential reviewing: The effect of technology affordances on service experience evaluations

Author: Piccoli Gabriele
Publication venue
Publication date: 01/01/2016

Archivio Istituzionale della Ricerca - Università degli Studi di Pavia

Form and function: Optional complementizers reduce causal inferences

Author: Carlson Katy
Rohde Hannah
Tyler Joseph
Publication venue: Ubiquity Press, Ltd.
Publication date: 01/05/2017

Many factors are known to influence the inference of the discourse coherence relationship between two sentences. Here, we examine the relationship between two conjoined embedded clauses in sentences like 'The professor noted that the student teacher did not look confident and (that) the students were poorly behaved'. In two studies, we find that the presence of 'that' before the second embedded clause in such sentences reduces the possibility of a forward causal relationship between the clauses, i.e., the inference that the student teacher’s confidence was what affected student behavior. Three further studies tested the possibility of a backward causal relationship between clauses in the same structure, and found that the complementizer’s presence aids that relationship, especially in a forced-choice paradigm. The empirical finding that a complementizer, a linguistic element associated primarily with structure rather than event-level semantics, can affect discourse coherence is novel and illustrates an interdependence between syntactic parsing and discourse parsing.

Crossref

Directory of Open Access Journals

Edinburgh Research Explorer

Morehead State University

Developmental language disorder and universal grammar

Author: Beritognolo Gustavo
Publication venue
Publication date: 01/12/2021

The study of the Faculty of Language (FL), as defined by generative grammar, has been mainly undertaken through the examination of adult language, first language acquisition, second language acquisition and bilingual acquisition. Few works have approached the FL from an atypical acquisitional situation, standardly called Developmental Language Disorder (DLD). This dissertation is devoted to the study of how FL is affected by this unfortunate condition. DLD is displayed by some young children and adults and can be the cause of significant limitations in language development. The linguistic production and comprehension by this group of children is atypical compared to the linguistic behaviour of other children of the same age. Their atypicality consists in a non-target-like grammar with regard to both what is allowed and what is disallowed in the language(s) to which they are exposed. The most common symptoms, from a morpho-syntactic point of view, are (a) omission of morphemes and words, (b) commissions, i.e., the inadequate presence of certain words or the inappropriate replacement of morphemes and (c) doublings, i.e., the appearance of words or morphemes in more positions than are allowed in the target language. These symptoms have been taken to indicate that the FL is deficient. The result of this deficiency is a grammar developed by children with DLD that is qualitatively different from that developed by their typical peers. This dissertation will consider whether or not the underlying linguistic competence of children with DLD is determined by the same features, operations and principles that regulate natural language in general. Drawn from the experimental literature on DLD, the data for analysis include comprehension and production by children with DLD and concern the nominal, the temporal/verbal and the propositional domains.
The proposals that have been put forth to account for this impairment will be evaluated. All of them explicitly or implicitly propose that Universal Grammar (UG), i.e., the set of phonological, semantic and syntactic features and operations that underlie FL, is faulty: Some features can be absent, or operations can be inactive or function intermittently. Contrary to these proposals, the hypothesis defended here is that UG is not affected in DLD children. That is to say, despite the many differences between DLD and typical language acquisition, UG is revealed to be similar at a certain level in both acquisitional situations. If UG were impaired in DLD, children affected by this condition would be expected to produce sentences remarkably different from those produced by typical children. Several studies have shown that children with DLD and their typical peers can display similar linguistic performance in terms of both quantity and type of errors. Moreover, the data reveal that DLD utterances are not always erroneous; when all linguistic elements and mechanisms are present, they are correctly used. This is taken as a sign that syntactic features, while not always realized morpho-phonologically, are present in DLD syntactic derivations, and that the syntactic operations Merge and Agree are active, just as in typical grammars. Finally, the analysis of non-target utterances by children with DLD evinces a syntactically normal grammar and even a resemblance with languages to which these children have not been exposed. The conclusion is that, despite the non-convergence of DLD and the target language, UG in this acquisitional situation is intact

Dépôt Institutionnel Numérique

An Information theoretic approach to production and comprehension of discourse markers

Author: Torabi Asr Fatemeh
Publication venue
Publication date: 08/12/2015

Discourse relations are the building blocks of a coherent text. The most important linguistic elements for constructing these relations are discourse markers. The presence of a discourse marker between two discourse segments provides information on the inferences that need to be made for interpretation of the two segments as a whole (e.g., 'because' marks a reason). This thesis presents a new framework for studying human communication at the level of discourse by adapting ideas from information theory. A discourse marker is viewed as a symbol with a measurable amount of relational information. This information is communicated by the writer of a text to guide the reader towards the right semantic decoding. To examine the information theoretic account of discourse markers, we conduct empirical corpus-based investigations, offline crowd-sourced studies and online laboratory experiments. The thesis contributes to computational linguistics by proposing a quantitative meaning representation for discourse markers and showing its advantages over the classic descriptive approaches. For the first time, we show that readers are very sensitive to the fine-grained information encoded in a discourse marker obtained from its natural usage and that writers use explicit marking for less expected relations in terms of linguistic and cognitive predictability. These findings open new directions for implementation of advanced natural language processing systems.

Universaar

Monolingual Plagiarism Detection and Paraphrase Type Identification

Author: Alvi Faisal
Publication venue: University of Sheffield Conference Proceedings
Publication date: 01/08/2020

White Rose E-theses Online

Problem-solving recognition in scientific text

Author: Heffernan Kevin
Publication venue: University of Cambridge
Publication date: 01/10/2020

As far back as Aristotle, problems and solutions have been recognised as a core pattern of thought, and in particular of the scientific method. Therefore, they play a significant role in the understanding of academic texts from the scientific domain. Capturing knowledge of such problem-solving utterances would provide a deep insight into text understanding. In this dissertation, I present the task of problem-solving recognition in scientific text. To date, work on problem-solving recognition has received both theoretical and computational treatment. However, theories of problem-solving put forward by applied linguists lack practical adaptation to the domain of scientific text, and computational analyses have been narrow in scope. This dissertation provides a new model of problem-solving. It is an adaptation of Hoey's (2001) model, tailored to the scientific domain. As far as modelling problems is concerned, I divided the text string expressing the statement of a problem into sub-components; this is one of my main contributions. I have mapped these sub-components to functional roles, and thus operationalised the model in such a way that it can be annotated by humans reliably. As far as the problem-solving relationship between problems and solutions is concerned, my model takes into account the local network of relationships existing between problems. In order to validate this new model, a large-scale annotation study was conducted. The annotation study shows significant agreement amongst the annotators. The model is automated in two stages using a blend of classical machine learning and state-of-the-art deep learning methods. The first stage involves the implementation of problem and solution recognisers which operate at the sentence level. The second stage is more complex in that it recognises problems and solutions jointly at the token-level, and also establishes whether there is a problem-solving relationship between each of them. 
One of the best performers at this stage was a Neural Relational Topic Model. The results from automation show that the model is able to recognise problem-solving utterances in text to a high degree of accuracy. My work has already shown a positive impact in both industry and academia. One start-up is currently using the model for representing academic articles, and a Japanese collaborator has received a grant to adapt my model to Japanese text

Apollo (Cambridge)

All structures great and small: on copular sentences with shì in Mandarin

Author: Cheng H.
Publication venue: LOT
Publication date: 02/09/2021

This dissertation provides a description and analysis of the Mandarin copula shì and copular structures containing it. On the basis of a comprehensive description of the syntactic distribution of shì and properties of different types of copular sentences (predicational, specificational, and equative), this study proposes a unified structural analysis for predicational and specificational copular sentences in Mandarin. It is proposed that shì is a functional element in the structure of the clause. Importantly, shì is not a verb, and copular structures in Mandarin contain no verb phrase at all, which is consistent with proposals about pronominal copular elements in other languages. Specificational copular sentences are analysed as inverted predicational copular sentences, derived via predicate inversion. This analysis captures both the underlying similarities and the differences between the two types of copular sentences. It is also pointed out that the third type of copular sentences, equatives, is clearly distinct from both predicational and specificational copular sentences and should thus be analysed in a different way. The dissertation also proposes that tense is not always syntactically expressed in Mandarin copular structures. While sentences with a stage-level predicate express tense syntactically, those with an individual-level predicate do not.

Theoretical and Experimental Linguistic

Leiden University Scholarly Publications