566 research outputs found

    Polarity analysis of reviews based on the omission of asymmetric sentences

    In this paper, we present a novel approach to polarity analysis of product reviews which detects and removes sentences whose polarity is opposite to that of the entire document (asymmetric sentences) as a preliminary step before classifying reviews as positive or negative. We postulate that asymmetric sentences are morpho-syntactically more complex than symmetric ones (sentences with the same polarity as the entire document) and that removing asymmetric sentences from the text can improve detection of a review's polarity orientation. To validate this hypothesis, we measured the syntactic complexity of both types of sentences in a multi-domain corpus of product reviews and contrasted three relevant data configurations based on the inclusion and omission of asymmetric sentences from the reviews.

    Using Data Analytics to Filter Insincere Posts from Online Social Networks. A case study: Quora Insincere Questions

    The internet in general, and Online Social Networks (OSNs) in particular, continue to play a significant role in our lives, with information uploaded and exchanged at massive scale. Given this importance and attention, abuse of these communication media for various purposes is common. Driven by goals such as marketing and financial gain, some users use OSNs to post misleading or insincere content. In this context, we used a real-world dataset published by Quora on Kaggle.com to evaluate different mechanisms and algorithms for filtering insincere and spam content. We evaluated different preprocessing and analysis models. Moreover, we analyzed the cognitive effort users put into writing their posts and whether it can improve prediction accuracy. We report the best models in terms of insincerity prediction accuracy.

    Sociolinguistic perception of lexical and syntactic variation among Persian-English bilinguals

    This study examines the relationship between sociolinguistic perception and Persian language variation. Prior work has shown that preconceived notions about how speakers use language and what kind of language they produce can affect listeners’ perceptions (D’Onofrio 2016; Hansen Edwards et al. 2019; Mack & Munson 2012; Niedzielski 1999). However, many questions remain unanswered regarding how social meaning is applied in contact situations, especially among self-identified native and heritage speakers. Within Persian language studies, some work has observed linguistic practices among both native and non-native speakers, finding that both vary significantly in their production patterns of certain syntactic and lexical features (Megerdoomian 2020). I ask whether Persian-English bilinguals associate non-standard forms with certain social personae categorized by linguistic background. Sixteen bilingual Persian-English speakers participated in an online survey in which they matched standard and non-standard written productions to a pre-defined linguistic persona. Results so far suggest that Persian-English bilinguals actively construct associations between language use and speaker personae, with specific grammatical categories appearing more likely to index a non-native speaking identity. This raises further questions about how bilinguals navigate sociolinguistic ideologies tied to speaker identity, and how heritage speakers and learners approach these notions. This study adds to the growing literature on bilingualism and sociolinguistic perception, with implications for critical discussions surrounding the various ideologies that place communities of multilingual speakers into strict social categories.

    Form and function: Optional complementizers reduce causal inferences

    Many factors are known to influence the inference of the discourse coherence relationship between two sentences. Here, we examine the relationship between two conjoined embedded clauses in sentences like 'The professor noted that the student teacher did not look confident and (that) the students were poorly behaved'. In two studies, we find that the presence of 'that' before the second embedded clause in such sentences reduces the possibility of a forward causal relationship between the clauses, i.e., the inference that the student teacher’s confidence was what affected student behavior. Three further studies tested the possibility of a backward causal relationship between clauses in the same structure, and found that the complementizer’s presence aids that relationship, especially in a forced-choice paradigm. The empirical finding that a complementizer, a linguistic element associated primarily with structure rather than event-level semantics, can affect discourse coherence is novel and illustrates an interdependence between syntactic parsing and discourse parsing.

    Developmental language disorder and universal grammar

    The study of the Faculty of Language (FL), as defined by generative grammar, has been mainly undertaken through the examination of adult language, first language acquisition, second language acquisition and bilingual acquisition. Few works have approached the FL from an atypical acquisitional situation, standardly called Developmental Language Disorder (DLD). This dissertation is devoted to the study of how FL is affected by this unfortunate condition. DLD is displayed by some young children and adults and can be the cause of significant limitations in language development. The linguistic production and comprehension of this group of children are atypical compared to the linguistic behaviour of other children of the same age. Their atypicality consists in a non-target-like grammar with regard to both what is allowed and what is disallowed in the language(s) to which they are exposed. The most common symptoms, from a morpho-syntactic point of view, are (a) omissions of morphemes and words, (b) commissions, i.e., the inadequate presence of certain words or the inappropriate replacement of morphemes, and (c) doublings, i.e., the appearance of words or morphemes in more positions than are allowed in the target language. These symptoms have been taken to indicate that the FL is deficient. The result of this deficiency is a grammar developed by children with DLD that is qualitatively different from that developed by their typical peers. This dissertation will consider whether or not the underlying linguistic competence of children with DLD is determined by the same features, operations and principles that regulate natural language in general. Drawn from the experimental literature on DLD, the data for analysis include comprehension and production by children with DLD and concern the nominal, the temporal/verbal and the propositional domains. The proposals that have been put forth to account for this impairment will be evaluated. All of them explicitly or implicitly propose that Universal Grammar (UG), i.e., the set of phonological, semantic and syntactic features and operations that underlie FL, is faulty: some features can be absent, or operations can be inactive or function intermittently. Contrary to these proposals, the hypothesis defended here is that UG is not affected in children with DLD. That is to say, despite the many differences between DLD and typical language acquisition, UG is revealed to be similar at a certain level in both acquisitional situations. If UG were impaired in DLD, children affected by this condition would be expected to produce sentences remarkably different from those produced by typical children. Several studies have shown that children with DLD and their typical peers can display similar linguistic performance in terms of both quantity and type of errors. Moreover, the data reveal that DLD utterances are not always erroneous; when all linguistic elements and mechanisms are present, they are correctly used. This is taken as a sign that syntactic features, while not always realized morpho-phonologically, are present in DLD syntactic derivations, and that the syntactic operations Merge and Agree are active, just as in typical grammars. Finally, the analysis of non-target utterances by children with DLD evinces a syntactically normal grammar and even a resemblance to languages to which these children have not been exposed. The conclusion is that, despite the non-convergence of DLD and the target language, UG in this acquisitional situation is intact.

    An Information theoretic approach to production and comprehension of discourse markers

    Discourse relations are the building blocks of a coherent text. The most important linguistic elements for constructing these relations are discourse markers. The presence of a discourse marker between two discourse segments provides information on the inferences that need to be made for interpretation of the two segments as a whole (e.g., 'because' marks a reason). This thesis presents a new framework for studying human communication at the level of discourse by adapting ideas from information theory. A discourse marker is viewed as a symbol with a measurable amount of relational information. This information is communicated by the writer of a text to guide the reader towards the right semantic decoding. To examine the information theoretic account of discourse markers, we conduct empirical corpus-based investigations, offline crowd-sourced studies and online laboratory experiments. The thesis contributes to computational linguistics by proposing a quantitative meaning representation for discourse markers and showing its advantages over the classic descriptive approaches. For the first time, we show that readers are very sensitive to the fine-grained information encoded in a discourse marker obtained from its natural usage and that writers use explicit marking for less expected relations in terms of linguistic and cognitive predictability. These findings open new directions for implementation of advanced natural language processing systems.

    Monolingual Plagiarism Detection and Paraphrase Type Identification


    All structures great and small: on copular sentences with shì in Mandarin

    This dissertation provides a description and analysis of the Mandarin copula shì and copular structures containing it. On the basis of a comprehensive description of the syntactic distribution of shì and properties of different types of copular sentences (predicational, specificational, and equative), this study proposes a unified structural analysis for predicational and specificational copular sentences in Mandarin. It is proposed that shì is a functional element in the structure of the clause. Importantly, shì is not a verb, and copular structures in Mandarin contain no verb phrase at all, which is consistent with proposals about pronominal copular elements in other languages. Specificational copular sentences are analysed as inverted predicational copular sentences, derived via predicate inversion. This analysis captures both the underlying similarities and the differences between the two types of copular sentences. It is also pointed out that the third type of copular sentences, equatives, is clearly distinct from both predicational and specificational copular sentences and should thus be analysed in a different way. The dissertation also proposes that tense is not always syntactically expressed in Mandarin copular structures. While sentences with a stage-level predicate express tense syntactically, those with an individual-level predicate do not.