566 research outputs found
Polarity analysis of reviews based on the omission of asymmetric sentences
In this paper, we present a novel approach to polarity analysis of product reviews which detects and removes sentences with the opposite polarity to that of the entire document (asymmetric sentences) as a preliminary step before identifying positive and negative reviews. We postulate that asymmetric sentences are morpho-syntactically more complex than symmetric ones (sentences with the same polarity as that of the entire document) and that it is possible to improve the detection of the polarity orientation of reviews by removing asymmetric sentences from the text. To validate this hypothesis, we measured the syntactic complexity of both types of sentences in a multi-domain corpus of product reviews and contrasted three relevant data configurations based on inclusion and omission of asymmetric sentences from the reviews.
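The filtering step described in this abstract can be illustrated with a minimal sketch. It assumes a toy word lexicon and a known document polarity (for example, a training label); the lexicon, the function names and the scoring rule are hypothetical and are not the authors' method.

```python
# Toy polarity lexicon: an assumption for illustration only.
POSITIVE = {"great", "good", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "broke"}


def sentence_score(sentence):
    """Count positive words minus negative words in one sentence."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def drop_asymmetric(sentences, document_polarity):
    """Keep sentences whose score does not oppose the document polarity (+1 or -1)."""
    return [s for s in sentences if sentence_score(s) * document_polarity >= 0]


review = ["The screen is great.", "The battery is poor.", "I love this phone."]
kept = drop_asymmetric(review, document_polarity=+1)
print(kept)
```

A classifier would then be trained or applied on the kept sentences only, which corresponds to the "omission" configuration the abstract contrasts with the full reviews.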
Using Data Analytics to Filter Insincere Posts from Online Social Networks A case study: Quora Insincere Questions
The internet in general and Online Social Networks (OSNs) in particular continue to play a significant role in our lives, with information uploaded and exchanged on a massive scale. Given this importance and attention, abuse of these communication media for various purposes is common. Driven by goals such as marketing and financial gain, some users use OSNs to post misleading or insincere content. In this context, we used a real-world dataset published by Quora on Kaggle.com to evaluate different mechanisms and algorithms for filtering insincere and spam content. We evaluated different preprocessing and analysis models. Moreover, we analyzed the cognitive effort users made in writing their posts and whether it can improve prediction accuracy. We report the best models in terms of insincerity prediction accuracy.
Sociolinguistic perception of lexical and syntactic variation among Persian-English bilinguals
This study examines the relationship between sociolinguistic perception and Persian language variation. Prior work has shown that preconceived notions about how speakers use language and what kind of language they produce can affect listeners' perceptions (D'Onofrio 2016; Hansen Edwards et al. 2019; Mack & Munson 2012; Niedzielski 1999). However, many questions remain unanswered regarding how social meaning is applied in contact situations, especially among self-identified native and heritage speakers. Within Persian language studies, some work has observed linguistic practices among both native and non-native speakers, finding that both vary significantly in their production patterns of certain syntactic and lexical features (Megerdoomian 2020). I ask whether Persian-English bilinguals associate non-standard forms with certain social personae categorized by linguistic background. Sixteen bilingual Persian-English speakers participated in an online survey with the task of matching standard and non-standard written productions to a pre-defined linguistic persona. Results so far suggest that Persian-English bilinguals actively construct associations between language use and speaker personae, with specific grammatical categories appearing more likely to index a non-native speaking identity. This brings up further questions about how bilinguals navigate sociolinguistic ideologies tied to speaker identity, and how heritage speakers and learners approach these notions. This study adds to the growing literature on bilingualism and sociolinguistic perception, with implications for critical discussions surrounding the various ideologies that place communities of multilingual speakers into strict social categories.
Form and function: Optional complementizers reduce causal inferences
Many factors are known to influence the inference of the discourse coherence relationship between two sentences. Here, we examine the relationship between two conjoined embedded clauses in sentences like 'The professor noted that the student teacher did not look confident and (that) the students were poorly behaved'. In two studies, we find that the presence of 'that' before the second embedded clause in such sentences reduces the possibility of a forward causal relationship between the clauses, i.e., the inference that the student teacher's confidence was what affected student behavior. Three further studies tested the possibility of a backward causal relationship between clauses in the same structure, and found that the complementizer's presence aids that relationship, especially in a forced-choice paradigm. The empirical finding that a complementizer, a linguistic element associated primarily with structure rather than event-level semantics, can affect discourse coherence is novel and illustrates an interdependence between syntactic parsing and discourse parsing.
Developmental language disorder and universal grammar
The study of the Faculty of Language (FL), as defined by generative grammar, has been mainly undertaken through the examination of adult language, first language acquisition, second language acquisition and bilingual acquisition. Few works have approached the FL from an atypical acquisitional situation, standardly called Developmental Language Disorder (DLD). This dissertation is devoted to the study of how FL is affected by this unfortunate condition. DLD is displayed by some young children and adults and can be the cause of significant limitations in language development. The linguistic production and comprehension of this group of children are atypical compared to the linguistic behaviour of other children of the same age. Their atypicality consists in a non-target-like grammar with regard to both what is allowed and what is disallowed in the language(s) to which they are exposed. The most common symptoms, from a morpho-syntactic point of view, are (a) omission of morphemes and words, (b) commissions, i.e., the inadequate presence of certain words or the inappropriate replacement of morphemes and (c) doublings, i.e., the appearance of words or morphemes in more positions than are allowed in the target language. These symptoms have been taken to indicate that the FL is deficient. The result of this deficiency is a grammar developed by children with DLD that is qualitatively different from that developed by their typical peers. This dissertation will consider whether or not the underlying linguistic competence of children with DLD is determined by the same features, operations and principles that regulate natural language in general. Drawn from the experimental literature on DLD, the data for analysis include comprehension and production by children with DLD and concern the nominal, the temporal/verbal and the propositional domains. The proposals that have been put forth to account for this impairment will be evaluated.
All of them explicitly or implicitly propose that Universal Grammar (UG), i.e., the set of phonological, semantic and syntactic features and operations that underlie FL, is faulty: Some features can be absent, or operations can be inactive or function intermittently. Contrary to these proposals, the hypothesis defended here is that UG is not affected in DLD children. That is to say, despite the many differences between DLD and typical language acquisition, UG is revealed to be similar at a certain level in both acquisitional situations. If UG were impaired in DLD, children affected by this condition would be expected to produce sentences remarkably different from those produced by typical children. Several studies have shown that children with DLD and their typical peers can display similar linguistic performance in terms of both quantity and type of errors. Moreover, the data reveal that DLD utterances are not always erroneous; when all linguistic elements and mechanisms are present, they are correctly used. This is taken as a sign that syntactic features, while not always realized morpho-phonologically, are present in DLD syntactic derivations, and that the syntactic operations Merge and Agree are active, just as in typical grammars. Finally, the analysis of non-target utterances by children with DLD evinces a syntactically normal grammar and even a resemblance with languages to which these children have not been exposed. The conclusion is that, despite the non-convergence of DLD and the target language, UG in this acquisitional situation is intact
An Information theoretic approach to production and comprehension of discourse markers
Discourse relations are the building blocks of a coherent text. The most important linguistic elements for constructing these relations are discourse markers. The presence of a discourse marker between two discourse segments provides information on the inferences that need to be made for interpretation of the two segments as a whole (e.g., because marks a reason).
This thesis presents a new framework for studying human communication at the level of discourse by adapting ideas from information theory. A discourse marker is viewed as a symbol with a measurable amount of relational information. This information is communicated by the writer of a text to guide the reader towards the right semantic decoding. To examine the information theoretic account of discourse markers, we conduct empirical corpus-based investigations, offline crowd-sourced studies and online laboratory experiments. The thesis contributes to computational linguistics by proposing a quantitative meaning representation for discourse markers and showing its advantages over the classic descriptive approaches.
For the first time, we show that readers are very sensitive to the fine-grained information encoded in a discourse marker obtained from its natural usage and that writers use explicit marking for less expected relations in terms of linguistic and cognitive predictability. These findings open new directions for implementation of advanced natural language processing systems.
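The idea that a discourse marker carries a measurable amount of relational information can be illustrated with a small sketch. It assumes one simple measure, the reduction in entropy over discourse relations once the marker is observed; the counts, relation labels and function names below are hypothetical, and the thesis may define its measure differently.

```python
import math
from collections import Counter

# Hypothetical (marker, relation) co-occurrence counts; not taken from any corpus.
counts = Counter({
    ("because", "reason"): 90,
    ("because", "result"): 10,
    ("and", "reason"): 20,
    ("and", "result"): 20,
    ("and", "conjunction"): 60,
})


def entropy(dist):
    """Shannon entropy in bits of a {outcome: probability} mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def relation_dist(marker=None):
    """Distribution over relations, optionally conditioned on one marker."""
    selected = Counter()
    for (m, relation), c in counts.items():
        if marker is None or m == marker:
            selected[relation] += c
    total = sum(selected.values())
    return {relation: c / total for relation, c in selected.items()}


def information_gain(marker):
    """Bits of uncertainty about the relation removed by seeing the marker."""
    return entropy(relation_dist()) - entropy(relation_dist(marker))


for marker in ("because", "and"):
    print(marker, round(information_gain(marker), 3))
```

With these toy counts, a specific marker such as "because" removes more uncertainty about the relation than an ambiguous one such as "and", which is the kind of contrast a quantitative representation of markers is meant to capture.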
Problem-solving recognition in scientific text
As far back as Aristotle, problems and solutions have been recognised as a core pattern of thought, and in particular of the scientific method. Therefore, they play a significant role in the understanding of academic texts from the scientific domain. Capturing knowledge of such problem-solving utterances would provide a deep insight into text understanding. In this dissertation, I present the task of problem-solving recognition in scientific text.
To date, work on problem-solving recognition has received both theoretical and computational treatment. However, theories of problem-solving put forward by applied linguists lack practical adaptation to the domain of scientific text, and computational analyses have been narrow in scope.
This dissertation provides a new model of problem-solving. It is an adaptation of Hoey's (2001) model, tailored to the scientific domain. As far as modelling problems is concerned, I divided the text string expressing the statement of a problem into sub-components; this is one of my main contributions. I have mapped these sub-components to functional roles, and thus operationalised the model in such a way that it can be annotated by humans reliably. As far as the problem-solving relationship between problems and solutions is concerned, my model takes into account the local network of relationships existing between problems.
In order to validate this new model, a large-scale annotation study was conducted. The annotation study shows significant agreement amongst the annotators. The model is automated in two stages using a blend of classical machine learning and state-of-the-art deep learning methods. The first stage involves the implementation of problem and solution recognisers which operate at the sentence level. The second stage is more complex in that it recognises problems and solutions jointly at the token-level, and also establishes whether there is a problem-solving relationship between each of them. One of the best performers at this stage was a Neural Relational Topic Model. The results from automation show that the model is able to recognise problem-solving utterances in text to a high degree of accuracy.
My work has already shown a positive impact in both industry and academia. One start-up is currently using the model for representing academic articles, and a Japanese collaborator has received a grant to adapt my model to Japanese text
All structures great and small: on copular sentences with shì in Mandarin
This dissertation provides a description and analysis of the Mandarin copula shì and copular structures containing it. On the basis of a comprehensive description of the syntactic distribution of shì and properties of different types of copular sentences (predicational, specificational, and equative), this study proposes a unified structural analysis for predicational and specificational copular sentences in Mandarin. It is proposed that shì is a functional element in the structure of the clause. Importantly, shì is not a verb, and copular structures in Mandarin contain no verb phrase at all, which is consistent with proposals about pronominal copular elements in other languages. Specificational copular sentences are analysed as inverted predicational copular sentences, derived via predicate inversion. This analysis captures both the underlying similarities and the differences between the two types of copular sentences. It is also pointed out that the third type of copular sentences, equatives, is clearly distinct from both predicational and specificational copular sentences and should thus be analysed in a different way. The dissertation also proposes that tense is not always syntactically expressed in Mandarin copular structures. While sentences with a stage-level predicate express tense syntactically, those with an individual-level predicate do not. Theoretical and Experimental Linguistic