42 research outputs found

    Is writing style predictive of scientific fraud?

    The problem of detecting scientific fraud using machine learning was recently introduced, with initial, positive results from a model taking into account various general indicators. The results seem to suggest that writing style is predictive of scientific fraud. We revisit these initial experiments, and show that the leave-one-out testing procedure they used likely leads to a slight over-estimate of the predictability, but also that simple models can outperform their proposed model by some margin. We go on to explore more abstract linguistic features, such as linguistic complexity and discourse structure, only to obtain negative results. Upon analyzing our models, we do see some interesting patterns, though: scientific fraud, for example, contains less comparison, as well as different types of hedging and ways of presenting logical reasoning. Comment: To appear in the Proceedings of the Workshop on Stylistic Variation 2017 (EMNLP), 6 pages

    Implicit Discourse Relation Classification via Multi-Task Neural Networks

    Without discourse connectives, classifying implicit discourse relations is a challenging task and a bottleneck for building a practical discourse parser. Previous research usually makes use of one kind of discourse framework, such as PDTB or RST, to improve the classification performance on discourse relations. However, under different discourse annotation frameworks, there exist multiple corpora which have internal connections. To exploit the combination of different discourse corpora, we design related discourse classification tasks specific to a corpus, and propose a novel Convolutional Neural Network embedded multi-task learning system to synthesize these tasks by learning both unique and shared representations for each task. The experimental results on the PDTB implicit discourse relation classification task demonstrate that our model achieves significant gains over baseline systems. Comment: This is the pre-print version of a paper accepted by AAAI-1

    Adversarial Connective-exploiting Networks for Implicit Discourse Relation Classification

    Implicit discourse relation classification is highly challenging due to the lack of connectives as strong linguistic cues, which motivates the use of annotated implicit connectives to improve the recognition. We propose a feature imitation framework in which an implicit relation network is driven to learn from another neural network with access to connectives, and is thus encouraged to extract similarly salient features for accurate classification. We develop an adversarial model to enable an adaptive imitation scheme through competition between the implicit network and a rival feature discriminator. Our method effectively transfers the discriminability of connectives to the implicit features, and achieves state-of-the-art performance on the PDTB benchmark. Comment: To appear in ACL201

    Differences Over Discourse Structure Differences: A Reply to Urquhart and Urquhart

    Purpose – In this paper we respond to Urquhart and Urquhart’s critique of our previous work entitled “Discourse structure differences in lay and professional health communication”, published in this journal in 2012 (Vol. 68 No. 6, pp. 826–851, doi: 10.1108/00220411211277064). Design/methodology/approach – We examine Urquhart and Urquhart’s critique and provide responses to their concerns and cautionary remarks against cross-disciplinary contributions. We reiterate our central claim. Findings – We argue that Mann and Thompson’s (1987, 1988) Rhetorical Structure Theory (RST) offers valuable insights into computer-mediated health communication and deserves further discussion of its methodological strengths and weaknesses for application in LIS. Research limitations/implications – While we agree that some methodological limitations pointed out by Urquhart and Urquhart are valid, we take this opportunity to correct certain misunderstandings and misstatements. Originality/value – We argue for continued use of innovative techniques borrowed from neighboring disciplines, in spite of objections from researchers accustomed to a familiar strand of literature. We encourage researchers to consider RST and other computational linguistics-based discourse analysis annotation frameworks that could provide the basis for integrated research, and eventual applications in information behaviour and information retrieval.