
    Detecting Sockpuppets in Deceptive Opinion Spam

    This paper explores the problem of sockpuppet detection in deceptive opinion spam using authorship attribution and verification approaches. Two methods are explored. The first is a feature subsampling scheme that uses the KL-divergence of an author's stylistic language models to find discriminative features. The second is a transduction scheme, spy induction, which leverages the diversity of authors in the unlabeled test set by sending a set of spies (positive samples) from the training set to retrieve hidden samples in the unlabeled test set using nearest and farthest neighbors. Experiments using ground-truth sockpuppet data show the effectiveness of the proposed schemes.
    Comment: 18 pages. Accepted at CICLing 2017, 18th International Conference on Intelligent Text Processing and Computational Linguistics.
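    The first method can be illustrated with a minimal sketch: score each candidate feature by its contribution to the KL-divergence between two authors' smoothed unigram language models, and keep the highest-scoring terms as discriminative features. This is an assumption-laden toy (word unigrams, add-one smoothing, the function names are illustrative), not the paper's exact feature subsampling scheme.

    ```python
    import math
    from collections import Counter

    def unigram_model(text, vocab, eps=1.0):
        """Add-one smoothed unigram probabilities over a fixed vocabulary.
        NOTE: a simplification of the paper's stylistic language models."""
        counts = Counter(text.lower().split())
        total = sum(counts[w] for w in vocab) + eps * len(vocab)
        return {w: (counts[w] + eps) / total for w in vocab}

    def discriminative_features(text_a, text_b, top_k=3):
        """Rank features by their per-term contribution to KL(A || B);
        large positive terms are words A uses far more than B does."""
        vocab = set(text_a.lower().split()) | set(text_b.lower().split())
        p = unigram_model(text_a, vocab)
        q = unigram_model(text_b, vocab)
        contrib = {w: p[w] * math.log(p[w] / q[w]) for w in vocab}
        return sorted(contrib, key=contrib.get, reverse=True)[:top_k]
    ```

    Summing the per-term contributions recovers the full KL-divergence, so the top-ranked words are exactly those driving the stylistic gap between the two models.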

    Testing Market Response to Auditor Change Filings: a comparison of machine learning classifiers

    The use of textual information contained in company filings with the Securities and Exchange Commission (SEC), including annual reports on Form 10-K, quarterly reports on Form 10-Q, and current reports on Form 8-K, has gained the increased attention of finance and accounting researchers. In this paper we use a set of machine learning methods to predict the market response to changes in a firm's auditor as reported in public filings. We vectorize the text of 8-K filings to test whether the resulting feature matrix can explain the sign of the market response to the filing. Specifically, using classification algorithms and a sample consisting of the Item 4.01 text of 8-K documents, which provides information on changes in auditors of companies registered with the SEC, we predict the sign of the cumulative abnormal return (CAR) around 8-K filing dates. We report the correct classification performance and time efficiency of the classification algorithms. Our results show some improvement over the naïve classification method.
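    The pipeline described above — vectorize filing text, then classify the sign of the CAR — can be sketched with a bag-of-words representation and a nearest-neighbor classifier. This is a hedged illustration only: the paper tests several classifiers, and the function names, training examples, and choice of nearest-neighbor here are all illustrative assumptions.

    ```python
    import math
    from collections import Counter

    def bow(text):
        """Bag-of-words vectorization as a sparse term-count Counter."""
        return Counter(text.lower().split())

    def cosine(u, v):
        """Cosine similarity between two sparse count vectors."""
        dot = sum(c * v.get(w, 0) for w, c in u.items())
        nu = math.sqrt(sum(c * c for c in u.values()))
        nv = math.sqrt(sum(c * c for c in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    def predict_car_sign(train, filing_text):
        """Nearest-neighbor prediction of the CAR sign (+1 / -1) for an
        8-K Item 4.01 text, given labeled (text, sign) training pairs."""
        v = bow(filing_text)
        best = max(train, key=lambda ex: cosine(bow(ex[0]), v))
        return best[1]
    ```

    A real feature matrix would use TF-IDF weighting and a trained classifier rather than raw counts and a single neighbor, but the flow — text to vector to predicted sign — is the same.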

    Quantitative Screening of Cervical Cancers for Low-Resource Settings: Pilot Study of Smartphone-Based Endoscopic Visual Inspection After Acetic Acid Using Machine Learning Techniques

    Background: Approximately 90% of global cervical cancer (CC) cases occur in low- and middle-income countries. In most cases, CC can be detected early through routine screening programs, including a cytology-based test. However, it is logistically difficult to offer this program in low-resource settings due to limited resources and infrastructure and a shortage of trained experts. Visual inspection after the application of acetic acid (VIA) has been widely promoted and is routinely recommended as a viable form of CC screening in resource-constrained countries. Digital images of the cervix acquired during the VIA procedure offer better quality assurance and visualization, leading to higher diagnostic accuracy and reduced variability in detection rates. However, a colposcope is bulky, expensive, electricity-dependent, and needs routine maintenance, and a specialist must be present to confirm the grade of abnormality from its images. Recently, smartphone-based imaging systems have made a significant impact on the practice of medicine by offering a cost-effective, rapid, and noninvasive method of evaluation. Furthermore, computer-aided analyses, including image processing-based methods and machine learning techniques, have also shown great potential for high impact on medical evaluation.

    A complex network approach to stylometry

    Statistical methods have been widely employed to study the fundamental properties of language. In recent years, methods from complex and dynamical systems have proved useful for creating several language models. Despite the large number of studies devoted to representing texts with physical models, only a limited number have shown how the properties of the underlying physical systems can be employed to improve the performance of natural language processing tasks. In this paper, I address this problem by devising complex network methods that are able to improve the performance of current statistical methods. Using a fuzzy classification strategy, I show that the topological properties extracted from texts complement the traditional textual description. In several cases, the performance obtained with hybrid approaches outperformed the results obtained when only traditional or networked methods were used. Because the proposed model is generic, the framework devised here could be straightforwardly used to study similar textual applications where the topology plays a pivotal role in the description of the interacting agents.
    Comment: PLoS ONE, 2015 (to appear).
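    The core idea — extracting topological features from a text's network representation — can be sketched by building a word co-occurrence network and reading off simple node-degree statistics. This is a minimal illustrative sketch, assuming an adjacency-window network and degree-based features; the paper's actual topological measures and fuzzy classification strategy are richer.

    ```python
    from collections import defaultdict

    def cooccurrence_network(text, window=2):
        """Undirected word co-occurrence network: each word is linked
        to the words among its next `window - 1` tokens."""
        tokens = text.lower().split()
        edges = defaultdict(set)
        for i, w in enumerate(tokens):
            for j in range(i + 1, min(i + window, len(tokens))):
                if tokens[j] != w:  # no self-loops
                    edges[w].add(tokens[j])
                    edges[tokens[j]].add(w)
        return edges

    def degree_stats(edges):
        """Two toy topological features: mean and maximum node degree."""
        degrees = [len(neighbors) for neighbors in edges.values()]
        return sum(degrees) / len(degrees), max(degrees)
    ```

    In a hybrid setup such as the one described, features like these would be concatenated with traditional textual features (e.g. word frequencies) before classification.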