
Authorship Verification, Neighborhood-based Classification

Author: Adame Yaritza
Castro Daniel
Muñoz Rafael
Pelaez María
Publication venue: Instituto Politecnico Nacional/Centro de Investigacion en Computacion
Publication date: 01/01/2017

Authorship analysis has become a key tool for analyzing digital documents in the forensic sciences. We propose a neighborhood-based classification method for Authorship Verification that analyzes the similarities between a document of unknown authorship and sample documents of a single author, without estimating thresholds or parameter values from training data. We implemented two strategies for representing an author's documents: an instance-based one and a profile-based one that computes the centroid. We evaluate the methods on different data collections according to the number of samples, the textual genres, and the topic addressed. We analyze the contribution of each comparison function and each feature, and take the final decision by a majority vote over all function-feature pairs used to measure the similarity between documents. The tests were carried out using the public data sets of the PAN 2014 and 2015 Authorship Verification competitions. The results obtained are promising; they allow us to evaluate our proposal and to identify future work.
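The threshold-free, neighborhood-based decision with majority voting can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two feature types (`char_trigrams`, `word_unigrams`), the two comparison functions (`cosine`, `jaccard`), and the decision rules in `instance_vote` and `profile_vote` are hypothetical stand-ins. Each feature-function pair casts one vote and the majority decides. Instead of a threshold learned from training data, each vote compares the unknown document with similarities computed among the author's own known documents.

```python
import math
from collections import Counter

# Hypothetical feature extractors (not the paper's exact feature set).
def char_trigrams(text):
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def word_unigrams(text):
    return Counter(text.lower().split())

# Hypothetical comparison functions.
def cosine(a, b):
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def jaccard(a, b):
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def centroid(vectors):
    # Profile representation: average of the author's feature vectors.
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({k: c / len(vectors) for k, c in total.items()})

def instance_vote(u, known, sim):
    # Instance-based (assumed rule): accept if the unknown document's nearest
    # known neighbour is at least as similar as the weakest nearest-neighbour
    # link among the known documents themselves.
    nn_unknown = max(sim(u, k) for k in known)
    nn_known = [max(sim(k, o) for j, o in enumerate(known) if j != i)
                for i, k in enumerate(known)]
    return nn_unknown >= min(nn_known)

def profile_vote(u, known, sim):
    # Profile-based (assumed rule): compare similarity to the centroid with a
    # leave-one-out reference computed from the known documents.
    ref = min(sim(k, centroid(known[:i] + known[i + 1:]))
              for i, k in enumerate(known))
    return sim(u, centroid(known)) >= ref

def verify(unknown_text, known_texts, strategy="instance"):
    if len(known_texts) < 2:
        raise ValueError("this sketch needs at least two known documents")
    vote = instance_vote if strategy == "instance" else profile_vote
    ballots = []
    for extract in (char_trigrams, word_unigrams):
        u = extract(unknown_text)
        known = [extract(t) for t in known_texts]
        for sim in (cosine, jaccard):
            ballots.append(vote(u, known, sim))
    # Majority over feature-function pairs; a tie counts as "different author".
    return sum(ballots) > len(ballots) / 2
```

Calling `verify(unknown, known, strategy="profile")` switches from the instance-based to the centroid-based representation. Because the reference similarity comes from the known documents themselves, this sketch requires at least two of them.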

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Detecting Sockpuppets in Deceptive Opinion Spam

Author: Chih-Chung Chang
DH Fusilier
E Stamatatos
M Koppel
N Graham
T Qian
Vladimir N. Vapnik
Xinxing Xu
Publication date: 09/03/2017

This paper explores the problem of sockpuppet detection in deceptive opinion spam using authorship attribution and verification approaches. Two methods are explored. The first is a feature subsampling scheme that uses the KL-divergence on stylistic language models of an author to find discriminative features. The second is a transduction scheme, spy induction, that leverages the diversity of authors in the unlabeled test set by sending a set of spies (positive samples) from the training set to retrieve hidden samples in the unlabeled test set using nearest and farthest neighbors. Experiments using ground-truth sockpuppet data show the effectiveness of the proposed schemes.
Comment: 18 pages, accepted at CICLing 2017, the 18th International Conference on Intelligent Text Processing and Computational Linguistics
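Both schemes can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the paper's code: the add-alpha smoothing, ranking features by their individual term of the KL sum, the helper names (`kl_feature_scores`, `subsample_features`, `spy_induction`), and the rule that a sample retrieved as both nearest and farthest counts as positive are all assumptions.

```python
import math

def kl_feature_scores(author_counts, background_counts, alpha=1.0):
    """Per-feature terms p * log(p / q) of KL(P_author || P_background), where
    both are add-alpha smoothed models over stylistic features (assumed)."""
    vocab = set(author_counts) | set(background_counts)
    a_total = sum(author_counts.values()) + alpha * len(vocab)
    b_total = sum(background_counts.values()) + alpha * len(vocab)
    scores = {}
    for f in vocab:
        p = (author_counts.get(f, 0) + alpha) / a_total
        q = (background_counts.get(f, 0) + alpha) / b_total
        scores[f] = p * math.log(p / q)
    return scores

def subsample_features(author_counts, background_counts, k):
    """Keep the k features contributing most to the divergence."""
    scores = kl_feature_scores(author_counts, background_counts)
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]

def spy_induction(spies, unlabeled, dist):
    """Each spy (a known positive) marks its nearest unlabeled sample as a
    pseudo-positive and its farthest as a pseudo-negative; returns indices."""
    positives, negatives = set(), set()
    for spy in spies:
        d = [dist(spy, x) for x in unlabeled]
        positives.add(min(range(len(d)), key=d.__getitem__))
        negatives.add(max(range(len(d)), key=d.__getitem__))
    # Hypothetical tie rule: anything retrieved as nearest stays positive.
    return positives, negatives - positives
```

For instance, an author who overuses `"!!"` relative to the background gets that feature ranked first, and the retrieved pseudo-labels could then augment the training set for a standard classifier.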

arXiv.org e-Print Archive

Crossref

Recommended from our members

Point-of-Sale Marketing in Recreational Marijuana Dispensaries Around California Schools.

Author: Cao Yiwen
Carrillo Angelina S
Shi Yuyan
Zhu Shu-Hong
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

Purpose: After marijuana commercialization, the presence of recreational marijuana dispensaries (RMDs) increased rapidly. Their point-of-sale marketing raises concerns about children's exposure. This study examined advertising and promotions that potentially appeal to children, as well as access restrictions, in RMDs around California schools.

Methods: This was a cross-sectional, observational study conducted from June to September 2018. Trained fieldworkers audited the retail environments of 163 RMDs in closest proximity to 333 randomly sampled public schools in California.

Results: About 44% of schools had RMDs located within 3 miles. Regarding interior marketing, 74% of RMDs had at least one instance of child-appealing products, packages, paraphernalia, or advertisements, and RMDs closer to a school had a higher proportion with interior child-appealing marketing. More than three-fourths of RMDs had generic promotional activities; in particular, 28% violated the free-sample ban. Regarding exterior marketing, only 2% of RMDs had marketing appealing to children. More than 60% of RMDs had exterior signs indicative of marijuana. Approximately one-third had generic advertisements, and 13% had advertisements larger than 1,600 square inches. Regarding access restrictions, almost all RMDs complied with age verification, but 84% had no age-limit signs, and only 40% had security personnel.

Conclusions: Although point-of-sale marketing practices appealing to children were minimal on the exterior of RMDs around California schools, such practices were abundant on the interior. Marketing practices not specifically appealing to children were also common on both the interior and exterior of RMDs. Dispensaries' violations of age verification law, lack of security personnel, and presence of child-appealing marketing should be continuously monitored and prevented.

eScholarship - University of California

The power of indirect social ties

Author: Blackburn Jeremy
Iamnitchi Adriana
Kourtellis Nicolas
Skvoretz John
Zuo Xiang
Publication date: 16/01/2014

While direct social ties have been intensively studied in the context of computer-mediated social networks, indirect ties (e.g., friends of friends) have received little attention. Yet in real life, we often rely on friends of our friends for recommendations (of good doctors, good schools, or good babysitters), for introductions to new job opportunities, and for many other occasional needs. In this work we 1) quantify the strength of indirect social ties, 2) validate it, and 3) empirically demonstrate its usefulness for distributed applications through two examples. We quantify the social strength of indirect ties using any measure of the strength of the direct ties that connect two people, combined with intuition provided by the sociology literature. We validate the proposed metric experimentally by comparing its correlations with other direct social tie evaluators. We show via data-driven experiments that the proposed metric for social strength can be used successfully for social applications. Specifically, we show that it alleviates known problems in friend-to-friend storage systems by addressing two previously documented shortcomings: a reduced set of storage candidates and data availability correlations. We also show that it can be used to predict the effects of a social diffusion with an accuracy of up to 93.5%.
Comment: Technical Report
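The idea of deriving indirect-tie strength from any direct-tie strength measure can be illustrated with a small Python sketch. The aggregation below is hypothetical and is not the metric proposed in the report: each two-hop path u-w-v is scored by the product of its two direct-tie strengths (assumed to lie in [0, 1]), and multiple paths are combined with a noisy-OR. The function names are illustrative.

```python
from collections import defaultdict

def build_graph(direct_ties):
    """Undirected weighted graph: node -> {neighbour: direct-tie strength}."""
    graph = defaultdict(dict)
    for (u, v), strength in direct_ties.items():
        graph[u][v] = strength
        graph[v][u] = strength
    return graph

def indirect_strength(graph, u, v):
    """Hypothetical indirect-tie score over two-hop paths u-w-v.

    Path strength is the product of its two direct-tie strengths; paths are
    combined with a noisy-OR, so more and stronger mutual friends give a
    higher score. Returns 0.0 when u and v share no neighbours."""
    # .get avoids inserting empty entries into the defaultdict.
    common = (set(graph.get(u, {})) & set(graph.get(v, {}))) - {u, v}
    p_no_path = 1.0
    for w in sorted(common):
        p_no_path *= 1.0 - graph[u][w] * graph[w][v]
    return 1.0 - p_no_path
```

With ties a-b = 0.5, b-c = 0.8, a-d = 1.0 and d-c = 0.5, the two paths from a to c score 0.4 and 0.5, giving 1 - (0.6 × 0.5) = 0.7. The noisy-OR is one simple design choice that keeps the score in [0, 1] and rewards multiple mutual friends.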

arXiv.org e-Print Archive

CiteSeerX

A Multitude of Linguistically-rich Features for Authorship Attribution

Author: Calderone Basilio
Hathout Nabil
Sajous Franck
Tanguy Ludovic
Urieli Assaf
Publication venue: HAL CCSD
Publication date: 01/01/2011

This paper reports on the procedure and learning models we adopted for the 'PAN 2011 Author Identification' challenge, which targets real-world email messages. The novelty of our approach lies in a design that combines shallow characteristics of the emails (word and trigram frequencies) with a large number of ad hoc linguistically-rich features addressing different language levels. For the author attribution tasks, all these features were used to train a maximum entropy model, which gave very good results. For the single-author verification tasks, a set of features based exclusively on the linguistic description of the email messages was used as input to symbolic learning techniques (rules and decision trees), and gave weak results. This paper presents in detail the features extracted from the corpus, the learning models, and the results obtained.
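The attribution setup (shallow word and trigram counts plus extra linguistic features, fed to a maximum entropy model) can be sketched in plain Python. This is not the paper's feature set or training procedure: the two `ling:` features are hypothetical stand-ins for the linguistically-rich features, and the model is a multinomial logistic regression (the log-linear form of a maximum entropy classifier) trained by unregularized stochastic gradient ascent.

```python
import math
import re
from collections import Counter

def extract_features(text):
    """Shallow counts (words, character trigrams) plus two hypothetical
    stand-ins for the paper's linguistically-rich features."""
    feats = Counter()
    words = re.findall(r"\w+", text.lower())
    for w in words:
        feats["w=" + w] += 1
    for i in range(len(text) - 2):
        feats["c3=" + text[i:i + 3]] += 1
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    feats["ling:avg_sentence_len"] = len(words) / max(len(sentences), 1)
    feats["ling:exclamations_per_char"] = text.count("!") / max(len(text), 1)
    return feats

def predict_proba(weights, feats):
    scores = {c: sum(w[f] * v for f, v in feats.items()) for c, w in weights.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

def train_maxent(docs, labels, epochs=50, lr=0.1):
    """Log-linear (maximum entropy) classifier fitted by stochastic gradient
    ascent on the conditional log-likelihood, without regularization."""
    classes = sorted(set(labels))
    weights = {c: Counter() for c in classes}
    data = [(extract_features(d), y) for d, y in zip(docs, labels)]
    for _ in range(epochs):
        for feats, y in data:
            probs = predict_proba(weights, feats)
            for c in classes:
                grad = (1.0 if c == y else 0.0) - probs[c]
                for f, v in feats.items():
                    weights[c][f] += lr * grad * v
    return weights

def predict(weights, text):
    probs = predict_proba(weights, extract_features(text))
    return max(probs, key=probs.get)
```

In practice one would add regularization and feature scaling; scikit-learn's `LogisticRegression` provides a regularized multinomial model of the same log-linear form.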

CiteSeerX

Scientific Publications of the University of Toulouse II Le Mirail

HAL Descartes