Authorship Verification, Neighborhood-based Classification
Authorship analysis has become a key tool for analyzing digital documents in forensic science. We propose a neighborhood-based classification method for authorship verification that analyzes the similarity between a document of unknown authorship and sample documents by a single author, without estimating parameter values (such as thresholds) from training data. We implemented two strategies for representing an author's documents: one instance-based and one profile-based (computed as a centroid). We evaluate the methods on different data collections that vary in the number of samples, the textual genres, and the topics addressed. We analyze the contribution of each comparison function and each feature, and take the final decision by a majority vote over the function-feature pairs used to measure similarity between documents. The tests were carried out on the public data sets of the PAN 2014 and 2015 authorship verification competitions. The results obtained are promising; they allow us to evaluate our proposal and to identify future work.
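As a rough illustration of the two representation strategies and the majority vote described in this abstract, the following Python sketch may help. It is not the authors' implementation: the cosine comparison function, the use of a mean over instances, and all function names are assumptions made for the example, and the abstract does not specify how each function-feature pair casts its vote.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse feature vectors (dicts).
    Assumed here as one possible comparison function."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Profile-based representation: average the author's document vectors."""
    keys = set().union(*vectors)
    return {k: sum(v.get(k, 0.0) for v in vectors) / len(vectors) for k in keys}

def instance_similarity(unknown, known_vectors, sim):
    """Instance-based representation: compare against each known document.
    Taking the mean is an assumption of this sketch."""
    return sum(sim(unknown, k) for k in known_vectors) / len(known_vectors)

def profile_similarity(unknown, known_vectors, sim):
    """Profile-based representation: compare against the author's centroid."""
    return sim(unknown, centroid(known_vectors))

def majority_vote(votes):
    """Accept authorship only if strictly more than half of the
    function-feature votes (booleans) are positive."""
    return sum(votes) > len(votes) / 2
```

Each function-feature pair would contribute one boolean to `majority_vote`; how that boolean is derived from the similarities without a trained threshold is left open here because the abstract does not describe it.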
Detecting Sockpuppets in Deceptive Opinion Spam
This paper explores the problem of sockpuppet detection in deceptive opinion
spam using authorship attribution and verification approaches. Two methods are
explored. The first is a feature subsampling scheme that uses the KL-Divergence
on stylistic language models of an author to find discriminative features. The
second is a transduction scheme, spy induction, that leverages the diversity of
authors in the unlabeled test set by sending a set of spies (positive samples)
from the training set to retrieve hidden samples in the unlabeled test set
using nearest and farthest neighbors. Experiments using ground truth sockpuppet
data show the effectiveness of the proposed schemes. Comment: 18 pages, accepted at CICLing 2017, 18th International Conference on
Intelligent Text Processing and Computational Linguistics
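One plausible reading of "KL-Divergence on stylistic language models" is to rank features by their individual contribution to the divergence between an author's model and a background model. The Python sketch below shows that idea only; the smoothed unigram models, the function names, and the ranking rule are assumptions for illustration, not the paper's actual subsampling scheme.

```python
import math
from collections import Counter

def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary.
    Assumes every token in `tokens` appears in `vocab`."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_contributions(p, q):
    """Per-feature terms p(w) * log(p(w) / q(w)); their sum is KL(p || q)."""
    return {w: p[w] * math.log(p[w] / q[w]) for w in p}

def top_discriminative(p, q, k):
    """Hypothetical selection rule: keep the k features with the largest terms."""
    contrib = kl_contributions(p, q)
    return sorted(contrib, key=contrib.get, reverse=True)[:k]

# Toy usage with made-up token lists.
author = "really really really good".split()
background = "good bad okay fine".split()
vocab = sorted(set(author) | set(background))
p = unigram_model(author, vocab)
q = unigram_model(background, vocab)
selected = top_discriminative(p, q, k=1)
```

Smoothing keeps every probability strictly positive, so the logarithm is always defined.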
Point-of-Sale Marketing in Recreational Marijuana Dispensaries Around California Schools.
Purpose: After marijuana commercialization, the presence of recreational marijuana dispensaries (RMDs) increased rapidly. Point-of-sale marketing raises concerns about children's exposure. This study examined advertising and promotions that potentially appeal to children, as well as access restrictions, in RMDs around California schools. Methods: This was a cross-sectional, observational study conducted from June to September 2018. Trained fieldworkers audited retail environments in 163 RMDs in closest proximity to 333 randomly sampled public schools in California. Results: About 44% of schools had RMDs located within 3 miles. Regarding interior marketing, 74% of RMDs had at least one instance of child-appealing products, packages, paraphernalia, or advertisements. RMDs closer to a school had a higher proportion with interior child-appealing marketing. More than three fourths of RMDs had generic promotional activities; in particular, 28% violated the free-sample ban. Regarding exterior marketing, only 2% of RMDs had marketing appealing to children. More than 60% of RMDs had exterior signs indicative of marijuana. Approximately one-third had generic advertisements, and 13% had advertisements bigger than 1,600 square inches. Regarding access restrictions, almost all RMDs complied with age verification, but 84% had no age limit signs, and only 40% had security personnel. Conclusions: Despite minimal point-of-sale marketing practices appealing to children on the exterior of RMDs around California schools, such practices were abundant on the interior. Marketing practices not specifically appealing to children were also common on both the interior and exterior of RMDs. Dispensaries' violation of age verification law, lack of security personnel, and presence of child-appealing marketing should be continuously monitored and prevented.
The power of indirect social ties
While direct social ties have been intensely studied in the context of
computer-mediated social networks, indirect ties (e.g., friends of friends)
have seen little attention. Yet in real life, we often rely on friends of our
friends for recommendations (of good doctors, good schools, or good
babysitters), for introduction to a new job opportunity, and for many other
occasional needs. In this work we attempt to 1) quantify the strength of
indirect social ties, 2) validate it, and 3) empirically demonstrate its
usefulness for distributed applications on two examples. We quantify social
strength of indirect ties using a(ny) measure of the strength of the direct
ties that connect two people and the intuition provided by the sociology
literature. We validate the proposed metric experimentally by comparing
correlations with other direct social tie evaluators. We show via data-driven
experiments that the proposed metric for social strength can be used
successfully for social applications. Specifically, we show that it alleviates
known problems in friend-to-friend storage systems by addressing two previously
documented shortcomings: reduced set of storage candidates and data
availability correlations. We also show that it can be used for predicting the
effects of a social diffusion with an accuracy of up to 93.5%. Comment: Technical Report
A Multitude of Linguistically-rich Features for Authorship Attribution
This paper reports on the procedure and learning models we adopted for the 'PAN 2011 Author Identification' challenge targeting real-world email messages. The novelty of our approach lies in a design that combines shallow characteristics of the emails (word and trigram frequencies) with a large number of ad hoc linguistically-rich features addressing different language levels. For the author attribution tasks, all these features were used to train a maximum entropy model, which gave very good results. For the single author verification tasks, a set of features based exclusively on the linguistic description of the email messages was used as input for symbolic learning techniques (rules and decision trees), and gave weak results. This paper presents in detail the features extracted from the corpus, the learning models, and the results obtained.