38 research outputs found

    The use of orthogonal similarity relations in the prediction of authorship

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-37256-8_38
    Recent work on Authorship Attribution (AA) proposes the use of meta characteristics to train author models. The meta characteristics are orthogonal sets of similarity relations between the features from the different candidate authors. In that approach, the features are grouped and processed separately according to the type of information they encode, the so-called linguistic modalities. For instance, the syntactic, stylistic and semantic features are each considered different modalities because they represent different aspects of the texts. The assumption is that extracting meta characteristics independently for each modality results in more informative feature vectors, which in turn yield higher accuracy. In this paper we set out to study the empirical value of this modality-specific process. We experimented with different ways of generating the meta characteristics on different data sets with different numbers of authors and genres. Our results show that extracting the meta characteristics after splitting the features by linguistic dimension yields a consistent improvement in prediction accuracy.
    This research was partially supported by ONR grant N00014-12-1-0217 and by NSF award 1254108. It was also supported in part by the CONACYT grant 134186 and by the European Commission as part of the WIQ-EI project (project no. 269180) within the FP7 People Programme.
    Sapkota, U.; Solorio, T.; Montes Gómez, M.; Rosso, P. (2013). The use of orthogonal similarity relations in the prediction of authorship. In Computational Linguistics and Intelligent Text Processing. Springer Verlag (Germany). 463-475. https://doi.org/10.1007/978-3-642-37256-8_38
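The modality-specific idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each "similarity relation" is a cosine similarity between a document's feature vector and a per-author centroid, computed separately within each modality and then concatenated. The function names, the centroid representation, and the toy vectors are all hypothetical.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def meta_features(doc, author_centroids):
    """Concatenate one block of author-similarity scores per modality.

    doc: dict mapping modality name -> 1-D feature vector of one document
    author_centroids: dict mapping modality name -> {author: centroid vector}
    (Both structures are assumptions made for this illustration.)
    """
    blocks = []
    for modality in sorted(doc):  # fixed order keeps the output layout stable
        centroids = author_centroids[modality]
        # Similarities are computed only within this modality's feature space.
        blocks.append([cosine(doc[modality], centroids[a]) for a in sorted(centroids)])
    return np.concatenate(blocks)


# Toy, made-up vectors: two modalities, two candidate authors.
doc = {
    "syntactic": np.array([1.0, 0.0]),
    "stylistic": np.array([0.0, 2.0]),
}
author_centroids = {
    "syntactic": {"A": np.array([1.0, 0.0]), "B": np.array([0.0, 1.0])},
    "stylistic": {"A": np.array([1.0, 0.0]), "B": np.array([0.0, 1.0])},
}
vector = meta_features(doc, author_centroids)
```

The resulting vector has one entry per (modality, author) pair and could be passed to any standard classifier. The abstract does not specify how the similarity relations are actually derived, so this only shows the shape of the idea.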

    Geographical Variability in the Likelihood of Bloodstream Infections Due to Gram-Negative Bacteria: Correlation with Proximity to the Equator and Health Care Expenditure (vol 9, e114548, 2014)

    Hosp Univ Austral, Div Infect Dis Prevent & Infect Control Serv, Buenos Aires, DF, Argentina; Hosp Univ Austral, Microbiol Lab, Buenos Aires, DF, Argentina; Monash Hlth, Monash Infect Dis, Clayton, Vic, Australia; Wollongong Hosp, Wollongong, NSW, Australia; Universidade Federal de São Paulo, Div Infect Dis, Lab Especial Microbiol Clin, São Paulo, Brazil; Hosp Israelita Albert Einstein, São Paulo, Brazil; Virginia Commonwealth Univ, Med Ctr, Richmond, VA USA; Hosp Rim & Hipertensao, São Paulo, Brazil; Hosp Santa Casa Porto Alegre, Porto Alegre, RS, Brazil; Hosp Conceicao, Porto Alegre, RS, Brazil; Hosp Walter Cantidio, Fortaleza, Ceara, Brazil; Hosp Diadema, São Paulo, Brazil; Hosp Espanhol, Salvador, BA, Brazil; Hosp Clin Goiania, Goiania, Go, Brazil; Mt Sinai Hosp, Toronto, ON M5G 1X5, Canada; Univ Alberta, Div Infect Dis, Edmonton, AB, Canada; Cairo Univ Kasr Ainy, Dar Al Fouad Hosp, Fac Med, Dept Clin Pathol, Cairo, Egypt; Hygeia Gen Hosp, Athens, Greece; Univ Tubingen Hosp, Internal Med, Div Infect Dis, Tubingen, Germany; Tokyo Metropolitan Tama Med Ctr, Dept Infect Prevent, Tokyo, Japan; Amphia Hosp Breda, Lab Microbiol & Infect Control, Breda, Netherlands; Thammasat Univ Hosp, Div Infect Dis, Pathum Thani, Thailand; St John Hosp & Med Ctr, Infect Prevent & Control Dept, Grosse Pointe Woods, MI USA; Univ Hosp Bern, Dept Infect Dis, CH-3010 Bern, Switzerland; Univ Bern, Bern, Switzerland; Barnes Jewish Hosp, St Louis, MO 63110 USA
    Web of Science

    Food insecurity, fruit and vegetable consumption, and use of the Supplemental Nutrition Assistance Program (SNAP) in Appalachian Ohio.

    No full text
    Food insecurity and inadequate nutrition are two major challenges that contribute to poor health conditions among U.S. households. Ohioans continue to face food insecurity, and rates of food insecurity in rural Southeast Ohio are higher than the state average. The main purpose of this project is to evaluate the association between Supplemental Nutrition Assistance Program (SNAP) participation and food security in rural Ohio, and to explore the association between SNAP participation and fruit/vegetable consumption. We control for food shopping patterns, such as shopping frequency, because previous research reports a significant relationship between shopping patterns and food security. To this end, we use novel household-level data on food insecurity and SNAP participation in rural Southeast Ohio, collected during the COVID-19 pandemic. We find that people who experience higher levels of food insecurity are more likely to participate in SNAP, though this likely reflects selection bias. To correct for the bias, we employ the nearest neighbor matching method to match treated (SNAP participant) and untreated (similar SNAP nonparticipant) groups. We find that participating in SNAP increases the probability of being food secure by around 26 percentage points after controlling for primary food shopping patterns. We do not find any significant association between SNAP participation and estimated intake of fruits and vegetables. This study provides policymakers with suggestive evidence that SNAP was associated with food security in rural Southeast Ohio during the pandemic, and indicates which additional factors may mediate these relationships.
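The matching step mentioned in the abstract can be sketched as follows. This is a generic illustration under stated assumptions, not the study's estimator: it assumes one-to-one nearest neighbor matching with replacement, Euclidean distance on standardized covariates, and a simple mean difference in outcomes over the matched pairs. The function name and the four-household data set are made up for the example and do not reproduce any result from the study.

```python
import numpy as np


def att_nearest_neighbor(X, treated, y):
    """Average treatment effect on the treated via 1-nearest-neighbor matching.

    X: (n, k) covariates; treated: (n,) boolean-like flags; y: (n,) outcomes.
    Matching is done with replacement on standardized covariates.
    """
    X = np.asarray(X, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    y = np.asarray(y, dtype=float)

    # Standardize so that no covariate dominates the distance by scale alone.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std

    Zt, Zc = Z[treated], Z[~treated]
    # Pairwise squared distances, shape (n_treated, n_control).
    d2 = ((Zt[:, None, :] - Zc[None, :, :]) ** 2).sum(axis=2)
    match = d2.argmin(axis=1)  # index of the closest control for each treated unit

    return float(np.mean(y[treated] - y[~treated][match]))


# Made-up toy data: one covariate, two participants, two nonparticipants.
X = [[1.0], [1.1], [5.0], [5.2]]
treated = [1, 0, 1, 0]
y = [1, 0, 1, 1]  # 1 = food secure, 0 = food insecure (illustrative only)
att = att_nearest_neighbor(X, treated, y)
```

In this toy case each participant is paired with the nonparticipant whose covariate value is closest, and the estimate is the average outcome gap across those pairs. A real analysis would also need to check covariate balance after matching and compute standard errors, neither of which the sketch attempts.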

    Table A.7: Marginal effects of participating in SNAP within the last 3 months on FV intake without matching, compared to eligible SNAP nonparticipants (household income < 130% poverty line).

    No full text

    Summary statistics of covariates for households participating in SNAP within the last 3 months and nonparticipants.

    No full text

    Percentage of food secure and food insecure households for different samples.

    No full text

    Table A.6: Marginal effects of participating in SNAP within the last 3 months on food security status without matching (10 item), compared to eligible SNAP nonparticipants (household income < 130% poverty line).

    No full text

    Table A.2: Summary statistics of covariates for SNAP participants and nonparticipants.

    No full text