253 research outputs found

    Monte Carlo Procedure for Protein Design

    A new method for sequence optimization in protein models is presented. The approach inherits its basic philosophy from recent work by Deutsch and Kurosky [Phys. Rev. Lett. 76, 323 (1996)], maximizing conditional probabilities rather than minimizing energy functions, and is based upon a novel and very efficient multisequence Monte Carlo scheme. By construction, the method ensures that the designed sequences are thermodynamically good folders. A bootstrap procedure for the sequence space search is devised, making very large chains feasible. The algorithm is successfully explored on the two-dimensional HP model with chain lengths N=16, 18 and 32.
    Comment: 7 pages LaTeX, 4 Postscript figures; minor change
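
    As a rough illustration of the two-dimensional HP model mentioned above, the sketch below scores a chain conformation on the square lattice. It assumes the standard HP convention of energy -1 for each contact between two H monomers that are lattice neighbours but not adjacent along the chain; the abstract does not state the energy function, and the function name `hp_energy` is hypothetical. It does not implement the multisequence Monte Carlo scheme itself.

```python
def hp_energy(sequence, coords):
    """Energy of an HP chain on the square lattice.

    Assumed convention (not taken from the abstract): each pair of H
    monomers that are lattice neighbours but not consecutive along the
    chain contributes -1.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        if abs(x0 - x1) + abs(y0 - y1) != 1:
            raise ValueError("consecutive monomers must be lattice neighbours")

    position = {xy: i for i, xy in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that every pair is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = position.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts


# A four-monomer chain folded into a unit square: the two end
# monomers are both H and touch each other.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_energy("HPPH", square))
```

    A design procedure would call such a function many times while searching over sequences, which is why the abstract stresses the efficiency of the search.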

    Overview of the PAN/CLEF 2015 Evaluation Lab

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-24027-5_49
    This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum of text mining research focusing on the identification of personal traits of authors left behind in texts unintentionally. PAN 2015 comprises three tasks: plagiarism detection, author identification and author profiling, studying important variations of these problems. In plagiarism detection, community-driven corpus construction is introduced as a new way of developing evaluation resources with diversity. In author identification, cross-topic and cross-genre author verification (where the texts of known and unknown authorship do not match in topic and/or genre) is introduced. A new corpus was built for this challenging, yet realistic, task covering four languages. In author profiling, in addition to usual author demographics, such as gender and age, five personality traits are introduced (openness, conscientiousness, extraversion, agreeableness, and neuroticism) and a new corpus of Twitter messages covering four languages was developed. In total, 53 teams participated in all three tasks of PAN 2015 and, following the practice of previous editions, software submissions were required and evaluated within the TIRA experimentation framework.
    Stamatatos, E.; Potthast, M.; Rangel, F.; Rosso, P.; Stein, B. (2015). Overview of the PAN/CLEF 2015 Evaluation Lab. In Experimental IR Meets Multilinguality, Multimodality, and Interaction: 6th International Conference of the CLEF Association, CLEF'15, Toulouse, France, September 8-11, 2015, Proceedings. Springer International Publishing. 518-538. doi:10.1007/978-3-319-24027-5_49

    ProSAS: a database for analyzing alternative splicing in the context of protein structures

    Alternative splicing is known to be one of the major sources for functional diversity in higher eukaryotes. Several splicing isoforms have been characterized in the literature that play important roles in cellular processes like apoptosis or signal transduction pathways. Splicing events can often be detected on the mRNA level by large-scale cDNA or EST experiments and such data is collected and annotated in several databases. Nevertheless, the effects of splicing on the structure of a protein are largely unknown. The ProSAS (Protein Structure and Alternative Splicing) database fills this gap and provides a unified resource for analyzing effects of alternative splicing events in the context of protein structures. ProSAS comprehensively annotates and models protein structures for several Ensembl genomes as well as SwissProt entries harbouring splicing events. Alternative isoforms annotated in Ensembl or SwissProt can be analyzed on the protein structure and protein function level using an intuitive user interface that provides several features and tools for a structure-based analysis of alternative splicing events. The ProSAS database is freely accessible at http://www.bio.ifi.lmu.de/ProSAS

    Transparent, flexible, and strong 2,3-dialdehyde cellulose films with high oxygen barrier properties

    2,3-Dialdehyde cellulose (DAC) of a high degree of oxidation (92% relative to AGU units) prepared by oxidation of microcrystalline cellulose with sodium periodate (48 °C, 19 h) is soluble in hot water. Solution casting, slow air drying, hot pressing, and reinforcement by cellulose nanocrystals afforded films (~100 μm thickness) that feature intriguing properties: they have very smooth surfaces (SEM), are highly flexible, and have good light transmittance for both the visible and near-infrared range (89-91%), high tensile strength (81-122 MPa), and modulus of elasticity (3.4-4.0 GPa) depending on hydration state and respective water content. The extraordinarily low oxygen permeation of
    Peer reviewed

    Cross-Language Plagiarism Detection

    Cross-language plagiarism detection deals with the automatic identification and extraction of plagiarism in a multilingual setting. In this setting, a suspicious document is given, and the task is to retrieve all sections from the document that originate from a large, multilingual document collection. Our contributions in this field are as follows: (1) a comprehensive retrieval process for cross-language plagiarism detection is introduced, highlighting the differences to monolingual plagiarism detection, (2) state-of-the-art solutions for two important subtasks are reviewed, (3) retrieval models for the assessment of cross-language similarity are surveyed, and, (4) the three models CL-CNG, CL-ESA and CL-ASA are compared. Our evaluation is of realistic scale: it relies on 120,000 test documents which are selected from the corpora JRC-Acquis and Wikipedia, so that for each test document highly similar documents are available in all of the six languages English, German, Spanish, French, Dutch, and Polish. The models are employed in a series of ranking tasks, and more than 100 million similarities are computed with each model. The results of our evaluation indicate that CL-CNG, despite its simple approach, is the best choice to rank and compare texts across languages if they are syntactically related. CL-ESA almost matches the performance of CL-CNG, but on arbitrary pairs of languages. CL-ASA works best on "exact" translations but does not generalize well.
    This work was partially supported by the TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 project and the CONACyT-Mexico 192021 grant.
    Potthast, M.; Barrón Cedeño, LA.; Stein, B.; Rosso, P. (2011). Cross-Language Plagiarism Detection. Language Resources and Evaluation. 45(1):45-62. https://doi.org/10.1007/s10579-009-9114-z
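
    To make the "simple approach" attributed to CL-CNG more concrete, the sketch below compares two texts by the cosine similarity of their character n-gram count vectors, which is the general idea behind character n-gram retrieval models. The choice of n = 3, the lowercasing, the raw-count weighting, and the function names are assumptions made for illustration; the abstract does not give the exact configuration evaluated in the paper.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams after lowercasing and collapsing
    every non-alphanumeric run into a single space (assumed preprocessing)."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    cleaned = " ".join(cleaned.split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ngram_similarity(text_a, text_b, n=3):
    """Similarity of two texts, possibly in different languages."""
    return cosine(char_ngrams(text_a, n), char_ngrams(text_b, n))


# Cognates in related languages share many character n-grams.
english = "international information"
spanish = "información internacional"
print(ngram_similarity(english, spanish))
```

    Because the score depends only on shared character sequences, it can only be expected to work for language pairs with overlapping alphabets and vocabulary, which matches the abstract's restriction to syntactically related languages.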

    Dynamics of conflicts in Wikipedia

    In this work we study the dynamical features of editorial wars in Wikipedia (WP). Based on our previously established algorithm, we build up samples of controversial and peaceful articles and analyze the temporal characteristics of the activity in these samples. On short time scales, we show that there is a clear correspondence between conflict and burstiness of activity patterns, and that memory effects play an important role in controversies. On long time scales, we identify three distinct developmental patterns for the overall behavior of the articles. We are able to distinguish cases eventually leading to consensus from those cases where a compromise is far from achievable. Finally, we analyze discussion networks and conclude that edit wars are mainly fought by few editors only.
    Comment: Supporting information added
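
    The abstract does not say how burstiness is quantified. One widely used measure, shown here purely as an illustrative assumption, is the coefficient B = (σ − μ) / (σ + μ), where μ and σ are the mean and standard deviation of the times between consecutive events. B is -1 for perfectly regular activity, near 0 for Poisson-like activity, and approaches 1 for highly bursty activity. The function name and the example timestamps are hypothetical.

```python
import statistics


def burstiness(timestamps):
    """Burstiness coefficient B = (sigma - mu) / (sigma + mu) of the
    inter-event times. This is an assumed measure, not necessarily
    the one used in the paper."""
    times = sorted(timestamps)
    if len(times) < 3:
        raise ValueError("need at least three events")
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    mu = statistics.fmean(gaps)
    sigma = statistics.pstdev(gaps)
    if mu + sigma == 0:
        raise ValueError("all events are simultaneous")
    return (sigma - mu) / (sigma + mu)


# Evenly spaced edits versus edits clustered into short bursts
# (timestamps in arbitrary units, made up for the example).
regular = [0, 10, 20, 30, 40]
clustered = [0, 1, 2, 3, 100, 101, 102, 103, 200]
print(burstiness(regular), burstiness(clustered))
```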