153,592 research outputs found

Cross-Language Plagiarism Detection

Author: Martin Potthast
Alberto Barrón-Cedeño
Benno Stein
Paolo Rosso
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Cross-language plagiarism detection deals with the automatic identification and extraction of plagiarism in a multilingual setting. In this setting, a suspicious document is given, and the task is to retrieve all sections from the document that originate from a large, multilingual document collection. Our contributions in this field are as follows: (1) a comprehensive retrieval process for cross-language plagiarism detection is introduced, highlighting the differences from monolingual plagiarism detection, (2) state-of-the-art solutions for two important subtasks are reviewed, (3) retrieval models for the assessment of cross-language similarity are surveyed, and (4) the three models CL-CNG, CL-ESA and CL-ASA are compared. Our evaluation is of realistic scale: it relies on 120,000 test documents which are selected from the corpora JRC-Acquis and Wikipedia, so that for each test document highly similar documents are available in all of the six languages English, German, Spanish, French, Dutch, and Polish. The models are employed in a series of ranking tasks, and more than 100 million similarities are computed with each model. The results of our evaluation indicate that CL-CNG, despite its simple approach, is the best choice to rank and compare texts across languages if they are syntactically related. CL-ESA almost matches the performance of CL-CNG, but on arbitrary pairs of languages. CL-ASA works best on "exact" translations but does not generalize well.

This work was partially supported by the TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 project and the CONACyT-Mexico 192021 grant.

Potthast, M.; Barrón Cedeño, LA.; Stein, B.; Rosso, P. (2011). Cross-Language Plagiarism Detection. Language Resources and Evaluation. 45(1):45-62. https://doi.org/10.1007/s10579-009-9114-z

Crossref

RiuNet

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Introductory programming: a systematic literature review

Author: Abu Naser Samy S.
Agarwal Achla
Ahmed
Ahren T. C.
Al-Jarrah Ahmad
Alammary Ali
Annamalai Subashini
Ayub Mewati
Badri Suzan
Bai Yu
Baird Bridget
Bandura Albert
Barlow-Jones Glenda
Bayliss Jessica D.
Ben-Ari Mordechai
Bennedsen Jens
Bennett Chris
Berglund Anders
Berland Matthew
Briggs Tom
Bumbacher Engin
Burch Carl
Carbonaro Antonella
Carbone Angela
Cardell-Oliver Rachel
Chad Lane H
Char Bruce
Charalampos Spyropoulos
Charles Therese
Chinn Donald
Corney Malcolm
Coull Natalie J
Crawford Stewart
Cruz Gilbert
de Raadt Michael
Devey Adrian
Dickson Paul E.
Dillon Edward
Doherty Liam
Durrheim Mark S.
D’Souza Daryl
Eagly Alice H
Edgcomb Alex
Edwards Stephen H.
Falkner Katrina
Firmalo Fabic Geela Venise
Fonseca Fred
Fürst Luka
Garner Stuart
Goadrich Mark
Gonzalez Gracielo
Gudmundsen Dee
Haghighi Pari Delir
Hare Brian K
Heliotis James
Hooshyar Danial
Hovemeyer David
Hu Minjie
Hu Yun-Jen
Huang Chenn-Jung
Jacqueline
Jayal Ambikesh
Jurado Francisco
Kanaparan Geetha
Kasto Nadia
Kiran L.
Kirby Stephen
Kitchenham Barbara
Kouznetsova Svetlana
Kölling Michael
LeJeune Noel
Leska Chuck
Lipman Derrell
Lister Raymond
Lopez Mike
Lulis Evelyn
Luoma Harri
Major L.
McKeown Jim
McWhorter William Isaac
Medley M. Dee
Mentis Alexander
Menyhárt László
Mullins Paul
Munson Jonathan P.
Muntha Surya
Murphy Laurie
Neto Vicente Lustosa
Nguyen Thuy-Linh
Okada Ken
Orehovački Tihomir
Palmer James Dean
Park Myung Ah
Parsons Dale
Paul Jody
Peachock Patrick
Pearce Janice L.
Pero Štefan
Price Kellie
Quintin
Rajala Teemu
Ramli R.Z.
Ray Andrew
Rodrigo Maria Mercedes T
Roels Reinout
Rountree Janet
Russo Mark F.
Sanou Loé
Schoeffel Pablo
Schramm Joachim
Shabalina Olga
Sharp Jason H
Sheard Judy
Shuhidan Shuhaida
Sindre Guttorm
Skudder Ben
Song Hosung
Sorva Juha
Sung Kelvin
Takemura Yasuhiro
Teague D.
Teague Donna
Thompson Errol
Torrey Lisa
Truong Nghi
Vincenti Giovanni
Wang Hong
Watkins Kera Z. B.
Weragama Dinesha
Whalley Jacqueline
Whittall S. J.
Whittinghill David
Wiebe E
Wood Krissi
Yoo Jungsoon P
Yusri Nurliana
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/07/2018

As computing becomes a mainstream discipline embedded in the school curriculum and acts as an enabler for an increasing range of academic disciplines in higher education, the literature on introductory programming is growing. Although there have been several reviews that focus on specific aspects of introductory programming, there has been no broad overview of the literature exploring recent trends across the breadth of introductory programming. This paper is the report of an ITiCSE working group that conducted a systematic review in order to gain an overview of the introductory programming literature. Partitioning the literature into papers addressing the student, teaching, the curriculum, and assessment, we explore trends, highlight advances in knowledge over the past 15 years, and indicate possible directions for future research.

Michigan Technological University

Crossref

Falmouth University Research Repository (FURR)

ResearchOnline@GCU

Text Summarization Techniques: A Brief Survey

Author: Allahyari Mehdi
Assefi Mehdi
Gutierrez Juan B.
Kochut Krys
Pouriyeh Seyedamin
Safaei Saeid
Trippe Elizabeth D.
Publication date: 01/01/2017

In recent years, there has been an explosion in the amount of text data from a variety of sources. This volume of text is an invaluable source of information and knowledge which needs to be effectively summarized to be useful. In this review, the main approaches to automatic text summarization are described. We review the different processes for summarization and describe the effectiveness and shortcomings of the different methods.
Comment: Some reference formats have been updated

arXiv.org e-Print Archive

Georgia Southern University: Digital Commons@Georgia Southern

Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

Author: Gujarathi Hemal S
McDonald-Maier Klaus D
Qadri Muhammad Yasir
Publication venue: 'Academy Publisher'
Publication date: 01/01/2009

Technological evolution has significantly increased the number of transistors for a given die area and raised switching speeds from a few MHz to the GHz range. This inversely proportional decline in size and boost in performance consequently demand a lower supply voltage and effective power dissipation in chips with millions of transistors. This has triggered a substantial amount of research into power reduction techniques for almost every aspect of the chip, particularly the processor cores it contains. This paper presents an overview of techniques for achieving power efficiency, mainly at the processor core level, but also visits related domains such as buses and memories. Various processor parameters and features, such as supply voltage, clock frequency, cache, and pipelining, can be optimized to reduce the processor's power consumption. This paper discusses various ways in which these parameters can be optimized. Emerging power-efficient processor architectures are also reviewed, and research activities are discussed that should help the reader identify how these factors contribute to power consumption. Some of these concepts are already established, whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

University of Essex Research Repository

CiteSeerX

Crossref

Applying a User-centred Approach to Interactive Visualization Design

Author: A. Cooper
A. MacEachren
A. Sutcliffe
B. Latour
B. Shneiderman
C. Chen
C. North
C. van der Lelie
C. Ware
D. Benyon
D. G. Novick
D. Morgan
G. J. Trafton
H. Beyer
H. Javahery
H. Rauwerda
J. A. Landay
J. D. Thompson
J. M. Carroll
J. Nielsen
J. Preece
J. Seo
J. Zhang
K. Dunbar
L. Arnstein
L. E. Wood
M. Clamp
M. Graham
M. Rettig
M. Tory
O. Kulyk
P. A. Pevzner
P. Figueroa
R. Chenna
R. Poppe
R. Spence
S. Westerman
W. E. Mackay
Publication venue: Springer Verlag
Publication date: 01/01/2008

Analysing users in their context of work and finding out how and why they use different information resources is essential to provide interactive visualisation systems that match their goals and needs. Designers should actively involve the intended users throughout the whole process. This chapter presents a user-centred approach for the design of interactive visualisation systems. We describe three phases of the iterative visualisation design process: the early envisioning phase, the global specification phase, and the detailed specification phase. The whole design cycle is repeated until some criterion of success is reached. We discuss different techniques for the analysis of users, their tasks and domain. Subsequently, the design of prototypes and evaluation methods in visualisation practice are presented. Finally, we discuss the practical challenges in design and evaluation of collaborative visualisation environments. Our own case studies and those of others are used throughout the whole chapter to illustrate various approaches.

VU Research Portal

Crossref

University of Twente Research Information

Real-time and Probabilistic Temporal Logics: An Overview

Author: Konur Savas
Publication date: 01/01/2010

Over the last two decades, logical formalisms for specifying and verifying real-time systems have been studied extensively. Temporal logics have been an important research subject within this direction. Although numerous logics have been introduced for the formal specification of real-time and complex systems, an up-to-date comprehensive analysis of these logics does not exist in the literature. In this paper we analyse real-time and probabilistic temporal logics which have been widely used in this field. We extrapolate the notions of decidability, axiomatizability, expressiveness, model checking, etc. for each logic analysed. We also provide a comparison of features of the temporal logics discussed.

arXiv.org e-Print Archive

CiteSeerX