7,507 research outputs found

    Proceedings of the 2nd Computer Science Student Workshop: Microsoft Istanbul, Turkey, April 9, 2011

    Get PDF

    A comparative evaluation of popular search engines on finding Turkish documents for a specific time period

    Get PDF
This study evaluates the popular search engines Google, Yahoo, Bing, and Ask on finding Turkish documents by comparing their current performance with their performance measured six years ago. Furthermore, the study reveals the current information retrieval effectiveness of the search engines. First, the Turkish queries were run on each search engine separately. Each retrieved document was classified, and precision ratios were calculated at various cut-off points for each query-engine pair. These ratios were then compared with the ratios from six years ago. Besides descriptive statistics, Mann-Whitney U and Kruskal-Wallis H tests were used to identify statistically significant differences. All search engines except Google perform better today, with Bing showing the largest improvement over six years ago. Today, Yahoo has the highest mean precision ratios at the various cut-off points; all search engines reach their highest mean precision ratios at cut-off point 5; dead links were encountered in Google, Bing, and Ask; and repeated documents were encountered in Google and Yahoo.
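A minimal sketch of the evaluation method the abstract describes: precision at a cut-off point is the fraction of the top-k retrieved documents judged relevant, and per-query precision ratios from two measurement rounds can be compared with a Mann-Whitney U test. The sketch uses scipy for the test; all judgments and ratios below are invented placeholders, not the study's data, and the study's own tooling is not specified.

```python
# Precision-at-cut-off evaluation with a Mann-Whitney U comparison,
# sketched with placeholder data (not the study's actual results).
from scipy.stats import mannwhitneyu

def precision_at_k(relevance, k):
    """Fraction of the top-k retrieved documents judged relevant.
    `relevance` is a list of 0/1 judgments in rank order."""
    top = relevance[:k]
    return sum(top) / k if top else 0.0

# One query's judged results for a hypothetical engine (1 = relevant).
judgments = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
for k in (5, 10):
    print(f"P@{k} = {precision_at_k(judgments, k):.2f}")

# Compare per-query P@5 ratios from two measurement rounds.
ratios_now = [0.8, 0.6, 1.0, 0.4, 0.6]    # hypothetical current values
ratios_then = [0.4, 0.6, 0.6, 0.2, 0.4]   # hypothetical values from six years ago
stat, p = mannwhitneyu(ratios_now, ratios_then, alternative="two-sided")
print(f"Mann-Whitney U = {stat}, p = {p:.3f}")
```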

    Generic Text Summarization for Turkish

    Full text link

    Categorization of web sites in Turkey with SVM

    Get PDF
Thesis (Master)--Izmir Institute of Technology, Computer Engineering, Izmir, 2004. Includes bibliographical references (leaves 61-63). Text in English; abstract in Turkish and English. ix, 70 leaves.
In this study, "Categorization of Web Sites in Turkey with SVM", after a brief introduction to the World Wide Web and a more detailed description of text categorization and web site categorization concepts, the categorization of web sites, including all prerequisites for the classification task, is presented. As an information resource, the web has an undeniable importance in human life. However, the huge structure of the web and its uncontrolled growth have given rise to new information retrieval research areas in recent years. Web mining, the general name for these studies, investigates activities and structures on the web to automatically discover and gather meaningful information from web documents. It consists of three subfields: web structure mining, web content mining, and web usage mining. In this project, web content mining was applied to web sites in Turkey during the categorization process. The Support Vector Machine, a supervised learning method based on statistics and the principle of structural risk minimization, is used as the machine learning technique for web site categorization. This thesis is intended to draw a conclusion about web site distributions with respect to text-based thematic categorization. The 12 top-level categories of the popular web directory Yahoo were used in this project. Besides the main purpose, we gathered several descriptive statistics about web sites and the content used in HTML pages, such as metatag usage percentages, HTML design structures, and plug-in usage. The process starts with a web downloader that retrieves page contents and other information, such as frame content, from each web site. Next, the downloaded documents are manipulated, parsed, and simplified, which completes the preparations for the categorization task. Then, by applying the Support Vector Machine (SVM) package SVMLight, developed by Thorsten Joachims, web sites are classified under the given categories. The classification results in the last section show that some overlapping categories exist and that accuracy and precision values are between 60 and 80 percent. In addition to the categorization results, we found that almost 17% of web sites utilize HTML frames and 9,367 web sites include meta keywords.
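As a rough illustration of the pipeline the abstract describes (download, parse, vectorize, classify), the sketch below trains a linear SVM text classifier. The thesis used Joachims' SVMLight; scikit-learn's LinearSVC stands in for it here, and the documents and category labels are invented placeholders rather than the thesis's Yahoo categories or data.

```python
# Text categorization with a linear SVM, sketched with toy data.
# LinearSVC substitutes for the SVMLight package used in the thesis.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Toy training set: page text paired with a top-level category label.
train_texts = [
    "latest match scores and football league tables",
    "stock markets, interest rates and company earnings",
    "new album reviews and upcoming concert dates",
    "parliament passed the new budget bill today",
]
train_labels = ["sports", "business", "entertainment", "news"]

# Bag-of-words features with TF-IDF weighting feeding a linear SVM,
# the usual setup for text categorization.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_texts, train_labels)

print(model.predict(["football league match results this weekend"]))  # likely ['sports']
```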

    From Frequency to Meaning: Vector Space Models of Semantics

    Full text link
Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
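To make the term-document flavor of VSM concrete, the toy sketch below builds term-frequency vectors for a few placeholder documents and compares them by cosine similarity, the standard VSM similarity measure. The corpus and the weighting choice (raw counts rather than tf-idf) are illustrative assumptions, not taken from the paper.

```python
# Term-document vector space model in miniature: term-frequency
# vectors per document, compared by cosine similarity.
import math
from collections import Counter

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "semantics of vector space models",
]

# Term-document matrix represented as one term-frequency vector per document.
vocab = sorted({w for d in docs for w in d.split()})
vectors = [Counter(d.split()) for d in docs]

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in vocab)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(f"doc0 vs doc1: {cosine(vectors[0], vectors[1]):.2f}")  # shared vocabulary -> higher
print(f"doc0 vs doc2: {cosine(vectors[0], vectors[2]):.2f}")  # no overlap -> 0.00
```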

    EBSLG Annual General Conference, 18. - 21.05.2010, Cologne. Selected papers

    Get PDF
The Annual General Conference of the European Business Schools Librarians Group (EBSLG) took place from 18 to 21 May 2010 at the Universitäts- und Stadtbibliothek (USB) Köln. The EBSLG is a relatively small but exclusive group of library directors and librarians in leadership positions from the libraries of leading business schools. The conference focused on two main themes: the first dealt with library portals and library search engines; the second covered questions of library organisation, such as a library's organisational structure, outsourcing, and relationship management. This volume contains selected conference papers.

    Overview of the 2005 cross-language image retrieval track (ImageCLEF)

    Get PDF
The purpose of this paper is to outline efforts from the 2005 CLEF cross-language image retrieval campaign (ImageCLEF). The aim of this CLEF track is to explore the use of both text and content-based retrieval methods for cross-language image retrieval. Four tasks were offered in the ImageCLEF track: ad-hoc retrieval from a historic photographic collection, ad-hoc retrieval from a medical collection, an automatic image annotation task, and a user-centered (interactive) evaluation task that is explained in the iCLEF summary. Twenty-four research groups from a variety of backgrounds and nationalities (14 countries) participated in ImageCLEF. In this paper we describe the ImageCLEF tasks and the submissions from participating groups, and summarise the main findings.