
    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.

    Categorization of web sites in Turkey with SVM

    Thesis (Master) -- Izmir Institute of Technology, Computer Engineering, Izmir, 2004. Includes bibliographical references (leaves 61-63). Text in English; abstract in Turkish and English. ix, 70 leaves. In this study, "Categorization of Web Sites in Turkey with SVM", after a brief introduction to what the World Wide Web is and a more detailed description of the concepts of text categorization and web site categorization, the categorization of web sites, including all prerequisites for the classification task, is carried out. As an information resource, the web has an undeniable importance in human life. However, the huge structure of the web and its uncontrolled growth have led to new information retrieval research areas emerging in recent years. Web mining, the general name for these studies, investigates activities and structures on the web to automatically discover and gather meaningful information from web documents. It consists of three subfields: "Web Structure Mining", "Web Content Mining", and "Web Usage Mining". In this project, the web content mining concept was applied to web sites in Turkey during the categorization process. The Support Vector Machine, a supervised learning method based on statistics and the principle of structural risk minimization, is used as the machine learning technique for web site categorization. This thesis is intended to draw a conclusion about web site distributions with respect to text-based thematic categorization. The 12 top-level categories of the popular web directory Yahoo were used in this project. Besides the main purpose, we gathered several descriptive statistics about web sites and the content used in HTML pages, such as metatag usage percentages, HTML design structures, and plug-in usage. The process starts with a web downloader which retrieves page contents and other information, such as frame content, from each web site. Next, the downloaded documents are manipulated, parsed, and simplified. At this point, preparations for the categorization task are complete. Then, by applying the Support Vector Machine (SVM) package SVMlight, developed by Thorsten Joachims, web sites are classified under the given categories. The classification results in the last section show that some overlapping categories exist and that accuracy and precision values are between 60% and 80%. In addition to the categorization results, we found that almost 17% of web sites use HTML frames and 9,367 web sites include meta keywords.
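    The following is a minimal sketch of the kind of pipeline this abstract describes: represent each downloaded and parsed web site by its text, then train a linear SVM to assign one of the top-level categories. The thesis used Thorsten Joachims' SVMlight package; here scikit-learn's LinearSVC stands in for it, and the category names and example texts are hypothetical, not the thesis's data.

        # Hedged sketch: TF-IDF text features + linear SVM for web site categorization.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.svm import LinearSVC
        from sklearn.pipeline import make_pipeline

        # Hypothetical training data: (site text, top-level category label).
        train_texts = [
            "son dakika haberleri spor ekonomi",    # a news portal
            "online alisveris indirim kampanya",    # an e-commerce site
        ]
        train_labels = ["News & Media", "Business & Economy"]

        # TF-IDF features feeding a linear SVM; maximising the margin is how the
        # SVM realises the structural risk minimisation principle in practice.
        classifier = make_pipeline(TfidfVectorizer(), LinearSVC())
        classifier.fit(train_texts, train_labels)

        # Classify the text of a newly downloaded and parsed web site.
        print(classifier.predict(["futbol mac sonuclari ve transfer haberleri"]))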

    Overview of the 2005 cross-language image retrieval track (ImageCLEF)

    The purpose of this paper is to outline efforts from the 2005 CLEF cross-language image retrieval campaign (ImageCLEF). The aim of this CLEF track is to explore the use of both text and content-based retrieval methods for cross-language image retrieval. Four tasks were offered in the ImageCLEF track: ad-hoc retrieval from a historic photographic collection, ad-hoc retrieval from a medical collection, an automatic image annotation task, and a user-centered (interactive) evaluation task that is explained in the iCLEF summary. 24 research groups from a variety of backgrounds and nationalities (14 countries) participated in ImageCLEF. In this paper we describe the ImageCLEF tasks, submissions from participating groups and summarise the main findings.

    A comparative evaluation of popular search engines on finding Turkish documents for a specific time period

    This study evaluates the popular search engines Google, Yahoo, Bing, and Ask on finding Turkish documents by comparing their current performance with their performance measured six years ago. Furthermore, the study reveals the current information retrieval effectiveness of the search engines. First, the Turkish queries were run on each search engine separately. Each retrieved document was classified, and precision ratios were calculated at various cut-off points for each query and engine pair. Afterwards, these ratios were compared with the ratios from six years ago. Besides descriptive statistics, the Mann-Whitney U and Kruskal-Wallis H tests were used to find statistically significant differences. All search engines except Google perform better today, and Bing shows the largest improvement compared to six years ago. Today, Yahoo has the highest mean precision ratios at various cut-off points; all search engines have their highest mean precision ratios at cut-off point 5; dead links were encountered in Google, Bing, and Ask; and repeated documents were encountered in Google and Yahoo.
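    As a minimal sketch of the evaluation method this abstract describes, the snippet below computes precision at a cut-off point from per-query relevance judgements and applies the two non-parametric tests named in the abstract. The engine names, judgements, and numbers are invented for illustration; they are not the study's data.

        # Hedged sketch: precision@k plus Mann-Whitney U and Kruskal-Wallis H tests.
        from scipy.stats import mannwhitneyu, kruskal

        def precision_at_k(relevance, k):
            """Fraction of the top-k retrieved documents judged relevant."""
            top_k = relevance[:k]
            return sum(top_k) / k if top_k else 0.0

        # Hypothetical relevance judgements (1 = relevant) for one query on one engine.
        query_judgements = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
        print(precision_at_k(query_judgements, 5))   # precision at cut-off point 5

        # Hypothetical mean precision@5 values over a set of queries, per engine.
        google = [0.6, 0.4, 0.8, 0.6, 0.4]
        yahoo  = [0.8, 0.6, 0.8, 0.6, 0.6]
        bing   = [0.6, 0.6, 0.8, 0.4, 0.6]
        ask    = [0.4, 0.4, 0.6, 0.4, 0.2]

        # Pairwise comparison (e.g. one engine now vs. six years ago): Mann-Whitney U.
        print(mannwhitneyu(google, yahoo))
        # Comparison across all four engines at once: Kruskal-Wallis H.
        print(kruskal(google, yahoo, bing, ask))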

    Effectiveness of Title-Search vs. Full-Text Search in the Web

    Search engines sometimes search the full text of documents or web pages, and sometimes they search only selected parts of the documents, e.g. their titles. Full-text search may consume a lot of computing resources and time. It may be possible to save resources by searching only the titles of documents, assuming that a document's title provides a concise representation of its content. We tested this assumption using the Google search engine. We ran search queries that had been defined by users, distinguishing between two types of queries/users: queries of users who are familiar with the area of the search, and queries of users who are not. We found that title searches provide similar, and sometimes even slightly better, results compared to full-text searches. These results hold for both types of queries/users. Moreover, we found an advantage for title search when searching in unfamiliar areas, because the general terms used in queries about unfamiliar areas match better with the general terms that tend to be used in document titles. A sketch of the kind of comparison made here follows below.
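    The snippet below illustrates, under simplified assumptions, the comparison the paper makes: run the same query against titles only and against full text, then compare the precision of the two result lists. The documents, query, and relevance labels are invented for illustration; the study itself issued user-defined queries to Google rather than to a toy corpus.

        # Hedged sketch: title-only search vs. full-text search over a tiny invented corpus.
        documents = [
            {"title": "Information retrieval evaluation methods",
             "body": "A survey of retrieval effectiveness, precision, recall and evaluation with test collections.",
             "relevant": True},
            {"title": "Cooking with seasonal vegetables",
             "body": "Retrieval of old family recipes and evaluation of flavours from the garden.",
             "relevant": False},
        ]

        def search(docs, query, field):
            """Return documents whose chosen field contains every query term."""
            terms = query.lower().split()
            return [d for d in docs if all(t in d[field].lower() for t in terms)]

        def precision(results):
            """Fraction of returned documents that are judged relevant."""
            return sum(d["relevant"] for d in results) / len(results) if results else 0.0

        query = "retrieval evaluation"
        title_hits = search(documents, query, "title")   # matches only the relevant document
        full_hits  = search(documents, query, "body")    # also matches the off-topic document
        print("title-only precision:", precision(title_hits))
        print("full-text precision:", precision(full_hits))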