
    Towards the Automatic Classification of Documents in User-generated Classifications

    There is a huge amount of information scattered across the World Wide Web. Because information flows through the WWW at high speed, it needs to be organized so that users can access it easily. In the past, information was generally organized manually, by matching document contents to predefined categories. There are two approaches to this text-based categorization: manual and automatic. In the manual approach, a human expert performs the classification task; in the automatic approach, supervised classifiers are used to classify resources. Supervised classification still requires manual effort to create training data before the automatic classification can take place. In our new approach, we propose to classify documents automatically by means of semantic keywords and to generate formulas from these keywords. This reduces human participation by combining the knowledge of a given classification with the knowledge extracted from the data. The main focus of this PhD thesis, supervised by Prof. Fausto Giunchiglia, is the automatic classification of documents into user-generated classifications. The key benefits foreseen from this automatic document classification are not limited to search engines; they extend to many other fields, such as document organization, text filtering, and semantic index management.
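
    As a rough illustration of keyword-based classification into a user-defined hierarchy, the Python sketch below assigns a document to the category whose keywords, together with those inherited from its ancestors, overlap most with the document's words. It is a simple keyword-overlap heuristic, not the formula-based method of the thesis. The `Category` structure, the sample taxonomy, and the scoring rule are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

_PUNCT = ".,;:!?()\"'"


@dataclass
class Category:
    """A node in a user-generated classification (hypothetical structure)."""
    name: str
    keywords: set[str]
    children: list[Category] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    """Lowercased word set; a real system would add stemming and stop-word removal."""
    words = (w.strip(_PUNCT).lower() for w in text.split())
    return {w for w in words if w}


def classify(doc: str, root: Category) -> tuple[str, int]:
    """Return (category path, score) for the node whose keywords, plus those
    inherited from its ancestors, overlap most with the document's words.
    Returns ("", 0) when no category matches (a hypothetical scoring rule)."""
    terms = tokenize(doc)
    best: tuple[str, int] = ("", 0)

    def visit(node: Category, path: list[str], inherited: set[str]) -> None:
        nonlocal best
        context = inherited | node.keywords  # a node's meaning includes its ancestors'
        full_path = path + [node.name]
        score = len(terms & context)
        # Strict ">" keeps the shallowest node among ties, so a child only wins
        # when its own keywords add at least one match.
        if score > best[1]:
            best = ("/".join(full_path), score)
        for child in node.children:
            visit(child, full_path, context)

    visit(root, [], set())
    return best


# Hypothetical user-generated taxonomy
taxonomy = Category("web", {"web"}, [
    Category("search", {"search", "engine", "query"}),
    Category("social", {"social", "network", "profile"}),
])
print(classify("A meta-search engine that answers each query on the Web.", taxonomy))
# -> ('web/search', 3)
```

Because each node inherits its ancestors' keywords, a document is pushed down the hierarchy only as far as the more specific keywords actually match.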

    Cluster Generation and Cluster Labelling for Web Snippets: A Fast and Accurate Hierarchical Solution

    This paper describes Armil, a meta-search engine that groups into disjoint labelled clusters the Web snippets returned by auxiliary search engines. The cluster labels generated by Armil provide the user with a compact guide to assessing the relevance of each cluster to her information need. Striking the right balance between running time and cluster well-formedness was a key point in the design of our system. Both the clustering and the labelling tasks are performed on the fly by processing only the snippets provided by the auxiliary search engines, and use no external sources of knowledge. Clustering is performed by means of a fast version of the furthest-point-first algorithm for metric k-center clustering. Cluster labelling is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure. We have tested the clustering effectiveness of Armil against Vivisimo, the de facto industrial standard in Web snippet clustering, using as benchmark a comprehensive set of snippets obtained from the Open Directory Project hierarchy. According to two widely accepted "external" metrics of clustering quality, Armil achieves better performance levels by 10%. We also report the results of a thorough user evaluation of both the clustering and the cluster labelling algorithms. On a standard 1GHz machine, Armil performs clustering and labelling altogether in less than one second.
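
    The furthest-point-first idea can be sketched in Python as the textbook Gonzalez heuristic for metric k-center. It picks an arbitrary first center, repeatedly promotes the point furthest from all current centers, and assigns each point to its nearest center. This is not Armil's optimized "fast version". Representing snippets as word sets compared with Jaccard distance, and the sample snippets, are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def jaccard_distance(a: frozenset[str], b: frozenset[str]) -> float:
    """1 - |A & B| / |A | B|; a metric on finite sets (assumed snippet distance)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def furthest_point_first(points: Sequence[T], k: int,
                         dist: Callable[[T, T], float]) -> tuple[list[int], list[int]]:
    """Basic furthest-point-first (Gonzalez) heuristic for metric k-center.

    Returns (indices of the chosen centers, cluster id of every point),
    using O(n * k) distance computations."""
    if not points or k <= 0:
        return [], []
    centers = [0]                                   # arbitrary first center
    nearest = [dist(p, points[0]) for p in points]  # distance to closest center so far
    assign = [0] * len(points)
    while len(centers) < min(k, len(points)):
        # Promote the point that is furthest from every current center
        far = max(range(len(points)), key=nearest.__getitem__)
        if nearest[far] == 0:                       # all points coincide with a center
            break
        centers.append(far)
        cluster = len(centers) - 1
        for i, p in enumerate(points):
            d = dist(p, points[far])
            if d < nearest[i]:                      # new center is closer: reassign
                nearest[i] = d
                assign[i] = cluster
    return centers, assign


# Hypothetical snippets for an ambiguous query
snippets = ["jaguar car dealer", "jaguar car price",
            "jaguar cat habitat", "jaguar cat wildlife"]
points = [frozenset(s.split()) for s in snippets]
centers, assign = furthest_point_first(points, 2, jaccard_distance)
print(centers, assign)  # -> [0, 2] [0, 0, 1, 1]
```

Each new center is as far as possible from the existing ones, so for any metric distance the resulting cluster radius is within a factor of 2 of the optimal k-center radius.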

    An adaptive meta-search engine considering the user’s field of interest

    Existing meta-search engines return web search results based on each page's relevance to the query, its popularity, and its content. A meta-search engine is needed that can rank results according to the user's field of interest. Social networks can help reveal users' tendencies, favorites, skills, and interests. In this paper we propose MSE, a meta-search engine for document retrieval that uses the user's social information. In this approach, each user is assumed to have a profile listing his or her fields of interest. MSE extracts the main phrases from the titles and short descriptions of the results received from the underlying search engines. It then clusters these phrases with a Self-Organizing Map neural network, and the resulting clusters are ranked according to the user's field of interest. We compared the proposed MSE against two other meta-search engines. The experimental results show the efficiency and effectiveness of the proposed method.
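
    The abstract does not give MSE's phrase representation or map configuration, so the following Python sketch rests on several assumptions. Phrases are taken to be already encoded as numeric vectors. A one-dimensional self-organizing map with a linearly decaying learning rate and a Gaussian neighborhood does the clustering. Clusters are then ranked by the cosine similarity between each node's prototype and a hypothetical user-interest vector.

```python
import math


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, x):
    """Index of the map node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: _sq_dist(weights[j], x))


def train_som(data, n_nodes, epochs=100, lr0=0.5, sigma0=1.0):
    """Train a one-dimensional self-organizing map and return its node weights."""
    # Deterministic initialization: place nodes on evenly spaced input vectors.
    step = (len(data) - 1) / max(n_nodes - 1, 1)
    weights = [list(data[round(j * step)]) for j in range(n_nodes)]
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.1)  # neighborhood radius shrinks
        for x in data:
            bmu = best_matching_unit(weights, x)
            for j in range(n_nodes):
                # Gaussian neighborhood over the 1-D grid of node indices
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                weights[j] = [w + lr * h * (xi - w) for w, xi in zip(weights[j], x)]
    return weights


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_clusters(weights, assign, interest):
    """Order non-empty clusters by how closely their prototype matches the
    (hypothetical) user-interest vector, most relevant first."""
    used = sorted(set(assign))
    return sorted(used, key=lambda j: cosine(weights[j], interest), reverse=True)


# Toy phrase vectors over a hypothetical two-term vocabulary ["football", "python"]
phrases = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
weights = train_som(phrases, n_nodes=2)
assign = [best_matching_unit(weights, p) for p in phrases]
print(assign, rank_clusters(weights, assign, interest=[0.0, 1.0]))
```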

    Semantic Clustering of Search Engine Results

    This paper presents a novel approach to clustering search engine results that relies on the semantics of the retrieved documents rather than on the terms they contain. The approach considers both lexical and semantic similarities among documents and applies a spreading-activation technique to generate semantically meaningful clusters. As a result, documents that are semantically similar are clustered together, rather than documents that merely share terms. A prototype was implemented, and several experiments were conducted to test the proposed solution. The experimental results confirm that the proposed solution achieves remarkable results in terms of precision.
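
    Spreading activation can be illustrated with a small Python sketch. It assumes a hypothetical weighted concept graph and arbitrary parameters (decay 0.5, two pulses, firing threshold 0.05); the abstract does not give the paper's actual network or settings. Each document's terms seed the graph, activation flows to related concepts, and documents are compared by the cosine similarity of their activation vectors.

```python
import math
from collections import defaultdict

# Hypothetical semantic network: concept -> {related concept: link weight in [0, 1]}
SEMANTIC_NET = {
    "car": {"automobile": 0.9, "vehicle": 0.7},
    "automobile": {"car": 0.9, "vehicle": 0.8},
    "vehicle": {"car": 0.7, "automobile": 0.8},
    "cat": {"feline": 0.9, "animal": 0.6},
    "feline": {"cat": 0.9, "animal": 0.7},
}


def spread_activation(seeds, graph, decay=0.5, pulses=2, threshold=0.05):
    """Propagate activation from a document's terms through a weighted concept graph.

    Each seed term starts with activation 1.0. On every pulse, each node whose
    newly received energy reaches `threshold` passes energy * weight * decay to
    its neighbors. Returns the accumulated activation of every reached concept."""
    activation = defaultdict(float)
    for term in seeds:
        activation[term] += 1.0
    frontier = dict(activation)
    for _ in range(pulses):
        received = defaultdict(float)
        for node, energy in frontier.items():
            if energy < threshold:  # too weak to fire
                continue
            for neighbor, weight in graph.get(node, {}).items():
                received[neighbor] += energy * weight * decay
        for node, energy in received.items():
            activation[node] += energy
        frontier = received
    return dict(activation)


def cosine(a, b):
    """Cosine similarity of two sparse activation vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


docs = {"d1": ["car", "dealer"], "d2": ["automobile", "dealer"], "d3": ["cat", "dealer"]}
vectors = {d: spread_activation(terms, SEMANTIC_NET) for d, terms in docs.items()}
print(cosine(vectors["d1"], vectors["d2"]), cosine(vectors["d1"], vectors["d3"]))
```

In this toy example the three documents are lexically equidistant, since each pair shares only "dealer". After spreading, however, d1 ("car") scores higher with d2 ("automobile") than with d3 ("cat").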

    From Classification to Indexing: How Automation Transforms the Way we Think

    To classify is to organize the particulars in a body of information according to some meaningful scheme. Because they have difficulty recognizing metaphor, synonyms and homonyms, and levels of generalization, the artificial-intelligence applications currently in widespread use cannot deal effectively with classification. Indexing conveys nothing about relationships; it pinpoints information on particular topics without reference to anything else. Keyword searching is a form of indexing, and here artificial intelligence excels. Growing reliance on automated means of accessing information brings an increase in indexing and a corresponding decrease in classification. The result is a shift from the modernist view of the world as permanently and hierarchically structured to the indeterminacy and contingency associated with postmodernism.
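
    To make the contrast concrete, here is a minimal Python sketch of keyword indexing as an inverted index, which is an assumed, simplified model of how keyword search works. It maps each word to the documents that contain it and answers a query by intersecting those sets. Nothing in it records relationships between terms, so a query for a broader concept finds nothing. The sample documents and function names are hypothetical.

```python
from collections import defaultdict

_PUNCT = ".,;:!?()\"'"


def build_index(docs):
    """Inverted index: each lowercase keyword -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            word = word.strip(_PUNCT)
            if word:
                index[word].add(doc_id)
    return index


def keyword_search(index, query):
    """Ids of documents containing every query keyword (boolean AND)."""
    postings = [index.get(w.strip(_PUNCT), set()) for w in query.lower().split()]
    return set.intersection(*postings) if postings else set()


docs = {1: "Dogs are mammals.", 2: "Wolves are mammals.", 3: "Dogs chase cats."}
index = build_index(docs)
print(keyword_search(index, "dogs"))    # -> {1, 3}
print(keyword_search(index, "canine"))  # -> set(): the index knows no broader terms
```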

    History Of Search Engines

    As the number of sites on the Web increased in the mid-to-late 1990s, search engines started appearing to help people find information quickly. Search engines developed business models to finance their services, such as the pay-per-click programs offered by Open Text in 1996 and then by Goto.com in 1998. Goto.com changed its name to Overture in 2001, was purchased by Yahoo! in 2003, and now offers paid search opportunities for advertisers through Yahoo! Search Marketing. Google also began to offer advertisements on search results pages in 2000 through the Google AdWords program. By 2007, pay-per-click programs had proved to be the primary money-makers for search engines. In a market dominated by Google, Yahoo! and Microsoft announced in 2009 their intention to forge an alliance, and the Yahoo! & Microsoft Search Alliance eventually received approval from regulators in the US and Europe in February 2010. Search engine optimization (SEO) consultants expanded their offerings to help businesses learn about and use the advertising opportunities offered by search engines, and new agencies focusing primarily on marketing and advertising through search engines emerged. The term "Search Engine Marketing" was proposed by Danny Sullivan in 2001 to cover the spectrum of activities involved in performing SEO, managing paid listings at the search engines, submitting sites to directories, and developing online marketing strategies for businesses, organizations, and individuals. Among the more recent theoretical advances is Search Engine Marketing Management (SEMM). SEMM covers activities including SEO but focuses on return on investment (ROI) management rather than on building relevant traffic (as mainstream SEO does). SEMM also integrates organic SEO, which tries to achieve top rankings without paying for placement, and pay-per-click SEO. For example, some of the attention is placed on web page layout design and on how content and information are displayed to the website visitor.

    Google and Beyond: Finding Information Using Search Engines, and Evaluating Your Results

    Searching the World Wide Web can be a daunting task. The Web has expanded at such a rapid pace that nobody knows exactly how large it is, but it is safe to say that many billions of Web pages reside on servers all over the world. Add to this the task of evaluating the information found on the Web and choosing among the hundreds of different search tools available (directories, search engines, meta-searchers, and specialized search engines), and the situation begins to feel overwhelming. Fortunately, learning a few essential concepts of Web searching and site evaluation, along with mastering a handful of the top-rated search tools, can make the picture much brighter. This paper discusses the basics of a few search engines and provides examples of advanced searches that can increase your searching efficiency. It also addresses the task of assessing the quality of the information you find on the Internet. In addition, it lists and describes places to go for more information on improving your Internet searching and evaluating skills.
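
    As an illustration of advanced searches, the Python sketch below composes a query from operators commonly documented for Google and URL-encodes it. The operators are an exact phrase in double quotes, a leading minus to exclude a word, `site:`, and `filetype:`. The helper names are hypothetical, and other engines may use different syntax or change operator support over time.

```python
from urllib.parse import urlencode


def build_query(all_words=(), phrase=None, exclude=(), site=None, filetype=None):
    """Hypothetical helper: join search terms and operators into one query string."""
    parts = list(all_words)
    if phrase:
        parts.append(f'"{phrase}"')            # match this exact phrase
    parts.extend(f"-{w}" for w in exclude)     # drop pages containing these words
    if site:
        parts.append(f"site:{site}")           # restrict to a domain or TLD
    if filetype:
        parts.append(f"filetype:{filetype}")   # restrict to a document format
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """Hypothetical helper: URL-encode the query as the `q` parameter."""
    return f"{base}?{urlencode({'q': query})}"


q = build_query(["evaluating"], phrase="web sources", exclude=["wikipedia"],
                site="edu", filetype="pdf")
print(q)              # -> evaluating "web sources" -wikipedia site:edu filetype:pdf
print(search_url(q))
```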

    Using Search Engines to Find Online Medical Information

    Al-Ubaydli shares some useful tips for making the most of search engines