35,086 research outputs found

    Spatial information retrieval and geographical ontologies: an overview of the SPIRIT project

    A large proportion of the resources available on the World Wide Web refer to information that may be regarded as geographically located: most activities and enterprises take place in one or more places on the Earth's surface, and there is a wealth of survey data, images, maps and reports that relate to specific places or regions. Despite the prevalence of geographical context, existing web search facilities are poorly adapted to help people find information that relates to a particular location. When the name of a place is typed into a typical search engine, web pages that include that name in their text will be retrieved, but many resources that are also associated with the place are likely to be missed: resources relating to places inside the specified place may not be found, nor may resources relating to places that are nearby or that are equivalent but referred to by another name. Specifying geographical context frequently requires spatial relationships such as distance or containment, yet such terminology cannot be understood by existing search engines. Here we provide a brief survey of existing facilities for geographical information retrieval on the web, before describing a set of tools and techniques being developed in the project SPIRIT: Spatially-Aware Information Retrieval on the Internet (funded by European Commission Framework V, Project IST-2001-35047).
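
    The spatial relationships mentioned above (distance, containment) can be made concrete with a small sketch. This is not the SPIRIT implementation, only a minimal illustration of how a spatially-aware filter might evaluate a "near" relation against a gazetteer; the place names, coordinates, radius and document fields are assumptions.

    from math import radians, sin, cos, asin, sqrt

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance between two points in kilometres."""
        dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
        a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
        return 2 * 6371.0 * asin(sqrt(a))

    # Hypothetical gazetteer mapping place names to representative coordinates.
    GAZETTEER = {"cardiff": (51.48, -3.18), "newport": (51.58, -2.99)}

    def spatial_filter(documents, place, relation="near", radius_km=25.0):
        """Keep documents whose footprint satisfies the spatial relation to the place."""
        lat0, lon0 = GAZETTEER[place.lower()]
        hits = []
        for doc in documents:  # doc: {"title": ..., "lat": ..., "lon": ...}
            d = haversine_km(lat0, lon0, doc["lat"], doc["lon"])
            if relation == "near" and d <= radius_km:
                hits.append((d, doc))
        return [doc for _, doc in sorted(hits, key=lambda pair: pair[0])]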

    An automatically built named entity lexicon for Arabic

    We have successfully adapted and extended the automatic Multilingual, Interoperable Named Entity Lexicon approach to Arabic, using Arabic WordNet (AWN) and Arabic Wikipedia (AWK). First, we extract AWN’s instantiable nouns and identify the corresponding categories and hyponym subcategories in AWK. Then, we exploit Wikipedia inter-lingual links to locate correspondences between articles in ten different languages in order to identify Named Entities (NEs). We apply keyword search on AWK abstracts to cover Arabic articles that have no correspondence in any of the other languages. In addition, we perform a post-processing step to fetch further NEs from AWK that are not reachable through AWN. Finally, we investigate diacritization, using matching against geonames databases, the MADA-TOKAN tools, and different heuristics to restore the vowel marks of Arabic NEs. Using this methodology, we have extracted approximately 45,000 Arabic NEs and built, to the best of our knowledge, the largest, most mature and well-structured Arabic NE lexical resource to date. We have stored and organised this lexicon following the Lexical Markup Framework (LMF) ISO standard. We conduct a quantitative and qualitative evaluation of the lexicon against a manually annotated gold standard and achieve precision scores from 95.83% (with 66.13% recall) to 99.31% (with 61.45% recall), depending on the threshold value.
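
    A minimal sketch of the cross-lingual cue described above, under simplified assumptions (this is not the authors' pipeline): an Arabic Wikipedia article is treated as a candidate NE when its inter-lingual links point to capitalised titles in enough of the other languages. The language list and voting threshold are illustrative assumptions.

    # Assumed set of the ten comparison languages; case is only a rough NE cue
    # and is meaningless for caseless scripts (e.g., Farsi), hence the voting.
    LANGS = ["en", "fr", "de", "es", "it", "pt", "nl", "ru", "tr", "fa"]

    def is_candidate_ne(interlingual_titles, min_votes=3):
        """interlingual_titles: dict mapping language code -> article title."""
        votes = 0
        for lang in LANGS:
            title = interlingual_titles.get(lang)
            if title and title[:1].isupper():
                votes += 1
        return votes >= min_votes

    # Example: the Arabic article for Cairo links to "Cairo", "Le Caire", "Kairo", ...
    print(is_candidate_ne({"en": "Cairo", "fr": "Le Caire", "de": "Kairo"}))  # True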

    Emerging Phishing Trends and Effectiveness of the Anti-Phishing Landing Page

    Each month, more attacks are launched with the aim of making web users believe that they are communicating with a trusted entity, compelling them to share their personal and financial information. Phishing costs Internet users billions of dollars every year. Researchers at Carnegie Mellon University (CMU) created an anti-phishing landing page, supported by the Anti-Phishing Working Group (APWG), with the aim of training users to protect themselves from phishing attacks. It is used by financial institutions, phish-site take-down vendors, government organizations, and online merchants. When a potential victim clicks on a phishing link that has been taken down, he/she is redirected to the landing page. In this paper, we present a comparative analysis of two datasets that we obtained from APWG's landing page log files: one from September 7, 2008 to November 11, 2009, and the other from January 1, 2014 to April 30, 2014. We found that the landing page has been successful in training users against phishing. Forty-six percent of users clicked fewer phishing URLs from January 2014 to April 2014, which shows that training from the landing page helped users not to fall for phishing attacks. Our analysis shows that phishers have started to modify their techniques by creating more legitimate-looking URLs and buying large numbers of domains to increase their activity. We observed that phishers are exploiting ICANN-accredited registrars to launch their attacks even under strict surveillance. We also saw that phishers are trying to exploit free subdomain registration services to carry out attacks. In this paper, we also compare the phishing e-mails used by phishers to lure victims in 2008 and 2014. We found that phishing e-mails have changed considerably over time: phishers have adopted new techniques such as sending promotional e-mails and emotionally targeting users into clicking phishing URLs.
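
    As a rough illustration of the URL patterns discussed (legitimate-looking hostnames and abuse of free subdomain services), the sketch below extracts a few lexical URL features. It is not the paper's analysis code, and the brand and free-host lists are invented for the example.

    from urllib.parse import urlparse

    BRANDS = {"paypal", "apple", "bankofamerica"}                # hypothetical watch list
    FREE_SUBDOMAIN_HOSTS = {"000webhostapp.com", "weebly.com"}   # hypothetical

    def url_features(url):
        """Return simple lexical features of a URL for phishing analysis."""
        parsed = urlparse(url)
        host = parsed.hostname or ""
        labels = host.split(".")
        registered = ".".join(labels[-2:]) if len(labels) >= 2 else host
        return {
            "subdomain_depth": max(len(labels) - 2, 0),
            "brand_in_subdomain": any(b in ".".join(labels[:-2]) for b in BRANDS),
            "uses_free_subdomain_service": registered in FREE_SUBDOMAIN_HOSTS,
            "url_length": len(url),
        }

    print(url_features("http://paypal.com.secure-login.000webhostapp.com/update"))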

    eStorys: A visual storyboard system supporting back-channel communication for emergencies

    This is the post-print version of the final paper published in the Journal of Visual Languages & Computing; the published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document, and changes may have been made to this work since it was submitted for publication. Copyright @ 2010 Elsevier B.V.
    In this paper we present a new web mashup system for helping people and professionals retrieve information about emergencies and disasters. Today, the use of the web during emergencies is confirmed by the employment of systems such as Flickr, Twitter or Facebook, as demonstrated in the cases of Hurricane Katrina, the July 7, 2005 London bombings, and the April 16, 2007 shootings at Virginia Tech. Much information currently available on the web can be useful for emergency purposes, ranging from messages on forums and blogs to georeferenced photos. We present here a system that, by mixing information available on the web, is able to help both people and emergency professionals rapidly obtain data on emergency situations through multiple web channels. We introduce a visual system providing a combination of tools that have proven effective in such emergency situations: spatio/temporal search features, recommendation and filtering tools, and storyboards. We demonstrate the efficacy of our system by means of an analytic evaluation (comparing it with others available on the web), a usability evaluation by expert users (adequately trained students), and an experimental evaluation with 34 participants. Funded by the Spanish Ministry of Science and Innovation, Universidad Carlos III de Madrid, and Banco Santander.
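
    A minimal sketch of the kind of spatio/temporal filtering such a mashup could perform, assuming items gathered from the various channels are simple dictionaries (field names and sample data are illustrative, not the eStorys implementation):

    from datetime import datetime

    def in_bbox(item, south, west, north, east):
        """True if the item's coordinates fall inside the bounding box."""
        return south <= item["lat"] <= north and west <= item["lon"] <= east

    def spatio_temporal_filter(items, bbox, start, end):
        """items: dicts with 'lat', 'lon', 'timestamp' (datetime), 'channel', 'text'."""
        return [
            it for it in items
            if in_bbox(it, *bbox) and start <= it["timestamp"] <= end
        ]

    items = [
        {"lat": 51.51, "lon": -0.13, "timestamp": datetime(2005, 7, 7, 9, 0),
         "channel": "flickr", "text": "photo near Russell Square"},
    ]
    print(spatio_temporal_filter(items, (51.3, -0.5, 51.7, 0.3),
                                 datetime(2005, 7, 7), datetime(2005, 7, 8)))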

    Automatic tagging and geotagging in video collections and communities

    Automatically generated tags and geotags hold great promise to improve access to video collections and online communities. We give an overview of three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, its definition and the data set released. For each task, the reference algorithm that was used within MediaEval 2010 is presented, and comments are included on lessons learned. The Tagging Task (Professional) involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task (Wild Wild Web) involves automatically predicting the tags that are assigned by users to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information, including user-generated metadata, speech recognition transcripts, audio, and visual features.
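
    To make the Placing Task concrete, here is a deliberately naive baseline, not the MediaEval reference algorithm: it estimates a video's coordinates from the training locations of its user-supplied tags, with made-up training data and a global fallback prior.

    from collections import defaultdict

    def build_tag_model(training_videos):
        """training_videos: iterable of (tags, (lat, lon)) pairs."""
        coords = defaultdict(list)
        for tags, (lat, lon) in training_videos:
            for tag in tags:
                coords[tag].append((lat, lon))
        # Centroid per tag as a crude location estimate.
        return {t: (sum(a for a, _ in v) / len(v), sum(b for _, b in v) / len(v))
                for t, v in coords.items()}

    def predict_location(tags, tag_model, prior=(0.0, 0.0)):
        """Average the centroids of known tags; fall back to a global prior."""
        known = [tag_model[t] for t in tags if t in tag_model]
        if not known:
            return prior
        return (sum(a for a, _ in known) / len(known),
                sum(b for _, b in known) / len(known))

    model = build_tag_model([({"eiffel", "paris"}, (48.86, 2.29)),
                             ({"paris", "louvre"}, (48.86, 2.34))])
    print(predict_location({"paris"}, model))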

    A Survey of Location Prediction on Twitter

    Locations, e.g., countries, states, cities, and points of interest, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on a daily basis. Due to the world-wide coverage of its users and the real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts have focused on the new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim at offering an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing the Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions. (Accepted to TKDE; 30 pages, 1 figure.)
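
    As one concrete example of the content-based strategies such a survey covers, the sketch below trains a tiny Naive Bayes home-location classifier over a user's aggregated tweet words. The training data, city labels and vocabulary are invented for illustration only.

    import math
    from collections import Counter, defaultdict

    def train(users):
        """users: iterable of (list_of_words, city_label) pairs."""
        word_counts, city_totals, city_docs = defaultdict(Counter), Counter(), Counter()
        vocab = set()
        for words, city in users:
            word_counts[city].update(words)
            city_totals[city] += len(words)
            city_docs[city] += 1
            vocab.update(words)
        return word_counts, city_totals, city_docs, vocab

    def predict(words, model):
        """Return the most likely home city under a multinomial Naive Bayes model."""
        word_counts, city_totals, city_docs, vocab = model
        n_users = sum(city_docs.values())
        best, best_lp = None, float("-inf")
        for city in city_docs:
            lp = math.log(city_docs[city] / n_users)
            for w in words:  # Laplace-smoothed word likelihoods
                lp += math.log((word_counts[city][w] + 1) /
                               (city_totals[city] + len(vocab)))
            if lp > best_lp:
                best, best_lp = city, lp
        return best

    model = train([(["cable", "car", "fog"], "San Francisco"),
                   (["subway", "yankees"], "New York")])
    print(predict(["fog", "subway", "fog"], model))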