4 research outputs found

    Automatic Geotagging of Russian Web Sites

    The poster describes a fast, simple, yet accurate method for associating large numbers of web resources stored in a search engine database with geographic locations. The method uses location-by-IP data, domain names, and content-related features: ZIP and area codes. The novelty of the approach lies in building the location-by-IP database with a continuous IP blocks method. Another contribution is domain name analysis. The method uses search engine infrastructure and makes it possible to effectively associate large amounts of search engine data with geography on a regular basis. Experiments were run on the Yandex search engine index; the evaluation confirmed the efficacy of the approach.
    ACM Special Interest Group on Hypertext, Hypermedia, and We

    Automatic geotagging of Russian web sites


    One model to rule them all: unified classification model for geotagging websites

    The paper presents a novel approach to finding the regional scopes of websites (geotagging). It relies on a single binary classification model per region type to perform multi-label classification, and it uses a variety of features that have not yet been used together for machine-learning-based regional classification of websites. The evaluation demonstrates the advantage of the "one model per region type" method over the traditional "one model per region" approach.

    A Unified Model for Geoclassification of Websites (Единая модель для геоклассификации веб-сайтов)

    The paper presents a novel approach to finding the regional scopes of websites (geotagging). Unlike traditional approaches to multi-label classification, which generally train a separate classification model for each class (region), the proposed method trains a single model that is used for all regions of the same type (e.g. cities). This is made possible by "relative" features, which indicate how a selected region compares with the other regions for a given website. The classification system uses a variety of heterogeneous features that have not yet been used together for machine-learning-based regional classification of websites. The evaluation demonstrates the advantage of the "one model per region type" method over the traditional "one model per region" approach. A separate experiment demonstrates that the proposed classifier can successfully detect regions that were not present in the training set, which is impossible for traditional approaches.
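The "one model per region type" idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two relative features (a region's share of all region mentions on a site, and its ratio to the most-mentioned region), the toy mention counts, and the use of a small logistic regression as the binary classifier. The abstract does not say which features or learner the authors actually used.

```python
import math

# Hypothetical toy data: per-site mention counts by city, plus the true regions.
SITES = [
    ({"Moscow": 40, "Kazan": 5, "Omsk": 5}, {"Moscow"}),
    ({"Kazan": 30, "Moscow": 10}, {"Kazan"}),
    ({"Omsk": 12, "Kazan": 11, "Moscow": 1}, {"Omsk", "Kazan"}),
]

def relative_features(counts, region):
    """Describe `region` relative to the other regions seen on one site."""
    total = sum(counts.values())
    top = max(counts.values())
    c = counts.get(region, 0)
    # Share of all mentions, and ratio to the most-mentioned region.
    return [c / total, c / top]

def build_pairs(sites):
    """Turn every (site, candidate region) pair into one binary training row."""
    rows = []
    for counts, true_regions in sites:
        for region in counts:
            label = 1.0 if region in true_regions else 0.0
            rows.append((relative_features(counts, region), label))
    return rows

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, lr=1.0, epochs=5000):
    """Fit a single logistic-regression model with batch gradient descent."""
    w, b = [0.0, 0.0], 0.0
    n = len(rows)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in rows:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b

def predict_regions(model, counts, threshold=0.5):
    """Apply the same model to every candidate region of a site."""
    w, b = model
    result = set()
    for region in counts:
        x = relative_features(counts, region)
        if sigmoid(w[0] * x[0] + w[1] * x[1] + b) > threshold:
            result.add(region)
    return result

model = train(build_pairs(SITES))
# "Tver" never appears in SITES; the model only sees its relative features.
print(predict_regions(model, {"Tver": 20, "Moscow": 2}))
```

Because the features never encode which region is being scored, the same trained model can be applied to a city that was absent from the training data, which is the property the abstract highlights. A per-region model, by contrast, would have nothing to apply to such a city.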