
    Mashup Resources and Mashup Glue

    WebDB Forum 2008: December 1-2, 2008, Gakushuin Centennial Memorial Hall (sponsored by the IPSJ Special Interest Group on Database Systems, the Database Society of Japan, and the IEICE Technical Committee on Data Engineering).
    Recently, many entertaining mashups have been introduced, in which the mashup technique is used to integrate web services, for example web mapping services with search results about shops and stations from web search engines. However, there are few studies on how to develop practical mashups, such as reporting on and integrating both web data and spreadsheet data on a local PC, or on how to improve the efficiency of routine work using mashups. In this paper, we first propose a programming style that views a mashup from two perspectives: its objects (mashup resources) and their functional composition (mashup glue). Mashup resources include Web APIs, search sites, and local data such as CSV files on users' computers; we model them as I/O machines with declared input and output types. Mashup glue includes simple composition from output to input, filter functions such as merging, sorting, and CGI links, and various visualizing components such as graph displays. In addition, we developed a Web-based development environment for such mashups.
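    The "resource / glue" style described above lends itself to a compact illustration. Below is a minimal sketch, assuming nothing about the paper's actual environment: the Resource class, the compose and merge combinators, and the stub resources are all hypothetical names of ours, meant only to show resources as typed I/O machines wired together by glue.

        from dataclasses import dataclass
        from typing import Any, Callable

        @dataclass
        class Resource:
            # A mashup resource: a Web API, a search site, or a local CSV file,
            # modeled as a machine with fixed input and output types.
            name: str
            in_type: type
            out_type: type
            run: Callable[[Any], list]

        def compose(a: Resource, b: Resource) -> Resource:
            # Glue: feed each output of `a` into `b`; the types must line up.
            assert a.out_type is b.in_type, "output/input types must match"
            return Resource(f"{a.name}>{b.name}", a.in_type, b.out_type,
                            lambda x: [r for y in a.run(x) for r in b.run(y)])

        def merge(a: Resource, b: Resource) -> Resource:
            # Glue: run both resources on the same input and concatenate results.
            assert a.in_type is b.in_type and a.out_type is b.out_type
            return Resource(f"{a.name}+{b.name}", a.in_type, a.out_type,
                            lambda x: a.run(x) + b.run(x))

        # Stubbed example: keywords from a local CSV glued to a shop search.
        csv = Resource("local-csv", str, str, lambda path: ["izakaya", "ramen"])
        search = Resource("shop-search", str, dict,
                          lambda kw: [{"shop": kw, "lat": 0.0, "lng": 0.0}])
        print(compose(csv, search).run("shops.csv"))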

    Algorithm, Experimentation

    A Deep Web wrapper is a program that extracts contents from search results. We propose a new automatic wrapper generation algorithm which discovers a repetitive pattern in search results. The repetitive pattern is expressed by token sequences consisting of HTML tags, plain texts, and wild-cards. The algorithm applies string matching with mismatches to unify variations from the template, and uses the FFT (fast Fourier transform) for efficiency. We present an empirical evaluation of the algorithm on 51 Web databases.
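    The abstract describes the method only at a high level. Its core primitive, string matching with mismatches accelerated by FFT, is commonly implemented by correlating per-symbol indicator vectors; the following is a minimal sketch of that primitive (the function name and the character-level simplification are ours; the paper matches over HTML token sequences):

        import numpy as np

        def match_counts(text: str, pattern: str) -> np.ndarray:
            # For every alignment of `pattern` against `text`, count how many
            # positions match, via FFT correlation: O(|alphabet| * n log n).
            n, m = len(text), len(pattern)
            size = 1
            while size < n + m:
                size *= 2
            counts = np.zeros(n - m + 1)
            for ch in set(pattern):
                t = np.array([1.0 if c == ch else 0.0 for c in text])
                p = np.array([1.0 if c == ch else 0.0 for c in pattern])[::-1]
                conv = np.fft.irfft(np.fft.rfft(t, size) * np.fft.rfft(p, size), size)
                counts += conv[m - 1 : n]  # conv[m-1+i] holds alignment i
            return np.rint(counts).astype(int)

        # Alignments with count >= len(pattern) - k match with at most k
        # mismatches, which is how occurrences of a template with small
        # variations can be located in a results page.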

    A Report on Metadata for Web Databases (Web Data Mining)

    An increasing number of Web databases are available on the Web. The query for such a database is not just a keyword but a complex query, and the search results are not a listing of general Web pages but a listing of records generated dynamically from the database behind the Web interface. This paper is a survey of 2,880 Web databases listed in Dnavi (Japanese National Diet Library). The first part analyses the form components TextInputFields, SelectMenus, RadioButtons, and CheckBoxes for the 880 "book search" DBs in Dnavi. The second part reports a strong connection between the input form and the output data, based on an analysis of the attribute names of 100 Web DBs chosen at random from the 1,541 DBs which accept complex queries.
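    The component counting behind the first part of the survey can be illustrated with a short stand-alone sketch. This is our own simplified reconstruction, not the paper's code: the class name is hypothetical and attribute handling is reduced to the four component types the survey counts.

        from collections import Counter
        from html.parser import HTMLParser

        class FormComponentCounter(HTMLParser):
            # Counts TextInputFields, SelectMenus, RadioButtons, and CheckBoxes
            # in the HTML of a search form.
            def __init__(self):
                super().__init__()
                self.counts = Counter()

            def handle_starttag(self, tag, attrs):
                a = dict(attrs)
                if tag == "input":
                    t = (a.get("type") or "text").lower()
                    if t == "text":
                        self.counts["TextInputField"] += 1
                    elif t == "radio":
                        self.counts["RadioButton"] += 1
                    elif t == "checkbox":
                        self.counts["CheckBox"] += 1
                elif tag == "select":
                    self.counts["SelectMenu"] += 1

        counter = FormComponentCounter()
        counter.feed('<form><input type="text" name="title">'
                     '<select name="genre"><option>novel</option></select></form>')
        print(counter.counts)  # Counter({'TextInputField': 1, 'SelectMenu': 1})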

    Testbed for information extraction from deep web

    Search results generated by searchable databases are served dynamically and are far larger in volume than the static documents on the Web. These results pages have been referred to as the Deep Web [1]. To integrate different searchable databases, we need to extract the target data from their results pages. We propose a testbed for information extraction from search results. We chose 100 databases at random from 114,540 pages with search forms, so the testbed covers a good variety of databases. From these we selected the 51 databases whose results pages include URLs and manually identified the target information to be extracted. We also suggest evaluation measures for comparing extraction methods, and methods for extending the target data. Categories and Subject Descriptors: H.3.4 [Systems and Software]: Performance evaluation (efficiency and effectiveness).
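    The abstract does not spell out the evaluation measures here. Record-level precision, recall, and F1 against the manually identified targets are the standard choices for this kind of comparison, and would look like the following sketch (the function name and the toy records are ours):

        def evaluate(extracted: set, gold: set) -> dict:
            # Compare records extracted by a wrapper with manually labeled targets.
            tp = len(extracted & gold)
            precision = tp / len(extracted) if extracted else 0.0
            recall = tp / len(gold) if gold else 0.0
            f1 = (2 * precision * recall / (precision + recall)
                  if precision + recall else 0.0)
            return {"precision": precision, "recall": recall, "f1": f1}

        print(evaluate({"rec1", "rec2", "rec3"}, {"rec2", "rec3", "rec4"}))
        # -> {'precision': 0.667, 'recall': 0.667, 'f1': 0.667} (rounded)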