57 research outputs found

    Polish Web resources described in the "Polish World" directory (1997). Characteristics of domains and their conservation state

    For the purposes of this study, the print version of the Polish World directory by Martin Miszczak (Helion, 1997) was used to create an index of historical URLs and to verify their current availability and presence in Web archives. A quantitative analysis of the index was prepared to obtain rank data on top-level domains (TLDs) and subdomains, and the language of pages published in domains other than .PL was also examined. The study uncovered a low current availability (21.77 per cent) of Polish World URIs, with a 79.6 per cent presence in Web archives (60.35 per cent for addresses unreachable today). Forty-six per cent of the addresses from the directory were available on domains other than .PL, of which only 15.36 per cent had content in Polish. It would seem that in 1997, Polish Internet users were able to use Polish-centric resources, mostly already available through the Polish country domain. The 180 domain names with the .PL suffix uncovered during the study constitute around 20 per cent of the .PL domain names active on the Web until at least the end of 1996.

    URL Decay at Year 20: A Research Note

    All text is ephemeral. Some texts are more ephemeral than others. The web has proved to be among the most ephemeral and changing of information vehicles. This research note revisits Koehler's original data set about 20 years after it was first collected. By late 2013, the number of URLs responding to a query had fallen to 1.6% of the original sample. A query of the 6 remaining URLs in February 2015 showed only 2 still responding.

    Changes in Web Content in First 20 NIRF Ranking Institutes During 2010-19: an Analysis

    Web content is an important source for education and research. At present, higher learning institutes in India are required to present information on their institutional home pages. Owing to the dynamic nature of web content and the increasing use of emerging technology, the ways of presenting information on higher education websites have become complex. In this paper, we study the changes in web content over the last decade at the first 20 NIRF-ranked institutes. The Internet Archive's Wayback Machine was used to obtain the website update dates and the content of archived web pages.

    Croatian online continuing resources at the beginning of the third millenium

    The paper presents the results of research on 452 Croatian online continuing resources identified in the ISSN Centre for Croatia from 1998 to 2008. Using statistical analysis, the characteristics and changes of the sample up to April 2008 are analyzed and presented. The following aspects of Croatian online continuing resources are analyzed in detail: types, status, titles, addresses, language, publisher, place of publishing, fee, format, half-life, other media editions, scientific and professional publications, and technical possibilities. The aim of the research is to use the results for exploring and understanding the characteristics of publications of this type and their further development, and for reviewing the selection and evaluation criteria for online continuing resources, which should contribute to the future development of library procedures.

    Uncovering the unarchived web

    Many national and international heritage institutes realize the importance of archiving the web for future cultural heritage. Web archiving is currently performed either by harvesting a national domain or by crawling a pre-defined list of websites selected by the archiving institution. With either method, crawling harvests more information than just the websites intended for preservation, and this information could be used to reconstruct impressions of pages that existed on the live web at the crawl date but would otherwise have been lost forever. We present a method to create representations of what we will refer to as a web collection's aura: the web documents that were not included in the archived collection but are known to have existed, because they are mentioned on pages that were included in the archived web collection. To create representations of these unarchived pages, we exploit the information about the unarchived URLs that can be derived from the crawls by combining crawl date distribution, anchor text and link structure. We illustrate empirically that the size of the aura can be substantial: in 2012, the Dutch Web archive contained 12.3M unique pages, while we uncover references to 11.9M additional (unarchived) pages.
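
    The idea of an aura can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `find_aura` and the input layout (a dict mapping each archived URL to its outgoing links and anchor texts) are hypothetical, and the crawl date distribution mentioned in the abstract is not modelled.

```python
from urllib.parse import urldefrag


def find_aura(archived_pages):
    """Collect link targets that are referenced by archived pages
    but are not themselves part of the archived collection.

    archived_pages: dict mapping an archived URL to a list of
    (target_url, anchor_text) pairs found on that page.
    (Hypothetical input layout, chosen for this sketch.)
    """
    archived = set(archived_pages)
    aura = {}
    for source, links in archived_pages.items():
        for target, anchor in links:
            # Ignore fragments so "page#top" and "page" count as one URL.
            target = urldefrag(target).url
            if target in archived:
                continue  # already preserved, so not part of the aura
            entry = aura.setdefault(target, {"anchors": [], "sources": set()})
            if anchor:
                entry["anchors"].append(anchor)
            entry["sources"].add(source)
    return aura
```

    Each entry of the result keeps the anchor texts and the linking pages for one unarchived URL, which is the raw material a representation of that page could be built from.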

    Open Science in Software Engineering

    Open science describes the movement of making any research artefact available to the public and includes, but is not limited to, open access, open data, and open source. While open science is becoming generally accepted as a norm in other scientific disciplines, in software engineering we are still struggling to adapt open science to the particularities of our discipline, rendering progress in our scientific community cumbersome. In this chapter, we reflect upon the essentials of open science for software engineering, including what open science is, why we should engage in it, and how we should do it. We particularly draw on our experiences as conference chairs implementing open science initiatives and as researchers actively engaging in open science to critically discuss challenges and pitfalls, and to address more advanced topics such as how and under which conditions to share preprints, what infrastructure and licence model to cover, or how to do it within the limitations of different reviewing models, such as double-blind reviewing. Our hope is to help establish a common ground and to contribute to making open science a norm in software engineering as well. Comment: Camera-ready version of a chapter published in the book on Contemporary Empirical Methods in Software Engineering; fixed layout issue with side-note.

    GenBank

    GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 240 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage.