67,079 research outputs found

    Legal Deposit Web Archives and the Digital Humanities: a Universe of Lost Opportunity?

    Legal deposit libraries have archived the web for over a decade. Several nations, supported by legal deposit regulations, have introduced comprehensive national domain web crawling, an essential part of the national library remit to collect, preserve and make accessible a nation’s intellectual and cultural heritage (Brazier, 2016). Scholars have traditionally been the chief beneficiaries of legal deposit collections: in the case of web archives, the potential for research extends to contemporary materials, and to Digital Humanities text and data mining approaches. To date, however, little work has evaluated whether legal deposit regulations support computational approaches to research using national web archive data (Brügger, 2012; Hockx-Yu, 2014; Black, 2016). This paper examines the impact of electronic legal deposit (ELD) in the United Kingdom, particularly how the 2013 regulations influence innovative scholarship using the Legal Deposit UK Web Archive. As the first major case study to analyse the implementation of ELD, it will address the following key research questions:
    • Is legal deposit, a concept defined and refined for print materials, the most suitable vehicle for supporting DH research using web archives?
    • How does the current framing of ELD affect digital innovation in the UK library sector?
    • How does the current information ecology, including not-for-profit archives, influence the relationship between DH researchers and legal deposit libraries?

    HyperText Corpus Initiative : how to help researchers sieving the web?

    Since its foundation in May 2009, the médialab Sciences Po has worked to foster the use of digital methods and tools in the social sciences. With the help of existing tools and methods, we experimented with web mining techniques to extract data on collective phenomena. We also attended the symposiums organised by the two institutions responsible for web archiving in France, BnF and INA, where we learnt about the difficulties that web archives pose to social scientists. Our own experience of mining the live web was no easier. Such difficulties, we believe, can be explained by the lack of tools that allow scholars to build for themselves the highly specialized corpora they need from the wide heterogeneity of the web. The web is not a well-known document space for scholars or librarians. Its hyperlinked and heterogeneous nature requires envisioning new ways of conceiving and building web corpora. This notion of a web corpus is a necessity for both the live and the archived web. If methods are not adequate for analysing the live web, the problem will not be any easier on an archive, where the time dimension adds complexity.

    MassMine: Collecting and Archiving Big Data for Social Media Humanities Researchers

    The MassMine project team, representing participants from the Department of English, George A. Smathers Libraries (Libraries), and Research Computing at the University of Florida (UF), requests $60,000 to finish the version 1.0 release, develop a robust training program, and promote the MassMine open source software. MassMine enables researchers to collect their own social media data archives and supports data mining, thus providing free access to big data for academic inquiry. MassMine further supports researchers in creating and defining methods and measures for analyzing cultural and localized trends, and in developing humanities research questions and data mining practices. The primary aims of this project are to: 1) refine the MassMine tools to support collection, acquisition, and use of available social media and web data; and 2) develop a training program and corresponding online resources for supporting the broad use of MassMine by humanities researchers, regardless of experience.

    Mining the ESO WFI and INT WFC archives for known Near Earth Asteroids. Mega-Precovery software

    The ESO/MPG WFI and the INT WFC wide field archives, comprising 330,000 images, were mined to search for serendipitous encounters of known Near Earth Asteroids (NEAs) and Potentially Hazardous Asteroids (PHAs). A total of 152 asteroids (44 PHAs and 108 other NEAs) were identified using the PRECOVERY software, their astrometry being measured on 761 images and sent to the Minor Planet Centre. Both recoveries and precoveries were reported, including prolonged orbital arcs for 18 precovered objects and 10 recoveries. We analyze all new opposition data by comparing the orbits fitted before and after including our contributions. We conclude the paper by presenting Mega-Precovery, a new online service focused on data mining of many instrument archives simultaneously for one or a few given asteroids. A total of 28 instrument archives have been made available for mining using this tool, together adding up to about 2.5 million images that form the Mega-Archive.
    Comment: Accepted for publication in Astronomische Nachrichten (Sep 2012)

    Mining Knowledge in Astrophysical Massive Data Sets

    Modern scientific data mainly consist of huge datasets gathered by a very large number of techniques and stored in very diversified and often incompatible data repositories. More generally, in the e-science environment, it is considered a critical and urgent requirement to integrate services across distributed, heterogeneous, dynamic "virtual organizations" formed by different resources within a single enterprise. In the last decade, Astronomy has become an immensely data-rich field due to the evolution of detectors (plates to digital to mosaics), telescopes and space instruments. The Virtual Observatory approach consists in the federation, under common standards, of all astronomical archives available worldwide, as well as of data analysis, data mining and data exploration applications. The main driver behind this effort is that, once the infrastructure is completed, it will allow a new type of multi-wavelength, multi-epoch science that can barely be imagined. Data Mining, or Knowledge Discovery in Databases, while being the main methodology for extracting the scientific information contained in such MDS (Massive Data Sets), poses crucial challenges, since it has to orchestrate complex problems arising from transparent access to different computing environments, scalability of algorithms, reusability of resources, etc. In the present paper we summarize the present status of the MDS in the Virtual Observatory and what is currently done and planned to bring advanced Data Mining methodologies to bear in the case of the DAME (DAta Mining & Exploration) project.
    Comment: Pages 845-849, 1st International Conference on Frontiers in Diagnostics Technologies