1,453 research outputs found

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments in implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting the semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report then proposes a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.

    Information Outlook, September/October 2019

    Volume 23, Issue 5

    BIG DATA AND ANALYTICS AS A NEW FRONTIER OF ENTERPRISE DATA MANAGEMENT

    Big Data and Analytics (BDA) promises significant value generation opportunities across industries. Even though companies increase their investments, their BDA initiatives fall short of expectations and they struggle to guarantee a return on investments. In order to create business value from BDA, companies must build and extend their data-related capabilities. While BDA literature has emphasized the capabilities needed to analyze the increasing volumes of data from heterogeneous sources, enterprise data management (EDM) researchers have suggested organizational capabilities to improve data quality. However, to date, little is known about how companies actually orchestrate the allocated resources, especially regarding the quality and use of data to create value from BDA. Considering these gaps, this thesis, through five interrelated essays, investigates how companies adapt their EDM capabilities to create additional business value from BDA. The first essay lays the foundation of the thesis by investigating how companies extend their Business Intelligence and Analytics (BI&A) capabilities to build more comprehensive enterprise analytics platforms. The second and third essays contribute to fundamental reflections on how organizations are changing and designing data governance in the context of BDA. The fourth and fifth essays look at how companies provide high-quality data to an increasing number of users with innovative EDM tools, namely machine learning (ML) and enterprise data catalogs (EDCs). The thesis outcomes show that BDA has profound implications for EDM practices. In the past, operational data processing and analytical data processing were two “worlds” that were managed separately from each other. With BDA, these “worlds” are becoming increasingly interdependent, and organizations must manage the lifecycles of data and analytics products in close coordination. Also, with BDA, data have become the long-expected, strategically relevant resource. As such, data must now be viewed as a distinct value driver separate from IT, as it requires specific mechanisms to foster value creation from BDA. BDA thus extends data governance goals: in addition to data quality and regulatory compliance, governance should facilitate data use by broadening data availability and enabling data monetization. Accordingly, companies establish comprehensive data governance designs including structural, procedural, and relational mechanisms to enable a broad network of employees to work with data. Existing EDM practices therefore need to be rethought to meet the emerging BDA requirements. While ML is a promising solution to improve data quality in a scalable and adaptable way, EDCs help companies democratize data to a broader range of employees.

    Bibliographic Control in the Digital Ecosystem

    With contributions from international experts, the book aims to explore the new boundaries of universal bibliographic control. Bibliographic control is radically changing because the bibliographic universe is radically changing: resources, agents, technologies, standards and practices. Among the main topics addressed are: library cooperation networks; legal deposit; national bibliographies; new tools and standards (IFLA LRM, RDA, BIBFRAME); authority control and new alliances (Wikidata, Wikibase, Identifiers); new ways of indexing resources (artificial intelligence); institutional repositories; new book supply chain; “discoverability” in the IIIF digital ecosystem; role of thesauri and ontologies in the digital ecosystem; bibliographic control and search engines.

    Strategies of development and maintenance in supervision, control, synchronization, data acquisition and processing in light sources

    Official Doctoral Programme in Information and Communication Technologies (Programa Oficial de Doutoramento en Tecnoloxías da Información e as Comunicacións). 5032V01. [Abstract] Particle accelerators and photon sources are constantly evolving, adopting cutting-edge technologies to push the limits forward and explore new domains. The control systems are a crucial part of these installations and are required to provide flexible solutions for challenging new experiments, with different kinds of detectors, setups, sample environments and procedures. Experiment proposals are more and more ambitious at each call and often go a step beyond the capabilities of the instrumentation. Detectors must be faster, with higher efficiency, more resolution and more bandwidth, and able to synchronize with other detectors of all kinds (scalar, one-dimensional or two-dimensional), taking into account their singularities and homogenizing the data acquisition. This work examines the control and data acquisition systems for particle accelerators and X-ray and light sources, and explores new requirements and challenges regarding synchronization and data acquisition bandwidth, as well as optimization and cost-efficiency in design, operation, support, maintenance and service management. It also studies different solutions depending on the environment.