1,955 research outputs found

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also help to provide an easy way for new practitioners to understand this complex area of research. Comment: 46 pages, 16 figures, Technical Report

    Data Access for LIGO on the OSG

    During 2015 and 2016, the Laser Interferometer Gravitational-Wave Observatory (LIGO) conducted a three-month observing campaign. These observations delivered the first direct detection of gravitational waves from binary black hole mergers. To search for these signals, the LIGO Scientific Collaboration uses the PyCBC search pipeline. To deliver science results in a timely manner, LIGO collaborated with the Open Science Grid (OSG) to distribute the required computation across a series of dedicated, opportunistic, and allocated resources. To deliver the petabytes necessary for such a large-scale computation, our team deployed a distributed data access infrastructure based on the XRootD server suite and the CernVM File System (CVMFS). This data access strategy grew from simply accessing remote storage to a POSIX-based interface underpinned by distributed, secure caches across the OSG. Comment: 6 pages, 3 figures, submitted to PEARC1

    Replicability of data collected for empirical estimation of relative pollen productivity

    The effects of repeated survey and fieldwork timing on data derived from a recently proposed standard field methodology for empirical estimation of relative pollen productivity (RPP) have been tested. Seasonal variations in vegetation and associated pollen assemblages were studied in three contrasting cultural habitat types: semi-natural ancient woodlands, lowland heaths, and unimproved, traditionally managed hay meadows. Results show that in woodlands and heathlands the standard method generates vegetation data with a reasonable degree of similarity throughout the field season, though in some instances additional recording of woodland canopy cover should be undertaken, and differences were greater for woodland understorey taxa than for arboreal taxa. Large differences in vegetation cover were observed over the field season in the grassland community, and matching the phenological timing of surveys within and between studies is clearly important if RPP estimates from these sites are to be comparable. Pollen assemblages from closely co-located moss polsters collected on different visits are shown to be variable in all communities, to a greater degree than can be explained by the sampling error associated with pollen counting, and further study of moss polsters as pollen traps is recommended.

    The Earth System Grid Federation: Delivering globally accessible petascale data for CMIP5

    The fifth Coupled Model Intercomparison Project (CMIP5) will involve the global production and analysis of petabytes of data. The Program for Climate Model Diagnosis and Intercomparison (PCMDI), with responsibility for archival for CMIP5, has established the global “Earth System Grid Federation” (ESGF) of data producers and data archives to support CMIP5. ESGF will provide a set of globally synchronised views of globally distributed data – including some large cache replicants which will be persisted for (at least) decades. Here we describe the archive requirements and key aspects of the resulting architecture. ESGF will stress international networks, as well as the data archives themselves – but significantly less than would have been the case with a centralised archive. Developing and deploying the ESGF has exploited good will and best efforts, but future developments are likely to require more formalised architecture and management.

    Optimizing small mammal relative abundance measures using non-invasive sampling and assessment of its contribution to occupancy modelling of small carnivores in dry woodland savannah of South Africa

    Master's thesis, Conservation Biology, Universidade de Lisboa, Faculdade de Ciências, 2019. Small mammals are essential to the structure and functioning of ecosystems through their role in seed dispersal and nutrient cycling, and as the main food source of various ground-dwelling and aerial predators. As prey, fluctuations in their abundance, driven by food availability and temperature, influence the space-use patterns, densities and activity patterns of their predators. However, although they are an important food source for many small carnivores, studies of the distribution of these carnivores are limited by the absence of abundance measures for their prey, rodents in particular. This gap is largely due to the limitations and constraints of the sampling methods used to assess abundance. Capture-recapture by live-trapping, widely used in rodent studies, is invasive and requires a very high sampling effort, since traps must be checked twice a day to avoid unwanted deaths; it is also costly and requires a permit to capture and handle wild animals. These factors prevent its use in large-scale studies such as carnivore distribution studies. As an alternative, ink-tracking tunnels are a sampling method that allows the relative abundance of small mammals to be estimated from their tracks; the method is therefore non-invasive and needs no permit, is low-cost and, above all, the tunnels are easy to deploy and do not require daily checks.
As the aim of this thesis is to assess the distribution patterns of small mammals in a dry savannah of South Africa, in a first phase the study sought to assess the effectiveness of sampling rodents with ink-tracking tunnels to estimate their relative abundance, compared with a relative abundance index obtained by live-trapping. In a second phase, the usefulness of the track index obtained with ink-tracking tunnels as a measure of prey abundance in the study of small carnivore distribution was tested. Sampling was therefore divided into two stages. In the first stage, at 19 sites selected along a gradient of anthropogenic disturbance (the Phinda nature reserve, farms and rural communities), ink-tracking tunnels were placed in a 3 x 3 Y-shaped design, 10 m apart, side by side with a 7 x 7 grid of Sherman traps. In the second stage, tunnels were placed in the same design around each of the 192 camera traps for small carnivores, arranged in a grid (mean distance between cameras 1311 m) covering the same disturbance gradient. Because carnivores may prefer prey of different sizes according to their energy requirements, and although the analyses also considered rodents overall (the prey variable), tracks were further divided into functional groups by size, which in turn reflects rodent size: small, medium and large rodents. From this division, a track index was estimated for each functional group, defined as the proportion of tunnels per site with tracks of that group. Comparing this index with the abundance index obtained by live-trapping revealed a strong correlation between the two measures, which depends on local rodent abundance.
That is, the method is more effective at capturing large differences in relative abundance when rodents are highly abundant than when their abundance is low. Ink-tracking tunnels do not allow a rigorous estimate of population abundance, but they are useful for monitoring fluctuations in abundance, allowing comparison between sites or over time. These results supported applying the ink-tracking tunnel method to assess relative prey abundance in the camera-trapping study of small carnivores along the same gradient of anthropogenic disturbance. The track indices of small and medium rodents, and of rodents overall, were incorporated into occupancy modelling for the small carnivores, together with habitat and disturbance variables. Large rodents were not considered because of the low number of detections. The results showed that, for the carnivores studied, the relative abundance of rodents (overall or by functional group) is not a decisive factor in their distribution. The only exception was the large-spotted genet in the landscape with the highest level of disturbance (rural communities), and only when prey variables were combined with habitat and disturbance variables. It can therefore be concluded that the importance of rodents depends on species and context: their use may be of little relevance in studying the distribution of generalist carnivore species, but they should be considered together with habitat variables in studies of carnivores that are rodent specialists or whose diet includes a high percentage of rodents. Small mammals of the Order Rodentia represent a large portion of small carnivores’ diet, influencing their distribution, densities and activity patterns.
However, small carnivore studies based on camera-trapping do not include small mammals’ relative abundance as a prey covariate, mainly because of the large effort and cost associated with live-trapping at large scales. Alternatively, ink-tracking tunnels are a non-invasive, inexpensive and low-effort sampling method that can be used to monitor fluctuations in small mammals’ relative abundance across sites and time. I implemented ink-tracking tunnels in a Y-design to assess their efficiency compared with live-trapping and the utility of the track index as a prey covariate in a carnivore distribution study across a landscape gradient of human disturbance. Tracks were successfully divided into functional groups according to track size and consequently rodents’ biomass. The track index of these groups was highly correlated with the live-trapping abundance index, although this correlation was abundance dependent: the method is better at detecting large fluctuations in abundance when a group is very abundant than when it is scarce. I applied the track index of the rodent functional groups as prey covariates in a single-species, single-season occupancy model for African small carnivore species, along with habitat and disturbance as alternative covariates. Results showed no preference for prey size, and prey covariates were not important for most combinations of species and areas. The only exception was the large-spotted genet at the highest level of disturbance, but only when prey was combined with habitat and disturbance variables. Therefore, the importance of prey covariates is species and context dependent, and they can be discarded from generalist multi-species distribution studies. However, prey should be considered together with habitat variables in studies of carnivore species that are rodent specialists or whose diet includes a large percentage of rodents.
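The track index used in this thesis is described as the proportion of tunnels at a site that contain tracks of a given functional group. A minimal Python sketch of that calculation follows; the record layout, the site and tunnel labels, the example values and the function name `track_index` are all illustrative assumptions, not taken from the thesis.

```python
from collections import defaultdict

def track_index(tunnel_records):
    """Return {site: {group: proportion of tunnels with tracks of that group}}.

    tunnel_records is an iterable of (site, tunnel_id, groups) tuples, where
    groups is the set of functional groups whose tracks were found in the tunnel.
    This layout is a hypothetical one chosen for the example.
    """
    tunnels = defaultdict(set)                    # site -> all tunnel ids
    hits = defaultdict(lambda: defaultdict(set))  # site -> group -> tunnel ids with tracks
    for site, tunnel_id, groups in tunnel_records:
        tunnels[site].add(tunnel_id)
        for group in groups:
            hits[site][group].add(tunnel_id)
    return {
        site: {g: len(ids) / len(tunnels[site]) for g, ids in hits[site].items()}
        for site in tunnels
    }

# Made-up records: four tunnels at site A, two at site B.
records = [
    ("A", 1, {"small"}),
    ("A", 2, {"small", "medium"}),
    ("A", 3, {"small"}),
    ("A", 4, set()),
    ("B", 1, set()),
    ("B", 2, {"medium"}),
]
index = track_index(records)
print(index)
```

Empty tunnels still count in the denominator, which is what makes the value a per-site proportion and not a raw count of tracked tunnels.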

    An autonomic framework for enhancing the quality of data grid services

    Data grid services have been used to deal with the increasing needs of applications in terms of data volume and throughput. The large scale, heterogeneity and dynamism of grid environments often make management and tuning of these data services very complex. Furthermore, current high-performance I/O approaches are characterized by their high complexity and specific features that usually require specialized administrator skills. Autonomic computing can help manage this complexity. The present paper describes an autonomic subsystem intended to provide self-management features aimed at efficiently reducing the I/O problem in a grid environment, thereby enhancing the quality of service (QoS) of data access and storage services in the grid. Our proposal takes into account that data produced in an I/O system is not usually immediately required. Therefore, performance improvements are related not only to current but also to any future I/O access, as the actual data access usually occurs later on. Nevertheless, the exact time of the next I/O operations is unknown. Thus, our approach proposes a long-term prediction designed to forecast the future workload of grid components. This enables the autonomic subsystem to determine the optimal data placement to improve both current and future I/O operations.

    A Framework for Downloading Wide-Area Files

    The challenge of efficiently retrieving files that are broken into segments and replicated across the wide area is of prime importance to wide-area, peer-to-peer, and Grid file systems. Two different algorithms addressing this challenge have been proposed and evaluated. While both have been successful in different performance scenarios, there has been no unifying work that can view both algorithms under a single framework. In this thesis, we define such a framework, where download algorithms are defined in terms of the four dimensions that the client always controls: the number of simultaneous downloads, the degree of work replication, the failover strategy, and the server selection algorithm. We then explore the impact of varying parameters along each of these dimensions, testing the framework over several types of file distributions. In addition, we examine the dependencies and trends that arise when files are augmented with erasure codes rather than replication.
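The four client-controlled dimensions named in this abstract can be pictured as fields of a single policy object. The Python sketch below is only an illustration under stated assumptions: the class `DownloadPolicy`, the functions `plan_rounds` and `next_server`, the `"next_best"` failover label and the segment and server names are all hypothetical, and the thesis may parameterise its framework quite differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical type: ranks the candidate servers for one segment, best first.
ServerSelector = Callable[[List[str]], List[str]]

@dataclass
class DownloadPolicy:
    simultaneous_downloads: int     # number of transfers in flight at once
    work_replication: int           # servers asked for the same segment
    failover: str                   # illustrative labels: "next_best" or "none"
    select_servers: ServerSelector  # server selection algorithm

def plan_rounds(segments: Dict[str, List[str]],
                policy: DownloadPolicy) -> List[List[Tuple[str, str]]]:
    """Group (segment, server) requests into rounds of bounded size."""
    requests = []
    for segment, replicas in segments.items():
        ranked = policy.select_servers(replicas)
        for server in ranked[: policy.work_replication]:
            requests.append((segment, server))
    n = policy.simultaneous_downloads
    return [requests[i:i + n] for i in range(0, len(requests), n)]

def next_server(policy: DownloadPolicy, replicas: List[str],
                failed: Set[str]) -> Optional[str]:
    """Pick a replacement server after a failure, if the policy allows one."""
    if policy.failover != "next_best":
        return None
    for server in policy.select_servers(replicas):
        if server not in failed:
            return server
    return None

# Made-up example: two segments, alphabetical order standing in for a real ranking.
segments = {"s0": ["c", "a", "b"], "s1": ["c", "b"]}
policy = DownloadPolicy(simultaneous_downloads=2, work_replication=2,
                        failover="next_best", select_servers=sorted)
rounds = plan_rounds(segments, policy)
print(rounds)
```

Keeping the selector as a plain function means a different ranking (for example by measured bandwidth) can be swapped in without touching the other three settings.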

    Design and Evaluation of a Collective IO Model for Loosely Coupled Petascale Programming

    Loosely coupled programming is a powerful paradigm for rapidly creating higher-level applications from scientific programs on petascale systems, typically using scripting languages. This paradigm is a form of many-task computing (MTC) which focuses on the passing of data between programs as ordinary files rather than messages. While it has the significant benefits of decoupling producer and consumer and allowing existing application programs to be executed in parallel with no recoding, its typical implementation using shared file systems places a high performance burden on the overall system and on the user who will analyze and consume the downstream data. Previous efforts have achieved great speedups with loosely coupled programs, but have done so with careful manual tuning of all shared file system access. In this work, we evaluate a prototype collective IO model for file-based MTC. The model enables efficient and easy distribution of input data files to computing nodes and gathering of output results from them. It eliminates the need for such manual tuning and makes the programming of large-scale clusters using a loosely coupled model easier. Our approach, inspired by in-memory approaches to collective operations for parallel programming, builds on fast local file systems to provide high-speed local file caches for parallel scripts, uses a broadcast approach to handle distribution of common input data, and uses efficient scatter/gather and caching techniques for input and output. We describe the design of the prototype model and its implementation on the Blue Gene/P supercomputer, and present preliminary measurements of its performance on synthetic benchmarks and on a large-scale molecular dynamics application. Comment: IEEE Many-Task Computing on Grids and Supercomputers (MTAGS08) 200