11,143 research outputs found
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.Comment: 46 pages, 16 figures, Technical Repor
Preparing Laboratory and Real-World EEG Data for Large-Scale Analysis: A Containerized Approach.
Large-scale analysis of EEG and other physiological measures promises new insights into brain processes and more accurate and robust brain-computer interface models. However, the absence of standardized vocabularies for annotating events in a machine understandable manner, the welter of collection-specific data organizations, the difficulty in moving data across processing platforms, and the unavailability of agreed-upon standards for preprocessing have prevented large-scale analyses of EEG. Here we describe a "containerized" approach and freely available tools we have developed to facilitate the process of annotating, packaging, and preprocessing EEG data collections to enable data sharing, archiving, large-scale machine learning/data mining and (meta-)analysis. The EEG Study Schema (ESS) comprises three data "Levels," each with its own XML-document schema and file/folder convention, plus a standardized (PREP) pipeline to move raw (Data Level 1) data to a basic preprocessed state (Data Level 2) suitable for application of a large class of EEG analysis methods. Researchers can ship a study as a single unit and operate on its data using a standardized interface. ESS does not require a central database and provides all the metadata data necessary to execute a wide variety of EEG processing pipelines. The primary focus of ESS is automated in-depth analysis and meta-analysis EEG studies. However, ESS can also encapsulate meta-information for the other modalities such as eye tracking, that are increasingly used in both laboratory and real-world neuroimaging. ESS schema and tools are freely available at www.eegstudy.org and a central catalog of over 850 GB of existing data in ESS format is available at studycatalog.org. These tools and resources are part of a larger effort to enable data sharing at sufficient scale for researchers to engage in truly large-scale EEG analysis and data mining (BigEEG.org)
Moving from Data-Constrained to Data-Enabled Research: Experiences and Challenges in Collecting, Validating and Analyzing Large-Scale e-Commerce Data
Widespread e-commerce activity on the Internet has led to new opportunities
to collect vast amounts of micro-level market and nonmarket data. In this paper
we share our experiences in collecting, validating, storing and analyzing large
Internet-based data sets in the area of online auctions, music file sharing and
online retailer pricing. We demonstrate how such data can advance knowledge by
facilitating sharper and more extensive tests of existing theories and by
offering observational underpinnings for the development of new theories. Just
as experimental economics pushed the frontiers of economic thought by enabling
the testing of numerous theories of economic behavior in the environment of a
controlled laboratory, we believe that observing, often over extended periods
of time, real-world agents participating in market and nonmarket activity on
the Internet can lead us to develop and test a variety of new theories.
Internet data gathering is not controlled experimentation. We cannot randomly
assign participants to treatments or determine event orderings. Internet data
gathering does offer potentially large data sets with repeated observation of
individual choices and action. In addition, the automated data collection holds
promise for greatly reduced cost per observation. Our methods rely on
technological advances in automated data collection agents. Significant
challenges remain in developing appropriate sampling techniques integrating
data from heterogeneous sources in a variety of formats, constructing
generalizable processes and understanding legal constraints. Despite these
challenges, the early evidence from those who have harvested and analyzed large
amounts of e-commerce data points toward a significant leap in our ability to
understand the functioning of electronic commerce.Comment: Published at http://dx.doi.org/10.1214/088342306000000231 in the
Statistical Science (http://www.imstat.org/sts/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Addressing the Challenges in Federating Edge Resources
This book chapter considers how Edge deployments can be brought to bear in a
global context by federating them across multiple geographic regions to create
a global Edge-based fabric that decentralizes data center computation. This is
currently impractical, not only because of technical challenges, but is also
shrouded by social, legal and geopolitical issues. In this chapter, we discuss
two key challenges - networking and management in federating Edge deployments.
Additionally, we consider resource and modeling challenges that will need to be
addressed for a federated Edge.Comment: Book Chapter accepted to the Fog and Edge Computing: Principles and
Paradigms; Editors Buyya, Sriram
- …