Enabling On-Demand Database Computing with MIT SuperCloud Database Management System
The MIT SuperCloud database management system allows for rapid creation and
flexible execution of a variety of the latest scientific databases, including
Apache Accumulo and SciDB. It is designed to permit these databases to run on a
High Performance Computing Cluster (HPCC) platform as seamlessly as any other
HPCC job. It ensures the seamless migration of the databases to the resources
assigned by the HPCC scheduler and centralized storage of the database files
when not running. It also permits snapshotting of databases to allow
researchers to experiment and push the limits of the technology without
concerns for data or productivity loss if the database becomes unstable.
Comment: 6 pages; accepted to IEEE High Performance Extreme Computing (HPEC)
conference 2015. arXiv admin note: text overlap with arXiv:1406.492
Lustre, Hadoop, Accumulo
Data processing systems impose multiple views on data as it is processed by
the system. These views include spreadsheets, databases, matrices, and graphs.
There are a wide variety of technologies that can be used to store and process
data through these different steps. The Lustre parallel file system, the Hadoop
distributed file system, and the Accumulo database are all designed to address
the largest and the most challenging data storage problems. There have been
many ad-hoc comparisons of these technologies. This paper describes the
foundational principles of each technology, provides simple models for
assessing their capabilities, and compares the various technologies on a
hypothetical common cluster. These comparisons indicate that Lustre provides 2x
more storage capacity, is less likely to lose data during 3 simultaneous drive
failures, and provides higher bandwidth on general purpose workloads. Hadoop
can provide 4x greater read bandwidth on special purpose workloads. Accumulo
provides 10,000x lower latency on random lookups than either Lustre or Hadoop
but Accumulo's bulk bandwidth is 10x lower. Significant recent work has been
done to enable mix-and-match solutions that allow Lustre, Hadoop, and Accumulo
to be combined in different ways.
Comment: 6 pages; accepted to IEEE High Performance Extreme Computing
conference, Waltham, MA, 201
Guide on the Side and LibWizard Tutorials side-by-side: How do the two platforms for split-screen online tutorials compare?
Split-screen tutorials are an appealing and effective way for libraries to create online learning objects where learners interact with real-time web content. Many libraries are using the University of Arizona’s award-winning, open source platform, Guide on the Side; in 2016, Springshare released a proprietary alternative, LibWizard Tutorials. This article reviews the advantages and limitations of this kind of tutorial and examines how the two platforms differ. Both create similar split-screen tutorials, but they differ in installation, administration, authoring and editing, student learning, data management, and accessibility. Libraries now have the opportunity to compare the alternatives and decide which one is best suited to their needs, priorities, and resources.
Popular and/or Prestigious? Measures of Scholarly Esteem
Citation analysis does not generally take the quality of citations into
account: all citations are weighted equally irrespective of source. However, a
scholar may be highly cited but not highly regarded: popularity and prestige
are not identical measures of esteem. In this study we define popularity as the
number of times an author is cited and prestige as the number of times an
author is cited by highly cited papers. Information Retrieval (IR) is the test
field. We compare the 40 leading researchers in terms of their popularity and
prestige over time. Some authors are ranked high on prestige but not on
popularity, while others are ranked high on popularity but not on prestige. We
also relate measures of popularity and prestige to date of Ph.D. award, number
of key publications, organizational affiliation, receipt of prizes/honors, and
gender.
Comment: 26 pages, 5 figures
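The two definitions in this abstract (popularity as the number of times an author is cited, prestige as the number of times an author is cited by highly cited papers) can be illustrated with a minimal sketch. The function name, the data layout, the toy citation data, and the threshold for "highly cited" are all assumptions made for illustration; the abstract does not state how the study sets its threshold or stores its data.

```python
from collections import Counter


def popularity_and_prestige(citations, paper_authors, highly_cited_threshold):
    """Count popularity and prestige per author.

    citations: list of (citing_paper, cited_paper) pairs (hypothetical layout).
    paper_authors: dict mapping a paper id to its list of authors.
    highly_cited_threshold: minimum citation count for a citing paper to
        count as "highly cited" (an assumed cut-off, not from the study).
    """
    # Citations received by each paper, used to classify citing papers.
    paper_counts = Counter(cited for _, cited in citations)

    popularity = Counter()
    prestige = Counter()
    for citing, cited in citations:
        for author in paper_authors.get(cited, []):
            # Every citation counts toward popularity.
            popularity[author] += 1
            # Only citations from highly cited papers count toward prestige.
            if paper_counts[citing] >= highly_cited_threshold:
                prestige[author] += 1
    return popularity, prestige


# Toy data: p1 is cited three times, but only by papers nobody cites;
# p2 is cited once, by the highly cited p1.
paper_authors = {"p1": ["A"], "p2": ["B"], "p3": ["C"], "p4": ["D"]}
citations = [("p2", "p1"), ("p3", "p1"), ("p4", "p1"), ("p1", "p2")]

pop, pres = popularity_and_prestige(citations, paper_authors, highly_cited_threshold=2)
```

In this toy data, author A ranks higher on popularity while author B ranks higher on prestige, which is the kind of divergence the abstract describes.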
A Nine Month Progress Report on an Investigation into Mechanisms for Improving Triple Store Performance
This report considers the requirement for fast, efficient, and scalable triple stores as part of the effort to produce the Semantic Web. It summarises relevant information in the major background field of Database Management Systems (DBMS), and provides an overview of the techniques currently in use amongst the triple store community. The report concludes that for individuals and organisations to be willing to provide large amounts of information as openly-accessible nodes on the Semantic Web, storage and querying of the data must be cheaper and faster than it is currently. Experiences from the DBMS field can be used to maximise triple store performance, and suggestions are provided for lines of investigation in areas of storage, indexing, and query optimisation. Finally, work packages are provided describing expected timetables for further study of these topics.