Search CORE


A Survey on Array Storage, Query Languages, and Systems

Authors: Cheng Yu; Rusu Florin
Publication date: 19/02/2013

Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is a renewed interest in array data processing in the context of Big Data. To the best of our knowledge, a unified resource that summarizes and analyzes array processing research over its long existence is currently missing. In this survey, we provide a guide for past, present, and future research in array processing. The survey is organized along three main topics. Array storage discusses all the aspects related to array partitioning into chunks. The identification of a reduced set of array operators to form the foundation for an array query language is analyzed across multiple such proposals. Lastly, we survey real systems for array processing. The result is a thorough survey on array data storage and processing that should be consulted by anyone interested in this research topic, independent of experience level. The survey is not complete, though. We greatly appreciate pointers towards any work we might have forgotten to mention. Comment: 44 pages

arXiv.org e-Print Archive

CiteSeerX
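The array survey above centers its storage discussion on partitioning an array into chunks. As a minimal sketch, assuming regular chunking (every chunk has the same shape, which is only one of the schemes such a survey covers) and row-major numbering of chunks, the chunk holding a cell and the cell's offset inside it follow from integer division. The function name `locate_cell` is a hypothetical illustration, not taken from the survey.

```python
from typing import Sequence, Tuple


def locate_cell(
    coords: Sequence[int],
    array_shape: Sequence[int],
    chunk_shape: Sequence[int],
) -> Tuple[int, Tuple[int, ...]]:
    """Return (chunk id, offset inside that chunk) for one array cell.

    Assumes regular chunking: every chunk has shape `chunk_shape` (chunks on
    the upper edges may be cut short) and chunks are numbered in row-major order.
    """
    if not len(coords) == len(array_shape) == len(chunk_shape):
        raise ValueError("coords, array_shape and chunk_shape must have the same rank")
    chunk_id = 0
    offset = []
    for x, n, c in zip(coords, array_shape, chunk_shape):
        if c <= 0:
            raise ValueError("chunk sizes must be positive")
        if not 0 <= x < n:
            raise IndexError(f"coordinate {x} outside [0, {n})")
        chunks_along_dim = (n + c - 1) // c  # ceiling division
        chunk_index, inner = divmod(x, c)  # which chunk, and where inside it
        chunk_id = chunk_id * chunks_along_dim + chunk_index  # row-major linearization
        offset.append(inner)
    return chunk_id, tuple(offset)
```

For example, `locate_cell((5, 9), (10, 10), (4, 4))` returns `(5, (1, 1))`: a 10 x 10 array cut into 4 x 4 chunks forms a 3 x 3 grid of chunks, and cell (5, 9) sits in grid position (1, 2). With regular chunks the lookup is pure arithmetic; irregular chunking generally needs an auxiliary index instead.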

Shingled Magnetic Recording disks for Mass Storage Systems

Author: Le Quoc Minh
Publication venue: Scholar Commons
Publication date: 22/02/2019

Disk drives have seen a dramatic increase in storage density over the last five decades, but to continue the growth seems difficult if not impossible because of physical limitations. One way to increase storage density is using a shingled magnetic recording (SMR) disk. Shingled writing is a promising technique that trades off the inability to update in-place for narrower tracks and thus a much higher data density. It is particularly appealing as it can be adopted while utilizing essentially the same physical recording mechanisms currently in use. Because of its manner of writing, an SMR disk would be unable to update a written track without overwriting neighboring tracks, potentially requiring the rewrite of all the tracks to the end of a band, where the end of a band is an area left unwritten to allow for a non-overlapped final track. Random reads are still possible on such devices, but the handling of writes becomes particularly critical. In this manuscript, we first look at a variety of potential workloads, drawn from real-world traces, and evaluate their impact on SMR disk models. Later, we evaluate the behavior of SMR disks when used in an array configuration or when faced with heavily interleaved workloads. Specifically, we demonstrate the dramatically different effects that different workloads can have upon the opposing approaches of remapping and restoring blocks, and how write-heavy workloads can (under the right conditions, and contrary to intuition) result in a performance advantage for an SMR disk.

Scholar Commons - Santa Clara University
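The SMR abstract above explains why in-place updates are costly: rewriting one track overwrites its neighbor, so in the worst case every track from the updated one to the end of its band must be rewritten. A toy sketch of that cost, assuming tracks in a band are numbered 0 to `band_size - 1` and shingle toward higher numbers (both assumptions for illustration, not the manuscript's actual disk model), is below. The function names are hypothetical.

```python
def tracks_rewritten_in_place(band_size: int, track: int) -> int:
    """Tracks that must be rewritten to update `track` in place.

    Toy model: tracks are written in increasing order, each partially
    overlapping the next, so rewriting `track` clobbers every later track in
    the band, and each of those must be restored in turn. The band's trailing
    unwritten gap stops the cascade from reaching the next band.
    """
    if not 0 <= track < band_size:
        raise ValueError("track must lie inside the band")
    return band_size - track  # the updated track plus all tracks after it


def average_in_place_cost(band_size: int) -> float:
    """Mean rewrite cost when every track in the band is equally likely to be updated."""
    total = sum(tracks_rewritten_in_place(band_size, t) for t in range(band_size))
    return total / band_size
```

Under this model a uniformly random update rewrites (band_size + 1) / 2 tracks on average, for example 5.5 tracks in a 10-track band. Remapping, the alternative approach the abstract names, avoids this cascade by writing the updated block to a new location instead.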

FFCV: Accelerating Training by Removing Data Bottlenecks

Authors: Engstrom Logan; Ilyas Andrew; Leclerc Guillaume; Madry Aleksander; Park Sung Min; Salman Hadi
Publication date: 21/06/2023

We present FFCV, a library for easy and fast machine learning model training. FFCV speeds up model training by eliminating (often subtle) data bottlenecks from the training process. In particular, we combine techniques such as an efficient file storage format, caching, data pre-loading, asynchronous data transfer, and just-in-time compilation to (a) make data loading and transfer significantly more efficient, ensuring that GPUs can reach full utilization; and (b) offload as much data processing as possible to the CPU asynchronously, freeing GPU cycles for training. Using FFCV, we train ResNet-18 and ResNet-50 on the ImageNet dataset with a competitive tradeoff between accuracy and training time. For example, we are able to train an ImageNet ResNet-50 model to 75% accuracy in only 20 minutes on a single machine. We demonstrate FFCV's performance, ease-of-use, extensibility, and ability to adapt to resource constraints through several case studies. Detailed installation instructions, documentation, and a Slack support channel are available at https://ffcv.io/

arXiv.org e-Print Archive
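Among the techniques the FFCV abstract lists are data pre-loading and asynchronous transfer, so that the training loop does not wait on I/O. The sketch below shows that general idea with a background thread and a bounded queue; it is not FFCV's API or implementation, and the names `prefetch` and `depth` are hypothetical.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


class _Failure:
    """Wraps an exception raised by the loader so it can cross the queue."""

    def __init__(self, exc: Exception) -> None:
        self.exc = exc


_DONE = object()  # sentinel: the source iterable is exhausted


def prefetch(source: Iterable[T], depth: int = 2) -> Iterator[T]:
    """Yield items from `source` while a background thread loads up to `depth` ahead.

    The consumer (e.g. a training step) works on one item while the worker
    thread is already fetching the next ones, overlapping loading with compute.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    buf: queue.Queue = queue.Queue(maxsize=depth)

    def worker() -> None:
        try:
            for item in source:
                buf.put(item)  # blocks once `depth` items are waiting
        except Exception as exc:
            buf.put(_Failure(exc))  # hand the error to the consumer
        finally:
            buf.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buf.get()
        if item is _DONE:
            return
        if isinstance(item, _Failure):
            raise item.exc
        yield item
```

A loop such as `for batch in prefetch(load_batches()): train_step(batch)` (both helpers hypothetical) then overlaps loading the next batch with the current step. The bounded queue caps memory use at `depth` waiting batches; in CPython, threads help here mainly when loading is I/O-bound or releases the GIL.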

Proceedings of the Workshop on Change of Representation and Problem Reformulation

Author: Lowry Michael R.

The proceedings of the third Workshop on Change of Representation and Problem Reformulation are presented. In contrast to the first two workshops, this workshop was focused on analytic or knowledge-based approaches, as opposed to statistical or empirical approaches called 'constructive induction'. The organizing committee believes that there is a potential for combining analytic and inductive approaches at a future date. However, it became apparent at the previous two workshops that the communities pursuing these different approaches are currently interested in largely non-overlapping issues. The constructive induction community has been holding its own workshops, principally in conjunction with the machine learning conference. While this workshop is more focused on analytic approaches, the organizing committee has made an effort to include more application domains. We have greatly expanded beyond our origins in the machine learning community. Participants in this workshop come from the full spectrum of AI application domains, including planning, qualitative physics, software engineering, knowledge representation, and machine learning.

NASA Technical Reports Server

Technology Directions for the 21st Century, volume 1

Authors: Botta Robert; Crimi Giles F.; McIntosh William; Verheggen Henry

For several decades, semiconductor device density and performance have been doubling about every 18 months (Moore's Law). With present photolithography techniques, this rate can continue for only about another 10 years. Continued improvement will need to rely on newer technologies. Transition from the current micron range for transistor size to the nanometer range will permit Moore's Law to operate well beyond 10 years. The technologies that will enable this extension include single-electron transistors, quantum well devices, spin transistors, and nanotechnology and molecular engineering. Continuation of Moore's Law will rely on huge capital investments for manufacture as well as on new technologies. Much will depend on the fortunes of Intel, the premier chip manufacturer, which, in turn, depend on the development of mass-market applications and volume sales for chips of higher and higher density. The technology drivers are seen by different forecasters to include video/multimedia applications, digital signal processing, and business automation. Moore's Law will affect NASA in the areas of communications and space technology by reducing size and power requirements for data processing and data fusion functions to be performed onboard spacecraft. In addition, NASA will have the opportunity to be a pioneering contributor to nanotechnology research without incurring huge expenses.

NASA Technical Reports Server
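The NASA report's doubling rule compounds quickly. As a worked check, assuming a strict doubling every 18 months (the report says "about"), the roughly 10 further years it allows for photolithography correspond to a growth factor of 2^(120/18), close to a hundredfold. The function name below is a hypothetical illustration.

```python
def moores_law_factor(years: float, doubling_months: float = 18.0) -> float:
    """Growth factor after `years` if density doubles every `doubling_months` months."""
    doublings = years * 12.0 / doubling_months  # number of doubling periods elapsed
    return 2.0 ** doublings
```

Here `moores_law_factor(10)` is about 101.6, i.e. 2 raised to 6.67 doublings, while `moores_law_factor(1.5)` is exactly 2.0.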

Networking high-end CAD systems based on PC/MS-DOS platforms

Author: Brinkmann Dagmar Antje
Publication venue: Digital Commons @ NJIT
Publication date: 31/01/1993

The concept of today's technology has been dropped. Everything is now either obsolete or experimental. Yesterday's technology is appealing only because it is tried-and-true and prices are reduced for clearance. Tomorrow's technology is exciting, somewhat expensive, and not well tested. In the field of architecture, where most firms are medium-sized or small and have limited resources, the high initial cost of a CAD installation was generally impossible to meet not too many years ago. From spreadsheets and CAD graphics to network file systems and distributed database management, the basic systems and application tools have matured to the point that the possibilities are now limited mainly by how creatively architects can apply them. CAD systems on the market today are not so different from the systems of the mid-1970s, except that they have gone from hardware costing a hundred thousand dollars to PC-based systems costing under ten thousand dollars. Choices of hardware and software for CAD systems undergo continual changes in power and efficiency. There will come a point where upgrading creates a deficiency rather than an improvement in capability, efficiency, and overall function. Deciding when and what to upgrade thus becomes a major problem for the prospective buyer.

Digital Commons @ New Jersey Institute of Technology (NJIT)

The Machete Number

Author: Freund David
Publication venue: Open Works
Publication date: 01/01/2013

Knot theory is a branch of topology that deals with the structure and properties of links. Employing a variety of tools, including surfaces, graph theory, and polynomials, we develop and explore classical link invariants. From this foundation, we define two novel link invariants, braid height and machete number, and investigate their properties and connection to classical invariants.

The College of Wooster