21 research outputs found
Shibboleth as a Tool for Authorized Access Control to the Subversion Repository System
Shibboleth is an architecture and protocol for allowing users to authenticate and be authorized to use a remote resource by logging into the identity management system that is maintained at their home institution. With Shibboleth, a federation of institutions can share resources among users and yet allow the administration of both the user access control to resources and the user identity and attribute information to be performed at the hosting or home institution. Subversion is a version control repository system that allows the creation of fine-grained permissions to files and directories. In this project an infrastructure, Shibbolized Subversion, has been created that consists of a Subversion repository with an Apache web interface that is protected by a Shibboleth authentication system. The infrastructure can allow authorized and authenticated data sharing between institutions yet retains simplicity and protects privacy for users. In addition, it also relieves local administrators from the task of having to perform extra account management for users from other institutions. This paper describes the Shibboleth and Subversion systems, the implementation of the file sharing infrastructure, and issues of attribute maintenance, privacy and security
Maneuverable Applications: Advancing Distributed Computing
Extending the military principle of maneuver into the war-fighting domain of cyberspace, academic and military researchers have produced many theoretical and strategic works, though few have focused on researching the applications and systems that apply this principle. We present a survey of our research in developing new architectures for the enhancement of parallel and distributed applications. Specifically, we discuss our work in applying the military concept of maneuver in the cyberspace domain by creating a set of applications and systems called “maneuverable applications.” Our research investigates resource provisioning, application optimization, and cybersecurity enhancement through the modification, relocation, addition or removal of computing resources.
We first describe our work to create a system to provision a big data computational resource within academic environments. Secondly, we present a computing testbed built to allow researchers to study network optimizations of data centers. Thirdly, we discuss our Petri Net model of an adaptable system, which increases its cyber security posture in the face of varying levels of threat from malicious actors. Finally, we present evidence that traditional ideas about extending maneuver into cyberspace focus on security only, but computing can benefit from maneuver in multiple manners beyond security
Teaching HDFS/MapReduce Systems Concepts to Undergraduates
This paper presents the development of a Hadoop MapReduce module that has been taught in a course in distributed computing to upper undergraduate computer science students at Clemson University. The paper describes our teaching experiences and the feedback from the students over several semesters that have helped to shape the course. We provide suggested best practices for lecture materials, the computing platform, and the teaching methods. In addition, the computing platform and teaching methods can be extended to accommodate emerging technologies and modules for related courses
Random Access in Nondelimited Variable-length Record Collections for Parallel Reading with Hadoop
The industry standard Packet CAPture (PCAP) format for storing network packet traces is normally only readable in serial due to its lack of delimiters, indexing, or blocking. This presents a challenge for parallel analysis of large networks, where packet traces can be many gigabytes in size. In this work we present RAPCAP, a novel method for random access into variable-length record collections like PCAP by identifying a record boundary within a small number of bytes of the access point. Unlike related heuristic methods that can limit scalability with a nonzero probability of error, the new method offers a correctness guarantee with a well formed file and does not rely on prior knowledge of the contents. We include a practical implementation of the algorithm with an extension to the Hadoop framework, and a performance comparison to serial ingestion. Finally, we present a number of similar storage types that could utilize a modified version of RAPCAP for random access
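For orientation, the record-boundary problem can be sketched with the kind of heuristic scan that the abstract contrasts RAPCAP against. This is not the RAPCAP algorithm and it carries a nonzero chance of a false match. It assumes the classic PCAP per-record header (four 32-bit fields: seconds, microseconds, captured length, original length) in little-endian byte order; the function names, the `chain` parameter, and the default `snaplen` are hypothetical choices for illustration.

```python
import struct

# Classic PCAP per-record header: ts_sec, ts_usec, incl_len, orig_len.
# Little-endian byte order is an assumption; real files declare it in the magic number.
RECORD_HEADER = struct.Struct("<IIII")


def plausible_header(buf, pos, snaplen):
    """Return the offset of the next record if buf[pos:] looks like a header, else None."""
    if pos + RECORD_HEADER.size > len(buf):
        return None
    ts_sec, ts_usec, incl_len, orig_len = RECORD_HEADER.unpack_from(buf, pos)
    # Sanity checks: microseconds in range, nonzero capture length within limits.
    if ts_usec >= 1_000_000:
        return None
    if incl_len == 0 or incl_len > snaplen or incl_len > orig_len:
        return None
    return pos + RECORD_HEADER.size + incl_len


def find_record_boundary(buf, start, snaplen=65535, chain=3):
    """Scan forward from start for an offset where `chain` consecutive headers validate.

    Heuristic only: payload bytes that happen to look like headers can fool it.
    """
    for pos in range(start, len(buf)):
        nxt = pos
        for _ in range(chain):
            nxt = plausible_header(buf, nxt, snaplen)
            if nxt is None:
                break
        else:
            return pos
    return None
```

A parallel reader could call `find_record_boundary` once per input split and begin parsing from the returned offset. Because the checks are only plausibility tests, a scan like this can lock onto a misaligned offset, which is the error mode the abstract says RAPCAP avoids for well-formed files.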
Synthetic Image Data for Deep Learning
Realistic synthetic image data rendered from 3D models can be used to augment
image sets and train image classification semantic segmentation models. In this
work, we explore how high quality physically-based rendering and domain
randomization can efficiently create a large synthetic dataset based on
production 3D CAD models of a real vehicle. We use this dataset to quantify the
effectiveness of synthetic augmentation using U-net and Double-U-net models. We
found that, for this domain, synthetic images were an effective technique for
augmenting limited sets of real training data. We observed that models trained
on purely synthetic images had a very low mean prediction IoU on real
validation images. We also observed that adding even very small amounts of real
images to a synthetic dataset greatly improved accuracy, and that models
trained on datasets augmented with synthetic images were more accurate than
those trained on real images alone. Finally, we found that in use cases that
benefit from incremental training or model specialization, pretraining a base
model on synthetic images provided a sizeable reduction in the training cost of
transfer learning, allowing up to 90% of the model training to be
front-loaded
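The IoU figure mentioned above can be illustrated with a minimal sketch for binary segmentation masks. The abstract does not give the exact averaging used, so the per-image mean here is an assumption, as are the function names and the convention that two empty masks score 1.0.

```python
def iou(pred, target):
    """Intersection over union of two binary masks (equally sized nested lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks agree perfectly.
    return inter / union if union else 1.0


def mean_iou(preds, targets):
    """Average the per-image IoU over a validation set (assumed averaging scheme)."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)
```

A low mean IoU on real validation images, as reported for models trained on purely synthetic data, means that the predicted masks overlap the ground-truth masks only slightly on average.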
Measuring the Effects of Thread Placement on the Kendall Square KSR1
This paper describes a measurement study of the effects of thread placement on memory access times on the Kendall Square multiprocessor, the KSR1. The KSR1 uses a conventional shared memory programming model in a distributed memory architecture. The architecture is based on a ring of rings of 64-bit superscalar microprocessors. The KSR1 has a Cache-Only Memory Architecture (COMA). Memory consists of the local cache memories attached to each processor. Whenever an address is accessed, the data item is automatically copied to the local cache memory module, so that access times for subsequent references will be minimal. If a local cache has space allocated for a particular data item, but does not have a current valid copy of that data item, then it is possible for the cache to acquire a valid read-only copy before it is requested by the local processor due to a request by a different processor that happens to pass by on the ring. This automatic prefetching can greatly reduce the average time for a thread to acquire data items. Because of the automatic prefetching, the time required to obtain a valid copy of a data item does not depend simply on the distance from the owner of the data item, but also depends on the placement and number of other processing threads which share the same data item. Also, the strategic placement of processing threads helps programs take advantage of the unique features of the memory architecture which help eliminate memory access bottlenecks for shared data sets. Experiments run on the KSR1 across a wide variety of thread configurations show that shared memory access is accelerated through strategic placement of threads which share data. The results indicate strategies for improving the performance of application programs, and illustrate that KSR1 memory access times can remain nearly constant even when the number of participating threads increases