HathiTrust Research Center: Computational Research on the HathiTrust Repository
PIs (executive management team): Beth A. Plale, Indiana University; Marshall Scott Poole, University of Illinois Urbana-Champaign; Robert McDonald, IU; John Unsworth, UIUC. Senior investigators: Loretta Auvil (UIUC); Johan Bollen (IU); Randy Butler (UIUC); Dennis Cromwell (IU); Geoffrey Fox (IU); Eileen Julien (IU); Stacy Kowalczyk (IU); Danny Powell (UIUC); Beth Sandore (UIUC); Craig Stewart (IU); John Towns (UIUC); Carolyn Walters (IU); Michael Welge (UIUC); Eric Wernert (IU)
January 1 - December 31, 2012
This report summarizes training, education, and outreach activities for calendar year 2012 of PTI and affiliated organizations, including the School of Informatics and Computing, Office of the Vice President for Information Technology, and Maurer School of Law. Reported activities include those led by PTI Research Centers (Center for Applied Cybersecurity Research, Center for Research in Extreme Scale Technologies, Data to Insight Center, Digital Science Center) and Service and Cyberinfrastructure Centers (Research Technologies Division of University Information Technology Services, National Center for Genome Assembly Support).
What is Cyberinfrastructure?
Cyberinfrastructure is a commonly used word that lacks a single, precise definition. One intuitively recognizes the analogy with infrastructure, and the use of "cyber" to refer to thinking or computing – but what exactly is cyberinfrastructure as opposed to information technology infrastructure? Indiana University has developed one of the more widely cited definitions of cyberinfrastructure:
"Cyberinfrastructure consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high performance networks to improve research productivity and enable breakthroughs not otherwise possible." A second definition, more inclusive of scholarship generally and educational activities, has also been published and is useful in describing cyberinfrastructure: "Cyberinfrastructure consists of systems, data and information management, advanced instruments, visualization environments, and people, all linked together by software and advanced networks to improve scholarly productivity and enable knowledge breakthroughs and discoveries not otherwise possible." In this paper, we describe the origin of the term cyberinfrastructure based on the history of the root word infrastructure, discuss several terms related to cyberinfrastructure, and provide several examples of cyberinfrastructure
Application benchmark results for Big Red, an IBM e1350 BladeCenter Cluster
The purpose of this report is to present the results of benchmark tests with Big Red, an IBM e1350 BladeCenter Cluster. This report focuses on providing details of the system architecture and test runs to allow for analysis in other reports and comparison with other systems, rather than presenting such analysis here.
The First Provenance Challenge
The first Provenance Challenge was set up in order to provide a forum for the community to help understand the capabilities of different provenance systems and the expressiveness of their provenance representations. To this end, a Functional Magnetic Resonance Imaging workflow was defined, which participants had to either simulate or run in order to produce some provenance representation, from which a set of identified queries had to be implemented and executed. Sixteen teams responded to the challenge and submitted their inputs. In this paper, we present the challenge workflow and queries, and summarise the participants' contributions.
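The kind of provenance capture and querying the challenge exercised can be illustrated with a minimal sketch: each workflow step records which artifacts it consumed and produced, and lineage queries walk the resulting graph. The class, method names, and file names below are invented for illustration (the step names echo the challenge's fMRI workflow stages), not any participant's actual system.

```python
from collections import defaultdict

class ProvenanceStore:
    """Toy provenance store: records step-level lineage, answers ancestry queries."""

    def __init__(self):
        self.derived_from = defaultdict(set)  # artifact -> artifacts it was derived from
        self.produced_by = {}                 # artifact -> process that produced it

    def record(self, process, inputs, outputs):
        """Record one workflow step: `process` read `inputs` and wrote `outputs`."""
        for out in outputs:
            self.produced_by[out] = process
            self.derived_from[out].update(inputs)

    def lineage(self, artifact):
        """Query: all upstream artifacts this artifact transitively depends on."""
        seen, stack = set(), [artifact]
        while stack:
            for parent in self.derived_from[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

# A three-step chain in the spirit of the challenge's fMRI workflow.
store = ProvenanceStore()
store.record("align_warp", ["anatomy1.img"], ["warp1.warp"])
store.record("reslice", ["warp1.warp"], ["resliced1.img"])
store.record("softmean", ["resliced1.img"], ["atlas.img"])

print(store.lineage("atlas.img"))
# upstream of atlas.img: resliced1.img, warp1.warp, anatomy1.img
```

The challenge queries were of exactly this flavor (e.g., "find everything the averaged atlas image was derived from"), which is why graph-walking expressiveness mattered when comparing systems.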
Intelligent Systems for Geosciences: An Essential Research Agenda
This paper presents a research agenda for intelligent systems that would result in fundamental new capabilities for understanding the Earth system. Many aspects of geosciences pose novel problems for intelligent systems research. Geoscience data is challenging because it tends to be uncertain, intermittent, sparse, multiresolution, and multiscale. Geoscience processes and objects often have amorphous spatiotemporal boundaries. The lack of ground truth makes model evaluation, testing, and comparison difficult. Overcoming these challenges requires breakthroughs that would significantly transform intelligent systems, while greatly benefiting the geosciences in turn.
SOFTWARE APPROACH TO HAZARD DETECTION USING ON-LINE ANALYSIS OF SAFETY CONSTRAINTS
Safety-critical systems are pervasive in modern society. Financial systems, transportation systems, medical record retrieval systems, and air traffic control systems all could potentially threaten economic, property, or personal safety. However, this class of safety-critical system is not amenable to the existing approaches to achieving safe software. Hence a new approach to software safety is needed – one that can accommodate continuous distributed systems that may be heterogeneous, may contain commercial off-the-shelf (COTS) components, and may consist of components not all of which were designed to be used in safety-critical settings. In response to this need, we have developed a software hazard detection tool that, we argue, increases the safety level of heterogeneous, continuous safety-critical systems, in part by employing dynamic behavior to enhance flexibility and expand the potential for possible optimizations. The detection approach employs on-line, application-level monitoring to extract interesting behavior from a large-scale system. It allows the user to specify complex, multi-source hazards using a query-like language – hazard queries are then transformed and applied against the event stream. Dynamic optimization and management of hazards is made possible by this on-line, language-based approach.
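The on-line, window-over-event-stream style of hazard detection described above can be sketched as follows. This is a minimal illustration, not the tool's actual query language or API: the hazard condition (a sliding window in which every reading crosses a threshold), the `temp` field, and all class and parameter names are invented for the example.

```python
from collections import deque

class HazardMonitor:
    """Evaluates one hazard condition incrementally as events arrive."""

    def __init__(self, window, threshold):
        self.window = window        # number of recent events the hazard spans
        self.threshold = threshold  # readings at/above this are hazardous
        self.recent = deque(maxlen=window)

    def observe(self, event):
        """Feed one event; return True if the hazard condition fires."""
        self.recent.append(event)
        # Hazard: a full window in which every reading reaches the threshold.
        return (len(self.recent) == self.window and
                all(e["temp"] >= self.threshold for e in self.recent))

# Apply the monitor against a small event stream.
monitor = HazardMonitor(window=3, threshold=100)
alerts = [monitor.observe({"temp": t}) for t in [95, 101, 103, 104, 99]]
print(alerts)  # fires once three consecutive readings reach the threshold
```

A query-like hazard language, as the abstract describes, would compile declarative specifications down to incremental evaluators of roughly this shape, which is what makes on-line optimization and management of hazards tractable.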
Time-Based Data Streams: Fundamental Concepts for a Data Resource for Streams
Real-time data, which we call data streams, are readings from instruments or from environmental, bodily, or building sensors that are generated at regular intervals and often, due to their volume, need to be processed in real time. Often a single pass is all that can be made over the data, and a decision to discard or keep each instance is made on the spot. Moreover, the stream is for all practical purposes indefinite, so decisions must be made on incomplete knowledge. This notion of data streams raises a different set of issues from a file, for instance, that is byte-streamed to a reader: the file is finite, so the byte stream becomes a processing convenience more than a fundamentally different kind of data. Through the duration of the project we examined three aspects of streaming data: first, techniques to handle streaming data in a distributed system organized as a collection of web services; second, the notion of the dashboard and real-time controllable analysis constructs in the context of the Fermi Tevatron Beam Position Monitor; and third, provenance collection for stream processing, such as might occur as raw observational data flows from the source and undergoes correction, cleaning, and quality control. The impact of this work is severalfold. We were one of the first to advocate that streams have little value unless aggregated, a notion that is now gaining general acceptance. We were also one of the first groups to grapple with the notion of provenance of stream data.
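The single-pass constraint described above can be made concrete with a short sketch: each reading is seen once, a keep/discard decision is made on the spot, and only a constant-size summary (here, a running mean) survives. The function, the sentinel value, and the filter are illustrative assumptions, not the project's actual pipeline.

```python
def process_stream(readings, keep_if):
    """One pass over a (potentially unbounded) sequence of readings.

    Maintains a running mean incrementally, so no reading needs to be
    stored once its keep/discard decision has been made.
    """
    count, mean = 0, 0.0
    for value in readings:
        if not keep_if(value):
            continue                    # discard on the spot
        count += 1
        mean += (value - mean) / count  # incremental mean update
        yield value, mean               # current reading + running aggregate

# Example: keep plausible sensor values, track the mean as we go.
stream = [21.5, 22.0, -999.0, 22.5]     # -999.0 plays the role of a bad reading
results = list(process_stream(stream, keep_if=lambda v: v > -100))
print(results[-1])  # (22.5, 22.0)
```

Because the aggregate is updated incrementally, the same generator works unchanged on an indefinite stream, which is the property that distinguishes stream processing from byte-streaming a finite file.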