
    Scalable Architecture for Integrated Batch and Streaming Analysis of Big Data

    Thesis (Ph.D.) - Indiana University, Computer Sciences, 2015. As Big Data processing problems evolve, many modern applications demonstrate special characteristics. Data exists in the form of both large historical datasets and high-speed real-time streams, and many analysis pipelines require integrated parallel batch processing and stream processing. Despite the large size of the whole dataset, most analyses focus on specific subsets according to certain criteria. Correspondingly, integrated support for efficient queries and post-query analysis is required. To address the system-level requirements brought by such characteristics, this dissertation proposes a scalable architecture for integrated queries, batch analysis, and streaming analysis of Big Data in the cloud. We verify its effectiveness using a representative application domain - social media data analysis - and tackle related research challenges emerging from each module of the architecture by integrating and extending multiple state-of-the-art Big Data storage and processing systems. In the storage layer, we reveal that existing text indexing techniques do not work well for the unique queries of social data, which put constraints on both textual content and social context. To address this issue, we propose a flexible indexing framework over NoSQL databases to support fully customizable index structures, which can embed necessary social context information for efficient queries. The batch analysis module demonstrates that analysis workflows consist of multiple algorithms with different computation and communication patterns, which are suitable for different processing frameworks. To achieve efficient workflows, we build an integrated analysis stack based on YARN, and make novel use of customized indices in developing sophisticated analysis algorithms. In the streaming analysis module, the high-dimensional data representation of social media streams poses special challenges to the problem of parallel stream clustering. Due to the sparsity of the high-dimensional data, the traditional synchronization method becomes expensive and severely impacts the scalability of the algorithm. Therefore, we design a novel strategy that broadcasts the incremental changes rather than the whole centroids of the clusters to achieve scalable parallel stream clustering algorithms. Performance tests using real applications show that our solutions for parallel data loading/indexing, queries, analysis tasks, and stream clustering all significantly outperform implementations using current state-of-the-art technologies.
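
    The delta-broadcast idea in the abstract above can be illustrated with a minimal sketch. It assumes sparse vectors stored as dictionaries and a centroid kept as a running sum plus a point count; the function names (local_delta, apply_deltas, centroid) and the sample data are hypothetical and are not taken from the dissertation, whose actual algorithm and data structures may differ.

```python
from collections import defaultdict


def local_delta(assigned_points):
    """Sum the sparse points one worker assigned to a cluster.

    Only dimensions that actually occur in the points appear in the
    result, so the message stays small when the data is sparse.
    """
    delta = defaultdict(float)
    for point in assigned_points:
        for dim, value in point.items():
            delta[dim] += value
    return dict(delta), len(assigned_points)


def apply_deltas(centroid_sum, count, deltas):
    """Merge the deltas received from all workers into a copy of the
    cluster's running sum, and add up the point counts."""
    merged = dict(centroid_sum)
    for delta, n in deltas:
        for dim, value in delta.items():
            merged[dim] = merged.get(dim, 0.0) + value
        count += n
    return merged, count


def centroid(centroid_sum, count):
    """Recover the centroid (mean vector) from the running sum."""
    return {dim: total / count for dim, total in centroid_sum.items()}


# Hypothetical state of one cluster before a synchronization step.
cluster_sum = {"a": 2.0, "b": 4.0}
cluster_count = 2

# Each worker only sends the dimensions it touched (the "incremental changes").
worker_deltas = [
    local_delta([{"a": 1.0}, {"c": 3.0}]),
    local_delta([{"b": 2.0}]),
]

merged_sum, merged_count = apply_deltas(cluster_sum, cluster_count, worker_deltas)
new_centroid = centroid(merged_sum, merged_count)
```

    In this sketch each worker's message grows with the number of dimensions it touched in the current batch rather than with the full dimensionality of the centroid, which is the property the abstract attributes to broadcasting incremental changes.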

    Spinoff 2013

    Topics covered include: Innovative Software Tools Measure Behavioral Alertness; Miniaturized, Portable Sensors Monitor Metabolic Health; Patient Simulators Train Emergency Caregivers; Solar Refrigerators Store Life-Saving Vaccines; Monitors Enable Medication Management in Patients' Homes; Handheld Diagnostic Device Delivers Quick Medical Readings; Experiments Result in Safer, Spin-Resistant Aircraft; Interfaces Visualize Data for Airline Safety, Efficiency; Data Mining Tools Make Flights Safer, More Efficient; NASA Standards Inform Comfortable Car Seats; Heat Shield Paves the Way for Commercial Space; Air Systems Provide Life Support to Miners; Coatings Preserve Metal, Stone, Tile, and Concrete; Robots Spur Software That Lends a Hand; Cloud-Based Data Sharing Connects Emergency Managers; Catalytic Converters Maintain Air Quality in Mines; NASA-Enhanced Water Bottles Filter Water on the Go; Brainwave Monitoring Software Improves Distracted Minds; Thermal Materials Protect Priceless, Personal Keepsakes; Home Air Purifiers Eradicate Harmful Pathogens; Thermal Materials Drive Professional Apparel Line; Radiant Barriers Save Energy in Buildings; Open Source Initiative Powers Real-Time Data Streams; Shuttle Engine Designs Revolutionize Solar Power; Procedure-Authoring Tool Improves Safety on Oil Rigs; Satellite Data Aid Monitoring of Nation's Forests; Mars Technologies Spawn Durable Wind Turbines; Programs Visualize Earth and Space for Interactive Education; Processor Units Reduce Satellite Construction Costs; Software Accelerates Computing Time for Complex Math; Simulation Tools Prevent Signal Interference on Spacecraft; Software Simplifies the Sharing of Numerical Models; Virtual Machine Language Controls Remote Devices; Micro-Accelerometers Monitor Equipment Health; Reactors Save Energy, Costs for Hydrogen Production; Cameras Monitor Spacecraft Integrity to Prevent Failures; Testing Devices Garner Data on Insulation Performance; Smart Sensors Gather Information for Machine Diagnostics; Oxygen Sensors Monitor Bioreactors and Ensure Health and Safety; Vision Algorithms Catch Defects in Screen Displays; and Deformable Mirrors Capture Exoplanet Data, Reflect Lasers.

    NASA Tech Briefs, September 2006

    Topics covered include: Improving Thermomechanical Properties of SiC/SiC Composites; Aerogel/Particle Composites for Thermoelectric Devices; Patches for Repairing Ceramics and Ceramic-Matrix Composites; Lower-Conductivity Ceramic Materials for Thermal-Barrier Coatings; An Alternative for Emergency Preemption of Traffic Lights; Vehicle Transponder for Preemption of Traffic Lights; Automated Announcements of Approaching Emergency Vehicles; Intersection Monitor for Traffic-Light-Preemption System; Full-Duplex Digital Communication on a Single Laser Beam; Stabilizing Microwave Frequency of a Photonic Oscillator; Microwave Oscillators Based on Nonlinear WGM Resonators; Pointing Reference Scheme for Free-Space Optical Communications Systems; High-Level Performance Modeling of SAR Systems; Spectral Analysis Tool 6.2 for Windows; Multi-Platform Avionics Simulator; Silicon-Based Optical Modulator with Ferroelectric Layer; Multiplexing Transducers Based on Tunnel-Diode Oscillators; Scheduling with Automated Resolution of Conflicts; Symbolic Constraint Maintenance Grid; Discerning Trends in Performance Across Multiple Events; Magnetic Field Solver; Computing for Aiming a Spaceborne Bistatic-Radar Transmitter; 4-Vinyl-1,3-Dioxolane-2-One as an Additive for Li-Ion Cells; Probabilistic Prediction of Lifetimes of Ceramic Parts; STRANAL-PMC Version 2.0; Micromechanics and Piezo Enhancements of HyperSizer; Single-Phase Rare-Earth Oxide/Aluminum Oxide Glasses; Tilt/Tip/Piston Manipulator with Base-Mounted Actuators; Measurement of Model Noise in a Hard-Wall Wind Tunnel; Loci-STREAM Version 0.9; The Synergistic Engineering Environment; Reconfigurable Software for Controlling Formation Flying; More About the Tetrahedral Unstructured Software System; Computing Flows Using Chimera and Unstructured Grids; Avoiding Obstructions in Aiming a High-Gain Antenna; Analyzing Aeroelastic Stability of a Tilt-Rotor Aircraft; Tracking Positions and Attitudes of Mars Rovers; Stochastic Evolutionary Algorithms for Planning Robot Paths; Compressible Flow Toolbox; Rapid Aeroelastic Analysis of Blade Flutter in Turbomachines; General Flow-Solver Code for Turbomachinery Applications; Code for Multiblock CFD and Heat-Transfer Computations; Rotating-Pump Design Code; Covering a Crucible with Metal Containing Channels; Repairing Fractured Bones by Use of Bioabsorbable Composites; Kalman Filter for Calibrating a Telescope Focal Plane; Electronic Absolute Cartesian Autocollimator; Fiber-Optic Gratings for Lidar Measurements of Water Vapor; Simulating Responses of Gravitational-Wave Instrumentation; SOFTC: A Software Correlator for VLBI; Progress in Computational Simulation of Earthquakes; Database of Properties of Meteors; Computing Spacecraft Solar-Cell Damage by Charged Particles; Thermal Model of a Current-Carrying Wire in a Vacuum; Program for Analyzing Flows in a Complex Network; Program Predicts Performance of Optical Parametric Oscillators; Processing TES Level-1B Data; Automated Camera Calibration; Tracking the Martian CO2 Polar Ice Caps in Infrared Images; Processing TES Level-2 Data; SmaggIce Version 1.8; Solving the Swath Segment Selection Problem; The Spatial Standard Observer; Less-Complex Method of Classifying MPSK; Improvement in Recursive Hierarchical Segmentation of Data; Using Heaps in Recursive Hierarchical Segmentation of Data; Tool for Statistical Analysis and Display of Landing Sites; Automated Assignment of Proposals to Reviewers; Array-Pattern-Match Compiler for Opportunistic Data Analysis; Pre-Processor for Compression of Multispectral Image Data; Compressing Image Data While Limiting the Effects of Data Losses; Flight Operations Analysis Tool; Improvement in Visual Target Tracking for a Mobile Robot; Software for Simulating Air Traffic; Automated Vectorization of Decision-Based Algorithms; Grayscale Optical Correlator Workbench; "One-Stop Shopping" for Ocean Remote-Sensing and Model Data; State Analysis Database Tool; Generating CAHV and CAHVOR Images with Shadows in ROAMS; Improving UDP/IP Transmission Without Increasing Congestion; FORTRAN Versions of Reformulated HFGMC Codes; Program for Editing Spacecraft Command Sequences; Flight-Tested Prototype of BEAM Software; Mission Scenario Development Workbench; Marsviewer; Tool for Analysis and Reduction of Scientific Data; ASPEN Version 3.0; Secure Display of Space-Exploration Images; Digital Front End for Wide-Band VLBI Science Receiver; Multifunctional Tanks for Spacecraft; Lightweight, Segmented, Mostly Silicon Telescope Mirror; Assistant for Analyzing Tropical-Rain-Mapping Radar Data; and Anion-Intercalating Cathodes for High-Energy-Density Cells.

    NASA Tech Briefs, April 2012

    Topics include: Computational Ghost Imaging for Remote Sensing; Digital Architecture for a Trace Gas Sensor Platform; Dispersed Fringe Sensing Analysis - DFSA; Indium Tin Oxide Resistor-Based Nitric Oxide Microsensors; Gas Composition Sensing Using Carbon Nanotube Arrays; Sensor for Boundary Shear Stress in Fluid Flow; Model-Based Method for Sensor Validation; Qualification of Engineering Camera for Long-Duration Deep Space Missions; Remotely Powered Reconfigurable Receiver for Extreme Environment Sensing Platforms; Bump Bonding Using Metal-Coated Carbon Nanotubes; In Situ Mosaic Brightness Correction; Simplex GPS and InSAR Inversion Software; Virtual Machine Language 2.1; Multi-Scale Three-Dimensional Variational Data Assimilation System for Coastal Ocean Prediction; Pandora Operation and Analysis Software; Fabrication of a Cryogenic Bias Filter for Ultrasensitive Focal Plane; Processing of Nanosensors Using a Sacrificial Template Approach; High-Temperature Shape Memory Polymers; Modular Flooring System; Non-Toxic, Low-Freezing, Drop-In Replacement Heat Transfer Fluids; Materials That Enhance Efficiency and Radiation Resistance of Solar Cells; Low-Cost, Rugged High-Vacuum System; Static Gas-Charging Plug; Floating Oil-Spill Containment Device; Stemless Ball Valve; Improving Balance Function Using Low Levels of Electrical Stimulation of the Balance Organs; Oxygen-Methane Thruster; Lunar Navigation Determination System - LaNDS; Launch Method for Kites in Low-Wind or No-Wind Conditions; Supercritical CO2 Cleaning System for Planetary Protection and Contamination Control Applications; Design and Performance of a Wideband Radio Telescope; Finite Element Models for Electron Beam Freeform Fabrication Process; Autonomous Information Unit for Fine-Grain Data Access Control and Information Protection in a Net-Centric System; Vehicle Detection for RCTA/ANS (Autonomous Navigation System); Image Mapping and Visual Attention on the Sensory Ego-Sphere; HyDE Framework for Stochastic and Hybrid Model-Based Diagnosis; and IMAGESEER - IMAGEs for Education and Research.

    Indiana University’s advanced cyberinfrastructure in service of IU strategic goals: Activities of the Research Technologies Division of UITS and National Center for Genome Analysis Support – two Pervasive Technology Institute cyberinfrastructure and service centers - during FY2014

    This report presents information on the activities of the Research Technologies Division of UITS and the National Center for Genome Analysis Support, two cyberinfrastructure and service centers of the Pervasive Technology Institute. Research Technologies (RT) is a subunit of University Information Technology Services (UITS) and it operates and supports the largest computational, data, and visualization systems at IU. The National Center for Genome Analysis Support (NCGAS) is primarily federally funded, serving the national community of genome scientists. NCGAS leadership is drawn from the Office of the Vice President for Information Technology, UITS, the College, and the School of Informatics and Computing. This report focuses on contributions of RT and NCGAS to the accomplishment of IU’s bicentennial goals, and is organized according to those goals. Together the activities of NCGAS and RT represent a large share of the activities of PTI in support of the IU community. PTI’s Research Centers (Data to Insight Center, Digital Science Center, and the Center for Applied Cybersecurity Research) also provide support to the IU community in various forms, but the primary focus of these research centers is informatics, information technology, and computer science research.

    A formal architecture-centric and model driven approach for the engineering of science gateways

    From n-Tier client/server applications, to more complex academic Grids, or even the most recent and promising industrial Clouds, the last decade has witnessed significant developments in distributed computing. In spite of this conceptual heterogeneity, Service-Oriented Architecture (SOA) seems to have emerged as the common and underlying abstraction paradigm, even though different standards and technologies are applied across application domains. Suitable access to data and algorithms resident in SOAs via so-called ‘Science Gateways’ has thus become a pressing need in order to realize the benefits of distributed computing infrastructures. In an attempt to inform service-oriented systems design and developments in Grid-based biomedical research infrastructures, the applicant has consolidated work from three complementary experiences in European projects, which have developed and deployed large-scale production quality infrastructures and more recently Science Gateways to support research in breast cancer, pediatric diseases and neurodegenerative pathologies respectively. In analyzing the requirements from these biomedical applications the applicant was able to elaborate on commonly faced issues in Grid development and deployment, while proposing an adapted and extensible engineering framework. Grids implement a number of protocols, applications, standards and attempt to virtualize and harmonize accesses to them. Most Grid implementations therefore are instantiated as superposed software layers, often resulting in a low quality of services and quality of applications, thus making design and development increasingly complex, and rendering classical software engineering approaches unsuitable for Grid developments. The applicant proposes the application of a formal Model-Driven Engineering (MDE) approach to service-oriented developments, making it possible to define Grid-based architectures and Science Gateways that satisfy quality of service requirements, execution platform and distribution criteria at design time. A novel investigation is thus presented on the applicability of the resulting grid MDE (gMDE) to specific examples and conclusions are drawn on the benefits of this approach and its possible application to other areas, in particular that of Distributed Computing Infrastructures (DCI) interoperability, Science Gateways and Cloud architectures developments.

    Performance Modeling Codes for the QuakeSim Problem Solving Environment
