    Python bindings for the open source electromagnetic simulator Meep

    Meep is a broadly used open source package for finite-difference time-domain (FDTD) electromagnetic simulations. Python bindings for Meep make it easier for researchers to use and open promising opportunities for integration with other packages in the Python ecosystem. As this project shows, implementing Python-Meep offers benefits for specific disciplines and for the wider research community.
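    The finite-difference time-domain method that Meep implements can be illustrated with a minimal 1D update loop in pure NumPy. This is only a sketch of the underlying algorithm, not Meep's actual Python API (the real bindings are driven through `import meep as mp` and a `Simulation` object).

```python
# Minimal 1D FDTD sketch (illustrative only; not the Meep API).
# E and H live on staggered grids and are leapfrogged in time.
import numpy as np

nx, steps = 200, 300
ez = np.zeros(nx)        # electric field at integer grid points
hy = np.zeros(nx - 1)    # magnetic field at half-integer points

for t in range(steps):
    hy += np.diff(ez)                                 # update H from curl of E
    ez[1:-1] += np.diff(hy)                           # update E from curl of H
    ez[nx // 2] += np.exp(-((t - 30) ** 2) / 100.0)   # Gaussian pulse source

print("peak |Ez|:", round(float(np.max(np.abs(ez))), 3))
```

    With unit grid spacing and time step this runs at the 1D "magic" Courant number, so the pulse propagates without numerical dispersion; a real Meep run adds geometry, materials, and boundary conditions on top of this core update.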

    The archive solution for distributed workflow management agents of the CMS experiment at LHC

    The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as the central HDFS and Hadoop Spark cluster. The system leverages modern technologies such as a document-oriented database and the Hadoop ecosystem to provide the necessary flexibility to reliably process, store, and aggregate O(1M) documents on a daily basis. We describe the data transformation, the short- and long-term storage layers, the query language, and the aggregation pipeline developed to visualize various performance metrics to assist CMS data operators in assessing the performance of the CMS computing system.
    Comment: This is a pre-print of an article published in Computing and Software for Big Science. The final authenticated version is available online at: https://doi.org/10.1007/s41781-018-0005-
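    The kind of daily aggregation the paper describes, grouping unstructured job-report documents and summarizing outcomes per site, can be sketched as follows. The field names ("site", "exit_code", "cpu_time") are hypothetical placeholders, not the actual WMArchive schema, and the real pipeline runs at scale on Spark rather than in-process Python.

```python
# Hedged sketch: per-site aggregation over job-report documents.
# Field names are illustrative, not the WMArchive schema.
from collections import defaultdict

docs = [
    {"site": "T1_US_FNAL", "exit_code": 0,    "cpu_time": 1200.0},
    {"site": "T1_US_FNAL", "exit_code": 8001, "cpu_time": 300.0},
    {"site": "T2_CH_CERN", "exit_code": 0,    "cpu_time": 900.0},
]

summary = defaultdict(lambda: {"jobs": 0, "failures": 0, "cpu_time": 0.0})
for doc in docs:
    s = summary[doc["site"]]
    s["jobs"] += 1
    s["failures"] += doc["exit_code"] != 0   # count non-zero exit codes
    s["cpu_time"] += doc["cpu_time"]

print(dict(summary))
```

    A document-oriented store suits this workload because each agent can emit reports with varying structure, and the aggregation only touches the fields it needs.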

    High-Performance Multi-Mode Ptychography Reconstruction on Distributed GPUs

    Ptychography is an emerging imaging technique that is able to provide wavelength-limited spatial resolution from specimens with extended lateral dimensions. As a scanning microscopy method, a typical two-dimensional image requires a number of data frames. As a diffraction-based imaging technique, the real-space image has to be recovered through iterative reconstruction algorithms. Due to these two inherent aspects, a ptychographic reconstruction is generally a computation-intensive and time-consuming process, which limits the throughput of this method. We report an accelerated version of the multi-mode difference map algorithm for ptychography reconstruction using multiple distributed GPUs. This approach leverages available scientific computing packages in Python, including mpi4py and PyCUDA, with the core computation functions implemented in CUDA C. Interestingly, we find that even with MPI collective communications, the weak scaling in the number of GPU nodes can still remain nearly constant. Most importantly, for realistic diffraction measurements, we observe a speedup ranging from a factor of 10 to 10^3 depending on the data size, which reduces the reconstruction time remarkably from hours to typically about one minute and is thus critical for real-time data processing and visualization.
    Comment: work presented in NYSDS 201
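    The data-parallel pattern described above can be sketched in NumPy: diffraction frames are partitioned across GPU ranks, each rank forms a local object estimate, and an MPI allreduce averages the copies. The collective is simulated here in a single process; the real code uses mpi4py and PyCUDA, and the actual multi-mode difference-map update is considerably more involved than this placeholder averaging.

```python
# Hedged sketch of frame partitioning plus a simulated allreduce;
# the real implementation uses mpi4py/PyCUDA and a difference-map update.
import numpy as np

n_ranks, n_frames, obj_size = 4, 16, 32
rng = np.random.default_rng(0)
frames = rng.random((n_frames, obj_size))   # stand-in for diffraction frames

# block-partition the frames across ranks (as an MPI scatter would)
chunks = np.array_split(frames, n_ranks)

# each "rank" forms a local object estimate from its own frames
local_objects = [chunk.mean(axis=0) for chunk in chunks]

# simulate Allreduce(SUM) followed by division by the rank count
global_object = np.mean(local_objects, axis=0)
```

    Because each rank communicates only its object-sized estimate rather than its raw frames, the collective's cost stays fixed as frames are added, which is consistent with the nearly constant weak scaling the authors report.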

    A Data Science Course for Undergraduates: Thinking with Data

    Data science is an emerging interdisciplinary field that combines elements of mathematics, statistics, computer science, and knowledge of a particular application domain for the purpose of extracting meaningful information from the increasingly sophisticated array of data available in many settings. These data tend to be non-traditional, in the sense that they are often live, large, complex, and/or messy. A first course in statistics at the undergraduate level typically introduces students to a variety of techniques for analyzing small, neat, and clean data sets. However, whether or not they pursue more formal training in statistics, many of these students will end up working with data that are considerably more complex and will need facility with statistical computing techniques. More importantly, these students require a framework for thinking structurally about data. We describe an undergraduate course in a liberal arts environment that provides students with the tools necessary to apply data science. The course emphasizes modern, practical, and useful skills that cover the full data analysis spectrum, from asking an interesting question to acquiring, managing, manipulating, processing, querying, analyzing, and visualizing data, as well as communicating findings in written, graphical, and oral forms.
    Comment: 21 pages total including supplementary material

    SPARCS: Stream-processing architecture applied in real-time cyber-physical security

    In this paper, we showcase a complete, end-to-end, fault-tolerant, bandwidth- and latency-optimized architecture for real-time utilization of data from multiple sources that allows the collection, transport, storage, processing, and display of both raw data and analytics. This architecture can be applied to a wide variety of applications ranging from automation/control to monitoring and security. We propose a practical, hierarchical design that allows easy addition and reconfiguration of software and hardware components, while utilizing local processing of data at the sensor or field-site ('fog computing') level to reduce latency and upstream bandwidth requirements. The system supports multiple fail-safe mechanisms to guarantee the delivery of sensor data. We describe the application of this architecture to cyber-physical security (CPS) by supporting security monitoring of an electric distribution grid, through the collection and analysis of distribution-grid-level phasor measurement unit (PMU) data, as well as Supervisory Control And Data Acquisition (SCADA) communication in the control area network.
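    The fog-computing fail-safe pattern described, local processing to cut upstream bandwidth plus buffered, retried delivery so samples survive a network outage, can be sketched as follows. All class and field names here are illustrative, not part of the SPARCS implementation.

```python
# Hedged sketch of fog-level buffering with ordered, retried delivery.
# Names are hypothetical; SPARCS's actual transport layer differs.
from collections import deque

class FogNode:
    def __init__(self, send):
        self.send = send        # upstream transport; returns True on ack
        self.buffer = deque()   # local fail-safe buffer

    def ingest(self, raw_sample):
        # local processing reduces upstream bandwidth: forward a summary
        summary = {"min": min(raw_sample), "max": max(raw_sample)}
        self.buffer.append(summary)
        self.flush()

    def flush(self):
        # deliver in order; stop at the first failed send and retry later
        while self.buffer and self.send(self.buffer[0]):
            self.buffer.popleft()

# simulate an outage: the first two sends fail, then the link recovers
acked, attempts = [], [False, False, True, True, True]
def send(msg):
    ok = attempts.pop(0)
    if ok:
        acked.append(msg)
    return ok

node = FogNode(send)
node.ingest([1, 5, 3])   # send fails; summary stays buffered
node.ingest([2, 8])      # retry fails again; both summaries buffered
node.flush()             # link recovered: delivered upstream in order
```

    Keeping the buffer ordered and only dropping a sample after an acknowledgment is one simple way to realize the guaranteed-delivery property the abstract claims.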