73 research outputs found
A method for comparing multiple imputation techniques: A case study on the U.S. national COVID cohort collaborative
Healthcare datasets obtained from Electronic Health Records have proven to be extremely useful for assessing associations between patients’ predictors and outcomes of interest. However, these datasets often suffer from missing values in a high proportion of cases, whose removal may introduce severe bias. Several multiple imputation algorithms have been proposed to attempt to recover the missing information under an assumed missingness mechanism. Each algorithm presents strengths and weaknesses, and there is currently no consensus on which multiple imputation algorithm works best in a given scenario. Furthermore, the selection of each algorithm's parameters and data-related modeling choices are also both crucial and challenging. In this paper we propose a novel framework to numerically evaluate strategies for handling missing data in the context of statistical analysis, with a particular focus on multiple imputation techniques. We demonstrate the feasibility of our approach on a large cohort of type-2 diabetes patients provided by the National COVID Cohort Collaborative (N3C) Enclave, where we explored the influence of various patient characteristics on outcomes related to COVID-19. Our analysis included classic multiple imputation techniques as well as simple complete-case Inverse Probability Weighted models. Extensive experiments show that our approach can effectively highlight the most promising and performant missing-data handling strategy for our case study. Moreover, our methodology allowed a better understanding of the behavior of the different models and of how it changed as we modified their parameters. Our method is general and can be applied to different research fields and on datasets containing heterogeneous types
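To make the multiple-imputation setting in this abstract concrete, the following is a minimal sketch of how estimates from several imputed datasets are usually combined with Rubin's rules. It is not the evaluation framework the paper proposes; the function name `pool_rubin` and the input numbers are hypothetical and chosen only for illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets (Rubin's rules).

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    pooled = mean(estimates)          # pooled point estimate
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical estimates of one coefficient from three imputed datasets.
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term grows when the imputed datasets disagree, so the pooled variance reflects uncertainty due to the missing values as well as ordinary sampling error.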
Datapath-oriented FPGA Mapping and Placement for Configurable Computing (Extended Abstract)
Timothy J. Callahan and John Wawrzynek, University of California, Berkeley. Widespread acceptance of FPGA-based reconfigurable coprocessors will be expedited if compilation time for FPGA configurations can be reduced to be comparable to software compilation. This research achieves this goal, generating complete datapath layouts in fractions of a second rather than hours. Our algorithm, adapted from instruction selection in compilers, packs multiple operations into single rows of CLBs when possible, while preserving a regular bit-slice layout. Furthermore, placement and thus routing delays are considered simultaneously with packing, so that the total delay, not just the CLB delay, is optimized. The Problem: Reconfigurable coprocessors, most commonly implemented with field programmable gate array (FPGA) technology, have been shown effective in accelerating certain classes of applications. Computation-intense kernels can be selected automatically or by hand for acceleration using the coproc..
Modeling Storm Water Runoff and Soil Interflow in a Managed Forest, Upper Coastal Plain of the Southeast US.
The Forest Service-Savannah River is conducting a hectare-scale monitoring and modeling study on forest productivity in a Short Rotation Woody Crop plantation at the Savannah River Site, which is on the Upper Coastal Plain of South Carolina. Detailed surveys, i.e., topography, soils, vegetation, and drainage network, of small (2-5 ha) plots have been completed in a 2 square-km watershed draining to Fourmile Creek, a tributary of the Savannah River. We wish to experimentally determine the relative importance of interflow on water yield and water quality at this site. Interflow (shallow subsurface lateral flow) can short-circuit rainfall infiltration, preventing deep seepage and resulting in water and chemical residence times in the watershed much shorter than if deep seepage were the sole component of infiltration. The soil series at the site (Wagram, Dothan, Fuquay, Ogeechee, and Vaucluse) each have a clay-rich B horizon of decimeter-scale thickness at depths of 1-2 m below surface. Interflow is affected by rainfall intensity and duration and by soil properties such as porosity, permeability, and antecedent soil moisture. Our calculations using the Green and Ampt equation show that the intensity and duration of a storm event must be greater than about 3 cm per hour and 2 hours, respectively, in order to initiate interflow for the least permeable soil series (Vaucluse). Tabulated values of soil properties were used in these preliminary calculations. Simulations of the largest rainfall events from 1972-2002 data using the Green and Ampt equation provide an interflow-to-rainfall ratio of 0 for the permeable Wagram soil series (no interflow) compared to 0.46 for the less permeable Vaucluse soil series. These initial predictions will be compared to storm water hydrographs of interflow collected at the outflow point of each plot and refined using more detailed soil property measurements
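The abstract relies on the Green and Ampt equation without stating it. As a rough illustration, here is a sketch of the textbook Green-Ampt infiltration capacity and the time to ponding under steady rainfall. It is not the authors' interflow calculation, and every parameter value shown is a hypothetical placeholder rather than a property of the Vaucluse or Wagram soils.

```python
def infiltration_capacity(cum_infiltration_cm, k_sat, suction_cm, moisture_deficit):
    """Green-Ampt infiltration capacity f = K * (1 + psi * dtheta / F), in cm/h."""
    return k_sat * (1.0 + suction_cm * moisture_deficit / cum_infiltration_cm)


def time_to_ponding(rain_rate, k_sat, suction_cm, moisture_deficit):
    """Hours of steady rain (cm/h) before the soil can no longer absorb it all.

    Returns None when the rain rate does not exceed the saturated
    conductivity, because ponding never occurs in that case.
    """
    if rain_rate <= k_sat:
        return None
    return (k_sat * suction_cm * moisture_deficit) / (rain_rate * (rain_rate - k_sat))


# Hypothetical soil: K = 1 cm/h, wetting-front suction = 10 cm, moisture deficit = 0.3.
t_p = time_to_ponding(rain_rate=3.0, k_sat=1.0, suction_cm=10.0, moisture_deficit=0.3)
```

At the ponding time, the cumulative infiltration equals the rain rate multiplied by `t_p`, and the infiltration capacity at that depth equals the rain rate. That identity is a quick consistency check on the two functions.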
Adaptive Computing Systems and their Design Tools
While reconfigurable adaptive computing has many proven advantages over conventional processors, in practice, it is often limited to niche applications. This situation, which we aim to resolve with our research, is often linked to the lack of programming languages for adaptive computers that are familiar to software developers. We present a compile flow capable of translating general-purpose C programs to hybrid hardware/software applications for execution on an adaptive computer and give an overview of the required advances in compiler technology as well as in computer architecture and operating system design
Development of a dedicated ethanol ultra-low emission vehicle (ULEV) -- Phase 2 report
The objective of this 3.5-year project is to develop a commercially competitive vehicle powered by ethanol (or an ethanol blend) that can meet California's ultra-low emission vehicle (ULEV) standards and equivalent corporate average fuel economy (CAFE) energy efficiency for a light-duty passenger car application. The definition of commercially competitive is independent of fuel cost, but does include technical requirements for competitive power, performance, refueling times, vehicle range, driveability, fuel handling safety, and overall emissions performance. This report summarizes the second phase of this project, which lasted 12 months. This report documents two baseline vehicles, the engine modifications made to the original equipment manufacturer (OEM) engines, advanced aftertreatment testing, and various fuel tests to evaluate the flammability, lubricity, and material compatibility of the ethanol fuel blends