Color-complexity enabled exhaustive color-dots identification and spatial patterns testing in images
Targeted color-dots with varying shapes and sizes in images are first exhaustively identified, and their multiscale 2D geometric patterns are then extracted for testing spatial uniformity in a progressive fashion. Based on color theory in physics, we develop a new color-identification algorithm that exploits the highly associative relations among the three color coordinates, RGB or HSV. Such strong associations imply low color-complexity of a color image and make exhaustive identification of targeted color-dots of all shapes and sizes feasible. Across heterogeneous shaded regions and lighting conditions, our algorithm is shown to be robust, practical, and efficient compared with the popular Contour and OpenCV approaches. From all identified color pixels, we form color-dots as individual connected components with their own shapes and sizes. We then construct minimum spanning trees (MSTs) as the spatial geometries of dot collectives at various size scales. For a given size scale, the distribution of distances between immediate neighbors in the observed MST is extracted, along with the corresponding distributions from many MSTs simulated under the spatial-uniformity assumption. We devise a new algorithm for testing 2D spatial uniformity based on a hierarchical clustering tree built upon all MSTs involved. Our developments are illustrated on images obtained by mimicking chemical spraying via drone in Precision Agriculture.
Comment: 21 pages, 21 figures
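To make the MST-based test concrete, here is a minimal Python sketch under simplifying assumptions: it replaces the paper's hierarchical-clustering comparison of observed and simulated MSTs with a plain Monte Carlo test on the mean MST edge length, and all function names and parameters are illustrative rather than the authors' implementation.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_edge_lengths(points):
    """Edge lengths of the Euclidean minimum spanning tree over 2D points."""
    mst = minimum_spanning_tree(squareform(pdist(points)))
    return mst.data  # the n-1 edge weights

def uniformity_pvalue(points, n_sim=500, seed=0):
    """Monte Carlo test of 2D spatial uniformity: compare the observed mean
    MST edge length with its null distribution under uniform placement on
    the data's bounding box."""
    rng = np.random.default_rng(seed)
    lo, hi = points.min(axis=0), points.max(axis=0)
    obs = mst_edge_lengths(points).mean()
    null = np.array([
        mst_edge_lengths(rng.uniform(lo, hi, size=points.shape)).mean()
        for _ in range(n_sim)
    ])
    # Two-sided Monte Carlo p-value with the usual +1 correction.
    extreme = np.abs(null - null.mean()) >= abs(obs - null.mean())
    return (1 + extreme.sum()) / (n_sim + 1)

# Example: 200 dot centroids; clustered (non-uniform) dot patterns would
# shorten the MST edges and drive the p-value down.
pts = np.random.default_rng(1).uniform(0, 100, size=(200, 2))
print(uniformity_pvalue(pts))
```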
Robust estimation of bacterial cell count from optical density
Optical density (OD) is widely used to estimate the density of cells in liquid culture, but it cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to an actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also assesses the instrument's effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence-per-cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data.
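As a rough illustration of the recommended protocol, the following sketch fits a linear OD-to-count calibration from a hypothetical two-fold serial dilution of microspheres; the stock count, saturation model, and linear-range thresholds are assumptions for illustration, not the study's actual protocol or data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-fold serial dilution of a microsphere stock.
stock_particles = 3.0e8
true_counts = stock_particles * 2.0 ** -np.arange(12)

# Simulated blank-corrected OD readings: linear at low density, with mild
# detector saturation at the high end plus multiplicative measurement noise.
od = 2.0e-9 * true_counts
od = od / (1 + od / 2.5)
od *= rng.normal(1.0, 0.02, size=od.shape)

# Restrict to an assumed effective linear range of the instrument, then fit
# OD = slope * count through the origin by least squares over that range.
linear = (od > 0.01) & (od < 0.2)
slope = np.sum(od[linear] * true_counts[linear]) / np.sum(true_counts[linear] ** 2)

def od_to_count(od_reading):
    """Convert a blank-corrected OD reading to an estimated particle count,
    valid only within the calibrated linear range."""
    return od_reading / slope

print(f"estimated particles at OD 0.2: {od_to_count(0.2):.3g}")
```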
Analysis of Variability of Functionals of Recombinant Protein Production Trajectories Based on Limited Data
Making statistical inference on quantities defining various characteristics of a temporally measured biochemical process, and analyzing its variability across different experimental conditions, is a core challenge in various branches of science. This problem is particularly difficult when the amount of data that can be collected is limited in terms of both the number of replicates and the number of time points per process trajectory. We propose a method for analyzing the variability of smooth functionals of the growth or production trajectories associated with such processes across different experimental conditions. Our modeling approach is based on a spline representation of the mean trajectories. We also develop a bootstrap-based inference procedure for the parameters while accounting for possible multiple comparisons. This methodology is applied to study two types of quantities, the "time to harvest" and the "maximal productivity", in the context of an experiment on the production of recombinant proteins. We complement the findings with extensive numerical experiments comparing the effectiveness of different types of bootstrap procedures for various tests of hypotheses. These numerical experiments convincingly demonstrate that the proposed method yields reliable inference on complex characteristics of the processes even in a data-limited environment where more traditional methods for statistical inference are typically not reliable.
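The following minimal sketch conveys the spline-plus-bootstrap idea on a single hypothetical trajectory: a smoothing spline represents the mean trajectory, and a residual bootstrap gives a confidence interval for one smooth functional (here, the time of maximal production rate). The data, smoothing level, and choice of functional are assumptions, and the paper's multiple-comparison adjustments are omitted.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(42)

# One hypothetical production trajectory: 10 time points over 48 hours.
t = np.linspace(0, 48, 10)
y = 100 / (1 + np.exp(-(t - 24) / 5)) + rng.normal(0, 3, t.size)

def time_of_max_rate(ts, ys, smooth=50.0):
    """Fit a smoothing spline and return the time at which the fitted
    production rate (the spline's derivative) is maximal."""
    spl = UnivariateSpline(ts, ys, s=smooth)
    grid = np.linspace(ts[0], ts[-1], 1000)
    return grid[np.argmax(spl.derivative()(grid))]

# Residual bootstrap: resample residuals around the fitted mean trajectory
# and refit, collecting the functional's bootstrap distribution.
spl0 = UnivariateSpline(t, y, s=50.0)
resid = y - spl0(t)
boot = np.array([
    time_of_max_rate(t, spl0(t) + rng.choice(resid, size=t.size, replace=True))
    for _ in range(500)
])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"time of maximal rate: {time_of_max_rate(t, y):.1f} h, "
      f"95% bootstrap CI [{lo:.1f}, {hi:.1f}] h")
```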
Data-driven Computing and Analysis with Contrasting Statistical Developments in Real-world Applications
Real data generated by real-world complex systems generally embrace sophisticated deterministic and stochastic structures on multiple scales. Such structural complexity induces challenging learning problems and poses difficult data-analysis issues. Data from the diverse complex systems studied across scientific fields are often found to preserve pattern information in diverse ways, in part because of the constraints between the data's deterministic and stochastic structures. Data scientists must adapt to such constraints by adopting data-driven computing approaches when analyzing data from real-world complex systems; that is, to extract authentic information from data, it is essential to develop data-analysis methodologies according to the data's intrinsic characteristics. In this dissertation, we develop and propose data-driven adaptive computational methods and statistical frameworks tailored to specific data structures, including digital images, data on Alzheimer's Disease, and limited data from biochemical experiments. In a project evaluating the effectiveness of chemical spraying by an unmanned aerial vehicle (UAV), we prescribe a computational approach that uses color-identification algorithms and minimum spanning trees (MSTs) to analyze the spatial distribution of color dots of various sizes and colors in an image, achieving the goal of testing the evenness of mechanical spray via color-dot testing papers. In a project studying aging effects on a series of three of Van Gogh's Sunflowers in a vase, we develop a computational approach that restores the original color and vibrancy in a reverse-engineering fashion: the already faded, brownish-yellow backgrounds are computationally revived to their yellow-oriented tones. In a project analyzing time-to-event data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, we employ conditional entropy to unravel heterogeneity among subjects and evaluate potential factors affecting the diagnosis of Alzheimer's disease; compared with Cox's proportional hazards modeling, our data-driven results demonstrate better capability in identifying significant factors. In a contrasting fashion, we also study a statistical problem of modeling biochemical experiments with data that are limited in size and scope. Under such constraints, we propose a flexible methodology for analyzing the variability of smooth functionals of the growth or production trajectories associated with temporally measured biochemical processes across different experimental conditions. We demonstrate, through numerical experiments and real data analysis, the effectiveness of statistical inference on key parameters of interest and the flexibility to extend to correlated structures. We conclude that data-driven approaches are necessary when analyzing big data sets, while statistical modeling has its merits when data are limited.
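The abstract does not detail how the conditional-entropy analysis is computed; the sketch below shows only the generic idea of ranking categorical factors by how much each reduces the uncertainty of a diagnosis label, H(Y) - H(Y|X), on hypothetical data.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (bits) of a vector of category counts."""
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def conditional_entropy(x, y):
    """H(Y | X) for two categorical arrays, estimated from observed counts."""
    n = len(x)
    h = 0.0
    for xv in np.unique(x):
        mask = x == xv
        _, cnt = np.unique(y[mask], return_counts=True)
        h += mask.sum() / n * entropy(cnt)
    return h

# Hypothetical example: rank two candidate factors by information gain
# about a binary diagnosis label.
rng = np.random.default_rng(0)
diagnosis = rng.integers(0, 2, 1000)
factor_a = diagnosis ^ (rng.random(1000) < 0.1)   # strongly related to Y
factor_b = rng.integers(0, 3, 1000)               # unrelated noise

_, base = np.unique(diagnosis, return_counts=True)
for name, f in [("factor_a", factor_a), ("factor_b", factor_b)]:
    gain = entropy(base) - conditional_entropy(f, diagnosis)
    print(f"{name}: information gain {gain:.3f} bits")
```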
Inference on the dynamics of COVID-19 in the United States.
The evolution of the COVID-19 pandemic is described through a time-dependent stochastic dynamic model in discrete time. The proposed multi-compartment model is expressed through a system of difference equations. Information on social distancing measures and diagnostic testing rates is incorporated to characterize the dynamics of the various compartments of the model. In contrast with conventional epidemiological models, the proposed model involves interpretable temporally static and dynamic epidemiological rate parameters. A model-fitting strategy built upon nonparametric smoothing is employed for estimating the time-varying parameters while profiling over the time-independent parameters. Confidence bands for the parameters are obtained through a residual bootstrap procedure. A key feature of the methodology is its ability to estimate latent unobservable compartments such as the number of asymptomatic but infected individuals, who are known to be the key vectors of COVID-19 spread. The nature of the disease dynamics is further quantified by relevant epidemiological markers that make use of the estimates of the latent compartments. The methodology is applied to understand the true extent and dynamics of the pandemic in various states within the United States (US).
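A schematic sketch of such a discrete-time multi-compartment model appears below: SEIR-style difference equations with a transmission rate that drops after an assumed intervention day. The compartments, rate values, and timing are illustrative; the paper's latent asymptomatic compartment and its estimation machinery (nonparametric smoothing, profiling, residual bootstrap) are not reproduced here.

```python
import numpy as np

def step(S, E, I, R, beta_t, sigma=0.2, gamma=0.1):
    """One day of SEIR-style difference equations (state as fractions of a
    closed population) with a time-varying transmission rate beta_t."""
    new_exposed = beta_t * S * I      # susceptibles becoming exposed
    new_infectious = sigma * E        # exposed becoming infectious
    new_removed = gamma * I           # infectious recovering or removed
    return (S - new_exposed,
            E + new_exposed - new_infectious,
            I + new_infectious - new_removed,
            R + new_removed)

# Transmission drops after an assumed social-distancing measure on day 30.
T = 120
beta = np.where(np.arange(T) < 30, 0.4, 0.15)

state = (0.999, 0.0, 0.001, 0.0)      # initial S, E, I, R
path = [state]
for day in range(T):
    state = step(*state, beta[day])
    path.append(state)

S, E, I, R = np.array(path).T
print(f"peak infectious fraction: {I.max():.4f} on day {int(I.argmax())}")
```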