41 research outputs found

    Model selection for partial least squares calibration and implications for analysis of atmospheric organic aerosol samples with mid-infrared spectroscopy

    In developing partial least squares calibration models, selecting the number of latent variables used for their construction to minimize both model bias and model variance remains a challenge. Several metrics exist for incorporating these trade-offs, but the cost of model parsimony and the potential for underfitting on achievable prediction errors are difficult to anticipate. We propose a metric that penalizes growing model variance against decreasing bias as additional latent variables are added. The magnitude of the penalty is scaled by a user-defined parameter that is formulated to provide a constraint on the fractional increase in root mean square error of cross-validation (RMSECV) when selecting a parsimonious model over the conventional minimum RMSECV solution. We evaluate this approach for quantification of four organic functional groups using 238 laboratory standards and 750 complex atmospheric organic aerosol mixtures with mid-infrared spectroscopy. Parametric variation of this penalty demonstrates that increase in prediction errors due to underfitting is bounded by the magnitude of the penalty for samples similar to laboratory standards used for model training and validation. Imposing an ensemble of penalties corresponding to a 0-30% allowable increase in RMSECV through sum of ranking differences leads to the selection of a model that increases the actual RMSECV up to 20% for laboratory standards but achieves an 85% reduction in the mean error in predicted concentrations for environmental mixtures. Partial least squares models developed with laboratory mixtures can provide useful predictions in complex environmental samples, but may benefit from protection against overfitting. (C) 2015 The Authors. Journal of Chemometrics published by John Wiley & Sons Ltd
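The constraint described in this abstract, an allowable fractional increase in RMSECV in exchange for fewer latent variables, can be illustrated with a much simpler rule than the penalized metric the authors propose. The sketch below is not their metric: it picks the smallest number of latent variables whose RMSECV stays within a tolerance `alpha` of the minimum. The function name, the `alpha` parameter, and the RMSECV values are all hypothetical.

```python
def select_parsimonious(rmsecv, alpha):
    """Return the smallest number of latent variables (1-based) whose RMSECV
    is at most (1 + alpha) times the minimum RMSECV.

    rmsecv[k - 1] is the cross-validation error of the model with k latent
    variables. alpha is the allowed fractional increase (0.10 means 10%).
    """
    limit = (1.0 + alpha) * min(rmsecv)
    for k, err in enumerate(rmsecv, start=1):
        if err <= limit:
            return k


# Made-up RMSECV curve for models with 1..7 latent variables.
rmsecv = [0.90, 0.55, 0.42, 0.40, 0.39, 0.395, 0.41]

# Conventional choice: the model at the minimum RMSECV.
conventional = rmsecv.index(min(rmsecv)) + 1

# Parsimonious choice: accept up to a 10% higher RMSECV.
parsimonious = select_parsimonious(rmsecv, alpha=0.10)

print(conventional, parsimonious)
```

With `alpha=0` the rule reduces to the conventional minimum-RMSECV choice; larger values trade a bounded increase in cross-validation error for a smaller model.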

    Estimation of Organic and Elemental Carbon using FT-IR absorbance spectra from PTFE Filters

    Organic carbon and elemental carbon are major components of atmospheric PM. Typically they are measured using destructive and relatively expensive methods (e.g., TOR). We aim to reduce the operating costs of large air quality monitoring networks using FT-IR spectra of ambient PTFE filters and PLS regression. Models calibrated with samples from 2011 give accurate predictions for samples collected in a different year (2013), both at the sites used in the calibration data set and at different sites

    Size-resolved particulate matter composition in Beijing during pollution and dust events

    Each spring, Beijing, China, experiences dust storms which cause high particulate matter concentrations. Beijing also has many anthropogenic sources of particulate matter including the large Capitol Steel Company. On the basis of measured size segregated, speciated particulate matter concentrations, and calculated back trajectories, three types of pollution events occurred in Beijing from 22 March to 1 April 2001: dust storms, urban pollution events, and an industrial pollution event. For each event type, the source of each measured element is determined to be soil or anthropogenic and profiles are created that characterize the particulate matter composition. Dust storms are associated with winds traveling from desert regions and high total suspended particle (TSP) and PM2.5 concentrations. Sixty-two percent of TSP is due to elements with oxides and 98% of that is from soil. Urban pollution events have smaller particulate concentrations but 49% of the TSP is from soil, indicating that dust is a major component of the particulate matter even when there is not an active dust storm. The industrial pollution event is characterized by winds from the southwest, the location of the Capitol Steel Company, and high particulate concentrations. PM2.5 mass and acidic ion concentrations are highest during the industrial pollution event as are Mn, Zn, As, Rb, Cd, Cs and Pb concentrations. These elements can be used as tracers for industrial pollution from the steel mill complex. The industrial pollution is potentially more detrimental to human health than dust storms due to higher PM2.5 concentrations and higher acidic ion and toxic particulate matter concentrations

    A quantitative method for clustering size distributions of elements

    A quantitative method was developed to group similarly shaped size distributions of particle-phase elements in order to ascertain sources of the elements. This method was developed and applied using data from two sites in Houston, TX: one site surrounded by refineries, chemical plants and vehicular and commercial shipping traffic, and the other site, 25 miles inland, surrounded by residences, light industrial facilities and vehicular traffic. Twenty-four hour size-segregated (0.056<D_p (particle diameter)<1.8 μm) particulate matter samples were collected during five days in August 2000. ICP-MS was used to quantify 32 elements with concentrations as low as a few picograms per cubic meter. Concentrations of particulate matter mass, sulfate and organic carbon at the two sites were often not significantly different from each other and had smooth unimodal size distributions indicating the regional nature of these species. Element concentrations varied widely across events and sites and often showed sharp peaks at particle diameters between 0.1 and 0.3 μm and in the ultrafine mode (D_p<0.1 μm), which suggested that the sources of these elements were local, high-temperature processes. Elements were quantitatively grouped together in each event using Ward's Method to cluster normalized size distributions of all elements. Cluster analysis provided groups of elements with similar size distributions that were attributed to sources such as automobile catalysts, fluid catalytic cracking unit catalysts, fuel oil burning, a coal-fired power plant, and high-temperature metal working. The clustered elements were generally attributed to different sources at the two sites during each sampling day indicating the diversity of local sources that impact heavy metals concentrations in the region
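The grouping step described here, Ward's method applied to normalized size distributions, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' procedure: each distribution is normalized to unit sum (the abstract does not say which normalization was used), the helper names are hypothetical, and the four "elements" and their size-bin values are invented.

```python
def normalize(dist):
    """Scale a size distribution so its bins sum to 1 (an assumed normalization)."""
    total = sum(dist)
    return [x / total for x in dist]


def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward's criterion.

    At each step, merge the pair of clusters whose union gives the smallest
    increase in within-cluster sum of squares:
        n_a * n_b / (n_a + n_b) * ||centroid_a - centroid_b||^2
    Returns a sorted list of sorted index lists.
    """
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                na, nb = len(ma), len(mb)
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                cost = na * nb / (na + nb) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        # The merged centroid is the size-weighted mean of the two centroids.
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        rest = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters = rest + [(ma + mb, centroid)]
    return sorted(sorted(members) for members, _ in clusters)


# Invented concentrations in four size bins (smallest to largest diameter).
names = ["element_A", "element_B", "element_C", "element_D"]
raw = [[8, 1, 1, 0], [70, 20, 10, 0], [0, 1, 2, 7], [1, 1, 2, 8]]

points = [normalize(r) for r in raw]
groups = ward_clusters(points, n_clusters=2)
print([[names[i] for i in g] for g in groups])
```

Normalizing first means elements are grouped by the shape of their distribution rather than by absolute concentration, which is why `element_B` clusters with `element_A` despite being ten times more abundant.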

    An automated baseline correction protocol for infrared spectra of atmospheric aerosols collected on polytetrafluoroethylene (Teflon) filters

    A growing body of research on statistical applications for characterization of atmospheric aerosol Fourier transform infrared (FT-IR) samples collected on polytetrafluoroethylene (PTFE) filters (e.g., Russell et al., 2011; Ruthenburg et al., 2014) and a rising interest in analyzing FT-IR samples collected by air quality monitoring networks call for an automated PTFE baseline correction solution. The existing polynomial technique (Takahama et al., 2013) is not scalable to a project with a large number of aerosol samples because it contains many parameters and requires expert intervention. Therefore, the question of how to develop an automated method for baseline correcting hundreds to thousands of ambient aerosol spectra given the variability in both environmental mixture composition and PTFE baselines remains. This study approaches the question by detailing the statistical protocol, which allows for the precise definition of analyte and background subregions, applies nonparametric smoothing splines to reproduce sample-specific PTFE variations, and integrates performance metrics from atmospheric aerosol and blank samples alike in the smoothing parameter selection. Referencing 794 atmospheric aerosol samples from seven Interagency Monitoring of PROtected Visual Environment (IMPROVE) sites collected during 2011, we start by identifying key FT-IR signal characteristics, such as non-negative absorbance or analyte segment transformation, to capture sample-specific transitions between background and analyte. While referring to qualitative properties of PTFE background, the goal of smoothing splines interpolation is to learn the baseline structure in the background region to predict the baseline structure in the analyte region. 
    We then validate the model by comparing smoothing splines baseline-corrected spectra with uncorrected and polynomial baseline (PB)-corrected equivalents via three statistical applications: (1) clustering analysis, (2) functional group quantification, and (3) thermal optical reflectance (TOR) organic carbon (OC) and elemental carbon (EC) predictions. The discrepancy rate for a four-cluster solution is 10 %. For all functional groups but carboxylic COH the discrepancy is [...] = 0.94 %, bias ≤ 0.01 μg m⁻³, and error ≤ 0.04 μg m⁻³) are on a par with those obtained from uncorrected and PB-corrected spectra. The proposed protocol leads to estimates that are visually and analytically similar to those generated by the polynomial method. More importantly, the automated solution allows us and future users to evaluate its analytical reproducibility while minimizing reducible user bias. We anticipate the protocol will enable FT-IR researchers and data analysts to quickly and reliably analyze a large amount of data and connect them to a variety of available statistical learning methods to be applied to analyte absorbances isolated in atmospheric aerosol samples

    Analysis of functional groups in atmospheric aerosols by infrared spectroscopy: ElnetPLS model for statistical selection of relevant absorption bands for OC predictions

    Organic carbon (OC) is a major component of atmospheric particulate matter (PM). Typically OC concentrations are measured using thermal methods such as thermal-optical reflectance (TOR) from samples collected on quartz filters. However, TOR measurements are destructive and expensive. We estimate TOR OC concentrations using Fourier transform infrared (FT-IR) spectra of ambient samples collected on Teflon filters. We have developed a sparse statistical calibration model (ElnetPLS), which excludes unnecessary wavenumbers from infrared spectra during the model building process, permitting the identification and evaluation of the most relevant vibrational modes of molecules in complex aerosol mixtures associated with reported TOR OC concentrations. The sparsest ElnetPLS model performs similarly to the full (2784-wavenumber) models while requiring only ten wavenumbers associated with carbonyl groups

    Atmospheric particulate matter characterization by Fourier transform infrared spectroscopy: a review of statistical calibration strategies for carbonaceous aerosol quantification in US measurement networks

    Atmospheric particulate matter (PM) is a complex mixture of many different substances and requires a suite of instruments for chemical characterization. Fourier transform infrared (FT-IR) spectroscopy is a technique that can provide quantification of multiple species provided that accurate calibration models can be constructed to interpret the acquired spectra. In this capacity, FT-IR spectroscopy has enjoyed a long history in monitoring gas-phase constituents in the atmosphere and in stack emissions. However, application to PM poses a different set of challenges as the condensed-phase spectrum has broad, overlapping absorption peaks and contributions of scattering to the mid-infrared spectrum. Past approaches have used laboratory standards to build calibration models for prediction of inorganic substances or organic functional groups and predict their concentration in atmospheric PM mixtures by extrapolation. In this work, we review recent studies pursuing an alternate strategy, which is to build statistical calibration models for mid-IR spectra of PM using collocated ambient measurements. Focusing on calibrations with organic carbon (OC) and elemental carbon (EC) reported from thermal-optical reflectance (TOR), this synthesis serves to consolidate our knowledge for extending FT-IR spectroscopy to provide TOR-equivalent OC and EC measurements to new PM samples when TOR measurements are not available. We summarize methods for model specification, calibration sample selection, and model evaluation for these substances at several sites in two US national monitoring networks: seven sites in the Interagency Monitoring of Protected Visual Environments (IMPROVE) network for the year 2011 and 10 sites in the Chemical Speciation Network (CSN) for the year 2013. We then describe application of the model in an operational context for the IMPROVE network for samples collected in 2013 at six of the same sites as in 2011 and 11 additional sites. 
In addition to extending the evaluation to samples from a different year and different sites, we describe strategies for error anticipation due to precision and biases from the calibration model to assess model applicability for new spectra a priori. We conclude with a discussion regarding past work and future strategies for recalibration. In addition to targeting numerical accuracy, we encourage model interpretation to facilitate understanding of the underlying structural composition related to operationally defined quantities of TOR OC and EC from the vibrational modes in mid-IR deemed most informative for calibration. The paper is structured such that the life cycle of a statistical calibration model for FT-IR spectroscopy can be envisioned for any substance with IR-active vibrational modes, and more generally for instruments requiring ambient calibrations

    The European response to the WHO call to eliminate cervical cancer as a public health problem

    The age-standardised incidence of cervical cancer in Europe varied widely by country in 2018 (between 3 and 25 per 100,000 women-years). Human papillomavirus (HPV) vaccine coverage is low in countries with the highest incidence, and screening performance is heterogeneous among European countries. A broad group of delegates of scientific professional societies and cancer organisations endorse the principles of the WHO call to eliminate cervical cancer as a public health problem, also in Europe. All European nations should, by 2030, reach at least 90% HPV vaccine coverage among girls by the age of 15 years, and also among boys if cost-effective; they should introduce organised, population-based, HPV-based screening and achieve 70% screening coverage in the target age group, also providing HPV testing on self-samples for nonscreened or underscreened women; and they should manage 90% of screen-positive women. To guide member states, a group of scientific professional societies and cancer organisations commit to assisting in the rollout of a series of concerted evidence-based actions. European health authorities are requested to mandate a group of experts to develop the third edition of European Guidelines for Quality Assurance of Cervical Cancer prevention based on integrated HPV vaccination and screening and to monitor the progress towards the elimination goal. The occurrence of the COVID-19 pandemic, which temporarily interrupted prevention activities, should not divert stakeholders from this ambition. In the immediate postepidemic phase, health professionals should focus on high-risk women and adhere to cost-effective policies including self-sampling.