    Model Assisted Variable Clustering: Minimax-optimal Recovery and Algorithms

    Model-based clustering defines population-level clusters relative to a model that embeds notions of similarity. Algorithms tailored to such models yield estimated clusters with a clear statistical interpretation. We take this view here and introduce the class of G-block covariance models as a background model for variable clustering. In such models, two variables in a cluster are deemed similar if they have similar associations with all other variables. This can arise, for instance, when groups of variables are noise-corrupted versions of the same latent factor. We quantify the difficulty of clustering data generated from a G-block covariance model in terms of cluster proximity, measured with respect to two related, but different, cluster separation metrics. We derive minimax cluster separation thresholds, which are the metric values below which no algorithm can recover the model-defined clusters exactly, and show that they are different for the two metrics. We therefore develop two algorithms, COD and PECOK, tailored to G-block covariance models, and study their minimax optimality with respect to each metric. Of independent interest is the fact that the analysis of the PECOK algorithm, which is based on a corrected convex relaxation of the popular K-means algorithm, provides the first statistical analysis of such algorithms for variable clustering. Additionally, we contrast our methods with another popular clustering method, spectral clustering, specialized to variable clustering, and show that ensuring exact cluster recovery via this method requires clusters to have a higher separation, relative to the minimax threshold. Extensive simulation studies, as well as our data analyses, confirm the applicability of our approach. Comment: Main text: 38 pages; supplementary information: 37 pages
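A minimal sketch of the G-block idea, under stated assumptions: we assume the covariance decomposes as Σ = A C Aᵀ + Γ, with A a 0/1 membership matrix, C a K × K matrix of cluster-level covariances and Γ diagonal, which is what arises when each variable is its cluster's latent factor plus independent noise. Two variables in the same cluster then have identical covariances with every other variable, so the hypothetical dissimilarity `max_diff_dissimilarity` (the largest gap between their covariances with the remaining variables) is zero within a cluster. The greedy `threshold_clusters` routine is an illustrative stand-in, not the COD or PECOK algorithm, and it runs on the population matrix rather than on an estimate from data.

```python
def g_block_covariance(labels, C, gamma):
    """Population covariance Sigma = A C A^T + Gamma for a hard partition.

    labels[a] is the (hypothetical) cluster of variable a, C is the K x K
    cluster-level covariance, gamma[a] is the noise variance of variable a."""
    p = len(labels)
    return [[C[labels[a]][labels[b]] + (gamma[a] if a == b else 0.0)
             for b in range(p)] for a in range(p)]


def max_diff_dissimilarity(S, a, b):
    """max over c != a, b of |S[a][c] - S[b][c]|.

    The diagonal is excluded because it carries the variable-specific noise."""
    return max((abs(S[a][c] - S[b][c]) for c in range(len(S)) if c not in (a, b)),
               default=0.0)


def threshold_clusters(S, tau):
    """Greedy grouping: each unassigned variable seeds a new cluster and absorbs
    every later unassigned variable within dissimilarity tau of the seed."""
    p = len(S)
    assigned = [False] * p
    clusters = []
    for a in range(p):
        if assigned[a]:
            continue
        group = [a]
        assigned[a] = True
        for b in range(a + 1, p):
            if not assigned[b] and max_diff_dissimilarity(S, a, b) <= tau:
                group.append(b)
                assigned[b] = True
        clusters.append(group)
    return clusters


# Toy example: three clusters of two variables each (illustrative values only).
labels = [0, 0, 1, 1, 2, 2]
C = [[1.0, 0.3, 0.0],
     [0.3, 1.0, 0.5],
     [0.0, 0.5, 1.0]]
gamma = [0.5] * 6
S = g_block_covariance(labels, C, gamma)
print(threshold_clusters(S, tau=0.1))  # -> [[0, 1], [2, 3], [4, 5]]
```

Exact recovery here relies on the rows of C being distinct; when two rows of C are close, the between-cluster dissimilarity shrinks, which is the kind of separation the minimax thresholds quantify.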

    The Belgian repository of fundamental atomic data and stellar spectra (BRASS). I. Cross-matching atomic databases of astrophysical interest

    Fundamental atomic parameters, such as oscillator strengths, play a key role in modelling and understanding the chemical composition of stars in the universe. Despite the significant work underway to produce these parameters for many astrophysically important ions, uncertainties in these parameters remain large and can propagate throughout the entire field of astronomy. The Belgian repository of fundamental atomic data and stellar spectra (BRASS) aims to provide the largest systematic and homogeneous quality assessment of atomic data to date in terms of wavelength, atomic and stellar parameter coverage. In preparation, we first compiled multiple literature occurrences of many individual atomic transitions from several atomic databases of astrophysical interest, and assessed their agreement. Several atomic repositories were searched, and their data were retrieved and formatted consistently. Data entries from all repositories were cross-matched against our initial BRASS atomic line list to find multiple occurrences of the same transition. Where possible, we used a non-parametric cross-match that depends only on electronic configurations and total angular momentum values. We also used this non-parametric cross-match to check for duplicate entries of the same physical transition within each retrieved repository. We report the cross-matched transitions for each repository and compare their fundamental atomic parameters. We find differences in log(gf) values of up to 2 dex or more. We also find and report that ~2% of our line list and of the Vienna Atomic Line Database retrievals consist of duplicate transitions. Finally, we provide a number of examples of atomic spectral lines with different log(gf) values and discuss the impact of these uncertain log(gf) values on quantitative spectroscopy. All cross-matched atomic data and duplicate transitions are available to download at brass.sdf.org. Comment: 18 pages, 12 figures, 9 tables. Accepted for publication in A&
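The non-parametric cross-match can be sketched as keying each transition only on quantities that do not depend on measured or computed parameters. A minimal sketch, assuming each entry is a dict with hypothetical fields `species`, `lower_conf`, `lower_J`, `upper_conf`, `upper_J` and `loggf` (not the BRASS schema). It also assumes the configuration strings are already normalized to one notation and include enough (e.g. term) information to identify a level uniquely; otherwise distinct levels sharing a configuration and J would collide. The log(gf) values below are arbitrary placeholders, not data from any repository.

```python
from collections import defaultdict


def transition_key(t):
    """Match key built only from species, electronic configurations and J values.

    Wavelengths, energies and log(gf) are deliberately left out of the key."""
    return (t["species"], t["lower_conf"], t["lower_J"], t["upper_conf"], t["upper_J"])


def find_duplicates(entries):
    """Group entries of a single repository that describe the same physical transition."""
    groups = defaultdict(list)
    for e in entries:
        groups[transition_key(e)].append(e)
    return {k: v for k, v in groups.items() if len(v) > 1}


def loggf_differences(reference, repository):
    """For each reference transition also found in the repository, the largest |delta log(gf)|."""
    repo = defaultdict(list)
    for e in repository:
        repo[transition_key(e)].append(e["loggf"])
    diffs = {}
    for r in reference:
        k = transition_key(r)
        if k in repo:
            diffs[k] = max(abs(r["loggf"] - v) for v in repo[k])
    return diffs


# Placeholder entries: the level labels are illustrative and the log(gf) values are arbitrary.
line = {"species": "Fe I", "lower_conf": "3d6 4s2 a5D", "lower_J": 4,
        "upper_conf": "3d6 4s 4p z5F*", "upper_J": 5}
line_list = [dict(line, loggf=-1.0)]
repository = [dict(line, loggf=-1.5), dict(line, loggf=-0.75)]

print(loggf_differences(line_list, repository))
print(len(find_duplicates(repository)), "duplicated transition(s) in the repository")
```

Keeping the key free of wavelengths is what makes the match insensitive to the very parameters being compared; the price is that the quality of the match rests entirely on consistent configuration labelling.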

    An efficient domain decomposition method with cross-point treatment for Helmholtz problems

    The parallel finite-element solution of large-scale time-harmonic scattering problems is addressed with a non-overlapping domain decomposition method (DDM). It is well known that the efficiency of this method strongly depends on the transmission condition enforced on the interfaces between the subdomains. Local conditions based on high-order absorbing boundary conditions (HABCs) are well suited for configurations without cross points (where more than two subdomains meet). In this work, we extend this approach to efficiently deal with cross points. Two-dimensional finite-element results are presented.
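To make the notion of a cross point concrete, here is a minimal sketch for an assumed setting: a rectangle split into an nx × ny lattice of rectangular subdomains. Each vertex of the partition is classified by how many subdomains touch it. An interior cross point touches more than two subdomains; we also flag, as a boundary cross point, any vertex on the exterior boundary shared by at least two subdomains. The function names are hypothetical, and the partitions used in the paper's 2D results may differ.

```python
from itertools import product


def subdomains_at(i, j, nx, ny):
    """Lattice subdomains (a, b) whose closure contains partition vertex (i, j)."""
    cols = [a for a in (i - 1, i) if 0 <= a < nx]
    rows = [b for b in (j - 1, j) if 0 <= b < ny]
    return [(a, b) for a in cols for b in rows]


def classify_cross_points(nx, ny):
    """Return (interior, boundary) cross points of an nx x ny lattice partition."""
    interior, boundary = [], []
    for i, j in product(range(nx + 1), range(ny + 1)):
        owners = subdomains_at(i, j, nx, ny)
        on_exterior = i in (0, nx) or j in (0, ny)
        if on_exterior and len(owners) >= 2:
            boundary.append((i, j))
        elif not on_exterior and len(owners) > 2:
            interior.append((i, j))
    return interior, boundary


interior, boundary = classify_cross_points(3, 2)
print(len(interior), "interior and", len(boundary), "boundary cross points")
```

For a 3 × 2 lattice this gives 2 interior and 6 boundary cross points, matching the counts (nx − 1)(ny − 1) and 2(nx − 1) + 2(ny − 1); every interior cross point of a lattice is shared by four subdomains, which is exactly where plain interface-by-interface transmission conditions lose information.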

    Comparison of lidar-derived PM10 with regional modeling and ground-based observations in the frame of MEGAPOLI experiment

    An innovative approach using mobile lidar measurements was implemented to test the performance of chemistry-transport models in simulating particulate matter mass concentrations (PM10). A ground-based mobile lidar (GBML) was deployed around Paris onboard a van during the MEGAPOLI (Megacities: Emissions, urban, regional and Global Atmospheric POLlution and climate effects, and Integrated tools for assessment and mitigation) summer experiment in July 2009. The measurements performed with this Rayleigh-Mie lidar are converted into PM10 profiles using optical-to-mass relationships previously established from in situ measurements performed around Paris for urban and peri-urban aerosols. The method is described here and applied to the 10 measurement days (MD). The MD of 1, 15, 16 and 26 July 2009, corresponding to different levels of pollution and atmospheric conditions, are analyzed in more detail. Lidar-derived PM10 are compared with simulations from the POLYPHEMUS and CHIMERE chemistry-transport models (CTMs) and with ground-based observations from the AIRPARIF network. GBML-derived PM10 and AIRPARIF in situ measurements are in good agreement, with a mean root mean square error (RMSE) and mean absolute percentage error (MAPE) of 7.2 μg m−3 (26.0%) and 8.8 μg m−3 (25.2%) for relationships assuming peri-urban and urban-type particles, respectively. Comparisons between CTMs and lidar at ~200 m height show that CTMs tend to underestimate wet PM10 concentrations: the mean wet PM10 over the 10 MD is 22.4 μg m−3 for lidar with the peri-urban relationship, versus 20.0 and 17.5 μg m−3 for the POLYPHEMUS and CHIMERE models, respectively. This leads to an RMSE (and MAPE) of 6.4 μg m−3 (29.6%) and 6.4 μg m−3 (27.6%) for POLYPHEMUS and CHIMERE, respectively.
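The RMSE and MAPE scores quoted above are standard error metrics. A minimal sketch, assuming paired samples and taking the first series (for example lidar-derived PM10) as the reference in the MAPE denominator; the abstract does not spell out its pairing or how the per-day scores are averaged, so that step is not reproduced, and the toy numbers are illustrative only.

```python
import math


def rmse(reference, estimate):
    """Root mean square error between paired samples."""
    if len(reference) != len(estimate):
        raise ValueError("series must be paired")
    n = len(reference)
    return math.sqrt(sum((r - e) ** 2 for r, e in zip(reference, estimate)) / n)


def mape(reference, estimate):
    """Mean absolute percentage error (%), normalized by the (nonzero) reference values."""
    if len(reference) != len(estimate):
        raise ValueError("series must be paired")
    n = len(reference)
    return 100.0 * sum(abs(r - e) / abs(r) for r, e in zip(reference, estimate)) / n


# Toy PM10 values in ug m^-3, not data from the campaign.
lidar = [10.0, 20.0]
model = [12.0, 18.0]
print(rmse(lidar, model), mape(lidar, model))
```

Because MAPE divides by the reference, swapping which series plays that role changes the score, whereas RMSE is symmetric in its two arguments.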
Wet integrated PM10 (computed between the ground and 1 km above ground level) from lidar, POLYPHEMUS and CHIMERE results were compared and show similar results, with an RMSE (and MAPE) of 6.3 mg m−2 (30.1%) and 5.2 mg m−2 (22.3%) for POLYPHEMUS and CHIMERE, respectively, when compared with lidar-derived PM10 using the peri-urban relationship. These values are of the same order of magnitude as those found in previous comparison studies. The discrepancies observed between modeled and measured PM10 can be explained by difficulties in accurately modeling the background conditions, the positions and strengths of the plume, the vertical turbulent diffusion (as well as the limited vertical model resolution) and chemical processes such as the formation of secondary aerosols. The major advantage of using vertically resolved lidar observations in addition to surface concentrations is that they overcome the limited spatial representativity of surface measurements. Even for a well-mixed boundary layer, vertical mixing is not complete, especially in the surface layer and near source regions. Also, a poor estimate of the mixing layer height would introduce errors in simulated surface concentrations, which can be detected using lidar measurements. In addition, horizontal spatial representativity is larger for altitude-integrated measurements than for surface measurements, because horizontal inhomogeneities occurring near surface sources are dampened.
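Altitude-integrated PM10 follows from integrating a concentration profile over height; since μg m−3 × m = μg m−2, dividing by 1000 gives the mg m−2 units quoted above. A minimal sketch using the trapezoidal rule, assuming the heights are sorted, start at the ground and include a sample exactly at the 1 km top. The abstract does not state its quadrature, the function name is hypothetical, and the profiles in the example are toys.

```python
def integrated_pm10_mg_m2(heights_m, pm10_ug_m3, top_m=1000.0):
    """Trapezoidal integral of a PM10 profile from the lowest sample up to top_m.

    Assumes heights_m is sorted ascending and contains a sample exactly at top_m;
    samples above top_m are ignored."""
    pts = [(z, c) for z, c in zip(heights_m, pm10_ug_m3) if z <= top_m]
    total_ug_m2 = 0.0
    for (z0, c0), (z1, c1) in zip(pts, pts[1:]):
        total_ug_m2 += 0.5 * (c0 + c1) * (z1 - z0)  # ug m^-3 * m = ug m^-2
    return total_ug_m2 / 1000.0  # ug m^-2 -> mg m^-2


# A uniform 20 ug m^-3 layer over the lowest kilometre integrates to 20 mg m^-2.
print(integrated_pm10_mg_m2([0.0, 500.0, 1000.0], [20.0, 20.0, 20.0]))
```

Integrating over the column averages out near-surface inhomogeneities, which is the representativity argument made in the abstract.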

    A non-overlapping domain decomposition method with high-order transmission conditions and cross-point treatment for Helmholtz problems

    A non-overlapping domain decomposition method (DDM) is proposed for the parallel finite-element solution of large-scale time-harmonic wave problems. It is well known that the convergence rate of this kind of method strongly depends on the transmission condition enforced on the interfaces between the subdomains. Local conditions based on high-order absorbing boundary conditions (HABCs) have proved to be well suited, as a good compromise between basic impedance conditions, which lead to suboptimal convergence, and conditions based on the exact Dirichlet-to-Neumann (DtN) map related to the complement of the subdomain, which are too expensive to compute. However, a direct application of this approach to configurations with interior cross-points (where more than two subdomains meet) and boundary cross-points (points that belong to both the exterior boundary and at least two subdomains) is suboptimal and, in some cases, can lead to incorrect results. In this work, we extend a non-overlapping DDM with HABC-based transmission conditions to efficiently deal with cross-points for lattice-type partitioning. We address the cross-point treatment when the HABC operator is used in the transmission condition, in the exterior boundary condition, or both. The proposed cross-point treatment relies on corner conditions developed for Padé-type HABCs. Two-dimensional numerical results with a nodal finite-element discretization are presented to validate the approach, including convergence studies with respect to the frequency, the mesh size and the number of subdomains. These results demonstrate the efficiency of the cross-point treatment for settings with regular partitions and homogeneous media. Numerical experiments with distorted partitions and smoothly varying heterogeneous media show the robustness of this treatment.
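Padé-type HABCs replace the square root in the exact DtN symbol by a rational function, which can then be implemented locally with auxiliary unknowns. A minimal sketch of the classical real-coefficient Padé approximation of √(1 + z), where z stands in (by assumption) for a scaled tangential operator. Practical Padé-type HABCs usually also rotate the branch cut so that evanescent modes are handled, and the corner conditions used at cross-points are not shown; both are omitted here.

```python
import math


def pade_sqrt(z, n):
    """Order-n real Pade approximation of sqrt(1 + z), valid for real z > -1:

        1 + sum_{j=1..n} a_j z / (1 + b_j z),
        a_j = 2/(2n+1) sin^2(j pi/(2n+1)),  b_j = cos^2(j pi/(2n+1)).
    """
    m = 2 * n + 1
    total = 1.0
    for j in range(1, n + 1):
        theta = j * math.pi / m
        a_j = 2.0 / m * math.sin(theta) ** 2
        b_j = math.cos(theta) ** 2
        total += a_j * z / (1.0 + b_j * z)
    return total


# Accuracy improves quickly with the number of auxiliary terms n.
for n in (1, 2, 4, 8):
    print(n, abs(pade_sqrt(0.5, n) - math.sqrt(1.5)))
```

For n = 1 the formula reduces to (4 + 3z)/(4 + z), the [1/1] Padé approximant; each additional term corresponds to one more auxiliary field on the interface, which is the cost/accuracy trade-off between impedance and exact DtN conditions described in the abstract.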

    Simultaneous observations of lower tropospheric continental aerosols with a ground-based, an airborne, and the spaceborne CALIOP lidar system

    We present an original experiment with multiple lidar systems operated simultaneously to study the capability of the Cloud-Aerosol LIdar with Orthogonal Polarization (CALIOP), on board the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO), to infer aerosol optical properties in the lower troposphere over a midlatitude continental site where the aerosol load is low to moderate. The experiment took place from 20 June to 10 July 2007 in southern France. The results are based on three case studies with measurements coincident with CALIOP observations: the first illustrates a large-scale pollution event with an aerosol optical thickness at 532 nm (τa532) of ∼0.25, and the other two are devoted to background conditions due to aerosol scavenging by storms, with τa532 < 0.1. Our experimental approach involved ground-based and airborne lidar systems as well as Sun photometer measurements when observing conditions were favorable. Passive spaceborne instruments, namely the Spinning Enhanced Visible and Infrared Imager (SEVIRI) and the Moderate Resolution Imaging Spectroradiometer (MODIS), are used to characterize the large-scale aerosol conditions. We show that complex topography makes CALIOP aerosol analysis in the planetary boundary layer more difficult when τa532 is lower than 0.1, because too few representative profiles are available to build a mean CALIOP profile with a good signal-to-noise ratio. In comparison, the aerosol optical properties inferred from CALIOP and those deduced from the other active and passive remote sensing observations in the pollution plume are in reasonable agreement. CALIOP Level-2 aerosol products are consistent with our retrievals.
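The signal-to-noise argument can be made concrete under simplifying assumptions (the same underlying signal in every profile, independent zero-mean noise): averaging N profiles multiplies the SNR by √N, so a weak low-τ signal needs disproportionately many representative profiles. A minimal sketch with hypothetical function names; the numbers are illustrative, not CALIOP specifications, and real CALIOP averaging also involves horizontal and vertical resolution choices not modeled here.

```python
import math


def averaged_snr(snr_single, n_profiles):
    """SNR of the mean of n independent profiles with equal signal and uncorrelated noise."""
    return snr_single * math.sqrt(n_profiles)


def profiles_needed(snr_single, snr_target):
    """Smallest n such that averaged_snr(snr_single, n) >= snr_target."""
    ratio = snr_target / snr_single
    # Small tolerance so exact squares such as 25.000000000000004 are not rounded up.
    return max(1, math.ceil(ratio * ratio - 1e-9))


print(profiles_needed(2.0, 10.0), profiles_needed(1.0, 10.0))
```

Raising the SNR from 2 to 10 takes 25 profiles, and halving the single-profile SNR quadruples that count to 100, which is why scarce representative profiles over complex terrain hurt most when the aerosol signal is weak.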

    Building of the Amsterdam-Saint Paul plateau: A 10 Myr history of a ridge-hot spot interaction and variations in the strength of the hot spot source

    The Amsterdam-Saint Paul plateau results from a 10 Myr interaction between the South East Indian Ridge and the Amsterdam-Saint Paul hot spot. During this period, the structure of the plateau changed as a consequence of changes in both the ridge-hot spot distance and the strength of the hot spot source. The joint analysis of gravity-derived crustal thickness and bathymetry reveals that the plateau started to form at ~10 Ma through an increase in crustal production at the ridge axis, due to the nearby hot spot. This phase, which lasted 3-4 Myr, corresponds to a period of a strong hot spot source, possibly due to a high temperature or material flux, and a decreasing ridge-hot spot distance. A second phase, between ~6 and ~3 Ma, corresponds to a decrease in ridge crustal production. During this period, the hot spot center was close to the ridge axis, and this reduced magmatic activity suggests a weak hot spot source. At ~3 Ma, the ridge was located approximately above the hot spot center. An increase in hot spot source strength then resulted in the building of the shallower part of the plateau. The variations in melt production at the ridge axis through time resulted not only in variations in crustal thickness but also in changes in ridge morphology. The two periods of increased melt production correspond to a smooth ridge morphology, characterized by axial highs, while the intermediate period corresponds to a rougher, rift-valley morphology. These variations reveal changes in axial thermal structure due to higher melt production rates and temperatures.
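The abstract does not detail how gravity is converted to crustal thickness. As a hedged illustration only, the simplest textbook link is the infinite Bouguer slab: a thickness change Δh of material with density contrast Δρ changes gravity by 2πGΔρΔh. Practical inversions typically account for the finite wavelength of the signal and for thermal effects, so this sketch is not the paper's method, and the 600 kg m−3 crust-mantle contrast is an assumed value.

```python
import math

G = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2
MGAL = 1.0e-5   # 1 mGal expressed in m s^-2


def crustal_thickening_m(mantle_bouguer_anomaly_mgal, density_contrast=600.0):
    """Infinite-slab estimate of crustal thickening (m) from a mantle Bouguer anomaly.

    Thicker low-density crust replaces denser mantle and lowers gravity, hence
    dg = -2*pi*G*drho*dh. density_contrast (kg m^-3) is an assumed crust-mantle
    contrast, not a value taken from the study."""
    return -mantle_bouguer_anomaly_mgal * MGAL / (2.0 * math.pi * G * density_contrast)


print(round(crustal_thickening_m(-10.0)))
```

With these assumptions, a −10 mGal anomaly corresponds to roughly 400 m of extra crust; a smaller assumed density contrast would imply a proportionally larger thickness change.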