Efficient learning in Approximate Bayesian Computation
Variable selection for model-based clustering using the integrated complete-data likelihood
Variable selection in cluster analysis is important yet challenging. It can
be achieved by regularization methods, which realize a trade-off between the
clustering accuracy and the number of selected variables by using a lasso-type
penalty. However, the calibration of the penalty term is open to
criticism. Model selection methods are an efficient alternative, yet they
require a difficult optimization of an information criterion which involves
combinatorial problems. First, most of these optimization algorithms are based
on a suboptimal procedure (e.g. a stepwise method). Second, the algorithms are
often computationally greedy because they require multiple calls to the EM
algorithm. Here we propose
to use a new information criterion based on the integrated complete-data
likelihood. It does not require any estimate and its maximization is simple and
computationally efficient. The original contribution of our approach is to
perform the model selection without requiring any parameter estimation. Then,
parameter inference is needed only for the unique selected model. This approach
is used for variable selection in a Gaussian mixture model under a conditional
independence assumption. The numerical experiments on simulated and benchmark
datasets show that the proposed method often outperforms two classical
approaches for variable selection.
Comment: submitted to Statistics and Computing
Bayesian model selection in logistic regression for the detection of adverse drug reactions
Motivation: Spontaneous adverse event reports have a high potential for
detecting adverse drug reactions. However, due to their dimension, exploring
such databases requires statistical methods. In this context,
disproportionality measures are used. However, by projecting the data onto
contingency tables, these methods become sensitive to the problem of
co-prescriptions and masking effects. Recently, logistic regressions have been
used with a Lasso type penalty to perform the detection of associations between
drugs and adverse events. However, the choice of the penalty value is open to
criticism while it strongly influences the results. Results: In this paper, we
propose to use a logistic regression whose sparsity is viewed as a model
selection challenge. Since the model space is huge, a Metropolis-Hastings
algorithm carries out the model selection by maximizing the BIC criterion.
Thus, we avoid the calibration of a penalty or threshold. In our application
to the French pharmacovigilance database, the proposed method is compared to
well-established approaches on a reference data set and obtains better rates
of positive and negative controls. However, many signals are not detected by
the proposed method. So, we conclude that this method should be used in
parallel to existing measures in pharmacovigilance.
Comment: 7 pages, 3 figures, submitted to Biometrical Journal
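The idea of letting a Metropolis-Hastings chain search the model space for the best BIC can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the simulated data, the single-variable flip proposal, the Newton-Raphson fit, and the names `logistic_bic` and `mh_model_search` are all assumptions made here. BIC is written in the "smaller is better" convention (-2 log-likelihood + k log n), so the chain targets exp(-BIC/2) and tracks the lowest value visited.

```python
import numpy as np

def logistic_bic(X, y, n_iter=25):
    """BIC (-2 loglik + k log n) of a logistic model fitted by Newton-Raphson."""
    n, k = X.shape
    beta = np.zeros(k)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Tiny ridge term keeps the Hessian invertible (illustrative safeguard).
        hess = X.T @ (X * w[:, None]) + 1e-8 * np.eye(k)
        beta += np.linalg.solve(hess, X.T @ (y - p))
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    return -2.0 * loglik + k * np.log(n)

def mh_model_search(X, y, n_steps=300, seed=0):
    """Random-walk Metropolis-Hastings over inclusion vectors, target exp(-BIC/2)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    intercept = np.ones((n, 1))

    def bic(gamma):
        return logistic_bic(np.hstack([intercept, X[:, gamma]]), y)

    gamma = np.zeros(p, dtype=bool)          # start from the intercept-only model
    current = bic(gamma)
    best_gamma, best_bic = gamma.copy(), current
    for _ in range(n_steps):
        proposal = gamma.copy()
        j = rng.integers(p)                  # symmetric proposal: flip one variable
        proposal[j] = not proposal[j]
        candidate = bic(proposal)
        if np.log(rng.uniform()) < -0.5 * (candidate - current):
            gamma, current = proposal, candidate
            if current < best_bic:
                best_gamma, best_bic = gamma.copy(), current
    return best_gamma, best_bic

# Hypothetical simulated data: only the first two of six covariates matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
prob = 1.0 / (1.0 + np.exp(-(1.5 * X[:, 0] - 1.5 * X[:, 1])))
y = (rng.uniform(size=500) < prob).astype(float)

best_gamma, best_bic = mh_model_search(X, y)
print("selected variables:", np.flatnonzero(best_gamma), "BIC:", best_bic)
```

No penalty or threshold has to be tuned in this sketch; the only choices are the chain length and the proposal, which is the property the abstract emphasizes.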
Efficient learning in ABC algorithms
Approximate Bayesian Computation has been successfully used in population
genetics to bypass the calculation of the likelihood. These methods provide
accurate estimates of the posterior distribution by comparing the observed
dataset to a sample of datasets simulated from the model. Although
parallelization is easily achieved, computation times for ensuring a suitable
approximation quality of the posterior distribution are still high. To
alleviate the computational burden, we propose an adaptive, sequential
algorithm that runs faster than other ABC algorithms but maintains accuracy of
the approximation. This proposal relies on the sequential Monte Carlo sampler
of Del Moral et al. (2012) but is calibrated to reduce the number of
simulations from the model. The paper concludes with numerical experiments on a
toy example and on a population genetic study of Apis mellifera, where our
algorithm was shown to be faster than traditional ABC schemes.
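For readers unfamiliar with ABC, the basic rejection scheme that such sequential algorithms aim to improve on can be sketched as below. This is a minimal sketch of plain rejection ABC, not the adaptive sequential sampler described in the abstract. The toy model (a unit-variance Gaussian with unknown mean), the uniform prior, the sample-mean summary statistic, the tolerance `eps`, and the function name `abc_rejection` are all assumptions chosen for illustration.

```python
import numpy as np

def abc_rejection(observed, n_draws=20000, eps=0.1, seed=0):
    """Plain rejection ABC for the mean of a unit-variance Gaussian (toy model)."""
    rng = np.random.default_rng(seed)
    n = observed.size
    s_obs = observed.mean()                          # summary of the observed data
    theta = rng.uniform(-5.0, 5.0, size=n_draws)     # draws from the assumed prior
    # Simulate one dataset per prior draw; no likelihood evaluation is needed.
    sims = rng.normal(loc=theta[:, None], scale=1.0, size=(n_draws, n))
    s_sim = sims.mean(axis=1)
    keep = np.abs(s_sim - s_obs) < eps               # accept draws whose summary is close
    return theta[keep]

rng = np.random.default_rng(1)
observed = rng.normal(loc=2.0, scale=1.0, size=50)   # hypothetical observed dataset
posterior = abc_rejection(observed)
print("accepted draws:", posterior.size)
print("approximate posterior mean:", posterior.mean())
```

Most prior draws are rejected, so the number of simulated datasets per accepted draw is large. That cost is what sequential schemes try to reduce by concentrating simulations in regions of high posterior mass.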
Investigation of the effect of annealing temperature on the optical properties of CdSe thin films
Introduction: CdSe is an important II–VI semiconducting material owing to its optical properties, such as a small direct band gap (1.7 eV) and a high refractive index. Investigating the optical properties of CdSe thin films is therefore important for improving the performance of solid-state devices such as solar cells (SC), thin-film transistors, light-emitting diodes (LED), electron-beam pumped lasers (EBPL) and electroluminescent devices. In the present work, CdSe thin films were deposited by thermal evaporation, and the results are analysed and presented.
Materials and Methods: CdSe thin films were deposited on glass microscope slides (75 × 25 × 1 mm) as substrates at room temperature using the PVD technique: CdSe blended powder is evaporated and condenses on the substrate. The film thickness (t = 100 ± 5 nm) was measured by Michelson interferometry. Transmission spectra from 200 to 1100 nm were scanned using a double-beam UV–Vis spectrophotometer (6850 UV/Vis. Spectrophotometer-JENWAY). The deposited films were then annealed at temperatures from 150 °C to 350 °C under vacuum to obtain a stable phase of the material and prevent surface oxidation.
Results and Discussion: The transmittance spectrum of the CdSe thin films was scanned over the wavelength range 200 to 1100 nm using a 6850 UV/Vis. Spectrophotometer-JENWAY at room temperature. The transmittance varies from 17.0% for the as-deposited film to 47.0% for the annealed films. It is clearly seen that there is a shift toward higher energy (blue shift) in the transmittance spectrum. As the annealing temperature increases, the transmittance edge shifts to longer wavelengths (i.e., after annealing the CdSe films show red shifts in their optical spectra). The band gap of the CdSe thin films was found to lie within the range 1.966-1.7536 eV. As the annealing temperature increases, Eg decreases continuously.
Conclusions: CdSe thin films were deposited using the physical vapor deposition (PVD) technique. The transmission of the as-deposited films is 17% and increases to 47% as the annealing temperature increases. In addition, the energy gap of the as-deposited CdSe film is 1.966 eV, and that of the annealed films decreases from 1.909 eV to 1.7536 eV as the annealing temperature increases. There is a strong red shift in the optical spectrum of the annealed CdSe films. There is a gradual shift of the annealed thin-film spectra compared with that of bulk CdSe film.
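The link between a decreasing band gap and a red shift follows from the photon-energy relation λ = hc/E. The short sketch below converts the band-gap values quoted in the abstract into the corresponding wavelengths. It is an illustration added here, not part of the reported analysis, and the helper name `gap_to_wavelength_nm` is an assumption.

```python
HC_EV_NM = 1239.84198  # Planck constant times speed of light, in eV·nm

def gap_to_wavelength_nm(gap_ev):
    """Wavelength (nm) of a photon whose energy equals the band gap (eV)."""
    return HC_EV_NM / gap_ev

# Band-gap values quoted in the abstract (eV).
gaps = {
    "as-deposited": 1.966,
    "annealed (highest reported)": 1.909,
    "annealed (lowest reported)": 1.7536,
}
wavelengths = {label: gap_to_wavelength_nm(gap) for label, gap in gaps.items()}
for label, wl in wavelengths.items():
    print(f"{label}: {gaps[label]} eV -> {wl:.1f} nm")
```

With these values the absorption edge moves from roughly 631 nm to roughly 707 nm, that is, toward longer wavelengths as the gap narrows.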
Simultaneous semi-parametric estimation of clustering and regression
We investigate parameter estimation in regression models with fixed group effects when the group variable is missing but group-related variables are available. This problem involves clustering, to infer the missing group variable from the group-related variables, and regression, to build a model of the target variable given the group and possibly additional variables. The problem can thus be formulated as modeling the joint distribution of the target and the group-related variables. The usual estimation strategy for this joint model is a two-step approach: first learn the group variable (clustering step), then plug in its estimator to fit the regression model (regression step). However, this approach is suboptimal (in particular, it yields biased regression estimates) because it does not use the target variable for clustering. We therefore argue for simultaneous estimation of the clustering and the regression in a semi-parametric framework. Numerical experiments illustrate the benefits of our proposal across a wide range of distributions and regression models. The relevance of the new method is illustrated on real data concerning problems associated with high blood pressure prevention.