5,937 research outputs found

    Assessing the geometric diversity of cytochrome P450 ligand conformers by hierarchical clustering with a stop criterion.

    An algorithm is presented that outputs a computed number of rigid conformers of an input small molecule, covering the geometric diversity of the conformational space with minimal structural redundancy. The algorithm calls a conformer generator, then performs an agglomerative hierarchical clustering with the modified clustering gain as the stop criterion. The number of classes is computed without an arbitrary parameter. A representative conformer is selected in each class, and nonrepresentative conformers are discarded. For illustration, the algorithm has been applied to a database containing 70 ligands of the cytochrome CYP 3A4, showing that the structural flexibility of each ligand is indeed handled via a small number of its representative conformers. The method is valid for all small molecules.
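
The clustering step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the "modified clustering gain", so the code substitutes the plain clustering gain (the sum over classes of (class size - 1) times the squared distance between the class centroid and the global centroid) as an assumed stand-in. The 2-D tuples are hypothetical conformer descriptors, and `cluster_with_stop` and `representatives` are illustrative names. Instead of stopping early, the sketch scans every merge level and keeps the partition with the highest gain.

```python
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def sq_dist(a, b):
    """Squared Euclidean distance between two tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def clustering_gain(clusters, global_centroid):
    """Plain clustering gain (assumed stand-in for the paper's modified gain)."""
    return sum(
        (len(c) - 1) * sq_dist(centroid(c), global_centroid) for c in clusters
    )


def cluster_with_stop(points):
    """Centroid-linkage agglomeration; return the partition with maximal gain."""
    clusters = [[p] for p in points]
    global_centroid = centroid(points)
    best_gain, best = 0.0, [list(c) for c in clusters]
    while len(clusters) > 1:
        # Merge the two classes whose centroids are closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: sq_dist(
                centroid(clusters[ij[0]]), centroid(clusters[ij[1]])
            ),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        gain = clustering_gain(clusters, global_centroid)
        if gain > best_gain:
            best_gain, best = gain, [list(c) for c in clusters]
    return best


def representatives(clusters):
    """Pick, in each class, the member closest to the class centroid."""
    return [min(c, key=lambda p: sq_dist(p, centroid(c))) for c in clusters]


# Hypothetical descriptors forming two well-separated groups.
descriptors = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
               (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
classes = cluster_with_stop(descriptors)
reps = representatives(classes)
```

No cluster count is passed in: the number of classes falls out of the gain curve, which is zero both for all-singleton and for single-class partitions and peaks in between.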

    New methods for the estimation of Takagi-Sugeno model based extended Kalman filter and its applications to optimal control for nonlinear systems

    This paper describes new approaches to improve the local and global approximation (matching) and modeling capability of the Takagi–Sugeno (T-S) fuzzy model. The main aim is to obtain high function approximation accuracy and fast convergence. The main problem encountered is that the T-S identification method cannot be applied when the membership functions are overlapped by pairs. This restricts the application of the T-S method, because this type of membership function has been widely used during the last two decades in the stability analysis and controller design of fuzzy systems and is popular in industrial control applications. The approach developed here can be considered a generalized version of the T-S identification method with optimized performance in approximating nonlinear functions. We propose a noniterative method based on a parameter-weighting approach and an iterative algorithm that applies the extended Kalman filter, based on the same idea of parameter weighting. We show that the Kalman filter is an effective tool in the identification of the T-S fuzzy model. A fuzzy-controller-based linear quadratic regulator is proposed in order to show the effectiveness of the estimation method developed here in control applications. An illustrative example of an inverted pendulum is chosen to evaluate the robustness and performance of the proposed method locally and globally in comparison with the original T-S model. Simulation results indicate the potential, simplicity, and generality of the algorithm. In this paper, we prove that these algorithms converge very fast, thereby making them very practical to use.
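
For readers unfamiliar with the model class, the following minimal sketch shows what a T-S fuzzy model with membership functions "overlapped by pairs" computes. It illustrates only standard T-S inference, not the identification or Kalman-filter methods proposed in the paper. The triangular memberships, the rule table `rules`, and the local linear coefficients `a` and `b` are hypothetical values chosen for the example.

```python
def triangular(x, left, center, right):
    """Triangular membership function with support (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)


def ts_output(x, rules):
    """T-S inference: membership-weighted average of local linear models."""
    weights = [triangular(x, *rule["mf"]) for rule in rules]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("x lies outside the support of every rule")
    return sum(
        w * (rule["a"] * x + rule["b"]) for w, rule in zip(weights, rules)
    ) / total


# Hypothetical rule base: neighbouring triangles overlap pairwise, so at most
# two rules fire for any x and their memberships sum to 1 on [0, 2].
rules = [
    {"mf": (-1.0, 0.0, 1.0), "a": 0.0, "b": 1.0},   # local model y = 1
    {"mf": (0.0, 1.0, 2.0), "a": 1.0, "b": 0.0},    # local model y = x
    {"mf": (1.0, 2.0, 3.0), "a": 2.0, "b": -1.0},   # local model y = 2x - 1
]

y_mid = ts_output(0.5, rules)  # blends the first two local models equally
```

At each rule centre exactly one membership equals 1, so the output there reduces to that rule's local model; between centres it interpolates between the two active rules.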

    High resolution CMB power spectrum from the complete ACBAR data set

    In this paper, we present results from the complete set of cosmic microwave background (CMB) radiation temperature anisotropy observations made with the Arcminute Cosmology Bolometer Array Receiver (ACBAR) operating at 150 GHz. We include new data from the final 2005 observing season, expanding the number of detector-hours by 210% and the sky coverage by 490% over that used for the previous ACBAR release. As a result, the band-power uncertainties have been reduced by more than a factor of two on angular scales encompassing the third to fifth acoustic peaks as well as the damping tail of the CMB power spectrum. The calibration uncertainty has been reduced from 6% to 2.1% in temperature through a direct comparison of the CMB anisotropy measured by ACBAR with that of the dipole-calibrated WMAP5 experiment. The measured power spectrum is consistent with a spatially flat, LambdaCDM cosmological model. We include the effects of weak lensing in the power spectrum model computations and find that this significantly improves the fits of the models to the combined ACBAR+WMAP5 power spectrum. The preferred strength of the lensing is consistent with theoretical expectations. On fine angular scales, there is weak evidence (1.1 sigma) for excess power above the level expected from primary anisotropies. We expect any excess power to be dominated by the combination of emission from dusty protogalaxies and the Sunyaev-Zel'dovich effect (SZE). However, the excess observed by ACBAR is significantly smaller than the excess power at ell > 2000 reported by the CBI experiment operating at 30 GHz. Therefore, while it is unlikely that the CBI excess has a primordial origin, the combined ACBAR and CBI results are consistent with the source of the CBI excess being either the SZE or radio source contamination. Comment: Submitted to ApJ; Changed to apply a WMAP5-based calibration. The cosmological parameter estimation has been updated to include WMAP

    The Eighth Data Release of the Sloan Digital Sky Survey: First Data from SDSS-III

    The Sloan Digital Sky Survey (SDSS) started a new phase in August 2008, with new instrumentation and new surveys focused on Galactic structure and chemical evolution, measurements of the baryon oscillation feature in the clustering of galaxies and the quasar Ly alpha forest, and a radial velocity search for planets around ~8000 stars. This paper describes the first data release of SDSS-III (and the eighth counting from the beginning of the SDSS). The release includes five-band imaging of roughly 5200 deg^2 in the Southern Galactic Cap, bringing the total footprint of the SDSS imaging to 14,555 deg^2, or over a third of the Celestial Sphere. All the imaging data have been reprocessed with an improved sky-subtraction algorithm and a final, self-consistent photometric recalibration and flat-field determination. This release also includes all data from the second phase of the Sloan Extension for Galactic Understanding and Evolution (SEGUE-2), consisting of spectroscopy of approximately 118,000 stars at both high and low Galactic latitudes. All of the more than half a million stellar spectra obtained with the SDSS spectrograph have been reprocessed through an improved stellar parameters pipeline, which has better determination of metallicity for high-metallicity stars. Comment: Astrophysical Journal Supplements, in press (minor updates from submitted version

    Gamma-based clustering via ordered means with application to gene-expression analysis

    Discrete mixture models provide a well-known basis for effective clustering algorithms, although technical challenges have limited their scope. In the context of gene-expression data analysis, a model is presented that mixes over a finite catalog of structures, each one representing equality and inequality constraints among latent expected values. Computations depend on the probability that independent gamma-distributed variables attain each of their possible orderings. Each ordering event is equivalent to an event in independent negative-binomial random variables, and this finding guides a dynamic-programming calculation. The structuring of mixture-model components according to constraints among latent means leads to strict concavity of the mixture log likelihood. In addition to its beneficial numerical properties, the clustering method shows promising results in an empirical study. Comment: Published at http://dx.doi.org/10.1214/10-AOS805 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
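
The link between gamma orderings and negative-binomial events mentioned in this abstract can be checked in the simplest case of two variables. The sketch below is not the paper's dynamic-programming calculation; it assumes integer shape parameters and uses the standard Poisson-process argument: if X1 and X2 are the waiting times for the `shape1`-th and `shape2`-th events of independent Poisson processes with rates `rate1` and `rate2`, then X1 < X2 exactly when a negative-binomial count (process-2 events seen before the `shape1`-th process-1 event) is below `shape2`. Function names are illustrative.

```python
import random
from math import comb


def prob_gamma_less(shape1, rate1, shape2, rate2):
    """Exact P(X1 < X2) for independent gammas with integer shapes.

    Evaluated as a negative-binomial CDF: P(K < shape2), where K counts
    process-2 events before the shape1-th process-1 event.
    """
    p = rate1 / (rate1 + rate2)  # chance that the next event is from process 1
    return sum(
        comb(shape1 + k - 1, k) * p**shape1 * (1 - p) ** k
        for k in range(shape2)
    )


def prob_gamma_less_mc(shape1, rate1, shape2, rate2, n=20000, seed=0):
    """Monte Carlo estimate of the same probability, as a sanity check."""
    rng = random.Random(seed)
    hits = sum(
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        rng.gammavariate(shape1, 1 / rate1) < rng.gammavariate(shape2, 1 / rate2)
        for _ in range(n)
    )
    return hits / n


exact = prob_gamma_less(2, 1.0, 3, 1.0)
estimate = prob_gamma_less_mc(2, 1.0, 3, 1.0)
```

For shapes 2 and 3 with equal rates, the exact sum is 0.25 × (1 + 1 + 0.75) = 0.6875, and the simulated estimate should agree to within sampling error.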

    A Search for Cosmic Microwave Background Anisotropies on Arcminute Scales with Bolocam

    We have surveyed two science fields totaling one square degree with Bolocam at 2.1 mm to search for secondary CMB anisotropies caused by the Sunyaev-Zel'dovich effect (SZE). The fields are in the Lynx and Subaru/XMM SDS1 fields. Our survey is sensitive to angular scales with an effective angular multipole of l_eff = 5700 with FWHM_l = 2800 and has an angular resolution of 60 arcseconds FWHM. Our data provide no evidence for anisotropy. We are able to constrain the level of total astronomical anisotropy, modeled as a flat bandpower in C_l, with frequentist 68%, 90%, and 95% CL upper limits of 590, 760, and 830 uKCMB^2. We statistically subtract the known contribution from primary CMB anisotropy, including cosmic variance, to obtain constraints on the SZE anisotropy contribution. Now including flux calibration uncertainty, our frequentist 68%, 90%, and 95% CL upper limits on a flat bandpower in C_l are 690, 960, and 1000 uKCMB^2. When we instead employ the analytic spectrum suggested by Komatsu and Seljak (2002), and account for the non-Gaussianity of the SZE anisotropy signal, we obtain upper limits on the average amplitude of their spectrum weighted by our transfer function of 790, 1060, and 1080 uKCMB^2. We obtain a 90% CL upper limit on sigma8, which normalizes the power spectrum of density fluctuations, of 1.57. These are the first constraints on anisotropy and sigma8 from survey data at these angular scales at frequencies near 150 GHz. Comment: 68 pages, 17 figures, 2 tables, accepted for publication in Ap

    Enhancing the Performance of Text Mining

    The amount of text data produced in science, finance, social media, and medicine is growing at an unprecedented pace. Raw text data typically introduces major computational and analytical obstacles (e.g., extremely high dimensionality) to data mining and machine learning algorithms. In addition, the growth in the size of text data makes the search process more difficult for information retrieval systems, so retrieving relevant results that match users' search queries becomes challenging. Moreover, the availability of text data in different languages creates the need to develop new methods to analyze multilingual topics to help policymakers in governmental and health systems to make risk decisions and to create policies to respond to public health crises, natural disasters, and political or social movements. The goal of this thesis is to develop new methods that handle computational and analytical problems for complex high-dimensional text data, develop a new query expansion approach to enhance the performance of information retrieval systems, and present new techniques for analyzing multilingual topics using a translation service.
    First, in the field of dimensionality reduction, we develop a new method for detecting and eliminating domain-based words. In this method, we use three different datasets and five classifiers for testing and evaluating the performance of our new approach before and after eliminating domain-based words. We compare the performance of our approach with other feature selection methods. We find that the new approach improves the performance of the binary classifier and reduces the dimensionality of the feature space by 90%. Also, our approach reduces the execution time of the classifier and outperforms one of the feature selection methods.
    Second, in the field of information retrieval, we design and implement a method that integrates words from a current stream with external data sources in order to predict the occurrence of relevant words that have not yet appeared in the primary source. This algorithm enables the construction of new queries that effectively capture emergent events that a user may not have anticipated when initiating the data collection stream. The added value of using the external data sources appears when we have a stream of data and we want to predict something that has not yet happened instead of using only the stream that is limited to the available information at a specific time. We compare the performance of our approach with two alternative approaches. The first approach (static) expands user queries with words extracted from a probabilistic topic model of the stream. The second approach (emergent) reinforces user queries with emergent words extracted from the stream. We find that our method outperforms alternative approaches, exhibiting particularly good results in identifying future emergent topics.
    Third, in the field of multilingual text, we present a strategy to analyze the similarity between multilingual topics in English and Arabic tweets surrounding the 2020 COVID-19 pandemic. We make a descriptive comparison between topics in Arabic and English tweets about COVID-19 using tweets collected in the same way and filtered using the same keywords. We analyze Twitter's discussion to understand the evolution of topics over time and reveal topic similarity among tweets across the datasets. We use probabilistic topic modeling to identify and extract the key topics of Twitter's discussion in Arabic and English tweets. We use two methods to analyze the similarity between multilingual topics. The first method (full-text topic modeling approach) translates all text to English and then runs topic modeling to find similar topics. The second method (term-based topic modeling approach) runs topic modeling on the text before translation, then translates the top keywords in each topic to find similar topics. We find similar topics related to the COVID-19 pandemic covered in English and Arabic tweets for certain time intervals. Results indicate that the term-based topic modeling approach can reduce the cost compared to the full-text topic modeling approach and still have comparable results in finding similar topics. The computational time to translate the terms is significantly lower than that needed to translate the full text.