Assessing the geometric diversity of cytochrome P450 ligand conformers by hierarchical clustering with a stop criterion.
An algorithm is presented that outputs a computed number of rigid conformers of an input small molecule, covering the geometric diversity of its conformational space with minimal structural redundancy. The algorithm calls a conformer generator, then performs an agglomerative hierarchical clustering with the modified clustering gain as the stop criterion. The number of classes is computed without an arbitrary parameter. A representative conformer is selected in each class, and nonrepresentative conformers are discarded. For illustration, the algorithm has been applied to a database containing 70 ligands of the cytochrome CYP 3A4, showing that the structural flexibility of each ligand is indeed handled via a small number of its representative conformers. The method is valid for all small molecules.
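The clustering step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: 2D points stand in for conformers, centroid distance stands in for a conformer distance such as RMSD, and the plain clustering gain (the sum over clusters of (n_k - 1) times the squared distance from the cluster centroid to the global centroid) stands in for the paper's modified clustering gain, which the abstract does not define. The function names are hypothetical.

```python
from itertools import combinations
from math import dist


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def clustering_gain(clusters, global_centroid):
    """Plain clustering gain: sum of (n_k - 1) * ||c_k - c_0||^2 (assumed stand-in)."""
    return sum(
        (len(c) - 1) * dist(centroid(c), global_centroid) ** 2 for c in clusters
    )


def cluster_with_gain_criterion(points):
    """Agglomerative clustering; returns the partition with the highest gain.

    No cluster count or distance cutoff is supplied by the caller.
    """
    clusters = [[p] for p in points]
    global_centroid = centroid(points)
    best_gain, best_partition = 0.0, [list(c) for c in clusters]
    while len(clusters) > 1:
        # Merge the two clusters whose centroids are closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(centroid(clusters[ij[0]]), centroid(clusters[ij[1]])),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        gain = clustering_gain(clusters, global_centroid)
        if gain > best_gain:
            best_gain, best_partition = gain, [list(c) for c in clusters]
    return best_partition


def representatives(partition):
    """Pick, in each class, the member closest to the class centroid."""
    return [min(c, key=lambda p: dist(p, centroid(c))) for c in partition]


# Toy data: three tight, well-separated groups of points.
points = [
    (0.0, 0.0), (0.2, 0.0), (0.0, 0.2),
    (10.0, 0.0), (10.2, 0.0), (10.0, 0.2),
    (0.0, 10.0), (0.2, 10.0), (0.0, 10.2),
]
partition = cluster_with_gain_criterion(points)
reps = representatives(partition)
```

This sketch runs the merging to completion and keeps the best-scoring partition rather than halting at the first drop in gain; either variant removes the need for a user-chosen number of classes.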
New methods for the estimation of Takagi-Sugeno model based extended Kalman filter and its applications to optimal control for nonlinear systems
This paper describes new approaches to improve the local and global approximation (matching) and modeling capability of the Takagi–Sugeno (T-S) fuzzy model. The main aim is to obtain high function approximation accuracy and fast convergence. The main problem encountered is that the T-S identification method cannot be applied when the membership functions are overlapped by pairs. This restricts the application of the T-S method because this type of membership function has been widely used during the last two decades in the stability and controller design of fuzzy systems and is popular in industrial control applications. The approach developed here can be considered a generalized version of the T-S identification method with optimized performance in approximating nonlinear functions. We propose a noniterative method based on a parameter-weighting approach and an iterative algorithm that applies the extended Kalman filter, based on the same idea of parameter weighting. We show that the Kalman filter is an effective tool in the identification of the T-S fuzzy model. A fuzzy-controller-based linear quadratic regulator is proposed in order to show the effectiveness of the estimation method developed here in control applications. An illustrative example of an inverted pendulum is chosen to evaluate the robustness and performance of the proposed method locally and globally in comparison with the original T-S model. Simulation results indicate the potential, simplicity, and generality of the algorithm. In this paper, we prove that these algorithms converge very fast, thereby making them very practical to use.
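To make the setting concrete, here is a minimal sketch of how a T-S fuzzy model with triangular membership functions overlapped by pairs produces an output: each rule carries a local linear model, and the output is the membership-weighted average of the local outputs. It shows inference only, not the paper's identification methods or the extended Kalman filter step. The rule table (tangent lines of y = x^2 at -1, 0, and 1) and all function names are assumptions chosen for illustration.

```python
def triangular(x, left, center, right):
    """Triangular membership function with support (left, right) and peak at center."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)


def ts_output(x, rules):
    """T-S inference: weighted average of local linear models a * x + b.

    Each rule is ((left, center, right), (a, b)).
    """
    weights = [triangular(x, *mf) for mf, _ in rules]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("x lies outside the support of every membership function")
    local_outputs = [a * x + b for _, (a, b) in rules]
    return sum(w * y for w, y in zip(weights, local_outputs)) / total


# Hypothetical rule base: adjacent membership functions overlap by pairs,
# and each consequent is the tangent line of y = x^2 at the rule's center.
rules = [
    ((-2.0, -1.0, 0.0), (-2.0, -1.0)),
    ((-1.0, 0.0, 1.0), (0.0, 0.0)),
    ((0.0, 1.0, 2.0), (2.0, -1.0)),
]

y_at_center = ts_output(1.0, rules)    # only the third rule fires
y_between = ts_output(0.75, rules)     # second and third rules blend
```

At a rule's center the model reproduces that rule's local line exactly; between centers, two rules fire at once, which is the overlapped-by-pairs case the abstract refers to.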
High resolution CMB power spectrum from the complete ACBAR data set
In this paper, we present results from the complete set of cosmic microwave
background (CMB) radiation temperature anisotropy observations made with the
Arcminute Cosmology Bolometer Array Receiver (ACBAR) operating at 150 GHz. We
include new data from the final 2005 observing season, expanding the number of
detector-hours by 210% and the sky coverage by 490% over that used for the
previous ACBAR release. As a result, the band-power uncertainties have been
reduced by more than a factor of two on angular scales encompassing the third
to fifth acoustic peaks as well as the damping tail of the CMB power spectrum.
The calibration uncertainty has been reduced from 6% to 2.1% in temperature
through a direct comparison of the CMB anisotropy measured by ACBAR with that
of the dipole-calibrated WMAP5 experiment. The measured power spectrum is
consistent with a spatially flat, LambdaCDM cosmological model. We include the
effects of weak lensing in the power spectrum model computations and find that
this significantly improves the fits of the models to the combined ACBAR+WMAP5
power spectrum. The preferred strength of the lensing is consistent with
theoretical expectations. On fine angular scales, there is weak evidence (1.1
sigma) for excess power above the level expected from primary anisotropies. We
expect any excess power to be dominated by the combination of emission from
dusty protogalaxies and the Sunyaev-Zel'dovich effect (SZE). However, the
excess observed by ACBAR is significantly smaller than the excess power at ell
> 2000 reported by the CBI experiment operating at 30 GHz. Therefore, while it
is unlikely that the CBI excess has a primordial origin, the combined ACBAR and
CBI results are consistent with the source of the CBI excess being either the
SZE or radio source contamination.
Comment: Submitted to ApJ; Changed to apply a WMAP5-based calibration. The
cosmological parameter estimation has been updated to include WMAP
The Eighth Data Release of the Sloan Digital Sky Survey: First Data from SDSS-III
The Sloan Digital Sky Survey (SDSS) started a new phase in August 2008, with
new instrumentation and new surveys focused on Galactic structure and chemical
evolution, measurements of the baryon oscillation feature in the clustering of
galaxies and the quasar Ly alpha forest, and a radial velocity search for
planets around ~8000 stars. This paper describes the first data release of
SDSS-III (and the eighth counting from the beginning of the SDSS). The release
includes five-band imaging of roughly 5200 deg^2 in the Southern Galactic Cap,
bringing the total footprint of the SDSS imaging to 14,555 deg^2, or over a
third of the Celestial Sphere. All the imaging data have been reprocessed with
an improved sky-subtraction algorithm and a final, self-consistent photometric
recalibration and flat-field determination. This release also includes all data
from the second phase of the Sloan Extension for Galactic Understanding and
Evolution (SEGUE-2), consisting of spectroscopy of approximately 118,000 stars
at both high and low Galactic latitudes. All the more than half a million
stellar spectra obtained with the SDSS spectrograph have been reprocessed
through an improved stellar parameters pipeline, which has better determination
of metallicity for high metallicity stars.
Comment: Astrophysical Journal Supplements, in press (minor updates from
submitted version
Gamma-based clustering via ordered means with application to gene-expression analysis
Discrete mixture models provide a well-known basis for effective clustering
algorithms, although technical challenges have limited their scope. In the
context of gene-expression data analysis, a model is presented that mixes over
a finite catalog of structures, each one representing equality and inequality
constraints among latent expected values. Computations depend on the
probability that independent gamma-distributed variables attain each of their
possible orderings. Each ordering event is equivalent to an event in
independent negative-binomial random variables, and this finding guides a
dynamic-programming calculation. The structuring of mixture-model components
according to constraints among latent means leads to strict concavity of the
mixture log likelihood. In addition to its beneficial numerical properties, the
clustering method shows promising results in an empirical study.
Comment: Published at http://dx.doi.org/10.1214/10-AOS805 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
A Search for Cosmic Microwave Background Anisotropies on Arcminute Scales with Bolocam
We have surveyed two science fields totaling one square degree with Bolocam
at 2.1 mm to search for secondary CMB anisotropies caused by the Sunyaev-
Zel'dovich effect (SZE). The fields are in the Lynx and Subaru/XMM SDS1 fields.
Our survey is sensitive to angular scales with an effective angular multipole
of l_eff = 5700 with FWHM_l = 2800 and has an angular resolution of 60
arcseconds FWHM. Our data provide no evidence for anisotropy. We are able to
constrain the level of total astronomical anisotropy, modeled as a flat
bandpower in C_l, with frequentist 68%, 90%, and 95% CL upper limits of 590,
760, and 830 uKCMB^2. We statistically subtract the known contribution from
primary CMB anisotropy, including cosmic variance, to obtain constraints on the
SZE anisotropy contribution. Now including flux calibration uncertainty, our
frequentist 68%, 90% and 95% CL upper limits on a flat bandpower in C_l are
690, 960, and 1000 uKCMB^2. When we instead employ the analytic spectrum
suggested by Komatsu and Seljak (2002), and account for the non-Gaussianity of
the SZE anisotropy signal, we obtain upper limits on the average amplitude of
their spectrum weighted by our transfer function of 790, 1060, and 1080
uKCMB^2. We obtain a 90% CL upper limit on sigma8, which normalizes the power
spectrum of density fluctuations, of 1.57. These are the first constraints on
anisotropy and sigma8 from survey data at these angular scales at frequencies
near 150 GHz.
Comment: 68 pages, 17 figures, 2 tables, accepted for publication in Ap
Enhancing the Performance of Text Mining
The amount of text data produced in science, finance, social media, and medicine is growing at an unprecedented pace. Raw text data typically introduces major computational and analytical obstacles (e.g., extremely high dimensionality) to data mining and machine learning algorithms. In addition, the growth in the size of text data makes the search process more difficult for information retrieval systems, making it challenging to retrieve relevant results that match users’ search queries. Moreover, the availability of text data in different languages creates the need to develop new methods to analyze multilingual topics to help policymakers in governmental and health systems make risk decisions and create policies to respond to public health crises, natural disasters, and political or social movements. The goal of this thesis is to develop new methods that handle computational and analytical problems for complex high-dimensional text data, to develop a new query expansion approach that enhances the performance of information retrieval systems, and to present new techniques for analyzing multilingual topics using a translation service.
First, in the field of dimensionality reduction, we develop a new method for detecting and eliminating domain-based words. In this method, we use three different datasets and five classifiers for testing and evaluating the performance of our new approach before and after eliminating domain-based words. We compare the performance of our approach with other feature selection methods. We find that the new approach improves the performance of the binary classifier and reduces the dimensionality of the feature space by 90%. Also, our approach reduces the execution time of the classifier and outperforms one of the feature selection methods.
Second, in the field of information retrieval, we design and implement a method that integrates words from a current stream with external data sources in order to predict the occurrence of relevant words that have not yet appeared in the primary source. This algorithm enables the construction of new queries that effectively capture emergent events that a user may not have anticipated when initiating the data collection stream. The added value of using the external data sources appears when we have a stream of data and we want to predict something that has not yet happened instead of using only the stream that is limited to the available information at a specific time. We compare the performance of our approach with two alternative approaches. The first approach (static) expands user queries with words extracted from a probabilistic topic model of the stream. The second approach (emergent) reinforces user queries with emergent words extracted from the stream. We find that our method outperforms alternative approaches, exhibiting particularly good results in identifying future emergent topics.
Third, in the field of multilingual text, we present a strategy to analyze the similarity between multilingual topics in English and Arabic tweets surrounding the 2020 COVID-19 pandemic. We make a descriptive comparison between topics in Arabic and English tweets about COVID-19 using tweets collected in the same way and filtered using the same keywords. We analyze Twitter’s discussion to understand the evolution of topics over time and reveal topic similarity among tweets across the datasets. We use probabilistic topic modeling to identify and extract the key topics of Twitter’s discussion in Arabic and English tweets. We use two methods to analyze the similarity between multilingual topics. The first method (full-text topic modeling approach) translates all text to English and then runs topic modeling to find similar topics. The second method (term-based topic modeling approach) runs topic modeling on the text before translation and then translates the top keywords in each topic to find similar topics. We find similar topics related to the COVID-19 pandemic covered in English and Arabic tweets for certain time intervals. Results indicate that the term-based topic modeling approach can reduce the cost compared to the full-text topic modeling approach and still give comparable results in finding similar topics. The computational time to translate the terms is significantly lower than that needed to translate the full text.
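The term-based approach can be sketched as follows, under stated assumptions: topic modeling has already produced a short list of top keywords per topic, a small lookup table stands in for the translation service, and Jaccard overlap of keyword sets stands in for whatever similarity measure the thesis uses (the abstract does not name one). The topic contents, the threshold, and all function names are hypothetical.

```python
def translate_terms(terms, dictionary):
    """Translate top keywords via a lookup table; unknown terms pass through unchanged."""
    return {dictionary.get(term, term) for term in terms}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_topics(english_topics, arabic_topics, dictionary, threshold=0.3):
    """Pair topics whose translated keyword sets overlap at or above the threshold."""
    matches = []
    for en_id, en_terms in english_topics.items():
        for ar_id, ar_terms in arabic_topics.items():
            score = jaccard(set(en_terms), translate_terms(ar_terms, dictionary))
            if score >= threshold:
                matches.append((en_id, ar_id, score))
    return sorted(matches, key=lambda m: -m[2])


# Toy inputs: only the top keywords are translated, never the full tweets.
english_topics = {
    "en0": ["vaccine", "virus", "dose"],
    "en1": ["school", "online", "students"],
}
arabic_topics = {
    "ar0": ["لقاح", "فيروس", "جرعة"],
    "ar1": ["مدرسة", "طلاب", "تعليم"],
}
toy_dictionary = {  # stand-in for a translation service
    "لقاح": "vaccine", "فيروس": "virus", "جرعة": "dose",
    "مدرسة": "school", "طلاب": "students", "تعليم": "education",
}

matches = match_topics(english_topics, arabic_topics, toy_dictionary)
```

The cost argument in the abstract is visible in the sketch: the translation step touches six keywords here, whereas the full-text approach would have to translate every tweet before any topic modeling could run.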