2,058 research outputs found

    A Spectral Algorithm for Latent Dirichlet Allocation

    The problem of topic modeling can be seen as a generalization of the clustering problem, in that it posits that observations are generated due to multiple latent factors (e.g., the words in each document are generated as a mixture of several active topics, as opposed to just one). This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimating the topic probability vectors (the distributions over words for each topic), when only the words are observed and the corresponding topics are hidden. We provide a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of mixture models, including the popular latent Dirichlet allocation (LDA) model. For LDA, the procedure correctly recovers both the topic probability vectors and the prior over the topics, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method, termed Excess Correlation Analysis (ECA), is based on a spectral decomposition of low order moments (third and fourth order) via two singular value decompositions (SVDs). Moreover, the algorithm is scalable since the SVD operations are carried out on $k \times k$ matrices, where $k$ is the number of latent factors (e.g. the number of topics), rather than in the $d$-dimensional observed space (typically $d \gg k$). Comment: Changed title to match conference version, which appears in Advances in Neural Information Processing Systems 25, 201

    A Spectral Algorithm for Latent Dirichlet Allocation

    Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. The increased representational power comes at the cost of a more challenging unsupervised learning problem for estimating the topic-word distributions when only words are observed, and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of topic models, including Latent Dirichlet Allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method, called Excess Correlation Analysis, is based on a spectral decomposition of low-order moments via two singular value decompositions (SVDs). Moreover, the algorithm is scalable, since the SVDs are carried out only on k × k matrices, where k is the number of latent factors (topics) and is typically much smaller than the dimension of the observation (word) space
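The "trigram statistics" mentioned above can be illustrated with a minimal sketch of how an empirical third-order moment might be estimated from short documents. This is not the Excess Correlation Analysis procedure itself, only the moment-estimation step that would precede it. The function name `empirical_trigram_moment`, the sparse dictionary representation, and the toy documents are assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations


def empirical_trigram_moment(docs):
    """Estimate the third-order word moment as a sparse dict.

    Keys are ordered word triples (w1, w2, w3); values are empirical
    probabilities. Each usable document contributes equally, and within a
    document every ordered triple of distinct positions is weighted equally.
    Documents with fewer than three words are skipped (an assumption of
    this sketch).
    """
    usable = [doc for doc in docs if len(doc) >= 3]
    moment = Counter()
    for doc in usable:
        triples = list(permutations(range(len(doc)), 3))
        weight = 1.0 / (len(triples) * len(usable))
        for i, j, k in triples:
            moment[(doc[i], doc[j], doc[k])] += weight
    return dict(moment)


# Hypothetical toy corpus: three documents of exactly three words each.
docs = [
    ["cat", "dog", "cat"],
    ["dog", "dog", "fish"],
    ["cat", "fish", "dog"],
]
m3 = empirical_trigram_moment(docs)
```

The dictionary sums to one, so it can be read as an empirical distribution over ordered word triples. A dense implementation would store the same quantity as a d × d × d array, which is exactly the cost the abstract says the method avoids by working in k dimensions.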

    Timing of preemptive vascular access placement: do we understand the natural history of advanced CKD?: an observational study

    BACKGROUND: Little is known about the targets and expectations of practicing nephrologists with regard to timing of preemptive AV access surgery and how these relate to actual observed practice patterns in clinical care. METHODS: We administered an 8-question survey to assess nephrologists’ expectations for preemptive vascular access placement to 53 practicing nephrologists in California. We performed a retrospective chart review of 116 patients who underwent preemptive vascular access placement at a large academic medical center and examined progression to ESRD. RESULTS: According to our survey of nephrologists, most aimed to have preemptive vascular access created about 6 months prior to the start of ESRD or when the chance of ESRD within the next year is two-thirds or greater. The estimated GFR level that they believe matches these conditions is approximately 18 ml/min/1.73 m². Among the 116 patients with CKD who underwent preemptive vascular access creation, the mean estimated GFR at the time of access creation was 16.1 (6.8) ml/min/1.73 m². Only 57 of the 116 patients (49.1%) initiated maintenance HD within 1 year after surgery. CONCLUSIONS: In our study, most nephrologists aim for preemptive vascular access surgery approximately 6 months prior to the start of HD. In practice, however, only approximately 50% of patients who underwent preemptive vascular access surgery started HD within 1 year. Better tools are needed to predict the natural history of advanced CKD

    Stochastic convex optimization with bandit feedback

    This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $\mathcal{X}$ under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value $f(x)$ at any query point $x \in \mathcal{X}$. The quantity of interest is the regret of the algorithm, which is the sum of the function values at the algorithm's query points minus the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs $\tilde{O}(\mathrm{poly}(d)\sqrt{T})$ regret. Since any algorithm has regret at least $\Omega(\sqrt{T})$ on this problem, our algorithm is optimal in terms of the scaling with $T$
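The regret described in words above can be written out explicitly. The following is a sketch under assumed notation: $x_t$ denotes the point queried at round $t$ and $T$ the total number of rounds. The abstract uses $T$ but does not name $x_t$, and the cumulative reading of "minus the optimal function value" is the standard one rather than something the abstract spells out.

```latex
\[
  R_T \;=\; \sum_{t=1}^{T} f(x_t) \;-\; T \min_{x \in \mathcal{X}} f(x)
      \;=\; \sum_{t=1}^{T} \Bigl( f(x_t) - \min_{x \in \mathcal{X}} f(x) \Bigr),
  \qquad
  \Omega(\sqrt{T}) \;\le\; R_T \;\le\; \tilde{O}\bigl(\mathrm{poly}(d)\,\sqrt{T}\bigr).
\]
```

Read this way, the upper and lower bounds match in their dependence on $T$, which is the sense in which the abstract calls the algorithm optimal.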

    CFD analysis of turbopump volutes

    An effort is underway to develop a procedure for the regular use of CFD analysis in the design of turbopump volutes. Airflow data to be taken at NASA Marshall will be used to validate the CFD code and overall procedure. Initial focus has been on preprocessing (geometry creation, translation, and grid generation). Volute geometries have been acquired electronically and imported into the CATIA CAD system and RAGGS (Rockwell Automated Grid Generation System) via the IGES standard. An initial grid topology has been identified and grids have been constructed for turbine inlet and discharge volutes. For CFD analysis of volutes to be used regularly, a procedure must be defined to meet engineering design needs in a timely manner. Thus, a compromise must be established between making geometric approximations, the selection of grid topologies, and possible CFD code enhancements. While the initial grid developed approximated the volute tongue with a zero thickness, final computations should more accurately account for the geometry in this region. Additionally, grid topologies will be explored to minimize skewness and high aspect ratio cells that can affect solution accuracy and slow code convergence. Finally, as appropriate, code modifications will be made to allow for new grid topologies in an effort to expedite the overall CFD analysis process

    What Makes Theatrical Performances Successful in China's Tourism Industry?

    This study aims to explore the factors affecting the success of a popular tourist product, namely, theatrical performance, within the context of China's tourism industry and develop a model based on previously successful productions. Using qualitative software, 22 Chinese-language articles on theatrical performances are analyzed to generate a list of success factors, classified as internal and external. The internal factors are storyline and performing, market positioning and marketing strategy, investment and financial support, operation and management, performing team, outdoor venue, indoor/outdoor stage supporting facilities, continuous improvement, and production team. The external factors are collaboration between cultural industries and local tourism, government support, privatization, and social and cultural effect. This study also provides suggestions for the future development of theatrical performances in China

    Quantitative chemical mapping of InGaN quantum wells from calibrated high-angle annular dark field micrographs

    We present a simple and robust method to acquire quantitative maps of compositional fluctuations in nanostructures from low magnification high-angle annular dark field (HAADF) micrographs calibrated by energy-dispersive X-ray (EDX) spectroscopy in scanning transmission electron microscopy (STEM) mode. We show that a nonuniform background in HAADF-STEM micrographs can be eliminated, to a first approximation, by use of a suitable analytic function. The uncertainty in probe position when collecting an EDX spectrum renders the calibration of HAADF-STEM micrographs indirect, and a statistical approach has been developed to determine the position with confidence. Our analysis procedure, presented in a flowchart to facilitate the successful implementation of the method by users, was applied to discontinuous InGaN/GaN quantum wells in order to obtain quantitative determinations of compositional fluctuations on the nanoscale

    In silico discovery of blood cell macromolecular associations

    Background: Physical molecular interactions are the basis of intracellular signalling and gene regulatory networks, and comprehensive, accessible databases are needed for their discovery. Highly correlated transcripts may reflect important functional associations, but identification of such associations from primary data is cumbersome. We have constructed and adapted a user-friendly web application to discover and identify putative macromolecular associations in human peripheral blood based on significant correlations at the transcriptional level. Methods: The blood transcriptome was characterized by quantification of 17,328 RNA species, including 341 mature microRNAs, in 105 clinically well-characterized postmenopausal women. Intercorrelation of detected transcript signal levels generated a matrix with > 150 million correlations recognizing the human blood RNA interactome. The correlations with calculated adjusted p-values were made easily accessible by a novel web application. Results: We found that significant transcript correlations within the giant matrix reflect experimentally documented interactions involving select ubiquitous blood-relevant transcription factors (CREB1, GATA1, and the glucocorticoid receptor (GR, NR3C1)). Their responsive genes recapitulated up to 91% of these as significant correlations, and were replicated in an independent cohort of 1204 individual blood samples from the Framingham Heart Study. Furthermore, experimentally documented mRNA/miRNA associations were also reproduced in the matrix, and their predicted functional co-expression described. The blood transcript web application is available at http://app.uio.no/med/klinmed/correlation-browser/blood/index.php and works on all commonly used internet browsers. Conclusions: Using in silico analyses and a novel web application, we found that correlated blood transcripts across 105 postmenopausal women reflected experimentally proven molecular associations. Furthermore, the associations were reproduced in a much larger and more heterogeneous cohort and should therefore be generally representative. The web application lends itself to use as a hypothesis-generating tool for identification of regulatory mechanisms in complex biological data sets.
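The size of the correlation matrix follows from the number of quantified RNA species: every unordered pair of distinct transcripts yields one correlation. A minimal sketch of that count, together with a plain Pearson correlation over toy data, is given here. The helper names, the use of Pearson correlation, and the toy signal levels are assumptions for illustration; the abstract does not state which correlation measure or p-value adjustment was used.

```python
from itertools import combinations
from math import comb, sqrt


def n_pairs(n_transcripts):
    """Number of unordered pairs of distinct transcripts."""
    return comb(n_transcripts, 2)


def pearson(x, y):
    """Pearson correlation of two equal-length sequences of signal levels."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_table(levels):
    """Map each unordered transcript pair to its Pearson correlation."""
    return {
        (a, b): pearson(levels[a], levels[b])
        for a, b in combinations(sorted(levels), 2)
    }


# Hypothetical signal levels for three transcripts across four samples.
toy_levels = {
    "tx_a": [1.0, 2.0, 3.0, 4.0],
    "tx_b": [2.0, 4.0, 6.0, 8.0],
    "tx_c": [4.0, 3.0, 2.0, 1.0],
}
table = correlation_table(toy_levels)
total_pairs = n_pairs(17328)  # 150,121,128 pairs
```

With 17,328 RNA species the pair count is 150,121,128, consistent with the "> 150 million correlations" reported in the abstract.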