1,299 research outputs found

    A Cloud-based Machine Learning Pipeline for the Efficient Extraction of Insights from Customer Reviews

    The efficiency of natural language processing has improved dramatically with the advent of machine learning models, particularly neural network-based solutions. However, some tasks remain challenging, especially in specific domains. In this paper, we present a cloud-based system that extracts insights from customer reviews using machine learning methods integrated into a pipeline. For topic modeling, our composite model uses transformer-based neural networks designed for natural language processing, vector embedding-based keyword extraction, and clustering. The elements of our model have been integrated and further developed to better meet the requirements of efficient information extraction, topic modeling of the extracted information, and user needs. Furthermore, our system achieves better results than existing topic modeling and keyword extraction solutions for this task. Our approach is validated and compared with other state-of-the-art methods using publicly available benchmark datasets.

    MIR phasing using merohedrally twinned crystals

    Progress and Poverty—1965 Version

    The first hard X-ray laser, the Linac Coherent Light Source (LCLS), produces 120 shots per second. Particles injected into the X-ray beam are hit randomly and in unknown orientations by the extremely intense X-ray pulses. The femtosecond-duration pulses diffract from the sample before the particle structure is significantly changed, even though the sample is ultimately destroyed by the deposited X-ray energy. Single-particle X-ray diffraction experiments generate data at the FEL repetition rate, resulting in more than 400,000 detector readouts in an hour. The data stream during an experiment contains blank frames mixed with hits on single particles, clusters and contaminants. The diffraction signal is generally weak and is superimposed on a low but continually fluctuating background signal, originating from photon noise in the beam line and electronic noise from the detector. Meanwhile, explosion of the sample creates fragments with a characteristic signature. Here, we describe methods based on rapid image analysis combined with ion time-of-flight (ToF) spectroscopy of the fragments to achieve an efficient, automated and unsupervised sorting of diffraction data. The studies described here form a basis for the development of real-time frame rejection methods, e.g. for the European XFEL, which is expected to produce 100 million pulses per hour. © 2014 Optical Society of America
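
The data rates quoted in the abstract above can be checked with a short calculation. The sketch below uses only the two figures given in the text (120 shots per second and 100 million pulses per hour). The function names are hypothetical, and the per-frame time budget assumes frames arrive evenly spaced, which is a simplification and not a statement about the actual pulse structure of either facility.

```python
# Illustrative throughput arithmetic; only the two rates come from the abstract.
LCLS_SHOTS_PER_SECOND = 120             # quoted in the abstract
EUXFEL_PULSES_PER_HOUR = 100_000_000    # expected figure quoted in the abstract
SECONDS_PER_HOUR = 3600


def readouts_per_hour(shots_per_second: int) -> int:
    """Detector readouts in one hour at a constant repetition rate."""
    return shots_per_second * SECONDS_PER_HOUR


def per_frame_budget_us(frames_per_hour: int) -> float:
    """Average time per frame in microseconds, assuming evenly spaced frames."""
    return SECONDS_PER_HOUR * 1e6 / frames_per_hour


lcls_per_hour = readouts_per_hour(LCLS_SHOTS_PER_SECOND)
print(lcls_per_hour)                                # 432000
print(per_frame_budget_us(lcls_per_hour))           # roughly 8333 microseconds
print(per_frame_budget_us(EUXFEL_PULSES_PER_HOUR))  # 36.0
```

At 120 shots per second the hourly count is 432,000, consistent with the "more than 400,000" stated above. Under the even-spacing assumption, the quoted European XFEL rate leaves an average of 36 microseconds per frame.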

    Auger Electron Cascades in Water and Ice

    Secondary electron cascades can induce significant ionisation in condensed matter due to electron-atom collisions. This is of interest in the context of diffraction and imaging using X-rays, where radiation damage is the main limiting factor for achieving high-resolution data. Here we present new results on electron-induced damage in liquid water and ice, from the simulation of Auger electron cascades. We have compared our theoretical estimates to the available experimental data on elastic and inelastic electron-molecule interactions for water and found the theoretical results for elastic cross sections to be in very good agreement with experiment. As a result of the cascade, we find that the average number of secondary electrons after 100 fs in ice is about 25, slightly higher than in water, where it is about 20. The difference in damage between ice and water is discussed in the context of sample handling for biomolecular systems.
    Comment: 19 pages, 8 figures. Includes slight corrections to the version submitted for publication.

    Structural variability and the incoherent addition of scattered intensities in single-particle diffraction

    X-ray lasers may allow structural studies on single particles and biomolecules without crystalline periodicity in the samples. We examine here the effect of sample dynamics as a source of structural heterogeneity on the resolution of the reconstructed image of a small protein molecule. Structures from molecular-dynamics simulations of lysozyme were sampled and aligned. These structures were then used to calculate diffraction patterns corresponding to different dynamic states. The patterns were incoherently summed and the resulting data set was phased using the oversampling method. Reconstructed images of hydrated and dehydrated lysozyme gave resolutions of 3.7 Å and 7.6 Å, respectively. These are significantly worse than the root-mean-square deviation of the hydrated (2.7 Å for all atoms and 1.45 Å for Cα positions) or dehydrated (3.7 Å for all atoms and 2.5 Å for Cα positions) structures. The noise introduced by structural dynamics and incoherent addition of dissimilar structures restricts the maximum resolution to be expected from direct image reconstruction of dynamic systems. A way of potentially reducing this effect is by grouping dynamic structures into distinct structural substates and solving them separately.
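
The effect of incoherently summing patterns from differing structures, as described in the abstract above, can be illustrated with a toy model. The sketch below is not the authors' procedure. It assumes a one-dimensional "molecule" of identical point scatterers and replaces the molecular-dynamics snapshots with random Gaussian displacements. All names and parameter values are hypothetical.

```python
import cmath
import random


def amplitude(positions, q):
    """Scattered amplitude of identical point scatterers at momentum transfer q."""
    return sum(cmath.exp(1j * q * x) for x in positions)


def intensity(positions, q):
    """Diffracted intensity |F(q)|^2 for a single, static structure."""
    return abs(amplitude(positions, q)) ** 2


def incoherent_sum(conformations, q):
    """Average of per-structure intensities (each shot sees one conformation)."""
    return sum(intensity(c, q) for c in conformations) / len(conformations)


def make_conformations(reference, sigma, n, seed=0):
    """Toy stand-in for MD snapshots: Gaussian jitter around a reference."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, sigma) for x in reference] for _ in range(n)]


reference = [0.0, 1.5, 3.1, 4.4, 6.0]   # arbitrary 1D scatterer positions
conformations = make_conformations(reference, sigma=0.3, n=200)

for q in (0.5, 2.0, 6.0, 20.0):
    print(q, intensity(reference, q), incoherent_sum(conformations, q))
```

In this toy model the summed intensity at large q approaches the number of scatterers, because the interference terms between scatterers average out across conformations. That loss of high-angle contrast is the qualitative reason a reconstruction from the summed pattern is limited in resolution.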

    A statistical approach to detect protein complexes at X-ray free electron laser facilities

    The Flash X-ray Imaging (FXI) technique, under development at X-ray free electron lasers (XFEL), aims to achieve structure determination based on diffraction from individual macromolecular complexes. We report an FXI study on the first protein complex, RNA polymerase II, ever injected at an XFEL. A successful 3D reconstruction requires a high number of observations of the sample in various orientations. The measured diffraction signal for many shots can be comparable to background. Here we present a robust and highly sensitive hit-identification method based on automated modeling of beamline background through photon statistics. It can operate at a controlled false-positive hit rate of 3 × 10⁻⁵. We demonstrate its power in determining particle hits and validate our findings against an independent hit-identification approach based on ion time-of-flight spectra. We also validate the advantages of our method over simpler hit-identification schemes via tests on other samples and using computer simulations, showing a doubled hit-identification power.
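
A hit threshold with a controlled false-positive rate can be sketched as follows. The abstract above does not give the details of its background model, so this example makes a strong simplifying assumption: the total background photon count per frame is Poisson-distributed with a known mean. The function names and the example mean of 10 photons are hypothetical. Only the target rate of 3 × 10⁻⁵ comes from the text.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed from the cumulative sum."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)      # P(X = 0)
    cdf = term
    for i in range(1, k):      # accumulate P(X = 1) ... P(X = k - 1)
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def hit_threshold(background_mean: float, max_false_positive_rate: float) -> int:
    """Smallest photon count k such that a background-only frame reaches k
    with probability at most max_false_positive_rate."""
    k = 0
    while poisson_sf(k, background_mean) > max_false_positive_rate:
        k += 1
    return k


# Assumed background mean of 10 photons per frame (illustrative value only).
threshold = hit_threshold(background_mean=10.0, max_false_positive_rate=3e-5)
print(threshold, poisson_sf(threshold, 10.0))
```

A frame would be flagged as a hit when its photon count reaches the threshold. A realistic implementation would have to estimate the background from the data and account for its fluctuation over time, which this sketch does not attempt.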