A Cloud-based Machine Learning Pipeline for the Efficient Extraction of Insights from Customer Reviews
The efficiency of natural language processing has improved dramatically with
the advent of machine learning models, particularly neural network-based
solutions. However, some tasks are still challenging, especially when
considering specific domains. In this paper, we present a cloud-based system
that can extract insights from customer reviews using machine learning methods
integrated into a pipeline. For topic modeling, our composite model uses
transformer-based neural networks designed for natural language processing,
vector embedding-based keyword extraction, and clustering. The elements of our
model have been integrated and further developed to better meet the
requirements of efficient information extraction, topic modeling of the
extracted information, and user needs. Furthermore, our system can achieve
better results than existing topic modeling and keyword extraction solutions
for this task. Our approach is validated and compared with other
state-of-the-art methods using publicly available benchmark datasets.
Progress and Poverty—1965 Version
The first hard X-ray laser, the Linac Coherent Light Source (LCLS), produces 120 shots per second. Particles injected into the X-ray beam are hit randomly and in unknown orientations by the extremely intense X-ray pulses. The femtosecond-duration pulses diffract from the sample before the particle structure is significantly changed, even though the sample is ultimately destroyed by the deposited X-ray energy. Single-particle X-ray diffraction experiments generate data at the FEL repetition rate, resulting in more than 400,000 detector readouts per hour. The data stream during an experiment contains blank frames mixed with hits on single particles, clusters, and contaminants. The diffraction signal is generally weak and is superimposed on a low but continually fluctuating background signal originating from photon noise in the beamline and electronic noise from the detector. Meanwhile, explosion of the sample creates fragments with a characteristic signature. Here, we describe methods based on rapid image analysis combined with ion time-of-flight (ToF) spectroscopy of the fragments to achieve efficient, automated, and unsupervised sorting of diffraction data. The studies described here form a basis for the development of real-time frame rejection methods, e.g. for the European XFEL, which is expected to produce 100 million pulses per hour. © 2014 Optical Society of America
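The data rates quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures stated above (120 shots per second, 100 million pulses per hour); the variable names are illustrative, and the per-pulse time is a simple hourly average, not a description of either machine's actual pulse structure.

```python
# Figures taken from the abstract; names are illustrative only.
LCLS_SHOTS_PER_SECOND = 120
EUXFEL_PULSES_PER_HOUR = 100_000_000
SECONDS_PER_HOUR = 3600

# Detector readouts per hour at the LCLS repetition rate.
lcls_readouts_per_hour = LCLS_SHOTS_PER_SECOND * SECONDS_PER_HOUR

# Average pulse rate at the European XFEL, from the hourly figure.
euxfel_pulses_per_second = EUXFEL_PULSES_PER_HOUR / SECONDS_PER_HOUR

# Average time available per pulse if every frame had to be judged in real time.
average_time_per_pulse_us = SECONDS_PER_HOUR / EUXFEL_PULSES_PER_HOUR * 1e6

# How many times more frames per hour a rejection method would have to handle.
rate_ratio = EUXFEL_PULSES_PER_HOUR / lcls_readouts_per_hour

print(f"LCLS readouts per hour: {lcls_readouts_per_hour}")
print(f"European XFEL pulses per second (average): {euxfel_pulses_per_second:.0f}")
print(f"Average time per pulse: {average_time_per_pulse_us:.0f} microseconds")
print(f"Rate ratio: {rate_ratio:.0f}x")
```

The LCLS figure comes out at 432,000 readouts per hour, consistent with the "more than 400,000" stated in the abstract.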
Auger Electron Cascades in Water and Ice
Secondary electron cascades can induce significant ionisation in condensed
matter due to electron-atom collisions. This is of interest in the context of
diffraction and imaging using X-rays, where radiation damage is the main
limiting factor for achieving high resolution data. Here we present new results
on electron-induced damage on liquid water and ice, from the simulation of
Auger electron cascades. We have compared our theoretical estimations to the
available experimental data on elastic and inelastic electron-molecule
interactions for water and found the theoretical results for elastic cross
sections to be in very good agreement with experiment. As a result of the
cascade we find that the average number of secondary electrons after 100 fs in
ice is about 25, slightly higher than in water, where it is about 20. The
difference in damage between ice and water is discussed in the context of
sample handling for biomolecular systems.
Comment: 19 pages, 8 figures. Includes slight corrections to the version
submitted for publication.
Structural variability and the incoherent addition of scattered intensities in single-particle diffraction
X-ray lasers may allow structural studies on single particles and biomolecules without crystalline periodicity in the samples. We examine here the effect of sample dynamics as a source of structural heterogeneity on the resolution of the reconstructed image of a small protein molecule. Structures from molecular-dynamics simulations of lysozyme were sampled and aligned. These structures were then used to calculate diffraction patterns corresponding to different dynamic states. The patterns were incoherently summed and the resulting data set was phased using the oversampling method. Reconstructed images of hydrated and dehydrated lysozyme gave resolutions of 3.7 Å and 7.6 Å, respectively. These are significantly worse than the root-mean-square deviation of the hydrated (2.7 Å for all atoms and 1.45 Å for Cα positions) or dehydrated (3.7 Å for all atoms and 2.5 Å for Cα positions) structures. The noise introduced by structural dynamics and incoherent addition of dissimilar structures restricts the maximum resolution to be expected from direct image reconstruction of dynamic systems. A way of potentially reducing this effect is by grouping dynamic structures into distinct structural substates and solving them separately.
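To illustrate what "incoherently summed" means here, the following is a minimal toy sketch, not the authors' simulation. It assumes identical point scatterers on a one-dimensional line, so the scattered amplitude of one conformation is a sum of phase factors exp(i q x). Incoherent addition sums the intensities |F_k(q)|² of the individual conformations, whereas coherent addition would sum the amplitudes first. All function names and the example coordinates are hypothetical.

```python
import cmath

def amplitude(positions, q):
    """Scattered amplitude of unit point scatterers at the given 1D positions."""
    return sum(cmath.exp(1j * q * x) for x in positions)

def incoherent_intensity(conformations, q):
    """Sum of intensities: each conformation is measured in a separate shot."""
    return sum(abs(amplitude(c, q)) ** 2 for c in conformations)

def coherent_intensity(conformations, q):
    """Intensity of summed amplitudes, shown only for contrast."""
    return abs(sum(amplitude(c, q) for c in conformations)) ** 2

# Two hypothetical conformations of a two-atom "molecule" (arbitrary units).
conformations = [[0.0, 1.0], [0.0, 1.2]]

for q in (0.0, 1.0, 2.0, 3.0):
    print(q, incoherent_intensity(conformations, q),
          coherent_intensity(conformations, q))
```

Because each conformation contributes its own intensity pattern, features that differ between conformations are averaged in the summed data, which is the blurring effect the abstract attributes to structural dynamics.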
A statistical approach to detect protein complexes at X-ray free electron laser facilities
The Flash X-ray Imaging (FXI) technique, under development at X-ray free electron lasers (XFEL), aims to achieve structure determination based on diffraction from individual macromolecular complexes. We report an FXI study on the first protein complex, RNA polymerase II, ever injected at an XFEL. A successful 3D reconstruction requires a high number of observations of the sample in various orientations. The measured diffraction signal for many shots can be comparable to background. Here we present a robust and highly sensitive hit-identification method based on automated modeling of beamline background through photon statistics. It can operate at a controlled false-positive hit rate of 3 × 10⁻⁵. We demonstrate its power in determining particle hits and validate our findings against an independent hit-identification approach based on ion time-of-flight spectra. We also validate the advantages of our method over simpler hit-identification schemes via tests on other samples and using computer simulations, showing a doubled hit-identification power.
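As a rough illustration of how a false-positive rate can be controlled through photon statistics, the sketch below assumes the simplest possible background model: the total photon count in a blank frame follows a single Poisson distribution with a known mean. This is an assumption made for the example and is not the background model of the paper, which the abstract does not specify. The function names are hypothetical.

```python
import math

def poisson_tail(mean, n):
    """P(N >= n) for N ~ Poisson(mean); suitable for modest means only."""
    term = math.exp(-mean)  # P(N = 0)
    cdf = 0.0
    for k in range(n):
        cdf += term               # add P(N = k)
        term *= mean / (k + 1)    # advance to P(N = k + 1)
    return max(0.0, 1.0 - cdf)

def hit_threshold(mean, false_positive_rate, max_count=10_000):
    """Smallest count n such that a blank frame reaches n with probability
    at most false_positive_rate under the assumed Poisson background."""
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    for n in range(max_count + 1):
        if poisson_tail(mean, n) <= false_positive_rate:
            return n
    raise RuntimeError("no threshold found below max_count")

# Hypothetical background of 1 photon per frame on average, with the
# false-positive rate quoted in the abstract.
threshold = hit_threshold(mean=1.0, false_positive_rate=3e-5)
print(threshold)
```

A frame would then be flagged as a hit when its photon count reaches the threshold. Under this toy model, lowering the accepted false-positive rate raises the threshold and therefore misses more weak hits, which is the trade-off a more sensitive background model aims to improve.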