The impact of different negative training data on regulatory sequence predictions
Regulatory regions, like promoters and enhancers, cover an estimated 5-15% of the human genome. Changes to these sequences are thought to underlie much of human phenotypic variation and a substantial proportion of genetic causes of disease. However, our understanding of how their function is encoded in DNA is still very limited. Machine learning and deep learning methods can shed light on this encoding; gapped k-mer support vector machines (gkm-SVMs) and convolutional neural networks (CNNs) are commonly trained on putative regulatory sequences. Here, we investigate the impact of negative sequence selection on model performance. Training gkm-SVM and CNN models on open chromatin data and corresponding negative training datasets, we compare both learners and two approaches for generating negative training data. Negative sets use either genomic background sequences or shuffles of the positive sequences. Model performance was evaluated on three different tasks: predicting elements active in a cell type, predicting cell-type-specific elements, and predicting elements' relative activity as measured in independent experimental data. Our results indicate strong effects of the negative training data, with genomic backgrounds giving the best overall results. Specifically, models trained on highly shuffled sequences perform worse on the complex tasks of tissue-specific and quantitative activity prediction and seem to learn features of artificial sequences rather than regulatory activity. Further, we observe that insufficient matching of genomic background sequences results in model biases. While CNNs matched and exceeded the performance of gkm-SVMs for larger training datasets, gkm-SVMs gave robust and the best results for typical training dataset sizes without the need for hyperparameter optimization.
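The two negative-set strategies compared above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the shuffle is a plain mononucleotide (composition-preserving) shuffle, whereas the study's shuffles may preserve higher-order k-mer content, and the genomic background candidates are assumed to be pre-extracted genome windows matched to the positives on GC content only. The names `shuffled_negatives`, `gc_matched_background`, and `gc_content` are hypothetical.

```python
import random


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty string)."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def shuffled_negatives(positives, n_per_positive=1, seed=0):
    """Negatives made by shuffling each positive sequence.

    ASSUMPTION: a simple mononucleotide shuffle, which keeps base composition
    (and hence GC content) but destroys motifs and higher-order k-mer structure.
    """
    rng = random.Random(seed)
    negatives = []
    for seq in positives:
        for _ in range(n_per_positive):
            letters = list(seq)
            rng.shuffle(letters)  # in-place Fisher-Yates shuffle
            negatives.append("".join(letters))
    return negatives


def gc_matched_background(positives, candidates):
    """Greedily pick, without replacement, the candidate window whose GC content
    is closest to each positive. ASSUMPTION: candidates are pre-extracted
    genomic background windows; matching uses GC content only."""
    if len(candidates) < len(positives):
        raise ValueError("need at least one candidate per positive")
    pool = list(candidates)
    chosen = []
    for seq in positives:
        target = gc_content(seq)
        best = min(range(len(pool)), key=lambda i: abs(gc_content(pool[i]) - target))
        chosen.append(pool.pop(best))
    return chosen
```

Matching only on GC content is a deliberately simple choice here; other sequence properties (for example repeat content or distance to genes) could be matched with the same greedy pattern.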
CADD-Splice—improving genome-wide variant effect prediction using deep learning-derived splice scores
Background: Splicing of genomic exons into mRNAs is a critical prerequisite for the accurate synthesis of human proteins. Genetic variants impacting splicing underlie a substantial proportion of genetic disease, but are challenging to identify beyond those occurring at donor and acceptor dinucleotides. To address this, various methods aim to predict variant effects on splicing. Recently, deep neural networks (DNNs) have been shown to achieve better results in predicting splice variants than other strategies.
Methods: It has been unclear how best to integrate such process-specific scores into genome-wide variant effect predictors. Here, we use a recently published experimental data set to compare several machine learning methods that score variant effects on splicing. We integrate the best of those approaches into general variant effect prediction models and observe the effect on classification of known pathogenic variants.
Results: We integrate two specialized splicing scores into CADD (Combined Annotation Dependent Depletion; cadd.gs.washington.edu), a widely used tool for genome-wide variant effect prediction that we previously developed to weight and integrate diverse collections of genomic annotations. With this new model, CADD-Splice, we show that inclusion of splicing DNN effect scores substantially improves predictions across multiple variant categories, without compromising overall performance.
Conclusions: While splice effect scores show superior performance on splice variants, specialized predictors cannot compete with other variant scores in general variant interpretation, as the latter account for nonsense and missense effects that do not alter splicing. Although only shown here for splice scores, we believe that the applied approach will generalize to other specific molecular processes, providing a path for the further improvement of genome-wide variant effect prediction.
Chiral photoelectron angular distributions from ionization of achiral atomic and molecular species
We show that the combination of two achiral components (an atomic or molecular
target plus a circularly polarized photon) can yield chirally structured
photoelectron angular distributions. For photoionization of CO, the angular
distribution of carbon K-shell photoelectrons is chiral when the molecular axis
is neither perpendicular nor (anti-)parallel to the light propagation axis. In
photo-double-ionization of He, the angular distribution of one electron is chiral if
the other electron is emitted along a direction analogous to the molecular axis in the
former case and if the two electrons are distinguishable by their energies. In both scenarios, the
circularly polarized photon defines a plane with a sense of rotation and an
additional axis is defined by the CO molecule or one electron. This is
sufficient to establish an unambiguous coordinate frame of well-defined
handedness. To produce a chirally structured electron angular distribution,
such a coordinate frame is necessary, but not sufficient. We show that
additional electron-electron interaction or scattering processes are needed to
create the chiral angular distribution.
Kinematically complete experimental study of Compton scattering at helium atoms near the ionization threshold
Compton scattering is one of the fundamental interaction processes of light
with matter. Already upon its discovery [1] it was described as a billiard-type
collision of a photon kicking a quasi-free electron. With decreasing photon
energy, the maximum possible momentum transfer becomes so small that the
corresponding energy falls below the binding energy of the electron. Then
ionization by Compton scattering becomes an intriguing quantum phenomenon. Here
we report a kinematically complete experiment on Compton scattering at helium
atoms below that threshold. We determine the momentum correlations of the
electron, the recoiling ion, and the scattered photon in a coincidence
experiment and find that electrons are emitted not only in the direction of the
momentum transfer but also in a second peak in the backward
direction. This finding links Compton scattering to processes such as ionization by
ultrashort optical pulses [2], electron impact ionization [3,4], ion impact
ionization [5,6], and neutron scattering [7], where similar momentum patterns
occur. Comment: 7 pages, 4 figures
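The threshold argument above can be made quantitative with free-electron Compton kinematics. As a sketch, assuming the target electron is free and at rest (exactly the picture that breaks down for the bound He electron studied here), the largest energy a photon of energy E can transfer is the Compton edge T_max = 2E^2 / (m_e c^2 + 2E), reached for 180-degree backscattering. Setting T_max equal to the first ionization potential of He (24.587 eV) gives the photon energy below which a collision with a free electron at rest could not, by itself, supply the binding energy. The function and constant names are illustrative.

```python
import math

ELECTRON_REST_ENERGY_EV = 510_998.95  # m_e c^2 in eV
HELIUM_IONIZATION_EV = 24.587         # first ionization potential of He in eV


def compton_edge_ev(photon_ev: float) -> float:
    """Maximum kinetic energy a free electron at rest can receive from a photon
    of energy photon_ev (180-degree backscattering):
    T_max = 2 E^2 / (m_e c^2 + 2 E)."""
    return 2 * photon_ev**2 / (ELECTRON_REST_ENERGY_EV + 2 * photon_ev)


def compton_threshold_ev(binding_ev: float) -> float:
    """Photon energy at which the Compton edge equals binding_ev.

    Setting T_max = I gives 2E^2 - 2 I E - I m_e c^2 = 0; we return the
    positive root E = (I + sqrt(I^2 + 2 I m_e c^2)) / 2.
    """
    i = binding_ev
    return (i + math.sqrt(i * i + 2 * i * ELECTRON_REST_ENERGY_EV)) / 2


threshold = compton_threshold_ev(HELIUM_IONIZATION_EV)
print(f"free-electron Compton threshold for He: {threshold:.1f} eV")
```

With these constants the threshold comes out at roughly 2.5 keV; below it, ionization by Compton scattering relies on the momentum distribution of the bound electron rather than on a simple billiard-type collision.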
Observation of enhanced chiral asymmetries in the inner-shell photoionization of uniaxially oriented methyloxirane enantiomers
Most large molecules are chiral in their structure: they exist as two
enantiomers, which are mirror images of each other. Whereas the rovibronic
sublevels of the two enantiomers are almost identical, it turns out that the
photoelectric effect is sensitive to the absolute configuration of the ionized
enantiomer, an effect termed Photoelectron Circular Dichroism (PECD). Our
comprehensive study demonstrates that the origin of PECD lies in the
molecular-frame electron emission pattern, connecting PECD to other fundamental
photophysical effects such as the circular dichroism in angular distributions
(CDAD). Accordingly, orienting a chiral molecule in space enhances the PECD by
a factor of about 10.
Observation of Photoion Backward Emission in Photoionization of He and N2
We experimentally investigate the effects of the linear photon momentum on
the momentum distributions of photoions and photoelectrons generated in
one-photon ionization in the photon energy range from 300 eV to 40 keV.
Our results show that for each ionization event the photon momentum is imparted
onto the photoion, which is essentially the system's center of mass.
Nevertheless, the mean value of the ion momentum distribution along the light
propagation direction is backward-directed by 3/5 times the photon momentum.
These results experimentally confirm a 90-year-old prediction. Comment: 5 pages, 3 figures
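To put the photon momentum in perspective, a short conversion to atomic units helps: p = E/c, with E in hartree and c of about 137.036 a.u. The sketch below also applies a backward factor of 3/5 to obtain the mean ion momentum; that factor is an input assumption, consistent with the classic nonrelativistic prediction that the photoelectron receives on average 8/5 of the photon momentum (so momentum conservation leaves the ion with -3/5), and is not derived here. The function names are hypothetical.

```python
HARTREE_EV = 27.211386  # 1 hartree in eV
C_AU = 137.035999       # speed of light in atomic units


def photon_momentum_au(photon_ev: float) -> float:
    """Photon momentum p = E / c in atomic units (E converted to hartree)."""
    return (photon_ev / HARTREE_EV) / C_AU


def mean_ion_momentum_au(photon_ev: float, backward_fraction: float = 3 / 5) -> float:
    """Mean ion momentum along the light propagation axis (negative = backward).

    ASSUMPTION: backward_fraction = 3/5 is supplied as an input, not computed.
    """
    return -backward_fraction * photon_momentum_au(photon_ev)


for energy in (300.0, 40_000.0):
    print(f"{energy:>8.0f} eV: p_photon = {photon_momentum_au(energy):.4f} a.u., "
          f"<p_ion> = {mean_ion_momentum_au(energy):.4f} a.u.")
```

At 300 eV the photon momentum is below 0.1 a.u., while at 40 keV it exceeds 10 a.u., so the expected recoil shift grows by more than two orders of magnitude across the studied range.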
Revealing the two-electron cusp in the ground states of He and H2 via quasifree double photoionization
We report on kinematically complete measurements and ab initio nonperturbative calculations of double ionization of He and
H2 by a single 800 eV circularly polarized photon. We confirm the quasifree mechanism of photoionization for
H2 and show how it originates from the two-electron cusp in the ground state of a two-electron target. Our approach establishes a method for mapping electrons relative to each other and provides valuable insight into photoionization beyond the electric-dipole approximation. We acknowledge support by DFG and
BMBF.
Integration of multiple epigenomic marks improves prediction of variant impact in saturation mutagenesis reporter assay
The integrative analysis of high-throughput reporter assays, machine learning, and profiles of epigenomic chromatin state in a broad array of cells and tissues has the potential to significantly improve our understanding of noncoding regulatory element function and its contribution to human disease. Here, we report results from the CAGI 5 regulation saturation challenge, where participants were asked to predict the impact of nucleotide substitutions at every base pair within five disease-associated human enhancers and nine disease-associated promoters. A library of mutations covering all bases was generated by saturation mutagenesis, and altered activity was assessed in a massively parallel reporter assay (MPRA) in relevant cell lines. Reporter expression was measured relative to plasmid DNA to determine the impact of variants. The challenge was to predict the functional effects of variants on reporter expression. Comparative analysis of the full range of submitted prediction results identifies the most successful models of transcription factor binding sites, machine learning algorithms, and ways to choose among or incorporate diverse data types and cell types for training computational models. These results have the potential to improve the design of future studies on more diverse sets of regulatory elements and aid the interpretation of disease-associated genetic variation.
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/151884/1/humu23797_am.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/151884/2/humu23797.pd
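Measuring reporter expression relative to plasmid DNA is commonly summarized as a log ratio of RNA to DNA counts, with a variant's effect taken as the difference between the variant and reference ratios. The sketch below shows that calculation under stated assumptions: library-size normalization by total counts and a pseudocount of 1 are illustrative choices, not the challenge's documented processing. The function names are hypothetical.

```python
import math


def log2_activity(rna_count, dna_count, rna_total, dna_total, pseudocount=1.0):
    """log2 of (library-normalized RNA) / (library-normalized plasmid DNA).

    ASSUMPTION: normalization by total counts plus a pseudocount of 1.
    """
    rna = (rna_count + pseudocount) / rna_total
    dna = (dna_count + pseudocount) / dna_total
    return math.log2(rna / dna)


def variant_effect(ref_counts, alt_counts, rna_total, dna_total):
    """Effect of a substitution on the log2 scale: alt activity minus reference.

    ref_counts and alt_counts are (rna_count, dna_count) tuples.
    """
    ref = log2_activity(*ref_counts, rna_total, dna_total)
    alt = log2_activity(*alt_counts, rna_total, dna_total)
    return alt - ref


# Hypothetical counts in which the variant halves expression relative to the reference.
print(variant_effect((199, 99), (99, 99), rna_total=1e6, dna_total=1e6))
```

Negative values indicate reduced reporter expression, positive values increased expression; a prediction submitted to such a challenge would be compared against these per-variant effects.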
High-Energy Molecular-Frame Photoelectron Angular Distributions: A Molecular Bond-Length Ruler
We present an experimental and theoretical study of core-level ionization of
small hetero- and homo-nuclear molecules employing circularly polarized light
and address molecular-frame photoelectron angular distributions in the light's
polarization plane (CP-MFPADs). We find that the main forward-scattering peaks
of CP-MFPADs are slightly tilted with respect to the molecular axis. We show
that this tilt angle can be directly connected to the molecular bond length by
a simple, universal formula. The extraction of the bond length becomes more
accurate as the photoelectron energy is increased. We apply the derived formula
to several examples of CP-MFPADs of C 1s and O 1s photoelectrons of CO, which
have been measured experimentally or obtained by means of ab initio modeling.
The photoelectron kinetic energies range from 70 to 1000 eV, and the extracted
bond lengths agree well with the known bond length of the CO molecule in its
ground state. In addition, we discuss the influence of the back-scattering
contribution that is superimposed on the analyzed forward-scattering peak in
the case of homo-nuclear diatomic molecules such as N
- …