63 research outputs found

    The impact of different negative training data on regulatory sequence predictions

    Regulatory regions, like promoters and enhancers, cover an estimated 5-15% of the human genome. Changes to these sequences are thought to underlie much of human phenotypic variation and a substantial proportion of genetic causes of disease. However, our understanding of their functional encoding in DNA is still very limited. Applying machine or deep learning methods can shed light on this encoding, and gapped k-mer support vector machines (gkm-SVMs) or convolutional neural networks (CNNs) are commonly trained on putative regulatory sequences. Here, we investigate the impact of negative sequence selection on model performance. By training gkm-SVM and CNN models on open chromatin data and corresponding negative training datasets, we compare both learners and two approaches for negative training data. Negative sets use either genomic background sequences or sequence shuffles of the positive sequences. Model performance was evaluated on three different tasks: predicting elements active in a cell-type, predicting cell-type specific elements, and predicting elements' relative activity as measured from independent experimental data. Our results indicate strong effects of the negative training data, with genomic backgrounds showing overall best results. Specifically, models trained on highly shuffled sequences perform worse on the complex tasks of tissue-specific activity and quantitative activity prediction, and seem to learn features of artificial sequences rather than regulatory activity. Further, we observe that insufficient matching of genomic background sequences results in model biases. While CNNs achieved and exceeded the performance of gkm-SVMs for larger training datasets, gkm-SVMs gave robust and best results for typical training dataset sizes without the need for hyperparameter optimization.

    CADD-Splice—improving genome-wide variant effect prediction using deep learning-derived splice scores

    Background: Splicing of genomic exons into mRNAs is a critical prerequisite for the accurate synthesis of human proteins. Genetic variants impacting splicing underlie a substantial proportion of genetic disease, but are challenging to identify beyond those occurring at donor and acceptor dinucleotides. To address this, various methods aim to predict variant effects on splicing. Recently, deep neural networks (DNNs) have been shown to achieve better results in predicting splice variants than other strategies. Methods: It has been unclear how best to integrate such process-specific scores into genome-wide variant effect predictors. Here, we use a recently published experimental data set to compare several machine learning methods that score variant effects on splicing. We integrate the best of those approaches into general variant effect prediction models and observe the effect on classification of known pathogenic variants. Results: We integrate two specialized splicing scores into CADD (Combined Annotation Dependent Depletion; cadd.gs.washington.edu), a widely used tool for genome-wide variant effect prediction that we previously developed to weight and integrate diverse collections of genomic annotations. With this new model, CADD-Splice, we show that inclusion of splicing DNN effect scores substantially improves predictions across multiple variant categories, without compromising overall performance. Conclusions: While splice effect scores show superior performance on splice variants, specialized predictors cannot compete with other variant scores in general variant interpretation, as the latter account for nonsense and missense effects that do not alter splicing. Although only shown here for splice scores, we believe that the applied approach will generalize to other specific molecular processes, providing a path for the further improvement of genome-wide variant effect prediction.

    Chiral photoelectron angular distributions from ionization of achiral atomic and molecular species

    We show that the combination of two achiral components - atomic or molecular target plus a circularly polarized photon - can yield chirally structured photoelectron angular distributions. For photoionization of CO, the angular distribution of carbon K-shell photoelectrons is chiral when the molecular axis is neither perpendicular nor (anti-)parallel to the light propagation axis. In photo-double-ionization of He, the distribution of one electron is chiral if the other electron is oriented like the molecular axis in the former case and if the electrons are distinguishable by their energy. In both scenarios, the circularly polarized photon defines a plane with a sense of rotation and an additional axis is defined by the CO molecule or one electron. This is sufficient to establish an unambiguous coordinate frame of well-defined handedness. To produce a chirally structured electron angular distribution, such a coordinate frame is necessary, but not sufficient. We show that additional electron-electron interaction or scattering processes are needed to create the chiral angular distribution.

    Kinematically complete experimental study of Compton scattering at helium atoms near the ionization threshold

    Compton scattering is one of the fundamental interaction processes of light with matter. Already upon its discovery [1] it was described as a billiard-type collision of a photon kicking a quasi-free electron. With decreasing photon energy, the maximum possible momentum transfer becomes so small that the corresponding energy falls below the binding energy of the electron. Then ionization by Compton scattering becomes an intriguing quantum phenomenon. Here we report a kinematically complete experiment on Compton scattering at helium atoms below that threshold. We determine the momentum correlations of the electron, the recoiling ion, and the scattered photon in a coincidence experiment, finding that electrons are not only emitted in the direction of the momentum transfer, but that there is a second peak of ejection in the backward direction. This finding links Compton scattering to processes such as ionization by ultrashort optical pulses [2], electron impact ionization [3,4], ion impact ionization [5,6], and neutron scattering [7], where similar momentum patterns occur.

    Observation of enhanced chiral asymmetries in the inner-shell photoionization of uniaxially oriented methyloxirane enantiomers

    Most large molecules are chiral in their structure: they exist as two enantiomers, which are mirror images of each other. Whereas the rovibronic sublevels of two enantiomers are almost identical, it turns out that the photoelectric effect is sensitive to the absolute configuration of the ionized enantiomer - an effect termed Photoelectron Circular Dichroism (PECD). Our comprehensive study demonstrates that the origin of PECD can be found in the molecular frame electron emission pattern, connecting PECD to other fundamental photophysical effects such as the circular dichroism in angular distributions (CDAD). Accordingly, orienting a chiral molecule in space enhances the PECD by a factor of about 10.

    Observation of Photoion Backward Emission in Photoionization of He and N2

    We experimentally investigate the effects of the linear photon momentum on the momentum distributions of photoions and photoelectrons generated in one-photon ionization in an energy range of 300 eV $\leq E_\gamma \leq$ 40 keV. Our results show that for each ionization event the photon momentum is imparted onto the photoion, which is essentially the system's center of mass. Nevertheless, the mean value of the ion momentum distribution along the light propagation direction is backward-directed by $-3/5$ times the photon momentum. These results experimentally confirm a 90-year-old prediction.

    Revealing the two-electron cusp in the ground states of He and H2 via quasifree double photoionization

    We report on kinematically complete measurements and ab initio nonperturbative calculations of double ionization of He and H2 by a single 800 eV circularly polarized photon. We confirm the quasifree mechanism of photoionization for H2 and show how it originates from the two-electron cusp in the ground state of a two-electron target. Our approach establishes a method for mapping electrons relative to each other and provides valuable insight into photoionization beyond the electric-dipole approximation. We acknowledge support by DFG and BMBF.

    High-Energy Molecular-Frame Photoelectron Angular Distributions: A Molecular Bond-Length Ruler

    We present an experimental and theoretical study of core-level ionization of small hetero- and homo-nuclear molecules employing circularly polarized light and address molecular-frame photoelectron angular distributions in the light's polarization plane (CP-MFPADs). We find that the main forward-scattering peaks of CP-MFPADs are slightly tilted with respect to the molecular axis. We show that this tilt angle can be directly connected to the molecular bond length by a simple, universal formula. The extraction of the bond length becomes more accurate as the photoelectron energy is increased. We apply the derived formula to several examples of CP-MFPADs of C 1s and O 1s photoelectrons of CO, which have been measured experimentally or obtained by means of ab initio modeling. The photoelectron kinetic energies range from 70 to 1000 eV and the extracted bond lengths agree well with the known bond length of the CO molecule in its ground state. In addition, we discuss the influence of the back-scattering contribution that is superimposed over the analyzed forward-scattering peak in the case of homo-nuclear diatomic molecules such as N2.