638 research outputs found

    Modeling reactivity to biological macromolecules with a deep multitask network

    Most small-molecule drug candidates fail before entering the market, frequently because of unexpected toxicity. Often, toxicity is detected only late in drug development, because many types of toxicities, especially idiosyncratic adverse drug reactions (IADRs), are particularly hard to predict and detect. Moreover, drug-induced liver injury (DILI) is the most frequent reason drugs are withdrawn from the market and causes 50% of acute liver failure cases in the United States. A common mechanism often underlies many types of drug toxicities, including both DILI and IADRs. Drugs are bioactivated by drug-metabolizing enzymes into reactive metabolites, which then conjugate to sites in proteins or DNA to form adducts. DNA adducts are often mutagenic and may alter the reading and copying of genes and their regulatory elements, causing gene dysregulation and even triggering cancer. Similarly, protein adducts can disrupt their normal biological functions and induce harmful immune responses. Unfortunately, reactive metabolites are not reliably detected by experiments, and it is also expensive to test drug candidates for potential to form DNA or protein adducts during the early stages of drug development. In contrast, computational methods have the potential to quickly screen for covalent binding potential, thereby flagging problematic molecules and reducing the total number of necessary experiments. Here, we train a deep convolution neural network, the XenoSite reactivity model, using literature data to accurately predict both sites and probability of reactivity for molecules with glutathione, cyanide, protein, and DNA. On the site level, cross-validated predictions had area under the curve (AUC) performances of 89.8% for DNA and 94.4% for protein. Furthermore, the model separated molecules electrophilically reactive with DNA and protein from nonreactive molecules with cross-validated AUC performances of 78.7% and 79.8%, respectively.
At both the site and molecule levels, the model significantly outperformed reactivity indices derived from quantum simulations that are reported in the literature. Moreover, we developed and applied a selectivity score to assess preferential reactions with the macromolecules as opposed to the common screening traps. For the entire data set of 2803 molecules, this approach yielded totals of 257 (9.2%) and 227 (8.1%) molecules predicted to be reactive only with DNA and protein, respectively, and hence those that would be missed by standard reactivity screening experiments. Site of reactivity data is an underutilized resource that can be used not only to predict whether molecules are reactive, but also to show where they might be modified to reduce toxicity while retaining efficacy. The XenoSite reactivity model is available at http://swami.wustl.edu/xenosite/p/reactivity

    Simple data-driven context-sensitive lemmatization

    Lemmatization for languages with rich inflectional morphology is one of the basic, indispensable steps in a language processing pipeline. In this paper we present a simple data-driven context-sensitive approach to lemmatizing word forms in running text. We treat lemmatization as a classification task for Machine Learning, and automatically induce class labels. We achieve this by computing a Shortest Edit Script (SES) between reversed input and output strings. An SES describes the transformations that have to be applied to the input string (word form) in order to convert it to the output string (lemma). Our approach shows competitive performance on a range of typologically different languages.
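    The idea of using an edit script over reversed strings as a class label can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`shortest_edit_script`, `apply_edit_script`, `lemma_class`, `lemmatize`) and the `(operation, position, character)` encoding are assumptions made for the example, and the script is derived from a standard longest-common-subsequence table. Reversing the strings means positions are counted from the end of the word, so one label can cover many suffixing word forms.

```python
def shortest_edit_script(src, dst):
    """Return a tuple of ("I" | "D", position_in_src, char) edits turning src into dst."""
    n, m = len(src), len(dst)
    # lcs[i][j] = length of the longest common subsequence of src[i:] and dst[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if src[i] == dst[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    script, i, j = [], 0, 0
    while i < n or j < m:
        if i < n and j < m and src[i] == dst[j]:
            i, j = i + 1, j + 1                    # characters agree: keep, no edit
        elif j < m and (i == n or lcs[i][j + 1] >= lcs[i + 1][j]):
            script.append(("I", i, dst[j]))        # insert dst[j] before src[i]
            j += 1
        else:
            script.append(("D", i, src[i]))        # delete src[i]
            i += 1
    return tuple(script)


def apply_edit_script(script, src):
    """Apply an edit script to src; raise ValueError if a deletion does not match."""
    inserts, deletes = {}, {}
    for op, pos, ch in script:
        if op == "I":
            inserts.setdefault(pos, []).append(ch)
        else:
            deletes[pos] = ch
    out = []
    for i in range(len(src) + 1):
        out.extend(inserts.get(i, []))             # insertions come before src[i]
        if i < len(src):
            if i in deletes:
                if src[i] != deletes[i]:
                    raise ValueError("edit script does not fit this word form")
            else:
                out.append(src[i])
    return "".join(out)


def lemma_class(form, lemma):
    """Class label for a (word form, lemma) pair: the SES between the reversed strings."""
    return shortest_edit_script(form[::-1], lemma[::-1])


def lemmatize(form, label):
    """Apply a class label to a word form by editing its reversed string."""
    return apply_edit_script(label, form[::-1])[::-1]


label = lemma_class("walking", "walk")   # three deletions at the reversed word's start
print(lemmatize("talking", label))       # the same label applied to another form
```

    In a full system a classifier would predict such a label for each word form in context; the sketch only shows how labels are induced from training pairs and then applied.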

    On new maximal supergravity and its BPS domain-walls

    We revise the SU(3)-invariant sector of $\mathcal{N}=8$ supergravity with dyonic SO(8) gaugings. By using the embedding tensor formalism, analytic expressions for the scalar potential, superpotential(s) and fermion mass terms are obtained as a function of the electromagnetic phase $\omega$ and the scalars in the theory. Equipped with these results, we explore non-supersymmetric AdS critical points at $\omega \neq 0$ for which perturbative stability could not be analysed before. The $\omega$-dependent superpotential is then used to derive first-order flow equations and obtain new BPS domain-wall solutions at $\omega \neq 0$. We numerically look at steepest-descent paths motivated by the (conjectured) RG flows. Comment: 40 pages (30 pages + appendices), 3 tables, 6 figures. v2: References added and discussion in section 4.2 clarified. v3: References added, published version. v4: Fixed typo

    Open-source resources and standards for Arabic word structure analysis: Fine grained morphological analysis of Arabic text corpora

    Morphological analyzers are preprocessors for text analysis. Many Text Analytics applications need them to perform their tasks. The aim of this thesis is to develop standards, tools and resources that widen the scope of Arabic word structure analysis, particularly morphological analysis, to process Arabic text corpora of different domains, formats and genres, of both vowelized and non-vowelized text. We want to morphologically tag our Arabic Corpus, but evaluation of existing morphological analyzers has highlighted shortcomings and shown that more research is required. Tag-assignment is significantly more complex for Arabic than for many languages. The morphological analyzer should add the appropriate linguistic information to each part or morpheme of the word (proclitic, prefix, stem, suffix and enclitic); in effect, instead of a tag for a word, we need a subtag for each part. Very fine-grained distinctions may cause problems for automatic morphosyntactic analysis – particularly for probabilistic taggers, which require training data – if some words can change grammatical tag depending on function and context; on the other hand, fine-grained distinctions may actually help to disambiguate other words in the local context. The SALMA – Tagger is a fine-grained morphological analyzer which depends mainly on linguistic information extracted from traditional Arabic grammar books and on a prior-knowledge broad-coverage lexical resource, the SALMA – ABCLexicon. More fine-grained tag sets may be more appropriate for some tasks. The SALMA – Tag Set is a theory standard for encoding, which captures long-established traditional fine-grained morphological features of Arabic, in a notation format intended to be compact yet transparent. The SALMA – Tagger has been used to lemmatize the 176-million-word Arabic Internet Corpus.
It has been proposed as a language-engineering toolkit for Arabic lexicography and for phonetically annotating the Qur’an with syllable and primary stress information, as well as for fine-grained morphological tagging.

    Advances in structure elucidation of small molecules using mass spectrometry

    The structural elucidation of small molecules using mass spectrometry plays an important role in modern life sciences and bioanalytical approaches. This review covers different soft and hard ionization techniques and figures of merit for modern mass spectrometers, such as mass resolving power, mass accuracy, isotopic abundance accuracy, accurate mass multiple-stage MS(n) capability, as well as hybrid mass spectrometric and orthogonal chromatographic approaches. The latter part discusses mass spectral data handling strategies, which include background and noise subtraction, adduct formation and detection, charge state determination, accurate mass measurements, elemental composition determinations, and complex data-dependent setups with ion maps and ion trees. The importance of mass spectral library search algorithms for tandem mass spectra and multiple-stage MS(n) mass spectra, as well as of mass spectral tree libraries that combine multiple-stage mass spectra, is outlined. The subsequent chapter discusses mass spectral fragmentation pathways, biotransformation reactions and drug metabolism studies, the mass spectral simulation and generation of in silico mass spectra, expert systems for mass spectral interpretation, and the use of computational chemistry to explain gas-phase phenomena. A single chapter discusses data handling for hyphenated approaches including mass spectral deconvolution for clean mass spectra, cheminformatics approaches and structure retention relationships, and retention index predictions for gas and liquid chromatography. The last section reviews the current state of electronic data sharing of mass spectra and discusses the importance of software development for the advancement of structure elucidation of small molecules.

    Acoustic seafloor classification using the Weyl transform of multibeam echosounder backscatter mosaic

    The use of multibeam echosounder systems (MBES) for detailed seafloor mapping is increasing at a fast pace. Due to their design, enabling continuous high-density measurements and the coregistration of the seafloor’s depth and reflectivity, MBES have become a fundamental instrument in the advancing field of acoustic seafloor classification (ASC). With these data becoming available, recent seafloor mapping research focuses on the interpretation of the hydroacoustic data and automated predictive modeling of seafloor composition. While no methodological consensus exists in the scientific community on which seafloor sediment classification algorithm and routine to use, it is expected that progress will occur through the refinement of each stage of the ASC pipeline, from data acquisition to the modeling phase. This research focuses on the feature extraction stage, wherein the spatial variables used for the classification are, in this case, derived from the MBES backscatter data. This contribution explored the sediment classification potential of a textural feature based on the recently introduced Weyl transform of 300 kHz MBES backscatter imagery acquired over a nearshore study site in Belgian Waters. The goodness of the Weyl transform textural feature for seafloor sediment classification was assessed in terms of cluster separation of Folk’s sedimentological categories (4-class scheme). Class separation potential was quantified at multiple spatial scales by cluster silhouette coefficients. Weyl features derived from MBES backscatter data were found to exhibit superior thematic class separation compared to other well-established textural features, namely: (1) First-order Statistics, (2) Gray Level Co-occurrence Matrices (GLCM), (3) Wavelet Transform and (4) Local Binary Pattern (LBP).
Finally, by employing a Random Forest (RF) categorical classifier, the value of the proposed textural feature for seafloor sediment mapping was confirmed in terms of global and by-class classification accuracies, highest for models based on the backscatter Weyl features. Further tests on different backscatter datasets and sediment classification schemes are required to further elucidate the use of the Weyl transform of MBES backscatter imagery in the context of seafloor mapping
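    As a rough illustration of how a silhouette coefficient measures class separation, the sketch below computes per-sample silhouette values from the standard definition, s = (b - a) / max(a, b), where a is the mean distance to the other samples of the same class and b is the smallest mean distance to the samples of any other class. The function name `silhouette_values` and the toy two-dimensional feature vectors are assumptions for the example; they are not the Weyl features or the data used in the study.

```python
import math


def silhouette_values(points, labels):
    """Per-sample silhouette values for points grouped by labels (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two classes")
    values = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            values.append(0.0)               # convention for single-member classes
            continue
        # a: mean distance to the other members of the same class
        # (the point's zero distance to itself adds nothing to the sum)
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other class
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


# Hypothetical feature vectors: two tight, well-separated groups.
features = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_values(features, ["sand", "sand", "mud", "mud"])
bad = silhouette_values(features, ["sand", "mud", "sand", "mud"])
print(sum(good) / len(good), sum(bad) / len(bad))
```

    Values near 1 indicate that samples sit well inside their own class, while values near or below 0 indicate overlapping classes, so a higher mean silhouette for one feature set than another suggests better class separation.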