261 research outputs found

    Verification of high-level transformations with inductive refinement types

    High-level transformation languages like Rascal include expressive features for manipulating large abstract syntax trees: first-class traversals, expressive pattern matching, backtracking and generalized iterators. We present the design and implementation of an abstract interpretation tool, Rabit, for verifying inductive type and shape properties for transformations written in such languages. We describe how to perform abstract interpretation based on operational semantics, specifically focusing on the challenges arising when analyzing the expressive traversals and pattern matching. Finally, we evaluate Rabit on a series of transformations (normalization, desugaring, refactoring, code generators, type inference, etc.) showing that we can effectively verify stated properties. CCS Concepts: • Software and its engineering → General programming languages; • Social and professional topics → History of programming languages

    Isolation, identification, and complete genome sequence of a bovine adenovirus type 3 from cattle in China

    Background: Bovine adenovirus type 3 (BAV-3) belongs to the Mastadenovirus genus of the family Adenoviridae and is involved in respiratory and enteric infections of calves. The isolation of BAV-3 had not been reported in China prior to this study. In 2009, many cattle showed clinical signs similar to those of BAV-3 infection, and a virus strain showing cytopathic effect in Madin-Darby bovine kidney cells was isolated from a bovine nasal swab collected from feedlot cattle in Heilongjiang Province, China. The isolate was confirmed as a bovine adenovirus type 3 by PCR and immunofluorescence assay, and named HLJ0955. So far, only the complete genome sequence of the prototype BAV-3 strain WBR-1 has been reported. In order to further characterize the Chinese isolate HLJ0955, the complete genome sequence of HLJ0955 was determined. Results: The genome of the Chinese isolate HLJ0955 is 34,132 nucleotides in length with a G+C content of 53.6%. The coding sequences for gene regions of the HLJ0955 isolate were similar to those of the prototype BAV-3 strain WBR-1, with 80.0-98.6% nucleotide and 87.5-98.8% amino acid identities. The genome of the HLJ0955 strain contains 16 regions and four deletions in inverted terminal repeats, E1B region and E4 region, respectively. Phylogenetic analyses based on the complete genome and on the DNA binding protein gene were performed with other adenoviruses, and the results showed that the HLJ0955 isolate belonged to BAV-3 and clustered within the Mastadenovirus genus of the family Adenoviridae. Conclusions: This is the first study to report the isolation and molecular characterization of BAV-3 from cattle in China. The phylogenetic analysis performed in this study supported the use of the DNA binding protein gene of adenovirus as an appropriate subgenomic target for the molecular classification of different genera of the family Adenoviridae. Meanwhile, large-scale pathogen and serological epidemiological investigations of BAV-3 infection might be carried out in cattle in China. This report will be a good beginning for further studies on BAV-3 in China.
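The G+C content quoted in this abstract is the share of guanine and cytosine bases in the genome. As a minimal sketch only (the function name `gc_content` and the toy sequence are assumptions for illustration, not taken from the study or its data), the figure can be computed like this:

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C bases in a nucleotide sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


# Hypothetical toy sequence, NOT data from the HLJ0955 genome.
toy_sequence = "ATGCGCGTTA"
print(f"G+C content: {gc_content(toy_sequence):.1f}%")
```

Applied to a full 34,132-nucleotide genome string, the same function would give the kind of percentage reported above.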

    Potential of the Julia programming language for high energy physics computing

    Research in high energy physics (HEP) requires huge amounts of computing and storage, putting strong constraints on code speed and resource usage. To meet these requirements, a compiled high-performance language is typically used; for physicists, however, who focus on the application when developing the code, research productivity calls for a high-level programming language. A popular approach consists of combining Python, used for the high-level interface, and C++, used for the computing-intensive part of the code. A more convenient and efficient approach would be to use a language that provides both high-level programming and high performance. The Julia programming language, developed at MIT especially to allow the use of a single language in research activities, has followed this path. In this paper the applicability of the Julia language to HEP research is explored, covering the different aspects that are important for HEP code development: runtime performance, handling of large projects, interface with legacy code, distributed computing, training, and ease of programming. The study shows that the HEP community would benefit from a large-scale adoption of this programming language. The HEP-specific foundation libraries that would need to be consolidated are identified. Comment: 32 pages, 5 figures, 4 tables

    Clinical Characteristics and Neuroanatomical Predictors of Acute Antidepressant Outcome for Patients with Comorbid Depression and Mild Cognitive Impairment

    Background: Older adults presenting with both a depressive disorder (DEP) and cognitive impairment (CI) represent a unique, understudied population. The classification of cognitive impairment severity continues to be debated though it has recently been subtyped into late (LMCI) versus early (EMCI) stages. Previous studies have found associations between treatment outcome and both cortical thickness and white matter hyperintensities (WMH), though they report inconsistent directionality and affected regions. In this study, we examined baseline clinical characteristics and neuroanatomical features as prognostic indicators for older adults with comorbid DEP and CI participating in an open antidepressant trial. EMCI is hypothesized to have greater cortical thickness and global cognition than LMCI. Antidepressant treatment remitters and responders are hypothesized to have greater cortical thickness and lower WMH burden than non-remitters and non-responders. Methods: Key inclusion criteria were diagnosis of major depression or dysthymic disorder with Hamilton Depression Rating Scale (HDRS) score >14, and cognitive impairment defined by MMSE score ≥21 and impaired performance on the WMS-R Logical Memory II test. Patients were classified as EMCI or LMCI based on the 1.5 SD cutoff on tests of verbal memory, and compared on baseline clinical, neuropsychological, and anatomical characteristics. All patients underwent a baseline MRI scan and received open antidepressant treatment for 8 weeks. Cortical thickness was extracted using an automated brain segmentation and reconstruction program (FreeSurfer). Vertex-wise analyses were conducted using general linear models to evaluate the relationships between cortical thickness and clinical variables. Results: 79 DEP-CI patients were recruited, of whom 39 met criteria for EMCI and 40 for LMCI. The mean age was 68.9 and mean HDRS was 23.0. 
LMCI patients had significantly worse global cognition and smaller right hippocampal volume compared to EMCI patients. EMCI patients had thicker right medial orbitofrontal cortex than LMCI. MRI indices of cerebrovascular disease did not differ between MCI subtypes. Remitters had greater deep WMH burden, left medial orbitofrontal gyrus thickness, and right superior frontal gyrus thickness than non-remitters. Greater HDRS depressive severity was positively correlated with right pars triangularis thickness. Stronger ADAS-Cog global cognitive performance was positively correlated with thickness in diffuse cortical areas. Conclusions: Cognitive and neuronal loss markers differed between EMCI and LMCI among patients with DEP-CI, with LMCI being more likely to have the clinical and neuronal loss markers known to be associated with Alzheimer’s disease. Samples of DEP-CI exhibit unique patterns of cortical thickness and WMHs compared to their non-CI peers. Cortical thickness may serve as a predictor of treatment remission and relates to both depressive severity and global cognition.

    A Mathematical Methodology for Determining the Temporal Order of Pathway Alterations Arising during Gliomagenesis

    Human cancer is caused by the accumulation of genetic alterations in cells. Of special importance are changes that occur early during malignant transformation because they may result in oncogene addiction and thus represent promising targets for therapeutic intervention. We have previously described a computational approach, called Retracing the Evolutionary Steps in Cancer (RESIC), to determine the temporal sequence of genetic alterations during tumorigenesis from cross-sectional genomic data of tumors at their fully transformed stage. Since alterations within a set of genes belonging to a particular signaling pathway may have similar or equivalent effects, we applied a pathway-based systems biology approach to the RESIC methodology. This method was used to determine whether alterations of specific pathways develop early or late during malignant transformation. When applied to primary glioblastoma (GBM) copy number data from The Cancer Genome Atlas (TCGA) project, RESIC identified a temporal order of pathway alterations consistent with the order of events in secondary GBMs. We then further subdivided the samples into the four main GBM subtypes and determined the relative contributions of each subtype to the overall results: we found that the overall ordering applied for the proneural subtype but differed for mesenchymal samples. The temporal sequence of events could not be identified for neural and classical subtypes, possibly due to a limited number of samples. Moreover, for samples of the proneural subtype, we detected two distinct temporal sequences of events: (i) RAS pathway activation was followed by TP53 inactivation and finally PI3K2 activation, and (ii) RAS activation preceded only AKT activation. This extension of the RESIC methodology provides an evolutionary mathematical approach to identify the temporal sequence of pathway changes driving tumorigenesis and may be useful in guiding the understanding of signaling rearrangements in cancer development

    A computational intelligence analysis of G protein-coupled receptor sequences for pharmacoproteomic applications

    Arguably, drug research has contributed more to the progress of medicine during the past decades than any other scientific factor. One of the main areas of drug research is related to the analysis of proteins. The world of pharmacology is becoming increasingly dependent on the advances in the fields of genomics and proteomics. This dependency brings about the challenge of finding robust methods to analyze the complex data they generate. Such a challenge invites us to go one step further than traditional statistics and resort to approaches under the conceptual umbrella of artificial intelligence, including machine learning (ML), statistical pattern recognition and soft computing methods. Sound statistical principles are essential to trust the evidence base built through the use of such approaches. Statistical ML methods are thus at the core of the current thesis. More than 50% of drugs currently available target only four key protein families, of which almost 30% correspond to the G Protein-Coupled Receptors (GPCR) superfamily. This superfamily regulates the function of most cells in living organisms and is at the centre of the investigations reported in the current thesis. Not much is known about the 3D structure of these proteins. Fortunately, plenty of information regarding their amino acid sequences is readily available. The automatic grouping and classification of GPCRs into families, and of these into subtypes, based on sequence analysis may significantly contribute to ascertaining the pharmaceutically relevant properties of this protein superfamily. There is no biologically relevant manner of representing the symbolic sequences describing proteins using real-valued vectors. This does not preclude the possibility of analyzing them using principled methods. These may come, amongst others, from the field of statistical ML. Particularly, kernel methods can be used for this purpose. 
Moreover, the visualization of high-dimensional protein sequence data can be a key exploratory tool for finding meaningful information that might be obscured by their intrinsic complexity. That is why the objective of the research described in this thesis is twofold: first, the design of adequate visualization-oriented artificial intelligence-based methods for the analysis of GPCR sequential data, and second, the application of the developed methods in relevant pharmacoproteomic problems such as GPCR subtyping and protein alignment-free analysis.

    Deep Representation Learning of Electronic Health Records to Unlock Patient Stratification at Scale

    Deriving disease subtypes from electronic health records (EHRs) can guide next-generation personalized medicine. However, challenges in summarizing and representing patient data prevent widespread practice of scalable EHR-based stratification analysis. Here we present an unsupervised framework based on deep learning to process heterogeneous EHRs and derive patient representations that can efficiently and effectively enable patient stratification at scale. We considered EHRs of 1,608,741 patients from a diverse hospital cohort comprising a total of 57,464 clinical concepts. We introduce a representation learning model based on word embeddings, convolutional neural networks, and autoencoders (i.e., ConvAE) to transform patient trajectories into low-dimensional latent vectors. We evaluated these representations as broadly enabling patient stratification by applying hierarchical clustering to different multi-disease and disease-specific patient cohorts. ConvAE significantly outperformed several baselines in a clustering task to identify patients with different complex conditions, with 2.61 entropy and 0.31 purity average scores. When applied to stratify patients within a certain condition, ConvAE led to various clinically relevant subtypes for different disorders, including type 2 diabetes, Parkinson's disease and Alzheimer's disease, largely related to comorbidities, disease progression, and symptom severity. With these results, we demonstrate that ConvAE can generate patient representations that lead to clinically meaningful insights. This scalable framework can help better understand varying etiologies in heterogeneous sub-populations and unlock patterns for EHR-based research in the realm of personalized medicine. Comment: C.F. and R.M. share senior authorship
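The abstract above reports clustering quality as entropy and purity scores. The sketch below shows one common way these two measures are defined; it is an assumption that the paper uses comparable definitions, and the function names, the base-2 logarithm, and the toy labels are illustrative only.

```python
import math
from collections import Counter


def _counts_per_cluster(labels, clusters):
    """Group true-label counts by cluster id."""
    grouped = {}
    for label, cluster in zip(labels, clusters):
        grouped.setdefault(cluster, Counter())[label] += 1
    return grouped


def purity(labels, clusters):
    """Fraction of points that carry the majority label of their cluster."""
    grouped = _counts_per_cluster(labels, clusters)
    return sum(max(c.values()) for c in grouped.values()) / len(labels)


def entropy(labels, clusters):
    """Size-weighted mean Shannon entropy (in bits) of labels within clusters."""
    grouped = _counts_per_cluster(labels, clusters)
    total = len(labels)
    score = 0.0
    for counts in grouped.values():
        n = sum(counts.values())
        h = -sum((k / n) * math.log2(k / n) for k in counts.values())
        score += (n / total) * h
    return score


# Hypothetical toy example: two clusters, each mixing two conditions evenly.
true_labels = ["a", "b", "a", "b"]
cluster_ids = [0, 0, 1, 1]
print(purity(true_labels, cluster_ids), entropy(true_labels, cluster_ids))
```

Under these definitions, higher purity and lower entropy both indicate clusters that align more closely with the known conditions.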

    Pathogenic Determinants of the Mycobacterium kansasii Complex: An Unsuspected Role for Distributive Conjugal Transfer.

    The Mycobacterium kansasii species comprises six subtypes that were recently classified into six closely related species: Mycobacterium kansasii (formerly M. kansasii subtype 1), Mycobacterium persicum (subtype 2), Mycobacterium pseudokansasii (subtype 3), Mycobacterium ostraviense (subtype 4), Mycobacterium innocens (subtype 5) and Mycobacterium attenuatum (subtype 6). Together with Mycobacterium gastri, they form the M. kansasii complex. M. kansasii is the most frequent and most pathogenic species of the complex. M. persicum is classically associated with diseases in immunosuppressed patients, and the other species are mostly colonizers, and are only very rarely reported in ill patients. Comparative genomics was used to assess the genetic determinants leading to the pathogenicity of members of the M. kansasii complex. The genomes of 51 isolates collected from patients with and without disease were sequenced and compared with 24 publicly available genomes. The pathogenicity of each isolate was determined based on the clinical records or public metadata. A comparative genomic analysis showed that all M. persicum, M. ostraviense, M. innocens and M. gastri isolates lacked the ESX-1-associated EspACD locus that is thought to play a crucial role in the pathogenicity of M. tuberculosis and other non-tuberculous mycobacteria. Furthermore, M. kansasii was the only species exhibiting a 25-kb genomic island encoding 17 type-VII secretion system-associated proteins. Finally, a genome-wide association analysis revealed that two consecutive genes encoding a hemerythrin-like protein and a nitroreductase-like protein were significantly associated with pathogenicity. These two genes may be involved in the resistance to reactive oxygen and nitrogen species, a required mechanism for the intracellular survival of bacteria. Three non-pathogenic M. kansasii lacked these genes, likely due to two distinct distributive conjugal transfers (DCTs) between M. attenuatum and M. kansasii, and one DCT between M. persicum and M. kansasii. To our knowledge, this is the first study linking DCT to reduced pathogenicity.