
    MASSP3: A System for Predicting Protein Secondary Structure

    A system that resorts to multiple experts for predicting protein secondary structure is described, whose performance is comparable to that of other state-of-the-art predictors. The system performs its overall processing in two main steps: first, a "sequence-to-structure" prediction is made by a population of hybrid genetic-neural experts, and then a "structure-to-structure" prediction is made by a feedforward artificial neural network. To investigate the performance of the proposed approach, the system has been tested on the RS126 set of proteins. Experimental results (about 76% accuracy) point to the validity of the approach.
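
    The two-step pipeline lends itself to a compact illustration. Below is a minimal Python sketch of the data flow only: the expert population and the network weights are random placeholders, not the trained MASSP3 models, and the window size is an assumption.

```python
# Sketch of the two-step "sequence-to-structure" / "structure-to-structure"
# pipeline described above. Experts and network weights are random
# placeholders, not the trained MASSP3 models.
import numpy as np

STATES = "HEC"  # helix, strand, coil

def expert_predictions(seq_profile, n_experts=5, rng=None):
    """Stage 1: each expert maps the sequence profile to per-residue
    class scores; scores are averaged over the expert population."""
    rng = rng or np.random.default_rng(0)
    n = seq_profile.shape[0]
    votes = rng.random((n_experts, n, len(STATES)))  # placeholder experts
    probs = votes.mean(axis=0)
    return probs / probs.sum(axis=1, keepdims=True)

def structure_to_structure(probs, window=7, rng=None):
    """Stage 2: a feedforward network refines stage-1 outputs using a
    sliding window of neighboring predictions (placeholder weights)."""
    rng = rng or np.random.default_rng(1)
    half = window // 2
    padded = np.pad(probs, ((half, half), (0, 0)), mode="edge")
    W = rng.standard_normal((window * len(STATES), len(STATES)))
    out = []
    for i in range(probs.shape[0]):
        x = padded[i:i + window].ravel()  # window of class probabilities
        logits = x @ W                    # single dense layer (sketch)
        e = np.exp(logits - logits.max())
        out.append(e / e.sum())
    return np.array(out)

profile = np.random.default_rng(2).random((30, 20))  # 30 residues x 20 aa freqs
stage2 = structure_to_structure(expert_predictions(profile))
print("".join(STATES[k] for k in stage2.argmax(axis=1)))
```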

    Removing duplicate reads using graphics processing units

    Background: During library construction, polymerase chain reaction is used to enrich the DNA before sequencing. Typically, this process generates duplicate read sequences. Removal of these artifacts is mandatory, as they can affect the correct interpretation of data in several analyses. Ideally, duplicate reads should be characterized by identical nucleotide sequences. However, due to sequencing errors, duplicates may also be nearly identical. Removing nearly-identical duplicates can require notable computational effort. To deal with this challenge, we recently proposed a GPU method aimed at removing identical and nearly-identical duplicates generated with an Illumina platform. The method implements an approach based on prefix-suffix comparison. Read sequences with identical prefixes are considered potential duplicates. Then, their suffixes are compared to identify and remove those that are actually duplicated. Although the method can be efficiently used to remove duplicates, it has some limitations that need to be overcome. In particular, it cannot detect potential duplicates when prefixes are longer than 27 bases, and it does not provide support for paired-end read libraries. Moreover, large clusters of potential duplicates are split into smaller ones to guarantee a reasonable computing time; this heuristic may affect the accuracy of the analysis. Results: In this work we propose GPU-DupRemoval, a new implementation of our method able to (i) cluster reads without constraints on the maximum length of the prefixes, (ii) support both single- and paired-end read libraries, and (iii) analyze large clusters of potential duplicates. Conclusions: Thanks to the massive parallelization obtained by exploiting graphics cards, GPU-DupRemoval removes duplicate reads faster than other cutting-edge solutions, while outperforming most of them in the number of duplicate reads removed.
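
    The prefix-suffix strategy can be illustrated with a short CPU sketch. The clustering and suffix-comparison logic below follows the description above, but the parameter names (prefix_len, max_mismatches) are illustrative rather than the tool's actual options, and the real GPU-DupRemoval parallelizes this work on the graphics card.

```python
# CPU sketch of prefix-suffix duplicate removal: reads sharing a prefix
# are clustered as candidate duplicates, then suffixes are compared with
# a mismatch tolerance to catch nearly-identical copies.
from collections import defaultdict

def remove_duplicates(reads, prefix_len=16, max_mismatches=2):
    clusters = defaultdict(list)
    for r in reads:                      # cluster by identical prefix
        clusters[r[:prefix_len]].append(r)
    kept = []
    for group in clusters.values():
        uniques = []
        for r in group:                  # compare suffixes within a cluster
            suffix = r[prefix_len:]
            is_dup = any(
                sum(a != b for a, b in zip(suffix, u[prefix_len:])) <= max_mismatches
                for u in uniques
            )
            if not is_dup:
                uniques.append(r)
        kept.extend(uniques)
    return kept

reads = ["ACGTACGTACGTACGTTTTT",   # kept
         "ACGTACGTACGTACGTTTAT",   # near-duplicate of the first: removed
         "GGGGACGTACGTACGTTTTT"]   # different prefix: kept
print(remove_duplicates(reads))
```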

    G-CNV: A GPU-based tool for preparing data to detect CNVs with read-depth methods

    Copy number variations (CNVs) are the most prevalent type of structural variation (SV) in the human genome and are involved in a wide range of common human diseases. Different computational methods have been devised to detect this type of SV and to study how it is implicated in human diseases. Recently, computational methods based on high-throughput sequencing (HTS) have been increasingly used. The majority of these methods focus on mapping short-read sequences generated from a donor against a reference genome to detect signatures distinctive of CNVs. In particular, read-depth based methods detect CNVs by analyzing genomic regions whose read depth differs significantly from that of the other regions. The analysis pipeline of these methods consists of four main stages: (i) data preparation, (ii) data normalization, (iii) CNV region identification, and (iv) copy number estimation. However, available tools do not support most of the operations required at the first two stages of this pipeline. Typically, they start the analysis by building the read-depth signal from pre-processed alignments, so third-party tools must be used to perform most of the preliminary operations required to build the read-depth signal. These data-intensive operations can be efficiently parallelized on graphics processing units (GPUs). In this article, we present G-CNV, a GPU-based tool devised to perform the common operations required at the first two stages of the analysis pipeline. G-CNV is able to filter low-quality read sequences, mask low-quality nucleotides, remove adapter sequences, remove duplicated read sequences, map the short reads, resolve multiple mapping ambiguities, build the read-depth signal, and normalize it. G-CNV can be efficiently used as a third-party tool to prepare data for the subsequent read-depth signal generation and analysis. Moreover, it can also be integrated into CNV detection tools to generate read-depth signals.
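
    As an illustration of the last two operations G-CNV covers, the sketch below builds a windowed read-depth signal from mapped read start positions and applies a simple median normalization. The window size and the normalization scheme are assumptions for illustration, not G-CNV's actual algorithms.

```python
# Illustrative read-depth signal construction and normalization.
import numpy as np

def read_depth_signal(read_starts, read_len, genome_len, window=100):
    """Accumulate per-base coverage, then average it over fixed windows."""
    depth = np.zeros(genome_len)
    for s in read_starts:
        depth[s:s + read_len] += 1
    n_win = genome_len // window
    return depth[:n_win * window].reshape(n_win, window).mean(axis=1)

def normalize(signal):
    """Scale by the median so typical (diploid) regions sit near 1.0;
    gains and losses then appear as deviations above/below 1."""
    med = np.median(signal[signal > 0])
    return signal / med if med > 0 else signal

rng = np.random.default_rng(0)
starts = rng.integers(0, 9_900, size=5_000)
signal = normalize(read_depth_signal(starts, read_len=100, genome_len=10_000))
print(signal.round(2))
```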

    SNPLims: a data management system for genome wide association studies

    Background: Recent progress in genotyping technologies allows the generation of high-density genetic maps using hundreds of thousands of genetic markers for each DNA sample. The availability of this large amount of genotypic data facilitates the whole-genome search for the genetic basis of diseases. A suitable information management system is needed to efficiently manage the data flow produced by whole-genome genotyping and to make it available for further analyses. Results: We have developed an information system mainly devoted to the storage and management of SNP genotype data produced by the Illumina platform, from the raw genotyping outputs into a relational database. The relational database can be accessed to import any existing data and to export user-defined formats compatible with many different genetic analysis programs. After calculating family-based or case-control association study data, the results can be imported into SNPLims. One of the main features is to allow the user to rapidly identify and annotate statistically relevant polymorphisms from the large volume of data analyzed. Results can be easily visualized either graphically or by creating ASCII comma-separated output files, which can be used as input to further analyses. Conclusions: The proposed infrastructure makes it possible to manage a relatively large number of genotypes for each sample and an arbitrary number of samples and phenotypes. Moreover, it enables users to control the quality of the data, to perform the most common screening analyses, and to identify genes that become "candidates" for the disease under consideration.
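
    The abstract does not give the actual SNPLims table layout, but a relational design of this kind can be sketched with sqlite3; all table and column names below are hypothetical.

```python
# Hypothetical minimal schema for sample/SNP/genotype storage, with a
# comma-separated export in the spirit of the system described above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample   (sample_id INTEGER PRIMARY KEY, label TEXT, phenotype TEXT);
CREATE TABLE snp      (snp_id INTEGER PRIMARY KEY, rs_name TEXT, chrom TEXT, pos INTEGER);
CREATE TABLE genotype (sample_id INTEGER REFERENCES sample,
                       snp_id    INTEGER REFERENCES snp,
                       call      TEXT,   -- e.g. 'AA', 'AG', 'GG'
                       quality   REAL,
                       PRIMARY KEY (sample_id, snp_id));
""")
conn.execute("INSERT INTO sample VALUES (1, 'S001', 'case')")
conn.execute("INSERT INTO snp VALUES (1, 'rs429358', '19', 44908684)")
conn.execute("INSERT INTO genotype VALUES (1, 1, 'CT', 0.98)")

# Export a user-defined comma-separated format, as the abstract describes:
for row in conn.execute("""SELECT s.label, n.rs_name, g.call
                           FROM genotype g JOIN sample s USING (sample_id)
                                           JOIN snp n USING (snp_id)"""):
    print(",".join(map(str, row)))
```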

    Hippocampal Atrophy as a Quantitative Trait in a Genome-Wide Association Study Identifying Novel Susceptibility Genes for Alzheimer's Disease

    With the exception of the APOE ε4 allele, the common genetic risk factors for sporadic Alzheimer's Disease (AD) are unknown. Using hippocampal atrophy as a quantitative trait in a genome-wide scan, several loci were identified which can be considered potential “new” candidate loci to explore in the etiology of sporadic AD. These candidates included EFNA5, CAND1, MAGI2, ARSB, and PRUNE2, genes involved in the regulation of protein degradation, apoptosis, neuronal loss, and neurodevelopment. Thus, we identified common genetic variants associated with increased risk of developing AD in the ADNI cohort, and present publicly available genome-wide data. Supportive evidence based on case-control studies and biological plausibility by gene annotation is provided. Currently, no sample with both imaging and genetic data is available for replication. Using hippocampal atrophy as a quantitative phenotype in a genome-wide scan, we have identified candidate risk genes for sporadic Alzheimer's disease that merit further investigation.
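
    For readers unfamiliar with quantitative-trait GWAS, the sketch below shows the kind of per-SNP test involved: the phenotype (here, a simulated atrophy measure) is regressed on allele dosage. Covariates such as age and sex, which such analyses normally include, are omitted.

```python
# Per-SNP quantitative-trait association test on simulated data:
# phenotype ~ allele dosage, one linear regression per variant.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
dosage = rng.integers(0, 3, size=n).astype(float)   # 0/1/2 copies of minor allele
atrophy = 0.15 * dosage + rng.normal(0, 1, size=n)  # simulated phenotype

slope, intercept, r, p, se = stats.linregress(dosage, atrophy)
print(f"beta = {slope:.3f}, p = {p:.2e}")
```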

    Genetic determinants in a critical domain of NS5A correlate with hepatocellular carcinoma in cirrhotic patients infected with HCV genotype 1b

    HCV is an important cause of hepatocellular carcinoma (HCC). HCV NS5A domain‐1 interacts with cellular proteins inducing pro‐oncogenic pathways. Thus, we explore genetic variations in NS5A domain‐1 and their association with HCC, by analyzing 188 NS5A sequences from HCV genotype‐1b infected DAA‐naïve cirrhotic patients: 34 with HCC and 154 without HCC. Specific NS5A mutations significantly correlate with HCC: S3T (8.8% vs. 1.3%, p = 0.01), T122M (8.8% vs. 0.0%, p < 0.001), M133I (20.6% vs. 3.9%, p < 0.001), and Q181E (11.8% vs. 0.6%, p < 0.001). By multivariable analysis, the presence of >1 of them independently correlates with HCC (OR (95% CI): 21.8 (5.7–82.3); p < 0.001). Focusing on the HCC group, the presence of these mutations correlates with higher viremia (median (IQR): 5.7 (5.4–6.2) log IU/mL vs. 5.3 (4.4–5.6) log IU/mL, p = 0.02) and lower ALT (35 (30–71) vs. 83 (48–108) U/L, p = 0.004), suggesting a role in enhancing viral fitness without affecting necroinflammation. Notably, these mutations reside in NS5A regions known to interact with cellular proteins crucial for cell‐cycle regulation (p53, p85‐PIK3, and β‐catenin), and introduce additional phosphorylation sites, a phenomenon known to ameliorate NS5A interaction with cellular proteins. Overall, these results provide a focus for further investigations on the molecular bases of HCV‐mediated oncogenesis. The role of these NS5A domain‐1 mutations in triggering pro‐oncogenic stimuli that can persist even after achievement of sustained virological response deserves further investigation.
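
    As a rough check of the single-mutation associations, the sketch below runs a Fisher exact test on the M133I counts implied by the percentages above (20.6% of 34 HCC patients ≈ 7; 3.9% of 154 controls ≈ 6). The paper's actual statistical procedure, including the multivariable model, may differ.

```python
# 2x2 association test for one mutation vs. HCC status, using counts
# reconstructed from the abstract's percentages (illustrative only).
from scipy import stats

table = [[7, 27],     # HCC:    M133I present, absent
         [6, 148]]    # no HCC: M133I present, absent
odds_ratio, p = stats.fisher_exact(table)
print(f"OR = {odds_ratio:.1f}, p = {p:.4f}")
```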

    Multidifferential study of identified charged hadron distributions in Z-tagged jets in proton-proton collisions at √s = 13 TeV

    Jet fragmentation functions are measured for the first time in proton-proton collisions for charged pions, kaons, and protons within jets recoiling against a Z boson. The charged-hadron distributions are studied longitudinally and transversely to the jet direction for jets with transverse momentum 20 < pT < 100 GeV and in the pseudorapidity range 2.5 < η < 4. The data sample was collected with the LHCb experiment at a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 1.64 fb⁻¹. Triple differential distributions as a function of the hadron longitudinal momentum fraction, hadron transverse momentum, and jet transverse momentum are also measured for the first time. This helps constrain transverse-momentum-dependent fragmentation functions. Differences in the shapes and magnitudes of the measured distributions for the different hadron species provide insights into the hadronization process for jets predominantly initiated by light quarks. Comment: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2022-013.html (LHCb public pages).
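
    The longitudinal momentum fraction and the transverse momentum relative to the jet axis mentioned above have standard definitions, sketched below with illustrative momentum vectors (not LHCb data): z projects the hadron momentum onto the jet axis, and jT is the residual transverse component.

```python
# Standard in-jet fragmentation variables for a hadron inside a jet:
#   z  = (p_hadron . p_jet) / |p_jet|^2   (longitudinal momentum fraction)
#   jT = |p_hadron - z * p_jet|           (momentum transverse to jet axis)
import numpy as np

def frag_variables(p_hadron, p_jet):
    p_hadron = np.asarray(p_hadron, float)
    p_jet = np.asarray(p_jet, float)
    z = (p_hadron @ p_jet) / (p_jet @ p_jet)
    jt = np.linalg.norm(p_hadron - z * p_jet)
    return z, jt

z, jt = frag_variables([5.0, 1.0, 40.0], [12.0, 3.0, 90.0])  # GeV, made up
print(f"z = {z:.3f}, jT = {jt:.2f} GeV")
```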