MASSP3: A System for Predicting Protein Secondary Structure
A system that uses multiple experts to predict protein secondary structure is described, with performance comparable to that of other state-of-the-art predictors. The system processes its input in two main steps: first, a "sequence-to-structure" prediction is performed by a population of hybrid genetic-neural experts; then, a "structure-to-structure" prediction is performed by a feedforward artificial neural network. To investigate the performance of the proposed approach, the system has been tested on the RS126 set of proteins. Experimental results (about 76% accuracy) support the validity of the approach.
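For readers who want the pipeline shape in concrete terms, here is a minimal Python sketch of the two-step idea: a stage that averages the outputs of several sequence-to-structure experts, followed by a stand-in for the structure-to-structure filter. The expert callables, the window size, and the smoothing used in place of the trained feedforward network are all illustrative assumptions, not the MASSP3 implementation.

```python
# Minimal sketch of a two-step secondary-structure pipeline, assuming a list
# of expert callables that each map an encoded sequence to per-residue class
# probabilities. The smoothing in step 2 is a hypothetical stand-in for the
# trained feedforward network.
import numpy as np

CLASSES = "HEC"  # helix, strand, coil

def sequence_to_structure(experts, encoded_seq):
    """Step 1: average the per-residue class probabilities of all experts."""
    return np.mean([expert(encoded_seq) for expert in experts], axis=0)  # (L, 3)

def structure_to_structure(stage1_probs, window=7):
    """Step 2 stand-in: re-predict each residue from a window of step-1
    outputs, mimicking the filtering a structure-to-structure net learns."""
    half = window // 2
    padded = np.pad(stage1_probs, ((half, half), (0, 0)), mode="edge")
    smoothed = np.stack([padded[i:i + window].mean(axis=0)
                         for i in range(len(stage1_probs))])
    return "".join(CLASSES[i] for i in smoothed.argmax(axis=1))
```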
Removing duplicate reads using graphics processing units
Background: During library construction, the polymerase chain reaction is used to enrich the DNA before sequencing. Typically, this process generates duplicate read sequences. Removal of these artifacts is mandatory, as they can affect the correct interpretation of data in several analyses. Ideally, duplicate reads should be characterized by identical nucleotide sequences. However, due to sequencing errors, duplicates may also be nearly identical. Removing nearly identical duplicates can require a notable computational effort. To deal with this challenge, we recently proposed a GPU method aimed at removing identical and nearly identical duplicates generated with an Illumina platform. The method implements an approach based on prefix-suffix comparison. Read sequences with identical prefixes are considered potential duplicates. Then, their suffixes are compared to identify and remove those that are actually duplicated. Although the method can be efficiently used to remove duplicates, it has some limitations that need to be overcome. In particular, it cannot detect potential duplicates when prefixes are longer than 27 bases, and it does not support paired-end read libraries. Moreover, large clusters of potential duplicates are split into smaller ones to guarantee a reasonable computing time. This heuristic may affect the accuracy of the analysis. Results: In this work we propose GPU-DupRemoval, a new implementation of our method able to (i) cluster reads without constraints on the maximum length of the prefixes, (ii) support both single- and paired-end read libraries, and (iii) analyze large clusters of potential duplicates. Conclusions: Thanks to the massive parallelization obtained by exploiting graphics cards, GPU-DupRemoval removes duplicate reads faster than other cutting-edge solutions, while outperforming most of them in the number of duplicate reads removed.
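The prefix-suffix strategy can be sketched on the CPU in a few lines. The sketch below clusters reads by a fixed-length prefix and keeps one representative per group of suffixes within a mismatch budget; prefix_len and max_mismatches are illustrative parameters, and the actual tool performs these comparisons in parallel on the GPU.

```python
# CPU sketch of prefix-suffix duplicate removal: identical prefixes define
# candidate clusters, then suffixes are compared under a mismatch budget.
# prefix_len and max_mismatches are illustrative; the real tool runs these
# comparisons on the GPU.
from collections import defaultdict

def remove_duplicates(reads, prefix_len=27, max_mismatches=2):
    """Keep one representative per group of identical/nearly-identical reads."""
    clusters = defaultdict(list)
    for read in reads:
        clusters[read[:prefix_len]].append(read)  # identical prefix -> candidates

    kept = []
    for candidates in clusters.values():
        representatives = []
        for read in candidates:
            is_dup = any(
                sum(a != b for a, b in zip(read[prefix_len:], rep[prefix_len:]))
                <= max_mismatches
                for rep in representatives)
            if not is_dup:
                representatives.append(read)
        kept.extend(representatives)
    return kept

print(remove_duplicates(["ACGTACGT", "ACGTACGA", "ACGTTTTT"], prefix_len=4,
                        max_mismatches=1))
# ['ACGTACGT', 'ACGTTTTT'] -- the nearly identical second read is dropped
```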
G-CNV: A GPU-based tool for preparing data to detect CNVs with read-depth methods
Copy number variations (CNVs) are the most prevalent type of structural variation (SV) in the human genome and are involved in a wide range of common human diseases. Different computational methods have been devised to detect this type of SV and to study how CNVs are implicated in human diseases. Recently, computational methods based on high-throughput sequencing (HTS) have been increasingly used. The majority of these methods focus on mapping short-read sequences generated from a donor against a reference genome to detect signatures distinctive of CNVs. In particular, read-depth based methods detect CNVs by analyzing genomic regions whose read-depth differs significantly from that of the other regions. The analysis pipeline of these methods consists of four main stages: (i) data preparation, (ii) data normalization, (iii) CNV region identification, and (iv) copy number estimation. However, available tools do not support most of the operations required at the first two stages of this pipeline. Typically, they start the analysis by building the read-depth signal from pre-processed alignments, so third-party tools must be used to perform most of the preliminary operations required to build the read-depth signal. These data-intensive operations can be efficiently parallelized on graphics processing units (GPUs). In this article, we present G-CNV, a GPU-based tool devised to perform the common operations required at the first two stages of the analysis pipeline. G-CNV is able to filter low-quality read sequences, mask low-quality nucleotides, remove adapter sequences, remove duplicated read sequences, map the short reads, resolve multiple mapping ambiguities, build the read-depth signal, and normalize it. G-CNV can be efficiently used as a third-party tool to prepare data for the subsequent read-depth signal generation and analysis. Moreover, it can be integrated into CNV detection tools to generate read-depth signals.
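As an illustration of the first two pipeline stages, the following Python sketch builds a read-depth signal by counting read start positions in fixed-size windows and applies a simple median normalization. The 1 kb window and the normalization choice are assumptions for the example, not G-CNV's own defaults.

```python
# Illustration of the first two stages: build a read-depth signal from mapped
# read start positions and normalize it. Window size and normalization are
# example choices, not taken from G-CNV itself.
import numpy as np

def read_depth_signal(read_starts, chrom_len, window=1000):
    """Count mapped reads whose start position falls in each fixed-size window."""
    signal = np.zeros((chrom_len + window - 1) // window)
    for pos in read_starts:
        signal[pos // window] += 1
    return signal

def normalize(signal):
    """Depth ratio relative to the median of covered windows: a diploid region
    sits near 1, deletions fall below, duplications rise above."""
    return signal / np.median(signal[signal > 0])
```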
SNPLims: a data management system for genome wide association studies
Background: Recent progress in genotyping technologies allows the generation of high-density genetic maps using hundreds of thousands of genetic markers for each DNA sample. The availability of this large amount of genotypic data facilitates whole-genome searches for the genetic basis of diseases. A suitable information management system is needed to efficiently manage the data flow produced by whole-genome genotyping and to make it available for further analyses. Results: We have developed an information system mainly devoted to the storage and management of SNP genotype data produced by the Illumina platform, from the raw genotyping outputs into a relational database. The relational database can be accessed to import any existing data and to export user-defined formats compatible with many different genetic analysis programs. After family-based or case-control association study data are calculated, the results can be imported into SNPLims. One of the main features is to allow the user to rapidly identify and annotate statistically relevant polymorphisms within the large volume of data analyzed. Results can be easily visualized either graphically or by creating ASCII comma-separated output files, which can be used as input to further analyses. Conclusions: The proposed infrastructure makes it possible to manage a relatively large number of genotypes for each sample and an arbitrary number of samples and phenotypes. Moreover, it enables users to control the quality of the data, perform the most common screening analyses, and identify genes that become "candidates" for the disease under consideration.
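A minimal relational layout for this kind of genotype storage could look like the following sketch, which uses sqlite3 as a stand-in for the production database and a simple comma-separated export query. Table and column names are hypothetical; the abstract does not publish the SNPLims schema.

```python
# Hypothetical minimal schema for SNP genotype storage, with sqlite3 standing
# in for the production relational database. All names are invented for this
# sketch.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (
    sample_id  TEXT PRIMARY KEY,
    phenotype  TEXT
);
CREATE TABLE snp (
    snp_id      TEXT PRIMARY KEY,   -- e.g. an rs identifier
    chromosome  TEXT,
    position    INTEGER
);
CREATE TABLE genotype (
    sample_id  TEXT REFERENCES sample(sample_id),
    snp_id     TEXT REFERENCES snp(snp_id),
    allele1    TEXT,
    allele2    TEXT,
    PRIMARY KEY (sample_id, snp_id)
);
""")

# Export a user-defined format: one comma-separated row per genotype call.
rows = conn.execute("""SELECT sample_id, snp_id, allele1, allele2
                       FROM genotype ORDER BY sample_id, snp_id""")
for row in rows:
    print(",".join(row))
```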
Ten Years of Experience With a Telemedicine Platform Dedicated to Health Care Personnel: Implementation Report
Background: Telemedicine, a term that encompasses several applications and tasks, generally involves the remote management and treatment of patients by physicians. It is known as transversal telemedicine when practiced among health care professionals (HCPs). Objective: We describe the experience of implementing our Eumeda telemedicine platform for HCPs over the last 10 years. Methods: A web-based informatics platform was developed, built on continuously updated hypertext and offering the following features: security, data insertion, dedicated software for image analysis, and the ability to export data for statistical surveys. Customizable files called "modules" were designed and built for different fields of medicine, mainly in the ophthalmology subspecialty. Each module was used by HCPs with different authorization profiles. Implementation (Results): Twelve representative modules for different projects are presented in this manuscript. These modules evolved over time, with varying degrees of interconnectivity, and involved the participation of a number of centers in 19 cities across Italy. The number of HCP operators involved in each module ranged from 6 to 114 (average 21.8, SD 28.5). Data related to 2574 participants were inserted across all the modules. The average percentage of completed text/image fields in the 12 modules was 65.7%. All modules were evaluated in terms of access, acceptability, and medical efficacy. In their final evaluation, the participants judged the modules to be useful and efficient for clinical use. Conclusions: Our results demonstrate the usefulness of the telemedicine platform for HCPs in terms of improved knowledge in medicine, patient care, scientific research, teaching, and the choice of therapies. It would be useful to start similar projects across various health care fields, considering that in the near future medicine as we know it will completely change.
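As a loose illustration of the module-plus-authorization-profile design mentioned above, the sketch below models a module whose actions are gated by the HCP's profile. All names and the permission scheme are hypothetical; the platform's actual data model is not described in this report.

```python
# Loose sketch of a customizable "module" gated by authorization profiles.
# All names and the permission scheme are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Module:
    name: str
    # profile -> set of actions that profile may perform on this module
    profiles: dict = field(default_factory=dict)

    def can(self, profile: str, action: str) -> bool:
        """Check whether an HCP with the given profile may perform an action."""
        return action in self.profiles.get(profile, set())

follow_up = Module("retina-follow-up",
                   profiles={"ophthalmologist": {"read", "insert", "export"},
                             "reviewer": {"read"}})
assert follow_up.can("ophthalmologist", "insert")
assert not follow_up.can("reviewer", "export")
```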
Hippocampal Atrophy as a Quantitative Trait in a Genome-Wide Association Study Identifying Novel Susceptibility Genes for Alzheimer's Disease
With the exception of the APOE ε4 allele, the common genetic risk factors for sporadic Alzheimer's disease (AD) are unknown. Using hippocampal atrophy as a quantitative trait in a genome-wide association study of the ADNI cohort, we identified loci that can be considered potential "new" candidate loci to explore in the etiology of sporadic AD. These candidates included EFNA5, CAND1, MAGI2, ARSB, and PRUNE2, genes involved in the regulation of protein degradation, apoptosis, neuronal loss, and neurodevelopment. Thus, we identified common genetic variants associated with an increased risk of developing AD in the ADNI cohort, and we present publicly available genome-wide data. Supportive evidence based on case-control studies and biological plausibility by gene annotation is provided. Currently, no sample with both imaging and genetic data is available for replication. Using hippocampal atrophy as a quantitative phenotype in a genome-wide scan, we have identified candidate risk genes for sporadic Alzheimer's disease that merit further investigation.
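The core statistical step of such a scan is a per-SNP quantitative-trait test. The sketch below regresses a phenotype on genotype dosage with scipy's linregress and returns the slope and p-value; the toy data and the omission of covariates (age, sex, population structure), which a real GWAS would include, are simplifications for illustration.

```python
# Per-SNP quantitative-trait association: regress the phenotype on genotype
# dosage and test the slope. Toy data and the absence of covariates are
# simplifications for illustration.
import numpy as np
from scipy import stats

def qt_association(dosage, phenotype):
    """dosage: 0/1/2 copies of the minor allele per subject."""
    slope, intercept, r, p_value, stderr = stats.linregress(dosage, phenotype)
    return slope, p_value

rng = np.random.default_rng(0)
dosage = rng.integers(0, 3, 500)                  # simulated genotypes
phenotype = 0.1 * dosage + rng.normal(size=500)   # simulated atrophy measure
print(qt_association(dosage, phenotype))
```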
Genetic determinants in a critical domain of NS5A correlate with hepatocellular carcinoma in cirrhotic patients infected with HCV genotype 1b
HCV is an important cause of hepatocellular carcinoma (HCC). HCV NS5A domain-1 interacts with cellular proteins, inducing pro-oncogenic pathways. We therefore explored genetic variations in NS5A domain-1 and their association with HCC by analyzing 188 NS5A sequences from HCV genotype-1b infected, DAA-naïve cirrhotic patients: 34 with HCC and 154 without HCC. Specific NS5A mutations significantly correlate with HCC: S3T (8.8% vs. 1.3%, p = 0.01), T122M (8.8% vs. 0.0%, p < 0.001), M133I (20.6% vs. 3.9%, p < 0.001), and Q181E (11.8% vs. 0.6%, p < 0.001). By multivariable analysis, the presence of >1 of them independently correlates with HCC (OR (95% CI): 21.8 (5.7-82.3); p < 0.001). Within the HCC group, the presence of these mutations correlates with higher viremia (median (IQR): 5.7 (5.4-6.2) log IU/mL vs. 5.3 (4.4-5.6) log IU/mL, p = 0.02) and lower ALT (35 (30-71) vs. 83 (48-108) U/L, p = 0.004), suggesting a role in enhancing viral fitness without affecting necroinflammation. Notably, these mutations reside in NS5A regions known to interact with cellular proteins crucial for cell-cycle regulation (p53, p85-PIK3, and β-catenin), and they introduce additional phosphorylation sites, a phenomenon known to enhance NS5A interaction with cellular proteins. Overall, these results provide a focus for further investigations of the molecular bases of HCV-mediated oncogenesis. The role of these NS5A domain-1 mutations in triggering pro-oncogenic stimuli that can persist even after the achievement of a sustained virological response deserves further investigation.
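Each per-mutation comparison above is a two-group frequency test, which can be reproduced with a Fisher's exact test. In the sketch below, the M133I counts are back-calculated from the reported percentages (20.6% of 34 HCC patients, 3.9% of 154 without HCC), so the numbers should be read as an illustration rather than the paper's exact analysis.

```python
# Fisher's exact test for one of the comparisons above (M133I). Counts are
# back-calculated from the reported percentages (7/34 = 20.6% in HCC,
# 6/154 = 3.9% without HCC); an illustration, not the paper's exact analysis.
from scipy.stats import fisher_exact

#         M133I present  M133I absent
table = [[7, 34 - 7],    # patients with HCC
         [6, 154 - 6]]   # patients without HCC

odds_ratio, p_value = fisher_exact(table)
print(f"OR = {odds_ratio:.1f}, p = {p_value:.4g}")
```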