High correlation between the turnover of nucleotides under mutational pressure and the DNA composition
BACKGROUND: Any DNA sequence is the result of a compromise between the selection and mutation pressures exerted on it during evolution. It is difficult to estimate the relative influence of each of these pressures on the rate at which substitutions accumulate. However, it is important to discriminate between the effects of mutation and those of selection when studying the phylogenetic relationships between taxa. RESULTS: We have tested the available substitution matrices for many genomes, both analytically and in computer simulations, and found that DNA strands in equilibrium under mutational pressure have a unique feature: the fraction of each type of nucleotide depends linearly on the time needed to substitute half of the nucleotides of that type, with a correlation coefficient close to 1. Substitution matrices found for sequences under selection pressure do not have this property. A substitution matrix for the leading strand of the Borrelia burgdorferi genome, once it has reached equilibrium in computer simulation, gives a DNA sequence whose nucleotide composition and asymmetry correspond precisely to those of the third codon positions of protein-coding genes located on the leading strand. CONCLUSIONS: The parameters of mutational pressure allow us to calculate the DNA composition at equilibrium with this pressure. By comparing any real DNA sequence with the equilibrium sequence, it is possible to estimate the distance between them, which could be used as a measure of selection pressure. Furthermore, the parameters of the mutational pressure enable direct estimation of the relative mutation rates in any DNA sequence of the studied genome.
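A minimal sketch of the two quantities the abstract relates, assuming a discrete-time model in which a hypothetical 4×4 matrix `EXAMPLE_P` gives per-generation substitution probabilities (its values are invented, not taken from B. burgdorferi or any other genome). The equilibrium composition is found by repeatedly applying the matrix to a composition vector, and the half-time of nucleotide type i is taken as the number of steps after which half of the original nucleotides of that type remain unsubstituted, i.e. ln 0.5 / ln P[i][i]. All helper names are hypothetical.

```python
import math

# Hypothetical per-generation substitution matrix, rows and columns in the order A, C, G, T.
# EXAMPLE_P[i][j] is the probability that nucleotide i is replaced by nucleotide j in one step.
# These values are invented for illustration and are NOT taken from any genome.
BASES = "ACGT"
EXAMPLE_P = [
    [0.9965, 0.0005, 0.0020, 0.0010],
    [0.0010, 0.9960, 0.0005, 0.0025],
    [0.0030, 0.0005, 0.9955, 0.0010],
    [0.0010, 0.0015, 0.0005, 0.9970],
]


def equilibrium(P, max_steps=1_000_000, tol=1e-13):
    """Apply P to a composition vector repeatedly until it stops changing (power iteration)."""
    n = len(P)
    f = [1.0 / n] * n  # start from a uniform composition
    for _ in range(max_steps):
        new = [sum(f[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, f)) < tol:
            return new
        f = new
    return f


def half_time(p_stay):
    """Steps until half of the nucleotides of one type have been substituted, given P[i][i]."""
    return math.log(0.5) / math.log(p_stay)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    f = equilibrium(EXAMPLE_P)
    t50 = [half_time(EXAMPLE_P[i][i]) for i in range(4)]
    for base, frac, t in zip(BASES, f, t50):
        print(f"{base}: equilibrium fraction {frac:.4f}, half-time {t:.1f} steps")
    print("r(fraction, half-time) =", round(pearson(f, t50), 3))
```

Running the script prints each base's equilibrium fraction and half-time together with their Pearson correlation; whether that correlation approaches 1 depends on the matrix supplied, so the invented values here say nothing about the paper's result.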
The relationships between the isoelectric point and: length of proteins, taxonomy and ecology of organisms
BACKGROUND: The distribution of the isoelectric point (pI) of proteins in a proteome is universal for all organisms. It is bimodal, dividing the proteome into two sets: acidic and basic proteins. Different species, however, differ in the relative abundance of acidic and basic proteins, which may be correlated with taxonomy, subcellular localization, ecological niche and proteome size. RESULTS: We have analysed 1784 proteomes encoded by chromosomes of Archaea, Bacteria and Eukaryota, as well as by mitochondria, plastids, prokaryotic plasmids, phages and viruses. In more than 95% of proteomes we found a significant correlation between protein length and pI: positive for acidic proteins and negative for basic ones. Plastids, viruses and plasmids encode more basic proteomes, whereas chromosomes of Archaea, Bacteria and Eukaryota, mitochondria and phages encode more acidic ones. Mitochondrial proteomes of Viridiplantae, Protista and Fungi are more basic than those of Metazoa. This results from the presence of basic proteins in the former proteomes and their absence from the latter, and is related to the reduction of metazoan genomes. A significant correlation was found between the pI bias of proteomes encoded by prokaryotic chromosomes and that of proteomes encoded by plasmids, but there is no correlation between eukaryotic nuclear-encoded proteomes and proteomes encoded by organelles. Detailed analyses of prokaryotic proteomes showed significant relationships between pI distribution and habitat, relation to the host cell and salinity of the environment, but no significant correlation with oxygen or temperature requirements. Salinity is positively correlated with the acidity of proteomes. Host-associated organisms, and especially intracellular species, have more basic proteomes than free-living ones. The higher rate of mutation accumulation in intracellular parasites and endosymbionts is responsible for the basicity of their tiny proteomes, which explains the observed positive correlation between decreasing genome size and increasing proteome basicity. The results indicate that even conserved proteins subject to strong selective constraints follow the global trend in the pI distribution. CONCLUSION: The distribution of pI of proteins in proteomes shows clear relationships with protein length, subcellular localization, taxonomy and ecology of organisms. The distribution is also strongly affected by mutational pressure, especially in intracellular organisms.
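The per-set correlation between protein length and pI can be sketched as follows, assuming a simple Henderson–Hasselbalch charge model with an illustrative pKa set and a fixed split into acidic and basic proteins at pI 7.0. Both choices are assumptions; the paper's own pI calculation and acidic/basic boundary may differ. `net_charge`, `isoelectric_point` and `length_pi_correlations` are hypothetical helper names.

```python
import math

# Illustrative pKa values (an assumption: published pKa sets differ and shift the resulting pI).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq, ph):
    """Net charge of a protein at a given pH, treating every ionisable group independently."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free terminus of each kind
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection on pH: net charge falls monotonically as pH rises, so there is at most one root."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom else math.nan


def length_pi_correlations(proteins, threshold=7.0):
    """Split a proteome into acidic (pI < threshold) and basic sets; correlate length with pI in each."""
    groups = {"acidic": [], "basic": []}
    for seq in proteins:
        pi = isoelectric_point(seq)
        groups["acidic" if pi < threshold else "basic"].append((len(seq), pi))
    result = {}
    for name, pairs in groups.items():
        if len(pairs) >= 3:  # too few proteins give a meaningless coefficient
            lengths, pis = zip(*pairs)
            result[name] = pearson(lengths, pis)
    return result
```

Passing the protein sequences of one proteome to `length_pi_correlations` returns one coefficient for each set that contains at least three proteins.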
Benchmarks in antimicrobial peptide prediction are biased due to the selection of negative data
Antimicrobial peptides (AMPs) are a heterogeneous group of short polypeptides that target not only microorganisms but also viruses and cancer cells. Because resistance to them is selected less readily than resistance to traditional antibiotics, AMPs have been attracting ever-growing attention from researchers, including bioinformaticians. Machine learning is the most cost-effective method for novel AMP discovery, and consequently many computational tools for AMP prediction have recently been developed. In this article, we investigate the impact of negative data sampling on model performance and benchmarking. We generated 660 predictive models using 12 machine learning architectures, a single positive data set and 11 negative data sampling methods; the architectures and methods were defined on the basis of published AMP prediction software. Our results clearly indicate that training and benchmark data sets that are similar, i.e. produced by the same or a similar negative data sampling method, positively affect model performance. Consequently, all the benchmark analyses that have been performed for AMP prediction models are significantly biased, and, moreover, we do not know which model is the most accurate. To provide researchers with reliable information about the performance of AMP predictors, we also created a web server, AMPBenchmark, for fair model benchmarking. AMPBenchmark is available at http://BioGenies.info/AMPBenchmark.
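To make "negative data sampling" concrete, here are two hypothetical samplers of the kind such studies compare: one cuts length-matched fragments from background (non-AMP) proteins, the other generates random length-matched sequences with the background amino-acid composition. Neither is claimed to be one of the 11 methods used in the article; the function names and parameters are illustrative assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fragment_negatives(positives, background, seed=0):
    """For each positive peptide, cut a same-length fragment from a random background protein."""
    rng = random.Random(seed)
    negatives = []
    for pep in positives:
        k = len(pep)
        candidates = [p for p in background if len(p) >= k]
        if not candidates:
            continue  # no background protein is long enough for this length; skip it
        prot = rng.choice(candidates)
        start = rng.randrange(len(prot) - k + 1)
        negatives.append(prot[start:start + k])
    return negatives


def composition_negatives(positives, background, seed=0):
    """For each positive peptide, draw a same-length random sequence with background residue frequencies."""
    rng = random.Random(seed)
    residues = "".join(background)
    weights = [residues.count(aa) for aa in AMINO_ACIDS]
    return ["".join(rng.choices(AMINO_ACIDS, weights=weights, k=len(pep))) for pep in positives]
```

Because the two samplers yield negatives with different sequence statistics, a model trained and benchmarked on negatives from the same sampler may score higher than it would against the other, which is the kind of bias the abstract reports.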
Comprehensive molecular characterization of gastric adenocarcinoma
Gastric cancer is a leading cause of cancer deaths, but analysis of its molecular and clinical characteristics has been complicated by histological and aetiological heterogeneity. Here we describe a comprehensive molecular evaluation of 295 primary gastric adenocarcinomas as part of The Cancer Genome Atlas (TCGA) project. We propose a molecular classification dividing gastric cancer into four subtypes: tumours positive for Epstein–Barr virus, which display recurrent PIK3CA mutations, extreme DNA hypermethylation, and amplification of JAK2, CD274 (also known as PD-L1) and PDCD1LG2 (also known as PD-L2); microsatellite unstable tumours, which show elevated mutation rates, including mutations of genes encoding targetable oncogenic signalling proteins; genomically stable tumours, which are enriched for the diffuse histological variant and mutations of RHOA or fusions involving RHO-family GTPase-activating proteins; and tumours with chromosomal instability, which show marked aneuploidy and focal amplification of receptor tyrosine kinases. Identification of these subtypes provides a roadmap for patient stratification and trials of targeted therapies.
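The four-way split can be read as a sequence of checks. The sketch below assumes a fixed order (EBV status first, then microsatellite instability, then the degree of aneuploidy separating chromosomally unstable from genomically stable tumours), a hypothetical `aneuploidy_score` field and an arbitrary cutoff of 0.5; it is a simplified illustration, not the TCGA classification procedure. The labels EBV, MSI, CIN and GS are shorthand for the four subtypes named above.

```python
from dataclasses import dataclass


@dataclass
class TumourProfile:
    """Hypothetical per-tumour summary; field names are illustrative, not TCGA data fields."""
    ebv_positive: bool
    msi_high: bool
    aneuploidy_score: float  # hypothetical copy-number burden, 0 = diploid


def gastric_subtype(t: TumourProfile, aneuploidy_cutoff: float = 0.5) -> str:
    """Assign one of the four subtypes using a fixed order of checks (an assumed, simplified rule)."""
    if t.ebv_positive:
        return "EBV"
    if t.msi_high:
        return "MSI"
    # Remaining tumours are split by the degree of aneuploidy.
    return "CIN" if t.aneuploidy_score >= aneuploidy_cutoff else "GS"
```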
Three dimensions in a tropical forest. The colonial plantation La Caroline in French Guiana
Successful fieldwork in a tropical forest requires both a comprehensive preparatory scenario and a framework of everyday methods. Dense vegetation impedes navigation of the terrain and interpretation of the relations between the anthropogenic and natural landscapes. Additionally, the forest hampers the use of accurate measurement systems such as GNSS receivers, total stations or UAV platforms. It also complicates logistics, effectively limiting the amount of equipment that can be used. L'habitation La Caroline: Parts of the colonial plantation La Caroline were excavated in 2016. Located 30 km from Cayenne near the river Mahury, it was active from the end of the 18th century until the 1870s. Cloves and rokou were its main crops. It had a regular, somewhat iterative layout. The remains of some buildings, e.g. the maisons de maître, a free-standing kitchen and slave or labourer quarters, are still discernible. A monumental stairway leading down to a now defunct canal and road system is also preserved. Today, the whole area is covered with tropical forest and the building remains are hidden in thick undergrowth. Large-scale overview: Airborne Laser Scanning (ALS) was conducted during the preparatory stage. This allowed us to create accurate models of land relief and land use. The measurements were made using a RIEGL LMS-Q560. High-density point clouds were obtained, but because of the thick vegetation the measurements classified as terrain covered the area irregularly. This necessitated a series of experiments aimed at finding the optimal classification parameters. Eventually, an area of 17.5 km² was measured. It encompassed several adjacent plantations and revealed the spatial layout of buildings, elevated fields, roads and other infrastructure. Feature recording: Photogrammetry was used during the fieldwork to supplement traditional recording. This saved time spent on documentation and provided high-resolution data. Structure-from-motion techniques were especially useful for recording highly detailed and complex contexts such as the architectural remains. DEMs, orthophotographs and 3D models were the final products of field image-based modelling. Combined with comparative studies of similar structures, they allowed us to model, reconstruct and visualise the buildings in 2D and 3D. Summary: Although the range of measurement methods that could be used reliably in the dense rainforest vegetation was quite limited, those methods nonetheless proved highly effective and yielded valuable results. ALS was used both for prospection and for recording. It provided unique data sets imaging otherwise inaccessible terrain. Feature photogrammetry made it possible to record the features found during the excavations quickly and at high resolution, with minimal time and equipment requirements. The data became the basis for models that allowed us to notice new details during post-excavation analyses of the architecture. The spatial data helped to image structures at many different scales: from the landscape, through the relations between buildings, down to excavation features. Importantly, all data were unified in a common mapping system, readily usable in CAD or GIS environments. Finally, the digital data provide a basis for multi-scale 3D visualisations, which are progressively being uploaded online as interactive models. As such, they promote not only our results and the methods we used, but also historical archaeology in French Guiana.
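The classification problem described above, where dense canopy leaves the returns classified as terrain irregularly distributed, can be illustrated with a very simple grid-minimum ground filter. This is a hypothetical sketch, not the algorithm or the parameters used for the RIEGL data: each return is labelled ground if it lies within `height_tolerance` of the lowest return in its `cell_size` grid cell.

```python
import math
from collections import defaultdict


def classify_ground(points, cell_size=1.0, height_tolerance=0.5):
    """Label each (x, y, z) return as ground if it lies within height_tolerance of its cell's lowest return."""
    cells = defaultdict(list)
    for idx, (x, y, _z) in enumerate(points):
        # Bin returns into square grid cells of side cell_size.
        cells[(math.floor(x / cell_size), math.floor(y / cell_size))].append(idx)
    is_ground = [False] * len(points)
    for indices in cells.values():
        z_min = min(points[i][2] for i in indices)
        for i in indices:
            is_ground[i] = points[i][2] - z_min <= height_tolerance
    return is_ground
```

Where no pulse reached the forest floor within a cell, the lowest return is vegetation and is wrongly labelled ground, so changing `cell_size` and `height_tolerance` shifts the balance between coverage gaps and false terrain, which illustrates why such parameters may need experimental tuning.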
Distributions of the correlation coefficients between pI value and length of proteins calculated separately for acidic and basic sets of proteomes
Copyright information: Taken from "The relationships between the isoelectric point and: length of proteins, taxonomy and ecology of organisms", http://www.biomedcentral.com/1471-2164/8/163. BMC Genomics 2007;8:163. Published online 12 Jun 2007. PMCID: PMC1905920.
Relationship between the pI bias and: (A) logarithm of proteome size and (B) genomic GC content for different ecological groups of prokaryotes
Copyright information: Taken from "The relationships between the isoelectric point and: length of proteins, taxonomy and ecology of organisms", http://www.biomedcentral.com/1471-2164/8/163. BMC Genomics 2007;8:163. Published online 12 Jun 2007. PMCID: PMC1905920.