Genome annotation of the 1.2MB Region on chromosome 8p22-p23.1 harbouring the gene for Keratolytic Winter Erythema (KWE)
Keratolytic winter erythema (KWE), or Oudtshoorn skin disease, is a rare autosomal dominant skin disorder for which the genetic cause remains unknown. The disorder manifests as erythema and hyperkeratosis of the palmar-plantar regions and has been linked to a 1.2Mb region on chromosome 8p22-23.1 between markers D8S1759 and D8S552. A prevalence of 1/7200 has been observed in the South African Afrikaans-speaking white population, with a lower unspecified prevalence occurring in the coloured South African population. A number of positional candidate genes within the critical region have been assessed for pathogenic mutations; however, to date the causative gene has not been identified.
The objective of the current study was to examine the KWE critical region for highly conserved coding and non-coding regions and copy number variants (CNVs) and to determine whether these regions may play a role in the molecular etiology of the disease. Highly conserved regions were identified based on sequence conservation across a range of evolutionarily diverse organisms. These regions were further analysed for possible protein-coding gene structure, regulatory motifs and RNA secondary structure. In addition, a custom CGH tiling array (384K Roche-Nimblegen) was used to identify CNVs across the extended KWE critical region in both affected and unaffected individuals. The multi-species sequence alignment revealed eight regions that showed a high level of conservation above a 70% threshold. Functional analysis of two of the conserved regions led to the identification of a novel protein-coding gene, deubiquitinating enzyme 3 (DUB3), within the critical region, which presented as a credible functional candidate for KWE. Two of the conserved regions were identified within an open reading frame, c8orf13, which has previously been examined and found to contain no pathogenic mutations that segregate with the KWE phenotype. The remaining four highly conserved regions were found within non-coding sequence, and computational analysis revealed putative regulatory motifs in the form of transcription factor binding sites. The copy number variation analysis did not show evidence for the presence of any large or small consistent CNV alleles likely to impact on any of the functional candidate genes in the KWE critical region. No CNV allele was observed in all of the KWE-affected individuals examined while being absent from unaffected family members. A significant variation in copy number was, however, observed in affected individuals within a previously defined copy number variable beta-defensin gene cluster which has been associated with psoriasis.
Although the exact copy number of the cluster could not be determined in the present study due to cross-hybridization between genes in the family, the CNV observed in affected individuals for the cluster suggests that it may be involved in the modulation of the clinical severity of KWE.
The present study has led to the identification of a previously uncharacterised gene, DUB3, within the KWE critical region, which furthermore presented as a plausible functional candidate for the KWE phenotype. In addition, it has revealed that the molecular cause of KWE is unlikely to be exclusively due to copy number variation within the genes in the critical region. The current study has provided valuable insight into the KWE-linked critical region and revealed a number of potential regions of interest to be examined in further studies exploring the molecular cause of the disease.
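The abstract reports regions conserved above a 70% threshold but does not say how conservation was scored. As a rough illustration only, the sketch below scans one pair of pre-aligned sequences with a sliding window and reports windows whose percent identity exceeds the threshold. The function name, the window size and the toy sequences are assumptions made for this example and are not taken from the study, which used a multi-species alignment.

```python
def conserved_windows(ref, other, window=100, threshold=0.70):
    """Return (start, identity) for each window of two aligned sequences
    whose fraction of identical, non-gap positions exceeds `threshold`.

    Hypothetical helper: `ref` and `other` must already be aligned
    (same length, '-' marking gaps)."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have the same length")
    if window <= 0:
        raise ValueError("window must be positive")
    hits = []
    for start in range(0, len(ref) - window + 1):
        pairs = zip(ref[start:start + window], other[start:start + window])
        # A position counts as conserved only if both bases match and
        # neither is a gap.
        matches = sum(1 for a, b in pairs if a == b and a != "-")
        identity = matches / window
        if identity > threshold:
            hits.append((start, identity))
    return hits


# Toy aligned pair (invented data), scanned with a 4-column window.
hits = conserved_windows("ACGTACGTAC", "ACGTTTTTAC", window=4)
print(hits)
```

Overlapping windows that pass the threshold would normally be merged into contiguous regions before any downstream analysis; that step is omitted here to keep the sketch short.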
Raster based coastal marsh classification within the Galveston Bay ecosystem, Texas
A mapping study using remote sensing software called ENVI was conducted utilizing four
software algorithms to investigate whether these techniques could accurately classify habitat types and
vegetation communities along West Bay of the Galveston Bay Ecosystem from color infra-red (CIR)
imagery. The algorithms were used in a small-scale study to investigate which of these techniques could
most accurately distinguish habitat types and vegetation communities from the imagery at a site specific
location. The most accurate algorithm of the four was used in a large-scale classification study in which
entire images were classified utilizing the same data from the small-scale study.
Regions of interest (ROIs) were used within ENVI to specify areas of interest within each image
that was classified. The locations of ROIs were recorded using a GPS prior to classification, then each
was added into ENVI as data points, and each ROI polygon was digitized according to its respective pixel
color. Once all of the ROI polygons were completed, each software algorithm was employed.
After classification, each habitat type and vegetation community was ground-truthed in order to
verify the accuracy of the algorithms. The position points were added as ground truth points within ENVI
and an accuracy matrix was assessed. The technique with the greatest averaged accuracy within the small-scale
study was selected for the large-scale study. The ROIs and ground truth points used in the small-scale
study were used again in the large-scale study.
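The accuracy assessment described above can be sketched as a confusion (accuracy) matrix built from ground-truth labels and classified labels. This is a minimal illustration, not ENVI's own report: the function name, the class names and the sample labels are all hypothetical.

```python
from collections import Counter


def accuracy_matrix(truth, predicted):
    """Build a confusion matrix (rows: ground truth, columns: classified)
    and return it with the overall and per-class accuracy."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth:
        raise ValueError("at least one ground-truth point is required")
    counts = Counter(zip(truth, predicted))
    classes = sorted(set(truth) | set(predicted))
    matrix = [[counts[(t, p)] for p in classes] for t in classes]
    # Overall accuracy: correctly classified points over all points.
    correct = sum(matrix[i][i] for i in range(len(classes)))
    overall = correct / len(truth)
    # Per-class accuracy: correct points over ground-truth points of that class.
    per_class = {}
    for i, name in enumerate(classes):
        row_total = sum(matrix[i])
        per_class[name] = matrix[i][i] / row_total if row_total else float("nan")
    return classes, matrix, overall, per_class


# Invented ground-truth points and classifier output.
truth = ["marsh", "marsh", "water", "upland", "water", "marsh"]
predicted = ["marsh", "water", "water", "upland", "water", "upland"]
classes, matrix, overall, per_class = accuracy_matrix(truth, predicted)
print(classes, matrix, round(overall, 3), per_class)
```

Averaging the overall accuracy across sites or image years, as the study did, is then a matter of repeating this calculation per image and taking the mean.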
The small-scale study concluded that the Parallelepiped algorithm produced significantly less
accurate classifications than the other three. Although the Mahalanobis algorithm was not significantly
different from the other two algorithms, it yielded the highest overall average accuracy and was used in the
large-scale study. In both the small-scale and large-scale studies there was no significant difference in the
two different years of aerial imagery and there were no significant differences in accuracy for locations.
None of the software algorithms classified habitat types and vegetation communities accurately
using the imagery. The accuracy for the Mahalanobis algorithm was less than 60%. Inaccuracies were
largely due to overlapping spectral signatures among habitat types and vegetation communities.
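For readers unfamiliar with the Mahalanobis approach named above, the following sketch assigns a pixel to the class whose mean spectrum is nearest in Mahalanobis distance. It assumes a single covariance matrix pooled across all training classes and uses invented three-band training values; it is an illustration of the general technique, not a reproduction of ENVI's implementation or of the study's data.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length band-value rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_covariance(training):
    """Pooled within-class covariance; `training` maps class -> rows."""
    bands = len(next(iter(training.values()))[0])
    total = [[0.0] * bands for _ in range(bands)]
    n = 0
    for rows in training.values():
        mu = mean_vector(rows)
        for row in rows:
            d = [x - m for x, m in zip(row, mu)]
            for i in range(bands):
                for j in range(bands):
                    total[i][j] += d[i] * d[j]
        n += len(rows)
    dof = n - len(training)
    return [[v / dof for v in r] for r in total]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mahalanobis_sq(pixel, mu, cov):
    """Squared Mahalanobis distance d^T cov^-1 d, with d = pixel - mu."""
    d = [p - m for p, m in zip(pixel, mu)]
    return sum(di * yi for di, yi in zip(d, solve(cov, d)))


def classify(pixel, training, cov):
    """Return the class whose mean is closest to `pixel`."""
    means = {c: mean_vector(rows) for c, rows in training.items()}
    return min(means, key=lambda c: mahalanobis_sq(pixel, means[c], cov))


# Invented training pixels: three band values per pixel, two classes.
training = {
    "water": [[10, 20, 30], [12, 22, 29], [11, 19, 31], [9, 21, 32]],
    "marsh": [[120, 60, 50], [118, 63, 52], [122, 58, 49], [121, 61, 53]],
}
cov = pooled_covariance(training)
print(classify([11, 20, 30], training, cov), classify([119, 60, 51], training, cov))
```

When class spectra overlap, as the abstract reports, the distances to two or more class means become similar and the nearest-mean decision becomes unreliable, which is consistent with the low accuracies observed.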
Ten simple rules for developing bioinformatics capacity at an academic institution
Bioinformatics is an applied interdisciplinary field whose primary purpose is to develop and
deploy computational techniques to store, organize, and aid in the analysis and interpretation
of large-scale data obtained from biological systems. While rooted in the analysis of nucleotide
and protein sequences, it now encompasses techniques targeting multiple data acquisition
modalities and seeks to comprehend the functioning of biological systems at many different
levels. Bioinformaticians need to be cognizant of diverse scientific fields: basic and molecular
biology, genetics, mathematics, statistics, and computer science at a minimum, thus requiring
a thoroughly interdisciplinary set of skills to successfully carry out their duties. Due to the
growing importance of bioinformatics in enabling modern biomedical research, programs and
core facilities have been established in most academic institutions in the developed world over
the last 30 years.
The elusive gene for keratolytic winter erythema
Keratolytic winter erythema (KWE), also known as Oudtshoorn skin disease, is characterised by a cyclical disruption of normal epidermal keratinisation affecting primarily the palmoplantar skin with peeling of the palms and soles, which is worse in the winter. It is a rare monogenic, autosomal dominant condition of unknown cause. However, due to a founder effect, it occurs at a prevalence of 1/7 200 among South African Afrikaans-speakers. In the mid-1980s, samples were collected from affected families for a linkage study to pinpoint the location of the KWE gene. A genome-wide linkage analysis, using microsatellite markers, identified the KWE critical region on chromosome 8p23.1-p22. Subsequent genetic studies focused on screening candidate genes in this critical region; however, no pathogenic mutations that segregated exclusively with KWE were identified. The cathepsin B (CTSB) and farnesyl-diphosphate farnesyltransferase 1 (FDFT1) genes revealed no potentially pathogenic variants, nor did they show differential gene expression in affected skin. Mutation detection in additional candidate genes also failed to identify the KWE-associated variant, suggesting that the causal variant may be in an uncharacterised functional region. Bioinformatic analysis revealed highly conserved regions within the KWE critical region and a custom tiling array was designed to cover this region and to search for copy number variation. Although the study did not identify a variant that segregates exclusively with KWE, it provided valuable insight into the complex KWE-linked region. Next-generation sequencing approaches are being used to comb the region, but the causal variant for this interesting hyperkeratotic palmoplantar phenotype still remains elusive.
High-depth African genomes inform human migration and health
The African continent is regarded as the cradle of modern humans and African
genomes contain more genetic variation than those from any other continent, yet
only a fraction of the genetic diversity among African individuals has been surveyed [1].
Here we performed whole-genome sequencing analyses of 426 individuals—
comprising 50 ethnolinguistic groups, including previously unsampled populations—
to explore the breadth of genomic diversity across Africa. We uncovered more than
3 million previously undescribed variants, most of which were found among
individuals from newly sampled ethnolinguistic groups, as well as 62 previously
unreported loci that are under strong selection, which were predominantly found in
genes that are involved in viral immunity, DNA repair and metabolism. We observed
complex patterns of ancestral admixture and putative-damaging and novel variation,
both within and between populations, alongside evidence that Zambia was
a likely intermediate site along the routes of expansion of Bantu-speaking
populations. Pathogenic variants in genes that are currently characterized
as medically relevant were uncommon—but in other genes, variants denoted as ‘likely
pathogenic’ in the ClinVar database were commonly observed. Collectively, these
findings refine our current understanding of continental migration, identify gene flow
and the response to human disease as strong drivers of genome-level population
variation, and underscore the scientific imperative for a broader characterization of
the genomic diversity of African individuals to understand human ancestry and
improve health.
Genetic-substructure and complex demographic history of South African Bantu speakers
South Eastern Bantu-speaking (SEB) groups constitute more than 80% of the population in South Africa. Despite clear linguistic and geographic diversity, the genetic differences between these groups have not been systematically investigated. Based on genome-wide data of over 5000 individuals, representing eight major SEB groups, we provide strong evidence for fine-scale population structure that broadly aligns with geographic distribution and is also congruent with linguistic phylogeny (separation of Nguni, Sotho-Tswana and Tsonga speakers). Although differential Khoe-San admixture plays a key role, the structure persists after Khoe-San ancestry-masking. The timing of admixture, levels of sex-biased gene flow and population size dynamics also highlight differences in the demographic histories of individual groups. The comparisons with five Iron Age farmer genomes further support genetic continuity over ∼400 years in certain regions of the country. Simulated trait genome-wide association studies further show that the observed population structure could have major implications for biomedical genomics research in South Africa.
Genetic substructure and complex demographic history of South African Bantu speakers
Abstract: South Eastern Bantu-speaking (SEB) groups constitute more than 80% of the population in South Africa. Despite clear linguistic and geographic diversity, the genetic differences between these groups have not been systematically investigated. Based on genome-wide data of over 5000 individuals, representing eight major SEB groups, we provide strong evidence for fine-scale population structure that broadly aligns with geographic distribution and is also congruent with linguistic phylogeny (separation of Nguni, Sotho-Tswana and Tsonga speakers). Although differential Khoe-San admixture plays a key role, the structure persists after Khoe-San ancestry-masking. The timing of admixture, levels of sex-biased gene flow and population size dynamics also highlight differences in the demographic histories of individual groups. The comparisons with five Iron Age farmer genomes further support genetic continuity over ~400 years in certain regions of the country. Simulated trait genome-wide association studies further show that the observed population structure could have major implications for biomedical genomics research in South Africa.
Population-specific common SNPs reflect demographic histories and highlight regions of genomic plasticity with functional relevance
Abstract
Background
Population differentiation is the result of demographic and evolutionary forces. Whole genome datasets from the 1000 Genomes Project (October 2012) provide an unbiased view of genetic variation across populations from Europe, Asia, Africa and the Americas. Common population-specific SNPs (MAF > 0.05) reflect a deep history and may have important consequences for health and wellbeing. Their interpretation is contextualised by currently available genome data.
Results
The identification of common population-specific (CPS) variants (SNPs and SSV) is influenced by admixture and the sample size under investigation. Nine of the populations in the 1000 Genomes Project (2 African, 2 Asian (including a merged Chinese group) and 5 European) revealed that the African populations (LWK and YRI), followed by the Japanese (JPT) have the highest number of CPS SNPs, in concordance with their histories and given the populations studied. Using two methods, sliding 50-SNP and 5-kb windows, the CPS SNPs showed distinct clustering across large genome segments and little overlap of clusters between populations. iHS enrichment score and the population branch statistic (PBS) analyses suggest that selective sweeps are unlikely to account for the clustering and population specificity. Of interest is the association of clusters close to recombination hotspots. Functional analysis of genes associated with the CPS SNPs revealed over-representation of genes in pathways associated with neuronal development, including axonal guidance signalling and CREB signalling in neurones.
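The population branch statistic (PBS) mentioned above is commonly computed from three pairwise FST values: each FST is converted to a branch length T = -ln(1 - FST), and the branch leading to the focal population is (T_AB + T_AC - T_BC) / 2. The sketch below shows that standard formulation; the function names and the FST values are invented for illustration, and the abstract does not state how PBS was computed in the study.

```python
import math


def branch_length(fst):
    """Convert a pairwise FST into a divergence time-scaled branch length."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("FST must be in the interval [0, 1)")
    return -math.log(1.0 - fst)


def pbs(fst_ab, fst_ac, fst_bc):
    """PBS for focal population A, given FST(A,B), FST(A,C) and FST(B,C)."""
    t_ab = branch_length(fst_ab)
    t_ac = branch_length(fst_ac)
    t_bc = branch_length(fst_bc)
    return (t_ab + t_ac - t_bc) / 2.0


# Invented values: A is strongly differentiated from both B and C at this
# locus, while B and C are similar to each other.
focal = pbs(0.40, 0.45, 0.05)
print(round(focal, 4))
```

A large PBS at a locus indicates an unusually long branch for the focal population, which is why the statistic is used to look for population-specific selection.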
Conclusions
Common population-specific SNPs are non-randomly distributed throughout the genome and are significantly associated with recombination hotspots. Since the variant alleles of most CPS SNPs are the derived allele, they likely arose in the specific population after a split from a common ancestor. Their proximity to genes involved in specific pathways, including neuronal development, suggests evolutionary plasticity of selected genomic regions. Contrary to expectation, selective sweeps did not play a large role in the persistence of population-specific variation. This suggests a stochastic process towards population-specific variation which reflects demographic histories and may have some interesting implications for health and susceptibility to disease.
Using a multiple-delivery-mode training approach to develop local capacity and infrastructure for advanced bioinformatics in Africa
With more microbiome studies being conducted by African-based research groups, there is an increasing demand for knowledge and skills in the design and analysis of microbiome studies and data. However, high-quality bioinformatics courses are often impeded by differences in computational environments, complicated software stacks, numerous dependencies, and versions of bioinformatics tools along with a lack of local computational infrastructure and expertise. To address this, H3ABioNet developed a 16S rRNA Microbiome Intermediate Bioinformatics Training course, extending its remote classroom model. The course was developed alongside experienced microbiome researchers, bioinformaticians, and systems administrators, who identified key topics to address. Development of containerised workflows has previously been undertaken by H3ABioNet, and Singularity containers were used here to enable the deployment of a standard replicable software stack across different hosting sites. The pilot ran successfully in 2019 across 23 sites registered in 11 African countries, with more than 200 participants formally enrolled and 106 volunteer staff for onsite support.
Designing a course model for distance-based online bioinformatics training in Africa: the H3ABioNet experience
Africa is not unique in its need for basic bioinformatics training for individuals from a diverse
range of academic backgrounds. However, particular logistical challenges in Africa, most
notably access to bioinformatics expertise and internet stability, must be addressed in order
to meet this need on the continent. H3ABioNet (www.h3abionet.org), the Pan African Bioinformatics
Network for H3Africa, has therefore developed an innovative, free-of-charge
"Introduction to Bioinformatics" course, taking these challenges into account as part of its
educational efforts to provide on-site training and develop local expertise inside its network.
A multiple-delivery-mode learning model was selected for this 3-month course in order to
increase access to (mostly) African, expert bioinformatics trainers. The content of the
course was developed to include a range of fundamental bioinformatics topics at the introductory
level. For the first iteration of the course (2016), classrooms with a total of 364
enrolled participants were hosted at 20 institutions across 10 African countries. To ensure
that classroom success did not depend on stable internet, trainers pre-recorded their lectures,
and classrooms downloaded and watched these locally during biweekly contact sessions.
The trainers were available via video conferencing to take questions during contact sessions, as well as via online "question and discussion" forums outside of contact session time. This learning model, developed for a resource-limited setting, could easily be adapted
to other settings.