57 research outputs found
MapOptics: A light-weight, cross-platform visualisation tool for optical mapping alignment
Availability and implementation:
MapOptics is implemented in Java 1.8 and released under an MIT licence. MapOptics can be downloaded from https://github.com/FadyMohareb/mapoptics and run on any standard desktop computer equipped with a Java Virtual Machine (JVM).
Supplementary data are available at Bioinformatics online.
Bionano optical mapping is a technology that can assist in the final stages of genome assembly by lengthening and ordering scaffolds in a draft assembly by aligning the assembly to a genomic map. However, current visualisation tools are either limited to the Windows operating system or were initially developed for visualising large-scale structural variation. MapOptics is a lightweight cross-platform tool that enables the user to visualise and interact with the alignment of Bionano optical mapping data and can be used for in-depth exploration of hybrid scaffolding alignments. It provides a fast, simple alternative to the large optical mapping analysis programs currently available for this area of research.
Tersect: a set theoretical utility for exploring sequence variant data
Comparing genomic features among a large panel of individuals of the same species is now considered a core part of bioinformatics analysis. This typically involves a series of complex theoretical expressions to compare, intersect, and extract symmetric differences between individuals within a large set of genotypes. Several publicly available tools are capable of performing such tasks; however, due to the sheer number of variants being queried, such tasks can be computationally expensive, with a runtime ranging from a few minutes up to several hours depending on the dataset size. This makes existing tools unsuitable for interactive data query or as part of genomic data visualization platforms such as genome browsers. Tersect is a lightweight, high-performance command-line utility which interprets and applies flexible set theoretical expressions to sets of sequence variant data. It can be used both for interactive data exploration and as part of a larger pipeline thanks to its highly optimized storage and indexing algorithms for variant data.
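The kind of set theoretical query described above can be illustrated with Python's built-in `set` type. This is only a minimal sketch of the underlying operations, not Tersect's query syntax, storage format, or implementation; the sample names and variants are invented for illustration.

```python
# Hypothetical variant sets, each variant keyed by
# (chromosome, position, reference allele, alternate allele).
# Sample names and variants are invented, not taken from any dataset.
genotypes = {
    "sample_a": {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 40, "G", "A")},
    "sample_b": {("chr1", 100, "A", "G"), ("chr2", 40, "G", "A"), ("chr2", 90, "T", "C")},
    "sample_c": {("chr1", 100, "A", "G"), ("chr1", 300, "G", "C")},
}

a, b, c = (genotypes[name] for name in ("sample_a", "sample_b", "sample_c"))

shared_by_all = a & b & c   # intersection: variants present in every sample
only_in_a = a - (b | c)     # difference: variants unique to sample_a
a_xor_b = a ^ b             # symmetric difference: in exactly one of a, b

print(sorted(shared_by_all))
print(sorted(only_in_a))
print(sorted(a_xor_b))
```

In-memory sets like these do not scale to large panels, which is the gap the abstract says optimized storage and indexing are meant to close.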
Visceral fat mass as a novel risk factor for predicting gestational diabetes in obese pregnant women
Objective
To develop a model to predict gestational diabetes mellitus incorporating classical and a novel risk factor, visceral fat mass.
Methods
Three hundred two obese non-diabetic pregnant women underwent body composition analysis at booking by bioimpedance analysis. Of this cohort, 72 (24%) developed gestational diabetes mellitus. Principal component analysis was initially performed to identify possible clustering of the gestational diabetes mellitus and non-GDM groups. A machine learning algorithm was then applied to develop a GDM predictive model utilising random forest and decision tree modelling.
Results
The predictive model was trained on 227 samples and validated using an independent testing subset of 75 samples where the model achieved a validation prediction accuracy of 77.53%. According to the decision tree developed, visceral fat mass emerged as the most important variable in determining the risk of gestational diabetes mellitus.
Conclusions
We present a model incorporating visceral fat mass, which is a novel risk factor in predicting gestational diabetes mellitus in obese pregnant women.
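The train/test partition reported above (227 training and 75 testing samples out of 302) can be sketched as a simple random hold-out split. The abstract does not say how the split was performed, so the shuffling strategy, the seed, and the `split_indices` helper are assumptions made purely for illustration.

```python
import random


def split_indices(n_samples, n_test, seed=0):
    """Return (train, test) index lists from a seeded random shuffle.

    Hypothetical helper: the study's actual partitioning procedure
    is not described in the abstract.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return indices[n_test:], indices[:n_test]


# Cohort sizes taken from the abstract: 302 women, 75 held out for testing.
train_idx, test_idx = split_indices(n_samples=302, n_test=75)
print(len(train_idx), len(test_idx))
```

Keeping the test indices disjoint from the training indices is what makes the reported validation accuracy an estimate on unseen samples.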
Novel approaches for food safety management and communication
The current safety and quality controls in the food chain are lacking or inadequately applied and fail to prevent microbial and/or chemical contamination of food products, which leads to reduced confidence among consumers.
On the other hand, to meet market demands, food business operators (producers, retailers, resellers) and regulators need to develop and apply structured quality and safety assurance systems based on thorough risk analysis and prevention, through monitoring, recording and controlling of critical parameters covering the entire product's life cycle.
However, the production, supply, and processing sectors of the food chain are fragmented, and this lack of cohesion results in a failure to adopt new and innovative technologies, products and processes.
Using information technologies (for example, data storage, communication, cloud) in tandem with data science (for example, data mining, pattern recognition, uncertainty modelling, artificial intelligence) through the whole food chain, including processing within the food industry, retailers and even consumers, will provide stakeholders with novel tools for implementing a more efficient food safety management system.
VarGen: An R package for disease-associated variant discovery and annotation
Over the past decade, there has been an exponential increase in the amount of disease-related genomic data available in public databases. However, this high-quality information is spread across independent sources and researchers often need to access these separately. Hence, there is a growing need for tools that gather and compile this information in an easy and automated manner. Here we present “VarGen”, an easy-to-use, customisable R package that fetches, annotates and ranks variants related to diseases and genetic disorders, using a collection of public databases (viz. OMIM, FANTOM5, GTEx and the GWAS catalog). This package is also capable of annotating these variants to identify the most impactful ones. We expect that this tool will benefit the research of variant-disease relationships.
Study of microRNAs-21/221 as potential breast cancer biomarkers in Egyptian women
microRNAs (miRNAs) play an important role in cancer prognosis. They are small molecules, approximately 17–25 nucleotides in length, and their high stability in human serum supports their use as novel diagnostic biomarkers of cancer and other pathological conditions. In this study, we analyzed the expression patterns of miR-21 and miR-221 in the serum from a total of 100 Egyptian female subjects with breast cancer, fibroadenoma, and healthy control subjects. Using microarray-based expression profiling followed by real-time polymerase chain reaction validation, we compared the levels of the two circulating miRNAs in the serum of patients with breast cancer (n = 50), fibroadenoma (n = 25), and healthy controls (n = 25). The miRNA SNORD68 was chosen as the housekeeping endogenous control. We found that the serum levels of miR-21 and miR-221 were significantly overexpressed in breast cancer patients compared to normal controls and fibroadenoma patients. Receiver Operating Characteristic (ROC) curve analysis revealed that miR-21 has greater potential in discriminating between breast cancer patients and the control group, while miR-221 has greater potential in discriminating between breast cancer and fibroadenoma patients. Classification models using k-Nearest Neighbor (kNN), Naïve Bayes (NB), and Random Forests (RF) were developed using expression levels of both miR-21 and miR-221. The best classification performance was achieved by the NB classification models, reaching 91% correct classification. Furthermore, relative miR-221 expression was associated with histological tumor grades. Therefore, it may be concluded that both miR-21 and miR-221 can be used to differentiate between breast cancer patients and healthy controls, but that the diagnostic accuracy of serum miR-21 is superior to miR-221 for breast cancer prediction. miR-221 has more diagnostic power in discriminating between breast cancer and fibroadenoma patients. The overexpression of miR-221 has been associated with the breast cancer grade. We also demonstrated that the combined expression of miR-21 and miR-221 can be successfully applied as breast cancer biomarkers.
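The discriminating power that a ROC curve analysis measures is commonly summarised by the area under the curve (AUC). A minimal sketch of that calculation follows, using the pairwise-comparison definition of AUC. The expression values are invented for illustration and are not data from the study, and `roc_auc` is a hypothetical helper rather than the authors' analysis code.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Invented relative-expression values for one marker (illustration only).
cancer = [3.1, 2.4, 4.0, 1.9]
control = [1.0, 2.4, 0.8, 1.5]

auc = roc_auc(cancer, control)
print(auc)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so comparing AUCs across markers is one way to express a statement such as "miR-21 has greater potential" in a given comparison.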
Identification of meat spoilage gene biomarkers in Pseudomonas putida using gene profiling
While current food science research mainly focuses on microbial changes in food products that lead to foodborne illnesses, meat spoilage remains an unsolved problem for the meat industry. This can result in important economic losses, food waste and loss of consumer confidence in the meat market. Gram-negative bacteria involved in meat spoilage are aerobes or facultative anaerobes. These represent the group with the greatest meat spoilage potential, where Pseudomonas tend to dominate the microbial consortium under refrigeration and aerobic conditions. Identifying stress response genes under different environmental conditions can help researchers gain an understanding of how Pseudomonas adapts to current packaging and storage conditions. We examined the gene expression profile of Pseudomonas putida KT2440, which plays an important role in the spoilage of meat products. Gene expression profiles were evaluated to select the most differentially expressed genes at different temperatures (30 °C and 10 °C) and decreasing glucose concentrations, in order to identify key genes actively involved in the spoilage process. A total of 739 and 1269 genes were found to be differentially expressed at 30 °C and 10 °C respectively, of which 430 and 568 genes were overexpressed and 309 and 701 genes were repressed at 30 °C and 10 °C respectively.
Biochemical profile of heritage and modern apple cultivars and application of machine learning methods to predict usage, age, and harvest season
The present study represents the first major attempt to characterise the biochemical profile in different tissues of a large selection of apple cultivars sourced from the UK’s National Fruit Collection, comprising dessert, ornamental, cider and culinary apples. Furthermore, advanced Machine Learning methods were applied to determine whether the phenolic and sugar composition of an apple cultivar could be used as a biomarker fingerprint to differentiate between heritage and mainstream commercial cultivars, as well as to separate primary usage groups and harvest seasons. Prediction accuracy > 90% was achieved with Random Forest for all three models. The results highlighted the extraordinary phytochemical potency and unique profile of some heritage, cider and ornamental apple cultivars, especially in comparison to more mainstream apple cultivars. Therefore, these findings could guide future cultivar selection on the basis of health-promoting phytochemical content.
Multispectral image analysis approach to detect adulteration of beef and pork in raw meats
The aim of this study was to investigate the potential of multispectral imaging supported by multivariate data analysis for the detection of minced beef fraudulently substituted with pork and vice versa. Multispectral images in 18 different wavelengths of 220 meat samples in total from four independent experiments (55 samples per experiment) were acquired for this work. Appropriate amounts of minced beef and pork were mixed in order to achieve nine different proportions of adulteration and two categories of pure pork and beef. After an image processing step, data from the first three experiments were used for partial least squares-discriminant analysis (PLS-DA) and linear discriminant analysis (LDA) so as to discriminate among all adulteration classes, as well as among adulterated, pure beef and pure pork samples. Results showed very good discrimination between pure and adulterated samples for PLS-DA and LDA, yielding 98.48% overall correct classification. Additionally, 98.48% and 96.97% of the samples were classified within a ±10% category of adulteration for LDA and PLS-DA respectively. Lastly, the models were further validated using the data of the fourth experiment for independent testing, where all pure and adulterated samples were classified correctly in the case of PLS-DA, while LDA proved to be less accurate.
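The "within a ±10% category" figure can be read as a tolerance-based accuracy, where a prediction counts as correct if it falls no more than one adulteration step away from the true class. The sketch below illustrates that reading under two assumptions not stated in the abstract: that the classes are pork percentages in 10% steps from 0 to 100, and that the tolerance is applied to the absolute difference between true and predicted class. The labels are invented and `tolerance_accuracy` is a hypothetical helper.

```python
def tolerance_accuracy(true_levels, predicted_levels, tolerance=10):
    """Fraction of predictions within `tolerance` percentage points
    of the true adulteration level (tolerance=0 gives exact accuracy)."""
    hits = sum(
        abs(t - p) <= tolerance
        for t, p in zip(true_levels, predicted_levels)
    )
    return hits / len(true_levels)


# Invented true and predicted pork percentages (illustration only).
true_pct = [0, 10, 20, 50, 90, 100]
pred_pct = [0, 20, 20, 30, 80, 100]

exact = tolerance_accuracy(true_pct, pred_pct, tolerance=0)
within_10 = tolerance_accuracy(true_pct, pred_pct, tolerance=10)
print(exact, within_10)
```

Reporting both figures separates near misses between neighbouring adulteration levels from gross misclassifications.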
Robust detection of point mutations involved in multidrug-resistant Mycobacterium tuberculosis in the presence of co-occurrent resistance markers
Tuberculosis disease is a major global public health concern and the growing prevalence of drug-resistant Mycobacterium tuberculosis is making disease control more difficult. However, the increasing application of whole-genome sequencing as a diagnostic tool is leading to the profiling of drug resistance to inform clinical practice and treatment decision making. Computational approaches for identifying established and novel resistance-conferring mutations in genomic data include genome-wide association study (GWAS) methodologies, tests for convergent evolution and machine learning techniques. These methods may be confounded by extensive co-occurrent resistance, where statistical models for a drug include unrelated mutations known to be causing resistance to other drugs. Here, we introduce a novel “cannibalistic” elimination algorithm (“Hungry, Hungry SNPos”) that attempts to remove these co-occurrent resistant variants. Using an M. tuberculosis genomic dataset for the virulent Beijing strain-type (n=3,574) with phenotypic resistance data across five drugs (isoniazid, rifampicin, ethambutol, pyrazinamide, and streptomycin), we demonstrate that this new approach is considerably more robust than traditional methods and detects resistance-associated variants too rare to be likely picked up by correlation-based techniques like GWA