Development of Novel Calibrations for FT-NIR Analysis of Protein, Oil, Carbohydrates and Isoflavones in Foods
We present the development of calibration methodology for FT-NIR analysis of soybean-based foods, together with high-precision NIR spectra and reference composition measurements of protein, oil, and carbohydrates in soy foods.

Development and Validation of Clinical Whole-Exome and Whole-Genome Sequencing for Detection of Germline Variants in Inherited Disease
Context: With the decrease in the cost of sequencing, the clinical testing paradigm has shifted from single-gene testing to gene panels and now to whole-exome and whole-genome sequencing. Clinical laboratories are rapidly implementing next-generation sequencing-based whole-exome and whole-genome sequencing. Because whole-exome and whole-genome sequencing cover a large number of targets, it is critical that a laboratory perform appropriate validation studies, develop a quality assurance and quality control program, and participate in proficiency testing. Objective: To provide recommendations for whole-exome and whole-genome sequencing assay design, validation, and implementation for the detection of germline variants associated with inherited disorders. Data Sources: An example of trio sequencing, filtering and annotation of variants, and phenotypic consideration to arrive at a clinical diagnosis is discussed. Conclusions: Clinical laboratories planning to implement whole-exome and whole-genome sequencing must design and validate the assay to specifications and ensure adequate performance prior to implementation. Test design specifications are provided, including variant filtering and annotation, phenotypic consideration, guidance on consenting options, and reporting of incidental findings. These are important steps a laboratory must take to validate and implement whole-exome and whole-genome sequencing in a clinical setting for germline variants in inherited disorders.
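The trio-sequencing example mentioned under Data Sources rests on a simple inheritance filter: a candidate de novo variant is called in the proband but absent from both parents. A minimal sketch of that filter; the variant keys, genotype encoding, and function name are illustrative assumptions, not the paper's pipeline:

```python
# Hypothetical trio filter: keep candidate de novo variants, i.e. calls
# present in the proband's genotype but absent from both parents.
def candidate_de_novo(proband, mother, father):
    """Each argument maps a variant key (chrom, pos, ref, alt) to a genotype."""
    return {
        var: gt
        for var, gt in proband.items()
        if gt != "0/0"
        and mother.get(var, "0/0") == "0/0"
        and father.get(var, "0/0") == "0/0"
    }

proband = {("1", 1000, "A", "G"): "0/1", ("2", 500, "C", "T"): "0/1"}
mother = {("2", 500, "C", "T"): "0/1"}
father = {}
# Only the variant absent from both parents survives the filter.
print(candidate_de_novo(proband, mother, father))
```

A clinical pipeline layers many more filters on top of this (quality scores, population allele frequency, phenotype match), which is part of what the validation recommendations address.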
Feature importance for machine learning redshifts applied to SDSS galaxies
We present an analysis of feature importance and selection applied to photometric
redshift estimation using the machine learning architecture Decision Trees with
the ensemble learning routine AdaBoost (hereafter RDF). We select a list of 85
easily measured (or derived) photometric quantities (or `features') and
spectroscopic redshifts for almost two million galaxies from the Sloan Digital
Sky Survey Data Release 10. After identifying which features have the most
predictive power, we use standard artificial Neural Networks (aNN) to show that
the addition of these features, in combination with the standard magnitudes and
colours, improves the machine learning redshift estimate by 18% and decreases
the catastrophic outlier rate by 32%. We further compare the redshift estimate
using RDF with those from two different aNNs, and with photometric redshifts
available from the SDSS. We find that the RDF requires orders of magnitude less
computation time than the aNNs to obtain a machine learning redshift while
reducing both the catastrophic outlier rate by up to 43%, and the redshift
error by up to 25%. When compared to the SDSS photometric redshifts, the RDF
machine learning redshifts both decrease the standard deviation of residuals
scaled by 1/(1+z) by 36%, from 0.066 to 0.041, and decrease the fraction of
catastrophic outliers by 57%, from 2.32% to 0.99%.
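The RDF setup (decision trees boosted with AdaBoost) and the abstract's two metrics can be sketched with scikit-learn. The stand-in photometry, the feature layout, and the catastrophic-outlier threshold |dz| > 0.15(1+z) below are assumptions for illustration, not the paper's exact choices:

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for SDSS photometry: six features (think magnitudes
# and colours) and a "spectroscopic" redshift driven by the first two.
rng = np.random.default_rng(42)
X = rng.uniform(size=(2000, 6))
z_spec = 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.02, size=2000)

# Decision trees boosted with AdaBoost, as in the RDF configuration.
model = AdaBoostRegressor(DecisionTreeRegressor(max_depth=6),
                          n_estimators=50, random_state=0)
model.fit(X[:1500], z_spec[:1500])
z_phot = model.predict(X[1500:])

# Metrics from the abstract: std of residuals scaled by 1/(1+z), and a
# catastrophic-outlier fraction (threshold here is an assumed convention).
dz = (z_phot - z_spec[1500:]) / (1 + z_spec[1500:])
print("scaled residual std:", round(float(dz.std()), 4))
print("outlier fraction:", float(np.mean(np.abs(dz) > 0.15)))
# Feature importances are what the paper's selection step ranks.
print("feature importances:", np.round(model.feature_importances_, 3))
```

The `feature_importances_` ranking is the ingredient the abstract's selection step relies on: only the highest-ranked photometric quantities are fed to the downstream estimators.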
ConSole: using modularity of contact maps to locate solenoid domains in protein structures.
Background: Periodic proteins, characterized by the presence of multiple repeats of short motifs, form an interesting and seldom-studied group. Due to often extreme divergence in sequence, detection and analysis of such motifs is performed more reliably at the structural level. Yet few algorithms have been developed for the detection and analysis of structures of periodic proteins. Results: ConSole recognizes modularity in protein contact maps, allowing for precise identification of repeats in solenoid protein structures, an important subgroup of periodic proteins. Tests on benchmarks show that ConSole has higher recognition accuracy than Raphael, the only other publicly available solenoid structure detection tool. As a next step of ConSole analysis, we show how detection of solenoid repeats in structures can be used to improve sequence recognition of these motifs and to detect subtle irregularities of repeat lengths in three solenoid protein families. Conclusions: The ConSole algorithm provides a fast and accurate tool to recognize solenoid protein structures as a whole and to identify individual solenoid repeat units from a structure. ConSole is available as a web-based, interactive server and for download at http://console.sanfordburnham.org.
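The object ConSole operates on, the protein contact map, is simple to construct, and the modularity signal it exploits is visible directly in it: a solenoid repeat of length L produces a secondary band at offset L from the main diagonal. A minimal sketch on a synthetic coiled chain (the coordinates and the 8 Å cutoff are illustrative assumptions, not ConSole's algorithm):

```python
import numpy as np

def contact_map(coords, cutoff=8.0):
    """Boolean matrix: True where two residues lie within `cutoff` angstroms."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist < cutoff

# Synthetic coil with a 10-residue period (not a real protein): residues one
# full turn apart sit nearly on top of each other, so the map shows a band
# at offset 10 parallel to the main diagonal.
t = np.arange(60)
coords = np.stack([5.0 * np.cos(2 * np.pi * t / 10),
                   5.0 * np.sin(2 * np.pi * t / 10),
                   0.5 * t], axis=1)
cmap = contact_map(coords)
print(bool(cmap[0, 10]), bool(cmap[0, 30]))  # contact at the repeat offset,
                                             # none three turns away
```

ConSole itself goes further, clustering the map into modules to delimit individual repeat units, but the off-diagonal band is the raw signal that clustering works on.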
A critical evaluation of network and pathway based classifiers for outcome prediction in breast cancer
Recently, several classifiers that combine primary tumor data, like gene
expression data, and secondary data sources, such as protein-protein
interaction networks, have been proposed for predicting outcome in breast
cancer. In these approaches, new composite features are typically constructed
by aggregating the expression levels of several genes. The secondary data
sources are employed to guide this aggregation. Although many studies claim
that these approaches improve classification performance over single gene
classifiers, the gain in performance is difficult to assess. This stems mainly
from the fact that different breast cancer data sets and validation procedures
are employed to assess the performance. Here we address these issues by
employing a large cohort of six breast cancer data sets as benchmark set and by
performing an unbiased evaluation of the classification accuracies of the
different approaches. Contrary to previous claims, we find that composite
feature classifiers do not outperform simple single gene classifiers. We
investigate the effect of (1) the number of selected features; (2) the specific
gene set from which features are selected; (3) the size of the training set and
(4) the heterogeneity of the data set on the performance of composite feature
and single gene classifiers. Strikingly, we find that randomization of
secondary data sources, which destroys all biological information in these
sources, does not result in a deterioration in performance of composite feature
classifiers. Finally, we show that when a proper correction for gene set size
is performed, the stability of single gene sets is similar to the stability of
composite feature sets. Based on these results, there is currently no reason to
prefer prognostic classifiers based on composite features over single gene
classifiers for predicting outcome in breast cancer.
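The composite features under evaluation are typically built by aggregating expression over network neighbourhoods. A minimal sketch of that aggregation step; the toy network, expression values, and function name are assumptions for illustration, not data or code from the benchmark:

```python
import numpy as np

def composite_features(expr, network):
    """Average each gene's expression with its network neighbours.

    expr: samples x genes array; network: gene index -> set of neighbour indices.
    """
    cols = []
    for g in range(expr.shape[1]):
        members = [g] + sorted(network.get(g, ()))
        cols.append(expr[:, members].mean(axis=1))
    return np.stack(cols, axis=1)

# Toy data: 2 samples x 3 genes, with a chain network g0 - g1 - g2.
expr = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])
network = {0: {1}, 1: {0, 2}, 2: {1}}
feats = composite_features(expr, network)
print(feats[0])  # [1.5, 2.0, 2.5]
```

Randomizing the `network` mapping leaves this procedure mechanically intact while destroying its biological content, which is exactly the control the study uses; the finding that performance does not deteriorate under randomization is what undercuts the claimed benefit of the secondary data.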
A system performance throughput model applicable to advanced manned telescience systems
As automated space systems become more complex, autonomous, and opaque to the flight crew, it becomes increasingly difficult to determine whether the total system is performing as it should. Some of the complex and interrelated human performance measurement issues related to total system validation are addressed. An evaluative throughput model is presented which can be used to generate a human operator-related benchmark or figure of merit for a given system which involves humans at the input and output ends as well as other automated intelligent agents. The concept of sustained and accurate command/control data information transfer is introduced. The first two input parameters of the model involve nominal and off-nominal predicted events: the first calls for a detailed task analysis, while the second calls for a contingency event assessment. The last two required input parameters involve actual (measured) events, namely human performance and continuous semi-automated system performance. An expression combining these four parameters was found, using digital simulations and identical, representative, random data, to yield the smallest variance.