Essential guidelines for computational method benchmarking
In computational biology and other sciences, researchers are frequently faced
with a choice between several computational methods for performing data
analyses. Benchmarking studies aim to rigorously compare the performance of
different methods using well-characterized benchmark datasets, to determine the
strengths of each method or to provide recommendations regarding suitable
choices of methods for an analysis. However, benchmarking studies must be
carefully designed and implemented to provide accurate, unbiased, and
informative results. Here, we summarize key practical guidelines and
recommendations for performing high-quality benchmarking analyses, based on our
experiences in computational biology.
High-throughput Binding Affinity Calculations at Extreme Scales
Resistance to chemotherapy and molecularly targeted therapies is a major
factor in limiting the effectiveness of cancer treatment. In many cases,
resistance can be linked to genetic changes in target proteins, either
pre-existing or evolutionarily selected during treatment. Key to overcoming
this challenge is an understanding of the molecular determinants of drug
binding. Using multi-stage pipelines of molecular simulations we can gain
insights into the binding free energy and the residence time of a ligand, which
can inform both stratified and personal treatment regimes and drug development.
To support the scalable, adaptive and automated calculation of the binding free
energy on high-performance computing resources, we introduce the High-
throughput Binding Affinity Calculator (HTBAC). HTBAC uses a building block
approach in order to attain both workflow flexibility and performance. We
demonstrate close to perfect weak scaling to hundreds of concurrent multi-stage
binding affinity calculation pipelines. This permits a rapid time-to-solution
that is essentially invariant of the calculation protocol, size of candidate
ligands and number of ensemble simulations. As such, HTBAC advances the state
of the art of binding affinity calculations and protocols.
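As a rough illustration of what "close to perfect weak scaling" means: in a weak-scaling test the workload grows in step with the resources (here, more concurrent pipelines on proportionally more nodes), so the ideal outcome is a constant time-to-solution. The sketch below computes the usual efficiency ratio. The function name and the timings are hypothetical and are not taken from the HTBAC abstract.

```python
def weak_scaling_efficiency(t_base: float, t_scaled: float) -> float:
    """Weak-scaling efficiency: baseline time divided by scaled-run time.

    Both runs keep the work per resource unit fixed, so a value of 1.0
    means the time-to-solution did not grow as the run was scaled up.
    """
    if t_base <= 0 or t_scaled <= 0:
        raise ValueError("timings must be positive")
    return t_base / t_scaled


# Hypothetical wall-clock times in seconds, keyed by the number of
# concurrent pipelines (illustrative values only, not measured data).
timings = {1: 100.0, 16: 102.0, 128: 105.0}

for n_pipelines, seconds in timings.items():
    eff = weak_scaling_efficiency(timings[1], seconds)
    print(f"{n_pipelines:>4} pipelines: efficiency = {eff:.3f}")
```

An efficiency that stays near 1.0 as the pipeline count grows corresponds to the "essentially invariant" time-to-solution described above.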
Assessing Reproducibility of Inherited Variants Detected With Short-Read Whole Genome Sequencing
Background: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing reproducibility of inherited variants with WGS and impact of each step in the process is needed for understanding and improving quality of inherited variants from WGS.
Results: To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when > 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30×.
Conclusions: Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS.
Beyond the genome and into the clinic.
A report of BioMed Central's third annual Beyond the Genome conference, held at Harvard Medical School, Boston, September 27-29, 2012.
Quantitative imaging in radiation oncology
Artificially intelligent eyes, built on machine and deep learning technologies, can strengthen our ability to analyse patients’ images. By revealing information invisible to the human eye, we can build decision aids that help clinicians provide more effective treatment while reducing side effects. The power of these decision aids lies in their being based on the biologically unique properties of each patient's tumour, referred to as biomarkers. To fully translate this technology into the clinic, we need to overcome barriers related to the reliability of image-derived biomarkers, trust in AI algorithms, and privacy-related issues that hamper the validation of the biomarkers. This thesis developed methodologies to address these issues, defining a road map for the responsible use of quantitative imaging in the clinic as a decision support system for better patient care.
ENIGMA and global neuroscience: A decade of large-scale studies of the brain in health and disease across more than 40 countries.
This review summarizes the last decade of work by the ENIGMA (Enhancing NeuroImaging Genetics through Meta Analysis) Consortium, a global alliance of over 1400 scientists across 43 countries, studying the human brain in health and disease. Building on large-scale genetic studies that discovered the first robustly replicated genetic loci associated with brain metrics, ENIGMA has diversified into over 50 working groups (WGs), pooling worldwide data and expertise to answer fundamental questions in neuroscience, psychiatry, neurology, and genetics. Most ENIGMA WGs focus on specific psychiatric and neurological conditions; other WGs study normal variation due to sex and gender differences, or development and aging; still other WGs develop methodological pipelines and tools to facilitate harmonized analyses of "big data" (i.e., genetic and epigenetic data, multimodal MRI, and electroencephalography data). These international efforts have yielded the largest neuroimaging studies to date in schizophrenia, bipolar disorder, major depressive disorder, post-traumatic stress disorder, substance use disorders, obsessive-compulsive disorder, attention-deficit/hyperactivity disorder, autism spectrum disorders, epilepsy, and 22q11.2 deletion syndrome. More recent ENIGMA WGs have formed to study anxiety disorders, suicidal thoughts and behavior, sleep and insomnia, eating disorders, irritability, brain injury, antisocial personality and conduct disorder, and dissociative identity disorder. Here, we summarize the first decade of ENIGMA's activities and ongoing projects, and describe the successes and challenges encountered along the way. We highlight the advantages of collaborative large-scale coordinated data analyses for testing reproducibility and robustness of findings, offering the opportunity to identify brain systems involved in clinical syndromes across diverse samples and associated genetic, environmental, demographic, cognitive, and psychosocial factors.
A computational pipeline for the diagnosis of CVID patients
Common variable immunodeficiency (CVID) is one of the most frequently diagnosed primary antibody deficiencies (PADs), a group of disorders characterized by a decrease in one or more immunoglobulin (sub) classes and/or impaired antibody responses caused by inborn defects in B cells in the absence of other major immune defects. CVID patients suffer from recurrent infections and disease-related, non-infectious, complications such as autoimmune manifestations, lymphoproliferation, and malignancies. A timely diagnosis is essential for optimal follow-up and treatment. However, CVID is by definition a diagnosis of exclusion, thereby covering a heterogeneous patient population and making it difficult to establish a definite diagnosis. To aid the diagnosis of CVID patients, and distinguish them from other PADs, we developed an automated machine learning pipeline which performs automated diagnosis based on flow cytometric immunophenotyping. Using this pipeline, we analyzed the immunophenotypic profile in a pediatric and adult cohort of 28 patients with CVID, 23 patients with idiopathic primary hypogammaglobulinemia, 21 patients with IgG subclass deficiency, six patients with isolated IgA deficiency, one patient with isolated IgM deficiency, and 100 unrelated healthy controls. Flow cytometry analysis is traditionally done by manual identification of the cell populations of interest. Yet, this approach has severe limitations including subjectivity of the manual gating and bias toward known populations. To overcome these limitations, we here propose an automated computational flow cytometry pipeline that successfully distinguishes CVID phenotypes from other PADs and healthy controls. 
Compared to the traditional, manual analysis, our pipeline is fully automated: it performs quality control and data pre-processing, identifies cell populations (gating), and derives features from these populations to build a machine learning classifier that distinguishes CVID from other PADs and healthy controls. This results in a more reproducible flow cytometry analysis and improves the diagnosis compared to manual analysis: our pipelines achieve on average a balanced accuracy score of 0.93 (+/- 0.07), whereas using the manually extracted populations, an average balanced accuracy score of 0.72 (+/- 0.23) is achieved.
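For readers unfamiliar with the metric reported above, balanced accuracy is the mean of the per-class recalls, which keeps a large class (such as the 100 healthy controls) from dominating the score. The following is a minimal sketch of that definition only; the function name, labels, and toy predictions are hypothetical and do not reproduce the study's data or pipeline.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the unweighted mean of per-class recall."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    totals = defaultdict(int)  # samples per true class
    hits = defaultdict(int)    # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1

    recalls = [hits[label] / totals[label] for label in totals]
    return sum(recalls) / len(recalls)


# Toy, made-up example: 2 patients and 8 healthy controls (HC).
# One patient is missed, so plain accuracy is 0.9, but recall for
# the CVID class is only 0.5.
y_true = ["CVID", "CVID"] + ["HC"] * 8
y_pred = ["CVID", "HC"] + ["HC"] * 8
print(balanced_accuracy(y_true, y_pred))
```

In this toy case the per-class recalls are 0.5 and 1.0, so the balanced accuracy is lower than the plain accuracy, which is why the metric suits imbalanced diagnostic cohorts.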