BreCaHAD: A dataset for breast cancer histopathological annotation and diagnosis
Objectives: Histopathological tissue analysis by a pathologist determines the diagnosis and prognosis of most tumors, such as breast cancer. To estimate the aggressiveness of cancer, a pathologist evaluates the microscopic appearance of a biopsied tissue sample based on morphological features which have been correlated with patient outcome. Data description: This paper introduces a dataset of 162 breast cancer histopathology images, namely the breast cancer histopathological annotation and diagnosis dataset (BreCaHAD) which allows researchers to optimize and evaluate the usefulness of their proposed methods. The dataset includes various malignant cases. The task associated with this dataset is to automatically classify histological structures in these hematoxylin and eosin (H&E) stained images into six classes, namely mitosis, apoptosis, tumor nuclei, non-tumor nuclei, tubule, and non-tubule. By providing this dataset to the biomedical imaging community, we hope to encourage researchers in computer vision, machine learning and medical fields to contribute and develop methods/tools for automatic detection and diagnosis of cancerous regions in breast cancer histology images. © 2019 The Author(s)
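One way a method might be scored on this six-class task is per-class recall, sketched minimally in Python below. It assumes each annotated structure has already been matched to one predicted label. The function name `per_class_recall` and the plain-string label format are illustrative choices, not part of the dataset's distribution format.

```python
from collections import Counter

# The six classes named in the BreCaHAD task description.
CLASSES = (
    "mitosis",
    "apoptosis",
    "tumor nuclei",
    "non-tumor nuclei",
    "tubule",
    "non-tubule",
)


def per_class_recall(y_true, y_pred):
    """Return {class: recall} for each of the six classes.

    y_true and y_pred are parallel lists of labels (hypothetical format:
    one entry per annotated structure). Recall is the fraction of true
    instances of a class that were predicted as that class; classes that
    never occur in y_true map to None.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    unknown = (set(y_true) | set(y_pred)) - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: (hits[c] / totals[c] if totals[c] else None) for c in CLASSES}


# Illustrative labels only, not taken from the dataset.
truth = ["mitosis", "tubule", "tubule", "apoptosis"]
pred = ["mitosis", "tubule", "non-tubule", "tumor nuclei"]
print(per_class_recall(truth, pred))
```

Reporting recall per class, rather than one overall accuracy, keeps rare classes such as mitosis from being hidden by the more frequent nuclei classes.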
Representative transcript sets for evaluating a translational initiation sites predictor
Background: Translational initiation site (TIS) prediction is an important and actively studied topic in bioinformatics. A comparative analysis of prediction algorithms requires several benchmark data sets on which their effectiveness can be tested. An ideal benchmark data set should be reliable, representative and readily available. Preferably, the proteins encoded by members of the data set should also be representative of the protein population actually expressed in cellular specimens. Results: In this paper, we report a general algorithm for constructing a reliable sequence collection that includes only mRNA sequences whose protein products present an average profile of the general protein population of a given organism with respect to three major structural parameters. Following the proposed algorithm, we obtained four representative transcript collections, each derived from a model organism. Evaluation of these data sets shows that they are reasonable representations of the spectrum of proteins obtained from cellular proteomic studies. Six state-of-the-art predictors were used to test the usefulness of the proposed construction algorithm. A comparative study reporting the predictors' performance on our data sets and on three other existing benchmark collections demonstrated the merits of our data sets as benchmark test collections. Conclusion: The proposed data set construction algorithm has proven to be a general and widely applicable scheme. Our comparison with published proteomic studies shows that expressing our data set of transcripts yields a polypeptide population representative of that obtained from biological specimens.
Our data set thus represents "real world" transcripts that will allow more accurate evaluation of algorithms dedicated to identifying TISs, as well as other translational regulatory motifs within mRNA sequences. The proposed algorithm compiles a redundancy-free data set by removing redundant copies of homologous proteins; such data sets may also be useful for statistical analyses of protein sequence-structure relations. At the current stage, our approach focuses on obtaining an "average" protein data set for any particular organism without introducing much selection bias. However, because the three major protein structural parameters are deeply integrated into the scheme, it would be straightforward to extend the current method to obtain a more selective protein data set, which may facilitate the study of particular protein structures.
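The construction algorithm combines two ideas: removing redundant homologs, and checking that the selected set keeps an average profile on three structural parameters. Both can be sketched in Python as below. This is not the authors' algorithm. The abstract names neither the three parameters nor the homology criterion, so the following are all hypothetical:

- the precomputed `cluster_id`
- the generic parameter triple
- the first-come choice of cluster representative
- the `rel_tol` tolerance

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Transcript:
    seq_id: str
    cluster_id: str  # hypothetical: precomputed homology cluster label
    params: tuple[float, float, float]  # hypothetical: three structural parameters of the encoded protein


def remove_redundant(transcripts):
    """Keep the first transcript seen in each homology cluster."""
    seen = set()
    kept = []
    for t in transcripts:
        if t.cluster_id not in seen:
            seen.add(t.cluster_id)
            kept.append(t)
    return kept


def matches_population(subset, population_means, rel_tol=0.05):
    """True if each parameter's subset mean is within rel_tol of the population mean."""
    if not subset:
        return False
    for i, pop_mean in enumerate(population_means):
        sub_mean = mean(t.params[i] for t in subset)
        if abs(sub_mean - pop_mean) > rel_tol * abs(pop_mean):
            return False
    return True


# Illustrative values only.
data = [
    Transcript("a", "c1", (100.0, 0.30, 0.20)),
    Transcript("b", "c1", (102.0, 0.31, 0.21)),  # same cluster as "a", so dropped
    Transcript("c", "c2", (120.0, 0.36, 0.24)),
]
subset = remove_redundant(data)
print([t.seq_id for t in subset], matches_population(subset, (110.0, 0.33, 0.22)))
```

A full construction procedure would presumably choose which members to keep so that the profile check passes, rather than only testing a fixed subset afterwards.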
Assessing the performance of a Loop Mediated Isothermal Amplification (LAMP) assay for the detection and subtyping of high-risk subtypes of Human Papilloma Virus (HPV) for Oropharyngeal Squamous Cell Carcinoma (OPSCC) without DNA purification
Background: Oropharyngeal Squamous Cell Carcinoma (OPSCC) is increasing in incidence despite a decline in traditional risk factors. Human Papilloma Virus (HPV), specifically subtypes 16, 18, 31 and 35, has been implicated as the high-risk etiologic agent. HPV-positive cancers have a significantly better prognosis than HPV-negative cancers of comparable stage and may benefit from different treatment regimens. Currently, HPV-related carcinogenesis is established either indirectly, through immunohistochemistry (IHC) staining for p16, a tumour suppressor gene, or directly, through polymerase chain reaction (PCR) testing for HPV DNA in biopsied tissue. Loop mediated isothermal amplification (LAMP) is more accurate than IHC, more rapid than PCR, and significantly less costly. In previous work, we showed that a subtype-specific HPV LAMP assay performed similarly to PCR on purified DNA. In this study, we examined the performance of this LAMP assay without DNA purification. Methods: We performed LAMP assays using established primers for HPV 16 and 18 and new primers for HPV 31 and 35. LAMP reaction conditions were tested on serial dilutions of plasmid HPV DNA to confirm minimum viral copy number detection thresholds. LAMP was then performed directly on different human cell line samples without DNA purification. Results: Our LAMP assays could detect 10⁵, 10³, 10⁴, and 10⁵ copies of plasmid DNA for HPV 16, 18, 31, and 35, respectively. All primer sets were subtype specific, with no cross-amplification. Our LAMP assays also reliably amplified subtype-specific HPV DNA from samples without requiring DNA isolation and purification. Conclusions: The high-risk OPSCC HPV subtype-specific LAMP primer sets demonstrated excellent, clinically relevant minimum copy number detection thresholds with an easy readout system. Amplification directly from unpurified samples illustrated the robustness of the assay and of the primers used.
This lends further support to HPV type-specific LAMP assays; these primer sets and assays can be further developed to test for HPV in OPSCC in resource- and lab-limited settings, or even at the bedside.
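The detection-threshold step on serially diluted plasmid DNA amounts to finding the lowest copy number that still gives a positive readout. The Python sketch below assumes tenfold dilution steps and one positive/negative readout per dilution. The function names `dilution_series` and `detection_threshold`, the starting copy number and the example readout are hypothetical and are not the study's data.

```python
def dilution_series(start_copies, steps, factor=10):
    """Copy numbers for a serial dilution: start, start/factor, start/factor**2, ..."""
    return [start_copies / factor**i for i in range(steps)]


def detection_threshold(copies, amplified):
    """Lowest copy number with a positive result, or None if nothing amplified.

    copies and amplified are parallel lists (one readout per dilution).
    """
    if len(copies) != len(amplified):
        raise ValueError("copies and amplified must have the same length")
    positives = [c for c, ok in zip(copies, amplified) if ok]
    return min(positives) if positives else None


# Hypothetical readout for one primer set: 1e7 down to 1e2 copies.
copies = dilution_series(1e7, steps=6)
readout = [True, True, True, True, False, False]
print(detection_threshold(copies, readout))
```

A real limit-of-detection study would normally also weigh replicate reactions at each dilution. This sketch treats each dilution as a single readout.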