
    Integrating Overlapping Structures and Background Information of Words Significantly Improves Biological Sequence Comparison

    Word-based models have achieved promising results in sequence comparison. However, how best to use two important statistical properties of words in biological sequences, their overlapping structures and their background information, to improve sequence comparison remains an open problem. This paper proposes a new statistical method that integrates the overlapping structures and the background information of the words in biological sequences. To assess the effectiveness of this integration for sequence comparison, two sets of evaluation experiments were carried out. The first, performed via receiver operating characteristic (ROC) curve analysis, applies the proposed method to discriminate between functionally related regulatory sequences and unrelated sequences (intron and exon). The second evaluates the performance of the proposed method, using the F-measure, for clustering Hepatitis E virus genotypes. The results demonstrate that the proposed method, by integrating the overlapping structures and the background information of words, significantly improves biological sequence comparison and outperforms the existing models.
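
As a rough illustration of what "overlapping words" and "background information" mean in word-based comparison, the sketch below counts overlapping k-letter words and centers each count by its expected value under an i.i.d. letter background. This is not the method proposed in the paper; the function names, the `ACGT` alphabet, and the cosine-style similarity are assumptions chosen for illustration only.

```python
from collections import Counter
from itertools import product
from math import prod, sqrt


def kmer_counts(seq, k):
    """Count overlapping words of length k (adjacent windows share k-1 letters)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Classic D2 statistic: sum over shared words of the product of counts."""
    ca, cb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    return sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())


def centered_counts(seq, k, alphabet="ACGT"):
    """Observed minus expected word counts under an i.i.d. letter background.

    The background letter frequencies are estimated from the sequence itself
    (an assumption of this sketch).
    """
    n_words = len(seq) - k + 1
    freq = {ch: seq.count(ch) / len(seq) for ch in alphabet}
    counts = kmer_counts(seq, k)
    centered = {}
    for letters in product(alphabet, repeat=k):
        expected = n_words * prod(freq[ch] for ch in letters)
        centered["".join(letters)] = counts["".join(letters)] - expected
    return centered


def centered_similarity(seq_a, seq_b, k, alphabet="ACGT"):
    """Cosine similarity between the background-centered count vectors."""
    xa = centered_counts(seq_a, k, alphabet)
    xb = centered_counts(seq_b, k, alphabet)
    dot = sum(xa[w] * xb[w] for w in xa)
    norm = sqrt(sum(v * v for v in xa.values())) * sqrt(sum(v * v for v in xb.values()))
    return dot / norm if norm else 0.0
```

Calling `centered_similarity(seq_a, seq_b, 2)` on two DNA strings returns a value between -1 and 1; larger values mean the two sequences deviate from their backgrounds in similar ways.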

    Pancreas MRI segmentation into head, body, and tail enables regional quantitative analysis of heterogeneous disease

    Background: Quantitative imaging studies of the pancreas have often targeted the three main anatomical segments, head, body, and tail, using manual region of interest strategies to assess geographic heterogeneity. Existing automated analyses have implemented whole-organ segmentation, providing overall quantification but failing to address spatial heterogeneity. Purpose: To develop and validate an automated method for pancreas segmentation into head, body, and tail subregions in abdominal MRI. Study Type: Retrospective. Subjects: One hundred and fifty nominally healthy subjects from UK Biobank (100 subjects for method development and 50 subjects for validation). A separate 390 UK Biobank triples of subjects including type 2 diabetes mellitus (T2DM) subjects and matched nondiabetics. Field strength/Sequence: A 1.5 T, three-dimensional two-point Dixon sequence (for segmentation and volume assessment) and a two-dimensional axial multiecho gradient-recalled echo sequence. Assessment: Pancreas segments were annotated by four raters on the validation cohort. Intrarater agreement and interrater agreement were reported using Dice overlap (Dice similarity coefficient [DSC]). A segmentation method based on template registration was developed and evaluated against annotations. Results on regional pancreatic fat assessment are also presented, by intersecting the three-dimensional parts segmentation with one available proton density fat fraction (PDFF) image. Statistical Test: Wilcoxon signed rank test and Mann–Whitney U-test for comparisons. DSC and volume differences for evaluation. A P value  Results: Good intrarater (DSC mean, head: 0.982, body: 0.940, tail: 0.961) agreement and interrater (DSC mean, head: 0.968, body: 0.905, tail: 0.943) agreement were observed. No differences (DSC, head: P = 0.4358, body: P = 0.0992, tail: P = 0.1080) were observed between the manual annotations and our method's segmentations (DSC mean, head: 0.965, body: 0.893, tail: 0.934). 
Pancreatic body PDFF was different between T2DM and nondiabetics matched by body mass index. Data Conclusion: The developed segmentation's performance was no different from manual annotations. Application on type 2 diabetes subjects showed potential for assessing pancreatic disease heterogeneity. Level of Evidence: 4 Technical Efficacy Stage: 3
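
For readers unfamiliar with the agreement measure used above, the Dice similarity coefficient of two segmentations A and B is 2|A ∩ B| / (|A| + |B|). A minimal sketch follows, assuming flattened label maps in which 1, 2, and 3 stand for head, body, and tail; that encoding and the function names are hypothetical and not taken from the study.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient of two equal-length binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Convention assumed here: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * overlap / total


def dice_per_label(seg_a, seg_b, labels):
    """DSC per subregion for two flattened label maps.

    `labels` maps a region name to its integer label (hypothetical encoding).
    """
    return {
        name: dice([v == lab for v in seg_a], [v == lab for v in seg_b])
        for name, lab in labels.items()
    }


if __name__ == "__main__":
    regions = {"head": 1, "body": 2, "tail": 3}
    rater = [1, 1, 2, 3]
    method = [1, 2, 2, 3]
    print(dice_per_label(rater, method, regions))
```

Reporting the coefficient per subregion, as in the abstract, exposes disagreement that a whole-organ DSC would average away.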

    Implementation of the Preparation of Evaluation Instruments Used by Trainers (Widyaiswara) to Measure Training Success at the Center for Education and Social Welfare Training (BBPPKS) Lembang

    This study addresses the question: "What procedure do trainers follow in preparing the training evaluation instruments used to measure the success of training at the Center for Education and Social Welfare Training (BBPPKS) Lembang?" It aimed to obtain data and information on the procedures for preparing training evaluation instruments, on the procedures for testing those instruments, and on the processing of training evaluation data at BBPPKS Lembang. The theoretical foundation covers the concepts of training, evaluation, and evaluation instruments. The research used a descriptive method, with data collected through interviews, observation, documentation studies, and literature studies. The main subjects of the study were trainers and training managers at BBPPKS Lembang. The findings are as follows. 1) The procedure that trainers at BBPPKS Lembang follow for preparing and developing test instruments starts with setting the purpose of the test, followed by establishing the learning outcomes to be measured, preparing a table of specifications (test grid), specifying the content of the test material, writing the test items, setting the norms and rules, and preparing the scoring key. 2) The testing procedure for evaluation instruments at BBPPKS Lembang does not include validity testing or item reliability testing. The reasons given are that the dynamic nature of training at BBPPKS Lembang leaves too little time for such testing and that some trainers lack understanding of the statistical techniques involved. Overall, however, the evaluation instruments have passed testing measures such as logical validity, content validity, construct validity, empirical validity, objectivity, level of difficulty, and practicability. 3) The processing of evaluation results at BBPPKS Lembang starts with checking the learning evaluation results, followed by tabulating the evaluation data, scoring, converting scores into grades, and analyzing and interpreting the evaluation data for decision making, all performed in sequence according to the procedures specified by the institution. From the data analysis, it can be concluded that: 1) the procedure for preparing and developing test instruments followed by trainers at BBPPKS Lembang is in accordance with the steps for preparing good learning evaluation instruments; 2) the testing procedure for evaluation instruments at BBPPKS Lembang does not meet the criteria of a good test, because the validity and item reliability of the instruments are not tested; and 3) overall, data processing at BBPPKS Lembang is done well, with the stages from checking the learning evaluation results through to analysis and interpretation for decision making performed sequentially in accordance with the procedures specified by the institution.

    Rapid feedback on hospital onset SARS-CoV-2 infections combining epidemiological and sequencing data.

    BACKGROUND: Rapid identification and investigation of healthcare-associated infections (HCAIs) is important for suppression of SARS-CoV-2, but the infection source for hospital onset COVID-19 infections (HOCIs) cannot always be readily identified based only on epidemiological data. Viral sequencing data provides additional information regarding potential transmission clusters, but the low mutation rate of SARS-CoV-2 can make interpretation using standard phylogenetic methods difficult. METHODS: We developed a novel statistical method and sequence reporting tool (SRT) that combines epidemiological and sequence data in order to provide a rapid assessment of the probability of HCAI among HOCI cases (defined as first positive test >48 hr following admission) and to identify infections that could plausibly constitute outbreak events. The method is designed for prospective use, but was validated using retrospective datasets from hospitals in Glasgow and Sheffield collected February-May 2020. RESULTS: We analysed data from 326 HOCIs. Among HOCIs with time from admission ≥8 days, the SRT algorithm identified close sequence matches from the same ward for 160/244 (65.6%) and in the remainder 68/84 (81.0%) had at least one similar sequence elsewhere in the hospital, resulting in high estimated probabilities of within-ward and within-hospital transmission. For HOCIs with time from admission 3-7 days, the SRT probability of healthcare acquisition was >0.5 in 33/82 (40.2%). CONCLUSIONS: The methodology developed can provide rapid feedback on HOCIs that could be useful for infection prevention and control teams, and warrants further prospective evaluation. The integration of epidemiological and sequence data is important given the low mutation rate of SARS-CoV-2 and its variable incubation period. 
FUNDING: COG-UK HOCI funded by COG-UK consortium, supported by funding from UK Research and Innovation, National Institute of Health Research and Wellcome Sanger Institute.

    Why Comparing Single Performance Scores Does Not Allow to Draw Conclusions About Machine Learning Approaches

    Developing state-of-the-art approaches for specific tasks is a major driving force in our research community. Depending on the prestige of the task, publishing such an approach can bring a lot of visibility. This raises the question of how reliable our evaluation methodologies are for comparing approaches. One common methodology to identify the state of the art is to partition data into a train, a development, and a test set. Researchers can train and tune their approach on some part of the dataset and then select the model that worked best on the development set for a final evaluation on unseen test data. Test scores from different approaches are compared, and performance differences are tested for statistical significance. In this publication, we show that there is a high risk that a statistically significant difference in this type of evaluation is not due to a superior learning approach but due to chance. For example, for the CoNLL 2003 NER dataset we observed type I errors (false positives) in up to 26% of the cases with a threshold of p < 0.05, i.e., falsely concluding a statistically significant difference between two identical approaches. We prove that this evaluation setup is unsuitable for comparing learning approaches. We formalize alternative evaluation setups based on score distributions.
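
The effect described above can be reproduced in a toy simulation. The sketch below is not the paper's experimental setup: it assumes that a single training run of one fixed approach yields a model whose true accuracy varies with the random seed (normal, mean 0.90), evaluates two such runs on independent draws of a 1,000-example test set, and applies a pooled two-proportion z-test. All names and parameter values are hypothetical.

```python
import random
from math import erfc, sqrt


def z_test_p_value(correct_a, correct_b, n):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (correct_a + correct_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 1.0
    z = (correct_a - correct_b) / (n * se)
    return erfc(abs(z) / sqrt(2))


def false_positive_rate(seed_sd, n_test=1000, trials=1000, alpha=0.05, seed=0):
    """Fraction of trials in which two runs of the SAME approach differ 'significantly'.

    seed_sd is the assumed standard deviation of true accuracy across random seeds.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        correct = []
        for _ in range(2):
            # True accuracy of this particular training run, clamped to [0, 1].
            acc = min(1.0, max(0.0, rng.gauss(0.90, seed_sd)))
            correct.append(sum(rng.random() < acc for _ in range(n_test)))
        if z_test_p_value(correct[0], correct[1], n_test) < alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print("no seed variance:  ", false_positive_rate(seed_sd=0.0))
    print("with seed variance:", false_positive_rate(seed_sd=0.01))
```

Without seed variance the rejection rate stays near the nominal 5%; once run-to-run variance is added, the test rejects far more often even though both "approaches" are identical, which is why comparing score distributions over several runs is the safer design.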

    EFICAz²: enzyme function inference by a combined approach enhanced by machine learning

    ©2009 Arakaki et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/10/107 doi:10.1186/1471-2105-10-107 Background: We previously developed EFICAz, an enzyme function inference approach that combines predictions from non-completely overlapping component methods. Two of the four components in the original EFICAz are based on the detection of functionally discriminating residues (FDRs). FDRs distinguish between members of an enzyme family that are homofunctional (classified under the EC number of interest) or heterofunctional (annotated with another EC number or lacking enzymatic activity). Each of the two FDR-based components is associated with one of two specific kinds of enzyme families. EFICAz exhibits high precision performance, except when the maximal test to training sequence identity (MTTSI) is lower than 30%. To improve EFICAz's performance in this regime, we: i) increased the number of predictive components and ii) took advantage of consensual information from the different components to make the final EC number assignment. Results: We have developed two new EFICAz components, analogous to the two FDR-based components, where the discrimination between homofunctional and heterofunctional members is based on the evaluation, via Support Vector Machine models, of all the aligned positions between the query sequence and the multiple sequence alignments associated with the enzyme families. Benchmark results indicate that: i) the new SVM-based components outperform their FDR-based counterparts, and ii) both SVM-based and FDR-based components generate unique predictions. 
We developed classification tree models to optimally combine the results from the six EFICAz components into a final EC number prediction. The new implementation of our approach, EFICAz², exhibits a highly improved prediction precision at MTTSI < 30% compared to the original EFICAz, with only a slight decrease in prediction recall. A comparative analysis of enzyme function annotation of the human proteome by EFICAz² and KEGG shows that: i) when both sources make EC number assignments for the same protein sequence, the assignments tend to be consistent and ii) EFICAz² generates considerably more unique assignments than KEGG. Conclusion: Performance benchmarks and the comparison with KEGG demonstrate that EFICAz² is a powerful and precise tool for enzyme function annotation, with multiple applications in genome analysis and metabolic pathway reconstruction. The EFICAz² web service is available at: http://cssb.biology.gatech.edu/skolnick/webservice/EFICAz2/index.htm

    Handwriting styles: benchmarks and evaluation metrics

    Evaluating the style of handwriting generation is a challenging problem, since it is not well defined. It is a key component in developing systems that offer more personalized experiences to humans. In this paper, we propose baseline benchmarks in order to set anchors for estimating the relative quality of different handwriting style methods. This is done using deep learning techniques, which have shown remarkable results in different machine learning tasks, including classification, regression, and, most relevant to our work, generating temporal sequences. We discuss the challenges associated with evaluating our methods, which are related to the evaluation of generative models in general. We then propose evaluation metrics that we find relevant to this problem, and we discuss how we evaluate the evaluation metrics. In this study, we use the IRON-OFF dataset. To the best of our knowledge, no prior work has used this dataset for generating handwriting (in terms of either methodology or performance metrics) or for exploring styles. Comment: Submitted to IEEE International Workshop on Deep and Transfer Learning (DTL 2018)

    Log-based Evaluation of Label Splits for Process Models

    Process mining techniques aim to extract insights into processes from event logs. One of the challenges in process mining is identifying interesting and meaningful event labels that contribute to a better understanding of the process. Our application area is mining data from smart homes for the elderly, where the ultimate goal is to signal deviations from usual behavior and provide timely recommendations in order to extend the period of independent living. Extracting individual process models showing user behavior is an important instrument in achieving this goal. However, the interpretation of sensor data at an appropriate abstraction level is not straightforward. For example, a motion sensor in a bedroom can be triggered by tossing and turning in bed or by getting up. We try to derive the actual activity depending on the context (time, previous events, etc.). In this paper we introduce the notion of label refinements, which links more abstract event descriptions with their more refined counterparts. We present a statistical evaluation method to determine the usefulness of a label refinement for a given event log from a process perspective. Based on data from smart homes, we show how our statistical evaluation method for label refinements can be used in practice. Our method was able to select two label refinements out of a set of candidate label refinements that both had a positive effect on model precision. Comment: Paper accepted at the 20th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, to appear in Procedia Computer Science