
    A Vietnamese Handwritten Text Recognition Pipeline for Tetanus Medical Records

    Machine learning techniques have been successful in optical character recognition tasks, especially in recognizing handwriting. However, recognizing Vietnamese handwriting is challenging because of the presence of six additional distinctive tonal symbols and vowels. This challenge is amplified for the handwriting of health workers in an emergency care setting, where staff are under constant pressure to record the well-being of patients. In this study, we aim to digitize the handwriting of Vietnamese health workers. We develop a complete handwritten text recognition pipeline that receives scanned documents, detects and enhances the handwritten text areas of interest, transcribes the images into computer text, and finally auto-corrects invalid words and terms to achieve high accuracy. In experiments with medical documents written by 30 doctors and nurses from the Tetanus Emergency Care unit at the Hospital for Tropical Diseases, we obtain promising results of 2% Character Error Rate and 12% Word Error Rate.
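    Character Error Rate and Word Error Rate are conventionally edit-distance ratios. The sketch below shows one common way to compute them, assuming the standard Levenshtein definition (substitutions + insertions + deletions, divided by the reference length) and Unicode NFC normalization, so that precomposed and decomposed Vietnamese diacritics compare as equal. The function names and the example sentence are hypothetical; the exact normalization used in the study (casing, punctuation) is not stated in the abstract.

```python
import unicodedata
from typing import Sequence


def levenshtein(ref: Sequence, hyp: Sequence) -> int:
    """Minimum number of substitutions, insertions and deletions turning ref into hyp."""
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)]


def _nfc(text: str) -> str:
    # Assumption: compare text in NFC so "ố" typed as one or as three code points matches
    return unicodedata.normalize("NFC", text)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length."""
    ref, hyp = _nfc(reference), _nfc(hypothesis)
    return levenshtein(ref, hyp) / max(len(ref), 1)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = _nfc(reference).split()
    return levenshtein(ref_words, _nfc(hypothesis).split()) / max(len(ref_words), 1)


ref = "bệnh nhân sốt cao"   # hypothetical reference transcription
hyp = "bệnh nhân sót cao"   # hypothetical recognizer output with one wrong tone mark
print(f"CER={cer(ref, hyp):.3f} WER={wer(ref, hyp):.2f}")  # CER=0.059 WER=0.25
```

    In this hypothetical pair, one misread tone mark costs one character out of 17 but one whole word out of 4, which is why WER is typically several times larger than CER.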

    Label driven Knowledge Distillation for Federated Learning with non-IID Data

    In real-world applications, Federated Learning (FL) faces two challenges: (1) scalability, especially when applied to massive IoT networks; and (2) robustness in environments with heterogeneous data. To address the first problem, we design a novel FL framework named Full-stack FL (F2L). More specifically, F2L uses a hierarchical network architecture, which allows the FL network to be extended without reconstructing the whole network system. Moreover, leveraging the advantages of the hierarchical design, we propose a new label-driven knowledge distillation (LKD) technique at the global server to address the second problem. Unlike current knowledge distillation techniques, LKD can train a student model that incorporates good knowledge from all teacher models. Therefore, our proposed algorithm can effectively extract the knowledge of the regions' data distributions (i.e., the regional aggregated models) to reduce the divergence between clients' models when the FL system operates on non-independent and identically distributed (non-IID) data. Extensive experimental results reveal that: (i) our F2L method can significantly improve overall FL efficiency in all global distillations, and (ii) F2L converges rapidly as global distillation stages occur, rather than improving incrementally with each communication cycle.
    Comment: 28 pages, 5 figures, 10 tables
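    The hierarchical architecture can be illustrated with a two-tier, FedAvg-style aggregation: clients report to a regional server, and the regional aggregated models are then combined at the global server. This is a minimal sketch under stated assumptions: model parameters are flat lists of floats, aggregation weights are sample counts, and plain weighted averaging stands in for the global step, where F2L instead applies LKD (not reproduced here). All names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (flat parameter vector, number of training samples)
Model = Tuple[List[float], int]


def weighted_average(models: List[Model]) -> Model:
    """FedAvg-style average of parameter vectors, weighted by sample count."""
    total = sum(n for _, n in models)
    dim = len(models[0][0])
    avg = [0.0] * dim
    for params, n in models:
        for k in range(dim):
            avg[k] += params[k] * n / total
    return avg, total


def regional_aggregation(clients_by_region: Dict[str, List[Model]]) -> Dict[str, Model]:
    """Tier 1: each regional server aggregates only its own clients."""
    return {region: weighted_average(clients)
            for region, clients in clients_by_region.items()}


def global_aggregation(regional: Dict[str, Model]) -> Model:
    """Tier 2 placeholder: F2L would run label-driven distillation here instead of averaging."""
    return weighted_average(list(regional.values()))


clients = {
    "region_a": [([1.0, 2.0], 10), ([3.0, 4.0], 30)],
    "region_b": [([0.0, 0.0], 20)],
}
regional = regional_aggregation(clients)
global_model, n_total = global_aggregation(regional)
print(regional["region_a"], global_model, n_total)
```

    Adding a region only adds a dictionary entry and leaves existing regions untouched, which loosely mirrors the abstract's point that the network can be extended without reconstructing the whole system.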

    The seesaw mechanism at TeV scale in the 3-3-1 model with right-handed neutrinos

    We implement the seesaw mechanism in the 3-3-1 model with right-handed neutrinos. This is accomplished by introducing a scalar sextet into the model and by the spontaneous violation of lepton number. We identify the Majoron as a singlet under the $SU_L(2)\otimes U_Y(1)$ symmetry, which makes it safe under the current bounds imposed by electroweak data. The main result of this work is that the seesaw mechanism already works at the TeV scale, with the outcome that the right-handed neutrino masses lie at the electroweak scale, in the range from MeV to tens of GeV. This window provides a great opportunity to test their appearance at current detectors. However, when we contrast our results with previous analyses of detection sensitivity at the LHC, we conclude that further work is needed to validate this search.
    Comment: about 13 pages, no figures
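    The seesaw relation itself can be illustrated in its simplest one-generation form, where a Dirac mass m_D and a heavy right-handed Majorana mass M_R fill the symmetric mass matrix [[0, m_D], [m_D, M_R]]. This is the generic textbook (type-I) case, not the 3-3-1 mass matrix with the scalar sextet from the paper; the function names and input values are hypothetical and purely illustrative. The sketch computes the exact eigenvalues and shows that the light one is approximately -m_D^2/M_R.

```python
import math
from typing import Tuple


def seesaw_masses(m_d: float, m_r: float) -> Tuple[float, float]:
    """Exact eigenvalues of the one-generation matrix [[0, m_d], [m_d, m_r]].

    Returns (light, heavy) in the same units as the inputs. The light eigenvalue is
    negative; its absolute value is the physical light-neutrino mass.
    """
    heavy = 0.5 * (m_r + math.sqrt(m_r ** 2 + 4 * m_d ** 2))
    # The product of the eigenvalues equals the determinant -m_d**2; dividing avoids the
    # cancellation in (m_r - sqrt(...)) / 2 when m_d << m_r.
    light = -m_d ** 2 / heavy
    return light, heavy


def dirac_mass_for(m_light: float, m_r: float) -> float:
    """Leading-order inversion of |m_light| ~ m_d**2 / m_r."""
    return math.sqrt(m_light * m_r)


# Illustrative inputs in GeV: a 10 GeV right-handed neutrino and a 0.1 eV light neutrino
m_r = 10.0
m_d = dirac_mass_for(1e-10, m_r)
light, heavy = seesaw_masses(m_d, m_r)
print(f"m_D = {m_d:.3e} GeV, |m_light| = {abs(light):.3e} GeV, M_heavy = {heavy:.6f} GeV")
```

    With these illustrative inputs, a Dirac mass of roughly 3 × 10^-5 GeV (about 30 keV) brings the light state down to 0.1 eV while the heavy state stays at about 10 GeV.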

    Elliptic flow from two- and four-particle correlations in Au + Au collisions at $\sqrt{s_{NN}}$ = 130 GeV

    Elliptic flow holds much promise for studying the early-time thermalization attained in ultrarelativistic nuclear collisions. Flow measurements also provide a means of distinguishing between hydrodynamic models and calculations that approach the low-density (dilute gas) limit. Among the effects that can complicate the interpretation of elliptic flow measurements are azimuthal correlations unrelated to the reaction plane (non-flow correlations). Using data for Au + Au collisions at $\sqrt{s_{NN}}$ = 130 GeV from the STAR TPC, it is found that four-particle correlation analyses can reliably separate flow and non-flow correlation signals. Non-flow correlations account, on average, for about 15% of the observed second-harmonic azimuthal correlation, with the largest relative contribution in the most peripheral and the most central collisions. The results are also corrected for the effect of flow variations within centrality bins. This effect is negligible for all but the most central bin, where the correction to the elliptic flow is about a factor of two. A simple new method for two-particle flow analysis based on scalar products is described, as is an analysis based on the distribution of the magnitude of the flow vector.
    Comment: minor text change
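    Why four-particle correlations suppress non-flow can be illustrated with a toy Monte Carlo. The sketch uses the Q-cumulant expressions for two- and four-particle correlations, a formulation of the cumulant method swapped in here for compactness and not necessarily the procedure of the STAR analysis. It assumes a single harmonic n = 2, unit particle weights, events drawn from dN/dphi proportional to 1 + 2 v2 cos 2(phi - Psi), and non-flow modeled (hypothetically) by duplicating some particles into collinear pairs. All function names and parameters are hypothetical.

```python
import cmath
import math
import random
from typing import List, Sequence, Tuple


def event_correlators(phis: Sequence[float], n: int = 2) -> Tuple[float, float, int]:
    """Single-event <2> and <4> for harmonic n, from the flow vectors Q_n and Q_2n."""
    m = len(phis)
    qn = sum(cmath.exp(1j * n * p) for p in phis)
    q2n = sum(cmath.exp(2j * n * p) for p in phis)
    # <2>: average of cos n(phi_i - phi_j) over all ordered pairs i != j
    two = (abs(qn) ** 2 - m) / (m * (m - 1))
    # <4>: average of cos n(phi_i + phi_j - phi_k - phi_l) over distinct quadruplets
    four = (abs(qn) ** 4 + abs(q2n) ** 2
            - 2 * (q2n * qn.conjugate() ** 2).real
            - 2 * (2 * (m - 2) * abs(qn) ** 2 - m * (m - 3))) / (m * (m - 1) * (m - 2) * (m - 3))
    return two, four, m


def v2_two_and_four(events: List[Sequence[float]]) -> Tuple[float, float]:
    """Event-averaged v2{2} and v2{4}, weighting events by their number of pairs/quadruplets."""
    s2 = w2 = s4 = w4 = 0.0
    for phis in events:
        two, four, m = event_correlators(phis)
        pair_w = m * (m - 1)
        quad_w = m * (m - 1) * (m - 2) * (m - 3)
        s2 += pair_w * two
        w2 += pair_w
        s4 += quad_w * four
        w4 += quad_w
    avg2, avg4 = s2 / w2, s4 / w4
    c2 = avg2                   # two-particle cumulant c2{2}
    c4 = avg4 - 2 * avg2 ** 2   # four-particle cumulant c2{4}
    v2_2 = math.sqrt(c2) if c2 > 0 else float("nan")
    v2_4 = (-c4) ** 0.25 if c4 < 0 else float("nan")
    return v2_2, v2_4


def toy_event(rng: random.Random, v2: float, n_flow: int, n_pairs: int) -> List[float]:
    """Toy event: flow particles plus n_pairs collinear duplicates acting as non-flow."""
    psi = rng.uniform(0.0, 2 * math.pi)  # random reaction-plane angle
    phis: List[float] = []
    while len(phis) < n_flow:
        phi = rng.uniform(0.0, 2 * math.pi)
        # accept-reject sampling of 1 + 2 v2 cos 2(phi - psi), whose maximum is 1 + 2 v2
        if rng.uniform(0.0, 1 + 2 * v2) < 1 + 2 * v2 * math.cos(2 * (phi - psi)):
            phis.append(phi)
    # duplicating particles creates a pure two-particle (non-flow) correlation
    return phis + phis[:n_pairs]


rng = random.Random(1)
events = [toy_event(rng, v2=0.1, n_flow=150, n_pairs=50) for _ in range(2000)]
v2_2, v2_4 = v2_two_and_four(events)
print(f"v2{{2}} = {v2_2:.4f}, v2{{4}} = {v2_4:.4f}  (input v2 = 0.1)")
```

    With 50 duplicated particles out of 200, the pair correlation should push v2{2} up to roughly 0.11, while v2{4} should stay near the input 0.1, because the two-particle non-flow term cancels in c2{4} = <<4>> - 2<<2>>^2 at leading order.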

    An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics

    For a decade, The Cancer Genome Atlas (TCGA) program collected clinicopathologic annotation data along with multi-platform molecular profiles of more than 11,000 human tumors across 33 different cancer types. TCGA clinical data contain key features representing the democratized nature of the data collection process. To ensure proper use of this large clinical dataset associated with genomic features, we developed a standardized dataset named the TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR), which includes four major clinical outcome endpoints. In addition to detailing major challenges and statistical limitations encountered while integrating the acquired clinical data, we present a summary that includes endpoint usage recommendations for each cancer type. These TCGA-CDR findings appear to be consistent with cancer genomics studies independent of the TCGA effort and provide opportunities for investigating cancer biology using clinical correlates at an unprecedented scale. Analysis of clinicopathologic annotations for over 11,000 cancer patients in the TCGA program led to the generation of the TCGA Clinical Data Resource, which provides recommendations of clinical outcome endpoint usage for 33 cancer types.
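    Survival outcome endpoints are typically analyzed as one (time, event) pair per patient, where patients who have not reached the endpoint at last follow-up are censored. As a minimal sketch of that kind of outcome analytics, the Kaplan-Meier estimator below builds a survival curve from such pairs. The data are hypothetical, and no TCGA-CDR column names or file layout are assumed.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival curve as (event time, S(t)) steps.

    events[i] is 1 if the endpoint occurred at times[i], 0 if the patient was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_removed = 0
        # group all records that share the same time
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            # product-limit step: multiply by the conditional survival at time t
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # both events and censorings leave the risk set after time t
        at_risk -= n_removed
    return curve


times = [2, 3, 3, 5, 8]    # hypothetical follow-up times
events = [1, 1, 0, 1, 0]   # hypothetical endpoint indicators (0 = censored)
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))  # 2 0.8 / 3 0.6 / 5 0.3
```

    The censored patient at time 3 still counts in the risk set at that time but not afterward, which is the detail that distinguishes Kaplan-Meier from a naive fraction of surviving patients.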