    Learning from Noisy Crowd Labels with Logics

    This paper explores the integration of symbolic logic knowledge into deep neural networks for learning from noisy crowd labels. We introduce Logic-guided Learning from Noisy Crowd Labels (Logic-LNCL), an EM-like iterative logic knowledge distillation framework that learns from both noisy labeled data and logic rules of interest. Unlike traditional EM methods, our framework contains a "pseudo-E-step" that distills a new type of learning target from the logic rules, which is then used in the "pseudo-M-step" to train the classifier. Extensive evaluations on two real-world datasets, for text sentiment classification and named entity recognition, demonstrate that the proposed framework improves on the state of the art and provides a new solution to learning from noisy crowd labels.
    Comment: 12 pages, 7 figures, accepted by ICDE-202
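
    The pseudo-E/pseudo-M alternation described above can be illustrated with a toy loop. This is a minimal, hypothetical sketch, not the Logic-LNCL implementation. It assumes that crowd labels are aggregated by simple vote counting, that the logic rules are pre-scored into a hypothetical `rule_scores` matrix (1 means a label fully satisfies the rules), that the distilled target uses a posterior-regularization style penalty `exp(-C * (1 - rule_scores))`, and that the classifier is plain softmax regression. All function names are illustrative.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def crowd_soft_labels(crowd, num_classes):
    """Turn noisy crowd annotations into per-item label distributions by vote counting.

    crowd: (n_items, n_annotators) int array; -1 marks a missing annotation.
    (Assumption: the paper may model annotators differently.)
    """
    counts = np.stack([(crowd == k).sum(axis=1) for k in range(num_classes)], axis=1)
    counts = counts.astype(float) + 1e-6  # unlabeled items become uniform, not 0/0
    return counts / counts.sum(axis=1, keepdims=True)


def pseudo_e_step(model_probs, crowd_probs, rule_scores, C=1.0):
    """Hypothetical pseudo-E-step: build a distilled target q for each item.

    q is proportional to model_probs * crowd_probs * exp(-C * (1 - rule_scores)),
    so labels that violate the rules are down-weighted. This penalty form is an
    assumption, not the paper's exact formulation.
    """
    logits = (np.log(model_probs + 1e-12) + np.log(crowd_probs + 1e-12)
              - C * (1.0 - rule_scores))
    return softmax(logits)


def pseudo_m_step(W, X, targets, lr=0.5, steps=100):
    """Pseudo-M-step: fit softmax regression to the soft targets
    by gradient descent on the cross-entropy loss."""
    for _ in range(steps):
        probs = softmax(X @ W)
        W = W - lr * (X.T @ (probs - targets)) / X.shape[0]
    return W


def logic_guided_training(X, crowd, rule_scores, num_classes, iters=5, C=1.0):
    """Alternate the two pseudo-steps for a fixed number of iterations."""
    W = np.zeros((X.shape[1], num_classes))
    crowd_probs = crowd_soft_labels(crowd, num_classes)
    for _ in range(iters):
        targets = pseudo_e_step(softmax(X @ W), crowd_probs, rule_scores, C)
        W = pseudo_m_step(W, X, targets)
    return W


# Toy usage: four items, two features, three annotators (-1 = no label).
X = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
crowd = np.array([[0, 0, 1], [0, 0, -1], [1, 1, 0], [1, -1, 1]])
rules = np.ones((4, 2))  # hypothetical: no rule violations in this toy data
W = logic_guided_training(X, crowd, rules, num_classes=2)
print(softmax(X @ W).argmax(axis=1))
```

    Feeding the classifier's own prediction back into the next target is what makes the loop EM-like; a fuller system would also estimate annotator reliability instead of counting votes.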

    Reliability and accuracy of interview data in non-smoking female lung cancer case-control study

    Abstract
    Background: Valid interview data are critical to the final results of a study. The purpose of this study was to investigate the reliability of epidemiological data obtained in a case-control study of lung cancer in non-smoking women in China.
    Methods: Fifty-six case-control pairs, 10% of all enrolled subjects, were re-interviewed by three interviewers who had undergone identical standardized training. A limited number of questions from the original survey were asked again, and the responses from the re-interview were compared with those from the original interview. Kappa values, together with the positive, negative, and total rates of agreement, were calculated to assess the degree of agreement between the two interviews.
    Results: Kappa values exceeded 0.5 for all indices studied, ranging from 0.92 for family history of cancer down to 0.56 for oral contraceptive use. Errors in collecting and classifying data did occur, and were especially common for complicated clinical events, such as a drug exposure that occurred many years earlier.
    Conclusion: We identified four sources of this variability, three in data collection and one in coding. Based on these findings, we propose strategies for improving the quality of interview data in epidemiological research. Until a good solution is found, data collection and coding procedures should be kept simple and easy to inspect.
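
    The agreement measures named in the Methods can be computed from a 2×2 table that cross-tabulates original and re-interview answers for a yes/no item. This sketch assumes Cohen's kappa and the usual positive and negative specific agreement formulas; the abstract does not state the exact formulas the authors used, and the counts in the usage line are hypothetical, not data from the study.

```python
def agreement_stats(a, b, c, d):
    """Agreement between two interviews on one yes/no item.

    a: yes in both interviews
    b: yes originally, no at re-interview
    c: no originally, yes at re-interview
    d: no in both interviews
    Returns Cohen's kappa plus total, positive, and negative agreement rates.
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("no observations")
    p_obs = (a + d) / n  # total (overall) rate of agreement
    # Agreement expected by chance, from each interview's marginal yes/no totals.
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    if p_exp == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (p_obs - p_exp) / (1 - p_exp)
    # Specific agreement on "yes" and on "no" answers.
    pos = 2 * a / (2 * a + b + c) if (2 * a + b + c) else float("nan")
    neg = 2 * d / (2 * d + b + c) if (2 * d + b + c) else float("nan")
    return {"kappa": kappa, "total": p_obs, "positive": pos, "negative": neg}


stats = agreement_stats(20, 3, 2, 31)  # hypothetical counts for one item
print(round(stats["kappa"], 2))
```

    Kappa corrects the total agreement for agreement expected by chance, which is why it can be noticeably lower than the raw agreement rate for items where most answers fall in one category.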

    Hyperaccumulators for potentially toxic elements: A scientometric analysis

    Phytoremediation with hyperaccumulating plants is an effective and low-cost method for remediating soil contaminated by potentially toxic elements (metals and metalloids). This study analyzed publications on hyperaccumulators indexed in the Web of Science Core Collection (WoSCC) from 1992 to 2020. We explored the state of research on this topic by creating a series of science maps with VOSviewer, HistCite Pro, and CiteSpace. The results showed that the number of publications in this field has been trending upward. Dr. Xiaoe Yang is the most productive researcher on hyperaccumulators and has the broadest international collaboration network. The Chinese Academy of Sciences (China), Zhejiang University (China), and the University of Florida (USA) are the three most productive institutions in the field, and China, the USA, and India are the three most productive countries. The most widely used journals were the International Journal of Phytoremediation, Environmental Science and Pollution Research, and Chemosphere. Co-occurrence and citation analyses were used to identify the most influential publications in this field. Possible knowledge gaps and perspectives for future studies are also presented.