
    Application of Naïve Bayesian sequential analysis to primary care optometry

    The objective of this study was to investigate the effects of circularity, comorbidity, prevalence and presentation variation on the accuracy of differential diagnoses made in optometric primary care using a modified form of naïve Bayesian sequential analysis. No such investigation has been reported before. Data were collected for 1422 cases seen over one year. Positive test outcomes were recorded for case history (ethnicity, age, symptoms and ocular and medical history) and clinical signs in relation to each diagnosis; for this reason, only positive likelihood ratios were used in this modified form of Bayesian analysis, which was carried out with Laplacian correction and Chi-square filtration. Accuracy was expressed as the percentage of cases for which the diagnosis made by the clinician appeared at the top of a list generated by Bayesian analysis. Preliminary analyses were carried out on 10 diagnoses and 15 test outcomes. Accuracy of 100% was achieved in the absence of presentation variation but dropped by 6% when variation existed. Circularity artificially elevated accuracy by 0.5%. Surprisingly, removal of Chi-square filtering increased accuracy by 0.4%. Decision tree analysis showed that accuracy was influenced primarily by prevalence, followed by presentation variation and comorbidity. An analysis of 35 diagnoses and 105 test outcomes followed. This explored the use of positive likelihood ratios, derived from the case history, to recommend signs to look for. Accuracy of 72% was achieved when all clinical signs were entered. The drop in accuracy, compared to the preliminary analysis, was attributed to the fact that some diagnoses lacked strong diagnostic signs; accuracy increased by 1% when only recommended signs were entered. Chi-square filtering improved recommended test selection. Decision tree analysis showed that accuracy was again influenced primarily by prevalence, followed by comorbidity and presentation variation. Future work will explore the use of likelihood ratios based on positive and negative test findings prior to considering naïve Bayesian analysis as a form of artificial intelligence in optometric practice.
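    The sequential updating with positive likelihood ratios and a Laplace correction described in this abstract can be illustrated with a short sketch. This is a minimal illustration, not the authors' implementation: the data structures, the odds-form update and the add-one correction parameters are assumptions made for the example.

        # Minimal sketch of naive Bayesian sequential analysis using only positive
        # likelihood ratios with a Laplace (add-one) correction.
        # All data structures and values are hypothetical, not the study's data.

        def laplace_rate(hits, total, classes=2):
            # Laplace correction avoids zero probabilities for unseen outcomes
            return (hits + 1) / (total + classes)

        def rank_diagnoses(priors, counts_dx, counts_not_dx, observed_signs):
            """Rank diagnoses by posterior probability after sequentially applying
            the positive likelihood ratio of each observed sign."""
            posteriors = {}
            for dx, prior in priors.items():
                odds = prior / (1 - prior)
                for sign in observed_signs:
                    present, n = counts_dx[dx][sign]              # sign present among cases with dx
                    present_not, n_not = counts_not_dx[dx][sign]  # sign present among cases without dx
                    lr_plus = laplace_rate(present, n) / laplace_rate(present_not, n_not)
                    odds *= lr_plus                               # sequential Bayesian update in odds form
                posteriors[dx] = odds / (1 + odds)
            # accuracy in the study: does the clinician's diagnosis top this list?
            return sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)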

    Low-Resource Machine Learning Techniques for the Analysis of Online Social Media Textual Data

    Low-resource and label-efficient machine learning methods can be described as the family of statistical and machine learning techniques that achieve high performance without needing a substantial amount of labeled data. These methods include both unsupervised learning techniques, such as LDA, and supervised methods, such as active learning, each providing different benefits. This dissertation is therefore devoted to the design and analysis of unsupervised and supervised techniques that address the following problems: (1) unsupervised narrative summary extraction for social media content; (2) social media text classification with Active Learning (AL); and (3) investigating the restrictions and benefits of using Curriculum Learning (CL) for social media text classification. For the first problem, we present a framework that identifies viral topics over time and provides a narrative summary of the identified topics in an unsupervised manner; the framework can provide this information at varying time resolutions. For the second problem, we present a strategy that samples data based on the local structure of the embedding space of a large pretrained language model. Samples that do not belong to a dominant set are selected for annotation, as these samples are less similar to the rest of the data points and, accordingly, more challenging for the model. This criterion is a compelling technique that minimizes the need for large annotated datasets. For the third problem, we use similar notions of data difficulty to study the impact of training models on a curriculum that presents easy samples first. This is the opposite of the idea behind active learning; however, instead of learning from a small amount of data and disregarding a substantial amount of information, gradual training from easy samples leads the model along a trajectory to a better local minimum. Our study includes curricula based on both heuristic and model-derived measures of difficulty.
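    A minimal sketch of the embedding-based selection idea in the second problem: choose for annotation the samples least similar to the bulk of the data. This is an illustrative stand-in, not the dissertation's algorithm; the embedding source, the cosine-similarity criterion and the use of a single centroid are simplifying assumptions.

        # Illustrative active-learning selection: rank samples by how atypical their
        # embeddings are and send the least typical ones for annotation.
        # Random embeddings stand in for outputs of a pretrained language model.
        import numpy as np

        def select_for_annotation(embeddings, budget):
            """Return indices of the `budget` samples least similar to the data's bulk."""
            X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
            centroid = X.mean(axis=0)
            centroid /= np.linalg.norm(centroid)
            typicality = X @ centroid              # cosine similarity to the overall direction
            return np.argsort(typicality)[:budget]  # least typical = most challenging to label

        rng = np.random.default_rng(0)
        fake_embeddings = rng.normal(size=(1000, 768))
        to_label = select_for_annotation(fake_embeddings, budget=50)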

    Psychometrics in Practice at RCEC

    A broad range of topics is dealt with in this volume: from combining psychometric generalizability theory and item response theory to ideas for an integrated formative use of data-driven decision making, assessment for learning and diagnostic testing. A number of chapters pay attention to computerized (adaptive) and classification testing. Other chapters treat the quality of testing in a general sense, but for topics such as maintaining standards or the testing of writing ability, the quality of testing is dealt with more specifically. All authors are connected to RCEC as researchers. They each present one of their current research topics and provide some insight into the focus of RCEC. The topics were selected and edited with the intention that the book be of special interest to educational researchers, psychometricians and practitioners in educational assessment.

    Accounting for structure in education assessment data using hierarchical models

    As the field of education continues to grow, new methods and approaches to teaching are being developed with the goal of improving students' understanding of concepts. While research exists showing positive effects for particular teaching methods in small case studies, generalizations to larger populations of students, which are needed to adequately inform policy decisions, can be difficult when using traditional inferential procedures for group comparisons that rely on randomization, replication, and control over relevant factors. Data collected to compare teaching methods often consist of student-level responses, where students are nested within a class, which is typically the experimental unit. Further, for studies in which the scope of inference exceeds individual schools or instructors, we often have classes nested within other factors such as semesters or instructors. In the first part of this dissertation, we explore the consequences of analyzing such data without accounting for the nesting structure. We then show that a hierarchical modeling approach allows us to appropriately account for structure in this type of data. As an illustration, we demonstrate the use of a model-based approach to comparing two teaching methods by fitting a hierarchical model to data from a second course in statistics at Iowa State University. To fit a hierarchical model to a dataset, the nesting structure must be chosen a priori. However, with data from an educational setting, there can be instances when the nesting structure is ambiguous: for example, should semesters be nested within instructors or vice versa? In part two of this dissertation, we develop a data-driven diagnostic using moment-based variance estimators to aid in the choice of nesting structure prior to fitting a hierarchical model. We conduct a simulation study to demonstrate the diagnostic's effectiveness and then apply the diagnostic to data from a nationally recognized standardized exam measuring statistical understanding after a first course in statistics. The results from the diagnostic and the subsequently fitted hierarchical model demonstrate a difference between levels of the upper-level grouping variable that represents the effect of interest. More broadly, this example is intended to highlight the use of hierarchical models for analyzing education data in a way that adequately accounts for variation between students arising from nested data structures.
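    As a concrete illustration of the kind of nesting described here, the sketch below fits a hierarchical model with students nested in classes and classes nested in instructors using statsmodels. The column names (score, method, instructor, classroom) and the choice of instructor as the upper grouping level are assumptions for the example, not the dissertation's data or model.

        # Hypothetical hierarchical model: fixed effect for teaching method,
        # random intercepts for instructor and for classroom within instructor.
        import pandas as pd
        import statsmodels.formula.api as smf

        def fit_nested_model(df: pd.DataFrame):
            model = smf.mixedlm(
                "score ~ method",                              # student-level outcome, fixed effect of method
                data=df,
                groups="instructor",                           # upper-level grouping variable
                vc_formula={"classroom": "0 + C(classroom)"},  # classes nested within instructors
            )
            return model.fit()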

    Privacy in the Genomic Era

    Genome sequencing technology has advanced at a rapid pace and it is now possible to generate highly detailed genotypes inexpensively. The collection and analysis of such data has the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has major implications for personal privacy, notably because the genome has certain essential features, which include (but are not limited to) (i) an association with traits and certain diseases, (ii) identification capability (e.g., forensics), and (iii) revelation of family relationships. Moreover, direct-to-consumer DNA testing increases the likelihood that genome data will be made available in less regulated environments, such as the Internet and for-profit companies. The problem of genome data privacy thus resides at the crossroads of computer science, medicine, and public policy. While computer scientists have addressed data privacy for various data types, less attention has been dedicated to genomic data. Thus, the goal of this paper is to provide a systematization of knowledge for the computer science community. In doing so, we address some of the (sometimes erroneous) beliefs of this field and we report on a survey we conducted about genome data privacy with biomedical specialists. Then, after characterizing the genome privacy problem, we review the state of the art regarding privacy attacks on genomic data and strategies for mitigating such attacks, and we contextualize these attacks from the perspective of medicine and public policy. The paper concludes with an enumeration of the challenges for genome data privacy and presents a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward.

    Bayesian Partially Ordered Probit and Logit Models with an Application to Course Redesign

    Large entry-level courses are commonplace at public 2- and 4-year institutions of higher education (IHEs) across the United States. Low pass rates in these entry-level courses, coupled with tight budgets, have put pressure on IHEs to look for ways to teach more students more effectively at a lower cost. Efforts to improve student outcomes in such courses are often called "course redesigns." The difficulty arises in trying to determine the impact of a particular course redesign; true randomized controlled trials are expensive and time-consuming, and few IHEs have the resources or patience to implement them. As a result, almost all evaluations of efforts to improve student success at scale rely on observational studies. At the same time, standard multilevel models may be inadequate to extract meaningful information from the complex and messy sets of student data available to evaluators because they throw away information by treating all passing grades equally. We propose a new Bayesian approach that keeps all grading information: a partially ordered multinomial probit model with random effects, fit using a Markov chain Monte Carlo algorithm, and a logit model that can be fit with importance sampling. Simulation studies show that the Bayesian Partially Ordered Probit/Logit Models work well and that parameter estimation is precise in large samples. We also compared these models with standard models using Mean Squared Error and the area under the Receiver Operating Characteristic (ROC) curve. We applied the new models to evaluate the impact of a course redesign at a large public university using students' grade data from the Fall semester of 2012 and the Spring semester of 2013.
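    A toy illustration of the central modeling idea of keeping the full grade scale rather than collapsing it to pass/fail. The sketch below uses a plain ordered-probit likelihood, which simplifies away the paper's partial ordering, random effects and MCMC fitting; the grade categories, cutpoints and covariate are invented for the example.

        # Ordered-probit log-likelihood over a full grade scale (a simplified
        # stand-in for the partially ordered probit model). All values are hypothetical.
        import numpy as np
        from scipy.stats import norm

        def ordered_probit_loglik(beta, cutpoints, X, y):
            """y holds grade categories 0..K-1; cutpoints are K-1 increasing thresholds."""
            eta = X @ beta
            cuts = np.concatenate(([-np.inf], cutpoints, [np.inf]))
            upper = norm.cdf(cuts[y + 1] - eta)     # P(latent score below upper cutpoint)
            lower = norm.cdf(cuts[y] - eta)         # P(latent score below lower cutpoint)
            return np.sum(np.log(np.clip(upper - lower, 1e-12, None)))

        # Hypothetical usage: 4 grade levels (fail, C, B, A) and a redesign indicator
        X = np.column_stack([np.ones(5), [0, 1, 0, 1, 1]])   # intercept + redesign flag
        y = np.array([0, 2, 1, 3, 2])
        print(ordered_probit_loglik(np.array([0.0, 0.5]), np.array([-0.5, 0.3, 1.0]), X, y))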