
    Validation of Results from Knowledge Discovery: Mass Density as a Predictor of Breast Cancer

    The purpose of our study is to identify and quantify the association between high breast mass density and breast malignancy using inductive logic programming (ILP) and conditional probabilities, and validate this association in an independent dataset. We ran our ILP algorithm on 62,219 mammographic abnormalities. We set the Aleph ILP system to generate 10,000 rules per malignant finding with a recall >5% and precision >25%. Aleph reported the best rule for each malignant finding. A total of 80 unique rules were learned. A radiologist reviewed all rules and identified potentially interesting rules. High breast mass density appeared in 24% of the learned rules. We confirmed each interesting rule by calculating the probability of malignancy given each mammographic descriptor. High mass density was the fifth-highest-ranked predictor. To validate the association between mass density and malignancy in an independent dataset, we collected data from 180 consecutive breast biopsies performed between 2005 and 2007. We created a logistic model with benign or malignant outcome as the dependent variable while controlling for potentially confounding factors. We calculated odds ratios based on dichotomized variables. In our logistic regression model, the independent predictors high breast mass density (OR 6.6, CI 2.5–17.6), irregular mass shape (OR 10.0, CI 3.4–29.5), spiculated mass margin (OR 20.4, CI 1.9–222.8), and subject age (β = 0.09, p < 0.0001) significantly predicted malignancy. Both ILP and conditional probabilities show that high breast mass density is an important adjunct predictor of malignancy, and this association is confirmed in an independent data set of prospectively collected mammographic findings.
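The two quantities this abstract relies on, the probability of malignancy given a descriptor and an odds ratio from dichotomized variables, can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, descriptor names, and helper functions are hypothetical, and the odds ratio is the crude (unadjusted) 2×2 estimate with a Woolf log-based 95% confidence interval, not the confounder-adjusted logistic-regression estimate reported above.

```python
import math

# Hypothetical record layout (not the study's schema): each finding carries
# a set of descriptor names and a 0/1 malignancy label.


def prob_malignant_given(records, descriptor):
    """Estimate P(malignant | descriptor present) as a simple frequency."""
    matching = [r for r in records if descriptor in r["descriptors"]]
    if not matching:
        return None  # descriptor never observed; probability undefined
    return sum(r["malignant"] for r in matching) / len(matching)


def rank_descriptors(records):
    """Rank every observed descriptor by P(malignant | descriptor), highest first."""
    names = {d for r in records for d in r["descriptors"]}
    probs = {d: prob_malignant_given(records, d) for d in names}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)


def crude_odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted odds ratio from a 2x2 table of dichotomized counts.

    a: descriptor present, malignant    b: descriptor present, benign
    c: descriptor absent,  malignant    d: descriptor absent,  benign
    Returns (OR, lower, upper) using Woolf's log-based confidence interval.
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    toy = [  # invented toy data, for illustration only
        {"descriptors": {"high_density", "irregular_shape"}, "malignant": 1},
        {"descriptors": {"high_density"}, "malignant": 1},
        {"descriptors": {"high_density"}, "malignant": 0},
        {"descriptors": {"irregular_shape"}, "malignant": 0},
        {"descriptors": {"round_shape"}, "malignant": 0},
    ]
    print(rank_descriptors(toy))
    print(crude_odds_ratio(20, 10, 10, 20))
```

Ranking by conditional probability mirrors the descriptor ranking step described above; the crude odds ratio is only a starting point, since the study's reported ORs come from a model that also controls for confounders.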

    Mapping circumstellar magnetic fields of late-type evolved stars with the Goldreich-Kylafis effect: CARMA observations at λ1.3 mm of R Crt and R Leo

    Mapping magnetic fields is the key to resolving what remains an unclear physical picture of circumstellar magnetic fields in late-type evolved stars. Observations of linearly polarized emission from thermal molecular line transitions due to the Goldreich-Kylafis (G-K) effect provide valuable insight into the magnetic field geometry in these sources that is complementary to other key studies. In this paper, we present the detection of spectral-line polarization from both the thermal J=2-1 CO line and the v=1, J=5-4 SiO maser line toward two thermally pulsing AGB (TP-AGB) stars, R Crt and R Leo. The observed fractional linear polarization in the CO emission is measured as m_l ~ 3.1% and m_l ~ 9.7% for R Crt and R Leo, respectively. A circumstellar envelope (CSE) model profile and the associated parameters are estimated and used as input to more detailed modeling of the linear polarization expected from the G-K effect. The observed thermal line polarization level is consistent with the G-K model prediction for R Crt; additional effects need to be considered for R Leo.
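The fractional linear polarization m_l quoted above follows the standard definition in terms of Stokes parameters, m_l = sqrt(Q² + U²)/I, with position angle chi = ½·atan2(U, Q). A minimal sketch, assuming noise-free Stokes values: the function name and inputs are hypothetical, and it does not reproduce the authors' data reduction or G-K modeling. For the G-K effect the polarization is expected to lie either parallel or perpendicular to the plane-of-sky magnetic field, so chi alone leaves a 90° ambiguity.

```python
import math


def linear_polarization(stokes_i, stokes_q, stokes_u):
    """Return (m_l, chi_deg) from Stokes I, Q, U.

    m_l = sqrt(Q^2 + U^2) / I            fractional linear polarization
    chi = 0.5 * atan2(U, Q), in degrees  polarization position angle
    No noise-bias correction is applied (an assumption; real data usually
    need a debiased estimate of the polarized intensity).
    """
    if stokes_i <= 0:
        raise ValueError("Stokes I must be positive")
    polarized = math.hypot(stokes_q, stokes_u)
    m_l = polarized / stokes_i
    chi_deg = 0.5 * math.degrees(math.atan2(stokes_u, stokes_q))
    return m_l, chi_deg


if __name__ == "__main__":
    # Invented Stokes values chosen only so that m_l comes out near 3.1%.
    m_l, chi = linear_polarization(1.0, 0.022, 0.022)
    print(f"m_l = {m_l:.1%}, chi = {chi:.1f} deg")  # prints: m_l = 3.1%, chi = 22.5 deg
```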

    Empirical Legal Studies Before 1940: A Bibliographic Essay

    The modern empirical legal studies movement has well-known antecedents in the law and society and law and economics traditions of the latter half of the 20th century. Less well known is the body of empirical research on legal phenomena from the period prior to World War II. This paper is an extensive bibliographic essay that surveys English-language empirical legal research from approximately 1940 and earlier. The essay is arranged around the themes in the research: criminal justice, civil justice (general studies of civil litigation, auto accident litigation and compensation, divorce, small claims, jurisdiction and procedure, civil juries), debt and bankruptcy, banking, appellate courts, legal needs, legal profession (including legal education), and judicial staffing and selection. Accompanying the essay is an extensive bibliography of research articles, books, and reports.

    PANC Study (Pancreatitis: A National Cohort Study): national cohort study examining the first 30 days from presentation of acute pancreatitis in the UK

    Background: Acute pancreatitis is a common, yet complex, emergency surgical presentation. Multiple guidelines exist and management can vary significantly. The aim of this first UK, multicentre, prospective cohort study was to assess the variation in management of acute pancreatitis to guide resource planning and optimize treatment. Methods: All patients aged 18 years or older presenting with acute pancreatitis, as defined by the Atlanta criteria, from March to April 2021 were eligible for inclusion and were followed up for 30 days. Anonymized data were uploaded to a secure electronic database in line with local governance approvals. Results: A total of 113 hospitals contributed data on 2580 patients, with an equal sex distribution and a mean age of 57 years. The aetiology was gallstones in 50.6 per cent, with idiopathic the next most common (22.4 per cent). In addition to the 7.6 per cent with a diagnosis of chronic pancreatitis, 20.1 per cent of patients had a previous episode of acute pancreatitis. One in 20 patients was classed as having severe pancreatitis, as per the Atlanta criteria. The overall mortality rate was 2.3 per cent at 30 days, but rose to one in three in the severe group. Predictors of death included male sex, increased age, and frailty; previous acute pancreatitis and gallstones as aetiologies were protective. Smoking status and body mass index were not associated with death. Conclusion: Most patients presenting with acute pancreatitis have a mild, self-limiting disease. The rate of idiopathic pancreatitis is high. Recurrent attacks of pancreatitis are common, but are likely to carry a reduced risk of death on subsequent admissions.

    Adaptively Finding and Combining First-Order Rules for Large, Skewed Data Sets

    Inductive Logic Programming (ILP) is a machine-learning approach that uses first-order logic to create human-readable rules from a database of information and a set of positive and negative examples. When working with highly skewed data sets where the negatives vastly outnumber the positives, common metrics such as predictive accuracy and area under the receiver operating characteristic curve (AUROC) do not work well because these metrics count negatives and positives equally, causing performance on the negative examples to dominate. This thesis explores creating ensembles of rules to maximize area under the recall-precision curve (AURPC), a much better metric that focuses specifically on the coverage and accuracy of labeling the positive examples. I create an ensemble of rules from a wide range of recall values and combine them to maximize AURPC. My Gleaner algorithm retains a set of rules for each positive seed example where standard ILP methods keep only a single rule. Gleaning rules that would normally be discarded and combining them into a single ensemble improves predictive performance while reducing the number of rules evaluated. I evaluate several modified search methods for finding sets of clauses that work well together. One method applies a probability distribution over the space of rules and stochastically selects rules more likely to improve Gleaner's predictive performance. A second method follows a boosting framework and weights examples in order to maximize AURPC. Tying together the method of combining rules with the search for good candidate rules shows improvement over the standard Gleaner algorithm. I apply these first-order ensemble techniques to several data sets from two very different domains. The first data sets come from the Information-Extraction (IE) domain where the task is to find specific relationships in text. The next data sets come from the computer-assisted medical-diagnosis domain.
The task is to identify findings on a mammogram as malignant or benign given descriptors of the findings, patient risk factors, radiologist's score, and information from any previous mammograms. I also include my work with Davis et al.'s SAYU algorithm. I demonstrate methods to improve predictive performance and to increase understanding of malignancy indicators. Including additional background knowledge that allows rules to contain ranges of values yields more complex models that improve predictive performance. I also show that transferred models are able to outperform radiologists at new institutions even when no additional data are available from the new institution. Finally, first-order rules and probability help improve understanding of malignancy indicators. I use these techniques to confirm the importance of high mass density in identifying malignant findings. I also identify surprising pairs of features that perform better at identifying malignant findings than would be expected from looking at the features individually.
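To make the skew argument concrete, the sketch below contrasts accuracy with a recall-precision summary on a heavily imbalanced toy label set. It uses average precision (the mean of the precision values recorded at each correctly ranked positive) as a simple step-wise stand-in for AURPC; this is an assumption for illustration, not the thesis's exact AURPC computation. Straight-line interpolation between recall-precision points is known to overstate the area, so it is avoided here. Function names and scores are hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of examples labelled correctly; positives and negatives count equally."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def average_precision(scores, labels):
    """Step-wise area under the recall-precision curve (average precision).

    Examples are ranked by score, highest first; at each true positive the
    precision of the ranking so far is recorded. Ties in score keep input
    order (Python's sort is stable), which is a simplification.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    true_pos = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_pos += 1
            precision_sum += true_pos / rank
    return precision_sum / total_pos


if __name__ == "__main__":
    # Toy skewed data: 2 positives among 100 examples.
    labels = [1, 1] + [0] * 98
    all_negative = [0] * 100
    print(accuracy(all_negative, labels))  # 0.98, yet no positive is found
    scores = [0.9, 0.2] + [0.5] * 98  # hypothetical ranker scores
    print(average_precision(scores, labels))
```

A classifier that never predicts a positive scores 98% accuracy here, while the precision-based summary exposes that the second positive is ranked last.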

    Using Bayesian Networks to Direct Stochastic Search

    Abstract. Stochastically searching the space of candidate clauses is an appealing way to scale up ILP to large datasets. We address an approach that uses a Bayesian network model to adaptively guide search in this space. We examine guiding search towards areas that previously performed well and towards areas that ILP has not yet thoroughly explored. We show improved area under the recall-precision curve using these modifications.

    Gleaner: Creating Ensembles of First-Order Clauses to Improve Recall-Precision Curves

    Abstract. Many domains in the field of Inductive Logic Programming (ILP) involve highly unbalanced data. A common way to measure performance in these domains is to use precision and recall instead of simply using accuracy. The goal of our research is to find new approaches within ILP particularly suited for large, highly skewed domains. We propose Gleaner, a randomized search method that collects good clauses from a broad spectrum of points along the recall dimension in recall-precision curves and employs an “at least L of these K clauses” thresholding method to combine sets of selected clauses. Our research focuses on Multi-Slot Information Extraction (IE), a task that typically involves many more negative examples than positive examples. We formulate this problem as a relational task, using two large testbeds involving the extraction of important relations from the abstracts of biomedical journal articles. We compare Gleaner to ensembles of standard theories learned by Aleph, finding that Gleaner produces comparable test-set results in a fraction of the training time.
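The “at least L of these K clauses” combination can be illustrated with a minimal sketch in which each clause is modelled as a Python predicate over an example. This is an assumption for illustration: real Gleaner clauses are first-order clauses evaluated against background knowledge, and the helper names and toy features below are hypothetical. Sweeping L from 1 to K trades recall for precision, giving one recall-precision point per threshold.

```python
from typing import Any, Callable, Sequence

# Hypothetical stand-in: a "clause" is any predicate over an example.
Clause = Callable[[Any], bool]


def at_least_l_of_k(clauses: Sequence[Clause], l: int) -> Callable[[Any], bool]:
    """Classify an example as positive if at least l of the clauses cover it."""
    def predict(example: Any) -> bool:
        return sum(1 for clause in clauses if clause(example)) >= l
    return predict


def recall_precision(predict, examples, labels):
    """Return (recall, precision) of predict on labelled examples."""
    tp = fp = fn = 0
    for example, label in zip(examples, labels):
        predicted = predict(example)
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Precision is undefined when nothing is predicted positive; 0.0 by convention here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


def sweep_thresholds(clauses, examples, labels):
    """One (L, recall, precision) point for each threshold L = 1..K."""
    return [
        (l, *recall_precision(at_least_l_of_k(clauses, l), examples, labels))
        for l in range(1, len(clauses) + 1)
    ]


if __name__ == "__main__":
    clauses = [lambda x: x["a"], lambda x: x["b"], lambda x: x["c"]]
    examples = [
        {"a": True, "b": True, "c": True},
        {"a": True, "b": True, "c": False},
        {"a": True, "b": False, "c": False},
        {"a": False, "b": False, "c": False},
        {"a": True, "b": False, "c": True},
    ]
    labels = [1, 1, 0, 0, 0]
    for point in sweep_thresholds(clauses, examples, labels):
        print(point)
```

On this toy data, raising L from 1 to 3 moves the ensemble from high recall with low precision toward perfect precision with lower recall.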

    Learning to Extract Genic Interactions using Gleaner

    We explore here the application of Gleaner, an Inductive Logic Programming approach to learning in highly skewed domains, to the Learning Language in Logic 2005 biomedical information-extraction challenge task. We create and describe a large number of background knowledge predicates suited for this task. We find that Gleaner outperforms standard Aleph theories with respect to recall and that additional linguistic background knowledge improves recall.

    A novel method to measure T1-relaxation times of macromolecules and quantification of the macromolecular resonances

    Purpose: Macromolecular peaks underlying metabolite spectra influence the quantification of metabolites. Therefore, it is important to understand the extent of the contribution from macromolecules (MMs) to metabolite quantification. To model MMs more accurately in spectral fitting, however, differences in T1-relaxation times among individual MM peaks must be considered. Characterizing T1-relaxation times for all individual MM peaks using a single inversion recovery technique is difficult because of possible contributions from metabolites. In contrast, a double inversion recovery (DIR) technique provides the flexibility to acquire MM spectra spanning a range of longitudinal magnetizations with minimal metabolite influence. Thus, a novel method to determine T1-relaxation times of individual MM peaks is reported in this work. Methods: Extensive Bloch simulations were performed to determine inversion time combinations for a DIR technique that yielded adequate MM signal with varying longitudinal magnetizations while minimizing metabolite contributions. MM spectra were acquired using a DIR metabolite-cycled semi-LASER sequence. LCModel concentrations were fitted to the DIR signal equation to calculate T1-relaxation times. Results: T1-relaxation times of MMs range from 204 to 510 ms and from 253 to 564 ms in gray- and white-matter-rich voxels, respectively, at 9.4 T. Additionally, concentrations of 13 MM peaks are reported. Conclusion: A novel DIR method is reported in this work to calculate T1-relaxation times of MMs in the human brain. T1-relaxation times and relaxation-time-corrected concentrations of individual MMs are reported in gray- and white-matter-rich voxels for the first time at 9.4 T.
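Fitting T1 to signals acquired at several inversion-time combinations can be sketched as below. The abstract does not give the DIR signal equation, so this assumes an idealized one: perfect inversion pulses and full relaxation before the first inversion, giving S(TI1, TI2) = A·[1 − 2·exp(−TI2/T1) + 2·exp(−(TI1 + TI2)/T1)], where TI1 is the delay between the two inversions and TI2 the delay from the second inversion to excitation. The fit is a one-dimensional grid search over T1 with the amplitude A solved in closed form by linear least squares. Function names, the grid, and the inversion times are hypothetical, and this is not the authors' LCModel-based pipeline.

```python
import math


def dir_factor(t1, ti1, ti2):
    """Idealized DIR longitudinal magnetization, normalized to M0.

    Assumes perfect inversions and full recovery before the first inversion:
    Mz/M0 = 1 - 2*exp(-TI2/T1) + 2*exp(-(TI1 + TI2)/T1).
    """
    return 1.0 - 2.0 * math.exp(-ti2 / t1) + 2.0 * math.exp(-(ti1 + ti2) / t1)


def fit_t1(signals, inversion_times, t1_min=0.05, t1_max=3.0, step=0.001):
    """Grid-search T1 (seconds); for each candidate, solve amplitude A by least squares.

    For fixed T1 the model is linear in A, so A = sum(s*f) / sum(f*f).
    Returns (best_t1, best_amplitude, best_sse).
    """
    best = (None, None, math.inf)
    n_steps = int(round((t1_max - t1_min) / step))
    for k in range(n_steps + 1):
        t1 = t1_min + k * step
        f = [dir_factor(t1, ti1, ti2) for ti1, ti2 in inversion_times]
        denom = sum(v * v for v in f)
        if denom == 0.0:
            continue
        amplitude = sum(s * v for s, v in zip(signals, f)) / denom
        sse = sum((s - amplitude * v) ** 2 for s, v in zip(signals, f))
        if sse < best[2]:
            best = (t1, amplitude, sse)
    return best


if __name__ == "__main__":
    # Synthetic, noise-free data for a hypothetical peak with T1 = 0.35 s.
    tis = [(0.9, 0.2), (0.9, 0.4), (0.9, 0.6), (1.2, 0.3), (1.5, 0.5), (2.0, 0.7)]
    true_t1, true_amp = 0.35, 2.0
    data = [true_amp * dir_factor(true_t1, a, b) for a, b in tis]
    t1, amp, sse = fit_t1(data, tis)
    print(f"T1 = {t1 * 1000:.0f} ms, A = {amp:.3f}")
```

Solving A analytically keeps the search one-dimensional; with real, noisy spectra a nonlinear least-squares fit and a more complete signal model (finite TR, imperfect inversions) would likely be needed.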