
    Machine learning classification of entrepreneurs in British historical census data

    This paper presents a binary classification of entrepreneurs in British historical data based on the recent availability of big data from the I-CeM dataset. The main task of the paper is to attribute an employment status to individuals who did not fully report entrepreneur status in earlier censuses (1851-1881). The paper assesses the accuracy of different classifiers and machine learning algorithms, including Deep Learning, for this classification problem. We first adopt a ground-truth dataset from the later censuses to train a Logistic Regression (which is standard in the literature for this kind of binary classification) to distinguish entrepreneurs from non-entrepreneurs (i.e. workers). Our initial accuracy for this baseline method is 0.74. We compare the Logistic Regression with ten optimized machine learning algorithms: Nearest Neighbors, Linear and Radial Support Vector Machine, Gaussian Process, Decision Tree, Random Forest, Neural Network, AdaBoost, Naive Bayes, and Quadratic Discriminant Analysis. The best results come from boosting and ensemble methods. AdaBoost achieves an accuracy of 0.95. Deep Learning, as a standalone category of algorithms, further improves accuracy to 0.96 without using the rich text data that characterizes the OccString feature, a string of up to 500 characters with the full occupational statement of each individual collected in the earlier censuses. Finally, and now using this OccString feature, we implement both shallow learning (bag-of-words algorithm) and Deep Learning (Recurrent Neural Network with a Long Short-Term Memory layer) algorithms. These methods all achieve accuracies above 0.99, with the Deep Learning Recurrent Neural Network as the best model with an accuracy of 0.9978. The results show that standard algorithms for classification can be outperformed by machine learning algorithms. This confirms the value of extending the techniques traditionally used in the literature for this type of classification problem. ESRC Leverhulme Trust Isaac Newton Trus

    An Inner Gaseous Disk around the Herbig Be Star MWC 147

    We present high-spectral-resolution, optical spectra of the Herbig Be star MWC 147, in which we spectrally resolve several emission lines, including the [O I] lines at 6300 and 6363 Å. Their highly symmetric, double-peaked line profiles indicate that the emission originates in a rotating circumstellar disk. We deconvolve the Doppler-broadened [O I] emission lines to obtain a measure of emission as a function of distance from the central star. The resulting radial surface brightness profiles are in agreement with a disk structure consisting of a flat, inner, gaseous disk and a flared, outer, dust disk. The transition between these components at 2 to 3 AU corresponds to the estimated dust sublimation radius. The width of the double-peaked Mg II line at 4481 Å suggests that the inner disk extends to at least 0.10 AU, close to the corotation radius. Comment: accepted for ApJ Letters (Oct. 2010)
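Mapping a Doppler-broadened line profile onto distance from the star relies on a velocity law for the disk. The sketch below assumes purely Keplerian rotation, which the abstract does not state explicitly, and converts an observed line-of-sight velocity into an orbital radius via r = G M sin²(i) / v_obs². The function name, the stellar mass argument and the inclination default are illustrative placeholders, not values from the paper.

```python
import math

GM_SUN = 1.32712440018e20  # solar gravitational parameter, m^3 s^-2
AU = 1.495978707e11        # astronomical unit, m


def keplerian_radius_au(v_obs_kms, stellar_mass_msun, inclination_deg=90.0):
    """Radius (AU) where gas on a circular Keplerian orbit has the given
    line-of-sight speed. Assumes v_obs = v_kep * sin(i)."""
    if v_obs_kms <= 0:
        raise ValueError("velocity must be positive")
    # Deproject the observed speed to the true orbital speed, in m/s.
    v_kep = v_obs_kms * 1e3 / math.sin(math.radians(inclination_deg))
    return GM_SUN * stellar_mass_msun / v_kep ** 2 / AU


# Sanity check: Earth's orbital speed around a 1 solar-mass star.
earth_like = keplerian_radius_au(29.78, 1.0)
# Doubling the velocity should place the gas at a quarter of the radius.
inner = keplerian_radius_au(2 * 29.78, 1.0)
```

Under this assumption the high-velocity wings of a line trace the innermost gas, which is why line widths can constrain how far inward a disk extends.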

    A tale of two tails: Do Power Law and Lognormal models fit firm-size distributions in the mid-Victorian era?

    The paper explores the frequency and size distributions of firm-size in a novel dataset for the mid-Victorian era from a recent extraction of the England and Wales population censuses of 1851, 1861, 1871, and 1881. The paper tests the Power Law hypothesis against the Lognormal model for the tails of the distributions using maximum likelihood estimation, the log likelihood ratio, the clipped sample coefficient of variation UMPU-Wilks test, and the Kolmogorov-Smirnov statistic, among other state-of-the-art statistical methods. Our results show that the Power Law hypothesis is accepted for the size distribution for the years 1851 and 1861, while 1871 is marginally non-significant and for 1881 the test is inconclusive. The paper discusses the process that generates these distributions, citing recent literature that shows how adding i.i.d. noise to Gibrat's multiplicative model can recreate Power Law behaviour. Overall, the paper provides, describes and statistically tests for the very first time a unique historical dataset, confirming that the tails of the distributions at least for 1851 and 1861 follow a Pareto model and that the Lognormal model is firmly rejected. This research has been supported by the ESRC under project grant ES/M010953: 'Drivers of Entrepreneurship and Small Businesses'. Piloting of the research for 1881 draws from Leverhulme Trust grant RG66385: 'The long-term evolution of Small and Medium-Sized Enterprises (SMEs)'. Coding of the 1871 census data was additionally supported by the Isaac Newton Trust research grant 17.07(d): 'Business Employers in 1871'
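Two of the tools named in this abstract, maximum likelihood estimation of the tail exponent and the Kolmogorov-Smirnov statistic, can be sketched in a few lines. This is a minimal illustration using the continuous power-law approximation and synthetic data. It is not the paper's procedure: firm sizes are discrete counts, the cutoff `xmin` is simply assumed here, and the likelihood-ratio comparison against the Lognormal model is omitted.

```python
import math
import random


def fit_power_law_tail(sizes, xmin):
    """MLE of the exponent alpha for a density proportional to x**(-alpha)
    on x >= xmin (continuous approximation)."""
    tail = [x for x in sizes if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    alpha = 1.0 + len(tail) / log_sum
    return alpha, tail


def ks_distance(tail, xmin, alpha):
    """Largest gap between the empirical CDF of the tail and the fitted
    power-law CDF, F(x) = 1 - (x / xmin)**(1 - alpha)."""
    ordered = sorted(tail)
    n = len(ordered)
    worst = 0.0
    for i, x in enumerate(ordered):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # Compare against the empirical CDF just before and just after x.
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return worst


# Synthetic tail drawn by inverse-transform sampling (hypothetical data).
rng = random.Random(0)
true_alpha, xmin = 2.5, 1.0
sample = [xmin * (1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0))
          for _ in range(5000)]

alpha_hat, tail = fit_power_law_tail(sample, xmin)
ks = ks_distance(tail, xmin, alpha_hat)
```

A small KS distance indicates that the fitted power law tracks the empirical tail closely; deciding between a power law and a lognormal additionally requires fitting both models and comparing their likelihoods.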

    Plantar pressure analysis after percutaneous repair of displaced intra-articular calcaneal fractures

    Background: Clinical results for the treatment of displaced intra-articular calcaneal fractures are mainly expressed using disease-specific outcome scores, physical examination and radiographs. We hypothesized that plantar pressure and foot position analysis is a valuable tool in assessing foot function in patients with a unilateral displaced intra-articular calcaneal fracture treated percutaneously. Materials and Methods: With a follow-up of at least one year, 21 patients with a unilateral displaced intra-articular calcaneal fracture treated percutaneously participated in the study. The pedobarographic measurements in the injured foot were compared with the contralateral control foot. Correlations between the ratios (injured/control) of plantar pressure and foot position variables and outcome scores, the physical exam item ratios, the fracture classification, and the radiological parameters were calculated. Results: Statistically significant differences between the injured and the control foot were f

    Mark correlations: relating physical properties to spatial distributions

    Mark correlations provide a systematic approach to look at objects both distributed in space and bearing intrinsic information, for instance on physical properties. The interplay of the objects' properties (marks) with the spatial clustering is of keen interest for many applications; are, e.g., galaxies with high luminosities more strongly clustered than dim ones? Do neighboring pores in a sandstone have similar sizes? How does the shape of impact craters on a planet depend on the geological surface properties? In this article, we give an introduction to the appropriate mathematical framework to deal with such questions, i.e. the theory of marked point processes. After having clarified the notion of segregation effects, we define universal test quantities applicable to realizations of marked point processes. We show their power using concrete data sets in analyzing the luminosity-dependence of the galaxy clustering, the alignment of dark matter halos in gravitational N-body simulations, the morphology- and diameter-dependence of the Martian crater distribution and the size correlations of pores in sandstone. In order to understand our data in more detail, we discuss the Boolean depletion model, the random field model and the Cox random field model. The first model describes depletion effects in the distribution of Martian craters and pores in sandstone, whereas the last one accounts at least qualitatively for the observed luminosity-dependence of the galaxy clustering. Comment: 35 pages, 12 figures. To be published in Lecture Notes of Physics, second Wuppertal conference "Spatial statistics and statistical physics"
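One common mark-weighted test quantity can be illustrated as follows: the mean product of marks over all pairs in a separation bin, normalised by the squared mean mark. Values near 1 suggest marks that are independent of position, while values above 1 at small separations suggest that neighbours carry similar marks. This is a naive sketch without edge correction, on synthetic points in the unit square; the function name and data are assumptions, and the article's own estimators may differ.

```python
import math
import random
from itertools import combinations


def mark_correlation(points, marks, r_lo, r_hi):
    """Mean of m_i * m_j over pairs with separation in [r_lo, r_hi),
    divided by the squared mean mark (no edge correction)."""
    mean_mark = sum(marks) / len(marks)
    total, count = 0.0, 0
    for i, j in combinations(range(len(points)), 2):
        if r_lo <= math.dist(points[i], points[j]) < r_hi:
            total += marks[i] * marks[j]
            count += 1
    if count == 0:
        raise ValueError("no pairs in this separation bin")
    return (total / count) / mean_mark ** 2


rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(400)]

# Case 1: marks drawn independently of position.
independent_marks = [rng.uniform(0.5, 1.5) for _ in points]
k_independent = mark_correlation(points, independent_marks, 0.0, 0.1)

# Case 2: marks tied to position, so close neighbours have similar marks.
position_marks = [0.5 + x for x, _ in points]
k_segregated = mark_correlation(points, position_marks, 0.0, 0.1)

# Case 3: identical marks give exactly 1 by construction.
k_constant = mark_correlation(points, [2.0] * len(points), 0.0, 0.1)
```

Evaluating the function over several separation bins gives the scale dependence of the mark correlation, which is the kind of diagnostic used to ask whether luminous galaxies or large pores cluster differently.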

    Psychoactive substance (drugs and alcohol) use by emergency department patients before injury

    OBJECTIVE: The aim of this study was to determine the prevalence and risk factors of alcohol, medication and illicit drug use before accidents in Emergency Department (ED)-treated trauma victims with internationally recommended methods to minimize registration bias. PATIENTS AND METHODS: The study design was cross-sectional and was carried out at Erasmus Medical Centre in Rotterdam. Alcohol, psychoactive medication and illicit drug use were assessed in an interview by an independent researcher on the basis of the standardized WHO questionnaire. During 84 shifts, covering 4 weeks 24/7, data on a comprehensive population of ED-treated injury patients were collected prospectively. RESULTS: A total of 475 patients were included (response rate 87%). The prevalence of alcohol intoxication (defined as ≥3 U alcohol) before trauma was 19%. Alcohol-intoxicated trauma patients were significantly more often men [odds ratio (OR) 2.88, 95% confidence interval (CI) 1.54-5.40], of Dutch descent (native) (OR 2.26, 95% CI 1.24-4.13), unemployed or students (OR 1.77, 95% CI 1.03-3.04), and alcohol intoxication decreased with age (OR 0.98, 95% CI 0.96-0.99). Psychoactive medication was used by 7% of ED trauma patients; increasing age (OR 1.05, 95% CI 1.03-1.07) and living alone (OR 2.4, 95% CI 1.04-5.52) were risk factors. Illicit drugs were used by 4% of trauma patients. Overall, 27% of patients were under the influence of at least one psychoactive substance. CONCLUSION: Over a quarter of trauma patients visiting the ED had used alcohol, psychoactive medication and/or illicit drugs before their accident. By far, the majority of intoxications before trauma were because of alcohol (19%). We found higher prevalence rates of alcohol intoxication and lower prevalence rates for illicit drug use than others. Because of our comprehensive approach and high response rates, registration bias was minimized
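The odds ratios and 95% confidence intervals reported above can be read with the help of a small worked calculation. The sketch below computes an unadjusted odds ratio from a 2x2 table with a Woolf (log-based) interval. The counts are hypothetical, and the study's estimates may come from a regression model with covariate adjustment, which this does not reproduce.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    a, b: exposed group with / without the outcome
    c, d: unexposed group with / without the outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed patients
# have the outcome.
or_value, ci_low, ci_high = odds_ratio_ci(20, 80, 10, 90)
```

An interval that excludes 1 corresponds to a statistically significant association at the 5% level, which is how intervals such as 1.54-5.40 in the abstract are interpreted.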

    Sensitivity and Specificity of Multiple Kato-Katz Thick Smears and a Circulating Cathodic Antigen Test for Schistosoma mansoni Diagnosis Pre- and Post-repeated-Praziquantel Treatment

    Two Kato-Katz thick smears (Kato-Katzs) from a single stool are currently recommended for diagnosing Schistosoma mansoni infections to map areas for intervention. This ‘gold standard’ has low sensitivity at low infection intensities. The urine point-of-care circulating cathodic antigen test (POC-CCA) is potentially more sensitive but how accurately they detect S. mansoni after repeated praziquantel treatments, their suitability for measuring drug efficacy and their correlation with egg counts remain to be fully understood. We compared the accuracies of one to six Kato-Katzs and one POC-CCA for the diagnosis of S. mansoni in primary-school children who have received zero to ten praziquantel treatments. We determined the impact each diagnostic approach may have on monitoring and evaluation (M&E) and drug-efficacy findings
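To make the comparison of one versus several smears concrete, the following sketch computes sensitivity and specificity against a reference infection status and shows how counting a child as positive when any of several smears contains eggs can raise sensitivity. The egg counts, the reference labels and the "any smear positive" rule are toy assumptions for illustration, not data or definitions from the study.

```python
def sensitivity_specificity(reference, test):
    """Return (sensitivity, specificity) of test results against a reference."""
    tp = sum(1 for r, t in zip(reference, test) if r and t)
    fn = sum(1 for r, t in zip(reference, test) if r and not t)
    tn = sum(1 for r, t in zip(reference, test) if not r and not t)
    fp = sum(1 for r, t in zip(reference, test) if not r and t)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical reference status: six infected children, four uninfected.
reference = [True] * 6 + [False] * 4

# Hypothetical egg counts from two smears per child.
smear_counts = [
    [0, 12], [24, 0], [0, 0], [36, 48], [0, 0], [12, 0],  # infected
    [0, 0], [0, 0], [0, 0], [0, 0],                       # uninfected
]

# Diagnosis from the first smear only.
one_smear = [counts[0] > 0 for counts in smear_counts]
# Diagnosis from both smears: positive if any smear shows eggs.
two_smears = [any(c > 0 for c in counts) for counts in smear_counts]

sens_one, spec_one = sensitivity_specificity(reference, one_smear)
sens_two, spec_two = sensitivity_specificity(reference, two_smears)
```

In this toy example the second smear catches an infection the first one missed, so sensitivity rises while specificity is unchanged; light infections that shed few eggs are the cases where additional smears matter most.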

    Social effects on age-related and sex-specific immune cell profiles in a wild mammal

    Evidence for age-related changes in innate and adaptive immune responses is increasing in wild populations. Such changes have been linked to fitness, and knowledge of the factors driving immune response variation is important for understanding the evolution of immunity. Age-related changes in immune profiles may be owing to factors such as immune system development, sex-specific behaviour and responses to environmental conditions. Social environments may also contribute to variation in immunological responses, for example, through transmission of pathogens and stress arising from resource and mate competition. Yet, the impact of the social environment on age-related changes in immune cell profiles is currently understudied in the wild. Here, we tested the relationship between leukocyte cell composition (proportion of neutrophils and lymphocytes [innate and adaptive immunity, respectively] that were lymphocytes) and age, sex and group size in a wild population of European badgers (Meles meles). We found that the proportion of lymphocytes in early life was greater in males in smaller groups compared to larger groups, but with a faster age-related decline in smaller groups. By contrast, the proportion of lymphocytes in females was not significantly related to age or group size. Our results provide evidence of sex-specific age-related changes in immune cell profiles in a wild mammal, which are influenced by the social environment