209 research outputs found

    Data Mining of Online Genealogy Datasets for Revealing Lifespan Patterns in Human Population

    Online genealogy datasets contain extensive information about millions of people and their past and present family connections. This vast amount of data can help identify various patterns in the human population. In this study, we present methods and algorithms for identifying variations in the lifespan distributions of the human population over the past centuries, for detecting social and genetic features that correlate with human lifespan, and for constructing predictive models of human lifespan based on features that can easily be extracted from genealogy datasets. We evaluated the presented methods and algorithms on a large online genealogy dataset with over a million profiles and over 9 million connections, all collected from the WikiTree website. Our findings indicate small but significant positive correlations between parents' lifespans and their children's lifespans. Additionally, we found slightly higher, significant correlations between the lifespans of spouses. We also discovered a very small but significant positive correlation between longevity and reproductive success in males, and a small but significant negative correlation between longevity and reproductive success in females. Moreover, our machine learning algorithms achieved better-than-random classification results in predicting which people who live past age 50 will also live past age 80. We believe that this study will be the first of many that use the wealth of data on human populations in online genealogy datasets to better understand the factors that influence human lifespan. Understanding these factors can help scientists develop solutions for successful aging.
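The parent-child lifespan correlation described above can be illustrated with a minimal sketch. It assumes a hypothetical, simplified profile record of `(birth_year, death_year, father_id, mother_id)`; this is not the WikiTree schema and not the authors' pipeline. Lifespans are approximated as the difference in years, and no significance test is performed.

```python
from math import sqrt
from typing import Dict, List, Optional, Tuple

# Hypothetical profile record: (birth_year, death_year, father_id, mother_id).
# This is NOT the WikiTree schema; it is an illustrative stand-in.
Profile = Tuple[Optional[int], Optional[int], Optional[str], Optional[str]]


def lifespan(profile: Profile) -> Optional[int]:
    """Approximate age at death (year difference only), or None if dates are missing or inconsistent."""
    birth, death = profile[0], profile[1]
    if birth is None or death is None or death < birth:
        return None
    return death - birth


def parent_child_pairs(profiles: Dict[str, Profile]) -> List[Tuple[int, int]]:
    """Collect (parent_lifespan, child_lifespan) for every linked parent with known dates."""
    pairs: List[Tuple[int, int]] = []
    for child in profiles.values():
        child_ls = lifespan(child)
        if child_ls is None:
            continue
        for parent_id in (child[2], child[3]):  # father, then mother
            parent = profiles.get(parent_id) if parent_id is not None else None
            if parent is None:
                continue
            parent_ls = lifespan(parent)
            if parent_ls is not None:
                pairs.append((parent_ls, child_ls))
    return pairs


def pearson(pairs: List[Tuple[int, int]]) -> float:
    """Pearson correlation coefficient between the two columns of `pairs`."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant column")
    return cov / sqrt(var_x * var_y)
```

Each child contributes up to two pairs (one per known parent), so a parent with many children is counted many times; a real analysis would have to decide whether to weight or deduplicate such pairs before drawing conclusions.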

    Reaction to New Security Threat Class

    Each newly identified security threat class triggers new research and development efforts by the scientific and professional communities. In this study, we investigate how quickly the scientific and professional communities react to newly identified threat classes, as reflected in the number of patents, scientific articles, and professional publications over a long period of time. The following threat classes were studied: Phishing, SQL Injection, BotNet, Distributed Denial of Service, and Advanced Persistent Threat. Our findings suggest that, in most cases, it takes a year for the scientific community, and more than two years for industry, to react to a new threat class with patents. Since new products follow patents, it is reasonable to expect a window of approximately two to three years in which no effective product is available to cope with a new threat class.
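One simple way to turn yearly publication counts into a "reaction time" is to measure the years between a threat class's identification and the first year in which its count reaches some threshold. The sketch below assumes exactly that operationalization, with a hypothetical `threshold` parameter; the abstract does not state how the study itself defined reaction time.

```python
from typing import Dict, Optional


def reaction_lag(identified_year: int,
                 counts_by_year: Dict[int, int],
                 threshold: int = 1) -> Optional[int]:
    """Years from identification until the yearly count first reaches `threshold`.

    `counts_by_year` maps a year to the number of patents (or articles, etc.)
    about the threat class published that year. The threshold rule is an
    illustrative assumption, not the study's stated method.
    Returns None if the threshold is never reached in the data.
    """
    for year in sorted(counts_by_year):
        if year >= identified_year and counts_by_year[year] >= threshold:
            return year - identified_year
    return None
```

Running the same function separately on scientific-article counts and on patent counts for one threat class would give two lags whose difference approximates the gap between the two communities' responses.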

    Quantitative Analysis of Genealogy Using Digitised Family Trees

    Driven by the popularity of television shows such as "Who Do You Think You Are?", many millions of users have uploaded their family trees to web projects such as WikiTree. Analysis of this corpus enables us to investigate genealogy computationally. The study of heritage in the social sciences has led to an increased understanding of ancestry and descent, but such efforts are hampered by difficult-to-access data. Genealogical research is typically a tedious process involving trawling through sources such as birth and death certificates, wills, letters, and land deeds. Decades of research have developed and examined hypotheses on population sex ratios, marriage trends, fertility, lifespan, and the frequency of twins and triplets. These hypotheses can now be tested with machine learning tools on vast datasets containing many billions of entries. Here we survey genealogy data mining using family trees that date back centuries and feature profiles of nearly 7 million individuals based in over 160 countries. Because these data are not typically created by trained genealogists, we verify them against third-party censuses. We present results on a range of aspects of population dynamics. Our approach extends the boundaries of genealogical inquiry to the precise measurement of underlying human phenomena.
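Checking crowd-sourced trees against census figures requires aggregating profiles into comparable statistics. As a minimal sketch, assuming hypothetical `(birth_year, sex)` records, the following computes the sex ratio (males per 100 females) for each birth decade, a figure that could then be set beside a census value for the same period. This is an illustration, not the authors' verification procedure.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple


def sex_ratio_by_decade(records: Iterable[Tuple[int, str]]) -> Dict[int, float]:
    """Males per 100 females, grouped by birth decade.

    Each record is a hypothetical (birth_year, sex) tuple with sex "M" or "F";
    any other value (e.g. unknown) is ignored. Decades with no females are
    omitted because the ratio is undefined there.
    """
    counts: DefaultDict[int, List[int]] = defaultdict(lambda: [0, 0])  # decade -> [males, females]
    for birth_year, sex in records:
        decade = birth_year // 10 * 10
        if sex == "M":
            counts[decade][0] += 1
        elif sex == "F":
            counts[decade][1] += 1
    return {decade: 100.0 * males / females
            for decade, (males, females) in sorted(counts.items())
            if females > 0}
```

Because profiles in user-built trees are unevenly recorded (for example, some branches may be traced further than others), a large gap between such a ratio and the census value would flag a possible sampling bias rather than a demographic finding.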