
    Ecometrics in the Age of Big Data: Measuring and Assessing "Broken Windows" Using Large-scale Administrative Records

    The collection of large-scale administrative records in electronic form by many cities provides a new opportunity for the measurement and longitudinal tracking of neighborhood characteristics, but one that requires novel methodologies that convert such data into research-relevant measures. The authors illustrate these challenges by developing measures of “broken windows” from Boston’s constituent relationship management (CRM) system (aka 311 hotline). A 16-month archive of the CRM database contains more than 300,000 address-based requests for city services, many of which reference physical incivilities (e.g., graffiti removal). The authors carry out three ecometric analyses, each building on the previous one. Analysis 1 examines the content of the measure, identifying 28 items that constitute two independent constructs, private neglect and public denigration. Analysis 2 assesses the validity of the measure by using investigator-initiated neighborhood audits to examine the “civic response rate” across neighborhoods. Indicators of civic response were then extracted from the CRM database so that measurement adjustments could be automated. These adjustments were calibrated against measures of litter from the objective audits. Analysis 3 examines the reliability of the composite measure of physical disorder at different spatiotemporal windows, finding that census tracts can be measured at two-month intervals and census block groups at six-month intervals. The final measures are highly detailed, can be tracked longitudinally, and are virtually costless. This framework thus provides an example of how new forms of large-scale administrative data can yield ecometric measurement for urban science while illustrating the methodological challenges that must be addressed.
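    The aggregation step the abstract describes, rolling address-based requests up into tract-level counts over fixed time windows, can be sketched as follows. This is a minimal illustration only: the field names, tract identifiers, and case types are hypothetical, not Boston's actual CRM schema or the paper's 28-item instrument.

    ```python
    from collections import Counter
    from datetime import date

    # Hypothetical CRM records: each request carries an address-derived
    # census tract, an open date, and a case type (illustrative schema).
    requests = [
        {"tract": "25025_0101", "opened": date(2011, 3, 4), "type": "graffiti removal"},
        {"tract": "25025_0101", "opened": date(2011, 4, 18), "type": "pothole"},
        {"tract": "25025_0202", "opened": date(2011, 3, 9), "type": "illegal dumping"},
    ]

    # Case types treated as physical-disorder indicators (an invented
    # subset standing in for the paper's 28 items).
    DISORDER_TYPES = {"graffiti removal", "illegal dumping"}

    def window_key(d: date, months: int = 2) -> tuple:
        """Bucket a date into a fixed-width window of `months` months."""
        return (d.year, (d.month - 1) // months)

    # Count disorder-tagged requests per (tract, two-month window).
    counts = Counter(
        (r["tract"], window_key(r["opened"]))
        for r in requests
        if r["type"] in DISORDER_TYPES
    )
    ```

    The two-month window width mirrors the tract-level reliability finding above; for block groups the same code would be run with `months=6`.
    
    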

    Cancer risk and tumour spectrum in 172 patients with a germline SUFU pathogenic variation: a collaborative study of the SIOPE Host Genome Working Group

    Background Little is known about the risks associated with germline SUFU pathogenic variants (PVs), a known cancer predisposition syndrome. Methods To study tumour risks, we analysed data from a large cohort of 45 unpublished patients with a germline SUFU PV, combined with 127 previously published patients. To reduce the ascertainment bias due to index patient selection, the risk of tumours was evaluated in relatives with SUFU PV (89 patients) using the Nelson-Aalen estimator. Results Overall, 117/172 (68%) SUFU PV carriers developed at least one tumour: medulloblastoma (MB) (86 patients), basal cell carcinoma (BCC) (25 patients), meningioma (20 patients) and gonadal tumours (11 patients). Thirty-three of them (28%) had multiple tumours. Median ages at diagnosis of MB, gonadal tumour, first BCC and first meningioma were 1.5, 14, 40 and 44 years, respectively. Follow-up data were available for 160 patients (137 remained alive and 23 died). The cumulative incidence of tumours in relatives was 14.4% (95% CI 6.8 to 21.4), 18.2% (95% CI 9.7 to 25.9) and 44.1% (95% CI 29.7 to 55.5) at the age of 5, 20 and 50 years, respectively. The cumulative risk of an MB, gonadal tumour, BCC and meningioma at age 50 years was: 13.3% (95% CI 6 to 20.1), 4.6% (95% CI 0 to 9.7), 28.5% (95% CI 13.4 to 40.9) and 5.2% (95% CI 0 to 12), respectively. Sixty-four different PVs were reported across the entire SUFU gene and were inherited in 73% of cases in which inheritance could be evaluated. Conclusion Germline SUFU PV carriers have a life-long increased risk of tumours, with a spectrum dominated by MB before the age of 5, gonadal tumours during adolescence, and BCC and meningioma in adulthood, justifying fine-tuned surveillance programmes.
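    The Nelson-Aalen estimator the abstract relies on accumulates the hazard H(t) as the sum of d/n over event times, where d is the number of events at that time and n the number still at risk. A minimal sketch on made-up follow-up data (not the study's cohort) looks like this:

    ```python
    def nelson_aalen(times, events):
        """Nelson-Aalen cumulative hazard from right-censored data.

        times:  observation time for each subject (e.g. age at tumour
                diagnosis, or age at last follow-up if censored)
        events: 1 if the event occurred at that time, 0 if censored
        Returns a list of (event_time, cumulative_hazard) pairs.
        """
        data = sorted(zip(times, events))
        at_risk = len(data)
        hazard, out, i = 0.0, [], 0
        while i < len(data):
            t = data[i][0]
            d = sum(e for tt, e in data if tt == t)   # events at t
            c = sum(1 for tt, _ in data if tt == t)   # all leaving risk set at t
            if d:
                hazard += d / at_risk
                out.append((t, hazard))
            at_risk -= c
            i += c
        return out
    ```

    The cumulative incidence figures quoted above correspond to transforming the hazard via 1 − exp(−H(t)).
    
    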

    [Comment] Redefine statistical significance

    The lack of reproducibility of scientific studies has caused growing concern over the credibility of claims of new discoveries based on “statistically significant” findings. There has been much progress toward documenting and addressing several causes of this lack of reproducibility (e.g., multiple testing, P-hacking, publication bias, and under-powered studies). However, we believe that a leading cause of non-reproducibility has not yet been adequately addressed: Statistical standards of evidence for claiming discoveries in many fields of science are simply too low. Associating “statistically significant” findings with P < 0.05 results in a high rate of false positives even in the absence of other experimental, procedural and reporting problems. For fields where the threshold for defining statistical significance is P < 0.05, we propose a change to P < 0.005. This simple step would immediately improve the reproducibility of scientific research in many fields. Results that would currently be called “significant” but do not meet the new threshold should instead be called “suggestive.” While statisticians have known the relative weakness of using P ≈ 0.05 as a threshold for discovery and the proposal to lower it to 0.005 is not new (1, 2), a critical mass of researchers now endorse this change. We restrict our recommendation to claims of discovery of new effects. We do not address the appropriate threshold for confirmatory or contradictory replications of existing claims. We also do not advocate changes to discovery thresholds in fields that have already adopted more stringent standards (e.g., genomics and high-energy physics research; see Potential Objections below). We also restrict our recommendation to studies that conduct null hypothesis significance tests.
    We have diverse views about how best to improve reproducibility, and many of us believe that other ways of summarizing the data, such as Bayes factors or other posterior summaries based on clearly articulated model assumptions, are preferable to P-values. However, changing the P-value threshold is simple and might quickly achieve broad acceptance.
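    The mechanics behind the threshold argument are easy to demonstrate: when the null hypothesis is true, the two-sided p-value of a z-test is uniform on [0, 1], so a threshold α flags roughly a fraction α of null experiments as "significant". A short simulation (illustrative only, not from the comment itself) makes the 0.05 versus 0.005 comparison concrete:

    ```python
    import random
    from statistics import NormalDist

    # Simulate 100,000 experiments in which the null is true: the test
    # statistic is a standard normal draw, and the two-sided p-value is
    # p = 2 * (1 - Phi(|z|)).
    rng = random.Random(0)
    nd = NormalDist()
    p_values = [2 * (1 - nd.cdf(abs(rng.gauss(0, 1)))) for _ in range(100_000)]

    # Fraction of pure-noise experiments declared "significant" at each
    # threshold: about 5% at 0.05, about 0.5% at 0.005.
    rate_05 = sum(p < 0.05 for p in p_values) / len(p_values)
    rate_005 = sum(p < 0.005 for p in p_values) / len(p_values)
    ```

    This only quantifies the false-positive rate under the null; the comment's fuller argument also involves prior odds and Bayes-factor bounds, which the simulation does not capture.
    
    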

    The FANCM:p.Arg658* truncating variant is associated with risk of triple-negative breast cancer

    Abstract: Breast cancer is a common disease partially caused by genetic risk factors. Germline pathogenic variants in the DNA repair genes BRCA1, BRCA2, PALB2, ATM, and CHEK2 are associated with breast cancer risk. FANCM, which encodes a DNA translocase, has been proposed as a breast cancer predisposition gene, with greater effects for the ER-negative and triple-negative breast cancer (TNBC) subtypes. We tested the three recurrent protein-truncating variants FANCM:p.Arg658*, p.Gln1701*, and p.Arg1931* for association with breast cancer risk in 67,112 cases, 53,766 controls, and 26,662 carriers of pathogenic variants of BRCA1 or BRCA2. These three variants were also studied functionally by measuring survival and chromosome fragility in FANCM−/− patient-derived immortalized fibroblasts treated with diepoxybutane or olaparib. We observed that FANCM:p.Arg658* was associated with increased risk of ER-negative disease and TNBC (OR = 2.44, P = 0.034 and OR = 3.79, P = 0.009, respectively). In a country-restricted analysis, we confirmed the associations detected for FANCM:p.Arg658* and found that FANCM:p.Arg1931* was also associated with ER-negative breast cancer risk (OR = 1.96; P = 0.006). The functional results indicated that all three variants were deleterious, affecting cell survival and chromosome stability, with FANCM:p.Arg658* causing the most severe phenotypes. In conclusion, we confirmed that the two rare FANCM deleterious variants p.Arg658* and p.Arg1931* are risk factors for the ER-negative and TNBC subtypes. Overall, our data suggest that the effect of truncating variants on breast cancer risk may depend on their position in the gene. Cell sensitivity to olaparib exposure identifies a possible therapeutic option to treat FANCM-associated tumors.
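    Odds ratios like the OR = 2.44 reported above compare variant-carrier frequencies between cases and controls in a 2×2 table, OR = (a·d)/(b·c), with an approximate confidence interval built on the log scale (the Wald method). A minimal sketch, using invented counts rather than the study's data:

    ```python
    from math import exp, log, sqrt

    def odds_ratio_ci(a, b, c, d, z=1.96):
        """Odds ratio and approximate 95% Wald CI for a 2x2 table:
        a = carrier cases,    b = non-carrier cases,
        c = carrier controls, d = non-carrier controls.
        """
        or_ = (a * d) / (b * c)
        se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
        lo = exp(log(or_) - z * se)
        hi = exp(log(or_) + z * se)
        return or_, lo, hi
    ```

    Large association studies such as this one use logistic regression with covariate adjustment rather than a raw 2×2 table, but the interpretation of the resulting odds ratio is the same.
    
    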

    The role of administrative data in the big data revolution in social science research

    The term big data is currently a buzzword in social science; however, its precise meaning is ambiguous. In this paper we focus on administrative data, which is a distinctive form of big data. Exciting new opportunities for social science research will be afforded by new administrative data resources, but these are currently underappreciated by the research community. The central aim of this paper is to discuss the challenges associated with administrative data. We emphasise that it is critical for researchers to carefully consider how administrative data have been produced. We conclude that administrative datasets have the potential to contribute to the development of high-quality and impactful social science research, and should not be overlooked in the emerging field of big data.