    Parameter estimators of random intersection graphs with thinned communities

    This paper studies a statistical network model generated by a large number of randomly sized overlapping communities, where any pair of nodes sharing a community is linked with probability $q$ via the community. In the special case with $q=1$ the model reduces to a random intersection graph which is known to generate high levels of transitivity also in the sparse context. The parameter $q$ adds a degree of freedom and leads to a parsimonious and analytically tractable network model with tunable density, transitivity, and degree fluctuations. We prove that the parameters of this model can be consistently estimated in the large and sparse limiting regime using moment estimators based on partially observed densities of links, 2-stars, and triangles. Comment: 15 pages
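    The model described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the community generator (uniformly random sizes and members) is an assumption since the abstract does not give the size distribution, and the counts below are raw subgraph counts rather than the paper's moment estimators.

```python
import itertools
import random


def random_communities(n, m, max_size, rng):
    """Hypothetical generator: m communities, each a random node subset.

    The size distribution (uniform on 1..max_size) is an assumption made
    for illustration only.
    """
    return [set(rng.sample(range(n), rng.randint(1, max_size))) for _ in range(m)]


def sample_thinned_graph(communities, q, rng):
    """Link each pair of nodes sharing a community with probability q.

    The coin is flipped independently for every community the pair shares.
    With q = 1 every such pair is linked (a random intersection graph).
    """
    edges = set()
    for members in communities:
        for u, v in itertools.combinations(sorted(members), 2):
            if rng.random() < q:
                edges.add((u, v))
    return edges


def subgraph_counts(n, edges):
    """Return the number of links, 2-stars, and triangles."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    links = len(edges)
    # A 2-star is a pair of distinct neighbours around a centre node.
    two_stars = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    # Each triangle is seen once from each of its three edges.
    triangles = sum(len(adj[u] & adj[v]) for u, v in edges) // 3
    return links, two_stars, triangles
```

    For example, `subgraph_counts(n, sample_thinned_graph(random_communities(n, m, 5, rng), 0.5, rng))` with `rng = random.Random(0)` yields the three counts on which moment estimators of this kind would be built.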

    A Fast Counting Method for 6-motifs with Low Connectivity

    A $k$-motif (or graphlet) is a subgraph on $k$ nodes in a graph or network. Counting of motifs in complex networks has been a well-studied problem in network analysis of various real-world graphs arising from the study of social networks and bioinformatics. In particular, the triangle counting problem has received much attention due to its significance in understanding the behavior of social networks. Similarly, subgraphs with more than 3 nodes have received much attention recently. While there have been successful methods developed on this problem, most of the existing algorithms are not scalable to large networks with millions of nodes and edges. The main contribution of this paper is a preliminary study that generalizes the exact counting algorithm provided by Pinar, Seshadhri and Vishal to a collection of 6-motifs. This method uses the counts of motifs with smaller size to obtain the counts of 6-motifs with low connectivity, that is, containing a cut-vertex or a cut-edge. Therefore, it circumvents the combinatorial explosion that naturally arises when counting subgraphs in large networks.
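    The idea of deriving counts of motifs with a cut-vertex from counts of smaller motifs can be illustrated on a much smaller pattern. The sketch below is not the paper's 6-motif method: it counts the 4-node "tailed triangle" (a triangle with one pendant edge, whose attachment point is a cut-vertex) from per-vertex triangle counts and degrees. The function names are hypothetical, and the count is non-induced.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles_per_vertex(adj):
    """Return t[v], the number of triangles that contain vertex v."""
    return {
        v: sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        for v, nbrs in adj.items()
    }


def tailed_triangle_count(adj):
    """Non-induced count of triangles with one pendant edge.

    For each triangle at v, the tail can go to any neighbour of v that is
    not one of the two other triangle vertices, giving deg(v) - 2 choices.
    No enumeration of 4-node subsets is needed.
    """
    t = triangles_per_vertex(adj)
    return sum(t[v] * (len(adj[v]) - 2) for v in adj)
```

    Only the triangle counts require any enumeration; the larger pattern is then obtained by arithmetic on the smaller counts, which is what keeps this style of counting cheap.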

    The detection of the imprint of filaments on cosmic microwave background lensing

    Galaxy redshift surveys, such as 2dF, SDSS, 6dF, GAMA and VIPERS, have shown that the spatial distribution of matter forms a rich web, known as the cosmic web. The majority of galaxy survey analyses measure the amplitude of galaxy clustering as a function of scale, ignoring information beyond a small number of summary statistics. Since the matter density field becomes highly non-Gaussian as structure evolves under gravity, we expect other statistical descriptions of the field to provide us with additional information. One way to study the non-Gaussianity is to study filaments, which evolve non-linearly from the initial density fluctuations produced in the primordial Universe. In our study, we report the first detection of CMB (Cosmic Microwave Background) lensing by filaments and we apply a null test to confirm our detection. Furthermore, we propose a phenomenological model to interpret the detected signal and we measure how filaments trace the matter distribution on large scales through filament bias, which we measure to be around 1.5. Our study provides a new scope to understand the environmental dependence of galaxy formation. In the future, the joint analysis of lensing and Sunyaev-Zel'dovich observations might reveal the properties of 'missing baryons', the vast majority of the gas which resides in the intergalactic medium and has so far evaded most observations.

    Construction and Random Generation of Hypergraphs with Prescribed Degree and Dimension Sequences

    We propose algorithms for construction and random generation of hypergraphs without loops and with prescribed degree and dimension sequences. The objective is to provide a starting point for as well as an alternative to Markov chain Monte Carlo approaches. Our algorithms leverage the transposition of properties and algorithms devised for matrices constituted of zeros and ones with prescribed row- and column-sums to hypergraphs. The construction algorithm extends the applicability of Markov chain Monte Carlo approaches when the initial hypergraph is not provided. The random generation algorithm allows the development of a self-normalised importance sampling estimator for hypergraph properties such as the average clustering coefficient. We prove the correctness of the proposed algorithms. We also prove that the random generation algorithm generates any hypergraph following the prescribed degree and dimension sequences with a non-zero probability. We empirically and comparatively evaluate the effectiveness and efficiency of the random generation algorithm. Experiments show that the random generation algorithm provides stable and accurate estimates of average clustering coefficient, and also demonstrates a better effective sample size in comparison with the Markov chain Monte Carlo approaches. Comment: 21 pages, 3 figures

    Opportunities and priorities for breast surgical research

    The Breast Cancer Campaign Gap analysis (2013) established breast cancer research priorities without a specific focus on surgical research or the role of surgeons. The majority of breast cancer patients encounter a surgeon at diagnosis or during treatment, thus surgical involvement in design and delivery of high-quality research to improve patient care is critical. This review aims to identify opportunities and priorities for breast surgical research to complement the previous gap analysis.

    The diagnosis of colorectal cancer in patients with symptoms: finding a needle in a haystack

    Patients often see primary care physicians with symptoms that might signal colorectal cancer but are also common in adults without cancer. Physicians and patients must then make a difficult decision about whether and how aggressively to evaluate the symptom. Favoring referral is that missed diagnoses lead to unnecessary testing, prolonged uncertainty, and continuing symptoms; also, the physician will suffer chagrin. It is not clear that diagnostic delay leads to progression to a more advanced stage. Against referral is that proper evaluation includes colonoscopy, with attendant inconvenience, discomfort, cost, and risk. The article by Hamilton et al, published this month in BMC Medicine, provides strong estimates of the predictive value of the various symptoms and signs of colorectal cancer and shows how much higher predictive values are with increasing age and male sex. Unfortunately, their results also make clear that most colorectal cancers present with symptoms with low predictive values, < 1.2%. Models that include a set of predictive variables, that is, risk factors, age, sex, screening history, and symptoms, have been developed to guide primary prevention and clinical decision-making and are more powerful than individual symptoms and signs alone. Although screening for colorectal cancer is increasing in many countries, cancers will still be found outside screening programs, so primary care physicians will remain at the front line in the difficult task of distinguishing everyday symptoms from life-threatening cancer.

    Effectiveness and cost of recruitment strategies for a community-based randomised controlled trial among rainwater drinkers

    BACKGROUND: Community-based recruitment is challenging particularly if the sampling frame is not easily defined as in the case of people who drink rainwater. Strategies for contacting participants must be carefully considered to maximise generalisability and minimise bias of the results. This paper assesses the recruitment strategies for a 1-year double-blinded randomised trial on drinking untreated rainwater. The effectiveness of the recruitment strategies and associated costs are described. METHODS: Community recruitment of households from Adelaide, Australia occurred from February to July 2007 using four methods: electoral roll mail-out, approaches to schools and community groups, newspaper advertising, and other media involvement. Word of mouth communication was also assessed. RESULTS: A total of 810 callers were screened, with 53.5% eligible. Of those who were eligible and sent further information, 76.7% were willing to participate in the study and 75.1% were enrolled. The target for recruitment was 300 households, and this was achieved. The mail-out was the most effective method with respect to number of households randomised, while recruitment via schools had the highest yield (57.3%) and was the most cost effective when considering cost per household randomised (AUD$147.20). Yield and cost effectiveness were lowest for media advertising. CONCLUSION: The use of electoral roll mail-out and advertising via schools were effective in reaching households using untreated rainwater for drinking. Employing multiple strategies enabled success in achieving the recruitment target. In countries where electoral roll extracts are available to researchers, this method is likely to have a high yield for recruitment into community-based epidemiological studies.

    Prevalence and dynamics of ribosomal DNA micro-heterogeneity are linked to population history in two contrasting yeast species

    Despite the considerable number and taxonomic breadth of past and current genome sequencing projects, many of which necessarily encompass the ribosomal DNA, detailed information on the prevalence and evolutionary significance of sequence variation in this ubiquitous genomic region is severely lacking. Here, we attempt to address this issue in two closely related yet contrasting yeast species, the baker's yeast Saccharomyces cerevisiae and the wild yeast Saccharomyces paradoxus. By drawing on existing datasets from the Saccharomyces Genome Resequencing Project, we identify a rich seam of ribosomal DNA sequence variation, characterising 1,068 and 970 polymorphisms in 34 S. cerevisiae and 26 S. paradoxus strains respectively. We discover the two species sets exhibit distinct mutational profiles. Furthermore, we show for the first time that unresolved rDNA sequence variation resulting from imperfect concerted evolution of the ribosomal DNA region follows a U-shaped allele frequency distribution in each species, similar to loci that evolve under non-concerted mechanisms but arising through rather different evolutionary processes. Finally, we link differences between the shapes of these allele frequency distributions to the two species' contrasting population histories.

    The Paired Availability Design for Historical Controls

    BACKGROUND: Although a randomized trial represents the most rigorous method of evaluating a medical intervention, some interventions would be extremely difficult to evaluate using this study design. One alternative, an observational cohort study, can give biased results if it is not possible to adjust for all relevant risk factors. METHODS: A recently developed and less well-known alternative is the paired availability design for historical controls. The paired availability design requires at least 10 hospitals or medical centers in which there is a change in the availability of the medical intervention. The statistical analysis involves a weighted average of a simple "before" versus "after" comparison from each hospital or medical center that adjusts for the change in availability. RESULTS: We expanded requirements for the paired availability design to yield valid inference. (1) The hospitals or medical centers serve a stable population. (2) Other aspects of patient management remain constant over time. (3) Criteria for outcome evaluation are constant over time. (4) Patient preferences for the medical intervention are constant over time. (5) For hospitals where the intervention was available in the "before" group, a change in availability in the "after" group does not change the effect of the intervention on outcome. CONCLUSION: The paired availability design has promise for evaluating medical versus surgical interventions, in which it is difficult to recruit patients to a randomized trial.

    BioTorrents: A File Sharing Service for Scientific Data

    The transfer of scientific data has emerged as a significant challenge, as datasets continue to grow in size and demand for open access sharing increases. Current methods for file transfer do not scale well for large files and can cause long transfer times. In this study we present BioTorrents, a website that allows open access sharing of scientific data and uses the popular BitTorrent peer-to-peer file sharing technology. BioTorrents allows files to be transferred rapidly due to the sharing of bandwidth across multiple institutions and provides more reliable file transfers due to the built-in error checking of the file sharing technology. BioTorrents contains multiple features, including keyword searching, category browsing, RSS feeds, torrent comments, and a discussion forum. BioTorrents is available at http://www.biotorrents.net