493 research outputs found

    eBay users form stable groups of common interest

    Market segmentation of an online auction site is studied by analyzing the users' bidding behavior. The distribution of user activity is investigated and a network of bidders connected by common interest in individual articles is constructed. The network's cluster structure corresponds to the main user groups according to common interest, exhibiting hierarchy and overlap. A key feature of the analysis is its independence of any similarity measure between the articles offered on eBay, as such a measure would only introduce bias into the analysis. Results are compared to null models based on random networks, and clusters are validated and interpreted using the taxonomic classifications of eBay categories. We find clear-cut and coherent interest profiles for the bidders in each cluster. The interest profiles of bidder groups are compared to the classification of articles actually bought by these users during the time span 6-9 months after the initial grouping. The interest profiles discovered remain stable, indicating typical interest profiles in society. Our results show how network theory can be applied successfully to problems of market segmentation and sociological milieu studies with sparse, high-dimensional data. Comment: Major revision of the manuscript. Methodological improvements and inclusion of analysis of temporal development of user interests. 19 pages, 12 figures, 5 tables.
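
As a rough illustration of the network construction described in this abstract, the sketch below links two bidders whenever they have bid on the same article and counts the number of shared articles as the edge weight. The input format, the function name `bidder_network`, and the example bids are assumptions made for illustration; they are not taken from the paper, and the clustering and null-model comparison are not shown.

```python
from collections import defaultdict
from itertools import combinations


def bidder_network(bids):
    """Build a weighted co-bidding network.

    bids: iterable of (user, article) pairs (hypothetical input format).
    Returns a dict mapping a sorted (user_a, user_b) tuple to the number
    of articles on which both users placed a bid.
    """
    # Group bidders by the article they bid on.
    bidders_by_article = defaultdict(set)
    for user, article in bids:
        bidders_by_article[article].add(user)

    # Every pair of bidders on the same article gets one unit of weight.
    weights = defaultdict(int)
    for users in bidders_by_article.values():
        for a, b in combinations(sorted(users), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Made-up example bids, for illustration only.
example_bids = [
    ("ann", "lamp"), ("bob", "lamp"),
    ("ann", "vase"), ("bob", "vase"), ("cy", "vase"),
]
edges = bidder_network(example_bids)
```

Note that no similarity measure between articles enters this construction: two users are connected only through articles they both bid on, which matches the independence property the abstract emphasizes.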

    Applied immuno-epidemiological research: an approach for integrating existing knowledge into the statistical analysis of multiple immune markers.

    BACKGROUND: Immunologists often measure several correlated immunological markers, such as concentrations of different cytokines produced by different immune cells and/or measured under different conditions, to draw insights from complex immunological mechanisms. Although there have been recent methodological efforts to improve the statistical analysis of immunological data, a framework is still needed for the simultaneous analysis of multiple, often correlated, immune markers. This framework would allow the immunologists' hypotheses about the underlying biological mechanisms to be integrated. RESULTS: We present an analytical approach for statistical analysis of correlated immune markers, such as those commonly collected in modern immuno-epidemiological studies. We demonstrate i) how to deal with interdependencies among multiple measurements of the same immune marker, ii) how to analyse association patterns among different markers, iii) how to aggregate different measures and/or markers to immunological summary scores, iv) how to model the inter-relationships among these scores, and v) how to use these scores in epidemiological association analyses. We illustrate the application of our approach to multiple cytokine measurements from 818 children enrolled in a large immuno-epidemiological study (SCAALA Salvador), which aimed to quantify the major immunological mechanisms underlying atopic diseases or asthma. We demonstrate how to aggregate systematically the information captured in multiple cytokine measurements to immunological summary scores aimed at reflecting the presumed underlying immunological mechanisms (Th1/Th2 balance and immune regulatory network). We show how these aggregated immune scores can be used as predictors in regression models with outcomes of immunological studies (e.g. specific IgE) and compare the results to those obtained by a traditional multivariate regression approach. 
CONCLUSION: The proposed analytical approach may be especially useful to quantify complex immune responses in immuno-epidemiological studies, where investigators examine the relationship among epidemiological patterns, immune response, and disease outcomes.

    Fuzzy min-max neural networks for categorical data: application to missing data imputation

    The fuzzy min–max neural network classifier is a supervised learning method that combines neural networks with fuzzy systems. All input variables in the network are required to be continuously valued, and this can be a significant constraint in many real-world situations where the data are not only quantitative but also categorical. The usual way of dealing with this type of variable is to replace the categorical values with numerical ones and treat them as if they were continuously valued. This method, however, implicitly defines a possibly unsuitable metric for the categories. A number of different procedures have been proposed to tackle the problem. In this article, we present a new method. The procedure extends the fuzzy min–max neural network input to categorical variables by introducing new fuzzy sets, a new operation, and a new architecture. This provides for greater flexibility and wider application. The proposed method is then applied to missing data imputation in voting intention polls. The micro data of this type of poll (the set of the respondents’ individual answers to the questions) are especially suited for evaluating the method since they include a large number of numerical and categorical attributes.

    Amniotic fluid deficiency and congenital abnormalities both influence fluctuating asymmetry in developing limbs of human deceased fetuses

    Fluctuating asymmetry (FA), as an indirect measure of developmental instability (DI), has been intensively studied for associations with stress and fitness. Patterns, however, appear heterogeneous and the underlying causes remain largely unknown. One aspect that has received relatively little attention in the literature is the consequence of direct mechanical effects on asymmetries. The crucial prerequisite for FA to reflect DI is that environmental conditions on both sides should be identical. This condition may be violated during early human development if amniotic fluid volume is deficient, as the resulting mechanical pressures may increase asymmetries. Indeed, we showed that limb bones of deceased human fetuses exhibited increased asymmetry when there was not sufficient amniotic fluid (and, thus, space) in the uterine cavity. As amniotic fluid deficiency is known to cause substantial asymmetries and abnormal limb development, these subtle asymmetries are probably at least in part caused by the mechanical pressures. On the other hand, deficiencies in amniotic fluid volume are known to be associated with other congenital abnormalities that may disturb DI. More specifically, urogenital abnormalities can directly reduce amniotic fluid volume. We disentangled the direct mechanical effects on FA from the indirect effects of urogenital abnormalities, the latter presumably representing DI. We discovered that both factors contributed significantly to the increase in FA. However, the direct mechanical effect of uterine pressure, albeit statistically significant, appeared less important than the effects of urogenital abnormalities, with an effect size only two-thirds as large.
We, thus, conclude that correcting for the relevant direct factors allowed for a representative test of the association between DI and stress, and confirmed that fetuses form a suitable model system to increase our understanding of patterns of FA and symmetry development. Funding: Research Fund of the University of Antwerp; mobility grant from the Research Foundation – Flanders (FWO).

    Stability of gene contributions and identification of outliers in multivariate analysis of microarray data

    BACKGROUND: Multivariate ordination methods are powerful tools for the exploration of complex data structures present in microarray data. These methods have several advantages compared to common gene-by-gene approaches. However, due to their exploratory nature, multivariate ordination methods do not allow direct statistical testing of the stability of genes. RESULTS: In this study, we developed a computationally efficient algorithm for: i) the assessment of the significance of gene contributions and ii) the identification of sample outliers in multivariate analysis of microarray data. The approach is based on the use of resampling methods including bootstrapping and jackknifing. A statistical package of R functions was developed. This package includes tools for both inferring the statistical significance of gene contributions and identifying outliers among samples. CONCLUSION: The methodology was successfully applied to three published data sets with varying levels of signal intensities. Its relevance was compared with alternative methods. Overall, it proved to be particularly effective for the evaluation of the stability of microarray data.

    Types of the cerebral arterial circle (circle of Willis) in a Sri Lankan Population

    BACKGROUND: The variations of the circle of Willis (CW) are clinically important, as patients with effective collateral circulations have a lower risk of transient ischemic attack and stroke than those with ineffective collaterals. The aim of the present cadaveric study was to investigate the anatomical variations of the CW and to compare the prevalence of the different variations with previous autopsy studies, as variations in the anatomy of the CW as a whole have not been studied in the Indian subcontinent. METHODS: The external diameter of all the arteries forming the CW in 225 normal Sri Lankan adult cadaver brains was measured using a calibrated grid to determine the prevalence of variations in the CW. Chi-squared tests and a correspondence analysis were performed to compare the relative frequencies of anatomical variations in the CW across 6 studies of diverse ethnic populations. RESULTS: We report 15 types of variations of the CW out of 22 types previously described, and one additional type: hypoplastic precommunicating part of the anterior cerebral arteries (A1) and contralateral posterior communicating arteries (PcoA), 5 (2%). Statistically significant differences (p < 0.0001) were found between most of the studies except for the Moroccan study.
An especially notable difference was observed in the following 4 configurations: 1) hypoplastic precommunicating part of the posterior cerebral arteries (P1) and contralateral A1, 2) hypoplastic PcoA and contralateral P1, 3) hypoplastic PcoA, anterior communicating artery (AcoA) and contralateral P1, 4) bilateral hypoplastic P1s and AcoA, in a Caucasian-dominant study by Fisher versus the rest of the studies. CONCLUSION: The present study reveals that there are significant variations in the CW within and between ethnic groups (Caucasian, African and Asian: Iran and Sri Lanka dominant populations), and warrants further studies keeping the methods of measurement, data assessment, and the definitions of hypoplasia the same.

    Assessment of the health of Americans: the average health-related quality of life and its inequality across individuals and groups

    BACKGROUND: The assessment of population health has traditionally relied on the population's average health measured by mortality related indicators. Researchers have increasingly recognized the importance of including information on health inequality and health-related quality of life (HRQL) in the assessment of population health. The objective of this study is to assess the health of Americans in the 1990s by describing the average HRQL and its inequality across individuals and groups. METHODS: This study uses the 1990 and 1995 National Health Interview Survey from the United States. The measure of HRQL is the Health and Activity Limitation Index (HALex). The measure of health inequality across individuals is the Gini coefficient. This study provides confidence intervals (CI) for the Gini coefficient by a bootstrap method. To describe health inequality by group, this study decomposes the overall Gini coefficient into the between-group, within-group, and overlap Gini coefficient using race (White, Black, and other) as an example. This study looks at how much contribution the overlap Gini coefficient makes to the overall Gini coefficient, in addition to the absolute mean differences between groups. RESULTS: The average HALex was the same in 1990 (0.87, 95% CI: 0.87, 0.88) and 1995 (0.87, 95% CI: 0.86, 0.87). The Gini coefficient for the HALex distribution across individuals was greater in 1995 (0.097, 95% CI: 0.096, 0.099) than 1990 (0.092, 95% CI: 0.091, 0.094). Differences in the average HALex between all racial groups were the same in 1995 as 1990. The contribution of the overlap to the overall Gini coefficient was greater in 1995 than in 1990 by 2.4%. In both years, inequality between racial groups accounted only for 4–5% of overall inequality. CONCLUSION: The average HRQL of Americans was the same in 1990 and 1995, but inequality in HRQL across individuals was greater in 1995 than 1990. 
Inequality in HRQL by race was smaller in 1995 than 1990 because race had a smaller effect on the way health was distributed in 1995 than 1990. Analysis of the average HRQL and its inequality provides information on the health of a population that is invisible in the traditional analysis of population health.
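
The inequality measure named in this abstract can be sketched in a few lines of Python. The block below computes a Gini coefficient from a list of individual scores and a percentile bootstrap confidence interval around it. It is only a minimal sketch under stated assumptions: the abstract does not say which bootstrap variant was used, so the percentile method here is an assumption, the sketch ignores survey weights and sampling design, and the function names `gini` and `bootstrap_ci` and the example scores are hypothetical.

```python
import random


def gini(values):
    """Gini coefficient via the sorted-rank formula (0 = perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def bootstrap_ci(values, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(values) (assumed variant)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    stats = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Made-up scores on a 0-1 scale, for illustration only (not survey data).
scores = [0.95, 0.90, 0.87, 0.99, 0.60, 0.85, 0.92, 0.40, 0.88, 0.97]
point = gini(scores)
low, high = bootstrap_ci(scores, gini)
```

A real analysis of survey data such as the National Health Interview Survey would also need to account for sampling weights, which this sketch omits.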