71 research outputs found

    Cross-language high similarity search using a conceptual thesaurus

    This work addresses the issue of cross-language high similarity and near-duplicates search, where, for the given document, a highly similar one is to be identified from a large cross-language collection of documents. We propose a concept-based similarity model for the problem which is very light in computation and memory. We evaluate the model on three corpora of different nature and two language pairs, English-German and English-Spanish, using the Eurovoc conceptual thesaurus. Our model is compared with two state-of-the-art models and we find that, though the proposed model is very generic, it produces competitive results and is significantly stable and consistent across the corpora.
    This work was done in the framework of the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems and it has been partially funded by the European Commission as part of the WIQ-EI IRSES project (grant no. 269180) within the FP 7 Marie Curie People Framework, and by the Text-Enterprise 2.0 research project (TIN2009-13391-C04-03). The research work of the second author is supported by the CONACyT 192021/302009 grant.
    Gupta, P.; Barrón Cedeño, LA.; Rosso, P. (2012). Cross-language high similarity search using a conceptual thesaurus. In Information Access Evaluation. Multilinguality, Multimodality, and Visual Analytics. Springer Verlag (Germany). 7488:67-75. https://doi.org/10.1007/978-3-642-33247-0_8

    Simple estimators of the intensity of seasonal occurrence

    Background: Edwards's method is a widely used approach for fitting a sine curve to a time-series of monthly frequencies. From this fitted curve, estimates of the seasonal intensity of occurrence (i.e., peak-to-low ratio of the fitted curve) can be generated. Methods: We discuss various approaches to the estimation of seasonal intensity assuming Edwards's periodic model, including maximum likelihood estimation (MLE), least squares, weighted least squares, and a new closed-form estimator based on a second-order moment statistic and non-transformed data. Through an extensive Monte Carlo simulation study, we compare the finite sample performance characteristics of the estimators discussed in this paper. Finally, all estimators and confidence interval procedures discussed are compared in a re-analysis of data on the seasonality of monocytic leukemia. Results: We find that Edwards's estimator is substantially biased, particularly for small numbers of events and very large or small amounts of seasonality. For the common setting of rare events and moderate seasonality, the new estimator proposed in this paper yields less finite sample bias and better mean squared error than either the MLE or weighted least squares. For large studies and strong seasonality, MLE or weighted least squares appears to be the optimal analytic method among those considered. Conclusion: Edwards's estimator of the seasonal relative risk can exhibit substantial finite sample bias. The alternative estimators considered in this paper should be preferred.

    Quantification of selection bias in studies of risk factors for birth defects among livebirths

    Background: Risk factors for birth defects are frequently investigated using data limited to liveborn infants. By conditioning on survival, results of such studies may be distorted by selection bias, also described as “livebirth bias.” However, the implications of livebirth bias on risk estimation remain poorly understood. Objectives: We sought to quantify livebirth bias and to investigate the conditions under which it arose. Methods: We used data on 3994 birth defects cases and 11 829 controls enrolled in the National Birth Defects Prevention Study to compare odds ratio (OR) estimates of the relationship between three established risk factors (antiepileptic drug use, smoking, and multifetal pregnancy) and four birth defects (anencephaly, spina bifida, omphalocele, and cleft palate) when analyses were restricted to livebirths versus when they included livebirths, stillbirths, and elective terminations. Exposures and birth defects represented varying strengths of association with livebirth; all controls were liveborn. We performed a quantitative bias analysis to evaluate the sensitivity of our results to excluding terminated and stillborn controls. Results: Cases ranged from 33% liveborn (anencephaly) to 99% (cleft palate). Smoking and multifetal pregnancy were associated with livebirth among anencephaly (crude OR [cOR] 0.61 and cOR 3.15, respectively) and omphalocele cases (cOR 2.22 and cOR 5.22, respectively). For analyses of the association between exposures and birth defects, restricting to livebirths produced negligible differences in estimates except for anencephaly and multifetal pregnancy, for which the estimate was twofold higher among livebirths (adjusted OR [aOR] 4.93) than among all pregnancy outcomes (aOR 2.44). Within tested scenarios, bias analyses suggested that results were not sensitive to the restriction to liveborn controls. Conclusions: Selection bias was generally limited except for high-mortality defects in the context of exposures strongly associated with livebirth. Findings indicate that substantial livebirth bias is unlikely to affect studies of risk factors for most birth defects.

    A design for a miniature biopotential radio transmitter
