212 research outputs found

    A Latent Source Model for Nonparametric Time Series Classification

    For classifying time series, a nearest-neighbor approach is widely used in practice with performance often competitive with or better than more elaborate methods such as neural networks, decision trees, and support vector machines. We develop theoretical justification for the effectiveness of nearest-neighbor-like classification of time series. Our guiding hypothesis is that in many applications, such as forecasting which topics will become trends on Twitter, there aren't actually that many prototypical time series to begin with, relative to the number of time series we have access to, e.g., topics become trends on Twitter only in a few distinct manners whereas we can collect massive amounts of Twitter data. To operationalize this hypothesis, we propose a latent source model for time series, which naturally leads to a "weighted majority voting" classification rule that can be approximated by a nearest-neighbor classifier. We establish nonasymptotic performance guarantees of both weighted majority voting and nearest-neighbor classification under our model accounting for how much of the time series we observe and the model complexity. Experimental results on synthetic data show weighted majority voting achieving the same misclassification rate as nearest-neighbor classification while observing less of the time series. We then use weighted majority voting to forecast which news topics on Twitter become trends, where we are able to detect such "trending topics" in advance of Twitter 79% of the time, with a mean early advantage of 1 hour and 26 minutes, a true positive rate of 95%, and a false positive rate of 4%. Comment: Advances in Neural Information Processing Systems (NIPS 2013)
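
    The two classification rules named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the exponential weight exp(-gamma * squared distance), the parameter name gamma, the helper names, and the toy data are all assumptions made here, and the sketch ignores any alignment over time shifts. Distances are computed only over the observed prefix, which mirrors the idea of classifying before the full series is seen.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance over the overlapping prefix of two series."""
    # zip stops at the shorter series, so a partially observed series
    # is compared only against the matching prefix of a training series.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_majority_vote(observed, training, gamma=1.0):
    """Label `observed` by letting every training series cast a weighted vote.

    Each (series, label) pair votes for its label with weight
    exp(-gamma * squared distance); the label with the largest total wins.
    The exponential weighting is an assumption of this sketch.
    """
    totals = {}
    for series, label in training:
        weight = math.exp(-gamma * squared_distance(observed, series))
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


def nearest_neighbor(observed, training):
    """Label `observed` with the label of the single closest training series."""
    _, label = min(training, key=lambda pair: squared_distance(observed, pair[0]))
    return label


# Toy data, made up for illustration: "trend" series rise, "no_trend" stay flat.
training = [
    ([0.0, 1.0, 2.0, 3.0], "trend"),
    ([0.0, 1.2, 2.1, 2.9], "trend"),
    ([0.0, 0.1, 0.0, 0.1], "no_trend"),
    ([0.1, 0.0, 0.1, 0.0], "no_trend"),
]

# Only the first three time steps of the new series have been observed.
observed = [0.0, 0.9, 1.8]

print(weighted_majority_vote(observed, training))
print(nearest_neighbor(observed, training))
```

    As gamma grows, the closest training series dominates the vote, which is one way to see why a nearest-neighbor classifier can approximate the weighted rule.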

    Platinum (II) Compounds With Antitumor Activity Studied by Molecular Mechanics

    A series of Pt(II) complexes with antitumor properties, [1,2-bis(2,6-dichloro-4-hydroxyphenyl)ethylenediamine]PtL2 (meso-1-PtL2) and [erythro-1-(2,6-dichloro-4-hydroxyphenyl)-2-(2-halo-4-hydroxyphenyl)ethylenediamine]PtL2 [2L = 2Cl−, 2I−, SO₄²⁻; halo = F (erythro-8-PtL2), halo = Cl (erythro-9-PtL2)], has been modelled by molecular mechanics (MM). The MM calculations were carried out for different isomers and ligand conformations: meso-δ, meso-λ, d,l-δ, and d,l-λ. The compounds with the lowest MM energies have the same geometries as those obtained by X-ray analysis. The calculated MMX energy order, meso-1-PtL2 < erythro-9-PtL2 < erythro-8-PtL2 for L = I−, Cl−, and SO₄²⁻, is the reverse of the known antitumor activity order: the lowest-energy (most stable) complex, meso-1-PtL2, is the one with the highest estrogen activity. The type of leaving group (L) does not alter the energy order, which agrees with biological experiments that show only a slight dependence of the estrogen properties on the leaving group type.

    Pooled DNA genotyping on Affymetrix SNP genotyping arrays

    BACKGROUND: Genotyping technology has advanced such that genome-wide association studies of complex diseases based upon dense marker maps are now technically feasible. However, the cost of such projects remains high. Pooled DNA genotyping offers the possibility of applying the same technologies at a fraction of the cost, and there is some evidence that certain ultra-high throughput platforms also perform with acceptable accuracy. However, thus far, this conclusion is based upon published data concerning only a small number of SNPs. RESULTS: In the current study we prepared DNA pools from the parents and from the offspring of 30 parent-child trios that have been extensively genotyped by the HapMap project. We analysed the two pools with Affymetrix 10 K Xba 142 2.0 Arrays. The availability of the HapMap data allowed us to validate the performance of 6843 SNPs for which we had both complete individual and pooled genotyping data. Pooled analyses averaged over 5–6 microarrays gave highly reproducible results. Moreover, the accuracy of estimating differences in allele frequency between pools using this ultra-high throughput system was comparable with previous reports of pooling based upon lower throughput platforms, with an average error for the predicted allele frequency differences between the two pools of 1.37% and with 95% of SNPs showing an error of < 3.2%. CONCLUSION: Genotyping thousands of SNPs with DNA pooling using Affymetrix microarrays produces highly accurate results and can be used for genome-wide association studies.

    Increasing the Accuracy of the Characterization of a Thin Semiconductor or Dielectric Film on a Substrate from Only One Quasi-Normal Incidence UV/Vis/NIR Reflectance Spectrum of the Sample

    OEMT is an existing optimizing envelope method for thin-film characterization that uses only one transmittance spectrum, T(λ), of the film deposited on the substrate. OEMT computes the optimized values of the average thickness and the thickness non-uniformity, Δd, employing variables for the external smoothing of T(λ), the slit width correction, and the optimized wavelength intervals for the computation of the average thickness and Δd, and taking into account both the finite size and absorption of the substrate. Our group has achieved record low relative errors, <0.1%, in the average thickness of thin semiconductor films via OEMT, and the high accuracy of the average thickness and Δd allows for the accurate computation of the complex refractive index of the film as a function of λ. This paper proposes an envelope method, named OEMR, for the characterization of thin dielectric or semiconductor films using only one quasi-normal incidence UV/Vis/NIR reflectance spectrum, R(λ), of the film on the substrate. The features of OEMR are similar to those of OEMT described above. OEMR and several popular dispersion models are employed for the characterization of two a-Si films, only from R(λ), with a computed average thickness of 674.3 nm and Δd = 11.5 nm for the thinner film. It is demonstrated that the most accurate characterizations of these films over the measured spectrum are based on OEMR.

    Re-evaluation of putative rheumatoid arthritis susceptibility genes in the post-genome wide association study era and hypothesis of a key pathway underlying susceptibility

    Rheumatoid arthritis (RA) is an archetypal, common, complex autoimmune disease with both genetic and environmental contributions to disease aetiology. Two novel RA susceptibility loci have been reported from recent genome-wide and candidate gene association studies. We, therefore, investigated the evidence for association of the STAT4 and TRAF1/C5 loci with RA using imputed data from the Wellcome Trust Case Control Consortium (WTCCC). No evidence for association of variants mapping to the TRAF1/C5 gene was detected in the 1860 RA cases and 2930 control samples tested in that study. Variants mapping to the STAT4 gene did show evidence for association (rs7574865, P = 0.04). Given the association of the TRAF1/C5 locus in two previous large case–control series from populations of European descent and the evidence for association of the STAT4 locus in the WTCCC study, single nucleotide polymorphisms mapping to these loci were tested for association with RA in an independent UK series comprising DNA from >3000 cases with disease and >3000 controls. A combined analysis including the WTCCC data was also undertaken. We confirm association of the STAT4 and the TRAF1/C5 loci with RA, bringing to 5 the number of confirmed susceptibility loci. The effect sizes are smaller than those reported previously but are likely to be a more accurate reflection of the true effect size, given the larger size of the cohort investigated in the current study.

    Results of the Ontology Alignment Evaluation Initiative 2009

    Ontology matching consists of finding correspondences between ontology entities. OAEI campaigns aim at comparing ontology matching systems on precisely defined test cases. Test cases can use ontologies of different nature (from expressive OWL ontologies to simple directories) and use different modalities, e.g., blind evaluation, open evaluation, consensus. OAEI-2009 builds on previous campaigns by having 5 tracks with 11 test cases followed by 16 participants. This paper is an overall presentation of the OAEI 2009 campaign.