6,155 research outputs found

    Profiling user activities with minimal traffic traces

    Understanding user behavior is essential to personalize and enrich a user's online experience. While there are significant benefits to be accrued from the pursuit of personalized services based on a fine-grained behavioral analysis, care must be taken to address user privacy concerns. In this paper, we consider the use of web traces with truncated URLs (each URL is trimmed to contain only the web domain) for this purpose. While such truncation removes the fine-grained sensitive information, it also strips the data of many features that are crucial to the profiling of user activity. We show how to overcome the severe handicap of lacking these crucial features when filtering out, with high accuracy, the URLs that represent a user activity from the noisy network traffic trace (including advertisements, spam, analytics, and web scripts). This activity profiling with truncated URLs enables network operators to provide personalized services while mitigating privacy concerns by storing and sharing only truncated traffic traces. In order to offset the accuracy loss due to truncation, our statistical methodology leverages specialized features extracted from a group of consecutive URLs that represent a micro user action, such as a web click or chat reply, which we call bursts. These bursts, in turn, are detected by a novel algorithm based on our observed characteristics of the inter-arrival time of HTTP records. We present an extensive experimental evaluation on a real dataset of mobile web traces, consisting of more than 130 million records, representing the browsing activities of 10,000 users over a period of 30 days. Our results show that the proposed methodology achieves around 90% accuracy in segregating URLs representing user activities from non-representative URLs.
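
The abstract says bursts are detected from the inter-arrival time of HTTP records but does not give the algorithm. As a rough illustration only, the sketch below groups record timestamps into bursts with a fixed gap threshold. The function name `split_into_bursts`, the `gap_threshold` parameter, and the fixed-threshold rule itself are assumptions made for this example, not the authors' method.

```python
from typing import List


def split_into_bursts(timestamps: List[float], gap_threshold: float = 1.0) -> List[List[float]]:
    """Group record timestamps (seconds) into bursts.

    Hypothetical rule: a new burst starts whenever the gap to the previous
    record exceeds `gap_threshold`. The paper's detector is not specified
    in the abstract and may work differently.
    """
    bursts: List[List[float]] = []
    for t in sorted(timestamps):
        if bursts and t - bursts[-1][-1] <= gap_threshold:
            bursts[-1].append(t)   # small gap: same micro user action
        else:
            bursts.append([t])     # large gap (or first record): new burst
    return bursts


if __name__ == "__main__":
    trace = [0.0, 0.2, 0.5, 5.0, 5.1, 12.0]
    for burst in split_into_bursts(trace, gap_threshold=1.0):
        print(burst)
```

With the sample trace, the first three records fall into one burst, the next two into a second, and the last record forms a burst of its own.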

    Parallel Perceptrons, Activation Margins and Imbalanced Training Set Pruning

    The final publication is available at Springer via http://dx.doi.org/10.1007/11492542_6 (Proceedings of the Second Iberian Conference, IbPRIA 2005, Estoril, Portugal, June 7-9, 2005, Part II). A natural way to deal with training samples in imbalanced class problems is to prune them, removing redundant patterns, which are easy to classify and probably over-represented, and label-noisy patterns, which belong to one class but are labelled as members of another. This allows classifier construction to focus on borderline patterns, likely to be the most informative ones. To appropriately define the above subsets, in this work we use as base classifiers the so-called parallel perceptrons, a novel approach to committee machine training that allows, among other things, margins for hidden unit activations to be defined naturally. We use these margins to define the above pattern types and to iteratively perform subsample selections in an initial training set that enhance classification accuracy and allow for a balanced classifier performance even when class sizes are greatly different. With partial support of Spain’s CICyT, TIC 01–572, TIN2004–0767

    Master your Metrics with Calibration

    Machine learning models deployed in real-world applications are often evaluated with precision-based metrics such as F1-score or AUC-PR (Area Under the Curve of Precision Recall). Heavily dependent on the class prior, such metrics make it difficult to interpret the variation of a model's performance over different subpopulations/subperiods in a dataset. In this paper, we propose a way to calibrate the metrics so that they can be made invariant to the prior. We conduct a large number of experiments on balanced and imbalanced data to assess the behavior of calibrated metrics and show that they improve interpretability and provide better control over what is really measured. We describe specific real-world use-cases where calibration is beneficial such as, for instance, model monitoring in production, reporting, or fairness evaluation. Comment: Presented at IDA202
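
The abstract does not state how the metrics are calibrated. One way to see why precision depends on the class prior, and how that dependence could be removed, is to write precision in terms of the true positive rate, the false positive rate, and the prior, then substitute a fixed reference prior. The sketch below does this; the function name `precision_at_prior` and the `reference_prior` parameter are assumptions for illustration, and the paper's exact definition may differ.

```python
def precision_at_prior(tp: int, fp: int, fn: int, tn: int,
                       reference_prior: float = 0.5) -> float:
    """Precision recomputed as if the positive-class prior were `reference_prior`.

    Uses the identity precision = p*TPR / (p*TPR + (1-p)*FPR), where p is the
    positive-class prior, and replaces p by a fixed reference value.
    Illustrative only; not necessarily the paper's formula.
    """
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both classes must be present")
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    denom = reference_prior * tpr + (1.0 - reference_prior) * fpr
    if denom == 0.0:
        return 0.0  # no positive predictions at all
    return reference_prior * tpr / denom


if __name__ == "__main__":
    # Same classifier behaviour (TPR 0.8, FPR 0.1) on two class balances.
    balanced = dict(tp=80, fn=20, fp=10, tn=90)
    imbalanced = dict(tp=80, fn=20, fp=100, tn=900)
    for name, cm in (("balanced", balanced), ("imbalanced", imbalanced)):
        raw = cm["tp"] / (cm["tp"] + cm["fp"])
        print(name, "raw:", raw, "at reference prior:", precision_at_prior(**cm))
```

In this example the raw precision drops when negatives become ten times more frequent, while the value computed at the fixed reference prior stays the same because TPR and FPR are unchanged.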

    ROC curves in cost space

    The final publication is available at Springer via http://dx.doi.org/10.1007/s10994-013-5328-9. ROC curves and cost curves are two popular ways of visualising classifier performance, finding appropriate thresholds according to the operating condition, and deriving useful aggregated measures such as the area under the ROC curve (AUC) or the area under the optimal cost curve. In this paper we present new findings and connections between ROC space and cost space. In particular, we show that ROC curves can be transferred to cost space by means of a very natural threshold choice method, which sets the decision threshold such that the proportion of positive predictions equals the operating condition. We call these new curves rate-driven curves, and we demonstrate that the expected loss as measured by the area under these curves is linearly related to AUC. We show that the rate-driven curves are the genuine equivalent of ROC curves in cost space, establishing a point-point rather than a point-line correspondence. Furthermore, a decomposition of the rate-driven curves is introduced which separates the loss due to the threshold choice method from the ranking loss (Kendall τ distance). We also derive the corresponding curve to the ROC convex hull in cost space; this curve is different from the lower envelope of the cost lines, as the latter assumes only optimal thresholds are chosen. We would like to thank the anonymous referees for their helpful comments. 
This work was supported by the MEC/MINECO projects CONSOLIDER-INGENIO CSD2007-00022 and TIN 2010-21062-C02-02, GVA project PROMETEO/2008/051, the COST-European Cooperation in the field of Scientific and Technical Research IC0801 AT, and the REFRAME project granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA), and funded by the Engineering and Physical Sciences Research Council in the UK and the Ministerio de Economia y Competitividad in Spain. Hernández Orallo, J.; Flach, P.; Ferri Ramírez, C. (2013). ROC curves in cost space. Machine Learning. 93(1):71-91. https://doi.org/10.1007/s10994-013-5328-9
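
The rate-driven threshold choice described in the abstract sets the threshold so that the proportion of positive predictions equals the operating condition. A minimal sketch of that idea follows. The function name `rate_driven_predictions`, the convention that higher scores mean "more likely positive", the rounding of the target count, and the tie-breaking by input order are all assumptions of this example and may not match the paper's conventions.

```python
from typing import List, Sequence


def rate_driven_predictions(scores: Sequence[float], rate: float) -> List[int]:
    """Predict positive (1) for the top `rate` fraction of instances by score.

    Assumes higher score = more likely positive (an assumption of this sketch).
    The number of positives is round(rate * n); ties are broken by input order.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    n = len(scores)
    k = round(rate * n)  # target number of positive predictions
    # Indices sorted from highest to lowest score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    predictions = [0] * n
    for i in order[:k]:
        predictions[i] = 1
    return predictions


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.4, 0.7]
    print(rate_driven_predictions(scores, rate=0.5))
```

Sweeping `rate` from 0 to 1 and recording the resulting loss at each value is one way to trace a curve over operating conditions in the spirit of the rate-driven curves mentioned above.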

    Alternatively spliced variants of the cell adhesion molecule CD44 and tumour progression in colorectal cancer.

    Increased expression of alternatively spliced variants of the CD44 family of cell adhesion molecules has been associated with tumour metastasis. In the present study, expression of alternatively spliced variants of CD44 and their cellular distribution have been investigated in human colonic tumours and in the corresponding normal mucosa, in addition to benign adenomatous polyps. The expression of CD44 alternatively spliced variants has been correlated with tumour progression according to Dukes' histological stage. CD44 variant expression was determined by immunohistochemistry using monoclonal antibodies directed against specific CD44 variant domains together with RT-PCR analysis of CD44 variant mRNA expression in the same tissue specimens. We demonstrate that as well as being expressed in colonic tumour cells, the full range of CD44 variants, CD44v2-v10, are widely expressed in normal colonic crypt epithelium, predominantly in the crypt base. CD44v6, the epitope which is most commonly associated with tumour progression and metastasis, was not only expressed by many benign colonic tumours, but was expressed as frequently in normal basal crypt epithelium as in malignant colonic tumour cells, and surprisingly, was even absent from some metastatic colorectal tumours. Expression of none of the CD44 variant epitopes was found to be positively correlated with tumour progression or with colorectal tumour metastasis to the liver, results which are inconsistent with a role for CD44 variants as indicators of colonic cancer progression.

    Spin and charge excitations in incommensurate spin density waves

    Collective excitations in both the spin and charge channels are investigated in incommensurate spin density wave (or stripe) states of the two-dimensional Hubbard model. Using the random phase approximation, the dynamical susceptibility \chi(q,\omega) is calculated over the full range of (q,\omega), including all higher-harmonic components. An intricate landscape of the spectra in \chi(q,\omega) is obtained. We discuss the anisotropy of the dispersion cones for spin wave excitations, and for the phason excitation related to the motion of the stripe line. Inelastic neutron experiments on Cr and its alloys and on stripe states of underdoped cuprates are proposed.

    GaN resistive hydrogen gas sensors

    GaN epilayers grown by organometallic vapor phase epitaxy have been used to fabricate resistive gas sensors with a pair of planar ohmic contacts. Detectable sensitivity to H2 gas for a wide range of gas mixtures in an Ar ambient has been realized; the lowest concentration tested is ∼0.1% H2 (in Ar), well below the lower combustion limit in air. No saturation of the signal is observed up to 100% H2 flow. Real-time response to H2 shows a clear and sharp response with no memory effects during the ramping cycles of H2 concentration. The change in current at a fixed voltage in response to hydrogen was found to vary with sensor geometry. This appears to be consistent with a surface-adsorption-induced change of conductivity; a detailed picture of the gas sensing mechanism requires further systematic studies.

    Magnetic phase diagram and transport properties of FeGe_2

    We have used resistivity measurements to study the magnetic phase diagram of the itinerant antiferromagnet FeGe_2 in the temperature range from 0.3 to 300 K in magnetic fields up to 16 T. In contrast to theoretical predictions, the incommensurate spin density wave phase is found to be stable at least up to 16 T, with an estimated critical field \mu_0 H_c of ~30 T. We have also studied the low temperature magnetoresistance in the [100], [110], and [001] directions. The transverse magnetoresistance is well described by a power law for magnetic fields above 1 T with no saturation observed at high fields. We discuss our results in terms of the magnetic structure and the calculated electronic bandstructure of FeGe_2. We have also observed, for the first time in this compound, Shubnikov-de Haas oscillations in the transverse magnetoresistance with a frequency of 190 ± 10 T for a magnetic field along [001]. Comment: 13 pages, RevTeX, 7 postscript figures, to appear in Journal of Physics: Condensed Matte