31 research outputs found

    Conditional heavy hitters : detecting interesting correlations in data streams

    The notion of heavy hitters—items that make up a large fraction of the population—has been successfully used in a variety of applications across sensor and RFID monitoring, network data analysis, event mining, and more. Yet this notion often fails to capture the semantics we desire when we observe data in the form of correlated pairs. Here, we are interested in items that are conditionally frequent: when a particular item is frequent within the context of its parent item. In this work, we introduce and formalize the notion of conditional heavy hitters to identify such items, with applications in network monitoring and Markov chain modeling. We explore the relationship between conditional heavy hitters and other related notions in the literature, and show analytically and experimentally the usefulness of our approach. We introduce several algorithm variations that allow us to efficiently find conditional heavy hitters for input data with very different characteristics, and provide analytical results for their performance. Finally, we perform experimental evaluations with several synthetic and real datasets to demonstrate the efficacy of our methods and to study the behavior of the proposed algorithms for different types of data
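    As a rough illustration of the notion described in this abstract, the sketch below counts (parent, child) pairs exactly and reports the children whose frequency within their parent's context reaches a threshold. The function name, the threshold `phi`, and the `min_parent_count` filter are assumptions made for this example. It uses exact counting over the whole input and is not one of the streaming algorithms the paper proposes.

```python
from collections import Counter


def conditional_heavy_hitters(pairs, phi, min_parent_count=1):
    """Return (parent, child, ratio) triples with count(parent, child) / count(parent) >= phi.

    Illustrative assumptions: `phi` is a conditional-frequency threshold and
    `min_parent_count` discards parents seen too rarely to be meaningful.
    """
    parent_counts = Counter()
    pair_counts = Counter()
    for parent, child in pairs:
        parent_counts[parent] += 1
        pair_counts[(parent, child)] += 1

    hitters = []
    for (parent, child), count in pair_counts.items():
        total = parent_counts[parent]
        ratio = count / total
        if total >= min_parent_count and ratio >= phi:
            hitters.append((parent, child, ratio))
    # Highest conditional frequency first; ties broken by parent, then child.
    return sorted(hitters, key=lambda h: (-h[2], h[0], h[1]))


# Hypothetical stream of (parent, child) pairs, e.g. (source, destination).
stream = [("a", "x")] * 3 + [("a", "y")] + [("b", "z")] * 2
hitters = conditional_heavy_hitters(stream, phi=0.7)
```

    Exact counting needs memory proportional to the number of distinct pairs, which is the cost that approximate streaming summaries are meant to avoid.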

    Evaluating pre-trained Sentence-BERT with class embeddings in active learning for multi-label text classification

    The Transformer Language Model is a powerful tool that has been shown to excel at various NLP tasks and has become the de facto standard solution thanks to its versatility. In this study, we employ pre-trained document embeddings in an Active Learning task to group samples with the same labels in the embedding space on a legal document corpus. We find that the calculated class embeddings are not close to the respective samples and consequently do not partition the embedding space in a meaningful way. In addition, we explore using the class embeddings as an Active Learning strategy, which yields dramatically worse results than all baselines.

    Characterizing Home Device Usage From Wireless Traffic Time Series

    The analysis of temporal behavioral patterns of home network users can reveal important information to Internet Service Providers (ISPs) and help them optimize their networks and offer new services (e.g., remote software upgrades, troubleshooting, energy savings). This study uses time series analysis of continuous traffic data from wireless home networks to extract traffic patterns recurring within or across homes, and to assess the impact of different device types (fixed or portable) on home traffic. Traditional techniques for time series analysis are not suited to this task, due to the limited stationarity and evolving distribution properties of wireless home traffic data. We propose a novel framework that relies on a correlation-based similarity measure of time series, as well as a notion of strong stationarity, to define motifs and dominant devices. Using this framework, we analyze the wireless traffic collected from 196 home gateways over two months. The proposed approach goes beyond existing application-specific analysis techniques, such as analysis of wireless traffic, which mainly rely on data aggregated across hundreds or thousands of users. Our framework enables the extraction of recurring patterns from traffic time series of individual homes, leading to a much more fine-grained analysis of the behavior patterns of the users. We also determine the best time aggregation policy with respect to the number and statistical importance of the extracted motifs, as well as the device types dominating these motifs and the overall gateway traffic. Our results show that ISPs can go beyond the simple observation of aggregated gateway traffic and better understand their networks.
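    The abstract does not say which correlation-based similarity measure the framework uses. As a minimal sketch, assuming plain Pearson correlation between two equally sampled traffic series, the comparison could look like the following. The function name and the sample values are hypothetical.

```python
import math


def pearson_similarity(x, y):
    """Pearson correlation of two equal-length series, in [-1, 1].

    Assumption for this sketch: a constant series carries no shape
    information, so its similarity to anything is reported as 0.0.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-interval traffic volumes for two days at one gateway.
day_one = [1.0, 2.0, 3.0, 4.0]
day_two = [2.0, 4.0, 6.0, 8.0]
similarity = pearson_similarity(day_one, day_two)
```

    Because Pearson correlation ignores scale and offset, two days with the same traffic shape but different volumes score as highly similar, which suits pattern discovery more than volume comparison.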

    Analysis of User Activity in Wireless Local Area Network of Petrozavodsk State University

    Wireless networks are widely used for Internet access in public places, shopping centers and educational institutions. The study of users' activity in these places can help solve problems of marketing, security, location, and monitoring of public transport. The article describes an experiment to measure the activity of users in the wireless networks of Petrozavodsk State University, where a strict speed limit for all users slows some educational and administrative processes. The authors developed a method to determine users' activity without access to the service equipment or personal data, as well as a software package to implement this method. They then collected and analyzed data on the hours of highest and lowest activity, the effect of the class schedule, and the dynamics of users' returns to the network. The information obtained was used to determine the permissible limits on Internet access speed for network users. A number of equations take into account the typical behavior of wireless network users, their activity, and the capacity of external network channels. The computed speed limits significantly exceed the limits currently set in the network of Petrozavodsk University during hours of average and minimum network load. The results obtained can be used in corporate networks to dynamically calculate the permissible speed limit and to improve the quality of user service.

    Construction of a nomogram predicting the pathological local extent of bladder cancer from clinical data

    Objective: to develop a nomogram, based on clinical variables, that predicts the pathological local extent of bladder cancer pT3-pT4 (pT3+). Material and methods: We used data from 511 patients with bladder cancer who underwent radical cystectomy between 1999 and 2008 at the N.N. Alexandrov National Cancer Centre. Univariate and multivariate logistic regression analyses were used to predict pT3+ from preoperative data. Coefficients from the logistic regression equation were used to construct the nomogram. Nomogram accuracy was evaluated with the concordance index (c-index) and a calibration plot. Internal validation was performed by the bootstrap method with 200 resampled datasets. Results: The nomogram we developed includes clinical stage cT, tumor grade, tumor macroscopic appearance, presence of upper tract dilatation, involvement of the prostatic urethra and/or prostatic lobe(s), involvement of 3 or more bladder walls, ESR, and creatinine level. The bootstrap-corrected prognostic accuracy of the nomogram was 81.4%, which is 12.6% higher than the accuracy of clinical stage alone. Conclusion: The nomogram can significantly improve the accuracy of pathologic tumor stage prediction and may be used to select patients for neoadjuvant chemotherapy.
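    For a binary outcome such as pT3+ versus not, the concordance index equals the fraction of (positive, negative) patient pairs in which the positive case receives the higher predicted probability, with ties counted as half. The sketch below computes it directly from that definition. The function name and the sample values are hypothetical and are not taken from the study.

```python
def concordance_index(y_true, y_score):
    """c-index for a binary outcome (equivalent to the area under the ROC curve).

    y_true holds 0/1 labels; y_score holds predicted probabilities or any
    monotone risk score. Tied scores contribute 0.5 to the concordant count.
    """
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities for four patients.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(labels, scores)
```

    A value of 0.5 corresponds to random ranking and 1.0 to perfect discrimination. A bootstrap-corrected estimate would repeat this computation on resampled datasets, which this sketch does not do.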

    Reviewing peer review: a quantitative analysis of peer review

    In this paper we focus on the analysis of peer reviews and reviewers' behavior in a number of different review processes. More specifically, we report on the development, definition and rationale of a theoretical model for peer review processes to support the identification of appropriate metrics to assess the processes' main properties. We then apply the proposed model and analysis framework to data sets from conference evaluation processes, and we discuss the implications of the results and their eventual use toward improving the analyzed peer review processes. A number of unexpected results were found, in particular: (1) the low correlation between peer review outcome and the impact over time of the accepted contributions, and (2) the presence of a high level of randomness in the analyzed peer review processes.