19 research outputs found

    Gossip Learning with Linear Models on Fully Distributed Data

    Machine learning over fully distributed data poses an important problem in peer-to-peer (P2P) applications. In this model we have one data record at each network node, but without the possibility to move raw data due to privacy considerations. For example, user profiles, ratings, history, or sensor readings can represent this case. This problem is difficult because there is no possibility to learn local models and the system model offers almost no guarantees for reliability, yet the communication cost needs to be kept low. Here we propose gossip learning, a generic approach that is based on multiple models taking random walks over the network in parallel, while applying an online learning algorithm to improve themselves, and getting combined via ensemble learning methods. We present an instantiation of this approach for the case of classification with linear models. Our main contribution is an ensemble learning method which, through the continuous combination of the models in the network, implements a virtual weighted voting mechanism over an exponential number of models at practically no extra cost as compared to independent random walks. We prove the convergence of the method theoretically, and perform extensive experiments on benchmark datasets. Our experimental analysis demonstrates the performance and robustness of the proposed approach. Comment: The paper was published in the journal Concurrency and Computation: Practice and Experience http://onlinelibrary.wiley.com/journal/10.1002/%28ISSN%291532-0634 (DOI: http://dx.doi.org/10.1002/cpe.2858). The modifications are based on the suggestions from the reviewer.
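
The random-walk, online-update, and merge steps described in this abstract can be sketched as follows. This is a minimal single-process simulation, not the authors' implementation: the Pegasos-style hinge-loss update, the plain averaging merge, uniform random peer selection, and all function names (`sgd_update`, `merge`, `gossip_learning`) are assumptions made for illustration.

```python
import random

def sgd_update(w, age, x, y, lam=0.01):
    """One Pegasos-style hinge-loss step on a single record (x, y), y in {-1, +1}.
    The choice of this online learner is an assumption of the sketch."""
    age += 1
    eta = 1.0 / (lam * age)                      # decreasing step size
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    w = [(1.0 - eta * lam) * wi for wi in w]     # regularisation shrink
    if margin < 1.0:                             # margin violated: move towards y * x
        w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w, age

def merge(model_a, model_b):
    """Combine two linear models by averaging weights; keep the larger age."""
    (wa, ta), (wb, tb) = model_a, model_b
    return [(a + b) / 2.0 for a, b in zip(wa, wb)], max(ta, tb)

def gossip_learning(records, rounds=200, seed=0):
    """Simulate models walking over nodes that each hold exactly one record.
    Raw records never leave their node; only (weights, age) pairs are sent."""
    rng = random.Random(seed)
    n, dim = len(records), len(records[0][0])
    current = [([0.0] * dim, 0) for _ in range(n)]        # model held at each node
    last_received = [([0.0] * dim, 0) for _ in range(n)]  # previous incoming model
    for _ in range(rounds):
        for sender in range(n):
            receiver = rng.randrange(n)          # assumed uniform peer sampling
            incoming = current[sender]
            merged = merge(incoming, last_received[receiver])
            x, y = records[receiver]
            current[receiver] = sgd_update(*merged, x, y)  # train on the local record
            last_received[receiver] = incoming
    return current
```

Calling `gossip_learning(records)` with `records` given as a list of `(features, label)` pairs returns one `(weights, age)` model per node. Because every received model is averaged with the previous one before the local update, each model implicitly blends many random walks, which is the intuition behind the "virtual voting" the abstract mentions.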

    SVM-based syntactic parsing of Hungarian sentences (Magyar mondatok SVM alapú szintaxiselemzése)

    Syntactic parsing is one of the important analysis steps in language technology applications. We present a machine-learning-based syntactic parser that uses an SVM-based approach. Beyond presenting the theoretical and implementation details of the algorithms used, we demonstrate the applicability of the method through comprehensive testing. A further point of interest is that the method follows the structured output learning paradigm.

    Sentinel lymph node biopsy following previous axillary surgery in recurrent breast cancer.

    Ipsilateral breast recurrence or second primary breast cancer can develop in patients who have undergone breast conserving surgery (BCS) and axillary surgery. The purpose of this study was to examine the feasibility of a reoperative sentinel lymph node biopsy (SLNB) as a repeated axillary staging procedure. From August 2014 through January 2017, patients with locally recurrent breast cancer or with a BRCA mutation requiring risk reduction mastectomy as a second surgical procedure underwent repeat SLNB in three Hungarian Breast Units with a radiocolloid (and blue dye) technique. One hundred and sixty repeat SLNBs were analysed, 80 after previous SLNB and 80 after previous total or partial axillary lymph node dissection (ALND). SLN identification was successful in 106 patients (66%): 62/80 (77.5%) and 44/80 (55%) in the SLNB and ALND groups, respectively (p < 0.003). Extra-axillary lymph drainage was more frequent in the ALND group (19/44, 43.2% versus 7/62, 11.3%; p < 0.001). Lymphatic drainage to the contralateral axilla was observed in 14 patients (11 in the ALND group, p = 0.025), and isolated parasternal drainage was detected in 4 patients (p = 0.31). Only 9/106 patients with successful repeat SLNB (8.8%, all with 1 SLN removed) had SLN metastases. Conclusions: Repeat SLNB is feasible in patients with ipsilateral breast tumor recurrence or new ipsilateral primary tumor after previous BCS and axillary staging. Repeat SLNB should replace routine ALND as the standard axillary restaging procedure in recurrent disease with a clinically negative axilla. Preoperative lymphoscintigraphy is important to explore extra-axillary lymphatic drainage in this restaging setting.

    Massively distributed concept drift handling in large networks

    Massively distributed data mining in large networks such as smart device platforms and peer-to-peer systems is a rapidly developing research area. One important problem here is concept drift, where global data patterns (movement, preferences, activities, etc.) change according to the actual set of participating users, the weather, the time of day, or as a result of events such as accidents or even natural catastrophes. In an important case, when the network is very large but only a few training samples can be obtained at each node locally, no distributed solution is known that could follow concept drift efficiently. This case is characteristic of smart device platforms where each device stores only one local observation or data record related to a learning problem. Here we present two algorithms to handle concept drift. Neither algorithm collects data at a central location; instead, models of the data perform random walks in the network while being improved using an online learning algorithm. The first algorithm achieves adaptivity by maintaining young as well as old models in the network according to a fixed age distribution. The second one measures the performance of models locally, and discards them if they are judged outdated. We demonstrate through a thorough experimental analysis that our algorithms outperform the known competing methods if the number of independent local samples is limited relative to the speed of drift: a typical scenario in our targeted application domains. The two algorithms have different strengths: while the age distribution approach is very simple and efficient, explicit drift detection can be useful in monitoring applications to trigger control action.
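
The second algorithm in this abstract (measure performance locally, discard outdated models) can be illustrated with a small sketch. It is not the authors' method: the sliding error window, the fixed threshold, the perceptron update, and the `MonitoredModel` class with its parameters are all hypothetical choices made for illustration.

```python
from collections import deque

class MonitoredModel:
    """A linear model that tracks its own recent mistakes as it walks the network."""

    def __init__(self, dim, window=20, max_error=0.5):
        self.w = [0.0] * dim
        self.history = deque(maxlen=window)  # 1 = mistake, 0 = correct prediction
        self.max_error = max_error           # assumed fixed discard threshold

    def predict(self, x):
        score = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if score >= 0 else -1

    def error_rate(self):
        return sum(self.history) / len(self.history) if self.history else 0.0

    def is_outdated(self):
        # Judge the model only once the window is full, to avoid early resets.
        full = len(self.history) == self.history.maxlen
        return full and self.error_rate() > self.max_error

    def visit(self, x, y, lr=0.1):
        """Test on the local record first, then train. Returns True if the
        model was judged outdated and restarted from scratch."""
        mistake = self.predict(x) != y
        self.history.append(1 if mistake else 0)
        if self.is_outdated():
            self.w = [0.0] * len(self.w)     # discard: restart with a fresh model
            self.history.clear()
            return True
        if mistake:                          # perceptron update (an assumption)
            self.w = [wi + lr * y * xi for wi, xi in zip(self.w, x)]
        return False
```

The boolean returned by `visit` is the kind of signal a monitoring application could use to trigger a control action, since a reset indicates that the model's recent local performance has degraded.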

    Gossip-based learning under drifting concepts in fully distributed networks

    In fully distributed networks, data mining is an important tool for monitoring, control, and for offering personalized services to users. The underlying data model can change as a function of time according to periodic (daily, weekly) patterns, sudden changes, or long term transformations of the environment or the system itself. For a large space of the possible models for this dynamism, when the network is very large but only a few training samples can be obtained at all nodes locally, no efficient fully distributed solution is known. Here we present an approach that is able to follow concept drift in very large scale and fully distributed networks. The algorithm does not collect data at a central location; instead, it is based on online learners taking random walks in the network. To achieve adaptivity, the diversity of the learners is controlled by managing the lifespans of the models. We demonstrate through a thorough experimental analysis that in a well specified range of feasible models of concept drift, where there is little data available locally in a large network, our algorithm outperforms known methods from related work. Keywords: adaptive classification; concept drift; gossip learning; P2P
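
One way to picture "managing the lifespans of the models" is to give each model a lifespan when it is created and restart it once its age reaches that limit, so that young and old models coexist in the network. The sketch below assumes a uniform lifespan distribution and a perceptron learner; neither is stated in the abstract, and the names `new_model` and `step` are hypothetical.

```python
import random

def new_model(dim, rng, max_lifespan):
    """Create a fresh model with a lifespan drawn uniformly from 1..max_lifespan
    (the uniform distribution is an assumption of this sketch)."""
    return {"w": [0.0] * dim, "age": 0, "lifespan": rng.randint(1, max_lifespan)}

def step(model, x, y, rng, max_lifespan, lr=0.1):
    """Process one local record (x, y), y in {-1, +1}, at the node being visited."""
    if model["age"] >= model["lifespan"]:
        # Lifespan exhausted: replace the old model with a young one.
        model = new_model(len(model["w"]), rng, max_lifespan)
    score = sum(wi * xi for wi, xi in zip(model["w"], x))
    if y * score <= 0:                       # perceptron update on a mistake
        model["w"] = [wi + lr * y * xi for wi, xi in zip(model["w"], x)]
    model["age"] += 1
    return model
```

Short-lived models forget old concepts quickly, while long-lived ones accumulate more training steps; mixing the two is what keeps the population of learners diverse.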