Gossip Learning with Linear Models on Fully Distributed Data
Machine learning over fully distributed data poses an important problem in
peer-to-peer (P2P) applications. In this model we have one data record at each
network node, but without the possibility to move raw data due to privacy
considerations. For example, user profiles, ratings, history, or sensor
readings can represent this case. This problem is difficult because local
models cannot be learned from a single record, the system model offers almost
no reliability guarantees, and the communication cost must still be kept low.
Here we propose gossip learning, a generic approach that is based on multiple
models taking random walks over the network in parallel, while applying an
online learning algorithm to improve themselves, and getting combined via
ensemble learning methods. We present an instantiation of this approach for the
case of classification with linear models. Our main contribution is an ensemble
learning method which, through the continuous combination of the models in the
network, implements a virtual weighted voting mechanism over an exponential
number of models at practically no extra cost as compared to independent random
walks. We prove the convergence of the method theoretically, and perform
extensive experiments on benchmark datasets. Our experimental analysis
demonstrates the performance and robustness of the proposed approach.
Comment: The paper was published in the journal Concurrency and Computation:
Practice and Experience
http://onlinelibrary.wiley.com/journal/10.1002/%28ISSN%291532-0634 (DOI:
http://dx.doi.org/10.1002/cpe.2858). The modifications are based on the
suggestions from the reviewer.
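The loop described in this abstract (models walking between nodes, being updated online on each node's single record, and being combined along the way) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Pegasos-style update, the plain averaging used as the merge step, and the names `pegasos_update`, `merge`, `gossip_learning` and `predict` are all assumptions made for the example.

```python
import random


def pegasos_update(w, age, x, y, lam=0.01):
    """One Pegasos-style SGD step for a linear SVM on one record, y in {-1, +1}.

    Assumption: the abstract only says "an online learning algorithm";
    Pegasos is used here as one possible choice.
    """
    age += 1
    eta = 1.0 / (lam * age)  # decaying learning rate
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    w = [(1.0 - eta * lam) * wi for wi in w]  # regularisation shrink
    if margin < 1.0:  # hinge loss is active: move towards y * x
        w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w, age


def merge(model_a, model_b):
    """Combine two (weights, age) models by averaging (an assumed merge rule)."""
    (wa, ta), (wb, tb) = model_a, model_b
    return [(a + b) / 2.0 for a, b in zip(wa, wb)], max(ta, tb)


def predict(w, x):
    """Classify x with the linear model w."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


def gossip_learning(records, rounds=200, lam=0.01, seed=0):
    """Simulate gossip learning where node i holds exactly records[i] = (x, y).

    Raw records never leave their node; only models are sent.
    """
    rng = random.Random(seed)
    n, dim = len(records), len(records[0][0])
    current = [([0.0] * dim, 0) for _ in range(n)]        # model each node sends next
    last_received = [([0.0] * dim, 0) for _ in range(n)]  # previous incoming model
    for _ in range(rounds):
        for sender in range(n):
            receiver = rng.randrange(n)  # uniform random peer (hypothetical sampling)
            incoming = current[sender]
            merged = merge(incoming, last_received[receiver])
            x, y = records[receiver]
            current[receiver] = pegasos_update(*merged, x, y, lam)
            last_received[receiver] = incoming
    return current
```

Calling `gossip_learning(records)` returns one `(weights, age)` pair per node, and any node can then classify locally with `predict(weights, x)`. Merging each incoming model with the previously received one is what lets a single walking model carry information from many earlier walks without extra messages.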
SVM-based syntactic parsing of Hungarian sentences (Magyar mondatok SVM alapú szintaxiselemzése)
Syntactic parsing is one of the important analyses in language technology applications. We present a machine-learning-based syntactic parser that uses an SVM-based approach. Beyond presenting the theoretical and implementation details of the algorithms used, we demonstrate the applicability of the method through comprehensive testing. A further point of interest is that the method follows the structured output learning paradigm.
Sentinel lymph node biopsy following previous axillary surgery in recurrent breast cancer.
Ipsilateral breast recurrence or a second primary breast cancer can develop in patients who have undergone breast-conserving surgery (BCS) and axillary surgery. The purpose of this study was to examine the feasibility of a reoperative sentinel lymph node biopsy (SLNB) as a repeated axillary staging procedure. From August 2014 through January 2017, patients with locally recurrent breast cancer, or with a BRCA mutation requiring risk-reduction mastectomy as a second surgical procedure, underwent repeat SLNB in three Hungarian Breast Units with a radiocolloid (and blue dye) technique. One hundred and sixty repeat SLNBs were analysed, 80 after previous SLNB and 80 after previous total or partial axillary lymph node dissection (ALND). SLN identification was successful in 106 patients (66%): 62/80 (77.5%) and 44/80 (55%) in the SLNB and ALND groups, respectively (p < 0.003). Extra-axillary lymph drainage was more frequent in the ALND group (19/44, 43.2% versus 7/62, 11.3%; p < 0.001). Lymphatic drainage to the contralateral axilla was observed in 14 patients (11 in the ALND group, p = 0.025); isolated parasternal drainage was detected in 4 patients (p = 0.31). Only 9/106 patients with successful repeat SLNB (8.8%, all with 1 SLN removed) had SLN metastases. CONCLUSIONS: Repeat SLNB is feasible in patients with ipsilateral breast tumor recurrence or a new ipsilateral primary tumor after previous BCS and axillary staging. Repeat SLNB should replace routine ALND as the standard axillary restaging procedure in recurrent disease with a clinically negative axilla. Preoperative lymphoscintigraphy is important to explore extra-axillary lymphatic drainage in this restaging setting.
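The identification rates can be checked with a short calculation. The sketch below uses 62/80 for the SLNB group, the count implied by the stated 77.5%, the 106 successes overall and the 7/62 denominator, together with 44/80 for the ALND group. The abstract does not say which statistical test was used; a pooled two-proportion z-test is assumed here, and the function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Pooled two-proportion z-test; returns both rates, z, and the two-sided p-value."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return p_a, p_b, z, p_value


# SLNB group: 62 of 80 identified; ALND group: 44 of 80 identified.
rate_slnb, rate_alnd, z, p_value = two_proportion_z_test(62, 80, 44, 80)
print(f"SLNB {rate_slnb:.1%}, ALND {rate_alnd:.1%}, z = {z:.2f}, p = {p_value:.4f}")
```

Under these assumptions the two-sided p-value falls just below 0.003, which is consistent with the reported p < 0.003.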
Massively distributed concept drift handling in large networks
Massively distributed data mining in large networks such as smart device platforms and peer-to-peer systems is a rapidly developing research area. One important problem here is concept drift, where global data patterns (movement, preferences, activities, etc.) change according to the actual set of participating users, the weather, the time of day, or as a result of events such as accidents or even natural catastrophes. In an important case, when the network is very large but only a few training samples can be obtained at each node locally, no distributed solution is known that can follow concept drift efficiently. This case is characteristic of smart device platforms where each device stores only one local observation or data record related to a learning problem. Here we present two algorithms to handle concept drift. Neither algorithm collects data at a central location; instead, models of the data perform random walks in the network while being improved using an online learning algorithm. The first algorithm achieves adaptivity by maintaining young as well as old models in the network according to a fixed age distribution. The second one measures the performance of models locally and discards them if they are judged outdated. We demonstrate through a thorough experimental analysis that our algorithms outperform the known competing methods if the number of independent local samples is limited relative to the speed of drift: a typical scenario in our targeted application domains. The two algorithms have different strengths: while the age distribution approach is very simple and efficient, explicit drift detection can be useful in monitoring applications to trigger control action.
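The two adaptation strategies can be illustrated with a small sketch of what happens when a walking model visits a node. Everything concrete here is an assumption made for the example rather than a detail from the paper: the perceptron update, the uniform lifespan distribution, the 20-sample error window, the 0.5 error threshold, and all names (`Model`, `fresh_model`, `visit_with_age_restart`, `visit_with_drift_detection`).

```python
import random
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Model:
    w: list                # linear model weights
    age: int = 0           # number of local samples seen so far
    lifespan: int = 0      # used only by the age-based strategy
    errors: deque = field(default_factory=lambda: deque(maxlen=20))  # recent 0/1 errors


def fresh_model(dim, rng, max_lifespan=100):
    """Start a new model; its lifespan is drawn uniformly (an assumed distribution)."""
    return Model(w=[0.0] * dim, lifespan=rng.randint(1, max_lifespan))


def predict(model, x):
    return 1 if sum(wi * xi for wi, xi in zip(model.w, x)) >= 0 else -1


def perceptron_step(model, x, y):
    """Online update on the node's local sample (perceptron chosen for brevity)."""
    if predict(model, x) != y:
        model.w = [wi + y * xi for wi, xi in zip(model.w, x)]
    model.age += 1


def visit_with_age_restart(model, x, y, rng, max_lifespan=100):
    """Strategy 1: restart a model once it reaches its lifespan, so that
    young and old models coexist in the network."""
    if model.age >= model.lifespan:
        model = fresh_model(len(x), rng, max_lifespan)
    perceptron_step(model, x, y)
    return model


def visit_with_drift_detection(model, x, y, rng, threshold=0.5):
    """Strategy 2: score the model on the local sample before training on it,
    and discard it when its recent error rate is too high."""
    model.errors.append(predict(model, x) != y)
    window_full = len(model.errors) == model.errors.maxlen
    if window_full and sum(model.errors) / len(model.errors) > threshold:
        model = fresh_model(len(x), rng)
    perceptron_step(model, x, y)
    return model
```

In both functions the caller forwards the returned model to the next randomly chosen node. The age-based variant needs no performance bookkeeping, while the detection variant exposes an explicit "model discarded" event that a monitoring application could act on.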
Gossip-based learning under drifting concepts in fully distributed networks
In fully distributed networks, data mining is an important tool for monitoring, control, and for offering personalized services to users. The underlying data model can change as a function of time according to periodic (daily, weekly) patterns, sudden changes, or long-term transformations of the environment or the system itself. For a large space of the possible models for this dynamism, when the network is very large but only a few training samples can be obtained at each node locally, no efficient fully distributed solution is known. Here we present an approach that is able to follow concept drift in very large scale and fully distributed networks. The algorithm does not collect data at a central location; instead, it is based on online learners taking random walks in the network. To achieve adaptivity, the diversity of the learners is controlled by managing the lifespans of the models. We demonstrate through a thorough experimental analysis that, in a well-specified range of feasible models of concept drift where there is little data available locally in a large network, our algorithm outperforms known methods from related work. Keywords: adaptive classification; concept drift; gossip learning; P2P