NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization
We study the problem of large-scale network embedding, which aims to learn
latent representations for network mining applications. Previous research shows
that 1) popular network embedding benchmarks, such as DeepWalk, are in essence
implicitly factorizing a matrix with a closed form, and 2) the explicit
factorization of such a matrix generates more powerful embeddings than existing
methods. However, directly constructing and factorizing this matrix---which is
dense---is prohibitively expensive in terms of both time and space, making it
not scalable for large networks.
In this work, we present the algorithm of large-scale network embedding as
sparse matrix factorization (NetSMF). NetSMF leverages theories from spectral
sparsification to efficiently sparsify the aforementioned dense matrix,
enabling significantly improved efficiency in embedding learning. The
sparsified matrix is spectrally close to the original dense one with a
theoretically bounded approximation error, which helps maintain the
representation power of the learned embeddings. We conduct experiments on
networks of various scales and types. Results show that among both popular
benchmarks and factorization based methods, NetSMF is the only method that
achieves both high efficiency and effectiveness. We show that NetSMF requires
only 24 hours to generate effective embeddings for a large-scale academic
collaboration network with tens of millions of nodes, while it would cost
DeepWalk months and is computationally infeasible for the dense matrix
factorization solution. The source code of NetSMF is publicly available
(https://github.com/xptree/NetSMF). Comment: 11 pages, in Proceedings of the Web Conference 2019 (WWW 19)
Potential of Core-Collapse Supernova Neutrino Detection at JUNO
JUNO is an underground neutrino observatory under construction in Jiangmen, China. It uses 20 kton of liquid scintillator as its target, which enables it to detect supernova burst neutrinos with large statistics from the next galactic core-collapse supernova (CCSN), as well as pre-supernova neutrinos from nearby CCSN progenitors. All flavors of supernova burst neutrinos can be detected by JUNO via several interaction channels, including inverse beta decay, elastic scattering on electrons and protons, and interactions on 12C nuclei. This gives JUNO the possibility of reconstructing the energy spectra of supernova burst neutrinos of all flavors. Real-time monitoring systems based on FPGA and DAQ are under development in JUNO; they allow prompt alerts and trigger-less data acquisition of CCSN events. The alert performance of both monitoring systems has been thoroughly studied using simulations. Moreover, once a CCSN is tagged, the system can provide fast characterizations, such as directionality and light curve.
Detection of the Diffuse Supernova Neutrino Background with JUNO
As an underground multi-purpose neutrino detector with 20 kton of liquid scintillator, the Jiangmen Underground Neutrino Observatory (JUNO) is competitive with and complementary to the water-Cherenkov detectors in the search for the diffuse supernova neutrino background (DSNB). Typical supernova models predict 2-4 events per year within the optimal observation window in the JUNO detector. The dominant background is from the neutral-current (NC) interaction of atmospheric neutrinos with 12C nuclei, which surpasses the DSNB by more than one order of magnitude. We evaluated the systematic uncertainty of the NC background from the spread of a variety of data-driven models and further developed a method to determine the NC background within 15% using in situ measurements after ten years of running. In addition, the NC-like backgrounds can be effectively suppressed by the intrinsic pulse-shape discrimination (PSD) capabilities of liquid scintillators. In this talk, I will present in detail the improvements in NC background uncertainty evaluation, PSD discriminator development, and finally the potential DSNB sensitivity of JUNO.
Real-time Monitoring for the Next Core-Collapse Supernova in JUNO
Core-collapse supernova (CCSN) is one of the most energetic astrophysical
events in the Universe. The early and prompt detection of neutrinos before
(pre-SN) and during the SN burst is a unique opportunity to realize the
multi-messenger observation of the CCSN events. In this work, we describe the
monitoring concept and present the sensitivity of the system to the pre-SN and
SN neutrinos at the Jiangmen Underground Neutrino Observatory (JUNO), which is
a 20 kton liquid scintillator detector under construction in South China. The
real-time monitoring system is designed with both the prompt monitors on the
electronic board and online monitors at the data acquisition stage, in order to
ensure both the alert speed and alert coverage of progenitor stars. By assuming
a false alert rate of 1 per year, this monitoring system can be sensitive to
the pre-SN neutrinos up to the distance of about 1.6 (0.9) kpc and SN neutrinos
up to about 370 (360) kpc for a progenitor mass of 30 solar masses for the case
of normal (inverted) mass ordering. The pointing ability of the CCSN is
evaluated by using the accumulated event anisotropy of the inverse beta decay
interactions from pre-SN or SN neutrinos, which, along with the early alert,
can play important roles for the follow-up multi-messenger observations of the
next Galactic or nearby extragalactic CCSN. Comment: 24 pages, 9 figures
Robust estimation of bacterial cell count from optical density
Optical density (OD) is widely used to estimate the density of cells in liquid culture, but cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres. This protocol produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also assesses the instrument's effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements. In our study, fluorescence per cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data.
Working Set Selection Using Second Order Information for Training Support Vector Machines
Working set selection is an important step in decomposition methods for training support
vector machines (SVMs). This paper develops a new technique for working set selection in
SMO-type decomposition methods. It uses second-order information to achieve fast
convergence. Theoretical properties such as linear convergence are established. Experiments
demonstrate that the proposed method is faster than existing selection methods using
first-order information.
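The selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `select_working_set`, its arguments, and the safeguard `tau` are hypothetical, and the rule shown (take the maximal violating index first, then choose the partner by a second-order estimate of the objective decrease) is an assumed form of second-order selection for the standard SVM dual with kernel matrix `K`, labels `y`, variables `alpha`, dual gradient `grad`, and box bound `C`.

```python
def select_working_set(K, y, alpha, grad, C, tau=1e-12):
    """Return a pair (i, j) to optimize next, or None if no violating pair exists.

    Assumed setup: K is the kernel matrix (list of lists), y holds labels in
    {+1, -1}, alpha holds the current dual variables, grad is the gradient of
    the dual objective at alpha, and C is the upper bound on each alpha.
    """
    n = len(y)

    # Step 1: i is the index that most violates optimality among the
    # variables that can still move "up".
    i, g_max = -1, float("-inf")
    for t in range(n):
        in_up = (y[t] == 1 and alpha[t] < C) or (y[t] == -1 and alpha[t] > 0)
        if in_up and -y[t] * grad[t] > g_max:
            i, g_max = t, -y[t] * grad[t]
    if i == -1:
        return None

    # Step 2: among the variables that can move "down", pick j minimizing
    # -b^2 / a, a second-order estimate of the objective decrease.
    j, best = -1, float("inf")
    for t in range(n):
        in_low = (y[t] == 1 and alpha[t] > 0) or (y[t] == -1 and alpha[t] < C)
        if not in_low:
            continue
        b = g_max + y[t] * grad[t]  # first-order violation of the pair (i, t)
        if b <= 0:
            continue  # (i, t) is not a violating pair
        a = K[i][i] + K[t][t] - 2 * K[i][t]  # curvature along the pair
        if a <= 0:
            a = tau  # guard against non-positive curvature
        score = -(b * b) / a
        if score < best:
            j, best = t, score
    if j == -1:
        return None
    return i, j
```

In this sketch, two candidates with the same gradient value are separated by the curvature term `a`, which a purely first-order rule would not take into account.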
Evaluation Criteria for Multi-label Classification
Multi-label classification has become increasingly popular in recent years, for example in text categorization and multimedia retrieval systems. Many evaluation criteria have been proposed to meet different application needs. A commonly used approach to multi-label classification is the binary method, which constructs a decision function per label. For some applications, adjusting the thresholds of the decision functions improves performance. This thesis gives a comprehensive study of threshold selection. Experiments on several real-world data sets demonstrate the usefulness of some simple selection strategies.
Oral Examination Committee Certification
Chinese Abstract
ABSTRACT
LIST OF TABLES
CHAPTER
I. Introduction
II. Binary Method and Evaluation Measures
2.1 The Binary Method
2.2 Evaluation Criteria
2.2.1 Exact Match Ratio
2.2.2 Macro-average and Micro-average F-measure
2.2.3 Ranking Based Measures
2.3 Issues on Optimizing Different Measures
III. Optimize Measures via Supervised Threshold Setting
3.1 Supervised Threshold Setting in Binary Method
3.1.1 The SVM.1-type Methods
3.2 Real-World Data Sets
3.2.1 Yahoo!
3.2.2 scene
3.2.3 yeast
3.2.4 OHSUMED
3.2.5 RCV1-V2
3.3 Experiments
3.3.1 Experimental Settings
3.3.2 Optimizing Macro-average F-measure
3.3.3 Optimizing Micro-average F-measure
3.3.4 Optimizing Exact Match Ratio
3.3.5 Discussion and Conclusion
IV. Conclusions
BIBLIOGRAPHY
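As an illustration of the threshold adjustment this thesis abstract refers to, the sketch below tunes the threshold of a single label's decision function to maximize the F-measure on held-out data. It is a minimal sketch under assumptions, not the thesis's procedure: the function names `f1_score` and `best_threshold`, the candidate grid (midpoints between sorted decision values plus two extremes), and the example data are all hypothetical.

```python
def f1_score(y_true, y_pred):
    """F-measure for one label; inputs are lists of 0/1 values."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(decision_values, y_true):
    """Pick the threshold on one label's decision values that maximizes F1.

    Candidates are the midpoints between consecutive distinct decision
    values, plus one value below the minimum and one above the maximum.
    An instance is predicted positive when its decision value exceeds
    the threshold.
    """
    vals = sorted(set(decision_values))
    candidates = [vals[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(vals, vals[1:])]
    candidates.append(vals[-1] + 1.0)

    best_t, best_f = candidates[0], -1.0
    for t in candidates:
        pred = [1 if d > t else 0 for d in decision_values]
        f = f1_score(y_true, pred)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


# Hypothetical validation data for one label.
decision_values = [-1.5, -0.6, -0.2, 0.4, 1.1]
y_true = [0, 1, 1, 0, 1]
threshold, f1 = best_threshold(decision_values, y_true)
```

With the default threshold of 0, this example data gives an F-measure of 0.4, and the tuned threshold raises it. Repeating the search independently for each label targets the macro-average F-measure, since that measure is the mean of the per-label values.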