44 research outputs found

    A Pre-trained Data Deduplication Model based on Active Learning

    Full text link
    In the era of big data, data quality has become an increasingly prominent issue. One of the main challenges is duplicate data, which can arise from repeated entry or from merging multiple data sources. Such "dirty data" can significantly limit the effective application of big data. To address data deduplication, we propose a pre-trained deduplication model based on active learning, the first work to apply active learning to deduplication at the semantic level. The model is built on a pre-trained Transformer and fine-tuned to solve deduplication as a sequence-classification task. It is the first to integrate a Transformer with active learning in an end-to-end architecture, selecting the most valuable data for training the deduplication model, and the first to employ the R-Drop method for data augmentation on each round of labeled data, which reduces the cost of manual labeling and improves the model's performance. Experimental results demonstrate that our proposed model outperforms the previous state of the art (SOTA) in duplicate-data identification, achieving up to a 28% improvement in Recall on benchmark datasets.
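    The core of an active learning loop like the one described is the acquisition step that picks the most informative unlabeled examples. A minimal, generic sketch of uncertainty sampling for binary duplicate classification (illustrative only; not the authors' implementation, and `predict_proba` is a placeholder for any trained scorer):

```python
def select_most_uncertain(pair_ids, predict_proba, budget):
    """Pick the unlabeled record pairs whose predicted duplicate
    probability is closest to 0.5 (maximum model uncertainty).

    predict_proba: callable mapping a pair id to P(duplicate) in [0, 1].
    budget: number of pairs to send for manual labeling this round.
    """
    # Uncertainty of a binary prediction peaks at p = 0.5.
    scored = sorted(pair_ids, key=lambda p: abs(predict_proba(p) - 0.5))
    return scored[:budget]
```

    Each round, the selected pairs are labeled, augmented (here via R-Drop in the paper), and added to the training set before the model is fine-tuned again.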

    Low geriatric nutritional risk index as a poor prognostic biomarker for immune checkpoint inhibitor treatment in solid cancer

    Get PDF
    Objective: In this investigation, we focused on the geriatric nutritional risk index (GNRI), a comprehensive metric that takes into account the patient's ideal weight, actual weight, and serum albumin level to measure malnutrition. Our primary objective was to examine the predictive value of GNRI-defined malnutrition in determining the response to immunotherapy among cancer patients.
    Methods: Relevant articles were systematically searched in PubMed, the Cochrane Library, EMBASE, and Google Scholar up to July 2023. Our analysis evaluated overall survival (OS), progression-free survival (PFS), objective response rate (ORR), and disease control rate (DCR) as clinical outcomes.
    Results: This analysis comprised eleven articles encompassing 1,417 patients. The pooled results revealed that cancer patients with low GNRI levels exhibited shorter OS (HR: 2.64, 95% CI: 2.08–3.36, p < 0.001) and PFS (HR: 1.87, 95% CI: 1.46–2.41, p < 0.001), and lower ORR (OR: 0.46, 95% CI: 0.33–0.65, p < 0.001) and DCR (OR: 0.42, 95% CI: 0.29–0.61, p < 0.001). Sensitivity analyses confirmed that these results were stable, and Egger's and Begg's tests revealed no publication bias.
    Conclusion: Our results imply that the GNRI is a useful predictor of immunotherapy response in cancer patients.
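    The GNRI combines exactly the three inputs named above. The standard formula (Bouillanne et al., 2005) can be computed in a few lines; the weight-ratio cap at 1 is the usual convention, and the example values below are hypothetical:

```python
def gnri(albumin_g_per_l, weight_kg, ideal_weight_kg):
    """Geriatric Nutritional Risk Index (Bouillanne et al., 2005):
    GNRI = 1.489 * albumin (g/L) + 41.7 * (weight / ideal weight).
    The weight ratio is conventionally capped at 1 when actual
    weight exceeds ideal weight."""
    ratio = min(weight_kg / ideal_weight_kg, 1.0)
    return 1.489 * albumin_g_per_l + 41.7 * ratio
```

    Lower GNRI values indicate greater nutritional risk, which is the "low GNRI" group associated with worse outcomes in the pooled analysis.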

    CMOS + stochastic nanomagnets: heterogeneous computers for probabilistic inference and learning

    Full text link
    Extending Moore's law by augmenting complementary metal-oxide-semiconductor (CMOS) transistors with emerging nanotechnologies (X) has become increasingly important. Accelerating Monte Carlo algorithms that rely on random sampling with such CMOS+X technologies could have a significant impact on a large number of fields, from probabilistic machine learning and optimization to quantum simulation. In this paper, we combine stochastic magnetic tunnel junction (sMTJ)-based probabilistic bits (p-bits) with versatile field-programmable gate arrays (FPGAs) to design a CMOS + X (X = sMTJ) prototype. Our approach enables high-quality true randomness, which is essential for Monte Carlo-based probabilistic sampling and learning. Our heterogeneous computer successfully performs probabilistic inference and asynchronous Boltzmann learning, despite device-to-device variations in sMTJs. A comprehensive comparison using a CMOS predictive process design kit (PDK) reveals that compact sMTJ-based p-bits replace 10,000 transistors while dissipating two orders of magnitude less energy (2 fJ per random bit) than digital CMOS p-bits. Scaled and integrated versions of our CMOS + stochastic nanomagnet approach can significantly advance probabilistic computing and its applications in various domains by providing massively parallel and truly random numbers with extremely high throughput and energy efficiency.
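    A p-bit is commonly modeled as a binary stochastic neuron whose output is +1 with probability (1 + tanh(I))/2 for input I; networks of such units perform Gibbs-style Boltzmann sampling. A software sketch of this behavior (an idealized model, not the hardware described; coupling strength and sweep count are illustrative):

```python
import math
import random

def pbit(I):
    """A software p-bit: outputs +/-1 with P(+1) = (1 + tanh(I)) / 2,
    mimicking an ideal stochastic magnetic tunnel junction."""
    return 1 if random.uniform(-1.0, 1.0) < math.tanh(I) else -1

def sample_two_coupled_pbits(J, sweeps=5000):
    """Sequential (Gibbs-style) update of two p-bits coupled by J.
    For large positive J the pair aligns, giving a positive average
    correlation <m0 * m1>, as in a ferromagnetic Boltzmann machine."""
    m = [1, 1]
    corr = 0.0
    for _ in range(sweeps):
        for i in (0, 1):
            m[i] = pbit(J * m[1 - i])  # input = field from the other p-bit
        corr += m[0] * m[1]
    return corr / sweeps
```

    In the hardware version, each sMTJ supplies the randomness natively, so the `random.uniform` call (and its ~10,000-transistor digital equivalent) disappears.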

    Sciences for The 2.5-meter Wide Field Survey Telescope (WFST)

    Full text link
    The Wide Field Survey Telescope (WFST) is a dedicated photometric survey facility under construction jointly by the University of Science and Technology of China and Purple Mountain Observatory. It is equipped with a primary mirror of 2.5 m in diameter, an active optical system, and a mosaic CCD camera of 0.73 Gpix on the main focal plane to achieve high-quality imaging over a field of view of 6.5 square degrees. The installation of WFST at the Lenghu observing site is planned for the summer of 2023, and operation is scheduled to commence within three months afterward. WFST will scan the northern sky in four optical bands (u, g, r, and i) at cadences from hourly/daily to semi-weekly in the deep high-cadence survey (DHS) and the wide field survey (WFS) programs, respectively. The WFS reaches depths of 22.27, 23.32, 22.84, and 22.31 AB magnitudes in the four bands, respectively, in a nominal 30-second exposure during a photometric night, enabling us to search for a tremendous number of transients in the low-z universe and to systematically investigate the variability of Galactic and extragalactic objects. Intranight 90 s exposures as deep as 23 and 24 mag in the u and g bands via the DHS provide a unique opportunity to facilitate explorations of energetic transients that demand high sensitivity, including the electromagnetic counterparts of gravitational-wave events detected by the second/third-generation GW detectors, supernovae within a few hours of their explosions, tidal disruption events, and luminous fast optical transients even beyond a redshift of 1. Meanwhile, the final 6-year co-added images, anticipated to reach g about 25.5 mag in the WFS, or even deeper by 1.5 mag in the DHS, will be of significant value to general Galactic and extragalactic sciences. The highly uniform legacy surveys of WFST will also serve as an indispensable complement to those of LSST, which monitors the southern sky.
    Comment: 46 pages, submitted to SCMP
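    The relation between the single-epoch and co-added depths quoted above follows a standard rule of thumb: under the idealized background-limited assumption, stacking N equal exposures deepens the limiting magnitude by 2.5 log10(sqrt(N)) = 1.25 log10(N). A minimal sketch of this back-of-the-envelope calculation (the inputs are illustrative, not survey specifications):

```python
import math

def coadd_depth(single_epoch_mag, n_exposures):
    """Limiting magnitude of a stack of n equal exposures, assuming
    background-limited noise that averages down as 1/sqrt(n), so the
    depth improves by 1.25 * log10(n) magnitudes."""
    return single_epoch_mag + 1.25 * math.log10(n_exposures)
```

    For example, 100 stacked epochs gain 2.5 mag over a single exposure; real co-add depths also depend on seeing, sky brightness, and systematics, so this is only an upper bound.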

    Improved Collaborative Representation Classifier Based on l2-Regularized for Human Action Recognition

    No full text
    Human action recognition is an important and challenging task. Projecting depth images onto three depth motion maps (DMMs) and extracting deep convolutional neural network (DCNN) features yields discriminative descriptors that characterize the spatiotemporal information of a specific action from a sequence of depth images. In this paper, a unified improved collaborative representation framework is proposed in which the probability that a test sample belongs to the collaborative subspace of all classes can be well defined and calculated. The improved collaborative representation classifier (ICRC), based on l2 regularization, is presented for human action recognition to maximize the likelihood that a test sample belongs to each class; a theoretical investigation of the ICRC shows that it obtains the final classification by computing the likelihood for each class. Coupled with DMM and DCNN features, experiments on depth image-based action recognition, including the MSRAction3D and MSRGesture3D datasets, demonstrate that the proposed approach, using a distance-based representation classifier, achieves superior performance over state-of-the-art methods, including SRC, CRC, and SVM.
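    The baseline that ICRC improves on, the l2-regularized collaborative representation classifier (CRC), has a closed-form ridge solution: code the test sample over all training samples jointly, then assign the class with the smallest class-wise reconstruction residual. A minimal sketch of standard CRC (not the paper's improved variant; the tiny dictionary below is synthetic):

```python
import numpy as np

def crc_l2(X, labels, y, lam=0.01):
    """l2-regularized collaborative representation classification.
    X: (d, n) dictionary of training samples as columns.
    labels: (n,) class label of each column.
    y: (d,) test sample.
    Solves alpha = argmin ||y - X a||^2 + lam ||a||^2 in closed form,
    then returns the class with the smallest class-wise residual."""
    n = X.shape[1]
    alpha = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
    best, best_res = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        res = np.linalg.norm(y - X[:, mask] @ alpha[mask])
        if res < best_res:
            best, best_res = c, res
    return best
```

    The improved classifier in the paper replaces the raw residual comparison with a likelihood that the test sample belongs to each class's collaborative subspace.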

    A Theoretical Study of Temperature-dependent Photodissociation Cross Sections and Rates for O2

    No full text
    The photodissociation of O2 is thought to play a vital role in blocking UV radiation in the Earth's atmosphere and likely has great importance in characterizing exoplanetary atmospheres. This work considers four photodissociation processes of O2 associated with its four electronic states, whose potential energy curves and transition dipole moments are calculated at the icMRCI+Q/aug-cc-pwCV5Z-DK level of theory. A quantum-mechanical approach is used to compute the state-resolved cross sections for two triplet transitions, from the ground X 3Σg− state to the excited B 3Σu− and E 3Σu− states, and for two singlet transitions, from the a 1Δg and b 1Σg+ states to the 1 1Πu state, considering photon wavelengths from 500 Å to the relevant threshold. Assuming the populations of the initial states satisfy a Boltzmann distribution, the temperature-dependent photodissociation cross sections are estimated at gas dynamic temperatures of 0–10,000 K, in which the discrete progressions of the B 3Σu− ← X 3Σg− and E 3Σu− ← X 3Σg− transitions are also considered. The photodissociation rates of O2 in the interstellar, solar, and blackbody radiation fields are also calculated using the temperature-dependent cross sections. The resulting photodissociation cross sections and rates are important for the atmospheric chemistry of Earth and may also be useful for the atmospheric exploration of exoplanets.
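    The Boltzmann-averaging step described here is generic: the temperature-dependent cross section is the population-weighted sum of the state-resolved cross sections over the initial levels. A minimal sketch (the level energies and cross sections in the test are hypothetical illustrative numbers, not values from this work):

```python
import math

def boltzmann_weights(energies_cm1, T):
    """Normalized Boltzmann populations of the initial levels at
    temperature T (K); energies in cm^-1, k_B ~ 0.695 cm^-1 / K.
    (Degeneracy factors omitted for simplicity.)"""
    kB = 0.695035  # Boltzmann constant in cm^-1 per kelvin
    w = [math.exp(-E / (kB * T)) for E in energies_cm1]
    Z = sum(w)  # partition-function normalization
    return [x / Z for x in w]

def sigma_T(level_cross_sections, energies_cm1, T):
    """Temperature-dependent cross section at one wavelength:
    population-weighted sum of the state-resolved cross sections."""
    w = boltzmann_weights(energies_cm1, T)
    return sum(wi * si for wi, si in zip(w, level_cross_sections))
```

    At low temperature the weight collapses onto the lowest level, so the thermal cross section reduces to the ground-level one; at high temperature, excited vibrational levels contribute, which is why the 0–10,000 K range matters.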

    Multi-Task Adversarial Network Bottleneck Features for Noise-Robust Speaker Verification

    No full text

    FAR: Fourier Aerial Video Recognition

    Full text link
    We present an algorithm, Fourier Activity Recognition (FAR), for UAV video activity recognition. Our formulation uses a novel Fourier object disentanglement method to innately separate the human agent (which is typically small) from the background. The disentanglement technique operates in the frequency domain to characterize the extent of temporal change of spatial pixels, and exploits the convolution-multiplication property of the Fourier transform to map this representation to the corresponding object-background entangled features obtained from the network. To encapsulate contextual information and long-range space-time dependencies, we present a novel Fourier Attention algorithm, which emulates the benefits of self-attention by modeling the weighted outer product in the frequency domain, using far fewer computations than self-attention. We evaluated our approach on multiple UAV datasets, including UAV Human RGB, UAV Human Night, Drone Action, and NEC Drone, and demonstrate a relative improvement of 8.02%–38.69% in top-1 accuracy while running up to 3 times faster than prior works.
    Comment: ECCV 2022 Poster paper
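    The convolution-multiplication property that FAR exploits states that circular convolution in the signal domain equals pointwise multiplication in the frequency domain, turning an O(n^2) operation into O(n log n). A generic demonstration (not the paper's code):

```python
import numpy as np

def circular_conv_fft(a, b):
    """Circular convolution of two equal-length signals via the
    convolution theorem: FFT both, multiply pointwise, inverse FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))
```

    Convolving with a unit impulse returns the signal unchanged, and convolving with a shifted impulse circularly shifts it, which makes the property easy to verify.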