A Pre-trained Data Deduplication Model based on Active Learning
In the era of big data, the issue of data quality has become increasingly
prominent. One of the main challenges is the problem of duplicate data, which
can arise from repeated entry or the merging of multiple data sources. These
"dirty data" problems can significantly limit the effective application of big
data. To address the problem of deduplication, we propose a pre-trained deduplication model based on active learning, the first work to apply active learning to deduplication at the semantic level. The model is built on a pre-trained Transformer and fine-tuned to treat deduplication as a sequence-to-classification task. It is the first to integrate the Transformer with active learning in an end-to-end architecture that selects the most valuable data for training the deduplication model, and the first to employ the R-Drop method to augment each round of labeled data, which reduces the cost of manual labeling and improves the model's performance. Experimental results demonstrate that the proposed model outperforms the previous state of the art (SOTA) in duplicate identification, achieving up to a 28% improvement in Recall on benchmark datasets.
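The core active-learning loop described above, pool-based selection of the pairs the current model is least certain about, can be sketched as follows. The scoring model here is a random stand-in for the fine-tuned Transformer's duplicate probability, and all names are illustrative, not the paper's code:

```python
import random

def uncertainty(p):
    # Distance from the 0.5 decision boundary: smaller = less certain.
    return abs(p - 0.5)

def select_batch(pool, predict, k):
    """Pick the k record pairs the current model is least certain about."""
    return sorted(pool, key=lambda pair: uncertainty(predict(pair)))[:k]

# Toy pool of candidate record pairs scored by a stand-in model.
random.seed(0)
pool = [("rec%d" % i, "rec%d" % (i + 1)) for i in range(10)]
scores = {pair: random.random() for pair in pool}
predict = lambda pair: scores[pair]

# These are the pairs a human would be asked to label next.
batch = select_batch(pool, predict, k=3)
```

Each labeled batch would then be augmented (the paper uses R-Drop) and used to fine-tune the classifier before the next selection round.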
Low geriatric nutritional risk index as a poor prognostic biomarker for immune checkpoint inhibitor treatment in solid cancer
Objective: In this investigation, we focused on the geriatric nutritional risk index (GNRI), a comprehensive metric that takes into account the patient's ideal weight, actual weight, and serum albumin levels to measure malnutrition. Our primary objective was to examine the predictive value of GNRI-defined malnutrition in determining the response to immunotherapy among cancer patients.
Methods: Relevant articles for this study were systematically searched in PubMed, the Cochrane Library, EMBASE, and Google Scholar up to July 2023. Our analysis evaluated overall survival (OS), progression-free survival (PFS), objective response rate (ORR), and disease control rate (DCR) as clinical outcomes.
Results: This analysis comprised a total of eleven articles encompassing 1,417 patients. The pooled results revealed that cancer patients with low GNRI levels exhibited shorter OS (HR: 2.64, 95% CI: 2.08–3.36, p < 0.001) and PFS (HR: 1.87, 95% CI: 1.46–2.41, p < 0.001), and lower ORR (OR: 0.46, 95% CI: 0.33–0.65, p < 0.001) and DCR (OR: 0.42, 95% CI: 0.29–0.61, p < 0.001). Sensitivity analyses confirmed that the above results were stable. Egger's and Begg's tests revealed no publication bias in the above results.
Conclusion: Our results imply that the GNRI is a useful predictor of immunotherapy response in cancer patients.
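For reference, the GNRI combines serum albumin with the ratio of actual to ideal body weight. A minimal sketch of the commonly used formula (Bouillanne et al., 2005), assuming the conventional cap of the weight ratio at 1; the example values are illustrative only:

```python
def gnri(albumin_g_per_l, weight_kg, ideal_weight_kg):
    """Geriatric Nutritional Risk Index:
    GNRI = 1.489 * albumin (g/L) + 41.7 * (weight / ideal weight),
    with the weight ratio capped at 1 when actual >= ideal weight."""
    ratio = min(weight_kg / ideal_weight_kg, 1.0)
    return 1.489 * albumin_g_per_l + 41.7 * ratio

# A patient at ideal weight with albumin 40 g/L scores 101.26;
# a lighter patient with albumin 30 g/L falls below the commonly
# cited ~98 threshold that flags nutritional risk.
score_normal = gnri(40, 70, 70)
score_low = gnri(30, 60, 70)
```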
CMOS + stochastic nanomagnets: heterogeneous computers for probabilistic inference and learning
Extending Moore's law by augmenting complementary-metal-oxide semiconductor
(CMOS) transistors with emerging nanotechnologies (X) has become increasingly
important. Accelerating Monte Carlo algorithms that rely on random sampling with such CMOS+X technologies could have a significant impact on fields ranging from probabilistic machine learning and optimization to quantum simulation. In this paper, we combine stochastic magnetic tunnel junction (sMTJ)-based probabilistic bits (p-bits) with versatile Field Programmable Gate Arrays (FPGAs) to design a CMOS + X (X = sMTJ) prototype. Our
approach enables high-quality true randomness that is essential for Monte Carlo
based probabilistic sampling and learning. Our heterogeneous computer
successfully performs probabilistic inference and asynchronous Boltzmann
learning, despite device-to-device variations in sMTJs. A comprehensive
comparison using a CMOS predictive process design kit (PDK) reveals that
compact sMTJ-based p-bits replace 10,000 transistors while dissipating two orders of magnitude less energy (2 fJ per random bit) compared to digital CMOS p-bits. Scaled and integrated versions of our CMOS + stochastic nanomagnet approach can significantly advance probabilistic computing and its applications across domains by providing massively parallel, truly random numbers at extremely high throughput and energy efficiency.
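A p-bit's behavior can be sketched in software: it outputs ±1 with a mean set by the hyperbolic tangent of its input, and a coupled network of such bits performs Gibbs-like sampling of a Boltzmann distribution. A minimal two-p-bit sketch; the coupling strength, step count, and update scheme here are illustrative, not the paper's hardware:

```python
import math
import random

def pbit(I, rng):
    """One p-bit update: emit +/-1 with mean tanh(I), emulating the
    thresholded random telegraph output of a stochastic MTJ."""
    return 1 if rng.uniform(-1.0, 1.0) < math.tanh(I) else -1

def sample(J, h, steps, rng):
    """Asynchronous sweeps over a two-p-bit network with coupling J
    and biases h; returns how often each joint state was visited."""
    m = [1, 1]
    counts = {}
    for _ in range(steps):
        for i in (0, 1):
            I = h[i] + J * m[1 - i]   # input current to p-bit i
            m[i] = pbit(I, rng)
        counts[tuple(m)] = counts.get(tuple(m), 0) + 1
    return counts

rng = random.Random(1)
counts = sample(J=1.0, h=[0.0, 0.0], steps=20000, rng=rng)
# With ferromagnetic coupling (J > 0) the aligned states dominate,
# as Boltzmann statistics predict.
aligned = counts.get((1, 1), 0) + counts.get((-1, -1), 0)
```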
Sciences for the 2.5-meter Wide Field Survey Telescope (WFST)
The Wide Field Survey Telescope (WFST) is a dedicated photometric survey
facility under construction jointly by the University of Science and Technology
of China and Purple Mountain Observatory. It is equipped with a primary mirror
of 2.5m in diameter, an active optical system, and a mosaic CCD camera of 0.73
Gpix on the main focus plane to achieve high-quality imaging over a field of
view of 6.5 square degrees. The installation of WFST at the Lenghu observing site is planned for the summer of 2023, and operations are scheduled to commence within three months afterward. WFST will scan the northern sky in
four optical bands (u, g, r, and i) at cadences from hourly/daily to
semi-weekly in the deep high-cadence survey (DHS) and the wide field survey
(WFS) programs, respectively. The WFS reaches depths of 22.27, 23.32, 22.84, and 22.31 AB magnitudes in the u, g, r, and i bands, respectively, in a nominal 30-second exposure during a photometric night, enabling us to search for a tremendous number of transients in the low-z universe and to systematically investigate the variability of Galactic and extragalactic objects. Intranight 90 s exposures as deep as 23 and 24 mag in the u and g bands via the DHS provide a unique opportunity to facilitate explorations of energetic transients that demand high sensitivity, including
the electromagnetic counterparts of gravitational-wave events detected by the
second/third-generation GW detectors, supernovae within a few hours of their
explosions, tidal disruption events and luminous fast optical transients even
beyond a redshift of 1. Meanwhile, the final 6-year co-added images,
anticipated to reach g about 25.5 mag in WFS or even deeper by 1.5 mag in DHS,
will be of significant value to general Galactic and extragalactic sciences.
The highly uniform legacy surveys of WFST will also serve as an indispensable complement to those of LSST, which monitors the southern sky. (Comment: 46 pages, submitted to SCMP)
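The gain in depth from co-adding follows from sky-limited noise statistics: stacking N equal exposures improves the signal-to-noise by sqrt(N), i.e. the limiting magnitude deepens by 1.25 log10(N). A back-of-the-envelope sketch using the quoted single-visit g-band depth; the implied visit count is an illustration of the scaling, not the survey's actual observing plan:

```python
import math

def coadd_depth(m_single, n_exposures):
    """Limiting magnitude after stacking n equal exposures, assuming
    sky-noise-limited imaging (S/N grows as sqrt(n))."""
    return m_single + 1.25 * math.log10(n_exposures)

def visits_needed(m_single, m_target):
    """Visits needed to reach a target co-added depth under the
    same sqrt(n) assumption."""
    return math.ceil(10 ** ((m_target - m_single) / 1.25))

# Ten stacked 30 s visits deepen the g band from 23.32 to ~24.57 mag;
# reaching the anticipated ~25.5 mag co-add takes a few dozen visits.
ten_visit_depth = coadd_depth(23.32, 10)
n_visits = visits_needed(23.32, 25.5)
```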
Improved Collaborative Representation Classifier Based on l2-Regularization for Human Action Recognition
Human action recognition is an important and challenging task. Projecting depth images onto three depth motion maps (DMMs) and extracting deep convolutional neural network (DCNN) features yields discriminative descriptors that characterize the spatiotemporal information of a specific action from a sequence of depth images. In this paper, a unified improved collaborative representation framework is proposed in which the probability that a test sample belongs to the collaborative subspace of all classes can be well defined and calculated. The improved collaborative representation classifier (ICRC), based on l2 regularization, is presented for human action recognition to maximize the likelihood that a test sample belongs to each class; theoretical investigation into ICRC shows that it obtains the final classification by computing this likelihood for each class. Coupled with the DMM and DCNN features, experiments on depth image-based action recognition, including the MSRAction3D and MSRGesture3D datasets, demonstrate that the proposed approach, using a distance-based representation classifier, achieves superior performance over state-of-the-art methods, including SRC, CRC, and SVM.
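The l2-regularized collaborative representation step has a closed form: the test sample is coded over all training samples jointly via ridge regression, and the class whose portion of the code best reconstructs the sample wins. A minimal sketch with toy data rather than the paper's DMM/DCNN features; the improved probabilistic weighting of ICRC is omitted:

```python
import numpy as np

def crc_classify(X, labels, y, lam=0.01):
    """l2-regularized collaborative representation classification:
    code y over ALL training columns of X at once, then assign the
    class whose slice of the code reconstructs y with least residual."""
    # Ridge-regression code: alpha = (X^T X + lam*I)^-1 X^T y
    alpha = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    labels = np.array(labels)
    residuals = {c: np.linalg.norm(y - X[:, labels == c] @ alpha[labels == c])
                 for c in set(labels.tolist())}
    return min(residuals, key=residuals.get)

# Toy data: training samples are columns; two well-separated classes.
rng = np.random.default_rng(0)
c0 = rng.normal(0.0, 0.1, (5, 3))   # class 0 clustered near 0
c1 = rng.normal(2.0, 0.1, (5, 3))   # class 1 clustered near 2
X = np.hstack([c0, c1])
labels = [0, 0, 0, 1, 1, 1]
pred = crc_classify(X, labels, c1[:, 0] + rng.normal(0, 0.05, 5))
```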
A Theoretical Study of Temperature-dependent Photodissociation Cross Sections and Rates for O2
The photodissociation of O2 is thought to play a vital role in blocking UV radiation in the Earth's atmosphere and is likely of great importance in characterizing exoplanetary atmospheres. This work considers four photodissociation processes of O2 associated with its four electronic states, whose potential energy curves and transition dipole moments are calculated at the icMRCI+Q/aug-cc-pwCV5Z-DK level of theory. A quantum-mechanical approach is used to compute the state-resolved cross sections for two triplet transitions from the ground X state to the excited B and E states, and for two singlet transitions from the a^1Δ_g and b states to the 1^1Π_u state, considering photon wavelengths from 500 Å to the relevant threshold. Assuming the populations of the initial states satisfy a Boltzmann distribution, the temperature-dependent photodissociation cross sections are estimated at gas dynamic temperatures of 0–10,000 K, in which the discrete progressions of the B and E transitions are also considered. The photodissociation rates of O2 in the interstellar, solar, and blackbody radiation fields are also calculated using the temperature-dependent cross sections. The resulting photodissociation cross sections and rates are important for the atmospheric chemistry of Earth and may also be useful for the atmospheric exploration of exoplanets.
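The temperature dependence enters through the Boltzmann populations of the initial levels: the thermal cross section is the population-weighted sum of the state-resolved cross sections. A minimal sketch at a single wavelength point; the level spacing is roughly the O2 ground-state vibrational fundamental, but the cross-section values are made up for illustration:

```python
import math

K_B = 0.6950348  # Boltzmann constant in cm^-1 per kelvin

def boltzmann_weights(energies_cm, T):
    """Normalized Boltzmann populations of the initial levels."""
    w = [math.exp(-e / (K_B * T)) for e in energies_cm]
    z = sum(w)
    return [x / z for x in w]

def thermal_cross_section(state_sigmas, energies_cm, T):
    """Temperature-dependent cross section at one wavelength point:
    the population-weighted sum of state-resolved cross sections."""
    w = boltzmann_weights(energies_cm, T)
    return sum(wi * si for wi, si in zip(w, state_sigmas))

# Two vibrational levels: v=0 at 0 cm^-1, v=1 near 1556 cm^-1.
# At 300 K only v=0 is populated; at 5000 K the hotter level's
# (hypothetical) larger cross section pulls the thermal value up.
sigma_300 = thermal_cross_section([1.0e-17, 4.0e-17], [0.0, 1556.0], 300.0)
sigma_5000 = thermal_cross_section([1.0e-17, 4.0e-17], [0.0, 1556.0], 5000.0)
```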
TW-Co-MFC: Two-level weighted collaborative fuzzy clustering based on maximum entropy for multi-view data
Multi-Task Adversarial Network Bottleneck Features for Noise-Robust Speaker Verification
FAR: Fourier Aerial Video Recognition
We present an algorithm, Fourier Activity Recognition (FAR), for UAV video
activity recognition. Our formulation uses a novel Fourier object
disentanglement method to innately separate out the human agent (which is
typically small) from the background. Our disentanglement technique operates in
the frequency domain to characterize the extent of temporal change of spatial
pixels, and exploits convolution-multiplication properties of Fourier transform
to map this representation to the corresponding object-background entangled
features obtained from the network. To encapsulate contextual information and
long-range space-time dependencies, we present a novel Fourier Attention
algorithm, which emulates the benefits of self-attention by modeling the
weighted outer product in the frequency domain. Our Fourier attention formulation requires far fewer computations than self-attention. We have evaluated our approach on multiple UAV datasets, including UAV Human RGB, UAV Human Night, Drone Action, and NEC Drone, and demonstrate a relative improvement of 8.02%–38.69% in top-1 accuracy and up to 3 times faster runtime than prior works. (Comment: ECCV 2022 poster paper)
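The convolution-multiplication property the abstract leans on is the standard convolution theorem: a circular convolution in the spatial or temporal domain becomes a cheap pointwise product after an FFT. A minimal illustration of that property (not the authors' disentanglement or attention code):

```python
import numpy as np

def circular_conv(a, b):
    """Direct circular convolution, O(n^2)."""
    n = len(a)
    return np.array([sum(a[j] * b[(i - j) % n] for j in range(n))
                     for i in range(n)])

def fft_conv(a, b):
    """Same result via the convolution theorem: pointwise product
    of the spectra, O(n log n)."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

rng = np.random.default_rng(3)
a, b = rng.normal(size=8), rng.normal(size=8)
direct = circular_conv(a, b)
fast = fft_conv(a, b)
```

The same identity is what lets frequency-domain formulations such as FAR trade expensive pairwise interactions for elementwise products.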
Implementation of a Drone-Based Information Gathering System
This paper demonstrates the implementation of the Personal Information Gathering System (PIGS). The system comprises several specially designed drones, a mobile phone, a base station, and a router. Traditionally, in combat situations, humans must put themselves at risk to gain information and identify potential threats. PIGS enables users to gather comprehensive information autonomously while remaining safe from threats. The operator can use the mobile device to remotely command the drones to obtain information, explore different regions, and perform other information-gathering tasks. With 802.11ac Wi-Fi and a lightweight computer vision model, PIGS allows the operator to interface with the drones through high-level commands and receive visual information with optional computer vision analysis. The proposed system offers a safer and more efficient way to gather information in dangerous environments. (Proceedings from the International Telemetering Conference are made available by the International Foundation for Telemetering and the University of Arizona Libraries.)