24 research outputs found

    Enhancing Topic Extraction in Recommender Systems with Entropy Regularization

    In recent years, many recommender systems have used textual data for topic extraction to enhance interpretability. However, our findings reveal that the keywords within extracted topics are often poorly coherent, which lowers the explainability of the model. This paper introduces an approach called entropy regularization to address the issue, leading to more interpretable topics extracted from recommender systems while keeping performance on the primary task competitive. The effectiveness of the strategy is validated through experiments on a variation of the probabilistic matrix factorization model that uses textual data to extract item embeddings. The experimental results show a significant improvement in topic coherence, which is quantified by cosine similarity on word embeddings.
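    The abstract above says only that coherence is "quantified by cosine similarity on word embeddings" and that the method is an entropy regularizer; it gives no formulas. As a rough sketch under stated assumptions, the code below scores a topic by the mean pairwise cosine similarity of its keyword embeddings and computes the Shannon entropy of a topic's word distribution, which is one plausible quantity such a regularizer might penalize. The function names `topic_coherence` and `entropy` are hypothetical, and neither definition is confirmed to be the paper's own.

```python
import numpy as np


def topic_coherence(word_vectors: np.ndarray) -> float:
    """Mean pairwise cosine similarity among a topic's keyword embeddings.

    Assumption: coherence is averaged over all distinct keyword pairs.
    `word_vectors` has shape (n_keywords, embedding_dim), n_keywords >= 2.
    """
    if word_vectors.ndim != 2 or word_vectors.shape[0] < 2:
        raise ValueError("need at least two keyword vectors")
    # Normalize each row to unit length, guarding against zero vectors.
    norms = np.linalg.norm(word_vectors, axis=1, keepdims=True)
    unit = word_vectors / np.clip(norms, 1e-12, None)
    # Cosine similarity matrix; keep only the strict upper triangle
    # so each unordered pair is counted once and self-pairs are skipped.
    sims = unit @ unit.T
    upper = np.triu_indices(word_vectors.shape[0], k=1)
    return float(sims[upper].mean())


def entropy(p: np.ndarray) -> float:
    """Shannon entropy (in nats) of a probability vector.

    Assumption: a low-entropy topic-word distribution concentrates mass
    on a few keywords, which is what a penalty on this term would favor.
    """
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())
```

    In a training loop, a term such as `lam * entropy(topic_word_probs)` could be added to the primary loss, where `lam` is a hypothetical weight that trades topic sharpness against performance on the main recommendation task.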

    CN-Celeb-AV: A Multi-Genre Audio-Visual Dataset for Person Recognition

    Audio-visual person recognition (AVPR) has received extensive attention. However, most datasets used for AVPR research so far were collected in constrained environments and thus cannot reflect the true performance of AVPR systems in real-world scenarios. To meet the demand for research on AVPR in unconstrained conditions, this paper presents a multi-genre AVPR dataset collected "in the wild", named CN-Celeb-AV. The dataset contains more than 419k video segments of 1,136 persons from public media. In particular, we emphasize two real-world complexities: (1) data in multiple genres; (2) segments with partial information. A comprehensive study was conducted to compare CN-Celeb-AV with two popular public AVPR benchmark datasets, and the results demonstrated that CN-Celeb-AV is more in line with real-world scenarios and can be regarded as a new benchmark dataset for AVPR research. The dataset also includes a development set that can be used to boost the performance of AVPR systems in real-life situations. The dataset is free for researchers and can be downloaded from http://cnceleb.org/.
    Comment: INTERSPEECH 202

    RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation

    Robots need to explore their surroundings to adapt to and tackle tasks in unknown environments. Prior work has proposed building scene graphs of the environment but typically assumes that the environment is static, omitting regions that require active interactions. This severely limits the ability of such methods to handle more complex tasks in household and office environments: before setting a table, robots must explore drawers and cabinets to locate all utensils and condiments. In this work, we introduce the novel task of interactive scene exploration, wherein robots autonomously explore environments and produce an action-conditioned scene graph (ACSG) that captures the structure of the underlying environment. The ACSG accounts for both low-level information, such as geometry and semantics, and high-level information, such as the action-conditioned relationships between different entities in the scene. To this end, we present the Robotic Exploration (RoboEXP) system, which incorporates a Large Multimodal Model (LMM) and an explicit memory design to enhance our system's capabilities. The robot reasons about what to explore and how to explore it, accumulating new information through the interaction process and incrementally constructing the ACSG. We apply our system across various real-world settings in a zero-shot manner, demonstrating its effectiveness in exploring and modeling environments it has never seen before. Leveraging the constructed ACSG, we illustrate the effectiveness and efficiency of our RoboEXP system in facilitating a wide range of real-world manipulation tasks involving rigid objects, articulated objects, nested objects like Matryoshka dolls, and deformable objects like cloth.
    Comment: Project Page: https://jianghanxiao.github.io/roboexp-web

    Blood-coated sensor for high-throughput ptychographic cytometry on a Blu-ray disc

    The Blu-ray drive is an engineering masterpiece that integrates disc rotation, pickup head translation, and three lasers in a compact and portable format. Here we integrate a blood-coated image sensor with a modified Blu-ray drive for high-throughput cytometric analysis of various bio-specimens. In this device, samples are mounted on the rotating Blu-ray disc and illuminated by the built-in lasers from the pickup head. The resulting coherent diffraction patterns are then recorded by the blood-coated image sensor. The rich spatial features of the blood-cell monolayer help down-modulate the object information for sensor detection, thus forming a high-resolution computational bio-lens with a theoretically unlimited field of view. With the acquired data, we develop a lensless coherent diffraction imaging modality termed rotational ptychography for image reconstruction. We show that our device can resolve the 435 nm linewidth on the resolution target and has a field of view limited only by the size of the Blu-ray disc. To demonstrate its applications, we perform high-throughput urinalysis by locating disease-related calcium oxalate crystals over the entire microscope slide. We also quantify different types of cells on a blood smear with an acquisition speed of ~10,000 cells per second. For in vitro experiments, we monitor live bacterial cultures over the entire Petri dish with single-cell resolution. Using biological cells as a computational lens could enable intriguing new imaging devices for point-of-care diagnostics. Modifying a Blu-ray drive with the blood-coated sensor further allows the spread of high-throughput optical microscopy from well-equipped laboratories to citizen scientists worldwide.

    Lensless polarimetric coded ptychography (pol-CP) for high-resolution, high-throughput birefringence imaging on a chip

    Polarimetric imaging provides valuable insights into the polarization state of light interacting with a sample. It can infer crucial birefringence properties of bio-specimens without using any labels, thereby facilitating the diagnosis of diseases such as cancer and osteoarthritis. In this study, we introduce a novel polarimetric coded ptychography (pol-CP) approach that enables high-resolution, high-throughput birefringence imaging on a chip. Our platform departs from traditional lens-based polarization systems by employing an integrated polarimetric coded sensor for lensless diffraction data acquisition. Using Jones calculus, we quantitatively determine the birefringence retardance and orientation information of bio-specimens from four recovered intensity images. Our portable pol-CP prototype can resolve the 435-nm linewidth on the resolution target, and the imaging field of view for a single acquisition is limited only by the detector size of 41 mm^2. The prototype allows for the acquisition of gigapixel birefringence images with a 180-mm^2 field of view in ~3.5 minutes, achieving an imaging throughput comparable to that of a conventional whole slide scanner. To demonstrate its biomedical applications, we perform high-throughput imaging of malaria-infected blood smears, locating parasites using birefringence contrast. We also generate birefringence maps of label-free thyroid smears to identify thyroid follicles. Notably, the recovered birefringence maps emphasize the same regions as autofluorescence images, indicating the potential for rapid on-site evaluation of label-free biopsies. The reported approach offers a portable, turnkey solution for high-resolution, high-throughput polarimetric analysis without using lenses, with potential applications in disease diagnosis, sample screening, and label-free chemical imaging.