14 research outputs found

    Exploring Cycle Consistency Learning in Interactive Volume Segmentation

    Full text link
    Automatic medical volume segmentation often lacks clinical accuracy, necessitating further refinement. In this work, we approach medical volume segmentation interactively via two decoupled modules: interaction-to-segmentation and segmentation propagation. Given a medical volume, a user first segments a slice (or several slices) via the interaction module and then propagates the segmentation(s) to the remaining slices. The user may repeat this process until a sufficiently high volume segmentation quality is achieved. However, due to the lack of human correction during propagation, segmentation errors are prone to accumulate in the intermediate slices and may lead to sub-optimal performance. To alleviate this issue, we propose a simple yet effective cycle consistency loss that regularizes an intermediate segmentation by referencing the accurate segmentation in the starting slice. To this end, we introduce a backward segmentation path that propagates the intermediate segmentation back to the starting slice using the same propagation network. With cycle consistency training, the propagation network is better regularized than in standard forward-only training. Evaluation results on the challenging AbdomenCT-1K and OAI-ZIB datasets demonstrate the effectiveness of our method. (Comment: Major revision of tech report. Code: https://github.com/uncbiag/iSegFormer/tree/v2.)
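
    Because the backward path reuses the same propagation network, the loss can be written compactly. Below is a minimal sketch of that idea, assuming a propagate(seg, src_slice, tgt_slice) callable that outputs per-pixel probabilities for a binary mask; these names are ours, and the authors' actual implementation is in the linked repository.

        import torch.nn.functional as F

        def cycle_consistency_loss(propagate, volume, seg0, k):
            # Forward path: propagate the user-verified segmentation of the
            # starting slice to an intermediate slice k; errors accumulate
            # here because there is no human correction.
            seg_k = propagate(seg0, volume[0], volume[k])
            # Backward path: propagate the intermediate result back to the
            # starting slice with the *same* network.
            seg0_cycled = propagate(seg_k, volume[k], volume[0])
            # Penalize disagreement with the trusted starting-slice mask.
            return F.binary_cross_entropy(seg0_cycled, seg0)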

    Disguise without Disruption: Utility-Preserving Face De-Identification

    Full text link
    With the rise of cameras and smart sensors, humanity generates an exponentially growing amount of data. This valuable information, including underrepresented cases such as those arising in medical settings, can fuel new deep-learning tools. However, data scientists must prioritize privacy for the individuals in these untapped datasets, especially for images or videos with faces, which are prime targets for identification methods. Proposed solutions for de-identifying such images often compromise non-identifying facial attributes that are relevant to downstream tasks. In this paper, we introduce Disguise, a novel algorithm that seamlessly de-identifies facial images while ensuring the usability of the modified data. Unlike previous approaches, our solution is firmly grounded in differential privacy and ensemble-learning research. Our method extracts the depicted identities and substitutes them with synthetic ones, generated using variational mechanisms to maximize obfuscation and non-invertibility. Additionally, we leverage supervision from a mixture-of-experts to disentangle and preserve other utility attributes. We extensively evaluate our method on multiple datasets, demonstrating a higher de-identification rate and superior consistency compared to prior approaches across various downstream tasks. (Comment: Accepted at AAAI 2024. Paper + supplementary material.)
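
    At inference time, the described pipeline amounts to extracting an identity embedding and replacing it with a variationally sampled synthetic one. The following is a purely schematic sketch of that flow; id_encoder, id_prior, and generator are hypothetical stand-ins, not the paper's API.

        import torch

        def de_identify(image, id_encoder, id_prior, generator):
            with torch.no_grad():
                # Extract the identity embedding of the depicted person.
                real_id = id_encoder(image)
                # Variational mechanism: sample a synthetic identity to
                # maximize obfuscation and non-invertibility.
                mu, log_var = id_prior(real_id)
                synthetic_id = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)
                # The generator (trained elsewhere with mixture-of-experts
                # supervision to preserve utility attributes) renders the
                # de-identified face.
                return generator(image, synthetic_id)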

    Stereoscopic 3D geometric distortions analyzed from the viewer's point of view.

    No full text
    Stereoscopic 3D (S3D) geometric distortions can be introduced by mismatches among image capture, display, and viewing configurations. In previous S3D geometric models, geometric distortions have been analyzed from a third-person perspective based on the binocular depth cue (i.e., binocular disparity). A third-person perspective differs from what the viewer sees, since monocular depth cues (e.g., linear perspective, occlusion, and shadows) differ across perspectives. However, depth perception in a 3D space involves both monocular and binocular depth cues, so geometric distortions predicted solely from the binocular depth cue cannot describe what a viewer really perceives. In this paper, we combine geometric models and retinal disparity models to analyze geometric distortions from the viewer's perspective, where both monocular and binocular depth cues are considered. Results show that binocular and monocular depth cues conflict in a geometrically distorted S3D space. Moreover, user-initiated head translations away from the optimal viewing position in conventional S3D displays can also introduce geometric distortions that are inconsistent with our natural 3D viewing condition. The inconsistency of depth cues in a dynamic scene may be a source of visually induced motion sickness.
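
    For reference, the binocular cue that such geometric models start from is the textbook disparity-to-depth relation for a centered viewer (standard geometry, not the paper's combined model):

        % Viewer with interocular separation e at distance D from the
        % screen; a point rendered with on-screen disparity d (d > 0:
        % uncrossed, behind the screen; d < 0: crossed, in front) is
        % predicted by the binocular cue alone at depth
        Z = \frac{e\,D}{e - d}, \qquad d = 0 \;\Rightarrow\; Z = D .

    Monocular cues such as linear perspective make an independent depth prediction; when capture, display, and viewing parameters mismatch, the two predictions diverge, which is the cue conflict described above.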

    Factorization Algorithms for Temporal Psychovisual Modulation Display

    No full text

    Correcting geometric distortions in stereoscopic 3D imaging.

    No full text
    Motion in a distorted virtual 3D space may cause visually induced motion sickness. Geometric distortions in stereoscopic 3D can result from mismatches among image capture, display, and viewing parameters. Three pairs of potential mismatches are considered: 1) camera separation vs. eye separation, 2) camera field of view (FOV) vs. screen FOV, and 3) camera convergence distance (i.e., the distance from the cameras to the point where their optical axes intersect) vs. screen distance from the observer. The effect of the viewer's head position (i.e., lateral offset of the head from the screen center) is also considered. The geometric model is expressed as a function of the camera convergence distance, the ratios of the three parameter pairs, and the head-position offset. We analyze the impacts of these five variables separately, as well as their interactions, on geometric distortions. This model facilitates insight into the various distortions and leads to methods whereby the user can minimize geometric distortions caused by some parameter-pair mismatches by adjusting other parameter pairs. For example, in postproduction, viewers can correct for a mismatch between camera separation and eye separation by adjusting their distance from the real screen and changing the effective camera convergence distance.
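
    To make the parameter-pair trade-off concrete, here is a small-angle, centered-viewer sketch of a capture-to-perception depth mapping in the same spirit; it is illustrative only (the paper's model also treats FOV geometry exactly and includes head offset), and all names are ours.

        def perceived_depth(Z, b, C, theta_c, e, D, theta_s):
            """Map scene depth Z to perceived depth, small-angle model.
            Capture: camera separation b, convergence distance C, FOV theta_c.
            Viewing: eye separation e, screen distance D, screen FOV theta_s.
            Distances in meters, FOVs in radians."""
            r_f = theta_s / theta_c                # FOV parameter-pair ratio
            delta = b * (1.0 / Z - 1.0 / C)        # captured angular disparity
            inv_Zp = 1.0 / D + (r_f / e) * delta   # viewer vergence geometry
            return 1.0 / inv_Zp

        # With matched parameter pairs (b = e, theta_c = theta_s, C = D) the
        # mapping is the identity: a point at 2 m is perceived at 2 m.
        print(perceived_depth(2.0, b=0.065, C=1.0, theta_c=1.0,
                              e=0.065, D=1.0, theta_s=1.0))   # -> 2.0

    In this toy mapping, a mismatched camera separation (b != e) scales the disparity term, and the viewer can partially compensate by changing D, mirroring the postproduction correction example above.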

    Learning Local Neighboring Structure for Robust 3D Shape Representation

    No full text
    Mesh is a powerful data structure for 3D shapes. Representation learning for 3D meshes is important in many computer vision and graphics applications. The recent success of convolutional neural networks (CNNs) on structured data (e.g., images) suggests the value of adapting insights from CNNs to 3D shapes. However, 3D shape data are irregular, since each node's neighbors are unordered. Various graph neural networks for 3D shapes have been developed with isotropic filters or predefined local coordinate systems to overcome this node inconsistency on graphs, but both choices limit representation power. In this paper, we propose a local structure-aware anisotropic convolutional operation (LSA-Conv) that learns an adaptive weighting matrix for each node according to its local neighboring structure and then applies shared anisotropic filters. Notably, the learnable weighting matrix is similar to the attention matrix in the Random Synthesizer, a recent Transformer model for natural language processing (NLP). Comprehensive experiments demonstrate that our model achieves significant improvements in 3D shape reconstruction over state-of-the-art methods.
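
    Read literally, the operation first soft-reorders each node's unordered neighbors with a learned per-node matrix so that a shared per-slot filter becomes meaningful. The sketch below follows that reading under our own assumptions (fixed neighborhood size K, dense per-node parameters); it is not the authors' code.

        import torch
        import torch.nn as nn

        class LSAConvSketch(nn.Module):
            def __init__(self, num_nodes, k, c_in, c_out):
                super().__init__()
                # One adaptive K x K weighting matrix per node, acting as a
                # soft permutation of that node's neighbors.
                self.A = nn.Parameter(torch.randn(num_nodes, k, k) * 0.01)
                # Shared anisotropic filter: one weight matrix per neighbor slot.
                self.W = nn.Parameter(torch.randn(k, c_in, c_out) * 0.01)

            def forward(self, x, neighbors):
                # x: (B, N, C_in) node features; neighbors: (N, K) indices.
                nbr = x[:, neighbors]                          # (B, N, K, C_in)
                # Reweight unordered neighbors with the per-node matrix.
                soft = torch.softmax(self.A, dim=-1)           # (N, K, K)
                nbr = torch.einsum('nkj,bnjc->bnkc', soft, nbr)
                # Apply the shared per-slot anisotropic filters.
                return torch.einsum('bnkc,kcd->bnd', nbr, self.W)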