201 research outputs found
Accelerated and Deep Expectation Maximization for One-Bit MIMO-OFDM Detection
In this paper we study the expectation maximization (EM) technique for
one-bit MIMO-OFDM detection (OMOD). Arising from the recent interest in massive
MIMO with one-bit analog-to-digital converters, OMOD is a massive-scale
problem. EM is an iterative method that can exploit the OFDM structure to
process the problem in a per-iteration efficient fashion. In this study we
analyze the convergence rate of EM for a class of approximate
maximum-likelihood OMOD formulations, or, in a broader sense, a class of
problems involving regression from quantized data. We show how the SNR and
channel conditions can have an impact on the convergence rate. We do so by
making a connection between the EM and the proximal gradient methods in the
context of OMOD. This connection also gives us insight to build new accelerated
and/or inexact EM schemes. The accelerated scheme has faster convergence in
theory, and the inexact scheme provides us with the flexibility to implement EM
more efficiently, with a convergence guarantee. Furthermore, we develop a deep EM
algorithm, wherein we take the structure of our inexact EM algorithm and apply
deep unfolding to train an efficient structured deep net. Simulation results
show that our accelerated exact/inexact EM algorithms run much faster than
their standard EM counterparts, and that the deep EM algorithm gives promising
detection and runtime performance.
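As a rough illustration of the kind of EM iteration the abstract refers to, the sketch below runs textbook EM on a real-valued one-bit (probit-type) regression model, y = sign(Hx + n). It is not the paper's algorithm: it ignores the OFDM structure, the symbol constraints, and the accelerated, inexact, and deep variants. The function name `em_one_bit` and the parameters `sigma` and `iters` are assumptions made for this example.

```python
import math
import numpy as np

_erfc = np.vectorize(math.erfc)


def _Phi(t):
    """Standard normal CDF, computed via erfc for accuracy in the lower tail."""
    return 0.5 * _erfc(-t / math.sqrt(2.0))


def _phi(t):
    """Standard normal PDF."""
    return np.exp(-0.5 * t ** 2) / math.sqrt(2.0 * math.pi)


def em_one_bit(H, y, sigma=1.0, iters=100):
    """EM for y = sign(H x + n), n ~ N(0, sigma^2 I), with x real and unconstrained.

    Returns the final estimate and the log-likelihood recorded before each update.
    """
    m, n = H.shape
    x = np.zeros(n)
    H_pinv = np.linalg.pinv(H)  # least-squares solver used in the M-step
    loglik = []
    for _ in range(iters):
        mu = H @ x
        # Clip only to keep the ratio phi/Phi finite in extreme cases.
        t = np.clip(y * mu / sigma, -30.0, None)
        loglik.append(float(np.sum(np.log(_Phi(t)))))
        # E-step: conditional mean of the unquantized observation given its sign.
        z_hat = mu + sigma * y * _phi(t) / _Phi(t)
        # M-step: least-squares fit of H x to the imputed observations.
        x = H_pinv @ z_hat
    return x, loglik
```

Because the M-step here is solved exactly, the recorded log-likelihood should be non-decreasing from one iteration to the next, which gives a simple sanity check on any implementation.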
Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction
We study the problem of learning goal-conditioned policies in Minecraft, a
popular, widely accessible yet challenging open-ended environment for
developing human-level multi-task agents. We first identify two main challenges
of learning such policies: 1) the indistinguishability of tasks from the state
distribution, due to the vast scene diversity, and 2) the non-stationary nature
of environment dynamics caused by partial observability. To tackle the first
challenge, we propose Goal-Sensitive Backbone (GSB) for the policy to encourage
the emergence of goal-relevant visual state representations. To tackle the
second challenge, the policy is further fueled by an adaptive horizon
prediction module that helps alleviate the learning uncertainty brought by the
non-stationary dynamics. Experiments on 20 Minecraft tasks show that our method
significantly outperforms the best baseline so far; in many of them, we double
the performance. Our ablation and exploratory studies then explain how our
approach beats its counterparts and also unveil the surprising bonus of
zero-shot generalization to new scenes (biomes). We hope our agent could help
shed some light on learning goal-conditioned, multi-task agents in challenging,
open-ended environments like Minecraft.
Comment: This paper is accepted by CVPR202
Correlation-Aware Mutual Learning for Semi-supervised Medical Image Segmentation
Semi-supervised learning has become increasingly popular in medical image
segmentation due to its ability to leverage large amounts of unlabeled data to
extract additional information. However, most existing semi-supervised
segmentation methods only focus on extracting information from unlabeled data,
disregarding the potential of labeled data to further improve the performance
of the model. In this paper, we propose a novel Correlation Aware Mutual
Learning (CAML) framework that leverages labeled data to guide the extraction
of information from unlabeled data. Our approach is based on a mutual learning
strategy that incorporates two modules: the Cross-sample Mutual Attention
Module (CMA) and the Omni-Correlation Consistency Module (OCC). The CMA module
establishes dense cross-sample correlations among a group of samples, enabling
the transfer of label prior knowledge to unlabeled data. The OCC module
constructs omni-correlations between the unlabeled and labeled datasets and
regularizes dual models by constraining the omni-correlation matrix of each
sub-model to be consistent. Experiments on the Atrial Segmentation Challenge
dataset demonstrate that our proposed approach outperforms state-of-the-art
methods, highlighting the effectiveness of our framework in medical image
segmentation tasks. The codes, pre-trained weights, and data are publicly
available.
Comment: MICCAI2023 early accepted, camera ready version
[Article] Characteristics of Ancient Japanese Chōsei (Court Politics) as Seen from a Japan-China Comparison
The term chōsei is generally used for ancient politics in the sense of "politics of the court." Traditionally, ancient Japanese chōsei has often been understood only vaguely, as chōdōsei or asamatsurigoto. From the viewpoint of a Japan-China comparison, and paying attention to the etymology of the term, this paper analyzes usages of chōsei found in Japanese and Chinese historical sources, defines ancient Japanese chōsei as mikado no matsurigoto ("political affairs of the court"), and points out the epoch-making significance of the Suiko reign in the history of the formation of chōsei.
A two-stage framework for optical coherence tomography angiography image quality improvement
Introduction: Optical Coherence Tomography Angiography (OCTA) is a new non-invasive imaging modality that is gaining popularity for observing the microvasculature of the retina and the conjunctiva, assisting clinical diagnosis and treatment planning. However, poor imaging quality, such as stripe artifacts and low contrast, is common in acquired OCTA images, in particular Anterior Segment OCTA (AS-OCTA), due to eye microtremor and poor illumination conditions. These issues lead to incomplete vasculature maps, which in turn make accurate interpretation and subsequent diagnosis difficult.
Methods: In this work, we propose a two-stage framework comprising a de-striping stage and a re-enhancing stage, which aim to remove stripe noise and to enhance blood vessel structure against the background. We introduce a new de-striping objective function in a Stripe Removal Net (SR-Net) to suppress the stripe noise in the original image. The vasculature in acquired AS-OCTA images usually exhibits poor contrast, so in the re-enhancing stage we use a Perceptual Structure Generative Adversarial Network (PS-GAN), which combines a cyclic perceptual loss with a structure loss, to further improve the quality of the de-striped AS-OCTA image.
Results and discussion: To evaluate the effectiveness of the proposed method, we apply the framework to two synthetic OCTA datasets and a real AS-OCTA dataset. Our results show that the proposed framework yields promising enhancement performance, which enables both conventional and deep learning-based vessel segmentation methods to produce improved results after enhancement for both retinal OCTA and AS-OCTA modalities.
GROOT: Learning to Follow Instructions by Watching Gameplay Videos
We study the problem of building a controller that can follow open-ended
instructions in open-world environments. We propose to follow reference videos
as instructions, which offer expressive goal specifications while eliminating
the need for expensive text-gameplay annotations. A new learning framework is
derived to allow learning such instruction-following controllers from gameplay
videos while producing a video instruction encoder that induces a structured
goal space. We implement our agent GROOT in a simple yet effective
encoder-decoder architecture based on causal transformers. We evaluate GROOT
against open-world counterparts and human players on a proposed Minecraft
SkillForge benchmark. The Elo ratings clearly show that GROOT is closing the
human-machine gap as well as exhibiting a 70% winning rate over the best
generalist agent baseline. Qualitative analysis of the induced goal space
further demonstrates some interesting emergent properties, including the goal
composition and complex gameplay behavior synthesis. The project page is
available at https://craftjarvis-groot.github.io
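To put a win rate such as the reported 70% in Elo terms, the sketch below uses the conventional logistic Elo formula with a scale of 400. The abstract does not say how its Elo ratings were computed, so the formula, the scale, and the function names are assumptions for illustration only.

```python
import math


def elo_expected_score(rating_a, rating_b, scale=400.0):
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def elo_gap_for_win_rate(p, scale=400.0):
    """Rating difference (A minus B) that yields an expected score p for A."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return scale * math.log10(p / (1.0 - p))
```

Under these assumptions, a 70% expected score corresponds to a rating gap of roughly 147 points, since 400 · log10(0.7 / 0.3) ≈ 147.2.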
Theoretical prediction of diffusive ionic current through nanopores under salt gradients
In charged nanopores, ionic diffusion current reflects the ionic selectivity
and ionic permeability of nanopores which determines the performance of osmotic
energy conversion, i.e. the output power and efficiency. Here, theoretical
predictions of the diffusive currents through cation-selective nanopores have
been developed based on the investigation of diffusive ionic transport under
salt gradients with simulations. The ionic diffusion current I is inversely
proportional to the pore length L in long nanopores, i.e., I ∝ a/L, where a is a
constant. Here, a is determined by the cross-sectional areas of
diffusion paths for anions and cations inside nanopores which can be described
with a quadratic power of the diameter, and the superposition of a quadratic
power and a first power of the diameter, respectively. By using effective
concentration gradients instead of nominal ones, the deviation caused by the
concentration polarization can be effectively avoided in the prediction of
ionic diffusion current. With developed equations of effective concentration
difference and ionic diffusion current, the diffusion current across nanopores
can be well predicted in cases of nanopores longer than 100 nm and without
overlapping of electric double layers. Our results can provide a convenient way
for the quantitative prediction of ionic diffusion currents under salt
gradients
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
We investigate the challenge of task planning for multi-task embodied agents
in open-world environments. Two main difficulties are identified: 1) executing
plans in an open-world environment (e.g., Minecraft) necessitates accurate and
multi-step reasoning due to the long-term nature of tasks, and 2) as vanilla
planners do not consider how easily the current agent can achieve a given
sub-task when ordering parallel sub-goals within a complicated plan, the
resulting plan could be inefficient or even infeasible. To this end, we propose
"Describe, Explain, Plan and
Select" (DEPS), an interactive planning approach based
on Large Language Models (LLMs). DEPS facilitates better error correction on
the initial LLM-generated plan by integrating a description of
the plan execution process and providing self-explanation of
feedback when encountering failures during the extended planning phases.
Furthermore, it includes a goal selector, which is a trainable
module that ranks parallel candidate sub-goals based on the estimated steps of
completion, consequently refining the initial plan. Our experiments mark the
milestone of the first zero-shot multi-task agent that can robustly accomplish
70+ Minecraft tasks and nearly doubles the overall performance. Further testing
reveals our method's general effectiveness in popularly adopted non-open-ended
domains as well (i.e., ALFWorld and tabletop manipulation). The ablation and
exploratory studies detail how our design beats the counterparts and provide a
promising update on the grand challenge with our
approach. The code is released at https://github.com/CraftJarvis/MC-Planner.
Comment: NeurIPS 202
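The ranking step described in the abstract, ordering parallel candidate sub-goals by their estimated steps to completion, can be sketched as below. In the paper this module is trainable; here a plain callable stands in for it, and the names `rank_subgoals` and `estimate_steps` as well as the example sub-goals are hypothetical.

```python
from typing import Callable, List, Sequence


def rank_subgoals(
    candidates: Sequence[str],
    estimate_steps: Callable[[str], float],
) -> List[str]:
    """Order candidate sub-goals so that the one estimated to finish soonest comes first.

    `estimate_steps` is a stand-in for a learned horizon predictor; ties keep
    their original order because Python's sort is stable.
    """
    return sorted(candidates, key=estimate_steps)


# Hypothetical step estimates for three parallel sub-goals.
example_estimates = {"mine_stone": 120.0, "collect_wood": 40.0, "find_sheep": 300.0}
example_order = rank_subgoals(list(example_estimates), example_estimates.__getitem__)
```

A planner could then attempt the sub-goals in `example_order`, re-ranking whenever the agent's state, and hence the estimates, change.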
Factorized Contrastive Learning: Going Beyond Multi-view Redundancy
In a wide range of multimodal tasks, contrastive learning has become a
particularly appealing approach since it can successfully learn representations
from abundant unlabeled data with only pairing information (e.g., image-caption
or video-audio pairs). Underpinning these approaches is the assumption of
multi-view redundancy - that shared information between modalities is necessary
and sufficient for downstream tasks. However, in many real-world settings,
task-relevant information is also contained in modality-unique regions:
information that is only present in one modality but still relevant to the
task. How can we learn self-supervised multimodal representations to capture
both shared and unique information relevant to downstream tasks? This paper
proposes FactorCL, a new multimodal representation learning method to go beyond
multi-view redundancy. FactorCL is built from three new contributions: (1)
factorizing task-relevant information into shared and unique representations,
(2) capturing task-relevant information via maximizing MI lower bounds and
removing task-irrelevant information via minimizing MI upper bounds, and (3)
multimodal data augmentations to approximate task relevance without labels. On
large-scale real-world datasets, FactorCL captures both shared and unique
information and achieves state-of-the-art results on six benchmarks.
Comment: Code available at: https://github.com/pliang279/FactorC
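The abstract does not name the mutual-information estimators it uses. As one widely used example of an MI lower bound in contrastive learning, the sketch below computes the InfoNCE bound, log N minus the contrastive loss, for a batch of N paired embeddings. The cosine-similarity critic, the `temperature` parameter, and the function name are assumptions for illustration, not details taken from FactorCL.

```python
import numpy as np


def info_nce_lower_bound(zx, zy, temperature=0.1):
    """InfoNCE estimate of a lower bound on I(X; Y) from N paired embeddings.

    zx, zy: arrays of shape (N, d); row i of zx is paired with row i of zy.
    The returned value never exceeds log(N).
    """
    zx = zx / np.linalg.norm(zx, axis=1, keepdims=True)
    zy = zy / np.linalg.norm(zy, axis=1, keepdims=True)
    n = zx.shape[0]
    logits = zx @ zy.T / temperature  # cosine similarities scaled by temperature
    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nce_loss = -np.mean(np.diag(log_probs))  # positives sit on the diagonal
    return float(np.log(n) - nce_loss)
```

Maximizing this quantity with respect to the encoders that produce `zx` and `zy` pulls paired samples together relative to unpaired ones; well-aligned pairs push the estimate toward its ceiling of log N.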