Learning evolving T-S fuzzy systems with both local and global accuracy – a local online optimization approach
Most real data streams are non-linear and non-stationary by nature, which makes developing effective learning techniques a challenging issue. With the advantage of updating the system structure and parameters on the fly, evolving fuzzy systems (EFSs) are effective paradigms for addressing this issue. However, existing EFS methods and algorithms usually (1) are developed from heuristic rather than optimal approaches and focus mainly on tracking the most recent local model, leading to an "unlearning effect" and often poor global accuracy; and (2) cannot guarantee optimality of the consequent parameters when the structure of the fuzzy system is updated. To resolve these issues, this paper proposes a local error optimization approach (LEOA) for identifying evolving T-S fuzzy systems. LEOA derives its antecedent learning method from minimizing a set of local error functions and guarantees the optimality of the consequent parameters through a new extended weighted recursive least-squares (EWRLS) method. Furthermore, mathematical proofs and calculations verify the optimality and ϵ-completeness properties of LEOA. Numerical experiments on several benchmark and real-world data sets demonstrate that LEOA not only achieves better local prediction accuracy than existing state-of-the-art methods but also preserves the global accuracy of the identified models.
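The EWRLS derivation itself is given in the paper; as a rough, hypothetical illustration of the kind of update involved, a standard weighted recursive least-squares step for one rule's consequent parameters can be sketched as follows (the variable names, firing-strength weighting, and forgetting factor are illustrative assumptions, not LEOA's actual formulation):

```python
import numpy as np

def wrls_update(theta, P, x, y, w, lam=0.99):
    """One weighted recursive least-squares step for a single rule's
    consequent parameters (illustrative sketch, not the paper's EWRLS).

    theta : (d,)   current consequent parameter estimate
    P     : (d, d) inverse information matrix
    x     : (d,)   regressor vector, e.g. [1, input features]
    y     : float  target output
    w     : float  firing strength of this rule for the sample
    lam   : float  forgetting factor; closer to 1 favors global accuracy
    """
    Px = P @ x
    gain = (w * Px) / (lam + w * (x @ Px))  # Kalman-style gain
    err = y - x @ theta                      # a priori prediction error
    theta = theta + gain * err               # parameter update
    P = (P - np.outer(gain, Px)) / lam       # information matrix update
    return theta, P
```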
Planting a SEED of Vision in Large Language Model
We present SEED, an elaborate image tokenizer that empowers Large Language
Models (LLMs) with the emergent ability to SEE and Draw at the same time.
Research on image tokenizers has previously reached an impasse, as frameworks
employing quantized visual tokens have lost prominence due to subpar
performance and convergence in multimodal comprehension (compared to BLIP-2,
etc.) or generation (compared to Stable Diffusion, etc.). Despite the
limitations, we remain confident in its natural capacity to unify visual and
textual representations, facilitating scalable multimodal training with LLM's
original recipe. In this study, we identify two crucial principles for the
architecture and training of SEED that effectively ease subsequent alignment
with LLMs. (1) Image tokens should be independent of 2D physical patch
positions and instead be produced with a 1D causal dependency, exhibiting
intrinsic interdependence that aligns with the left-to-right autoregressive
prediction mechanism in LLMs. (2) Image tokens should capture high-level
semantics consistent with the degree of semantic abstraction in words, and be
optimized for both discriminativeness and reconstruction during the tokenizer
training phase. As a result, the off-the-shelf LLM is able to perform both
image-to-text and text-to-image generation by incorporating our SEED through
efficient LoRA tuning. Comprehensive multimodal pretraining and instruction
tuning, which may yield improved results, are reserved for future
investigation. This version of SEED was trained in 5.7 days using only 64 V100
GPUs and 5M publicly available image-text pairs. Our preliminary study
emphasizes the great potential of discrete visual tokens in versatile
multimodal LLMs and the importance of proper image tokenizers in broader
research.
Comment: Technical Report; Project released at: https://github.com/AILab-CVC/SEE
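As a toy sketch of the core idea, namely discrete visual codes with a 1D causal dependency that are simply appended to the text vocabulary, the following heavily simplified tokenizer may help; all module choices and sizes here are assumptions for illustration, not SEED's architecture:

```python
import torch
import torch.nn as nn

TEXT_VOCAB, VISUAL_CODES, NUM_IMG_TOKENS, DIM = 32000, 8192, 32, 512

class ToyCausalImageTokenizer(nn.Module):
    """Turns patch features into discrete visual token ids with a 1D causal
    dependency, so they can be fed to an LLM like ordinary text tokens.
    Purely illustrative; not SEED's actual design."""

    def __init__(self, patch_dim=768):
        super().__init__()
        self.proj = nn.Linear(patch_dim, DIM)
        layer = nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True)
        self.causal = nn.TransformerEncoder(layer, num_layers=2)
        self.codebook = nn.Embedding(VISUAL_CODES, DIM)

    def forward(self, patch_feats):                     # (B, P, patch_dim)
        h = self.proj(patch_feats)[:, :NUM_IMG_TOKENS]  # keep a short 1D sequence
        # additive causal mask: -inf above the diagonal, 0 elsewhere
        mask = torch.full((NUM_IMG_TOKENS, NUM_IMG_TOKENS), float("-inf")).triu(1)
        h = self.causal(h, mask=mask)                   # left-to-right dependency
        # nearest-codebook quantization -> discrete ids, offset past text vocab
        d2 = ((h.unsqueeze(2) - self.codebook.weight) ** 2).sum(-1)
        return d2.argmin(-1) + TEXT_VOCAB               # (B, NUM_IMG_TOKENS)
```

The LLM's embedding table would then be extended by VISUAL_CODES rows, so that both image-to-text and text-to-image generation reduce to next-token prediction over the mixed vocabulary.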
A Fast CT Reconstruction Scheme for a General Multi-Core PC
High computational cost is a severe limitation in CT reconstruction for clinical applications that need real-time feedback. A primary example is bolus-chasing computed tomography (CT) angiography (BCA), which we have been developing for the past several years. To accelerate the reconstruction process using the filtered backprojection (FBP) method, specialized hardware or graphics cards can be used. However, specialized hardware is expensive and inflexible, while the graphics processing unit (GPU) in a current graphics card can only reconstruct images at reduced precision and is not easy to program. In this paper, an acceleration scheme based on a multi-core PC is proposed. The scheme integrates several techniques: utilization of geometric symmetry, optimization of data structures, single-instruction multiple-data (SIMD) processing, multithreaded computation, and the Intel C++ compiler. Our scheme maintains the original precision and involves no data exchange between the GPU and CPU. Numerical experiments against the traditional implementation demonstrate the merits of our scheme, which achieves a speedup of about 40 and can be improved by several fold more with the latest quad-core processors.
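The paper's implementation relies on C++ SIMD intrinsics, threads, and symmetry tables; as a minimal, hypothetical sketch of the backprojection step being accelerated (NumPy vectorization standing in for SIMD, with views or image blocks trivially distributable across cores), one could write:

```python
import numpy as np

def backproject(filtered_sino, angles, size):
    """Accumulate a naive parallel-beam backprojection.

    filtered_sino : (num_angles, num_detectors) ramp-filtered projections
    angles        : (num_angles,) view angles in radians
    size          : output image is size x size

    Vectorization over all pixels of a view stands in for the paper's SIMD
    code; in a multi-core implementation each view (or image block) would be
    dispatched to a separate thread, and the geometric symmetry of the angle
    set lets one view's trigonometry serve several image octants.
    """
    xs = np.arange(size) - size / 2 + 0.5
    X, Y = np.meshgrid(xs, xs)
    center = filtered_sino.shape[1] // 2
    image = np.zeros((size, size))
    for a, proj in zip(angles, filtered_sino):
        # detector bin hit by each pixel for this view (nearest neighbor)
        t = np.rint(X * np.cos(a) + Y * np.sin(a)).astype(int) + center
        valid = (t >= 0) & (t < filtered_sino.shape[1])
        image[valid] += proj[t[valid]]
    return image * np.pi / len(angles)
```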
Direct evidence for inner-shell electron excitation by laser-induced electron recollision
Extreme ultraviolet (XUV) attosecond pulses, generated by a process known as laser-induced electron recollision, are a key ingredient for attosecond metrology, providing a tool to precisely initiate and probe sub-femtosecond dynamics in the microcosms of atoms, molecules and solids [1]. However, with current technology, extending attosecond metrology to scrutinize the dynamics of inner-shell electrons is a challenge because of the lower efficiency in generating the required soft x-ray (ℏω > 300 eV) attosecond bursts and the lower absorption cross-sections in this spectral range. A way around this problem is to use the recolliding electron to directly initiate the desired inner-shell process, instead of relying on the currently low-flux x-ray attosecond sources. Such an excitation process occurs on a sub-femtosecond timescale and may provide the necessary "pump" step in a pump-probe experiment [2]. Here we used a few-cycle infrared (λ₀ ≈ 1800 nm) source [3] and observed direct evidence for inner-shell excitation through the laser-induced electron recollision process. This is a first step toward time-resolved core-hole studies in the keV energy range with sub-femtosecond time resolution.
Comment: 6 pages, 4 figures
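For context on why a long-wavelength driver is used here: in the standard three-step recollision model, the maximum kinetic energy the returning electron can deposit is set by the ponderomotive energy, which grows quadratically with wavelength, so moving from 800 nm to 1800 nm raises it by a factor of about five at fixed intensity:

```latex
% Maximum return energy in the three-step recollision model
E_{\max} \approx 3.17\,U_p,
\qquad
U_p = \frac{e^2 E_0^2}{4 m_e \omega_0^2}
    \approx 9.33\;\mathrm{eV}\times I\left[10^{14}\,\mathrm{W/cm^2}\right]\times\lambda_0^2\left[\mu\mathrm{m}^2\right]
```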
Spin photocurrent, its spectral dependence, and current-induced spin polarization in an InGaAs/InAlAs two-dimensional electron gas
The converse effects of the spin photocurrent and current-induced spin polarization are experimentally demonstrated in the same two-dimensional electron gas system with Rashba spin splitting. Their consistency with the strength of the Rashba coupling, as measured from the beating of the Shubnikov-de Haas oscillations, reveals a unified picture of the spin photocurrent, current-induced spin polarization and spin-orbit coupling. In addition, the observed spectral inversion of the spin photocurrent indicates that structure inversion asymmetry dominates in the system.
Comment: 13 pages, 4 figures
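For reference, the standard Rashba spin-orbit term for a two-dimensional electron gas and the spin splitting it produces at the Fermi wavevector, which is the quantity the Shubnikov-de Haas beating probes, are:

```latex
H_R = \alpha\left(\sigma_x k_y - \sigma_y k_x\right),
\qquad
\Delta_R = 2\,\alpha\,k_F
% The beating reflects the population difference between the two spin-split
% subbands, from which the coupling strength \alpha is extracted.
```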
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge
Pre-training on large-scale video data has become a common recipe for
learning transferable spatiotemporal representations in recent years. Despite
some progress, existing methods are mostly limited to highly curated datasets
(e.g., K400) and exhibit unsatisfactory out-of-the-box representations. We argue that this is because they capture only pixel-level knowledge rather than spatiotemporal commonsense, which is far from cognition-level
video understanding. Inspired by the great success of image-text pre-training
(e.g., CLIP), we take the first step to exploit language semantics to boost
transferable spatiotemporal representation learning. We introduce a new pretext
task, Turning to Video for Transcript Sorting (TVTS), which sorts shuffled ASR
scripts by attending to learned video representations. We do not rely on descriptive captions; instead, we learn purely from video, leveraging natural transcribed speech to provide noisy but useful semantics over time. Furthermore, rather than the simple concept learning of vision-caption contrast, we encourage cognition-level temporal commonsense reasoning via narrative reorganization. These advantages enable our model to contextualize what is happening much as humans do, and to apply seamlessly to large-scale uncurated video data in the real world. Note that our method differs from those designed
for video-text alignment (e.g., Frozen) and multimodal representation learning
(e.g., Merlot). Our method demonstrates strong out-of-the-box spatiotemporal
representations on diverse video benchmarks, e.g., +13.6% gains over VideoMAE
on SSV2 via linear probing.
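The exact TVTS head and losses are specified in the paper; a hypothetical toy version of the pretext task, where each shuffled transcript chunk attends to video features and predicts its original position, might look like this (all shapes and module choices are illustrative assumptions):

```python
import torch
import torch.nn as nn

class ToyTranscriptSorter(nn.Module):
    """Toy version of the TVTS idea: given K shuffled transcript-chunk
    embeddings and a video representation, predict each chunk's original
    position. Illustrative only, not the paper's architecture."""

    def __init__(self, dim=256, num_chunks=4):
        super().__init__()
        layer = nn.TransformerDecoderLayer(dim, nhead=4, batch_first=True)
        self.cross_attn = nn.TransformerDecoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_chunks)  # position logits per chunk

    def forward(self, text_chunks, video_tokens):
        # text_chunks:  (B, K, dim) shuffled ASR chunk embeddings
        # video_tokens: (B, T, dim) frame/clip features from the video encoder
        h = self.cross_attn(tgt=text_chunks, memory=video_tokens)
        return self.head(h)                      # (B, K, K) position logits

# Training would use cross-entropy between the logits and each chunk's true
# position, e.g.:
#   loss = nn.functional.cross_entropy(logits.flatten(0, 1), true_pos.flatten())
```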