6,428 research outputs found

    Learning evolving T-S fuzzy systems with both local and global accuracy – a local online optimization approach

    Get PDF
    Most real data streams are non-linear and non-stationary by nature, which makes developing effective learning techniques a challenging issue. With the advantage of updating the system structure and parameters on the fly, evolving fuzzy systems (EFSs) are an effective paradigm for addressing this issue. However, existing EFS methods and algorithms are usually (1) developed from a heuristic rather than an optimal approach and focused mainly on tracking the most recent local model, which leads to an “unlearning effect” and often poor global accuracy; and (2) unable to guarantee the optimality of the consequent parameters when the structure of the fuzzy system is updated. To resolve these issues, this paper proposes a local error optimization approach (LEOA) for identifying evolving T-S fuzzy systems. LEOA derives its antecedent learning method from minimizing a set of local error functions and guarantees the optimality of the consequent parameters through a new extended weighted recursive least squares (EWRLS) method. Furthermore, mathematical proofs and calculations are provided to verify the optimality and ϵ-completeness properties of LEOA. Numerical examples on several benchmark and real-world data sets demonstrate that LEOA not only achieves better local prediction accuracy than existing state-of-the-art methods but also preserves the global accuracy of the identified models.
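    For readers unfamiliar with this family of updates, the sketch below shows a standard locally weighted recursive least squares (wRLS) step for one rule's consequent parameters in NumPy. It illustrates the kind of update that EWRLS extends, not the paper's actual EWRLS derivation; the names (`theta`, `P`, `lam`) are generic placeholders.

```python
import numpy as np

def wrls_update(theta, P, x, y, lam):
    """One locally weighted RLS step for a single T-S rule.

    theta : (d,)   consequent parameter vector of the rule
    P     : (d, d) inverse information (covariance) matrix
    x     : (d,)   extended input vector [1, x1, ..., xn]
    y     : float  observed target
    lam   : float  normalized firing strength of the rule at x
    """
    Px = P @ x
    gain = lam * Px / (1.0 + lam * x @ Px)     # weighted Kalman-style gain
    theta = theta + gain * (y - x @ theta)     # correct by the local prediction error
    P = P - np.outer(gain, Px)                 # rank-one downdate (P is symmetric)
    return theta, P
```

    In an evolving system this step would run once per sample for every rule, with `lam` recomputed from the current antecedents; structure updates (rule addition or merging) are where an extended variant such as EWRLS departs from this plain form.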

    Planting a SEED of Vision in Large Language Model

    Full text link
    We present SEED, an elaborate image tokenizer that empowers Large Language Models (LLMs) with the emergent ability to SEE and Draw at the same time. Research on image tokenizers has previously reached an impasse, as frameworks employing quantized visual tokens have lost prominence due to subpar performance and convergence in multimodal comprehension (compared to BLIP-2, etc.) or generation (compared to Stable Diffusion, etc.). Despite these limitations, we remain confident in the natural capacity of discrete visual tokens to unify visual and textual representations, facilitating scalable multimodal training with the LLM's original recipe. In this study, we identify two crucial principles for the architecture and training of SEED that effectively ease subsequent alignment with LLMs. (1) Image tokens should be independent of 2D physical patch positions and instead be produced with a 1D causal dependency, exhibiting intrinsic interdependence that aligns with the left-to-right autoregressive prediction mechanism in LLMs. (2) Image tokens should capture high-level semantics consistent with the degree of semantic abstraction in words, and be optimized for both discriminativeness and reconstruction during the tokenizer training phase. As a result, the off-the-shelf LLM is able to perform both image-to-text and text-to-image generation by incorporating our SEED through efficient LoRA tuning. Comprehensive multimodal pretraining and instruction tuning, which may yield improved results, are reserved for future investigation. This version of SEED was trained in 5.7 days using only 64 V100 GPUs and 5M publicly available image-text pairs. Our preliminary study emphasizes the great potential of discrete visual tokens in versatile multimodal LLMs and the importance of proper image tokenizers in broader research. Comment: Technical Report; Project released at: https://github.com/AILab-CVC/SEE
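    As a rough illustration of the two principles (tokens produced with a 1D causal dependency, then quantized into discrete codes), here is a minimal PyTorch sketch. It is not SEED's actual tokenizer; the layer sizes, the frozen-encoder input, and the straight-through quantization are assumptions made for the example.

```python
import torch
import torch.nn as nn

class Causal1DTokenizer(nn.Module):
    """Turn 2D patch features into a 1D causal sequence of discrete tokens."""
    def __init__(self, num_tokens=32, dim=256, codebook_size=8192):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_tokens, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, patch_feats):                      # (B, P, dim) from a frozen image encoder
        B, N = patch_feats.size(0), self.queries.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        causal = torch.triu(torch.full((N, N), float("-inf")), diagonal=1)
        h = self.decoder(tgt=q, memory=patch_feats, tgt_mask=causal)   # 1D causal ordering
        # nearest-neighbour quantization against the codebook
        d = torch.cdist(h, self.codebook.weight.unsqueeze(0).expand(B, -1, -1))
        ids = d.argmin(dim=-1)                           # discrete visual tokens for the LLM
        z = self.codebook(ids)
        z = h + (z - h).detach()                         # straight-through gradient estimator
        return ids, z
```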

    A Fast CT Reconstruction Scheme for a General Multi-Core PC

    Get PDF
    Expensive computational cost is a severe limitation in CT reconstruction for clinical applications that need real-time feedback. A primary example is bolus-chasing computed tomography (CT) angiography (BCA), which we have been developing for the past several years. To accelerate the reconstruction process using the filtered backprojection (FBP) method, specialized hardware or graphics cards can be used. However, specialized hardware is expensive and not flexible, while the graphics processing unit (GPU) in a current graphics card can only reconstruct images at reduced precision and is not easy to program. In this paper, an acceleration scheme is proposed based on a multi-core PC. The proposed scheme integrates several techniques, including utilization of geometric symmetry, optimization of data structures, single-instruction multiple-data (SIMD) processing, multithreaded computation, and the Intel C++ compiler. Our scheme maintains the original precision and involves no data exchange between the GPU and CPU. The merits of our scheme are demonstrated in numerical experiments against the traditional implementation. Our scheme achieves a speedup of about 40, which can be further improved severalfold using the latest quad-core processors.
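    To make the thread-level part of such a scheme concrete, the sketch below is a minimal NumPy parallel-beam backprojector that splits projection angles across worker threads (NumPy releases the GIL inside its heavy array operations). It only illustrates multithreaded backprojection; the paper's scheme additionally exploits geometric symmetry, data-structure optimization, SIMD intrinsics, and the Intel compiler, and all function names here are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def backproject_chunk(sino_chunk, angle_chunk, size):
    """Nearest-neighbour parallel-beam backprojection for one block of angles."""
    n_det = sino_chunk.shape[1]
    xs = np.arange(size) - size / 2 + 0.5
    X, Y = np.meshgrid(xs, xs)
    img = np.zeros((size, size))
    for proj, theta in zip(sino_chunk, angle_chunk):
        t = X * np.cos(theta) + Y * np.sin(theta)                 # detector coordinate of each pixel
        idx = np.clip(np.round(t + n_det / 2).astype(int), 0, n_det - 1)
        img += proj[idx]
    return img

def backproject_multicore(sinogram, angles, size=256, workers=4):
    """Split the angles across threads and sum the partial images."""
    chunks = np.array_split(np.arange(len(angles)), workers)
    with ThreadPoolExecutor(workers) as pool:
        parts = pool.map(lambda c: backproject_chunk(sinogram[c], angles[c], size), chunks)
    return sum(parts) * np.pi / (2 * len(angles))                 # standard FBP scaling
```

    A typical call would pass a filtered sinogram of shape (n_proj, n_det) together with angles = np.linspace(0, np.pi, n_proj, endpoint=False).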

    Direct evidence for inner-shell electron excitation by laser-induced electron recollision

    Full text link
    Extreme ultraviolet (XUV) attosecond pulses, generated by a process known as laser-induced electron recollision, are a key ingredient of attosecond metrology, providing a tool to precisely initiate and probe sub-femtosecond dynamics in the microcosms of atoms, molecules and solids [1]. However, with current technology, extending attosecond metrology to scrutinize the dynamics of inner-shell electrons is a challenge, because of the lower efficiency in generating the required soft X-ray (ℏω > 300 eV) attosecond bursts and the lower absorption cross-sections in this spectral range. A way around this problem is to use the recolliding electron to directly initiate the desired inner-shell process, instead of relying on the currently low-flux X-ray attosecond sources. Such an excitation process occurs on a sub-femtosecond timescale and may provide the necessary "pump" step in a pump-probe experiment [2]. Here we used a few-cycle infrared (λ₀ ≈ 1800 nm) source [3] and observed direct evidence for inner-shell excitation through the laser-induced electron recollision process. This is a first step toward time-resolved core-hole studies in the keV energy range with sub-femtosecond time resolution. Comment: 6 pages, 4 figures
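    As background (these are standard strong-field relations, not results from the paper, and the intensity below is an assumed example value), both the ponderomotive energy and the maximum recollision energy scale quadratically with the driver wavelength, which is why a long-wavelength 1800 nm driver helps push recollision energies toward the inner-shell regime:

```latex
U_p = \frac{e^{2} E_0^{2}}{4 m_e \omega^{2}}
    \approx 9.33\times10^{-14}\; I\,[\mathrm{W/cm^{2}}]\;\lambda^{2}\,[\mu\mathrm{m}^{2}]\ \mathrm{eV},
\qquad
E_{\mathrm{return}}^{\max} \approx 3.17\, U_p .
```

    At an assumed intensity of 2×10^14 W/cm², this gives U_p ≈ 60 eV at 1800 nm versus only ≈ 12 eV at 800 nm.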

    Spin photocurrent, its spectral dependence, and current-induced spin polarization in an InGaAs/InAlAs two-dimensional electron gas

    Full text link
    The converse effects, spin photocurrent and current-induced spin polarization, are experimentally demonstrated in the same two-dimensional electron gas system with Rashba spin splitting. Their consistency with the strength of the Rashba coupling, as measured from the beating of the Shubnikov-de Haas oscillations, reveals a unified picture for the spin photocurrent, current-induced spin polarization and spin-orbit coupling. In addition, the observed spectral inversion of the spin photocurrent indicates that structure inversion asymmetry dominates in this system. Comment: 13 pages, 4 figures
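    For background (a textbook single-band Rashba relation, stated to leading order in the splitting, not a formula quoted from the paper), the beating of the Shubnikov-de Haas oscillations yields the populations n± of the two spin-split subbands, from which the Rashba coupling α can be estimated:

```latex
E_{\pm}(k) = \frac{\hbar^{2} k^{2}}{2 m^{*}} \pm \alpha k,
\qquad
\Delta n = n_{-} - n_{+} \approx \frac{m^{*}\alpha}{\pi\hbar^{2}}\sqrt{2\pi n}
\;\;\Longrightarrow\;\;
\alpha \approx \frac{\hbar^{2}\,\Delta n}{m^{*}}\,\sqrt{\frac{\pi}{2n}},
```

    where n = n₊ + n₋ is the total sheet density.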

    Learning Transferable Spatiotemporal Representations from Natural Script Knowledge

    Full text link
    Pre-training on large-scale video data has become a common recipe for learning transferable spatiotemporal representations in recent years. Despite some progress, existing methods are mostly limited to highly curated datasets (e.g., K400) and exhibit unsatisfactory out-of-the-box representations. We argue that this is because they capture only pixel-level knowledge rather than spatiotemporal commonsense, which is far from cognition-level video understanding. Inspired by the great success of image-text pre-training (e.g., CLIP), we take the first step toward exploiting language semantics to boost transferable spatiotemporal representation learning. We introduce a new pretext task, Turning to Video for Transcript Sorting (TVTS), which sorts shuffled ASR scripts by attending to learned video representations. We do not rely on descriptive captions and learn purely from video, i.e., leveraging the natural transcribed speech knowledge to provide noisy but useful semantics over time. Furthermore, rather than the simple concept learning in vision-caption contrast, we encourage cognition-level temporal commonsense reasoning via narrative reorganization. These advantages enable our model to contextualize what is happening, much as humans do, and to apply seamlessly to large-scale uncurated video data in the real world. Note that our method differs from ones designed for video-text alignment (e.g., Frozen) and multimodal representation learning (e.g., Merlot). Our method demonstrates strong out-of-the-box spatiotemporal representations on diverse video benchmarks, e.g., +13.6% gains over VideoMAE on SSV2 via linear probing.
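    To make the pretext task concrete, here is a toy PyTorch sketch of a transcript-sorting head: shuffled ASR segment embeddings cross-attend to video features, and each segment is classified into its original temporal slot. The module names, sizes and loss form are assumptions for illustration, not the actual TVTS architecture or objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TranscriptSortingHead(nn.Module):
    """Predict the original position of each shuffled transcript segment."""
    def __init__(self, dim=512, num_segments=8, nhead=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, nhead, batch_first=True)
        self.classifier = nn.Linear(dim, num_segments)

    def forward(self, text_emb, video_emb):
        # text_emb:  (B, K, dim) embeddings of K shuffled ASR segments
        # video_emb: (B, T, dim) frame/clip features from the video encoder
        ctx, _ = self.cross_attn(query=text_emb, key=video_emb, value=video_emb)
        return self.classifier(ctx)                      # (B, K, K) position logits

def sorting_loss(logits, true_positions):
    # true_positions: (B, K) original index of each shuffled segment
    return F.cross_entropy(logits.flatten(0, 1), true_positions.flatten())
```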