33 research outputs found

    Domain Knowledge integrated for Blast Furnace Classifier Design

    Blast furnace modeling and control is an important problem in industry, and a black-box model is an effective means of describing the complex blast furnace system. In practice, industrial applications often have different learning targets, such as safety or energy saving. For this reason, this paper proposes a framework for designing a classification model that integrates domain knowledge and yields a classifier for industrial applications. Our knowledge-incorporated learning scheme allows users to create a classifier that identifies "important samples" (whose misclassification can lead to severe consequences) more accurately, while keeping adequate precision on the remaining samples. The effectiveness of the proposed method has been verified on two real blast furnace datasets, and the method guides operators in using their prior experience to control blast furnace systems better. Comment: 9 pages, 4 figures

    Transfer Learning in Information Criteria-based Feature Selection

    This paper investigates the effectiveness of transfer learning based on Mallows' Cp. We propose a procedure that combines transfer learning with Mallows' Cp (TLCp) and prove that it outperforms the conventional Mallows' Cp criterion in terms of accuracy and stability. Our theoretical results indicate that, for any sample size in the target domain, the proposed TLCp estimator performs better than the Cp estimator in terms of mean squared error (MSE) in the case of orthogonal predictors, provided that i) the dissimilarity between the tasks from the source domain and the target domain is small, and ii) the procedure parameters (complexity penalties) are tuned according to certain explicit rules. Moreover, we show that our transfer learning framework can be extended to other feature selection criteria, such as the Bayesian information criterion. By analyzing the solution of the orthogonalized Cp, we identify an estimator that asymptotically approximates the solution of the Cp criterion in the case of non-orthogonal predictors. Similar results are obtained for the non-orthogonal TLCp. Finally, simulation studies and applications with real data demonstrate the usefulness of the TLCp scheme.
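
For readers unfamiliar with the baseline criterion, the following is a minimal sketch of conventional Mallows' Cp selection in the orthogonal-predictor case. It is not the TLCp procedure from the paper. It assumes the standard form Cp = RSS/sigma^2 - n + 2p with a known noise variance, and all function names and data values are hypothetical. With orthogonal columns, minimizing Cp over all subsets reduces to keeping predictor j whenever its reduction in RSS exceeds 2*sigma^2, which the sketch checks against an exhaustive search.

```python
from itertools import combinations


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def rss_gain(column, y):
    """Drop in residual sum of squares from adding one orthogonal column."""
    return dot(column, y) ** 2 / dot(column, column)


def mallows_cp(columns, y, subset, sigma2):
    """Cp = RSS / sigma2 - n + 2p; the RSS formula is valid for orthogonal columns only."""
    rss = dot(y, y) - sum(rss_gain(columns[j], y) for j in subset)
    return rss / sigma2 - len(y) + 2 * len(subset)


def select_exhaustive(columns, y, sigma2):
    """Minimize Cp by checking every subset of predictors."""
    indices = range(len(columns))
    subsets = [s for k in range(len(columns) + 1) for s in combinations(indices, k)]
    return min(subsets, key=lambda s: mallows_cp(columns, y, s, sigma2))


def select_threshold(columns, y, sigma2):
    """Orthogonal shortcut: keep j when its RSS gain exceeds the penalty 2 * sigma2."""
    return tuple(j for j, c in enumerate(columns) if rss_gain(c, y) > 2 * sigma2)


# Hypothetical toy data: three mutually orthogonal predictors, four samples.
columns = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, -1.0, 1.0, -1.0],
    [1.0, 1.0, -1.0, -1.0],
]
y = [3.1, 0.9, 3.0, 1.2]
sigma2 = 0.05  # assumed known noise variance

chosen_exhaustive = select_exhaustive(columns, y, sigma2)
chosen_threshold = select_threshold(columns, y, sigma2)
```

On this toy data both routes select the first two predictors and drop the third, whose RSS gain falls below the penalty.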

    MoEC: Mixture of Expert Clusters

    Sparse Mixture of Experts (MoE) has received great interest due to its promising scaling capability with affordable computational overhead. MoE converts dense layers into sparse experts and uses a gated routing network to activate experts conditionally. However, as the number of experts grows, MoE with an outrageously large number of parameters suffers from overfitting and sparse data allocation. Such problems are especially severe on tasks with limited data, which hinders MoE models from improving performance by scaling up. In this work, we propose Mixture of Expert Clusters (MoEC), a general approach that enables expert layers to learn more diverse and appropriate knowledge by imposing variance-based constraints on the routing stage. We further propose a cluster-level expert dropout strategy specifically designed for the expert cluster structure. Our experiments reveal that MoEC can improve performance on machine translation and natural language understanding tasks, and raise the performance upper bound for scaling up experts under limited data. We also verify that MoEC plays a positive role in mitigating overfitting and sparse data allocation.
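
The gated routing mentioned above can be illustrated with a minimal sketch of generic top-k expert routing. This is not the MoEC method: it omits the variance-based constraints and the cluster-level dropout, and the linear gate, the toy experts, and every name below are assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(x, gate_weights, experts, k=1):
    """Send input x to the k experts with the highest gate probability.

    gate_weights holds one weight vector per expert (a linear gate, assumed here).
    Each expert is a callable that returns a vector of the same length as x.
    Returns the mixed output and the indices of the activated experts.
    """
    logits = [dot_product(w, x) for w in gate_weights]
    probs = softmax(logits)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the selected experts
    output = [0.0] * len(x)
    for i in top:
        expert_out = experts[i](x)  # only the selected experts are evaluated
        for d, value in enumerate(expert_out):
            output[d] += probs[i] / norm * value
    return output, top


def dot_product(a, b):
    return sum(p * q for p, q in zip(a, b))


# Hypothetical toy experts and gate.
experts = [
    lambda v: [2.0 * t for t in v],
    lambda v: [-t for t in v],
    lambda v: [t + 1.0 for t in v],
]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [3.0, 1.0]

top1_output, top1_experts = route_top_k(x, gate_weights, experts, k=1)
top2_output, top2_experts = route_top_k(x, gate_weights, experts, k=2)
```

With k=1 only the first expert runs for this input, so the output is that expert's result alone; with k=2 the first two experts are mixed by their renormalized gate probabilities.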

    Kosmos-2.5: A Multimodal Literate Model

    We present Kosmos-2.5, a multimodal literate model for machine reading of text-intensive images. Pre-trained on large-scale text-intensive images, Kosmos-2.5 excels in two distinct yet cooperative transcription tasks: (1) generating spatially-aware text blocks, where each block of text is assigned its spatial coordinates within the image, and (2) producing structured text output that captures styles and structures in Markdown format. This unified multimodal literate capability is achieved through a shared Transformer architecture, task-specific prompts, and flexible text representations. We evaluate Kosmos-2.5 on end-to-end document-level text recognition and image-to-markdown text generation. Furthermore, the model can be readily adapted for any text-intensive image understanding task with different prompts through supervised fine-tuning, making it a general-purpose tool for real-world applications involving text-rich images. This work also paves the way for the future scaling of multimodal large language models.

    Quantum Deep Hedging

    Quantum machine learning has the potential for a transformative impact across industry sectors, and in particular in finance. In our work we look at the problem of hedging, where deep reinforcement learning offers a powerful framework for real markets. We develop quantum reinforcement learning methods based on policy-search and distributional actor-critic algorithms that use quantum neural network architectures with orthogonal and compound layers for the policy and value functions. We prove that the quantum neural networks we use are trainable, and we perform extensive simulations that show that quantum models can reduce the number of trainable parameters while achieving comparable performance, and that the distributional approach obtains better performance than other standard approaches, both classical and quantum. We successfully implement the proposed models on a trapped-ion quantum processor, utilizing circuits with up to 16 qubits, and observe performance that agrees well with noiseless simulation. Our quantum techniques are general and can be applied to other reinforcement learning problems beyond hedging.

    Analyzing the Synergy between HCI and TRIZ in Product Innovation through a Systematic Review of the Literature

    The boundary between tangible and digital products is becoming increasingly blurred, while rapidly evolving interaction systems require novel processes that allow designs, evaluations, and interaction strategies to be developed quickly, so as to facilitate efficient and unique user interactions with computer systems. Accordingly, the literature suggests combining creativity enhancement tools or methods with human-computer interaction (HCI) design. The TRIZ knowledge base appears to be one of the viable options, as shown by the fragmentary indications reported in well-acknowledged design textbooks. The goal of this paper is to present a systematic review of the literature that identifies and analyzes the published approaches and recommendations supporting the synergy between HCI and TRIZ from the perspective of HCI-related product innovation, with the aim of providing a first comprehensive classification and discussing observable differences and gaps. The method follows established guidelines for systematic literature reviews. Out of 444 initial results, only 17 studies reported outcomes of the synergy between HCI and TRIZ. Seven of these studies explored the feasibility of combining HCI and TRIZ. The other 10 studies attempted to combine and derive approaches from the two fields, and their outcomes defined 3 different integration strategies between HCI and TRIZ. One conclusion is that generic solutions supporting the synergy between HCI and TRIZ are still rare in the literature. The extraction and combination of different tools led to inconsistent evaluation criteria, and the performance of the proposals has not been comprehensively evaluated. However, the findings can help inform future developments and provide valuable information about the benefits and drawbacks of different approaches.