Domain Knowledge integrated for Blast Furnace Classifier Design
Blast furnace modeling and control is one of the important problems in the
industrial field, and the black-box model is an effective means of describing the
complex blast furnace system. In practice, learning targets such as safety and
energy saving differ from one industrial application to another. For this
reason, this paper proposes a framework to design
a domain knowledge integrated classification model that yields a classifier for
industrial application. Our knowledge incorporated learning scheme allows the
users to create a classifier that identifies "important samples" (whose
misclassifications can lead to severe consequences) more correctly, while
maintaining adequate precision on the remaining samples. The
effectiveness of the proposed method has been verified on two real blast
furnace datasets, and the method guides operators in using their prior
experience to control blast furnace systems better.
Comment: 9 pages, 4 figures
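The trade-off described above, between errors on "important samples" and errors on the rest, can be illustrated with a weighted error rate. This is only a minimal sketch and not the paper's method: the function name `weighted_error`, the `important` flags, and the weight value of 5.0 are all hypothetical choices made for illustration.

```python
def weighted_error(y_true, y_pred, important, weight=5.0):
    """Error rate in which mistakes on important samples count `weight` times.

    `important` is a list of booleans marking samples whose misclassification
    is considered costly (a hypothetical stand-in for operator knowledge).
    """
    total = 0.0
    wrong = 0.0
    for t, p, imp in zip(y_true, y_pred, important):
        w = weight if imp else 1.0  # heavier penalty for important samples
        total += w
        if t != p:
            wrong += w
    return wrong / total


y_true = [1, 0, 1, 0]
y_pred = [1, 0, 0, 1]
important = [False, False, True, False]

print(weighted_error(y_true, y_pred, important, weight=1.0))  # 0.5
print(weighted_error(y_true, y_pred, important, weight=5.0))  # 0.75
```

With equal weights the two mistakes give an error of 0.5; once the missed important sample is weighted five times, the same predictions score 0.75, so a model selected under this metric is pushed to get the important samples right.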
Transfer Learning in Information Criteria-based Feature Selection
This paper investigates the effectiveness of transfer learning based on
Mallows' Cp. We propose a procedure that combines transfer learning with
Mallows' Cp (TLCp) and prove that it outperforms the conventional Mallows' Cp
criterion in terms of accuracy and stability. Our theoretical results indicate
that, for any sample size in the target domain, the proposed TLCp estimator
performs better than the Cp estimator by the mean squared error (MSE) metric in
the case of orthogonal predictors, provided that i) the dissimilarity between
the tasks from source domain and target domain is small, and ii) the procedure
parameters (complexity penalties) are tuned according to certain explicit
rules. Moreover, we show that our transfer learning framework can be extended
to other feature selection criteria, such as the Bayesian information
criterion. By analyzing the solution of the orthogonalized Cp, we identify an
estimator that asymptotically approximates the solution of the Cp criterion in
the case of non-orthogonal predictors. Similar results are obtained for the
non-orthogonal TLCp. Finally, simulation studies and applications with real
data demonstrate the usefulness of the TLCp scheme.
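For readers unfamiliar with the baseline criterion, the following sketch shows conventional Mallows' Cp selection (not the TLCp procedure) in the orthogonal case. It assumes the standard form Cp = RSS/sigma^2 - n + 2p, that the columns of X are orthonormal (X^T X = I), and that the noise variance `sigma2` is known. Under those assumptions the criterion decouples and feature j is kept exactly when beta_j^2 > 2 * sigma2. The function name and the toy data are hypothetical.

```python
def cp_select_orthonormal(X, y, sigma2):
    """Minimize Mallows' Cp over feature subsets when X has orthonormal columns.

    X is a list of rows, y a list of responses, sigma2 the (assumed known)
    noise variance. Returns (selected indices, OLS coefficients, Cp value).
    """
    n = len(y)
    k = len(X[0])
    # With orthonormal columns the OLS coefficient is just the inner product.
    beta = [sum(X[i][j] * y[i] for i in range(n)) for j in range(k)]
    # Adding feature j lowers RSS by beta_j^2 and raises the penalty by 2*sigma2.
    selected = [j for j in range(k) if beta[j] ** 2 > 2.0 * sigma2]
    rss = sum(v * v for v in y) - sum(beta[j] ** 2 for j in selected)
    cp = rss / sigma2 - n + 2 * len(selected)
    return selected, beta, cp


# Toy design with two orthonormal columns (each has unit norm, dot product 0).
X = [[0.5, 0.5], [0.5, -0.5], [0.5, 0.5], [0.5, -0.5]]
y = [2.0, 1.0, 2.0, 1.0]

selected, beta, cp = cp_select_orthonormal(X, y, sigma2=1.0)
print(selected, beta, cp)  # [0] [3.0, 1.0] -1.0
```

The second coefficient is dropped because its squared value (1.0) does not exceed the threshold 2 * sigma2 = 2.0, which is the hard-thresholding behaviour that makes the orthogonal case analytically tractable.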
MoEC: Mixture of Expert Clusters
Sparse Mixture of Experts (MoE) has received great interest due to its
promising scaling capability with affordable computational overhead. MoE
converts dense layers into sparse experts, and utilizes a gated routing network
to make experts conditionally activated. However, as the number of experts
grows, MoE with a very large parameter count suffers from overfitting and sparse data
allocation. Such problems are especially severe on tasks with limited data,
thus hindering the progress for MoE models to improve performance by scaling
up. In this work, we propose Mixture of Expert Clusters - a general approach to
enable expert layers to learn more diverse and appropriate knowledge by
imposing variance-based constraints on the routing stage. We further propose a
cluster-level expert dropout strategy specifically designed for the expert
cluster structure. Our experiments reveal that MoEC could improve performance
on machine translation and natural language understanding tasks, and raise the
performance upper bound for scaling up experts under limited data. We also
verify that MoEC plays a positive role in mitigating overfitting and sparse
data allocation.
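The gated routing mentioned above can be sketched as generic top-k softmax routing. This is an illustrative baseline only: it does not implement MoEC's variance-based constraints or cluster-level dropout, and the names `top_k_route`, `moe_forward`, and the toy gate weights and experts are hypothetical.

```python
import math


def top_k_route(logits, k):
    """Return {expert_index: weight} for the k highest-scoring experts.

    Weights are a softmax restricted to the selected experts, so they sum to 1.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}


def moe_forward(x, gate_weights, experts, k=1):
    """Route x to k experts and return the weighted sum of their outputs."""
    # One gate logit per expert: dot product of its gate row with the input.
    logits = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    routes = top_k_route(logits, k)
    out = [0.0] * len(x)
    for i, w in routes.items():
        y = experts[i](x)  # only the selected experts are evaluated
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, routes


# Three toy experts acting on a 2-dimensional input.
experts = [
    lambda v: [2.0 * t for t in v],
    lambda v: [t + 1.0 for t in v],
    lambda v: [0.0 for _ in v],
]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

out, routes = moe_forward([1.0, 0.0], gate_weights, experts, k=1)
print(out, routes)  # [2.0, 0.0] {0: 1.0}
```

Because only the selected experts run, compute per input stays roughly constant as experts are added, while each expert sees a shrinking share of the data; that is the sparse data allocation problem the abstract refers to.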
Kosmos-2.5: A Multimodal Literate Model
We present Kosmos-2.5, a multimodal literate model for machine reading of
text-intensive images. Pre-trained on large-scale text-intensive images,
Kosmos-2.5 excels in two distinct yet cooperative transcription tasks: (1)
generating spatially-aware text blocks, where each block of text is assigned
its spatial coordinates within the image, and (2) producing structured text
output that captures styles and structures into the markdown format. This
unified multimodal literate capability is achieved through a shared Transformer
architecture, task-specific prompts, and flexible text representations. We
evaluate Kosmos-2.5 on end-to-end document-level text recognition and
image-to-markdown text generation. Furthermore, the model can be readily
adapted for any text-intensive image understanding task with different prompts
through supervised fine-tuning, making it a general-purpose tool for real-world
applications involving text-rich images. This work also paves the way for the
future scaling of multimodal large language models.
Quantum Deep Hedging
Quantum machine learning has the potential for a transformative impact across
industry sectors and in particular in finance. In our work we look at the
problem of hedging where deep reinforcement learning offers a powerful
framework for real markets. We develop quantum reinforcement learning methods
based on policy-search and distributional actor-critic algorithms that use
quantum neural network architectures with orthogonal and compound layers for
the policy and value functions. We prove that the quantum neural networks we
use are trainable, and we perform extensive simulations that show that quantum
models can reduce the number of trainable parameters while achieving comparable
performance and that the distributional approach obtains better performance
than other standard approaches, both classical and quantum. We successfully
implement the proposed models on a trapped-ion quantum processor, utilizing
circuits with up to qubits, and observe performance that agrees well with
noiseless simulation. Our quantum techniques are general and can be applied to
other reinforcement learning problems beyond hedging.
Analyzing the Synergy between HCI and TRIZ in Product Innovation through a Systematic Review of the Literature
The boundary between tangible and digital products is increasingly blurred, and rapidly evolving interaction systems require novel processes that support rapid design, evaluation, and interaction strategies to facilitate efficient and unique user interactions with computer systems. Accordingly, the literature suggests combining creativity enhancement tools or methods with human-computer interaction (HCI) design. The TRIZ knowledge base appears to be one of the viable options, as suggested by fragmentary indications in well-acknowledged design textbooks. The goal of this paper is to present a systematic review of the literature to identify and analyze the published approaches and recommendations that support the synergy between HCI and TRIZ from the perspective of HCI-related product innovation, with the aim of providing a first comprehensive classification and discussing observable differences and gaps. The method follows established guidelines for systematic literature reviews. Out of 444 initial results, only 17 studies reported outcomes of the synergy between HCI and TRIZ. Seven of these studies explored the feasibility of combining HCI and TRIZ. The other ten attempted to combine and derive approaches from the two fields, and their outcomes define three different integration strategies between HCI and TRIZ. One conclusion is that generic solutions to support the synergy between HCI and TRIZ are still rare in the literature. Extracting and combining different tools has led to inconsistent evaluation criteria, and the performance of the proposals has not been comprehensively evaluated. However, the findings can help inform future developments and provide valuable information about the benefits and drawbacks of different approaches.