Customs Import Declaration Datasets
Given the huge volume of cross-border flows, effective and efficient control
of trade becomes more crucial in protecting people and society from illicit
trade. However, limited accessibility of the transaction-level trade datasets
hinders the progress of open research, and many customs administrations have
not benefited from the recent progress in data-based risk management. In this
paper, we introduce an import declaration dataset to facilitate the
collaboration between domain experts in customs administrations and researchers
from diverse domains, such as data science and machine learning. The dataset
contains 54,000 artificially generated trades with 22 key attributes, and it is
synthesized with a conditional tabular GAN while preserving correlations among features.
Synthetic data has several advantages. First, releasing the dataset is free
from restrictions that do not allow disclosing the original import data. The
fabrication step minimizes the possible identity risk which may exist in trade
statistics. Second, the published data follow a similar distribution to the
source data so that it can be used in various downstream tasks. Hence, our
dataset can be used as a benchmark for testing the performance of any
classification algorithm. Along with the data and its generation process, we
release baseline code for fraud detection tasks, and we empirically show that
more advanced algorithms detect fraud better.
Comment: Datasets: https://github.com/Seondong/Customs-Declaration-Dataset
Understanding Open-Set Recognition by Jacobian Norm of Representation
In contrast to conventional closed-set recognition, open-set recognition
(OSR) assumes the presence of an unknown class, which is not seen by the model
during training. One predominant approach in OSR is metric learning, where a
model is trained to separate the inter-class representations of known class
data. Numerous works in OSR reported that, even though the models are trained
only with the known class data, the models become aware of the unknown, and
learn to separate the unknown class representations from the known class
representations. This paper analyzes this emergent phenomenon by observing the
Jacobian norm of representation. We theoretically show that minimizing the
intra-class distances within the known set reduces the Jacobian norm of known
class representations while maximizing the inter-class distances within the
known set increases the Jacobian norm of the unknown class. The closed-set
metric learning thus separates the unknown from the known by forcing their
Jacobian norm values to differ. We empirically validate our theoretical
framework with ample evidence using standard OSR datasets. Moreover,
under our theoretical framework, we explain how standard deep learning
techniques can be helpful for OSR and use the framework as a guiding principle
to develop an effective OSR model.
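The quantity this abstract centers on, the Jacobian norm of a representation, can be illustrated with a small sketch. The abstract does not say which matrix norm is used or how the Jacobian is computed, so the Frobenius norm, the central-difference approximation, and the toy `representation` function below are all assumptions made for illustration only; they are not the paper's implementation.

```python
import math


def representation(x):
    """Hypothetical 2-D -> 2-D map standing in for a trained encoder (assumption)."""
    x1, x2 = x
    return [math.tanh(x1 + 0.5 * x2), math.tanh(x1 - x2)]


def jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x; entry [i][j] approximates d f_i / d x_j."""
    n_out = len(f(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac


def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def jacobian_norm(f, x):
    """Frobenius norm of the Jacobian of f at x (the norm choice is an assumption)."""
    return frobenius_norm(jacobian(f, x))


if __name__ == "__main__":
    # The toy map is steep near the origin and flat where both tanh units saturate.
    print(jacobian_norm(representation, [0.0, 0.0]))
    print(jacobian_norm(representation, [5.0, -5.0]))
```

In this toy map the norm is large where the representation changes quickly with the input and close to zero where it is flat, which is the kind of difference the abstract describes between known and unknown class inputs.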
Loss-resilient photonic entanglement swapping using optical hybrid states
We propose a scheme for loss-resilient entanglement swapping between two distant parties via an imperfect optical channel. In this scheme, two copies of hybrid entangled states are prepared and the continuous-variable parts propagate through lossy media. To perform successful entanglement swapping, several different measurement schemes are considered for the continuous-variable parts, such as single-photon detection for ideal cases and homodyne detection for practical cases. We find that entanglement swapping using hybrid states with small amplitudes offers larger entanglement than discrete-variable entanglement swapping in the presence of large losses. Remarkably, this hybrid scheme still offers excellent robustness of entanglement to detection inefficiency. Thus, the proposed scheme could be used for practical quantum key distribution using hybrid optical states under photon losses.