An Efficient Source Model Selection Framework in Model Databases
With the explosive increase of big data, training a Machine Learning (ML)
model becomes a computation-intensive workload, which would take days or even
weeks. Thus, reusing an already trained model has received attention, which is
called transfer learning. Transfer learning avoids training a new model from
scratch by transferring knowledge from a source task to a target task. Existing
transfer learning methods mostly focus on how to improve the performance of the
target task through a specific source model, and assume that the source model
is given. Although many source models are available, it is difficult for data
scientists to select the best source model for the target task manually. Hence,
how to efficiently select a suitable source model in a model database for model
reuse is an interesting but unsolved problem. In this paper, we propose SMS, an
effective, efficient, and flexible source model selection framework. SMS is
effective even when the source and target datasets have significantly different
data labels, flexible enough to support source models with any type of
structure, and efficient because it avoids any training process. For each source
model, SMS first vectorizes the samples in the target dataset into soft labels
by directly applying the model to the target dataset, then fits Gaussian
distributions to the clusters of soft labels, and finally measures the
distinguishing ability of the source model with a Gaussian mixture-based metric.
Moreover, we present an improved SMS (I-SMS), which decreases the output number
of the source model. I-SMS can significantly reduce the selection time while
retaining the selection performance of SMS. Extensive experiments on a range of
practical model reuse workloads demonstrate the effectiveness and efficiency of
SMS.
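The scoring step can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the paper's implementation: each cluster of soft labels is fitted with a plain maximum-likelihood Gaussian, and the mean pairwise Bhattacharyya distance stands in for SMS's Gaussian mixture-based metric. The function names `sms_score` and `bhattacharyya`, the diagonal ridge term, and the use of hard labels to form clusters are all assumptions for illustration.

```python
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussians; larger values
    mean the two clusters are easier to tell apart."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2

def sms_score(soft_labels, hard_labels):
    """Score a source model by how well the Gaussians fitted to its
    soft-label clusters are separated (mean pairwise Bhattacharyya)."""
    classes = np.unique(hard_labels)
    params = []
    for c in classes:
        pts = soft_labels[hard_labels == c]
        mu = pts.mean(axis=0)
        # small ridge keeps the covariance invertible for tight clusters
        cov = np.cov(pts, rowvar=False) + 1e-6 * np.eye(pts.shape[1])
        params.append((mu, cov))
    dists = [bhattacharyya(*params[i], *params[j])
             for i in range(len(params)) for j in range(i + 1, len(params))]
    return float(np.mean(dists))
```

A source model whose soft-label clusters are well separated scores higher than one whose clusters overlap, which matches the intuition of "distinguishing ability" without requiring any training.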
UAV 3-D path planning based on MOEA/D with adaptive areal weight adjustment
Unmanned aerial vehicles (UAVs) are desirable platforms for time-efficient
and cost-effective task execution. 3-D path planning is a key challenge for
task decision-making. This paper proposes an improved multi-objective
evolutionary algorithm based on decomposition (MOEA/D) with an adaptive areal
weight adjustment (AAWA) strategy to make a tradeoff between the total flight
path length and the terrain threat. AAWA is designed to improve the diversity
of the solutions. More specifically, AAWA first removes a crowded individual
and its weight vector from the current population and then adds a sparse
individual from the external elite population to the current population. To
enable the newly-added individual to evolve towards the sparser area of the
population in the objective space, its weight vector is constructed by the
objective function value of its neighbors. The effectiveness of MOEA/D-AAWA is
validated in twenty synthetic scenarios with different numbers of obstacles and
four realistic scenarios, in comparison with three other classical methods.
Comment: 23 pages, 11 figures
Protein degradation of black carp (Mylopharyngodon piceus) muscle during cold storage
This study investigated the effects of cold storage at different temperatures (4, -0.5, -3, and -20 °C) on protein degradation and its relationship to structural changes of black carp muscle. At -0.5 and 4 °C, major structural changes occurred, including the formation of gaps between myofibers and myofibrils, breakage of myofibrils and myofibers, and degradation of the sarcoplasmic reticulum. Gel-based proteomic analysis showed that these structural changes were accompanied by the degradation of a series of myofibrillar proteins, including titin, nebulin, troponin, myosin, myomesin, myosin-binding protein, and α-actinin. A loss of extractable gelatinolytic and caseinolytic protease activities was also observed. At -3 and -20 °C, the formation of ice crystals was the most noticeable change. The major proteins were degraded at different locations in the black carp muscle, and gelatinolytic and caseinolytic proteases appear to contribute to their degradation.
Pivot-based Metric Indexing
The general notion of a metric space encompasses a diverse range of data types and accompanying similarity measures. Hence, metric search plays an important role in a wide range of settings, including multimedia retrieval, data mining, and data integration. With the aim of accelerating metric search, a collection of pivot-based indexing techniques for metric data has been proposed; these reduce the number of potentially expensive similarity comparisons by exploiting the triangle inequality for pruning and validation. However, no comprehensive empirical study of those techniques exists. Existing studies each offer only narrow coverage, and they use different pivot selection strategies that affect performance substantially, rendering cross-study comparisons difficult or impossible. We offer a survey of existing pivot-based indexing techniques, and report a comprehensive empirical comparison of their construction costs, update efficiency, storage sizes, and similarity search performance. As part of the study, we provide modifications for two existing indexing techniques to make them more competitive. The findings and insights obtained from the study reveal the strengths and weaknesses of the different indexing techniques, and offer guidance on selecting an appropriate technique for a given setting.
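The triangle-inequality pruning that all pivot-based indexes share can be shown in a minimal range-query sketch. This is a generic illustration, not any particular technique from the survey: the object-to-pivot distance table is what an index precomputes at build time, and an object o can be skipped whenever |d(q, p) - d(o, p)| > r for some pivot p, since the triangle inequality makes that quantity a lower bound on d(q, o). The function name `pivot_range_search` is an assumption.

```python
def pivot_range_search(objects, pivots, dist, q, r):
    """Range query with pivot-based pruning (sketch). Returns the
    objects within distance r of q, plus the number of direct
    distance computations actually performed."""
    # object-to-pivot distances: precomputed once at index build time
    table = {o: [dist(o, p) for p in pivots] for o in objects}
    dq = [dist(q, p) for p in pivots]
    result, comparisons = [], 0
    for o in objects:
        # prune: |d(q,p) - d(o,p)| is a lower bound on d(q,o)
        if any(abs(dq[i] - table[o][i]) > r for i in range(len(pivots))):
            continue
        comparisons += 1  # only now pay for the expensive distance
        if dist(q, o) <= r:
            result.append(o)
    return result, comparisons
```

With well-chosen pivots, most objects are eliminated by the cheap table lookups, which is exactly the comparison-count saving the surveyed indexes compete on.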
Automated Generation of Data Quality Checks by Identifying Key Dimensions and Values
Monitoring the quality and integrity of the data stored in a data warehouse is necessary to ensure correct and reliable operation. Such checks can include detecting anomalous and/or impermissible values for the metrics and dimensions present in the database tables. Determining useful and effective checks and bounds is difficult and tedious, and requires high levels of expertise and familiarity with the data. This disclosure describes techniques for automating the creation of data quality checks by examining database schema and contents to identify important dimensions and values. The techniques utilize the observation that, in practice, only a subset of the values of a database field is likely to be of operational importance. These values are automatically identified by calculating importance-adjusted data quality coverage, assigning importance to metrics, dimensions, and dimension values. Data quality checks are then automatically generated for effective coverage of the key dimensions and values. The generation of checks can involve selecting from a repository of historically effective checks created by experts and/or applying time series anomaly detection to metrics in their entirety or sliced by key dimension values.
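The selection of key dimension values can be sketched as follows. This is a minimal sketch under an assumed definition of importance, not the disclosure's actual method: a dimension value is "key" if it belongs to the smallest set of values that together account for a target share of a metric's total, and one check is then emitted per key value. The names `key_dimension_values`, `generate_checks`, and the `coverage` threshold are all assumptions.

```python
def key_dimension_values(rows, dimension, metric, coverage=0.8):
    """Pick the dimension values that together account for a target
    share of the metric -- the operationally important slices."""
    totals = {}
    for row in rows:
        totals[row[dimension]] = totals.get(row[dimension], 0) + row[metric]
    grand = sum(totals.values())
    picked, acc = [], 0.0
    for value, total in sorted(totals.items(), key=lambda kv: -kv[1]):
        if acc / grand >= coverage:
            break
        picked.append(value)
        acc += total
    return picked

def generate_checks(rows, dimension, metric):
    """Emit one check spec per key dimension value, e.g. time series
    anomaly detection on the metric sliced by that value."""
    return [{"dimension": dimension, "value": v, "metric": metric,
             "check": "anomaly_detection"}
            for v in key_dimension_values(rows, dimension, metric)]
```

For a revenue metric dominated by a few countries, only those countries get dedicated sliced checks, keeping the generated check set small while covering most of the metric's mass.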
Data Quality Coverage
The quality of data is typically measured on a subset of all available data. It is of interest to know whether such a measurement, performed on a subset of the data, is representative of the entire corpus. This disclosure describes techniques that use the historical data and metadata of a given time series to determine the set of useful data quality checks that can exist. This set is compared to the actual set of configured data quality checks to provide the percentage of data quality coverage for a given data set.
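The comparison reduces to a set intersection. This is a minimal sketch under the assumption that checks are identified by name; how the useful set is derived from history and metadata is left out, and the function name `data_quality_coverage` is an assumption.

```python
def data_quality_coverage(useful_checks, actual_checks):
    """Coverage = percentage of the useful checks (derived from the
    series' history and metadata) that are actually configured."""
    useful = set(useful_checks)
    if not useful:
        return 100.0  # nothing useful to cover
    return 100.0 * len(useful & set(actual_checks)) / len(useful)
```

A data set with half of its useful checks configured reports 50% coverage, giving owners a concrete gap to close.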