An Efficient Source Model Selection Framework in Model Databases
With the explosive increase of big data, training a Machine Learning (ML)
model becomes a computation-intensive workload, which would take days or even
weeks. Thus, reusing an already trained model has received attention, which is
called transfer learning. Transfer learning avoids training a new model from
scratch by transferring knowledge from a source task to a target task. Existing
transfer learning methods mostly focus on how to improve the performance of the
target task through a specific source model, and assume that the source model
is given. Although many source models are available, it is difficult for data
scientists to select the best source model for the target task manually. Hence,
how to efficiently select a suitable source model in a model database for model
reuse is an interesting but unsolved problem. In this paper, we propose SMS, an
effective, efficient, and flexible source model selection framework. SMS is
effective even when the source and target datasets have significantly different
data labels, flexible enough to support source models with any type of
structure, and efficient because it avoids any training process. For each source
model, SMS first vectorizes the samples in the target dataset into soft labels
by directly applying the model to the target dataset, then fits Gaussian
distributions to the clusters of soft labels, and finally measures the
distinguishing ability of the source model with a Gaussian mixture-based metric.
Moreover, we present an improved SMS (I-SMS), which decreases the output number
of the source model. I-SMS can significantly reduce the selection time while
retaining the selection performance of SMS. Extensive experiments on a range of
practical model reuse workloads demonstrate the effectiveness and efficiency of
SMS.
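The scoring step can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the paper's implementation: each cluster of soft labels is fitted with a plain maximum-likelihood Gaussian, and the mean pairwise Bhattacharyya distance stands in for SMS's Gaussian mixture-based metric. The function names `sms_score` and `bhattacharyya`, the diagonal ridge term, and the use of hard labels to form clusters are all assumptions for illustration.

```python
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussians; larger values
    mean the two clusters are easier to tell apart."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2

def sms_score(soft_labels, hard_labels):
    """Score a source model by how well the Gaussians fitted to its
    soft-label clusters are separated (mean pairwise Bhattacharyya)."""
    classes = np.unique(hard_labels)
    params = []
    for c in classes:
        pts = soft_labels[hard_labels == c]
        mu = pts.mean(axis=0)
        # small ridge keeps the covariance invertible for tight clusters
        cov = np.cov(pts, rowvar=False) + 1e-6 * np.eye(pts.shape[1])
        params.append((mu, cov))
    dists = [bhattacharyya(*params[i], *params[j])
             for i in range(len(params)) for j in range(i + 1, len(params))]
    return float(np.mean(dists))
```

A source model whose soft-label clusters are well separated scores higher than one whose clusters overlap, which matches the intuition of "distinguishing ability" without requiring any training.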
UAV 3-D path planning based on MOEA/D with adaptive areal weight adjustment
Unmanned aerial vehicles (UAVs) are desirable platforms for time-efficient
and cost-effective task execution. 3-D path planning is a key challenge for
task decision-making. This paper proposes an improved multi-objective
evolutionary algorithm based on decomposition (MOEA/D) with an adaptive areal
weight adjustment (AAWA) strategy to make a tradeoff between the total flight
path length and the terrain threat. AAWA is designed to improve the diversity
of the solutions. More specifically, AAWA first removes a crowded individual
and its weight vector from the current population and then adds a sparse
individual from the external elite population to the current population. To
enable the newly-added individual to evolve towards the sparser area of the
population in the objective space, its weight vector is constructed by the
objective function value of its neighbors. The effectiveness of MOEA/D-AAWA is
validated in twenty synthetic scenarios with different numbers of obstacles and
four realistic scenarios, in comparison with three other classical methods.
Comment: 23 pages, 11 figures
Protein degradation of black carp (Mylopharyngodon piceus) muscle during cold storage
This study investigated the effects of cold storage at different temperatures (4, -0.5, -3, and -20 °C) on protein degradation and its relationship to structural changes of black carp muscle. At -0.5 and 4 °C, major structural changes occurred, including the formation of gaps between myofibers and myofibrils, breakage of myofibrils and myofibers, and degradation of the sarcoplasmic reticulum. Gel-based proteomic analysis showed that these structural changes were accompanied by the degradation of a series of myofibrillar proteins, including titin, nebulin, troponin, myosin, myomesin, myosin-binding protein, and α-actinin. A loss of extractable gelatinolytic and caseinolytic protease activities was also observed. At -3 and -20 °C, the formation of ice crystals was the most noticeable change. The major proteins were degraded at different locations in the black carp muscle, and gelatinolytic and caseinolytic proteases appear to contribute to their degradation.
Pivot-based Metric Indexing
The general notion of a metric space encompasses a diverse range of data types and accompanying similarity measures. Hence, metric search plays an important role in a wide range of settings, including multimedia retrieval, data mining, and data integration. With the aim of accelerating metric search, a collection of pivot-based indexing techniques for metric data has been proposed; these reduce the number of potentially expensive similarity comparisons by exploiting the triangle inequality for pruning and validation. However, no comprehensive empirical study of those techniques exists. Existing studies each offer only narrow coverage, and they use different pivot selection strategies that affect performance substantially, rendering cross-study comparisons difficult or impossible. We offer a survey of existing pivot-based indexing techniques, and report a comprehensive empirical comparison of their construction costs, update efficiency, storage sizes, and similarity search performance. As part of the study, we provide modifications for two existing indexing techniques to make them more competitive. The findings and insights obtained from the study reveal the strengths and weaknesses of the different indexing techniques, and offer guidance on selecting an appropriate technique for a given setting.
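The triangle-inequality pruning that all pivot-based indexes share can be shown in a minimal range-query sketch. This is a generic illustration, not any particular technique from the survey: the object-to-pivot distance table is what an index precomputes at build time, and an object o can be skipped whenever |d(q, p) - d(o, p)| > r for some pivot p, since the triangle inequality makes that quantity a lower bound on d(q, o). The function name `pivot_range_search` is an assumption.

```python
def pivot_range_search(objects, pivots, dist, q, r):
    """Range query with pivot-based pruning (sketch). Returns the
    objects within distance r of q, plus the number of direct
    distance computations actually performed."""
    # object-to-pivot distances: precomputed once at index build time
    table = {o: [dist(o, p) for p in pivots] for o in objects}
    dq = [dist(q, p) for p in pivots]
    result, comparisons = [], 0
    for o in objects:
        # prune: |d(q,p) - d(o,p)| is a lower bound on d(q,o)
        if any(abs(dq[i] - table[o][i]) > r for i in range(len(pivots))):
            continue
        comparisons += 1  # only now pay for the expensive distance
        if dist(q, o) <= r:
            result.append(o)
    return result, comparisons
```

With well-chosen pivots, most objects are eliminated by the cheap table lookups, which is exactly the comparison-count saving the surveyed indexes compete on.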
Automated Generation of Data Quality Checks by Identifying Key Dimensions and Values
Monitoring the quality and integrity of the data stored in a data warehouse is necessary to ensure correct and reliable operation. Such checks can include detecting anomalous and/or impermissible values for the metrics and dimensions present in the database tables. Determining useful and effective checks and bounds is difficult and tedious, and requires high levels of expertise and familiarity with the data. This disclosure describes techniques for automating the creation of data quality checks by examining database schema and contents to identify important dimensions and values. The techniques utilize the observation that, in practice, only a subset of the values of a database field is likely to be of operational importance. These values are automatically identified by calculating importance-adjusted data quality coverage, assigning importance to metrics, dimensions, and dimension values. Data quality checks are then automatically generated for effective coverage of the key dimensions and values. The generation of checks can involve selecting from a repository of historically effective checks created by experts and/or applying time series anomaly detection to metrics in their entirety or sliced by key dimension values.
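The selection of key dimension values can be sketched as follows. This is a minimal sketch under an assumed definition of importance, not the disclosure's actual method: a dimension value is "key" if it belongs to the smallest set of values that together account for a target share of a metric's total, and one check is then emitted per key value. The names `key_dimension_values`, `generate_checks`, and the `coverage` threshold are all assumptions.

```python
def key_dimension_values(rows, dimension, metric, coverage=0.8):
    """Pick the dimension values that together account for a target
    share of the metric -- the operationally important slices."""
    totals = {}
    for row in rows:
        totals[row[dimension]] = totals.get(row[dimension], 0) + row[metric]
    grand = sum(totals.values())
    picked, acc = [], 0.0
    for value, total in sorted(totals.items(), key=lambda kv: -kv[1]):
        if acc / grand >= coverage:
            break
        picked.append(value)
        acc += total
    return picked

def generate_checks(rows, dimension, metric):
    """Emit one check spec per key dimension value, e.g. time series
    anomaly detection on the metric sliced by that value."""
    return [{"dimension": dimension, "value": v, "metric": metric,
             "check": "anomaly_detection"}
            for v in key_dimension_values(rows, dimension, metric)]
```

For a revenue metric dominated by a few countries, only those countries get dedicated sliced checks, keeping the generated check set small while covering most of the metric's mass.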
Data Quality Coverage
The quality of data is typically measured on a subset of all available data. It is of interest to know whether such a measurement, performed on a subset of the data, is representative of the entire corpus. This disclosure describes techniques that use the historical data and metadata of a given time series to determine the set of useful data quality checks that can exist. This set is compared to the actual set of configured data quality checks to provide the percentage of data quality coverage for a given data set.
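The comparison reduces to a set intersection. This is a minimal sketch under the assumption that checks are identified by name; how the useful set is derived from history and metadata is left out, and the function name `data_quality_coverage` is an assumption.

```python
def data_quality_coverage(useful_checks, actual_checks):
    """Coverage = percentage of the useful checks (derived from the
    series' history and metadata) that are actually configured."""
    useful = set(useful_checks)
    if not useful:
        return 100.0  # nothing useful to cover
    return 100.0 * len(useful & set(actual_checks)) / len(useful)
```

A data set with half of its useful checks configured reports 50% coverage, giving owners a concrete gap to close.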