4 research outputs found
Data Science With Excel
Data science consists of several stages, one of which is data preparation. At this stage, dirty data is cleaned so that it is ready for modeling. Many applications make data processing for data science more convenient. One of them is Microsoft Excel, which can process data so that it is ready for modeling. However, Excel has limitations: a worksheet holds at most 1,048,576 rows and 16,384 columns. For data of no more than 1 million rows, Excel can still handle the work with features such as error detection, duplicate removal, error value correction, outlier detection, missing data handling, and data validation. This study demonstrates some of these features with examples of their use.
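The preparation steps named in this abstract can be illustrated outside Excel as well. The following is a minimal Python sketch, not the study's own procedure: the function names are hypothetical, missing values are assumed to be represented as `None`, and Tukey's 1.5 x IQR rule is an assumed choice for outlier detection that the abstract does not specify. Only the two worksheet limits come from the abstract.

```python
from statistics import quantiles

EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit cited in the abstract
EXCEL_MAX_COLS = 16_384     # worksheet column limit cited in the abstract


def fits_in_excel(n_rows, n_cols):
    """Return True if a table of this shape fits on a single worksheet."""
    return n_rows <= EXCEL_MAX_ROWS and n_cols <= EXCEL_MAX_COLS


def remove_duplicates(rows):
    """Drop repeated rows, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are unhashable, so compare rows as tuples
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fill_missing(values, fill):
    """Replace missing entries (assumed to be None) with a caller-chosen value."""
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]
```

For example, `fits_in_excel(2_000_000, 10)` is false because the row count exceeds the worksheet limit, which is the situation in which the abstract says Excel stops being an option.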
Attribute Simulation for Item Embedding Enhancement in Multi-interest Recommendation
Although multi-interest recommenders have achieved significant progress in
the matching stage, our research reveals that existing models tend to exhibit
an under-clustered item embedding space, which leads to a low discernibility
between items and hampers item retrieval. This highlights the necessity for
item embedding enhancement. However, item attributes, which serve as effective
and straightforward side information for enhancement, are either unavailable or
incomplete in many public datasets due to the labor-intensive nature of manual
annotation tasks. This dilemma raises two meaningful questions: 1. Can we
bypass manual annotation and directly simulate complete attribute information
from the interaction data? 2. If feasible, how can attributes be simulated with
high accuracy and low complexity in the matching stage?
In this paper, we first establish an inspiring theoretical feasibility that
the item-attribute correlation matrix can be approximated through elementary
transformations on the item co-occurrence matrix. Then based on formula
derivation, we propose a simple yet effective module, SimEmb (Item Embedding
Enhancement via Simulated Attribute), in the multi-interest recommendation of
the matching stage to implement our findings. By simulating attributes with the
co-occurrence matrix, SimEmb discards the item ID-based embedding and employs
the attribute-weighted summation for item embedding enhancement. Comprehensive
experiments on four benchmark datasets demonstrate that our approach notably
enhances the clustering of item embedding and significantly outperforms SOTA
models with an average improvement of 25.59% on Recall@20.
Comment: This paper has been accepted by the 17th ACM International Conference
on Web Search and Data Mining (WSDM 2024). The camera-ready version will be
available in the conference proceedings.
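The two ingredients named in this abstract, an item co-occurrence matrix and an attribute-weighted summation, can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not the SimEmb module itself: the abstract does not give the transformations applied to the co-occurrence matrix, so plain row normalization is used as a stand-in, and all function names are hypothetical.

```python
from itertools import combinations


def cooccurrence_matrix(sequences, n_items):
    """Count how often each pair of distinct items shares a user sequence."""
    counts = [[0.0] * n_items for _ in range(n_items)]
    for seq in sequences:
        # Each unordered pair of distinct items in a sequence counts once.
        for a, b in combinations(sorted(set(seq)), 2):
            counts[a][b] += 1.0
            counts[b][a] += 1.0
    return counts


def row_normalize(matrix):
    """Scale each row to sum to 1 (an assumed stand-in, not the paper's derivation)."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


def weighted_embeddings(weights, attr_vectors):
    """Build each item embedding as a weighted sum of attribute vectors."""
    dim = len(attr_vectors[0])
    return [
        [sum(w * vec[d] for w, vec in zip(row, attr_vectors)) for d in range(dim)]
        for row in weights
    ]
```

In this sketch each row of the normalized co-occurrence matrix plays the role of an item's simulated attribute weights, so no item ID-based embedding table is needed; how closely that mirrors the published module depends on details the abstract leaves out.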
Revealing route bias in air transport data : the case of the Bureau of Transport Statistics (BTS), Origin-Destination Survey (DB1B)
status: published
Networking Architecture and Key Technologies for Human Digital Twin in Personalized Healthcare: A Comprehensive Survey
Digital twin (DT) refers to a promising technique to digitally and
accurately represent actual physical entities. One typical advantage of DT is
that it can be used to not only virtually replicate a system's detailed
operations but also analyze the current condition, predict future behaviour,
and refine the control optimization. Although DT has been widely implemented in
various fields, such as smart manufacturing and transportation, its
conventional paradigm is limited to embodying non-living entities, e.g., robots
and vehicles. For human-centric systems, a novel concept called
human digital twin (HDT) has thus been proposed. In particular, HDT allows in
silico representation of an individual human body with the ability to dynamically
reflect molecular status, physiological status, emotional and psychological
status, as well as lifestyle evolutions. These capabilities prompt the expected application
of HDT in personalized healthcare (PH), which can facilitate remote monitoring,
diagnosis, prescription, surgery and rehabilitation. However, despite its large
potential, HDT faces substantial research challenges in several respects, and
it has recently become an increasingly popular topic. In this survey, with a specific
focus on the networking architecture and key technologies for HDT in PH
applications, we first discuss the differences between HDT and conventional
DTs, followed by the universal framework and essential functions of HDT. We
then analyze its design requirements and challenges in PH applications. After
that, we provide an overview of the networking architecture of HDT, including
the data acquisition layer, data communication layer, computation layer, data
management layer, and data analysis and decision-making layer. After reviewing
in detail the key technologies for implementing this networking architecture,
we conclude this survey by presenting future research directions of HDT.