4 research outputs found

    Data Science With Excel

    Data science involves several stages, one of which is data preparation. At this stage, dirty data is cleaned so that it is ready for modeling. Many applications make this kind of processing convenient. One of them is Microsoft Excel, which can process data so that it is ready for modeling. However, Excel has limitations: a worksheet holds at most 1,048,576 rows and 16,384 columns. For data of no more than 1 million rows, Excel can still handle the work with features such as error detection, duplicate removal, correction of error values, outlier detection, missing-data handling and data validation. This study shows some of these features along with examples of their use.
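The cleaning steps named in this abstract can be sketched outside Excel as well. The following is a minimal Python illustration, not the study's own procedure: the function names are hypothetical, filling gaps with the median is an assumed strategy, and the 1.5 × IQR rule for outliers is a common convention that the abstract does not specify. Only the row limit comes from the text above.

```python
from statistics import median, quantiles

EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit cited in the abstract


def fits_in_excel(n_rows):
    """Return True if a table with n_rows rows fits in one worksheet."""
    return n_rows <= EXCEL_MAX_ROWS


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fill_missing(values):
    """Replace None with the median of the present values (assumed strategy)."""
    present = [v for v in values if v is not None]
    fill = median(present)
    return [fill if v is None else v for v in values]


def iqr_outliers(values):
    """Return values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]
```

Each helper handles one column or one list of rows at a time, mirroring how the corresponding Excel features are usually applied to a selected range.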

    Attribute Simulation for Item Embedding Enhancement in Multi-interest Recommendation

    Although multi-interest recommenders have achieved significant progress in the matching stage, our research reveals that existing models tend to exhibit an under-clustered item embedding space, which leads to low discernibility between items and hampers item retrieval. This highlights the necessity of item embedding enhancement. However, item attributes, which serve as effective and straightforward side information for enhancement, are either unavailable or incomplete in many public datasets due to the labor-intensive nature of manual annotation. This dilemma raises two meaningful questions: 1. Can we bypass manual annotation and directly simulate complete attribute information from the interaction data? 2. If feasible, how can we simulate attributes with high accuracy and low complexity in the matching stage? In this paper, we first establish the theoretical feasibility of approximating the item-attribute correlation matrix through elementary transformations on the item co-occurrence matrix. Then, based on formula derivation, we propose a simple yet effective module, SimEmb (Item Embedding Enhancement via Simulated Attribute), for multi-interest recommendation in the matching stage to implement our findings. By simulating attributes with the co-occurrence matrix, SimEmb discards the item ID-based embedding and employs attribute-weighted summation for item embedding enhancement. Comprehensive experiments on four benchmark datasets demonstrate that our approach notably enhances the clustering of item embeddings and significantly outperforms SOTA models with an average improvement of 25.59% on Recall@[…]. Comments: This paper has been accepted by the 17th ACM International Conference on Web Search and Data Mining (WSDM 2024). The camera-ready version will be available in the conference proceedings.
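To make two terms from this abstract concrete, the sketch below builds an item co-occurrence matrix from interaction sequences and computes a weighted sum of vectors. It is not SimEmb: the abstract does not give the transformations that turn co-occurrence into item-attribute correlations, so none are attempted here. The function names, the integer item IDs and the choice to count each item pair once per session are all assumptions made for illustration.

```python
from itertools import combinations


def cooccurrence(sessions, n_items):
    """Count, for each item pair, the number of sessions containing both.

    sessions: iterable of lists of integer item IDs in range(n_items).
    Returns a symmetric n_items x n_items matrix with a zero diagonal.
    """
    counts = [[0] * n_items for _ in range(n_items)]
    for session in sessions:
        # set() so that a repeated item counts once per session (assumption)
        for a, b in combinations(sorted(set(session)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def weighted_sum(weights, vectors):
    """Return sum_k weights[k] * vectors[k], computed per dimension."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
```

In an attribute-weighted scheme, `weights` would be one item's weights over the simulated attributes and `vectors` the attribute embeddings; how those weights are derived is left to the paper.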

    Networking Architecture and Key Technologies for Human Digital Twin in Personalized Healthcare: A Comprehensive Survey

    Digital twin (DT) refers to a promising technique for digitally and accurately representing actual physical entities. One typical advantage of DT is that it can be used not only to virtually replicate a system's detailed operations but also to analyze the current condition, predict future behaviour, and refine the control optimization. Although DT has been widely implemented in various fields, such as smart manufacturing and transportation, its conventional paradigm is limited to embodying non-living entities, e.g., robots and vehicles. For human-centric systems, a novel concept called human digital twin (HDT) has thus been proposed. In particular, HDT allows in silico representation of an individual human body with the ability to dynamically reflect molecular status, physiological status, emotional and psychological status, as well as lifestyle evolutions. These capabilities prompt the expected application of HDT in personalized healthcare (PH), where it can facilitate remote monitoring, diagnosis, prescription, surgery and rehabilitation. However, despite its large potential, HDT faces substantial research challenges in different aspects and has recently become an increasingly popular topic. In this survey, with a specific focus on the networking architecture and key technologies for HDT in PH applications, we first discuss the differences between HDT and conventional DTs, followed by the universal framework and essential functions of HDT. We then analyze its design requirements and challenges in PH applications. After that, we provide an overview of the networking architecture of HDT, including the data acquisition layer, data communication layer, computation layer, data management layer, and data analysis and decision making layer. Besides reviewing in detail the key technologies for implementing such a networking architecture, we conclude this survey by presenting future research directions of HDT.