
    PASS-JOIN: A Partition-based Method for Similarity Joins

    As an essential operation in data cleaning, the similarity join has attracted considerable attention from the database community. In this paper, we study string similarity joins with edit-distance constraints, which find similar string pairs from two large sets of strings whose edit distance is within a given threshold. Existing algorithms are efficient either for short strings or for long strings, and there is no algorithm that can efficiently and adaptively support both short strings and long strings. To address this problem, we propose a partition-based method called Pass-Join. Pass-Join partitions a string into a set of segments and creates inverted indices for the segments. Then for each string, Pass-Join selects some of its substrings and uses the selected substrings to find candidate pairs using the inverted indices. We devise efficient techniques to select the substrings and prove that our method can minimize the number of selected substrings. We develop novel pruning techniques to efficiently verify the candidate pairs. Experimental results show that our algorithms are efficient for both short strings and long strings, and outperform state-of-the-art methods on real datasets.
    Comment: VLDB201
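The partition-and-probe idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each string is split into tau + 1 near-equal segments, so that by a pigeonhole argument two strings within edit distance tau must share at least one segment of one string as a substring of the other. For simplicity it probes with every substring of the right length, rather than the minimized substring selection the abstract refers to, and it verifies candidates with a plain dynamic-programming edit distance instead of the paper's pruning techniques. The names `segment_bounds`, `edit_distance`, and `pass_join_sketch` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain dynamic-programming Levenshtein distance (no pruning)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def segment_bounds(n: int, tau: int):
    """(start, length) of tau + 1 contiguous, near-equal segments of a
    string of length n. Assumed partition scheme for this sketch."""
    k = tau + 1
    base, extra = divmod(n, k)
    bounds, pos = [], 0
    for i in range(k):
        length = base + (1 if i >= k - extra else 0)
        bounds.append((pos, length))
        pos += length
    return bounds


def pass_join_sketch(strings, tau):
    """Self-join: return sorted (i, j) index pairs with i < j whose
    edit distance is at most tau."""
    # (string length, segment number) -> {segment text -> [string ids]}
    index = {}
    results = set()
    for rid, r in enumerate(strings):
        candidates = set()
        # Only strings whose length differs by at most tau can match.
        for length in range(max(0, len(r) - tau), len(r) + tau + 1):
            for i, (_, seg_len) in enumerate(segment_bounds(length, tau)):
                bucket = index.get((length, i))
                if not bucket:
                    continue
                # Naive selection: every substring of r with this length.
                for start in range(len(r) - seg_len + 1):
                    candidates.update(bucket.get(r[start:start + seg_len], ()))
        # Verify each candidate pair exactly.
        for sid in candidates:
            if edit_distance(strings[sid], r) <= tau:
                results.add((sid, rid))
        # Index the segments of r for strings that come later.
        for i, (pos, seg_len) in enumerate(segment_bounds(len(r), tau)):
            index.setdefault((len(r), i), {}) \
                 .setdefault(r[pos:pos + seg_len], []).append(rid)
    return sorted(results)


if __name__ == "__main__":
    words = ["vldb", "pvldb", "sigmod", "sigmmod", "icde"]
    print(pass_join_sketch(words, tau=1))
```

Because every substring of the matching length is probed, this sketch generates more candidates than necessary. Restricting the start positions of the probed substrings is where a substring-selection technique of the kind the abstract describes would apply.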

    MyHealthAvatar lifestyle management support for cancer patients

    MyHealthAvatar (MHA) is built on the latest information and communications technology with the aim of collecting lifestyle and health data to promote citizens' wellbeing. Based on the collected data, MHA offers visual analytics of lifestyle data, contributes to individualised disease prediction and prevention, and supports healthy lifestyles and independent living. The iManageCancer project aims to empower patients and strengthen self-management of cancer. MHA contributes to the iManageCancer scenario by providing the iManageCancer platform with lifestyle management services that help patients manage their cancer. This paper presents two different MHA-based Android applications for breast and prostate cancer patients. The components in these apps facilitate health and lifestyle data presentation and analysis, including weight control, activity, mood and sleep data collection, promotion of physical exercise after surgery, questionnaires and helpful information. These components can be used cooperatively to achieve flexible visual analysis of spatiotemporal lifestyle and health data and can also help patients discover information about their disease and its management.

    MyEvents: a personal visual analytics approach for mining key events and knowledge discovery in support of personal reminiscence

    Reminiscence is an important aspect of our lives. It preserves precious memories, allows us to form our own identities and encourages us to accept the past. Our work takes advantage of modern sensor technologies to support reminiscence, enabling self-monitoring of personal activities and individual movement in space and time on a daily basis. This paper presents MyEvents, a web-based personal visual analytics platform designed for non-computing experts, which allows for the collection of long-term location and movement data and the generation of event mementos. Our research is focused on two prominent goals in event reminiscence: 1) selection subjectivity and human involvement in the process of self-knowledge discovery and memento creation; and 2) the enhancement of event familiarity by presenting target events and their related information for optimal memory recall and reminiscence. A novel multi-significance event ranking model is proposed to determine significant events in the personal history according to user preferences for event category, frequency and regularity. The evaluation results show that MyEvents effectively fulfils the reminiscence goals and tasks.

    Semantic lifting and reasoning on the personalised activity big data repository for healthcare research

    The fast-growing markets for smart health monitoring devices and mobile applications give ordinary citizens the opportunity to understand and manage their own health. However, data engineering and knowledge discovery research faces many challenges in efficiently extracting knowledge from data that is collected from heterogeneous devices and applications in large volumes and at high velocity. This paper presents research that initially started with the EC MyHealthAvatar project and is under continual improvement following the project's completion. The major contribution of the work is a comprehensive big data and semantic knowledge discovery framework which integrates data from varied data resources. The framework applies a hybrid database architecture of NoSQL and RDF repositories and introduces semantic-oriented data mining and knowledge lifting algorithms. The activity stream data is collected through Kafka's big data processing component. The motivation of the research is to enhance knowledge management, discovery capabilities and efficiency to support more accurate health risk analysis and lifestyle summarisation.