14 research outputs found

    A Novel Approach to Data Extraction on Hyperlinked Webpages

    The World Wide Web has an enormous amount of useful data presented as HTML tables. These tables are often linked to other web pages that provide further detail on certain attribute values. Extracting the schema of such relational tables is a challenge due to the absence of a standard format and a lack of published algorithms. We downloaded 15,000 web pages from various web sites using our in-house web crawler. Tables were extracted from the HTML code, and table rows were labeled with appropriate class labels. Conditional random fields (CRF) were used to classify table rows, and a nondeterministic finite automaton (NFA) algorithm was designed to identify simple, complex, hyperlinked, or non-linked tables. A simple schema was extracted for non-linked tables; for linked tables, a relational schema in the form of primary and foreign keys (PK and FK) was developed. Child tables were concatenated with the parent table’s attribute value (PK), which serves as the foreign key (FK). As a result, these tables can support more powerful queries using the join operation. A manual check of the linked web table results revealed 99% precision and 68% recall. Our downloadable corpus of 15,000 pages and the novel algorithm will provide a basis for further research in this field.
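
    The PK/FK idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: the table names (`parent`, `child`), column names, and rows are hypothetical placeholders, and SQLite is used only as a convenient way to show a join between a parent table and a child table extracted from a linked page.

```python
import sqlite3


def build_and_join():
    """Create a hypothetical parent/child table pair and join them on the key."""
    conn = sqlite3.connect(":memory:")
    # Parent table: one row per entry of the (hypothetical) linking web table.
    conn.execute("CREATE TABLE parent (item_name TEXT PRIMARY KEY, category TEXT)")
    # Child table: rows from the linked page, tagged with the parent's PK value,
    # which acts as the foreign key.
    conn.execute(
        "CREATE TABLE child ("
        "item_name TEXT REFERENCES parent(item_name), attribute TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO parent VALUES (?, ?)",
        [("Item A", "tools"), ("Item B", "toys")],
    )
    conn.executemany(
        "INSERT INTO child VALUES (?, ?, ?)",
        [
            ("Item A", "weight", "2 kg"),
            ("Item A", "color", "red"),
            ("Item B", "color", "blue"),
        ],
    )
    # The join recombines parent attributes with the linked details.
    rows = conn.execute(
        "SELECT p.item_name, p.category, c.attribute, c.value "
        "FROM parent p JOIN child c ON c.item_name = p.item_name "
        "ORDER BY p.item_name, c.attribute"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in build_and_join():
        print(row)
```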

    Implementation of Tuned Schema Merging Approach

    Schema merging is the process of integrating multiple data sources into a GCS (Global Conceptual Schema). It is pivotal to various application domains, such as data warehousing and multi-databases. Schema merging requires the identification of corresponding elements, which is done through the schema matching process. In this process, corresponding elements across multiple data sources are identified by comparing the data sources with each other. For a given set of data sources and the correspondences between them, there are different possibilities for creating a GCS. In applications like multi-databases and data warehousing, new data sources keep joining, and schema merging approaches usually expand GCS relations horizontally or vertically as they do. As a result of such expansions, an unbalanced GCS is created, which either produces too many NULL values in response to global queries or requires too many joins, causing poor query processing. In this paper, a novel approach, the TuSMe (Tuned Schema Merging) technique, is introduced to overcome this issue by developing a balanced GCS that is able to control both vertical and horizontal expansion of GCS relations. The approach employs a weighting mechanism in which weights are assigned to individual attributes of the GCS. These weights reflect the connectedness of GCS attributes with the attributes of the principal data sources. Moreover, the overall strength of the GCS can be assessed by combining these weights. A prototype implementation of TuSMe shows significant improvement over other contemporary state-of-the-art approaches.

    Remote health monitoring systems for elderly people: a survey

    This paper addresses the growing demand for healthcare systems, particularly among the elderly population. The need for these systems arises from the desire to enable patients and seniors to live independently in their homes without relying heavily on their families or caretakers. To achieve substantial improvements in healthcare, it is essential to ensure the continuous development and availability of information technologies tailored explicitly for patients and elderly individuals. The primary objective of this study is to comprehensively review the latest remote health monitoring systems, with a specific focus on those designed for older adults. To facilitate a comprehensive understanding, we categorize these remote monitoring systems and provide an overview of their general architectures. Additionally, we emphasize the standards utilized in their development and highlight the challenges encountered throughout the developmental processes. Moreover, this paper identifies several potential areas for future research, which promise further advancements in remote health monitoring systems. Addressing these research gaps can drive progress and innovation, ultimately enhancing the quality of healthcare services available to elderly individuals. This, in turn, empowers them to lead more independent and fulfilling lives while enjoying the comforts and familiarity of their own homes. By acknowledging the importance of healthcare systems for the elderly and recognizing the role of information technologies, we can address the evolving needs of this population. Through ongoing research and development, we can continue to enhance remote health monitoring systems, ensuring they remain effective, efficient, and responsive to the unique requirements of elderly individuals.

    A novel hybrid deep learning model for human activity recognition based on transitional activities

    In recent years, a plethora of algorithms have been devised for efficient human activity recognition. Most of these algorithms consider basic human activities and neglect postural transitions because of their subsidiary occurrence and short duration. However, postural transitions play a significant part in an activity recognition framework and cannot be neglected. This work proposes a hybrid multi-model activity recognition approach that handles both basic and transition activities by using multiple deep learning models simultaneously. For final classification, a dynamic decision fusion module is introduced. The experiments are performed on publicly available datasets. The proposed approach achieved a classification accuracy of 96.11% and 98.38% for the transition and basic activities, respectively. The outcomes show that the proposed method is superior to state-of-the-art methods in terms of accuracy and precision.

    Safe Motherhood Applied Research and Training (SMART) Report 3: Changes in knowledge and behavior of women and families

    The Safe Motherhood Applied Research and Training (SMART) project was conceived as an operations research project designed to test the effectiveness of two different strategies for improving maternal and neonatal health in Pakistan. To evaluate the results of this test, several types of evaluative research were conducted, including qualitative studies of various types, health systems assessments, evaluations of specific components, and household surveys. The household surveys are the subject of this report, which is Report 3 (Changes in knowledge and behavior of women and families) in a series of six. The surveys are of two types: a large-scale, before-after household survey (HHS) intended to measure mortality change as well as some other socioeconomic, demographic, and health variables; and a smaller knowledge, attitude and behavior (KAB) survey conducted at baseline, mid-project, and endline and intended to obtain details on behavior and on exposure to SMART project activities.

    Comparison of researchers' impact indices.

    Researchers contribute to the frontiers of knowledge by establishing facts and reaching new conclusions through systematic investigations, and by subsequently publishing the outcomes of their research in the form of research papers. These publications are indicative of researchers' scientific impact. Different bibliometric indices have been proposed to measure the impact or productivity of a researcher. These indices include publication count, citation count, number of coauthors, h-index, etc. Since its inception, the h-index has been ranked as the foremost impact indicator by many studies. However, as a consequence of the various shortcomings identified in the h-index, some variants of it have been proposed. For instance, one dimension that requires significant attention is the ability to identify exceptional performers in a particular research area. In our study, we have compared the effectiveness of the h-index and some of its recent variants in identifying the exceptional performers of a field. We have also computed the correlation of the h-index with the recently proposed indices. A high correlation indicates that an index has the same effect as the h-index, while a low correlation means the index makes a non-redundant contribution when ranking potential researchers in a field of study. So far, the effectiveness of these indices has not been explored or validated on real data sets from the same field. We have considered these variants and modifications of the h-index along with the h-index itself and tested them on a comprehensive data set for the field of Computer Science. The award winners' data set is used as the benchmark for evaluating these indices for individual researchers. Results show that these indices have a low correlation with the h-index and that they perform better than the h-index in identifying exceptional performers of a field.
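
    For readers unfamiliar with the h-index discussed above, the standard definition (the largest h such that at least h papers have h or more citations each) can be sketched as follows. The function name and the sample citation counts are illustrative assumptions, not data from the study, and the variants compared in the paper are not reproduced here.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the paper at this rank still has enough citations
        else:
            break  # every later paper has even fewer citations
    return h


if __name__ == "__main__":
    # Hypothetical citation counts for one researcher's papers.
    print(h_index([10, 8, 5, 4, 3]))
```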

    Schema Interpretation: An Aid to the Schema Analysis in Federated Database Design

    A new method for schema analysis is described in which reasoning is based upon the real-world semantics of schema elements. The method distinguishes between intrinsic and in-context semantics, which respectively provide a basis for shallow and deep semantic comparisons between elements. Real-world semantics are represented as element interpretations, which map elements into a pre-defined common concept model. The method provides a basis for more accurate integration of component schemas within a federated database system. 1. Introduction The schema integration (SI) process by which a federated schema is derived is critical to the effectiveness of a federated database system (FDBS) [SL90]. Schema analysis is one of the four phases of SI, in which semantic relationships between the elements of component schemas are identified [BLN86]. It requires an understanding of, and the ability to capture and reason with, the semantics of those schema elements [YSDK91], but current data models do n..

    Schema Integration of Web Tables (SIWeT)

    Schema integration has mainly been applied in database environments, whether the schemas to be integrated belong to a single organization or to multiple ones. Schema extraction is a relatively new area in which a schema is extracted from a web table. The extracted schema is not as concretely defined as in a typical database environment. The work in this paper brings the two areas together: schemas extracted from multiple web tables are integrated to form a global schema. The data are also extracted from the web tables and placed into a global table. This creates a large repository of data from the same domain, extracted dynamically from websites, which is then available for different types of ad-hoc queries. This work also poses challenges for schema integration to be studied in the context of schema extraction, and the other way round.

    A Novel Hybrid Ensemble Clustering Technique for Student Performance Prediction

    Educational Data Mining (EDM) is a branch of data mining that focuses on extracting useful knowledge from data generated through academic activities at the school, college, or university level. The extracted knowledge can help improve academic activities, so it is useful for students, parents, and the institutions themselves. One common activity in EDM is student grade prediction, with the aim of identifying weak or at-risk students. Early identification of such students makes it possible to take supportive measures that may help them improve. Among the vast number of approaches available in this field, this study focuses on generating a smarter dataset through a reduced feature set without reducing the number of records in it, and then producing an approach that combines the strengths of classification and clustering for better prediction results. The study finds that individual features have distinct effects and that removing misclassified data can affect the overall results. Backward selection is adopted, using Pearson correlation as a metric, to produce a smarter dataset with fewer attributes and better prediction accuracy. After feature set selection, we applied EMT (Ensemble Meta-Based Tree Model) classification to identify the best-performing classifiers from five families of classifiers. In the hybrid approach, ensemble clustering is first applied to the smart dataset, and then EMT classification is applied to re-evaluate the un-clustered data, which boosts performance and yields an accuracy of 93%.
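
    The feature-reduction step mentioned above can be sketched roughly as follows. This is only one plausible reading of "backward selection using Pearson correlation", not the study's actual procedure: here the feature with the weakest absolute correlation to the target is dropped repeatedly until a chosen number remains. The function names, the `keep` parameter, and the toy feature values are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def backward_select(features, target, keep):
    """Start from all features and drop the weakest one until `keep` remain.

    `features` maps a feature name to its column of values (assumed layout).
    """
    selected = dict(features)
    while len(selected) > keep:
        weakest = min(selected, key=lambda name: abs(pearson(selected[name], target)))
        del selected[weakest]
    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical toy data: three candidate features and a final grade.
    features = {
        "attendance": [1, 2, 3, 4, 5],
        "quiz": [2, 4, 5, 4, 5],
        "shoe_size": [3, 1, 4, 1, 5],
    }
    grade = [2, 4, 6, 8, 10]
    print(backward_select(features, grade, keep=2))
```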