89 research outputs found

    XML Schema Clustering with Semantic and Hierarchical Similarity Measures

    With the growing popularity of XML as a data representation language, the number of XML data collections has grown rapidly. Methods are needed to manage these collections and to discover useful information in them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic properties and context of the elements but also their hierarchical structural similarity. We support our findings with experiments and analysis.

    Extracting causation knowledge from natural language texts.

    Chan Ki, Cecia. Thesis (M.Phil.)--Chinese University of Hong Kong, 2002. Includes bibliographical references (leaves 95-99). Abstracts in English and Chinese.

    Contents:
    Chapter 1: Introduction
        1.1 Our Contributions
        1.2 Thesis Organization
    Chapter 2: Related Work
        2.1 Using Knowledge-based Inferences
        2.2 Using Linguistic Techniques
            2.2.1 Using Linguistic Clues
            2.2.2 Using Graphical Patterns
            2.2.3 Using Lexicon-syntactic Patterns of Causative Verbs
            2.2.4 Comparisons with Our Approach
        2.3 Discovery of Extraction Patterns for Extracting Relations
            2.3.1 Snowball system
            2.3.2 DIRT system
            2.3.3 Comparisons with Our Approach
    Chapter 3: Semantic Expectation-based Knowledge Extraction
        3.1 Semantic Expectations
        3.2 Semantic Template
            3.2.1 Causation Semantic Template
        3.3 Sentence Templates
        3.4 Consequence and Reason Templates
        3.5 Causation Knowledge Extraction Framework
            3.5.1 Template Design
            3.5.2 Sentence Screening
            3.5.3 Semantic Processing
    Chapter 4: Using Thesaurus and Pattern Discovery for SEKE
        4.1 Using a Thesaurus
        4.2 Pattern Discovery
            4.2.1 Use of Semantic Expectation-based Knowledge Extraction
            4.2.2 Use of Part of Speech Information
            4.2.3 Pattern Representation
            4.2.4 Constructing the Patterns
            4.2.5 Merging the Patterns
        4.3 Pattern Matching
            4.3.1 Matching Score
            4.3.2 Support of Patterns
            4.3.3 Relevancy of Sentence Templates
        4.4 Applying the Newly Discovered Patterns
    Chapter 5: Applying SEKE on Hong Kong Stock Market Domain
        5.1 Template Design
            5.1.1 Semantic Templates
            5.1.2 Sentence Templates
            5.1.3 Consequence and Reason Templates
        5.2 Pattern Discovery
            5.2.1 Support of Patterns
            5.2.2 Relevancy of Sentence Templates
        5.3 Causation Knowledge Extraction Result
            5.3.1 Evaluation Approach
            5.3.2 Parameter Investigations
            5.3.3 Experimental Results
            5.3.4 Knowledge Discovered
            5.3.5 Parameter Effect
    Chapter 6: Applying SEKE on Global Warming Domain
        6.1 Template Design
            6.1.1 Semantic Templates
            6.1.2 Sentence Templates
            6.1.3 Consequence and Reason Templates
        6.2 Pattern Discovery
            6.2.1 Support of Patterns
            6.2.2 Relevancy of Sentence Templates
        6.3 Global Warming Domain Result
            6.3.1 Evaluation Approach
            6.3.2 Experimental Results
            6.3.3 Knowledge Discovered
    Chapter 7: Conclusions and Future Directions
        7.1 Conclusions
        7.2 Future Directions
    Bibliography
    Appendix A: Penn Treebank Part of Speech Tags

    A Comprehensive Bibliometric Analysis on Social Network Anonymization: Current Approaches and Future Directions

    In recent decades, social network anonymization has become a crucial research field due to its pivotal role in preserving users' privacy. However, the high diversity of approaches introduced in relevant studies makes it difficult to gain a thorough understanding of the field. In response, the current study presents an exhaustive and well-structured bibliometric analysis of the social network anonymization field. First, related studies from the period 2007-2022 were collected from the Scopus database and pre-processed. VOSviewer was then used to visualize the network of authors' keywords. Subsequently, extensive statistical and network analyses were performed to identify the most prominent keywords and trending topics. Additionally, co-word analysis through SciMAT and the Alluvial diagram allowed us to explore the themes of social network anonymization and scrutinize their evolution over time. These analyses culminated in an innovative taxonomy of the existing approaches and an anticipation of potential trends in this domain. To the best of our knowledge, this is the first bibliometric analysis in the social network anonymization field, and it offers a deeper understanding of the current state and an insightful roadmap for future research in this domain. Comment: 73 pages, 28 figures

    Splitting hybrid Make-To-Order and Make-To-Stock demand profiles

    In this paper, a demand time series is analysed to support Make-To-Stock (MTS) and Make-To-Order (MTO) production decisions. Using a purely MTS production strategy based on the given demand can lead to unnecessarily high inventory levels; it is therefore necessary to identify likely MTO episodes. This research proposes a novel outlier detection algorithm based on special density measures. We divide the time series' histogram into three clusters. The first, with frequent low volumes, covers MTS items, whilst the second accounts for high volumes and is dedicated to MTO items. The third cluster lies between the previous two, and its elements are assigned to either the MTO or the MTS class. The algorithm can be applied to a variety of time series, both stationary and non-stationary. We use empirical data from manufacturing to study the extent of inventory savings. The percentage of MTO items is reflected in the inventory savings, which were shown to average 18.1%. Comment: demand analysis; time series; outlier detection; production strategy; Make-To-Order (MTO); Make-To-Stock (MTS); 15 pages, 9 figures

    Design of an Interactive and Drillable Legend for SOLAP (Conception d'une légende interactive et forable pour le SOLAP)

    To compensate for the limited effectiveness of GIS as decision-support tools (multiple granularities, speed, user-friendliness, temporality), various flavours of SOLAP (Spatial OLAP) tools have emerged from research centres and software vendors (CRG/Kheops/Syntell, SFU/DBMiner, Proclarity, Cognos, Microsoft, Beyond 20/20, ESRI, MapInfo, etc.). Combining GIS functions with business intelligence (data warehouses, OLAP, data mining), SOLAP is described as "software for fast and easy navigation within spatial databases that offers several levels of information granularity, several epochs, several themes and several visualisation modes, synchronised or not: maps, tables and statistical charts" (Bédard 2004). SOLAP facilitates the deliberate exploration of spatial data to help the user detect correlations in the information, potential groupings, trends hidden in a mass of spatially referenced data, etc. All of this is done by simple mouse selection or click (no SQL) and simple operations such as drill-down, roll-up and drill-across. It lets the user focus on the results of the operations rather than on analysing the navigation process. As SOLAP is expected to grow in the functions it offers, it becomes important to propose improvements to its user interface so as to preserve its ease of use. The development of an interactive, drillable legend was the first solution of this kind, proposed by Bédard (Bédard 1997). We therefore adopted this avenue for the present research: we studied graphic semiology and its applicability to multidimensional analysis, analysed what existed in related fields, explored different alternatives for solving the problem caused by the enrichment of navigation functions, built a prototype, gathered comments from SOLAP users and proposed a solution. Throughout this research, we faced an absence of literature dealing explicitly with the subject (SOLAP tools being too new), theoretical bodies of work that had to be adapted (semiology, human-computer interface, scientific visualisation, dynamic cartography) and a need for mock-ups and prototypes to illustrate the solutions envisaged. Finally, this research proposes one solution among several; however, its main interest lies more in the set of reflections and considerations put forward throughout the thesis to arrive at the proposed result than in the proposed solution itself. It is these theoretical and practical reflections that will make it possible to improve the user interface of any SOLAP tool through the new concept of an interactive, drillable legend.