64 research outputs found

    Unsupervised Terminological Ontology Learning based on Hierarchical Topic Modeling

    In this paper, we present hierarchical relation-based latent Dirichlet allocation (hrLDA), a data-driven hierarchical topic model for extracting terminological ontologies from a large number of heterogeneous documents. In contrast to traditional topic models, hrLDA relies on noun phrases instead of unigrams, considers syntax and document structures, and enriches topic hierarchies with topic relations. Through a series of experiments, we demonstrate the superiority of hrLDA over existing topic models, especially for building hierarchies. Furthermore, we illustrate the robustness of hrLDA on noisy data sets, which are likely to occur in many practical scenarios. Our ontology evaluation results show that ontologies extracted from hrLDA are very competitive with the ontologies created by domain experts.

    Toward Effective Knowledge Discovery in Social Media Streams

    The last few decades have seen an unprecedented growth in the amount of new data. New computing and communications resources, such as cloud data platforms and mobile devices, have enabled individuals to contribute new ideas, share points of view and exchange newsworthy bits with each other at a previously unfathomable rate. While there are many ways a modern person can communicate digitally with others, social media outlets, such as Twitter or Facebook, have been occupying much of the focus of inter-person social networking in recent years. The millions of pieces of content published on social media sites have been both a blessing and a curse for those trying to make sense of the discourse. On one hand, the sheer amount of easily available, real-time, contextually relevant content has been a cause of much excitement in academia and industry. On the other hand, the amount of new, diverse content that is continuously published on social sites makes it difficult for researchers and industry participants to grasp effectively. Therefore, the goal of this thesis is to discover a set of approaches and techniques that would help data miners quickly develop intuitions regarding the happenings in the social media space. To that aim, I concentrate on effectively visualizing social media streams as hierarchical structures, as such structures have been shown to be useful in human sense making. Ph.D., Information Studies -- Drexel University, 201

    Future Intelligent Systems and Networks 2019

    In this Special Issue, we present current developments and future directions of intelligent systems and networks. This is the second Special Issue regarding the future of the Internet. This subject remains of interest for firms applying technological possibilities to promote more innovative business models. This Special Issue widens the application of intelligent systems and networks to firms so that they can evolve to more innovative models. The five contributions highlight useful applications, business models, or innovative practices based on intelligent systems and networks. We hope our findings become an inspiration for firms operating in various industries.

    Mining diverse consumer preferences for bundling and recommendation


    Experimental Evaluation of Representation Models for Content Recommendation in Microblogging Services

    Micro-blogging services constitute a popular means of real-time communication and information sharing. Twitter is the most popular of these services, with 300 million monthly active user accounts and 500 million tweets posted daily at the moment. Consequently, Twitter users suffer from an information deluge, and a large number of recommendation methods have been proposed to re-rank the tweets in a user's timeline according to her interests. We focus on techniques that build a textual model for every individual user to capture her tastes and then rank the tweets she receives according to their similarity with that model. In the literature, there is no comprehensive evaluation of these user modeling strategies as yet. To cover this gap, in this thesis we systematically examine, on a real Twitter dataset, 9 state-of-the-art methods for modeling a user's preferences using exclusively textual information. Our goal is to identify the best performing user model with respect to several criteria: (i) the source of tweet information available for modeling, (ii) the user type, as determined by the relation between the tweeting frequency of a user and the frequency of her received tweets, (iii) the characteristics of its functionality, as derived from a novel taxonomy, and (iv) its robustness with respect to its internal configurations, as deduced by assessing a wide range of plausible values for internal parameters. Our results can be used for fine-tuning and interpreting text user models in a recommendation scenario in microblogging services and could serve as a starting point for further enhancing the most effective user model with additional contextual information.

    PHONOTACTIC AND ACOUSTIC LANGUAGE RECOGNITION

    This thesis deals with phonotactic and acoustic techniques for automatic language recognition (LRE). The first part of the thesis deals with phonotactic language recognition based on co-occurrences of phone sequences in speech. A thorough study of phone recognition as a tokenization technique for LRE is done, with focus on the amounts of training data for the phone recognizer and on the combination of phone recognizers trained on several languages (Parallel Phone Recognition followed by Language Model - PPRLM). The thesis also deals with a novel technique of anti-models in PPRLM and investigates using phone lattices instead of strings. The work on the phonotactic approach is concluded by a comparison of classical n-gram modeling techniques and binary decision trees. The acoustic LRE was addressed too, with the main focus on discriminative techniques for training target language acoustic models and on initial (but successful) experiments with removing channel dependencies. We have also investigated the fusion of phonotactic and acoustic approaches. All experiments were performed on standard data from NIST 2003, 2005 and 2007 evaluations so that the results are directly comparable to other laboratories in the LRE community. With the above-mentioned techniques, the fused systems defined the state-of-the-art in the LRE field and reached excellent results in NIST evaluations.

    TOPIC MODELLING METHODOLOGY: ITS USE IN INFORMATION SYSTEMS AND OTHER MANAGERIAL DISCIPLINES

    Over the last decade, quantitative text mining approaches to content analysis have gained increasing traction within information systems research, and related fields, such as business administration. Recently, topic models, which are supposed to provide their user with an overview of themes being discussed in documents, have gained popularity. However, while convenient tools for the creation of this model class exist, the evaluation of topic models poses significant challenges to their users. In this research, we investigate how questions of model validity and trustworthiness of presented analyses are addressed across disciplines. We accomplish this by providing a structured review of methodological approaches across the Financial Times 50 journal ranking. We identify 59 methodological research papers, 24 implementations of topic models, as well as 33 research papers using topic models in Information Systems (IS) research, and 29 papers using such models in other managerial disciplines. Results indicate a need for model implementations usable by a wider audience, as well as the need for more implementations of model validation techniques, and the need for a discussion about the theoretical foundations of topic modelling based research.