224 research outputs found

    Towards the TopMost: A Topic Modeling System Toolkit

    Topic models have been proposed for decades with various applications and recently refreshed by the neural variational inference. However, these topic models adopt totally distinct dataset, implementation, and evaluation settings, which hinders their quick utilization and fair comparisons. This greatly hinders the research progress of topic models. To address these issues, in this paper we propose a Topic Modeling System Toolkit (TopMost). Compared to existing toolkits, TopMost stands out by covering a wider range of topic modeling scenarios including complete lifecycles with dataset pre-processing, model training, testing, and evaluations. The highly cohesive and decoupled modular design of TopMost enables quick utilization, fair comparisons, and flexible extensions of different topic models. This can facilitate the research and applications of topic models. Our code, tutorials, and documentation are available at https://github.com/bobxwu/topmost

    A Study on the Use of Ontologies to Represent Collective Knowledge

    The development of ontologies has become an area of considerable research interest over the past number of years. Domain ontologies are often developed to represent a shared understanding that in turn indicates cooperative effort by a user community. However, the structure and form that an ontology takes is predicated both on the approach of the developer and the cooperation of the user community. A shift has taken place in recent years from the use of highly specialised and expressive ontologies to simpler knowledge models, progressively developed by community contribution. It is within this context that this thesis investigates the use of ontologies as a means of representing collective knowledge. It investigates the impact of the community on the approach to and outcome of knowledge representation and compares the use of simple terminological ontologies with highly structured expressive ontologies in community-based narrative environments.

    Automatically Characterizing Product and Process Incentives in Collective Intelligence

    Social media facilitate interaction and information dissemination among an unprecedented number of participants. Why do users contribute, and why do they contribute to a specific venue? Does the information they receive cover all relevant points of view, or is it biased? The substantial and increasing importance of online communication makes these questions more pressing, but also puts answers within reach of automated methods. I investigate scalable algorithms for understanding two classes of incentives which arise in collective intelligence processes. Product incentives exist when contributors have a stake in the information delivered to other users. I investigate product-relevant user behavior changes, algorithms for characterizing the topics and points of view presented in peer-produced content, and the results of a field experiment with a prediction market framework having associated product incentives. Process incentives exist when users find contributing to be intrinsically rewarding. Algorithms which are aware of process incentives predict the effect of feedback on where users will make contributions, and can learn about the structure of a conversation by observing when users choose to participate in it. Learning from large-scale social interactions allows us to monitor the quality of information and the health of venues, but also provides fresh insights into human behavior

    Social media mining as an opportunistic citizen science model in ecological monitoring: a case study using invasive alien species in forest ecosystems.

    Major environmental, social and economic changes threatening the resilience of ecosystems worldwide and new demands on a broad range of forest ecosystem services present new challenges for forest management and monitoring. New risks and threats such as invasive alien species imply fundamental challenges for traditional forest management strategies, which have been based on assumptions of permanent ecosystem stability. Adaptive management and monitoring is called for to detect new threats and changes as early as possible, but this requires large-scale monitoring, and monitoring resources remain a limiting factor. Accordingly, forest practitioners and scientists have begun to turn to public support in the form of “citizen science” to react flexibly to specific challenges and gather critical information. The emergence of ubiquitous mobile and internet technologies provides a new digital source of information in the form of so-called social media that essentially turns users of these media into environmental sensors and provides an immense volume of publicly accessible, ambient environmental information. Mining social media content, such as Facebook, Twitter, wikis or blogs, has been shown to make critical contributions to epidemic disease monitoring, emergency management or earthquake detection. Applications in the ecological domain remain anecdotal and a methodical exploration for this domain is lacking. Using the example of the micro-blogging service Twitter and invasive alien species in forest ecosystems, this study provides a methodical exploration and assessment of social media for forest monitoring. Social media mining is approached as an opportunistic citizen science model and the data, activities and contributors are analyzed in comparison to deliberate ecological citizen science monitoring. The results show that Twitter is a valuable source of information on invasive alien species and that social media in general could be a supplement to traditional monitoring data. Twitter proves to be a rich source of primary biodiversity observations including those of the selected invasive species. In addition, it is shown that Twitter content provides distinctive thematic profiles that relate closely to key characteristics of the explored invasive alien species and provide valuable insights for invasive species management. Furthermore, the study shows that while there are underutilized opportunities for citizen science in forest monitoring, the contributors of biodiversity observations on Twitter show a more than casual interest in this subject and represent a large pool of potential contributors to deliberate citizen science monitoring efforts. In summary, social online media are a valuable source for ecological monitoring information in general and deserve intensified exploration to arrive at operational systems supporting real-time risk assessments.

    When in doubt ask the crowd : leveraging collective intelligence for improving event detection and machine learning

    [no abstract]

    A Data Mining Toolbox for Collaborative Writing Processes

    Collaborative writing (CW) is an essential skill in academia and industry. Providing support during the process of CW can be useful not only for achieving better quality documents, but also for improving the CW skills of the writers. In order to properly support collaborative writing, it is essential to understand how ideas and concepts are developed during the writing process, which consists of a series of steps of writing activities. These steps can be considered as sequence patterns comprising both time events and the semantics of the changes made during those steps. Two techniques can be combined to examine those patterns: process mining, which focuses on extracting process-related knowledge from event logs recorded by an information system; and semantic analysis, which focuses on extracting knowledge about what the student wrote or edited. This thesis contributes (i) techniques to automatically extract process models of collaborative writing processes and (ii) visualisations to describe aspects of collaborative writing. These two techniques form a data mining toolbox for collaborative writing by using process mining, probabilistic graphical models, and text mining. First, I created a framework, WriteProc, for investigating collaborative writing processes, integrated with the existing cloud computing writing tools in Google Docs. Secondly, I created a new heuristic to extract the semantic nature of text edits that occur in the document revisions and automatically identify the corresponding writing activities. Thirdly, based on sequences of writing activities, I propose methods to discover the writing process models and transitional state diagrams using a process mining algorithm, Heuristics Miner, and Hidden Markov Models, respectively. Finally, I designed three types of visualisations and made contributions to their underlying techniques for analysing writing processes.
All components of the toolbox are validated against annotated writing activities of real documents and a synthetic dataset. I also illustrate how the automatically discovered process models and visualisations are used in the process analysis with real documents written by groups of graduate students. I discuss how the analyses can be used to gain further insight into how students work and create their collaborative documents.

    Blogs as Infrastructure for Scholarly Communication.

    This project systematically analyzes digital humanities blogs as an infrastructure for scholarly communication. This exploratory research maps the discourses of a scholarly community to understand the infrastructural dynamics of blogs and the Open Web. The text contents of 106,804 individual blog posts from a corpus of 396 blogs were analyzed using a mix of computational and qualitative methods. Analysis uses an experimental methodology (trace ethnography) combined with unsupervised machine learning (topic modeling), to perform an interpretive analysis at scale. Methodological findings show topic modeling can be integrated with qualitative and interpretive analysis. Special attention must be paid to data fitness, or the shape and re-shaping practices involved with preparing data for machine learning algorithms. Quantitative analysis of computationally generated topics indicates that while the community writes about diverse subject matter, individual scholars focus their attention on only a couple of topics. Four categories of informal scholarly communication emerged from the qualitative analysis: quasi-academic, para-academic, meta-academic, and extra-academic. The quasi and para-academic categories represent discourse with scholarly value within the digital humanities community, but do not necessarily have an obvious path into formal publication and preservation. A conceptual model, the (in)visible college, is introduced for situating scholarly communication on blogs and the Open Web. An (in)visible college is a kind of scholarly communication that is informal, yet visible at scale. This combination of factors opens up a new space for the study of scholarly communities and communication. While (in)visible colleges are programmatically observable, care must be taken with any effort to count and measure knowledge work in these spaces.
This is the first systematic, data-driven analysis of the digital humanities and lays the groundwork for subsequent social studies of digital humanities. PhD, Information, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/111592/1/mcburton_1.pd

    Exploring, exploiting and evolving diversity of aquatic ecosystem models: A community perspective

    Here, we present a community perspective on how to explore, exploit and evolve the diversity in aquatic ecosystem models. These models play an important role in understanding the functioning of aquatic ecosystems, filling in observation gaps and developing effective strategies for water quality management. In this spirit, numerous models have been developed since the 1970s. We set off to explore model diversity by making an inventory among 42 aquatic ecosystem modellers, by categorizing the resulting set of models and by analysing them for diversity. We then focus on how to exploit model diversity by comparing and combining different aspects of existing models. Finally, we discuss how model diversity came about in the past and could evolve in the future. Throughout our study, we use analogies from biodiversity research to analyse and interpret model diversity. We recommend making models publicly available through open-source policies, standardizing documentation and technical implementation of models, and comparing models through ensemble modelling and interdisciplinary approaches. We end with our perspective on how the field of aquatic ecosystem modelling might develop in the next 5–10 years. To strive for clarity and to improve readability for non-modellers, we include a glossary.

    Information Systems Scholarship: An Examination of the Past, Present, and Future of the Information Systems Academic Discipline

    This dissertation investigates the topic of scholarship in the Information Systems (IS) discipline through a series of three papers. The papers, presented in Chapters 2, 3, and 4, each delve into a specific chronological period of IS scholarship which are delineated into the past, present, and future. Chapter 2 elucidates the IS discipline’s ‘past’ by categorizing the entire corpus of extant research in the Association of Information Systems Senior Scholars’ Basket of eight journals. Clusters derived from these mainstream journal publications represent a thematic identity of the IS discipline. After analyzing the corpus altogether, further analysis segments the corpus into shorter, 5-year periods to illuminate the historical evolution of the themes. Lastly, interpretations of the trends and a recommendation to curate an IS Body of Knowledge are discussed. Chapter 3 surveys business school deans and IS academics eliciting their ‘present’ social representations of the IS discipline. It then seeks the two groups’ feedback regarding their level of agreement with concerns attributed to the IS discipline as summarized in Ives and Adams (2012). Group responses are evaluated independently and are juxtaposed for between-group analysis. Then, additional concerns are gathered to ensure the full range of issues are represented. Network topic maps illustrate the findings, and interpretations are discussed. Group differences suggest that IS academics are more critical of the IS discipline than business school deans. In Chapter 4, an alternative research approach is offered for conducting ‘future’ scholarship efforts in the IS discipline. A framework that organizes discourse on the emergent crowdsourced research genre is constructed. Prior to building the framework, a crowdsourcing process model is developed to conceptualize how problems and outcomes interact with the crowdsourcing process. The internal process components include task, governance, people, and technology. 
Then, the crowdsourcing process model is applied to eight general research process phases, beginning with the idea generation phase and concluding with the apply results phase. Implementation of the crowdsourced research framework expounds phase-specific implications as well as other ubiquitous implications of the research process. The findings are discussed, and future directions for the IS crowd are suggested.

    What We Know About Wikipedia: A Review of the Literature Analyzing the Project(s).

    This article proposes a review of the literature analyzing Wikipedia as a collective system for producing knowledge. JEL Classification: L39, L86, H41, D7