15 research outputs found

    Document Clustering with Bursty Information

    Nowadays, almost all text corpora, such as blogs, emails and RSS feeds, are collections of text streams. The traditional vector space model (VSM), or bag-of-words representation, cannot capture the temporal aspect of these text streams. So far, only a few bursty features have been proposed to create text representations with temporal modeling for text streams. We propose bursty feature representations that perform better than VSM on various text mining tasks, such as document retrieval, topic modeling and text categorization. For text clustering, we propose a novel framework to generate a bursty distance measure. We evaluated it on the UPGMA, Star and K-Medoids clustering algorithms. The bursty distance measure not only performed equally well on various text collections, but was also able to cluster news articles related to specific events much better than other models.

    Mining the Characteristics of Jupyter Notebooks in Data Science Projects

    Nowadays, numerous industries have exceptional demand for skills in data science, such as data analysis, data mining, and machine learning. The computational notebook (e.g., Jupyter Notebook) is a well-known data science tool adopted in practice. Kaggle and GitHub are two platforms that data science communities use for knowledge sharing, skill practice, and collaboration. While tutorials and guidelines for novice data scientists are available on both platforms, few Jupyter Notebooks have received a high number of votes from the community. A high-voted notebook is considered to be well documented, easy to understand, and to apply the best data science and software engineering practices. In this research, we aim to understand the characteristics of high-voted Jupyter Notebooks on Kaggle and of popular Jupyter Notebooks for data science projects on GitHub. We plan to mine and analyse the Jupyter Notebooks on both platforms. We will perform exploratory analytics, data visualization, and feature importance analysis to understand the overall structure of these notebooks and to identify common patterns and best-practice features separating the low-voted and high-voted notebooks. Upon completion of this research, the discovered insights can be applied as training guidelines for aspiring data scientists and machine learning practitioners looking to improve their performance, from a novice-ranked Jupyter Notebook on Kaggle to a deployable project on GitHub.

    Modeling biological systems using Dynetica—a simulator of dynamic networks

    We present Dynetica, a user-friendly simulator of dynamic networks for constructing, visualizing, and analyzing kinetic models of biological systems. In addition to generic reaction networks, Dynetica facilitates the construction of models of genetic networks, where many reactions are gene expression and interactions among gene products. Further, it integrates the capability of conducting both deterministic and stochastic simulations.

    An Evolution of Computer Science Research

    Over the past two decades, Computer Science has continued to grow as a research field. The most popular research topics moved from artificial intelligence, the Internet, and parallel systems to cognitive science, social networks, and cloud computing. There are several articles that examine trends and emerging topics in Computer Science research or the impact of a paper on the field. In contrast, in this paper, we take a closer look at the entire field in the past two decades by analyzing data on Computer Science publications in IEEE Xplore and the ACM Digital Library, and on proposals for grants awarded by the National Science Foundation (NSF). We identified trends, bursty topics and relations between the NSF data and the other datasets. We found that a burst in a topic often led to an increase in funding in the corresponding area. Moreover, on average, grant money has been a factor in maintaining a high level of interest in that topic. In the last five years, the “cloud computing” and “social network” topics have had the highest positive trends. Interestingly, a two-year increase or decline in the number of publications is always reversed in the following year. We also analyzed Computer Science researchers and their communities. We found that a typical community has 5-6 members and that it changes continuously. After two years, only one or two core people in the initial research group remain. Nearly half of the time, authors publish their work in a particular research area for only a year. Only a handful of authors publish their work in the same research area for a long period of time.