Document Clustering with Bursty Information
Nowadays, almost all text corpora, such as blogs, emails and RSS feeds, are collections of text streams. The traditional vector space model (VSM), or bag-of-words representation, cannot capture the temporal aspect of these text streams. So far, only a few bursty features have been proposed to create text representations with temporal modeling for text streams. We propose bursty feature representations that perform better than VSM on various text mining tasks, such as document retrieval, topic modeling and text categorization. For text clustering, we propose a novel framework to generate a bursty distance measure. We evaluated it on the UPGMA, Star and K-Medoids clustering algorithms. The bursty distance measure not only performed equally well on various text collections, but was also able to cluster news articles related to specific events much better than other models.
Mining the Characteristics of Jupyter Notebooks in Data Science Projects
Nowadays, numerous industries have exceptional demand for skills in data
science, such as data analysis, data mining, and machine learning. The
computational notebook (e.g., Jupyter Notebook) is a well-known data science
tool adopted in practice. Kaggle and GitHub are two platforms that data
science communities use for knowledge sharing, skill practice, and
collaboration. While tutorials and guidelines for novice data scientists are
available on both platforms, only a small number of Jupyter Notebooks have
received high numbers of votes from the community. A high-voted notebook is
considered well documented, easy to understand, and consistent with best data
science and software engineering practices. In this research, we aim to
understand the characteristics of high-voted Jupyter Notebooks on Kaggle and
the popular Jupyter Notebooks for data science projects on GitHub. We plan to
mine and analyse the Jupyter Notebooks on both platforms. We will perform
exploratory analysis, data visualization, and feature importance analysis to
understand the overall structure of these notebooks and to identify common
patterns and best-practice features separating the low-voted and high-voted
notebooks. Upon the completion of this research, the discovered insights can be
applied as training guidelines for aspiring data scientists and machine
learning practitioners looking to improve their performance, from a
novice-ranked Jupyter Notebook on Kaggle to a deployable project on GitHub.
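As a rough illustration of the kind of structural mining described in this abstract, the sketch below extracts a few simple features from a notebook file. It is not the authors' pipeline: the function names and the chosen features (cell counts, non-blank code lines, markdown ratio) are assumptions made for this example, and it relies only on the standard notebook JSON layout, a top-level `cells` list whose entries carry `cell_type` and `source`.

```python
import json


def load_notebook(path):
    """Read a .ipynb file, which is plain JSON, into a dict (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def notebook_features(nb):
    """Compute simple structural features from a parsed notebook dict."""
    n_code = 0
    n_markdown = 0
    code_lines = 0
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # The notebook format allows source as one string or a list of strings.
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "code":
            n_code += 1
            # Count only non-blank lines of code.
            code_lines += sum(1 for line in text.splitlines() if line.strip())
        elif cell.get("cell_type") == "markdown":
            n_markdown += 1
    total = n_code + n_markdown
    return {
        "code_cells": n_code,
        "markdown_cells": n_markdown,
        "code_lines": code_lines,
        "markdown_ratio": n_markdown / total if total else 0.0,
    }


# A tiny in-memory notebook, used here instead of a real file.
example_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": "# Title"},
        {"cell_type": "code", "source": "import pandas as pd\n\ndf = pd.DataFrame()"},
        {"cell_type": "code", "source": ["x = 1\n", "y = 2\n"]},
    ]
}
features = notebook_features(example_notebook)
```

Features of this kind could then be compared between low-voted and high-voted notebooks; which features actually separate the two groups is the question the study sets out to answer.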
Modeling biological systems using Dynetica—a simulator of dynamic networks
We present Dynetica, a user-friendly simulator of dynamic networks for constructing, visualizing, and analyzing kinetic models of biological systems. In addition to generic reaction networks, Dynetica facilitates construction of models of genetic networks, where many reactions are gene expression and interactions among gene products. Further, it integrates the capability of conducting both deterministic and stochastic simulations.
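To make the distinction between deterministic and stochastic simulation concrete, here is a minimal sketch for a single gene product that is produced at a constant rate and degraded in proportion to its copy number. It does not use or reproduce Dynetica: the model, the parameter names `k` and `g`, the forward Euler integrator, and the Gillespie direct method are all assumptions chosen for illustration.

```python
import random


def simulate_deterministic(k, g, x0, t_end, dt=0.01):
    """Integrate dx/dt = k - g*x with forward Euler and return x at t_end."""
    x = float(x0)
    for _ in range(int(round(t_end / dt))):
        x += (k - g * x) * dt
    return x


def simulate_stochastic(k, g, x0, t_end, rng):
    """Gillespie direct method for production (rate k) and degradation (rate g*x)."""
    x = int(x0)
    t = 0.0
    while True:
        a_prod = k          # propensity of producing one molecule
        a_deg = g * x       # propensity of degrading one molecule
        a_total = a_prod + a_deg
        if a_total <= 0:
            return x        # no reaction can fire
        t += rng.expovariate(a_total)  # waiting time to the next reaction
        if t > t_end:
            return x
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1


# Example run: both simulations start from zero molecules.
deterministic_value = simulate_deterministic(k=10.0, g=0.1, x0=0, t_end=100.0)
stochastic_value = simulate_stochastic(k=10.0, g=0.1, x0=0, t_end=100.0, rng=random.Random(0))
```

The deterministic run approaches the steady state `k / g`, while each stochastic run fluctuates around that level, which is why the two modes can give different pictures when molecule counts are small.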
An Evolution of Computer Science Research
Over the past two decades, Computer Science has continued to grow as a research field. The most popular research topics moved from artificial intelligence, the Internet, and parallel systems to cognitive science, social networks, and cloud computing. Several articles examine trends and emerging topics in Computer Science research or the impact of a paper on the field. In contrast, in this paper, we take a closer look at the entire field over the past two decades by analyzing data on Computer Science publications in IEEE Xplore and the ACM Digital Library, and on proposals for grants awarded by the National Science Foundation (NSF). We identified trends, bursty topics and relations between the NSF data and the other datasets. We found that a burst in a topic often led to an increase in funding in the corresponding area. Moreover, on average, grant money has been a factor in maintaining a high level of interest in that topic. In the last five years, the “cloud computing” and “social network” topics have had the highest positive trends. Interestingly, a two-year increase or decline in the number of publications is always reversed in the following year. We also analyzed Computer Science researchers and their communities. We found that a typical community has 5-6 members and changes continuously. After two years, only one or two core people from the initial research group remain. Nearly half of the time, authors publish their work in a particular research area for only a year. Only a handful of authors publish their work in the same research area for a long period of time.