Data mining techniques using decision tree model in materialised projection and selection view
With the availability of very large data storage today, redundant data
structures are no longer a big issue. However, an intelligent way of managing
materialised projection and selection views that can lead to fast access of
data is the central issue dealt with in this paper. A set of implementation
steps for the data warehouse administrators or decision makers to improve
the response time of queries is also defined. The study concludes that both
attributes and tuples are important factors to be considered to improve the
response time of a query. The adoption of data mining techniques in the
physical design of data warehouses has been shown to be useful in practice.
Overcoming data scarcity of Twitter: using tweets as bootstrap with application to autism-related topic content analysis
Notwithstanding recent work which has demonstrated the potential of using
Twitter messages for content-specific data mining and analysis, the depth of
such analysis is inherently limited by the scarcity of data imposed by the
140-character tweet limit. In this paper we describe a novel approach for targeted
knowledge exploration which uses tweet content analysis as a preliminary step.
This step is used to bootstrap more sophisticated data collection from directly
related but much richer content sources. In particular we demonstrate that
valuable information can be collected by following URLs included in tweets. We
automatically extract content from the corresponding web pages and, treating
each web page as a document linked to the original tweet, show how a temporal
topic model based on a hierarchical Dirichlet process can be used to track the
evolution of a complex topic structure of a Twitter community. Using
autism-related tweets we demonstrate that our method is capable of capturing a
much more meaningful picture of information exchange than user-chosen hashtags.
Comment: IEEE/ACM International Conference on Advances in Social Networks
Analysis and Mining, 201
Learning nuanced cross-disciplinary citation metric normalization using the hierarchical Dirichlet process on big scholarly data
Citation counts have long been used in academia as a way of measuring, inter alia, the importance of journals, quantifying the significance and the impact of a researcher's body of work, and allocating funding for individuals and departments. For example, the h-index proposed by Hirsch is one of the most popular metrics that utilizes citation analysis to determine an individual's research impact. Among many issues, one of the pitfalls of citation metrics is the unfairness which emerges when comparisons are made between researchers in different fields. The algorithm we describe in the present paper learns evidence-based, nuanced, and probabilistic representations of academic fields, and uses data collected by crawling Google Scholar to perform field-of-study-based normalization of citation-based impact metrics such as the h-index.
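For readers unfamiliar with the metric named in this abstract, the following is a minimal sketch of the standard h-index definition: the largest h such that h of a researcher's papers each have at least h citations. It does not reproduce the paper's normalization method; the function name `h_index` and the sample citation counts are illustrative assumptions.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each.

    `citations` is a list of per-paper citation counts (hypothetical input).
    """
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The top `rank` papers all have at least `rank` citations.
            h = rank
        else:
            # Counts only decrease from here, so no larger h is possible.
            break
    return h


if __name__ == "__main__":
    # Made-up citation counts for illustration only.
    print(h_index([10, 8, 5, 4, 3]))
```

Because typical citation counts differ widely between fields, the same raw h-index can reflect very different standing in different fields, which is the comparison problem the abstract raises.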