1,071 research outputs found

Data mining techniques using decision tree model in materialised projection and selection view

Author: Teh Y. W.
Publication venue: Universitat Politècnica de Catalunya. Secció de Matemàtiques i Informàtica
Publication date: 01/01/2004

With the availability of very large data storage today, redundant data structures are no longer a big issue. However, an intelligent way of managing materialised projection and selection views that can lead to fast access of data is the central issue dealt with in this paper. A set of implementation steps for data warehouse administrators or decision makers to improve the response time of queries is also defined. The study concludes that both attributes and tuples are important factors to be considered to improve the response time of a query. The adoption of data mining techniques in the physical design of data warehouses has been shown to be useful in practice.

UPCommons. Portal del coneixement obert de la UPC

Thompson sampling for species discovery

Author: Battiston M.
Favaro Stefano
Teh Y. W.
Publication venue: 'Institute of Electronics, Information and Communications Engineers (IEICE)'
Publication date: 01/01/2016

Institutional Research Information System University of Turin

On a class of smoothed Good–Turing estimators

Author: Favaro Stefano
Nipoti Bernardo
Teh Y. W.
Publication venue: 'Institute of Electronics, Information and Communications Engineers (IEICE)'
Publication date: 01/01/2015

Institutional Research Information System University of Turin

On a class of sigma-stable Poisson-Kingman models and an effective marginalized sampler

Author: Favaro Stefano
Lomeli M.
Teh Y. W.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Institutional Research Information System University of Turin

Bayesian inference on population structure: from parametric to nonparametric modeling

Author: De Iorio M.
Favaro Stefano
Teh Y. W.
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2015

Institutional Research Information System University of Turin

Stick-breaking representations of 1/2-stable Poisson-Kingman models

Author: Favaro Stefano
Lomeli M.
Nipoti Bernardo
Teh Y. W.
Publication venue: 'Institute of Electronics, Information and Communications Engineers (IEICE)'
Publication date: 01/01/2013

Institutional Research Information System University of Turin

On Bayesian nonparametric inference for discovery probabilities

Author: Arbel J.
Favaro Stefano
Nipoti N.
Teh Y. W.
Publication venue: 'Institute of Electronics, Information and Communications Engineers (IEICE)'
Publication date: 01/01/2016

Institutional Research Information System University of Turin

Overcoming data scarcity of Twitter: using tweets as bootstrap with application to autism-related topic content analysis

Author: Agarwal A.
Autism
Blei D.
Bollen J.
Chang J.
Danial J. T.
Harrington J. W.
Harshavardhan A.
Higashida N.
Himelboim I.
Hutchings C.
Hviid A.
Ishwaran H.
Jacobson J. W.
Jashinsky J.
Jiang L.
Paul M. J.
Robinson B.
Russell M. A.
Scanfeld D.
Teh Y. W.
Trembath D.
Verma S.
Warren Z.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015

Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140-character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular, we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and, treating each web page as a document linked to the original tweet, show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets, we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags. Comment: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 201

arXiv.org e-Print Archive

Deakin Research Online

Crossref

Learning nuanced cross-disciplinary citation metric normalization using the hierarchical Dirichlet process on big scholarly data

Author: Andrei V.
Blog Google Scholar
Chang J.
Garfield E.
Teh Y. W.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017

Citation counts have long been used in academia as a way of measuring, inter alia, the importance of journals, quantifying the significance and the impact of a researcher's body of work, and allocating funding for individuals and departments. For example, the h-index proposed by Hirsch is one of the most popular metrics that utilizes citation analysis to determine an individual's research impact. Among many issues, one of the pitfalls of citation metrics is the unfairness which emerges when comparisons are made between researchers in different fields. The algorithm we describe in the present paper learns evidence-based, nuanced, and probabilistic representations of academic fields, and uses data collected by crawling Google Scholar to perform field-of-study-based normalization of citation-based impact metrics such as the h-index.

Crossref

University of St. Andrews - Pure

St Andrews Research Repository