4 research outputs found

    Trend Topic Analysis using Latent Dirichlet Allocation (LDA) (Study Case: Denpasar People’s Complaints Online Website)

    According to a 2017 publication of the Central Bureau of Statistics, the population of Denpasar has increased to 914,300 people. The growing population raises various problems that the Denpasar government must face. The variety of problems is in line with the increase in complaint data posted through the Denpasar people’s complaints online website, which has made it difficult to identify the main topics of the problems. The purpose of this research is to find the main topics of Denpasar residents’ complaints quickly and efficiently. The method used to achieve this objective is the Latent Dirichlet Allocation topic model with Gibbs sampling parameter estimation. The number of topics was chosen by the highest log-likelihood value, -42,528.84, which occurred at 19 topics. The trending topic was identified by the highest topic probability: topic 4, with a topic probability of 0.055. This result can be interpreted to mean that many residents of Denpasar complained about damaged roads and requested that the roads be repaired.

    Topic modeling in marketing: recent advances and research opportunities

    Using a probabilistic approach for exploring latent patterns in high-dimensional co-occurrence data, topic models offer researchers a flexible and open framework for soft-clustering large data sets. In recent years, there has been a growing interest among marketing scholars and practitioners to adopt topic models in various marketing application domains. However, to this date, there is no comprehensive overview of this rapidly evolving field. By analyzing a set of 61 published papers along with conceptual contributions, we systematically review this highly heterogeneous area of research. In doing so, we characterize extant contributions employing topic models in marketing along the dimensions data structures and retrieval of input data, implementation and extensions of basic topic models, and model performance evaluation. Our findings confirm that there is considerable progress done in various marketing sub-areas. However, there is still scope for promising future research, in particular with respect to integrating multiple, dynamic data sources, including time-varying covariates and the combination of exploratory topic models with powerful predictive marketing models

    Flexible Regularized Estimation in High-Dimensional Mixed Membership Models

    Mixed membership models are an extension of finite mixture models, where each observation can partially belong to more than one mixture component. A probabilistic framework for mixed membership models of high-dimensional continuous data is proposed with a focus on scalability and interpretability. The novel probabilistic representation of mixed membership is based on convex combinations of dependent multivariate Gaussian random vectors. In this setting, scalability is ensured through approximations of a tensor covariance structure through multivariate eigen-approximations with adaptive regularization imposed through shrinkage priors. Conditional weak posterior consistency is established on an unconstrained model, allowing for a simple posterior sampling scheme while keeping many of the desired theoretical properties of our model. The model is motivated by two biomedical case studies: a case study on functional brain imaging of children with autism spectrum disorder (ASD) and a case study on gene expression data from breast cancer tissue. These applications highlight how the typical assumption made in cluster analysis, that each observation comes from one homogeneous subgroup, may often be restrictive in several applications, leading to unnatural interpretations of data features. Comment: arXiv admin note: text overlap with arXiv:2206.1208

    Development of statistical methodologies applied to anthropometric data oriented towards the ergonomic design of products

    Ergonomics is the scientific discipline that studies the interactions between human beings and the elements of a system. It has multiple applications in areas such as clothing and footwear design and both working and household environments. In each of these sectors, knowing the anthropometric dimensions of the current target population is fundamental to ensuring that products suit most of the users who make up the population as well as possible. Anthropometry refers to the study of the measurements and dimensions of the human body, and it is considered a very important branch of Ergonomics because of its considerable influence on the ergonomic design of products. Human body measurements have usually been taken using rulers, calipers or measuring tapes. These procedures are simple and cheap to carry out. However, they have one major drawback: the body measurements obtained, and consequently the human shape information, are imprecise and inaccurate. Furthermore, they always require interaction with real subjects, which increases measurement and data collection time. The development of new three-dimensional (3D) scanning techniques has represented a huge step forward in obtaining anthropometric data. This technology allows 3D images of the human shape to be captured and, at the same time, generates highly detailed and reproducible anthropometric measurements. The great potential of these new scanning systems for digitizing the human body has helped promote new anthropometric studies in several countries, such as the United Kingdom, Australia, Germany, France and the USA, in order to acquire accurate anthropometric data on their current populations. In this context, in 2006 the Spanish Ministry of Health commissioned a 3D anthropometric survey of the Spanish female population, following the agreement signed by the Ministry itself with the Spanish associations and companies of the manufacturing, distribution, fashion design and knitwear sectors.
A sample of 10415 Spanish females from 12 to 70 years old, randomly selected from the official Postcode Address File, was measured. The two main objectives of this study, which was conducted by the Biomechanics Institute of Valencia, were the following: on the one hand, to characterize the shape and body dimensions of the current Spanish women population to develop a standard sizing system that could be used by all clothing designers. On the other hand, to promote a healthy image of beauty through the representation of suited mannequins. In order to tackle both objectives, Statistics plays an essential role. Thus, the statistical methodologies presented in this PhD work have been applied to the database obtained from the Spanish anthropometric study. Clothing sizing systems classify the population into homogeneous groups (size groups) based on some key anthropometric dimensions. All members of the same group are similar in body shape and size, so they can wear the same garment. In addition, members of different groups are very different with respect to their body dimensions. An efficient and optimal sizing system aims at accommodating as large a percentage of the population as possible, in the optimum number of size groups that better describes the shape variability of the population. Besides, the garment fit for the accommodated individuals must be as good as possible. A very valuable reference related to sizing systems is the book Sizing in clothing: Developing effective sizing systems for ready-to-wear clothing, by Susan Ashdown. Each clothing size is defined from a person whose body measurements are located toward the central value for each of the dimensions considered in the analysis. The central person, which is considered as the size representative (the size prototype), becomes the basic pattern from which the clothing line in the same size is designed. 
Clustering is the statistical tool that divides a set of individuals into groups (clusters), in such a way that subjects in the same cluster are more similar to each other than to those in other groups. In addition, clustering defines each group by means of a representative individual. The idea of using clustering to define an efficient sizing system therefore arises naturally. Specifically, four of the methodologies presented in this PhD thesis, all aimed at segmenting the population into optimal sizes, use different clustering methods. The first one, called trimowa, has been published in Expert Systems with Applications. It is based on a specially defined distance to examine differences between women regarding their body measurements. The second and third ones (called biclustAnthropom and TDDclust, respectively) will soon be submitted together in a single paper. BiclustAnthropom adapts to the field of Anthropometry a clustering method designed for the specific case of gene expression data. TDDclust uses the concept of statistical depth to group individuals according to the most central (deepest) observation in each size. As mentioned, current sizing systems are based on an appropriate set of anthropometric dimensions, so clustering is carried out in Euclidean space. The three previous proposals all work in this way. By contrast, the fourth and last approach, called kmeansProcrustes, proposes a clustering procedure that groups women according to their body shape, which is represented by a set of anatomical markers (landmarks). For this purpose, statistical shape analysis is fundamental. This contribution has been submitted for publication. A sizing system is intended to cover the so-called standard population, discarding the individuals with extreme sizes (both large and small). In mathematical language, these individuals can be considered outliers.
An outlier is an observation point that is distant from other observations. In our case, a person with extreme anthropometric measurements would be considered a statistical outlier. Clothing companies usually design garments for the standard sizes so that their market share is optimal. Nevertheless, with their foreign expansion, many brands are broadening their collections and already have a special sizes section. In recent years, Internet shopping has been an alternative for consumers with extreme sizes looking for clothes that follow trends. Custom-made fabrication is another possibility, with the advantage of making garments according to the customers' preferences. The four aforementioned methodologies (trimowa, biclustAnthropom, TDDclust and kmeansProcrustes) have been adapted to accommodate only the standard population. Once a particular garment has been designed, the assessment and analysis of fit is performed using one or more fit models. The fit model represents the body dimensions selected by each company to define the proportional relationships needed to achieve the fit the company has determined. The definition of an efficient sizing system relies heavily on the accuracy and representativeness of the fit models with respect to the population to which it is addressed. In this PhD work, a statistical approach is proposed to identify representative fit models. It is based on another clustering method originally developed for grouping gene expression data. This method, called hipamAnthropom, has been published in Decision Support Systems. From well-defined fit models and prototypes, representative and accurate mannequins of the population can be made.
Unlike clothing design, where representative cases correspond to central individuals, in the design of working and household environments the variability of human shape is described by extreme individuals, which are those that have the largest or smallest values (or extreme combinations) in the dimensions involved in the study. This is often referred to as the accommodation problem. A very interesting reference in this area is the book entitled Guidelines for Using Anthropometric Data in Product Design, published by The Human Factors and Ergonomics Society. The idea behind this way of proceeding is that if a product fits extreme observations, it will also fit the other, less extreme ones. To that end, in this PhD thesis we propose two methodological contributions based on statistical archetypal analysis. An archetype in Statistics is an extreme individual that is obtained as a convex combination of other subjects of the sample. The first of these methodologies has been published in Computers and Industrial Engineering, whereas the second one has been submitted for publication. The outline of this PhD report is as follows: Chapter 1 reviews the state of the art of Ergonomics and Anthropometry and introduces the anthropometric survey of the Spanish female population. Chapter 2 presents the trimowa, biclustAnthropom and hipamAnthropom methodologies. In Chapter 3 the kmeansProcrustes proposal is detailed. The TDDclust methodology is explained in Chapter 4. Chapter 5 presents the two methodologies related to archetypal analysis. Since all these contributions have been programmed in the statistical software R, Chapter 6 presents the Anthropometry R package, which brings together all the algorithms associated with each approach. In this way, Chapters 2 to 6 present all the methodologies and results included in this PhD thesis. Finally, Chapter 7 provides the most important conclusions.