163 research outputs found

    Exploratory Multivariate Statistical Methods Applied to Pharmaceutical Industry CRM Data

    Dissertation presented as a partial requirement for obtaining the degree of Master in Statistics and Information Management. This study analyzes current CRM systems in the pharmaceutical industry, how pharmaceutical companies developed them, and how Europe compares with the United States. Overall, CRM in the pharmaceutical industry lags far behind other business areas such as consumer goods, finance (banking), and insurance, and pharmaceutical CRM is less developed in Europe than in the United States. One of the major obstacles to the success of CRM in the pharmaceutical industry is the poor analytics applied to current CRM programs. The main goal of this thesis was to improve sales and marketing effectiveness by applying multivariate exploratory statistical methods, specifically Factor Analysis and Clustering, to CRM data from a Portuguese pharmaceutical company. Their overall usefulness to the business was demonstrated; among the clustering methods, SOMs outperformed the hierarchical methods by producing a more meaningful business solution.

    From insights to innovations : data mining, visualization, and user interfaces

    This thesis is about data mining (DM) and visualization methods for gaining insight into multidimensional data. Novel exploratory data analysis tools and adaptive user interfaces are developed by tailoring and combining existing DM and visualization methods to advance different applications. The thesis presents new visual data mining (VDM) methods that are also implemented in software toolboxes and applied to industrial and biomedical signals. First, we propose a method that has been applied to investigating industrial process data. The self-organizing map (SOM) is combined with scatterplots using traditional color linking or interactive brushing. The original contribution is to apply color-linked or brushed scatterplots and the SOM to visually survey local dependencies between a pair of attributes in different parts of the SOM. Clusters can be visualized on a SOM with different colors, and we also present how a color coding can be obtained automatically by using a proximity-preserving projection of the SOM model vectors. Second, we present a new method for an (interactive) visualization of cluster structures in a SOM. By using a contraction model, the regular grid of a SOM visualization is smoothly changed toward a presentation that better shows the proximities in the data space. Third, we propose a novel VDM method for investigating the reliability of estimates resulting from a stochastic independent component analysis (ICA) algorithm. The method can also be extended to other problems of a similar kind. As a benchmarking task, we rank independent components estimated on a biomedical data set recorded from the brain and obtain a reasonable result. We also utilize DM and visualization for mobile-awareness and personalization. We explore how to infer information about the usage context from features that are derived from sensory signals. The signals originate from a mobile phone with on-board sensors for ambient physical conditions. 
In previous studies, the signals are transformed into descriptive (fuzzy or binary) context features. In this thesis, we present how the features can be transformed into higher-level patterns, contexts, by rather simple statistical methods: we propose and test using minimum-variance cost time series segmentation, ICA, and principal component analysis (PCA) for this purpose. Both time-series segmentation and PCA revealed meaningful contexts from the features in a visual data exploration. We also present a novel type of adaptive soft keyboard where the aim is to obtain an ergonomically better, more comfortable keyboard. The method starts from some conventional keypad layout, but it gradually shifts the keys into new positions according to the user's grasp and typing pattern. Related to the applications, we present two algorithms that can be used in a general context. First, we describe a binary mixing model for independent binary sources. The model resembles the ordinary ICA model, but the summation is replaced by the Boolean operator OR and the multiplication by AND. We propose a new, heuristic method for estimating the binary mixing matrix and analyze its performance experimentally. The method works for signals that are sparse enough. We also discuss differences in the results when using different objective functions in the FastICA estimation algorithm. Second, we propose "global iterative replacement" (GIR), a novel, greedy variant of a merge-split segmentation method. Its performance compares favorably to that of the traditional top-down binary split segmentation algorithm.
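The binary mixing model described in this abstract (summation replaced by OR, multiplication by AND) can be illustrated with a minimal sketch. This shows only the forward mixing step, not the thesis's heuristic estimation method; the function name `boolean_mix` and the example matrices are hypothetical and chosen purely for illustration.

```python
def boolean_mix(mixing, sources):
    """Mix binary sources so that x[i][t] = OR over j of (mixing[i][j] AND sources[j][t]).

    mixing  -- list of rows, one per observed signal, one 0/1 entry per source
    sources -- list of binary source signals, all of the same length
    """
    n_samples = len(sources[0])
    mixed = []
    for row in mixing:
        mixed.append([
            # OR over all sources of (mixing coefficient AND source value)
            int(any(a and s[t] for a, s in zip(row, sources)))
            for t in range(n_samples)
        ])
    return mixed


# Hypothetical example: two sparse binary sources, two observed signals.
mixing = [[1, 0],
          [1, 1]]
sources = [[1, 0, 0, 1],
           [0, 0, 1, 1]]
observed = boolean_mix(mixing, sources)
print(observed)
```

With sparse sources, active samples rarely overlap, so each observed signal mostly shows one source at a time; this is one plausible reading of why the abstract restricts the estimation method to signals that are sparse enough.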

    Product family design based on a design reuse model

    Ph.D., Doctor of Philosophy

    Acoustic data optimisation for seabed mapping with visual and computational data mining

    Oceans cover 70% of Earth’s surface, but little is known about their waters. While the echosounders often used to explore our oceans have developed at a tremendous rate since WWII, the methods used to analyse and interpret the data remain the same. These methods are inefficient, time-consuming, and often costly when dealing with the large data volumes that modern echosounders produce. This PhD project examines the complexity of the de facto seabed mapping technique by exploring and analysing acoustic data with a combination of data mining and visual analytic methods. First, we tested the redundancy issues in multibeam echosounder (MBES) data by using the component plane visualisation of a Self Organising Map (SOM). A total of 16 visual groups were identified among the 132 statistical data descriptors. The optimised MBES dataset had 35 attributes from 16 visual groups and represented a 73% reduction in data dimensionality. A combined Principal Component Analysis (PCA) + k-means was used to cluster both datasets. The cluster results were visually compared as well as internally validated using four different internal validation methods. Next, we tested two novel approaches in singlebeam echosounder (SBES) data processing and clustering: using visual exploration for outlier detection, and direct clustering of time series echo returns. Visual exploration identified further outliers that the automatic procedure was not able to find. The SBES data were then clustered directly. The internal validation indices suggested the optimal number of clusters to be three. This is consistent with the assumption that the SBES time series represented the subsurface classes of the seabed. Next, the SBES data were joined with the corresponding MBES data based on identification of the closest locations between MBES and SBES. Two algorithms, PCA + k-means and fuzzy c-means, were tested and the results visualised. 
From visual comparison, the cluster boundaries appeared better defined than those obtained from the clustered MBES data alone. The results seem to indicate that adding SBES did in fact improve the boundary definitions. Next, the cluster results from the analysis chapters were validated against ground truth data using a confusion matrix and kappa coefficients. For MBES, the classes derived from the optimised data yielded better accuracy than those derived from the original data. For SBES, direct clustering was able to provide a relatively reliable overview of the underlying classes in the survey area. The combined MBES + SBES data provided by far the best mapping accuracy, with almost a 10% increase in overall accuracy compared to that of the original MBES data. The results proved to be promising for optimising the acoustic data and improving the quality of seabed mapping. Furthermore, these approaches have the potential for significant time and cost savings in the seabed mapping process. Finally, some future directions are recommended for the findings of this research project, with the consideration that this could contribute to further development of seabed mapping at mapping agencies worldwide.
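The validation step mentioned in this abstract (a confusion matrix and kappa coefficients against ground truth) can be sketched as follows. This is a minimal illustration of overall accuracy and Cohen's kappa, assuming cluster labels have already been matched to ground-truth classes; the function name and the counts in the example matrix are hypothetical and are not results from the thesis.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts samples of ground-truth class i assigned to class j.
    Kappa is undefined when expected chance agreement equals 1; that
    degenerate case is not handled in this sketch.
    """
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class example (illustrative counts only).
confusion = [[40, 10],
             [5, 45]]
accuracy, kappa = overall_accuracy_and_kappa(confusion)
print(f"accuracy={accuracy:.2f}, kappa={kappa:.2f}")
```

Kappa discounts the agreement that would occur by chance, which is why it is commonly reported alongside overall accuracy when class sizes are uneven.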

    Parallel Hierarchies: Interactive Visualization of Multidimensional Hierarchical Aggregates

    Exploring multi-dimensional hierarchical data is a long-standing problem present in a wide range of fields such as bioinformatics, software systems, social sciences and business intelligence. While each hierarchical dimension within these data structures can be explored in isolation, critical information lies in the relationships between dimensions. Existing approaches can either simultaneously visualize multiple non-hierarchical dimensions, or only one or two hierarchical dimensions. The challenge of visualizing multi-dimensional hierarchical data therefore remains open. To address this problem, we developed a novel data visualization approach -- Parallel Hierarchies -- that we demonstrate on a real-life SAP SE product called SAP Product Lifecycle Costing. The starting point of the research is a thorough customer-driven requirement engineering phase including an iterative design process. To avoid restricting ourselves to a domain-specific solution, we abstract the data and tasks gathered from users, and demonstrate the generality of the approach by applying Parallel Hierarchies to datasets from bioinformatics and social sciences. Moreover, we report on a qualitative user study conducted in an industrial scenario with 15 experts from 9 different companies. As a result of this co-innovation experience, several SAP customers requested a product feature based on our solution. Moreover, the integration of Parallel Hierarchies as a standard diagram type into the SAP Analytics Cloud platform is in progress. This thesis further introduces different uncertainty representation methods applicable to Parallel Hierarchies and to flow diagrams in general. We also present a visual comparison taxonomy for time-series of hierarchically structured data with one or multiple dimensions. Moreover, we propose several visual solutions for comparing hierarchies employing flow diagrams. 
Finally, after presenting two application examples of Parallel Hierarchies on industrial datasets, we detail two validation methods to examine the effectiveness of the visualization solution. Particularly, we introduce a novel design validation table to assess the perceptual aspects of eight different visualization solutions including Parallel Hierarchies.
    Contents:
    1 Introduction; 1.1 Motivation and Problem Statement; 1.2 Research Goals; 1.3 Outline and Contributions
    2 Foundations of Visualization; 2.1 Information Visualization; 2.1.1 Terms and Definition; 2.1.2 What: Data Structures; 2.1.3 Why: Visualization Tasks; 2.1.4 How: Visualization Techniques; 2.1.5 How: Interaction Techniques; 2.2 Visual Perception; 2.2.1 Visual Variables; 2.2.2 Attributes of Preattentive and Attentive Processing; 2.2.3 Gestalt Principles; 2.3 Flow Diagrams; 2.3.1 Classifications of Flow Diagrams; 2.3.2 Main Visual Features; 2.4 Summary
    3 Related Work; 3.1 Cross-tabulating Hierarchical Categories; 3.1.1 Visualizing Categorical Aggregates of Item Sets; 3.1.2 Hierarchical Visualization of Categorical Aggregates; 3.1.3 Visualizing Item Sets and Their Hierarchical Properties; 3.1.4 Hierarchical Visualization of Categorical Set Aggregates; 3.2 Uncertainty Visualization; 3.2.1 Uncertainty Taxonomies; 3.2.2 Uncertainty in Flow Diagrams; 3.3 Time-Series Data Visualization; 3.3.1 Time & Data; 3.3.2 User Tasks; 3.3.3 Visual Representation; 3.4 Summary
    4 Requirement Engineering Phase; 4.1 Introduction; 4.2 Environment; 4.2.1 The Product; 4.2.2 The Customers and Development Methodology; 4.2.3 Lessons Learned; 4.3 Visualization Requirements for Product Costing; 4.3.1 Current Visualization Practice; 4.3.2 Visualization Tasks; 4.3.3 Data Structure and Size; 4.3.4 Early Visualization Prototypes; 4.3.5 Challenges and Lessons Learned; 4.4 Data and Task Abstraction; 4.4.1 Data Abstraction; 4.4.2 Task Abstraction; 4.5 Summary and Outlook
    5 Parallel Hierarchies; 5.1 Introduction; 5.2 The Parallel Hierarchies Technique; 5.2.1 The Individual Axis: Showing Hierarchical Categories; 5.2.2 Two Interlinked Axes: Showing Pairwise Frequencies; 5.2.3 Multiple Linked Axes: Propagating Frequencies; 5.2.4 Fine-tuning Parallel Hierarchies through Reordering; 5.3 Design Choices; 5.4 Applying Parallel Hierarchies; 5.4.1 US Census Data; 5.4.2 Yeast Gene Ontology Annotations; 5.5 Evaluation; 5.5.1 Setup of the Evaluation; 5.5.2 Procedure of the Evaluation; 5.5.3 Results from the Evaluation; 5.5.4 Validity of the Evaluation; 5.6 Summary and Outlook
    6 Visualizing Uncertainty in Flow Diagrams; 6.1 Introduction; 6.2 Uncertainty in Product Costing; 6.2.1 Background; 6.2.2 Main Causes of Bad Quality in Costing Data; 6.3 Visualization Concepts; 6.4 Uncertainty Visualization using Ribbons; 6.4.1 Selected Visualization Techniques; 6.4.2 Study Design and Procedure; 6.4.3 Results; 6.4.4 Discussion; 6.5 Revised Visualization Approach using Ribbons; 6.5.1 Application to Sankey Diagram; 6.5.2 Application to Parallel Sets; 6.5.3 Application to Parallel Hierarchies; 6.6 Uncertainty Visualization using Nodes; 6.6.1 Visual Design of Nodes; 6.6.2 Expert Evaluation; 6.7 Summary and Outlook
    7 Visual Comparison Task; 7.1 Introduction; 7.2 Comparing Two One-dimensional Time Steps; 7.2.1 Problem Statement; 7.2.2 Visualization Design; 7.3 Comparing Two N-dimensional Time Steps; 7.4 Comparing Several One-dimensional Time Steps; 7.5 Summary and Outlook
    8 Parallel Hierarchies in Practice; 8.1 Application to Plausibility Check Task; 8.1.1 Plausibility Check Process; 8.1.2 Visual Exploration of Machine Learning Results; 8.2 Integration into SAP Analytics Cloud; 8.2.1 SAP Analytics Cloud; 8.2.2 Ocean to Table Project; 8.3 Summary and Outlook
    9 Validation; 9.1 Introduction; 9.2 Nested Model Validation Approach; 9.3 Perceptual Validation of Visualization Techniques; 9.3.1 Design Validation Table; 9.3.2 Discussion; 9.4 Summary and Outlook
    10 Conclusion and Outlook; 10.1 Summary of Findings; 10.2 Discussion; 10.3 Outlook
    A Questionnaires of the Evaluation; B Survey of the Quality of Product Costing Data; C Questionnaire of Current Practice; Bibliography

    Retail Shelf Analytics Through Image Processing and Deep Learning

    The present thesis promotes an innovative approach based on modern deep learning and image processing techniques for retail shelf analytics within an actual business context. To achieve this goal, the research focused on recent developments in computer vision while maintaining a business-oriented approach. The project involved the full-stack software development of a product to analyze structured and unstructured data and provide business intelligence services for retail systems.

    User involvement in service design: A case study on designing a new service concept for cultural institutions

    Smartphones have recently been changing our daily lives through the constant application of new technologies and by enabling new services. Cultural institutions usually embrace new approaches that help them realize their missions. A multi-disciplinary team of students working toward Master’s degrees at Aalto University has been developing, as a start-up project, a new mobile service for cultural institutions by applying a relatively new technology. The challenges of developing this new service are to define the customer value for potential users and to create a service system that involves different stakeholders. The aim of this thesis is to create a service concept that allows the project team to describe the elements of the system and its uses to the client, particularly from the point of view of service design and cultural institutions. As a designer, the author takes responsibility for discovering and presenting future customer needs, for using different user involvement techniques to gather and analyze data from users, and for integrating the results into the service concept development process. This thesis presents an overview of user involvement in theories of human-centered design. The related user involvement techniques are analyzed from different theoretical perspectives. It then describes the application of a number of specific techniques through the different stages of the service concept development process. Finally, a series of tools is presented to demonstrate the resulting service concept; these describe the service system, the customer journey process, and implementation guidelines for stakeholders.

    Localizing the media, locating ourselves: a critical comparative analysis of socio-spatial sorting in locative media platforms (Google AND Flickr 2009-2011)

    In this thesis I explore media geocoding (i.e., geotagging or georeferencing), the process of inscribing media with geographic information, which enables distinct forms of producing, storing, and distributing information based on location. Historically, geographic information technologies have served a biopolitical function, producing knowledge of populations. In their current guise as locative media platforms, these systems build rich databases of places facilitated by user-generated geocoded media. These geoindexes, this thesis argues, render places and the users of these services subject to novel forms of computational modelling and economic capture. Thus, the possibility of tying information, people and objects to location sets the conditions for the emergence of new communicative practices as well as new forms of governmentality (management of populations). This project is an attempt to develop an understanding of the socio-economic forces and media regimes structuring contemporary forms of location-aware communication, by carrying out a comparative analysis of two of the main current location-enabled platforms: Google and Flickr. Drawing from the medium-specific approach to media analysis characteristic of the subfield of Software Studies, together with the methodological apparatus of Cultural Analytics (data mining and visualization methods), the thesis focuses on examining how social space is coded and computed in these systems. In particular, it looks at the ontologies underlying the databases that support the platforms' geocoding capabilities, and at their respective algorithmic logics. 
In the final analysis, the thesis argues that the way social space is translated into POIs (Points of Interest) and business-biased categorizations, as well as the geodemographical ordering underpinning the way it is computed, are pivotal if we are to understand what kinds of socio-spatial relations are actualized in these systems, and what modalities of governing urban mobility are enabled.

    Scientific Advances in STEM

    Following a previous topic (Scientific advances in STEM: from professors to students; https://www.mdpi.com/topics/advances_stem), this new topic aims to highlight the importance of establishing collaborations among research groups from different disciplines, combining scientific knowledge from basic to applied research and taking advantage of different research facilities. Fundamental science helps us to understand phenomenological basics, while applied science focuses on product and technology development, highlighting the need to transfer knowledge to society and the industrial sector.