13 research outputs found

    You can't always sketch what you want: Understanding Sensemaking in Visual Query Systems

    Visual query systems (VQSs) empower users to interactively search for line charts with desired visual patterns, typically specified using intuitive sketch-based interfaces. Despite decades of past work on VQSs, these efforts have not translated to adoption in practice, possibly because VQSs are largely evaluated in unrealistic lab-based settings. To remedy this gap in adoption, we collaborated with experts from three diverse domains - astronomy, genetics, and material science - via a year-long user-centered design process to develop a VQS that supports their workflow and analytical needs, and to evaluate how VQSs can be used in practice. Our study results reveal that ad-hoc sketch-only querying is not as commonly used as prior work suggests, since analysts are often unable to precisely express their patterns of interest. In addition, we characterize three essential sensemaking processes supported by our enhanced VQS. We discover that participants employ all three processes, but in different proportions, depending on the analytical needs in each domain. Our findings suggest that all three sensemaking processes must be integrated in order to make future VQSs useful for a wide range of analytical inquiries.
    Comment: Accepted for presentation at IEEE VAST 2019, held October 20-25 in Vancouver, Canada; also published in a special issue of IEEE Transactions on Visualization and Computer Graphics (TVCG).
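
    The core interaction such systems support is shape matching between a user's sketch and windows of a line chart. As a rough illustration only (not the paper's actual algorithm; the z-normalized Euclidean distance and sliding-window scan are assumptions), a minimal matcher in Python:

```python
import numpy as np

def znorm(x):
    """Z-normalize so the match is shape-based, not scale-based."""
    s = x.std()
    return (x - x.mean()) / s if s > 0 else x - x.mean()

def best_matches(series, sketch, k=3):
    """Slide the sketched query over the series and rank windows
    by z-normalized Euclidean distance (smaller = closer shape)."""
    m = len(sketch)
    q = znorm(np.asarray(sketch, dtype=float))
    dists = [(i, np.linalg.norm(znorm(series[i:i + m]) - q))
             for i in range(len(series) - m + 1)]
    return sorted(dists, key=lambda t: t[1])[:k]

rng = np.random.default_rng(0)
data = np.cumsum(rng.standard_normal(500))   # a synthetic line chart
query = np.array([0, 1, 2, 1, 0], float)     # a sketched "peak"
print(best_matches(data, query))
```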

    Decision support visualization approach in textile manufacturing: a case study from operational control in the textile industry

    Decision support visualization tools provide insights for solving problems by displaying data in an interactive, graphical format. Such tools can be effective for supporting decision-makers in finding new opportunities and in measuring decision outcomes. In this study, a visualization tool capable of handling multivariate time series was used to study a problem of operational control in a textile manufacturing plant; the main goal was to identify sources of inefficiency in the daily production data of three machines. A concise rule-based model of the inefficiency measures was developed (i.e. quantitative measures were transformed into categorical variables), and an in-depth visual analysis was then performed using a particular technique: categorical time series plots stacked vertically. With this approach, a wide array of production inefficiency patterns was identified that was difficult to detect using standard quantitative reporting - the temporal pattern of best and worst performing machines - and, critically, the most important sources of inefficiency and some interactions between them were revealed. The case study underlying this work was further contextualized within the state of the art, and demonstrates the effectiveness of adequate visual analysis as a decision support tool for operational control in manufacturing.
    This study was partially conducted at the Psychology Research Centre (UID/PSI/01662/2013), University of Minho, and supported by the Portuguese Foundation for Science and Technology and the Portuguese Ministry of Science, Technology and Higher Education through national funds, co-financed by FEDER through COMPETE2020 under the PT2020 Partnership Agreement (POCI-010145-FEDER-007653). This work was also supported by the following grants: FCT project PTDC/MHC/PCN/1530; FEDER funds through the "Programa Operacional Factores de Competitividade - COMPETE" program and national funds through FCT "Fundação para a Ciência e a Tecnologia" under projects FCOMP-010124-FEDER-PEst-OE/EEI/UI0760/2011, PEst-OE/EEI/UI0760/2014, PEst2015-2020, and UID/CEC/00319/2019.
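
    A minimal sketch of the kind of rule-based categorization and vertically stacked categorical strips described above; the thresholds, machine names, and color scale are illustrative assumptions, not taken from the study:

```python
import numpy as np
import matplotlib.pyplot as plt

def categorize(x):
    """Hypothetical rule-based model: quantitative inefficiency -> category."""
    if x < 0.05: return 0   # "ok"
    if x < 0.15: return 1   # "minor loss"
    return 2                # "major loss"

rng = np.random.default_rng(1)
machines = {f"machine {i}": rng.random(120) * 0.3 for i in range(1, 4)}

# One categorical strip per machine, stacked vertically for comparison.
fig, axes = plt.subplots(len(machines), 1, sharex=True, figsize=(8, 3))
for ax, (name, series) in zip(axes, machines.items()):
    cats = np.array([categorize(v) for v in series])
    # imshow renders the day-by-day category sequence as a colored strip
    ax.imshow(cats[None, :], aspect="auto", cmap="RdYlGn_r", vmin=0, vmax=2)
    ax.set_yticks([0]); ax.set_yticklabels([name])
axes[-1].set_xlabel("production day")
plt.show()
```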

    Evaluating stance-annotated sentences from political blogs regarding Brexit: a quantitative analysis

    This paper offers a formally driven quantitative analysis of stance-annotated sentences in the Brexit Blog Corpus (BBC). Our goal is to identify features that determine the formal profiles of six stance categories (contrariety, hypotheticality, necessity, prediction, source of knowledge and uncertainty) in a subset of the BBC. The study has two parts: firstly, it examines a large number of formal linguistic features, such as punctuation, words and grammatical categories that occur in the sentences, in order to describe the specific characteristics of each category; secondly, it compares characteristics across the entire data set in order to determine stance similarities. We show that among the six stance categories in the corpus, contrariety and necessity are the most discriminative, with the former using longer sentences, more conjunctions, more repetitions and shorter forms than the sentences expressing other stances. Necessity has longer lexical forms but shorter sentences, which are syntactically more complex. We show that stance in our data set is expressed in sentences of around 21 words on average, consisting mainly of alphabetical characters that form a varied vocabulary without special forms such as digits or special characters.
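
    For illustration, a hypothetical extractor for a few of the shallow formal features of the kind the study examines; the exact feature set and the conjunction list here are assumptions:

```python
import re

CONJUNCTIONS = {"and", "but", "or", "although", "because", "while", "whereas"}

def stance_features(sentence):
    """Shallow formal features: length, punctuation, conjunction rate,
    mean word length, presence of digits."""
    words = re.findall(r"[A-Za-z']+", sentence.lower())
    return {
        "n_words": len(words),
        "n_punct": sum(c in ".,;:!?-" for c in sentence),
        "conj_rate": sum(w in CONJUNCTIONS for w in words) / max(len(words), 1),
        "mean_word_len": sum(map(len, words)) / max(len(words), 1),
        "has_digit": any(c.isdigit() for c in sentence),
    }

print(stance_features(
    "Although leaving may hurt trade, it will, arguably, restore sovereignty."))
```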

    Collaborative analysis of large time series data sets

    The recent expansion of metrification on a daily basis has led to the production of massive quantities of data, and in many cases these collected metrics are only useful for knowledge building when seen as a full sequence of data ordered by time, which constitutes a time series. To find and interpret meaningful behavioral patterns in time series, a multitude of analysis software tools have been developed. Many of the existing solutions use annotations to enable the curation of a knowledge base that is shared between a group of researchers over a network. However, these tools lack appropriate mechanisms to handle a high number of concurrent requests and to properly store massive data sets and ontologies, as well as suitable representations for annotated data that are visually interpretable by humans and explorable by automated systems. The goal of the work presented in this dissertation is to iterate on existing time series analysis software and build a platform for the collaborative analysis of massive time series data sets, leveraging state-of-the-art technologies for querying, storing and displaying time series and annotations. A theoretical and domain-agnostic model is proposed to enable the implementation of a distributed, extensible, secure and high-performance architecture that handles multiple simultaneous annotation proposals and avoids data loss from overlapping contributions or unsanctioned changes. Analysts can share annotation projects with peers, restricting a set of collaborators to a smaller scope of analysis and to a limited catalog of annotation semantics. Annotations can express meaning not only over a segment of time, but also over a subset of the series that coexist in the same segment. A novel visual encoding for annotations is proposed, where annotations are rendered as arcs traced only over the affected series' curves in order to reduce visual clutter. Moreover, the implementation of a full-stack prototype with a reactive web interface is described, directly following the proposed architectural and visualization model as applied to the HVAC domain. The performance of the prototype under different architectural approaches was benchmarked, and the usability of the interface was tested. Overall, the work described in this dissertation contributes a more versatile, intuitive and scalable time series annotation platform that streamlines the knowledge-discovery workflow.
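
    As a rough illustration of the annotation model described above - an annotation spans a time segment plus a subset of the series in that segment, and overlapping contributions must not silently lose data - a minimal hypothetical sketch; the class and field names are invented:

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    start: float              # segment start
    end: float                # segment end
    series_ids: frozenset     # subset of series the annotation applies to
    label: str
    author: str

@dataclass
class AnnotationProject:
    annotations: list = field(default_factory=list)

    def conflicts(self, new):
        """Contributions conflict when they overlap in time AND
        touch at least one common series."""
        return [a for a in self.annotations
                if a.start < new.end and new.start < a.end
                and a.series_ids & new.series_ids]

    def propose(self, new):
        if self.conflicts(new):
            raise ValueError("overlapping contribution; manual merge required")
        self.annotations.append(new)

p = AnnotationProject()
p.propose(Annotation(0, 10, frozenset({"hvac_temp"}), "defrost cycle", "ana"))
print(p.conflicts(Annotation(5, 15, frozenset({"hvac_temp"}), "anomaly", "rui")))
```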

    Acoustic data optimisation for seabed mapping with visual and computational data mining

    Oceans cover 70% of Earth's surface, but little is known about their waters. While the echosounders often used for exploration of our oceans have developed at a tremendous rate since WWII, the methods used to analyse and interpret the data remain the same. These methods are inefficient, time consuming, and often costly in dealing with the large volumes of data that modern echosounders produce. This PhD project examines the complexity of the de facto seabed mapping technique by exploring and analysing acoustic data with a combination of data mining and visual analytic methods. First, we test for redundancy in multibeam echosounder (MBES) data by using the component plane visualisation of a Self Organising Map (SOM). A total of 16 visual groups were identified among the 132 statistical data descriptors. The optimised MBES dataset had 35 attributes from the 16 visual groups and represented a 73% reduction in data dimensionality. A combined Principal Component Analysis (PCA) + k-means was used to cluster both datasets, and the cluster results were visually compared as well as internally validated using four different internal validation methods. Next, we tested two novel approaches in singlebeam echosounder (SBES) data processing and clustering: using visual exploration for outlier detection, and direct clustering of time series echo returns. Visual exploration identified further outliers that the automatic procedure was not able to find. The SBES data were then clustered directly; the internal validation indices suggested the optimal number of clusters to be three, consistent with the assumption that the SBES time series represent the subsurface classes of the seabed. Next, the SBES data were joined with the corresponding MBES data based on identification of the closest locations between MBES and SBES. Two algorithms, PCA + k-means and fuzzy c-means, were tested and the results visualised. From visual comparison, the cluster boundaries appeared better defined than with the clustered MBES data alone, indicating that adding SBES did in fact improve the boundary definitions. The cluster results were then validated against ground truth data using a confusion matrix and kappa coefficients. For MBES, the classes derived from optimised data yielded better accuracy than those from the original data. For SBES, direct clustering provided a relatively reliable overview of the underlying classes in the survey area. The combined MBES + SBES data provided by far the best accuracy for mapping, with almost a 10% increase in overall accuracy compared to the original MBES data. The results are promising for optimising acoustic data and improving the quality of seabed mapping, and these approaches have the potential for significant time and cost savings in the seabed mapping process. Finally, future directions are recommended, with the consideration that this work could contribute to seabed mapping efforts at mapping agencies worldwide.
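
    A minimal sketch of the PCA + k-means step with one internal validation index; the thesis uses four indices, so silhouette here is an illustrative choice, and the data are synthetic stand-ins for the optimised MBES attribute table:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score

# Hypothetical stand-in: rows are survey cells, columns are the 35
# optimised statistical descriptors.
rng = np.random.default_rng(2)
X = rng.standard_normal((1000, 35))

Xs = StandardScaler().fit_transform(X)
Xp = PCA(n_components=0.95).fit_transform(Xs)  # keep 95% of variance

# Choose k via an internal validation index.
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Xp)
    print(k, round(silhouette_score(Xp, labels), 3))
```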

    Analysis and detection of atypical aircraft approach trajectories using functional data analysis and machine learning

    Improving aviation safety generally involves identifying, detecting and managing undesirable events that can lead to final events with fatalities. Previous studies conducted by the DSAC, the French national supervisory authority, have identified non-compliant approaches presenting deviations from standard procedures as undesirable events. This thesis explores functional data analysis and machine learning techniques in order to provide algorithms for the detection and analysis of atypical approach trajectories from ground-side data. Four research directions are investigated. The first axis develops a post-operational analysis algorithm based on functional data analysis techniques and unsupervised learning for the detection of atypical behaviours in approach. The model is compared against the analyses of airline flight safety offices, and is applied in the particular context of the COVID-19 crisis to illustrate its potential use while the global ATM system faces a standstill. The second axis addresses the generation and extraction of information from radar data using new techniques such as machine learning. These methodologies improve the understanding and analysis of trajectories, for example in the estimation of on-board parameters from radar parameters. The third axis proposes novel data manipulation and generation techniques using the functional data analysis framework. Finally, the fourth axis focuses on extending the post-operational algorithm into real time through the use of optimal control techniques, pointing toward new situation-awareness alerting systems.
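
    As a loose illustration of the post-operational idea - represent each approach as a smooth function on a common grid, then flag atypical curves with an unsupervised detector - a sketch in Python; the spline representation and Local Outlier Factor are stand-in assumptions, not the thesis's actual FDA method:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(3)
grid = np.linspace(0, 10, 50)        # common grid: NM to runway threshold

def make_flight():
    """Synthetic approach: altitude vs. distance, smoothed onto the grid."""
    d = np.sort(rng.uniform(0, 10, 80))          # observation points
    a = 300 * d + rng.normal(0, 40, 80)          # roughly a 3-degree slope
    return UnivariateSpline(d, a, s=len(d))(grid)

flights = np.array([make_flight() for _ in range(100)])

# -1 marks curves whose local density deviates from their neighbours'.
scores = LocalOutlierFactor(n_neighbors=10).fit_predict(flights)
print("atypical approaches:", np.where(scores == -1)[0])
```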

    Analysis and prediction of time series using Case-Based Reasoning technology

    This thesis describes a promising approach in which the problem of time series analysis and prediction is solved using Case-Based Reasoning (CBR) technology. The foundations and main concepts of this technology are described in detail, and a comparative study of different approaches to time series analysis is given, with particular attention to prediction. The system CuBaGe (Curve Base Generator) was developed: a robust and general architecture, based on case-based reasoning technology, for curve representation and for indexing time series databases. A corresponding similarity measure was modelled for the given kind of curve representation. The robustness and generality of the system are illustrated by a real application in the domain of financial forecasting, which shows that the architecture may be employed equally well not only on conventional time series (where all values are known), but also on some non-standard time series (sparse, vague, non-equidistant). Dealing with non-standard time series is the greatest advantage of the presented architecture.
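
    A minimal case-retrieval sketch in the spirit of the described system; the similarity measure shown, which compares only the jointly known values of sparse series, is an illustrative assumption, not the actual CuBaGe measure:

```python
import numpy as np

def similarity(a, b):
    """Compare only where both series have known (non-NaN) values,
    so sparse or non-equidistant series can still be matched."""
    mask = ~(np.isnan(a) | np.isnan(b))
    if mask.sum() == 0:
        return -np.inf
    return -np.linalg.norm(a[mask] - b[mask]) / mask.sum()

def retrieve(case_base, query, k=1):
    """CBR retrieval step: rank stored curves by similarity to the query."""
    ranked = sorted(case_base.items(),
                    key=lambda kv: similarity(kv[1], query), reverse=True)
    return ranked[:k]

cases = {"rally": np.array([1, 2, 3, 4, 5.0]),
         "dip":   np.array([3, 2, 1, 2, 3.0])}
query = np.array([1, np.nan, 3, np.nan, 5.0])   # sparse query series
print(retrieve(cases, query))
```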

    Visual search queries on graph-based data structures

    The total amount of available data is steadily increasing. Standardized data formats allow for connecting different data sources, which can include merging of different data items depending on the use case. This creates even more comprehensive datasets, in which the actually desired information can be hard to find. If the data consist of unstructured or homogeneous information, searching is limited to matching patterns against data items or parts thereof - for instance, character strings or regular expressions that match parts of textual data items. However, the availability of structured data is increasing. This kind of data either distinguishes between several facets per data item from the outset, or originally unstructured data has been enriched accordingly. As each facet can also represent a link to another data item, the result is a graph structure that is suitable for faceted search concepts. Here, interoperability across data sources is achieved by, among other things, the concepts and techniques of the Semantic Web. Numerous works have addressed visualizing an overview of an entire dataset or the details of a selected excerpt; finding specific data remains a problem, however. The difficulty lies in expressing the search criteria precisely, and since the individual criteria can be connected in complex ways, a visual representation suggests itself here just as it does for dataset overviews. A special trait of this application of visualization is that data are not necessarily present: the visualization must be able to convey the conceptual idea of a search query even without available data. Earlier works on this problem deal with the visual representation of search queries and filter expressions for relational and object-oriented databases, and many more recent works increasingly address the context of the Semantic Web. Several of these concepts, however, are not clearly defined in an abstract way, and scalability issues appear for more complex queries. Furthermore, little attention has been paid to how different concepts can be connected in order to exploit the advantages of different query visualizations. This dissertation addresses the described problems and presents six concepts for the visual representation of search queries. Both visualizations for general purposes - that is, for filtering arbitrary structured information - and visualizations for specific domains or kinds of information are covered. Existing approaches have in part been adapted to the particularities of graph-based data structures; likewise, new approaches specifically designed for this kind of data structure are presented. For each concept, it is discussed to what extent the query visualization can be used even without a data collection to filter, and how, when such a collection is present, a preview of the results of the filtering process can be provided. Finally, ways of connecting the different visualization concepts are presented. This connection approach is suitable for systematically combining arbitrary query visualizations. By means of the connection approach, users can express different parts of a query using different visualization concepts, in order to benefit from the strengths of several query visualizations at the same time. In this way, queries that include both complex conditions and complex relationships between conditions can now be defined and displayed visually without losing the visual overview of either of these aspects.
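
    Independently of how a query is drawn on screen, the underlying model is a graph pattern with conditions, evaluated against graph-structured data. A toy matcher for such patterns, purely for illustration (the triple representation and the backtracking matcher are assumptions, not the dissertation's formalism):

```python
# Graph-structured data as subject-predicate-object triples.
triples = [
    ("paper1", "type", "Paper"), ("paper1", "year", 2019),
    ("paper1", "author", "ada"), ("ada", "type", "Person"),
]

def match(pattern, bindings=None):
    """Naive backtracking matcher; '?x' items are variables."""
    bindings = bindings or {}
    if not pattern:
        yield bindings
        return
    head, *rest = pattern
    for t in triples:
        b = dict(bindings)
        ok = True
        for p, v in zip(head, t):
            if isinstance(p, str) and p.startswith("?"):
                if b.setdefault(p, v) != v:   # bound to something else
                    ok = False; break
            elif p != v:                       # constant mismatch
                ok = False; break
        if ok:
            yield from match(rest, b)

# "Papers from 2019 and their authors" as a visual query might encode it.
query = [("?p", "type", "Paper"), ("?p", "year", 2019), ("?p", "author", "?a")]
print(list(match(query)))
```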

    The role of similarity measures in time series analysis

    The subject of this dissertation encompasses a comprehensive overview and analysis of the impact of the Sakoe-Chiba global constraint on the most commonly used elastic similarity measures in the field of time-series data mining, with a focus on classification accuracy. The choice of similarity measure is one of the most significant aspects of time-series analysis: it should correctly reflect the resemblance between data presented in the form of time series. Similarity measures are a critical component of many time-series mining tasks, including classification, clustering, prediction, anomaly detection, and others. The research covered by this dissertation is oriented toward several issues: 1. a review of the effects of global constraints on the performance of computing similarity measures; 2. a detailed analysis of the influence of constraining the elastic similarity measures on the accuracy of classical classification techniques; 3. an extensive study of the impact of different weighting schemes on the classification of time series; 4. the development of an open-source library (Framework for Analysis and Prediction - FAP) that integrates the main techniques and methods required for the analysis and mining of time series, and which was used to carry out these experiments.
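
    The Sakoe-Chiba constraint itself is compact to state in code: the DTW warping path is restricted to a band of half-width r around the diagonal. A minimal sketch for equal-length series (a generic textbook formulation, not FAP's implementation):

```python
import numpy as np

def dtw_sakoe_chiba(a, b, r):
    """DTW distance with the warping path confined to a Sakoe-Chiba
    band of half-width r around the diagonal."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])

x = np.sin(np.linspace(0, 6, 60))
y = np.sin(np.linspace(0.4, 6.4, 60))
print(dtw_sakoe_chiba(x, y, r=5))    # narrow band: cheaper, constrained warping
print(dtw_sakoe_chiba(x, y, r=60))   # band wide enough = unconstrained DTW
```

    With r=0 the band degenerates to the diagonal and the measure reduces to Euclidean distance; the dissertation studies how classification accuracy behaves as r is narrowed between these extremes.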

    Digital Intelligence – opportunities and implementation of data-driven foresight

    The goal of Digital Intelligence, i.e. data-driven strategic foresight, is to support the shaping of the future on the basis of valid and well-founded digital information, with comparatively little effort and enormous savings in time and cost. Innovative technologies for (semi-)automatic language and data processing help here, such as information retrieval, (temporal) data, text and web mining, information visualization, conceptual structures, and informetrics. They make it possible to recognize, in good time, key topics and latent relationships within an unmanageably large, distributed and inhomogeneous body of data such as patents, scientific publications, press documents or web content, and to provide them quickly and in a targeted manner. Digital Intelligence thus makes intuitively sensed patterns and developments explicit and measurable. This research work aims, first, to show the possibilities computer science offers for data-driven foresight and, second, to implement them in a pragmatic context. Its starting point is an introduction to the discipline of strategic foresight and its data-driven branch, Digital Intelligence. The theoretical and, in particular, computer-science foundations of foresight are discussed and classified, above all the possibilities of time-oriented data exploration. Various methods and software tools are designed and developed that support the time-oriented exploration of unstructured text data in particular (temporal text mining). Only approaches that can be used pragmatically in the context of a large institution and under the specific requirements of strategic foresight are considered. Noteworthy are a platform for collective search and an innovative method for identifying weak signals. On this basis, a Digital Intelligence service is presented and discussed that was successfully implemented in a global technology-oriented corporation and that enables systematic competitor, market and technology analysis based on people's digital traces.
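
    One family of text-based indicators such a system can build on is term frequency over time. A toy, hypothetical weak-signal filter - thresholds and counts are invented - that flags terms whose established frequency is still low but whose recent growth is strong:

```python
import numpy as np

counts = {                      # term -> documents per year, 2015..2020
    "blockchain":   [2, 3, 9, 25, 60, 90],
    "steam engine": [50, 48, 51, 49, 50, 52],
    "neuromorphic": [0, 1, 1, 3, 7, 14],
}

def weak_signals(counts, growth_min=1.5, base_max=10):
    """Flag terms with a low established level but strong recent growth,
    one simple reading of a 'weak signal'."""
    flagged = []
    for term, c in counts.items():
        c = np.asarray(c, float)
        base = c[:-2].mean()                 # established level
        recent = c[-2:].mean()               # last two years
        if base < base_max and recent / max(base, 1.0) > growth_min:
            flagged.append((term, base, recent))
    return flagged

print(weak_signals(counts))   # flags "blockchain" and "neuromorphic"
```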