924 research outputs found
Automatic information retrieval through text-mining
The dissertation presented for obtaining the Master’s Degree in Electrical Engineering and Computer Science, at Universidade Nova de Lisboa, Faculdade de Ciências e TecnologiaNowadays, around a huge amount of firms in the European Union catalogued as Small and Medium Enterprises (SMEs), employ almost a great portion of the active workforce in Europe. Nonetheless, SMEs cannot afford implementing neither methods nor tools to systematically adapt innovation as a part of their business process. Innovation is the engine to be competitive in the globalized environment, especially in the current
socio-economic situation. This thesis provides a platform that when integrated with ExtremeFactories(EF) project, aids SMEs to become more competitive by means of monitoring schedule functionality.
In this thesis a text-mining platform that possesses the ability to schedule a gathering
information through keywords is presented. In order to develop the platform, several
choices concerning the implementation have been made, in the sense that one of them
requires particular emphasis is the framework, Apache Lucene Core 2 by supplying an efficient text-mining tool and it is highly used for the purpose of the thesis
Understanding the bi-directional relationship between analytical processes and interactive visualization systems
Interactive visualizations leverage the human visual and reasoning systems to increase the scale of information with which we can effectively work, therefore improving our ability to explore and analyze large amounts of data. Interactive visualizations are often designed with target domains in mind, such as analyzing unstructured textual information, which is a main thrust in this dissertation.
Since each domain has its own existing procedures of analyzing data, a good start to a well-designed interactive visualization system is to understand the domain experts' workflow and analysis processes. This dissertation recasts the importance of understanding domain users' analysis processes and incorporating such understanding into the design of interactive visualization systems.
To meet this aim, I first introduce considerations guiding the gathering of general and domain-specific analysis processes in text analytics. Two interactive visualization systems are designed by following the considerations. The first system is Parallel-Topics, a visual analytics system supporting analysis of large collections of documents by extracting semantically meaningful topics. Based on lessons learned from Parallel-Topics, this dissertation further presents a general visual text analysis framework, I-Si, to present meaningful topical summaries and temporal patterns, with the capability to handle large-scale textual information. Both systems have been evaluated by expert users and deemed successful in addressing domain analysis needs.
The second contribution lies in preserving domain users' analysis process while using interactive visualizations. Our research suggests the preservation could serve multiple purposes. On the one hand, it could further improve the current system. On the other hand, users often need help in recalling and revisiting their complex and sometimes iterative analysis process with an interactive visualization system. This dissertation introduces multiple types of evidences available for capturing a user's analysis process within an interactive visualization and analyzes cost/benefit ratios of the capturing methods. It concludes that tracking interaction sequences is the most un-intrusive and feasible way to capture part of a user's analysis process. To validate this claim, a user study is presented to theoretically analyze the relationship between interactions and problem-solving processes. The results indicate that constraining the way a user interacts with a mathematical puzzle does have an effect on the problemsolving process. As later evidenced in an evaluative study, a fair amount of high-level analysis can be recovered through merely analyzing interaction logs
Semantic multimedia modelling & interpretation for annotation
The emergence of multimedia enabled devices, particularly the incorporation of cameras in mobile phones, and the accelerated revolutions in the low cost storage devices, boosts the multimedia data production rate drastically. Witnessing such an iniquitousness of digital images and videos, the research community has been projecting the issue of its significant utilization and management. Stored in monumental multimedia corpora, digital data need to be retrieved and organized in an intelligent way, leaning on the rich semantics involved. The utilization of these image and video collections demands proficient image and video annotation and retrieval techniques. Recently, the multimedia research community is progressively veering its emphasis to the personalization of these media. The main impediment in the image and video analysis is the semantic gap, which is the discrepancy among a user’s high-level interpretation of an image and the video and the low level computational interpretation of it. Content-based image and video annotation systems are remarkably susceptible to the semantic gap due to their reliance on low-level visual features for delineating semantically rich image and video contents. However, the fact is that the visual similarity is not semantic similarity, so there is a demand to break through this dilemma through an alternative way. The semantic gap can be narrowed by counting high-level and user-generated information in the annotation. High-level descriptions of images and or videos are more proficient of capturing the semantic meaning of multimedia content, but it is not always applicable to collect this information. It is commonly agreed that the problem of high level semantic annotation of multimedia is still far from being answered. This dissertation puts forward approaches for intelligent multimedia semantic extraction for high level annotation. This dissertation intends to bridge the gap between the visual features and semantics. It proposes a framework for annotation enhancement and refinement for the object/concept annotated images and videos datasets. The entire theme is to first purify the datasets from noisy keyword and then expand the concepts lexically and commonsensical to fill the vocabulary and lexical gap to achieve high level semantics for the corpus. This dissertation also explored a novel approach for high level semantic (HLS) propagation through the images corpora. The HLS propagation takes the advantages of the semantic intensity (SI), which is the concept dominancy factor in the image and annotation based semantic similarity of the images. As we are aware of the fact that the image is the combination of various concepts and among the list of concepts some of them are more dominant then the other, while semantic similarity of the images are based on the SI and concept semantic similarity among the pair of images. Moreover, the HLS exploits the clustering techniques to group similar images, where a single effort of the human experts to assign high level semantic to a randomly selected image and propagate to other images through clustering. The investigation has been made on the LabelMe image and LabelMe video dataset. Experiments exhibit that the proposed approaches perform a noticeable improvement towards bridging the semantic gap and reveal that our proposed system outperforms the traditional systems
Iterative Visual Analytics and its Applications in Bioinformatics
Indiana University-Purdue University Indianapolis (IUPUI)You, Qian. Ph.D., Purdue University, December, 2010. Iterative Visual Analytics and its Applications in Bioinformatics. Major Professors: Shiaofen Fang and Luo
Si.
Visual Analytics is a new and developing field that addresses the challenges of
knowledge discoveries from the massive amount of available data. It facilitates
humans‘ reasoning capabilities with interactive visual interfaces for exploratory data analysis tasks, where automatic data mining methods fall short due to the lack of the pre-defined objective functions. Analyzing the large volume of data sets for biological discoveries raises similar challenges. The domain knowledge of biologists and bioinformaticians is critical in the hypothesis-driven discovery tasks. Yet developing visual analytics frameworks for bioinformatic applications is
still in its infancy.
In this dissertation, we propose a general visual analytics framework – Iterative Visual Analytics (IVA) – to address some of the challenges in the current research. The framework consists of three progressive steps to explore data sets with the increased complexity: Terrain Surface Multi-dimensional Data
Visualization, a new multi-dimensional technique that highlights the global
patterns from the profile of a large scale network. It can lead users‘ attention to characteristic regions for discovering otherwise hidden knowledge; Correlative Multi-level Terrain Surface Visualization, a new visual platform that provides the overview and boosts the major signals of the numeric correlations among nodes in interconnected networks of different contexts. It enables users to gain
critical insights and perform data analytical tasks in the context of multiple correlated networks; and the Iterative Visual Refinement Model, an innovative process that treats users‘ perceptions as the objective functions, and guides the users to form the optimal hypothesis by improving the desired visual patterns. It is a formalized model for interactive explorations to converge to optimal solutions.
We also showcase our approach with bio-molecular data sets and demonstrate
its effectiveness in several biomarker discovery applications
Recommended from our members
LDA Ensembles for Interactive Exploration and Categorization of Behaviors
We define behavior as a set of actions performed by some agent during a period of time. We consider the problem of analyzing a large collection of behaviors by multiple agents, more specifically, identifying typical behaviors as well as spotting behavior anomalies. We propose an approach leveraging topic modeling techniques -- LDA (Latent Dirichlet Allocation) Ensembles -- for representing categories of typical behaviors by topics obtained through applying topic modeling to a behavior collection. When such methods are applied to text documents, the goodness of the extracted topics is usually judged based on the semantic relatedness of the terms pertinent to the topics. This criterion, however, may not be applicable to topics extracted from non-textual data, such as action sets, since relationships between actions may not be obvious. We have developed a suite of visual and interactive techniques supporting the construction of an appropriate combination of topics based on other criteria, such as distinctiveness and coverage of the behavior set. Our case studies in the operation behaviors in the security management system and visiting behaviors in an amusement park and the expert evaluation of the first case study demonstrate the effectiveness of our approach
Recommended from our members
NIH-NSF Visualization Research Challenges Report
Engineering and Applied Science
The state of the art in integrating machine learning into visual analytics
Visual analytics systems combine machine learning or other analytic techniques with interactive data visualization to promote sensemaking and analytical reasoning. It is through such techniques that people can make sense of large, complex data. While progress has been made, the tactful combination of machine learning and data visualization is still under-explored. This state-of-the-art report presents a summary of the progress that has been made by highlighting and synthesizing select research advances. Further, it presents opportunities and challenges to enhance the synergy between machine learning and visual analytics for impactful future research directions
Data mining techniques for complex application domains
The emergence of advanced communication techniques has increased availability of large collection of data in electronic form in a number of application domains including healthcare, e- business, and e-learning. Everyday a large amount of records are stored electronically. However, finding useful information from such a large data collection is a challenging issue. Data mining technology aims automatically extracting hidden knowledge from large data repositories exploiting sophisticated algorithms. The hidden knowledge in the electronic data may be potentially utilized to facilitate the procedures, productivity, and reliability of several application domains.
The PhD activity has been focused on novel and effective data mining approaches to tackle the complex data coming from two main application domains: Healthcare data analysis and Textual data analysis.
The research activity, in the context of healthcare data, addressed the application of different data mining techniques to discover valuable knowledge from real exam-log data of patients. In particular, efforts have been devoted to the extraction of medical pathways, which can be exploited to analyze the actual treatments followed by patients. The derived knowledge not only provides useful information to deal with the treatment procedures but may also play an important role in future predictions of potential patient risks associated with medical treatments.
The research effort in textual data analysis is twofold. On the one hand, a novel approach to discovery of succinct summaries of large document collections has been proposed. On the other hand, the suitability of an established descriptive data mining to support domain experts in making decisions has been investigated. Both research activities are focused on adopting widely exploratory data mining techniques to textual data analysis, which require overcoming intrinsic limitations for traditional algorithms for handling textual documents efficiently and effectively
- …