1,091 research outputs found
eXamine: a Cytoscape app for exploring annotated modules in networks
Background. Biological networks have growing importance for the
interpretation of high-throughput "omics" data. Statistical and combinatorial
methods make it possible to obtain mechanistic insights through the extraction of smaller
subnetwork modules. Further enrichment analyses provide set-based annotations
of these modules.
Results. We present eXamine, a set-oriented visual analysis approach for
annotated modules that displays set membership as contours on top of a
node-link layout. Our approach extends Self-Organizing Maps to
simultaneously lay out nodes, links, and set contours.
Conclusions. We implemented eXamine as a freely available Cytoscape app.
Using eXamine we study a module that is activated by the virally-encoded
G-protein coupled receptor US28 and formulate a novel hypothesis about its
functioning
Practical data mining in a large utility company
We present in this paper the main applications of data mining techniques at Electricité de France, the French national electric power company. These include electric load curve analysis and prediction of customer characteristics. Closely related to data mining techniques are data warehouse management problems: we show that statistical methods can be used to help manage data consistency and to provide accurate reports even when data are missing.
Medical imaging analysis with artificial neural networks
Given that neural networks have been widely reported in the research community of medical imaging, we provide a focused literature survey on recent neural network developments in computer-aided diagnosis, medical image segmentation and edge detection towards visual content analysis, and medical image registration for its pre-processing and post-processing, with the aims of increasing awareness of how neural networks can be applied to these areas and to provide a foundation for further research and practical development. Representative techniques and algorithms are explained in detail to provide inspiring examples illustrating: (i) how a known neural network with fixed structure and training procedure could be applied to resolve a medical imaging problem; (ii) how medical images could be analysed, processed, and characterised by neural networks; and (iii) how neural networks could be expanded further to resolve problems relevant to medical imaging. In the concluding section, a highlight of comparisons among many neural network applications is included to provide a global view on computational intelligence with neural networks in medical imaging
Mapping the state of financial stability
The paper uses the Self-Organizing Map for mapping the state of financial stability and visualizing the sources of systemic risks as well as for predicting systemic financial crises. The Self-Organizing Financial Stability Map (SOFSM) enables a two-dimensional representation of a multidimensional financial stability space that allows disentangling the individual sources impacting on systemic risks. The SOFSM can be used to monitor macro-financial vulnerabilities by locating a country in the financial stability cycle: in the pre-crisis, crisis, post-crisis or tranquil state. In addition, the SOFSM performs better than or equally well as a logit model in classifying in-sample data and in predicting out-of-sample the global financial crisis that started in 2007. Model robustness is tested by varying the thresholds of the models, the policymaker’s preferences, and the forecasting horizons.
JEL Classification: E44, E58, F01, F37, G01
Keywords: macroprudential supervision, prediction, Self-Organizing Map (SOM), systemic financial crisis, systemic risk, visualization
Exploratory Analysis of Functional Data via Clustering and Optimal Segmentation
We propose in this paper an exploratory analysis algorithm for functional
data. The method partitions a set of functions into clusters and represents
each cluster by a simple prototype (e.g., piecewise constant). The total number
of segments in the prototypes is chosen by the user and optimally
distributed among the clusters via two dynamic programming algorithms. The
practical relevance of the method is shown on two real-world datasets.
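As an illustration of the segmentation idea in the abstract above, here is a minimal sketch of the classic dynamic program that approximates one sampled function with a piecewise-constant prototype. It covers only a single function with a fixed number of segments and a squared-error criterion; distributing the total number of segments among clusters, as the paper does, is not shown. The function name `optimal_segmentation` and its parameters are hypothetical and are not taken from the paper.

```python
def optimal_segmentation(values, k):
    """Split `values` into k contiguous segments that minimise the total
    squared error around each segment's mean.

    Returns (total_error, [(start, end), ...]) with half-open index ranges.
    Illustrative sketch only; not the authors' implementation.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(values)")

    # Prefix sums let us evaluate the squared error of any segment in O(1).
    s1, s2 = [0.0], [0.0]
    for v in values:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def sse(i, j):
        # Squared error of values[i:j] around its mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[m][j]: best error for the first j values using exactly m segments.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = cost[m - 1][i] + sse(i, j)
                if c < cost[m][j]:
                    cost[m][j] = c
                    back[m][j] = i

    # Walk the back-pointers to recover the segment boundaries.
    bounds, j = [], n
    for m in range(k, 0, -1):
        i = back[m][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return cost[k][n], bounds
```

For example, `optimal_segmentation([1, 1, 1, 5, 5, 9, 9, 9], 3)` recovers the three constant runs. The cubic loop keeps the sketch short; it is adequate for short curves but not tuned for long ones.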
Integrated characterisation of mud-rich overburden sediment sequences using limited log and seismic data: Application to seal risk
Muds and mudstones are the most abundant sediments in sedimentary basins and can
control fluid migration and pressure. In petroleum systems, they can also act as source,
reservoir or seal rocks. More recently, the sealing properties of mudstones have been
used for nuclear waste storage and geological CO2 sequestration. Despite the growing
importance of mudstones, their geological modelling is poorly understood and clear
quantitative studies are needed to address 3D lithology and flow properties distribution
within these sediments. The key issues in this respect are the high degree of
heterogeneity in mudstones and the alteration of lithology and flow properties with time
and depth. In addition, there are often very limited field data (log and seismic), with
lower quality within these sediments, which makes the common geostatistical modelling
practices ineffective.
In this study we quantitatively assess the flow-important characteristics of
heterogeneous mud-rich sequences based on limited conventional log and post-stack
seismic data in a deep offshore West African case study. Additionally, we develop a
practical technique of log-seismic integration at the cross-well scale to translate 3D
seismic attributes into lithology probabilities. The final products are probabilistic
multiattribute transforms at different resolutions which allow prediction of lithologies
away from wells while keeping the important sub-seismic stratigraphic and structural
flow features. As a key result, we introduced a seismically-driven risk attribute (so-called
Seal Risk Factor "SRF") which showed robust correspondence to the lithologies
within the seismic volume. High seismic SRFs were often a good approximation for
volumes containing a higher percentage of coarser-grained and distorted sediments, and
vice versa.
We believe that this is the first attempt at quantitative, integrated characterisation of
mud-rich overburden sediment sequences using log and seismic data. Its application on
modern seismic surveys can save days of processing/mapping time and can reduce
exploration risk by basing decisions on seal texture and lithology probabilities
Data exploration process based on the self-organizing map
With the advances in computer technology, the amount of data that is obtained from various sources and stored in electronic media is growing at exponential rates. Data mining is a research area that addresses the challenge of analysing this data in order to find useful information contained therein. The Self-Organizing Map (SOM) is one of the methods used in data mining. It quantizes the training data into a representative set of prototype vectors and maps them on a low-dimensional grid. The SOM is a prominent tool in the initial exploratory phase in data mining.
The thesis consists of an introduction and ten publications. In the publications, the validity of SOM-based data exploration methods has been investigated and various enhancements to them have been proposed. In the introduction, these methods are presented as parts of the data mining process, and they are compared with other data exploration methods with similar aims.
The work makes two primary contributions. Firstly, it has been shown that the SOM provides a versatile platform on top of which various data exploration methods can be efficiently constructed. New methods and measures for visualization of data, clustering, cluster characterization, and quantization have been proposed. The SOM algorithm and the proposed methods and measures have been implemented as a set of Matlab routines in the SOM Toolbox software library.
Secondly, a framework for SOM-based data exploration of table-format data - both single tables and hierarchically organized tables - has been constructed. The framework divides exploratory data analysis into several sub-tasks, most notably the analysis of samples and the analysis of variables. The analysis methods are applied autonomously and their results are provided in a report describing the most important properties of the data manifold. In such a framework, the attention of the data miner can be directed more towards the actual data exploration task, rather than on the application of the analysis methods. Because of the highly iterative nature of the data exploration, the automation of routine analysis tasks can reduce the time needed by the data exploration process considerably.
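The abstract above describes the SOM as quantizing training data into prototype vectors arranged on a low-dimensional grid. The following is a minimal sketch of that idea in plain Python, assuming a rectangular grid, a Gaussian neighbourhood, and linearly decaying learning rate and radius. All names and default values (`train_som`, `best_matching_unit`, `lr0`, `sigma0`) are hypothetical choices for illustration and do not reflect the SOM Toolbox API.

```python
import math
import random


def best_matching_unit(protos, x):
    """Return the grid coordinate whose prototype is closest to sample x."""
    return min(protos, key=lambda u: sum((a - b) ** 2 for a, b in zip(protos[u], x)))


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM and return {(row, col): prototype}.

    Illustrative sketch only: sequential updates, Gaussian neighbourhood,
    linear decay of the learning rate and of the neighbourhood radius.
    """
    rng = random.Random(seed)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise every grid unit with a randomly chosen training sample.
    protos = {(r, c): list(rng.choice(data))
              for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 0.01  # keep the radius positive
            bmu = best_matching_unit(protos, x)
            for unit, w in protos.items():
                # Units near the winner on the grid are pulled more strongly.
                d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return protos
```

With two well-separated groups of points and a 1-by-2 grid, for instance, each group typically ends up represented by its own prototype, and `best_matching_unit` then acts as the quantizer that maps a sample to its grid cell.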
Visualization Techniques For Malware Behavior Analysis
Malware spread via the Internet is a great security threat, so studying its behavior is important for identifying and classifying it. Using SSDT hooking, we can obtain malware behavior by running a sample in a controlled environment and capturing its interactions with the target operating system regarding file, process, registry, network and mutex activities. This generates a chain of events that can be compared with those of other known malware. In this paper we present a simple approach to convert malware behavior into activity graphs and show some visualization techniques that can be used to analyze malware behavior, individually or grouped. © 2011 SPIE.
Methods of visualization and analysis of cardiac depolarization in the three dimensional space
This master's thesis presents methods for intelligent data analysis and visualization of 3D ECG, with the aim of increasing the efficiency of ECG analysis by extracting additional data. Visualization is treated as part of the signal analysis task, and visualization techniques and their mathematical description are considered. Algorithms for calculating and visualizing signal attributes have been developed and are described using mathematical methods and signal mining tools. A pattern search model was constructed to compare the accuracy of the methods, clustering and classification problems were solved, and a data visualization program was developed. The work confirms that this approach gives the highest accuracy in the intelligent analysis task. The visualization and analysis techniques considered are also applicable to multidimensional signals of other kinds.