19,813 research outputs found
Visualizing dimensionality reduction of systems biology data
One of the challenges in analyzing high-dimensional expression data is the
detection of important biological signals. A common approach is to apply a
dimension reduction method, such as principal component analysis. Typically,
after application of such a method the data is projected and visualized in the
new coordinate system, using scatter plots or profile plots. These methods
provide good results if the data have certain properties which become visible
in the new coordinate system and which were hard to detect in the original
coordinate system. Often however, the application of only one method does not
suffice to capture all important signals. Therefore several methods addressing
different aspects of the data need to be applied. We have developed a framework
for linear and non-linear dimension reduction methods within our visual
analytics pipeline SpRay. This includes measures that assist the interpretation
of the factorization result. Different visualizations of these measures can be
combined with functional annotations that support the interpretation of the
results. We show an application to high-resolution time series microarray data
in the antibiotic-producing organism Streptomyces coelicolor as well as to
microarray data measuring expression of cells with normal karyotype and cells
with trisomies of human chromosomes 13 and 21
VizRank: Data Visualization Guided by Machine Learning
Data visualization plays a crucial role in identifying interesting patterns in exploratory data analysis. Its use is, however, made difficult by the large number of possible data projections showing different attribute subsets that must be evaluated by the data analyst. In this paper, we introduce a method called VizRank, which is applied on classified data to automatically select the most useful data projections. VizRank can be used with any visualization method that maps attribute values to points in a two-dimensional visualization space. It assesses possible data projections and ranks them by their ability to visually discriminate between classes. The quality of class separation is estimated by computing the predictive accuracy of k-nearest neighbor classifier on the data set consisting of x and y positions of the projected data points and their class information. The paper introduces the method and presents experimental results which show that VizRank's ranking of projections highly agrees with subjective rankings by data analysts. The practical use of VizRank is also demonstrated by an application in the field of functional genomics
Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context
Mathematical formulae represent complex semantic information in a concise
form. Especially in Science, Technology, Engineering, and Mathematics,
mathematical formulae are crucial to communicate information, e.g., in
scientific papers, and to perform computations using computer algebra systems.
Enabling computers to access the information encoded in mathematical formulae
requires machine-readable formats that can represent both the presentation and
content, i.e., the semantics, of formulae. Exchanging such information between
systems additionally requires conversion methods for mathematical
representation formats. We analyze how the semantic enrichment of formulae
improves the format conversion process and show that considering the textual
context of formulae reduces the error rate of such conversions. Our main
contributions are: (1) providing an openly available benchmark dataset for the
mathematical format conversion task consisting of a newly created test
collection, an extensive, manually curated gold standard and task-specific
evaluation metrics; (2) performing a quantitative evaluation of
state-of-the-art tools for mathematical format conversions; (3) presenting a
new approach that considers the textual context of formulae to reduce the error
rate for mathematical format conversions. Our benchmark dataset facilitates
future research on mathematical format conversions as well as research on many
problems in mathematical information retrieval. Because we annotated and linked
all components of formulae, e.g., identifiers, operators and other entities, to
Wikidata entries, the gold standard can, for instance, be used to train methods
for formula concept discovery and recognition. Such methods can then be applied
to improve mathematical information retrieval systems, e.g., for semantic
formula search, recommendation of mathematical content, or detection of
mathematical plagiarism.Comment: 10 pages, 4 figure
- …