395 research outputs found
Performance Evaluation and Optimization of Math-Similarity Search
Similarity search in math is to find mathematical expressions that are
similar to a user's query. We conceptualized the similarity factors between
mathematical expressions, and proposed an approach to math similarity search
(MSS) by defining metrics based on those similarity factors [11]. Our
preliminary implementation indicated the advantage of MSS compared to
non-similarity based search. In order to more effectively and efficiently
search similar math expressions, MSS is further optimized. This paper focuses
on performance evaluation and optimization of MSS. Our results show that the
proposed optimization process significantly improved the performance of MSS
with respect to both relevance ranking and recall.Comment: 15 pages, 8 figure
A Markov chain model for changes in users’ assessment of search results
Previous research shows that users tend to change their assessment of search results over time. This is a first study that investigates the factors and reasons for these changes, and describes a stochastic model of user behaviour that may explain these changes. In particular, we hypothesise that most of the changes are local, i.e. between results with similar or close relevance to the query, and thus belong to the same ”coarse” relevance category. According to the theory of coarse beliefs and categorical thinking, humans tend to divide the range of values under consideration into coarse categories, and are thus able to distinguish only between cross-category values but not within them. To test this hypothesis we conducted five experiments with about 120 subjects divided into 3 groups. Each student in every group was asked to rank and assign relevance scores to the same set of search results over two or three rounds, with a period of three to nine weeks between each round. The subjects of the last three-round experiment were then exposed to the differences in their judgements and were asked to explain them. We make use of a Markov chain model to measure change in users’ judgments between the different rounds. The Markov chain demonstrates that the changes converge, and that a majority of the changes are local to a neighbouring relevance category. We found that most of the subjects were satisfied with their changes, and did not perceive them as mistakes but rather as a legitimate phenomenon, since they believe that time has influenced their relevance assessment. Both our quantitative analysis and user comments support the hypothesis of the existence of coarse relevance categories resulting from categorical thinking in the context of user evaluation of search results
Notions of Connectivity in Overlay Networks
International audience" How well connected is the network? " This is one of the most fundamental questions one would ask when facing the challenge of designing a communication network. Three major notions of connectivity have been considered in the literature, but in the context of traditional (single-layer) networks, they turn out to be equivalent. This paper introduces a model for studying the three notions of connectivity in multi-layer networks. Using this model, it is easy to demonstrate that in multi-layer networks the three notions may differ dramatically. Unfortunately, in contrast to the single-layer case, where the values of the three connectivity notions can be computed efficiently, it has been recently shown in the context of WDM networks (results that can be easily translated to our model) that the values of two of these notions of connectivity are hard to compute or even approximate in multi-layer networks. The current paper shed some positive light into the multi-layer connectivity topic: we show that the value of the third connectivity notion can be computed in polynomial time and develop an approximation for the construction of well connected overlay networks
Reconstruction of Network Evolutionary History from Extant Network Topology and Duplication History
Genome-wide protein-protein interaction (PPI) data are readily available
thanks to recent breakthroughs in biotechnology. However, PPI networks of
extant organisms are only snapshots of the network evolution. How to infer the
whole evolution history becomes a challenging problem in computational biology.
In this paper, we present a likelihood-based approach to inferring network
evolution history from the topology of PPI networks and the duplication
relationship among the paralogs. Simulations show that our approach outperforms
the existing ones in terms of the accuracy of reconstruction. Moreover, the
growth parameters of several real PPI networks estimated by our method are more
consistent with the ones predicted in literature.Comment: 15 pages, 5 figures, submitted to ISBRA 201
The hw-rank: an h-index variant for ranking web pages
We introduce a novel ranking of search results based on a variant of the h-index for directed information networks such as the Web. The h-index was originally
introduced to measure an individual researcher’s scientific output and influence, but here a variant of it is applied to assess the ‘‘importance’’ of web pages. Like PageRank, the‘‘importance’’ of a page is defined by the ‘‘importance’’ of the pages linking to it. However,
unlike the computation of PageRank which involves the whole web graph, computing the h-index for web pages (the hw-rank) is based on a local computation and only the
neighbors of the neighbors of the given node are considered. Preliminary results show a strong correlation between ranking with the hw-rank and PageRank, and moreover its computation is simpler and less complex than computation of the PageRank. Further, larger scale experiments are needed in order to assess the applicability of the method
Publication and patent analysis of European researchers in the field of production technology and manufacturing systems
This paper develops a structured comparison among a sample of European researchers in the field of Production Technology and Manufacturing Systems, on the basis of scientific publications and patents. Researchers are evaluated and compared by a variegated set of indicators concerning (1) the output of individual researchers and (2) that of groups of researchers from the same country. While not claiming to be exhaustive, the results of this preliminary study provide a rough indication of the publishing and patenting activity of researchers in the field of interest, identifying (dis)similarities between different countries. Of particular interest is a proposal for aggregating analysis results by means of maps based on publication and patent indicators. A large amount of empirical data are presented and discusse
Relationship among research collaboration, number of documents and number of citations. A case study in Spanish computer science production in 2000-2009.
This paper analyzes the relationship among research collaboration, number of documents and number of citations of computer science research activity. It analyzes the number of documents and citations and how they vary by number of authors. They are also analyzed (according to author set cardinality) under different circumstances, that is, when documents are written in different types of collaboration, when documents are published in different document types, when documents are published in different computer science subdisciplines, and, finally, when documents are published by journals with different impact factor quartiles. To investigate the above relationships, this paper analyzes the publications listed in the Web of Science and produced by active Spanish university professors between 2000 and 2009, working in the computer science field. Analyzing all documents, we show that the highest percentage of documents are published by three authors, whereas single-authored documents account for the lowest percentage. By number of citations, there is no positive association between the author cardinality and citation impact. Statistical tests show that documents written by two authors receive more citations per document and year than documents published by more authors. In contrast, results do not show statistically significant differences between documents published by two authors and one author. The research findings suggest that international collaboration results on average in publications with higher citation rates than national and institutional collaborations. We also find differences regarding citation rates between journals and conferences, across different computer science subdisciplines and journal quartiles as expected. Finally, our impression is that the collaborative level (number of authors per document) will increase in the coming years, and documents published by three or four authors will be the trend in computer science literature
Algebraic Comparison of Partial Lists in Bioinformatics
The outcome of a functional genomics pipeline is usually a partial list of
genomic features, ranked by their relevance in modelling biological phenotype
in terms of a classification or regression model. Due to resampling protocols
or just within a meta-analysis comparison, instead of one list it is often the
case that sets of alternative feature lists (possibly of different lengths) are
obtained. Here we introduce a method, based on the algebraic theory of
symmetric groups, for studying the variability between lists ("list stability")
in the case of lists of unequal length. We provide algorithms evaluating
stability for lists embedded in the full feature set or just limited to the
features occurring in the partial lists. The method is demonstrated first on
synthetic data in a gene filtering task and then for finding gene profiles on a
recent prostate cancer dataset
Network Archaeology: Uncovering Ancient Networks from Present-day Interactions
Often questions arise about old or extinct networks. What proteins interacted
in a long-extinct ancestor species of yeast? Who were the central players in
the Last.fm social network 3 years ago? Our ability to answer such questions
has been limited by the unavailability of past versions of networks. To
overcome these limitations, we propose several algorithms for reconstructing a
network's history of growth given only the network as it exists today and a
generative model by which the network is believed to have evolved. Our
likelihood-based method finds a probable previous state of the network by
reversing the forward growth model. This approach retains node identities so
that the history of individual nodes can be tracked. We apply these algorithms
to uncover older, non-extant biological and social networks believed to have
grown via several models, including duplication-mutation with complementarity,
forest fire, and preferential attachment. Through experiments on both synthetic
and real-world data, we find that our algorithms can estimate node arrival
times, identify anchor nodes from which new nodes copy links, and can reveal
significant features of networks that have long since disappeared.Comment: 16 pages, 10 figure
The role of secondary char oxidation in the transition from smoldering to flaming
The transition from forward smoldering to flaming in polyurethane foam is observed using in-depth thermocouples and ultrasound probing. The experiments are conducted with small parallelepiped samples vertically placed in an upward wind tunnel. Three of the vertical sample sides are maintained at elevated temperature and the fourth is exposed to an upward oxidizer flow and a radiant heat flux. An ultrasound probing technique is used to measure the line-of-sight average permeability of the sample at the same heights as the thermocouples. The smolder front propagation is tracked by both the thermocouples and ultrasound data, which show an increase in temperature and permeability upon passage of the smolder front. The permeability data also show that the transition to flaming is preceded by rapid fluctuations in permeability in the char region below the smolder front, indicating the formation of pores by secondary char oxidation. The pores provide locations for the onset of gas-phase ignition (i.e. transition to flaming). The results from all the tests indicate that the formation of pores is a necessary but not sufficient condition for the transition to flaming. Two novel measures of the intensity of the secondary char oxidation are introduced: the time derivative of permeability, and the secondary char oxidation velocity. The time derivative of permeability, which provides a measure of the pore formation rate, is found to increase as the oxygen concentration and/or radiant heat flux increase, and to indicate the likelihood of the transition to flaming. The permeability data offers a means to track the propagation of the secondary char oxidation, and to calculate the secondary char oxidation velocity, which is found to be strongly correlated to the transition to flaming. A simplified energy balance model is able to predict the dependence of the secondary char oxidation velocity on oxygen concentration and radiant heat flux
- …
