2,251 research outputs found
Diversification Based Static Index Pruning - Application to Temporal Collections
Nowadays, web archives preserve the history of large portions of the web. As
media shift from printed to digital editions, accessing these huge
information sources is drawing increasing attention from national and
international institutions, as well as from the research community. These
collections are intrinsically big, leading to index files that do not fit into
memory and to increased query response times. Decreasing the index size is a
direct way to decrease query response time.
Static index pruning methods reduce the size of indexes by removing a part of
the postings. In the context of web archives, it is necessary to remove
postings while preserving the temporal diversity of the archive. None of the
existing pruning approaches take (temporal) diversification into account.
In this paper, we propose a diversification-based static index pruning
method. It differs from the existing pruning approaches by integrating
diversification within the pruning context. We aim to prune the index while
preserving retrieval effectiveness and diversity by maximizing a given IR
evaluation metric such as DCG. We show how to apply this approach in the
context of web archives. Finally, we show on two collections that search
effectiveness in temporal collections after pruning can be improved using our
approach rather than diversity-oblivious approaches.
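For reference, the DCG metric named in this abstract can be computed as in the minimal sketch below. It assumes the common log2 rank discount and graded relevance scores; the function name `dcg` and its parameters are illustrative and do not come from the paper, which may use a different variant.

```python
import math

def dcg(relevances, k=None):
    """Discounted cumulative gain of a ranked list of graded relevance scores.

    Assumes the common formulation sum(rel_i / log2(i + 1)) over 1-based ranks i.
    """
    ranked = relevances if k is None else relevances[:k]
    # Rank 1 is divided by log2(2) = 1, so the top result is not discounted.
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(ranked, start=1))
```

A pruning method that maximizes such a metric would keep the postings whose removal costs the most DCG; how the paper does this is not specified in the abstract.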
Mining Brain Networks using Multiple Side Views for Neurological Disorder Identification
Mining discriminative subgraph patterns from graph data has attracted great
interest in recent years. It has a wide variety of applications in disease
diagnosis, neuroimaging, etc. Most research on subgraph mining focuses on the
graph representation alone. However, in many real-world applications, the side
information is available along with the graph data. For example, for
neurological disorder identification, in addition to the brain networks derived
from neuroimaging data, hundreds of clinical, immunologic, serologic and
cognitive measures may also be documented for each subject. These measures
compose multiple side views encoding a tremendous amount of supplemental
information for diagnostic purposes, yet are often ignored. In this paper, we
study the problem of discriminative subgraph selection using multiple side
views and propose a novel solution to find an optimal set of subgraph features
for graph classification by exploring a plurality of side views. We derive a
feature evaluation criterion, named gSide, to estimate the usefulness of
subgraph patterns based upon side views. Then we develop a branch-and-bound
algorithm, called gMSV, to efficiently search for optimal subgraph features by
integrating the subgraph mining process and the procedure of discriminative
feature selection. Empirical studies on graph classification tasks for
neurological disorders using brain networks demonstrate that subgraph patterns
selected by the multi-side-view guided subgraph selection approach can
effectively boost graph classification performances and are relevant to disease
diagnosis.
Comment: in Proceedings of IEEE International Conference on Data Mining (ICDM)
201
Diversified top-k clique search
© 2015, Springer-Verlag Berlin Heidelberg. Maximal clique enumeration is a fundamental problem in graph theory and has been extensively studied. However, maximal clique enumeration is time-consuming in large graphs and always returns an enormous number of cliques with large overlaps. Motivated by this, in this paper, we study the diversified top-k clique search problem, which is to find the top-k cliques that together cover the largest number of nodes in the graph. Diversified top-k clique search can be used in many applications, including community search, motif discovery, and anomaly detection in large graphs. A naive solution for diversified top-k clique search is to keep all maximal cliques in memory and then find k of them that cover the most nodes in the graph by using the approximate greedy max k-cover algorithm. However, such a solution is impractical when the graph is large. In this paper, instead of keeping all maximal cliques in memory, we devise an algorithm to maintain k candidates in the process of maximal clique enumeration. Our algorithm has a limited memory footprint and can achieve a guaranteed approximation ratio. We also introduce a novel light-weight (Formula presented.) - (Formula presented.) , based on which we design an optimal maximal clique maintenance algorithm. We further explore three optimization strategies to avoid enumerating all maximal cliques and thus largely reduce the computational cost. In addition, for massive input graphs, we develop an I/O-efficient algorithm to tackle the problem when the input graph cannot fit in main memory. We conduct extensive performance studies on real graphs and synthetic graphs. One of the real graphs contains 1.02 billion edges. The results demonstrate the high efficiency and effectiveness of our approach.
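The naive baseline mentioned in this abstract, greedy max k-cover over an already enumerated list of maximal cliques, can be sketched as follows. This is only the in-memory baseline, not the paper's candidate-maintenance algorithm; the function name `greedy_max_k_cover` and the input representation (cliques as sets of node ids) are assumptions made for illustration.

```python
def greedy_max_k_cover(cliques, k):
    """Greedily pick up to k cliques, each time taking the one that adds
    the most not-yet-covered nodes.

    Returns (chosen cliques in pick order, set of covered nodes).
    """
    remaining = [frozenset(c) for c in cliques]
    covered = set()
    chosen = []
    for _ in range(k):
        # Marginal gain of a clique = number of its nodes not covered so far.
        best = max(remaining, key=lambda c: len(c - covered), default=None)
        if best is None or not (best - covered):
            break  # nothing left that would cover a new node
        chosen.append(best)
        covered |= best
        remaining.remove(best)
    return chosen, covered


chosen, covered = greedy_max_k_cover([{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 2}], k=2)
```

The greedy rule is the standard approximation for max k-cover, with a known ratio of 1 - 1/e. It needs every clique in memory, which is the cost the abstract says its approach avoids.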
Diversified spatial keyword search on road networks
With the increasing pervasiveness of geo-positioning technologies, there is an enormous amount of spatio-textual objects available in many applications such as location-based services and social networks. Consequently, various types of spatial keyword searches, which explore both the locations and the textual descriptions of the objects, have been intensively studied by the research communities and commercial organizations. In many important applications (e.g., location-based services), the closeness of two spatial objects is measured by the road network distance. Moreover, result diversification is becoming a common practice to enhance the quality of the search results. Motivated by the above facts, in this paper we study the problem of diversified spatial keyword search on road networks, which considers both the relevance and the spatial diversity of the results. An efficient signature-based inverted indexing technique is proposed to facilitate spatial keyword query processing on road networks. Then we develop an efficient diversified spatial keyword search algorithm by taking advantage of spatial keyword pruning and diversity pruning techniques. Comprehensive experiments on real and synthetic data clearly demonstrate the efficiency of our methods.
Top-L Most Influential Community Detection Over Social Networks (Technical Report)
In many real-world applications such as social network analysis and online
marketing/advertising, community detection is a fundamental task to
identify communities (subgraphs) in social networks with high structural
cohesiveness. While previous works focus on detecting communities alone, they
do not consider the collective influences of users in these communities on
other user nodes in social networks. Inspired by this, in this paper, we
investigate the influence propagation from some seed communities and their
influential effects that result in the influenced communities. We propose a
novel problem, named Top-L most Influential Community DEtection (TopL-ICDE)
over social networks, which aims to retrieve top-L seed communities with the
highest influences, having high structural cohesiveness, and containing
user-specified query keywords. In order to efficiently tackle the TopL-ICDE
problem, we design effective pruning strategies to filter out false alarms of
seed communities and propose an effective index mechanism to facilitate
efficient Top-L community retrieval. We develop an efficient TopL-ICDE
answering algorithm by traversing the index and applying our proposed pruning
strategies. We also formulate and tackle a variant of TopL-ICDE, named
diversified top-L most influential community detection (DTopL-ICDE), which
returns a set of L diversified communities with the highest diversity score
(i.e., collaborative influences by L communities). We prove that DTopL-ICDE is
NP-hard, and propose an efficient greedy algorithm with our designed diversity
score pruning. Through extensive experiments, we verify the efficiency and
effectiveness of our proposed TopL-ICDE and DTopL-ICDE approaches over
real/synthetic social networks under various parameter settings.
Protecting your software updates
As described in many blog posts and the scientific literature, exploits for software vulnerabilities are often engineered on the basis of patches, which often involves the manual or automated identification of vulnerable code. The authors evaluate how this identification can be automated with the most frequently referenced diffing tools, demonstrating that for certain types of patches, these tools are indeed effective attacker tools. But they also demonstrate that binary code diversification can severely diminish the effectiveness of these tools, thus closing much of the attacker's window of opportunity.
A Global Optimisation Toolbox for Massively Parallel Engineering Optimisation
A software platform for global optimisation, called PaGMO, has been developed
within the Advanced Concepts Team (ACT) at the European Space Agency, and was
recently released as an open-source project. PaGMO is built to tackle
high-dimensional global optimisation problems, and it has been successfully
used to find solutions to real-life engineering problems among which the
preliminary design of interplanetary spacecraft trajectories - both chemical
(including multiple flybys and deep-space maneuvers) and low-thrust (limited,
at the moment, to single phase trajectories), the inverse design of
nano-structured radiators and the design of non-reactive controllers for
planetary rovers. Featuring an arsenal of global and local optimisation
algorithms (including genetic algorithms, differential evolution, simulated
annealing, particle swarm optimisation, compass search, improved harmony
search, and various interfaces to libraries for local optimisation such as
SNOPT, IPOPT, GSL and NLopt), PaGMO is at its core a C++ library which employs
an object-oriented architecture providing a clean and easily-extensible
optimisation framework. Adoption of multi-threaded programming ensures the
efficient exploitation of modern multi-core architectures and allows for a
straightforward implementation of the island model paradigm, in which multiple
populations of candidate solutions asynchronously exchange information in order
to speed up and improve the optimisation process. In addition to the C++
interface, PaGMO's capabilities are exposed to the high-level language Python,
so that it is possible to easily use PaGMO in an interactive session and take
advantage of the numerous scientific Python libraries available.
Comment: To be presented at 'ICATT 2010: International Conference on
Astrodynamics Tools and Techniques'
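The island model paradigm described in this abstract can be illustrated with the toy sketch below. It does not use PaGMO or its API: it is a plain-Python, single-threaded simulation in which islands evolve in turn and then exchange their best individuals around a ring, whereas the abstract describes asynchronous, multi-threaded exchange. All names (`sphere`, `evolve`, `island_model`) and parameter values are hypothetical.

```python
import random

def sphere(x):
    """Toy objective to minimise: sum of squares, with optimum 0 at the origin."""
    return sum(v * v for v in x)

def evolve(pop, rng, steps=50, sigma=0.1):
    """Mutate a random individual; the child replaces the worst one if it is better."""
    for _ in range(steps):
        parent = rng.choice(pop)
        child = [v + rng.gauss(0, sigma) for v in parent]
        worst = max(range(len(pop)), key=lambda i: sphere(pop[i]))
        if sphere(child) < sphere(pop[worst]):
            pop[worst] = child
    return pop

def island_model(n_islands=4, pop_size=8, dim=3, rounds=10, seed=0):
    """Evolve several populations and migrate the best individuals around a ring."""
    rng = random.Random(seed)
    islands = [[[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    initial_best = min(sphere(x) for pop in islands for x in pop)
    for _ in range(rounds):
        for pop in islands:
            evolve(pop, rng)
        # Migration: island i receives the best individual of island i - 1.
        bests = [min(pop, key=sphere) for pop in islands]
        for i, pop in enumerate(islands):
            migrant = bests[i - 1]
            worst = max(range(len(pop)), key=lambda j: sphere(pop[j]))
            if sphere(migrant) < sphere(pop[worst]):
                pop[worst] = list(migrant)
    final_best = min(sphere(x) for pop in islands for x in pop)
    return initial_best, final_best
```

Because an individual is only ever replaced by a strictly better one, the best objective value across all islands never gets worse from round to round.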