Finding the most relevant fragments in networks
We study a point pattern detection problem on networks, motivated by applications in geographical analysis, such as crime hotspot detection. Given a network N (a connected graph with non-negative edge lengths) together with a set of sites, which lie on the edges or vertices of N, we look for a connected subnetwork F of N of small total length that contains many sites. The edges of F can form parts of the edges of N. We consider different variants of this problem where N is either a general graph or restricted to a tree, and the subnetwork F that we are looking for is either a simple path or a tree. We give polynomial-time algorithms, NP-hardness and NP-completeness proofs, approximation algorithms, and fixed-parameter tractable algorithms.
Entity Ranking on Graphs: Studies on Expert Finding
Today's web search engines try to offer services for finding various kinds of information beyond simple web pages, such as showing locations or answering simple fact queries. Understanding the association between named entities and documents is one of the key steps towards such semantic search tasks. This paper addresses the ranking of entities and models it in a graph-based relevance propagation framework. In particular, we study the problem of expert finding as an example of an entity ranking task. Entity containment graphs are introduced that represent the relationship between text fragments on the one hand and their contained entities on the other. The paper shows how these graphs can be used to propagate relevance information from the pre-ranked text fragments to their entities. We use this propagation framework to model existing approaches to expert finding based on the entity's indegree and extend them by recursive relevance propagation based on a probabilistic random walk over the entity containment graphs. Experiments on the TREC expert search task compare the retrieval performance of the different graph and propagation models.
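The two propagation models the abstract contrasts (indegree versus recursive random-walk propagation) can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the fragment scores, entity names, and damping parameter are invented assumptions.

```python
# Sketch of entity ranking via an entity containment graph: pre-ranked text
# fragments propagate relevance to the entities (candidate experts) they contain.

def indegree_scores(containment, fragment_relevance):
    """Baseline: an entity's score is the summed relevance of the
    pre-ranked fragments that contain it (the indegree model)."""
    scores = {}
    for fragment, entities in containment.items():
        for entity in entities:
            scores[entity] = scores.get(entity, 0.0) + fragment_relevance[fragment]
    return scores

def random_walk_scores(containment, fragment_relevance, steps=10, damping=0.85):
    """Extension: recursive relevance propagation, alternating random-walk
    steps between fragments and the entities they contain."""
    scores = indegree_scores(containment, fragment_relevance)
    for _ in range(steps):
        # entity -> fragment: each fragment collects score from its entities
        frag_score = {
            fragment: sum(scores.get(e, 0.0) / len(entities) for e in entities)
            for fragment, entities in containment.items()
        }
        # fragment -> entity: redistribute, mixing in the prior fragment relevance
        new_scores = {}
        for fragment, entities in containment.items():
            share = (damping * frag_score[fragment]
                     + (1 - damping) * fragment_relevance[fragment])
            for entity in entities:
                new_scores[entity] = new_scores.get(entity, 0.0) + share / len(entities)
        scores = new_scores
    return scores

# Toy data: two retrieved text fragments and the candidate experts they mention.
relevance = {"doc1": 0.9, "doc2": 0.4}
contains = {"doc1": ["alice", "bob"], "doc2": ["bob"]}
print(indegree_scores(contains, relevance))
```

In both models an expert mentioned in several relevant fragments ("bob" above) outranks one mentioned in a single fragment; the random-walk variant additionally lets relevance flow back through shared fragments.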
Distributed Computing on Core-Periphery Networks: Axiom-based Design
Inspired by social networks and complex systems, we propose a core-periphery
network architecture that supports fast computation for many distributed
algorithms and is robust and efficient in number of links. Rather than
providing a concrete network model, we take an axiom-based design approach. We
provide three intuitive (and independent) algorithmic axioms and prove that any
network that satisfies all axioms enjoys an efficient algorithm for a range of
tasks (e.g., MST, sparse matrix multiplication, etc.). We also show the
minimality of our axiom set: for networks that satisfy any subset of the
axioms, the same efficiency cannot be guaranteed for any deterministic
algorithm.
Optimizing MDS Codes for Caching at the Edge
In this paper we investigate the problem of optimal MDS-encoded cache
placement at the wireless edge to minimize the backhaul rate in heterogeneous
networks. We derive the backhaul rate performance of any caching scheme based
on file splitting and MDS encoding and we formulate the optimal caching scheme
as a convex optimization problem. We then thoroughly investigate the
performance of this optimal scheme for an important heterogeneous network
scenario. We compare it to several other caching strategies and we analyze the
influence of the system parameters, such as the popularity and size of the
library files and the capabilities of the small-cell base stations, on the
overall performance of our optimal caching strategy. Our results show that the
careful placement of MDS-encoded content in caches at the wireless edge leads
to a significant decrease of the load of the network backhaul and hence to a
considerable performance enhancement of the network.
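The core idea can be illustrated with a deliberately simplified single-cache model (not the paper's multi-cell convex program; popularities, sizes, and capacity below are invented toy numbers): MDS encoding lets a cache store an arbitrary fraction of each file, and a request fetches only the missing fraction over the backhaul.

```python
# Simplified illustration of MDS-coded cache placement: with file splitting
# and MDS encoding, a cache can hold any fraction of a file, and only the
# un-cached remainder travels over the backhaul.

def backhaul_rate(popularity, sizes, cached_fraction):
    """Expected backhaul traffic per request: each file contributes its
    request probability times its un-cached share."""
    return sum(p * s * (1.0 - c)
               for p, s, c in zip(popularity, sizes, cached_fraction))

def greedy_placement(popularity, sizes, capacity):
    """For this linear single-cache toy model, caching the most popular
    bytes first is optimal; the paper's convex optimization handles the
    richer heterogeneous multi-cell setting."""
    frac = [0.0] * len(sizes)
    remaining = capacity
    for f in sorted(range(len(sizes)), key=lambda i: popularity[i], reverse=True):
        take = min(sizes[f], remaining)
        frac[f] = take / sizes[f]
        remaining -= take
        if remaining <= 0:
            break
    return frac

# Four equal-size files with Zipf-like popularity; the cache holds 1.5 files.
pop, sizes = [0.5, 0.25, 0.15, 0.10], [1.0, 1.0, 1.0, 1.0]
frac = greedy_placement(pop, sizes, 1.5)
print(frac, backhaul_rate(pop, sizes, frac))  # most popular file fully cached
```

Without MDS coding the cache could only hold whole files; fractional placement is what makes the optimization continuous (and, in the paper's general formulation, convex).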
Evaluation of forensic DNA traces when propositions of interest relate to activities: analysis and discussion of recurrent concerns
When forensic scientists evaluate and report on the probative strength of single DNA traces, they commonly rely on only one number, expressing the rarity of the DNA profile in the population of interest. This is so because the focus is on propositions regarding the source of the recovered trace material, such as “the person of interest is the source of the crime stain.” In particular, when the alternative proposition is “an unknown person is the source of the crime stain,” one is directed to think about the rarity of the profile. However, in the era of DNA profiling technology capable of producing results from small quantities of trace material (i.e., non-visible staining) that is subject to easy and ubiquitous modes of transfer, the issue of source is becoming less central, to the point that it is often not contested. There is now a shift from the question “whose DNA is this?” to the question “how did it get there?” As a consequence, recipients of expert information are now very much in need of assistance with the evaluation of the meaning and probative strength of DNA profiling results when the competing propositions of interest refer to different activities. This need is widely demonstrated in day-to-day forensic practice and is also voiced in specialized literature. Yet many forensic scientists remain reluctant to assess their results given propositions that relate to different activities. Some scientists consider evaluations beyond the issue of source as being overly speculative, because of the lack of relevant data and knowledge regarding phenomena and mechanisms of transfer, persistence and background of DNA. Similarly, encouragements to deal with these activity issues, expressed in a recently released European guideline on evaluative reporting (Willis et al., 2015), which highlights the need for rethinking current practice, are sometimes viewed skeptically or are not considered feasible. 
In this discussion paper, we select and discuss recurrent skeptical views brought to our attention, as well as some of the alternative solutions that have been suggested. We will argue that the way forward is to address now, rather than later, the challenges associated with the evaluation of DNA results (from small quantities of trace material) in light of different activities, to prevent them from being misrepresented in court.
Dynamic load balancing for the distributed mining of molecular structures
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of
methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the
past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially
render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to
discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no
reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic
partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.
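The receiver-initiated scheme described above can be sketched in a single-process simulation. The branching rule, worker count, and donation policy (give away half of the pending branches) are illustrative assumptions, not the paper's peer-to-peer protocol.

```python
# Sketch of receiver-initiated load balancing on an irregular search tree:
# an idle worker (the receiver) asks a busy peer to donate part of its
# unexplored branches, since no reliable workload prediction exists.
import random

def expand(node):
    """Stand-in for extending a subgraph candidate: branching is
    deterministic per node but highly irregular, like the real search."""
    depth, seed = node
    if depth >= 6:
        return []
    rng = random.Random(seed)
    return [(depth + 1, rng.randrange(1 << 30)) for _ in range(rng.randrange(4))]

def run(num_workers=4, root=(0, 42)):
    stacks = [[] for _ in range(num_workers)]
    stacks[0].append(root)
    explored = 0
    while any(stacks):
        for w in range(num_workers):
            if not stacks[w]:
                # receiver-initiated step: the idle worker picks a busy
                # peer and takes half of its pending branches
                busy = [v for v in range(num_workers) if len(stacks[v]) > 1]
                if busy:
                    donor = busy[explored % len(busy)]
                    half = len(stacks[donor]) // 2
                    stacks[w], stacks[donor] = (stacks[donor][:half],
                                                stacks[donor][half:])
                continue
            explored += 1
            stacks[w].extend(expand(stacks[w].pop()))
    return explored

print(run())  # explores the same node count as a purely sequential search
```

Because each search-tree node is expanded exactly once regardless of which worker holds it, the balanced run visits exactly as many nodes as a sequential depth-first exploration, only spread across workers.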
Flow Logic
Flow networks have attracted a lot of research in computer science. Indeed,
many questions in numerous application areas can be reduced to questions about
flow networks. Many of these applications would benefit from a framework in
which one can formally reason about properties of flow networks that go beyond
their maximal flow. We introduce Flow Logics: modal logics that treat flow
functions as explicit first-order objects and enable the specification of rich
properties of flow networks. The syntax of our logic BFL* (Branching Flow
Logic) is similar to the syntax of the temporal logic CTL*, except that atomic
assertions may be flow propositions, like ≥ γ or ≤ γ for a threshold γ, which refer to the value of the flow in a vertex, and
that first-order quantification can be applied both to paths and to flow
functions. We present an exhaustive study of the theoretical and practical
aspects of BFL*, as well as extensions and fragments of it. Our extensions
include flow quantifications that range over non-integral flow functions or
over maximal flow functions, path quantification that ranges over paths along
which non-zero flow travels, past operators, and first-order quantification of
flow values. We focus on the model-checking problem and show that it is
PSPACE-complete, as it is for CTL*. Handling of flow quantifiers, however,
increases the complexity in terms of the network, even
for the LFL and BFL fragments, which are the flow-counterparts of LTL and CTL.
We are still able to point to a useful fragment of BFL* for which the
model-checking problem can be solved in polynomial time. Finally, we introduce
and study the query-checking problem for BFL*, where under-specified BFL*
formulas are used for network exploration
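The kind of property a BFL*-style flow proposition constrains can be made concrete with a standard maximum-flow computation. This is a hedged sketch: the graph, threshold, and helper names are invented for illustration, and Edmonds-Karp here stands in for any max-flow algorithm.

```python
# Compute a maximum flow (Edmonds-Karp) and check a flow proposition of the
# form "the flow through vertex v is >= gamma" on the resulting flow function.
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp on a capacity dict {(u, v): c}; returns (value, flow),
    where flow is antisymmetric: flow[(v, u)] == -flow[(u, v)]."""
    f, adj = {}, {}
    for (u, v) in cap:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    def residual(u, v):
        return cap.get((u, v), 0) - f.get((u, v), 0)

    while True:
        parent, q = {s: None}, deque([s])
        while q and t not in parent:          # BFS for a shortest augmenting path
            u = q.popleft()
            for v in adj.get(u, []):
                if v not in parent and residual(u, v) > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:                   # no augmenting path left
            return sum(f.get((s, v), 0) for v in adj.get(s, [])), f
        path, v = [], t                       # push the bottleneck along the path
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(residual(u, v) for (u, v) in path)
        for (u, v) in path:
            f[(u, v)] = f.get((u, v), 0) + b
            f[(v, u)] = f.get((v, u), 0) - b

def flow_through(f, node):
    """Total flow entering a vertex -- the quantity a flow proposition
    such as '>= gamma at node' would constrain."""
    return sum(val for (u, v), val in f.items() if v == node and val > 0)

capacities = {("s", "a"): 3, ("s", "b"): 2, ("a", "t"): 2,
              ("a", "b"): 1, ("b", "t"): 3}
value, f = max_flow(capacities, "s", "t")
print(value, flow_through(f, "b") >= 2)  # max flow 5; proposition ">= 2 at b" holds
```

A logic over flows, as the abstract argues, lets one state such threshold properties (and quantify over flow functions and paths) rather than ask only for the maximal flow value.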
Hypermedia-based discovery for source selection using low-cost linked data interfaces
Evaluating federated Linked Data queries requires consulting multiple sources on the Web. Before a client can execute queries, it must discover data sources and determine which ones are relevant. Federated query execution research focuses on the actual execution, while data source discovery is often only marginally discussed, even though it has a strong impact on selecting sources that contribute to the query results. Therefore, the authors introduce a discovery approach for Linked Data interfaces based on hypermedia links and controls, and apply it to federated query execution with Triple Pattern Fragments. In addition, the authors identify quantitative metrics to evaluate this discovery approach. This article describes generic evaluation measures and results for their concrete approach. With low-cost data summaries as seeds, interfaces to eight large real-world datasets can discover each other within 7 minutes. Hypermedia-based client-side querying shows a promising gain of up to 50% in execution time, but demands algorithms that visit a higher number of interfaces to improve result completeness.
The Impact of IPv6 on Penetration Testing
In this paper we discuss the impact that the use of IPv6 has on remote penetration testing of servers and web applications. Several modifications to the penetration testing process are proposed to accommodate IPv6. Among these modifications are ways of performing fragmentation attacks, host discovery and brute-force protection. We also propose new checks for IPv6-specific vulnerabilities, such as bypassing firewalls using extension headers and reaching internal hosts through available transition mechanisms. The changes to the penetration testing process proposed in this paper can be used by security companies to make their penetration testing process applicable to IPv6 targets.
Data fragmentation for parallel transitive closure strategies
Addresses the problem of fragmenting a relation to make the parallel computation of the transitive closure efficient, based on the disconnection set approach. To better understand this design problem, the authors focus on transportation networks. These are characterized by loosely interconnected clusters of nodes with a high internal connectivity rate. Three requirements that have to be fulfilled by a fragmentation are formulated, and three different fragmentation strategies are presented, each emphasizing one of these requirements. Some test results are presented to show the performance of the various fragmentation strategies.
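The disconnection set idea can be sketched on a toy transportation network (the cities, edges, and fragments below are invented for illustration): close each fragment independently, as a parallel strategy would, then stitch the per-fragment results together through the small set of border nodes the fragments share.

```python
# Sketch of parallel-friendly transitive closure over a fragmented relation:
# per-fragment closures composed through border nodes (disconnection sets).

def closure(edges, nodes):
    """Warshall-style transitive (and reflexive) closure restricted to
    one fragment's nodes."""
    reach = {(u, v) for (u, v) in edges if u in nodes and v in nodes}
    reach |= {(n, n) for n in nodes}
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if (i, k) in reach and (k, j) in reach:
                    reach.add((i, j))
    return reach

def fragmented_closure(edges, fragments):
    """Combine per-fragment closures: any inter-fragment path must pass
    through a border node, so composing pairs through the (small)
    disconnection sets until a fixpoint yields the full closure."""
    combined = set()
    for nodes in fragments:
        combined |= closure(edges, nodes)        # independent per-fragment work
    borders = set()
    for i in range(len(fragments)):
        for j in range(i + 1, len(fragments)):
            borders |= fragments[i] & fragments[j]
    changed = True
    while changed:                               # stitch through border nodes
        changed = False
        for b in borders:
            into_b = [u for (u, v) in combined if v == b]
            out_of_b = [v for (u, v) in combined if u == b]
            for u in into_b:
                for v in out_of_b:
                    if (u, v) not in combined:
                        combined.add((u, v))
                        changed = True
    return combined

# Two loosely connected clusters sharing the border city C.
roads = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")}
frags = [{"A", "B", "C"}, {"C", "D", "E"}]
tc = fragmented_closure(roads, frags)
print(("A", "E") in tc, ("E", "A") in tc)  # True False
```

The per-fragment closures can run in parallel with no communication; only the final composition touches the disconnection sets, which is exactly why the fragmentation requirements in the paper aim to keep those sets small.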