Structured overlays: The P-Grid Network
P-Grid is a relatively mature technology, with a functional Java based implementation in place, available at the project website. The current implementation incorporates many of the ideas that have been theoretically predicted and validated with rigorous simulations. The P-Grid implementation is a live project, with continuous addition of new features and support for higher level applications, as well as fine-tuning the implementation for better performance being underway. Check here for a Java applet demonstrating the P-Grid data-structure and some of the basic algorithms.
P-Grid supports:
- Good storage load-balancing despite arbitrary load-distribution over the key-space.
- Range queries can be naturally supported and efficiently processed on P-Grid because of P-Grid abstracts a trie-structure, and supports (rather) arbitrary distribution of keys as observed in real life scenarios.
- A Self-referential directory is realized using to provide peer identity persistence over multiple sessions.
- A gossip primitive based update mechanism for keeping replicated content up-to-date.
- Easy merger of multiple P-Grids, and hence decentralized bootstrapping of the P-Grid network.
- Query-adaptive caching is easy to realize on P-Grid to provide query load-balancing where peers have restricted capacity.
Beyond DHTs: The P-Grid routing network
P-Grid abstracts a trie and resolves queries based on prefix matching. The actual topology has no hierarchy. Queries are resolved by matching prefixes. This also determines the choice of routing table entries. Each peer, for each level of the trie, maintains autonomously routing entries chosen randomly from the complementary sub-trees. In fact, multiple entries are maintained for each level at each peer to provide fault-tolerance (as well as potentially for query-load management). While not shown in the figure below, in practice, for diverse reasons including fault-tolerance and load-balancing, multiple peers are responsible for each leaf node in the P-Grid tree. These are called replicas. The replica peers maintain an independent replica sub-network and uses gossip based communication to keep the replica group up-to-date. We call the redundancy in both the replication of key-space partitions as well as the routing network together as structural replication. The figure below shows how a query is resolved by forwarding it based on prefix matching.
An overview of the P-Grid system can be found in the following:
LNCS book
chapter, Sigmod record P-grid
overview
Range queries in P-Grid
P-Grid partitions the key-space in a granularity adaptive to the load at that part of the key-space. Consequently, its possible to realize a P-Grid overlay network where each peer has similar storage load even for non-uniform load distributions. This network provably provides as efficient search of keys as traditional Distributed Hash Tables (DHTs) do. Note that in contrast to P-Grid, DHTs work efficiently only for uniform load-distributions.Hence we can use a lexicographic order preserving function to generate the keys, and still realize a load-balanced P-Grid network which supports efficient search of exact keys. Moreover, because of the preservation of lexicographic ordering, range queries can be done efficiently and precisely on P-Grid. The trie-structure of P-Grid allows different range query strategies, processed serially or parallely, trading off message overheads and query resolution latency. The figures below show the two variants of range queries on a P-Grid network.

More details on range queries in P-Grid can be found in our P2P 2005 paper.
Other features and properties of P-Grid
- Parallelized construction of load-balanced P-Grid networks.
- Self-referential directory to maintain meta-information about participating peers themselves.
- Gossip based communication primitive for replica sub-networks.
- Merger of multiple P-Grid networks.
Beyond P-Grid
Despite the very many nice properties of P-Grid, it has few shortcomings.
- Potentially high imbalance of degree distribution (connections to be maintained) at peers.
- No natural support for peer heterogeneity. The P-Grid network was designed under the paradigm of balancing various kinds of loads at different peers. It was assumed that virtual peers may be instantiated to account for heterogeneity.
We are now working on a new overlay network called
Oscar, which explicitly addresses such
heterogeneity issues while supporting the features necessary for
data-oriented applications, particularly range queries.