Scaling Peer Name Resolution with a Multi-Level Cache
To keep the sizes of the PNRP caches small, peer nodes use a multi-level cache, in which each level contains a maximum number of entries. Each level in the cache represents a one tenth smaller portion of the PNRP ID number space (2256). The lowest level in the cache contains a locally registered PNRP ID and other PNRP IDs that are numerically close to it. As a level of the cache is filled with a maximum of 20 entries, a new lower level is created. The maximum number of levels in the cache is on the order of log10(Total number of PNRP IDs in the cloud). For example, for a global cloud with 100 million PNRP IDs, there are no more than 8 (=log10(100,000,000)) levels in the cache and a similar number of hops to resolve a PNRP ID during name resolution. This mechanism allows for a distributed hash table for which an arbitrary PNRP ID can be resolved by forwarding PNRP Request messages to the next-closest peer until the peer with the corresponding CPA is found.
Figure 6 shows an example of this multi-level caching scheme for a number space of 1000 entries (0-999) in which each level of the cache represents 10 percent of the number space and can only store 4 entries (each tick mark represents a cache entry).
The result of this multi-level caching scheme is that each peer does not have to store a large amount of cache entries. Even for a large number of PNRP IDs, the local storage and network traffic to resolve an arbitrary PNRP ID is not substantial.
Note: Figure 6 is a simplification of the PNRP number space, which is actually a circular name space (the first number after 999 in this example is 0.)
To ensure that resolution can complete, each time a node adds an entry to the lowest level of its cache, it floods a copy of the entry to all the nodes within the last level of the cache.
The cache entries are refreshed over time. Cache entries that are stale are removed from the cache. The result is that the distributed hash table of PNRP IDs are based on active endpoints, unlike DNS where address records and the DNS protocol provide no guarantee that the node associated with the address is actively on the network.
PNRP Cache Initialization
To initialize the PNRP cache when a peer node starts up, the node can use the following methods:
| • |
Persistent cache entries Previous cache entries that were present when the node was shut down are loaded from hard disk storage. |
| • |
PNRP seed nodes PNRP allows administrators to specify the addresses or DNS names of PNRP seed nodes that contain CPA for current participants in the cloud. |
| • |
Simple Service Discovery Protocol PNRP nodes are required to register themselves using the Universal Plug-and-Play (UPnP) Simple Service Discovery Protocol (SSDP). A node joining a cloud can use an SSDP Msearch message to locate nearby SSDP nodes. |
Graphing
A peer graph, or graph, is a set of nodes that are multiply connected to form a coupled network of nodes for the purposes of propagating data in the form of records or point-to-point data streams. Another way to think of a graph is as a collection of peer graph nodes connected such that any peer graph node may communicate with all other graph nodes via a series of logical neighbor connections. A peer graph node is a peer connected to a peer graph.
A peer graph is built and based on flooding. Flooding is the process of propagating a record to all users connected to a graph. A flooding protocol is used to do the following:
| • |
Propagate the addition of new records to all the nodes of the graph. |
| • |
Propagate the updates of changed records to all nodes of the graph. |
| • |
Propagate the deletion of deleted records to all the nodes of the graph. |
To perform these functions, each flooded record that is identified by a globally unique identifier (GUID), has an increasing version number or sequence number, and is further qualified by an age or a status.
In addition, a synchronization process ensures that peers have the same set of records, which can result in the flooding of more records.
A well-connected graph has the following properties:
| • |
It is connected. There is a path between any two nodes, |
| • |
It has a small diameter. There are a relatively small number of hops between the nodes on the farthest edges of the graph. The benefit to a small diameter is that updates are propagated rapidly to all graph nodes. |
| • |
It is robust. The graph remains connected even if some nodes or some connections disappear. |
A graph is built based the connections of neighbors. A neighbor in a graph is a peer graph node that is one graph hop away (is directly connected via a TCP connection). A graph hop is a logical connection that operates above the Internet layer, and can therefore be one or multiple router hops away.
A node ID is a random number a peer graph node chooses when they connect to a peer graph. The node ID should be unique across the graph. A graph is identified by a graph signature, which is the lowest node ID of all graph nodes connected to a peer graph. The graph signature is used to detect breaks in the graph known as partitions.
Graph Maintenance
The flooding protocol already defines how information is flooded throughout the graph. The graph maintenance protocol defines how the group evolves to maintain robust connectivity and to maintain a small diameter. This is done through the following:
| • |
A signature procedure computes the signature of a group. If the group is partitioned, each of the partitions will have a different signature. This can be used to detect that two or more partitions need to be repaired. Designated nodes in the graph known as contacts keep track of the signature records. Contacts are elected randomly. |
| • |
A reconnection procedure allows nodes to establish appropriate connections. |
| • |
A disconnection procedure allows nodes to leave a graph without creating a hole in the graph. |
| • |
When information is flooded across the graph, a graph node that has multiple connections will receive multiple copies of it. To decide which connections to keep and which to remove, a graph node evaluates flooded information and calculates a bidirectional utility index, a number used to indicate the usefulness of the information sent between given connected peers. The utility index has a low value when the information sent across the connection has been consistently previously received and is of no value. |
On an ongoing basis, based on the current utility index and the information that is received during the flooding, peer nodes make adjustments in their connections to neighboring nodes. Connections are created and removed so that the graph converges to a topology that is optimal for flooding for the current traffic pattern.
