Skip to main content

DHT - Distributed Hash Tables

A distributed hash table(DHT) is a class of a decentralized distributed system that provides a lookup service similar to a hash table: (key, value) pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key.Keysare unique identifiers which map to particularvalues, which in turn can be anything from addresses, to documents, to arbitrary data. Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.

DHTs form an infrastructure that can be used to build more complex services, such as anycast, cooperative Web caching, distributed file systems, domain name services, instant messaging, multicast, and also peer-to-peerfile sharing and content distribution systems. Notable distributed networks that use DHTs include BitTorrent's distributed tracker, the Coral Content Distribution Network, the Kad network, the Storm botnet, the Tox instant messenger, Freenet, the YaCy search engine, and the InterPlanetary File System.

image

Properties

DHTs characteristically emphasize the following properties:

  • Autonomy and decentralization: the nodes collectively form the system without any central coordination.
  • Fault tolerance: the system should be reliable (in some sense) even with nodes continuously joining, leaving, and failing.
  • Scalability: the system should function efficiently even with thousands or millions of nodes. A key technique used to achieve these goals is that any one node needs to coordinate with only a few other nodes in the system -- most commonly, O(logn) of thenparticipants -- so that only a limited amount of work needs to be done for each change in membership.

Structure

The structure of a DHT can be decomposed into several main components.The foundation is an abstractkeyspace, such as the set of 160-bit strings. Akeyspace partitioningscheme splits ownership of this keyspace among the participating nodes. Anoverlay networkthen connects the nodes, allowing them to find the owner of any given key in the keyspace.

Keyspace partitioning

Most DHTs use some variant of consistent hashing or rendezvous hashing to map keys to nodes. The two algorithms appear to have been devised independently and simultaneously to solve the distributed hash table problem.

Both consistent hashing and rendezvous hashing have the essential property that removal or addition of one node changes only the set of keys owned by the nodes with adjacent IDs, and leaves all other nodes unaffected. Contrast this with a traditional hash table in which addition or removal of one bucket causes nearly the entire keyspace to be remapped. Since any change in ownership typically corresponds to bandwidth-intensive movement of objects stored in the DHT from one node to another, minimizing such reorganization is required to efficiently support high rates of churn (node arrival and failure).

  • Consistent Hashing
  • Rendezvous Hashing
  • Locality-preserving Hashing

Overlay network

Each node maintains a set of links to other nodes (itsneighborsor routing table). Together, these links form the overlay network. A node picks its neighbors according to a certain structure, called the network's topology.

All DHT topologies share some variant of the most essential property: for any keyk, each node either has a node ID that ownskor has a link to a node whose node ID isclosertok, in terms of the keyspace distance defined above. It is then easy to route a message to the owner of any keykusing the following greedy algorithm(that is not necessarily globally optimal): at each step, forward the message to the neighbor whose ID is closest tok. When there is no such neighbor, then we must have arrived at the closest node, which is the owner ofkas defined above. This style of routing is sometimes called key-based routing.

Beyond basic routing correctness, two important constraints on the topology are to guarantee that the maximum number of hops in any route (route length) is low, so that requests complete quickly; and that the maximum number of neighbors of any node (maximum node degree) is low, so that maintenance overhead is not excessive. Of course, having shorter routes requires higher maximum degree. Some common choices for maximum degree and route length are as follows, wherenis the number of nodes in the DHT, using Big O notation:

Max. degreeMax route lengthUsed inNote
O(1)O(n)Worst lookup lengths, with likely much slower lookups times
O(log n)O(log n)Chord, Kademlia, Pastry, TapestryMost common, but not optimal (degree/route length). Chord is the most basic version, with Kademlia seeming the most popular optimized variant (should have improved average lookup)
O(log n)O(log n / log(log(n))KoordeLikely would be more complex to implement, but lookups might be faster (have a lower worst case bound)
O(sqrt(x))O(1)Worst local storage needs, with lots of communication after any node connects or disconnects

References

https://en.wikipedia.org/wiki/Distributed_hash_table

Kademlia