Keep Alive & Client Take-Over
The problem of half-open TCP connections
MQTT is based on the Transmission Control Protocol (TCP). This protocol ensures that packets are transferred over the internet in a "reliable, ordered, and error-checked" way. Nevertheless, from time to time, the transfer between communicating parties can get out of sync. For example, if one of the parties crashes or has transmission errors. In TCP, this state of incomplete connection is called a half-open connection. The important point to remember is that one side of the communication continues to function and is not notified about the failure of the other side. The side that is still connected keeps trying to send messages and waits for acknowledgements. As Andy Stanford-Clark (the inventor of the MQTT protocol) points out, the problem with half-open connections increases in mobile networks:
"Although TCP/IP in theory notifies you when a socket breaks, in practice, particularly on things like mobile and satellite links, which often "fake" TCP over the air and put headers back on at each end, it's quite possible for a TCP session to "black hole", i.e. it appears to be open still, but in fact is just dumping anything you write to it onto the floor."
MQTT Keep Alive
MQTT includes a keep alive function that provides a workaround for the issue of half-open connections (or at least makes it possible to assess if the connection is still open).
Keep alive ensures that the connection between the broker and client is still open and that the broker and the client are aware of being connected.When the client establishes a connection to the broker, the client communicates a time interval in seconds to the broker. This interval defines the maximum length of time that the broker and client may not communicate with each other.
The MQTT specification says the following:
"The Keep Alive ... is the maximum time interval that is permitted to elapse between the point at which the Client finishes transmitting one Control Packet and the point it starts sending the next. It is the responsibility of the Client to ensure that the interval between Control Packets being sent does not exceed the Keep Alive value. In the absence of sending any other Control Packets, the Client MUST send a PINGREQ Packet."
As long as messages are exchanged frequently and the keep-alive interval is not exceeded, there is no need to send an extra message to establish whether the connection is still open.
If the client does not send a messages during the keep-alive period, it must send a PINGREQ packet to the broker to confirm that it is available and to make sure that the broker is also still available.
The broker must disconnect a client that does not send a message or a PINGREQ packet in one and a half times the keep alive interval.Likewise, the client is expected to close the connection if it does not receive a response from the broker in a reasonable amount of time.
This allows the broker to detect a half-open connection and close the connection to the (already disconnected) client if the keepAlive value is exceeded by more than 150% of the value.