rfc6189.txt   rfc6189bis.txt 
Internet Engineering Task Force (IETF) P. Zimmermann Internet Engineering Task Force (IETF) P. Zimmermann
Request for Comments: 6189 Zfone Project Request for Comments: 6189bis Zfone Project
Category: Informational A. Johnston, Ed. Category: Informational A. Johnston, Ed.
ISSN: 2070-1721 Avaya ISSN: 2070-1721 Avaya
J. Callas J. Callas
Apple, Inc. Apple, Inc.
April 2011 April 2012
ZRTP: Media Path Key Agreement for Unicast Secure RTP ZRTP: Media Path Key Agreement for Unicast Secure RTP
Abstract Abstract
This document defines ZRTP, a protocol for media path Diffie-Hellman This document defines ZRTP, a protocol for media path Diffie-Hellman
exchange to agree on a session key and parameters for establishing exchange to agree on a session key and parameters for establishing
unicast Secure Real-time Transport Protocol (SRTP) sessions for Voice unicast Secure Real-time Transport Protocol (SRTP) sessions for Voice
over IP (VoIP) applications. The ZRTP protocol is media path keying over IP (VoIP) applications. The ZRTP protocol is media path keying
because it is multiplexed on the same port as RTP and does not because it is multiplexed on the same port as RTP and does not
skipping to change at page 1, line 47 skipping to change at page 1, line 47
This document is a product of the Internet Engineering Task Force This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has (IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Not all documents Internet Engineering Steering Group (IESG). Not all documents
approved by the IESG are a candidate for any level of Internet approved by the IESG are a candidate for any level of Internet
Standard; see Section 2 of RFC 5741. Standard; see Section 2 of RFC 5741.
Information about the current status of this document, any errata, Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at and how to provide feedback on it may be obtained at
http://www.rfc-editor.org/info/rfc6189. http://www.rfc-editor.org/info/rfc6189bis.
Copyright Notice Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
skipping to change at page 3, line 8 skipping to change at page 3, line 8
4.4.3. Multistream Mode . . . . . . . . . . . . . . . . . . 28 4.4.3. Multistream Mode . . . . . . . . . . . . . . . . . . 28
4.4.3.1. Commitment in Multistream Mode . . . . . . . . . 29 4.4.3.1. Commitment in Multistream Mode . . . . . . . . . 29
4.4.3.2. Shared Secret Calculation for Multistream Mode . 29 4.4.3.2. Shared Secret Calculation for Multistream Mode . 29
4.5. Key Derivations . . . . . . . . . . . . . . . . . . . . . 30 4.5. Key Derivations . . . . . . . . . . . . . . . . . . . . . 30
4.5.1. The ZRTP Key Derivation Function . . . . . . . . . . 31 4.5.1. The ZRTP Key Derivation Function . . . . . . . . . . 31
4.5.2. Deriving ZRTPSess Key and SAS in DH or Preshared 4.5.2. Deriving ZRTPSess Key and SAS in DH or Preshared
Modes . . . . . . . . . . . . . . . . . . . . . . . . 32 Modes . . . . . . . . . . . . . . . . . . . . . . . . 32
4.5.3. Deriving the Rest of the Keys from s0 . . . . . . . . 33 4.5.3. Deriving the Rest of the Keys from s0 . . . . . . . . 33
4.6. Confirmation . . . . . . . . . . . . . . . . . . . . . . 35 4.6. Confirmation . . . . . . . . . . . . . . . . . . . . . . 35
4.6.1. Updating the Cache of Shared Secrets . . . . . . . . 35 4.6.1. Updating the Cache of Shared Secrets . . . . . . . . 35
4.6.1.1. Cache Update Following a Cache Mismatch . . . . . 36 4.6.1.1. Cache Update Following a Cache Mismatch . . . . . 37
4.7. Termination . . . . . . . . . . . . . . . . . . . . . . . 37 4.6.1.2. Cache Update for a PBX Following a Cache
4.7.1. Termination via Error Message . . . . . . . . . . . . 37 Mismatch . . . . . . . . . . . . . . . . . . . . 38
4.7.2. Termination via GoClear Message . . . . . . . . . . . 37 4.7. Termination . . . . . . . . . . . . . . . . . . . . . . . 38
4.7.2.1. Key Destruction for GoClear Message . . . . . . . 39 4.7.1. Termination via Error Message . . . . . . . . . . . . 39
4.7.3. Key Destruction at Termination . . . . . . . . . . . 40 4.7.2. Termination via GoClear Message . . . . . . . . . . . 39
4.8. Random Number Generation . . . . . . . . . . . . . . . . 40 4.7.2.1. Key Destruction for GoClear Message . . . . . . . 40
4.9. ZID and Cache Operation . . . . . . . . . . . . . . . . . 40 4.7.3. Key Destruction at Termination . . . . . . . . . . . 41
4.9.1. Cacheless Implementations . . . . . . . . . . . . . . 42 4.8. Random Number Generation . . . . . . . . . . . . . . . . 41
5. ZRTP Messages . . . . . . . . . . . . . . . . . . . . . . . . 42 4.9. ZID and Cache Operation . . . . . . . . . . . . . . . . . 42
5.1. ZRTP Message Formats . . . . . . . . . . . . . . . . . . 44 4.9.1. Cacheless Implementations . . . . . . . . . . . . . . 43
5.1.1. Message Type Block . . . . . . . . . . . . . . . . . 44 5. ZRTP Messages . . . . . . . . . . . . . . . . . . . . . . . . 44
5.1.2. Hash Type Block . . . . . . . . . . . . . . . . . . . 45 5.1. ZRTP Message Formats . . . . . . . . . . . . . . . . . . 45
5.1.2.1. Negotiated Hash and MAC Algorithm . . . . . . . . 46 5.1.1. Message Type Block . . . . . . . . . . . . . . . . . 46
5.1.2.2. Implicit Hash and MAC Algorithm . . . . . . . . . 47 5.1.2. Hash Type Block . . . . . . . . . . . . . . . . . . . 47
5.1.3. Cipher Type Block . . . . . . . . . . . . . . . . . . 47 5.1.2.1. Negotiated Hash and MAC Algorithm . . . . . . . . 48
5.1.4. Auth Tag Type Block . . . . . . . . . . . . . . . . . 48 5.1.2.2. Implicit Hash and MAC Algorithm . . . . . . . . . 49
5.1.5. Key Agreement Type Block . . . . . . . . . . . . . . 49 5.1.3. Cipher Type Block . . . . . . . . . . . . . . . . . . 49
5.1.6. SAS Type Block . . . . . . . . . . . . . . . . . . . 51 5.1.4. Auth Tag Type Block . . . . . . . . . . . . . . . . . 50
5.1.7. Signature Type Block . . . . . . . . . . . . . . . . 52 5.1.5. Key Agreement Type Block . . . . . . . . . . . . . . 51
5.2. Hello Message . . . . . . . . . . . . . . . . . . . . . . 53 5.1.6. SAS Type Block . . . . . . . . . . . . . . . . . . . 53
5.3. HelloACK Message . . . . . . . . . . . . . . . . . . . . 55 5.1.7. Signature Type Block . . . . . . . . . . . . . . . . 54
5.4. Commit Message . . . . . . . . . . . . . . . . . . . . . 56 5.2. Hello Message . . . . . . . . . . . . . . . . . . . . . . 55
5.5. DHPart1 Message . . . . . . . . . . . . . . . . . . . . . 59 5.3. HelloACK Message . . . . . . . . . . . . . . . . . . . . 57
5.6. DHPart2 Message . . . . . . . . . . . . . . . . . . . . . 61 5.4. Commit Message . . . . . . . . . . . . . . . . . . . . . 58
5.7. Confirm1 and Confirm2 Messages . . . . . . . . . . . . . 63 5.5. DHPart1 Message . . . . . . . . . . . . . . . . . . . . . 61
5.8. Conf2ACK Message . . . . . . . . . . . . . . . . . . . . 65 5.6. DHPart2 Message . . . . . . . . . . . . . . . . . . . . . 63
5.9. Error Message . . . . . . . . . . . . . . . . . . . . . . 66 5.7. Confirm1 and Confirm2 Messages . . . . . . . . . . . . . 65
5.10. ErrorACK Message . . . . . . . . . . . . . . . . . . . . 68 5.8. Conf2ACK Message . . . . . . . . . . . . . . . . . . . . 67
5.11. GoClear Message . . . . . . . . . . . . . . . . . . . . . 68 5.9. Error Message . . . . . . . . . . . . . . . . . . . . . . 68
5.12. ClearACK Message . . . . . . . . . . . . . . . . . . . . 68 5.10. ErrorACK Message . . . . . . . . . . . . . . . . . . . . 70
5.13. SASrelay Message . . . . . . . . . . . . . . . . . . . . 69 5.11. GoClear Message . . . . . . . . . . . . . . . . . . . . . 70
5.14. RelayACK Message . . . . . . . . . . . . . . . . . . . . 71 5.12. ClearACK Message . . . . . . . . . . . . . . . . . . . . 70
5.15. Ping Message . . . . . . . . . . . . . . . . . . . . . . 72 5.13. SASrelay Message . . . . . . . . . . . . . . . . . . . . 71
5.16. PingACK Message . . . . . . . . . . . . . . . . . . . . . 73 5.14. RelayACK Message . . . . . . . . . . . . . . . . . . . . 73
6. Retransmissions . . . . . . . . . . . . . . . . . . . . . . . 74 5.15. Ping Message . . . . . . . . . . . . . . . . . . . . . . 74
7. Short Authentication String . . . . . . . . . . . . . . . . . 77 5.15.1. Rationale for Ping messages . . . . . . . . . . . . . 75
7.1. SAS Verified Flag . . . . . . . . . . . . . . . . . . . . 78 5.16. PingACK Message . . . . . . . . . . . . . . . . . . . . . 75
7.2. Signing the SAS . . . . . . . . . . . . . . . . . . . . . 79 6. Retransmissions . . . . . . . . . . . . . . . . . . . . . . . 77
7.2.1. OpenPGP Signatures . . . . . . . . . . . . . . . . . 81 7. Short Authentication String . . . . . . . . . . . . . . . . . 80
7.2.2. ECDSA Signatures with X.509v3 Certs . . . . . . . . . 82 7.1. SAS Verified Flag . . . . . . . . . . . . . . . . . . . . 80
7.2.3. Signing the SAS without a PKI . . . . . . . . . . . . 83 7.2. Signing the SAS . . . . . . . . . . . . . . . . . . . . . 82
7.3. Relaying the SAS through a PBX . . . . . . . . . . . . . 84 7.2.1. OpenPGP Signatures . . . . . . . . . . . . . . . . . 84
7.3.1. PBX Enrollment and the PBX Enrollment Flag . . . . . 87 7.2.2. ECDSA Signatures with X.509v3 Certs . . . . . . . . . 85
7.2.3. Signing the SAS without a PKI . . . . . . . . . . . . 86
8. Signaling Interactions . . . . . . . . . . . . . . . . . . . 88 7.3. Relaying the SAS through a PBX . . . . . . . . . . . . . 87
7.3.1. PBX Enrollment and the PBX Enrollment Flag . . . . . 90
7.4. Automated Methods of Authenticating the DH Exchange . . . 92
8. Signaling Interactions . . . . . . . . . . . . . . . . . . . 93
8.1. Binding the Media Stream to the Signaling Layer via 8.1. Binding the Media Stream to the Signaling Layer via
the Hello Hash . . . . . . . . . . . . . . . . . . . . . 90 the Hello Hash . . . . . . . . . . . . . . . . . . . . . 95
8.1.1. Integrity-Protected Signaling Enables 8.1.1. Integrity-Protected Signaling Enables
Integrity-Protected DH Exchange . . . . . . . . . . . 91 Integrity-Protected DH Exchange . . . . . . . . . . . 96
8.2. Deriving the SRTP Secret (srtps) from the Signaling 8.2. Deriving the SRTP Secret (srtps) from the Signaling
Layer . . . . . . . . . . . . . . . . . . . . . . . . . . 93 Layer . . . . . . . . . . . . . . . . . . . . . . . . . . 98
8.3. Codec Selection for Secure Media . . . . . . . . . . . . 94 8.3. Codec Selection for Secure Media . . . . . . . . . . . . 99
9. False ZRTP Packet Rejection . . . . . . . . . . . . . . . . . 94 9. False ZRTP Packet Rejection . . . . . . . . . . . . . . . . . 99
10. Intermediary ZRTP Devices . . . . . . . . . . . . . . . . . . 96 10. Intermediary ZRTP Devices . . . . . . . . . . . . . . . . . . 101
11. The ZRTP Disclosure Flag . . . . . . . . . . . . . . . . . . 98 10.1. On Reducing PBX MiTM Behavior . . . . . . . . . . . . . . 102
11. The ZRTP Disclosure Flag . . . . . . . . . . . . . . . . . . 104
11.1. Guidelines on Proper Implementation of the Disclosure 11.1. Guidelines on Proper Implementation of the Disclosure
Flag . . . . . . . . . . . . . . . . . . . . . . . . . . 99 Flag . . . . . . . . . . . . . . . . . . . . . . . . . . 106
12. Mapping between ZID and AOR (SIP URI) . . . . . . . . . . . . 100 12. Mapping between ZID and AOR (SIP URI) . . . . . . . . . . . . 107
13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 101 13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 108
14. Media Security Requirements . . . . . . . . . . . . . . . . . 102 14. Media Security Requirements . . . . . . . . . . . . . . . . . 109
15. Security Considerations . . . . . . . . . . . . . . . . . . . 103 15. Security Considerations . . . . . . . . . . . . . . . . . . . 111
15.1. Self-Healing Key Continuity Feature . . . . . . . . . . . 106 15.1. Self-Healing Key Continuity Feature . . . . . . . . . . . 114
16. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 108 16. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 115
17. References . . . . . . . . . . . . . . . . . . . . . . . . . 108 17. References . . . . . . . . . . . . . . . . . . . . . . . . . 115
17.1. Normative References . . . . . . . . . . . . . . . . . . 108 17.1. Normative References . . . . . . . . . . . . . . . . . . 115
17.2. Informative References . . . . . . . . . . . . . . . . . 111 17.2. Informative References . . . . . . . . . . . . . . . . . 118
1. Introduction 1. Introduction
ZRTP is a key agreement protocol that performs a Diffie-Hellman key ZRTP is a key agreement protocol that performs a Diffie-Hellman key
exchange during call setup in the media path and is transported over exchange during call setup in the media path and is transported over
the same port as the Real-time Transport Protocol (RTP) [RFC3550] the same port as the Real-time Transport Protocol (RTP) [RFC3550]
media stream which has been established using a signaling protocol media stream which has been established using a signaling protocol
such as Session Initiation Protocol (SIP) [RFC3261]. This generates such as Session Initiation Protocol (SIP) [RFC3261]. This generates
a shared secret, which is then used to generate keys and salt for a a shared secret, which is then used to generate keys and salt for a
Secure RTP (SRTP) [RFC3711] session. ZRTP borrows ideas from Secure RTP (SRTP) [RFC3711] session. ZRTP borrows ideas from
skipping to change at page 10, line 15 skipping to change at page 10, line 15
When Multistream mode is indicated in the Commit message, a call flow When Multistream mode is indicated in the Commit message, a call flow
similar to Figure 1 is used, but no DH calculation is performed by similar to Figure 1 is used, but no DH calculation is performed by
either endpoint and the DHPart1 and DHPart2 messages are omitted. either endpoint and the DHPart1 and DHPart2 messages are omitted.
The Confirm1, Confirm2, and Conf2ACK messages are still sent. Since The Confirm1, Confirm2, and Conf2ACK messages are still sent. Since
the cache is not affected during this mode, multiple Multistream ZRTP the cache is not affected during this mode, multiple Multistream ZRTP
exchanges can be performed in parallel between two endpoints. exchanges can be performed in parallel between two endpoints.
When adding additional media streams to an existing call, only When adding additional media streams to an existing call, only
Multistream mode is used. Only one DH operation is performed, just Multistream mode is used. Only one DH operation is performed, just
for the first media stream. for the first media stream. Consequently, all the media streams in
the session share the same SAS (Section 7).
4. Protocol Description 4. Protocol Description
This section begins the normative description of the protocol. This section begins the normative description of the protocol.
ZRTP MUST be multiplexed on the same ports as the RTP media packets. ZRTP MUST be multiplexed on the same ports as the RTP media packets.
To support best effort encryption from the Media Security To support best effort encryption from the Media Security
Requirements [RFC5479], ZRTP uses normal RTP/AVP profile (AVP) media Requirements [RFC5479], ZRTP uses normal RTP/AVP profile (AVP) media
lines in the initial offer/answer exchange. The ZRTP SDP attribute lines in the initial offer/answer exchange. The ZRTP SDP attribute
skipping to change at page 36, line 18 skipping to change at page 36, line 18
Section 4.9. Section 4.9.
(3) The responder MUST receive the initiator's Confirm2 message (3) The responder MUST receive the initiator's Confirm2 message
before updating the responder's cache. before updating the responder's cache.
(4) The initiator MUST receive either the responder's Conf2ACK (4) The initiator MUST receive either the responder's Conf2ACK
message or the responder's SRTP media (with a valid SRTP auth message or the responder's SRTP media (with a valid SRTP auth
tag) before updating the initiator's cache. tag) before updating the initiator's cache.
The cache update may also be affected by a cache mismatch, according The cache update may also be affected by a cache mismatch, according
to Section 4.6.1.1. to Section 4.6.1.1 or Section 4.6.1.2.
For DH mode only, before updating the retained shared secret rs1 in For DH mode only, before updating the retained shared secret rs1 in
the cache, each party first discards their old rs2 and copies their the cache, each party first discards their old rs2 and copies their
old rs1 to rs2. The old rs1 is saved to rs2 because of the risk of old rs1 to rs2. The old rs1 is saved to rs2 because of the risk of
session interruption after one party has updated his own rs1 but session interruption after one party has updated his own rs1 but
before the other party has enough information to update her own rs1. before the other party has enough information to update her own rs1.
If that happens, they may regain cache sync in the next session by If that happens, they may regain cache sync in the next session by
using rs2 (per Section 4.3). This mitigates the well-known Two using rs2 (per Section 4.3). This mitigates the well-known Two
Generals' Problem [Byzantine]. The old rs1 value is not saved in Generals' Problem [Byzantine]. The old rs1 value is not saved in
Preshared mode. Preshared mode.
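The rotation described above can be sketched as follows. This is a minimal illustration, not part of the specification: the CacheEntry class and the rotate_retained_secrets function are hypothetical names, and leaving rs2 untouched in Preshared mode is an assumption (the text says only that the old rs1 is not saved).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    """Hypothetical per-peer cache record; field names follow the text."""
    rs1: Optional[bytes] = None
    rs2: Optional[bytes] = None


def rotate_retained_secrets(entry: CacheEntry, new_rs1: bytes, dh_mode: bool) -> None:
    """Install new_rs1, keeping the old rs1 as rs2 in DH mode only."""
    if dh_mode:
        # The old rs2 is discarded and the old rs1 takes its place, so the
        # parties can regain cache sync if the peer never updates its rs1.
        entry.rs2 = entry.rs1
    # Assumption: in Preshared mode rs2 is simply left as it was.
    entry.rs1 = new_rs1
```

After a DH-mode call, an entry holding (rs1=A, rs2=B) becomes (rs1=new, rs2=A), and B is gone.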
skipping to change at page 36, line 41 skipping to change at page 36, line 41
from s0 via the ZRTP key derivation function (Section 4.5.1): from s0 via the ZRTP key derivation function (Section 4.5.1):
rs1 = KDF(s0, "retained secret", KDF_Context, 256) rs1 = KDF(s0, "retained secret", KDF_Context, 256)
Note that KDF_Context is unique for each media stream, but only the Note that KDF_Context is unique for each media stream, but only the
first media stream is permitted to update rs1. first media stream is permitted to update rs1.
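As an illustration of the rs1 derivation, the sketch below assumes the KDF construction given in Section 4.5.1 (which is not reproduced in this excerpt): an HMAC keyed with KI over a 32-bit big-endian counter fixed at 1, the label, a zero octet, the context, and the 32-bit big-endian output length in bits. It further assumes HMAC-SHA-256 as the negotiated MAC. The function names are hypothetical, and the layout should be checked against Section 4.5.1 before use.

```python
import hashlib
import hmac
import struct


def zrtp_kdf(ki: bytes, label: bytes, context: bytes, length_bits: int) -> bytes:
    """Sketch of KDF(KI, Label, Context, L), assuming HMAC-SHA-256."""
    if length_bits % 8 != 0 or not 0 < length_bits <= 256:
        raise ValueError("length_bits must be a multiple of 8, at most 256")
    counter = struct.pack(">I", 1)            # 32-bit big-endian counter, fixed at 1
    length = struct.pack(">I", length_bits)   # 32-bit big-endian output length in bits
    data = counter + label + b"\x00" + context + length
    digest = hmac.new(ki, data, hashlib.sha256).digest()
    return digest[: length_bits // 8]         # keep the leftmost L bits


def derive_rs1(s0: bytes, kdf_context: bytes) -> bytes:
    """rs1 = KDF(s0, "retained secret", KDF_Context, 256)."""
    return zrtp_kdf(s0, b"retained secret", kdf_context, 256)
```

Because KDF_Context differs per media stream, derive_rs1 yields a different value for each stream; only the value from the first stream is written to the cache.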
Each media stream has its own s0. At this point in the protocol for Each media stream has its own s0. At this point in the protocol for
each media stream, the corresponding s0 MUST be erased. each media stream, the corresponding s0 MUST be erased.
If a cache update is appropriate, subject to the above conditions and
not delayed by a cache mismatch, it should be done as follows. Both
ZRTP endpoints SHOULD commit the new rs1 to nonvolatile storage
immediately upon receiving the remote party's Confirm message. The
initiator should write the new rs1 before sending the Confirm2
message, and the responder should write the new rs1 before sending
any SRTP media. This means no SRTP media will be sent by either
party until the new rs1 is saved by both parties. After receiving
evidence that the remote party has committed the new rs1 to
nonvolatile storage, rs2 (the old value of rs1) SHOULD be discarded.
Receiving a few packets of properly formed SRTP media after the
Confirm message would be evidence that the remote party has remained
functioning long enough to commit the new rs1 to nonvolatile storage.
A brief interval (about one second of encrypted media) should be
sufficient for rs1 to be properly saved across a cluster of
distributed load-sharing PBXs that share a common cache. A good
strategy is to hold back from committing rs2 to nonvolatile storage
for this brief interval, and commit it to nonvolatile storage only if
the connection is lost during that interval, or if encrypted media
fails to appear within a reasonable time. Since this would be a rare
event, in most cases rs2 would not be saved. If rs2 is saved
unconditionally, it would have the undesirable effect of lengthening
the window of vulnerability for a MiTM attack if the cache is
captured by an attacker, as described in Section 15.1.
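The hold-back strategy for rs2 can be sketched as a small decision function. This is an illustration under stated assumptions: the names Rs2Action and decide_rs2_action are hypothetical, the one-second grace interval comes from the text above, and the five-second media timeout is an arbitrary stand-in for "a reasonable time".

```python
from enum import Enum


class Rs2Action(Enum):
    WAIT = "wait"        # keep rs2 in memory and decide later
    DISCARD = "discard"  # peer evidently saved rs1; rs2 is no longer needed
    PERSIST = "persist"  # write rs2 to nonvolatile storage as a fallback


def decide_rs2_action(connection_lost: bool,
                      secure_media_seconds: float,
                      elapsed_seconds: float,
                      grace_seconds: float = 1.0,
                      media_timeout_seconds: float = 5.0) -> Rs2Action:
    """Decide what to do with the in-memory rs2 after the new rs1 is saved."""
    if secure_media_seconds >= grace_seconds:
        # Enough encrypted media has arrived to show the peer stayed up.
        return Rs2Action.DISCARD
    if connection_lost:
        # The call dropped inside the grace interval.
        return Rs2Action.PERSIST
    if secure_media_seconds == 0 and elapsed_seconds >= media_timeout_seconds:
        # Encrypted media never appeared (timeout value is an assumption).
        return Rs2Action.PERSIST
    return Rs2Action.WAIT
```

In the common case the function returns DISCARD after about one second of SRTP media, so rs2 never reaches nonvolatile storage.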
4.6.1.1. Cache Update Following a Cache Mismatch 4.6.1.1. Cache Update Following a Cache Mismatch
If a shared secret cache mismatch (as defined in Section 4.3.2) is If a shared secret cache mismatch (as defined in Section 4.3.2) is
detected in the current session, it indicates a possible MiTM attack. detected in the current session, it indicates a possible MiTM attack.
However, there may be evidence to the contrary, if either one of the However, there may be evidence to the contrary, if either one of the
following conditions is met: following conditions is met:
o Successful use of the mechanism described in Section 8.1.1, but o Successful use of the mechanism described in Section 8.1.1, but
only if fully supported by end-to-end integrity-protected delivery only if fully supported by end-to-end integrity-protected delivery
of the a=zrtp-hash in the signaling via SIP Identity [RFC4474] or of the a=zrtp-hash in the signaling via SIP Identity [RFC4474] or
skipping to change at page 37, line 16 skipping to change at page 37, line 41
o A good signature is received and verified using the digital o A good signature is received and verified using the digital
signature feature on the SAS hash, as described in Section 7.2, if signature feature on the SAS hash, as described in Section 7.2, if
this feature is supported. this feature is supported.
If there is a cache mismatch in the absence of the aforementioned If there is a cache mismatch in the absence of the aforementioned
mitigating evidence, the cache update MUST be delayed in the current mitigating evidence, the cache update MUST be delayed in the current
session until the user verbally compares the SAS with his partner session until the user verbally compares the SAS with his partner
during the call and confirms a successful SAS verify via his user during the call and confirms a successful SAS verify via his user
interface as described in Section 7.1. If the session ends before interface as described in Section 7.1. If the session ends before
that happens, the cache update is not performed, leaving the rs1/rs2 that happens, the cache update is not performed, leaving the rs1/rs2
values unmodified in the cache. Regardless of whether a cache values unmodified in the cache. The local SAS Verified (V) flag is
mismatch occurs, s0 must still be erased. also left unmodified in this case.
This means the caches will continue to be mismatched on subsequent
calls, and the user will thus be alerted of this security condition
on every call until the SAS is verified. Or, if the cache mismatches
are caused by an actual MiTM attack instead of a cache mishap, the
alerts will continue on every call until the caches match again
because the MiTM attacker ceased his attacks. In that case, the
cache entries and related (V) flags are unscathed by the MiTM
attacker when the attacks cease. The MiTM attacker is thus foiled
from even having a denial-of-service effect on the caches.
If the user verbally compares the SAS with his partner during the
call and confirms a successful SAS verify via his user interface, the
local cache is then updated. Note that in this case rs2 (the old
value of rs1) must also be saved, to mitigate the possibility of the
remote user failing to update.
Regardless of whether a cache mismatch occurs, s0 must still be
erased.
If no cache entry exists, as is the case in the initial call, the If no cache entry exists, as is the case in the initial call, the
cache update is handled in the normal fashion. cache update is handled in the normal fashion.
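The mismatch rule in this section can be summarized as a gate on the cache update. The sketch below is illustrative only: the function and parameter names are hypothetical, and it models just the mismatch condition. The other preconditions of Section 4.6.1 (for example, receipt of Confirm2 or Conf2ACK) still apply separately.

```python
def may_update_cache(cache_entry_exists: bool,
                     cache_mismatch: bool,
                     hello_hash_integrity_protected: bool,
                     sas_signature_verified: bool,
                     user_confirmed_sas: bool) -> bool:
    """Return True if a cache mismatch does not block the cache update."""
    if not cache_entry_exists:
        # Initial call: no entry to mismatch, so normal handling applies.
        return True
    if not cache_mismatch:
        return True
    if hello_hash_integrity_protected or sas_signature_verified:
        # Mitigating evidence against a MiTM attack is present.
        return True
    # Otherwise the update waits until the user confirms the SAS verbally.
    return user_confirmed_sas
```

When the function returns False for the whole session, rs1, rs2, and the local V flag stay as they were; s0 is erased in every case.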
4.6.1.2. Cache Update for a PBX Following a Cache Mismatch
In the event of a cache mismatch, a PBX MUST NOT update the cache if
there is a pbxsecret defined on the PBX, but it does not match the
pbxsecret of the remote endpoint. Otherwise, the PBX MUST update the
cache, notwithstanding Section 4.6.1.1.
Rationale: If a ZRTP endpoint is enrolled with a PBX, it is desirable
that the PBX's cache is not easily disrupted by an attempted MiTM
attack. The enrolled phone should also not update the cache per
Section 4.6.1.1. A PBX has no human to verify the SAS, so the PBX
assumes the cache should be updated unless a pbxsecret mismatch
suggests otherwise. Note that unenrolled phones will lose cache sync
after an attempted MiTM attack, because the PBX will update the cache
during the attack.
However, this loss of cache sync for an unenrolled phone may be
easily remedied by calling an enrolled phone behind the PBX (with the
PBX acting as a MiTM) and re-verifying the SAS with a human. That
would update the cache on both the unenrolled phone and the PBX, re-
establishing cache sync.
The PBX's lack of human-assisted SAS verification following a cache
mismatch is one more reason to reduce the PBX's MiTM role whenever
possible, as explained in Section 10.1.
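The PBX rule at the start of this section can be sketched as follows. This is a hedged illustration: pbx_must_update_cache is a hypothetical name, None stands for "no pbxsecret defined", and comparing raw secrets directly is a simplification of however an implementation actually detects a pbxsecret mismatch.

```python
import hmac
from typing import Optional


def pbx_must_update_cache(local_pbxsecret: Optional[bytes],
                          remote_pbxsecret: Optional[bytes]) -> bool:
    """Apply the PBX rule after a cache mismatch has been detected."""
    if local_pbxsecret is None:
        # No pbxsecret defined on the PBX: update, since no human can
        # verify the SAS on the PBX side.
        return True
    if remote_pbxsecret is None:
        # A pbxsecret is defined locally but the endpoint offers none,
        # which counts as a mismatch.
        return False
    # Update only if the two secrets match (constant-time comparison).
    return hmac.compare_digest(local_pbxsecret, remote_pbxsecret)
```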
4.7. Termination 4.7. Termination
A ZRTP session is normally terminated at the end of a call, but it A ZRTP session is normally terminated at the end of a call, but it
may be terminated early by either the Error message or the GoClear may be terminated early by either the Error message or the GoClear
message. message.
4.7.1. Termination via Error Message 4.7.1. Termination via Error Message
The Error message (Section 5.9) is used to terminate an in-progress The Error message (Section 5.9) is used to terminate an in-progress
ZRTP exchange due to an error. The Error message contains an integer ZRTP exchange due to an error. The Error message contains an integer
skipping to change at page 42, line 12 skipping to change at page 43, line 36
note it as an unexpected security event when the next key negotiation note it as an unexpected security event when the next key negotiation
occurs between the same two parties. This means there need not be occurs between the same two parties. This means there need not be
perfectly synchronized deletion of expired secrets from the two perfectly synchronized deletion of expired secrets from the two
caches, and makes it easy to avoid a race condition that might caches, and makes it easy to avoid a race condition that might
otherwise be caused by clock skew. otherwise be caused by clock skew.
If the expiration interval is not properly agreed to by both If the expiration interval is not properly agreed to by both
endpoints, it may later result in false alarms of MiTM attacks, due endpoints, it may later result in false alarms of MiTM attacks, due
to apparent cache mismatches (Section 4.3.2). to apparent cache mismatches (Section 4.3.2).
It is essential that each cache entry have some form of human-
readable name associated with it. If cache entries are stored
without human-readable names, a MiTM attack is possible for an
attacker who has previously established cache entries with both
parties, as explained in Section 12. Users would have to do a verbal
SAS compare for every call, greatly diminishing the value of caching.
The relationship between a ZID and a SIP AOR is explained in The relationship between a ZID and a SIP AOR is explained in
Section 12. Section 12.
4.9.1. Cacheless Implementations 4.9.1. Cacheless Implementations
It is possible to implement a simplified but nonetheless useful (and It is possible to implement a simplified but nonetheless useful (and
still compliant) profile of the ZRTP protocol that does not support still compliant) profile of the ZRTP protocol that does not support
any caching of shared secrets. In this case, the users would have to any caching of shared secrets. In this case, the users would have to
rely exclusively on the verbal SAS comparison for every call. That rely exclusively on the verbal SAS comparison for every call. That
is, unless MiTM protection is provided by the mechanisms in Section is, unless MiTM protection is provided by the mechanisms in Section
skipping to change at page 49, line 5 skipping to change at page 51, line 5
short SRTP payloads. short SRTP payloads.
The Skein MAC key is computed by the SRTP key derivation function, The Skein MAC key is computed by the SRTP key derivation function,
which is also referred to as the AES-CM PRF, or pseudorandom which is also referred to as the AES-CM PRF, or pseudorandom
function. This is defined either in [RFC3711] or in [RFC6188], function. This is defined either in [RFC3711] or in [RFC6188],
depending on the selected SRTP AES key length. To compute a Skein depending on the selected SRTP AES key length. To compute a Skein
MAC key, the SRTP PRF output for the authentication key is left MAC key, the SRTP PRF output for the authentication key is left
untruncated at 256 bits, instead of the usual truncated length of 160 untruncated at 256 bits, instead of the usual truncated length of 160
bits (the key length used by HMAC-SHA1). bits (the key length used by HMAC-SHA1).
In [RFC3711], Section 9.5 prohibits the use of 32-bit auth tags for
SRTCP, regardless of the SRTP auth tag length. Accordingly, if Skein
is used for SRTP auth tags, SRTCP MUST use Skein 64-bit auth tags,
regardless of the negotiated SRTP auth tag length.
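The SRTCP rule can be expressed as a lookup from the negotiated SRTP Auth Tag Type to the SRTCP tag length. This is a sketch with a hypothetical function name. The "SK64" entry and the 80-bit value for the HMAC-SHA1 types are assumptions drawn from the full Auth Tag Type table and from RFC 3711, neither of which is reproduced in full in this excerpt.

```python
def srtcp_auth_tag_bits(srtp_auth_tag_type: str) -> int:
    """Return the SRTCP auth tag length in bits for a negotiated SRTP type."""
    if srtp_auth_tag_type in ("SK32", "SK64"):
        # Skein: SRTCP uses 64-bit tags regardless of the SRTP tag length.
        return 64
    if srtp_auth_tag_type in ("HS32", "HS80"):
        # Assumption: HMAC-SHA1 SRTCP tags are 80 bits, per RFC 3711.
        return 80
    raise ValueError("unknown auth tag type: %r" % srtp_auth_tag_type)
```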
Auth Tag Type Block | Meaning Auth Tag Type Block | Meaning
---------------------------------------------------------- ----------------------------------------------------------
"HS32" | 32-bit authentication tag based on "HS32" | 32-bit authentication tag based on
| HMAC-SHA1 as defined in RFC 3711. | HMAC-SHA1 as defined in RFC 3711.
---------------------------------------------------------- ----------------------------------------------------------
"HS80" | 80-bit authentication tag based on "HS80" | 80-bit authentication tag based on
| HMAC-SHA1 as defined in RFC 3711. | HMAC-SHA1 as defined in RFC 3711.
---------------------------------------------------------- ----------------------------------------------------------
"SK32" | 32-bit authentication tag based on "SK32" | 32-bit authentication tag based on
| Skein-512-MAC as defined in [Skein], | Skein-512-MAC as defined in [Skein],
skipping to change at page 70, line 16 skipping to change at page 72, line 16
The next 8 bits are used for flags. Undefined flags are set to zero The next 8 bits are used for flags. Undefined flags are set to zero
and ignored. Three flags are currently defined. The Disclosure Flag and ignored. Three flags are currently defined. The Disclosure Flag
(D) is a Boolean bit defined in Section 11. The Allow Clear flag (A) (D) is a Boolean bit defined in Section 11. The Allow Clear flag (A)
is a Boolean bit defined in Section 4.7.2. The SAS Verified flag (V) is a Boolean bit defined in Section 4.7.2. The SAS Verified flag (V)
is a Boolean bit defined in Section 7.1. These flags are updated is a Boolean bit defined in Section 7.1. These flags are updated
values to the same flags provided earlier in the Confirm message, but values to the same flags provided earlier in the Confirm message, but
they are updated to reflect the new flag information relayed by the they are updated to reflect the new flag information relayed by the
PBX from the other party. PBX from the other party.
The relayed V flag comes from the ZRTP endpoint on the other side of
the PBX. If this relayed V flag is zero, the local ZRTP user agent
should render a conspicuous display of the SAS to prompt the human to
verbally verify it. However, a relayed V flag should not affect the
local V flag, unlike the V flag received in the Confirm message.
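The handling of the relayed V flag described above can be sketched as follows. This is an illustration only, not part of the specification: the SasUiState type and the apply_relayed_v_flag function are hypothetical names, and the sketch assumes the flag has already been extracted from the SASrelay flags octet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SasUiState:
    """Hypothetical container for the local SAS state (not defined by ZRTP)."""
    local_v_flag: bool                     # local SAS Verified flag
    prompt_sas_verification: bool = False  # show the SAS conspicuously?


def apply_relayed_v_flag(state: SasUiState, relayed_v: bool) -> SasUiState:
    """Apply the V flag relayed by the PBX in a SASrelay message.

    A relayed V flag of zero only raises the prominence of the SAS
    display. The local V flag is left unchanged in every case.
    """
    return SasUiState(
        local_v_flag=state.local_v_flag,
        prompt_sas_verification=not relayed_v,
    )
```

Returning a new state object rather than mutating the old one makes it easy to check that the local V flag was never touched.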
The next 32-bit word contains the SAS rendering scheme for the The next 32-bit word contains the SAS rendering scheme for the
relayed sashash, which will be the same rendering scheme used by the relayed sashash, which will be the same rendering scheme used by the
other party on the other side of the trusted MiTM. Section 7.3 other party on the other side of the trusted MiTM. Section 7.3
describes how the PBX determines whether the ZRTP client regards the describes how the PBX determines whether the ZRTP client regards the
PBX as a trusted MiTM. If the PBX determines that the ZRTP client PBX as a trusted MiTM. If the PBX determines that the ZRTP client
trusts the PBX, the next 8 words contain the sashash relayed from the trusts the PBX, the next 8 words contain the sashash relayed from the
other party. The first 32-bit word of the sashash contains the other party. The first 32-bit word of the sashash contains the
sasvalue, which may be rendered to the user using the specified SAS sasvalue, which may be rendered to the user using the specified SAS
rendering scheme. If this SASrelay message is being sent to a ZRTP rendering scheme. If this SASrelay message is being sent to a ZRTP
client that does not trust this MiTM, the sashash will be ignored by client that does not trust this MiTM, the sashash will be ignored by
skipping to change at page 73, line 21 skipping to change at page 75, line 21
| | | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| version="1.10" (1 word) | | version="1.10" (1 word) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| EndpointHash (2 words) | | EndpointHash (2 words) |
| | | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 18: Ping Message Format Figure 18: Ping Message Format
5.15.1. Rationale for Ping messages
Ping messages are useful for implementing ZRTP proxies. A ZRTP proxy
(Section 10) is a "bump-in-the-wire" that sits between a (usually
non-ZRTP-enabled) VoIP client and the Internet. It attempts to
secure the VoIP call by examining the RTP media streams, detecting
the call, and intervening to encrypt the call "on the fly".
This is not always easy to do, as it may have to be done without help
from the signaling layer. The VoIP client may make internal
decisions on how to do NAT traversal, which are not readily apparent
to the proxy. The proxy has to reverse engineer this knowledge by
inspecting all the RTP streams. The RTP stream from Alice to Bob
might not follow the same path, through the same ports, as the RTP
stream from Bob to Alice. One stream may go directly peer to peer,
while the reverse stream may take a detour through a media relay.
The two parties may have both audio and video streams between them,
and may also be simultaneously talking to others in a conference
call, and some of those parties may be behind the same PBX. All of
these RTP streams have to be sorted out and associated with the
correct ZRTP endpoints. Related audio and video streams have to be
matched up between two parties, and not confused with other streams
to nearby parties behind the same PBX. Ping and PingACK messages
make this possible.
5.16. PingACK Message 5.16. PingACK Message
A PingACK message is sent only in response to a Ping. A ZRTP A PingACK message is sent only in response to a Ping. A ZRTP
endpoint MUST respond to a Ping with a PingACK message. The version endpoint MUST respond to a Ping with a PingACK message. The version
of PingACK requested is contained in the Ping message. If that of PingACK requested is contained in the Ping message. If that
version number is supported, a PingACK with a format that matches version number is supported, a PingACK with a format that matches
that version MUST be sent. Otherwise, if the version number of the that version MUST be sent. Otherwise, if the version number of the
Ping is not supported, a PingACK SHOULD be sent in the format of the Ping is not supported, a PingACK SHOULD be sent in the format of the
highest supported version known to the Ping responder. Only version highest supported version known to the Ping responder. Only version
"1.10" is supported in this specification. "1.10" is supported in this specification.
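The version selection rule for PingACK might be sketched as below. The function name pingack_version is hypothetical, and the sketch assumes that version strings have the form "major.minor" with decimal digits, which holds for "1.10" but is otherwise an assumption.

```python
SUPPORTED_VERSIONS = ("1.10",)  # only "1.10" is defined in this specification


def _version_key(version: str) -> tuple:
    """Order version strings numerically, assuming a 'major.minor' layout."""
    return tuple(int(part) for part in version.split("."))


def pingack_version(ping_version: str, supported=SUPPORTED_VERSIONS) -> str:
    """Pick the PingACK format version for a received Ping."""
    if ping_version in supported:
        # Requested version is supported: the PingACK MUST match it.
        return ping_version
    # Otherwise the PingACK SHOULD use the highest supported version.
    return max(supported, key=_version_key)
```

With a single supported version the fallback branch always yields "1.10", but the sketch shows how the rule would extend if later versions were added.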
skipping to change at page 78, line 29 skipping to change at page 81, line 14
is available to the client software, it allows for the possibility is available to the client software, it allows for the possibility
that the client software could render to the user that the SAS verify that the client software could render to the user that the SAS verify
procedure was carried out in a previous session. procedure was carried out in a previous session.
Regardless of whether there is a user interface element to allow the Regardless of whether there is a user interface element to allow the
user to set the SAS Verified flag, it is worth caching a shared user to set the SAS Verified flag, it is worth caching a shared
secret, because doing so reduces opportunities for an attacker in the secret, because doing so reduces opportunities for an attacker in the
next call. next call.
rfc6189.txt:

If at any time the users carry out the SAS comparison procedure, and
it actually fails to match, then this means there is a very
resourceful MiTM. If this is the first call, the MiTM was there on
the first call, which is impressive enough. If it happens in a later
call, it also means the MiTM must also know the cached shared secret,
because you could not have carried out any voice traffic at all
unless the session key was correctly computed and is also known to
the attacker. This implies the MiTM must have been present in all
the previous sessions, since the initial establishment of the first
shared secret. This is indeed a resourceful attacker. It also means
that if at any time he ceases his participation as a MiTM on one of
your calls, the protocol will detect that the cached shared secret is
no longer valid -- because it was really two different shared secrets
all along, one of them between Alice and the attacker, and the other
between the attacker and Bob. The continuity of the cached shared
secrets makes it possible for us to detect the MiTM when he inserts
himself into the ongoing relationship, as well as when he leaves.
Also, if the attacker tries to stay with a long lineage of calls, but
fails to execute a DH MiTM attack for even one missed call, he is
permanently excluded. He can no longer resynchronize with the chain
of cached shared secrets.

rfc6189bis.txt:

If at any time the users carry out the SAS comparison procedure, and
it actually fails to match, then this indicates a very resourceful
MiTM. If the SAS comparison fails on the very first call, that would
indicate an attacker who had some foresight, agility, and fortuitous
positioning, but he is still caught by the SAS comparison. If the
MiTM misses the first call and attacks later, this will trigger a
cache mismatch alarm. If the SAS fails to match without a cache
mismatch alarm, it means the MiTM knows the cached shared secret.
This either implies the MiTM attacker has somehow stolen the cached
shared secret from one of the two parties, or it implies the MiTM
must have been present in all the previous sessions, since the
initial establishment of the first shared secret. This is indeed a
resourceful attacker. It also means that if at any time he ceases
his participation as a MiTM on one of the calls, the protocol will
detect that the cached shared secret is no longer valid -- because it
was really two different shared secrets all along, one of them
between Alice and the attacker, and the other between the attacker
and Bob. The continuity of the cached shared secrets makes it
possible to detect the MiTM when he inserts himself into the ongoing
relationship, as well as when he leaves. Also, if the attacker tries
to stay with a long lineage of calls, but fails to execute a DH MiTM
attack for even one missed call, he is permanently excluded. He can
no longer resynchronize with the chain of cached shared secrets.
This is discussed further in Section 15.1.
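The three cases that follow from a failed SAS comparison in the revised text can be summarized in a small decision function. This is a reading aid under stated assumptions, not protocol behavior: MitmFinding and interpret_sas_mismatch are hypothetical names, and the inputs are assumed to be known to the client at the time the user reports the mismatch.

```python
from enum import Enum


class MitmFinding(Enum):
    """Hypothetical labels for what a SAS mismatch implies."""
    PRESENT_ON_FIRST_CALL = "MiTM was positioned on the very first call"
    JOINED_LATER = "MiTM missed the first call and attacked later"
    KNOWS_CACHED_SECRET = "MiTM knows the cached shared secret"


def interpret_sas_mismatch(first_call: bool,
                           cache_mismatch_alarm: bool) -> MitmFinding:
    """Classify a SAS mismatch reported by the user."""
    if first_call:
        # No cached secret exists yet; the SAS comparison alone caught him.
        return MitmFinding.PRESENT_ON_FIRST_CALL
    if cache_mismatch_alarm:
        # The attacker could not reproduce the cached shared secret.
        return MitmFinding.JOINED_LATER
    # No alarm: the secret was stolen, or the attacker was present in
    # every session since the first shared secret was established.
    return MitmFinding.KNOWS_CACHED_SECRET
```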
A user interface element (i.e., a checkbox or button) is needed to A user interface element (i.e., a checkbox or button) is needed to
allow the user to tell the software the SAS verify was successful, allow the user to tell the software the SAS verify was successful,
causing the software to set the SAS Verified flag (V), which causing the software to set the SAS Verified flag (V), which
(together with our cached shared secret) obviates the need to perform (together with our cached shared secret) obviates the need to perform
the SAS procedure in the next call. An additional user interface the SAS procedure in the next call. An additional user interface
element can be provided to let the user tell the software he detected element can be provided to let the user tell the software he detected
an actual SAS mismatch, which indicates a MiTM attack. The software an actual SAS mismatch, which indicates a MiTM attack. The software
can then take appropriate action, clearing the SAS Verified flag and can then take appropriate action, clearing the SAS Verified flag and
erasing the cached shared secret from this session. It is up to the erasing the cached shared secret from this session. It is up to the
skipping to change at page 80, line 20 skipping to change at page 83, line 8
is independent of the hash used in the sashash. The sashash is is independent of the hash used in the sashash. The sashash is
determined by the negotiated Hash Type (Section 5.1.2), while the determined by the negotiated Hash Type (Section 5.1.2), while the
hash used by the digital signature is separately defined by the hash used by the digital signature is separately defined by the
digital signature algorithm. For example, the sashash may be based digital signature algorithm. For example, the sashash may be based
on SHA-256, while the digital signature might use SHA-384, if an on SHA-256, while the digital signature might use SHA-384, if an
ECDSA P-384 key is used. ECDSA P-384 key is used.
If the sashash (which is always truncated to 256 bits) is shorter If the sashash (which is always truncated to 256 bits) is shorter
than the signature hash, the security is not weakened because the than the signature hash, the security is not weakened because the
hash commitment precludes the attacker from searching for sashash hash commitment precludes the attacker from searching for sashash
collisions. collisions, as explained in Section 4.4.1.1.
ECDSA algorithms may be used with either OpenPGP-formatted keys, or ECDSA algorithms may be used with either OpenPGP-formatted keys, or
X.509v3 certificates. If the ZRTP key exchange is ECDH, and the SAS X.509v3 certificates. If the ZRTP key exchange is ECDH, and the SAS
is signed, then the signature SHOULD be ECDSA, and SHOULD use the is signed, then the signature SHOULD be ECDSA, and SHOULD use the
same size curve as the ECDH exchange if an ECDSA key of that size is same size curve as the ECDH exchange if an ECDSA key of that size is
available. available.
If a ZRTP endpoint supports incoming signatures (evidenced by setting If a ZRTP endpoint supports incoming signatures (evidenced by setting
the (S) flag in the Hello message), it SHOULD be able to parse the (S) flag in the Hello message), it SHOULD be able to parse
signatures from the other endpoint in OpenPGP format and MUST be able signatures from the other endpoint in OpenPGP format and MUST be able
skipping to change at page 86, line 25 skipping to change at page 89, line 16
a relayed SAS from an untrusted MiTM, because it may be relayed by a a relayed SAS from an untrusted MiTM, because it may be relayed by a
MiTM attacker. See the SASrelay message definition (Figure 16) for MiTM attacker. See the SASrelay message definition (Figure 16) for
further details. further details.
To ensure that both Alice and Bob will use the same SAS rendering To ensure that both Alice and Bob will use the same SAS rendering
scheme after the keys are negotiated, the PBX also sends the SASrelay scheme after the keys are negotiated, the PBX also sends the SASrelay
message to the unenrolled party (which does not regard this PBX as a message to the unenrolled party (which does not regard this PBX as a
trusted MiTM), conveying the SAS rendering scheme, but not the trusted MiTM), conveying the SAS rendering scheme, but not the
sashash, which it sets to zero. The unenrolled party will ignore the sashash, which it sets to zero. The unenrolled party will ignore the
relayed SAS field, but will use the specified SAS rendering scheme. relayed SAS field, but will use the specified SAS rendering scheme.
If both endpoints are enrolled, one of them will still receive an
"empty" SASrelay message. If and only if a PBX relays an SAS to one
endpoint, it MUST also send an "empty" SASrelay to the other
endpoint, containing a null sashash.
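The pairing rule for SASrelay messages might look like the sketch below. The dictionary layout and the name build_sasrelay_pair are hypothetical and do not reflect the wire format. The sketch assumes a 32-byte sashash (8 words of 32 bits) and assumes that the "empty" message carries the same SAS rendering scheme as the full one.

```python
SASHASH_LEN = 32  # 8 words of 32 bits each


def build_sasrelay_pair(sashash: bytes, rendering_scheme: str):
    """Build the two SASrelay payloads a PBX sends for one call.

    Returns (full, empty): 'full' goes to the endpoint that receives the
    relayed SAS, 'empty' goes to the other endpoint with a null sashash.
    """
    if len(sashash) != SASHASH_LEN:
        raise ValueError("sashash must be exactly 32 bytes")
    full = {"sas_scheme": rendering_scheme, "sashash": sashash}
    # A relayed SAS to one side always comes with an empty SASrelay to
    # the other side, so both payloads are produced together.
    empty = {"sas_scheme": rendering_scheme, "sashash": bytes(SASHASH_LEN)}
    return full, empty
```

Producing both payloads from one call is a design choice that keeps the "if and only if" rule from being violated by accident.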
It is possible to route a call through two ZRTP-enabled PBXs using It is possible to route a call through two ZRTP-enabled PBXs using
this scheme. Assume Alice is a ZRTP endpoint who trusts her local this scheme. Assume Alice is a ZRTP endpoint who trusts her local
PBX in Atlanta, and Bob is a ZRTP endpoint who trusts his local PBX PBX in Atlanta, and Bob is a ZRTP endpoint who trusts his local PBX
in Biloxi. The call is routed from Alice to the Atlanta PBX to the in Biloxi. The call is routed from Alice to the Atlanta PBX to the
Biloxi PBX to Bob. Atlanta would relay the Atlanta-Biloxi SAS to Biloxi PBX to Bob. Atlanta would relay the Atlanta-Biloxi SAS to
Alice because Alice is enrolled with Atlanta, and Biloxi would relay Alice because Alice is enrolled with Atlanta, and Biloxi would relay
the Atlanta-Biloxi SAS to Bob because Bob is enrolled with Biloxi. the Atlanta-Biloxi SAS to Bob because Bob is enrolled with Biloxi.
The two PBXs are not assumed to be enrolled with each other in this The two PBXs are not assumed to be enrolled with each other in this
example. Both Alice and Bob would view and verbally compare the same example. Both Alice and Bob would view and verbally compare the same
relayed SAS, the Atlanta-Biloxi SAS. No more than two trusted MiTM relayed SAS, the Atlanta-Biloxi SAS. No more than two trusted MiTM
nodes can be traversed with this relaying scheme. This behavior is nodes can be traversed with this relaying scheme. This behavior is
extended to two PBXs that are enrolled with each other, via this extended to two PBXs that are enrolled with each other, via this
rule: In the case of a PBX sharing trusted MiTM keys with both rule: In the case of a PBX sharing trusted MiTM keys with both
endpoints (i.e., both enrolled with this PBX), one of which is endpoints (i.e., both enrolled with this PBX), one of which is
another PBX (evidenced by the M-flag) and one of which is a non-PBX, another PBX (evidenced by the M-flag) and one of which is a non-PBX,
the MiTM PBX must always relay the PBX-to-PBX SAS to the non-PBX the MiTM PBX MUST always relay the PBX-to-PBX SAS to the non-PBX
endpoint. endpoint.
A ZRTP endpoint phone that trusts a PBX to act as a trusted MiTM is A ZRTP endpoint phone that trusts a PBX to act as a trusted MiTM is
effectively delegating its own policy decisions of algorithm effectively delegating its own policy decisions of algorithm
negotiation to the PBX. negotiation to the PBX.
When a PBX is between two ZRTP endpoints and is terminating their When a PBX is between two ZRTP endpoints and is terminating their
media streams at the PBX, the PBX presents its own ZID to the two media streams at the PBX, the PBX presents its own ZID to the two
parties, eclipsing the ZIDs of the two parties from each other. For parties, eclipsing the ZIDs of the two parties from each other. For
example, if several different calls are routed through such a PBX to example, if several different calls are routed through such a PBX to
several different ZRTP-enabled phones behind the PBX, only a single several different ZRTP-enabled phones behind the PBX, only a single
ZID is presented to the calling party in every case -- the ZID of the ZID is presented to the calling party in every case -- the ZID of the
PBX itself. PBX itself.
This SAS relay mechanism imposes a cognitive burden on the user, and
the number of intermediaries does not scale up beyond two PBXs
trusted by their respective local users. The ZRTP ecosystem becomes
more elegant if all PBXs and other media intermediaries avoid the
MiTM role whenever possible, as explained in Section 10.1.
The next section describes the initial enrollment procedure that The next section describes the initial enrollment procedure that
establishes a special shared secret, a trusted MiTM key, between a establishes a special shared secret, a trusted MiTM key, between a
PBX and a phone, so that the phone will learn to recognize the PBX as PBX and a phone, so that the phone will learn to recognize the PBX as
a trusted MiTM. a trusted MiTM.
7.3.1. PBX Enrollment and the PBX Enrollment Flag 7.3.1. PBX Enrollment and the PBX Enrollment Flag
Both the PBX and the endpoint need to know when enrollment is taking Both the PBX and the endpoint need to know when enrollment is taking
place. One way of doing this is to set up an enrollment extension on place. One way of doing this is to set up an enrollment extension on
the PBX that a newly configured endpoint would call and establish a the PBX that a newly configured endpoint would call and establish a
skipping to change at page 88, line 47 skipping to change at page 91, line 48
that this puts the PBX in a position to wiretap the calls. that this puts the PBX in a position to wiretap the calls.
It is recommended that a ZRTP client not proceed with the PBX It is recommended that a ZRTP client not proceed with the PBX
enrollment procedure without evidence that a MiTM attack is not enrollment procedure without evidence that a MiTM attack is not
taking place during the enrollment session. It would be especially taking place during the enrollment session. It would be especially
damaging if a MiTM tricks the client into enrolling with the wrong damaging if a MiTM tricks the client into enrolling with the wrong
PBX. That would enable the malevolent MiTM to wiretap all future PBX. That would enable the malevolent MiTM to wiretap all future
calls without arousing suspicion, because he would appear to be calls without arousing suspicion, because he would appear to be
trusted. trusted.
To this end, the client ZRTP endpoint should not proceed with PBX
enrollment unless at least one of the following conditions applies:
o An automated mechanism is used, from Section 7.4. TLS-protected
signaling may be especially well-suited in this special case, for
reasons explained in Section 8.1.1.
o The SAS is verified with a live human on the PBX side during the
enrollment session.
o It is the judgement of the administrator supervising the
enrollment that the threat model and the circumstances indicate a
low probability of a MiTM being present, perhaps because this is
the first call to the PBX, or because the enrollment is conducted
over a relatively safe network. For example, a mobile smart phone
can be enrolled through a protected WiFi local network near the
PBX, before issuing it to an employee for international travel.
This leap of faith is usually justified in benign environments.
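The enrollment precondition above reduces to a simple disjunction, sketched here. The function and parameter names are hypothetical, and how each input is established (automated mechanism, human SAS check, administrator judgement) is outside the sketch.

```python
def may_proceed_with_enrollment(automated_method_ok: bool,
                                sas_verified_with_human: bool,
                                admin_accepts_risk: bool) -> bool:
    """Return True if at least one enrollment condition holds.

    automated_method_ok     -- an automated mechanism (Section 7.4) succeeded
    sas_verified_with_human -- SAS compared with a live human on the PBX side
    admin_accepts_risk      -- administrator judges a MiTM to be unlikely
    """
    return any((automated_method_ok,
                sas_verified_with_human,
                admin_accepts_risk))
```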
7.4. Automated Methods of Authenticating the DH Exchange
Alternate methods of authenticating the DH exchange may be used when
interacting with an automated remote system, when no human is
available at the remote endpoint to verbally compare the SAS. Usage
scenarios include leaving or retrieving voicemail, interacting with a
conference bridge, or the PBX security enrollment procedure
(Section 7.3.1).
Here are the automated ways to have ZRTP authenticate the DH
exchange:
o Successful use of the mechanism described in Section 8.1.1, but
only if fully supported by end-to-end integrity-protected delivery
of the a=zrtp-hash in the signaling. This might be achieved via
[RFC4474] or better still, Dan Wing's SIP Identity using Media
Path [SIP-IDENTITY]. This allows authentication of the DH
exchange without human assistance. However, in most usage
scenarios that access an automated system, the entire end-to-end
path is comprised of only one hop, so TLS provides sufficient
integrity protection in this special case. This is explained in
detail in Section 8.1.1.
o The SAS was previously verified with the remote system in an
earlier session, evidenced by the SAS verified flag (V)
(Section 7.1) at both ends and a matching cache entry. If
circumstances permit this method, it has the advantage of not
requiring a PKI.
o A good signature is received and verified using the digital
signature feature on the SAS hash, as described in Section 7.2, if
this feature is supported. Note that for PBX enrollment, only the
PBX endpoint needs to supply the signature, because the trust
decision is made on the client side only.
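The three automated methods listed above can be combined as in the following sketch. AuthEvidence and automated_auth_ok are hypothetical names. The sketch assumes each field has already been determined elsewhere; for example, signaling_integrity_protected stands for either end-to-end integrity protection or a single TLS hop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthEvidence:
    """Hypothetical summary of the evidence gathered for one session."""
    zrtp_hash_matches: bool = False              # Hello hash equals a=zrtp-hash
    signaling_integrity_protected: bool = False  # end-to-end, or one TLS hop
    v_flag_local: bool = False                   # SAS Verified flag, local end
    v_flag_remote: bool = False                  # SAS Verified flag, remote end
    cache_entry_matches: bool = False            # cached shared secret matched
    sas_signature_valid: bool = False            # signature on the SAS hash OK


def automated_auth_ok(e: AuthEvidence) -> bool:
    """Return True if any one automated method authenticates the DH exchange."""
    via_signaling = e.zrtp_hash_matches and e.signaling_integrity_protected
    via_cache = e.v_flag_local and e.v_flag_remote and e.cache_entry_matches
    return via_signaling or via_cache or e.sas_signature_valid
```

Note that a matching a=zrtp-hash alone is not sufficient in this sketch; it counts only together with integrity-protected delivery.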
In any PKI-backed scheme, there is the disadvantage of having to
decide what to do if the connection fails to authenticate because of
a certificate problem. Warning messages may not be effective because
users become habituated to security warnings [Sunshine] about PKI
certificates. Implementors should carefully weigh the cognitive
burden on the user before they invoke such a heavyweight mechanism.
ZRTP is intended to be a lightweight protocol with a low activation
energy and minimal cognitive burden.
When calling an automated system for the first time, the threat model
and circumstances should be examined to decide if a PKI is the only
way to protect against a MiTM. A reasonable alternative to a PKI
would be to rely on the leap of faith that a MiTM attack is less
likely in the initial session, an assumption that seems to work well
enough for SSH. After the first session, cached shared secrets
should suffice.
8. Signaling Interactions 8. Signaling Interactions
This section discusses how ZRTP, SIP, and SDP work together. This section discusses how ZRTP, SIP, and SDP work together.
Note that ZRTP may be implemented without coupling with the SIP Note that ZRTP may be implemented without coupling with the SIP
signaling. For example, ZRTP can be implemented as a "bump in the signaling. For example, ZRTP can be implemented as a "bump in the
wire" or as a "bump in the stack" in which RTP sent by the SIP User wire" or as a "bump in the stack" in which RTP sent by the SIP User
Agent (UA) is converted to ZRTP. In these cases, the SIP UA will Agent (UA) is converted to ZRTP. In these cases, the SIP UA will
have no knowledge of ZRTP. As a result, the signaling path discovery have no knowledge of ZRTP. As a result, the signaling path discovery
mechanisms introduced in this section should not be definitive -- mechanisms introduced in this section should not be definitive --
skipping to change at page 89, line 35 skipping to change at page 94, line 12
each other. For example, if only one endpoint supports ZRTP, but each other. For example, if only one endpoint supports ZRTP, but
both support another method to key SRTP, then the other method will both support another method to key SRTP, then the other method will
be used instead. When used in parallel, an SRTP secret carried in an be used instead. When used in parallel, an SRTP secret carried in an
a=keymgt [RFC4567] or a=crypto [RFC4568] attribute can be used as a a=keymgt [RFC4567] or a=crypto [RFC4568] attribute can be used as a
shared secret for the srtps computation defined in Section 8.2. The shared secret for the srtps computation defined in Section 8.2. The
ZRTP attribute is also used to signal to an intermediary ZRTP device ZRTP attribute is also used to signal to an intermediary ZRTP device
not to act as a ZRTP endpoint, as discussed in Section 10. not to act as a ZRTP endpoint, as discussed in Section 10.
The a=zrtp-hash attribute can only be included in the SDP at the The a=zrtp-hash attribute can only be included in the SDP at the
media level since Hello messages sent in different media streams will media level since Hello messages sent in different media streams will
have unique hashes. have unique hashes. A separate a=zrtp-hash attribute should be
included for each media stream. Both ZRTP endpoints should provide
a=zrtp-hash attributes in their SDP.
The ABNF for the ZRTP attribute is as follows: The ABNF for the ZRTP attribute is as follows:
zrtp-attribute = "a=zrtp-hash:" zrtp-version zrtp-hash-value zrtp-attribute = "a=zrtp-hash:" zrtp-version zrtp-hash-value
zrtp-version = token zrtp-version = token
zrtp-hash-value = 1*(HEXDIG) zrtp-hash-value = 1*(HEXDIG)
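A parser for this attribute might be sketched as follows. The name parse_zrtp_hash is hypothetical. The sketch assumes that a single space separates the version from the hash value, which the ABNF above does not state explicitly, and that "token" uses the SDP token character set. Hex digits are accepted in either case.

```python
import re

# SDP token characters, then one space (an assumption), then hex digits.
_ZRTP_HASH_RE = re.compile(
    r"^a=zrtp-hash:"
    r"(?P<version>[!#$%&'*+\-.0-9A-Z^_`a-z{|}~]+)"
    r" (?P<hash>[0-9A-Fa-f]+)$"
)


def parse_zrtp_hash(line: str):
    """Split an a=zrtp-hash SDP line into (version, hash_value).

    Raises ValueError if the line does not match the expected shape.
    """
    match = _ZRTP_HASH_RE.match(line.strip())
    if match is None:
        raise ValueError("not a well-formed a=zrtp-hash attribute")
    return match.group("version"), match.group("hash").lower()
```

The hash is normalized to lowercase so that it can be compared directly with a locally computed hex digest of the Hello message.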
Here's an example of the ZRTP attribute in an initial SDP offer or Here's an example of the ZRTP attribute in an initial SDP offer or
skipping to change at page 93, line 13 skipping to change at page 97, line 40
integrity becomes more problematic if E.164 numbers [RFC3824] are integrity becomes more problematic if E.164 numbers [RFC3824] are
used in SIP. Thus, real-world implementations of ZRTP endpoints will used in SIP. Thus, real-world implementations of ZRTP endpoints will
continue to depend on SAS authentication for quite some time. Even continue to depend on SAS authentication for quite some time. Even
after there is widespread availability of SIP user agents that offer after there is widespread availability of SIP user agents that offer
integrity protected delivery of SDP attributes, many users will still integrity protected delivery of SDP attributes, many users will still
be faced with the fact that the signaling path may be controlled by be faced with the fact that the signaling path may be controlled by
institutions that do not have the best interests of the end user in institutions that do not have the best interests of the end user in
mind. In those cases, SAS authentication will remain the gold mind. In those cases, SAS authentication will remain the gold
standard for the prudent user. standard for the prudent user.
The SIP layer can obtain hop-wise integrity protection simply by
using TLS [RFC5246], but this does not achieve full end-to-end
integrity protection of the a=zrtp-hash attribute in the multi-hop
general case. However, if the entire end-to-end signaling path is
comprised of only one hop, TLS is good enough, provided the
associated PKI complexity can be contained. This usually covers the
use cases where a client is traversing one TLS hop to access the
automated remote services of its own PBX, where no human is available
to verbally compare the SAS. Examples include leaving or retrieving
voicemail, interacting with an IVR or conference bridge, or
performing the PBX security enrollment procedure (Section 7.3.1).
Note that the risk of trusting the SIP server or PBX becomes moot
when the PBX itself is the intended ZRTP endpoint. Thus, TLS-
protected signaling is recommended and preferred for these special
use cases. TLS-protected signaling is usually justified for its own
separate reasons, to mitigate exposure to traffic analysis, which
means the signaling layer already would have borne the additional
cost of TLS.
Even without SIP integrity protection, the Media Security Even without SIP end-to-end integrity protection, the Media Security
Requirements [RFC5479] R-ACT-ACT requirement can be met by ZRTP's SAS Requirements [RFC5479] R-ACT-ACT requirement can be met by ZRTP's SAS
mechanism. Although ZRTP may benefit from an integrity-protected SIP mechanism. Although ZRTP may benefit from an integrity-protected SIP
layer, it is fortunate that ZRTP's self-contained MiTM defenses do layer, it is fortunate that ZRTP's self-contained MiTM defenses do
not actually require an integrity-protected SIP layer. ZRTP can not actually require an integrity-protected SIP layer. ZRTP can
bypass the delays and problems that SIP integrity faces, such as bypass the delays and problems that SIP integrity faces, such as
E.164 number usage, and the complexity of building and maintaining a E.164 number usage, and the complexity of building and maintaining a
PKI. PKI.
In contrast, DTLS-SRTP [RFC5764] appears to depend heavily on end-to- In contrast, DTLS-SRTP [RFC5764] appears to depend heavily on end-to-
end integrity protection in the SIP layer. Further, DTLS-SRTP must end integrity protection in the SIP layer. Further, DTLS-SRTP must
skipping to change at page 94, line 33 skipping to change at page 99, line 31
bitrate depending on the type of sound being compressed. bitrate depending on the type of sound being compressed.
It also appears that voice activity detection (VAD) leaks information It also appears that voice activity detection (VAD) leaks information
about the content of the conversation, but to a lesser extent than about the content of the conversation, but to a lesser extent than
VBR. This effect can be mitigated by lengthening the VAD hangover VBR. This effect can be mitigated by lengthening the VAD hangover
time by a random amount between 1 and 2 seconds, if this is feasible time by a random amount between 1 and 2 seconds, if this is feasible
in your application. Only short bursts of speech would benefit from in your application. Only short bursts of speech would benefit from
lengthening the VAD hangover time. lengthening the VAD hangover time.
The security problems of VBR and VAD are addressed in detail by the The security problems of VBR and VAD are addressed in detail by the
guidelines in [VBR-AUDIO]. It is RECOMMENDED that ZRTP endpoints guidelines in [RFC6562]. It is RECOMMENDED that ZRTP endpoints
follow these guidelines. follow these guidelines.
9. False ZRTP Packet Rejection 9. False ZRTP Packet Rejection
An attacker who is not in the media path may attempt to inject false An attacker who is not in the media path may attempt to inject false
ZRTP protocol packets, possibly to effect a denial-of-service attack ZRTP protocol packets, possibly to effect a denial-of-service attack
or to inject his own media stream into the call. VoIP, by its or to inject his own media stream into the call. VoIP, by its
nature, invites various forms of denial-of-service attacks and nature, invites various forms of denial-of-service attacks and
requires protocol features to reject such attacks. While bogus SRTP requires protocol features to reject such attacks. While bogus SRTP
packets may be easily rejected via the SRTP auth tag field, that can packets may be easily rejected via the SRTP auth tag field, that can
skipping to change at page 98, line 5 skipping to change at page 102, line 48
(IVR), voicemail system, or speech recognition system. The display (IVR), voicemail system, or speech recognition system. The display
of SAS strings to users should be disabled in these cases. of SAS strings to users should be disabled in these cases.
It is possible that an intermediary device acting as a ZRTP endpoint It is possible that an intermediary device acting as a ZRTP endpoint
might still receive ZRTP Hello and other messages from the inside might still receive ZRTP Hello and other messages from the inside
endpoint. This could occur if there is another inline ZRTP device endpoint. This could occur if there is another inline ZRTP device
that does not include the ZRTP SDP attribute flag. An intermediary that does not include the ZRTP SDP attribute flag. An intermediary
acting as a ZRTP endpoint receiving ZRTP Hello and other messages acting as a ZRTP endpoint receiving ZRTP Hello and other messages
from the inside endpoint MUST NOT pass these ZRTP messages. from the inside endpoint MUST NOT pass these ZRTP messages.
10.1. On Reducing PBX MiTM Behavior
ZRTP is designed to negotiate session keys directly between two
users, and to detect a man-in-the-middle (MiTM) attack. A PBX often
tries to be a MiTM, as part of its natural functionality. This
creates a conflict between the objectives of a ZRTP client and the
objectives of a PBX. This conflict may be resolved by using the
trusted MiTM mechanism (Section 7.3), but this adds complexity and
only works well between users of a single trusted PBX. It can be
stretched further to handle calls between two PBXs trusted by their
respective local users, but breaks down if more intermediaries are
involved. It also imposes a cognitive burden on the user, who may
not be aware of the security properties or trustworthiness of all the
intermediaries.
The client usually prefers to negotiate ZRTP end-to-end with the
other client, without exposing the keys or plaintext to the PBX, and
use the PBX as a trusted MiTM only when necessary. A PBX should
allow this whenever possible, even if the clients trust the PBX.
The PBX may avoid acting as a MiTM either by allowing the media to
completely bypass the PBX, with the two clients routing their media
peer-to-peer, or by acting as a media relay in a manner similar to a
TURN server. The advantages of the latter approach are mainly to
facilitate NAT traversal. If only one of the two parties is a ZRTP
endpoint, and the PBX is capable of serving as a ZRTP endpoint, the
PBX MUST attempt to negotiate a ZRTP session with the client that
supports ZRTP, so that at least one leg of the call is secure. This
is a far better choice than directly connecting the media streams
between a ZRTP client and a non-ZRTP client, and having the ZRTP
negotiation fail completely.
The PBX SHOULD make best efforts to not act as a MiTM if the PBX has
evidence that both VoIP clients support ZRTP. Evidence of ZRTP
support is best indicated by the presence of the optional a=zrtp-hash
attribute (Section 8) in the signaling layer of both the caller and
callee. Evidence of ZRTP support or non-support in the clients may
also be available to the PBX in the form of configuration information
stored in the PBX.
If the client sends the a=zrtp-hash attribute, and the PBX acts as a
MiTM nonetheless, the client SHOULD alert the user to the fact that
the security level is less than expected. The client can readily
detect this condition by receiving an SASrelay message (Figure 16)
from the PBX. The severity of the alert is left to the application,
which would be relying on the trusted MiTM mechanism.
A PBX should not act as a MiTM unless there is a compelling reason to
do so. Transcoding is fundamentally incompatible with end-to-end
secure media. It should be done only when there is no alternative,
when the two ZRTP endpoints do not share a common codec. ZRTP
clients should implement a repertoire of codecs sufficient to
minimize the need for PBX transcoding. Transcoding between two ZRTP
clients forces a PBX to act as a MiTM. If only one media stream
needs transcoding in a multimedia session, all of the media streams
in that session must be handled in MiTM mode.
If there is more than one media stream in a session between two ZRTP
endpoints, a PBX MUST either act as a MiTM for all of them, or for
none of them. This is because all the media streams between two ZRTP
endpoints must share the same SAS (Section 7), due to the use of
Multistream mode (Section 3.1.3). This includes the related RTCP/
SRTCP streams.
A PBX may forgo end-to-end security and choose MiTM mode for policy
reasons. An institution may choose to present a single ZRTP endpoint
to the outside world, through its locally trusted PBX. Or, a client
application may explicitly request a PBX to act as a MiTM for a
particular call, for example via a special dial prefix.
It's especially harmful if a PBX that lacks its own ZRTP stack
performs unnecessary transcoding between two ZRTP endpoints, ruling
out the possibility of any secure connection at all. Not even the
trusted MiTM mechanism is available, because this PBX is incapable of
acting as a back-to-back ZRTP MiTM. Even if the PBX avoids
transcoding, it might terminate the media streams for other reasons,
reasons that are likely to be less important than the clients' need
for a secure call. If this kind of PBX sees the a=zrtp-hash
attribute in the caller's signaling, and the two clients share at
least one common codec, the PBX should at least attempt to do no
harm, and get out of the way of ZRTP. Let the users speak Navajo
with each other if they want.
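
The PBX guidance in this section can be read as a single per-session
decision. The following is a minimal sketch of that reading, not part
of the specification. The names PbxMode, CallContext, and
choose_pbx_mode are hypothetical, and the way each input is obtained
(for example, treating a=zrtp-hash or stored configuration as evidence
of ZRTP support) is an assumption. Because the function returns one
mode for the whole session, it cannot select MiTM mode for some media
streams and not others.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class PbxMode(Enum):
    """Hypothetical labels for the PBX behaviors discussed in the text."""
    END_TO_END = "stay out of ZRTP: peer-to-peer media or TURN-like relay"
    TRUSTED_MITM = "back-to-back ZRTP, trusted MiTM for every stream"
    ENDPOINT_ONE_LEG = "PBX acts as ZRTP endpoint for the one ZRTP client"
    UNSECURED = "no ZRTP protection possible on this call"


@dataclass
class CallContext:
    # Evidence of support, e.g. a=zrtp-hash in signaling or PBX config.
    caller_supports_zrtp: bool
    callee_supports_zrtp: bool
    # One flag per media stream: True if no common codec exists.
    needs_transcoding: List[bool]
    # Institutional policy or an explicit request such as a dial prefix.
    policy_requires_mitm: bool = False
    # Whether this PBX implements ZRTP at all.
    pbx_has_zrtp_stack: bool = True


def choose_pbx_mode(ctx: CallContext) -> PbxMode:
    """Pick one mode for the whole session (all streams or none)."""
    both = ctx.caller_supports_zrtp and ctx.callee_supports_zrtp
    either = ctx.caller_supports_zrtp or ctx.callee_supports_zrtp

    if both:
        # A single stream that needs transcoding forces MiTM for all.
        must_terminate = ctx.policy_requires_mitm or any(ctx.needs_transcoding)
        if not must_terminate:
            return PbxMode.END_TO_END
        if ctx.pbx_has_zrtp_stack:
            return PbxMode.TRUSTED_MITM
        # Terminating media without a ZRTP stack rules out any security.
        return PbxMode.UNSECURED

    if either and ctx.pbx_has_zrtp_stack:
        # Secure at least the leg that can be secured.
        return PbxMode.ENDPOINT_ONE_LEG

    return PbxMode.UNSECURED
```

For example, a two-stream session in which only the video stream needs
transcoding yields TRUSTED_MITM for the session as a whole, while the
same session with a shared codec on both streams yields END_TO_END.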
A common usage scenario for a ZRTP-enabled PBX is for a VoIP client
to call a PBX trusted by the client, in order to bridge to a PSTN
gateway in or near the PBX. In such a case, the PBX SHOULD act as a
ZRTP endpoint so that the VoIP leg of the call is secured. The call
should be regarded as not secure past the ZRTP endpoint closest to
the PSTN gateway. If the PSTN gateway is distant from the PBX, the
PBX should provide a secure connection to the PSTN gateway, perhaps
through a VPN connection. Even then, the call becomes vulnerable
when it enters the PSTN. Nonetheless, this would be appropriate for
a caller who originates his ZRTP session from a hostile environment,
but is less concerned about the wiretap threat near the PSTN gateway.
11. The ZRTP Disclosure Flag 11. The ZRTP Disclosure Flag
There are no back doors defined in the ZRTP protocol specification. There are no back doors defined in the ZRTP protocol specification.
The designers of ZRTP would like to discourage back doors in ZRTP- The designers of ZRTP would like to discourage back doors in ZRTP-
enabled products. However, despite the lack of back doors in the enabled products. However, despite the lack of back doors in the
actual ZRTP protocol, it must be recognized that a ZRTP implementer actual ZRTP protocol, it must be recognized that a ZRTP implementer
might still deliberately create a rogue ZRTP-enabled product that might still deliberately create a rogue ZRTP-enabled product that
implements a back door outside the scope of the ZRTP protocol. For implements a back door outside the scope of the ZRTP protocol. For
example, they could create a product that discloses the SRTP session example, they could create a product that discloses the SRTP session
key generated using ZRTP out-of-band to a third party. They may even key generated using ZRTP out-of-band to a third party. They may even
skipping to change at page 100, line 50 skipping to change at page 107, line 45
several ZIDs, and a single ZID may be associated with several SIP several ZIDs, and a single ZID may be associated with several SIP
URIs on the same client. URIs on the same client.
Not only that, but ZRTP is independent of which signaling protocol is Not only that, but ZRTP is independent of which signaling protocol is
used. It works equally well with SIP, Jingle, H.323, or any used. It works equally well with SIP, Jingle, H.323, or any
proprietary signaling protocol. Thus, a ZRTP ZID has little to do proprietary signaling protocol. Thus, a ZRTP ZID has little to do
with SIP, per se, which means it has little to do with a SIP URI. with SIP, per se, which means it has little to do with a SIP URI.
Even though a ZID is associated with a device, not a human, it is Even though a ZID is associated with a device, not a human, it is
often the case that a ZRTP endpoint is controlled mainly by a often the case that a ZRTP endpoint is controlled mainly by a
(rfc6189.txt)
particular human. For example, it may be a mobile phone. To get the
full benefit of the key continuity features, a local cache entry (and
thus a ZID) should be associated with some sort of name of the remote
party. That name could be a human name, or it could be made more
precise by specifying which ZRTP endpoint he's using. For example
"Jon Callas", or "Jon Callas on his iPhone", or "Jon on his iPad", or
"Alice on her office phone". These name strings can be stored in the
local cache, indexed by ZID, and may have been initially provided by
the local user by hand. Or the local cache entry may contain a
pointer to an entry in the local address book. When a secure session
is established, if a prior session has established a cache entry, and
the new session has a matching cache entry indexed by the same ZID,
and the SAS has been previously verified, the person's name stored in
that cache entry should be displayed.
(rfc6189bis.txt)
particular human. For example, it may be a mobile phone. For the
key continuity features (Section 15.1) to be effective, a local cache
entry (and thus a ZID) should be associated with some sort of name of
the remote party. That name could be a human name, or it could be
made more precise by specifying which ZRTP endpoint he's using. For
example "Jon Callas", or "Jon Callas on his iPhone", or "Jon on his
iPad", or "Alice on her office phone". These name strings can be
stored in the local cache, indexed by ZID, and may have been
initially provided by the local user by hand. Or the local cache
entry may contain a pointer to an entry in the local address book.
When a secure session is established, if a prior session has
established a cache entry, and the new session has a matching cache
entry indexed by the same ZID, and the SAS has been previously
verified, the person's name stored in that cache entry should be
displayed.
It is absolutely essential to have these human-readable names
associated with cache entries. If the cache is implemented without
them, it opens the door to a simple form of MiTM attack. An attacker
who has previously established a cache entry with both parties (or
simply captures a phone that has) can later act as a MiTM between
those two parties without triggering a cache mismatch, which means
the users will not be alerted to do an SAS compare. This MiTM attack
would be easily detected if the name stored with the cache entry is
displayed for the user, so that the user can readily see that he is
not connected to the remote party he expected.
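
The display rule described above can be sketched as a small lookup.
This is an illustration under stated assumptions, not an
implementation taken from the specification: CacheEntry,
name_to_display, and the cache_matched flag (meaning that the new
session's cached shared secret matched the stored entry) are
hypothetical names, and a real cache holds additional fields.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CacheEntry:
    # Human-readable label for the remote endpoint, e.g. entered by the
    # local user or resolved from the local address book.
    name: str
    # True once the user has compared the SAS for this ZID.
    sas_verified: bool


def name_to_display(cache: Dict[bytes, CacheEntry],
                    remote_zid: bytes,
                    cache_matched: bool) -> Optional[str]:
    """Return the stored name to show the user, or None.

    A name is shown only if a prior session left an entry for this ZID,
    the new session matched that entry, and the SAS was verified.
    """
    entry = cache.get(remote_zid)
    if entry is None or not cache_matched or not entry.sas_verified:
        return None
    return entry.name
```

In the attack described above, the MiTM's own ZID matches a cache
entry on each side, so no mismatch is raised. The lookup then returns
the name stored for the attacker's ZID rather than the name of the
party the user intended to call, which is what lets the user notice
the substitution.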
If the remote ZID originates from a PBX, the displayed name would be If the remote ZID originates from a PBX, the displayed name would be
the name of that PBX, which might be the name of the company who owns the name of that PBX, which might be the name of the company who owns
that PBX. that PBX.
If it is desirable to associate some key material with a particular If it is desirable to associate some key material with a particular
AOR, digital signatures (Section 7.2) may be used, with public key AOR, digital signatures (Section 7.2) may be used, with public key
certificates that associate the signature key with an AOR. If more certificates that associate the signature key with an AOR. If more
than one ZRTP endpoint shares the same AOR, they may all use the same than one ZRTP endpoint shares the same AOR, they may all use the same
signature key and provide the same public key certificate with their signature key and provide the same public key certificate with their
skipping to change at page 112, line 11 skipping to change at page 119, line 26
BCP 119, RFC 4579, August 2006. BCP 119, RFC 4579, August 2006.
[RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117, [RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
January 2008. January 2008.
[RFC5245] Rosenberg, J., "Interactive Connectivity Establishment [RFC5245] Rosenberg, J., "Interactive Connectivity Establishment
(ICE): A Protocol for Network Address Translator (NAT) (ICE): A Protocol for Network Address Translator (NAT)
Traversal for Offer/Answer Protocols", RFC 5245, Traversal for Offer/Answer Protocols", RFC 5245,
April 2010. April 2010.
[RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security
(TLS) Protocol Version 1.2", RFC 5246, August 2008.
[RFC5764] McGrew, D. and E. Rescorla, "Datagram Transport Layer [RFC5764] McGrew, D. and E. Rescorla, "Datagram Transport Layer
Security (DTLS) Extension to Establish Keys for the Secure Security (DTLS) Extension to Establish Keys for the Secure
Real-time Transport Protocol (SRTP)", RFC 5764, May 2010. Real-time Transport Protocol (SRTP)", RFC 5764, May 2010.
[RFC5869] Krawczyk, H. and P. Eronen, "HMAC-based Extract-and-Expand [RFC5869] Krawczyk, H. and P. Eronen, "HMAC-based Extract-and-Expand
Key Derivation Function (HKDF)", RFC 5869, May 2010. Key Derivation Function (HKDF)", RFC 5869, May 2010.
[RFC6090] McGrew, D., Igoe, K., and M. Salter, "Fundamental Elliptic [RFC6090] McGrew, D., Igoe, K., and M. Salter, "Fundamental Elliptic
Curve Cryptography Algorithms", RFC 6090, February 2011. Curve Cryptography Algorithms", RFC 6090, February 2011.
[RFC6562] Perkins, C. and JM. Valin, "Guidelines for the Use of
Variable Bit Rate Audio with Secure RTP", RFC 6562,
March 2012.
[SRTP-AES-GCM] [SRTP-AES-GCM]
McGrew, D., "AES-GCM and AES-CCM Authenticated Encryption McGrew, D., "AES-GCM and AES-CCM Authenticated Encryption
in Secure RTP (SRTP)", Work in Progress, January 2011. in Secure RTP (SRTP)", Work in Progress, January 2011.
[ECC-OpenPGP] [ECC-OpenPGP]
Jivsov, A., "ECC in OpenPGP", Work in Progress, Jivsov, A., "ECC in OpenPGP", Work in Progress,
March 2011. March 2011.
[VBR-AUDIO]
Perkins, C. and J. Valin, "Guidelines for the use of
Variable Bit Rate Audio with Secure RTP", Work
in Progress, December 2010.
[SIP-IDENTITY] [SIP-IDENTITY]
Wing, D. and H. Kaplan, "SIP Identity using Media Path", Wing, D. and H. Kaplan, "SIP Identity using Media Path",
Work in Progress, February 2008. Work in Progress, February 2008.
[NIST-SP800-57-Part1] [NIST-SP800-57-Part1]
Barker, E., Barker, W., Burr, W., Polk, W., and M. Smid, Barker, E., Barker, W., Burr, W., Polk, W., and M. Smid,
"Recommendation for Key Management - Part 1: General "Recommendation for Key Management - Part 1: General
(Revised)", NIST Special Publication 800-57 - Part (Revised)", NIST Special Publication 800-57 - Part
1 Revised March 2007. 1 Revised March 2007.
 End of changes. 32 change blocks. 
115 lines changed or deleted 447 lines changed or added
