
Protocol Buffers - Google's data interchange format


Add conformance tests for overlong varints as tags.

No wire format should ever contain an overlong varint, so the topic here is only how to react to non-standard and potentially corrupted data.

The situation today is that there are four main ways that implementations handle overlong varints when parsing tags:

1) parse up to 10 bytes, cast to uint32
2) parse up to 10 bytes, reject if it is above uint32_max
3) parse up to 5 bytes, cast to uint32
4) parse up to 5 bytes, reject if it is above uint32_max

Of our primary supported implementations, these four strategies are used by Java, Go, C++ and upb respectively.
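
As a rough illustration, the four strategies can be expressed as one decoder parameterized by a byte limit and an overflow policy. This is a minimal sketch, not the code of any of the implementations named above: `ParseTag`, `Overflow`, and their parameters are hypothetical names introduced here. For simplicity it also does not check whether the 10th byte carries bits beyond the 64th.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical policy for values that do not fit in 32 bits.
enum class Overflow { kTruncate, kReject };

// Hypothetical helper: decodes a tag varint from data[0..size).
//   max_bytes   - longest encoding accepted (5 or 10 in the list above)
//   on_overflow - cast to uint32 (kTruncate) or fail (kReject)
// Returns true and stores the tag in *out on success.
bool ParseTag(const uint8_t* data, size_t size, int max_bytes,
              Overflow on_overflow, uint32_t* out) {
  uint64_t value = 0;
  for (int i = 0; i < max_bytes && static_cast<size_t>(i) < size; ++i) {
    const uint8_t byte = data[i];
    // Each byte contributes its low 7 bits, least significant group first.
    value |= static_cast<uint64_t>(byte & 0x7F) << (7 * i);
    if ((byte & 0x80) == 0) {  // continuation bit clear: last byte
      if (value > UINT32_MAX && on_overflow == Overflow::kReject) {
        return false;
      }
      *out = static_cast<uint32_t>(value);  // drops high bits when truncating
      return true;
    }
  }
  // Ran past max_bytes or the end of input without a terminating byte.
  return false;
}
```

Under this sketch, strategy 4 corresponds to `ParseTag(data, size, 5, Overflow::kReject, &tag)`. A 6-byte encoding of the value 8 (`88 80 80 80 80 00`) is accepted by the two 10-byte strategies and rejected by the two 5-byte ones, while a 5-byte encoding whose value exceeds 32 bits (`88 80 80 80 10`) is silently read as 8 by the casting strategies and rejected by the others.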

Based on examining the situation, the decision taken is that:

- Silently coercing the value down, ignoring bits in the tag, risks interpretation confusion and silent misparsing, which makes the Java approach dangerous.

- Supporting tags of up to 10 bytes (even when the extra bytes may all be 0x80 and carry no content) would have real performance implications for the upb and C++ parsers. Since this should never happen in practice, taking a performance hit on every parse for a hypothetical case is considered undesirable.

For that reason, the conformance test is set to match upb's behavior, which is a slight mismatch with today's C++ and Go behavior (in different ways) and a larger mismatch with today's Java behavior.

Because fixing this 'bug' could in theory be disruptive to a customer (though it would probably mean they have bad data that was only parsing by accident), we may hold back the behavior fix until a breaking-change release; this change to the conformance suite only establishes the decision on preferred behavior.

PiperOrigin-RevId: 841856475
Protobuf Team Bot committed
448b53feed0e0d8ed3113cc8c76ef75ff0072813
Parent: 54a48aa
Committed by Copybara-Service <copybara-worker@google.com> on 12/8/2025, 7:55:52 PM