That’s what they’re saying.
Essentially, if someone on the small instance subscribes to a community that has a ton of data (huge post volume, images, whatever), the small instance needs to pull that data over from the larger instance. At some point there may be communities so large that small instances can’t pull them in without tanking.
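Here is a rough back-of-envelope model of that cost. It assumes each instance with at least one local subscriber pulls and stores one full copy of the community, no matter how many of its users subscribe. The function name and instance names are hypothetical and not part of any Lemmy or ActivityPub API:

```python
from collections import Counter


def copies_and_share(subscribers: list[str]) -> tuple[int, dict[str, float]]:
    """Hypothetical model of federated copies of one community.

    subscribers: one entry per subscribing user, naming that user's home instance.
    Returns (total full copies pulled across the network,
             fraction of one copy attributable to each local subscriber, per instance).
    Assumes one full pull per instance with >= 1 subscriber.
    """
    per_instance = Counter(subscribers)
    total_copies = len(per_instance)  # one copy per instance, not per user
    share = {inst: 1 / n for inst, n in per_instance.items()}
    return total_copies, share


# 1000 subscribers on a big instance, 1 on a small one (hypothetical names).
subs = ["big.example"] * 1000 + ["small.example"]
total, share = copies_and_share(subs)
print(total)                   # 2
print(share["big.example"])    # 0.001
print(share["small.example"])  # 1.0
```

Under this assumption the small instance carries a full copy for a single user, while the big instance spreads one copy across a thousand.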
If I’m reading the protocol right, it’s probably larger instances that will avoid more duplication, since:
I’m not sure where you see caching fitting in.
I am surprised I don’t see some kind of lower-resolution digest concept in the protocol (which might be what you’re looking for).
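To make the digest idea concrete, here is a purely hypothetical sketch; nothing like it exists in ActivityPub or Lemmy as far as this post claims. The `Post`/`DigestEntry` shapes and the 140-character excerpt are illustrative assumptions. A small instance would store titles, short excerpts, and links back to the origin, and skip the images:

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical full post as the origin instance holds it.
    title: str
    body: str
    image_urls: list[str]
    url: str


@dataclass
class DigestEntry:
    # Hypothetical lower-resolution entry a small instance might store instead.
    title: str
    excerpt: str
    url: str
    image_count: int


def make_digest(posts: list[Post], excerpt_chars: int = 140) -> list[DigestEntry]:
    """Keep titles and links, truncate bodies, and drop images (count only)."""
    return [
        DigestEntry(
            title=p.title,
            excerpt=p.body[:excerpt_chars],  # truncated body text
            url=p.url,  # link back to the origin for the full post on demand
            image_count=len(p.image_urls),  # record that images exist without fetching them
        )
        for p in posts
    ]


posts = [Post("Big thread", "x" * 5000, ["https://origin.example/a.png"] * 20,
              "https://origin.example/post/1")]
digest = make_digest(posts)
print(len(digest[0].excerpt))  # 140
print(digest[0].image_count)   # 20
```

The trade-off is that users on the small instance would need to follow the link to the origin for full content, which shifts load back to the larger instance.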