Re: [DNA] draft-ietf-dna-cpl-02.txt



Thomas Narten wrote:

> Hmm. One thing missing here is the high-level description of expected
> behavior when a link goes "down" and then "comes back up". 2461/2462
> (to the degree I recall) doesn't say a lot, but if it does say
> anything, the implication is probably to "discard all information when
> the link goes away" and "start fresh when a link comes up", which I
> agree is not really what we want in practice.

I think RFC 2461/62 says absolutely nothing about this.
All it says about a link being enabled is in the case of the system 
coming up, an interface being enabled, or the host attaching or 
re-attaching to a link. (In those cases a host can send a few RSes.)

RFC 2461 talks about what a *router* can do when an interface is being 
administratively disabled or the router is shut down. But there is NO 
mention of a link going down or of a node being detached from a link.

I don't find any language about links going down or interfaces being 
disabled in RFC 2462.

So I don't think the above implication can be read between the lines of 
those RFCs.

If we want to find out what implementors have implemented, I suspect we 
need to go and survey the implementations. I suspect there might be a 
range of behavior from doing nothing explicitly when a link goes down, 
to blowing away all IPv6 addresses and all TCP connections that use 
those local addresses when a link goes down. Neither is recommended or 
forbidden by our sets of RFCs. (But blowing away all the TCP connections 
just because the 802.11 AP becomes unreachable for a few seconds might 
be too aggressive.)


Note that in the DNA discussions we talked about using 'link down' in 
the protocol, but we decided that
  - most L2s don't have a well-defined point at which a link is declared 
down (the interface could have been unusable for quite a while before 
the device driver considers it as down).
  - some L2 movement might happen without the link being considered 
down. For instance, a handover from one access point to another. DNA 
needs to be notified of such events, and the current assumption is that 
a 'link up' event would be generated in that case without being preceded 
by a 'link down'.

A result of these considerations is that the DNA specifications do not 
attach any behavior to a 'link down' event.
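
As a rough illustration of that event model, here is a minimal sketch in 
Python. The class and method names (DnaHost, on_link_up, on_link_down) 
are hypothetical and do not come from any of the drafts; the only point 
is that 'link down' is a no-op and that 'link up' may arrive on its own.

```python
from dataclasses import dataclass, field


@dataclass
class DnaHost:
    """Hypothetical host state; names are illustrative, not from any draft."""
    addresses: set = field(default_factory=set)
    checks_started: int = 0

    def on_link_down(self) -> None:
        # No behavior attached: addresses and other state stay untouched.
        pass

    def on_link_up(self) -> None:
        # May arrive without a preceding 'link down' (e.g. an AP handover).
        # Only start checking whether the L3 link changed; discard nothing yet.
        self.checks_started += 1


host = DnaHost(addresses={"2001:db8::1"})
host.on_link_up()    # handover: 'link up' with no 'link down' before it
host.on_link_down()  # ignored by the DNA logic
host.on_link_up()    # link comes back: start another check
```

After this sequence the host still holds its address and has started 
two change-detection checks.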

> Right. Though in the DNA context, "discard" probably needs more
> description. We don't want the info assigned to the interface
> anymore. But do we actually want all upper-layer uses of those
> addresses to now explicitly fail? Or is it OK to allow continued use
> (even if it doesn't really work) so that if we reconnect to the link
> later, upper-layer applications may still recover (using the old
> addresses)? I think there may be pros and cons to both sides on this
> point.

FWIW section 4.6 in draft-ietf-dna-cpl does specify what is required to 
be discarded.
But it is silent about any implications on ULP traffic.
From the perspective of the document, it might make sense to add the 
same language as we have for invalid addresses in RFC 2462 (which is 
that the host should not send any packets using such addresses as source).

I agree that the ULP implications can be argued either way, and the 
optimal behavior is application dependent.
*If* the host might move back to the old link and the application can't 
easily recover, then the best thing for TCP to do would be to be 
patient, and keep on retransmitting. (And the IP layer might prevent 
those packets from going out with the invalid source address until the 
host moves back to the same link.)

But *if* the application can quickly and easily recover by creating a 
new TCP connection, the best strategy would be to immediately reset the 
TCP connection so that the application can recreate another one.
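
To make the two cases concrete, here is a small sketch of the decision 
in Python. Everything in it is an assumption for illustration: the 
function name, the two boolean inputs (a real stack usually cannot know 
either with certainty), and in particular the last branch, which covers 
a case not discussed above.

```python
from enum import Enum


class Action(Enum):
    KEEP_RETRANSMITTING = "keep retransmitting"
    RESET = "reset connection"


def tcp_action(may_return_to_old_link: bool,
               app_recovers_easily: bool) -> Action:
    """Hypothetical policy for a connection whose source address went away."""
    if app_recovers_easily:
        # The application can cheaply open a new connection: fail fast.
        return Action.RESET
    if may_return_to_old_link:
        # Recovery is hard and the old address may become usable again:
        # be patient and keep retransmitting.
        return Action.KEEP_RETRANSMITTING
    # Assumption: this case is not covered in the discussion above.
    # The sketch resets because waiting cannot help.
    return Action.RESET


print(tcp_action(may_return_to_old_link=True, app_recovers_easily=False))
print(tcp_action(may_return_to_old_link=True, app_recovers_easily=True))
```

The first call selects the patient behavior and the second selects the 
reset, matching the two paragraphs above.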

I wonder if it makes sense to add the above considerations to a 
non-normative appendix somewhere.

>> The issue of "discard old information" above does have the potential to 
>> break things compared to a non-DNA host today.
>> Today a host (that only follows 2461/62) will assume it has the same 
>> information until the information times out (with default timers ranging 
>> from hours to weeks).
> 
> Is this just what implementations do, or is this supported (or even
> called for) in 2461/2462 (or other specs)?

As stated above, I find no text in 2461/2462 arguing either end of the 
scale (or the middle) for how a host should behave when a link goes down.

I think there is a fundamental tradeoff between being quick to react and 
keeping the host's IPv6 addresses stable.
CPL as well as the DNA solution protocol do go through some extra work 
to come up with a reasonable tradeoff *at the IP layer* between the two. 
This is why, for example, on a lossy link CPL might not be able to 
conclude that a host has moved until 12 seconds have passed in the worst 
case. (But the DNA solution can always do it in one RS/RA roundtrip time.)

By doing this, DNA leaves the tradeoff between quick and stable to the 
ULPs to cope with. And we seem to agree that there are some hard 
tradeoffs for TCP and the applications in that space.

>> If a DNA capable host, after detecting a change in L2 attachment point ( 
>> a link up event) erroneously decides that it has moved at L3, and as a 
>> result discards its autoconfigured IPv6 addresses, this might cause 
>> failures that are visible to applications. (The disappeared IPv6 
>> addresses might cause existing connections to fail, even if an RA 
>> arrives milliseconds later and recreates the disappeared IPv6 address.)
> 
> Yep. I agree that this is probably not what we want.
> 
>> So your comments seem to be based on the assumption that existing IPv6 
>> stacks always discard all IPv6 addresses and other information when they 
>> are connected to a different L2 attachment point. That certainly isn't 
>> the case for the implementation I use. If we can resolve the above 
>> point, then we can see what makes sense with respect to simplifications 
>> to the draft.
> 
> That wasn't my intent.
> 
> But, this discussion is good. As I read through some of the other
> documents I kept coming back to "what  is the abstract model"
> governing the behavior we want?
> 
> I will step back and try to describe that as well, as I think it's
> helpful to understand/clarify at an abstract level what the principles
> are and how we think things should work. From that, it's a lot easier
> to dive into details.

OK.

    Erik