Re: [DNA] High-level overview of DNA, take 2



Thomas Narten wrote:

> DNA requires that a host keep track of the links to which it has
> recently attached. For each known link, information about that link,
> such as default router addresses, prefix information, MTU, Lifetimes,
> etc. are kept in what are called Link Instances (aka Candidate Link
> Objects in dna-cpl). Link Instances are identified by the set of
> on-link global prefixes that have been advertised on that link, as no
> prefix may be assigned to different links simultaneously. At any given
> time, one LI is considered the Current LI, and the parameters
> associated with the CLI are what are actually configured on the link
> as defined in 2461/2462.
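A minimal Python sketch of the Link Instance idea described above, assuming a host keeps a list of LIs and compares the prefixes from a received RA against each one. The names LinkInstance and find_link are hypothetical and are not taken from dna-cpl.

```python
from dataclasses import dataclass, field
from ipaddress import IPv6Network
from typing import Optional


@dataclass
class LinkInstance:
    """A known link, identified by the prefixes seen on it.

    Field names are illustrative; neither 2461/2462 nor dna-cpl
    prescribes this layout.
    """
    prefixes: set = field(default_factory=set)         # IPv6Network items
    default_routers: list = field(default_factory=list)
    mtu: Optional[int] = None

    def matches(self, advertised: set) -> bool:
        # No prefix may be assigned to two links at once, so sharing
        # any prefix with the advertised set identifies this link.
        return bool(self.prefixes & advertised)


def find_link(known: list, advertised: set) -> Optional[LinkInstance]:
    """Return the known LI that shares a prefix with an RA, if any."""
    for li in known:
        if li.matches(advertised):
            return li
    return None
```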

I'm not sure all the information about the link (such as the default 
router addresses and MTU mentioned above) needs to be retained. What is 
strictly necessary is the 
information to tell which link it is, i.e. the prefixes and their 
lifetimes. In particular, the list of default routers for a particular 
link might change a lot more frequently than the set of prefixes 
assigned to the link.

I think "on-link global prefixes" above would be a bad idea; we don't 
want to restrict this to the case when on-link prefixes are advertised. 
For media where multicast is expensive (IEEE 802.15.4 and 802.16), 
folks are discussing ways to avoid multicast NSes; one way to do this is 
to not advertise any on-link prefixes (but advertise prefixes with the A 
bit set), which per standard RFC 2461 results in packets being unicast 
to the default router.
So I think DNA should operate using prefixes that have the L 
(on-link) bit, the A (addrconf) bit, or both bits set.
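A sketch of the filtering suggested here, assuming the RA's Prefix Information options have already been parsed into (flags, prefix) pairs (the parser is not shown). The 0x80 and 0x40 masks are the L and A bits of the option's flags byte as defined in RFC 2461; the function name is hypothetical.

```python
from ipaddress import IPv6Network

# Flag bits in the Prefix Information option's flags byte (RFC 2461).
ON_LINK_FLAG = 0x80     # L bit
AUTONOMOUS_FLAG = 0x40  # A bit


def link_identifying_prefixes(prefix_options):
    """Keep prefixes whose L bit, A bit, or both are set.

    prefix_options: iterable of (flags, prefix_string) pairs, assumed
    to come from an already-parsed RA.
    """
    return {
        IPv6Network(prefix)
        for flags, prefix in prefix_options
        if flags & (ON_LINK_FLAG | AUTONOMOUS_FLAG)
    }
```

A prefix advertised with only the A bit set still identifies the link, which is the case that matters for media that avoid on-link prefixes.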

> When a "link up" event is detected, a host will need to determine what
> IP configuration parameters should be used. A simple (but naive)
> approach would be to discard all associated information and initiate
> the normal startup procedure as described in 2461/2462.  However, this
> has the downside of causing upper-layer protocols using the (just
> discarded) addresses to possibly receive "hard errors" and terminate
> connections using those addresses.  In the common case where a link
> goes away momentarily and then comes back up connected to the same
> link (with the same associated config information) this is obviously
> undesirable.

The last sentence seems to say that a link is connected to a link (the 
link goes away and then comes back up connected to the same link).
If we want to be more careful about terminology we would state that the 
'link up' and 'link down' events do not signal anything about the link 
itself; they merely signal something about the attachment point to the 
link. Thus it makes sense to say the attachment point goes down ('link 
down' event) and then comes back up and is connected to the same link.

> DNA takes the approach that upon receipt of a "link up" event, the
> most-recently used configuration information should continue to be
> used, and that DNA should be initiated to (quickly) determine whether
> one is attached to a new link, or one to which one has been previously
> attached.  In the simple case, DNA will quickly determine whether one
> has reattached to the current link or to another known link.
> Specifically, it will do so by sending a single RS and receiving a
> single RA, one containing a prefix that it has seen before, thereby
> identifying the link. In such cases, DNA can declare its work done,
> and any new prefixes in the received RA are added to the CLI.
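The simple case can be sketched as follows, assuming known links are kept in a dictionary from a hypothetical link identifier to the set of prefixes recorded for that link. This illustrates the logic only; it is not code from dna-protocol or dna-cpl.

```python
def handle_first_ra(known_links, ra_prefixes):
    """Simple case: identify the link from a single RA.

    known_links maps a (hypothetical) link id to the set of prefixes
    recorded for that link. Returns the matching link id, or None if
    no advertised prefix has been seen before (the harder case).
    """
    for link_id, prefixes in known_links.items():
        if prefixes & ra_prefixes:
            # Known link: DNA is done; merge any new prefixes in place.
            prefixes |= ra_prefixes
            return link_id
    return None
```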

In the above description it isn't clear to me whether you are discussing
  - how CPL works
  - how dna-protocol works
  - some fundamental statement which spans the two

It is clearly highly desirable to handle this with the receipt of a 
single RA, and that RA should arrive as soon as possible. But the "as 
soon as possible" requires changes to the router as specified in 
dna-protocol.

> In some scenarios, however, things are more complex. Specifically, DNA
> may receive an RA that contains only prefixes it has never seen
> before. When this happens, it may not be sure whether the prefixes
> correspond to a new link (i.e., one not visited before), or a
> previously-visited link, but for which it has not yet seen all the
> advertised prefixes. While it might be simplest to assume that receipt
> of such an RA indicates attachment to a new link, that can lead to
> (incorrectly) invalidating other configuration information for that
> link. To handle this case, DNA takes extra steps. First, it
> distinguishes between "complete" and "incomplete" LIs. Complete LIs
> are those for which there is a high probability of having received all
> prefixes being advertised on that link. Incomplete LIs are those for
> which insufficient time has elapsed to conclude with reasonable
> certainty that all advertised prefixes have been seen.
>
> When DNA receives an RA advertising (only) unknown prefixes, and the
> CLI is not "complete", DNA waits for additional RAs before deciding
> whether the link is new or not.
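A sketch of the decision described in these two paragraphs, assuming the RA's prefixes have already been checked against the other known LIs so that only the CLI is in question. "Complete" is modeled as a plain boolean; how a host decides completeness (for example, from elapsed time) is left out, and all names are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    SAME_LINK = "same link"
    NEW_LINK = "new link"
    WAIT = "wait for more RAs"


def classify_ra(cli_prefixes, cli_complete, ra_prefixes):
    """Decide what an RA implies about the Current LI (CLI).

    Assumes ra_prefixes has already been checked against all other
    known LIs, so only the CLI is in question here.
    """
    if cli_prefixes & ra_prefixes:
        return Decision.SAME_LINK
    if cli_complete:
        # Every prefix of the CLI's link has (very likely) been seen,
        # and none of them is advertised: this is a new link.
        return Decision.NEW_LINK
    # Only unknown prefixes, but the CLI may be missing some of its
    # link's prefixes: wait for additional RAs before deciding.
    return Decision.WAIT
```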

In the above paragraphs by "DNA" you appear to mean "CPL", right?

If you want to give an overview of the combination of dna-protocol* and 
CPL, then you need to modify the text. And the "two variants" below 
makes me believe that is what you are trying to accomplish.

> The DNA procedure is tightly coupled with the use of 2461/2462, since
> DNA uses the same messages. 
> 
> There are two variants of DNA. In the first variant, hosts interact
> with routers that have implemented 2461/2462, but have not implemented
> any DNA extensions. 

I actually don't think they are variants.
I'd expect a host to send a single RS that is capable of doing the best 
possible whether the routers support just 2461 or support the DNA 
extensions.
We don't want the host to have to guess which form of routers there are 
on a potentially new link before it sends the first RS!

But due to having two separate documents, the combination isn't written 
down anywhere, so it isn't unreasonable for folks to think there are two 
variants.

> Some key questions for DNA. 
> 
> 1) when reusing "old" config, do we need to do anything special? E.g.,
>    change state of ND entries to STALE? Put addresses in Optimistic
>    state? Take into account how much time has elapsed since we were
>    last attached (or just assume this happens as part of normal ND
>    timers?)

Good questions.
For DAD I think it makes sense to always switch to optimistic state even 
for a short link down or when a link up arrives without a preceding link 
down. This is ok since doing that, and not sending any DAD probe, is not 
expensive.
Once an RA arrives indicating that the host did not move, the 
optimistic mode can be turned off.
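A minimal sketch of the per-address handling proposed here: switch to optimistic state on every "link up" without sending a DAD probe, and leave optimistic state once an RA shows the host did not move. The class and state names are hypothetical, and the case where the host turns out to have moved is not modeled.

```python
from enum import Enum


class AddrState(Enum):
    PREFERRED = "preferred"
    OPTIMISTIC = "optimistic"


class TrackedAddress:
    """Hypothetical per-address DAD state kept across link-up events."""

    def __init__(self, addr):
        self.addr = addr
        self.state = AddrState.PREFERRED

    def on_link_up(self):
        # Always switch to optimistic, even after a very short outage
        # or a "link up" with no preceding "link down"; no DAD probe
        # is sent at this point.
        self.state = AddrState.OPTIMISTIC

    def on_ra(self, same_link):
        # An RA showing the host did not move ends optimistic mode.
        # What happens after a move is not modeled in this sketch.
        if same_link:
            self.state = AddrState.PREFERRED
```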

For ND state of STALE the situation is similar to the default router 
list and other cached information. If the host has been disconnected 
from the link for a fraction of a second or a few seconds, it might 
have missed some important packet (such as an unsolicited NA telling it 
that the L2 address of a neighbor has changed, or an unsolicited RA 
telling it that a router is going away.) But it could have missed such a 
packet due to bit errors even if it was connected to the link. And we 
have NUD as a way to recover from those cases, so why not use unmodified 
NUD to recover from the case when the host was disconnected for a while?
Of course, if the host determines that it has moved to a different link, 
then the information in the neighbor cache isn't useful any more.

I don't think we need extra timers for this; the existing timers (for 
default router list entries, NUD probing, etc) are ok, since they are 
(supposed to be ;-) ok when the host remains attached to the link. Thus 
the argument about packet loss caused by bit errors vs. packet loss 
caused by being detached from the link applies here as well.

>    What harm can occur from continuing to use the same information?
>    How does this compare with the harm of delaying the configuration
>    of an interface until one is more certain?
> 
> 2) Upon "link up", does DNA need to delay sending an RS per the
>    2461/2462 rule? Are there ways to relax this that are consistent
>    with the original concerns that led to the existing wording in
>    2461/2462?

What is important is that DNA doesn't cause a flood of packets that 
makes things worse.
I think somebody sent a reference to a paper describing a study of 
802.11 with a bunch of stations moving together (think of a vehicle 
carrying many devices), which found that the L2 association meant the 
'link up' events were already spread out over time.
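For reference, the 2461 rule in question can be sketched as below. MAX_RTR_SOLICITATION_DELAY (1 second) is the RFC 2461 host constant; the function name and the injectable random source are assumptions made for illustration and testing.

```python
import random

# Host constant from RFC 2461, section 10.
MAX_RTR_SOLICITATION_DELAY = 1.0  # seconds


def initial_rs_delay(rng=random):
    """Random delay before the first RS, per the 2461 rule.

    Spreading transmissions keeps many hosts that see "link up" at
    the same moment from all soliciting at once.
    """
    return rng.uniform(0.0, MAX_RTR_SOLICITATION_DELAY)
```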

> 3) Upon "link up", what messages can host send that would help it to
>    quickly determine if on same link? Is it just an RS? what about
>    unicast? what about an NS (e.g., perhaps including a Landmark
>    option)?

The WG has talked about NS in the past, in particular an NS to the old 
default router(s).
This is an extra packet (an RS plus an NS instead of just an RS).
And since neither the router's L2 address nor the router's link-local 
address needs to be globally unique, the fact that the NS gets a response 
doesn't mean the host is still attached to the same link.

>    Are there other messages that can be sent that could help? Like NS
>    queries unicast to known routers?

For the resulting NA to be a fool-proof indication that the host didn't 
move, the host would have to know a global IPv6 address of the router, 
and the router MUST NOT respond on interface 1 to an NS addressed to its 
global address assigned to interface 2.

>    Note: on some links, like wireless ethernet, multicast is
>    (relatively) "expensive". Would it be better to just send a few
>    immediate unicast NS (or RS) messages to the routers we know about?
>    and send a multicast RS only if the unicast fails? Or do both in
>    parallel?

Two things:
  - Sending an NS first and waiting for an answer (how long?), perhaps 
retransmitting the NS, before sending the RS means that when the host 
did move to a different link, DNA will take a lot longer than in 
dna-protocol3. Note that in this case the NS would not elicit a 
response. (But I don't know if we've talked about sending the same RS as 
in dna-protocol3, but unicasting it at L2 to an old router address. That 
would require some tweaks to get an immediate response instead of a 
delayed response.)

  - The problem of expensive multicast/broadcast isn't unique to DNA. 
IEEE 802.15.4 will see this when devices that implement RFC 2461 are 
initially powered on, since each one multicasts an RS. A solution to 
that might be some combination of L2 and 2461 optimizations. At the 
recent 6lowpan interim meeting we talked about the possibility of 
having the 802.15.4 coordinator be an IPv6 router, in which case 
packets sent to the "all routers IPv6 
multicast address" can be unicast to the coordinator's L2 address. Such 
things can presumably be covered in an IPv6-over-foo document that gets 
some careful review.
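The delay in the first point above can be put in rough numbers. This sketch assumes, hypothetically, that the host waits RetransTimer after each unanswered NS and uses the RFC 2461 defaults (RETRANS_TIMER of 1 second, MAX_UNICAST_SOLICIT of 3); a real DNA design could pick much shorter waits. The function name is hypothetical.

```python
# Node constants from RFC 2461, section 10.
RETRANS_TIMER = 1.0       # seconds between NS retransmissions
MAX_UNICAST_SOLICIT = 3   # unicast NS transmissions before giving up


def rs_send_time(ns_first, ns_attempts=MAX_UNICAST_SOLICIT,
                 wait_per_ns=RETRANS_TIMER):
    """Seconds after "link up" until the RS is sent, if the host moved.

    Hypothetical model: with ns_first, the host sends ns_attempts
    unicast NSs to its old router, waits wait_per_ns after each one,
    gets no answer (it has moved), and only then sends the RS.
    Without ns_first the RS goes out immediately; the random initial
    RS delay from 2461 is ignored in both cases.
    """
    if not ns_first:
        return 0.0
    return ns_attempts * wait_per_ns
```

With these defaults a host that has moved would not send its RS until 3 seconds after "link up", versus immediately when the RS is sent first or in parallel with the NS.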

> 4) When one gets an RA containing prefixes that one has not seen
>    before, is the algorithm in CPL sufficient or ideal? Are there
>    other probes one could send? E.g., upon indication that a new prefix
>    is available, send additional probes quickly to try and flush out
>    things more quickly? Or, should the host keep additional info in
>    LIs, e.g., keeping track of which router advertised what prefixes,
>    so that it can distinguish "prefix from previously unknown router"
>    vs. "new prefix from router we've received RAs from before"?

If the routers don't have the DNA extensions, then such probes would 
serve no purpose, since the multicast RAs they trigger are rate limited 
to one every 3 seconds.
And if the routers do implement the DNA extensions, then the host can 
always tell from the first RA whether it has moved or not.
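A simplified model of why extra probes do not help with routers lacking the DNA extensions. It assumes RFC 2461's MIN_DELAY_BETWEEN_RAS of 3 seconds, ignores the random MAX_RA_DELAY_TIME delay, and assumes that RSs arriving before an already-scheduled RA are all answered by that one RA. The function name is hypothetical.

```python
# Router constant from RFC 2461, section 10.
MIN_DELAY_BETWEEN_RAS = 3.0  # seconds


def multicast_ra_times(rs_times, last_ra=float("-inf")):
    """Times at which a non-DNA router multicasts RAs in reply to RSs.

    Simplified model: the random MAX_RA_DELAY_TIME delay is ignored,
    and every RS that arrives before an already-scheduled RA is
    answered by that RA rather than triggering another one.
    """
    sent = []
    for t in sorted(rs_times):
        if sent and sent[-1] >= t:
            continue  # an RA at or after this RS is already scheduled
        # Rate limit: no earlier than MIN_DELAY_BETWEEN_RAS after the last RA.
        send_at = max(t, last_ra + MIN_DELAY_BETWEEN_RAS)
        sent.append(send_at)
        last_ra = send_at
    return sent
```

For example, three RSs sent at 0, 0.2 and 0.4 seconds yield RAs at 0 and 3 seconds only: the first RA is no earlier than with a single probe, and nothing further from that router can arrive before 3 seconds.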

> 5) When DNA determines that one is no longer connected to the same
>    link as before, one swaps configs. Care must be taken to ensure
>    that information learned from RAs received since the last "link up"
>    is applied to the correct LI.

Yes, the documents might need to be more explicit about this.
CPL tries to be explicit in its rules for the candidate link objects.
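One way to make the rule in item 5 explicit is to buffer RA information received since the last "link up" and write it into an LI only once DNA has decided which LI is current. This is a hypothetical sketch; the class and method names are made up, and CPL's actual rules for candidate link objects may differ.

```python
class PendingRAs:
    """Holds RA prefix info received since the last "link up".

    Nothing is written into any LI until DNA decides which LI is
    current; the buffered data is then applied to that LI only.
    """

    def __init__(self):
        self.buffered = []

    def on_link_up(self):
        # Anything buffered before this attachment is stale.
        self.buffered.clear()

    def on_ra(self, prefixes):
        self.buffered.append(set(prefixes))

    def commit(self, link_instances, chosen_id):
        # Apply everything learned since link up to the decided LI only,
        # creating it if DNA concluded this is a new link.
        target = link_instances.setdefault(chosen_id, set())
        for prefixes in self.buffered:
            target |= prefixes
        self.buffered.clear()
        return target
```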

> 6) If DNA determines that one has reconnected to the same link, what
>    else needs to be done? Rerun DAD? Do nothing? Does the answer
>    depend on how long it has been since one was previously connected
>    to that link?

I think the rerunning of DAD needs to be a function of how long the host 
has been disconnected and the normal packet loss on the link.
If the host is always connected to the link, DAD might still fail to 
find duplicates due to normal packet loss. Thus we don't need to do 
better than this when it comes to DNA handling being disconnected and 
later reconnected.

In reality though, with wireless links there is a floating boundary 
between being disconnected and "normal" packet loss; the further away 
from the AP the station is, the higher the packet loss. And when going 
even further, after some time the host NIC determines that the AP is 
unreachable. But I don't know if there is an upper bound on the time 
between the host starting to see 90, 99, or 100% packet loss and the 
NIC declaring the AP gone and the link down.

So I think it is time to pull a number out of a hat. How about 30 seconds?
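The proposal can be written down as a one-line rule. The 30-second threshold is the arbitrary value suggested above, not a value from any specification, and the function name is hypothetical.

```python
# Value "pulled out of a hat" above; not from any specification.
DAD_RERUN_THRESHOLD = 30.0  # seconds


def should_rerun_dad(link_down_at, link_up_at,
                     threshold=DAD_RERUN_THRESHOLD):
    """Rerun DAD after reattaching to the same link only if the host
    was detached for longer than an outage that ordinary packet loss
    on the link would also cover."""
    return (link_up_at - link_down_at) > threshold
```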


> 7) Is there ever a time when one should discard still-valid prefix
>    information on a link? That is, suppose a bogus router shows up and

You mean it sets the "bogus" bit in the RA?
The host has no way, unless we run SEND, to tell apart the good, bad, 
and ugly routers.

>    sends out an RA with prefix information with a long Lifetime. This
>    information may stay associated with a link for weeks, because no
>    router ever advertises the prefix with a Lifetime of 0. Pre-DNA,
>    that info would presumably get discarded more quickly because one
>    doesn't keep old LI information around. But with DNA, we may find
>    this happens and causes problems. 

You seem to say DNA would make this worse, and I don't understand how.
If a host remains attached to the same link forever, then the bad 
information will stay in its prefix list until it times out.
Clearly DNA can't be worse than that; the information is only used when 
attached to the link where the bogus router lives.

> For robustness, it seems like it
>    might be desirable to discard prefix information in the _absence_
>    of RAs containing that prefix, in the case where other RAs continue
>    to be received with other prefix information. I.e., if one knows
>    that one has missed (say) N RAs, time out the info if one is
>    receiving other info from other routers.  (note: the situation is
>    further aggravated by having routers cache RAs learned from other
>    routers.)
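A naive sketch of the "N missed RAs" idea in this item, assuming a host counts, for each known prefix, the consecutive RAs (from any router) that did not carry it. MAX_MISSED stands in for N and is a hypothetical parameter; the function name is also hypothetical. Because routers on the same link may legitimately advertise different prefixes at different intervals, a rule like this can time out valid prefixes.

```python
# Hypothetical value for N; not defined by 2461 or the DNA drafts.
MAX_MISSED = 3


def update_missed(missed, ra_prefixes, max_missed=MAX_MISSED):
    """Count RAs that omit each known prefix; expire after max_missed.

    missed maps prefix -> number of consecutive RAs (from any router)
    that did not carry it. Mutates missed and returns the set of
    prefixes to time out.
    """
    expired = set()
    for prefix in list(missed):  # copy: entries may be deleted below
        if prefix in ra_prefixes:
            missed[prefix] = 0
        else:
            missed[prefix] += 1
            if missed[prefix] >= max_missed:
                expired.add(prefix)
                del missed[prefix]
    for prefix in ra_prefixes:
        missed.setdefault(prefix, 0)  # start tracking new prefixes
    return expired
```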

The DNA WG talked about this in the context of flash renumbering and 
immediate reassignment, but it is a bit problematic even in that context 
because we are making prefixes disappear faster than RFC 2461 intended. 
Thus things might become more brittle.

We can't guard against malicious routers unless we run SEND, right?
If we run SEND and still have a problem with bogus prefixes being 
advertised, the operator's setup is broken AFAIK.
(When shim6 is deployed it would probably "route around" such 
brokenness, by discovering that the prefix/address doesn't work 
end-to-end, which would give some additional robustness; but still, this 
doesn't get any worse with DNA AFAIK.)

    Erik