[DNA] Model: how to treat "link down" events
In thinking through this, one thing that is not clear to me is what
the model is for what to do when a "link down" event takes place. This
is actually important to understand as it plays into how to handle
"link up" events.
RFCs 2461/2462 define the IPv6 behavior for configuring a link with
IPv6 information such as addresses, MTU, on-link prefixes, etc. When
a link goes down, conceptually one disables the interface and discards
all associated config info. However, this has the negative side-effect
of causing upper-layer protocols using those addresses to possibly get
"hard errors" and terminate connections when sending
traffic. [Actually, is this true, or are things more nuanced, i.e.,
does one get errors when sending but there is no route, or when sending
but using an invalid source address, or???] In the common case where a
link goes away for a few seconds and then comes back up connected to
the same link (with the same associated config information) this is
obviously undesirable.
Q: what is a good model for what to do with the config information
associated with an interface that is going down? In some cases (e.g.,
administrative shutdown of an interface) returning hard errors to
upper-layer protocols is probably OK. But for wireless hiccups, it may
not be. And if I unplug a wired ethernet from the jack, should that
automatically cause all connections using the address on that
interface to fail? Not clear. There are certainly times when I wish
that didn't happen...
One possible model:
- some types of "link down" events are absolute (like an
administrative shutdown)
- some "link down" events should be treated suspiciously (e.g.,
wireless) and should not result in (immediate) error to
upper-layer protocols.
- by default, one should not return hard errors immediately (in
anticipation of the link coming back up) but one can do so after
some timeout. So, should we say after N minutes, an interface is
really dead?
- can we provide guidelines for what to do in general?
Proposal:
There are two types of errors that can be returned upon sending a
packet:
hard error: means a failure of a type for which retrying is likely to
result in the same error, and upper layer protocols are OK
giving up.
soft error: a temporary failure occurred, but retrying a short time
later may yield a different result. I would put "ICMP
dest unreachables" into this category.
When a link goes down (i.e., via a "link down" event), soft errors
should be returned when a) sending through the interface, or b)
using addresses associated with that interface. Soft errors should be
returned for some short time. After (say) 2 minutes, hard errors can
be returned.
Does this make sense?
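To make the proposal concrete, here is a minimal sketch of the
soft-then-hard rule. All names (SendError, classify_send_error,
GRACE_PERIOD_S) are hypothetical, the 120-second value is just the
"(say) 2 minutes" above, and treating an administrative shutdown as an
immediate hard error is an assumption taken from the model, not a
settled rule.

```python
from enum import Enum


class SendError(Enum):
    SOFT = "soft"  # temporary failure; retrying shortly may succeed
    HARD = "hard"  # retrying is likely to fail the same way


# Assumed grace period: the "(say) 2 minutes" from the proposal.
GRACE_PERIOD_S = 120.0


def classify_send_error(link_down_at, now, administrative=False,
                        grace_period_s=GRACE_PERIOD_S):
    """Pick the error to report for a send attempted at time `now`
    (seconds) on an interface whose link went down at `link_down_at`."""
    if administrative:
        # Assumption: an administrative shutdown is an "absolute" event.
        return SendError.HARD
    if now - link_down_at < grace_period_s:
        # Still inside the grace period: the link may come back.
        return SendError.SOFT
    return SendError.HARD
```

With this, a send 30 seconds after a wireless hiccup gets a soft
error, while the same send 3 minutes later gets a hard one.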
Note: in terms of DNA, when a "link up" event takes place, I would
imagine that we could distinguish between a "link up" event that
happens within (say) 2 minutes of a "link down" and one that happens
after a much longer time period, possibly treating the two cases
somewhat differently.
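A rough sketch of that distinction, again with hypothetical names
(classify_link_up and the two labels) and an assumed 120-second
threshold. What each case should actually trigger is left open here.

```python
def classify_link_up(link_down_at, link_up_at, threshold_s=120.0):
    """Label a "link up" event by how long the link was down.
    Times are in seconds; the threshold is an assumed value."""
    elapsed = link_up_at - link_down_at
    if elapsed <= threshold_s:
        # Short outage: plausibly back on the same link.
        return "brief-outage"
    # Long outage: old configuration is more likely to be stale.
    return "long-outage"
```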
Thomas