Re: [DNA] Model: how to treat "link down" events
I'd argue that tearing down TCP connections based on LINK DOWN is a bad idea
on any media. I have pulled my Ethernet cable out of the switch port, only
to connect it on another port a few seconds later. Why can't the TCP/IP
implementation be resilient to that? It's really just another type of route
flap.
Administrative disabling of an interface is a different case. There I would
be ok with tearing down connections.
>----- Original Message -----
>From: "Thomas Narten" <narten@us.ibm.com>
>To: <dna@eng.monash.edu.au>
>Sent: Wednesday, February 01, 2006 1:38 AM
>Subject: [DNA] Model: how to treat "link down" events
>
>
> > In thinking through this, one thing that is not clear to me is what
> > the model is for what to do when a "link down" event takes place. This
> > is actually important to understand as it plays into how to handle
> > "link up" events.
> >
> > RFC 2461/2462 defines the IPv6 behavior for configuring a link with
> > IPv6 information such as addresses, MTU, on-link prefixes, etc. When
> > a link goes down, conceptually one disables the interface and discards
> > all associated config info. However, this has the negative side-effect
> > of causing upper-layer protocols using those addresses to possibly get
> > "hard errors" and terminate connections when sending
> > traffic. [Actually, is this true, or are things more nuanced, i.e.,
> > does one get errors when sending but there is no route, or when sending
> > but using an invalid source address, or???] In the common case where a
> > link goes away for a few seconds and then comes back up connected to
> > the same link (with the same associated config information) this is
> > obviously undesirable.
> >
> > Q: what is a good model for what to do with the config information
> > associated with an interface that is going down? In some cases (e.g.,
> > administrative shutdown of an interface) returning hard errors to
> > upper-layer protocols is probably OK. But for wireless hiccups, it may
> > not be. And if I unplug a wired ethernet from the jack, should that
> > automatically cause all connections using the address on that
> > interface to fail? Not clear. There are certainly times when I wish
> > that didn't happen...
> >
> > One possible model:
> >
> > - some types of "link down" events are absolute (like an
> > administrative shutdown)
> >
> > - some "link down" events should be treated suspiciously (e.g.,
> > wireless) and should not result in (immediate) error to
> > upper-layer protocols.
> >
> > - by default, one should not return hard errors immediately (in
> > anticipation of the link coming back up) but one can do so after
> > some timeout. So, should we say after N minutes, an interface is
> > really dead?
> >
> > - can we provide guidelines for what to do in general?
> >
> >
> > Proposal:
> >
> > There are two types of errors that can be returned upon sending a
> > packet:
> >
> > hard error: means a failure of a type for which retrying is likely to
> > result in the same error, and upper layer protocols are OK
> > giving up.
> >
> > soft error: a temporary failure occurred, but retrying a short time
> > later may produce a different result. I would put "ICMP
> > dest unreachables" into this category.
> >
> >
> > When a link goes down (i.e., via a "link down" event), soft errors
> > should be returned when a) sending through the interface, or b) if
> > using addresses associated with that interface. Soft errors should be
> > returned for some short time. After (say) 2 minutes, hard errors can
> > be returned.
> >
> > Does this make sense?
> >
> > Note: in terms of DNA, when a "link up" event takes place, I would
> > imagine that we could distinguish between a "link up" event that
> > happens within (say) 2 minutes of a "link down" or after a much longer
> > time period, possibly treating the behavior somewhat differently.
> >
> > Thomas
> >
> >
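To make the quoted proposal concrete, here is a rough sketch of how the soft/hard error decision might look. It is only an illustration under stated assumptions: the names (Interface, DownCause, SendError, GRACE_PERIOD_S) are hypothetical and not taken from any real stack, the 120-second grace period is simply the "(say) 2 minutes" from the proposal, and an administrative shutdown is treated as immediately "absolute", as both messages suggest.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Assumption: the "(say) 2 minutes" from the proposal, in seconds.
GRACE_PERIOD_S = 120.0


class SendError(Enum):
    NONE = "none"   # link is up, nothing to report
    SOFT = "soft"   # temporary failure, retrying later may succeed
    HARD = "hard"   # upper-layer protocols are OK giving up


class DownCause(Enum):
    ADMIN = "admin"                # administrative shutdown: absolute
    CARRIER_LOSS = "carrier_loss"  # cable pulled, wireless hiccup, etc.


@dataclass
class Interface:
    """Hypothetical per-interface state kept across a "link down" event."""
    down_cause: Optional[DownCause] = None
    down_since: Optional[float] = None

    def link_down(self, cause: DownCause, now: float) -> None:
        # Remember why and when the link went down; config is not discarded.
        self.down_cause = cause
        self.down_since = now

    def link_up(self) -> None:
        self.down_cause = None
        self.down_since = None

    def send_error(self, now: float) -> SendError:
        """Classify the error to return for a send attempt at time `now`."""
        if self.down_cause is None:
            return SendError.NONE
        if self.down_cause is DownCause.ADMIN:
            return SendError.HARD
        # Non-administrative outage: soft errors during the grace period,
        # hard errors once the period has elapsed.
        if now - self.down_since < GRACE_PERIOD_S:
            return SendError.SOFT
        return SendError.HARD
```

With this sketch, a cable moved to another switch port a few seconds later only ever produces soft errors, so a connection could survive it, while an administratively disabled interface fails immediately.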