[RndTbl] very strange DNS errors EUREKA! SOLVED!!

Trevor Cordes trevor at tecnopolis.ca
Thu Apr 21 01:46:57 CDT 2016

The problem: IPv6!  Argh!

After the named debug output I just posted, I poked around named.conf's
logging options a bit more and saw that there were whole classes
("category") of logs I could enable.  So I turned everything on at
level 10 and reproduced the bug.  Tada!  Plain as day, there's my
problem staring me in the face:

21-Apr-2016 01:06:05.188 lame-servers: info: network unreachable resolving 'brandoneagles.ca/A/IN': 2607:0:2:4:216:36:178:2#53
21-Apr-2016 01:06:05.188 query-errors: debug 1: client (brandoneagles.ca): query failed (SERVFAIL) for brandoneagles.ca/IN/A at query.c:7769

A lame-servers "network unreachable" error for an A record lookup, and
the server it was trying to reach is a distinctly IPv6y-looking
address.  One quick Google search later, I had the solution: BIND must
run in "-4" mode.  It's not enough to "disable" v6 by not listening on
any v6 address.  You must switch BIND into a "completely 4, never use
6" mode.
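
The key detail is the "#53" at the end of the lame-servers line: BIND
logs the server it was querying as ADDRESS#PORT, so this A record lookup
was being sent to a nameserver's IPv6 address.  A minimal Python sketch
that pulls the line apart, assuming exactly the log layout shown above
(server_transport is a hypothetical helper name, not anything in BIND):

```python
import ipaddress

# The lame-servers line quoted above, verbatim.
LOG_LINE = ("21-Apr-2016 01:06:05.188 lame-servers: info: network unreachable "
            "resolving 'brandoneagles.ca/A/IN': 2607:0:2:4:216:36:178:2#53")


def server_transport(line: str):
    """Return (record_type, server_ip, port) from a lame-servers log line.

    Assumes the layout seen in this log: ... resolving 'NAME/TYPE/CLASS': ADDR#PORT
    """
    query_part, _, server_part = line.rpartition("': ")
    name_type_class = query_part.rsplit("'", 1)[1]   # "brandoneagles.ca/A/IN"
    record_type = name_type_class.split("/")[1]      # "A"
    addr, _, port = server_part.partition("#")       # BIND writes ADDR#PORT
    return record_type, ipaddress.ip_address(addr), int(port)


rtype, server, port = server_transport(LOG_LINE)
print(rtype, server.version, port)  # A 6 53
```

In other words: the record being asked for is IPv4 (A), but the
transport used to ask for it is IPv6.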

On Fedora or RHEL/CentOS/Oracle, edit /etc/sysconfig/named, add the
following line, and restart named:

OPTIONS="-4"

The problem completely disappears immediately, and stays gone over
repeated tests on many different boxes.

(Hilarious, I probably would have seen ICMP messages had my tcpdump or
ip6tables been set to log them!  I guess tcpdump is just v4 by default?)

So the reason the original symptom seemed to get worse over time is
that all these various nameservers must slowly be turning on IPv6 DNS
serving, one by one.  And the reason it doesn't happen on every query
must be some sort of round-robin or random server picking, by either
BIND or the upstream DNS servers, that only sometimes has me using my
IPv6 interface to look up an A record?  Dunno, that's my guess.

Hmm, it still gets me wondering why the "big guys" I added to my
earlier tests, who almost certainly also have v6 enabled, never
SERVFAILed.  Perhaps there is a DNS bug on the remote side where the
buggy guys are sending me AAAA records when they aren't supposed to?
Looking at the other named debug output, I can see that in this case
they send me this back just before the final lookup:

;ns1.westmancom.com.    172800  IN      A
;ns1.westmancom.com.    172800  IN      AAAA    2607:0:2:1:216:36:128:2
;ns2.westmancom.com.    172800  IN      A
;ns2.westmancom.com.    172800  IN      AAAA    2607:0:2:1:216:36:128:3

And almost certainly my resolver is picking one of these records at
random, failing the 50% of the time it chooses an AAAA.
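
A toy Python model of that guess: if each lookup picks one nameserver
address uniformly at random and IPv6 is unusable on this box, roughly
half the lookups against these two servers fail, and dropping the IPv6
candidates (the effect "-4" is meant to have) brings failures to zero.
The uniform random pick is my guess above taken at face value, not
BIND's actual server-selection logic, and the IPv4 addresses are
hypothetical placeholders because the A lines in the debug output show
no addresses.

```python
import ipaddress
import random

# AAAA values copied from the debug output above.  The A values were not
# shown there, so these IPv4 addresses are hypothetical placeholders from
# the 192.0.2.0/24 documentation range.
NS_ADDRESSES = [
    "192.0.2.10",               # ns1.westmancom.com A (hypothetical)
    "2607:0:2:1:216:36:128:2",  # ns1.westmancom.com AAAA
    "192.0.2.11",               # ns2.westmancom.com A (hypothetical)
    "2607:0:2:1:216:36:128:3",  # ns2.westmancom.com AAAA
]


def reachable(addr: str) -> bool:
    """Model of these boxes: IPv4 works, IPv6 never does."""
    return ipaddress.ip_address(addr).version == 4


def failure_rate(addresses, ipv4_only: bool, trials: int = 10_000, seed: int = 1) -> float:
    """Fraction of lookups that fail when a server address is picked uniformly at random.

    ipv4_only=True mimics running named with -4: IPv6 candidates are never tried.
    """
    rng = random.Random(seed)
    candidates = [a for a in addresses
                  if not ipv4_only or ipaddress.ip_address(a).version == 4]
    failures = sum(not reachable(rng.choice(candidates)) for _ in range(trials))
    return failures / trials


print(failure_rate(NS_ADDRESSES, ipv4_only=False))  # roughly half
print(failure_rate(NS_ADDRESSES, ipv4_only=True))   # 0.0
```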

Why is the v6 failing on my boxes you ask?  Because I have it as "off"
as you can be in Fedora without messing up the whole OS (learned this
the hard way a long time ago):

# ip6tables -L -n -v
Chain INPUT (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
  239 12170 ACCEPT     all      lo     *       ::/0                 ::/0

Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
  239 12170 ACCEPT     all      *      lo      ::/0                 ::/0

On purpose.  Last I checked, my own and my customers' Shaw and MTS
connections (some residential!) didn't provide IPv6 ability/routing,
but then again, I haven't checked in forever.  If the ISPs I work with
don't support it, v6 is a completely moot point for me (no, I don't
want to tunnel).  So I always do my best to disable v6.  I don't want
to be on the bleeding edge of networking; I just want stuff to work and
be as secure as possible.  Translating my massive v4 iptables ruleset
into v6 is for a day I hope to postpone for a long time.
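
The ip6tables listing above boils down to "only loopback gets through."
A toy Python model of how a chain is evaluated (the first matching rule
with a terminating target like ACCEPT decides, otherwise the chain
policy applies) makes that explicit.  It is a simplification that
ignores protocols, addresses, and every match except the output
interface, and "eth0" is a hypothetical stand-in for whatever the real
uplink interface is called.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    target: str                      # e.g. "ACCEPT"
    out_iface: Optional[str] = None  # None plays the role of "*" (any interface)


# Rough transcription of the OUTPUT chain listed above: one ACCEPT rule
# for the loopback interface, and a chain policy of DROP.
OUTPUT_RULES: List[Rule] = [Rule("ACCEPT", out_iface="lo")]
OUTPUT_POLICY = "DROP"


def verdict(rules: List[Rule], policy: str, out_iface: str) -> str:
    """First matching rule decides; if no rule matches, the chain policy applies."""
    for rule in rules:
        if rule.out_iface is None or rule.out_iface == out_iface:
            return rule.target
    return policy


print(verdict(OUTPUT_RULES, OUTPUT_POLICY, "lo"))    # ACCEPT
print(verdict(OUTPUT_RULES, OUTPUT_POLICY, "eth0"))  # DROP
```

So any outbound v6 packet headed anywhere other than lo, including a
DNS query to a remote nameserver's AAAA address, falls through to the
DROP policy.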

I recall a long time back I had the same problem with squid, and had to
figure out how to de-v6 squid to get things to work.  I guess nearly
every daemon will need this at some point.

I wish the world would just pick a day where we switch off IPv4
completely and are all forced to use IPv6.  I really hate the current
hybrid approach.  I would hazard a guess that 95% of the home user / small
business world (where I dwell) will be using IPv4 for 95% of their
traffic for another 10 years.  They said 4 was dead 20 years ago and
here we are, still 95/95+.  Heck, forget 10 years, it may be forever!

The day I switch is the day that a) some "real life" important servers
go 6-only, and b) every server in the world supports 6.  On that day I
will change everything to 6 and disable 4.  If the day comes where (a)
occurs before (b) I will *not* be a happy camper, as that means we'll
all be forced into running hybrid.

Good luck to all the valiant early adopters of 6 trying to fight the
good fight, hopefully you'll make "switchover day" a much easier
battle.  Until then, OPTIONS="-4"!!  (And thanks IPv6 for wasting 3 more
of my evenings!)

More information about the Roundtable mailing list