[RndTbl] a weird DNS issue this morning
theodore at ciscodude.net
Fri Jan 13 12:48:57 CST 2017
Inspired by some of Trevor's recent git bisecting to track down issues, I'm
laying out the details of a problem I ran into recently.
I had a super, super weird DNS issue this morning.
I got notice that there were some new PowerDNS releases, both Recursor and
Authoritative Server. These were also noted in the Debian security
announcements, so I installed all of them on my various servers without
thinking too much.
Almost immediately after upgrading the ones in Winnipeg, I started getting
bombarded with notifications, mostly about one particular zone check
failing: ciscodude.net.
I started investigating my various local resolvers trying to nail down the
problem, and I noticed a lot of records missing all over the place, and
missing inconsistently too. An A record for something would be there but
not its AAAA. NS records would be missing or incomplete. I used the handy
`rec_control trace-regex <domain>` on the recursors. This started to show
very weird things, like my closest nameserver returning a different zone's
NS records when queried for NS records.
How could this even be possible, I thought?
I then started `dig`ing the same queries against the suspect nameserver,
and YES, indeed, it was returning NS records from a whole different zone!!!
So, of course, when the PowerDNS Recursor checked with the 2nd, 3rd and 4th
nameservers, they disagreed with what had been returned by the first one.
While the primary was serving random records, this caused DNS for
ciscodude.net to fail, along with some scattered fallout for other domains
that use the ciscodude.net nameserver records (most of my domains).
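The disagreement that tripped up the recursor can also be checked by hand: collect each nameserver's NS answer for the same query (for example with `dig` against each server) and see which server disagrees with the rest. Below is a minimal Python sketch of that comparison, assuming the answers have already been gathered. The function name and the nameserver names are hypothetical, and this is not how the PowerDNS Recursor itself decides which answers to trust.

```python
from collections import Counter


def find_inconsistent_servers(answers):
    """Return nameservers whose NS answer differs from the most common answer.

    `answers` maps a nameserver name to the set of NS targets it returned
    for the same query. (Hypothetical helper, for illustration only.)
    """
    if not answers:
        return []
    # frozenset makes each answer hashable so identical RRsets are counted together.
    counts = Counter(frozenset(rrset) for rrset in answers.values())
    majority, _ = counts.most_common(1)[0]
    # Any server whose answer is not the majority answer is flagged.
    return sorted(ns for ns, rrset in answers.items() if frozenset(rrset) != majority)


# Hypothetical answers: ns1 returns NS records belonging to a different zone.
good = {"ns1.example.net.", "ns2.example.net.", "ns3.example.net.", "ns4.example.net."}
answers = {
    "ns1.example.net.": {"ns1.other-zone.example.", "ns2.other-zone.example."},
    "ns2.example.net.": good,
    "ns3.example.net.": good,
    "ns4.example.net.": good,
}

print(find_inconsistent_servers(answers))  # ['ns1.example.net.']
```

A simple majority vote works here because three of the four servers agree; with only two servers, or an even split, "most common" is arbitrary and the output just tells you which servers to look at more closely.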
When I raised the issue on #powerdns IRC, it was quickly identified that
the configuration of 2 backends on this one server (MySQL + BIND files) was
likely causing the issue, and I was quickly pointed to a commit, and then a
second commit for 4.x, which likely introduced the bug. I was also pointed
to the package archives, where I was able to install the previous version
on that one server for now to get things back up and running.
GitHub was simultaneously having an HTTP issue of some sort, so I haven't
been able to compile the sources minus that particular commit yet to
confirm that the commit is indeed the problem.
Theodore Baschak - AS395089 - Hextet Systems
https://ciscodude.net/ - https://hextet.systems/