wiki'd

by JoKeru

Google DNS Down - 18 March 2016

If you're using the Google DNS servers 8.8.8.8 and 8.8.4.4, you might have an issue without knowing about it :)

Here is what happened:

  • I have some servers configured to use these DNS servers (and I've been using them for ages now)
  • I have some scripts fetching API data using cURL on Linux
  • these scripts send alarms in case of failure - monitoring is important!
    • and this morning my inbox was full of alarms

I quickly pinpointed the cause of the errors:

root@node-115:/# curl google.com -v
* getaddrinfo(3) failed for google.com:80
* Couldn't resolve host 'google.com'
* Closing connection #0
curl: (6) Couldn't resolve host 'google.com'

Hmmm, Google couldn't resolve google.
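A monitoring script can make the same check without shelling out to curl. The snippet below is only a sketch: `can_resolve` and its `lookup` parameter are names made up for illustration, and it assumes the script runs on the host whose resolver configuration you want to test. It goes through `getaddrinfo`, the same system call that failed in the curl output above.

```python
import socket


def can_resolve(host, port=80, lookup=socket.getaddrinfo):
    """Return True if the system resolver finds at least one address for host.

    `lookup` is a hypothetical hook so the check can be exercised with a fake
    resolver; by default it is the real socket.getaddrinfo.
    """
    try:
        return len(lookup(host, port)) > 0
    except socket.gaierror:
        # Name resolution failed, the same class of error curl reports
        # as exit code 6 ("Couldn't resolve host").
        return False
```

Calling `can_resolve("google.com")` from an alarm script separates "DNS is broken" from "the API is down", which would have pointed at the resolver right away.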

But a dig query against the same server worked:

root@node-115:/# dig google.com @8.8.4.4
;; Truncated, retrying in TCP mode.

; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> google.com @8.8.4.4  
;; global options: +cmd  
;; Got answer:  
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 717  
;; flags: qr rd ra; QUERY: 1, ANSWER: 11, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:  
;google.com. IN A

;; ANSWER SECTION:  
google.com. 299 IN A 173.194.123.72  
google.com. 299 IN A 173.194.123.78  
google.com. 299 IN A 173.194.123.67  
google.com. 299 IN A 173.194.123.69  
google.com. 299 IN A 173.194.123.68  
google.com. 299 IN A 173.194.123.64  
google.com. 299 IN A 173.194.123.66  
google.com. 299 IN A 173.194.123.70  
google.com. 299 IN A 173.194.123.73  
google.com. 299 IN A 173.194.123.65  
google.com. 299 IN A 173.194.123.71

;; Query time: 25 msec  
;; SERVER: 8.8.4.4#53(8.8.4.4)  
;; WHEN: Fri Mar 18 06:46:38 2016  
;; MSG SIZE rcvd: 204  

dig also revealed the issue (";; Truncated, retrying in TCP mode."), but I didn't notice it at the time.

Here is the tcpdump capture when running the dig command:

06:46:38.152851 IP x.x.x.x.33479 > 8.8.4.4.53: 59819+ A? google.com. (28)
06:46:38.176171 IP 8.8.4.4.53 > x.x.x.x.33479: 59819-| 0/0/0 (28)
06:46:38.176431 IP x.x.x.x.49300 > 8.8.4.4.53: Flags [S], seq 664740351, win 14600, options [mss 1460,sackOK,TS val 4082166890 ecr 0,nop,wscale 9], length 0
06:46:38.199615 IP 8.8.4.4.53 > x.x.x.x.49300: Flags [S.], seq 4111433272, ack 664740352, win 42540, options [mss 1430,sackOK,TS val 1427658912 ecr 4082166890,nop,wscale 7], length 0
06:46:38.199646 IP x.x.x.x.49300 > 8.8.4.4.53: Flags [.], ack 1, win 29, options [nop,nop,TS val 4082166895 ecr 1427658912], length 0
06:46:38.199753 IP x.x.x.x.49300 > 8.8.4.4.53: Flags [P.], seq 1:31, ack 1, win 29, options [nop,nop,TS val 4082166895 ecr 1427658912], length 30717+ A? google.com. (28)
06:46:38.222951 IP 8.8.4.4.53 > x.x.x.x.49300: Flags [.], ack 31, win 333, options [nop,nop,TS val 1427658935 ecr 4082166895], length 0
06:46:38.225278 IP 8.8.4.4.53 > x.x.x.x.49300: Flags [P.], seq 1:207, ack 31, win 333, options [nop,nop,TS val 1427658938 ecr 4082166895], length 206717 11/0/0 A 173.194.123.72, A 173.194.123.78, A 173.194.123.67, A 173.194.123.69, A 173.194.123.68, A 173.194.123.64, A 173.194.123.66, A 173.194.123.70, A 173.194.123.73, A 173.194.123.65, A 173.194.123.71 (204)
06:46:38.225302 IP x.x.x.x.49300 > 8.8.4.4.53: Flags [.], ack 207, win 31, options [nop,nop,TS val 4082166902 ecr 1427658938], length 0
06:46:38.225723 IP x.x.x.x.49300 > 8.8.4.4.53: Flags [F.], seq 31, ack 207, win 31, options [nop,nop,TS val 4082166902 ecr 1427658938], length 0
06:46:38.248930 IP 8.8.4.4.53 > x.x.x.x.49300: Flags [F.], seq 207, ack 32, win 333, options [nop,nop,TS val 1427658961 ecr 4082166902], length 0
06:46:38.248959 IP x.x.x.x.49300 > 8.8.4.4.53: Flags [.], ack 208, win 31, options [nop,nop,TS val 4082166908 ecr 1427658961], length 0

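The first 2 lines are the UDP query and its reply. The reply carries no records at all (0/0/0) and the `|` after the query id means the truncated (TC) flag is set, so dig retries the same query over TCP and gets the full answer.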
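To make the `59819-| 0/0/0` notation concrete, here is a small sketch that decodes the fixed 12-byte DNS header the way tcpdump summarizes it. The function name `parse_dns_header` and the sample bytes are made up for illustration; the sample is built by hand to mimic the reply in the capture, not taken from the wire.

```python
import struct


def parse_dns_header(message):
    """Decode the fixed 12-byte DNS header (RFC 1035, section 4.1.1)."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than its 12-byte header")
    ident, flags, questions, answers, authority, additional = struct.unpack(
        "!6H", message[:12]
    )
    return {
        "id": ident,
        "is_response": bool(flags & 0x8000),          # QR bit
        "truncated": bool(flags & 0x0200),            # TC bit, tcpdump prints "|"
        "recursion_desired": bool(flags & 0x0100),    # RD bit
        "recursion_available": bool(flags & 0x0080),  # RA bit, tcpdump prints "-" when clear
        "rcode": flags & 0x000F,
        "questions": questions,
        "answers": answers,        # first number in tcpdump's a/b/c
        "authority": authority,    # second number
        "additional": additional,  # third number
    }


# Hand-built header mimicking the capture: id 59819, QR + TC + RD set, RA clear.
sample = struct.pack("!6H", 59819, 0x8300, 1, 0, 0, 0)
info = parse_dns_header(sample)
print(info["truncated"], info["answers"])  # True 0
```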

And here is the tcpdump capture when running the curl command:

06:47:16.561123 IP x.x.x.x.56375 > 8.8.4.4.53: 52634+ A? google.com. (28)
06:47:16.561137 IP x.x.x.x.56375 > 8.8.4.4.53: 42403+ AAAA? google.com. (28)
06:47:16.584480 IP 8.8.4.4.53 > x.x.x.x.56375: 52634-| 0/0/0 (28)
06:47:16.584498 IP 8.8.4.4.53 > x.x.x.x.56375: 42403-| 0/0/0 (28)
06:47:16.584590 IP x.x.x.x.48556 > 8.8.4.4.53: 52634+ A? google.com. (28)
06:47:16.584606 IP x.x.x.x.48556 > 8.8.4.4.53: 42403+ AAAA? google.com. (28)
06:47:16.607671 IP 8.8.4.4.53 > x.x.x.x.48556: 52634-| 0/0/0 (28)
06:47:16.607731 IP 8.8.4.4.53 > x.x.x.x.48556: 42403-| 0/0/0 (28)
06:47:16.607781 IP x.x.x.x.55720 > 8.8.4.4.53: 54224+ A? google.com. (28)
06:47:16.607793 IP x.x.x.x.55720 > 8.8.4.4.53: 3857+ AAAA? google.com. (28)
06:47:16.630826 IP 8.8.4.4.53 > x.x.x.x.55720: 3857-| 0/0/0 (28)
06:47:16.630913 IP x.x.x.x.58704 > 8.8.4.4.53: 54224+ A? google.com. (28)
06:47:16.630928 IP x.x.x.x.58704 > 8.8.4.4.53: 3857+ AAAA? google.com. (28)
06:47:16.630945 IP 8.8.4.4.53 > x.x.x.x.55720: 54224-| 0/0/0 (28)
06:47:16.654146 IP 8.8.4.4.53 > x.x.x.x.58704: 3857-| 0/0/0 (28)
06:47:16.654164 IP 8.8.4.4.53 > x.x.x.x.58704: 54224-| 0/0/0 (28)
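Comparing the two captures, dig retried over TCP as soon as it saw the truncated reply, while the lookup behind curl just kept asking over UDP. The sketch below shows that fallback logic in a few lines. All function names are made up for illustration, the query id is arbitrary, and it makes no attempt to parse the answer records; it is not how dig or curl are implemented.

```python
import socket
import struct


def build_query(name, qtype=1, ident=0x1234):
    """Build a minimal DNS query: header, QNAME labels, QTYPE, QCLASS=IN."""
    header = struct.pack("!6H", ident, 0x0100, 1, 0, 0, 0)  # RD set, one question
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!2H", qtype, 1)


def is_truncated(message):
    """True when the TC bit (0x0200) is set in the header flags."""
    return bool(struct.unpack("!H", message[2:4])[0] & 0x0200)


def udp_exchange(query, server, timeout=3.0):
    """Send one query over UDP and return the raw reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        return sock.recvfrom(4096)[0]


def recv_exact(sock, count):
    """Read exactly `count` bytes from a stream socket."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        data += chunk
    return data


def tcp_exchange(query, server, timeout=3.0):
    """Send one query over TCP; each message has a 2-byte length prefix."""
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(struct.pack("!H", len(query)) + query)
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        return recv_exact(sock, length)


def query_with_fallback(name, server, udp=udp_exchange, tcp=tcp_exchange):
    """Ask over UDP first; retry the same query over TCP if the reply is truncated."""
    query = build_query(name)
    reply = udp(query, server)
    if is_truncated(reply):
        return tcp(query, server), "tcp"
    return reply, "udp"
```

For `google.com` the query built here is 28 bytes, the same size tcpdump shows as `(28)`, and the 2-byte TCP length prefix accounts for the 30-byte segment in the dig capture. Something like `query_with_fallback("google.com", "8.8.4.4")` would return the raw reply together with the transport that produced it.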

Conclusion:

  1. cURL resolves hostnames over UDP only; in the capture it never retried over TCP after the truncated replies
  2. Google's DNS servers, or at least the ones serving my geographic area (see traceroute below), are not answering queries over UDP: every UDP reply comes back truncated and empty
  3. switching to a different resolver, OpenDNS for example (208.67.222.222 and 208.67.220.220), fixed my problem until Google fixes theirs

root@node-115:/# traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
1 10.8.13.9 (10.8.13.9) 0.764 ms 1.326 ms 1.376 ms
2 10.8.34.253 (10.8.34.253) 0.625 ms 0.687 ms 0.687 ms
3 10.8.40.233 (10.8.40.233) 0.144 ms 10.8.40.229 (10.8.40.229) 0.187 ms 10.8.40.233 (10.8.40.233) 1.917 ms
4 10.8.25.197 (10.8.25.197) 0.212 ms 0.221 ms 0.257 ms
5 207.86.156.45 (207.86.156.45) 15.464 ms 207.86.157.49 (207.86.157.49) 0.286 ms 207.86.156.45 (207.86.156.45) 15.450 ms
6 216.156.1.126.ptr.us.xo.net (216.156.1.126) 13.294 ms 13.264 ms 216.156.0.253.ptr.us.xo.net (216.156.0.253) 15.731 ms
7 ae0d0.cir1.chicago2-il.us.xo.net (207.88.13.250) 13.134 ms 13.102 ms 12.952 ms
8 216.1.94.142 (216.1.94.142) 13.133 ms 13.267 ms ae0d0.cir1.chicago2-il.us.xo.net (207.88.13.250) 13.014 ms
9 216.1.94.142 (216.1.94.142) 13.259 ms 209.85.143.154 (209.85.143.154) 13.163 ms 209.85.143.186 (209.85.143.186) 13.102 ms
10 209.85.241.47 (209.85.241.47) 13.899 ms 209.85.254.128 (209.85.254.128) 13.873 ms 209.85.254.120 (209.85.254.120) 13.692 ms
11 72.14.233.68 (72.14.233.68) 13.449 ms 209.85.247.10 (209.85.247.10) 23.052 ms 22.934 ms
12 72.14.234.81 (72.14.234.81) 24.983 ms 216.239.47.45 (216.239.47.45) 23.377 ms 209.85.247.4 (209.85.247.4) 23.355 ms
13 216.239.49.25 (216.239.49.25) 23.135 ms 72.14.234.81 (72.14.234.81) 24.944 ms 216.239.43.219 (216.239.43.219) 23.540 ms
14 google-public-dns-a.google.com (8.8.8.8) 23.089 ms 23.088 ms *

Other people are complaining too: https://twitter.com/search?q=google%20dns
