Resolution on intermittent issues [resolved]

The source of the intermittent and transient DNS resolution issues that affected some of our clients Saturday and Monday has been determined and resolved as of late Tuesday evening (EST). We have been monitoring the matter closely since then (as is necessary when a problem is both transient AND intermittent) and appreciate your patience while we were making sure all was well.

Please contact helpme@easydns.com if you have any questions or concerns.

Again, we apologize for any inconvenience, and appreciate your patience.

 

Update on Intermittent DNS issues on legacy platform. [resolved]

We’ve made further adjustments to the system. While symptoms disappeared quite quickly, we continue to monitor the situation.

If, perchance, this recurs and you happen to be near a shell, please run this command:

dig +trace [domain]

and then email us those results. Including the output of a “traceroute” may help as well.

The data suggests that these issues are confined to the legacy platform. We will be posting at length about this in the days and weeks to come, as it has underscored the need to sunset the legacy system.
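For anyone who would rather capture everything in one go, here is a minimal sketch of a shell helper that saves both outputs to a single file you can attach to your email to helpme@easydns.com. The function name collect_dns_diag and the default output filename are made up for illustration, and running traceroute against the domain itself is an assumption on our part; adjust the target if support asks for something different.

```shell
#!/bin/sh
# Hypothetical helper (not an official easyDNS tool): gather the output of
# "dig +trace" and "traceroute" for one domain into a single text file.
collect_dns_diag() {
    domain="$1"
    # Default output filename is an assumption; pass a second argument to override.
    out="${2:-dns-diag-$domain.txt}"
    {
        echo "== domain: $domain"
        echo "== collected (UTC): $(date -u '+%Y-%m-%d %H:%M:%S')"
        echo "== dig +trace $domain"
        if command -v dig >/dev/null 2>&1; then
            dig +trace "$domain" 2>&1
        else
            echo "dig not found on this machine"
        fi
        echo "== traceroute $domain"
        if command -v traceroute >/dev/null 2>&1; then
            # Assumption: the domain itself is used as the traceroute target.
            traceroute "$domain" 2>&1
        else
            echo "traceroute not found on this machine"
        fi
    } > "$out"
    # Print the name of the file that was written.
    echo "$out"
}

# Example (replace example.com with the affected domain):
#   collect_dns_diag example.com
```

The UTC timestamp is recorded so the results can be lined up against our own logs, since the symptoms come and go.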

Domains on the new platform appear unaffected by these issues, so migrating your domains over to the new system is never a bad idea.

We are very sorry for these issues and are directing all efforts to preventing a recurrence.

More as it comes in.

Update: Intermittent issues appearing [resolved]

We’ve had some intermittent reports in the past short while about problems similar to the ones on Saturday.

All hands are on deck working on it, and we’ll update as soon as we have more info.

Our sincere apologies for any inconvenience, and thanks for the understanding.

Update 22:00 EST:

We’ve made further adjustments to the system. While symptoms disappeared quite quickly, we continue to monitor the situation.

If, perchance, this recurs and you happen to be near a shell, please run this command:

dig +trace [domain]

and then email us those results. Including the output of a “traceroute” may help as well.

The data suggests that these issues are confined to the legacy platform. We will be posting at length about this in the days and weeks to come, as it has underscored the need to sunset the legacy system.

Domains on the new platform appear unaffected by these issues, so migrating your domains over to the new system is never a bad idea.

We are very sorry for these issues and are directing all efforts to preventing a recurrence.

More as it comes in.

DNS Anomaly on Saturday – transient lookup issues [resolved]

On Saturday, July 9, between 5:30pm and 6:30pm EDT (21:30 – 22:30 UTC), we had a spike in users reporting response issues on some domains. Service was degraded for the better part of an hour for some domains or (more likely) some parts of the world. As much as we hate to use the “regional outages” line, that is what it looks like.

Most of the affected domains were either on the legacy system using the legacy nameserver delegation, or on the new system but not yet switched over to a nameserver delegation that adds the additional anycast constellation on the new platform.

I say most, because we have one report from a user who is fully on the new system who experienced issues.

We have had no reports from our Enterprise DNS users (if you are one and you had issues, please do let us know).

We initially believed this to be a general network issue (because we saw what looked like a corresponding spike in “* is down” reports on Twitter, etc. for unrelated domains around the same time). But as more data comes in, we are going to suck it up and say it: something weird happened, and we think it was here.

Our response:

1) We are still gathering data and we have identified ways to enhance our own internal monitoring capabilities so that we can cross reference what we see with what external monitoring applications are seeing.

2) We recently made a change in the way we adjust our BGP announcements for individual members of the anycast constellations to optimize response times. Since an incident like this has never happened before and we enacted the change recently, we suspect it may be at the core of the problem, and we have rolled it back.

3) We were slow to post this announcement, and for that I personally apologize. When the event occurred, many of the systems group conferred about the incident on Saturday night. We suspected a wider network issue, but still decided to roll back the new BGP programs just to be on the safe side. It was only after we saw more reports roll in from users today that we had to rethink our stance and accept that this was most likely our problem and not a network flap or outage.

We sincerely apologize to all who were affected, and for the delay in this posting.

And now for something less confusing…

One of the more common criticisms of the new interface that we’ve been hearing from our members is that the Domain Overview module was “too busy”, with too much stuff making it hard to find what you’re looking for.