Unable to log in to Clickfunnels platform
Incident Report for ClickFunnels
Postmortem

On April 15th, from approximately 2:00 am to 7:30 am Eastern time, many users were unable to sign in to the Clickfunnels platform. This was caused by a configuration issue in a system we use to provide secure login sessions for Clickfunnels users. We know it's important for you to be able to access Clickfunnels at any time of the day or night, so we want to provide more details about what happened and what we're doing to prevent it from happening again.

On April 11th, we upgraded the system that helps provide secure logins for Clickfunnels users. The upgrade involved moving functionality from one data cluster to another. While the overall move took place without any issues or impact, one configuration value from the old cluster was not correctly set on the new cluster. Unfortunately, the impact of this misconfiguration was delayed by several days because of the way that specific configuration value works within the cluster. That value helps control the rate at which we expire older data, and with it set incorrectly, the new cluster slowly filled up. This wouldn't normally be a problem, since we have designed the system to handle a full cluster, but your login sessions are handled differently because they are sensitive. Once the new cluster was "full", the Clickfunnels platform was unable to store the data we need to ensure your login is secure, and this prevented you from using the application. We'll always prioritize the security of your accounts, and this was an unexpected side effect of the system we use to do that.
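To make the delayed failure easier to picture, here is a minimal sketch in Python of a fixed-size store with and without an expiry setting. It is a toy model, not the actual cluster or its configuration: the names `BoundedStore`, `StoreFullError`, `default_ttl`, and `first_failed_login` are all hypothetical, and the "refuse the write when full" policy is an assumption standing in for the way login sessions are described as being handled differently.

```python
class StoreFullError(Exception):
    """Raised when a write is refused because the store has no room left."""


class BoundedStore:
    """Toy fixed-capacity key-value store (hypothetical, for illustration only)."""

    def __init__(self, capacity, default_ttl=None):
        self.capacity = capacity
        self.default_ttl = default_ttl  # None means entries never expire
        self._data = {}  # key -> (value, expires_at or None)

    def _purge_expired(self, now):
        # Drop every entry whose expiry time has passed.
        expired = [key for key, (_, expires_at) in self._data.items()
                   if expires_at is not None and expires_at <= now]
        for key in expired:
            del self._data[key]

    def put(self, key, value, now):
        self._purge_expired(now)
        if key not in self._data and len(self._data) >= self.capacity:
            # Assumed policy for sensitive data: refuse the write rather
            # than silently evicting another live session.
            raise StoreFullError(f"no room for {key!r}")
        expires_at = None if self.default_ttl is None else now + self.default_ttl
        self._data[key] = (value, expires_at)


def first_failed_login(store, logins):
    """Return the index of the first login whose session cannot be stored, or None."""
    for i in range(logins):
        try:
            store.put(f"session-{i}", "token", now=i)
        except StoreFullError:
            return i
    return None


# Expiry configured: old sessions age out, so the store never fills.
healthy = first_failed_login(BoundedStore(capacity=5, default_ttl=3), logins=20)
# Expiry setting missing: nothing ages out, and writes only start failing later.
misconfigured = first_failed_login(BoundedStore(capacity=5), logins=20)
print(healthy, misconfigured)
```

In this sketch the misconfigured store accepts the first several logins without complaint and only refuses writes once it reaches capacity, which mirrors the gap between the April 11th upgrade and the April 15th sign-in failures.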

Beyond the misconfiguration that caused the problem, we also took too long to respond to this issue. Because the issue occurred in the late-night / early-morning hours, and it was related to a very small amount of our overall traffic, our normal monitoring and alerting did not detect it. Our customer support team was aware of the issue early on based on your reports, but they did not have an efficient way to notify the engineering team. Once the engineering team was actively working on the issue, it was resolved in approximately 40 minutes.

So, here's what we are doing to prevent this from happening again, and to ensure issues don't take as long to resolve when automated alerting doesn't notify our team.

First, we've updated our internal documentation to ensure this configuration value is set properly on any clusters it applies to. As an additional check, we've also worked with the company that provides these clusters to us to ensure their documentation also reflects the importance of setting this configuration value correctly for our needs.

Second, we're actively developing new monitoring tools that will regularly perform many of the actions you take when interacting with the Clickfunnels platform. These monitors will log in, look at statistics, and carry out other common actions, and they will generate alerts when there are problems. This will be an ongoing effort to make sure we don't miss critical functionality we know you rely on.
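As a rough illustration of this kind of monitor, the Python sketch below runs a list of scripted user actions and turns any failure into an alert. It is only a sketch under stated assumptions: `run_checks`, `alerts_for`, `log_in`, and `view_statistics` are hypothetical names, the two actions are stand-ins rather than real calls to the Clickfunnels platform, and the alert here is just a printed string.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    ok: bool
    detail: str = ""


def run_checks(checks):
    """Run each (name, callable) pair; a check passes if it returns without raising."""
    results = []
    for name, check in checks:
        try:
            check()
            results.append(CheckResult(name, True))
        except Exception as exc:  # any failure of a user action counts
            results.append(CheckResult(name, False, str(exc)))
    return results


def alerts_for(results):
    """Build one alert message per failed check."""
    return [f"ALERT: {r.name} failed: {r.detail}" for r in results if not r.ok]


# Hypothetical stand-ins for real user actions.
def log_in():
    # Simulates the failure mode from this incident.
    raise RuntimeError("session store rejected write")


def view_statistics():
    return {"visits": 0}


results = run_checks([("log_in", log_in), ("view_statistics", view_statistics)])
alerts = alerts_for(results)
for line in alerts:
    print(line)
```

Because a check like this exercises the sign-in path directly on a schedule, it does not depend on how much real traffic is affected, so it can raise an alert even during low-traffic hours.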

Finally, we've implemented a new process to enable our customer support team to reach out directly to our on-call engineers. We do our best to monitor and alert on every aspect of the Clickfunnels platform automatically, but sometimes that's not enough or we miss the mark. This new process helps to minimize the time involved in resolving a problem that isn't automatically detected.

We take your ability to use Clickfunnels incredibly seriously, and we know this incident wasn't a great experience for you. We're actively working to prevent it from happening again, along with improving our ability to respond to future issues of any kind in the timely manner you have come to expect from us. Thanks for your continued use of Clickfunnels, and your understanding as we strive to make your experience the best it can be.

Posted Apr 16, 2019 - 17:59 MDT

Resolved
Please see the Postmortem for details of this incident.
Posted Apr 15, 2019 - 05:30 MDT