The self-service unpause is brilliant. The worst thing about hitting these sorts of limits is that time window when you think you've fixed the problem but you can't check because you're throttled - so there's nothing you can do but wait. Giving literally any affordance so that a human can make progress with a fix removes this huge source of frustration.
I really appreciate the thoughtful and non-punitive approach, and intend to add your self-service-unpause approach to my own arsenal of tricks.
Happy to be running Caddy on a growing number of servers instead of renewing certs through certbot. Caddy has really good defaults and does the right thing with TLS certs without much hassle. Fewer moving parts too.
My server got its renewal halted. I rolled my own wrapper for certbot. Idk, it's just a blog, I'm not that attached. It hit some rock a few months ago; I just retried and manually installed the cert, and it seems to have perked back up and continued receiving certs. It probably would have been more frustrating if it were a huge fleet, but it wasn't even worth my time to check logs and figure out what precisely happened (cert distributed with a modified that didn't match the ASN.1 expiry? transient issuance failure? issues the same cert? ...who knows).
As they have the account email, they could also notify of the issue by email when there are too many issues renewing for too long.
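To make the "too many issues renewing for too long" idea concrete, here's a rough sketch of what such a check might look like. Everything in it is hypothetical: the `RenewalFailure` record, the `should_notify` name, and the thresholds (10 failures, 30 days) are made up for illustration and are not how Let's Encrypt actually does it. It also assumes the list only holds failures since the last successful renewal.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RenewalFailure:
    # Hypothetical record of one failed renewal attempt.
    domain: str
    failed_at: datetime


def should_notify(failures, now, min_failures=10, min_span=timedelta(days=30)):
    """Return True when failures are both numerous and long-running.

    Thresholds are illustrative assumptions, not real policy values.
    `failures` is assumed to contain only failures since the last success.
    """
    if len(failures) < min_failures:
        return False
    # Age of the oldest failure in the current failing streak.
    oldest = min(f.failed_at for f in failures)
    return now - oldest >= min_span
```

Requiring both conditions means a short burst of retries from a broken deploy wouldn't trigger an email, and neither would a single stale failure from months ago.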
I highly appreciate their saintlike patience with my buggy cron jobs and snippy requests.
Thanks for all the work that goes into this crucial service!
Both the 3% and the "3,200 people manually unpaused issuance" figures seem much higher than I expected and are no cause for celebration, especially at this scale.
Are there no better patterns to be exploited to identify 'zombies'? Running experiments with blocking and then unblocking to validate should work here.
I guess this falls into the bucket of: sure we can do that, given sufficient time and resources
I'm kinda surprised they bothered. It has only caught 100,000 out of the 600,000,000 domains they handle?
Does the Unpause button have a CAPTCHA? It's only a matter of time before software tries to auto-unpause whenever there's a failure... and the cycle repeats. A CAPTCHA on the button should at least discourage software devs from automating the unpause.
I’m curious whether they could send emails to accounts warning that they plan to shut off their access.