Overnight, our networking dept patched some systems, which unexpectedly left a connected system unable to work. That system was our alerting layer, which didn't/couldn't send out the alerts (phone calls, Teams messages, emails, etc.) that alerting itself wasn't working.
This morning when networking came in, they saw the issue (our backup alerting system had been sending emails all night long).
Instead of "Oh no, maybe we should have a process in place to verify patching X systems doesn't degrade Y systems", the various teams are dog-piling on alerting (my responsibility). VPs are now getting involved. They are saying things like "There should have been a monitoring system to monitor the alerting!!!". Which there is, the email back up alerting. Must be a dozens of messages in the team chat all pointing the finger that 'alerting should have worked', even though *those server clusters were all down*. My boss tried to chime in with common sense saying "If our infrastructure team can't guarantee 100% uptime on the clusters, then this will happen again. The issue happened once in the 5+ years we've been using this framework. We can spend time and money creating yet another monitoring system, which could fail too, or accept the reality that sometimes things break. We fix it and do what is reasonable so the issue doesn't happen again. In my opinion, paying for another solution isn't feasible in this situation."
Team chat is silent right now, but my spidey sense is tingling.
rant
mmq