Guidelines for #FT-Tech-Incidents and the New “Incident Response” Slackbot

Overview

The #FT-Tech-Incidents Slack channel is used for managing service-affecting incidents and for notifying people about changes and releases likely to generate alerts or impact service. It is monitored by Operations Support 24/7.

Posting in this channel usually prompts us to start putting a layer of management around the incident, such as posting a message on Status Page, pulling in people to help, and creating an incident using the new “Incident Response” Slackbot.

FT-Tech-Incidents should be periodically updated with key events during the lifecycle of the incident.

FT-Tech-Incidents: Channel Purpose

FT-Tech-Incidents should be the place where anyone can find information about incidents affecting live production services.

If something is broken, affects people, and is important, we expect it to feature in this channel.

Use of Separate Team and Product Channels

Channels dedicated to particular products, services, or teams are great, but they create a siloed approach to managing and tracking incidents. We therefore suggest that they are not used for managing incidents.

Instead, please post details in FT-Tech-Incidents and consider using our new “Incident Response” Slackbot to fully manage the response – further guidance here.

The “Incident Response” Slackbot will post details of the incident into FT-Tech-Incidents and, if requested, create a separate comms channel for you. Interested parties can be added as needed, the same as with any other channel.

Typing the “/incident” command in any channel will present you with a short form to complete, capturing key details about the incident.

Rules of Engagement

· General: If reporting an incident, please be clear and concise. The new “Incident Response” Slackbot will guide you on the sorts of information that we find useful. Highlight the affected service, the business impact, and the activities being undertaken to resolve the issue.

· For changes or releases: Use the change or release number and a clear description of the work and likely impact (e.g. CR0025000 – Carrying out failover of UPP Cluster – Ignore any related alerts saying the EU is unavailable).
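As a rough illustration of the change and release format above, the Python sketch below assembles a notice from its three parts. The `ChangeNotice` class and its field names are hypothetical, chosen only for this example. It is not part of the “Incident Response” Slackbot and it does not post anything to Slack; it only builds the text you would paste into the channel.

```python
from dataclasses import dataclass


@dataclass
class ChangeNotice:
    """Hypothetical container for a change or release announcement."""

    reference: str    # change or release number, e.g. "CR0025000"
    description: str  # the work being carried out
    impact: str       # likely impact, including any alerts that can be ignored

    def to_message(self) -> str:
        # Collapse stray whitespace in each part so the post reads cleanly.
        parts = [self.reference, self.description, self.impact]
        cleaned = [" ".join(part.split()) for part in parts]
        # Refuse to build a notice with a missing part, since an ambiguous
        # post is harder for Operations Support to act on.
        if not all(cleaned):
            raise ValueError("reference, description and impact are all required")
        return " – ".join(cleaned)


notice = ChangeNotice(
    reference="CR0025000",
    description="Carrying out failover of UPP Cluster",
    impact="Ignore any related alerts saying the EU is unavailable",
)
print(notice.to_message())
```

Requiring all three parts is a deliberate choice in this sketch: a notice without a stated impact leaves readers guessing which alerts are expected.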

Expected Response

Operations Support will respond to issues posted in this channel so the information must be clear.

They will use the information provided to decide what to do next, so please avoid open-ended posts and ambiguity.

Contacting us outside of FT-Tech-Incidents:

Telephone: 4500

Email: Operations.Support@ft.com

Slack Team: @ops-support

Slack (Non-Urgent): #Ops-Team-Chat