On September 19, 2019 at 11:33 AM PST, POS Portal platforms experienced an outage that disrupted services. Service was restored at 12:06 PM PST. The outage was caused by a long-running database script, which blocked other database operations and left the application unresponsive.
When the outage started, the Corporate Infrastructure team was immediately alerted to the issue by the application monitoring platform and began taking corrective action.
The defined procedures for communicating outages and severity were not followed during the outage.
After reviewing the incident, the following steps will be taken:
1. POS Portal management will conduct a training session and update documentation regarding communication procedures during and after an incident, including posting updates to POS Portal’s System Status page.
2. POS Portal will review the transaction that triggered the long-running database script and assess risk mitigation or avoidance strategies. The transaction in question has been traced to a rare order scenario, and POS Portal will make changes to ensure this scenario is handled efficiently.
Outage Window: 11:33 AM PST to 12:06 PM PST; total duration of 33 minutes
Services Impacted: Clients utilizing Portal Access, Merchant Control Center, POS Portal APIs, and ecommerce sites.
Root Cause: Long-running database script