Adam Judd
Managing Director

88% of online consumers are less likely to return to a site after a bad experience

How many times has some aspect of your website stopped working, and you had no idea how long it had been broken? Ensuring your customers can access your website and services without any technical difficulties is essential for maintaining business continuity.

Measuring website uptime is the first step towards achieving this. An external service regularly pings the website (typically the home page) and sends an alert if the website is down.

Monitor Uptime

This type of monitoring will alert you to catastrophic failures (datacenter outages, server failures, or site-wide errors). Once alerted, you can investigate and address the root cause of the fault.

There are many of these services available, some of which have a free plan. Uptime Robot is one such service. If you don’t currently have any monitoring in place, this is an easy first step; you can have monitoring set up in minutes.
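To make the idea concrete, here is a minimal sketch of the kind of check an uptime service performs. It is an illustration only, not how any particular service is implemented; the function name `check_uptime` and the 10-second timeout are assumptions.

```python
import urllib.error
import urllib.request


def check_uptime(url: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Return (is_up, detail) for a single HTTP GET against url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            return 200 <= status < 400, f"HTTP {status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx or 5xx).
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout, TLS error, and so on.
        return False, f"unreachable: {exc}"
```

A scheduler would call this every few minutes and send an alert whenever the first value is `False`. Running the check from outside your own infrastructure matters, because a check hosted on the same server goes down with it.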

However, issues with websites can be less immediately apparent than a catastrophic failure. If your website has business-critical workflows that require multi-step user actions, a problem may occur within the website or a connected business process while the home page continues to load normally. A simple uptime monitoring service will not alert you to this type of problem.

Modern transactional websites typically rely on multiple third-party services or microservices to provide the best experience. Each additional service creates more potential points of failure:

  • The external service may have an outage
  • The license may have expired
  • A payment for the service may not have been taken because a credit card expired
  • There may be a network connectivity issue

Website Interdependencies

I am taking the example of an e-commerce store to demonstrate the number of interconnected services, but this applies to any modern, non-trivial frontend application. Your website will likely depend on at least some of the following:

  • Web Server Hosting
  • Database
  • Domain Name Hosting / DNS
  • SSL certificate
  • Load balancer
  • CDN
  • Login and Registration / User Management
  • SSO / Multi-factor authentication
  • Backend API
  • Middleware
  • Payment Gateway / 3D Secure
  • DAM - Digital Asset Management
  • Address Lookup Service
  • Product AI recommendation
  • ERP Integration
  • CRM Integration
  • Enterprise Search
  • Mail server
  • Analytics and Reporting
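
Some of these dependencies can be checked automatically before they fail. As one illustration, the sketch below reports how many days remain on a site's SSL certificate, so that an alert can be raised well before it expires. The function names and the default port and timeout are assumptions made for this example.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Convert a certificate 'notAfter' string into the number of days remaining."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def certificate_days_left(hostname: str, port: int = 443, timeout: float = 10.0) -> float:
    """Fetch the server certificate for hostname and return the days until it expires."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        # An already expired certificate fails verification here and raises
        # ssl.SSLCertVerificationError, which should also be treated as an alert.
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    return days_until_expiry(cert["notAfter"])
```

A scheduled job could call `certificate_days_left("www.example.com")` once a day and alert when the result drops below a threshold such as 14 days.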

 

Interconnected Ecommerce Store

 

Without regularly manually testing these processes, you will be reliant on your customers alerting you to any problems on your website. If the failure point is for a backend process, this may not be noticed for a while.

If your digital presence extends beyond a single website, for example to multiple websites, mobile apps or IoT devices all connected to business-critical processes, then regular manual testing will not be practical.

The issue may have been in place for a long time before you become aware of it, potentially resulting in lost revenue and a damaged reputation when customers are unable to make purchases.

If a backend process has silently failed, then the effort to manually correct the affected data may be considerable.

Additionally, an issue reported via customer services may not include all the details that a developer or system administrator needs to diagnose it.

The issue may only occur under a particular set of circumstances, or may be sporadic, so further investigation may be required before there is enough information to replicate, diagnose and resolve it.

How to Actively Monitor

Firstly, you need to define all of the business-critical user journeys that you want to monitor, prioritising the configuration of the most critical.

Then assemble all the information required to complete each journey (test user login, application details, test payment information).

Next, using Selenium, a browser automation framework widely used for frontend testing, a developer will write scripts that automatically complete an end-to-end test of each journey that a user would take.
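
As a rough sketch of what such a script can look like, the example below logs in with a test account using Selenium's Python bindings. Everything site-specific here is hypothetical: the `/login` path, the element IDs (`email`, `password`, `account-menu`) and the function names are placeholders that would need to match your own site. Selenium is imported lazily so that the journey logic can be exercised without a browser installed.

```python
# Locator strategies; these strings are the values of selenium's By.ID and By.CSS_SELECTOR.
BY_ID = "id"
BY_CSS = "css selector"


def make_driver():
    """Create a headless Chrome driver (requires the selenium package and Chrome)."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    driver.implicitly_wait(10)  # wait up to 10 seconds for elements to appear
    return driver


def run_login_journey(driver, base_url: str, username: str, password: str) -> None:
    """Log in with a test account; an exception from any step means the journey failed."""
    driver.get(f"{base_url}/login")
    driver.find_element(BY_ID, "email").send_keys(username)
    driver.find_element(BY_ID, "password").send_keys(password)
    driver.find_element(BY_CSS, "button[type='submit']").click()
    # Hypothetical success marker: an element only shown to logged-in users.
    # With a real driver, find_element raises NoSuchElementException if it is absent.
    driver.find_element(BY_ID, "account-menu")
```

In use, a script would call `driver = make_driver()`, run the journey inside `try` / `finally`, and call `driver.quit()` at the end so that browser processes do not accumulate between runs.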

We use a service called Site24x7 by Zoho to schedule the running of these scripts. The scripts are run in an actual browser, enabling:

  • Multi-device testing
  • Multi-browser tests (Chrome, Safari, IE)
  • Multi-lingual regional testing

These scripts are run as often as required, as long as they don’t harm website performance.

The error alerts should contain as much information about the issue as possible (failure point, error messages, screenshots) for a developer to resolve it. The alert could even automatically raise a support ticket on your agency's support desk.
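
One possible way to gather that information is sketched below: each journey is run as a list of named steps, and the first failure is turned into a structured alert. The function name `run_with_alert` and the field names in the alert are assumptions for illustration, not the format used by any particular monitoring service.

```python
import traceback
from datetime import datetime, timezone


def run_with_alert(journey_name, steps, capture_screenshot=None):
    """Run (step_name, callable) pairs in order.

    Returns None if every step succeeds, or an alert dict describing the
    first step that raised an exception.
    """
    for step_name, action in steps:
        try:
            action()
        except Exception as exc:
            return {
                "journey": journey_name,
                "failed_step": step_name,  # the failure point
                "error": f"{type(exc).__name__}: {exc}",
                "traceback": traceback.format_exc(),
                # Optional callable that saves a screenshot and returns its path.
                "screenshot": capture_screenshot() if capture_screenshot else None,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }
    return None
```

With Selenium, `capture_screenshot` could wrap `driver.save_screenshot(path)` and return the path. The resulting dictionary can be serialised to JSON and sent by email, chat webhook, or a support desk API to raise the ticket.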

Other things to consider are excluding these tests from reporting (for example, filtering them out of Google Analytics) and ensuring that the test account used to make test purchases is excluded from real business processes.
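
As a small illustration of the second point, downstream processes such as fulfilment can filter out orders placed by the monitoring account. The account address, the `customer_email` field and the function names below are all hypothetical.

```python
# Hypothetical: email addresses reserved for the monitoring scripts.
MONITORING_ACCOUNTS = frozenset({"synthetic-monitor@example.com"})


def is_monitoring_order(order: dict) -> bool:
    """Return True if the order was placed by a monitoring test account."""
    email = order.get("customer_email", "").strip().lower()
    return email in MONITORING_ACCOUNTS


def orders_for_fulfilment(orders: list) -> list:
    """Drop test purchases so they never reach real business processes."""
    return [order for order in orders if not is_monitoring_order(order)]
```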

The scripts don’t just have to check for errors; alerts could also be sent for slow-running processes.
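
A minimal sketch of such a check, assuming each step of a journey is a callable and that the acceptable duration is agreed per step (the function name and threshold are illustrative):

```python
import time


def time_step(action, threshold_seconds: float, clock=time.perf_counter):
    """Run action and return (elapsed_seconds, is_slow)."""
    start = clock()
    action()
    elapsed = clock() - start
    return elapsed, elapsed > threshold_seconds
```

For example, `time_step(submit_payment, threshold_seconds=5.0)` would flag a hypothetical `submit_payment` step that succeeds but takes longer than five seconds, which can then be reported as a warning rather than an outage.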

Support contract with Active Monitoring

The most resilient approach is to have active monitoring set up and, additionally, a support contract with your digital agency to resolve issues as soon as they are detected. SLAs can be defined for issues depending on their severity.

This results in problems being fixed before you (and your customers) are aware of them, and helps ensure that your digital service remains resilient.