How many times has some aspect of your website stopped working without you knowing how long it had been broken? Ensuring your customers can access your website and services without any technical difficulties is essential for maintaining business continuity.
Measuring website uptime is the first step towards achieving this. An external service regularly pings the website (typically the home page) and sends an alert if the website is down.
This type of monitoring will alert you to catastrophic failures (datacenter outages, server failures, or site-wide errors). Once you have been alerted, the root cause of the fault can be addressed.
There are many of these services available, some of which have a free plan. Uptime Robot is one such service. If you don’t currently have any monitoring in place, this is an easy first step; you can have monitoring set up in minutes.
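To make the idea concrete, here is a minimal sketch of what a single uptime check does, using only the Python standard library. It is not how Uptime Robot or any other service is implemented; the names `UptimeResult` and `check_uptime` are made up for this illustration, and the `opener` parameter exists only so the check can be exercised without a real network call.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UptimeResult:
    url: str
    up: bool
    status: Optional[int]  # HTTP status code, or None if there was no answer
    detail: str


def check_uptime(url: str, timeout: float = 10.0,
                 opener: Callable = urllib.request.urlopen) -> UptimeResult:
    """Request the page once and report whether it answered successfully."""
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
            return UptimeResult(url, 200 <= status < 400, status, "ok")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (for example 500 or 503).
        return UptimeResult(url, False, exc.code, f"HTTP error: {exc.reason}")
    except OSError as exc:
        # No usable answer at all: DNS failure, refused connection or timeout.
        # (urllib.error.URLError and TimeoutError are both subclasses of OSError.)
        return UptimeResult(url, False, None, f"unreachable: {exc}")
```

A hosted service runs a check like this every few minutes from several locations and sends an alert when `up` is false. Running `check_uptime("https://www.example.com/")` from a scheduled job would be a rough home-made equivalent.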
However, issues with websites can be less immediately apparent than a catastrophic failure. If your website has business-critical workflows that require multi-step user actions, a problem may occur within the website or a connected business process, and a simple uptime monitoring service will not alert you to it.
Modern transactional websites typically rely on multiple third-party services or microservices to provide the best experience. Each of these layers of services creates more potential points of failure.
I am using an e-commerce store as an example to demonstrate the number of interconnected services, but this applies to any modern, non-trivial frontend application. Your website will likely depend on at least some of the following:
Without regular manual testing of these processes, you will be reliant on your customers alerting you to any problems on your website. If the failure point is in a backend process, it may not be noticed for a while.
If your digital presence extends beyond a single website (for example, multiple websites, mobile apps or IoT devices all connected to business-critical processes), then regular manual testing will not be practical.
The issue may have been in place for a long time before you become aware of it, potentially resulting in lost revenue and a damaged reputation because customers were unable to make purchases.
If a backend process has silently failed, then the effort to manually resolve the data may be considerable.
Additionally, an issue reported via customer services may not include all the details that a developer or system administrator needs to diagnose it.
The issue may only occur under a particular set of circumstances or may be sporadic, so further investigation may be required to gather enough information to replicate it before diagnosis and resolution can occur.
Firstly, you need to define all of the business-critical user journeys that you want to monitor, prioritising the configuration of the most critical.
Then assemble all the information required to complete each journey (test user login, application detail, test payment information).
Next, using Selenium, a browser automation framework widely used for frontend testing, a developer writes scripts that automatically complete an end-to-end test of each journey, just as a user would.
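As an illustration, a scripted journey for a hypothetical store could look something like the sketch below, written against Selenium's Python bindings. Everything specific in it is an assumption: the `shop.example.com` URL, the CSS selectors, the test credentials and the step names are placeholders, and a monitoring service may expect scripts in its own recorder or script format rather than standalone Python.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical values: replace with your own site, selectors and test account.
BASE_URL = "https://shop.example.com"
TEST_EMAIL = "monitoring@example.com"
TEST_PASSWORD = "not-a-real-password"
CSS = "css selector"  # the same string as selenium's By.CSS_SELECTOR


def sign_in(driver) -> None:
    driver.get(BASE_URL + "/login")
    driver.find_element(CSS, "#email").send_keys(TEST_EMAIL)
    driver.find_element(CSS, "#password").send_keys(TEST_PASSWORD)
    driver.find_element(CSS, "#sign-in").click()


def add_to_basket(driver) -> None:
    driver.get(BASE_URL + "/products/test-product")
    driver.find_element(CSS, "#add-to-basket").click()


def place_order(driver) -> None:
    driver.get(BASE_URL + "/checkout")
    driver.find_element(CSS, "#place-order").click()
    # Finding the confirmation element is the success check for the journey.
    driver.find_element(CSS, "#order-confirmation")


CHECKOUT_JOURNEY: List[Callable] = [sign_in, add_to_basket, place_order]


@dataclass
class JourneyResult:
    passed: bool
    failed_step: Optional[str] = None
    error: Optional[str] = None


def run_journey(driver, steps: List[Callable]) -> JourneyResult:
    """Run each step in order and stop at the first one that raises."""
    for step in steps:
        try:
            step(driver)
        except Exception as exc:  # any failure ends the journey
            return JourneyResult(False, step.__name__, f"{type(exc).__name__}: {exc}")
    return JourneyResult(True)


def make_driver():
    """Start headless Chrome; needs the selenium package and Chrome installed."""
    from selenium import webdriver  # imported lazily so the rest has no dependency

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    driver.implicitly_wait(10)  # wait up to 10 seconds for elements to appear
    return driver
```

To run it, create a driver with `make_driver()`, pass it to `run_journey(driver, CHECKOUT_JOURNEY)` and call `driver.quit()` afterwards. Keeping each step as a small named function means a failure can be reported by step name rather than as a bare stack trace.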
We use a service called Site24x7 by Zoho to schedule the running of these scripts. The scripts are run in an actual browser.
They can run as often as required, as long as they don’t harm website performance.
The error alerts should contain as much information about the issue as possible (failure point, error messages, screenshots) so that a developer can resolve it. The alert could even automatically raise a support ticket on your agency's support desk.
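As a rough sketch of what such an alert could carry, the snippet below bundles the failure point, the error message, a traceback and a screenshot path into one structure, then formats it as a ticket. The field names, the `build_alert` and `format_ticket` helpers and the ticket layout are all assumptions for illustration; a real support desk will have its own API and fields. With Selenium, the screenshot itself could be captured with `driver.save_screenshot(path)` at the moment of failure.

```python
import json
import traceback
from datetime import datetime, timezone
from typing import Optional


def build_alert(journey: str, failed_step: str, exc: BaseException,
                screenshot_path: Optional[str] = None) -> dict:
    """Collect everything a developer needs to start diagnosing the failure."""
    return {
        "journey": journey,
        "failed_step": failed_step,
        "error_type": type(exc).__name__,
        "error_message": str(exc),
        "traceback": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
        "screenshot": screenshot_path,
        "detected_at": datetime.now(timezone.utc).isoformat(),
    }


def format_ticket(alert: dict) -> dict:
    """Turn an alert into a subject and body for a (hypothetical) support desk."""
    return {
        "subject": f"[monitoring] {alert['journey']} failed at {alert['failed_step']}",
        "body": json.dumps(alert, indent=2),
    }
```

The ticket body is plain JSON so that it can be posted to a webhook or pasted into an email without further processing.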
Other things to consider are excluding these tests from reporting (for example, filtering them out of Google Analytics) and ensuring that the test account used to make the test purchases is excluded from real business processes.
The scripts don’t just have to check for errors; alerts could also be sent for slow-running processes.
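One way to sketch this is to time each step of a journey and compare the durations against thresholds. The helper names, the step names and the threshold values below are illustrative assumptions, not recommended figures; the `clock` parameter is there only so the timing can be checked deterministically.

```python
import time
from typing import Callable, Dict, List


def time_steps(steps: Dict[str, Callable[[], None]],
               clock: Callable[[], float] = time.perf_counter) -> Dict[str, float]:
    """Run each step and record how long it took, in seconds."""
    durations = {}
    for name, step in steps.items():
        started = clock()
        step()
        durations[name] = clock() - started
    return durations


def slow_steps(durations: Dict[str, float], thresholds: Dict[str, float],
               default_threshold: float = 5.0) -> List[str]:
    """Return the names of steps that took longer than their threshold."""
    return [
        name for name, seconds in durations.items()
        if seconds > thresholds.get(name, default_threshold)
    ]
```

A step that completes successfully but appears in the result of `slow_steps` can then trigger a lower-priority alert than an outright failure.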
The most resilient approach is to have active monitoring set up and, in addition, a support contract with your digital agency to resolve issues as soon as they are detected. SLAs can be agreed for issues depending on their severity.
As a result, problems are fixed before you (and your customers) are aware of them, and your digital service remains resilient.