What is the thing you least want to see when making a complicated transaction on the internet or intranet? My guess is that it’s a pop-up window saying “Please wait” and an hourglass turning. Is that because the systems are slow, because you’ve lost your network connection, or because your credit card details are in the process of being transferred to the Russian mafia? Whatever the irritation to an end user might be, the serious fact is that these glitches in performance are mostly invisible to those who operate networks and systems, and are therefore unlikely to be picked up and dealt with. Poor customer experience can, however, cause lost business and, if not dealt with, can do further harm by eroding customer trust and costing future business.
The weakness of inside-outwards monitoring, where the network and servers are watched from the data centre, is that it can only monitor what it sees. Failed network connections, high retransmission rates at the edge, and poor client design and performance are all invisible to it, yet these are the areas most likely to cause user frustration. The solution is to adopt an outside-in approach to monitoring, and Tivoli’s (BigFix) endpoint agents provide a means of doing so.
At Pulse in February 2012 we demonstrated an approach to Wi-Fi and other mobile management in which TEM (Tivoli Endpoint Manager) agents reported the quality of their network connections and these reports were correlated by access point. A video of the Pulse demo can be seen at http://ibmtvdemo.edgesuite.net/software/tivoli/demos/TEM-Netcool_demo/index.html
This is a relatively straightforward use of the agents, as they can query these network statistics using the internal system calls provided by operating systems such as Windows and Linux (and Android as well, though that is not yet supported). But it is possible to extend the concept if client tools are designed to log the actions made against their servers along with the responses. The key thing, though, is to provide a robust method of exception management: it’s the failure conditions we are interested in, not the successes. We should not expect endpoint agents to apply complex thresholds, however. Identifying exceptions is better done at a central point, either in near real time or by running advanced analytics on 24 or 48 hours’ worth of records.
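To make the idea of central exception identification concrete, here is a minimal sketch in Python. It is not the TEM or Netcool API: the record fields (`endpoint_id`, `access_point`, `action`, `response_ms`, `succeeded`), the function name `find_exceptions`, and both thresholds are assumptions made purely for illustration. The agents simply forward raw records; the central point decides what counts as an exception and groups the affected endpoints by access point.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientReport:
    """One logged client action, as an endpoint agent might forward it (hypothetical schema)."""
    endpoint_id: str
    access_point: str
    action: str
    response_ms: float
    succeeded: bool


def find_exceptions(reports, slow_ms=5000.0, min_affected=2):
    """Group failed or slow actions by access point.

    Returns {access_point: sorted endpoint ids} for every access point where
    at least `min_affected` distinct endpoints saw a failure or a response
    slower than `slow_ms`. Both thresholds are illustrative assumptions.
    """
    affected = defaultdict(set)
    for r in reports:
        # Only exceptions are of interest; successful, fast actions are ignored.
        if not r.succeeded or r.response_ms > slow_ms:
            affected[r.access_point].add(r.endpoint_id)
    return {
        ap: sorted(ids)
        for ap, ids in affected.items()
        if len(ids) >= min_affected
    }


# Made-up sample records, standing in for a window of collected reports.
reports = [
    ClientReport("laptop-01", "ap-lobby", "submit_order", 420.0, True),
    ClientReport("laptop-02", "ap-floor3", "submit_order", 9100.0, True),
    ClientReport("phone-07", "ap-floor3", "login", 0.0, False),
    ClientReport("laptop-09", "ap-floor3", "submit_order", 380.0, True),
    ClientReport("laptop-03", "ap-lobby", "login", 6200.0, True),
]

result = find_exceptions(reports)
print(result)
```

With these sample records only `ap-floor3` is flagged, because two distinct endpoints there had trouble, while the single slow login on `ap-lobby` falls below the `min_affected` threshold. Keeping the thresholds at the centre means they can be tuned, or replaced by longer-window analytics, without touching the agents.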
What we want is to know when one of our users is looking at that “please wait” pop-up, even more so when a dozen or more are, and to have the means to identify the cause of our users’ woes. When we have that we can start to talk of monitoring the customer experience.