Calculating Reactive Hours per Endpoint per Month (RHEM): A Process Overview
Are your Techs working efficiently? Answering that question takes a Reactive Hours per Endpoint per Month (RHEM) calculation. Half of the calculation speaks to engineering excellence more than to the heart of the Technician, but both are needed to work efficiently. In this week's article, we will take you through the entire Reactive Hours per Endpoint per Month process.
RHEM is the #1 metric used to compare managed service operations and to gauge how efficiently the MSP delivers the services that match up with the cost side of MRR.
The biggest challenge is finding the data in the PSA tools. Based on our experience over the last four years working with MSPs from around the world, the vast majority of MSPs are not using their PSA tools in a way that produces meaningful RHEM data.
Creating / Finding the RHEM data takes a full understanding of what RHEM means:
1) Reactive Hours (time spent working on remediating incidents)
2) Endpoint (the device with which the incident is associated)
3) Month (period of time for this metric matching MRR)
Reactive vs Proactive Hours: How are you segmenting them?
By the very definition of reactive vs. proactive, we're talking about Break/Fix or Incidents. Most MSPs don't segment their work even at the most basic level of separating Reactive Work from Proactive Work. So, the first step in helping the Techs work more efficiently is to put four things in place: segment Client requests into at least 11 different workflows (not to be confused with Workflow Rules) using the priority field and associated SOP/SLAs; bring all non-project requests under SLA management; hold the Techs accountable for Real-Time Time Entry (RTTE); and provide them with proactive, dynamic dashboards that organize their day so they can focus on meeting customers' expectations.
If these are in place, the MSP is ready to drive efficiency improvements using the RHEM calculation. And of course, Reactive Work is separated, so determining the Reactive Hours is as easy as filtering the data to the incident priorities of Critical through Standard (we'd never tell a Client they are a "Low" priority).
However, filtering on the Incident Priorities for all Clients is not the answer. RHEM is an MSP's MRR efficiency calculation for two reasons:
1) We need a full understanding of the customers' networks to which we are responding.
2) We need to know the number of endpoints for which we are responsible.
So, additional filtering to limit the reactive work to those with a contract on the ticket is needed (you have all Managed Service Clients set up with a default Service Desk Contract, right?).
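As a rough sketch of the two filters described above (incident priority first, then a contract on the ticket), assuming ticket rows exported from the PSA tool. The field names and the "High" and "Medium" priority names are hypothetical; the article only names Critical and Standard.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: "High" and "Medium" are placeholder names for the priorities
# between Critical and Standard; substitute your own PSA's values.
REACTIVE_PRIORITIES = {"Critical", "High", "Medium", "Standard"}


@dataclass
class Ticket:
    ticket_id: str
    priority: str
    contract: Optional[str]  # None or empty when no contract is on the ticket
    total_hours_worked: float


def is_reactive_contract_ticket(ticket: Ticket) -> bool:
    """True for incident-priority tickets that carry a contract."""
    return ticket.priority in REACTIVE_PRIORITIES and bool(ticket.contract)


tickets = [
    Ticket("T-1", "Critical", "Service Desk", 1.5),
    Ticket("T-2", "Standard", None, 0.5),                   # no contract: excluded
    Ticket("T-3", "Service Request", "Service Desk", 2.0),  # not an incident: excluded
    Ticket("T-4", "Standard", "Service Desk", 0.25),
]

reactive_tickets = [t for t in tickets if is_reactive_contract_ticket(t)]
reactive_ids = [t.ticket_id for t in reactive_tickets]
```

Only T-1 and T-4 survive both filters here, which is the point of setting every Managed Service Client up with a default Service Desk Contract.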
Endpoints: How do you find the data?
Maybe you have always had a clear understanding of what is and what is not an endpoint, but this is an area where Advanced Global has struggled, not only with the definition of an endpoint but also with how to find the data.
Finding the data takes either Configuration Items / Installed Assets populated in the PSA tool or data pulled from the RMM tool and hardcoded into the report. Either way, the endpoint count makes the RHEM calculation a current snapshot of the customer's environment: as far as we are aware, there is no historical audit information on the number of endpoints that were under management at some point in the past.
Here is Webroot's definition of an endpoint:
"An endpoint is any device that is physically an endpoint on a network. Laptops, desktops, mobile phones, tablets, servers, and virtual environments can all be considered endpoints. When one considers a traditional home antivirus, the endpoint is the desktop, laptop, or smartphone that antivirus is installed on. "
Building on Webroot's definition, we'll define an endpoint as any hardware item that's listed in Autotask configurations, including firewalls, routers and switches. We ignore monitors, keyboards, and mice as these are user interfaces of the endpoint. However, we do include peripherals like printers, scanners, cameras and PoS devices. And for the most part, we do not count BYOD devices because they are not listed in either the RMM or the Configuration Item list, even though, according to Webroot, they are endpoints and are most likely supported by the MSP.
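The endpoint definition above can be sketched as a simple count over a Configuration Item export. The dict layout and the "type" values are assumptions for illustration, not Autotask's actual export format.

```python
# Item types the article treats as user interfaces of an endpoint,
# not as endpoints in their own right.
NON_ENDPOINT_TYPES = {"monitor", "keyboard", "mouse"}


def count_endpoints(configuration_items):
    """Count configuration items that qualify as endpoints.

    Each item is a dict with a "type" key (hypothetical export format).
    """
    return sum(
        1
        for item in configuration_items
        if item["type"].strip().lower() not in NON_ENDPOINT_TYPES
    )


client_items = [
    {"name": "FW-01", "type": "Firewall"},
    {"name": "LT-07", "type": "Laptop"},
    {"name": "MON-3", "type": "Monitor"},    # user interface: ignored
    {"name": "PRN-2", "type": "Printer"},    # peripheral: counted
    {"name": "KB-11", "type": "Keyboard"},   # user interface: ignored
    {"name": "SRV-1", "type": "Server"},
]

endpoint_count = count_endpoints(client_items)
```

BYOD devices never appear in this count because they are not in the list to begin with, which matches the caveat above.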
Now that we have a clear understanding of what the data is, we can move forward.
Month: Are we talking about tickets created last month, tickets completed, or just time entries?
You'd think this would be the easy one: last month, what could be hard about that? Well, are we talking about tickets created last month, tickets completed, or just time entries? Tickets created doesn't work because we are talking about Reactive Hours, and a ticket created on the last day of the month might not have any remediation time logged against it yet. Tickets completed will include work done in other months: a ticket that was still open at the end of a month has its work counted in the month it is completed, not the month before. So that leaves time entries, but time entries also have an issue: does resource availability put a pinch on the hours available to react, and if resource availability were unlimited, would the number have been higher or lower? Lower!? How can it be lower? Because with unlimited resource availability there's less noise and disruption, the Techs can work more efficiently, and the result is fewer Reactive Hours per Endpoint per Month, which is the goal of this exercise, after all!
For the Reactive Hours portion of the calculation, we've chosen to use the Total Hours Worked on tickets completed last month that have a contract on the ticket and more than zero Total Hours Worked.
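A minimal sketch of that selection rule, assuming the tickets passed in have already been limited to the incident priorities. The dict keys (`completed_on`, `contract`, `total_hours_worked`) are hypothetical names, not PSA field names.

```python
from datetime import date, timedelta


def previous_month_bounds(today: date):
    """Return the first and last day of the month before `today`."""
    first_of_this_month = today.replace(day=1)
    last_of_previous = first_of_this_month - timedelta(days=1)
    return last_of_previous.replace(day=1), last_of_previous


def reactive_hours_last_month(tickets, today: date) -> float:
    """Sum Total Hours Worked for tickets completed last month that
    have a contract and more than zero hours logged."""
    start, end = previous_month_bounds(today)
    return sum(
        t["total_hours_worked"]
        for t in tickets
        if t["completed_on"] is not None
        and start <= t["completed_on"] <= end
        and t["contract"]
        and t["total_hours_worked"] > 0
    )


sample_tickets = [
    {"completed_on": date(2024, 2, 10), "contract": "Service Desk", "total_hours_worked": 1.5},
    {"completed_on": date(2024, 2, 29), "contract": "Service Desk", "total_hours_worked": 0.0},  # zero hours
    {"completed_on": date(2024, 2, 20), "contract": None, "total_hours_worked": 2.0},            # no contract
    {"completed_on": date(2024, 3, 2), "contract": "Service Desk", "total_hours_worked": 1.0},   # this month
    {"completed_on": None, "contract": "Service Desk", "total_hours_worked": 3.0},               # still open
    {"completed_on": date(2024, 2, 1), "contract": "Service Desk", "total_hours_worked": 0.5},
]

hours = reactive_hours_last_month(sample_tickets, today=date(2024, 3, 15))
```

Because the sum uses each ticket's Total Hours Worked, hours logged in an earlier month on a ticket completed last month are included, which is the trade-off accepted above.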
Summary:
It sounds like we are home-free: we have the Reactive Hours for last month, and we have the current number of endpoints. Hours divided by endpoints = RHEM. But it's not that easy. While the simple division will give you a RHEM number that can be compared to MSP peers (0.25 is good, 0.20 is best-in-class), it doesn't give you enough information to improve the number. To improve the RHEM number, you can either reduce the Techs' average Total Hours Worked per ticket or reduce the number of tickets per endpoint in the first place. We've structured our advanced RHEM live report to provide the average Total Hours Worked per ticket and the number of tickets per endpoint so it is easier to see where improvements can and should be made.
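The two levers follow from simple arithmetic: hours divided by endpoints equals (hours per ticket) times (tickets per endpoint). The sketch below shows that breakdown; the function name and input numbers are made up for illustration and are not taken from the advanced RHEM report itself.

```python
def rhem_breakdown(reactive_hours, ticket_count, endpoint_count):
    """Split RHEM into its two drivers.

    RHEM = reactive_hours / endpoint_count
         = (reactive_hours / ticket_count) * (ticket_count / endpoint_count)
    """
    if ticket_count <= 0 or endpoint_count <= 0:
        raise ValueError("ticket_count and endpoint_count must be positive")
    return {
        "rhem": reactive_hours / endpoint_count,
        "avg_hours_per_ticket": reactive_hours / ticket_count,
        "tickets_per_endpoint": ticket_count / endpoint_count,
    }


# Illustrative numbers only, not real client data.
report = rhem_breakdown(reactive_hours=100.0, ticket_count=200, endpoint_count=400)
```

With these inputs the RHEM is 0.25, made up of 0.5 hours per ticket and 0.5 tickets per endpoint. Cutting either driver by a fifth would bring the result to 0.20.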
To reduce the average Total Hours Worked, the following methods can be implemented:
1) Provide training on the Sub-Issues taking the longest to remediate
2) Improve the Process(es)
a. Request segmentation
b. Critical request SOP
c. Proactive / Dynamic Dashboards
d. SLA automation
3) Improve triage process, so the data quality is better
4) Hold the Techs accountable for Real-Time Time Entry so the documentation, Client communications, and ticket status are up to date, driving Service Delivery efficiencies across the Team
To reduce the number of tickets per endpoint:
1) Leverage RMM scripting to invoke self-healing
2) Implement Root Cause Analysis (RCA) to reduce the number of times an incident reoccurs
3) Improve preventative maintenance to stabilize the Clients' networks
4) Expand end-user training
5) Document/vet remediation work strategies and push Clients to a Client portal where self-help reduces the demand on the MSP