3.7 How do we help people identify performance measures for their program or service?

The Short Answer

There are many different ways to do this (see 3.1). Here’s one approach that goes directly to performance measures themselves:

All performance measures (that have ever existed for any program in the history of the universe) fall into one of four categories, derived from the intersection of quantity and quality vs. effort and effect.




Upper left (quantity of effort)
What did we do?
How much service did we deliver?

Upper right (quality of effort)
How well did we do it?
How well did we deliver service?

Lower left (quantity of effect)
Is anyone better off (#)?
How much change for the better did we produce?

Lower right (quality of effect)
Is anyone better off (%)?
What quality of change for the better did we produce?

In each quadrant the questions are answered with # or % data statements:

  • What did we do? (e.g. # clients served, # activities performed).

  • How well did we do it? (e.g. % timely actions, % complete actions, client staff ratio, staff turnover rate, unit cost).

  • Is anyone better off? (# and % of clients who show improvement in skills/knowledge, attitude, behavior or circumstance).

See 3.12 How do we select the most important “headline” performance measures? and 3.14 What do we do with performance measures once we have them?

Full Answer

(1) The first step in any performance measurement work is to identify what organizational entity or function we are talking about.  This can be thought of as a “fence drawing” exercise. We will draw a fence around the thing whose performance is to be measured. This could be an agency, a program, a subprogram or a component unit or activity of the program.  Or it could be a function of the organization which crosses organizational lines. The idea is simple: take a picture of the organization in whatever form it makes sense to you. Draw a line around all of it or a piece of it. And consider the performance of what’s inside the fence. 

(2) Service systems and systems reform and integration: Fences can also be drawn around a set of related programs or agencies that make up a service system (e.g. the out of home care system including child welfare, juvenile justice, mental health and education), and performance measures developed for the system as a whole. This kind of process should be among the first things done in any systems reform effort. (Note: service integration and systems reform are means to the end of better results, not ends in themselves.) For example, in discussions of service integration (as a possible component of reform), we could consider the following performance measures to test whether we were making progress from the client’s perspective.


  • Average number of workers and case plans per family in the system.

  • Average number of offices that clients must visit each month.

  • Average number of bus changes required for clients to get to current offices.

This kind of information could be gathered on a sample basis. Baselines could be created and the performance accountability process described in this guide could be used to drive the numbers down. Performance measures can have the effect, as in this case, of giving an operational definition to an otherwise vague notion like “service integration.”
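As a minimal sketch of how such a baseline might be computed from a sample, assume each sampled family record carries one field per measure. The field names and all figures below are invented for illustration and are not real data.

```python
from statistics import mean

# Hypothetical sample of families in the system; all values are illustrative.
sample = [
    {"workers_and_case_plans": 4, "offices_per_month": 3, "bus_changes": 2},
    {"workers_and_case_plans": 2, "offices_per_month": 1, "bus_changes": 0},
    {"workers_and_case_plans": 6, "offices_per_month": 5, "bus_changes": 4},
]


def baseline(records, field):
    """Average of one field across the sampled records."""
    return mean(record[field] for record in records)


# One baseline average per measure, keyed by the (assumed) field name.
baselines = {field: baseline(sample, field) for field in sample[0]}
```

Repeating the same calculation on later samples would show whether the numbers are moving down from the baseline.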

(3) TECHNIQUE: Here is a five-step process that’s the best way to help people identify performance measures, select the most important ones and identify a data development agenda.

Step 1. HOW MUCH WE DO (Upper Left): Draw the four quadrants on a big piece of flip chart paper. Start in the upper left quadrant. First put down the measure “# of customers served.” Ask if there are better, more specific ways to count customers or important subcategories of customers, and list them (e.g. # of families served, # of children with disabilities served, etc.). Next ask what activities are performed. Convert each activity into a measure (e.g. “we train people” becomes “# of people trained”). When you’re finished, ask if there are any major activities that are not listed.

Step 2. HOW WELL DO WE DO IT? HOW WELL DO WE PERFORM THESE ACTIVITIES? (Upper Right): Ask people to review the standard measures for this quadrant that apply to most if not all programs, services or activities (e.g. unit cost, staff turnover, etc.). These are shown on the “Separating the Wheat From Chaff” worksheet (LINK HERE) in the upper right quadrant under “standard measures.” Write each answer in the upper right quadrant. Next take each activity listed in the upper left and ask if there are measures that tell whether that particular activity was performed well. If you get blank looks, ask if timeliness matters, if accuracy matters. Convert each answer into a measure and be specific (e.g. the timeliness of case reviews becomes “percent of case reviews completed on time” or “percent of case reviews completed within 30 days after opening”).

Step 3. IS ANYONE BETTER OFF? (Lower Left and Lower Right): Ask “In what ways could clients be better off as a result of getting this service? How would we know if they were better off in measurable terms?” Create pairs of measures (# and %) for each answer (e.g. # and % of clients who get jobs above the minimum wage). The # answers go in the lower left; the % answers go in the lower right.

There are two ways to state these kinds of measures: point in time and improvement over time (e.g. % of children with good attendance this report card period vs. % of children whose attendance improved since the last report card period).
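The difference between the two statements can be sketched in a few lines. Everything here is an assumption made for illustration: the children, the attendance rates, and the 90% threshold used to define “good attendance.”

```python
# Hypothetical attendance rates (share of days attended) for two report card
# periods. Names, values, and the threshold are assumptions, not real data.
GOOD_ATTENDANCE = 0.90

last_period = {"Ana": 0.85, "Ben": 0.95, "Cal": 0.70, "Dee": 0.92}
this_period = {"Ana": 0.91, "Ben": 0.93, "Cal": 0.80, "Dee": 0.92}


def pct(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Point in time: % of children with good attendance this period.
good_now = sum(1 for rate in this_period.values() if rate >= GOOD_ATTENDANCE)
point_in_time = pct(good_now, len(this_period))

# Improvement over time: % of children whose attendance rose since last period.
improved = sum(
    1 for child in this_period if this_period[child] > last_period[child]
)
improvement = pct(improved, len(this_period))
```

In this made-up sample the two measures tell different stories: one child improved but is still below the threshold, and another slipped slightly but still has good attendance.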

This is the most interesting and challenging part of this process. Dig deep into the different ways this can show up in the lives of the people served. Explore each of  the four categories of “better-offness”: skills/knowledge, attitude, behavior and circumstance. If people get stuck, try the reverse question: “If your service was terrible, how would it show up in the lives of your clients?”

Look first for data that is already collected. Then be creative about things that could or should be counted and the ways in which data could be generated. It is not always necessary to do 100% reporting. Sampling can be used, either regular and continuous sampling or one-time studies based on sampling. Pre- and post-testing can be used to show improvement in skills, knowledge or attitude. Surveys can be used which ask clients to self-report improvement or benefits.

NOTE: Every performance measure has two incarnations: a lay definition and a technical definition. The lay definition is one that anyone could understand (e.g. percentage of clients who got jobs). The technical definition, for percentages, exactly specifies the numerator and denominator (e.g. the number of clients who got jobs this month, divided by the total number of clients enrolled in the program at any time during the month).
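One way to keep the two definitions together is to record them side by side. The sketch below is only an illustration: the `Measure` class is a hypothetical helper and the counts are invented.

```python
from dataclasses import dataclass


@dataclass
class Measure:
    """A percentage measure with its lay and technical definitions."""

    lay_definition: str
    numerator: str    # what is counted on top
    denominator: str  # what it is divided by

    def value(self, numerator_count, denominator_count):
        """Return the percentage, or None when there is nothing to divide by."""
        if denominator_count == 0:
            return None
        return 100 * numerator_count / denominator_count


jobs = Measure(
    lay_definition="Percentage of clients who got jobs",
    numerator="clients who got jobs this month",
    denominator="clients enrolled in the program at any time during the month",
)

# Illustrative counts only: 12 clients got jobs out of 80 enrolled.
rate = jobs.value(12, 80)
```

Writing the denominator down explicitly matters because the same lay definition gives a different percentage if, for example, only clients enrolled on the last day of the month are counted.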

Now you have filled in the four quadrants with as many entries as you can.  Next we select the most important measures and a data development agenda. Here’s a SHORT CUT way to do that:

Step 4. HEADLINE MEASURES: Identify the measures in the upper right and lower right quadrants for which there is (good) data. This means decent data is available today (or could be produced with little effort). Circle each one of these measures with a colored marker. Ask “If you had to talk about your program with just one of these circled measures, which one would it be?” Put a star by the answer. Then ask “If you could have a second measure… and a third?” You should identify no more than 4 or 5 measures. And those should be a mix of upper right and lower right measures. These choices represent a working list of headline measures for the program.

Step 5. DATA DEVELOPMENT AGENDA: Ask “If you could buy one of the measures for which you don’t have data, which one would it be?” Mark that with a different colored marker. “If you could have a second measure… and a third?” List 4 or 5 measures. This is the beginning of your data development agenda in priority order.

(4) The longer and more thorough method for selecting performance measures involves rating each measure High, Medium or Low on three criteria: Communication Power, Proxy Power and Data Power.

Communication Power: Does the performance measure communicate to a broad range of audiences? It is possible to think of this in terms of the public square test. If you had to stand in a public square and explain the performance of this program to your neighbors, what two or three measures would you use? 

Proxy Power: Does the performance measure say something of central importance about the program (agency or service system)? Can this measure stand as a proxy for the most important things the program does? 

Data Power: Do we have quality data on a timely basis? We need data which is reliable and  consistent. And we need timely data so we can see progress – or the lack thereof –  on a regular and frequent basis. 
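A rating sheet of this kind could be sketched as follows. The measures and their ratings are hypothetical, and the sorting rule in `classify` is an assumed rule of thumb for illustration, not a rule stated in this guide.

```python
# Ratings per measure, in the order (communication, proxy, data).
# Measures and ratings are hypothetical examples.
ratings = {
    "% of case reviews completed on time": ("H", "H", "H"),
    "% of clients who get jobs above the minimum wage": ("H", "H", "L"),
    "staff turnover rate": ("M", "M", "H"),
    "unit cost": ("L", "M", "H"),
}


def classify(communication, proxy, data):
    """Assumed rule: strong on communication and proxy power, then split on data."""
    if communication == "H" and proxy == "H":
        if data == "H":
            return "headline candidate"
        return "data development candidate"
    return "other"


results = {name: classify(*scores) for name, scores in ratings.items()}
```

Under this assumption, a measure that communicates well and says something central about the program becomes a headline candidate if good data exists, and a candidate for the data development agenda if it does not.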

(5) Both methods will lead to the same list. The SHORT CUT works because the “forced choice” process leads people intuitively to think about communication and proxy power. When they do this for measures where they have data, the selected measures are the Headline Measures. When they do this for measures where they do not have data, the selected measures are the Data Development Agenda.

This process will lead to a three part list of performance measures:

Headline Performance Measures

Those 3 to 5 measures you would use to present or explain your program’s performance to policy makers or to the public.

Secondary Measures

All other measures for which you now have data. These measures will be used to help manage the program. And they will often figure in the story behind the curve for headline measures.

Data Development Agenda

Measures you would like to have. These should be listed in priority order. Since data is expensive both in dollars and worker time, you must make a judgment about how far down this list you can afford to go.

The headline measures are the starting point for using data to improve program performance. 

See 3.14 What do we do with performance measures once we have them? How can we use performance measures to improve performance? and succeeding questions.

(6) Several things to keep in mind here: It is best if the program or service for which performance measures are developed has some organizational identity. Performance accountability is about holding managers accountable for the performance of what it is they manage. If the thing to be measured has no organizational identity, then there is no person or persons who can be held accountable for its performance.

This does not mean that the thing to be measured must be a box on the organization chart or a physical unit in a single geographic location. In matrix management, for example, it can be a function that cuts across organization lines for which some person or persons has been given lead responsibility (for example budgeting or staff development, where some staff may be decentralized but the function is still managed or “led” by someone). It can be a program which operates in many different locations. The notion of fence drawing is flexible enough to work with any organizational structure, old or new.

(7) Second thing to keep in mind: When you are trying to teach these ideas to new people, start with small units which have a clear identity. Then move on to larger units and functions without physical organizational identity.

(8) Third: performance measurement starts with the idea of customers or clients. CUSTOMERS are people who can be made better or worse off by the services of the program. 

Performance measurement is an easier discussion for organizational entities who can clearly identify their customers. So, for example, direct service programs like child support enforcement or mentoring will have a head start on programs or activities where this discussion is unclear. 

Performance measurement of customer well-being is harder for administrative functions such as budget, personnel, general services etc. It will be necessary to spend some quality time helping these people understand/discover who their customers are. Hint: for administrative functions the customers are often the managers of the agency itself. And customer satisfaction turns out to be the most important lower right quadrant measure. (See 3.10)

(9) One of the best ways to teach this method is to conduct a “fishbowl” at the front of the room. Get four or five people to volunteer who know a particular program well. Position them in chairs in a small semi-circle at the front of the room, facing forward (i.e. back to everyone else). Conduct a short session (15 to 20 minutes) using the technique above. Periodically pause to ask if the larger audience has any questions. If time permits, break the larger group into groups of 6 and have them pick a program. One member of the group then leads the group through the 5 steps of the technique above. Depending on time, two or three rounds of this could be done. Debrief the large group. “What worked and didn’t work about this experience? What did you learn? How many think they could lead a small group of coworkers through this thinking process?”

(10) Technical note: Some people correctly point out that client results actually have two components which parallel the difference between results and indicators at the population level, i.e. a plain language statement of client well-being (clients are self sufficient) and a measurement that describes this condition of well-being (# and % of clients who get jobs and keep them 6 months or more). In practice, these two ideas are addressed in a single step in the thinking process which asks “In what ways could clients be better off as a result of getting this service? How would we know if they were better off in measurable terms?” (step 3 above). Experience suggests that when these two questions are separated as they are (and must be) at the population level (e.g. first fully answer in plain language, then take each plain language statement and identify measures that can serve as proxy), the process loses its common sense feel and becomes unnecessarily complicated and time consuming. One interesting and usable variation of this approach, used by the Department of Developmental Services in California, listed all client results in plain language, and then developed a set of measures for the group of client results as a whole (i.e. not condition by condition).

(11) Obscure note #232: Some people wonder why the progression from least important to most important runs from the upper left to the lower right. There are 23 other possibilities (six variations for each placement of most important). And some other systems place the most important category in the “first read” upper left quadrant (6 ways to do this). Here’s why. In this country we read from left to right and from top to bottom. So the natural progression of reading a 4 quadrant chart is upper left, upper right, lower left, lower right. This would obviously be different for Hebrew or Chinese ideograms, which proceed in different directions.

In Results-Based Accountability, we get the “How much did we do?” question and set of measures out of the way first. “Yes, you work hard. Yes, you do a lot of things. Yes, you see a lot of clients. Yes, it takes a lot of time. You’re great. We love you. Can we move on now?” We let people get the credit trap out of their system. Yes, they get credit for all their hard work in the upper left quadrant. With this out of the way it is much easier to have the rest of the discussion. It is also essential to understand who your customers are and what you do, in order to answer the next two questions.

“How well did we do it?” is next. Having established what people do and for whom, we can now go on to examine how well they perform the functions of their job(s). We set aside effects for customers for a moment and focus on how well the service “plays” are executed. We’ll deal with whether we scored a goal or won the game in a minute. We also think, of course, that there is a relationship between how well we deliver service and whether our customers are better off. It helps to understand these “drivers” of better-offness before getting to the third and fourth quadrants.

Finally we come to “Is anyone better off?” Here we look at number and percentage pairs of measures. The raw numbers are less important than the percentages (except in the case of small numbers), and so we put them in the “next read” quadrant, lower left.

So we read from upper left to lower right because this is the natural progression in thinking about what programs do and how to measure performance, and because this then matches the natural sequence of reading in most countries. But there is nothing magic or absolute about this. A number of people over the years have said they find it easier to start with the “Is anyone better off?” lower right quadrant and work backwards to the other questions. Nothing wrong with that if it works for you. I do ask, however, that, when presenting the model, you keep the order of the quadrants as they are, for the simple reason that thousands of people have seen them this way, and switching them now could cause unnecessary confusion. Thanks.

Marc