
Parameters of performance measurement

Those with some knowledge of this topic may already have a question—if specialist policing units are not primarily responsible for crime control in a particular area, how can the popular measurements of recorded crime rates be used to evaluate them? This is indeed the key question, but before engaging with it directly, it is useful to set it in context by briefly examining issues of performance measurement in both the broader context of the public service and more specifically in relation to police forces. It is particularly important to understand the shortcomings and problems that surround the application of performance measurement to the police, as without such understanding, it is impossible to develop meaningful and useful performance frameworks.

Public sector performance measurement

Public sector performance measurement became an increasingly important issue in the Western world in the 1980s and 1990s (Carter, Klein & Day 1992; Fleming & Lafferty 2000; Schick 1996). Advocates of a new style of public service—sometimes termed New Public Management or NPM—sought to propel what were seen as slow-moving, inefficient and overly bureaucratic organisations closer to a private-sector, corporate model, in the hope of delivering better services for less money (van Sluis, Cachet & Ringeling 2008). Key to this push was accountability, which in turn required the development of performance measurement frameworks.

Performance measurement can be complex, so it is useful to clarify some core concepts. A standard performance classification scheme considers four elements—inputs, activities, outputs and outcomes. Inputs are the resources available to the organisation; activities are the processes carried out using those resources. Outputs are the specific goods and services delivered, and outcomes are the effect of those goods and services on the environment (Collier 2006). The total quantity of inputs used can be termed economy, the ratio of outputs to inputs efficiency, and the impact of outputs on outcomes effectiveness (Flynn 1986: 393). Before the 1980s, performance measurement in the public sector—although it was unlikely to be called that—largely focused on inputs, especially staying within allocated budgets. Over time, there was an increasing focus on efficiency indicators (Carter, Klein & Day 1992; Smith 1990). Today, it is generally felt that a focus on outputs and outcomes is of more benefit, especially when evaluating the quality of police work (Collier 2006). For the police in particular, the outcome of ‘harm reduction’ has become increasingly central (Mackenzie & Hamilton-Smith 2011).
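
To make these definitions concrete, the following minimal sketch computes the three ratios from invented figures for a hypothetical policing unit; the numbers and variable names are illustrative assumptions, not drawn from any cited framework.

```python
# Hypothetical figures for a single policing unit over one year.
inputs = 1_000_000       # resources consumed (e.g. annual budget in dollars)
outputs = 250            # services delivered (e.g. completed operations)
outcome_change = -0.04   # movement in the relevant outcome (e.g. crime rate)

economy = inputs                           # total resources used
efficiency = outputs / inputs              # outputs produced per unit of input
effectiveness = outcome_change / outputs   # naive outcome movement per output

print(f"Economy (inputs used): {economy:,}")
print(f"Efficiency (outputs per dollar): {efficiency:.6f}")
print(f"Effectiveness (outcome change per output): {effectiveness:.6f}")
```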

Three broad benefits of performance measurement have been identified, namely improving value for money (efficiency), improving managerial competence and increasing accountability (Collier 2006). However, performance measurement can also have costs, and not merely in terms of the resources required to undertake measurement. Smith (1995a) has identified eight negative effects ensuing from the publication of such data:

  • tunnel vision—an emphasis on quantified elements of performance at the expense of other aspects;
  • suboptimisation—the pursuit of narrow objectives at the expense of greater success;
  • myopia—the pursuit of short-term success at the expense of long-term success;
  • measure fixation—an emphasis on measures rather than underlying objectives;
  • misrepresentation—the deliberate manipulation of data;
  • gaming—the deliberate manipulation of performance to gain strategic advantage;
  • ossification—an overly rigid system of performance measurement; and
  • misinterpretation—misunderstanding performance data.

Flynn (1986: 389) has stated:

At its worst, performance measurement has led to a concentration both on what is easily measured and what is susceptible to narrowly defined efficiency changes.

If targets are poorly defined and lack detail, problems reminiscent of those encountered in command economies can emerge (Smith 1990). Where there are too many indicators, however, there may be criticisms of unreliability, inflexibility and time wasting (Carter, Klein & Day 1992). Overly prescriptive indicators that specify not only what is to occur, but also how, can become divorced from the underlying objectives of the organisation. Thus, there is a need to carefully balance detail and prescription with freedom and flexibility.

While it is easy to assume that differing levels of performance by a given organisation (or sub-groups of that organisation) are primarily due to managerial competence, there are many reasons beyond the skills of a particular manager why those organisations (or sub-groups) might perform differently (Smith 1990). The organisations might have slightly different objectives, different needs, different costs, or might even measure performance differently. Separating out the impact of environmental factors (upon which any organisation has limited influence) on performance is a difficult task, but an essential one if the real value of management is to be identified.

The issue of ‘window-dressing’ is also important. Some writers suggest that performance measurement may sometimes be as much about the appearance of legitimacy as about instrumental improvement of performance (Collier 2008; Roy & Séguin 2000). This has been seen where organisations simply ‘dress up’ existing statistics as performance indicators (Carter, Klein & Day 1992), rather than going to the trouble of developing specific frameworks. At its most extreme, this may result in a performance version of creative accountancy, where two sets of performance measures are maintained—one for public consumption, focused on positive results and designed to maintain legitimacy, the other for internal use only (Flynn 1986).

The difficulty of measuring public sector performance can be illustrated by comparison with the private sector. In the private sector, earnings and profitability provide a convenient and simple ‘bottom line’ performance indicator. Even so, private sector organisations have still developed large, complex sets of non-financial indicators (Bititci et al. 2006; Carter, Klein & Day 1992), realising that such indicators present a far more accurate and nuanced picture of organisational performance. In the public sector, by comparison, there is no clear ‘bottom line’—many important objectives are difficult to quantify and there is seldom an equivalent of earnings and profitability (Smith 1995a). Thus, if a good private sector performance framework, where there is a ‘bottom line’, has to be complex, detailed and holistic, a good public sector performance framework has to be even more so.

Often, public sector activities are difficult to distinguish from one another, are produced in conjunction with other organisations and unfold over a lengthy period (Smith 1995a). Public sector performance is thus a particularly elusive concept (Smith 1995a; Wisniewski & Olafsson 2004). This is especially the case with the police, whose goals are often complex objectives that cannot be achieved solely by police action (such as reductions in crime rates; Cockcroft & Beattie 2009) and that are heavily dependent on the work of other agencies (Mackenzie & Hamilton-Smith 2011).

Police performance measurement

Around the world, but especially in the United Kingdom, there has been an increasing focus on police performance since approximately 1990 (Collier 2006; van Sluis, Cachet & Ringeling 2008). Over that period, a range of reports has aimed to ensure that the 43 police forces of England and Wales report on performance in a similar fashion (Audit Commission for Local Authorities and the National Health Service in England and Wales 1998a, 1998b; Collier 2006, 1998; Home Office 2005, 2004, 2002; Public Services Productivity Panel 2000), with the goal being a centre-driven improvement in effectiveness (Home Office 2008; Home Office Police Standards Unit 2004; Loveday 2006, 2005). In 2007, performance measures were rationalised at the national level to focus almost entirely on public trust and confidence, and more recently there has been a further move towards devolving responsibility for performance measurement from the national to the force level (Barton & Barton 2011). The cascading of performance indicators from the national level to the police constable on the street has had mixed results (Butterfield, Edwards & Woodall 2004).

Statistical systems such as COMPSTAT (first in New York and then further afield; Braga & Moore 2003b; Rosenbaum 2007; Schneider, Chapman & Schapiro 2009) have become increasingly common, focusing on the occurrence of specific crimes in limited areas over a particular timeframe (Stone & Travis 2011). In the Netherlands, a set of performance indicators for policing activities was introduced in the early 1990s (van Sluis, Cachet & Ringeling 2008). This was a major change for a police culture that had traditionally not been held particularly accountable for its actions or results (Hoogenboezem & Hoogenboezem 2005). In Australia, performance management was introduced into several state police services from the 1980s onwards (Fleming & Lafferty 2000), with Operational Performance Reviews (OPR) and similarly named reports deliberately emulating the COMPSTAT approach (Mazerolle, Rombouts & McBroom 2006).

As a result of this statistical focus, the rate of recorded and resolved crime has become the primary performance indicator for police around the world (Collier 2006; Dadds & Scheide 2000; Metropolitan Police Authority & Metropolitan Police Service 2009; New Zealand Police 2011; Western Australia Police 2011). In many police organisations, aggregated crime data are presented in league table formats showing the (perceived) comparative performance of different jurisdictions.
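
Part of the appeal of such league tables is how little analysis they require. The sketch below, using entirely invented district names and crime counts, shows how a typical ranking by resolution rate is constructed, and by implication how much it leaves out.

```python
# Invented recorded/resolved crime counts for three hypothetical districts.
districts = {
    "District A": {"recorded": 4200, "resolved": 1470},
    "District B": {"recorded": 3100, "resolved": 930},
    "District C": {"recorded": 5600, "resolved": 2240},
}

# Rank by resolution rate, the typical league-table metric.
table = sorted(
    districts.items(),
    key=lambda kv: kv[1]["resolved"] / kv[1]["recorded"],
    reverse=True,
)

for rank, (name, counts) in enumerate(table, start=1):
    rate = counts["resolved"] / counts["recorded"]
    print(f"{rank}. {name}: {rate:.1%} of recorded crime resolved")
```

Nothing in this ranking captures differing local conditions, recording practices or the work of other agencies, which is precisely the criticism developed below.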

A debate has developed about the applicability of simple, easy-to-use, numerical, ‘New Public Management’-type performance schemes to the policing environment (Mackenzie & Hamilton-Smith 2011). Before discussing the shortcomings of such schemes, however, it must be remembered that this is a problem of the police’s own making; it was police forces around the world that embraced such simple, easy-to-use measures rather than devoting the necessary resources to developing more rigorous, analytical and evidence-based frameworks. This is in direct contrast to military services around the world (Blumstein 1999), where there has been a substantial level of investment in Centres for Lessons Learned and operational analysis (United States Army 2011). Had police forces emulated their military counterparts and focused on studies of historical performance, they might well have developed the sort of doctrine and conceptual frameworks that have led to quantum leaps in effectiveness for some military services and that could also serve as the basis for better performance frameworks (Alach 2010a). As such, while some police forces do now appear to be increasingly focused on the quality of their performance measurement (notably in the United Kingdom, where measures have evolved constantly), the lack of real investment in the field over the past two decades indicates that the situation is fragmented at best (Roy & Séguin 2000). There has been very limited investment in the sort of evaluation and research activities that are necessary to gain a better understanding of police performance (Weisburd & Neyroud 2011) and little evidence that police forces have truly implemented the lessons gained from the few research and evaluation activities that have occurred (Bradley 2005; Chavez, Pendleton & Bueerman 2005; Lum 2009).

It is questionable whether standard police performance measurement schemes, with their overwhelming focus on crime rates (often at the expense of other aspects of policing activity), are relevant and accurate (Collier 2006). They can be particularly inaccurate when measuring the performance of police forces that are switching, or have switched, from a traditional, professional model to a community policing model (Braga & Moore 2003b). Further, while outcomes are usually regarded as central to performance management, commentators on police performance often emphasise that how police act is as important as what they achieve; police performance is as much normative as it is technical (Audit Commission for Local Authorities and the National Health Service in England and Wales 1996, 1993; Collier 2006; van Sluis, Cachet & Ringeling 2008). The legitimacy of policing is vital (Braga & Moore 2003a) and league-table approaches cannot easily incorporate this aspect of performance; many actions, such as mass random stop-and-searches, might improve performance vis-à-vis crime rates while actually harming police legitimacy. Related to this is the degree of alignment between what police perceive as good performance and what the public think; forces may focus on recorded and resolved crime rates, but the public may not perceive this as good performance (Kelling 1999), as they may continue to feel unsafe. Indeed, the public may well believe that any improvement in such statistics is merely the result of manipulation of recording practices by the police.

A narrow approach to performance measurement can lead to some of the negative effects cited by Smith earlier, particularly tunnel vision (Collier 2006). It can also lead to an over-emphasis on short-term targets at the expense of longer term objectives (Smith 1995a), despite the desirability of the latter (Collier 2006). The prioritisation decisions of police commanders will be influenced by the performance targets they are operating under, often leading them to devote the most resources to the most measured tasks, rather than to those that may have more beneficial (albeit largely unmeasured) results (Davies 2000; Dupont 2003; Fleming & Lafferty 2000; Hoogenboezem & Hoogenboezem 2005; Loveday 1999; Vickers & Kouzmin 2001). A prescriptive approach to performance management can thus reduce the discretion of street-level police officers to determine how best to deal with a particular situation. This can, in turn, conflict with ‘old style’ police culture, in which police discretion and flexibility are central (Butterfield, Edwards & Woodall 2004; Hoogenboezem & Hoogenboezem 2005).

The end result of performance measurement schemes can be ‘perverse’ behaviour, where performance targets or indicators become de-linked from the goals they are meant to achieve and instead become self-sustaining (Loveday 2005). Sometimes, the results are the opposite of those intended (Flynn 1986). This is more likely where the performance indicators focus on particular outputs, rather than outcomes or processes, which may ‘offer perverse incentives to carry out those activities where it is easiest to notch up a big score’ (Carter, Klein & Day 1992: 167). However, some seemingly perverse behaviour may not actually be such, as police outputs are often valuable in themselves (Braga & Moore 2003a), a point to which this paper will return later. Perverse behaviour can also be enabled by overly prescriptive performance indicators that do not allow for discretion or flexibility (Loveday 2005), as well as by performance indicators that are overly simplistic or mono-faceted and that fail to account for all relevant elements (Vollaard 2006).

Another key issue with police performance measurement is the differentiation between ‘hard’, ex-ante performance measurement, where targets are specified in advance and performance against those targets is measured strictly, and softer, ex-post performance measurement, where performance over a period is evaluated in a more holistic fashion, incorporating more than just core performance indicators (Hoogenboezem & Hoogenboezem 2005). Some question the value of ex-ante performance targets (Vollaard 2006). Lawton (2005: 235) has stated that ‘inspection, regulatory, and performance regimes that focus on prescriptive target-setting and the technical application of pre-determined metrics ignore the importance of judgement’.

Perhaps because policing lacks the scientific foundation to set truly meaningful and feasible hard targets, most police performance measurement schemes around the world include an inspectorate function, where performance results are discussed and analysed. In New Zealand, this is performed by the Performance Group (Police National Headquarters) and in England, Wales and Northern Ireland by Her Majesty’s Inspectorate of Constabulary. A problem with soft, ex-post performance measurement can develop when managers claim they are not responsible for poor performance, instead citing a range of causal environmental factors (Carter, Klein & Day 1992). While this may be partially accurate, it is unheard of for a manager to blame the environment for their good performance.

Another problem arising from prescriptive, ex-ante performance schemes is the potential for conflict in the applicability of performance targets at different levels of an organisation. For example, what may appear to be a priority (and thus a key performance target) at a national level may be largely irrelevant in a particular geographic locale (Loveday 2006), thus bringing into question whether performance in that field is a valid indicator in that place. At times, a performance indicator may be irrelevant not because it is a poor indicator, but rather because inadequate effort and analysis has been undertaken to convert that indicator into a meaningful measure at different hierarchical levels (and locations) of the organisation. Targets may also be deliberately unambitious—perhaps because it is realised that simple indicators are inadequate for real accountability and therefore it is best to set them at a level likely to be achieved—thus leading to suboptimal performance (Mackenzie & Hamilton-Smith 2011).

Related to the above dichotomy is the idea that performance measurement can include both accountability (whether ‘hard’ or ‘soft’) and learning elements. Accountability is past-focused and identifies whether what has occurred is good, bad, or in-between; learning instead focuses on how future performance can be improved by drawing on lessons from the past (Braga & Moore 2003b). Learning is not something that police forces have traditionally done well, except in the more limited field of technological advancement (Bradley 2005; Lum 2009; Weisburd & Neyroud 2011).

In an effort to overcome several of the problems cited above, Braga and Moore (2003a) have posited a comprehensive approach to police performance measurement. They state that ‘controlling crime is the single most important core function of the police, (but) there are many other dimensions of performance that are valued’ (Braga & Moore 2003a: 10). As such, they feel that any performance scheme needs to incorporate seven dimensions:

  • reducing crime and criminal victimisation;
  • calling offenders to account;
  • reducing fear and enhancing personal security;
  • ensuring civility in public spaces (ordered liberty);
  • using force and authority fairly, efficiently and effectively;
  • using financial resources fairly, efficiently and effectively; and
  • quality services/customer satisfaction (Braga & Moore 2003a).

Braga and Moore (2003a) believe that it is important to measure performance in all of these dimensions. Ignoring one (or several) dimensions is at best a failure to fully appreciate the complexity of police work and at worst a contributor to the types of perverse behaviour noted earlier. This approach is, in effect, a particularly detailed balanced scorecard. It takes into account the multiple influences acting upon police forces and the way in which performance in one area can involve trade-offs in another. In the last few years, there has been a limited increase in attention to such multidimensional approaches to police performance and the inclusion of factors beyond crime rates (Carmona & Gronlund 2003; Cockcroft & Beattie 2009; Hughes, McLaughlin & Muncie 2001).
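
One way to picture such a multidimensional scheme is as a simple scorecard structure. In the sketch below, the seven dimensions are those listed by Braga and Moore (2003a), but the scores and the unweighted averaging are invented purely for illustration; any real scheme would derive its scales and weights analytically.

```python
# Braga & Moore's seven dimensions with invented illustrative scores (0-10).
scorecard = {
    "Reducing crime and criminal victimisation": 6,
    "Calling offenders to account": 7,
    "Reducing fear and enhancing personal security": 5,
    "Ensuring civility in public spaces": 6,
    "Using force and authority fairly": 8,
    "Using financial resources fairly": 7,
    "Quality services / customer satisfaction": 5,
}

# A single composite hides the trade-offs between dimensions, which is why
# the per-dimension view matters as much as any aggregate.
composite = sum(scorecard.values()) / len(scorecard)
for dimension, score in scorecard.items():
    print(f"{dimension}: {score}/10")
print(f"Composite (unweighted mean): {composite:.1f}/10")
```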

Others have suggested that one solution to the performance measurement problems noted above might be to first identify what works—also known as best practice—and then measure the degree of adherence to that best practice (Lum 2009; Mackenzie & Hamilton-Smith 2011). This might be seen as a quality compliance approach. While this would be exceptionally useful when first setting quality standards, as will be noted later, it is potentially troublesome as an overall solution due to its self-referential nature. It is also difficult to achieve, as Carter, Klein and Day (1992: 155) have noted, even in highly technical industries such as water management, where there is often clear scientific evidence (often lacking from the policing environment), ‘standard-setting is the result of a political process that has to weigh up what is both desirable and what is feasible’.

A standards-based approach might quickly ossify (Smith 1995a) or instead become detached from the outcomes it seeks to achieve if environmental factors change. If best practice is too prescriptive, then flexibility and innovation may also be harmed. A similar approach of measuring milestones against a particular plan is also valuable in part, but again cannot solve all problems due to its self-referential nature and the likelihood of measure fixation developing.

Performance in specialist policing

When one moves from the general to the specialist—to the field of technical and niche units, as noted earlier—the difficulties in measuring police performance become even greater. First, outcome measures—even simple measures such as crime rates—are usually irrelevant for technical units and difficult to assign to niche units. Even more than with generalist policing, specialist policing groups will have large co-dependencies with other agencies or parts of the organisation (Mackenzie & Hamilton-Smith 2011); what is the ‘outcome’ of a fingerprint identification? Or, indeed, what is the outcome achieved by any technical unit? And what of counter-terrorism—if the measure is ‘terrorist attacks’ and the result is zero, how do we identify whether this was due to police actions rather than simple inactivity by terrorist groups? While the absence of activity can reliably be assumed to be at least partly related to police activity when there is a large enough sample size (such as with crime rates; Vollaard 2006), when the sample size is a few incidents a year at most, the validity of assigning responsibility for any decrease (or increase) to police activities is more questionable.
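
The sample-size point can be made concrete with a simple probability calculation. The base rate below is an invented assumption: if serious incidents arrive at, say, two per year on average even with no police effect at all, a year with zero incidents is weak evidence of effective prevention.

```python
import math

# Hypothetical base rate: incidents per year absent any police effect.
base_rate = 2.0

# Under a Poisson model, the probability of observing zero incidents in a
# year even if police activity has no effect whatsoever:
p_zero_no_effect = math.exp(-base_rate)
print(f"P(zero incidents | no police effect) = {p_zero_no_effect:.3f}")
# ~0.135: roughly one year in seven would show a 'perfect' result by chance
# alone, so attributing a zero count to police activity is fragile.
```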

The situation is often made even more difficult for niche units due to overlapping responsibilities. For example, AMCOS has primary responsibility for Level 2 and Level 3 organised crime in Auckland (as defined in the British National Intelligence Model; see National Centre for Policing Excellence 2005), but there are also District Organised Crime Units and a national Organised and Financial Crime Agency of New Zealand operating in the same space, albeit theoretically against different targets. If there were a reliable outcome indicator for the level of organised crime in Auckland, how could the respective effects of the different groups be calculated? The situation is the same in the United Kingdom and Australia, and especially so in the United States, where a range of metropolitan, state and federal agencies may all target the same range of organised criminal groups. Any simple outcome measurement of efforts against organised crime could therefore, at best, demonstrate the range of players involved and the overall effect of those actions; it could not, however, clearly identify the respective influence of those players. Carter, Klein and Day (1992: 32) have stated in relation to performance measurement that:

...the greater the complexity, the greater also is the scope for interdependence. The greater the interdependence, the more difficult it is to assign the ownership of performance to individual actors or agencies within the organisation.

In simple terms, technical units contribute to outcomes—but it is difficult to identify by how much. Niche units contribute to outcomes to a greater extent, but those results can often be lost amidst a much larger picture. Given these difficulties, it is the perspective of the authors that a meaningful specialist policing performance measurement framework should focus primarily (but not solely) on outputs and activities, with these two elements often blending into each other. This has the advantage of validity, as police have much more control over outputs than they do over outcomes (Dadds & Scheide 2000). Outputs and activities also have an inherent value in themselves (Braga & Moore 2003a, 2003b) and while there are problems with an output/activity-focused scheme, these can be mitigated to a certain extent, as later sections will show.

At the same time, where outcome measures can be validly assigned to specialist policing activities, they should be incorporated as part of a balanced approach (Braga & Moore 2003a). However, this should only be done when the baseline of outputs and activities has been established; it is vital to develop the simpler elements of the framework before embarking on the more complex elements. This focus on outputs and activities is unavoidable given the current level of knowledge about the effect of police activities. It is therefore anticipated that in the future, given research initiatives like the Centre for AMCOS Lessons Learned, the framework can transition to one more focused on outcomes, but to do so now would be to put the conceptual cart before the horse.

There are further definitional conundrums to consider. For example, whether something is seen as an activity, output or outcome will depend very much on who is doing the perceiving. As Blumstein and others have noted, police activities can be seen both as ends in themselves and as contributors to other processes (Blumstein 1999; Braga & Moore 2003a). To use an earlier example, a fingerprint section will view the process of analysing a fingerprint as an activity (occurring within the section), the number of analyses completed as an output (a service provided to something external to the section) and a successful identification provided to an investigative unit and leading to an arrest as an outcome (altering the environment external to the section). From the perspective of the police as a whole, however, the arrest is an output at best and perhaps might be seen as an activity; for them, the outcome will be any change in the crime rate related to the crime type for which that person was arrested. For niche units, outcomes will likely be measured in terms of prosecutions.
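
This perspective-dependence can be captured as a simple lookup. The entries below paraphrase the fingerprint example from the text; the structure and labels are illustrative assumptions, not a general taxonomy.

```python
# How the same fingerprint-related events are classified depends on vantage.
classification = {
    "fingerprint section": {
        "analysing a print": "activity",
        "completed analyses": "output",
        "identification leading to an arrest": "outcome",
    },
    "police force as a whole": {
        "analysing a print": "activity",
        "completed analyses": "activity",
        "identification leading to an arrest": "output",
        "change in related crime rate": "outcome",
    },
}

for perspective, events in classification.items():
    print(perspective)
    for event, category in events.items():
        print(f"  {event}: {category}")
```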

The second great challenge in an output/activity-focused framework is ensuring that the measures chosen are meaningful. If they are not, then the negative elements of performance measurement—particularly measure fixation and gaming—will swiftly emerge. Usually, an output-based approach attaches quantity and quality elements to each output category. Timeliness, sometimes treated as separate, is better seen as a facet of both quantity and quality, given that any quantity is measured over time; the simple quantity of any output gives one indication of timeliness over a particular period. More specific elements, such as response within a particular period, can be incorporated into quality standards.
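
A minimal representation of this structure might attach quantity and quality fields to each output category, folding timeliness into both as suggested above. The type and field names here are hypothetical, invented only to show the shape of such a record.

```python
from dataclasses import dataclass

@dataclass
class OutputMeasure:
    """One output category with its quantity and quality elements.

    Timeliness is not a separate field: quantity is already counted per
    period, and response-time requirements sit inside the quality standard.
    """
    category: str
    quantity_per_period: int
    quality_standard: str  # reference to a detailed standards document

# Invented example in the spirit of the drug-operations illustration below.
measure = OutputMeasure(
    category="terminated drug operations",
    quantity_per_period=14,
    quality_standard="Drug Operations Quality Standards, section 3",
)
print(measure)
```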

It can be relatively easy to specify meaningful outputs, such as ‘the number of terminated drug operations per set standards’. Quality standards (perhaps focusing on best practice as noted earlier; Mackenzie & Hamilton-Smith 2011) can then rest in a separate document, where they can be as detailed as required without making the performance framework itself unwieldy. One potential approach to quality is a standards-based approach, similar to that used by the New Zealand Qualifications Authority: each level, such as ‘Excellence’ or ‘A’, has a specific list of defining characteristics, and the evaluator measures actual performance against those lists to identify the standard achieved. This might then lead to results such as seven A-Grade terminations, four B-Grade terminations and three C-Grade terminations.
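
A sketch of that grading logic follows, with an invented set of assessments: each termination is graded against the characteristics defining each level, and the results are tallied into the kind of statement given above.

```python
from collections import Counter

# Invented grade assessments for 14 terminated operations; in practice each
# grade would carry a specific list of defining characteristics.
grades = ["A", "A", "B", "A", "C", "B", "A",
          "B", "A", "C", "A", "B", "C", "A"]

tally = Counter(grades)
for grade in sorted(tally):
    print(f"{tally[grade]} {grade}-Grade terminations")
# Prints: 7 A-Grade, 4 B-Grade and 3 C-Grade terminations.
```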

A more sophisticated aspect of meaningfulness is identifying whether or not particular outputs are responsible for the achievement of outcomes (Flynn 1986; Jackson 1993; Smith 1995b). There are at least three aspects to this:

  • non-responsibility;
  • differential success; and
  • non-causality.

In short, non-responsibility involves a sub-component of a larger organisation delivering outputs that are necessary, but not sufficient, for the achievement of organisational outcomes, and which are intermediated through another sub-component before that outcome performance is achieved. The analogy of the ‘widget factory’ is useful. The factory produces the widgets (activity or output), but it is the sales staff who sell the widgets, gaining revenue (one outcome), and it is the overall structure of the company that determines profit (another outcome). One could easily have a situation where two companies produce widgets to the same standard and for the same cost, yet one is profitable and the other is not due to differences in the quality of sales staff. Therefore, while widgets are partly causal to outcomes, they are not solely responsible; holding the widget factory accountable for the overall performance of the company would be illogical (Flynn 1986). Similarly, in the delivery of particular policing services, such as secondhand dealer checks, a lack of follow-up by other units can lead to a failure to achieve the desired outcome—in this case, a decline in burglary rates.
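
A worked version of the widget analogy, with invented figures: two factories produce identical widgets at identical cost, yet company profit diverges entirely because of the sales function downstream.

```python
# Two hypothetical companies with identical factory performance.
widget_cost = 5.0
widgets_made = 10_000   # same output, to the same standard, for both
price = 8.0

# The only difference: the proportion of widgets the sales staff sell.
sales_rate = {"Company X": 0.95, "Company Y": 0.40}

for company, rate in sales_rate.items():
    revenue = widgets_made * rate * price
    cost = widgets_made * widget_cost
    print(f"{company}: profit = {revenue - cost:,.0f}")
# Company X profits (+26,000) and Company Y loses money (-18,000), yet both
# factories performed identically; blaming the factory would be illogical.
```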

Related to non-responsibility is the concept of differential success. This also involves the delivery of quality outputs coupled with a potential failure at the overall outcome level. The difference is that at least some outcome success is achieved, but only at a lower level; thus, there is differential performance. The Vietnam War is a prime example, where tactical victories (tactical outputs resulting in tactical outcomes) were not translated into strategic success (strategic outcomes) due to the absence of a coherent overall plan. This can be analogised to the police environment; terminated operations (tactical outputs) may affect drug availability in a particular locale (outcome success), but unless other groups also deliver quality services, the overall outcome goal will not be achieved. One way of partly overcoming the problems of non-responsibility and differential success is to integrate activities, outputs and outcomes into a single plan, so that all of the contributors to overall performance are properly linked (Bratton 1999; Collier 2006; Mackenzie & Hamilton-Smith 2011; Smith 1995a). This approach is sometimes known as program logic (Duignan 2012).
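
A program-logic chain can be represented as explicit links from activities through outputs to outcomes. The entries below form a hypothetical drug-enforcement illustration in the spirit of the examples above, not a prescribed model.

```python
# Hypothetical program-logic chain linking contributors to an outcome goal.
program_logic = [
    ("activity", "surveillance and intelligence gathering"),
    ("output", "terminated drug operations, per set standards"),
    ("intermediate outcome", "reduced drug availability in the locale"),
    ("strategic outcome", "reduced drug-related harm"),
]

# Making each link explicit shows which unit owns which step, helping to
# mitigate non-responsibility and differential success.
for level, element in program_logic:
    print(f"{level}: {element}")
```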

The third issue, non-causality, is perhaps the most important of all and rests on limited knowledge of the link between outputs and outcomes. If a particular output X has no causal link with outcome Y, no matter how well we perform X, we will never achieve Y. It is these outputs that must be avoided at all costs—while the problems of non-responsibility and differential success can be overcome through better processes, non-causality can never be overcome. While there has been a substantial amount of research done on the causal link between police activities and the environment, there is no clear picture and no equivalent of the military’s principles of war; we are still largely in the dark (Bradley 2005; Lum 2009; Stone & Travis 2011; Weisburd & Neyroud 2011). Police officers may assume that a particular activity, such as foot patrols, may lead to a particular outcome, but may lack the evidence to show that this is so. It may well be that such outputs are unproductive and merely take resources away from other, more beneficial activities.

In some situations, outputs that are causal to outcome success can be delivered poorly. In the widget factory analogy, a high fault rate in production is the responsibility of the output provider; for the police, that would be the specialist policing agency. This might occur when a fingerprint unit has poor laboratory standards, leading to very few fingerprints being identified. However, in many other situations, outcome failure is not the responsibility of the output provider, due to the issues of non-responsibility, differential success and non-causality noted above. Care must be taken to explore all of these issues before assigning blame for poor performance. Holding a manager accountable for an outcome when they do not control all of the elements contributing to that performance is illogical (Flynn 1986).

Overall, performance measurement for specialist policing is more difficult than for the police as a whole, primarily because specialist policing provides specific services or outputs and is not primarily responsible for the achievement of policing outcomes. Any measurement framework must reflect this and therefore focus on the output/activity level, while remaining cognisant of the links between those outputs/activities and the outcomes they are intended to achieve. It is also important to understand whether a seeming lack of correlation between the delivery of outputs and the achievement of outcome goals is due to those outputs not being causal, or rather to a shortcoming in overall strategy or structures.