Shifting Incentives: Moving Reimbursement from Volume to Value



INTRODUCTION





Ms Grace Chen knows to avoid the perfume section in the department store. At 53 years old, she has lived with asthma her entire life. Perfumes and other “triggers” can suddenly cause her airways to spasm, sending her into a fit of wheezing. Today, she is not entirely sure what set off her symptoms, but she could feel her chest tightening as it became more and more difficult to catch her breath, a sensation that she has experienced many times before. She reached into her purse to take out her inhaler and took a few puffs. She still felt like she was trying to breathe through a snorkel to get the air down to her lungs. Realizing that she might need help, she asked her son to drive her to a nearby urgent care clinic.



At the urgent care clinic, Ms Chen is evaluated by a physician, given a breathing treatment, and undergoes an electrocardiogram (EKG). Following the breathing treatment, she continues to have significant wheezing and shortness of breath, so the urgent care clinic physician coordinates for an ambulance to take her to an emergency room across town. Ms Chen has had to visit the ER for her asthma before, but it has been a number of years since the last episode that was this bad.



In the ER, she is promptly placed in a room and evaluated by an emergency medicine physician. Ms Chen undergoes further breathing treatments. A chest x-ray is taken, blood is drawn for labs, and another EKG is done. Her labs are unremarkable, her chest x-ray is clear, and her EKG remains normal. The physician then decides to obtain a chest CT (computed tomography) scan “just to be sure nothing was missed.” The CT scan does not reveal any significant abnormalities. Following more breathing treatments and intravenous Solu-Medrol (a steroid), she improves. She is ultimately discharged home with self-care instructions, including directions for using her home inhalers and a prescription for oral steroids.



In Ms Chen’s mind, this entire shortness-of-breath experience was a single event, caused by a single disease: an asthma exacerbation likely triggered by an environmental allergen. However, the urgent care clinic will bill separate fees for the physician evaluation, the breathing treatment, and the EKG. The hospital will send another bill that charges separate fees for the physician evaluation, the chest x-ray, the chest CT, the intravenous steroids, the lab work, the EKG, the breathing treatments, and the radiologist’s interpretation of the imaging studies. Ms Chen will be left to navigate the complex system of healthcare costs herself.



Medicine is a noble profession, built on the altruistic motivations of caretakers. But it is hard to ignore the fact that perverse incentives that require clinicians to “do more” to get paid will predictably result in more medical care. As in Ms Chen’s case, treatments and procedures are paid for a la carte, whether or not they are actually necessary or help the patient. Ms Chen very likely did not warrant a chest CT scan for her asthma exacerbation, but the hospital will be paid for it anyway. Even if the physician’s motivation to order the study had absolutely nothing to do with making more money for himself or his medical center, there were certainly no incentives for him not to order that test or, more broadly, to consider the value of the care he was delivering. The Institute of Medicine (IOM) recognized this problem in its seminal “Crossing the Quality Chasm” report: “Even among health professionals motivated to provide the best care possible, the structure of the payment incentives may not facilitate the actions needed to systemically improve the quality of care, and may even prevent such actions.”1



“We already have pay for performance,” UCSF healthcare leader Dr Robert M. Wachter has quipped. “We pay more for the performance of procedures, hospitalizations, and office visits, and so that’s precisely what medicine produces.”2 In this chapter we discuss how different payment systems can help incentivize a necessary shift from a healthcare system that is reimbursed for volume (seeing more patients and doing more tests) to a system that is reimbursed for value (making patients better off) (Table 15-1).




Table 15-1. Payment models for primary care services

 






FEE-FOR-SERVICE: PAYING FOR WHAT YOU GET, OR GETTING WHAT YOU PAY FOR





Fee-for-service (FFS) describes a payment structure in which each healthcare service is billed and paid for separately. The physician or hospital is paid for each office visit, hospitalization, intravenous medication, x-ray, EKG, or other service delivered. Under traditional FFS, individual physicians and healthcare systems are financially rewarded for providing more care, even when this care has no demonstrated benefit. This “you eat what you treat” system may create a perverse incentive to provide patients with unnecessary care.



The FFS system may have some virtues. As Harvard Medical School economist Michael Chernew put it, “[FFS] rewards hard work and productivity and incentivizes physicians not to stint on care. It avoids placing physicians at financial risk if they care for sick patients and facilitates financing systems that allow patients unconstrained choice of provider (eg, physicians, allied health professionals, and hospitals).”3 Of course, the flip side of incentivizing physicians “not to stint on care” is that it can encourage the overuse of care, particularly without checks and balances to counteract this impulse. Erring on the side of overdoing it has become so ingrained in clinical practice that judiciously ordering tests and referrals is often seen as more cognitively taxing for clinicians. Moreover, there is a common perception that such restraint may increase the risk of malpractice repercussions (see Chapter 10). Since FFS pays each clinician and/or health system separately, it also indirectly enables the type of fragmented and duplicative care that Ms Chen experiences. FFS alone provides no financial motivation to coordinate care or to avoid unnecessary referrals.



In addition to potentially incentivizing too much care, the actual amounts that are paid out under most American FFS systems can be highly arbitrary. Prior to 1989, reimbursements paid to clinicians for the same service varied tremendously by specialty and geographic region based on what was considered “usual, customary, and reasonable” for the local market.4 A first step toward more value-based payments was standardizing the process for determining these amounts. The relative value unit (RVU) has now become the standard of FFS productivity and is used by many health systems to determine physician compensation.5 As we discussed in Chapter 9, FFS payments and the RVU weighting system disadvantage the more cerebral parts of care delivery, such as prevention and diagnosis, compared with more procedure-based practices. As a result, FFS payment systems are often blamed as the engine driving the “hamster on a treadmill” phenomenon (running faster just to stand still) that is lamented by many primary care providers.6






PAY FOR PERFORMANCE: A CARROT USED AS A STICK?





In a classic psychology experiment in the 1970s, researchers recruited a group of preschoolers who liked to draw.7 Some of the children were told that if they drew pictures for the study, they would be given a certificate with a gold seal and ribbon. The other children were not given any such expectation. Each child was invited to a separate room to draw for 6 minutes. Afterwards, the children who were told they would be given a reward were presented their certificate as promised. Over the next few days, the preschoolers were surreptitiously watched from behind one-way mirrors to see how much they would continue drawing of their own accord. The group of children who had expected and received a reward spent about half as much time spontaneously drawing as the other kids. Perhaps even more surprising, judges who did not know who drew which pictures independently rated the art drawn by children expecting a reward as less aesthetically pleasing. Once children expected a reward for drawing, it seems that they felt less interested in doing it “for free” and they put less effort into it.



As demonstrated by this example, the interplay of intrinsic motivation (motivation that arises inside the individual) and extrinsic motivation (motivation that arises from outside of the individual) is complicated. If a student is driven to get good grades because he feels fulfilled when he turns in good work, then he seems to be responding to intrinsic motivation, even if the grades themselves represent a reinforcing extrinsic motivator. On the other hand, if he achieves good grades primarily because his parents demand that he does well and that he is accepted to a particular university, then this would be in response to extrinsic motivation.



Promising premise and early results



The idea that health services should no longer be simply paid based on quantity, but rather should focus on quality, is logical and attractive. Taken on the surface level, “pay for performance” (P4P) is a no-brainer. P4P is built on the simple concept of providing clinicians or health systems more money for hitting specified targets, such as achieving a certain percentage of diabetic patients in a practice that have good blood sugar control. Sounds good, right? Well, as with most things, the closer one looks, the more complicated P4P actually becomes. In other words, the devil may be in the details.8



Studies on P4P thus far have shown inconsistent results.9 In one notably positive study, P4P hospitals in Medicare’s Premier Hospital Quality Incentive Demonstration (HQID) had greater improvement over the first 2 years in multiple metrics of quality, including measures for heart failure, acute myocardial infarction, and pneumonia, compared with their peers.10 But over time these gains diminished and may have completely disappeared.9,11 And, frustratingly, risk-adjusted mortality (an important bottom-line measure of performance) remained similar between hospitals participating in the program and those that did not over the course of the first 6 years.12 In 2011, the Cochrane group published two systematic reviews that failed to show convincing evidence that financial incentives can improve patient outcomes.13,14 A separate systematic review published in Annals of Internal Medicine around the same time concluded, “the effect of P4P targeting individual practitioners on quality of care and outcomes remains largely uncertain.”15 Even if P4P may have the power to change healthcare professional behaviors,13 many of the most visible programs implemented thus far have not translated into what we really care about: better health outcomes for our patients.



Perhaps if we look across the pond we will find more encouraging results. After all, the United Kingdom introduced the Quality and Outcomes Framework, a P4P program on a grand scale, more than a decade ago, back in 2004. This program tied about a quarter of family practitioners’ income to measures of their performance.16 With incentives that large, as you may imagine, “much changed overnight” in medical clinics in the United Kingdom.16 Family practitioners increased nurse staffing and created much more proactive programs for chronic disease, such as nurse-run, protocol-driven clinics for diseases like diabetes. Many hired more administrative staff to provide rapid access to performance data. They also started using full electronic medical records, since these were required for payments to be made. The pace of improvement in the quality of care for asthma and diabetes picked up considerably.17 It seemed that the program had kick-started the engine of improvement and may have created an inflection point in the quality of care. Then, by 2007, the rate of improvement had slowed and started to level off. Even more concerning, aspects of care that were not associated with an incentive actually declined in quality for patients.17



In one of the most encouraging study results for proponents of P4P, the introduction of a P4P program in 24 English hospitals seemed to result in improvements in risk-adjusted 30-day mortality for pneumonia, heart attacks, and heart failure over the first 18 months.18 This was the ultimate feather in the cap of P4P, as it seemed that if implemented correctly incentives might just actually save lives after all. However, yet again, these benefits evaporated over time. During the following 2 years, the hospitals not participating in the program caught up to P4P hospitals, resulting in no measurable mortality difference any more between these groups.19



Well, if incentives or “bonuses” may not reliably work, how about imposing penalties for poor outcomes? Would the natural instinct of loss aversion be more motivating to clinicians or health systems? Maybe … maybe not.



Back in 2008, Medicare introduced a policy that reduced payments for hospital-acquired infections.20 Some states responded to this no-pay-for-poor-performance measure aggressively and slashed hospital-acquired infection rates.21 But a 2012 study published in the New England Journal of Medicine found that overall the policy had no measurable effect on the rates of central catheter-associated bloodstream infections and catheter-associated urinary tract infections across US hospitals.22



Taken together, it appears that P4P programs are helpful in catalyzing, but not sustaining, meaningful change. It is clear that at the very least P4P is not a “magic bullet” and if it is to improve care, payment reforms must be paired with sustainable reforms in care delivery.16 Moreover, even if we believe there should be some incentives for quality, determining appropriate metrics and fair schema is not straightforward.



Concerns and challenges



Is it a motivation problem, in the first place?


“The quality improvement literature has pinpointed many causes of quality breaches in medical care: fatigue; poorly designed workflow and care systems; undue commercial influence; knowledge gaps; memory lapses; reliance on inappropriate heuristics; poor interpersonal skills and insufficient teamwork, to name just a few,” wrote health policy expert Dr Steffie Woolhandler and behavioral economist Dan Ariely.23 “But ‘not trying’ is rarely cited. Yet P4P implicitly blames lack of motivation for poor quality care.”



The current healthcare system is built on the backs of hardworking, well-intentioned health professionals. It does seem hard to argue that dangling some more dollars on a string would somehow get them to try even harder. But much as was seen following the introduction of the UK program, it is possible that the potential power of P4P is not in trying to drive personal motivation, but rather in fomenting delivery system changes and innovations (such as the implementation of electronic health records and the creation of nurse-led chronic disease clinics). This may be even truer at the hospital or health system level.



Could P4P undermine intrinsic motivation and result in negative effects?


Tangible rewards, particularly monetary ones, seem to weaken, or “crowd-out,” intrinsic motivation.24 There are many examples from the emerging behavioral economics literature. Remember how a shiny certificate had the power to possibly squash preschoolers’ inherent joy of drawing? Another frequently cited example of motivational crowd-out evaluated monetary payments for blood donors. When money was offered, blood donations actually decreased, leading many to argue that by introducing financial rewards, the stronger altruistic motivations of the donors were being replaced by the weaker monetary one.24,25 The intuition here is that many people donate blood because it is socially recognized as “the right thing to do.” Fewer people decide to donate blood because it is a desirable way to make money.



Some physicians fear that measuring and rewarding physicians may similarly sap motivation.26,27 Still, not all rewards are bad all the time, particularly when there are multiple competing motivations. “For instance, when the Italian government gave blood donors paid time off work, donations increased. The law removed an obstacle to altruism.”24



So, it seems the salient questions are whether the volume-based payment system is currently “an obstacle to altruism” and, if so, whether P4P helps remove this barrier.



What is the correct amount of money in P4P bonuses and penalties?


The United Kingdom tied up to a quarter of potential income to performance in its initial schema, but most P4P programs in the United States have used much smaller incentives. One of the leading concerns has been that large financial incentives would induce efforts to “game the system.” Even without overt misplay, substantial incentives could cause harm by creating undue focus on the limited areas of clinical practice that are being measured and incentivized. Due to these concerns, a change in the income structure that would reduce the percentage tied to P4P in the United Kingdom “has been widely welcomed.”16 On the other side of the spectrum, incentives that are too small may not even garner physicians’ attention. In a national survey performed in 2008, one in six physicians did not even know whether P4P was incorporated in their compensation.28 Of course, it is hard for financial incentives to promote change if physicians are not even aware that the incentives exist or of how they get paid. The challenge is finding an incentive that is large enough to prompt change, but not so lucrative as to create a singular focus.



The other issue is whether to pay for meeting a set standard goal or for relative improvement. In one early natural experiment, physician groups whose performance was initially lowest improved the most following the introduction of P4P, whereas physician groups that were already performing above the bonus threshold at baseline improved the least. Yet those that were above the bonus threshold at baseline captured three-quarters of the bonus payments.29 Therefore, the way the incentive is structured is important, since it could potentially lead to the greatest improvers not being rewarded, while also possibly allowing good performers to rest on their laurels rather than create further improvements. Moreover, if the lowest performers, who have the greatest potential for clinical benefits from improvements, are penalized for not hitting a specific target despite making incremental changes, then it could further undermine their ability to improve.
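To make the structural point concrete, here is a minimal sketch of how the same bonus pool might be divided under a fixed-threshold rule versus an improvement-based rule. Everything in it is hypothetical: the group names, the scores (percentage of patients meeting a quality target), the 75% threshold, and the $90,000 pool are invented for illustration and are not drawn from the study cited above or from any actual P4P program.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    baseline: int   # hypothetical % of patients meeting the target before P4P
    followup: int   # hypothetical % meeting the target after P4P


def threshold_bonus(groups, threshold, pool):
    """Split the pool equally among groups at or above a fixed threshold."""
    winners = [g.name for g in groups if g.followup >= threshold]
    share = pool / len(winners) if winners else 0.0
    return {g.name: (share if g.name in winners else 0.0) for g in groups}


def improvement_bonus(groups, pool):
    """Split the pool in proportion to each group's gain over its own baseline."""
    gains = {g.name: max(g.followup - g.baseline, 0) for g in groups}
    total = sum(gains.values())
    return {name: (pool * gain / total if total else 0.0)
            for name, gain in gains.items()}


# Invented example data: the lowest performer improves the most.
groups = [
    Group("high_baseline", baseline=80, followup=82),
    Group("middle_baseline", baseline=65, followup=72),
    Group("low_baseline", baseline=40, followup=55),
]

by_threshold = threshold_bonus(groups, threshold=75, pool=90_000)
by_improvement = improvement_bonus(groups, pool=90_000)

for g in groups:
    print(g.name, by_threshold[g.name], by_improvement[g.name])
```

Under the threshold rule in this toy example, the entire pool goes to the group that was already above the bar and improved the least. Under the improvement rule, most of the pool goes to the group that started lowest. Neither rule is presented here as the right design; the sketch only shows how strongly the choice of rule determines who is rewarded.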



Do we even know how to accurately measure quality or value?


Not everything important can be measured and not everything measured is important. Large parts of clinical practice just cannot be accurately computed or benchmarked, leaving quality metrics to be made up of only those things that can be rather easily quantified. Even more concerning, there is some evidence that quality may deteriorate for non-incentivized measures like continuity of care.30 Thus, rewarding a narrow set of indicators could in fact decrease overall global quality. This may be “the medical equivalent of teaching to the test.”31 As we begin to focus on value, the problem becomes even more acute. Remember from Chapters 4 and 10 that there are numerous complications with defining and measuring value.



In addition, there are serious concerns about the validity of the current data and the problem of accurate attribution to specific practices. Based on a study analyzing nearly 1.8 million Medicare claims from 2000 through 2002, patients saw a median of two primary care physicians and five specialists working in four different practices.32 Only about a third of the visits each year were with a patient’s assigned physician, and a third of patients changed their assigned physician from one year to another. The unfortunate reality is that “the Centers for Medicare and Medicaid Services (CMS), despite heroic efforts, cannot accurately measure any physician’s overall value, now or in the foreseeable future.”33



What are the potential “side effects” of P4P?


Even if P4P is relatively new in the world of medicine, it is not a new concept. Perhaps the first real experiment with P4P was in the mid-1800s in England. British schools and teachers were paid on the basis of the results of student examinations, with the goal of improving educational outcomes. What happened, though, is that curricula narrowed to focus only on the tests, and teachers quickly figured out that the way to get students to perform well was via rote memorization of test specifics. Soon enough, “testing bureaucracy had burgeoned, cheating and cramming flourished, and public opposition had grown dramatically.”34 In the United States, attempts at performance-based pay for schools in the 1960s similarly resulted in cheating scandals and failures.34 These days, merit pay remains a very controversial topic in education.
