Any individual or team that is building something, providing services, or performing medical treatments needs to know how they are doing. But how do they know if they are doing things correctly, efficiently, and consistently? Are they improving as time goes by? Can they detect when something goes wrong?
So, let’s say they are producing something; we will call them Snazzlequacks. In all probability, Snazzlequacks vary somewhat when they roll off the production line. What does a typical Snazzlequack look like on any given day?
![What we believe a Snazzlequack looks like](https://static.wixstatic.com/media/9ae0b1_f0427e55762d4f2a93118d144beb6a48~mv2.jpg/v1/fill/w_512,h_512,al_c,q_80,enc_auto/9ae0b1_f0427e55762d4f2a93118d144beb6a48~mv2.jpg)
Are some slightly better or slightly worse? What variability can you accept in your Snazzlequacks before your clients go somewhere else? If a particular Snazzlequack is a disaster, you will have no doubt, but with smaller flaws, how do you distinguish between an acceptable flaw and an unacceptable one?
For this, we use performance indicators (PIs), which are measures of success. You of course need to explicitly define what ‘success’ is, and then determine how to measure it. For starters, the measure should ideally be specific (clearly defined) and quantifiable (measured with a number).
Anything you wish to conclude from numbers will only be solid if the number measured is close to the true value of what you are trying to measure. If you throw a dart aiming at the center of a board (true value) and hit the bullseye, your throw is accurate. The farther away your dart is from the center, the less accurate your throw. You want your numbers to be as accurate as possible.
However, measurements are never exact and variability is a fact of life, so we make repeated measurements and use them to estimate the true value of what we are measuring. If you throw 3 darts and all three hit the board in the same spot, your throws are very precise; the more dispersed your hits, the less precise your throwing… You want your numbers to cluster as tightly as possible around the same number.
Both accuracy (how close your measurement is to the real value) and precision (how close to each other your measurements are) are important!
Of course, in any complex procedure, like our beloved assisted reproduction, there can be many PIs, and organizations in a given activity vary in what they measure, but there is normally consensus on a smaller set of PIs that everyone in the field agrees on and that is used to compare performance between organizations. Unoriginally enough, we call them key performance indicators (KPIs).
In the field of assisted reproduction, KPIs come mainly in two sets: Clinical Practice KPIs and Laboratory Performance KPIs.
For example, ESHRE has established the following 6 KPIs for Clinical Practice:
- Cycle cancellation rate (before oocyte pick-up)
- Rate of cycles with moderate/severe ovarian hyperstimulation syndrome (OHSS)
- Proportion of mature oocytes (MII) at ICSI
- Complication rate after oocyte pick-up
- Clinical pregnancy rate
- Multiple pregnancy rate
Let’s focus on the proportion of mature oocytes (MII) at ICSI. Here we measure %MII, defined as: (number of MII oocytes at ICSI / number of cumulus-oocyte complexes retrieved) × 100. Straightforward enough!
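As a quick illustration, here is a minimal Python sketch of that calculation; the counts are made up for the example:

```python
def percent_mii(mii_count: int, coc_count: int) -> float:
    """%MII = (number of MII oocytes at ICSI / number of COCs retrieved) x 100."""
    if coc_count == 0:
        raise ValueError("No cumulus-oocyte complexes retrieved")
    return 100.0 * mii_count / coc_count

# Hypothetical retrieval day: 92 COCs retrieved, 71 of them mature (MII) at ICSI
print(f"%MII = {percent_mii(71, 92):.1f}%")  # -> %MII = 77.2%
```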
Based on data obtained from clinics across Europe and the consensus opinion of industry leaders, ESHRE has defined two reference values: the Competency Value (CV, the minimum value a clinic must score to be considered ‘competent’) and the Benchmark Value (BV, the value obtained by the best-performing clinics, to which all clinics should aspire). For %MII, the CV is 74% and the BV is a range of 75-90%.
A clinic will measure this KPI for every cycle it initiates and will typically determine %MII on a regular basis, say monthly.
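As a rough sketch, this is one way a lab might flag a monthly %MII value against those reference values. The thresholds are the ESHRE figures above, while the function name and the exact handling of the boundaries are assumptions made for illustration:

```python
def classify_percent_mii(pct_mii: float) -> str:
    """Flag a monthly %MII value against the ESHRE reference values (CV = 74%, BV = 75-90%)."""
    if pct_mii < 74.0:
        return "below competency value"   # investigation warranted
    elif pct_mii < 75.0:
        return "meets competency value"   # acceptable, but below the benchmark range
    elif pct_mii <= 90.0:
        return "within benchmark range"   # where the best-performing clinics sit
    else:
        return "above benchmark range"    # unusually high; worth double-checking the counts

print(classify_percent_mii(78.0))  # -> within benchmark range
```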
Imagine a new clinic starts their %MII measurements. In the first and second months they process roughly 1,000 cumulus-oocyte complexes per month and get %MII values of 74% and 78%, respectively. The Lab Director is relieved: while there is room for improvement, the CV is met. However, he wonders… are these values different due to random variation, or was something done better during the second month? His team is divided: some think a difference of 4 percentage points is a lot, some think it is not…
Even if a clinic measures its KPIs accurately and precisely, too often the subsequent analysis is subjective. In this case (in fact, in every case), the necessary question is: ‘what is the appropriate statistical test to use in order to evaluate whether the measured values differ or not?’
In this case, given that %MII is essentially a proportion (number of MII oocytes / number of COCs) and the relatively large number of cumulus-oocyte complexes processed, you can use a two-proportion Z-test, which compares the proportion of month 1 against that of month 2 (a worked sketch follows the steps below).
Formally, you follow these steps:
1. Calculate the proportions.
2. Formulate the hypotheses: null hypothesis (the proportions are equal) vs. alternative hypothesis (the proportions are different).
3. Compute the Z-test statistic.
4. Determine the p-value.
5. Interpret the results:
   - If the p-value < 0.05, reject the null hypothesis: you do not accept that there is no difference between the values (i.e., you accept the alternative hypothesis that there is indeed a difference between the values).
   - If the p-value ≥ 0.05, do not reject the null hypothesis (i.e., you cannot conclude that the values are statistically different).
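Below is a minimal Python sketch of these steps. It is one way to run a pooled two-proportion Z-test, not necessarily how any particular clinic implements it, and it assumes roughly 1,000 cumulus-oocyte complexes per month, the counts behind the figures quoted next:

```python
import math

def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-proportion Z-test. Returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2                   # step 1: the two proportions
    p_pool = (x1 + x2) / (n1 + n2)              # pooled proportion under H0: p1 == p2
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se                          # step 3: Z-test statistic
    p_value = math.erfc(abs(z) / math.sqrt(2))  # step 4: two-sided p-value from the normal distribution
    return z, p_value

# Hypothetical monthly counts: ~1,000 COCs per month, with 74% and 78% mature at ICSI
z, p = two_proportion_z_test(740, 1000, 780, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")  # -> z = -2.09, p = 0.036
print("statistically significant" if p < 0.05 else "not statistically significant")  # step 5
```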
For our case, with proportions of 74% and 78% on roughly 1,000 cumulus-oocyte complexes each month, the test gives a Z-statistic of −2.09 and a p-value of 0.036 (< 0.05), leading us to conclude that the difference between 74% and 78% is statistically significant.
Now (and only now) we know…