Statistical Significance of Differences Between Measures of Outcome


Continued article from The Behavioral Measurement Letter, Vol. 3, No. 1, Fall 1995

On Differences

Florence S. Downs

The use of measurement is such a common practice that we give little thought to assigning a value to some attribute of an object. However, that value remains a fact until it is contrasted with values drawn from other sources. It is evaluation of contrasts that leads to concern about the adequacy of measures, since all researchers want instruments that are sufficiently accurate and robust to detect differences. However, determining the adequacy of those qualities is only one judgment that needs to be made. To be effective, measurement must be a seamless process that extends from instrument evaluation through interpretation of the results. Therefore, it is equally important to consider what the differences that are found between measures really indicate.

In the wake of cost containment, many new ideas have been developed about how the delivery of health care services can be streamlined and made more efficient. The pragmatics of evaluating these services underscore the essential place of measurement in maintaining the quality and viability of patient care. It is no longer sufficient to translate effectiveness into vague measures of staff or patient satisfaction. Hard evidence is needed to support the position that innovations in patient care make a real difference in quality of care, without increasing costs. This means that measures of outcomes must be carefully chosen and appraised for reliability and validity before use. It is also important that after they have been used, the differences that the measures show be evaluated for both clinical and theoretical significance.

The ordinary method for determining what differences in measurement are of consequence is to apply statistical tests to the data. When the tests show statistical significance, investigators are inclined to rejoice. When they fail to show significance, they feel disheartened. In either case, the clinical magnitude of the difference is frequently ignored. Numbers that indicate statistical significance are allowed to stand as the measure of effect. Clinical researchers who fail to examine their data as well as the significance level do so at their peril. Clearly, they have forgotten that measures have a meaning in the real world that should not be ignored.

A common source of discrepancies between statistical and clinical significance is the use of finely calibrated instruments to measure differences that must be relatively large to have meaning. Body temperature and blood pressure are common and readily available clinical measures that can serve as examples. Small differences in these values between samples are often statistically significant, even though they are of no practical importance. Therefore, investigators need to make a decision about how much difference between groups will be considered clinically worthwhile before beginning a study. Should the intervention be considered successful if there is one degree Fahrenheit difference between the groups? Would a smaller value be acceptable? If so, how much smaller?
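The gap between statistical and clinical significance can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions: the observed difference (0.2 degrees F), the standard deviation (0.7 degrees F), and the one-degree clinical threshold are hypothetical values chosen for the example, and a simple two-sample z-test with equal group sizes and a known common standard deviation stands in for whatever test a real study would use.

```python
import math

def two_sample_z(mean_diff, sd, n_per_group):
    """Return (z, two-sided p) for a difference in means between two
    equal-sized groups that share a known standard deviation."""
    se = sd * math.sqrt(2.0 / n_per_group)   # standard error of the difference
    z = mean_diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail probability
    return z, p

# All values below are hypothetical, chosen only for illustration.
MIN_CLINICAL_DIFF = 1.0   # degrees F judged clinically worthwhile (assumed)
observed_diff = 0.2       # observed difference between group means (assumed)
sd = 0.7                  # common standard deviation (assumed)

results = {}
for n in (20, 200, 2000):
    z, p = two_sample_z(observed_diff, sd, n)
    results[n] = p
    print(f"n per group = {n:5d}  z = {z:5.2f}  p = {p:.4g}  "
          f"statistically significant: {p < 0.05}  "
          f"clinically meaningful: {observed_diff >= MIN_CLINICAL_DIFF}")
```

The observed difference is the same in every row and never reaches the assumed clinical threshold, yet the p-value crosses 0.05 once the groups are large enough. Only the sample size has changed, not the size of the effect.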

In the case of blood pressure, would 5 mm Hg be enough to indicate a change in practice? To what extent does consideration need to be given to whether the difference is shown in systolic or diastolic pressure? These determinations are primarily a function of the investigator’s clinical understanding of the meaning of the measures and the boundaries of normal fluctuation. In such cases, clinical expertise cannot be separated from the interpretation of the results.

Once the nature of a reasonable difference has been decided upon, the sample size and the power of the test of significance need to reflect that decision. It is common knowledge that sample size can inflate or deflate statistical significance. Why so many investigators choose to ignore this truism is difficult to understand. But it remains one of the most frequent reasons for missing differences that exist or finding some that are without meaning.
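One way to let the chosen difference drive the sample size is the standard normal-approximation formula for comparing two means, n per group = 2 * ((z for alpha + z for power) * SD / difference)^2. The sketch below applies it as an illustration only: the 5 mm Hg difference echoes the question raised above, while the 12 mm Hg standard deviation, the 0.05 significance level, and the 80 percent power are assumptions made for the example, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(min_diff, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group needed to detect a difference of
    min_diff between two means (two-sided test, equal groups, common SD)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)              # quantile for desired power
    n = 2.0 * ((z_alpha + z_power) * sd / min_diff) ** 2
    return math.ceil(n)

# Hypothetical planning values: SD of 12 mm Hg is an assumption.
sd_mmhg = 12.0
n_for_5 = n_per_group(min_diff=5.0, sd=sd_mmhg)
n_for_1 = n_per_group(min_diff=1.0, sd=sd_mmhg)
print(f"Detect 5 mm Hg: about {n_for_5} per group")
print(f"Detect 1 mm Hg: about {n_for_1} per group")
```

Under these assumptions, a sample sized for a 5 mm Hg difference is far smaller than one sized for a 1 mm Hg difference. A study much larger than the first figure will tend to flag differences too small to matter, and a study much smaller will tend to miss differences that do.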

In many ways, the clinical significance of changes in physiological measures is easier to determine than that of many measures of psychological change. Frequently, there are no published norms to guide the interpretation of variations. How much change is indicative of real clinical differences is rarely discussed. Researchers are often overawed by small differences in variables such as anxiety, depression, or attitudes that occur over the time of an intervention. This is especially true if the values are in the desired direction or appear to have a pattern.

Commonly, samples show low values on measures at the beginning of the study. This matter often goes unnoticed. I recall one study in which depression was compared between mothers of premature infants and mothers of full-term infants over a three-month period. Although the scale used was a popular measure of depression, there were no published norms. Values for both groups were clustered at the low end of the scale. However, there was a minimal increase in depression over time among mothers of prematures and a corresponding decrease in depression among mothers of full-term infants. This created a statistically significant difference between the groups. Despite the fact that neither group could have been considered depressed, remedial intervention was recommended for the mothers of prematures. Clearly, this is a distortion of the data and represents a naïve approach to measurement. However, it is not an uncommon occurrence and is cited to underscore the critical need to interpret data with care.

If clinical measures are to be used as the means for judging the need for or the outcomes of interventions, the meaning of differences cannot be ignored. It is not merely a matter of financial considerations, although cost is increasingly used as a measure of effectiveness. When changes are made in clinical practice, we want them to be those that are most likely to be of benefit to the patients for whom we care. Therefore, it is important to demand evidence that is unambiguous and unbiased by slavish worship of the statistical significance level.

Florence S. Downs, EdD, FAAN, has taught research design and methods to doctoral students in nursing for over 25 years. She has, in addition, edited Nursing Research since 1979. A former associate dean of graduate studies at the University of Pennsylvania School of Nursing, Dr. Downs is currently a Visiting Scholar and Interim Dean for Graduate Studies at the School of Nursing, University of Maryland at Baltimore. In recognition of her scholarly contributions to nursing, Dr. Downs has received the 1995 Martha E. Rogers Award from the National League for Nursing.


