Medical Outcome Short Form (36) Health Survey

Questionnaire Availability

Questionnaire is available here in html format.

A PDF version of the SF-36 is also available here, for ease of administration (it fits on 2 pages!).

Devised by: John E. Ware Jr. PhD – QualityMetric, Inc. Address: 640 George Washington Hwy, Suite 201, RI. Phone 401 334 8800, x242, Fax: 401 334 8801, email: jware@qmetric.com. Internet site www.sf36.com.

Research supported by:

The Henry J. Kaiser Family Foundation, Menlo Park, CA (Grant no. 85-6515) granted to the Health Institute, New England Medical Center.

Co-copyright Holders:

Medical Outcomes Trust (MOT), Health Assessment Laboratories (HAL) and QualityMetric Incorporated.

Licensing:

A fee for use of the SF-36™ is paid by companies and organisations that will profit from the use of the instrument. This allows the MOT, HAL and QualityMetric Inc “to make surveys available royalty free to individuals and organisations for academic research”.

The SF-36™ is distributed by MOS Trust Inc, and “strict adherence to item wording and scoring recommendations is required in order to use the SF-36 trademark” (www.sf-36.com).

Type of Instrument

The SF-36™ is a short-form measure of generic health status in the general population. It is designed for self-administration. Alternatively, a trained interviewer can use a standardized script for face-to-face and telephone interviews. The SF-36™ takes 5–10 minutes for a respondent to complete and can be administered to anyone over the age of 14. www.qualitymetric.com/innohome/insf36.shtml

From the 36 items, eight health profiles are derived from summarised scores. All dimensions are independent of each other. A comprehensive manual and interpretation guide is available from the author (Ware, 1993).

Designed to be used in

Clinical Practice – screening individual patients
Research – differentiating health benefits produced by different treatments
Health Policy Evaluations – comparing the burden of different diseases
Monitoring specific and general populations

www.qualitymetric.com/innohome/insf36.shtml.

Translations

The SF-36 has been translated and adapted in 29 countries. It has been translated into languages including: English, Spanish, French, Swedish, Korean, German, Dutch, Portuguese, Chinese, Czech, Finnish, Danish, Hungarian, Hebrew, Italian, Japanese, Norwegian, Polish, Romanian, Slovak, Russian, Afrikaans. Furthermore, the SF-36 has been replicated across 24 different patient groups from various socio-economic situations and diagnoses (Ware, 1996). From an Australian, and particularly a Melbourne, perspective this instrument has great utility in a multicultural city providing predominantly Anglo-Saxon services.

Reliability

Split half/ Cronbach’s alpha

One study found that the SF-36 yielded a high response rate (83%) and completion rate (95%) (Brazier, Harper, Jones, O’Cathain, Thomas, et al, 1992). A study of 19,785 respondents over 18 years of age yielded 80.6% full completion of the SF-36 and 14.6% partial completion (Australian Bureau of Statistics (ABS), 1995). For Cronbach’s alpha, a value of 0.5 is considered acceptable, Nunnally recommends 0.7 and above, and well-used tests should give 0.8 and above (Jenkinson et al, 1993, p.4). Cronbach’s alpha exceeds 0.8 on all scales of the SF-36 except Social Functioning (α = 0.76) (Jenkinson et al, 1993). Similar findings have been reported by Brazier et al (1992). In the case of the Social Functioning dimension the results are considered acceptable due to the small number of items (2 items using a 5-point scale; Jenkinson et al, 1993). The Physical Functioning dimension has consistently exceeded 0.90 (Ware, 1993).
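As a rough illustration of how alpha values such as those quoted above are obtained, the sketch below computes Cronbach’s alpha from item-level responses using the standard formula: alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score), where k is the number of items. The function name `cronbach_alpha` and the response data are hypothetical and are not taken from any of the cited studies.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one scale.

    responses: list of respondents, each a list of k item scores
    (k >= 2, no missing values).
    """
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance(item) for item in zip(*responses)]
    # Sample variance of each respondent's summed scale score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: five respondents answering a two-item, 5-point scale.
example_responses = [[1, 2], [2, 2], [3, 4], [4, 4], [5, 5]]
alpha = cronbach_alpha(example_responses)
print(f"alpha = {alpha:.2f}")
```

With only two items, as in the Social Functioning scale, alpha depends entirely on how strongly the two items correlate, which is one reason a lower value is tolerated for short scales.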

The 8 Health Profiles including number of items, Cronbach’s alpha and item internal consistency.

Scale	Number of items	Definition of scale	Internal consistency reliability (Cronbach’s alpha)	Range of item internal consistency
Physical Functioning (PF)	10 items	Limitations in physical activity because of health problems	α = 0.93	0.64–0.83
Social Functioning (SF)	2 items	Limitations in social activities because of physical or emotional problems	α = 0.90	0.39–0.56
Role limitations – physical (RP)	4 items	Limitations in usual role activities because of physical health problems	α = 0.82	0.86–0.89
Bodily pain (BP)	2 items	Presence of pain and limitations due to pain	α = 0.95	0.26–0.56
General medical health (GH)	5 items	Self-evaluation of personal health	α = 0.82	0.65–0.83
Mental health (MH)	5 items	Psychological distress and well-being	α = 0.80	0.62–0.77
Role limitations – emotional (RE)	3 items	Limitations in usual role activities because of emotional problems	α = 0.83	0.83–0.77
Vitality (VT)	4 items	Energy and fatigue	α = 0.82	0.77–0.80
General Health perceptions	Single item
Source: Scott, Tobias, Sarfati & Haslett (1999).

Additionally, two composite summary scores measure physical health and mental health (Ware & Sherbourne, 1992). Reliability estimates for the composite physical and mental summary scores usually exceed 0.90.

Test/re-test reliability

Brazier, Harper and Jones (1992) used the Bland and Altman technique.

‘The differences [in scores] are plotted, an overall mean and variance of differences calculated, and 95% confidence intervals constructed around the mean by assuming a normal distribution. The test and retest scores are assumed to be from the same distribution when the differences have a mean of zero and 95% of the differences lie within the 95% confidence limits’ (Brazier, Harper & Jones, 1992, p161).

The researchers found for all dimensions 91-98% of cases lay within the 95% confidence interval. The maximum mean difference in dimension scores was 0.80.
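The procedure quoted above can be sketched as follows. This is a minimal illustration, not the authors’ analysis: the function name `bland_altman` and the paired scores are hypothetical, and the interval is computed as the mean difference ± 1.96 standard deviations of the differences (usually called the limits of agreement in Bland and Altman’s terminology).

```python
from statistics import mean, stdev


def bland_altman(test_scores, retest_scores):
    """Return the mean difference, the 95% limits and the share of cases inside them."""
    differences = [t - r for t, r in zip(test_scores, retest_scores)]
    mean_diff = mean(differences)
    sd_diff = stdev(differences)  # sample standard deviation of the differences
    # 95% limits around the mean difference, assuming normally distributed differences.
    lower = mean_diff - 1.96 * sd_diff
    upper = mean_diff + 1.96 * sd_diff
    inside = sum(lower <= d <= upper for d in differences)
    return mean_diff, (lower, upper), inside / len(differences)


# Hypothetical dimension scores (0-100) for five respondents at test and retest.
test_scores = [50, 60, 70, 80, 90]
retest_scores = [52, 58, 71, 79, 90]
mean_diff, limits, share_inside = bland_altman(test_scores, retest_scores)
print(mean_diff, limits, share_inside)
```

A mean difference close to zero, with roughly 95% of differences falling inside the limits, is the pattern the quoted passage treats as evidence that test and retest scores come from the same distribution.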

The standard version of the SF-36 is suited to designs in which the post-test occurs at 4 weeks. An acute version is available for cases where time limitations require a one-week recall period (McDowell & Newell, 1996).

Alternate Form Reliability

The SF-36 is acceptable to patients (Brazier et al, in Jenkinson et al, 1993), and has practical advantages over such instruments as the Sickness Impact Profile, in that it is shorter, and the Nottingham Health Profile, which has been found to be insensitive to lower levels of dysfunction and disability (Jenkinson et al, 1993).

The SF-12 and SF-8 health surveys are also available from the author. The SF-36 and SF-12 now have version 2 options. The SF-36 provides greater precision than the SF-12 by providing more detailed information on the physical and mental scales, and a more robust assessment of the 8 scales. However, it takes longer to complete than the SF-8 (1–2 minutes) and the SF-12 (2–3 minutes), and requires correspondingly more printing space.

www.sf-36.com/faq/generalinfo.shtml#1105.

Missing Data

Some variability in data completeness has been found across population sub-groups in a New Zealand study, particularly among the elderly and Pacific people (Scott, Tobias, Sarfati & Haslett, 1999). In an Australian sample, questions 9i and 3c had the highest levels of missing data.

Validity

Criterion or predictive validity

Criterion validity is presented in the SF-36 manual for all dimensions except vitality and social functioning. Each scale reportedly provides a valid representation of the criterion to be measured (Ware, 1993). The SF-36 has been linked to utilization of health care services, clinical course of depression and five-year survival (Ware, 1996). Question items contributing to each dimension are appropriate.

Content Validity

The test items are representative of the conceptual domains of Physical Functioning, General Health and Vitality (Ware, 1993). The author reports that content validity compares favourably with other widely used generic health surveys (Ware, 1996). The content validity is further supported by the work of Brazier, Harper & Jones (1992).

Construct Validity

Both physical and mental health scale scores decline in a predictable manner across the 8 scales (Anderson, 1996). The eight scales and 2 summary scales diverge as expected.

In a psychometric and clinical test of validity, the following Rotated Principal Components (RPC) and relative validity (RV) values yielded strong associations (r ≥ 0.70). Physical health: Physical functioning (RPC = 0.88, RV = 1.00), Role-physical (RPC = 0.78, RV = 0.79), and Bodily pain (RPC = 0.77, RV = 0.77). Mental health: Mental health (RPC = 0.90, RV = 1.00), Role-emotional (RPC = 0.81, RV = 0.81), and Social functioning (RPC = 0.71, RV = 0.62) (McHorney, Ware & Raczek, 1993). Construct validity was also supported by a New Zealand study (Scott, Tobias, Sarfati, & Haslett, 1999).

Evidence of construct validity is acceptable for the variables of age, gender and socio-economic class (Jenkinson et al, 1993), and for presence or absence of chronic physical problems and recent consultation with a general practitioner (past 2 weeks) or outpatient service (past 3 months) (Brazier et al, 1992).

Convergent Validity and Discriminant Validity

Physical functioning	0.15–0.56	Vitality	0.24–1.52
Role physical	0.20–0.58	Social Functioning	0.39–0.56
Bodily pain	0.26–0.56	Role emotional	0.24–0.52
General health	0.19–0.56	Mental Health	0.10–0.55
Scott, Tobias, Sarfati and Haslett (1999) found good item discriminant validity. The lower the correlation, the better the discriminant validity.

McHorney, Ware & Raczek (1993) analysed the validity of the physical and mental health constructs. These results show clearly the convergent and discriminant validity of the SF-36. The SF-36 can discriminate between mental health and physical health among medical and psychiatric patients.

Role physical and role emotional showed strong convergent and discriminant validity. The Social Functioning scale showed moderate to strong convergent and discriminant validity. Vitality showed good convergent validity but poor discriminant validity. General health perception showed good convergent validity for physical health but poor convergent validity for mental health. Bodily pain showed strong convergent validity for physical health but poor convergent validity in the clinical test of medical severity. The authors suggest that the result for bodily pain is most likely a research artefact, as the medical conditions used in the study were not dominated by pain (McHorney et al, 1993).

Sensitivity

Overall, the scales contained within the SF-36 are sensitive to clinical manifestations of medical (physical functioning) and global psychiatric (mental health) conditions. It is sensitive to mild functional losses relevant to independent living (Anderson, 1996) and ‘can detect higher levels of everyday physical functioning allowing a broader range of needs to be identified’ (Anderson, 1996, p6). The SF-36 is able to detect low levels of ill health (Brazier et al, 1992). This instrument is sensitive to change and can therefore be used for pre- and post-measurement. However, the psychiatric aspects of this questionnaire are quite medically biased.

www.sf-36.com/faq/generalinfo.shtml#1105

Scoring Methods

Web-based scoring is an option; completing the questionnaire and receiving results and composite scores takes about five minutes. An online demonstration and scoring are available at www.sf-36.com.

The manual comprehensively outlines scoring and includes methods of item recoding, recalibration, treatment of missing data, computing raw scores, and transformation of scores. To access the manual, contact the authors via www.sf-36.com.

Scores for all dimensions are expressed on a scale 0-100, where higher scores indicate better health and well-being.
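The transformation to a 0-100 scale is a linear rescaling of the raw scale score: (raw score − lowest possible raw score) / (possible raw score range) × 100. The sketch below illustrates this under stated assumptions: the function name `transform_to_0_100` is hypothetical, and the example assumes a scale of 10 items each recoded from 1 to 3, so that raw scores run from 10 to 30. The manual remains the authority on recoding and missing-data rules, which are not reproduced here.

```python
def transform_to_0_100(raw_score, lowest_possible, highest_possible):
    """Linearly rescale a raw scale score so the possible range maps to 0-100."""
    if not lowest_possible <= raw_score <= highest_possible:
        raise ValueError("raw score lies outside the possible range")
    possible_range = highest_possible - lowest_possible
    return (raw_score - lowest_possible) / possible_range * 100


# Assumed example: 10 items recoded 1-3, so raw scores run from 10 to 30.
raw_score = 25
transformed = transform_to_0_100(raw_score, lowest_possible=10, highest_possible=30)
print(transformed)
```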

Norms

Norm-based scoring (NBS) of the SF-36 is based on general US population norms.

Scores can be transformed so that the minimum and maximum possible scores are 0 and 100. Under norm-based scoring, all scores above or below 50 can be interpreted as above or below the general population norm and, because the standard deviation for each scale is equalized at 10, it is easy to see exactly how far above (or below) the average any result is in standard deviation units. A score 1.96 standard deviations (≈20 points) above or below the mean would suggest that, with 95% confidence, the sample is healthier or unhealthier than people in general on that measure.

Under such norm-based scoring, the following occurs:

1. The items are scored and entered;
2. The items requiring it are recoded;
3. Scale scores are computed by summing across the recoded items under each scale;
4. The scale scores are transformed (to a 0 to 100 scale);
5. An algorithm is applied to make the scores relational to some aspect of the population (e.g. males) whereby a score of 50 is the mean and 10 is the standard deviation.
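The final step above amounts to standardising a 0-100 scale score against a reference population and rescaling it so that the population mean becomes 50 and the standard deviation becomes 10. The sketch below is a minimal illustration of that idea only: the function name `norm_based_score` and the norm values (a mean of 70 and a standard deviation of 20) are hypothetical, and real norms must come from a published source for the chosen comparison group.

```python
def norm_based_score(scale_score, norm_mean, norm_sd):
    """Rescale a 0-100 scale score so the reference population has mean 50, SD 10."""
    z = (scale_score - norm_mean) / norm_sd  # distance from the norm in SD units
    return 50 + 10 * z


# Hypothetical norms for one scale in one comparison group (assumed values).
norm_mean, norm_sd = 70.0, 20.0
score = norm_based_score(75.0, norm_mean, norm_sd)
print(score)

# With the SD fixed at 10, 1.96 standard deviations is 19.6 points,
# i.e. roughly 20 points either side of 50.
threshold_points = 1.96 * 10
print(threshold_points)
```

Because every scale ends up on the same mean-50, SD-10 metric, results can be read directly in standard deviation units, which is the property the text relies on when interpreting scores above or below 50.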

However, because an algorithm is not always readily available for making the scores comparable to the part of the population you want to examine (e.g. males, the 55-64 year age group, smokers, or even males aged 55-64 who smoke), step 5 is not always followed. In these cases a score above 50 can be interpreted as reflecting a more positive (rather than worse) response set on whatever that scale measures (e.g. vitality). However, this does not translate to population means (i.e. a score above 50 does not mean that the veteran is doing better than the general population on this scale, and a score below 50 does not mean that the veteran is doing worse than the average male of equal age in the community). To allow for such a comparison, one must obtain norm scores for the comparison group from another study or organisation (e.g. the Australian Bureau of Statistics).

www.qualitymetric.com/innohome/norm.shtml.

Lower scores on the SF-36 are associated with poorer health, long-standing illness and medical consultation in the past 2 weeks, and women generally score lower on all variables (Jenkinson et al, 1993).

Norm-based scoring also means that scores from the original SF-36 can be compared to scores from SF-36 Version 2.

The Australian Bureau of Statistics report, National Health Survey: SF-36 Population Norms, provides mean values for the same dimensions for different population groups. The report provides norm-based data for: age, sex, marital status and sex, employment status and sex, equivalent income and sex, household type and sex, risk factors/behaviours, State and Territory, selected illness, health transition and age, and self-assessed health status and sex.

The Australian norms for the SF-36 do not standardise the eight scales to a mean of 50. However, the physical and mental component scores do have a mean of 50 and a standard deviation of 10.

Utility/ applicability

Amongst other population samples, the SF-36 has been used to compare quality of life and:

Breast cancer	Schizophrenia	Sleep apnoea
Radiation and cancer	Mood and anxiety disorders	Non surgical patients with lower extremity peripheral artery disease
Pulmonary function	Negative affect	Chronic heart disease
Haemodialysis and peritoneal dialysis	Depressive symptoms in asthmatic patients	Epilepsy and seizure frequency
Carpal tunnel syndrome	Solid organ transplant	Pulmonary rehabilitation

Recommendations to researchers

Where a researcher is using other instruments concurrently with the SF-36, the SF-36 should be presented first.

www.sf-36.com/faq/generalinfo.shtml#1105

Limitations

Contains no variable for sleep.

The response rate is lower in the over-65 population, so researchers might consider using an alternative questionnaire for this group.

Future

Bell and Kahn (1996) used the SF-36 to assess health status via the web. Participants remained anonymous. An item inter-correlation of 99.28% was found, and Cronbach’s alpha was in an acceptable range (0.76 to 0.90).

References

Australian Bureau of Statistics (1995). National Health Survey: SF-36 Population Norms. 4399.0.

Anderson, C., Laubscher, S., & Burns, R. (1996). Validation of the short form 36 (SF-36) health survey questionnaire among stroke patients. Stroke, 27 (10), 1812-1816.

Bell, D. S., & Kahn, C. E. (Jr) (1996). Assessing health status via the World Wide Web. In Cimino, J. C. (Ed), Proceedings of the AMIA Annual Fall Symposium. Philadelphia: Hanley & Belfus, 338-342. (Abstract only)

Brazier, J. E., Harper, R., Jones, N. M. B., O’Cathain, A., Thomas, K. J., Usherwood, T., & Westlake, L. (1992). Validating the SF-36 health survey questionnaire: new outcome measure for primary care. British Medical Journal, 305, 160-164.

Jenkinson, C., Coulter, A., & Wright, L. (1993). Short Form 36 (SF-36) health survey questionnaire: Normative data for adults of working age. British Medical Journal, 306 (6890), 1437-1440.

McHorney, C. A., Ware, J. E. (Jr), & Raczek, A. E. (1993). The MOS 36-Item short form health survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Medical Care, 31 (3), 247-263.

Scott, K. M., Tobias, M. I., Sarfati, D., & Haslett, S. (1999). SF-36 health survey reliability, validity and norms for New Zealand. Australian and New Zealand Journal of Public Health, 23, 401-406.

Ware, J. E. (1993). SF-36 Health Survey: Manual and Interpretation Guide. Boston: The Health Institute, New England Medical Center.

Ware, J. E. (1996). The MOS 36-Item Short Form Health Survey (SF-36). In Sederer, L. I., & Dickey, B. (1996). Outcomes Assessment in Clinical Practice. Baltimore: Williams and Wilkins.

Ware, J. E., & Sherbourne, C. D. (1992). The MOS 36-item short form health survey (SF-36): I. Conceptual framework and item selection. Medical Care, 30, 473-483.

www.sf36.com

Above written by: Ms. Lisa Pearson

Reviewed, edited and approved by: Dr. Grant J. Devilly