Medical Outcome Short Form (36) Health Survey

Questionnaire Availability

The questionnaire is available here in HTML format.

A PDF version of the SF-36 is also available here, for ease of administration (it fits on 2 pages!).

Devised by: John E. Ware Jr. PhD – QualityMetric, Inc. Address: 640 George Washington Hwy, Suite 201, RI. Phone 401 334 8800, x242, Fax: 401 334 8801, email: Internet site

Research supported by:

The Henry J. Kaiser Family Foundation, Menlo Park, CA (Grant no. 85-6515) granted to the Health Institute, New England Medical Center.

Co-copyright Holders:

Medical Outcome Trust (MOT), Health Assessment Laboratories (HAL) and QualityMetric Incorporated.


A fee for use of the SF-36™ is paid by companies and organisations who will profit from the use of the instrument. This allows the MOT, HAL and QualityMetric Inc “to make surveys available royalty free to individuals and organisations for academic research”.

The SF-36™ is distributed by MOS Trust Inc and “strict adherence to item wording and scoring recommendations is required in order to use the SF-36 trademark”.

Type of Instrument

The SF-36™ is a short-form measure of generic health status in the general population. The SF-36™ is designed for self-administration. Alternatively, a trained interviewer can use a standardized script for face-to-face and telephone interviews. The SF-36™ takes 5–10 minutes for a respondent to complete. It can be administered to anyone over the age of 14.

 From the 36 items, eight health profiles are derived from summarised scores. All dimensions are independent of each other. A comprehensive manual and interpretation guide is available from the author (Ware, 1993).

Designed to be used in


The SF-36 has been translated and adapted in 29 countries. It has been translated into languages including: English, Spanish, French, Swedish, Korean, German, Dutch, Portuguese, Chinese, Czech, Finnish, Danish, Hungarian, Hebrew, Italian, Japanese, Norwegian, Polish, Romanian, Slovak, Russian and Afrikaans. Furthermore, the SF-36 has been replicated across 24 different patient groups from various socio-economic situations and diagnoses (Ware, 1996). From an Australian, and particularly a Melbourne, perspective this instrument has great utility in a multicultural city providing predominantly Anglo-Saxon services.


Split half/ Cronbach’s alpha

One study found that the SF-36 yielded a high rate of response (83%) and rate of completion (95%) (Brazier, Harper, Jones, O’Cathain, Thomas, et al, 1992). A study of 19,785 respondents over 18 years of age yielded 80.6% full completion of the SF-36 and 14.6% partial completion (Australian Bureau of Statistics (ABS), 1995). For Cronbach’s alpha, a value of 0.5 is acceptable, although Nunnally recommends 0.7 and above. Well-used tests should give 0.8 and above (Jenkinson et al, 1993, p.4). Cronbach’s alpha on all scales of the SF-36 exceeds 0.8, except for social functioning (α = 0.76) (Jenkinson et al, 1993). Similar findings have been reported by Brazier, Harper, Jones et al (1992). In the case of the Social Functioning dimension the results are considered acceptable due to the small number of items (2 items using a 5 point scale; Jenkinson et al, 1993). The Physical Functioning dimension has consistently exceeded 0.90 (Ware, 1993).
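As an illustration of how an alpha value like those quoted above is obtained, the following is a minimal sketch of the standard Cronbach’s alpha formula, α = k/(k − 1) × (1 − Σ item variances / variance of the summed score). The function name and the response data are hypothetical and are not taken from any SF-36 study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one scale.

    item_scores: one list per respondent, each holding that
    respondent's scores on the k items of the scale.
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical answers from five respondents to a four-item scale.
responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 3],
    [3, 3, 3, 4],
    [4, 4, 5, 4],
    [5, 4, 5, 5],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))  # 0.96
```

Because alpha rises with the number of items, a two-item scale such as Social Functioning can be expected to score lower than a ten-item scale with similarly related items.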


The 8 Health Profiles, including number of items, Cronbach’s alpha and item internal consistency.

Scale | Number of items | Definition of scale | Internal consistency reliability (Cronbach’s alpha) | Range of item internal consistency
Physical Functioning (PF) | 10 items | Limitations in physical activity because of health problems | α = 0.93 | 0.64 – 0.83
Social Functioning | 2 items | Limitations in social activities because of physical or emotional problems | α = 0.90 |
Role limitations – physical (RP) | 4 items | Limitations in usual role activities because of physical health problem | α = 0.82 |
Bodily pain (BP) | 2 items | Presence of pain and limitations due to pain | α = 0.95 |
General medical health (GH) | 5 items | Self evaluation of personal health | α = 0.82 |
Mental health (MH) | 5 items | Psychological distress and well-being | α = 0.80 | 0.62 – 0.77
Role limitations – emotional (RE) | 3 items | Limitations in usual role activities because of emotional problems | α = 0.83 | 0.83 – 0.77
Vitality (VT) | 4 items | Energy and fatigue | α = 0.82 | 0.77 – 0.80
General Health perceptions | Single item | | |

Source: Scott, Tobias, Sarfati & Haslett (1999).


Additionally, two composite summary scores measure physical health and mental health (Ware, J. E., 1992). Reliability estimates for the composite physical and mental summary scores usually exceed 0.90.

Test/re-test reliability

Brazier, Harper and Jones (1992) used the Bland and Altman technique.

‘The differences [in scores] are plotted, an overall mean and variance of differences calculated, and 95% confidence intervals constructed around the mean by assuming a normal distribution. The test and retest scores are assumed to be from the same distribution when the differences have a mean of zero and 95% of the differences lie within the 95% confidence limits’ (Brazier, Harper & Jones, 1992, p161).

The researchers found for all dimensions 91-98% of cases lay within the 95% confidence interval. The maximum mean difference in dimension scores was 0.80.
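The procedure quoted above can be sketched as follows. This is only an illustration under stated assumptions: the function name and the test and retest scores are hypothetical, and the limits are computed as the mean difference ± 1.96 standard deviations of the differences, which is how the 95% limits are usually constructed in a Bland and Altman analysis.

```python
from statistics import mean, stdev


def bland_altman(test, retest):
    """Summarise agreement between test and retest scores."""
    # Difference in score for each respondent (retest minus test).
    diffs = [b - a for a, b in zip(test, retest)]
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)
    # 95% limits around the mean difference, assuming normality.
    lower = mean_diff - 1.96 * sd_diff
    upper = mean_diff + 1.96 * sd_diff
    # Share of respondents whose difference lies within the limits.
    share_within = sum(lower <= d <= upper for d in diffs) / len(diffs)
    return {
        "mean_diff": mean_diff,
        "lower": lower,
        "upper": upper,
        "share_within": share_within,
    }


# Hypothetical dimension scores (0-100) for ten respondents.
test_scores = [50, 60, 70, 80, 90, 55, 65, 75, 85, 95]
retest_scores = [52, 59, 70, 81, 88, 55, 66, 74, 86, 95]
result = bland_altman(test_scores, retest_scores)
print(result["mean_diff"], result["share_within"])
```

A mean difference close to zero, with about 95% of differences inside the limits, is the pattern the quoted passage treats as evidence that test and retest scores come from the same distribution.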

The SF-36 is available in a standard version for when post-test occurs at 4 weeks. The acute version is available for when time limitations require a one-week recall (McDowell & Newell, 1996).

Alternate Form Reliability

The SF-36 is acceptable to patients (Brazier et al, in Jenkinson et al, 1993), and has practical advantages over instruments such as the Sickness Impact Profile, in that it is shorter, and the Nottingham Profile, which has been found to be insensitive to lower levels of dysfunction and disability (Jenkinson et al, 1993).

The SF-12 and SF-8 health surveys are also available from the author. The SF-36 and SF-12 now have version 2 options. The SF-36 provides greater precision than the SF-12 by providing more detailed information in the physical and mental scales, and a more robust assessment of the 8 scales. However, it takes longer to complete than the SF-8 (1–2 minutes) and the SF-12 (2–3 minutes), and the printing space required becomes incrementally larger.

Missing Data

Some variability in data completeness has been found across population sub-groups in a New Zealand study, particularly among the elderly and Pacific people (Scott, Tobias, Sarfati & Haslett, 1999). In an Australian sample, questions 9i and 3c had the highest levels of missing data.


Criterion or predictive validity

Criterion validity is presented in the SF-36 manual for all dimensions except vitality and social functioning. Each scale reportedly provides a valid representation of the criterion to be measured (Ware, 1993). The SF-36 has been linked to utilization of health care services, the clinical course of depression and five-year survival (Ware, 1996). Question items contributing to each dimension are appropriate.

Content Validity

The test items are representative of the conceptual domains of Physical Functioning, General Health and Vitality (Ware, 1993). The author reports that content validity compares favourably with other widely used generic health surveys (Ware, 1996). The content validity is further supported by the work of Brazier, Harper & Jones (1992).

Construct Validity

Both physical and mental health scale scores decline in a predictable manner across the 8 scales (Anderson, 1996). The eight scales and 2 summary scales diverge as expected.

In a psychometric and clinical test of validity, the following Rotated Principal Components (RPC) and relative validity (RV) values indicated strong association (r ≥ 0.70). For physical health: Physical functioning (RPC = 0.88, RV = 1.00), Role-physical (RPC = 0.78, RV = 0.79) and Bodily pain (RPC = 0.77, RV = 0.77). For mental health: Mental health (RPC = 0.90, RV = 1.00), Role-emotional (RPC = 0.81, RV = 0.81) and Social functioning (RPC = 0.71, RV = 0.62) (McHorney, Ware & Raczek, 1993). Construct validity was also supported by a New Zealand study (Scott, Tobias, Sarfati & Haslett, 1999).

Evidence of construct validity is acceptable on variables of age, gender, socio-economic class, (Jenkinson, et al 1993) presence or absence of chronic physical problems and recent consultation with general practitioner (2 weeks) or outpatient service (3 months) (Brazier et al, 1992).

Convergent Validity and Discriminant Validity

Physical functioning: 0.15 – 0.56
0.24 – 1.52
Role physical: 0.20 – 0.58
Social Functioning: 0.39 – 0.56
Bodily pain: 0.26 – 0.56
Role emotional: 0.24 – 0.52
General health: 0.19 – 0.56
Mental Health: 0.10 – 0.55

Scott, Tobias, Sarfati and Haslett (1999) found good item discriminant validity. The lower the correlation, the better the discriminant validity.
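A common way to examine item discriminant validity is to correlate each item with the total of its own scale (leaving the item itself out of that total) and with the totals of the other scales. An item discriminates well when its correlation with the other scales is clearly lower. The sketch below assumes this general approach and is not necessarily the exact procedure used by Scott et al. The function names, scale labels "A" and "B", and data are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_scale_correlations(scales, scale_name, item_index):
    """Correlate one item with every scale total.

    scales maps a scale name to a list of item columns, each column
    holding one score per respondent. Every scale is assumed to have
    at least two items.
    """
    item = scales[scale_name][item_index]
    result = {}
    for name, columns in scales.items():
        if name == scale_name:
            # Correct for overlap: leave the item out of its own total.
            columns = [c for i, c in enumerate(columns) if i != item_index]
        totals = [sum(values) for values in zip(*columns)]
        result[name] = pearson(item, totals)
    return result


# Hypothetical data: two scales, two items each, five respondents.
scales = {
    "A": [[1, 2, 3, 4, 5], [2, 2, 3, 5, 5]],
    "B": [[5, 1, 4, 2, 3], [4, 1, 5, 2, 3]],
}
correlations = item_scale_correlations(scales, "A", 0)
print(correlations)
```

In this made-up example the first item of scale A correlates strongly with the rest of its own scale and only weakly with scale B, which is the pattern that indicates good discriminant validity.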

McHorney, Ware & Raczek (1993) analysed the validity of the physical and mental health constructs. These results show clearly the convergent and discriminant validity of the SF-36. The SF-36 can discriminate between mental health and physical health among medical and psychiatric patients.

Role physical and role emotional showed strong convergent and discriminant validity. The Social functioning scale showed moderate to strong convergent and discriminant validity. Vitality showed good convergent validity but poor discriminant validity. General health perception showed good convergent validity for physical health but poor convergent validity for mental health. Bodily pain showed strong convergent validity in physical health and poor convergent validity in the medical severity clinical test. The authors suggest that the result for bodily pain is most likely a research artefact, as the medical conditions used in the study were not dominated by pain (McHorney et al, 1993).


Overall, the scales contained within the SF-36 are sensitive to clinical manifestations of medical (physical functioning) and global psychiatric (mental health) conditions. The instrument is sensitive to mild functional losses relevant to independent living (Anderson, 1996) and ‘can detect higher levels of everyday physical functioning allowing a broader range of needs to be identified’ (Anderson, 1996, p6). The SF-36 is able to detect low levels of ill health (Brazier et al, 1992). This instrument is sensitive to change and can therefore be used for pre and post measurement. However, the psychiatric aspects of this questionnaire are quite medically biased.

Scoring Methods

Web-based scoring is an option; it takes about five minutes to complete the questionnaire and receive results and composite scores. On-line demonstration and scoring are available.

The manual comprehensively outlines scoring and includes methods of item recoding, recalibration, treatment of missing data, computing raw scores, and transformation of scores. To access the manual, contact the authors.

Scores for all dimensions are expressed on a scale 0-100, where higher scores indicate better health and well-being.


Norm-based scoring (NBS) of the SF-36 is based on the general US population.

Scores can be transformed so that the minimum and maximum possible scores are 0 and 100. With norm-based scoring, all scores above or below 50 can be interpreted as above or below the general population norm and, because the standard deviation for each scale is equalized at 10, it is easy to see exactly how far above (or below) the average score any result is in standard deviation units. A score 1.96 standard deviations (≈20 points) above or below the mean would suggest that, with 95% confidence, the sample is healthier or unhealthier than people in general on that measure.

Under such norm-based scoring, the following occurs:

  1. The items are scored and entered;

  2. The items requiring it are recoded;

  3. Scale scores are computed by summing across the recoded items under each scale;

  4. The scale scores are transformed (to a scale of 0 to 100);

  5. An algorithm is applied to make the scores relative to some aspect of the population (e.g. males), whereby a score of 50 is the mean and 10 is the standard deviation.
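Steps 2 to 5 can be sketched in a few lines. This is an illustration only: the four-item scale, the choice of which items are reverse-scored, and the norm mean and standard deviation are all hypothetical, and the official item recoding and norm values must be taken from the manual. The 0 to 100 transformation assumed here is (raw score − lowest possible raw score) / (possible raw score range) × 100, and the norm-based score is 50 + 10 × (score − norm mean) / norm standard deviation.

```python
def recode(value, low, high):
    """Reverse a response so that a higher value always means better health."""
    return low + high - value


def transform_0_100(raw, lowest_possible, highest_possible):
    """Rescale a raw scale score to the 0-100 range."""
    return (raw - lowest_possible) / (highest_possible - lowest_possible) * 100


def norm_based(score, norm_mean, norm_sd):
    """Express a 0-100 score relative to a norm group (mean 50, SD 10)."""
    return 50 + 10 * (score - norm_mean) / norm_sd


# Hypothetical four-item scale, each item answered from 1 to 5.
answers = [4, 2, 5, 1]
reverse_scored = {1, 3}  # zero-based item positions (hypothetical)

# Step 2: recode the items that need it.
recoded = [
    recode(a, 1, 5) if i in reverse_scored else a
    for i, a in enumerate(answers)
]
# Step 3: sum the recoded items into a raw scale score.
raw_score = sum(recoded)
# Step 4: transform to 0-100 (raw range for four items is 4 to 20).
scale_score = transform_0_100(raw_score, 4, 20)
# Step 5: compare with a hypothetical norm group (mean 60, SD 20).
t_score = norm_based(scale_score, norm_mean=60.0, norm_sd=20.0)

print(recoded, raw_score, scale_score, t_score)  # [4, 4, 5, 5] 18 87.5 63.75
```

In this made-up case the respondent scores 87.5 out of 100, which is 1.375 standard deviations above the hypothetical norm mean and so gives a norm-based score of 63.75.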

However, because there is not always a readily available algorithm to make the scores comparable to the part of the population you want to look at (e.g. males, the 55-64 year age group, smokers, or even males aged 55-64 who smoke), step 5 is not always followed. In these cases a score above 50 can be interpreted as a more positive, rather than worse, response set on whatever that scale measures (e.g. vitality). However, this does not translate to population means (i.e. a score above 50 does not mean that the veteran is doing better than the general population on this scale, and a score below 50 does not mean that the veteran is worse than the average male of equal age in the community). To allow for such a comparison one must obtain norm scores for the comparison group from another study or organisation (e.g. Australian Bureau of Statistics).

Lower scores on the SF-36 are associated with poorer health, long-standing illness and medical consultation in the past 2 weeks, and women generally score lower on all variables (Jenkinson et al, 1993).

Norm-based scoring also means that scores from the original SF-36 can be compared to scores from SF-36 Version 2.

Australian norms for the SF-36 have not standardised the eight scales to a mean of 50. However, the physical and mental component scores do have a mean of 50 and a standard deviation of 10.

Utility/ applicability

Amongst other population samples, the SF-36 has been used to compare quality of life and:

Breast cancer


Sleep apnoea

Radiation and cancer

Mood and anxiety disorders

Non surgical patients with lower extremity peripheral artery disease

Pulmonary function

Negative affect

Chronic heart disease

Haemodialysis and peritoneal dialysis

Depressive symptoms in asthmatic patients

Epilepsy and seizure frequency

Carpal tunnel syndrome

Solid organ transplant

Pulmonary rehabilitation

Recommendations to researchers

Where a researcher is using other instruments concurrently with the SF-36, the SF-36 should be presented first.

The SF-36 contains no variable for sleep.

The response rate is lower for the over-65 population, so researchers might consider using an alternative questionnaire for that group.


Bell and Kahn (1996) used the SF-36 to assess health status via the web. Participants remained anonymous. Item inter-correlation of 99.28% was found. Cronbach’s alpha was in an acceptable range (0.76 to 0.90).


Australian Bureau of Statistics (1995). National Health Survey: SF-36 Population Norms. 4399.0

Anderson, C., Laubscher, S., & Burns, R. (1996). Validation of the short form 36 (SF-36) health survey questionnaire among stroke patients. Stroke, 27 (10), 1812–1816.

Bell, D. S., & Kahn, C. E. (Jr) (1996). Assessing health status via the World Wide Web. In Cimino JC (Ed), Proceedings of the AMIA Annual Fall Symposium. Philadelphia: Hanley & Belfus, 338–342. (Abstract only)

Brazier, J. E., Harper, R., Jones, N.M.B., O’Cathain, A., Usherwood, T., & Westlake, J. (1992) Validating the SF-36 Health survey questionnaire: new outcome measure for primary care. British Medical Journal, 305, 160- 164

Jenkinson, C., Coulter, A., & Wright, L. (1993). Short Form 36 (SF-36) health survey questionnaire: Normative data for adults of working age. British Medical Journal, 306 (6890), 1437–1440.

McHorney, C. A., Ware, J. E (Jr)., & Raczek, A. E. (1993) The MOS 36-Item short form health survey (SF-36): II. Psychometric and Clinical Tests of validity in measuring physical and mental health constructs. Medical Care. 31 (3) 247-263.

Scott, K. M., Tobias, M. I., Sarfati, D., & Haslett, S (1999). SF-36 health survey reliability, validity and norms for New Zealand. Australian and New Zealand Journal of Public Health. 23, 401-406.

Ware, J. E, (1993). SF-36 Health Survey: Manual and Interpretation Guide. Boston: The Health Institute, New England Medical Center. 

Ware, J. E, (1996). The MOS 36-Item Short Form Health Survey (SF-36). In Sederer, L. I & Dickey, B (1996).  Outcomes Assessment in Clinical Practice. Baltimore: Williams and Wilkins.

Ware, J. E., & Sherbourne, C. D. (1992). The MOS 36-item short form health survey (SF-36): I. Conceptual framework and item selection. Medical Care, 30, 473–483.

Above written by: Ms. Lisa Pearson

Reviewed, edited and approved by: Dr. Grant J. Devilly