Coming to Consensus

In March 2007, the American Academy of Sleep Medicine (AASM) published a series of articles in the Journal of Clinical Sleep Medicine (JCSM) outlining proposed changes to existing scoring criteria for sleep and related events. The articles were followed by the release of a manual, with recommendations for visual scoring of sleep stages, arousals, cardiac events, movement events, and respiratory events. The AASM has announced that compliance with the new manual will be mandatory for accredited sleep centers effective July 2008.

The AASM Manual for the Scoring of Sleep and Associated Events¹ is authored by a steering committee of the AASM, and is based on literature reviews and consensus reports from several task forces, each assigned to a specific topic. The members of the task forces did not have an opportunity to review the manual or the supporting JCSM articles prior to publication. The recommended changes to existing scoring criteria are based largely on consensus. Of a total of 119 parameters, seven are reported as meeting level I or II evidence criteria, two are reported as guidelines, 10 as adjudications by the steering committee, and the remaining 100 as consensus-based (level IV evidence or less).

Following conventional scientific practice, consensus-based recommendations are generally field-tested, debated, and modified as necessary before they are mandated as a new standard. A closer look at the manual and the supporting JCSM articles reveals a number of technical and practical concerns that warrant further discussion.

PROPOSED CHANGES TO THE RECHTSCHAFFEN AND KALES MANUAL

For nearly 40 years, the Rechtschaffen and Kales (R&K) scoring manual² has served as the framework for scoring sleep stages and remains the most widely recognized and most often cited sleep reference manual worldwide. The technical specifications for collecting and scoring the data are practical, concise, and easy to understand. Logical explanations are given for the recommended derivations and scoring criteria. It is also generally understood that the scoring rules are based on normal young adult subjects and that some adjustments must be made when implementing these rules in clinical sleep studies.

The R&K manual specifies central EEG derivations (either C3-A2 or C4-A1) as optimal for identifying EEG features relevant to scoring sleep stages. These include sleep spindles, K-complexes, vertex sharp waves, and high-amplitude slow waves. The R&K manual especially emphasizes the importance of scoring sleep stages based on either the C3-A2 or C4-A1 derivations, even if other EEG channels are recorded.

The new AASM Scoring Manual proposes the addition of frontal leads for the scoring of slow wave sleep. The rationale for this recommendation is that slow wave amplitudes are maximal in the frontal region; therefore, higher percentages of slow wave sleep will be scored if frontal EEG derivations are used. But at the same time, the new manual recommends combining stages 3 and 4 sleep, seemingly diminishing the importance of quantifying deep slow wave sleep.

One of the arguments for adding more channels to the polysomnogram is that today’s recording systems offer that capability. However, there is a practical limit to the number of channels that can adequately be displayed on a computer screen while maintaining correct viewing size, clarity, and aspect ratio for proper visual scoring. Moreover, adding extra channels does not compensate for lack of quality. It may be suggested that better results are achieved with an adequate number of essential, properly reproduced channels than with an excessive number of poorly managed ones.

There are other practical issues to consider when adding frontal EEG leads. The recording and scoring of sleep studies become more complicated. Frontal EEG leads are more likely to detect eye movements, which could be misinterpreted as slow EEG waves. The occurrence of frontal intermittent delta activity, a phenomenon sometimes associated with cognitive impairment in the elderly, may also be misinterpreted as slow wave sleep.³ Moreover, the clinical value of elevating the overall percentage of slow wave sleep remains unclear.

BIPOLAR EEG DISCREPANCY

In addition to recommending frontal leads, the new manual presents the alternative option of using bipolar frontal derivations, an arrangement that greatly reduces EEG amplitudes (Figure 1). The rationale for using bipolar derivations, as described by the supporting articles published in JCSM, is that “the ear or mastoid electrode is an active reference, which in addition to recording EEG activity may also record EMG and EKG artifact.” This statement is technically misleading, because the ear or mastoid reference provides a relatively inactive reference to the exploring scalp electrodes with respect to EEG activity.

Figure 1. Referential and bipolar EEG comparison. Note that the frontal and central referential derivations are very similar, but the amplitude of the bipolar derivation is greatly reduced.

Referential EEG derivations (such as C3-A2) are used in sleep studies because this arrangement maximizes EEG amplitudes and identifies the site of potential origin. In contrast, bipolar EEG derivations significantly reduce signal amplitudes, and the site of potential origin is not apparent unless serial chain-linked bipolar derivations are used. It is true that the longer interelectrode distances used in referential EEG derivations may sometimes cause ECG artifacts to appear in the EEG and eye channels, but these can be minimized by practicing proper electrode application techniques.
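The amplitude difference between the two arrangements can be illustrated with a small numerical sketch. The sample values are hypothetical: they assume that a slow wave appears with similar amplitude and phase at two neighboring scalp sites, while the mastoid reference picks up very little of it. The electrode labels F3, C3, and M2 are used only for illustration.

```python
import math

def peak_to_peak(samples):
    """Return the peak-to-peak amplitude of a list of samples."""
    return max(samples) - min(samples)

# Hypothetical potentials in microvolts over one cycle of a slow wave.
n = 100
wave = [math.sin(2 * math.pi * i / n) for i in range(n)]
f3 = [75.0 * w for w in wave]  # frontal scalp site (assumed amplitude)
c3 = [60.0 * w for w in wave]  # central scalp site (assumed amplitude)
m2 = [5.0 * w for w in wave]   # mastoid reference, relatively inactive

# Referential derivations: exploring electrode minus the distant reference.
f3_m2 = [a - r for a, r in zip(f3, m2)]
c3_m2 = [a - r for a, r in zip(c3, m2)]

# Bipolar derivation: two neighboring scalp electrodes, so the activity
# they share largely cancels.
f3_c3 = [a - b for a, b in zip(f3, c3)]

for label, signal in (("F3-M2", f3_m2), ("C3-M2", c3_m2), ("F3-C3", f3_c3)):
    print(f"{label}: {peak_to_peak(signal):.0f} uV peak to peak")
```

Under these assumptions, the two referential derivations retain most of the slow wave amplitude, while the bipolar derivation shows only the difference between the two scalp sites.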

Since the publication of the new manual, the AASM has addressed the bipolar EEG discrepancy in a frequently asked questions (FAQ) Web site; however, the Web site posting then offers yet another unconventional derivation (Fpz-E1). The FAQ posting also states that scoring from a conventional central EEG derivation (C4-M1) remains an option. This raises the question as to why frontal leads are recommended in the first place.

A particular problem with the proposed addition of frontal leads is that multiple options have now been presented to the field, each yielding differing results.

An added problem is that modifications to the new manual are being posted on the Internet, creating confusion among those who perform and score the sleep studies.

ALTERNATIVE ELECTROOCULOGRAM RECORDINGS

The new manual also offers the option of alternative electrooculogram (EOG) derivations, using Fpz as a reference for the right and left outer canthus electrodes. The rationale for using this option, as noted in the manual, is to differentiate between vertical and horizontal eye movements, a distinction that arguably has no practical value.

An argument against using the alternative EOG derivations is that eye movements that appear as in-phase deflections can easily be confused with artifacts. The R&K manual specifically cautions against the use of a supranasion (Fpz) electrode for this very reason. In the present day, continuous positive airway pressure (CPAP) headgear used during sleep studies is likely to increase the incidence of these artifacts by pressing against the Fpz electrode.

OTHER PROPOSED CHANGES TO SLEEP STAGE SCORING

Aside from the addition of frontal leads and the option of alternative derivations, proposed changes to the R&K manual are relatively minor. Nomenclature changes include the renaming of stages 1, 2, 3, and 4 sleep as N1, N2, and N3, with stages 3 and 4 combined. Stage REM is renamed as stage R. The A1 and A2 reference electrode sites are renamed as M1 and M2. A few modifications are made to marking the beginning and ending of sleep stages. The 3-minute rule for continuing stage 2 sleep in the absence of K-complexes and sleep spindles has been discontinued, as well as the use of “movement time” to denote epochs that are unscorable due to artifact.

MOVEMENT EVENT SCORING

The new manual has recommendations for scoring periodic limb movements, bruxism, features of REM sleep behavior disorder, and rhythmic movement disorder. Optional recommendations are presented for alternating leg muscle activation, hypnagogic foot tremor, and fragmentary myoclonus.

The scoring of limb movements is an example of a process that has become excessively complicated. Historically, the scoring of limb movements was limited to periodic leg movements (PLMs), based on a stereotypical pattern easily recognized by the scorer. In 1993, the AASM, formerly the American Sleep Disorders Association (ASDA), published a paper in the journal Sleep,⁴ outlining a more elaborate system of scoring leg movements, proposing that all movements be scored and arranged into multiple categories, with subsequent analysis that went far beyond the recognition of stereotypical PLMs.

The new AASM manual initially simplifies the scoring of leg movements by refocusing on the relevant issue of PLMs, but then adds voltage criteria for marking the beginning and ending of each event. The manual suggests using an 8 μV increase above baseline to mark the onset of a movement and a decrease below 2 μV to mark the offset. At a standard sensitivity of 50 μV per centimeter, such measurements are too small to discern visually.

The manual also refers to the use of “sensitivity limits of –100 and 100 μV (upper/lower).” This statement is unclear, because sensitivity is defined as the ratio of voltage to a unit of measure. Assuming that the unit of measure is one centimeter, a sensitivity of 100 μV/cm is even lower than the commonly used 50 μV/cm, making it virtually impossible to visually discern the voltage values suggested above.
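The arithmetic behind this objection can be sketched as follows. The function name is hypothetical, and the calculation assumes that sensitivity is expressed in microvolts per centimeter of vertical deflection on the display.

```python
def deflection_mm(voltage_uv, sensitivity_uv_per_cm):
    """Convert a voltage in microvolts to display deflection in millimeters."""
    return voltage_uv / sensitivity_uv_per_cm * 10.0

# Onset (8 uV) and offset (2 uV) thresholds at the two sensitivities discussed.
for sensitivity in (50.0, 100.0):
    onset = deflection_mm(8.0, sensitivity)
    offset = deflection_mm(2.0, sensitivity)
    print(f"{sensitivity:.0f} uV/cm: onset {onset:.1f} mm, offset {offset:.1f} mm")
```

At 50 μV/cm, the 8 μV onset threshold corresponds to a deflection of about 1.6 mm and the 2 μV offset threshold to about 0.4 mm. At 100 μV/cm, both values are halved.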

The options of scoring alternating leg muscle activations, hypnagogic foot tremors, and fragmentary myoclonus seem unnecessary because these phenomena have no known clinical significance. Options such as these are more likely to result in over-scoring of the study, with tabulation of events that have no clinical relevance.

SCORING OF BRUXISM

The recommendations for scoring bruxism presented by the new manual appear oversimplified and will likely result in significant over-scoring of these events. The manual recommends scoring bruxism based on “brief or sustained elevations of chin EMG activity that are at least twice the amplitude of background EMG.” The problem with this definition is that many activities can cause elevations in the chin EMG, including head movements, snoring, swallowing, coughing, yawning, or simply tensing the chin muscles. In fact, most arousals are accompanied by brief elevations in chin EMG, which are not necessarily associated with bruxism.
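A short sketch shows how a rule based only on relative amplitude behaves. The function name and all amplitude values are hypothetical, chosen for illustration rather than taken from any recording. Only the factor of two comes from the quoted definition.

```python
def flagged_by_amplitude_rule(chin_emg_uv, background_uv):
    """Apply only the quoted criterion: at least twice the background EMG."""
    return chin_emg_uv >= 2.0 * background_uv

background_uv = 5.0  # assumed background chin EMG amplitude

# Assumed chin EMG amplitudes (microvolts) for several common activities.
observations = {
    "swallow": 18.0,
    "snore": 12.0,
    "arousal": 25.0,
    "tooth grinding": 30.0,
    "quiet sleep": 6.0,
}

flagged = [name for name, amplitude in observations.items()
           if flagged_by_amplitude_rule(amplitude, background_uv)]
print(flagged)
```

With these assumed values, four activities exceed the threshold and only one of them is tooth grinding. The amplitude criterion alone cannot separate them.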

When bruxism involves rhythmic masticatory muscle activity, it is usually best seen as a stereotypical, rapid-sequence artifact in the EEG and EOG channels (Figure 2). EMG electrodes placed over the masseter area can also detect this artifact. However, in the absence of this stereotypical pattern, it is not reasonable to assume that every elevation in chin EMG represents bruxism.

Figure 2. Bruxism. Note the stereotypical, rapid-sequence artifacts in the EEG and eye channels.

SCORING OF REM SLEEP BEHAVIOR DISORDER

The new manual recommends scoring characteristic features of REM sleep behavior disorder (RBD), based on transient or sustained muscle activity during REM sleep. This definition also presents a problem, because recognizing features of RBD, and differentiating them from other physiological phenomena or possible artifacts, requires a careful examination of the pattern within the context of the recording. Other conditions, such as Parkinson’s disease, may cause increased motor activity during REM sleep. Quite often, the features of RBD may appear similar to wakefulness. In other instances, frequent arousals during REM sleep could be misinterpreted as RBD. Recordings with atypical eye movements during NREM sleep caused by medications may also take on the appearance of RBD.

TECHNICAL RECOMMENDATIONS FOR RESPIRATORY EVENT SCORING

The new manual has recommendations for scoring apneas, hypopneas, and Cheyne-Stokes breathing, along with optional recommendations for scoring respiratory effort-related arousals (RERAs) and hypoventilation. The manual also has technical recommendations for recording respiratory parameters, the most notable being the use of dual airflow sensors. According to the manual, thermal sensors are recommended for the detection of apnea and nasal air pressure transducers for the detection of hypopnea. The rationale for using dual airflow sensors is that nasal pressure transducers are more sensitive to minor changes in airflow, but may cause overestimation of apnea, while thermal sensors are less sensitive to minor breathing fluctuations, but are more reliable for identifying apnea.

In discussing thermal sensors, the JCSM articles and the new manual do not make a distinction between thermistors and thermocouples. A thermistor is a variable resistor that responds to temperature changes, while a thermocouple is made of dissimilar metals that generate a variable voltage in response to temperature fluctuations. Of the two, the thermocouple provides a more stable signal and is more sensitive to minor airflow changes. And while pressure transducers are even more sensitive than thermocouples, they are more expensive, more cumbersome to the patient, and susceptible to signal loss due to mouth breathing. An even greater concern is that cannula-based pressure transducers may potentially increase nasal airway resistance.⁵

For these reasons, thermocouples are favored by many sleep centers as a reasonable compromise between a pressure transducer and a thermistor. Thermocouples are unobtrusive, are easily tolerated by patients, and are typically designed to record both nasal and oral airflow. Attempts to detect oral airflow with a pressure transducer have been largely unsuccessful.

The suggestion to use both types of sensors raises logistical issues regarding practicality, patient comfort, and the potential effects of the combined sensors on nasal airflow. It is also likely that respiratory events will be over-scored, especially if the signals are evaluated out of context, without taking into account normal variants of sleep/wake physiology and the high incidence of artifacts.

APNEA AND HYPOPNEA SCORING

The new manual offers definitions for scoring apneas and hypopneas based on percentages of signal excursion drop from “baseline.” In actual practice, such measurements are unrealistic because the signals are nonquantitative and continually variable with postural changes and body movements. Establishing a signal baseline is especially unrealistic in patients with sleep-disordered breathing because the waveforms are repeatedly exaggerated during the recovery breaths occurring at the end of each event. Thus, designations of 30% or 50% reductions in amplitude are more hypothetical than practical.
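A brief sketch can illustrate why the choice of baseline matters. The amplitudes are hypothetical and given in arbitrary units, since the signals are nonquantitative. The function name is an assumption made for illustration.

```python
def percent_reduction(event_amplitude, baseline_amplitude):
    """Percentage drop in signal excursion relative to a chosen baseline."""
    return (1.0 - event_amplitude / baseline_amplitude) * 100.0

event = 8.0               # excursion during the candidate event
quiet_baseline = 10.0     # stable breathing before the run of events
recovery_baseline = 16.0  # exaggerated recovery breaths after the prior event

vs_quiet = percent_reduction(event, quiet_baseline)
vs_recovery = percent_reduction(event, recovery_baseline)
print(f"{vs_quiet:.0f}% vs quiet baseline, {vs_recovery:.0f}% vs recovery breaths")
```

The same event falls short of a 30% reduction when measured against quiet breathing, yet reaches a 50% reduction when measured against the preceding recovery breaths.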

The manual follows the conventional wisdom that central respiratory events are distinguished from obstructive events by absence of effort. In actual practice, lack of respiratory effort does not rule out the possibility of upper airway obstruction. It is not uncommon for obstructive respiratory events to appear ostensibly central, even when quantitative measures of effort are made. A more effective method of differentiating central and obstructive events is to examine the respiratory signals within the context of other PSG parameters (Figure 3). Evidence of upper airway obstruction can usually be identified on the basis of breakthrough snores, arousals, body movements, and other physiological activations that typically accompany obstructive events, whereas true central apneas and hypopneas are generally quiet, without evidence of activation or struggle to clear an obstructed airway. It may also be suggested that direct behavioral observation is still the most reliable method for confirming the etiology of any respiratory disturbance, and is more accurate than relying on any particular sensor.

Figure 3. Cyclic obstructive hypopneas (shown in compressed time scale). Correlating the respiratory patterns with the top channels, which show snoring artifacts in the chin EMG and arousals in the EEG, confirms the etiology of these events. However, without the top channels, the respiratory patterns alone are meaningless. Also note that if the currently recommended hypopnea definition is used, these events cannot be scored because of insufficient O2 desaturations; but if the alternative definition is used, the hypopneas can be scored on the basis of arousal.

SCORING OF CHEYNE-STOKES RESPIRATION

The new manual defines Cheyne-Stokes respiration based on “cycles of crescendo and decrescendo changes in breathing amplitude with five or more central apneas or hypopneas per hour of sleep, or if the crescendo and decrescendo pattern has a duration of at least 10 consecutive minutes.” The main concern with this definition is that many forms of sleep-disordered breathing may take on the appearance of a Cheyne-Stokes breathing pattern. Drug-related central apneas and hypopneas, repetitive sleep-onset central apneas, central apneas and hypopneas temporarily induced by CPAP, and even obstructive apneas may appear similar to Cheyne-Stokes breathing, especially when viewed in a compressed time scale.
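The quoted definition can be written as a simple predicate, which makes its limits easier to see. This is only a sketch of one reading of the quotation. The function and parameter names are hypothetical, and the recording is assumed to have already been judged to show crescendo and decrescendo cycles.

```python
def meets_quoted_csr_rule(central_events, sleep_hours, longest_pattern_minutes):
    """Apply the two quoted thresholds to a recording that already shows
    crescendo-decrescendo cycles (an assumption of this sketch)."""
    if sleep_hours <= 0:
        raise ValueError("sleep_hours must be positive")
    events_per_hour = central_events / sleep_hours
    return events_per_hour >= 5 or longest_pattern_minutes >= 10

# Hypothetical recordings: (central events, hours of sleep, longest pattern in minutes)
print(meets_quoted_csr_rule(36, 6.0, 4))   # six events per hour
print(meets_quoted_csr_rule(12, 6.0, 12))  # a 12-minute pattern
print(meets_quoted_csr_rule(12, 6.0, 4))   # neither threshold reached
```

The inputs are limited to event counts and durations. Nothing in the rule as sketched here distinguishes among the different causes of a waxing and waning breathing pattern.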

TREND TOWARD PORTABILITY AND AUTOMATION

It has been suggested by some that the new scoring manual is paving the way for portable, limited channel sleep studies and automated scoring. The manual does, in fact, present the respiratory scoring parameters apart from the rest of the polysomnogram, without discussing the relevance of viewing respiratory patterns within the context of the patient’s sleep/wake physiology. The only reference to other PSG channels is made by the brief mention of arousal, as a possible scoring criterion for the alternative hypopnea definition and as a criterion for the optional scoring of RERAs. The lack of discussion regarding other PSG parameters creates the impression that respiratory events can be evaluated based solely on respiratory tracings and oximetry, without viewing the polysomnogram as a whole. This is unfortunate because without correlating respiratory patterns with the patient’s physiological state, and evaluating their effects upon that state, the interpretation of respiratory events becomes largely a matter of guesswork.

The subject of automated scoring was the first item on the agenda presented to the digital task force. This topic received far more attention than some of the more urgent technical issues pertaining to digital polysomnography, such as amplifier quality and design. After an extensive literature search, the task force concluded that evidence regarding automated scoring was lacking. Subsequently, the task force did address some of the more pertinent issues relating to digital polysomnography, although more work is needed in this area.

SUMMARY

The new scoring manual presents multiple recommendations, based largely on group consensus. The recommendations have not yet been field-tested, nor have they received universal approval. The new manual and the supporting JCSM articles contain a number of technical misconceptions and contradictions that have not yet been adequately addressed. The proposed changes are described as recommendations, yet are being mandated for sleep center accreditation.

Following basic scientific principles, consensus-based recommendations are generally scrutinized and debated, and are eventually either accepted or rejected on the basis of merit and practicality. This was the case with the Rechtschaffen and Kales manual nearly 40 years ago. It seems reasonable to expect the same level of scrutiny to occur now, before the changes are implemented as a new standard.

Nic Butkov, RPSGT, is education coordinator at the Rogue Valley Sleep Center, CEO of Synapse Media Inc, and director of the School of Clinical Polysomnography in Medford, Ore.

REFERENCES

1. Iber C, Ancoli-Israel S, Chesson A, Quan SF, for the American Academy of Sleep Medicine. The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications. Westchester, Ill: American Academy of Sleep Medicine; 2007.
2. Rechtschaffen A, Kales A. A Manual of Standardized Terminology, Techniques and Scoring Systems for Sleep Stages of Human Subjects. Los Angeles: UCLA Brain Information Services/Brain Research Institute; 1968.
3. Torres A, Faoro A, Loewenson R, Johnson E. The electroencephalogram of elderly subjects revisited. Electroencephalogr Clin Neurophysiol. 1983;56:391-398.
4. Bonnet MH, Carley D, Carskadon MA, et al. Recording and scoring leg movements: the Atlas Task Force. Sleep. 1993;16:748-759.
5. Lorino AM, Lorino H, Dehan E, et al. Effects of nasal prongs on nasal airflow resistance. Chest. 2000;118:366-371.