Troubleshooting Cisco Catalyst Stack Switch Discovery Issues

Use this when Cisco Catalyst stack members are stuck in discovery or fail to reach Ready state.

Quick Read

  • Symptom: Stack members are stuck in discovery or fail to reach Ready state.
  • Check first: Run `show switch` and confirm each expected member state.
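  • Risk: Destructive. Fix steps 1 to 3 are read-only; the later diagnostics, reload, and `write erase` steps can interrupt traffic or remove configuration.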

Symptoms

One or more Cisco Catalyst stack members do not finish discovery or do not reach Ready state. `show switch` may show missing members, members in Removed or Provisioned state, or a version mismatch; stack ports may be down, and the log may repeat discovery messages.

Environment

Cisco Catalyst switch stacks, such as Catalyst 9300 series, running IOS XE with StackWise cabling and multiple stack members.

Most Likely Causes

Stack discovery failures are commonly caused by loose or failed stack cables, an open StackWise ring, member-number conflicts, priority settings that produce an unexpected election result, mismatched or incompatible IOS XE versions across members, power or hardware faults, or stale provisioned member configuration. Less common causes include bugs in a specific IOS XE release or a member that cannot pass hardware diagnostics.
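
For the member-number and priority causes, one possible sequence is sketched below. The member numbers and priority value are placeholders, not values from any particular stack. A renumbered member uses its new number only after it reloads, and a priority change influences the next election rather than the current active switch.

Example pattern only. Adjust for your environment before running.

    show switch detail
    switch <current-number> renumber <new-number>
    switch <switch-number> priority <1-15>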

What to Check First

  1. Run `show switch` and confirm each expected member state.
  2. Run `show switch stack-ports` and confirm stack ports are up and the ring is healthy.
  3. Check stack cable seating, StackWise port LEDs, and whether the ring is open.
  4. Review logs for stack member, version, election, or hardware diagnostic errors.

Fix Steps

  1. Capture stack member state

    Start with the current control-plane view of the stack. Record member number, role, priority, MAC, version, and state before making changes.

    Example pattern only. Adjust for your environment before running.

    show switch
    show version

  2. Check StackWise port health

    Use StackWise-specific port output and physical LED/cable inspection to determine whether the stack ring is closed or broken.

    Example pattern only. Adjust for your environment before running.

    show switch stack-ports
    show switch neighbors

  3. Review stack logs

    Look for member join, election, version mismatch, StackWise link, or hardware messages around the time discovery stalled.

    Example pattern only. Adjust for your environment before running.

    show logging | include STACK|Stack|SWITCH|VERSION|DIAG
    show logging

  4. Run supported diagnostics during an approved window

    Use platform-supported diagnostic commands for the switch model and IOS XE release. Confirm the exact syntax in Cisco documentation for the target platform before running diagnostics; some diagnostic commands are platform-specific and some tests can be disruptive.

    Example pattern only. Adjust for your environment before running.

    show diagnostic content switch <switch-number>
    diagnostic start switch <switch-number> test <test-id>
    show diagnostic result switch <switch-number>

  5. Reload only after evidence points to stale discovery state

    A reload interrupts traffic. Use it only when cabling, member state, logs, and maintenance approval support a controlled stack reload, and confirm console or out-of-band access before proceeding.

    Example pattern only. Adjust for your environment before running.

    reload

  6. Back up configuration before any destructive reset

    If a factory reset or stack configuration reset is being considered, take a current backup first, keep a copy off the switch (a factory reset also wipes local flash), and verify restore access.

    Example pattern only. Adjust for your environment before running.

    copy running-config startup-config
    copy running-config flash:pre-stack-reset-backup.cfg
    dir flash:pre-stack-reset-backup.cfg
    copy flash:pre-stack-reset-backup.cfg tftp:
    show startup-config

  7. Reset configuration only as a last resort

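    This is destructive. `write erase` deletes the startup configuration from NVRAM, and the stack boots unconfigured after the reload. Use it only with a verified backup, console access, a restore plan, and an approved outage window. It does not clear member numbers or priorities, so it will not resolve a member-number conflict.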

    Example pattern only. Adjust for your environment before running.

    write erase
    reload
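
Before a full-stack `reload` or `write erase`, a narrower action may be enough when only one member or one stack port is affected. The sketch below assumes an IOS XE release that supports per-member reload and per-port StackWise control; confirm both in the command reference for the platform. Either action still interrupts traffic through that member or link, and disabling a stack port opens the ring until it is re-enabled.

Example pattern only. Adjust for your environment before running.

    reload slot <switch-number>
    switch <switch-number> stack port <1|2> disable
    switch <switch-number> stack port <1|2> enable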

Validation

  • Run `show switch` and confirm all expected members show Ready state and correct active/standby/member roles.
  • Run `show switch stack-ports` and confirm the StackWise ring is healthy with expected ports up.
  • Review logs after the fix and confirm discovery, version mismatch, or stack port errors do not continue.
  • Confirm downstream links and VLAN trunks are passing traffic after any reload or member recovery.
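
The validation checks above map to a short read-only command sequence. `show redundancy` and `show interfaces trunk` are suggestions that do not appear elsewhere in this procedure; substitute whatever the site normally uses to confirm roles and trunk state.

Example pattern only. Adjust for your environment before running.

    show switch
    show switch stack-ports
    show redundancy
    show logging | include STACK|VERSION
    show interfaces trunk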

Logs to Check

  • Cisco IOS XE `show logging` output around member join/discovery time.
  • StackWise port and member state output from `show switch` and `show switch stack-ports`.
  • Hardware diagnostic output from `show diagnostic result switch <n>`.
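
To line log entries up with the time discovery stalled, it can help to record the switch clock and save the log output alongside the ticket. The sketch below assumes stack manager messages carry the STACKMGR facility, which is typical on IOS XE but worth confirming against the actual log. The file name is a placeholder.

Example pattern only. Adjust for your environment before running.

    show clock
    show logging | include STACKMGR
    show logging | redirect flash:stack-discovery-logs.txt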

Rollback and Escalation

  • Restore the pre-change running configuration if reset or reconfiguration creates service impact.
  • Replace a failed stack cable or isolate a failed member if diagnostics point to hardware.
  • Roll back IOS XE only through the organization's standard image management process.
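
If the backup from Fix Steps was written to flash, a restore might look like the sketch below. It assumes the file name used in the backup step (`flash:pre-stack-reset-backup.cfg`) and that the file is still present. Copying the file to `startup-config` and reloading gives a clean restore; copying it into `running-config` instead merges with whatever is already running, so that is not an exact rollback.

Example pattern only. Adjust for your environment before running.

    dir flash:pre-stack-reset-backup.cfg
    copy flash:pre-stack-reset-backup.cfg startup-config
    reload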

Escalate When

  • Escalate before running `write erase` or any factory reset command.
  • Escalate when stack cabling looks healthy but member diagnostics fail.
  • Escalate when a production reload would affect redundant paths, access switching, or uplinks without a maintenance window.

Edge Cases

  • A stack can keep running with an open ring, but it has reduced stack bandwidth and no redundancy, so one more cable or port failure can split it. Fix the physical StackWise ring instead of treating it as only a discovery issue.
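  • After hardware replacement, a member shown as Provisioned but not present may be an expected leftover entry. It remains until the provisioned configuration is removed intentionally or a replacement joins with that member number.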
  • Mixed IOS XE versions can keep a member from joining cleanly even when cabling is correct.
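
For the provisioned-member and mixed-version cases, the following sketch shows one way to inspect and clean up. It assumes the stale member is physically absent (IOS XE typically refuses to remove provisioning for a member that is present) and that the stack runs in install mode, which automatic upgrade of a mismatched member generally requires. `<switch-number>` is a placeholder.

Example pattern only. Adjust for your environment before running.

    show running-config | include provision
    configure terminal
     no switch <switch-number> provision
     software auto-upgrade enable
     end
    show switch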

Notes from the Field

  • A real first check is often physical: StackWise cable seating, port LEDs, and whether the ring is open. The CLI confirms what the rack is already trying to tell you.
  • Treat `write erase` as a recovery operation, not normal troubleshooting. A stack reset without a config backup turns a discovery issue into an outage.