Troubleshooting Cisco Catalyst Stack Switch Discovery Issues
Use this when Cisco Catalyst stack members are stuck in discovery or fail to reach Ready state.
Quick Read
- Symptom: Stack members are stuck in discovery or never reach Ready state.
- Check first: Run `show switch` and confirm each expected member state.
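- Risk: The `show` commands are read-only; diagnostics and `reload` can be disruptive, and `write erase` is destructive.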
Symptoms
One or more Cisco Catalyst stack members do not finish discovery or do not reach a ready state. The stack may show missing members, version mismatch, removed/provisioned members, stack port down state, or repeated discovery messages.
Environment
Cisco Catalyst switch stacks, such as Catalyst 9300 series, running IOS XE with StackWise cabling and multiple stack members.
Most Likely Causes
Stack discovery failures are commonly caused by loose or failed stack cables, an open StackWise ring, member-number conflicts, priority settings that produce an unexpected active-switch election, mismatched or incompatible IOS XE versions, power or hardware faults, or stale provisioned member configuration. Less common causes include bugs in a specific IOS XE release or a member that cannot pass hardware diagnostics.
What to Check First
- Run `show switch` and confirm each expected member state.
- Run `show switch stack-ports` and confirm stack ports are up and the ring is healthy.
- Check stack cable seating, StackWise port LEDs, and whether the ring is open.
- Review logs for stack member, version, election, or hardware diagnostic errors.
Fix Steps
- Capture stack member state
Start with the current control-plane view of the stack. Record member number, role, priority, MAC, version, and state before making changes.
Example pattern only. Adjust for your environment before running.
show switch
show version
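If the captured `show switch` output is saved to a file or ticket, a small script can flag members that are not in Ready state. The sketch below is a minimal example under stated assumptions: the sample text is hand-written for illustration, the column layout differs between platforms and releases, and the names `parse_show_switch` and `members_not_ready` are hypothetical helpers, not part of any Cisco tooling.

```python
import re

# Illustrative output only; real `show switch` formatting varies by platform and release.
SAMPLE_SHOW_SWITCH = """\
Switch/Stack Mac Address : 0011.2233.4400 - Local Mac Address
Mac persistency wait time: Indefinite
                                             H/W   Current
Switch#   Role    Mac Address     Priority Version  State
------------------------------------------------------------
*1       Active   0011.2233.4400     15     V01     Ready
 2       Standby  0011.2233.4480     14     V01     Ready
 3       Member   0011.2233.4500     1      V01     V-Mismatch
 4       Member   0000.0000.0000     0      V01     Removed
"""

# Assumed row layout: optional "*", member number, role, MAC, priority, H/W version, state.
ROW = re.compile(
    r"^\s*\*?\s*(?P<number>\d+)\s+(?P<role>\S+)\s+(?P<mac>[0-9a-fA-F.]{14})\s+"
    r"(?P<priority>\d+)\s+(?P<hw_version>\S+)\s+(?P<state>.+?)\s*$"
)


def parse_show_switch(text):
    """Return one dict per member row; header and separator lines are skipped."""
    members = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            row = match.groupdict()
            row["number"] = int(row["number"])
            row["priority"] = int(row["priority"])
            members.append(row)
    return members


def members_not_ready(members):
    """Return the members whose state is anything other than Ready."""
    return [m for m in members if m["state"].lower() != "ready"]


if __name__ == "__main__":
    for member in members_not_ready(parse_show_switch(SAMPLE_SHOW_SWITCH)):
        print(f"Switch {member['number']}: {member['state']}")
```

Run against the sample text, this reports switches 3 and 4. Keeping the parsed rows as plain dictionaries makes it easy to compare the before and after captures once a fix has been applied.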
- Check StackWise port health
Use StackWise-specific port output and physical LED/cable inspection to determine whether the stack ring is closed or broken.
Example pattern only. Adjust for your environment before running.
show switch stack-ports
show interfaces status
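The stack-port table can be checked the same way. The following sketch assumes a simple three-column layout (member number, port 1 status, port 2 status) and treats any status other than OK as a problem; the sample text is illustrative and `down_stack_ports` is a hypothetical helper name.

```python
import re

# Illustrative output only; column headings and status spelling vary by release.
SAMPLE_STACK_PORTS = """\
  Switch #    Port 1       Port 2
  --------    ------       ------
    1           OK           OK
    2           OK          DOWN
    3          DOWN          OK
"""

# Assumed row layout: member number followed by one status word per stack port.
PORT_ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s*$")


def down_stack_ports(text):
    """Return (switch_number, port_number) pairs whose status is not OK."""
    down = []
    for line in text.splitlines():
        match = PORT_ROW.match(line)
        if not match:
            continue  # header or separator line
        switch = int(match.group(1))
        for port, status in enumerate(match.group(2, 3), start=1):
            if status.upper() != "OK":
                down.append((switch, port))
    return down


if __name__ == "__main__":
    for switch, port in down_stack_ports(SAMPLE_STACK_PORTS):
        print(f"Switch {switch} stack port {port} is not OK")
```

An empty result is consistent with a closed ring. Any entry means at least one link is down, and the output alone does not show which cable connects which ports, so the physical inspection is still needed.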
- Review stack logs
Look for member join, election, version mismatch, StackWise link, or hardware messages around the time discovery stalled.
Example pattern only. Adjust for your environment before running.
show logging | include STACK|Stack|SWITCH|VERSION|DIAG
show logging
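When the full `show logging` output has been exported, the same keyword filter can be applied offline and the matches grouped by message mnemonic to see which events repeat. This is a sketch only: the log lines are hand-written for illustration rather than captured from a device, and `stack_related` and `count_by_mnemonic` are hypothetical helper names.

```python
import re
from collections import Counter

# Same keywords as the `include` filter above; like IOS, the match is case-sensitive.
KEYWORDS = re.compile(r"STACK|Stack|SWITCH|VERSION|DIAG")

# Assumed syslog shape: %FACILITY-SEVERITY-MNEMONIC: message text
MNEMONIC = re.compile(r"%([A-Z0-9_]+-\d-[A-Z0-9_]+):")

# Illustrative log lines only.
SAMPLE_LOG = """\
*Mar  1 10:00:01.000: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/1, changed state to up
*Mar  1 10:00:05.000: %STACKMGR-6-STACK_LINK_CHANGE: Stack port 2 on Switch 1 is down
*Mar  1 10:00:06.000: %STACKMGR-4-SWITCH_REMOVED: Switch 3 has been removed from the stack
*Mar  1 10:00:40.000: %STACKMGR-6-STACK_LINK_CHANGE: Stack port 2 on Switch 1 is up
"""


def stack_related(text):
    """Return the log lines that match the stack keyword filter."""
    return [line for line in text.splitlines() if KEYWORDS.search(line)]


def count_by_mnemonic(lines):
    """Count how often each FACILITY-SEVERITY-MNEMONIC appears."""
    counts = Counter()
    for line in lines:
        match = MNEMONIC.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for name, count in count_by_mnemonic(stack_related(SAMPLE_LOG)).most_common():
        print(f"{count:3d}  {name}")
```

A mnemonic that recurs many times, such as a stack link flapping, usually deserves attention before a one-off message.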
- Run supported diagnostics during an approved window
Use platform-supported diagnostic commands for the switch model and IOS XE release. Confirm the exact syntax in Cisco documentation for the target platform before running diagnostics; some diagnostic commands are platform-specific and some tests can be disruptive.
Example pattern only. Adjust for your environment before running.
show diagnostic result switch <switch-number>
diagnostic start switch <switch-number> test <test-id>
- Reload only after evidence points to stale discovery state
A reload interrupts traffic. Use it only when cabling, member state, logs, and maintenance approval support a controlled stack reload, and confirm console or out-of-band access before proceeding.
Example pattern only. Adjust for your environment before running.
reload
- Back up configuration before any destructive reset
If a factory reset or stack configuration reset is being considered, take a current backup first and verify restore access.
Example pattern only. Adjust for your environment before running.
copy running-config startup-config
copy running-config flash:pre-stack-reset-backup.cfg
show startup-config
- Reset configuration only as a last resort
This is destructive. `write erase` erases the startup configuration, and the switch comes up unconfigured after the reload that follows. Use it only with a verified backup, console access, a restore plan, and an approved outage window.
Example pattern only. Adjust for your environment before running.
write erase
reload
Validation
- Run `show switch` and confirm all expected members show Ready state and correct active/standby/member roles.
- Run `show switch stack-ports` and confirm the StackWise ring is healthy with expected ports up.
- Review logs after the fix and confirm discovery, version mismatch, or stack port errors do not continue.
- Confirm downstream links and VLAN trunks are passing traffic after any reload or member recovery.
Logs to Check
- Cisco IOS XE `show logging` output around member join/discovery time.
- StackWise port and member state output from `show switch` and `show switch stack-ports`.
- Hardware diagnostic output from `show diagnostic result switch <n>`.
Rollback and Escalation
- Restore the pre-change running configuration if reset or reconfiguration creates service impact.
- Replace a failed stack cable or isolate a failed member if diagnostics point to hardware.
- Roll back IOS XE only through the organization's standard image management process.
Escalate When
- Escalate before running `write erase` or any factory reset command.
- Escalate when stack cabling looks healthy but member diagnostics fail.
- Escalate when a production reload would affect redundant paths, access switching, or uplinks without a maintenance window.
Edge Cases
- A stack can keep running with an open ring, but it has lost its redundant path, so one more cable or member failure can split it; fix the physical StackWise ring instead of treating it as only a discovery issue.
- A provisioned-but-missing member is expected after hardware is removed or replaced; the entry stays until the provisioned configuration for that member number is cleaned up intentionally.
- Mixed IOS XE versions can keep a member from joining cleanly even when cabling is correct.
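The mixed-version case can be checked mechanically once the software version of each member has been read from `show version`. The sketch below assumes those versions have already been collected into a dictionary keyed by member number; the version strings are example values and `version_mismatches` is a hypothetical helper. It compares every member against the active switch, since that is the version the other members are expected to match.

```python
def version_mismatches(versions, active_member):
    """Return {member: version} for members whose version differs from the active switch.

    versions: dict mapping member number to software version string.
    active_member: member number of the current active switch.
    """
    if active_member not in versions:
        raise ValueError(f"active member {active_member} has no recorded version")
    reference = versions[active_member]
    return {
        member: version
        for member, version in sorted(versions.items())
        if version != reference
    }


if __name__ == "__main__":
    # Example values only; substitute the versions recorded from the stack.
    recorded = {1: "17.9.4", 2: "17.9.4", 3: "17.6.5"}
    for member, version in version_mismatches(recorded, active_member=1).items():
        print(f"Switch {member} runs {version}, active runs {recorded[1]}")
```

Comparing against the active switch rather than the most common version avoids a misleading result when a majority of members carry the wrong image.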
Notes from the Field
- The most useful first check is often physical: StackWise cable seating, port LEDs, and whether the ring is open. The CLI confirms what the rack is already trying to tell you.
- Treat `write erase` as a recovery operation, not normal troubleshooting. A stack reset without a config backup turns a discovery issue into an outage.