This post is a summary of the flap list feature. It is not necessarily a Cisco-only feature, even though I will be focusing on Cisco here. Other vendors, such as Casa, Arris and Huawei, have comparable features, and the list goes on.
A flap list keeps track of cable modems (CMs) with connectivity problems, i.e. modems that are flapping back and forth. It helps decide whether a problem lies with the CM itself or with the upstream or downstream cable path.
There are three classes of problems:
- Reinsertions: A CM re-registers again within the configured insertion time. Too many reinsertions may indicate problems in the downstream cable or an incorrectly provisioned CM.
- Hits and misses: A hit occurs when a CM successfully responds to the MAC-layer keepalive messages that the CMTS sends out. A miss occurs when the CM does not respond before a timeout. Too many misses followed by hits may indicate problems in the upstream or downstream cable.
- Power adjustments: A CM can adjust its upstream transmit power up to a maximum power level. Too many adjustments may indicate a problem with an amplifier in the upstream direction.
Typical use cases:
- If a customer reports a problem but the customer's modem is not on the flap list, problems with the cable plant can be ruled out. The issue is then most likely local to the customer's premises.
- CMs with more than 50 power adjustments a day have a potential issue in their upstream path.
- CMs with roughly the same number of hits and misses and with a lot of insertions have a potential issue in their downstream path.
- If all CMs increment their insertion counts at the same time, this indicates a problem with the provisioning servers.
- CMs with a high number of CRC errors have bad upstream paths.
- Correlating CMs on the same upstream port that show similar flap list statistics (the same number of hits, misses and insertions) may reveal cable- or node-wide issues.
Reference: Cisco flap list PDF (the use cases above are taken from it).
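The use cases can be turned into a rough triage helper. This is a minimal sketch under stated assumptions, not the logic from the Cisco PDF: the `CmStats` record and its field names are hypothetical, and apart from the 50 power adjustments per day, the thresholds (10 insertions as "a lot", hits and misses within 20% of each other as "roughly the same", 100 CRC errors as "high") are values I made up for illustration.

```python
from dataclasses import dataclass

# Only POWER_ADJ_PER_DAY comes from the use cases; the other thresholds are
# hypothetical values chosen for illustration.
POWER_ADJ_PER_DAY = 50
MANY_INSERTIONS = 10
HIT_MISS_TOLERANCE = 0.2  # "roughly the same" = within 20 % of the larger value
HIGH_CRC = 100


@dataclass
class CmStats:
    """Hypothetical per-modem record built from flap-list data."""
    mac: str
    upstream_port: str
    insertions: int
    hits: int
    misses: int
    power_adjustments_per_day: int
    crc_errors: int


def roughly_equal(a: int, b: int, tolerance: float = HIT_MISS_TOLERANCE) -> bool:
    """True if a and b differ by at most `tolerance` times the larger value."""
    larger = max(a, b)
    return larger == 0 or abs(a - b) <= tolerance * larger


def triage(cm: CmStats) -> list[str]:
    """Return the suspected problem areas for one modem (empty if none)."""
    hints = []
    if cm.power_adjustments_per_day > POWER_ADJ_PER_DAY:
        hints.append("upstream path")
    if roughly_equal(cm.hits, cm.misses) and cm.insertions >= MANY_INSERTIONS:
        hints.append("downstream path")
    if cm.crc_errors >= HIGH_CRC:
        hints.append("bad upstream path (CRC errors)")
    return hints


def similar_on_same_port(cms: list[CmStats]) -> list[list[str]]:
    """Group MACs by upstream port and identical (hits, misses, insertions).

    Groups with more than one modem may point to a cable- or node-wide issue.
    """
    groups: dict[tuple, list[str]] = {}
    for cm in cms:
        key = (cm.upstream_port, cm.hits, cm.misses, cm.insertions)
        groups.setdefault(key, []).append(cm.mac)
    return [macs for macs in groups.values() if len(macs) > 1]
```

Grouping on exact (hits, misses, insertions) values follows the use case literally; on real data you would probably bucket the numbers first, since "similar" rarely means identical.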
| KPI (per CM) | Description |
|---|---|
| ccsFlapInsertionFailNum | If a CM registers more than once within a certain period (default: 90 s), the first registration is considered failed, which increments this counter. |
| ccsFlapHitsNum | The CMTS sends a keepalive request every 10 s; each successful response from the CM within 25 ms increments this counter by one. |
| ccsFlapMissesNum | If the response is missing entirely or takes longer than 25 ms, this counter is incremented by one. |
| ccsFlapPowerAdjustmentNum | Incremented when the upstream transmit power is adjusted by more than a threshold (default: 1 dB, although the documentation suggests something closer to 6 dB is often more appropriate). |
| ccsFlapCreateTime | Time at which this modem was added to the flap list. Entries are removed again after the maximum age (default: 7 days). |
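To make the counting rules in the table concrete, here is a toy model of a single flap-list entry. It is a sketch of the rules as described above, not how the CMTS implements them: the `FlapEntry` class and its methods are hypothetical, the defaults (90 s, 25 ms, 1 dB, 7 days) are taken from the table and should be checked against your own CMTS configuration, and aging is assumed to be measured from the creation time.

```python
from dataclasses import dataclass
from typing import Optional

# Defaults from the table; assumptions, check the values configured on your CMTS.
INSERTION_TIME_S = 90.0        # re-registration within this window = insertion fail
KEEPALIVE_TIMEOUT_MS = 25.0    # responses slower than this count as a miss
POWER_ADJ_THRESHOLD_DB = 1.0   # only adjustments larger than this are counted
MAX_AGE_S = 7 * 24 * 3600.0    # entries older than this are aged out


@dataclass
class FlapEntry:
    """Hypothetical model of one modem's flap-list counters."""
    created_at: float
    last_registration: Optional[float] = None
    insertion_fail: int = 0
    hits: int = 0
    misses: int = 0
    power_adjustments: int = 0

    def on_registration(self, t: float) -> None:
        # A new registration within the insertion time marks the previous
        # registration as failed.
        if self.last_registration is not None and t - self.last_registration < INSERTION_TIME_S:
            self.insertion_fail += 1
        self.last_registration = t

    def on_keepalive(self, response_ms: Optional[float]) -> None:
        # None means the modem did not answer at all.
        if response_ms is not None and response_ms <= KEEPALIVE_TIMEOUT_MS:
            self.hits += 1
        else:
            self.misses += 1

    def on_power_adjust(self, delta_db: float) -> None:
        # Adjustments in either direction count if they exceed the threshold.
        if abs(delta_db) > POWER_ADJ_THRESHOLD_DB:
            self.power_adjustments += 1

    def expired(self, now: float) -> bool:
        # Assumption: aging is measured from the creation time.
        return now - self.created_at > MAX_AGE_S
```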
If one of the main KPIs (insertion fail, hits/misses, power adjustments) is significantly higher than the others, that is a strong hint about the nature of the problem.
If hits/misses are highest → the modem keeps going up and down.
If power adjustments are highest → "improper transmit power level setting at the modem end". (Personal story: I used to have exactly that with M-Net. The connection kept going down completely every few days. They reduced the power level, which capped our downstream bandwidth at a lower rate, and after that everything was fine.)
The point is that these KPIs reveal a lot about where a problem lies.
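The "which KPI dominates" rule of thumb could be sketched like this. Both the interpretation of "significantly higher" as at least twice the next-largest value and the use of the miss count to stand in for hits/misses are my own assumptions; the hint texts paraphrase the descriptions earlier in this post, and the names `dominant_kpi` and `HINTS` are hypothetical.

```python
from typing import Optional

# Assumption: "significantly higher" = at least FACTOR times the runner-up.
FACTOR = 2.0

# Hint texts paraphrase the post; "misses" stands in for the hit/miss KPI.
HINTS = {
    "insertion_fail": "re-registration trouble: downstream cable or provisioning",
    "misses": "modem keeps going up and down",
    "power_adjustments": "improper transmit power level setting at the modem end",
}


def dominant_kpi(kpis: dict[str, int], factor: float = FACTOR) -> Optional[str]:
    """Return the KPI that clearly dominates the others, or None."""
    ranked = sorted(kpis.items(), key=lambda kv: kv[1], reverse=True)
    (top_name, top_value), (_, runner_up) = ranked[0], ranked[1]
    if top_value > 0 and top_value >= factor * runner_up:
        return top_name
    return None


kpis = {"insertion_fail": 3, "misses": 40, "power_adjustments": 5}
name = dominant_kpi(kpis)
print(name, "->", HINTS[name])  # misses -> modem keeps going up and down
```

Comparing raw counts of different KPIs is crude, since they accumulate at different rates; it only works as a first hint, which is how the rule of thumb is meant.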
Below is an illustration of misses increasing live on one cable modem. The snmpget command was run three times over the course of an hour.
```shell
# IP = address of the CMTS, OID = the modem's flap-list misses object
snmpget -v2c -c 'REDACTED' IP OID
OID = Gauge32: 1580
snmpget -v2c -c 'REDACTED' IP OID
OID = Gauge32: 1634
snmpget -v2c -c 'REDACTED' IP OID
OID = Gauge32: 1650
```
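Polling the counter and computing the increments between readings is easy to script. A minimal sketch in Python, assuming net-snmp's `snmpget` is on the PATH and prints values in the `= Gauge32: N` form shown above; the function names `parse_snmpget`, `poll` and `deltas` are hypothetical, and `host`, `community` and `oid` are whatever you would pass to `snmpget` yourself.

```python
import re
import subprocess

# Matches the value part of net-snmp output such as "iso.1.3.6... = Gauge32: 1580".
_VALUE = re.compile(r"=\s*(?:Gauge32|Counter32|INTEGER):\s*(\d+)")


def parse_snmpget(line: str) -> int:
    """Extract the numeric value from one line of snmpget output."""
    m = _VALUE.search(line)
    if m is None:
        raise ValueError(f"unexpected snmpget output: {line!r}")
    return int(m.group(1))


def poll(host: str, community: str, oid: str) -> int:
    """Run snmpget once (SNMP v2c) and return the numeric value."""
    out = subprocess.run(
        ["snmpget", "-v2c", "-c", community, host, oid],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_snmpget(out)


def deltas(samples: list[int]) -> list[int]:
    """Increase between consecutive readings."""
    return [b - a for a, b in zip(samples, samples[1:])]
```

For the three readings above, `deltas([1580, 1634, 1650])` returns `[54, 16]`, i.e. 70 additional misses between the first and the last poll.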