Friday, 21 November 2014

Troubleshooting PPPoE Client on Cisco Routers

There are are essentially a handful of stages involved in bringing up a PPPoE client session on a Cisco router, each of which could fail for a distinct set of reasons. This guide takes a walk through the entire process, step by step, highlighting the most common causes of problems at each stage.

Routing


Even though PPP itself is peer to peer, PPPoE is inherently client-server. That means that the connection has to be originated by the client and, in most cases, the client will only do that when it has some traffic to send over PPPoE. Therefore, the router must know the dialer as its next hop interface for some destination, i.e. it must have a route. It sounds trivial but it's surprising how often people go to all the trouble of putting in a perfectly good PPPoE config, then forget to put a default route in for the traffic!

More broadly, though, there are other things that could stop a route from being installed. For example, if you configure a dialer as a backup interface to another interface then there are some gotchas. Shutting down the primary will usually not enable its backup, also you still need a (static) route pointing traffic towards the dialer in order to make it dial - a step which is often forgotten. It's usually best to remove the backup interface configuration while testing the dialer, then re-apply it when that has been proven to work.

Dialer


Once traffic hits the dialer interface, the router will only attempt to bring up a PPPoE session if its dialer becomes activated. If a route exists but the dialer is not trying to connect then enable dialer debugging as follows:

Client#debug dialer packet
Dial on demand packets debugging is on
Client#debug dialer event
Dial on demand events debugging is on

Now generate some traffic that should bring up the link, for example by sending a ping:

Client#ping 8.8.8.8
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.8.8, timeout is 2 seconds:
*Nov  5 21:44:34.147: Di0 DDR: ip (s=10.10.10.10, d=8.8.8.8), 100 bytes, outgoing interesting (ip PERMIT)
*Nov  5 21:44:34.151: Di0 DDR: Cannot place call, no dialer string set.
*Nov  5 21:44:41.491: %DIALER-6-BIND: Interface Vi2 bound to profile Di0
*Nov  5 21:44:41.503: %LINK-3-UPDOWN: Interface Virtual-Access2, changed state to up
*Nov  5 21:44:41.503: Vi2 DDR: Dialer statechange to up
Client#
*Nov  5 21:44:42.195: %LINEPROTO-5-UPDOWN: Line protocol on Interface Virtual-Access2, changed state to up
Client#

*Nov  5 21:44:42.371: Vi2 DDR: dialer protocol up

When the "dialer protocol up" message is received, the dialer has been activated correctly and troubleshooting should move on to the PPPoE stage.

Common problems:

  • If no debug messages are generated at all:
    • Verify that traffic is being routed towards the dialer interface
    • Verify that the dialer-list is configured correctly and referenced by the dialer
    • Verify that the dialer's encapsulation is configured to ppp
  • Di0 DDR: Cannot place call, no dialer string set. 
    • Appears to be spurious, no dialer string is required for PPPoE and it dials anyway
  • Di0 DDR: ip (s=10.10.10.10, d=8.8.8.8), 100 bytes, outgoing uninteresting (no dialer-group defined).
    • Exactly what it says - dialer is not associated with a dialer list. Ensure a dialer-list is configured and is referred to by the dialer with the "dialer-group x" command.
  • Di0 DDR: ip (s=10.10.10.10, d=8.8.8.8), 100 bytes, outgoing uninteresting (dialer-list 2 not defined).
    • Again, exactly what it says. The dialer is associated with a non-existent dialer-list. Either re-point the dialer using the "dialer-group x" command or create a new dialer-list with the appropriate number.
  • Di0 DDR: ip (s=10.10.10.10, d=8.8.8.8), 100 bytes, outgoing interesting (ip PERMIT)
    • Repeated interesting traffic lines but no dialling can occur if the dialer references an empty dialer pool - ensure your PPPoE interface is configured with both "pppoe enable" and "pppoe-client dial-pool-number x". Also ensure that the interface is admin up.
    • Could also be due to PPPoE discovery phase failing, see below.

PPPoE Discovery


In order to bring up a PPP session over Ethernet, a PPPoE session must be set up to create a point-to-point connection over a broadcast Ethernet network. This is established using PPPoE Auto-Discovery, where the PPPoE client (our router) searches for a PPPoE access concentrator which is willing to terminate its connection. This phase should operate as follows:

PPPoE Discovery Phase
The client sends a PPPoE Auto Discovery Initiate (PADI) frame, asking any available access concentrators to make themselves known. The access concentrator(s) then respond(s) with a PADO (offer) frame to indicate its availability. The client then sends a PADR (request) frame to its chosen access concentrator which, all being well, will respond with a PADS (session) message to indicate that the PPPoE session is now up. At any time either device may issue a PADT to close the PPPoE session.

To see what is happening at this stage, run the following command:

Client#debug pppoe packet

*Nov  5 20:38:52.107: pppoe_send_padi
contiguous pak, size 60
FF FF FF FF FF FF 00 01 02 03 04 05 88 63 11 09
00 00 00 10 01 01 00 00 01 03 00 08 2A 00 00 01
00 00 06 CD 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00
*Nov  5 20:38:52.143: PPPoE 0: I PADO  R:0011.2233.4455 L:0001.0203.0405 Fa0/0
contiguous pak, size 66
00 01 02 03 04 05 00 11 22 33 44 55 88 63 11 07
00 00 00 2E 01 01 00 00 01 03 00 08 2A 00 00 01
00 00 06 CD 01 02 00 06 4C 61 62 2D 41 43 01 04
00 10 D5 60 38 B8 05 81 B6 69 29 1B 5E 82 77 A0
5E 91
*Nov  5 20:38:54.155: OUT PADR from PPPoE Session
contiguous pak, size 66
00 11 22 33 44 55 00 01 02 03 04 05 88 63 11 19
00 00 00 2E 01 03 00 08 2A 00 00 01 00 00 06 CD
01 02 00 06 4C 61 62 2D 41 43 01 04 00 10 D5 60
38 B8 05 81 B6 69 29 1B 5E 82 77 A0 5E 91 01 01
00 00
*Nov  5 20:38:54.355: PPPoE 14: I PADS  R:0011.2233.4455 L:0001.0203.0405 Fa0/0
contiguous pak, size 66
00 01 02 03 04 05 00 11 22 33 44 55 88 63 11 65
00 0E 00 2E 01 03 00 08 2A 00 00 01 00 00 06 CD
01 02 00 06 4C 61 62 2D 41 43 01 04 00 10 D5 60
38 B8 05 81 B6 69 29 1B 5E 82 77 A0 5E 91 01 01
00 00
*Nov  5 20:38:54.363: [0]PPPoE 0: O PADT  R:0000.0000.0000 L:0000.0000.0000 Fa0/0
contiguous pak, size 60
00 00 00 00 00 00 00 01 02 03 04 05 88 63 11 A7
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00

Client#


The mark of a successful PPPoE Discovery phase is that a PADS packet is received - at this point the PPPoE session is up and troubleshooting focus should shift to the PPP stage.

Common problems:
  • Only PADI seen:
    • Layer 1 or 2 issue between client and server
    • PPPoE traffic being filtered between client and server, max sessions per MAC exceeded
    • Sometimes possible even if interface is admin down!
  • PADT messages received after PADI or PADR:
    • Restrictions on Access Concentrator (e.g. max sessions per MAC exceeded)
  • PADT message received after PADS:
    • Generally a problem further up the stack, continue troubleshooting
Note - PPPoE discovery may occur without the dialer even being activated.

PPP Negotiation (LCP)


With the PPPoE session up, Link Control Protocol (LCP) will attempt to negotiate the parameters for the actual PPP session. These include useful parameters such as the MRU and authentication type, plus potentially many less applicable parameters such as callback, compression or PPP multilink.

The process is that each side will send proposals (CONFiguration REQuests or CONFREQs) to the other indicating its preferred settings. The opposite device can then respond in one of the following ways:

  • Send a CONFiguration ACKnowledgement (CONFACK) to indicate agreement to the proposal
  • Send a CONFiguration Negative AcKnowledgement (CONFNAK) to indicate a particular setting should be changed and providing a suggested alternative value. If multiple values need to change, there will be multiple CONFNAKs.
  • Send a CONFiguration REJect (CONFREJ) to indicate that either the type or the value of the parameter proposed is completely unacceptable.
Only once all the parameters are agreed between the two peers can the connection be established and higher layer protocols be negotiated. Bear in mind that PPP is inherently peer to peer, so both sides will play the roles of both requester and approver / rejecter.

To see what is happening at this stage, run the following command:

Client#debug ppp negotiation
PPP protocol negotiation debugging is on
*Nov  5 22:11:39.191: %DIALER-6-BIND: Interface Vi2 bound to profile Di0
*Nov  5 22:11:39.207: %LINK-3-UPDOWN: Interface Virtual-Access2, changed state to up
*Nov  5 22:11:39.211: Vi2 PPP: Sending cstate UP notification
*Nov  5 22:11:39.219: Vi2 PPP: Processing CstateUp message
*Nov  5 22:11:39.251: PPP: Alloc Context [670153F8]
*Nov  5 22:11:39.255: ppp22 PPP: Phase is ESTABLISHING
*Nov  5 22:11:39.259: Vi2 PPP: Using dialer call direction
*Nov  5 22:11:39.259: Vi2 PPP: Treating connection as a callout
*Nov  5 22:11:39.263: Vi2 PPP: Session handle[F4000016] Session id[22]
*Nov  5 22:11:39.263: Vi2 LCP: Event[OPEN] State[Initial to Starting]
*Nov  5 22:11:39.267: Vi2 PPP: No remote authentication for call-out
*Nov  5 22:11:39.267: Vi2 LCP: O CONFREQ [Starting] id 1 len 10
*Nov  5 22:11:39.271: Vi2 LCP:    MagicNumber 0x03BD43E3 (0x050603BD43E3)
*Nov  5 22:11:39.271: Vi2 LCP: Event[UP] State[Starting to REQsent]
*Nov  5 22:11:39.323: Vi2 LCP: I CONFREQ [REQsent] id 1 len 19
*Nov  5 22:11:39.323: Vi2 LCP:    MRU 1492 (0x010405D4)
*Nov  5 22:11:39.327: Vi2 LCP:    AuthProto CHAP (0x0305C22305)
*Nov  5 22:11:39.327: Vi2 LCP:    MagicNumber 0x02C2DFB3 (0x050602C2DFB3)
*Nov  5 22:11:39.331: Vi2 LCP: O CONFNAK [REQsent] id 1 len 8
*Nov  5 22:11:39.331: Vi2 LCP:    MRU 1500 (0x010405DC)
*Nov  5 22:11:39.335: Vi2 LCP: Event[Receive ConfReq-] State[REQsent to REQsent]
*Nov  5 22:11:39.395: Vi2 LCP: I CONFACK [REQsent] id 1 len 10
*Nov  5 22:11:39.395: Vi2 LCP:    MagicNumber 0x03BD43E3 (0x050603BD43E3)
*Nov  5 22:11:39.399: Vi2 LCP: Event[Receive ConfAck] State[REQsent to ACKrcvd]
*Nov  5 22:11:39.439: Vi2 LCP: I CONFREQ [ACKrcvd] id 2 len 19
*Nov  5 22:11:39.439: Vi2 LCP:    MRU 1500 (0x010405DC)
*Nov  5 22:11:39.443: Vi2 LCP:    AuthProto CHAP (0x0305C22305)
*Nov  5 22:11:39.443: Vi2 LCP:    MagicNumber 0x02C2DFB3 (0x050602C2DFB3)
*Nov  5 22:11:39.447: Vi2 LCP: O CONFACK [ACKrcvd] id 2 len 19
*Nov  5 22:11:39.447: Vi2 LCP:    MRU 1500 (0x010405DC)
*Nov  5 22:11:39.451: Vi2 LCP:    AuthProto CHAP (0x0305C22305)
*Nov  5 22:11:39.451: Vi2 LCP:    MagicNumber 0x02C2DFB3 (0x050602C2DFB3)
*Nov  5 22:11:39.455: Vi2 LCP: Event[Receive ConfReq+] State[ACKrcvd to Open]
*Nov  5 22:11:39.467: Vi2 PPP: Phase is AUTHENTICATING, by the peer
*Nov  5 22:11:39.467: Vi2 LCP: State is Open
Client#

To explain the above transaction, I have highlighted the two conversations in different colours. "O" indicates an outbound frame, "I" an inbound frame.

The blue conversation is what we (the client) are proposing to the access concentrator. The client sends an essentially empty proposal with only a magic number (used for loop detection). The access concentrator responds with an acknowledgement, after all there's nothing to argue about!

The red conversation (where the access concentrator is proposing settings) is slightly more interesting. The first proposal contains a proposed maximum receive unit (MRU) size of 1492 and a proposal to use CHAP authentication. In the next frame our client sends a NAK message to indicate it would prefer the access concentrator used an MRU of 1500. Following that, the access concentrator sends a new proposal with an MRU of 1500 and CHAP authentication, which our client then acknowledges.

Now that both sides are in agreement, the state changes to "Open", which is PPP talk for "up".

Common Problems:

  • MRU mismatch - many access concentrators are strictly RFC 2516 compliant and allow a maximum MRU of 1492. This is because 1492 bytes IP + 6 bytes PPPoE + 2 bytes PPP is the largest that can fit inside a standard 1500 byte Ethernet payload. It may be necessary to tweak the MTU on the Ethernet interface using the "pppoe-client ppp-max-payload xxxx" command.
  • Authentication type mismatch - if one peer is set for CHAP only while the other is set to PAP only or no authentication, they are not going to talk. A common mistake is forgetting the authentication callin option, which means that the client asks the server to authenticate itself - this is almost invariably not.
General Note:

If you examine the debug output, it will be clear what the local device is saying (marked with "O" for output) and what the other end is saying (marked with "I" for input). Whichever options are being rejected (CONFREJ'd) will be at the root of the problem - just work out which end is rejecting what and the rest should fall into place.

PPP Authentication


PPP has the ability to authenticate either, both or neither of the peers. In a typical deployment, the access concentrator will require the client to authenticate, but will refuse to authenticate itself to the client. This is typically done using CHAP as in the example below (output from "debug ppp negotiation"):

*Nov  5 22:11:39.259: Vi2 PPP: Treating connection as a callout
*Nov  5 22:11:39.263: Vi2 PPP: Session handle[F4000016] Session id[22]
*Nov  5 22:11:39.263: Vi2 LCP: Event[OPEN] State[Initial to Starting]
*Nov  5 22:11:39.267: Vi2 PPP: No remote authentication for call-out
-- SNIP --
*Nov  5 22:11:39.467: Vi2 PPP: Phase is AUTHENTICATING, by the peer
*Nov  5 22:11:39.527: Vi2 CHAP: I CHALLENGE id 1 len 27 from "Lab-AC"
*Nov  5 22:11:39.543: Vi2 CHAP: Using hostname from interface CHAP
*Nov  5 22:11:39.543: Vi2 CHAP: Using password from interface CHAP
*Nov  5 22:11:39.543: Vi2 CHAP: O RESPONSE id 1 len 30 from "pppoeuser"
*Nov  5 22:11:39.835: Vi2 CHAP: I SUCCESS id 1 len 4
*Nov  5 22:11:39.839: Vi2 PPP: Phase is FORWARDING, Attempting Forward

The output clearly shows that this connection is considered to be a "callout", i.e. we are the initiating party. Next, the debug informs us that we do not require the remote party to authenticate.

Following that, the peer (the access concentrator) asks us to authenticate. It sends us a "CHALLENGE", we send a "RESPONSE", then it sends us a "SUCCESS" message, indicating that our credentials were accepted.

Common problems:


  • It is worth noting the lines which state we are using the hostname and password from interface CHAP. This means that the hostname (in practice essentially a username) and password are configured under the dialer interface with the "ppp chap hostname xxx" and "ppp chap password xxxx". If these are not specified, the router will use its actual hostname and the password will be taken from the local user database, under a user named after the the peer's hostname. Usually that's not what you want.
  • A response of "CHAP: I FAILURE id 1 len 25 msg is "Authentication failed"means exactly what it looks like it means. Check both the username and password are configured correctly.
  • A response of "CHAP: Unable to authenticate for peer" indicates that the device does not  know what password to use to authenticate with the peer. This can be because a "ppp chap hostname" is configured but a "ppp chap password" is not, there is not even a "ppp chap hostname" configured or in the case where local usernames are being used it means there's no local username which matches the AC's hostname.

IPCP


Once the PPP session has been brought up (negotiated through LCP), the next stage is to negotiate each of the protocols that will run through the PPP tunnel. Normally this is just IPv4, which is negotiated using IPCP, but there are also IPv6CP, CDPCP and so on, collectively known as Network Control Protocols or NCPs. Below is some example debug (output from "debug ppp negotiation"), with the AC's IP negotiation in blue and the client's IP negotiation in red:

*Nov  5 22:11:39.867: Vi2 PPP: Queue IPCP code[1] id[1]
*Nov  5 22:11:39.883: Vi2 PPP: Phase is ESTABLISHING, Finish LCP
*Nov  5 22:11:39.887: %LINEPROTO-5-UPDOWN: Line protocol on Interface Virtual-Access2, changed state to up
*Nov  5 22:11:39.899: Vi2 PPP: Phase is UP
*Nov  5 22:11:39.903: Vi2 IPCP: Protocol configured, start CP. state[Initial]
*Nov  5 22:11:39.903: Vi2 IPCP: Event[OPEN] State[Initial to Starting]
*Nov  5 22:11:39.907: Vi2 IPCP: O CONFREQ [Starting] id 1 len 10
*Nov  5 22:11:39.907: Vi2 IPCP:    Address 0.0.0.0 (0x030600000000)
*Nov  5 22:11:39.911: Vi2 IPCP: Event[UP] State[Starting to REQsent]
*Nov  5 22:11:39.915: Vi2 PPP: Process pending ncp packets
*Nov  5 22:11:39.915: Vi2 IPCP: Redirect packet to Vi2
*Nov  5 22:11:39.915: Vi2 IPCP: I CONFREQ [REQsent] id 1 len 10
*Nov  5 22:11:39.919: Vi2 IPCP:    Address 1.1.1.1 (0x030601010101)
*Nov  5 22:11:39.923: Vi2 IPCP: O CONFACK [REQsent] id 1 len 10
*Nov  5 22:11:39.923: Vi2 IPCP:    Address 1.1.1.1 (0x030601010101)
*Nov  5 22:11:39.927: Vi2 IPCP: Event[Receive ConfReq+] State[REQsent to ACKsent]
*Nov  5 22:11:39.987: Vi2 IPCP: I CONFNAK [ACKsent] id 1 len 10
*Nov  5 22:11:39.987: Vi2 IPCP:    Address 172.16.0.6 (0x0306AC100006)
*Nov  5 22:11:39.991: Vi2 IPCP: O CONFREQ [ACKsent] id 2 len 10
*Nov  5 22:11:39.991: Vi2 IPCP:    Address 172.16.0.6 (0x0306AC100006)
*Nov  5 22:11:39.995: Vi2 IPCP: Event[Receive ConfNak/Rej] State[ACKsent to ACKsent]
*Nov  5 22:11:40.055: Vi2 IPCP: I CONFACK [ACKsent] id 2 len 10
*Nov  5 22:11:40.059: Vi2 IPCP:    Address 172.16.0.6 (0x0306AC100006)
*Nov  5 22:11:40.059: Vi2 IPCP: Event[Receive ConfAck] State[ACKsent to Open]
*Nov  5 22:11:40.071: Vi2 IPCP: State is Open
*Nov  5 22:11:40.075: Di0 IPCP: Install negotiated IP interface address 172.16.0.6
*Nov  5 22:11:40.131: Di0 Added to neighbor route AVL tree: topoid 0, address 1.1.1.1
*Nov  5 22:11:40.135: Di0 IPCP: Install route to 1.1.1.1

Common Problems:


  • If no IPCP appears at all, it could be that both ends have "ppp ncp passive" set.
  • If you see a message similar to "O PROTREJ [Open] id 2 len 16 protocol IPCP (0x0101000C030601010101)" this is because IP is not configured on the local dialer interface. Usually you just want to add "ip address negotiated" under the dialer to fix this.

At this point your session is up and you should be able to pass traffic OK. If you're still looking, it may help to read my blog post on the theory behind bringing up a PPPoE session.

References


http://tools.ietf.org/html/rfc2516 (PPPoE)
http://tools.ietf.org/html/rfc4638 (PPPoE large MRU)
http://tools.ietf.org/html/rfc1661 (PPP)


2 comments:

  1. Very helpful and well explained.

    ReplyDelete
  2. You save my life for useful info. The best troubleshooting article for PPPOE I have been reading.

    ReplyDelete