Troubleshooting WiFi Connectivity Issues (Vendor Neutral)

Connectivity issues are difficult to troubleshoot in a wireless environment, even when the RF environment is clean enough for wireless clients to communicate (that is, free of interference and high noise levels). There are many parameters to look at, since the medium used for communication is air, which is open to everyone. From the client/station perspective, a connectivity issue usually takes one of the following forms:

  • The station is associated, but not authenticated.
  • The station is associated and authenticated but doesn’t get an IP address.
  • The station is associated, authenticated and has an IP, but is not able to pass traffic.
  • The station is associated and authenticated, but gets an incorrect IP address after authentication.
  • The station gets disconnected intermittently and loses its IP address.
  • The station gets disconnected intermittently but doesn’t lose its IP address.
  • The station is not associating at all.

The symptoms above are largely self-explanatory. A wireless connectivity issue rarely means troubleshooting the wireless side alone; some troubleshooting is needed on the wired side as well.

Common Troubleshooting steps (for all of the above scenarios):

  1. Confirm that the APs are stable: they should not keep rebooting, and their radios should not bootstrap frequently.
  2. Confirm whether the issue is periodic or happens at a specific time of day.
  3. Check whether the issue is specific to a particular type of client.
  4. Confirm whether the issue is confined to a specific location.
  5. Make sure the station drivers are up to date.
  6. Confirm whether the issue started recently or has been present since the day the wireless network was deployed.
  7. Check whether the issue affects only a specific SSID.
  8. If the issue started recently, check for changes made on the wireless or wired network.

If the issue is not resolved or identified after verifying the above steps, take a closer look at both the wireless and the wired environment. Some details for each client/station behaviour:

  1. The station is associated, but not authenticated: Two things need to be looked at when troubleshooting any wireless LAN issue. The first is association and the second is authentication:
    1. Association: Make sure the wireless station sends an association request frame to the Access Point and that the Access Point sends a response back. Association is like plugging an Ethernet cable into a LAN port. If it does not succeed, everything else will fail.
    2. Authentication: For this symptom, wireless authentication is where the focus belongs. First confirm the type of authentication used for the wireless network, then check what could prevent the station from authenticating. Authentication is either open or protected, and protected authentication uses either a pre-shared key or an authentication server. More information on authentication types is available in the Technology Documents.
      1. Open Authentication: With open authentication there is little that needs to be done. The client sends an open authentication request, which should be answered with an open authentication response; in the 802.11 frame exchange this happens just before the association request. In most open authentication cases, a client that appears to be unauthenticated has in fact not received an IP address, so confirm that there are no issues on the DHCP server.
      2. Protected Authentication (WPA/WPA2 PSK): Make sure the passphrase entered on the station matches the one configured on the AP. If it does not, the client will not be able to authenticate and will fail to create the keys required to encrypt unicast and multicast/broadcast data.
      3. Server Authentication (RADIUS/LDAP/AD): When using 802.1X authentication, confirm the following:
        1. Check whether 802.1X authentication is performed against a RADIUS or an LDAP server.
        2. If using a RADIUS server, confirm that the AP/Controller is configured as a client on the RADIUS server.
        3. Confirm that the RADIUS server has a certificate to perform 802.1X authentication.
        4. Confirm that the inner tunnel protocols (MSCHAP, MSCHAPv2, CHAP, etc.) are configured as required and are identical on the AP/Controller and the RADIUS server.
        5. If using LDAP, confirm that PAP is configured on the AP/Controller. If using MSCHAPv2 with LDAP, confirm that the relevant certificates are uploaded to the LDAP server.
        6. Confirm that the authentication port is set to 1812 for RADIUS. If it has been changed, make the matching change on both the AP/Controller and the RADIUS server.
        7. Confirm that the authentication port is set to 389 for clear-text LDAP authentication and 636 for secure LDAP authentication.
        8. Confirm reachability (for example, with ping) of whichever server is in use.
        9. Confirm that the pre-shared key (shared secret) used between the AP/Controller and the RADIUS server is the same on both.
        10. Confirm that the source IP from which RADIUS requests are sent is configured as a RADIUS client on the RADIUS server.
        11. Confirm that the RADIUS service is running on the RADIUS server.
        12. Check the logs on the RADIUS server for the reason behind any RADIUS reject or RADIUS timeout.
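Several of the RADIUS checks above come down to comparing what the AP/Controller is configured to send with what the server expects. The following is a minimal Python sketch of that comparison. The dictionary layout and the function name `radius_mismatches` are hypothetical, chosen only for illustration, and do not correspond to any vendor's configuration format.

```python
def radius_mismatches(nas, server):
    """Compare AP/Controller (NAS) RADIUS settings with the server side.

    Both arguments are plain dicts in a hypothetical layout:
      nas    = {"source_ip", "auth_port", "shared_secret", "inner_method"}
      server = {"auth_port", "clients": {ip: secret}, "inner_methods": set}
    Returns a list of human-readable problems (empty if none were found).
    """
    problems = []

    # Checklist item 6: both sides must agree on the authentication port.
    if nas["auth_port"] != server["auth_port"]:
        problems.append(
            f"port mismatch: NAS uses {nas['auth_port']}, "
            f"server listens on {server['auth_port']}"
        )

    # Items 2 and 10: the NAS source IP must be registered as a client.
    registered_secret = server["clients"].get(nas["source_ip"])
    if registered_secret is None:
        problems.append(
            f"source IP {nas['source_ip']} is not a RADIUS client on the server"
        )
    # Item 9: the shared secret must be identical on both sides.
    elif registered_secret != nas["shared_secret"]:
        problems.append("shared secret differs between NAS and server")

    # Item 4: the inner tunnel method must be enabled on the server.
    if nas["inner_method"] not in server["inner_methods"]:
        problems.append(
            f"inner method {nas['inner_method']} is not enabled on the server"
        )
    return problems


if __name__ == "__main__":
    server = {
        "auth_port": 1812,
        "clients": {"10.0.0.5": "s3cret"},
        "inner_methods": {"MSCHAPv2"},
    }
    nas = {
        "source_ip": "10.0.0.9",  # not registered on the server
        "auth_port": 1812,
        "shared_secret": "s3cret",
        "inner_method": "MSCHAPv2",
    }
    for problem in radius_mismatches(nas, server):
        print(problem)
```

A comparison like this only covers static configuration. The RADIUS server logs remain the authoritative source for why a specific request was rejected or timed out.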
  2. The station is associated and authenticated, but doesn’t get an IP: In a wireless LAN, association and authentication happen at Layer 2 (OSI reference model). As soon as the station is associated and authenticated, Layer 3 communication for that station should start, and for that it needs an IP address. The wireless troubleshooting scope for this behavior is minimal. The next thing to check is the AP/Controller’s interface (web UI or SSH/telnet). Confirm or check the following:
    1. Is the Controller/AP leasing DHCP addresses?
    2. If yes, confirm that the DHCP service is running on the AP/Controller.
    3. Confirm that the subnet has enough free leases for the wireless stations.
    4. Enable logging on the AP/Controller and verify the DORA (Discover, Offer, Request, Acknowledge) process.
    5. Collect wireless or AP/Controller packet captures (using Wireshark or a similar tool) to find exactly where the DORA process breaks.
    6. If DHCP is not served by the AP/Controller, verify that the DHCP server is reachable from the AP/Controller.
    7. Check that the DHCP service is running on the DHCP server.
    8. Confirm that the DHCP server has enough IP addresses configured to lease out to WLAN stations.
    9. DORA relies on broadcast, which does not cross Layer 3 boundaries on its own. Confirm that an IP helper or relay agent is configured on the AP/Controller or on another Layer 3 device in the network.
    10. Collect captures on the server end and follow the DORA exchange to find exactly where it breaks.
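When reading a capture for the DORA checks above, the useful question is which of the four messages is the first one missing. The sketch below assumes the DHCP message types have already been extracted from a capture, in order, into a list of names; the function name and the input format are hypothetical.

```python
# The four DHCP messages of a successful lease, in order.
DORA = ("DISCOVER", "OFFER", "REQUEST", "ACK")


def first_missing_dora_step(observed):
    """Return the first DORA message not seen in sequence, or None if complete.

    `observed` is an ordered list of DHCP message type names taken from a
    capture (hypothetical input format). Retransmissions and unrelated
    message types are skipped.
    """
    next_index = 0
    for message in observed:
        if next_index < len(DORA) and message.upper() == DORA[next_index]:
            next_index += 1
    return DORA[next_index] if next_index < len(DORA) else None


if __name__ == "__main__":
    # The client keeps retrying and nothing answers: look at the server,
    # the relay agent, or the path between them.
    print(first_missing_dora_step(["DISCOVER", "DISCOVER", "DISCOVER"]))
```

Running the same function against captures taken at the client, at the AP/Controller and at the server helps show at which hop the exchange stops.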
  3. The station is associated, authenticated and has an IP address, but is not able to pass traffic: As soon as the client receives an IP address, it is ready to communicate. Check the following to understand why it cannot:
    1. Check first that the client has received an IP address from the correct subnet, with the correct subnet mask.
    2. If the IP address is incorrect, follow the steps in “The station is associated and authenticated, but doesn’t get an IP”.
    3. If the IP address is correct, confirm that the client can reach the default gateway it has received.
    4. If the default gateway is reachable, check whether the station has received a valid DNS server. If yes, check whether the client can do an nslookup for different websites.
    5. If the client cannot do an nslookup, either the query is not reaching the DNS server or the DNS server does not have the information needed to answer. In these scenarios, query google.com: it is a widely known website that any DNS server should be able to resolve.
    6. Check whether the AP/Controller itself can reach the DNS server. If it cannot, the wireless client will not be able to reach the DNS server either.
    7. Try configuring the client statically with a public DNS server (4.2.2.2 or 8.8.8.8).
    8. If the default gateway is on the controller, confirm that the controller itself can reach the internet. If the AP/Controller is the default gateway and cannot reach the internet, the wireless clients will not be able to reach it either.
    9. If the default gateway is a core switch or router, confirm that this switch/router can reach the internet from that particular subnet.
    10. Confirm whether the issue is specific to a particular subnet.
    11. Check the access rules applied to the wireless client on the AP/Controller and confirm that they allow the required communication.
    12. If the AP/Controller maintains user sessions, check that sessions are being created and that traffic is flowing correctly.
    13. Collect Wireshark packet captures to understand the flow and find where it breaks. Captures can be taken at different hops in the network to narrow the problem down.
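The first checks in this list (correct subnet, correct mask, reachable gateway) can be verified from the lease values alone before any packets are sent. The sketch below uses Python's standard `ipaddress` module; the function name `check_lease` and the example addresses are assumptions for illustration.

```python
import ipaddress


def check_lease(ip, netmask, gateway, expected_cidr):
    """Sanity-check the addressing a station received.

    Returns a list of problems; an empty list means the address, mask and
    gateway are consistent with the subnet the SSID is supposed to use.
    """
    problems = []
    expected = ipaddress.ip_network(expected_cidr)
    interface = ipaddress.ip_interface(f"{ip}/{netmask}")

    if interface.ip not in expected:
        problems.append(f"{ip} is outside the expected subnet {expected}")
    if interface.network != expected:
        problems.append(
            f"mask {netmask} gives {interface.network}, expected {expected}"
        )
    # The gateway must be on-link, otherwise the client cannot ARP for it.
    if ipaddress.ip_address(gateway) not in interface.network:
        problems.append(f"gateway {gateway} is not inside {interface.network}")
    return problems


if __name__ == "__main__":
    # A self-assigned 169.254.x.x address instead of the expected lease.
    for problem in check_lease(
        "169.254.3.7", "255.255.0.0", "192.168.10.1", "192.168.10.0/24"
    ):
        print(problem)
```

If this check passes, the remaining steps (gateway reachability, DNS, access rules, sessions) need live tests such as ping and nslookup from the client.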
  4. The station is associated and authenticated, but gets an incorrect IP address after authentication: As in the previous DHCP scenario, association and authentication happen at Layer 2 (OSI reference model), and Layer 3 communication should start as soon as they complete. The wireless troubleshooting scope for this behavior is minimal. The next thing to check is the AP/Controller’s interface (web UI or SSH/telnet). Confirm or check the following:
    1. Is the Controller/AP leasing DHCP addresses?
    2. Enable logging on the AP/Controller and verify the DORA process.
    3. Collect wireless or AP/Controller packet captures (using Wireshark or a similar tool) to find exactly where the DORA process breaks.
    4. Collect captures on the server end and follow the DORA exchange to find exactly where it breaks.
    5. If a dedicated DHCP server is used, check that the Layer 3 device, Controller or AP has the correct IP helper/DHCP relay configured.
    6. Check for rogue DHCP servers in the network.
    7. Verify whether the client has a static IP address configured.
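For the rogue DHCP server check, one approach is to collect the server identifiers seen in DHCP Offer messages in a capture and compare them with the servers that are supposed to answer. The sketch below assumes those identifiers have already been extracted into a list of IP address strings; the function name and the addresses are hypothetical.

```python
import ipaddress


def find_rogue_dhcp_servers(offer_server_ids, authorized_servers):
    """Return the DHCP servers that sent offers but are not authorized.

    `offer_server_ids` holds the server identifier of each DHCP Offer seen
    in a capture; `authorized_servers` lists the legitimate DHCP servers.
    Both are iterables of IP address strings (hypothetical input format).
    """
    allowed = {ipaddress.ip_address(addr) for addr in authorized_servers}
    seen = {ipaddress.ip_address(addr) for addr in offer_server_ids}
    return [str(addr) for addr in sorted(seen - allowed)]


if __name__ == "__main__":
    offers = ["10.0.0.2", "192.168.1.1", "10.0.0.2"]
    # 192.168.1.1 is answering DHCP but is not an authorized server.
    print(find_rogue_dhcp_servers(offers, ["10.0.0.2"]))
```

An unexpected server in the result is a strong hint for why stations end up in the wrong subnet, for example a consumer router plugged into the access VLAN.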
  5. The station gets disconnected intermittently and loses its IP address: This situation means that either the client or the Access Point has sent a de-authentication frame to disconnect the station from the wireless network. If a working client loses its IP address and reconnects (or has to be reconnected), it was no longer connected to the wireless network. Below are the possibilities and a few troubleshooting steps that help explain why this behavior occurs:
    1. Either side can cause the disconnect that leads to the client losing its IP address: the client or the AP/Controller may send a de-authentication frame to end the connection.
    2. The client may send a deauth frame if its wireless NIC drivers are old, or if it does not support a feature that is enabled on the SSID, such as 802.11r, 802.11k, 802.11n or 802.11ac. This mostly happens during roaming, when the SSID is not configured uniformly across APs.
    3. Most wireless vendors provide debug logging for wireless station disconnections. These debug logs also show the reason for the disconnect and are needed to determine whether the station or the AP/Controller sent the deauth frames.
    4. Most wireless vendors also show the management frame exchange between the wireless station and the AP/Controller, which reveals which device sent the deauth request and why.
    5. Collect over-the-air captures and check the negotiation between the wireless client and the AP/Controller.
    6. Check the AP/Controller configuration for a deauth timer. If one is configured, the AP/Controller keeps sending deauth messages to the station as soon as it tries to join the network.
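Once deauth frames have been pulled from debug logs or an over-the-air capture, tallying them by sender and reason code shows quickly which side is ending the connection. The sketch below assumes each frame has been reduced to a (sender MAC, reason code) pair; that input format and the function name are hypothetical. The reason-code texts are a small subset of those defined in the IEEE 802.11 standard.

```python
from collections import Counter

# A few reason codes from the IEEE 802.11 standard (not the full table).
REASON_CODES = {
    1: "Unspecified reason",
    2: "Previous authentication no longer valid",
    3: "Sending station is leaving (or has left) the IBSS or ESS",
    4: "Disassociated due to inactivity",
    15: "4-way handshake timeout",
}


def summarize_deauths(frames, bssid):
    """Count deauth frames per (sender side, reason text).

    `frames` is an iterable of (sender_mac, reason_code) pairs and `bssid`
    is the AP's MAC address on that SSID (hypothetical input format).
    """
    counts = Counter()
    for sender_mac, reason_code in frames:
        side = "ap" if sender_mac.lower() == bssid.lower() else "station"
        reason = REASON_CODES.get(reason_code, f"reason code {reason_code}")
        counts[(side, reason)] += 1
    return dict(counts)


if __name__ == "__main__":
    frames = [
        ("aa:bb:cc:00:00:01", 4),
        ("AA:BB:CC:00:00:01", 4),
        ("11:22:33:44:55:66", 3),
    ]
    for key, count in summarize_deauths(frames, "aa:bb:cc:00:00:01").items():
        print(key, count)
```

A summary dominated by AP-side inactivity codes points toward coverage or power-save problems, while station-side deauths point toward drivers or roaming behaviour on the client.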
  6. The station gets disconnected intermittently, but doesn’t lose its IP address: In rare cases a client is configured with a static IP address that belongs to an overlapping subnet. Such a client may get intermittent internet access but fail to receive ARP responses most of the time. Make sure that all WLAN clients receive their IP addresses dynamically. If static addressing is not the cause, treat this as a classic wired LAN fault and check the following:
    1. Check whether the gateway’s ARP entry on the client always stays the same. If the gateway MAC keeps changing, there could be an issue with the upstream devices in the network.
    2. Verify whether the issue is specific to the VLAN used by WLAN clients.
    3. Confirm whether a wired device plugged into the same VLAN as the wireless clients can access the internet without any hiccups.
    4. Connect another wired device to the same VLAN on the same switch as the WLAN APs, or directly to the WLAN controller, and check whether it works.
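The gateway ARP check above is easier to evaluate when the entry is sampled repeatedly and the samples are compared. The sketch below assumes the gateway MAC has been read from the client's ARP table at intervals and collected into a list; how the samples are gathered is left out, and the function name is hypothetical.

```python
def gateway_mac_changes(samples):
    """Report every point where the gateway's MAC address changed.

    `samples` is a time-ordered list of MAC strings read from the client's
    ARP entry for the default gateway. Returns a list of
    (sample_index, previous_mac, new_mac) tuples.
    """
    changes = []
    previous = None
    for index, mac in enumerate(samples):
        # Normalize case and separator so formatting differences between
        # operating systems are not reported as changes.
        current = mac.lower().replace("-", ":")
        if previous is not None and current != previous:
            changes.append((index, previous, current))
        previous = current
    return changes


if __name__ == "__main__":
    samples = [
        "00:11:22:33:44:55",
        "00-11-22-33-44-55",  # same MAC, Windows-style formatting
        "00:11:22:33:44:66",  # a different device answered ARP
    ]
    print(gateway_mac_changes(samples))
```

Any reported change deserves a look upstream: a duplicate gateway IP, a misbehaving redundancy pair or another device answering ARP for the gateway address.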
  7. The station is not associating at all: A wireless LAN uses air as its medium, and the only way a wireless station can become part of the network is by associating with the SSID. Association is much like a laptop/PC connecting to a port on a wired switch with an Ethernet cable. Below are a few reasons why a wireless station may not associate at all:
    1. The wireless station may not be able to see the SSID when scanning.
    2. If the SSID is hidden, it has to be configured manually on the wireless station. Check that the SSID name (case sensitive) matches the one configured on the AP/Controller.
    3. If the SSID is hidden, the security settings on the station must exactly match those configured on the AP/Controller.
    4. The client may not support the association parameters advertised in the AP’s beacon frames.
    5. Because of regulatory-domain restrictions, the client may not be scanning the channel on which the AP/Controller is broadcasting the SSID.
    6. The client may send an Association Request with parameters the AP cannot accept (for example, unsupported basic rates), in which case the association fails.
    7. The AP/Controller may be configured with a deny-connection timer, which prevents the wireless station from associating at particular times of the day or week.
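Most of the association failures listed above are mismatches between what the AP advertises and what the station is configured for or capable of. The sketch below compares the two sides. The dictionary keys, the function name and the example values are hypothetical and only illustrate the comparison, not any vendor's data model.

```python
def association_blockers(ap, station):
    """List reasons a station would fail to associate with an AP.

    Hypothetical layouts:
      ap      = {"ssid", "security", "channel", "basic_rates_mbps"}
      station = {"ssid", "security", "allowed_channels", "supported_rates_mbps"}
    """
    problems = []

    # SSID names are case sensitive, so compare them exactly.
    if station["ssid"] != ap["ssid"]:
        problems.append("SSID name does not match (check case)")

    if station["security"] != ap["security"]:
        problems.append("security settings differ from the AP/Controller")

    # The regulatory domain limits which channels the station scans.
    if ap["channel"] not in station["allowed_channels"]:
        problems.append(f"station does not scan channel {ap['channel']}")

    # A station has to support every basic (mandatory) rate of the BSS.
    missing = set(ap["basic_rates_mbps"]) - set(station["supported_rates_mbps"])
    if missing:
        problems.append(f"unsupported basic rates: {sorted(missing)}")
    return problems


if __name__ == "__main__":
    ap = {
        "ssid": "Corp-WiFi",
        "security": "WPA2-PSK",
        "channel": 13,
        "basic_rates_mbps": [6, 12, 24],
    }
    station = {
        "ssid": "corp-wifi",  # wrong case
        "security": "WPA2-PSK",
        "allowed_channels": range(1, 12),  # channels 1 to 11 only
        "supported_rates_mbps": [6, 9, 12, 18],
    }
    for problem in association_blockers(ap, station):
        print(problem)
```

In practice the AP-side values come from the beacon frames in an over-the-air capture and the station-side values from the client's wireless profile and driver capabilities.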

2 thoughts on “Troubleshooting WiFi Connectivity Issues (Vendor Neutral)”

  1. The client may send irrelevant Association Response with incorrect Parameters to the AP and hence may not associate.

    Please check this point, it’s incorrect.


    1. Thanks for reading and giving feedback. However, in this case, I was referring to the client sending parameters like un-supported basic rates to the AP and hence may not associate to the AP which is possible.
