Smokeping and Policy Routing
I happen to have two home Internet connections, Comcast and AT&T. I’ve found this can be quite handy when everyone in the family is streaming or gaming, or when one of the connections is having problems of one type or another. I use pfSense as my main router and firewall. It does a pretty good job of load sharing across the two connections so that both are used at the same time, and it fails over automatically when it notices a problem with one or the other. The way it “monitors” a connection is to ping some outside host; if it notices a high amount of loss or latency, it removes that connection from the path. When loss or latency improves, it adds the path back in. Pretty handy.
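To make that behavior concrete, here is a rough sketch of threshold-based failover with a bit of hysteresis. It is not pfSense’s actual implementation; the class name `GatewayMonitor` and every threshold value are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class GatewayMonitor:
    """Decides whether one upstream stays in the load-sharing pool."""

    # Hypothetical thresholds, not pfSense defaults.
    loss_down_pct: float = 20.0     # mark down above this packet loss
    latency_down_ms: float = 500.0  # mark down above this latency
    loss_up_pct: float = 10.0       # bring back only below this loss
    latency_up_ms: float = 200.0    # bring back only below this latency
    online: bool = True

    def update(self, loss_pct: float, latency_ms: float) -> bool:
        """Feed in the latest probe summary and return the new state."""
        if self.online:
            # Either symptom alone is enough to pull the path.
            if loss_pct > self.loss_down_pct or latency_ms > self.latency_down_ms:
                self.online = False
        else:
            # Both must recover before the path is added back in.
            if loss_pct < self.loss_up_pct and latency_ms < self.latency_up_ms:
                self.online = True
        return self.online
```

Using separate "down" and "up" thresholds keeps a marginal link from flapping in and out of the pool on every probe cycle.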
I used to work for a service provider that would actively manage traffic to multiple upstreams. To do this there were two hosts in each PoP that would constantly perform pings, traceroutes, and other tests to destinations on the Internet to find optimal paths. I use those hosts as my monitoring targets in pfSense, as canaries for each of my upstream connections. Generally this works pretty well. However, sometimes there are false positives, where the problem is not with my direct connection but with something further along the path or with the hosts themselves. Troubleshooting these issues can be difficult with only the single viewpoint from my connections at home.
I have always maintained a virtual machine at some provider to host my website or to perform troubleshooting from. It is incredibly handy to “get a 2nd opinion” and have a view from the outside. Additionally, running something like smokeping from this host provides invaluable data. I’ve had that host monitor my two internet connections for a while.
That’s supremely helpful in showing how my host in AWS can reach the interfaces of both internet connections. It helps correlate when they’re having issues and tells me whether the problem is the local link or something more widespread. While that adds data to help identify issues, there’s still no good data about how each connection performs in general. Smokeping supports remote probers. That is, a separate process probes from its own point of view and then reports its results back to the primary smokeping instance. Those results are then displayed along with everything else.
Initially I configured a single instance at home, on my local LAN, to ping hosts similar to the ones my primary instance was probing. That gave good data, but I thought it would be really great to probe the same destinations and separate the results out across both of my upstream connections. Generally there are a few ways to accomplish this, based on source address, destination address, or some attribute of the packet itself. Put a rule in an ACL to match one of those and force the packet along a particular path. Easy enough.
Since I’m firmly in the containerized app camp these days, I wanted a setup where each container (of my prober) could run on the same machine. My initial thought was to put multiple interface IPs on the host machine, use the built-in iproute2 policy routing capabilities to SNAT traffic, and then have a rule in the firewall match the source and force it out a specific internet connection. This turned out to be a little more difficult than I had planned. Mucking around with iptables, Docker’s built-in rules, and SNAT looked pretty complicated. Not impossible, but not the quickest and easiest way to do things. I then realized that smokeping’s primary “ping” tool is fping.
A quick look at the docs and we find a way to set the Type-of-Service (ToS):
-O, --tos=N
Set the type of service flag (TOS). N can be either decimal or hexadecimal (0xh) format.
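As a quick illustration, a marked probe could be launched like this. This is only a sketch: the helper name `fping_command`, the probe count, and the target address (a documentation-range IP) are my own placeholders, and it only builds the argument list rather than running fping.

```python
import shlex
from typing import List


def fping_command(target: str, tos: int, count: int = 20) -> List[str]:
    """Build an fping invocation that marks its probes with a ToS byte."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("ToS must fit in one byte (0-255)")
    # -q: quiet, -C: send <count> pings and report per-target results,
    # -O: the ToS value, passed here in hexadecimal.
    return ["fping", "-q", "-C", str(count), "-O", hex(tos), target]


# 192.0.2.1 is a placeholder target; 0x50 is just an example ToS value.
print(shlex.join(fping_command("192.0.2.1", 0x50)))
```

The resulting list can be handed to `subprocess.run` on a machine where fping is installed.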
Excellent, now we just need to get smokeping to use that when it does the probes. The documentation for smokeping’s fping probe did not show a setting for it, though. I was actually kind of surprised, since the probing modules are generally pretty thorough. So I went to look at the source code to see whether there was an undocumented setting, or at least how the probe module was set up, in case I could just add it myself.
After some scrolling I found this bit:
Excellent, it does support setting the ToS value, now to implement!
So now I know I can set the ToS value on the probes, and I can just set a different value depending on which connection I want to forward traffic along. I just need to figure out what values I can use. I generally don’t use any ToS/DSCP settings on my LAN; they’re just not really needed. I decided to take a peek to see if anything was being set by any applications running on any devices on my LAN. I ran tcpdump on the LAN interface of the firewall to see what was being used. Quite a bit actually.
Each time I saw a packet with a ToS value set, I’d add it to the list of exceptions. After an hour or so I wasn’t seeing any new values and figured I had a pretty good list of the frequent ToS settings.
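That kind of tally can also be automated. The sketch below assumes the capture was taken with `tcpdump -v`, which prints a `tos 0x..` field in each IP header line. The sample lines are invented for illustration and are not from my capture.

```python
import re
from collections import Counter

# Matches the "tos 0x.." field in verbose tcpdump IP header output.
TOS_RE = re.compile(r"\btos (0x[0-9a-fA-F]+)")


def tally_tos(lines):
    """Count how often each ToS byte appears in tcpdump -v output."""
    counts = Counter()
    for line in lines:
        match = TOS_RE.search(line)
        if match:
            counts[int(match.group(1), 16)] += 1
    return counts


# Made-up sample lines in the shape tcpdump -v prints.
sample = [
    "12:00:00.000001 IP (tos 0x0, ttl 64, id 1, offset 0, flags [DF], proto TCP (6), length 60)",
    "12:00:00.000002 IP (tos 0x48, ttl 64, id 2, offset 0, flags [DF], proto UDP (17), length 120)",
    "12:00:00.000003 IP (tos 0x48, ttl 64, id 3, offset 0, flags [DF], proto UDP (17), length 120)",
]

for tos, seen in tally_tos(sample).most_common():
    print(f"{tos:#04x} seen {seen} time(s)")
```

In practice the lines would come from a saved capture or from tcpdump’s output piped into the script.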
This is what I saw the most of:
Hex Value | Class Name | Decimal Value |
---|---|---|
0x48 | Low-Latency Data | 72 |
0x10 | Minimize-Delay | 16 |
0x0 | Normal Service | 0 |
0x80 | Real-Time Interactive | 128 |
0xb8 | Telephony | 184 |
0xc0 | Network Routing | 192 |
0x2 | Minimize Cost | 2 |
0x28 | High Throughput | 40 |
0x30 | High Throughput | 48 |
I decided to use the following for my fping probe packets:
Hex Value | Class Name | Decimal Value | DSCP Name | Provider |
---|---|---|---|---|
0x50 | Low-Latency Data | 80 | AF22 | Comcast |
0x58 | Low-Latency Data | 88 | AF23 | AT&T |
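The relationship between the columns is plain bit arithmetic: the DSCP is the upper six bits of the ToS byte, and an AFxy name packs the class x into the top three DSCP bits and the drop precedence y into the next two. A small sketch to check the values above (the function names are my own):

```python
def tos_to_dscp(tos: int) -> int:
    """DSCP is the upper six bits of the ToS byte (the low two are ECN)."""
    return (tos & 0xFF) >> 2


def af_name(dscp: int) -> str:
    """Name an Assured Forwarding code point: AF<class><drop precedence>."""
    af_class = dscp >> 3        # top three bits
    drop = (dscp >> 1) & 0b11   # next two bits
    # AF code points have the lowest bit clear, class 1-4, drop 1-3.
    if dscp & 1 or not 1 <= af_class <= 4 or not 1 <= drop <= 3:
        raise ValueError(f"DSCP {dscp} is not an AF code point")
    return f"AF{af_class}{drop}"


for tos in (0x50, 0x58):
    dscp = tos_to_dscp(tos)
    print(f"ToS {tos:#04x} = decimal {tos} = DSCP {dscp} = {af_name(dscp)}")
```

Running it confirms that 0x50 maps to AF22 and 0x58 maps to AF23.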
In the Slaves config of the smokeping master I added the following config:
In the pfSense configuration I added LAN rules that match on the DSCP names AF22 and AF23 and manually force the gateway to the matching upstream.
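Conceptually, those rules are just a lookup from the packet’s DSCP marking to a gateway. The sketch below models that decision. It is not pfSense configuration, and the gateway names are hypothetical labels.

```python
# DSCP code point -> gateway. The gateway names are hypothetical labels.
POLICY = {
    20: "COMCAST_GW",  # AF22, ToS 0x50
    22: "ATT_GW",      # AF23, ToS 0x58
}


def select_gateway(tos: int, default: str = "DEFAULT_GW") -> str:
    """Pick the upstream for a packet based on its ToS byte."""
    dscp = (tos & 0xFF) >> 2  # drop the two ECN bits
    # Unmarked or unrecognized traffic follows the normal routing policy.
    return POLICY.get(dscp, default)


for tos in (0x50, 0x58, 0x00):
    print(f"ToS {tos:#04x} -> {select_gateway(tos)}")
```

Anything not carrying one of the two probe markings falls through to the default, so regular LAN traffic keeps using the existing load-sharing setup.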
Now we just fire up the probers and see what happens!
Cool!