Configuring Ubiquiti Unifi Gear for Starlink

This is the final blog post in a three-part series describing my experience with SpaceX's Starlink to date. Here are the links to the first and second blog posts.

My tweet showing 200 Mbps download speed while testing my demarc cable.

After thawing out from the Starlink dish install with my brother-in-law, sending in the support ticket about the mount (that I still have not received a reply to), and investigating dinner options, I sat down on the couch with my laptop to figure out how I was going to leverage the Starlink uplink to meet my user experience requirements. The first step was to make sure that my demarc work was successful. I went downstairs to the gear closet, powered up the Starlink PoE injector and router, and ran another speed test on my phone. 200 Mbps down! I'd say my home wiring was working just fine. I unplugged the ethernet from the Starlink router and plugged it into WAN2 on the Ubiquiti Unifi Secure Gateway (USG) PRO 4 and headed back upstairs. An important note: if you are not using the Starlink WIFI router, you can no longer see any information or stats about your Starlink connection. No more debug stats or other useful insight. I don't know if this affects things like service upgrades or data collection for SpaceX. But I was not interested in having a double NAT setup that might impact user experience.

Ubiquiti Unifi Controller ISP Load Screenshot. The large throughput usage spike is during an ESO raid (ESO is Elder Scrolls Online. A raid is a run with 12 people to beat a Trial).

If you are responsible for your household's connectivity to the outside world like I am, you know how any blip in service can result in a crisis. We've been getting by with 26 Mbps down and 6 Mbps up on point-to-point WIFI for most of the pandemic. However, when my home users are pushing the limits of my service and my Internet Service Provider (ISP) is loaded with users at the end of the day, gaming is almost impossible. I've pulled out my hotspot so I don't lag out during a raid and cause a group wipe. I've been extremely impressed with Zoom working well in the worst of internet connectivity. I can't say the same for Teams or Google Meet. There are plenty of calls I'm on where I don't see anyone's video on those platforms. I assume this is how they deal with my poor connectivity.

This blog outlines the design I've implemented to maximize my home's user experience. As with any design, yours should align to the requirements you have. The other people in my household use the internet mainly for streaming Netflix, YouTube TV, Hulu, and Amazon Prime. They also have the occasional telehealth appointment or fun virtual gathering. There's definitely a lot of web browsing, social media, and messaging platform use. My user profile is that of any home worker plus online gaming requiring real-time inputs. I don't stream due to many factors, but mostly because of my internet speeds. I do record my screen and audio locally to review the runs, and I log the raids, which increases the work my client does, but it doesn't seem to be a major factor when considering the connectivity requirements. Maybe one day with Starlink I will be able to stream!

My rural ISP and Starlink have the same latency, 40-60 ms. This is completely satisfactory for online gaming unless you're a professional esports athlete; in that case I wouldn't recommend it. I'd probably sit in the data center with a cross-connect to the cage where the game is hosted if that was my job! However, the biggest issue with Starlink right now is that I only have about 73% coverage, and I experience a 30-second disconnect every 15-20 minutes during the day. How do I know I have 73% coverage? There are a number of sites with Starlink maps and other Open Source projects. Satellite Map Space and Starlink Coverage on GitHub are the two I have seen.

Clearly, 30-second disconnects multiple times an hour do not work for online meetings and gaming. Therefore, Starlink cannot be my internet connection for those activities. But I really want to have the much faster upload and download speeds for everything else! Based on the rest of the household's user profile, the majority of their activities can handle those disconnects. Plus, if all of that traffic is removed from the rural ISP connection, it frees up bandwidth and lowers latency. My plan was to route all traffic except for meetings, virtual gatherings, telehealth, and online gaming over Starlink.

Screenshot from the Unifi Secure Gateway Details page. The IP Address has been removed from the image.

Enabling Starlink on WAN2 on the Unifi USG PRO 4 was easy. Navigate in the UniFi Controller to Settings -> Create New Network. Label the link for WAN2, enable Failover only, accept the rest of the defaults, and Save. After that, I was able to see that WAN2 received a Starlink IP address and traffic was moving over the link. Things were moving quickly and easily. Red flag! The pain started as I was clicking around the Unifi Controller GUI to figure out how to route my different subnets over WAN1 or WAN2 based on my preference aligned to the user experience requirements. This is called Policy Based Routing (PBR). After Googling I found that configuring PBR is not an option in the GUI and it must be configured via a JSON file stored on the controller. I also noted the irony that I only learn about and configure Policy Based Routing on Sundays from the couch. The last time was for a VMworld demo!

Most of my Googling resulted in reading about people's disappointment that PBR was not in the GUI. The complaints on the Ubiquiti forums began years ago. Rather than including the capability in the GUI, Ubiquiti wrote a Support and Help article on how to configure PBR through the Unifi Secure Gateway interface and JSON.

After reading the above article and the related article about USG Advanced Configuration with JSON multiple times, Googling more to see if people posted their JSONs for PBR, and sighing loudly, I went to work. I have multiple subnets for various purposes in my house. But none of the VLANs were for users I wanted to give uninterrupted internet access to (since this is usually a given). The first step was to create a new network in the Unifi Controller GUI. I also created a WIFI SSID for the new VLAN to enable easier testing since I wasn't near an ethernet port. The first test of my capabilities was to route the new VLAN over WAN2.

I am not a fan of static routes. They feel lazy (like GOTO statements in BASIC; clearly my elementary school coding experience was traumatic) and should only be used as a last resort. Instead, I wanted to properly configure failover groups between my two uplinks and designate a primary route for my VLANs based on user requirements. I also wanted to use pointers to groups of VLANs rather than having to explicitly define subnets, but from what I could find, subnets must be explicitly defined in JSON Policy Based Routing. Working with what I was given, I opened an SSH session to my USG and started typing. This part was straightforward. I scrolled to the "Routing Traffic to Different Load Balancing Groups Based on the Source Network" part of the Policy Based Routing help article, substituted in my subnet for the session state VLAN, and typed the commands into my USG. I connected my phone to the newly created WIFI SSID and ran a speed test. 110 Mbps down. Success!

If you don’t want to navigate to the Unifi help article at this time, here is a copy paste of the example code to run directly on your USG:

configure
set load-balance group wan2_failover interface eth2 failover-only
set load-balance group wan2_failover interface eth3
set firewall modify LOAD_BALANCE rule 2503 action modify
set firewall modify LOAD_BALANCE rule 2503 modify lb-group wan2_failover
set firewall modify LOAD_BALANCE rule 2503 source address 192.168.1.0/24
commit ; exit

Note: eth2 is WAN1 and eth3 is WAN2 on the USG PRO 4. Use the show interfaces command when connected via SSH to your USG to verify which ethernet interfaces are your WAN ports.

Second note, in case you are curious why we couldn't stop here: whenever the USG reboots, any configuration applied in the command line interface is erased. This is why we need the custom JSON file stored on the controller. The controller applies the configuration from the JSON upon a provisioning event.

According to the help article, the next step is dumping the USG config to a text file and using this text file to populate the JSON on the controller. This is where the fun really began. Here’s why:

1. I force myself to use Linux without a GUI at home to improve my skills. Generally, doing anything on Linux requires learning something new first.
2. JSON is not fun.
3. Editing JSON in Nano is really not fun.

The Ubiquiti article mentioned above goes into detail on creating and using the config.gateway.json file. If you copy and paste the entire config dump from the USG into your config.gateway.json file, you can no longer control your Unifi setup through the controller. With my obvious Windows background and preference for a GUI, this was a no-go for me. Therefore, I had to delete everything from the JSON file that the USG created except for the sections that the above commands generated for my Policy Based Routing. Also, I had to keep track of all the {s and }s and still retain the section headers for the JSON to work.
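
If you'd rather script the trimming than count braces in Nano, here is a rough sketch. The file names are hypothetical, and the tiny inline sample stands in for the real dump (Ubiquiti's advanced configuration article documents mca-ctrl -t dump-cfg for producing the full running config as JSON on the USG):

```shell
# Hypothetical stand-in for the real dump. On the USG itself,
# "mca-ctrl -t dump-cfg" from Ubiquiti's advanced-configuration
# article produces the full running config as JSON.
cat > usg-dump.json <<'EOF'
{"firewall": {"modify": {}}, "load-balance": {"group": {}}, "service": {"dns": {}}}
EOF

# Keep only the sections the PBR commands touched. Everything else is
# managed by the controller and must not appear in config.gateway.json.
python3 - usg-dump.json <<'PY' > config.gateway.json
import json, sys

full = json.load(open(sys.argv[1]))
trimmed = {k: full[k] for k in ("firewall", "load-balance") if k in full}
json.dump(trimmed, sys.stdout, indent=4)
PY

cat config.gateway.json
```

The same idea works with any JSON-aware tool; the point is that only the sections your CLI commands created should survive into config.gateway.json.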

In an effort to avoid all of this editing, I Googled some more to find any JSON file posted that already accomplished what I wanted to do. Unfortunately, those examples used static routes rather than load balancing. The one example I found that did use load balancing didn't work, even though it was a valid JSON file. More on this later.

So, back to Nano over SSH. You can see my true procrastination personality here! After avoiding editing by Googling, then editing, and counting brackets, I was ready to apply my new JSON configuration that would accomplish (hopefully) what the six simple commands above did. Spoiler alert: It didn’t.

To apply the new JSON file, there are two steps.

1. Restart the unifi service on the controller machine. In Ubuntu the command is

sudo service unifi restart

Note: if this takes you as many tries as it does me, press the up arrow a few times and enter to restart your service.

2. Navigate into the Unifi Controller and force a provision to the USG under Devices -> USG -> Config -> Manage Device -> Provision (or trigger this from the USG CLI).

If your Unifi Controller is pushing an invalid JSON file to your USG, you'll notice your USG is stuck in the provisioning phase. The only way to stop this is to delete or rename your JSON file; the provisioning phase will then complete successfully. Most times, the USG will reboot after this. The result of the USG reboot is that the internet is unavailable, which will elicit a loud huff from anyone trying to watch YouTube TV while you curse at your screen. There are many ways to ensure your JSON file is valid. Since I was using my high-tech Nano editor, I preferred a SaaS solution rather than wrestling with Linux to install a development tool. This site is recommended by Ubiquiti: https://jsonlint.com/
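
If you'd rather catch a bad file before the controller ever pushes it, a local syntax check works too. This is a minimal sketch using Python's json.tool module from the standard library; the sample file and its path are made up for illustration:

```shell
# Write a sample file so the check is self-contained; in practice you
# would point this at your real config.gateway.json on the controller.
printf '{"load-balance": {"group": {}}}' > /tmp/config.gateway.json

# python3 -m json.tool exits non-zero on invalid JSON (the same verdict
# jsonlint.com gives), so you can gate the unifi restart on it.
if python3 -m json.tool /tmp/config.gateway.json > /dev/null; then
    echo "JSON OK - safe to restart unifi and force a provision"
else
    echo "Invalid JSON - fix it before provisioning" >&2
fi
```

Note that a syntactically valid file can still be the wrong USG config (as I found out), but this at least rules out the missing-comma class of failure.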

After the dreaded USG reboot and subsequent cursing, I pasted my JSON file into the SaaS verification tool. I was missing curly brackets. I added those in, hoped all my commas were right, restarted the service, and forced a provision. Still stuck in provisioning.

I reviewed the one JSON example for a similar load balancer setup that I found online. I edited my file to match that format (which the author said was verified by a JSON validator) and pasted my file into the JSON verification tool on the site above. It did return that the JSON was valid. When I force provisioned this version of the config file, the USG was still stuck in provisioning in the Unifi Controller. Then I started reading about USG error logs.

One person online was of the opinion that verbose / debug logging had to be enabled to troubleshoot JSON config issues. That sounded like a lot of work, so I kept Googling. Another person suggested connecting via SSH to the USG and running

tail -f /var/log/messages


That I was down for! Even though the USG is in a provisioning state, you are still able to SSH to it and run commands. My tail -f command flooded my terminal window with error messages that hurt my brain to decipher. Googling these messages provided no insight. So, I stared at them longer. Finally, I figured out what was happening. Even though the JSON file was valid according to the JSON verification tool and copied from a forum post of someone who was quite confident in their JSON abilities, it was wrong. It needed another set of curly brackets because the USG was interpreting my load balancer rules as an entry under the firewall section! @#*($%&*(^%*(@#!!!

Back in Nano I counted curly brackets again, added in another set, moved around a comma or two, crossed my fingers, and copied and pasted it to the JSON verification tool. It told me it was valid. I restarted the service for the umpteenth time, force provisioned the USG, and threw up my arms with a victory cry after it provisioned successfully. I tested my download speeds on my phone that was connected to my special VLAN, sure enough, they were FAST. I tested the download speeds of my computer that was supposed to route through the rural ISP, sure enough, they were SLOW!

Feeling smug, I tweeted my success.

Then I thought, wait, I actually want all of my VLANs to route through WAN2 and fail to WAN1, and I want my special VLAN to route through WAN1 and fail to WAN2. You're probably thinking, why not swap ports for the uplinks and call it a night? Well, I would, except my rural ISP authenticates by MAC address. If I swapped ports, I'd have no rural ISP connectivity because it would see a different MAC address. And it was 9 pm on a Sunday - there wasn't anyone to answer the phone and make this change.

Therefore, I had to add more firewall rules to my oh-so-fragile JSON file. I didn't want to create yet another network to test with, so I figured the guest network subnet could be the second one to move. Remember how I tested whether my traffic was going out the correct uplink? Yes, a speed test. The difference between 8 Mbps and 150 Mbps is a pretty good test… or so I thought. Well, I forgot that I had set up a bandwidth restriction on the guest network subnet a month ago to prevent video uploads to YouTube from taking down my internet. I spent another 15 minutes of complete frustration and Googling trying to determine if Unifi Guest networks are compatible with Policy Based Routing. I couldn't find information stating one way or another. I decided to go all in and added a firewall rule for the subnet that all of my NVIDIA Shields are on. I braced myself for the fifth loud huff of the night as YouTube TV was interrupted. But it never happened, and that picture sure looked crisp running over Starlink!

It was only then that I remembered my bandwidth limit rule on the Guest network and went through the exercise of removing all references to it in the Unifi Controller GUI so I could delete it FOREVER! Once deleted, my speed test for the Guest network showed 80 Mbps. I guess that was fine with two NVIDIA Shields streaming and the Ubuntu updates downloading. Way better than the 6 Mbps download speed I enforced before. I made a few more changes to switch ports connected to RJ45 jacks in my office to use the special VLAN in preparation for work and leading a raid the next day and called it a night.

I know you’ve been waiting with bated breath to see my masterpiece of a JSON file so you can avoid everything you just read here. Ta da!

{
    "firewall": {
        "modify": {
            "LOAD_BALANCE": {
                "rule": {
                    "2606": {
                        "action": "modify",
                        "modify": {
                            "lb-group": "wan2_failover"
                        },
                        "source": {
                            "address": "10.10.0.0/16"
                        }
                    },
                    "2607": {
                        "action": "modify",
                        "modify": {
                            "lb-group": "wan2_failover"
                        },
                        "source": {
                            "address": "10.50.1.0/24"
                        }
                    }
                }
            }
        }
    },
    "load-balance": {
        "group": {
            "wan2_failover": {
                "flush-on-active": "disable",
                "interface": {
                    "eth2": {
                        "failover-only": "''"
                    },
                    "eth3": "''"
                },
                "lb-local": "enable",
                "lb-local-metric-change": "enable"
            }
        }
    }
}

Seems rather insignificant for all that work, huh? It could certainly be fancier to include rules around testing the connection and how many ping failures before initiating the failover. But, that requires working out some math considering how frequently the Starlink connection goes offline to prevent unnecessary failovers. With today’s and future SpaceX Starlink launches, my experience might change, so I just left it as it is.
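
For anyone who does want to attempt that tuning, EdgeOS (which the USG firmware is derived from) exposes route-test settings on load-balance groups. The following is an untested sketch based on the EdgeOS load-balancing documentation, not something from my running config; the ping target, interval, and failure count are placeholders you would tune against Starlink's real outage pattern:

```shell
configure
# Ping a target through WAN2 (eth3) to decide whether the link is up.
set load-balance group wan2_failover interface eth3 route-test type ping target 8.8.8.8
# Check every 10 seconds...
set load-balance group wan2_failover interface eth3 route-test interval 10
# ...and only mark the link down after 5 consecutive failures, so a
# short Starlink blip doesn't trigger a failover storm.
set load-balance group wan2_failover interface eth3 route-test count failure 5
commit ; exit
```

With a 10-second interval and 5 consecutive failures required, a single 30-second Starlink dropout (about 3 failed checks) should not flip the link, while a longer outage still triggers failover.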

Someone with far more patience than me might figure out the IP address ranges and ports associated with Zoom, Teams, Google Meet, telehealth sites, Elder Scrolls Online, and everything else that needs a stable connection, and end up with a much longer JSON than you see here. That would forgo the need to manually change networks / SSIDs when I want to do something other than download and upload large files or stream content. But this works for me. And, I'm optimistic about Starlink improving their service quickly!

SpaceX Starlink Roof Installation on the Colorado Plains

This is part two of the SpaceX Starlink Blogs Series. Read part one here!

My brother-in-law texted me early afternoon to say he was coming over to do the install in an hour. I couldn’t hear the wind howling, the snow was melted from the roof, and the grass was swaying in a normal breeze. Everything was a go!

Most of what is necessary for the install is included in the Volcano Roof Mount kit – including a carrying bag to safely bring the dish onto the roof! The state of drills and drill bits in my house is questionable. Where do those batteries keep wandering off to?! I suggested my brother-in-law bring his own drill, drill bits, and stud finder. There was no need to purchase silicone for the roof as the kit comes with sticky rubber to place in the drilled-out holes on the roof and set beneath the mount before the lag bolts are fastened to the roof. The only other items I needed were a socket for the lag bolts, a socket wrench, and a 6-inch extension. There's no shortage of these in the shop!

I collected the Starlink dish in its carrying case, the Volcano Roof Mount install kit, and the POE injector and WIFI router, and carried it all outside for the install. As I was setting up and reviewing the cabling at the demarc box on the side of the house, I felt the wind pick up. Uh oh. My brother-in-law hadn't arrived yet, and mid-afternoon on the plains can turn into torture with cold, biting wind. I hoped for the best and hunted for the tools I needed to clean up the cabling at the demarc box and make room for the large Starlink ethernet cable.

Upon his arrival, I showed him the layout for the cabling and how we would run it down the roof and the side of the house. We discussed the dish positioning ad nauseam while he was standing on the ground and after he climbed onto the roof. I was probably being extra paranoid about obstructions based on my first impressions of the Starlink service. The dish needed clear visibility to the northwest, north, and northeast. However, we can see the Starlink satellites flying past the house to the west at night, so I didn't want to block that view either. Once we selected the location for the dish, it was time to find a stud to attach the Volcano Roof Mount. I have no idea what kind of stud finder works through roofing material, but it wasn't the one he brought. In fact, I can't think of a time a stud finder has confidently located a stud in a wall. I've always just knocked. And that's what he did on the roof. He couldn't hear the difference between hollow and solid because of the wind howling in his ears. Acoustics are interesting. I could hear it perfectly standing on the ground. After locating the studs on the roof with our high-tech methods, it was time to mark and drill.

My brother-in-law torquing down the lag bolts for the Volcano Roof Mount on the very windy Colorado plains.

At this point I had been standing in the shade, enduring the increasing wind for an hour. My role was mainly moral support and safety supervisor after the drilling began. Even wearing Carhartt work gloves and shoving my hands into the pockets of my Carhartt heavy duty jacket, my fingers were feeling numb and stinging from the cold at the same time. I regretted not putting on proper winter gear. But, it is impossible to do cabling with winter gloves on. Plus, I wasn’t the one on the roof trying to use power tools with uncooperative Starlink “sealing tape” without protection from the elements. I had to tough it out!

The first test through the Starlink WIFI Router after the dish was mounted on the roof.

I was feeling concerned about his safety with the strong gusts while he was attaching the dish to the roof mount. He dropped the ethernet cable down to me and I quickly plugged it into the POE injector that was ready to go in the garage. A few minutes later I was connected to the internet and ran a speed test from my phone through the Starlink WIFI router. 130 Mbps down! Success!

Now we were working as fast as we could because the cold wind was miserable. He was attaching the cabling to the roof and down the side of the house with the Volcano Roof Mount kit cable ties. The screws for the cable ties were entirely too short, so he switched to zip ties to run it down the side of the house. In theory, zip ties were easy because he ran the Starlink ethernet cable along the existing ethernet cable from my other internet service provider. However, the zip ties in the package were brittle from previous outdoor projects and kept breaking when tightened. &%#(*&@#$^! I was cutting the opening of the demarc box a bit more to shove the Starlink RJ45 end into it without damaging the other cables inside. My fingers weren't functioning, and the pain from the biting cold was almost unbearable. I had to stop a couple of times to run into the shop and hold my hands under hot water to be able to use them again. Finally, the Starlink ethernet cable fit into the demarc box. I connected it to an existing Cat6 cable with an RJ45 coupler, verified none of the cables were pinched, screwed the cover back on, and sealed it with silicone. We used a few more zip ties to make sure the excess cable wasn't flopping around, and he quickly got into his car, cranked the heat, and sped away. My sister had checked in twice, so it was clear he was expected home. I will clean up the cabling when it isn't winter!

I put all the tools away, cleaned the silicone from my fingers as much as I could, and headed into the house feeling extremely proud of what we accomplished and very excited about heat and no wind. The first thing my partner said to me was, "What took you so long?" I tried to explain, but it was clear from her reaction and my sister's texts and calls that this is not a partner-approved Sunday afternoon project when it extends into dinner time.

Before I end this blog, I want to note that I submitted a support ticket to Starlink as soon as I thawed out. We were very deliberate every step of the way to secure the install because it has to withstand sustained 40+ mph wind weekly. The two buttons that "lock" the Starlink dish to the Volcano Roof Mount base are a weak point in the design. The other dish on my roof uses bolts and nuts to attach to the base and it has stayed in place for a year and a half. I don't want my next blog to be about the $500 Starlink dish tumbleweed that has detached from its mount, damaged my house, and is flying across the property. Hopefully SpaceX Starlink will respond positively and can send me an updated design as soon as possible!

Read part three of this blog series for details on how I configured my Ubiquiti Unifi Secure Gateway Pro 4 to work around the connectivity issues of Starlink while maximizing user experience.

SpaceX Starlink First Impressions

Many people have asked about my experience with Starlink, so I am writing a blog! As someone who lives in a rural area and began the pandemic with 10 Mbps down and 3 Mbps up internet speeds, I’ve been counting the days until Starlink was available to me. Even though I submitted my information at each stage of the Starlink announcements and beta, I did not receive a notification that Starlink was available for me to order. I found out from a coworker instead. After reading his Slack message, my order was confirmed in 10 minutes. I also placed an order for the Volcano Roof Mount at the same time.

Three days after my order was placed, the Starlink dish kit shipped. It was supposed to arrive 2 days after that. FedEx has been delivering packages to my house for about 1.5 years now. Every once in a while, they cannot find my house, so I have to call and give them directions after I see my package was on the truck for delivery and returned to the warehouse that evening. Of course, this was one of those packages. After the additional 2 days for delivery due to a snowstorm and the new driver, I finally had the Starlink box in my possession.

Starlink POE Injector and WIFI Router in a plastic bin on my deck.

I immediately shoveled the 4 inches of snow off my deck and set up the dish with the router outside. It was entirely too cold to crack a window or door to run the ethernet cable inside unless I wanted to pay $2000 for my propane bill that month. So, I used a large plastic storage bin to house the PoE injector and router and plugged the power into an outdoor outlet on the deck. They kept each other warm in their cozy plastic bin with the lid snapped shut!

First speed test of Starlink.

The instructions said the dish needs to face north. That isn't exactly true. The dish needs a clear line of sight from slightly NW to NE. My deck is on the west side of my house, so the house was blocking the dish's line of sight most of the time. But I was able to connect and do a speed test. Considering the conditions of the "install", I was impressed to see 79 Mbps down, 18 Mbps up, and 42 ms latency. I left it online in this state after weighing it down with a duffel bag and a backpack full of cans from the pantry. My partner and I noted we need to purchase sandbags.

I left it online overnight so we could stream Netflix since my current internet service was down. In the morning I peeked at the app and saw the stats for how much time the dish was connected with internet access and how long it was down "due to beta", upgrades, and obstruction. It was obstructed almost 50% of the time from the moment I brought it online the afternoon before. I was impressed we could watch Netflix almost uninterrupted from a WIFI router in a plastic bin outside 20 feet away with 50% downtime! Unfortunately, the wind was picking up, and I didn't want a $500 tumbleweed. I packed it up and brought it inside that morning.

Two days after the first test on the deck, my internet from the service provider I'd been with for the last 1.5 years was still down. The radio had burned out on their dish. They don't send out technicians on the weekend, so I was without internet for 4 days as, of course, this failure occurred late Thursday night. The technician was supposed to arrive Monday morning to repair the radio, but my roof was covered in snow and ice. We rescheduled to Tuesday afternoon. Tuesday was Global Leadership Forum at work, and I had at least 6 other meetings on my calendar. A spotty 4G hotspot on my phone wasn't going to cut it for a full day of meetings. I hauled my plastic bin to the east side of the house and ran the cable for the dish as far north and slightly east from my front door as possible.

Starlink dish without obstructions, or so I thought!

At this point, it was physically impossible for there to be an obstruction. But the app still reported an obstruction about 10% of the time. There must be a lot of space garbage blocking the signal. My theory is they use "obstruction" as a catch-all for no connectivity. Interesting tidbit: in the app's debug-mode stats during an "obstruction", the dish was still connected and showing upload speeds and ping, but no data for download speeds. This meant I couldn't get through a Zoom meeting without a significant audio and video freeze or a full disconnect. Starlink was fine for watching others present on Zoom, but it wasn't ready for interactive Zoom meetings. It also wasn't ready for online gaming with these short, frequent outages.

Luckily my internet technician showed up around the time I found out that Starlink was not ready for Zoom meetings, and he replaced the radio on my other dish. I was back in business with 26 Mbps down and 6 Mbps up on my point-to-point WIFI internet service. The wind was picking up again, and I didn’t want any animals chewing on the ethernet cable overnight, so I packed up the dish and bin until my roof mount arrived.

I checked my account on Starlink to see if the roof mount was scheduled to ship. There was no update in its status, so I emailed support through the app. I heard back from support about 24 hours later telling me my roof mount would ship in the next 7-14 business days. I only had to wait 5 days to receive the shipment notice, and delivery was only delayed by 1 day due to a snowstorm. This gave me 3 days to get my brother-in-law to agree to do all the hard work of the installation on the roof. I'm grateful he is the type of person who always shows up when you need him. He agreed to install the dish on the roof, and we waited until the weather was nice enough to work on the roof safely.

Check out Part 2 of this series to see how the roof installation went!

Home Networking. Part 3 - VeloCloud Architecture

Before I blog about my experience in configuring VeloCloud from Orchestrator to Edge, it is important to understand the architecture and how the VeloCloud SD-WAN platform functions. With this knowledge one can make the best decisions about how to configure their SD-WAN. SD-WAN solutions provide the software abstraction to create a network overlay and decouple network software services from the underlying hardware.

There are three major components to the VeloCloud Platform: Orchestrator, Gateways, and Edges. I will describe and summarize their functions and relationships to each other in this blog.

VeloCloud Orchestrator Operator Menu

Orchestrator (VCO).

The VCO is the portal that is used to create, configure, and monitor VeloCloud SD-WANs. VeloCloud Orchestrator is multi-tenant and very powerful. Through a single Orchestrator and its associated Gateways, one can create SD-WANs, or Software-Defined Wide Area Networks, for Customers or Partners. A customer is able to manage and monitor their own VeloCloud Edges, Network Profiles, Business Policies, Firewall Rules, and more through the VCO. Partners are able to create their own customers within the VCO and manage their customer environments directly. The VCO is also used to activate and configure Edges. The VCO is a virtual machine that can run on vSphere, KVM, or AWS.

Gateway.

A VeloCloud Gateway, or VCG, is the device that an Edge routes traffic through when the traffic is defined to take a “multi-path” route (there will be more on route types in a future blog) or for non-VeloCloud VPNs. There are two main types of configurations for a Gateway: default and Partner. In the VCO, the VeloCloud Operator creates one or more Gateway pools and then places Gateways into those pools. Gateways are virtual machines that can run on vSphere, KVM, or AWS.

Gateway Pools are then assigned to Partners and/or Customers.

VCO Gateway Pools

VCO Customers

In a Cloud Hosted Model where Gateways are in default mode, an Edge is assigned a primary and secondary Gateway based on geolocation through the MaxMind database. The Edge’s peered Gateways are geographically closest to that Edge. The Edge device sends all “multi-path” traffic to its primary VCG, and the Gateway then sends the traffic on to the intended destination. Return traffic is sent to the Gateway and then back to the Edge device. If the Edge identifies one of its Gateways as unreachable for 60 seconds, it marks that Gateway’s routes as stale. If the VCG is still unavailable after another 60 seconds, the Edge removes the routes for that Gateway. If all Gateways are down, the routes are retained, and the timer is restarted. When the Gateways reconnect, the routes are refreshed on the Edge.
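
That route-aging behavior can be sketched in a few lines of Python. This is my own illustrative model of the timers described above, not VeloCloud's implementation; the class and method names are invented.

```python
# Illustrative model of the Edge's route aging for an unreachable Gateway:
# routes go stale after 60 seconds, are removed after another 60 seconds,
# and are retained (with the timer restarted) if every Gateway is down.
STALE_AFTER = 60     # seconds until routes are marked stale
REMOVE_AFTER = 120   # stale window plus another 60 seconds

class EdgeRouteTable:
    def __init__(self, gateways):
        # Time each Gateway became unreachable; None means reachable.
        self.down_since = {gw: None for gw in gateways}

    def check(self, gw, reachable, now):
        """Return the state of this Gateway's routes at time `now` (seconds)."""
        if reachable:
            self.down_since[gw] = None  # a reconnect refreshes the routes
            return "active"
        if self.down_since[gw] is None:
            self.down_since[gw] = now
        elapsed = now - self.down_since[gw]
        if elapsed < STALE_AFTER:
            return "active"
        if elapsed < REMOVE_AFTER:
            return "stale"
        if all(t is not None for t in self.down_since.values()):
            self.down_since[gw] = now  # all Gateways down: retain and restart
            return "retained"
        return "removed"
```

The interesting case is the last one: routes for a dead Gateway are only removed while some other Gateway is still reachable; if the Edge is fully isolated, it holds on to what it has.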

An SD-WAN with Partner Gateways gives the Operator the ability to route traffic from Edges to specific VCGs based on subnet. This is a value-add beyond the Cloud Hosted Model. A Partner will place Gateways geographically close to the services that they offer. When an Edge peered with a Partner VCG wants to access that service, the Edge leverages the tunnel to the Partner Gateway assigned to that service by subnet. Edges peered with Partner Gateways often have an average of 4 Gateways manually assigned. This number generally equals the number and locations of the services that the Partner is providing the customer, such as SaaS offerings, Cloud services, etc.

You can see in the screenshot below that I checked the box for Partner Gateway during the Gateway creation and was given an option to define which subnets should be routed by that Gateway.

It is important to note that VCGs do not talk to each other and are not aware of each other’s state. Traffic is not routed between Gateways. The Edge sends “multi-path” traffic to its Gateway, and that traffic is sent on to its destination. When the destination responds, the reply is routed back through the same Gateway to the Edge.

Gateways can be assigned to multiple Gateway Pools. Gateway Pools can be assigned to multiple Customers and Partners within the VCO. Partner Gateways should be placed closest (within 5-10 ms latency) to services that the Edges will access. Default Gateways should be geographically close to the Edges deployed in the Customer SD-WAN. It is not ideal for an Edge on the west coast of the US to send traffic to a Gateway on the east coast of the US before it is routed to its destination, for example.

Edge.

VeloCloud Edge, or VCE, devices are where the magic happens! Edge devices can be physical or virtual. They are implemented in enterprise datacenters, remote locations, and hyperscalers. Edge devices are able to aggregate multiple WAN links from different providers and send traffic on a per-packet basis through the best WAN link to their peered Gateway. An Edge can aggregate multiple WAN links and remediate issues found on public internet providers such as loss, jitter, and latency. Even if just one WAN link is connected to a VCE, improvement can be seen because of the remediation capabilities of the Edge device.

In this screenshot you can see that VoIP traffic quality was greatly improved by the VCE. This VCE only has one WAN link.

VeloCloud Voice Enhancements

All VCE management is performed via the VCO in the customer portal. The Enterprise Administrator uses Profiles to manage Edge devices. This makes it very easy to manage thousands of VCEs with the modification of a single Profile. Enterprise Administrators can also override Profile settings to give an individual VCE the unique configuration necessary for its specific site.

Edges can be configured in three main roles: as a default VCE, a Hub, or an Internet Backhaul. The default VCE routes traffic as described above, leveraging its Profile rules and Business Policies; it might connect to a Hub or an Internet Backhaul. A Hub is one or more VCEs acting as a central location for other VCEs to connect to over VPN, and is generally created at major data centers. An Internet Backhaul is a destination to which traffic is routed via Business Policy rules from VCEs back to a single location such as a data center. This is often used for security or compliance purposes. I will provide more information on Business Policies in a future blog.

VCEs are created within the VCO by the Enterprise Administrator and assigned a Profile. This Profile includes all configuration items for interfaces, Wi-Fi, static routes, firewall rules, business policies, VPNs, security services, and more. When the VCE is activated by the VCO, all configuration is pushed to the VCE, and the VCE is peered with its primary and secondary Gateways and Partner VCGs, if any.

Once the VCE is online, the VCO displays data about the traffic type, source, destination, and quality that passes through each VCE. A world map is displayed that shows all VCE locations and their status in the customer portal.

VCE Monitoring. Applications Tab.

There are three ways that the VCE will route traffic, determined by Business Policies in the Edge Profile. These three routing types are defined as Network Services: Multi-Path, Direct, and Internet Backhaul. Multi-Path means that the VCE determines the best carrier for each packet from all WAN links; each packet is routed to a Cloud or Partner Gateway. Direct is when the Enterprise Administrator routes specific application traffic over a single defined WAN link without going through a VCG. Internet Backhaul is described above.
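
To make the three Network Services concrete, here is a toy first-match policy lookup. The field names and applications are my own invention for illustration, not VeloCloud's actual Business Policy schema, and I am assuming unmatched traffic falls through to Multi-Path.

```python
# Hypothetical first-match Business Policy table; field names are
# illustrative, not VeloCloud's actual schema.
BUSINESS_POLICIES = [
    {"app": "voip",      "service": "multi-path"},              # per-packet steering via a VCG
    {"app": "office365", "service": "direct", "link": "wan1"},  # pinned to one WAN link, no VCG
    {"app": "payroll",   "service": "internet-backhaul"},       # hairpinned to the data center
]

def network_service(app, policies=BUSINESS_POLICIES):
    """Return the Network Service for an application; first match wins."""
    for policy in policies:
        if policy["app"] == app:
            return policy["service"]
    return "multi-path"  # assumed default for unmatched traffic
```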

The VeloCloud platform is extremely robust and easy to use at the same time. The ability to configure VCEs and provide security and services to thousands of sites with a few clicks is nothing short of amazing. If you are looking to improve WAN quality, move away from expensive MPLS, aggregate multiple WAN links, create VPNs across the enterprise, provide security services, and have an easy-to-use portal to accomplish it all, definitely look at VeloCloud.

Thank you for reading! I will provide details on how to deploy and configure VCO, VCGs, and VCEs in the next blog of this series.

I want to give a shoutout to Cliff Lane at VMware for spending a lot of time answering my numerous questions about how VeloCloud works. Without him, this post would not be possible (or at least correct)! Thanks Cliff!

Home Networking. Part 2 – Foundational Configuration.

Now that the UniFi Security Gateway, or USG, and switches were online and updated to the latest firmware, I was anxious to really start using my VeloCloud Edge. I have access to a VeloCloud Orchestrator that is hosted and managed by VMware. But as an Enterprise Administrator, I can only configure and monitor Edges in a customer environment. There was a lot to the platform that I hadn’t seen. I would have Operator privileges in my own environment!

However, my home lab wasn’t ready because the VeloCloud Orchestrator, or VCO, is distributed as an OVF that requires vCenter to set up the VCO before it boots. I was hoping that I could deploy the OVF through the direct host management that I had been using. I gave it a try and was able to deploy the OVF. However, because I was not deploying through vCenter, I wasn’t able to set the host name, password, or SSH keys. After the VCO booted up, I couldn’t log in or do anything. I deleted the VM and turned my attention to VLANs.

Velocloud OVF Configuration

Because the UniFi switches are only layer 2 capable, they cannot route traffic between VLANs. This means all inter-VLAN traffic must be routed through the USG. Because I planned to have at least 5 separate VLANs, I began to feel concerned about the CPU utilization on the USG. It already would be performing DPI and other security features; now it would also need to route most of the packets on my network. At the time of writing this, less than 25% of my devices are online, and every day a few more are connected. It will be interesting to see how chatty these devices are with just a few human users. Here is the latest usage chart. CPU is sitting at about 25% with a few devices streaming and a couple of people using cell phones and laptops.

USG Performance Chart

Setting up VLANs in the controller software is very easy. It’s configuring firewall rules that I find kludgy, because there are about 6 different ways to make the same thing happen. For software meant for home and small/medium business use, I think they should make this simpler and more intuitive. In the screenshot below, you can see I am creating a new network named Demo. I typed in 10.10.200.1/24, and it automatically populated everything below the Gateway/Subnet box. If the UniFi Site is set up with the correct DHCP and DNS servers, you won’t need to change those settings unless you wish. You’ll notice there are multiple Purposes when creating a new network. To create a VLAN as one might expect to use it in an enterprise environment, select Corporate.

Network Creation in UniFi Controller
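
Incidentally, the values the controller auto-populates from a Gateway/Subnet entry like 10.10.200.1/24 can be derived with Python's standard ipaddress module, which is handy for sanity-checking an addressing plan. Note the host range printed here is just the usable range; UniFi's default DHCP scope may be narrower.

```python
import ipaddress

# Derive what the controller fills in from the entry 10.10.200.1/24.
iface = ipaddress.ip_interface("10.10.200.1/24")
network = iface.network

print(network)                    # 10.10.200.0/24
print(network.broadcast_address)  # 10.10.200.255
hosts = list(network.hosts())     # usable addresses .1 through .254
print(hosts[1], hosts[-1])        # first/last host once .1 is the gateway
```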

A network set with the Guest Purpose is used for guest networks where you do not want those devices to access everything, such as visitors who want to use Wi-Fi instead of cellular data on their phones. If you want to use tokens or hotspot authentication, that is built into the guest profile, and enabling it takes only a few clicks. This is certainly easier than manually setting up those firewall rules.

After creating VLANs for the different types of devices I would have on my network, it was time to prevent communication between the VLANs where it was unnecessary. When I looked at the site settings for routing and firewall, I was amazed. Why does it need to be so complicated? Nine different places where I could create a rule seems excessive. To make matters worse, members of the Ubiquiti community spread misinformation in the forums about how to create a firewall rule, such as creating an “IN” rule when there should truly be an “OUT” rule. I don’t think this is their fault; it is due to how the GUI is built and possibly to how the USG handles rules. For example, you must create the rules in the GUI in the order you wish for them to be enforced by the USG. You cannot edit them to reorder them or even set the rule index during creation. This is just silly. There is more flexibility in the CLI, but then we have to get JSON involved for the settings to be remembered whenever the USG is rebooted or provisioned with a new setting. I would suggest a different product if you want simple firewall management at home, though I don’t know what that would be. It seems a lot of people like pfSense; I’ve never used it, so I can’t recommend it one way or the other.

UniFi Firewall Rules
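
On the JSON point: as I understand it, the way to make CLI changes on a USG survive reboots and re-provisioning is a config.gateway.json file placed in the controller's site directory, which mirrors the USG's EdgeOS configuration tree. A fragment like the following would persist a drop rule; the rule number and subnets here are made-up examples, so adapt them to your own VLANs.

```json
{
  "firewall": {
    "name": {
      "LAN_IN": {
        "rule": {
          "2500": {
            "action": "drop",
            "description": "Example: block IoT VLAN from the trusted VLAN",
            "source":      { "address": "10.10.20.0/24" },
            "destination": { "address": "10.10.10.0/24" }
          }
        }
      }
    }
  }
}
```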

It has been a very long time since I did anything that resembled real network administration. Many years ago, I spent a few days in San Jose to take Cisco ACE training. I am pretty sure administering the ACE was more intuitive than creating firewall rules in the USG’s GUI. This is saying a lot. But I prevailed and my IoT devices no longer had access to internal systems or the internet. No botnets coming from my house! Not that they’d have the bandwidth to do much destruction to the world.

USG Threat Management

Another feature that I’ve decided to turn on in the USG is Threat Management. We all accidentally click a wrong link every so often. Turning it on limits my internet speed to 85 Mbps, but since my connection comes nowhere near that: no problem! If your connection can pull more than 80 Mbps, this is another opportunity to look closely at the specs of the USG Pro. Since I do not have a Pro, I don’t know what its throughput would be reduced to.

I thought I was finally ready to install vCenter. But alas, I didn’t have a DNS server running on my network. And if I’m going to have a home lab with a bunch of VMs, I certainly need Active Directory. Creating a domain controller for a new forest in a home lab in 2020 is far less nerve-wracking than running DCPromo.exe in 2001 in an enterprise environment, that’s for sure!

After creating A and PTR records in DNS, it was finally time for the VCSA. As all of you probably know, the tiny deployment of vCenter requires 10 GB of RAM. That certainly wasn’t going to fly with my hardware limitations!

Gaming PC ESXi Host

VCSA Config

My host came to a grinding halt. I reduced the VCSA to 6 GB of RAM; it could barely boot and could not load the UI. I set it to 8 GB, and at least it ran with minimal complaining long enough for me to deploy the VeloCloud Orchestrator. After that, it was shut down until it was time to deploy a VeloCloud Gateway, and powered off again once that was done.

I was certainly happy to see this login screen after going through host resource gymnastics!

VCO VM

A production VM of VCO wants more resources than my host can provide.

VCO VM Resource Consumption

Luckily, it is well behaved and only consumes what it needs while powered on.

One last note before closing out this blog: the VeloCloud Orchestrator must have a publicly accessible IP address, and its default route must egress to the internet.

VCO OVF Network Selection

This means if you want to do this in your own environment at scale for true internet routing purposes, you might want a separate NIC that isn’t hidden behind NAT from something like a USG, for example! There are many more things that you would need in addition to this, so it is unlikely that an individual would be running their own VCO instance for true SD-WAN multi-pathing into the world. But running it in your home lab behind a firewall and NAT to familiarize yourself with the platform is just fine.

Thank you for reading Part 2! Part 3 will address the VeloCloud architecture. I will describe what the individual components do and how they talk to each other.

Home Networking. Part 1 – The Beginning.

This is a multi-part series on what I’ve learned from my home lab configuration and troubleshooting of Ubiquiti UniFi gear and the VeloCloud Orchestrator, Gateways, and Edge devices. Some of this information is already available on the internet, but it took a lot of searching to find it. Some of it required conversations with the engineers who created the product and is not documented publicly (yet). This blog series is an attempt to consolidate information and links in a single location and demystify some of what happens behind the scenes of a VeloCloud implementation. I’m happy to answer any questions you might have to the best of my abilities. If it is about VeloCloud, I can check with our engineers too.

Phase 1. Requirements.

Cloud and internet. Due to my lack of reliable, high-speed internet connections, security concerns, and my desire to nerd out, everything must be hosted locally and cannot use cloud services.

Wired networks. My home was professionally cabled with Cat 6, Cat 6a, and speaker wire to all the locations where I might have electronics. I cannot recommend this enough. Since someone else did it, if a cable doesn’t work, it is on them to fix it. Totally worth it! I also had them install conduits to major locations so it is easy to run new cables in 5-10 years when everything I’ve designed and implemented is obsolete.

I chose to run ethernet everywhere because I find Wi-Fi to be unreliable and generally a pain. With the amount of bandwidth and reliability we need for 4K content, gaming, and beyond, ethernet is necessary. Plus, devices like access points and cameras need ethernet for power and security.

Whole home A/V and automation. I didn’t want a closed system (Control4, Crestron, Savant, etc.) that I was not allowed to configure myself per the manufacturer. And I didn’t want anything that relied on the cloud to function or stored my data in someone else’s cloud. I chose automation software that has a very large community and runs 100% locally. Of course, if I automate something that is controlled in the cloud, that part of the script depends on internet connectivity and the service being online; otherwise, everything is stored and executed locally.

Gear location. I had a closet made specifically for racks of gear with independent cooling. Due to the cost of some cabling types (HDMI over Ethernet), limitations of products on the market, and the laws of physics, some gear couldn’t be stored in the closet which required more cabling to media locations because gear would be mounted there.

Security. I attempted to follow all of the normal best practices: IoT devices can’t talk to the internet or other systems, untrusted guests can’t access internal systems, etc. Products such as doorbells or cameras that have Bluetooth or Wi-Fi are usually hacked quite quickly and rarely receive security updates, so I didn’t choose them. I didn’t buy “smart” appliances or other “smart” home items because security is not those companies’ priority.

Exceptions. Unfortunately, not all of my requirements could be met. And, I still have cell phones, tablets, and a job to do! So, we’re not completely walled off from the world. However, if the internet does become unavailable, we’re able to enjoy media, I can observe and manage my network, and automation still functions.

Phase 2. Setting up the network equipment.

If you follow me on Twitter, then you probably already know that my first issue was the fact that I didn’t have a computer prepared to install the UniFi Controller software. Luckily, ESXi runs on pretty much everything these days, like my ancient gaming PC. Within an hour (thank you rural download speeds and my inability to find a large enough USB stick for the ISO), I had a standalone ESXi host online. It took a few more hours to download an operating system ISO. I chose to deploy Windows because that’s the OS I’m most comfortable with, and I didn’t want to spend time dealing with a less supported version of the controller software. I haven’t had any issues with it yet (after the initial setup, as you’ll see below)!

My original choices of gear were the Ubiquiti UniFi Security Gateway (USG), a UniFi 48 port managed switch, a UniFi 24 port managed PoE switch, and multiple UniFi AP Pros. I could have read the product details a little closer before placing my order, but a lot was going on in my life, and I made some assumptions I shouldn’t have. Here they are:

·      The USG has deep packet inspection (DPI) capabilities. Yay security and observability! The advertised throughput of the USG is 1 Gbps. Since my rural internet connections come nowhere close to that, I thought I’d be fine. But if you turn on DPI, your throughput drops to about 100 Mbps. This is still fine for me, but if you live where you can get fast internet, you will want to get the USG Pro or maybe go a different route altogether (more on that in another post). Also, the USG doesn’t have a rackmount option; the USG Pro does.

·      The UniFi 24 port managed PoE switch only has 16 PoE ports! The rest are not powered. Before ordering, I did the math on how much power each of my PoE devices would probably draw and calculated that the switch could handle that load. However, I did not check how many ports on the switch actually provide power. I ordered a few PoE injectors for the remaining devices after realizing my mistake, but I would have purchased different switches had I read the specs more closely before placing my order. You cannot return gear to Ubiquiti because of your own mistakes.

·      I did not order a top of rack switch for the second gear rack. When my AV installers showed up and we started discussing the rollout of equipment, I discovered that the cables terminated in different parts of the closet based on a predetermined device layout in the racks. It was expected that the second rack would have devices that needed a switch, so they cabled it that way. I thought there’d be a bit more flexibility. They did provide 4 patch cables between the racks when they ran the rest of the cabling into the closet. Had I factored this and my PoE switch mistake into the design, along with limitations I have since discovered in the USG, my purchase certainly would have looked different.

·      Cat 6a has huge connectors! I already knew Cat 6a didn’t bend well, so we only ran it to certain locations and devices. But, due to the connector size, you cannot have two Cat 6a cables next to each other in the UniFi switch. This is another factor in how many switches to buy and how many ports are needed. Maybe there are switches with ports spaced farther apart for this reason; I haven’t looked into it.

The UniFi gear was simple to deploy. I was able to just wire my internet connection to the USG WAN1 port, the USG LAN port to any port on the 48 port switch, and the ESXi host NIC to any port on the 48 port switch, and obtain an IP address from the USG’s DHCP server for my host and VMs. The WAN1 port on the USG was configured to use DHCP out of the box as well. After figuring out how to put my satellite modem in bridged mode (Pro tip: disable any VPN software so local addresses can be accessed =), I had internet connectivity. I even placed the VeloCloud Edge 510 in line between the USG and the modem and had internet connectivity without it being activated and configured. This was nice.

After installing Windows Server, Java, and the UniFi Controller software, it was time to see if setting up the UniFi gear was as easy as everyone said it was. Short answer: yes! Long answer: not if you think you can change the UniFi devices’ IP addresses!

The UniFi controller software easily discovered the USG and the 48 port switch without my having to do anything. There is a simple function called “adopt” that lets the controller manage the UniFi devices it finds. This is all very easy. Except I didn’t want my hardware to use factory defaults, and I certainly didn’t want it using DHCP. The first time, I changed the IP address of the 48 port switch. After the controller software pushed the change, it could no longer find the switch to manage it. I made the change in the controller software, so it obviously knew the new IP address. Strange, since the Windows VM was running on a host directly plugged into the switch. I “forgot” the device in the controller, but the software wouldn’t discover it again for me to force an adoption. I spent about an hour looking at community forum posts about issues when changing IP addresses (I should have gotten the hint when every final solution was to do a factory reset and adopt the device in the controller software with the defaults) and reading through manuals. I thought I had a plan ready for try number 2 and performed a factory reset on the switch.

This time, I would start with re-IPing the USG. The controller lost its ability to manage the USG after I changed its IP address. At least it could still talk to it, but it failed when trying to manage it after the change was provisioned. My attempts to force the management were thwarted by an authentication issue. Strange again, since the controller software sets the password on adoption. After working through various other documented and community-suggested troubleshooting attempts, I gave up. I performed another factory reset and just used the factory defaults. I thought of the other switch and the APs that I planned on bringing online and counted the hours of my life I would spend trying to get it all to work. I decided it wasn’t worth it. Unless you are extremely lucky, patient, enjoy a CLI, and have a lot of extra time on your hands, I don’t suggest wrestling with these devices, which definitely do not like their IP addresses changed.

With all these IP changes and abandoned devices, the controller software became very unhappy and couldn’t discover the devices that had just gone through the second factory reset. I tried a few troubleshooting steps and decided that I may as well just reinstall the software. Minutes later (after an uninstall, a reboot, a reinstall, a reboot for good measure, and answering the initial questions), the UniFi controller software was online and able to discover the devices again. I powered up the PoE switch, connected it to the 48 port switch, and adopted it in the controller software. I set the DHCP reservations in the USG, performed any remaining updates, and called it a night. If I had accepted my factory default fate from the beginning and had faster internet speeds, I probably would have spent no more than two hours on all of this. But hey, I had event tickets to get and dungeons to run in ESO while I waited.

Thanks for reading! Part 2 will be about VLAN creation, firewall rules, and the beginning of the VeloCloud Edge activation.