Home Networking. Part 3 - VeloCloud Architecture

Before I blog about my experience in configuring VeloCloud from Orchestrator to Edge, it is important to understand the architecture and how the VeloCloud SD-WAN platform functions. With this knowledge one can make the best decisions about how to configure their SD-WAN. SD-WAN solutions provide the software abstraction to create a network overlay and decouple network software services from the underlying hardware.

There are three major components to the VeloCloud Platform: Orchestrator, Gateways, and Edges. I will describe and summarize their functions and relationships to each other in this blog.

VeloCloud Orchestrator Operator Menu


Orchestrator (VCO).

The VCO is the portal that is used to create, configure, and monitor VeloCloud SD-WANs. VeloCloud Orchestrator is multi-tenant and very powerful. Through a single Orchestrator and its associated Gateways, one can create SD-WANs, or Software-defined Wide Area Networks, for Customers or Partners. A customer is able to manage and monitor their own VeloCloud Edges, Network Profiles, Business Policies, Firewall Rules, and more through the VCO. Partners are able to create their own customers within the VCO and manage their customer environments directly. The VCO is also used to activate and configure Edges. The VCO is a virtual machine that can run on vSphere, KVM, or AWS.

Gateway.

A VeloCloud Gateway, or VCG, is the device that an Edge routes traffic through when the traffic is defined to take a “multi-path” route (there will be more on route types in a future blog) or for non-VeloCloud VPNs. There are two main types of configurations for a Gateway: default and Partner. In the VCO, the VeloCloud Operator creates one or more Gateway Pools and then places Gateways into those pools. Gateways are virtual machines that can run on vSphere, KVM, or AWS.

Gateway Pools are then assigned to Partners and/or Customers.

VCO Gateway Pools


VCO Customers


In a Cloud Hosted Model where Gateways are in default mode, an Edge is assigned a primary and secondary Gateway based on Geo location through the Maxmind database. The Edge’s peered Gateways are geographically closest to that Edge. The Edge device sends all “multi-path” traffic to its primary VCG and the Gateway then sends the traffic on to the intended destination. Return traffic is sent to the Gateway and then back to the Edge device. If the Edge identifies one of its Gateways as unreachable after 60 seconds, it marks the routes as stale. If the VCG is still unavailable after another 60 seconds, the Edge removes the routes for this Gateway. If all gateways are down, the routes are retained, and the timer is restarted. If the Gateways reconnect, the routes are refreshed on the Edge.
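The timer behavior described above can be written down as a small state function. This is only a sketch of the rules as I have described them, not VeloCloud code; the names `route_state`, `STALE_AFTER_S`, and `REMOVE_AFTER_S` are made up for illustration, and the restart of the timer when every Gateway is down is modeled simply as "the routes stay stale."

```python
from enum import Enum

# Thresholds taken from the description above (names are hypothetical).
STALE_AFTER_S = 60    # Gateway unreachable this long: routes marked stale
REMOVE_AFTER_S = 120  # unreachable for another 60 seconds: routes removed


class RouteState(Enum):
    ACTIVE = "active"
    STALE = "stale"
    REMOVED = "removed"


def route_state(seconds_unreachable: float, all_gateways_down: bool) -> RouteState:
    """Return the state of the routes an Edge learned from one Gateway.

    seconds_unreachable is 0 while the Gateway is reachable.
    """
    if seconds_unreachable < STALE_AFTER_S:
        return RouteState.ACTIVE
    if seconds_unreachable < REMOVE_AFTER_S:
        return RouteState.STALE
    if all_gateways_down:
        # Routes are retained when no Gateway is left; a real Edge would
        # restart its timer here, which this sketch does not track.
        return RouteState.STALE
    return RouteState.REMOVED
```

When the Gateway reconnects, `seconds_unreachable` goes back to 0 and the function reports the routes as active again, which matches the refresh described above.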

An SD-WAN with Partner Gateways gives the Operator the ability to route traffic to specific VCGs from Edges based on subnet. This is a value-add beyond the Cloud Hosted Model. A Partner will place Gateways geographically close to the services that they offer. When an Edge peered with a Partner VCG wants to access that service, the Edge leverages the tunnel to the Partner Gateway assigned for that service by subnet. Often Edges that are peered with partner Gateways have an average of 4 Gateways manually assigned. This number generally equals the number and locations of the services that the Partner is providing the customer such as SaaS offerings, Cloud services, etc.
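To make the "route by subnet" idea concrete, here is a minimal sketch using Python's standard `ipaddress` module. The Gateway names and subnets are invented documentation addresses, and choosing the most specific (longest prefix) match is my assumption about how overlapping subnets would be resolved, not something confirmed for the product.

```python
import ipaddress

# Hypothetical Partner Gateways and the subnets assigned to each one.
PARTNER_GATEWAY_SUBNETS = {
    "vcg-saas-east": ["203.0.113.0/24"],
    "vcg-cloud-west": ["198.51.100.0/24", "192.0.2.0/25"],
}


def pick_gateway(destination: str, default_gateway: str = "vcg-default") -> str:
    """Pick the Gateway whose assigned subnet contains the destination.

    Falls back to default_gateway when no Partner subnet matches.
    """
    dest = ipaddress.ip_address(destination)
    best_name, best_prefix = default_gateway, -1
    for name, subnets in PARTNER_GATEWAY_SUBNETS.items():
        for subnet in subnets:
            network = ipaddress.ip_network(subnet)
            # Assumption: the most specific matching subnet wins.
            if dest in network and network.prefixlen > best_prefix:
                best_name, best_prefix = name, network.prefixlen
    return best_name
```

For example, `pick_gateway("203.0.113.10")` returns `"vcg-saas-east"`, while an address outside every listed subnet falls through to the default Gateway.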

You can see in the screenshot below that I checked the box for Partner Gateway during the Gateway creation and was given an option to define which subnets should be routed by that Gateway.

It is important to note that VCGs do not talk to each other and are not aware of each other’s state. Traffic is not routed between Gateways. The Edge sends “multi-path” traffic to its Gateway, and the Gateway sends that traffic to its destination. When the destination responds, the response is routed back through the Gateway to the Edge.

Gateways can be assigned to multiple Gateway Pools. Gateway Pools can be assigned to multiple Customers and Partners within the VCO. Partner Gateways should be placed closest (within 5-10 ms latency) to services that the Edges will access. Default Gateways should be geographically close to the Edges deployed in the Customer SD-WAN. It is not ideal for an Edge on the west coast of the US to send traffic to a Gateway on the east coast of the US before it is routed to its destination, for example.
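As a rough illustration of "geographically closest," the sketch below ranks a few Gateways by great-circle distance from an Edge and returns a primary and a secondary. The Gateway names and coordinates are hypothetical, and a real deployment derives the Edge location from a geolocation database rather than from hand-entered coordinates.

```python
import math

# Hypothetical Gateway locations as (latitude, longitude) in degrees.
GATEWAYS = {
    "vcg-seattle": (47.61, -122.33),
    "vcg-dallas": (32.78, -96.80),
    "vcg-newyork": (40.71, -74.01),
}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on Earth in kilometers."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def assign_gateways(edge_lat: float, edge_lon: float) -> tuple[str, str]:
    """Return (primary, secondary): the two Gateways nearest to the Edge."""
    ranked = sorted(
        GATEWAYS,
        key=lambda name: haversine_km(edge_lat, edge_lon, *GATEWAYS[name]),
    )
    return ranked[0], ranked[1]
```

An Edge near Portland, Oregon (roughly 45.52, -122.68) would get the Seattle Gateway as primary in this toy data set, which is the west coast outcome you would want.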

Edge.

VeloCloud Edge, or VCE, devices are where the magic happens! Edge devices can be physical or virtual. They are implemented in enterprise datacenters, remote locations, and hyperscalers. Edge devices are able to aggregate multiple WAN links from different providers and send traffic on a per packet basis through the best WAN link to its peered Gateway. An Edge can aggregate the multiple WAN links and remediate issues found on public Internet providers such as loss, jitter, and latency. Even if just one WAN Link is connected to a VCE, improvement can be seen because of remediation capabilities of the Edge device.

In this screenshot you can see that VoIP traffic quality was greatly improved by the VCE. This VCE only has one WAN link.

VeloCloud Voice Enhancements


All VCE Management is performed via the VCO in the customer portal. The Enterprise Administrator uses Profiles to manage Edge devices. This makes it very easy for thousands of VCEs to be managed with the modification of a single profile. Enterprise Administrators can also override the profile settings to give an individual VCE a unique configuration that is necessary for its specific site.
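The profile-plus-override idea is the same pattern as layering a site-specific dictionary on top of a shared one. Here is a minimal sketch of that pattern; the setting names are invented, and this says nothing about how the VCO actually stores or merges configuration.

```python
from copy import deepcopy

# Hypothetical shared profile and one site's overrides.
PROFILE = {
    "wifi": {"ssid": "corp", "enabled": True},
    "vlans": [10, 20],
}
SITE_OVERRIDES = {"wifi": {"enabled": False}}


def effective_config(profile: dict, overrides: dict) -> dict:
    """Return the profile with per-Edge overrides applied on top.

    Nested dictionaries are merged key by key; any other value in
    overrides replaces the profile value. Inputs are not modified.
    """
    result = deepcopy(profile)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = effective_config(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result
```

With the sample data, `effective_config(PROFILE, SITE_OVERRIDES)` keeps the shared SSID and VLANs but turns Wi-Fi off for that one site, and every Edge without overrides keeps following the profile.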

Edges can be configured for three main functions: as a default VCE, a Hub, or an Internet Backhaul. The default VCE routes traffic as described above leveraging its profile rules and Business Policies. It might connect to a Hub or Internet Backhaul. A Hub is when one or more VCEs act as a central location for other VCEs to connect over VPN. A Hub is generally created at major data centers. An Internet Backhaul is a destination for traffic that is routed via Business Policy rules from VCEs back to a single location such as a data center. This is often used for security or compliance purposes. I will provide more information on Business Policies in a future blog.

VCEs are created within the VCO by the Enterprise Administrator and assigned a profile. This profile includes all configuration items for interfaces, Wi-fi, static routes, firewall rules, business policies, VPNs, security services, and more. When the VCE is activated by the VCO, all configuration is pushed to the VCE by the VCO, and the VCE is peered with its primary and secondary gateway and Partner VCGs, if any.

Once the VCE is online, the VCO displays data about the traffic type, source, destination, and quality that passes through each VCE. A world map is displayed that shows all VCE locations and their status in the customer portal.

VCE Monitoring. Applications Tab.


There are three ways that the VCE will route traffic. The way that traffic is routed is determined by Business Policies in the Edge Profile. These three routing types are defined as Network Services. They are Multi-Path, Direct, and Internet Backhaul. Multi-Path means that the VCE determines the best carrier for each packet from all WAN links. Each packet is routed to a Cloud or Partner Gateway. Direct is when the Enterprise Administrator routes specific application traffic by defining a single WAN link and does not route through a VCG. Internet Backhaul is described above.
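The three Network Services can be pictured as a small dispatch function. This is a conceptual sketch only: the scoring weights are arbitrary numbers I picked, the `LinkStats` fields are hypothetical, and picking the best link for backhaul traffic is my assumption. The real Edge makes its Multi-Path decision per packet with logic that is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkStats:
    name: str
    latency_ms: float
    jitter_ms: float
    loss_pct: float


def link_score(link: LinkStats) -> float:
    """Lower is better. The weights are arbitrary, for illustration only."""
    return link.latency_ms + 2 * link.jitter_ms + 50 * link.loss_pct


def choose_path(service: str, links: list[LinkStats],
                direct_link: Optional[str] = None) -> tuple[str, str]:
    """Return (WAN link name, next hop kind) for one Network Service."""
    if service == "multi-path":
        # Best-scoring link right now, sent on to a Cloud or Partner Gateway.
        return min(links, key=link_score).name, "gateway"
    if service == "direct":
        # Administrator pinned a single WAN link; no Gateway in the path.
        if direct_link is None:
            raise ValueError("direct routing needs a configured WAN link")
        return direct_link, "destination"
    if service == "internet-backhaul":
        # Assumption: best link is used to reach the backhaul site.
        return min(links, key=link_score).name, "backhaul-site"
    raise ValueError(f"unknown network service: {service}")
```

Calling `choose_path` once per packet with fresh link statistics is the closest this toy gets to the per-packet behavior described above.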

The VeloCloud platform is extremely robust and easy to use at the same time. The ability to configure VCEs and provide security and services to 1000s of sites with a few clicks is nothing short of amazing. If you are looking to improve WAN quality, move away from expensive MPLS, aggregate multiple WAN links, create VPNs across the enterprise, provide security services, and have an easy to use portal to accomplish it all, definitely look at VeloCloud.

Thank you for reading! I will provide details on how to deploy and configure VCO, VCGs, and VCEs in the next blog of this series.

I want to give a shoutout to Cliff Lane at VMware for spending a lot of time answering my numerous questions about how VeloCloud works. Without him, this post would not be possible (or at least correct)! Thanks Cliff!

Home Networking. Part 2 – Foundational Configuration.

Now that the UniFi Security Gateway, or USG, and switches were online and updated to the latest firmware, I was anxious to really start using my VeloCloud Edge. I have access to a VeloCloud Orchestrator that is hosted and managed by VMware. But as an Enterprise Administrator, I can only configure and monitor Edges in a customer environment. There was a lot to the platform that I hadn’t seen. I would have Operator privileges in my own environment!

However, my home lab wasn’t ready because the VeloCloud Orchestrator, or VCO, is distributed as an OVF that requires vCenter to set up the VCO before it boots. I was hoping that I could deploy the OVF through the direct host management that I had been using. I gave it a try and was able to deploy the OVF. However, because I was not deploying through vCenter I wasn’t able to set the host name, password, or SSH keys. After the VCO booted up, I couldn’t log in or do anything. I deleted the VM and turned my attention to VLANs.

Velocloud OVF Configuration


Because the UniFi switches are only layer 2 capable, they cannot route traffic between VLANs. This means all inter-VLAN traffic must be routed through the USG. Because I planned to have at least 5 separate VLANs, I began to feel concerned about the CPU utilization on the USG. It already would be performing DPI and other security features. Now, it will need to route most of the packets on my network. At the time of writing this, less than 25% of my devices are online. Every day a few more are connected. It will be interesting to see how chatty these devices are with just a few human users. Here is the latest usage chart. CPU is sitting at about 25% with a few devices streaming and a couple of people using cell phones and laptops.

USG Performance Chart


Setting up VLANs in the controller software is very easy. It’s configuring firewall rules that I find to be kludgy because they give you about 6 different ways to make the same thing happen. For software that is for home and small/medium business use, I think they should make this simpler and more intuitive. In the screenshot below you see I am creating a new network named Demo. I typed in 10.10.200.1/24, and it automatically populated everything below the Gateway/Subnet box. If the UniFi Site is set up with the correct DHCP and DNS servers, you won’t need to change those settings unless you wish. You’ll notice there are multiple purposes when creating a new network. To create a VLAN as one might expect to use it in an enterprise environment, select Corporate.
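If you are curious what can be derived from a single Gateway/Subnet entry like 10.10.200.1/24, Python's standard `ipaddress` module shows the arithmetic. This is not what the controller runs internally; it is just a sketch of the kind of values that follow from one CIDR entry, and the function name and dictionary keys are my own.

```python
import ipaddress


def describe_network(gateway_cidr: str) -> dict:
    """Derive basic network facts from a gateway address in CIDR form.

    The usable host count assumes a prefix shorter than /31.
    """
    iface = ipaddress.ip_interface(gateway_cidr)
    net = iface.network
    return {
        "gateway": str(iface.ip),
        "network": str(net),
        "netmask": str(net.netmask),
        "broadcast": str(net.broadcast_address),
        # Subtract the network and broadcast addresses.
        "usable_hosts": net.num_addresses - 2,
    }
```

For the Demo network, `describe_network("10.10.200.1/24")` gives a network of 10.10.200.0/24, a broadcast address of 10.10.200.255, and 254 usable host addresses.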

Network Creation in UniFi Controller


A network set with the Guest Purpose is used for Guest networks where you do not want those devices to access everything such as visitors who want to use Wi-Fi instead of data on their phones. If you want to use tokens or hotspot authentication, that is built into the guest profile and enabling it takes only a few clicks. This is certainly easier than manually setting up those firewall rules.

After creating VLANs for the different types of devices I would have on my network, it was time to prevent communication between the VLANs where it is unnecessary. When I looked at the site settings for routing and firewall, I was amazed. Why does it need to be so complicated? Nine different places I could create a rule seems excessive. To make matters worse, members of the Ubiquiti community give misinformation in the forums as to how to create a firewall rule such as creating an “IN” rule when there should truly be an “OUT” rule. I don’t think this is their fault, it is due to how the GUI is built and possibly to how the USG handles rules. For example, you must create the rules in order in the GUI that you wish for them to be enforced by the USG. You do not get to edit them to reorder them or even set the rule index during creation. This is just silly. There is more flexibility in the CLI, but then we have to get JSON involved for the settings to be remembered whenever the USG is rebooted or provisioned with a new setting. I would suggest a different product if you want simple firewall management at home. I don’t know what that would be. It seems a lot of people like pfsense. I’ve never used it, so I can’t recommend or not recommend.
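Why does rule order matter so much? In the common "first match wins" firewall model, the first rule that matches a packet decides its fate, so a broad rule placed too early hides a narrower rule behind it. The sketch below shows that model with invented subnets; it is not the USG's rule engine, and treating the USG as first-match is an assumption on my part.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    action: str       # "accept" or "drop"
    source: str       # source subnet in CIDR form
    destination: str  # destination subnet in CIDR form


# Hypothetical rules: let the IoT VLAN reach one server, block the rest.
RULES = [
    Rule("accept", "10.10.20.0/24", "10.10.10.5/32"),
    Rule("drop", "10.10.20.0/24", "10.10.0.0/16"),
]


def evaluate(rules: list[Rule], src: str, dst: str, default: str = "accept") -> str:
    """Return the action of the first rule matching src and dst."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for rule in rules:
        if (src_ip in ipaddress.ip_network(rule.source)
                and dst_ip in ipaddress.ip_network(rule.destination)):
            return rule.action
    return default
```

With the rules in the order shown, 10.10.20.7 can reach 10.10.10.5. Reverse the list and the same packet is dropped, which is exactly the kind of surprise you get when rules cannot be reordered after creation.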

UniFi Firewall Rules


It has been a very long time since I did anything that resembled real network administration. Many years ago, I spent a few days in San Jose to take Cisco ACE training. I am pretty sure administering the ACE was more intuitive than creating firewall rules in the USG’s GUI. This is saying a lot. But I prevailed and my IoT devices no longer had access to internal systems or the internet. No botnets coming from my house! Not that they’d have the bandwidth to do much destruction to the world.

USG Threat Management


Another feature that I’ve decided to turn on in the USG is Threat Management. We all accidentally click a wrong link every so often. Limiting my internet speed to 85 Mbps? No problem! This is another opportunity to look closely at the specs of the USG Pro if you can pull more than 80 Mbps. Since I do not have a Pro, I don’t know what its throughput would be reduced to.

I thought I was finally ready to install vCenter. But alas, I didn’t have a DNS server running on my network. And if I’m going to have a home lab with a bunch of VMs, I certainly need Active Directory. Creating a domain controller for a new forest in a home lab in 2020 is far less nerve-wracking than running DCPromo.exe in 2001 in an enterprise environment, that’s for sure!

After creating A and PTR records in DNS, it was finally time for the VCSA. As all of you probably know, the tiny deployment of vCenter requires 10 GB of RAM. That certainly wasn’t going to fly with my hardware limitations!
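For anyone new to the A and PTR pairing: the A record maps a name to an address, and the PTR record maps the reversed address back to the name. The sketch below builds both record strings for an IPv4 address with Python's standard `ipaddress` module. The hostname and address in the example are placeholders, not my lab's actual values.

```python
import ipaddress


def dns_record_pair(hostname: str, ip: str) -> tuple[str, str]:
    """Return (A record, PTR record) text for an IPv4 host."""
    addr = ipaddress.IPv4Address(ip)
    fqdn = hostname.rstrip(".") + "."  # fully qualified, trailing dot
    a_record = f"{fqdn} IN A {addr}"
    # reverse_pointer yields e.g. "20.10.10.10.in-addr.arpa"
    ptr_record = f"{addr.reverse_pointer}. IN PTR {fqdn}"
    return a_record, ptr_record
```

For example, `dns_record_pair("vcsa.lab.example", "10.10.10.20")` produces an A record for vcsa.lab.example and a PTR record under 20.10.10.10.in-addr.arpa that points back to it.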

Gaming PC ESXi Host


VCSA Config


My host came to a grinding halt. I reduced the VCSA to 6 GB of RAM. It could barely boot and could not load the UI. I set it to 8 GB, and at least it ran with minimal complaining long enough for me to deploy the VeloCloud Orchestrator. After that, it was shut down until it was time to deploy a VeloCloud Gateway, and then it was powered off again.

I was certainly happy to see this login screen after going through host resource gymnastics!

VCO VM


A production VM of VCO wants more resources than my host can provide.

VCO VM Resource Consumption


Luckily, it is well behaved and only consumes what it needs while powered on.

One last note before closing out this blog. VeloCloud Orchestrator must have a publicly accessible IP address. The default route must egress to the internet.

VCO OVF Network Selection


This means if you want to do this in your own environment at scale for true internet routing purposes, you might want to have a separate NIC that isn’t hidden behind NAT from something like a USG, for example! There will be many more things that you would need in addition to this, so it is unlikely that an individual would be running their own VCO instance for true SD-WAN multi-pathing into the world. But running it in your home lab to familiarize yourself with the platform behind a firewall and NAT is just fine.

Thank you for reading Part 2! Part 3 will address the VeloCloud architecture. I will describe what the individual components do and how they talk to each other.

Home Networking. Part 1 – The Beginning.

This is a multi-part series on what I’ve learned from my home lab configuration and troubleshooting of Ubiquiti Unifi gear and Velocloud Orchestrator, gateways, and edge devices. Some of this information is already available on the internet, but it took a lot of searching to find it. Some of this information required conversations with the engineers who created the product and is not documented publicly (yet). This blog series is an attempt to consolidate information and links in a single location and simplify some of the mystery of what happens behind the scenes of a Velocloud implementation. I’m happy to answer any questions you might have to the best of my abilities. If it is about Velocloud, I can check with our engineers too.

Phase 1. Requirements.

Cloud and internet. Due to my lack of reliable and highspeed internet connections, security concerns, and my desire to nerd out, everything must be hosted locally and cannot use cloud services.

Wired networks. My home was cabled by professionals with cat6, cat6a, and speaker wire to all the locations where I might have electronics. I cannot recommend this enough. Since someone else did it, if a cable doesn’t work, it is on them to fix it. Totally worth it! I also had them install conduits to major locations so it is easy to run new cables in 5-10 years when everything I’ve designed and implemented is obsolete.

I chose to run ethernet everywhere because I find Wifi to be unreliable and generally a pain. With the amount of bandwidth and reliability we need for 4k content and gaming and beyond, ethernet is necessary. Plus, devices like access points and cameras need ethernet for power and security.

Whole home A/V and automation. I didn’t want a closed system (Control 4, Crestron, Savant, etc.) that I was not allowed to configure myself per the manufacturer. And, I didn’t want anything that relied on the cloud to function or stored my data in someone else’s cloud. I chose automation software that has a very large community and runs 100% local. Of course, if I automated something that is controlled in the cloud, that part of the script would be dependent on internet connectivity and the service being online. Otherwise, everything is stored and executed locally.

Gear location. I had a closet made specifically for racks of gear with independent cooling. Due to the cost of some cabling types (HDMI over Ethernet), limitations of products on the market, and the laws of physics, some gear couldn’t be stored in the closet which required more cabling to media locations because gear would be mounted there.

Security. I attempted to follow all of the normal best practices: IoT devices can’t talk to the internet or other systems, untrusted guests can’t access internal systems, etc. Products such as doorbells or cameras that have Bluetooth or Wifi are usually hacked quite quickly and rarely receive security updates, so I didn’t choose them. I didn’t buy “smart” appliances or other “smart” home items because security is not those companies’ priority.

Exceptions. Unfortunately, not all of my requirements could be met. And, I still have cell phones, tablets, and a job to do! So, we’re not completely walled off from the world. However, if the internet does become unavailable, we’re able to enjoy media, I can observe and manage my network, and automation still functions.

Phase 2. Setting up the network equipment.

If you follow me on Twitter, then you probably already know that my first issue was the fact that I didn’t have a computer prepared to install Unifi Controller software. Luckily, ESXi runs on pretty much everything these days, like my ancient gaming PC. Within an hour (thank you rural download speeds and my inability to find a large enough USB stick for the iso), I had an ESXi standalone host online. It took a few more hours to have an operating system iso downloaded. I chose to deploy Windows because that’s my most comfortable OS, and I didn’t want to spend time dealing with a less supported version of the controller software. I haven’t had any issues with it yet (after the initial setup as you’ll see below)!

My original choices of gear were the Ubiquiti Unifi Secure Gateway (USG), a Unifi 48 port managed switch, a Unifi 24 port managed POE switch, and multiple Unifi AP Pros. I could have read the product details more closely before placing my order, but a lot was going on in my life, and I made some assumptions I shouldn’t have. Here they are:

·      The USG has deep packet inspection (DPI) capabilities. Yay security and observability! The throughput advertised of the USG is 1 Gbps. Since my rural internet connections come nowhere close to that, I thought I’d be fine. Now, if you turn on DPI, your throughput drops to about 100 Mbps. This is still fine for me, but if you live where you can get fast internet, you will want to get the USG Pro or maybe go a different route altogether (more on that in another post). Also, the USG doesn’t have a rackmount option. The USG Pro does.

·      The Unifi 24 port managed POE switch only has 16 POE ports! The rest are not powered. I did the math on how much power each of my POE devices would probably draw and calculated that the switch could handle that load before ordering. However, I did not check to see how many ports on the switch actually provide power. I ordered a few POE injectors for the remaining devices after realizing my mistake, but I would have purchased different switches if I had read the specs more closely before placing my order. One cannot return gear to Ubiquiti due to their own mistakes.

·      I did not order a top of rack switch for the second gear rack. When my AV installers showed up and we started discussing the roll out of equipment, I discovered that the cable terminated in different parts of the closet based on a predetermined device layout in the racks. It was expected that the second rack would have devices that needed a switch, so they cabled it that way. I thought there’d be a bit more flexibility. They did provide 4 patch cables between the racks when they ran the rest of the cabling into the closet. If I factored this and my POE switch mistake into the design along with limitations I have since discovered in the USG, my purchase certainly would have looked different.

·      Cat 6a has huge connectors! I already knew Cat 6a didn’t bend well, so we only ran it for certain locations and devices. But, due to the connector size, you cannot have two Cat 6a cables next to each other in the Unifi switch. This is another factor for how many switches to buy and how many ports are needed. Maybe there are switches with ports farther apart than Unifi gear for this reason, I haven’t looked into it.
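The POE mistake in the list above comes down to checking two limits instead of one: the total power budget and the number of powered ports. Here is a small sketch of that check. The wattages and the 250 W budget in the usage note are made-up numbers, not the specs of any Unifi switch, so plug in the values from the actual datasheet.

```python
def poe_plan(device_watts: list[float], poe_ports: int, budget_watts: float) -> dict:
    """Check a list of POE device power draws against a switch.

    Verifies both the power budget and the count of powered ports.
    """
    total = sum(device_watts)
    return {
        "total_watts": total,
        "within_budget": total <= budget_watts,
        "enough_ports": len(device_watts) <= poe_ports,
        # Devices left over once every powered port is in use.
        "injectors_needed": max(0, len(device_watts) - poe_ports),
    }
```

With 20 hypothetical devices at 6.5 W each on a switch with 16 powered ports and a 250 W budget, `poe_plan([6.5] * 20, 16, 250)` reports that the power budget is fine but four devices need injectors, which is the trap I fell into.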

The Unifi gear was simple to deploy. I was able to just wire up my internet connection to the USG WAN1 port, the USG LAN port to any port on the 48 port switch, and the ESXi host NIC to any port on the 48 port switch and obtain an IP address from the USG DHCP for my host and VMs. The WAN1 port on the USG was configured to use DHCP out of the box as well. After figuring out how to put my satellite modem in bridged mode (Pro tip: disable any VPN software so local addresses can be accessed =), I had internet connectivity. I even placed the Velocloud Edge 510 in line between the USG and modem and had internet connectivity without it being activated and configured. This was nice.

After installing Windows Server, Java, and the Unifi Controller software, it was time to see if setting up the Unifi gear was as easy as everyone said it was. Short answer: Yes! Long answer: not if you think you can change the Unifi devices’ IP addresses!

The Unifi controller software easily discovered the USG and 48 port switch without my having to do anything. There is a simple function called “adopt” so the controller can then manage the Unifi devices it finds. This is all very easy. Except, I didn’t want my hardware to use factory defaults, and I certainly didn’t want it to be using DHCP. The first time, I changed the IP address of the 48 port switch. After the controller software pushed the change, it could no longer find the switch to manage it. I made the change in the controller software so it obviously knows the new IP address. Strange, since the Windows VM was running on a host directly plugged into the switch. I “forgot” the device in the controller, but the software wouldn’t discover it again for me to force an adoption. I spent about an hour looking at community posts on forums about their issues when changing IP addresses (I should have gotten the hint when every final solution was to do a factory reset and adopt the device in the controller software with the defaults) and reading through manuals. I thought I had a plan ready for try number 2 and performed a factory reset on the switch.

This time, I would start with re-IPing the USG. The controller lost its ability to manage the USG after I changed its IP address. At least it could talk to it, but it failed when trying to manage it after the change provisioned. My attempts to force the management were thwarted with an authentication issue. Strange again, since the controller software sets the password on adoption. After working through various other documented and community suggested troubleshooting attempts, I gave up. I performed another factory reset and just used the factory defaults. I thought of the other switch and APs that I planned on bringing online and counted the hours of my life I would spend trying to get it to work. I decided it wasn’t worth it. Unless you are extremely lucky, patient, enjoy a cli, and have a lot of extra time on your hands, I don’t suggest wrestling with these devices that definitely do not like their IP addresses changed.

With all these IP changes and abandoned devices, the controller software became very unhappy and couldn’t discover the devices that just went through the second factory reset. I tried a few troubleshooting steps and decided that I may as well just reinstall the software. Minutes later after an uninstall, a reboot, a reinstall, a reboot for good measure, and answering the initial questions, the Unifi controller software was online and able to discover the devices again. I powered up the POE switch, connected it to the 48 port switch, and adopted it in the controller software. I set the DHCP reservations in the USG, performed any remaining updates, and called it a night. If I accepted my factory default fate from the beginning and had faster internet speeds, I probably would have spent no more than two hours on all of this. But hey, I had event tickets to get and dungeons to run in ESO while I waited.

Thanks for reading! Part 2 will be about VLAN creation, firewall rules, and the beginning of the Velocloud edge activation.

The Thing about Internet of Things

Internet of Things is no longer a buzz-phrase. It is a full-blown race in every industry to capitalize on the total addressable market trending towards $1 Trillion by the year 2021. The technology industry has not seen this type of opportunity in a long time, possibly ever. Internet of Things gives a chance to companies who have never been in the business of servers, network, security, cloud, data center, or internet to interrupt established companies in these previously dominated arenas. Many companies, large and startup alike, want to have the largest impact on the ecosystem first so others will follow the same design and implementation path.

I have mixed feelings about this boom in IoT. While the exponential technology improvements are exciting, it seems we are ignoring important factors in trying to create the next big thing. My main concerns are security, on-premises analytics, data management, and security. Yes, security is that important and mostly being ignored! In IT we talk about technology debt. If we ignore these important factors, there will be more to resolve than the technology debt of trying to redo poorly implemented systems. Without a design focused on security, there will be a large increase in successful device and network compromises. These compromises will result in a range of outages. The start of the range would be an amusing inconvenience, while the other end of the spectrum would be a complete disruption of important services across the globe.

My recommendations for successful IoT deployments are as follows:

1.     Security. Security should be designed into the IoT device and ecosystem from the beginning. There must be a baseline for IoT security and compliance. The IoT Security Foundation has published an IoT Security and Compliance Framework. All devices should adhere to these practices whenever possible. IoT devices should be scanned for vulnerabilities just like Operating Systems and Appliances are today and kept up to date. This will be an ever-evolving area as more systems are linked to gain more value from the data.

2.     On-premises analytics. There are many platforms for analytics in the cloud. Companies have improved processes and saved money with these platforms already. My favorite story is how Hershey’s will save millions of dollars on Twizzlers production improvements. As a previous employee of a manufacturing company, I do have a soft spot for this industry, but analytics applies to all industries. The more proprietary the data or the more isolated the IoT networks, the more companies will need to run analytics on-premises for security, compliance, and design reasons. This will require edge computing and storage beyond anything we’ve seen before. IT will need to design the best combination of edge and core analytics to accomplish what their lines of business need. Many companies are trying to consolidate their edge computing instances. Instead, they should be thinking about using SDDC to quickly make their edge computing more uniform in preparation for its growth.

3.     Data management. Hershey’s has 22 sensors on each Twizzlers tank. These sensors collect 60 million data points. These astronomical numbers represent a very, very small fraction of IoT data on the planet today. And those numbers will increase exponentially monthly. It is a good time to be in the data management business! We will need a new approach to this compared to how we’ve managed and retained databases, files, and email in the past. The data that isn’t valuable today might be valuable in the future. Educated guesses will determine which historical data should be retained either at the edge or within the core at the beginning. As IoT matures, deep learning can be used to identify which values will be useful. The size of the data points will be small, but they will be generated quickly and often. This will also require vast improvements to networks; 400GbE and 5G implementations are right around the corner. We’ll see requirements for IPv6 in every organization when IoT proliferates. Design teams must be created around data management solutions and processes to solve these problems.

There are many more perils than those listed above when pioneering a new era of technology, but those three are the ones that I believe are most often overlooked by product teams and engineers. I look forward to more conversation about the blind spots you have identified in this space.