
Continuous Operations with ExtremeCloud IQ

Imagine losing power for a few hours, or even a whole day, because weather or a blown fuse caused trouble at a power station. You cannot watch TV, you have no network connection and no heater, you cannot cook, the food in the refrigerator goes bad and, worst of all, you cannot charge your phone! Now you stumble through the darkness looking for a torch. Doesn’t that feel like being whisked back to the Stone Age? No wonder losing power is called a blackout. This is usually the point in your life when you vow to start talking to solar companies about roof panels and to buy a standby generator as a tertiary backup. Just as power is fundamental to the 21st-century lifestyle, uptime is crucial for companies that are digitizing with cloud-managed networking. Don’t you wonder how Extreme designs for CloudOps?

The power going out across a city.
Image courtesy giphy.com

Let us start by defining a few terms. DevOps, a popular term, refers to streamlining the processes associated with software development (Dev) and IT operations (Ops), with the aim of accelerating development lifecycles without compromising quality. CloudOps, on the other hand, is short for Cloud Operations, which combines DevOps with the optimization of IT services for cloud-based solutions. Continuous operation means that we keep the cloud services running while delivering frequent new innovations. How frequent is frequent? How about hundreds of updates in 365 days! Every single process, from onboarding your devices, authenticating users, and setting your network configuration to ongoing observability of your network and user experience, is built in the cloud. Properly architecting a highly available cloud-managed solution with various services is a herculean task that takes talent as well as effort. To keep this blog short, let us focus on how we design the authentication mechanism. With millions of customers worldwide logging in to ExtremeCloud IQ daily to manage their networks, this service needs to support large scale, low latency, and high availability. Additionally, the database needs to be backed up for disaster recovery. Now let us dig into the authentication architecture.

Geo-distribution for low latency and data residency

A geo-location DNS flow chart, separated by region.

The Lego blocks of availability begin with the placement of the GDC, or Global Data Center, which is geographically dispersed between the US and Europe and load balanced. To maintain data localization, or residency, the login information for EU customers exists only in the European instance. When users log in from the Asia Pacific region, the appropriate GDC is chosen based on measured latency. In addition to serving as the primary authentication mechanism for ExtremeCloud IQ, the GDC also performs device redirection and other global services as required. All instances of the GDC are hosted within Amazon Web Services (AWS). The RDC, or Regional Data Center, is hosted among various cloud providers depending on data retention time and location. For more details about the architecture, don’t forget to check out this whitepaper.
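To make the selection rule concrete, here is a minimal Python sketch of how a client-facing resolver might pick a GDC. It is an illustration only, not Extreme's implementation: the names Gdc and select_gdc, the region labels, and the latency figures are all hypothetical. It assumes EU accounts are pinned to the European instance and everyone else goes to whichever GDC shows the lowest measured latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gdc:
    name: str    # hypothetical instance name
    region: str  # "US" or "EU"


def select_gdc(home_region, latencies_ms, gdcs):
    """Pick a GDC: EU accounts stay in the EU, others take the lowest latency."""
    if home_region == "EU":
        # Data residency: EU login information exists only in the EU instance.
        candidates = [g for g in gdcs if g.region == "EU"]
    else:
        candidates = list(gdcs)
    # Only consider instances we actually have a latency measurement for.
    candidates = [g for g in candidates if g.name in latencies_ms]
    if not candidates:
        raise LookupError("no reachable GDC for this account")
    return min(candidates, key=lambda g: latencies_ms[g.name])


gdcs = [Gdc("gdc-us", "US"), Gdc("gdc-eu", "EU")]
apac_choice = select_gdc("APAC", {"gdc-us": 180.0, "gdc-eu": 240.0}, gdcs)
eu_choice = select_gdc("EU", {"gdc-us": 20.0, "gdc-eu": 90.0}, gdcs)
```

Note that the EU account lands on the European instance even when the US one is faster, because in this sketch residency takes priority over latency.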

A data center distribution from global to regional to the customer

The authentication service uses two types of data storage services: some data is stored in a database, while other data is stored in an in-memory cache. Once you log in to a GDC, the token is stored in that local GDC and replicated to multiple remote GDCs.
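As a rough illustration of a token living in a local in-memory cache and being copied to remote GDCs, consider the Python sketch below. Everything in it is an assumption made for the example: the TokenCache class, the time-to-live on tokens, and the synchronous copy to peers are hypothetical, and a real deployment would replicate over the network with its own consistency rules.

```python
import time


class TokenCache:
    """Hypothetical in-memory token cache for one GDC."""

    def __init__(self, name, clock=time.monotonic):
        self.name = name
        self._clock = clock
        self._tokens = {}  # token -> (user_id, expires_at)
        self._peers = []   # remote caches that receive copies

    def add_peer(self, peer):
        self._peers.append(peer)

    def _store(self, token, user_id, expires_at):
        self._tokens[token] = (user_id, expires_at)

    def issue(self, token, user_id, ttl_s):
        """Store the token locally, then copy it to every remote peer."""
        expires_at = self._clock() + ttl_s
        self._store(token, user_id, expires_at)
        for peer in self._peers:
            peer._store(token, user_id, expires_at)

    def lookup(self, token):
        """Return the user for a live token, or None if unknown or expired."""
        entry = self._tokens.get(token)
        if entry is None:
            return None
        user_id, expires_at = entry
        if self._clock() >= expires_at:
            del self._tokens[token]
            return None
        return user_id


local = TokenCache("gdc-a")
remote = TokenCache("gdc-b")
local.add_peer(remote)
local.issue("token-123", "alice", ttl_s=3600)
replicated_user = remote.lookup("token-123")
```

The point of the sketch is simply that a login handled by one GDC can be honored by another, because the token is already there when the next request arrives.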

Cross-Region Remote and Local Database Replicas:

  1. Remote Replica – The authentication server is made up of multiple microservices. Each microservice has an identical twin that is kept constantly in sync in a second region. Each of the cloud provider’s regions is designed to be fully autonomous and physically distant from the others, so the standby authentication server remains available even in the case of a natural disaster. For example, in Europe the primary is maintained in one region while the second, load-balanced server is maintained in another, so the data always stays within Europe.
  2. Local Replica – Furthermore, a read-only replica of the GDC is maintained in the same region. This allows authentication to be load balanced between the primary GDC and the replica. This way, even if the primary GDC goes down and failover has not yet happened, users can continue to authenticate using the database from the replica.
  3. Data Sovereignty – Since both European regions keep their data within the EU, the architecture can meet GDPR requirements. Sharding is used to ensure that data is kept in the appropriate locality. Supporting GDPR means solving some interesting corner cases. If a European customer travels to the US, none of the user’s personally identifiable information (PII), such as email, is stored in the US shard. Instead, the authentication server acts as a proxy that connects the user to the EU servers.
  4. Geo-location DNS – Customers can authenticate against any GDC, and DNS is updated if a GDC fails.
  5. Comprehensive Monitoring & Alerting – To provide the analytics needed to diagnose any mishap, this crucial service has complete, step-by-step monitoring and alerting.
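The replica and sharding ideas above can be sketched together in a few lines of Python. This is a simplified model under stated assumptions, not the actual service: Node, Shard, authenticate, the node and region names, and the stored "digest" values are all hypothetical. Each shard lists its primary, its local read replica, and its remote standby in order, and a login entering at any GDC is verified against the user's home shard only.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when a database node cannot serve requests."""


@dataclass
class Node:
    name: str
    region: str
    credentials: dict = field(default_factory=dict)  # email -> digest
    up: bool = True

    def verify(self, email, digest):
        if not self.up:
            raise Unavailable(self.name)
        return self.credentials.get(email) == digest


@dataclass
class Shard:
    jurisdiction: str
    nodes: list  # ordered: primary, local read replica, remote standby

    def verify(self, email, digest):
        # Try each node in order; a down node is skipped, not fatal.
        for node in self.nodes:
            try:
                return node.verify(email, digest), node.name
            except Unavailable:
                continue
        raise Unavailable(f"all nodes down in shard {self.jurisdiction}")


def authenticate(entry_gdc, home_jurisdiction, shards, email, digest):
    """Verify against the home shard, proxying if the user entered elsewhere."""
    ok, served_by = shards[home_jurisdiction].verify(email, digest)
    return {"ok": ok, "served_by": served_by,
            "proxied": entry_gdc != home_jurisdiction}


records = {"ana@example.com": "digest-1"}
eu = Shard("EU", [Node("eu-primary", "eu-region-1", dict(records)),
                  Node("eu-replica", "eu-region-1", dict(records)),
                  Node("eu-standby", "eu-region-2", dict(records))])
us = Shard("US", [Node("us-primary", "us-region-1")])
shards = {"EU": eu, "US": us}

# A European customer logging in while travelling in the US.
travelling = authenticate("US", "EU", shards, "ana@example.com", "digest-1")
```

In this model the US shard never holds the European user's email: the US entry point only forwards the request. If eu-primary is marked down, the same call is served by eu-replica without the caller noticing.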

To recap, designing for Continuous Operations is a Herculean effort, and this post covers only the authentication server part of the GDC. If you want to monitor the availability of all the services in your own region, you can see them like this. To learn more about ExtremeCloud IQ, keep monitoring this space!

The ExtremeCloud IQ network health dashboard

This blog was originally authored by Jeevan Patil, Senior Director, Product Management
