What’s our secret sauce for hyperscale SaaS operations? It’s simple: we use the OpsRamp platform to manage our own SaaS operations across North America, Europe and Japan.
How does our SaaS Ops team drive platform scalability, reliability and security? OpsRamp ensures the availability and performance of our SaaS platform using dashboards, policies, service desk, knowledge base and integrations.
Figure 1 - US Region Dashboard
Dashboards display how the SaaS platform performs at any given time. Our SaaS Ops team sets up different dashboards to understand overall platform health. For example, the US region dashboard shows device count, service availability, and device performance.
Regional and service level dashboards help us manage platform uptime and performance. Dashboards monitor not just applications but also APIs, big data clusters, and even alerts and tickets! We manage availability for critical services by tracking relevant metrics for device, memory and CPU usage.
Figure 2 - Big Data Cluster Dashboard
Our big data dashboard tracks read and write request latencies, memory use, pending compaction tasks, and the devices with the highest CPU usage in the big data cluster (Cassandra, Kafka, Hadoop).
Figure 3 - Device Groups
We segregate our SaaS infrastructure into device groups using policies and filters. Each device group's performance is monitored with dashboards. When new devices are onboarded, dashboards update automatically to reflect the latest infrastructure. Policy-based device management ensures that our platform stays current and managed at all times.
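As a rough illustration of filter-based grouping, here is a minimal Python sketch. It is not OpsRamp's filter syntax or API: the `Device`, `DeviceGroup` and `assign_to_groups` names, the group names and the device attributes are all hypothetical, chosen only to show how a newly onboarded device can land in the right groups without a manual step.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Device:
    """Hypothetical device record; real platforms carry many more attributes."""
    name: str
    region: str
    role: str


# A filter is any predicate over a device (an assumption, not OpsRamp's syntax).
DeviceFilter = Callable[[Device], bool]


@dataclass
class DeviceGroup:
    name: str
    matches: DeviceFilter
    members: List[str] = field(default_factory=list)


def assign_to_groups(device: Device, groups: List[DeviceGroup]) -> List[str]:
    """Add the device to every group whose filter matches; return those group names."""
    joined = []
    for group in groups:
        if group.matches(device) and device.name not in group.members:
            group.members.append(device.name)
            joined.append(group.name)
    return joined


groups = [
    DeviceGroup("us-region", lambda d: d.region == "us"),
    DeviceGroup("big-data", lambda d: d.role in {"cassandra", "kafka", "hadoop"}),
]

# Onboarding a new device places it in every matching group.
joined = assign_to_groups(Device("kafka-07", region="us", role="kafka"), groups)
print(joined)  # ['us-region', 'big-data']
```

Because membership is computed from the filter rather than maintained by hand, any dashboard built on a group picks up new members as soon as they are onboarded.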
Figure 4 - Discovery & Deployment Policies
OpsRamp’s discovery profiles rapidly onboard devices across global locations. There are discovery profiles for servers, app nodes, subnets and metrics processors. We define schedules for discovery profiles to collect data across different devices at regular intervals.
OpsRamp uses Gateways and Agents to discover device data. Gateways collect hypervisor and network information while Agents gather operating system metrics.
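To make the idea of scheduled discovery concrete, the following Python sketch shows one way a scheduler could decide which discovery profiles are due to run. The `DiscoveryProfile` class, the `due_profiles` helper, the profile names and the intervals are illustrative assumptions, not OpsRamp's actual data model or configuration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class DiscoveryProfile:
    """Hypothetical profile: what to scan and how often."""
    name: str
    interval: timedelta
    last_run: Optional[datetime] = None

    def is_due(self, now: datetime) -> bool:
        # A profile that has never run is always due.
        return self.last_run is None or now - self.last_run >= self.interval


def due_profiles(profiles: List[DiscoveryProfile], now: datetime) -> List[str]:
    """Return the names of profiles whose interval has elapsed."""
    return [p.name for p in profiles if p.is_due(now)]


now = datetime(2024, 1, 1, 12, 0)
profiles = [
    DiscoveryProfile("servers", timedelta(hours=1), last_run=now - timedelta(minutes=90)),
    DiscoveryProfile("subnets", timedelta(days=1), last_run=now - timedelta(hours=2)),
    DiscoveryProfile("app-nodes", timedelta(hours=6)),  # never run yet
]

print(due_profiles(profiles, now))  # ['servers', 'app-nodes']
```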
Device Management Policies
Figure 5 - Device Management Policies
Device management policies define how to monitor a device. They apply monitoring templates, knowledge base articles and custom attributes to discovered devices.
Device management policies are triggered whenever a new device is added. Monitoring templates in OpsRamp automatically manage new devices across their lifecycle.
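The trigger-on-add behavior can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not OpsRamp's implementation: `ManagementPolicy`, `on_device_added`, the template name, the knowledge base title and the custom attribute are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Device:
    """Hypothetical discovered device and what has been applied to it."""
    name: str
    os: str
    templates: List[str] = field(default_factory=list)
    kb_articles: List[str] = field(default_factory=list)
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class ManagementPolicy:
    """Hypothetical policy: a match rule plus what to apply on a match."""
    name: str
    applies_to: Callable[[Device], bool]
    templates: List[str] = field(default_factory=list)
    kb_articles: List[str] = field(default_factory=list)
    attributes: Dict[str, str] = field(default_factory=dict)


def on_device_added(device: Device, policies: List[ManagementPolicy]) -> Device:
    """Apply every matching policy to a newly added device."""
    for policy in policies:
        if policy.applies_to(device):
            device.templates.extend(policy.templates)
            device.kb_articles.extend(policy.kb_articles)
            device.attributes.update(policy.attributes)
    return device


policies = [
    ManagementPolicy(
        name="linux-servers",
        applies_to=lambda d: d.os == "linux",
        templates=["linux-os-basics"],
        kb_articles=["High CPU triage"],
        attributes={"tier": "production"},
    )
]

device = on_device_added(Device("app-node-12", os="linux"), policies)
print(device.templates)   # ['linux-os-basics']
print(device.attributes)  # {'tier': 'production'}
```

A device that matches no policy passes through unchanged, which keeps the hook safe to call for every new device.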