Hybrid Operations Management Platform

Scaling SaaS Operations With OpsRamp 


What’s our secret sauce for hyperscale SaaS operations? It's simple: we use the OpsRamp platform to run our own SaaS operations at scale across North America, Europe and Japan.

How does our SaaS Ops team drive platform scalability, reliability and security? With OpsRamp's dashboards, policies, service desk, knowledge base and integrations, which together keep our SaaS platform available and performing well.

Custom Dashboards

Figure 1 - US Region Dashboard

Dashboards display how the SaaS platform performs at any given time. Our SaaS Ops team sets up different dashboards to understand overall platform health. For example, the US region dashboard shows device count, service availability, and device performance. 

Regional and service-level dashboards help us manage platform uptime and performance. Dashboards monitor not just applications but also APIs, big data clusters, and even alerts and tickets! We manage availability for critical services by tracking device availability, memory usage and CPU usage.

Figure 2 - Big Data Cluster Dashboard

Our big data dashboard tracks read and write request latencies, memory use, pending compaction tasks, and the devices with the highest CPU usage in the big data cluster (Cassandra, Kafka, Hadoop).

Device Groups

Figure 3 - Device Groups

We segregate our SaaS infrastructure into device groups using policies and filters, and we monitor each device group's performance with dashboards. When new devices come onboard, the dashboards update automatically to reflect the latest infrastructure. Policy-based device management keeps our platform inventory current and every device under management at all times.
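Filter-based grouping can be illustrated with a minimal Python sketch. It assumes each device carries simple attributes (region, role) and that a group is defined by a filter predicate; the `Device`, `DeviceGroup` and `onboard` names are hypothetical and are not OpsRamp's API. The point is that group membership is computed from rules, so a newly onboarded device lands in the right groups (and therefore the right dashboards) without manual assignment.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Device:
    name: str
    region: str  # e.g. "us", "eu", "jp"
    role: str    # e.g. "cassandra", "api"


@dataclass
class DeviceGroup:
    name: str
    matches: Callable[[Device], bool]  # hypothetical filter rule
    members: list = field(default_factory=list)


def onboard(device: Device, groups: list) -> None:
    """Place a newly discovered device into every group whose filter it satisfies."""
    for group in groups:
        if group.matches(device):
            group.members.append(device.name)


groups = [
    DeviceGroup("us-region", lambda d: d.region == "us"),
    DeviceGroup("eu-region", lambda d: d.region == "eu"),
    DeviceGroup("big-data", lambda d: d.role in {"cassandra", "kafka", "hadoop"}),
]

for dev in [
    Device("cass-01", "us", "cassandra"),
    Device("api-01", "eu", "api"),
    Device("kafka-02", "us", "kafka"),
]:
    onboard(dev, groups)

for group in groups:
    print(group.name, group.members)
```

A device can belong to several groups at once (here `cass-01` is in both `us-region` and `big-data`), which is what lets regional and service-level dashboards overlap.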

Discovery Policies

Figure 4 - Discovery & Deployment Policies

OpsRamp’s discovery profiles rapidly onboard devices across global locations. There are discovery profiles for servers, app nodes, subnets and metrics processors. We define schedules for discovery profiles to collect data across different devices at regular intervals.

OpsRamp uses Gateways and Agents to discover device data. Gateways collect hypervisor and network information while Agents gather operating system metrics.

Device Management Policies

Figure 5 - Device Management Policies

Device management policies define how a device is monitored. They apply monitoring templates, knowledge base articles and custom attributes to discovered devices.

Device management policies are triggered whenever a new device is added, so monitoring templates in OpsRamp automatically manage each new device across its lifecycle.
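The event-driven behavior described above can be modeled as a "device added" handler that runs every matching policy. This is a minimal sketch assuming a policy matches on operating system; `ManagedDevice`, `ManagementPolicy`, `on_device_added` and the template and article names are all hypothetical, not OpsRamp's data model.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedDevice:
    name: str
    os: str
    templates: list = field(default_factory=list)
    kb_articles: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)


@dataclass
class ManagementPolicy:
    name: str
    os: str           # hypothetical match condition
    templates: list   # monitoring templates to apply
    kb_articles: list
    attributes: dict  # custom attributes to stamp on the device

    def applies_to(self, device: ManagedDevice) -> bool:
        return device.os == self.os


def on_device_added(device: ManagedDevice, policies: list) -> ManagedDevice:
    """Event handler: run every matching policy against a newly added device."""
    for policy in policies:
        if policy.applies_to(device):
            device.templates.extend(policy.templates)
            device.kb_articles.extend(policy.kb_articles)
            device.attributes.update(policy.attributes)
    return device


policies = [
    ManagementPolicy("linux-servers", "linux", ["linux-os-health"], ["Disk cleanup SOP"], {"tier": "prod"}),
    ManagementPolicy("windows-servers", "windows", ["windows-os-health"], [], {"tier": "prod"}),
]

device = on_device_added(ManagedDevice("app-07", "linux"), policies)
print(device.templates, device.kb_articles, device.attributes)
```

Because the handler fires on every addition, no one has to remember to attach templates by hand; the policy set is the single source of truth for how each class of device is monitored.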

 

Alert Management

Figure 6 - Alert Browser

The alerts tab groups related alerts and creates tickets or change requests to manage issues through resolution. Tickets can be created manually, where a team member processes an alert and opens a ticket, or automatically through auto incident policies without human intervention. Our auto incident policies create tickets for critical alerts and push those tickets into our Slack workflow.
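An auto incident policy of this kind can be sketched as a filter plus deduplication step. The sketch assumes that "related" alerts are those sharing a device and metric, that only `critical` alerts open tickets, and that a `notify` callback stands in for the Slack integration; the channel name `#ops-incidents` and every function and class name here are hypothetical, not OpsRamp's implementation.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    device: str
    metric: str
    severity: str  # "critical", "warning", ...


@dataclass
class Ticket:
    id: int
    subject: str


def auto_incident(alerts, notify, start_id=1):
    """Open one ticket per (device, metric) pair that has a critical alert."""
    tickets = {}
    for alert in alerts:
        if alert.severity != "critical":
            continue  # non-critical alerts are left for manual triage
        key = (alert.device, alert.metric)
        if key in tickets:
            continue  # related alert already covered by an existing ticket
        ticket = Ticket(start_id + len(tickets), f"{alert.device}: {alert.metric} is critical")
        tickets[key] = ticket
        notify(f"[#ops-incidents] Ticket {ticket.id}: {ticket.subject}")
    return list(tickets.values())


messages = []
alerts = [
    Alert("cass-01", "cpu", "critical"),
    Alert("cass-01", "cpu", "critical"),
    Alert("kafka-02", "disk", "warning"),
    Alert("api-01", "latency", "critical"),
]
tickets = auto_incident(alerts, messages.append)
for line in messages:
    print(line)
```

Passing the notifier in as a callback keeps the policy logic independent of the chat tool, so the same function can be exercised in tests with a plain list, as above.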

Service Desk

Figure 7 - Service Desk

The service desk helps us manage service requests for the platform. We track conversations, activity logs, recordings, notes, status and alerts for each ticket, so all of our SaaS operations are auditable and traceable.

Knowledge Base

Figure 8 -  Knowledge Base

We maintain an active knowledge base for platform maintenance. We segregate knowledge base articles for configuration, data management, deployment, standard operating procedures and troubleshooting. 

We assign articles to tickets or devices to help our teams respond to and resolve issues faster. Searchable knowledge base articles also reduce staff training time.

Integrations

Slack

Figure 9 - Slack Incident Management Channels

Slack is the Ops team's internal collaboration tool. Our Slack integration lets us view, address and resolve tickets within designated Slack channels, which helps us stay on top of application and infrastructure performance for the platform.

Jenkins

We use Jenkins for all our build deployments. Our Jenkins integration lets us build and test the codebase during release cycles. When we deploy a build, our Jenkins jobs are configured to trigger automatic alerts in OpsRamp.
