Monitoring challenges moving critical systems in virtual infrastructure

by Philip Sellers

Like many shops, we have finally attained buy-in from all our stakeholders for virtualization.  As a result, we’ve pushed more and more systems into our virtual infrastructure.  And while VMware is the most datacenter-ready solution for virtualization, it is not without its shortcomings, and monitoring and visibility into the infrastructure is one of the biggest.

When we first deployed VI3 and performed our consolidation, the primary focus was moving non-critical systems into the virtual infrastructure to get the best utilization of hardware.  Once that phase was complete, the focus shifted to moving some of our mission-critical systems to VMware in order to establish disaster recovery for our non-clustered systems.  Disaster recovery through VMware is accomplished by 1) relocating the boot and data volumes onto SAN storage, which is replicated to our secondary data center, and 2) using VMware HA in the event of hardware failure, which gives us resiliency we do not have in a single-server, physical deployment.

As we have expanded VMware’s role in our data center, new challenges have emerged.  First, when a network issue occurs, our traditional monitoring tools (like PRTG) are not in a position to alert on large changes in traffic.  In our physical environment, we run HP agents, and PRTG queries these systems over SNMP to retrieve traffic information.  In the virtual environment, we don’t run these agents because they are largely not applicable to virtual machines.  Our preference is to monitor with something that can look directly into the virtualization layer and retrieve the information.
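To make "alert on large changes in traffic" concrete, here is a minimal sketch of the arithmetic a poller might do with two SNMP interface counter samples.  It is not how PRTG works internally.  It assumes the standard 32-bit octet counters (such as ifInOctets), which wrap at 2^32, and the function names and the threshold factor of 3 are hypothetical choices for illustration.

```python
COUNTER32_MOD = 2 ** 32  # assumption: 32-bit octet counters that wrap at 2^32


def octets_to_bps(prev_octets, curr_octets, interval_s):
    """Convert two counter samples into an average bits-per-second rate."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # The modulo handles a single counter wrap between the two samples.
    delta = (curr_octets - prev_octets) % COUNTER32_MOD
    return delta * 8 / interval_s


def is_large_change(baseline_bps, current_bps, factor=3.0):
    """Flag a rate above factor x baseline or below baseline / factor.

    The factor is a hypothetical threshold, not a recommended value.
    """
    if baseline_bps <= 0:
        return current_bps > 0
    return (current_bps > baseline_bps * factor
            or current_bps < baseline_bps / factor)
```

On fast links a 32-bit counter can wrap more than once between polls, so a short polling interval (or the 64-bit counters, where the device offers them) is the safer choice.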

When we began our virtualization initiative, we saw some monitoring solutions at regional VMware user group meetings, but at that point we didn’t see a lot of value in the products.  As we progressed through the beginning of phase two, we realized that monitoring and visibility into the virtual infrastructure, beyond what vCenter allowed, was a priority for us.  We looked at several products from companies I had seen or talked with at VMworld, and we finally settled on two candidates to demo within our environment.

The first product we tested was Hyper9, and we were really impressed with it.  The product began as a search tool, basically a Google for virtual infrastructure.  The search interface was very good for answering questions posed by management, and it was quick to drill down to the information we were searching for.  One of its most powerful features is its vmDNA, which tracks changes in the VM and can compare two point-in-time views to see what has changed on the server.  This, we felt, was really powerful.  The first place we saw a deficiency in the product was alerts and alarming.  Hyper9 had split this into a separate product, which is good from a flexibility standpoint but bad from a central management and alarming standpoint.  This piece of the puzzle just didn’t seem to be enterprise ready.  The second deficiency was network monitoring, something that was on our must-have list of requirements.  Hyper9 came pre-populated with lots of searches for common pain points that allowed us to quickly look for problem areas: large snapshot files, datastores that are low on disk space, and tools that are out of date or not running.  The problem we saw was that these searches were passive.  They had to be scheduled to run at a specific time in the secondary application, and they did not actively alert when a condition was met.
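The idea of comparing two point-in-time views can be sketched in a few lines.  This is only an illustration of the general technique, not Hyper9’s implementation; the function name and the configuration fields (memory_mb, num_cpu, snapshot_count) are hypothetical.

```python
def diff_snapshots(before, after):
    """Return {setting: (old, new)} for every setting that differs.

    A value of None means the setting was absent in that snapshot.
    """
    changes = {}
    for key in sorted(set(before) | set(after)):
        old = before.get(key)
        new = after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Hypothetical views of one VM captured at two points in time.
last_week = {"memory_mb": 2048, "num_cpu": 2, "nic_count": 1}
today = {"memory_mb": 4096, "num_cpu": 2, "nic_count": 1, "snapshot_count": 1}

for setting, (old, new) in diff_snapshots(last_week, today).items():
    print(f"{setting}: {old} -> {new}")
```

Only the settings that changed are reported, which is what makes this kind of view useful when someone asks what happened to a server since last week.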

The second product we decided to demo was Vizioncore’s vFoglight application.  vFoglight is based on Quest Software’s Foglight engine (Quest owns Vizioncore), which is an enterprise-class monitoring and visualization engine.  Vizioncore wrote the VMware cartridge, which handles monitoring of the VMware infrastructure by pulling information directly from vCenter, with no impact on the ESX hosts.  It is strong in reporting and alarming.  The configuration, terminology and organization of the application are different from anything else we looked at, so there was a steep learning curve in understanding the layout and how it operates.  Fortunately, Vizioncore offers good training for free on their website.

vFoglight was considerably more complex than the Hyper9 product, but it offered the additional networking features we were looking for.  It also had integrated alarming and notifications within a centralized console, not in a separate application.  Most of vFoglight’s intelligence was built around alert and alarm conditions.  These trigger alarms in an alarms panel, where they can be tracked and researched (how often has this happened before?), and where individual members of our team can acknowledge an alarm or add notes on how the issue was resolved.  In addition, one of the internal developers had created a “vBundle” of common, useful reports, which is available free to add to the system.

Because this software is built on the Foglight engine, it is not only possible but may be very advantageous for us to add cartridges that monitor specific databases, applications or physical hosts.  The fewer monitoring systems we have to maintain or watch, the better, in my opinion.  Time will tell.

Ultimately, we made the decision to go with vFoglight.  The downside to the decision was its price.  We paid well for all this functionality.  But, so far, it has worked extremely well.  We finished our production installation last week and have about a week’s worth of data.  The Foglight system works much better over a long period, because its statistics and predictions become more accurate as it warehouses more historical data.  At this point, we are still learning the application and letting it collect data.

The service engineer for our account was extremely helpful in setting up the application.  They have a sizer application which determines the number of objects in your infrastructure and then provides recommendations on the best hardware configuration and database options, along with estimated data growth over 1 year and 3 years for planning.  I hope to have more information on vFoglight as I continue to learn about it.