3PAR OS 3.3.1 Performance Enhancements


There are a bunch of performance and under-the-hood enhancements in 3PAR OS 3.3.1 which I wanted to take a deeper look at today. I have covered an overview of all the new features in a previous post. Some of them are brand new features, others are enhancements of old ones. Let’s look at each one in turn.

Adaptive Flash Cache (AFC) – allows SSDs to be used as an extension to the controllers’ onboard DRAM to accelerate reads. Analysis of the 3PAR install base has shown that there is regularly free capacity in the AFC, and to maximise its benefit the cache should be as full as possible. In 3PAR OS 3.3.1 AFC remains for reads only, but the types of request it will admit are extended to include large sequential I/O (>64K) and data read from snapshots. Having more data in cache will of course increase the chances of a hit. These new types of request take a back seat to those previously admitted to AFC, i.e. if AFC is full of small reads it will not flush these out to make room for a large sequential read.
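To make the "back seat" behaviour concrete, here is a minimal Python sketch. It is a toy model, not how the 3PAR OS is actually written: the class name, the page counts and the idea that first-class data can reclaim space held by the new request types are all my assumptions for illustration. The only rule taken from the text above is that large sequential and snapshot reads never push out small reads.

```python
from dataclasses import dataclass


@dataclass
class FlashCacheModel:
    """Toy AFC model: capacity is counted in pages, with two admission classes."""
    capacity_pages: int
    primary_pages: int = 0    # small random reads (the original AFC candidates)
    secondary_pages: int = 0  # large sequential I/O and snapshot reads (new in 3.3.1)

    def free_pages(self) -> int:
        return self.capacity_pages - self.primary_pages - self.secondary_pages

    def admit(self, pages: int, secondary: bool) -> bool:
        """Try to admit `pages` pages; return True if they were cached."""
        if secondary:
            # The new request types only ever use spare capacity.
            if pages > self.free_pages():
                return False
            self.secondary_pages += pages
            return True
        # ASSUMPTION: first-class data may reclaim space from second-class data.
        shortfall = pages - self.free_pages()
        if shortfall > 0:
            reclaimed = min(shortfall, self.secondary_pages)
            self.secondary_pages -= reclaimed
            shortfall -= reclaimed
        if shortfall > 0:
            return False  # eviction of cold first-class pages is not modelled here
        self.primary_pages += pages
        return True


cache = FlashCacheModel(capacity_pages=10)
cache.admit(10, secondary=False)        # fill the cache with small reads
print(cache.admit(1, secondary=True))   # a large sequential read is turned away
```

With the cache already full of small reads, the final call prints False, which is the behaviour described above.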

Express Writes – aims to deliver lower latency by reducing the number of CPU interrupts per I/O. This is achieved by sending the data along with the command rather than waiting for the target to request it. Previously this was only available with the FC protocol; it is now extended to iSCSI for the 8000 and 20,000 systems. It will be enabled automatically at upgrade and can result in up to a 40% improvement in latency for iSCSI writes.
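The difference is easiest to see by counting the messages in a single write. The sketch below is a deliberately simplified Python illustration: the message names are mine and do not correspond to real FC or iSCSI frame names. It only captures the idea from the text that the data rides along with the command instead of waiting to be requested.

```python
from typing import List


def write_exchange(express: bool) -> List[str]:
    """Return a simplified message sequence for one host write."""
    if express:
        # The data travels with the command, so the host never waits
        # for the array to ask for it.
        return [
            "host -> array: write command + data",
            "array -> host: status",
        ]
    return [
        "host -> array: write command",
        "array -> host: ready, send the data",
        "host -> array: data",
        "array -> host: status",
    ]


for mode in (False, True):
    steps = write_exchange(express=mode)
    print("express" if mode else "standard", len(steps), "messages")
```

In this model a standard write takes four messages and an express write takes two. Fewer round trips means fewer interrupts for the controller to service per I/O.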

Multi Queue – is another option that will automatically be turned on and self-optimised in 3PAR OS 3.3.1. Previously each SAS or FC port was locked to a processor core. This worked well if all ports were utilised but left cores idle when some ports were not fully utilised. With Multi Queue, cores can be shared between ports, allowing for greater utilisation.
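A quick back-of-the-envelope Python sketch shows why sharing helps. The load figures are made up and the even spread is an idealisation; the real scheduler will not balance this perfectly. The point is simply how much the busiest core is relieved when ports are unevenly loaded.

```python
from typing import List


def pinned_core_loads(port_loads: List[float]) -> List[float]:
    """One core per port: each core carries only its own port's load."""
    return list(port_loads)


def shared_core_loads(port_loads: List[float]) -> List[float]:
    """Idealised sharing: total work spread evenly over the same cores."""
    if not port_loads:
        return []
    cores = len(port_loads)
    return [sum(port_loads) / cores] * cores


port_loads = [90.0, 10.0, 0.0, 0.0]  # made-up % load per port
print(max(pinned_core_loads(port_loads)))  # busiest core when pinned
print(max(shared_core_loads(port_loads)))  # busiest core when shared
```

With these example numbers the busiest core drops from 90% to 25%, while two previously idle cores pick up a share of the work.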

Persistent Checksum – ensures the integrity of data by performing a checksum on the data all the way from the HBA to the disk. The current implementation of Persistent Checksum is proprietary, hence the requirement for specific HBAs. The new implementation switches to the standard T10 DIF (Data Integrity Field), which is reliant on the host OS rather than the HBAs and therefore widens support.
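For anyone curious what T10 DIF protection looks like, here is a minimal Python sketch of the general idea: each 512-byte sector gets 8 extra bytes made up of a CRC guard tag, an application tag and a reference tag derived from the block address. This shows the standard format in its simplest form (what the spec calls Type 1 protection), and it says nothing about how 3PAR implements it internally. The function names are my own.

```python
import struct

T10_DIF_POLY = 0x8BB7  # generator polynomial of the CRC-16 used for the guard tag


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16 with the T10 DIF polynomial (initial value 0, no reflection)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10_DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def protect(sector: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Append 8 bytes: guard tag (2), application tag (2), reference tag (4)."""
    guard = crc16_t10dif(sector)
    return sector + struct.pack(">HHI", guard, app_tag, lba & 0xFFFFFFFF)


def verify(protected: bytes, lba: int) -> bool:
    """Check that the data matches its guard tag and sits at the expected address."""
    sector, info = protected[:-8], protected[-8:]
    guard, _app_tag, ref_tag = struct.unpack(">HHI", info)
    return guard == crc16_t10dif(sector) and ref_tag == (lba & 0xFFFFFFFF)


sector = bytes(range(256)) * 2           # 512 bytes of sample data
stored = protect(sector, lba=1234)
print(len(stored), verify(stored, lba=1234))
```

Because the tag travels with the data, any component along the path can re-check it. A flipped bit fails the guard check, and a block written to the wrong address fails the reference tag check.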


OTHER 3PAR OS 3.3.1 POSTS

3PAR Dedupe + Compression Deep Dive

3PAR Mega News Bundle – Including Compression

 

3Par Gets Flashier

For a while standard 3Par owners could be forgiven for looking on with envy at those lucky enough to own the all-flash 7450 model, with its massive IOPS-crunching potential and new features such as dedupe. Well, feel jealous no more 3Par fans: last week HP shared out the flashy goodness to the hybrid arrays by announcing a bunch of features that will well and truly pimp out your 3Par, to an extent which would even impress Xzibit. Check out this orange Roller below, I bet Rolls Royce didn’t see that coming when it rolled off the production line.

 


3Par OS 3.2.1

The announcements from HP centred around using flash to deliver an extra whack of performance for your 3Par. In summary, the key new features were Adaptive Flash Cache, Dedupe and Express Writes. All these new features are enabled by upgrading to 3Par OS 3.2.1, and the even better news is that they are all freebies, i.e. they are included as part of the standard 3Par OS. Let’s break each one of these elements down and see how it is going to deliver additional performance.

Adaptive Flash Cache

Adaptive Flash Cache is about utilising SSDs to expand the size of the cache. A bigger cache allows more data to be stored in cache and hence a greater likelihood of being able to retrieve data from cache. This is great news for any environment that has high random reads, and judging by the fact that my most popular post of all time is still Adaptive Flash Cache Deep Dive, lots of other people agree. Do check out the deep dive, it’s got lots of good info in it. Also check out this Adaptive Flash Cache video Calvin Zito has put together, where he shows a practical demo of the kind of performance improvement that is possible.

Express Writes

Next up is Express Writes. Like Adaptive Flash Cache, this new feature is aimed at improving latency, but for writes, by optimising the FC protocol. Express Writes aims to deliver lower latency by reducing the number of CPU interrupts per I/O. Performance improvements can be up to 10%.

Dedupe

Dedupe was announced earlier this year for the 7450, and last week’s announcement was that it will now be available to all 3Par systems with the Gen 4 ASIC, i.e. the 7000 and 10,000. The way the dedupe works is exactly the same as on the 7450: inline, by assigning a hash to each unique incoming write and then comparing the signature of further incoming writes to ensure they are unique. The dedupe process is demonstrated in the diagram below, taken from the HP whitepaper HP 3PAR StoreServ Storage: optimized for flash. The limitations to using dedupe are that it is only available on the SSD tier and that the technology cannot be combined with AO.
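The hash-and-compare idea can be sketched in a few lines of Python. This is purely illustrative: the class and method names are mine, and SHA-256 is only a stand-in because the post does not say which hash 3PAR uses. The array does this work in the ASIC, and a real system would also guard against hash collisions, which this toy version skips.

```python
import hashlib


class DedupeStore:
    """Toy inline dedupe: identical blocks are stored only once."""

    def __init__(self) -> None:
        self.blocks = {}  # signature -> block data (stored once)
        self.volume = {}  # logical address -> signature

    def write(self, address: int, data: bytes) -> bool:
        """Store a block; return True only if it was new, unique data."""
        signature = hashlib.sha256(data).digest()  # stand-in hash, an assumption
        is_new = signature not in self.blocks
        if is_new:
            self.blocks[signature] = data
        # Either way the address just points at the signature.
        self.volume[address] = signature
        return is_new

    def read(self, address: int) -> bytes:
        return self.blocks[self.volume[address]]


store = DedupeStore()
store.write(0, b"A" * 16)
store.write(1, b"A" * 16)  # duplicate: only a pointer is recorded
store.write(2, b"B" * 16)
print(len(store.volume), "addresses,", len(store.blocks), "unique blocks")
```

Three addresses are written but only two unique blocks end up stored, which is where the capacity saving comes from.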

Final Thoughts

This new set of features has to be one of the most compelling reasons to upgrade in some time. I’ll be upgrading ASAP and really making those SSDs work for their living!

You can see a summary of the announcements in this ChalkTalk.

 

Adaptive Flash Cache – Deep Dive

Last week I posted a quick overview of the latest feature announced for 3Par – Adaptive Flash Cache. HP have provided me with some more detailed documents regarding HP Adaptive Flash Cache technology and so today I wanted to take a more in-depth look into it.

 

Caching 101

Let’s start at the beginning. Cache is traditionally memory that acts as a buffer between IO requests and disk, temporarily storing data to reduce the service time of requests. The cache will contain a mixture of write requests that are waiting to be destaged to disk and data related to reads that have recently been requested or prefetched using a read-ahead algorithm. Each read or write request that arrives at the SAN will first check if the data is in cache, and if it finds it this is called a cache hit. On a hit the response time to the host will be significantly quicker than if the data had been retrieved from disk; this behaviour is shown in the diagram below.

Why Flash Cache?

Cache has traditionally been provided by DRAM, which whilst providing the quickest response times is expensive and so is limited in size in most controllers. OK, so we want a bigger cache to maximise cache hits and minimise response time, but DRAM is expensive, so enter flash, the saviour of every one of us!

OK not Flash Gordon, but enter flash cache technologies, which allow the caching area to be extended by utilising SSDs. SSD cache will not provide the same performance as DRAM cache but it is much cheaper and can therefore be scaled larger economically. The aim of flash cache is simple: to expand the space available for caching and thus increase the volume of data cached, increasing the chances of a cache hit and reducing response time.

 

HP’s Answer

3Par had a hole in its armour given that the competition has long had flash cache available as part of their storage systems. HP has now plugged this gap with a technology it is calling HP Adaptive Flash Cache. A standard 3Par provides DRAM within the controllers for caching; as the DRAM starts to become full, data is flushed to disk so it is no longer available in cache. In a system enabled with Adaptive Flash Cache the DRAM will continue to be the primary cache for the system, however when the DRAM becomes 90% full, instead of the data being flushed to disk it will be destaged to the SSDs in the system, and future host I/O for that data will be redirected to and served from flash cache. Data is selectively destaged from DRAM to Adaptive Flash Cache in 16 KB pages. The pages rejected from being admitted to the Adaptive Flash Cache are those that are least likely to produce a hit, and include I/O larger than 64 KB, sequential reads/writes, plus data that is already stored on SSD.
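The admission rules above boil down to a threshold plus three rejection checks, which can be sketched in Python like this. The numbers (90%, 16 KB, 64 KB) come straight from the description, while the class and function names are hypothetical and only there for illustration. This describes the behaviour as announced for 3.2.1 and is not the actual controller code.

```python
from dataclasses import dataclass

PAGE_SIZE_KB = 16              # AFC page size
MAX_IO_KB = 64                 # anything larger is not admitted
DRAM_DESTAGE_THRESHOLD = 0.90  # DRAM fill level at which destaging to AFC starts


@dataclass
class CachedIO:
    size_kb: int
    sequential: bool
    backed_by_ssd: bool  # True if the data already lives on the SSD tier


def should_destage(dram_used_fraction: float) -> bool:
    """DRAM stays the primary cache until it is 90% full."""
    return dram_used_fraction >= DRAM_DESTAGE_THRESHOLD


def admit_to_afc(io: CachedIO) -> bool:
    """Reject the pages least likely to produce a future hit."""
    if io.size_kb > MAX_IO_KB:
        return False
    if io.sequential:
        return False
    if io.backed_by_ssd:
        return False
    return True


def pages_needed(size_kb: int) -> int:
    """Number of 16 KB pages needed to hold the data (rounded up)."""
    return -(-size_kb // PAGE_SIZE_KB)


small_random = CachedIO(size_kb=16, sequential=False, backed_by_ssd=False)
big_sequential = CachedIO(size_kb=256, sequential=True, backed_by_ssd=False)
print(admit_to_afc(small_random), admit_to_afc(big_sequential))
```

The small random read is admitted and the large sequential one is rejected, so the flash cache fills up with exactly the sort of data that spinning disk serves worst.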

 

A write will continue to be serviced in exactly the same way as above even with an AFC (Adaptive Flash Cache) implementation, as it is only read data that can be served from AFC. The AFC is used by writes only to invalidate data, not for retrieval.

 

Reads are where it gets interesting. When a read request is received DRAM is still used as the primary cache and is checked first. Next the AFC is checked, and if the data is present on the SSDs a cache hit is registered and the data does not need to be serviced from spinning disk.
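Putting the read and write behaviour together, the lookup order can be sketched in Python as below. Plain dictionaries stand in for DRAM, AFC and disk, and the function names are my own. Details such as whether an AFC hit is copied back into DRAM are not covered in the material I have, so they are left out rather than guessed.

```python
from typing import Dict, Tuple

Tier = Dict[str, bytes]


def read_block(block: str, dram: Tier, afc: Tier, disk: Tier) -> Tuple[bytes, str]:
    """Return the data and the tier that served it: DRAM first, then AFC, then disk."""
    if block in dram:
        return dram[block], "dram"
    if block in afc:
        return afc[block], "afc"
    data = disk[block]
    dram[block] = data  # recently read data is kept in the primary cache
    return data, "disk"


def write_block(block: str, data: bytes, dram: Tier, afc: Tier) -> None:
    """Writes land in DRAM as before; AFC only drops its now stale copy."""
    dram[block] = data     # destaging to disk is not modelled here
    afc.pop(block, None)   # invalidate, never serve writes from AFC


dram: Tier = {}
afc: Tier = {"b": b"old"}
disk: Tier = {"a": b"1", "b": b"old"}

print(read_block("a", dram, afc, disk)[1])  # first read comes from disk
print(read_block("a", dram, afc, disk)[1])  # second read is a DRAM hit
print(read_block("b", dram, afc, disk)[1])  # served from flash cache
write_block("b", b"new", dram, afc)
print("b" in afc)                           # stale AFC copy has been invalidated
```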

Flushing data from the SSDs that has been placed there by AFC occurs through an LRU (least recently used) algorithm. When data arrives in the AFC it is admitted at normal temperature. It will be promoted to hot if it is accessed frequently, and marked cold as it eventually ages, at which point it becomes subject to eviction from flash cache. So to summarise, what we are seeing here is essentially a tiered cache system: DRAM is used as primary cache, then destages to AFC, which then further destages to spinning disk as data becomes cold. The take home benefit from all of this is a larger cache providing improved response times for random read workloads.
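Here is a rough Python sketch of LRU eviction with a temperature label on top. It is a guess at the shape of the mechanism, not HP's algorithm: the hit threshold for "hot" is an arbitrary assumption, and "cold" is modelled simply as whichever page has gone longest without being touched.

```python
from collections import OrderedDict
from typing import Optional


class AfcLruModel:
    """Toy flash cache: least recently used page is evicted first."""

    def __init__(self, capacity: int, hot_after_hits: int = 2) -> None:
        self.capacity = capacity
        self.hot_after_hits = hot_after_hits  # ASSUMPTION: arbitrary threshold
        self._hits = OrderedDict()            # page -> hit count, oldest first

    def admit(self, page: str) -> Optional[str]:
        """Admit a page at normal temperature; return the evicted page, if any."""
        if page in self._hits:
            self._hits.move_to_end(page)
            return None
        evicted = None
        if len(self._hits) >= self.capacity:
            evicted, _ = self._hits.popitem(last=False)  # coldest page goes
        self._hits[page] = 0
        return evicted

    def access(self, page: str) -> bool:
        """Record a read; return True on a cache hit."""
        if page not in self._hits:
            return False
        self._hits[page] += 1
        self._hits.move_to_end(page)  # most recently used moves to the back
        return True

    def temperature(self, page: str) -> str:
        return "hot" if self._hits[page] >= self.hot_after_hits else "normal"


cache = AfcLruModel(capacity=2)
cache.admit("a")
cache.admit("b")
cache.access("a")
cache.access("a")
print(cache.temperature("a"))  # frequently read page has been promoted
print(cache.admit("c"))        # the untouched page is the one evicted
```

In this run page "a" is read twice and becomes hot, so when "c" arrives it is the untouched page "b" that gets evicted.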

 

The good thing compared to other offerings like EMC’s FAST Cache is that the SSDs used by AFC don’t need to be dedicated to cache; they can be used in a standard manner for storing data as well.

Managing Flash Cache

If you’re thinking this all sounds great but is it any good for me, the handy thing is HP have built in a simulation mode which doesn’t even require any SSDs to be present in the system. Simulation mode allows you to look at your cache stats and see if AFC would be beneficial to your system. The output below is from one of the new statcache commands available. The FMP (flash cache memory page) column represents AFC, and a hit rate here of zero would suggest that all cache requirements are already being covered by internal DRAM cache. A good candidate would have a hit rate in AFC equal to or greater than that of the on-board cache.

AFC utilises RAID 1 logical disks and the recommendation is that it is striped across all available SSDs to maximise performance. Initially managing AFC will be via the CLI only, with management console support to follow. What is neat is that AFC can be enabled system wide or on specific volumes. If you go down the specific volumes route you apply the settings via virtual volume sets. This essentially allows you to prioritise important volumes by including only them in virtual volume sets with access to flash cache.

To find virtual volumes that are good candidates for AFC the recommendation is to use a mix of cache statistics and vlun stats for identification. The ideal candidates will be vluns with high read requests but low cache hits, which indicates a random workload.
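If you have the stats in front of you, the selection logic is just a filter. The Python sketch below assumes you have already pulled read IOPS and read hit percentages per vlun into a list. The thresholds and the sample figures are made up for illustration, so tune them to your own environment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VlunStats:
    name: str
    read_iops: float
    read_hit_pct: float  # percentage of reads served from DRAM cache


def afc_candidates(
    stats: List[VlunStats],
    min_read_iops: float = 1000.0,  # ASSUMPTION: what counts as "high reads"
    max_hit_pct: float = 50.0,      # ASSUMPTION: what counts as "low hits"
) -> List[VlunStats]:
    """Return read-heavy vluns with poor cache hits, biggest miss volume first."""
    picks = [
        s for s in stats
        if s.read_iops >= min_read_iops and s.read_hit_pct <= max_hit_pct
    ]
    # Rank by how many reads per second are currently missing cache.
    return sorted(
        picks,
        key=lambda s: s.read_iops * (100.0 - s.read_hit_pct),
        reverse=True,
    )


sample = [
    VlunStats("db", read_iops=5000, read_hit_pct=20),
    VlunStats("logs", read_iops=4000, read_hit_pct=95),  # already caches well
    VlunStats("idle", read_iops=50, read_hit_pct=10),    # too quiet to matter
]
print([s.name for s in afc_candidates(sample)])
```

Only the busy volume with a poor hit rate makes the list. Those are the volumes you would then put into a virtual volume set with access to flash cache.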

AFC can co-exist with all current 3Par features including adaptive optimisation.

 

Sweet so how do I get it!

AFC will be available from 3Par OS 3.2.1 and will be included as part of the base OS; you will need a mixture of SSDs and spinning disk. The 7000 series will need a minimum of 4 SSD drives and will support up to 768 GB per node pair. The 10,000 series will need a minimum of 8 SSD drives and will support up to 2 TB per node pair.

 

Final Thoughts

So today we have seen that you can never have too many Flash Gordon pictures in any post, plus HP has added another key feature to its already strong line-up. Adaptive Optimisation has always performed well for me and does a good job of moving hot data that has a predictable workload to faster disks, however you could be left lagging behind with random read workloads. AFC will plug this gap, plus the reduced backend load will in turn also benefit write requests.

 
