3PAR OS 3.3.1 Performance Enhancements

There are a bunch of performance and under-the-hood enhancements in 3PAR OS 3.3.1 that I wanted to take a deeper look at today; I covered an overview of all the new features in a previous post. Some of them are brand new features, others are enhancements of existing ones. Let's look at each in turn.

Adaptive Flash Cache (AFC) – allows SSDs to be used as an extension to the controllers' onboard DRAM to accelerate reads. Analysis of the 3PAR install base has shown that there is regularly free capacity in AFC, and to maximise its benefit the cache should be kept as full as possible. In 3PAR OS 3.3.1 AFC remains read-only, but the types of requests allowed are extended to include large sequential I/O (>64K) and data read from snapshots. Having more data in cache will of course increase the chances of a hit. These new request types take a back seat to those previously cached by AFC, i.e. if AFC is full of small reads it will not flush them out to make room for a large sequential read.
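
To make that admission policy concrete, here is a toy Python model (my own sketch, not HPE code; the names and the exact eviction behaviour are assumptions): the original AFC candidates keep priority, while the newly admitted classes are only cached while spare capacity exists.

```python
# Toy model of AFC admission in 3.3.1 (illustrative assumptions only).
CACHE_SLOTS = 4
cache = {}  # block -> the I/O class that brought it in

HIGH = "small-read"                                  # original AFC candidates
LOW = {"sequential-read-gt-64k", "snapshot-read"}    # newly admitted in 3.3.1

def admit(block, io_class):
    if io_class == HIGH:
        if len(cache) >= CACHE_SLOTS:
            evict_one_low_priority()   # assumed: small reads may reclaim
        if len(cache) < CACHE_SLOTS:   # space from the new classes
            cache[block] = io_class
    elif io_class in LOW and len(cache) < CACHE_SLOTS:
        cache[block] = io_class        # only cached into free space; never
                                       # pushes out existing small reads

def evict_one_low_priority():
    for blk, cls in list(cache.items()):
        if cls in LOW:
            del cache[blk]
            return

for i in range(4):
    admit(f"blk{i}", HIGH)             # fill the cache with small reads
admit("big", "sequential-read-gt-64k") # cache full of small reads: rejected
print(cache)                           # the four small-read blocks remain
```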

Express Writes – aims to deliver lower latency by generating fewer CPU interrupts per I/O. This is achieved by sending the data along with the write command rather than waiting for the target to request it. Previously this was only available with the FC protocol; it is now extended to iSCSI on the 8000 and 20,000 systems. It will be enabled automatically at upgrade and can deliver up to 40% improvement in latency for iSCSI writes.
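
The mechanism is easiest to see as a round-trip count. This is a conceptual Python sketch of the exchange pattern, not the actual FCP/iSCSI state machines; the round trip it removes is the XFER_RDY (FC) / R2T (iSCSI) step, where the target asks the host for the data.

```python
class Target:
    """Minimal stand-in for an array front-end port."""
    def __init__(self):
        self.exchanges = 0
    def receive_command(self, op, immediate_data=None):
        self.exchanges += 1
        if immediate_data is not None:
            self.buffer = immediate_data       # data rode in with the command
    def request_and_receive_data(self, data):
        self.exchanges += 1                    # extra round trip: XFER_RDY/R2T
        self.buffer = data
    def status(self):
        return f"done in {self.exchanges} exchange(s)"

def standard_write(t, data):
    t.receive_command("WRITE")                 # command only
    return_data = t.request_and_receive_data(data)  # target asks for the data
    return t.status()

def express_write(t, data):
    t.receive_command("WRITE", immediate_data=data)  # single exchange
    return t.status()

print(standard_write(Target(), b"x"))  # done in 2 exchange(s)
print(express_write(Target(), b"x"))   # done in 1 exchange(s)
```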

Multi Queue – is another option that will automatically be turned on and self-optimised in 3PAR OS 3.3.1. Previously each SAS or FC port was locked to a processor core; this worked well when all ports were busy, but left cores idle when they were not. With multi-queue, cores can be shared between ports, allowing for greater utilisation.
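
A rough sketch of the difference (my own model, with illustrative names): under the old static binding an idle port strands its core, whereas with a shared queue any core can service any port's I/O.

```python
import queue

CORES = ["core0", "core1", "core2", "core3"]
PORTS = ["fc0", "fc1", "fc2", "fc3"]

# Old model: each port pinned to one core. If only fc0 is busy,
# core1-core3 sit idle regardless of load.
static_binding = dict(zip(PORTS, CORES))

# Multi-queue model: all ports feed a shared queue any core can drain.
work = queue.SimpleQueue()

def port_receives_io(port, io):
    work.put((port, io))                 # no core affinity on ingest

def core_take_one(core):
    if not work.empty():
        port, io = work.get()
        print(f"{core} handles {io} from {port}")

for i in range(4):
    port_receives_io("fc0", f"io{i}")    # a single busy port...
for core in CORES:
    core_take_one(core)                  # ...is serviced by all four cores
```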

Persistent Checksum – ensures the integrity of data by checksumming it all the way from the HBA to the disk. The current implementation of Persistent Checksum is proprietary, hence the requirement for specific HBAs. The new implementation switches to the standard T10 DIF (Data Integrity Field), which relies on the host OS rather than the HBA and therefore widens support.
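
T10 DIF appends an 8-byte protection field to each 512-byte block: a 16-bit CRC guard tag plus application and reference tags. As a small illustration, here is the guard-tag CRC (the standard CRC-16/T10-DIF, polynomial 0x8BB7) in Python:

```python
def crc16_t10_dif(data: bytes) -> int:
    """CRC-16/T10-DIF: the guard tag in a T10 DIF protection field."""
    crc = 0x0000
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8BB7 if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc

# Standard check value for this CRC variant:
assert crc16_t10_dif(b"123456789") == 0xD0DB

# Each 512-byte block carries its guard tag end to end, so corruption
# anywhere between host and disk is caught on verification.
block = bytes(512)
print(hex(crc16_t10_dif(block)))
```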

To make sure that you don't miss any updates, you can get e-mail updates, or follow via Facebook, LinkedIn and Twitter.

OTHER 3PAR OS 3.3.1 POSTS

3PAR Dedupe + Compression Deep Dive

3PAR Mega News Bundle – Including Compression

 

3PAR Dedupe + Compression Deep Dive

Getting Trim

HPE are a bit late on this release; it's normally January when people want to start losing weight. Not 3PAR: its gut-busting, data-crunching release comes in the form of 3PAR OS 3.3.1, which combines existing data reduction technologies with new ones, including compression. To see what else is new in the 3PAR OS 3.3.1 release, check out this post.

The new data reduction stack in 3PAR OS 3.3.1, and the order in which the stages are applied, is shown in the following graphic. Data Packing and compression are new technologies; there are no changes to Zero Detect, while dedupe receives a code update. Zero Detect is one of the original thin technologies and removes zeros from incoming writes. Most of you are probably already familiar with it, so let's focus on the new and updated technologies, stepping through each in turn.

[Image: 3PAR Adaptive Data Reduction stack]

Dedupe

Dedupe continues to operate as before: incoming writes are analysed in 16K pages, a hash is assigned to each, and the system checks whether that hash is unique, all inline before the data is written to disk. What changes is how deduped data is stored on disk: it is now written to a shared area within the CPG, rather than to a private space in the volume as previously. How data is now stored on disk is shown graphically below.
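
As a rough sketch of the inline flow (a Python model of my own; SHA-256 here stands in for whatever hash the ASIC actually computes):

```python
import hashlib

PAGE = 16 * 1024                   # dedupe granularity: 16K pages
hash_index: dict = {}              # system-wide lookup: hash -> stored location

def ingest_write(data: bytes):
    """Inline dedupe: every 16K page is hashed before anything hits disk."""
    for off in range(0, len(data), PAGE):
        page = data[off:off + PAGE].ljust(PAGE, b"\x00")
        digest = hashlib.sha256(page).digest()   # done in the ASIC on 3PAR
        if digest in hash_index:
            print("duplicate -> add a reference, write nothing")
        else:
            hash_index[digest] = f"disk-loc-{len(hash_index)}"
            print("unique -> write page to disk")

ingest_write(b"A" * PAGE + b"B" * PAGE + b"A" * PAGE)  # A, B, then a dup of A
```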

[Image: 3PAR dedupe on-disk layout]

This change effectively determines where unique and deduped data are stored. Given that the average dedupe level seen on a 3PAR is 2:1, you might assume that half the blocks are deduped and half unique, in which case why care where each type is stored? Further analysis has shown that it is not half of the data that is deduped; rather, roughly 10% of the data is deduped several times over, giving the overall ratio of 2:1. In other words, a significantly greater proportion of data is unique than deduped.
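
To put some illustrative numbers on that (mine, not HPE's): imagine 100 logical 16K pages reducing to 50 stored pages. Rather than 50 unique pages plus 50 duplicates pointing at them, the typical pattern is closer to 45 pages referenced once each plus 5 hot pages referenced around 11 times each (45 + 55 = 100 logical pages). The ratio is still 2:1, but only 10% of the stored pages carry all of the sharing.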

There are two elements to how deduped data is stored in a system: a private space that exists inside every volume, and a shared space, one per CPG. When a write comes into the system with a unique hash (i.e. it has never been seen before) it is placed in the private area of the volume. When a write comes in with a known hash (i.e. it is a duplicate) it is placed in the shared space. We know that only around 10% of data will be deduped, so the net result of this design is that the majority of data sits in the private area. This is advantageous because when a block in the shared area is deleted, several checks must be made to determine whether the block is still required. These sanity checks are too resource intensive to perform in real time, so they must run as a post-process, which adds overhead to the system. Since less data sits in the shared area in the new design, less of this post-processing is needed, which makes dedupe more scalable.
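
A toy model of that layout (my own illustration, not HPE's implementation; the promotion and garbage-collection details are assumptions):

```python
# First sighting of a page goes to the volume's private space; proven
# duplicates live in the CPG's shared space with reference counts.

class Volume:
    def __init__(self):
        self.private = {}                    # hash -> page, this volume only

class Cpg:
    def __init__(self, volumes):
        self.volumes = volumes
        self.shared = {}                     # hash -> [page, refcount], per CPG
        self.gc_queue = []                   # deferred frees, post-process

    def write(self, vol, h, page):
        if h in self.shared:
            self.shared[h][1] += 1           # known duplicate: add a reference
        elif any(h in v.private for v in self.volumes):
            self.shared[h] = [page, 2]       # second sighting: now shared
        else:
            vol.private[h] = page            # unique: ~90% of pages land here

    def delete(self, vol, h):
        if h in vol.private:
            del vol.private[h]               # private page: free inline, cheap
        else:
            self.gc_queue.append(h)          # shared page: the "still needed?"
                                             # checks are too costly inline, so
                                             # the free waits for a sweep
```

Because most pages sit in private space, most deletes take the cheap inline path, which is exactly why the design scales better.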

The new version of dedupe is designed to allow dedupe and compression to work together in the same VV (Virtual Volume). Until now dedupe was enabled by using a TDVV (Thin Deduplicated Virtual Volume); this is no longer required, as dedupe is now simply an attribute that can be enabled on a volume. Once enabled, dedupe is performed at a 16K page level across all VVs in a CPG that have the dedupe attribute turned on. Consider this when planning your system layout, and group like volumes in the same CPG to maximise dedupe.

The good news is that there is no additional licence cost for using dedupe. This new version is still driven by the ASIC and will be available on all models with the GEN 4/5 ASIC, i.e. the 7000, 8000, 10,000 and 20,000 series. As with the existing version of dedupe, the Virtual Volumes must be on flash drives, which also rules out combining Adaptive Optimization (AO) with dedupe.

If you are already using the existing form of dedupe you will need to create a new CPG and then do a Dynamic Optimization operation to migrate volumes to the new CPG.  The necessary changes to enable the updated version of dedupe will be made automatically when the volume migrates.

Compression

Again, let's start with the good news: compression will be available at no additional licence cost. However, it is only available on the GEN 5 systems, i.e. the 8000 and 20,000 series. As with dedupe, this option is only available for data stored on flash, so again it cannot be coupled with AO. Compression takes place at a per-VV level and aims to remove redundancy within the data. HPE are estimating an average saving with compression of 2:1. Dedupe and compression can be combined, and HPE are expecting an average of 4:1 data reduction when both are used.
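
The 4:1 figure is simply the two ratios multiplied, since each stage works on the output of the previous one. As a rough illustration with my own numbers: 100TB written, deduped at 2:1, becomes 50TB; compressed at 2:1, that becomes 25TB on flash, a 4:1 overall reduction.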

At release, asynchronous streaming Remote Copy will not be supported on compressed volumes. Remote Copy itself is supported, but data in transmission will not be deduped or compressed; the data will be deduped and compressed again on ingest at the target if these options are enabled there.

Data Packing

3PAR writes to SSDs in 16K pages; after compression, however, the 16K pages end up at sub-16K sizes, which is not optimal for writing to SSD. The inefficiency comes from pages being written across block boundaries, and to tackle this HPE have developed Data Packing. Data Packing takes a number of these odd-sized pages and combines them into a single 16K page, reducing the need for post-process garbage collection on the SSDs. The odd-sized pages grouped together will usually all come from the same volume, since this improves the chances that the blocks will all need to change at the same time in the future. The technology sounds very similar to when I pack a suitcase and stuff is sticking out everywhere, then my wife comes along and makes it all fit in nicely. She has denied any part in the development of this new technology.

[Image: suitcase]
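
For a feel of what packing does, here is a minimal first-fit sketch (my own illustration; the real implementation also prefers pages from the same volume, which this ignores):

```python
PAGE = 16 * 1024  # pages are written to SSD in 16K units

def pack(compressed_sizes):
    """First-fit: group sub-16K compressed pages into full 16K pages.

    `compressed_sizes` is a list of post-compression sizes in bytes.
    Returns a list of bins, each summing to <= 16K, so no page
    straddles a block boundary.
    """
    bins = []
    for size in sorted(compressed_sizes, reverse=True):
        for b in bins:
            if sum(b) + size <= PAGE:
                b.append(size)
                break
        else:
            bins.append([size])
    return bins

# Six compressed pages that would otherwise straddle block boundaries
# fit neatly into two 16K pages, leaving no sub-page fragments behind.
print(pack([9000, 7000, 6000, 5000, 4000, 1000]))
# -> [[9000, 7000], [6000, 5000, 4000, 1000]]
```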

Getting up and running with dedupe and compression

First of all you need to get upgraded to 3PAR OS 3.3.1, and remember you need a minimum of a GEN 4 ASIC for dedupe and GEN 5 for compression. Once you are running 3PAR OS 3.3.1, dedupe and compression can be enabled via the SSMC on a per-volume basis. Given this per-volume granularity, the attributes can be set to suit the data in each volume; for example, a volume holding video in a format that is already deduped and compressed would just be thin provisioned, while VDI volumes could be deduped and compressed. The option to set dedupe and compression in SSMC is shown in the screenshot below; they can be set together or independently.

[Screenshot: enabling dedupe and compression in SSMC]

Remember, if you are already using dedupe and want to take advantage of the enhanced performance of the latest version, you will need to perform a Dynamic Optimization operation to move existing volumes to a newly created CPG.

 

You can also use the SSMC to estimate the savings you will get from dedupe and compression.

[Screenshot: estimating dedupe and compression savings in SSMC]

Data that is already encrypted, or already compressed at the application level, will not be suitable for compression. If encryption is required, the recommendation is to use drive-level encryption so that array-side compression can still take place.

 

To make sure that you don't miss any more news or tips, you can get e-mail updates, or follow via Facebook, LinkedIn and Twitter.

Further reading

Adaptive data reduction brochure

What is HPE 3PAR Adaptive Data Reduction

 

 

Tech Preview – 3PAR 3D Cache

With Intel officially announcing its 3D XPoint-based Optane product line this week, I thought it was worth another look at this post I wrote previously demonstrating Optane working with 3PAR. The post talks through how an Intel Optane drive can be used to significantly reduce latency in the system, and features a video talk-through with one of the product managers:

The adoption of flash technology has revolutionised storage over the past few years. Current implementations of flash use NAND; Intel have been developing a new type of persistent storage that will offer up to 1,000 times the performance of NAND. At Discover, HPE previewed a 3PAR utilising Intel 3D XPoint as an additional caching layer; you can get an overview of 3D XPoint in this post I wrote previously.

 

In the video below I discuss the 3D XPoint preview with Eduardo, the media product manager for 3PAR. We discuss how it works and the benefits to the system. Don't forget to subscribe to the YouTube channel so you don't miss any future updates.

 

I have summarised the main points around 3D Cache below.

  • HPE showed a tech preview of a new storage technology, called 3PAR 3D Cache, that could significantly improve performance in terms of latency and potentially reduce costs
  • Current flash drives in 3PAR and other flash arrays are NAND
  • NAND has produced significant performance improvements vs spinning disk but is still significantly slower than DRAM. DRAM is the memory found in 3PAR controllers and compute; it is very fast but also expensive, so it is generally used in small amounts
  • DRAM acts as the cache in 3PAR, buffering I/O so it does not have to be written to or read directly from disk
  • The more DRAM (cache) the better, as more I/O is saved from going to disk
  • Intel have been developing a persistent memory technology that sits in between NAND and DRAM in terms of cost and performance, i.e. it is much faster than NAND but not as fast, or as expensive, as DRAM
  • Intel's new form of storage is based on a technology they have named 3D XPoint
  • At Discover London, HPE gave a tech preview of 3PAR using 3D XPoint as a caching layer to accelerate system performance
  • The 3D XPoint card was NVMe and plugged into a PCIe slot of a 3PAR
  • The card was used to extend the onboard DRAM cache, in a very similar way to Adaptive Flash Cache
  • The cache extension is for reads only
  • When a read goes cold within DRAM it is tiered down to the 3D Cache. This increases the likelihood of a read hit, resulting in a performance boost from more reads being returned from cache. Plus, by not requesting I/O from the SSDs, they are left quieter for other operations (a simple model of this demotion is sketched after this list)
  • This is a tech preview, so it is not yet available for order
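
Here is a small Python model of that demote-on-cold behaviour (my own sketch; tier sizes and names are illustrative): reads hit DRAM first, fall back to the 3D Cache tier, and the coldest DRAM entries are tiered down rather than dropped.

```python
from collections import OrderedDict

class TieredReadCache:
    """Toy two-tier read cache: small DRAM tier over a larger 3D Cache tier."""
    def __init__(self, dram_slots=4, xpoint_slots=16):
        self.dram = OrderedDict()    # small, fastest tier
        self.xpoint = OrderedDict()  # larger 3D XPoint extension
        self.dram_slots = dram_slots
        self.xpoint_slots = xpoint_slots

    def read(self, block):
        if block in self.dram:                # hit in DRAM
            self.dram.move_to_end(block)
            return "dram hit"
        if block in self.xpoint:              # hit in 3D Cache: promote it
            self.xpoint.pop(block)
            self._admit(block)
            return "3d-cache hit"
        self._admit(block)                    # miss: read from SSD, then cache
        return "ssd read"

    def _admit(self, block):
        self.dram[block] = True
        if len(self.dram) > self.dram_slots:
            cold, _ = self.dram.popitem(last=False)  # coldest DRAM entry is
            self.xpoint[cold] = True                 # tiered down, not dropped
            if len(self.xpoint) > self.xpoint_slots:
                self.xpoint.popitem(last=False)

cache = TieredReadCache()
for b in range(6):
    cache.read(b)          # blocks 0 and 1 get demoted out of DRAM
print(cache.read(0))       # "3d-cache hit": served from the extension tier
print(cache.read(5))       # "dram hit"
```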

To stay in touch with more 3PAR news and tips, connect with me on LinkedIn and Twitter.