3PAR Dedupe + Compression Deep Dive

Getting Trim

HPE are a bit late on this release; it's normally January when people want to start losing weight. Well, not 3PAR: its gut-busting, data-crunching release comes in the form of 3PAR OS 3.3.1, which combines existing data reduction technologies with new ones, including compression. To see what else is new in the 3PAR OS 3.3.1 release, check out this post.

The new data reduction stack in 3PAR OS 3.3.1, and the order in which the technologies are applied, is shown in the following graphic. Data Packing and compression are new technologies. There are no changes to Zero Detect, but dedupe receives a code update. Zero Detect is one of the original thin technologies and removes zeros from incoming writes; most of you are probably already familiar with it, so let's focus on the new and updated technologies, stepping through each in turn.

3PAR Adaptive Data Reduction

Dedupe

Dedupe continues to operate as before: incoming writes are analysed in 16K pages, a hash is assigned to each page, and the system checks whether that hash is unique, all inline before the data is written to disk. What does change is how deduped data is stored on disk: it is now written to a shared area within the CPG, rather than to a private space in the volume as previously. How data is now stored on disk is shown graphically below.
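To make the inline mechanism concrete, here is a toy sketch of hash-based dedupe in Python. This is purely illustrative: the `DedupeStore` class, the use of SHA-256 and the in-memory dictionaries are my assumptions for the example, not how the 3PAR ASIC actually implements its hashing and metadata.

```python
import hashlib

PAGE_SIZE = 16 * 1024  # dedupe operates on 16K pages

class DedupeStore:
    """Toy model of inline dedupe: hash each incoming 16K page and
    only store payloads whose hash has not been seen before."""

    def __init__(self):
        self.index = {}     # hash -> physical page id
        self.physical = []  # unique pages actually written to disk

    def write(self, page: bytes) -> int:
        assert len(page) == PAGE_SIZE
        digest = hashlib.sha256(page).hexdigest()
        if digest in self.index:       # duplicate: point at the existing page
            return self.index[digest]
        self.physical.append(page)     # unique: store a new physical page
        self.index[digest] = len(self.physical) - 1
        return self.index[digest]

store = DedupeStore()
a = store.write(b"A" * PAGE_SIZE)
b = store.write(b"B" * PAGE_SIZE)
c = store.write(b"A" * PAGE_SIZE)      # duplicate of the first write
print(a, b, c, len(store.physical))    # 0 1 0 2 -> three writes, two pages stored
```

Three 16K writes arrive but only two unique pages hit the disk; the duplicate is satisfied by a metadata reference.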

3PAR dedupe

This change effectively determines where unique and deduped data is stored. Given that the average dedupe level seen on a 3PAR is 2:1, it might seem logical that half the blocks would be deduped and half unique, so why would we care where the blocks are stored if the number of each is equal? Further analysis has shown that it is not half of the data that is deduped; rather, around 10% of the data is deduped several times over, giving an overall dedupe ratio of 2:1. In summary, a significantly greater proportion of data will be unique rather than deduped.
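A small worked example shows how a minority of heavily duplicated pages can still produce a 2:1 overall ratio. The specific counts below are hypothetical numbers chosen to match the figures above, not measurements from a real array:

```python
# Hypothetical page distribution: 100 logical 16K pages written by hosts.
unique_pages = 45        # pages seen exactly once
hot_pages = 5            # distinct pages each written 11 times over
refs_per_hot_page = 11

logical = unique_pages + hot_pages * refs_per_hot_page  # 45 + 55 = 100
physical = unique_pages + hot_pages                     # 45 + 5  = 50

print(logical / physical)    # 2.0 -> an overall 2:1 dedupe ratio
print(hot_pages / physical)  # 0.1 -> only 10% of stored pages are deduped ones
```

So a 2:1 system-wide ratio is consistent with 90% of the stored pages being unique, which is why the placement of unique data matters so much.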

There are two elements to how deduped data is stored in a system: a private space that exists inside every volume, and a shared space that exists once per CPG. When a write comes into the system with a unique hash (i.e. it has never been seen before), it is placed in the private area of the volume. When a write comes in with a known hash (i.e. it is a duplicate), it is placed in the shared space. We know that only around 10% of data will be deduped, so the net result of this design is that the majority of data will sit in the private area. This is advantageous because when a block in the shared area is deleted, several checks need to be made to confirm the block is no longer required. These sanity checks are too resource intensive to perform in real time, so they must be performed post process. Post processing is disadvantageous as it adds overhead to the system. Because less data sits in the shared area in the new design, less of this post processing is needed, which makes dedupe more scalable.
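The private/shared split and the reference counting that makes shared-area deletes expensive can be sketched as follows. Everything here (the `DedupeCPG` class, the linear scan of private areas, the promotion logic) is an illustrative assumption; the real system tracks hashes in array metadata, not by scanning volumes.

```python
import hashlib
from collections import defaultdict

class DedupeCPG:
    """Toy model of the layout: the first copy of a page lands in the
    writing volume's private area; when the same hash appears again the
    page moves to the CPG-wide shared area and is reference-counted."""

    def __init__(self):
        self.shared = {}                  # hash -> refcount (deletes need checks)
        self.private = defaultdict(dict)  # volume name -> {hash: page}

    def write(self, volume: str, page: bytes) -> str:
        h = hashlib.sha256(page).hexdigest()
        if h in self.shared:
            self.shared[h] += 1           # known duplicate: bump refcount
            return "shared"
        for pages in self.private.values():
            if h in pages:                # second copy seen: promote to shared
                del pages[h]
                self.shared[h] = 2
                return "shared"
        self.private[volume][h] = page    # first copy: private area
        return "private"

cpg = DedupeCPG()
r1 = cpg.write("vv1", b"x" * 16)  # unique -> private
r2 = cpg.write("vv2", b"x" * 16)  # duplicate -> shared, refcount 2
r3 = cpg.write("vv1", b"y" * 16)  # unique -> private
print(r1, r2, r3)                 # private shared private
```

Deleting a page from `private` is trivial (only one volume can reference it), whereas a delete in `shared` must first confirm the refcount has reached zero, which is the post-process work the new design minimises.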

The new version of dedupe is designed to allow dedupe and compression to function together in the same VV (Virtual Volume). Until now dedupe was enabled through the use of a TDVV (Thin Deduplicated Virtual Volume); this is no longer required, as dedupe is now simply an attribute of a volume that can be enabled. Once enabled, dedupe is performed at a 16K page level across all VVs in a CPG that have the dedupe attribute turned on. Consider this when planning your system layout, to ensure that like volumes are grouped in the same CPG to maximise dedupe.

The good news is there is no additional licence cost for using dedupe. This new version of dedupe is still driven by the ASIC and will be available to all models with the GEN 4/5 ASIC, i.e. the 7000, 8000, 10,000 and 20,000 series. As with the existing version of dedupe, the Virtual Volumes must be on flash drives, which also rules out combining AO (Adaptive Optimization) with dedupe.

If you are already using the existing form of dedupe, you will need to create a new CPG and then perform a Dynamic Optimization operation to migrate volumes to it. The necessary changes to enable the updated version of dedupe are made automatically when a volume migrates.

Compression

Again let's start with the good news: compression is available at no additional licensing cost. However, it is only available on the GEN 5 systems, i.e. the 8000 and 20,000 series. As with dedupe, this option is only available for data stored on flash, so again it cannot be coupled with AO. Compression takes place at a per-VV level and aims to remove redundancy within the data. HPE are estimating an average saving with compression of 2:1. Dedupe and compression can be combined, and HPE are expecting an average of 4:1 data reduction when the two are used together.

At release there is no support for asynchronous streaming Remote Copy when using compression. Remote Copy itself is supported, but data in transmission will not be deduped or compressed; the data is deduped and compressed again on ingestion at the target, if those options are enabled there.

Data Packing

3PAR writes to SSDs in 16K pages, but after compression those 16K pages end up at a sub-16K size, which is not optimal for writing to SSD. The inefficiency comes from pages being written across block boundaries; to tackle this, HPE have developed the Data Packing technology. Data Packing takes a number of these odd-sized pages and combines them into a single 16K page, reducing the need for post-process garbage collection on the SSDs. The odd-sized pages grouped together will usually all come from the same volume, since this improves the chances that the blocks will all need to be changed at the same time in the future. The technology sounds very similar to when I pack a suitcase and stuff is sticking out everywhere, then my wife comes along and makes it all fit nicely. She has denied any part in the development of this new technology.
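The packing idea can be sketched as a simple bin-packing pass. The first-fit strategy and the example sizes below are my assumptions for illustration; HPE have not published the actual packing algorithm.

```python
PAGE = 16 * 1024  # flash write granularity in this model (16K)

def pack(compressed_sizes):
    """First-fit sketch of Data Packing: group sub-16K compressed pages
    so that each group fits inside a single 16K flash page."""
    bins = []  # each bin: [remaining_bytes, [page sizes packed into it]]
    for size in compressed_sizes:
        for b in bins:
            if size <= b[0]:          # fits in an existing 16K page
                b[0] -= size
                b[1].append(size)
                break
        else:
            bins.append([PAGE - size, [size]])  # start a new 16K page
    return [b[1] for b in bins]

# Five compressed pages (sizes in bytes), say all from the same volume:
sizes = [9000, 6000, 5000, 7000, 4000]
packed = pack(sizes)
print(packed)  # [[9000, 6000], [5000, 7000, 4000]] -> 2 flash pages, not 5
```

Without packing, those five sub-16K pages would each consume (part of) a 16K write; packed, they land in two full pages, which is the garbage-collection saving described above.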

suit-case

Getting up and running with dedupe and compression

First of all you need to upgrade to 3PAR OS 3.3.1, and remember you need a minimum of a GEN 4 ASIC for dedupe and a GEN 5 ASIC for compression. Once you are running 3PAR OS 3.3.1, dedupe and compression can be enabled via the SSMC on a per-volume basis. Given this per-volume granularity, the attributes can be set appropriately for the data contained in each volume: for example, a volume holding video in a format that is already deduped and compressed would just be thin provisioned, while VDI volumes could be deduped and compressed. The option to set dedupe and compression in SSMC is shown in the screenshot below; they can be set together or independently.

2-turn-on

Remember, if you are already using dedupe and want to take advantage of the enhanced performance of the latest version, you will need to perform a Dynamic Optimization operation to move existing volumes to a newly created CPG.


You can also use the SSMC to estimate the savings you will get from dedupe and compression.

2-estimate-space

Data that is already encrypted or compressed at the application level is not suitable for compression. If encryption is required, the recommendation is to use drive-level encryption, so that compression can still take place before the data is encrypted.
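You can see why with a quick experiment: encrypted data looks statistically random, and random data has no redundancy for a compressor to remove. This sketch uses Python's `zlib` as a stand-in compressor; it is not the algorithm 3PAR uses, just a demonstration of the principle.

```python
import os
import zlib

text_like = b"the quick brown fox jumps over the lazy dog " * 100
already_random = os.urandom(len(text_like))  # stands in for encrypted data

# Redundant data compresses well...
print(len(text_like), "->", len(zlib.compress(text_like)))
# ...random (encrypted-looking) data barely shrinks, and may even grow
# slightly due to the compression container's own overhead.
print(len(already_random), "->", len(zlib.compress(already_random)))
```

This is why host- or application-level encryption defeats array compression, while drive-level encryption (applied after compression) does not.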


To make sure that you don't miss any more news or tips, you can get e-mail updates, or follow via Facebook, LinkedIn and Twitter.

Further reading

Adaptive data reduction brochure

What is HPE 3PAR Adaptive Data Reduction



3PAR Mega News Bundle – Including Compression

3PAR Announcements

Today HPE announced a significant number of enhancements to the 3PAR product, plus some changes in how the product is licensed. The feature enhancements are enabled by the upgrade to 3PAR OS 3.3.1, which was also announced today. This really is quite some list, so hold onto your hats and let's start with the data reduction enhancements, which in combination HPE is calling Adaptive Data Reduction.

Compression

3PAR has had dedupe for some time but has not had compression until the release of 3PAR OS 3.3.1. Compression is going to be available on flash disks only and on the GEN 5 systems, i.e. the 8000 and 20,000. The aim of dedupe, and now compression, is to reduce the data footprint on flash as much as possible, making flash more affordable. Compression operations are performed inline, i.e. before the data hits the SSDs, and are enabled at a per-volume level. The best news is that compression is going to be licence free! HPE are expecting a 2:1 data reduction from using compression. I am going to do a deep dive on compression and the other data reduction technologies in the next few days, so watch out for that.

Data Packing

3PAR writes to SSDs in 16K pages; with compression, of course, you end up with odd-sized pages. These sub-16K pages are not optimal for writing to SSD and would incur a post-process garbage collection to neaten things up and optimise their layout. To eliminate the need for this additional garbage collection overhead, the odd-sized pages are stitched together, just like your Nan makes a jumper, to form neat 16K pages. This process of taking odd-sized compressed pages and packing them together is shown below.

1-data-packing

Dedupe

Dedupe gets a code update in 3PAR OS 3.3.1. The update changes the way dedupe operates, writing unique data to a private area first, as opposed to a shared area previously. This change aims to make the process more efficient, reducing garbage collection and ultimately making the system more scalable. TDVVs (Thin Deduplicated Virtual Volumes) are deprecated; now you simply create a standard volume and change its attribute to turn on dedupe. Setting the dedupe attribute can be done from the CLI or SSMC. Dedupe and compression can be combined, and HPE are expecting a median 4:1 data reduction when the technologies are used together. The good news is that dedupe is again licence free and is available on all the GEN 4 (7000 + 10,000) and GEN 5 (8000 + 20,000) systems.

VVOLs

I really like the idea of VVOLs and have covered them in depth previously. Their take-up has been a little slower than I would have expected, and one of the main reasons I have seen cited is the lack of replication support. vSphere 6.5 introduced VVOL replication, and 3PAR now supports this functionality once upgraded to 3PAR OS 3.3.1. Don't forget that, given the granular nature of VVOLs, this means replication can be controlled at a per-VM level.

Peer Persistence 3 Data Centres (3DC)

Peer Persistence has to be one of my favourite 3PAR features; it enables the creation of a metro cluster, or stretched cluster if you like, across data centres. This high availability setup allows storage to be switched between data centres online, with no disruption to hosts. Peer Persistence 3DC allows a third leg to be added to the replication topology. This third location does not become part of the metro cluster, but allows a third copy of the data to be replicated asynchronously for data availability.

pp-3dc

Free Licences

3PAR must now have more native software data services than any other SAN. Deciding which ones you wanted and then having to pay for them was painful. Well, now all licences within a single system are free! If you have more than one system and are using features that allow the systems to talk to each other, e.g. Remote Copy, Peer Persistence or Peer Motion, you will need the multi-site licence. Again, once you have that licence it enables all multi-system functionality.

Replication to StoreVirtual VSA (Peer Copy)

I have been wondering for a while if and when HPE would enable replication between StoreVirtual and 3PAR; well, it is now possible with the use of RMC, in a feature called Peer Copy. RMC is now included in the free licence bundle and in this case acts as the data mover, replicating crash-consistent snapshots between a 3PAR and a StoreVirtual. Other RMC enhancements include replication to Azure, deployment in Hyper-V and SAP HANA support.

3-rmc-rep

Online Import

The online import tool, which already supported the EMC CLARiiON, VNX and VMAX, now adds support for the EMC DMX. Other vendors already supported include Hitachi and IBM.

File Persona

File Persona gets a bunch of feature additions, including doubled scalability, file locks for governance, cross-protocol file sharing and a file system checker.

Extended SSD Warranty

The existing warranty on 3PAR SSDs is 5 years; this has now been extended to 7 years. The warranty covers media wear-out and the electronics. The offer applies to all SSDs in 8000 and 20,000 systems bought after June 2015.

There are even more enhancements than this, but I don't want this blog turning into a hardback edition, so I will bring more news in the coming days on the other enhancements, plus deep dives on what has been discussed today. To make sure that you don't miss any updates, you can get e-mail updates, or follow via Facebook, LinkedIn and Twitter.

This is a mind-blowing amount of information, and if your head is spinning, check out these two videos in which Calvin Zito summarises the announcements: Video 1, Video 2.