What is AO?
Adaptive Optimization (AO) is 3PAR’s disk tiering technology. It automatically moves the hottest, most frequently accessed blocks of data to the fastest disks, while moving infrequently accessed cold blocks to slower disks. If you are from an EMC background, AO is comparable to FAST VP; if you are from a NetApp background, welcome to the brave new world of tiering.
An AO config can consist of two or three tiers of disk. You can think of these as gold, silver and bronze levels of performance: SSDs form your high-performing gold tier, 10K or 15K SAS disks your silver tier and NL disks your bronze tier. The diagram below shows the different tiers of disk, along with some examples of blocks of data tiering both up and down. AO is a licensed feature, enabled through the creatively named Adaptive Optimization software option, and is available across all the hybrid models. It is not available on the all-flash models, for obvious reasons.
How AO works
So we now understand that AO is a tiering technology that intelligently places data based on how hot (frequently accessed) or cold it is. You might therefore expect all the VVs (Virtual Volumes) demanding the most IOPS to end up on your tier 0 SSDs. This is not the case, however, because AO is a sub-LUN tiering system: it does not need to move entire volumes, just the hot parts.
Under the hood, 3PAR allocates space in 128MB blocks, or regions in HP speak, and it is at the region level that AO both analyses and moves data. This granular analysis and movement ensures that the capacity of expensive SSDs is utilised to the full, by moving only the very hottest regions rather than entire VVs. For example, if you have a 100GB VV and only 2 regions are getting hit hard, only 256MB of data needs to be migrated to SSD, a massive saving in space compared with moving the entire volume.
AO does its voodoo by analysing the number of hits a region gets within a time period to measure its regional IO density, expressed as IOs per GB per minute. AO then uses these regional IO density stats to select the regions with an above-average regional IO density and mark them for movement to a higher tier. How aggressively AO moves data depends on the mode selected (performance, balanced or cost).
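The arithmetic behind that example is easy to sketch. The snippet below is a minimal Python illustration, assuming binary units (1GB = 1024MB) and a fully allocated volume; the function names are hypothetical and not part of any 3PAR tool.

```python
# Region size used by 3PAR, as described in the post.
REGION_MB = 128


def regions_in_volume(volume_gb: int) -> int:
    """Number of 128MB regions in a fully allocated VV (assumes 1GB = 1024MB)."""
    return (volume_gb * 1024) // REGION_MB


def promotion_footprint_mb(hot_regions: int) -> int:
    """SSD capacity needed when only the hot regions are promoted."""
    return hot_regions * REGION_MB


if __name__ == "__main__":
    volume_gb, hot_regions = 100, 2
    whole_volume_mb = volume_gb * 1024
    moved_mb = promotion_footprint_mb(hot_regions)
    print(f"{volume_gb}GB VV = {regions_in_volume(volume_gb)} regions")
    print(f"Sub-LUN move: {moved_mb}MB vs whole volume: {whole_volume_mb}MB")
    print(f"Share of the volume moved: {moved_mb / whole_volume_mb:.2%}")
```

Running it reports 800 regions for the 100GB VV and shows that the sub-LUN move touches just 0.25% of the volume.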
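To make the selection step concrete, here is a simplified Python sketch. It treats regional IO density as the IOs a region receives divided by its size in GB and the length of the analysis period in minutes, and uses a plain "above the mean" cut-off. The `Region` class, the threshold and the exclusion of tier 0 regions are illustrative assumptions; real AO also weighs the selected mode and the tier configuration.

```python
from dataclasses import dataclass
from typing import List

REGION_GB = 128 / 1024  # one 128MB region expressed in GB


@dataclass
class Region:
    region_id: int
    tier: int       # 0 = fastest tier
    io_count: int   # IOs seen during the analysis period


def io_density(region: Region, period_minutes: float) -> float:
    """Regional IO density: IOs per GB per minute."""
    return region.io_count / REGION_GB / period_minutes


def promotion_candidates(regions: List[Region], period_minutes: float) -> List[int]:
    """IDs of regions busier than average that are not already in tier 0."""
    if not regions:
        return []
    densities = {r.region_id: io_density(r, period_minutes) for r in regions}
    average = sum(densities.values()) / len(densities)
    return [r.region_id for r in regions
            if densities[r.region_id] > average and r.tier > 0]


if __name__ == "__main__":
    analysis_minutes = 8 * 60  # e.g. a 09:00-17:00 analysis period
    regions = [
        Region(1, tier=1, io_count=48_000),
        Region(2, tier=1, io_count=500),
        Region(3, tier=2, io_count=300),
        Region(4, tier=0, io_count=60_000),
    ]
    print(promotion_candidates(regions, analysis_minutes))  # [1]
```

Region 4 is also above average, but it is already in tier 0, so there is nowhere higher for it to go.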
Once a region has been marked as a candidate for movement, the system performs a number of sanity checks to verify whether the move should go ahead:
- Average service time – the average service time of the destination tier is checked to ensure that data isn’t migrated to a faster disk technology that is working so hard its service times are actually worse than those of the tier the data is being migrated from
- Space – if no space is available in a tier, or the CPG space warning or limit has been reached, no regions will be moved to that tier. However, if a tier exceeds a CPG warning or limit, AO will try to remediate this by moving regions out of that tier, busy regions to faster tiers and quieter regions to slower tiers
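The two checks can be sketched as a single gate function. This is a minimal Python illustration under stated assumptions: `TierState` and `promotion_allowed` are hypothetical names, the check covers promotions only, and the real AO logic is more involved than a direct comparison of averages.

```python
from dataclasses import dataclass

REGION_GB = 128 / 1024  # one 128MB region expressed in GB


@dataclass
class TierState:
    avg_service_time_ms: float  # current average service time of the tier
    free_gb: float              # free space available to the tier's CPG
    cpg_warning_or_limit: bool  # True if the CPG space warning or limit is reached


def promotion_allowed(source: TierState, dest: TierState) -> bool:
    """Apply the service-time and space checks before promoting one region."""
    # Service time: skip the move if the faster tier is currently responding
    # more slowly than the tier the region would leave.
    if dest.avg_service_time_ms > source.avg_service_time_ms:
        return False
    # Space: the destination needs room for a region and must not be at its
    # CPG space warning or limit.
    if dest.cpg_warning_or_limit or dest.free_gb < REGION_GB:
        return False
    return True


if __name__ == "__main__":
    sas = TierState(avg_service_time_ms=8.0, free_gb=500.0, cpg_warning_or_limit=False)
    busy_ssd = TierState(avg_service_time_ms=12.0, free_gb=50.0, cpg_warning_or_limit=False)
    print(promotion_allowed(sas, busy_ssd))  # False: the SSD tier is slower right now
```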
AO building blocks
To understand AO design, we next need to consider the building blocks of an AO setup. AO is controlled through an AO config, in which you define your tiers of disk by selecting CPGs and then choose an operational mode optimised for cost, performance or a balance between the two. Once you have set up your AO config, you need to schedule a task to run AO. When scheduling, you choose the analysis period during which regional IO density will be measured and the times during which AO will actually perform the data moves. An example AO config is shown in the table below.
|Setting|Value|
|---|---|
|AO Config Name|AO_Balanced|
|Tiers|0 = SSD_R5_CPG, 1 = SAS_R5_CPG, 2 = NL_R6_CPG|
|Analysis Period|09:00-17:00 Mon-Fri|
|Max Run Time|12 Hours|
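As a way of seeing how the building blocks fit together, the example config can be modelled as a small data structure with basic validation. This is a hypothetical Python model for illustration only, not the 3PAR CLI or API; the field names are made up, and the `balanced` mode is inferred from the config name because the table does not list a mode.

```python
from dataclasses import dataclass
from typing import Dict, Optional

VALID_MODES = {"performance", "balanced", "cost"}


@dataclass
class AOConfig:
    """Hypothetical model of an AO config; the field names are illustrative only."""
    name: str
    tiers: Dict[int, Optional[str]]  # tier number (0 = fastest) -> CPG name, None = empty
    mode: str
    analysis_period: str
    max_run_hours: int

    def validate(self) -> None:
        """Raise ValueError if the config breaks the basic rules covered in this post."""
        if not set(self.tiers) <= {0, 1, 2}:
            raise ValueError("tiers must be numbered 0, 1 or 2")
        populated = [cpg for cpg in self.tiers.values() if cpg]
        if not 2 <= len(populated) <= 3:
            raise ValueError("an AO config needs two or three populated tiers")
        if self.mode not in VALID_MODES:
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.max_run_hours <= 0:
            raise ValueError("max run time must be positive")


example = AOConfig(
    name="AO_Balanced",
    tiers={0: "SSD_R5_CPG", 1: "SAS_R5_CPG", 2: "NL_R6_CPG"},
    mode="balanced",  # assumption: inferred from the config name, not listed in the table
    analysis_period="09:00-17:00 Mon-Fri",
    max_run_hours=12,
)
example.validate()  # passes silently
```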
AO design considerations
Now that we understand the principles and building blocks of AO, let’s look at some of the design considerations.
Number of tiers
You can have two or three tiers, with tier 0 the fastest and tier 2 the slowest. A two-tier system containing only NL and SSD disks is not recommended, as the performance differential between them is too great. Some example tier layouts:
- Two tier: 1 = 10K SAS, 2 = NL
- Two tier: 0 = SSD, 1 = 10K SAS
- Three tier: 0 = SSD, 1 = 10K SAS, 2 = NL
Even if you start with a two-tier system without SSDs, leave tier 0 empty in your config so that you can add SSDs at a later date.
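The two tier-layout rules in this section can be expressed as a quick design check. The sketch below is hypothetical Python, assuming a layout is written as a mapping from tier number to disk type with empty tiers simply left out.

```python
from typing import Dict, List


def tier_layout_warnings(tiers: Dict[int, str]) -> List[str]:
    """Check a tier layout against the design rules in this section.

    `tiers` maps tier number (0 = fastest) to disk type: "SSD", "SAS" or "NL".
    Empty tiers are simply left out of the mapping.
    """
    warnings: List[str] = []
    disk_types = set(tiers.values())
    if len(tiers) == 2 and disk_types == {"SSD", "NL"}:
        warnings.append("SSD + NL only: performance differential too great")
    if "SSD" not in disk_types and 0 in tiers:
        warnings.append("no SSDs yet: leave tier 0 empty so SSDs can be added later")
    return warnings


if __name__ == "__main__":
    for layout in ({1: "SAS", 2: "NL"}, {0: "SSD", 2: "NL"}, {0: "SAS", 1: "NL"}):
        print(layout, tier_layout_warnings(layout) or "OK")
```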
Keep it simple
This is actually the point that inspired me to write the post, and I believe it is the most important design principle: keep it simple. At least start out with a single AO policy containing ALL your tiers of disk, and allow ALL data to move freely. If, for example, you choose what you believe are your busiest VVs and lock them to an SSD CPG, you may find only a small proportion of their data is hot and be robbing yourself of space. Conversely, if you lock a VV into a lower-tier CPG on NL disks, it may become busy and have nowhere to move up to, hammering the disks it’s placed on and affecting all the volumes hosted there.
A CPG can only belong to one AO config, so if you do go down the route of multiple AO policies, you must have a separate CPG for each tier of disk in each AO policy. Additional CPGs create additional management overhead in terms of reporting and so on. For a reminder of what CPGs are about, go here.
Be aware that AO only operates within a node pair. On a 7400 with 4 nodes, for example, AO would tier data across the cages attached to nodes 0 and 1, and separately across those attached to nodes 2 and 3. The key design principle here is to keep drives and drive cages balanced across nodes so that performance in turn remains balanced.
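If you do end up with multiple AO policies, a quick audit for CPGs that appear in more than one config can catch mistakes early. This is a hypothetical Python sketch; the config and CPG names are invented for illustration, and in practice the mapping would come from your own export of the array's configuration.

```python
from collections import defaultdict
from typing import Dict, List


def shared_cpgs(configs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return each CPG used by more than one AO config, with the configs using it."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for config_name, cpgs in configs.items():
        for cpg in set(cpgs):  # count each CPG once per config
            owners[cpg].append(config_name)
    return {cpg: sorted(names) for cpg, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    configs = {
        "AO_Prod": ["SSD_R5_PROD", "SAS_R5_PROD", "NL_R6_PROD"],
        "AO_Dev": ["SSD_R5_DEV", "SAS_R5_PROD", "NL_R6_DEV"],  # reuses a Prod CPG
    }
    print(shared_cpgs(configs))  # {'SAS_R5_PROD': ['AO_Dev', 'AO_Prod']}
```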
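Because AO only tiers within a node pair, it can help to total the drives of each type behind each pair and confirm they match. The sketch below is a hypothetical Python check, assuming each cage is described by the node pair it is attached to and its drive counts by type; it treats "balanced" as identical per-type counts, which is a simplification.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical inventory record: (node pair the cage is attached to, drives by type).
Cage = Tuple[Tuple[int, int], Dict[str, int]]


def drives_by_pair(cages: List[Cage]) -> Dict[Tuple[int, int], Dict[str, int]]:
    """Total drives of each type behind each node pair (AO tiers only within a pair)."""
    totals: Dict[Tuple[int, int], Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for pair, drives in cages:
        for disk_type, count in drives.items():
            totals[pair][disk_type] += count
    return {pair: dict(types) for pair, types in totals.items()}


def is_balanced(cages: List[Cage]) -> bool:
    """True when every node pair has the same number of drives of each type."""
    per_pair = list(drives_by_pair(cages).values())
    return all(p == per_pair[0] for p in per_pair)


if __name__ == "__main__":
    four_node = [
        ((0, 1), {"SSD": 8, "SAS": 24}), ((0, 1), {"NL": 12}),
        ((2, 3), {"SSD": 8, "SAS": 24}), ((2, 3), {"NL": 12}),
    ]
    print(is_balanced(four_node))  # True
```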
Business requirements v system requirements
Just because the business perceives a system as more important does not mean that it has a higher regional IO density. Remember that space in your higher tiers is at a premium; by second-guessing and placing the wrong data in tier 0, you are short-changing yourself. I recently made this change for a customer, moving from an AO config that only allowed business-critical apps access to the SSDs to one that allowed all data to move freely across all tiers. The net result was that each SSD went from 200 IOPS to 1000+ IOPS, reducing the pressure on the lower tiers of disk.
Provision from your middle tier
If you are using a three-tier config, create new VVs from your middle tier. This way new writes aren’t hitting slow disk, but also aren’t taking up valuable space in your top tier. By giving AO complete control of your top tier and not provisioning any volumes from it, you allow AO to take its capacity utilisation up to 100%.
If you are running a two tier system that contains NL disks, provision new VVs from tier 1 so that new writes are not hitting slow NL disks.
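The provisioning rules above can be folded into one helper that picks the tier to create new VVs from. This is a hypothetical Python sketch using the same tier-to-disk-type mapping idea; the SSD + SAS branch is an assumption that applies the "leave the top tier to AO" reasoning, since the post only covers the three-tier and NL cases explicitly.

```python
from typing import Dict


def provisioning_tier(tiers: Dict[int, str]) -> int:
    """Pick the tier to create new VVs from.

    `tiers` maps tier number (0 = fastest) to disk type: "SSD", "SAS" or "NL".
    Empty tiers are left out of the mapping.
    """
    ordered = sorted(tiers)  # fastest tier first
    disk_types = set(tiers.values())
    if len(ordered) == 3:
        return ordered[1]  # three tiers: provision from the middle tier
    if len(ordered) == 2:
        if disk_types == {"SSD", "NL"}:
            raise ValueError("SSD + NL only is not a recommended layout")
        if "NL" in disk_types:
            # Two tiers with NL: use the faster, non-NL tier to keep new writes off NL.
            return next(t for t in ordered if tiers[t] != "NL")
        # Assumption (not stated in the post): with SSD + SAS, provision from the
        # slower tier so AO keeps full control of the SSD tier.
        return ordered[1]
    raise ValueError("an AO config has two or three tiers")


if __name__ == "__main__":
    print(provisioning_tier({0: "SSD", 1: "SAS", 2: "NL"}))  # 1
    print(provisioning_tier({1: "SAS", 2: "NL"}))            # 1
```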
You can’t beat the machine
This point effectively summarises all those above. The traditional days of storage, when you had to calculate the number of disks and the RAID type to allocate to a LUN, are gone. Just provision your VVs from the CPG representing the middle tier, allow the volumes access to all the disks and let the system decide which RAID and disk type is appropriate at a sub-LUN level. Don’t try to second-guess where data would be best placed; the machine will outsmart you.
Timing
Timing is a key consideration. Again, start by keeping things simple: monitor during your core business hours, and be careful not to include things like backups in your monitoring period, as they could throw the results off. If you find certain servers have very specific access patterns, adjust the timing to monitor during those periods. Schedule AO to run out of hours if possible, as it puts additional overhead on the system. You can set a max run time on AO to make sure it is not running during business hours. At first, make the max run time as long as you can outside of business hours to give AO every opportunity to run. If you do run multiple AO policies, set them to start at the same time; this will minimise the chance of running into space problems.
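A quick way to sanity-check a schedule is to compute how long AO can run before business hours resume, and whether a given max run time fits. The Python sketch below uses only the standard `datetime` module; it assumes business hours start at 09:00 (as in the example analysis period) and deliberately ignores weekends and holidays. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def next_business_start(start: datetime, business_start_hour: int = 9) -> datetime:
    """The next time business hours begin after `start` (weekends ignored)."""
    candidate = start.replace(hour=business_start_hour, minute=0, second=0, microsecond=0)
    if candidate <= start:
        candidate += timedelta(days=1)
    return candidate


def longest_run_hours(start: datetime, business_start_hour: int = 9) -> float:
    """Largest max run time that still finishes before business hours."""
    return (next_business_start(start, business_start_hour) - start).total_seconds() / 3600


def run_window_ok(start: datetime, max_run_hours: float, business_start_hour: int = 9) -> bool:
    """True if AO started at `start` is guaranteed to stop before business hours."""
    end = start + timedelta(hours=max_run_hours)
    return end <= next_business_start(start, business_start_hour)


if __name__ == "__main__":
    start = datetime(2015, 6, 1, 17, 0)  # AO kicks off at the end of the analysis period
    print(longest_run_hours(start))   # 16.0
    print(run_window_ok(start, 12))   # True: finishes at 05:00
```

Starting at 17:00 leaves a 16-hour window, so the 12-hour max run time from the example config fits comfortably; kicking off at 22:00 with the same limit would overrun into the next morning.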
AO mode
The three AO modes are fairly self-explanatory. At one extreme, performance mode moves data up to higher tiers more aggressively; at the other, cost mode moves data down to cheaper, large-capacity tiers more aggressively; balanced is a halfway house between the two. Which one you choose will depend on whether your aim leans towards optimum cost or optimum performance. I would suggest selecting balanced to start with, then monitoring and adjusting accordingly.
Availability
All CPGs in an AO config should be set to use the same availability level, for example cage. A mix of availability levels means that data is only protected at the lowest availability level in the AO config. For example, if your FC CPG has an availability level of cage and your NL CPG has magazine, the net result will be magazine-level availability.
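The "lowest level wins" rule is easy to check programmatically. This minimal Python sketch models only the two availability levels mentioned here, ordered from lowest to highest protection; the CPG names and helper functions are hypothetical.

```python
from typing import Dict

# Lowest to highest protection; only the two levels mentioned in the post are modelled.
AVAILABILITY_ORDER = ["magazine", "cage"]


def effective_availability(cpg_levels: Dict[str, str]) -> str:
    """Data in an AO config is only protected to the lowest level among its CPGs."""
    return min(cpg_levels.values(), key=AVAILABILITY_ORDER.index)


def mismatched(cpg_levels: Dict[str, str]) -> bool:
    """True if the CPGs in the config do not all share one availability level."""
    return len(set(cpg_levels.values())) > 1


if __name__ == "__main__":
    levels = {"FC_R5_CPG": "cage", "NL_R6_CPG": "magazine"}
    print(effective_availability(levels), mismatched(levels))  # magazine True
```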
Exceptions to simplicity
There will always be exceptions to simplicity, but at least give simplicity a chance and then tune from there. If the access pattern of your data is truly random from day to day, Adaptive Flash Cache may help to soak up some of the hits, and it can be used in conjunction with AO.
If you want to take advantage of deduplication with thinly deduplicated virtual volumes (TDVVs), you will need to place the volumes directly on an SSD CPG, and that CPG cannot be part of an AO config.
As discussed in the timing section, if you have applications with very specific access patterns at different times of the day, you may need to create multiple policies to analyse them correctly.
Moving forward into the brave new world of VVOLs, AO will be supported.
As ever, feel free to share your perspective and experiences of AO design in the comments section and on Twitter.