Adaptive Optimization Design Considerations

What is AO?

Adaptive Optimization (AO) is 3PAR’s disk tiering technology, which automatically moves the hottest (most frequently accessed) blocks of data to the fastest disks while moving infrequently accessed cold blocks to slower disks. If you are from an EMC background, AO is comparable to FAST VP; if you are from a NetApp background, welcome to the brave new world of tiering.

 

An AO config can consist of two or three tiers of disk. You can think of these as gold, silver and bronze levels in terms of performance: your SSDs form the high-performing gold tier, 10K or 15K SAS disks the silver tier, and NL (nearline) disks the bronze. The different tiers of disk are shown diagrammatically below, along with some examples of blocks of data tiering both up and down. AO is a licensed feature, enabled through the creatively named Adaptive Optimization software option, and is available across all the hybrid models. It is not available on the all-flash models for obvious reasons.

[Diagram: AO disk tiers with example blocks of data tiering up and down]

How AO works

So now we understand that AO is a tiering technology that intelligently places our data based on how hot (frequently accessed) or cold it is. You might expect all the VVs (Virtual Volumes) demanding the most IOPS to end up on your tier 0 SSDs; however, this is not the case, as AO is a sub-LUN tiering system, i.e. it does not need to move entire volumes, just the hot parts.

Under the hood 3PAR allocates space in 128MB blocks or, in HP speak, regions, and it is at the region level that AO both analyses and moves data. This granular level of analysis and movement ensures that the capacity of expensive SSD disks is utilised to its fullest, by moving only the very hottest regions as opposed to entire VVs. To give an example, if you have a 100GB VV and only 2 regions are getting hit hard, only 256MB of data needs to be migrated to SSD, a massive saving in space compared to moving the entire volume.
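To put rough numbers on that region-level granularity, here is a quick back-of-the-envelope sketch. The 128MB region size comes from the text above; the function names are just for illustration:

```python
REGION_MB = 128  # 3PAR allocates space in 128MB regions

def regions_in_vv(vv_size_gb: int) -> int:
    """Number of 128MB regions backing a fully written VV."""
    return (vv_size_gb * 1024) // REGION_MB

# A 100GB VV is made up of 800 regions.
total = regions_in_vv(100)

# If only 2 regions are hot, AO migrates 2 x 128MB = 256MB to SSD,
# instead of the full 100GB that LUN-level tiering would move.
hot_regions = 2
moved_mb = hot_regions * REGION_MB
print(total, moved_mb)  # -> 800 256
```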

 

AO does its voodoo by analysing the number of hits a region gets within a time period to measure its regional IO density, which is expressed in terms of IOPS per GB per minute. AO then uses these stats to select regions with an above-average regional IO density and mark them for movement to a higher tier. How aggressively AO moves data depends on the mode selected.
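As a rough sketch of the selection logic described above: the IOPS-per-GB-per-minute metric is from the text, but the data structures and the simple above-average threshold are illustrative assumptions, not 3PAR's actual implementation:

```python
REGION_GB = 0.125  # a 128MB region expressed in GB

def io_density(hits: int, minutes: int) -> float:
    """Regional IO density: hits per GB per minute for one region."""
    return hits / REGION_GB / minutes

def promotion_candidates(region_hits: dict, minutes: int) -> list:
    """Mark regions whose density is above the average for promotion."""
    densities = {r: io_density(h, minutes) for r, h in region_hits.items()}
    avg = sum(densities.values()) / len(densities)
    return [r for r, d in densities.items() if d > avg]

# Example: region 'r2' is far busier than its peers over a 60-minute window,
# so it alone is marked for movement to a higher tier.
hits = {"r1": 100, "r2": 90_000, "r3": 250}
print(promotion_candidates(hits, minutes=60))  # -> ['r2']
```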

Once a region has been marked as a candidate for movement, the system then performs a number of sanity checks to verify whether the region move should go ahead:

  • Average service time – the service time of the destination tier is checked to ensure that data isn’t migrated to a faster disk technology that is working so hard its service times are actually worse than those of the tier the data is being migrated from
  • Space – if no space is available in a tier, or the CPG space warning or limit has been reached, no regions will be moved to that tier. If a CPG does exceed its warning or limit, AO will try to remediate this by moving regions out of that tier: busy regions to faster tiers and idle regions to slower tiers
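A minimal sketch of those two checks acting as a gate on a candidate promotion. The field names and structure are assumptions for illustration only:

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    avg_service_ms: float  # current average service time of the tier
    free_regions: int      # regions available below the CPG warning/limit

def move_allowed(src: Tier, dst: Tier, regions_needed: int = 1) -> bool:
    """Only promote a region if the faster tier is actually serving IO
    faster than the source, and has space below its CPG warning/limit."""
    if dst.avg_service_ms >= src.avg_service_ms:
        return False  # destination so busy the move would be a downgrade
    if dst.free_regions < regions_needed:
        return False  # no space in the tier: AO skips the move
    return True

ssd = Tier("SSD", avg_service_ms=0.5, free_regions=40)
sas = Tier("SAS", avg_service_ms=8.0, free_regions=900)
print(move_allowed(sas, ssd))  # -> True
```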

AO building blocks

To understand AO design we next need to consider the building blocks of an AO setup. AO is controlled through an AO config, in which you define your tiers of disk through the selection of CPGs and then choose an operational mode optimised for cost, performance or a balance between the two. Once you have set up your AO config you then need to schedule a task to run AO. When scheduling you will need to choose the analysis period, during which regional IO density will be measured, and the times at which AO will actually perform the data moves. An example AO config is shown in the table below.

AO Config Name: AO_Balanced
Tiers: 0 = SSD_R5_CPG, 1 = SAS_R5_CPG, 2 = NL_R6_CPG
Mode: Balanced
Analysis Period: 09:00-17:00 Mon-Fri
Execution Time: 20:00
Max Run Time: 12 Hours
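The config in the table above maps onto the 3PAR CLI roughly as follows. Treat this as a sketch: `createaocfg`, `startao` and `createsched` are real 3PAR CLI commands, but the exact option names and scheduling syntax vary by InForm OS version, so verify the flags below against `help createaocfg` and `help startao` on your own array:

```shell
# Define the AO config: three tiers of CPGs plus the balanced mode.
# CPG names match the example table; substitute your own.
createaocfg -t0cpg SSD_R5_CPG -t1cpg SAS_R5_CPG -t2cpg NL_R6_CPG \
    -mode Balanced AO_Balanced

# Schedule AO to execute at 20:00 each weeknight, analysing the window
# from 11 hours back (09:00) to 3 hours back (17:00), capped at 12 hours.
createsched "startao -btsecs -39600 -etsecs -10800 -maxrunh 12 AO_Balanced" \
    "0 20 * * 1-5" AO_Balanced_sched
```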

 

AO design considerations

Now that we understand the principles and the building blocks of AO, let’s look at some of the design considerations.

 

Number of tiers

You can have either two or three tiers, 0 being the fastest and 2 the slowest. A two-tier system containing only NL and SSD disks is not recommended, as the performance differential between the tiers would be too great. Some example tier layouts would be:

  • Two tier: 1 = 10K SAS, 2 = NL
  • Two tier: 0 = SSD, 1 = 10K SAS
  • Three tier: 0 = SSD, 1 = 10K SAS, 2 = NL

 

Even if you do start with a two-tier system without SSDs, leave tier 0 empty in your config so that you can add SSDs at a later date.

 

Simplicity

This is actually the point that inspired me to write the post and, I believe, the most important design principle: keep it simple. At least start out with a single AO policy containing ALL your tiers of disk and allow ALL data to move freely. If, for example, you pick what you believe are your busiest VVs and lock them to an SSD CPG, you may find only a small proportion of that data is actually hot, and be robbing yourself of space. Conversely, if you lock a VV into a lower-tier CPG on NL disks, it may become busy with nowhere to move up to, hammering the disks it’s placed on and affecting all the volumes hosted from there.


CPGs

A CPG can only exist within one AO config, so if you do go down the route of having multiple AO policies you must have a separate CPG representing each tier of disk in each policy. Additional CPGs create additional management overhead in terms of reporting etc. For a reminder on what CPGs are about, go here.

Node pairs

You need to be aware that AO only operates within a node pair. On a 7400 with four nodes, AO would occur across the cages attached to nodes 0 and 1, and separately across those attached to nodes 2 and 3. The key design principle here is to keep drives and drive cages balanced across node pairs so that performance in turn remains balanced.

Business requirements v system requirements

Just because a system is perceived as having a higher importance by the business does not mean that it has a higher regional access density. Remember, space in your higher tiers is at a premium; by second-guessing and placing the wrong data in tier 0 you are short-changing yourself. I made this change for a customer recently, moving from an AO config that only allowed business-critical apps access to the SSDs to one allowing all data to move freely across all tiers. The net result was that utilisation of each SSD increased from 200 to 1000+ IOPS, reducing the pressure on the lower tiers of disk.

Provision from your middle tier

When you create a new VV, do so from your middle tier if you are using a three-tier config. That way any new writes aren’t hitting slow disk, but also aren’t taking up valuable space in your top tier. By giving AO complete control of your top tier and not provisioning any volumes from it, you can allow AO to take its capacity utilisation up to 100%.

If you are running a two-tier system that contains NL disks, provision new VVs from tier 1 so that new writes are not hitting slow NL disks.

You can’t beat the machine

This point effectively summarises all those above. The traditional days of storage, calculating the number of disks and the RAID type to allocate to a LUN, are gone. Just provision your VVs from the CPG representing the middle tier, allow the volumes access to all the disks, and let the system decide which RAID and disk type is appropriate at a sub-LUN level. Don’t try to second-guess where data would be best placed; the machine will outsmart you.


Timing

Timing is a key consideration. Again, start by keeping things simple: monitor during your core business hours, and be careful not to include things like backups in your monitoring period, which could throw the results off. If you find certain servers have very specific access patterns, adjust the timing to monitor during those periods. Schedule AO to run out of hours if possible, as it puts additional overhead on the system. You can set a max run time on AO to make sure it is not still running during business hours; at first, make the max run period as long as you can outside of business hours to give AO every opportunity to complete. If you do run multiple AO policies, set them to start at the same time, as this minimises the chance of running into space problems.

AO Mode

The three AO modes are quite self-explanatory: performance moves data up to higher tiers more aggressively, cost moves data to cheaper, larger-capacity tiers more aggressively, and balanced is a halfway house between the two. Which one you choose will depend on whether your aim leans towards optimum cost or optimum performance. I would suggest starting with balanced, then monitoring and adjusting accordingly.

Availability

All CPGs in an AO config should be set to use the same availability level, for example cage. Having a mix of availability levels means that data is effectively protected at the lowest availability level in the AO config. For example, if your FC CPG has an availability level of cage and your NL CPG has magazine, the net result will be magazine-level availability.
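That "weakest link" behaviour can be expressed as a one-liner. The ranking below covers only the two levels mentioned above (cage availability being higher than magazine); the code itself is an illustration, not 3PAR's:

```python
# Higher number = higher availability. The effective level of an AO
# config is the lowest level of any CPG participating in it.
LEVELS = {"mag": 0, "cage": 1}

def effective_availability(cpg_levels: list) -> str:
    """Return the weakest availability level across the config's CPGs."""
    return min(cpg_levels, key=LEVELS.get)

# FC CPG at cage availability, NL CPG at magazine: net result is mag.
print(effective_availability(["cage", "mag"]))  # -> mag
```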

Exceptions to simplicity

There will always be exceptions to simplicity, but at least give simplicity a chance and then tune from there. If the access pattern of your data is truly random each day, Adaptive Flash Cache may help to soak up some of the hits and can be used in conjunction with AO.

If you want to take advantage of deduplication with thinly deduplicated virtual volumes, you will need to place the volumes directly on the SSDs; they cannot be in a CPG that is part of an AO config.

As discussed in the timing section, if you have applications with very specific access patterns at different times of the day, you may need to create multiple policies to analyse this correctly.

Moving forward into the brave new world of VVols, AO will be supported.

 

As ever feel free to share your perspective and experiences on AO design in the comments sections and on Twitter.

 

Further Reading

Patrick Terlisten blog post – Some thoughts about HP 3PAR Adaptive Optimization

HP 3PAR StoreServ Storage best practices guide

HP technical white paper – Adaptive Optimization for HP 3PAR StoreServ Storage

 


21 thoughts on “Adaptive Optimization Design Considerations”

  1. Good read!

    My concern with AO is that it’s reactive and not real time. It would be good if HP could learn the storage characteristic and then adapt for a time period for example lets say you had a workload that ran weekly which was IO intensive. My understanding from today’s version of AO is as follows:

    1. Monday – Intensive Workload
    2. Monday Night – Move to SSD tier
    3. Tuesday – Normal Workload
    4. Tuesday Night – Move to SAS tier
    5. Wednesday – Nothing
    6. Thursday – Nothing
    7. Friday – Nothing
    8. Saturday – Nothing
    9. Sunday – Nothing
    10. Monday – On wrong tier

    I’m sure the boffins at HP could track and analyse this so that AO became predictive.

    1. Craig, thanks for reading and commenting. You make a good point that not every workload will be suitable for AO. For AO to be a good fit the regional IO density needs to be high, i.e. a high proportion of your IOPS comes from a small proportion of capacity, and the workload must have some kind of repeatable pattern to it. If the pattern is truly random like in your example then a different approach would have to be sought. However, if the pattern repeated each week, e.g. each Monday is an intensive day, you could create a schedule that looked at the appropriate period from the previous week, although this would start to take you away from the model of simplicity.

      1. Isn’t what you’re asking for called cache? And HP do have Adaptive Flash Cache.

        The problem with such truly random workloads is firstly that they are very rare, but also that by their very nature they don’t display consistency week on week, so optimising for week 1 won’t necessarily help with week 2. If that’s the case you’d always be playing catch-up and probably burn more I/O moving data than you’d gain using tiering.

        BTW there’s a min IOPS setting to prevent demotion due to minimal access. You can also now filter by volume, so you can have different schedules for specific VVs within the same CPG and config.

    2. Is this a case in which a scheduled DO job would make sense, to allow you to manually migrate a lun to a high performing CPG (perhaps a SSD-only CPG) at a certain time, in advance of a known high workload? Then perhaps another DO job to move it back to the CPG with AO, once the workload drops.

      I don’t use DO often, but this seems like a good use case for it?

      1. Interesting question Jason, but I think there would be a couple of challenges with this approach. First of all, DO operations take time, so for a larger volume you could be waiting several days for it to migrate to SSDs and then several days back again, and while it was migrating there would be some impact on performance. Secondly, remember AO works on regional IO density and so is able to move just the hottest regions to your smaller, more expensive high-performance tiers. By moving an entire volume you may find that only certain regions truly required the SSDs, and by pushing the rest of the volume to SSD as well you are effectively wasting space on it.

        1. In my limited experience with migrating luns to a new CPG, I’d agree with that statement of it taking a while. I was mostly thinking about something that would fit completely into SSD, which would imply a small amount of data. I would think that if a 75GB database lun runs super hot one day a week on a schedule, the plan would work. A 10TB one? Not so much.

  2. Great post BTW and as you say the best model for AO is to keep it simple and it typically just works. However If you try to force movement based on your own bias, rather than the arrays recorded heuristics then you’ll likely be disappointed 🙂

    1. Hi yes you can run Remote Copy and Adaptive Optimization together. Although the AO config is not replicated, so you may find if you fail over to another site it would take a while for AO to optimise the layout of the regions.

  3. Hi all, new to 3PAR with a question regarding Adaptive Optimization configuration. We have used the default balanced configuration on our 3PAR for adaptive optimization on our virtual volumes. I’ve recently noticed that this configuration has SSD_r1/FC_r1/NL_r6 as its CPGs. We’ve got several volumes now using this config. What would happen if I decided to change the RAID levels on the various tiers? For example, if I wanted to change it to SSD_r6/FC_r6/NL_r6? What would be the impact on the existing volumes in doing this, or can it be done without destroying the data on the volumes? Would I instead have to create a new volume with a new AO config as I want it and then move the data from one volume to another?

    1. If you change the RAID settings on an existing CPG this will just impact new writes to that tier; existing data on the CPG will remain in the old format. You can try to get all the data laid out in the new RAID style by running a tunesys afterwards. I would, however, say it would be cleaner to create a new CPG and then use Dynamic Optimization to move the volumes over. Then, when the original CPG is empty, you would be able to delete it.

    1. Dynamic Optimization is much simpler to understand: it is when you manually choose to move an entire virtual volume between tiers.

  4. Hi, I have a 3 tier AO config. After running happily for a while I added further SSDs into my P1000 array, of a larger size than the original SSDs. On the systems screen it shows the SSDs as 2 different entries and massively favours the original smaller disks; in fact, when I select an SSD CPG it shows only a small amount of available space left when I know the newer SSDs have loads of space. I haven’t allocated anything from SSD, it is purely for AO. Any idea how I can make the array view both sizes of SSD as 1 big pool and use them accordingly? They are both 150k and the SSD CPG is set to default speed. Cheers.

  5. Hi,
    I use a 3PAR 10800 and I have two types of SSD (32 x 100GB SSDs, 32 x 500GB SSDs). I created a CPG across all 64 SSDs and scheduled a two-tier AO config with SSD and FC. Now I am very worried: the 100GB SSDs are 98% allocated, so I suspended AO. If I start AO again, will the system have a problem?

  6. Hi,
    I would like to create an AO with a somewhat non usual config for my VMs:
    Tier 0: 1TB RAID5 SSD
    Tier 1: 3TB RAID1 15K disks
    Tier 2: 14TB RAID5 10K disks

    This seems ‘understandable’ to me in that I expect most of the data to move down to Tier 2 (e.g. OS footprint, installed programs etc) with some data going to Tier 1 eventually and maybe really hot data (e.g. web caches) in Tier 0.

    My problem is whether this sounds reasonable and, moreover, how to deal with provisioning. Specifically, I expect these LUNs to be large (e.g. 4 x 4TB each), so really and truly I can only provision from my Tier 2 initially to accommodate the LUN size I want. I’m wondering whether this is possible and whether there are any issues in doing so, as all the recommendations I’ve read so far (including the above) seem to suggest that provisioning should always happen from the middle tier in a 3 tier system (which is not possible in my case). I’ve got no problem with the LUN initially being a little slow and then gradually improving over time (as stuff moves up the tiers).

    My question is whether what I want is do-able and whether there are any catches?

    Thanks for your help!

    1. Hi
      The advice not to provision from tier 2 is because any new writes will initially hit this tier and then have to wait until AO kicks in for any hot data to be tiered up. As NL disks are often used in tier 2, many companies cannot tolerate the hit of new writes initially going to slow disk. As in your case new writes will be going direct to 10K disk, I would have thought this would be fine. In summary, there is no technical reason not to do this; it’s just whether you can accept the slower speed of new writes going to 10K disks.
