Adaptive Optimization Design Considerations

What is AO?

Adaptive Optimization (AO) is 3PAR’s disk tiering technology, which automatically moves the hottest (most frequently accessed) blocks of data to the fastest disks and at the same time moves infrequently accessed cold blocks to slower disks. If you are from an EMC background, AO is comparable to FAST VP; if you are from a NetApp background, welcome to the brave new world of tiering.

 

An AO config can consist of two or three tiers of disk. You can think of these as gold, silver and bronze levels in terms of performance: your SSDs form the high-performing gold tier, 10K or 15K SAS disks operate as the silver tier, and NL disks operate as bronze. The different tiers of disk are shown diagrammatically below, along with some examples of blocks of data tiering both up and down. AO is a licensed feature that is enabled through the creatively named Adaptive Optimization software option and is available across all the hybrid models. It is not available on the all-flash models for obvious reasons.

[Diagram: AO disk tiers (SSD, SAS, NL) with example blocks of data tiering up and down]

How AO works

So now we understand that AO is a tiering technology that intelligently places our data based on how hot (frequently accessed) or cold it is. You might therefore expect to see all the VVs (virtual volumes) demanding the most IOPS end up on your tier 0 SSDs. This however is not the case, as AO is a sub-LUN tiering system, i.e. it does not need to move entire volumes, just the hot parts.

Under the hood, 3PAR allocates space in 128MB blocks, or regions in HP speak, and it is at the region level that AO both analyses and moves data. This granular level of analysis and movement ensures that the capacity of expensive SSDs is utilised to the fullest, by moving only the very hottest regions as opposed to entire VVs. To give an example, if you have a 100GB VV and only two regions are getting hit hard, only 256MB of data needs to be migrated to SSD, a massive saving in space compared to moving the entire volume.

 

AO does its voodoo by analysing the number of hits a region gets within a time period to measure its regional IO density. Regional IO density is measured in terms of IO accesses per GB per minute. These stats are then used by AO to select the regions with an above-average regional IO density and mark them for movement to a higher tier. How aggressively AO moves data depends on the AO mode selected.
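
To put some rough numbers on that (these are purely illustrative): a 128MB region is 0.125GB, so if it averages 100 IO accesses per minute over the analysis period its regional IO density is 100 / 0.125 = 800 accesses per GB per minute. A region in the same VV averaging only 5 accesses per minute scores just 40, so the first region is a far stronger candidate for promotion even though both belong to the same volume.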

Once a region has been marked as a candidate for movement, the system then performs a number of sanity checks to verify whether the region move should go ahead:

  • Average service time – the average service time of the destination tier is checked to ensure that data isn’t migrated to a faster disk technology that is working so hard its service times are actually worse than those of the tier the data is being migrated from
  • Space – if no space is available in a tier, or the CPG space warning or limit has been reached, no regions will be moved into that tier. However, if a CPG exceeds its space warning or limit, AO will try to remediate this by moving regions out of that tier, moving busier regions to faster tiers and colder regions to slower tiers

AO building blocks

To understand AO design we next need to consider the building blocks of an AO setup. AO is controlled through an AO config. In an AO config you define your tiers of disk through the selection of CPGs and then choose an operational mode optimised for cost, performance or a balance between the two. Once you have set up your AO config you then need to schedule a task to run AO. When scheduling you will need to choose the analysis period during which regional IO density will be analysed and the times during which AO will actually perform the data moves. An example AO config is shown in the table below, with a rough CLI equivalent after it.

AO Config Name     AO_Balanced
Tiers              0 = SSD_R5_CPG, 1 = SAS_R5_CPG, 2 = NL_R6_CPG
Mode               Balanced
Analysis Period    09:00-17:00 Mon-Fri
Execution Time     20:00
Max Run Time       12 Hours
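
If you prefer the CLI, an AO config along the lines of the table above can be created with createaocfg and checked with showaocfg. Treat this as a sketch: the CPG names are just the examples from the table, and the exact options should be verified against the CLI reference for your 3PAR OS version.

createaocfg -t0cpg SSD_R5_CPG -t1cpg SAS_R5_CPG -t2cpg NL_R6_CPG -mode Balanced AO_Balanced

showaocfg AO_Balanced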

 

AO design considerations

Now that we understand the principles and the building blocks of AO, let’s look at some of the design considerations.

 

Number of tiers

You can have two or three tiers, 0 being the fastest and 2 being the slowest. It is not recommended to run a two-tier system containing only SSD and NL disks, as the performance differential between the tiers would be too great. Some example tier layouts would be:

  • Two tier: 1 = 10K SAS, 2 = NL
  • Two tier: 0 = SSD, 1 = 10K SAS
  • Three tier: 0 = SSD, 1 = 10K SAS, 2 = NL

 

Even if you do start with a two-tier system without SSDs, leave tier 0 empty in your config so that you can add SSDs at a later date.
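
When the SSDs do arrive, the existing config can be updated in place rather than rebuilt. Something along the lines of the below should do it, although this is an assumption to check against the setaocfg documentation for your 3PAR OS version, and SSD_R5_CPG is just an example CPG name:

setaocfg -t0cpg SSD_R5_CPG AO_Balanced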

 

Simplicity

This is actually the point that inspired me to write the post, and I believe it is the most important design principle: keep it simple. At least start out with a single AO policy containing ALL your tiers of disk and allow ALL data to move freely. If, for example, you choose what you believe are your busiest VVs and lock them to an SSD CPG, you may find only a small proportion of that data is hot and you are robbing yourself of space. Conversely, if you lock a VV into a lower-tier CPG on NL disks it may become busy and have nowhere to move up to, hammering the disks it’s placed on and affecting all the volumes hosted from there.


 CPGs

A CPG can only exist within one AO config. So if you do go down the route of having multiple AO policies, you must have a separate CPG to represent each tier of disk in each AO policy. Additional CPGs create additional management overhead in terms of reporting etc. For a reminder on what CPGs are about go here.

Node pairs

You need to be aware that AO only occurs across a node pair. So on a 7400 with four nodes, AO would occur across the cages attached to nodes 0 and 1, and across those attached to nodes 2 and 3. The key design principle here is to keep drives and drive cages balanced across nodes so that performance in turn remains balanced.

Business requirements v system requirements

Just because a system is perceived as having a higher importance by the business does not mean it has a higher regional IO density. Remember, space in your higher tiers is at a premium; by second-guessing and placing the wrong data in tier 0 you are short-changing yourself. I made this change for a customer recently, moving from an AO config that only allowed business-critical apps access to the SSDs to one allowing all data to move freely across all tiers. The net result was that utilisation of each SSD increased from 200 to 1,000+ IOPS, reducing the pressure on the lower tiers of disk.

Provision from your middle tier

When you create a new VV, do so from your middle tier if you are using a three-tier config. This way new writes aren’t hitting slow disk, but also aren’t taking up valuable space in your top tier. By giving AO complete control of your top tier and not provisioning any volumes from it, you can allow AO to take its capacity utilisation up to 100%.

If you are running a two-tier system that contains NL disks, provision new VVs from tier 1 so that new writes are not hitting slow NL disks.
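
As a rough illustration, creating a new thin provisioned volume from the tier 1 CPG via the CLI might look like the below. The CPG and volume names are purely examples, so check createvv against the command reference for your version:

createvv -tpvv SAS_R5_CPG app_vol_01 500g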

You can’t beat the machine

This point effectively summarises all those above. The traditional days of storage, where you had to calculate the number of disks and the RAID type to allocate to a LUN, are gone. Just provision your VVs from the CPG representing the central tier, allow the volumes access to all the disks and let the system decide which RAID and disk type is appropriate at a sub-LUN level. Don’t try to second-guess where data would be best placed; the machine will outsmart you.


Timing

Timing is a key consideration. Again, start by keeping things simple: monitor during your core business hours and be careful not to include things like backups in your monitoring period, which could throw the results off. If you find you have certain servers with very specific access patterns, adjust the timing to monitor during those periods. Schedule AO to run out of hours if possible, as it will put additional overhead on the system. You can set a max run time on AO to make sure it is not running during business hours. At first, make the max run period as long as you can outside of business hours to give AO every opportunity to run. If you do run multiple AO policies, set them to start at the same time; this will minimise the chance of running into space problems.
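
As a sketch of how the schedule from the earlier table might be wired up from the CLI, startao can be given a begin and end time for the analysis window relative to when it runs, plus a maximum run time, and createsched wraps it in a cron-style schedule so it kicks off at 20:00 on weekdays. The option syntax and names below are assumptions to verify against the startao and createsched documentation for your 3PAR OS version:

createsched "startao -btsecs -11h -etsecs -3h -maxrunh 12 AO_Balanced" "0 20 * * 1-5" AO_Balanced_run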

AO Mode

The three AO modes are quite self-explanatory. At either extreme, Performance moves data up to higher tiers more aggressively and Cost moves data down to cheaper, larger-capacity tiers more aggressively, while Balanced is a halfway house between the two. Which one you choose will depend on whether your aim leans towards optimum cost or optimum performance. I would suggest selecting Balanced to start with, then monitoring and adjusting accordingly.

Availability

All CPGs in an AO config should be set to use the same availability level, for example cage. Having a mix of availability levels will mean that data is protected at the lowest availability level in the AO config. For example, if your FC CPG has an availability level of cage and your NL CPG has magazine, the net result will be magazine-level availability.
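
The availability level is set on the CPG itself, normally at creation time. As a hedged sketch, the three example CPGs from earlier could all be created with cage-level availability along these lines, with the RAID levels mirroring their names and the pattern options to be checked against createcpg for your 3PAR OS version:

createcpg -t r5 -ha cage -p -devtype SSD SSD_R5_CPG
createcpg -t r5 -ha cage -p -devtype FC SAS_R5_CPG
createcpg -t r6 -ha cage -p -devtype NL NL_R6_CPG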

Exceptions to simplicity

There will always be exceptions to simplicity, but at least give simplicity a chance and then tune from there. If the access pattern of your data is truly random each day, Adaptive Flash Cache may help to soak up some of the hits and can be used in conjunction with AO.

If you want to take advantage of deduplication with thinly deduplicated virtual volumes (TDVVs), you will need to place the volumes directly on SSDs, and they cannot be in a CPG that is part of an AO config.
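
For reference, such a volume would be created directly against an SSD CPG that sits outside any AO config, something like the below. The names are examples only and the -tdvv option is worth verifying against createvv for your 3PAR OS version:

createvv -tdvv SSD_Dedupe_CPG dedup_vol_01 500g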

As discussed in the timing section, if you have applications with very specific access patterns at different times of the day, you may need to create multiple policies to correctly analyse this.

Moving forward into the brave new world of VVols, AO will be supported.

 

As ever feel free to share your perspective and experiences on AO design in the comments sections and on Twitter.

 

Further Reading

Patrick Terlisten blog post – Some thoughts about HP 3PAR Adaptive Optimization

HP 3PAR StoreServ Storage best practices guide

HP technical white paper – Adaptive Optimization for HP 3PAR StoreServ Storage

 

Backing Up the System Reporter Database

As I posted recently, I had an issue with corruption of one of the tables in MySQL, which caused System Reporter to go offline. To help avoid a repeat of the five minutes I spent rolling around the floor muttering that all the data was lost, I have been looking into an automated way to back up the MySQL database used by System Reporter. I asked HP and they advised that they don’t have any guidelines on backing up the System Reporter database or a preferred tool, so I jumped onto Google and had a look around at the various options.

 

The solution I have selected is a script published by Matthew Moeller at Red Olive Design. The script is essentially a batch file which can easily be added as a scheduled task in Windows, which means the backups can be automated. The other thing that made this particular script stand out is that it compresses the completed backup into a Zip file; in my case this resulted in the Zip being four times smaller than the actual database. Another neat feature is that you can set how many days of backups you want to keep, and it will automatically delete backups older than that for you.

 

I have now had several successful runs with the script, but as with any script use it at your own risk and make sure it is suitable for your environment. You can find the script, which is free, along with all the installation instructions here.

Viewing Free Capacity in 3PAR

How much free capacity has my 3PAR got? This seems like a simple question, but I see lots of questions about it. The SSMC has simplified viewing 3PAR capacity information significantly, so let’s start there.

SSMC

Raw Capacity

When you open the 3PAR SSMC you will see the dashboard view. The dashboard contains a number of widgets that will help you with your 3PAR capacity management; the widgets are as follows:

Total capacity – This shows you the total raw capacity. For clarity, this raw capacity is just the sum of the capacity of all the disks in your system and takes no account of RAID levels, sparing etc.

Device Type Capacity – This again measures raw space, but this view allows you to see it by device type, i.e. the type of disk, be that FC, NL or SSD.

Allocated capacity – Another raw figure, this takes the total capacity allocated (i.e. used) and shows you what is using up the space, dividing it by block, file and system.

The above widgets show the capacity for ALL systems connected to SSMC; to see the capacity information for a single system, click on the widget in the dashboard. You can also drill down to the capacity for a single system by opening the main menu, choosing Systems and then choosing the Capacity view.

Useable Capacity

The capacity measures we have looked at so far have shown raw space available in different ways. The figure you will probably be most interested in is how much space is actually available to be written to, i.e. the useable space. For this we need to look at the CPG level.

From the main menu choose Common Provisioning Groups. If you cannot see this option, choose Show more from the top right.

In the left-hand pane choose the CPG you want to check the available space for. In the right-hand window look at the Capacity Summary widget. When you look at this you will probably see the free figure and think there is not much space left. Once you finish nearly having a heart attack thinking you are about to run out of space, let me reassure you that this figure is highly misleading: CPGs grow on demand, so this figure will always show only a small amount of free space.

The figure you are really interested in is the Estimated maximum CPG size. This is the figure that shows the useable capacity available to the 3PAR. Be mindful that multiple CPGs can consume space from the same set of physical disks; check out our CPG Overview if you need a refresher on this.

Command Line

If you are a command line junkie you can get exactly the same stats from the CLI. To get a similar view of disk capacity utilisation as in part 1, run:

showsys -space

To view the writeable capacity left in a CPG as in part 2, run the command below; the figure you are interested in is LDFree.

showspace -cpg cpgname

      -------------------------(MB)--------------------------
 CPG  -----EstFree------ -------Usr------- --Snp---- --Adm----
Name   RawFree   LDFree    Total     Used  Total Used Total Used
  R6  15954816 10636544  2051840  2048000      0    0     0    0

3PAR Management Console

Raw Space

If you still use the 3PAR Management Console, read on. The first thing to understand is how much capacity has been consumed on the physical disks, i.e. simply how full the disks are as a percentage. The steps to find this are below:

  • Open up your 3PAR Management Console
  • Select Systems from the navigation pane
  • Highlight the name of your SAN in the management tree
  • In the management window choose the Capacity tab

It’s quite self-explanatory. The top of the window shows the total for all the different types of disk in your system and then breaks this down by disk type; this is then further broken down into allocated and free space. So for example the above screenshot shows the Fast Class disks are 88.51% full.

Next you want to know what’s gobbling up all that space. If you expand the disk class you are interested in, you will see it split into space used by the system vs space used by volumes.


You can keep drilling down further and further on an item. In the example below I’m looking at what proportion of the allocated space is taken up by the volume itself and how much is snapshot space. Have a play and you will get the idea.

OK, so now we know how much space is left on our physical disks, but the key question is how much actual writeable space is left for critical stuff like the marketing department’s pictures. You can view this at a per-CPG level.

Useable Space

This part deals with the question: how much writeable space have I got left in the CPG?

  • Open up your 3PAR Management Console
  • Select Provisioning from the navigation pane
  • From the management tree highlight CPGs
  • In the management pane on the top right, highlight the name of the CPG you want to look at
  • Choose Summary in the bottom management pane
  • The stat you are interested in is Estimated free system space, shown in the bottom management pane

So the above screenshot shows that there is 10,387 GiB of writeable space left in this CPG. Writeable space takes account of the space consumed by RAID parity, system space etc. and tells you how much space is actually available for volumes to grow into.

With the CPG space figures you need to be aware that if you have other CPGs on the same set of physical disks, they will also be competing for that space. So for example the 10,387 GiB above may not be for the exclusive use of the CPG you are looking at. Also be aware that some CPGs will consume space much faster than others; for example a RAID 1 CPG will use up space much quicker than a RAID 5 CPG.

 

Hopefully knowing these methods will allow you to manage the capacity of your 3PAR system more easily.

To stay in touch with more 3PAR news and tips connect with me on LinkedIn and Twitter.