Tuesday, February 12, 2008
Disk Alignment
This is one of the most crucial topics we can talk about so far regarding performance. A misaligned LUN can cost an application up to 30% in performance. The reason is the “Signature,” or metadata, that a host writes to the beginning of a LUN/Disk. To understand this, we must first look at how the Clariion formats the LUNs.
In an earlier blog, we described how the Clariion formats the disks. The Clariion formats the disks in chunks of 128 blocks per disk, which is equivalent to 64 KB of data written to a disk from Cache. The problem arises when an Operating System like Windows grabs the LUN and wants to initialize the disk, or write a disk signature. The size of this disk signature is 63 blocks, or 31 ½ KB of disk space. Because the Clariion formats the disks in 128 blocks, or 64 KB of disk space, that leaves 65 blocks, or 32 ½ KB of disk space, remaining on the first disk for the host to write data. The host writes to Cache in whatever block size it uses, and Cache then holds the data and writes it out to disk in 64 KB Data Chunks. Because of the “Signature”, each 64 KB Data Chunk now has to go across two physical disks on the Clariion. Usually, we say that hitting more disks is better for performance. However, with this DISK CROSS, performance on the LUN will go down because Cache is now waiting for an acknowledgement from two disks instead of one. If one disk is overloaded with I/O, is failing, etc., the acknowledgement back to the Storage Processor is delayed. This will be the case for every chunk of data Cache writes out to this LUN, and it impacts not only the LUN Cache is writing to, but potentially every LUN on the Raid Group.
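To put numbers on that disk cross (assuming the standard 512-byte block size):

128 blocks x 512 bytes = 65,536 bytes = 64 KB (one full Clariion chunk)
63 blocks x 512 bytes = 32,256 bytes = 31 ½ KB (the Windows signature)
128 - 63 = 65 blocks = 32 ½ KB left on the first disk

So every 64 KB chunk from Cache lands as 32 ½ KB on the first disk and 31 ½ KB on the next disk, which is a cross on every single write.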
By using an offset on a LUN from a host-based utility, i.e. Diskpart or Diskpar for Windows, we are allowing the Clariion to write a 64 KB Data Chunk to one physical disk at a time. Essentially, what we are doing is giving up the remaining disk space on the first physical disk in the Raid Group, as the illustration shows above. Windows still writes its “Signature” to the first 63 blocks, but we use Diskpart or Diskpar to offset the start of the host's data area past the remaining space on the first disk. When Cache writes out to disk now, it will begin writing at the first block on the second disk in the Raid Group, thereby letting the full 128-block/64 KB chunk that the Clariion writes out land on one physical disk.
The problem with all of this is that the offset or alignment needs to be set on a Windows Disk/LUN before any data is written to the LUN. Once there is data on the LUN, this cannot be done without destroying the existing LUN/data. The only way to fix the problem at that point is to create a new LUN on the Clariion, assign it to the host, set the offset/alignment, and do a host-based copy/migration. Remember, a Clariion LUN Migration is a block-for-block LUN copy/move; all a LUN Migration does is move the problem to a new location on the Clariion.
Windows has two utilities that can be run from the Command prompt to set the offset/alignment: Diskpar and Diskpart.
Diskpar is used for systems running Windows 2000, or Windows 2003 without at least Service Pack 1. Diskpar can be downloaded as part of the Resource Kit, and through its command-line interface the offset should be set to 128. Diskpar sets the offset in blocks. Since the Clariion formats the disks in 128-block chunks, the Clariion will now begin writing to the LUN at block number 128, which is the first block on the second disk.
Diskpart is for systems running Windows 2003 Service Pack 1 and up. Diskpart sets the alignment in kilobytes. Since the Clariion formats the disks in 64 KB chunks, writes to the LUN will now be aligned on 64 KB boundaries, starting with the first full 64 KB chunk, which begins on the second physical disk in the Raid Group.
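Here is a minimal sketch of the Diskpart steps on Windows 2003 SP1 and up. The disk number 4 and drive letter E are only examples; use whatever the new, still-empty LUN shows up as on your host:

diskpart
DISKPART> list disk
DISKPART> select disk 4
DISKPART> create partition primary align=64
DISKPART> assign letter=E
DISKPART> exit

With the older Diskpar utility the idea is the same, except that the starting offset is given in blocks, so the number to enter is 128.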
This is also an issue with Linux servers, where an offset will need to be set as well. Here again, the number to use is 128, because fdisk works in blocks, not kilobytes.
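One way to set the offset on Linux is with fdisk's expert mode, which lets you move the beginning of the data area to block 128. The device name /dev/sdb is only an example, and a new, empty LUN is assumed:

fdisk /dev/sdb
n        (create a new primary partition, accepting the defaults)
x        (switch to expert mode)
b        (move the beginning of data in a partition)
1        (partition number)
128      (new beginning of data, block 128)
w        (write the partition table and exit)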
The following blog entry will list the steps for setting the offset for Windows 2003, as well as Linux servers.
LUN Migration
LUN Migration has been available in Navisphere as of Flare Code Release 16. A LUN Migration is a move of a LUN within a Clariion from one location to another. It is a two-step process: first a block-by-block copy of the “Source LUN” to its new location, the “Destination LUN”; then, after the copy is complete, the Source LUN moves to its new place in the Clariion.
The Process of the Migration.
Again, this type of LUN Migration is an internal move of a LUN, not like a SANCopy where a Data Migration occurs between a Clariion and another storage device. In the illustration above, we are showing that we are moving Exchange off of the Vault drives onto Raid Group 10 on another Enclosure in the Clariion. We will first discuss the process of the Migration, and then the Rules of the Migration.
1. Create a Destination LUN. This is going to be the Source LUN’s new location on the Clariion’s disks. The Destination LUN can be on a different Raid Group, on a different bus, or on a different enclosure. The reason for a LUN Migration might be that we want to offload a LUN from a busy Raid Group for performance reasons, or that we want to move a LUN from Fibre drives to ATA drives. We will discuss this in the RULES portion.
2. Start the Migration from the Source LUN. From the LUN in Navisphere, we simply right-click and select Migrate. Navisphere gives us a window that displays the current information about the Source LUN, and a selection window for the Destination LUN. Once we select the Destination LUN and click Apply, the migration begins (a Navisphere CLI sketch of this step follows step 3). The migration is actually a two-step process: a copy first, then a move. Once the migration begins, it is a block-for-block copy from the Source LUN (original location) to the Destination LUN (new location). This is important to know because the Source LUN does not have to be offline while the process is running. The host will continue to read and write to the Source LUN, which writes to Cache, and Cache then writes out to disk. Because it is a copy, any new write to the Source LUN is also written to the Destination LUN. At any time during this process you may cancel the Migration, if the wrong LUN was selected or to wait until a later time. A priority level is also available to speed up or slow down the process.
3. Migration Completes. When the migration completes, the Source LUN will then MOVE to its new location in the Clariion. Again, there is nothing that needs to be done from the host, as it is still the same LUN it was to begin with, just in a new space on the Clariion. The host doesn’t even know that the LUN is on a Clariion; it thinks the LUN is a local disk. The LUN ID that you gave the Destination LUN when creating it will disappear. To the Clariion, that LUN never existed. The Source LUN will occupy the space of the Destination LUN, taking with it the same LUN ID, SP ownership, and host connectivity. The only things that may or may not change, based on your selection of the Destination, are the Raid type, Raid Group, size of the LUN, or drive type. The original space that the Source LUN once occupied will show as FREE space in Navisphere on the Clariion. If you were to look at the Raid Group where the Source LUN used to live, under the Partitions tab, you will see the space the original LUN occupied as free. The Source LUN is still in the same Storage Group, assigned to the host as it was before.
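For those who prefer the command line to Navisphere Manager, the migration can also be driven with the Navisphere CLI. This is only a sketch; the SP address is a placeholder, and the LUN numbers 6 and 23 are just examples of a Source and Destination LUN:

naviseccli -h <SP_IP_address> migrate -start -source 6 -dest 23 -rate low
naviseccli -h <SP_IP_address> migrate -list
naviseccli -h <SP_IP_address> migrate -cancel -source 6

The -list option shows the state and percent complete of running migrations, and -cancel stops a migration if the wrong LUN was selected.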
Migration Rules
The rules of a Migration as illustrated above are as follows.
The Destination LUN can be:
1. Equal in size or larger. You can migrate a LUN to a LUN that is the exact same block-count size, or to a LUN that is larger, so long as the host has the ability to see the additional space once the migration has completed. Windows would need a rescan or reboot to see the additional space, and then the volume can be extended on the host using Diskpart (see the sketch after these rules). A host that doesn’t have the ability to extend a volume would need volume manager software to grow the filesystem.
2. The same or a different drive type. A Destination LUN can be on the same type of drives as the Source, or on a different type. For instance, you can migrate a LUN from Fibre drives to ATA drives when the Source LUN no longer needs the faster drives. This is a LUN-to-LUN copy/move, so disk types will not stop a migration from happening, although they may slow the process down.
3. The same or a different raid type. Again, because it is a LUN to LUN copy, raid types don’t matter. You can move a LUN from Raid 1_0 to Raid 5 and reclaim some of the space on the Raid 1_0 disks. Or find that Raid 1_0 better suits your needs for performance and redundancy than Raid 5.
4. A regular LUN or a MetaLUN. The Destination LUN only has to be equal in size, so whether it is a regular LUN on a 5-disk Raid 5 group or a Striped MetaLUN spread across multiple enclosures, buses, and raid groups for performance is completely up to you.
However, the Destination LUN cannot be:
1. Smaller in size. There is no way on a Clariion to shrink a LUN to allow a user to reclaim space that is not being used.
2. A SnapView, MirrorView, or SanCopy LUN. Because these LUNs are being used by the Clariion to replicate data for local recoveries, replicate data to another Clariion for Disaster Recovery, or to move the data to/from another storage device, they are not available as a Destination LUN.
3. In a Storage Group. If a LUN is in a Storage Group, it is believed to belong to a Host. Therefore, the Clariion will not let you write over a LUN that potentially belongs to another host.
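As mentioned in rule 1 above, if the Destination LUN is larger, a Windows host can claim the extra space with Diskpart once the migration is done. A minimal sketch, with the drive letter E again only an example:

diskpart
DISKPART> rescan
DISKPART> list volume
DISKPART> select volume E
DISKPART> extend
DISKPART> exit

The extend command grows the selected volume into the unallocated space that now follows it; on a basic disk this only works for data volumes, not the system or boot volume.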
MetaLUNs
The purpose of a MetaLUN is to let the Clariion grow the size of a LUN on the ‘fly’. Let’s say that a host is running out of space on a LUN. From Navisphere, we can “Expand” the LUN by adding more LUNs to the LUN the host has access to. To the host, we are not adding more LUNs; all the host is going to see is that the LUN has grown in size. We will explain later how to make the space available to the host.
There are two types of MetaLUNs, Concatenated and Striped. Each has its advantages and disadvantages, but the end result, whichever you use, is that you are growing, or “expanding,” a LUN.
A Concatenated MetaLUN is advantageous because it allows a LUN to be “grown” quickly, with the space made available to the host almost immediately. The other advantage is that the component LUNs added to the LUN assigned to the host can be of a different RAID type and of a different size.
The host writes to Cache on the Storage Processor, and the Storage Processor then flushes the data out to disk. With a Concatenated MetaLUN, the Clariion only writes to one LUN at a time. The Clariion is going to write to LUN 6 first. Once the Clariion fills LUN 6 with data, it begins writing to the next LUN in the MetaLUN, which is LUN 23. The Clariion will continue writing to LUN 23 until it is full, then write to LUN 73. Because of this writing process, there is no performance gain; the Clariion is still only writing to one LUN at a time.
A Striped MetaLUN is advantageous because, if set up properly, it can enhance performance as well as protection. Let’s look first at how the MetaLUN is set up and written to, and how performance can be gained. With a Striped MetaLUN, the Clariion writes to all of the LUNs that make up the MetaLUN, not just one at a time. The advantage of this is more spindles/disks. The Clariion will stripe the data across all of the LUNs in the MetaLUN, and if the LUNs are on different Raid Groups, on different buses, this allows the application to be striped across fifteen (15) disks and, in the example above, three back-end buses of the Clariion. The workload of the application is spread out across the back-end of the Clariion, thereby possibly increasing speed. As illustrated above, the first data stripe (Data Stripe 1) that the Clariion writes out to disk goes across the five disks in Raid Group 5 where LUN 6 lives. The next stripe of data (Data Stripe 2) is striped across the five disks that make up Raid Group 10 where LUN 23 lives. And finally, the third stripe of data (Data Stripe 3) is striped across the five disks that make up Raid Group 20 where LUN 73 lives. The Clariion then starts the process all over again with LUN 6, then LUN 23, then LUN 73. This gives the application 15 disks and three buses to be spread across.
As for data protection, this would be similar to building a 15-disk raid group. The problem with a 15-disk raid group is that if one disk were to fail, it would take a considerable amount of time to rebuild the failed disk from the other 14 disks. Also, if two disks were to fail in this raid group and it was RAID 5, data would be lost. In the drawing above, each of the LUNs is on a different RAID Group. That means we could lose a disk in RAID Group 5, RAID Group 10, and RAID Group 20 at the same time, and still have access to the data. The other advantage of this configuration is that the rebuilds occur within each individual RAID Group. Rebuilding from four disks is going to be much faster than rebuilding from the 14 remaining disks in a fifteen-disk RAID Group.
The disadvantage of using a Striped MetaLUN is that it takes time to create. When a component LUN is added to the MetaLUN, the Clariion must restripe the data across the existing LUN(s) and the new LUN. This takes time and Clariion resources, and there may be a performance impact while a Striped MetaLUN is re-striping the data. Also, the space is not available to the host until the MetaLUN has finished re-striping.
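If you would rather script the expansion than use the Expand dialog in Navisphere Manager, the Navisphere CLI has a metalun -expand command. The switches below are a sketch from memory only (base LUN 6 expanded with component LUNs 23 and 73 as a striped MetaLUN); verify them against the CLI reference for your FLARE release before using them:

naviseccli -h <SP_IP_address> metalun -expand -base 6 -lus 23 73 -type S

Here -type S would request a Striped MetaLUN and -type C a Concatenated one; again, treat the exact switch names as an assumption.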
Access Logix
Access Logix, often referred to as ‘LUN Masking’, is the Clariion term for:
1. Assigning LUNs to a particular Host
2. Making sure that hosts cannot see every LUN in the Clariion
Let’s talk about making sure that every host cannot see every LUN in the Clariion first.
Access Logix is an enabler on the Clariion that allows hosts to connect to the Clariion, but not to just go out and take ownership of every LUN. Think of this situation: you have ten Windows hosts attached to the Clariion, five Solaris hosts, eight HP hosts, etc. If all of the hosts were attached to the Clariion (zoning) and there were no such thing as Access Logix, every host could potentially see every LUN after a rescan (or 17 reboots, if it’s Windows). It is probably not a good thing to have more than one host writing to a LUN at a time, let alone different operating systems writing to the same LUNs.
Now, in order for a host to see a LUN, a few things must be done first in Navisphere.
1. For a Host, a Storage Group must be created. In the illustration above, the ‘Storage Group’ is like a bucket.
2. We have to connect the host to the Storage Group.
3. Finally, we have to add the LUNs we want the host to see to the host’s Storage Group.
From the illustration above, let’s start with the Windows Host on the far left side. We created a Storage Group for the Windows Host. You can name the Storage Group whatever you want in Navisphere. It would make sense to name the Storage Group the same as the Host name. Second, we connected the host to the Storage Group. Finally, we added LUNs to the Storage Group. Now, the host has the ability to see the LUNs, after a rescan, or a reboot.
However, when the LUNs are added to the Storage Group, there is a column on the bottom right side of the Storage Group window labeled Host ID. You will notice that as the LUNs are placed into the Storage Group, Navisphere gives each LUN a Host ID number. The Host ID number starts at 0 and continues to 255, so we can place up to 256 LUNs into a Storage Group. The reason for this is that the host has no idea the LUN is on a Clariion; it believes the LUN is a local disk. For the host, this is fine. In Windows, the host is going to rescan and pick up the LUNs as the next available disks. In the example above, the Windows host picks up LUNs 6 and 23, but after a rescan/reboot the host sees them as Disk 4 and Disk 5, which we can now initialize, partition, format, and assign drive letters to, making the LUNs usable by the host.
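The same three steps can also be done from the Navisphere CLI. A sketch for the Windows host above, where the storage group name and SP address are placeholders, -alu is the Clariion LUN number, and -hlu is the Host ID the host will see:

naviseccli -h <SP_IP_address> storagegroup -create -gname WindowsHost
naviseccli -h <SP_IP_address> storagegroup -connecthost -host WindowsHost -gname WindowsHost -o
naviseccli -h <SP_IP_address> storagegroup -addhlu -gname WindowsHost -hlu 0 -alu 6
naviseccli -h <SP_IP_address> storagegroup -addhlu -gname WindowsHost -hlu 1 -alu 23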
In the case of the Solaris host’s Storage Group, when we added the LUNs to the Storage Group, we changed LUN 9’s Host ID to 9, and LUN 15’s Host ID to 15. This allows the Solaris host to see the Clariion LUN 9 as c_t_d 9, and LUN 15 as c_t_d 15. If we hadn’t changed the Host ID numbers for the LUNs, Navisphere would have assigned LUN 9 the Host ID of 0, and LUN 15 the Host ID of 1. The host would then see LUN 9 as c_t_d 0 and LUN 15 as c_t_d 1.
The last drawing is an example of a clustered environment. The blue server is the Active Node of the cluster, and the orange server is the Standby/Passive Node. In this example, we created a Storage Group in Navisphere for each host in the cluster. Into the Active Node Storage Group we placed LUN 8. LUN 8 also went into the Passive Node Storage Group; a LUN can belong to multiple Storage Groups. The reason for this is that if we only placed LUN 8 into the Active Node Storage Group, and not into the Passive Node Storage Group, and the cluster failed over to the Passive Node for some reason, there would be no LUN to see. A host can only see what is in its Storage Group. That is why LUN 8 is in both Storage Groups.
Now, if this is not a clustered environment, this brings up another problem. The Clariion does not limit who has access or read/write privileges to a LUN. When a LUN is assigned to a Storage Group, the LUN belongs to that host. If we assign a LUN out to two hosts with no cluster setup, we are giving two different servers simultaneous access to the LUN. This means that each server would assume ownership of the LUN and constantly overwrite the other’s data.
We also added LUN 73 to the Active Node Storage Group, and LUN 74 to the Passive Node Storage Group. This allows each server to see LUN 8 for failover purposes, with LUN 73 belonging only to the Active Node host, and LUN 74 belonging only to the Passive Node host. If the cluster fails over to the Passive Node, the Passive Node will see LUN 8 and LUN 74, but not LUN 73, because LUN 73 is not in its Storage Group.
Notice that LUN 28 is in the Clariion, but not assigned to anyone at the time. No host has the ability to access LUN 28.