Wednesday, November 28, 2007

Cache Page Size




Here we are discussing the Cache Page Size, which is the same thing as the Cache Block Size. Each “Page”, or block, in Cache is a fixed size, and on the Clariion every Page in Cache is that same fixed size. This is therefore one of the areas in Cache where knowing your environment (applications, etc.) can make a difference. In the diagram above, we illustrate the use of Cache with three different applications: Oracle, SQL, and Exchange. We use these three because they seem to be the most common applications people come to class with.


Next to each application is a default Block Size. Again, we are only using these as examples. You will want to verify which applications are running on the Clariion and what their Block Sizes are.


There are four different Page Size settings in Cache for the Clariion: 2 KB, 4 KB, 8 KB, and 16 KB. Let’s start with the default Clariion Page Size of 8 KB. Again, every “Page” in Cache will be 8 KB in size. If we have an application like Oracle running on this Clariion, and Oracle is using a default Block Size of 16 KB, every Oracle block of data sent to the Clariion is broken into two separate Pages in Cache. SQL, writing 8 KB blocks to this 8 KB Page Size, is a one-to-one ratio. Exchange, writing 4 KB blocks, is also one block per Page. However, every Exchange block leaves 4 KB of its Page unused, which could fill up Cache more rapidly with this “wasted space.”


The next Page Size down is a 4 KB Page Size for Cache. The nice thing about this size is that there is no wasted space in Cache. Exchange is still at a 1:1 ratio of blocks to Pages. However, each SQL block now has to split into two separate Cache Pages, and each Oracle block splits into four. The downside is that we now have to listen to the Oracle and SQL admins complain about performance.


So, we set the Page Size to 16 KB to appease the Oracle and SQL admins. Here comes the problem again of wasted space in cache, which, depending on your Clariion, you don’t have a lot of. With the 16 KB Page Size, all of the applications write to one Cache Page. The applications are happy because of this, but we are back to the wasted space. For every Exchange block written to the Clariion, there is a waste of 12 KB Cache space. For every SQL Block, there is a waste of 8 KB Cache Space.


If you are only using one of these applications on the Clariion, great: match the Cache Page Size to that application. If that is not the case, you, as the Storage Administrator, will have to decide the Winners and Losers. Next to each of the different page sizes, we have listed the Winners and the Losers.


In the 8 KB Page Size, SQL and Exchange are winners because, from the application point of view, they are a 1:1 ratio. Oracle is a loser because each block is split across two separate Pages in Cache. Another loser in this setting is the Clariion Cache, because of the space wasted by Exchange.

In the 4 KB Page Size, Exchange and Cache are winners because Exchange is again a ratio of 1:1 and there is no wasted space in Cache. Oracle and SQL are losers because each of their blocks is split across multiple Pages in Cache.

With the 16 KB Page Size, the applications all win. Oracle, SQL and Exchange are all a 1:1 ratio. The big loser in this setting is Cache. Cache is a loser with all of the wasted space.

This, again, is one of the places to look at for performance of Cache in a Clariion. Knowing your environment plays a big part in how things are written to Cache.
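
The winners-and-losers trade-off above is simple arithmetic, and a small sketch can reproduce it. This is only an illustration: the block sizes are the example values used in this post (Oracle 16 KB, SQL 8 KB, Exchange 4 KB), and the function names are made up for this sketch. Substitute the block sizes of your own applications.

```python
import math

# Example block sizes from this post; verify the real values in your environment.
APP_BLOCK_KB = {"Oracle": 16, "SQL": 8, "Exchange": 4}
PAGE_SIZES_KB = (2, 4, 8, 16)  # the four Clariion Cache Page Size settings


def pages_and_waste(block_kb, page_kb):
    """Return (cache pages used, KB wasted) for one application block."""
    pages = math.ceil(block_kb / page_kb)
    waste_kb = pages * page_kb - block_kb
    return pages, waste_kb


def summarize(page_kb):
    """Map each example application to its (pages, waste) at this page size."""
    return {
        app: pages_and_waste(block_kb, page_kb)
        for app, block_kb in APP_BLOCK_KB.items()
    }


if __name__ == "__main__":
    for page_kb in PAGE_SIZES_KB:
        for app, (pages, waste_kb) in summarize(page_kb).items():
            print(f"{page_kb:>2} KB page | {app:<8} | "
                  f"{pages} page(s), {waste_kb} KB wasted")
```

A block that needs more than one page is a "loser" from the application's point of view, and any non-zero waste makes Cache the loser.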

Cache Allocation









In the illustration above, we are seeing again that if data is written to one Storage Processor, it is MIRRORed to the other Storage Processor.


Data that a host writes to SP A is mirrored to SP B, and vice versa. So, you will be losing some Cache space to this mirroring. In this example, we are setting SP A’s Write Cache to 1 ½ GB, which means that over on SP B, 1 ½ GB of Cache space will be taken for the mirroring of SP A’s Write Cache. The same scenario is set for SP B. The same values are transferred across SPs for Write Cache.


SP Usage


SP Usage is pre-allocated Cache space that is used by the Clariion for things like pointers/deltas, SnapView, and MirrorView. The amount of space that is lost per Storage Processor for SP Usage depends on a couple of things. First is the type of Clariion you have. Second is the Flare Code you are running on the Clariion. We’ll talk later about where to find the Flare Code your Clariion is running.


In this example, we are using 750 MB per Storage Processor as the value for SP Usage. To give you some real numbers:


Type of Clariion    Flare Code    SP Usage
CX3-80              26            1464 MB
CX3-80              24            1464 MB
CX700               26            884 MB
CX700               24            832 MB


After Write Cache is allocated and SP Usage is taken into account, this leaves us with 250 MB of Cache for Reads.
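
As a rough sketch of that arithmetic: the 1500 MB Write Cache and 750 MB SP Usage are the example values above, while the 4000 MB of total Cache per Storage Processor is an assumption inferred from those numbers (the figure itself came from the illustration), and the function name is made up for this sketch.

```python
def read_cache_mb(total_mb, write_mb, sp_usage_mb):
    """Cache left over for reads on one Storage Processor.

    Write Cache is counted twice: once for this SP's own Write Cache and
    once for the mirror of the peer SP's Write Cache (the same value).
    """
    mirror_mb = write_mb
    remaining = total_mb - write_mb - mirror_mb - sp_usage_mb
    if remaining < 0:
        raise ValueError("Write Cache plus SP Usage exceeds total Cache")
    return remaining


if __name__ == "__main__":
    # Assumed 4000 MB per SP; 1500 MB Write Cache; 750 MB SP Usage.
    print(read_cache_mb(4000, 1500, 750))  # 250
    # Taking 500 MB away from Write Cache frees 1000 MB, because the
    # mirror on this SP shrinks by the same amount.
    print(read_cache_mb(4000, 1000, 750))  # 1250
```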



The nice thing about the Clariion though is that it allows you to change those cache values. Let’s say for instance, that this initial setup above works for you in the mornings when people are writing to a database, but later in the day, the database has more reads. You can take from Write Cache and give the rest to Read Cache. The other nice thing about it is that it can be scripted from the Command Line Interface. Below the chart are the three commands that you can use to change cache.


Command One


Before we can change the values of Cache, we must first disable Cache. This command disables Write Cache and the Read Cache of both SP A and SP B. Not only does this disable Cache, it also forces a flush of Cache to disk. This means that the command prompt will not return immediately; there will be a delay until Cache is flushed. As I always say, I cannot give you an amount of time that this will take (two weeks). The answer is going to be: “it depends, you’ll have to test it.”



Command Two


This is the command that actually sets the Cache values. By default, Cache is allocated in megabytes. By setting Write Cache to 2048 MB (2 GB), we are telling the Clariion to take that number and split it: half for SP A Write Cache, and half for SP B Write Cache. We don’t calculate the mirroring of Write Cache into this number, just the actual usable space. Next, we specify a Read Cache Size of 1250 MB (1.25 GB) for SP A and a Read Cache Size of 1250 MB (1.25 GB) for SP B. Read Caching is not mirrored, so we must specify the Read Cache of both SPs. Notice how, by simply taking ½ GB away from SP A’s and SP B’s Write Cache, we can allocate 1 GB more of Cache space to each SP for Reads, because the mirror of the other SP’s Write Cache shrinks as well.



Command Three


Finally, we have to re-enable Cache. The ones (1) next to -wc, -rca, and -rcb stand for Enabling.
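
Since the point of these three commands is that they can be scripted, here is a minimal sketch of how the disable / set / re-enable sequence might be assembled. Only the -wc, -rca, and -rcb switches and the meaning of 1 come from this post. The `setcache` subcommand name and the use of 0 for disabling are assumptions, the helper names are made up, and the resize arguments are left for the caller to supply because the size switches are not named in the text. The sketch only builds argument lists; it does not run anything against an array.

```python
def cache_toggle_args(enable):
    """Arguments to enable (1) or disable (0) Write Cache and both Read Caches.

    'setcache' and the value 0 are assumptions; check your CLI reference.
    """
    flag = "1" if enable else "0"
    return ["setcache", "-wc", flag, "-rca", flag, "-rcb", flag]


def change_cache_plan(resize_args):
    """Return the three steps in order: disable, resize, re-enable.

    resize_args is whatever your CLI needs to set the new sizes; it is
    passed through untouched because this post does not name those switches.
    """
    return [
        cache_toggle_args(False),  # Command One: disable (forces a flush)
        list(resize_args),         # Command Two: set the new Cache sizes
        cache_toggle_args(True),   # Command Three: re-enable
    ]


if __name__ == "__main__":
    for step in change_cache_plan(["<your resize command here>"]):
        print(" ".join(step))
```

Keeping the plan as data makes it easy to review before handing each step to whatever actually executes commands in your environment. Remember that the disable step will block until Cache is flushed.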



Changing the values of Cache could be done at any time, all day long if you want to, though I wouldn’t recommend it. But it could prove to be extremely beneficial to the performance of the Clariion. Write acknowledgements and reads served from Cache complete at memory speed, as opposed to the milliseconds it takes to go to disk.


Another reason to change Cache could be when backups are going to occur. Since you will be reading data from Clariion LUNs, you could allocate as much Cache to Reads as possible so that the backup host could be retrieving data from Cache rather than disk. When the backups are complete, you could have your script set the Cache values back to production levels.



Tuesday, November 13, 2007

Caching






From the chart above, the amount of Cache that a Clariion contains is based on the model.

Read Caching
First, we will describe the process of when a host issues a request for data from the Clariion.

1. The host issues the request for data to the Storage Processor that owns the LUN.
2. If that data is sitting in Cache on the Storage Processor, the SP sends the data back to the host.

If, however, the data is not in Cache, the Storage Processor must first go to disk to retrieve the data (Step 1 ½). It reads the data from the LUN into the Read Cache of the owning Storage Processor (Step 1 ¾) before it sends the data to the host.
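
The read path just described can be sketched as a toy model. This is not how the Clariion is implemented; the class and attribute names are made up, a dictionary stands in for the LUN on disk, and there is no eviction.

```python
class ReadCachingSP:
    """Toy Storage Processor that serves reads from cache when it can."""

    def __init__(self, lun_blocks):
        self.lun = dict(lun_blocks)  # stands in for the LUN on disk
        self.read_cache = {}
        self.disk_reads = 0          # counts trips to disk

    def read(self, address):
        # Step 1: the host's request arrives at the owning SP.
        if address in self.read_cache:
            # Step 2: cache hit, send the data straight back.
            return self.read_cache[address]
        # Step 1 1/2: cache miss, go to disk.
        self.disk_reads += 1
        data = self.lun[address]
        # Step 1 3/4: place the data in Read Cache before replying.
        self.read_cache[address] = data
        return data


if __name__ == "__main__":
    sp = ReadCachingSP({0: "block zero"})
    sp.read(0)  # miss: goes to disk
    sp.read(0)  # hit: served from Read Cache
    print(sp.disk_reads)  # 1
```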

Write Caching

1. The host writes a block of data to the LUN’s owning Storage Processor.
2. The Storage Processor MIRRORs that data to the other Storage Processor.
3. The owning Storage Processor then sends the Acknowledgement back to the host that the data is “on disk.”
4. At a later time, the data will be “flushed” from Cache on the SP out to the LUN.

Why does Write Cache MIRROR the data to the other Storage Processor before it sends the acknowledgement back to the host?

This is done to ensure that both Storage Processors have the data in Cache in the event of an SP failure. Let’s say that the owning Storage Processor crashed (again, never happens). If that data was not written to the other Storage Processor’s Cache, that data would be lost. But, because it was written to the other SP Cache, that Storage Processor can now write that data out to the LUN.

This MIRRORing of Write Cache is done through the CMI (Clariion Messaging Interface) Channel which lives on the Clariion.
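
To make the reason for mirroring concrete, here is a toy model of the four write steps and of what the surviving SP can do after a failure. All names are made up for this sketch, a shared dictionary stands in for the LUN, and the CMI channel is reduced to a direct method call.

```python
class MirroredSP:
    """Toy Storage Processor with a mirrored write cache."""

    def __init__(self, name, lun):
        self.name = name
        self.lun = lun          # shared dict standing in for the disks
        self.write_cache = {}   # this SP's own unflushed writes
        self.peer_mirror = {}   # copy of the peer SP's unflushed writes
        self.peer = None

    def write(self, address, data):
        self.write_cache[address] = data       # step 1: host writes to owner
        self.peer.peer_mirror[address] = data  # step 2: mirror to the peer
        return "ack"                           # step 3: acknowledge the host

    def flush(self):
        """Step 4: write cached data to the LUN and drop the peer's copy."""
        self.lun.update(self.write_cache)
        for address in self.write_cache:
            self.peer.peer_mirror.pop(address, None)
        self.write_cache.clear()

    def take_over_for_failed_peer(self):
        """Write out the failed peer's mirrored data so nothing is lost."""
        self.lun.update(self.peer_mirror)
        self.peer_mirror.clear()


def make_pair(lun):
    """Create SP A and SP B sharing one LUN and wire them as peers."""
    sp_a, sp_b = MirroredSP("SP A", lun), MirroredSP("SP B", lun)
    sp_a.peer, sp_b.peer = sp_b, sp_a
    return sp_a, sp_b


if __name__ == "__main__":
    lun = {}
    sp_a, sp_b = make_pair(lun)
    sp_a.write(7, "payload")          # acknowledged, but not on disk yet
    sp_b.take_over_for_failed_peer()  # SP A "crashes"; SP B saves the data
    print(lun)
```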

Zoning







On this page, we are going to discuss how a Host might be zoned through switches to a Clariion. This host has two (2) Host Bus Adapters. From the previous page, we know that the host must have at least one connection to SP A and one connection to SP B. What we are illustrating here is Configuration Three from the “Host to Clariion Configuration” page. We are also going to look at what is meant by “Single Initiator Zoning”. Single Initiator Zoning means that you create a zone with only one HBA entry. We don’t want to have a zone that contains HBAs from two (2) Hosts.


HBA1 is connected to Port 0 on the switch. SP A port 0 is connected to the same switch at Port 14. Based on the World Wide Names of HBA1 and SP A port 0, we can now create a zone through the switch software. The zone could look as follows:


Zone HBA1 to SP A port 0
10:00:00:00:07:36:55:86
50:06:01:60:10:60:08:74


We also want to connect HBA1 to SP B. We connect SP B port 0 to Port 15 on the same switch. That zone could look as follows:


Zone HBA1 to SP B port 0
10:00:00:00:07:36:55:86
50:06:01:68:10:60:08:74


HBA1 is now zoned and connected to both Storage Processors on the Clariion.

We would repeat the same steps for HBA2 and the switch that it is connected to. HBA2 is connected to Port 0 on that switch. SP A port 1 is connected to the same switch at Port 14. Based on the World Wide Names of HBA2 and SP A port 1, we can now create a zone through the switch software. The zone could look as follows:


Zone HBA2 to SP A port 1
10:00:00:00:66:87:35:20
50:06:01:61:10:60:08:74


We also want to connect HBA2 to SP B. We connect SP B port 1 to Port 15 on the same switch. That zone could look as follows:


Zone HBA2 to SP B port 1
10:00:00:00:66:87:35:20
50:06:01:69:10:60:08:74


Another way in which the zoning could have been done is:


Zone HBA1 to SP A port 0 and SP B port 0
10:00:00:00:07:36:55:86
50:06:01:60:10:60:08:74
50:06:01:68:10:60:08:74


Again, there is only one HBA in that zone. The preferred method is simply up to you and how you want to manage the switches. The advantage of doing it this way is that it cuts the number of zones on the switch in half, but could be a little confusing (which could be nice for job security).
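
Both zoning styles can be generated from the same inputs, which is a handy way to check that every zone really contains only one HBA. This is just a sketch: the function names are made up, the WWNs are the example values from this page, and it produces plain member lists rather than any particular switch vendor's syntax.

```python
def single_initiator_zones(hba_wwn, sp_port_wwns, one_zone_per_target=True):
    """Build zone member lists for one HBA.

    one_zone_per_target=True  -> one zone per SP port (the first method)
    one_zone_per_target=False -> one zone holding the HBA and all SP ports
    """
    if one_zone_per_target:
        return [[hba_wwn, target] for target in sp_port_wwns]
    return [[hba_wwn, *sp_port_wwns]]


def is_single_initiator(zone, all_hba_wwns):
    """True if exactly one member of the zone is an HBA."""
    return sum(1 for member in zone if member in all_hba_wwns) == 1


if __name__ == "__main__":
    hba1 = "10:00:00:00:07:36:55:86"
    hba2 = "10:00:00:00:66:87:35:20"
    spa0 = "50:06:01:60:10:60:08:74"
    spb0 = "50:06:01:68:10:60:08:74"

    for zone in single_initiator_zones(hba1, [spa0, spb0]):
        print(zone, is_single_initiator(zone, {hba1, hba2}))
    for zone in single_initiator_zones(hba1, [spa0, spb0], False):
        print(zone, is_single_initiator(zone, {hba1, hba2}))
```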

Now, what do we do if there is an HBA failure? First of all, that never happens. (Kidding.) This is where we go to the four (4) steps listed under HBA Failure: the three R’s and a D. Let’s say that HBA1 were to fail. The first thing we would do is Replace the failed HBA. Next, because we did our zoning on the switch based on the World Wide Names of the HBAs, we would have to Rezone the switch for the new HBA, because it would have a new World Wide Name. The third step is to go to Navisphere and, using Connectivity Status, Register the new HBA with the Clariion. And finally, the Clariion does not automatically clean itself up. You would have to go back into Connectivity Status and Deregister the failed HBA.

Storage Processor Ports WWNs






Each Storage Processor Port will have a unique World Wide Name associated with it. What we are doing on this page is to “break down” what makes up the SP Port WWN. What I am showing here are the three (3) pieces that make up the WWN: what I am calling the ‘EMC Flag’, the SP Port Identifier, and the Array ID. All SP Port WWNs on Clariions start with the same ‘EMC Flag’ of 50:06:01. When you are looking at the switch software that shows the ports on the switch and what is plugged into those ports, any time you see a World Wide Name that starts with 50:06:01, you will know that a Clariion SP Port is connected there.


The next “piece” of the World Wide Name is the SP Port Identifier. On all Clariions, these numbers are the same as well. For instance, if you have 3 Clariions in your environment, the World Wide Name of SP A Port 0 on every one of those Clariions would start off 50:06:01:60. And every Clariion’s SP B Port 1 would start off 50:06:01:69. These SP Port Identifiers will not change from Clariion to Clariion.


The last “piece” to the puzzle is the Array ID. This is related to the Unique ID of the Clariion itself. Every Clariion has a unique World Wide Name associated with it. But, that Array ID belongs to every port on that Clariion as it shows above. Now, if you have two(2) Clariions in your environment, you will see two(2) sets of Array IDs. Let’s say you have a Production Clariion and a Development Clariion (I know, no one has that), the Production Clariion could have an Array ID of 10:60:08:74, and the Development Clariion could have an Array ID of 10:60:06:23. So, the Production Clariion’s SP A Port 0 would be 50:06:01:60:10:60:08:74, and the Development Clariion’s SP A Port 0 would be 50:06:01:60:10:60:06:23.
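
The three pieces can be pulled apart with a few lines of code. This sketch only knows the four SP Port Identifiers that appear in the examples on this blog (60, 61, 68, 69); other ports exist on larger arrays but are not covered here, and the function and constant names are made up.

```python
EMC_FLAG = "50:06:01"

# Only the identifiers used in the examples on this blog.
SP_PORT_IDS = {
    "60": ("SP A", 0),
    "61": ("SP A", 1),
    "68": ("SP B", 0),
    "69": ("SP B", 1),
}


def parse_sp_port_wwn(wwn):
    """Split a Clariion SP Port WWN into SP, port number, and Array ID."""
    octets = wwn.strip().lower().split(":")
    if len(octets) != 8:
        raise ValueError(f"expected 8 octets, got {len(octets)}: {wwn}")
    if ":".join(octets[:3]) != EMC_FLAG:
        raise ValueError(f"not a Clariion SP Port WWN: {wwn}")
    if octets[3] not in SP_PORT_IDS:
        raise ValueError(f"SP Port Identifier not in this sketch: {octets[3]}")
    sp, port = SP_PORT_IDS[octets[3]]
    return {"sp": sp, "port": port, "array_id": ":".join(octets[4:])}


if __name__ == "__main__":
    print(parse_sp_port_wwn("50:06:01:60:10:60:08:74"))  # Production
    print(parse_sp_port_wwn("50:06:01:60:10:60:06:23"))  # Development
```

Grouping the WWNs seen on a switch by `array_id` is a quick way to tell which ports belong to which Clariion.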

Wednesday, November 7, 2007

Host Connectivity Limitations






This page is going to discuss how many hosts can connect to a Clariion. The deciding factor in this is going to be the number of times you connect your host(s) to the Clariion. We are going to use the three configurations that were discussed in the previous blog. The chart above lists the number of ports each Storage Processor contains based on the model, as well as the number of Initiator Registration Records each port supports. An Initiator Registration Record (IRR) is used every time a host, via an HBA, is connected and "Registered" with the Clariion. The Clariion now recognizes that this HBA belongs to a specific host attached to the Clariion, and will now allow the host to "talk" with the Clariion. The more times you connect and register a host, the more IRRs it uses, thus taking away potential connections for other hosts.

With Configuration One, even though it only has one HBA, that HBA must be connected at least once to SP A and once to SP B. Again, this goes back to the previous blog about access to the Clariion if a LUN were to trespass. Therefore, this host is using two IRRs.

With Configuration Two, this host has one connection from each HBA to one SP Port on each Storage Processor. Even though this host has two HBAs, it is still only using two IRRs: one connection to SP A and one connection to SP B.

With Configuration Three, this host has two connections to the Clariion from each HBA. HBA1 is connected once to SP A and once to SP B. HBA2 is connected once to SP A and once to SP B. This host is using four IRRs because it is connected four times to the Clariion.

In the chart, we are trying to illustrate the maximum number of hosts that can connect to a Clariion based on the host configurations. Again, the more times you connect a host, the more IRRs you use, and the fewer hosts can be attached to a Clariion. If you are using a CX700, CX3-40 or CX3-80, you have the possibility of hooking up 256 hosts, based on each host only having one connection to SP A and one connection to SP B. However, if every host were connected four (4) times, as in Configuration Three, that number is cut in half to 128 hosts. If every host were connected to the Clariion eight (8) times, the number is cut again to 64 hosts.
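
The halving follows directly from dividing a fixed pool of IRRs by the number of connections per host. In this sketch the pool of 512 IRRs is not quoted from the chart; it is inferred from the 256-host figure at two connections per host, so treat it as an assumption and use the numbers for your own model.

```python
# Inferred from "256 hosts at 2 connections each"; check the chart for your model.
TOTAL_IRRS = 512


def max_hosts(connections_per_host, total_irrs=TOTAL_IRRS):
    """Maximum hosts if every host registers the same number of connections."""
    if connections_per_host < 2:
        raise ValueError("a host needs at least one connection to each SP")
    return total_irrs // connections_per_host


if __name__ == "__main__":
    for connections in (2, 4, 8):
        print(f"{connections} connections per host -> "
              f"{max_hosts(connections)} hosts")
```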

Host to Clariion Configurations









Here we are looking at only three possible ways in which a host can be attached to a Clariion. From talking with customers in class, these seem to be the three most common ways in which the hosts are attached.



The key points to the slide are:
1. The LUN, the disk space that is created on the Clariion, that will eventually be assigned to the host, is owned by one of the Storage Processors, not both.
2. The host needs to be physically connected via fibre, either directly attached, or through a switch.




CONFIGURATION ONE


In Configuration One, we see a host that has a single Host Bus Adapter (HBA), attached to a single switch. From the switch, the cables run once to SP A and once to SP B. The reason this host is zoned and cabled to both SPs is in the event of a LUN trespass. In Configuration One, if SP A were to go down, reboot, etc., the LUN would trespass to SP B. Because the host is cabled and zoned to SP B, the host would still have access to the LUN via SP B. The problem with this configuration is the list of Single Points of Failure. In the event that you lose the HBA, the switch, or a connection between the HBA and the switch (the fibre, the GBIC on the switch, etc.), you lose access to the Clariion, thereby losing access to your LUNs.



CONFIGURATION TWO


In Configuration Two, we have a host with two Host Bus Adapters. HBA1 is attached to a switch, and from there, the host is zoned and cabled to SP B. HBA2 is attached to a separate switch, and from there, the host is zoned and cabled to SP A. The path from HBA2 to SP A is shown as the "Active Path" because that is the path data will take from the host to get to the LUN, as the LUN is owned by SP A. The path from HBA1 to SP B is shown as the "Standby Path" because the LUN doesn't belong to SP B. The only time that the host would use the "Standby Path" is in the event of a LUN Trespass. The advantage of using Configuration Two over Configuration One is that there is no single point of failure.


Now, let's say we install PowerPath on the host. With PowerPath, the host has the potential to do two things. First, it allows the host to initiate the Trespass of the LUN. With PowerPath on the host, if there is a path failure (HBA gone bad, switch down, etc...), the host will issue the trespass command to the SPs, and the SPs will move the LUN, temporarily, from SP A to SP B. The second advantage of PowerPath on a host, is that it allows the host to 'Load Balance' data from the host. Again, this has nothing to do with load balancing the Clariion SPs. We will get there later. However, in Configuration Two, we only have one connection from the host to SP A. This is the only path the host has and will use to move data for this LUN.


CONFIGURATION THREE


In Configuration Three, hardware-wise, we have the same as Configuration Two. However, notice that we have a few more cables running from the switches to the Storage Processors. HBA1 goes into its switch and is zoned and cabled to SP A and SP B. HBA2 goes into its switch and is zoned and cabled to SP A and SP B. What this does now is give HBA1 and HBA2 each an 'Active Path' to SP A, and HBA1 and HBA2 each a 'Standby Path' to SP B. Because of this, the host can now route data down each active path to the Clariion, giving the host "Load Balancing" capabilities. Also, the only time a LUN should trespass from one SP to another is if there is a Storage Processor failure. If the host were to lose HBA1, it still has HBA2 with an active path to the Clariion. The same goes for a switch failure or a connection failure.
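
The difference between Configuration Two and Configuration Three can be checked with a small path model. This is a sketch with made-up names, not PowerPath's logic: a path is just an (HBA, SP) pair, and it is active when it lands on the SP that owns the LUN.

```python
CONFIG_TWO = [("HBA1", "SP B"), ("HBA2", "SP A")]
CONFIG_THREE = [
    ("HBA1", "SP A"), ("HBA1", "SP B"),
    ("HBA2", "SP A"), ("HBA2", "SP B"),
]


def path_states(paths, lun_owner):
    """Label each path active (reaches the owning SP) or standby."""
    return {
        path: "active" if path[1] == lun_owner else "standby"
        for path in paths
    }


def usable_active_paths(paths, lun_owner, failed_hbas=()):
    """Active paths that survive the given HBA failures."""
    return [
        path for path in paths
        if path[1] == lun_owner and path[0] not in failed_hbas
    ]


if __name__ == "__main__":
    # LUN owned by SP A, HBA2 fails.
    print(usable_active_paths(CONFIG_TWO, "SP A", {"HBA2"}))    # trespass needed
    print(usable_active_paths(CONFIG_THREE, "SP A", {"HBA2"}))  # HBA1 still active
```

An empty result means the only way to reach the LUN is to trespass it to the other SP, which is exactly what Configuration Three avoids for HBA, switch, and connection failures.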