Monday, July 16, 2012

ASM on AWS

Note to readers: like much in the tech world, docs go obsolete really quickly. This post is just such a document. As of summer 2013, we converted our hand-built Oracle server to Amazon RDS instead. RDS is a DBA-killer. Unless you have specific Oracle needs that RDS can't meet, you are better off with RDS.

And really, who wants to have an in-house DBA anyway?
+++++
We use Oracle's database server as the core of our persistence layer. It's probably the main reason I have my job; otherwise I would expect a business (and tech team) this size to operate on a different technology stack (LAMP or whatever you call SS+ASP). But Oracle's database is a kick-ass piece of infrastructure. You can do a *lot* with it, if you know how to configure and maintain it.

As a DBA, one of the many things I really like about the 11g series is how much less maintenance work is on my shoulders. I imagine in a large installation the DBAs still want to handle a bunch of the daily/regular tasks with logs and backups and such, but I really like having jobs scheduled in Enterprise Manager and occasionally checking to make sure they're happy. Along those same lines, there's a feature in the 11g family called ASM (Automatic Storage Management), a low-level database service that handles most (all?) of the storage issues that DBAs are used to touching. Some DBAs may not like ASM, but I dig it and would not want to give up the storage management.

But here's the issue: we also use Amazon AWS for our production infrastructure.  AWS is a tremendously cool collection of infrastructure services, and you can do a *lot* with it (them) if you know how to configure and maintain whatever you set up. And you can run Oracle inside AWS, either as a managed service (RDS) or on your own instance. There are machine images available with current versions of Oracle fully installed and configured, and of course you can install it yourself. This post is about a couple of the tricky bits along with the crux move in doing that very thing, with a tiny bit about motivation.

One of the things you will notice when looking for a pre-built AMI is that there are no ASM-configured images. I haven't done an install for a while, but when I was looking, there were no AMIs with grid services installed. There's at least one simple reason for this: you can't run RAC (grid) without shared-access low-level storage, and short of installing your own virtual SAN, you can't share access to Amazon's EBS volumes.

But ASM requires grid, and ASM is awesome even without RAC. So how does an enterprising DBA get an Oracle database running on ASM in the cloud?  (ahh, finally we get to the point, 5 paragraphs in).

As with a lot of the technology I dig around in, the answers themselves are simple, but the background involved in learning them is no small price to pay. In this case, there are a couple of key decisions and one major crux move you must get right in order to get a successful result. I'm not going to document the entire install process; there are plenty of blogs out there that provide step-by-step instructions (e.g., http://www.oracle-base.com/articles/11g/articles-11g.php ) and the details change based on what you're trying to accomplish. But there are a couple of key details:

1. Pick the right AMI - Recent version of OEL with ASM support
  • I prefer OEL without the database software
  • choose the most-recent version supported in the public yum repos
2. Configure it properly - ephemeral and swap
  • I like to use ephemeral for swap and scratch space (see the sketch after this list)
  • keep 100G of eph1 unallocated (unpartitioned) for a rainy day
3. Install it correctly - grid partition, oracle home, eph
  • change ohasd's inittab during 7th inning stretch (root.sh)
  • change this: $GRID_HOME/crs/install/inittab 
  • ohasd starts just fine if you've done it right
4. Assuming grid setup went ok, do a normal db install
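
For step 2, here's a rough sketch of what I mean by using ephemeral for swap and scratch. The device name (/dev/xvdb) and the partition layout are assumptions that depend on your instance type and AMI, so treat this as illustrative rather than a recipe:

# assumes the first ephemeral disk shows up as /dev/xvdb -- check fdisk -l first
fdisk /dev/xvdb                # carve out xvdb1 (swap) and xvdb2 (scratch), leave ~100G unpartitioned
mkswap /dev/xvdb1              # prep the swap partition
swapon /dev/xvdb1              # turn it on now; add it to /etc/fstab to survive a reboot
mkfs.ext3 /dev/xvdb2           # scratch filesystem
mkdir -p /scratch
mount /dev/xvdb2 /scratch      # mount it; again, /etc/fstab if you want it back after a reboot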

Clearly, #3 is the crux move. It took me a lot of broken installs to get that maneuver right. For whatever reason, EC2 instances run in runlevel 4, not the 3 or 5 you'd find on a "normal" Linux box. The trick is changing the installer's inittab to include runlevel 4 at the "7th inning stretch" in the process. That's the point when the binaries have been laid down on the drive and the installer pauses for you to run a couple of scripts as root. One of those scripts configures and starts up the cluster services, and it needs the inittab to be right to do so. While the install is paused, and before you run the root scripts, edit the inittab file named in bullet 3 above to include runlevel 4. You want it to look like this:

h1:345:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null

Just to be clear, the tricky part is knowing that you've reached the 7th inning stretch, and that the next thing to run is going to copy the ohasd inittab config to the system's inittab and try to start up cluster services. If you don't get the right file modified at the right time, ohasd won't start and your whole setup is screwed. You might as well nuke it completely and start over from scratch. It might be possible to remove and reinstall without a reboot, but I don't remember ever accomplishing that feat. The good news is that if you do modify it correctly, at the correct time, ohasd will start and you're good to go.
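
If it helps, here's roughly what that moment looks like from a root shell while the installer is paused at the run-the-root-scripts prompt. The sed pattern assumes the shipped file lists runlevels 3 and 5 ("h1:35:"), so eyeball the file first rather than trusting my pattern blindly:

runlevel                                      # confirms you really are sitting in runlevel 4 on EC2
grep ohasd $GRID_HOME/crs/install/inittab     # see what the installer laid down
sed -i 's/^h1:35:/h1:345:/' $GRID_HOME/crs/install/inittab   # add runlevel 4 to the entry
# now run the root scripts the installer asked for, then make sure ohasd actually came up:
ps -ef | grep "[o]hasd"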
