Monday, July 16, 2012

ASM on AWS

Note to readers: like much in the tech world, docs go obsolete really quickly. This post is just such a document. As of the summer of 2013, we converted our hand-built Oracle server to use Amazon RDS instead. RDS is a DBA-killer. Unless you have specific Oracle needs that RDS doesn't meet, you are better off with it.

And really, who wants to have an in-house DBA anyway?
+++++
We use Oracle's database server as the core of our persistence layer. It's probably the main reason I have my job; otherwise, I would expect a business (and tech team) this size to run on a different technology stack (LAMP, or whatever you call SS+ASP). But Oracle's database is a kick-ass piece of infrastructure. You can do a *lot* with it, if you know how to configure and maintain it.

As a DBA, one of the many things I really like about the 11g series is how much less maintenance work is on my shoulders. I imagine that in a large installation the DBAs still want to handle a bunch of the daily/regular tasks with logs and backups and such, but I really like having jobs scheduled in Enterprise Manager and occasionally checking to make sure they're happy. Along those same lines, there's a feature in the 11g family called ASM (Automatic Storage Management), a low-level database service that handles most (all?) of the storage issues that DBAs are used to touching. Some DBAs may not like ASM, but I dig it and would not want to give up the storage management.

But here's the issue: we also use Amazon AWS for our production infrastructure.  AWS is a tremendously cool collection of infrastructure services, and you can do a *lot* with it (them) if you know how to configure and maintain whatever you set up. And you can run Oracle inside AWS, either as a managed service (RDS) or on your own instance. There are machine images available with current versions of Oracle fully installed and configured, and of course you can install it yourself. This post is about a couple of the tricky bits along with the crux move in doing that very thing, with a tiny bit about motivation.

One of the things you will notice in using a pre-built AMI is that there are no ASM-configured images. I haven't done an install for a while, but when I was looking, there were no AMIs with grid services installed. There's at least one simple reason for this: you can't run RAC (grid) without shared-access low-level storage, and outside of installing your own virtual SAN, you can't share access to Amazon's EBS volumes.

But ASM requires grid, and ASM is awesome even without RAC. So how does an enterprising DBA get an Oracle database running on ASM in the cloud?  (ahh, finally we get to the point, 5 paragraphs in).

Like a lot of the technology I dig around in, the answers themselves are simple, but the background involved in learning them is no small price to pay. In this case, there are a couple of key decisions and one major crux move you must get right in order to get a successful result. I'm not going to document the entire install process; there are plenty of blogs out there that provide step-by-step instructions (e.g., http://www.oracle-base.com/articles/11g/articles-11g.php ), and the details change based on what you're trying to accomplish. But a few details are key:

1. Pick the right AMI: a recent version of Oracle Enterprise Linux (OEL) with ASM support
  • I prefer OEL without the database software
  • choose the most recent version supported in the public yum repos
2. Configure it properly: ephemeral storage and swap
  • I like to use ephemeral storage for swap and scratch space
  • keep 100G of eph1 unallocated (unpartitioned) for a rainy day
3. Install it correctly: grid partition, Oracle home, ephemeral storage
  • change ohasd's inittab during the 7th inning stretch (before running root.sh)
  • the file to change: $GRID_HOME/crs/install/inittab
  • ohasd starts just fine if you've done it right
4. Assuming the grid setup went OK, do a normal database install

Clearly, #3 is the crux move. It took me a lot of broken installs to get that maneuver right. For whatever reason, EC2 instances run in runlevel 4, not the 3 or 5 of a "normal" Linux box. The trick is changing the installer's inittab to include runlevel 4 at the "7th inning stretch" of the process. That's the point when the binaries have been laid down on the drive and the installer pauses for you to run a couple of scripts as root. One of those scripts configures and starts up the cluster services, and it needs the inittab to be right to do so. While the install is paused, and before you run the root scripts, edit the inittab file from step 3 above to include runlevel 4. You want it to look like this:

h1:345:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
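If you'd rather not hand-edit the file while the installer sits waiting, the edit can be scripted. The Python below is a minimal sketch, not part of Oracle's tooling: it assumes the standard inittab `id:runlevels:action:process` layout and the `h1` entry id shown above, and the function names `add_runlevel` and `patch_file` are hypothetical. It adds runlevel 4 to that one entry and leaves every other line alone.

```python
"""Hypothetical helper: add a runlevel to the ohasd entry in the installer's inittab.

Assumes the standard inittab layout id:runlevels:action:process and the
entry id "h1" from the post; not an Oracle-supplied tool.
"""


def add_runlevel(inittab_text: str, entry_id: str = "h1", runlevel: str = "4") -> str:
    """Return inittab_text with `runlevel` added to the entry named `entry_id`."""
    out_lines = []
    for line in inittab_text.splitlines(keepends=True):
        stripped = line.lstrip()
        # Leave blank lines and comments exactly as they are.
        if not stripped or stripped.startswith("#"):
            out_lines.append(line)
            continue
        # Split into at most 4 fields; the process field may itself contain colons.
        fields = line.split(":", 3)
        if len(fields) == 4 and fields[0] == entry_id and runlevel not in fields[1]:
            # Keep the runlevels field sorted, e.g. "35" -> "345".
            fields[1] = "".join(sorted(fields[1] + runlevel))
            line = ":".join(fields)
        out_lines.append(line)
    return "".join(out_lines)


def patch_file(path: str) -> bool:
    """Patch the inittab at `path` in place; return True if anything changed."""
    with open(path) as f:
        original = f.read()
    patched = add_runlevel(original)
    if patched != original:
        with open(path, "w") as f:
            f.write(patched)
    return patched != original
```

You would run it while the installer is paused and before the root scripts, e.g. `patch_file(os.path.join(os.environ["GRID_HOME"], "crs/install/inittab"))`. It is idempotent, so running it twice does no harm, which matters when you're not sure whether you already made the edit.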

Just to be clear, the tricky part is knowing that you've reached the 7th inning stretch, and that the next thing to run is going to copy the ohasd inittab config into the system's inittab and try to start up cluster services. If you don't modify the right file at the right time, ohasd won't start and your whole setup is screwed. You might as well nuke it completely and start over from scratch. It might be possible to remove and reinstall without a reboot, but I don't remember ever accomplishing that feat. The good news is that if you do modify it correctly, at the correct time, ohasd will start and you're good to go.
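Because a wrong inittab only shows itself when ohasd fails to start, a quick sanity check during the pause can save a rebuild. This is another minimal Python sketch with hypothetical function names: it reads the runlevel the instance is actually in from `who -r` (on Linux the output contains `run-level N` followed by a date and time) and confirms that the `h1` entry in the installer's inittab includes it.

```python
"""Hypothetical pre-flight check to run before the root scripts."""
import re
import subprocess


def current_runlevel(who_r_output: str) -> str:
    """Extract the runlevel from `who -r` output like '  run-level 4  2012-07-16 10:01'."""
    match = re.search(r"run-level\s+(\S)", who_r_output)
    if match is None:
        raise ValueError("could not find 'run-level' in who -r output")
    return match.group(1)


def entry_runlevels(inittab_text: str, entry_id: str = "h1") -> str:
    """Return the runlevels field of the named inittab entry, or '' if it is absent."""
    for line in inittab_text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        fields = line.split(":", 3)
        if len(fields) == 4 and fields[0] == entry_id:
            return fields[1]
    return ""


def ready_for_root_sh(inittab_text: str, who_r_output: str) -> bool:
    """True if the ohasd entry will be active in the runlevel the box is actually in."""
    return current_runlevel(who_r_output) in entry_runlevels(inittab_text)


def check_instance(grid_home: str) -> bool:
    """Hypothetical wrapper: run on the instance while the installer is paused."""
    with open(f"{grid_home}/crs/install/inittab") as f:
        inittab_text = f.read()
    who_r = subprocess.run(["who", "-r"], capture_output=True, text=True, check=True).stdout
    return ready_for_root_sh(inittab_text, who_r)
```

If `check_instance` returns False, fix the file before going any further; once the root scripts have run against the wrong inittab, you're back to starting over.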

Thursday, July 12, 2012

Ready...Rant

Let's start this off with a rant. A StackOverflow rant. About human behavior, motivations, and going on tilt. I've been meaning to start posting to this blog for a couple of months now, and I even have a rack of long, technical write-ups already done, but I haven't felt motivated to start boring people. But today I'm on tilt, which is a perfect time to launch a verbal barrage of BS about a fundamentally meaningless interchange on a site intended to help people like me. And maybe you.

I like StackOverflow. I use it, or one of its kindred StackExchange sites, almost every day. As with Wikipedia, I learn something most of the time I go there, and I have found answers to tricky technical problems by reading other people's issues. A bunch of my friends work SO regularly, with reputations so high you wish they were bank balances. I think it's a good concept, and I wish them all success. And I enjoy Jeff Atwood's blog as well. So just to be clear, nothing in this rant should be interpreted as a problem with SO.

My issue with SO is that they don't know I'm not evil. I have very little reputation. So little that if it were my bank balance, I would be worried. And the way SO works, that means I'm not allowed to post comments, which means I'm not allowed to add on to a conversation unless I post an answer. Worse, it means I'm not able to add on a simple tweak to a good, already-accepted answer, even if that answer pointed me down the right path to the actual answer.

It's a bummer, but easily fixed: do more on SO. One of my friends with the huge reputation balance mentioned that the ability to comment comes at around 50 and that SO becomes a lot nicer at 100, so that has become my near-term goal. Mind you, not because I particularly want to help out the SO community, but because I want to be able to use SO more effectively for myself. Motivations.

Lately, things at my work have been a bit more relaxed, so I have put a little time into looking for questions to answer. And yesterday I thought I'd hit the jackpot: an unanswered question with a +150 bounty, on an attractive but somewhat esoteric Java technology I implemented for my company. I have direct experience suffering from the issues the OP was asking about, and I remember the time it took me to get them nailed down (not to suck up, but the OP was making a good decision to spend 150 points to save those 6-8 hours). I have code in my version control system implementing exactly what the OP was asking about, and I remember the secret path through the online docs to get from nothing-works to wow-cool.

I wrote up my answer, structured to address the OP's direct questions. I linked to the correct API docs, differentiated between the two APIs you need to understand to drive the thing, and pointed out how to use the only standalone code example (from an earlier version of the API) to prototype the end solution. I saved my work offline, checked to make sure nobody else had answered the question while I was typing, and then submitted my answer. That's a good answer, I thought. Would have saved me a number of hours if I'd had that as a resource a year ago. Motivations.

This morning, I took a look at SO to see if any of my other answers had been accepted, and discovered that somebody had top-posted my answer on the bounty question. Same format, many of the same links, but wrong. @top-poster has clearly googled for the information but hasn't actually used either of the APIs, and in fact referenced API-2 in a situation that requires API-1. But here's the thing: @top-poster formatted his (her?) answer better than I did, and garnered a vote. Erk. TILT.

Seriously? Somebody voted for that answer because it was formatted with bullet points? Somebody who doesn't understand the technology well enough to know whether the answer is correct? Neither @top-poster nor whoever upvoted that answer has written the Java, because if they had, they would know you can't do that with API-2. It doesn't have those facilities.

Of course, this is where it gets ironic and funny. If I had the ability to post comments on SO, I would simply add a comment to @top-poster's answer saying: +1 Good answer, nicely written. I think you meant API-1 in your first bullet. Then @top-poster would edit his (her?) answer, collect the points, and anyone venturing into SO later would get a good answer for getting started with this difficult Java technology.

But I don't have the ability to post comments, which is why I attempted to answer the bounty question. And because I want the bounty, I'm unwilling to improve @top-poster's answer, even if it means the OP accepts an incorrect answer. Instead, I edited my answer to use bullet points, copying @top-poster's format. The OP will see two identical-looking answers, only one of which is actually usable. But since the OP hasn't used the technology yet, they won't be able to tell which is correct until they spend the time digging through the docs and examples, which is exactly what they were trying to avoid by posting the bounty in the first place.

The end result of this is, fundamentally, a blog post. I understand why SO works the way it does, and appreciate the lack of spammers they seem to have achieved. I need to find some more questions to answer to achieve SO-nirvana, and the OP will have to do the homework they were trying to shortcut. Hopefully @top-poster will forgive me for copying formats. To the person who voted for a nice-looking but incorrect answer: thanks, you got me to start my blog. I owe you a beer.