the k1 project
This is not a blog.

Tuesday, October 16, 2012

Taking a couple of days off to attend the Distributed Systems geekfest at RICON 2012, at the W Hotel in San Francisco. It isn't particularly well focused on my core skillset, but I really enjoy being challenged to think about different architectures, and I'm interested in the scalability problems faced by some of the other folks at the conference. Plus, I've been thinking about a design for a time-series database tool that would use Riak, and I'm hoping to get some useful design criticism from people with experience.
I don't intend to live-blog the festivities, but after a few sessions and a couple of really helpful conversations with other attendees, I can see a couple of clear trends. The concept of "eventual consistency" dominates the discussions (as you might expect), but more importantly it really drives design. Hellerstein's keynote emphasized work he and his team are doing to formalize this: to define both an analysis methodology and the languages and tools to identify those areas in your code where you genuinely require transactions and consistency. As an Oracle DBA, my immediate reaction was to scoff ("the first time your code really needs consistency is when you want to read() data" - well, duh). But as I reflect more and listen to the other discussions, I'm warming to the ideas. Frankly, the change of thinking reminds me of another phase change I encountered when we first started using AWS, when the exhortation to "challenge your storage" kept coming back to haunt my attempts at design.
Another idea that I've heard several times today relates to something I like to think about as the master data problem. The core idea is to separate your read-heavy data from your write-heavy data, particularly for things like logging and notification data but also for the transaction base. This probably won't hold true for really important data like financial transactions, but it seems that if your problem set allows for any sort of latency between writing data and reading it (which would be obviously bad in an accounting application) there's likely an opportunity to apply some of Hellerstein's concepts to refactoring your world.
In the stuff I support at work, the most obvious dimension for a refactor is to move the master data (client organizations, those organizations' users, the entitlements for those users, etc.) out of the custom application and into a back-office system like (name your favorite) CRM. In the external web applications, that master data need never be updated by the users themselves (i.e. nobody ever changes the name of the organization they work for). So despite that data being incredibly important to everything that goes on in the application, it isn't ever actually involved in a transaction. To put it another way, that data could be productively queried once per day, cached in a k/v store of some sort, and not updated for the rest of the day.
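To make that concrete, here's a minimal cache-aside sketch in Python with redis-py. Everything here is illustrative: the key scheme, the fetch_from_crm callable, and the once-a-day TTL are my assumptions for the sketch, not anything from our actual system.

    import json
    import redis

    r = redis.StrictRedis(host="localhost", port=6379)
    ONE_DAY = 24 * 60 * 60  # master data refreshes at most once per day

    def get_org(org_id, fetch_from_crm):
        """Serve master data from the k/v cache, falling back to the
        back-office system of record when the daily entry has expired."""
        key = "org:%s" % org_id
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
        org = fetch_from_crm(org_id)            # hits the CRM
        r.setex(key, ONE_DAY, json.dumps(org))  # expires after 24 hours
        return org

The write-heavy transactional data never touches this path; only the read-heavy master data does.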
More to the point, even our transaction information is amenable to an eventually-consistent design view. Our clients use a well-defined process workflow, so at any given point in the process we have a solid idea of who might be changing data, and what data they would be touching. As I'm coming to think about it, this can fit with eventual consistency with some very minor changes to the way we think about our data. Not to our data model, simply to the way we make it available to people to use. It's a pretty small step from marshalling-for-Angular to a well-defined conflict resolution model.
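As a toy illustration of what such a model could be (and only an illustration; our records don't actually look like this), a field-wise last-writer-wins merge is about the simplest well-defined resolution rule there is:

    def merge_siblings(siblings):
        """Resolve concurrent versions of a record field by field,
        keeping the most recently written value. Assumes each sibling
        is a dict of field -> (value, updated_at) pairs, an invented
        shape for this sketch."""
        merged = {}
        for sibling in siblings:
            for field, (value, updated_at) in sibling.items():
                if field not in merged or updated_at > merged[field][1]:
                    merged[field] = (value, updated_at)
        return merged

Because the workflow tells us who can touch which fields at each step, conflicts on the same field should be rare, and a rule this dumb could actually be defensible.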
Now in our case, Riak is overkill. But I really love the party game where you challenge one of the assumptions of your product and think about how redesigning around a different principle (or set of principles) would change your code base, support profile, team skills, scalability, or profitability (dream on, overhead boy). As Hellerstein's ideas have infiltrated deeper into my thinking, I've found some really useful ideas to simplify the way our applications work with our data, and maybe nudged our future path a bit toward something that would allow eventual consistency rather than being built around traditional transactions.
One last thought. In a previous post, I hinted at an evolving set of ideas about NoSQL tools and where they fit in the ecosystem. Prior to this conference, those ideas were simply "obvious" notions rattling around in my head. But the final presentation of the conference was Dr. Eric Brewer, talking about the road ahead for NoSQL as a toolset (apologies for not pointing to his presentation; I'll update this when I locate it). He laid out virtually everything I've had rattling around in my head, with clear motivations and directions, as well as the occasional hint about how little code a given change might require. Given that Brewer is "the guy", and that he presented my case better than I could have, I'm unlikely to take the notions any further. It's clear to me that the people at the head of this movement see everything I'm seeing, so I intend to use my scarce spare time to work on advancing some smaller features within the broader set.
As I finish up this post, the conference is over and long gone. I've caught up on sleep, followed up with people I met, even written some design docs for my pet time-series project (coming soon!). I can't fully express how much I enjoyed it; the Basho folks went way above and beyond in producing this one. I found myself constantly in the presence of people smarter and more experienced than me, with all kinds of ideas about systems and architectures. I used vacation time and my own money to attend, and received value back in spades. There's really no way to rave too much about it. If they put the conference on again next year, I'll be there.
Sunday, September 16, 2012
I have come to doubt the veracity...
Several years ago I was a subscriber to a website that discussed stocks, options, and the markets. While I was a member, I contributed blog posts summarizing some of the strategies the site owner described for trading. Those blog posts were published under the header "The K1 Project".
I stopped being a member on September 9, 2008. That was the day that a "stock club" containing money from some of the site's subscribers was wiped out by a margin call. Before leaving, I added a caveat to the top of the landing page for the stuff I had put together, stating that I did not believe the site owner's published trading performance was real. I used the word "veracity" in my parting shot, as well as a bunch of other pompous $5 words.
Once or twice each year I receive an email asking me what I meant by that statement, which by now must seem like ancient Greek chiseled onto the frieze of a ruin somewhere. I try to be polite and honest about what I remember, without disparaging the site owner. The messages invariably come from anonymous, unverifiable gmail accounts, and (so far) the senders don't engage beyond their initial question.
A part of the reason for this blog is to provide a place that Google can find for any current members of the site who stumble across that caveat and wonder what it means, and for the small random chance that any of the former members decide they want to reach out. I keep the comments on this blog moderated to be certain that nobody accidentally mentions the name of the site or the site owner, and I don't intend to discuss the matter any further.
But as for you, wayward googler, if you were a fellow member, or lost money in the stock club, reach out and say hi.
Thursday, September 6, 2012
NoSQL and Heavy Data
This post is OBE (overtaken by events). I couldn't bring it to a useful conclusion, and I left it alone for too long. A bunch of things have evolved in my thinking about NoSQL, as well as in the work I'm doing with some of the tools. I'm publishing it only because it's a useful foil for some other things I want to write about in the future.
***
I have been paying more attention to the NoSQL world lately, and to all of the choices available. The Amazon paper about the early genesis of Dynamo is interesting, and seems to have influenced a bunch of people to create cool tools across a wide range of domains. I've also read both the Google Bigtable paper as well as the Percolator paper, and scanned through a presentation on F1. I'm interested in some of the scalability and performance characteristics available in the different toolsets, but also just interested in other ways to solve the persistence problem with web applications.
I've tried out a couple of systems, not extensively but enough to get a flavor for how they (the systems I've tried) work. My current favorite is redis, for its clean implementation of all those basic data types from that 1st level Data Structures & Algorithms class from college. I have a couple thoughts about redis, but first let's set the wayback machine to 2000 so I can get my crusty-DBA rant on.
The first non-relational database I ever used was probably Zope's ZODB. It has power and a couple weird limitations, but clearly provides more than enough capability to run large, complex concurrent/transactional systems (edit: for some circa-2000 definition of large and complex). Using Zope (and later Plone) also gave me my first taste of the hellish netherworld of upgrading that goes along with schema-less implementations.
In a relational database, you can establish constraints to ensure referential integrity, to limit the values that a column can assume, to require that master data be defined, and so on. If you really wanted to you could build relational schemas without constraints <snark>like a current project of mine</snark>, but in so doing you give up your ability to communicate your design to teams in the future. When a developer attempts to change system code against a well-defined model, they will quickly figure out any incorrect assumptions, by way of hard errors.
On the flip side, a poorly-constrained data model might allow all sorts of creativity, and will certainly not forbid it. Since we all know that what is not expressly forbidden is allowed (some programmers would say "encouraged"), if your data model is unconstrained it is implying to your future programmers that you *intend* for the design to be freely modified. Fast-forward to schema-less designs, and that intention is now a feature. Professionally, I seem to specialize in legacy systems that have been built with or de-constrained into spotty schemas, and so am extra-cranky about slapdash domain models.
During the heady pre-plone days of the Zope Content Management Framework (CMF), any mildly interesting feature you wanted to add to your code meant an upgrade cycle. You could walk the database tree looking for objects of your type, or you could search the catalog (if you'd registered your objects there). And then poke each existing data object into compliance with your new model. Very entertaining. Current versions of Plone are more explicit about the upgrade path, paying attention and providing some services for handling that transition, but upgrades are still fraught. And often don't work, stranding your data.
The same sort of thing happens in relational systems, obviously, though less severely, since you can always query the data. The only real difference I can think of is that in an RDBMS you are guaranteed that all the data in a table adheres to the same schema, even after you apply a change. That's both blessing and curse, of course, and is one of the motivations described in the Dynamo paper. SteveY blogged about it as well (angrily, in the Drunken Blog Rants), talking about his customer-service days at Amazon.
The thing is, data has gravity (ht Colin Clark of DarkStar). It distorts its surrounding reality. I have a whole spittle-flecked rant about frameworks that assume my database is empty when I start a project. I don't care if you've poured your heart and soul and several years of your life into your snazzy new framework, if you are just now getting around to dealing with *existing* data, it's a toy. My data has weight. Like gold. It's worth money. I don't let toys touch my data. But holy moly, are there only two points on the Schema line? With, and Without?
So let's talk about redis. It's a really interesting piece of software, implementing a group of data structures that are very useful for a large range of projects. To date, I don't trust it with my data, so I only use it as a caching and synchronization mechanism (Ed: you and everybody else, genius. Everyone talks about the great features it has over and above memcached, not compared to Oracle). And I think it might have a namespace problem. But for a single application I can overlook the stuff that makes my inner DBA cringe, and revel in luscious feature-rich creaminess.
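For flavor, the two roles I do trust it with look roughly like this in Python with redis-py (key names invented, and the lock deliberately naive):

    import time
    import redis

    r = redis.StrictRedis()

    # Caching: expendable, rebuildable data with a TTL. If redis
    # vanishes, we regenerate from the system of record.
    r.setex("page:home", 300, "<rendered html>")

    # Synchronization: a crude SETNX lock so only one worker runs
    # the nightly job. The setnx/expire pair is racy, which is fine
    # for a sketch but not for anything that matters.
    def acquire_lock(name, ttl=60):
        key = "lock:" + name
        if r.setnx(key, time.time() + ttl):
            r.expire(key, ttl)
            return True
        return False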
On both my mac (homebrew) and linux (apt) boxes, redis installed without a peep. Type 'redis-server' and you're live. (No password? No security? Inner DBA turns pale). I can go on and on about the features, but that ain't what I'm about.
As I savor my way through another delightful morsel (sorted sets? mmmmmmmm) the thought occurs to me that redis is an ideal prototype system for a large-scale deployment in AWS using SimpleDB and SQS. Of course, redis does quite a bit more than those two services in many areas, and quite a bit less in one small but important one (scale). But if you limit yourself to lists and hashes, you can easily build an application that runs locally against redis and remotely in AWS using the services. Why, you ask? Because you can then develop your application even when you don't have network, and yet easily deploy to an AWS setup if you find yourself needing an unreasonable amount of cpu power and insane scalability.
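Here's the shape of what I mean: a trivial queue interface with a redis list behind it for laptop development and SQS behind it for deployment. The SQS half is written from memory of boto's API, so treat it as a sketch to verify, not gospel.

    import redis

    class RedisQueue(object):
        """Local development backend: a redis list as a FIFO queue."""
        def __init__(self, name):
            self.r = redis.StrictRedis()
            self.name = name

        def put(self, body):
            self.r.lpush(self.name, body)

        def get(self, timeout=5):
            item = self.r.brpop(self.name, timeout=timeout)
            return item[1] if item else None

    class SQSQueue(object):
        """AWS backend: same interface, boto's SQS underneath."""
        def __init__(self, name):
            import boto.sqs
            from boto.sqs.message import Message
            self.Message = Message
            self.q = boto.sqs.connect_to_region("us-east-1").create_queue(name)

        def put(self, body):
            m = self.Message()
            m.set_body(body)
            self.q.write(m)

        def get(self, timeout=None):
            msgs = self.q.get_messages(num_messages=1)
            if not msgs:
                return None
            body = msgs[0].get_body()
            self.q.delete_message(msgs[0])
            return body

Swap the class at deploy time and the application code doesn't change.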
To bring this all back together, how does one take a (delicious, lovely) toolset like redis and apply any rigor? It has no application-level types, let alone schemas. You could encode object type and version in the key (Person.v27.id13579), you could introspect your runtime objects and serialize them natively (e.g. pickles in Python, Zope-style) or as JSON. Or use Google protocol buffers and a bit of metadata. You could constrain yourself to hashes for data storage, and employ an envelope technique to encode metadata around your serialized objects. Can I just say it? Bleah. And a cursory reading of Google's F1 presentation suggests there are some people holding a similar opinion. Or at least, it assuages my ego to think so. (More on this idea, deeper support for structured data, in a later post.)
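The envelope version, sketched in Python (the Person.v27.id13579 key follows the example above; the field names are invented):

    import json
    import redis

    r = redis.StrictRedis()

    def save(obj_type, version, obj_id, data):
        """Wrap the serialized payload in a metadata envelope, stored
        as a redis hash, so a reader can at least discover what it is
        looking at before deserializing."""
        key = "%s.v%d.id%d" % (obj_type, version, obj_id)
        r.hmset(key, {"type": obj_type,
                      "version": version,
                      "payload": json.dumps(data)})
        return key

    def load(key):
        version = int(r.hget(key, "version"))
        # A real system would dispatch on version to an upgrade routine
        # here, which is the Zope walk-and-poke upgrade problem all over
        # again, just made explicit.
        return json.loads(r.hget(key, "payload"))

It works, and it is exactly as unsatisfying as it looks.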
But wait, there's more. MongoDB. CouchDB. Riak and Cassandra and Dynamo. Neo4j. Prophet. A bunch of other stuff I've barely heard of (LevelDB). Clearly, there is a serious need that the relational model doesn't address. Perhaps less clear is that these tools lead to distributed architectures that challenge the notion of a System of Record for a particular object/document/id. I'm all in favor of higher scalability and lighter weight to drive the applications, but only if I can eliminate any mismatches with the expectations about duplicates and conflicts and transactions (or the lack thereof) that the business teams may have floating around.
I'm doing a lot in redis lately, and leaning on Riak for a financial tick-store problem I've been working on. The more I learn about the available tools, and the more I talk to people doing real work with heavy data, the less convinced I am that the current crop of NoSQL databases is anywhere near "done". Don't get me wrong, I'm enjoying the mental exercise of trying out different solutions to my problem space, but I'm beginning to formulate a concept of where all these tools are in their lifecycle and what that means to the data-is-gold crowd.
***
Apologies for the lack of useful conclusions. I think this is going somewhere but will take a while to get there. I'm attending the Basho RICON sessions in a few weeks, and really looking forward to some geekage, listening to some of the really big users talk about their experiences, and challenging my strawman tick-store architecture among people who have architecture experience.
Thursday, August 2, 2012
AWS Load Spike o' Death
My company has been using Amazon Web Services for our production web architecture for a while now. We implemented during the summer months when our usage is a bit lighter, and slowly gained experience in how to make the best use of the services for our needs. It's been a fun ride, and I'm a convert.
But there are definitely some aspects of using AWS that are unique, and one of them killed my service one day. There's an issue with EC2 instances encountering a load spike every day at about the same time. And I'm not the only one seeing it. Check out this search on the AWS Developer Forum.
Some of us have been hunting this problem for more than a year now. There are some qualities of the problem that we've been able to isolate, but we still have no knowledge of what's really happening.
The problem appeared for us one day as our load climbed to its normal peak levels. We run mod_perl inside Apache, which has something of a RAM footprint but nothing really noteworthy. Our system had been responding just fine at a load factor of about 0.6 when suddenly the load spiked up into the 30s. The machine locked up completely, taking our site offline. With no idea of what was going on, I was afraid we were under a DoS attack of some kind. It took us two hours to get our system back online, and we then spent the rest of the day nervously monitoring performance and trying to figure out what had happened.
The next day, the same thing happened, at essentially the same time. Load spike to 30, complete meltdown. We were more prepared this time, and were able to get our service back online in a few minutes, although my business team was less than amused. Expecting this to be a pattern, we started an overkill procedure, beefing up each of the links in our deployment architecture to handle ridiculous amounts of load. AWS makes this easy: Launch Instance, click! Too much of everything was just enough.
The next day, the load spike did not disable our service, and we began a long process of watching each day for the spike, and trying to instrument our setup to figure out what was happening. We eliminated at least one theory per day for weeks. Actual load? DoS? Throttling? Database job? Timeout? cron? iptables? After weeks (months) of experiments, we still have no idea, and we still see the spike in our cpu traces, virtually every day.
For a while, it seemed like we were seeing a kernel bug, something really deep in the swap handling. The problem definitely gets really bad if we allow Apache to start up as many handlers as it would like to, and we run out of memory. No surprise. But using a patched kernel didn't make the spike go away. Nor did changing Linux distros.
After that, for a while I thought it might be a disk bandwidth problem, maybe an issue with access to EBS during internal AWS cleanup or backup processes. The problem definitely shows up in vmstat as a spike in the io(bi) measure, followed by a longer period of elevated io(bo). But switching our servers off of EBS didn't change the spike behavior either.
At this point, the load spike is a feature of my universe, a force of nature. Like the tide, it is predictable and reliable, but I have no knowledge of what hidden force is causing it. I can see its effect in the stream of questions on the AWS Developer forums but have no answers. We've learned to manage the spike with good system discipline and a couple minor architectural changes, and the system admin's traditional over-provisioning. My favorite conspiracy theory is that the load spike is a ploy by the Amazon folks to get everyone to upsize their EC2 instances, thus increasing revenue. But I don't actually think that.
For any wayward Googlers searching for relief from the same problem, I can recommend you try several configuration changes:
1. Most important: use top or smem to get some idea of how large your Apache server footprint is, and set MaxClients to a number that fits within your existing RAM (see the config sketch after this list). That clips one horrible problem by preventing Apache from starting enough new servers to send your machine into swapping hell, effectively killing your service.
2. Set your MinSpareServers and MaxSpareServers relatively low. Ours are at 1 and 2, again to keep Apache from starting up gobs of new servers when the load spike hits. (It seems to take some CPU for the VM to clear the problem, and starting up new processes makes that task take longer.)
3. Spread out your load. Our workload has a strong seasonality component to it, so we bring up extra resources going into that period and take them down later. Even though the spike hits all of the servers at the same time, more cpu means faster clearing. If your loads are less predictable, you can always address the cpu spike problem by starting up an extra instance or two for the couple hours around "spike time" (22:00 - 00:00 GMT).
4. Try out a high-CPU EC2 instance type. Looking through the forums you can see that the problem hits hardest those people who run a small instance relatively close to the ceiling. That's what we were doing originally ("it's just apache, why should we allocate a lot of resources to it?") and the switch to c1.medium is huge even though the cost difference is minor.
5. Take advantage of caching. This is sort of a no-brainer, but we were letting our apache/mod_perl processes serve everything, including all of the static objects. This turned out to be a poor use of RAM, as the mod_perl libraries take up a ton of RAM and are completely unused serving statics. Adding in a caching tier drastically reduced the activity on our app tier, thus reducing the number of servers Apache thought we needed to run, thus reducing RAM demand, etc.
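To put numbers on items 1 and 2, here's roughly what the prefork section of an httpd.conf would look like. The values are illustrative; derive your own MaxClients from your measured per-child RSS, not from mine.

    # httpd.conf (prefork MPM), illustrative values
    <IfModule mpm_prefork_module>
        StartServers          2
        MinSpareServers       1
        MaxSpareServers       2
        # e.g. ~1.2GB of RAM available to Apache / ~60MB RSS per
        # mod_perl child => MaxClients of about 20
        MaxClients           20
        MaxRequestsPerChild 500
    </IfModule>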
And please let me know if you find any clues about this crazy thing.
28-May-2013 Update:
The last time we touched our server configs, I made one minor change to test out the guess that this was somehow related to EBS use: set APACHE_LOG_DIR to point to local ephemeral storage rather than the EBS volume. The idea stemmed from our observation that new instances didn't seem susceptible to the load spike until they had been through some significant log-writing, leading us to wonder if the spike itself somehow originated out of the EBS channel.
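For the record, the change itself was one line (Ubuntu-style envvars shown; the mount point is an assumption, use whatever your instance-local ephemeral disk is):

    # /etc/apache2/envvars: write logs to instance-local disk, not EBS
    export APACHE_LOG_DIR=/mnt/apache2-logs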
We have recently been through another of our load periods, and have seen the load spike appear in our logs again, with one significant difference: we only see the spike on our database machine. The various apache servers (caching and app tiers) show no sign of the spike even during periods where the additional load is very clear on the database tier.
Though we still have no clear cause for the issue, it would seem that moving your Apache logs to local ephemeral disk can help.
Monday, July 16, 2012
ASM on AWS
Note to readers: like much in the tech world, docs go obsolete really quickly. This post is just such a document. As of summer of 2013, we converted our hand-built Oracle server to use Amazon RDS instead. RDS is a DBA-killer. Unless you have specific Oracle needs that are not met with RDS, you are better off with RDS.
And really, who wants to have an in-house DBA anyway?
+++++
We use Oracle's database server as the core of our persistence layer. It's probably the main reason I have my job, otherwise I would expect a business (and tech team) this size to operate on a different technology stack (LAMP or whatever you call SS+ASP). But Oracle's database is a kick-ass piece of infrastructure. You can do a *lot* with it, if you know how to configure and maintain it.
As a DBA, one of the many things I really like about the 11g series is how much less maintenance work lands on my shoulders. I imagine in a large installation the DBAs still want to handle a bunch of the daily/regular tasks with logs and backups and such, but I really like having jobs scheduled in Enterprise Manager and occasionally checking to make sure they're happy. Along those same lines, there's a feature in the 11g family called ASM (Automatic Storage Management), a low-level database service that handles most (all?) of the storage issues that DBAs are used to touching. Some DBAs may not like ASM, but I dig it and would not want to give up the storage management.
But here's the issue: we also use Amazon AWS for our production infrastructure. AWS is a tremendously cool collection of infrastructure services, and you can do a *lot* with it (them) if you know how to configure and maintain whatever you set up. And you can run Oracle inside AWS, either as a managed service (RDS) or on your own instance. There are machine images available with current versions of Oracle fully installed and configured, and of course you can install it yourself. This post is about a couple of the tricky bits along with the crux move in doing that very thing, with a tiny bit about motivation.
One of the things you will notice in using a pre-built AMI is that there are no ASM-configured images. I haven't done an install for a while, but when I was looking, there were no AMIs with grid services installed. There's at least one simple reason for this: you can't run RAC (grid) without shared-access low-level storage, and outside of installing your own virtual SAN, you can't share access to Amazon's EBS volumes.
But ASM requires grid, and ASM is awesome even without RAC. So how does an enterprising DBA get an Oracle database running on ASM in the cloud? (ahh, finally we get to the point, 5 paragraphs in).
Like a lot of the technology I dig around in, the answers themselves are simple, but the background involved in learning them is no small price to pay. In this case, there are a couple key decisions and one major crux move you must get right in order to get a successful result. I'm not going to document the entire install process, there are plenty of blogs out there that provide step-by-step instructions (eg: http://www.oracle-base.com/articles/11g/articles-11g.php ) and the details change based on what you're trying to accomplish. But there are a couple key details:
1. Pick the right AMI - a recent version of OEL with ASM support
   - I prefer OEL without the database software
   - choose the most-recent version supported in the public yum repos
2. Plan the storage - I like to use ephemeral for swap and scratch space
   - keep 100G of eph1 unallocated (unpartitioned) for a rainy day
3. Change ohasd's inittab during the 7th inning stretch (root.sh)
   - change this: $GRID_HOME/crs/install/inittab
   - ohasd starts just fine if you've done it right
Clearly, #3 is the crux move. It took me a lot of broken installs to get that maneuver right. For whatever reason, EC2 instances run in runlevel 4, not the 3 or 5 you'd see on a "normal" linux box. The trick is changing the installer's inittab to include runlevel 4 at the "7th inning stretch" in the process. That's the point when the binaries have been laid down on the drive and the installer pauses for you to run a couple of scripts as root. One of those scripts configures and starts up the cluster services, and needs the inittab to be right to do so. While the install is paused, and before you run the root scripts, edit the inittab file named in item 3 above to include runlevel 4. You want it to look like this:
h1:345:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
Just to be clear, the tricky part is knowing that you've reached the 7th inning stretch, and that the next thing to run is going to copy the ohasd inittab config to the system's inittab and try to start up cluster services. If you don't get the right file modified at the right time, ohasd won't start and your whole setup is screwed. You might as well nuke it completely and start over from scratch. It might be possible to remove and reinstall without a reboot, but I don't remember ever accomplishing that feat. The good news is that if you do modify it correctly, at the correct time, ohasd will start and you're good to go.
Thursday, July 12, 2012
Ready...Rant
Let's start this off with a rant. A StackOverflow rant. About human behavior, motivations, and going on tilt. I've been meaning to start posting to this blog for a couple months now, and even have a rack of long, technical write-ups already done, but haven't felt motivated to start boring people. But today I'm on tilt, which is a perfect time to launch a verbal barrage of BS on a fundamentally meaningless interchange on a site intended to help people like me. And maybe you.
I like StackOverflow. I use it almost every day, or one of its kindred StackExchange sites. Like Wikipedia, I learn something most of the time I go there, and have found answers to tricky technical problems reading other people's issues. A bunch of my friends work SO regularly, with reputations so high you wish they were bank balances. I think it's a good concept, and I wish them all success. And I enjoy Jeff Atwood's blog as well. So just to be clear, nothing in this rant should be interpreted as a problem with SO.
My issue with SO is that they don't know I'm not evil. I have very little reputation. So little that if it were my bank balance, I would be worried. And the way SO works, that means I'm not allowed to post comments, which means I'm not allowed to add on to a conversation unless I post an answer. Worse, it means I'm not able to add on a simple tweak to a good, already-accepted answer, even if that answer pointed me down the right path to the actual answer.
It's a bummer, but easily fixed: do more on SO. One of my friends with the huge reputation balance mentioned that the ability to comment comes at around 50 and that SO becomes a lot nicer at 100, so that has become my near-term goal. Mind you, not because I particularly want to help out the SO community, but because I want to be able to use SO more effectively for myself. Motivations.
Lately, things at my work have been a bit more relaxed, so I have put a little time into looking for questions to answer. And yesterday I thought I'd hit the jackpot: an unanswered question with a +150 bounty, on an attractive but somewhat esoteric Java technology I implemented for my company. I have direct experience suffering from the issues the OP was asking about, and remember the time it took me to get them nailed down (not to suck up, but the OP was making a good decision to spend 150 points to save those 6-8 hours). I have code in my version control system implementing those precise questions, and I remember the secret path through the online docs to get from nothing-works to wow-cool.
I wrote up my answer, structured to address the OP's direct questions. I linked to the correct API docs, differentiated between the two APIs you need to understand to drive the thing, and pointed out how to use the only standalone code example (from an earlier version of the API) to prototype the end solution. I saved my work offline, checked to make sure nobody else had answered the question while I was typing, and then submitted my answer. That's a good answer, I thought. Would have saved me a number of hours if I'd had that as a resource a year ago. Motivations.
This morning, I took a look at SO to see if any of my other answers had been accepted, and discovered that somebody had top-posted my answer on the bounty question. Same format, many of the same links, but wrong. @top-poster has clearly googled for the information but hasn't actually used either of the APIs, and in fact referenced API-2 in a situation that requires API-1. But here's the thing: @top-poster formatted his (her?) answer better than I did, and garnered a vote. Erk. TILT.
Seriously? Somebody voted for that answer because it was formatted with bullet points? Somebody who doesn't understand the technology well enough to know whether the answer is correct? Neither @top-poster nor whoever upvoted that answer has written the Java, because if they had, they would know you can't do that with API-2. It doesn't have those facilities.
Of course, this is where it gets ironic and funny. If I had the ability to post comments on SO, I would simply add a comment to @top-poster's answer saying: +1 Good answer, nicely written. I think you meant API-1 in your first bullet. Then @top-poster would edit his (her?) answer, collect the points, and anyone venturing into SO later would get a good answer to starting out with this difficult java technology.
But I don't have the ability to post comments, which is why I attempted to answer the bounty question. And because I want the bounty, I'm unwilling to improve @top-poster's answer, even if it means the OP accepts an incorrect answer. Instead, I edited my answer to use bullet points, copying @top-poster's answer. The OP will see two identical-looking answers, only one of which is actually usable. But since the OP hasn't used the technology yet, they will not be able to tell which is correct until they go spend the time digging through the docs and examples -- the thing they were trying to avoid by posting the bounty in the first place.
The end result of this is, fundamentally, a blog post. I understand why SO works the way it does, and appreciate the lack of spammers they seem to have achieved. I need to find some more questions to answer to achieve SO-nirvana, and the OP will have to do the homework they were trying to shortcut. Hopefully @top-poster will forgive me for copying formats. To the person who voted for a nice-looking but incorrect answer: thanks, you got me to start my blog. I owe you a beer.