Paul Kuliniewicz » MBA After all, it could only cost you your life, and you got that for free. Mon, 28 Jan 2013 03:25:49 +0000

How About That Wed, 25 May 2005 05:47:22 +0000 /?p=304

Dear Olga, Chris, Bruce, Paul and Jan,

We have now completed the reviewing stage over 209 papers for ECCB/JBI 2005. We are delighted to tell you that your long paper, entitled:

“Reconsidering Complete Search Algorithms for Protein Backbone NMR Assignment”

has been provisionally accepted for ECCB/JBI 2005.

For those of you keeping score at home, this is a paper talking about the project I was working on last summer.

Squeezing out more performance Tue, 10 Aug 2004 02:26:40 +0000 /?p=143 Vitek and I talked about an idea he had to make the exhaustive search in MBA a bit more efficient. Basically, it involves the program being able to remember things it’s tried before, and not bother trying them again if it knows they won’t work. As a result, you can eliminate consideration of lots of sets of constraints right away. While each one wouldn’t have taken long to evaluate, being able to skip over dozens at a time should still be a win.
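In sketch form, the trick looks something like this (all names here are hypothetical, the code is in modern Java syntax rather than the 1.4-era code MBA actually uses, and the real constraint representation is obviously richer than a set of integers):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch of remembering failed constraint sets so they are never re-evaluated.
// All names are made up; the real MBA code is organized differently.
public class FailedConstraintCache {
    private final Set<Set<Integer>> knownFailures = new HashSet<>();

    /** Record that this combination of constraints cannot lead to a solution. */
    public void markFailed(Set<Integer> constraints) {
        knownFailures.add(new HashSet<>(constraints));
    }

    /** True if we have already seen this exact combination fail. */
    public boolean knownToFail(Set<Integer> constraints) {
        return knownFailures.contains(constraints);
    }

    public static void main(String[] args) {
        FailedConstraintCache cache = new FailedConstraintCache();
        cache.markFailed(new HashSet<>(Arrays.asList(3, 7)));
        // The second time this combination comes up, skip it immediately.
        System.out.println(cache.knownToFail(new HashSet<>(Arrays.asList(3, 7))));  // true
        System.out.println(cache.knownToFail(new HashSet<>(Arrays.asList(3, 8))));  // false
    }
}
```

The win comes entirely from how cheap knownToFail is compared to actually evaluating a set of constraints.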

Actually, the idea was pretty similar to one I had been kicking around before leaving for vacation two weeks ago, but the approach I had had in mind wouldn’t have worked. Vitek’s is the same basic idea, but done in a way that doesn’t shrink the search space too aggressively. Early tests suggest it could produce a noticeable speedup in execution time, which is always good, and it should work especially well for the nastier data sets.

I don’t know if I ever mentioned here that I made a glorious hack on top of Doug Lea’s FJTask framework to allow it to be used to move tasks between computers instead of just between threads. Anyway, there’s a new guy in the lab, and he’s working on adapting that glorious hack into a more generalized remote computing system. That’s pretty cool. Unfortunately right now he’s stuck with backporting the current MBA code to Java 1.3 so that it’ll run on his customized VM. Doing so turns up all sorts of differences between Java 1.3 and Java 1.4; for example, did you know that Double.parseDouble("NaN") throws an exception in 1.3, but does just what you expect in 1.4?
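For the curious, here’s a tiny workaround that behaves the same on both versions: check for the literal string before handing it to parseDouble. (This is just an illustration of the incompatibility, not what the backport actually does.)

```java
// Defensive NaN parsing: Java 1.3's Double.parseDouble("NaN") throws
// NumberFormatException, while 1.4's returns Double.NaN, so check for the
// literal string first to get the same behavior everywhere.
public class SafeParse {
    static double parseDouble(String s) {
        if ("NaN".equals(s)) {
            return Double.NaN;
        }
        return Double.parseDouble(s);
    }

    public static void main(String[] args) {
        System.out.println(Double.isNaN(parseDouble("NaN")));  // true
        System.out.println(parseDouble("1.5"));                // 1.5
    }
}
```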

I’m awesome and I didn’t even realize it Sat, 24 Jul 2004 04:16:26 +0000 /?p=133 So this week I was playing around with MBA, working on some experimental code to perform a stochastic search over the solution space instead of an exhaustive search. The plan was to Somehow fix the number of missings allowed in certain ranges of the protein at each iteration, based on the results of the previous iteration. Not wanting to rush into this without some idea of what would make a good Somehow, I threw together a quick-and-dirty exhaustive search that used the same constraints mechanism that the stochastic search would.

Guess what.

The constrained-exhaustive search appears to run an order of magnitude faster than the ordinary exhaustive search. As in, what used to take over an hour now takes twelve minutes. As in, what used to take over a day now takes two and a half hours! Constrained-exhaustive search even plays more nicely with RMI-based parallelization than the old exhaustive search.


I certainly wasn’t expecting that from something I threw together after making big simplifications to the ideas Vitek and I had talked about, and intended to use only to see what happens when you throw constraints into the mix.

The moral? Try the stupidly simple approach first. You might be surprised just how well it actually works.

It only works if it works Thu, 15 Jul 2004 04:59:06 +0000 /?p=130 The only tangentially relevant intro: it turns out there’s a web page for the project I’m getting paid to work on. I don’t know how much of my code is in the tarball up for download, though; whether or not it is, I know for a fact it isn’t documented. (The license is BSD-ish, so go nuts.)

Anyway, Prof. Vitek’s come to the conclusion that a big cause of the long run times of the program for some inputs is large numbers of missing spin systems. Since missings can go anywhere in the protein, whereas well-defined spin systems can only fit into a few places, having lots of missings vastly increases the size of the search space. And by “vastly,” I mean “from minutes to hours to days.”

One idea we’ve discussed for attacking this problem is to find a way to figure out which positions in the protein are likely to be assigned the missings. If you could do that, you wouldn’t have to consider missings for large stretches of the protein, taking a nice big worthless chunk out of the search space.

For a few of the smaller data sets, a strategy that worked pretty well was to do an initial run with maxMissings deliberately set too low. You’d still get results, but they wouldn’t be very good. However, they had enough information to let you figure out which positions were the trouble spots. If you flag those spots, you can try again with more missings but limit them to those spots. When this works, it works quite well; for one data set, it cut execution time down from about 70 minutes to about 5 minutes. Not too shabby.
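The flagging step, in miniature. The threshold, the use of raw per-position penalties, and all the names here are made-up simplifications of what the program actually does:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the flagging step: given per-position penalty scores from a
// deliberately under-constrained first run, flag the positions that look bad
// enough to deserve a missing in the second run. Purely illustrative.
public class TroubleSpots {
    static List<Integer> flag(double[] penalties, double threshold) {
        List<Integer> spots = new ArrayList<>();
        for (int pos = 0; pos < penalties.length; pos++) {
            if (penalties[pos] > threshold) {
                spots.add(pos);  // only these positions may take a missing later
            }
        }
        return spots;
    }

    public static void main(String[] args) {
        double[] penalties = {0.1, 4.2, 0.3, 7.8, 0.2};
        System.out.println(flag(penalties, 1.0));  // [1, 3]
    }
}
```

The second run then allows missings only at the flagged positions, which is where the big chunk of search space gets cut away.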

Unfortunately, it only works well if it works in the first place. For some of the other data sets, setting maxMissings too low initially prevented the program from finding any solutions, or even from getting through the precomputation phase at all. Since the strategy I was taking relied on getting some results — any results — from the initial run, there’s nothing you can do in that case. And as I spent a couple of days figuring out, there doesn’t appear to be any feasible way to look at the scoring information to deduce which positions are likely to be the trouble spots. At best, there’s a very weak correlation between penalized scores and the likelihood of needing a missing, but nothing nearly strong enough to justify using it as a heuristic. So much for that.

Well, at least in the process I refactored some parts of the code, which will help with implementing some other approaches to cutting down the search space. Hopefully they’ll turn out to work better.

Success! Sat, 03 Jul 2004 04:55:02 +0000 /?p=125 Remember Fgf?

It works.

The performance of the program even turned out better than I had expected. Using the old version of the program (i.e., before I did anything to it), running on a pretty beefy machine, it took over 129 hours (that’s more than 5 days) to process. Under the current version, running in parallel on six wimpier machines, it only took 25 hours. That’s about 5 times faster!

Fgf is now the nastiest input data that the program can successfully process. Next up, my good friend 5507. What’s interesting about him is that he’s never been processed successfully, and he appears to resist some of the tricks that let other data sets be processed more rapidly. (At least, I think so; there are no “known good results” to tell me what I should expect to get out of the program.)

Bored at Work Tue, 29 Jun 2004 19:11:11 +0000 /?p=122 Not much going on today in the lab. Right now there’s not really much more I can do with the code until I get the results of some test runs that are currently going. Seeing as how the original run of Fgf took over five days to finish on a pretty beefy machine, I’m not holding my breath for it to finish anytime soon on the makeshift cluster of six halfway-decent machines. I would be shocked if it finished before I come in tomorrow morning.

There’s only one other machine I have access to that’s powerful enough to make running the code worthwhile, and right now I have it busy profiling the program, which will probably take another couple of hours. All the other machines I can use either have too little memory (this beast of a program needs at least 1.5 GB to process non-trivial inputs, and Sun’s JVM on x86 machines won’t let you go past 2 GB anyway) or too little processing power to run the program on anything but trivial inputs.

More annoyingly, all the inputs I have available fall into two main categories. The trivial ones can usually finish in under an hour, but lack the size and parallelism to really test the code and hit the corner cases and performance bottlenecks. The non-trivial ones are monstrous, like the aforementioned Fgf or the notorious 5507 (which still hasn’t gone through a successful run of the program, ever). In between, nothing. And at this point, the program seems to handle all the non-trivial inputs just fine.

I’ve already taken care of pretty much all the remaining little things to do in the code and squashed the few minor bugs that have popped up, but at this point I won’t know what else needs to be done until these test runs either finish or bomb out. I’m fairly confident now that they will work, but I really need to see some evidence of that fact. And there’s no telling how long that will take.

Plus, everyone else who’s usually in the lab, including the professor I’m working with, is off presenting their real-time Java stuff somewhere, so I’m the only one in the lab today.

So, if you’re looking for excitement today, this isn’t the place to find it.

Dear Simulated Protein 5507 Tue, 22 Jun 2004 05:20:57 +0000 /?p=119 I will admit, you have been a worthy adversary. You’ve managed to foil my efforts to get this program to process you for over a week. Your voracious appetite for memory has forced me to go through the heart of the code, line by line, nulling out unneeded references so the garbage collector can do its job. You’ve exposed inherent flaws in my memory reclamation thread. And you’ve been sneaky about it, letting the program chug along for hours before bringing it to its knees, sending the computational servers into death throes as they used up the last few bytes of heap space.

But your days are numbered.

For fourteen hours, you have been struggling against the latest build. Struggling and failing. Yes, you have demonstrated a fundamental inefficiency in the memory reclamation logic, but that’s all it is — an inefficiency. It is only a matter of time before your spin system mappings are exposed for all to see. There is no more escape.

And soon I will crush what little is left of your ability to stymie my task parallelization and distribution logic. For while you have been busy fighting with — and failing against — the current build, I have devised a new strategy to keep you from bringing in your ally OutOfMemoryError. You will soon find there is no escape from the dark voodoo magic that is SoftReference. How will you manage to exhaust the heap when the Java VM itself is aware of your little games? There shall be no more hiding your massive data structures from the garbage collector.
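For the uninitiated, the dark voodoo in miniature: the JVM is allowed to clear a SoftReference before it would otherwise throw an OutOfMemoryError, so a cache built on one shrinks itself under memory pressure instead of exhausting the heap. A toy version, with made-up names and modern Java syntax:

```java
import java.lang.ref.SoftReference;

// Toy SoftReference cache: the garbage collector may clear the reference
// under memory pressure, in which case we simply recompute the data.
public class SoftCacheDemo {
    private SoftReference<double[]> cached;

    double[] get() {
        double[] data = (cached == null) ? null : cached.get();
        if (data == null) {
            data = recompute();                  // first call, or GC cleared it
            cached = new SoftReference<>(data);  // hand it back to the GC's mercy
        }
        return data;
    }

    private double[] recompute() {
        return new double[]{1.0, 2.0, 3.0};  // stand-in for an expensive result
    }

    public static void main(String[] args) {
        SoftCacheDemo demo = new SoftCacheDemo();
        System.out.println(demo.get().length);         // 3
        System.out.println(demo.get() == demo.get());  // true while memory is plentiful
    }
}
```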

It is over now.

1335 Program Thu, 17 Jun 2004 05:49:04 +0000 /?p=116 Yep, it’s one of those “here’s a bunch of random bits that don’t warrant their own posts” posts.

And by “bunch,” I mean “two and a third.”

Progress has been slow lately at the day job. Did you know it’s quite possible to run out of memory in a Java program even with 1 GB of heap? Well, that’s what happens when you run the program for moderately non-trivial inputs. The things I’ve done so far to take care of the problem have proven to delay the onset of OutOfMemoryErrors, but they still eventually happen. What’s really fun is how when memory gets low, it feels like you’re spending more time running the garbage collector than you are running the program itself. Looks like I get to figure out if I’m keeping unneeded references around or if three Solution objects (and the transitive closure of stuff they reference) really do take a gig of RAM.
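The reference-nulling business, for anyone who hasn’t had the pleasure: once a big intermediate structure has served its purpose, null out the field so the collector can reclaim it right away instead of waiting for the enclosing object to die. A made-up example:

```java
// Sketch of nulling out an unneeded reference. Names and sizes are invented;
// the point is that a live field pins ~8 MB until it is explicitly cleared.
public class ScratchNuller {
    private double[][] scratch = new double[1024][1024];  // big intermediate data

    double finish() {
        double result = scratch[0][0];
        scratch = null;  // unneeded from here on; let the GC have it now
        return result;
    }

    boolean reclaimed() {
        return scratch == null;
    }

    public static void main(String[] args) {
        ScratchNuller s = new ScratchNuller();
        System.out.println(s.finish());     // 0.0
        System.out.println(s.reclaimed());  // true
    }
}
```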

I also find myself wondering if a C or C++ implementation would avoid the slowdowns caused by the Java garbage collector in low-memory situations. Sure, you don’t have the collector running, but then you have to manually malloc and free everything, and there’s a lot of allocating going on, and there’s a lot of data shared between structures. The memory management headaches could quite easily outweigh the savings from eliminating GC overhead. Of course, the real solution is to figure out how to get the program to quit using such an ungodly amount of memory in the first place, if that’s indeed possible.

On the Kingdom of Loathing front, I just reached my favorite area so far: The Valley of Rof L’m Fao. All the monsters there are based on annoying Internet things: you have Spam Witches, XXX pr0n, 1335 H4xX0rZ (“he has a long way to go before becoming 1337”), Anime Smilies, etc. As an added bonus, the trick you need to get into the Valley is an awful, awful, awful pun.

Amy: I’ll put up a post with some of the recommendations you asked for later.

Java Programming Tip Tue, 15 Jun 2004 01:45:02 +0000 /?p=115 1 GB of heap + 95% live data + frequent allocations = performance nightmare

Trust me on this.

Distributed Programming is Tricky Fri, 11 Jun 2004 03:41:17 +0000 /?p=112 The story of how a seemingly obvious assumption can cause a program’s behavior to spiral out of control.

Remember that distributed program I’m developing on my day job? Here’s the outline of the strategy I’m currently using to allocate work to each computer:

  1. The first server that becomes ready receives a task that represents the entire computation.
  2. Each task, when executed, is either a leaf (no further computation necessary) or is split into one or more subtasks after some amount of computation. These subtasks are immediately queued for local execution.
  3. Whenever a server runs out of tasks to execute, it first sends a message to the client containing the cumulative results of all the tasks it has executed. The client merges all these intermediate results into the final result.
  4. Whenever a server runs out of tasks to execute, it then asks the client for more work. The client polls the busy servers, trying to steal a task sitting on one of their work queues. If it gets a task, the client relays it to the idle server.
  5. The client observes that the computation is complete when all servers are idle, waiting for more work.
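Steps 1 and 2, boiled down to a toy you can actually run. The “task” here is just an integer range that splits in half until it’s a single element; the real tasks are protein-assignment subproblems, and the real code builds on the FJTask-based framework rather than a bare deque:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the local task loop on one server: tasks come off a work queue,
// and each executed task is either a leaf (done) or splits into subtasks that
// are immediately queued for local execution.
public class LocalTaskLoop {
    static int runAll(int lo, int hi) {
        Deque<int[]> queue = new ArrayDeque<>();
        queue.push(new int[]{lo, hi});  // step 1: one task for the whole job
        int leaves = 0;
        while (!queue.isEmpty()) {
            int[] task = queue.pop();
            if (task[1] - task[0] <= 1) {
                leaves++;               // leaf: no further computation necessary
            } else {                    // step 2: split and requeue locally
                int mid = (task[0] + task[1]) / 2;
                queue.push(new int[]{task[0], mid});
                queue.push(new int[]{mid, task[1]});
            }
        }
        return leaves;
    }

    public static void main(String[] args) {
        System.out.println(runAll(0, 8));  // 8 leaves, one per element
    }
}
```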

With this strategy alone, the client has no way of knowing if one of the servers crashes or becomes unresponsive for some reason. So, while all of this is going on, the client periodically pings each server. If it doesn’t get a response, it assumes the server failed and requeues the last task sent to it for execution by another server. This way, whenever a server dies, that piece of the problem won’t also be lost.
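The detector itself boils down to something like this (simulated here with a map of last-reply timestamps instead of real RMI connections; all names are made up). Note that the fatal assumption is baked right in: a server that’s merely slow to answer looks exactly like a dead one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the client's failure detector: a server is presumed dead when its
// last reply is older than the timeout, and its last task gets requeued.
public class FailureDetector {
    static List<String> presumeDead(Map<String, Long> lastReply, long now, long timeout) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastReply.entrySet()) {
            if (now - e.getValue() > timeout) {
                dead.add(e.getKey());  // requeue this server's last task elsewhere
            }
        }
        return dead;
    }

    public static void main(String[] args) {
        Map<String, Long> lastReply = new HashMap<>();
        lastReply.put("server2", 0L);    // busy thrashing the GC, not dead
        lastReply.put("server5", 900L);
        System.out.println(presumeDead(lastReply, 1000L, 500L));  // [server2]
    }
}
```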

See the problem yet? OK, technically you can’t, since this is a simplified description of what’s going on, but it does capture the meat of the problem. Now watch how this strategy snowballs out of control when things go wrong.

The protein I tried to run was larger than the ones I usually use. The “expected results” for this test case were incomplete, since the old single-threaded version of the program ran out of memory before coming up with the solutions. For some reason, it didn’t occur to me that if the single-threaded version ran out of memory, then surely a multi-threaded version, doing more stuff at one time and thus using even more memory, would run into the same problem. But if I had realized that, I might not have found this nice little bug.

Anyway, everything’s running fine. The parts of the algorithm being run on each server that are the most memory-intensive are coded to recover from out-of-memory errors automatically. However, running out of memory degrades the performance of the program. How much? Enough to cause the program not to respond to the client’s ping before the client gets a timeout.

The client, not getting a response back, assumes that server 2 (the server that didn’t reply) is no longer reachable. It tries to close the connection (not surprisingly, also not getting a response back) and requeues server 2’s task for the next available node. However, server 2 is in fact still running and processing normally, blissfully unaware that the client believes him to be dead!

Here’s where the interesting behavior starts. All the machines I’m using have essentially equivalent hardware, so if a set of tasks causes an OOM on one of them, it will cause an OOM on any machine that tries to run it, just as it did on the original server. Which is exactly what happened to server 5, which got reassigned server 2’s task. If this keeps happening, we’d expect every server in the cluster to eventually fall to the same problem, until the client assumes all the servers are dead.

But remember, the servers in this case aren’t aware that the client has written them off. After a while, server 2 contacted the client with the results of its computation and asked for more work. The client wasn’t written to handle the case where a presumed-dead server contacts it, but it seemed to cope well enough, accepting the results and giving server 2 another job while marking it as busy instead of dead. But then the thread that pings the servers comes along and tries to ping server 2. When the client thought server 2 had died, it nulled out its reference to server 2, so the pinging thread crashes with a NullPointerException. Now the client has no way of noticing when servers do in fact die! If one does, the client will wait forever for that server to send in its results.

But wait, it gets better! Suppose we fixed this problem in the client so that it can accept a resurrected server gracefully. Remember that we still have the task that triggers the OOM condition (from now on, the Task of Doom) floating around. If no server in the cluster can execute the Task of Doom without OOMing and being temporarily presumed dead by the client, we’re in even worse shape than before. Since the task appears never to get completed, it keeps getting passed from server to server. Since the servers keep coming back alive (from the client’s perspective — technically they never actually died), the client never runs out of servers to reassign the task to! The program still doesn’t terminate, but now all the nodes stay busy, each grinding through the Task of Doom without any help from its fellow servers, and whenever one finishes, it just gets the Task of Doom reassigned to it again! In this case it would’ve been particularly nasty, because the Task of Doom, which was first assigned to server 2, was the original task representing the entire problem!

And all of this is because we assume that a server that’s still reachable can be pinged successfully. (OK, there are also several other contributing problems, but the pinging thing is what triggers the whole mess in the first place.)
