Archive for September, 2008

NYC initiation: rental application

We recently signed a lease to rent on the Upper East Side (UES). Besides the usual credit check, most places in NYC ask for a slew of personal information: bank statements (with balances and account numbers), federal tax return and W-2 copies, a letter of employment stating yearly salary, and three character reference letters. (As for the landlord, I only know her name.)

I’m told that managed buildings may skip some of these, but the apartment we found is in a condominium. Even though the landlord had already approved us, our broker prepared all the paperwork to a tee for the upcoming condo board review.

He even sent us some anonymized character reference letter samples.  Some were quite amusing.  For example (emphasis mine):

[…] I have always found him to be serious and responsible about his works [sic] and his private life. His home life is extremely quiet, and I would think ideal for his neighbors. Virtually all of his social gatherings are conducted in restaurants. He travels throughout nine months of the year and would probably be at home for only short periods of time between those trips. And quite frankly, his time at home is usually spent resting as part of his recovery from his traveling and preparation for his next trip. He is just the kind of quiet, unobtrusive neighbor that I would like to have.

I couldn’t help but wonder how many boards went through letters like this one. For a moment or two, I entertained the thought of asking a friend to write a pithy one-liner instead:

Spiros = corpse – odor + money    ⇒    Spiros = dream tenant !

but I eventually decided that the “⇒” notation might be too much and dropped the idea altogether.

I just hope those sample letters do not really reflect life in NYC!

Data harvesting with MapReduce

Combine harvesters

“The combine harvester, […] is a machine that combines the tasks of harvesting, threshing and cleaning grain crops.” If you have acres upon acres of wheat and want to separate the grain from the chaff, a group of combines is what you really want. If you have a bonsai tree and want to trim it, a harvester may be less than ideal.

MapReduce is like a pack of harvesters, well suited to sifting through huge volumes of data residing on a distributed storage system. However, a lot of machine learning work is more akin to trimming bonsai into elaborate patterns. Conversely, it’s not uncommon to see trimmers used to harvest a wheat field. As recently as this year, well-established and respected researchers wrote in their paper “Planetary Scale Views on a Large Instant-messaging Network”:

We gathered data for 30 days of June 2006. Each day yielded about 150 gigabytes of compressed text logs (4.5 terabytes in total). Copying the data to a dedicated eight-processor server with 32 gigabytes of memory took 12 hours. Our log-parsing system employed a pipeline of four threads that parse the data in parallel, collapse the session join/leave events into sets of conversations, and save the data in a compact compressed binary format. This process compressed the data down to 45 gigabytes per day. Processing the data took an additional 4 to 5 hours per day.

Doing the math, that’s at least five full days of processing (30 days at 4 to 5 hours each, or 120 to 150 hours) just to parse and compress the data on a beast of a machine. Even more surprisingly, I found this exact quote singled out among all the interesting results in the paper! Let me make clear that I’m not criticizing the study; in fact, both the dataset and the exploratory analysis are interesting in many ways. However, it is somewhat surprising that, at least within the research community, such a statement is still treated more like a badge of honor than an admission of masochism.
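
For the skeptical reader, here is a back-of-the-envelope sketch of that arithmetic in Python. Every constant comes from the quoted passage; treating the 12-hour copy as a one-time cost for the whole dataset is my own assumption, and the names are just illustrative.

```python
"""Back-of-the-envelope check of the figures quoted from the paper.

Every constant is taken from the quoted passage; nothing here is measured.
Assumption: the 12-hour copy is a one-time cost for the whole dataset.
"""

DAYS_OF_LOGS = 30            # "30 days of June 2006"
GB_PER_DAY_RAW = 150         # compressed text logs per day
GB_PER_DAY_PACKED = 45       # compact binary format per day
COPY_HOURS = 12              # assumed one-time copy to the server
PROC_HOURS_PER_DAY = (4, 5)  # "an additional 4 to 5 hours per day"


def processing_estimate():
    """Return (min_hours, max_hours) spent parsing, excluding the copy."""
    low, high = PROC_HOURS_PER_DAY
    return DAYS_OF_LOGS * low, DAYS_OF_LOGS * high


if __name__ == "__main__":
    low, high = processing_estimate()
    print(f"raw data: {DAYS_OF_LOGS * GB_PER_DAY_RAW / 1000} TB")
    print(f"packed data: {DAYS_OF_LOGS * GB_PER_DAY_PACKED} GB")
    print(f"parsing: {low}-{high} hours = {low / 24}-{high / 24} days")
    print(f"with copy: {(low + COPY_HOURS) / 24}-{(high + COPY_HOURS) / 24} days")
```

Running it reports 4.5 TB of raw logs and 120 to 150 hours (5.0 to 6.25 days) of parsing, or 5.5 to 6.75 days once the copy is added.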

The authors should be applauded for their effort. Me, I’m an impatient sod. Wait one day for the results, I think I can do that. Two days, what the heck. But five? For an exploratory statistical analysis? I’d be long gone before that. And what if I found a serious bug halfway down the road? That’s after more than two days of waiting, in case you weren’t counting. Or what if I decided I needed a minor modification to extract some other statistic? Wait another five days? Call me a Matlab-spoiled brat, but forget what I said just now about waiting one day. I changed my mind already. A few hours, tops. But we need a lot more studies like this. Consequently, we need the tools to facilitate them.

Bad planning

Some visions do not really translate into plans.  Becoming rich, established or happy, for example.  It’s like saying “I want to be a superhero!” How do you go about that?

How to become Spiderman

Nope.  Not much of a plan.