I’m now experimenting with a separate tumblelog to post most random thoughts but, in the meantime, here you go Google: one more post about “pointless life (bits)”!
The post I wrote a few days ago about Android is all over the place. The right elements are in that post, but my composition and conclusions are somewhat incoherent. Perhaps I have been partly infected by the conventional thinking (of, e.g., various older, big corporations) and missed the obvious.
Every piece of content has a creator and owner (in this post, I will assume they are by default the same entity). I do not mean ownership in the traditional sense of, e.g., stashing a piece of paper in a drawer, but in the metaphysical sense that each artifact is forever associated with one or more “creators.”
This is certainly true of the end-products of intellectual labor, such as the article you are reading. However, it is also true of more mundane things, such as checkbook register entries or credit card activity. Whenever you pay a bill or purchase an item, you implicitly “create” a piece of content: the associated entry in your statement. This has two immediately identifiable “creators”: the payer (you) and the payee. The same is true for, e.g., your email, your IM chats, your web searches, etc. Interesting tidbit: over 20% of search terms entered daily into Google are new, which would imply roughly 20 million new pieces of content per day, or over 7 billion (more than the earth’s population) per year—all this from just one activity on one website.
When I spend a few weeks working on, say, a research paper, I have certain expectations and demands about my rights as a “creator.” Yet I give almost no thought to my rights over the trail of droppings (digital or otherwise) that I “create” each day, by searching the web, filling up the gas tank, getting coffee, going through a toll booth, swiping my badge, and so on. With the increasing ease of data collection and distribution in digital form, we should re-think our attitudes towards “authorship.”
Update: I’ll keep this post for the record, even though I’ve completely changed my mind.
I recently upgraded to a T-Mobile G1 (a.k.a. HTC Dream), running Android. The G1 is a very nice and functional device. It’s also compact and decent looking, but perhaps not quite a fashion statement: unlike the iPhone my girlfriend got last year, which was immediately recognizable and a stare magnet, I pretty much have to slap people in the face with the G1 to make them look at it. Also, battery life is acceptable, but just barely. But this post is not about the G1; it’s about Android, which is Google’s Linux-based, open-source mobile application platform.
I’ll start with some light comments, by one of the greatest entertainers out there today: Monkey Boy made fun of the iPhone in January, stating that “Apple is selling zero phones a year”. Now he’s making similar remarks about Android, summarized by his eloquent “blah dee blah dee blah” argument. Less than a year after that interview, the iPhone is ahead of Windows Mobile in worldwide market share of smartphone operating systems (7M versus 5.5M devices). Yep, this guy sure knows how to entertain—even if he makes a fool of himself and Microsoft.
Furthermore, Monkey Boy said that “if I went to my shareholder meeting […] and said, hey, we’ve just launched a new product that has no revenue model! […] I’m not sure that my investors would take that very well. But that’s kind of what Google’s telling their investors about Android.” Even if this were true, perhaps no revenue model is better than a simian model.
Anyway, someone from Microsoft should really know better—and quite likely he does, but can’t really say it out loud. There are some obvious parallels between Microsoft MS-DOS and Google Android:
What about advice for CS teachers and professors?
That it’s time for us to start being more honest with ourselves about what our field is and how we should approach teaching it. Personally, I think that if we had named the field “Information Engineering” as opposed to “Computer Science,” we would have had a better culture for the discipline. For example, CS departments are notorious for not instilling concepts like testing and validation the way many other engineering disciplines do.
Is there anything you wish someone had told you before you began your own studies?
Just that being technically strong is only one aspect of an education.
Alice has proven phenomenally successful at teaching young women, in particular, to program. What else should we be doing to get more women engaged in computer science?
Well, it’s important to note that Alice works for both women and men. I think female-specific “approaches” can be dangerous for lots of reasons, but approaches like Alice, which focus on activities like storytelling, work across gender, age, and cultural background. It’s something very fundamental to want to tell stories. And Caitlin Kelleher’s dissertation did a fantastic job of showing just how powerful that approach is.
The interview was conducted a few weeks before his death. I’ll just say that, somehow, I suspect someone not in his position would never have said at least one of these things. It’s a sad thought, but Randy’s message is, as always, positive.
Early this week we moved from White Plains to Manhattan. So far, we’ve decorated the apartment using organic landscape elements, in harmony with the surrounding environment. Here is what I mean:
On the left is the view outside the window and on the right is what you see inside.
We recently signed a lease to rent on the UES. Besides the usual credit check, most places in NYC ask for a slew of personal information: bank statements (with balances and account numbers), federal tax return and W-2 copies, a letter of employment stating yearly salary, and three character reference letters. (As for the landlord, I only know her name.)
I’m told that managed buildings may skip some of these, but the apartment we found is in a condominium. Even though the landlord had already approved us, our broker prepared all the paperwork to a tee for the upcoming condo board review.
He even sent us some anonymized character reference letter samples. Some were quite amusing. For example (emphasis mine):
[…] I have always found him to be serious and responsible about his works [sic] and his private life. His home life is extremely quiet, and I would think ideal for his neighbors. Virtually all of his social gatherings are conducted in restaurants. He travels throughout nine months of the year and would probably be at home for only short periods of time between those trips. And quite frankly, his time at home is usually spent resting as part of his recovery from his traveling and preparation for his next trip. He is just the kind of quiet, unobtrusive neighbor that I would like to have.
I couldn’t help but wonder how many boards went through letters like this one. For a moment or two, I entertained the thought of asking a friend to write a pithy one-liner instead:
Spiros = corpse – odor + money ⇒ Spiros = dream tenant !
but I eventually decided that the “⇒” notation might be too much and dropped the idea altogether.
I just hope those sample letters do not really reflect life in NYC!
“The combine harvester, […] is a machine that combines the tasks of harvesting, threshing and cleaning grain crops.” If you have acres upon acres of wheat and want to separate the grain from the chaff, a group of combines is what you really want. If you have a bonsai tree and want to trim it, a harvester may be less than ideal.
MapReduce is like a pack of harvesters, well-suited for weeding through huge volumes of data residing on a distributed storage system. However, a lot of machine learning work is more akin to trimming bonsai into elaborate patterns. Conversely, it’s not uncommon to see trimmers used to harvest a wheat field. Well-established and respected researchers, as recently as this year, write in their paper “Planetary Scale Views on a Large Instant-messaging Network”:
We gathered data for 30 days of June 2006. Each day yielded about 150 gigabytes of compressed text logs (4.5 terabytes in total). Copying the data to a dedicated eight-processor server with 32 gigabytes of memory took 12 hours. Our log-parsing system employed a pipeline of four threads that parse the data in parallel, collapse the session join/leave events into sets of conversations, and save the data in a compact compressed binary format. This process compressed the data down to 45 gigabytes per day. Processing the data took an additional 4 to 5 hours per day.
Doing the math, that’s at least five full days of processing to parse and compress the data on a beast of a machine. Even more surprisingly, I found this exact quote singled out among all the interesting results in the paper! Let me make clear that I’m not criticizing the study; in fact, both the dataset and the exploratory analysis are interesting in many ways. However, it is somewhat surprising that, at least among the research community, such a statement is still treated more like a badge of honor than an admission of masochism.
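For the record, here is the back-of-the-envelope arithmetic as a minimal sketch. It uses only the figures in the quote (30 days of logs, 4 to 5 hours of processing per day of logs). The function name is made up for illustration, and treating the 12-hour copy as a single one-off cost for the whole dataset is my assumption; the quote does not say.

```python
# Figures taken from the quoted passage.
DAYS_OF_LOGS = 30
COPY_HOURS = 12  # assumption: one 12-hour copy covers the whole dataset


def wall_clock_days(hours_per_log_day, include_copy=False):
    """Total time, in days, to process all 30 days of logs.

    hours_per_log_day: processing hours needed for one day of logs
    include_copy: whether to add the (assumed one-off) copy time
    """
    hours = DAYS_OF_LOGS * hours_per_log_day
    if include_copy:
        hours += COPY_HOURS
    return hours / 24


if __name__ == "__main__":
    for hours in (4, 5):
        print(hours, wall_clock_days(hours), wall_clock_days(hours, include_copy=True))
```

At 4 hours per day of logs this gives exactly five days, and at 5 hours it gives a bit over six, before counting the copy.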
The authors should be applauded for their effort. Me, I’m an impatient sod. Wait one day for the results, I think I can do that. Two days, what the heck. But five? For an exploratory statistical analysis? I’d be long gone before that. And what if I found a serious bug halfway down the road? That’s after more than two days of waiting, in case you weren’t counting. Or what if I decided I needed a minor modification to extract some other statistic? Wait another five days? Call me a Matlab-spoiled brat, but forget what I said just now about waiting one day. I changed my mind already. A few hours, tops. But we need a lot more studies like this. Consequently, we need the tools to facilitate them.