Towards laws of the 3D printable design web

With the explosive growth of 3D printing, and rapid manufacturing at the consumer level in general, physical objects can be designed and manipulated in a computer. However, like other forms of digital content (e.g., documents, software, music), this is only part of the story: digital representation also enables online sharing and collaboration (as Chris Anderson has pointed out). A prime example of the potential of all these technologies combined with online sharing and collaboration is the initial design of consumer-grade 3D printers themselves which, perhaps unsurprisingly, was what many early adopters of the technology used it for.  Considering that the rest of us is where those early adopters were five or more years ago, the future should be interesting.

Despite hearing about 3D printing daily, very few studies have looked at the digital content of physical things, and the processes that generate it. I collected data some time ago, and started off with this visualization, which I wrote about before. A further initial analysis of the data has some interesting stories to tell.

Read the rest of this entry »


Thingiverse remix graph: visualizing the net of physical things

I recently became a happy owner of a Solidoodle 2 3D printer. This has been the start of a beautiful addiction, but more on the hardware hacking aspects in another post.

If you haven’t heard of it before, 3D printing refers to a family of manufacturing methods, originally developed for rapid prototyping, the first of which appeared almost three decades ago. Much like mainframe computers in the 1960s, professional 3D printers cost up to hundred thousands of dollars. Starting with the RepRap project a few years ago, home 3D printers are now becoming available, in the few hundred to a couple of thousand dollar price range.  For now, these are targeted mostly to tinkerers, much closer to an Altair or, at best, an Apple II, than a MacBook. Despite the hype that currently surrounds 3D printing, empowering average users to turn bits into atoms (and vice versa) will likely have profound effects, similar to those witnessed when content (music, news, books, etc) went digital, as Chris Anderson eloquently argues with his usual, captivating dramatic flair. Personally, I’m similarly excited about this as I was about “big data” (for lack of a better term) around 2006 and mobile around 2008, so I’ll take this as a good sign. :)

One of the key challenges, however, is finding things to print!  This is crucial for 3D printing to really take off. Learning CAD software and successfully designing 3D objects takes substantial, time, effort, and skill. Affordable 3D scanners (like the ones from MatterformCADscan, and Makerbot) are beginning to appear. However, the most common way to find things is via online sharing of designs. Thingiverse is the most popular online community for “thing” sharing. Thingiverse items are freely available (usually under Creative Commons licenses), but there is also commercial potential: companies like Shapeways offer both manufacturing (using industrial 3D printers and manual post-processing) and marketing services for “thing” designs.

I’ve become a huge fan of Thingiverse.  You can check out my own user profile to find things that I’ve designed myself, or things that I’ve virtually “collected” because I thought they were really cool or useful (or both). Thingiverse is run by MakerBot, which manufactures and sells 3D printers, and needs to help people find things to print. It’s a social networking site centered around “thing” designs. Consequently, the main entities are people (users) and things, and links/relationships revolve around people creating things, people liking things, people downloading and making things, people virtually collecting things, and so on. Other than people-thing relationships, links can also represent people following other people (a-la Twitter or Facebook), and things remixing other things (more on this soon). Each thing also has a number of associated files (polygon meshes for 3D printing, vector paths for lasercutting, original CAD files—anything that’s needed to make the thing).

Read the rest of this entry »


Data harvesting with MapReduce

Combine harvesters
(original image source)

“The combine harvester, […] is a machine that combines the tasks of harvesting, threshing and cleaning grain crops.” If you have acres upon acres of wheat and want to separate the grain from the chaff, a group of combines is what you really want. If you have a bonsai tree and want to trim it, a harvester may be less than ideal.

MapReduce is like a pack of harvesters, well-suited for weeding through a huge volumes of data, residing on a distributed storage system. However, a lot of machine learning work is more akin to trimming bonsai into elaborate patterns. Vice versa, it’s not uncommon to see trimmers used to harvest a wheat field. Well-established and respected researchers, as recently as this year write in their paper “Planetary Scale Views on a Large Instant-messaging Network“:

We gathered data for 30 days of June 2006. Each day yielded about 150 gigabytes of compressed text logs (4.5 terabytes in total). Copying the data to a dedicated eight-processor server with 32 gigabytes of memory took 12 hours. Our log-parsing system employed a pipeline of four threads that parse the data in parallel, collapse the session join/leave events into sets of conversations, and save the data in a compact compressed binary format. This process compressed the data down to 45 gigabytes per day. Processing the data took an additional 4 to 5 hours per day.

Doing the math, that’s five full days of processing to parse and compress the data on a beast of a machine. Even more surprisingly, I found this exact quote singled out among all the interesting results in the paper! Let me make clear that I’m not criticizing the study; in fact, both the dataset and the exploratory analysis are interesting in many ways. However, it is somewhat surprising that, at least among the research community, such a statement is still treated more like a badge of honor rather than an admission of masochism.

The authors should be applauded for their effort. Me, I’m an impatient sod. Wait one day for the results, I think I can do that. Two days, what the heck. But five? For an exploratory statistical analysis? I’d be long gone before that. And what if I found a serious bug half way down the road? That’s after more than two days of waiting, in case you weren’t counting. Or what if I decided I needed a minor modification to extract some other statistic? Wait another five days? Call me a Matlab-spoiled brat, but forget what I said just now about waiting one day. I changed my mind already. A few hours, tops. But we need a lot more studies like this. Consequently, we need the tools to facilitate them.

Read the rest of this entry »

Comments (1)

Data Mining: “I’m feeling lucky” ?

In an informal presentation on MapReduce that I recently gave, I included the following graphic, to summarize the “holy grail” of systems vs. mining:

Systems vs. Data mining

This was originally inspired by a quote that I read sometime ago:

Search is more about systems software than algorithms or relevance tricks.

How often do you click the “lucky” button, instead of “search”? Incidentally, I would be very interested in finding some hard numbers on this (I couldn’t)—but that button must exist for good reason, so a number of people must be using it. Anyway, I believe it’s a safe assumption that “search” gets clicked more often than “lucky” by most people. And when you click “search”, you almost always expect to get something relevant, even if not perfectly so.

In machine learning or data mining, the holy grail is to invent algorithms that “learn from the data” or that “discover the golden nugget of information in the massive rubble of data”. But how often have you taken a random learning algorithm, fed it a random dataset, and expected to get something useful. I’d venture a guess: not very often.

So it doesn’t quite work that way. The usefulness of the results is a function of both the data and the algorithm. That’s common sense: drawing any kind of inference involves both (i) making the right observations, and (ii) using them in the right way. I would argue that in most succesful applications, it’s the data takes center stage, rather than the algorithms. Furthermore, mining aims to develop the analytic algorithms, but systems development is what enables running those algorithms on the appropriate and, often, massive data sets. So, I do not see how the former makes sense without the latter. In research however, we sometimes forget this, and simply pick our favorite hammer and clumsily wield it in the air, ignoring both (i) the data collection and pre-processing step, and (ii) the systems side.

It may be that “I’m feeling lucky” often hits the target (try it, you may be surprised). However, in machine learning and mining research, we sometimes shoot the arrow first, and paint the bullseye around it. There are many reasons for this, but perhaps one stands out. A well-known European academic (from way up north) once said that his government’s funding agency once criticized him for succeeding too often. Now, that’s something rare!