Towards laws of the 3D printable design web

With the explosive growth of 3D printing, and of rapid manufacturing at the consumer level in general, physical objects can now be designed and manipulated in a computer. However, as with other forms of digital content (e.g., documents, software, music), this is only part of the story: digital representation also enables online sharing and collaboration (as Chris Anderson has pointed out). A prime example of the potential of these technologies combined with online sharing and collaboration is the design of consumer-grade 3D printers themselves, which, perhaps unsurprisingly, is what many early adopters used the technology for. Considering that the rest of us are where those early adopters were five or more years ago, the future should be interesting.

Although we hear about 3D printing daily, very few studies have looked at the digital content of physical things and the processes that generate it. I collected data some time ago and started off with this visualization, which I wrote about before. An initial analysis of the data has some interesting stories to tell.



Research and new media: the academic clowd

I have a little secret: Slashdot may have lost its lustre now, but back in 2001, shortly after returning from my refreshing internship at Almaden, I posted a question to “Ask Slashdot” for the first and last time. I posed the question rather poorly and was ignored. Although I could not find exactly what I wrote back then, it was something along the lines of “why aren’t academic venues more like SourceForge?” You have to remember that this was the early 2000s, when large and transparent user communities existed only in the technical sphere, and things like SourceForge were the prototypical sites for focused online communities. So why couldn’t academia and the research community open things up a bit more, and leverage new media to set up virtual forums for lively, world-wide discussions and collaborations?

Fast-forward seven years. I got a feeling of deja vu when I saw two recent blog posts and a Slashdot post. The first two question specific aspects of current publishing practices, while the “Ask Slashdot” post wonders whether academic journals are obsolete. The technologies and media have changed dramatically since then, but the essence remains the same.

Going over the comments on Slashdot, I found some surprisingly (for Slashdot) insightful ones, but also one fundamental misconception, and I was genuinely surprised at its prevalence. Many commenters seem to identify the general notion of “peer evaluation” with the specific mechanisms currently employed to implement it. Is the current way of doing things so deeply entrenched that people are blind to other possibilities?

Quoting a random vicious comment: “The purpose of restricting published work to that which has passed peer review is to ensure that results do not become obsolete. They must uphold the same quality standards that we expect from all scientific disciplines—not blog-style fads that have become popular and at some stage will cease to be popular.” I wonder if the commenter has ever written a blog himself, or whether he has even taken a look at, say, Technorati: there are over four million blogs out there and 99% have just one reader (the author). Very few blogs are popular (i.e., actually read by a significant number of people). An explosion in the quantity of published content does not imply a proportional explosion in its consumption; quite the contrary. If anything, there is more competition for attention, not less.

Another commenter said that “there isn’t any direct communication between reviewers and submitters.” Not so. Take a look at Julian Besag’s “On the Statistical Analysis of Dirty Pictures” (unfortunately JSTOR is restricted-access, but maybe your institution has a subscription), published in the Journal of the Royal Statistical Society as recently as 1986. The actual paper is 21 pages, while the other 23 pages are devoted to an open discussion. This feels oddly familiar (deja vu again): it resembles very popular blogs, whose comment sections are often larger than the original posts. A free and open discussion of ideas has always been an organic part of the research process. A few centuries ago, scientific articles appeared with the date on which they were “read” to the community (just take a look at, e.g., an issue of the Philosophical Transactions of the Royal Society).

Research on the web

Reaching far out into the long tail of ideas, which I also discussed in a previous post, should arguably be a top priority for research. In other endeavors it is an important means to success (financial or otherwise), but in academia and the research community it is usually an end in itself. The web itself was originally conceived as a venue for the exchange of scientific ideas, but even its creators probably did not envision its full potential or realize all the implications of democratizing publication.

Modern technology allows more researchers (whether they work for startups, academic institutions, or large corporations) to try out more ideas. In other words, the production of research output is scaling up to unprecedented levels. However, I strongly suspect that traditional ways for evaluating research will not scale for much longer, being unable to keep up with the explosive growth in the rate of new ideas.

The typical process for evaluating and disseminating research—at least in computer science, with which I am familiar—seems to be the following (with perhaps a few exceptions). First you come up with an interesting idea. Next, you build a story around it and do the minimal work to support that story. If everything works out, you write it up and submit it to a conference or, more rarely, a journal. On average, three people (chosen largely at random) review your work, making some comments in private. Once your work is published, you move on to the next paper.

I would name two artifacts as the main “products” of computer science research: papers and software. The latter is often overlooked, but it’s at least as important as the former. Anyway, what might be the state-of-the-art media for each of those artifacts?

There are some well-known efforts to use the web for the former. For example, there is arXiv for physics and sciences, CoRR for computer science, and PLOS for life sciences. There is also VideoLectures for open access to some talks. All of these, however, largely mirror the established ways of doing things: they are still built using the paradigm of a “library”. Although very important steps in the right direction, they perhaps play second fiddle to traditional media (there is a reason that arXiv is called a “pre-print server”) and thus fail to fully realize the potential offered by the rapidly emerging social media.

Things are perhaps a little more advanced for software artifacts. There are SourceForge, Google Code, and countless other similar sites for hosting source code, tracking issues, and holding online discussions. There are also Freshmeat, Ohloh, and other project directories, as well as source code search engines such as Koders. However, none of these (or, as far as I know, anything similar) has been widely embraced by the research community.

Enough about today. It is more interesting to try and imagine how all these things, and more, may come together in the future.

The academic clowd scenario

Shamelessly copying this post, let’s imagine the academic clowd (cloud + crowd).

You have a great new idea and decide to try it out. You write a proof-of-concept implementation and run it on the cloud, using large datasets that also live out there. The implementation itself is available to the clowd, which can analyze the revision control logs and find out who really worked on what.
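
Just to make the “who really worked on what” part a bit more concrete, here is a toy sketch (mine, not a description of any existing service) that tallies activity per author and per top-level directory, using nothing more than git’s own history:

```python
# Toy sketch: attribute activity per (author, top-level directory) from a
# git history. The repository path is a placeholder; any checkout will do.

import subprocess
from collections import Counter

def contributions(repo_path="."):
    """Count file changes per (author, top-level directory) pair."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--name-only", "--pretty=format:@@%an"],
        capture_output=True, text=True, check=True).stdout

    counts = Counter()
    author = None
    for line in log.splitlines():
        if line.startswith("@@"):        # commit boundary: remember the author
            author = line[2:]
        elif line.strip() and author:    # a file touched by that commit
            top_dir = line.split("/", 1)[0]
            counts[(author, top_dir)] += 1
    return counts

for (author, area), n in contributions(".").most_common(10):
    print(f"{author:25s} {area:20s} {n}")
```

Real attribution would obviously need to be smarter (lines changed, merges, vendored code, and so on), but even this crude view is more than any reviewer sees today.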

Your idea works and you decide to write a research article about it. The clowd knows what papers you wrote, who your co-authors are, and which conferences and journals you publish in (cf. DBLP). It also knows the content of your papers (cf. CiteSeer). So, when you publish your new article, it compares it with the existing literature and finds the most relevant experts (in terms of content, co-citations, venues of publication, etc.) to evaluate your work. It knows who your close friends and relatives are (from Facebook) and automatically excludes them from the list of potential reviewers. It also excludes your co-authors from the past three years. Then, it solicits reviews from those experts. Of course, it also allows others who are interested to participate in the discussion.
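
Purely for illustration, the reviewer-selection step might look something like the sketch below; the relevance scores, social ties, and co-authorship records are made-up placeholders, not real APIs:

```python
# Hypothetical sketch of conflict-aware reviewer selection. All inputs are
# placeholders: relevance scores (from content similarity), social ties
# (from a social network), and a co-authorship history with years.

def select_reviewers(relevance, author, social_ties, coauthorships,
                     current_year, num_reviewers=3, coi_years=3):
    """Return the most relevant experts, excluding conflicts of interest."""
    cutoff = current_year - coi_years
    conflicted = set(social_ties.get(author, set()))      # friends, relatives
    conflicted |= {c for a, c, year in coauthorships      # recent co-authors
                   if a == author and year >= cutoff}

    eligible = [(score, expert) for expert, score in relevance.items()
                if expert != author and expert not in conflicted]
    eligible.sort(reverse=True)                           # most relevant first
    return [expert for _, expert in eligible[:num_reviewers]]

# Toy example:
relevance = {"alice": 0.92, "bob": 0.88, "carol": 0.75, "dave": 0.60}
social_ties = {"you": {"bob"}}                            # e.g., from Facebook
coauthorships = [("you", "alice", 2007), ("you", "carol", 2001)]
print(select_reviewers(relevance, "you", social_ties, coauthorships, 2008))
# -> ['carol', 'dave']  (alice and bob are excluded as conflicts)
```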

Like the original paper itself, all review comments are public and can be moderated (say, similar to Digg or Slashdot, but perhaps in a more principled and civilized manner). Thus, the review comments are ranked for their correctness, originality, and usefulness, and these rankings propagate to the papers they refer to.
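
Again, just as an illustration of the propagation idea (my own toy formula, not a proposal anyone has implemented): a paper’s standing could be a moderation-weighted aggregate of the verdicts in its reviews, so that a few well-regarded reviews outweigh many low-effort ones:

```python
# Toy propagation rule: each review carries (moderation score, verdict on the
# paper), both in [0, 1]; the paper inherits a weighted average of verdicts.

def paper_score(reviews, baseline=0.5):
    """Aggregate moderated reviews into a single paper score."""
    if not reviews:
        return baseline                  # no reviews yet: neutral prior
    total_weight = sum(moderation for moderation, _ in reviews)
    if total_weight == 0:
        return baseline                  # only zero-weight reviews
    return sum(m * v for m, v in reviews) / total_weight

# Two well-moderated positive reviews and one poorly-moderated negative one:
print(round(paper_score([(0.9, 0.8), (0.8, 0.9), (0.2, 0.1)]), 2))  # 0.77
```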

You present your work in public and the video of your lecture is on the clowd, exposing you to a much larger audience. Anyone can also comment on it and respond to it. The videos are linked to each other, as well as to the articles and to the implementations. They are organized into thousands of “virtual research tracks” with several tens of talks each. “Best of” virtual conference compilations appear on the clowd.

Rising papers and their authors get introduced to each other by the clowd. You can easily find ten potential new collaborators with mutual interests. You try out more things together, write more articles, and so on… until one day you all save the world together (well, maybe not, but it would be nice! :-).

So, what will the future really look like?

Well, who knows? I’m pretty sure the above scenario will seem as ridiculous in ten years as the SourceForge ideal looks today (what was I thinking back then?). Nonetheless, I believe it should be part of the current vision for research. I don’t think that the web and social media will lead to less selection via peer evaluation. Quite the contrary. Nor do I think that they will lead to less elitism. This follows from simple math. Taking the simplistic but common measure of “acceptance ratio”, the numerator cannot grow much, because people’s capacity to absorb information will not grow much either. But if the potential to produce published content makes the denominator grow without bound, then the ratio has to approach zero. Methods for evaluating research output need to scale up to this level of filtering, and I simply don’t think that the current way of evaluating research can achieve this.
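
To spell the arithmetic out in symbols: if the number of papers the community can actually absorb per year is bounded by some constant A_max, while the number of published (or publishable) items S keeps growing, then

\[
\text{acceptance ratio} \;=\; \frac{A}{S} \;\le\; \frac{A_{\max}}{S} \;\longrightarrow\; 0
\quad \text{as } S \to \infty,
\]

so whatever mechanism does the selecting has to discard an ever larger fraction of what is produced.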


The shift from private to public channels of information

Many discussions about privacy these days obsess over the shifting balance between public and private channels of information, while missing the real issues and opportunities.

The information landscape is unquestionably changing. We are experiencing the emergence and rapid proliferation of social media, such as instant messaging (e.g., IRC, Jabber et al., AIM, MSN, Skype), sharing sites (e.g., Flickr, Picasa, YouTube, Plaxo), blogs (e.g., Blogger, WordPress, LiveJournal) and forums (e.g., Epinions), wikis (e.g., Wikipedia, PBWiki), microblogs (e.g., Twitter, Jaiku), social networks (e.g., MySpace, Facebook, Ning), and so on. Also, much financial information (e.g., your bank’s website or Quicken) as well as health records are or soon will be online.

A rather obvious distinction is between public vs. private channels of information or content:

  • In public channels, the default policy on data sharing is “opt-in”.
  • In private channels, the default is “opt-out” (along with some, hopefully enforceable, guarantees that this is the case).

Most people, at least of a certain age, take the private, “opt-out” default for granted. However, this is changing. Just a couple of decades back, schoolchildren would keep journals (you know, those with a little lock and “Hello Kitty” or “Transformers” on the cover). These days they are on MySpace and Twitter, and they do not assume “opt-out” is the default. Quoting from the article “The Talk of Town: You” (subscriber-only access) in the MIT Technology Review:

New York‘s reporter made a big deal about how “the kids” made her “feel very, very old.” Not only did they casually accept that the record of their lives could be Googled by anyone at any time, but they also tended to think of themselves as having an audience. Some even considered their elders’ expectations about privacy to be a weird, old-fogey thing—a narcissistic hang-up.

Said differently, an increasing fraction of content is produced in public rather than private channels, and “opt-in” is becoming the norm rather than the exception. Social aggregation sites, such as Profilactic, are a step towards easy access to this corpus. Despite some alarmism about blogs, Twitter, MySpace profiles, etc., all this information is, by definition, in public channels. Perhaps soon 99% of information will be in public channels.

So, which information channels should be perceived as public? Many people have a knee-jerk reaction when it comes to thinking of what should be private. For example, this blog is clearly a public channel. But how about your health records? In an interesting opinion piece about making health records public, most commenters expressed a fear of being denied health coverage by an insurance company. However, this is more an indication of a broken healthcare system than of a problem with making this data public. Most countries (the U.S. included) are behind in this area, but others (such as the Scandinavians or Koreans) are making important steps forward. Now, how about your financial records? For example, credit reporting already relies on aggregation and analysis of publicly available data. How about your company’s financial records? Or your phone-call records? Or the images of you captured by surveillance cameras? The list can go on forever.

We should avoid that knee-jerk reaction and carefully consider what can be gained by moving to public channels, as well as what technology and regulation are required to make this work. The benefits can be substantial; for example, the success of the open source movement is largely due to its switch to public, transparent channels of communication, as well as open standards. Openness is usually a good thing.

Even in the enterprise world of grownups, tools such as SmallBlue (a.k.a. Atlas) are effectively changing the nature of intra-company email from a private to a (partially) public channel. The alternative would be to establish new public channels and favor their use over the older, “traditional” (and usually private) channels. Both approaches are effectively equivalent.

Moreover, how should we deal with the information in private channels? The danger with private channels arises when privacy is breached. If that happens, not only have you been operating under a false sense of security, but you may also have a very hard time proving that the breach occurred. In public channels, by contrast, the very notion of a “breach” is meaningless. In that sense, public channels are a safer option and should be carefully considered.

Even when the data itself is private, who is accessing it and for what purpose should be public information. The MIT TR article goes on to quote David Brin’s opinion that

“[…] our only real choice is between a society that offers the illusion of privacy, by restricting the power of surveillance to those in power, and one where the masses have it too.”

The need for full transparency on how data are used is more pressing than ever. Ensuring that individuals’ rights are not violated requires less secrecy, not more. A recent CACM article by a gang of CS authority figures makes a similar case (although their proposal for a heavyweight, ontology-based scheme covering all data out there is somewhat dubious; it might make sense for the 1% niche of truly sensitive data, though). Interestingly, one of their key examples is essentially about health records, and they come to the same conclusion, i.e., that the real problem is inappropriate use of the data.

I actually look forward to the day I’ll be able to type “creator:spapadim@bitquill.net” into Google (or any other search engine) and find all the content I have ever produced. Going one step further, I’d also like to find the “list of citations” (i.e., all the content that referenced or used my data), just as I can for my research papers on Google Scholar, or for posts on this blog with trackbacks. Although I cannot grasp all the implications, it would at least mean we’ve addressed most of these issues and the world is a more open, democratic place. McLuhan’s notion of the global village is more relevant than ever, but his doom and gloom is largely misplaced; let’s focus on the positive potential instead.
