{"id":18,"date":"2008-06-25T01:09:49","date_gmt":"2008-06-25T06:09:49","guid":{"rendered":"http:\/\/www.bitquill.net\/blog\/?p=18"},"modified":"2008-11-22T12:35:00","modified_gmt":"2008-11-22T17:35:00","slug":"the-bless-of-dimensionality","status":"publish","type":"post","link":"http:\/\/bitquill.net\/blog\/the-bless-of-dimensionality\/","title":{"rendered":"The bless of dimensionality"},"content":{"rendered":"<p>The cover story in Wired by Chris Anderson, &#8220;<a href=\"http:\/\/www.wired.com\/science\/discoveries\/magazine\/16-07\/pb_theory\">The End of Theory<\/a>&#8221;, relies on a silent assumption, which may be obvious but is still worth stating. The reason that such a &#8220;petabyte approach&#8221; works is that reality occupies only a tiny fraction of the space of all possibilities. For example, the human genome consists of about three billion base pairs. However, not every billion-length string of four symbols corresponds to a viable organism, much less an existing one or a human individual. In other words, the intrinsic dimensionality of your sample (the human population) is much smaller than the raw dimensionality of the space of possibilities (about 4^3,000,000,000 strings).<\/p>\n<p>I won&#8217;t <a href=\"http:\/\/behind-the-enemy-lines.blogspot.com\/2008\/06\/massive-data-and-end-of-scientific.html\">try to justify<\/a> &#8220;traditional&#8221; models. 
<strong>But I also wouldn&#8217;t go so far as to say that models will disappear, just that many will be increasingly statistical in nature.<\/strong> If you can throw the dice a large enough number of times, it doesn&#8217;t matter whether &#8220;God&#8221; plays them or not. The famous quote by Einstein suggests that <strong>quantum mechanics was originally seen as a cop-out<\/strong> by some: we can&#8217;t find the underlying &#8220;truth&#8221;, so we settle for probability distributions for position and momentum. However, this was only the beginning.<\/p>\n<p>Still, we need models. Going back to the DNA example, I suspect that few people model the genome as a single, huge, billion-length string. That is not a very useful random variable. Chopping it up into pieces with different functional significance and coming up with the appropriate random variables, so one can draw statistical inferences, sounds very much like modeling to me.<\/p>\n<p>Furthermore, hypothesis testing and confidence intervals won&#8217;t go away either. After all, anyone who has taken a course in experimental physics knows that repeating a measurement and calculating confidence intervals based on multiple data points is a fundamental part of the process (and also the main motivating force in the original development of statistics). Now we can collect petabytes of data points. Maybe there is a shift in the balance between theory (in the traditional, <a href=\"http:\/\/en.wikipedia.org\/wiki\/Laplace%27s_demon\">Laplacian sense<\/a>, which I suspect is what the article really refers to) and experiment. But the fundamental principles remain much the same.<\/p>\n<p>So, perhaps more is not fundamentally different after all, and we still need to be careful not to overfit. I&#8217;ll leave you with a quote from &#8220;<a href=\"http:\/\/books.google.com\/books?id=v8ENTFP29tkC\">A 
Random Walk Down Wall Street<\/a>&#8221; by <a href=\"http:\/\/www.princeton.edu\/~bmalkiel\/\">Burt Malkiel<\/a> (emphasis mine):<\/p>\n<blockquote><p>[&#8230;] it&#8217;s sometimes possible to correlate two completely unrelated events. Indeed, Mark Hulbert reports that stock-market researcher David Leinweber <em>found that the indicator most closely correlated with the S&amp;P 500 Index is the volume of butter production in Bangladesh<\/em>.<\/p><\/blockquote>\n<p>Dimensionality may be a blessing, but it can still be a curse sometimes.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The cover story in Wired by Chris Anderson, &#8220;The End of Theory&#8221;, relies on a silent assumption, which may be obvious but is still worth stating. The reason that such a &#8220;petabyte approach&#8221; works is that reality occupies only a tiny fraction of the space of all possibilities. For example, the human genome consists of [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[45],"tags":[58,11,5],"class_list":["post-18","post","type-post","status-publish","format-standard","hentry","category-scitech","tag-opinion","tag-research","tag-science"],"jetpack_publ
icize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p7x9xm-i","jetpack-related-posts":[],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"http:\/\/bitquill.net\/blog\/wp-json\/wp\/v2\/posts\/18","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/bitquill.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/bitquill.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/bitquill.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/bitquill.net\/blog\/wp-json\/wp\/v2\/comments?post=18"}],"version-history":[{"count":0,"href":"http:\/\/bitquill.net\/blog\/wp-json\/wp\/v2\/posts\/18\/revisions"}],"wp:attachment":[{"href":"http:\/\/bitquill.net\/blog\/wp-json\/wp\/v2\/media?parent=18"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/bitquill.net\/blog\/wp-json\/wp\/v2\/categories?post=18"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/bitquill.net\/blog\/wp-json\/wp\/v2\/tags?post=18"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}