I have stopped worrying what can be inferred about me, because I’ve accepted the simple fact that, given enough time (data) and resources, anything can be inferred. Consider, as an example, “location privacy.” A number of approaches rely on adaptively coarsening the detail of reported location (using all sorts of criteria to decide detail, from mobility patterns, to spatial query workload characteristics, etc). For example, instead of revealing my exact location, I can reveal my location at a city-block level. In an area like NYC, this would conflate me with hundreds of other people that happen to be on the same block, but a block-level location is still accurate enough to be useful (e.g., for finding nearby shops and restaurants). This might work if I’m reporting my location just once. However, if I travel from home to work, then my trajectory over a few days, even at a city-block granularity, is likely sufficient to distinguish me from other people. I could perhaps counter this by revealing my location at a city-level or state-level. Then a few days worth of data might not be enough to identify me. However, I often travel and data over a period of, say, a year, would likely be enough to identify me even if location detail is quite coarse. Of course, I could take things to the extreme and just reveal that “I am on planet Earth”. But that’s the same as not publishing my location, since this fact is true for everyone.
If it’s technically possible to infer my identity (given a long enough period of observation, and enough resources and time to piece the various, possibly inaccurate, pieces of information together), someone (with enough patience and resources) will likely do it. Therefore, as the amount of data about me tends to infinity (which, on the Internet, it probably does), the fraction that I have to hide in order to maintain my privacy tends to one: you have long-term privacy only if you never reveal anything. There are various ways of not revealing anything. One is to simply not do it. Another might be to keep it to yourself and never put it in any digital media. Yet another might be encrypting the information.
However, not revealing anything isn’t really a solution (if a tree falls in the forest and nobody hears it… the tree has privacy, I guess). There is an alternative, of course: precise access control. Your privacy can be safeguarded by a centralized, trusted gatekeeper that controls all access to data. This leads to something of a paradox: guaranteeing privacy (access control) implies zero privacy from the trusted gatekeeper: they (have to) know and control everything. Many people are still confused about this. For example, a form of this dichotomy can be seen in peoples’ reactions towards Facebook: on one hand, people complain about giving Facebook complete control and ownership of their data, but they also complain when Facebook essentially gives up that control by making something “public” in one way or another. [Note: there is the valid issue of Facebook changing its promises here, but that’s not my point—people post certain information on Facebook and not on, say, Twitter or the “open web” precisely because they believe that Facebook guarantees them access control which, by the way, is a very tall order, leading to confusion on all sides, as I hope to convince you.]
Although I learned not to worry about what can be inferred about me, I am perhaps somewhat worried about knowing who is accessing my data (and making inferences), and how they are using it. Particularly if this is done by parties that have far more resources and determination than myself. However, who uses my information and how is also another piece of information (data) itself. Although everything is information, there seems to be an asymmetry: when my information is revealed and used, it may be called “intelligence”, but when the information that it was used is revealed, it may be called “whistleblowing” or even “treason“. This asymmetry does not seem to have any technical grounding—one might make valid arguments on political, legal, moral, etc grounds, but not on technical grounds. Seen in this context, Zuckerberg’s calls for “more transparency” make perfect sense—he’s calling for less asymmetry.
More generally, privacy does not really seem to be a technical problem, much like DRM isn’t really a technical problem. That privacy can be guaranteed by technical means seems to be a delusion and, perhaps, a dangerous one, because it gives a false sense of security. Privacy is, for the most part, a social, political and legal problem about how data can be used (any and all data!) and by whom. The apparent technical infeasibility of privacy had led me to believe that people will, eventually, get over the idea. After all, privacy is a 200-300 year old concept (at least in the western world; interestingly, Greek did not have a corresponding word until very recently). I may have missed something obvious, however: if privacy is attainable via a centralized, trusted gatekeeper, then perhaps privacy is the “killer app” for centralization and “walled gardens”. “I want full control over your data” is tougher to sell than “I want to protect your privacy”. Which is why Eric Schmidt’s recent backpedaling is somewhat worrying, even if the goal is noble (and there currently isn’t any evidence to believe otherwise).
I don’t think there are any (technical) solutions to privacy. Also, enforcing transparency is perhaps almost as hard as enforcing privacy, although I have slightly more hope for the former—but that’s a separate discussion. Privacy is cat-and-mouse game, much like “piracy” and DRM. However, our expectations should be tempered by the reality of near-zero-cost transmission, collection, and storage of “inifinitely” growing amounts of information, and we should perhaps re-examine existing notions of privacy under this light. I find that many non-technical people are still surprised when I explain the simple example in the opening paragraph, even though they consider it obvious in retrospect.
Personally, I find it safer to just assume that I have no privacy. Saves me the aggravation.