Who owns my data? What really matters is who controls it
Friday, April 25th, 2008 By Laurent Liscia and Pierre-Loic Assayag
SUMMARY
What is Traackr’s goal?
Our goal is to collect data from a variety of sites in order to calculate a user’s buzz, popularity and reach (which are proprietary definitions). Today this data is not readily available: it requires ad hoc extraction processes that are either inconvenient for the user or demand considerable processing logic. Making it available, however, will raise a host of issues, and possibly user revolt, unless users are put in charge of their own data.
Types of user data
What kind of data are we looking at?
- Data posted by the user (such as audio, video, and text content)
- Data posted by other users about the user
- Additional data resulting from a remote site computation: for instance, an Amazon reviewer ranking, or a LinkedIn number of connections
Our argument is that this data should be readily available to third parties, not just Traackr. The data should be interoperable (i.e., understandable, parsable, usable) from one social site to the next, and by other sites that rely on it for computational purposes.
This poses three very clear societal issues: privacy, validity, and ownership. Do I hear you groaning? No, really, you need to read this.
Privacy
First, let us point you to a resource page overflowing with brilliant and generally pessimistic thoughts about the whittling away of our privacy by the Internet:
http://digitalenterprise.org/privacy/privacy.html
There are two schools of thought on this: either the industry should regulate itself, based on user feedback or revolt, as the case may be; or a governing body should regulate the industry. The first camp is winning out: in these Homeland Security-flavored times, there seems to be no desire on anyone’s part to genuinely protect individual privacy.
Sure, there are industry guidelines defined by the Federal Trade Commission that stress the need to notify users, give them a choice (opt out), ensure their participation in the privacy process, secure their data, and institute an enforcement mechanism in case a violation occurs. Other feeble attempts include the W3C’s P3P platform (a standard for privacy policies) and the OPA’s privacy guidelines.
In our view, the lack of a more stringent framework is a class-action suit waiting to happen, possibly one brought by the deep and wonderful minds at the Electronic Frontier Foundation.
Validity
While privacy is a societal issue, validity is a social one. (It’s nice to split hairs from time to time.) What we mean is: how do I know you are who you say you are online? With word of mouth accounting for a sizable share of purchases, and online WOM accounting for an increasing portion of total WOM, how can I trust that another user’s product review is genuine and not, say, a post from the manufacturer? (See our blog post on user reviews.)
There are interesting but isolated initiatives to answer the question. One is Amazon’s “Real Name”, by which users agree to authenticate themselves, and take responsibility for everything they post on Amazon. The upside is that they also create a trusted micro-brand in doing so. Another initiative under way is OASIS’ recently announced online reputation standards.
This feels right: Internet users should have responsibilities corresponding to their rights. In Traackr’s influence paradigm, your incentive for posting responsibly is that you increase the value of your micro-brand.
Of course, there will be cases where the value of your brand grows with each new inflammatory statement; but let’s not go down that path today.
Ownership and control
Data ownership is the economic crux of the data conundrum, and in our opinion, the way it will be resolved. If our prophecy proves to be wrong, you can always nail us to the wall; or ask us to buy you a beer.
If you ask Facebook who owns the data that appears on your Facebook profile, they will say they do. If you demand that they remove it, they will simply tell you to delete your account; but they will hold on to the data. If we ask you who owns this data, you will say you do. Who’s right? Both parties. In Traackr’s view, you have the right, and in fact the responsibility, to manage your public data as a brand. If this sounds crazy, just ask yourself why you dress up to go to work every morning: because you have an image that you must uphold with your co-workers. You watch what you say, you measure how you respond to people, you evaluate what you did and sometimes circle back: every day, you are managing your micro-brand.
You need to do this online because, increasingly, employers and marketers are trying to find out who and what you are. Why would you not want to control this data? Control, in our opinion, is more important than ownership. It’s Facebook’s business to own some of your data. They need it to make a buck. But it’s your job to control it. That’s why we created a tool that lets you aggregate all your data feeds and assess how your online brand is doing. The good news is that there’s tremendous value for you in this: you can constantly monitor, manage and improve your image via Traackr.
TECHNICAL ISSUES
How are people getting to your data today?
What we and other people are doing right now:
- Requiring passwords to access user accounts
- Screen-scraping/parsing of result pages (such as LinkedIn or Facebook profile pages for instance, based on the user’s first and last name or nickname)
- Aggregating search results and using logic to weed out irrelevant content (ZoomInfo, Buzz)
What could be done
There are roughly four approaches to collecting user data; these are not new, and have been around since the beginnings of the Internet in various guises.
A. A browser-based approach
This would implement a sort of super-cookie, where something like AutoFill would allow the user to control exactly how much information is dispensed on any given site, along with a recap of the default profile information the user is willing to surrender. This could be changed on the fly. This approach is probably the most straightforward, but also the most limited, as it only pertains to data that the user controls and wouldn’t include, for example, what others say about the user.
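To make the idea concrete, here is a minimal sketch of such a per-site disclosure filter. It is only an illustration under stated assumptions: the class name `DisclosurePolicy`, the field names, and the example.com sites are all hypothetical, and no real browser feature is being described.

```python
from dataclasses import dataclass, field

# Hypothetical default profile: fields the user is willing to share anywhere.
DEFAULT_SHARED = {"nickname", "country"}


@dataclass
class DisclosurePolicy:
    """Per-site rules for which profile fields may be released (illustrative)."""
    default_fields: set = field(default_factory=lambda: set(DEFAULT_SHARED))
    site_overrides: dict = field(default_factory=dict)  # site -> set of fields

    def allow(self, site, fields):
        """Change, on the fly, what a given site may receive."""
        self.site_overrides[site] = set(fields)

    def release(self, site, profile):
        """Return only the profile fields permitted for this site."""
        allowed = self.site_overrides.get(site, self.default_fields)
        return {k: v for k, v in profile.items() if k in allowed}


# Made-up profile data for the example.
profile = {
    "nickname": "jdoe",
    "country": "US",
    "email": "jdoe@example.com",
    "employer": "Acme",
}

policy = DisclosurePolicy()
policy.allow("jobs.example.com", {"nickname", "email", "employer"})

shared_with_jobs = policy.release("jobs.example.com", profile)
shared_by_default = policy.release("shop.example.com", profile)
```

A site with no override only ever sees the default fields, which mirrors the "default profile" recap described above. Note that the filter can only cover data the user holds, which is exactly the limitation of this approach.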
B. APIs for data extraction
This is what most sites are doing. LinkedIn will be next, under pressure from Facebook.
C. A new XML standard for social networking
This feels like a logical extension of the previous point. Major players would get together and agree that content would be formatted according to this schema. There would be some room for variability in the schema, but anything specific to the site would be implemented in the schema, and divulged to the standards partnership.
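As a rough illustration of what a shared schema with room for site-specific variation might look like, the sketch below builds and reads a small XML profile using Python's standard library. Every element and attribute name here (`socialProfile`, `userId`, `connections`, `siteExtensions`, `extension`) is an assumption invented for the example; no such standard exists in this form.

```python
import xml.etree.ElementTree as ET


def build_profile(user_id, site, connections, extensions=None):
    """Serialize a profile: common fields first, site-specific data last."""
    root = ET.Element("socialProfile", {"site": site})
    ET.SubElement(root, "userId").text = user_id
    ET.SubElement(root, "connections").text = str(connections)
    # Anything specific to one site lives in a clearly marked section.
    ext = ET.SubElement(root, "siteExtensions")
    for name, value in (extensions or {}).items():
        ET.SubElement(ext, "extension", {"name": name}).text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_common_fields(xml_text):
    """Read only the shared fields, ignoring site-specific extensions."""
    root = ET.fromstring(xml_text)
    return {
        "site": root.get("site"),
        "userId": root.findtext("userId"),
        "connections": int(root.findtext("connections")),
    }


# Made-up data: a review site that adds its own reviewer ranking.
doc = build_profile("jdoe", "reviews.example.com", 120, {"reviewerRank": 42})
common = read_common_fields(doc)
```

The point of the design is that a third party can call something like `read_common_fields` on a document from any participating site without knowing that site's extensions, while the extensions remain available to anyone who has been told what they mean.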
D. A de facto standard that will emerge due to market leadership
For instance, a data format Google might impose.
The Traackr project is a recognition that there needs to be such a standard, in whatever way and shape it emerges. Our contention is that this standard will empower users to share only what they want to share, when they want to share it, and in a way that is much more effective than it is today. Of course, there will be pluses on the marketing side as well; but one might argue that if an ad is truly well targeted (i.e., nano-targeted), then it is no longer an ad, but valuable information to the purchaser. Additionally, the ability for users to parlay their online data into an influence score, or a sphere of influence, will create huge opportunities for both marketers and consumers.
