Diaspora Part One

Published on Friday, May 28th 2010. Edited by Rat Outzipape.

I have divided the Diaspora problem domain into the areas below.
This reflects my own research, interests and thoughts.
Why am I interested in this area?
There are several reasons for this interest.
In broad brush strokes -
1. I have long been interested in the Semantic Web.
2. I have been working on commercial applications whose technology intersects strongly with some of the areas below.
3. Open Source remains the best learning playground and there is always more to be learnt.
4. Activity streams (as they are called) provide good test data for work on other concepts that interest me, such as filtering, including algorithms and display techniques.
I am not trying to pre-empt the activity of the core Diaspora group or to second-guess them. I am more interested in seeing where my thoughts about the issues below intersect with how the various problems I outline are resolved.
These problems include issues about protocols, which determine the scope and behaviour of the system under development, as well as development methodology, which determines the human behaviour of the group.
However, in this post I leave off discussion of development methodology. This shouldn't be taken to mean I am harbouring some secret criticism, far from it.
So far the Diaspora core team have raised a fair amount of funds, enough to take their efforts well into 2011. A very fortunate position to be in.
They have also responded, as I expected, to the surge in interest very responsibly.
They have set up a group of advisers, necessary due to the large spend they now have at their disposal and the intense interest in the project this necessarily generates.
Social Networking Solutions
Needless to say, the area is not unknown. There is already a lot of work in it, but it hasn't been brought together into a viable offering quite like what it seems the Diaspora team have in mind.
This discussion, while referring to Diaspora, is in most places equally applicable to GNU Social and some other projects, except that Diaspora is entirely uncharted, allowing greater range for creative speculation. I do not assume that my coverage, or that of my sources, is absolutely comprehensive. I would like this overview to be rounded and in depth though.
The Conflict of Commercialisation
In reading my entries where issues of privacy and security are mentioned it is helpful to bear in mind that there is a gap between user expectations and actual practice. It may be that in some cases that pertain to privacy and security there are tools available that the user does not use, while in other cases there may be no tools, internal mechanisms, standards or laws at all. I have not tried to be absolutely thorough in identifying each instance but present the general picture.
Further background for a picture of what the Social Semantic Web could be can be found in Jeff Sayre's article "A Flock of Twitters: Decentralized Semantic Microblogging". However, my emphasis here is not the same as Jeff's. Jeff Sayre gives a good description of what knowledge streams might be like in terms of the filtering activity any one individual imposes on data they consume and (re-)publish. He offers a conceptual framework for this which is interesting enough. I am much more concerned with what is possible now, without exploring all the detail needed for more advanced use cases.
Extended Discussion of Business Case
Why now? Who else is working in this area?
Before we go any further this is the first question that has to be examined.
Roughly, there are two approaches to entering a complex domain like this. One is just to move in, trying to create a presentable application (from what exists, using a lot of glue) as a best first-cut effort. The other is to study the field and move as precisely as possible into an identified niche.
Commercial Potential for Diaspora
There is no doubt that commercial companies can miss opportunities, especially in such a vast, expanding area. Importantly, they can overlook some specific areas because they conflict with their main efforts.
For any would-be entrant, pinpoint accuracy might be the best weapon in finding that niche.
The following questions must be answered.
1. Is there a commercial opportunity here?
2. Why is this not being done or, actually, is it being done?
It is not possible to give a fully comprehensive overview of the respective business models of the big players.
The pertinent question is whether there is an alternative offering wanted by sufficient numbers that is precluded from being offered by those players by virtue of their business model.
Here I take it as a given that the technological commitments of companies reflect their business model, and that alternative technical solutions only make a company vulnerable if they represent an alternative revenue stream that poses some threat to the existing company.
Is There an Identifiable Alternative Service Offering?
What, actually, do people want? Short on research, and short on time to look at whatever research there may be in the public realm, I am going to posit a hypothesis:-
A Simple Hypothesis Concerning Social Networking and Privacy Management
The hypothesis is that while Social Networking offers its users something that they want, an ability to broadcast and be broadcast to, the means by which this is achieved mimics the feeling of having a non virtual social network. People find this confusing. This confusion manifests itself in various ways which I briefly explore.
The network is called 'virtual', but there is something misleading in that term, as it is a network of real people. What is being examined is the way in which communication in these networks is mediated by the services providing the communication.
I take the following two points as givens:
1. Social networking media are used successfully to influence opinion and buying habits by advertisers using new techniques such as viral advertising and crowd sourcing, based on both mining of user behaviour data and targeting the release of associated information through various channels, including the social networking channel itself, aside from overt advertising. These techniques also include buying rank and position for products, that is in some way making them seem popular, favoured and successful. For example, in general terms this applies to the 'like' button of Facebook.
2. Users of social networking services will project (or imbue) their network and its context with feelings that have to do with themselves and their social network. These feelings are capable of distorting their perceptions of the behaviour and role of their network of 'friends' and its context as a service of the provider.
It remains to be investigated what it is that people actually object to about Facebook at the moment. (But see last section, The Facebook Response, where I briefly look at their immediate response.)
In terms of a set of feelings about FB I wonder which ones dominate. Is it a feeling of powerlessness? If so, why would that be different to feelings about a TV channel or a telephone service provider?
My hypothesis is that the feelings are different to those expressed about other services, and that this is due to points 1. and 2. above, combined with the far less certain and less well known framework of regulatory and normative rules governing internet activity. (That is, the lack of high profile regulation in the area.)
I posit that objections fall into the following categories.
  1. Data mining.
  2. Privacy of the personal.
  3. Data security.
  4. Revocation of statements.
  5. Data assuredness.
  6. Degree of Broadcast Propagation.
  7. Asset mining.
  8. Information gaming.
  9. Information accuracy.

  1. Data mining.
    Where the user goes on a page is monitored, possibly down to fine detail of how long the mouse hovers in a certain area, which buttons are selected and so on. All three aspects of the user journey are monitored:-
    When the user enters an area (landing page), where they have come from (start page) is monitored, and when they leave the landing page, where they go to (target page) is also monitored.
  2. Privacy of the personal.
    As a logged in user there is very little information about the user that might not be associated with this information. What would not, or should not, be associated with this information is the user's name and contact details or, of course, log in details.
    But certainly location, ethnicity, age, sex, sexual preference, marital or partner status, political opinion, economic status and many other details can be harvested (in one way or another, and the methods are growing in sophistication). This information is not associated with a particular user; in fact it is more valuable when associated with a group of people. There are different ways in which such a statistical group may be arrived at. Similar behaviour groups users together into a statistical category group*. However, in the use of FOAF technology there is nothing that stops such a category group being formed from other friends in the network, in other words from information about friends and friends of friends, but all the time anonymously.
    There is now an explosion in the data available to advertisers. The difficulty they now face is deciding what types of data they require for the product or service they are offering. The line between targeted advertising and influencing group thinking has become very much thinner as a result.
    Since many 'applications', that is the games etc. popular on these sites, have the same privileges as the user using them, they too would be able to access otherwise private information that may be introduced to them by the user. They may also take information from the page context in which they are being used.
    * I believe this is what is referred to by Facebook and others as a demographic. See my final post - part six - for an examination of recent moves and statements by Facebook as of 23rd May 2010.
  3. Data security.
    Security is a big word. It is quite clear from the above that data is not secure in that it is mined. Most contexts on a web site like Facebook are available for data harvesting. The way to demonstrate this is to search Google for subjects that have been discussed on Facebook. So the issue here is not just that the data is public (it has been indexed by Google, or anyone else who is interested) but whether the additional information mined, as described, is any more of a security threat to the individual, the subject of the data.
    It can be seen that security really is a concern about two aspects of social networking: one is the way anonymous data is used, the other is whether private data is actually secure (cannot be seen by those who should not see it) and safe (has the duration expected by the user).
    If we accept that anything that would attach a particular user to usage patterns is intended to be kept private, is the system effective in this? The answer to this must be that it is as safe as the provider can make it. This aspect, at least, of web site management is legislated for and monitored. (The Data Protection Act in the UK.)
  4. Revocation of statements.
    Sometimes users may create content that they wish to revoke at some future point. I haven't seen this to be particularly difficult in social media I have used. But there is the caveat that some streams of data are public and contributions to those streams do not retain the rights of the originator (to edit or delete).
  5. Data assuredness.
    Users may want to know that what they have created by way of content will be there in the future. When it comes to Facebook, once the data is there it is difficult to move it elsewhere. It may not be possible to do so in a meaningful way since, in principle, some conversation threads will comprise items that others have rights over, such as the right to delete.
  6. Degree of Broadcast Propagation.
    Users may want a degree of control over how messages are broadcast, such as their depth, extent and duration. Moreover, users may not want publicised content to be republished or amplified.
    It is probable that Facebook does satisfy items 3. through 6. to some extent. (I am not going to become an 'expert' FB user to test this out.)
  7. Asset mining.
    Data gathered in the way outlined above is an asset; that is, different aspects of the data matrix are given value and sold to advertisers. This would be a process much the same as the way that a page of newsprint or a time slot of TV is sold to advertisers. There may be secondary markets as advertisers pre-book or attempt to monopolise certain types of data, by location, time frame or other combination of criteria. While this process is hidden from the user, is there any objection to it? How is it different to more traditional media, such as TV advertising and product placement?
    These questions are quite fundamental.
    For instance, do users want an advert free service, or do they want one that conforms to as yet undefined rules?
    Moreover, just because I use a web site does this mean that how I use it is my data, owned by myself, rather than usage data owned by the service I am using?
    It would not be possible to run an effective service without some usage data, and a great deal that users expect from Facebook would not be available if data access for the purpose of gathering usage data were very restricted.
    This will apply to Diaspora too. They will need to know what works and what doesn't, and how to tweak things. That is best done on the back of usage statistics. It is just that in the case of FB, along with usage statistics there is a superset of data that is gathered expressly for the benefit of advertisers (analytics), which is common practice on many web sites, including Government web sites and, possibly, University web sites.
    If there is an objection here it would be better to be clear what that objection is. For instance, would it be the extent of the analytics gathered, the way they are gathered or the huge population over whom they are gathered?
  8. Information gaming.
    This activity is very similar to product placement. It should be pointed out that only recently have the rules on product placement in the UK been relaxed, and, at that, not to the extent of what is common in the US. Needless to say there are no rules governing this aspect of commercial behaviour on the Web.
    Information gaming is where some category of information has its value or status inflated by some process based on analytics. A typical example would be where favourite tunes are derived from user usage patterns and a music publisher is allowed to associate other works with this result by placing these works alongside the returned result. Clearly this strategy may be more effective in some contexts than in others. For example, it may be found that because of the restricted space on a mobile device more impulsive buying behaviour is induced, leading to purchase of the side by side item.
    I would categorise this as a common advertising ploy and include the way Facebook applications are targeted at particular users as an edge case.
  9. Information accuracy.
    This is a very broad area. It includes anything from masquerading under a false identity to falsifying data.
    Examples would be where a person goes online pretending to be someone who doesn't exist, with an implied connection with someone who does. (I have examples. Is this wanted or unwanted behaviour?) More extreme would be masquerading as another person. This is identity theft and could do a lot of damage to the real person without gaining access to their account. (I have also heard of examples of this in the non virtual world. Legal authorities were uncertain if they had power to act and did nothing for many years. The consequences were very disturbing for the imitated individual.)
    As to falsifying data, I am uncertain what the protection against this might be apart from what happens now, which is that claims get exposed. Perhaps in the virtual world, where claims are very transient, this may be more of a problem, e.g. a music band that polls number one or something?
    I believe there is ample opportunity for data falsification of this nature, that it does happen and is an increasing risk, but my evidence is slight and not at all associated with Facebook.
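To make items 1. and 2. of the list above a little more concrete, here is a minimal sketch in Python of how a three-part user journey (start page, landing page, target page) might be recorded and then rolled up into anonymous groups. Everything in it is an assumption for illustration: the names JourneyEvent and group_by_behaviour, the field names and the sample page names are hypothetical and are not taken from Facebook or from any real analytics system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JourneyEvent:
    """One observed step in a user journey (hypothetical record layout)."""
    user_id: str       # internal identifier, not meant to be published
    start_page: str    # where the user came from
    landing_page: str  # the page being observed
    target_page: str   # where the user went next


def group_by_behaviour(events):
    """Count distinct users per (start, landing, target) path.

    The result maps each path to a head count only, so it describes
    a group of similar users rather than any one individual.
    """
    users_per_path = defaultdict(set)
    for event in events:
        path = (event.start_page, event.landing_page, event.target_page)
        users_per_path[path].add(event.user_id)
    return {path: len(users) for path, users in users_per_path.items()}


# Hypothetical sample data.
events = [
    JourneyEvent("u1", "search", "band-page", "shop"),
    JourneyEvent("u2", "search", "band-page", "shop"),
    JourneyEvent("u3", "friend-feed", "band-page", "home"),
]
summary = group_by_behaviour(events)
```

The point of the sketch is only that the grouped result no longer carries user identifiers, which is the sense in which the harvested data is "anonymous" while still being valuable as a description of a group.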
The Parallel With Broadcasting
There is much that might be learnt from the evolution and regulation of broadcasting in the UK (both commercial and public). It is quite certain that public broadcasting, despite the acceptable climate it generates for its services, will have little influence over the future of the issues being discussed here.
Broadcasting Myself
All of the above has to be taken in the context of the rather immodest desire of people (including myself) to share and broadcast themselves.
Some of these points are taken up in a mild way in this series of blog posts on the subject:-
Why we share: a sideways look at privacy
Here the author of http://confusedofcalcutta.com summarises and quotes another author, Danah Boyd:
  • We must differentiate between personally identifiable information (PII) and personally embarrassing information (PEI).
  • We’re seeing an inversion of defaults when it comes to what’s public and what’s private….you have to choose to limit access rather than assuming that it won’t spread very far.
  • People regularly calculate both what they have to lose and what they have to gain when entering public situations.
  • People don’t always make material publicly accessible because they want the world to see it.
  • Just because something is publicly accessible does not mean that people want it to be publicized.
    Making something that is public more public is a violation of privacy.
A further point made by Danah Boyd is that:- Fundamentally, privacy is about having control over how information flows.
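One way to picture control over how information flows, in the sense of item 6. (Degree of Broadcast Propagation), is a message whose author sets the maximum number of hops it may travel through the network of friends. The following is a minimal sketch in Python under that assumption. The function name recipients, the max_depth parameter and the sample friend graph are hypothetical; this is not how Diaspora or Facebook is known to work.

```python
from collections import deque


def recipients(graph, author, max_depth):
    """Return the users reached when rebroadcast is capped at max_depth hops.

    graph maps each user to the list of friends they pass messages on to.
    A breadth-first walk stops expanding once a user sits max_depth hops
    from the author.
    """
    seen = {author}
    reached = set()
    queue = deque([(author, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth == max_depth:
            continue  # the author's limit: do not pass the message further
        for friend in graph.get(user, ()):
            if friend not in seen:
                seen.add(friend)
                reached.add(friend)
                queue.append((friend, depth + 1))
    return reached


# Hypothetical friend graph.
friends = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
}
friends_only = recipients(friends, "alice", 1)
friends_of_friends = recipients(friends, "alice", 2)
```

With a limit of one hop only direct friends receive the message; with two hops friends of friends do as well. A real system would also need limits on extent and on time, which this sketch leaves out.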
Facebook have no monopoly over the means by which social networking may take place, nor over the desire of people to share. A huge user base is attractive to advertisers and may, actually, be stimulating to users, what with the buzz of the crowd and highly targeted advertising feeling like attention being paid to the individual. Powerful ingredients. It remains to be seen whether there is actually a great demand for an environment far more under user control than could possibly be offered by Facebook, because that degree of control would conflict with their revenue model. There is no reason why Diaspora should not offer advertising as well. The issues are slightly complex but it could be that each individual could opt in or out at will. Ideally that should not be a cost to Diaspora: Diaspora is far less expensive infrastructure than Facebook.
I discuss these points in my further posts.
0: Group:GNUSocial/ProjectComparison
1: "a-flock-of-twitters"
2: confusedofcalcutta
3: Danah_Boyd
Adam Saltiel
May 2010