Diaspora Run Down

Background
Original post, now updated and in six sections.
I have divided the Diaspora problem domain into the areas below.
This reflects my own research, interests and thoughts.
Why am I interested in this area?
There are several reasons for this interest.
In broad brush strokes -
1. I have long been interested in the Semantic Web.
2. I have been working on commercial applications whose technology intersects strongly with some of the areas below.
3. Open Source remains the best learning playground and there is always more to be learnt.
4. Activity streams (as they are called) provide good test data for work on other concepts that interest me, such as filtering algorithms and display techniques.
I am not trying to pre-empt the activity of the core Diaspora group or to second-guess them. I am more interested in seeing where my thoughts about the issues below intersect with how the various problems I outline are resolved.
These problems include issues about protocols, which determine the scope and behaviour of the system under development, as well as development methodology, which determines the human behaviour of the group.
However, in this post I leave aside discussion of development methodology. This should not be taken to mean I am harbouring some secret criticism; far from it.
So far the Diaspora core team have raised enough funds to take their efforts well into 2011, a very fortunate position to be in.
They have also responded very responsibly, as I expected, to the surge in interest.
They have set up a group of advisers, which is necessary given the large budget now at their disposal and the intense interest in the project that this generates.
Social Networking Solutions
Group:GNUSocial/ProjectComparison
Needless to say, the area is not unknown: there is already a lot of work in it, but it has not been brought together into a viable offering quite like the one the Diaspora team seem to have in mind.
This discussion, while referring to Diaspora, is in most places equally applicable to GNU Social and some other projects, except that Diaspora is entirely uncharted, allowing greater range for creative speculation. I do not assume that my coverage, or that of my sources, is absolutely comprehensive, though I would like this overview to be rounded and in depth.
When reading this entry, wherever issues of privacy and security are mentioned, it helps to bear in mind that there is a gap between user expectations and actual practice. It may be that in some cases there are tools available that the user does not use, while in other cases there are no tools, internal mechanisms, standards or laws. I have not tried to identify each instance thoroughly but present the general picture.
Further background can be found in Jeff Sayre's article A Flock of Twitters: Decentralized Semantic Microblogging; however, my emphasis here is not the same as Jeff's. Jeff Sayre gives a good description of what knowledge streams might be like in terms of the filtering each individual imposes on the data they consume and (re-)publish. He offers an interesting conceptual framework for this. I am much more concerned with what is possible now, without all the detail needed for more advanced use cases.
Extended Discussion of Business Case
Why now? Who else is working in this area?
Before we go any further, this is the first question that has to be examined.
Roughly, there are two approaches to entering a complex domain like this: either move straight in and create a presentable application (from what exists, using a lot of glue) as a best first-cut effort, or study the field and move as precisely as possible into an identified niche.
Commercial Potential
There is no doubt that commercial companies can miss opportunities, especially in such a vast, expanding area. Importantly, they can overlook some specific areas because they conflict with their main efforts.
For any would-be entrant, pinpoint accuracy might be the best weapon in finding that niche.
The following questions must be answered.
1. Is there a commercial opportunity here?
2. Why is this not being done or, actually, is it being done?
It is not possible to give a fully comprehensive overview of the respective business models of the big players.
The pertinent question is whether there is an alternative offering wanted by sufficient numbers that is precluded from being offered by the big players by virtue of their business model.
Here I take it as a given that the technological commitments of companies reflect their business model, and that alternative technical solutions only make a company vulnerable if they represent an alternative revenue stream that threatens the existing company.
Is There an Identifiable Alternative Service Offering?
What, actually, do people want? Short on research, and short on time to look at whatever research there may be in the public realm, I am going to posit a hypothesis:-
A Simple Hypothesis Concerning Social Networking and Privacy Management
The hypothesis is that while Social Networking offers its users something that they want, an ability to broadcast and be broadcast to, the means by which this is achieved mimics the feeling of having a non virtual social network. People find this confusing. This confusion manifests itself in various ways which I briefly explore.
The network is called 'virtual', but there is something misleading in that term, as it is a network of real people. What is being examined is the way in which communication in these networks is mediated by the services providing the communication.
I take the following two points as givens:
1. Social networking media are used successfully to influence opinion and buying habits by advertisers using new techniques such as viral advertising and crowd sourcing, based on both mining of user behaviour data and targeting the release of associated information through various channels, including the social networking channel itself, aside from overt advertising. These techniques also include buying rank and position for products, that is in some way making them seem popular, favoured and successful. For example, in general terms this applies to the 'like' button of Facebook.
2. Users of social networking services will project (or imbue) their network and its context with feelings that have to do with themselves and their social network. These feelings are capable of distorting their perceptions of the behaviour and role of their network of 'friends' and its context as a service of the provider.
It remains to be investigated what people actually object to about Facebook at the moment. (But see the last section, The Facebook Response, where I briefly look at their immediate response.)
In terms of a set of feelings about FB I wonder which ones dominate. Is it a feeling of powerlessness? If so, why would that be different to feelings about a TV channel or a telephone service provider?
My hypothesis is that the feelings are different from those felt about other services, and that this is due to a combination of points 1. and 2. above together with the far less certain and less well known framework of regulatory and normative rules governing internet activity (the lack of high-profile regulation in the area).
I posit that objections fall into all of the following categories.
  1. Data mining.
  2. Privacy of the personal.
  3. Data security.
  4. Revocation of statements.
  5. Data assuredness.
  6. Degree of Broadcast Propagation.
  7. Asset mining.
  8. Information gaming.
  9. Information accuracy.

  1. Data mining. Where the user goes on a page is monitored, possibly down to fine detail such as how long the mouse hovers over a certain area, which buttons are selected and so on. All three aspects of the user journey are monitored:-
    When the user enters an area (landing page), where they came from (start page) is recorded, and when they leave the landing page, where they go to (target page) is also recorded.
  2. Privacy of the personal.
    For a logged-in user, there is very little information about the user that might not be associated with this information. What would not, or should not, be associated with it is the user's name and contact details or, of course, login details.
    But certainly location, ethnicity, age, sex, sexual preference, marital or partner status, political opinion, economic status and many other details can be harvested (in one way or another, and the methods are growing in sophistication). This information is not associated with the particular user; in fact it is more valuable associated with a group of people. There are different ways in which such a statistical group may be arrived at. Similar behaviour groups users together into a statistical category group. However, when FOAF technology is used, nothing stops such a category group being formed from other friends in the network, in other words from information about friends and friends of friends, while still remaining anonymous.
    There is now an explosion in the data available to advertisers; the difficulty they now face is deciding what types of data they require for the product or service they are offering. As a result, the line between targeted advertising and influencing group thinking has become much thinner.
    Since many 'applications', that is the games and so on popular on these sites, have the same privileges as the user using them, they too are able to access otherwise private information that the user may introduce to them. They may also take information from the page context in which they are used.
  3. Data security.
    Security is a big word. It is quite clear from the above that data is not secure, in that it is mined. Most contexts on a web site like Facebook are available for data harvesting. One way to demonstrate this is to search Google for subjects that have been discussed on Facebook. So the issue here is not just that the data is public (it has been indexed by Google, or anyone else who is interested) but whether the additional information mined, as described, is any more of a security threat to the individual who is the subject of the data.
    It can be seen that security really is a concern about two aspects of social networking: one is the way anonymous data is used, the other is whether private data is actually secure (cannot be seen by those who should not see it) and safe (lasts as long as the user expects).
    If we accept that anything that would attach a particular user to usage patterns is intended to be kept private, is the system effective in this? The answer must be that it is as safe as the provider can make it. This aspect of web site management, at least, is legislated for and monitored (in the UK, under the Data Protection Act).
  4. Revocation of statements.
    Sometimes users may create content that they wish to revoke at some future point. I have not found this particularly difficult in the social media I have used. But there is the caveat that some streams of data are public, and contributions to those streams do not retain the originator's rights (to edit or delete).
  5. Data assuredness.
    Users may want to know that the content they have created will be there in the future. With Facebook, once the data is there it is difficult to move it elsewhere. It may not be possible to do so in a meaningful way since, in principle, some conversation threads will comprise items that others have rights over, such as the right to delete.
  6. Degree of Broadcast Propagation.
    Users may want a degree of control over how messages are broadcast, such as depth, extent and duration. Moreover, users may not want publicised content to be republished or amplified.
    It is probable that Facebook does satisfy items 3. through 6. to some extent. (I am not going to become an 'expert' FB user to find this out.)
  7. Asset mining.
    Data gathered in the way outlined above is an asset; that is, different aspects of the data matrix are given value and sold to advertisers. This process is much the same as the way a page of newsprint or a time slot of TV is sold to advertisers. There may be secondary markets as advertisers pre-book or attempt to monopolise certain types of data, by location, time frame or other combination of criteria. While this process is hidden from the user, is there any objection to it? How is it different from more traditional media, such as TV advertising and product placement?
    These questions are quite fundamental.
    For instance, do users want an advert-free service, or do they want one that conforms to, as yet, undefined rules?
    Moreover, just because I use a web site, does this mean that how I use it is my data, owned by myself, rather than usage data owned by the service I am using?
    It would not be possible to run an effective service without some usage data, and a great deal that users expect from Facebook would not be available if data access were very restricted. This will apply to Diaspora too. They will need to know what works and what doesn't, and how to tweak things. That is best done on the back of usage statistics. It is just that in the case of FB, along with usage statistics there is a superset of data gathered expressly for the benefit of advertisers (analytics), which is common practice on many web sites, including Government web sites and, possibly, University web sites.
    If there is an objection here, it would be better to be clear what that objection is; for instance, is it the extent of the analytics gathered, the way they are gathered or the huge population from whom they are gathered?
  8. Information gaming.
    This activity is very similar to product placement. It should be pointed out that the rules on product placement in the UK have only recently been relaxed, and even then not to the extent of what is common in the US. Needless to say, there are no rules governing this aspect of commercial behaviour on the Web.
    Information gaming is where some category of information has its value or status inflated by a process based on analytics. A typical example would be where favourite tunes are derived from user usage patterns and a music publisher is allowed to associate other works with this result by placing them alongside it. Clearly this strategy may be more effective in some contexts than in others. For example, it may be found that the restricted space on a mobile device induces more impulsive buying of the item placed alongside.
    I would categorise this as a common advertising ploy and include the way Facebook applications are targeted at particular users as an edge case.
  9. Information accuracy.
    This is a very broad area.
    This includes anything from masquerading an identity to falsifying data.
    Examples would be where a person goes online pretending to be someone who doesn't exist, with an implied connection with someone who does (I have examples; is this wanted or unwanted behaviour?). More extreme would be masquerading as another person. This is identity theft and could do a lot of damage to the real person without gaining access to their account. (I have also heard of examples of this in the non-virtual world. Legal authorities were uncertain whether they had the power to act and did nothing for many years. The consequences were very disturbing for the imitated individual.)
    As to falsifying data, I am uncertain what the protection against this might be apart from what happens now, which is that claims get exposed. Perhaps in the virtual world, where claims are very transient, this may be more of a problem, e.g. a music band that is falsely polled at number one?
    I believe there is ample opportunity for data falsification of this nature and that it does happen and is an increasing risk, but my evidence is slight and not associated with Facebook.
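To make the friends-of-friends point under Privacy of the personal concrete, here is a minimal Python sketch. It assumes a FOAF-style 'knows' graph held as a plain dictionary and one already-harvested attribute per person; all names and values are hypothetical. A breadth-first walk collects friends and friends of friends, and only aggregate counts are returned, so no individual is named in the result.

```python
from collections import Counter, deque

# Hypothetical FOAF-style "knows" graph: person -> people they know.
KNOWS = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "erin"},
    "dave": {"bob"},
    "erin": {"carol", "frank"},
    "frank": {"erin"},
}

# Hypothetical attribute already harvested for each person.
AGE_BAND = {
    "bob": "25-34",
    "carol": "25-34",
    "dave": "35-44",
    "erin": "25-34",
    "frank": "45-54",
}


def within_hops(graph, start, max_hops):
    """Breadth-first search: everyone reachable from start in 1..max_hops steps."""
    seen = {start}
    found = set()
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                found.add(friend)
                queue.append((friend, depth + 1))
    return found


def anonymous_profile(graph, attribute, start, max_hops=2):
    """Count an attribute over friends and friends of friends; names are dropped."""
    group = within_hops(graph, start, max_hops)
    return Counter(attribute[p] for p in group if p in attribute)


if __name__ == "__main__":
    print(anonymous_profile(KNOWS, AGE_BAND, "alice"))
```

Nothing in the output identifies Bob, Carol, Dave or Erin, yet it still tells an advertiser something about the circle around Alice.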
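The controls over depth, extent and time listed under Degree of Broadcast Propagation could travel with each message as a small policy. The sketch below is a hypothetical design, not anything Facebook or Diaspora is known to implement: the author sets a maximum number of re-shares, a permitted audience and an expiry time, and each re-share is checked against all three.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class PropagationPolicy:
    """Hypothetical author-set limits that travel with a message."""
    max_hops: int          # depth: how many successive re-shares are allowed
    audience: frozenset    # extent: who may receive the message at all
    expires: datetime      # time: no re-sharing at or after this moment


@dataclass(frozen=True)
class Message:
    author: str
    body: str
    hops: int              # how many times this copy has been re-shared
    policy: PropagationPolicy


def reshare(message: Message, recipient: str, now: datetime) -> Optional[Message]:
    """Return a re-shared copy for recipient, or None if the policy forbids it."""
    p = message.policy
    if now >= p.expires:
        return None
    if message.hops + 1 > p.max_hops:
        return None
    if recipient not in p.audience:
        return None
    return Message(message.author, message.body, message.hops + 1, p)


if __name__ == "__main__":
    start = datetime(2010, 6, 1, tzinfo=timezone.utc)
    policy = PropagationPolicy(1, frozenset({"bob", "carol"}), start + timedelta(days=7))
    original = Message("alice", "hello", 0, policy)
    first = reshare(original, "bob", start)
    print(first is not None, reshare(first, "carol", start))
```

Keeping the policy inside the message, rather than in one site's settings, is what would let it survive being passed between servers, provided each server honours it.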
There is much that might be learnt from the evolution and regulation of broadcasting in the UK (both commercial and public). It is quite certain that public broadcasting, despite the acceptable climate it generates for its services, will have little influence over the future of the issues being discussed here.
All of the above has to be taken in the context of the rather immodest desire of people (including myself) to share and broadcast themselves.
Some of these points are taken up in a mild way in this series of blog posts on the subject:-
Why we share: a sideways look at privacy
Here the author of http://confusedofcalcutta.com summarises and quotes another author, Danah Boyd:-
  • We must differentiate between personally identifiable information (PII) and personally embarrassing information (PEI).
  • We’re seeing an inversion of defaults when it comes to what’s public and what’s private….you have to choose to limit access rather than assuming that it won’t spread very far.
  • People regularly calculate both what they have to lose and what they have to gain when entering public situations.
  • People don’t always make material publicly accessible because they want the world to see it.
  • Just because something is publicly accessible does not mean that people want it to be publicized.
    Making something that is public more public is a violation of privacy.
A further point made by Danah Boyd is that:- Fundamentally, privacy is about having control over how information flows.
Does the Proposed Service Conflict with Existing Popular Services?
The 'Diaspora' System in a nutshell
The last point about control over information flow is a truism. But a service that concentrates on addressing how to control information flow is certainly different from what we have at the moment.
In terms of design philosophy it does conflict with what is offered at the moment. As such a system evolves (as a result of user feedback, however generated) this will become increasingly apparent. I believe that the defaults of Diaspora will revert to what was the norm. I assume the goal is to give the user control over each piece of data or communication, whether flowing to or from the individual: to expand, contract or remove access, and to edit, version or delete it as ownership and system constraints allow.
There are three distinct points of departure from Facebook in the envisaged architecture of Diaspora.
1. Facebook could not offer full security of the type possible with Diaspora; its architecture probably would not sustain this, or would do so only with difficulty.
2. Another consequence of Facebook's architecture is that a huge amount of traffic goes through the same domain, which means that each Facebook profile has no autonomy apart from through the Facebook super domain. This is both a network issue (I understand that it is technically 'unhealthy' but lack details for this assertion) and contradicts one of the basic principles of the design of the internet, that each item (page) has a transparent and reliable identifying address (URL). This second point is a bit technical. It pertains to the ease with which Facebook may exchange data with other applications (while also respecting defined privacy).
This is not possible with Facebook, while I expect it will be intrinsic to Diaspora.
3. The Diaspora architecture is intrinsically less expensive to maintain. Without a centralised architecture there is no need to create such massive revenue streams to maintain the infrastructure and show a profit from it.
W3C Initiatives
The Wider Technical Community
Casting our net for further guidance, we find that the W3C has several initiatives in the area we are interested in. Parts of most of this work intersect with our concerns.
It should be noted that, to my knowledge, W3C work is not based on 'customer' surveys.
W3C is well named: it is a fee-paying consortium. It is based on polling interested parties, usually those from academia and industry who can give individuals sufficient sponsorship to carry them through writing papers, presenting them at conferences, and steering recommendations through the different stages to acceptance.
However, for our purposes, the consortium structure works in reverse: we can use what surfaces in W3C as a measure of the concerns of different types of internet user.
It is also important to note that W3C has a huge reputation but no legal powers to impose recommendations or standards. W3C makes recommendations on the basis of consensual committee agreements (however achieved; there may be a voting system for those with a registered interest). Sometimes those recommendations languish, or are ignored by the wider technical community. (This has happened often, providing some noteworthy historical cases.)
One point to be made here is that, to my knowledge, the UK Government, one of the largest users of IT in the UK and one whose activity intersects with much of W3C's work, has not introduced a program of evaluation and adoption as a series of contractual obligations with its suppliers.
In other words, W3C can be circumvented in the implementation domain on a grand scale. As I will show later, when discussing the lack of standards that apply to the information domain, the behaviour of government as an influential lead body does have an impact on us.
W3C has done a lot of work in the areas of privacy, Intellectual Property (that is, patents on solutions presented to W3C, which members of working groups are required to license on royalty-free terms), DRM, policy management and others.
Here I narrow my exploration by concentrating on one authentication and trust framework, FOAF+SSL. I build out from this, noting how it contrasts with other proposals and measures.
I explain this in the section Architectural Objectives below.
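For readers unfamiliar with FOAF+SSL, the core check can be sketched briefly. The client presents a certificate whose subjectAltName carries a URI (its WebID) pointing to a FOAF profile; the server dereferences that URI and accepts the claim only if the profile publishes the same public key as the certificate. The sketch is heavily simplified and assumes the TLS handshake has already proved possession of the private key: HTTP fetching and RDF parsing are replaced by a hypothetical in-memory table, and the URI and key values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RSAPublicKey:
    modulus: int
    exponent: int


@dataclass(frozen=True)
class ClientCertificate:
    webid: str                 # URI taken from the certificate's subjectAltName
    public_key: RSAPublicKey


# Hypothetical stand-in for fetching each WebID over HTTP and reading the
# public keys its FOAF profile publishes. Toy key values, not real keys.
PUBLISHED_KEYS = {
    "https://alice.example.org/foaf#me": [RSAPublicKey(modulus=0xC0FFEE, exponent=65537)],
}


def fetch_profile_keys(webid: str) -> list:
    return PUBLISHED_KEYS.get(webid, [])


def verify_foaf_ssl(cert: ClientCertificate) -> Optional[str]:
    """Return the authenticated WebID, or None.

    Assumes the TLS handshake has already shown that the client holds the
    private key matching cert.public_key. What remains is to check that the
    profile at the claimed WebID publishes that same key.
    """
    if cert.public_key in fetch_profile_keys(cert.webid):
        return cert.webid
    return None
```

The design point is that the trust anchor is the profile a user publishes at their own address, rather than a certificate authority or a central site.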
The W3C Work on Privacy
What follows is quoted from these materials:-
2002-04-16 The Platform for Privacy Preferences 1.0 (P3P1.0) Specification
2006-11-13 The Platform for Privacy Preferences 1.1 (P3P1.1) Specification, Working Group Note
P3P really has become the mechanism by which web sites inform their users of how they use, or intend to use, their data. It has come to be restricted to policy on matters such as sharing user addresses with other parties, but its original intention was much broader in scope.
From The Platform for Privacy Preferences 1.1 (P3P1.1) Specification :-
  1. Introduction
The Platform for Privacy Preferences Project (P3P) enables Web sites to express their privacy practices in a standard format that can be retrieved automatically and interpreted easily by user agents. P3P user agents will allow users to be informed of site practices (in both machine- and human-readable formats) and to automate decision-making based on these practices when appropriate. Thus users need not read the privacy policies at every site they visit.
In Looking Back at P3P: Lessons for the Future, November 11, 2009, Ari Schwartz from The Centre for Democracy and Technology says:-
Although P3P provides a technical mechanism for ensuring that users can be informed about privacy policies before they release personal information, it does not provide a technical mechanism for making sure sites act according to their policies. Products implementing this specification MAY provide some assistance in that regard, but that is up to specific implementations and outside the scope of this specification. However, P3P is complementary to laws and self-regulatory programs that can provide enforcement mechanisms. In addition, P3P does not include mechanisms for transferring data or for securing personal data in transit or storage. P3P may be built into tools designed to facilitate data transfer. These tools should include appropriate security safeguards.
The following shows part of the specification definition and its modification by a later note:-
1.1 The P3P 1.1 Specification
The P3P1.1 specification defines the syntax and semantics of P3P privacy policies, and the mechanisms for associating policies with Web resources. P3P policies consist of statements made using the P3P vocabulary for expressing privacy practices. P3P policies also reference elements of the P3P base data schema -- a standard set of data elements that all P3P user agents should be aware of. The P3P specification includes a mechanism for defining new data elements and data sets, and a simple mechanism that allows for extensions to the P3P vocabulary.
1.1.1 Goals and Capabilities of P3P 1.1
P3P version 1.0 is a protocol designed to inform Web users about the data-collection practices of Web sites. It provides a way for a Web site to encode its data-collection and data-use practices in a machine-readable XML format known as a P3P policy. The P3P specification defines:
* A standard schema for data a Web site may wish to collect, known as the "P3P base data schema" (5.5)
* A standard set of uses, recipients, data categories, and other privacy disclosures
* An XML format for expressing a privacy policy
* A means of associating privacy policies with Web pages or sites, and cookies
* A mechanism for transporting P3P policies over HTTP
The goal of P3P is twofold. First, it allows Web sites to present their data-collection practices in a standardized, machine-readable, easy-to-locate manner. Second, it enables Web users to understand what data will be collected by sites they visit, how that data will be used, and what data/uses they may "opt-out" of or "opt-in" to.
From P3P Specification Note
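Because a P3P policy is machine-readable XML, a user agent can extract the declared purposes and recipients with an ordinary XML parser. The following Python sketch reads a trimmed, illustrative policy; it omits elements a complete policy must carry (for example ENTITY and RETENTION), and the policy content itself is hypothetical.

```python
import xml.etree.ElementTree as ET

P3P_NS = {"p3p": "http://www.w3.org/2002/01/P3Pv1"}

# Illustrative, trimmed policy: a complete P3P policy also needs elements
# such as ENTITY and RETENTION, omitted here for brevity.
POLICY_XML = """<POLICIES xmlns="http://www.w3.org/2002/01/P3Pv1">
  <POLICY name="example">
    <STATEMENT>
      <PURPOSE><current/><admin/><individual-analysis/></PURPOSE>
      <RECIPIENT><ours/><unrelated/></RECIPIENT>
      <DATA-GROUP><DATA ref="#dynamic.clickstream"/></DATA-GROUP>
    </STATEMENT>
  </POLICY>
</POLICIES>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def child_names(parent, path):
    """Local names of the children of parent/path, or an empty set if absent."""
    element = parent.find(path, P3P_NS)
    return set() if element is None else {local_name(child.tag) for child in element}


def summarise_policy(xml_text):
    """Collect declared purposes, recipients and data references per STATEMENT."""
    root = ET.fromstring(xml_text)
    summary = []
    for stmt in root.iterfind(".//p3p:STATEMENT", P3P_NS):
        summary.append({
            "purposes": child_names(stmt, "p3p:PURPOSE"),
            "recipients": child_names(stmt, "p3p:RECIPIENT"),
            "data": {d.get("ref") for d in stmt.iterfind(".//p3p:DATA", P3P_NS)},
        })
    return summary


if __name__ == "__main__":
    print(summarise_policy(POLICY_XML))
```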
The W3C Work on Privacy
Privacy Bird
This is a P3P-based tool (originally developed by AT&T) used to filter browsing of other web sites; it is a filter of information coming in, not of user information going out:-
The Privacy Bird will help Internet users stay informed about how information they provide to Web sites could be used. The tool automatically searches for privacy policies at every website you visit. You can tell the software about your privacy concerns, and it will tell you whether each site's policies match your personal privacy preferences by using bird icons.
Privacy Bird
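The matching Privacy Bird performs can be approximated as a comparison between a site's declared practices and the user's stated concerns. This is a sketch loosely modelled on the tool's behaviour, not its actual code: policies are given as already-parsed statements, the preference fields are hypothetical, and the colours follow what I understand to be the tool's convention of green for a match, red for a conflict and yellow when the site has no policy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Preferences:
    """Hypothetical user settings: declared practices the user objects to."""
    blocked_purposes: set = field(default_factory=set)
    blocked_recipients: set = field(default_factory=set)


def bird_colour(statements: Optional[list], prefs: Preferences) -> str:
    """Compare already-parsed policy statements with the user's preferences."""
    if statements is None:
        return "yellow"  # the site publishes no machine-readable policy
    for s in statements:
        if s["purposes"] & prefs.blocked_purposes or s["recipients"] & prefs.blocked_recipients:
            return "red"  # at least one declared practice conflicts
    return "green"


if __name__ == "__main__":
    prefs = Preferences(blocked_purposes={"telemarketing"}, blocked_recipients={"unrelated"})
    site = [{"purposes": {"current"}, "recipients": {"ours", "unrelated"}}]
    print(bird_colour(site, prefs))
```

Note that the check only tests what a site says; as the quoted passage above points out, nothing here verifies that the site behaves accordingly.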
The W3C Work on Privacy
Protocol for Web Description Resources (POWDER)
W3C Recommendation 1 September 2009
This recent recommendation is highly relevant to our present purpose. It is the subject of ongoing usage and implementation research. Notice "publication of descriptions of multiple resources", which is essentially a Semantic Web action and difficult to achieve without that technology. Facebook, as it is constructed, would find it difficult to comply with this recommendation, and for that reason it is referred to as a walled garden: there is no way of understanding what is inside from outside, nor of accessing it in a consistent manner (despite it being indexed and mined for analytics). The advanced implementations of foaf+ssl that I am advocating here are designed exactly for this purpose.
The Protocol for Web Description Resources (POWDER) facilitates the publication of descriptions of multiple resources such as all those available from a Web site. These descriptions are always attributed to a named individual, organization or entity that may or may not be the creator of the described resources. This contrasts with more usual metadata that typically applies to a single resource, such as a specific document's title, which is usually provided by its author.
From POWDER 2009
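The key idea in POWDER, one attributed description covering a whole set of resources, can be sketched in a few lines. This is not POWDER's XML syntax: the host-and-path matching below is only loosely modelled on its grouping of resources, the inclusion of subdomains is an assumption of the sketch, and the descriptor strings and hosts are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class DescriptionResource:
    """One attributed description covering a whole set of resources."""
    attribution: str          # who makes the claim; need not be the content's creator
    hosts: frozenset          # hosts covered (subdomains included: an assumption)
    path_prefix: str          # only paths starting with this are covered
    descriptors: frozenset    # hypothetical labels, e.g. "child-safe"

    def applies_to(self, url: str) -> bool:
        parts = urlsplit(url)
        host = parts.hostname or ""
        host_ok = any(host == h or host.endswith("." + h) for h in self.hosts)
        return host_ok and (parts.path or "/").startswith(self.path_prefix)


def describe(url: str, resources: list) -> set:
    """All (attribution, descriptor) pairs from descriptions that cover url."""
    return {
        (dr.attribution, d)
        for dr in resources
        if dr.applies_to(url)
        for d in dr.descriptors
    }
```

Keeping the attribution alongside each descriptor matters: a user agent can then decide which describers it trusts, as the use case that follows illustrates.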
(Example) Use case
2.1.8 Child protection B
  1. Thomas creates a portal offering what he considers to be terrific content for children. He adds a Description Resource expressing the view that all material available on the portal is suitable for children of all ages.
  2. Independently, a large content classification company, classification.example.org, crawls Thomas's portal and classifies it as being safe for children.
  3. Discovering this, Thomas updates his Description Resource with a link to the relevant entry in the online database operated at classification.example.org.
  4. 5-year-old Briana visits the portal. The parental control software installed by her parents notes the presence of the Description Resource and seeks confirmation of the claim that the site is child-safe by following the link to the classification.example.org database, which her parents have deemed trustworthy.
  5. On receiving such confirmation, access is granted and Briana enjoys the content Thomas has created.
From POWDER 2007
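The use case above amounts to a small verification procedure: a site's own claim is accepted only when a classifier the parents trust confirms it. In this Python sketch, dictionaries stand in for the Description Resource and the classification database, and the entry paths and labels are hypothetical.

```python
# Hypothetical stand-ins for what the parental control software can fetch.
DESCRIPTION_RESOURCES = {
    "portal.example.org": {
        "claim": "child-safe",
        "confirmed_by": "classification.example.org/entries/portal",
    },
}
CLASSIFICATION_DB = {
    "classification.example.org/entries/portal": {
        "site": "portal.example.org",
        "label": "child-safe",
    },
}


def parental_control_allows(site, trusted_classifiers):
    """Grant access only if a trusted classifier confirms the site's own claim."""
    description = DESCRIPTION_RESOURCES.get(site)
    if description is None or description["claim"] != "child-safe":
        return False  # no Description Resource claiming the site is child-safe
    link = description["confirmed_by"]
    classifier = link.split("/", 1)[0]
    if classifier not in trusted_classifiers:
        return False  # the parents have not deemed this classifier trustworthy
    entry = CLASSIFICATION_DB.get(link)
    # Follow the link: the entry must exist and agree with the site's claim.
    return entry is not None and entry["site"] == site and entry["label"] == "child-safe"


if __name__ == "__main__":
    print(parental_control_allows("portal.example.org", {"classification.example.org"}))
```

The self-description alone never grants access; it only tells the software where to look for a confirmation it is prepared to believe.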
Extended Discussion of Security, Privacy and Trust
Security, Privacy and Trust
Security and Privacy
I distinguish between security and privacy. Security is one means by which privacy is obtained but does not create it. A reasonable form of privacy can also be obtained without extreme security measures.
Security can be visualised as a sliding scale. The most extreme level is what I will call paranoid. If I were a student anywhere from Tiananmen Square through Tehran to Bangkok, that is the sort of security I would want on my mobile and blog.
Two Common Negatives
One argument against giving people greater powers of privacy is this:-
This might make it more difficult to intercept criminals engaged in various nefarious activities.
I am aware that this is a common concern and raised it with Henry Story just recently. He pointed out that groups of people who are citizens of the larger society are responsible for themselves and can be self-policing, at least in the main. I expect that policing authorities would be more concerned about cells (political, criminal or terrorist), but I really have to draw the line here at what I am competent to discuss.
I will have to say the same about another common objection: that this might popularise tools that enable and encourage copyright infringement. This encouragement already occurs anyway on a much larger scale, for instance by Google, as I point out elsewhere.
To clarify, there are different types of security and privacy that the Diaspora application could offer.
Security
The greatest security would be achieved by having all data encrypted wherever it is stored and whenever it is transported across the public internet. Since trusted users (the data creator and others) must have access to the same data items both locally and across the internet, the means by which the data is decrypted could be the same in both situations. Nevertheless, encrypting all data is an extra burden that would have implications in different parts of the Diaspora system.
Two terms are introduced: end-to-end encryption and group encryption.
End-to-end refers to SSL certificate authentication, as might be used by a bank when crucial data is sent to it in encrypted form. Usually only certain data is sent in this way, but the mechanism allows for encryption of all communications, and other systems, such as encrypted email, might use it. Group encryption is where access to a domain is always encrypted. The typical use case here is a VPN, through which companies give employees assured access to their own intranet over the public internet. Some reasonable decisions must be made about what is needed here. Encrypted email exists for the situation where traffic to a known domain is worth intercepting. This involves two things: the traffic is of interest, and the destination is known.
While just about all the data flowing in and out of Facebook could be intercepted, since it goes to a series of very well known destinations, there is a certain safety in numbers. Any one piece of data is likely to be lost in (although in principle recoverable from) the general noise. More specific intercepts need user domains. (Presumably these are readily available by one means or another.)
In the case of a more distributed system, intercepts would have to follow the more difficult-to-find user domains.
So it is true that if further, near-absolute, security is required, all data would have to be encrypted, perhaps as a user choice.
This is a less lightweight process, though.
Privacy
In my Simple Hypothesis above I state that there is a gap between users' expectations and their perception of the service they are using, which is a product of the type of service being used and the way the service provider gains its revenue.
Privacy, on the other hand, can be satisfied by having easy to configure controls, essentially these are read and write controls over content that travel with that content irrespective of context. This is a series of issues quite separate from security apart from access to the privaleges to change read write status of a content item.
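To make "controls that travel with the content" concrete, here is a minimal sketch in which each content item carries its own reader and writer sets, and only the owner may change them. The class and method names are hypothetical; the sketch only illustrates where the one security-sensitive operation (changing the controls) sits.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ContentItem:
    """A content item that carries its own read/write controls wherever it goes."""
    owner: str
    body: str
    readers: Set[str] = field(default_factory=set)
    writers: Set[str] = field(default_factory=set)

    def can_read(self, user: str) -> bool:
        return user == self.owner or user in self.readers

    def can_write(self, user: str) -> bool:
        return user == self.owner or user in self.writers

    def set_access(self, requester: str, readers: Set[str], writers: Set[str]) -> None:
        # Changing the controls is the security-sensitive step: owner only.
        if requester != self.owner:
            raise PermissionError(f"{requester} may not change access to this item")
        self.readers = set(readers)
        self.writers = set(writers)


if __name__ == "__main__":
    note = ContentItem(owner="alice", body="Holiday photos", readers={"bob"})
    print(note.can_read("bob"), note.can_read("carol"))  # True False
```

Because the sets are fields of the item itself, the same checks apply wherever the item is copied or replicated.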
Trust
In the Diaspora architecture it can be seen that amalgamating two or three broad categories of approach is being considered.
These are the Open Profile / Browser Certificates approach, the Federation of Servers approach, and Peer to Peer, by allowing servers to run on the user's computer. The two issues this would address are:-
a. By putting a certain degree of trust into servers, these become unencrypted trust networks. In addition, will SSL keys held on servers be considered safe in the future?
Explanation of a.
Unencrypted trust networks. This means that various servers in the federation hold various amounts of data about the users of the network.
Aggregating this data might have great value, for instance for an unscrupulous business or determined government.
There would be no protection against this, since whatever protection encryption might afford in such circumstances could not be enforced by one or more of the federated servers, or, if it existed, might be revoked by them.
The safety of SSL keys refers to the private keys held on behalf of distinct user entities. The question is how safe is the commodity service being relied on at this point.
b. Even if only encrypted messages are transported and stored, the social graphs would still be entrusted to remote servers.
Explanation of b.
This shows both what is being considered in connection with trusting external servers and that peer to peer is considered the most secure solution. To achieve the highest degree of security, all data must be encrypted wherever it is held and in transport, or, one rung down, be encrypted as it is sent over the network.
The social graph refers to the relationship between friends, items and the history of this interaction.
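A minimal sketch of that definition keeps each relationship (person to person, or person to item) as a subject, relation, object key together with the timestamps at which it was recorded, which preserves the history of the interaction. The relation names ("knows" mirroring foaf:knows, "likes") and the class itself are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone


class SocialGraph:
    """People, items and the relations between them, keeping every interaction's timestamp."""

    def __init__(self):
        # (subject, relation, object) -> timestamps at which the relation was recorded
        self._edges = defaultdict(list)

    def record(self, subject, relation, obj, when=None):
        self._edges[(subject, relation, obj)].append(when or datetime.now(timezone.utc))

    def related(self, subject, relation):
        """Everything `subject` is linked to by `relation`, e.g. friends via "knows"."""
        return {o for (s, r, o) in self._edges if s == subject and r == relation}

    def history(self, subject, relation, obj):
        """Chronological record of one relationship (empty if it never happened)."""
        return sorted(self._edges.get((subject, relation, obj), []))
```

Holding the graph in this shape makes clear why it is sensitive even when message bodies are encrypted: the keys alone reveal who relates to whom and when.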
A Reasonable Question About Capability and Capacity
It would seem (in the common perception) that only very big services can be relied on for seamless storage over time. The assumptions here are that they have the resources to tend to the infrastructure and a reputation they wish to maintain, both powerful drivers to maintain the offered service.
Again, it should be noted that there may not be any contract of this kind between provider and service consumer.
The Diaspora distributed way of guaranteeing capability and capacity is to rely on several nodes, replication between nodes and careful engineering of the relationship between online and offline nodes in the context of the information that should be delivered to each node (public notices and FOAF relationships).
Note the recurrence of the FOAF profile. This is another powerful reason to create an architecture that uses FOAF directly and can exploit its potential as a semantic medium.
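The relationship between online and offline nodes described above can be sketched as store-and-forward delivery: messages for an offline node are held and handed over when it reconnects. This is a minimal sketch under simplifying assumptions: one in-memory object stands in for the whole federation (in practice the pending queue would live on replica nodes that are online), and the node names are made up.

```python
from collections import defaultdict, deque


class Federation:
    """Store-and-forward delivery between nodes that may be offline."""

    def __init__(self):
        self.online = set()
        self.inbox = defaultdict(list)     # node -> messages already delivered
        self.pending = defaultdict(deque)  # node -> messages awaiting delivery

    def send(self, recipients, message):
        for node in recipients:
            if node in self.online:
                self.inbox[node].append(message)
            else:
                # Hold the message until the node comes back online.
                self.pending[node].append(message)

    def come_online(self, node):
        self.online.add(node)
        queue = self.pending.pop(node, deque())
        while queue:  # deliver in the order the messages were sent
            self.inbox[node].append(queue.popleft())

    def go_offline(self, node):
        self.online.discard(node)
```

Which messages go into a node's queue (public notices, FOAF relationship updates) is exactly the engineering question the paragraph raises; the sketch treats every message the same way.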
Separating the Profile from the Social Graph
Profiles should be conceived of as a set of rules that allow different individual profiles to be negotiated dynamically into a profile set.
Initial implementation can be straightforward and should just take account of access controls. Later, semantic reasoning tools can be applied to the data set for more expressive results.
The Security Context of the Social Graph
The social graph must be available for read and write according to a schema which is the intersection of profiles, as controlled by the principal profile, i.e. this user.
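One possible reading of "the intersection of profiles as controlled by the principal profile", in the access-control-only form suggested for an initial implementation: the principal grants permissions per group, the viewer's profile requests some permissions, and the viewer receives only what is both requested and granted. This is a minimal sketch of that interpretation; the group and permission names are hypothetical.

```python
def effective_permissions(principal_rules, viewer_groups, requested):
    """
    principal_rules: group name -> set of permissions the principal grants to that group
    viewer_groups:   groups in which the principal has placed the viewer
    requested:       permissions the viewer's own profile asks for
    """
    granted = set()
    for group in viewer_groups:
        granted |= principal_rules.get(group, set())
    # The principal controls the outcome: nothing beyond its grants is ever returned.
    return granted & set(requested)


if __name__ == "__main__":
    rules = {
        "family": {"read:graph", "write:graph"},
        "colleagues": {"read:graph"},
    }
    print(effective_permissions(rules, ["colleagues"], {"read:graph", "write:graph"}))
    # prints: {'read:graph'}
```

Semantic reasoning could later replace the plain set operations, for example by inferring group membership instead of listing it.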
Existing Solutions
Protocols
Normalisation protocols
This situation has been very much in flux, in that there is no one clear set. The two main protocol contenders for normalising multi-sourced data that I am considering here are SMOB (SMOB Home [11:] and SMOB code [12:]) and StatusNet (StatusNet write up [13:] and OStatus wiki [14:]). The reason for this is that StatusNet may be favoured by the Diaspora team, while SMOB is a semantic pathway.
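Normalisation in practice means reducing each source format to one common record. OStatus carries activities as Atom entries with Activity Streams extensions, so here is a minimal sketch that reduces such an entry to a small dictionary using Python's standard ElementTree. The record's keys are a hypothetical common format, and a SMOB source would need its own function mapping onto the same keys (not shown).

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "activity": "http://activitystrea.ms/spec/1.0/",
}


def normalise_atom_entry(xml_text):
    """Reduce an Atom / Activity Streams entry to a small common record."""
    entry = ET.fromstring(xml_text)
    return {
        "id": entry.findtext("atom:id", namespaces=NS),
        "actor": entry.findtext("atom:author/atom:uri", namespaces=NS),
        "verb": entry.findtext("activity:verb", namespaces=NS),
        "content": entry.findtext("atom:content", namespaces=NS),
        "published": entry.findtext("atom:published", namespaces=NS),
    }


# Illustrative entry; the ids and URIs are made up.
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:activity="http://activitystrea.ms/spec/1.0/">
  <id>tag:example.org,2010:note-1</id>
  <author><uri>http://example.org/alice</uri></author>
  <activity:verb>http://activitystrea.ms/schema/1.0/post</activity:verb>
  <content>Hello federation</content>
  <published>2010-05-06T05:27:00Z</published>
</entry>"""

if __name__ == "__main__":
    print(normalise_atom_entry(SAMPLE)["actor"])  # http://example.org/alice
```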
Architecture
The best reference I have found for this is on the libreplanet web site, called GNU social. There are three pages of interest.
1. A categorised list of existing architectures, goals and ideas with links and discussion. [15:]
2. A categorised list, Group:GNU Social/Project Comparison, of existing projects with links and discussion. [16:]
3. A design description of the GNU Social node by Ted Smith, a Washington University 2010 graduate. [17:]
Architectural Objectives
Here are some questions.
What are the desirable trade-offs between security and a lightweight program footprint?
To what extent should locally held information be encrypted?
In other words, how far up should the security control be turned?
Some features that very strong security would make more difficult to deliver using a p2p architecture and PGP (the most common solution in this space) are:-
i. A lightweight solution suitable for mobiles.
ii. A lightweight solution that is easy to install and use on a user's local machine (non-mobile).
libreplanet and Diaspora seem to favour p2p for maximum security, which seems to imply one of the engines discussed in those references.
Those engines offer many capabilities; however, to use the straightforward mechanism of foaf+ssl, some mapping between their existing security mechanism (p2p usually relies on PGP) and this protocol would have to take place.
Further mappings between the protocol of the streams and a semantic representation would also be needed at some stage. It would be important not to be restricted by the communication protocol in a way that makes this mapping task more complex.
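To give a feel for the mapping from a stream item to a semantic representation, here is a minimal sketch that turns an already-normalised activity record (id, actor, content, published) into N-Triples. The vocabulary choice is an assumption: sioc:Post and sioc:content (SIOC is the vocabulary SMOB builds on), foaf:maker for authorship and dcterms:created for the date. Keeping this step separate from transport is one way of not letting the communication protocol dictate the mapping.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIOC = "http://rdfs.org/sioc/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def iri(value):
    return f"<{value}>"


def literal(text):
    """Quote a string as an N-Triples literal, escaping the characters that need it."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def activity_to_ntriples(record):
    """Map a normalised activity record to N-Triples lines (one statement per line)."""
    subject = iri(record["id"])
    lines = [
        f"{subject} {iri(RDF_TYPE)} {iri(SIOC + 'Post')} .",
        f"{subject} {iri(FOAF + 'maker')} {iri(record['actor'])} .",
        f"{subject} {iri(SIOC + 'content')} {literal(record['content'])} .",
    ]
    if record.get("published"):
        lines.append(f"{subject} {iri(DCT + 'created')} "
                     f"{literal(record['published'])}^^{iri(XSD_DATETIME)} .")
    return "\n".join(lines)
```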
There are two main p2p protocols available, Pyc (a project that is incorporating a javascript http server) and XMPP.
The main reasons for looking at these solutions are that they offer secure communication in the p2p context and may also immediately offer the means to track distributed historical data. These reasons are not compelling, since the same capability is available in servers using the HTTP protocol.
There are also powerful reasons not to use these p2p solutions.
1. They do not use HTTP, and this can have consequences for firewalls as well as for passing binary data such as large files.
2. The p2p engine for XMPP may not be as versatile in deployment options as an HTTP-based solution; in many deployments it is an extra engine, whereas the engine for HTTP would already be in place.
FOAF+SSL alternative
Research indicates that a very minimal install is possible where the main engine is the browser consuming HTML5 and javascript (AJAX). File reading and writing (with some restrictions) is possible from javascript on the local machine, and cross-domain AJAX (using the script tag) can consume JSON(P).
It would also be able to authenticate point to point and encrypt and decrypt content locally.
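The cross-domain AJAX route via the script tag works because the server answers with javascript that calls a function the page has already defined (JSONP). As a minimal sketch of the server side, assuming the page loads something like `<script src="https://pod.example/feed?callback=handleFeed">` (a made-up URL), the response body wraps the JSON in that callback. Since the body runs as script in the caller's page, the callback name is validated rather than echoed back blindly.

```python
import json
import re

# A JavaScript identifier, optionally dotted (e.g. "feeds.onLoad"); ASCII only, by choice.
_CALLBACK = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*(?:\.[A-Za-z_$][A-Za-z0-9_$]*)*")


def jsonp_response(payload, callback):
    """Return (content_type, body) wrapping `payload` in a call to `callback`."""
    # Reject anything that is not a plain function name, so the response
    # cannot be turned into arbitrary script.
    if not _CALLBACK.fullmatch(callback):
        raise ValueError(f"invalid JSONP callback name: {callback!r}")
    return "application/javascript", f"{callback}({json.dumps(payload)});"


if __name__ == "__main__":
    print(jsonp_response({"status": "hello"}, "handleFeed")[1])
    # prints: handleFeed({"status": "hello"});
```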
Summary of Advantages of foaf+ssl for Identification and Authentication
1. HTTP will not encounter firewall problems.
2. Engines, importantly for the client, are ubiquitous and will not need to be installed.
3. It can deal with binary data without extra work.
4. The server footprint is likely to be small and available for all sorts of devices, including mobiles.
5. Different sorts of servers can be used for different purposes, for instance non-blocking polling, comet, reverse AJAX can all be made available, as needed. Not all server services have to be or should be deployed on every node.
6. Replication and redundancy can be introduced in this way without every node having to play a full part in either.
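In foaf+ssl (later developed as WebID), the identification and authentication step works roughly as follows: the client presents a certificate whose Subject Alternative Name holds a URI for its FOAF profile, the TLS handshake proves the client holds the matching private key, and the server fetches the profile and accepts the client only if the profile lists the certificate's public key. This is a minimal sketch of that final comparison only, assuming the certificate's RSA key and the keys published in the profile have already been extracted (certificate parsing and RDF fetching are omitted). The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RsaPublicKey:
    modulus: int
    exponent: int


def parse_hex_modulus(hex_text: str) -> int:
    """Turn a hex modulus such as '00:c3 5a' into an integer, ignoring spaces and colons."""
    return int("".join(hex_text.split()).replace(":", ""), 16)


def webid_matches(cert_key: RsaPublicKey, profile_keys: Iterable[RsaPublicKey]) -> bool:
    """Accept the identity only if the FOAF profile lists the certificate's exact key."""
    # Possession of the private key was already proven by the TLS handshake;
    # this step links that key to the identity URI in the certificate.
    return cert_key in set(profile_keys)
```

Nothing here needs a central authority: the profile's owner vouches for the key simply by publishing it.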
And in the foaf piece:-
1. Provides a ready solution to the problem of data aggregation and all sorts of additional mark up of that data, e.g. history, availability, authorship and so on.
2. Immediately associates access rights of individuals and groups with designated data types.
3. Allows immediate propagation of changes in those rights.
4. Can be stored in the LOD (Linked Open Data) cloud, i.e. in a very versatile manner, not dependent on one database or type of schema so long as it handles the RDF.
5. Opens out the rich vista of semantic mark-up and queries.
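Points 2 and 3 above can be illustrated with a minimal sketch in which access rights are themselves statements in the data: group membership uses foaf:member, while `ex:canRead` is a made-up predicate standing in for whatever access vocabulary is chosen. Rights are looked up from the statements at request time, so changing one statement changes access immediately, with no separate permission cache to update.

```python
class TripleStore:
    """A tiny in-memory set of (subject, predicate, object) statements."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def remove(self, s, p, o):
        self.triples.discard((s, p, o))

    def subjects(self, p, o):
        """All subjects that have predicate `p` with object `o`."""
        return {s for (s, p2, o2) in self.triples if p2 == p and o2 == o}


def may_read(store, person, data_type):
    """True if some group containing `person` may read items of `data_type`."""
    groups_with_person = store.subjects("foaf:member", person)
    groups_with_access = store.subjects("ex:canRead", data_type)  # hypothetical predicate
    return bool(groups_with_person & groups_with_access)
```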
A Taste of Semantic solutions
The following gives a feel for the subjects covered by Semantic Web interested parties.
Call for Papers
     1st Workshop on Personal Semantic Data: PSD 2010

    http://semanticweb.org/wiki/Personal_Semantic_Data

             co-located with EKAW 2010

        11th - 15th October, Lisbon, Portugal
Personal Semantic Data is scattered over several media, and while semantic technologies are already successfully deployed on the Web as well on the desktop, data integration is not always straightforward. The transition from the desktop to a distributed system for Personal Information Management (PIM) raises new challenges which need to be addressed. These challenges overlap areas related to human-computer interaction, privacy and security, information extraction, retrieval and matching.
This workshop will bring together academics and industrial practitioners with the goal of fostering cross-domain collaborations to further advance the use of technologies from the Semantic Web and the Web of Data for PIM and to explore and discuss the challenges and approaches for improving PIM through the use of vast amounts of (semantic) information available online. At the same time we want to provide a platform for discussing research topics and challenges related to personal semantic data.
Quotes
The following are quotes I noticed on the joindiaspora site.
@Laurent Eschenauer - Ostatus is great but for 140 characters so I think it's supplemental to projects like my own (get6d.com) and perfect for businesses.
Onesocialweb is also great but being built on XMPP means not all shared hosts can use it. We experimented with using XMPP a year or so ago and came to the conclusion if the lowliest of shared hosts can't host a person's site/identity then it wasn't a viable solution. This is why we built ours using just php. We do have future plans to extend it with possibly XMPP or node.js but first we want to get basic site framework finished. Either way because we built ours RESTfully it'll allow us to repackage the message in anyway later so we could talk to XMPP or whatever else.
Posted by: Erik Bigelow | May 6, 2010 5:27 AM
Resources
0: Group:GNUSocial/ProjectComparison
1: "a-flock-of-twitters"
2: confusedofcalcutta
3: Danah_Boyd
4: P3P 1.1
5: The Centre for Democracy and Technology
6: P3P Specification Note
7: Privacy Bird
8: POWDER 2009
9: POWDER 2007
10: Henry Story
11: smob.me home
12: smob.me code
13: StatusNet write up
14: StatusNet OStatus wiki
15: GNU Social categorised list
16: Group:GNU Social/Project Comparison
17: Ted Smith's GNU Social node
Adam Saltiel
May 2010