Why centralisation is bad for us?
The web as we know it today, based on DNS, HTML, and HTTP, is reaching several limits inherent to its topology, business model, and technology.
What emerged in the '80s and '90s as a network of networks, thanks to the Internet Protocol, was an infrastructure for freedom. Anybody could reach and exchange with anybody else on the ever-growing federation of interconnected hosts, as long as peering agreements were in place between the Autonomous Systems of the IP network.
When the web picked up at the end of the 20th century, with a boom of investment at the turn of the millennium, businesses took over what universities and public entities had initiated.
Since then, the web has become the playground of big business, which has totally transformed its nature.
Certainly, great advances in digital technology were made over the last 20 years under the lead of huge multinational corporations, which became the most highly valued companies in history and ushered in a new era of seemingly infinite digital services.
But this came at a cost. What we now call surveillance capitalism, orchestrated by big companies like Google, Apple, Facebook, Amazon, and Microsoft (GAFAM), is seriously endangering privacy, security, freedom, democracy, and even ecology.
These companies keep our digital lives captive in their datacenters, with proprietary formats and centralised architectures, and constantly gather information about us, which they in turn sell to advertisers. This is their business model, and it forced the web to become, and to remain, highly centralised.
This centralised topology brings all kinds of negative outcomes, including exposure to cyberattacks and leaks, global outages and downtime, surveillance by state intelligence or private interests, top-down control over the flow of media and information, dependence on third parties, and even geopolitical concerns.
As a consequence, an ever-growing number of users and companies are aware of these issues and are seeking alternatives.
The last 10 years have seen increased work on federated web systems, where the data is not stored in one monolithic global system but is instead spread across a multitude of systems, in what we could call a multi-lithic topology, or the federation. It still uses the same technologies as Web 2.0, with DNS, HTTP, and centralised databases, but focuses on splitting ownership among multiple hosting entities, who federate their data and services in a peer-to-peer manner. The oldest example of this remains the email system: SMTP.

With federated systems, users can choose their service provider and obtain an identifier of the form email@example.com. While cloud services are so far not federated (each cloud provider remains totally hermetic to its competitors), some alternatives are starting to emerge with NextCloud, CozyCloud, Matrix, XMPP, Solid, or ActivityPub-based social networks like Mastodon.
The question of ownership is not solved, though. In the federated web, users are still captive of their service provider; at least they can choose which provider they trust. But they might not easily be able to switch from one provider to another, and this is where things get more complicated. For example, a user's Solid POD hosted by a POD provider will be identified with the domain name of that provider. This identifier is then used to label each LDP resource that the user creates and hosts on their POD. Once such a resource is shared and linked into the web of semantic data, it becomes impossible for the user to switch to another provider, as all the links to their resources would break. The same happens with email addresses, which are difficult to forward when the user switches SMTP provider.
Another caveat of federated systems is privacy: users still need to trust their service provider with their personal data. No matter how transparent the provider is, and no matter how strongly the legal framework (e.g. GDPR) forces it to implement privacy measures, at the end of the day the system administrator is always the almighty boss of all the data stored on the system, with full access to all of our personal data. That data can be leaked, by accident or on purpose, and can remain on servers or backups forever, with very little the user or public authorities can do to have it removed. Unless end-to-end encryption (E2EE) is enforced, federated systems do not improve privacy. And when E2EE is enforced, the server becomes a mere relay of unreadable data, which greatly reduces the set of functionalities the service can provide to the end-user.
In a centralised system, E2EE seriously degrades the quality of the services provided to users, and it does not protect against metadata disclosure. Only E2EE combined with a true P2P decentralised network can protect privacy. Global services then need to be reimplemented with newer technologies based on P2P and E2EE.
The existing federated solutions see their user base increase slowly, but they do not solve the main challenges we face today. The question of user experience is also essential to understand: mainstream users of cloud services tend to stay with their current providers (GAFAM) because the alternatives do not offer the same level of product integration and ease of use. Their current providers also offer no easy way to switch to federated or P2P alternatives. Only laws and regulations could force those companies into data portability, interoperability, and openness, but efficient and competitive alternatives would have to emerge first.
It is clear that the products of the GAFAM work very well for end-users so far, so there is no real incentive for them to switch to half-baked solutions that would reduce their productivity and comfort. Users prefer to sacrifice their privacy rather than lower their productivity and UX, which is totally understandable.
Convergence between P2P and Semantic Web
As developers and software engineers, we know where the limitations of Web 2.0 lie, and how we could overcome the caveats of the current technology while providing the best experience to developers and users alike.
As long as you have DNS in your system, you are not really decentralised.
As long as your data sits in a monolithic database, no matter how distributed it is or how many REST APIs you have access to, you are not fully interoperable or portable.
As long as end-to-end encryption is not ubiquitous in your system, you are not really respecting privacy.
To each of these concerns, there is a technological answer:
Decentralisation is enabled by P2P technologies that use only IP and get rid of DNS at the transport layer. Among them are IPFS, DAT, BitTorrent, GNUnet, and libp2p. Routing and discovery are most of the time done with DHTs, which tends to create more privacy issues than it solves.
Data interoperability can be achieved with the Semantic Web: RDF, linked data, and ontologies for describing schemas.
E2EE is available today with several technologies, including the Double Ratchet protocol and X3DH from Signal (Open Whisper Systems), and Olm from the Matrix project; and of course good old PGP, or any asymmetric encryption applied on the client side before sending data to peers. Until today, E2EE has mostly been used in combination with centralised servers for routing and for serving user interfaces, which defeats its privacy benefits. Services like indexing, search, discovery, and graph traversal then need to be moved to the client, and so far such technology is lacking.
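To make the DHT routing mentioned above concrete, here is a minimal sketch of the Kademlia-style XOR metric that most DHTs (including those behind IPFS and libp2p) use to decide which peers are "closest" to a key. Node IDs and content keys are hashed into the same identifier space; distance is the integer value of their XOR. The names and hash size below are illustrative choices, not any particular network's parameters.

```python
import hashlib

def node_id(name: str) -> int:
    # Hash an arbitrary name into a 160-bit identifier space, as
    # Kademlia-style DHTs do for both node IDs and content keys.
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    # Kademlia's metric: the XOR of two IDs, read as an unsigned integer.
    return a ^ b

def closest_peers(key: int, peers: list[int], k: int = 2) -> list[int]:
    # A lookup routes toward the k peers whose IDs are closest to the key.
    return sorted(peers, key=lambda p: xor_distance(key, p))[:k]

peers = [node_id(f"peer-{i}") for i in range(8)]
key = node_id("some-content-key")
print(closest_peers(key, peers))
```

Note that the key being looked up is visible to the peers that route and store it, which is precisely the privacy concern raised above.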
Now the main challenge is to make these three technologies converge, as they are not yet compatible with one another.
RDF and linked data use URIs as unique identifiers for resources. Those are most of the time URLs that start with the typical https://domain.tld/, which keeps them stuck in the Web 2.0 world. Using hashes and public keys as identifiers is a solution that lets semantic data be shared on P2P networks.
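As a sketch of what a DNS-free identifier can look like: derive the name from a hash of the content (or of a public key), so that it stays valid wherever the data is hosted. The `did:ng:` scheme below is purely illustrative, not an actual NextGraph identifier format.

```python
import base64
import hashlib

def content_id(data: bytes) -> str:
    # Content-addressed identifier: the name is derived from the bytes
    # themselves, so it is independent of any hosting domain.
    digest = hashlib.sha256(data).digest()
    # base32 keeps the identifier case-insensitive and URL-safe.
    return base64.b32encode(digest).decode().rstrip("=").lower()

resource = b'<did:example:alice> <http://xmlns.com/foaf/0.1/name> "Alice" .'
uri = f"did:ng:{content_id(resource)}"  # illustrative scheme, not a real one
print(uri)
```

Any peer holding the same bytes derives the same identifier, and anyone resolving it can verify the content against the hash; switching providers no longer breaks links.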
On the P2P side, DHTs do not solve all the issues of discovery; they are quite slow and not privacy-oriented, as they scatter data all over the internet. Going back to raw IP for transport brings its own headaches, as devices change IP addresses all the time. Furthermore, internet traffic has to be minimised on mobile devices. Therefore, decentralised PubSub technologies should be used in order to synchronise semantic data efficiently between peers.
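A minimal in-process sketch of the publish/subscribe pattern: peers subscribe to a topic and receive only the updates for the graphs they care about, instead of polling a DHT. A real decentralised pubsub (e.g. libp2p's gossipsub) layers peer routing and gossip on top of this same core idea; the topic names here are invented for the example.

```python
from collections import defaultdict
from typing import Callable

class PubSub:
    # Minimal topic-based publish/subscribe: subscribers register a
    # callback per topic and receive every update published to it.
    def __init__(self) -> None:
        self.topics: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self.topics[topic].append(handler)

    def publish(self, topic: str, update: str) -> None:
        for handler in self.topics[topic]:
            handler(update)

bus = PubSub()
received: list[str] = []
bus.subscribe("graph/alice", received.append)
bus.publish("graph/alice", "triple added")
bus.publish("graph/bob", "update alice never sees")
print(received)  # only updates for the subscribed topic arrive
```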
E2EE forces the data to be present and manipulated locally, which in turn implies that data processing (the query engine) must run on embedded/mobile devices. This technology is not ready yet: we still have to develop lightweight semantic engines that can run on small devices. Federated queries on P2P networks are the solution for implementing high-value services without the need for centralised servers.
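At its core, a lightweight semantic engine on the client can be reduced to a triple store with pattern matching; the toy sketch below uses None as a wildcard, standing in for a SPARQL variable. The data and class name are invented for illustration.

```python
Triple = tuple[str, str, str]

class TripleStore:
    # A minimal in-memory RDF-style store: triples are (subject,
    # predicate, object); None in a query pattern acts as a wildcard,
    # playing the role of a SPARQL variable.
    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s=None, p=None, o=None) -> list[Triple]:
        return [
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        ]

store = TripleStore()
store.add("alice", "knows", "bob")
store.add("alice", "name", "Alice")
store.add("bob", "name", "Bob")
print(store.match("alice", None, None))  # everything known about alice
```

A federated query would run this same matching on each peer's local graph and merge the results, instead of shipping all data to one server.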
Then, if the data is spread all over the P2P network and does not sit in a centralised or distributed database, ACID properties cannot be enforced globally, and conflicts will inevitably arise when concurrent modifications of the data occur on two independent peers or in partitioned networks. Modifying data inside a transaction spanning several devices becomes inconsistent. CRDTs are the answer to this challenge, as they bring strong eventual consistency and offline-first capabilities.
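To illustrate, here is a state-based G-Counter, one of the simplest CRDTs: each replica increments only its own slot, and merging takes the per-replica maximum. Because merge is commutative, associative, and idempotent, replicas converge to the same value regardless of the order or number of state exchanges, which is exactly the strong eventual consistency mentioned above.

```python
class GCounter:
    # State-based grow-only counter CRDT: each replica increments only
    # its own entry; merge takes the per-replica maximum, which is
    # commutative, associative, and idempotent.
    def __init__(self, replica: str) -> None:
        self.replica = replica
        self.counts: dict[str, int] = {}

    def increment(self) -> None:
        self.counts[self.replica] = self.counts.get(self.replica, 0) + 1

    def merge(self, other: "GCounter") -> None:
        for r, n in other.counts.items():
            self.counts[r] = max(self.counts.get(r, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())

# Two replicas diverge while offline, then sync in either order:
a, b = GCounter("a"), GCounter("b")
a.increment(); a.increment()
b.increment()
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # both converge to 3
```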
More challenges arise: how does decentralised identity and authentication work? How do we replace centralised queries when a search on the global graph is needed, while preserving response times and accuracy? How do we index scattered data while preserving privacy? Are the frontend and middleware tools ready for such a technological shift? How do we provide a coherent framework for developers to start reinventing the web?
Blockchains / Web3
In recent years, we have seen huge interest in blockchain technologies emerge, both among developers who try to achieve similar decentralisation goals, and among the general public in search of quick and easy profit.
Some blockchain technologies like Ethereum have been used for genuinely decentralised projects like DAOs (Decentralised Autonomous Organisations), supply chains, or smart contracts. These blockchain technologies have the peculiar feature of being based on a single, global, unique chain of blocks. Only the verification and validation of transactions are decentralised, while the chain itself is global and unique; changing its rules leads to forks and incompatibilities, with different chains being totally hermetic to one another. It is also regrettable to see these blockchain technologies struggle to sever their historical link with cryptocurrencies.
Some permissioned blockchain frameworks, like Hyperledger, let software developers decide on the scale, rules, and topology of the chain, have no dependency on any cryptocurrency, and bring more decentralisation into the world of blockchain. Then there is Holochain, which associates one separate chain with each application.
It is undeniable that smart contracts and their decentralised validation are an interesting technology. Still, the current software does not deliver on its promise of full decentralisation, security, or privacy, and confusion reigns over the terminology and the goals of such technologies.
Web 3.0 used to be a term referring to the advent of the Semantic Web, and was first used by Tim Berners-Lee in 2006. It has since been reused by cryptocurrency enthusiasts in the form "Web3".
With NextGraph, we are aiming at the advent of a truly decentralised technology that would bring about the Semantic Web envisioned by Tim Berners-Lee and his Giant Global Graph of interoperable data. We understand the need for transactions to be validated in a decentralised way, and we integrated this feature into our design, without the hassle of a unique chain or a cryptocurrency.
At the end of the day, we could use the term web0, or Small Web, as defined by Aral Balkan.
But what matters most in the end is not so much the name we give it, but rather being able to provide efficient and reliable software that prepares for this technological revolution in the making.
With NextGraph, we are gathering forces around an innovative vision and the development of software solutions for the decentralised cloud that we think is urgently needed to tackle important issues such as digital sovereignty, interoperability, portability, security, privacy, usability, and efficiency of the internet. The design is already well advanced, and we are currently starting the development of the first building blocks of this new internet (see the roadmap). We have also analysed the state of the art and other existing solutions and their shortcomings.