GPT could centralise power online like nothing we've seen

Posted to: Hacker News; DEV

When I was doing my Computer Science degree in around 2005, I passionately believed in the huge potential of the internet for the growth of humanity and democracy.

I thought that it would open up the voices of many millions of people who would otherwise have been voiceless, suffering in the dark. It would shine a light on many dark areas of human society, and involve everyone in the global conversation about the future direction of humanity.

The centralisation of the internet

Since then the internet has experienced some growing pains, to put it mildly.

Instead of everyone having a website, as was the initial vision, big tech monopolies have created walled gardens to control and manipulate people’s speech. Facebook were complicit in genocide, and Cambridge Analytica may have influenced key political decisions like Brexit and the 2016 US presidential campaign.

To a large extent, decisions Google make in the YouTube recommendation algorithm have more impact than any individual speech. The beautiful garden of human online creativity is being manipulated and funnelled to serve the interests of a few profiteers.

One key place we can see this playing out is the fight over “zero click searches”. For many years now, Google has abused its monopoly position as the de facto search engine to keep people on Google.com and prevent them jumping off to other parts of the internet. They steal content from websites and repackage it as their own right there in the search results, so you never need to visit the real source, and the people who produced the content never receive any credit or gain for it. Content producers online are begging for the scraps that fall from Google’s table.

This is the state of the world today. But things are about to get a thousand times worse.

Enter GPT

GPT stands for Generative Pre-trained Transformer, and is the leading large-scale language model. It can analyse and learn from huge datasets, and then generate unique natural language following the patterns in the training data.

In June 2020, GPT-3 splashed onto the scene, and GPT-4 is expected in the next few months.

(Aside: I try to avoid using the term “artificial intelligence” to describe GPT because it is categorically not Skynet or I, Robot, and those narratives tend to distract us from useful analysis.)

GPT-3 has so far given rise to, most notably:

Since the launch of the ChatGPT public beta, we’ve seen many examples, both incredibly impressive and deeply worrying, of what it can do. It will also straight-up lie, but this article isn’t about ChatGPT’s defects.

GPT is quite clearly revolutionary. This technology to automate creating deeply natural-sounding and often accurate and informative content will change the world. The real question is how it will change the world. Who will it benefit?

Are OpenAI good or evil?

OpenAI, the creators and owners of GPT and DALL-E, were set up in 2015 with a clear mission:

to ensure that artificial general intelligence benefits all of humanity.

In their introductory blog post, they introduced OpenAI as a non-profit company who would share their products “freely” with the world (emphasis mine):

… it’ll be important to have a leading research institution which can prioritize a good outcome for all over its own self-interest.

We’re hoping to grow OpenAI into such an institution. As a non-profit, our aim is to build value for everyone rather than shareholders. Researchers will be strongly encouraged to publish their work, whether as papers, blog posts, or code, and our patents (if any) will be shared with the world. We’ll freely collaborate with others across many institutions and expect to work with companies to research and deploy new technologies.

Oh my, how that all changed in 2019. From Wikipedia:

In 2019, OpenAI did not publicly release GPT-3’s precursor model, breaking from OpenAI’s previous open-source practices, citing concerns that the model would perpetuate fake news. OpenAI eventually released a version of GPT-2 that was 8% of the original model’s size. In the same year, OpenAI restructured to be a for-profit company. In 2020, Microsoft announced the company had exclusive licensing of GPT-3 for Microsoft’s products and services following a multi-billion dollar investment in OpenAI. The agreement permits OpenAI to offer a public-facing API such that users can send text to GPT-3 to receive the model’s output, but only Microsoft will have access to GPT-3’s source code.

As soon as OpenAI created something of worth, it was immediately co-opted by the biggest tech monopoly out there, and turned into a profit machine. The march to profit from GPT now looks like an unstoppable juggernaut.

A few weeks ago, Microsoft’s plans to invest and extract profit from OpenAI came to light. It is rumoured to be investing a further $10 billion on the understanding that it will get 75% of the profits until the money is paid back, and thereafter will own a 49% stake in the company. This of course puts OpenAI heavily in debt to Microsoft and forces them to monetise as quickly as possible.

[Image: Economics of the OpenAI deal]

And like clockwork, three days later OpenAI announced plans to monetise ChatGPT by providing a paid-for API for using it within applications.

Microsoft’s plan for world domination

OpenAI’s links to Microsoft go way back.

Reid Hoffman was one of the principal founders of OpenAI in 2015. Very shortly after that he sold his company LinkedIn to Microsoft, and promptly joined Microsoft’s board.

Here there’s an interesting twist in my research. ChatGPT tells me that even prior to 2016, Hoffman was an advisor to M12, Microsoft’s strategic investment fund, and that he was involved in other strategic investments. But ChatGPT doesn’t provide references, and I can’t find the information elsewhere.

[Image: ChatGPT's answer to "What is Reid Hoffman's history with Microsoft?"]

When Microsoft acquired GitHub in 2018, much of the commentary assumed that their strategy was mostly to cosy up to the open source community and get developers on side. But then in 2021 they introduced GitHub Copilot, a code generation tool built on GPT, using their “exclusive” licensing from 2020 and trained on source code from GitHub (which is effectively all the public source code in the world).

Copilot is now, inevitably, the target of a lawsuit, with more almost certain to follow. The tool generates programming code by “learning from” existing code hosted on GitHub. That code almost always has a licence attached with specific usage terms, which Copilot ignores. Microsoft’s argument will likely be that the product of the GPT algorithm is unique, and no different from a human reading something and then producing unique works. But even if this argument holds up, Copilot has been shown to regurgitate existing code almost verbatim.

GitHub’s annual revenue has jumped from $200-$300 million at acquisition to over $1 billion in 2022. This is largely due to business subscriptions for “GitHub Actions” cloud services. But with Copilot’s code generating capabilities now only available via paid subscription, it’s possible this revenue will jump significantly this year. It seems pretty likely that it was this exact plan that motivated Microsoft to acquire GitHub.

So what might Microsoft’s plans be for ChatGPT to motivate this reported $10 billion investment?

ChatGPT, the interface to the internet

The copyright fights over GitHub Copilot are probably just the warm-up act for what’s to follow with ChatGPT and DALL-E.

If you spend some time using ChatGPT, its weaknesses quickly become clear. It’s often inaccurate, overly verbose, overconfident, caveat-laden and oddly vague. But there are a couple of things it clearly excels at:

It’s the latter example here that I think forms the most obvious use-case for ChatGPT.

The de facto way of finding information up to now has been Google search. But Google’s search is looking more and more dated. With the ever-growing amount of content online, the lack of innovation in the search engine, and the constant pressure of companies trying to game the algorithm, it’s become extremely hard to find anything esoteric or nuanced on Google.

ChatGPT really feels like it could step in to take Google’s mantle, potentially providing Microsoft with a way to unseat Google as king of the internet, although Google are of course working on their own rival language model.

But if this happens, the implications would be huge. If a ChatGPT-like service became our main interface to the world’s information, it would mask and exploit the web’s data like never before. Far beyond Google’s current “zero click searches”, ChatGPT explicitly refuses to reveal its sources. It repackages data in an opaque way, to provide an alluring service that completely disempowers and trivialises the actual producers of the original content.

In this world, what incentive is there for anyone to create original content online? What happens to the voices of all those people trying to take part in human discourse, in shaping the future of the human race?

I truly believe that this, or something like this, is Microsoft’s ultimate plan, and it seems a strong possibility that they will succeed. If they do, what will become of the internet?

By @nottrobin