
Personal Intelligence: A Gateway Drug to All-knowing AI?—Part I.

Apple has recently announced that its latest generation of operating systems will run its personal assistant service on a serious AI foundation, with direct access to all personal data on the system. Although privacy was a prominent theme of the unveiling, the question remains: what will the consequences be when omniscient AI systems become part of everyday life? Who will handle our personal data, and how? Who will guarantee its protection, and how? And above all, does anyone think there should be a “do not request these services” button?

Reflecting on the announcement, we also read about the big mobile manufacturers’ vision for GenAI, and the details of Apple’s vision for AI, as announced at WWDC 2024. But the most exciting and worrying part of the announcement is not what was said. Let’s step away from the specifics for a moment and take a broader look at what trends we can expect from “omniscient” AI assistants.

It is perhaps worth approaching the question by looking at how the commercial Internet as we know it today came into being, what the implications of its current “business model” are, and how generative AI is likely to fit into these trends in the near future.

The structure we know today as the Internet was originally launched as part of the ARPANET project with funding from the US Department of Defense in the late 1960s. Its aim was to facilitate communication and information sharing between researchers and institutions. In the 1980s, the NSFNET project further expanded the Internet and laid the foundations for widespread network access.

Then, in the early 1990s, the emergence of the World Wide Web (WWW), as we know it today, brought about major changes. Tim Berners-Lee of the European Organization for Nuclear Research (CERN) developed the HTML, HTTP, and URL technologies that made it easy to create and browse web pages. At the same time, in 1991, the United States Congress passed an amendment to the NSFNET policy that allowed commercial traffic on the Internet.

It’s hard to imagine from today’s perspective, but the early Internet was completely free of targeted data collection about users. That began to change when advertisers appeared on the network alongside users. The first banner ad was placed on the HotWired website by AT&T in 1994, mostly as an experiment. The aim was mainly to plant the idea in users’ minds that banners could lead them to useful places. The experiment was hugely successful, with over 40% of the target audience clicking on the ad (by comparison, the typical rate today is less than 1%).

Of course, the marketing community quickly took notice of this success, and the new way of reaching people soon seeped into the public consciousness. The first step in using the Internet for advertising was to place advertisements on the sites that the desired demographic groups were most likely to visit. In the early days, however, a visit to the Internet looked radically different from what it does today. Each visit was a completely anonymous, isolated event, with nothing linking it to any other visit.

The situation is of course drastically different today. Data collection from users is taking place on a massive scale, using a variety of methods, across all online platforms. One well-known method is the use of cookies. These were historically the first tools to help collect data on users, even if that was not their original purpose. They were in fact text files, originally intended to store only the settings we had previously used on websites, or additional data needed to improve the user experience. An iconic example of how their role has changed over the years is the use of third-party cookies.

Initially, only the web pages that the user specifically visited stored data on the user’s computer. The situation was completely different for third-party cookies. In most cases, these were placed on users’ machines, without any notification or approval, by parties whose sites the users had never visited (mostly through advertisements). The point of these cookies is to exchange data about users.

In a typical case, when a user visits a website, the advertisements displayed on it come from another party, the advertisement provider. The ad provider’s cookie is set along with the ads. When the user later visits a second web page that uses the same ad provider, the provider also has access to the data accumulated during the visit to the previous page. In this way, complete user profiles are compiled, containing, for example, a list of the websites visited.
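The mechanism described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the `AdProvider` and `Browser` classes, the `ad_id` cookie name, and the `.example` domains are all hypothetical, and real browsers and ad networks involve far more detail (HTTP headers, cookie attributes, browser restrictions on third-party cookies).

```python
import uuid
from collections import defaultdict


class AdProvider:
    """Hypothetical ad provider whose ads are embedded on unrelated sites."""

    def __init__(self):
        # Maps a cookie identifier to the list of sites where it was seen.
        self.profiles = defaultdict(list)

    def serve_ad(self, cookies, site):
        # First contact: set a cookie holding a random identifier.
        if "ad_id" not in cookies:
            cookies["ad_id"] = uuid.uuid4().hex
        # Later requests carry the same identifier, whichever site embeds the ad.
        self.profiles[cookies["ad_id"]].append(site)
        return cookies["ad_id"]


class Browser:
    """Hypothetical browser that keeps one cookie store per cookie-setting domain."""

    def __init__(self):
        self.cookie_jar = defaultdict(dict)

    def visit(self, site, ad_provider, ad_domain="ads.example"):
        # Loading the page also loads the embedded ad, so the ad provider
        # receives (and may set) its own cookie, not the visited site's.
        return ad_provider.serve_ad(self.cookie_jar[ad_domain], site)


provider = AdProvider()
browser = Browser()
for site in ["news.example", "shop.example", "travel.example"]:
    ad_id = browser.visit(site, provider)

# The provider now holds a browsing history for a single identifier.
print(provider.profiles[ad_id])
# → ['news.example', 'shop.example', 'travel.example']
```

None of the three sites shares data with the others directly; the profile exists only because the same third party is present on all of them.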

This situation became interesting when the world’s advertising industry began to be concentrated in the hands of a few large companies. With the rise of search engines, smaller websites became more and more interested in installing cookies from the big companies. As a result, tech giants like Google, whose main source of revenue is the sale of advertising, began to be present in the background on a huge number of websites at once.

From a broader perspective, cookies followed a clear evolutionary trajectory. The sites’ own data collection was followed by a centralized version (third-party cookies), which in most cases took place without any notification to the user. This kind of unrestricted user tracking eventually led to strong action by the Federal Trade Commission and the European Union. In 2011, the Cookie Law, also known as Directive 2009/136/EC, came into force. It did not abolish the practice, however; it only stated that third-party cookies may not be placed on a user’s device without that user’s consent. Companies could no longer collect data without informing users and obtaining their consent.

The EU’s General Data Protection Regulation (GDPR), which was adopted in 2016 and has applied since 2018, has also had a significant impact on data collection carried out without the user’s knowledge. In connection with GDPR, it is worth mentioning two important regulatory approaches to data use. Opt-out frameworks operate broadly on the principle that “everything is permitted that is not expressly prohibited”. Until recent years, this was pretty much the case for online data collection everywhere: if the user did not explicitly object, data about them could be collected and stored. The biggest problem with this is that in many cases users are not even aware of what they are supposed to be objecting to. The simple reason is that most data collection is carried out via complicated technologies that are opaque to the everyday user. This is much less obvious than, for example, someone placing prominent cameras in our homes. In the case of opt-in, the logic is reversed; here, the user must give explicit consent before a site can use their data. The big change with GDPR was that the legislation now regulates such consent under an opt-in framework.
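The practical difference between the two frameworks comes down to what happens when the user has expressed no choice at all. The following sketch illustrates this default; the `Framework` enum and the `may_collect` function are hypothetical names, and the model is a deliberate simplification that ignores the other legal conditions a real regulation such as GDPR attaches to data processing.

```python
from enum import Enum
from typing import Optional


class Framework(Enum):
    OPT_OUT = "opt-out"  # collection allowed unless the user objects
    OPT_IN = "opt-in"    # collection allowed only after explicit consent


def may_collect(framework: Framework, user_choice: Optional[bool]) -> bool:
    """Decide whether data may be collected.

    user_choice: True = consented, False = objected, None = said nothing.
    """
    if user_choice is not None:
        # An explicit choice is respected under both frameworks.
        return user_choice
    # Silence counts as permission only under opt-out.
    return framework is Framework.OPT_OUT


# A user who never saw or answered any prompt:
print(may_collect(Framework.OPT_OUT, None))  # → True
print(may_collect(Framework.OPT_IN, None))   # → False
```

Under opt-out, the uninformed user is tracked by default; under opt-in, the same user is not.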


István ÜVEGES is a researcher in Computer Linguistics at MONTANA Knowledge Management Ltd. and a researcher at the HUN-REN Centre for Social Sciences, Political and Legal Text Mining and Artificial Intelligence Laboratory (poltextLAB). His main interests include practical applications of Automation, Artificial Intelligence (Machine Learning), Legal Language (legalese) studies and the Plain Language Movement.
