Navigating Privacy in 2023: Exploring Data Colonialism, AI Advancements, and Legal Landmarks in Getty Images vs. Stability AI and the New York Times vs. Microsoft and OpenAI

Adele Spitz
Mar 5, 2024

In May 2013, Edward Snowden travelled to Hong Kong. There, Snowden met discreetly with journalists and handed over documents exposing a government program called PRISM. Established in 2007, this program collected data from individuals inside and outside the United States. Snowden's disclosure of these documents brought to light the extensive surveillance activities conducted by the National Security Agency (NSA) and other U.S. government agencies.

Edward Snowden's attempt to alert the American people to the extent of U.S. government surveillance proved successful. However, his efforts have yet to bring an end to data collection and surveillance practices within the United States.

Data collection has evolved into what some perceive as a manifestation of ‘Data Colonialism.’ The seemingly innocuous act of agreeing to terms of service on the internet has become a means for corporations to exploit human experience as a source of behavioural data. The terms of service agreements that appear when logging onto a new site or social media platform now establish a complex set of data relations that unfold in ways the average person does not comprehend. These agreements are lengthy, vague, difficult to understand, and rarely read, yet people continue to consent to them regardless of the future implications.

Our contemporary way of life is intricately tied to online activities, with essential platforms like GPS, social media, ride-share apps, and health apps all necessitating data sharing. The internet has become so deeply ingrained within our society that rejecting these terms or forgoing internet usage seems inconceivable. Large corporations capitalise on this widespread acceptance, which enables them to harvest personal data as a resource for profit. The complexities of this new form of connection draw certain parallels to historical forms of colonialism and highlight the nuanced terrain of data collection and surveillance.

Historically, colonialism has been characterised by several distinct features, including resource appropriation, the establishment of unequal social power relations, the concentration of wealth through resource exploitation, and the propagation of an opaque ideology that obscures the outcomes of such interactions. The term ‘Data Colonialism’ has come under scrutiny, as critics argue that historical colonialism involved violence and forced seizure of land, resources, and populations, rendering the analogy to ‘Data Colonialism’ inaccurate. However, as the concept of colonialism evolves, explicit parallels with history become apparent. Four key features demonstrate the similarities between historical colonialism and ‘Data Colonialism’:

Companies appropriate resources by taking human experience and action and turning it into data for profit.
New social relations are formed to establish the secure usage of personal data. In online usage, all social relations increasingly take the form of data relations.
Companies extract wealth and benefit financially from these interactions by assigning economic value to data collection.
This interaction is disguised by a new ideology that posits data collection, tracking, and surveillance as inevitable, presenting one’s online footprint as the necessary price of being connected in modern society.

These four features link ideas of historical colonialism to the evolving landscape of technology today and present the case for the concept of ‘Data Colonialism.’

On the surface, this interaction is not inherently detrimental. While government tracking and surveillance may seem unsettling, it is widely accepted. This surveillance becomes a consensual aspect of using the internet, being a citizen or simply participating in society. We actively embrace it as a form of security. Street surveillance, for instance, serves as a protective measure intended to keep individuals safe. From this perspective, the awareness of being watched is a means to reassure ourselves of our security in daily life. Similarly, online tracking can help protect the average person from cyber attacks, terrorism, and other forms of crime. Rather than instilling fear, the idea of being surveilled should primarily evoke a sense of safety.

Yet, this is rarely the case. Surveillance is often perceived as intrusive and unsettling, a perception further fuelled by media depictions such as George Orwell’s novel 1984. The surveillance presented in 1984 highlights the expansive capabilities of technology and its use as a means of persecution, demonstrating technology not as a tool for innovation but for oppression. The novel examines a dystopian world, imagining a future in which government and technology are interwoven. However, comparing modern-day data collection and surveillance with the setting depicted in Orwell’s novel is challenging. The issues Orwell addresses do not reflect our contemporary reality, and any attempt to establish a direct comparison would be inaccurate.

A more nuanced argument requires a deeper analysis of why people are concerned about data collection and surveillance and what its outcomes might be. As technology advances, understanding how the data collection process affects the average person, and the extent to which it can lead to exploitation, is becoming significantly more important. Unveiling these intricacies is crucial to understanding the implications of personal data harvesting.

First, we can consider the consequence of data collection that most individuals encounter: personalised online advertisements. Personalised advertisements seem like a harmless way to tailor internet usage and craft an online profile that resonates with its users. However, it is crucial to recognise the potential for exploitation that these ads carry. Personalised advertising has enabled corporations to use data-driven assessments to target vulnerable members of society. Organisations, including insurers, service providers, credit raters and government agencies, can leverage personal data collection to disadvantage specific individuals. This can include predatory lending practices, such as offering loans with high interest rates or unfavourable terms, and insurers charging high premiums or offering policies with inadequate coverage. These practices are just a few examples of the negative consequences of personal data collection. People who lack the resources to resist such practices are specifically targeted, often because of their limited financial ability to make legal claims, their lack of understanding of the systems that enable these interactions, and their unawareness of being manipulated.

Today, personal data collection and surveillance may have more complex impacts beyond exploitation. This rising issue can be better explained by a concept known as ‘Attention Economics.’ First coined by Herbert Simon in the late 1960s, Attention Economics is the idea that human attention is a scarce resource that can be commoditised. As technology becomes increasingly inseparable from our lives, access to and consumption of information grow ever more prevalent. As a result, the attention available for any one piece of information is diminishing. This scarcity has driven up the ‘price of attention,’ or rather, the cost of your attention if allocated to something else. For example, when you use social media apps like Instagram, Facebook, and TikTok, your attention becomes a valuable commodity that companies can purchase. These platforms analyse your interests and behaviour, and companies that have paid for your attention then target you with relevant headlines and advertisements, ultimately profiting from it.

Currently, the U.S. faces an elevated risk from the dangers of ‘Attention Economics’ as political tensions continue to escalate. Corporations, campaign offices, and organisations can direct targeted ads at users who are more susceptible to fake news, propaganda, or a pack-like mentality. This practice can influence human decisions and opinions, leading to a polarised climate of distrust and increasingly radical action. It also contributes to a growing scepticism of online news and discourse, generated by the proliferation of fake news platforms and unreliable sources. As distrust in online information grows, public trust in reliable and essential information is at risk.

These interactions symbolise a new social order that normalises surveillance. They echo a historical relationship that once centred on the appropriation of human labour and resources and now prioritises the appropriation of human life itself by converting it into data. These interactions parallel traditional colonisation, as powerful corporations exploit resources in the form of data for their own benefit, further enhancing their capital gains and control over internet usage. In taking advantage of human experience and online activity, data colonialism demonstrates its harmful effects.

The appropriation of personal data by corporations and governments is not limited to individuals; it also extends to conflicts between large corporations. These disputes provide an opportunity to introduce legislation that could impose clear limitations on data collection and usage and safeguard the rights of individuals. This is best evidenced by the legal battle between Getty Images and Stability AI. On 3 February 2023, Getty Images filed a lawsuit accusing Stability AI of "brazen infringement" of intellectual property, alleging that Stability AI copied twelve million photographs, along with their associated data and captions, without permission. Stability AI operates a program called Stable Diffusion, which generates photo-realistic images from text inputs. This A.I. tool has become controversial because it enables the creation of fake images, contributing to the proliferation of fake news built on A.I.-generated photographs. Getty Images claimed that Stability AI used its intellectual property without permission to train its deep learning model to create competing content.

Getty Images filed suit under U.S. copyright law, a body of law built on the framework of the Copyright Act of 1976, the primary statute governing copyright in the United States. Copyright is a legal right protecting the owner of intellectual property. While the law primarily protects intellectual property, the onset of A.I. and the expanding demand for human data create potential for it to be extended to safeguard individuals against personal data collection. Terms of service agreements often include clauses that grant companies the ability to use or reproduce user-generated content, and most users unknowingly grant the company rights over their intellectual property. Additionally, when companies extract personal data, they may extract copyrighted material without permission, leading to violations of U.S. copyright law. Further legal development in this area concerns the protection of copyrighted material used to train machine learning algorithms, as at issue in Getty Images v. Stability AI. Privacy and data protection laws have also been implemented to govern data protection and consent for tracking, complementing copyright law.

The legislation protecting intellectual property and enabling copyright claims is becoming increasingly crucial as A.I. evolves. The surge in A.I.’s prominence has created a broader stage, elevating the significance of human data as a resource to be extracted. A.I. and technology companies persist in accumulating vast amounts of private data sourced from individuals and corporations. This reservoir of data allows them to generate extensive datasets, which are essential in training A.I. learning models. This advancing landscape assigns new value to human data: A.I. companies are interested in data that can serve as the instructional foundation for these learning models, providing insights into human thought processes and social interactions.

This new realm of technology that utilises human data for development intensifies the interest in personal data collection, thereby posing additional challenges for legislation aimed at safeguarding the privacy of individuals and corporations. Notably, A.I. has increasingly faced scrutiny regarding the input and output of these learning models concerning copyright and authorship. As in the case of Getty Images, Stability AI’s use of photos from Getty Images to train its learning model has prompted complaints of copyright violations for both the input and output of the A.I. model.

The ongoing legal battle between Getty Images and Stability AI is a pivotal issue in shaping the trajectory of A.I.’s future and exploring its boundaries. This dispute foreshadowed a subsequent A.I.-related legal battle. On 27 December 2023, the New York Times initiated legal action against OpenAI and Microsoft for copyright infringement. The claim asserted that OpenAI used copyrighted material, including published works from the Times, to train its deep learning model. This case draws parallels with Getty Images v. Stability AI, as the New York Times contends that not only did OpenAI use copyrighted content for input and training, but the A.I. models produced by OpenAI also reproduced the New York Times’s intellectual property in their output, positioning themselves as a competing source of reliable information. The future may bring conversations and potential agreements between news outlets and A.I. companies that resolve copyright disputes while drafting data licensing agreements to clarify what information A.I. companies are allowed to use. Moreover, such agreements may become the difference between success and failure in A.I. These agreements will also begin to shape the future of U.S. copyright law and what it means for data usage in technology.

In the ever-evolving technological landscape, novel challenges in data collection, privacy, and intellectual property protection continually emerge. The complexities surrounding recent advancements in online platforms, surveillance practices and data collection will continue to expand, and shielding individuals from the extensive capabilities of technology remains an ongoing obstacle. The value A.I. now places on personal data as a lucrative commodity heightens the concerns surrounding data colonialism and privacy. These legal battles serve as poignant reminders of the challenges posed by developing A.I. technology, offering insights into the legal contours that may form as A.I. permeates diverse industries.