🔆 The Art of Fair Exchange: Exploitative AI and the Data Commons
Why data collection shapes our trust in AI, plus progress in cancer diagnosis, UNGA consensus, and more
🗞️ Issue 11 // ⏱️ Read Time: 9 min
Hello 👋
In this week's newsletter
What are we talking about? How to build data contribution on consent, credit, and compensation.
Why is it relevant? Generative AI use of content is sparking backlash from creators, but alternative models are emerging.
How is it impacting society? Exploitative ways of building AI hinder innovation and the potential adoption of technology that could help us solve some of the world’s most pressing problems.
Big tech news of the week 👀
🇦🇪 Abu Dhabi in talks to invest in OpenAI chip venture. According to a report by the Financial Times, the United Arab Emirates (UAE) is looking at investing in OpenAI’s ambitious plans to develop its own semiconductor chips, which power advanced AI models. Sources say that MGX, a state-backed group in Abu Dhabi, is in discussions to support OpenAI’s venture.
🦄 Inflection AI unicorn co-founder moves on. Mustafa Suleyman, co-founder of Inflection AI, announced that he’s joining Microsoft as the CEO of a new team that handles the company’s consumer-facing AI products, including Copilot, Bing, and Edge.
☝️ Open Interpreter introduces the first open-source language model computer on the market - 01 Lite. It allows hands-free computing and represents a new wave in the Large Action Model (LAM) trend. The 01 project is an open-source ecosystem for artificially intelligent devices. Combining code-interpreting language models with speech recognition and voice synthesis, the 01’s flagship operating system can power conversational, computer-operating AI devices similar to the Rabbit R1 or the Humane Pin.
💯 Consensus in the United Nations General Assembly? You are not hallucinating. The UNGA has unanimously adopted the first global resolution on artificial intelligence to encourage the protection of personal data, the monitoring of AI for risks, and the safeguarding of human rights.
Data: A resource built by humans
This week we are looking at a growing controversy. As you probably know by now, the value of generative AI models depends, to a large extent, on their training data. This training data is built by humans, in one way or another. Most of the content on the internet, including the images you’ve uploaded to your social media, news articles, chat forums, and artwork, is gathered by crawlers that automate the process of data collection. This data is largely collected without permission from, credit to, or compensation for the contributors and makers of this crucial resource.
This is causing tension. Lately, it seems like AI litigation is becoming hotter than generative AI itself, with numerous cases insisting that credit should be given where credit is due: there is growing concern over the extraction of this data without clear consent, acknowledgment, or financial compensation for those who own it or are featured within it. Rich Skrenta, who heads Common Crawl, a nonprofit that scrapes the web to build an openly available dataset, expresses the view of many AI companies: “You posted your information on the internet, intentionally so that people could come and see it. And robots are people too.” He continues: “By publishing something on the internet without explicitly telling robots to avoid it, you’re consenting to its use by AI.”
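The “telling robots to avoid it” that Skrenta alludes to is the robots.txt convention: a site can ask specific crawlers, such as Common Crawl’s CCBot, not to fetch its pages. As a quick illustration (the example.com URL and the exact policy shown are hypothetical), Python’s standard library can demonstrate how such a directive is interpreted:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that blocks Common Crawl's crawler (CCBot)
# while still allowing all other crawlers.
robots_txt = """
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(robots_txt)

# Common Crawl's bot is asked to stay away from the whole site...
print(parser.can_fetch("CCBot", "https://example.com/article"))  # → False

# ...while an ordinary browser user agent is allowed.
print(parser.can_fetch("Mozilla/5.0", "https://example.com/article"))  # → True
```

Note that robots.txt is a voluntary convention: it signals the publisher’s wishes, but nothing technically prevents a crawler from ignoring it, which is precisely why the consent debate remains unresolved.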
The Consequences
These exploitative ways of collecting training data not only raise questions about privacy and consent but also about the fairness of profiting from communal or individual contributions without compensation. Some of the consequences of these methods are:
Economic Inequity: AI companies are reaping the benefits, while creators and contributors of data are getting left out of the economic value chain.
Trust Erosion: The unchecked use of data leads to a loss of trust in AI technologies and the companies developing them. This, in turn, hinders innovation and the adoption of a technology that could potentially help us solve some of the most pressing problems we face, including climate change, biodiversity loss, and poverty.
What choices can leaders make to contribute to a fair use of data?
Cases from reality
👩🏽🎓 A student at the Stockholm School of Economics is developing an AI product aimed at assisting teachers in their work. The objective is to commercialise the product for school use. Utilising the principle of public access to records—a cornerstone of Swedish democracy—the student has obtained historical examination papers from schools to train the AI model. However, the crux of the matter lies in the intended purpose of this democratic principle, which is to ensure transparency and accountability within the government, enabling public scrutiny of government officials' and agencies' actions. In this case, there are concerns regarding the potential misappropriation of resources (data) for the company’s self-interest.
⚖️ The New York Times, a leading American daily newspaper, is suing OpenAI and Microsoft for allegedly using its copyrighted content without permission. The lawsuit, filed at the end of last year, contends that millions of articles published by the Times were utilized to train ChatGPT and other AI products developed by the defendants. Although the lawsuit does not specify an exact amount in monetary demands, it asserts that OpenAI and Microsoft should be liable for "billions of dollars in statutory and actual damages" due to the "unlawful copying and usage of The Times's uniquely valuable works."