🔆 Everybody is Invited: Open Source Software

Also: AI interference in US elections, Van Gogh ice cream art and a summary of the most common LLMs

Lumi Era
April 11, 2024

🗞️ Issue 13 // ⏱️ Read Time: 9 min

Hello 👋

Open source projects play an important role in democratising AI technologies. This collaborative approach helps tackle complex problems and promotes interdisciplinary solutions. However, the challenge for open source projects to keep up with increasing computation requirements has led to concerns about a future where only a few leading commercial AI corporations can afford to develop the next state-of-the-art (SOTA) model.

Would you like to learn more about Open Source? Keep reading. 👀

In this week's newsletter

What are we talking about? The open source concept, the benefits of its collaborative nature, and how it applies to AI.

Why is it relevant? Open source makes AI tools and frameworks available. It empowers individuals, startups, and large organisations to leverage AI capabilities, fostering innovation and levelling the playing field.

How is it impacting society? Open source AI accelerates research and development by enabling collaboration among experts from diverse backgrounds and democratising access to advanced technologies.

Meanwhile in Big Tech 👀

💰️ Google is considering charging for AI-powered search. This would represent the first time that Google has made people pay for enhancements to its core search product.

🚀 Databricks has announced the launch of DBRX, a powerful new open source large language model that it claims sets a new standard by outperforming all established open source models on standard benchmarks.

📈 The Alan Turing Institute released a report on the adoption of LLMs across the finance ecosystem. It explores how large language models could revolutionise the finance sector within two years, along with emerging opportunities for safe, trustworthy adoption.

👾 Microsoft released a report saying that China is using AI to sow division and influence the outcome of the U.S. presidential election in its favour. The Taiwanese presidential election in January 2024 was the first time that Microsoft Threat Intelligence witnessed a nation-state actor using AI content in attempts to influence a foreign election.

Closed vs Open Source Software

The French company Mistral AI released its latest model, Mistral 7B v0.2, at the San Francisco Hackathon in March. This groundbreaking open source language model sets new benchmarks for performance and efficiency.

Closed-source software (proprietary software) and open source software are two different approaches to software development and distribution. They differ in the availability of the source code (the human-readable instructions, written in a programming language, that tell a computer what to do), customisation options, community involvement, and licensing. To illustrate the contrasts between closed-source and open source software, let’s look at two popular web browsers:

Google Chrome is an example of closed-source software. Although Chrome is built on the open source Chromium project, the full source code of Chrome itself is not publicly available, and users cannot view or modify it. Chrome is distributed as a finished product that is free to download, but users must accept Google's terms to use it.
Mozilla Firefox, on the other hand, is an example of open source software. The source code of Firefox is publicly available, and users have the freedom to view, modify, and distribute it. Mozilla encourages community involvement and provides resources for developers to contribute to the Firefox source code.

Open source software (OSS) impacts the growth of AI and machine learning by providing researchers and developers access to a vast array of code and algorithms developed by the community. Open source AI is a subset of OSS, where AI technologies are freely available for use, modification, and distribution. The open source movement fosters innovation, promotes transparency, and avoids vendor lock-in, a situation where the cost of switching to a different vendor is so high that the customer is essentially stuck with the original vendor.

DALL-E prompt: An image in the style of Van Gogh featuring Open Source models in a wild ice cream world.

Let’s have a look at three of the most common Large Language Models (LLMs) on the market and their approaches to Open Source:

Contrary to its name, OpenAI, the company behind ChatGPT, has not released the models behind ChatGPT as open source, and it has gradually shifted towards a more corporate and competitive research culture. Without access to the underlying code and data, it becomes challenging to ensure fairness, transparency, and accountability in the decision-making process of its models.
While Meta has made the starting code and model weights for its Llama 2 models freely available for research and commercial use, certain restrictions in its licensing agreement have led it to be described as an “open approach” rather than fully open source.
Mistral’s 7B models, however, are fully open source, released under the Apache 2.0 licence. The latest v0.2 model outperforms Llama models on almost all benchmarks and can be used without restrictions.

When deciding between open and closed-source language models, it’s important to consider factors beyond performance, such as whether a closed-source model can be fine-tuned on domain-specific data and what it costs to deploy an open source model yourself.

Benefits:

Affordability: Open source generative AI models provide an affordable alternative to building an AI system from the ground up, which requires a significant investment of resources. This makes it easier for smaller organisations to explore and experiment with AI technology. The 2023 State of Open Source Report highlighted that 80% of the respondents said that their organisation had increased its use of OSS in the last 12 months. The main reason for the increase was reported as no license cost or overall cost reduction. For context, GPT-4 was rumoured to have cost US$100 million to develop, and the estimated cost for the next-generation SOTA model is approaching US$1 billion.
Enhanced data security and privacy: By using open source Large Language Models (LLMs), organisations keep full control of personal data, making them solely responsible for its protection. This addresses the concern of data leaks or unauthorised access to sensitive data by the LLM provider.
Collaboration: In 2022, a reported record-breaking 413 million contributions were made to open source projects on GitHub, a developer platform that allows developers to create, store, manage, and share their code. This indicates the strong demand for open source collaboration.

Challenges:

Lack of Support: Open source AI models often lack the dedicated support and regular updates provided by proprietary models, which can be challenging for businesses that rely on them.
Fragmented licenses: The open source community offers a wide range of licenses, each with its own terms and conditions. It can be difficult for developers and organisations to navigate and understand the implications of various licenses, especially when combining or integrating multiple open source components.
Lack of compute resources: AI models, especially deep learning models, require significant computational power (the ability of a computer or other electronic device to perform mathematical calculations and process large amounts of data) to train effectively. However, not all developers and organisations have access to the necessary compute resources to train these models. This limitation impacts the capabilities of open source models and is a reason they often fall short of SOTA models.
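To give a feel for the scale behind the compute challenge above, here is a rough back-of-the-envelope sketch. It relies on a widely used rule of thumb (training compute ≈ 6 × number of parameters × number of training tokens), which is an approximation and not something stated in the reports cited here. The model size, token count, hardware speed, and utilisation below are all hypothetical inputs chosen for illustration; they do not describe any specific model or chip.

```python
def training_flops(num_parameters: float, num_tokens: float) -> float:
    """Approximate total training compute with the 6 * N * D rule of thumb."""
    return 6.0 * num_parameters * num_tokens


def accelerator_days(total_flops: float, peak_flops_per_second: float,
                     utilisation: float) -> float:
    """Convert a compute budget into days of work on a single accelerator."""
    if not 0.0 < utilisation <= 1.0:
        raise ValueError("utilisation must be in the range (0, 1]")
    seconds = total_flops / (peak_flops_per_second * utilisation)
    return seconds / 86_400  # seconds in one day


# Hypothetical inputs, for illustration only.
PARAMS = 7e9        # a 7-billion-parameter model
TOKENS = 1e12       # one trillion training tokens (assumed)
PEAK = 3e14         # assumed peak speed of one accelerator, in FLOP/s
UTILISATION = 0.4   # assumed fraction of peak speed actually achieved

flops = training_flops(PARAMS, TOKENS)
days = accelerator_days(flops, PEAK, UTILISATION)
print(f"{flops:.1e} FLOPs, roughly {days:,.0f} accelerator-days")
```

Under these assumed numbers, a single accelerator would need on the order of a decade to finish the job, which is why training is spread across large clusters that many open source teams cannot easily access.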

Open Source AI Tools to Know

Team Lumiera has picked out some of the most relevant Open Source AI Tools for you. TensorFlow and PyTorch, developed by Google and Facebook (now Meta), respectively, have fuelled much of the recent progress in AI, including tools like ChatGPT, and Hugging Face has been a key agent in the democratisation of AI.

TensorFlow is a versatile framework with extensive community support, enabling programmers to build and deploy machine learning models across different platforms, making it accessible for beginners and experts alike.
PyTorch offers an intuitive interface and strong integration with Python libraries, making it a popular choice among researchers and developers for rapid prototyping and deep learning research.
Hugging Face is a collaborative platform that empowers anyone, from novice to expert, to create, train, and deploy NLP and ML models using open-source code.

Are you interested in learning more about the Basics of GenAI or diving deeper into Ethics & AI? Sign up here for our workshops.

Until next time.
Emma, Sarah, and Allegra

Lumiera has gathered the brightest people from the technology and policy sectors to give you top-quality advice to navigate the new AI Era.

Follow the carefully curated Lumiera podcast playlist to stay informed and challenged on all things AI.