
Introducing StarCoder – The Revolutionary Open-Source Code LLM


By Abhinav Girdhar | Last Updated on June 8th, 2023 10:55 am

Appy Pie is excited to explore and review StarCoder, a groundbreaking open-source large language model for code (Code LLM) developed as part of the BigCode initiative led by Hugging Face and ServiceNow. Our goal is to delve into the capabilities of this impressive LLM and share our insights with the developer community.

StarCoder and StarCoderBase, two cutting-edge Code LLMs, have been meticulously trained on permissively licensed data from GitHub. This comprehensive dataset spans 80+ programming languages, Git commits, GitHub issues, and Jupyter notebooks. Both models have 15 billion parameters and were trained on roughly 1 trillion tokens. StarCoder is a fine-tuned version of StarCoderBase, trained on an additional 35 billion Python tokens. Extensive benchmark testing has demonstrated that StarCoderBase outperforms other open Code LLMs and rivals closed models like OpenAI’s code-cushman-001, which powered early versions of GitHub Copilot. With a context length of more than 8,000 tokens, the StarCoder models can process more input than any other open LLM, presenting exciting new possibilities.
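
If you want to try code completion with the base model yourself, here is a minimal sketch using the Hugging Face transformers library. Note that accessing the bigcode/starcoder checkpoint on the Hub may require accepting the model license and logging in with an access token; the prompt and generation settings below are purely illustrative.

```python
# Minimal sketch of code completion with StarCoder via Hugging Face transformers.
# Assumes you have accepted the model license on the Hub and authenticated
# (e.g., with `huggingface-cli login`); exact requirements may vary.
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

checkpoint = "bigcode/starcoder"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.float16,  # half precision to reduce GPU memory use
    device_map="auto",          # place weights on the available GPU(s)
)

# Illustrative prompt: the model continues the function body.
prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(inputs.input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```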

StarCoder and comparable models have undergone rigorous testing across a range of benchmarks. Notably, StarCoder and StarCoderBase have proven to be more effective than larger models like PaLM, LaMDA, and LLaMA on popular programming benchmarks, including the widely used HumanEval for Python, which evaluates a model’s ability to complete a function based solely on its signature and docstring.
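
To make the HumanEval setup concrete, here is a hypothetical task in the same style (not an actual benchmark problem): the model sees only the signature and docstring and must generate a body that passes hidden unit tests.

```python
# Illustrative HumanEval-style task. The model is given only the signature
# and docstring; everything below the marker is what it is expected to produce,
# which is then checked against unit tests.
def running_mean(values: list[float]) -> list[float]:
    """Return the cumulative mean of values,
    e.g. running_mean([1, 2, 3]) -> [1.0, 1.5, 2.0]."""
    # --- a correct model completion might look like this ---
    means = []
    total = 0.0
    for i, v in enumerate(values, start=1):
        total += v
        means.append(total / i)
    return means
```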

While exploring StarCoder, we found several key features worth highlighting:

  1. Major Open-Source Code-LLM: StarCoder represents a significant milestone as a major open-source Code LLM, embracing collaboration and innovation.
  2. Utilizing GitHub Data: The models are trained on GitHub’s openly licensed data, incorporating a wide range of programming languages and real-world scenarios.
  3. Exceptional Performance on Benchmarks: StarCoder consistently achieves top performance on major open-source programming benchmarks, highlighting its prowess and reliability.
  4. Technical Assistant in 80+ Programming Languages: StarCoder serves as a valuable technical assistant, capable of generating realistic code and supporting over 80 programming languages.
  5. Extensive Training: The models were trained on an impressive 1 trillion tokens with a context window of 8192 tokens, providing unmatched expertise and understanding.
  6. Ethical Usage: StarCoder is trained only on permissively licensed data, supporting compliant and ethical usage.

If you’re eager to experience StarCoder’s capabilities firsthand, we invite you to access its various tools and demos available on Hugging Face’s website. These resources include a list of plugins that seamlessly integrate with popular coding environments like VS Code and Jupyter, enabling efficient auto-completion while you code.

Additionally, you can explore the bigcode/bigcode-playground space to play with the base model’s code completion feature. Engage in fine-tuned chat conversations with the starchat-alpha model in the HuggingFaceH4/starchat-playground space. Furthermore, the bigcode/bigcode-editor space offers a simple code editor to experiment with.

For those interested in deploying and running the starchat-alpha model locally, we have prepared a Google Colab notebook.

[Access the StarCoder Google Colab Notebook by Appy Pie AI Team]

Please note that running the model requires substantial resources: roughly 19 GB of GPU memory, such as an A100 GPU. Feel free to make a copy of the Colab notebook and explore the capabilities of the starchat-alpha model in your own environment.
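
For a rough idea of what such a local setup involves, here is a minimal sketch that loads HuggingFaceH4/starchat-alpha with 8-bit weights via the transformers and bitsandbytes libraries. The chat prompt, system message, and generation settings are illustrative, and the actual Colab notebook may differ.

```python
# Rough sketch of running starchat-alpha locally, assuming a GPU with enough
# memory and the bitsandbytes library installed for 8-bit loading.
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "HuggingFaceH4/starchat-alpha"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    load_in_8bit=True,  # 8-bit weights to fit in roughly 19 GB of GPU memory
    device_map="auto",
)

# starchat-alpha is dialogue-tuned; the special tokens below follow the chat
# template it was fine-tuned with (system / user / assistant turns).
prompt = (
    "<|system|>\nYou are a helpful coding assistant.<|end|>\n"
    "<|user|>\nWrite a Python function that reverses a string.<|end|>\n"
    "<|assistant|>\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    inputs.input_ids,
    max_new_tokens=128,
    eos_token_id=tokenizer.convert_tokens_to_ids("<|end|>"),  # stop at end of turn
)
print(tokenizer.decode(outputs[0]))
```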

While StarCoder presents exciting possibilities, it’s important to acknowledge its limitations. Like other LLMs, StarCoder can produce erroneous, rude, deceptive, ageist, sexist, or otherwise stereotype-reinforcing content. Addressing these concerns and continuously improving the model’s performance and ethical boundaries is of utmost importance.

Researchers are actively analyzing StarCoder’s coding abilities and natural language understanding, though evaluations so far have largely relied on English-only benchmarks. Expanding research into the efficacy and limitations of Code LLMs across different natural languages will broaden the applicability of these models.

At Appy Pie, we are committed to providing developers with valuable insights and resources to leverage the power of StarCoder and other cutting-edge technologies. Explore these tools and demos to witness the potential of StarCoder firsthand and enhance your coding experience. We aim to contribute to the research and developer community by exploring and reviewing StarCoder, ensuring improved access, repeatability, and transparency in the world of Code LLMs.

Want to learn more about the fascinating world of large language models? Explore our other posts on the topics – Mastering LLM Training with Appy Pie, Dolly by Databricks, StableLM Alpha 7b by Stability AI, and StableLM Alpha 7b vs Dolly.

Abhinav Girdhar

Founder and CEO of Appy Pie
