Inside CodeGen, Our In-House Open-Source LLM

CodeGen, part of Salesforce’s own family of large language models (LLMs), is an open-source LLM for code understanding and code generation …

Here are the highlights from the article "Inside CodeGen, Our In-House Open-Source LLM":

1. CodeGen:

– Salesforce has created an in-house large language model (LLM) called CodeGen for code understanding and generation.
– CodeGen has been trained on a variety of programming languages, including Apex.
– Three major versions of CodeGen have been released: CodeGen 1.0, CodeGen 2.0, and CodeGen 2.5.
– CodeGen 2.5 is optimized for production use cases, delivering high-quality responses with low latency.
– CodeGen 2.5 has been trained on the StarCoderData dataset and is widely used, with over 600,000 downloads per month.

2. Einstein for Developers:

– The LLM that powers Einstein for Developers is based on CodeGen 2.5, but it is fine-tuned for Apex.
– The Einstein for Developers model is proprietary and includes additional capabilities.
– The model's Apex knowledge is continuously improved through internal testing and feedback from Apex experts.
– Contextual grounding features are included in the extension, such as custom object metadata and active open file contents.
– More advanced contextual grounding capabilities with org metadata are planned for the future.
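
To make the idea of contextual grounding concrete, here is a minimal sketch of how custom object metadata and the active file's contents could be placed ahead of a request before it reaches a code model. This is an illustration only, not how Einstein for Developers is implemented: the `CustomObject` class, the `build_grounded_prompt` function, the comment-style prompt layout, and the `Invoice__c` example are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CustomObject:
    """Hypothetical, simplified stand-in for custom object metadata."""
    api_name: str
    fields: dict[str, str] = field(default_factory=dict)  # field name -> field type


def build_grounded_prompt(instruction: str,
                          objects: list[CustomObject],
                          open_file: str,
                          max_file_chars: int = 2000) -> str:
    """Assemble a prompt that puts org context ahead of the user's request."""
    lines = ["// Custom objects available in this org:"]
    for obj in objects:
        cols = ", ".join(f"{name} ({ftype})" for name, ftype in obj.fields.items())
        lines.append(f"// {obj.api_name}: {cols}")
    # Keep only the tail of the open file (assumed here to be closest to the cursor).
    snippet = open_file[-max_file_chars:] if max_file_chars > 0 else ""
    lines.append("// Contents of the active file:")
    lines.append(snippet)
    lines.append(f"// Task: {instruction}")
    return "\n".join(lines)


# Usage with made-up metadata and file contents.
invoice = CustomObject("Invoice__c", {"Amount__c": "Currency", "Due_Date__c": "Date"})
prompt = build_grounded_prompt(
    "Write a method that sums overdue invoices.",
    [invoice],
    "public class InvoiceService {\n}",
)
print(prompt)
```

The point of the sketch is that the model sees real object and field names from the org alongside the request, so it is less likely to invent names that do not exist. A production system would also have to decide how much context fits within the model's input limit, which the `max_file_chars` parameter only hints at.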

3. Benefits of CodeGen:

– CodeGen has demonstrated the ability to save approximately 90 minutes per day, per developer.
– The improved quality and low latency of CodeGen 2.5 contribute to a better user experience and better cost-to-serve metrics.
– Adoption is broad: CodeGen 2.5 sees over 600,000 downloads per month.

4. Open-source and Proprietary:

– CodeGen is an open-source LLM, while the Einstein for Developers model is proprietary.
– The proprietary model includes additional capabilities and is continuously improved based on feedback and testing.

5. Future Developments:

– Support for LWC (Lightning Web Components) is coming soon to the Einstein for Developers model.
– More advanced contextual grounding capabilities with org metadata are planned for the extension.

You can read it here: https://sfdc.blog/DvqFk

Source: developer.salesforce.com
