Show HN: Beating GPT-4 on HumanEval with a fine-tuned CodeLlama-34B

Hi HN,

We have fine-tuned CodeLlama-34B and CodeLlama-34B-Python on an internal Phind dataset, and the resulting models achieved 67.6% and 69.5% pass@1 on HumanEval, respectively. GPT-4 achieved 67%. To ensure result validity, we applied OpenAI's decontamination methodology to our dataset.

The CodeLlama models released yesterday demonstrate impressive performance on HumanEval.

- CodeLlama-34B achieved 48.8% pass@1 on HumanEval
- CodeLlama-34B-Python achieved 53.7% pass@1 on HumanEval

We have fine-tuned both models on a proprietary dataset of ~80k high-quality programming problems and solutions. Instead of code completion examples, this dataset features instruction-answer pairs, setting it apart structurally from HumanEval. We trained the Phind models over two epochs, for a total of ~160k examples. LoRA was not used; both models underwent native fine-tuning. We employed DeepSpeed ZeRO 3 and Flash Attention 2 to train these models in three hours using 32 A100-80GB GPUs, with a sequence length of 4096 tokens.

Furthermore, we applied OpenAI's decontamination methodology to our dataset to ensure valid results, and found no contaminated examples. The methodology is:

- For each evaluation example, we randomly sampled three substrings of 50 characters or used the entire example if it was fewer than 50 characters.
- A match was identified if any sampled substring was a substring of the processed training example.

For further details on the decontamination methodology, please refer to Appendix C of OpenAI's GPT-4 technical report.

Presented below are the pass@1 scores we achieved with our fine-tuned models:

- Phind-CodeLlama-34B-v1 achieved 67.6% pass@1 on HumanEval
- Phind-CodeLlama-34B-Python-v1 achieved 69.5% pass@1 on HumanEval

Note on GPT-4

According to the official technical report in March, OpenAI reported a pass@1 score of 67% for GPT-4's performance on HumanEval. Since then, there have been claims reporting higher scores. However, there is no concrete evidence that the model's coding abilities have improved since then. These higher figures also lack the rigorous contamination analysis that the official statistic underwent, making them a less reliable comparison. As a result, we consider 67% to be the pass@1 score for GPT-4.

Download

We are releasing both models on Hugging Face for verifiability and to bolster the open-source community. We welcome independent verification of results.

Phind-CodeLlama-34B-v1: https://ift.tt/y0kzMEK
Phind-CodeLlama-34B-Python-v1: https://ift.tt/c1p6m9i

We'd love to hear your thoughts!

Best,
The Phind Team
https://ift.tt/5juhn9v

August 26, 2023 at 03:38AM
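
For readers who want to try the decontamination check described in the post, here is a minimal sketch in Python. It is not Phind's or OpenAI's actual code. The function names (`normalize`, `sample_substrings`, `is_contaminated`) are hypothetical, and the normalization step assumes that "processed" means dropping whitespace and symbols and keeping only letters and digits, which the post itself does not spell out.

```python
import random
import re


def normalize(text):
    # Assumption: "processed" text keeps only letters and digits.
    return re.sub(r"[^0-9A-Za-z]", "", text)


def sample_substrings(example, n=3, length=50, rng=None):
    """Return n random substrings of `length` characters from the
    normalized example, or the whole example if it is shorter."""
    rng = rng or random.Random()
    text = normalize(example)
    if len(text) < length:
        # Use the entire example; skip it if nothing is left to match.
        return [text] if text else []
    starts = [rng.randrange(len(text) - length + 1) for _ in range(n)]
    return [text[s:s + length] for s in starts]


def is_contaminated(eval_example, train_examples, rng=None):
    """True if any sampled substring occurs in any training example."""
    probes = sample_substrings(eval_example, rng=rng)
    processed = [normalize(t) for t in train_examples]
    return any(p in t for t in processed for p in probes)
```

Running `is_contaminated` over every evaluation example and counting the hits gives the number of contaminated examples. Because the substrings are sampled at random, passing a seeded `random.Random` as `rng` makes a run repeatable.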