Evaluating LLM outputs accurately is critical to iterating quickly on an LLM system. Human annotation can be slow and expensive, and using LLMs as judges instead promises to solve this. However, aligning an LLM Judge with human judgements is often hard, with many implementation details to consider.

During the hackathon, let's try to build LLM Judges together and move the field forward a little by:
- Productionizing the latest LLM-as-a-judge research
- Improving on your existing judge
- Building annotation UIs
- Designing wireframes for collaborative annotation between humans and AI

This hackathon is for you if you are an AI Engineer who:
- Runs LLMs in production or is planning to soon
- Has LLM Judges and has found them to be unreliable
- Wants to learn more about using LLMs as a judge
- Is an LLM Judge skeptic
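
If you want a concrete picture of the alignment problem, here is a rough, non-authoritative Python sketch of the core loop: run a judge over a handful of human-labelled examples and measure how well it agrees with the humans. The `call_judge_llm` stub and the example data are hypothetical; you would swap in a real LLM call (and probably trace it with W&B Weave).

```python
def call_judge_llm(question: str, answer: str) -> str:
    """Placeholder judge returning a 'pass'/'fail' verdict. Replace with a real LLM call."""
    return "pass" if len(answer) > 2 else "fail"  # toy heuristic, not a real judge

# A handful of human-labelled examples: the ground truth the judge must align with.
examples = [
    {"question": "What is 2 + 2?",     "answer": "4",     "label": "pass"},
    {"question": "What is 2 + 2?",     "answer": "5",     "label": "fail"},
    {"question": "Capital of France?", "answer": "Paris", "label": "pass"},
    {"question": "Capital of France?", "answer": "Lyon",  "label": "fail"},
]

def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Chance-corrected agreement between two lists of labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)

judge_labels = [call_judge_llm(ex["question"], ex["answer"]) for ex in examples]
human_labels = [ex["label"] for ex in examples]

agreement = sum(j == h for j, h in zip(judge_labels, human_labels)) / len(examples)
print(f"raw agreement: {agreement:.2f}")
print(f"Cohen's kappa: {cohens_kappa(judge_labels, human_labels):.2f}")
```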

Judges:
Greg Kamradt, Founder, Data Independent
Eugene Yan, Senior Applied Scientist, Amazon
Charles Frye, AI Engineer, Modal Labs
Shreya Shankar, ML Engineer, PhD at UC Berkeley
Shawn Lewis, CTO and Co-founder, W&B
Anish Shah, Growth ML Engineer, W&B
Tim Sweeney, Staff Software Engineer, W&B

Rules:
- New projects only
- Maximum team size: 4
- Make friends
- Prize eligibility:
  - Project is open sourced on GitHub
  - Use W&B Weave where applicable
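
Not sure what using Weave looks like in practice? Here is a minimal sketch based on Weave's quickstart pattern of `weave.init` plus the `@weave.op` decorator; the project name and `toy_judge` function are placeholders, and you need to be logged in to W&B for traces to show up.

```python
import weave

# Placeholder judge so there is something to trace; swap in your real LLM call.
@weave.op()
def toy_judge(question: str, answer: str) -> dict:
    verdict = "pass" if answer.strip() else "fail"
    return {"verdict": verdict, "question": question}

# Hypothetical project name; requires being logged in to W&B.
weave.init("llm-judge-hackathon")

# Every call to toy_judge is now traced and browsable in the Weave UI.
print(toy_judge("What is the capital of France?", "Paris"))
```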

LLM API credits will be provided to those who need them.

$5,000 in cash-equivalent prizes will be awarded to the top 3 overall projects, with a bonus category for the most on-theme projects.

Timing:
Saturday, Sept 21: 10am-10pm
Sunday, Sept 22: 9:30am-5pm
Please register here.