Eval Quadratic Python Code Tutorial

CATArena: Engineering-Level Tournament Evaluation Platform for LLM-Driven Code Agents

CATArena (Code Agent Tournament Arena) is an open-ended environment where LLMs write executable code agents to battle each other and then learn from each other. CATArena is an engineering-level ...

IEEE

Model-Agnostic Empirical Evaluation of Test-Driven Prompt Engineering on Improving Accuracy and Efficiency in Large Language Models Python Code Generation

Abstract: Although Large Language Models (LLMs) are widely adopted for code generation, the generated code can be semantically incorrect, requiring iterations of evaluation and refinement. Test-driven ...

IEEE

Bayesian Neural Networks via MCMC: A Python-Based Tutorial

Abstract: Bayesian inference provides a methodology for parameter estimation and uncertainty quantification in machine learning and deep learning methods. Variational inference and Markov Chain ...

GitHub

ojbench/oj-eval-claude-code-017-20260128052249

This assignment requires implementing a train ticket booking system similar to 12306. The system must store user data, ticket data, and train data locally and perform efficient operations on them.
