This package contains modules that can be used for preparing leaderboard submissions.

We provide leaderboards to track the performance of user-submitted algorithms on compiler optimization tasks. The goal of the leaderboards is to provide a venue for researchers to promote their work, and to provide a common framework for evaluating and comparing different approaches. We accept submissions to the leaderboards through pull requests; see the contributing guide for instructions.


LLVM Instruction Count

LLVM is a popular open source compiler used widely in industry and research. The llvm-ic-v0 environment exposes LLVM’s optimizing passes as a set of actions that can be applied to a particular program. The goal of the agent is to select the sequence of optimizations that leads to the greatest reduction in instruction count in the program being compiled. The reward is the reduction in instruction count achieved, scaled to the reduction achieved by LLVM’s built-in -Oz pipeline.
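As a rough illustration of the reward scaling described above, the sketch below computes a cumulative reward from three instruction counts. The helper name scaled_reward and the exact formula (reduction achieved divided by the reduction achieved by -Oz) are assumptions made for illustration; the environment's own reward implementation is authoritative.

```python
def scaled_reward(initial_count: int, final_count: int, oz_count: int) -> float:
    """Instruction count reduction, scaled to the reduction achieved by -Oz.

    Hypothetical helper for illustration only; it is not part of compiler_gym.
    """
    oz_reduction = initial_count - oz_count
    if oz_reduction == 0:
        raise ValueError("-Oz achieved no reduction; the scaled reward is undefined")
    return (initial_count - final_count) / oz_reduction


# A made-up program of 1000 instructions that -Oz reduces to 600:
print(scaled_reward(1000, 600, 600))  # the agent matches -Oz
print(scaled_reward(1000, 500, 600))  # the agent beats -Oz
```

Under this assumption, a value of 1.0 means the agent matched -Oz, and a value above 1.0 means it produced fewer instructions than -Oz.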





Observation Space


Reward Space

Instruction count reduction relative to -Oz.

Test Dataset

The 23 cBench benchmarks.

Users who wish to create a submission for this leaderboard may use eval_llvm_instcount_policy() to automatically evaluate their agent on the test set.

compiler_gym.leaderboard.llvm_instcount.eval_llvm_instcount_policy(policy: Callable[[LlvmEnv], None]) -> None

Evaluate an LLVM codesize policy and generate results for a leaderboard submission.

To use it, you define your policy as a function that takes an LlvmEnv instance as input and modifies it in place. For example, for a trivial random policy:

>>> from compiler_gym.envs import LlvmEnv
>>> def my_policy(env: LlvmEnv) -> None:
...    # Defines a policy that takes 10 random steps.
...    for _ in range(10):
...        _, _, done, _ = env.step(env.action_space.sample())
...        if done: break

If your policy is stateful, you can use a class and override the __call__() method:

>>> class MyPolicy:
...     def __init__(self):
...         self.my_stateful_vars = {}  # or similar
...     def __call__(self, env: LlvmEnv) -> None:
...         pass # ... do fun stuff!
>>> my_policy = MyPolicy()
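To make the stateful pattern more concrete, here is a minimal sketch of a policy object that carries state across calls. So that it runs without compiler_gym installed, it uses two stand-in classes, _StubEnv and _StubActionSpace, which are hypothetical and only mimic the env.step() and env.action_space.sample() calls shown above; CountingRandomPolicy is likewise an illustrative name, not part of the library.

```python
import random


class _StubActionSpace:
    """Stand-in for an action space; NOT part of compiler_gym."""

    def __init__(self, n: int, seed: int = 0):
        self.n = n
        self._rng = random.Random(seed)

    def sample(self) -> int:
        return self._rng.randrange(self.n)


class _StubEnv:
    """Minimal stand-in for LlvmEnv so the sketch runs without compiler_gym."""

    def __init__(self):
        self.action_space = _StubActionSpace(n=5)

    def step(self, action: int):
        # Mirrors the (observation, reward, done, info) tuple used above.
        return None, 0.0, False, {}


class CountingRandomPolicy:
    """Takes a fixed number of random steps and remembers what it did."""

    def __init__(self, steps_per_episode: int = 10):
        self.steps_per_episode = steps_per_episode
        self.episodes_seen = 0  # state carried across calls
        self.actions_taken = []

    def __call__(self, env) -> None:
        self.episodes_seen += 1
        for _ in range(self.steps_per_episode):
            action = env.action_space.sample()
            _, _, done, _ = env.step(action)
            self.actions_taken.append(action)
            if done:
                break


policy = CountingRandomPolicy(steps_per_episode=3)
for _ in range(2):  # stands in for the evaluator calling the policy repeatedly
    policy(_StubEnv())
print(policy.episodes_seen, len(policy.actions_taken))
```

With a real environment, the same object would be passed to the evaluation helper in place of a plain function.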

The role of your policy is to perform a sequence of actions on the supplied environment so as to maximize cumulative reward. By default, no observation space is set on the environment, so env.step() will return None for the observation. You may set a new observation space:

>>> env.observation_space = "InstCount"  # Set a new space for env.step()
>>> env.observation["InstCount"]  # Calculate a one-off observation.

However, the policy must not change the environment's reward space or its benchmark.

Once you have defined your policy, call the eval_llvm_instcount_policy() helper function, passing it your policy as its only argument:

>>> eval_llvm_instcount_policy(my_policy)

The eval_llvm_instcount_policy() function calls the policy function for each benchmark in the dataset, one at a time, from a single thread. Stateful policies can therefore assume thread-safe access to member variables.

Put together as a complete example, a leaderboard submission script may look like:

from compiler_gym.leaderboard.llvm_instcount import eval_llvm_instcount_policy
from compiler_gym.envs import LlvmEnv

def my_policy(env: LlvmEnv) -> None:
    env.observation_space = "InstCount"  # we're going to use instcount space
    pass # ... do fun stuff!

if __name__ == "__main__":
    eval_llvm_instcount_policy(my_policy)

The eval_llvm_instcount_policy() helper defines a number of command-line flags that can be overridden to control the behavior of the evaluation. For example, the --n flag determines the number of times the policy is run on each benchmark (the default is 10), and --leaderboard_results determines the path of the generated results file:

$ python my_policy.py --n=5 --leaderboard_results=my_policy_results.csv

You can use the --helpfull flag to list all of the flags that are defined:

$ python my_policy.py --helpfull

Once you are happy with your approach, see the contributing guide for instructions on preparing a submission to the leaderboard.