compiler_gym.envs

The CompilerEnv environment is a drop-in replacement for the basic gym.Env class, with extended functionality for compilers. Some compiler services may further extend the functionality by subclassing from CompilerEnv. The following environment classes are available:

Environment Classes:

CompilerEnv

class compiler_gym.envs.CompilerEnv[source]

An OpenAI gym environment for compiler optimizations.

The easiest way to create a CompilerGym environment is to call gym.make() on one of the registered environments:

>>> env = gym.make("llvm-v0")

See compiler_gym.COMPILER_GYM_ENVS for a list of registered environment names.

Alternatively, an environment can be constructed directly, such as by connecting to a running compiler service at localhost:8080 (see this document for more details):

>>> env = ClientServiceCompilerEnv(
...     service="localhost:8080",
...     observation_space="features",
...     reward_space="runtime",
...     rewards=[env_reward_spaces],
... )

Once constructed, an environment can be used in exactly the same way as a regular gym.Env, e.g.

>>> observation = env.reset()
>>> cumulative_reward = 0
>>> for i in range(100):
...     action = env.action_space.sample()
...     observation, reward, done, info = env.step(action)
...     cumulative_reward += reward
...     if done:
...         break
...
>>> print(f"Reward after {i + 1} steps: {cumulative_reward}")
Reward after 100 steps: -0.32123
abstract __init__()[source]

Construct an environment.

Do not construct an environment directly. Use gym.make() on one of the registered environments:

>>> with gym.make("llvm-v0") as env:
...     pass  # Use environment
abstract property action_space: Space

The current action space.

Getter

Get the current action space.

Setter

Set the action space to use. Must be an entry in action_spaces. If None, the default action space is selected.

abstract property action_spaces: List[str]

A list of supported action space names.

abstract apply(state: CompilerEnvState) → None[source]

Replay this state on the given environment.

Parameters

state – A CompilerEnvState instance.

Raises

ValueError – If this state cannot be applied.

abstract property benchmark: Benchmark

Get or set the benchmark to use.

Getter

Get Benchmark that is currently in use.

Setter

Set the benchmark to use. Either a Benchmark instance, or the URI of a benchmark as in env.datasets.benchmark_uris().

Note

Setting a new benchmark has no effect until env.reset() is called.

abstract close()[source]

Close the environment.

Once closed, reset() must be called before the environment is used again.

Note

You must make sure to call env.close() on a CompilerGym environment when you are done with it. This is needed to perform manual tidying up of temporary files and processes. See the FAQ for more details.
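
A minimal sketch of one way to guarantee that close() is always called, assuming only that the environment object exposes a close() method. The helper name run_and_close is hypothetical and not part of CompilerGym; contextlib.closing comes from the Python standard library.

```python
from contextlib import closing
from typing import Any, Callable


def run_and_close(make_env: Callable[[], Any], body: Callable[[Any], Any]) -> Any:
    """Create an environment, run body(env), and always call env.close().

    Hypothetical helper for illustration, not a CompilerGym API.
    """
    # closing() calls env.close() on exit, even if body() raises.
    with closing(make_env()) as env:
        return body(env)
```

For example, run_and_close(lambda: gym.make("llvm-v0"), lambda env: env.reset()) would reset the environment once and then tidy up. The "with gym.make(...) as env" form shown for __init__() achieves the same effect.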

abstract commandline() → str[source]

Interface for CompilerEnv subclasses to provide an equivalent commandline invocation to the current environment state.

See also commandline_to_actions().

Returns

A string commandline invocation.

abstract commandline_to_actions(commandline: str) → List[ActionType][source]

Interface for CompilerEnv subclasses to convert from a commandline invocation to a sequence of actions.

See also commandline().

Returns

A list of actions.
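
As a sketch of how commandline_to_actions() and step() might be combined, the helper below replays a commandline string on an environment. The name replay_commandline is hypothetical, and the sketch assumes that env.reset() has already been called.

```python
from typing import Any


def replay_commandline(env: Any, commandline: str) -> int:
    """Apply the actions encoded in a commandline string to env.

    Hypothetical helper: assumes env.reset() has already been called.
    Returns the number of actions that were applied.
    """
    actions = env.commandline_to_actions(commandline)
    applied = 0
    for action in actions:
        _, _, done, _ = env.step(action)
        applied += 1
        if done:  # The episode ended early, so stop replaying.
            break
    return applied
```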

abstract property compiler_version: str

The version string of the underlying compiler that this service supports.

abstract property episode_reward: Optional[float]

If CompilerEnv.reward_space is set, this value is the sum of all rewards for the current episode.

abstract property episode_walltime: float

Return the amount of time in seconds since the last call to reset().

abstract fork() → CompilerEnv[source]

Fork a new environment with exactly the same state.

This creates a duplicate environment instance with the current state. The new environment is entirely independent of the source environment. The user must call close() on both the original and the new environment.

If not already in an episode, reset() is called.

Example usage:

>>> env = gym.make("llvm-v0")
>>> env.reset()
# ... use env
>>> new_env = env.fork()
>>> new_env.state == env.state
True
>>> new_env.step(1) == env.step(1)
True

Note

The client/service implementation of CompilerGym means that the forked and base environments share a common backend resource. This means that if either of them crashes, such as due to a compiler assertion, both environments must be reset.

Returns

A new environment instance.
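
One common use of fork() is a one-step lookahead: try each candidate action on a throwaway copy and keep the best. The following is a minimal sketch under the assumption that a default reward space is set, so that step() returns a float reward. The helper name best_single_action is hypothetical.

```python
from typing import Any, Iterable, Optional, Tuple


def best_single_action(
    env: Any, candidate_actions: Iterable[Any]
) -> Tuple[Optional[Any], float]:
    """Try each candidate action on a fork of env and return the best one.

    Hypothetical helper: assumes a default reward space is set. The source
    environment itself is never stepped.
    """
    best_action, best_reward = None, float("-inf")
    for action in candidate_actions:
        fork = env.fork()
        try:
            _, reward, _, _ = fork.step(action)
        finally:
            fork.close()  # Every fork must be closed by the caller.
        if reward is not None and reward > best_reward:
            best_action, best_reward = action, reward
    return best_action, best_reward
```

Because each fork is closed in a finally block, a failing step() does not leak the forked environment. The caller remains responsible for closing the original environment.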

abstract property in_episode: bool

Whether the service is ready for step() to be called, i.e. reset() has been called and close() has not.

Returns

True if in an episode, else False.

abstract multistep(actions: Iterable[ActionType], observation_spaces: Optional[Iterable[Union[str, ObservationSpaceSpec]]] = None, reward_spaces: Optional[Iterable[Union[str, Reward]]] = None, observations: Optional[Iterable[Union[str, ObservationSpaceSpec]]] = None, rewards: Optional[Iterable[Union[str, Reward]]] = None, timeout: float = 300)[source]

Take a sequence of steps and return the final observation and reward.

Parameters
  • actions – A sequence of actions to apply in order.

  • observation_spaces – A list of observation spaces to compute observations from. If provided, this changes the observation element of the return tuple to be a list of observations from the requested spaces. The default env.observation_space is not returned.

  • reward_spaces – A list of reward spaces to compute rewards from. If provided, this changes the reward element of the return tuple to be a list of rewards from the requested spaces. The default env.reward_space is not returned.

  • timeout – The maximum number of seconds to wait for the steps to succeed. Accepts a float value. The default is 300 seconds.

Returns

A tuple of observation, reward, done, and info. Observation and reward are None if default observation/reward is not set.
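
A minimal sketch of applying a long action sequence through multistep() in fixed-size chunks, stopping as soon as the episode ends. The helper name multistep_in_chunks and the chunking strategy are illustrative assumptions, not part of CompilerGym.

```python
from typing import Any, List, Sequence, Tuple


def multistep_in_chunks(
    env: Any, actions: Sequence[Any], chunk_size: int
) -> Tuple[List[Any], bool]:
    """Apply actions via env.multistep() in fixed-size chunks.

    Hypothetical helper. Returns the reward reported for each chunk and
    whether the episode ended. Stops at the first chunk that ends the episode.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    rewards: List[Any] = []
    done = False
    for start in range(0, len(actions), chunk_size):
        chunk = list(actions[start:start + chunk_size])
        _, reward, done, _ = env.multistep(chunk)
        rewards.append(reward)
        if done:
            break
    return rewards, done
```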

abstract property observation: ObservationView

A view of the available observation spaces that permits on-demand computation of observations.

abstract property observation_space: Optional[Space]

The observation space that is used to return an observation value in step().

Getter

Returns the underlying observation space, or None if not set.

Setter

Set the default observation space.

abstract render(mode='human') → Optional[str][source]

Render the environment.

Parameters

mode – The render mode to use.

Raises

TypeError – If a default observation space is not set, or if the requested render mode does not exist.

abstract reset(benchmark: Optional[Union[str, Benchmark]] = None, action_space: Optional[str] = None, observation_space: Union[OptionalArgumentValue, str, ObservationSpaceSpec] = OptionalArgumentValue.UNCHANGED, reward_space: Union[OptionalArgumentValue, str, Reward] = OptionalArgumentValue.UNCHANGED, timeout: float = 300) → Optional[ObservationType][source]

Reset the environment state.

This method must be called before step().

Parameters
  • benchmark – The name of the benchmark to use. If provided, it overrides any value that was set during __init__(), and subsequent calls to reset() will use this benchmark. If no benchmark is provided, and no benchmark was provided to __init__(), the service will randomly select a benchmark to use.

  • action_space – The name of the action space to use. If provided, it overrides any value that was set during __init__(), and subsequent calls to reset() will use this action space. If no action space is provided, the default action space is used.

  • observation_space – Compute and return observations at each step() from this space. Accepts a string name or an ObservationSpaceSpec. If None, step() returns None for the observation value. If OptionalArgumentValue.UNCHANGED (the default value), the observation space remains unchanged from the previous episode. For available spaces, see env.observation.spaces.

  • reward_space – Compute and return reward at each step() from this space. Accepts a string name or a Reward. If None, step() returns None for the reward value. If OptionalArgumentValue.UNCHANGED (the default value), the reward space remains unchanged from the previous episode. For available spaces, see env.reward.spaces.

  • timeout – The maximum number of seconds to wait for reset to succeed.

Returns

The initial observation.

Raises
  • BenchmarkInitError – If the benchmark is invalid. This can happen if the benchmark contains code that the compiler does not support, or because of some internal error within the compiler. In this case, another benchmark must be used.

  • TypeError – If no benchmark has been set, and the environment does not have a default benchmark to select from.

abstract property reward: RewardView

A view of the available reward spaces that permits on-demand computation of rewards.

abstract property reward_range: Tuple[float, float]

A tuple indicating the range of reward values.

Default range is (-inf, +inf).

abstract property reward_space: Optional[Reward]

The default reward space that is used to return a reward value from step().

Getter

Returns a Reward, or None if not set.

Setter

Set the default reward space.

abstract property state: CompilerEnvState

The tuple representation of the current environment state.

abstract step(action: ActionType, observation_spaces: Optional[Iterable[Union[str, ObservationSpaceSpec]]] = None, reward_spaces: Optional[Iterable[Union[str, Reward]]] = None, observations: Optional[Iterable[Union[str, ObservationSpaceSpec]]] = None, rewards: Optional[Iterable[Union[str, Reward]]] = None, timeout: float = 300) → Tuple[Optional[Union[ObservationType, List[ObservationType]]], Optional[Union[float, List[float]]], bool, Dict[str, Any]][source]

Take a step.

Parameters
  • action – An action.

  • observation_spaces – A list of observation spaces to compute observations from. If provided, this changes the observation element of the return tuple to be a list of observations from the requested spaces. The default env.observation_space is not returned.

  • reward_spaces – A list of reward spaces to compute rewards from. If provided, this changes the reward element of the return tuple to be a list of rewards from the requested spaces. The default env.reward_space is not returned.

  • timeout – The maximum number of seconds to wait for the step to succeed. Accepts a float value. The default is 300 seconds.

Returns

A tuple of observation, reward, done, and info. Observation and reward are None if default observation/reward is not set.
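
When observation_spaces or reward_spaces are passed, the observation and reward elements become lists. The sketch below pairs each returned value with the name that was requested. It assumes that values come back in the same order as the requested spaces, which the text above implies but does not state. The helper name step_named is hypothetical.

```python
from typing import Any, Dict, Sequence, Tuple


def step_named(
    env: Any,
    action: Any,
    observation_names: Sequence[str],
    reward_names: Sequence[str],
) -> Tuple[Dict[str, Any], Dict[str, Any], bool, Dict[str, Any]]:
    """Take one step and key the requested observations and rewards by name.

    Hypothetical helper for illustration.
    """
    observations, rewards, done, info = env.step(
        action,
        observation_spaces=list(observation_names),
        reward_spaces=list(reward_names),
    )
    # Assumption: results are returned in the order the spaces were requested.
    named_observations = dict(zip(observation_names, observations))
    named_rewards = dict(zip(reward_names, rewards))
    return named_observations, named_rewards, done, info
```

For instance, passing observation_names=["Ir"] would return the IR observation under the key "Ir", alongside whichever reward spaces were named.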

abstract validate(state: Optional[CompilerEnvState] = None) → ValidationResult[source]

Validate an environment’s state.

Parameters

state – A state to validate. If not provided, the current state is validated.

Returns

A ValidationResult.

abstract property version: str

The version string of the compiler service.

LlvmEnv

class compiler_gym.envs.LlvmEnv(*args, benchmark: Optional[Union[str, Benchmark]] = None, datasets_site_path: Optional[Path] = None, **kwargs)[source]

A specialized ClientServiceCompilerEnv for LLVM.

This extends the default ClientServiceCompilerEnv environment, adding extra LLVM functionality. Specifically, the actions use the CommandlineFlag space, which is a type of Discrete space that provides additional documentation about each action, and the LlvmEnv.commandline() method can be used to produce an equivalent LLVM opt invocation for the current environment state.

commandline(textformat: bool = False) → str[source]

Returns an LLVM opt command line invocation for the current environment state.

Parameters

textformat – Whether to generate a command line that processes text-format LLVM-IR or bitcode (the default).

Returns

A command line string.

commandline_to_actions(commandline: str) → List[int][source]

Returns a list of actions from the given command line.

Parameters

commandline – A command line invocation, as generated by env.commandline().

Returns

A list of actions.

Raises

ValueError – In case the command line string is malformed.

fork()[source]

Fork a new environment with exactly the same state.

This creates a duplicate environment instance with the current state. The new environment is entirely independent of the source environment. The user must call close() on both the original and the new environment.

If not already in an episode, reset() is called.

Example usage:

>>> env = gym.make("llvm-v0")
>>> env.reset()
# ... use env
>>> new_env = env.fork()
>>> new_env.state == env.state
True
>>> new_env.step(1) == env.step(1)
True

Note

The client/service implementation of CompilerGym means that the forked and base environments share a common backend resource. This means that if either of them crashes, such as due to a compiler assertion, both environments must be reset.

Returns

A new environment instance.

property ir: str

The LLVM-IR of the program in its current state.

Alias for env.observation["Ir"].

Returns

A string of LLVM-IR.

property ir_sha1: str

Return the 40-character hex sha1 checksum of the current IR.

Equivalent to: hashlib.sha1(env.ir.encode("utf-8")).hexdigest().

Returns

A 40-character hexadecimal sha1 string.
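
The equivalence stated above can be reproduced with the standard library alone. This sketch hashes an arbitrary IR string, which is useful for checking the checksum of IR that was saved to disk. The function name sha1_of_ir is hypothetical.

```python
import hashlib


def sha1_of_ir(ir: str) -> str:
    """Return the 40-character hex sha1 of an IR string, encoded as UTF-8."""
    return hashlib.sha1(ir.encode("utf-8")).hexdigest()
```

Comparing the checksum before and after a step is one inexpensive way to tell whether an action changed the program, without diffing the full IR text.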

make_benchmark(inputs: Union[str, Path, ClangInvocation, List[Union[str, Path, ClangInvocation]]], copt: Optional[List[str]] = None, system_includes: bool = True, timeout: int = 600) → Benchmark[source]

Create a benchmark for use with this environment.

This function takes one or more inputs and uses them to create an LLVM bitcode benchmark that can be passed to compiler_gym.envs.LlvmEnv.reset().

The following input types are supported:

File Suffix            Treated as             Converted using
.bc                    LLVM IR bitcode        No conversion required.
.ll                    LLVM IR text format    Assembled to bitcode using llvm-as.
.c, .cc, .cpp, .cxx    C / C++ source         Compiled to bitcode using clang and the given copt.

Note

The LLVM IR format has no compatibility guarantees between versions (see LLVM docs). You must ensure that any .bc and .ll files are compatible with the LLVM version used by CompilerGym, which can be reported using env.compiler_version.
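
The suffix rules in the table above can be sketched as a simple lookup. This is an illustration of the dispatch only, not the code used by make_benchmark(). The function name, the returned labels, the case-sensitive matching, and raising TypeError for an unknown suffix are all assumptions.

```python
from pathlib import Path
from typing import Union

# Labels are illustrative. They mirror the "Converted using" column above.
_CONVERSION_BY_SUFFIX = {
    ".bc": "none",
    ".ll": "llvm-as",
    ".c": "clang",
    ".cc": "clang",
    ".cpp": "clang",
    ".cxx": "clang",
}


def conversion_for(path: Union[str, Path]) -> str:
    """Return which conversion the table prescribes for a file path."""
    suffix = Path(path).suffix
    if suffix not in _CONVERSION_BY_SUFFIX:
        # Assumption: unsupported suffixes are reported as a TypeError.
        raise TypeError(f"Unsupported input type: {suffix!r}")
    return _CONVERSION_BY_SUFFIX[suffix]
```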

E.g. for single-source C/C++ programs, you can pass the path of the source file:

>>> env = gym.make("llvm-v0")
>>> benchmark = env.make_benchmark('my_app.c')
>>> env.reset(benchmark=benchmark)

The clang invocation used is roughly equivalent to:

$ clang my_app.c -O0 -c -emit-llvm -o benchmark.bc

Additional compile-time arguments to clang can be provided using the copt argument:

>>> benchmark = env.make_benchmark('/path/to/my_app.cpp', copt=['-O2'])

If you need more fine-grained control over the options, you can directly construct a ClangInvocation to pass a list of arguments to clang:

>>> benchmark = env.make_benchmark(
...     ClangInvocation(['/path/to/my_app.c'], system_includes=False, timeout=10)
... )

For multi-file programs, pass a list of inputs that will be compiled separately and then linked to a single module:

>>> benchmark = env.make_benchmark([
...     'main.c',
...     'lib.cpp',
...     'lib2.bc',
...     'foo/input.bc'
... ])

Parameters
  • inputs – An input, or list of inputs.

  • copt – A list of command line options to pass to clang when compiling source files.

  • system_includes – Whether to include the system standard libraries during compilation jobs. This requires a system toolchain. See get_system_library_flags().

  • timeout – The maximum number of seconds to allow clang to run before terminating.

Returns

A Benchmark instance.

Raises
  • FileNotFoundError – If any input sources are not found.

  • TypeError – If the inputs are of unsupported types.

  • OSError – If a suitable compiler cannot be found.

  • BenchmarkInitError – If a compilation job fails.

  • TimeoutExpired – If a compilation job exceeds timeout seconds.

make_benchmark_from_command_line(cmd: Union[str, List[str]], replace_driver: bool = True, system_includes: bool = True, timeout: int = 600) → Benchmark[source]

Create a benchmark for use with this environment.

This function takes a command line compiler invocation as input, modifies it to produce an unoptimized LLVM-IR bitcode, and then runs the modified command line to produce a bitcode benchmark.

For example, the command line:

>>> benchmark = env.make_benchmark_from_command_line(
...     ["gcc", "-DNDEBUG", "a.c", "b.c", "-o", "foo", "-lm"]
... )

Will compile a.c and b.c to an unoptimized benchmark that can be then passed to reset().

This works by changing the first argument of the command line invocation to the version of clang shipped with CompilerGym, and then appending command line flags that cause the compiler to produce LLVM-IR with optimizations disabled. For example, the input command line:

gcc -DNDEBUG a.c b.c -o foo -lm

Will be rewritten to be roughly equivalent to:

/path/to/compiler_gym/clang -DNDEBUG a.c b.c \
    -Xclang -disable-llvm-passes -Xclang -disable-llvm-optzns \
    -c -emit-llvm -o -

The generated benchmark then has a method compile() which completes the linking and compilation to an executable. For the above example, this would be roughly equivalent to:

/path/to/compiler_gym/clang environment-bitcode.bc -o foo -lm
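
The rewriting described above can be sketched as follows. This is an illustration based only on the gcc example in this section, not the code used by CompilerGym: the function name is hypothetical, and the rule of dropping "-o <file>" and "-l..." arguments is an assumption inferred from the example. Joined forms such as "-ofoo" are not handled.

```python
import shlex
from typing import List, Union

# Flags taken from the rewritten command line shown above.
UNOPTIMIZED_IR_FLAGS = [
    "-Xclang", "-disable-llvm-passes",
    "-Xclang", "-disable-llvm-optzns",
    "-c", "-emit-llvm", "-o", "-",
]


def rewrite_for_bitcode(
    cmd: Union[str, List[str]], clang: str = "/path/to/compiler_gym/clang"
) -> List[str]:
    """Rewrite a compiler invocation to emit unoptimized LLVM bitcode."""
    args = shlex.split(cmd) if isinstance(cmd, str) else list(cmd)
    if not args:
        raise ValueError("No command line provided")
    kept = []
    skip_next = False
    for arg in args[1:]:  # args[0] is the driver, which is replaced.
        if skip_next:
            skip_next = False  # This was the output path following -o.
        elif arg == "-o":
            skip_next = True
        elif arg.startswith("-l"):
            continue  # Assumed to be a link-time library flag.
        else:
            kept.append(arg)
    return [clang, *kept, *UNOPTIMIZED_IR_FLAGS]
```

Calling rewrite_for_bitcode("gcc -DNDEBUG a.c b.c -o foo -lm") yields the argument list of the rewritten command shown above.
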
Parameters
  • cmd – A command line compiler invocation, either as a list of arguments (e.g. ["clang", "in.c"]) or as a single shell string (e.g. "clang in.c").

  • replace_driver – Whether to replace the first argument of the command with the clang driver used by this environment.

  • system_includes – Whether to include the system standard libraries during compilation jobs. This requires a system toolchain. See get_system_library_flags().

  • timeout – The maximum number of seconds to allow the compilation job to run before terminating.

Returns

A BenchmarkFromCommandLine instance.

Raises
  • ValueError – If no command line is provided.

  • BenchmarkInitError – If executing the command line fails.

  • TimeoutExpired – If a compilation job exceeds timeout seconds.

render(mode='human') → Optional[str][source]

Render the environment.

ClientServiceCompilerEnv instances support two render modes: “human”, which prints the current environment state to the terminal and returns nothing; and “ansi”, which returns a string representation of the current environment state.

Parameters

mode – The render mode to use.

Raises

TypeError – If a default observation space is not set, or if the requested render mode does not exist.

reset(*args, **kwargs)[source]

Reset the environment state.

This method must be called before step().

Parameters
  • benchmark – The name of the benchmark to use. If provided, it overrides any value that was set during __init__(), and subsequent calls to reset() will use this benchmark. If no benchmark is provided, and no benchmark was provided to __init__(), the service will randomly select a benchmark to use.

  • action_space – The name of the action space to use. If provided, it overrides any value that was set during __init__(), and subsequent calls to reset() will use this action space. If no action space is provided, the default action space is used.

  • observation_space – Compute and return observations at each step() from this space. Accepts a string name or an ObservationSpaceSpec. If None, step() returns None for the observation value. If OptionalArgumentValue.UNCHANGED (the default value), the observation space remains unchanged from the previous episode. For available spaces, see env.observation.spaces.

  • reward_space – Compute and return reward at each step() from this space. Accepts a string name or a Reward. If None, step() returns None for the reward value. If OptionalArgumentValue.UNCHANGED (the default value), the reward space remains unchanged from the previous episode. For available spaces, see env.reward.spaces.

  • timeout – The maximum number of seconds to wait for reset to succeed.

Returns

The initial observation.

Raises
  • BenchmarkInitError – If the benchmark is invalid. This can happen if the benchmark contains code that the compiler does not support, or because of some internal error within the compiler. In this case, another benchmark must be used.

  • TypeError – If no benchmark has been set, and the environment does not have a default benchmark to select from.

property runtime_observation_count: int

The number of runtimes to return for the Runtime observation space.

See the Runtime observation space reference for further details.

Example usage:

>>> env = compiler_gym.make("llvm-v0")
>>> env.reset()
>>> env.runtime_observation_count = 10
>>> len(env.observation.Runtime())
10
Getter

Returns the number of runtimes that will be returned when a Runtime observation is requested.

Setter

Set the number of runtimes to compute when a Runtime observation is requested.

Type

int
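
Because a Runtime observation returns several measurements, a robust summary such as the median is often more useful than any single value. The sketch below assumes env.observation.Runtime() returns an iterable of numbers, as the len() call in the example above suggests. The helper name median_runtime is hypothetical.

```python
import statistics
from typing import Any


def median_runtime(env: Any, count: int) -> float:
    """Request `count` runtimes from env and return their median.

    Hypothetical helper: sets env.runtime_observation_count as a side effect.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    env.runtime_observation_count = count
    runtimes = [float(r) for r in env.observation.Runtime()]
    return statistics.median(runtimes)
```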

property runtime_warmup_runs_count: int

The number of warmup runs of the binary to perform before measuring the Runtime observation space.

See the Runtime observation space reference for further details.

Example usage:

>>> env = compiler_gym.make("llvm-v0")
>>> env.reset()
>>> env.runtime_warmup_runs_count = 3
>>> env.runtime_warmup_runs_count
3

Getter

Returns the number of warmup runs that will be performed before the Runtime observation is measured.

Setter

Set the number of warmup runs to perform when a Runtime observation is requested.

Type

int

write_bitcode(path: Union[Path, str]) → Path[source]

Write the current program state to a bitcode file.

Parameters

path – The path of the file to write.

Returns

The input path argument.

write_ir(path: Union[Path, str]) → Path[source]

Write the current program state to a file.

Parameters

path – The path of the file to write.

Returns

The input path argument.

GccEnv

class compiler_gym.envs.GccEnv(*args, gcc_bin: Union[str, Path] = 'docker:gcc:11.2.0', benchmark: Union[str, Benchmark] = 'benchmark://chstone-v0/adpcm', datasets_site_path: Optional[Path] = None, connection_settings: Optional[ConnectionOpts] = None, **kwargs)[source]

A specialized ClientServiceCompilerEnv for GCC.

This class exposes the optimization space of GCC’s command line flags as an environment for reinforcement learning. For further details, see the GCC Environment Reference.

property asm: str

Get the assembly code.

property asm_hash: str

Get a hash of the assembly code.

property asm_size: int

Get the assembly code size in bytes.

property choices: List[int]

Get the current choices.

commandline() → str[source]

Return a string representing the command line options.

Returns

A string.

property gcc_spec: GccSpec

A GccSpec description of the compiler specification.

property instruction_counts: Dict[str, int]

Get a count of the instruction types in the assembly code.

Note that this also counts fields beginning with a ".", such as .bss and .align. Remove those entries if they are not needed.
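
A minimal sketch of filtering out those entries, assuming only that the property returns a dictionary keyed by field name as its type indicates. The function name drop_directives is hypothetical.

```python
from typing import Dict


def drop_directives(counts: Dict[str, int]) -> Dict[str, int]:
    """Return a copy of counts without keys that begin with a '.'."""
    return {name: n for name, n in counts.items() if not name.startswith(".")}
```

For example, drop_directives(env.instruction_counts) would keep only the entries whose names do not start with a dot.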

property obj: bytes

Get the object code.

property obj_hash: str

Get a hash of the object code.

property obj_size: int

Get the object code size in bytes.

reset(benchmark: Optional[Union[str, Benchmark]] = None, action_space: Optional[str] = None, observation_space: Union[OptionalArgumentValue, str, ObservationSpaceSpec] = OptionalArgumentValue.UNCHANGED, reward_space: Union[OptionalArgumentValue, str, Reward] = OptionalArgumentValue.UNCHANGED) → Optional[ObservationType][source]

Reset the environment state.

This method must be called before step().

Parameters
  • benchmark – The name of the benchmark to use. If provided, it overrides any value that was set during __init__(), and subsequent calls to reset() will use this benchmark. If no benchmark is provided, and no benchmark was provided to __init__(), the service will randomly select a benchmark to use.

  • action_space – The name of the action space to use. If provided, it overrides any value that was set during __init__(), and subsequent calls to reset() will use this action space. If no action space is provided, the default action space is used.

  • observation_space – Compute and return observations at each step() from this space. Accepts a string name or an ObservationSpaceSpec. If None, step() returns None for the observation value. If OptionalArgumentValue.UNCHANGED (the default value), the observation space remains unchanged from the previous episode. For available spaces, see env.observation.spaces.

  • reward_space – Compute and return reward at each step() from this space. Accepts a string name or a Reward. If None, step() returns None for the reward value. If OptionalArgumentValue.UNCHANGED (the default value), the reward space remains unchanged from the previous episode. For available spaces, see env.reward.spaces.

  • timeout – The maximum number of seconds to wait for reset to succeed.

Returns

The initial observation.

Raises
  • BenchmarkInitError – If the benchmark is invalid. This can happen if the benchmark contains code that the compiler does not support, or because of some internal error within the compiler. In this case, another benchmark must be used.

  • TypeError – If no benchmark has been set, and the environment does not have a default benchmark to select from.

property rtl: str

Get the final rtl of the program.

property source: str

Get the source code.