
This directory contains the core C++ implementation of the LLVM environment for CompilerGym. The base session is implemented by a compiler_gym::llvm_service::LlvmSession class, defined in LlvmSession.h.


#include "compiler_gym/envs/llvm/service/ActionSpace.h"

namespace compiler_gym
namespace llvm_service


enum class LlvmActionSpace

The available action spaces for LLVM.


Implementation housekeeping rules - to add a new action space:

  1. Add a new entry to this LlvmActionSpace enum.

  2. Add a new switch case to getLlvmActionSpaceList() to return the ActionSpace.

  3. Add a new switch case to LlvmSession::step() to compute the actual action.

  4. Run bazel test //compiler_gym/... and update the newly failing tests.
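The steps above follow a common enum-plus-switch pattern. The following is a minimal sketch of that pattern only: ExampleActionSpace, PASSES_SUBSET, exampleActionSpaceName() and exampleActionSpaceList() are hypothetical names chosen for illustration and are not part of the CompilerGym API.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for LlvmActionSpace; not the real enum.
enum class ExampleActionSpace {
  PASSES_ALL,
  PASSES_SUBSET,  // Step 1: the newly added (hypothetical) entry.
};

// Stand-in for step 2: every enumerator gets its own switch case.
// There is deliberately no default case, so that compilers which warn on
// unhandled enumerators can flag a missing case.
inline std::string exampleActionSpaceName(ExampleActionSpace space) {
  switch (space) {
    case ExampleActionSpace::PASSES_ALL:
      return "PassesAll";
    case ExampleActionSpace::PASSES_SUBSET:
      return "PassesSubset";
  }
  throw std::logic_error("unhandled action space");
}

// Collect the names of all known action spaces.
inline std::vector<std::string> exampleActionSpaceList() {
  return {exampleActionSpaceName(ExampleActionSpace::PASSES_ALL),
          exampleActionSpaceName(ExampleActionSpace::PASSES_SUBSET)};
}
```

Keeping the enum and every switch over it in sync is what the final test run in step 4 is meant to catch.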


enumerator PASSES_ALL

The full set of transform passes for LLVM.


#include "compiler_gym/envs/llvm/service/Benchmark.h"

namespace compiler_gym
namespace llvm_service


using BenchmarkHash = llvm::ModuleHash

A 160-bit SHA1 hash that identifies an LLVM module.

using Bitcode = llvm::SmallString<0>

A bitcode.


grpc::Status readBitcodeFile(const boost::filesystem::path &path, Bitcode *bitcode)

Read a bitcode file from disk.

  • path – The path of the bitcode file to read.

  • bitcode – The destination bitcode.


OK on success, NOT_FOUND if the file is not found, or INVALID_ARGUMENT if the file is invalid.

grpc::Status writeBitcodeFile(const llvm::Module &module, const boost::filesystem::path &path)

Write the module bitcode to the given path.

  • module – The module to write to file.

  • path – The path of the bitcode file to write.


OK on success.

std::unique_ptr<llvm::Module> makeModule(llvm::LLVMContext &context, const Bitcode &bitcode, const std::string &name, grpc::Status *status)

Construct an LLVM module from a bitcode.

Parses the given bitcode into a module and strips the identifying ModuleID and source_filename attributes.

  • context – An LLVM context for the new module.

  • bitcode – The bitcode to parse.

  • name – The name of the module.

  • status – An error status that is set to OK on success or INVALID_ARGUMENT if the bitcode cannot be parsed.


A unique pointer to an LLVM module, or nullptr on error, in which case status is set.

boost::filesystem::path createBenchmarkScratchDirectoryOrDie(const boost::filesystem::path &workingDirectory)

Create a temporary directory to use as a scratch pad for on-disk storage.

This directory is guaranteed to exist.

Errors in this function are fatal.


The path of the newly created scratch directory.


constexpr int kDefaultRuntimesPerObservationCount = 1

The number of times a benchmark is executed.

This can be overridden using the “llvm.set_runtimes_per_observation_count” session parameter.

constexpr int kDefaultWarmupRunsPerRuntimeObservationCount = 0

The default number of warmup runs that a benchmark is executed before measuring the runtimes.

This can be overridden using the “llvm.set_warmup_runs_count_per_runtime_observation” session parameter.
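The runtime and warmup counts interact in a simple way: warmup runs are executed and discarded, then the measured runs are timed. The following is a rough sketch under that assumption. measureRuntimes() and its callable parameter are hypothetical and stand in for executing a compiled benchmark; the real measurement code may differ.

```cpp
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper: run `benchmark` `warmupRuns` times without timing it,
// then time `measuredRuns` further executions.
// Returns one wall-clock duration in microseconds per measured run.
inline std::vector<double> measureRuntimes(
    const std::function<void()>& benchmark, int warmupRuns, int measuredRuns) {
  for (int i = 0; i < warmupRuns; ++i) {
    benchmark();  // Timing is skipped; the run only warms up the system.
  }
  std::vector<double> runtimes;
  for (int i = 0; i < measuredRuns; ++i) {
    const auto start = std::chrono::steady_clock::now();
    benchmark();
    const auto end = std::chrono::steady_clock::now();
    runtimes.push_back(
        std::chrono::duration<double, std::micro>(end - start).count());
  }
  return runtimes;
}
```

With the defaults listed here (one measured run, zero warmup runs) the returned list would hold a single measurement.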

constexpr int kDefaultBuildtimesPerObservationCount = 1

The number of times a benchmark is built.

This can be overridden using the “llvm.set_buildtimes_per_observation_count” session parameter.

class Benchmark
#include <Benchmark.h>

An LLVM module and the LLVM context that owns it.

A benchmark is mutable and can be changed over the course of a session.

Public Functions

Benchmark(const std::string &name, const Bitcode &bitcode, const compiler_gym::BenchmarkDynamicConfig &dynamicConfig, const boost::filesystem::path &workingDirectory, const BaselineCosts &baselineCosts)

Construct a benchmark from a bitcode.

Benchmark(const std::string &name, std::unique_ptr<llvm::LLVMContext> context, std::unique_ptr<llvm::Module> module, const compiler_gym::BenchmarkDynamicConfig &dynamicConfig, const boost::filesystem::path &workingDirectory, const BaselineCosts &baselineCosts)

Construct a benchmark from an LLVM module.

void close()
std::unique_ptr<Benchmark> clone(const boost::filesystem::path &workingDirectory) const

Make a copy of the benchmark.


workingDirectory – The working directory for the new benchmark.


A copy of the benchmark.

BenchmarkHash module_hash() const

Compute and return a SHA1 hash of the module.


A SHA1 hash of the module.

grpc::Status verify_module()

Wrapper around llvm::verifyModule() which returns an error status on failure.


OK on success, else DATA_LOSS if verification fails.

grpc::Status writeBitcodeToFile(const boost::filesystem::path &path)

Write the module bitcode to the given path.

grpc::Status computeRuntime(Event &observation)

Compute a list of runtimes.

If the benchmark is not runnable, the list is empty.

grpc::Status computeBuildtime(Event &observation)

Compute a list of buildtimes.

If the benchmark is not buildable, the list is empty.

grpc::Status compile()
bool applyBaselineOptimizations(unsigned optLevel, unsigned sizeLevel)

Apply the given baseline optimizations.

  • optLevel – The runtime optimization level.

  • sizeLevel – The size optimization level.


Whether the baseline optimizations modified the module.

inline const std::string &name() const

The name of the benchmark.

inline void markModuleModified()

Mark that the LLVM module has been modified.

inline llvm::Module &module()

The underlying LLVM module.

inline const llvm::Module &module() const

The underlying LLVM module.

inline llvm::LLVMContext &context()

The underlying LLVM context.

inline const llvm::LLVMContext &context() const

The underlying LLVM context.

inline const BaselineCosts &baselineCosts() const
inline const llvm::LLVMContext *context_ptr() const

A pointer to the underlying LLVM context.

inline const llvm::Module *module_ptr() const

A pointer to the underlying LLVM module.

inline const BenchmarkDynamicConfig &dynamicConfig() const

A reference to the dynamic configuration object.

inline bool isBuildable() const
inline bool isRunnable() const
inline void replaceModule(std::unique_ptr<llvm::Module> module)

Replace the benchmark module with a new one.

This is to enable out-of-process modification of the IR by serializing the benchmark to a file, modifying the file, then loading the modified file and updating the module pointer here.


module – A new module.

inline int64_t lastBuildTimeMicroseconds()
inline int getRuntimesPerObservationCount() const
inline void setRuntimesPerObservationCount(const int value)
inline int getWarmupRunsPerRuntimeObservationCount() const
inline void setWarmupRunsPerRuntimeObservationCount(const int value)
inline int getBuildtimesPerObservationCount() const
inline void setBuildtimesPerObservationCount(const int value)

Private Functions

inline const boost::filesystem::path &scratchDirectory() const
inline const boost::filesystem::path workingDirectory() const

Private Members

std::unique_ptr<llvm::LLVMContext> context_
std::unique_ptr<llvm::Module> module_
const boost::filesystem::path scratchDirectory_
const compiler_gym::BenchmarkDynamicConfig dynamicConfigProto_
const BenchmarkDynamicConfig dynamicConfig_
const BaselineCosts baselineCosts_
const std::string name_

The directory used for storing build / runtime artifacts.

The difference between the scratch directory and the working directory is that the working directory may be shared across multiple Benchmark instances. The scratch directory is unique.

bool needsRecompile_
int64_t buildTimeMicroseconds_
int runtimesPerObservationCount_
int warmupRunsPerRuntimeObservationCount_
int buildtimesPerObservationCount_


#include "compiler_gym/envs/llvm/service/BenchmarkFactory.h"

namespace compiler_gym
namespace llvm_service


constexpr size_t kMaxLoadedBenchmarksCount = 128

Maximum number of benchmark instances to cache before eviction.

Benchmarks are loaded from disk and cached in-memory so that future uses do not require a disk access. The number of benchmarks that may be simultaneously loaded is specified here. Once this number is reached, 50% of the cached benchmarks are selected randomly and evicted.
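The eviction policy described above can be sketched as follows. This is an illustration only: ExampleCache and evictHalf() are hypothetical names, the cached values are plain strings rather than Benchmark objects, and the real selection logic may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical cache type: URI -> cached payload.
using ExampleCache = std::unordered_map<std::string, std::string>;

// Hypothetical helper: if the cache has reached `maxCount` entries, remove a
// randomly selected half of them. Returns the number of evicted entries.
inline std::size_t evictHalf(ExampleCache& cache, std::size_t maxCount,
                             std::mt19937_64& rand) {
  if (cache.size() < maxCount) {
    return 0;  // Still below the limit: nothing to evict.
  }
  // Shuffle the keys, then erase the first half of the shuffled list.
  std::vector<std::string> keys;
  keys.reserve(cache.size());
  for (const auto& entry : cache) {
    keys.push_back(entry.first);
  }
  std::shuffle(keys.begin(), keys.end(), rand);
  const std::size_t evictCount = keys.size() / 2;
  for (std::size_t i = 0; i < evictCount; ++i) {
    cache.erase(keys[i]);
  }
  return evictCount;
}
```

Passing the random number generator in by reference keeps the selection reproducible for a fixed seed on a given standard library, which is useful in tests.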

class BenchmarkFactory
#include <BenchmarkFactory.h>

A factory object for instantiating LLVM modules for use in optimization sessions.

Example usage:

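// benchmarkMessage is a compiler_gym::Benchmark protocol message, e.g. one
// that refers to "file:////tmp/my_bitcode.bc"; workingDirectory is a path.
BenchmarkFactory &factory = BenchmarkFactory::getSingleton(workingDirectory);
std::unique_ptr<Benchmark> benchmark;
grpc::Status status = factory.getBenchmark(benchmarkMessage, &benchmark);
// ... do fun stuff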

Public Functions

void close()
grpc::Status getBenchmark(const compiler_gym::Benchmark &benchmarkMessage, std::unique_ptr<Benchmark> *benchmark)

Get the requested named benchmark.

  • benchmarkMessage – A Benchmark protocol message.

  • benchmark – A benchmark instance to assign this benchmark to.


OK on success, or INVALID_ARGUMENT if the protocol message is invalid.

Public Static Functions

static inline BenchmarkFactory &getSingleton(const boost::filesystem::path &workingDirectory, std::optional<std::mt19937_64> rand = std::nullopt, size_t maxLoadedBenchmarksCount = kMaxLoadedBenchmarksCount)

Return the global benchmark factory singleton.

  • workingDirectory – The working directory.

  • rand – An optional random number generator. This is used for cache evictions.

  • maxLoadedBenchmarksCount – The maximum number of benchmarks to cache.


The benchmark factory singleton instance.

Private Functions

grpc::Status addBitcode(const std::string &uri, const Bitcode &bitcode, std::optional<compiler_gym::BenchmarkDynamicConfig> dynamicConfig = std::nullopt)
grpc::Status addBitcode(const std::string &uri, const boost::filesystem::path &path, std::optional<compiler_gym::BenchmarkDynamicConfig> dynamicConfig = std::nullopt)
BenchmarkFactory(const boost::filesystem::path &workingDirectory, std::optional<std::mt19937_64> rand, size_t maxLoadedBenchmarksCount)

Construct a benchmark factory.

  • workingDirectory – A filesystem directory to use for storing temporary files.

  • rand – A random number generator used to select benchmarks for cache eviction.

  • maxLoadedBenchmarksCount – The maximum number of benchmarks that may be cached in memory. Once this number is reached, benchmarks are evicted so that they must be re-read from disk.

BenchmarkFactory(const BenchmarkFactory&) = delete
BenchmarkFactory &operator=(const BenchmarkFactory&) = delete

Private Members

std::unordered_map<std::string, Benchmark> benchmarks_

A mapping from URI to benchmarks which have been loaded into memory.

const boost::filesystem::path workingDirectory_
std::mt19937_64 rand_
const size_t maxLoadedBenchmarksCount_

The maximum allowed size of the benchmark cache.


#include "compiler_gym/envs/llvm/service/Cost.h"

namespace compiler_gym
namespace llvm_service


using BaselineCosts = std::array<double, numBaselineCosts>
using PreviousCosts = std::array<std::optional<double>, numCosts>


enum class LlvmCostFunction

A cost function for LLVM benchmarks.



The number of instructions in the LLVM-IR module.

IR instruction count is fast to compute and deterministic.


Returns the size (in bytes) of the .TEXT section of the compiled module.

enumerator TEXT_SIZE_BYTES

Returns the size (in bytes) of the .TEXT section of the compiled binary.

enum class LlvmBaselinePolicy

LLVM’s builtin policies.


enumerator O0

No optimizations.

enumerator O3

-O3 optimizations.

enumerator Oz

-Oz optimizations.


grpc::Status setCost(const LlvmCostFunction &costFunction, llvm::Module &module, const boost::filesystem::path &workingDirectory, const BenchmarkDynamicConfig &dynamicConfig, double *cost)

Compute the cost using a given cost function.

A lower cost is better.

  • costFunction – The cost function to use.

  • module – The module to compute the cost for.

  • workingDirectory – A directory that can be used for temporary file storage.

  • cost – The cost to write.


OK on success.

grpc::Status setBaselineCosts(llvm::Module &unoptimizedModule, const boost::filesystem::path &workingDirectory, const BenchmarkDynamicConfig &dynamicConfig, BaselineCosts *baselineCosts)

Compute the costs of baseline policies.


The unoptimizedModule parameter is unmodified, but is not const because various LLVM API calls require a mutable reference.

  • unoptimizedModule – The module to compute the baseline costs of.

  • baselineCosts – The costs to write.

  • workingDirectory – A directory that can be used for temporary file storage.


constexpr size_t numCosts = magic_enum::enum_count<LlvmCostFunction>()
constexpr size_t numBaselineCosts = magic_enum::enum_count<LlvmBaselinePolicy>() * numCosts
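Because numBaselineCosts is the product of the number of baseline policies and the number of cost functions, a BaselineCosts array can hold one value per (policy, cost) pair. The following is a minimal sketch of one possible indexing scheme. The row-major layout, the baselineCostIndex() helper, and the placeholder enumerators COST_A, COST_B and COST_C are assumptions made for illustration; the layout used by the real implementation is not shown in this document.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-ins for LlvmBaselinePolicy and LlvmCostFunction.
enum class ExamplePolicy { O0, O3, Oz };
enum class ExampleCost { COST_A, COST_B, COST_C };

constexpr std::size_t kNumPolicies = 3;
constexpr std::size_t kNumCosts = 3;
constexpr std::size_t kNumBaselineCosts = kNumPolicies * kNumCosts;

using ExampleBaselineCosts = std::array<double, kNumBaselineCosts>;

// Assumed row-major layout: all costs for O0, then O3, then Oz.
constexpr std::size_t baselineCostIndex(ExamplePolicy policy,
                                        ExampleCost cost) {
  return static_cast<std::size_t>(policy) * kNumCosts +
         static_cast<std::size_t>(cost);
}
```

Under this assumed layout, the value for a given pair would be read as costs[baselineCostIndex(policy, cost)].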


#include "compiler_gym/envs/llvm/service/LlvmSession.h"

namespace compiler_gym
namespace llvm_service
class LlvmSession : public compiler_gym::CompilationSession
#include <LlvmSession.h>

An interactive LLVM compilation session.

This class exposes the LLVM optimization pipeline for an LLVM module as an interactive environment. It can be used directly as a C++ API, or it can be accessed through an RPC interface using the CompilerGym RPC runtime.

Public Functions

LlvmSession(const boost::filesystem::path &workingDirectory)
virtual std::string getCompilerVersion() const final override

Get the compiler version.


A string indicating the compiler version.

virtual std::vector<ActionSpace> getActionSpaces() const final override

A list of action spaces describing the capabilities of the compiler.


A list of ActionSpace instances.

virtual std::vector<ObservationSpace> getObservationSpaces() const final override

A list of feature vectors that this compiler provides.


A list of ObservationSpace instances.

grpc::Status init(const ActionSpace &actionSpace, const compiler_gym::Benchmark &benchmark) final override
virtual grpc::Status init(CompilationSession *other) final override

Initialize a CompilationSession from another CompilationSession.

Think of this like a copy constructor, except that this method is allowed to fail.

This will be called after construction and before applyAction() or computeObservation(). This will only be called once.


other – The CompilationSession to initialize from.


OK on success, else an error code and message.

virtual grpc::Status applyAction(const Event &action, bool &endOfEpisode, std::optional<ActionSpace> &newActionSpace, bool &actionHadNoEffect) final override

Apply an action.

  • action – The action to apply.

  • newActionSpace – If applying the action mutated the action space, set this value to the new action space.

  • actionHadNoEffect – If the action had no effect, set this to true.


OK on success, else an error code and message.

virtual grpc::Status endOfStep(bool actionHadNoEffect, bool &endOfEpisode, std::optional<ActionSpace> &newActionSpace) final override


This will be called after all applyAction() and computeObservation() in a step. Use this method if you would like to perform post-transform validation of compiler state.


OK on success, else an error code and message.

virtual grpc::Status computeObservation(const ObservationSpace &observationSpace, Event &observation) final override

Compute an observation.


OK on success, else an error code and message.

virtual grpc::Status handleSessionParameter(const std::string &key, const std::string &value, std::optional<std::string> &reply) final override

Handle a session parameter sent by the frontend.

Session parameters provide a method to send ad-hoc key-value messages to a compilation session through the env.send_session_parameter() method. It is up to the client/service to agree on a common schema for encoding and decoding these parameters.

Implementing this method is optional.

  • key – The parameter key.

  • value – The parameter value.

  • reply – A string response message for the parameter, or leave as std::nullopt if the parameter is unknown.


OK on success, else an error code and message.
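The key/value/reply contract above can be sketched with standard C++ only. ExampleSession is a hypothetical type, a bool return value stands in for grpc::Status, and echoing the value back as the reply is an assumption; only the key name is taken from this document.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical session state; only one parameter is modeled here.
struct ExampleSession {
  int runtimesPerObservationCount = 1;

  // Returns true on success and false if the value cannot be used.
  // `reply` is left as std::nullopt when the key is not recognized.
  bool handleSessionParameter(const std::string& key, const std::string& value,
                              std::optional<std::string>& reply) {
    if (key == "llvm.set_runtimes_per_observation_count") {
      try {
        const int count = std::stoi(value);
        if (count < 1) {
          return false;  // A benchmark must be run at least once.
        }
        runtimesPerObservationCount = count;
      } catch (const std::exception&) {
        return false;  // Not a number, or out of range for int.
      }
      reply = value;  // Assumption: echo the accepted value back.
      return true;
    }
    return true;  // Unknown key: not an error, but no reply is set.
  }
};
```

Leaving the reply unset for unknown keys lets the caller tell "parameter not supported" apart from "parameter handled".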

inline const LlvmActionSpace actionSpace() const

Private Functions

grpc::Status computeObservation(LlvmObservationSpace observationSpace, Event &observation)
grpc::Status init(const LlvmActionSpace &actionSpace, std::unique_ptr<Benchmark> benchmark)
inline const Benchmark &benchmark() const
inline Benchmark &benchmark()
grpc::Status applyPassAction(LlvmAction action, bool &actionHadNoEffect)

Run the requested action.

  • action – An action to apply.

  • actionHadNoEffect – Set to true if LLVM reported that any passes that were run made no modifications to the module.


OK on success.

bool runPass(llvm::Pass *pass)

Run the given pass, possibly modifying the underlying LLVM module.


Whether the module was modified.

bool runPass(llvm::FunctionPass *pass)

Run the given pass, possibly modifying the underlying LLVM module.


Whether the module was modified.

grpc::Status runOptWithArgs(const std::vector<std::string> &optArgs)

Run the commandline opt tool on the current LLVM module with the given arguments, replacing the environment state with the generated output.

inline const llvm::TargetLibraryInfoImpl &tlii() const
template<typename PassManager, typename Pass>
inline void setupPassManager(PassManager *passManager, Pass *pass)

Set up the pass manager with dependent passes and the specified pass.

Private Members

const std::unordered_map<std::string, LlvmObservationSpace> observationSpaceNames_
LlvmActionSpace actionSpace_
std::unique_ptr<Benchmark> benchmark_
llvm::TargetLibraryInfoImpl tlii_


#include "compiler_gym/envs/llvm/service/Observation.h"

namespace compiler_gym
namespace llvm_service


grpc::Status setObservation(LlvmObservationSpace space, const boost::filesystem::path &workingDirectory, Benchmark &benchmark, Event &reply)

Compute an observation using the given space.

  • space – The observation space to compute.

  • workingDirectory – A scratch directory.

  • benchmark – The benchmark to compute the observation on.

  • reply – The observation to set.


OK on success.


#include "compiler_gym/envs/llvm/service/ObservationSpaces.h"

namespace compiler_gym
namespace llvm_service


enum class LlvmObservationSpace

The available observation spaces for LLVM.


Housekeeping rules - to add a new observation space:

  1. Add a new entry to this LlvmObservationSpace enum.

  2. Add a new switch case to getLlvmObservationSpaceList() to return the ObservationSpace.

  3. Add a new switch case to LlvmSession::getObservation() to compute the actual observation.

  4. Run bazel test //compiler_gym/... and update the newly failing tests.


enumerator IR

The entire LLVM module as an IR string.

This allows the user to do their own feature extraction.

enumerator IR_SHA1

The 40-digit hex SHA1 checksum of the LLVM module.
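A 160-bit hash formats to exactly 40 hexadecimal digits. The following is a minimal sketch of that conversion, assuming the hash is stored as five 32-bit words (the layout of llvm::ModuleHash). ExampleHash and hashToHex() are hypothetical names, and the word order used by the real implementation is not confirmed here.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>
#include <string>

// Assumed layout of a 160-bit hash: five 32-bit words.
using ExampleHash = std::array<std::uint32_t, 5>;

// Hypothetical helper: format each word as 8 zero-padded lowercase hex
// digits and concatenate them, giving 5 * 8 = 40 characters.
inline std::string hashToHex(const ExampleHash& hash) {
  std::string hex;
  hex.reserve(40);
  char buffer[9];  // 8 hex digits plus the terminating NUL.
  for (std::uint32_t word : hash) {
    std::snprintf(buffer, sizeof(buffer), "%08x",
                  static_cast<unsigned int>(word));
    hex += buffer;
  }
  return hex;
}
```

Zero padding matters here: without it, words with leading zero bits would shorten the string and the result would no longer be a fixed 40 digits.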

enumerator BITCODE

Get the bitcode as a bytes array.

enumerator BITCODE_FILE

Write the bitcode to a file and return its path as a string.

enumerator INST_COUNT

The counts of all instructions in a program.

enumerator AUTOPHASE

The Autophase feature vector.


Huang, Q., Haj-Ali, A., Moses, W., Xiang, J., Stoica, I., Asanovic, K.,
& Wawrzynek, J. (2019). Autophase: Compiler phase-ordering for HLS with
deep reinforcement learning. FCCM.

enumerator PROGRAML

Returns the graph representation of a program as a networkx Graph.


Cummins, C., Fisches, Z. V., Ben-Nun, T., Hoefler, T., & Leather, H.
(2020). ProGraML: Graph-based Deep Learning for Program Optimization
and Analysis. ArXiv:2003.10536.

enumerator PROGRAML_JSON

Returns the graph representation of a program as a JSON node-link graph.


Cummins, C., Fisches, Z. V., Ben-Nun, T., Hoefler, T., & Leather, H.
(2020). ProGraML: Graph-based Deep Learning for Program Optimization
and Analysis. ArXiv:2003.10536.

enumerator CPU_INFO

A JSON dictionary of properties describing the CPU.


The number of LLVM-IR instructions in the current module.


The number of LLVM-IR instructions normalized to -O0.


The number of LLVM-IR instructions normalized to -O3.


The number of LLVM-IR instructions normalized to -Oz.


The platform-dependent size of the .text section of the lowered module.

enumerator OBJECT_TEXT_SIZE_O0

The platform-dependent size of the .text section of the lowered module.

enumerator OBJECT_TEXT_SIZE_O3

The platform-dependent size of the .text section of the lowered module.


The platform-dependent size of the .text section of the lowered module.

enumerator TEXT_SIZE_BYTES

The platform-dependent size of the .text section of the compiled binary.

enumerator TEXT_SIZE_O0

The platform-dependent size of the .text section of the compiled binary.

enumerator TEXT_SIZE_O3

The platform-dependent size of the .text section of the compiled binary.

enumerator TEXT_SIZE_OZ

The platform-dependent size of the .text section of the compiled binary.

enumerator IS_BUILDABLE

Return 1 if the benchmark is buildable, else 0.

enumerator IS_RUNNABLE

Return 1 if the benchmark is runnable, else 0.

enumerator RUNTIME

The runtime of the compiled program.

Returns a list of runtime measurements in microseconds. This is not available for all benchmarks. When not available, a list of zeros is returned.

enumerator BUILDTIME

The time it took to compile the program.

Returns a list of measurements in seconds. This is not available for all benchmarks. When not available, a list of zeros is returned.

enumerator LEXED_IR

The LLVM-lexer token IDs of the input IR.

Returns a dictionary of aligned lists (token_idx, token_kind, token_category, str_token_value), with one list element for every tokenized word in the IR.