compiler_gym/envs/llvm/service
This directory contains the core C++ implementation of the LLVM environment for
CompilerGym. The base session is implemented by the
compiler_gym::llvm_service::LlvmSession class, defined in LlvmSession.h.
ActionSpace.h
#include "compiler_gym/envs/llvm/service/ActionSpace.h"
-
namespace compiler_gym
-
namespace llvm_service
Enums
-
enum class LlvmActionSpace
The available action spaces for LLVM.
Note
Implementation housekeeping rules - to add a new action space:
Add a new entry to this LlvmActionSpace enum.
Add a new switch case to getLlvmActionSpaceList() to return the ActionSpace.
Add a new switch case to LlvmSession::step() to compute the actual action.
Run bazel test //compiler_gym/... and update the newly failing tests.
Values:
-
enumerator PASSES_ALL
The full set of transform passes for LLVM.
Benchmark.h
#include "compiler_gym/envs/llvm/service/Benchmark.h"
-
namespace compiler_gym
-
namespace llvm_service
Typedefs
-
using BenchmarkHash = llvm::ModuleHash
A 160-bit SHA1 hash that identifies an LLVM module.
-
using Bitcode = llvm::SmallString<0>
A bitcode.
Functions
-
grpc::Status readBitcodeFile(const boost::filesystem::path &path, Bitcode *bitcode)
Read a bitcode file from disk.
- Parameters
path – The path of the bitcode file to read.
bitcode – The destination bitcode.
- Returns
OK on success, NOT_FOUND if the file is not found, or INVALID_ARGUMENT if the file is invalid.
-
grpc::Status writeBitcodeFile(const llvm::Module &module, const boost::filesystem::path &path)
Write the module bitcode to the given path.
- Parameters
module – The module to write to file.
path – The path of the bitcode file to write.
- Returns
OK on success.
-
std::unique_ptr<llvm::Module> makeModule(llvm::LLVMContext &context, const Bitcode &bitcode, const std::string &name, grpc::Status *status)
Construct an LLVM module from a bitcode.
Parses the given bitcode into a module and strips the identifying ModuleID and source_filename attributes.
- Parameters
context – An LLVM context for the new module.
bitcode – The bitcode to parse.
name – The name of the module.
status – An error status that is set to OK on success or INVALID_ARGUMENT if the bitcode cannot be parsed.
- Returns
A unique pointer to an LLVM module, or nullptr on error, in which case status is set.
-
boost::filesystem::path createBenchmarkScratchDirectoryOrDie(const boost::filesystem::path &workingDirectory)
Create a temporary directory to use as a scratch pad for on-disk storage.
This directory is guaranteed to exist.
Errors in this function are fatal.
- Returns
The path of the scratch directory.
Variables
-
constexpr int kDefaultRuntimesPerObservationCount = 1
The default number of times a benchmark is executed.
This can be overridden using the “llvm.set_runtimes_per_observation_count” session parameter.
-
constexpr int kDefaultWarmupRunsPerRuntimeObservationCount = 0
The default number of warmup runs executed before the runtimes are measured.
This can be overridden using the “llvm.set_warmup_runs_count_per_runtime_observation” session parameter.
-
constexpr int kDefaultBuildtimesPerObservationCount = 1
The default number of times a benchmark is built.
This can be overridden using the “llvm.set_buildtimes_per_observation_count” session parameter.
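The interaction between the warmup count and the per-observation run count can be sketched as follows. This is a minimal illustration, not the CompilerGym implementation: the function name measureRuntimesMicros and the use of a std::function callback in place of a real benchmark execution are assumptions made for the example.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: run `run` warmupRuns times without timing it, then
// time it measuredRuns times and return each wall time in microseconds.
std::vector<double> measureRuntimesMicros(const std::function<void()>& run,
                                          int warmupRuns, int measuredRuns) {
  // Warmup runs are executed but their timings are discarded.
  for (int i = 0; i < warmupRuns; ++i) {
    run();
  }

  std::vector<double> runtimes;
  runtimes.reserve(static_cast<std::size_t>(std::max(measuredRuns, 0)));
  for (int i = 0; i < measuredRuns; ++i) {
    const auto start = std::chrono::steady_clock::now();
    run();
    const auto end = std::chrono::steady_clock::now();
    runtimes.push_back(
        std::chrono::duration<double, std::micro>(end - start).count());
  }
  return runtimes;
}
```

With the defaults listed above (zero warmup runs, one measured run), a helper like this would return a single-element list.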
-
class Benchmark
- #include <Benchmark.h>
An LLVM module and the LLVM context that owns it.
A benchmark is mutable and can be changed over the course of a session.
Public Functions
-
Benchmark(const std::string &name, const Bitcode &bitcode, const compiler_gym::BenchmarkDynamicConfig &dynamicConfig, const boost::filesystem::path &workingDirectory, const BaselineCosts &baselineCosts)
Construct a benchmark from a bitcode.
-
Benchmark(const std::string &name, std::unique_ptr<llvm::LLVMContext> context, std::unique_ptr<llvm::Module> module, const compiler_gym::BenchmarkDynamicConfig &dynamicConfig, const boost::filesystem::path &workingDirectory, const BaselineCosts &baselineCosts)
Construct a benchmark from an LLVM module.
-
void close()
-
std::unique_ptr<Benchmark> clone(const boost::filesystem::path &workingDirectory) const
Make a copy of the benchmark.
- Parameters
workingDirectory – The working directory for the new benchmark.
- Returns
A copy of the benchmark.
-
BenchmarkHash module_hash() const
Compute and return a SHA1 hash of the module.
- Returns
A SHA1 hash of the module.
-
grpc::Status verify_module()
Wrapper around llvm::verifyModule() which returns an error status on failure.
- Returns
OK on success, else DATA_LOSS if verification fails.
-
grpc::Status writeBitcodeToFile(const boost::filesystem::path &path)
Write the module bitcode to the given path.
-
grpc::Status computeRuntime(Event &observation)
Compute a list of runtimes.
If the benchmark is not runnable, the list is empty.
-
grpc::Status computeBuildtime(Event &observation)
Compute a list of buildtimes.
If the benchmark is not buildable, the list is empty.
-
grpc::Status compile()
-
bool applyBaselineOptimizations(unsigned optLevel, unsigned sizeLevel)
Apply the given baseline optimizations.
- Parameters
optLevel – The runtime optimization level.
sizeLevel – The size optimization level.
- Returns
Whether the baseline optimizations modified the module.
-
inline const std::string &name() const
The name of the benchmark.
-
inline void markModuleModified()
Mark that the LLVM module has been modified.
-
inline llvm::Module &module()
The underlying LLVM module.
-
inline const llvm::Module &module() const
The underlying LLVM module.
-
inline llvm::LLVMContext &context()
The underlying LLVM context.
-
inline const llvm::LLVMContext &context() const
The underlying LLVM context.
-
inline const BaselineCosts &baselineCosts() const
-
inline const llvm::LLVMContext *context_ptr() const
A pointer to the underlying LLVM context.
-
inline const llvm::Module *module_ptr() const
A pointer to the underlying LLVM module.
-
inline const BenchmarkDynamicConfig &dynamicConfig() const
A reference to the dynamic configuration object.
-
inline bool isBuildable() const
-
inline bool isRunnable() const
-
inline void replaceModule(std::unique_ptr<llvm::Module> module)
Replace the benchmark module with a new one.
This is to enable out-of-process modification of the IR by serializing the benchmark to a file, modifying the file, then loading the modified file and updating the module pointer here.
- Parameters
module – A new module.
-
inline int64_t lastBuildTimeMicroseconds()
-
inline int getRuntimesPerObservationCount() const
-
inline void setRuntimesPerObservationCount(const int value)
-
inline int getWarmupRunsPerRuntimeObservationCount() const
-
inline void setWarmupRunsPerRuntimeObservationCount(const int value)
-
inline int getBuildtimesPerObservationCount() const
-
inline void setBuildtimesPerObservationCount(const int value)
Private Functions
-
inline const boost::filesystem::path &scratchDirectory() const
-
inline const boost::filesystem::path workingDirectory() const
Private Members
-
std::unique_ptr<llvm::LLVMContext> context_
-
std::unique_ptr<llvm::Module> module_
-
const boost::filesystem::path scratchDirectory_
-
const compiler_gym::BenchmarkDynamicConfig dynamicConfigProto_
-
const BenchmarkDynamicConfig dynamicConfig_
-
const BaselineCosts baselineCosts_
-
const std::string name_
The directory used for storing build / runtime artifacts.
The difference between the scratch directory and the working directory is that the working directory may be shared across multiple Benchmark instances. The scratch directory is unique.
-
bool needsRecompile_
-
int64_t buildTimeMicroseconds_
-
int runtimesPerObservationCount_
-
int warmupRunsPerRuntimeObservationCount_
-
int buildtimesPerObservationCount_
BenchmarkFactory.h
#include "compiler_gym/envs/llvm/service/BenchmarkFactory.h"
-
namespace compiler_gym
-
namespace llvm_service
Variables
-
constexpr size_t kMaxLoadedBenchmarksCount = 128
Maximum number of benchmark instances to cache before eviction.
Benchmarks are loaded from disk and cached in-memory so that future uses do not require a disk access. The number of benchmarks that may be simultaneously loaded is specified here. Once this number is reached, 50% of the cached benchmarks are selected randomly and evicted.
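The eviction policy described above can be sketched with standard containers. This is an illustrative sketch only: the helper names evictRandomHalf and insertWithEviction are hypothetical, and the real factory may select and evict entries differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: remove a randomly chosen half of the cached entries.
// Returns the number of entries that were evicted.
template <typename Value>
std::size_t evictRandomHalf(std::unordered_map<std::string, Value>& cache,
                            std::mt19937_64& rand) {
  std::vector<std::string> keys;
  keys.reserve(cache.size());
  for (const auto& entry : cache) {
    keys.push_back(entry.first);
  }
  // Shuffle the keys so that the first half is a random selection.
  std::shuffle(keys.begin(), keys.end(), rand);
  const std::size_t evictCount = keys.size() / 2;
  for (std::size_t i = 0; i < evictCount; ++i) {
    cache.erase(keys[i]);
  }
  return evictCount;
}

// Hypothetical helper: insert an entry, evicting first if the cache is full.
template <typename Value>
void insertWithEviction(std::unordered_map<std::string, Value>& cache,
                        std::mt19937_64& rand, std::size_t maxLoaded,
                        const std::string& key, Value value) {
  if (cache.size() >= maxLoaded) {
    evictRandomHalf(cache, rand);
  }
  cache.erase(key);
  cache.emplace(key, std::move(value));
}
```

Evicting half of the cache at once, rather than one entry per insertion, keeps the bookkeeping simple and amortizes the eviction cost over many later insertions.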
-
class BenchmarkFactory
- #include <BenchmarkFactory.h>
A factory object for instantiating LLVM modules for use in optimization sessions.
Example usage:
BenchmarkFactory factory;
auto benchmark = factory.getBenchmark("file:////tmp/my_bitcode.bc");
// ... do fun stuff
Public Functions
-
~BenchmarkFactory()
-
void close()
-
grpc::Status getBenchmark(const compiler_gym::Benchmark &benchmarkMessage, std::unique_ptr<Benchmark> *benchmark)
Get the requested named benchmark.
- Parameters
benchmarkMessage – A Benchmark protocol message.
benchmark – A benchmark instance to assign this benchmark to.
- Returns
OK on success, or INVALID_ARGUMENT if the protocol message is invalid.
Public Static Functions
-
static inline BenchmarkFactory &getSingleton(const boost::filesystem::path &workingDirectory, std::optional<std::mt19937_64> rand = std::nullopt, size_t maxLoadedBenchmarksCount = kMaxLoadedBenchmarksCount)
Return the global benchmark factory singleton.
- Parameters
workingDirectory – The working directory.
rand – An optional random number generator. This is used for cache evictions.
maxLoadedBenchmarksCount – The maximum number of benchmarks to cache.
- Returns
The benchmark factory singleton instance.
Private Functions
-
grpc::Status addBitcode(const std::string &uri, const Bitcode &bitcode, std::optional<compiler_gym::BenchmarkDynamicConfig> dynamicConfig = std::nullopt)
-
grpc::Status addBitcode(const std::string &uri, const boost::filesystem::path &path, std::optional<compiler_gym::BenchmarkDynamicConfig> dynamicConfig = std::nullopt)
-
BenchmarkFactory(const boost::filesystem::path &workingDirectory, std::optional<std::mt19937_64> rand, size_t maxLoadedBenchmarksCount)
Construct a benchmark factory.
- Parameters
workingDirectory – A filesystem directory to use for storing temporary files.
rand – A random number generator used to select benchmarks for cache eviction.
maxLoadedBenchmarksCount – The maximum number of benchmarks that may be cached in memory. Once this number is reached, benchmarks are evicted from the cache and must be re-read from disk.
-
BenchmarkFactory(const BenchmarkFactory&) = delete
-
BenchmarkFactory &operator=(const BenchmarkFactory&) = delete
Private Members
-
std::unordered_map<std::string, Benchmark> benchmarks_
A mapping from URI to benchmarks which have been loaded into memory.
-
const boost::filesystem::path workingDirectory_
-
std::mt19937_64 rand_
-
const size_t maxLoadedBenchmarksCount_
The maximum allowed size of the benchmark cache.
Cost.h
#include "compiler_gym/envs/llvm/service/Cost.h"
-
namespace compiler_gym
-
namespace llvm_service
Typedefs
-
using BaselineCosts = std::array<double, numBaselineCosts>
Enums
-
enum class LlvmCostFunction
A cost function for LLVM benchmarks.
Values:
-
enumerator IR_INSTRUCTION_COUNT
The number of instructions in the LLVM-IR module.
IR instruction count is fast to compute and deterministic.
-
enumerator OBJECT_TEXT_SIZE_BYTES
The size (in bytes) of the .text section of the compiled module.
-
enumerator TEXT_SIZE_BYTES
The size (in bytes) of the .text section of the compiled binary.
Functions
-
grpc::Status setCost(const LlvmCostFunction &costFunction, llvm::Module &module, const boost::filesystem::path &workingDirectory, const BenchmarkDynamicConfig &dynamicConfig, double *cost)
Compute the cost using a given cost function.
A lower cost is better.
- Parameters
costFunction – The cost function to use.
module – The module to compute the cost for.
workingDirectory – A directory that can be used for temporary file storage.
cost – The cost to write.
- Returns
OK on success.
-
grpc::Status setBaselineCosts(llvm::Module &unoptimizedModule, const boost::filesystem::path &workingDirectory, const BenchmarkDynamicConfig &dynamicConfig, BaselineCosts *baselineCosts)
Compute the costs of baseline policies.
Note
The unoptimizedModule parameter is unmodified, but is not const because various LLVM API calls require a mutable reference.
- Parameters
unoptimizedModule – The module to compute the baseline costs of.
baselineCosts – The costs to write.
workingDirectory – A directory that can be used for temporary file storage.
Variables
-
constexpr size_t numCosts = magic_enum::enum_count<LlvmCostFunction>()
-
constexpr size_t numBaselineCosts = magic_enum::enum_count<LlvmBaselinePolicy>() * numCosts
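Because numBaselineCosts is the product of the number of baseline policies and the number of cost functions, BaselineCosts can be read as a flattened two-dimensional table. The sketch below shows one way to index such a table. The policy-major layout, the costIndex helper, and the BaselinePolicy enumerators (O0, O3, Oz) are assumptions for illustration; the counts are written out by hand rather than computed with magic_enum so that the example has no external dependencies.

```cpp
#include <array>
#include <cstddef>

// Stand-in for LlvmCostFunction, mirroring the three enumerators listed above.
enum class CostFunction {
  IR_INSTRUCTION_COUNT,
  OBJECT_TEXT_SIZE_BYTES,
  TEXT_SIZE_BYTES
};

// Hypothetical stand-in for LlvmBaselinePolicy (enumerators are assumed).
enum class BaselinePolicy { O0, O3, Oz };

constexpr std::size_t kNumCosts = 3;
constexpr std::size_t kNumPolicies = 3;

// One slot per (policy, cost function) pair.
using Costs = std::array<double, kNumPolicies * kNumCosts>;

// Assumed policy-major layout: all costs for O0, then O3, then Oz.
constexpr std::size_t costIndex(BaselinePolicy policy, CostFunction cost) {
  return static_cast<std::size_t>(policy) * kNumCosts +
         static_cast<std::size_t>(cost);
}
```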
LlvmSession.h
#include "compiler_gym/envs/llvm/service/LlvmSession.h"
-
namespace compiler_gym
-
namespace llvm_service
-
class LlvmSession : public compiler_gym::CompilationSession
- #include <LlvmSession.h>
An interactive LLVM compilation session.
This class exposes the LLVM optimization pipeline for an LLVM module as an interactive environment. It can be used directly as a C++ API, or it can be accessed through an RPC interface using the CompilerGym RPC runtime.
Public Functions
-
LlvmSession(const boost::filesystem::path &workingDirectory)
-
virtual std::string getCompilerVersion() const final override
Get the compiler version.
- Returns
A string indicating the compiler version.
-
virtual std::vector<ActionSpace> getActionSpaces() const final override
A list of action spaces describing the capabilities of the compiler.
- Returns
A list of ActionSpace instances.
-
virtual std::vector<ObservationSpace> getObservationSpaces() const final override
A list of feature vectors that this compiler provides.
- Returns
A list of ObservationSpace instances.
-
grpc::Status init(const ActionSpace &actionSpace, const compiler_gym::Benchmark &benchmark) final override
-
virtual grpc::Status init(CompilationSession *other) final override
Initialize a CompilationSession from another CompilationSession.
Think of this like a copy constructor, except that this method is allowed to fail.
This will be called after construction and before applyAction() or computeObservation(). This will only be called once.
- Parameters
other – The CompilationSession to initialize from.
- Returns
OK on success, else an error code and message.
-
virtual grpc::Status applyAction(const Event &action, bool &endOfEpisode, std::optional<ActionSpace> &newActionSpace, bool &actionHadNoEffect) final override
Apply an action.
- Parameters
action – The action to apply.
newActionSpace – If applying the action mutated the action space, set this value to the new action space.
actionHadNoEffect – If the action had no effect, set this to true.
- Returns
OK on success, else an error code and message.
-
virtual grpc::Status endOfStep(bool actionHadNoEffect, bool &endOfEpisode, std::optional<ActionSpace> &newActionSpace) final override
Optional.
This will be called after all applyAction() and computeObservation() in a step. Use this method if you would like to perform post-transform validation of compiler state.
- Returns
OK on success, else an error code and message.
-
virtual grpc::Status computeObservation(const ObservationSpace &observationSpace, Event &observation) final override
Compute an observation.
- Returns
OK on success, else an error code and message.
-
virtual grpc::Status handleSessionParameter(const std::string &key, const std::string &value, std::optional<std::string> &reply) final override
Handle a session parameter sent by the frontend.
Session parameters provide a method to send ad-hoc key-value messages to a compilation session through the env.send_session_parameter() method. It is up to the client/service to agree on a common schema for encoding and decoding these parameters.
Implementing this method is optional.
- Parameters
key – The parameter key.
value – The parameter value.
reply – A string response message for the parameter, or leave as std::nullopt if the parameter is unknown.
- Returns
OK on success, else an error code and message.
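The key/value/reply contract can be illustrated with a small standalone sketch. It is not the LlvmSession implementation: the Settings struct and handleParameter function are hypothetical, a bool stands in for grpc::Status, and echoing the value back as the reply is an assumption. Only the parameter key is taken from this documentation.

```cpp
#include <cstddef>
#include <exception>
#include <optional>
#include <string>

// Hypothetical stand-in for the session state that a parameter may change.
struct Settings {
  int runtimesPerObservationCount = 1;
};

// Returns true on success and false if a known key carries an invalid value.
// `reply` is left as std::nullopt when the key is unknown.
bool handleParameter(Settings& settings, const std::string& key,
                     const std::string& value,
                     std::optional<std::string>& reply) {
  if (key == "llvm.set_runtimes_per_observation_count") {
    int parsed = 0;
    try {
      std::size_t consumed = 0;
      parsed = std::stoi(value, &consumed);
      if (consumed != value.size()) {
        return false;  // Trailing characters after the number.
      }
    } catch (const std::exception&) {
      return false;  // Not a number, or out of range.
    }
    if (parsed < 1) {
      return false;
    }
    settings.runtimesPerObservationCount = parsed;
    reply = value;  // Assumption: acknowledge by echoing the value.
  }
  return true;
}
```

Leaving reply unset for unknown keys lets the caller distinguish "parameter not recognized" from "parameter handled", which matches the documented meaning of std::nullopt.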
-
inline const LlvmActionSpace actionSpace() const
Private Functions
-
grpc::Status computeObservation(LlvmObservationSpace observationSpace, Event &observation)
-
grpc::Status init(const LlvmActionSpace &actionSpace, std::unique_ptr<Benchmark> benchmark)
-
grpc::Status applyPassAction(LlvmAction action, bool &actionHadNoEffect)
Run the requested action.
- Parameters
action – An action to apply.
actionHadNoEffect – Set to true if LLVM reported that any passes that were run made no modifications to the module.
- Returns
OK on success.
-
bool runPass(llvm::Pass *pass)
Run the given pass, possibly modifying the underlying LLVM module.
- Returns
Whether the module was modified.
-
bool runPass(llvm::FunctionPass *pass)
Run the given pass, possibly modifying the underlying LLVM module.
- Returns
Whether the module was modified.
-
grpc::Status runOptWithArgs(const std::vector<std::string> &optArgs)
Run the command-line opt tool on the current LLVM module with the given arguments, replacing the environment state with the generated output.
-
inline const llvm::TargetLibraryInfoImpl &tlii() const
-
template<typename PassManager, typename Pass>
inline void setupPassManager(PassManager *passManager, Pass *pass)
Set up the pass manager with dependent passes and the specified pass.
Private Members
-
const std::unordered_map<std::string, LlvmObservationSpace> observationSpaceNames_
-
LlvmActionSpace actionSpace_
-
llvm::TargetLibraryInfoImpl tlii_
Observation.h
#include "compiler_gym/envs/llvm/service/Observation.h"
-
namespace compiler_gym
-
namespace llvm_service
Functions
-
grpc::Status setObservation(LlvmObservationSpace space, const boost::filesystem::path &workingDirectory, Benchmark &benchmark, Event &reply)
Compute an observation using the given space.
- Parameters
space – The observation space to compute.
workingDirectory – A scratch directory.
benchmark – The benchmark to compute the observation on.
reply – The observation to set.
- Returns
OK on success.
ObservationSpaces.h
#include "compiler_gym/envs/llvm/service/ObservationSpaces.h"
-
namespace compiler_gym
-
namespace llvm_service
Enums
-
enum class LlvmObservationSpace
The available observation spaces for LLVM.
Note
Housekeeping rules - to add a new observation space:
Add a new entry to this LlvmObservationSpace enum.
Add a new switch case to getLlvmObservationSpaceList() to return the ObservationSpace.
Add a new switch case to LlvmSession::getObservation() to compute the actual observation.
Run bazel test //compiler_gym/... and update the newly failing tests.
Values:
-
enumerator IR
The entire LLVM module as an IR string.
This allows the user to do their own feature extraction.
-
enumerator IR_SHA1
The 40-digit hex SHA1 checksum of the LLVM module.
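A 160-bit SHA1 digest is 20 bytes, which is why its hexadecimal form has 40 digits. The sketch below formats such a digest, assuming it is stored as five 32-bit words (the layout assumed here for llvm::ModuleHash). The HashWords alias and the toHex helper are hypothetical names introduced for the example.

```cpp
#include <array>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Stand-in for a 160-bit hash, assumed to be stored as five 32-bit words.
using HashWords = std::array<std::uint32_t, 5>;

// Format each 32-bit word as 8 zero-padded lowercase hex digits (5 * 8 = 40).
std::string toHex(const HashWords& hash) {
  std::string out;
  out.reserve(40);
  char buffer[9];  // 8 hex digits plus the terminating NUL.
  for (std::uint32_t word : hash) {
    std::snprintf(buffer, sizeof(buffer), "%08" PRIx32, word);
    out += buffer;
  }
  return out;
}
```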
-
enumerator BITCODE
Get the bitcode as a bytes array.
-
enumerator BITCODE_FILE
Write the bitcode to a file and return its path as a string.
-
enumerator INST_COUNT
The counts of all instructions in a program.
-
enumerator AUTOPHASE
The Autophase feature vector.
From:
Huang, Q., Haj-Ali, A., Moses, W., Xiang, J., Stoica, I., Asanovic, K., & Wawrzynek, J. (2019). Autophase: Compiler phase-ordering for HLS with deep reinforcement learning. FCCM.
-
enumerator PROGRAML
Returns the graph representation of a program as a networkx Graph.
From:
Cummins, C., Fisches, Z. V., Ben-Nun, T., Hoefler, T., & Leather, H. (2020). ProGraML: Graph-based Deep Learning for Program Optimization and Analysis. ArXiv:2003.10536. https://arxiv.org/abs/2003.10536
-
enumerator PROGRAML_JSON
Returns the graph representation of a program as a JSON node-link graph.
From:
Cummins, C., Fisches, Z. V., Ben-Nun, T., Hoefler, T., & Leather, H. (2020). ProGraML: Graph-based Deep Learning for Program Optimization and Analysis. ArXiv:2003.10536. https://arxiv.org/abs/2003.10536
-
enumerator CPU_INFO
A JSON dictionary of properties describing the CPU.
-
enumerator IR_INSTRUCTION_COUNT
The number of LLVM-IR instructions in the current module.
-
enumerator IR_INSTRUCTION_COUNT_O0
The number of LLVM-IR instructions normalized to -O0.
-
enumerator IR_INSTRUCTION_COUNT_O3
The number of LLVM-IR instructions normalized to -O3.
-
enumerator IR_INSTRUCTION_COUNT_OZ
The number of LLVM-IR instructions normalized to -Oz.
-
enumerator OBJECT_TEXT_SIZE_BYTES
The platform-dependent size of the .text section of the lowered module.
-
enumerator OBJECT_TEXT_SIZE_O0
The platform-dependent size of the .text section of the lowered module.
-
enumerator OBJECT_TEXT_SIZE_O3
The platform-dependent size of the .text section of the lowered module.
-
enumerator OBJECT_TEXT_SIZE_OZ
The platform-dependent size of the .text section of the lowered module.
-
enumerator TEXT_SIZE_BYTES
The platform-dependent size of the .text section of the compiled binary.
-
enumerator TEXT_SIZE_O0
The platform-dependent size of the .text section of the compiled binary.
-
enumerator TEXT_SIZE_O3
The platform-dependent size of the .text section of the compiled binary.
-
enumerator TEXT_SIZE_OZ
The platform-dependent size of the .text section of the compiled binary.
-
enumerator IS_BUILDABLE
Return 1 if the benchmark is buildable, else 0.
-
enumerator IS_RUNNABLE
Return 1 if the benchmark is runnable, else 0.
-
enumerator RUNTIME
The runtime of the compiled program.
Returns a list of runtime measurements in microseconds. This is not available to all benchmarks. When not available, a list of zeros is returned.
-
enumerator BUILDTIME
The time it took to compile the program.
Returns a list of measurements in seconds. This is not available to all benchmarks. When not available, a list of zeros is returned.
-
enumerator LEXED_IR
The LLVM-lexer token IDs of the input IR.
Returns a dictionary of aligned lists (token_idx, token_kind, token_category, str_token_value), with one list element for every tokenized word in the IR.