The compiler_gym.envs.gcc module contains datasets and API extensions for the GCC Environments. See GccEnv for the class definition.

Compiler Description

class compiler_gym.envs.gcc.gcc.Gcc(bin: Union[str, pathlib.Path])[source]

This class represents an instance of the GCC compiler, either as a binary or as a Docker image.

  • bin (str) – A string version of the constructor argument.

  • spec (GccSpec) – A GccSpec instance.

__call__(*args: str, timeout: int, cwd: Optional[pathlib.Path] = None, volumes: Optional[Dict[str, Dict[str, str]]] = None) → str[source]

Run GCC with the given args.

Parameters:

  • args – The command line arguments to append.

  • timeout – A timeout in seconds.

  • cwd – The working directory.

  • volumes – A dictionary of volume bindings for Docker.

Raises:

  • TimeoutError – If GCC fails to complete within timeout.

  • ServiceError – If GCC fails.
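The timeout and failure contract described above can be sketched with the standard library alone. This is not the implementation of Gcc.__call__: run_tool is a hypothetical helper, RuntimeError stands in for ServiceError, returning the captured standard output is an assumption, and Docker volume bindings are not modelled.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional


def run_tool(bin: str, *args: str, timeout: int, cwd: Optional[Path] = None) -> str:
    """Run a binary with the given arguments and return its standard output.

    Hypothetical stand-in for a compiler wrapper: raises TimeoutError if the
    process does not finish within `timeout` seconds, and RuntimeError (in
    place of ServiceError) if it exits with a non-zero status.
    """
    try:
        proc = subprocess.run(
            [bin, *args],
            capture_output=True,
            text=True,
            timeout=timeout,
            cwd=cwd,
        )
    except subprocess.TimeoutExpired as e:
        raise TimeoutError(f"{bin} did not complete within {timeout}s") from e
    if proc.returncode != 0:
        raise RuntimeError(f"{bin} failed with status {proc.returncode}: {proc.stderr}")
    return proc.stdout


# The Python interpreter is used as the binary so the sketch runs anywhere.
print(run_tool(sys.executable, "-c", "print('ok')", timeout=30))
```

With a real compiler, bin would be the path to a GCC executable and args the flags and source files to pass to it.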

class compiler_gym.envs.gcc.gcc.GccSpec(gcc: compiler_gym.envs.gcc.gcc.Gcc, version: str, options: List[compiler_gym.envs.gcc.gcc.Option])[source]

This class combines all of the information about the version and options for a GCC instance.

gcc: compiler_gym.envs.gcc.gcc.Gcc

A compiler instance.

options: List[compiler_gym.envs.gcc.gcc.Option]

A list of options exposed by the compiler.

property size: int

Calculate the size of the option space. This is the product of the cardinalities of all the options.
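As a minimal sketch of that product, assume each option is modelled as a plain list of its possible settings. The option values below are hypothetical and chosen only for illustration; they are not read from a real GccSpec.

```python
import math

# Hypothetical option space: each inner list holds the settings of one option.
options = [
    ["-O0", "-O1", "-O2", "-O3"],                   # 4 settings
    ["-finline", "-fno-inline"],                    # 2 settings
    [f"--param=example={i}" for i in range(1, 6)],  # 5 settings
]


def option_space_size(opts) -> int:
    """Return the product of the number of settings of every option."""
    return math.prod(len(opt) for opt in opts)


size = option_space_size(options)
print(size)  # 4 * 2 * 5 = 40
```

Because the size is a product, it grows multiplicatively with each added option, which is why the option space of a real compiler is very large.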

version: str

The GCC version string.

class compiler_gym.envs.gcc.gcc.Option[source]

An Option is either a command line optimization setting or a parameter.

It is essentially a list of the possible values that the option can take. Each item is a command line parameter. In GCC, every such parameter is a single setting, so one string is enough to describe it, rather than a list.

class compiler_gym.envs.gcc.gcc.GccOOption[source]

This class represents the -O0, -O1, -O2, -O3, -Os, and -Ofast options.

This class starts with no values; they are filled in by _gcc_parse_optimize(). The suffixes to append to -O are stored in self.values.

class compiler_gym.envs.gcc.gcc.GccFlagOption(name: str, no_fno: bool = False)[source]

An ordinary -f flag.

These have two possible settings. For a given flag name there are -f<name> and -fno-<name>. If no_fno is true, then there is only the -f<name> form.
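The two forms can be enumerated with a small helper. This is an illustrative sketch, not the class's own code: flag_forms is a hypothetical function, and only its name and no_fno arguments mirror the constructor shown above.

```python
from typing import List


def flag_forms(name: str, no_fno: bool = False) -> List[str]:
    """List the command line forms of an ordinary -f flag.

    Hypothetical helper: returns the positive form, plus the negated
    -fno- form unless no_fno is set.
    """
    forms = [f"-f{name}"]
    if not no_fno:
        forms.append(f"-fno-{name}")
    return forms


print(flag_forms("inline"))
print(flag_forms("inline", no_fno=True))
```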

class compiler_gym.envs.gcc.gcc.GccFlagEnumOption(name: str, values: List[str])[source]

A flag of style -f<name>=[val1, val2, ...]. self.name holds the name. self.values holds the values.

class compiler_gym.envs.gcc.gcc.GccFlagIntOption(name: str, min: int, max: int)[source]

A flag of style -f<name>=<integer> where the integer is between min and max.
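An integer option can be treated as a sequence of settings without materializing every value. The sketch below is a hypothetical class, not the library's own; it assumes that both min and max are included in the range.

```python
from collections.abc import Sequence


class IntFlagValues(Sequence):
    """Hypothetical lazy sequence of the settings of an integer -f flag.

    Assumes the range is inclusive at both ends. Only integer indices
    are supported; slices are not handled.
    """

    def __init__(self, name: str, min: int, max: int):
        if min > max:
            raise ValueError("min must not exceed max")
        self.name = name
        self.min = min
        self.max = max

    def __len__(self) -> int:
        # Number of integers in the inclusive range [min, max].
        return self.max - self.min + 1

    def __getitem__(self, index: int) -> str:
        if not 0 <= index < len(self):
            raise IndexError(index)
        return f"-f{self.name}={self.min + index}"


values = IntFlagValues("example-limit", 1, 3)
print(len(values), list(values))
```

Storing only the bounds keeps the memory cost constant even when the range spans many values.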

class compiler_gym.envs.gcc.gcc.GccFlagAlignOption(name: str)[source]

Alignment flags. These take several forms. See the GCC documentation.

class compiler_gym.envs.gcc.gcc.GccParamEnumOption(name: str, values: List[str])[source]

A parameter --param=<name>=[val1, val2, val3].

class compiler_gym.envs.gcc.gcc.GccParamIntOption(name: str, min: int, max: int)[source]

A parameter --param=<name>=<integer>, where the integer is between min and max.


compiler_gym.envs.gcc.datasets.get_gcc_datasets(gcc_bin: Union[str, pathlib.Path], site_data_base: Optional[pathlib.Path] = None) → List[compiler_gym.datasets.dataset.Dataset][source]

Instantiate the builtin GCC datasets.

Parameters:

  • gcc_bin – The GCC binary to use.

  • site_data_base – The root of the site data path.

Returns:

An iterable sequence of Dataset instances.

class compiler_gym.envs.gcc.datasets.AnghaBenchDataset(site_data_base: pathlib.Path, sort_order: int = 0, manifest_url: Optional[str] = None, manifest_sha256: Optional[str] = None, deprecated: Optional[str] = None, name: Optional[str] = None)[source]

A dataset of C programs curated from GitHub source code.

The dataset is from:

da Silva, Anderson Faustino, Bruno Conde Kind, José Wesley de Souza Magalhaes, Jerônimo Nunes Rocha, Breno Campos Ferreira Guimaraes, and Fernando Magno Quinão Pereira. “ANGHABENCH: A Suite with One Million Compilable C Benchmarks for Code-Size Reduction.” In 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 378-390. IEEE, 2021.

And is available at:

class compiler_gym.envs.gcc.datasets.CHStoneDataset(gcc_bin: pathlib.Path, site_data_base: pathlib.Path, sort_order: int = 0)[source]

A dataset of C programs from the CHStone benchmark suite for C-based high-level synthesis.

The dataset is from:

Hara, Yuko, Hiroyuki Tomiyama, Shinya Honda, Hiroaki Takada, and Katsuya Ishii. “Chstone: A benchmark program suite for practical c-based high-level synthesis.” In 2008 IEEE International Symposium on Circuits and Systems, pp. 1192-1195. IEEE, 2008.

And is available at:

class compiler_gym.envs.gcc.datasets.CsmithDataset(gcc_bin: Union[pathlib.Path, str], site_data_base: pathlib.Path, sort_order: int = 0, csmith_bin: Optional[pathlib.Path] = None, csmith_includes: Optional[pathlib.Path] = None)[source]

A dataset which uses Csmith to generate programs.

Csmith is a tool that can generate random conformant C99 programs. It is described in the publication:

Yang, Xuejun, Yang Chen, Eric Eide, and John Regehr. “Finding and understanding bugs in C compilers.” In Proceedings of the 32nd ACM SIGPLAN conference on Programming Language Design and Implementation (PLDI), pp. 283-294. 2011.

For up-to-date information about Csmith, see:

Note that Csmith is a tool that is used to find errors in compilers. As such, there is a higher likelihood that the benchmark cannot be used for an environment and that env.reset() will raise BenchmarkInitError.
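One way to cope with this is to try several benchmarks until one initializes. The sketch below is self-contained and makes assumptions: FakeInitError stands in for BenchmarkInitError, FakeEnv stands in for a real environment, and reset_with_retry is a hypothetical helper. With CompilerGym installed, the real exception class would be passed as init_error.

```python
from itertools import islice


class FakeInitError(Exception):
    """Stand-in for BenchmarkInitError so the sketch runs without CompilerGym."""


class FakeEnv:
    """Stand-in environment: odd-numbered benchmarks fail to initialize."""

    def reset(self, benchmark):
        if benchmark % 2:
            raise FakeInitError(f"benchmark {benchmark} is unusable")
        return f"observation for {benchmark}"


def reset_with_retry(env, benchmarks, init_error=FakeInitError, max_attempts=5):
    """Reset env on the first benchmark that initializes successfully.

    Tries at most max_attempts benchmarks and raises RuntimeError if none
    of them can be used.
    """
    last_error = None
    for benchmark in islice(benchmarks, max_attempts):
        try:
            return env.reset(benchmark=benchmark)
        except init_error as e:
            last_error = e  # Remember the failure and try the next benchmark.
    raise RuntimeError(f"no usable benchmark in {max_attempts} attempts") from last_error


print(reset_with_retry(FakeEnv(), [1, 3, 4]))
```

Bounding the number of attempts avoids looping forever when the benchmark source is an unbounded generator.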