SIGN IN SIGN UP
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
//! Compilation of all Zig source code is represented by one `Module`.
//! Each `Compilation` has exactly one or zero `Module`, depending on whether
//! there is or is not any zig source code, respectively.
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
const std = @import("std");
const mem = std.mem;
const Allocator = std.mem.Allocator;
const ArrayListUnmanaged = std.ArrayListUnmanaged;
const assert = std.debug.assert;
const log = std.log.scoped(.module);
const BigIntConst = std.math.big.int.Const;
const BigIntMutable = std.math.big.int.Mutable;
const Target = std.Target;
const Ast = std.zig.Ast;
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
const Module = @This();
const Compilation = @import("Compilation.zig");
const Cache = @import("Cache.zig");
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
const Value = @import("value.zig").Value;
const Type = @import("type.zig").Type;
const TypedValue = @import("TypedValue.zig");
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
const Package = @import("Package.zig");
const link = @import("link.zig");
const Air = @import("Air.zig");
const Zir = @import("Zir.zig");
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
const trace = @import("tracy.zig").trace;
const AstGen = @import("AstGen.zig");
const Sema = @import("Sema.zig");
const target_util = @import("target.zig");
const build_options = @import("build_options");
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
/// General-purpose allocator. Used for both temporary and long-term storage.
gpa: *Allocator,
comp: *Compilation,
/// Where our incremental compilation metadata serialization will go.
zig_cache_artifact_directory: Compilation.Directory,
/// Pointer to externally managed resource.
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
root_pkg: *Package,
/// Normally, `main_pkg` and `root_pkg` are the same. The exception is `zig test`, in which
/// `root_pkg` is the test runner, and `main_pkg` is the user's source file which has the tests.
main_pkg: *Package,
/// Used by AstGen worker to load and store ZIR cache.
global_zir_cache: Compilation.Directory,
/// Used by AstGen worker to load and store ZIR cache.
local_zir_cache: Compilation.Directory,
/// It's rare for a decl to be exported, so we save memory by having a sparse
/// map of Decl pointers to details about them being exported.
/// The Export memory is owned by the `export_owners` table; the slice itself
/// is owned by this table. The slice is guaranteed to not be empty.
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
decl_exports: std.AutoArrayHashMapUnmanaged(*Decl, []*Export) = .{},
/// This models the Decls that perform exports, so that `decl_exports` can be updated when a Decl
/// is modified. Note that the key of this table is not the Decl being exported, but the Decl that
/// is performing the export of another Decl.
/// This table owns the Export memory.
export_owners: std.AutoArrayHashMapUnmanaged(*Decl, []*Export) = .{},
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
/// The set of all the files in the Module. We keep track of this in order to iterate
/// over it and check which source files have been modified on the file system when
/// an update is requested, as well as to cache `@import` results.
/// Keys are fully resolved file paths. This table owns the keys and values.
import_table: std.StringArrayHashMapUnmanaged(*File) = .{},
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
/// The set of all the generic function instantiations. This is used so that when a generic
/// function is called twice with the same comptime parameter arguments, both calls dispatch
/// to the same function.
/// TODO: remove functions from this set when they are destroyed.
monomorphed_funcs: MonomorphedFuncsSet = .{},
/// The set of all comptime function calls that have been cached so that future calls
/// with the same parameters will get the same return value.
memoized_calls: MemoizedCallSet = .{},
/// Contains the values from `@setAlignStack`. A sparse table is used here
/// instead of a field of `Fn` because usage of `@setAlignStack` is rare, while
/// functions are many.
/// TODO: remove functions from this set when they are destroyed.
align_stack_fns: std.AutoHashMapUnmanaged(*const Fn, SetAlignStack) = .{},
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
/// We optimize memory usage for a compilation with no compile errors by storing the
/// error messages and mapping outside of `Decl`.
/// The ErrorMsg memory is owned by the decl, using Module's general purpose allocator.
/// Note that a Decl can succeed but the Fn it represents can fail. In this case,
/// a Decl can have a failed_decls entry but have analysis status of success.
stage2: implement error notes and regress -femit-zir * Implement error notes - note: other symbol exported here - note: previous else prong is here - note: previous '_' prong is here * Add Compilation.CObject.ErrorMsg. This object properly converts to AllErrors.Message when the time comes. * Add Compilation.CObject.failure_retryable. Properly handles out-of-memory and other transient failures. * Introduce Module.SrcLoc which has not only a byte offset but also references the file which the byte offset applies to. * Scope.Block now contains both a pointer to the "owner" Decl and the "source" Decl. As an example, during inline function call, the "owner" will be the Decl of the caller and the "source" will be the Decl of the callee. * Module.ErrorMsg now sports a `file_scope` field so that notes can refer to source locations in a file other than the parent error message. * Some instances where a `*Scope` was stored, now store a `*Scope.Container`. * Some methods in the `Scope` namespace were moved to the more specific type, since there was only an implementation for one particular tag. - `removeDecl` moved to `Scope.Container` - `destroy` moved to `Scope.File` * Two kinds of Scope deleted: - zir_module - decl * astgen: properly use DeclVal / DeclRef. DeclVal was incorrectly changed to be a reference; this commit fixes it. Fewer ZIR instructions processed as a result. - declval_in_module is renamed to declval - previous declval ZIR instruction is deleted; it was only for .zir files. * Test harness: friendlier diagnostics when an unexpected set of errors is encountered. * zir_sema: fix analyzeInstBlockFlat by properly calling resolvingInst on the last zir instruction in the block. Compile log implementation: * Write to a buffer rather than directly to stderr. * Only keep track of 1 callsite per Decl. * No longer mutate the ZIR Inst struct data. * "Compile log statement found" errors are only emitted when there are no other compile errors. -femit-zir and support for .zir source files is regressed. If we wanted to support this again, outputting .zir would need to be done as yet another backend rather than in the haphazard way it was previously implemented. For parsing .zir, it was implemented previously in a way that was not helpful for debugging. We need tighter integration with the test harness for it to be useful; so clearly a rewrite is needed. Given that a rewrite is needed, and it was getting in the way of progress and organization of the rest of stage2, I regressed the feature.
2021-01-16 22:51:01 -07:00
failed_decls: std.AutoArrayHashMapUnmanaged(*Decl, *ErrorMsg) = .{},
/// Keep track of one `@compileLog` callsite per owner Decl.
/// The value is the AST node index offset from the Decl.
compile_log_decls: std.AutoArrayHashMapUnmanaged(*Decl, i32) = .{},
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
/// Using a map here for consistency with the other fields here.
/// The ErrorMsg memory is owned by the `File`, using Module's general purpose allocator.
failed_files: std.AutoArrayHashMapUnmanaged(*File, ?*ErrorMsg) = .{},
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
/// Using a map here for consistency with the other fields here.
/// The ErrorMsg memory is owned by the `Export`, using Module's general purpose allocator.
stage2: implement error notes and regress -femit-zir * Implement error notes - note: other symbol exported here - note: previous else prong is here - note: previous '_' prong is here * Add Compilation.CObject.ErrorMsg. This object properly converts to AllErrors.Message when the time comes. * Add Compilation.CObject.failure_retryable. Properly handles out-of-memory and other transient failures. * Introduce Module.SrcLoc which has not only a byte offset but also references the file which the byte offset applies to. * Scope.Block now contains both a pointer to the "owner" Decl and the "source" Decl. As an example, during inline function call, the "owner" will be the Decl of the caller and the "source" will be the Decl of the callee. * Module.ErrorMsg now sports a `file_scope` field so that notes can refer to source locations in a file other than the parent error message. * Some instances where a `*Scope` was stored, now store a `*Scope.Container`. * Some methods in the `Scope` namespace were moved to the more specific type, since there was only an implementation for one particular tag. - `removeDecl` moved to `Scope.Container` - `destroy` moved to `Scope.File` * Two kinds of Scope deleted: - zir_module - decl * astgen: properly use DeclVal / DeclRef. DeclVal was incorrectly changed to be a reference; this commit fixes it. Fewer ZIR instructions processed as a result. - declval_in_module is renamed to declval - previous declval ZIR instruction is deleted; it was only for .zir files. * Test harness: friendlier diagnostics when an unexpected set of errors is encountered. * zir_sema: fix analyzeInstBlockFlat by properly calling resolvingInst on the last zir instruction in the block. Compile log implementation: * Write to a buffer rather than directly to stderr. * Only keep track of 1 callsite per Decl. * No longer mutate the ZIR Inst struct data. * "Compile log statement found" errors are only emitted when there are no other compile errors. -femit-zir and support for .zir source files is regressed. If we wanted to support this again, outputting .zir would need to be done as yet another backend rather than in the haphazard way it was previously implemented. For parsing .zir, it was implemented previously in a way that was not helpful for debugging. We need tighter integration with the test harness for it to be useful; so clearly a rewrite is needed. Given that a rewrite is needed, and it was getting in the way of progress and organization of the rest of stage2, I regressed the feature.
2021-01-16 22:51:01 -07:00
failed_exports: std.AutoArrayHashMapUnmanaged(*Export, *ErrorMsg) = .{},
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
stage2: type declarations ZIR encode AnonNameStrategy which can be either parent, func, or anon. Here's the enum reproduced in the commit message for convenience: ```zig pub const NameStrategy = enum(u2) { /// Use the same name as the parent declaration name. /// e.g. `const Foo = struct {...};`. parent, /// Use the name of the currently executing comptime function call, /// with the current parameters. e.g. `ArrayList(i32)`. func, /// Create an anonymous name for this declaration. /// Like this: "ParentDeclName_struct_69" anon, }; ``` With this information in the ZIR, a future commit can improve the names of structs, unions, enums, and opaques. In order to accomplish this, the following ZIR instruction forms were removed and replaced with Extended op codes: * struct_decl * struct_decl_packed * struct_decl_extern * union_decl * union_decl_packed * union_decl_extern * enum_decl * enum_decl_nonexhaustive By being extended opcodes, one more u32 is needed, however we more than make up for it by repurposing the 16 "small" bits to provide shorter encodings for when decls_len == 0, fields_len == 0, a source node is not provided, etc. There tends to be no downside, and in fact sometimes upsides, to using an extended op code when there is a need for flag bits, which is the case for all three of these. Likewise, the container layout can be encoded in these bits rather than into the opcode. The following 4 ZIR instructions were added, netting a total of 4 freed up ZIR enum tags for future use: * opaque_decl_anon * opaque_decl_func * error_set_decl_anon * error_set_decl_func This is so that opaques and error sets can have the same name hint as structs, enums, and unions. `std.builtin.ContainerLayout` gets an explicit integer tag type so that it can be used inside packed structs. This commit also makes `Module.Namespace` use a separate set for anonymous decls, thus allowing anonymous decls to share the same `Decl.name` as their owner `Decl` objects.
2021-05-10 21:34:43 -07:00
next_anon_name_index: usize = 0,
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
/// Candidates for deletion. After a semantic analysis update completes, this list
/// contains Decls that need to be deleted if they end up having no references to them.
deletion_set: std.AutoArrayHashMapUnmanaged(*Decl, void) = .{},
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
/// Error tags and their values, tag names are duped with mod.gpa.
2021-03-28 19:38:19 -07:00
/// Corresponds with `error_name_list`.
global_error_set: std.StringHashMapUnmanaged(ErrorInt) = .{},
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
2021-03-28 19:38:19 -07:00
/// ErrorInt -> []const u8 for fast lookups for @intToError at comptime
/// Corresponds with `global_error_set`.
error_name_list: ArrayListUnmanaged([]const u8) = .{},
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
/// Incrementing integer used to compare against the corresponding Decl
/// field to determine whether a Decl's status applies to an ongoing update, or a
/// previous analysis.
generation: u32 = 0,
stage1_flags: packed struct {
have_winmain: bool = false,
have_wwinmain: bool = false,
have_winmain_crt_startup: bool = false,
have_wwinmain_crt_startup: bool = false,
have_dllmain_crt_startup: bool = false,
have_c_main: bool = false,
reserved: u2 = 0,
} = .{},
job_queued_update_builtin_zig: bool = true,
compile_log_text: ArrayListUnmanaged(u8) = .{},
stage2: implement error notes and regress -femit-zir * Implement error notes - note: other symbol exported here - note: previous else prong is here - note: previous '_' prong is here * Add Compilation.CObject.ErrorMsg. This object properly converts to AllErrors.Message when the time comes. * Add Compilation.CObject.failure_retryable. Properly handles out-of-memory and other transient failures. * Introduce Module.SrcLoc which has not only a byte offset but also references the file which the byte offset applies to. * Scope.Block now contains both a pointer to the "owner" Decl and the "source" Decl. As an example, during inline function call, the "owner" will be the Decl of the caller and the "source" will be the Decl of the callee. * Module.ErrorMsg now sports a `file_scope` field so that notes can refer to source locations in a file other than the parent error message. * Some instances where a `*Scope` was stored, now store a `*Scope.Container`. * Some methods in the `Scope` namespace were moved to the more specific type, since there was only an implementation for one particular tag. - `removeDecl` moved to `Scope.Container` - `destroy` moved to `Scope.File` * Two kinds of Scope deleted: - zir_module - decl * astgen: properly use DeclVal / DeclRef. DeclVal was incorrectly changed to be a reference; this commit fixes it. Fewer ZIR instructions processed as a result. - declval_in_module is renamed to declval - previous declval ZIR instruction is deleted; it was only for .zir files. * Test harness: friendlier diagnostics when an unexpected set of errors is encountered. * zir_sema: fix analyzeInstBlockFlat by properly calling resolvingInst on the last zir instruction in the block. Compile log implementation: * Write to a buffer rather than directly to stderr. * Only keep track of 1 callsite per Decl. * No longer mutate the ZIR Inst struct data. * "Compile log statement found" errors are only emitted when there are no other compile errors. -femit-zir and support for .zir source files is regressed. If we wanted to support this again, outputting .zir would need to be done as yet another backend rather than in the haphazard way it was previously implemented. For parsing .zir, it was implemented previously in a way that was not helpful for debugging. We need tighter integration with the test harness for it to be useful; so clearly a rewrite is needed. Given that a rewrite is needed, and it was getting in the way of progress and organization of the rest of stage2, I regressed the feature.
2021-01-16 22:51:01 -07:00
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
emit_h: ?*GlobalEmitH,
test_functions: std.AutoArrayHashMapUnmanaged(*Decl, void) = .{},
const MonomorphedFuncsSet = std.HashMapUnmanaged(
*Fn,
void,
MonomorphedFuncsContext,
std.hash_map.default_max_load_percentage,
);
const MonomorphedFuncsContext = struct {
pub fn eql(ctx: @This(), a: *Fn, b: *Fn) bool {
_ = ctx;
return a == b;
}
/// Must match `Sema.GenericCallAdapter.hash`.
pub fn hash(ctx: @This(), key: *Fn) u64 {
_ = ctx;
var hasher = std.hash.Wyhash.init(0);
// The generic function Decl is guaranteed to be the first dependency
// of each of its instantiations.
const generic_owner_decl = key.owner_decl.dependencies.keys()[0];
const generic_func = generic_owner_decl.val.castTag(.function).?.data;
std.hash.autoHash(&hasher, @ptrToInt(generic_func));
// This logic must be kept in sync with the logic in `analyzeCall` that
// computes the hash.
const comptime_args = key.comptime_args.?;
const generic_ty_info = generic_owner_decl.ty.fnInfo();
for (generic_ty_info.param_types) |param_ty, i| {
if (generic_ty_info.paramIsComptime(i) and param_ty.tag() != .generic_poison) {
comptime_args[i].val.hash(param_ty, &hasher);
}
}
return hasher.final();
}
};
pub const MemoizedCallSet = std.HashMapUnmanaged(
MemoizedCall.Key,
MemoizedCall.Result,
MemoizedCall,
std.hash_map.default_max_load_percentage,
);
pub const MemoizedCall = struct {
pub const Key = struct {
func: *Fn,
args: []TypedValue,
};
pub const Result = struct {
val: Value,
arena: std.heap.ArenaAllocator.State,
};
pub fn eql(ctx: @This(), a: Key, b: Key) bool {
_ = ctx;
if (a.func != b.func) return false;
assert(a.args.len == b.args.len);
for (a.args) |a_arg, arg_i| {
const b_arg = b.args[arg_i];
if (!a_arg.eql(b_arg)) {
return false;
}
}
return true;
}
/// Must match `Sema.GenericCallAdapter.hash`.
pub fn hash(ctx: @This(), key: Key) u64 {
_ = ctx;
var hasher = std.hash.Wyhash.init(0);
// The generic function Decl is guaranteed to be the first dependency
// of each of its instantiations.
std.hash.autoHash(&hasher, @ptrToInt(key.func));
// This logic must be kept in sync with the logic in `analyzeCall` that
// computes the hash.
for (key.args) |arg| {
arg.hash(&hasher);
}
return hasher.final();
}
};
pub const SetAlignStack = struct {
alignment: u32,
/// TODO: This needs to store a non-lazy source location for the case of an inline function
/// which does `@setAlignStack` (applying it to the caller).
src: LazySrcLoc,
};
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
/// A `Module` has zero or one of these depending on whether `-femit-h` is enabled.
pub const GlobalEmitH = struct {
/// Where to put the output.
loc: Compilation.EmitLoc,
/// When emit_h is non-null, each Decl gets one more compile error slot for
/// emit-h failing for that Decl. This table is also how we tell if a Decl has
/// failed emit-h or succeeded.
failed_decls: std.AutoArrayHashMapUnmanaged(*Decl, *ErrorMsg) = .{},
/// Tracks all decls in order to iterate over them and emit .h code for them.
decl_table: std.AutoArrayHashMapUnmanaged(*Decl, void) = .{},
};
2021-03-28 19:38:19 -07:00
pub const ErrorInt = u32;
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
pub const Export = struct {
options: std.builtin.ExportOptions,
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
src: LazySrcLoc,
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
/// Represents the position of the export, if any, in the output file.
link: link.File.Export,
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
/// The Decl that performs the export. Note that this is *not* the Decl being exported.
owner_decl: *Decl,
/// The Decl containing the export statement. Inline function calls
/// may cause this to be different from the owner_decl.
src_decl: *Decl,
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
/// The Decl being exported. Note this is *not* the Decl performing the export.
exported_decl: *Decl,
status: enum {
in_progress,
failed,
/// Indicates that the failure was due to a temporary issue, such as an I/O error
/// when writing to the output file. Retrying the export may succeed.
failed_retryable,
complete,
},
pub fn getSrcLoc(exp: Export) SrcLoc {
return .{
2021-10-01 12:05:12 -05:00
.file_scope = exp.src_decl.getFileScope(),
.parent_decl_node = exp.src_decl.src_node,
.lazy = exp.src,
};
}
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
};
/// When Module emit_h field is non-null, each Decl is allocated via this struct, so that
/// there can be EmitH state attached to each Decl.
pub const DeclPlusEmitH = struct {
decl: Decl,
emit_h: EmitH,
};
pub const CaptureScope = struct {
parent: ?*CaptureScope,
/// Values from this decl's evaluation that will be closed over in
/// child decls. Values stored in the value_arena of the linked decl.
/// During sema, this map is backed by the gpa. Once sema completes,
/// it is reallocated using the value_arena.
captures: std.AutoHashMapUnmanaged(Zir.Inst.Index, TypedValue) = .{},
};
pub const WipCaptureScope = struct {
scope: *CaptureScope,
finalized: bool,
gpa: *Allocator,
perm_arena: *Allocator,
pub fn init(gpa: *Allocator, perm_arena: *Allocator, parent: ?*CaptureScope) !@This() {
const scope = try perm_arena.create(CaptureScope);
scope.* = .{ .parent = parent };
return @This(){
.scope = scope,
.finalized = false,
.gpa = gpa,
.perm_arena = perm_arena,
};
}
pub fn finalize(noalias self: *@This()) !void {
assert(!self.finalized);
// use a temp to avoid unintentional aliasing due to RLS
const tmp = try self.scope.captures.clone(self.perm_arena);
self.scope.captures = tmp;
self.finalized = true;
}
pub fn reset(noalias self: *@This(), parent: ?*CaptureScope) !void {
if (!self.finalized) try self.finalize();
self.scope = try self.perm_arena.create(CaptureScope);
self.scope.* = .{ .parent = parent };
self.finalized = false;
}
pub fn deinit(noalias self: *@This()) void {
if (!self.finalized) {
self.scope.captures.deinit(self.gpa);
}
self.* = undefined;
}
};
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
pub const Decl = struct {
2021-05-07 18:52:11 -07:00
/// Allocated with Module's allocator; outlives the ZIR code.
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
name: [*:0]const u8,
/// The most recent Type of the Decl after a successful semantic analysis.
/// Populated when `has_tv`.
ty: Type,
/// The most recent Value of the Decl after a successful semantic analysis.
/// Populated when `has_tv`.
val: Value,
/// Populated when `has_tv`.
align_val: Value,
/// Populated when `has_tv`.
linksection_val: Value,
2021-09-02 14:46:31 +02:00
/// Populated when `has_tv`.
@"addrspace": std.builtin.AddressSpace,
/// The memory for ty, val, align_val, linksection_val, and captures.
/// If this is `null` then there is no memory management needed.
value_arena: ?*std.heap.ArenaAllocator.State = null,
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
/// The direct parent namespace of the Decl.
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
/// Reference to externally owned memory.
/// In the case of the Decl corresponding to a file, this is
/// the namespace of the struct, since there is no parent.
src_namespace: *Namespace,
/// The scope which lexically contains this decl. A decl must depend
/// on its lexical parent, in order to ensure that this pointer is valid.
/// This scope is allocated out of the arena of the parent decl.
src_scope: ?*CaptureScope,
/// An integer that can be checked against the corresponding incrementing
/// generation field of Module. This is used to determine whether `complete` status
/// represents pre- or post- re-analysis.
generation: u32,
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
/// The AST node index of this declaration.
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
/// Must be recomputed when the corresponding source file is modified.
src_node: Ast.Node.Index,
/// Line number corresponding to `src_node`. Stored separately so that source files
/// do not need to be loaded into memory in order to compute debug line numbers.
src_line: u32,
/// Index to ZIR `extra` array to the entry in the parent's decl structure
/// (the part that says "for every decls_len"). The first item at this index is
/// the contents hash, followed by line, name, etc.
/// For anonymous decls and also the root Decl for a File, this is 0.
zir_decl_index: Zir.Inst.Index,
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
/// Represents the "shallow" analysis status. For example, for decls that are functions,
/// the function type is analyzed with this set to `in_progress`, however, the semantic
/// analysis of the function body is performed with this value set to `success`. Functions
/// have their own analysis status field.
analysis: enum {
/// This Decl corresponds to an AST Node that has not been referenced yet, and therefore
/// because of Zig's lazy declaration analysis, it will remain unanalyzed until referenced.
unreferenced,
/// Semantic analysis for this Decl is running right now.
/// This state detects dependency loops.
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
in_progress,
/// The file corresponding to this Decl had a parse error or ZIR error.
/// There will be a corresponding ErrorMsg in Module.failed_files.
file_failure,
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
/// This Decl might be OK but it depends on another one which did not successfully complete
/// semantic analysis.
dependency_failure,
/// Semantic analysis failure.
/// There will be a corresponding ErrorMsg in Module.failed_decls.
sema_failure,
/// There will be a corresponding ErrorMsg in Module.failed_decls.
/// This indicates the failure was something like running out of disk space,
/// and attempting semantic analysis again may succeed.
sema_failure_retryable,
/// There will be a corresponding ErrorMsg in Module.failed_decls.
codegen_failure,
/// There will be a corresponding ErrorMsg in Module.failed_decls.
/// This indicates the failure was something like running out of disk space,
/// and attempting codegen again may succeed.
codegen_failure_retryable,
/// Everything is done. During an update, this Decl may be out of date, depending
/// on its dependencies. The `generation` field can be used to determine if this
/// completion status occurred before or after a given update.
complete,
/// A Module update is in progress, and this Decl has been flagged as being known
/// to require re-analysis.
outdated,
},
/// Whether `typed_value`, `align_val`, `linksection_val` and `addrspace` are populated.
has_tv: bool,
/// If `true` it means the `Decl` is the resource owner of the type/value associated
/// with it. That means when `Decl` is destroyed, the cleanup code should additionally
/// check if the value owns a `Namespace`, and destroy that too.
owns_tv: bool,
/// This flag is set when this Decl is added to `Module.deletion_set`, and cleared
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
/// when removed.
deletion_flag: bool,
/// Whether the corresponding AST decl has a `pub` keyword.
is_pub: bool,
/// Whether the corresponding AST decl has a `export` keyword.
is_exported: bool,
/// Whether the ZIR code provides an align instruction.
has_align: bool,
2021-09-02 14:46:31 +02:00
/// Whether the ZIR code provides a linksection and address space instruction.
has_linksection_or_addrspace: bool,
stage2: garbage collect unused anon decls After this change, the frontend and backend cooperate to keep track of which Decls are actually emitted into the machine code. When any backend sees a `decl_ref` Value, it must mark the corresponding Decl `alive` field to true. This prevents unused comptime data from spilling into the output object files. For example, if you do an `inline for` loop, previously, any intermediate value calculations would have gone into the object file. Now they are garbage collected immediately after the owner Decl has its machine code generated. In the frontend, when it is time to send a Decl to the linker, if it has not been marked "alive" then it is deleted instead. Additional improvements: * Resolve type ABI layouts after successful semantic analysis of a Decl. This is needed so that the backend has access to struct fields. * Sema: fix incorrect logic in resolveMaybeUndefVal. It should return "not comptime known" instead of a compile error for global variables. * `Value.pointerDeref` now returns `null` in the case that the pointer deref cannot happen at compile-time. This is true for global variables, for example. Another example is if a comptime known pointer has a hard coded address value. * Binary arithmetic sets the requireRuntimeBlock source location to the lhs_src or rhs_src as appropriate instead of on the operator node. * Fix LLVM codegen for slice_elem_val which had the wrong logic for when the operand was not a pointer. As noted in the comment in the implementation of deleteUnusedDecl, a future improvement will be to rework the frontend/linker interface to remove the frontend's responsibility of calling allocateDeclIndexes. I discovered some issues with the plan9 linker backend that are related to this, and worked around them for now.
2021-07-29 19:30:37 -07:00
/// Flag used by garbage collection to mark and sweep.
/// Decls which correspond to an AST node always have this field set to `true`.
/// Anonymous Decls are initialized with this field set to `false` and then it
/// is the responsibility of machine code backends to mark it `true` whenever
/// a `decl_ref` Value is encountered that points to this Decl.
/// When the `codegen_decl` job is encountered in the main work queue, if the
/// Decl is marked alive, then it sends the Decl to the linker. Otherwise it
/// deletes the Decl on the spot.
alive: bool,
/// Whether the Decl is a `usingnamespace` declaration.
is_usingnamespace: bool,
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
/// Represents the position of the code in the output file.
/// This is populated regardless of semantic analysis and code generation.
link: link.File.LinkBlock,
/// Represents the function in the linked output file, if the `Decl` is a function.
/// This is stored here and not in `Fn` because `Decl` survives across updates but
/// `Fn` does not.
/// TODO Look into making `Fn` a longer lived structure and moving this field there
/// to save on memory usage.
fn_link: link.File.LinkFn,
/// The shallow set of other decls whose typed_value could possibly change if this Decl's
/// typed_value is modified.
dependants: DepsTable = .{},
/// The shallow set of other decls whose typed_value changing indicates that this Decl's
/// typed_value may need to be regenerated.
dependencies: DepsTable = .{},
pub const DepsTable = std.AutoArrayHashMapUnmanaged(*Decl, void);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
pub fn clearName(decl: *Decl, gpa: *Allocator) void {
2021-05-07 18:52:11 -07:00
gpa.free(mem.spanZ(decl.name));
decl.name = undefined;
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
pub fn destroy(decl: *Decl, module: *Module) void {
const gpa = module.gpa;
2021-05-07 16:05:44 -07:00
log.debug("destroy {*} ({s})", .{ decl, decl.name });
_ = module.test_functions.swapRemove(decl);
if (decl.deletion_flag) {
assert(module.deletion_set.swapRemove(decl));
}
if (decl.has_tv) {
2021-05-07 18:52:11 -07:00
if (decl.getInnerNamespace()) |namespace| {
namespace.destroyDecls(module);
}
decl.clearValues(gpa);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
decl.dependants.deinit(gpa);
decl.dependencies.deinit(gpa);
decl.clearName(gpa);
if (module.emit_h != null) {
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
const decl_plus_emit_h = @fieldParentPtr(DeclPlusEmitH, "decl", decl);
decl_plus_emit_h.emit_h.fwd_decl.deinit(gpa);
gpa.destroy(decl_plus_emit_h);
} else {
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
gpa.destroy(decl);
}
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
pub fn clearValues(decl: *Decl, gpa: *Allocator) void {
if (decl.getFunction()) |func| {
func.deinit(gpa);
gpa.destroy(func);
}
2021-05-07 18:52:11 -07:00
if (decl.getVariable()) |variable| {
gpa.destroy(variable);
}
if (decl.value_arena) |arena_state| {
arena_state.promote(gpa).deinit();
decl.value_arena = null;
decl.has_tv = false;
decl.owns_tv = false;
}
}
pub fn finalizeNewArena(decl: *Decl, arena: *std.heap.ArenaAllocator) !void {
assert(decl.value_arena == null);
const arena_state = try arena.allocator.create(std.heap.ArenaAllocator.State);
arena_state.* = arena.state;
decl.value_arena = arena_state;
}
/// This name is relative to the containing namespace of the decl.
/// The memory is owned by the containing File ZIR.
pub fn getName(decl: Decl) ?[:0]const u8 {
2021-10-01 12:05:12 -05:00
const zir = decl.getFileScope().zir;
return decl.getNameZir(zir);
}
pub fn getNameZir(decl: Decl, zir: Zir) ?[:0]const u8 {
assert(decl.zir_decl_index != 0);
const name_index = zir.extra[decl.zir_decl_index + 5];
if (name_index <= 1) return null;
return zir.nullTerminatedString(name_index);
}
pub fn contentsHash(decl: Decl) std.zig.SrcHash {
2021-10-01 12:05:12 -05:00
const zir = decl.getFileScope().zir;
return decl.contentsHashZir(zir);
}
pub fn contentsHashZir(decl: Decl, zir: Zir) std.zig.SrcHash {
assert(decl.zir_decl_index != 0);
const hash_u32s = zir.extra[decl.zir_decl_index..][0..4];
const contents_hash = @bitCast(std.zig.SrcHash, hash_u32s.*);
return contents_hash;
}
pub fn zirBlockIndex(decl: *const Decl) Zir.Inst.Index {
assert(decl.zir_decl_index != 0);
2021-10-01 12:05:12 -05:00
const zir = decl.getFileScope().zir;
return zir.extra[decl.zir_decl_index + 6];
}
pub fn zirAlignRef(decl: Decl) Zir.Inst.Ref {
if (!decl.has_align) return .none;
assert(decl.zir_decl_index != 0);
2021-10-01 12:05:12 -05:00
const zir = decl.getFileScope().zir;
2021-09-02 14:46:31 +02:00
return @intToEnum(Zir.Inst.Ref, zir.extra[decl.zir_decl_index + 7]);
}
pub fn zirLinksectionRef(decl: Decl) Zir.Inst.Ref {
2021-09-02 14:46:31 +02:00
if (!decl.has_linksection_or_addrspace) return .none;
assert(decl.zir_decl_index != 0);
2021-10-01 12:05:12 -05:00
const zir = decl.getFileScope().zir;
2021-09-02 14:46:31 +02:00
const extra_index = decl.zir_decl_index + 7 + @boolToInt(decl.has_align);
return @intToEnum(Zir.Inst.Ref, zir.extra[extra_index]);
}
pub fn zirAddrspaceRef(decl: Decl) Zir.Inst.Ref {
if (!decl.has_linksection_or_addrspace) return .none;
assert(decl.zir_decl_index != 0);
2021-10-01 12:05:12 -05:00
const zir = decl.getFileScope().zir;
2021-09-02 14:46:31 +02:00
const extra_index = decl.zir_decl_index + 7 + @boolToInt(decl.has_align) + 1;
return @intToEnum(Zir.Inst.Ref, zir.extra[extra_index]);
}
/// Returns true if and only if the Decl is the top level struct associated with a File.
pub fn isRoot(decl: *const Decl) bool {
2021-10-01 12:05:12 -05:00
if (decl.src_namespace.parent != null)
return false;
2021-10-01 12:05:12 -05:00
return decl == decl.src_namespace.getDecl();
}
pub fn relativeToLine(decl: Decl, offset: u32) u32 {
return decl.src_line + offset;
}
pub fn relativeToNodeIndex(decl: Decl, offset: i32) Ast.Node.Index {
return @bitCast(Ast.Node.Index, offset + @bitCast(i32, decl.src_node));
2021-03-20 17:09:06 -07:00
}
pub fn nodeIndexToRelative(decl: Decl, node_index: Ast.Node.Index) i32 {
return @bitCast(i32, node_index) - @bitCast(i32, decl.src_node);
2021-03-20 17:09:06 -07:00
}
pub fn tokSrcLoc(decl: Decl, token_index: Ast.TokenIndex) LazySrcLoc {
return .{ .token_offset = token_index - decl.srcToken() };
}
pub fn nodeSrcLoc(decl: Decl, node_index: Ast.Node.Index) LazySrcLoc {
2021-03-20 17:09:06 -07:00
return .{ .node_offset = decl.nodeIndexToRelative(node_index) };
}
pub fn srcLoc(decl: Decl) SrcLoc {
return decl.nodeOffsetSrcLoc(0);
}
pub fn nodeOffsetSrcLoc(decl: Decl, node_offset: i32) SrcLoc {
stage2: implement error notes and regress -femit-zir * Implement error notes - note: other symbol exported here - note: previous else prong is here - note: previous '_' prong is here * Add Compilation.CObject.ErrorMsg. This object properly converts to AllErrors.Message when the time comes. * Add Compilation.CObject.failure_retryable. Properly handles out-of-memory and other transient failures. * Introduce Module.SrcLoc which has not only a byte offset but also references the file which the byte offset applies to. * Scope.Block now contains both a pointer to the "owner" Decl and the "source" Decl. As an example, during inline function call, the "owner" will be the Decl of the caller and the "source" will be the Decl of the callee. * Module.ErrorMsg now sports a `file_scope` field so that notes can refer to source locations in a file other than the parent error message. * Some instances where a `*Scope` was stored, now store a `*Scope.Container`. * Some methods in the `Scope` namespace were moved to the more specific type, since there was only an implementation for one particular tag. - `removeDecl` moved to `Scope.Container` - `destroy` moved to `Scope.File` * Two kinds of Scope deleted: - zir_module - decl * astgen: properly use DeclVal / DeclRef. DeclVal was incorrectly changed to be a reference; this commit fixes it. Fewer ZIR instructions processed as a result. - declval_in_module is renamed to declval - previous declval ZIR instruction is deleted; it was only for .zir files. * Test harness: friendlier diagnostics when an unexpected set of errors is encountered. * zir_sema: fix analyzeInstBlockFlat by properly calling resolvingInst on the last zir instruction in the block. Compile log implementation: * Write to a buffer rather than directly to stderr. * Only keep track of 1 callsite per Decl. * No longer mutate the ZIR Inst struct data. * "Compile log statement found" errors are only emitted when there are no other compile errors. -femit-zir and support for .zir source files is regressed. If we wanted to support this again, outputting .zir would need to be done as yet another backend rather than in the haphazard way it was previously implemented. For parsing .zir, it was implemented previously in a way that was not helpful for debugging. We need tighter integration with the test harness for it to be useful; so clearly a rewrite is needed. Given that a rewrite is needed, and it was getting in the way of progress and organization of the rest of stage2, I regressed the feature.
2021-01-16 22:51:01 -07:00
return .{
.file_scope = decl.getFileScope(),
.parent_decl_node = decl.src_node,
.lazy = .{ .node_offset = node_offset },
stage2: implement error notes and regress -femit-zir * Implement error notes - note: other symbol exported here - note: previous else prong is here - note: previous '_' prong is here * Add Compilation.CObject.ErrorMsg. This object properly converts to AllErrors.Message when the time comes. * Add Compilation.CObject.failure_retryable. Properly handles out-of-memory and other transient failures. * Introduce Module.SrcLoc which has not only a byte offset but also references the file which the byte offset applies to. * Scope.Block now contains both a pointer to the "owner" Decl and the "source" Decl. As an example, during inline function call, the "owner" will be the Decl of the caller and the "source" will be the Decl of the callee. * Module.ErrorMsg now sports a `file_scope` field so that notes can refer to source locations in a file other than the parent error message. * Some instances where a `*Scope` was stored, now store a `*Scope.Container`. * Some methods in the `Scope` namespace were moved to the more specific type, since there was only an implementation for one particular tag. - `removeDecl` moved to `Scope.Container` - `destroy` moved to `Scope.File` * Two kinds of Scope deleted: - zir_module - decl * astgen: properly use DeclVal / DeclRef. DeclVal was incorrectly changed to be a reference; this commit fixes it. Fewer ZIR instructions processed as a result. - declval_in_module is renamed to declval - previous declval ZIR instruction is deleted; it was only for .zir files. * Test harness: friendlier diagnostics when an unexpected set of errors is encountered. * zir_sema: fix analyzeInstBlockFlat by properly calling resolvingInst on the last zir instruction in the block. Compile log implementation: * Write to a buffer rather than directly to stderr. * Only keep track of 1 callsite per Decl. * No longer mutate the ZIR Inst struct data. * "Compile log statement found" errors are only emitted when there are no other compile errors. -femit-zir and support for .zir source files is regressed. If we wanted to support this again, outputting .zir would need to be done as yet another backend rather than in the haphazard way it was previously implemented. For parsing .zir, it was implemented previously in a way that was not helpful for debugging. We need tighter integration with the test harness for it to be useful; so clearly a rewrite is needed. Given that a rewrite is needed, and it was getting in the way of progress and organization of the rest of stage2, I regressed the feature.
2021-01-16 22:51:01 -07:00
};
}
pub fn srcToken(decl: Decl) Ast.TokenIndex {
2021-10-01 12:05:12 -05:00
const tree = &decl.getFileScope().tree;
return tree.firstToken(decl.src_node);
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
}
pub fn srcByteOffset(decl: Decl) u32 {
2021-10-01 12:05:12 -05:00
const tree = &decl.getFileScope().tree;
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
return tree.tokens.items(.start)[decl.srcToken()];
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
pub fn renderFullyQualifiedName(decl: Decl, writer: anytype) !void {
const unqualified_name = mem.spanZ(decl.name);
return decl.src_namespace.renderFullyQualifiedName(unqualified_name, writer);
stage2: implement simple enums A simple enum is an enum which has an automatic integer tag type, all tag values automatically assigned, and no top level declarations. Such enums are created directly in AstGen and shared by all the generic/comptime instantiations of the surrounding ZIR code. This commit implements, but does not yet add any test cases for, simple enums. A full enum is an enum for which any of the above conditions are not true. Full enums are created in Sema, and therefore will create a unique type per generic/comptime instantiation. This commit does not implement full enums. However the `enum_decl_nonexhaustive` ZIR instruction is added and the respective Type functions are filled out. This commit makes an improvement to ZIR code, removing the decls array and removing the decl_map from AstGen. Instead, decl_ref and decl_val ZIR instructions index into the `owner_decl.dependencies` ArrayHashMap. We already need this dependencies array for incremental compilation purposes, and so repurposing it to also use it for ZIR decl indexes makes for efficient memory usage. Similarly, this commit fixes up incorrect memory management by removing the `const` ZIR instruction. The two places it was used stored memory in the AstGen arena, which may get freed after Sema. Now it properly sets up a new anonymous Decl for error sets and uses a normal decl_val instruction. The other usage of `const` ZIR instruction was float literals. These are now changed to use `float` ZIR instruction when the value fits inside `zir.Inst.Data` and `float128` otherwise. AstGen + Sema: implement int_to_enum and enum_to_int. No tests yet; I expect to have to make some fixes before they will pass tests. Will do that in the branch before merging. AstGen: fix struct astgen incorrectly counting decls as fields. Type/Value: give up on trying to exhaustively list every tag all the time. This makes the file more manageable. Also found a bug with i128/u128 this way, since the name of the function was more obvious when looking at the tag values. Type: implement abiAlignment and abiSize for structs. This will need to get more sophisticated at some point, but for now it is progress. Value: add new `enum_field_index` tag. Value: add hash_u32, needed when using ArrayHashMap.
2021-04-06 17:43:56 -07:00
}
pub fn renderFullyQualifiedDebugName(decl: Decl, writer: anytype) !void {
const unqualified_name = mem.spanZ(decl.name);
return decl.src_namespace.renderFullyQualifiedDebugName(unqualified_name, writer);
2021-09-30 21:43:30 -05:00
}
pub fn getFullyQualifiedName(decl: Decl, gpa: *Allocator) ![:0]u8 {
stage2: implement simple enums A simple enum is an enum which has an automatic integer tag type, all tag values automatically assigned, and no top level declarations. Such enums are created directly in AstGen and shared by all the generic/comptime instantiations of the surrounding ZIR code. This commit implements, but does not yet add any test cases for, simple enums. A full enum is an enum for which any of the above conditions are not true. Full enums are created in Sema, and therefore will create a unique type per generic/comptime instantiation. This commit does not implement full enums. However the `enum_decl_nonexhaustive` ZIR instruction is added and the respective Type functions are filled out. This commit makes an improvement to ZIR code, removing the decls array and removing the decl_map from AstGen. Instead, decl_ref and decl_val ZIR instructions index into the `owner_decl.dependencies` ArrayHashMap. We already need this dependencies array for incremental compilation purposes, and so repurposing it to also use it for ZIR decl indexes makes for efficient memory usage. Similarly, this commit fixes up incorrect memory management by removing the `const` ZIR instruction. The two places it was used stored memory in the AstGen arena, which may get freed after Sema. Now it properly sets up a new anonymous Decl for error sets and uses a normal decl_val instruction. The other usage of `const` ZIR instruction was float literals. These are now changed to use `float` ZIR instruction when the value fits inside `zir.Inst.Data` and `float128` otherwise. AstGen + Sema: implement int_to_enum and enum_to_int. No tests yet; I expect to have to make some fixes before they will pass tests. Will do that in the branch before merging. AstGen: fix struct astgen incorrectly counting decls as fields. Type/Value: give up on trying to exhaustively list every tag all the time. This makes the file more manageable. Also found a bug with i128/u128 this way, since the name of the function was more obvious when looking at the tag values. Type: implement abiAlignment and abiSize for structs. This will need to get more sophisticated at some point, but for now it is progress. Value: add new `enum_field_index` tag. Value: add hash_u32, needed when using ArrayHashMap.
2021-04-06 17:43:56 -07:00
var buffer = std.ArrayList(u8).init(gpa);
defer buffer.deinit();
try decl.renderFullyQualifiedName(buffer.writer());
return buffer.toOwnedSliceSentinel(0);
stage2: implement simple enums A simple enum is an enum which has an automatic integer tag type, all tag values automatically assigned, and no top level declarations. Such enums are created directly in AstGen and shared by all the generic/comptime instantiations of the surrounding ZIR code. This commit implements, but does not yet add any test cases for, simple enums. A full enum is an enum for which any of the above conditions are not true. Full enums are created in Sema, and therefore will create a unique type per generic/comptime instantiation. This commit does not implement full enums. However the `enum_decl_nonexhaustive` ZIR instruction is added and the respective Type functions are filled out. This commit makes an improvement to ZIR code, removing the decls array and removing the decl_map from AstGen. Instead, decl_ref and decl_val ZIR instructions index into the `owner_decl.dependencies` ArrayHashMap. We already need this dependencies array for incremental compilation purposes, and so repurposing it to also use it for ZIR decl indexes makes for efficient memory usage. Similarly, this commit fixes up incorrect memory management by removing the `const` ZIR instruction. The two places it was used stored memory in the AstGen arena, which may get freed after Sema. Now it properly sets up a new anonymous Decl for error sets and uses a normal decl_val instruction. The other usage of `const` ZIR instruction was float literals. These are now changed to use `float` ZIR instruction when the value fits inside `zir.Inst.Data` and `float128` otherwise. AstGen + Sema: implement int_to_enum and enum_to_int. No tests yet; I expect to have to make some fixes before they will pass tests. Will do that in the branch before merging. AstGen: fix struct astgen incorrectly counting decls as fields. Type/Value: give up on trying to exhaustively list every tag all the time. This makes the file more manageable. Also found a bug with i128/u128 this way, since the name of the function was more obvious when looking at the tag values. Type: implement abiAlignment and abiSize for structs. This will need to get more sophisticated at some point, but for now it is progress. Value: add new `enum_field_index` tag. Value: add hash_u32, needed when using ArrayHashMap.
2021-04-06 17:43:56 -07:00
}
pub fn typedValue(decl: Decl) error{AnalysisFail}!TypedValue {
if (!decl.has_tv) return error.AnalysisFail;
return TypedValue{
.ty = decl.ty,
.val = decl.val,
};
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
pub fn value(decl: *Decl) error{AnalysisFail}!Value {
return (try decl.typedValue()).val;
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
pub fn isFunction(decl: *Decl) !bool {
const tv = try decl.typedValue();
return tv.ty.zigTypeTag() == .Fn;
}
/// If the Decl has a value and it is a struct, return it,
/// otherwise null.
pub fn getStruct(decl: *Decl) ?*Struct {
if (!decl.owns_tv) return null;
const ty = (decl.val.castTag(.ty) orelse return null).data;
const struct_obj = (ty.castTag(.@"struct") orelse return null).data;
assert(struct_obj.owner_decl == decl);
return struct_obj;
}
/// If the Decl has a value and it is a union, return it,
/// otherwise null.
pub fn getUnion(decl: *Decl) ?*Union {
if (!decl.owns_tv) return null;
const ty = (decl.val.castTag(.ty) orelse return null).data;
const union_obj = (ty.cast(Type.Payload.Union) orelse return null).data;
assert(union_obj.owner_decl == decl);
return union_obj;
}
/// If the Decl has a value and it is a function, return it,
/// otherwise null.
pub fn getFunction(decl: *const Decl) ?*Fn {
if (!decl.owns_tv) return null;
const func = (decl.val.castTag(.function) orelse return null).data;
assert(func.owner_decl == decl);
return func;
}
2021-05-07 18:52:11 -07:00
pub fn getVariable(decl: *Decl) ?*Var {
if (!decl.owns_tv) return null;
2021-05-07 18:52:11 -07:00
const variable = (decl.val.castTag(.variable) orelse return null).data;
assert(variable.owner_decl == decl);
2021-05-07 18:52:11 -07:00
return variable;
}
/// Gets the namespace that this Decl creates by being a struct, union,
/// enum, or opaque.
/// Only returns it if the Decl is the owner.
pub fn getInnerNamespace(decl: *Decl) ?*Namespace {
if (!decl.owns_tv) return null;
2021-05-07 18:52:11 -07:00
const ty = (decl.val.castTag(.ty) orelse return null).data;
switch (ty.tag()) {
.@"struct" => {
const struct_obj = ty.castTag(.@"struct").?.data;
assert(struct_obj.owner_decl == decl);
2021-05-07 18:52:11 -07:00
return &struct_obj.namespace;
},
stage2: field type expressions support referencing locals The big change in this commit is making `semaDecl` resolve the fields if the Decl ends up being a struct or union. It needs to do this while the `Sema` is still in scope, because it will have the resolved AIR instructions that the field type expressions possibly reference. We do this after the decl is populated and set to `complete` so that a `Decl` may reference itself. Everything else is fixes and improvements to make the test suite pass again after making this change. * New AIR instruction: `ptr_elem_ptr` - Implemented for LLVM backend * New Type tag: `type_info` which represents `std.builtin.TypeInfo`. It is used by AstGen for the operand type of `@Type`. * ZIR instruction `set_float_mode` uses `coerced_ty` to avoid superfluous `as` instruction on operand. * ZIR instruction `Type` uses `coerced_ty` to properly handle result location type of operand. * Fix two instances of `enum_nonexhaustive` Value Tag not handled properly - it should generally be handled the same as `enum_full`. * Fix struct and union field resolution not copying Type and Value objects into its Decl arena. * Fix enum tag value resolution discarding the ZIR=>AIR instruction map for the child Sema, when they still needed to be accessed. * Fix `zirResolveInferredAlloc` use-after-free in the AIR instructions data array. * Fix `elemPtrArray` not respecting const/mutable attribute of pointer in the result type. * Fix LLVM backend crashing when `updateDeclExports` is called before `updateDecl`/`updateFunc` (which is, according to the API, perfectly legal for the frontend to do). * Fix LLVM backend handling element pointer of pointer-to-array. It needed another index in the GEP otherwise LLVM saw the wrong type. * Fix LLVM test cases not returning 0 from main, causing test failures. Fixes a regression introduced in 6a5094872f10acc629543cc7f10533b438d0283a. * Implement comptime shift-right. * Implement `@Type` for integers and `@TypeInfo` for integers. * Implement union initialization syntax. * Implement `zirFieldType` for unions. * Implement `elemPtrArray` for a runtime-known operand. * Make `zirLog2IntType` support RHS of shift being `comptime_int`. In this case it returns `comptime_int`. The motivating test case for this commit was originally: ```zig test "example" { var l: List(10) = undefined; l.array[1] = 1; } fn List(comptime L: usize) type { var T = u8; return struct { array: [L]T, }; } ``` However I changed it to: ```zig test "example" { var l: List = undefined; l.array[1] = 1; } const List = blk: { const T = [10]u8; break :blk struct { array: T, }; }; ``` Which ended up being a similar, smaller problem. The former test case will require a similar solution in the implementation of comptime function calls - checking if the result of the function call is a struct or union, and using the child `Sema` before it is destroyed to resolve the fields.
2021-08-20 15:23:55 -07:00
.enum_full, .enum_nonexhaustive => {
const enum_obj = ty.cast(Type.Payload.EnumFull).?.data;
assert(enum_obj.owner_decl == decl);
2021-05-07 18:52:11 -07:00
return &enum_obj.namespace;
},
.empty_struct => {
return ty.castTag(.empty_struct).?.data;
2021-05-07 18:52:11 -07:00
},
.@"opaque" => {
const opaque_obj = ty.cast(Type.Payload.Opaque).?.data;
assert(opaque_obj.owner_decl == decl);
return &opaque_obj.namespace;
2021-05-07 18:52:11 -07:00
},
.@"union", .union_tagged => {
const union_obj = ty.cast(Type.Payload.Union).?.data;
assert(union_obj.owner_decl == decl);
2021-05-07 18:52:11 -07:00
return &union_obj.namespace;
},
else => return null,
}
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
pub fn dump(decl: *Decl) void {
const loc = std.zig.findLineColumn(decl.scope.source.bytes, decl.src);
2021-01-02 19:03:14 -07:00
std.debug.print("{s}:{d}:{d} name={s} status={s}", .{
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
decl.scope.sub_file_path,
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
loc.line + 1,
loc.column + 1,
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
mem.spanZ(decl.name),
@tagName(decl.analysis),
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
});
if (decl.has_tv) {
std.debug.print(" ty={} val={}", .{ decl.ty, decl.val });
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
std.debug.print("\n", .{});
}
pub fn getFileScope(decl: Decl) *File {
2021-10-01 12:05:12 -05:00
return decl.src_namespace.file_scope;
}
pub fn getEmitH(decl: *Decl, module: *Module) *EmitH {
assert(module.emit_h != null);
const decl_plus_emit_h = @fieldParentPtr(DeclPlusEmitH, "decl", decl);
return &decl_plus_emit_h.emit_h;
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
fn removeDependant(decl: *Decl, other: *Decl) void {
assert(decl.dependants.swapRemove(other));
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
fn removeDependency(decl: *Decl, other: *Decl) void {
assert(decl.dependencies.swapRemove(other));
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
};
/// This state is attached to every Decl when Module emit_h is non-null.
pub const EmitH = struct {
fwd_decl: ArrayListUnmanaged(u8) = .{},
};
2021-03-28 19:38:19 -07:00
/// Represents the data that an explicit error set syntax provides.
pub const ErrorSet = struct {
/// The Decl that corresponds to the error set itself.
2021-03-28 19:38:19 -07:00
owner_decl: *Decl,
/// Offset from Decl node index, points to the error set AST node.
node_offset: i32,
names_len: u32,
/// The string bytes are stored in the owner Decl arena.
/// They are in the same order they appear in the AST.
2021-05-04 13:58:08 -07:00
/// The length is given by `names_len`.
2021-03-28 19:38:19 -07:00
names_ptr: [*]const []const u8,
pub fn srcLoc(self: ErrorSet) SrcLoc {
return .{
.file_scope = self.owner_decl.getFileScope(),
.parent_decl_node = self.owner_decl.src_node,
.lazy = .{ .node_offset = self.node_offset },
};
}
2021-03-28 19:38:19 -07:00
};
/// Represents the data that a struct declaration provides.
pub const Struct = struct {
/// The Decl that corresponds to the struct itself.
owner_decl: *Decl,
/// Set of field names in declaration order.
stage2: support nested structs and arrays and sret * Add AIR instructions: ret_ptr, ret_load - This allows Sema to be blissfully unaware of the backend's decision to implement by-val/by-ref semantics for struct/union/array types. Backends can lower these simply as alloc, load, ret instructions, or they can take advantage of them to use a result pointer. * Add AIR instruction: array_elem_val - Allows for better codegen for `Sema.elemVal`. * Implement calculation of ABI alignment and ABI size for unions. * Before appending the following AIR instructions to a block, resolveTypeLayout is called on the type: - call - return type - ret - return type - store_ptr - elem type * Sema: fix memory leak in `zirArrayInit` and other cleanups to this function. * x86_64: implement the full x86_64 C ABI according to the spec * Type: implement `intInfo` for error sets. * Type: implement `intTagType` for tagged unions. The Zig type tag `Fn` is now used exclusively for function bodies. Function pointers are modeled as `*const T` where `T` is a `Fn` type. * The `call` AIR instruction now allows a function pointer operand as well as a function operand. * Sema now has a coercion from function body to function pointer. * Function type syntax, e.g. `fn()void`, now returns zig tag type of Pointer with child Fn, rather than Fn directly. - I think this should probably be reverted. Will discuss the lang specs before doing this. Idea being that function pointers would need to be specified as `*const fn()void` rather than `fn() void`. LLVM backend: * Enable calling the panic handler (previously this just emitted `@breakpoint()` since the backend could not handle the panic function). * Implement sret * Introduce `isByRef` and implement it for structs and arrays. Types that are `isByRef` are now passed as pointers to functions, and e.g. `elem_val` will return a pointer instead of doing a load. * Move the function type creating code from `resolveLlvmFunction` to `llvmType` where it belongs; now there is only 1 instance of this logic instead of two. * Add the `nonnull` attribute to non-optional pointer parameters. * Fix `resolveGlobalDecl` not using fully-qualified names and not using the `decl_map`. * Implement `genTypedValue` for pointer-like optionals. * Fix memory leak when lowering `block` instruction and OOM occurs. * Implement volatile checks where relevant.
2021-10-11 11:00:32 -07:00
fields: Fields,
/// Represents the declarations inside this struct.
namespace: Namespace,
/// Offset from `owner_decl`, points to the struct AST node.
node_offset: i32,
/// Index of the struct_decl ZIR instruction.
zir_index: Zir.Inst.Index,
layout: std.builtin.TypeInfo.ContainerLayout,
status: enum {
none,
field_types_wip,
have_field_types,
layout_wip,
have_layout,
},
/// If true, definitely nonzero size at runtime. If false, resolving the fields
/// is necessary to determine whether it has bits at runtime.
known_has_bits: bool,
stage2: support nested structs and arrays and sret * Add AIR instructions: ret_ptr, ret_load - This allows Sema to be blissfully unaware of the backend's decision to implement by-val/by-ref semantics for struct/union/array types. Backends can lower these simply as alloc, load, ret instructions, or they can take advantage of them to use a result pointer. * Add AIR instruction: array_elem_val - Allows for better codegen for `Sema.elemVal`. * Implement calculation of ABI alignment and ABI size for unions. * Before appending the following AIR instructions to a block, resolveTypeLayout is called on the type: - call - return type - ret - return type - store_ptr - elem type * Sema: fix memory leak in `zirArrayInit` and other cleanups to this function. * x86_64: implement the full x86_64 C ABI according to the spec * Type: implement `intInfo` for error sets. * Type: implement `intTagType` for tagged unions. The Zig type tag `Fn` is now used exclusively for function bodies. Function pointers are modeled as `*const T` where `T` is a `Fn` type. * The `call` AIR instruction now allows a function pointer operand as well as a function operand. * Sema now has a coercion from function body to function pointer. * Function type syntax, e.g. `fn()void`, now returns zig tag type of Pointer with child Fn, rather than Fn directly. - I think this should probably be reverted. Will discuss the lang specs before doing this. Idea being that function pointers would need to be specified as `*const fn()void` rather than `fn() void`. LLVM backend: * Enable calling the panic handler (previously this just emitted `@breakpoint()` since the backend could not handle the panic function). * Implement sret * Introduce `isByRef` and implement it for structs and arrays. Types that are `isByRef` are now passed as pointers to functions, and e.g. `elem_val` will return a pointer instead of doing a load. * Move the function type creating code from `resolveLlvmFunction` to `llvmType` where it belongs; now there is only 1 instance of this logic instead of two. * Add the `nonnull` attribute to non-optional pointer parameters. * Fix `resolveGlobalDecl` not using fully-qualified names and not using the `decl_map`. * Implement `genTypedValue` for pointer-like optionals. * Fix memory leak when lowering `block` instruction and OOM occurs. * Implement volatile checks where relevant.
2021-10-11 11:00:32 -07:00
pub const Fields = std.StringArrayHashMapUnmanaged(Field);
stage2: field type expressions support referencing locals The big change in this commit is making `semaDecl` resolve the fields if the Decl ends up being a struct or union. It needs to do this while the `Sema` is still in scope, because it will have the resolved AIR instructions that the field type expressions possibly reference. We do this after the decl is populated and set to `complete` so that a `Decl` may reference itself. Everything else is fixes and improvements to make the test suite pass again after making this change. * New AIR instruction: `ptr_elem_ptr` - Implemented for LLVM backend * New Type tag: `type_info` which represents `std.builtin.TypeInfo`. It is used by AstGen for the operand type of `@Type`. * ZIR instruction `set_float_mode` uses `coerced_ty` to avoid superfluous `as` instruction on operand. * ZIR instruction `Type` uses `coerced_ty` to properly handle result location type of operand. * Fix two instances of `enum_nonexhaustive` Value Tag not handled properly - it should generally be handled the same as `enum_full`. * Fix struct and union field resolution not copying Type and Value objects into its Decl arena. * Fix enum tag value resolution discarding the ZIR=>AIR instruction map for the child Sema, when they still needed to be accessed. * Fix `zirResolveInferredAlloc` use-after-free in the AIR instructions data array. * Fix `elemPtrArray` not respecting const/mutable attribute of pointer in the result type. * Fix LLVM backend crashing when `updateDeclExports` is called before `updateDecl`/`updateFunc` (which is, according to the API, perfectly legal for the frontend to do). * Fix LLVM backend handling element pointer of pointer-to-array. It needed another index in the GEP otherwise LLVM saw the wrong type. * Fix LLVM test cases not returning 0 from main, causing test failures. Fixes a regression introduced in 6a5094872f10acc629543cc7f10533b438d0283a. * Implement comptime shift-right. * Implement `@Type` for integers and `@TypeInfo` for integers. * Implement union initialization syntax. * Implement `zirFieldType` for unions. * Implement `elemPtrArray` for a runtime-known operand. * Make `zirLog2IntType` support RHS of shift being `comptime_int`. In this case it returns `comptime_int`. The motivating test case for this commit was originally: ```zig test "example" { var l: List(10) = undefined; l.array[1] = 1; } fn List(comptime L: usize) type { var T = u8; return struct { array: [L]T, }; } ``` However I changed it to: ```zig test "example" { var l: List = undefined; l.array[1] = 1; } const List = blk: { const T = [10]u8; break :blk struct { array: T, }; }; ``` Which ended up being a similar, smaller problem. The former test case will require a similar solution in the implementation of comptime function calls - checking if the result of the function call is a struct or union, and using the child `Sema` before it is destroyed to resolve the fields.
2021-08-20 15:23:55 -07:00
/// The `Type` and `Value` memory is owned by the arena of the Struct's owner_decl.
pub const Field = struct {
/// Uses `noreturn` to indicate `anytype`.
/// undefined until `status` is `have_field_types` or `have_layout`.
ty: Type,
abi_align: Value,
/// Uses `unreachable_value` to indicate no default.
default_val: Value,
/// undefined until `status` is `have_layout`.
offset: u32,
is_comptime: bool,
};
pub fn getFullyQualifiedName(s: *Struct, gpa: *Allocator) ![:0]u8 {
stage2: implement simple enums A simple enum is an enum which has an automatic integer tag type, all tag values automatically assigned, and no top level declarations. Such enums are created directly in AstGen and shared by all the generic/comptime instantiations of the surrounding ZIR code. This commit implements, but does not yet add any test cases for, simple enums. A full enum is an enum for which any of the above conditions are not true. Full enums are created in Sema, and therefore will create a unique type per generic/comptime instantiation. This commit does not implement full enums. However the `enum_decl_nonexhaustive` ZIR instruction is added and the respective Type functions are filled out. This commit makes an improvement to ZIR code, removing the decls array and removing the decl_map from AstGen. Instead, decl_ref and decl_val ZIR instructions index into the `owner_decl.dependencies` ArrayHashMap. We already need this dependencies array for incremental compilation purposes, and so repurposing it to also use it for ZIR decl indexes makes for efficient memory usage. Similarly, this commit fixes up incorrect memory management by removing the `const` ZIR instruction. The two places it was used stored memory in the AstGen arena, which may get freed after Sema. Now it properly sets up a new anonymous Decl for error sets and uses a normal decl_val instruction. The other usage of `const` ZIR instruction was float literals. These are now changed to use `float` ZIR instruction when the value fits inside `zir.Inst.Data` and `float128` otherwise. AstGen + Sema: implement int_to_enum and enum_to_int. No tests yet; I expect to have to make some fixes before they will pass tests. Will do that in the branch before merging. AstGen: fix struct astgen incorrectly counting decls as fields. Type/Value: give up on trying to exhaustively list every tag all the time. This makes the file more manageable. Also found a bug with i128/u128 this way, since the name of the function was more obvious when looking at the tag values. Type: implement abiAlignment and abiSize for structs. This will need to get more sophisticated at some point, but for now it is progress. Value: add new `enum_field_index` tag. Value: add hash_u32, needed when using ArrayHashMap.
2021-04-06 17:43:56 -07:00
return s.owner_decl.getFullyQualifiedName(gpa);
}
pub fn srcLoc(s: Struct) SrcLoc {
return .{
.file_scope = s.owner_decl.getFileScope(),
.parent_decl_node = s.owner_decl.src_node,
.lazy = .{ .node_offset = s.node_offset },
};
}
pub fn haveFieldTypes(s: Struct) bool {
return switch (s.status) {
.none,
.field_types_wip,
=> false,
.have_field_types,
.layout_wip,
.have_layout,
=> true,
};
}
};
stage2: implement simple enums A simple enum is an enum which has an automatic integer tag type, all tag values automatically assigned, and no top level declarations. Such enums are created directly in AstGen and shared by all the generic/comptime instantiations of the surrounding ZIR code. This commit implements, but does not yet add any test cases for, simple enums. A full enum is an enum for which any of the above conditions are not true. Full enums are created in Sema, and therefore will create a unique type per generic/comptime instantiation. This commit does not implement full enums. However the `enum_decl_nonexhaustive` ZIR instruction is added and the respective Type functions are filled out. This commit makes an improvement to ZIR code, removing the decls array and removing the decl_map from AstGen. Instead, decl_ref and decl_val ZIR instructions index into the `owner_decl.dependencies` ArrayHashMap. We already need this dependencies array for incremental compilation purposes, and so repurposing it to also use it for ZIR decl indexes makes for efficient memory usage. Similarly, this commit fixes up incorrect memory management by removing the `const` ZIR instruction. The two places it was used stored memory in the AstGen arena, which may get freed after Sema. Now it properly sets up a new anonymous Decl for error sets and uses a normal decl_val instruction. The other usage of `const` ZIR instruction was float literals. These are now changed to use `float` ZIR instruction when the value fits inside `zir.Inst.Data` and `float128` otherwise. AstGen + Sema: implement int_to_enum and enum_to_int. No tests yet; I expect to have to make some fixes before they will pass tests. Will do that in the branch before merging. AstGen: fix struct astgen incorrectly counting decls as fields. Type/Value: give up on trying to exhaustively list every tag all the time. This makes the file more manageable. Also found a bug with i128/u128 this way, since the name of the function was more obvious when looking at the tag values. Type: implement abiAlignment and abiSize for structs. This will need to get more sophisticated at some point, but for now it is progress. Value: add new `enum_field_index` tag. Value: add hash_u32, needed when using ArrayHashMap.
2021-04-06 17:43:56 -07:00
/// Represents the data that an enum declaration provides, when the fields
/// are auto-numbered, and there are no declarations. The integer tag type
/// is inferred to be the smallest power of two unsigned int that fits
/// the number of fields.
pub const EnumSimple = struct {
/// The Decl that corresponds to the enum itself.
stage2: implement simple enums A simple enum is an enum which has an automatic integer tag type, all tag values automatically assigned, and no top level declarations. Such enums are created directly in AstGen and shared by all the generic/comptime instantiations of the surrounding ZIR code. This commit implements, but does not yet add any test cases for, simple enums. A full enum is an enum for which any of the above conditions are not true. Full enums are created in Sema, and therefore will create a unique type per generic/comptime instantiation. This commit does not implement full enums. However the `enum_decl_nonexhaustive` ZIR instruction is added and the respective Type functions are filled out. This commit makes an improvement to ZIR code, removing the decls array and removing the decl_map from AstGen. Instead, decl_ref and decl_val ZIR instructions index into the `owner_decl.dependencies` ArrayHashMap. We already need this dependencies array for incremental compilation purposes, and so repurposing it to also use it for ZIR decl indexes makes for efficient memory usage. Similarly, this commit fixes up incorrect memory management by removing the `const` ZIR instruction. The two places it was used stored memory in the AstGen arena, which may get freed after Sema. Now it properly sets up a new anonymous Decl for error sets and uses a normal decl_val instruction. The other usage of `const` ZIR instruction was float literals. These are now changed to use `float` ZIR instruction when the value fits inside `zir.Inst.Data` and `float128` otherwise. AstGen + Sema: implement int_to_enum and enum_to_int. No tests yet; I expect to have to make some fixes before they will pass tests. Will do that in the branch before merging. AstGen: fix struct astgen incorrectly counting decls as fields. Type/Value: give up on trying to exhaustively list every tag all the time. This makes the file more manageable. Also found a bug with i128/u128 this way, since the name of the function was more obvious when looking at the tag values. Type: implement abiAlignment and abiSize for structs. This will need to get more sophisticated at some point, but for now it is progress. Value: add new `enum_field_index` tag. Value: add hash_u32, needed when using ArrayHashMap.
2021-04-06 17:43:56 -07:00
owner_decl: *Decl,
/// Set of field names in declaration order.
fields: std.StringArrayHashMapUnmanaged(void),
/// Offset from `owner_decl`, points to the enum decl AST node.
node_offset: i32,
pub fn srcLoc(self: EnumSimple) SrcLoc {
return .{
.file_scope = self.owner_decl.getFileScope(),
.parent_decl_node = self.owner_decl.src_node,
.lazy = .{ .node_offset = self.node_offset },
};
}
stage2: implement simple enums A simple enum is an enum which has an automatic integer tag type, all tag values automatically assigned, and no top level declarations. Such enums are created directly in AstGen and shared by all the generic/comptime instantiations of the surrounding ZIR code. This commit implements, but does not yet add any test cases for, simple enums. A full enum is an enum for which any of the above conditions are not true. Full enums are created in Sema, and therefore will create a unique type per generic/comptime instantiation. This commit does not implement full enums. However the `enum_decl_nonexhaustive` ZIR instruction is added and the respective Type functions are filled out. This commit makes an improvement to ZIR code, removing the decls array and removing the decl_map from AstGen. Instead, decl_ref and decl_val ZIR instructions index into the `owner_decl.dependencies` ArrayHashMap. We already need this dependencies array for incremental compilation purposes, and so repurposing it to also use it for ZIR decl indexes makes for efficient memory usage. Similarly, this commit fixes up incorrect memory management by removing the `const` ZIR instruction. The two places it was used stored memory in the AstGen arena, which may get freed after Sema. Now it properly sets up a new anonymous Decl for error sets and uses a normal decl_val instruction. The other usage of `const` ZIR instruction was float literals. These are now changed to use `float` ZIR instruction when the value fits inside `zir.Inst.Data` and `float128` otherwise. AstGen + Sema: implement int_to_enum and enum_to_int. No tests yet; I expect to have to make some fixes before they will pass tests. Will do that in the branch before merging. AstGen: fix struct astgen incorrectly counting decls as fields. Type/Value: give up on trying to exhaustively list every tag all the time. This makes the file more manageable. Also found a bug with i128/u128 this way, since the name of the function was more obvious when looking at the tag values. Type: implement abiAlignment and abiSize for structs. This will need to get more sophisticated at some point, but for now it is progress. Value: add new `enum_field_index` tag. Value: add hash_u32, needed when using ArrayHashMap.
2021-04-06 17:43:56 -07:00
};
/// Represents the data that an enum declaration provides, when there are no
/// declarations. However an integer tag type is provided, and the enum tag values
/// are explicitly provided.
pub const EnumNumbered = struct {
/// The Decl that corresponds to the enum itself.
owner_decl: *Decl,
/// An integer type which is used for the numerical value of the enum.
/// Whether zig chooses this type or the user specifies it, it is stored here.
tag_ty: Type,
/// Set of field names in declaration order.
fields: NameMap,
/// Maps integer tag value to field index.
/// Entries are in declaration order, same as `fields`.
/// If this hash map is empty, it means the enum tags are auto-numbered.
values: ValueMap,
/// Offset from `owner_decl`, points to the enum decl AST node.
node_offset: i32,
pub const NameMap = EnumFull.NameMap;
pub const ValueMap = EnumFull.ValueMap;
pub fn srcLoc(self: EnumNumbered) SrcLoc {
return .{
.file_scope = self.owner_decl.getFileScope(),
.parent_decl_node = self.owner_decl.src_node,
.lazy = .{ .node_offset = self.node_offset },
};
}
};
stage2: implement simple enums A simple enum is an enum which has an automatic integer tag type, all tag values automatically assigned, and no top level declarations. Such enums are created directly in AstGen and shared by all the generic/comptime instantiations of the surrounding ZIR code. This commit implements, but does not yet add any test cases for, simple enums. A full enum is an enum for which any of the above conditions are not true. Full enums are created in Sema, and therefore will create a unique type per generic/comptime instantiation. This commit does not implement full enums. However the `enum_decl_nonexhaustive` ZIR instruction is added and the respective Type functions are filled out. This commit makes an improvement to ZIR code, removing the decls array and removing the decl_map from AstGen. Instead, decl_ref and decl_val ZIR instructions index into the `owner_decl.dependencies` ArrayHashMap. We already need this dependencies array for incremental compilation purposes, and so repurposing it to also use it for ZIR decl indexes makes for efficient memory usage. Similarly, this commit fixes up incorrect memory management by removing the `const` ZIR instruction. The two places it was used stored memory in the AstGen arena, which may get freed after Sema. Now it properly sets up a new anonymous Decl for error sets and uses a normal decl_val instruction. The other usage of `const` ZIR instruction was float literals. These are now changed to use `float` ZIR instruction when the value fits inside `zir.Inst.Data` and `float128` otherwise. AstGen + Sema: implement int_to_enum and enum_to_int. No tests yet; I expect to have to make some fixes before they will pass tests. Will do that in the branch before merging. AstGen: fix struct astgen incorrectly counting decls as fields. Type/Value: give up on trying to exhaustively list every tag all the time. This makes the file more manageable. Also found a bug with i128/u128 this way, since the name of the function was more obvious when looking at the tag values. Type: implement abiAlignment and abiSize for structs. This will need to get more sophisticated at some point, but for now it is progress. Value: add new `enum_field_index` tag. Value: add hash_u32, needed when using ArrayHashMap.
2021-04-06 17:43:56 -07:00
/// Represents the data that an enum declaration provides, when there is
/// at least one tag value explicitly specified, or at least one declaration.
pub const EnumFull = struct {
/// The Decl that corresponds to the enum itself.
stage2: implement simple enums A simple enum is an enum which has an automatic integer tag type, all tag values automatically assigned, and no top level declarations. Such enums are created directly in AstGen and shared by all the generic/comptime instantiations of the surrounding ZIR code. This commit implements, but does not yet add any test cases for, simple enums. A full enum is an enum for which any of the above conditions are not true. Full enums are created in Sema, and therefore will create a unique type per generic/comptime instantiation. This commit does not implement full enums. However the `enum_decl_nonexhaustive` ZIR instruction is added and the respective Type functions are filled out. This commit makes an improvement to ZIR code, removing the decls array and removing the decl_map from AstGen. Instead, decl_ref and decl_val ZIR instructions index into the `owner_decl.dependencies` ArrayHashMap. We already need this dependencies array for incremental compilation purposes, and so repurposing it to also use it for ZIR decl indexes makes for efficient memory usage. Similarly, this commit fixes up incorrect memory management by removing the `const` ZIR instruction. The two places it was used stored memory in the AstGen arena, which may get freed after Sema. Now it properly sets up a new anonymous Decl for error sets and uses a normal decl_val instruction. The other usage of `const` ZIR instruction was float literals. These are now changed to use `float` ZIR instruction when the value fits inside `zir.Inst.Data` and `float128` otherwise. AstGen + Sema: implement int_to_enum and enum_to_int. No tests yet; I expect to have to make some fixes before they will pass tests. Will do that in the branch before merging. AstGen: fix struct astgen incorrectly counting decls as fields. Type/Value: give up on trying to exhaustively list every tag all the time. This makes the file more manageable. Also found a bug with i128/u128 this way, since the name of the function was more obvious when looking at the tag values. Type: implement abiAlignment and abiSize for structs. This will need to get more sophisticated at some point, but for now it is progress. Value: add new `enum_field_index` tag. Value: add hash_u32, needed when using ArrayHashMap.
2021-04-06 17:43:56 -07:00
owner_decl: *Decl,
/// An integer type which is used for the numerical value of the enum.
/// Whether zig chooses this type or the user specifies it, it is stored here.
tag_ty: Type,
/// Set of field names in declaration order.
fields: NameMap,
stage2: implement simple enums A simple enum is an enum which has an automatic integer tag type, all tag values automatically assigned, and no top level declarations. Such enums are created directly in AstGen and shared by all the generic/comptime instantiations of the surrounding ZIR code. This commit implements, but does not yet add any test cases for, simple enums. A full enum is an enum for which any of the above conditions are not true. Full enums are created in Sema, and therefore will create a unique type per generic/comptime instantiation. This commit does not implement full enums. However the `enum_decl_nonexhaustive` ZIR instruction is added and the respective Type functions are filled out. This commit makes an improvement to ZIR code, removing the decls array and removing the decl_map from AstGen. Instead, decl_ref and decl_val ZIR instructions index into the `owner_decl.dependencies` ArrayHashMap. We already need this dependencies array for incremental compilation purposes, and so repurposing it to also use it for ZIR decl indexes makes for efficient memory usage. Similarly, this commit fixes up incorrect memory management by removing the `const` ZIR instruction. The two places it was used stored memory in the AstGen arena, which may get freed after Sema. Now it properly sets up a new anonymous Decl for error sets and uses a normal decl_val instruction. The other usage of `const` ZIR instruction was float literals. These are now changed to use `float` ZIR instruction when the value fits inside `zir.Inst.Data` and `float128` otherwise. AstGen + Sema: implement int_to_enum and enum_to_int. No tests yet; I expect to have to make some fixes before they will pass tests. Will do that in the branch before merging. AstGen: fix struct astgen incorrectly counting decls as fields. Type/Value: give up on trying to exhaustively list every tag all the time. This makes the file more manageable. Also found a bug with i128/u128 this way, since the name of the function was more obvious when looking at the tag values. Type: implement abiAlignment and abiSize for structs. This will need to get more sophisticated at some point, but for now it is progress. Value: add new `enum_field_index` tag. Value: add hash_u32, needed when using ArrayHashMap.
2021-04-06 17:43:56 -07:00
/// Maps integer tag value to field index.
/// Entries are in declaration order, same as `fields`.
/// If this hash map is empty, it means the enum tags are auto-numbered.
values: ValueMap,
/// Represents the declarations inside this enum.
namespace: Namespace,
stage2: implement simple enums A simple enum is an enum which has an automatic integer tag type, all tag values automatically assigned, and no top level declarations. Such enums are created directly in AstGen and shared by all the generic/comptime instantiations of the surrounding ZIR code. This commit implements, but does not yet add any test cases for, simple enums. A full enum is an enum for which any of the above conditions are not true. Full enums are created in Sema, and therefore will create a unique type per generic/comptime instantiation. This commit does not implement full enums. However the `enum_decl_nonexhaustive` ZIR instruction is added and the respective Type functions are filled out. This commit makes an improvement to ZIR code, removing the decls array and removing the decl_map from AstGen. Instead, decl_ref and decl_val ZIR instructions index into the `owner_decl.dependencies` ArrayHashMap. We already need this dependencies array for incremental compilation purposes, and so repurposing it to also use it for ZIR decl indexes makes for efficient memory usage. Similarly, this commit fixes up incorrect memory management by removing the `const` ZIR instruction. The two places it was used stored memory in the AstGen arena, which may get freed after Sema. Now it properly sets up a new anonymous Decl for error sets and uses a normal decl_val instruction. The other usage of `const` ZIR instruction was float literals. These are now changed to use `float` ZIR instruction when the value fits inside `zir.Inst.Data` and `float128` otherwise. AstGen + Sema: implement int_to_enum and enum_to_int. No tests yet; I expect to have to make some fixes before they will pass tests. Will do that in the branch before merging. AstGen: fix struct astgen incorrectly counting decls as fields. Type/Value: give up on trying to exhaustively list every tag all the time. This makes the file more manageable. Also found a bug with i128/u128 this way, since the name of the function was more obvious when looking at the tag values. Type: implement abiAlignment and abiSize for structs. This will need to get more sophisticated at some point, but for now it is progress. Value: add new `enum_field_index` tag. Value: add hash_u32, needed when using ArrayHashMap.
2021-04-06 17:43:56 -07:00
/// Offset from `owner_decl`, points to the enum decl AST node.
node_offset: i32,
pub const NameMap = std.StringArrayHashMapUnmanaged(void);
pub const ValueMap = std.ArrayHashMapUnmanaged(Value, void, Value.ArrayHashContext, false);
pub fn srcLoc(self: EnumFull) SrcLoc {
return .{
.file_scope = self.owner_decl.getFileScope(),
.parent_decl_node = self.owner_decl.src_node,
.lazy = .{ .node_offset = self.node_offset },
};
}
stage2: implement simple enums A simple enum is an enum which has an automatic integer tag type, all tag values automatically assigned, and no top level declarations. Such enums are created directly in AstGen and shared by all the generic/comptime instantiations of the surrounding ZIR code. This commit implements, but does not yet add any test cases for, simple enums. A full enum is an enum for which any of the above conditions are not true. Full enums are created in Sema, and therefore will create a unique type per generic/comptime instantiation. This commit does not implement full enums. However the `enum_decl_nonexhaustive` ZIR instruction is added and the respective Type functions are filled out. This commit makes an improvement to ZIR code, removing the decls array and removing the decl_map from AstGen. Instead, decl_ref and decl_val ZIR instructions index into the `owner_decl.dependencies` ArrayHashMap. We already need this dependencies array for incremental compilation purposes, and so repurposing it to also use it for ZIR decl indexes makes for efficient memory usage. Similarly, this commit fixes up incorrect memory management by removing the `const` ZIR instruction. The two places it was used stored memory in the AstGen arena, which may get freed after Sema. Now it properly sets up a new anonymous Decl for error sets and uses a normal decl_val instruction. The other usage of `const` ZIR instruction was float literals. These are now changed to use `float` ZIR instruction when the value fits inside `zir.Inst.Data` and `float128` otherwise. AstGen + Sema: implement int_to_enum and enum_to_int. No tests yet; I expect to have to make some fixes before they will pass tests. Will do that in the branch before merging. AstGen: fix struct astgen incorrectly counting decls as fields. Type/Value: give up on trying to exhaustively list every tag all the time. This makes the file more manageable. Also found a bug with i128/u128 this way, since the name of the function was more obvious when looking at the tag values. Type: implement abiAlignment and abiSize for structs. This will need to get more sophisticated at some point, but for now it is progress. Value: add new `enum_field_index` tag. Value: add hash_u32, needed when using ArrayHashMap.
2021-04-06 17:43:56 -07:00
};
pub const Union = struct {
/// The Decl that corresponds to the union itself.
owner_decl: *Decl,
/// An enum type which is used for the tag of the union.
/// This type is created even for untagged unions, even when the memory
/// layout does not store the tag.
/// Whether zig chooses this type or the user specifies it, it is stored here.
/// This will be set to the null type until status is `have_field_types`.
tag_ty: Type,
/// Set of field names in declaration order.
stage2: support nested structs and arrays and sret * Add AIR instructions: ret_ptr, ret_load - This allows Sema to be blissfully unaware of the backend's decision to implement by-val/by-ref semantics for struct/union/array types. Backends can lower these simply as alloc, load, ret instructions, or they can take advantage of them to use a result pointer. * Add AIR instruction: array_elem_val - Allows for better codegen for `Sema.elemVal`. * Implement calculation of ABI alignment and ABI size for unions. * Before appending the following AIR instructions to a block, resolveTypeLayout is called on the type: - call - return type - ret - return type - store_ptr - elem type * Sema: fix memory leak in `zirArrayInit` and other cleanups to this function. * x86_64: implement the full x86_64 C ABI according to the spec * Type: implement `intInfo` for error sets. * Type: implement `intTagType` for tagged unions. The Zig type tag `Fn` is now used exclusively for function bodies. Function pointers are modeled as `*const T` where `T` is a `Fn` type. * The `call` AIR instruction now allows a function pointer operand as well as a function operand. * Sema now has a coercion from function body to function pointer. * Function type syntax, e.g. `fn()void`, now returns zig tag type of Pointer with child Fn, rather than Fn directly. - I think this should probably be reverted. Will discuss the lang specs before doing this. Idea being that function pointers would need to be specified as `*const fn()void` rather than `fn() void`. LLVM backend: * Enable calling the panic handler (previously this just emitted `@breakpoint()` since the backend could not handle the panic function). * Implement sret * Introduce `isByRef` and implement it for structs and arrays. Types that are `isByRef` are now passed as pointers to functions, and e.g. `elem_val` will return a pointer instead of doing a load. * Move the function type creating code from `resolveLlvmFunction` to `llvmType` where it belongs; now there is only 1 instance of this logic instead of two. * Add the `nonnull` attribute to non-optional pointer parameters. * Fix `resolveGlobalDecl` not using fully-qualified names and not using the `decl_map`. * Implement `genTypedValue` for pointer-like optionals. * Fix memory leak when lowering `block` instruction and OOM occurs. * Implement volatile checks where relevant.
2021-10-11 11:00:32 -07:00
fields: Fields,
/// Represents the declarations inside this union.
namespace: Namespace,
/// Offset from `owner_decl`, points to the union decl AST node.
node_offset: i32,
/// Index of the union_decl ZIR instruction.
zir_index: Zir.Inst.Index,
layout: std.builtin.TypeInfo.ContainerLayout,
status: enum {
none,
field_types_wip,
have_field_types,
layout_wip,
have_layout,
},
pub const Field = struct {
/// undefined until `status` is `have_field_types` or `have_layout`.
ty: Type,
abi_align: Value,
};
stage2: support nested structs and arrays and sret * Add AIR instructions: ret_ptr, ret_load - This allows Sema to be blissfully unaware of the backend's decision to implement by-val/by-ref semantics for struct/union/array types. Backends can lower these simply as alloc, load, ret instructions, or they can take advantage of them to use a result pointer. * Add AIR instruction: array_elem_val - Allows for better codegen for `Sema.elemVal`. * Implement calculation of ABI alignment and ABI size for unions. * Before appending the following AIR instructions to a block, resolveTypeLayout is called on the type: - call - return type - ret - return type - store_ptr - elem type * Sema: fix memory leak in `zirArrayInit` and other cleanups to this function. * x86_64: implement the full x86_64 C ABI according to the spec * Type: implement `intInfo` for error sets. * Type: implement `intTagType` for tagged unions. The Zig type tag `Fn` is now used exclusively for function bodies. Function pointers are modeled as `*const T` where `T` is a `Fn` type. * The `call` AIR instruction now allows a function pointer operand as well as a function operand. * Sema now has a coercion from function body to function pointer. * Function type syntax, e.g. `fn()void`, now returns zig tag type of Pointer with child Fn, rather than Fn directly. - I think this should probably be reverted. Will discuss the lang specs before doing this. Idea being that function pointers would need to be specified as `*const fn()void` rather than `fn() void`. LLVM backend: * Enable calling the panic handler (previously this just emitted `@breakpoint()` since the backend could not handle the panic function). * Implement sret * Introduce `isByRef` and implement it for structs and arrays. Types that are `isByRef` are now passed as pointers to functions, and e.g. `elem_val` will return a pointer instead of doing a load. * Move the function type creating code from `resolveLlvmFunction` to `llvmType` where it belongs; now there is only 1 instance of this logic instead of two. * Add the `nonnull` attribute to non-optional pointer parameters. * Fix `resolveGlobalDecl` not using fully-qualified names and not using the `decl_map`. * Implement `genTypedValue` for pointer-like optionals. * Fix memory leak when lowering `block` instruction and OOM occurs. * Implement volatile checks where relevant.
2021-10-11 11:00:32 -07:00
pub const Fields = std.StringArrayHashMapUnmanaged(Field);
pub fn getFullyQualifiedName(s: *Union, gpa: *Allocator) ![:0]u8 {
return s.owner_decl.getFullyQualifiedName(gpa);
}
pub fn srcLoc(self: Union) SrcLoc {
return .{
.file_scope = self.owner_decl.getFileScope(),
.parent_decl_node = self.owner_decl.src_node,
.lazy = .{ .node_offset = self.node_offset },
};
}
pub fn haveFieldTypes(u: Union) bool {
return switch (u.status) {
.none,
.field_types_wip,
=> false,
.have_field_types,
.layout_wip,
.have_layout,
=> true,
};
}
pub fn hasAllZeroBitFieldTypes(u: Union) bool {
assert(u.haveFieldTypes());
for (u.fields.values()) |field| {
if (field.ty.hasCodeGenBits()) return false;
}
return true;
}
pub fn mostAlignedField(u: Union, target: Target) u32 {
assert(u.haveFieldTypes());
stage2: support nested structs and arrays and sret * Add AIR instructions: ret_ptr, ret_load - This allows Sema to be blissfully unaware of the backend's decision to implement by-val/by-ref semantics for struct/union/array types. Backends can lower these simply as alloc, load, ret instructions, or they can take advantage of them to use a result pointer. * Add AIR instruction: array_elem_val - Allows for better codegen for `Sema.elemVal`. * Implement calculation of ABI alignment and ABI size for unions. * Before appending the following AIR instructions to a block, resolveTypeLayout is called on the type: - call - return type - ret - return type - store_ptr - elem type * Sema: fix memory leak in `zirArrayInit` and other cleanups to this function. * x86_64: implement the full x86_64 C ABI according to the spec * Type: implement `intInfo` for error sets. * Type: implement `intTagType` for tagged unions. The Zig type tag `Fn` is now used exclusively for function bodies. Function pointers are modeled as `*const T` where `T` is a `Fn` type. * The `call` AIR instruction now allows a function pointer operand as well as a function operand. * Sema now has a coercion from function body to function pointer. * Function type syntax, e.g. `fn()void`, now returns zig tag type of Pointer with child Fn, rather than Fn directly. - I think this should probably be reverted. Will discuss the lang specs before doing this. Idea being that function pointers would need to be specified as `*const fn()void` rather than `fn() void`. LLVM backend: * Enable calling the panic handler (previously this just emitted `@breakpoint()` since the backend could not handle the panic function). * Implement sret * Introduce `isByRef` and implement it for structs and arrays. Types that are `isByRef` are now passed as pointers to functions, and e.g. `elem_val` will return a pointer instead of doing a load. * Move the function type creating code from `resolveLlvmFunction` to `llvmType` where it belongs; now there is only 1 instance of this logic instead of two. * Add the `nonnull` attribute to non-optional pointer parameters. * Fix `resolveGlobalDecl` not using fully-qualified names and not using the `decl_map`. * Implement `genTypedValue` for pointer-like optionals. * Fix memory leak when lowering `block` instruction and OOM occurs. * Implement volatile checks where relevant.
2021-10-11 11:00:32 -07:00
var most_alignment: u32 = 0;
var most_index: usize = undefined;
for (u.fields.values()) |field, i| {
if (!field.ty.hasCodeGenBits()) continue;
stage2: support nested structs and arrays and sret * Add AIR instructions: ret_ptr, ret_load - This allows Sema to be blissfully unaware of the backend's decision to implement by-val/by-ref semantics for struct/union/array types. Backends can lower these simply as alloc, load, ret instructions, or they can take advantage of them to use a result pointer. * Add AIR instruction: array_elem_val - Allows for better codegen for `Sema.elemVal`. * Implement calculation of ABI alignment and ABI size for unions. * Before appending the following AIR instructions to a block, resolveTypeLayout is called on the type: - call - return type - ret - return type - store_ptr - elem type * Sema: fix memory leak in `zirArrayInit` and other cleanups to this function. * x86_64: implement the full x86_64 C ABI according to the spec * Type: implement `intInfo` for error sets. * Type: implement `intTagType` for tagged unions. The Zig type tag `Fn` is now used exclusively for function bodies. Function pointers are modeled as `*const T` where `T` is a `Fn` type. * The `call` AIR instruction now allows a function pointer operand as well as a function operand. * Sema now has a coercion from function body to function pointer. * Function type syntax, e.g. `fn()void`, now returns zig tag type of Pointer with child Fn, rather than Fn directly. - I think this should probably be reverted. Will discuss the lang specs before doing this. Idea being that function pointers would need to be specified as `*const fn()void` rather than `fn() void`. LLVM backend: * Enable calling the panic handler (previously this just emitted `@breakpoint()` since the backend could not handle the panic function). * Implement sret * Introduce `isByRef` and implement it for structs and arrays. Types that are `isByRef` are now passed as pointers to functions, and e.g. `elem_val` will return a pointer instead of doing a load. * Move the function type creating code from `resolveLlvmFunction` to `llvmType` where it belongs; now there is only 1 instance of this logic instead of two. * Add the `nonnull` attribute to non-optional pointer parameters. * Fix `resolveGlobalDecl` not using fully-qualified names and not using the `decl_map`. * Implement `genTypedValue` for pointer-like optionals. * Fix memory leak when lowering `block` instruction and OOM occurs. * Implement volatile checks where relevant.
2021-10-11 11:00:32 -07:00
const field_align = a: {
if (field.abi_align.tag() == .abi_align_default) {
break :a field.ty.abiAlignment(target);
} else {
break :a @intCast(u32, field.abi_align.toUnsignedInt());
}
};
if (field_align > most_alignment) {
most_alignment = field_align;
most_index = i;
}
}
return @intCast(u32, most_index);
}
stage2: support nested structs and arrays and sret * Add AIR instructions: ret_ptr, ret_load - This allows Sema to be blissfully unaware of the backend's decision to implement by-val/by-ref semantics for struct/union/array types. Backends can lower these simply as alloc, load, ret instructions, or they can take advantage of them to use a result pointer. * Add AIR instruction: array_elem_val - Allows for better codegen for `Sema.elemVal`. * Implement calculation of ABI alignment and ABI size for unions. * Before appending the following AIR instructions to a block, resolveTypeLayout is called on the type: - call - return type - ret - return type - store_ptr - elem type * Sema: fix memory leak in `zirArrayInit` and other cleanups to this function. * x86_64: implement the full x86_64 C ABI according to the spec * Type: implement `intInfo` for error sets. * Type: implement `intTagType` for tagged unions. The Zig type tag `Fn` is now used exclusively for function bodies. Function pointers are modeled as `*const T` where `T` is a `Fn` type. * The `call` AIR instruction now allows a function pointer operand as well as a function operand. * Sema now has a coercion from function body to function pointer. * Function type syntax, e.g. `fn()void`, now returns zig tag type of Pointer with child Fn, rather than Fn directly. - I think this should probably be reverted. Will discuss the lang specs before doing this. Idea being that function pointers would need to be specified as `*const fn()void` rather than `fn() void`. LLVM backend: * Enable calling the panic handler (previously this just emitted `@breakpoint()` since the backend could not handle the panic function). * Implement sret * Introduce `isByRef` and implement it for structs and arrays. Types that are `isByRef` are now passed as pointers to functions, and e.g. `elem_val` will return a pointer instead of doing a load. * Move the function type creating code from `resolveLlvmFunction` to `llvmType` where it belongs; now there is only 1 instance of this logic instead of two. * Add the `nonnull` attribute to non-optional pointer parameters. * Fix `resolveGlobalDecl` not using fully-qualified names and not using the `decl_map`. * Implement `genTypedValue` for pointer-like optionals. * Fix memory leak when lowering `block` instruction and OOM occurs. * Implement volatile checks where relevant.
2021-10-11 11:00:32 -07:00
pub fn abiAlignment(u: Union, target: Target, have_tag: bool) u32 {
var max_align: u32 = 0;
if (have_tag) max_align = u.tag_ty.abiAlignment(target);
for (u.fields.values()) |field| {
if (!field.ty.hasCodeGenBits()) continue;
const field_align = a: {
if (field.abi_align.tag() == .abi_align_default) {
break :a field.ty.abiAlignment(target);
} else {
break :a @intCast(u32, field.abi_align.toUnsignedInt());
}
};
max_align = @maximum(max_align, field_align);
}
assert(max_align != 0);
return max_align;
}
pub fn abiSize(u: Union, target: Target, have_tag: bool) u64 {
return u.getLayout(target, have_tag).abi_size;
}
pub const Layout = struct {
abi_size: u64,
abi_align: u32,
most_aligned_field: u32,
most_aligned_field_size: u64,
biggest_field: u32,
payload_size: u64,
payload_align: u32,
tag_align: u32,
tag_size: u64,
};
pub fn getLayout(u: Union, target: Target, have_tag: bool) Layout {
assert(u.status == .have_layout);
stage2: support nested structs and arrays and sret * Add AIR instructions: ret_ptr, ret_load - This allows Sema to be blissfully unaware of the backend's decision to implement by-val/by-ref semantics for struct/union/array types. Backends can lower these simply as alloc, load, ret instructions, or they can take advantage of them to use a result pointer. * Add AIR instruction: array_elem_val - Allows for better codegen for `Sema.elemVal`. * Implement calculation of ABI alignment and ABI size for unions. * Before appending the following AIR instructions to a block, resolveTypeLayout is called on the type: - call - return type - ret - return type - store_ptr - elem type * Sema: fix memory leak in `zirArrayInit` and other cleanups to this function. * x86_64: implement the full x86_64 C ABI according to the spec * Type: implement `intInfo` for error sets. * Type: implement `intTagType` for tagged unions. The Zig type tag `Fn` is now used exclusively for function bodies. Function pointers are modeled as `*const T` where `T` is a `Fn` type. * The `call` AIR instruction now allows a function pointer operand as well as a function operand. * Sema now has a coercion from function body to function pointer. * Function type syntax, e.g. `fn()void`, now returns zig tag type of Pointer with child Fn, rather than Fn directly. - I think this should probably be reverted. Will discuss the lang specs before doing this. Idea being that function pointers would need to be specified as `*const fn()void` rather than `fn() void`. LLVM backend: * Enable calling the panic handler (previously this just emitted `@breakpoint()` since the backend could not handle the panic function). * Implement sret * Introduce `isByRef` and implement it for structs and arrays. Types that are `isByRef` are now passed as pointers to functions, and e.g. `elem_val` will return a pointer instead of doing a load. * Move the function type creating code from `resolveLlvmFunction` to `llvmType` where it belongs; now there is only 1 instance of this logic instead of two. * Add the `nonnull` attribute to non-optional pointer parameters. * Fix `resolveGlobalDecl` not using fully-qualified names and not using the `decl_map`. * Implement `genTypedValue` for pointer-like optionals. * Fix memory leak when lowering `block` instruction and OOM occurs. * Implement volatile checks where relevant.
2021-10-11 11:00:32 -07:00
const is_packed = u.layout == .Packed;
if (is_packed) @panic("TODO packed unions");
var most_aligned_field: usize = undefined;
var most_aligned_field_size: u64 = undefined;
var biggest_field: usize = undefined;
stage2: support nested structs and arrays and sret * Add AIR instructions: ret_ptr, ret_load - This allows Sema to be blissfully unaware of the backend's decision to implement by-val/by-ref semantics for struct/union/array types. Backends can lower these simply as alloc, load, ret instructions, or they can take advantage of them to use a result pointer. * Add AIR instruction: array_elem_val - Allows for better codegen for `Sema.elemVal`. * Implement calculation of ABI alignment and ABI size for unions. * Before appending the following AIR instructions to a block, resolveTypeLayout is called on the type: - call - return type - ret - return type - store_ptr - elem type * Sema: fix memory leak in `zirArrayInit` and other cleanups to this function. * x86_64: implement the full x86_64 C ABI according to the spec * Type: implement `intInfo` for error sets. * Type: implement `intTagType` for tagged unions. The Zig type tag `Fn` is now used exclusively for function bodies. Function pointers are modeled as `*const T` where `T` is a `Fn` type. * The `call` AIR instruction now allows a function pointer operand as well as a function operand. * Sema now has a coercion from function body to function pointer. * Function type syntax, e.g. `fn()void`, now returns zig tag type of Pointer with child Fn, rather than Fn directly. - I think this should probably be reverted. Will discuss the lang specs before doing this. Idea being that function pointers would need to be specified as `*const fn()void` rather than `fn() void`. LLVM backend: * Enable calling the panic handler (previously this just emitted `@breakpoint()` since the backend could not handle the panic function). * Implement sret * Introduce `isByRef` and implement it for structs and arrays. Types that are `isByRef` are now passed as pointers to functions, and e.g. `elem_val` will return a pointer instead of doing a load. * Move the function type creating code from `resolveLlvmFunction` to `llvmType` where it belongs; now there is only 1 instance of this logic instead of two. * Add the `nonnull` attribute to non-optional pointer parameters. * Fix `resolveGlobalDecl` not using fully-qualified names and not using the `decl_map`. * Implement `genTypedValue` for pointer-like optionals. * Fix memory leak when lowering `block` instruction and OOM occurs. * Implement volatile checks where relevant.
2021-10-11 11:00:32 -07:00
var payload_size: u64 = 0;
var payload_align: u32 = 0;
for (u.fields.values()) |field, i| {
stage2: support nested structs and arrays and sret * Add AIR instructions: ret_ptr, ret_load - This allows Sema to be blissfully unaware of the backend's decision to implement by-val/by-ref semantics for struct/union/array types. Backends can lower these simply as alloc, load, ret instructions, or they can take advantage of them to use a result pointer. * Add AIR instruction: array_elem_val - Allows for better codegen for `Sema.elemVal`. * Implement calculation of ABI alignment and ABI size for unions. * Before appending the following AIR instructions to a block, resolveTypeLayout is called on the type: - call - return type - ret - return type - store_ptr - elem type * Sema: fix memory leak in `zirArrayInit` and other cleanups to this function. * x86_64: implement the full x86_64 C ABI according to the spec * Type: implement `intInfo` for error sets. * Type: implement `intTagType` for tagged unions. The Zig type tag `Fn` is now used exclusively for function bodies. Function pointers are modeled as `*const T` where `T` is a `Fn` type. * The `call` AIR instruction now allows a function pointer operand as well as a function operand. * Sema now has a coercion from function body to function pointer. * Function type syntax, e.g. `fn()void`, now returns zig tag type of Pointer with child Fn, rather than Fn directly. - I think this should probably be reverted. Will discuss the lang specs before doing this. Idea being that function pointers would need to be specified as `*const fn()void` rather than `fn() void`. LLVM backend: * Enable calling the panic handler (previously this just emitted `@breakpoint()` since the backend could not handle the panic function). * Implement sret * Introduce `isByRef` and implement it for structs and arrays. Types that are `isByRef` are now passed as pointers to functions, and e.g. `elem_val` will return a pointer instead of doing a load. * Move the function type creating code from `resolveLlvmFunction` to `llvmType` where it belongs; now there is only 1 instance of this logic instead of two. * Add the `nonnull` attribute to non-optional pointer parameters. * Fix `resolveGlobalDecl` not using fully-qualified names and not using the `decl_map`. * Implement `genTypedValue` for pointer-like optionals. * Fix memory leak when lowering `block` instruction and OOM occurs. * Implement volatile checks where relevant.
2021-10-11 11:00:32 -07:00
if (!field.ty.hasCodeGenBits()) continue;
const field_align = a: {
if (field.abi_align.tag() == .abi_align_default) {
break :a field.ty.abiAlignment(target);
} else {
break :a @intCast(u32, field.abi_align.toUnsignedInt());
}
};
const field_size = field.ty.abiSize(target);
if (field_size > payload_size) {
payload_size = field_size;
biggest_field = i;
}
if (field_align > payload_align) {
payload_align = field_align;
most_aligned_field = i;
most_aligned_field_size = field_size;
}
stage2: support nested structs and arrays and sret * Add AIR instructions: ret_ptr, ret_load - This allows Sema to be blissfully unaware of the backend's decision to implement by-val/by-ref semantics for struct/union/array types. Backends can lower these simply as alloc, load, ret instructions, or they can take advantage of them to use a result pointer. * Add AIR instruction: array_elem_val - Allows for better codegen for `Sema.elemVal`. * Implement calculation of ABI alignment and ABI size for unions. * Before appending the following AIR instructions to a block, resolveTypeLayout is called on the type: - call - return type - ret - return type - store_ptr - elem type * Sema: fix memory leak in `zirArrayInit` and other cleanups to this function. * x86_64: implement the full x86_64 C ABI according to the spec * Type: implement `intInfo` for error sets. * Type: implement `intTagType` for tagged unions. The Zig type tag `Fn` is now used exclusively for function bodies. Function pointers are modeled as `*const T` where `T` is a `Fn` type. * The `call` AIR instruction now allows a function pointer operand as well as a function operand. * Sema now has a coercion from function body to function pointer. * Function type syntax, e.g. `fn()void`, now returns zig tag type of Pointer with child Fn, rather than Fn directly. - I think this should probably be reverted. Will discuss the lang specs before doing this. Idea being that function pointers would need to be specified as `*const fn()void` rather than `fn() void`. LLVM backend: * Enable calling the panic handler (previously this just emitted `@breakpoint()` since the backend could not handle the panic function). * Implement sret * Introduce `isByRef` and implement it for structs and arrays. Types that are `isByRef` are now passed as pointers to functions, and e.g. `elem_val` will return a pointer instead of doing a load. * Move the function type creating code from `resolveLlvmFunction` to `llvmType` where it belongs; now there is only 1 instance of this logic instead of two. * Add the `nonnull` attribute to non-optional pointer parameters. * Fix `resolveGlobalDecl` not using fully-qualified names and not using the `decl_map`. * Implement `genTypedValue` for pointer-like optionals. * Fix memory leak when lowering `block` instruction and OOM occurs. * Implement volatile checks where relevant.
2021-10-11 11:00:32 -07:00
}
if (!have_tag) return .{
.abi_size = std.mem.alignForwardGeneric(u64, payload_size, payload_align),
.abi_align = payload_align,
.most_aligned_field = @intCast(u32, most_aligned_field),
.most_aligned_field_size = most_aligned_field_size,
.biggest_field = @intCast(u32, biggest_field),
.payload_size = payload_size,
.payload_align = payload_align,
.tag_align = 0,
.tag_size = 0,
};
stage2: support nested structs and arrays and sret * Add AIR instructions: ret_ptr, ret_load - This allows Sema to be blissfully unaware of the backend's decision to implement by-val/by-ref semantics for struct/union/array types. Backends can lower these simply as alloc, load, ret instructions, or they can take advantage of them to use a result pointer. * Add AIR instruction: array_elem_val - Allows for better codegen for `Sema.elemVal`. * Implement calculation of ABI alignment and ABI size for unions. * Before appending the following AIR instructions to a block, resolveTypeLayout is called on the type: - call - return type - ret - return type - store_ptr - elem type * Sema: fix memory leak in `zirArrayInit` and other cleanups to this function. * x86_64: implement the full x86_64 C ABI according to the spec * Type: implement `intInfo` for error sets. * Type: implement `intTagType` for tagged unions. The Zig type tag `Fn` is now used exclusively for function bodies. Function pointers are modeled as `*const T` where `T` is a `Fn` type. * The `call` AIR instruction now allows a function pointer operand as well as a function operand. * Sema now has a coercion from function body to function pointer. * Function type syntax, e.g. `fn()void`, now returns zig tag type of Pointer with child Fn, rather than Fn directly. - I think this should probably be reverted. Will discuss the lang specs before doing this. Idea being that function pointers would need to be specified as `*const fn()void` rather than `fn() void`. LLVM backend: * Enable calling the panic handler (previously this just emitted `@breakpoint()` since the backend could not handle the panic function). * Implement sret * Introduce `isByRef` and implement it for structs and arrays. Types that are `isByRef` are now passed as pointers to functions, and e.g. `elem_val` will return a pointer instead of doing a load. * Move the function type creating code from `resolveLlvmFunction` to `llvmType` where it belongs; now there is only 1 instance of this logic instead of two. * Add the `nonnull` attribute to non-optional pointer parameters. * Fix `resolveGlobalDecl` not using fully-qualified names and not using the `decl_map`. * Implement `genTypedValue` for pointer-like optionals. * Fix memory leak when lowering `block` instruction and OOM occurs. * Implement volatile checks where relevant.
2021-10-11 11:00:32 -07:00
// Put the tag before or after the payload depending on which one's
// alignment is greater.
const tag_size = u.tag_ty.abiSize(target);
const tag_align = u.tag_ty.abiAlignment(target);
var size: u64 = 0;
if (tag_align >= payload_align) {
// {Tag, Payload}
size += tag_size;
size = std.mem.alignForwardGeneric(u64, size, payload_align);
size += payload_size;
size = std.mem.alignForwardGeneric(u64, size, tag_align);
} else {
// {Payload, Tag}
size += payload_size;
size = std.mem.alignForwardGeneric(u64, size, tag_align);
size += tag_size;
size = std.mem.alignForwardGeneric(u64, size, payload_align);
}
return .{
.abi_size = size,
.abi_align = @maximum(tag_align, payload_align),
.most_aligned_field = @intCast(u32, most_aligned_field),
.most_aligned_field_size = most_aligned_field_size,
.biggest_field = @intCast(u32, biggest_field),
.payload_size = payload_size,
.payload_align = payload_align,
.tag_align = tag_align,
.tag_size = tag_size,
};
stage2: support nested structs and arrays and sret * Add AIR instructions: ret_ptr, ret_load - This allows Sema to be blissfully unaware of the backend's decision to implement by-val/by-ref semantics for struct/union/array types. Backends can lower these simply as alloc, load, ret instructions, or they can take advantage of them to use a result pointer. * Add AIR instruction: array_elem_val - Allows for better codegen for `Sema.elemVal`. * Implement calculation of ABI alignment and ABI size for unions. * Before appending the following AIR instructions to a block, resolveTypeLayout is called on the type: - call - return type - ret - return type - store_ptr - elem type * Sema: fix memory leak in `zirArrayInit` and other cleanups to this function. * x86_64: implement the full x86_64 C ABI according to the spec * Type: implement `intInfo` for error sets. * Type: implement `intTagType` for tagged unions. The Zig type tag `Fn` is now used exclusively for function bodies. Function pointers are modeled as `*const T` where `T` is a `Fn` type. * The `call` AIR instruction now allows a function pointer operand as well as a function operand. * Sema now has a coercion from function body to function pointer. * Function type syntax, e.g. `fn()void`, now returns zig tag type of Pointer with child Fn, rather than Fn directly. - I think this should probably be reverted. Will discuss the lang specs before doing this. Idea being that function pointers would need to be specified as `*const fn()void` rather than `fn() void`. LLVM backend: * Enable calling the panic handler (previously this just emitted `@breakpoint()` since the backend could not handle the panic function). * Implement sret * Introduce `isByRef` and implement it for structs and arrays. Types that are `isByRef` are now passed as pointers to functions, and e.g. `elem_val` will return a pointer instead of doing a load. * Move the function type creating code from `resolveLlvmFunction` to `llvmType` where it belongs; now there is only 1 instance of this logic instead of two. * Add the `nonnull` attribute to non-optional pointer parameters. * Fix `resolveGlobalDecl` not using fully-qualified names and not using the `decl_map`. * Implement `genTypedValue` for pointer-like optionals. * Fix memory leak when lowering `block` instruction and OOM occurs. * Implement volatile checks where relevant.
2021-10-11 11:00:32 -07:00
}
};
pub const Opaque = struct {
/// The Decl that corresponds to the opaque itself.
owner_decl: *Decl,
/// Represents the declarations inside this opaque.
namespace: Namespace,
/// Offset from `owner_decl`, points to the opaque decl AST node.
node_offset: i32,
pub fn srcLoc(self: Opaque) SrcLoc {
return .{
.file_scope = self.owner_decl.getFileScope(),
.parent_decl_node = self.owner_decl.src_node,
.lazy = .{ .node_offset = self.node_offset },
};
}
pub fn getFullyQualifiedName(s: *Opaque, gpa: *Allocator) ![:0]u8 {
return s.owner_decl.getFullyQualifiedName(gpa);
}
};
/// Some Fn struct memory is owned by the Decl's TypedValue.Managed arena allocator.
/// Extern functions do not have this data structure; they are represented by
/// the `Decl` only, with a `Value` tag of `extern_fn`.
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
pub const Fn = struct {
/// The Decl that corresponds to the function itself.
owner_decl: *Decl,
stage2: basic generic functions are working The general strategy is that Sema will pre-map comptime arguments into the inst_map, and then re-run the block body that contains the `param` and `func` instructions. This re-runs all the parameter type expressions except with comptime values populated. In Sema, param instructions are now handled specially: they detect whether they are comptime-elided or not. If so, they skip putting a value in the inst_map, since it is already pre-populated. If not, then they append to the `fields` field of `Sema` for use with the `func` instruction. So when the block body is re-run, a new function is generated with all the comptime arguments elided, and the new function type has only runtime parameters in it. TODO: give the generated Decls better names than "foo__anon_x". The new function is then added to the work queue to have its body analyzed and a runtime call AIR instruction to the new function is emitted. When the new function gets semantically analyzed, comptime parameters are pre-mapped to the corresponding `comptime_args` values rather than mapped to an `arg` AIR instruction. `comptime_args` is a new field that `Fn` has which is a `TypedValue` for each parameter. This field is non-null for generic function instantiations only. The values are the comptime arguments. For non-comptime parameters, a sentinel value is used. This is because we need to know the information of which parameters are comptime-known. Additionally: * AstGen: align and section expressions are evaluated in the scope that has comptime parameters in it. There are still some TODO items left; see the BRANCH_TODO file.
2021-08-03 22:34:22 -07:00
/// If this is not null, this function is a generic function instantiation, and
stage2: fix generics with non-comptime anytype parameters The `comptime_args` field of Fn has a clarified purpose: For generic function instantiations, there is a `TypedValue` here for each parameter of the function: * Non-comptime parameters are marked with a `generic_poison` for the value. * Non-anytype parameters are marked with a `generic_poison` for the type. Sema now has a `fn_ret_ty` field. Doc comments reproduced here: > When semantic analysis needs to know the return type of the function whose body > is being analyzed, this `Type` should be used instead of going through `func`. > This will correctly handle the case of a comptime/inline function call of a > generic function which uses a type expression for the return type. > The type will be `void` in the case that `func` is `null`. Various places in Sema are modified in accordance with this guidance. Fixed `resolveMaybeUndefVal` not returning `error.GenericPoison` when Value Tag of `generic_poison` is encountered. Fixed generic function memoization incorrect equality checking. The logic now clearly deals properly with any combination of anytype and comptime parameters. Fixed not removing generic function instantiation from the table in case a compile errors in the rest of `call` semantic analysis. This required introduction of yet another adapter which I have called `GenericRemoveAdapter`. This one is nice and simple - it's the same hash function (the same precomputed hash is passed in) but the equality function checks pointers rather than doing any logic. Inline/comptime function calls coerce each argument in accordance with the function parameter type expressions. Likewise the return type expression is evaluated and provided (see `fn_ret_ty` above). There's a new compile error "unable to monomorphize function". It's pretty unhelpful and will need to get improved in the future. It happens when a type expression in a generic function did not end up getting resolved at a callsite. This can happen, for example, if a runtime parameter is attempted to be used where it needed to be comptime known: ```zig fn foo(x: anytype) [x]u8 { _ = x; } ``` In this example, even if we pass a number such as `10` for `x`, it is not marked `comptime`, so `x` will have a runtime known value, making the return type unable to resolve. In the LLVM backend I implement cmp instructions for float types to pass some behavior tests that used floats.
2021-08-06 16:24:39 -07:00
/// there is a `TypedValue` here for each parameter of the function.
/// Non-comptime parameters are marked with a `generic_poison` for the value.
/// Non-anytype parameters are marked with a `generic_poison` for the type.
stage2: basic generic functions are working The general strategy is that Sema will pre-map comptime arguments into the inst_map, and then re-run the block body that contains the `param` and `func` instructions. This re-runs all the parameter type expressions except with comptime values populated. In Sema, param instructions are now handled specially: they detect whether they are comptime-elided or not. If so, they skip putting a value in the inst_map, since it is already pre-populated. If not, then they append to the `fields` field of `Sema` for use with the `func` instruction. So when the block body is re-run, a new function is generated with all the comptime arguments elided, and the new function type has only runtime parameters in it. TODO: give the generated Decls better names than "foo__anon_x". The new function is then added to the work queue to have its body analyzed and a runtime call AIR instruction to the new function is emitted. When the new function gets semantically analyzed, comptime parameters are pre-mapped to the corresponding `comptime_args` values rather than mapped to an `arg` AIR instruction. `comptime_args` is a new field that `Fn` has which is a `TypedValue` for each parameter. This field is non-null for generic function instantiations only. The values are the comptime arguments. For non-comptime parameters, a sentinel value is used. This is because we need to know the information of which parameters are comptime-known. Additionally: * AstGen: align and section expressions are evaluated in the scope that has comptime parameters in it. There are still some TODO items left; see the BRANCH_TODO file.
2021-08-03 22:34:22 -07:00
comptime_args: ?[*]TypedValue = null,
2021-04-30 14:36:02 -07:00
/// The ZIR instruction that is a function instruction. Use this to find
/// the body. We store this rather than the body directly so that when ZIR
/// is regenerated on update(), we can map this to the new corresponding
/// ZIR instruction.
zir_body_inst: Zir.Inst.Index,
/// Relative to owner Decl.
lbrace_line: u32,
/// Relative to owner Decl.
rbrace_line: u32,
lbrace_column: u16,
rbrace_column: u16,
state: Analysis,
is_cold: bool = false,
is_noinline: bool = false,
pub const Analysis = enum {
queued,
/// This function intentionally only has ZIR generated because it is marked
/// inline, which means no runtime version of the function will be generated.
inline_only,
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
in_progress,
/// There will be a corresponding ErrorMsg in Module.failed_decls
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
sema_failure,
/// This Fn might be OK but it depends on another Decl which did not
/// successfully complete semantic analysis.
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
dependency_failure,
success,
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
};
2021-06-19 21:10:22 -04:00
pub fn deinit(func: *Fn, gpa: *Allocator) void {
if (func.getInferredErrorSet()) |map| {
map.deinit(gpa);
}
}
pub fn getInferredErrorSet(func: *Fn) ?*std.StringHashMapUnmanaged(void) {
const ret_ty = func.owner_decl.ty.fnReturnType();
if (ret_ty.tag() == .generic_poison) {
return null;
}
if (ret_ty.zigTypeTag() == .ErrorUnion) {
if (ret_ty.errorUnionSet().castTag(.error_set_inferred)) |payload| {
return &payload.data.map;
}
}
return null;
2021-06-19 21:10:22 -04:00
}
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
};
pub const Var = struct {
/// if is_extern == true this is undefined
init: Value,
owner_decl: *Decl,
is_extern: bool,
is_mutable: bool,
is_threadlocal: bool,
};
/// The container that structs, enums, unions, and opaques have.
pub const Namespace = struct {
parent: ?*Namespace,
file_scope: *File,
/// Will be a struct, enum, union, or opaque.
ty: Type,
/// Direct children of the namespace. Used during an update to detect
/// which decls have been added/removed from source.
/// Declaration order is preserved via entry order.
/// Key memory is owned by `decl.name`.
/// TODO save memory with https://github.com/ziglang/zig/issues/8619.
/// Anonymous decls are not stored here; they are kept in `anon_decls` instead.
decls: std.StringArrayHashMapUnmanaged(*Decl) = .{},
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
anon_decls: std.AutoArrayHashMapUnmanaged(*Decl, void) = .{},
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
/// Key is usingnamespace Decl itself. To find the namespace being included,
/// the Decl Value has to be resolved as a Type which has a Namespace.
/// Value is whether the usingnamespace decl is marked `pub`.
usingnamespace_set: std.AutoHashMapUnmanaged(*Decl, bool) = .{},
pub fn deinit(ns: *Namespace, mod: *Module) void {
ns.destroyDecls(mod);
ns.* = undefined;
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
pub fn destroyDecls(ns: *Namespace, mod: *Module) void {
const gpa = mod.gpa;
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
log.debug("destroyDecls {*}", .{ns});
2021-01-28 22:38:25 +02:00
var decls = ns.decls;
ns.decls = .{};
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
var anon_decls = ns.anon_decls;
ns.anon_decls = .{};
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
for (decls.values()) |value| {
value.destroy(mod);
}
decls.deinit(gpa);
for (anon_decls.keys()) |key| {
key.destroy(mod);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
anon_decls.deinit(gpa);
}
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
pub fn deleteAllDecls(
ns: *Namespace,
mod: *Module,
outdated_decls: ?*std.AutoArrayHashMap(*Decl, void),
) !void {
const gpa = mod.gpa;
log.debug("deleteAllDecls {*}", .{ns});
var decls = ns.decls;
ns.decls = .{};
var anon_decls = ns.anon_decls;
ns.anon_decls = .{};
// TODO rework this code to not panic on OOM.
// (might want to coordinate with the clearDecl function)
stage2: implement simple enums A simple enum is an enum which has an automatic integer tag type, all tag values automatically assigned, and no top level declarations. Such enums are created directly in AstGen and shared by all the generic/comptime instantiations of the surrounding ZIR code. This commit implements, but does not yet add any test cases for, simple enums. A full enum is an enum for which any of the above conditions are not true. Full enums are created in Sema, and therefore will create a unique type per generic/comptime instantiation. This commit does not implement full enums. However the `enum_decl_nonexhaustive` ZIR instruction is added and the respective Type functions are filled out. This commit makes an improvement to ZIR code, removing the decls array and removing the decl_map from AstGen. Instead, decl_ref and decl_val ZIR instructions index into the `owner_decl.dependencies` ArrayHashMap. We already need this dependencies array for incremental compilation purposes, and so repurposing it to also use it for ZIR decl indexes makes for efficient memory usage. Similarly, this commit fixes up incorrect memory management by removing the `const` ZIR instruction. The two places it was used stored memory in the AstGen arena, which may get freed after Sema. Now it properly sets up a new anonymous Decl for error sets and uses a normal decl_val instruction. The other usage of `const` ZIR instruction was float literals. These are now changed to use `float` ZIR instruction when the value fits inside `zir.Inst.Data` and `float128` otherwise. AstGen + Sema: implement int_to_enum and enum_to_int. No tests yet; I expect to have to make some fixes before they will pass tests. Will do that in the branch before merging. AstGen: fix struct astgen incorrectly counting decls as fields. Type/Value: give up on trying to exhaustively list every tag all the time. This makes the file more manageable. Also found a bug with i128/u128 this way, since the name of the function was more obvious when looking at the tag values. Type: implement abiAlignment and abiSize for structs. This will need to get more sophisticated at some point, but for now it is progress. Value: add new `enum_field_index` tag. Value: add hash_u32, needed when using ArrayHashMap.
2021-04-06 17:43:56 -07:00
for (decls.values()) |child_decl| {
mod.clearDecl(child_decl, outdated_decls) catch @panic("out of memory");
child_decl.destroy(mod);
stage2: implement simple enums A simple enum is an enum which has an automatic integer tag type, all tag values automatically assigned, and no top level declarations. Such enums are created directly in AstGen and shared by all the generic/comptime instantiations of the surrounding ZIR code. This commit implements, but does not yet add any test cases for, simple enums. A full enum is an enum for which any of the above conditions are not true. Full enums are created in Sema, and therefore will create a unique type per generic/comptime instantiation. This commit does not implement full enums. However the `enum_decl_nonexhaustive` ZIR instruction is added and the respective Type functions are filled out. This commit makes an improvement to ZIR code, removing the decls array and removing the decl_map from AstGen. Instead, decl_ref and decl_val ZIR instructions index into the `owner_decl.dependencies` ArrayHashMap. We already need this dependencies array for incremental compilation purposes, and so repurposing it to also use it for ZIR decl indexes makes for efficient memory usage. Similarly, this commit fixes up incorrect memory management by removing the `const` ZIR instruction. The two places it was used stored memory in the AstGen arena, which may get freed after Sema. Now it properly sets up a new anonymous Decl for error sets and uses a normal decl_val instruction. The other usage of `const` ZIR instruction was float literals. These are now changed to use `float` ZIR instruction when the value fits inside `zir.Inst.Data` and `float128` otherwise. AstGen + Sema: implement int_to_enum and enum_to_int. No tests yet; I expect to have to make some fixes before they will pass tests. Will do that in the branch before merging. AstGen: fix struct astgen incorrectly counting decls as fields. Type/Value: give up on trying to exhaustively list every tag all the time. This makes the file more manageable. Also found a bug with i128/u128 this way, since the name of the function was more obvious when looking at the tag values. Type: implement abiAlignment and abiSize for structs. This will need to get more sophisticated at some point, but for now it is progress. Value: add new `enum_field_index` tag. Value: add hash_u32, needed when using ArrayHashMap.
2021-04-06 17:43:56 -07:00
}
decls.deinit(gpa);
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
for (anon_decls.keys()) |child_decl| {
mod.clearDecl(child_decl, outdated_decls) catch @panic("out of memory");
child_decl.destroy(mod);
2021-09-30 21:43:30 -05:00
}
anon_decls.deinit(gpa);
}
// This renders e.g. "std.fs.Dir.OpenOptions"
pub fn renderFullyQualifiedName(
ns: Namespace,
name: []const u8,
writer: anytype,
) @TypeOf(writer).Error!void {
if (ns.parent) |parent| {
const decl = ns.getDecl();
try parent.renderFullyQualifiedName(mem.spanZ(decl.name), writer);
} else {
try ns.file_scope.renderFullyQualifiedName(writer);
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
}
if (name.len != 0) {
try writer.writeAll(".");
try writer.writeAll(name);
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
}
}
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
/// This renders e.g. "std/fs.zig:Dir.OpenOptions"
pub fn renderFullyQualifiedDebugName(
ns: Namespace,
name: []const u8,
writer: anytype,
) @TypeOf(writer).Error!void {
var separator_char: u8 = '.';
if (ns.parent) |parent| {
const decl = ns.getDecl();
try parent.renderFullyQualifiedDebugName(mem.spanZ(decl.name), writer);
} else {
try ns.file_scope.renderFullyQualifiedDebugName(writer);
separator_char = ':';
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
}
if (name.len != 0) {
try writer.writeByte(separator_char);
try writer.writeAll(name);
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
}
}
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
pub fn getDecl(ns: Namespace) *Decl {
return ns.ty.getOwnerDecl();
}
};
pub const File = struct {
status: enum {
never_loaded,
retryable_failure,
parse_failure,
astgen_failure,
success_zir,
},
source_loaded: bool,
tree_loaded: bool,
zir_loaded: bool,
/// Relative to the owning package's root_src_dir.
/// Memory is stored in gpa, owned by File.
sub_file_path: []const u8,
/// Whether this is populated depends on `source_loaded`.
source: [:0]const u8,
/// Whether this is populated depends on `status`.
stat_size: u64,
/// Whether this is populated depends on `status`.
stat_inode: std.fs.File.INode,
/// Whether this is populated depends on `status`.
stat_mtime: i128,
/// Whether this is populated or not depends on `tree_loaded`.
tree: Ast,
/// Whether this is populated or not depends on `zir_loaded`.
zir: Zir,
/// Package that this file is a part of, managed externally.
pkg: *Package,
/// The Decl of the struct that represents this File.
root_decl: ?*Decl,
/// Used by change detection algorithm, after astgen, contains the
/// set of decls that existed in the previous ZIR but not in the new one.
deleted_decls: std.ArrayListUnmanaged(*Decl) = .{},
/// Used by change detection algorithm, after astgen, contains the
/// set of decls that existed both in the previous ZIR and in the new one,
/// but their source code has been modified.
outdated_decls: std.ArrayListUnmanaged(*Decl) = .{},
/// The most recent successful ZIR for this file, with no errors.
/// This is only populated when a previously successful ZIR
/// newly introduces compile errors during an update. When ZIR is
/// successful, this field is unloaded.
prev_zir: ?*Zir = null,
pub fn unload(file: *File, gpa: *Allocator) void {
file.unloadTree(gpa);
file.unloadSource(gpa);
file.unloadZir(gpa);
}
pub fn unloadTree(file: *File, gpa: *Allocator) void {
if (file.tree_loaded) {
file.tree_loaded = false;
file.tree.deinit(gpa);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
}
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
pub fn unloadSource(file: *File, gpa: *Allocator) void {
if (file.source_loaded) {
file.source_loaded = false;
gpa.free(file.source);
}
}
pub fn unloadZir(file: *File, gpa: *Allocator) void {
if (file.zir_loaded) {
file.zir_loaded = false;
file.zir.deinit(gpa);
}
}
pub fn deinit(file: *File, mod: *Module) void {
const gpa = mod.gpa;
log.debug("deinit File {s}", .{file.sub_file_path});
file.deleted_decls.deinit(gpa);
file.outdated_decls.deinit(gpa);
if (file.root_decl) |root_decl| {
root_decl.destroy(mod);
}
gpa.free(file.sub_file_path);
file.unload(gpa);
if (file.prev_zir) |prev_zir| {
prev_zir.deinit(gpa);
gpa.destroy(prev_zir);
}
file.* = undefined;
}
pub fn getSource(file: *File, gpa: *Allocator) ![:0]const u8 {
if (file.source_loaded) return file.source;
2021-09-30 21:43:30 -05:00
const root_dir_path = file.pkg.root_src_directory.path orelse ".";
log.debug("File.getSource, not cached. pkgdir={s} sub_file_path={s}", .{
root_dir_path, file.sub_file_path,
});
// Keep track of inode, file size, mtime, hash so we can detect which files
// have been modified when an incremental update is requested.
var f = try file.pkg.root_src_directory.handle.openFile(file.sub_file_path, .{});
defer f.close();
const stat = try f.stat();
if (stat.size > std.math.maxInt(u32))
return error.FileTooBig;
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
const source = try gpa.allocSentinel(u8, @intCast(usize, stat.size), 0);
defer if (!file.source_loaded) gpa.free(source);
const amt = try f.readAll(source);
if (amt != stat.size)
return error.UnexpectedEndOfFile;
2021-06-06 21:08:31 +03:00
// Here we do not modify stat fields because this function is the one
// used for error reporting. We need to keep the stat fields stale so that
// astGenFile can know to regenerate ZIR.
stage2: support recursive inline/comptime functions zir.Inst no longer has an `analyzed_inst` field. This is previously how we mapped ZIR to their TZIR counterparts, however with the way inline and comptime function calls work, we can potentially have the same ZIR structure being analyzed by multiple different analyses, such as during a recursive inline function call. This would cause the `analyzed_inst` field to become clobbered. So instead, we use a table to map the instructions to their semantically analyzed counterparts. This will help with multi-threaded compilation as well. Scope.Block.Inlining is split into 2 different layers of "sharedness". The first layer is shared by the whole inline/comptime function call stack. It contains the callsite where something is being inlined and the branch count/quota. The second layer is different per function call but shared by all the blocks within the function being inlined. Add support for debug dumping br and brvoid TZIR instructions. Remove the "unreachable code" error. It was happening even for this case: ```zig if (comptime_condition) return; bar(); // error: unreachable code ``` We will need smarter logic for when it is legal to emit this compile error. Remove the ZIR test cases. These are redundant with other higher level Zig source tests we have, and maintaining support for ZIRModule as a first-class top level abstraction is getting in the way of clean compiler design for the main use case. We will have ZIR/TZIR based test cases someday to help with testing optimization passes and ZIR to TZIR analysis, but as is, these test cases are not accomplishing that, and they are getting in the way.
2021-01-02 22:42:07 -07:00
file.source = source;
file.source_loaded = true;
return source;
}
2021-06-23 14:32:21 -04:00
pub fn getTree(file: *File, gpa: *Allocator) !*const Ast {
if (file.tree_loaded) return &file.tree;
2021-09-19 11:51:32 +03:00
const source = try file.getSource(gpa);
file.tree = try std.zig.parse(gpa, source);
file.tree_loaded = true;
return &file.tree;
}
stage2 generics improvements: anytype and param type exprs AstGen result locations now have a `coerced_ty` tag which is the same as `ty` except it assumes that Sema will do a coercion, so it does not redundantly add an `as` instruction into the ZIR code. This results in cleaner ZIR and about a 14% reduction of ZIR bytes. param and param_comptime ZIR instructions now have a block body for their type expressions. This allows Sema to skip evaluation of the block in the case that the parameter is comptime-provided. It also allows a new mechanism to function: when evaluating type expressions of generic functions, if it would depend on another parameter, it returns `error.GenericPoison` which bubbles up and then is caught by the param/param_comptime instruction and then handled. This allows parameters to be evaluated independently so that the type info for functions which have comptime or anytype parameters will still have types populated for parameters that do not depend on values of previous parameters (because evaluation of their param blocks will return successfully instead of `error.GenericPoison`). It also makes iteration over the block that contains function parameters slightly more efficient since it now only contains the param instructions. Finally, it fixes the case where a generic function type expression contains a function prototype. Formerly, this situation would cause shared state to clobber each other; now it is in a proper tree structure so that can't happen. This fix also required adding a field to Sema `comptime_args_fn_inst` to make sure that the `comptime_args` field passed into Sema is applied to the correct `func` instruction. Source location for `node_offset_asm_ret_ty` is fixed; it was pointing at the asm output name rather than the return type as intended. Generic function instantiation is fixed, notably with respect to parameter type expressions that depend on previous parameters, and with respect to types which must be always comptime-known. This involves passing all the comptime arguments at a callsite of a generic function, and allowing the generic function semantic analysis to coerce the values to the proper types (since it has access to the evaluated parameter type expressions) and then decide based on the type whether the parameter is runtime known or not. In the case of explicitly marked `comptime` parameters, there is a check at the semantic analysis of the `call` instruction. Semantic analysis of `call` instructions does type coercion on the arguments, which is needed both for generic functions and to make up for using `coerced_ty` result locations (mentioned above). Tasks left in this branch: * Implement the memoization table. * Add test coverage. * Improve error reporting and source locations for compile errors.
2021-08-04 21:11:31 -07:00
pub fn destroy(file: *File, mod: *Module) void {
const gpa = mod.gpa;
file.deinit(mod);
gpa.destroy(file);
}
pub fn renderFullyQualifiedName(file: File, writer: anytype) !void {
// Convert all the slashes into dots and truncate the extension.
const ext = std.fs.path.extension(file.sub_file_path);
const noext = file.sub_file_path[0 .. file.sub_file_path.len - ext.len];
for (noext) |byte| switch (byte) {
'/', '\\' => try writer.writeByte('.'),
else => try writer.writeByte(byte),
};
}
pub fn renderFullyQualifiedDebugName(file: File, writer: anytype) !void {
for (file.sub_file_path) |byte| switch (byte) {
'/', '\\' => try writer.writeByte('/'),
else => try writer.writeByte(byte),
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
};
}
pub fn fullyQualifiedNameZ(file: File, gpa: *Allocator) ![:0]u8 {
var buf = std.ArrayList(u8).init(gpa);
defer buf.deinit();
try file.renderFullyQualifiedName(buf.writer());
return buf.toOwnedSliceSentinel(0);
}
stage2: more principled approach to comptime references * AIR no longer has a `variables` array. Instead of the `varptr` instruction, Sema emits a constant with a `decl_ref`. * AIR no longer has a `ref` instruction. There is no longer any instruction that takes a value and returns a pointer to it. If this is desired, Sema must either create an anynomous Decl and return a constant `decl_ref`, or in the case of a runtime value, emit an `alloc` instruction, `store` the value to it, and then return the `alloc`. * The `ref_val` Value Tag is eliminated. `decl_ref` should be used instead. Also added is `eu_payload_ptr` which points to the payload of an error union, given an error union pointer. In general, Sema should avoid calling `analyzeRef` if it can be helped. For example in the case of field_val and elem_val, there should never be a reason to create a temporary (alloc or decl). Recent previous commits made progress along that front. There is a new abstraction in Sema, which looks like this: var anon_decl = try block.startAnonDecl(); defer anon_decl.deinit(); // here 'anon_decl.arena()` may be used const decl = try anon_decl.finish(ty, val); // decl is typically now used with `decl_ref`. This pattern is used to upgrade `ref_val` usages to `decl_ref` usages. Additional improvements: * Sema: fix source location resolution for calling convention expression. * Sema: properly report "unable to resolve comptime value" for loads of global variables. There is now a set of functions which can be called if the callee wants to obtain the Value even if the tag is `variable` (indicating comptime-known address but runtime-known value). * Sema: `coerce` resolves builtin types before checking equality. * Sema: fix `u1_type` missing from `addType`, making this type have a slightly more efficient representation in AIR. * LLVM backend: fix `genTypedValue` for tags `decl_ref` and `variable` to properly do an LLVMConstBitCast. * Remove unused parameter from `Value.toEnum`. After this commit, some test cases are no longer passing. This is due to the more principled approach to comptime references causing more anonymous decls to get sent to the linker for codegen. However, in all these cases the decls are not actually referenced by the runtime machine code. A future commit in this branch will implement garbage collection of decls so that unused decls do not get sent to the linker for codegen. This will make the tests go back to passing.
2021-07-29 15:59:51 -07:00
/// Returns the full path to this file relative to its package.
pub fn fullPath(file: File, ally: *Allocator) ![]u8 {
return file.pkg.root_src_directory.join(ally, &[_][]const u8{file.sub_file_path});
}
stage2: more principled approach to comptime references * AIR no longer has a `variables` array. Instead of the `varptr` instruction, Sema emits a constant with a `decl_ref`. * AIR no longer has a `ref` instruction. There is no longer any instruction that takes a value and returns a pointer to it. If this is desired, Sema must either create an anynomous Decl and return a constant `decl_ref`, or in the case of a runtime value, emit an `alloc` instruction, `store` the value to it, and then return the `alloc`. * The `ref_val` Value Tag is eliminated. `decl_ref` should be used instead. Also added is `eu_payload_ptr` which points to the payload of an error union, given an error union pointer. In general, Sema should avoid calling `analyzeRef` if it can be helped. For example in the case of field_val and elem_val, there should never be a reason to create a temporary (alloc or decl). Recent previous commits made progress along that front. There is a new abstraction in Sema, which looks like this: var anon_decl = try block.startAnonDecl(); defer anon_decl.deinit(); // here 'anon_decl.arena()` may be used const decl = try anon_decl.finish(ty, val); // decl is typically now used with `decl_ref`. This pattern is used to upgrade `ref_val` usages to `decl_ref` usages. Additional improvements: * Sema: fix source location resolution for calling convention expression. * Sema: properly report "unable to resolve comptime value" for loads of global variables. There is now a set of functions which can be called if the callee wants to obtain the Value even if the tag is `variable` (indicating comptime-known address but runtime-known value). * Sema: `coerce` resolves builtin types before checking equality. * Sema: fix `u1_type` missing from `addType`, making this type have a slightly more efficient representation in AIR. * LLVM backend: fix `genTypedValue` for tags `decl_ref` and `variable` to properly do an LLVMConstBitCast. * Remove unused parameter from `Value.toEnum`. After this commit, some test cases are no longer passing. This is due to the more principled approach to comptime references causing more anonymous decls to get sent to the linker for codegen. However, in all these cases the decls are not actually referenced by the runtime machine code. A future commit in this branch will implement garbage collection of decls so that unused decls do not get sent to the linker for codegen. This will make the tests go back to passing.
2021-07-29 15:59:51 -07:00
pub fn dumpSrc(file: *File, src: LazySrcLoc) void {
const loc = std.zig.findLineColumn(file.source.bytes, src);
std.debug.print("{s}:{d}:{d}\n", .{ file.sub_file_path, loc.line + 1, loc.column + 1 });
}
stage2: more principled approach to comptime references * AIR no longer has a `variables` array. Instead of the `varptr` instruction, Sema emits a constant with a `decl_ref`. * AIR no longer has a `ref` instruction. There is no longer any instruction that takes a value and returns a pointer to it. If this is desired, Sema must either create an anynomous Decl and return a constant `decl_ref`, or in the case of a runtime value, emit an `alloc` instruction, `store` the value to it, and then return the `alloc`. * The `ref_val` Value Tag is eliminated. `decl_ref` should be used instead. Also added is `eu_payload_ptr` which points to the payload of an error union, given an error union pointer. In general, Sema should avoid calling `analyzeRef` if it can be helped. For example in the case of field_val and elem_val, there should never be a reason to create a temporary (alloc or decl). Recent previous commits made progress along that front. There is a new abstraction in Sema, which looks like this: var anon_decl = try block.startAnonDecl(); defer anon_decl.deinit(); // here 'anon_decl.arena()` may be used const decl = try anon_decl.finish(ty, val); // decl is typically now used with `decl_ref`. This pattern is used to upgrade `ref_val` usages to `decl_ref` usages. Additional improvements: * Sema: fix source location resolution for calling convention expression. * Sema: properly report "unable to resolve comptime value" for loads of global variables. There is now a set of functions which can be called if the callee wants to obtain the Value even if the tag is `variable` (indicating comptime-known address but runtime-known value). * Sema: `coerce` resolves builtin types before checking equality. * Sema: fix `u1_type` missing from `addType`, making this type have a slightly more efficient representation in AIR. * LLVM backend: fix `genTypedValue` for tags `decl_ref` and `variable` to properly do an LLVMConstBitCast. * Remove unused parameter from `Value.toEnum`. After this commit, some test cases are no longer passing. This is due to the more principled approach to comptime references causing more anonymous decls to get sent to the linker for codegen. However, in all these cases the decls are not actually referenced by the runtime machine code. A future commit in this branch will implement garbage collection of decls so that unused decls do not get sent to the linker for codegen. This will make the tests go back to passing.
2021-07-29 15:59:51 -07:00
pub fn okToReportErrors(file: File) bool {
return switch (file.status) {
.parse_failure, .astgen_failure => false,
else => true,
stage2: more principled approach to comptime references * AIR no longer has a `variables` array. Instead of the `varptr` instruction, Sema emits a constant with a `decl_ref`. * AIR no longer has a `ref` instruction. There is no longer any instruction that takes a value and returns a pointer to it. If this is desired, Sema must either create an anynomous Decl and return a constant `decl_ref`, or in the case of a runtime value, emit an `alloc` instruction, `store` the value to it, and then return the `alloc`. * The `ref_val` Value Tag is eliminated. `decl_ref` should be used instead. Also added is `eu_payload_ptr` which points to the payload of an error union, given an error union pointer. In general, Sema should avoid calling `analyzeRef` if it can be helped. For example in the case of field_val and elem_val, there should never be a reason to create a temporary (alloc or decl). Recent previous commits made progress along that front. There is a new abstraction in Sema, which looks like this: var anon_decl = try block.startAnonDecl(); defer anon_decl.deinit(); // here 'anon_decl.arena()` may be used const decl = try anon_decl.finish(ty, val); // decl is typically now used with `decl_ref`. This pattern is used to upgrade `ref_val` usages to `decl_ref` usages. Additional improvements: * Sema: fix source location resolution for calling convention expression. * Sema: properly report "unable to resolve comptime value" for loads of global variables. There is now a set of functions which can be called if the callee wants to obtain the Value even if the tag is `variable` (indicating comptime-known address but runtime-known value). * Sema: `coerce` resolves builtin types before checking equality. * Sema: fix `u1_type` missing from `addType`, making this type have a slightly more efficient representation in AIR. * LLVM backend: fix `genTypedValue` for tags `decl_ref` and `variable` to properly do an LLVMConstBitCast. * Remove unused parameter from `Value.toEnum`. After this commit, some test cases are no longer passing. This is due to the more principled approach to comptime references causing more anonymous decls to get sent to the linker for codegen. However, in all these cases the decls are not actually referenced by the runtime machine code. A future commit in this branch will implement garbage collection of decls so that unused decls do not get sent to the linker for codegen. This will make the tests go back to passing.
2021-07-29 15:59:51 -07:00
};
}
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
};
stage2: implement error notes and regress -femit-zir * Implement error notes - note: other symbol exported here - note: previous else prong is here - note: previous '_' prong is here * Add Compilation.CObject.ErrorMsg. This object properly converts to AllErrors.Message when the time comes. * Add Compilation.CObject.failure_retryable. Properly handles out-of-memory and other transient failures. * Introduce Module.SrcLoc which has not only a byte offset but also references the file which the byte offset applies to. * Scope.Block now contains both a pointer to the "owner" Decl and the "source" Decl. As an example, during inline function call, the "owner" will be the Decl of the caller and the "source" will be the Decl of the callee. * Module.ErrorMsg now sports a `file_scope` field so that notes can refer to source locations in a file other than the parent error message. * Some instances where a `*Scope` was stored, now store a `*Scope.Container`. * Some methods in the `Scope` namespace were moved to the more specific type, since there was only an implementation for one particular tag. - `removeDecl` moved to `Scope.Container` - `destroy` moved to `Scope.File` * Two kinds of Scope deleted: - zir_module - decl * astgen: properly use DeclVal / DeclRef. DeclVal was incorrectly changed to be a reference; this commit fixes it. Fewer ZIR instructions processed as a result. - declval_in_module is renamed to declval - previous declval ZIR instruction is deleted; it was only for .zir files. * Test harness: friendlier diagnostics when an unexpected set of errors is encountered. * zir_sema: fix analyzeInstBlockFlat by properly calling resolvingInst on the last zir instruction in the block. Compile log implementation: * Write to a buffer rather than directly to stderr. * Only keep track of 1 callsite per Decl. * No longer mutate the ZIR Inst struct data. * "Compile log statement found" errors are only emitted when there are no other compile errors. -femit-zir and support for .zir source files is regressed. If we wanted to support this again, outputting .zir would need to be done as yet another backend rather than in the haphazard way it was previously implemented. For parsing .zir, it was implemented previously in a way that was not helpful for debugging. We need tighter integration with the test harness for it to be useful; so clearly a rewrite is needed. Given that a rewrite is needed, and it was getting in the way of progress and organization of the rest of stage2, I regressed the feature.
2021-01-16 22:51:01 -07:00
/// This struct holds data necessary to construct API-facing `AllErrors.Message`.
/// Its memory is managed with the general purpose allocator so that they
/// can be created and destroyed in response to incremental updates.
/// In some cases, the File could have been inferred from where the ErrorMsg
/// is stored. For example, if it is stored in Module.failed_decls, then the File
stage2: implement error notes and regress -femit-zir * Implement error notes - note: other symbol exported here - note: previous else prong is here - note: previous '_' prong is here * Add Compilation.CObject.ErrorMsg. This object properly converts to AllErrors.Message when the time comes. * Add Compilation.CObject.failure_retryable. Properly handles out-of-memory and other transient failures. * Introduce Module.SrcLoc which has not only a byte offset but also references the file which the byte offset applies to. * Scope.Block now contains both a pointer to the "owner" Decl and the "source" Decl. As an example, during inline function call, the "owner" will be the Decl of the caller and the "source" will be the Decl of the callee. * Module.ErrorMsg now sports a `file_scope` field so that notes can refer to source locations in a file other than the parent error message. * Some instances where a `*Scope` was stored, now store a `*Scope.Container`. * Some methods in the `Scope` namespace were moved to the more specific type, since there was only an implementation for one particular tag. - `removeDecl` moved to `Scope.Container` - `destroy` moved to `Scope.File` * Two kinds of Scope deleted: - zir_module - decl * astgen: properly use DeclVal / DeclRef. DeclVal was incorrectly changed to be a reference; this commit fixes it. Fewer ZIR instructions processed as a result. - declval_in_module is renamed to declval - previous declval ZIR instruction is deleted; it was only for .zir files. * Test harness: friendlier diagnostics when an unexpected set of errors is encountered. * zir_sema: fix analyzeInstBlockFlat by properly calling resolvingInst on the last zir instruction in the block. Compile log implementation: * Write to a buffer rather than directly to stderr. * Only keep track of 1 callsite per Decl. * No longer mutate the ZIR Inst struct data. * "Compile log statement found" errors are only emitted when there are no other compile errors. -femit-zir and support for .zir source files is regressed. If we wanted to support this again, outputting .zir would need to be done as yet another backend rather than in the haphazard way it was previously implemented. For parsing .zir, it was implemented previously in a way that was not helpful for debugging. We need tighter integration with the test harness for it to be useful; so clearly a rewrite is needed. Given that a rewrite is needed, and it was getting in the way of progress and organization of the rest of stage2, I regressed the feature.
2021-01-16 22:51:01 -07:00
/// would be determined by the Decl Scope. However, the data structure contains the field
/// anyway so that `ErrorMsg` can be reused for error notes, which may be in a different
/// file than the parent error message. It also simplifies processing of error messages.
pub const ErrorMsg = struct {
src_loc: SrcLoc,
msg: []const u8,
notes: []ErrorMsg = &.{},
pub fn create(
gpa: *Allocator,
src_loc: SrcLoc,
comptime format: []const u8,
args: anytype,
) !*ErrorMsg {
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
const err_msg = try gpa.create(ErrorMsg);
errdefer gpa.destroy(err_msg);
err_msg.* = try init(gpa, src_loc, format, args);
return err_msg;
stage2: implement error notes and regress -femit-zir * Implement error notes - note: other symbol exported here - note: previous else prong is here - note: previous '_' prong is here * Add Compilation.CObject.ErrorMsg. This object properly converts to AllErrors.Message when the time comes. * Add Compilation.CObject.failure_retryable. Properly handles out-of-memory and other transient failures. * Introduce Module.SrcLoc which has not only a byte offset but also references the file which the byte offset applies to. * Scope.Block now contains both a pointer to the "owner" Decl and the "source" Decl. As an example, during inline function call, the "owner" will be the Decl of the caller and the "source" will be the Decl of the callee. * Module.ErrorMsg now sports a `file_scope` field so that notes can refer to source locations in a file other than the parent error message. * Some instances where a `*Scope` was stored, now store a `*Scope.Container`. * Some methods in the `Scope` namespace were moved to the more specific type, since there was only an implementation for one particular tag. - `removeDecl` moved to `Scope.Container` - `destroy` moved to `Scope.File` * Two kinds of Scope deleted: - zir_module - decl * astgen: properly use DeclVal / DeclRef. DeclVal was incorrectly changed to be a reference; this commit fixes it. Fewer ZIR instructions processed as a result. - declval_in_module is renamed to declval - previous declval ZIR instruction is deleted; it was only for .zir files. * Test harness: friendlier diagnostics when an unexpected set of errors is encountered. * zir_sema: fix analyzeInstBlockFlat by properly calling resolvingInst on the last zir instruction in the block. Compile log implementation: * Write to a buffer rather than directly to stderr. * Only keep track of 1 callsite per Decl. * No longer mutate the ZIR Inst struct data. * "Compile log statement found" errors are only emitted when there are no other compile errors. -femit-zir and support for .zir source files is regressed. If we wanted to support this again, outputting .zir would need to be done as yet another backend rather than in the haphazard way it was previously implemented. For parsing .zir, it was implemented previously in a way that was not helpful for debugging. We need tighter integration with the test harness for it to be useful; so clearly a rewrite is needed. Given that a rewrite is needed, and it was getting in the way of progress and organization of the rest of stage2, I regressed the feature.
2021-01-16 22:51:01 -07:00
}
/// Assumes the ErrorMsg struct and msg were both allocated with `gpa`,
/// as well as all notes.
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
pub fn destroy(err_msg: *ErrorMsg, gpa: *Allocator) void {
err_msg.deinit(gpa);
gpa.destroy(err_msg);
stage2: implement error notes and regress -femit-zir * Implement error notes - note: other symbol exported here - note: previous else prong is here - note: previous '_' prong is here * Add Compilation.CObject.ErrorMsg. This object properly converts to AllErrors.Message when the time comes. * Add Compilation.CObject.failure_retryable. Properly handles out-of-memory and other transient failures. * Introduce Module.SrcLoc which has not only a byte offset but also references the file which the byte offset applies to. * Scope.Block now contains both a pointer to the "owner" Decl and the "source" Decl. As an example, during inline function call, the "owner" will be the Decl of the caller and the "source" will be the Decl of the callee. * Module.ErrorMsg now sports a `file_scope` field so that notes can refer to source locations in a file other than the parent error message. * Some instances where a `*Scope` was stored, now store a `*Scope.Container`. * Some methods in the `Scope` namespace were moved to the more specific type, since there was only an implementation for one particular tag. - `removeDecl` moved to `Scope.Container` - `destroy` moved to `Scope.File` * Two kinds of Scope deleted: - zir_module - decl * astgen: properly use DeclVal / DeclRef. DeclVal was incorrectly changed to be a reference; this commit fixes it. Fewer ZIR instructions processed as a result. - declval_in_module is renamed to declval - previous declval ZIR instruction is deleted; it was only for .zir files. * Test harness: friendlier diagnostics when an unexpected set of errors is encountered. * zir_sema: fix analyzeInstBlockFlat by properly calling resolvingInst on the last zir instruction in the block. Compile log implementation: * Write to a buffer rather than directly to stderr. * Only keep track of 1 callsite per Decl. * No longer mutate the ZIR Inst struct data. * "Compile log statement found" errors are only emitted when there are no other compile errors. -femit-zir and support for .zir source files is regressed. If we wanted to support this again, outputting .zir would need to be done as yet another backend rather than in the haphazard way it was previously implemented. For parsing .zir, it was implemented previously in a way that was not helpful for debugging. We need tighter integration with the test harness for it to be useful; so clearly a rewrite is needed. Given that a rewrite is needed, and it was getting in the way of progress and organization of the rest of stage2, I regressed the feature.
2021-01-16 22:51:01 -07:00
}
pub fn init(
gpa: *Allocator,
src_loc: SrcLoc,
comptime format: []const u8,
args: anytype,
) !ErrorMsg {
return ErrorMsg{
.src_loc = src_loc,
.msg = try std.fmt.allocPrint(gpa, format, args),
};
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
pub fn deinit(err_msg: *ErrorMsg, gpa: *Allocator) void {
for (err_msg.notes) |*note| {
stage2: implement error notes and regress -femit-zir * Implement error notes - note: other symbol exported here - note: previous else prong is here - note: previous '_' prong is here * Add Compilation.CObject.ErrorMsg. This object properly converts to AllErrors.Message when the time comes. * Add Compilation.CObject.failure_retryable. Properly handles out-of-memory and other transient failures. * Introduce Module.SrcLoc which has not only a byte offset but also references the file which the byte offset applies to. * Scope.Block now contains both a pointer to the "owner" Decl and the "source" Decl. As an example, during inline function call, the "owner" will be the Decl of the caller and the "source" will be the Decl of the callee. * Module.ErrorMsg now sports a `file_scope` field so that notes can refer to source locations in a file other than the parent error message. * Some instances where a `*Scope` was stored, now store a `*Scope.Container`. * Some methods in the `Scope` namespace were moved to the more specific type, since there was only an implementation for one particular tag. - `removeDecl` moved to `Scope.Container` - `destroy` moved to `Scope.File` * Two kinds of Scope deleted: - zir_module - decl * astgen: properly use DeclVal / DeclRef. DeclVal was incorrectly changed to be a reference; this commit fixes it. Fewer ZIR instructions processed as a result. - declval_in_module is renamed to declval - previous declval ZIR instruction is deleted; it was only for .zir files. * Test harness: friendlier diagnostics when an unexpected set of errors is encountered. * zir_sema: fix analyzeInstBlockFlat by properly calling resolvingInst on the last zir instruction in the block. Compile log implementation: * Write to a buffer rather than directly to stderr. * Only keep track of 1 callsite per Decl. * No longer mutate the ZIR Inst struct data. * "Compile log statement found" errors are only emitted when there are no other compile errors. -femit-zir and support for .zir source files is regressed. If we wanted to support this again, outputting .zir would need to be done as yet another backend rather than in the haphazard way it was previously implemented. For parsing .zir, it was implemented previously in a way that was not helpful for debugging. We need tighter integration with the test harness for it to be useful; so clearly a rewrite is needed. Given that a rewrite is needed, and it was getting in the way of progress and organization of the rest of stage2, I regressed the feature.
2021-01-16 22:51:01 -07:00
note.deinit(gpa);
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
gpa.free(err_msg.notes);
gpa.free(err_msg.msg);
err_msg.* = undefined;
stage2: implement error notes and regress -femit-zir * Implement error notes - note: other symbol exported here - note: previous else prong is here - note: previous '_' prong is here * Add Compilation.CObject.ErrorMsg. This object properly converts to AllErrors.Message when the time comes. * Add Compilation.CObject.failure_retryable. Properly handles out-of-memory and other transient failures. * Introduce Module.SrcLoc which has not only a byte offset but also references the file which the byte offset applies to. * Scope.Block now contains both a pointer to the "owner" Decl and the "source" Decl. As an example, during inline function call, the "owner" will be the Decl of the caller and the "source" will be the Decl of the callee. * Module.ErrorMsg now sports a `file_scope` field so that notes can refer to source locations in a file other than the parent error message. * Some instances where a `*Scope` was stored, now store a `*Scope.Container`. * Some methods in the `Scope` namespace were moved to the more specific type, since there was only an implementation for one particular tag. - `removeDecl` moved to `Scope.Container` - `destroy` moved to `Scope.File` * Two kinds of Scope deleted: - zir_module - decl * astgen: properly use DeclVal / DeclRef. DeclVal was incorrectly changed to be a reference; this commit fixes it. Fewer ZIR instructions processed as a result. - declval_in_module is renamed to declval - previous declval ZIR instruction is deleted; it was only for .zir files. * Test harness: friendlier diagnostics when an unexpected set of errors is encountered. * zir_sema: fix analyzeInstBlockFlat by properly calling resolvingInst on the last zir instruction in the block. Compile log implementation: * Write to a buffer rather than directly to stderr. * Only keep track of 1 callsite per Decl. * No longer mutate the ZIR Inst struct data. * "Compile log statement found" errors are only emitted when there are no other compile errors. -femit-zir and support for .zir source files is regressed. If we wanted to support this again, outputting .zir would need to be done as yet another backend rather than in the haphazard way it was previously implemented. For parsing .zir, it was implemented previously in a way that was not helpful for debugging. We need tighter integration with the test harness for it to be useful; so clearly a rewrite is needed. Given that a rewrite is needed, and it was getting in the way of progress and organization of the rest of stage2, I regressed the feature.
2021-01-16 22:51:01 -07:00
}
};
/// Canonical reference to a position within a source file.
pub const SrcLoc = struct {
file_scope: *File,
/// Might be 0 depending on tag of `lazy`.
parent_decl_node: Ast.Node.Index,
/// Relative to `parent_decl_node`.
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
lazy: LazySrcLoc,
pub fn declSrcToken(src_loc: SrcLoc) Ast.TokenIndex {
const tree = src_loc.file_scope.tree;
return tree.firstToken(src_loc.parent_decl_node);
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
pub fn declRelativeToNodeIndex(src_loc: SrcLoc, offset: i32) Ast.TokenIndex {
return @bitCast(Ast.Node.Index, offset + @bitCast(i32, src_loc.parent_decl_node));
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
}
pub fn byteOffset(src_loc: SrcLoc, gpa: *Allocator) !u32 {
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
switch (src_loc.lazy) {
.unneeded => unreachable,
2021-04-20 17:03:18 -07:00
.entire_file => return 0,
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
.byte_abs => |byte_index| return byte_index,
.token_abs => |tok_index| {
const tree = try src_loc.file_scope.getTree(gpa);
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_abs => |node| {
const tree = try src_loc.file_scope.getTree(gpa);
const token_starts = tree.tokens.items(.start);
const tok_index = tree.firstToken(node);
return token_starts[tok_index];
},
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
.byte_offset => |byte_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const token_starts = tree.tokens.items(.start);
return token_starts[src_loc.declSrcToken()] + byte_off;
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
},
.token_offset => |tok_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const tok_index = src_loc.declSrcToken() + tok_off;
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_offset, .node_offset_bin_op => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const node = src_loc.declRelativeToNodeIndex(node_off);
assert(src_loc.file_scope.tree_loaded);
const main_tokens = tree.nodes.items(.main_token);
const tok_index = main_tokens[node];
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_offset_back2tok => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const node = src_loc.declRelativeToNodeIndex(node_off);
const tok_index = tree.firstToken(node) - 2;
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_offset_var_decl_ty => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const node = src_loc.declRelativeToNodeIndex(node_off);
const node_tags = tree.nodes.items(.tag);
const full = switch (node_tags[node]) {
.global_var_decl => tree.globalVarDecl(node),
.local_var_decl => tree.localVarDecl(node),
.simple_var_decl => tree.simpleVarDecl(node),
.aligned_var_decl => tree.alignedVarDecl(node),
else => unreachable,
};
const tok_index = if (full.ast.type_node != 0) blk: {
const main_tokens = tree.nodes.items(.main_token);
break :blk main_tokens[full.ast.type_node];
} else blk: {
break :blk full.ast.mut_token + 1; // the name token
};
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_offset_builtin_call_arg0 => |n| return src_loc.byteOffsetBuiltinCallArg(gpa, n, 0),
.node_offset_builtin_call_arg1 => |n| return src_loc.byteOffsetBuiltinCallArg(gpa, n, 1),
.node_offset_builtin_call_arg2 => |n| return src_loc.byteOffsetBuiltinCallArg(gpa, n, 2),
.node_offset_builtin_call_arg3 => |n| return src_loc.byteOffsetBuiltinCallArg(gpa, n, 3),
.node_offset_builtin_call_arg4 => |n| return src_loc.byteOffsetBuiltinCallArg(gpa, n, 4),
.node_offset_builtin_call_arg5 => |n| return src_loc.byteOffsetBuiltinCallArg(gpa, n, 5),
.node_offset_array_access_index => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const node_datas = tree.nodes.items(.data);
const node = src_loc.declRelativeToNodeIndex(node_off);
const main_tokens = tree.nodes.items(.main_token);
const tok_index = main_tokens[node_datas[node].rhs];
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_offset_slice_sentinel => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const node_tags = tree.nodes.items(.tag);
const node = src_loc.declRelativeToNodeIndex(node_off);
const full = switch (node_tags[node]) {
.slice_open => tree.sliceOpen(node),
.slice => tree.slice(node),
.slice_sentinel => tree.sliceSentinel(node),
else => unreachable,
};
const main_tokens = tree.nodes.items(.main_token);
const tok_index = main_tokens[full.ast.sentinel];
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_offset_call_func => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const node_tags = tree.nodes.items(.tag);
const node = src_loc.declRelativeToNodeIndex(node_off);
var params: [1]Ast.Node.Index = undefined;
const full = switch (node_tags[node]) {
.call_one,
.call_one_comma,
.async_call_one,
.async_call_one_comma,
=> tree.callOne(&params, node),
.call,
.call_comma,
.async_call,
.async_call_comma,
=> tree.callFull(node),
else => unreachable,
};
const main_tokens = tree.nodes.items(.main_token);
const tok_index = main_tokens[full.ast.fn_expr];
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_offset_field_name => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const node_datas = tree.nodes.items(.data);
const node_tags = tree.nodes.items(.tag);
const node = src_loc.declRelativeToNodeIndex(node_off);
const tok_index = switch (node_tags[node]) {
.field_access => node_datas[node].rhs,
else => tree.firstToken(node) - 2,
};
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_offset_deref_ptr => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const node_datas = tree.nodes.items(.data);
const node = src_loc.declRelativeToNodeIndex(node_off);
const tok_index = node_datas[node].lhs;
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_offset_asm_source => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const node_tags = tree.nodes.items(.tag);
const node = src_loc.declRelativeToNodeIndex(node_off);
const full = switch (node_tags[node]) {
.asm_simple => tree.asmSimple(node),
.@"asm" => tree.asmFull(node),
else => unreachable,
};
const main_tokens = tree.nodes.items(.main_token);
const tok_index = main_tokens[full.ast.template];
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_offset_asm_ret_ty => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const node_tags = tree.nodes.items(.tag);
const node = src_loc.declRelativeToNodeIndex(node_off);
const full = switch (node_tags[node]) {
.asm_simple => tree.asmSimple(node),
.@"asm" => tree.asmFull(node),
else => unreachable,
};
stage2 generics improvements: anytype and param type exprs AstGen result locations now have a `coerced_ty` tag which is the same as `ty` except it assumes that Sema will do a coercion, so it does not redundantly add an `as` instruction into the ZIR code. This results in cleaner ZIR and about a 14% reduction of ZIR bytes. param and param_comptime ZIR instructions now have a block body for their type expressions. This allows Sema to skip evaluation of the block in the case that the parameter is comptime-provided. It also allows a new mechanism to function: when evaluating type expressions of generic functions, if it would depend on another parameter, it returns `error.GenericPoison` which bubbles up and then is caught by the param/param_comptime instruction and then handled. This allows parameters to be evaluated independently so that the type info for functions which have comptime or anytype parameters will still have types populated for parameters that do not depend on values of previous parameters (because evaluation of their param blocks will return successfully instead of `error.GenericPoison`). It also makes iteration over the block that contains function parameters slightly more efficient since it now only contains the param instructions. Finally, it fixes the case where a generic function type expression contains a function prototype. Formerly, this situation would cause shared state to clobber each other; now it is in a proper tree structure so that can't happen. This fix also required adding a field to Sema `comptime_args_fn_inst` to make sure that the `comptime_args` field passed into Sema is applied to the correct `func` instruction. Source location for `node_offset_asm_ret_ty` is fixed; it was pointing at the asm output name rather than the return type as intended. Generic function instantiation is fixed, notably with respect to parameter type expressions that depend on previous parameters, and with respect to types which must be always comptime-known. This involves passing all the comptime arguments at a callsite of a generic function, and allowing the generic function semantic analysis to coerce the values to the proper types (since it has access to the evaluated parameter type expressions) and then decide based on the type whether the parameter is runtime known or not. In the case of explicitly marked `comptime` parameters, there is a check at the semantic analysis of the `call` instruction. Semantic analysis of `call` instructions does type coercion on the arguments, which is needed both for generic functions and to make up for using `coerced_ty` result locations (mentioned above). Tasks left in this branch: * Implement the memoization table. * Add test coverage. * Improve error reporting and source locations for compile errors.
2021-08-04 21:11:31 -07:00
const asm_output = full.outputs[0];
const node_datas = tree.nodes.items(.data);
const ret_ty_node = node_datas[asm_output].lhs;
const main_tokens = tree.nodes.items(.main_token);
stage2 generics improvements: anytype and param type exprs AstGen result locations now have a `coerced_ty` tag which is the same as `ty` except it assumes that Sema will do a coercion, so it does not redundantly add an `as` instruction into the ZIR code. This results in cleaner ZIR and about a 14% reduction of ZIR bytes. param and param_comptime ZIR instructions now have a block body for their type expressions. This allows Sema to skip evaluation of the block in the case that the parameter is comptime-provided. It also allows a new mechanism to function: when evaluating type expressions of generic functions, if it would depend on another parameter, it returns `error.GenericPoison` which bubbles up and then is caught by the param/param_comptime instruction and then handled. This allows parameters to be evaluated independently so that the type info for functions which have comptime or anytype parameters will still have types populated for parameters that do not depend on values of previous parameters (because evaluation of their param blocks will return successfully instead of `error.GenericPoison`). It also makes iteration over the block that contains function parameters slightly more efficient since it now only contains the param instructions. Finally, it fixes the case where a generic function type expression contains a function prototype. Formerly, this situation would cause shared state to clobber each other; now it is in a proper tree structure so that can't happen. This fix also required adding a field to Sema `comptime_args_fn_inst` to make sure that the `comptime_args` field passed into Sema is applied to the correct `func` instruction. Source location for `node_offset_asm_ret_ty` is fixed; it was pointing at the asm output name rather than the return type as intended. Generic function instantiation is fixed, notably with respect to parameter type expressions that depend on previous parameters, and with respect to types which must be always comptime-known. This involves passing all the comptime arguments at a callsite of a generic function, and allowing the generic function semantic analysis to coerce the values to the proper types (since it has access to the evaluated parameter type expressions) and then decide based on the type whether the parameter is runtime known or not. In the case of explicitly marked `comptime` parameters, there is a check at the semantic analysis of the `call` instruction. Semantic analysis of `call` instructions does type coercion on the arguments, which is needed both for generic functions and to make up for using `coerced_ty` result locations (mentioned above). Tasks left in this branch: * Implement the memoization table. * Add test coverage. * Improve error reporting and source locations for compile errors.
2021-08-04 21:11:31 -07:00
const tok_index = main_tokens[ret_ty_node];
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_offset_for_cond, .node_offset_if_cond => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const node = src_loc.declRelativeToNodeIndex(node_off);
const node_tags = tree.nodes.items(.tag);
const src_node = switch (node_tags[node]) {
.if_simple => tree.ifSimple(node).ast.cond_expr,
.@"if" => tree.ifFull(node).ast.cond_expr,
.while_simple => tree.whileSimple(node).ast.cond_expr,
.while_cont => tree.whileCont(node).ast.cond_expr,
.@"while" => tree.whileFull(node).ast.cond_expr,
.for_simple => tree.forSimple(node).ast.cond_expr,
.@"for" => tree.forFull(node).ast.cond_expr,
else => unreachable,
};
const main_tokens = tree.nodes.items(.main_token);
const tok_index = main_tokens[src_node];
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_offset_bin_lhs => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const node = src_loc.declRelativeToNodeIndex(node_off);
const node_datas = tree.nodes.items(.data);
const src_node = node_datas[node].lhs;
const main_tokens = tree.nodes.items(.main_token);
const tok_index = main_tokens[src_node];
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_offset_bin_rhs => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const node = src_loc.declRelativeToNodeIndex(node_off);
const node_datas = tree.nodes.items(.data);
const src_node = node_datas[node].rhs;
const main_tokens = tree.nodes.items(.main_token);
const tok_index = main_tokens[src_node];
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_offset_switch_operand => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const node = src_loc.declRelativeToNodeIndex(node_off);
const node_datas = tree.nodes.items(.data);
const src_node = node_datas[node].lhs;
const main_tokens = tree.nodes.items(.main_token);
const tok_index = main_tokens[src_node];
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_offset_switch_special_prong => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const switch_node = src_loc.declRelativeToNodeIndex(node_off);
const node_datas = tree.nodes.items(.data);
const node_tags = tree.nodes.items(.tag);
const main_tokens = tree.nodes.items(.main_token);
const extra = tree.extraData(node_datas[switch_node].rhs, Ast.Node.SubRange);
const case_nodes = tree.extra_data[extra.start..extra.end];
for (case_nodes) |case_node| {
const case = switch (node_tags[case_node]) {
.switch_case_one => tree.switchCaseOne(case_node),
.switch_case => tree.switchCase(case_node),
else => unreachable,
};
const is_special = (case.ast.values.len == 0) or
(case.ast.values.len == 1 and
node_tags[case.ast.values[0]] == .identifier and
mem.eql(u8, tree.tokenSlice(main_tokens[case.ast.values[0]]), "_"));
if (!is_special) continue;
const tok_index = main_tokens[case_node];
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
} else unreachable;
},
.node_offset_switch_range => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const switch_node = src_loc.declRelativeToNodeIndex(node_off);
const node_datas = tree.nodes.items(.data);
const node_tags = tree.nodes.items(.tag);
const main_tokens = tree.nodes.items(.main_token);
const extra = tree.extraData(node_datas[switch_node].rhs, Ast.Node.SubRange);
const case_nodes = tree.extra_data[extra.start..extra.end];
for (case_nodes) |case_node| {
const case = switch (node_tags[case_node]) {
.switch_case_one => tree.switchCaseOne(case_node),
.switch_case => tree.switchCase(case_node),
else => unreachable,
};
const is_special = (case.ast.values.len == 0) or
(case.ast.values.len == 1 and
node_tags[case.ast.values[0]] == .identifier and
mem.eql(u8, tree.tokenSlice(main_tokens[case.ast.values[0]]), "_"));
if (is_special) continue;
for (case.ast.values) |item_node| {
if (node_tags[item_node] == .switch_range) {
const tok_index = main_tokens[item_node];
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
}
}
} else unreachable;
},
.node_offset_fn_type_cc => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
stage2: more principled approach to comptime references * AIR no longer has a `variables` array. Instead of the `varptr` instruction, Sema emits a constant with a `decl_ref`. * AIR no longer has a `ref` instruction. There is no longer any instruction that takes a value and returns a pointer to it. If this is desired, Sema must either create an anynomous Decl and return a constant `decl_ref`, or in the case of a runtime value, emit an `alloc` instruction, `store` the value to it, and then return the `alloc`. * The `ref_val` Value Tag is eliminated. `decl_ref` should be used instead. Also added is `eu_payload_ptr` which points to the payload of an error union, given an error union pointer. In general, Sema should avoid calling `analyzeRef` if it can be helped. For example in the case of field_val and elem_val, there should never be a reason to create a temporary (alloc or decl). Recent previous commits made progress along that front. There is a new abstraction in Sema, which looks like this: var anon_decl = try block.startAnonDecl(); defer anon_decl.deinit(); // here 'anon_decl.arena()` may be used const decl = try anon_decl.finish(ty, val); // decl is typically now used with `decl_ref`. This pattern is used to upgrade `ref_val` usages to `decl_ref` usages. Additional improvements: * Sema: fix source location resolution for calling convention expression. * Sema: properly report "unable to resolve comptime value" for loads of global variables. There is now a set of functions which can be called if the callee wants to obtain the Value even if the tag is `variable` (indicating comptime-known address but runtime-known value). * Sema: `coerce` resolves builtin types before checking equality. * Sema: fix `u1_type` missing from `addType`, making this type have a slightly more efficient representation in AIR. * LLVM backend: fix `genTypedValue` for tags `decl_ref` and `variable` to properly do an LLVMConstBitCast. * Remove unused parameter from `Value.toEnum`. After this commit, some test cases are no longer passing. This is due to the more principled approach to comptime references causing more anonymous decls to get sent to the linker for codegen. However, in all these cases the decls are not actually referenced by the runtime machine code. A future commit in this branch will implement garbage collection of decls so that unused decls do not get sent to the linker for codegen. This will make the tests go back to passing.
2021-07-29 15:59:51 -07:00
const node_datas = tree.nodes.items(.data);
const node_tags = tree.nodes.items(.tag);
const node = src_loc.declRelativeToNodeIndex(node_off);
var params: [1]Ast.Node.Index = undefined;
const full = switch (node_tags[node]) {
.fn_proto_simple => tree.fnProtoSimple(&params, node),
.fn_proto_multi => tree.fnProtoMulti(node),
.fn_proto_one => tree.fnProtoOne(&params, node),
.fn_proto => tree.fnProto(node),
stage2: more principled approach to comptime references * AIR no longer has a `variables` array. Instead of the `varptr` instruction, Sema emits a constant with a `decl_ref`. * AIR no longer has a `ref` instruction. There is no longer any instruction that takes a value and returns a pointer to it. If this is desired, Sema must either create an anynomous Decl and return a constant `decl_ref`, or in the case of a runtime value, emit an `alloc` instruction, `store` the value to it, and then return the `alloc`. * The `ref_val` Value Tag is eliminated. `decl_ref` should be used instead. Also added is `eu_payload_ptr` which points to the payload of an error union, given an error union pointer. In general, Sema should avoid calling `analyzeRef` if it can be helped. For example in the case of field_val and elem_val, there should never be a reason to create a temporary (alloc or decl). Recent previous commits made progress along that front. There is a new abstraction in Sema, which looks like this: var anon_decl = try block.startAnonDecl(); defer anon_decl.deinit(); // here 'anon_decl.arena()` may be used const decl = try anon_decl.finish(ty, val); // decl is typically now used with `decl_ref`. This pattern is used to upgrade `ref_val` usages to `decl_ref` usages. Additional improvements: * Sema: fix source location resolution for calling convention expression. * Sema: properly report "unable to resolve comptime value" for loads of global variables. There is now a set of functions which can be called if the callee wants to obtain the Value even if the tag is `variable` (indicating comptime-known address but runtime-known value). * Sema: `coerce` resolves builtin types before checking equality. * Sema: fix `u1_type` missing from `addType`, making this type have a slightly more efficient representation in AIR. * LLVM backend: fix `genTypedValue` for tags `decl_ref` and `variable` to properly do an LLVMConstBitCast. * Remove unused parameter from `Value.toEnum`. After this commit, some test cases are no longer passing. This is due to the more principled approach to comptime references causing more anonymous decls to get sent to the linker for codegen. However, in all these cases the decls are not actually referenced by the runtime machine code. A future commit in this branch will implement garbage collection of decls so that unused decls do not get sent to the linker for codegen. This will make the tests go back to passing.
2021-07-29 15:59:51 -07:00
.fn_decl => switch (node_tags[node_datas[node].lhs]) {
.fn_proto_simple => tree.fnProtoSimple(&params, node_datas[node].lhs),
.fn_proto_multi => tree.fnProtoMulti(node_datas[node].lhs),
.fn_proto_one => tree.fnProtoOne(&params, node_datas[node].lhs),
.fn_proto => tree.fnProto(node_datas[node].lhs),
else => unreachable,
},
else => unreachable,
};
const main_tokens = tree.nodes.items(.main_token);
const tok_index = main_tokens[full.ast.callconv_expr];
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_offset_fn_type_ret_ty => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const node_tags = tree.nodes.items(.tag);
const node = src_loc.declRelativeToNodeIndex(node_off);
var params: [1]Ast.Node.Index = undefined;
const full = switch (node_tags[node]) {
.fn_proto_simple => tree.fnProtoSimple(&params, node),
.fn_proto_multi => tree.fnProtoMulti(node),
.fn_proto_one => tree.fnProtoOne(&params, node),
.fn_proto => tree.fnProto(node),
else => unreachable,
};
const main_tokens = tree.nodes.items(.main_token);
const tok_index = main_tokens[full.ast.return_type];
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_offset_anyframe_type => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const node_datas = tree.nodes.items(.data);
const parent_node = src_loc.declRelativeToNodeIndex(node_off);
const node = node_datas[parent_node].rhs;
const main_tokens = tree.nodes.items(.main_token);
const tok_index = main_tokens[node];
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_offset_lib_name => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const node_datas = tree.nodes.items(.data);
const node_tags = tree.nodes.items(.tag);
const parent_node = src_loc.declRelativeToNodeIndex(node_off);
var params: [1]Ast.Node.Index = undefined;
const full = switch (node_tags[parent_node]) {
.fn_proto_simple => tree.fnProtoSimple(&params, parent_node),
.fn_proto_multi => tree.fnProtoMulti(parent_node),
.fn_proto_one => tree.fnProtoOne(&params, parent_node),
.fn_proto => tree.fnProto(parent_node),
.fn_decl => blk: {
const fn_proto = node_datas[parent_node].lhs;
break :blk switch (node_tags[fn_proto]) {
.fn_proto_simple => tree.fnProtoSimple(&params, fn_proto),
.fn_proto_multi => tree.fnProtoMulti(fn_proto),
.fn_proto_one => tree.fnProtoOne(&params, fn_proto),
.fn_proto => tree.fnProto(fn_proto),
else => unreachable,
};
},
else => unreachable,
};
const tok_index = full.lib_name.?;
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_offset_array_type_len => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const node_tags = tree.nodes.items(.tag);
const parent_node = src_loc.declRelativeToNodeIndex(node_off);
const full: Ast.full.ArrayType = switch (node_tags[parent_node]) {
.array_type => tree.arrayType(parent_node),
.array_type_sentinel => tree.arrayTypeSentinel(parent_node),
else => unreachable,
};
const node = full.ast.elem_count;
const main_tokens = tree.nodes.items(.main_token);
const tok_index = main_tokens[node];
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_offset_array_type_sentinel => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const node_tags = tree.nodes.items(.tag);
const parent_node = src_loc.declRelativeToNodeIndex(node_off);
const full: Ast.full.ArrayType = switch (node_tags[parent_node]) {
.array_type => tree.arrayType(parent_node),
.array_type_sentinel => tree.arrayTypeSentinel(parent_node),
else => unreachable,
};
const node = full.ast.sentinel;
const main_tokens = tree.nodes.items(.main_token);
const tok_index = main_tokens[node];
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
.node_offset_array_type_elem => |node_off| {
const tree = try src_loc.file_scope.getTree(gpa);
const node_tags = tree.nodes.items(.tag);
const parent_node = src_loc.declRelativeToNodeIndex(node_off);
const full: Ast.full.ArrayType = switch (node_tags[parent_node]) {
.array_type => tree.arrayType(parent_node),
.array_type_sentinel => tree.arrayTypeSentinel(parent_node),
else => unreachable,
};
const node = full.ast.elem_type;
const main_tokens = tree.nodes.items(.main_token);
const tok_index = main_tokens[node];
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
},
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
}
}
pub fn byteOffsetBuiltinCallArg(
src_loc: SrcLoc,
gpa: *Allocator,
node_off: i32,
arg_index: u32,
) !u32 {
const tree = try src_loc.file_scope.getTree(gpa);
const node_datas = tree.nodes.items(.data);
const node_tags = tree.nodes.items(.tag);
const node = src_loc.declRelativeToNodeIndex(node_off);
const param = switch (node_tags[node]) {
.builtin_call_two, .builtin_call_two_comma => switch (arg_index) {
0 => node_datas[node].lhs,
1 => node_datas[node].rhs,
else => unreachable,
},
.builtin_call, .builtin_call_comma => tree.extra_data[node_datas[node].lhs + arg_index],
else => unreachable,
};
const main_tokens = tree.nodes.items(.main_token);
const tok_index = main_tokens[param];
const token_starts = tree.tokens.items(.start);
return token_starts[tok_index];
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
};
/// Resolving a source location into a byte offset may require doing work
/// that we would rather not do unless the error actually occurs.
/// Therefore we need a data structure that contains the information necessary
/// to lazily produce a `SrcLoc` as required.
/// Most of the offsets in this data structure are relative to the containing Decl.
/// This makes the source location resolve properly even when a Decl gets
/// shifted up or down in the file, as long as the Decl's contents itself
/// do not change.
pub const LazySrcLoc = union(enum) {
/// When this tag is set, the code that constructed this `LazySrcLoc` is asserting
/// that all code paths which would need to resolve the source location are
/// unreachable. If you are debugging this tag incorrectly being this value,
/// look into using reverse-continue with a memory watchpoint to see where the
/// value is being set to this tag.
unneeded,
/// Means the source location points to an entire file; not any particular
/// location within the file. `file_scope` union field will be active.
entire_file,
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
/// The source location points to a byte offset within a source file,
/// offset from 0. The source file is determined contextually.
/// Inside a `SrcLoc`, the `file_scope` union field will be active.
byte_abs: u32,
/// The source location points to a token within a source file,
/// offset from 0. The source file is determined contextually.
/// Inside a `SrcLoc`, the `file_scope` union field will be active.
token_abs: u32,
/// The source location points to an AST node within a source file,
/// offset from 0. The source file is determined contextually.
/// Inside a `SrcLoc`, the `file_scope` union field will be active.
node_abs: u32,
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
/// The source location points to a byte offset within a source file,
/// offset from the byte offset of the Decl within the file.
/// The Decl is determined contextually.
byte_offset: u32,
/// This data is the offset into the token list from the Decl token.
/// The Decl is determined contextually.
token_offset: u32,
/// The source location points to an AST node, which is this value offset
/// from its containing Decl node AST index.
/// The Decl is determined contextually.
2021-03-20 17:09:06 -07:00
node_offset: i32,
/// The source location points to two tokens left of the first token of an AST node,
/// which is this value offset from its containing Decl node AST index.
/// The Decl is determined contextually.
node_offset_back2tok: i32,
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
/// The source location points to a variable declaration type expression,
/// found by taking this AST node index offset from the containing
/// Decl AST node, which points to a variable declaration AST node. Next, navigate
/// to the type expression.
/// The Decl is determined contextually.
2021-03-20 17:09:06 -07:00
node_offset_var_decl_ty: i32,
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
/// The source location points to a for loop condition expression,
/// found by taking this AST node index offset from the containing
/// Decl AST node, which points to a for loop AST node. Next, navigate
/// to the condition expression.
/// The Decl is determined contextually.
2021-03-20 17:09:06 -07:00
node_offset_for_cond: i32,
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
/// The source location points to the first parameter of a builtin
/// function call, found by taking this AST node index offset from the containing
/// Decl AST node, which points to a builtin call AST node. Next, navigate
/// to the first parameter.
/// The Decl is determined contextually.
2021-03-20 17:09:06 -07:00
node_offset_builtin_call_arg0: i32,
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
/// Same as `node_offset_builtin_call_arg0` except arg index 1.
2021-03-20 17:09:06 -07:00
node_offset_builtin_call_arg1: i32,
node_offset_builtin_call_arg2: i32,
node_offset_builtin_call_arg3: i32,
node_offset_builtin_call_arg4: i32,
node_offset_builtin_call_arg5: i32,
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
/// The source location points to the index expression of an array access
/// expression, found by taking this AST node index offset from the containing
/// Decl AST node, which points to an array access AST node. Next, navigate
/// to the index expression.
/// The Decl is determined contextually.
2021-03-20 17:09:06 -07:00
node_offset_array_access_index: i32,
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
/// The source location points to the sentinel expression of a slice
/// expression, found by taking this AST node index offset from the containing
/// Decl AST node, which points to a slice AST node. Next, navigate
/// to the sentinel expression.
/// The Decl is determined contextually.
2021-03-20 17:09:06 -07:00
node_offset_slice_sentinel: i32,
/// The source location points to the callee expression of a function
/// call expression, found by taking this AST node index offset from the containing
/// Decl AST node, which points to a function call AST node. Next, navigate
/// to the callee expression.
/// The Decl is determined contextually.
2021-03-20 17:09:06 -07:00
node_offset_call_func: i32,
/// The payload is offset from the containing Decl AST node.
/// The source location points to the field name of:
/// * a field access expression (`a.b`), or
/// * the operand ("b" node) of a field initialization expression (`.a = b`)
/// The Decl is determined contextually.
2021-03-20 17:09:06 -07:00
node_offset_field_name: i32,
/// The source location points to the pointer of a pointer deref expression,
/// found by taking this AST node index offset from the containing
/// Decl AST node, which points to a pointer deref AST node. Next, navigate
/// to the pointer expression.
/// The Decl is determined contextually.
2021-03-20 17:09:06 -07:00
node_offset_deref_ptr: i32,
/// The source location points to the assembly source code of an inline assembly
/// expression, found by taking this AST node index offset from the containing
/// Decl AST node, which points to inline assembly AST node. Next, navigate
/// to the asm template source code.
/// The Decl is determined contextually.
2021-03-20 17:09:06 -07:00
node_offset_asm_source: i32,
/// The source location points to the return type of an inline assembly
/// expression, found by taking this AST node index offset from the containing
/// Decl AST node, which points to inline assembly AST node. Next, navigate
/// to the return type expression.
/// The Decl is determined contextually.
2021-03-20 17:09:06 -07:00
node_offset_asm_ret_ty: i32,
/// The source location points to the condition expression of an if
/// expression, found by taking this AST node index offset from the containing
/// Decl AST node, which points to an if expression AST node. Next, navigate
/// to the condition expression.
/// The Decl is determined contextually.
2021-03-20 17:09:06 -07:00
node_offset_if_cond: i32,
2021-03-21 19:23:12 -07:00
/// The source location points to a binary expression, such as `a + b`, found
/// by taking this AST node index offset from the containing Decl AST node.
/// The Decl is determined contextually.
node_offset_bin_op: i32,
/// The source location points to the LHS of a binary expression, found
/// by taking this AST node index offset from the containing Decl AST node,
/// which points to a binary expression AST node. Next, navigate to the LHS.
2021-03-21 19:23:12 -07:00
/// The Decl is determined contextually.
node_offset_bin_lhs: i32,
/// The source location points to the RHS of a binary expression, found
/// by taking this AST node index offset from the containing Decl AST node,
/// which points to a binary expression AST node. Next, navigate to the RHS.
2021-03-21 19:23:12 -07:00
/// The Decl is determined contextually.
node_offset_bin_rhs: i32,
/// The source location points to the operand of a switch expression, found
/// by taking this AST node index offset from the containing Decl AST node,
/// which points to a switch expression AST node. Next, navigate to the operand.
/// The Decl is determined contextually.
node_offset_switch_operand: i32,
/// The source location points to the else/`_` prong of a switch expression, found
/// by taking this AST node index offset from the containing Decl AST node,
/// which points to a switch expression AST node. Next, navigate to the else/`_` prong.
/// The Decl is determined contextually.
node_offset_switch_special_prong: i32,
/// The source location points to all the ranges of a switch expression, found
/// by taking this AST node index offset from the containing Decl AST node,
/// which points to a switch expression AST node. Next, navigate to any of the
/// range nodes. The error applies to all of them.
/// The Decl is determined contextually.
node_offset_switch_range: i32,
/// The source location points to the calling convention of a function type
/// expression, found by taking this AST node index offset from the containing
/// Decl AST node, which points to a function type AST node. Next, navigate to
/// the calling convention node.
/// The Decl is determined contextually.
node_offset_fn_type_cc: i32,
/// The source location points to the return type of a function type
/// expression, found by taking this AST node index offset from the containing
/// Decl AST node, which points to a function type AST node. Next, navigate to
/// the return type node.
/// The Decl is determined contextually.
node_offset_fn_type_ret_ty: i32,
/// The source location points to the type expression of an `anyframe->T`
/// expression, found by taking this AST node index offset from the containing
/// Decl AST node, which points to a `anyframe->T` expression AST node. Next, navigate
/// to the type expression.
/// The Decl is determined contextually.
node_offset_anyframe_type: i32,
/// The source location points to the string literal of `extern "foo"`, found
/// by taking this AST node index offset from the containing
/// Decl AST node, which points to a function prototype or variable declaration
/// expression AST node. Next, navigate to the string literal of the `extern "foo"`.
/// The Decl is determined contextually.
node_offset_lib_name: i32,
/// The source location points to the len expression of an `[N:S]T`
/// expression, found by taking this AST node index offset from the containing
/// Decl AST node, which points to an `[N:S]T` expression AST node. Next, navigate
/// to the len expression.
/// The Decl is determined contextually.
node_offset_array_type_len: i32,
/// The source location points to the sentinel expression of an `[N:S]T`
/// expression, found by taking this AST node index offset from the containing
/// Decl AST node, which points to an `[N:S]T` expression AST node. Next, navigate
/// to the sentinel expression.
/// The Decl is determined contextually.
node_offset_array_type_sentinel: i32,
/// The source location points to the elem expression of an `[N:S]T`
/// expression, found by taking this AST node index offset from the containing
/// Decl AST node, which points to an `[N:S]T` expression AST node. Next, navigate
/// to the elem expression.
/// The Decl is determined contextually.
node_offset_array_type_elem: i32,
/// Upgrade to a `SrcLoc` based on the `Decl` provided.
pub fn toSrcLoc(lazy: LazySrcLoc, decl: *Decl) SrcLoc {
return switch (lazy) {
.unneeded,
.entire_file,
.byte_abs,
.token_abs,
.node_abs,
=> .{
.file_scope = decl.getFileScope(),
.parent_decl_node = 0,
.lazy = lazy,
},
.byte_offset,
.token_offset,
.node_offset,
.node_offset_back2tok,
.node_offset_var_decl_ty,
.node_offset_for_cond,
.node_offset_builtin_call_arg0,
.node_offset_builtin_call_arg1,
.node_offset_builtin_call_arg2,
.node_offset_builtin_call_arg3,
.node_offset_builtin_call_arg4,
.node_offset_builtin_call_arg5,
.node_offset_array_access_index,
.node_offset_slice_sentinel,
.node_offset_call_func,
.node_offset_field_name,
.node_offset_deref_ptr,
.node_offset_asm_source,
.node_offset_asm_ret_ty,
.node_offset_if_cond,
2021-03-21 19:23:12 -07:00
.node_offset_bin_op,
.node_offset_bin_lhs,
.node_offset_bin_rhs,
.node_offset_switch_operand,
.node_offset_switch_special_prong,
.node_offset_switch_range,
.node_offset_fn_type_cc,
.node_offset_fn_type_ret_ty,
.node_offset_anyframe_type,
.node_offset_lib_name,
.node_offset_array_type_len,
.node_offset_array_type_sentinel,
.node_offset_array_type_elem,
=> .{
.file_scope = decl.getFileScope(),
.parent_decl_node = decl.src_node,
.lazy = lazy,
},
};
}
stage2: implement error notes and regress -femit-zir * Implement error notes - note: other symbol exported here - note: previous else prong is here - note: previous '_' prong is here * Add Compilation.CObject.ErrorMsg. This object properly converts to AllErrors.Message when the time comes. * Add Compilation.CObject.failure_retryable. Properly handles out-of-memory and other transient failures. * Introduce Module.SrcLoc which has not only a byte offset but also references the file which the byte offset applies to. * Scope.Block now contains both a pointer to the "owner" Decl and the "source" Decl. As an example, during inline function call, the "owner" will be the Decl of the caller and the "source" will be the Decl of the callee. * Module.ErrorMsg now sports a `file_scope` field so that notes can refer to source locations in a file other than the parent error message. * Some instances where a `*Scope` was stored, now store a `*Scope.Container`. * Some methods in the `Scope` namespace were moved to the more specific type, since there was only an implementation for one particular tag. - `removeDecl` moved to `Scope.Container` - `destroy` moved to `Scope.File` * Two kinds of Scope deleted: - zir_module - decl * astgen: properly use DeclVal / DeclRef. DeclVal was incorrectly changed to be a reference; this commit fixes it. Fewer ZIR instructions processed as a result. - declval_in_module is renamed to declval - previous declval ZIR instruction is deleted; it was only for .zir files. * Test harness: friendlier diagnostics when an unexpected set of errors is encountered. * zir_sema: fix analyzeInstBlockFlat by properly calling resolvingInst on the last zir instruction in the block. Compile log implementation: * Write to a buffer rather than directly to stderr. * Only keep track of 1 callsite per Decl. * No longer mutate the ZIR Inst struct data. * "Compile log statement found" errors are only emitted when there are no other compile errors. -femit-zir and support for .zir source files is regressed. If we wanted to support this again, outputting .zir would need to be done as yet another backend rather than in the haphazard way it was previously implemented. For parsing .zir, it was implemented previously in a way that was not helpful for debugging. We need tighter integration with the test harness for it to be useful; so clearly a rewrite is needed. Given that a rewrite is needed, and it was getting in the way of progress and organization of the rest of stage2, I regressed the feature.
2021-01-16 22:51:01 -07:00
};
pub const SemaError = error{ OutOfMemory, AnalysisFail };
stage2 generics improvements: anytype and param type exprs AstGen result locations now have a `coerced_ty` tag which is the same as `ty` except it assumes that Sema will do a coercion, so it does not redundantly add an `as` instruction into the ZIR code. This results in cleaner ZIR and about a 14% reduction of ZIR bytes. param and param_comptime ZIR instructions now have a block body for their type expressions. This allows Sema to skip evaluation of the block in the case that the parameter is comptime-provided. It also allows a new mechanism to function: when evaluating type expressions of generic functions, if it would depend on another parameter, it returns `error.GenericPoison` which bubbles up and then is caught by the param/param_comptime instruction and then handled. This allows parameters to be evaluated independently so that the type info for functions which have comptime or anytype parameters will still have types populated for parameters that do not depend on values of previous parameters (because evaluation of their param blocks will return successfully instead of `error.GenericPoison`). It also makes iteration over the block that contains function parameters slightly more efficient since it now only contains the param instructions. Finally, it fixes the case where a generic function type expression contains a function prototype. Formerly, this situation would cause shared state to clobber each other; now it is in a proper tree structure so that can't happen. This fix also required adding a field to Sema `comptime_args_fn_inst` to make sure that the `comptime_args` field passed into Sema is applied to the correct `func` instruction. Source location for `node_offset_asm_ret_ty` is fixed; it was pointing at the asm output name rather than the return type as intended. Generic function instantiation is fixed, notably with respect to parameter type expressions that depend on previous parameters, and with respect to types which must be always comptime-known. This involves passing all the comptime arguments at a callsite of a generic function, and allowing the generic function semantic analysis to coerce the values to the proper types (since it has access to the evaluated parameter type expressions) and then decide based on the type whether the parameter is runtime known or not. In the case of explicitly marked `comptime` parameters, there is a check at the semantic analysis of the `call` instruction. Semantic analysis of `call` instructions does type coercion on the arguments, which is needed both for generic functions and to make up for using `coerced_ty` result locations (mentioned above). Tasks left in this branch: * Implement the memoization table. * Add test coverage. * Improve error reporting and source locations for compile errors.
2021-08-04 21:11:31 -07:00
pub const CompileError = error{
OutOfMemory,
/// When this is returned, the compile error for the failure has already been recorded.
AnalysisFail,
/// Returned when a compile error needed to be reported but a provided LazySrcLoc was set
/// to the `unneeded` tag. The source location was, in fact, needed. It is expected that
/// somewhere up the call stack, the operation will be retried after doing expensive work
/// to compute a source location.
NeededSourceLocation,
/// A Type or Value was needed to be used during semantic analysis, but it was not available
/// because the function is generic. This is only seen when analyzing the body of a param
/// instruction.
GenericPoison,
/// In a comptime scope, a return instruction was encountered. This error is only seen when
/// doing a comptime function call.
ComptimeReturn,
stage2 generics improvements: anytype and param type exprs AstGen result locations now have a `coerced_ty` tag which is the same as `ty` except it assumes that Sema will do a coercion, so it does not redundantly add an `as` instruction into the ZIR code. This results in cleaner ZIR and about a 14% reduction of ZIR bytes. param and param_comptime ZIR instructions now have a block body for their type expressions. This allows Sema to skip evaluation of the block in the case that the parameter is comptime-provided. It also allows a new mechanism to function: when evaluating type expressions of generic functions, if it would depend on another parameter, it returns `error.GenericPoison` which bubbles up and then is caught by the param/param_comptime instruction and then handled. This allows parameters to be evaluated independently so that the type info for functions which have comptime or anytype parameters will still have types populated for parameters that do not depend on values of previous parameters (because evaluation of their param blocks will return successfully instead of `error.GenericPoison`). It also makes iteration over the block that contains function parameters slightly more efficient since it now only contains the param instructions. Finally, it fixes the case where a generic function type expression contains a function prototype. Formerly, this situation would cause shared state to clobber each other; now it is in a proper tree structure so that can't happen. This fix also required adding a field to Sema `comptime_args_fn_inst` to make sure that the `comptime_args` field passed into Sema is applied to the correct `func` instruction. Source location for `node_offset_asm_ret_ty` is fixed; it was pointing at the asm output name rather than the return type as intended. Generic function instantiation is fixed, notably with respect to parameter type expressions that depend on previous parameters, and with respect to types which must be always comptime-known. This involves passing all the comptime arguments at a callsite of a generic function, and allowing the generic function semantic analysis to coerce the values to the proper types (since it has access to the evaluated parameter type expressions) and then decide based on the type whether the parameter is runtime known or not. In the case of explicitly marked `comptime` parameters, there is a check at the semantic analysis of the `call` instruction. Semantic analysis of `call` instructions does type coercion on the arguments, which is needed both for generic functions and to make up for using `coerced_ty` result locations (mentioned above). Tasks left in this branch: * Implement the memoization table. * Add test coverage. * Improve error reporting and source locations for compile errors.
2021-08-04 21:11:31 -07:00
};
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
pub fn deinit(mod: *Module) void {
const gpa = mod.gpa;
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
for (mod.import_table.keys()) |key| {
gpa.free(key);
}
for (mod.import_table.values()) |value| {
value.destroy(mod);
}
mod.import_table.deinit(gpa);
mod.deletion_set.deinit(gpa);
// The callsite of `Compilation.create` owns the `main_pkg`, however
2021-04-08 20:52:02 -07:00
// Module owns the builtin and std packages that it adds.
if (mod.main_pkg.table.fetchRemove("builtin")) |kv| {
gpa.free(kv.key);
kv.value.destroy(gpa);
2021-04-08 20:52:02 -07:00
}
if (mod.main_pkg.table.fetchRemove("std")) |kv| {
gpa.free(kv.key);
kv.value.destroy(gpa);
2021-04-08 20:52:02 -07:00
}
if (mod.main_pkg.table.fetchRemove("root")) |kv| {
gpa.free(kv.key);
2021-04-08 20:52:02 -07:00
}
if (mod.root_pkg != mod.main_pkg) {
mod.root_pkg.destroy(gpa);
}
2021-04-08 20:52:02 -07:00
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
mod.compile_log_text.deinit(gpa);
stage2: implement error notes and regress -femit-zir * Implement error notes - note: other symbol exported here - note: previous else prong is here - note: previous '_' prong is here * Add Compilation.CObject.ErrorMsg. This object properly converts to AllErrors.Message when the time comes. * Add Compilation.CObject.failure_retryable. Properly handles out-of-memory and other transient failures. * Introduce Module.SrcLoc which has not only a byte offset but also references the file which the byte offset applies to. * Scope.Block now contains both a pointer to the "owner" Decl and the "source" Decl. As an example, during inline function call, the "owner" will be the Decl of the caller and the "source" will be the Decl of the callee. * Module.ErrorMsg now sports a `file_scope` field so that notes can refer to source locations in a file other than the parent error message. * Some instances where a `*Scope` was stored, now store a `*Scope.Container`. * Some methods in the `Scope` namespace were moved to the more specific type, since there was only an implementation for one particular tag. - `removeDecl` moved to `Scope.Container` - `destroy` moved to `Scope.File` * Two kinds of Scope deleted: - zir_module - decl * astgen: properly use DeclVal / DeclRef. DeclVal was incorrectly changed to be a reference; this commit fixes it. Fewer ZIR instructions processed as a result. - declval_in_module is renamed to declval - previous declval ZIR instruction is deleted; it was only for .zir files. * Test harness: friendlier diagnostics when an unexpected set of errors is encountered. * zir_sema: fix analyzeInstBlockFlat by properly calling resolvingInst on the last zir instruction in the block. Compile log implementation: * Write to a buffer rather than directly to stderr. * Only keep track of 1 callsite per Decl. * No longer mutate the ZIR Inst struct data. * "Compile log statement found" errors are only emitted when there are no other compile errors. -femit-zir and support for .zir source files is regressed. If we wanted to support this again, outputting .zir would need to be done as yet another backend rather than in the haphazard way it was previously implemented. For parsing .zir, it was implemented previously in a way that was not helpful for debugging. We need tighter integration with the test harness for it to be useful; so clearly a rewrite is needed. Given that a rewrite is needed, and it was getting in the way of progress and organization of the rest of stage2, I regressed the feature.
2021-01-16 22:51:01 -07:00
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
mod.zig_cache_artifact_directory.handle.close();
mod.local_zir_cache.handle.close();
mod.global_zir_cache.handle.close();
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
for (mod.failed_decls.values()) |value| {
value.destroy(gpa);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
mod.failed_decls.deinit(gpa);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
if (mod.emit_h) |emit_h| {
for (emit_h.failed_decls.values()) |value| {
value.destroy(gpa);
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
}
emit_h.failed_decls.deinit(gpa);
emit_h.decl_table.deinit(gpa);
gpa.destroy(emit_h);
}
for (mod.failed_files.values()) |value| {
if (value) |msg| msg.destroy(gpa);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
mod.failed_files.deinit(gpa);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
for (mod.failed_exports.values()) |value| {
value.destroy(gpa);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
mod.failed_exports.deinit(gpa);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
mod.compile_log_decls.deinit(gpa);
2020-11-21 21:12:33 -05:00
for (mod.decl_exports.values()) |export_list| {
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
gpa.free(export_list);
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
mod.decl_exports.deinit(gpa);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
for (mod.export_owners.values()) |value| {
freeExportList(gpa, value);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
mod.export_owners.deinit(gpa);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
{
var it = mod.global_error_set.keyIterator();
while (it.next()) |key| {
gpa.free(key.*);
}
mod.global_error_set.deinit(gpa);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
mod.error_name_list.deinit(gpa);
2021-07-27 15:44:21 -07:00
mod.test_functions.deinit(gpa);
mod.align_stack_fns.deinit(gpa);
mod.monomorphed_funcs.deinit(gpa);
{
var it = mod.memoized_calls.iterator();
while (it.next()) |entry| {
gpa.free(entry.key_ptr.args);
entry.value_ptr.arena.promote(gpa).deinit();
}
mod.memoized_calls.deinit(gpa);
}
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
fn freeExportList(gpa: *Allocator, export_list: []*Export) void {
for (export_list) |exp| {
gpa.free(exp.options.name);
stage2: progress towards ability to compile compiler-rt * prepare compiler-rt to support being compiled by stage2 - put in a few minor workarounds that will be removed later, such as using `builtin.stage2_arch` rather than `builtin.cpu.arch`. - only try to export a few symbols for now - we'll move more symbols over to the "working in stage2" section as they become functional and gain test coverage. - use `inline fn` at function declarations rather than `@call` with an always_inline modifier at the callsites, to avoid depending on the anonymous array literal syntax language feature (for now). * AIR: replace floatcast instruction with fptrunc and fpext for shortening and widening floating point values, respectively. * Introduce a new ZIR instruction, `export_value`, which implements `@export` for the case when the thing to be exported is a local comptime value that points to a function. - AstGen: fix `@export` not properly reporting ambiguous decl references. * Sema: handle ExportOptions linkage. The value is now available to all backends. - Implement setting global linkage as appropriate in the LLVM backend. I did not yet inspect the LLVM IR, so this still needs to be audited. There is already a pending task to make sure the alias stuff is working as intended, and this is related. - Sema almost handles section, just a tiny bit more code is needed in `resolveExportOptions`. * Sema: implement float widening and shortening for both `@floatCast` and float coercion. - Implement the LLVM backend code for this as well.
2021-09-21 22:33:00 -07:00
if (exp.options.section) |s| gpa.free(s);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
gpa.destroy(exp);
}
gpa.free(export_list);
}
const data_has_safety_tag = @sizeOf(Zir.Inst.Data) != 8;
// TODO This is taking advantage of matching stage1 debug union layout.
// We need a better language feature for initializing a union with
// a runtime known tag.
const Stage1DataLayout = extern struct {
2021-09-15 21:05:40 +02:00
data: [8]u8 align(@alignOf(Zir.Inst.Data)),
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
safety_tag: u8,
};
comptime {
if (data_has_safety_tag) {
assert(@sizeOf(Stage1DataLayout) == @sizeOf(Zir.Inst.Data));
}
}
pub fn astGenFile(mod: *Module, file: *File) !void {
const tracy = trace(@src());
defer tracy.end();
const comp = mod.comp;
const gpa = mod.gpa;
// In any case we need to examine the stat of the file to determine the course of action.
var source_file = try file.pkg.root_src_directory.handle.openFile(file.sub_file_path, .{});
defer source_file.close();
const stat = try source_file.stat();
const want_local_cache = file.pkg == mod.main_pkg;
const digest = hash: {
var path_hash: Cache.HashHelper = .{};
path_hash.addBytes(build_options.version);
if (!want_local_cache) {
path_hash.addOptionalBytes(file.pkg.root_src_directory.path);
}
path_hash.addBytes(file.sub_file_path);
break :hash path_hash.final();
};
const cache_directory = if (want_local_cache) mod.local_zir_cache else mod.global_zir_cache;
const zir_dir = cache_directory.handle;
var cache_file: ?std.fs.File = null;
defer if (cache_file) |f| f.close();
// Determine whether we need to reload the file from disk and redo parsing and AstGen.
switch (file.status) {
.never_loaded, .retryable_failure => cached: {
// First, load the cached ZIR code, if any.
log.debug("AstGen checking cache: {s} (local={}, digest={s})", .{
file.sub_file_path, want_local_cache, &digest,
});
// We ask for a lock in order to coordinate with other zig processes.
// If another process is already working on this file, we will get the cached
// version. Likewise if we're working on AstGen and another process asks for
// the cached file, they'll get it.
cache_file = zir_dir.openFile(&digest, .{ .lock = .Shared }) catch |err| switch (err) {
error.PathAlreadyExists => unreachable, // opening for reading
error.NoSpaceLeft => unreachable, // opening for reading
error.NotDir => unreachable, // no dir components
error.InvalidUtf8 => unreachable, // it's a hex encoded name
error.BadPathName => unreachable, // it's a hex encoded name
error.NameTooLong => unreachable, // it's a fixed size name
error.PipeBusy => unreachable, // it's not a pipe
error.WouldBlock => unreachable, // not asking for non-blocking I/O
error.SymLinkLoop,
error.FileNotFound,
error.Unexpected,
=> break :cached,
else => |e| return e, // Retryable errors are handled at callsite.
};
// First we read the header to determine the lengths of arrays.
const header = cache_file.?.reader().readStruct(Zir.Header) catch |err| switch (err) {
// This can happen if Zig bails out of this function between creating
// the cached file and writing it.
error.EndOfStream => break :cached,
else => |e| return e,
};
const unchanged_metadata =
stat.size == header.stat_size and
stat.mtime == header.stat_mtime and
stat.inode == header.stat_inode;
if (!unchanged_metadata) {
log.debug("AstGen cache stale: {s}", .{file.sub_file_path});
break :cached;
}
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
log.debug("AstGen cache hit: {s} instructions_len={d}", .{
file.sub_file_path, header.instructions_len,
});
var instructions: std.MultiArrayList(Zir.Inst) = .{};
defer instructions.deinit(gpa);
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
try instructions.setCapacity(gpa, header.instructions_len);
instructions.len = header.instructions_len;
var zir: Zir = .{
.instructions = instructions.toOwnedSlice(),
.string_bytes = &.{},
.extra = &.{},
};
var keep_zir = false;
defer if (!keep_zir) zir.deinit(gpa);
zir.string_bytes = try gpa.alloc(u8, header.string_bytes_len);
zir.extra = try gpa.alloc(u32, header.extra_len);
const safety_buffer = if (data_has_safety_tag)
try gpa.alloc([8]u8, header.instructions_len)
else
undefined;
defer if (data_has_safety_tag) gpa.free(safety_buffer);
const data_ptr = if (data_has_safety_tag)
@ptrCast([*]u8, safety_buffer.ptr)
else
@ptrCast([*]u8, zir.instructions.items(.data).ptr);
var iovecs = [_]std.os.iovec{
.{
.iov_base = @ptrCast([*]u8, zir.instructions.items(.tag).ptr),
.iov_len = header.instructions_len,
},
.{
.iov_base = data_ptr,
.iov_len = header.instructions_len * 8,
},
.{
.iov_base = zir.string_bytes.ptr,
.iov_len = header.string_bytes_len,
},
.{
.iov_base = @ptrCast([*]u8, zir.extra.ptr),
.iov_len = header.extra_len * 4,
},
};
const amt_read = try cache_file.?.readvAll(&iovecs);
const amt_expected = zir.instructions.len * 9 +
zir.string_bytes.len +
zir.extra.len * 4;
if (amt_read != amt_expected) {
log.warn("unexpected EOF reading cached ZIR for {s}", .{file.sub_file_path});
zir.deinit(gpa);
break :cached;
}
if (data_has_safety_tag) {
const tags = zir.instructions.items(.tag);
for (zir.instructions.items(.data)) |*data, i| {
const union_tag = Zir.Inst.Tag.data_tags[@enumToInt(tags[i])];
const as_struct = @ptrCast(*Stage1DataLayout, data);
as_struct.* = .{
.safety_tag = @enumToInt(union_tag),
.data = safety_buffer[i],
};
}
}
keep_zir = true;
file.zir = zir;
file.zir_loaded = true;
file.stat_size = header.stat_size;
file.stat_inode = header.stat_inode;
file.stat_mtime = header.stat_mtime;
file.status = .success_zir;
log.debug("AstGen cached success: {s}", .{file.sub_file_path});
// TODO don't report compile errors until Sema @importFile
if (file.zir.hasCompileErrors()) {
{
const lock = comp.mutex.acquire();
defer lock.release();
try mod.failed_files.putNoClobber(gpa, file, null);
}
file.status = .astgen_failure;
return error.AnalysisFail;
}
return;
},
.parse_failure, .astgen_failure, .success_zir => {
const unchanged_metadata =
stat.size == file.stat_size and
stat.mtime == file.stat_mtime and
stat.inode == file.stat_inode;
if (unchanged_metadata) {
log.debug("unmodified metadata of file: {s}", .{file.sub_file_path});
return;
}
log.debug("metadata changed: {s}", .{file.sub_file_path});
},
}
if (cache_file) |f| {
f.close();
cache_file = null;
}
cache_file = zir_dir.createFile(&digest, .{ .lock = .Exclusive }) catch |err| switch (err) {
error.NotDir => unreachable, // no dir components
error.InvalidUtf8 => unreachable, // it's a hex encoded name
error.BadPathName => unreachable, // it's a hex encoded name
error.NameTooLong => unreachable, // it's a fixed size name
error.PipeBusy => unreachable, // it's not a pipe
error.WouldBlock => unreachable, // not asking for non-blocking I/O
error.FileNotFound => unreachable, // no dir components
else => |e| {
const pkg_path = file.pkg.root_src_directory.path orelse ".";
const cache_path = cache_directory.path orelse ".";
log.warn("unable to save cached ZIR code for {s}/{s} to {s}/{s}: {s}", .{
pkg_path, file.sub_file_path, cache_path, &digest, @errorName(e),
});
return;
},
};
mod.lockAndClearFileCompileError(file);
// If the previous ZIR does not have compile errors, keep it around
// in case parsing or new ZIR fails. In case of successful ZIR update
// at the end of this function we will free it.
// We keep the previous ZIR loaded so that we can use it
// for the update next time it does not have any compile errors. This avoids
// needlessly tossing out semantic analysis work when an error is
// temporarily introduced.
if (file.zir_loaded and !file.zir.hasCompileErrors()) {
assert(file.prev_zir == null);
const prev_zir_ptr = try gpa.create(Zir);
file.prev_zir = prev_zir_ptr;
prev_zir_ptr.* = file.zir;
file.zir = undefined;
file.zir_loaded = false;
}
file.unload(gpa);
if (stat.size > std.math.maxInt(u32))
return error.FileTooBig;
const source = try gpa.allocSentinel(u8, @intCast(usize, stat.size), 0);
defer if (!file.source_loaded) gpa.free(source);
const amt = try source_file.readAll(source);
if (amt != stat.size)
return error.UnexpectedEndOfFile;
file.stat_size = stat.size;
file.stat_inode = stat.inode;
file.stat_mtime = stat.mtime;
file.source = source;
file.source_loaded = true;
file.tree = try std.zig.parse(gpa, source);
defer if (!file.tree_loaded) file.tree.deinit(gpa);
if (file.tree.errors.len != 0) {
const parse_err = file.tree.errors[0];
var msg = std.ArrayList(u8).init(gpa);
defer msg.deinit();
const token_starts = file.tree.tokens.items(.start);
const token_tags = file.tree.tokens.items(.tag);
try file.tree.renderError(parse_err, msg.writer());
const err_msg = try gpa.create(ErrorMsg);
err_msg.* = .{
.src_loc = .{
.file_scope = file,
.parent_decl_node = 0,
.lazy = .{ .byte_abs = token_starts[parse_err.token] },
},
.msg = msg.toOwnedSlice(),
};
if (token_tags[parse_err.token] == .invalid) {
const bad_off = @intCast(u32, file.tree.tokenSlice(parse_err.token).len);
const byte_abs = token_starts[parse_err.token] + bad_off;
try mod.errNoteNonLazy(.{
.file_scope = file,
.parent_decl_node = 0,
.lazy = .{ .byte_abs = byte_abs },
}, err_msg, "invalid byte: '{'}'", .{std.zig.fmtEscapes(source[byte_abs..][0..1])});
}
{
const lock = comp.mutex.acquire();
defer lock.release();
try mod.failed_files.putNoClobber(gpa, file, err_msg);
}
file.status = .parse_failure;
return error.AnalysisFail;
}
file.tree_loaded = true;
file.zir = try AstGen.generate(gpa, file.tree);
file.zir_loaded = true;
file.status = .success_zir;
log.debug("AstGen fresh success: {s}", .{file.sub_file_path});
const safety_buffer = if (data_has_safety_tag)
try gpa.alloc([8]u8, file.zir.instructions.len)
else
undefined;
defer if (data_has_safety_tag) gpa.free(safety_buffer);
const data_ptr = if (data_has_safety_tag)
if (file.zir.instructions.len == 0)
@as([*]const u8, undefined)
else
@ptrCast([*]const u8, safety_buffer.ptr)
else
@ptrCast([*]const u8, file.zir.instructions.items(.data).ptr);
if (data_has_safety_tag) {
// The `Data` union has a safety tag but in the file format we store it without.
for (file.zir.instructions.items(.data)) |*data, i| {
const as_struct = @ptrCast(*const Stage1DataLayout, data);
safety_buffer[i] = as_struct.data;
}
}
const header: Zir.Header = .{
.instructions_len = @intCast(u32, file.zir.instructions.len),
.string_bytes_len = @intCast(u32, file.zir.string_bytes.len),
.extra_len = @intCast(u32, file.zir.extra.len),
.stat_size = stat.size,
.stat_inode = stat.inode,
.stat_mtime = stat.mtime,
};
var iovecs = [_]std.os.iovec_const{
.{
.iov_base = @ptrCast([*]const u8, &header),
.iov_len = @sizeOf(Zir.Header),
},
.{
.iov_base = @ptrCast([*]const u8, file.zir.instructions.items(.tag).ptr),
.iov_len = file.zir.instructions.len,
},
.{
.iov_base = data_ptr,
.iov_len = file.zir.instructions.len * 8,
},
.{
.iov_base = file.zir.string_bytes.ptr,
.iov_len = file.zir.string_bytes.len,
},
.{
.iov_base = @ptrCast([*]const u8, file.zir.extra.ptr),
.iov_len = file.zir.extra.len * 4,
},
};
cache_file.?.writevAll(&iovecs) catch |err| {
const pkg_path = file.pkg.root_src_directory.path orelse ".";
const cache_path = cache_directory.path orelse ".";
log.warn("unable to write cached ZIR code for {s}/{s} to {s}/{s}: {s}", .{
pkg_path, file.sub_file_path, cache_path, &digest, @errorName(err),
});
};
if (file.zir.hasCompileErrors()) {
{
const lock = comp.mutex.acquire();
defer lock.release();
try mod.failed_files.putNoClobber(gpa, file, null);
}
file.status = .astgen_failure;
return error.AnalysisFail;
}
if (file.prev_zir) |prev_zir| {
// Iterate over all Namespace objects contained within this File, looking at the
// previous and new ZIR together and update the references to point
// to the new one. For example, Decl name, Decl zir_decl_index, and Namespace
// decl_table keys need to get updated to point to the new memory, even if the
// underlying source code is unchanged.
// We do not need to hold any locks at this time because all the Decl and Namespace
// objects being touched are specific to this File, and the only other concurrent
// tasks are touching other File objects.
try updateZirRefs(gpa, file, prev_zir.*);
// At this point, `file.outdated_decls` and `file.deleted_decls` are populated,
// and semantic analysis will deal with them properly.
// No need to keep previous ZIR.
prev_zir.deinit(gpa);
gpa.destroy(prev_zir);
file.prev_zir = null;
} else if (file.root_decl) |root_decl| {
// This is an update, but it is the first time the File has succeeded
// ZIR. We must mark it outdated since we have already tried to
// semantically analyze it.
try file.outdated_decls.resize(gpa, 1);
file.outdated_decls.items[0] = root_decl;
}
}
/// Patch ups:
/// * Struct.zir_index
/// * Decl.zir_index
/// * Fn.zir_body_inst
/// * Decl.zir_decl_index
fn updateZirRefs(gpa: *Allocator, file: *File, old_zir: Zir) !void {
const new_zir = file.zir;
// Maps from old ZIR to new ZIR, struct_decl, enum_decl, etc. Any instruction which
// creates a namespace, gets mapped from old to new here.
var inst_map: std.AutoHashMapUnmanaged(Zir.Inst.Index, Zir.Inst.Index) = .{};
defer inst_map.deinit(gpa);
// Maps from old ZIR to new ZIR, the extra data index for the sub-decl item.
// e.g. the thing that Decl.zir_decl_index points to.
var extra_map: std.AutoHashMapUnmanaged(u32, u32) = .{};
defer extra_map.deinit(gpa);
try mapOldZirToNew(gpa, old_zir, new_zir, &inst_map, &extra_map);
// Walk the Decl graph, updating ZIR indexes, strings, and populating
// the deleted and outdated lists.
var decl_stack: std.ArrayListUnmanaged(*Decl) = .{};
defer decl_stack.deinit(gpa);
const root_decl = file.root_decl.?;
try decl_stack.append(gpa, root_decl);
file.deleted_decls.clearRetainingCapacity();
file.outdated_decls.clearRetainingCapacity();
// The root decl is always outdated; otherwise we would not have had
// to re-generate ZIR for the File.
try file.outdated_decls.append(gpa, root_decl);
while (decl_stack.popOrNull()) |decl| {
// Anonymous decls and the root decl have this set to 0. We still need
// to walk them but we do not need to modify this value.
// Anonymous decls should not be marked outdated. They will be re-generated
// if their owner decl is marked outdated.
if (decl.zir_decl_index != 0) {
const old_zir_decl_index = decl.zir_decl_index;
const new_zir_decl_index = extra_map.get(old_zir_decl_index) orelse {
log.debug("updateZirRefs {s}: delete {*} ({s})", .{
file.sub_file_path, decl, decl.name,
});
try file.deleted_decls.append(gpa, decl);
continue;
};
const old_hash = decl.contentsHashZir(old_zir);
decl.zir_decl_index = new_zir_decl_index;
const new_hash = decl.contentsHashZir(new_zir);
if (!std.zig.srcHashEql(old_hash, new_hash)) {
log.debug("updateZirRefs {s}: outdated {*} ({s}) {d} => {d}", .{
file.sub_file_path, decl, decl.name, old_zir_decl_index, new_zir_decl_index,
});
try file.outdated_decls.append(gpa, decl);
} else {
log.debug("updateZirRefs {s}: unchanged {*} ({s}) {d} => {d}", .{
file.sub_file_path, decl, decl.name, old_zir_decl_index, new_zir_decl_index,
});
}
}
if (!decl.owns_tv) continue;
if (decl.getStruct()) |struct_obj| {
struct_obj.zir_index = inst_map.get(struct_obj.zir_index) orelse {
try file.deleted_decls.append(gpa, decl);
continue;
};
}
if (decl.getUnion()) |union_obj| {
union_obj.zir_index = inst_map.get(union_obj.zir_index) orelse {
try file.deleted_decls.append(gpa, decl);
continue;
};
}
if (decl.getFunction()) |func| {
func.zir_body_inst = inst_map.get(func.zir_body_inst) orelse {
try file.deleted_decls.append(gpa, decl);
continue;
};
}
2021-05-07 18:52:11 -07:00
if (decl.getInnerNamespace()) |namespace| {
for (namespace.decls.values()) |sub_decl| {
try decl_stack.append(gpa, sub_decl);
}
for (namespace.anon_decls.keys()) |sub_decl| {
stage2: type declarations ZIR encode AnonNameStrategy which can be either parent, func, or anon. Here's the enum reproduced in the commit message for convenience: ```zig pub const NameStrategy = enum(u2) { /// Use the same name as the parent declaration name. /// e.g. `const Foo = struct {...};`. parent, /// Use the name of the currently executing comptime function call, /// with the current parameters. e.g. `ArrayList(i32)`. func, /// Create an anonymous name for this declaration. /// Like this: "ParentDeclName_struct_69" anon, }; ``` With this information in the ZIR, a future commit can improve the names of structs, unions, enums, and opaques. In order to accomplish this, the following ZIR instruction forms were removed and replaced with Extended op codes: * struct_decl * struct_decl_packed * struct_decl_extern * union_decl * union_decl_packed * union_decl_extern * enum_decl * enum_decl_nonexhaustive By being extended opcodes, one more u32 is needed, however we more than make up for it by repurposing the 16 "small" bits to provide shorter encodings for when decls_len == 0, fields_len == 0, a source node is not provided, etc. There tends to be no downside, and in fact sometimes upsides, to using an extended op code when there is a need for flag bits, which is the case for all three of these. Likewise, the container layout can be encoded in these bits rather than into the opcode. The following 4 ZIR instructions were added, netting a total of 4 freed up ZIR enum tags for future use: * opaque_decl_anon * opaque_decl_func * error_set_decl_anon * error_set_decl_func This is so that opaques and error sets can have the same name hint as structs, enums, and unions. `std.builtin.ContainerLayout` gets an explicit integer tag type so that it can be used inside packed structs. This commit also makes `Module.Namespace` use a separate set for anonymous decls, thus allowing anonymous decls to share the same `Decl.name` as their owner `Decl` objects.
2021-05-10 21:34:43 -07:00
try decl_stack.append(gpa, sub_decl);
}
}
}
}
pub fn mapOldZirToNew(
gpa: *Allocator,
old_zir: Zir,
new_zir: Zir,
inst_map: *std.AutoHashMapUnmanaged(Zir.Inst.Index, Zir.Inst.Index),
extra_map: *std.AutoHashMapUnmanaged(u32, u32),
) Allocator.Error!void {
// Contain ZIR indexes of declaration instructions.
const MatchedZirDecl = struct {
old_inst: Zir.Inst.Index,
new_inst: Zir.Inst.Index,
};
var match_stack: std.ArrayListUnmanaged(MatchedZirDecl) = .{};
defer match_stack.deinit(gpa);
// Main struct inst is always the same
try match_stack.append(gpa, .{
.old_inst = Zir.main_struct_inst,
.new_inst = Zir.main_struct_inst,
});
var old_decls = std.ArrayList(Zir.Inst.Index).init(gpa);
defer old_decls.deinit();
var new_decls = std.ArrayList(Zir.Inst.Index).init(gpa);
defer new_decls.deinit();
while (match_stack.popOrNull()) |match_item| {
try inst_map.put(gpa, match_item.old_inst, match_item.new_inst);
// Maps name to extra index of decl sub item.
var decl_map: std.StringHashMapUnmanaged(u32) = .{};
defer decl_map.deinit(gpa);
{
var old_decl_it = old_zir.declIterator(match_item.old_inst);
while (old_decl_it.next()) |old_decl| {
try decl_map.put(gpa, old_decl.name, old_decl.sub_index);
}
}
var new_decl_it = new_zir.declIterator(match_item.new_inst);
while (new_decl_it.next()) |new_decl| {
const old_extra_index = decl_map.get(new_decl.name) orelse continue;
const new_extra_index = new_decl.sub_index;
try extra_map.put(gpa, old_extra_index, new_extra_index);
try old_zir.findDecls(&old_decls, old_extra_index);
try new_zir.findDecls(&new_decls, new_extra_index);
var i: usize = 0;
while (true) : (i += 1) {
if (i >= old_decls.items.len) break;
if (i >= new_decls.items.len) break;
try match_stack.append(gpa, .{
.old_inst = old_decls.items[i],
.new_inst = new_decls.items[i],
});
}
}
}
}
pub fn ensureDeclAnalyzed(mod: *Module, decl: *Decl) SemaError!void {
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
const tracy = trace(@src());
defer tracy.end();
const subsequent_analysis = switch (decl.analysis) {
.in_progress => unreachable,
.file_failure,
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
.sema_failure,
.sema_failure_retryable,
.codegen_failure,
.dependency_failure,
.codegen_failure_retryable,
=> return error.AnalysisFail,
.complete => return,
.outdated => blk: {
log.debug("re-analyzing {*} ({s})", .{ decl, decl.name });
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
// The exports this Decl performs will be re-discovered, so we remove them here
// prior to re-analysis.
stage2: implement error notes and regress -femit-zir * Implement error notes - note: other symbol exported here - note: previous else prong is here - note: previous '_' prong is here * Add Compilation.CObject.ErrorMsg. This object properly converts to AllErrors.Message when the time comes. * Add Compilation.CObject.failure_retryable. Properly handles out-of-memory and other transient failures. * Introduce Module.SrcLoc which has not only a byte offset but also references the file which the byte offset applies to. * Scope.Block now contains both a pointer to the "owner" Decl and the "source" Decl. As an example, during inline function call, the "owner" will be the Decl of the caller and the "source" will be the Decl of the callee. * Module.ErrorMsg now sports a `file_scope` field so that notes can refer to source locations in a file other than the parent error message. * Some instances where a `*Scope` was stored, now store a `*Scope.Container`. * Some methods in the `Scope` namespace were moved to the more specific type, since there was only an implementation for one particular tag. - `removeDecl` moved to `Scope.Container` - `destroy` moved to `Scope.File` * Two kinds of Scope deleted: - zir_module - decl * astgen: properly use DeclVal / DeclRef. DeclVal was incorrectly changed to be a reference; this commit fixes it. Fewer ZIR instructions processed as a result. - declval_in_module is renamed to declval - previous declval ZIR instruction is deleted; it was only for .zir files. * Test harness: friendlier diagnostics when an unexpected set of errors is encountered. * zir_sema: fix analyzeInstBlockFlat by properly calling resolvingInst on the last zir instruction in the block. Compile log implementation: * Write to a buffer rather than directly to stderr. * Only keep track of 1 callsite per Decl. * No longer mutate the ZIR Inst struct data. * "Compile log statement found" errors are only emitted when there are no other compile errors. -femit-zir and support for .zir source files is regressed. If we wanted to support this again, outputting .zir would need to be done as yet another backend rather than in the haphazard way it was previously implemented. For parsing .zir, it was implemented previously in a way that was not helpful for debugging. We need tighter integration with the test harness for it to be useful; so clearly a rewrite is needed. Given that a rewrite is needed, and it was getting in the way of progress and organization of the rest of stage2, I regressed the feature.
2021-01-16 22:51:01 -07:00
mod.deleteDeclExports(decl);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
// Dependencies will be re-discovered, so we remove them here prior to re-analysis.
for (decl.dependencies.keys()) |dep| {
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
dep.removeDependant(decl);
if (dep.dependants.count() == 0 and !dep.deletion_flag) {
log.debug("insert {*} ({s}) dependant {*} ({s}) into deletion set", .{
decl, decl.name, dep, dep.name,
});
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
// We don't perform a deletion here, because this Decl or another one
// may end up referencing it before the update is complete.
dep.deletion_flag = true;
try mod.deletion_set.put(mod.gpa, dep, {});
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
}
decl.dependencies.clearRetainingCapacity();
break :blk true;
},
.unreferenced => false,
};
const type_changed = mod.semaDecl(decl) catch |err| switch (err) {
error.AnalysisFail => {
if (decl.analysis == .in_progress) {
// If this decl caused the compile error, the analysis field would
// be changed to indicate it was this Decl's fault. Because this
// did not happen, we infer here that it was a dependency failure.
decl.analysis = .dependency_failure;
}
return error.AnalysisFail;
},
stage2 generics improvements: anytype and param type exprs AstGen result locations now have a `coerced_ty` tag which is the same as `ty` except it assumes that Sema will do a coercion, so it does not redundantly add an `as` instruction into the ZIR code. This results in cleaner ZIR and about a 14% reduction of ZIR bytes. param and param_comptime ZIR instructions now have a block body for their type expressions. This allows Sema to skip evaluation of the block in the case that the parameter is comptime-provided. It also allows a new mechanism to function: when evaluating type expressions of generic functions, if it would depend on another parameter, it returns `error.GenericPoison` which bubbles up and then is caught by the param/param_comptime instruction and then handled. This allows parameters to be evaluated independently so that the type info for functions which have comptime or anytype parameters will still have types populated for parameters that do not depend on values of previous parameters (because evaluation of their param blocks will return successfully instead of `error.GenericPoison`). It also makes iteration over the block that contains function parameters slightly more efficient since it now only contains the param instructions. Finally, it fixes the case where a generic function type expression contains a function prototype. Formerly, this situation would cause shared state to clobber each other; now it is in a proper tree structure so that can't happen. This fix also required adding a field to Sema `comptime_args_fn_inst` to make sure that the `comptime_args` field passed into Sema is applied to the correct `func` instruction. Source location for `node_offset_asm_ret_ty` is fixed; it was pointing at the asm output name rather than the return type as intended. Generic function instantiation is fixed, notably with respect to parameter type expressions that depend on previous parameters, and with respect to types which must be always comptime-known. This involves passing all the comptime arguments at a callsite of a generic function, and allowing the generic function semantic analysis to coerce the values to the proper types (since it has access to the evaluated parameter type expressions) and then decide based on the type whether the parameter is runtime known or not. In the case of explicitly marked `comptime` parameters, there is a check at the semantic analysis of the `call` instruction. Semantic analysis of `call` instructions does type coercion on the arguments, which is needed both for generic functions and to make up for using `coerced_ty` result locations (mentioned above). Tasks left in this branch: * Implement the memoization table. * Add test coverage. * Improve error reporting and source locations for compile errors.
2021-08-04 21:11:31 -07:00
error.NeededSourceLocation => unreachable,
error.GenericPoison => unreachable,
else => |e| {
stage2: implement error notes and regress -femit-zir * Implement error notes - note: other symbol exported here - note: previous else prong is here - note: previous '_' prong is here * Add Compilation.CObject.ErrorMsg. This object properly converts to AllErrors.Message when the time comes. * Add Compilation.CObject.failure_retryable. Properly handles out-of-memory and other transient failures. * Introduce Module.SrcLoc which has not only a byte offset but also references the file which the byte offset applies to. * Scope.Block now contains both a pointer to the "owner" Decl and the "source" Decl. As an example, during inline function call, the "owner" will be the Decl of the caller and the "source" will be the Decl of the callee. * Module.ErrorMsg now sports a `file_scope` field so that notes can refer to source locations in a file other than the parent error message. * Some instances where a `*Scope` was stored, now store a `*Scope.Container`. * Some methods in the `Scope` namespace were moved to the more specific type, since there was only an implementation for one particular tag. - `removeDecl` moved to `Scope.Container` - `destroy` moved to `Scope.File` * Two kinds of Scope deleted: - zir_module - decl * astgen: properly use DeclVal / DeclRef. DeclVal was incorrectly changed to be a reference; this commit fixes it. Fewer ZIR instructions processed as a result. - declval_in_module is renamed to declval - previous declval ZIR instruction is deleted; it was only for .zir files. * Test harness: friendlier diagnostics when an unexpected set of errors is encountered. * zir_sema: fix analyzeInstBlockFlat by properly calling resolvingInst on the last zir instruction in the block. Compile log implementation: * Write to a buffer rather than directly to stderr. * Only keep track of 1 callsite per Decl. * No longer mutate the ZIR Inst struct data. * "Compile log statement found" errors are only emitted when there are no other compile errors. -femit-zir and support for .zir source files is regressed. If we wanted to support this again, outputting .zir would need to be done as yet another backend rather than in the haphazard way it was previously implemented. For parsing .zir, it was implemented previously in a way that was not helpful for debugging. We need tighter integration with the test harness for it to be useful; so clearly a rewrite is needed. Given that a rewrite is needed, and it was getting in the way of progress and organization of the rest of stage2, I regressed the feature.
2021-01-16 22:51:01 -07:00
decl.analysis = .sema_failure_retryable;
try mod.failed_decls.ensureUnusedCapacity(mod.gpa, 1);
stage2: implement error notes and regress -femit-zir * Implement error notes - note: other symbol exported here - note: previous else prong is here - note: previous '_' prong is here * Add Compilation.CObject.ErrorMsg. This object properly converts to AllErrors.Message when the time comes. * Add Compilation.CObject.failure_retryable. Properly handles out-of-memory and other transient failures. * Introduce Module.SrcLoc which has not only a byte offset but also references the file which the byte offset applies to. * Scope.Block now contains both a pointer to the "owner" Decl and the "source" Decl. As an example, during inline function call, the "owner" will be the Decl of the caller and the "source" will be the Decl of the callee. * Module.ErrorMsg now sports a `file_scope` field so that notes can refer to source locations in a file other than the parent error message. * Some instances where a `*Scope` was stored, now store a `*Scope.Container`. * Some methods in the `Scope` namespace were moved to the more specific type, since there was only an implementation for one particular tag. - `removeDecl` moved to `Scope.Container` - `destroy` moved to `Scope.File` * Two kinds of Scope deleted: - zir_module - decl * astgen: properly use DeclVal / DeclRef. DeclVal was incorrectly changed to be a reference; this commit fixes it. Fewer ZIR instructions processed as a result. - declval_in_module is renamed to declval - previous declval ZIR instruction is deleted; it was only for .zir files. * Test harness: friendlier diagnostics when an unexpected set of errors is encountered. * zir_sema: fix analyzeInstBlockFlat by properly calling resolvingInst on the last zir instruction in the block. Compile log implementation: * Write to a buffer rather than directly to stderr. * Only keep track of 1 callsite per Decl. * No longer mutate the ZIR Inst struct data. * "Compile log statement found" errors are only emitted when there are no other compile errors. -femit-zir and support for .zir source files is regressed. If we wanted to support this again, outputting .zir would need to be done as yet another backend rather than in the haphazard way it was previously implemented. For parsing .zir, it was implemented previously in a way that was not helpful for debugging. We need tighter integration with the test harness for it to be useful; so clearly a rewrite is needed. Given that a rewrite is needed, and it was getting in the way of progress and organization of the rest of stage2, I regressed the feature.
2021-01-16 22:51:01 -07:00
mod.failed_decls.putAssumeCapacityNoClobber(decl, try ErrorMsg.create(
mod.gpa,
decl.srcLoc(),
"unable to analyze: {s}",
stage2 generics improvements: anytype and param type exprs AstGen result locations now have a `coerced_ty` tag which is the same as `ty` except it assumes that Sema will do a coercion, so it does not redundantly add an `as` instruction into the ZIR code. This results in cleaner ZIR and about a 14% reduction of ZIR bytes. param and param_comptime ZIR instructions now have a block body for their type expressions. This allows Sema to skip evaluation of the block in the case that the parameter is comptime-provided. It also allows a new mechanism to function: when evaluating type expressions of generic functions, if it would depend on another parameter, it returns `error.GenericPoison` which bubbles up and then is caught by the param/param_comptime instruction and then handled. This allows parameters to be evaluated independently so that the type info for functions which have comptime or anytype parameters will still have types populated for parameters that do not depend on values of previous parameters (because evaluation of their param blocks will return successfully instead of `error.GenericPoison`). It also makes iteration over the block that contains function parameters slightly more efficient since it now only contains the param instructions. Finally, it fixes the case where a generic function type expression contains a function prototype. Formerly, this situation would cause shared state to clobber each other; now it is in a proper tree structure so that can't happen. This fix also required adding a field to Sema `comptime_args_fn_inst` to make sure that the `comptime_args` field passed into Sema is applied to the correct `func` instruction. Source location for `node_offset_asm_ret_ty` is fixed; it was pointing at the asm output name rather than the return type as intended. Generic function instantiation is fixed, notably with respect to parameter type expressions that depend on previous parameters, and with respect to types which must be always comptime-known. This involves passing all the comptime arguments at a callsite of a generic function, and allowing the generic function semantic analysis to coerce the values to the proper types (since it has access to the evaluated parameter type expressions) and then decide based on the type whether the parameter is runtime known or not. In the case of explicitly marked `comptime` parameters, there is a check at the semantic analysis of the `call` instruction. Semantic analysis of `call` instructions does type coercion on the arguments, which is needed both for generic functions and to make up for using `coerced_ty` result locations (mentioned above). Tasks left in this branch: * Implement the memoization table. * Add test coverage. * Improve error reporting and source locations for compile errors.
2021-08-04 21:11:31 -07:00
.{@errorName(e)},
stage2: implement error notes and regress -femit-zir * Implement error notes - note: other symbol exported here - note: previous else prong is here - note: previous '_' prong is here * Add Compilation.CObject.ErrorMsg. This object properly converts to AllErrors.Message when the time comes. * Add Compilation.CObject.failure_retryable. Properly handles out-of-memory and other transient failures. * Introduce Module.SrcLoc which has not only a byte offset but also references the file which the byte offset applies to. * Scope.Block now contains both a pointer to the "owner" Decl and the "source" Decl. As an example, during inline function call, the "owner" will be the Decl of the caller and the "source" will be the Decl of the callee. * Module.ErrorMsg now sports a `file_scope` field so that notes can refer to source locations in a file other than the parent error message. * Some instances where a `*Scope` was stored, now store a `*Scope.Container`. * Some methods in the `Scope` namespace were moved to the more specific type, since there was only an implementation for one particular tag. - `removeDecl` moved to `Scope.Container` - `destroy` moved to `Scope.File` * Two kinds of Scope deleted: - zir_module - decl * astgen: properly use DeclVal / DeclRef. DeclVal was incorrectly changed to be a reference; this commit fixes it. Fewer ZIR instructions processed as a result. - declval_in_module is renamed to declval - previous declval ZIR instruction is deleted; it was only for .zir files. * Test harness: friendlier diagnostics when an unexpected set of errors is encountered. * zir_sema: fix analyzeInstBlockFlat by properly calling resolvingInst on the last zir instruction in the block. Compile log implementation: * Write to a buffer rather than directly to stderr. * Only keep track of 1 callsite per Decl. * No longer mutate the ZIR Inst struct data. * "Compile log statement found" errors are only emitted when there are no other compile errors. -femit-zir and support for .zir source files is regressed. If we wanted to support this again, outputting .zir would need to be done as yet another backend rather than in the haphazard way it was previously implemented. For parsing .zir, it was implemented previously in a way that was not helpful for debugging. We need tighter integration with the test harness for it to be useful; so clearly a rewrite is needed. Given that a rewrite is needed, and it was getting in the way of progress and organization of the rest of stage2, I regressed the feature.
2021-01-16 22:51:01 -07:00
));
return error.AnalysisFail;
},
};
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
if (subsequent_analysis) {
// We may need to chase the dependants and re-analyze them.
// However, if the decl is a function, and the type is the same, we do not need to.
if (type_changed or decl.ty.zigTypeTag() != .Fn) {
for (decl.dependants.keys()) |dep| {
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
switch (dep.analysis) {
.unreferenced => unreachable,
.in_progress => continue, // already doing analysis, ok
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
.outdated => continue, // already queued for update
.file_failure,
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
.dependency_failure,
.sema_failure,
.sema_failure_retryable,
.codegen_failure,
.codegen_failure_retryable,
.complete,
stage2: implement error notes and regress -femit-zir * Implement error notes - note: other symbol exported here - note: previous else prong is here - note: previous '_' prong is here * Add Compilation.CObject.ErrorMsg. This object properly converts to AllErrors.Message when the time comes. * Add Compilation.CObject.failure_retryable. Properly handles out-of-memory and other transient failures. * Introduce Module.SrcLoc which has not only a byte offset but also references the file which the byte offset applies to. * Scope.Block now contains both a pointer to the "owner" Decl and the "source" Decl. As an example, during inline function call, the "owner" will be the Decl of the caller and the "source" will be the Decl of the callee. * Module.ErrorMsg now sports a `file_scope` field so that notes can refer to source locations in a file other than the parent error message. * Some instances where a `*Scope` was stored, now store a `*Scope.Container`. * Some methods in the `Scope` namespace were moved to the more specific type, since there was only an implementation for one particular tag. - `removeDecl` moved to `Scope.Container` - `destroy` moved to `Scope.File` * Two kinds of Scope deleted: - zir_module - decl * astgen: properly use DeclVal / DeclRef. DeclVal was incorrectly changed to be a reference; this commit fixes it. Fewer ZIR instructions processed as a result. - declval_in_module is renamed to declval - previous declval ZIR instruction is deleted; it was only for .zir files. * Test harness: friendlier diagnostics when an unexpected set of errors is encountered. * zir_sema: fix analyzeInstBlockFlat by properly calling resolvingInst on the last zir instruction in the block. Compile log implementation: * Write to a buffer rather than directly to stderr. * Only keep track of 1 callsite per Decl. * No longer mutate the ZIR Inst struct data. * "Compile log statement found" errors are only emitted when there are no other compile errors. -femit-zir and support for .zir source files is regressed. If we wanted to support this again, outputting .zir would need to be done as yet another backend rather than in the haphazard way it was previously implemented. For parsing .zir, it was implemented previously in a way that was not helpful for debugging. We need tighter integration with the test harness for it to be useful; so clearly a rewrite is needed. Given that a rewrite is needed, and it was getting in the way of progress and organization of the rest of stage2, I regressed the feature.
2021-01-16 22:51:01 -07:00
=> if (dep.generation != mod.generation) {
try mod.markOutdatedDecl(dep);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
},
}
}
}
}
}
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
pub fn semaPkg(mod: *Module, pkg: *Package) !void {
const file = (try mod.importPkg(pkg)).file;
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
return mod.semaFile(file);
}
/// Regardless of the file status, will create a `Decl` so that we
/// can track dependencies and re-analyze when the file becomes outdated.
pub fn semaFile(mod: *Module, file: *File) SemaError!void {
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
const tracy = trace(@src());
defer tracy.end();
if (file.root_decl != null) return;
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
const gpa = mod.gpa;
var new_decl_arena = std.heap.ArenaAllocator.init(gpa);
errdefer new_decl_arena.deinit();
const struct_obj = try new_decl_arena.allocator.create(Module.Struct);
const struct_ty = try Type.Tag.@"struct".create(&new_decl_arena.allocator, struct_obj);
const struct_val = try Value.Tag.ty.create(&new_decl_arena.allocator, struct_ty);
const ty_ty = comptime Type.initTag(.type);
struct_obj.* = .{
.owner_decl = undefined, // set below
.fields = .{},
.node_offset = 0, // it's the struct for the root file
.zir_index = undefined, // set below
.layout = .Auto,
.status = .none,
.known_has_bits = undefined,
.namespace = .{
.parent = null,
.ty = struct_ty,
.file_scope = file,
},
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
};
const decl_name = try file.fullyQualifiedNameZ(gpa);
const new_decl = try mod.allocateNewDecl(decl_name, &struct_obj.namespace, 0, null);
file.root_decl = new_decl;
struct_obj.owner_decl = new_decl;
new_decl.src_line = 0;
new_decl.is_pub = true;
new_decl.is_exported = false;
new_decl.has_align = false;
2021-09-02 14:46:31 +02:00
new_decl.has_linksection_or_addrspace = false;
new_decl.ty = ty_ty;
new_decl.val = struct_val;
new_decl.has_tv = true;
new_decl.owns_tv = true;
stage2: garbage collect unused anon decls After this change, the frontend and backend cooperate to keep track of which Decls are actually emitted into the machine code. When any backend sees a `decl_ref` Value, it must mark the corresponding Decl `alive` field to true. This prevents unused comptime data from spilling into the output object files. For example, if you do an `inline for` loop, previously, any intermediate value calculations would have gone into the object file. Now they are garbage collected immediately after the owner Decl has its machine code generated. In the frontend, when it is time to send a Decl to the linker, if it has not been marked "alive" then it is deleted instead. Additional improvements: * Resolve type ABI layouts after successful semantic analysis of a Decl. This is needed so that the backend has access to struct fields. * Sema: fix incorrect logic in resolveMaybeUndefVal. It should return "not comptime known" instead of a compile error for global variables. * `Value.pointerDeref` now returns `null` in the case that the pointer deref cannot happen at compile-time. This is true for global variables, for example. Another example is if a comptime known pointer has a hard coded address value. * Binary arithmetic sets the requireRuntimeBlock source location to the lhs_src or rhs_src as appropriate instead of on the operator node. * Fix LLVM codegen for slice_elem_val which had the wrong logic for when the operand was not a pointer. As noted in the comment in the implementation of deleteUnusedDecl, a future improvement will be to rework the frontend/linker interface to remove the frontend's responsibility of calling allocateDeclIndexes. I discovered some issues with the plan9 linker backend that are related to this, and worked around them for now.
2021-07-29 19:30:37 -07:00
new_decl.alive = true; // This Decl corresponds to a File and is therefore always alive.
new_decl.analysis = .in_progress;
new_decl.generation = mod.generation;
if (file.status == .success_zir) {
assert(file.zir_loaded);
const main_struct_inst = Zir.main_struct_inst;
struct_obj.zir_index = main_struct_inst;
var sema_arena = std.heap.ArenaAllocator.init(gpa);
defer sema_arena.deinit();
var sema: Sema = .{
.mod = mod,
.gpa = gpa,
.arena = &sema_arena.allocator,
.perm_arena = &new_decl_arena.allocator,
.code = file.zir,
.owner_decl = new_decl,
.func = null,
stage2: fix generics with non-comptime anytype parameters The `comptime_args` field of Fn has a clarified purpose: For generic function instantiations, there is a `TypedValue` here for each parameter of the function: * Non-comptime parameters are marked with a `generic_poison` for the value. * Non-anytype parameters are marked with a `generic_poison` for the type. Sema now has a `fn_ret_ty` field. Doc comments reproduced here: > When semantic analysis needs to know the return type of the function whose body > is being analyzed, this `Type` should be used instead of going through `func`. > This will correctly handle the case of a comptime/inline function call of a > generic function which uses a type expression for the return type. > The type will be `void` in the case that `func` is `null`. Various places in Sema are modified in accordance with this guidance. Fixed `resolveMaybeUndefVal` not returning `error.GenericPoison` when Value Tag of `generic_poison` is encountered. Fixed generic function memoization incorrect equality checking. The logic now clearly deals properly with any combination of anytype and comptime parameters. Fixed not removing generic function instantiation from the table in case a compile errors in the rest of `call` semantic analysis. This required introduction of yet another adapter which I have called `GenericRemoveAdapter`. This one is nice and simple - it's the same hash function (the same precomputed hash is passed in) but the equality function checks pointers rather than doing any logic. Inline/comptime function calls coerce each argument in accordance with the function parameter type expressions. Likewise the return type expression is evaluated and provided (see `fn_ret_ty` above). There's a new compile error "unable to monomorphize function". It's pretty unhelpful and will need to get improved in the future. It happens when a type expression in a generic function did not end up getting resolved at a callsite. This can happen, for example, if a runtime parameter is attempted to be used where it needed to be comptime known: ```zig fn foo(x: anytype) [x]u8 { _ = x; } ``` In this example, even if we pass a number such as `10` for `x`, it is not marked `comptime`, so `x` will have a runtime known value, making the return type unable to resolve. In the LLVM backend I implement cmp instructions for float types to pass some behavior tests that used floats.
2021-08-06 16:24:39 -07:00
.fn_ret_ty = Type.initTag(.void),
.owner_func = null,
};
defer sema.deinit();
var wip_captures = try WipCaptureScope.init(gpa, &new_decl_arena.allocator, null);
defer wip_captures.deinit();
var block_scope: Sema.Block = .{
.parent = null,
.sema = &sema,
.src_decl = new_decl,
2021-10-01 12:05:12 -05:00
.namespace = &struct_obj.namespace,
.wip_capture_scope = wip_captures.scope,
.instructions = .{},
.inlining = null,
.is_comptime = true,
};
defer block_scope.instructions.deinit(gpa);
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
if (sema.analyzeStructDecl(new_decl, main_struct_inst, struct_obj)) |_| {
try wip_captures.finalize();
new_decl.analysis = .complete;
} else |err| switch (err) {
error.OutOfMemory => return error.OutOfMemory,
error.AnalysisFail => {},
}
} else {
new_decl.analysis = .file_failure;
}
try new_decl.finalizeNewArena(&new_decl_arena);
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
}
/// Returns `true` if the Decl type changed.
/// Returns `true` if this is the first time analyzing the Decl.
/// Returns `false` otherwise.
fn semaDecl(mod: *Module, decl: *Decl) !bool {
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
const tracy = trace(@src());
defer tracy.end();
2021-10-01 12:05:12 -05:00
if (decl.getFileScope().status != .success_zir) {
return error.AnalysisFail;
}
const gpa = mod.gpa;
2021-10-01 12:05:12 -05:00
const zir = decl.getFileScope().zir;
const zir_datas = zir.instructions.items(.data);
decl.analysis = .in_progress;
// We need the memory for the Type to go into the arena for the Decl
var decl_arena = std.heap.ArenaAllocator.init(gpa);
errdefer decl_arena.deinit();
var analysis_arena = std.heap.ArenaAllocator.init(gpa);
defer analysis_arena.deinit();
var sema: Sema = .{
.mod = mod,
.gpa = gpa,
.arena = &analysis_arena.allocator,
.perm_arena = &decl_arena.allocator,
.code = zir,
.owner_decl = decl,
.func = null,
stage2: fix generics with non-comptime anytype parameters The `comptime_args` field of Fn has a clarified purpose: For generic function instantiations, there is a `TypedValue` here for each parameter of the function: * Non-comptime parameters are marked with a `generic_poison` for the value. * Non-anytype parameters are marked with a `generic_poison` for the type. Sema now has a `fn_ret_ty` field. Doc comments reproduced here: > When semantic analysis needs to know the return type of the function whose body > is being analyzed, this `Type` should be used instead of going through `func`. > This will correctly handle the case of a comptime/inline function call of a > generic function which uses a type expression for the return type. > The type will be `void` in the case that `func` is `null`. Various places in Sema are modified in accordance with this guidance. Fixed `resolveMaybeUndefVal` not returning `error.GenericPoison` when Value Tag of `generic_poison` is encountered. Fixed generic function memoization incorrect equality checking. The logic now clearly deals properly with any combination of anytype and comptime parameters. Fixed not removing generic function instantiation from the table in case a compile errors in the rest of `call` semantic analysis. This required introduction of yet another adapter which I have called `GenericRemoveAdapter`. This one is nice and simple - it's the same hash function (the same precomputed hash is passed in) but the equality function checks pointers rather than doing any logic. Inline/comptime function calls coerce each argument in accordance with the function parameter type expressions. Likewise the return type expression is evaluated and provided (see `fn_ret_ty` above). There's a new compile error "unable to monomorphize function". It's pretty unhelpful and will need to get improved in the future. It happens when a type expression in a generic function did not end up getting resolved at a callsite. This can happen, for example, if a runtime parameter is attempted to be used where it needed to be comptime known: ```zig fn foo(x: anytype) [x]u8 { _ = x; } ``` In this example, even if we pass a number such as `10` for `x`, it is not marked `comptime`, so `x` will have a runtime known value, making the return type unable to resolve. In the LLVM backend I implement cmp instructions for float types to pass some behavior tests that used floats.
2021-08-06 16:24:39 -07:00
.fn_ret_ty = Type.initTag(.void),
.owner_func = null,
};
defer sema.deinit();
if (decl.isRoot()) {
log.debug("semaDecl root {*} ({s})", .{ decl, decl.name });
const main_struct_inst = Zir.main_struct_inst;
const struct_obj = decl.getStruct().?;
// This might not have gotten set in `semaFile` if the first time had
// a ZIR failure, so we set it here in case.
struct_obj.zir_index = main_struct_inst;
try sema.analyzeStructDecl(decl, main_struct_inst, struct_obj);
decl.analysis = .complete;
decl.generation = mod.generation;
return false;
}
log.debug("semaDecl {*} ({s})", .{ decl, decl.name });
var wip_captures = try WipCaptureScope.init(gpa, &decl_arena.allocator, decl.src_scope);
defer wip_captures.deinit();
var block_scope: Sema.Block = .{
.parent = null,
.sema = &sema,
.src_decl = decl,
2021-10-01 12:05:12 -05:00
.namespace = decl.src_namespace,
.wip_capture_scope = wip_captures.scope,
.instructions = .{},
.inlining = null,
.is_comptime = true,
};
stage2 generics improvements: anytype and param type exprs AstGen result locations now have a `coerced_ty` tag which is the same as `ty` except it assumes that Sema will do a coercion, so it does not redundantly add an `as` instruction into the ZIR code. This results in cleaner ZIR and about a 14% reduction of ZIR bytes. param and param_comptime ZIR instructions now have a block body for their type expressions. This allows Sema to skip evaluation of the block in the case that the parameter is comptime-provided. It also allows a new mechanism to function: when evaluating type expressions of generic functions, if it would depend on another parameter, it returns `error.GenericPoison` which bubbles up and then is caught by the param/param_comptime instruction and then handled. This allows parameters to be evaluated independently so that the type info for functions which have comptime or anytype parameters will still have types populated for parameters that do not depend on values of previous parameters (because evaluation of their param blocks will return successfully instead of `error.GenericPoison`). It also makes iteration over the block that contains function parameters slightly more efficient since it now only contains the param instructions. Finally, it fixes the case where a generic function type expression contains a function prototype. Formerly, this situation would cause shared state to clobber each other; now it is in a proper tree structure so that can't happen. This fix also required adding a field to Sema `comptime_args_fn_inst` to make sure that the `comptime_args` field passed into Sema is applied to the correct `func` instruction. Source location for `node_offset_asm_ret_ty` is fixed; it was pointing at the asm output name rather than the return type as intended. Generic function instantiation is fixed, notably with respect to parameter type expressions that depend on previous parameters, and with respect to types which must be always comptime-known. This involves passing all the comptime arguments at a callsite of a generic function, and allowing the generic function semantic analysis to coerce the values to the proper types (since it has access to the evaluated parameter type expressions) and then decide based on the type whether the parameter is runtime known or not. In the case of explicitly marked `comptime` parameters, there is a check at the semantic analysis of the `call` instruction. Semantic analysis of `call` instructions does type coercion on the arguments, which is needed both for generic functions and to make up for using `coerced_ty` result locations (mentioned above). Tasks left in this branch: * Implement the memoization table. * Add test coverage. * Improve error reporting and source locations for compile errors.
2021-08-04 21:11:31 -07:00
defer {
block_scope.instructions.deinit(gpa);
block_scope.params.deinit(gpa);
}
const zir_block_index = decl.zirBlockIndex();
const inst_data = zir_datas[zir_block_index].pl_node;
const extra = zir.extraData(Zir.Inst.Block, inst_data.payload_index);
const body = zir.extra[extra.end..][0..extra.data.body_len];
const break_index = try sema.analyzeBody(&block_scope, body);
try wip_captures.finalize();
const result_ref = zir_datas[break_index].@"break".operand;
const src: LazySrcLoc = .{ .node_offset = 0 };
stage2: more principled approach to comptime references * AIR no longer has a `variables` array. Instead of the `varptr` instruction, Sema emits a constant with a `decl_ref`. * AIR no longer has a `ref` instruction. There is no longer any instruction that takes a value and returns a pointer to it. If this is desired, Sema must either create an anynomous Decl and return a constant `decl_ref`, or in the case of a runtime value, emit an `alloc` instruction, `store` the value to it, and then return the `alloc`. * The `ref_val` Value Tag is eliminated. `decl_ref` should be used instead. Also added is `eu_payload_ptr` which points to the payload of an error union, given an error union pointer. In general, Sema should avoid calling `analyzeRef` if it can be helped. For example in the case of field_val and elem_val, there should never be a reason to create a temporary (alloc or decl). Recent previous commits made progress along that front. There is a new abstraction in Sema, which looks like this: var anon_decl = try block.startAnonDecl(); defer anon_decl.deinit(); // here 'anon_decl.arena()` may be used const decl = try anon_decl.finish(ty, val); // decl is typically now used with `decl_ref`. This pattern is used to upgrade `ref_val` usages to `decl_ref` usages. Additional improvements: * Sema: fix source location resolution for calling convention expression. * Sema: properly report "unable to resolve comptime value" for loads of global variables. There is now a set of functions which can be called if the callee wants to obtain the Value even if the tag is `variable` (indicating comptime-known address but runtime-known value). * Sema: `coerce` resolves builtin types before checking equality. * Sema: fix `u1_type` missing from `addType`, making this type have a slightly more efficient representation in AIR. * LLVM backend: fix `genTypedValue` for tags `decl_ref` and `variable` to properly do an LLVMConstBitCast. * Remove unused parameter from `Value.toEnum`. After this commit, some test cases are no longer passing. This is due to the more principled approach to comptime references causing more anonymous decls to get sent to the linker for codegen. However, in all these cases the decls are not actually referenced by the runtime machine code. A future commit in this branch will implement garbage collection of decls so that unused decls do not get sent to the linker for codegen. This will make the tests go back to passing.
2021-07-29 15:59:51 -07:00
const decl_tv = try sema.resolveInstValue(&block_scope, src, result_ref);
const align_val = blk: {
const align_ref = decl.zirAlignRef();
if (align_ref == .none) break :blk Value.initTag(.null_value);
break :blk (try sema.resolveInstConst(&block_scope, src, align_ref)).val;
};
const linksection_val = blk: {
const linksection_ref = decl.zirLinksectionRef();
if (linksection_ref == .none) break :blk Value.initTag(.null_value);
break :blk (try sema.resolveInstConst(&block_scope, src, linksection_ref)).val;
};
2021-09-02 14:46:31 +02:00
const address_space = blk: {
const addrspace_ctx: Sema.AddressSpaceContext = switch (decl_tv.val.tag()) {
.function, .extern_fn => .function,
.variable => .variable,
else => .constant,
};
break :blk switch (decl.zirAddrspaceRef()) {
.none => switch (addrspace_ctx) {
.function => target_util.defaultAddressSpace(sema.mod.getTarget(), .function),
.variable => target_util.defaultAddressSpace(sema.mod.getTarget(), .global_mutable),
.constant => target_util.defaultAddressSpace(sema.mod.getTarget(), .global_constant),
else => unreachable,
},
else => |addrspace_ref| try sema.analyzeAddrspace(&block_scope, src, addrspace_ref, addrspace_ctx),
};
2021-09-02 14:46:31 +02:00
};
stage2: field type expressions support referencing locals The big change in this commit is making `semaDecl` resolve the fields if the Decl ends up being a struct or union. It needs to do this while the `Sema` is still in scope, because it will have the resolved AIR instructions that the field type expressions possibly reference. We do this after the decl is populated and set to `complete` so that a `Decl` may reference itself. Everything else is fixes and improvements to make the test suite pass again after making this change. * New AIR instruction: `ptr_elem_ptr` - Implemented for LLVM backend * New Type tag: `type_info` which represents `std.builtin.TypeInfo`. It is used by AstGen for the operand type of `@Type`. * ZIR instruction `set_float_mode` uses `coerced_ty` to avoid superfluous `as` instruction on operand. * ZIR instruction `Type` uses `coerced_ty` to properly handle result location type of operand. * Fix two instances of `enum_nonexhaustive` Value Tag not handled properly - it should generally be handled the same as `enum_full`. * Fix struct and union field resolution not copying Type and Value objects into its Decl arena. * Fix enum tag value resolution discarding the ZIR=>AIR instruction map for the child Sema, when they still needed to be accessed. * Fix `zirResolveInferredAlloc` use-after-free in the AIR instructions data array. * Fix `elemPtrArray` not respecting const/mutable attribute of pointer in the result type. * Fix LLVM backend crashing when `updateDeclExports` is called before `updateDecl`/`updateFunc` (which is, according to the API, perfectly legal for the frontend to do). * Fix LLVM backend handling element pointer of pointer-to-array. It needed another index in the GEP otherwise LLVM saw the wrong type. * Fix LLVM test cases not returning 0 from main, causing test failures. Fixes a regression introduced in 6a5094872f10acc629543cc7f10533b438d0283a. * Implement comptime shift-right. * Implement `@Type` for integers and `@TypeInfo` for integers. * Implement union initialization syntax. * Implement `zirFieldType` for unions. * Implement `elemPtrArray` for a runtime-known operand. * Make `zirLog2IntType` support RHS of shift being `comptime_int`. In this case it returns `comptime_int`. The motivating test case for this commit was originally: ```zig test "example" { var l: List(10) = undefined; l.array[1] = 1; } fn List(comptime L: usize) type { var T = u8; return struct { array: [L]T, }; } ``` However I changed it to: ```zig test "example" { var l: List = undefined; l.array[1] = 1; } const List = blk: { const T = [10]u8; break :blk struct { array: T, }; }; ``` Which ended up being a similar, smaller problem. The former test case will require a similar solution in the implementation of comptime function calls - checking if the result of the function call is a struct or union, and using the child `Sema` before it is destroyed to resolve the fields.
2021-08-20 15:23:55 -07:00
// Note this resolves the type of the Decl, not the value; if this Decl
// is a struct, for example, this resolves `type` (which needs no resolution),
// not the struct itself.
stage2: garbage collect unused anon decls After this change, the frontend and backend cooperate to keep track of which Decls are actually emitted into the machine code. When any backend sees a `decl_ref` Value, it must mark the corresponding Decl `alive` field to true. This prevents unused comptime data from spilling into the output object files. For example, if you do an `inline for` loop, previously, any intermediate value calculations would have gone into the object file. Now they are garbage collected immediately after the owner Decl has its machine code generated. In the frontend, when it is time to send a Decl to the linker, if it has not been marked "alive" then it is deleted instead. Additional improvements: * Resolve type ABI layouts after successful semantic analysis of a Decl. This is needed so that the backend has access to struct fields. * Sema: fix incorrect logic in resolveMaybeUndefVal. It should return "not comptime known" instead of a compile error for global variables. * `Value.pointerDeref` now returns `null` in the case that the pointer deref cannot happen at compile-time. This is true for global variables, for example. Another example is if a comptime known pointer has a hard coded address value. * Binary arithmetic sets the requireRuntimeBlock source location to the lhs_src or rhs_src as appropriate instead of on the operator node. * Fix LLVM codegen for slice_elem_val which had the wrong logic for when the operand was not a pointer. As noted in the comment in the implementation of deleteUnusedDecl, a future improvement will be to rework the frontend/linker interface to remove the frontend's responsibility of calling allocateDeclIndexes. I discovered some issues with the plan9 linker backend that are related to this, and worked around them for now.
2021-07-29 19:30:37 -07:00
try sema.resolveTypeLayout(&block_scope, src, decl_tv.ty);
const decl_arena_state = try decl_arena.allocator.create(std.heap.ArenaAllocator.State);
if (decl.is_usingnamespace) {
const ty_ty = Type.initTag(.type);
if (!decl_tv.ty.eql(ty_ty)) {
return sema.fail(&block_scope, src, "expected type, found {}", .{decl_tv.ty});
}
var buffer: Value.ToTypeBuffer = undefined;
const ty = decl_tv.val.toType(&buffer);
if (ty.getNamespace() == null) {
return sema.fail(&block_scope, src, "type {} has no namespace", .{ty});
}
decl.ty = ty_ty;
decl.val = try Value.Tag.ty.create(&decl_arena.allocator, ty);
decl.align_val = Value.initTag(.null_value);
decl.linksection_val = Value.initTag(.null_value);
decl.has_tv = true;
decl.owns_tv = false;
decl_arena_state.* = decl_arena.state;
decl.value_arena = decl_arena_state;
decl.analysis = .complete;
decl.generation = mod.generation;
return true;
}
if (decl_tv.val.castTag(.function)) |fn_payload| {
const func = fn_payload.data;
const owns_tv = func.owner_decl == decl;
if (owns_tv) {
var prev_type_has_bits = false;
var prev_is_inline = false;
var type_changed = true;
if (decl.has_tv) {
prev_type_has_bits = decl.ty.hasCodeGenBits();
type_changed = !decl.ty.eql(decl_tv.ty);
if (decl.getFunction()) |prev_func| {
prev_is_inline = prev_func.state == .inline_only;
}
decl.clearValues(gpa);
}
decl.ty = try decl_tv.ty.copy(&decl_arena.allocator);
decl.val = try decl_tv.val.copy(&decl_arena.allocator);
decl.align_val = try align_val.copy(&decl_arena.allocator);
decl.linksection_val = try linksection_val.copy(&decl_arena.allocator);
2021-09-02 14:46:31 +02:00
decl.@"addrspace" = address_space;
decl.has_tv = true;
decl.owns_tv = owns_tv;
decl_arena_state.* = decl_arena.state;
decl.value_arena = decl_arena_state;
decl.analysis = .complete;
decl.generation = mod.generation;
const is_inline = decl_tv.ty.fnCallingConvention() == .Inline;
if (!is_inline and decl_tv.ty.hasCodeGenBits()) {
// We don't fully codegen the decl until later, but we do need to reserve a global
stage2: garbage collect unused anon decls After this change, the frontend and backend cooperate to keep track of which Decls are actually emitted into the machine code. When any backend sees a `decl_ref` Value, it must mark the corresponding Decl `alive` field to true. This prevents unused comptime data from spilling into the output object files. For example, if you do an `inline for` loop, previously, any intermediate value calculations would have gone into the object file. Now they are garbage collected immediately after the owner Decl has its machine code generated. In the frontend, when it is time to send a Decl to the linker, if it has not been marked "alive" then it is deleted instead. Additional improvements: * Resolve type ABI layouts after successful semantic analysis of a Decl. This is needed so that the backend has access to struct fields. * Sema: fix incorrect logic in resolveMaybeUndefVal. It should return "not comptime known" instead of a compile error for global variables. * `Value.pointerDeref` now returns `null` in the case that the pointer deref cannot happen at compile-time. This is true for global variables, for example. Another example is if a comptime known pointer has a hard coded address value. * Binary arithmetic sets the requireRuntimeBlock source location to the lhs_src or rhs_src as appropriate instead of on the operator node. * Fix LLVM codegen for slice_elem_val which had the wrong logic for when the operand was not a pointer. As noted in the comment in the implementation of deleteUnusedDecl, a future improvement will be to rework the frontend/linker interface to remove the frontend's responsibility of calling allocateDeclIndexes. I discovered some issues with the plan9 linker backend that are related to this, and worked around them for now.
2021-07-29 19:30:37 -07:00
// offset table index for it. This allows us to codegen decls out of dependency
// order, increasing how many computations can be done in parallel.
try mod.comp.bin_file.allocateDeclIndexes(decl);
try mod.comp.work_queue.writeItem(.{ .codegen_func = func });
if (type_changed and mod.emit_h != null) {
try mod.comp.work_queue.writeItem(.{ .emit_h_decl = decl });
}
} else if (!prev_is_inline and prev_type_has_bits) {
mod.comp.bin_file.freeDecl(decl);
}
if (decl.is_exported) {
const export_src = src; // TODO make this point at `export` token
if (is_inline) {
return sema.fail(&block_scope, export_src, "export of inline function", .{});
}
// The scope needs to have the decl in it.
stage2: progress towards ability to compile compiler-rt * prepare compiler-rt to support being compiled by stage2 - put in a few minor workarounds that will be removed later, such as using `builtin.stage2_arch` rather than `builtin.cpu.arch`. - only try to export a few symbols for now - we'll move more symbols over to the "working in stage2" section as they become functional and gain test coverage. - use `inline fn` at function declarations rather than `@call` with an always_inline modifier at the callsites, to avoid depending on the anonymous array literal syntax language feature (for now). * AIR: replace floatcast instruction with fptrunc and fpext for shortening and widening floating point values, respectively. * Introduce a new ZIR instruction, `export_value`, which implements `@export` for the case when the thing to be exported is a local comptime value that points to a function. - AstGen: fix `@export` not properly reporting ambiguous decl references. * Sema: handle ExportOptions linkage. The value is now available to all backends. - Implement setting global linkage as appropriate in the LLVM backend. I did not yet inspect the LLVM IR, so this still needs to be audited. There is already a pending task to make sure the alias stuff is working as intended, and this is related. - Sema almost handles section, just a tiny bit more code is needed in `resolveExportOptions`. * Sema: implement float widening and shortening for both `@floatCast` and float coercion. - Implement the LLVM backend code for this as well.
2021-09-21 22:33:00 -07:00
const options: std.builtin.ExportOptions = .{ .name = mem.spanZ(decl.name) };
try sema.analyzeExport(&block_scope, export_src, options, decl);
}
return type_changed or is_inline != prev_is_inline;
}
}
var type_changed = true;
if (decl.has_tv) {
type_changed = !decl.ty.eql(decl_tv.ty);
decl.clearValues(gpa);
}
decl.owns_tv = false;
var queue_linker_work = false;
if (decl_tv.val.castTag(.variable)) |payload| {
const variable = payload.data;
if (variable.owner_decl == decl) {
decl.owns_tv = true;
queue_linker_work = true;
const copied_init = try variable.init.copy(&decl_arena.allocator);
variable.init = copied_init;
}
} else if (decl_tv.val.castTag(.extern_fn)) |payload| {
const owner_decl = payload.data;
if (decl == owner_decl) {
decl.owns_tv = true;
queue_linker_work = true;
}
}
decl.ty = try decl_tv.ty.copy(&decl_arena.allocator);
decl.val = try decl_tv.val.copy(&decl_arena.allocator);
decl.align_val = try align_val.copy(&decl_arena.allocator);
decl.linksection_val = try linksection_val.copy(&decl_arena.allocator);
2021-09-02 14:46:31 +02:00
decl.@"addrspace" = address_space;
decl.has_tv = true;
decl_arena_state.* = decl_arena.state;
decl.value_arena = decl_arena_state;
decl.analysis = .complete;
decl.generation = mod.generation;
2021-05-07 16:05:44 -07:00
if (queue_linker_work and decl.ty.hasCodeGenBits()) {
try mod.comp.bin_file.allocateDeclIndexes(decl);
try mod.comp.work_queue.writeItem(.{ .codegen_decl = decl });
2021-05-07 16:05:44 -07:00
if (type_changed and mod.emit_h != null) {
try mod.comp.work_queue.writeItem(.{ .emit_h_decl = decl });
}
}
if (decl.is_exported) {
const export_src = src; // TODO point to the export token
// The scope needs to have the decl in it.
stage2: progress towards ability to compile compiler-rt * prepare compiler-rt to support being compiled by stage2 - put in a few minor workarounds that will be removed later, such as using `builtin.stage2_arch` rather than `builtin.cpu.arch`. - only try to export a few symbols for now - we'll move more symbols over to the "working in stage2" section as they become functional and gain test coverage. - use `inline fn` at function declarations rather than `@call` with an always_inline modifier at the callsites, to avoid depending on the anonymous array literal syntax language feature (for now). * AIR: replace floatcast instruction with fptrunc and fpext for shortening and widening floating point values, respectively. * Introduce a new ZIR instruction, `export_value`, which implements `@export` for the case when the thing to be exported is a local comptime value that points to a function. - AstGen: fix `@export` not properly reporting ambiguous decl references. * Sema: handle ExportOptions linkage. The value is now available to all backends. - Implement setting global linkage as appropriate in the LLVM backend. I did not yet inspect the LLVM IR, so this still needs to be audited. There is already a pending task to make sure the alias stuff is working as intended, and this is related. - Sema almost handles section, just a tiny bit more code is needed in `resolveExportOptions`. * Sema: implement float widening and shortening for both `@floatCast` and float coercion. - Implement the LLVM backend code for this as well.
2021-09-21 22:33:00 -07:00
const options: std.builtin.ExportOptions = .{ .name = mem.spanZ(decl.name) };
try sema.analyzeExport(&block_scope, export_src, options, decl);
}
return type_changed;
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
stage2: implement simple enums A simple enum is an enum which has an automatic integer tag type, all tag values automatically assigned, and no top level declarations. Such enums are created directly in AstGen and shared by all the generic/comptime instantiations of the surrounding ZIR code. This commit implements, but does not yet add any test cases for, simple enums. A full enum is an enum for which any of the above conditions are not true. Full enums are created in Sema, and therefore will create a unique type per generic/comptime instantiation. This commit does not implement full enums. However the `enum_decl_nonexhaustive` ZIR instruction is added and the respective Type functions are filled out. This commit makes an improvement to ZIR code, removing the decls array and removing the decl_map from AstGen. Instead, decl_ref and decl_val ZIR instructions index into the `owner_decl.dependencies` ArrayHashMap. We already need this dependencies array for incremental compilation purposes, and so repurposing it to also use it for ZIR decl indexes makes for efficient memory usage. Similarly, this commit fixes up incorrect memory management by removing the `const` ZIR instruction. The two places it was used stored memory in the AstGen arena, which may get freed after Sema. Now it properly sets up a new anonymous Decl for error sets and uses a normal decl_val instruction. The other usage of `const` ZIR instruction was float literals. These are now changed to use `float` ZIR instruction when the value fits inside `zir.Inst.Data` and `float128` otherwise. AstGen + Sema: implement int_to_enum and enum_to_int. No tests yet; I expect to have to make some fixes before they will pass tests. Will do that in the branch before merging. AstGen: fix struct astgen incorrectly counting decls as fields. Type/Value: give up on trying to exhaustively list every tag all the time. This makes the file more manageable. Also found a bug with i128/u128 this way, since the name of the function was more obvious when looking at the tag values. Type: implement abiAlignment and abiSize for structs. This will need to get more sophisticated at some point, but for now it is progress. Value: add new `enum_field_index` tag. Value: add hash_u32, needed when using ArrayHashMap.
2021-04-06 17:43:56 -07:00
/// Returns the depender's index of the dependee.
pub fn declareDeclDependency(mod: *Module, depender: *Decl, dependee: *Decl) !void {
if (depender == dependee) return;
log.debug("{*} ({s}) depends on {*} ({s})", .{
depender, depender.name, dependee, dependee.name,
});
try depender.dependencies.ensureUnusedCapacity(mod.gpa, 1);
try dependee.dependants.ensureUnusedCapacity(mod.gpa, 1);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
if (dependee.deletion_flag) {
dependee.deletion_flag = false;
assert(mod.deletion_set.swapRemove(dependee));
}
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
dependee.dependants.putAssumeCapacity(depender, {});
depender.dependencies.putAssumeCapacity(dependee, {});
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
pub const ImportFileResult = struct {
file: *File,
is_new: bool,
};
pub fn importPkg(mod: *Module, pkg: *Package) !ImportFileResult {
const gpa = mod.gpa;
// The resolved path is used as the key in the import table, to detect if
// an import refers to the same as another, despite different relative paths
// or differently mapped package names.
const resolved_path = try std.fs.path.resolve(gpa, &[_][]const u8{
pkg.root_src_directory.path orelse ".", pkg.root_src_path,
});
var keep_resolved_path = false;
defer if (!keep_resolved_path) gpa.free(resolved_path);
const gop = try mod.import_table.getOrPut(gpa, resolved_path);
if (gop.found_existing) return ImportFileResult{
.file = gop.value_ptr.*,
.is_new = false,
};
keep_resolved_path = true; // It's now owned by import_table.
const sub_file_path = try gpa.dupe(u8, pkg.root_src_path);
errdefer gpa.free(sub_file_path);
const new_file = try gpa.create(File);
errdefer gpa.destroy(new_file);
gop.value_ptr.* = new_file;
new_file.* = .{
.sub_file_path = sub_file_path,
.source = undefined,
.source_loaded = false,
.tree_loaded = false,
.zir_loaded = false,
.stat_size = undefined,
.stat_inode = undefined,
.stat_mtime = undefined,
.tree = undefined,
.zir = undefined,
.status = .never_loaded,
.pkg = pkg,
.root_decl = null,
};
return ImportFileResult{
.file = new_file,
.is_new = true,
};
}
pub fn importFile(
mod: *Module,
cur_file: *File,
import_string: []const u8,
) !ImportFileResult {
if (cur_file.pkg.table.get(import_string)) |pkg| {
return mod.importPkg(pkg);
}
if (!mem.endsWith(u8, import_string, ".zig")) {
return error.PackageNotFound;
}
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
const gpa = mod.gpa;
// The resolved path is used as the key in the import table, to detect if
// an import refers to the same as another, despite different relative paths
// or differently mapped package names.
const cur_pkg_dir_path = cur_file.pkg.root_src_directory.path orelse ".";
const resolved_path = try std.fs.path.resolve(gpa, &[_][]const u8{
cur_pkg_dir_path, cur_file.sub_file_path, "..", import_string,
});
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
var keep_resolved_path = false;
defer if (!keep_resolved_path) gpa.free(resolved_path);
const gop = try mod.import_table.getOrPut(gpa, resolved_path);
if (gop.found_existing) return ImportFileResult{
.file = gop.value_ptr.*,
.is_new = false,
};
keep_resolved_path = true; // It's now owned by import_table.
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
const new_file = try gpa.create(File);
errdefer gpa.destroy(new_file);
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
const resolved_root_path = try std.fs.path.resolve(gpa, &[_][]const u8{cur_pkg_dir_path});
defer gpa.free(resolved_root_path);
if (!mem.startsWith(u8, resolved_path, resolved_root_path)) {
return error.ImportOutsidePkgPath;
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
}
// +1 for the directory separator here.
const sub_file_path = try gpa.dupe(u8, resolved_path[resolved_root_path.len + 1 ..]);
errdefer gpa.free(sub_file_path);
log.debug("new importFile. resolved_root_path={s}, resolved_path={s}, sub_file_path={s}, import_string={s}", .{
resolved_root_path, resolved_path, sub_file_path, import_string,
});
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
gop.value_ptr.* = new_file;
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
new_file.* = .{
.sub_file_path = sub_file_path,
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
.source = undefined,
.source_loaded = false,
.tree_loaded = false,
.zir_loaded = false,
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
.stat_size = undefined,
.stat_inode = undefined,
.stat_mtime = undefined,
.tree = undefined,
.zir = undefined,
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
.status = .never_loaded,
.pkg = cur_file.pkg,
.root_decl = null,
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
};
return ImportFileResult{
.file = new_file,
.is_new = true,
};
stage2: entry point via std lib and proper updated file detection Instead of Module setting up the root_scope with the root source file, instead, Module relies on the package table graph being set up properly, and inside `update()`, it does the equivalent of `_ = @import("std");`. This, in term, imports start.zig, which has the logic to call main (or not). `Module` no longer has `root_scope` - the root source file is no longer special, it's just in the package table mapped to "root". I also went ahead and implemented proper detection of updated files. mtime, inode, size, and source hash are kept in `Scope.File`. During an update, iterate over `import_table` and stat each file to find out which ones are updated. The source hash is redundant with the source hash used by the struct decl that corresponds to the file, so it should be removed in a future commit before merging the branch. * AstGen: add "previously declared here" notes for variables shadowing decls. * Parse imports as structs. Module now calls `AstGen.structDeclInner`, which is called by `AstGen.containerDecl`. - `importFile` is a bit kludgy with how it handles the top level Decl that kinda gets merged into the struct decl at the end of the function. Be on the look out for bugs related to that as well as possibly cleaner ways to implement this. * Module: factor out lookupDeclName into lookupIdentifier and lookupNa * Rename `Scope.Container` to `Scope.Namespace`. * Delete some dead code. This branch won't work until `usingnamespace` is implemented because it relies on `@import("builtin").OutputMode` and `OutputMode` comes from a `usingnamespace`.
2021-04-09 23:17:50 -07:00
}
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
pub fn scanNamespace(
stage2: improvements aimed at std lib integration * AstGen: emit decl lookup ZIR instructions rather than directly looking up decls in AstGen. This is necessary because we want to reuse the same immutable ZIR code for multiple generic instantiations (and comptime function calls). * AstGen: fix using members_len instead of fields_len for struct decls. * structs: the struct_decl ZIR instruction is now also a block. This is so that the type expressions, default field value expressions, and alignment expressions can be evaluated in a scope that contains the decls from the struct namespace itself. * Add "std" and "builtin" packages to the builtin package. * Don't try to build glibc, musl, or mingw-w64 when using `-ofmt=c`. * builtin.zig is generated without `usingnamespace`. * builtin.zig takes advantage of `std.zig.fmtId` for CPU features. * A first pass at implementing `usingnamespace`. It's problematic and should either be deleted, or polished, before merging this branch. * Sema: allow explicitly specifying the namespace in which to look up Decls. This is used by `struct_decl` in order to put the decls from the struct namespace itself in scope when evaluating the type expressions, default value expressions, and alignment expressions. * Module: fix `analyzeNamespace` assuming that it is the top-level root declaration node. * Sema: implement comptime and runtime cmp operator. * Sema: implement peer type resolution for enums and enum literals. * Pull in the changes from master branch: 262e09c482d98a78531c049a18b7f24146fe157f. * ZIR: complete out simple_ptr_type debug printing
2021-04-12 16:44:51 -07:00
mod: *Module,
namespace: *Namespace,
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
extra_start: usize,
decls_len: u32,
parent_decl: *Decl,
) SemaError!usize {
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
const tracy = trace(@src());
defer tracy.end();
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
const gpa = mod.gpa;
const zir = namespace.file_scope.zir;
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
try mod.comp.work_queue.ensureUnusedCapacity(decls_len);
try namespace.decls.ensureTotalCapacity(gpa, decls_len);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
const bit_bags_count = std.math.divCeil(usize, decls_len, 8) catch unreachable;
var extra_index = extra_start + bit_bags_count;
var bit_bag_index: usize = extra_start;
var cur_bit_bag: u32 = undefined;
var decl_i: u32 = 0;
var scan_decl_iter: ScanDeclIter = .{
.module = mod,
.namespace = namespace,
.parent_decl = parent_decl,
};
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
while (decl_i < decls_len) : (decl_i += 1) {
if (decl_i % 8 == 0) {
cur_bit_bag = zir.extra[bit_bag_index];
bit_bag_index += 1;
}
const flags = @truncate(u4, cur_bit_bag);
cur_bit_bag >>= 4;
const decl_sub_index = extra_index;
extra_index += 7; // src_hash(4) + line(1) + name(1) + value(1)
2021-09-02 14:46:31 +02:00
extra_index += @truncate(u1, flags >> 2); // Align
extra_index += @as(u2, @truncate(u1, flags >> 3)) * 2; // Link section or address space, consists of 2 Refs
try scanDecl(&scan_decl_iter, decl_sub_index, flags);
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
}
return extra_index;
}
const ScanDeclIter = struct {
module: *Module,
namespace: *Namespace,
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
parent_decl: *Decl,
usingnamespace_index: usize = 0,
comptime_index: usize = 0,
unnamed_test_index: usize = 0,
};
fn scanDecl(iter: *ScanDeclIter, decl_sub_index: usize, flags: u4) SemaError!void {
const tracy = trace(@src());
defer tracy.end();
const mod = iter.module;
const namespace = iter.namespace;
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
const gpa = mod.gpa;
const zir = namespace.file_scope.zir;
// zig fmt: off
2021-09-02 14:46:31 +02:00
const is_pub = (flags & 0b0001) != 0;
const export_bit = (flags & 0b0010) != 0;
const has_align = (flags & 0b0100) != 0;
const has_linksection_or_addrspace = (flags & 0b1000) != 0;
// zig fmt: on
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
const line = iter.parent_decl.relativeToLine(zir.extra[decl_sub_index + 4]);
const decl_name_index = zir.extra[decl_sub_index + 5];
const decl_index = zir.extra[decl_sub_index + 6];
const decl_block_inst_data = zir.instructions.items(.data)[decl_index].pl_node;
const decl_node = iter.parent_decl.relativeToNodeIndex(decl_block_inst_data.src_node);
// Every Decl needs a name.
2021-05-07 18:52:11 -07:00
var is_named_test = false;
const decl_name: [:0]const u8 = switch (decl_name_index) {
0 => name: {
if (export_bit) {
const i = iter.usingnamespace_index;
iter.usingnamespace_index += 1;
break :name try std.fmt.allocPrintZ(gpa, "usingnamespace_{d}", .{i});
} else {
const i = iter.comptime_index;
iter.comptime_index += 1;
break :name try std.fmt.allocPrintZ(gpa, "comptime_{d}", .{i});
}
},
1 => name: {
const i = iter.unnamed_test_index;
iter.unnamed_test_index += 1;
break :name try std.fmt.allocPrintZ(gpa, "test_{d}", .{i});
},
2021-05-07 18:52:11 -07:00
else => name: {
const raw_name = zir.nullTerminatedString(decl_name_index);
if (raw_name.len == 0) {
is_named_test = true;
const test_name = zir.nullTerminatedString(decl_name_index + 1);
break :name try std.fmt.allocPrintZ(gpa, "test.{s}", .{test_name});
} else {
break :name try gpa.dupeZ(u8, raw_name);
}
},
};
const is_exported = export_bit and decl_name_index != 0;
const is_usingnamespace = export_bit and decl_name_index == 0;
if (is_usingnamespace) try namespace.usingnamespace_set.ensureUnusedCapacity(gpa, 1);
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
// We create a Decl for it regardless of analysis status.
const gop = try namespace.decls.getOrPut(gpa, decl_name);
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
if (!gop.found_existing) {
const new_decl = try mod.allocateNewDecl(decl_name, namespace, decl_node, iter.parent_decl.src_scope);
if (is_usingnamespace) {
namespace.usingnamespace_set.putAssumeCapacity(new_decl, is_pub);
}
2021-05-07 18:52:11 -07:00
log.debug("scan new {*} ({s}) into {*}", .{ new_decl, decl_name, namespace });
new_decl.src_line = line;
gop.value_ptr.* = new_decl;
// Exported decls, comptime decls, usingnamespace decls, and
// test decls if in test mode, get analyzed.
const decl_pkg = namespace.file_scope.pkg;
const want_analysis = is_exported or switch (decl_name_index) {
0 => true, // comptime or usingnamespace decl
1 => blk: {
// test decl with no name. Skip the part where we check against
// the test name filter.
if (!mod.comp.bin_file.options.is_test) break :blk false;
if (decl_pkg != mod.main_pkg) break :blk false;
try mod.test_functions.put(gpa, new_decl, {});
break :blk true;
},
else => blk: {
if (!is_named_test) break :blk false;
if (!mod.comp.bin_file.options.is_test) break :blk false;
if (decl_pkg != mod.main_pkg) break :blk false;
// TODO check the name against --test-filter
try mod.test_functions.put(gpa, new_decl, {});
break :blk true;
},
};
if (want_analysis) {
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
mod.comp.work_queue.writeItemAssumeCapacity(.{ .analyze_decl = new_decl });
}
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
new_decl.is_pub = is_pub;
new_decl.is_exported = is_exported;
new_decl.is_usingnamespace = is_usingnamespace;
new_decl.has_align = has_align;
2021-09-02 14:46:31 +02:00
new_decl.has_linksection_or_addrspace = has_linksection_or_addrspace;
new_decl.zir_decl_index = @intCast(u32, decl_sub_index);
stage2: garbage collect unused anon decls After this change, the frontend and backend cooperate to keep track of which Decls are actually emitted into the machine code. When any backend sees a `decl_ref` Value, it must mark the corresponding Decl `alive` field to true. This prevents unused comptime data from spilling into the output object files. For example, if you do an `inline for` loop, previously, any intermediate value calculations would have gone into the object file. Now they are garbage collected immediately after the owner Decl has its machine code generated. In the frontend, when it is time to send a Decl to the linker, if it has not been marked "alive" then it is deleted instead. Additional improvements: * Resolve type ABI layouts after successful semantic analysis of a Decl. This is needed so that the backend has access to struct fields. * Sema: fix incorrect logic in resolveMaybeUndefVal. It should return "not comptime known" instead of a compile error for global variables. * `Value.pointerDeref` now returns `null` in the case that the pointer deref cannot happen at compile-time. This is true for global variables, for example. Another example is if a comptime known pointer has a hard coded address value. * Binary arithmetic sets the requireRuntimeBlock source location to the lhs_src or rhs_src as appropriate instead of on the operator node. * Fix LLVM codegen for slice_elem_val which had the wrong logic for when the operand was not a pointer. As noted in the comment in the implementation of deleteUnusedDecl, a future improvement will be to rework the frontend/linker interface to remove the frontend's responsibility of calling allocateDeclIndexes. I discovered some issues with the plan9 linker backend that are related to this, and worked around them for now.
2021-07-29 19:30:37 -07:00
new_decl.alive = true; // This Decl corresponds to an AST node and therefore always alive.
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
return;
}
2021-05-07 18:52:11 -07:00
gpa.free(decl_name);
const decl = gop.value_ptr.*;
log.debug("scan existing {*} ({s}) of {*}", .{ decl, decl.name, namespace });
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
// Update the AST node of the decl; even if its contents are unchanged, it may
// have been re-ordered.
decl.src_node = decl_node;
decl.src_line = line;
decl.is_pub = is_pub;
decl.is_exported = is_exported;
decl.is_usingnamespace = is_usingnamespace;
decl.has_align = has_align;
2021-09-02 14:46:31 +02:00
decl.has_linksection_or_addrspace = has_linksection_or_addrspace;
decl.zir_decl_index = @intCast(u32, decl_sub_index);
2021-06-19 21:10:22 -04:00
if (decl.getFunction()) |_| {
switch (mod.comp.bin_file.tag) {
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
.coff => {
// TODO Implement for COFF
},
.elf => if (decl.fn_link.elf.len != 0) {
// TODO Look into detecting when this would be unnecessary by storing enough state
// in `Decl` to notice that the line number did not change.
mod.comp.work_queue.writeItemAssumeCapacity(.{ .update_line_number = decl });
},
.macho => if (decl.fn_link.macho.len != 0) {
// TODO Look into detecting when this would be unnecessary by storing enough state
// in `Decl` to notice that the line number did not change.
mod.comp.work_queue.writeItemAssumeCapacity(.{ .update_line_number = decl });
},
.plan9 => {
2021-08-29 15:11:01 -04:00
// TODO Look into detecting when this would be unnecessary by storing enough state
// in `Decl` to notice that the line number did not change.
mod.comp.work_queue.writeItemAssumeCapacity(.{ .update_line_number = decl });
},
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
.c, .wasm, .spirv => {},
}
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
}
/// Make it as if the semantic analysis for this Decl never happened.
pub fn clearDecl(
mod: *Module,
decl: *Decl,
outdated_decls: ?*std.AutoArrayHashMap(*Decl, void),
) Allocator.Error!void {
const tracy = trace(@src());
defer tracy.end();
log.debug("clearing {*} ({s})", .{ decl, decl.name });
const gpa = mod.gpa;
try mod.deletion_set.ensureUnusedCapacity(gpa, decl.dependencies.count());
if (outdated_decls) |map| {
_ = map.swapRemove(decl);
2021-05-07 18:52:11 -07:00
try map.ensureUnusedCapacity(decl.dependants.count());
}
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
// Remove itself from its dependencies.
for (decl.dependencies.keys()) |dep| {
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
dep.removeDependant(decl);
if (dep.dependants.count() == 0 and !dep.deletion_flag) {
log.debug("insert {*} ({s}) dependant {*} ({s}) into deletion set", .{
decl, decl.name, dep, dep.name,
});
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
// We don't recursively perform a deletion here, because during the update,
// another reference to it may turn up.
dep.deletion_flag = true;
mod.deletion_set.putAssumeCapacity(dep, {});
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
}
decl.dependencies.clearRetainingCapacity();
// Anything that depends on this deleted decl needs to be re-analyzed.
for (decl.dependants.keys()) |dep| {
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
dep.removeDependency(decl);
if (outdated_decls) |map| {
map.putAssumeCapacity(dep, {});
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
}
decl.dependants.clearRetainingCapacity();
if (mod.failed_decls.fetchSwapRemove(decl)) |kv| {
kv.value.destroy(gpa);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
if (mod.emit_h) |emit_h| {
if (emit_h.failed_decls.fetchSwapRemove(decl)) |kv| {
kv.value.destroy(gpa);
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
}
assert(emit_h.decl_table.swapRemove(decl));
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
_ = mod.compile_log_decls.swapRemove(decl);
mod.deleteDeclExports(decl);
if (decl.has_tv) {
if (decl.ty.hasCodeGenBits()) {
mod.comp.bin_file.freeDecl(decl);
// TODO instead of a union, put this memory trailing Decl objects,
// and allow it to be variably sized.
decl.link = switch (mod.comp.bin_file.tag) {
.coff => .{ .coff = link.File.Coff.TextBlock.empty },
.elf => .{ .elf = link.File.Elf.TextBlock.empty },
.macho => .{ .macho = link.File.MachO.TextBlock.empty },
.plan9 => .{ .plan9 = link.File.Plan9.DeclBlock.empty },
.c => .{ .c = {} },
.wasm => .{ .wasm = link.File.Wasm.DeclBlock.empty },
.spirv => .{ .spirv = {} },
};
decl.fn_link = switch (mod.comp.bin_file.tag) {
.coff => .{ .coff = {} },
.elf => .{ .elf = link.File.Elf.SrcFn.empty },
.macho => .{ .macho = link.File.MachO.SrcFn.empty },
.plan9 => .{ .plan9 = {} },
.c => .{ .c = {} },
.wasm => .{ .wasm = link.File.Wasm.FnData.empty },
.spirv => .{ .spirv = .{} },
};
}
if (decl.getInnerNamespace()) |namespace| {
try namespace.deleteAllDecls(mod, outdated_decls);
}
decl.clearValues(gpa);
2021-05-07 18:52:11 -07:00
}
if (decl.deletion_flag) {
decl.deletion_flag = false;
assert(mod.deletion_set.swapRemove(decl));
}
decl.analysis = .unreferenced;
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
stage2: garbage collect unused anon decls After this change, the frontend and backend cooperate to keep track of which Decls are actually emitted into the machine code. When any backend sees a `decl_ref` Value, it must mark the corresponding Decl `alive` field to true. This prevents unused comptime data from spilling into the output object files. For example, if you do an `inline for` loop, previously, any intermediate value calculations would have gone into the object file. Now they are garbage collected immediately after the owner Decl has its machine code generated. In the frontend, when it is time to send a Decl to the linker, if it has not been marked "alive" then it is deleted instead. Additional improvements: * Resolve type ABI layouts after successful semantic analysis of a Decl. This is needed so that the backend has access to struct fields. * Sema: fix incorrect logic in resolveMaybeUndefVal. It should return "not comptime known" instead of a compile error for global variables. * `Value.pointerDeref` now returns `null` in the case that the pointer deref cannot happen at compile-time. This is true for global variables, for example. Another example is if a comptime known pointer has a hard coded address value. * Binary arithmetic sets the requireRuntimeBlock source location to the lhs_src or rhs_src as appropriate instead of on the operator node. * Fix LLVM codegen for slice_elem_val which had the wrong logic for when the operand was not a pointer. As noted in the comment in the implementation of deleteUnusedDecl, a future improvement will be to rework the frontend/linker interface to remove the frontend's responsibility of calling allocateDeclIndexes. I discovered some issues with the plan9 linker backend that are related to this, and worked around them for now.
2021-07-29 19:30:37 -07:00
pub fn deleteUnusedDecl(mod: *Module, decl: *Decl) void {
log.debug("deleteUnusedDecl {*} ({s})", .{ decl, decl.name });
// TODO: remove `allocateDeclIndexes` and make the API that the linker backends
// are required to notice the first time `updateDecl` happens and keep track
// of it themselves. However they can rely on getting a `freeDecl` call if any
// `updateDecl` or `updateFunc` calls happen. This will allow us to avoid any call
// into the linker backend here, since the linker backend will never have been told
// about the Decl in the first place.
// Until then, we did call `allocateDeclIndexes` on this anonymous Decl and so we
// must call `freeDecl` in the linker backend now.
switch (mod.comp.bin_file.tag) {
.c => {}, // this linker backend has already migrated to the new API
else => if (decl.has_tv) {
if (decl.ty.hasCodeGenBits()) {
mod.comp.bin_file.freeDecl(decl);
}
},
stage2: garbage collect unused anon decls After this change, the frontend and backend cooperate to keep track of which Decls are actually emitted into the machine code. When any backend sees a `decl_ref` Value, it must mark the corresponding Decl `alive` field to true. This prevents unused comptime data from spilling into the output object files. For example, if you do an `inline for` loop, previously, any intermediate value calculations would have gone into the object file. Now they are garbage collected immediately after the owner Decl has its machine code generated. In the frontend, when it is time to send a Decl to the linker, if it has not been marked "alive" then it is deleted instead. Additional improvements: * Resolve type ABI layouts after successful semantic analysis of a Decl. This is needed so that the backend has access to struct fields. * Sema: fix incorrect logic in resolveMaybeUndefVal. It should return "not comptime known" instead of a compile error for global variables. * `Value.pointerDeref` now returns `null` in the case that the pointer deref cannot happen at compile-time. This is true for global variables, for example. Another example is if a comptime known pointer has a hard coded address value. * Binary arithmetic sets the requireRuntimeBlock source location to the lhs_src or rhs_src as appropriate instead of on the operator node. * Fix LLVM codegen for slice_elem_val which had the wrong logic for when the operand was not a pointer. As noted in the comment in the implementation of deleteUnusedDecl, a future improvement will be to rework the frontend/linker interface to remove the frontend's responsibility of calling allocateDeclIndexes. I discovered some issues with the plan9 linker backend that are related to this, and worked around them for now.
2021-07-29 19:30:37 -07:00
}
2021-10-01 12:05:12 -05:00
assert(!decl.isRoot());
assert(decl.src_namespace.anon_decls.swapRemove(decl));
stage2: garbage collect unused anon decls After this change, the frontend and backend cooperate to keep track of which Decls are actually emitted into the machine code. When any backend sees a `decl_ref` Value, it must mark the corresponding Decl `alive` field to true. This prevents unused comptime data from spilling into the output object files. For example, if you do an `inline for` loop, previously, any intermediate value calculations would have gone into the object file. Now they are garbage collected immediately after the owner Decl has its machine code generated. In the frontend, when it is time to send a Decl to the linker, if it has not been marked "alive" then it is deleted instead. Additional improvements: * Resolve type ABI layouts after successful semantic analysis of a Decl. This is needed so that the backend has access to struct fields. * Sema: fix incorrect logic in resolveMaybeUndefVal. It should return "not comptime known" instead of a compile error for global variables. * `Value.pointerDeref` now returns `null` in the case that the pointer deref cannot happen at compile-time. This is true for global variables, for example. Another example is if a comptime known pointer has a hard coded address value. * Binary arithmetic sets the requireRuntimeBlock source location to the lhs_src or rhs_src as appropriate instead of on the operator node. * Fix LLVM codegen for slice_elem_val which had the wrong logic for when the operand was not a pointer. As noted in the comment in the implementation of deleteUnusedDecl, a future improvement will be to rework the frontend/linker interface to remove the frontend's responsibility of calling allocateDeclIndexes. I discovered some issues with the plan9 linker backend that are related to this, and worked around them for now.
2021-07-29 19:30:37 -07:00
const dependants = decl.dependants.keys();
stage2: garbage collect unused anon decls After this change, the frontend and backend cooperate to keep track of which Decls are actually emitted into the machine code. When any backend sees a `decl_ref` Value, it must mark the corresponding Decl `alive` field to true. This prevents unused comptime data from spilling into the output object files. For example, if you do an `inline for` loop, previously, any intermediate value calculations would have gone into the object file. Now they are garbage collected immediately after the owner Decl has its machine code generated. In the frontend, when it is time to send a Decl to the linker, if it has not been marked "alive" then it is deleted instead. Additional improvements: * Resolve type ABI layouts after successful semantic analysis of a Decl. This is needed so that the backend has access to struct fields. * Sema: fix incorrect logic in resolveMaybeUndefVal. It should return "not comptime known" instead of a compile error for global variables. * `Value.pointerDeref` now returns `null` in the case that the pointer deref cannot happen at compile-time. This is true for global variables, for example. Another example is if a comptime known pointer has a hard coded address value. * Binary arithmetic sets the requireRuntimeBlock source location to the lhs_src or rhs_src as appropriate instead of on the operator node. * Fix LLVM codegen for slice_elem_val which had the wrong logic for when the operand was not a pointer. As noted in the comment in the implementation of deleteUnusedDecl, a future improvement will be to rework the frontend/linker interface to remove the frontend's responsibility of calling allocateDeclIndexes. I discovered some issues with the plan9 linker backend that are related to this, and worked around them for now.
2021-07-29 19:30:37 -07:00
for (dependants) |dep| {
dep.removeDependency(decl);
}
for (decl.dependencies.keys()) |dep| {
dep.removeDependant(decl);
}
decl.destroy(mod);
}
/// Cancel the creation of an anon decl and delete any references to it.
/// If other decls depend on this decl, they must be aborted first.
pub fn abortAnonDecl(mod: *Module, decl: *Decl) void {
log.debug("abortAnonDecl {*} ({s})", .{ decl, decl.name });
2021-09-30 23:38:41 -05:00
2021-10-01 12:05:12 -05:00
assert(!decl.isRoot());
assert(decl.src_namespace.anon_decls.swapRemove(decl));
2021-09-30 23:38:41 -05:00
// An aborted decl must not have dependants -- they must have
// been aborted first and removed from this list.
assert(decl.dependants.count() == 0);
2021-09-30 23:38:41 -05:00
for (decl.dependencies.keys()) |dep| {
dep.removeDependant(decl);
}
stage2: garbage collect unused anon decls After this change, the frontend and backend cooperate to keep track of which Decls are actually emitted into the machine code. When any backend sees a `decl_ref` Value, it must mark the corresponding Decl `alive` field to true. This prevents unused comptime data from spilling into the output object files. For example, if you do an `inline for` loop, previously, any intermediate value calculations would have gone into the object file. Now they are garbage collected immediately after the owner Decl has its machine code generated. In the frontend, when it is time to send a Decl to the linker, if it has not been marked "alive" then it is deleted instead. Additional improvements: * Resolve type ABI layouts after successful semantic analysis of a Decl. This is needed so that the backend has access to struct fields. * Sema: fix incorrect logic in resolveMaybeUndefVal. It should return "not comptime known" instead of a compile error for global variables. * `Value.pointerDeref` now returns `null` in the case that the pointer deref cannot happen at compile-time. This is true for global variables, for example. Another example is if a comptime known pointer has a hard coded address value. * Binary arithmetic sets the requireRuntimeBlock source location to the lhs_src or rhs_src as appropriate instead of on the operator node. * Fix LLVM codegen for slice_elem_val which had the wrong logic for when the operand was not a pointer. As noted in the comment in the implementation of deleteUnusedDecl, a future improvement will be to rework the frontend/linker interface to remove the frontend's responsibility of calling allocateDeclIndexes. I discovered some issues with the plan9 linker backend that are related to this, and worked around them for now.
2021-07-29 19:30:37 -07:00
decl.destroy(mod);
}
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
/// Delete all the Export objects that are caused by this Decl. Re-analysis of
/// this Decl will cause them to be re-created (or not).
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
fn deleteDeclExports(mod: *Module, decl: *Decl) void {
const kv = mod.export_owners.fetchSwapRemove(decl) orelse return;
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
for (kv.value) |exp| {
if (mod.decl_exports.getPtr(exp.exported_decl)) |value_ptr| {
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
// Remove exports with owner_decl matching the regenerating decl.
const list = value_ptr.*;
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
var i: usize = 0;
var new_len = list.len;
while (i < new_len) {
if (list[i].owner_decl == decl) {
mem.copyBackwards(*Export, list[i..], list[i + 1 .. new_len]);
new_len -= 1;
} else {
i += 1;
}
}
value_ptr.* = mod.gpa.shrink(list, new_len);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
if (new_len == 0) {
assert(mod.decl_exports.swapRemove(exp.exported_decl));
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
if (mod.comp.bin_file.cast(link.File.Elf)) |elf| {
elf.deleteExport(exp.link.elf);
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
if (mod.comp.bin_file.cast(link.File.MachO)) |macho| {
macho.deleteExport(exp.link.macho);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
if (mod.failed_exports.fetchSwapRemove(exp)) |failed_kv| {
failed_kv.value.destroy(mod.gpa);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
mod.gpa.free(exp.options.name);
mod.gpa.destroy(exp);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
mod.gpa.free(kv.value);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
pub fn analyzeFnBody(mod: *Module, decl: *Decl, func: *Fn, arena: *Allocator) SemaError!Air {
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
const tracy = trace(@src());
defer tracy.end();
const gpa = mod.gpa;
// Use the Decl's arena for captured values.
var decl_arena = decl.value_arena.?.promote(gpa);
defer decl.value_arena.?.* = decl_arena.state;
var sema: Sema = .{
.mod = mod,
.gpa = gpa,
.arena = arena,
.perm_arena = &decl_arena.allocator,
2021-10-01 12:05:12 -05:00
.code = decl.getFileScope().zir,
.owner_decl = decl,
.func = func,
stage2: fix generics with non-comptime anytype parameters The `comptime_args` field of Fn has a clarified purpose: For generic function instantiations, there is a `TypedValue` here for each parameter of the function: * Non-comptime parameters are marked with a `generic_poison` for the value. * Non-anytype parameters are marked with a `generic_poison` for the type. Sema now has a `fn_ret_ty` field. Doc comments reproduced here: > When semantic analysis needs to know the return type of the function whose body > is being analyzed, this `Type` should be used instead of going through `func`. > This will correctly handle the case of a comptime/inline function call of a > generic function which uses a type expression for the return type. > The type will be `void` in the case that `func` is `null`. Various places in Sema are modified in accordance with this guidance. Fixed `resolveMaybeUndefVal` not returning `error.GenericPoison` when Value Tag of `generic_poison` is encountered. Fixed generic function memoization incorrect equality checking. The logic now clearly deals properly with any combination of anytype and comptime parameters. Fixed not removing generic function instantiation from the table in case a compile errors in the rest of `call` semantic analysis. This required introduction of yet another adapter which I have called `GenericRemoveAdapter`. This one is nice and simple - it's the same hash function (the same precomputed hash is passed in) but the equality function checks pointers rather than doing any logic. Inline/comptime function calls coerce each argument in accordance with the function parameter type expressions. Likewise the return type expression is evaluated and provided (see `fn_ret_ty` above). There's a new compile error "unable to monomorphize function". It's pretty unhelpful and will need to get improved in the future. It happens when a type expression in a generic function did not end up getting resolved at a callsite. This can happen, for example, if a runtime parameter is attempted to be used where it needed to be comptime known: ```zig fn foo(x: anytype) [x]u8 { _ = x; } ``` In this example, even if we pass a number such as `10` for `x`, it is not marked `comptime`, so `x` will have a runtime known value, making the return type unable to resolve. In the LLVM backend I implement cmp instructions for float types to pass some behavior tests that used floats.
2021-08-06 16:24:39 -07:00
.fn_ret_ty = func.owner_decl.ty.fnReturnType(),
.owner_func = func,
};
defer sema.deinit();
// First few indexes of extra are reserved and set at the end.
const reserved_count = @typeInfo(Air.ExtraIndex).Enum.fields.len;
try sema.air_extra.ensureTotalCapacity(gpa, reserved_count);
sema.air_extra.items.len += reserved_count;
var wip_captures = try WipCaptureScope.init(gpa, &decl_arena.allocator, decl.src_scope);
defer wip_captures.deinit();
var inner_block: Sema.Block = .{
.parent = null,
.sema = &sema,
.src_decl = decl,
2021-10-01 12:05:12 -05:00
.namespace = decl.src_namespace,
.wip_capture_scope = wip_captures.scope,
.instructions = .{},
.inlining = null,
.is_comptime = false,
};
defer inner_block.instructions.deinit(gpa);
const fn_info = sema.code.getFnInfo(func.zir_body_inst);
const zir_tags = sema.code.instructions.items(.tag);
// Here we are performing "runtime semantic analysis" for a function body, which means
// we must map the parameter ZIR instructions to `arg` AIR instructions.
// AIR requires the `arg` parameters to be the first N instructions.
stage2: basic generic functions are working The general strategy is that Sema will pre-map comptime arguments into the inst_map, and then re-run the block body that contains the `param` and `func` instructions. This re-runs all the parameter type expressions except with comptime values populated. In Sema, param instructions are now handled specially: they detect whether they are comptime-elided or not. If so, they skip putting a value in the inst_map, since it is already pre-populated. If not, then they append to the `fields` field of `Sema` for use with the `func` instruction. So when the block body is re-run, a new function is generated with all the comptime arguments elided, and the new function type has only runtime parameters in it. TODO: give the generated Decls better names than "foo__anon_x". The new function is then added to the work queue to have its body analyzed and a runtime call AIR instruction to the new function is emitted. When the new function gets semantically analyzed, comptime parameters are pre-mapped to the corresponding `comptime_args` values rather than mapped to an `arg` AIR instruction. `comptime_args` is a new field that `Fn` has which is a `TypedValue` for each parameter. This field is non-null for generic function instantiations only. The values are the comptime arguments. For non-comptime parameters, a sentinel value is used. This is because we need to know the information of which parameters are comptime-known. Additionally: * AstGen: align and section expressions are evaluated in the scope that has comptime parameters in it. There are still some TODO items left; see the BRANCH_TODO file.
2021-08-03 22:34:22 -07:00
// This could be a generic function instantiation, however, in which case we need to
// map the comptime parameters to constant values and only emit arg AIR instructions
// for the runtime ones.
const fn_ty = decl.ty;
stage2: basic generic functions are working The general strategy is that Sema will pre-map comptime arguments into the inst_map, and then re-run the block body that contains the `param` and `func` instructions. This re-runs all the parameter type expressions except with comptime values populated. In Sema, param instructions are now handled specially: they detect whether they are comptime-elided or not. If so, they skip putting a value in the inst_map, since it is already pre-populated. If not, then they append to the `fields` field of `Sema` for use with the `func` instruction. So when the block body is re-run, a new function is generated with all the comptime arguments elided, and the new function type has only runtime parameters in it. TODO: give the generated Decls better names than "foo__anon_x". The new function is then added to the work queue to have its body analyzed and a runtime call AIR instruction to the new function is emitted. When the new function gets semantically analyzed, comptime parameters are pre-mapped to the corresponding `comptime_args` values rather than mapped to an `arg` AIR instruction. `comptime_args` is a new field that `Fn` has which is a `TypedValue` for each parameter. This field is non-null for generic function instantiations only. The values are the comptime arguments. For non-comptime parameters, a sentinel value is used. This is because we need to know the information of which parameters are comptime-known. Additionally: * AstGen: align and section expressions are evaluated in the scope that has comptime parameters in it. There are still some TODO items left; see the BRANCH_TODO file.
2021-08-03 22:34:22 -07:00
const runtime_params_len = @intCast(u32, fn_ty.fnParamLen());
try inner_block.instructions.ensureTotalCapacity(gpa, runtime_params_len);
try sema.air_instructions.ensureUnusedCapacity(gpa, fn_info.total_params_len * 2); // * 2 for the `addType`
try sema.inst_map.ensureUnusedCapacity(gpa, fn_info.total_params_len);
stage2 generics improvements: anytype and param type exprs AstGen result locations now have a `coerced_ty` tag which is the same as `ty` except it assumes that Sema will do a coercion, so it does not redundantly add an `as` instruction into the ZIR code. This results in cleaner ZIR and about a 14% reduction of ZIR bytes. param and param_comptime ZIR instructions now have a block body for their type expressions. This allows Sema to skip evaluation of the block in the case that the parameter is comptime-provided. It also allows a new mechanism to function: when evaluating type expressions of generic functions, if it would depend on another parameter, it returns `error.GenericPoison` which bubbles up and then is caught by the param/param_comptime instruction and then handled. This allows parameters to be evaluated independently so that the type info for functions which have comptime or anytype parameters will still have types populated for parameters that do not depend on values of previous parameters (because evaluation of their param blocks will return successfully instead of `error.GenericPoison`). It also makes iteration over the block that contains function parameters slightly more efficient since it now only contains the param instructions. Finally, it fixes the case where a generic function type expression contains a function prototype. Formerly, this situation would cause shared state to clobber each other; now it is in a proper tree structure so that can't happen. This fix also required adding a field to Sema `comptime_args_fn_inst` to make sure that the `comptime_args` field passed into Sema is applied to the correct `func` instruction. Source location for `node_offset_asm_ret_ty` is fixed; it was pointing at the asm output name rather than the return type as intended. Generic function instantiation is fixed, notably with respect to parameter type expressions that depend on previous parameters, and with respect to types which must be always comptime-known. This involves passing all the comptime arguments at a callsite of a generic function, and allowing the generic function semantic analysis to coerce the values to the proper types (since it has access to the evaluated parameter type expressions) and then decide based on the type whether the parameter is runtime known or not. In the case of explicitly marked `comptime` parameters, there is a check at the semantic analysis of the `call` instruction. Semantic analysis of `call` instructions does type coercion on the arguments, which is needed both for generic functions and to make up for using `coerced_ty` result locations (mentioned above). Tasks left in this branch: * Implement the memoization table. * Add test coverage. * Improve error reporting and source locations for compile errors.
2021-08-04 21:11:31 -07:00
var runtime_param_index: usize = 0;
var total_param_index: usize = 0;
for (fn_info.param_body) |inst| {
const name = switch (zir_tags[inst]) {
.param, .param_comptime => blk: {
const inst_data = sema.code.instructions.items(.data)[inst].pl_tok;
const extra = sema.code.extraData(Zir.Inst.Param, inst_data.payload_index).data;
break :blk extra.name;
},
.param_anytype, .param_anytype_comptime => blk: {
const str_tok = sema.code.instructions.items(.data)[inst].str_tok;
break :blk str_tok.start;
},
else => continue,
};
stage2: basic generic functions are working The general strategy is that Sema will pre-map comptime arguments into the inst_map, and then re-run the block body that contains the `param` and `func` instructions. This re-runs all the parameter type expressions except with comptime values populated. In Sema, param instructions are now handled specially: they detect whether they are comptime-elided or not. If so, they skip putting a value in the inst_map, since it is already pre-populated. If not, then they append to the `fields` field of `Sema` for use with the `func` instruction. So when the block body is re-run, a new function is generated with all the comptime arguments elided, and the new function type has only runtime parameters in it. TODO: give the generated Decls better names than "foo__anon_x". The new function is then added to the work queue to have its body analyzed and a runtime call AIR instruction to the new function is emitted. When the new function gets semantically analyzed, comptime parameters are pre-mapped to the corresponding `comptime_args` values rather than mapped to an `arg` AIR instruction. `comptime_args` is a new field that `Fn` has which is a `TypedValue` for each parameter. This field is non-null for generic function instantiations only. The values are the comptime arguments. For non-comptime parameters, a sentinel value is used. This is because we need to know the information of which parameters are comptime-known. Additionally: * AstGen: align and section expressions are evaluated in the scope that has comptime parameters in it. There are still some TODO items left; see the BRANCH_TODO file.
2021-08-03 22:34:22 -07:00
if (func.comptime_args) |comptime_args| {
stage2 generics improvements: anytype and param type exprs AstGen result locations now have a `coerced_ty` tag which is the same as `ty` except it assumes that Sema will do a coercion, so it does not redundantly add an `as` instruction into the ZIR code. This results in cleaner ZIR and about a 14% reduction of ZIR bytes. param and param_comptime ZIR instructions now have a block body for their type expressions. This allows Sema to skip evaluation of the block in the case that the parameter is comptime-provided. It also allows a new mechanism to function: when evaluating type expressions of generic functions, if it would depend on another parameter, it returns `error.GenericPoison` which bubbles up and then is caught by the param/param_comptime instruction and then handled. This allows parameters to be evaluated independently so that the type info for functions which have comptime or anytype parameters will still have types populated for parameters that do not depend on values of previous parameters (because evaluation of their param blocks will return successfully instead of `error.GenericPoison`). It also makes iteration over the block that contains function parameters slightly more efficient since it now only contains the param instructions. Finally, it fixes the case where a generic function type expression contains a function prototype. Formerly, this situation would cause shared state to clobber each other; now it is in a proper tree structure so that can't happen. This fix also required adding a field to Sema `comptime_args_fn_inst` to make sure that the `comptime_args` field passed into Sema is applied to the correct `func` instruction. Source location for `node_offset_asm_ret_ty` is fixed; it was pointing at the asm output name rather than the return type as intended. Generic function instantiation is fixed, notably with respect to parameter type expressions that depend on previous parameters, and with respect to types which must be always comptime-known. This involves passing all the comptime arguments at a callsite of a generic function, and allowing the generic function semantic analysis to coerce the values to the proper types (since it has access to the evaluated parameter type expressions) and then decide based on the type whether the parameter is runtime known or not. In the case of explicitly marked `comptime` parameters, there is a check at the semantic analysis of the `call` instruction. Semantic analysis of `call` instructions does type coercion on the arguments, which is needed both for generic functions and to make up for using `coerced_ty` result locations (mentioned above). Tasks left in this branch: * Implement the memoization table. * Add test coverage. * Improve error reporting and source locations for compile errors.
2021-08-04 21:11:31 -07:00
const arg_tv = comptime_args[total_param_index];
stage2: fix generics with non-comptime anytype parameters The `comptime_args` field of Fn has a clarified purpose: For generic function instantiations, there is a `TypedValue` here for each parameter of the function: * Non-comptime parameters are marked with a `generic_poison` for the value. * Non-anytype parameters are marked with a `generic_poison` for the type. Sema now has a `fn_ret_ty` field. Doc comments reproduced here: > When semantic analysis needs to know the return type of the function whose body > is being analyzed, this `Type` should be used instead of going through `func`. > This will correctly handle the case of a comptime/inline function call of a > generic function which uses a type expression for the return type. > The type will be `void` in the case that `func` is `null`. Various places in Sema are modified in accordance with this guidance. Fixed `resolveMaybeUndefVal` not returning `error.GenericPoison` when Value Tag of `generic_poison` is encountered. Fixed generic function memoization incorrect equality checking. The logic now clearly deals properly with any combination of anytype and comptime parameters. Fixed not removing generic function instantiation from the table in case a compile errors in the rest of `call` semantic analysis. This required introduction of yet another adapter which I have called `GenericRemoveAdapter`. This one is nice and simple - it's the same hash function (the same precomputed hash is passed in) but the equality function checks pointers rather than doing any logic. Inline/comptime function calls coerce each argument in accordance with the function parameter type expressions. Likewise the return type expression is evaluated and provided (see `fn_ret_ty` above). There's a new compile error "unable to monomorphize function". It's pretty unhelpful and will need to get improved in the future. It happens when a type expression in a generic function did not end up getting resolved at a callsite. This can happen, for example, if a runtime parameter is attempted to be used where it needed to be comptime known: ```zig fn foo(x: anytype) [x]u8 { _ = x; } ``` In this example, even if we pass a number such as `10` for `x`, it is not marked `comptime`, so `x` will have a runtime known value, making the return type unable to resolve. In the LLVM backend I implement cmp instructions for float types to pass some behavior tests that used floats.
2021-08-06 16:24:39 -07:00
if (arg_tv.val.tag() != .generic_poison) {
stage2: basic generic functions are working The general strategy is that Sema will pre-map comptime arguments into the inst_map, and then re-run the block body that contains the `param` and `func` instructions. This re-runs all the parameter type expressions except with comptime values populated. In Sema, param instructions are now handled specially: they detect whether they are comptime-elided or not. If so, they skip putting a value in the inst_map, since it is already pre-populated. If not, then they append to the `fields` field of `Sema` for use with the `func` instruction. So when the block body is re-run, a new function is generated with all the comptime arguments elided, and the new function type has only runtime parameters in it. TODO: give the generated Decls better names than "foo__anon_x". The new function is then added to the work queue to have its body analyzed and a runtime call AIR instruction to the new function is emitted. When the new function gets semantically analyzed, comptime parameters are pre-mapped to the corresponding `comptime_args` values rather than mapped to an `arg` AIR instruction. `comptime_args` is a new field that `Fn` has which is a `TypedValue` for each parameter. This field is non-null for generic function instantiations only. The values are the comptime arguments. For non-comptime parameters, a sentinel value is used. This is because we need to know the information of which parameters are comptime-known. Additionally: * AstGen: align and section expressions are evaluated in the scope that has comptime parameters in it. There are still some TODO items left; see the BRANCH_TODO file.
2021-08-03 22:34:22 -07:00
// We have a comptime value for this parameter.
const arg = try sema.addConstant(arg_tv.ty, arg_tv.val);
sema.inst_map.putAssumeCapacityNoClobber(inst, arg);
stage2 generics improvements: anytype and param type exprs AstGen result locations now have a `coerced_ty` tag which is the same as `ty` except it assumes that Sema will do a coercion, so it does not redundantly add an `as` instruction into the ZIR code. This results in cleaner ZIR and about a 14% reduction of ZIR bytes. param and param_comptime ZIR instructions now have a block body for their type expressions. This allows Sema to skip evaluation of the block in the case that the parameter is comptime-provided. It also allows a new mechanism to function: when evaluating type expressions of generic functions, if it would depend on another parameter, it returns `error.GenericPoison` which bubbles up and then is caught by the param/param_comptime instruction and then handled. This allows parameters to be evaluated independently so that the type info for functions which have comptime or anytype parameters will still have types populated for parameters that do not depend on values of previous parameters (because evaluation of their param blocks will return successfully instead of `error.GenericPoison`). It also makes iteration over the block that contains function parameters slightly more efficient since it now only contains the param instructions. Finally, it fixes the case where a generic function type expression contains a function prototype. Formerly, this situation would cause shared state to clobber each other; now it is in a proper tree structure so that can't happen. This fix also required adding a field to Sema `comptime_args_fn_inst` to make sure that the `comptime_args` field passed into Sema is applied to the correct `func` instruction. Source location for `node_offset_asm_ret_ty` is fixed; it was pointing at the asm output name rather than the return type as intended. Generic function instantiation is fixed, notably with respect to parameter type expressions that depend on previous parameters, and with respect to types which must be always comptime-known. This involves passing all the comptime arguments at a callsite of a generic function, and allowing the generic function semantic analysis to coerce the values to the proper types (since it has access to the evaluated parameter type expressions) and then decide based on the type whether the parameter is runtime known or not. In the case of explicitly marked `comptime` parameters, there is a check at the semantic analysis of the `call` instruction. Semantic analysis of `call` instructions does type coercion on the arguments, which is needed both for generic functions and to make up for using `coerced_ty` result locations (mentioned above). Tasks left in this branch: * Implement the memoization table. * Add test coverage. * Improve error reporting and source locations for compile errors.
2021-08-04 21:11:31 -07:00
total_param_index += 1;
stage2: basic generic functions are working The general strategy is that Sema will pre-map comptime arguments into the inst_map, and then re-run the block body that contains the `param` and `func` instructions. This re-runs all the parameter type expressions except with comptime values populated. In Sema, param instructions are now handled specially: they detect whether they are comptime-elided or not. If so, they skip putting a value in the inst_map, since it is already pre-populated. If not, then they append to the `fields` field of `Sema` for use with the `func` instruction. So when the block body is re-run, a new function is generated with all the comptime arguments elided, and the new function type has only runtime parameters in it. TODO: give the generated Decls better names than "foo__anon_x". The new function is then added to the work queue to have its body analyzed and a runtime call AIR instruction to the new function is emitted. When the new function gets semantically analyzed, comptime parameters are pre-mapped to the corresponding `comptime_args` values rather than mapped to an `arg` AIR instruction. `comptime_args` is a new field that `Fn` has which is a `TypedValue` for each parameter. This field is non-null for generic function instantiations only. The values are the comptime arguments. For non-comptime parameters, a sentinel value is used. This is because we need to know the information of which parameters are comptime-known. Additionally: * AstGen: align and section expressions are evaluated in the scope that has comptime parameters in it. There are still some TODO items left; see the BRANCH_TODO file.
2021-08-03 22:34:22 -07:00
continue;
}
}
stage2 generics improvements: anytype and param type exprs AstGen result locations now have a `coerced_ty` tag which is the same as `ty` except it assumes that Sema will do a coercion, so it does not redundantly add an `as` instruction into the ZIR code. This results in cleaner ZIR and about a 14% reduction of ZIR bytes. param and param_comptime ZIR instructions now have a block body for their type expressions. This allows Sema to skip evaluation of the block in the case that the parameter is comptime-provided. It also allows a new mechanism to function: when evaluating type expressions of generic functions, if it would depend on another parameter, it returns `error.GenericPoison` which bubbles up and then is caught by the param/param_comptime instruction and then handled. This allows parameters to be evaluated independently so that the type info for functions which have comptime or anytype parameters will still have types populated for parameters that do not depend on values of previous parameters (because evaluation of their param blocks will return successfully instead of `error.GenericPoison`). It also makes iteration over the block that contains function parameters slightly more efficient since it now only contains the param instructions. Finally, it fixes the case where a generic function type expression contains a function prototype. Formerly, this situation would cause shared state to clobber each other; now it is in a proper tree structure so that can't happen. This fix also required adding a field to Sema `comptime_args_fn_inst` to make sure that the `comptime_args` field passed into Sema is applied to the correct `func` instruction. Source location for `node_offset_asm_ret_ty` is fixed; it was pointing at the asm output name rather than the return type as intended. Generic function instantiation is fixed, notably with respect to parameter type expressions that depend on previous parameters, and with respect to types which must be always comptime-known. This involves passing all the comptime arguments at a callsite of a generic function, and allowing the generic function semantic analysis to coerce the values to the proper types (since it has access to the evaluated parameter type expressions) and then decide based on the type whether the parameter is runtime known or not. In the case of explicitly marked `comptime` parameters, there is a check at the semantic analysis of the `call` instruction. Semantic analysis of `call` instructions does type coercion on the arguments, which is needed both for generic functions and to make up for using `coerced_ty` result locations (mentioned above). Tasks left in this branch: * Implement the memoization table. * Add test coverage. * Improve error reporting and source locations for compile errors.
2021-08-04 21:11:31 -07:00
const param_type = fn_ty.fnParamType(runtime_param_index);
const ty_ref = try sema.addType(param_type);
const arg_index = @intCast(u32, sema.air_instructions.len);
inner_block.instructions.appendAssumeCapacity(arg_index);
sema.air_instructions.appendAssumeCapacity(.{
.tag = .arg,
.data = .{ .ty_str = .{
.ty = ty_ref,
.str = name,
} },
});
sema.inst_map.putAssumeCapacityNoClobber(inst, Air.indexToRef(arg_index));
stage2 generics improvements: anytype and param type exprs AstGen result locations now have a `coerced_ty` tag which is the same as `ty` except it assumes that Sema will do a coercion, so it does not redundantly add an `as` instruction into the ZIR code. This results in cleaner ZIR and about a 14% reduction of ZIR bytes. param and param_comptime ZIR instructions now have a block body for their type expressions. This allows Sema to skip evaluation of the block in the case that the parameter is comptime-provided. It also allows a new mechanism to function: when evaluating type expressions of generic functions, if it would depend on another parameter, it returns `error.GenericPoison` which bubbles up and then is caught by the param/param_comptime instruction and then handled. This allows parameters to be evaluated independently so that the type info for functions which have comptime or anytype parameters will still have types populated for parameters that do not depend on values of previous parameters (because evaluation of their param blocks will return successfully instead of `error.GenericPoison`). It also makes iteration over the block that contains function parameters slightly more efficient since it now only contains the param instructions. Finally, it fixes the case where a generic function type expression contains a function prototype. Formerly, this situation would cause shared state to clobber each other; now it is in a proper tree structure so that can't happen. This fix also required adding a field to Sema `comptime_args_fn_inst` to make sure that the `comptime_args` field passed into Sema is applied to the correct `func` instruction. Source location for `node_offset_asm_ret_ty` is fixed; it was pointing at the asm output name rather than the return type as intended. Generic function instantiation is fixed, notably with respect to parameter type expressions that depend on previous parameters, and with respect to types which must be always comptime-known. This involves passing all the comptime arguments at a callsite of a generic function, and allowing the generic function semantic analysis to coerce the values to the proper types (since it has access to the evaluated parameter type expressions) and then decide based on the type whether the parameter is runtime known or not. In the case of explicitly marked `comptime` parameters, there is a check at the semantic analysis of the `call` instruction. Semantic analysis of `call` instructions does type coercion on the arguments, which is needed both for generic functions and to make up for using `coerced_ty` result locations (mentioned above). Tasks left in this branch: * Implement the memoization table. * Add test coverage. * Improve error reporting and source locations for compile errors.
2021-08-04 21:11:31 -07:00
total_param_index += 1;
runtime_param_index += 1;
}
func.state = .in_progress;
log.debug("set {s} to in_progress", .{decl.name});
_ = sema.analyzeBody(&inner_block, fn_info.body) catch |err| switch (err) {
// TODO make these unreachable instead of @panic
error.NeededSourceLocation => @panic("zig compiler bug: NeededSourceLocation"),
error.GenericPoison => @panic("zig compiler bug: GenericPoison"),
error.ComptimeReturn => @panic("zig compiler bug: ComptimeReturn"),
else => |e| return e,
};
try wip_captures.finalize();
// Copy the block into place and mark that as the main block.
try sema.air_extra.ensureUnusedCapacity(gpa, @typeInfo(Air.Block).Struct.fields.len +
inner_block.instructions.items.len);
const main_block_index = sema.addExtraAssumeCapacity(Air.Block{
.body_len = @intCast(u32, inner_block.instructions.items.len),
});
sema.air_extra.appendSliceAssumeCapacity(inner_block.instructions.items);
sema.air_extra.items[@enumToInt(Air.ExtraIndex.main_block)] = main_block_index;
func.state = .success;
log.debug("set {s} to success", .{decl.name});
return Air{
.instructions = sema.air_instructions.toOwnedSlice(),
.extra = sema.air_extra.toOwnedSlice(gpa),
.values = sema.air_values.toOwnedSlice(gpa),
};
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
fn markOutdatedDecl(mod: *Module, decl: *Decl) !void {
log.debug("mark outdated {*} ({s})", .{ decl, decl.name });
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
try mod.comp.work_queue.writeItem(.{ .analyze_decl = decl });
if (mod.failed_decls.fetchSwapRemove(decl)) |kv| {
kv.value.destroy(mod.gpa);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
if (decl.has_tv and decl.owns_tv) {
if (decl.val.castTag(.function)) |payload| {
const func = payload.data;
_ = mod.align_stack_fns.remove(func);
}
}
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
if (mod.emit_h) |emit_h| {
if (emit_h.failed_decls.fetchSwapRemove(decl)) |kv| {
kv.value.destroy(mod.gpa);
stage2: rewire the frontend driver to whole-file-zir * Remove some unused imports in AstGen.zig. I think it would make sense to start decoupling AstGen from the rest of the compiler code, similar to how the tokenizer and parser are decoupled. * AstGen: For decls, move the block_inline instructions to the top of the function so that they get lower ZIR instruction indexes. With this, the block_inline instruction index combined with its corresponding break_inline instruction index can be used to form a ZIR instruction range. This is useful for allocating an array to map ZIR instructions to semantically analyzed instructions. * Module: extract emit-h functionality into a struct, and only allocate it when emit-h is activated. * Module: remove the `decl_table` field. This previously was a table of all Decls in the entire Module. A "name hash" strategy was used to find decls within a given namespace, using this global table. Now, each Namespace has its own map of name to children Decls. - Additionally, there were 3 places that relied on iterating over decl_table in order to function: - C backend and SPIR-V backend. These now have their own decl_table that they keep populated when `updateDecl` and `removeDecl` are called. - emit-h. A `decl_table` field has been added to the new GlobalEmitH struct which is only allocated when emit-h is activated. * Module: fix ZIR serialization/deserialization bug in debug mode having to do with the secret safety tag for untagged unions. There is still an open TODO to investigate a friendlier solution to this problem with the language. * Module: improve deserialization of ZIR to allocate only exactly as much capacity as length in the instructions array so as to not waste space. * Module: move `srcHashEql` to `std.zig` to live next to the definition of `SrcHash` itself. * Module: re-introduce the logic for scanning top level declarations within a namespace. * Compilation: add an `analyze_pkg` Job which is used to kick off the start of semantic analysis by doing the equivalent of `_ = @import("std");`. The `analyze_pkg` job is unconditionally added to the work queue on every update(), with pkg set to the std lib pkg. * Rename TZIR to AIR in a few places. A more comprehensive rename will come later.
2021-04-26 20:41:07 -07:00
}
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
_ = mod.compile_log_decls.swapRemove(decl);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
decl.analysis = .outdated;
}
pub fn allocateNewDecl(mod: *Module, name: [:0]const u8, namespace: *Namespace, src_node: Ast.Node.Index, src_scope: ?*CaptureScope) !*Decl {
// If we have emit-h then we must allocate a bigger structure to store the emit-h state.
const new_decl: *Decl = if (mod.emit_h != null) blk: {
const parent_struct = try mod.gpa.create(DeclPlusEmitH);
parent_struct.* = .{
.emit_h = .{},
.decl = undefined,
};
break :blk &parent_struct.decl;
} else try mod.gpa.create(Decl);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
new_decl.* = .{
.name = name,
2021-10-01 12:05:12 -05:00
.src_namespace = namespace,
.src_node = src_node,
.src_line = undefined,
.has_tv = false,
.owns_tv = false,
.ty = undefined,
.val = undefined,
.align_val = undefined,
.linksection_val = undefined,
2021-09-02 14:46:31 +02:00
.@"addrspace" = undefined,
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
.analysis = .unreferenced,
.deletion_flag = false,
.zir_decl_index = 0,
.src_scope = src_scope,
.link = switch (mod.comp.bin_file.tag) {
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
.coff => .{ .coff = link.File.Coff.TextBlock.empty },
.elf => .{ .elf = link.File.Elf.TextBlock.empty },
.macho => .{ .macho = link.File.MachO.TextBlock.empty },
.plan9 => .{ .plan9 = link.File.Plan9.DeclBlock.empty },
.c => .{ .c = {} },
.wasm => .{ .wasm = link.File.Wasm.DeclBlock.empty },
2021-01-19 00:34:44 +01:00
.spirv => .{ .spirv = {} },
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
},
.fn_link = switch (mod.comp.bin_file.tag) {
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
.coff => .{ .coff = {} },
.elf => .{ .elf = link.File.Elf.SrcFn.empty },
.macho => .{ .macho = link.File.MachO.SrcFn.empty },
.plan9 => .{ .plan9 = {} },
.c => .{ .c = {} },
.wasm => .{ .wasm = link.File.Wasm.FnData.empty },
2021-01-19 00:34:44 +01:00
.spirv => .{ .spirv = .{} },
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
},
.generation = 0,
.is_pub = false,
.is_exported = false,
2021-09-02 14:46:31 +02:00
.has_linksection_or_addrspace = false,
.has_align = false,
stage2: garbage collect unused anon decls After this change, the frontend and backend cooperate to keep track of which Decls are actually emitted into the machine code. When any backend sees a `decl_ref` Value, it must mark the corresponding Decl `alive` field to true. This prevents unused comptime data from spilling into the output object files. For example, if you do an `inline for` loop, previously, any intermediate value calculations would have gone into the object file. Now they are garbage collected immediately after the owner Decl has its machine code generated. In the frontend, when it is time to send a Decl to the linker, if it has not been marked "alive" then it is deleted instead. Additional improvements: * Resolve type ABI layouts after successful semantic analysis of a Decl. This is needed so that the backend has access to struct fields. * Sema: fix incorrect logic in resolveMaybeUndefVal. It should return "not comptime known" instead of a compile error for global variables. * `Value.pointerDeref` now returns `null` in the case that the pointer deref cannot happen at compile-time. This is true for global variables, for example. Another example is if a comptime known pointer has a hard coded address value. * Binary arithmetic sets the requireRuntimeBlock source location to the lhs_src or rhs_src as appropriate instead of on the operator node. * Fix LLVM codegen for slice_elem_val which had the wrong logic for when the operand was not a pointer. As noted in the comment in the implementation of deleteUnusedDecl, a future improvement will be to rework the frontend/linker interface to remove the frontend's responsibility of calling allocateDeclIndexes. I discovered some issues with the plan9 linker backend that are related to this, and worked around them for now.
2021-07-29 19:30:37 -07:00
.alive = false,
.is_usingnamespace = false,
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
};
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
return new_decl;
}
/// Get error value for error tag `name`.
pub fn getErrorValue(mod: *Module, name: []const u8) !std.StringHashMapUnmanaged(ErrorInt).KV {
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
const gop = try mod.global_error_set.getOrPut(mod.gpa, name);
if (gop.found_existing) {
return std.StringHashMapUnmanaged(ErrorInt).KV{
.key = gop.key_ptr.*,
.value = gop.value_ptr.*,
};
}
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
errdefer assert(mod.global_error_set.remove(name));
try mod.error_name_list.ensureUnusedCapacity(mod.gpa, 1);
gop.key_ptr.* = try mod.gpa.dupe(u8, name);
gop.value_ptr.* = @intCast(ErrorInt, mod.error_name_list.items.len);
mod.error_name_list.appendAssumeCapacity(gop.key_ptr.*);
return std.StringHashMapUnmanaged(ErrorInt).KV{
.key = gop.key_ptr.*,
.value = gop.value_ptr.*,
};
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
/// Takes ownership of `name` even if it returns an error.
pub fn createAnonymousDeclNamed(
mod: *Module,
block: *Sema.Block,
typed_value: TypedValue,
name: [:0]u8,
) !*Decl {
return mod.createAnonymousDeclFromDeclNamed(block.src_decl, block.namespace, block.wip_capture_scope, typed_value, name);
}
pub fn createAnonymousDecl(mod: *Module, block: *Sema.Block, typed_value: TypedValue) !*Decl {
return mod.createAnonymousDeclFromDecl(block.src_decl, block.namespace, block.wip_capture_scope, typed_value);
}
2021-10-01 12:05:12 -05:00
pub fn createAnonymousDeclFromDecl(
mod: *Module,
src_decl: *Decl,
namespace: *Namespace,
2021-10-01 12:05:12 -05:00
src_scope: ?*CaptureScope,
tv: TypedValue,
) !*Decl {
const name_index = mod.getNextAnonNameIndex();
const name = try std.fmt.allocPrintZ(mod.gpa, "{s}__anon_{d}", .{
src_decl.name, name_index,
});
2021-10-01 12:05:12 -05:00
return mod.createAnonymousDeclFromDeclNamed(src_decl, namespace, src_scope, tv, name);
}
/// Takes ownership of `name` even if it returns an error.
pub fn createAnonymousDeclFromDeclNamed(
mod: *Module,
src_decl: *Decl,
namespace: *Namespace,
src_scope: ?*CaptureScope,
typed_value: TypedValue,
name: [:0]u8,
) !*Decl {
errdefer mod.gpa.free(name);
stage2: type declarations ZIR encode AnonNameStrategy which can be either parent, func, or anon. Here's the enum reproduced in the commit message for convenience: ```zig pub const NameStrategy = enum(u2) { /// Use the same name as the parent declaration name. /// e.g. `const Foo = struct {...};`. parent, /// Use the name of the currently executing comptime function call, /// with the current parameters. e.g. `ArrayList(i32)`. func, /// Create an anonymous name for this declaration. /// Like this: "ParentDeclName_struct_69" anon, }; ``` With this information in the ZIR, a future commit can improve the names of structs, unions, enums, and opaques. In order to accomplish this, the following ZIR instruction forms were removed and replaced with Extended op codes: * struct_decl * struct_decl_packed * struct_decl_extern * union_decl * union_decl_packed * union_decl_extern * enum_decl * enum_decl_nonexhaustive By being extended opcodes, one more u32 is needed, however we more than make up for it by repurposing the 16 "small" bits to provide shorter encodings for when decls_len == 0, fields_len == 0, a source node is not provided, etc. There tends to be no downside, and in fact sometimes upsides, to using an extended op code when there is a need for flag bits, which is the case for all three of these. Likewise, the container layout can be encoded in these bits rather than into the opcode. The following 4 ZIR instructions were added, netting a total of 4 freed up ZIR enum tags for future use: * opaque_decl_anon * opaque_decl_func * error_set_decl_anon * error_set_decl_func This is so that opaques and error sets can have the same name hint as structs, enums, and unions. `std.builtin.ContainerLayout` gets an explicit integer tag type so that it can be used inside packed structs. This commit also makes `Module.Namespace` use a separate set for anonymous decls, thus allowing anonymous decls to share the same `Decl.name` as their owner `Decl` objects.
2021-05-10 21:34:43 -07:00
try namespace.anon_decls.ensureUnusedCapacity(mod.gpa, 1);
const new_decl = try mod.allocateNewDecl(name, namespace, src_decl.src_node, src_scope);
new_decl.src_line = src_decl.src_line;
new_decl.ty = typed_value.ty;
new_decl.val = typed_value.val;
new_decl.align_val = Value.initTag(.null_value);
new_decl.linksection_val = Value.initTag(.null_value);
new_decl.@"addrspace" = .generic; // default global addrspace
new_decl.has_tv = true;
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
new_decl.analysis = .complete;
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
new_decl.generation = mod.generation;
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
stage2: type declarations ZIR encode AnonNameStrategy which can be either parent, func, or anon. Here's the enum reproduced in the commit message for convenience: ```zig pub const NameStrategy = enum(u2) { /// Use the same name as the parent declaration name. /// e.g. `const Foo = struct {...};`. parent, /// Use the name of the currently executing comptime function call, /// with the current parameters. e.g. `ArrayList(i32)`. func, /// Create an anonymous name for this declaration. /// Like this: "ParentDeclName_struct_69" anon, }; ``` With this information in the ZIR, a future commit can improve the names of structs, unions, enums, and opaques. In order to accomplish this, the following ZIR instruction forms were removed and replaced with Extended op codes: * struct_decl * struct_decl_packed * struct_decl_extern * union_decl * union_decl_packed * union_decl_extern * enum_decl * enum_decl_nonexhaustive By being extended opcodes, one more u32 is needed, however we more than make up for it by repurposing the 16 "small" bits to provide shorter encodings for when decls_len == 0, fields_len == 0, a source node is not provided, etc. There tends to be no downside, and in fact sometimes upsides, to using an extended op code when there is a need for flag bits, which is the case for all three of these. Likewise, the container layout can be encoded in these bits rather than into the opcode. The following 4 ZIR instructions were added, netting a total of 4 freed up ZIR enum tags for future use: * opaque_decl_anon * opaque_decl_func * error_set_decl_anon * error_set_decl_func This is so that opaques and error sets can have the same name hint as structs, enums, and unions. `std.builtin.ContainerLayout` gets an explicit integer tag type so that it can be used inside packed structs. This commit also makes `Module.Namespace` use a separate set for anonymous decls, thus allowing anonymous decls to share the same `Decl.name` as their owner `Decl` objects.
2021-05-10 21:34:43 -07:00
namespace.anon_decls.putAssumeCapacityNoClobber(new_decl, {});
2021-03-28 19:38:19 -07:00
// TODO: This generates the Decl into the machine code file if it is of a
// type that is non-zero size. We should be able to further improve the
// compiler to omit Decls which are only referenced at compile-time and not runtime.
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
if (typed_value.ty.hasCodeGenBits()) {
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
try mod.comp.bin_file.allocateDeclIndexes(new_decl);
try mod.comp.work_queue.writeItem(.{ .codegen_decl = new_decl });
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
return new_decl;
}
pub fn getNextAnonNameIndex(mod: *Module) usize {
stage2: type declarations ZIR encode AnonNameStrategy which can be either parent, func, or anon. Here's the enum reproduced in the commit message for convenience: ```zig pub const NameStrategy = enum(u2) { /// Use the same name as the parent declaration name. /// e.g. `const Foo = struct {...};`. parent, /// Use the name of the currently executing comptime function call, /// with the current parameters. e.g. `ArrayList(i32)`. func, /// Create an anonymous name for this declaration. /// Like this: "ParentDeclName_struct_69" anon, }; ``` With this information in the ZIR, a future commit can improve the names of structs, unions, enums, and opaques. In order to accomplish this, the following ZIR instruction forms were removed and replaced with Extended op codes: * struct_decl * struct_decl_packed * struct_decl_extern * union_decl * union_decl_packed * union_decl_extern * enum_decl * enum_decl_nonexhaustive By being extended opcodes, one more u32 is needed, however we more than make up for it by repurposing the 16 "small" bits to provide shorter encodings for when decls_len == 0, fields_len == 0, a source node is not provided, etc. There tends to be no downside, and in fact sometimes upsides, to using an extended op code when there is a need for flag bits, which is the case for all three of these. Likewise, the container layout can be encoded in these bits rather than into the opcode. The following 4 ZIR instructions were added, netting a total of 4 freed up ZIR enum tags for future use: * opaque_decl_anon * opaque_decl_func * error_set_decl_anon * error_set_decl_func This is so that opaques and error sets can have the same name hint as structs, enums, and unions. `std.builtin.ContainerLayout` gets an explicit integer tag type so that it can be used inside packed structs. This commit also makes `Module.Namespace` use a separate set for anonymous decls, thus allowing anonymous decls to share the same `Decl.name` as their owner `Decl` objects.
2021-05-10 21:34:43 -07:00
return @atomicRmw(usize, &mod.next_anon_name_index, .Add, 1, .Monotonic);
}
pub fn makeIntType(arena: *Allocator, signedness: std.builtin.Signedness, bits: u16) !Type {
const int_payload = try arena.create(Type.Payload.Bits);
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
int_payload.* = .{
.base = .{
.tag = switch (signedness) {
.signed => .int_signed,
.unsigned => .int_unsigned,
},
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
},
.data = bits,
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
};
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
return Type.initPayload(&int_payload.base);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
pub fn errNoteNonLazy(
mod: *Module,
src_loc: SrcLoc,
parent: *ErrorMsg,
comptime format: []const u8,
args: anytype,
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
) error{OutOfMemory}!void {
const msg = try std.fmt.allocPrint(mod.gpa, format, args);
errdefer mod.gpa.free(msg);
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
parent.notes = try mod.gpa.realloc(parent.notes, parent.notes.len + 1);
parent.notes[parent.notes.len - 1] = .{
.src_loc = src_loc,
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
.msg = msg,
stage2: implement error notes and regress -femit-zir * Implement error notes - note: other symbol exported here - note: previous else prong is here - note: previous '_' prong is here * Add Compilation.CObject.ErrorMsg. This object properly converts to AllErrors.Message when the time comes. * Add Compilation.CObject.failure_retryable. Properly handles out-of-memory and other transient failures. * Introduce Module.SrcLoc which has not only a byte offset but also references the file which the byte offset applies to. * Scope.Block now contains both a pointer to the "owner" Decl and the "source" Decl. As an example, during inline function call, the "owner" will be the Decl of the caller and the "source" will be the Decl of the callee. * Module.ErrorMsg now sports a `file_scope` field so that notes can refer to source locations in a file other than the parent error message. * Some instances where a `*Scope` was stored, now store a `*Scope.Container`. * Some methods in the `Scope` namespace were moved to the more specific type, since there was only an implementation for one particular tag. - `removeDecl` moved to `Scope.Container` - `destroy` moved to `Scope.File` * Two kinds of Scope deleted: - zir_module - decl * astgen: properly use DeclVal / DeclRef. DeclVal was incorrectly changed to be a reference; this commit fixes it. Fewer ZIR instructions processed as a result. - declval_in_module is renamed to declval - previous declval ZIR instruction is deleted; it was only for .zir files. * Test harness: friendlier diagnostics when an unexpected set of errors is encountered. * zir_sema: fix analyzeInstBlockFlat by properly calling resolvingInst on the last zir instruction in the block. Compile log implementation: * Write to a buffer rather than directly to stderr. * Only keep track of 1 callsite per Decl. * No longer mutate the ZIR Inst struct data. * "Compile log statement found" errors are only emitted when there are no other compile errors. -femit-zir and support for .zir source files is regressed. If we wanted to support this again, outputting .zir would need to be done as yet another backend rather than in the haphazard way it was previously implemented. For parsing .zir, it was implemented previously in a way that was not helpful for debugging. We need tighter integration with the test harness for it to be useful; so clearly a rewrite is needed. Given that a rewrite is needed, and it was getting in the way of progress and organization of the rest of stage2, I regressed the feature.
2021-01-16 22:51:01 -07:00
};
}
pub fn errorUnionType(
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
arena: *Allocator,
error_set: Type,
payload: Type,
) Allocator.Error!Type {
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
assert(error_set.zigTypeTag() == .ErrorSet);
if (error_set.eql(Type.initTag(.anyerror)) and payload.eql(Type.initTag(.void))) {
return Type.initTag(.anyerror_void_error_union);
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
return Type.Tag.error_union.create(arena, .{
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
.error_set = error_set,
.payload = payload,
});
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
pub fn getTarget(mod: Module) Target {
return mod.comp.bin_file.options.target;
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
stage2: *WIP*: rework ZIR memory layout; overhaul source locations The memory layout for ZIR instructions is completely reworked. See zir.zig for those changes. Some new types: * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating each instruction independently, there is now a Tag and 8 bytes of data available for all ZIR instructions. Small instructions fit within these 8 bytes; larger ones use 4 bytes for an index into `extra`. There is also `string_bytes` so that we can have 4 byte references to strings. `zir.Inst.Tag` describes how to interpret those 8 bytes of data. - This is shared by all `Block` scopes. * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this structure, the arrays are mutable, and get resized as we add/delete things. There is extra state to keep track of things. This struct is stored on the stack. Once it is finished, it produces an immutable `zir.Code`, which will remain on the heap for the duration of a function's existence. - This is shared by all `GenZir` scopes. * `Sema`: represents in-progress semantic analysis of a `zir.Code`. This data is stored on the stack and is shared among all `Block` scopes. It is now the main "self" argument to everything in the file that was previously named `zir_sema.zig`. Additionally, I moved some logic that was in `Module` into here. `Module.Fn` now stores its parameter names inside the `zir.Code`, instead of inside ZIR instructions. When the TZIR memory layout reworking time comes, codegen will be able to reference this data directly instead of duplicating it. astgen.zig is (so far) almost entirely untouched, but nearly all of it will need to be reworked to adhere to this new memory layout structure. I have no benchmarks to report yet, as I am still working through compile errors and fixing various things that I broke in this branch. Overhaul of Source Locations: Previously we used `usize` everywhere to mean byte offset, but sometimes also mean other stuff. This was error prone and also made us do unnecessary work, and store unnecessary bytes in memory. Now there are more types involved into source locations, and more ways to describe a source location. * AllErrors.Message: embrace the assumption that files always have less than 2 << 32 bytes. * SrcLoc gets more complicated, to model more complicated source locations. * Introduce LazySrcLoc, which can model interesting source locations with very little stored state. Useful for avoiding doing unnecessary work when no compile errors occur. Also, previously, we had `src: usize` on every ZIR instruction. This is no longer the case. Each instruction now determines whether it even cares about source location, and if so, how that source location is stored. This requires more careful work inside `Sema`, but it results in fewer bytes stored on the heap, without compromising accuracy and power of compile error messages. Miscellaneous: * std.zig: string literals have more helpful result values for reporting errors. There is now a lower level API and a higher level API. - side note: I noticed that the string literal logic needs some love. There is some unnecessarily hacky code there. * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This probably broke stuff and needs to get fixed. * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't think this quite how this code will be organized. Need some more careful planning about how to implement structs, unions, enums. They need to be independent Decls, just like a top level function.
2021-03-15 23:38:38 -07:00
pub fn optimizeMode(mod: Module) std.builtin.Mode {
return mod.comp.bin_file.options.optimize_mode;
stage2: caching system integration & Module/Compilation splitting * update to the new cache hash API * std.Target defaultVersionRange moves to std.Target.Os.Tag * std.Target.Os gains getVersionRange which returns a tagged union * start the process of splitting Module into Compilation and "zig module". - The parts of Module having to do with only compiling zig code are extracted into ZigModule.zig. - Next step is to rename Module to Compilation. - After that rename ZigModule back to Module. * implement proper cache hash usage when compiling C objects, and properly manage the file lock of the build artifacts. * make versions optional to match recent changes to master branch. * proper cache hash integration for compiling zig code * proper cache hash integration for linking even when not compiling zig code. * ELF LLD linking integrates with the caching system. A comment from the source code: Here we want to determine whether we can save time by not invoking LLD when the output is unchanged. None of the linker options or the object files that are being linked are in the hash that namespaces the directory we are outputting to. Therefore, we must hash those now, and the resulting digest will form the "id" of the linking job we are about to perform. After a successful link, we store the id in the metadata of a symlink named "id.txt" in the artifact directory. So, now, we check if this symlink exists, and if it matches our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD. * implement disable_c_depfile option * add tracy to a few more functions
2020-09-13 19:17:58 -07:00
}
fn lockAndClearFileCompileError(mod: *Module, file: *File) void {
switch (file.status) {
.success_zir, .retryable_failure => {},
.never_loaded, .parse_failure, .astgen_failure => {
const lock = mod.comp.mutex.acquire();
defer lock.release();
if (mod.failed_files.fetchSwapRemove(file)) |kv| {
if (kv.value) |msg| msg.destroy(mod.gpa); // Delete previous error message.
}
},
}
}
pub const SwitchProngSrc = union(enum) {
scalar: u32,
multi: Multi,
range: Multi,
pub const Multi = struct {
prong: u32,
item: u32,
};
pub const RangeExpand = enum { none, first, last };
/// This function is intended to be called only when it is certain that we need
/// the LazySrcLoc in order to emit a compile error.
pub fn resolve(
prong_src: SwitchProngSrc,
gpa: *Allocator,
decl: *Decl,
switch_node_offset: i32,
range_expand: RangeExpand,
) LazySrcLoc {
@setCold(true);
2021-10-01 12:05:12 -05:00
const tree = decl.getFileScope().getTree(gpa) catch |err| {
// In this case we emit a warning + a less precise source location.
log.warn("unable to load {s}: {s}", .{
2021-10-01 12:05:12 -05:00
decl.getFileScope().sub_file_path, @errorName(err),
});
return LazySrcLoc{ .node_offset = 0 };
};
const switch_node = decl.relativeToNodeIndex(switch_node_offset);
const main_tokens = tree.nodes.items(.main_token);
const node_datas = tree.nodes.items(.data);
const node_tags = tree.nodes.items(.tag);
const extra = tree.extraData(node_datas[switch_node].rhs, Ast.Node.SubRange);
const case_nodes = tree.extra_data[extra.start..extra.end];
var multi_i: u32 = 0;
var scalar_i: u32 = 0;
for (case_nodes) |case_node| {
const case = switch (node_tags[case_node]) {
.switch_case_one => tree.switchCaseOne(case_node),
.switch_case => tree.switchCase(case_node),
else => unreachable,
};
if (case.ast.values.len == 0)
continue;
if (case.ast.values.len == 1 and
node_tags[case.ast.values[0]] == .identifier and
mem.eql(u8, tree.tokenSlice(main_tokens[case.ast.values[0]]), "_"))
{
continue;
}
const is_multi = case.ast.values.len != 1 or
node_tags[case.ast.values[0]] == .switch_range;
switch (prong_src) {
.scalar => |i| if (!is_multi and i == scalar_i) return LazySrcLoc{
.node_offset = decl.nodeIndexToRelative(case.ast.values[0]),
},
.multi => |s| if (is_multi and s.prong == multi_i) {
var item_i: u32 = 0;
for (case.ast.values) |item_node| {
if (node_tags[item_node] == .switch_range) continue;
if (item_i == s.item) return LazySrcLoc{
.node_offset = decl.nodeIndexToRelative(item_node),
};
item_i += 1;
} else unreachable;
},
.range => |s| if (is_multi and s.prong == multi_i) {
var range_i: u32 = 0;
for (case.ast.values) |range| {
if (node_tags[range] != .switch_range) continue;
if (range_i == s.item) switch (range_expand) {
.none => return LazySrcLoc{
.node_offset = decl.nodeIndexToRelative(range),
},
.first => return LazySrcLoc{
.node_offset = decl.nodeIndexToRelative(node_datas[range].lhs),
},
.last => return LazySrcLoc{
.node_offset = decl.nodeIndexToRelative(node_datas[range].rhs),
},
};
range_i += 1;
} else unreachable;
},
}
if (is_multi) {
multi_i += 1;
} else {
scalar_i += 1;
}
} else unreachable;
}
};
pub const PeerTypeCandidateSrc = union(enum) {
/// Do not print out error notes for candidate sources
none: void,
/// When we want to know the the src of candidate i, look up at
/// index i in this slice
override: []LazySrcLoc,
/// resolvePeerTypes originates from a @TypeOf(...) call
typeof_builtin_call_node_offset: i32,
pub fn resolve(
self: PeerTypeCandidateSrc,
gpa: *Allocator,
decl: *Decl,
candidate_i: usize,
) ?LazySrcLoc {
@setCold(true);
switch (self) {
.none => {
return null;
},
.override => |candidate_srcs| {
return candidate_srcs[candidate_i];
},
.typeof_builtin_call_node_offset => |node_offset| {
switch (candidate_i) {
0 => return LazySrcLoc{ .node_offset_builtin_call_arg0 = node_offset },
1 => return LazySrcLoc{ .node_offset_builtin_call_arg1 = node_offset },
2 => return LazySrcLoc{ .node_offset_builtin_call_arg2 = node_offset },
3 => return LazySrcLoc{ .node_offset_builtin_call_arg3 = node_offset },
4 => return LazySrcLoc{ .node_offset_builtin_call_arg4 = node_offset },
5 => return LazySrcLoc{ .node_offset_builtin_call_arg5 = node_offset },
else => {},
}
2021-10-01 12:05:12 -05:00
const tree = decl.getFileScope().getTree(gpa) catch |err| {
// In this case we emit a warning + a less precise source location.
log.warn("unable to load {s}: {s}", .{
2021-10-01 12:05:12 -05:00
decl.getFileScope().sub_file_path, @errorName(err),
});
return LazySrcLoc{ .node_offset = 0 };
};
const node = decl.relativeToNodeIndex(node_offset);
const node_datas = tree.nodes.items(.data);
const params = tree.extra_data[node_datas[node].lhs..node_datas[node].rhs];
return LazySrcLoc{ .node_abs = params[candidate_i] };
},
}
}
};
/// Called from `performAllTheWork`, after all AstGen workers have finished,
/// and before the main semantic analysis loop begins.
pub fn processOutdatedAndDeletedDecls(mod: *Module) !void {
// Ultimately, the goal is to queue up `analyze_decl` tasks in the work queue
// for the outdated decls, but we cannot queue up the tasks until after
// we find out which ones have been deleted, otherwise there would be
// deleted Decl pointers in the work queue.
var outdated_decls = std.AutoArrayHashMap(*Decl, void).init(mod.gpa);
defer outdated_decls.deinit();
for (mod.import_table.values()) |file| {
try outdated_decls.ensureUnusedCapacity(file.outdated_decls.items.len);
for (file.outdated_decls.items) |decl| {
outdated_decls.putAssumeCapacity(decl, {});
}
file.outdated_decls.clearRetainingCapacity();
// Handle explicitly deleted decls from the source code. This is one of two
// places that Decl deletions happen. The other is in `Compilation`, after
// `performAllTheWork`, where we iterate over `Module.deletion_set` and
// delete Decls which are no longer referenced.
// If a Decl is explicitly deleted from source, and also no longer referenced,
// it may be both in this `deleted_decls` set, as well as in the
// `Module.deletion_set`. To avoid deleting it twice, we remove it from the
// deletion set at this time.
for (file.deleted_decls.items) |decl| {
log.debug("deleted from source: {*} ({s})", .{ decl, decl.name });
// Remove from the namespace it resides in, preserving declaration order.
assert(decl.zir_decl_index != 0);
2021-10-01 12:05:12 -05:00
_ = decl.src_namespace.decls.orderedRemove(mem.spanZ(decl.name));
try mod.clearDecl(decl, &outdated_decls);
decl.destroy(mod);
}
file.deleted_decls.clearRetainingCapacity();
}
// Finally we can queue up re-analysis tasks after we have processed
// the deleted decls.
for (outdated_decls.keys()) |key| {
try mod.markOutdatedDecl(key);
}
}
/// Called from `Compilation.update`, after everything is done, just before
/// reporting compile errors. In this function we emit exported symbol collision
/// errors and communicate exported symbols to the linker backend.
pub fn processExports(mod: *Module) !void {
const gpa = mod.gpa;
// Map symbol names to `Export` for name collision detection.
var symbol_exports: std.StringArrayHashMapUnmanaged(*Export) = .{};
defer symbol_exports.deinit(gpa);
var it = mod.decl_exports.iterator();
while (it.next()) |entry| {
const exported_decl = entry.key_ptr.*;
const exports = entry.value_ptr.*;
for (exports) |new_export| {
const gop = try symbol_exports.getOrPut(gpa, new_export.options.name);
if (gop.found_existing) {
new_export.status = .failed_retryable;
try mod.failed_exports.ensureUnusedCapacity(gpa, 1);
const src_loc = new_export.getSrcLoc();
const msg = try ErrorMsg.create(gpa, src_loc, "exported symbol collision: {s}", .{
new_export.options.name,
});
errdefer msg.destroy(gpa);
const other_export = gop.value_ptr.*;
const other_src_loc = other_export.getSrcLoc();
try mod.errNoteNonLazy(other_src_loc, msg, "other symbol here", .{});
mod.failed_exports.putAssumeCapacityNoClobber(new_export, msg);
new_export.status = .failed;
} else {
gop.value_ptr.* = new_export;
}
}
mod.comp.bin_file.updateDeclExports(mod, exported_decl, exports) catch |err| switch (err) {
error.OutOfMemory => return error.OutOfMemory,
else => {
const new_export = exports[0];
new_export.status = .failed_retryable;
try mod.failed_exports.ensureUnusedCapacity(gpa, 1);
const src_loc = new_export.getSrcLoc();
const msg = try ErrorMsg.create(gpa, src_loc, "unable to export: {s}", .{
@errorName(err),
});
mod.failed_exports.putAssumeCapacityNoClobber(new_export, msg);
},
};
}
}
pub fn populateTestFunctions(mod: *Module) !void {
const gpa = mod.gpa;
const builtin_pkg = mod.main_pkg.table.get("builtin").?;
const builtin_file = (mod.importPkg(builtin_pkg) catch unreachable).file;
2021-10-01 12:05:12 -05:00
const builtin_namespace = builtin_file.root_decl.?.src_namespace;
const decl = builtin_namespace.decls.get("test_functions").?;
var buf: Type.SlicePtrFieldTypeBuffer = undefined;
const tmp_test_fn_ty = decl.ty.slicePtrFieldType(&buf).elemType();
const array_decl = d: {
// Add mod.test_functions to an array decl then make the test_functions
// decl reference it as a slice.
var new_decl_arena = std.heap.ArenaAllocator.init(gpa);
errdefer new_decl_arena.deinit();
const arena = &new_decl_arena.allocator;
const test_fn_vals = try arena.alloc(Value, mod.test_functions.count());
2021-10-01 12:05:12 -05:00
const array_decl = try mod.createAnonymousDeclFromDecl(decl, decl.src_namespace, null, .{
.ty = try Type.Tag.array.create(arena, .{
.len = test_fn_vals.len,
.elem_type = try tmp_test_fn_ty.copy(arena),
}),
.val = try Value.Tag.array.create(arena, test_fn_vals),
});
for (mod.test_functions.keys()) |test_decl, i| {
const test_name_slice = mem.sliceTo(test_decl.name, 0);
const test_name_decl = n: {
var name_decl_arena = std.heap.ArenaAllocator.init(gpa);
errdefer name_decl_arena.deinit();
const bytes = try name_decl_arena.allocator.dupe(u8, test_name_slice);
2021-10-01 12:05:12 -05:00
const test_name_decl = try mod.createAnonymousDeclFromDecl(array_decl, array_decl.src_namespace, null, .{
.ty = try Type.Tag.array_u8.create(&name_decl_arena.allocator, bytes.len),
.val = try Value.Tag.bytes.create(&name_decl_arena.allocator, bytes),
});
try test_name_decl.finalizeNewArena(&name_decl_arena);
break :n test_name_decl;
};
try mod.linkerUpdateDecl(test_name_decl);
const field_vals = try arena.create([3]Value);
field_vals.* = .{
try Value.Tag.slice.create(arena, .{
.ptr = try Value.Tag.decl_ref.create(arena, test_name_decl),
.len = try Value.Tag.int_u64.create(arena, test_name_slice.len),
}), // name
try Value.Tag.decl_ref.create(arena, test_decl), // func
Value.initTag(.null_value), // async_frame_size
};
test_fn_vals[i] = try Value.Tag.@"struct".create(arena, field_vals);
}
try array_decl.finalizeNewArena(&new_decl_arena);
break :d array_decl;
};
try mod.linkerUpdateDecl(array_decl);
{
var arena_instance = decl.value_arena.?.promote(gpa);
defer decl.value_arena.?.* = arena_instance.state;
const arena = &arena_instance.allocator;
decl.ty = try Type.Tag.const_slice.create(arena, try tmp_test_fn_ty.copy(arena));
decl.val = try Value.Tag.slice.create(arena, .{
.ptr = try Value.Tag.decl_ref.create(arena, array_decl),
.len = try Value.Tag.int_u64.create(arena, mod.test_functions.count()),
});
}
try mod.linkerUpdateDecl(decl);
}
pub fn linkerUpdateDecl(mod: *Module, decl: *Decl) !void {
mod.comp.bin_file.updateDecl(mod, decl) catch |err| switch (err) {
error.OutOfMemory => return error.OutOfMemory,
error.AnalysisFail => {
decl.analysis = .codegen_failure;
return;
},
else => {
const gpa = mod.gpa;
try mod.failed_decls.ensureUnusedCapacity(gpa, 1);
mod.failed_decls.putAssumeCapacityNoClobber(decl, try ErrorMsg.create(
gpa,
decl.srcLoc(),
"unable to codegen: {s}",
.{@errorName(err)},
));
decl.analysis = .codegen_failure_retryable;
return;
},
};
}