Rework of the Vulkan graphics API - EnTT for the ECS + 3D model loading via Assimp + SDL3 for input events and windowing + mesh, texture, camera and transform working + note: not all new assets are committed and there is hard-coded test code in scene addentity + global restructuring

This commit is contained in:
Tom Ray
2026-03-14 20:24:17 +01:00
parent 7c352bc280
commit 6695d46bcd
672 changed files with 238656 additions and 1821 deletions


@@ -0,0 +1,156 @@
Slang 64-bit Type Support
=========================
## Summary
* Not all targets support 64 bit types, and those that do may not support all of them
* 64 bit integers generally require later APIs/shader models
* When specifying 64 bit floating-point literals *always* use the type suffixes (i.e. `L`)
* An integer literal will be interpreted as 64 bits if it cannot fit in a 32 bit value.
* GPU targets generally do not support all double intrinsics
* Typically missing are the transcendentals (sin, cos etc) and the logarithm and exponential functions
* CUDA is the exception, supporting nearly all double intrinsics
* D3D
* D3D targets *appear* to support double intrinsics (like sin, cos, log etc), but behind the scenes they are actually being converted to float
* When using D3D12, it is best to use DXIL if you use double because there are some serious issues around double and DXBC
* VK will produce a validation error if a double intrinsic that it does not support is used (which is most of them)
* Vector and matrix types have even spottier intrinsic support than scalars across targets
Overview
========
The Slang language supports 64 bit built-in types, such as:
* `double`
* `uint64_t`
* `int64_t`
This also applies to vector and matrix versions of these types.
Unfortunately, whether a specific target supports one of these types, or the typical HLSL intrinsic functions on it (such as sin/cos/max/min etc), depends very much on the target.
Special attention has to be paid to 64 bit literals. By default, float literals that do not have an explicit suffix are assumed to be 32 bit. There is a variety of reasons for this design choice, the main one being that the default behavior should give good performance. The suffixes required for 64 bit types are as follows:
```
// double - 'l' or 'L'
double a = 1.34e-200L;
// WRONG!: This is the same as b = double(float(1.34e-200)) which will be 0. Will produce a warning.
double b = 1.34e-200;
// int64_t - 'll' or 'LL' (or combination of upper/lower)
int64_t c = -5436365345345234ll;
int64_t e = ~0LL; // Same as 0xffffffffffffffff
// uint64_t - 'ull' or 'ULL' (or combination of upper/lower)
uint64_t g = 0x8000000000000000ull;
uint64_t i = ~0ull; // Same as 0xffffffffffffffff
uint64_t j = ~0; // Equivalent to 'i' because uint64_t(int64_t(~int32_t(0)));
```
These issues are discussed further in issue [#1185](https://github.com/shader-slang/slang/issues/1185).
The type of a decimal non-suffixed integer literal is the first integer type from the list [`int`, `int64_t`]
which can represent the specified literal value. If the value cannot fit, the literal is represented as a `uint64_t`
and a warning is given.
The type of a hexadecimal non-suffixed integer literal is the first type from the list [`int`, `uint`, `int64_t`, `uint64_t`]
that can represent the specified literal value. A non-suffixed integer literal will be 64 bit if it cannot fit in 32 bits.
```
// Same as int64_t a = int(1), the value can fit into a 32 bit integer.
int64_t a = 1;
// Same as int64_t b = int64_t(2147483648), the value cannot fit into a 32 bit integer.
int64_t b = 2147483648;
// Same as uint64_t c = uint64_t(18446744073709551615), the value is larger than the maximum value of a signed 64 bit
// integer, and is interpreted as an unsigned 64 bit integer. Warning is given.
uint64_t c = 18446744073709551615;
// Same as uint64_t d = int(0x7FFFFFFF), the value can fit into a 32 bit integer.
uint64_t d = 0x7FFFFFFF;
// Same as uint64_t e = int64_t(0x7FFFFFFFFFFFFFFF), the value cannot fit into an unsigned 32 bit integer but
// can fit into a signed 64 bit integer.
uint64_t e = 0x7FFFFFFFFFFFFFFF;
// Same as uint64_t f = uint64_t(0xFFFFFFFFFFFFFFFF), the value cannot fit into a signed 64 bit integer, and
// is interpreted as an unsigned 64 bit integer.
uint64_t f = 0xFFFFFFFFFFFFFFFF;
```
Double support
==============
Target | Compiler/Binary | Double Type | Intrinsics | Notes
---------|------------------|----------------|-----------------------|-----------
CPU | | Yes | Yes | 1
CUDA     | NVRTC/PTX        | Yes            | Yes                   | 1
D3D12 | DXC/DXIL | Yes | Small Subset | 4
Vulkan   | glslang/SPIR-V   | Yes            | Partial               | 2
D3D11 | FXC/DXBC | Yes | Small Subset | 4
D3D12 | FXC/DXBC | Yes | Small Subset | 3, 4
1) CUDA and CPU support most intrinsics, with the notable exception currently of matrix invert
2) The lack of general intrinsic support reflects the restriction described in https://www.khronos.org/registry/spir-v/specs/1.0/GLSL.std.450.html
The following intrinsics are available for Vulkan
`fmod` (as %), `rcp`, `sign`, `saturate`, `sqrt`, `rsqrt`, `frac`, `ceil`, `floor`, `trunc`, `abs`, `min`, `max`, `smoothstep`, `lerp`, `clamp`, `step` and `asuint`.
These are tested in the test `tests/hlsl-intrinsic/scalar-double-vk-intrinsic.slang`.
What is missing are the transcendentals (sin, cos etc) and the exponential/logarithm functions (expX, logX).
Note that glslang does produce SPIR-V that contains double intrinsic calls for the missing intrinsics; the failure happens when validating the SPIR-V:
```
Validation: error 0: [ UNASSIGNED-CoreValidation-Shader-InconsistentSpirv ] Object: VK_NULL_HANDLE (Type = 0) | SPIR-V module not valid: GLSL.std.450 Sin: expected Result Type to be a 16 or 32-bit scalar or vector float type
%57 = OpExtInst %double %1 Sin %56
```
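The supported subset can be exercised with a small compute kernel. The following is a minimal sketch (the buffer and kernel names are hypothetical, not taken from the Slang test suite); it uses only intrinsics from the supported list, so it should pass SPIR-V validation, while uncommenting the `sin` line reproduces the validation error shown above.
```
RWStructuredBuffer<double> buffer;

[numthreads(4, 1, 1)]
void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    double v = buffer[dispatchThreadID.x];
    // sqrt, abs and clamp are all in the Vulkan-supported subset.
    double r = clamp(sqrt(abs(v)), 0.0L, 1.0L);
    // double bad = sin(v); // Not supported: fails SPIR-V validation.
    buffer[dispatchThreadID.x] = r;
}
```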
3) If a `RWStructuredBuffer<double>` is used on D3D12 with DXBC and a double is written, it can lead to incorrect behavior. It is therefore recommended not to use double with DXBC; use DXIL to keep things simple. A test showing this problem is `tests/bugs/dxbc-double-problem.slang`. The test `tests/hlsl-intrinsic/scalar-double-simple.slang` shows that, when no double resource is used, doubles do appear to work on D3D12 DXBC.
4) If you compile code using double and intrinsics through Slang, at first blush it will seem to work. Assuming there are no errors in your code, it will typically even appear to run correctly. Unfortunately, what is really happening is that the backend compiler (fxc or dxc) is narrowing double to float and then using float intrinsics. It typically generates a warning when this happens, but unless there is an error in your code you will not see these warnings, because dxc does not appear to have a mechanism to return warnings when there is no error. This is why everything appears to work: every intrinsic call is silently losing precision.
Note that Slang disables dxc warnings by default; warnings need to be enabled to see the narrowing warnings.
There is another exception around the use of `%`: using it with double returns an error saying only float is supported.
It appears that no intrinsics are available for double with fxc.
On dxc the following intrinsics are available with double:
`rcp`, `sign`, `saturate`, `abs`, `min`, `max`, `clamp`, `asuint`.
These are tested in the test `tests/hlsl-intrinsic/scalar-double-d3d-intrinsic.slang`.
There is no support for transcendentals (`sin`, `cos` etc) or `log`/`exp`. More surprisingly, `sqrt`, `rsqrt`, `frac`, `ceil`, `floor`, `trunc`, `step`, `lerp` and `smoothstep` are also not supported.
uint64_t and int64_t Support
============================
Target | Compiler/Binary | u/int64_t Type | Intrinsic support | Notes
---------|------------------|----------------|--------------------|--------
CPU | | Yes | Yes |
CUDA     | NVRTC/PTX        | Yes            | Yes                |
Vulkan   | glslang/SPIR-V   | Yes            | Yes                |
D3D12 | DXC/DXIL | Yes | Yes | 1
D3D11 | FXC/DXBC | No | No | 2
D3D12 | FXC/DXBC | No | No | 2
1) The [sm6.0 docs](https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/hlsl-shader-model-6-0-features-for-direct3d-12) describe only supporting uint64_t, but dxc says int64_t is supported in [HLSL 2016](https://github.com/Microsoft/DirectXShaderCompiler/wiki/Language-Versions). Tests show that this is indeed the case.
2) 64-bit integer support requires [Shader Model 6.0](https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/hlsl-shader-model-6-0-features-for-direct3d-12), so DXBC is not a target.
The intrinsics available on the `uint64_t` and `int64_t` types are `abs`, `min`, `max`, `clamp` and `countbits`.
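As a sketch of what that supported subset looks like in practice (a hypothetical snippet, not from the test suite), using the literal suffix rules described earlier:
```
uint64_t bits = 0x00FF00FF00FF00FFULL;
uint setBits = countbits(bits);              // 32 bits are set
int64_t v = -5436365345345234LL;
int64_t clamped = clamp(v, -1000LL, 1000LL); // Clamps to -1000
uint64_t largest = max(bits, ~0ULL);         // 0xFFFFFFFFFFFFFFFF
```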
GLSL
====
GLSL/SPIR-V based targets do not support 'generated' intrinsics on matrix types. For example `sin(mat)` will not work on GLSL/SPIR-V.
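Where a matrix intrinsic is needed on a GLSL/SPIR-V target, one workaround is to apply the intrinsic row by row, since the vector versions are available. A hypothetical helper (not part of Slang) might look like:
```
float3x3 sinMatrix(float3x3 m)
{
    float3x3 r;
    [unroll]
    for (int i = 0; i < 3; ++i)
        r[i] = sin(m[i]); // sin on a vector is supported
    return r;
}
```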


@@ -0,0 +1,35 @@
Slang Documentation
===================
This directory contains documentation for the Slang system.
Some of the documentation is intended for users of the language and compiler, while other documentation is intended for developers contributing to the project.
Getting Started
---------------
The Slang [User's Guide](https://shader-slang.github.io/slang/user-guide/) provides an introduction to the Slang language and its major features, as well as the compilation and reflection API.
There is also documentation specific to using the [slangc](https://shader-slang.github.io/slang/user-guide/compiling.html#command-line-compilation-with-slangc) command-line tool.
Advanced Users
--------------
For the benefit of advanced users we provide detailed documentation on how Slang compiles code for specific platforms.
The [target compatibility guide](target-compatibility.md) gives an overview of feature compatibility for targets.
The [CPU target guide](cpu-target.md) gives information on compiling Slang or C++ source into shared libraries/executables or functions that can be directly executed. It also covers how to generate C++ code from Slang source.
The [CUDA target guide](cuda-target.md) provides information on compiling Slang/HLSL or CUDA source. Slang can compile to equivalent CUDA source, as well as to PTX via the nvrtc CUDA compiler.
Contributors
------------
For contributors to the Slang project, the information under the [`design/`](design/) directory may help explain the rationale behind certain design decisions and help when ramping up in the codebase.
Research
--------
The Slang project is based on a long history of research work. While understanding this research is not necessary for working with Slang, it may be instructive for understanding the big-picture goals of the language, as well as why certain critical decisions were made.
A [paper](http://graphics.cs.cmu.edu/projects/slang/) on the Slang system was accepted into SIGGRAPH 2018, and it provides an overview of the language and the compiler implementation.
Yong He's [dissertation](http://graphics.cs.cmu.edu/projects/renderergenerator/yong_he_thesis.pdf) provides a more detailed discussion of the design of the Slang system.


@@ -0,0 +1 @@
theme: jekyll-theme-tactile


@@ -0,0 +1,137 @@
{% capture headingsWorkspace %}
{% comment %}
Copyright (c) 2018 Vladimir "allejo" Jimenez
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
{% endcomment %}
{% comment %}
Version 1.0.9
https://github.com/allejo/jekyll-anchor-headings
"Be the pull request you wish to see in the world." ~Ben Balter
Usage:
{% include anchor_headings.html html=content anchorBody="#" %}
Parameters:
* html (string) - the HTML of compiled markdown generated by kramdown in Jekyll
Optional Parameters:
* beforeHeading (bool) : false - Set to true if the anchor should be placed _before_ the heading's content
* headerAttrs (string) : '' - Any custom HTML attributes that will be added to the heading tag; you may NOT use `id`;
the `%heading%` and `%html_id%` placeholders are available
* anchorAttrs (string) : '' - Any custom HTML attributes that will be added to the `<a>` tag; you may NOT use `href`,
`class` or `title`;
the `%heading%` and `%html_id%` placeholders are available
* anchorBody (string) : '' - The content that will be placed inside the anchor; the `%heading%` placeholder is
available
* anchorClass (string) : '' - The class(es) that will be used for each anchor. Separate multiple classes with a
space
* anchorTitle (string) : '' - The `title` attribute that will be used for anchors
* h_min (int) : 1 - The minimum header level to build an anchor for; any header lower than this value will be
ignored
* h_max (int) : 6 - The maximum header level to build an anchor for; any header greater than this value will be
ignored
* bodyPrefix (string) : '' - Anything that should be inserted inside of the heading tag _before_ its anchor and
content
* bodySuffix (string) : '' - Anything that should be inserted inside of the heading tag _after_ its anchor and
content
Output:
The original HTML with the addition of anchors inside of all of the h1-h6 headings.
{% endcomment %}
{% assign minHeader = include.h_min | default: 1 %}
{% assign maxHeader = include.h_max | default: 2 %}
{% assign beforeHeading = include.beforeHeading %}
{% assign nodes = include.html | split: '<h' %} {% capture edited_headings %}{% endcapture %} {% for _node in nodes
%} {% capture node %}{{ _node | strip }}{% endcapture %} {% if node=="" %} {% continue %} {% endif %} {% assign
nextChar=node | replace: '"' , '' | strip | slice: 0, 1 %} {% assign headerLevel=nextChar | times: 1 %} <!-- If
the level is cast to 0, it means it's not a h1-h6 tag, so let's see if we need to fix it -->
{% if headerLevel == 0 %}
<!-- Split up the node based on closing angle brackets and get the first one. -->
{% assign firstChunk = node | split: '>' | first %}
<!-- If the first chunk does NOT contain a '<', that means we've broken another HTML tag that starts with 'h' -->
{% unless firstChunk contains '<' %} {% capture node %}<h{{ node }}{% endcapture %} {% endunless %} {% capture
edited_headings %}{{ edited_headings }}{{ node }}{% endcapture %} {% continue %} {% endif %} {% capture
_closingTag %}</h{{ headerLevel }}>{% endcapture %}
{% assign _workspace = node | split: _closingTag %}
{% assign _idWorkspace = _workspace[0] | split: 'id="' %}
{% assign _idWorkspace = _idWorkspace[1] | split: '"' %}
{% assign html_id = _idWorkspace[0] %}
{% capture _hAttrToStrip %}{{ _workspace[0] | split: '>' | first }}>{% endcapture %}
{% assign header = _workspace[0] | replace: _hAttrToStrip, '' %}
<!-- Build the anchor to inject for our heading -->
{% capture anchor %}{% endcapture %}
{% if html_id and headerLevel >= minHeader and headerLevel <= maxHeader %} {% assign escaped_header=header |
strip_html %} {% if include.headerAttrs %} {% capture _hAttrToStrip %}{{ _hAttrToStrip | split: '>' |
first }} {{ include.headerAttrs | replace: '%heading%' , escaped_header | replace: '%html_id%' , html_id
}}>{% endcapture %}
{% endif %}
{% capture anchor %}href="#{{ html_id }}"{% endcapture %}
{% if include.anchorClass %}
{% capture anchor %}{{ anchor }} class="{{ include.anchorClass }}"{% endcapture %}
{% endif %}
{% if include.anchorTitle %}
{% capture anchor %}{{ anchor }} title="{{ include.anchorTitle | replace: '%heading%', escaped_header
}}"{% endcapture %}
{% endif %}
{% if include.anchorAttrs %}
{% capture anchor %}{{ anchor }} {{ include.anchorAttrs | replace: '%heading%', escaped_header |
replace: '%html_id%', html_id }}{% endcapture %}
{% endif %}
{% capture anchor %}<a {{ anchor }}>{{ include.anchorBody | replace: '%heading%', escaped_header |
default: '' }}</a>{% endcapture %}
<!-- In order to prevent adding extra space after a heading, we'll let the 'anchor' value contain it -->
{% if beforeHeading %}
{% capture anchor %}{{ anchor }} {% endcapture %}
{% else %}
{% capture anchor %} {{ anchor }}{% endcapture %}
{% endif %}
{% endif %}
{% capture new_heading %}
<h{{ _hAttrToStrip }} {{ include.bodyPrefix }} {% if beforeHeading %} {{ anchor }}{{ header }} {% else
%} {{ header }}{{ anchor }} {% endif %} {{ include.bodySuffix }} </h{{ headerLevel }}>
{% endcapture %}
<!--
If we have content after the `</hX>` tag, then we'll want to append that here so we don't lost any content.
-->
{% assign chunkCount = _workspace | size %}
{% if chunkCount > 1 %}
{% capture new_heading %}{{ new_heading }}{{ _workspace | last }}{% endcapture %}
{% endif %}
{% capture edited_headings %}{{ edited_headings }}{{ new_heading }}{% endcapture %}
{% endfor %}
{% endcapture %}{% assign headingsWorkspace = '' %}{{ edited_headings | strip }}


@@ -0,0 +1,225 @@
<!DOCTYPE html>
<html lang="{{ site.lang | default: "en-US" }}">
<head>
<meta charset='utf-8'>
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<link rel="stylesheet" href="{{ '/assets/css/style.css?v=' | append: site.github.build_revision | relative_url }}">
<link rel="stylesheet" type="text/css" href="{{ '/assets/css/print.css' | relative_url }}" media="print">
<script async src="https://www.googletagmanager.com/gtag/js?id=G-TMTZVLLMBP"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-TMTZVLLMBP');
</script>
<!--[if lt IE 9]>
<script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<style>
#centeringDiv {
margin: auto;
max-width: 1200px;
}
#navDiv
{
display: block;
box-sizing: border-box;
padding-top: 5px;
padding-bottom: 5px;
border-bottom-width: 3px;
border-bottom-style: solid;
border-bottom-color: #F0F0F0;
}
#navDiv nav
{
float:left;
}
#navDiv::after {
content: "";
clear: both;
display: table;
}
#navDiv nav li::after
{
content: "/";
padding-left: 10px;
padding-right: 0px;
color: #808080;
}
#navDiv nav li
{
display:inline;
padding-left: 10px;
padding-right: 0px;
}
#tocColumn {
width: 350px;
position: fixed;
overflow-y: auto;
box-sizing: border-box;
display: block;
}
#tocInner {
padding: 20px;
}
#rightColumn {
padding-left: 390px;
padding-right: 40px;
padding-top: 20px;
}
.toc_root_list {
list-style-type: none;
list-style-position: outside;
background-color: initial;
padding-left: 0px;
}
.toc_list {
padding-left: 16px;
background-color: initial;
list-style-type: none;
margin-bottom: 0px;
}
.toc_item {
cursor: pointer;
user-select: none;
list-style-type: none;
padding-left: 0px;
padding-top: 5px;
}
.toc_item_expanded::before {
content: "\25be";
cursor: pointer;
}
.toc_item_collapsed::before {
content: "\25b8";
cursor: pointer;
}
.toc_item_leaf {
padding-left: 14px;
cursor: pointer;
list-style-type: none;
}
.toc_span:hover
{
color: #d5000d;
}
.tocIcon
{
vertical-align: -2.5px;
}
.editButton
{
float: right;
margin-right: 10px;
color:#808080;
}
.editIcon
{
fill: currentColor;
vertical-align: text-top;
}
#btnToggleTOC {
display: none;
width: fit-content;
margin-left: 10px;
margin-top: 10px;
padding: 10px;
border-style: solid;
border-color: #808080;
border-width: 1px;
background-color: #E8E8E8;
}
#btnToggleTOC:hover {
background-color: #F0F0E8;
}
#btnToggleTOC:active {
background-color: #D4D4D4;
}
@media screen and (max-width: 900px) {
#tocColumn {
width: 300px;
display: block;
box-sizing: border-box;
}
#rightColumn {
padding-left: 320px;
padding-right: 20px;
}
}
@media screen and (max-width: 700px) {
#tocColumn {
width: 100%;
position: static;
display: none;
border-right-style: none;
box-sizing: content-box;
}
#tocInner {
padding: 10px;
}
#rightColumn {
padding-left: 10px;
padding-right: 10px;
}
#centeringDiv {
padding-left: 0px;
}
#btnToggleTOC {
display: block;
}
}
</style>
{% seo %}
</head>
<body>
<div id="centeringDiv">
<div id="navDiv">
<a class="editButton" title="Edit this page" href="https://github.com/{{ site.github.repository_nwo }}/edit/master/docs/{{ page.path }}">
<svg class="editIcon" height="16" viewBox="0 0 16 16" version="1.1" width="16" aria-hidden="true">
<path fill-rule="evenodd"
d="M11.013 1.427a1.75 1.75 0 012.474 0l1.086 1.086a1.75 1.75 0 010 2.474l-8.61 8.61c-.21.21-.47.364-.756.445l-3.251.93a.75.75 0 01-.927-.928l.929-3.25a1.75 1.75 0 01.445-.758l8.61-8.61zm1.414 1.06a.25.25 0 00-.354 0L10.811 3.75l1.439 1.44 1.263-1.263a.25.25 0 000-.354l-1.086-1.086zM11.189 6.25L9.75 4.81l-6.286 6.287a.25.25 0 00-.064.108l-.558 1.953 1.953-.558a.249.249 0 00.108-.064l6.286-6.286z">
</path>
</svg>
</a>
</div>
<div id="rightColumn">
<section id="main_content">
{% include anchor_headings.html html=content anchorBody="" %}
</section>
<a href="javascript:;" id="_content_end_"></a>
<footer>
{% if site.github.is_project_page %}
{{ site.title | default: site.github.repository_name }} is maintained by <a
href="{{ site.github.owner_url }}">{{ site.github.owner_name }}</a><br>
{% endif %}
This page was generated by <a href="https://pages.github.com">GitHub Pages</a>.
</footer>
</div>
</div>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$$','$$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\(","\\)"] ],
},
TeX: {
Macros: {
bra: ["\\langle{#1}|", 1],
ket: ["|{#1}\\rangle", 1],
braket: ["\\langle{#1}\\rangle", 1],
bk: ["\\langle{#1}|{#2}|{#3}\\rangle", 3]
}
}
});
</script>
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
</body>
</html>


@@ -0,0 +1,417 @@
<!DOCTYPE html>
<html lang="{{ site.lang | default: "en-US" }}">
<head>
<meta charset='utf-8'>
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<link rel="stylesheet" href="{{ '/assets/css/style.css?v=' | append: site.github.build_revision | relative_url }}">
<link rel="stylesheet" type="text/css" href="{{ '/assets/css/print.css' | relative_url }}" media="print">
<script async src="https://www.googletagmanager.com/gtag/js?id=G-TMTZVLLMBP"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-TMTZVLLMBP');
</script>
<!--[if lt IE 9]>
<script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<style>
#centeringDiv {
margin: auto;
max-width: 1200px;
}
#navDiv
{
display: block;
box-sizing: border-box;
padding-top: 5px;
padding-bottom: 5px;
border-bottom-width: 3px;
border-bottom-style: solid;
border-bottom-color: #F0F0F0;
}
#navDiv nav
{
float:left;
}
#navDiv::after {
content: "";
clear: both;
display: table;
}
#navDiv nav li::after
{
content: "/";
padding-left: 10px;
padding-right: 0px;
color: #808080;
}
#navDiv nav li
{
display:inline;
padding-left: 10px;
padding-right: 0px;
}
#tocColumn {
width: 350px;
position: fixed;
overflow-y: auto;
box-sizing: border-box;
display: block;
}
#tocInner {
padding: 20px;
}
#rightColumn {
padding-left: 390px;
padding-right: 40px;
padding-top: 20px;
}
.toc_root_list {
list-style-type: none;
list-style-position: outside;
background-color: initial;
padding-left: 0px;
}
.toc_list {
padding-left: 16px;
background-color: initial;
list-style-type: none;
margin-bottom: 0px;
}
.toc_item {
cursor: pointer;
user-select: none;
list-style-type: none;
padding-left: 0px;
padding-top: 5px;
}
.toc_item_expanded::before {
content: "\25be";
cursor: pointer;
}
.toc_item_collapsed::before {
content: "\25b8";
cursor: pointer;
}
.toc_item_leaf {
padding-left: 14px;
cursor: pointer;
list-style-type: none;
}
.toc_span:hover
{
color: #d5000d;
}
.tocIcon
{
vertical-align: -2.5px;
}
.editButton
{
float: right;
margin-right: 10px;
color:#808080;
}
.editIcon
{
fill: currentColor;
vertical-align: text-top;
}
#btnToggleTOC {
display: none;
width: fit-content;
margin-left: 10px;
margin-top: 10px;
padding: 10px;
border-style: solid;
border-color: #808080;
border-width: 1px;
background-color: #E8E8E8;
}
#btnToggleTOC:hover {
background-color: #F0F0E8;
}
#btnToggleTOC:active {
background-color: #D4D4D4;
}
@media screen and (max-width: 900px) {
#tocColumn {
width: 300px;
display: block;
box-sizing: border-box;
}
#rightColumn {
padding-left: 320px;
padding-right: 20px;
}
}
@media screen and (max-width: 700px) {
#tocColumn {
width: 100%;
position: static;
display: none;
border-right-style: none;
box-sizing: content-box;
}
#tocInner {
padding: 10px;
}
#rightColumn {
padding-left: 10px;
padding-right: 10px;
}
#centeringDiv {
padding-left: 0px;
}
#btnToggleTOC {
display: block;
}
}
</style>
{% seo %}
</head>
<body>
<div id="centeringDiv">
<div id="navDiv">
{% include_relative nav.html %}
<a class="editButton" title="Edit this page" href="https://github.com/{{ site.github.repository_nwo }}/edit/master/docs/{{ page.path }}">
<svg class="editIcon" height="16" viewBox="0 0 16 16" version="1.1" width="16" aria-hidden="true">
<path fill-rule="evenodd"
d="M11.013 1.427a1.75 1.75 0 012.474 0l1.086 1.086a1.75 1.75 0 010 2.474l-8.61 8.61c-.21.21-.47.364-.756.445l-3.251.93a.75.75 0 01-.927-.928l.929-3.25a1.75 1.75 0 01.445-.758l8.61-8.61zm1.414 1.06a.25.25 0 00-.354 0L10.811 3.75l1.439 1.44 1.263-1.263a.25.25 0 000-.354l-1.086-1.086zM11.189 6.25L9.75 4.81l-6.286 6.287a.25.25 0 00-.064.108l-.558 1.953 1.953-.558a.249.249 0 00.108-.064l6.286-6.286z">
</path>
</svg>
</a>
</div>
<button id="btnToggleTOC" onclick="toggleTOC()">
<svg height="16" class="tocIcon" viewBox="0 0 16 16" version="1.1" width="16" aria-hidden="true">
<path fill-rule="evenodd"
d="M2 4a1 1 0 100-2 1 1 0 000 2zm3.75-1.5a.75.75 0 000 1.5h8.5a.75.75 0 000-1.5h-8.5zm0 5a.75.75 0 000 1.5h8.5a.75.75 0 000-1.5h-8.5zm0 5a.75.75 0 000 1.5h8.5a.75.75 0 000-1.5h-8.5zM3 8a1 1 0 11-2 0 1 1 0 012 0zm-1 6a1 1 0 100-2 1 1 0 000 2z">
</path>
</svg>
Table of Contents</button>
<div id="tocColumn">
<div id="tocInner">
{% include_relative toc.html %}
</div>
</div>
<div id="rightColumn">
<section id="main_content">
{% include anchor_headings.html html=content anchorBody="" %}
</section>
<a href="javascript:;" id="_content_end_"></a>
<footer>
{% if site.github.is_project_page %}
{{ site.title | default: site.github.repository_name }} is maintained by <a
href="{{ site.github.owner_url }}">{{ site.github.owner_name }}</a><br>
{% endif %}
This page was generated by <a href="https://pages.github.com">GitHub Pages</a>.
</footer>
</div>
</div>
<script>
// Fix for IE. Make sure String has `startsWith` method.
if (!String.prototype.startsWith)
{
String.prototype.startsWith = function (searchString, position) {
position = position || 0;
return this.indexOf(searchString, position) === position;
};
}
var tocColumn = document.getElementById("tocColumn");
var rightColumn = document.getElementById("rightColumn");
function updateScroll()
{
if (window.innerWidth < 700)
{
tocColumn.style.height = "";
return;
}
var top = Math.max(0, rightColumn.getBoundingClientRect().top);
tocColumn.style.top = top + "px";
tocColumn.style.height = (window.innerHeight-top) + "px";
}
function updatePosition()
{
if (window.innerWidth > 700)
tocColumn.style.display = "";
tocColumn.style.left = rightColumn.getBoundingClientRect().left + "px";
updateScroll();
}
window.addEventListener("resize", updatePosition);
updatePosition();
var tocItemsArray = [];
var subSectionItems = [];
var selectedItem = null;
function toggleTOC() {
var tocColumn = document.getElementById("tocColumn");
if (tocColumn.style.display == "block")
tocColumn.style.display = "none";
else
tocColumn.style.display = "block";
event.stopPropagation();
}
function expandItem(e) {
if (e == selectedItem)
e.style["font-weight"] = "bold";
var childList = e.getElementsByClassName("toc_list");
if (childList.length == 0)
return;
childList[0].style.display = "block";
childList[0].style["font-weight"] = "normal";
e.setAttribute("class", "toc_item toc_item_expanded");
}
function collapseItem(e) {
var childList = e.getElementsByClassName("toc_list");
if (childList.length == 0)
return;
childList[0].style.display = "none";
e.setAttribute("class", "toc_item toc_item_collapsed");
}
function tocSpanOnClick(e)
{
if (event.srcElement != null && event.srcElement.parentElement != null)
{
var link = event.srcElement.parentElement.getAttribute("data-link");
if (link != null)
{
var poundIndex = link.indexOf("#");
if (poundIndex == -1)
window.location.href = link + ".html";
else
window.location.href = link.substr(0, poundIndex) + ".html#" + link.substr(poundIndex+1, link.length - poundIndex - 1);
}
}
event.stopPropagation();
}
function tocItemOnClick(e)
{
if (event.srcElement == null) return;
// Toggle expanded/collapsed state.
if (event.srcElement.getAttribute("class").endsWith("toc_item_collapsed"))
expandItem(event.srcElement);
else if (event.srcElement.getAttribute("class").endsWith("toc_item_expanded"))
collapseItem(event.srcElement);
event.stopPropagation();
}
var path = window.location.pathname;
var pageName = path.split("/").pop();
var currentPageID = pageName.substr(0, pageName.lastIndexOf("."));
if (currentPageID.length == 0)
currentPageID = "index";
var tocLists = document.getElementsByClassName("toc_root_list");
for (var i = 0; i < tocLists.length; i++) {
var tocList = tocLists[i];
var items = tocList.getElementsByTagName("li")
for (var j = 0; j < items.length; j++)
tocItemsArray.push(items[j]);
}
for (var i = 0; i < tocItemsArray.length; i++) {
var item = tocItemsArray[i];
if (item.getAttribute("data-link") == currentPageID)
selectedItem = item;
if (item.getElementsByTagName("li").length != 0) {
collapseItem(item);
}
else {
item.setAttribute("class", "toc_item toc_item_leaf");
}
item.addEventListener("click", tocItemOnClick);
var innerSpan = item.getElementsByTagName("span");
if (innerSpan.length != 0)
{
innerSpan[0].addEventListener("click", tocSpanOnClick);
innerSpan[0].setAttribute("class", "toc_span");
}
}
var curItem = selectedItem;
while (curItem != null) {
expandItem(curItem);
curItem = curItem.parentElement;
if (curItem != null && curItem.getAttribute("class") != null &&
curItem.getAttribute("class").startsWith("toc_list"))
curItem = curItem.parentElement;
if (curItem != null && curItem.getAttribute("class") != null &&
curItem.getAttribute("class").startsWith("toc_root_list"))
break;
}
var subItems = selectedItem.getElementsByTagName("li");
var subSectionTitles = [];
var subSectionTitleStrs = [];
for (var i = 0; i < subItems.length; i++)
{
subSectionItems.push(subItems[i]);
var title = subItems[i].getAttribute("data-link");
var pos = title.lastIndexOf("#");
title = title.substr(pos + 1);
var element = document.getElementById(title);
subSectionTitles.push(element);
subSectionTitleStrs.push(title);
}
subSectionTitles.push(document.getElementById("_content_end_"));
function isSectionFullyVisible(id)
{
var titleElement = subSectionTitles[id];
var nextTitleElement = subSectionTitles[id+1];
return (titleElement.getBoundingClientRect().top >= 0 && nextTitleElement.getBoundingClientRect().top <= window.innerHeight);
}
function findCurrentSubsection()
{
var currentSubsectionID = -1;
for (var i = 0; i < subSectionItems.length; i++) {
var titleElement = subSectionTitles[i];
if (titleElement == null)
continue;
if (titleElement.getBoundingClientRect().top < window.innerHeight * 0.12)
currentSubsectionID = i;
}
return currentSubsectionID;
}
function updateCurrentSubsection(currentSubsectionID)
{
for (var i = 0; i < subSectionItems.length; i++)
{
if (i == currentSubsectionID || isSectionFullyVisible(i))
subSectionItems[i].getElementsByTagName("span")[0].style["font-weight"] = 600;
else
subSectionItems[i].getElementsByTagName("span")[0].style["font-weight"] = 400;
}
}
function windowScroll(e)
{
updateCurrentSubsection(findCurrentSubsection());
updateScroll();
}
window.addEventListener("scroll", windowScroll);
updateCurrentSubsection(findCurrentSubsection());
</script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$$','$$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\(","\\)"] ],
},
TeX: {
Macros: {
bra: ["\\langle{#1}|", 1],
ket: ["|{#1}\\rangle", 1],
braket: ["\\langle{#1}\\rangle", 1],
bk: ["\\langle{#1}|{#2}|{#3}\\rangle", 3]
}
}
});
</script>
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
</body>
</html>

View File

@@ -0,0 +1,203 @@
---
---
@import "{{ site.theme }}";
a:hover {
text-decoration: underline;
}
h3 {
color: #363636;
}
h4 {
color: #363636;
}
blockquote {
background-color: #f2f2f2;
padding-top: 10px;
padding-bottom: 5px;
}
blockquote p {
font-size: 16px;
font-weight: 400;
margin-bottom: 5px;
color: #202020;
}
body {
color: initial;
text-shadow: none;
background: none;
}
#container
{
background:none;
}
.highlight .cm {
color: #148b04;
}
.highlight .cp {
color: #148b04;
}
.highlight .c1 {
color: #148b04;
}
.highlight .cs {
color: #148b04;
}
.highlight .c, .highlight .ch, .highlight .cd, .highlight .cpf {
color: #148b04;
}
.highlight .err {
color: #a61717;
background-color: #e3d2d2;
}
.highlight .gd {
color: #000000;
background-color: #ffdddd;
}
.highlight .ge {
color: #000000;
font-style: italic;
}
.highlight .gr {
color: #aa0000;
}
.highlight .gh {
color: #999999;
}
.highlight .gi {
color: #000000;
background-color: #ddffdd;
}
.highlight .go {
color: #888888;
}
.highlight .gp {
color: #555555;
}
.highlight .gu {
color: #aaaaaa;
}
.highlight .gt {
color: #aa0000;
}
.highlight .kc {
color: #1243d4;
}
.highlight .kd {
color: #1243d4;
}
.highlight .kn {
color: #1243d4;
}
.highlight .kp {
color: #1243d4;
}
.highlight .kr {
color: #1243d4;
}
.highlight .kt {
color: #1243d4;
}
.highlight .k, .highlight .kv {
color: #1243d4;
}
.highlight .m, .highlight .mb, .highlight .mx, .highlight .mi, .highlight .mf {
color: #7211c2;
}
.highlight .sa {
color: #000000;
}
.highlight .sb {
color: #d14;
}
.highlight .sc {
color: #d14;
}
.highlight .sd {
color: #d14;
}
.highlight .s2 {
color: #d14;
}
.highlight .se {
color: #d14;
}
.highlight .sh {
color: #d14;
}
.highlight .si {
color: #d14;
}
.highlight .sx {
color: #d14;
}
.highlight .sr {
color: #009926;
}
.highlight .s1 {
color: #d14;
}
.highlight .ss {
color: #990073;
}
.highlight .s, .highlight .dl {
color: #d14;
}
.highlight .na {
color: #008080;
}
.highlight .bp {
color: #999999;
}
.highlight .n {
color: black;
}
.highlight .nc {
color: #11abb9;
}
.highlight .nt {
color: #11abb9;
}
.highlight .vc {
color: #008080;
}
.highlight .vg {
color: #008080;
}
.highlight .vi {
color: #008080;
}
.highlight .nv, .highlight .vm {
color: #008080;
}
.highlight .ow {
color: #000000;
}
.highlight .o {
color: #000000;
}
.highlight .w {
color: #000000;
}
.highlight .p {
color: #000000;
}
code
{
background-color: initial;
border:none;
}
pre{
color: #000000;
background: #F8F8F8;
}
pre code {
color: #000000;
background-color: #F8F8F8;
}
.highlight
{
background: #F8F8F8;
}

Binary file not shown.

After

Width:  |  Height:  |  Size: 74 KiB

View File

@@ -0,0 +1,62 @@
# This script uses `slangc` to generate the core module reference documentation and push the updated
# documents to shader-slang/stdlib-reference repository.
# The stdlib-reference repository has github-pages setup so that the markdown files we generate
# in this step will be rendered as html pages by Jekyll upon a commit to the repository.
# So what we need to do here is pull the stdlib-reference repository, regenerate the markdown files
# and push the changes back to the repository.
# The generated markdown files will be located in three folders:
# - ./global-decls
# - ./interfaces
# - ./types
# In addition, slangc will generate a table of content file `toc.html` which will be copied to
# ./_includes/stdlib-reference-toc.html for Jekyll to consume it correctly.
# If the stdlib-reference folder does not exist, clone it from the github repo
if (-not (Test-Path ".\stdlib-reference")) {
git clone https://github.com/shader-slang/stdlib-reference/
}
else {
# If it already exists, just pull the latest changes.
cd stdlib-reference
git pull
cd ../
}
# Remove the old generated files.
Remove-Item -Path ".\stdlib-reference\global-decls" -Recurse -Force
Remove-Item -Path ".\stdlib-reference\interfaces" -Recurse -Force
Remove-Item -Path ".\stdlib-reference\types" -Recurse -Force
Remove-Item -Path ".\stdlib-reference\attributes" -Recurse -Force
# Use git describe to produce a version string and write it to _includes/version.inc.
# This file will be included by the stdlib-reference Jekyll template.
git describe --tags | Out-File -FilePath ".\stdlib-reference\_includes\version.inc" -Encoding ASCII
cd stdlib-reference
$slangPaths = @(
"../../build/RelWithDebInfo/bin/slangc.exe",
"../../build/Release/bin/slangc.exe",
"../../build/Debug/bin/slangc.exe"
)
$slangExe = $slangPaths | Where-Object { Test-Path $_ } | Select-Object -First 1
if ($slangExe) {
& $slangExe -compile-core-module -doc
Move-Item -Path ".\toc.html" -Destination ".\_includes\stdlib-reference-toc.html" -Force
git config user.email "bot@shader-slang.com"
git config user.name "Stdlib Reference Bot"
git add .
git commit -m "Update the core module reference"
git push
} else {
Write-Error "Could not find slangc executable in RelWithDebInfo, Release, or Debug directories"
}
cd ../
# For local debugging only.
# Remove-Item -Path "D:\git_repo\stdlib-reference\global-decls" -Recurse -Force
# Remove-Item -Path "D:\git_repo\stdlib-reference\interfaces" -Recurse -Force
# Remove-Item -Path "D:\git_repo\stdlib-reference\types" -Recurse -Force
# Copy-Item -Path .\stdlib-reference\global-decls -Destination D:\git_repo\stdlib-reference\global-decls -Recurse -Force
# Copy-Item -Path .\stdlib-reference\interfaces -Destination D:\git_repo\stdlib-reference\interfaces -Recurse -Force
# Copy-Item -Path .\stdlib-reference\types -Destination D:\git_repo\stdlib-reference\types -Recurse -Force
# Copy-Item -Path .\stdlib-reference\_includes\stdlib-reference-toc.html -Destination D:\git_repo\stdlib-reference\_includes\stdlib-reference-toc.html -Force

View File

@@ -0,0 +1,10 @@
$job = Start-Job -ArgumentList $PSScriptRoot -ScriptBlock {
Set-Location $args[0]
$code = (Get-Content -Raw -Path "scripts/Program.cs").ToString()
$assemblies = ("System.Core", "System.IO", "System.Collections")
Add-Type -ReferencedAssemblies $assemblies -TypeDefinition $code -Language CSharp
$path = Join-Path -Path $args[0] -ChildPath "user-guide"
[toc.Builder]::Run($path);
}
Wait-Job $job
Receive-Job -Job $job

View File

@@ -0,0 +1,127 @@
#!/usr/bin/env bash
set -e
script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
project_root="$(dirname "$script_dir")"
check_only=0
show_help() {
me=$(basename "$0")
cat <<EOF
$me: Build table of contents for documentation directories
Usage: $me [--help] [--source <path>] [--check-only]
Options:
--help Show this help message
--source Path to project root directory (defaults to parent of the script directory)
--check-only Check if TOC needs updating, exit 1 if changes needed
EOF
}
while [[ "$#" -gt 0 ]]; do
case $1 in
-h | --help)
show_help
exit 0
;;
--source)
project_root="$2"
shift
;;
--check-only)
check_only=1
;;
*)
echo "unrecognized argument: $1" >&2
show_help >&2
exit 1
;;
esac
shift
done
missing_bin=0
require_bin() {
local name="$1"
if ! command -v "$name" &>/dev/null; then
echo "This script needs $name, but it isn't in \$PATH" >&2
missing_bin=1
return
fi
}
require_bin "mcs"
require_bin "mono"
if [ "$missing_bin" -eq 1 ]; then
exit 1
fi
temp_dir=$(mktemp -d)
trap 'rm -rf "$temp_dir"' EXIT
docs_dir="$project_root/docs"
cat >"$temp_dir/temp_program.cs" <<EOL
$(cat "$script_dir/scripts/Program.cs")
namespace toc
{
class Program
{
static int Main(string[] args)
{
if (args.Length < 1)
{
Console.WriteLine("Please provide a directory path");
return 1;
}
try
{
Builder.Run(args[0]);
return 0;
}
catch (Exception ex)
{
Console.WriteLine(\$"Error: {ex.Message}");
return 1;
}
}
}
}
EOL
if ! mcs -r:System.Core "$temp_dir/temp_program.cs" -out:"$temp_dir/toc-builder.exe"; then
echo "Compilation of $script_dir/scripts/Program.cs failed" >&2
exit 1
fi
for dir in "user-guide"; do
if [ -d "$docs_dir/$dir" ]; then
if [ "$check_only" -eq 1 ]; then
# Ensure working directory is clean
if ! git -C "$project_root" diff --quiet "docs/$dir/toc.html" 2>/dev/null; then
echo "Working directory not clean, cannot check TOC" >&2
exit 1
fi
fi
if ! mono "$temp_dir/toc-builder.exe" "$docs_dir/$dir"; then
echo "TOC generation failed for $dir" >&2
exit 1
fi
if [ "$check_only" -eq 1 ]; then
if ! git -C "$project_root" diff --quiet "docs/$dir/toc.html" 2>/dev/null; then
git -C "$project_root" diff --color "docs/$dir/toc.html"
git -C "$project_root" checkout -- "docs/$dir/toc.html" 2>/dev/null
exit 1
fi
fi
else
echo "Directory $dir not found" >&2
fi
done
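
The `--check-only` path above hinges on the exit code of `git diff --quiet`: 0 means the tracked file is unchanged, non-zero means the regenerated TOC differs. As a standalone illustration (using a throwaway repository, so nothing here touches a real checkout):

```bash
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
echo a > toc.html
git add toc.html
git -c user.email=ci@example.com -c user.name=ci commit -qm "initial"
# Clean working tree: --quiet exits 0, so the check passes silently.
git diff --quiet toc.html && echo "clean"
# Modify the file: --quiet now exits non-zero, which the script treats as "TOC changed".
echo b > toc.html
git diff --quiet toc.html || echo "needs update"
cd - > /dev/null
rm -rf "$tmp"
```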

View File

@@ -0,0 +1,442 @@
# Building Slang From Source
### TLDR
`cmake --workflow --preset release` to configure, build, and package a release
version of Slang.
## Prerequisites:
Please install:
- CMake (3.26 preferred, but 3.22 works[^1])
- A C++ compiler with support for C++17. GCC, Clang and MSVC are supported
- A CMake compatible backend, for example Visual Studio or Ninja
- Python3 (a dependency for building spirv-tools)
Optional dependencies for tests include
- CUDA
- OptiX
- NVAPI
- Aftermath
- X11
Other dependencies are sourced from submodules in the [./external](./external)
directory.
## Get the Source Code
Clone [this](https://github.com/shader-slang/slang) repository. Make sure to
fetch the submodules also.
```bash
git clone https://github.com/shader-slang/slang --recursive
```
You will need the git tags from this repository, otherwise versioning
information (including the Slang modules directory name and the library
filenames on macOS and Linux) will be incorrect. The above command should fetch
them for you, but if you're fetching from a fork you may need to explicitly
fetch the latest tags from the shader-slang repository with:
```bash
git fetch https://github.com/shader-slang/slang.git 'refs/tags/*:refs/tags/*'
```
## Configure and build
> This section assumes cmake 3.25 or greater, if you're on a lower version
> please see [building with an older cmake](#building-with-an-older-cmake)
For a Ninja based build system (all platforms) run:
```bash
cmake --preset default
cmake --build --preset releaseWithDebugInfo # or --preset debug, or --preset release
```
For Visual Studio run:
```bash
cmake --preset vs2022 # or 'vs2019' or 'vs2026'
start devenv ./build/slang.sln # to optionally open the project in Visual Studio
cmake --build --preset releaseWithDebugInfo # to build from the CLI, could also use --preset release or --preset debug
```
There are also `*-dev` variants like `vs2022-dev` and `vs2026-dev` which turn on features to aid
debugging.
### WebAssembly build
In order to produce a WebAssembly build of Slang, Slang needs to be compiled with the
[Emscripten SDK](https://github.com/emscripten-core/emsdk). You can find more
information about [Emscripten](https://emscripten.org/) on its website.
You need to clone the EMSDK repo, then install and activate the latest version.
```bash
git clone https://github.com/emscripten-core/emsdk.git
cd emsdk
```
For non-Windows platforms
```bash
./emsdk install latest
./emsdk activate latest
```
For Windows
```cmd
emsdk.bat install latest
emsdk.bat activate latest
```
After EMSDK is activated, Slang needs to be built in a cross compiling setup:
- build the `generators` target for the build platform
- configure the build with `emcmake` for the host platform
- build for the host platform.
> Note: For more details on cross compiling please refer to the
> [cross-compiling](docs/building.md#cross-compiling) section.
```bash
# Build generators.
cmake --workflow --preset generators --fresh
mkdir generators
cmake --install build --prefix generators --component generators
# Configure the build with emcmake.
# emcmake is available only when emsdk_env setup the environment correctly.
pushd ../emsdk
source ./emsdk_env # For Windows, emsdk_env.bat
popd
emcmake cmake -DSLANG_GENERATORS_PATH=generators/bin --preset emscripten -G "Ninja"
# Build slang-wasm.js and slang-wasm.wasm in build.em/Release/bin
cmake --build --preset emscripten --target slang-wasm
```
> Note: If the last build step fails, try running the command that `emcmake`
> outputs, directly.
## Installing
Build targets may be installed using cmake:
```bash
cmake --build . --target install
```
This should install `SlangConfig.cmake`, which allows `find_package` to work.
SlangConfig.cmake defines the `SLANG_EXECUTABLE` variable, which points to the `slangc`
executable, and also defines the `slang::slang` target to link against.
For now, `slang::slang` is the only exported target defined in the config which can
be linked to.
Example usage
```cmake
find_package(slang REQUIRED PATHS ${your_cmake_install_prefix_path} NO_DEFAULT_PATH)
# slang_FOUND should be automatically set
target_link_libraries(yourLib PUBLIC
slang::slang
)
```
## Testing
```bash
build/Debug/bin/slang-test
```
See the [documentation on testing](../tools/slang-test/README.md) for more information.
## Debugging
See the [documentation on debugging](/docs/debugging.md).
## Distributing
### Versioned Libraries
As of v2025.21, the Slang libraries on **Mac** and **Linux** use versioned
filenames. The public ABI for Slang libraries in general is not currently
stable, so in accordance with semantic versioning conventions, the major
version number for dynamically linkable libraries is currently 0. Due to the
unstable ABI, releases are designed so that downstream users will be linked
against the fully versioned library filenames (e.g.,
`libslang-compiler.so.0.2025.21` instead of `libslang-compiler.so`).
Slang libraries for **Windows** do not have an explicit version in the
library filename, but the same guidance about stability of the ABI applies.
Downstream users of Slang distributing their products as binaries should
therefore **on all platforms, including Windows** redistribute the Slang
libraries they linked against, or otherwise communicate the specific version
dependency to their users. It is *not the case* that a user of your product can
just install any recent Slang release and have an installation of Slang that
works for any given binary.
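
As a concrete sketch of what the versioned layout looks like on disk (the filenames below are illustrative, matching the example version above; release packages create these symlinks for you):

```bash
tmp=$(mktemp -d)
touch "$tmp/libslang-compiler.so.0.2025.21"                         # the real, fully versioned library
ln -s libslang-compiler.so.0.2025.21 "$tmp/libslang-compiler.so.0"  # SONAME symlink
ln -s libslang-compiler.so.0 "$tmp/libslang-compiler.so"            # linker-name symlink
# A binary linked with -lslang-compiler records the SONAME, so the
# dynamic loader resolves through the chain at run time:
readlink "$tmp/libslang-compiler.so"     # prints: libslang-compiler.so.0
readlink "$tmp/libslang-compiler.so.0"   # prints: libslang-compiler.so.0.2025.21
rm -rf "$tmp"
```

Because the recorded dependency carries the full version, swapping in a different Slang release without relinking is not expected to work, per the guidance above.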
## More niche topics
### CMake options
| Option | Default | Description |
|-----------------------------------|----------------------------|----------------------------------------------------------------------------------------------|
| `SLANG_VERSION` | Latest `v*` tag | The project version, detected using git if available |
| `SLANG_EMBED_CORE_MODULE` | `TRUE` | Build slang with an embedded version of the core module |
| `SLANG_EMBED_CORE_MODULE_SOURCE` | `TRUE` | Embed the core module source in the binary |
| `SLANG_ENABLE_DXIL` | `TRUE` | Enable generating DXIL using DXC |
| `SLANG_ENABLE_ASAN` | `FALSE` | Enable ASAN (address sanitizer) |
| `SLANG_ENABLE_COVERAGE` | `FALSE` | Enable code coverage instrumentation |
| `SLANG_ENABLE_FULL_IR_VALIDATION` | `FALSE` | Enable full IR validation (SLOW!) |
| `SLANG_ENABLE_IR_BREAK_ALLOC` | `FALSE` | Enable IR BreakAlloc functionality for debugging. |
| `SLANG_ENABLE_GFX` | `TRUE` | Enable gfx targets |
| `SLANG_ENABLE_SLANGD` | `TRUE` | Enable language server target |
| `SLANG_ENABLE_SLANGC` | `TRUE` | Enable standalone compiler target |
| `SLANG_ENABLE_SLANGI` | `TRUE` | Enable Slang interpreter target |
| `SLANG_ENABLE_SLANGRT` | `TRUE` | Enable runtime target |
| `SLANG_ENABLE_SLANG_GLSLANG` | `TRUE` | Enable glslang dependency and slang-glslang wrapper target |
| `SLANG_ENABLE_TESTS` | `TRUE` | Enable test targets, requires SLANG_ENABLE_GFX, SLANG_ENABLE_SLANGD and SLANG_ENABLE_SLANGRT |
| `SLANG_ENABLE_EXAMPLES` | `TRUE` | Enable example targets, requires SLANG_ENABLE_GFX |
| `SLANG_LIB_TYPE` | `SHARED` | How to build the slang library |
| `SLANG_ENABLE_RELEASE_DEBUG_INFO` | `TRUE` | Enable generating debug info for Release configs |
| `SLANG_ENABLE_RELEASE_LTO` | `FALSE` | Enable LTO for Release builds |
| `SLANG_ENABLE_SPLIT_DEBUG_INFO` | `TRUE` | Enable generating split debug info for Debug and RelWithDebInfo configs |
| `SLANG_SLANG_LLVM_FLAVOR` | `FETCH_BINARY_IF_POSSIBLE` | How to set up llvm support |
| `SLANG_SLANG_LLVM_BINARY_URL` | System dependent | URL specifying the location of the slang-llvm prebuilt library |
| `SLANG_GENERATORS_PATH` | `` | Path to an installed `all-generators` target for cross compilation |
The following options relate to optional dependencies for additional backends
and running additional tests. Left unchanged they are auto detected, however
they can be set to `OFF` to prevent their usage, or set to `ON` to make it an
error if they can't be found.
| Option | CMake hints | Notes |
|--------------------------|--------------------------------|----------------------------------------------------------------------------------------------|
| `SLANG_ENABLE_CUDA` | `CUDAToolkit_ROOT` `CUDA_PATH` | Enable running tests with the CUDA backend, doesn't affect the targets Slang itself supports |
| `SLANG_ENABLE_OPTIX` | `Optix_ROOT_DIR` | Requires CUDA |
| `SLANG_ENABLE_NVAPI` | `NVAPI_ROOT_DIR` | Only available for builds targeting Windows |
| `SLANG_ENABLE_AFTERMATH` | `Aftermath_ROOT_DIR` | Enable Aftermath in GFX, and add aftermath crash example to project |
| `SLANG_ENABLE_XLIB` | | |
### Advanced options
| Option | Default | Description |
|------------------------------------|---------|--------------------------------------------------------------------------------------------------------------------------------|
| `SLANG_ENABLE_DX_ON_VK` | `FALSE` | Enable running the DX11 and DX12 tests on non-Windows platforms via vkd3d-proton, requires system-provided d3d headers |
| `SLANG_ENABLE_SLANG_RHI` | `TRUE` | Enable building and using [slang-rhi](https://github.com/shader-slang/slang-rhi) for tests |
| `SLANG_USE_SYSTEM_MINIZ` | `FALSE` | Build using system Miniz library instead of the bundled version in [./external](./external) |
| `SLANG_USE_SYSTEM_LZ4` | `FALSE` | Build using system LZ4 library instead of the bundled version in [./external](./external) |
| `SLANG_USE_SYSTEM_VULKAN_HEADERS` | `FALSE` | Build using system Vulkan headers instead of the bundled version in [./external](./external) |
| `SLANG_USE_SYSTEM_SPIRV_HEADERS` | `FALSE` | Build using system SPIR-V headers instead of the bundled version in [./external](./external) |
| `SLANG_USE_SYSTEM_UNORDERED_DENSE` | `FALSE` | Build using system unordered dense instead of the bundled version in [./external](./external) |
| `SLANG_SPIRV_HEADERS_INCLUDE_DIR` | `` | Use this specific path to SPIR-V headers instead of the bundled version in [./external](./external) |
### LLVM Support
There are several options for getting llvm-support:
- Use a prebuilt binary slang-llvm library:
`-DSLANG_SLANG_LLVM_FLAVOR=FETCH_BINARY` or `-DSLANG_SLANG_LLVM_FLAVOR=FETCH_BINARY_IF_POSSIBLE` (this is the default)
- You can set `SLANG_SLANG_LLVM_BINARY_URL` to point to a local
`libslang-llvm.so/slang-llvm.dll` or set it to a URL of an zip/archive
containing such a file
- If this isn't set then the build system tries to download it from the
release on github matching the current tag. If such a tag doesn't exist
or doesn't have the correct os\*arch combination then the latest release
will be tried.
- If `SLANG_SLANG_LLVM_BINARY_URL` is `FETCH_BINARY_IF_POSSIBLE` then in
the case that a prebuilt binary can't be found then the build will proceed
as though `DISABLE` was chosen
- Use a system supplied LLVM: `-DSLANG_SLANG_LLVM_FLAVOR=USE_SYSTEM_LLVM`, you
must have llvm-21.1 and a matching libclang installed. It's important that
either:
- You don't end up linking to a dynamic libllvm.so, this will almost
certainly cause multiple versions of LLVM to be loaded at runtime,
leading to errors like `opt: CommandLine Error: Option
'asm-macro-max-nesting-depth' registered more than once!`. Avoid this by
compiling LLVM without the dynamic library.
- Anything else which may be linked in (for example Mesa, also dynamically
loads the same llvm object)
- Do not enable LLVM support: `-DSLANG_SLANG_LLVM_FLAVOR=DISABLE`
To build only a standalone slang-llvm, you can run:
```bash
cmake --workflow --preset slang-llvm
```
This will generate `build/dist-release/slang-slang-llvm.zip` containing the
library. This, of course, uses the system LLVM to build slang-llvm, otherwise
it would just be a convoluted way to download a prebuilt binary.
### Cross compiling
Slang generates some code at build time, using generators build from this
codebase. Due to this, for cross compilation one must already have built these
generators for the build platform. Build them with the `generators` preset, and
pass the install path to the cross building CMake invocation using
`SLANG_GENERATORS_PATH`
Non-Windows platforms:
```bash
# build the generators
cmake --workflow --preset generators --fresh
mkdir build-platform-generators
cmake --install build --config Release --prefix build-platform-generators --component generators
# reconfigure, pointing to these generators
# Here is also where you should set up any cross compiling environment
cmake \
--preset default \
--fresh \
-DSLANG_GENERATORS_PATH=build-platform-generators/bin \
-Dwhatever-other-necessary-options-for-your-cross-build \
# for example \
-DCMAKE_C_COMPILER=my-arch-gcc \
-DCMAKE_CXX_COMPILER=my-arch-g++
# perform the final build
cmake --workflow --preset release
```
Windows
```bash
# build the generators
cmake --workflow --preset generators --fresh
mkdir build-platform-generators
cmake --install build --config Release --prefix build-platform-generators --component generators
# reconfigure, pointing to these generators
# Here is also where you should set up any cross compiling environment
# For example
./vcvarsamd64_arm64.bat
cmake \
--preset default \
--fresh \
-DSLANG_GENERATORS_PATH=build-platform-generators/bin \
-Dwhatever-other-necessary-options-for-your-cross-build
# perform the final build
cmake --workflow --preset release
```
### Example cross compiling with MSVC to windows-aarch64
One option is to build using the ninja generator, which requires providing the
native and cross environments via `vcvarsall.bat`
```bash
vcvarsall.bat
cmake --workflow --preset generators --fresh
mkdir generators
cmake --install build --prefix generators --component generators
vcvarsall.bat x64_arm64
cmake --preset default --fresh -DSLANG_GENERATORS_PATH=generators/bin
cmake --workflow --preset release
```
Another option is to build using the Visual Studio generator which can find
this automatically
```
cmake --preset vs2022 # or --preset vs2019, vs2026
cmake --build --preset generators # to build from the CLI
cmake --install build --prefix generators --component generators
rm -rf build # The Visual Studio generator will complain if this is left over from a previous build
cmake --preset vs2022 --fresh -A arm64 -DSLANG_GENERATORS_PATH=generators/bin
cmake --build --preset release
```
### Nix
This repository contains a [Nix](https://nixos.org/)
[flake](https://wiki.nixos.org/wiki/Flakes) (not officially supported or
tested), which provides the necessary prerequisites for local development. Also,
if you use [direnv](https://direnv.net/), you can run the following commands to
have the Nix environment automatically activate when you enter your clone of
this repository:
```bash
echo 'use flake' > .envrc
direnv allow
```
## Building with an older CMake
Because older CMake versions don't support all the features we want to use in
CMakePresets, you'll have to do without the presets. Something like the following
```bash
cmake -B build -G Ninja
cmake --build build -j
```
## Specific supported compiler versions
<!---
Please keep the exact formatting '_Foo_ xx.yy is tested in CI' as there is a
script which checks that this is still up to date.
-->
_GCC_ 11.4 and 13.3 are tested in CI, and 11.4 is the recommended minimum version. GCC 10 is
supported on a best-effort basis, i.e. PRs supporting this version are
encouraged but it isn't a continuously maintained setup.
_MSVC_ 19 is tested in CI and is the recommended minimum version.
_Clang_ 17.0 is tested in CI and is the recommended minimum version.
## Static linking against libslang-compiler
To build statically, set the `SLANG_LIB_TYPE` flag in CMake to `STATIC`.
If linking against a static `libslang-compiler.a` you will need to link against some
dependencies also if you're not already incorporating them into your project.
```
${SLANG_DIR}/build/Release/lib/libslang-compiler.a
${SLANG_DIR}/build/Release/lib/libcompiler-core.a
${SLANG_DIR}/build/Release/lib/libcore.a
${SLANG_DIR}/build/external/miniz/libminiz.a
${SLANG_DIR}/build/external/lz4/build/cmake/liblz4.a
```
## Deprecation of libslang and slang.dll filenames
In Slang v2025.21, the primary library for Slang was renamed, from
`libslang.so` and `slang.dll` to `libslang-compiler.so` and
`slang-compiler.dll`. (A similar change was made for macOS.) The reason behind
this change was to address a conflict on the Linux target, where the S-Lang
library of the same name is commonly preinstalled on Linux distributions. The
same issue affected macOS, to a lesser extent, where the S-Lang library could
be installed via `brew`. To make the Slang library name predictable and
simplify downstream build logic, the Slang library name was changed on all
platforms.
A change like this requires a period of transition, so on a **temporary**
basis: Linux and macOS packages now include symlinks from the old filename to
the new one. For Windows, a proxy library is provided with the old name, that
redirects all functions to the new `slang-compiler.dll`. The rationale here is
that applications with a complex dependency graph may have some components
still temporarily using `slang.dll`, while others have been updated to use
`slang-compiler.dll`. Using a proxy library for `slang.dll` ensures that all
components are using the same library, and avoids any potential state or
heap-related issues from an executable sharing data structures between the two
libraries.
These backwards compatibility affordances, namely the proxy `slang.dll` and
`slang.lib` (for Windows) and the `libslang.so` and `libslang.dylib` symlinks
(for Linux and macOS), **will be removed at the end of 2026**. Until that time,
they will be present in the github release packages for downstream use.
Downstream packaging may or may not choose to distribute them, at their
discretion. **We strongly encourage downstream users of Slang to move to the
new library names as soon as they are able.**
## Notes
[^1]: Below 3.25, CMake lacks the ability to mark directories as being
system directories (https://cmake.org/cmake/help/latest/prop_tgt/SYSTEM.html#prop_tgt:SYSTEM),
this leads to an inability to suppress warnings originating in the
dependencies in `./external`, so be prepared for some additional warnings.

View File

@@ -0,0 +1,36 @@
# Our CI
There are github actions for building and testing slang.
## Tests
Most configurations run a restricted set of tests, however on some self hosted
runners we run the full test suite, as well as running Falcor's test suite with
the new slang build.
## Building LLVM
We require a static build of LLVM for building slang-llvm, we build and cache
this in all workflow runs. Since this changes infrequently, the cache is almost
always hit. A cold build takes about an hour on the slowest platform. The
cached output is a few hundred MB, so conceivably if we add many more platforms
we might be caching more than the 10GB github allowance, which would
necessitate a more complicated scheme for building and tracking outputs here.
For slang-llvm, this is handled the same as any other dependency, except on
Windows Debug builds, where we are required by the differences in Debug/Release
standard libraries to always make a release build; this is noted in the CI
action YAML file.
Note that we don't use sccache while building LLVM, as it changes very
infrequently. The caching of LLVM is done by caching the final build product
only.
## sccache
> Due to reliability issues, we are not currently using sccache, this is
> historical/aspirational.
The CI actions use sccache, keyed on compiler and platform; this runs on all
configurations and significantly speeds up small source change builds. This
cache can be safely missed without a large impact on build times.

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,650 @@
Slang CPU Target Support
========================
Slang has preliminary support for producing CPU source and binaries.
# Features
* Can compile C/C++/Slang source to binaries (executables, shared libraries or [directly executable](#host-callable))
* Does *not* require a C/C++ compiler to be installed if [slang-llvm](#slang-llvm) is available (as distributed with slang binary distributions)
* Can compile Slang source into C++ source code
* Supports compute style shaders
# Limitations
These limitations apply to Slang transpiling to C++.
* Barriers are not supported (making these work would require an ABI change)
* Atomics are not currently supported
* Limited support for [out of bounds](#out-of-bounds) accesses handling
* Entry point/s cannot be named `main` (this is because downstream C++ compiler/s expect a regular `main`)
* `float16_t` type is not currently supported
For current C++ source output, the compiler needs to support partial specialization.
# How it works
The initial version works by using a 'downstream' C/C++ compiler. A C++ compiler does *not* in general need to be installed on a system to compile and execute code as long as [slang-llvm](#slang-llvm) is available. A [regular C/C++](#regular-cpp) compiler can also be used, allowing access to tooling, such as profiling and debuggers, as well as being able to use regular host development features such as linking, libraries, shared libraries/dlls and executables.
The C/C++ backend can be directly accessed much like 'dxc', 'fxc' or 'glslang' can, using the pass-through mechanism with the following new backends...
```
SLANG_PASS_THROUGH_CLANG, ///< Clang C/C++ compiler
SLANG_PASS_THROUGH_VISUAL_STUDIO, ///< Visual studio C/C++ compiler
SLANG_PASS_THROUGH_GCC, ///< GCC C/C++ compiler
SLANG_PASS_THROUGH_LLVM, ///< slang-llvm 'compiler' - includes LLVM and Clang
SLANG_PASS_THROUGH_GENERIC_C_CPP, ///< Generic C or C++ compiler, which is decided by the source type
```
Sometimes it is not important which C/C++ compiler is used, and this can be specified via the 'Generic C/C++' option. This will aim to use the compiler that is most likely binary compatible with the compiler that was used to build the Slang binary being used.
To make it possible for Slang to produce CPU code, in this first iteration we convert Slang code into C/C++ which can subsequently be compiled into CPU code. If source is desired instead of a binary this can be specified via the SlangCompileTarget. These can be specified on the `slangc` command line as `-target cpp`.
When using the 'pass through' mode for a CPU based target it is currently necessary to set an entry point, even though it's basically ignored.
In the API the `SlangCompileTarget`s are
```
SLANG_C_SOURCE ///< The C language
SLANG_CPP_SOURCE ///< The C++ language
SLANG_CPP_HEADER ///< The C++ language (header)
SLANG_HOST_CPP_SOURCE, ///< C++ code for `host` style
```
Using the `-target` command line option
* `C_SOURCE`: c
* `CPP_SOURCE`: cpp,c++,cxx
* `CPP_HEADER`: hpp
* `HOST_CPP_SOURCE`: host-cpp,host-c++,host-cxx
Note! Output of C source is not currently supported.
If a CPU binary is required this can be specified as a `SlangCompileTarget` of
```
SLANG_EXECUTABLE ///< Executable (for hosting CPU/OS)
SLANG_SHADER_SHARED_LIBRARY ///< A shared library/Dll (for hosting CPU/OS)
SLANG_SHADER_HOST_CALLABLE ///< A CPU target that makes `compute kernel` compiled code available to be run immediately
SLANG_HOST_HOST_CALLABLE ///< A CPU target that makes `scalar` compiled code available to be run immediately
SLANG_OBJECT_CODE, ///< Object code that can be used for later linking
```
Using the `-target` command line option
* `EXECUTABLE`: exe, executable
* `SHADER_SHARED_LIBRARY`: sharedlib, sharedlibrary, dll
* `SHADER_HOST_CALLABLE`: callable, host-callable
* `OBJECT_CODE`: object-code
* `HOST_HOST_CALLABLE`: host-host-callable
Using the `host-callable` targets from the command line has little utility, other than to test that such code compiles and can be loaded for host execution.
For launching [shader like](#compile-style) Slang code on the CPU, there typically needs to be binding of values passed to the entry point function. How this works is described in the [ABI section](#abi). Functions *can* be executed directly but care must be taken to [export](#visibility) them and to ensure there isn't an issue with [context threading](#context-threading).
If a binary target is requested, the binary contents can be returned in an ISlangBlob just like for other targets. When using a [regular C/C++ compiler](#regular-cpp) the CPU binary typically must be saved as a file and then potentially marked for execution by the OS. It may be possible to load shared libraries or dlls from memory - but doing so is a non-standard feature that requires unusual workarounds. If possible it is typically fastest and easiest to use [slang-llvm](#slang-llvm) to directly execute Slang or C/C++ code.
## <a id="compile-style"/>Compilation Styles
There are currently two styles of *compilation style* supported - `host` and `shader`.
The `shader` style implies
* The code *can* be executed in a GPU-kernel like execution model, launched across multiple threads (as described in the [ABI](#abi))
* Currently no reference counting
* Only functionality from the Slang core module, built in HLSL or anything supplied by a [COM interfaces](#com-interface) is available
* Currently [slang-llvm](#slang-llvm) only supports the `shader` style
The `host` style implies
* Execution style is akin to more regular CPU scalar code
* Typically requires linking with `slang-rt` and use of `slang-rt` types such as `Slang::String`
* Allows use of `new`
* Allows the use of `class` for reference counted types
* COM interfaces are reference counted
The styles as used with [host-callable](#host-callable) are indicated via the API by
```
SLANG_SHADER_HOST_CALLABLE ///< A CPU target that makes `compute kernel` compiled code available to be run immediately
SLANG_HOST_HOST_CALLABLE ///< A CPU target that makes `scalar` compiled code available to be run immediately
```
Or via the `-target` command line options
* For 'shader': `callable`, `host-callable`
* For 'host': `host-host-callable`
For an example of the `host` style please look at "examples/cpu-hello-world".
## <a id="host-callable"/>Host callable
Slang supports `host-callable` compilation targets which allow for the direct execution of the compiled code on the CPU. Currently this style of execution is supported if [slang-llvm](#slang-llvm) or a [regular C/C++ compiler](#regular-cpp) are available.
There are currently two [compilation styles](#compile-style) supported.
In order to call into `host-callable` code after compilation it's necessary to access the result via the `ISlangSharedLibrary` interface.
Please look at the [ABI](#abi) section for more specifics around ABI usage especially for `shader` [compile styles](#compile-style).
```C++
slang::ICompileRequest* request = ...;
const SlangResult compileRes = request->compile();

// Even if there were no errors that forced compilation to fail, the
// compiler may have produced "diagnostic" output such as warnings.
// We will go ahead and print that output here.
//
if (auto diagnostics = request->getDiagnosticOutput())
{
    printf("%s", diagnostics);
}

// Get the 'shared library' (note that this doesn't necessarily have to be
// implemented as a shared library - it's just an interface to executable code).
ComPtr<ISlangSharedLibrary> sharedLibrary;
SLANG_RETURN_ON_FAIL(request->getTargetHostCallable(0, sharedLibrary.writeRef()));

// We can now find exported functions/variables via findSymbolAddressByName.
// For a: __global public __extern_cpp int myGlobal;
{
    auto myGlobalPtr = (int*)sharedLibrary->findSymbolAddressByName("myGlobal");
    if (myGlobalPtr)
    {
        *myGlobalPtr = 10;
    }
}

// To get a function:
//
// public __extern_cpp int add(int a, int b);
//
// Test a free function
{
    typedef int (*AddFunc)(int a, int b);
    auto func = (AddFunc)sharedLibrary->findFuncByName("add");
    if (func)
    {
        // Let's add!
        int c = func(10, 20);
    }
}
```
## <a id="slang-llvm"/>slang-llvm
`slang-llvm` is a special Slang version of [LLVM](https://llvm.org/). Its current main purpose is to allow compiling C/C++ such that it is [directly available](#host-callable) for execution using the LLVM JIT feature. If `slang-llvm` is available it is the default downstream compiler for [host-callable](#host-callable). This is because it allows for faster compilation, avoids the file system, and can execute the compiled code directly. [Regular C/C++ compilers](#regular-cpp) can be used for [host-callable](#host-callable) but require writing source files to the file system and creating/loading shared-libraries/dlls to make the feature work. Additionally using `slang-llvm` avoids the need for a C/C++ compiler to be installed on a target system.
`slang-llvm` contains the Clang C++ compiler, so it is possible to also compile and execute C/C++ code in the [host-callable](#host-callable) style.
Limitations of using `slang-llvm`
* Can only currently be used for [shader style](#compile-style)
* Cannot produce object files, libraries, OS executables or binaries
* Is *limited* because it is not possible to directly access libraries such as the C or C++ standard libraries (see [COM interface](#com-interface) for a work-around)
* It is not possible to source-level debug `slang-llvm` compiled code running on the JIT (see [debugging](#debugging) for a work-around)
* It is not currently possible to return the result as an ISlangBlob representation
You can detect if `slang-llvm` is available via
```C++
slang::IGlobalSession* slangSession = ...;
const bool hasSlangLlvm = SLANG_SUCCEEDED(slangSession->checkPassThroughSupport(SLANG_PASS_THROUGH_LLVM));
```
## <a id="regular-cpp"/>Regular C/C++ compilers
Slang can work with regular C/C++ 'downstream' compilers. It has been tested to work with Visual Studio, Clang and G++/Gcc on Windows and Linux.
Under the covers when Slang is used to generate a binary via a C/C++ compiler, it must do so through the file system. Currently this means the source (say generated by Slang) and the binary (produced by the C/C++ compiler) must all be files. To make this work Slang uses temporary files. The reasoning for hiding this mechanism, other than simplicity, is that it allows code to work with [slang-llvm](#slang-llvm) without any changes.
## <a id="visibility"/>Visibility
In a typical Slang [shader like](#compile-style) scenario, functionality is exposed via entry points. It can be convenient and desirable to be able to call Slang functions directly from application code, and not just via entry points. By default non entry point functions are *removed* if they are not reachable by the specified entry point. Additionally for non entry point functions Slang typically generates function names that differ from the original name.
To work around these two issues the `public` and `__extern_cpp` modifiers can be used.
`public` makes the variable or function visible outside of the module even if it isn't used within the module. For the function to work, Slang will also keep any function or variable it accesses.
Note! Some care is needed here around [context threading](#context-threading) - if a function or any function a function accesses requires state held in the context, the signature of the function will be altered to include the context as the first parameter.
Making a function or variable `public` does not mean that the name remains the same. To indicate that the name should not be altered use the `__extern_cpp` modifier. For example
```
// myGlobal will be visible to the application (note the __global modifier additionally means it has C++ global behavior)
__global public __extern_cpp int myGlobal;
// myFunc is available to the application
public __extern_cpp int myFunc(int a)
{
return a * a;
}
```
## <a id="com-interface"/>COM interface support
Slang has preliminary support for [Component Object Model (COM)](https://en.wikipedia.org/wiki/Component_Object_Model) interfaces in CPU code.
```
[COM]
interface IDoThings
{
int doThing(int a, int b);
int calcHash(NativeString in);
void printMessage(NativeString nativeString);
}
```
This support provides a way for an application to provide access to functionality in the application runtime - essentially it allows Slang code to call into application code. To do this a COM interface can be created that exposes the desired functionality. The interface/s can be made available through any of the normal mechanisms - such as through a constant buffer variable. Additionally [`__global`](#actual-global) provides a way to make functions available to Slang code without the need for [context threading](#context-threading).
The example "examples/cpu-com-example" shows this at work.
## <a id="actual-global"/>Global support
The Slang language is based on the HLSL language. This heritage means that globals have slightly different meaning to typical C/C++ usage.
```
int myGlobal; ///< A constant value stored in a constant buffer
static int staticMyGlobal; ///< A global that cannot be seen by the application
static const int staticConstMyGlobal; ///< A fixed value
```
The variable `myGlobal` will be a member of a constant buffer, meaning its value can only change via bindings and not during execution. For some uses having `myGlobal` in the constant buffer might be appropriate, for example
* Its use is reached from a [shader style](#compile-style) entry point
* Its value is constant across the launch
In Slang a variable can be declared as global in the C/C++ sense via the `__global` modifier. For example
```
__global int myGlobal;
```
Doing so means
* `myGlobal` will not be defined in the constant buffer
* It can be used in functions that do not have access to the [constant buffer](#context-threading)
* It can be modified in the kernel
* Can only be used on CPU targets (currently `__global` is not supported on the GPU targets)
One disadvantage of using `__global` is in multi-threaded environments, with multiple launches on multiple CPU threads, there is only one global, which will likely cause problems unless the global value is the same across all threads.
It may be useful to set a global directly via host code, without having to write a function to enable the access. This is possible by using [`public`](#visibility) and [`__extern_cpp`](#visibility) modifiers. For example
```
__global public __extern_cpp int myGlobal;
```
The global can now be set from host code via
```C++
slang::ICompileRequest* request = ...;
// Get the 'shared library' (note that this doesn't necessarily have to be implemented as a shared library
// it's just an interface to executable code).
ComPtr<ISlangSharedLibrary> sharedLibrary;
SLANG_RETURN_ON_FAIL(request->getTargetHostCallable(0, sharedLibrary.writeRef()));
// Set myGlobal to 20
{
auto myGlobalPtr = (int*)sharedLibrary->findSymbolAddressByName("myGlobal");
*myGlobalPtr = 20;
}
```
In terms of reflection `__global` variables are not visible.
## NativeString
Slang supports a rich 'String' type when using the [host style](#compile-style), which for C++ targets is implemented as the `Slang::String` C++ type. The type is only available on CPU targets that support `slang-rt`.
Some limited String-like support is available via `NativeString` type which for C/C++ CPU targets is equivalent to `const char*`. For GPU targets this will use the same hash mechanism as normally available.
`NativeString` is supported by all [shader compilation styles](#compile-style) including [slang-llvm](#slang-llvm).
TODO(JS): What happens with String with shader compile style on CPU? Shouldn't it be the same as GPU (and reflected as such in reflection)?
## Debugging
It is currently not possible to step into LLVM-JIT code when using [slang-llvm](#slang-llvm). Fortunately it is possible to step into code compiled via a [regular C/C++ compiler](#regular-cpp).
Below is a code snippet showing how to switch to a [regular C/C++ compiler](#regular-cpp) at runtime.
```C++
SlangPassThrough findRegularCppCompiler(slang::IGlobalSession* slangSession)
{
    // Current list of 'regular' C/C++ compilers
    const SlangPassThrough cppCompilers[] =
    {
        SLANG_PASS_THROUGH_VISUAL_STUDIO,
        SLANG_PASS_THROUGH_GCC,
        SLANG_PASS_THROUGH_CLANG,
    };
    // Do we have a C++ compiler?
    for (const auto compiler : cppCompilers)
    {
        if (SLANG_SUCCEEDED(slangSession->checkPassThroughSupport(compiler)))
        {
            return compiler;
        }
    }
    return SLANG_PASS_THROUGH_NONE;
}

SlangResult useRegularCppCompiler(slang::IGlobalSession* session)
{
    const auto regularCppCompiler = findRegularCppCompiler(session);
    if (regularCppCompiler != SLANG_PASS_THROUGH_NONE)
    {
        session->setDownstreamCompilerForTransition(SLANG_CPP_SOURCE, SLANG_SHADER_HOST_CALLABLE, regularCppCompiler);
        session->setDownstreamCompilerForTransition(SLANG_CPP_SOURCE, SLANG_HOST_HOST_CALLABLE, regularCppCompiler);
        return SLANG_OK;
    }
    return SLANG_FAIL;
}
```
It is generally recommended to use [slang-llvm](#slang-llvm) if that is appropriate, but to switch to using a [regular C/C++ compiler](#regular-cpp) when debugging is needed. This should be largely transparent to most code.
Executing CPU Code
==================
In typical Slang operation when code is compiled it produces either source or a binary that can then be loaded by another API such as a rendering API. With CPU code the binary produced could be saved to a file and then executed as an exe or a shared library/dll. In practice though it is common to want to be able to execute compiled code immediately. Having to save off to a file and then load again can be awkward. It is also not necessarily the case that code needs to be saved to a file to be executed.
To handle being able call code directly, code can be compiled using the [host-callable](#host-callable).
For pass through compilation of C/C++ this mechanism allows any functions marked for export to be directly queried. Marking for export is a C/C++ compiler specific feature. Look at the definition of `SLANG_PRELUDE_EXPORT` in the [C++ prelude](#prelude).
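As a sketch, an export macro in the spirit of `SLANG_PRELUDE_EXPORT` typically expands to something like the following (this is an approximation with a hypothetical macro name - see the C++ prelude for the real definition):

```cpp
// Sketch of a cross-platform export macro, approximating what
// SLANG_PRELUDE_EXPORT does in the C++ prelude. extern "C" disables C++
// name mangling so the symbol can be found by its plain name.
#ifdef _WIN32
#    define MY_EXPORT extern "C" __declspec(dllexport)
#else
#    define MY_EXPORT extern "C" __attribute__((visibility("default")))
#endif

// A function exported this way can be located via findFuncByName (or
// dlsym/GetProcAddress) under the unmangled name "addInts".
MY_EXPORT int addInts(int a, int b)
{
    return a + b;
}
```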
For a complete example on how to execute CPU code using `spGetEntryPointHostCallable`/`getEntryPointHostCallable` look at code in `example/cpu-hello-world`.
<a id="abi"/>Application Binary Interface (ABI)
===
Say we have some Slang source like the following:
```
struct Thing { int a; int b; }
Texture2D<float> tex;
SamplerState sampler;
RWStructuredBuffer<int> outputBuffer;
ConstantBuffer<Thing> thing3;
[numthreads(4, 1, 1)]
void computeMain(
uint3 dispatchThreadID : SV_DispatchThreadID,
uniform Thing thing,
uniform Thing thing2)
{
// ...
}
```
When compiled into a [shader compile style](#compile-style) shared library/dll/host-callable - how is it invoked? An entry point in the Slang source code produces several exported functions. The 'default' exported function has the same name as the entry point in the original source. It has the signature
```
void computeMain(ComputeVaryingInput* varyingInput, UniformEntryPointParams* uniformParams, UniformState* uniformState);
```
NOTE! Using `main` as an entry point name should be avoided if CPU is a target because it typically causes compilation errors due to its normal C/C++ usage.
ComputeVaryingInput is defined in the prelude as
```
struct ComputeVaryingInput
{
uint3 startGroupID;
uint3 endGroupID;
};
```
`ComputeVaryingInput` allows specifying a range of group IDs to execute - all the IDs in a grid from startGroupID up to, but not including, endGroupID. Most compute APIs allow specifying an x,y,z extent on 'dispatch'. This would be equivalent to having startGroupID = { 0, 0, 0 } and endGroupID = { x, y, z }. The exported function allows setting a range of group IDs such that client code can dispatch different parts of the work to different cores. This group range mechanism was chosen as the 'default' mechanism as it is most likely to achieve the best performance.
There are two other functions that consist of the entry point name postfixed with `_Thread` and `_Group`. For the entry point 'computeMain' these functions would be accessible from the shared library interface as `computeMain_Group` and `computeMain_Thread`. `_Group` has the same signature as listed for computeMain, but it doesn't execute a range - only the single group specified by startGroupID (endGroupID is ignored). That is, all of the threads within the group (as specified by `[numthreads]`) will be executed in a single call.
It may be desirable to have even finer control of how execution takes place down to the level of individual 'thread's and this can be achieved with the `_Thread` style. The signature looks as follows
```
struct ComputeThreadVaryingInput
{
uint3 groupID;
uint3 groupThreadID;
};
void computeMain_Thread(ComputeThreadVaryingInput* varyingInput, UniformEntryPointParams* uniformParams, UniformState* uniformState);
```
When invoking the kernel at the `thread` level it is a question of updating the groupID/groupThreadID to specify which thread of the computation to execute. For the example above we have `[numthreads(4, 1, 1)]`. This means groupThreadID.x can vary from 0-3 and .y and .z must be 0. The groupID.x indicates which 'group of 4' to execute. So groupID.x = 1, with groupThreadID.x = 0,1,2,3 runs the 4th, 5th, 6th and 7th 'thread'. Being able to invoke each thread in this way is flexible - any specific thread can be specified and executed. It is not necessarily very efficient because there is the call overhead and a small amount of extra work that is performed inside the kernel.
Note that the `_Thread` style signature is likely to change to support 'groupshared' variables in the near future.
In terms of performance the 'default' function is probably the most efficient for most common usages. The `_Group` style allows for slightly less loop overhead, but with many invocations this will likely be drowned out by the extra call/setup overhead. The `_Thread` style in most situations will be the slowest, with even more call overhead, and less options for the C/C++ compiler to use faster paths.
The UniformState and UniformEntryPointParams structs typically vary by shader. UniformState holds 'normal' bindings, whereas UniformEntryPointParams holds the uniform entry point parameters. Where specific bindings or parameters are located can be determined by reflection. The structures for the example above would be something like the following...
```
struct UniformEntryPointParams
{
Thing thing;
Thing thing2;
};
struct UniformState
{
Texture2D<float > tex;
SamplerState sampler;
RWStructuredBuffer<int32_t> outputBuffer;
Thing* thing3;
};
```
Notice that of the entry point parameters `dispatchThreadID` is not part of UniformEntryPointParams; this is because it is not uniform.
`ConstantBuffer` and `ParameterBlock` will become pointers to the type they hold (as `thing3` is in the above structure).
`StructuredBuffer<T>`,`RWStructuredBuffer<T>` become
```
T* data;
size_t count;
```
`ByteAddressBuffer`, `RWByteAddressBuffer` become
```
uint32_t* data;
size_t sizeInBytes;
```
Resource types become pointers to interfaces that implement their features. For example `Texture2D` become a pointer to a `ITexture2D` interface that has to be implemented in client side code. Similarly SamplerState and SamplerComparisonState become `ISamplerState` and `ISamplerComparisonState`.
The actual definitions for the interfaces for resource types, and types are specified in 'slang-cpp-types.h' in the `prelude` directory.
## Unsized arrays
Unsized arrays can be used, which are indicated by an array with no size as in `[]`. For example
```
RWStructuredBuffer<int> arrayOfArrays[];
```
With normal 'sized' arrays, the elements are just stored contiguously wherever they are defined. With an unsized array they map to `Array<T>` which is...
```
T* data;
size_t count;
```
Note that there is no method in the shader source to get the `count`, even though on the CPU target it is stored and easily available. This is because of the behavior on GPU targets
* That the count has to be stored elsewhere (unlike with CPU)
* On some GPU targets there is no bounds checking - accessing outside the bound values can cause *undefined behavior*
* The elements may be laid out *contiguously* on GPU
In practice this means if you want to access the `count` in shader code it will need to be passed by another mechanism - such as within a constant buffer. It is possible that in the future support may be added to allow direct access of `count` to work across targets transparently.
It is perhaps worth noting that the CPU allows us to have an indirection (a pointer to the unsized array's contents) which has the potential for more flexibility than is possible on GPU targets. GPU targets typically require the elements to be placed 'contiguously' from their location in their `container` - be that registers or in memory. This means on GPU targets there may be other restrictions on where unsized arrays can be placed in a structure, such as only at the end. If code needs to work across targets these restrictions will need to be followed across targets.
## <a id="context-threading"/>Context Threading
The [shader compile style](#compile-style) brings some extra issues to bear. In the HLSL compute kernel launch model, application visible variables and resources are bound. As described in the [ABI](#abi) section these bindings and additional information identifying a compute thread are passed into the launch as a context. Take for example the code snippet below
```
int myGlobal;
int myFunc(int v)
{
return myGlobal + v;
}
int anotherFunc(int a, int b)
{
return a + b;
}
[numthreads(4, 1, 1)]
void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID)
{
outputBuffer[dispatchThreadID.x] = myFunc(dispatchThreadID.x) + anotherFunc(1, dispatchThreadID.y);
}
```
The function `myFunc` accesses a variable `myGlobal` that is held within a constant buffer. The function cannot be meaningfully executed without access to the context, and the context is available as a parameter passed through the `computeMain` entry point at launch. This means the *actual* signature of this function in output code will be something like
```
int32_t myFunc_0(KernelContext_0 * kernelContext_0)
{
return *(&(*(&kernelContext_0->globalParams_0))->myGlobal_0) + int(1);
}
```
The context parameter has been *threaded* into this function. This *threading* will happen to any function that accesses any state that is held in the context. This behavior also happens transitively - if a function *could* call *any* another function that requires the context, the context will be threaded through to it also.
If application code assumed `myFunc` could be called with no parameters a crash would likely ensue. Note that `anotherFunc` does not have the issue because it doesn't perform an access that needs the context, and so no context threading is added.
If a global is desired in a function that wants to be called from the application, the [`__global`](#actual-global) modifier can be used.
## <a id="prelude"/>Prelude
For C++ targets, there is code to support the Slang generated source defined within the 'prelude'. The prelude is text inserted before the Slang generated C++ source. For the Slang command line tools as well as the test infrastructure, the prelude text is simply a `#include` of `prelude/slang-cpp-prelude.h` specified with an absolute path. Doing so means other files that `slang-cpp-prelude.h` might need can be specified relatively, and include paths for the backend C/C++ compiler do not need to be modified.
The prelude needs to define
* 'Built in' types (vector, matrix, 'object'-like Texture, SamplerState etc)
* Scalar intrinsic function implementations
* Compiler based definitions/tweaks
For the Slang prelude this is split into the following files...
* 'prelude/slang-cpp-prelude.h' - Header that includes all the other requirements & some compiler tweaks
* 'prelude/slang-cpp-scalar-intrinsics.h' - Scalar intrinsic implementations
* 'prelude/slang-cpp-types.h' - The 'built in types'
* 'slang.h' - the Slang header, used for the majority of compiler based definitions
For a client application - as long as the requirements of the generated code are met, the prelude can be implemented by whatever mechanism is appropriate for the client. For example the implementation could be replaced with another implementation, or the prelude could contain all of the required text for compilation. Setting the prelude text can be achieved with the method on the global session...
```
/** Set the 'prelude' for generated code for a 'downstream compiler'.
@param passThrough The downstream compiler for generated code that will have the prelude applied to it.
@param preludeText The text added pre-pended verbatim before the generated source
That for pass-through usage, prelude is not pre-pended, preludes are for code generation only.
*/
virtual SLANG_NO_THROW void SLANG_MCALL setDownstreamCompilerPrelude(
SlangPassThrough passThrough,
const char* preludeText) = 0;
```
It may be useful to be able to include `slang-cpp-types.h` in C++ code to access the types that are used in the generated code. This introduces a problem in that the types used in the generated code might clash with types in client code. To work around this problem, you can wrap all of the types defined in the prelude with a namespace of your choosing. For example
```
#define SLANG_PRELUDE_NAMESPACE CPPPrelude
#include "../../prelude/slang-cpp-types.h"
```
Would wrap all the Slang prelude types in the namespace `CPPPrelude`, such that say a `StructuredBuffer<int32_t>` could be specified in C++ source code as `CPPPrelude::StructuredBuffer<int32_t>`.
The code that sets up the prelude for the test infrastructure and command line usage can be found in ```TestToolUtil::setSessionDefaultPrelude```. Essentially this determines the absolute path to `slang-cpp-prelude.h` and then just makes the prelude `#include "the absolute path"`.
The *default* prelude is set to the contents of the files for C++ held in the prelude directory and is held within the Slang shared library. It is therefore typically not necessary to distribute Slang with prelude files.
Language aspects
================
# Arrays passed by Value
Slang follows the HLSL convention that arrays are passed by value. This is in contrast to C/C++ where arrays are passed by reference. To make generated C/C++ follow this convention an array is turned into a 'FixedArray' struct type. Since classes and structs in C/C++ are passed by value by default, the wrapped array is also passed by value.
To get something similar to C/C++ operation the array can be marked `inout` to make it passed by reference.
Limitations
===========
# <a id="out-of-bounds"/>Out of bounds access
In HLSL code if an access is made out of bounds of a StructuredBuffer, execution proceeds. If an out of bounds read is performed, a zeroed value is returned. If an out of bounds write is performed it is effectively a no-op, as the value is discarded. On the CPU target this behavior is *not* supported by default.
For a debug CPU build an out of bounds access will assert; for a release build the behavior is by default undefined. A limited [zero index](#zero-index) out of bounds mechanism is supported, but must be enabled.
The reason is that implementing the identical GPU behavior on the CPU is difficult and/or slow. The underlying problem is `operator[]` typically returns a reference to the contained value. If this is out of bounds - it's not clear what to return, in particular because the value may be read or written and moreover elements of the type might be written. In practice this means a global zeroed value cannot be returned.
This could be somewhat supported if code generation worked as follows for, say
```
RWStructuredBuffer<float4> values;
values[3].x = 10;
```
Produces
```
template <typename T>
struct RWStructuredBuffer
{
T& at(size_t index, T& defValue) { return index < size ? values[index] : defValue; }
T* values;
size_t size;
};
RWStructuredBuffer<float4> values;
// ...
Vector<float, 4> defValue = {}; // Zero initialize such that read access returns default values
values.at(3, defValue).x = 10;
```
Note that `[]` would be turned into the `at` function, which takes the default value as a parameter provided by the caller. If this is then written to, only the defValue is corrupted. Even this mechanism may not be quite right, because if we write and then read again from the out of bounds reference in HLSL we may expect that 0 is returned, whereas here we get the value that was last written.
## <a id="zero-index"/>Zero index bound checking
If bounds checking is wanted in order to avoid undefined behavior and to limit how memory is accessed, `zero indexed` bounds checking might be appropriate. When enabled, if an access is out of bounds the value at the zero index is returned. This is quite different from the typical GPU behavior, but is fairly efficient and simple to implement. Importantly it means behavior is well defined and always 'in range', assuming there is at least one element.
To enable zero indexing bounds checking pass in the define `SLANG_ENABLE_BOUND_ZERO_INDEX` to a Slang compilation. This define is passed down to C++ and CUDA compilations, and the code in the CUDA and C++ preludes implement the feature. Note that zero indexed bounds checking will slow down accesses that are checked.
The C++ implementation of the feature can be seen by looking at the file "prelude/slang-cpp-types.h". For CUDA "prelude/slang-cuda-prelude.h".
The bounds checking macros are guarded such it is possible to replace the implementations, without directly altering the prelude.
TODO
====
# Main
* groupshared is not yet supported
* Output of header files
* Output multiple entry points
# Internal Slang compiler features
These issues are more internal Slang features/improvements
* Currently only generates C++ code, it would be fairly straight forward to support C (especially if we have 'intrinsic definitions')
* Have 'intrinsic definitions' in standard library - such that they can be generated where appropriate
+ This will simplify the C/C++ code generation as it means the Slang language will generate most of the appropriate code
* Currently 'construct' IR inst is supported as is, we may want to split out to separate instructions for specific scenarios
* Refactoring around swizzle. Currently in emit it has to check for a variety of scenarios - could be simplified with an IR pass and perhaps more specific instructions.
Slang CUDA Target Support
=========================
Slang has preliminary support for producing CUDA source, and PTX binaries using [NVRTC](https://docs.nvidia.com/cuda/nvrtc/index.html).
NOTE! NVRTC is only available for 64-bit operating systems. On Windows Visual Studio make sure you are compiling for 'x64' and/or use 64 bit Slang binaries.
# Features
* Can compile Slang source into CUDA source code
* Supports compute style shaders
* Supports a 'bindless' CPU like model
* Can compile CUDA source to PTX through 'pass through' mechanism
# Limitations
These limitations apply to Slang transpiling to CUDA.
* Only supports the 'texture object' style binding (The texture object API is only supported on devices of compute capability 3.0 or higher. )
* Samplers are not separate objects in CUDA - they are combined into a single 'TextureObject'. So samplers are effectively ignored on CUDA targets.
* When using a TextureArray.Sample (layered texture in CUDA) - the index will be treated as an int, as this is all CUDA allows
* Care must be taken when using the `WaveGetLaneIndex` wave intrinsic - it will only give the right results for appropriate launches
* CUDA 'surfaces' are used for textures which are read/write (aka RWTexture).
The following are a work in progress or not implemented but are planned to be so in the future
* Some resource types remain unsupported, and not all methods on all types are supported
# How it works
For producing PTX binaries Slang uses [NVRTC](https://docs.nvidia.com/cuda/nvrtc/index.html). The NVRTC dll/shared library has to be available to Slang (for example via the appropriate PATH) for it to be able to produce PTX.
The NVRTC compiler can be accessed directly via the pass through mechanism and is identified by the enum value `SLANG_PASS_THROUGH_NVRTC`.
Much like other targets that use downstream compilers Slang can be used to compile CUDA source directly to PTX via the pass through mechanism. The Slang command line options will broadly be mapped down to the appropriate options for the NVRTC compilation. In the API the `SlangCompileTarget` for CUDA is `SLANG_CUDA_SOURCE` and for PTX is `SLANG_PTX`. These can also be specified on the Slang command line as `-target cuda` and `-target ptx`.
## Locating NVRTC
Finding NVRTC can require some nuance if a specific version is required. On the command line the `-nvrtc-path` option can be used to set the `path` to NVRTC. Also `spProcessCommandLineArguments`/`processCommandLineArguments` with `-nvrtc-path` or `setDownstreamCompilerPath` with `SLANG_PASS_THROUGH_NVRTC` can be used to set the location and/or name of NVRTC via the API.
Important points of note are
* The name of the shared library should *not* include any extension (such as `.dll`/`.so`/`.dynlib`) or prefix (such as `lib`).
* The path also *doesn't* have to be a path; it can just be the shared library name. Doing so means it will be searched for by whatever the default mechanism is on the target.
* If a path and/or name is specified for NVRTC - this will be the *only* version searched for.
If a path/name is *not* specified for NVRTC, Slang will attempt to load a shared library called `nvrtc`. For non Windows targets this should be enough to find and load the latest version.
On Windows NVRTC dlls have a name that contains the version number, for example `nvrtc64_102_0.dll`. This will cause the load of just `nvrtc` to fail. One approach to fix this is to place the NVRTC dll and associated files in the same directory as `slang-compiler.dll`, and rename the main dll to `nvrtc.dll`. Another approach is to specify the name including the version directly on the command line, as previously discussed. For example
`-nvrtc-path nvrtc64_102_0`
will load NVRTC 10.2 assuming that version of the dll can be found via the normal lookup mechanism.
On Windows if NVRTC is not loadable directly as 'nvrtc' Slang will attempt to search for the newest version of NVRTC on your system. The places searched are...
* The instance directory (where the slang-compiler.dll and/or program exe is)
* The CUDA_PATH environment variable (if set)
* Directories in PATH that look like a CUDA installation.
If a candidate is found via an earlier mechanism, subsequent searches are not performed. If multiple candidates are found, Slang tries the newest version first.
Binding
=======
Say we have some Slang source like the following:
```
struct Thing { int a; int b; }
Texture2D<float> tex;
SamplerState sampler;
RWStructuredBuffer<int> outputBuffer;
ConstantBuffer<Thing> thing3;
[numthreads(4, 1, 1)]
void computeMain(
uint3 dispatchThreadID : SV_DispatchThreadID,
uniform Thing thing,
uniform Thing thing2)
{
// ...
}
```
This will be turned into a CUDA entry point with
```
struct UniformEntryPointParams
{
Thing thing;
Thing thing2;
};
struct UniformState
{
CUtexObject tex; // This is the combination of a texture and a sampler(!)
SamplerState sampler; // This variable exists within the layout, but its value is not used.
RWStructuredBuffer<int32_t> outputBuffer; // This is implemented as a template in the CUDA prelude. It's just a pointer, and a size
Thing* thing3; // Constant buffers map to pointers
};
// [numthreads(4, 1, 1)]
extern "C" __global__ void computeMain(UniformEntryPointParams* params, UniformState* uniformState)
```
With CUDA - the caller specifies how threading is broken up, so `[numthreads]` is available through reflection and echoed in a comment in the output source code, but it does not itself produce any code.
The `UniformState` and `UniformEntryPointParams` structs typically vary by shader. `UniformState` holds 'normal' bindings, whereas `UniformEntryPointParams` holds the uniform entry point parameters. Where specific bindings or parameters are located can be determined by reflection. The resource types in these structures map to CUDA types as follows...
`StructuredBuffer<T>`,`RWStructuredBuffer<T>` become
```
T* data;
size_t count;
```
`ByteAddressBuffer`, `RWByteAddressBuffer` become
```
uint32_t* data;
size_t sizeInBytes;
```
## Texture
Read only textures will be bound as the opaque CUDA type `CUtexObject`. This type is the combination of both a texture AND a sampler. This is somewhat different from HLSL, where separate `SamplerState` variables allow a single texture binding to be accessed with different types of sampling.
If code relies on that behavior it will be necessary to bind multiple `CUtexObject`s with different sampler settings that access the same texture data.
Slang has some preliminary support for the `TextureSampler` type - a combined Texture and SamplerState. Writing Slang code using this type exposes the combined semantics directly in the source, allowing it to target CUDA as well as other platforms.
Load is only supported for Texture1D, and the mip map selection argument is ignored. This is because CUDA provides `tex1Dfetch` but no higher dimensional equivalents. CUDA also only allows such access if the backing array is linear memory - meaning the bound texture cannot have mip maps - which makes the mip map parameter superfluous anyway. RWTexture does allow Load on other texture types.
## RWTexture
RWTexture types are converted into CUsurfObject type.
In regular CUDA it is not possible to do a format conversion on an access to a CUsurfObject. Slang does add support for hardware write conversions where they are available. To enable the feature it is necessary to attribute your RWTexture with `format`. For example
```
[format("rg16f")]
RWTexture2D<float2> rwt2D_2;
```
The format names used are the same as for [GLSL layout format types](https://www.khronos.org/opengl/wiki/Layout_Qualifier_(GLSL)). If no format is specified Slang will *assume* that the format is the same as the type specified.
Note that the format attribution is on variables/parameters/fields and not part of the type system. This means that if you have a scenario like...
```
[format("rg16f")]
RWTexture2D<float2> g_texture;
float2 getValue(RWTexture2D<float2> t)
{
return t[int2(0, 0)];
}
void doThing()
{
float2 v = getValue(g_texture);
}
```
`getValue` will receive `t` *without* the format attribute, and so will access it without the format conversion, presumably erroneously. A workaround for this specific scenario would be to attribute the parameter
```
float2 getValue([format("rg16f")] RWTexture2D<float2> t)
{
return t[int2(0, 0)];
}
```
This will only work correctly if `getValue` is called with a `t` that has that format attribute. As it stands no checking is performed on this matching so no error or warning will be produced if there is a mismatch.
There is limited software support for doing a conversion on reading. Currently this supports only 1D, 2D and 3D RWTexture backed with half1, half2 or half4. For this path to work NVRTC must have `cuda_fp16.h` and associated files available. Please check the section on `Half Support`.
If hardware read conversions are desired, this can be achieved by having a `Texture<T>` that uses the surface of a `RWTexture<T>`. Using the `Texture<T>` not only allows hardware conversion but also filtering.
It is also worth noting that CUsurfObjects in CUDA are NOT allowed to have mip maps.
By default surface access uses `cudaBoundaryModeZero`; this can be replaced using the macro `SLANG_CUDA_BOUNDARY_MODE` in the CUDA prelude. For HW format conversions the macro is `SLANG_PTX_BOUNDARY_MODE`. These boundary settings are in effect global for the whole of the kernel.
`SLANG_CUDA_BOUNDARY_MODE` can be one of
* cudaBoundaryModeZero - out-of-bounds reads return zero and out-of-bounds stores are dropped
* cudaBoundaryModeClamp - out-of-bounds accesses are clamped to the nearest valid surface location
* cudaBoundaryModeTrap - out-of-bounds accesses cause an execution trap
`SLANG_PTX_BOUNDARY_MODE` can be one of `trap`, `clamp` or `zero`. In general it is recommended to have both set to the same type of value, for example `cudaBoundaryModeZero` and `zero`.
## Sampler
Samplers are in effect ignored in CUDA output. Currently we do output a variable `SamplerState`, but this value is never accessed within the kernel and so can be ignored. More discussion on this behavior is in `Texture` section.
## Unsized arrays
Unsized arrays can be used, which are indicated by an array with no size as in `[]`. For example
```
RWStructuredBuffer<int> arrayOfArrays[];
```
With normal 'sized' arrays, the elements are just stored contiguously within wherever they are defined. With an unsized array they map to `Array<T>` which is...
```
T* data;
size_t count;
```
Note that there is no method in the shader source to get the `count`, even though on the CUDA target it is stored and easily available. This is because of the behavior on GPU targets
* That the count has to be stored elsewhere (unlike with CUDA)
* On some GPU targets there is no bounds checking - accessing outside the bound values can cause *undefined behavior*
* The elements may be laid out *contiguously* on GPU
In practice this means if you want to access the `count` in shader code it will need to be passed by another mechanism - such as within a constant buffer. It is possible that in the future support may be added to allow direct access of `count` to work transparently across targets.
## Prelude
For CUDA, part of the code needed to support Slang's generated output is defined within the 'prelude'. The prelude is text inserted before the generated CUDA source code. For the Slang command line tools as well as the test infrastructure, the prelude text is simply a `#include` of `prelude/slang-cuda-prelude.h` specified with an absolute path. Doing so means other files that `slang-cuda-prelude.h` might need can be specified relatively, and include paths for the backend compiler do not need to be modified.
The prelude needs to define
* 'Built in' types (vector, matrix, 'object'-like Texture, SamplerState etc)
* Scalar intrinsic function implementations
* Compiler based definitions/tweaks
For a client application - as long as the requirements of the generated code are met, the prelude can be implemented by whatever mechanism is appropriate for the client. For example the implementation could be replaced with another implementation, or the prelude could contain all of the required text for compilation. Setting the prelude text can be achieved with the method on the global session...
```
/** Set the 'prelude' for generated code for a 'downstream compiler'.
@param passThrough The downstream compiler for generated code that will have the prelude applied to it.
@param preludeText The text pre-pended verbatim before the generated source.
Note that for pass-through usage the prelude is not pre-pended; preludes are for code generation only.
*/
void setDownstreamCompilerPrelude(SlangPassThrough passThrough, const char* preludeText);
```
The code that sets up the prelude for the test infrastructure and command line usage can be found in ```TestToolUtil::setSessionDefaultPrelude```. Essentially this determines the absolute path to `slang-cpp-prelude.h` and then just makes the prelude `#include "the absolute path"`.
Half Support
============
Slang supports the half/float16 types on CUDA. To do so NVRTC must have access to the `cuda_fp16.h` and `cuda_fp16.hpp` files that are typically distributed as part of the CUDA SDK. When Slang detects the use of half in source, it will define `SLANG_CUDA_ENABLE_HALF` when `slang-cuda-prelude.h` is included. This will in turn try to include `cuda_fp16.h` and enable extra functionality within the prelude for half support.
Slang tries several mechanisms to locate `cuda_fp16.h` when NVRTC is initiated. The first mechanism is to look in the include paths that are passed to Slang. If `cuda_fp16.h` can be found in one of these paths, no more searching will be performed.
If this fails, the path where NVRTC is located will be searched. In that path "include" and "CUDA/include" paths will be searched. This is probably most suitable for Windows based targets, where NVRTC dll is placed along with other binaries. The "CUDA/include" path is used to try and make clear in this scenario what the contained files are for.
If this fails Slang will look for the CUDA_PATH environmental variable, as is typically set during a CUDA SDK installation.
If this fails - the prelude include of `cuda_fp16.h` will most likely fail on NVRTC invocation.
CUDA has the `__half` and `__half2` types defined in `cuda_fp16.h`. The `__half2` can produce results just as quickly as doing the same operation on `__half` - in essence for some operations `__half2` is [SIMD](https://en.wikipedia.org/wiki/SIMD) like. The half implementation in Slang tries to take advantage of this optimization.
Since Slang supports vectors up to 4 elements wide, it has to build on CUDA's half support. The types `__half3` and `__half4` are implemented in `slang-cuda-prelude.h` for this reason. It is worth noting that `__half3` is made up of a `__half2` and a `__half`. As `__half2` is 4 byte aligned, this means `__half3` is actually 8 bytes, rather than the 6 bytes that might be expected.
One area where this optimization isn't fully used is in comparisons - in effect Slang treats all the vector/matrix half comparisons as if they are scalar. This could perhaps be improved on in the future. Doing so would require using features that are not directly available in the CUDA headers.
Wave Intrinsics
===============
There is broad support for [HLSL Wave intrinsics](https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/hlsl-shader-model-6-0-features-for-direct3d-12), including support for [SM 6.5 intrinsics](https://microsoft.github.io/DirectX-Specs/d3d/HLSL_ShaderModel6_5.html).
Most Wave intrinsics will work with vector, matrix or scalar types of typical built in types - `uint`, `int`, `float`, `double`, `uint64_t`, `int64_t`.
The support is provided via both the Slang core module and the Slang CUDA prelude found in 'prelude/slang-cuda-prelude.h'. Many Wave intrinsics are not directly applicable within CUDA, which supplies lower level mechanisms. The implementation of most Wave functions works most optimally when all lanes of a 'Wave' are used. If all lanes from index 0 to pow2(n) - 1 are used (which is also true if all lanes are used) a binary reduction is typically applied. If this is not the case the implementation falls back on a slow path which is linear in the number of active lanes, and so is typically significantly less performant.
For a more concrete example, take
```
int sum = WaveActiveSum(...);
```
When computing the sum, if all lanes are active (32 on CUDA), the computation will require 5 steps to complete (2^5 = 32). If just one lane is not being used it will take 31 steps to complete (because it is now linear in the number of active lanes). So having just one lane disabled requires roughly 6 times as many steps. If lanes 0 - 15 are active, it will take 4 steps to complete (2^4 = 16).
In the future it may be possible to improve the performance of the 'slow' path; however it will generally remain most efficient for lanes 0 to pow2(n) - 1 to be active.
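The step counts described above can be modelled directly. This is a sketch of the cost model only (not the prelude's actual logic), assuming the active lanes form a contiguous prefix starting at lane 0:

```python
import math

def wave_sum_steps(active_lanes):
    """Model of the WaveActiveSum step counts described in this section."""
    if active_lanes & (active_lanes - 1) == 0:
        # lanes 0 .. 2^n - 1 active: binary reduction in log2 steps
        return int(math.log2(active_lanes))
    # otherwise: slow path, linear in the number of active lanes
    return active_lanes

print(wave_sum_steps(32))  # full warp: 5 steps
print(wave_sum_steps(31))  # one lane disabled: 31 steps
print(wave_sum_steps(16))  # lanes 0-15: 4 steps
```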
It is also worth noting that the performance of lane-communicating intrinsics is impacted by the 'size' of the data communicated. The size here is at minimum the number of built in scalar types used in the processing, because the CUDA language only allows direct communication of built in scalar types.
Thus
```
int3 v = ...;
int3 sum = WaveActiveSum(v);
```
Will require 3 times as many steps as the earlier scalar example just using a single int.
## WaveGetLaneIndex
'WaveGetLaneIndex' defaults to `(threadIdx.x & SLANG_CUDA_WARP_MASK)`. Depending on how the kernel is launched this could be incorrect. There are other ways to get lane index, for example using inline assembly. This mechanism though is apparently slower than the simple method used here. There is support for using the asm mechanism in the CUDA prelude using the `SLANG_USE_ASM_LANE_ID` preprocessor define to enable the feature.
There is potential to calculate the lane id using the [numthreads] markup in Slang/HLSL, but that also requires some assumptions of how that maps to a lane index.
## Unsupported Intrinsics
* Intrinsics which only work in pixel shaders
+ QuadXXXX intrinsics
OptiX Support
=============
Slang supports OptiX for raytracing. To compile raytracing programs, NVRTC must have access to the `optix.h` and dependent files that are typically distributed as part of the OptiX SDK. When Slang detects the use of raytracing in source, it will define `SLANG_CUDA_ENABLE_OPTIX` when `slang-cuda-prelude.h` is included. This will in turn try to include `optix.h`.
Slang tries several mechanisms to locate `optix.h` when NVRTC is initiated. The first mechanism is to look in the include paths that are passed to Slang. If `optix.h` can be found in one of these paths, no more searching will be performed.
If this fails, the default OptiX SDK install locations are searched. On Windows this is `%{PROGRAMDATA}\NVIDIA Corporation\OptiX SDK X.X.X\include`. On Linux this is `${HOME}/NVIDIA-OptiX-SDK-X.X.X-suffix`.
If OptiX headers cannot be found, compilation will fail.
Limitations
===========
Some features are not available because they cannot be mapped with appropriate behavior to a target. Others are unavailable simply because resources have not yet been devoted to these more unusual features.
* Not all Wave intrinsics are supported
* There is not complete support for all methods on 'objects' like textures etc.
* Does not currently support combined 'TextureSampler'. A Texture behaves equivalently to a TextureSampler and Samplers are ignored.
* Half support requires NVRTC to be able to locate `cuda_fp16.h` (see the Half Support section); without it, half is unavailable
* GetDimensions is not available on any Texture type currently - as there doesn't appear to be a CUDA equivalent
Language aspects
================
# Arrays passed by Value
Slang follows the HLSL convention that arrays are passed by value. This is in contrast with CUDA where arrays follow C++ conventions and are passed by reference. To make generated CUDA follow this convention an array is turned into a 'FixedArray' struct type.
To get behavior more similar to CUDA/C++, an array parameter can be marked `in out` or `inout` so that it is passed by reference.

# Debugging Slang
This document gives examples showing how to run debuggers in the Slang codebase.
Follow the [Building Slang From Source](/docs/building.md) instructions first.
## Visual Studio
This repo includes multiple `*.natvis` files which Visual Studio picks up
automatically; no extra configuration is required.
## LLDB
If you use [LLDB][], we provide a `.lldbinit` file which enables data formatters
for types in the Slang codebase. You can use this with LLDB in your terminal via
the [`--local-lldbinit`][] flag; for example:
```
$ cmake --build --preset debug
$ lldb --local-lldbinit build/Debug/bin/slangc -- tests/byte-code/hello.slang -dump-ir
(lldb) breakpoint set --name dumpIR
(lldb) run
```
LLDB can be used with either GCC or Clang, but Clang seems to behave better
about respecting breakpoint locations and not having missing variables.
### VS Code
If instead you prefer to debug within VS Code, you can run LLDB via the
[CodeLLDB][] extension. For example, to recreate the same debugging session as
above, create a `.vscode/tasks.json` file with these contents:
```json
{
"version": "2.0.0",
"tasks": [
{
"label": "Debug build",
"type": "shell",
"command": "cmake",
"args": ["--build", "--preset", "debug"]
}
]
}
```
Then create a `.vscode/launch.json` file with these contents:
```json
{
"version": "0.2.0",
"configurations": [
{
"name": "LLDB",
"preLaunchTask": "Debug build",
"type": "lldb",
"request": "launch",
"initCommands": ["command source .lldbinit"],
"program": "build/Debug/bin/slangc",
"args": ["tests/byte-code/hello.slang", "-dump-ir"]
}
]
}
```
Finally, place any breakpoints you want, and hit F5.
[`--local-lldbinit`]: https://lldb.llvm.org/man/lldb.html#cmdoption-lldb-local-lldbinit
[codelldb]: https://marketplace.visualstudio.com/items?itemName=vadimcn.vscode-lldb
[lldb]: https://lldb.llvm.org/index.html

---
layout: deprecated
permalink: "docs/user-guide/a1-02-slangpy"
---
Using Slang to Write PyTorch Kernels
=========================================================
> #### Note
> This documentation is about `slang-torch`, a way to use Slang with Python and PyTorch.
> For new projects, we recommend exploring <a href="https://slangpy.shader-slang.org">SlangPy</a> as an alternative.
> We plan to deprecate `slang-torch` in favor of SlangPy in the near future, and we will communicate any plans in advance.
If you are a PyTorch user seeking to write complex, high-performance, and automatically differentiated kernel functions using a per-thread programming model, we invite you to try Slang. Slang is a cutting-edge shading language that provides a straightforward way to define kernel functions that run incredibly fast in graphics applications. With the latest addition of automatic differentiation and PyTorch interop features, Slang offers an efficient solution for developing auto-differentiated kernels that run at lightning speed with a strongly typed, per-thread programming model.
One of the primary advantages of a per-thread programming model in kernel programming is the elimination of concerns regarding maintaining masks for branches. When developing a kernel in Slang, you can use all control flow statements, composite data types (structs, arrays, etc.), and function calls without additional effort. Code created with these language constructs can be automatically differentiated by the compiler without any restrictions. Additionally, Slang is a strongly typed language, which ensures that you will never encounter type errors at runtime. Most code errors can be identified as you type thanks to the [compiler's coding assistance service](https://marketplace.visualstudio.com/items?itemName=shader-slang.slang-language-extension), further streamlining the development process.
In addition, using a per-thread programming model also results in more optimized memory usage. When writing a kernel in Slang, most intermediate results do not need to be written out to global memory and then read back, reducing global memory bandwidth consumption and the delay caused by these memory operations. As a result, a Slang kernel can typically run at higher efficiency compared to the traditional bulk-synchronous programming model.
## Getting Started with SlangTorch
In this tutorial, we will use a simple example to walk through the steps to use Slang in your PyTorch project.
### Installation
`slangtorch` is available via PyPI, so you can install it simply through
```sh
pip install slangtorch
```
Note that `slangtorch` requires `torch` with CUDA support. See the [pytorch](https://pytorch.org/) installation page to find the right version for your platform.
You can check that you have the right installation by running:
```sh
python -c "import torch; print(f'cuda: {torch.cuda.is_available()}')"
```
### Writing Slang kernels for `slangtorch` >= **v1.1.5**
From **v2023.4.0**, Slang supports auto-binding features that make it easier than ever to invoke Slang kernels from python, and interoperate seamlessly with `pytorch` tensors.
Here's a barebones example of a simple squaring kernel written in Slang (`square.slang`):
```csharp
[AutoPyBindCUDA]
[CUDAKernel]
void square(TensorView<float> input, TensorView<float> output)
{
// Get the 'global' index of this thread.
uint3 dispatchIdx = cudaThreadIdx() + cudaBlockIdx() * cudaBlockDim();
// If the thread index is beyond the input size, exit early.
if (dispatchIdx.x >= input.size(0))
return;
output[dispatchIdx.x] = input[dispatchIdx.x] * input[dispatchIdx.x];
}
```
This code follows the standard pattern of a typical CUDA kernel function. It takes as input
two tensors, `input` and `output`.
It first obtains the global dispatch index of the current thread, performs a range check to make sure we don't read or write out of the bounds of the input and output tensors, and then computes the per-element square and stores it at the corresponding location in the `output` tensor.
`slangtorch` works by compiling kernels to CUDA and it identifies the functions to compile by checking for the `[CUDAKernel]` attribute.
The second attribute `[AutoPyBindCUDA]` allows us to call `square` directly from python without having to write any host code. If you would like to write the host code yourself for finer control, see the other version of this example [here](#manually-binding-kernels).
You can now simply invoke this kernel from python:
```python
import torch
import slangtorch
m = slangtorch.loadModule('square.slang')
A = torch.randn((1024,), dtype=torch.float).cuda()
output = torch.zeros_like(A).cuda()
# Number of threads launched = blockSize * gridSize
m.square(input=A, output=output).launchRaw(blockSize=(32, 1, 1), gridSize=(64, 1, 1))
print(output)
```
The call `slangtorch.loadModule("square.slang")` returns a scope that contains a handle to the `square` kernel.
The kernel can be invoked by
1. calling `square` and binding `torch` tensors as arguments for the kernel, and then
2. launching it using `launchRaw()` by specifying CUDA launch arguments to `blockSize` & `gridSize`. (Refer to the [CUDA documentation](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications) for restrictions around `blockSize`)
Note that for semantic clarity reasons, calling a kernel requires the use of keyword arguments with names that are lifted from the `.slang` implementation.
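The launch above covers 32 × 64 = 2048 threads for a 1024-element tensor, relying on the in-kernel range check to discard the excess. A plain-Python sketch (not part of slangtorch) of the usual ceiling-division grid sizing:

```python
def grid_size_1d(num_elements, block_size):
    # Ceiling division: every element gets a thread, and the kernel's
    # range check discards any excess threads in the final block.
    return (num_elements + block_size - 1) // block_size

print(grid_size_1d(1024, 32))  # 32 blocks cover 1024 elements exactly
print(grid_size_1d(1025, 32))  # 33: non-multiples round up
```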
### Invoking derivatives of kernels using slangtorch
The `[AutoPyBindCUDA]` attribute can also be used on differentiable functions defined in Slang, and will automatically bind the derivatives. To do this, simply add the `[Differentiable]` attribute.
One key point is that the basic `TensorView<T>` objects are not differentiable. They can be used as buffers for data that does not require derivatives, or even as buffers for the manual accumulation of derivatives.
Instead, use the `DiffTensorView` type for when you need differentiable tensors. Currently, `DiffTensorView` only supports the `float` dtype variety.
Here's a barebones example of a differentiable version of `square`:
```csharp
[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void square(DiffTensorView input, DiffTensorView output)
{
uint3 dispatchIdx = cudaThreadIdx() + cudaBlockIdx() * cudaBlockDim();
if (dispatchIdx.x >= input.size(0))
return;
output[dispatchIdx.x] = input[dispatchIdx.x] * input[dispatchIdx.x];
}
```
Now, `slangtorch.loadModule("square.slang")` returns a scope with three callable handles `square`, `square.fwd` for the forward-mode derivative & `square.bwd` for the reverse-mode derivative.
You can invoke `square()` normally to get the same effect as the previous example, or invoke `square.fwd()` / `square.bwd()` by binding pairs of tensors to compute the derivatives.
```python
import torch
import slangtorch
m = slangtorch.loadModule('square.slang')
input = torch.tensor((0, 1, 2, 3, 4, 5), dtype=torch.float).cuda()
output = torch.zeros_like(input).cuda()
# Invoke normally
m.square(input=input, output=output).launchRaw(blockSize=(6, 1, 1), gridSize=(1, 1, 1))
print(output)
# Invoke reverse-mode autodiff by first allocating tensors to hold the gradients
input = torch.tensor((0, 1, 2, 3, 4, 5), dtype=torch.float).cuda()
input_grad = torch.zeros_like(input).cuda()
output = torch.zeros_like(input)
# Pass in all 1s as the output derivative for our example
output_grad = torch.ones_like(output)
m.square.bwd(
input=(input, input_grad), output=(output, output_grad)
).launchRaw(
blockSize=(6, 1, 1), gridSize=(1, 1, 1))
# Derivatives get propagated to input_grad
print(input_grad)
# Note that the derivatives in output_grad are 'consumed'.
# i.e. all zeros after the call.
print(output_grad)
```
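As a sanity check, the expected contents of `input_grad` can be derived by hand: d(x^2)/dx = 2x, and the chain rule scales that by the incoming output gradient (all ones here). A plain-Python sketch, no torch needed:

```python
# Hand-derived reference gradient for the square kernel above.
inputs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
output_grad = [1.0] * len(inputs)
expected_input_grad = [2.0 * x * g for x, g in zip(inputs, output_grad)]
print(expected_input_grad)  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
```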
`slangtorch` also binds the forward-mode version of your kernel (propagate derivatives of inputs to the output) which can be invoked the same way using `module.square.fwd()`
You can refer to [this documentation](autodiff) for a detailed reference of Slang's automatic differentiation feature.
### Wrapping your kernels as pytorch functions
`pytorch` offers an easy way to define a custom operation using `torch.autograd.Function`, and defining the `.forward()` and `.backward()` members.
This can be a very helpful way to wrap your Slang kernels as pytorch-compatible operations. Here's an example of the `square` kernel as a differentiable pytorch function.
```python
import torch
import slangtorch
m = slangtorch.loadModule("square.slang")
class MySquareFunc(torch.autograd.Function):
@staticmethod
def forward(ctx, input):
output = torch.zeros_like(input)
kernel_with_args = m.square(input=input, output=output)
kernel_with_args.launchRaw(
blockSize=(32, 32, 1),
gridSize=((input.shape[0] + 31) // 32, (input.shape[1] + 31) // 32, 1))
ctx.save_for_backward(input, output)
return output
@staticmethod
def backward(ctx, grad_output):
(input, output) = ctx.saved_tensors
input_grad = torch.zeros_like(input)
# Note: When using DiffTensorView, grad_output gets 'consumed' during the reverse-mode.
# If grad_output may be reused, consider calling grad_output = grad_output.clone()
#
kernel_with_args = m.square.bwd(input=(input, input_grad), output=(output, grad_output))
kernel_with_args.launchRaw(
blockSize=(32, 32, 1),
gridSize=((input.shape[0] + 31) // 32, (input.shape[1] + 31) // 32, 1))
return input_grad
```
Now we can use the autograd function `MySquareFunc` in our python script:
```python
x = torch.tensor((3.0, 4.0), requires_grad=True, device='cuda')
print(f"X = {x}")
y_pred = MySquareFunc.apply(x)
loss = y_pred.sum()
loss.backward()
print(f"dX = {x.grad.cpu()}")
```
Output:
```
X = tensor([3., 4.],
device='cuda:0', requires_grad=True)
dX = tensor([6., 8.])
```
And that's it! `slangtorch.loadModule` uses JIT compilation to compile your Slang source into CUDA binary.
It may take a little longer the first time you execute the script, but the compiled binaries will be cached and as long as the kernel code is not changed, future runs will not rebuild the CUDA kernel.
Because the PyTorch JIT system requires `ninja`, you need to make sure `ninja` is installed on your system and discoverable from the current environment. You also need a C++ compiler available on the system; on Windows, this means Visual Studio needs to be installed.
## Specializing shaders using slangtorch
`slangtorch.loadModule` allows specialization parameters to be specified since it might be easier to write shaders with placeholder definitions that can be substituted at load-time.
For instance, here's a sphere tracer that uses a _compile-time_ specialization parameter for its maximum number of steps (`N`):
```csharp
float3 sphereTrace<let N:int>(Ray ray, SDF sdf)
{
var pt = ray.o;
for (int i = 0; i < N; i++)
{
pt += sdf.eval(pt) * ray.d;
}
return pt;
}
float render(Ray ray)
{
// Use N=20 for sphere tracing.
float3 pt = sphereTrace<20>(ray, sdf);
return shade(pt, sdf.normal());
}
```
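As a sanity check of the stepping rule, here is a pure-Python sphere tracer against an analytic sphere SDF (illustrative only; plain scalar math instead of `float3`, and the `sphere_sdf`/`sphere_trace` names are invented for this sketch):

```python
import math

def sphere_sdf(p, radius=1.0):
    """Signed distance from point p to a sphere at the origin."""
    return math.sqrt(p[0]**2 + p[1]**2 + p[2]**2) - radius

def sphere_trace(origin, direction, n_steps):
    """March n_steps along `direction`: each step advances by the SDF value,
    which can never overshoot the nearest surface."""
    pt = list(origin)
    for _ in range(n_steps):
        dist = sphere_sdf(pt)
        pt = [pt[i] + dist * direction[i] for i in range(3)]
    return pt

# A ray starting at z = -3 aimed at a unit sphere lands on the surface:
pt = sphere_trace((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), n_steps=20)
print(round(sphere_sdf(pt), 6))  # -> 0.0
```

More steps buy accuracy near grazing rays, which is exactly why `N` is worth exposing as a specialization parameter.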
However, instead of using a fixed `20` steps, the renderer can be configured to use an arbitrary compile-time constant.
```csharp
// Compile-time constant. Expect "MAX_STEPS" to be set by the loadModule call.
static const uint kMaxSteps = MAX_STEPS;
float render(Ray ray)
{
float3 pt = sphereTrace<kMaxSteps>(ray, sdf);
return shade(pt, sdf.normal());
}
```
Then multiple versions of this shader can be compiled from Python using the `defines` argument:
```python
import slangtorch
sdfRenderer20Steps = slangtorch.loadModule('sdf.slang', defines={"MAX_STEPS": 20})
sdfRenderer50Steps = slangtorch.loadModule('sdf.slang', defines={"MAX_STEPS": 50})
...
```
This is often helpful for code re-use, parameter sweeping, comparison/ablation studies, and more, from the convenience of Python.
## Back-propagating Derivatives through Complex Access Patterns
In most common scenarios, a kernel function accesses input tensors in a complex pattern instead of mapping 1:1 from an input element to an output element, like the `square` example shown above. When a kernel function accesses many different elements from the input tensors and uses them to compute an output element, the derivative of each input element can't be represented directly as a function parameter, like the `x` in `square(x)`.
Consider a 3x3 box filtering kernel that computes, for each pixel in a 2D image, the average value of its surrounding 3x3 pixel block. We can write a Slang function that computes the value of an output pixel:
```csharp
float computeOutputPixel(TensorView<float> input, uint2 pixelLoc)
{
int width = input.size(0);
int height = input.size(1);
// Track the sum of neighboring pixels and the number
// of pixels currently accumulated.
int count = 0;
float sumValue = 0.0;
// Iterate through the surrounding area.
for (int offsetX = -1; offsetX <= 1; offsetX++)
{
// Skip out of bounds pixels.
int x = pixelLoc.x + offsetX;
if (x < 0 || x >= width) continue;
for (int offsetY = -1; offsetY <= 1; offsetY++)
{
int y = pixelLoc.y + offsetY;
if (y < 0 || y >= height) continue;
sumValue += input[x, y];
count++;
}
}
// Compute the average value.
sumValue /= count;
return sumValue;
}
```
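To make the boundary handling concrete, here is a pure-Python reference of the same averaging logic (plain nested lists instead of tensors; illustrative only, not `slangtorch` API):

```python
def box_filter(image):
    """3x3 box filter with out-of-bounds neighbors skipped,
    mirroring computeOutputPixel above."""
    w, h = len(image), len(image[0])
    out = [[0.0] * h for _ in range(w)]
    for px in range(w):
        for py in range(h):
            count, total = 0, 0.0
            for ox in (-1, 0, 1):
                x = px + ox
                if x < 0 or x >= w:
                    continue
                for oy in (-1, 0, 1):
                    y = py + oy
                    if y < 0 or y >= h:
                        continue
                    total += image[x][y]
                    count += 1
            out[px][py] = total / count
    return out

# A corner pixel averages only its in-bounds 2x2 neighborhood:
img = [[1.0, 2.0],
       [3.0, 4.0]]
print(box_filter(img)[0][0])  # -> 2.5
```

Note that `count` varies per pixel at the image border, which matters for the backward pass below.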
We can define our kernel function to compute the entire output image by calling `computeOutputPixel`:
```csharp
[CudaKernel]
void boxFilter_fwd(TensorView<float> input, TensorView<float> output)
{
uint2 pixelLoc = (cudaBlockIdx() * cudaBlockDim() + cudaThreadIdx()).xy;
int width = input.size(0);
int height = input.size(1);
if (pixelLoc.x >= width) return;
if (pixelLoc.y >= height) return;
float outputValueAtPixel = computeOutputPixel(input, pixelLoc);
// Write to output tensor.
output[pixelLoc] = outputValueAtPixel;
}
```
How do we define the backward derivative propagation kernel? Note that in this example, there
isn't a function like `square` that we can just mark as `[Differentiable]` and
call `bwd_diff(square)` to get back the derivative of an input parameter.
In this example, the input comes from multiple elements in a tensor. How do we propagate the
derivatives to those input elements?
The solution is to wrap tensor access with a custom function:
```csharp
float getInputElement(
TensorView<float> input,
TensorView<float> inputGradToPropagateTo,
uint2 loc)
{
return input[loc];
}
```
Note that the `getInputElement` function simply returns `input[loc]` and is not using the
`inputGradToPropagateTo` parameter. That is intended. The `inputGradToPropagateTo` parameter
is used to hold the backward propagated derivatives of each input element, and is reserved for later use.
Now we can replace all direct accesses to `input` with a call to `getInputElement`. `computeOutputPixel` can then be implemented as follows:
```csharp
[Differentiable]
float computeOutputPixel(
TensorView<float> input,
TensorView<float> inputGradToPropagateTo,
uint2 pixelLoc)
{
int width = input.size(0);
int height = input.size(1);
// Track the sum of neighboring pixels and the number
// of pixels currently accumulated.
int count = 0;
float sumValue = 0.0;
// Iterate through the surrounding area.
for (int offsetX = -1; offsetX <= 1; offsetX++)
{
// Skip out of bounds pixels.
int x = pixelLoc.x + offsetX;
if (x < 0 || x >= width) continue;
for (int offsetY = -1; offsetY <= 1; offsetY++)
{
int y = pixelLoc.y + offsetY;
if (y < 0 || y >= height) continue;
sumValue += getInputElement(input, inputGradToPropagateTo, uint2(x, y));
count++;
}
}
// Compute the average value.
sumValue /= count;
return sumValue;
}
```
The main changes compared to our original version of `computeOutputPixel` are:
- Added an `inputGradToPropagateTo` parameter.
- Replaced `input[x, y]` with a call to `getInputElement`.
- Added a `[Differentiable]` attribute to the function.
With that, we can define our backward kernel function:
```csharp
[CudaKernel]
void boxFilter_bwd(
TensorView<float> input,
TensorView<float> resultGradToPropagateFrom,
TensorView<float> inputGradToPropagateTo)
{
uint2 pixelLoc = (cudaBlockIdx() * cudaBlockDim() + cudaThreadIdx()).xy;
int width = input.size(0);
int height = input.size(1);
if (pixelLoc.x >= width) return;
if (pixelLoc.y >= height) return;
bwd_diff(computeOutputPixel)(input, inputGradToPropagateTo, pixelLoc);
}
```
The kernel function simply calls `bwd_diff(computeOutputPixel)` without taking any return values from the call
and without writing to any elements in the final `inputGradToPropagateTo` tensor. But when exactly does the propagated
output get written to the output gradient tensor (`inputGradToPropagateTo`)?
That logic is defined in our final piece of code:
```csharp
[BackwardDerivativeOf(getInputElement)]
void getInputElement_bwd(
TensorView<float> input,
TensorView<float> inputGradToPropagateTo,
uint2 loc,
float derivative)
{
float oldVal;
inputGradToPropagateTo.InterlockedAdd(loc, derivative, oldVal);
}
```
Here, we are providing a custom defined backward propagation function for `getInputElement`.
In this function, we simply add `derivative` to the element in `inputGradToPropagateTo` tensor.
When we call `bwd_diff(computeOutputPixel)` in `boxFilter_bwd`, the Slang compiler will automatically
differentiate all operations and function calls in `computeOutputPixel`. By wrapping the tensor element access
with `getInputElement` and by providing a custom backward propagation function of `getInputElement`, we are effectively
telling the compiler what to do when a derivative propagates to an input tensor element. Inside the body
of `getInputElement_bwd`, we define exactly that: atomically add the propagated derivative to the input element
in the `inputGradToPropagateTo` tensor. Therefore, after running `boxFilter_bwd`, the `inputGradToPropagateTo` tensor will contain all the
back-propagated derivative values.
Again, to understand all the details of the automatic differentiation system, please refer to the
[Automatic Differentiation](autodiff) chapter for a detailed explanation.
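The effect of the atomic-add trick can be checked without a GPU at all. The pure-Python sketch below (illustrative only, not `slangtorch` API) back-propagates an output gradient exactly the way `getInputElement_bwd` does, with one scatter-add per element read:

```python
def box_filter_bwd(w, h, grad_out):
    """Scatter grad_out[px][py] / count to every input element that
    computeOutputPixel reads, mirroring getInputElement_bwd's
    InterlockedAdd."""
    grad_in = [[0.0] * h for _ in range(w)]
    for px in range(w):
        for py in range(h):
            nbrs = [(px + ox, py + oy)
                    for ox in (-1, 0, 1) for oy in (-1, 0, 1)
                    if 0 <= px + ox < w and 0 <= py + oy < h]
            for (x, y) in nbrs:
                # d(average)/d(input[x][y]) = 1 / count for this output pixel.
                grad_in[x][y] += grad_out[px][py] / len(nbrs)
    return grad_in

# With an all-ones output gradient on a 2x2 image, every input element is
# read by all 4 output pixels, each contributing 1/4:
ones = [[1.0, 1.0], [1.0, 1.0]]
print(box_filter_bwd(2, 2, ones))  # -> [[1.0, 1.0], [1.0, 1.0]]
```

Because several output pixels read the same input element, the contributions must accumulate; this is why the GPU version needs an atomic add rather than a plain store.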
## Manually binding kernels
`[AutoPyBindCUDA]` works for most use cases, but in certain situations, it may be necessary to write the *host* function by hand. The host function can also be written in Slang, and `slangtorch` handles its compilation to C++.
Here's the same `square` example from before:
```csharp
// square.slang
float compute_square(float x)
{
return x * x;
}
[CudaKernel]
void square_kernel(TensorView<float> input, TensorView<float> output)
{
uint3 globalIdx = cudaBlockIdx() * cudaBlockDim() + cudaThreadIdx();
if (globalIdx.x >= input.size(0) || globalIdx.y >= input.size(1))
return;
float result = compute_square(input[globalIdx.xy]);
output[globalIdx.xy] = result;
}
```
To manually invoke this kernel, we then need to write a CPU (host) function that defines how this kernel is dispatched. This can be defined in the same Slang file:
```csharp
[TorchEntryPoint]
TorchTensor<float> square(TorchTensor<float> input)
{
var result = TorchTensor<float>.zerosLike(input);
let blockCount = uint3(1);
let groupSize = uint3(result.size(0), result.size(1), 1);
__dispatch_kernel(square_kernel, blockCount, groupSize)(input, result);
return result;
}
```
Here, we mark the function with the `[TorchEntryPoint]` attribute, so it will be compiled to C++ and exported as a python callable.
Since this is a host function, we can perform tensor allocations. For instance, `square()` calls `TorchTensor<float>.zerosLike` to allocate a 2D-tensor that has the same size as the input.
`zerosLike` returns a `TorchTensor<float>` object that represents a CPU handle of a PyTorch tensor.
Then we launch `square_kernel` with the `__dispatch_kernel` syntax. Note that we can directly pass
`TorchTensor<float>` arguments to a `TensorView<float>` parameter and the compiler will automatically convert the type and obtain a view into the tensor that can be accessed by the GPU kernel function.
### Calling a `[TorchEntryPoint]` function from Python
You can use the following code to call `square` from Python:
```python
import torch
import slangtorch
m = slangtorch.loadModule("square.slang")
x = torch.randn(2,2)
print(f"X = {x}")
y = m.square(x)
print(f"Y = {y.cpu()}")
```
Result output:
```
X = tensor([[ 0.1407, 0.6594],
[-0.8978, -1.7230]])
Y = tensor([[0.0198, 0.4349],
[0.8060, 2.9688]])
```
### Manual binding for kernel derivatives
The above example demonstrates how to write a simple kernel function in Slang and call it from Python.
Another major benefit of using Slang is that the Slang compiler supports generating backward derivative
propagation functions automatically.
In the following section, we walk through how to use Slang to generate a backward propagation function
for `square`, and expose it to PyTorch as an autograd function.
First we need to tell the Slang compiler that the `square` function should be considered differentiable, so the compiler can generate a backward derivative propagation function for it:
```csharp
[Differentiable]
float square(float x)
{
return x * x;
}
```
This is done by simply adding a `[Differentiable]` attribute to our `square` function.
With that, we can now define `square_bwd_kernel` that performs backward propagation as:
```csharp
[CudaKernel]
void square_bwd_kernel(TensorView<float> input, TensorView<float> grad_out, TensorView<float> grad_propagated)
{
uint3 globalIdx = cudaBlockIdx() * cudaBlockDim() + cudaThreadIdx();
if (globalIdx.x >= input.size(0) || globalIdx.y >= input.size(1))
return;
DifferentialPair<float> dpInput = diffPair(input[globalIdx.xy]);
var gradInElem = grad_out[globalIdx.xy];
bwd_diff(square)(dpInput, gradInElem);
grad_propagated[globalIdx.xy] = dpInput.d;
}
```
Note that the function follows the same structure as `square_kernel`, with the only difference being that
instead of calling into `square` to compute the forward value for each tensor element, we call `bwd_diff(square)`,
the automatically generated backward propagation function of `square`.
`bwd_diff(square)` will have the following signature:
```csharp
void bwd_diff_square(inout DifferentialPair<float> dpInput, float dOut);
```
Where the first parameter, `dpInput` represents a pair of original and derivative value for `input`, and the second parameter,
`dOut`, represents the initial derivative with regard to some latent variable that we wish to back-prop through. The resulting
derivative will be stored in `dpInput.d`. For example:
```csharp
// construct a pair where the primal value is 3, and derivative value is 0.
var dp = diffPair(3.0);
bwd_diff(square)(dp, 1.0);
// dp.d is now 6.0
```
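The pair-in, derivative-out convention can be modeled in a few lines of plain Python (a conceptual sketch of the semantics, not anything generated or checked by the Slang compiler):

```python
class DiffPair:
    """Primal value plus accumulated derivative, like DifferentialPair<float>."""
    def __init__(self, p, d=0.0):
        self.p, self.d = p, d

def bwd_diff_square(dp, d_out):
    # d(x*x)/dx = 2x, scaled by the incoming derivative d_out.
    dp.d += 2.0 * dp.p * d_out

dp = DiffPair(3.0)
bwd_diff_square(dp, 1.0)
print(dp.d)  # -> 6.0
```

The `+=` matters: when the same value feeds several uses, each use's backward call accumulates into the same `.d` field.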
Similar to `square`, we can define the host-side function `square_bwd` as:
```csharp
[TorchEntryPoint]
TorchTensor<float> square_bwd(TorchTensor<float> input, TorchTensor<float> grad_out)
{
var grad_propagated = TorchTensor<float>.zerosLike(input);
let blockCount = uint3(1);
let groupSize = uint3(input.size(0), input.size(1), 1);
__dispatch_kernel(square_bwd_kernel, blockCount, groupSize)(input, grad_out, grad_propagated);
return grad_propagated;
}
```
## Builtin Library Support for PyTorch Interop
As shown in the previous sections, Slang defines the `TorchTensor<T>` and `TensorView<T>` types for interop with PyTorch
tensors. `TorchTensor<T>` represents the CPU view of a tensor and provides methods to allocate a new tensor object.
`TensorView<T>` represents the GPU view of a tensor and provides accessors to read and write tensor data.
Following is a list of built-in methods and attributes for PyTorch interop.
### `TorchTensor` methods
#### `static TorchTensor<T> TorchTensor<T>.alloc(uint x, uint y, ...)`
Allocates a new PyTorch tensor with the given dimensions. If `T` is a vector type, the length of the vector is implicitly included as the last dimension.
For example, `TorchTensor<float3>.alloc(4, 4)` allocates a 3D tensor of size `(4,4,3)`.
#### `static TorchTensor<T> TorchTensor<T>.emptyLike(TorchTensor<T> other)`
Allocates a new PyTorch tensor that has the same dimensions as `other` without initializing it.
#### `static TorchTensor<T> TorchTensor<T>.zerosLike(TorchTensor<T> other)`
Allocates a new PyTorch tensor that has the same dimensions as `other` and initialize it to zero.
#### `uint TorchTensor<T>.dims()`
Returns the tensor's dimension count.
#### `uint TorchTensor<T>.size(int dim)`
Returns the tensor's size (in number of elements) at `dim`.
#### `uint TorchTensor<T>.stride(int dim)`
Returns the tensor's stride (in bytes) at `dim`.
### `TensorView` methods
#### `TensorView<T>.operator[uint x, uint y, ...]`
Provides an accessor to data content in a tensor.
#### `TensorView<T>.operator[vector<uint, N> index]`
Provides an accessor to data content in a tensor, indexed by a uint vector.
`tensor[uint3(1,2,3)]` is equivalent to `tensor[1,2,3]`.
#### `uint TensorView<T>.dims()`
Returns the tensor's dimension count.
#### `uint TensorView<T>.size(int dim)`
Returns the tensor's size (in number of elements) at `dim`.
#### `uint TensorView<T>.stride(int dim)`
Returns the tensor's stride (in bytes) at `dim`.
#### `void TensorView<T>.fillZero()`
Fills the tensor with zeros. Modifies the tensor in-place.
#### `void TensorView<T>.fillValue(T value)`
Fills the tensor with the specified value. Modifies the tensor in-place.
#### `T* TensorView<T>.data_ptr_at(vector<uint, N> index)`
Returns a pointer to the element at `index`.
#### `void TensorView<T>.InterlockedAdd(vector<uint, N> index, T val, out T oldVal)`
Atomically adds `val` to the element at `index`.
#### `void TensorView<T>.InterlockedMin(vector<uint, N> index, T val, out T oldVal)`
Atomically computes the min of `val` and the element at `index`. Available for 32 and 64 bit integer types only.
#### `void TensorView<T>.InterlockedMax(vector<uint, N> index, T val, out T oldVal)`
Atomically computes the max of `val` and the element at `index`. Available for 32 and 64 bit integer types only.
#### `void TensorView<T>.InterlockedAnd(vector<uint, N> index, T val, out T oldVal)`
Atomically computes the bitwise and of `val` and the element at `index`. Available for 32 and 64 bit integer types only.
#### `void TensorView<T>.InterlockedOr(vector<uint, N> index, T val, out T oldVal)`
Atomically computes the bitwise or of `val` and the element at `index`. Available for 32 and 64 bit integer types only.
#### `void TensorView<T>.InterlockedXor(vector<uint, N> index, T val, out T oldVal)`
Atomically computes the bitwise xor of `val` and the element at `index`. Available for 32 and 64 bit integer types only.
#### `void TensorView<T>.InterlockedExchange(vector<uint, N> index, T val, out T oldVal)`
Atomically swaps `val` into the element at `index`. Available for `float` and 32/64 bit integer types only.
#### `void TensorView<T>.InterlockedCompareExchange(vector<uint, N> index, T compare, T val)`
Atomically swaps `val` into the element at `index` if the element equals `compare`. Available for `float` and 32/64 bit integer types only.
### `DiffTensorView` methods
#### `DiffTensorView.operator[uint x, uint y, ...]`
Provides an accessor to data content in a tensor. This method is **differentiable**, and has the same semantics as using `.load()` to get data and `.store()` to set data.
#### `DiffTensorView.operator[vector<uint, N> index]`
Provides an accessor to data content in a tensor, indexed by a uint vector. `tensor[uint3(1,2,3)]` is equivalent to `tensor[1,2,3]`. This method is **differentiable**, and has the same semantics as using `.load()` to get data and `.store()` to set data.
#### `float DiffTensorView.load(vector<uint, N> index)`
Loads the 32-bit floating point data at the specified multi-dimensional `index`. This method is **differentiable**, and in reverse-mode will perform an atomic-add.
#### `void DiffTensorView.store(vector<uint, N> index, float val)`
Stores the 32-bit floating point value `val` at the specified multi-dimensional `index`. This method is **differentiable**, and in reverse-mode will perform an *atomic exchange* to retrieve the derivative and replace with 0.
#### `float DiffTensorView.loadOnce(vector<uint, N> index)`
Loads the 32-bit floating point data at the specified multi-dimensional `index`. This method is **differentiable**, and uses a simple `store` for the reverse-mode for faster gradient aggregation, but `loadOnce` **must** be used at most once per index. `loadOnce` is ideal for situations where each thread loads data from a unique index, but will cause incorrect gradients when an index may be accessed multiple times.
#### `void DiffTensorView.storeOnce(vector<uint, N> index, float val)`
Stores the 32-bit floating point value `val` at the specified multi-dimensional `index`. This method is **differentiable**, and uses a simple `load` for the reverse-mode for faster gradient loading, but `storeOnce` **must** be used at most once per index. `storeOnce` is ideal for situations where each thread stores data to a unique index, but will cause incorrect gradient propagation when an index may be accessed multiple times.
#### `uint DiffTensorView.size(int dim)`
Returns the underlying primal tensor's size (in number of elements) at `dim`.
#### `uint DiffTensorView.dims()`
Returns the underlying primal tensor's dimension count.
#### `uint DiffTensorView.stride(uint dim)`
Returns the stride of the underlying primal tensor's `dim` dimension.
### CUDA Support Functions
#### `cudaThreadIdx()`
Returns the `threadIdx` variable in CUDA.
#### `cudaBlockIdx()`
Returns the `blockIdx` variable in CUDA.
#### `cudaBlockDim()`
Returns the `blockDim` variable in CUDA.
#### `syncTorchCudaStream()`
Waits for all pending CUDA kernel executions to complete on host.
### Attributes for PyTorch Interop
#### `[CudaKernel]` attribute
Marks a function as a CUDA kernel (maps to a `__global__` function)
#### `[TorchEntryPoint]` attribute
Marks a function for export to Python. Functions marked with `[TorchEntryPoint]` will be accessible from a loaded module returned by `slangtorch.loadModule`.
#### `[CudaDeviceExport]` attribute
Marks a function as a CUDA device function, and ensures that the compiler includes it in the generated CUDA source.
#### `[AutoPyBindCUDA]` attribute
Marks a CUDA kernel for automatic binding generation so that it may be invoked from Python without having to hand-code the torch entry point. The marked function **must** also be marked with `[CudaKernel]`. If the marked function is also marked with `[Differentiable]`, bindings for the derivative methods are generated as well.
Restriction: methods marked with `[AutoPyBindCUDA]` will not operate
## Type Marshalling Between Slang and Python
### Python-CUDA type marshalling for functions using `[AutoPyBindCUDA]`
When using auto-binding, aggregate types like structs are converted to Python `namedtuples` and are made available when using `slangtorch.loadModule`.
```csharp
// mesh.slang
struct Mesh
{
TensorView<float> vertices;
TensorView<int> indices;
};
[AutoPyBindCUDA]
[CudaKernel]
void processMesh(Mesh mesh)
{
/* ... */
}
```
Here, since `Mesh` is used by `processMesh`, the loaded module will provide `Mesh` as a python `namedtuple` with named fields.
While the `namedtuple` is the best way to pass structured arguments, they can also be passed as a python `dict` or `tuple`:
```python
m = slangtorch.loadModule('mesh.slang')
vertices = torch.zeros((32, 3), device='cuda')  # example vertex data (shape chosen for illustration)
indices = torch.zeros((64, 3), dtype=torch.int32, device='cuda')  # example index data (shape chosen for illustration)
# use namedtuple to provide structured input.
mesh = m.Mesh(vertices=vertices, indices=indices)
m.processMesh(mesh=mesh).launchRaw(blockSize=(32, 32, 1), gridSize=(1, 1, 1))
# use dict to provide input.
mesh = {'vertices': vertices, 'indices':indices}
m.processMesh(mesh=mesh).launchRaw(blockSize=(32, 32, 1), gridSize=(1, 1, 1))
# use tuple to provide input (warning: user responsible for right order)
mesh = (vertices, indices)
m.processMesh(mesh=mesh).launchRaw(blockSize=(32, 32, 1), gridSize=(1, 1, 1))
```
### Python-CUDA type marshalling for functions using `[TorchEntryPoint]`
The return types and parameters types of an exported `[TorchEntryPoint]` function can be a basic type (e.g. `float`, `int` etc.), a vector type (e.g. `float3`), a `TorchTensor<T>` type, an array type, or a struct type.
When you use struct or array types in the function signature, it will be exposed as a Python tuple.
For example,
```csharp
struct MyReturnType
{
TorchTensor<T> tensors[3];
float v;
}
[TorchEntryPoint]
MyReturnType myFunc()
{
...
}
```
Calling `myFunc` from python will result in a python tuple in the form of
```
[[tensor, tensor, tensor], float]
```
The same transform rules apply to parameter types.
Slang Design and Implementation Notes
=====================================
This directory contains documents that are primarily intended for developers working on the Slang implementation.
They are not intended to be helpful to Slang users.
These documents can only be trusted to reflect the state of the codebase or the plans of their authors at the time they were written. Changes to the implementation are not expected to always come with matching changes to these documents, so some amount of drift is to be expected.
Developers interested in contributing to Slang might want to start with the [Overview](overview.md) document, which describes the overall compilation pipeline that Slang uses and the purpose of the various steps (both implemented and planned).
The [Coding Conventions](coding-conventions.md) document describes the conventions that should be followed in all code added to the Slang project.
The [Interfaces](interfaces.md) document describes the high-level design plan for Slang's interfaces and generics features.
The [Declaration References](decl-refs.md) document is intended to help out developers who are mystified by the heavily used `DeclRef` type in the compiler implementation.
The [Intermediate Representation (IR)](ir.md) document describes the design of Slang's internal IR.
The [Existential Types](existential-types.md) document goes into some detail about what "existential types" are in the context of the Slang language, and explains how we may go about supporting them.
The [Capabilities](capabilities.md) document explains the proposed model for how Slang will support general notions of profile- or capability-based overloading/dispatch.
The [Casting](casting.md) document explains how casting works in the Slang C++ compiler code base.
The [Experimental API Interfaces](experimental.md) document explains how experimental Slang API changes are to be deployed.
Reverse Mode Autodiff (Out of Date)
==================================
This document serves as a design reference for reverse-mode auto-diff in the Slang compiler.
## Reverse-Mode Passes
Rather than implementing reverse-mode as a separate pass, Slang implements this as a series of independent passes:
If a function needs a reverse-mode version generated:
- *Linearize* the function, and all dependencies.
- *Propagate* differential types through the linearized code.
- *Unzip* by moving primal insts to before differential insts.
- *Transpose* the differential insts.
## Linearization (Forward-mode)
### Overview
(This is an incomplete section; more details coming soon.)
Consider an arbitrary function `float f(float a, float b, float c, ..., z)` which takes N inputs and produces one output `y`. Linearization aims to generate the first-order Taylor expansion of `f` about _all_ of its inputs.
Mathematically, the forward derivative `fwd_f` represents `df/da * (a_0 - a) + df/db * (b_0 - b) + ...`, where `a_0` is the value at which the Taylor expansion was produced. The quantity `a_0 - a` is known as the 'differential' (for brevity we'll denote them da, db, dc, etc.), and there is at most one differential per input.
Thus, the new function's signature should be `fwd_f(float a, float da, float b, float db, float c, float dc, ...)`. For simplicity, we'll use *pairs* instead of interleaving the original and differential parameters. We use the intrinsic `DifferentialPair<T>` (or for short: `DP<T>`) to denote this.
The signature we use is then `fwd_f(DP<float> a, DP<float> b, DP<float> c)`
An example of linearization:
```C
float f(float a, float b)
{
if (a > 0)
{
return a + b + 2.0 * a * b;
}
else
{
return sqrt(a);
}
}
```
We'll write out the SSA form of this function.
```C
float f_SSA(float a, float b)
{
bool _b1 = a > 0;
if (_b1)
{
float _t1 = a + b;
float _t2 = 2.0 * a;
float _t3 = _t2 * b;
float _t4 = _t1 + _t3;
return _t4;
}
else
{
float _t1 = sqrt(a);
return _t1;
}
}
DP<float> fwd_f_SSA(DP<float> dpa, DP<float> dpb)
{
bool _b1 = dpa.p > 0;
if (_b1)
{
float _t1 = dpa.p + dpb.p;
float _t1_d = dpa.d + dpb.d;
float _t2 = 2.0 * dpa.p;
float _t2_d = 0.0 * dpa.p + 2.0 * dpa.d;
float _t3 = _t2 * dpb.p;
float _t3_d = _t2_d * dpb.p + _t2 * dpb.d;
float _t4 = _t1 + _t3;
float _t4_d = _t1_d + _t3_d;
return DP<float>(_t4, _t4_d);
}
else
{
DP<float> _t1_dp = sqrt_fwd(dpa);
return DP<float>(_t1_dp.p, _t1_dp.d);
}
}
```
In the result, the primal part of the pair holds the original computation, while the differential part computes the dot product of the differentials with the derivatives of the function's output w.r.t. each input.
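The linearized code can be sanity-checked numerically: evaluating the differential part with `(da, db) = (1, 0)` should match a finite-difference estimate of `df/da`. Below is a plain-Python transcription of `f` and its forward derivative (illustrative only; the closed-form partials stand in for the per-instruction expansion):

```python
import math

def f(a, b):
    return a + b + 2.0 * a * b if a > 0 else math.sqrt(a)

def fwd_f(a, da, b, db):
    """Returns (primal, differential) where differential = df/da*da + df/db*db."""
    if a > 0:
        p = a + b + 2.0 * a * b
        d = (1.0 + 2.0 * b) * da + (1.0 + 2.0 * a) * db
    else:
        # sqrt'(a) = 1 / (2 sqrt(a)); b does not appear on this branch.
        p = math.sqrt(a)
        d = da / (2.0 * math.sqrt(a))
    return p, d

# Differential with (da, db) = (1, 0) vs. finite differences of df/da:
a, b, eps = 1.5, 2.0, 1e-6
_, d = fwd_f(a, 1.0, b, 0.0)
fd = (f(a + eps, b) - f(a, b)) / eps
assert abs(d - fd) < 1e-4  # df/da = 1 + 2b = 5 here
```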
## Propagation
This step takes a linearized function and propagates information about which instructions are computing a differential and which ones are part of the primal (original) computation.
Assuming first-order differentiation only:
The approach is to mark any instruction that extracts the differential from a differential pair as a differential. Then any instruction that uses a differential is itself marked as a differential, and so on. The only exception is the call instruction, which is either non-differentiable (do nothing) or differentiable and returns a pair (follow the same process).
Here's the above example with propagated type information (we use `float.D` to denote intermediaries that have been marked as differential, and also expand everything so that each line has a single operation):
```C
DP<float> f_SSA_Proped(DP<float> dpa, DP<float> dpb)
{
bool _b1 = dpa.p > 0;
if (_b1)
{
float _t1 = dpa.p + dpb.p;
float.D _q1_d = dpa.d;
float.D _q2_d = dpb.d;
float.D _t1_d = _q1_d + _q2_d;
float _t2 = 2.0 * dpa.p;
float.D _q2_d = dpa.d;
float.D _q3_d = 2.0 * dpa.d;
float _q4 = dpa.p;
float.D _q4_d = 0.0 * dpa.p;
float.D _t2_d = _q4_d + _q3_d;
float _t3 = _t2 * dpb.p;
float _q5 = dpb.p;
float.D _q6_d = _q5 * _t2_d;
float.D _q7_d = dpb.d;
float.D _q8_d = _t2 * _q7_d;
float.D _t3_d = _q6_d + _q8_d;
float _t4 = _t1 + _t3;
float.D _t4_d = _t1_d + _t3_d;
return DP<float>(_t4, _t4_d);
}
else
{
DP<float> _t1_dp = sqrt_fwd(dpa);
float _q1 = _t1_dp.p;
float.D _q1_d = _t1_dp.d;
return DP<float>(_q1, _q1_d);
}
}
```
## Unzipping
This is a fairly simple process when there is no control flow. We simply move all non-differential instructions to before the first differential instruction.
When there is control flow, we need to be a bit more careful: the key is to *replicate* the control flow graph once for primal and once for the differential.
Here's the previous example unzipped:
```C
DP<float> f_SSA_Unzipped(DP<float> dpa, DP<float> dpb)
{
bool _b1 = dpa.p > 0;
float _t1, _t2, _q4, _t3, _q5, _t3_d, _t4, _q1;
if (_b1)
{
_t1 = dpa.p + dpb.p;
_t2 = 2.0 * dpa.p;
_q4 = dpa.p;
_t3 = _t2 * dpb.p;
_q5 = dpb.p;
_t4 = _t1 + _t3;
}
else
{
_q1 = sqrt_fwd(DP<float>(dpa.p, 0.0)).p;
}
// Note here that we have to 'store' all the intermediaries
// _t1, _t2, _q4, _t3, _q5, _t3_d, _t4 and _q1. This is fundamentally
// the tradeoff between fwd_mode and rev_mode
if (_b1)
{
float.D _q1_d = dpa.d;
float.D _q2_d = dpb.d;
float.D _t1_d = _q1_d + _q2_d;
float.D _q2_d = dpa.d;
float.D _q3_d = 2.0 * dpa.d;
float.D _q4_d = 0.0 * dpa.p;
float.D _t2_d = _q4_d + _q3_d;
float.D _q6_d = _q5 * _t2_d;
float.D _q7_d = dpb.d;
float.D _q8_d = _t2 * _q7_d;
float.D _t3_d = _q6_d + _q8_d;
float.D _t4_d = _t1_d + _t3_d;
return DP<float>(_t4, _t4_d);
}
else
{
DP<float> _t1_dp = sqrt_fwd(dpa);
float.D _q1_d = _t1_dp.d;
return DP<float>(_q1, _q1_d);
}
}
```
## Transposition
### Overview
This transposition pass _assumes_ that the provided function is linear in its differentials.
It is out of scope of this project to attempt to enforce that constraint for user-defined differential code.
For transposition we walk all differential instructions in reverse starting from the return statement, and apply the following rules:
We'll have an accumulator dictionary `Dictionary<IRInst, IRInst> accMap` holding assignments for
intermediaries which don't have concrete variables. When we add a pair (A, C) and (A, B) already exists, this will form the pair (A, ADD(C, B)) in the dictionary. (ADD will be replaced with a call to `T.dadd` for a generic type T)
- If `inst` is a `RETURN(A)`, add pair `(A, d_out)` to `accMap`
- If an instruction is `MUL(P, D)` where D is the differential, add pair `(D, MUL(P, accMap[this_inst]))` to `accMap`
- If an instruction is `ADD(D1, D2)`, where both D1 and D2 are differentials (this is the only config that should occur), then add pair `(D1, accMap[this_inst])` to `accMap`
- If an instruction is `CALL(f_fwd, (P1, D1), (P2, D2), ...)`, create variables D1v, D2v, ... for D1, D2, ..., then replace with `CALL(f_rev, (P1, D1v), (P2, D2v), ..., accMap[this_inst])`, and finally add pairs `(D1, LOAD[D1v]), (D2, LOAD[D2v]), ...` to `accMap`
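The merge rule for `accMap` is just a dictionary whose insert adds when the key already exists, so a differential used in several places accumulates a contribution from each use. A minimal sketch (plain floats standing in for IR insts; `T.dadd` would replace `+` for generic types):

```python
def acc_add(acc_map, key, value):
    # Adding (A, C) when (A, B) already exists forms (A, B + C).
    acc_map[key] = acc_map.get(key, 0.0) + value

acc = {}
acc_add(acc, "da", 2.0)  # first use of da
acc_add(acc, "da", 3.0)  # second use: contributions accumulate
print(acc["da"])  # -> 5.0
```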
```C
void f_SSA_Rev(inout DP<float> dpa, inout DP<float> dpb, float d_out)
{
bool _b1 = dpa.p > 0;
float _t1, _t2, _q4, _t3, _q5, _t3_d, _t4, _q1;
if (_b1)
{
_t1 = dpa.p + dpb.p;
_t2 = 2.0 * dpa.p;
_q4 = dpa.p;
_t3 = _t2 * dpb.p;
_q5 = dpb.p;
_t4 = _t1 + _t3;
}
else
{
_q1 = sqrt_fwd(DP<float>(dpa.p, 0.0)).p;
}
// Note here that we have to 'store' all the intermediaries
// _t1, _t2, _q4, _t3, _q5, _t3_d, _t4 and _q1. This is fundamentally
// the tradeoff between fwd_mode and rev_mode
if (_b1)
{
float.D _t4_rev = d_out;
float.D _t1_rev = _t4_rev;
float.D _t3_rev = _t4_rev;
float.D _q8_rev = _t3_rev;
float.D _q6_rev = _t3_rev;
float.D _q7_rev = _t2 * _q8_rev;
dpb.d += _q7_rev;
float.D _t2_rev = _q5 * _q6_rev;
float.D _q4_rev = _t2_rev;
float.D _q3_rev = _t2_rev;
dpa.d += 2.0 * _q3_rev;
float.D _q1_rev = _t1_rev;
float.D _q2_rev = _t1_rev;
dpb.d += _q2_rev;
dpa.d += _q1_rev;
}
else
{
float.D _q1_rev = d_out;
DP<float> dpa_copy;
sqrt_rev(dpa_copy, _q1_rev);
dpa.d += dpa_copy.d;
}
}
```
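The transposed code should reproduce the analytic partials: for `a > 0`, `df/da = 1 + 2b` and `df/db = 1 + 2a`. A direct Python transcription of the `a > 0` branch (illustrative; the `float.D` intermediaries become ordinary locals and the accumulated differentials are returned instead of written through `inout` pairs):

```python
def f_rev(a, b, d_out):
    """Reverse-mode for f(a, b) = a + b + 2ab on the a > 0 branch,
    following the accumulation order of f_SSA_Rev above."""
    da = db = 0.0
    # Primal pass: store the intermediaries the reverse pass needs.
    _t2 = 2.0 * a
    _q5 = b
    # Reverse pass.
    _t4_rev = d_out
    _t1_rev = _t4_rev
    _t3_rev = _t4_rev
    _q8_rev = _t3_rev
    _q6_rev = _t3_rev
    db += _t2 * _q8_rev   # through _t3 = _t2 * b
    _t2_rev = _q5 * _q6_rev
    da += 2.0 * _t2_rev   # through _t2 = 2 * a
    db += _t1_rev         # through _t1 = a + b
    da += _t1_rev
    return da, db

print(f_rev(3.0, 4.0, 1.0))  # -> (9.0, 7.0), i.e. (1 + 2b, 1 + 2a)
```

Note how the reverse pass scales linearly with `d_out`, as the linearity assumption above requires.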

<!--The goal of this set of documents is to describe the design of Slang's automatic differentiation passes, along with the mechanisms & passes used to support various features. -->
This documentation is intended for Slang contributors and is written from a compiler engineering point of view. For Slang users, see the user-guide at this link: [https://shader-slang.com/slang/user-guide/autodiff.html](https://shader-slang.com/slang/user-guide/autodiff.html)
## What is Automatic Differentiation?
Before diving into the design of the automatic differentiation (for brevity, we will call it 'auto-diff') passes, it is important to understand the end goal of what auto-diff tries to achieve.
The over-arching goal of Slang's auto-diff is to enable the user to compute derivatives of a given shader program or function's output w.r.t its input parameters. This critical compiler feature enables users to quickly use their shaders with gradient-based parameter optimization algorithms, which form the backbone of modern machine learning systems. It enables users to train and deploy graphics systems that contain ML primitives (like multi-layer perceptrons or MLPs) or use their shader programs as differentiable primitives within larger ML pipelines.
### More Resources
Here are some links to resources that talk more about differentiable programming from a more mathematical perspective:
1. UCSD CSE 291 (Spring 2024): https://cseweb.ucsd.edu/~tzli/cse291/sp2024/
2. UW CSE 5990 (Winter 2024): https://sites.google.com/cs.washington.edu/cse-599o-dppl
## Definition of Derivatives
This section is based off of these slides: https://cseweb.ucsd.edu/~tzli/cse291/sp2024/lectures/03_forward_mode.pdf.
Here, we establish the mathematical definition of derivatives, starting with a simple 1D case (function with a single input and output), and extending to the general case of functions mapping multiple inputs to multiple outputs.
To avoid confusion, we will denote mathematical functions using LaTeX italic script ($f$, $g$, etc..) and programs that compute these functions with markdown code (`f`, `g`, etc..)
### Derivatives of scalar (1D) functions
Consider the simplest case: a smooth scalar mathematical function that maps a real number to another real number:
$$f : \mathbb{R} \to \mathbb{R}$$
There are several definitions for a derivative, but we will use the definition that a derivative is the *closest linear approximation* of the output function at a given input location.
Concretely, given a specific input $x$, we can create a linear approximation of the function $f$ around $x$ as follows:
$$ f(x + dx) \approx f(x) + Df(x) \cdot dx $$
<!--// TODO: Add image here.-->
This can also be understood as a geometric 'tangent' to the function at $x$. $Df(x)$ is the slope of $f$ at $x$, i.e. $\frac{\partial f}{\partial x}$, and $dx$ is the perturbation away from $x$. Our approximation is linear as a function of the perturbation $dx$. Note that no matter how non-linear or complex the underlying function $f(x)$ is, the approximation is always linear (this property becomes very important later).
### Forward-mode derivative functions
Now consider a concrete program `f` that computes some function.
```C
// Computes square of x
float f(float x)
{
return x * x;
}
```
What should its derivative program look like? We need the output $f(x)$ and the product of the derivative at $x$, $Df(x)$, with the differential $dx$.
In Slang, we put both of these together into a single function, called the *forward-mode derivative* function, which takes in a pair $(x, dx)$ and returns a pair $(f(x), Df(x)\cdot dx)$. Note that in auto-diff literature, this is also often referred to as the *total derivative* function.
```C
DifferentialPair<float> fwd_f(DifferentialPair<float> dpx)
{
float x = dpx.getPrimal(); // Can also be accessed via property dpx.p
float dx = dpx.getDifferential(); // Can also be accessed via property dpx.d
return makePair(x * x, (2 * x) * dx);
}
```
Note that `(2 * x)` is the multiplier corresponding to $Df(x)$. We refer to $x$ and $f(x)$ as "*primal*" values and the perturbations $dx$ and $Df(x)\cdot dx$ as "*differential*" values. The reason for this separation is that the "*differential*" output values are always linear w.r.t their "*differential*" inputs.
As the name implies, `DifferentialPair<T>` is a special pair type used by Slang to hold values and their corresponding differentials.
### Forward-mode derivatives for higher-dimensional functions
In practice, most functions tend to have multiple inputs and multiple outputs, i.e. $f: \mathbb{R}^N \to \mathbb{R}^M$
The definition above can be extended to higher dimensions, using the closest-linear-approximation idea. The main difference is that the derivative function represents a hyperplane rather than a line.
Effectively, we want our forward-mode derivative to compute the following:
$$ f(\mathbf{x} + \mathbf{dx}) \approx f(\mathbf{x}) + \langle Df(\mathbf{x}),\mathbf{dx}\rangle $$
Here, the input and its differential can be represented as vector quantities $\mathbf{x}, \mathbf{dx} \in \mathbb{R}^N$, the multiplier $Df(\mathbf{x})$ (also known as the *Jacobian* matrix) is an $M \times N$ matrix, and $\langle \cdot,\cdot \rangle$ denotes the inner product (i.e. matrix-vector multiplication)
Here's an example of a Slang function taking in two inputs (N=2) and generating one output (M=1)
```C
// Compute length of hypotenuse.
float f(float x, float y)
{
return sqrt(x * x + y * y);
}
```
and its forward-mode derivative:
```C
// Closest linear approximation at x, y
DifferentialPair<float> fwd_f(DifferentialPair<float> dpx, DifferentialPair<float> dpy)
{
float x = dpx.p;
float y = dpy.p;
float dx = dpx.d;
float dy = dpy.d;
return DifferentialPair<float>(
sqrt(x * x + y * y), // f(x, y)
(x * dx + y * dy) / sqrt(x * x + y * y)); // <Df(x,y), dx>
}
```
Important note: the forward-mode function only needs to compute the inner product $\langle Df(\mathbf{x}),\mathbf{dx} \rangle$. The Jacobian matrix itself never needs to be fully materialized. This is a key design element of automatic differentiation, one which allows it to scale to huge input/output counts.
### Building Blocks: Forward-mode derivatives compose in forward order of execution.
In practice, we compute forward-mode derivatives of a complex function by decomposing them into constituent functions (or in compiler-speak: instructions) and composing the forward-mode derivative of each piece in the **same** order.
This is because each forward-mode derivative is a 'right-side' product (a product of the Jacobian matrix with a vector)
Here's an example of this in action (consider a complex function $h$ composed of $f$ and $g$):
$$ h(\mathbf{x}) = f(g(\mathbf{x})) $$
Its forward-mode derivative is then:
$$ \langle Dh(\mathbf{x}), \mathbf{dx}\rangle = \big\langle Df(\mathbf{x}), \langle Dg(\mathbf{x}), \mathbf{dx}\rangle\big\rangle $$
which is the forward-mode derivative of the outer function $f$ evaluated on the result of the forward-mode derivative of the inner function $g$.
An example of this in Slang code:
```C
// Compute square.
float sqr(float x)
{
return x * x;
}
// Compute length of hypotenuse.
float f(float x, float y)
{
float x_sqr = sqr(x);
float y_sqr = sqr(y);
return sqrt(x_sqr + y_sqr);
}
```
The resulting derivative of `f` can be computed by composition:
```C
// Forward-mode derivative of sqr()
DifferentialPair<float> fwd_sqr(DifferentialPair<float> dpx)
{
float x = dpx.getPrimal();
float dx = dpx.getDifferential();
return DifferentialPair<float>(x * x, 2 * x * dx);
}
// Forward-mode derivative of f()
DifferentialPair<float> fwd_f(DifferentialPair<float> dpx, DifferentialPair<float> dpy)
{
DifferentialPair<float> dp_x_sqr = fwd_sqr(dpx);
DifferentialPair<float> dp_y_sqr = fwd_sqr(dpy);
float x_sqr = dp_x_sqr.getPrimal();
float y_sqr = dp_y_sqr.getPrimal();
float x_sqr_d = dp_x_sqr.getDifferential();
float y_sqr_d = dp_y_sqr.getDifferential();
return DifferentialPair<float>(
sqrt(x_sqr + y_sqr),
(x_sqr_d + y_sqr_d) / (2.0 * sqrt(x_sqr + y_sqr)));
}
```
### Tip: Extracting partial derivatives from a forward-mode derivative (i.e. a 'total' derivative)
As we discussed above, forward-mode derivatives compute $\langle Df(\mathbf{x}),\mathbf{dx}\rangle$ rather than what you may be used to seeing in a calculus course (e.g. partial derivatives like $\frac{\partial f}{\partial x}$).
In fact, the forward-mode derivative is simply the sum of the partial derivative w.r.t each input parameter multiplied by that input's differential perturbation: $\frac{\partial f}{\partial x} \cdot dx + \frac{\partial f}{\partial y} \cdot dy$. This is the reason for the alternative name: *total derivative*.
Thus, partial derivatives can be obtained by successively setting each input's differential to 1 (and 0 for everything else)
Example:
```C
// Compute partial derivative w.r.t x (pass dx=1.0)
float df_dx = fwd_f(DifferentialPair<float>(x, 1.0), DifferentialPair<float>(y, 0.0)).d;
// Compute partial derivative w.r.t y (pass dy=1.0)
float df_dy = fwd_f(DifferentialPair<float>(x, 0.0), DifferentialPair<float>(y, 1.0)).d;
```
### Tip: Testing forward-mode derivatives using the first principles of calculus (i.e. the *finite difference* method)
In Calculus, partial derivatives of a function are often defined in a 'black box' manner using limits, by perturbing a single parameter by an infinitesimal amount:
$$ \frac{\partial f}{\partial x} = \lim_{dx\to 0} \frac{f(x + dx) - f(x - dx)}{2 * dx} $$
At the moment, we cannot leverage programming languages to compute true infinitesimal limits, but we can replace $dx \to 0$ with a sufficiently small $\epsilon$, leading to the following 'test' to check if derivatives produced by automatic differentiation match their true mathematical expected values.
Here's an example of using this idea to test functions (many autodiff tests were written this way)
```C
// Compute partial derivative w.r.t x analytically
float df_dx_ad = fwd_f(DifferentialPair<float>(x, 1.0), DifferentialPair<float>(y, 0.0)).d;
// Compute partial derivative w.r.t x through the finite difference (FD) method.
float eps = 1e-4;
float df_dx_fd = (f(x + eps, y) - f(x - eps, y)) / (2 * eps);
// If computed correctly, df_dx_ad and df_dx_fd are very close.
```
**Caveats:**
Since the finite difference method only produces a biased estimate of the derivative, the result is only numerically *close* to the auto-diff-based result. Poorly behaved functions (those that change rapidly, or are discontinuous or otherwise non-differentiable) will result in an (expected) mismatch between FD and AD results.
## Reverse-mode derivative functions
This section is based off of these slides: https://cseweb.ucsd.edu/~tzli/cse291/sp2024/lectures/05_reverse_mode.pdf.
### Motivation: Challenges with scaling forward-mode derivatives
A big problem with forward-mode derivatives is their inability to scale to large parameter counts.
Machine learning pipelines often compute derivatives of a large complex pipeline with millions or even billions of input parameters, but a single output value, i.e. the *loss* or *objective* function, frequently denoted by $\mathcal{L}$.
Computing $\frac{\partial \mathcal{L}}{\partial x_i}$ for $N$ inputs $x_i$ using the one-hot vector approach will involve invoking the forward-mode derivative function $N$ times.
The reason for this limitation is that forward-mode derivatives pass derivatives from the inputs through to the outputs by computing the dot-product $\langle Df(\mathbf{x}),\mathbf{dx}\rangle$.
Instead, we employ a different approach called the reverse-mode derivative, which propagates differentials *backwards* from outputs to inputs.
### Key Idea: Generate code to compute $\langle \frac{\partial \mathcal{L}}{\partial f}, Df(\mathbf{x})\rangle$ rather than $\langle Df(\mathbf{x}),\mathbf{dx}\rangle$
The fundamental building block of reverse-mode derivatives is the **left-side inner product**. That is, the product of a vector of derivatives w.r.t outputs $\frac{\partial \mathcal{L}}{\partial f}$ with the Jacobian matrix $Df(\mathbf{x})$.
An important thing to keep in mind is that it does not necessarily matter what the scalar quantity $\mathcal{L}$ is. The goal of this product is to propagate the derivatives of any scalar value $\mathcal{L}$ w.r.t output vector $f(\mathbf{x})$ (i.e., $\frac{\partial \mathcal{L}}{\partial f}$) into derivatives of that same scalar value $\mathcal{L}$ w.r.t the input vector $\mathbf{x}$ (i.e., $\frac{\partial \mathcal{L}}{\partial \mathbf{x}}$).
Here's an example of a Slang function computing the `reverse-mode derivative`.
```C
// Compute length of hypotenuse
float f(float x, float y)
{
return sqrt(x * x + y * y);
}
// Reverse-mode derivative of f. dOutput represents the derivative dL/dOutput of some scalar value L w.r.t this function's output.
void rev_f(inout DifferentialPair<float> dpx, inout DifferentialPair<float> dpy, float dOutput)
{
float x = dpx.getPrimal();
float y = dpy.getPrimal();
float t = 1.0 / (sqrt(x * x + y * y));
dpx = DifferentialPair<float>(
x, // The primal part of the return value is *always* copied in from the input as-is.
dOutput * x * t); // The differential part for x is the derivative dL/dx computed as
// (dL/dOutput) * (dOutput/dx), where dOutput/dx = x / sqrt(x*x+y*y).
dpy = DifferentialPair<float>(
y,
dOutput * y * t); // The differential part for y is the derivative dL/dy computed as
// (dL/dOutput) * (dOutput/dy), where dOutput/dy = y / sqrt(x*x+y*y).
}
```
Note that `rev_f` accepts derivatives w.r.t the output value as the input, and returns derivatives w.r.t inputs as its output (through `inout` parameters). `rev_f` still needs the primal values `x` and `y` to compute the derivatives, so those are still passed in as an input through the primal part of the differential pair.
Also note that the reverse-mode derivative function does not have to compute the primal result value (its return is void). The reason for this is a matter of convenience: reverse-mode derivatives are often invoked after all the primal functions, and there is typically no need for these values. We go into more detail on this topic in the checkpointing chapter.
The reverse mode function can be used to compute both `dOutput/dx` and `dOutput/dy` with a single invocation (unlike the forward-mode case where we had to invoke `fwd_f` once for each input)
```C
DifferentialPair<float> dpx = makePair<float>(x, 0.f); // Initialize diff-value to 0 (not necessary)
DifferentialPair<float> dpy = makePair<float>(y, 0.f); // Initialize diff-value to 0 (not necessary)
rev_f(dpx, dpy, 1.0); // Pass 1.0 for dL/dOutput so that the results are (1.0 * dOutput/dx) and (1.0 * dOutput/dy)
float doutput_dx = dpx.getDifferential();
float doutput_dy = dpy.getDifferential();
```
### Extension to multiple outputs
The extension to multiple outputs is fairly natural. Each output gets a separate input for its derivative.
Here is an example:
```C
// Computation involving multiple inputs and outputs.
float2 f_multi_output(float x, float y)
{
return float2(
x * x,
x + y);
}
// Reverse-mode derivative of 'f_multi_output'. The derivative of the outputs is also a vector quantity
// (type follows from return type of f_multi_output)
void rev_f_multi_output(inout DifferentialPair<float> dpx, inout DifferentialPair<float> dpy, float2 dOut)
{
float x = dpx.getPrimal();
float y = dpy.getPrimal();
dpx = DifferentialPair<float>(x, dOut[0] * 2 * x + dOut[1]);
dpy = DifferentialPair<float>(y, dOut[1]);
}
```
### Jacobian method: Generate forward- and reverse-mode derivatives from first principles.
A simple way to figure out what the generated reverse (or forward) derivative function is supposed to compute is to write down the entire Jacobian matrix. That is, write down the partial derivative of each output w.r.t each input
$$
D\mathbf{f}(\mathbf{x}) = \begin{bmatrix}
\partial f_0 / \partial x & \partial f_0 / \partial y \\
\partial f_1 / \partial x & \partial f_1 / \partial y \\
\end{bmatrix} =
\begin{bmatrix}
2x & 0.0 \\
1.0 & 1.0 \\
\end{bmatrix}
$$
The **reverse-mode derivative**'s outputs should match the left-product of this matrix with the vector of derivatives w.r.t outputs:
$$ \left\langle \frac{\partial \mathcal{L}}{\partial \mathbf{f}}, D\mathbf{f}(\mathbf{x})\right\rangle =
\begin{bmatrix}
\frac{\partial \mathcal{L}}{\partial f_0} & \frac{\partial \mathcal{L}}{\partial f_1}
\end{bmatrix}
\begin{bmatrix}
2x & 0.0 \\
1.0 & 1.0 \\
\end{bmatrix} =
\begin{bmatrix} \left(\frac{\partial \mathcal{L}}{\partial f_0} \cdot 2x + \frac{\partial \mathcal{L}}{\partial f_1}\right) & \frac{\partial \mathcal{L}}{\partial f_1} \end{bmatrix}
$$
and the **forward-mode derivative**'s outputs should match the right-product of this matrix with the vector of differentials of the inputs:
$$ \langle D\mathbf{f}(\mathbf{x}), d\mathbf{x}\rangle =
\begin{bmatrix}
2x & 0.0 \\
1.0 & 1.0 \\
\end{bmatrix}
\begin{bmatrix}
dx \\ dy
\end{bmatrix} =
\begin{bmatrix} 2x \cdot dx \\ dx + dy \end{bmatrix}
$$
Note that when we generate derivative code in practice, we do not materialize the full Jacobian matrix, and instead use the composition property to chain together derivatives at the instruction level.
However, the resulting code is equivalent to the Jacobian method (mathematically), and it is a good, analytical way to confirm that the generated code is indeed correct (or when thinking about what the derivative of a particular instruction/set of instructions should be)
### Building Blocks: Reverse-mode derivatives compose in reverse order of execution.
A consequence of using the 'left-side inner product' is that derivatives of a composite function must be computed in the reverse of the order of primal computation.
Here's an example of a composite function $h$ (similar to the example used in forward-mode building blocks):
$$ h(\mathbf{x}) = f(g(\mathbf{x})) $$
where (for brevity):
$$ \mathbf{y} = g(\mathbf{x}) $$
The reverse-mode derivative function for $h$ can be written as the composition of the reverse-mode derivatives of $f$ and $g$
$$ \left\langle \frac{\partial L}{\partial h}, Dh(\mathbf{x})\right\rangle = \left\langle \left\langle \frac{\partial L}{\partial h}, Df(\mathbf{y})\right\rangle , Dg(\mathbf{x})\right\rangle $$
Note the 'backward' order here. We must first pass the derivatives through the outer function $f$, and then pass the result through the inner function $g$ to compute derivatives w.r.t inner-most inputs $\mathbf{x}$. This process of passing derivatives backwards is often referred to as *backpropagation*.
A more concrete Slang example of the same:
```C
// Compute square
float sqr(float x)
{
return x * x;
}
// Compute length of hypotenuse
float f(float x, float y)
{
return sqrt(sqr(x) + sqr(y));
}
```
The derivative functions are then:
```C
void rev_sqr(inout DifferentialPair<float> dpx, float dOutput)
{
float x = dpx.getPrimal();
dpx = DifferentialPair<float>(x, dOutput * 2 * x);
}
void rev_f(inout DifferentialPair<float> dpx, inout DifferentialPair<float> dpy, float dOut)
{
float x = dpx.getPrimal();
float y = dpy.getPrimal();
float t = 0.5f / sqrt(x * x + y * y);
float d_xsqr = t * dOut; // Calculate derivatives w.r.t output of sqr(x)
float d_ysqr = t * dOut; // Calculate derivatives w.r.t output of sqr(y)
rev_sqr(dpx, d_xsqr); // Propagate to x
rev_sqr(dpy, d_ysqr); // Propagate to y
}
```
When comparing `rev_f`'s implementation to `fwd_f`, note the order of computing derivative w.r.t `sqr` (in `rev_f`, `rev_sqr` is called at the end, while in `fwd_f` it is called at the beginning)

This document details auto-diff-related decorations that are lowered into the IR to help annotate methods with relevant information.
## `[Differentiable]`
The `[Differentiable]` attribute is used to mark functions as being differentiable. The auto-diff process will only touch functions that are marked explicitly as `[Differentiable]`. All other functions are considered non-differentiable and calls to such functions from a differentiable function are simply copied as-is with no transformation.
Further, only `[Differentiable]` methods are checked during the derivative data-flow pass. This decorator is translated into `BackwardDifferentiableAttribute` (which implies both forward and backward differentiability), and then lowered into the IR `OpBackwardDifferentiableDecoration`
**Note:** `[Differentiable]` was previously implemented as two separate decorators `[ForwardDifferentiable]` and `[BackwardDifferentiable]` to denote differentiability with each type of auto-diff transformation. However, these are now **deprecated**. The preferred approach is to use only `[Differentiable]`
`fwd_diff` and `bwd_diff` cannot be directly called on methods that don't have the `[Differentiable]` tag (will result in an error). If non-`[Differentiable]` methods are called from within a `[Differentiable]` method, they must be wrapped in `no_diff()` operation (enforced by the [derivative data-flow analysis pass](./types.md#derivative-data-flow-analysis) )
### `[Differentiable]` for `interface` Requirements
The `[Differentiable]` attribute can also be used to decorate interface requirements. In this case, the attribute is handled in a slightly different manner, since we do not have access to the concrete implementations.
The process is roughly as follows:
1. During the semantic checking step, when checking a method that is an interface requirement (in `checkCallableDeclCommon` in `slang-check-decl.cpp`), we check if the method has a `[Differentiable]` attribute
2. If yes, we create a set of new method declarations, one for the forward-mode derivative (`ForwardDerivativeRequirementDecl`) and one for the reverse-mode derivative (`BackwardDerivativeRequirementDecl`), with the appropriate translated function types, and insert them into the same interface.
3. Insert a new member into the original method to reference the new declarations (`DerivativeRequirementReferenceDecl`)
4. When lowering to IR, the `DerivativeRequirementReferenceDecl` member is converted into a custom derivative reference by adding the `OpBackwardDerivativeDecoration(deriv-fn-req-key)` and `OpForwardDerivativeDecoration(deriv-fn-req-key)` decorations on the primal method's requirement key.
Here is an example of what this would look like:
```C
interface IFoo
{
[Differentiable]
float bar(float);
};
// After checking & lowering
interface IFoo_after_checking_and_lowering
{
[BackwardDerivative(bar_bwd)]
[ForwardDerivative(bar_fwd)]
float bar(float);
void bar_bwd(inout DifferentialPair<float>, float);
DifferentialPair<float> bar_fwd(DifferentialPair<float>);
};
```
**Note:** All conforming types must _also_ declare their corresponding implementations as differentiable so that their derivative implementations are synthesized to match the interface signature. In this sense, the `[Differentiable]` attribute is part of the function's signature, so a `[Differentiable]` interface requirement can only be satisfied by a `[Differentiable]` function implementation
### `[TreatAsDifferentiable]`
In large codebases where some interfaces may have several possible implementations, it may not be reasonable to have to mark all possible implementations with `[Differentiable]`, especially if certain implementations use hacks or workarounds that need additional consideration before they can be marked `[Differentiable]`
In such cases, we provide the `[TreatAsDifferentiable]` decoration (AST node: `TreatAsDifferentiableAttribute`, IR: `OpTreatAsDifferentiableDecoration`), which instructs the auto-diff passes to construct an 'empty' function that returns a 0 (or 0-equivalent) for the derivative values. This allows the signature of a `[TreatAsDifferentiable]` function to match a `[Differentiable]` requirement without actually having to produce a derivative.
## Custom derivative decorators
In many cases, it is desirable to manually specify the derivative code for a method rather than let the auto-diff pass synthesize it from the method body. This is usually desirable if:
1. The body of the method is too complex, and there is a simpler, mathematically equivalent way to compute the same value (often the case for intrinsics like `sin(x)`, `arccos(x)`, etc..)
2. The method involves global/shared memory accesses, and synthesized derivative code may cause race conditions or be very slow due to overuse of synchronization. For this reason Slang assumes global memory accesses are non-differentiable by default, and requires that the user (or the core module) define separate accessors with different derivative semantics.
The Slang front-end provides two sets of decorators to facilitate this:
1. To reference a custom derivative function from a primal function: `[ForwardDerivative(fn)]` and `[BackwardDerivative(fn)]` (AST Nodes: `ForwardDerivativeAttribute`/`BackwardDerivativeAttribute`, IR: `OpForwardDervativeDecoration`/`OpBackwardDerivativeDecoration`), and
2. To reference a primal function from its custom derivative function: `[ForwardDerivativeOf(fn)]` and `[BackwardDerivativeOf(fn)]` (AST Nodes: `ForwardDerivativeAttributeOf`/`BackwardDerivativeAttributeOf`). These attributes are useful to provide custom derivatives for existing methods in a different file without having to edit/change that module. For instance, we use `diff.meta.slang` to provide derivatives for the core module functions in `hlsl.meta.slang`. When lowering to IR, these references are placed on the target (primal function). That way both sets of decorations (`OpForwardDerivativeDecoration`/`OpBackwardDerivativeDecoration`) are lowered on the primal function.
These decorators also work on generically defined methods, as well as struct methods. Similar to how function calls work, these decorators also work on overloaded methods (and reuse the `ResolveInvoke` infrastructure to perform resolution)
### Checking custom derivative signatures
To ensure that the user-provided derivatives agree with the expected signature, as well as resolve the appropriate method when multiple overloads are available, we check the signature of the custom derivative function against the translated version of the primal function. This currently occurs in `checkDerivativeAttribute()`/`checkDerivativeOfAttribute()`.
The checking process re-uses existing infrastructure from `ResolveInvoke`, by constructing a temporary invoke expr to call the user-provided derivative using a set of 'imaginary' arguments according to the translated type of the primal method. If `ResolveInvoke` is successful, the provided derivative signature is considered to be a match. This approach also automatically allows us to resolve overloaded methods, account for generic types and type coercion.
## `[PrimalSubstitute(fn)]` and `[PrimalSubstituteOf(fn)]`
In some cases, we face the opposite of the problem that inspired custom derivatives. That is, we want the compiler to auto-synthesize the derivative from the function body, but there _is_ no function body to translate.
This frequently occurs with hardware intrinsic operations that are lowered into special op-codes that map to hardware units, such as texture sampling & interpolation operations.
However, these operations do have reference 'software' implementations which can be used to produce the derivative.
To allow user code to use the fast hardware intrinsics for the primal pass, but use synthesized derivatives for the derivative pass, we provide decorators `[PrimalSubstitute(ref-fn)]` and `[PrimalSubstituteOf(orig-fn)]` (AST Node: `PrimalSubstituteAttribute`/`PrimalSubstituteOfAttribute`, IR: `OpPrimalSubstituteDecoration`), that can be used to provide a reference implementation for the auto-diff pass.
Example:
```C
[PrimalSubstitute(sampleTexture_ref)]
float sampleTexture(TexHandle2D tex, float2 uv)
{
// Hardware intrinsics
}
float sampleTexture_ref(TexHandle2D tex, float2 uv)
{
// Reference SW implementation.
}
void sampleTexture_bwd(TexHandle2D tex, inout DifferentialPair<float2> dp_uv, float dOut)
{
// Backward derivative code synthesized using the reference implementation.
}
```
The implementation of `[PrimalSubstitute(fn)]` is relatively straightforward. When the transcribers are asked to synthesize a derivative of a function, they check for a `OpPrimalSubstituteDecoration`, and swap the current function out for the substitute function before proceeding with derivative synthesis.

This documentation is intended for Slang contributors and is written from a compiler engineering point of view. For Slang users, see the user-guide at this link: [https://shader-slang.com/slang/user-guide/autodiff.html](https://shader-slang.com/slang/user-guide/autodiff.html)
Before diving into this document, please review the document on [Basics](./basics.md) for the fundamentals of automatic differentiation.
# Components of the Type System
Here we detail the main components of the type system: the `IDifferentiable` interface to define differentiable types, the `DifferentialPair<T>` type to carry a primal and corresponding differential in a single type.
We also detail how auto-diff operators are type-checked (the higher-order function checking system), how the `no_diff` decoration can be used to avoid differentiation through attributed types, and the derivative data-flow analysis that warns the user about unintentionally stopping the flow of derivatives.
## `interface IDifferentiable`
Defined in core.meta.slang, `IDifferentiable` forms the basis for denoting differentiable types, both within the core module, and otherwise.
The definition of `IDifferentiable` is designed to encapsulate the following 4 items:
1. `Differential`: The type of the differential value of the conforming type. This allows custom data-structures to be defined to carry the differential values, which may be optimized for space instead of relying solely on compiler synthesis.
Since the computation of derivatives is inherently linear, we only need access to a few operations. These are:
2. `dadd(Differential, Differential) -> Differential`: Addition of two values of the differential type. Its implementation must be associative and commutative, or the resulting derivative code may be incorrect.
3. `dzero() -> Differential`: Additive identity (i.e. the zero or empty value) that can be used to initialize variables during gradient aggregation
4. `dmul<S:__BuiltinRealType>(S, Differential) -> Differential`: Scalar multiplication of a real number with the differential type. Its implementation must be distributive over differential addition (`dadd`).
Points 2, 3 & 4 are derived from the concept of vector spaces. The derivative values of any Slang function always form a vector space (https://en.wikipedia.org/wiki/Vector_space).
### Derivative member associations
In certain scenarios, the compiler needs information on how the fields in the original type map to the differential type. Particularly, this is a problem when differentiating the implicit construction of a struct through braces (i.e. `{}`), represented by `kIROp_MakeStruct`. We provide the decorator `[DerivativeMember(DifferentialTypeName.fieldName)]` (ASTNode: DerivativeMemberAttribute, IR: kIROp_DerivativeMemberDecoration) to explicitly mark these associations.
Example
```C
struct MyType : IDifferentiable
{
typealias Differential = MyDiffType;
float a;
[DerivativeMember(MyDiffType.db)]
float b;
/* ... */
};
struct MyDiffType
{
float db;
};
```
### Automatic Synthesis of `IDifferentiable` Conformances for Aggregate Types
It can be tedious to expect users to hand-write the associated `Differential` type, the corresponding mappings and interface methods for every user-defined `struct` type. For aggregate types, these are trivial to construct by analysing which of their components conform to `IDifferentiable`.
The synthesis proceeds in roughly the following fashion:
1. `IDifferentiable`'s components are tagged with special decorator `__builtin_requirement(unique_integer_id)` which carries an enum value from `BuiltinRequirementKind`.
2. When checking that types conform to their interfaces, if a user-provided definition does not satisfy a requirement with a built-in tag, we perform synthesis by dispatching to `trySynthesizeRequirementWitness`.
3. For _user-defined types_, Differential **types** are synthesized during conformance-checking through `trySynthesizeDifferentialAssociatedTypeRequirementWitness` by checking if each constituent type conforms to `IDifferentiable`, looking up the corresponding `Differential` type, and constructing a new aggregate type from these differential types. Note that since it is possible that a `Differential` type of a constituent member has not yet been synthesized, we have additional logic in the lookup system (`trySynthesizeRequirementWitness`) that synthesizes a temporary empty type with a `ToBeSynthesizedModifier`, so that the fields can be filled in later, when the member type undergoes conformance checking.
4. For _user-defined types_, the Differential methods (`dadd`, `dzero` and `dmul`) are synthesized in `trySynthesizeDifferentialMethodRequirementWitness` by utilizing the `Differential` member and its `[DerivativeMember]` decorations to determine which fields need to be considered and the base type to use for each field. There are two synthesis patterns. The fully-inductive pattern is used for `dadd` and `dzero`, which works by calling `dadd` and `dzero` respectively on the individual fields of the `Differential` type under consideration.
Example:
```C
// Synthesized from "struct T {FT1 field1; FT2 field2;}"
T.Differential dadd(T.Differential a, T.Differential b)
{
return Differential(
FT1.dadd(a.field1, b.field1),
FT2.dadd(a.field2, b.field2),
)
}
```
On the other hand, `dmul` uses the fixed-first arg pattern since the first argument is a common scalar, and proceeds inductively on all the other args.
Example:
```C
// Synthesized from "struct T {FT1 field1; FT2 field2;}"
T.Differential dmul<S:__BuiltinRealType>(S s, T.Differential a)
{
return Differential(
FT1<S>.dmul(s, a.field1),
FT2<S>.dmul(s, a.field2),
)
}
```
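The two synthesis patterns can be illustrated with a small Python sketch (this is a standalone model, not the compiler's implementation; the field names and helper tables are invented). Each pattern proceeds field-by-field, with `dmul` fixing the scalar as the common first argument:

```python
# Hypothetical model of the two synthesis patterns for aggregate
# Differential types, using plain dicts in place of structs.

def synth_dzero(field_zeros):
    """Fully-inductive pattern: dzero() calls dzero() on every field."""
    return {name: zero() for name, zero in field_zeros.items()}

def synth_dadd(a, b, field_adds):
    """Fully-inductive pattern: dadd() calls dadd() field-by-field."""
    return {name: add(a[name], b[name]) for name, add in field_adds.items()}

def synth_dmul(s, a, field_muls):
    """Fixed-first-arg pattern: the scalar s is common to every field."""
    return {name: mul(s, a[name]) for name, mul in field_muls.items()}

# A toy differential type with two float fields:
zeros = {"field1": lambda: 0.0, "field2": lambda: 0.0}
adds = {"field1": lambda x, y: x + y, "field2": lambda x, y: x + y}
muls = {"field1": lambda s, x: s * x, "field2": lambda s, x: s * x}

z = synth_dzero(zeros)   # {'field1': 0.0, 'field2': 0.0}
d = synth_dadd({"field1": 1.0, "field2": 2.0},
               {"field1": 3.0, "field2": 4.0}, adds)
m = synth_dmul(2.0, d, muls)
```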
5. During auto-diff, the compiler can sometimes synthesize new aggregate types. The most common case is the intermediate context type (`kIROp_BackwardDerivativeIntermediateContextType`), which is lowered into a standard struct once the auto-diff pass is complete. It is important to synthesize the `IDifferentiable` conformance for such types since they may be further differentiated (through higher-order differentiation). This implementation is contained in `fillDifferentialTypeImplementationForStruct(...)` and is roughly analogous to the AST-side synthesis.
### Differentiable Type Dictionaries
During auto-diff, the IR passes frequently need to perform lookups to check if an `IRType` is differentiable, and retrieve references to the corresponding `IDifferentiable` methods. These lookups also need to work on generic parameters (that are defined inside generic containers), and existential types that are interface-typed parameters.
To accommodate this range of different type systems, Slang uses a type dictionary system that associates a dictionary of relevant types with each function. This works in the following way:
1. When `CheckTerm()` is called on an expression within a function that is marked differentiable (`[Differentiable]`), we check if the resolved type conforms to `IDifferentiable`. If so, we add this type to the dictionary along with the witness to its differentiability. The dictionary is currently located on `DifferentiableAttribute` that corresponds to the `[Differentiable]` modifier.
2. When lowering to IR, we create a `DifferentiableTypeDictionaryDecoration` which holds the IR versions of all the types in the dictionary as well as a reference to their `IDifferentiable` witness tables.
3. When synthesizing the derivative code, all the transcriber passes use `DifferentiableTypeConformanceContext::setFunc()` to load the type dictionary. `DifferentiableTypeConformanceContext` then provides convenience functions to lookup differentiable types, appropriate `IDifferentiable` methods, and construct appropriate `DifferentialPair<T>`s.
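The dictionary itself can be pictured as a simple map from type to witness, consulted during transcription. The sketch below is an invented model (class and field names are illustrative, not the compiler's):

```python
# Illustrative model of a per-function differentiable type dictionary:
# each [Differentiable] function carries a map from type to the witness
# that records its Differential type and methods.

class Witness:
    def __init__(self, differential, dzero):
        self.differential = differential  # name of the Differential type
        self.dzero = dzero                # callable implementing dzero()

class DiffTypeDictionary:
    def __init__(self):
        self.entries = {}

    def add(self, ty, witness):
        self.entries[ty] = witness

    def lookup(self, ty):
        # None for types absent from the dictionary, mirroring how the
        # IR passes treat non-differentiable types.
        return self.entries.get(ty)

d = DiffTypeDictionary()
d.add("float", Witness("float", lambda: 0.0))
found = d.lookup("float")
missing = d.lookup("int")
```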
### Looking up Differential Info on _Generic_ types
Generically defined types are also lowered into the differentiable type dictionary, but rather than having a concrete witness table, the witness table is itself a parameter. When the auto-diff passes need to find the differential type or place a call to the `IDifferentiable` methods, this is turned into a lookup on the witness table parameter (i.e. `Lookup(<InterfaceRequirementKey>, <WitnessTableParameter>)`). Note that these lookup instructions are inserted into the generic parent container rather than the innermost function.
Example:
```C
T myFunc<T:IDifferentiable>(T a)
{
return a * a;
}
// Reverse-mode differentiated version
void bwd_myFunc<T:IDifferentiable>(
inout DifferentialPair<T> dpa,
T.Differential dOut) // T.Differential is Lookup('Differential', T_Witness_Table)
{
T.Differential da = T.dzero(); // T.dzero is Lookup('dzero', T_Witness_Table)
da = T.dadd(dpa.p * dOut, da); // T.dadd is Lookup('dadd', T_Witness_Table)
da = T.dadd(dpa.p * dOut, da);
dpa = diffPair(dpa.p, da);
}
```
### Looking up Differential Info on _Existential_ types
Existential types are interface-typed values, where there are multiple possible implementations at run-time. The existential type carries information about the concrete type at run-time and is effectively a 'tagged union' of all possible types.
#### Differential type of an Existential
The differential type of an existential type is tricky to define since our type system's only restriction on the `.Differential` type is that it also conforms to `IDifferentiable`. The differential type of any interface `IInterface : IDifferentiable` is therefore the interface type `IDifferentiable`. This is problematic since Slang generally requires a static `anyValueSize` that must be a strict upper bound on the sizes of all conforming types (since this size is used to allocate space for the union). Since `IDifferentiable` is defined in the core module `core.meta.slang` and can be used by the user, it is impossible to define a reliable bound.
We instead provide a new **any-value-size inference** pass (`slang-ir-any-value-inference.h`/`slang-ir-any-value-inference.cpp`) that assembles a list of types that conform to each interface in the final linked IR and determines a relevant upper bound. This allows us to ignore types that conform to `IDifferentiable` but aren't used in the final IR, and generate a tighter upper bound.
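The core idea of the inference pass can be sketched as follows (a simplified standalone model; the type names and sizes are made up, and the real pass works over linked IR rather than dicts):

```python
# Minimal sketch of any-value-size inference: instead of a fixed bound
# declared on the interface, scan the types that actually appear in the
# final linked IR and take the max of their sizes.

def infer_any_value_size(conformances, used_types, sizeof):
    """conformances: interface name -> set of conforming type names
       used_types: set of types reachable in the linked IR
       sizeof: type name -> size in bytes"""
    bounds = {}
    for iface, types in conformances.items():
        live = [t for t in types if t in used_types]
        bounds[iface] = max((sizeof[t] for t in live), default=0)
    return bounds

conformances = {"IDifferentiable": {"FloatDiff", "BigDiff", "UnusedHugeDiff"}}
sizes = {"FloatDiff": 4, "BigDiff": 32, "UnusedHugeDiff": 4096}
bounds = infer_any_value_size(conformances, {"FloatDiff", "BigDiff"}, sizes)
# Ignoring the unused type yields a tighter bound (32 rather than 4096).
```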
**Future work:**
This approach, while functional, creates a locality problem since the size of `IDifferentiable` is the max of _all_ types that conform to `IDifferentiable` in visible modules, even though we only care about the subset of types that appear as `T.Differential` for `T : IInterface`. The reason for this problem is that upon performing an associated type lookup, the Slang IR drops all information about the base interface that the lookup starts from and only considers the constraint interface (in this case `Differential : IDifferentiable`).
There are several ways to resolve this issue, including (i) a static analysis pass that determines the possible set of types at each use location and propagates them to determine a narrower set of types, or (ii) generic (or 'parameterized') interfaces, such as `IDifferentiable<T>` where each version can have a different set of conforming types.
<!--#### IDifferentiable Method lookups on an Existential
All other method lookups are performed using existential-type lookups on the existential parameter. The idea is that existential-typed parameters come with a witness-table component that can be accessed by invoking `kIROp_ExtractExistentialWitnessTable` on them. This allows us to look up the `dadd`/`dzero` methods on this witness table in the same way as we did for generic types.-->
Example:
```C
interface IInterface : IDifferentiable
{
[Differentiable]
This foo(float val);
[Differentiable]
float bar();
};
float myFunc(IInterface obj, float a)
{
IInterface k = obj.foo(a);
return k.bar();
}
// Reverse-mode differentiated version (in pseudo-code corresponding to IR, some of these will get lowered further)
void bwd_myFunc(
inout DifferentialPair<IInterface> dpobj,
inout DifferentialPair<float> dpa,
float.Differential dOut) // T.Differential is Lookup('Differential', T_Witness_Table)
{
// Primal pass..
IInterface obj = dpobj.p;
IInterface k = obj.foo(a);
// .....
// Backward pass
DifferentialPair<IInterface> dpk = diffPair(k);
bwd_bar(dpk, dOut);
IDifferentiable dk = dpk.d; // Differential of `IInterface` is `IDifferentiable`
DifferentialPair<IInterface> dp = diffPair(dpobj.p);
bwd_foo(dpobj, dpa, dk);
}
```
#### Looking up `dadd()` and `dzero()` on Existential Types
There are two distinct cases for lookup on an existential type. The more common case is the closed-box existential type represented simply by an interface. Every value of this type contains a type identifier & a witness table identifier along with the value itself. The less common case is when the function calls are performed directly on the value after being cast to the concrete type.
**`dzero()` for "closed" Existential type: The `NullDifferential` Type**
For concrete and even generic types, we can initialize a derivative accumulator variable by calling the appropriate `Type.dzero()` method. This is unfortunately not possible when initializing an existential differential (which is currently of type `IDifferentiable`), since we must also initialize the type-id of this existential to one of the implementations, but we do not know which one yet since that is a run-time value that only becomes known after the first differential value is generated.
To get around this issue, we declare a special type called `NullDifferential` that acts as a "none type" for any `IDifferentiable` existential object.
**`dadd()` for "closed" Existential types: `__existential_dadd`**
We cannot directly use `dadd()` on two existential differentials of type `IDifferentiable` because we must handle the case where one of them is of type `NullDifferential` and `dadd()` is only defined for differentials of the same type.
We handle this currently by synthesizing a special method called `__existential_dadd` (`getOrCreateExistentialDAddMethod` in `slang-ir-autodiff.cpp`) that performs a run-time type-id check to see if one of the operands is of type `NullDifferential` and returns the other operand if so. If both are non-null, we dispatch to the appropriate `dadd` for the concrete type.
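The dispatch logic amounts to the following (a hedged Python sketch, not the synthesized IR; the class and table names are illustrative):

```python
# Sketch of the run-time dispatch performed by __existential_dadd:
# if either operand is the "none" NullDifferential, return the other;
# otherwise both share a concrete type whose own dadd is invoked.

class NullDifferential:
    pass

NULL = NullDifferential()

def existential_dadd(a, b, dadd_for_type):
    if isinstance(a, NullDifferential):
        return b
    if isinstance(b, NullDifferential):
        return a
    # Both non-null: dispatch on the concrete run-time type.
    return dadd_for_type[type(a)](a, b)

dadd_table = {float: lambda x, y: x + y}
r1 = existential_dadd(NULL, 2.0, dadd_table)   # 2.0
r2 = existential_dadd(3.0, NULL, dadd_table)   # 3.0
r3 = existential_dadd(1.0, 2.0, dadd_table)    # 3.0
```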
**`dadd()` and `dzero()` for "open" Existential types**
If we are dealing with values of the concrete type (i.e. the opened value obtained through `ExtractExistentialValue(ExistentialParam)`), then we can perform lookups in the same way we do for generic types. All existential parameters come with a witness table. We insert instructions to extract this witness table and perform lookups accordingly. That is, for `dadd()`, we use `Lookup('dadd', ExtractExistentialWitnessTable(ExistentialParam))` and place a call to the result.
## `struct DifferentialPair<T:IDifferentiable>`
The second major component is `DifferentialPair<T:IDifferentiable>` that represents a pair of a primal value and its corresponding differential value.
The differential pair is primarily used for passing & receiving derivatives from the synthesized derivative methods, as well as for block parameters on the IR-side.
Both `fwd_diff(fn)` and `bwd_diff(fn)` act as function-to-function transformations, and so the Slang front-end translates the type of `fn` to its derivative version so the arguments can be type checked.
### Pair type lowering
The differential pair type is a special type throughout the AST and IR passes (AST node: `DifferentialPairType`, IR: `kIROp_DifferentialPairType`) because of its use in front-end semantic checking and when synthesizing the derivative code for functions. Once the auto-diff passes are complete, the pair types are lowered into simple `struct`s so they can be easily emitted (`DiffPairLoweringPass` in `slang-ir-autodiff-pairs.cpp`).
We also define additional instructions for pair construction (`kIROp_MakeDifferentialPair`) and extraction (`kIROp_DifferentialPairGetDifferential` & `kIROp_DifferentialPairGetPrimal`) which are lowered into struct construction and field accessors, respectively.
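The rewrite performed on these instructions can be sketched in Python as a one-for-one mapping (a toy model in the spirit of the lowering pass; the tuple encoding of instructions is invented):

```python
# Illustrative lowering of the special pair instructions into plain
# struct operations: MakeDifferentialPair becomes struct construction,
# and the two accessors become field extracts.

def lower_inst(inst):
    op, args = inst
    if op == "MakeDifferentialPair":
        primal, diff = args
        return ("MakeStruct", {"primal": primal, "differential": diff})
    if op == "DifferentialPairGetPrimal":
        return ("FieldExtract", (args[0], "primal"))
    if op == "DifferentialPairGetDifferential":
        return ("FieldExtract", (args[0], "differential"))
    return inst  # everything else is left untouched

pair = lower_inst(("MakeDifferentialPair", (1.0, 0.5)))
primal = lower_inst(("DifferentialPairGetPrimal", ("p",)))
```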
### "User-code" Differential Pairs
Just as we use special IR codes for differential pairs because they have special handling in the IR passes, sometimes differential pairs should be _treated as_ regular struct types during the auto-diff passes.
This happens primarily during higher-order differentiation when the user wishes to differentiate the same code multiple times.
Slang's auto-diff approaches this by rewriting all the relevant differential pairs into 'irrelevant' differential pairs (`kIROp_DifferentialPairUserCode`) and 'irrelevant' accessors (`kIROp_DifferentialPairGetDifferentialUserCode`, `kIROp_DifferentialPairGetPrimalUserCode`) at the end of **each auto-diff iteration** so that the next iteration treats these as regular differentiable types.
The user-code versions are also lowered into `struct`s in the same way.
## Type Checking of Auto-Diff Calls (and other _higher-order_ functions)
Since `fwd_diff` and `bwd_diff` are represented as higher order functions that take a function as an input and return the derivative function, the front-end semantic checking needs some notion of higher-order functions to be able to check and lower the calls into appropriate IR.
### Higher-order Invocation Base: `HigherOrderInvokeExpr`
All higher order transformations derive from `HigherOrderInvokeExpr`. For auto-diff there are two possible expression classes `ForwardDifferentiateExpr` and `BackwardDifferentiateExpr`, both of which derive from this parent expression.
### Higher-order Function Call Checking: `HigherOrderInvokeExprCheckingActions`
Resolving the concrete method is not a trivial issue in Slang, given its support for overloading, type coercion and more. This becomes more complex with the presence of a function transformation in the chain.
For example, if we have `fwd_diff(f)(DiffPair<float>(...), DiffPair<double>(...))`, we would need to find the correct match for `f` based on its post-transform argument types.
To facilitate this we use the following workflow:
1. The `HigherOrderInvokeExprCheckingActions` base class provides a mechanism for different higher-order expressions to implement their type translation (i.e. what is the type of the transformed function).
2. The checking mechanism passes all detected overloads for `f` through the type translation and assembles a new group out of the results (the new functions are 'temporary').
3. This new group is used by `ResolveInvoke` when performing overload resolution and type coercion using the user-provided argument list.
4. The resolved signature (if there is one) is then replaced with the corresponding function reference and wrapped in the appropriate higher-order invoke.
**Example:**
Let's say we have two functions with the same name `f`: (`int -> float`, `double, double -> float`)
and we want to resolve `fwd_diff(f)(DiffPair<float>(1.0, 0.0), DiffPair<float>(0.0, 1.0))`.
The higher-order checking actions will synthesize the 'temporary' group of translated signatures (`int -> DiffPair<float>`, `DiffPair<double>, DiffPair<double> -> DiffPair<float>`).
Invoke resolution will then narrow this down to a single match (`DiffPair<double>, DiffPair<double> -> DiffPair<float>`) by automatically casting the `float`s to `double`s. Once the resolution is complete,
we return `InvokeExpr(ForwardDifferentiateExpr(f : double, double -> float), casted_args)` by wrapping the resolved function in the appropriate higher-order expression.
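The shape of this workflow can be sketched as follows (a greatly simplified standalone model with invented names; type coercion is omitted, so arguments must match the translated signatures exactly):

```python
# Toy model of higher-order overload resolution: run every candidate
# signature through the fwd_diff type translation, then match the
# call-site argument types against the translated group.

def fwd_translate(sig):
    params, ret = sig
    # Simplification: wrap every type in DiffPair; the real translation
    # only wraps differentiable parameter/return types.
    wrap = lambda t: f"DiffPair<{t}>"
    return tuple(wrap(p) for p in params), wrap(ret)

def resolve(overloads, arg_types):
    translated = [(fwd_translate(sig), sig) for sig in overloads]
    matches = [orig for (params, _), orig in translated
               if params == tuple(arg_types)]
    # On a unique match, return the original (untranslated) signature,
    # which gets wrapped in the higher-order invoke expression.
    return matches[0] if len(matches) == 1 else None

overloads = [(("int",), "float"), (("double", "double"), "float")]
picked = resolve(overloads, ["DiffPair<double>", "DiffPair<double>"])
```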
## Attributed Types (`no_diff` parameters)
Often, it will be necessary to prevent gradients from propagating through certain parameters, for correctness reasons. For example, values representing random samples are often not differentiated since the result may be mathematically incorrect.
Slang provides the `no_diff` operator to mark parameters as non-differentiable, even if their type conforms to `IDifferentiable`:
```C
float myFunc(float a, no_diff float b)
{
return a * b;
}
// Resulting fwd-mode derivative:
DiffPair<float> myFunc(DiffPair<float> dpa, float b)
{
return diffPair(dpa.p * b, dpa.d * b);
}
```
Slang uses _OpAttributedType_ to denote the IR type of such parameters. For example, the lowered type of `b` in the above example is `OpAttributedType(OpFloat, OpNoDiffAttr)`. In the front-end, this is represented through the `ModifiedType` AST node.
Sometimes, this additional layer can get in the way of things like type equality checks and other mechanisms where the `no_diff` is irrelevant. Thus, we provide the `unwrapAttributedType` helper to remove attributed type layers for such cases.
## Derivative Data-Flow Analysis
Slang has a derivative data-flow analysis pass that is performed on a per-function basis immediately after lowering to IR and before the linking step (`slang-ir-check-differentiability.h`/`slang-ir-check-differentiability.cpp`).
The job of this pass is to enforce that instructions of a differentiable type propagate derivatives unless the user explicitly drops them through `detach()` or `no_diff`. The reason for this is that Slang requires functions to be decorated with `[Differentiable]` before it will propagate derivatives through them. Otherwise, the function is considered non-differentiable and effectively produces a 0 derivative. This can lead to frustrating situations where derivatives are silently dropped even though the user never intended it. Example:
```C
float nonDiffFunc(float x)
{
/* ... */
}
float differentiableFunc(float x) // Forgot to annotate with [Differentiable]
{
/* ... */
}
float main(float x)
{
// User doesn't realise that the function that is supposed to be differentiable is not
// getting differentiated, because the types here are all 'float'.
//
return nonDiffFunc(x) * differentiableFunc(x);
}
```
The data-flow analysis step enforces that non-differentiable functions used in a differentiable context should get their derivative dropped explicitly. That way, it is clear to the user whether a call is getting differentiated or dropped.
Same example with `no_diff` enforcement:
```C
float nonDiffFunc(float x)
{
/* ... */
}
[Differentiable]
float differentiableFunc(float x)
{
/* ... */
}
float main(float x)
{
return no_diff(nonDiffFunc(x)) * differentiableFunc(x);
}
```
A `no_diff` can only be used directly on a function call, and turns into a `TreatAsDifferentiableDecoration` that indicates that the function will not produce a derivative.
The derivative data-flow analysis pass works similarly to a standard data-flow pass:
1. We start by assembling a set of instructions that 'produce' derivatives, starting with the parameters of differentiable types (and without an explicit `no_diff`), and propagating through each instruction in the block. An inst carries a derivative if one of its operands carries a derivative and its result type is differentiable.
2. We then assemble a set of instructions that expect a derivative. These are differentiable operands of differentiable functions (unless they have been marked by `no_diff`). We then reverse-propagate this set by adding in all differentiable operands (and repeating this process).
3. During this reverse-propagation, if there is any `OpCall` in the 'expect' set that is not also in the 'produce' set, then we have a situation where the gradient hasn't been explicitly dropped, and we create a user diagnostic.
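The three steps above can be modeled in miniature (straight-line code only, with an invented instruction encoding; the real pass works over IR blocks and control flow):

```python
# Simplified model of the derivative data-flow check: forward-propagate
# the "produces a derivative" set, back-propagate the "expects a
# derivative" set, then diagnose non-differentiable calls whose result
# is expected to carry a derivative.

def check(insts, diff_params):
    # Step 1: forward propagation of the 'produce' set.
    produce = set(diff_params)
    for name, op, operands in insts:
        if op != "call_nondiff" and any(o in produce for o in operands):
            produce.add(name)   # non-[Differentiable] calls produce nothing
    # Step 2: reverse propagation of the 'expect' set.
    expect = {o for name, op, operands in insts
              if op == "return" for o in operands}
    changed = True
    while changed:
        changed = False
        for name, op, operands in insts:
            if name in expect:
                for o in operands:
                    if o not in expect:
                        expect.add(o)
                        changed = True
    # Step 3: expected but not produced, and not explicitly dropped.
    return [name for name, op, _ in insts
            if op == "call_nondiff" and name in expect and name not in produce]

insts = [("t0", "call_nondiff", ["x"]),
         ("t1", "mul", ["t0", "x"]),
         ("r", "return", ["t1"])]
diagnostics = check(insts, ["x"])
# t0 needs a derivative, but its callee is not differentiable.
```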

# Design Document: Slang IR Module Backwards Compatibility
## Overview
This document describes the design and implementation of backwards compatibility support for serialized Slang IR modules. The feature enables Slang to load IR modules compiled with different versions of the compiler, providing version information and graceful handling of incompatible modules.
## Motivation
As Slang evolves, the intermediate representation (IR) may change with new instructions being added or existing ones being modified. Without backwards compatibility:
- Users cannot load modules compiled with older versions of Slang
- There's no way to detect version mismatches between modules
- Module compatibility issues are opaque to users
This feature addresses these issues by introducing versioning and stable instruction naming.
## User-Facing Changes
### New Command Line Options
1. **`-get-module-info <module-file>`**
- Prints information about a serialized IR module without loading it
- Output includes:
- Module name
- Module version
- Compiler version that created the module
- Example usage: `slangc -get-module-info mymodule.slang-module`
2. **`-get-supported-module-versions`**
- Prints the range of module versions this compiler supports
- Output includes minimum and maximum supported versions
- Example usage: `slangc -get-supported-module-versions`
### API Changes
New method in `ISession` interface:
```cpp
SlangResult loadModuleInfoFromIRBlob(
slang::IBlob* source,
SlangInt& outModuleVersion,
const char*& outModuleCompilerVersion,
const char*& outModuleName);
```
This allows programmatic inspection of module metadata without full deserialization.
## Technical Design
### Stable Instruction Names
The core mechanism for backwards compatibility is the introduction of stable names for IR instructions:
1. **Stable Name Table** (`slang-ir-insts-stable-names.lua`)
- Maps instruction names to unique integer IDs
- IDs are permanent once assigned
- New instructions get new IDs, never reusing old ones
2. **Runtime Mapping**
- `getOpcodeStableName(IROp)`: Convert runtime opcode to stable ID
- `getStableNameOpcode(UInt)`: Convert stable ID back to runtime opcode
- Unknown stable IDs map to `kIROp_Unrecognized`
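The round-trip behavior of this mapping can be sketched as follows (the table contents and constants here are invented for illustration; the real table lives in `slang-ir-insts-stable-names.lua`):

```python
# Minimal sketch of the stable-name mechanism: a permanent name->ID
# table, with round-trip helpers that map unknown stable IDs to an
# "Unrecognized" opcode instead of failing outright.

UNRECOGNIZED = -1

stable_names = {"Add": 1, "Mul": 2, "MakeStruct": 3}   # IDs are never reused
id_to_name = {v: k for k, v in stable_names.items()}

def get_opcode_stable_name(op_name):
    return stable_names[op_name]

def get_stable_name_opcode(stable_id):
    # An ID minted by a newer compiler is unknown here: map it to
    # Unrecognized so the post-deserialization check can reject it.
    return id_to_name.get(stable_id, UNRECOGNIZED)

round_trip = get_stable_name_opcode(get_opcode_stable_name("Mul"))
unknown = get_stable_name_opcode(999)
```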
### Module Versioning
Two types of versions are tracked:
1. **Module Version** (`IRModule::m_version`)
- Semantic version of the IR instruction set
- Range: `k_minSupportedModuleVersion` to `k_maxSupportedModuleVersion`
- Stored in each serialized module
2. **Serialization Version** (`IRModuleInfo::serializationVersion`)
- Version of the serialization format itself
- Currently version 0
- Allows future changes to serialization structure
### Compiler Version Tracking
Each module stores the exact compiler version (`SLANG_TAG_VERSION`) that created it. This enables version-specific workarounds if needed in the future.
### Validation System
A GitHub Actions workflow (`check-ir-stable-names.yml`) ensures consistency:
1. **Check Mode**: Validates that:
- All IR instructions have stable names
- No duplicate stable IDs exist
- The stable name table is a bijection with current instructions
2. **Update Mode**: Automatically assigns stable IDs to new instructions
The validation is implemented in `check-ir-stable-names.lua` which:
- Loads instruction definitions from `slang-ir-insts.lua`
- Compares against `slang-ir-insts-stable-names.lua`
- Reports missing entries or inconsistencies
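The essence of the check is a bijection test; a hedged sketch in Python (the real script is Lua, and the data below is a toy example):

```python
# Sketch of the consistency check: every instruction must have a stable
# name, no two instructions may share a stable ID, and the table must
# not contain stale entries for removed instructions.

def check_stable_names(instructions, stable_table):
    errors = []
    for inst in sorted(instructions):
        if inst not in stable_table:
            errors.append(f"missing stable name: {inst}")
    ids = list(stable_table.values())
    if len(ids) != len(set(ids)):
        errors.append("duplicate stable IDs")
    for name in sorted(stable_table):
        if name not in instructions:
            errors.append(f"stale entry: {name}")
    return errors

errs = check_stable_names({"Add", "Mul"}, {"Add": 1, "Mul": 1})
# The duplicated ID is reported.
```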
## Breaking Changes and Version Management
### When to Update Module Version
The module version must be updated when:
1. **Adding Instructions** (Minor Version Bump)
- Increment `k_maxSupportedModuleVersion`
- Older compilers can still load modules that don't use new instructions
2. **Removing Instructions** (Major Version Bump)
- Increment `k_maxSupportedModuleVersion`
- Update `k_minSupportedModuleVersion` to exclude versions with removed instructions
- This breaks compatibility with older modules using removed instructions
3. **Changing Instruction Semantics**
- Even if the instruction name remains the same
- Requires version bump to prevent incorrect behavior
- To avoid bumping the minimum supported version, one may instead introduce
a new instruction and just bump `k_maxSupportedModuleVersion`
### Serialization Format Changes
Changes to how data is serialized (not what data) require updating `serializationVersion`:
- Changes to the RIFF container structure
- Different encoding for instruction payloads
- Reordering of serialized data
## Implementation Details
### Module Loading Flow
1. **Version Check**
```cpp
if (fossilizedModuleInfo->serializationVersion != IRModuleInfo::kSupportedSerializationVersion)
return SLANG_FAIL;
```
2. **Instruction Deserialization**
- Stable IDs are converted to runtime opcodes
- Unknown IDs become `kIROp_Unrecognized`
3. **Validation Pass**
- After deserialization, check for any `kIROp_Unrecognized` instructions
- Fail loading if any are found
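The three-stage flow can be sketched end to end (constants, field names, and the instruction encoding are invented; the real flow operates on the fossilized RIFF container):

```python
# Hedged sketch of the loading checks, in order: serialization version
# first, then deserialization with stable-ID mapping, then rejection of
# modules containing Unrecognized instructions or an out-of-range
# module version.

SUPPORTED_SERIALIZATION_VERSION = 0
MIN_MODULE_VERSION, MAX_MODULE_VERSION = 1, 3
KNOWN_IDS = {1: "Add", 2: "Mul"}

def load_module(info):
    if info["serializationVersion"] != SUPPORTED_SERIALIZATION_VERSION:
        return None, "incompatible serialization format"
    insts = [KNOWN_IDS.get(i, "Unrecognized") for i in info["insts"]]
    if "Unrecognized" in insts:
        return None, "module uses unknown instructions"
    if not (MIN_MODULE_VERSION <= info["version"] <= MAX_MODULE_VERSION):
        return None, "module version out of supported range"
    return insts, None

ok, err = load_module({"serializationVersion": 0, "version": 2, "insts": [1, 2]})
bad, err2 = load_module({"serializationVersion": 0, "version": 2, "insts": [1, 99]})
```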
### Error Handling
- Incompatible serialization versions: Immediate failure
- Unknown instructions: Mark as unrecognized, fail after full deserialization
(this should be caught by the next check)
- Module version out of range: Fail after deserialization
## Future Considerations
### Potential Enhancements
1. **Graceful Degradation**
- Skip unrecognized instructions if they're not critical
- Provide compatibility shims for removed instructions
2. **Module Migration Tools**
- Utility to upgrade old modules to new formats
- Batch processing for large codebases
### Maintenance Guidelines
1. **Regular CI Validation**
- The GitHub Action ensures stable names stay synchronized
- Catches missing entries before merge
2. **Version Documentation**
- Maintain changelog of what changed in each module version
- Document any version-specific workarounds
3. **Testing**
- Test loading of modules from previous versions
- Verify error messages for incompatible modules
## Conclusion
This backwards compatibility system provides a robust foundation for Slang IR evolution while maintaining compatibility where possible. The combination of stable instruction naming, comprehensive versioning, and automated validation ensures that:
- Users can reliably use modules across Slang versions
- Developers can evolve the IR with clear compatibility boundaries
- Version mismatches are detected and reported clearly
The system is designed to be maintainable and extensible, with clear guidelines for when and how to make breaking changes.

Capabilities (Out of Date)
============
Slang aims to be a portable language for shader programming, which introduces two complementary problems:
1. We need a way to indicate that certain constructs (types, functions, etc.) are only allowed on certain targets, so that a user gets a meaningful error if they try to do something that won't work on one or more of the APIs or platforms they want to target. Similarly, the user expects to get an error if they call a fragment-shader-specific function inside of, say, compute shader code, or vice versa.
2. If the same feature can be implemented across multiple platforms, but the best (or only) implementation path differs across platforms, then we need a way to express the platform specific code and pick the right implementation per-target.
Item (2) is traditionally handled with preprocessor techniques (e.g., `#ifdef`ing the body of a function based on target platform), but that of course requires that the user invoke the Slang front end once for each target platform, and target-specific coding in a library will then "infect" code that uses that library, forcing them to invoke the front-end once per target as well.
We are especially sensitive to this problem in the compiler itself, because we have to author and maintain the Slang standard modules, which need to (1) expose the capabilities of many platforms and (2) work across all those platforms. It would be very unfortunate if we had to build different copies of our standard modules per-target.
The intention in Slang is to solve both of these problems with a system of *capabilities*.
What is a capability?
---------------------
For our purposes a capability is a discrete feature that a compilation target either does or does not support.
We could imagine defining a capability for the presence of texture sampling operations with implicit gradients; this capability would be supported when generating fragment shader kernel code, but not when generating code for other stages.
Let's imagine a language syntax that the standard modules could use to define some *atomic* capabilities:
```
capability implicit_gradient_texture_fetches;
```
We can then imagine using attributes to indicate that a function requires a certain capability:
```
struct Texture2D
{
...
// Implicit-gradient sampling operation.
[availableFor(implicit_gradient_texture_fetches)]
float4 Sample(SamplerState s, float2 uv);
}
```
(Note that the `[availableFor(...)]` syntax is just a straw-man to write up examples, and a better name would be desirable if/when we implement this stuff.)
Given those declarations, we could then check when compiling code if the user is trying to call `Texture2D.Sample` in code compiled for a target that *doesn't* support implicit-gradient texture fetches, and issue an appropriate error.
The details on how to sequence this all in the compiler will be covered later.
Derived Capabilities
--------------------
Once we can define atomic capabilities, the next step is to be able to define *derived* capabilities.
Let's imagine that we extend our `capability` syntax so that we can define a new capability that automatically implies one or more other capabilities:
```
capability fragment : implicit_gradient_texture_fetches;
```
Here we've said that whenever the `fragment` capability is available, we can safely assume that the `implicit_gradient_texture_fetches` capability is available (but not vice versa).
Given even a rudimentary tool like that, we can start to build up capabilities that relate closely to the "profiles" in things like D3D:
```
capability d3d;
capability sm_5_0 : d3d;
capability sm_5_1 : sm_5_0;
capability sm_6_0 : sm_5_1;
...
capability d3d11 : d3d, sm_5_0;
capability d3d12 : d3d, sm_6_0;
capability khronos;
capability glsl_400 : khronos;
capability glsl_410 : glsl_400;
...
capability vulkan : khronos, glsl_450;
capability opengl : khronos;
```
Here we are saying that `sm_5_1` supports everything `sm_5_0` supports, and potentially more. We are saying that `d3d12` supports `sm_6_0` but maybe not, e.g., `sm_6_3`.
We are expressing the fact that having a `glsl_*` capability means you are on some Khronos API target, but it doesn't specify which one.
(The exact details of these declarations obviously aren't the point; getting a good hierarchy of capabilities will take time.)
Capability Composition
----------------------
Sometimes we'll want to give a distinct name to a specific combination of capabilities, but not say that it supports anything new:
```
capability ps_5_1 = sm_5_1 & fragment;
```
Here we are saying that the `ps_5_1` capability is *equivalent* to the combination of `sm_5_1` and `fragment` (that is, if you support both `sm_5_1` and `fragment` then you support `ps_5_1` and vice versa).
Compositions should be allowed in `[availableFor(...)]` attributes (e.g., `[availableFor(vulkan & glsl_450)]`), but pre-defined compositions should be favored when possible.
When composing things with `&` it is safe for the compiler to filter out redundancies based on what it knows so that, e.g., `ps_5_0 & fragment` resolves to just `ps_5_0`.
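This redundancy filtering can be modeled with a small sketch (the capability hierarchy below is invented to match the straw-man examples above; the real design has no implementation yet):

```python
# Toy capability model: a conjunction is a set of atoms, a "derived"
# capability implies its bases, and x & y simplifies by dropping any
# atom already implied by another atom in the conjunction.

implies = {"sm_5_1": {"sm_5_0"}, "sm_5_0": {"d3d"},
           "ps_5_0": {"sm_5_0", "fragment"}}

def closure(atom):
    """All atoms implied by `atom`, including itself."""
    seen, todo = set(), [atom]
    while todo:
        a = todo.pop()
        if a not in seen:
            seen.add(a)
            todo.extend(implies.get(a, ()))
    return seen

def conj(*atoms):
    """x & y, simplified: keep only atoms not implied by another atom."""
    return {a for a in atoms
            if not any(a in closure(b) for b in atoms if b != a)}

# ps_5_0 & fragment resolves to just ps_5_0:
simplified = conj("ps_5_0", "fragment")
```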
Once we have an `&` operator for capabilities, it is easy to see that "derived" capabilities are really syntax sugar, so that a derived capability like:
```
capability A : B, C
```
could have been written instead as:
```
capability A_atomic
capability A = A_atomic & B & C
```
Where the `A_atomic` capability guarantees that `A` implies `B` and `C` but not vice versa.
It is also useful to think of an `|` operator on capabilities.
In particular if a function has multiple `[availableFor(...)]` attributes:
```
[availableFor(vulkan & fragment)]
[availableFor(d3d12 & fragment)]
void myFunc();
```
This function should be equivalent to one with just a single `[availableFor((vulkan & fragment) | (d3d12 & fragment))]` which is equivalent to `[availableFor((vulkan | d3d12) & fragment)]`.
Simplification should generally push toward "disjunctive normal form," though, rather than pursue simplifications like that.
Note that we do *not* include negation, so that capabilities are not general Boolean expressions.
Validation
----------
For a given function definition `F`, the front end will scan its body and see what it calls, and compose the capabilities required by the called functions using `&` (simplifying along the way). Call the resulting capability (in disjunctive normal form) `R`.
If `F` doesn't have an `[availableFor(...)]` attribute, then we can derive its *effective* `[availableFor(...)]` capability as `R` (this probably needs to be expressed as an iterative dataflow problem over the call graph, to handle cycles).
If `F` *does* have one or more `[availableFor(...)]` clauses that amount to a declared capability `C` (again in disjunctive normal form), then we can check that `C` implies `R` and error out if it is not the case.
A reasonable implementation would track which calls introduced which requirements, and be able to explain *why* `C` does not capture the stated requirements.
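The implication check between a declared capability `C` and the computed requirement `R` can be sketched as follows (a toy model with hypothetical types; in DNF, `C` implies `R` exactly when every conjunct of `C` contains some conjunct of `R`):

```c++
#include <algorithm>
#include <set>
#include <string>
#include <vector>

using Conjunct = std::set<std::string>;
using Dnf = std::vector<Conjunct>;  // an OR of ANDs

// Does `declared` imply `required`? Every way of satisfying the declared
// capability must also satisfy the requirement: for each conjunct c in
// `declared`, some conjunct r in `required` must be a subset of c.
bool implies(const Dnf& declared, const Dnf& required)
{
    for (const auto& c : declared)
    {
        bool satisfied = false;
        for (const auto& r : required)
        {
            if (std::includes(c.begin(), c.end(), r.begin(), r.end()))
            {
                satisfied = true;
                break;
            }
        }
        if (!satisfied) return false;
    }
    return true;
}
```

For example, a function declared `(vulkan & fragment) | (d3d12 & fragment)` implies a requirement of `fragment`, but does not imply a requirement of `vulkan` (the `d3d12 & fragment` branch fails to provide it), which is exactly the case where the checker should error out.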
For a shader entry point, we should check it as if it had an `[availableFor(...)]` that is the OR of all the specified target profiles (e.g., `sm_5_0 | glsl_450 | ...`) ANDed with the specified stage (e.g., `fragment`).
Any error here should be reported to the user.
If an entry point has an explicit `[availableFor(...)]` then we should AND that onto the profile computed above, so that the user can restrict certain entry points to certain profiles.
In order to support separate compilation, the functions that are exported from a module should probably either have explicit availability attributes, or else they will be compiled against a kind of "default capability" used for the whole module.
Downstream code that consumes such a module would see declarations with explicit capabilities only.
Picking an appropriate "default capability" to use when compiling modules is an important challenge; it would in practice define the "min spec" to use when compiling.
Capability Overriding
---------------------
It should be possible to define multiple versions of a function, having different `[availableFor(...)]` attributes:
```
[availableFor(vulkan)] void myFunc() { ... }
[availableFor(d3d12)] void myFunc() { ... }
```
For front-end checking, these should be treated as if they were a single definition of `myFunc` with an ORed capability (e.g., `vulkan | d3d12`).
Overload resolution will pick the "best" candidate at a call site based *only* on the signatures of the function (note that this differs greatly from how profile-specific function overloading works in Cg).
The front-end will then generate initial IR code for each definition of `myFunc`.
Each of the IR functions will have the *same* mangled name, but different bodies, and each will have appropriate IR decorations to indicate the capabilities it requires.
The choice of which definition to use is then put off until IR linking for a particular target.
At that point we can look at all the IR functions matching a given mangled name, filter them according to the capabilities of the target, and then select the "best" one.
In general a definition `A` of an IR symbol is better than another definition `B` if the capabilities on `A` imply those on `B` but not vice versa.
(In practice this probably needs to be "the capabilities on `A` intersected with those of the target," and similarly for `B`.)
This approach allows us to defer profile-based choices of functions to very late in the process. The one big "gotcha" to be aware of is when functions are overloaded based on pipeline stage, where we would then have to be careful when generating DXIL or SPIR-V modules with multiple entry points (as a single function `f` might need to be specialized twice if it calls a stage-overloaded function `g`).
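The link-time selection step might look roughly like this (a simplified sketch with hypothetical names; real capability requirements are DNF expressions rather than flat sets, and truly ambiguous candidates would need a diagnostic):

```c++
#include <algorithm>
#include <set>
#include <string>
#include <vector>

using Caps = std::set<std::string>;

// One IR-level definition of a function, with the capabilities it requires
// (recorded as IR decorations in the real compiler).
struct IrFuncDef
{
    std::string body;  // stand-in for the actual IR body
    Caps required;
};

// Pick the best definition for a target: filter to candidates the target
// can support, then prefer a candidate whose requirements are a superset
// of the current best's (it is more specialized for this target).
const IrFuncDef* pickBest(const std::vector<IrFuncDef>& defs, const Caps& target)
{
    const IrFuncDef* best = nullptr;
    for (const auto& d : defs)
    {
        // The target must provide everything this definition requires.
        if (!std::includes(target.begin(), target.end(),
                           d.required.begin(), d.required.end()))
            continue;
        if (!best || std::includes(d.required.begin(), d.required.end(),
                                   best->required.begin(), best->required.end()))
            best = &d;
    }
    return best;
}
```

So a generic fallback (requiring nothing) and a `vulkan`-specific definition can share one mangled name: linking for a Vulkan target picks the specialized body, while any other target falls back to the generic one.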
Capabilities in Other Places
----------------------------
So far I've talked about capabilities on functions, but they should also be allowed on other declarations including:
- Types, to indicate that code using that type needs the given capability
- Interface conformances, to indicate that a type only conforms to the interface when the capabilities are available
- Struct fields, to indicate that the field is only present in the type when the capabilities are present
- Extension declarations, to indicate that everything in them requires the specified capabilities
We should also provide a way to specify that a `register` or other layout modifier is only applicable for specific targets/stages. Such a capability nominally exists in HLSL today, but it would be much more useful if it could be applied to specify target-API-specific bindings.
Only functions should support overloading based on capability. In all other cases there can only be one definition of an entity, and capabilities just decide when it is available.
API Extensions as Capabilities
------------------------------
One clear use case for capabilities is to represent optional extensions, including cases where a feature is "built-in" in D3D but requires an extension in Vulkan:
```
capability KHR_secret_sauce : vulkan;
[availableFor(sm_7_0)] // always available for D3D Shader Model 7.0
[availableFor(KHR_secret_sauce)] // Need the "secret sauce" extension for Vulkan
void improveShadows();
```
When generating code for Vulkan, we should be able to tell the user that the `improveShadows()` function requires the given extension. The user should be able to express compositions of capabilities in their `-profile` option (and similarly for the API):
```
slangc code.slang -profile vulkan+KHR_secret_sauce
```
(Note that for the command line, it is beneficial to use `+` instead of `&` to avoid conflicts with shell interpreters)
An important question is whether the compiler should automatically infer required extensions without them being specified, so that it produces SPIR-V that requires extensions the user didn't ask for.
The argument against such inference is that users should opt in to non-standard capabilities they are using, but it would be unfortunate if this in turn requires verbose command lines when invoking the compiler.
It should be possible to indicate the capabilities that a module or entry point should be compiled to use without command-line complications.
(A related challenge is when a capability can be provided by two different extensions: how should the compiler select the "right" one to use?)
Disjoint Capabilities
---------------------
Certain compositions of capabilities make no sense. If a user declared a function as needing `vulkan & d3d12` they should probably get an error message.
Knowing that certain capabilities are disjoint can also help improve the overall user experience.
If a function requires `(vulkan & extensionA) | (d3d12 & featureB)` and we know we are compiling for `vulkan`, we should be able to give the user a pointed error message saying they need to ask for `extensionA`, because adding `featureB` isn't going to do any good.
As a first-pass model we could have a notion of `abstract` capabilities that are used to model the root of hierarchies of disjoint capabilities:
```
abstract capability api;
abstract capability d3d : api;
capability d3d11 : d3d;
capability d3d12 : d3d;
abstract capability khronos : api;
capability vulkan : khronos;
capability opengl : khronos;
```
As a straw man: we could have a rule that to decide if non-abstract capabilities `A` and `B` are disjoint, we look for their common ancestor in the tree of capabilities.
If the common ancestor is abstract, they are disjoint; if not, they are not disjoint.
We'd also know that if the user tries to compile for a profile that includes an abstract capability but *not* some concrete capability derived from it, then that is an error (we can't generate code for just `d3d`).
The above is an over-simplification because we don't have a *tree* of capabilities, but a full *graph*, so we'd need an approach that works for the full case.
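The straw-man rule can be sketched over the hierarchy above (hypothetical types; as noted, the real design would need to handle a capability *graph* rather than a tree):

```c++
#include <map>
#include <string>
#include <vector>

struct CapabilityDecl
{
    std::string parent;  // "" for the root of the hierarchy
    bool isAbstract;
};

using CapabilityHierarchy = std::map<std::string, CapabilityDecl>;

// Walk from a capability up to the root, collecting the chain of ancestors
// (including the capability itself).
std::vector<std::string> ancestorChain(const CapabilityHierarchy& h, std::string cap)
{
    std::vector<std::string> chain;
    while (!cap.empty())
    {
        chain.push_back(cap);
        cap = h.at(cap).parent;
    }
    return chain;
}

// Straw-man rule: a and b are disjoint iff their nearest common ancestor
// in the capability tree is abstract (and neither implies the other).
bool areDisjoint(const CapabilityHierarchy& h, const std::string& a, const std::string& b)
{
    std::vector<std::string> chainA = ancestorChain(h, a);
    std::vector<std::string> chainB = ancestorChain(h, b);
    // If one capability implies the other they are certainly not disjoint.
    for (const auto& x : chainA) if (x == b) return false;
    for (const auto& x : chainB) if (x == a) return false;
    // The first ancestor of b that also appears in a's chain is the
    // nearest common ancestor (chains are paths to the root of a tree).
    for (const auto& anc : chainB)
        for (const auto& ancA : chainA)
            if (anc == ancA)
                return h.at(anc).isAbstract;
    return false;  // no common ancestor: unrelated, not disjoint
}
```

Under the example hierarchy, `vulkan` and `d3d12` come out disjoint (nearest common ancestor `api` is abstract), as do `d3d11` and `d3d12` (ancestor `d3d` is abstract), while `vulkan` and `khronos` do not, since one implies the other.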
Interaction with Generics/Interfaces
------------------------------------
It should be possible for an interface requirement to have a capability requirement attached to it.
This would mean that users of the interface can only use the method/type/whatever when the capability is present (just like for any other function):
```
interface ITexture
{
float4 sampleLevel(float2 uv, float lod);
[availableFor(fragment)]
float4 sample(float2 uv); // can only call this from fragment code
}
```
When implementing an interface, any capability constraints we put on a member that satisfies an interface requirement would need to guarantee that either:
- the capabilities on our method are implied by those on the requirement (we don't require more), or
- the capabilities on the method are implied by those on the type itself, or its conformance to the interface (you can't use the conformance without the capabilities), or
- the capabilities are already implied by those that the whole module is being compiled for
In any case, you need to be sure that `YourType` can't be passed as a generic argument to some function that uses just the `ITexture` interface above, which could then call a method on your type from a profile that doesn't have the required capabilities.
Interaction with Heterogeneity
------------------------------
If Slang eventually supports generating CPU code as well as shaders, it should use capabilities to handle the CPU/GPU split similar to how they can be used to separate out vertex- and fragment-shader functionality.
Something like a `cpu` profile that works as a catch-all for typical host CPU capabilities would be nice, and could be used as a convenient way to mark "host" functions in a file that is otherwise compiled for a "default profile" that assumes GPU capabilities.
Conclusion
----------
Overall, the hope is that in many cases developers will be able to use capability-based partitioning and overloading of APIs to build code that only has to pass through the Slang front-end once, but that can then go through back-end code generation for each target.
In cases where this can't be achieved, the way that capability-based overloading is built into the Slang IR design means that we should be able to merge multiple target-specific definitions into one IR module, so that a module can employ target-specific specializations while still presenting a single API to consumers.

Casting in the Slang Compiler
=============================
The following discussion is about casting within the C++ implementation of the slang compiler.
C++'s built-in mechanisms for casting (principally `dynamic_cast`) are problematic within the Slang compiler codebase. Code using `dynamic_cast` requires that RTTI information be available, and that any type it is used on has a vtbl (i.e., has at least one virtual member). Some problems with this...
* There are types which we want to 'dynamic_cast' that do not have, and we do not want to have a Vtbl (for example Slang::IRInst).
* There are types which a 'dynamic_cast' doesn't do quite what we want (for example casting on Type* derived types typically wants to work on their canonical type)
* We may want to replace use of dynamic_cast in the future for speed/space or other reasons
* It is common in the code base when using a 'smart pointer' type to cast it, but still return a smart pointer
To deal with these issues, casting within Slang follows its own methodology. In summary it is as follows...
* Use 'as' free function to do a typical 'dynamic like' cast.
* 'as' doesn't guarantee the returned pointer points to the same object.
* For example with Type* it *actually* does the cast on the canonical type which is often a different object.
* If you want to *literally* do a dynamic cast use 'dynamicCast' free function.
* This guarantees the returned pointer points to the same object (like normal dynamic_cast)
* If you want to return a smart pointer from a cast from a smart pointer use the .as or .dynamicCast *methods*
* If you want to determine if an 'as' cast is possible on a smart pointer use the .is method
* Doing so will produce more efficient code because a new smart pointer does not need to be constructed
These functions will also work with types that do not have Vtbl - like IRInst derived types.
Both 'as' and 'dynamicCast' handle the case where the pointer is nullptr, by returning nullptr. If the cast succeeds the cast pointer is returned, otherwise nullptr is returned. If a cast is performed with a free function it always returns a raw pointer.
So why have 'as' and 'dynamicCast' - they seem sort of similar? The primary difference is dynamicCast *must* always return a pointer to the same object, whilst 'as' *can* return a pointer to a different object if that is the desired 'normal' casting behavior for the type. This is the case for Type* when using 'as' it may return a different object - the 'canonical type' for the Type*. For a concrete example take 'NamedExpressionType', its canonical type is the type the name relates to. If you use 'as' on it - it will produce a pointer to a different object, an object that will not be castable back into a NamedExpressionType.
Also keep in mind that 'as' behavior is based on the pointer type being cast from. For any pointer to a type derived from Type it will cast the canonical type. **BUT** if the pointer is pointing to a Type derived *object*, but the pointer type is *not* derived from Type (like say RefObject*), then 'as' will behave like dynamicCast.
All this being said, 'as' is the 'default' way to do a 'dynamic like' cast, with special behavior appropriate for the type applied when necessary.
By having the free function and method versions of 'as' and 'dynamicCast', you can choose if you want a 'raw' or 'smart' pointer type returned from the cast. If you just want to test if something is a certain type, then using the as/dynamicCast free functions is the faster way to do it. If you *know* that a raw pointer is ok, because the object will remain in scope, then again using the free function is better because it does less work. But as the examples following show, care is needed because if you get it wrong the object might go out of scope and leave the raw pointer pointing to a deleted object. When in doubt, the safe choice is typically to use the .as (or .dynamicCast if appropriate) methods.
The following example shows the different types of casting...
```C++
void someFunction(Decl* decl, Type* type)
{
RefPtr<Decl> declRefPtr(decl);
RefPtr<Type> typeRefPtr(type);
// Use of as
{
// Casting with as on a free function returns a raw pointer
GenericDecl* genericDeclRaw0 = as<GenericDecl>(decl);
// Free function again returns a raw pointer
GenericDecl* genericDeclRaw1 = as<GenericDecl>(declRefPtr);
// Using the as *method* returns a smart pointer holding the cast result
RefPtr<GenericDecl> genericDeclRefPtr0 = declRefPtr.as<GenericDecl>();
// Of course you can use auto with either
auto genericDeclRefPtr1 = declRefPtr.as<GenericDecl>();
auto genericDeclRaw2 = as<GenericDecl>(declRefPtr);
}
// Currently using as on anything not cast *from* Type is the same as dynamicCast.
// But on Type* sometimes you may want to control the cast
{
// With a NamedExpressionType sometimes you don't want 'as' behavior - say we want to see the information about the name, not the thing
// it relates to (the canonical type)
NamedExpressionType* namedExpressionRawPtr = dynamicCast<NamedExpressionType>(type);
// Returns the smart pointer
auto namedExpressionRefPtr = typeRefPtr.as<NamedExpressionType>();
}
}
```
It is important to be aware of what style of cast you use where. Take for example the following function ...
```C++
RefPtr<Expr> substitute(RefPtr<Expr> expr) const
{
return DeclRefBase::Substitute(expr);
}
```
If you want to do a cast on it, you need to be careful especially about scope, for example...
```C++
RefPtr<Expr> expr = ...;
{
// Whoops! This is a problem. When using the free function, the cast is to a *raw* pointer, so obj
// receives a raw pointer. When the RefPtr returned from Substitute goes out of scope (when the statement is left)
// the ref will be removed and if the ref count was 1 destroyed. Now obj points to a freed object and so a crash is
// likely to follow in the future!
auto obj = as<RefObject>(substitute(expr));
}
// So how do we avoid this? Well it depends what the function is returning and the scope. If it's returning a smart pointer,
// you could use the .as method
{
// This can only compile if it is a smart pointer (raw pointers don't have an as method)
auto obj = substitute(expr).as<RefObject>();
}
// Another option is to put the created thing in a smart pointer so you know it's in scope
{
RefPtr<Expr> sub = substitute(expr);
// Ok as long as sub is in scope
auto obj = as<RefObject>(sub);
}
// More awkwardly you could use free function, but assign to a smart pointer, thus maintaining scope
{
RefPtr<RefObject> obj = as<RefObject>(substitute(expr));
}
```
The following code shows that the behavior of 'as' is based on the source *pointer* type, **NOT** the *object* type:
```C++
// Derives from Type
NamedExpressionType* exprType = ...;
// Will be the Type* of the *canonical* type, because the pointer is Type derived and we are using as!
Type* type0 = as<Type>(exprType);
// It's going to be pointing to a different object, because type0 is the cast of the *canonical* type, since exprType derives from Type
SLANG_ASSERT(type0 != exprType);
// If I do a dynamicCast the result is either nullptr or a pointer that *must* point to the same object
Type* type1 = dynamicCast<Type>(exprType);
SLANG_ASSERT(type1 == exprType);
// Here, the pointer is pointing to a NamedExpressionType derived object. Which derives from Type. BUT our pointer here does *not* derive from type.
RefObject* refObj = exprType;
// 'as' just looks at the from type, and it doesn't derive from Type (it's just RefObject), so it does regular as, which is dynamicCast
Type* type2 = as<Type>(refObj);
SLANG_ASSERT(type2 == exprType);
// Finally...
// Is true even though exprType is a NamedExpressionType, because the cast is on the canonical type
SLANG_ASSERT(as<NamedExpressionType>(exprType) == nullptr);
// dynamicCast always returns the same object, so this must match
SLANG_ASSERT(dynamicCast<NamedExpressionType>(exprType) == exprType);
```

Slang Project Coding Conventions
================================
Principles
----------
This document attempts to establish conventions to be used in the Slang codebase.
We have two goals for this convention.
The first goal is to make the code look relatively consistent so that it is easy to navigate and understand for contributors.
Having varying styles across different modules, files, functions, or lines of code makes the overall design and intention of the codebase harder to follow.
The second goal is to minimize the scope complexity of diffs when multiple maintainers work together on the codebase.
In the absence of an enforced style, developers tend to "clean up" code they encounter to match their personal preferences, and in so doing create additional diffs that increase the chances of merge conflicts and pain down the line.
Because the Slang codebase has passed through many hands and evolved without a pre-existing convention, these two goals can come into conflict.
We encourage developers to err on the side of leaving well enough alone (favoring the second goal).
Don't rewrite or refactor code to match these conventions unless you were already going to have to touch all of those lines of code anyway.
Note that external code that is incorporated into the project is excluded from all of these conventions.
Languages
---------
### C++
Most code in the Slang project is implemented in C++.
We currently assume support for some C++11 idioms, but have explicitly avoided adding dependencies on later versions.
As a general rule, be skeptical of "modern C++" ideas unless they are clearly better than simpler alternatives.
We are not quite in the realm of "Orthodox C++", but some of the same guidelines apply:
* Don't use exceptions for non-fatal errors (and even then support a build flag to opt out of exceptions)
* Don't use the built-in C++ RTTI system (home-grown is okay)
* Don't use the C++ variants of C headers (e.g., `<cstdio>` instead of `<stdio.h>`)
* Don't use the STL containers
* Don't use iostreams
The compiler implementation does not follow some of these guidelines at present; that should not be taken as an excuse to further the proliferation of stuff like `dynamic_cast`.
Do as we say, not as we do.
Some relatively recent C++ features that are okay to use:
* Rvalue references for "move semantics," but only if you are implementing performance-critical containers or other code where this really matters.
* `auto` on local variables, if the expected type is clear in context
* Lambdas are allowed, but think carefully about whether just declaring a subroutine would also work.
* Using `>>` to close multiple levels of templates, instead of `> >` (but did you really need all those templates?)
* `nullptr`
* `enum class`
* Range-based `for` loops
* `override`
* Default member initializers in `class`/`struct` bodies
Templates are suitable in cases where they improve clarity and type safety.
As a general rule, it is best when templated code is kept minimal, and forwards to a non-templated function that does the real work, to avoid code bloat.
Any use of template metaprogramming would need to prove itself exceptionally useful to pay for the increase in cognitive complexity.
We don't want to be in the business of maintaining "clever" code.
As a general rule, `const` should be used sparingly and only with things that are logically "value types."
If you find yourself having to `const`-qualify a lot of member functions in a type that you expect to be used as a heap-allocated object, then something has probably gone wrong.
As a general rule, default to making the implementation of a type `public`, and only encapsulate state or operations with `private` when you find that there are complex semantics or invariants that can't be provided without a heavier hand.
### Slang
The Slang project codebase also includes `.slang` files implementing the Slang core module, as well as various test cases and examples.
The conventions described here are thus the "official" recommendations for how users should format Slang code.
To the extent possible, we will try to apply the same basic conventions to both C++ and Slang.
In places where we decide that the two languages merit different rules, we will point it out.
Files and Includes
------------------
### File Names
All files and directories that are added to codebase should have names that contain only ASCII lower-case letters, digits, dots (`.`) and dashes (`-`).
Operating systems still vary greatly in their handling of case sensitivity for file names, and non-ASCII code points are handled with even less consistency; sticking to a restricted subset of ASCII helps avoid some messy interactions between case-insensitive file systems and case-sensitive source-control systems like Git.
As with all these conventions, files from external projects are exempted from these restrictions.
### Naming of Source and Header Files
In general the C++ codebase should be organized around logical features/modules/subsystem, each of which has a single `.h` file and zero or more `.cpp` files to implement it.
If there is a single `.cpp` file, its name should match the header: e.g., `parser.h` and `parser.cpp`.
If there is more than one `.cpp` file, their names should start with the header name: e.g., `parser.h` and `parser-decls.cpp` and `parser-exprs.cpp`.
If there are declarations that need to be shared by the `.cpp` files, but shouldn't appear in the public interface, then they can go in a `*-impl.h` header (e.g., `parser-impl.h`).
Use best judgement when deciding what counts as a "feature." One class per file is almost always overkill, but the codebase currently leans too far in the other direction, with some oversized source files.
### Headers
Every header file should have an include guard.
Within the implementation we can use `#pragma once`, but exported API headers (`slang.h`) should use traditional `#ifdef` style guards (and they should be consumable as both C and C++).
A header should include or forward-declare everything it needs in order to compile.
It is *not* up to the programmer who `#include`s a header to sort out the dependencies.
Avoid umbrella or "catch-all" headers.
### Source Files
Every source file should start by including the header for its feature/module, before any other includes (this helps ensure that the header correctly includes its dependencies).
Functions that are only needed within that one source file can be marked `static`, but we should avoid using the same name for functions in different files (in order to support lumped/unified builds).
### Includes
In general, includes should be grouped as follows:
* First, the corresponding feature/module header, if we are in a source file
* Next, any `<>`-enclosed includes for system/OS headers
* Next, any `""`-enclosed includes for external/third-party code that is stored in the project repository
* Finally, any includes for other features in the project
Within each group, includes should be sorted alphabetically.
If this breaks because of ordering issues for system/OS/third-party headers (e.g., `<windows.h>` must be included before `<GL/GL.h>`), then ideally those includes should be mediated by a Slang-project-internal header that features can include.
Namespaces
----------
Favor fewer namespaces when possible.
Small programs may not need any.
All standard module code that a Slang user might link against should go in the `Slang` namespace for now, to avoid any possibility of clashes in a static linking scenario.
The public C API is obviously an exception to this.
Code Formatting
------------------------------
- For C++ files, please format using `clang-format`; `.clang-format` files in
the source tree define the style.
- For CMake files, please format using `gersemi`
- For shell scripts, please format using `shfmt`
- For YAML files, please use `prettier`
The formatting for the codebase is overall specified by the
[`extras/formatting.sh`](./extras/formatting.sh) script.
If you open a pull request and the formatting is incorrect, you can comment
`/format` and a bot will format your code for you.
Naming
------
### Casing
Types should in general use `UpperCamelCase`. This includes `struct`s, `class`es, `enum`s and `typedef`s.
Values should in general use `lowerCamelCase`. This includes functions, methods, local variables, global variables, parameters, fields, etc.
Macros should in general use `SCREAMING_SNAKE_CASE`.
It is important to prefix all macros (e.g., with `SLANG_`) to avoid collisions, since `namespace`s don't affect macros.
In names using camel case, acronyms and initialisms should appear entirely in either upper or lower case (e.g., `D3DThing d3dThing`) and not be capitalized as if they were ordinary words (e.g., `D3dThing d3dThing`).
Note that this also applies to uses of "ID" as an abbreviation for "identifier" (e.g., use `nodeID` instead of `nodeId`).
### Prefixes
Prefixes based on types (e.g., `p` for pointers) should never be used.
Global variables should have a `g` prefix, e.g. `gCounter`.
Non-`const` `static` class members can have an `s` prefix if that suits your fancy.
Of course, both of these should be avoided, so this shouldn't come up often.
Constant data (in the sense of `static const`) should have a `k` prefix.
In contexts where "information hiding" is relevant/important, such as when a type has both `public` and `private` members, or just has certain operations/fields that are considered "implementation details" that most clients should not be using, an `m_` prefix on member variables and a `_` prefix on member functions is allowed (but not required).
In function parameter lists, an `in`, `out`, or `io` prefix can be added to a parameter name to indicate whether a pointer/reference/buffer is intended to be used for input, output, or both input and output.
For example:
```c++
void copyData(void* outBuffer, void const* inBuffer, size_t size);
Result lookupThing(Key k, Thing& outThing);
void maybeAppendExtraNames(std::vector<Name>& ioNames);
```
Public C APIs will prefix all symbol names while following the casing convention (e.g. `SlangModule`, `slangLoadModule`, etc.).
### Enums
C-style `enum` should use the following convention:
```c++
enum Color
{
kColor_Red,
kColor_Green,
kColor_Blue,
kColorCount,
};
```
When using `enum class`, drop the `k` and type name as prefix, but retain the `UpperCamelCase` tag names:
```c++
enum class Color
{
Red,
Green,
Blue,
Count,
};
```
When defining a set of flags, separate the type definition from the `enum`:
```c++
typedef unsigned int Axes;
enum
{
kAxes_None = 0,
kAxis_X = 1 << 0,
kAxis_Y = 1 << 1,
kAxis_Z = 1 << 2,
kAxes_All = kAxis_X | kAxis_Y | kAxis_Z,
};
```
Note that the type name reflects the plural case, while the cases that represent individual bits are named with a singular prefix.
In public APIs, all `enum`s should use the style of separating the type definition from the `enum`, and all cases should use `SCREAMING_SNAKE_CASE`:
```c++
typedef unsigned int SlangAxes;
enum
{
SLANG_AXES_NONE = 0,
SLANG_AXIS_X = 1 << 0,
SLANG_AXIS_Y = 1 << 1,
SLANG_AXIS_Z = 1 << 2,
SLANG_AXES_ALL = SLANG_AXIS_X | SLANG_AXIS_Y | SLANG_AXIS_Z,
};
```
### General
Names should default to the English language and US spellings, to match the dominant conventions of contemporary open-source projects.
Function names should either be named with action verbs (`get`, `set`, `create`, `emit`, `parse`, etc.) or read as questions (`isEnabled`, `shouldEmit`, etc.).
Whenever possible, compiler concepts should be named using the most widely-understood term available: e.g., we use `Token` over `Lexeme`, and `Lexer` over `Scanner` simply because they appear to be the more common names.
Avoid abbreviations and initialisms unless they are already widely established across the codebase; a longer name may be cumbersome to write in the moment, but the code will probably be read many more times than it is written, so clarity should be preferred.
An important exception to this is common compiler concepts or techniques which may have laboriously long names: e.g., Static Single Assignment (SSA), Sparse Conditional Copy Propagation (SCCP), etc.
One gotcha particular to compiler front-ends is that almost every synonym for "type" has some kind of established technical meaning; most notably the term "kind" has a precise meaning that is relevant in our domain.
It is common practice in C and C++ to define tagged union types with a selector field called a "type" or "kind," which does not usually match this technical definition.
If a developer wants to avoid confusion, they are encouraged to use the term "flavor" instead of "type" or "kind" since this term (while a bit silly) is less commonly used in the literature.
Comments and Documentation
--------------------------
You probably know the drill: comments are good, but an out-of-date comment can be worse than no comment at all.
Try to write comments that explain the "why" of your code more than the "what."
When implementing a textbook algorithm or technique, it may help to imagine giving the reviewer of your code a brief tutorial on the topic.
In cases where comments would benefit from formatting, use Markdown syntax.
We do not currently have a setup for extracting documentation from comments, but if we add one we will ensure that it works with Markdown.
When writing comments, please be aware that your words could be read by many people, from a variety of cultures and backgrounds.
Default to a plain-spoken and professional tone and avoid using slang, idiom, profanity, etc.

Understanding Declaration References (Out of Date)
====================================
This document is intended as a reference for developers working on the Slang compiler implementation.
As you work on the code, you'll probably notice a lot of places where we use the `DeclRef<T>` type:
* Expressions like `VarExpr` and `MemberExpr` are subclasses of `DeclRefExpr`, which holds a `DeclRef<Decl>`.
* The most common subclass of `Type` is `DeclRefType`, which holds a `DeclRef<Decl>` for the type declaration.
* Named types (references to `typedef`s) hold a `DeclRef<TypedefDecl>`
* The name lookup process relies a lot on `DeclRef<ContainerDecl>`
So what in the world is a `DeclRef`?
The short answer is that a `DeclRef` packages up two things:
1. A pointer to a `Decl` in the parsed program AST
2. A set of "substitutions" to be applied to that decl
Why do we need `DeclRef`s?
--------------------------
In a compiler for a simple language, we might represent a reference to a declaration as simply a pointer to the AST node for the declaration, or some kind of handle/ID that references that AST node.
A representation like that will work in simple cases, for example:
```hlsl
struct Cell { int value; };
Cell a = { 3 };
int b = a.value + 4;
```
In this case, the expression node for `a.value` can directly reference the declaration of the field `Cell::value`, and from that we can conclude that the type of the field (and hence the expression) is `int`.
In contrast, things get more complicated as soon as we have a language with generics:
```hlsl
struct Cell<T> { T value; };
// ...
Cell<int> a = { 3 };
int b = a.value + 4;
```
In this case, if we try to have the expression `a.value` only reference `Cell::value`, then the best we can do is conclude that the field has type `T`.
In order to correctly type the `a.value` expression, we need enough additional context to know that it references `Cell<int>::value`, and from that to be able to conclude that a reference to `T` in that context is equivalent to `int`.
We can represent that information as a substitution which maps `T` to `int`:
```
[ Cell::T => int ]
```
Then we can encode a reference to `Cell<int>::value` as a reference to the single declaration `Cell::value` with such a substitution applied:
```
Cell::value [Cell::T => int]
```
If we then want to query the type of this field, we can first look up the type stored on the AST (which will be a reference to `Cell::T`) and apply the substitutions from our field reference to get:
```
Cell::T [Cell::T => int]
```
Of course, we can then simplify the reference by applying the substitutions, to get:
```
int
```
How is this implemented?
------------------------
At the highest level, a `DeclRef` consists of a pointer to a declaration (a `Decl*`) plus a singly linked list of `Substitution`s.
These substitutions fill in the missing information for any declarations on the ancestor chain for the declaration.
Each ancestor of a declaration can introduce an expected substitution along the chain:
* Most declarations don't introduce any substitutions: e.g., when referencing a non-generic `struct` we don't need any additional information.
* A surrounding generic declaration requires a `GenericSubstitution` which specifies the type argument to be plugged in for each type parameter of the declaration.
* A surrounding `interface` declaration usually requires a `ThisTypeSubstitution` that identifies the specific type on which an interface member has been looked up.
All of the expected substitutions should be in place in the general case, even when we might not have additional information. E.g., within a generic declaration like this:
```hlsl
struct Cell<T>
{
void a();
void b() { a(); }
}
```
The reference to `a` in the body of `b` will be represented as a declaration reference to `Cell::a` with a substitution that maps `[Cell::T => Cell::T]`. This might seem superfluous, but it makes it clear that we are "applying" the generic to arguments (even if they are in some sense placeholder arguments), and not trying to refer to an unspecialized generic.
There are a few places in the compiler where we might currently bend these rules, but experience has shown that failing to include appropriate substitutions is more often than not a source of bugs.
What in the world is a "this type" substitution?
------------------------------------------------
When using interface-constrained generics, we need a way to invoke methods of the interface on instances of a generic parameter type.
For example, consider this code:
```hlsl
interface IVehicle
{
associatedtype Driver;
Driver getDriver();
}
void ticketDriver<V : IVehicle>(V vehicle)
{
V.Driver driver = vehicle.getDriver();
sendTicketTo(driver);
}
```
In the expression `vehicle.getDriver`, we are referencing the declaration of `IVehicle::getDriver`, and so a naive reading tells us that the return type of the call is `IVehicle.Driver`, but that is an associated type and not a concrete type. It is clear in context that the expression `vehicle.getDriver()` should result in a `V.Driver`.
The way the compiler encodes that is that we treat the expression `v.getDriver` as first "up-casting" the value `v` (of type `V`) to the interface `IVehicle`. We know this is valid because of the generic constraint `V : IVehicle`. The result of the up-cast operation is an expression with a type that references `IVehicle`, but with a substitution to track the fact that the underlying implementation type is `V`. This amounts to something like:
```
IVehicle [IVehicle.This => V]
```
where `IVehicle.This` is a way to refer to "the concrete type that is implementing `IVehicle`".
Looking up the `getDriver` method on this up-cast expression yields a reference to:
```
IVehicle::getDriver [IVehicle.This => V]
```
And extracting the return type of that method gives us a reference to the type:
```
IVehicle::Driver [IVehicle.This => V]
```
which turns out to be exactly what the front end produces when it evaluates the type reference `V.Driver`.
As this example shows, a "this type" substitution allows us to refer to interface members while retaining knowledge of the specific type on which those members were looked up, so that we can compute correct references to things like associated types.
What does any of this mean for me?
----------------------------------
When working in the Slang compiler code, try to be aware of whether you should be working with a plain `Decl*` or a full `DeclRef`.
There are many queries like "what is the return type of this function?" that typically only make sense if you are applying them to a `DeclRef`.
The `syntax.h` file defines helpers for most of the existing declaration AST nodes for querying properties that should represent substitutions (the type of a variable, the return type of a function, etc.).
If you are writing code that is working with a `DeclRef`, try to use these accessors and avoid being tempted to extract the bare declaration and start querying it.
Some things like `Modifier`s aren't (currently) affected by substitutions, so it can make sense to query them on a bare declaration instead of a `DeclRef`.
Conclusion
----------
Working with `DeclRef`s can be a bit obtuse at first, but they are the most elegant solution we've found to the problems that arise when dealing with generics and interfaces in the compiler front-end. Hopefully this document gives you enough context to see why they are important, and hints at how their representation in the compiler helps us implement some cases that would be tricky otherwise.

Existential Types
=================
This document attempts to provide some background on "existential types" as they pertain to the design and implementation of Slang.
The features described here are *not* reflected in the current implementation, so this is mostly a sketch of where we can go with the language and compiler.
Background: Generics and Universal Quantification
-------------------------------------------------
Currently Slang supports using interfaces as generic constraints. Let's use a contrived example:
```hlsl
interface IImage { float4 getValue(float2 uv); }
float4 offsetImage<T : IImage>(T image, float2 uv)
{
float2 offset = ...;
    return image.getValue(uv + offset);
}
```
Generics like this are a form of "universal quantification" in the terminology of type theory.
This makes sense, because *for all* types `T` that satisfy the constraints, `offsetImage` provides an implementation of its functionality.
When we think of translating `offsetImage` to code, we might at first only think about how we can specialize it once we have a particular type `T` in mind.
However, we can also imagine trying to generate one body of code that can implement `offsetImage` for *any* type `T`, given some kind of runtime representation of types.
For example, we might generate C++ code like:
```c++
struct IImageWitnessTable { float4 (*getValue)(void* obj, float2 uv); };
float4 offsetImage(Type* T, IImageWitnessTable* W, void* image, float2 uv)
{
float2 offset = ...;
    return W->getValue(image, uv + offset);
}
```
This translation takes the generic parameters and turns them into ordinary runtime parameters: the type `T` becomes a pointer to a run-time type representation, while the constraint that `T : IImage` becomes a "witness table" of function pointers that, we assume, implements the `IImage` interface for `T`. So, the syntax of generics is *not* tied to static specialization, and can admit a purely runtime implementation as well.
Readers who are familiar with how languages like C++ are implemented might see the "witness table" above and realize that it is kind of like a virtual function table, just being passed alongside the object, rather than stored in its first word.
Using Interfaces Like Types
---------------------------
It is natural for a user to want to write code like the following:
```hlsl
float4 modulateImage(IImage image, float2 uv)
{
float4 factor = ...;
return factor * image.getValue(uv);
}
```
Unlike `offsetImage`, `modulateImage` is trying to use the `IImage` interface as a *type* and not just a constraint.
This code appears to be asking for a dynamic implementation rather than specialization (we'll get back to that...) and so we should be able to implement it similarly to our translation of `offsetImage` to C++.
Something like the following makes a lot of sense:
```c++
struct IImage { Type* T; IImageWitnessTable* W; void* obj; };
float4 modulateImage(IImage image, float2 uv)
{
float4 factor = ...;
    return factor * image.W->getValue(image.obj, uv);
}
```
Similar to the earlier example, there is a one-to-one mapping of the parameters of the Slang function the user wrote to the parameters of the generated C++ function.
To make this work, we had to bundle up the information that used to be separate parameters to the generic as a single value of type `IImage`.
Existential Types
-----------------
It turns out that when we use `IImage` as a type, it is what we'd call an *existential* type.
That is because if I give you a value `img` of type `IImage` in our C++ model, then you know that *there exists* some type `img.T`, a witness table `img.W` proving the type implements `IImage`, and a value `img.obj` of that type.
Existential types are the bread and butter of object-oriented programming.
If I give you an `ID3D11Texture2D*` you don't know what its concrete type is, and you just trust me that some concrete type *exists* and that it implements the interface.
A C++ class or COM component can implement an existential type, with the constraint that the interfaces a given type can support are limited by the way that virtual function tables are intrusively included inside the memory of the object, rather than externalized.
Many modern languages (e.g., Go) support adapting existing types to new interfaces, so that a "pointer" of interface type is actually a fat pointer: one for the object, and one for the interface dispatch table.
Our examples so far have assumed that the type `T` needs to be passed around separately from the witness table `W`, but that isn't strictly required in some implementations.
In type theory, the most important operation you can do with an existential type is to "open" it, which means to have a limited scope in which you can refer to the constituent pieces of a "bundled up" value of a type like `IImage`.
We could imagine "opening" an existential as something like:
```
void doSomethingCool<T : IImage>(T val);
void myFunc(IImage img)
{
open img as obj:T in
{
// In this scope we know that `T` is a type conforming to `IImage`,
// and `obj` is a value of type `T`.
//
doSomethingCool<T>(obj);
}
}
```
Self-Conformance
----------------
The above code with `doSomethingCool` and `myFunc` invites a much simpler solution:
```
void doSomethingCool<T : IImage>(T val);
void myFunc(IImage img)
{
doSomethingCool(img);
}
```
This seems like an appealing thing for a language to support, but there are some subtle reasons why this isn't possible to support in general.
If we think about what `doSomethingCool(img)` is asking for, it seems to be trying to invoke the function `doSomethingCool<IImage>`.
That function only accepts type parameters that implement the `IImage` interface, so we have to ask ourselves:
Does the (existential) type `IImage` implement the `IImage` interface?
Knowing the implementation strategy outline above, we can re-phrase this question to: can we construct a witness table that implements the `IImage` interface for values of type `IImage`?
For simple interfaces this is sometimes possible, but in the general case there are other desirable language features that get in the way:
* When an interface has associated types, there is no type that can be chosen as the associated type for the interface's existential type. The "obvious" approach of using the constraints on the associated type can lead to unsound logic when interface methods take associated types as parameters.
* When an interface uses the "this type" (e.g., an `IComparable` interface with a `compareTo(ThisType other)` method), it isn't correct to simplify the this type to the interface type (just because you have two `IComparable` values doesn't mean you can compare them - they have to be of the same concrete type!)
* If we allow for `static` methods on interfaces, then what implementation would we use for these methods on the interface's existential type?
Encoding Existentials in the IR
-------------------------------
Existentials are encoded in the Slang IR quite simply. We have an operation `makeExistential(T, obj, W)` that takes a type `T`, a value `obj` that must have type `T`, and a witness table `W` that shows how `T` conforms to some interface `I`. The result of the `makeExistential` operation is then a value of the type `I`.
Rather than include an IR operation to "open" an existential, we can instead just provide accessors for the pieces of information in an existential: one to extract the type field, one to extract the value, and one to extract the witness table. These would idiomatically be used like:
```
let e : ISomeInterface = /* some existential */
let T : Type = extractExistentialType(e);
let W : WitnessTable = extractExistentialWitnessTable(e);
let obj : T = extractExistentialValue(e);
```
Note how the operation to extract `obj` gets its result type from the previously-executed extraction of the type.
Simplifying Code Using Existentials
-----------------------------------
It might seem like IR code generated using existentials can only be implemented using dynamic dispatch.
However, within a local scope it is clear that we can simplify expressions whenever `makeExistential` and `extractExistential*` operations are paired.
For example:
```
let e : ISomeInterface = makeExistential(A, a, X);
...
let B = extractExistentialType(e);
let b : B = extractExistentialValue(e);
let Y = extractExistentialWitnessTable(e);
```
It should be clear in context that we can replace `B` with `A`, `b` with `a`, and `Y` with `X`, after which all of the `extract*` operations and the `makeExistential` operation are dead and can be eliminated.
This kind of simplification works within a single function, as long as there is no conditional logic involving existentials.
We require further transformation passes to allow specialization in more general cases:
* Copy propagation, redundancy elimination and other dataflow optimizations are needed to simplify use of existentials within functions
* Type legalization passes, including some amount of scalarization, are needed to "expose" existential-type fields that are otherwise buried in a type
* Function specialization is needed so that a function with existential parameters is specialized based on the actual types used at call sites
Transformations just like these are already required when working with resource types (textures/samplers) on targets that don't support first-class computation on resources, so it is possible to share some of the same logic.
Similarly, any effort we put into validation (to ensure that code is written in a way that *can* be simplified) can hopefully be shared between existentials and resources.
Compositions
------------
So far I've only talked about existential types based on a single interface, but if you look at the encoding as a tuple `(obj, T, W)` there is no real reason that can't be generalized to hold multiple witness tables: `(obj, T, W0, ... WN)`. Interface compositions could be expressed at the language level using the `&` operator on interface (or existential) types.
The IR encoding doesn't need to change much to support compositions: we just need to allow multiple witness tables on `makeExistential` and have an index operand on `extractExistentialWitnessTable` to get at the right one.
The hardest part of supporting composition of interfaces is actually in how to linearize the set of interfaces in a way that is stable, so that changing a function from using `IA & IB` to `IB & IA` doesn't change the order in which witness tables get packed into an existential value.
Why are we passing along the type?
----------------------------------
I'm glossing over something pretty significant here, which is why anybody would pass around the type as part of the existential value, when none of our examples so far have made use of it.
This sort of thing isn't very important for languages where interface polymorphism is limited to heap-allocated "reference" types (or values that have been "boxed" into reference types), because the dynamic type of an object can almost always be read out of the object itself.
When dealing with a value type, though, we have to deal with things like making *copies*:
```
interface IWritable { [mutating] void write(int val); }
struct Cell : IWritable { int data; void write(int val) { data = val; } }
T copyAndClobber<T : IWritable>(T obj)
{
T copy = obj;
obj.write(9999);
return copy;
}
void test()
{
Cell cell = { 0 };
Cell result = copyAndClobber(cell);
// what is in `result.data`?
}
```
If we call `copyAndClobber` on a `Cell` value, then does the line `obj.write` overwrite the data in the explicit `copy` that was made?
It seems clear that a user would expect `copy` to be unaffected in the case where `T` is a value type.
How does that get implemented in our runtime version of things? Let's imagine some C++ translation:
```
void copyAndClobber(Type* T, IWritableWitnessTable* W, void* obj, void* _returnVal)
{
void* copy = alloca(T->sizeInBytes);
T->copyConstruct(copy, obj);
W->write(obj, 9999);
T->moveConstruct(_returnVal, copy);
}
```
Because this function returns a value of type `T` and we don't know how big that is, let's assume the caller is passing in a pointer to the storage where we should write the result.
Now, in order to have a local `copy` of the `obj` value that was passed in, we need to allocate some scratch storage, and only the type `T` can know how many bytes we need.
Furthermore, when copying `obj` into that storage, or subsequently copying the `copy` variable into the function result, we need the copy/move semantics of type `T` to be provided by somebody.
This is the reason for passing through the type `T` as part of an existential value.
If we only wanted to deal with reference types, this would all be greatly simplified, because the `sizeInBytes` and the copy/move semantics would be fixed: everything is a single pointer.
All of the same issues arise if we're making copies of existential values:
```
IWritable copyAndClobberExistential(IWritable obj)
{
IWritable copy = obj;
obj.write(9999);
return copy;
}
```
If we want to stay consistent and say that `copy` is an actual copy of `obj` when the underlying type is a value rather than a reference type, then we need the copy/move operations for `IWritable` to handle invoking the copy/move operations of the underlying encapsulated type.
Aside: it should be clear from these examples that implementing generics and existential types with dynamic dispatch has a lot of complexity when we have to deal with value types (because copying requires memory allocation).
It is likely that a first implementation of dynamic dispatch support for Slang would restrict it to reference types (and would thus add a `class` keyword for defining reference types).

Deploying Experimental API Additions
====================================
This page intends to provide guidance to Slang developers when extending the Slang API, particularly when working on experimental features.
It applies to the "COM-lite" Slang API, rather than the deprecated C Slang API (sp* functions).
* Note: This guidance relates to Slang API changes, not to language changes. That is, what Slang does with shader source code across releases is not discussed here.
The goal is to maintain binary compatibility as much as possible between Slang releases, and to aid applications in dealing with changes to Slang.
Slang is distributed as a dynamic library, and there is an expectation from Slang API users that upgrading by installing an updated slang-compiler.dll or slang-compiler.so will not break their application unnecessarily.
ABI compatibility within the Slang API can be preserved between releases if some rules are followed by developers.
Slang API uses a "COM-lite" structure wherein functionality is exposed through interfaces on objects. If the interfaces never change, ABI compatibility is preserved, but changes happen. When adding or changing interfaces, please observe the following:
1. It is preferred to create *new* COM interfaces when adding new functionality.
* This maintains ABI compatibility.
* Applications must acquire access to the new functionality using QueryInterface(), which will gracefully fail if the slang-compiler.dll/libslang-compiler.so does not implement the functionality.
2. Changes to existing virtual methods in COM interfaces should be avoided, as that is an ABI breakage.
* If a change is required though, change the interface's UUID.
3. New virtual methods _may_ be added (only) to the end of existing COM interface structs.
* This does not disturb the ABI compatibility of the associated vtable. Old apps can remain unaware of the new function pointers appended to the end of the vtable.
* A UUID change is not necessary.
* Note that in the event that a Slang application which uses the added feature is run with an old slang-compiler.dll/libslang-compiler.so, the experience for the user is not as clean as if the added method belongs to a new interface.
Adding Experimental Interfaces
==============================
When the above recommendations cannot be followed, as with features that are expected to be iterated on or are regarded as temporary, there are additional recommendations.
Interfaces that are expected to change must be marked `_Experimental` in their class name and in their UUID name.
For example,
```csharp
/* Experimental interface for doing something cool. This interface is susceptible to ABI breakage. */
struct ICoolNewFeature_Experimental : public ISlangUnknown
{
SLANG_COM_INTERFACE(0x8e12e8e3, 0x5fcd, 0x433e, { 0xaf, 0xcb, 0x13, 0xa0, 0x88, 0xbc, 0x5e, 0xe5 })
virtual SLANG_NO_THROW SlangResult SLANG_MCALL coolMethod() = 0;
};
#define SLANG_UUID_ICoolNewFeature_Experimental ICoolNewFeature_Experimental::getTypeGuid()
```
Note: Use `uuidgen` to generate IIDs for new interfaces.
Removing Experimental Interfaces
================================
By the nature of being marked "Experimental", users have been warned that the interfaces are not officially supported and may be removed. You may simply delete the class and UUID, e.g. "ICoolNewFeature_Experimental" struct may be deleted from slang.h along with the definition of SLANG_UUID_ICoolNewFeature_Experimental.
This will show up in applications as QueryInterface failures.
It is nice, but not required, to retain the interface declarations for some time after removing internal support before deleting them from slang.h, so that applications have time to remove their dependence on the unsupported feature while still being able to compile in the interim.
Changing Experimental Interfaces
================================
Backwards incompatible changes to Slang COM interfaces should be accompanied with a UUID change.
In the event that an old application runs with a new Slang library, applications are more capable of gracefully handling an unavailable interface than a changed one. The former may still be functional, or include a helpful error message, whereas the latter is most likely a crash of some sort.
Promoting Experimental Interfaces
=================================
The class name and the UUID name should be changed in slang.h and in the slang source code, e.g. Rename "ICoolNewFeature_Experimental" to just "ICoolFeature".
The SLANG_UUID for the interface should be renamed to omit "EXPERIMENTAL" but its value should remain the same. This is because, if there are no backwards incompatible changes that accompany the promotion from experimental to permanent, applications written against the experimental version can continue working against Slang libraries where the interface was promoted to permanent.

Interfaces Design
=================
This document intends to lay out the proposed design for a few inter-related features in Slang:
- Interfaces
- Associated Types
- Generics
Introduction
------------
The basic problem here is not unique to shader programming: you want to write code that accomplishes one task, while abstracting over how to accomplish another task.
As an example, we might want to write code to integrate incident radiance over a list of lights, while not concerning ourselves with how to evaluate a reflectance function at each of those lights.
If we were doing this task on a CPU, and performance wasn't critical, we could probably handle this with higher-order functions or an equivalent mechanism like function pointers:
```c++
float4 integrateLighting(
    Light[] lights,
    float4 (*brdf)(float3 wi, float3 wo, void* userData),
    void const* brdfUserData)
{
    float4 result = 0;
    for(/* ... */) {
        // ...
        result += brdf(wi, wo, brdfUserData);
    }
    return result;
}
```
Depending on the scenario, we might be able to generate statically specialized code by using templates instead:
```c++
template<typename BRDF>
float4 integrateLighting(Light[] lights, BRDF const& brdf)
{
    // ...
    result += brdf(wi, wo);
    // ...
}
```
Current shading languages support neither higher-order functions nor templates/generics, so neither of these options is viable.
Instead practitioners typically use preprocessor techniques to either stitch together the final code, or to substitute in different function/type definitions to make a definition like `integrateLighting` reusable.
These ad hoc approaches actually work well in practice; we aren't proposing to replace them *just* to make code abstractly "cleaner."
Rather, we've found that the ad hoc approaches end up interacting poorly with the resource binding model in modern APIs, so that *something* less ad hoc is required to achieve our performance goals.
At that point, we might as well ensure that the mechanism we introduce is also a good fit for the problem.
Overview
--------
The basic idea for our approach is as follows:
- Start with the general *semantics* of a generic-based ("template") approach
- Use the accumulated experience of the programming language community to ensure that our generics are humane (in other words: not like C++)
- Explore the possibility of syntax sugar to let people use more traditional OOP-style syntax when it can reduce verbosity without harming understanding
In general, our conceptual model is being ripped off wholesale from Rust and Swift.
The basic design principle is "when in doubt, do what Swift does."
Interfaces
----------
An **interface** in Slang is akin to a `protocol` in Swift or a `trait` in Rust.
The choice of the `interface` keyword is to highlight the overlap with the conceptually similar construct that appeared in Cg, and then later in HLSL.
### Declaring an interface
An interface is a named collection of **requirements**; any type that **implements** the interface must provide definitions that satisfy those requirements.
Here is a simple interface, with one requirement:
```hlsl
interface Light
{
    float3 illuminate(float3 P_world);
}
```
The `Light` interface requires a (member) function called `illuminate` with the given signature.
### Declaring that a type implements an interface
A user-defined `struct` type can declare that it implements an interface, by using conventional "inheritance" syntax:
```hlsl
struct PointLight : Light
{
    float3 P_light;

    float3 illuminate(float3 P_world)
    {
        float distance = length(P_light - P_world);
        // ...
    }
}
```
It is a static error if a type declares that it implements an interface, but it does not provide all of the requirements:
```hlsl
struct BadLight : Light
{
    // ERROR: type 'BadLight' cannot implement 'Light'
    // because it does not provide the required 'illuminate' function
}
```
### Interface Inheritance
While this document does not propose general notions of inheritance be added to Slang, it does make sense to allow an interface to inherit from zero or more other interfaces:
```hlsl
interface InfinitessimalLight : Light
{
    float3 getDirection(float3 P_world);
}
```
In this case the `InfinitessimalLight` interface inherits from `Light`, and declares one new requirement.
In order to check that a type implements `InfinitessimalLight`, the compiler will need to check both that it implements `Light` and that it provides the new "direct" requirements in `InfinitessimalLight`.
Declaring that a type implements an interface also implicitly declares that it implements all the interfaces that interface transitively inherits from:
```hlsl
struct DirectionalLight : InfinitessimalLight
{
    float3 L;
    float3 dir;

    float3 getDirection(float3 P_world) { return dir; }

    float3 illuminate(float3 P_world)
    {
        // Okay, this is the point where I recognize
        // that this function definition is not
        // actually reasonable for a light...
    }
}
```
### Interfaces and Extensions
It probably needs its own design document, but Slang currently has very basic support for `extension` declarations that can add members to an existing type.
These blocks correspond to `extension` blocks in Swift, or `impl` blocks in Rust.
This can be used to declare that a type implements an interface retroactively:
```hlsl
extension PointLight : InfinitessimalLight
{
    float3 getDirection(float3 P_world)
    {
        return normalize(P_light - P_world);
    }
}
```
In this case we've used an extension to declare that `PointLight` also implements `InfinitessimalLight`. For the extension to type-check we need to provide the new required function (the compiler must recognize that the implementation of `Light` was already provided by the original type definition).
There are some subtleties around using extensions to add interface implementations:
- If the type already provides a method that matches a requirement, can the extension "see" it to satisfy the new requirement?
- When can one extension "see" members (or interface implementations) added by another?
A first implementation can probably ignore the issue of interface implementations added by extensions, and only support them directly on type definitions.
Generics
--------
All of the above discussion around interfaces neglected to show how to actually *use* the fact that, e.g., `PointLight` implements the `Light` interface.
That is intentional, because at the most basic level, interfaces are designed to be used in the context of **generics**.
### Generic Declarations
The Slang compiler currently has some ad hoc support for generic declarations that it uses to implement the HLSL standard module (which has a few generic types).
The syntax for those is currently very bad, and it makes sense to converge on the style for generic declarations used by C# and Swift:
```hlsl
float myGenericFunc<T>(T someValue);
```
Types can also be generic:
```hlsl
struct MyStruct<T> { float a; T b; }
```
Ideally we should also allow interfaces and interface requirements to be generic, but there will probably be some limits due to implementation complexity.
### Type Constraints
Unlike C++, Slang needs to be able to type-check the body of a generic function ahead of time, so it can't rely on `T` having particular members:
// This generic is okay, because it doesn't assume anything about `T`
// (other than the fact that it can be passed as input/output)
T okayGeneric<T>(T a) { return a; }
// This generic is not okay, because it assumes that `T` supports
// certain operators, and we have no way of knowing if this is true:
T notOkayGeneric<T>(T a) { return a + a; }
In order to rely on non-trivial operations in a generic parameter type like `T`, the user must **constrain** the type parameter using an interface:
float3 mySurfaceShader<L : Light>(L aLight)
{
return aLight.illuminate(...);
}
In this example, we have constrained the type parameter `L` so that it must implement the interface `Light`.
As a result, in the body of the function, the compiler can recognize that `aLight`, which is of type `L`, must implement `Light` and thus have a member `illuminate`.
When calling a function with a constrained type parameter, the compiler must check that the actual type argument (whether provided explicitly or inferred) implements the interface given in the constraint:
mySurfaceShader<PointLight>(myPointLight); // OK
mySurfaceShader(myPointLight); // equivalent to previous
mySurfaceShader(3.0f); // ERROR: `float` does not implement `Light`
Note that in the erroneous case, the error is reported at the call site, rather than in the body of the callee (as it would be for C++ templates).
For cases where we must constrain a type parameter to implement multiple interfaces, we can join the interface types with `&`:
interface Foo { void foo(); }
interface Bar { void bar(); }
void myFunc<T : Foo & Bar>(T val)
{
val.foo();
val.bar();
}
If we end up with very complicated type constraints, then it makes sense to support a "`where` clause" that allows requirements to be stated outside of the generic parameter list:
void myFunc<T>(T val)
where T : Foo,
T : Bar
{}
Both the use of `&` and `where` clauses are advanced features that we might cut due to implementation complexity.
### Value Parameters
Because HLSL has generics like `vector<float,3>` that already take non-type parameters, the language will need *some* degree of support for generic parameters that aren't types (at least integers need to be supported).
We need syntax for this that doesn't bloat the common case.
In this case, I think that what I've used in the current Slang implementation is reasonable, where a value parameter needs a `let` prefix:
void someFunc<
T, // type parameter
T : X, // type parameter with constraint
T = Y, // type parameter with default
T : X = Y, // type parameter with constraint and default
let N : int, // value parameter (type must be explicit)
let N : int = 3> // value parameter with default
()
{ ... }
We should also extend the `where` clauses to support inequality constraints on (integer) value parameters to enforce rules about what ranges of integers are valid.
The front-end should issue error messages if it can statically determine these constraints are violated, but it should probably defer full checking until the IR (maybe... we need to think about how much of a dependent type system we are willing to have).
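As a minimal sketch of how a value parameter might be used in practice (the type and function names here are hypothetical, not part of the existing standard module):

```
// Hypothetical: a small wrapper type parameterized on a compile-time count
struct FixedArray<T, let N : int>
{
    T elements[N];
}

// The value parameter participates in the signature like any other generic argument
FixedArray<float, 4> makeZeroes()
{
    FixedArray<float, 4> result;
    for(int i = 0; i < 4; i++)
        result.elements[i] = 0.0f;
    return result;
}
```

A `where` clause with an inequality constraint could then state, e.g., that `N` must be positive, which the front-end could check at specialization time.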
Associated Types
----------------
While the syntax is a bit different, the above mechanisms have approximately the same capabilities as Cg interfaces.
What the above approach can't handle (and neither can Cg) is a reusable definition of a surface material "pattern" that might blend multiple material layers to derive parameters for a specific BRDF.
That is, suppose we have two BRDFs: one with two parameters, and one with six.
Different surface patterns may want to target different BRDFs.
So if we write a `Material` interface like:
interface Material
{
BRDFParams evaluatePattern(float2 uv);
}
Then what should `BRDFParams` be? The two-parameter or six-parameter case?
An **associated type** is a concept that solves exactly this problem.
We don't care *what* the concrete type of `BRDFParams` is, so long as *every* implementation of `Material` has one.
The exact `BRDFParams` type can be different for each implementation of `Material`; the type is *associated* with a particular implementation.
We will crib our syntax for this entirely from Swift, where it is verbose but explicit:
interface Material
{
associatedtype BRDFParams;
BRDFParams evaluatePattern(float2 uv);
float3 evaluateBRDF(BRDFParams param, float3 wi, float3 wo);
}
In this example we've added an associated type requirement so that every implementation of `Material` must supply a type named `BRDFParams` as a member.
We've also added a requirement that is a function to evaluate the BRDF given its parameters and incoming/outgoing directions.
Using this declaration one can now define a generic function that works on any material:
float3 evaluateSurface<M : Material, L : Light>(
M material,
L[] lights,
float3 P_world,
float2 uv)
{
M.BRDFParams brdfParams = material.evaluatePattern(uv);
for(...)
{
L light = lights[i];
// ...
float3 reflectance = material.evaluateBRDF(brdfParams, ...);
}
}
Some quick notes:
- The use of `associatedtype` (for associated types) and `typealias` (for `typedef`-like definitions) as distinct keywords in Swift was well motivated by their experience (they used to use `typealias` for both). I would avoid having the two cases be syntactically identical.
- Swift has a pretty involved inference system where a type doesn't actually need to explicitly provide a type member with the chosen name. Instead, if you have a required method that takes or returns the associated type, then the compiler can infer what the type is by looking at the signature of the methods that meet other requirements. This is a complex and magical feature, and we shouldn't try to duplicate it.
- Both Rust and Swift call this an "associated type." They are related to "virtual types" in things like Scala (which are in turn related to virtual classes in beta/gbeta). There are similar ideas that arise in Haskell-like languages with type classes (IIRC, the term "functional dependencies" is relevant).
### Alternatives
I want to point out a few alternatives to the `Material` design above, just to show that associated types seem to be an elegant solution compared to the alternatives.
First, note that we could break `Material` into two interfaces, so long as we are allowed to place type constraints on associated types:
interface BRDF
{
float3 evaluate(float3 wi, float3 wo);
}
interface Material
{
associatedtype B : BRDF;
B evaluatePattern(float2 uv);
}
This refactoring might be cleaner if we imagine that a shader library would have a family of reflectance functions (implementing `BRDF`) and then a large library of material patterns (implementing `Material`) - we wouldn't want each and every material to have to implement a dummy `evaluateBRDF` that just forwards to a BRDF instance nested in it.
Looking at that type `B` there, we might start to wonder if we could just replace this with a generic type parameter on the interface:
interface Material< B : BRDF >
{
B evaluatePattern(float2 uv);
}
This would change any type that implements `Material`:
// old:
struct MyMaterial : Material
{
typealias B = GGX;
GGX evaluatePattern(...) { ... }
}
// new:
struct MyMaterial : Material<GGX>
{
GGX evaluatePattern(...) { ... }
}
That doesn't seem so bad, but it ignores the complexity that arises at any use sites, e.g.:
float3 evaluateSurface<B : BRDF, M : Material<B>, L : Light>(
M material,
L[] lights,
float3 P_world,
float2 uv)
{ ... }
The type `B` which is logically an implementation detail of `M` now surfaces to the generic parameter list of any function that wants to traffic in materials.
This reduces the signal/noise ratio for anybody reading the code, and also means that any top-level code that is supposed to be specializing this function (suppose this was a fragment entry point) now needs to understand how to pick apart the `Material` it has on the host side to get the right type parameters.
This kind of issue has existed in the PL community at least as far back as the ML module system (it is hard to search for by name, but the concepts of "parameterization" vs. "fibration" are relevant here), and the Scala researchers made a clear argument (I think it was in the paper on "un-types") that there is a categorical distinction between the types that are logically the *inputs* to an abstraction, and the types that are logically the *outputs*. Generic type parameters and associated types handle these two distinct roles.
Returning an Interface
----------------------
The revised `Material` definition:
interface BRDF
{
float3 evaluate(float3 wi, float3 wo);
}
interface Material
{
associatedtype B : BRDF;
B evaluatePattern(float2 uv);
}
has a function `evaluatePattern` that returns a type that implements an interface.
In the case where the return type is concrete, this isn't a problem (and the nature of associated types means that `B` will be concrete in any actual concrete implementation of `Material`).
There is an open question of whether it is ever necessary (or even helpful) to have a function that returns a value of *some* type known to implement an interface, without having to state that type in the function signature.
This is a point that has [come up](https://github.com/rust-lang/rfcs/blob/master/text/1951-expand-impl-trait.md) in the Rust world, where they have discussed using a keyword like `some` to indicate the existential nature of the result type:
// A function that returns *some* implementation of `Light`
func foo<T>() -> some Light;
The Rust proposal linked above has them trying to work toward `impl` as the keyword, and allowing it in both argument and result positions (to cover both universal and existential quantification).
In general, such a feature would need to have many constraints:
- The concrete return type must be fixed (even if clients of the function should be insulated from the choice), given the actual generic arguments provided.
- If the existential is really going to be sealed, then the caller shouldn't be allowed to assume anything *except* that two calls to the same function with identical generic arguments should yield results of identical type.
Under those constraints, it is pretty easy to see that an existential-returning method like:
interface Foo<T>
{
func foo<U>() -> some Bar;
}
can in principle be desugared into:
interface Foo<T>
{
associatedtype B<U> : Bar;
func foo<U>() -> B<U>;
}
with no particular loss in what can be expressed.
The same desugaring approach should apply to global-scope functions that want to return an existential type (just with a global `typealias` instead of an `associatedtype`).
It might be inconvenient for the user to have to explicitly write the type-level expression that yields the result type (consider cases where C++ template metaprogrammers would use `auto` as a result type), but there is really no added power.
Object-Oriented Sugar
---------------------
Having to explicitly write out generic parameter lists is tedious, especially in the (common) case where we will have exactly one parameter corresponding to each generic type parameter:
// Why am I repeating myself?!
//
void foo<L : Light, M : Material, C : Camera>(
L light, M material, C camera);
The intent seems to be clear if we instead write:
void foo(Light light, Material material, Camera camera);
We could consider the latter to be sugar for the former, and allow users to write in familiar syntax akin to what was already supported in Cg.
We'd have to be careful with such sugar, though, because there is a real and meaningful difference between saying:
- "`material` has type `Material` which is an interface type"
- "`material` has type `M` where `M` implements `Material`"
In particular, if we start to work with associated types:
let b = material.evaluatePattern(...);
It makes sense to say that `b` has type `M.BRDF`.
It does **not** make sense to say that `b` has type `Material.BRDF`, because there is no such concrete type.
(A third option is to say that `b` has type `material.BRDF`, which is basically the point where you have "virtual types," because we are now saying the type is a member of the *instance* and not of an enclosing *type*.)
Note that the issue of having or not having object-oriented sugar is technically orthogonal from whether we allow "existential return types."
However, allowing the user to think of interfaces in traditional OOP terms leads to it being more likely that they will try to declare:
- functions that return an interface type
- local variables of interface type (which they might even assign to!)
- fields of interface type in their `struct`s
All of these complicate the desugaring step, because we would de facto have types/functions that mix up two stages of evaluation: a compile-time type-level step and a run-time value-level step.
Ultimately, we'd probably need to express these by having a multi-stage IR (with two stages) which we optimize in the staged setting before stage-splitting to get separate type-level and value-level operations (akin to the desugaring for existential return types I described above).
My sense is that a certain amount of multi-stage programming may already be needed to deal with certain HLSL/GLSL idioms. In particular:
- GLSL supports passing unsized arrays (e.g., `int[] a`) to a function, and then having the function use the size of the array (`a.length`) to do loops, etc. These would need to be lowered to distinct SPIR-V code for every array size used (if I understand the restrictions correctly), and so the feature is perhaps best thought of as passing both a compile-time integer parameter and a run-time array parameter (where the size comes from that parameter)
- HLSL and GLSL both have built-in functions where certain parameters are required to be compile-time constants. A feature-complete front-end must detect when calls to these functions are valid, and report errors to the user. In order to make the errors easier to explain to the user, it would be helpful to have an explicit notion of constant-rate computation, and require that the user express explicit constant-rate parameters/expressions.
All of this ties into the question of whether we need/want to support more general kinds of compile-time evaluation for specialization (e.g., statically-determined `if` statements or loops).
Other Languages
---------------
It is worth double-checking whether implementing all of this from scratch in Slang is a good idea, or if there is somewhere else we can achieve similar results more quickly:
- The Metal shading language has much of what we'd want. It is based on C++ templates, which are maybe not the ideal mechanism, and the compiler is closed-source so we can't easily add functionality. Still, it should be possible to prototype a lot of what we want on top of Metal 2.
- The open-source HLSL compiler doesn't support any of the new ideas here, but it may be that adding them to `dxc` would be faster than adding them to the Slang project code. Using `dxc` is a no-go for some of the other Slang requirements (that come from our users on the Falcor project).
- Swift already supports almost everything on our list of requirements, but as it stands today there is no easy path to using it for low-level GPU code generation. It also fails to meet our goals for incremental adoption, high-level source output, etc.
In the long run, however, the Swift compiler seems like an attractive intercept for this work, because their long-term roadmap seems like it will close a lot of the gap with what we've done so far.
Conclusion
----------
This document has described the basic syntax and semantics for three related features -- interfaces, generics, and associated types -- along with some commentary on longer-term directions.
My expectation is that we will use the syntax as laid down here, unless we have a very good reason to depart from it, and we will prioritize implementation work as needed to get interesting shader library functionality up and running.


# Slang IR Instruction Management and Versioning
This document explains how Slang's intermediate representation (IR) instructions are defined, generated, and versioned. It covers the workflow for adding or modifying instructions and the mechanisms that ensure backwards compatibility for serialized IR modules.
## High-Level Concepts
The Slang IR uses a code generation approach where instruction definitions are centralized in a Lua file (`slang-ir-insts.lua`), and various C++ headers and source files are generated from this single source of truth. This ensures consistency across the codebase and enables sophisticated features like backwards compatibility through stable instruction naming.
### Key Components
- **Instruction Definitions** (`slang-ir-insts.lua`): The canonical source for all IR instruction definitions
- **Stable Names** (`slang-ir-insts-stable-names.lua`): Maps instruction names to permanent integer IDs for backwards compatibility
- **Code Generation** (via Fiddle): Generates C++ enums, structs, and tables from the Lua definitions
- **Module Versioning**: Tracks compatibility ranges for serialized IR modules
## The Instruction Definition System
### Source of Truth: `slang-ir-insts.lua`
All IR instructions are defined in `source/slang/slang-ir-insts.lua`. This file contains a hierarchical table structure that defines:
- Instruction names and their organization into categories
- Struct names for the C++ representation (if different from the default)
- Flags like `hoistable`, `parent`, `global`, etc.
- (Optionally) Minimum operand counts
- (Optionally) The operands themselves
- Parent-child relationships in the instruction hierarchy
Here's a simplified example of how instructions are defined:
```lua
local insts = {
{ nop = {} },
{
Type = {
{
BasicType = {
hoistable = true,
{ Void = { struct_name = "VoidType" } },
{ Bool = { struct_name = "BoolType" } },
{ Int = { struct_name = "IntType" } },
-- ... more basic types
},
},
-- ... more type categories
},
},
-- ... more instruction categories
}
```
The hierarchy is important: instructions inherit properties from their parent categories. For example, all `BasicType` instructions inherit the `hoistable = true` flag.
### Code Generation Flow
The Fiddle tool processes `slang-ir-insts.lua` and generates several outputs:
1. **Enum Definitions** (`slang-ir-insts-enum.h`):
- `IROp` enum with values like `kIROp_Void`, `kIROp_Bool`, etc.
- Range markers like `kIROp_FirstBasicType` and `kIROp_LastBasicType`
2. **Struct Definitions** (`slang-ir-insts.h`):
- C++ struct definitions for instruction types not manually defined
- `leafInst()` and `baseInst()` macros for RTTI support
- If the operands of an instruction are specified in `slang-ir-insts.lua` in the format `{ { "operand1_name", "operand1_type" }, { "operand2_name" } }` and so on, Fiddle will generate a getter for each operand as part of the instruction's struct. Note that the order in which the operands are listed matters, and that specifying the type of an operand is optional, defaulting to `IRInst` when the type is not specified.
3. **Instruction Info Table** (`slang-ir-insts-info.cpp`):
- Maps opcodes to their string names, operand counts, and flags
- Used for debugging, printing, and validation
4. **Stable Name Mappings** (`slang-ir-insts-stable-names.cpp`):
- Bidirectional mapping between opcodes and stable IDs
- Critical for backwards compatibility
## Adding or Modifying Instructions
### Adding a New Instruction
To add a new IR instruction:
1. **Edit `slang-ir-insts.lua`**: Add your instruction in the appropriate category:
```lua
{ MyNewInst = { min_operands = 2 } },
```
2. **Run the build**: The build system will automatically regenerate the C++ files.
3. **Update the stable names**: Either
- Run the validation script:
**Note**: Skip make command if lua is already built.
```bash
make -C external/lua MYCFLAGS="-DLUA_USE_POSIX" MYLIBS=""
./external/lua/lua extras/check-ir-stable-names.lua update
```
- Or add a new ID to the mapping in `source/slang/slang-ir-insts-stable-names.lua` by hand; the mapping is checked for consistency in CI, so it is safe to add manually.
This assigns a permanent ID to your new instruction.
4. **Implement the instruction logic**: Add handling in relevant files like:
- `slang-ir-insts.h` (if you need a custom struct definition)
- `slang-emit-*.cpp` files for code generation
- `slang-ir-lower-*.cpp` files for transformations
5. **Update the module version**: In `slang-ir.h`, increment `k_maxSupportedModuleVersion`:
```cpp
const static UInt k_maxSupportedModuleVersion = 1; // was 0
```
### Modifying an Existing Instruction
Modifications require more care:
- **Adding operands or changing semantics**: This is a breaking change. You must:
1. Increment both `k_minSupportedModuleVersion` and `k_maxSupportedModuleVersion`
2. Document the change in the version history
- **Renaming**: Don't rename instructions directly. Instead:
1. Add the new instruction
2. Mark the old one as deprecated
3. Eventually remove it in a major version bump
## The Stable Name System
### Purpose
When Slang serializes IR modules, it needs to handle the case where the compiler version that reads a module is different from the one that wrote it. Instructions might have been added, removed, or reordered in the `IROp` enum.
The stable name system solves this by assigning permanent integer IDs to each instruction. These IDs never change once assigned.
### How It Works
1. **Assignment**: When a new instruction is added, the `check-ir-stable-names.lua` script assigns it the next available ID.
2. **Serialization**: When writing a module, opcodes are converted to stable IDs:
```cpp
auto stableName = getOpcodeStableName(value);
```
3. **Deserialization**: When reading, stable IDs are converted back:
```cpp
value = getStableNameOpcode(stableName);
```
4. **Validation**: The CI system ensures the stable name table stays synchronized with the instruction definitions.
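The round trip above can be sketched as follows. The enum values and table contents here are invented for illustration (the real tables are generated from `slang-ir-insts-stable-names.lua`); only the two function names match the snippets shown above:

```cpp
#include <cassert>
#include <map>

// Hypothetical opcodes: their numeric values can change between compiler
// versions as instructions are added, removed, or reordered in the enum.
enum Op
{
    OpNop = 0,
    OpAdd = 1,
    OpMul = 2,
};

// Stable IDs are assigned once and never change, so they are safe to serialize.
static const std::map<Op, int> kOpToStable = {{OpNop, 0}, {OpAdd, 1}, {OpMul, 2}};
static const std::map<int, Op> kStableToOp = {{0, OpNop}, {1, OpAdd}, {2, OpMul}};

// Writing a module: store the stable ID, not the raw enum value.
int getOpcodeStableName(Op op)
{
    return kOpToStable.at(op);
}

// Reading a module: map the stable ID back to whatever the enum value is now.
Op getStableNameOpcode(int stableName)
{
    return kStableToOp.at(stableName);
}
```

Because only stable IDs hit the disk, a later compiler can renumber its `IROp` enum freely without invalidating previously serialized modules.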
### Maintenance
The stable name table is validated in CI:
```bash
./extras/check-ir-stable-names-gh-actions.sh
```
This script:
- Verifies all instructions have stable names
- Checks for duplicate IDs
- Ensures the mapping is bijective
- Can automatically fix missing entries
## Module Versioning
### Version Types
Slang tracks two version numbers:
1. **Module Version** (`IRModule::m_version`): The semantic version of the IR instruction set
- Range: `k_minSupportedModuleVersion` to `k_maxSupportedModuleVersion`
- Stored in each serialized module
2. **Serialization Version** (`IRModuleInfo::serializationVersion`): The format version
- Allows changes to how data is encoded
### When to Update Versions
**Minor Version Bump** (increment `k_maxSupportedModuleVersion` only):
- Adding new instructions
- Adding new instruction flags that don't affect existing code
- Adding new optional operands
**Major Version Bump** (increment both min and max):
- Removing instructions
- Changing instruction semantics
- Modifying minimum operand counts or types
- Any change that breaks compatibility
### Version Checking
During deserialization:
```cpp
if (fossilizedModuleInfo->serializationVersion != IRModuleInfo::kSupportedSerializationVersion)
return SLANG_FAIL;
// Later, after loading instructions:
if (hasUnrecognizedInsts)
return SLANG_FAIL;
```
## Serialization Details
### The Flat Representation
For efficiency, IR modules are serialized as a "flat" representation:
```cpp
struct FlatInstTable
{
List<InstAllocInfo> instAllocInfo; // Op + operand count
List<Int64> childCounts; // Children per instruction
List<SourceLoc> sourceLocs; // Source locations
List<Int64> operandIndices; // Flattened operand references
List<Int64> stringLengths; // For string/blob constants
List<uint8_t> stringChars; // Concatenated string data
List<UInt64> literals; // Integer/float constant values
};
```
This representation:
- Minimizes pointer chasing during deserialization
- Groups similar data together for better cache performance
- Enables efficient bulk operations
### Traversal Order
Instructions are serialized in a specific order for performance:
```cpp
traverseInstsInSerializationOrder(moduleInst, [&](IRInst* inst) {
// Process instruction
});
```
The traversal:
1. Visits instructions in preorder (parent before children)
2. Optionally reorders module-level instructions to group constants together
3. Maintains deterministic ordering for reproducible builds
## Debugging and Validation
### Available Tools
1. **Module Info Inspection**:
```bash
slangc -get-module-info module.slang-module
```
Shows module name, version, and compiler version.
2. **Version Query**:
```bash
slangc -get-supported-module-versions
```
Reports the supported version range.
3. **IR Dumping**:
```bash
slangc -dump-ir module.slang
```
Shows the IR in human-readable form.
### Common Issues
**"Unrecognized instruction" errors**: The module contains instructions unknown to this compiler version. Update Slang or recompile the module.
**Stable name validation failures**: Run the update script and commit the changes:
**Note**: Skip make command if lua is already built.
```bash
make -C external/lua MYCFLAGS="-DLUA_USE_POSIX" MYLIBS=""
./external/lua/lua extras/check-ir-stable-names.lua update
```
**Version mismatch**: The module was compiled with an incompatible Slang version. Check the version ranges and recompile if necessary.
## Best Practices
1. **Always update stable names**: After adding instructions, run the validation script before committing.
2. **Document version changes**: When bumping module versions, add a comment explaining what changed.
3. **Prefer addition over modification**: When possible, add new instructions rather than changing existing ones.
4. **Group related changes**: If making multiple breaking changes, do them together in a single version bump.


The Design of Slang's Intermediate Representation (IR)
======================================================
This document details some of the important design choices for Slang's IR.
Goals and Non-Goals
-------------------
The IR needs to balance many goals which can sometimes come into conflict.
We will start by enumerating these goals (and related non-goals) explicitly so that we can better motivate specific design choices.
* Obviously it must be simple to lower any Slang source code to the IR. It is, however, a non-goal for the lowering process to be lossless; we do not need to recover source-level program structure from the IR.
* The IR must be amenable to standard dataflow analyses and optimizations. It should be possible to read a paper on a compiler algorithm or technique and apply it to our IR in a straightforward manner, and with the expected asymptotic efficiency.
* As a particular case of analysis and optimization, it should be possible to validate flow-dependent properties of an input function/program (e.g., whether an `[unroll]` loop is actually unrollable) using the IR, and emit meaningful error messages that reference the AST-level names/locations of constructs involved in an error.
* It should be possible to compile modules to the IR separately and then "link" them in a way that depends only on IR-level (not AST-level) constructs. We want to allow changing implementation details of a module without forcing a re-compile of IR code using that module (what counts as "implementation details" is negotiable).
* There should be a way to serialize IR modules in a round-trip fashion, preserving all of their structure. As a long-term goal, the serialized format should provide stability across compiler versions (working more as an IL than an IR).
* The IR must be able to encode "generic" (type-parameterized) constructs explicitly, and to express transformations from generic to specialized (or dynamic-dispatch) code in the IR. In particular, it must be possible for a module to make use of a generic defined in another (separately-compiled) module, with validation performed before linking, and specialization performed after.
* The IR must be able to express code that is close to the level of abstraction of shader intermediate languages (ILs) like SPIR-V and DXIL, so that we can minimize the amount of work required (and the number of issues that can arise) when translating the IR to these targets. This can involve lowering and legalization passes to match the constraints of those ILs, but it should not require too much work to be done outside of the IR.
* It should be possible to translate code in the IR back into high-level-language code, including things like structured control-flow constructs.
* Whenever possible, invariants required by the IR should be built into its structure so that they are easier to maintain.
* We should strive to make the IR encoding, both in memory and when serialized, as compact as is practically possible.
Inspirations
------------
The IR design we currently use takes inspiration from three main sources:
* The LLVM project provides the basic inspiration for the approach to SSA, such as using a typed IR, the decision to use the same object to represent an instruction and the SSA value it produces, and the push to have an extremely simple `replaceAllUsesWith` primitive. It is easy to forget that it is possible to design a compiler with different design decisions; the LLVM ones just happen to both be well-motivated and well-known.
* The Swift IL (SIL) provides the inspiration for our approach for encoding SSA "phi nodes" (blocks with arguments), and also informs some of how we have approached encoding generics and related features like existential types.
* The SPIR-V IL provides the inspiration for the choice to uniformly represent types as instructions, for how to encode "join points" for structured control flow, and for the concept of "decorations" for encoding additional metadata on instructions.
Key Design Decisions
--------------------
### Everything is an Instruction
The Slang IR strives for an extremely high degree of uniformity, so almost every concept in the IR is ultimately just an instruction:
* Ordinary add/sub/mul/etc. operations are instructions, as are function calls, branches, function parameters, etc.
* Basic blocks in functions, as well as functions themselves are "parent instructions" that can have other instructions as children
* Constant values (e.g., even `true` and `false`) are instructions
* Types are instructions too, and can have operands (e.g., a vector type is the `VectorType` instruction applied to operands for the element type and count)
* Generics are encoded entirely using ordinary instructions: a generic is encoded like a function that just happens to do computation at the type level
* It isn't true right now, but eventually decorations will also be instructions, so that they can have operands like any other instruction
* An overall IR module is itself an instruction so that there is a single tree that owns everything
This uniformity greatly simplifies the task of supporting generics, and also means that operations that need to work over all instructions, such as cloning and serialization, can work with a single uniform representation and avoid special-casing particular opcodes.
The decision to use an extremely uniform design, even going as far to treat types as "ordinary" instructions, is similar to SPIR-V, although we do not enforce many of the constraints SPIR-V does on how type and value instructions can be mixed.
### Instructions Have a Uniform Structure
Every instruction has:
* An opcode
* A type (the top-level module is the only place where this can be null)
* Zero or more operands
* Zero or more decorations
* Zero or more children
Instructions are not allowed to have any semantically-relevant information that is not in the above list.
The only exception to this rule is instructions that represent literal constants, which store additional data to represent their value.
The in-memory encoding places a few more restrictions on top of this so that, e.g., currently an instruction can either have operands or children, but not both.
Because everything that could be used as an operand is also an instruction, the operands of an instruction are stored in a highly uniform way as a contiguous array of `IRUse` values (even the type is contiguous with this array, so that it can be treated as an additional operand when required).
The `IRUse` type maintains explicit links for use-def information, currently in a slightly bloated fashion (there are well-known techniques for reducing the size of this information).
### A Class Hierarchy Mirrored in Opcodes
There is a logical "class hierarchy" for instructions, and we support (but do not mandate) declaring a C++ `struct` type to expose an instruction or group of instructions.
These `struct` types can be helpful to encode the fact that the program knows an instruction must/should have a particular type (e.g., having a function parameter of type `IRFunction*` prevents users from accidentally passing in an arbitrary `IRInst*` without checking that it is a function first), and can also provide convenience accessors for operands/children.
To make "dynamic cast" operations on this class hierarchy efficient, we arrange for the instruction opcodes of the in-memory IR to guarantee that all the descendants of a particular "base class" occupy a contiguous range of opcodes. Checking that an instruction is in that range is then a constant-time operation that only looks at its opcode field.
There are some subtleties to how the opcodes are ordered to deal with the fact that some opcodes have a kind of "multiple inheritance" thing going on, but that is a design wart that we should probably remove over time, rather than something we are proud of.
### A Simpler Encoding of SSA
The traditional encoding of SSA form involves placing "phi" instructions at the start of blocks that represent control-flow join points where a variable will take on different values depending on the incoming edge that is taken.
There are of course benefits to sticking with tradition, but phi instructions also have a few downsides:
- The operands to phi instructions are the one case where the "def dominates use" constraint of SSA appears to be violated. I say "appears" because officially the action of a phi occurs on the incoming edge (not in the target block) and that edge will of course be dominated by the predecessor block. It still creates a special case that programmers need to be careful about. This also complicates serialization in that there is no order in which the blocks/instructions of a function can be emitted that guarantees that every instruction always precedes all of its uses in the stream.
- All of the phi instructions at the start of the block must effectively operate in parallel, so that they all "read" from the correct operand before "writing" to the target variable. Like the above special case, this is only a problem for a phi related to a loop back-edge. It is of course possible to always remember the special interpretation of phi instructions (that they don't actually execute sequentially like every other instruction in a block), but it's another special case.
- The order of operands to a phi instruction needs to be related back to the predecessor blocks, so that one can determine which value is to be used for which incoming edge. Any transformation that modifies the CFG of a function needs to be careful to rewrite phi instructions to match the order in which predecessors are listed, or else the compiler must maintain a side data structure that remembers the mapping (and update it instead).
- Directly interpreting/executing code in an SSA IR with phi instructions is made more difficult because when branching to a block we need to immediately execute any phi instructions based on the block from which we just came. The above issues around phis needing to be executed in parallel, and needing to track how phi operands relate to predecessor blocks also add complexity to an interpreter.
Slang ditches traditional phi functions in favor of an alternative that matches the Swift IL (SIL).
The idea doesn't really start in Swift, but rather in the existing observation that SSA form IR and a continuation-passing style (CPS) IR are semantically equivalent; one can encode SSA blocks as continuation functions, where the arguments of the continuation stand in for the phi instructions, and a branch to the block becomes just a call.
Like Swift, we do not use an explicit CPS representation, but instead find a middle ground of a traditional SSA IR where instead of phi instructions basic blocks have parameters.
The first N instructions in a Slang basic block are its parameters, each of which is an `IRParam` instruction.
A block that would have had N phi instructions now has N parameters, but the parameters do not have operands.
Instead, a branch instruction that targets that block will have N *arguments* to match the parameters, representing the values to be assigned to the parameters when this control-flow edge is taken.
This encoding is equivalent in what it represents to traditional phi instructions, but nicely solves the problems outlined above:
- The phi operands in the successor block are now arguments in the *predecessor* block, so that the "def dominates use" property can be enforced without any special cases.
- The "assignment" of the argument values to parameters is now encoded with a single instruction, so that the simultaneity of all the assignments is more clear. We still need to be careful when leaving SSA form to obey those semantics, but there are no tricky issues when looking at the IR itself.
- There is no special work required to track which phi operands come from which predecessor block, since the operands are attached to the terminator instruction of the predecessor block itself. There is no need to update phi instructions after a CFG change that might affect the predecessor list of a block. The trade-off is that any change in the *number* of parameters of a block now requires changes to the terminator of each predecessor, but that is a less common change (isolated to passes that can introduce or eliminate block parameters/phis).
- It is much clearer how to give an operational semantics to a "branch with arguments" than to phi instructions: compute the target block, copy the arguments to temporary storage (because of the simultaneity requirement), and then copy the temporaries over the parameters of the target block.
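To make the encoding concrete, here is a counting loop written in a hypothetical textual syntax (not Slang's actual IR dump format), with block parameters in place of phis and arguments on the branches:

```
block %entry:
    unconditionalBranch %loop(0);      // argument initializes the parameter

block %loop:
    %i = param : Int;                  // takes the place of a phi
    %c = less %i, 10;
    conditionalBranch %c, %body, %exit;

block %body:
    %j = add %i, 1;
    unconditionalBranch %loop(%j);     // back edge passes the next value

block %exit:
    return;
```

Note that only the unconditional branches to the join point `%loop` carry arguments; the conditional branch targets blocks with a single predecessor each, so it needs none.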
The main caveat of this representation is that it requires branch instructions to have room for arguments to the target block. For an ordinary unconditional branch this is pretty easy: we just put a variable number of arguments after the operand for the target block. For branch instructions like a two-way conditional, we might need to encode two argument lists - one for each target block - and an N-way `switch` branch only gets more complicated.
The Slang IR avoids the problem of needing to store arguments on every branch instruction by banning *critical edges* in IR functions that are using SSA phis/parameters. A critical edge is any edge from a block with multiple successors (meaning it ends in a conditional branch) to one with multiple predecessors (meaning it is a "join point" in the CFG).
Phi instructions/parameters are only ever needed at join points, and so block arguments are only needed on branches to a join point.
By ruling out conditional branches that target join points, we avoid the need to encode arguments on conditional branch instructions.
This constraint could be lifted at some point, but it is important to note that any program can be represented as a CFG without critical edges.
### A Simple Encoding of the CFG
A traditional SSA IR represents a function as a bunch of basic blocks of instructions, where each block ends in a *terminator* instruction.
Terminators are instructions that can branch to another block, and are only allowed at the end of a block.
The potential targets of a terminator determine the *successors* of the block where it appears, and contribute to the *predecessors* of any target block.
The successor-to-predecessor edges form a graph over the basic blocks called the control-flow graph (CFG).
A simple representation of a function would store the CFG explicitly as a graph data structure, but in that case the data structure would need to be updated whenever a change is made to the terminator instruction of a branch in a way that might change the successor/predecessor relationship.
The Slang IR avoids this maintenance problem by noting an important property.
If block `P`, with terminator `t`, is a predecessor of `S`, then `t` must have an operand that references `S`.
In turn, that means that the list of uses of `S` must include `t`.
We can thus scan through the list of predecessors or successors of a block with a reasonably simple algorithm:
* To find the successors of `P`, find its terminator `t`, identify the operands of `t` that represent successor blocks, and iterate over them. This is O(N) in the number of outgoing CFG edges.
* To find the predecessors of `S`, scan through its uses and identify users that are terminator instructions. For each such user if this use is at an operand position that represents a successor, then include the block containing the terminator in the output. This is O(N) in the number of *uses* of a block, but we expect that to be on the same order as the number of predecessors in practice.
Each of these actually iterates over the outgoing/incoming CFG *edges* of a block (which might contain duplicates if one block jumps to another in, e.g., multiple cases of a `switch`).
Sometimes you actually want the edges, or don't care about repeats, but when duplicates must be avoided the user needs to build a set to deduplicate the list.
The clear benefit of this approach is that the predecessor/successor lists arise naturally from the existing encoding of control-flow instructions. It creates a bit of subtle logic when walking the predecessor/successor lists, but that code only needs to be revisited if we make changes to the terminator instructions that have successors.
### Explicit Encoding of Control-Flow Join Points
In order to allow reconstruction of high-level-language source code from a lower-level CFG, we need to encode something about the expected "join point" for a structured branch.
This is the logical place where control flow is said to "reconverge" after a branch, e.g.:
```hlsl
if(someCondition) // join point is "D"
{
A;
}
else
{
B;
if(C) return;
}
D;
```
Note that (unlike what some programming models would say) a join point is *not* necessarily a postdominator of the conditional branch. In the example above the block with `D` does not postdominate the block with `someCondition` nor the one with `B`. It is even possible to construct cases where the high-level join point of a control-flow construct is unreachable (e.g., the block after an infinite loop).
The Slang IR encodes structured control flow by making the join point be an explicit operand of a structured conditional branch operation. Note that a join-point operand is *not* used when computing the successor list of a block, since it does not represent a control-flow edge.
This is slightly different from SPIR-V where join points ("merge points" in SPIR-V) are encoded using a metadata instruction that precedes a branch. Keeping the information on the instruction itself avoids cases where we move one but not the other of the instructions, or where we might accidentally insert code between the metadata instruction and the terminator it modifies.
In the future we might consider using a decoration to represent join points.
When using a loop instruction, the join point is also the `break` label. The SPIR-V `OpLoopMerge` includes not only the join point (`break` target) but also a `continue` target. We do not currently represent structured information for `continue` blocks.
The reason for this is that while we could keep structured information about `continue` blocks, we might not be able to leverage it when generating high-level code, because the syntactic form of a `for` loop (the only construct in C-like languages where `continue` can go somewhere other than the top of the loop body) only allows an *expression* for the continue clause and not a general *statement*, but we cannot guarantee that after optimization the code in an IR-level "continue clause" would constitute a single expression.
The approach we use today means that the code in "continue clause" might end up being emitted more than once in final code; this is deemed acceptable because it is what `fxc` already does.
When it comes time to re-form higher-level structured control flow from Slang IR, we use the structuring information in the IR to form single-entry "regions" of code that map to existing high-level control-flow constructs (things like `if` statements, loops, `break` or `continue` statements, etc.).
The current approach we use requires the structuring information to be maintained by all IR transformations, and also currently relies on some invariants about what optimizations are allowed to do (e.g., we had better not introduce multi-level `break`s into the IR).
In the future, it would be good to investigate adapting the "Relooper" algorithm used in Emscripten so that we can recover valid structured control flow from an arbitrary CFG; for now we put off that work.
If we had a more powerful restructuring algorithm at hand, we could start to support things like multi-level `break`, and also ensure that `continue` clauses don't lead to code duplication any more.
## IR Global and Hoistable Value Deduplication
Types, constants, and certain operations on constants are considered "global values" in the Slang IR. Some other insts, like `Specialize()` and `Ptr(x)`, are considered "hoistable" insts, in that they will be defined at the outermost scope where their operands are available. For example, `Ptr(int)` will always be defined at global scope (as a direct child of `IRModuleInst`) because its only operand, `int`, is defined at global scope. However, if we have `Ptr(T)` where `T` is a generic parameter, then this `Ptr(T)` inst will always be defined in the block of the generic. Global and hoistable values are always deduplicated, so we can always assume two hoistable values with different pointer addresses are distinct values.
The `IRBuilder` class is responsible for ensuring the uniqueness of global/hoistable values. If you call any `IRBuilder` method that creates a new hoistable instruction, e.g. `IRBuilder::createIntrinsicInst`, `IRBuilder::emitXXX`, or `IRBuilder::getType`, `IRBuilder` will check whether an equivalent value already exists, and if so it returns the existing inst instead of creating a new one.
The trickier part is to maintain this uniqueness when we modify the IR. When we update an operand of an instruction from a non-hoistable value to a hoistable value, we may need to hoist the instruction itself as a result. For example, consider the following code:
```
%1 = IntType
%p = Ptr(%1)
%2 = func {
%x = ...;
%3 = Ptr(%x);
%4 = ArrayType(%3);
%5 = Var (type: %4);
...
}
```
Now consider the scenario where we need to replace the operand in `Ptr(x)` with `int` (where `x` is some non-constant value): we get a `Ptr(int)`, which is now a global value and should be deduplicated:
```
%1 = IntType
%p = Ptr(%1)
%2 = func {
%x = ...;
//%3 now becomes %p.
%4 = ArrayType(%p);
%5 = Var (type: %4);
...
}
```
Note this code now breaks the invariant that hoistable insts are always defined at the outermost scope: `%4` no longer depends on any local insts in the function, and should be hoisted to the global scope after replacing `%3` with `%p`. This means that we need to continue by hoisting `%4`, resulting in this final code:
```
%1 = IntType
%p = Ptr(%1)
%4 = ArrayType(%p); // hoisted to global scope
%2 = func {
%x = ...;
%5 = Var (type: %4);
...
}
```
As illustrated above, because we need to maintain the invariants of global/hoistable values, replacing an operand of an inst can have a widespread effect on the IR.
To help ensure these invariants, we introduce the `IRBuilder.replaceOperand(inst, operandIndex, newOperand)` method to perform all the cascading modifications after replacing an operand. In contrast, `IRInst.setOperand(idx, newOperand)` will not perform the cascading modifications, and using `setOperand` to modify the operand of a hoistable inst will trigger a runtime assertion error.
Similarly, `inst->replaceUsesWith` will also perform any cascading modifications to ensure the uniqueness of hoistable values. Because of this, we need to be particularly careful when using a loop to iterate the IR linked list or def-use linked list and call `replaceUsesWith` or `replaceOperand` inside the loop.
Consider the following code:
```
IRInst* nextInst = nullptr;
for (auto inst = func->getFirstChild(); inst; inst = nextInst)
{
    nextInst = inst->getNextInst(); // save a copy of nextInst
    // ...
    inst->replaceUsesWith(someNewInst); // Warning: this may be unsafe, because nextInst could have been moved to parent->parent!
}
```
Now imagine this code is running on the `func` defined above, and that we are at `inst == %3` and want to replace `inst` with `Ptr(int)`. Before calling `replaceUsesWith`, we have stored `inst->nextInst` into `nextInst`, so `nextInst` is now `%4` (the array type). After we call `replaceUsesWith`, `%4` is hoisted to global scope, so in the next iteration we will start to process `%4`, follow its `next` pointer to `%2`, and end up processing `func` instead of continuing to walk the child list!
Because of this, we should never call `replaceOperand` or `replaceUsesWith` while walking the IR linked list. If we need to do so, we must create a temporary work list and add all the insts to it before making any modifications. The `IRInst::getModifiableChildren` utility function returns a temporary work list for safe iteration over the children. The same applies to the def-use linked list: the `traverseUses` and `traverseUsers` utility functions defined in `slang-ir.h` help with walking the def-use list safely.
Another detail to keep in mind is that any local references to an inst may become out-of-date after a call to `replaceOperand` or `replaceUsesWith`. Consider the following code:
```
IRBuilder builder;
auto x = builder.emitXXX(); // x is some non-hoistable value.
auto ptr = builder.getPtrType(x); // create ptr(x).
x->replaceUsesWith(intType); // this renders `ptr` obsolete!!
auto var = builder.emitVar(ptr); // use the obsolete inst to create another inst.
```
In this example, calling `replaceUsesWith` causes `ptr` to represent `Ptr(int)`, which may already exist in the global scope. After this call, all uses of `ptr` should be replaced with the global `Ptr(int)` inst instead. `IRBuilder` provides a mechanism to track all the insts that are removed due to deduplication, and maps those removed-but-not-yet-deleted insts to the existing inst. When using `ptr` to create a new inst, `IRBuilder` will first check whether `ptr` should map to some existing hoistable inst in the global deduplication map and replace it if possible. This means that after the call to `builder.emitVar`, `var->type` is not equal to `ptr`.
### Best Practices
In summary, the best practices when modifying the IR are:
- Never call `replaceUsesWith` or `replaceOperand` when walking raw linked lists in the IR. Always create a work list and iterate on the work list instead. Use `IRInst::getModifiableChildren` and `traverseUses` when you need to modify the IR while iterating.
- Never assume any local references to an `inst` are up to date after a call to `replaceUsesWith` or `replaceOperand`. It is OK to continue using them as operands/types to create a new inst, but do not assume the created inst will reference the same inst passed in as an argument.
An overview of the Slang Compiler
=================================
This document will attempt to walk through the overall flow of the Slang compiler, as an aid to developers who are trying to get familiar with the codebase and its design.
More emphasis will be given to places where the compiler design is nontraditional, or might surprise newcomers; things that are straightforward won't get much detail.
High-Level Concepts
-------------------
Compilation is always performed in the context of a *compile request*, which bundles together the options, input files, and request for code generation.
Inside the code, there is a type `CompileRequest` to represent this.
The user specifies some number of *translation units* (represented in the code as a `TranslationUnitRequest`) which comprise some number of *sources* (files or strings).
HLSL follows the traditional C model where a "translation unit" is more or less synonymous with a source file, so when compiling HLSL code the command-line `slangc` will treat each source file as its own translation unit.
For Slang code, the command-line tool will by default put all source files into a single translation unit (so that they represent a shared namespace).
The user can also specify some number of *entry points* in each translation unit (`EntryPointRequest`), which combines the name of a function to compile with the pipeline stage to compile for.
In a single compile request, we can generate code for zero or more *targets* (represented with `TargetRequest`); a target defines both the format for output code (e.g., DXIL or SPIR-V) and a *profile* that specifies the capability level to assume (e.g., "Shader Model 5.1").
It might not be immediately clear why we have such fine-grained concepts as this, but it ends up being quite important to decide which pieces of the compiler are allowed to depend on which pieces of information (e.g., whether or not a phase of compilation gets to depend on the chosen target).
The "Front End"
---------------
The job of the Slang front-end is to turn textual source code into a combination of code in our custom intermediate representation (IR) plus layout and binding information for shader parameters.
### Lexing
The first step in the compiler (after a source file has been loaded into memory) is to *lex* it.
The `Lexer` type is implemented in `lexer.{h,cpp}` and produces `Token`s that represent the contents of the file on demand, as requested by the next phase of compilation.
Each token stores a `TokenCode` that indicates the kind of token, the raw text of the token, and the location in the source code where it is located.
Source locations use a somewhat clever encoding to avoid being bloated (they are a single integer rather than separate file, line, and column fields).
We don't make any attempt in the lexer to extract the actual value of integer and floating-point literals; we just store the raw text.
We also don't try to distinguish keywords from identifiers; keywords show up as ordinary identifier tokens.
Much of the complexity (and inefficiency) in the current lexer is derived from the need to support C-isms like backspace line continuation, and special case rules like allowing `<>` to delimit a file name string after a `#include`.
### Preprocessing
The preprocessor (`Preprocessor`) in `preprocessor.{h,cpp}` deals with `#include` constructs, macro expansions, etc.
It pulls tokens from the lexer as needed (making sure to set flags to control the lexer behavior when required) and uses a limited lookahead to decide what to do with each token.
The preprocessor maintains a stack of input streams, with the original source file at the bottom, and pushes entries for `#include`d files, macros to expand etc.
Macro definitions store a sequence of already-lexed tokens, and expansion simply "replays" these tokens.
Expansion keeps a notion of an "environment" for looking up identifiers and mapping them to macro definitions.
Calling through to a function-style macro creates a fresh environment that maps the macro parameter names to pseudo-macros for the arguments.
We still tokenize code in inactive preprocessor conditionals, but don't evaluate preprocessor directives inside inactive blocks (except those that may change the active/inactive state).
Preprocessor directives are each handled as a callback on the preprocessor state and are looked up by name; adding a new directive (if we ever had a reason to) is a fairly simple task.
One important detail of the preprocessor is that it runs over a full source file at once and produces a flat array of `Token`s, so that there is no direct interaction between the parser and preprocessor.
### Parsing
The parser (`Parser` in `parser.{h,cpp}`) is mostly a straightforward recursive-descent parser.
Because the input is already tokenized before we start, we can use arbitrary lookahead, although we seldom look ahead more than one token.
Traditionally, parsing of C-like languages requires context-sensitive parsing techniques to distinguish types from values, and deal with stuff like the C++ "most vexing parse."
Slang instead uses heuristic approaches: for example, when we encounter an `<` after an identifier, we first try parsing a generic argument list with a closing `>` and then look at the next token to determine if this looks like a generic application (in which case we continue from there) or not (in which case we backtrack).
There are still some cases where we use lookup in the current environment to see if something is a type or a value, but officially we strive to support out-of-order declarations like most modern languages.
In order to achieve that goal we will eventually move to a model where we parse the bodies of declarations and functions in a later pass, after we have resolved names in the global scope.
One important choice in the parser is that we strive to avoid hard-coding keywords as much as possible.
We already track an environment for C-like parsing, and we simply extend that so that we also look up declaration and statement keywords in the environment.
This means that most of the language "keywords" in Slang aren't keywords at all, and instead are just identifiers that happen to be bound to syntax in the default environment.
Syntax declarations are associated with a callback that is invoked to parse the construct they name.
The design of treating syntax as ordinary declarations has a long-term motivation (we'd like to support a flexible macro system) but it also has short-term practical benefits.
It is easy for us to add new modifier keywords to the language without touching the lexer or parser (just adding them to the core module), and we also don't have to worry about any of Slang's extended constructs (e.g., `import`) breaking existing HLSL code that just happens to use one of those new keywords as a local variable name.
What the parser produces is an abstract syntax tree (AST).
The AST currently uses a strongly-typed C++ class hierarchy with a "visitor" API generated via some ugly macro magic.
Dynamic casting using C++ RTTI is used in many places to check the class of an AST node; we aren't happy with this but also haven't had time to implement a better/faster solution.
In the parsed AST, both types and expressions use the same representation (because in an expression like `A(B)` it is possible that `A` will resolve to a type, or to a function, and we don't know which yet).
One slightly odd design choice in the parser is that it attaches lexical scoping information to the syntax nodes for identifiers, and to any other AST nodes that need access to the scope/environment where they were defined. This is a choice we will probably change at some point, but it is deeply ingrained right now.
### Semantic Checking
The semantic checking step (`check.{h,cpp}`) is, not surprisingly, the most complicated and messiest bit of the compiler today.
The basic premise is simple: recursively walk the entire AST and apply semantic checking to each construct.
Semantic checking applies to one translation unit at a time.
It has access to the list of entry points for the translation unit (so it can validate them), but it is *not* allowed to depend on the compilation target(s) the user might have selected.
Semantic checking of an expression or type term can yield the same AST node with type information added, or it can return newly constructed AST nodes (e.g., when an implicit cast needs to be inserted).
Unchecked identifiers or member references are always resolved to have a pointer to the exact declaration node they are referencing.
Types are represented with a distinct class hierarchy from AST nodes, which is also used for a general notion of compile-time values which can be used to instantiate generic types/functions/etc.
An expression that ends up referring to a type will have a `TypeType` as its type, which will hold the actual type that the expression represents.
The most complicated thing about semantic checking is that we strive to support out-of-order declarations, which means we may need to check a function declaration later in the file before checking a function body early in the file.
In turn, that function declaration might depend on a reference to a nested type declared somewhere else, etc.
We currently solve this issue by doing some amount of on-demand checking; when we have a reference to a function declaration and we need to know its type, we will first check if the function has been through semantic checking yet, and if not we will go ahead and recursively type check that function before we proceed.
This kind of recursion is not well-founded and can lead to real problems (especially when the user writes code with circular dependencies), so we have made some attempts to more strictly "phase" the semantic checking, but those efforts have not yet been carried out systematically.
When code involves generics and/or interfaces, the semantic checking phase is responsible for ensuring that when a type claims to implement an interface it provides all of the requirements of that interface, and it records the mapping from requirements to their implementations for later use. Similarly, the body of a generic is checked to make sure it uses type parameters in ways that are consistent with their constraints, and the AST is amended to make it explicit when an interface requirement is being employed.
### Lowering and Mandatory Optimizations
The lowering step (`lower-to-ir.{h,cpp}`) is responsible for converting semantically valid ASTs into an intermediate representation that is more suitable for specialization, optimization, and code generation.
The main thing that happens at this step is that a lot of the "sugar" in a high-level language gets baked out. For example:
- A "member function" in a type will turn into an ordinary function that takes an initial `this` parameter
- A `struct` type nested in another `struct` will turn into an ordinary top-level `struct`
- Compound expressions will turn into sequences of instructions that bake the order of evaluation
- High-level control-flow statements will get resolved to a control-flow graph (CFG) of basic blocks
The lowering step is done once for each translation unit, and like semantic checking it does *not* depend on any particular compilation target.
During this step we attach "mangled" names to any imported or exported symbols, so that each function overload, etc. has a unique name.
After IR code has been generated for a translation unit (now called a "module") we next perform a set of "mandatory" optimizations, including SSA promotion and simple copy propagation and elimination of dead control-flow paths.
These optimizations are not primarily motivated by a desire to speed up code, but rather to ensure that certain "obvious" simplifications have been performed before the next step of validation.
After the IR has been "optimized" we perform certain validation/checking tasks that would have been difficult or impossible to perform on the AST.
For example, we can validate that control flow never reached the end of a non-`void` function, and issue an error otherwise.
There are other validation tasks that can/should be performed at this step, although not all of them are currently implemented:
- We should check that any `[unroll]` loops can actually be unrolled, by ensuring that their termination conditions can be resolved to a compile-time constant (even if we don't know the constant yet)
- We should check that any resource types are being used in ways that can be statically resolved (e.g., that the code never conditionally computes a resource to reference), since this is a requirement for all our current targets
- We should check that the operands to any operation that requires a compile-time constant (e.g., the texel offset argument to certain `Sample()` calls) are passed values that end up being compile-time constants
The goal is to eliminate any possible sources of failure in low-level code generation, without needing to have a global view of all the code in a program.
Any error conditions we have to push off until later start to limit the value of our separate compilation support.
### Parameter Binding and Type Layout
The next phase of parameter binding (`parameter-binding.{h,cpp}`) is independent of IR generation, and proceeds based on the AST that came out of semantic checking.
Parameter binding is the task of figuring out what locations/bindings/offsets should be given to all shader parameters referenced by the user's code.
Parameter binding is done once for each target (because, e.g., Vulkan may bind parameters differently than D3D12), and it is done for the whole compile request (all translation units) rather than one at a time.
This is because when users compile something like HLSL vertex and fragment shaders in distinct translation units, they will often share the "same" parameter via a header, and we need to ensure that it gets just one location.
At a high level, parameter binding starts by computing the *type layout* of each shader parameter.
A type layout describes the amount of registers/bindings/bytes/etc. that a type consumes, and also encodes the information needed to compute offsets/registers for individual `struct` fields or array elements.
Once we know how much space each parameter consumes, we then inspect any explicit binding information (e.g., `register` modifiers) that is relevant for the target, and build a data structure to record which binding ranges are already consumed.
Finally, we go through any parameters without explicit binding information and assign them the next available range of the appropriate size (in a first-fit fashion).
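The first-fit assignment described above can be sketched as follows (a toy model with hypothetical names, not the actual `parameter-binding.cpp` logic): explicitly bound ranges are recorded in `used` first, then each remaining parameter takes the lowest free range of its size.

```cpp
#include <set>

// Toy model of first-fit binding assignment: `used` holds registers already
// claimed by explicit bindings; returns the start of the next free range of
// `size` consecutive registers, and marks that range as consumed.
int firstFit(std::set<int>& used, int size)
{
    int start = 0;
    for (;;)
    {
        bool rangeFree = true;
        for (int r = start; r < start + size; r++)
        {
            if (used.count(r)) { rangeFree = false; start = r + 1; break; }
        }
        if (rangeFree) break;
    }
    for (int r = start; r < start + size; r++)
        used.insert(r);
    return start;
}
```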
The parameter binding/layout information is what the Slang reflection API exposes. It is layered directly over the Slang AST so that it accurately reflects the program as the user wrote it, and not the result of lowering that program to our IR.
This document describes parameter binding as a "front end" activity, but in practice it is something that could be done in the front-end, the back-end or both.
When shader code involves generic type parameters, complete layout information cannot be generated until the values of these parameters are fully known, and in practice that might not happen until the back end.
### Serialization
It is not yet fully implemented, but our intention is that the last thing the front-end does is to serialize the following information:
- A stripped-down version of the checked AST for each translation unit including declarations/types, but not function bodies
- The IR code for each translation unit
- The binding/layout information for each target
The above information is enough to type-check a subsequent module that `import`s code compiled by the front-end, to link against its IR code, or to load and reflect type and binding information.
The "Back End"
--------------
The Slang back end logically starts with the user specifying:
- An IR module, plus any necessary modules to link in and provide its dependencies
- An entry point in that module, plus arguments for any generic parameters that entry point needs
- A compilation target (e.g., SPIR-V for Vulkan)
- Parameter binding/layout information for that module and entry point, computed for the chosen target
We eventually want to support compiling multiple entry points in one pass of the back end, but for now it assumes a single entry point at a time.
### Linking and Target Specialization
The first step we perform is to copy the chosen entry point, and anything it depends on recursively, into a "fresh" IR module.
We make a copy of things so that any optimization/transformation passes we do for one target don't alter the code the front-end produced in ways that affect other targets.
While copying IR code into the fresh module, we have cases where there might be multiple definitions of the same function or other entity.
In those cases, we apply "target specialization" to pick the definition that is the best for the chosen target.
This step is where we can select between, say, a built-in definition of the `saturate` function for D3D targets, vs. a hand-written one in a Slang standard module to use for GLSL-based targets.
### API Legalization
If we are targeting a GLSL-based platform, we need to translate "varying" shader entry point parameters into global variables used for cross-stage data passing.
We also need to translate any "system value" semantics into uses of the special built-in `gl_*` variables.
We currently handle this kind of API-specific legalization quite early in the process, performing it right after linking.
### Generic Specialization
Once the concrete values for generic parameters are known, we can set about specializing code to the known types.
We do this by cloning a function/type/whatever and substituting in the concrete arguments for the parameters.
This process can be continued as specializing one function may reveal opportunities to specialize others.
During this step we also specialize away lookup of interface requirements through their witness tables, once generic witness-table parameters have been replaced with concrete witness tables.
At the end of specialization, we should have code that makes no use of user-defined generics or interfaces.
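A hypothetical model (plain C++ with invented names, not the Slang IR) of what "specializing away" witness-table lookup means: before specialization, an interface requirement is reached through a table of function pointers; after substituting a concrete witness table, the call becomes direct.

```cpp
// A witness table modeled as a struct of function pointers.
struct IGetValueWitness
{
    int (*getValue)(void* obj);
};

// Generic form: the requirement is looked up through the witness table.
int genericGet(void* obj, const IGetValueWitness& witness)
{
    return witness.getValue(obj);
}

// A concrete type and its witness for the requirement.
struct MyInt { int v; };
int MyInt_getValue(void* obj) { return static_cast<MyInt*>(obj)->v; }

// After specialization, the indirection through the table is gone.
int specializedGet(MyInt* obj)
{
    return MyInt_getValue(obj);
}
```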
### Type Legalization
While HLSL and Slang allow a single `struct` type to contain both "ordinary" data like a `float3` and "resources" like a `Texture2D`, the rules for GLSL and SPIR-V are more restrictive.
There are some additional wrinkles that arise for such "mixed" types, so we prefer to always "legalize" the types in the user's code by replacing an aggregate type like:
```hlsl
struct Material { float4 baseColor; Texture2D detailMap; };
Material gMaterial;
```
with separate declarations for ordinary and resource fields:
```hlsl
struct Material { float4 baseColor; };
Material gMaterial;
Texture2D gMaterial_detailMap;
```
Changing the "shape" of a type like this (so that a single variable becomes more than one) needs to be done consistently across all declarations/functions in the program (hence why we do it after specialization, so that all concrete types are known).
### Other Optimizations
We don't currently apply many other optimizations on the IR code in the back-end, under the assumption that the lower-level compilers below Slang will do some of the "heavy lifting."
That said, there are certain optimizations that Slang must do eventually, for semantic completeness. One of the most important examples of these is implementing the semantics of the `[unroll]` attribute, since we can't always rely on downstream compilers to have a capable unrolling implementation.
We expect that over time it will be valuable for Slang to support a wider array of optimization passes, as long as they are ones that are considered "safe" to do above the driver interface, because they won't interfere with downstream optimization opportunities.
### Emission
Once we have transformed the IR code into something that should be legal for the chosen target, we emit code in the appropriate format for the target. This can be high-level source code (such as HLSL, GLSL, Metal, WGSL, C++, or CUDA) or binary formats (such as SPIR-V, DXIL, PTX, or MetalLib) depending on the compilation target.
The emit logic is mostly just a scan over the IR code to emit a high-level declaration for each item: an IR structure type becomes a `struct` declaration, an IR function becomes a function definition, etc.
In order to make the generated code a bit more readable, the Slang compiler currently does *not* emit declarations using their mangled names and instead tries to emit everything using a name based on how it was originally declared.
To improve the readability of function bodies, the emit logic tries to find consecutive sequences of IR instructions that it can emit as a single high-level language expression. This reduces the number of temporaries in the output code, but we need to be careful about inserting parentheses to respect operator precedence, and also to not accidentally change the order of evaluation of code.
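The parenthesization concern can be illustrated with a minimal sketch (hypothetical types, not the real emit logic): a child expression gets wrapped in parentheses only when its operator binds more loosely than its parent's.

```cpp
#include <string>

// An already-emitted expression: its text plus the precedence of its
// top-level operator (a higher value binds more tightly).
struct Emitted
{
    int prec;
    std::string text;
};

std::string emitChild(const Emitted& child, int parentPrec)
{
    // Parenthesize only when the child binds more loosely than the parent.
    return child.prec < parentPrec ? "(" + child.text + ")" : child.text;
}

Emitted emitBinary(const Emitted& left, const Emitted& right,
                   const std::string& op, int prec)
{
    return { prec, emitChild(left, prec) + " " + op + " " + emitChild(right, prec) };
}
```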
When emitting a function body, we need to get from the low-level control flow graph (CFG) to high-level structured control-flow statements like `if`s and loops. We currently do this on a per-function basis during code emission, using an ad hoc algorithm based on structured control-flow information we stored in the IR.
A future version of the compiler might implement something more complete like the "Relooper" algorithm used by Emscripten.
### Downstream Compiler Execution
For certain targets and compilation paths, we invoke downstream compilers to generate binary code (and optionally to disassemble that code for console output). For example:
- DXIL and DXBC targets use dxc and fxc respectively
- SPIR-V, although generated directly from the Slang IR by default, can instead use glslang if the `-emit-spirv-via-glsl` option is specified for `slangc`. If that option is used, GLSL is emitted from the Slang IR to pass to glslang for SPIR-V generation
- PTX generation uses NVRTC
- MetalLib and MetalLibAssembly targets use the Metal compiler (MetalC)
Targets that have output emitted directly from the Slang IR without the use of downstream compilers include high-level source formats like HLSL, GLSL, Metal, WGSL, C++, and CUDA source, as well as the default SPIR-V binary generation path.
The Slang compiler also supports a "pass through" mode where it skips most of the steps outlined so far and just passes text along to downstream compilers directly. This is primarily intended as a debugging aid for developers working on Slang, since it lets you use the same command-line arguments to invoke both Slang compilation and compilation with these other compilers.
Conclusion
----------
Hopefully this whirlwind introduction to the flow of the Slang compiler gives some idea of how the project fits together, and makes it easier to dive into the code and start being productive.
# Resolving Ambiguity in Slang's Parser
A typical textbook-style compiler front-end features explicit stages: tokenization, parsing, and semantic checking. Slang's original design follows this pattern, but this design has the drawback that it cannot effectively disambiguate the syntax, due to the lack of semantic info during parsing.
For example, without knowing what `X` is, it is impossible to tell whether `X<a&&b>(5)` means calling a generic function `X` with argument `5`, or computing the logical `AND` between condition `X < a` and `b > 5`.
Slang initially addressed this problem with a heuristic: if the compiler sees an `IDENTIFIER` followed by `<`, it will try to parse the expression as a generic specialization first, and if that succeeds, it checks the token after the closing `>` to see if it is one of the possible "generic specialization followers". In this example, the next token is `(`, which is a "generic specialization follower", so the compiler determines that the expression being parsed is very likely a generic function call, and it will parse the expression as such. For reference, the full set of "generic specialization followers" is: `::`, `.`, `(`, `)`, `[`, `]`, `:`, `,`, `?`, `;`, `==`, `!=`, `>` and `>>`.
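The follower check itself is trivial; a sketch (the token set is copied from the list above, the function name and everything else is hypothetical):

```cpp
#include <set>
#include <string>

// True if `tok` may legally follow the closing `>` of a generic
// specialization, per the follower set listed above.
bool isGenericSpecializationFollower(const std::string& tok)
{
    static const std::set<std::string> followers = {
        "::", ".", "(", ")", "[", "]", ":", ",", "?", ";", "==", "!=", ">", ">>"
    };
    return followers.count(tok) != 0;
}
```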
This simple heuristic originated in the C# compiler, where it works well since C# doesn't allow generic value arguments, so things like `X<a&&b>...` or `X<a<y>...` can never be valid generic specializations. This isn't the case for Slang, where generic arguments can be int or boolean values, so `a&&b` and `a<y` are valid as generic arguments. Although using the same heuristic works most of the time, it still causes a lot of confusion for users when it fails.
The ambiguity problem can be systematically solved if the parser has access to semantic info. If the parser knows whether or not `X` is a generic, it can parse the expression accordingly without any guesswork. The key challenge is to make such semantic info available while we are still parsing.
## Two-stage Parsing
Slang solves this problem by breaking parsing into two stages: the decl parsing stage and the body parsing stage. Initially, we parse the user source in the decl parsing stage. In this stage, we parse all decls, such as `struct`s, variables, functions etc. as usual, except that when we are about to parse the body of a function, we just collect all the tokens enclosed by `{` and `}` and store them in a raw list as an `UnparsedStmt` AST node. By deferring the parsing of function bodies, we no longer need to guess whether a `<` token inside a function body means generic specialization or less-than comparison.
After the decl parsing stage, we have an AST that represents the decl structure but not the function bodies. With this initial AST, we can start semantic checking. Once we reach an `UnparsedStmt` node, the semantic visitor will spawn a new `Parser` and start to parse the tokens stored in the `UnparsedStmt` node. When we spawn the parser from a semantic visitor, we initialize it to be in the `Body` parsing stage and pass a pointer to the semantic visitor to the parser. This way, we trigger the second parsing stage from the semantic visitor.
During the second parsing stage, whenever we see a `<` and need to disambiguate, we use the semantic visitor to check the expression that has been parsed so far before the `<`. If we are able to type-check the expression and find it to be a `DeclRefExpr` referencing a generic decl, or an `OverloadedExpr` where one of the candidates is a generic decl, then we know the `<` should be parsed as a generic specialization instead of `operator <`. If the expression before the `<` checks as a reference to a variable or a property, we parse it as the comparison operator. The reason we still parse `<` as a generic specialization when the expression before it is a non-generic function or type is so we can provide a better error message instead of just a "syntax error" somewhere down the line: in this case the user is most likely treating the non-generic type or function as a generic one by mistake, so we should diagnose as such. In the case that we are unable to properly check the preceding expression, or it checks to something else that we don't know how to handle, the compiler falls back to the heuristic-based method for disambiguation.
Note that in the second stage, parsing and semantic checking are interleaved organically. We no longer have a clean boundary between parsing and checking. However, the checking that happens in the second stage is on-demand and checks only the parts of the code necessary to determine the type of the expression preceding the `<` token. Any other code irrelevant for disambiguation purposes is left unchecked. Once the function body is fully parsed, the semantic visitor working on the function will make sure every node of the parsed AST is visited.
This two-stage parsing technique works well to correctly disambiguate code inside a function body. However, the current implementation is not 100% bulletproof. Expressions at decl level, such as default values for struct members or function parameters, are still fully parsed in the first stage using the heuristic-based method. This should be a lesser problem in practice, because default values are typically simple expressions, and the chance of running into a wrongly disambiguated case is much lower than in function bodies.
## Scope of Local Variables
Another issue tied to parsing is correctly supporting the scope of local variables. A local variable should only be visible to code after its declaration within the same `{}` block. Consider this example:
```cpp
static int input = 100;
int f()
{
input = 2; // global `input` is now 2
int input = input + 1; // local `input` is now 3
input = input + 2; // local `input` is now 5
return input; // returns 5.
}
```
In Slang's implementation, we create a `ScopeDecl` container node for each `BlockStatement`, and variable declarations inside the block are added to the same `ScopeDecl`. This creates a problem for two-stage parsing: to allow any expression to be checked during disambiguation, we need to insert variables into the scope as soon as they are parsed, but this means that when we do the "full checking" after the entire body is parsed, all variables are already registered in the scope and discoverable when we check the earlier statements in the block. As a result, the compiler cannot report an error if the user attempts to use a variable that is defined later in the block. In the example above, when we check the first statement `input = 2`, the lookup logic for `input` will find the local variable instead of the global variable, thus generating the wrong code.
One way to solve this problem is, instead of registering all local variables to the same scope owned by the containing `BlockStmt`, to make each local variable declaration own its own scope that ends at the end of the owning block. This way, all statements following the local variable declaration become children of the local variable's `DeclStmt`, effectively parsing the above example as:
```cpp
static int input = 100;
int f()
{
input = 2; // global `input` is now 2
{
int input = input + 1; // local `input` is now 3
input = input + 2; // local `input` is now 5
return input; // returns 5.
}
}
```
This will ensure the scope data-structure matches the semantic scope of the variable, and allow the compiler to produce the correct diagnostics.
However, expressing scope this way creates long nested chains in the AST, and leads to inefficient lookup and deep ASTs that risk overflowing the stack. Instead, Slang stays with the design of registering all variables in a block to the same `ScopeDecl`, but uses a separate piece of state on each `VarDecl`, called `hiddenFromLookup`, to track whether or not the decl should be visible to lookup. During parsing, all decls are visible by default, so they can be used for disambiguation purposes. Once parsing is fully done and we are about to check a `BlockStmt`, we first visit all `DeclStmt`s in the block and mark their decls as invisible, then continue checking the child statements. When checking encounters a `DeclStmt`, it marks the decl as visible again, allowing it to be found by the lookup logic for code after the declaration site. This solution allows us to respect the semantic scope of local variables without actually forming a long chain of scopes for a sequence of statements.
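A toy model of the `hiddenFromLookup` scheme (hypothetical C++, not the actual AST classes): decls start visible for disambiguation, are all hidden before the block is checked, and become visible one at a time as checking passes each `DeclStmt`.

```cpp
#include <string>
#include <vector>

struct VarDecl
{
    std::string name;
    bool hiddenFromLookup = false; // visible by default, for disambiguation
};

struct Block
{
    std::vector<VarDecl*> decls; // all registered in one ScopeDecl

    VarDecl* lookup(const std::string& name)
    {
        for (auto* d : decls)
            if (!d->hiddenFromLookup && d->name == name)
                return d;
        return nullptr; // lookup would continue in the outer scope
    }

    // Before checking the block: hide every decl.
    void beginCheck() { for (auto* d : decls) d->hiddenFromLookup = true; }

    // Checking reached this decl's statement: reveal it.
    void checkedDeclStmt(VarDecl* d) { d->hiddenFromLookup = false; }
};
```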
## Future Work: Extend Staged Parsing to Decl Scopes
We can further extend this approach to properly support expressions in global/decl scopes, such as default value expressions for struct members, or the type expressions of functions and global/member variables. To do so, we will use a different strategy for parsing expressions in the first parsing stage. Instead of parsing an expression directly, we identify the token boundary of the expression without a detailed understanding of the syntax. We parse all expressions into `UnparsedExpr` nodes, which contain the unparsed tokens for each expression. By doing so, the first parsing stage gives us an AST that is detailed enough to identify the names of types and functions, and whether or not they are generic. Then we can perform semantic checking on the initial AST, and use it to drive the parsing and checking of any `UnparsedExpr`s and `UnparsedStmt`s.
## Future Work: ScopeRef
We can get rid of the `hiddenFromLookup` flag and use a more immutable representation of AST nodes if we introduce the concept of a `ScopeRef`: a `Scope*` plus an `endIndex` that marks the boundary of the referenced scope. This way, different statements in a block can hold different `ScopeRef`s to the same scope with different ending member indices. If we are looking up through a `ScopeRef` and find a variable in the scope whose index is greater than `endIndex`, we treat the variable as invisible and report an error. This is cleaner, allows better error messages, and avoids having to maintain mutable state flags on decls.
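A minimal sketch of the proposed `ScopeRef` (hypothetical, since this is future work): lookup simply ignores members at or beyond `endIndex`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Scope
{
    std::vector<std::string> members;
};

struct ScopeRef
{
    const Scope* scope;
    std::size_t endIndex; // members at index >= endIndex are invisible

    bool lookup(const std::string& name) const
    {
        for (std::size_t i = 0; i < endIndex && i < scope->members.size(); i++)
            if (scope->members[i] == name)
                return true;
        return false;
    }
};
```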
Semantic Checking
=================
The semantic checking logic in the Slang compiler is located in `source/slang/slang-check*`.
Semantic checking is applied in the front end after parsing, and before lowering of code to the IR.
The main job of the semantic checking stage is to detect and forbid code that has errors in it.
The errors and other diagnostics reported are intended to be of benefit to the user, but semantic checking is also important for the overall function of the compiler.
Stages of compilation after semantic checking (e.g., lowering to the IR) are allowed to *assume* that the code they operate on is semantically valid, and may assert-fail or even crash on invalid code.
Semantic checking is thus not an optional step, and there is no meaningful way to turn it off.
Semantic Checking can be broken into three main kinds of work, and we will discuss how each is implemented in the following sections:
* Checking of "terms" which include expressions and type expressions
* Checking of statements
* Checking of declarations
Checking Terms
--------------
### Some Terminology for Terms
We use the word "term" to refer generically to something that can be evaluated to produce a result, but where we do not yet know if the result will be a type or a value. For example, `Texture2D` might be a term that results in a type, while `main` might be a term that results in a value (of function type), but both start out as a `NameExpr` in the AST. Thus the AST uses the class hierarchy under `Expr` to represent terms, whether they evaluate to values or types.
There is also the `Type` hierarchy, but it is important to understand that `Type` represents types as their logical immutable selves, while `Expr`s that evaluate to types are *type expressions* which can be concretely pointed to in the user's code. Type expressions have source locations, because they represent something the user wrote in their code, while `Type`s don't have singular locations by default.
The codebase uses the notion of a `TypeRepr` for those `Expr`s that should only ever evaluate to types, and there is also a `TypeExp` type that is meant to package up a `Type` with an optional `Expr` for a type expression that produced it. The names of these implementation types aren't great, and should probably not be spread further.
A value-bearing `Expr` will eventually be given a `Type` that describes the type of value it produces.
An `Expr` that evaluates to a type will eventually be given a `Type` that uses the `TypeType` subclass to indicate the specific type it evaluated to.
The `TypeType` idea is kind of a kludge to represent "kinds" (the "types of types") in our system.
More correctly, we should say that every `Expr` gets a *classifier*, with the classifiers for value expressions being `Type`s and the classifiers for type expressions being kinds, but we haven't had time or inclination to fix the model yet.
### The Big Picture
Checking of terms is largely done as an ad hoc postorder traversal of the AST.
That is, in order to check a compound expression like `f(a)` we first need to check `f` and `a` before we can check the function call.
When checking an expression there are four main things that have to be done:
1. Recursively check all sub-expressions.
2. Detect and diagnose any errors (or warnings) in the current expression.
3. Optionally construct a new expression to replace the current expression (or one of its sub-expressions) in cases where the syntactic form of the input doesn't match the desired semantics (e.g., make an implicit type conversion explicit in the AST).
4. Determine the correct type for the result expression, and store it so that it can be used by subsequent checking.
Those steps may end up being interleaved in practice.
### Handling Errors Gracefully
If an error is detected in a sub-expression, then there are a few issues that need to be dealt with:
* We need to ensure that an erroneous sub-expression can't crash the compiler when it goes on to check a parent expression. For example, leaving the type of an expression as null when it has errors is asking for trouble.
* We ideally want to continue to diagnose other unrelated errors in the same expression, statement, function, or file. That means that we shouldn't just bail out of semantic checking entirely (e.g., by throwing an exception).
* We don't want to produce "cascading" errors where, e.g., an error in `a` causes us to also report an error in `a + b` because no suitable operator overload was found.
We tackle all of these problems by introducing the `ErrorType` and `ErrorExpr` classes.
If we can't determine a correct type for an expression (say, because it has an error) then we will assign it the type `ErrorType`.
If we can't reasonably form an expression to return *at all* then we will return an `ErrorExpr` (which has type `ErrorType`).
These classes are designed to make sure that subsequent code won't crash on them (since we have non-null objects), but to help avoid cascading errors.
Some semantic checking logic will detect `ErrorType`s on sub-expressions and skip its own checking logic (e.g., this happens for function overload resolution), producing an `ErrorType` further up.
In other cases, expressions with `ErrorType` can be silently consumed.
For example, an erroneous expression is implicitly convertible to *any* type, which means that assignment of an error expression to a local variable will always succeed, regardless of the variable's type.
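The anti-cascading behavior can be modeled in a few lines (hypothetical types, not the real checker): an operation with an erroneous operand propagates `Error` without emitting a new diagnostic.

```cpp
#include <string>

struct CheckResult
{
    std::string type;
    int newDiagnostics; // errors reported while checking this node
};

// Toy check for a binary `+`: errors propagate silently; only a genuine
// mismatch between two well-typed operands reports a new diagnostic.
CheckResult checkAdd(const CheckResult& a, const CheckResult& b)
{
    if (a.type == "Error" || b.type == "Error")
        return { "Error", 0 }; // propagate silently: no cascading error
    if (a.type != b.type)
        return { "Error", 1 }; // a genuine new error, reported once
    return { a.type, 0 };
}
```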
### Overload Resolution
One of the most involved parts of expression checking is overload resolution, which occurs when there is an expression of the form `f(...)` where `f` could refer to multiple function declarations.
Our basic approach to overload resolution is to iterate over all the candidates and add them to an `OverloadResolveContext`.
The context is responsible for keeping track of the "best" candidate(s) seen so far.
Traditionally, a language defines rules for which overloads are "better" than others, focusing only on candidates that actually apply to the call site.
This is the right way to define language semantics, but it can produce sub-optimal diagnostics when *no* candidate was actually applicable.
For example, suppose the user wrote `f(a,b)` and there are 100 functions named `f`, but none works for the argument types of `a` and `b`.
A naive approach might just say "no overload applicable to arguments with such-and-such types."
A more advanced compiler might try to list all 100 candidates, but that wouldn't be helpful.
If it turns out that of the 100 candidates, only 10 of them have two parameters, then it might be much more helpful to list only the 10 candidates that were even remotely applicable at the call site.
The Slang compiler strives to provide better diagnostics on overload resolution by breaking the checking of a candidate callee into multiple phases, and recording the earliest phase at which a problem was detected (if any).
Candidates that made it through more phases of checking without errors are considered "better" than other candidates, even if they ultimately aren't applicable.
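This ranking can be sketched as follows (the phases and names here are hypothetical, not the actual `OverloadResolveContext` API): diagnostics list only the candidates that survived the most phases.

```cpp
#include <algorithm>
#include <vector>

// Phases a candidate passes through, in order.
enum Phase { ArityCheck = 0, TypeCheck = 1, Applicable = 2 };

struct Candidate
{
    const char* name;
    Phase reached; // the last phase this candidate made it through
};

// Keep only the candidates that got furthest through checking.
std::vector<Candidate> bestCandidates(const std::vector<Candidate>& all)
{
    Phase best = ArityCheck;
    for (const auto& c : all)
        best = std::max(best, c.reached);
    std::vector<Candidate> out;
    for (const auto& c : all)
        if (c.reached == best)
            out.push_back(c);
    return out;
}
```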
### Type Conversions
Conversion of values from one type to another can occur both explicitly (e.g., `(int) foo`) and implicitly (e.g., `while(foo)` implicitly converts `foo` to a `bool`).
Type conversion is also tied into overload resolution, since some conversions get ranked as "better" than others when deciding between candidates (e.g., converting an `int` to a `float` is preferred over converting it to a `double`).
We try to bottleneck all kinds of type conversion through a single code path so that the various kinds of conversion can be handled equivalently.
### L-Values
An *l-value* is an expression that can be used as the destination of an assignment, or for read-modify-write operations.
We track the l-value-ness of expressions using `QualType` which basically represents a `Type` plus a bit to note whether something is an l-value or not.
(This type could eventually be compressed down to be stored as a single pointer, but we haven't gotten to that yet.)
We do not currently have a concept like the `const` qualifier in C/C++ that would be visible to the language user.
Propagation of l-value-ness is handled in an ad hoc fashion in the small number of expression cases that can ever produce l-values.
The default behavior is that expressions are not l-values and the implicit conversion from `Type` to `QualType` reflects this.
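A sketch of the `QualType` shape described here (hypothetical and simplified, not the real class): a type plus an l-value bit, where the implicit conversion from a plain type defaults to not-an-l-value.

```cpp
struct Type
{
    const char* name;
};

struct QualType
{
    const Type* type;
    bool isLeftValue;

    // Implicit conversion from a plain type: not an l-value by default.
    QualType(const Type* t) : type(t), isLeftValue(false) {}
    QualType(const Type* t, bool lv) : type(t), isLeftValue(lv) {}
};
```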
Checking Statements
-------------------
Checking of statements is relatively simpler than checking expressions.
Statements do not produce values, so they don't get assigned types/classifiers.
We do not currently have cases where a statement needs to be transformed into an elaborated form as part of checking (e.g., to make implicit behavior explicit), so statement checking operates "in place" rather than optionally producing new AST nodes.
The most interesting part of statement checking is that it requires information about the lexical context.
Checking a `return` statement requires knowing the surrounding function and its declared result type.
Checking a `break` statement requires knowing about any surrounding loop or `switch` statements.
We represent the surrounding function explicitly on the `SemanticsStmtVisitor` type, and also use a linked list of `OuterStmtInfo` threaded up through the stack to track lexically enclosing statements.
Note that semantic checking of statements at the AST level does *not* encompass certain flow-sensitive checks.
For example, the logic in `slang-check-stmt.cpp` does not check for or diagnose any of:
* Functions that fail to `return` a value along some control flow paths
* Unreachable code
* Variables used without being initialized first
All of the above are instead intended to be handled at the IR level (where dataflow analysis is easier) during the "mandatory" optimization passes that follow IR lowering.
Checking Declarations
---------------------
Checking of declarations is the most complicated and involved part of semantic checking.
### The Problem
Simple approaches to semantic checking of declarations fall into two camps:
1. One can define a total ordering on declarations (usually textual order in the source file) and only allow dependencies to follow that order, so that checking can follow the same order. This is the style of C/C++, which is inherited from the legacy of traditional single-pass compilers.
2. One can define a total ordering on *phases* of semantic checking, so that every declaration in the file is checked at phase N before any is checked at phase N+1. E.g., the types of all variables and functions must be determined before any expressions that use those variables/functions can be checked. This is the style of, e.g., Java and C#, which put a premium on defining context-free languages that don't dictate order of declaration.
Slang tries to bridge these two worlds: it has inherited features from HLSL that were inspired by C/C++, while it also strives to support out-of-order declarations like Java/C#.
Unsurprisingly, this leads to unique challenges.
Supporting out-of-order declarations means that there is no simple total order on declarations (we can have mutually recursive function or type declarations), and supporting generics with value parameters means there is no simple total order on phases.
For that last part observe that:
* Resolving an overloaded function call requires knowing the types of the parameters for candidate functions.
* Determining the type of a parameter requires checking type expressions.
* Type expressions may contain value arguments to generics, so checking type expressions requires checking value expressions.
* Value expressions can include function calls (e.g., operator invocations), which then require overload resolution to type-check.
### The Solution
Our declaration checking logic takes the idea of phase-based checking as a starting point, but instead of a global ordering on phases we use a per-declaration order.
Each declaration in the Slang AST will have a `DeclCheckState` that represents "how checked" that declaration is.
We can apply semantic checking logic to a declaration `D` to raise its state to some desired state `S`.
By default, the logic in `slang-check-decl.cpp` will do a kind of "breadth-first" checking strategy where it will try to raise all declarations to the one state before moving on to the next.
In many cases this will reproduce the behavior of a Java or C#-style compiler with strict phases.
The main difference for Slang is that whenever, during the checking of some declaration `D`, we discover that we need information from some other declaration `E` that would depend on `E` being in state `S`, we manually call a routine `ensureDecl(E,S)` whose job is to ensure that `E` has been checked enough for us to proceed.
The `ensureDecl` operation will often be a no-op, if the declaration has already been checked previously, but in cases where the declaration *hasn't* been checked yet it will cause the compiler to recursively re-enter semantic checking and try to check `E` until it reaches the desired state.
In pathological cases, this method can result in unbounded recursion in the type checker. The breadth-first strategy helps to make such cases less likely, and introducing more phases to semantic checking can also help reduce problems.
In the long run we may need to investigate options that don't rely on unbounded recursion.
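The demand-driven strategy above can be sketched in a few lines. This is a minimal illustrative model, *not* the actual Slang implementation: the `Decl` and `DeclCheckState` stand-ins below are hypothetical, and exist only to show how `ensureDecl` is a no-op for already-checked declarations, recurses for unchecked ones, and can detect circularity.

```python
# Illustrative sketch of demand-driven declaration checking.
from enum import IntEnum

class DeclCheckState(IntEnum):
    UNCHECKED = 0
    SIGNATURE_CHECKED = 1
    BODY_CHECKED = 2

class Decl:
    def __init__(self, name, dependencies=()):
        self.name = name
        self.state = DeclCheckState.UNCHECKED
        self.being_checked = False
        # (other decl, state we need it in) pairs, discovered during checking
        self.dependencies = list(dependencies)

def ensure_decl(decl, needed_state):
    if decl.state >= needed_state:
        return  # already checked far enough: no-op
    if decl.being_checked:
        raise RuntimeError(f"circular dependency involving '{decl.name}'")
    decl.being_checked = True
    for dep, dep_state in decl.dependencies:
        ensure_decl(dep, dep_state)  # recursively re-enter checking
    decl.state = needed_state
    decl.being_checked = False

# A function `f` whose body needs type `T`'s signature checked first.
t = Decl("T")
f = Decl("f", dependencies=[(t, DeclCheckState.SIGNATURE_CHECKED)])
ensure_decl(f, DeclCheckState.BODY_CHECKED)
assert t.state == DeclCheckState.SIGNATURE_CHECKED
assert f.state == DeclCheckState.BODY_CHECKED
```

In the real compiler the "dependencies" are not a stored list; they are discovered as checking logic calls `ensureDecl` on whatever it touches.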
### The Rules
As a programmer contributing to the semantic checking infrastructure, the declaration-checking strategy requires following a few rules:
* If a piece of code is about to rely on some property of a declaration that might be null/absent/wrong if checking hasn't been applied, it should use `ensureDecl` to make sure the declaration in question has been checked enough for that property to be available.
* If adding some `ensureDecl`s leads to an internal compiler error because of circularity in semantic checking, then either the `ensureDecl`s were misplaced, or they were too strong (you asked for more checking than was necessary), or in the worst case we need to add more phases (more `DeclCheckState`s) to separate out the checking steps and break the apparent cycle.
* In very rare cases, semantic checking for a declaration may want to use `SetCheckState` to update the state of the declaration itself before recursively `ensureDecl`ing its child declarations, but this must be done carefully, because it means you are claiming that the declaration is in some state `S` while not having completed the checking that is associated with state `S`.
* It should *never* be necessary to modify `checkModuleDecl` so that it performs certain kinds of semantic analysis on certain declarations before others (e.g., iterate over all the `AggTypeDecl`s before all the `FuncDecl`s). If you find yourself tempted to modify it in such a way, then add more `DeclCheckState`s to reflect the desired ordering. It is okay to have phases of checking that only apply to a subset of declarations.
* Every statement and expression/term should be checked once and only once. If something is being checked twice and leading to failures, the right thing is to fix the source of the problem in declaration checking, rather than make the expression/statement checking be defensive against this case.
Name Lookup
-----------
Lookup is the process of resolving the contextual meaning of names, either in a lexical scope (e.g., the user wrote `foo` in a function body - what does it refer to?) or in the scope of some type (e.g., the user wrote `obj.foo` for some value `obj` of type `T` - what does it refer to?).
Lookup can be tied to semantic analysis quite deeply.
In order to know what a member reference like `obj.foo` refers to, we not only need to know the type of `obj`, but we may also need to know what interfaces that type conforms to (e.g., it might be a type parameter `T` with a constraint `T : IFoo`).
In order to support lookup in the presence of our declaration-checking strategy described above, the lookup logic may be passed a `SemanticsVisitor` that it can use to `ensureDecl()` declarations before it relies on their properties.
However, lookup also currently gets used during parsing, and in those cases it may need to be applied without access to the semantics-checking infrastructure (since we currently separate parsing and semantic analysis).
In those cases a null `SemanticsVisitor` is passed in, and the lookup process will avoid using lookup approaches that rely on derived semantic information.
This is fine in practice because the main things that get looked up during parsing are names of `SyntaxDecl`s (which are all global) and global type/function/variable names.
Known Issues
------------
The largest known issue for the semantic checking logic is that there are currently dependencies between parsing and semantic checking.
Just like a C/C++ parser, the Slang parser sometimes needs to disambiguate whether an identifier refers to a type or value to make forward progress, and that would in general require semantic analysis.
Ideally the way forward is some combination of the following two strategies:
* We should strive to make parsing at the "global scope" fully context-insensitive (e.g., by using similar lookahead heuristics to C#). We are already close to this goal today, but will need to be careful that we do not introduce regressions compared to the old parser (perhaps a "compatibility" mode for legacy HLSL code is needed?)
* We should delay the parsing of nested scopes (both function and type bodies bracketed with `{}`) until later steps of the compiler. Ideally, parsing of function bodies can be done in a context-sensitive manner that interleaves with semantic checking, closer to the traditional C/C++ model (since we don't care about out-of-order declarations in function bodies).

Serialization
=============
Slang's infrastructure for serialization is currently in flux, so there exist a mixture of different subsystems, using a mixture of different techniques.
This document is currently minimal, and primarily serves to provide a replacement for an older draft that no longer reflects the state of the codebase.
The Fossil Format
=================
The "fossil" format is a memory-mappable binary format for general-purpose serialization.
Goals
-----
The main goals of the fossil format are:
* Data can be read from memory as-is.
* Basic types are stored at offsets that are naturally aligned (e.g., a 4-byte integer is 4-byte aligned)
* Pointers are encoded as relative offsets, and can be traversed without any "relocation" step after data is loaded.
* Supports general-purpose data, including complicated object graphs.
* Data can include embedded layout information, allowing code to traverse it without statically knowing the structure.
* Embedded layout information should support versioning; new code should be able to load old data by noticing what has/hasn't been encoded.
* Layout information is *optional*, and data can be traversed with minimal overhead by code that knows/assumes the layout
Top-Level Structure
-------------------
A serialized blob in fossil format starts with a header (see `Slang::Fossil::Header`), which in turn points to the *root value*.
All other data in the blob should be reachable from the root value, and an application can choose to make the root value whatever type they want (an array, structure, etc.).
Encoding
--------
### Endian
All data is read/written in the endianness of the host machine.
There is currently no automatic support for encoding endian-ness as part of the format; a byte-order mark should be added if we ever need to support big-endian platforms.
### Fixed-Size Types
#### Basic Types
Basic types like fixed-width integers and floating-point numbers are encoded as-is.
That is, an N-byte value is stored directly as N bytes of data with N-byte alignment.
A Boolean value is encoded as an 8-bit unsigned integer holding either zero or one.
#### Pointers
A pointer is encoded as a 4-byte signed integer, representing a relative offset.
If the relative offset value is zero, then the pointer is null.
Otherwise, the relative offset value should be added to the offset of the pointer itself, to get the offset of the target.
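The decoding rule above can be sketched as follows. This is an illustrative model in Python, not Slang code; it assumes a little-endian host for the 4-byte read (the format itself uses host endianness, as noted above).

```python
import struct

def read_fossil_pointer(blob, pointer_offset):
    """Decode a fossil-style relative pointer stored at `pointer_offset`.

    Returns the absolute offset of the target, or None for a null pointer.
    """
    # A pointer is a 4-byte signed integer holding a relative offset.
    (rel,) = struct.unpack_from("<i", blob, pointer_offset)
    if rel == 0:
        return None  # a zero relative offset encodes null
    # Otherwise the target is at (offset of the pointer) + (relative offset).
    return pointer_offset + rel

# A pointer at offset 4 whose target lives at offset 12: stored value is +8.
blob = bytearray(16)
struct.pack_into("<i", blob, 4, 8)
assert read_fossil_pointer(blob, 4) == 12
assert read_fossil_pointer(blob, 0) is None  # bytes are zero: null pointer
```

Because targets are reached by adding the relative offset at read time, no relocation pass is needed after the blob is mapped into memory.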
#### Optionals
An optional value of some type `T` (e.g., the equivalent of a `std::optional<T>`) is encoded as a pointer to a `T`.
If the pointer is null, the optional has no value; otherwise the value is stored at the offset being pointed to.
Note that when encoding a pointer to an optional (`std::optional<T> *`) or an optional pointer (`std::optional<T*>`), there will be two indirections.
#### Records
Things that are conceptually like a `struct` or tuple are encoded as *records*, which are simply a sequence of *fields*.
The alignment of a record is the maximum alignment of its fields.
Fields in a record are laid out sequentially, where each field gets the next suitably-aligned offset after the preceding field.
No effort is made to fill in "gaps" left by preceding fields.
Note: currently the size of a record is *not* rounded up to be a multiple of its alignment, so it is possible for one field to be laid out in the "tail padding" of the field before it.
This behavior should probably be changed, so that the fossilized layout better matches what C/C++ compilers tend to do.
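The field-layout rule above can be sketched as follows (an illustrative model, not Slang code): each field is placed at the next offset suitably aligned after the previous field, gaps are never back-filled, and the total size is not rounded up.

```python
def layout_record(field_sizes_and_aligns):
    """Compute sequential field offsets the way the fossil format does.

    Takes a list of (size, alignment) pairs; returns the field offsets,
    the record size, and the record alignment (max of field alignments).
    """
    offset = 0
    offsets = []
    max_align = 1
    for size, align in field_sizes_and_aligns:
        # Round up to the next suitably-aligned offset.
        offset = (offset + align - 1) & ~(align - 1)
        offsets.append(offset)
        offset += size
        max_align = max(max_align, align)
    # Note: the size is *not* rounded up to a multiple of max_align.
    return offsets, offset, max_align

# A record { u8; u32; u16 }: the u32 is padded to offset 4.
offsets, size, align = layout_record([(1, 1), (4, 4), (2, 2)])
assert offsets == [0, 4, 8]
assert size == 10  # not rounded up to a multiple of the alignment (4)
assert align == 4
```

The missing tail rounding is exactly why one field can land in the "tail padding" of a preceding record-typed field, as noted above.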
### Variable-Size Types
Types where different instances may consume a different number of bytes may be encoded either *inline* or *indirectly*.
If a variable-size type `V` is being referred to by a pointer or optional (e.g., `V*` or `std::optional<V>`), then it will be encoded inline as the target address of that pointer/optional.
In all other contexts, including when a `V` is used as a field of a record, it will be encoded indirectly (conceptually, as if the field was actually a `V*`).
When a variable-size type is encoded indirectly, a null pointer should be interpreted as an empty instance of the type `V`.
#### Arrays
An array of `T` is encoded as a sequence of `T` values, separated by the *stride* of `T` (the size of `T` rounded up to the alignment of `T`).
The offset of the array is the offset of its first element.
The number of elements in the array is encoded as a 4-byte unsigned integer stored immediately *before* the offset of the array itself.
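A minimal sketch of this array encoding, using 4-byte unsigned elements for illustration (little-endian assumed; an illustrative model, not the actual fossil reader/writer):

```python
import struct

def encode_fossil_array(values):
    """Encode a fossil-style array of 4-byte unsigned ints.

    The element count is a 4-byte unsigned integer stored immediately
    *before* the first element. Returns (blob, array_offset), where
    array_offset is the offset of the first element.
    """
    count = struct.pack("<I", len(values))
    body = b"".join(struct.pack("<I", v) for v in values)
    return count + body, len(count)

def decode_fossil_array(blob, array_offset):
    # The count lives in the 4 bytes just before the array offset.
    (count,) = struct.unpack_from("<I", blob, array_offset - 4)
    return [struct.unpack_from("<I", blob, array_offset + 4 * i)[0]
            for i in range(count)]

blob, off = encode_fossil_array([10, 20, 30])
assert decode_fossil_array(blob, off) == [10, 20, 30]
```

Storing the count just before the elements means a pointer to an array points directly at element zero, so code that already knows the length can index without any extra indirection.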
#### Strings
A string is encoded in the same way that an array of 8-bit bytes would be (including the count stored before the first element).
The only additional detail is that the serialized data *must* include an additional nul byte after the last element of the string.
The data of a string is assumed to be in UTF-8 encoding, but there is nothing about the format that validates or enforces this.
#### Dictionaries
A dictionary with keys of type `K` and values of type `V` is encoded in the same way as an array of `P`, where `P` is a two-element tuple of a `K` and a `V`.
There is currently no provision made for efficient lookup of elements of a fossilized dictionary.
#### Variants
A *variant* is a fossilized value that can describe its own layout.
The content of variant holding a value of type `T` is encoded exactly as a record with one field of type `T` would be, starting at the offset of the variant itself.
The four bytes immediately preceding a variant store a relative pointer to the fossilized layout for the type `T` of the content.
### Layouts
Every layout starts with a 4-byte unsigned integer that holds a tag representing the kind of layout (see `Slang::FossilizedValKind`).
The value of the tag determines what, if any, information appears after the tag.
In any place where a relative pointer to a layout is expected, a null pointer may be used to indicate that the relevant layout information is either unknown, or was elided from the fossilized data.
#### Pointer-Like Types
For pointers (`T*`) and optionals (`Optional<T>`), the tag is followed by a relative pointer to a layout for `T`.
#### Container Types
For arrays and dictionaries, the tag is followed by:
* A relative pointer to a layout for the element type
* A 4-byte unsigned integer holding the stride between elements
#### Record Types
For records, the tag is followed by:
* A 4-byte unsigned integer holding the number of fields, `N`
* `N` 8-byte values representing the fields, each comprising:
* A relative pointer to the type of the field
* A 4-byte unsigned integer holding the offset of that field within the record
The RIFF Support Code
=====================
There is code in `source/core/slang-riff.{h,cpp}` that implements abstractions for reading and writing RIFF-structured files.
The current RIFF implementation is trying to be "correct" for the RIFF format as used elsewhere (e.g., for `.wav` files), but it is unclear if this choice is actually helping us rather than hurting us.
It is likely that we will want to customize the format if we keep using (e.g., at the very least increase the minimum alignment of chunks).
RIFF is a simple chunk-based file format that is used by things like WAV files, and has inspired many similar container formats used in media/games.
The RIFF structures are currently being used for a few things:
* The top-level structure of serialized files for slang modules, "module libraries". This design choice is being utilized so that the compiler can navigate the relevant structures and extract the parts it needs (e.g., just the digest of a module, but not the AST or IR).
* Repro files are using a top-level RIFF container, but it is just to encapsulate a single blob of raw data (with internal offset-based pointers)
* The structure of the IR and `SourceLoc` serialization formats uses RIFF chunks for their top-level structure, but doesn't really make use of the ability to navigate them in memory or perform random access.
* The actual serialized AST format is currently a deep hierarchy of RIFF chunks.
* There is also code for a RIFF-based hierarchical virtual file-system format, and that format is being used for the serialized core module (seemingly just because it includes support for LZ4; the actual "file system" that gets serialized seems to only have a single file in it).
General-Purpose Hierarchical Data Serialization
===============================================
The code in `source/slang/slang-serialize.{h,cpp}` implements a framework for serialization that is intended to be lightweight for users to adopt, while also scaling to more complicated cases like our AST serialization.
In the simplest cases, all a programmer needs to know is that if they have declared a type like:
struct MyThing
{
float f;
List<OtherThing> others;
SomeObject* obj;
};
then they can add serialization support for their type by writing a function like:
void serialize(Serializer const& serializer, MyThing& value)
{
SLANG_SCOPED_SERIALIZER_STRUCT(serializer);
serialize(serializer, value.f);
serialize(serializer, value.others);
serialize(serializer, value.obj);
}
If the `OtherThing` and `SomeObject` types were already set up with their own serialization support, then that should be all that's needed.
Of course there's a lot more to it once you get into the details and the difficult cases.
For now, looking at `source/slang/slang-serialize.h` is probably the best way to learn more about the approach.
One key goal of this serialization system is that it allows the serialized format to be swapped in and out without affecting the per-type `serialize` functions.
Currently there are only a small number of implementations.
RIFF Serialization
------------------
The files `slang-serialize-riff.{h,cpp}` provide an implementation of the general-purpose serialization framework that reads/writes RIFF files with a particular kind of structure, based on what had previously been hard-coded for use in serializing the AST to RIFF.
In practice this representation is kind of like an encoding of JSON as RIFF chunks, with leaf/data chunks for what would be leaf values in JSON, and container chunks for arrays and dictionaries (plus other aggregates that would translate into arrays or dictionaries in JSON).
Fossil Serialization
--------------------
The files `slang-serialize-fossil.{h,cpp}` provide an implementation of the general-purpose serialization framework that reads/writes the "fossil" format, which is described earlier in this document.
AST Serialization
=================
AST serialization is implemented as an application of the general-purpose framework described above.
There is an `ASTSerializer` type that expands on `Serializer` to include the additional context that is needed for handling AST-related types like `SourceLoc`, `Name`, and the `NodeBase` hierarchy.
The Old Serialization System
============================
The old serialization system has largely been removed, but some vestiges of it are still noticeable.
There was an older serialization system in place that made use of an extensive RTTI system that types had to be registered with, plus a set of boilerplate macros for interfacing with that system that were generated from the C++ declarations of the AST node types.
That system was also predicated on the idea that to serialize a user C++ type `Foo`, one would also hand-author a matching C++ type `SerialFooData`, and then write code to translate a `Foo` to/from a `SerialFooData` plus code to read/write a `SerialFooData` from the actual serialized data format.
The IR and `SourceLoc` serialization approaches are currently still heavily influenced by the old serialization system, and there are still vestiges of the RTTI infrastructure that was introduced to support it.
The hope is that as more subsystems are ported to use newer approaches to serialization, this code can all be eliminated.
The following sections are older text that describes some of the formats that have not yet been revisited.
IR Serialization
----------------
This mechanism is *much* simpler than general serialization, because by design the IR types are very homogeneous in style. There are a few special cases, but in general an instruction consists of
* Its type
* A SourceLoc
* 0 or more operands.
* 0 or more children.
Within the IR, instructions hold pointers to `IRInst`-derived types. As previously discussed, serializing pointers directly is generally not a good idea, so to work around this the pointers are turned into 32-bit indices. Additionally, we know that an instruction can belong to at most one other instruction.
When serializing out, special handling is applied to child instructions: their indices are made to be a contiguous range for all instructions that belong to each parent. The indices are ordered the same way as the children are held in the parent. With this mechanism it is not necessary to directly save off the indices that belong to a parent, only the range of indices.
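The contiguous-range scheme can be illustrated with a toy tree of instructions (plain dicts standing in for `IRInst`; a breadth-first numbering is one way to get the contiguity property, used here purely for illustration):

```python
from collections import deque

def assign_indices(root):
    """Assign each instruction an index such that the children of any
    parent occupy one contiguous range, in child order."""
    indices = {}
    ranges = {}
    next_index = 1  # index 0 could be reserved for "null"
    indices[id(root)] = next_index
    next_index += 1
    queue = deque([root])
    while queue:
        inst = queue.popleft()
        first = next_index
        for child in inst.get("children", []):
            indices[id(child)] = next_index
            next_index += 1
            queue.append(child)
        # Only this (first, past-the-end) range needs to be serialized,
        # not each individual child index.
        ranges[id(inst)] = (first, next_index)
    return indices, ranges

a, b, c = {"children": []}, {"children": []}, {"children": []}
root = {"children": [a, b]}
a["children"] = [c]
indices, ranges = assign_indices(root)
assert ranges[id(root)] == (2, 4)  # children a, b -> indices 2..3
assert ranges[id(a)] == (4, 5)     # child c -> index 4
```

Because children are enqueued together when their parent is visited, each parent's children receive consecutive indices, which is exactly what makes the range encoding sufficient.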
The actual serialization mechanism is similar to the generalized mechanism - referenced objects are saved off in order of their indices. What is different is that the encoding fixes the size of an instruction in `IRSerialData`. This can hold up to two operands; if the instruction has more than two operands, then one of the UInt32s is the operand count and the other is an offset to a list of operands. It probably makes sense to alter this in the future to stream the instruction's payload directly.
IR serialization allows a simple compression mechanism, which works because much of the serialized IR data consists of UInt32 values that can use a variable-byte encoding.
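A standard variable-byte ("varint") scheme of the kind described - 7 payload bits per byte, with the high bit as a continuation flag - can be sketched as follows. This illustrates the general technique, not Slang's exact byte format:

```python
def encode_varint(value):
    """Variable-byte encode a UInt32: 7 payload bits per byte, with the
    high bit set on every byte except the last."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data, offset=0):
    """Decode a varint starting at `offset`; returns (value, next_offset)."""
    result = shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, offset
        shift += 7

# Small values (the common case for IR indices) take a single byte.
assert encode_varint(5) == b"\x05"
assert decode_varint(encode_varint(300))[0] == 300
```

The payoff is that the many small indices and opcodes in the IR shrink to one byte each, while rare large values still round-trip losslessly.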
SourceLoc Serialization
-----------------------
SourceLoc serialization presents several problems. Firstly we have two distinct serialization mechanisms that need to use it - IR serialization and generalized serialization. That being the case it cannot be saved directly in either, even though it may be referenced by either.
To keep things simple for now we build up SourceLoc information for both IR and general serialization via their writers adding their information into a SerialSourceLocWriter. Then we can save this information into a RIFF section, that can be loaded before either general or IR deserialization is used.
When reading, the SourceLoc information has to be located and deserialized before any AST or IR deserialization. The SourceLoc data can then be turned into a `SerialSourceLocReader`, which is either set on the `SerialReader`'s `SerialExtraObjects`, or passed to the `IRSerialReader`.

Core Module Intrinsics
======================
This document aims to cover a variety of systems used to add target-specific features. They are used most extensively in the Slang core module.
**NOTE!** These features should *not* be considered stable! They can be used in regular Slang code to add features, but they risk breaking with any Slang version change. Additionally, the implementation of these features can be very particular to what is required for a specific feature set, so they might not work as expected in all scenarios.
As these features are in flux, it is quite possible this document is behind the current features available within the Slang code base.
If you want to add support for a feature for a target to Slang, implementing it as part of the Slang standard modules is typically a good way to progress. Depending on the extension/feature, it may not be possible to add support via changes to the standard modules alone. That said, most support for target-specific extensions and features involves at least some changes to the Slang standard modules (including the core module), typically using the mechanisms described here.
## Core Module
The main place these features are used is within the Slang core module. This is implemented with a set of Slang files within the Slang project:
* core.meta.slang
* hlsl.meta.slang
* diff.meta.slang
Looking at these files will demonstrate the features in use.
Most of the intrinsics and attributes have names that indicate that they are not for normal use. This is typically via a `__` prefix.
The `.meta.slang` files look largely like Slang source files, but their contents can also be generated programmatically with C++ code. A section of code can drop into C++ if it is introduced by `${{{{`. The C++ section is closed with a matching `}}}}`. This mechanism is typically used to generate different versions of a similar code sequence. Values from the C++ code can be accessed via `$()`, where the contents of the brackets specify something that can be calculated within the C++ code.
As an example of mixing C++ and Slang code, we could write...
```slang
// Slang code
${{{{
// C++ code, calling out to a C++ function getTime, the result is held in variable time
int cppTime = getTime();
}}}}
// Back to Slang code, can access the C++ variable previously defined as cppTime. Due to $().
// The code inside the $() is executed on the C++ side, so can do calculations. In practice it would be easier
// to just use call $(getTime() + 1), but this demonstrates variables are accessible.
int slangTime = $(cppTime + 1);
```
# Attributes
## [__readNone]
A `[__readNone]` indicates a function that computes its results strictly based on argument values, without reading or writing through any pointer arguments, or any other state that could be observed by a caller.
## [__NoSideEffect]
Specifies a function declaration has no observable side effects.
## [__unsafeForceInlineEarly]
Inlines the contained code, but does so at a very early stage. Inlining earlier allows some kinds of inlining transformations to work that wouldn't work with regular inlining. It also means it must be used with *care*, because it may produce unexpected results in more complex scenarios.
## [__NonCopyableType]
Marks a type as non-copyable, causing the SSA pass to skip turning variables of the type into SSA values.
## [__AlwaysFoldIntoUseSiteAttribute]
A call to the decorated function should always be folded into its use site.
## [KnownBuiltin("name")]
A `[KnownBuiltin("name")]` attribute allows the compiler to identify this declaration during compilation, despite obfuscation or optimizations that remove linkage.
# Intrinsics
<a id="target-intrinsic"></a>
## __target_intrinsic(target, expansion)
This is a widely used and somewhat complicated intrinsic. Placed on a declaration, it describes how the declaration should be emitted for a target. The complexity is that `expansion` is applied via a variety of rules. `target` is a "target capability"; commonly it's just the emit target for the intrinsic, so one of...
* hlsl
* glsl
* cuda - CUDA
* cpp - C++ output (used for exe, shared-library or host-callable)
* spirv - Used for Slang's direct-to-SPIR-V mechanism
A function definition can have a `target_intrinsic` *and* a body. In that case, the body will be used for targets where the `target_intrinsic` isn't defined.
If the intrinsic can be emitted as is, the expansion need not be specified. If only the *name* needs to be changed (the parameters can be passed as is), only the name to expand to needs to be specified, *without* `()`. In this scenario it is not necessary to specify it as a string in quotes; just the identifier name can be used.
Currently HLSL gets special handling, in that it is *assumed* that if a declaration exists, it can be emitted verbatim to HLSL.
The target can also be a capability atom. The atoms are listed in "slang-capability-defs.h".
What is important here is that some features for a specific target can have multiple ways of achieving the same effect - for example "GL_NV_ray_tracing" and "GL_EXT_ray_tracing" are two different ray tracing extensions available for Vulkan through GLSL. The `-profile` option can disambiguate which extension is actually desired, and the capability with that name on the `target_intrinsic` specifies how to implement that feature for that specific extension.
The expansion mechanism is implemented in "slang-intrinsic-expand.cpp" which will be most up to date.
The `expansion` value can be a string or an identifier. If it is an identifier, it will just be emitted as is replacing the name of the declaration the intrinsics is associated with.
Sections of the `expansion` string that are to be replaced are prefixed by the `$` sigil.
* $0-9 - Indicates the parameter at that index. For a method call $0 is `this`.
* $T0-9 - The type for the param at the index. If the type is a texture resource derived type, returns the *element* type.
* $TR - The return type
* $G0-9 - Replaced by the type/value at that index of specialization
* $S0-9 - The scalar type of the generic at the index.
* $p - Used on texturing operations. Produces the combined texture sampler arguments as needed for GLSL.
* $C - The $C intrinsic is a mechanism to change the name of an invocation depending on if there is a format conversion required between the type associated by the resource and the backing ImageFormat. Currently this is only implemented on CUDA, where there are specialized versions of the RWTexture writes that will do a format conversion.
* $E - Sometimes accesses need to be scaled. For example in CUDA the x coordinate for surface access is byte addressed. $E will return the byte size of the *backing element*.
* $c - When doing texture access in GLSL the result may need to be cast. In particular, if the underlying texture is 'half' based, GLSL only accesses (read/write) as float, so we need to cast to a half type on output. When storing into a texture it is still the case that the value written must be half - but we don't need to do any casting there, as half is coerced to float without a problem.
* $z - If we are calling a D3D texturing operation in the form t.Foo(s, ...), where `t` is a Texture&lt;T&gt;, then this is the step where we try to properly swizzle the output of the equivalent GLSL call into the right shape.
* $N0-9 - Extract the element count from a vector argument so that we can use it in the constructed expression.
* $V0-9 - Take an argument of some scalar/vector type and pad it out to a 4-vector with the same element type (this is the inverse of `$z`).
* $a - We have an operation that needs to lower to either `atomic*` or `imageAtomic*` for GLSL, depending on whether its first operand is a subscript into an array. This `$a` is the first `a` in `atomic`, so we will replace it accordingly.
* $A - We have an operand that represents the destination of an atomic operation in GLSL, and it should be lowered based on whether it is an ordinary l-value, or an image subscript. In the image subscript case this operand will turn into multiple arguments to the `imageAtomic*` function.
* $XP - Ray tracing ray payload
* $XC - Ray tracing callable payload
* $XH - Ray tracing hit object attribute
* $P - Type-based prefix as used for CUDA and C++ targets (I8 for int8_t, F32 - float etc)
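A toy model of just the numeric-argument substitution can make the mechanism concrete. This sketch handles only `$0`-`$9` with textual arguments; the real expansion in `slang-intrinsic-expand.cpp` handles all the sigils listed above:

```python
import re

def expand_intrinsic(template, args):
    """Replace `$0`-`$9` in an expansion template with the textual
    arguments at those indices (illustrative model only)."""
    return re.sub(r"\$(\d)", lambda m: args[int(m.group(1))], template)

# e.g. a definition like __target_intrinsic(glsl, "clamp($0, $1, $2)")
assert expand_intrinsic("clamp($0, $1, $2)", ["x", "lo", "hi"]) == "clamp(x, lo, hi)"
```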
## __attributeTarget(astClassName)
For an attribute, specifies the AST class (and derived class) the attribute can be applied to.
## __builtin
Identifies the declaration is being "builtin".
## __builtin_requirement(requirementKind)
A modifier that indicates a built-in associated type requirement (e.g., `Differential`). The requirement is one of `BuiltinRequirementKind`.
The requirement value can just be specified via the `$()` mechanism.
## __builtin_type(tag)
Specifies a builtin type - the integer value of one of the enumeration BaseType.
## __magic_type(clsName, tag)
Used before a type declaration. The clsName is the name of the class that is used to represent the type in the AST in Slang *C++* code. The tag is an optional integer value that provides additional information meaningful in the context of the class type.
## __intrinsic_type(op)
Used to specify the IR opcode associated with a type. The IR opcode is listed as something like `$(kIROp_HLSLByteAddressBufferType)`, which will expand to the integer value of the opcode (because the opcode value is an enum value that is visible from C++). It is possible to just write the opcode number, but that is generally inadvisable as the ids for ops are not stable. If a code change in Slang C++ adds or removes an opcode the number is likely to be incorrect.
As an example from the core module
```slang
__magic_type(HLSLByteAddressBufferType)
__intrinsic_type($(kIROp_HLSLByteAddressBufferType))
struct ByteAddressBuffer
{
// ...
};
```
# General
## __generic<>
Is an alternate syntax for specifying a declaration that is generic. The more commonly used form is to list the generic parameters in `<>` after the name of the declaration.
## attribute_syntax
Attribute syntax provides a mechanism to introduce an attribute type in Slang.
Right now the basic form is:
```
attribute_syntax [name(parmName: paramType, ...)] : syntaxClass;
```
There can be 0 or more parameters associated with the attribute; if there are none, the `()` is not needed.
* `name` gives the name of the attribute to define.
* `paramName` is the name of param that are specified with attribute use
* `paramType` is the type of the value associated with the param
* `syntaxClass` is the name of an AST node class that we expect this attribute to create when checked.
For example
```
__attributeTarget(FuncDecl)
attribute_syntax [CudaDeviceExport] : CudaDeviceExportAttribute;
```
Defines an attribute `CudaDeviceExport` which can only be applied to `FuncDecl` or derived AST types. Once semantically checked, it will be turned into a `CudaDeviceExportAttribute` attribute in the AST.
With a parameter
```
__attributeTarget(InterfaceDecl)
attribute_syntax [anyValueSize(size:int)] : AnyValueSizeAttribute;
```
Defines an attribute `anyValueSize` that can be applied to `InterfaceDecl` and derived types. It takes a single parameter called `size` of type `int`.
## Ref<T>
Allows returning or passing a value "by reference".
# GLSL/Vulkan specific
## __glsl_version(version)
Used to specify the GLSL version number that is required for the subsequent declaration. When Slang emits GLSL source, the version at the start of the file will be the largest version required by any code that is emitted.
For example
```slang
__glsl_version(430)
```
## __glsl_extension
Specifies the GLSL extension that is required for the declaration to work. When a declaration with this modifier is output to GLSL, an `#extension` directive is additionally added to the GLSL or SPIR-V output.
Multiple extensions can be applied to a declaration if applicable, for example when there are multiple implementations that can be emitted in the same manner (see the section on [target](#target-intrinsic) for more details).
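For example, a declaration can be marked as requiring an extension before it is usable from GLSL output. This is a hedged sketch: the specific extension name and function signature below are illustrative, not taken from the core module.

```slang
// Emitting code that uses this declaration will add a line like
// `#extension GL_EXT_shader_atomic_float : require` to the GLSL output.
__glsl_extension(GL_EXT_shader_atomic_float)
float myAtomicAddExample(RWStructuredBuffer<float> buf, float value);
```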
## __spirv_version
When a declaration with this modifier is used for a SPIR-V target, the highest value seen is taken to be the required SPIR-V version. For compilation through GLSLANG, the value is passed down to GLSLANG to specify which SPIR-V version is being targeted.
Example
```
__spirv_version(1.3)
```
## vk::spirv_instruction
Provides a way to use a limited amount of the `GL_EXT_spirv_intrinsics` extension.
```
vk::spirv_instruction(op, set)
```
`op` is the integer *value* for the op. The `set` is an optional string which specifies the instruction set the op is associated with.
For example
```
__specialized_for_target(glsl)
[[vk::spirv_instruction(1, "NonSemantic.DebugBreak")]]
void debugBreak();
```
# CUDA specific
## __cuda_sm_version
When a declaration with this intrinsic is used with a CUDA target, the highest shader model seen will be passed down to the downstream CUDA compiler (NVRTC).
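A hedged usage sketch, following the pattern of the other version modifiers (the version value and the declaration name are illustrative):

```slang
// Compiling code that uses this declaration tells the downstream
// CUDA compile (NVRTC) that at least SM 7.0 is required.
__cuda_sm_version(7.0)
void myWarpMatrixExample();
```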
# NVAPI
## [__requiresNVAPI]
If the declaration is reached during a compilation for an applicable target (D3D11/12), it indicates that [NVAPI support](../nvapi-support.md) is required for the declaration to work.
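For example (the declaration itself is illustrative, not an actual NVAPI binding):

```slang
// Using this function when targeting D3D11/12 requires the application
// to have NVAPI support configured; other targets are unaffected.
[__requiresNVAPI]
uint myShuffleExample(uint value, uint laneId);
```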
# Slang Compiler Diagnostic Guidelines
## Overview
The Slang compiler aims to provide clear, actionable, and user-friendly diagnostics that help developers quickly understand and fix issues in their code. These guidelines draw from best practices established by Rust, Clang, and Swift compilers while adapting them for Slang's specific needs.
## Diagnostic Structure
A complete diagnostic in Slang consists of:
```
error[E0000]: main error message
--> file.slang:LL:CC
|
LL | <code>
| ^^^^ primary label
|
LL | <related code>
| -------------- secondary label
|
= note: additional context without a span
= help: suggestion for fixing the issue
```
### Core Components
- **Level**: `error`, `warning`, `lint`, `remark` (plus attached `note`, `help`)
- **Error Code**: Optional identifier (e.g., `E0308`) for detailed documentation lookup
- **Message**: Concise description of the problem
- **Source Location**: File path, line, and column information
- **Code Snippet**: The affected code with visual indicators
- **Labels**: Primary and secondary spans with explanatory text
- **Sub-diagnostics**: Additional notes and suggestions
- **Documentation Links**: References to relevant language guide chapters
## Diagnostic Levels
### Error
Emitted when the compiler cannot proceed with compilation:
- Syntax errors
- Type mismatches that prevent code generation
- Unresolved symbols
- Constraint violations
- Missing interface implementations
### Warning
Emitted for problematic but compilable code:
- Deprecated feature usage
- Unused variables or imports
- Potentially incorrect but syntactically valid code
- Code that may behave unexpectedly
- Can be turned into errors with `-werror`
### Lint
Off-by-default style or clarity guidelines:
- Extraneous parentheses
- Style violations
- Code clarity improvements
### Note
Provides additional context for errors and warnings:
- Related code locations
- Explanations of why something failed
- References to relevant language rules
### Help
Offers actionable suggestions:
- How to fix the problem
- Alternative approaches
- Links to documentation
### Remark
Off-by-default informational messages:
- Optimization hints
- Compilation progress information
- Performance suggestions
- Code generation notes
## Writing Style Guidelines
### Message Content
1. **Be concise and precise**
- ❌ "The compiler failed to find a matching type"
- ✅ "type mismatch: expected `int`, found `string`"
2. **Use plain language**
- Avoid compiler jargon when possible
- Define technical terms when necessary
- Write for developers who may be new to the language
3. **Include relevant context**
```
error[E0277]: interface `IAddable` is not implemented for type `String`
--> file.slang:7:22
|
4 | interface IAddable { This add(This other); }
| ---------------------- required by this interface
5 | String s1 = "hello";
6 | String s2 = "world";
7 | String result = add(s1, s2);
| ^^^ `add` requires `IAddable` interface
```
### Grammar and Formatting
1. **No ending punctuation** for single-sentence messages
- ✅ ``cannot find type `Foo` in this scope``
- ❌ ``cannot find type `Foo` in this scope.``
2. **Use backticks** for code elements
- Types: `` `float4` ``, `` `Texture2D<float4>` ``
- Identifiers: `` `myVariable` ``
- Keywords: `` `interface` ``, `` `struct` ``
3. **Lowercase start** for messages
- ✅ `missing semicolon`
- ❌ `Missing semicolon`
4. **Active voice** when describing problems
- ✅ ``function `foo` takes 2 arguments but 3 were provided``
- ❌ ``3 arguments were provided but function `foo` takes 2``
5. **Use Oxford comma** in lists
- ✅ `` expected one of `int`, `float`, or `double` ``
- ❌ `` expected one of `int`, `float` or `double` ``
6. **Use correct articles** (a vs. an)
- ✅ `an interface`
- ✅ `a struct`
- ✅ ``an `IFoo` implementation``
- ❌ `a interface`
### Type Aliases and Underlying Types
When type aliases are involved, show the underlying type when it helps clarify the error:
```
error[E0308]: type mismatch
--> file.slang:10:23
|
10 | ColorRGBA color = 0.5;
| ^^^ expected `ColorRGBA` (aka `float4`), found `float`
```
Display options for controlling type alias expansion:
- `-show-type-aliases=always`: Always show "aka" annotations
- `-show-type-aliases=helpful`: Show only when it clarifies (default)
- `-show-type-aliases=never`: Never expand type aliases
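The kind of code that triggers the diagnostic above might look like the following sketch (not taken from the test suite):

```slang
// A type alias whose underlying type would be shown as an "aka" annotation
typealias ColorRGBA = float4;

void example()
{
    // error[E0308]: expected `ColorRGBA` (aka `float4`), found `float`
    ColorRGBA color = 0.5;
}
```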
## Error Codes
### Format
- Use a letter prefix followed by 5 digits: `E00001`, `W00001`
- Group related errors in ranges:
- **TBD**
### Documentation
**Each error code needs:**
- Brief description
- Links to documentation
**Optionally:**
- Common causes
- Example code that triggers the error
- Suggested fixes
## Suggestions and Fix-its
### Applicability Levels
1. **MachineApplicable**: Can be automatically applied
```
help: add missing semicolon
|
5 | return value;
| +
```
2. **HasPlaceholders**: Requires user input
```
help: specify the type explicitly
|
5 | let color: <type> = value;
| +++++++++
```
3. **MaybeIncorrect**: Suggestion might not be appropriate
```
help: consider adding the `[shader("compute")]` attribute
|
5 | [shader("compute")]
| +++++++++++++++++++
6 | void main() {
```
### Guidelines for Suggestions
- Provide fix-its only when confidence is high
- Show the exact change needed
- Use placeholders (`<type>`, `<name>`) when user input is required
- Prefer showing code transformations over textual descriptions
## Span and Location Information
### Primary Spans
- Point to the exact location of the error
- Keep spans as small as possible while remaining meaningful
- For multi-token constructs, highlight the most relevant part
### Secondary Spans
- Show related code that contributes to the error
- Use different labels to distinguish multiple spans
- Order spans by relevance, not just by source location
### Example
```
error[E0308]: type mismatch in function call
--> file.slang:10:11
|
8 | void expectInt(int x) { }
| ----- expected `int` here
9 |
10 | expectInt("hello");
| ^^^^^^^ found `string`
```
## Error Cascading Prevention
We shouldn't be generating many dependent errors from a single mistake.
We should at least be checking that there are no additional error messages in all our diagnostic tests. At the moment we generally only check for the presence of the tested diagnostic.
To avoid overwhelming users with follow-on errors:
1. **Stop type-checking** in a scope after critical type errors
2. **Mark symbols as poisoned** when their definition has errors
3. **Limit error propagation** from generic instantiation failures
4. **Track error origins** to suppress duplicate reports
Example:
```
error[E0412]: the type `MyTexture` is not defined
--> file.slang:5:5
|
5 | MyTexture tex;
| ^^^^^^^^^ type not found
|
= note: subsequent errors involving `tex` have been suppressed
```
## Diagnostic Priority and Limits
### Priority System
When multiple errors exist, show them in this order:
TBD
1. Syntax errors
2. Import/module errors
3. Type definition errors
4. Interface implementation errors
5. Type mismatch errors
6. Other semantic errors
7. Warnings
8. Remarks
### Error Limits
- Configurable via `-max-errors=N`
- Show message when limit reached:
```
error: aborting due to 20 previous errors; use `-max-errors=N` to see more
```
## Lint System
Lints are a good opportunity to attach fix-its for an LSP or LLM.
### Lint Naming
- Use snake_case
- Name should make sense with "allow": `allow unused_variables`
- Be specific about what is being checked
- Group related lints with common prefixes
### Lint Levels
1. **allow**: Off by default
2. **warn**: On by default, produces warnings
3. **deny**: On by default, produces errors
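A control syntax has not been settled; one hypothetical sketch, borrowing the attribute style used elsewhere in Slang, might be:

```slang
// Hypothetical: lower a lint's level for a single declaration.
// The actual attribute names and mechanism are TBD.
[allow(unused_variables)]
void scratchPad()
{
    int unusedTemp; // would otherwise trigger the `unused_variables` lint
}
```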
### Lint Groups
Define logical groups:
- **style**: Code formatting and naming conventions
- NON_CAMEL_CASE_NAMES
- NON_UPPER_CASE_CONSTANTS
- INCONSISTENT_SPACING
- **correctness**: Potential bugs or incorrect usage
- **performance**: Performance-related suggestions
## Special Diagnostic Features
### Generic Type Diffing
When dealing with complex generic types, highlight differences:
```
error[E0308]: type mismatch
= note: expected `RWStructuredBuffer<float4>`
found `RWStructuredBuffer<float3>`
^^^^^^ types differ here
```
### Macro Expansion Context
Show the expansion chain for errors in macros:
```
error[E0369]: invalid operation
--> file.slang:20:5
|
20 | MY_MACRO!(x + y);
| ^^^^^^^^^^^^^^^^^ in this macro invocation
|
::: macros.slang:5:10
|
5 | $left + $right
| ^ cannot add these types
```
### Similar Name Suggestions
```
error[E0425]: cannot find `printn` in scope
--> file.slang:5:5
|
5 | printn("hello");
| ^^^^^^ not found
|
= help: a similar function exists: `println`
help: did you mean `println`?
|
5 | println("hello");
| ~~~~~~~
```
## IDE Integration
### LSP-Specific Formatting
Optimize diagnostics for Language Server Protocol:
- Include `DiagnosticRelatedInformation` for secondary spans
- Provide `CodeAction` items for fix-its
- Support incremental diagnostic updates
- Include diagnostic tags (deprecated, unnecessary)
### Inline Error Markup
Specifications for IDE display:
```json
{
"severity": "error",
"range": {
"start": { "line": 10, "character": 5 },
"end": { "line": 10, "character": 10 }
},
"message": "undefined variable `count`",
"code": "E00123",
"codeDescription": { "href": "https://docs.shader-slang.org/errors/E00123" }
}
```
### Quick-Fix Protocol
Standardized fix communication:
```json
{
"title": "Add missing interface implementation",
"kind": "quickfix",
"diagnostics": ["E00987"],
"edit": {
"changes": {
"file.slang": [
{
"range": { "start": { "line": 15, "character": 0 } },
"newText": "interface MyStruct : IRenderable {\n // implementation\n}\n"
}
]
}
}
}
```
### Diagnostic Severity Mappings
Map compiler levels to IDE severity:
- `error` → `DiagnosticSeverity.Error` (1)
- `warning` → `DiagnosticSeverity.Warning` (2)
- `remark` → `DiagnosticSeverity.Information` (3)
- `note` → `DiagnosticSeverity.Hint` (4)
## Internationalization
TBD (can we use LLMs here?)
## Testing Diagnostics
### Diagnostic Verification
TBD: test file syntax to be parsed and checked against machine-readable output.
FileCheck-style test descriptions, which can also be verified using the machine-readable output.
```
void test() {
int x = "string";
// ERROR: type mismatch
// ^^^^^^^^ expected `int`, found `string`
// HELP: change the type annotation
}
```
### Test Coverage Requirements
- Each diagnostic should have at least one test
- Test both positive and negative cases
- Verify fix-its compile successfully
- Check error recovery after applying suggestions
## Progressive Disclosure
### Beginner-Friendly Defaults
- Show simple, actionable messages by default
- Hide implementation details unless relevant
- Provide links to learn more
## Performance Considerations
1. Don't compute expensive diagnostics unless needed
2. Avoid reporting the same error multiple times
3. Cache diagnostic messages for repeated errors
4. Use error limits to prevent runaway diagnostics
## Command-Line Interface
### Display Options
- `-error-format=json`: Machine-readable output
- `-color=auto|always|never`: Control color output
- `-show-error-codes`: Display error codes
- `-explain E00001`: Show detailed error explanation
- `-verbose-diagnostics`: Show additional diagnostic information
- `-max-errors=N`: Set maximum error count
- `-show-type-aliases=always|helpful|never`: Control type alias display
### Verbose Mode
With `-verbose-diagnostics`:
- Show full type signatures including type aliases
- Include compiler passes information
- Show all possible fixes, not just the most likely
- Display internal compiler state when relevant
### Example JSON Output
```json
{
"level": "error",
"code": "E0308",
"message": "type mismatch",
"spans": [
{
"file": "main.slang",
"line": 10,
"column": 15,
"text": "float3 color = float4(1, 0, 0, 1);",
"label": "expected `float3`, found `float4`"
}
],
"children": [
{
"level": "help",
"message": "use `.xyz` to extract the first three components",
"spans": [
{
"file": "main.slang",
"line": 10,
"column": 35,
"suggestion": ".xyz"
}
]
}
],
"documentation_url": "https://docs.shader-slang.org/errors/E00345"
}
```
## Best Practices Checklist
Before adding a new diagnostic:
- [ ] Is the message clear and actionable?
- [ ] Is the span as precise as possible?
- [ ] Would a fix-it help?
- [ ] Does it have an error code?
- [ ] Is the severity level appropriate?
- [ ] Are related locations shown with notes?
- [ ] Is the message properly capitalized and punctuated, with correct grammar?
- [ ] Will this message make sense in different contexts?
- [ ] Have we considered error cascading?
- [ ] Is there a relevant documentation link?
- [ ] Does the documentation have examples?
- [ ] Have we added tests for this diagnostic?
## Examples of Good Diagnostics
### Type Mismatch
```
error[E0308]: mismatched types
--> src/main.slang:5:16
|
4 | float3 expectVec3(float3 v) { return v; }
| ------- expected due to this parameter type
5 | expectVec3(float4(1, 0, 0, 1));
| ^^^^^^^^^^^^^^^^^^^ expected `float3`, found `float4`
|
= help: use `.xyz` to extract the first three components
= note: see https://docs.shader-slang.org/types/vectors for vector swizzling
```
### Missing Interface Implementation
```
error[E0277]: type `String` doesn't implement interface `IArithmetic`
--> src/main.slang:10:24
|
10 | String result = s1 + s2;
| ^ operator `+` requires `IArithmetic` interface
|
= note: the interface `IArithmetic` is not implemented for `String`
= note: string concatenation requires explicit method calls
= help: use `s1.concat(s2)` instead
= note: see https://docs.shader-slang.org/interfaces/operators
```
These guidelines should be treated as living documentation that evolves with the Slang compiler's needs and user feedback. Regular reviews and updates ensure diagnostics remain helpful and relevant.
Slang Doc System
================
Slang contains a rudimentary documentation generation system. The mechanism used to mark up source is similar to [doxygen](https://www.doxygen.nl/manual/docblocks.html). Namely
```
/**
... text ... (JavaDoc style)
*/
void someFunctionA() {}
/*!
.. text .. (QT style)
another line
*/
void someFunctionB() {}
/// ... text ... (Multi line)
/// another line
void someFunctionC() {}
//!... text ... (QT Multi line)
//! another line
void someFunctionD() {}
```
All of the above examples will add the documentation for the declaration that appears after them. Also note that this slightly diverges from doxygen in that an empty line before and after in a multi line comment is *not* required.
We can also document the parameters to a function similarly
```
/// My function
void myFunction(
/// The A parameter
int a,
/// The B parameter
int b);
```
If you just need a single line comment to describe something, you can place the documentation after the parameter as in
```
/// My function
void myFunction( int a, //< The A parameter
int b) //< The B parameter
{}
```
These same mechanisms work for other kinds of common situations such as with enums
```
/// An enum
enum AnEnum
{
Value, ///< A value
/// Another value
/// With a multi-line comment
AnotherValue,
};
```
Like `doxygen` we can also have multi line comments after a declaration for example
```
/// An enum
enum AnEnum
{
Value, ///< A value
///< Some more information about `Value`
/// Another value
/// With a multi-line comment
AnotherValue,
};
```
To actually get Slang to output documentation you can use the `-doc` option from the `slangc` command line, or pass it in as parameter to `spProcessCommandLineArguments` or `processCommandLineArguments`. The documentation is currently output by default to the same `ISlangWriter` stream as diagnostics. So for `slangc` this will generally mean the terminal/stderr.
Currently the Slang doc system does not support any of the 'advanced' doxygen documentation features. If you add documentation to a declaration it is expected to be in [markdown](https://guides.github.com/features/mastering-markdown/).
Currently the only documentation style supported is a single file 'markdown' output. Future versions will support splitting into multiple files and linking between them. Also future versions may also support other documentation formats/standards.
It is possible to generate documentation for the slang core module. This can be achieved with `slangc` via
```
slangc -doc -compile-core-module
```
The documentation will be written to a file `stdlib-doc.md`.
It should be noted that it is not necessary to add markup to a declaration for the documentation system to output documentation for it. Without the markup, however, the documentation will be very limited, in essence only stating that the declaration exists along with whatever can be derived from the source. This may not be very helpful. For this and other reasons there is a mechanism to control the visibility of items in your source.
There are 3 visibility levels: 'public', 'internal' and 'hidden'/'private'. There is a special comment that controls visibility for subsequent lines. The special comment starts with `//@` as shown below.
```
//@ public:
void thisFunctionAppearsInDocs() {}
//@ internal:
void thisFunctionCouldAppearInInternalDocs() {}
//@ hidden:
void thisFunctionWillNotAppearInDocs() {}
```
Frequently Asked Questions
==========================
### How did this project start?
The Slang project forked off from the ["Spire"](https://github.com/spire-lang/spire) shading language research project.
In particular, Slang aims to take the lessons learned in that research effort (about how to make more productive shader compilation languages and tools) and apply them to a system that is easier to adopt, and hopefully more amenable to production use.
### Why should I use Slang instead of glslang, hlsl2glslfork, the Microsoft open-source HLSL compiler, etc.?
If you are mostly just shopping around for a tool to get HLSL shaders working on other graphics APIs, then [this](http://aras-p.info/blog/2014/03/28/cross-platform-shaders-in-2014/) blog post is probably a good place to start.
If one of those tools meets your requirements, then you should probably use it.
Slang is a small project, and early in development, so you might find that you hit fewer bumps in the road with one of the more established tools out there.
The goal of the Slang project is not to make "yet another HLSL-to-GLSL translator," but rather to create a shading language and supporting toolchain that improves developer productivity (and happiness) over the existing HLSL language and toolchain, while providing a reasonable adoption path for developers who have an existing investment in HLSL shader code.
If you think that is something interesting and worth supporting, then please get involved!
### What would make a shading language more productive?
This is probably best answered by pointing to the most recent publication from the Spire research project:
[Shader Components: Modular and High Performance Shader Development](http://graphics.cs.cmu.edu/projects/shadercomp/)
Some other papers for those who would like to read up on our inspiration:
[A System for Rapid Exploration of Shader Optimization Choices](http://graphics.cs.cmu.edu/projects/spire/)
[Spark: Modular, Composable Shaders for Graphics Hardware](https://graphics.stanford.edu/papers/spark/)
### Who is using Slang?
Right now the only user of Slang is the [Falcor](https://github.com/NVIDIA/Falcor) real-time rendering framework developed and used by NVIDIA Research.
The implementation of Slang has so far focused heavily on the needs of Falcor.
### Won't we all just be using C/C++ for shaders soon?
The great thing about both Vulkan and D3D12 moving to publicly-documented binary intermediate languages (SPIR-V and DXIL, respectively) is that there is plenty of room for language innovation on top of these interfaces.
Having support for writing GPU shaders in a reasonably-complete C/C++ language would be great.
We are supportive of efforts in the "C++ for shaders" direction.
The Slang effort is about trying to solve the challenges that are unique to the real-time graphics domain, and that won't magically get better by switching to C++.
### Derivatives In Compute
An entry point may be decorated with `[DerivativeGroupQuad]` or `[DerivativeGroupLinear]` to specify how to use derivatives in compute shaders.
GLSL syntax may also be used, but is not recommended (`derivative_group_quadsNV`/`derivative_group_linearNV`).
Targets:
* **_SPIRV:_** Enables `DerivativeGroupQuadsNV` or `DerivativeGroupLinearNV`.
* **_GLSL:_** Enables `derivative_group_quadsNV` or `derivative_group_linearNV`.
* **_HLSL:_** Does nothing. `sm_6_6` is required to use derivatives in compute shaders. HLSL uses an equivalent of `DerivativeGroupQuad`.
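A minimal compute entry point using the attribute might look like this sketch (the value being differentiated is illustrative):

```slang
[shader("compute")]
[numthreads(8, 8, 1)]
[DerivativeGroupQuad]
void computeMain(uint3 tid : SV_DispatchThreadID)
{
    // With quad derivative groups, each 2x2 block of invocations forms
    // a quad, so derivative intrinsics are well-defined in compute.
    float value = float(tid.x) * 0.5f;
    float dx = ddx(value);
    float dy = ddy(value);
}
```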
Texture Footprint Queries
=========================
Slang supports querying the *footprint* of a texture sampling operation: the texels that would be accessed when performing that operation.
This feature is supported on Vulkan via the `GL_NV_shader_texture_footprint` extension, and on D3D12 via the `NvFootprint*` functions exposed by NVAPI.
# Background
There are many GPU rendering techniques that involve generating a texture (e.g., by rendering to it) and then sampling from that texture in a 3D rendering pass, such that it is difficult to predict *a priori* which parts of the texture will be accessed, or not.
As one example, consider rendering a shadow map that will be accessed when shading a g-buffer.
Depending on the geometry that was rendered into the g-buffer, and the occlusion that might exist, some parts of the shadow map might not be needed at all.
In principle, an application could use a compute pass on the g-buffer to compute, for each pixel, the part of the shadow-map texture that it will access - its footprint.
The application could then aggregate these footprints into a stencil mask or other data structure that could be used to optimize the rendering pass that generates the shadow map.
Unfortunately, it is almost impossible for applications to accurately and reliably predict the texel data that particular sampling operations will require, once non-trivial texture filtering modes are considered.
Sampling operations support a wide variety of state that affects the lookup and filtering of texels. For example:
* When bilinear filtering is enabled, a sampling operation typically accesses the four texels closest to the sampling location and blends them.
* When trilinear filtering is enabled, a sampling operation may access texels at two different mip levels.
* When anisotropic filtering is enabled, a sampling operation may take up to N *taps* (where N is the maximum supported degree of anisotropy), each of which may itself access a neighborhood of texels to produce a filtered value for that tap.
* When sampling a cube map, a sampling operation may straddle the "seam" between two or even three cube faces.
Texture footprint queries are intended to solve this problem by providing application developers with a primitive that can query the footprint of a texture sampling operation using the exact same sampler state and texture coordinates that will be used when sampling the texture later.
# Slang Shader API
Rather than exactly mirror the Vulkan GLSL extension or the NVAPI functions, the Slang core module provides a single common interface that can map to either of those implementations.
## Basics
A typical 2D texture sampling operation is performed using the `Sample()` method on `Texture2D`:
```hlsl
Texture2D<float4> texture = ...;
SamplerState sampler = ...;
float2 coords = ...;
// Sample a 2D texture
float4 color = texture.Sample(
sampler, coords);
```
To query the footprint that would be accessed by this operation, we can use an operation like:
```hlsl
uint granularity = ...;
TextureFootprint2D footprint = texture.queryFootprintCoarse(granularity,
sampler, coords);
```
Note that the same arguments used to call `Sample` above are here passed to `queryFootprintCoarse` in the exact same order.
The returned `footprint` encodes a conservative footprint of the texels that would be accessed by the equivalent `Sample` operation above.
Texture footprints are encoded in terms of blocks of texels, and the size of those blocks determines the *granularity* of the footprint.
The `granularity` argument to `queryFootprintCoarse` above indicates the granularity of blocks that the application requests.
In cases where a filtering operation might access two mip levels - one coarse and one fine - a footprint query only returns information about one of the two levels.
The application selects between these options by calling either `queryFootprintCoarse` or `queryFootprintFine`.
## Variations
A wide range of footprint queries are provided, corresponding to various cases of texture sampling operations with different parameters.
For 2D textures, the following functions are supported:
```hlsl
TextureFootprint2D Texture2D.queryFootprintCoarse(
uint granularity, SamplerState sampler, float2 coords);
TextureFootprint2D Texture2D.queryFootprintFine(
uint granularity, SamplerState sampler, float2 coords);
TextureFootprint2D Texture2D.queryFootprintCoarseBias(
uint granularity, SamplerState sampler, float2 coords,
float lodBias);
TextureFootprint2D Texture2D.queryFootprintFineBias(
uint granularity, SamplerState sampler, float2 coords,
float lodBias);
TextureFootprint2D Texture2D.queryFootprintCoarseLevel(
uint granularity, SamplerState sampler, float2 coords,
float lod);
TextureFootprint2D Texture2D.queryFootprintFineLevel(
uint granularity, SamplerState sampler, float2 coords,
float lod);
TextureFootprint2D Texture2D.queryFootprintCoarseGrad(
uint granularity, SamplerState sampler, float2 coords,
float2 dx, float2 dy);
TextureFootprint2D Texture2D.queryFootprintFineGrad(
uint granularity, SamplerState sampler, float2 coords,
float2 dx, float2 dy);
// Vulkan-only:
TextureFootprint2D Texture2D.queryFootprintCoarseClamp(
uint granularity, SamplerState sampler, float2 coords,
float lodClamp);
TextureFootprint2D Texture2D.queryFootprintFineClamp(
uint granularity, SamplerState sampler, float2 coords,
float lodClamp);
TextureFootprint2D Texture2D.queryFootprintCoarseBiasClamp(
uint granularity, SamplerState sampler, float2 coords,
float lodBias,
float lodClamp);
TextureFootprint2D Texture2D.queryFootprintFineBiasClamp(
uint granularity, SamplerState sampler, float2 coords,
float lodBias,
float lodClamp);
TextureFootprint2D Texture2D.queryFootprintCoarseGradClamp(
uint granularity, SamplerState sampler, float2 coords,
float2 dx, float2 dy,
float lodClamp);
TextureFootprint2D Texture2D.queryFootprintFineGradClamp(
uint granularity, SamplerState sampler, float2 coords,
float2 dx, float2 dy,
float lodClamp);
```
For 3D textures, the following functions are supported:
```hlsl
TextureFootprint3D Texture3D.queryFootprintCoarse(
uint granularity, SamplerState sampler, float3 coords);
TextureFootprint3D Texture3D.queryFootprintFine(
uint granularity, SamplerState sampler, float3 coords);
TextureFootprint3D Texture3D.queryFootprintCoarseBias(
uint granularity, SamplerState sampler, float3 coords,
float lodBias);
TextureFootprint3D Texture3D.queryFootprintFineBias(
uint granularity, SamplerState sampler, float3 coords,
float lodBias);
TextureFootprint3D Texture3D.queryFootprintCoarseLevel(
uint granularity, SamplerState sampler, float3 coords,
float lod);
TextureFootprint3D Texture3D.queryFootprintFineLevel(
uint granularity, SamplerState sampler, float3 coords,
float lod);
// Vulkan-only:
TextureFootprint3D Texture3D.queryFootprintCoarseClamp(
uint granularity, SamplerState sampler, float3 coords,
float lodClamp);
TextureFootprint3D Texture3D.queryFootprintFineClamp(
uint granularity, SamplerState sampler, float3 coords,
float lodClamp);
TextureFootprint3D Texture3D.queryFootprintCoarseBiasClamp(
uint granularity, SamplerState sampler, float3 coords,
float lodBias,
float lodClamp);
TextureFootprint3D Texture3D.queryFootprintFineBiasClamp(
uint granularity, SamplerState sampler, float3 coords,
float lodBias,
float lodClamp);
```
## Footprint Types
Footprint queries on 2D and 3D textures return values of type `TextureFootprint2D` and `TextureFootprint3D`, respectively, which are built-in `struct`s defined in the Slang core module:
```
struct TextureFootprint2D
{
typealias Anchor = uint2;
typealias Offset = uint2;
typealias Mask = uint2;
typealias LOD = uint;
typealias Granularity = uint;
property anchor : Anchor { get; }
property offset : Offset { get; }
property mask : Mask { get; }
property lod : LOD { get; }
property granularity : Granularity { get; }
property isSingleLevel : bool { get; }
}
struct TextureFootprint3D
{
typealias Anchor = uint3;
typealias Offset = uint3;
typealias Mask = uint2;
typealias LOD = uint;
typealias Granularity = uint;
property anchor : Anchor { get; }
property offset : Offset { get; }
property mask : Mask { get; }
property lod : LOD { get; }
property granularity : Granularity { get; }
property isSingleLevel : bool { get; }
}
```
A footprint is encoded in terms of *texel groups*, where the `granularity` determines the size of those groups.
When possible, the returned footprint will match the granularity passed into the query operation, but a larger granularity may be selected in cases where the footprint is too large to encode at the requested granularity.
The `anchor` property specifies an anchor point in the texture, in the vicinity of the footprint. Its components are in multiples of 8 texel groups.
The `offset` property specifies how the bits in `mask` map to texel groups in the vicinity of the `anchor` point.
The `mask` property is a 64-bit bitfield (encoded as a `uint2`), where each bit represents footprint coverage of one texel group within an 8x8 (for 2D textures) or 4x4x4 (for 3D textures) neighborhood of texel groups.
The `lod` property indicates the mipmap level that would be accessed by the sampling operation.
The `isSingleLevel` property indicates if the sampling operation is known to access only a single mip level.
Note that this property will always be `false` when using the D3D/NVAPI path.
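As an illustrative sketch (the texture and sampler names here are hypothetical, and the granularity value of 64 is arbitrary), a footprint query using the `queryFootprintFineLevel` overload shown above might look like this:

```hlsl
// Hypothetical usage sketch: query the fine footprint at an explicit LOD
// and inspect which texel groups the corresponding sample would touch.
Texture3D gVolume;
SamplerState gSampler;

void inspectFootprint(float3 coords)
{
    TextureFootprint3D fp = gVolume.queryFootprintFineLevel(64, gSampler, coords, 0.0);
    uint3 anchor = fp.anchor;   // anchor point, in multiples of 8 texel groups
    uint2 mask   = fp.mask;     // 64-bit coverage bitfield over a 4x4x4 neighborhood
    uint  lod    = fp.lod;      // mip level the sample would access
}
```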

Slang Language Guide
====================
This document will try to describe the main characteristics of the Slang language that might make it different from other shading languages you have used.
The Basics
----------
Slang is similar to HLSL, and it is expected that many HLSL programs can be used as Slang code with no modifications.
Big-picture stuff that is supported:
* A C-style preprocessor
* Ordinary function, `struct`, `typedef`, etc. declarations
* The standard vector/matrix types like `float3` and `float4x4`
* The less-used explicit `vector<T,N>` and `matrix<T,R,C>` types
* `cbuffer` declarations for uniform parameters
* Global-scope declarations of texture/sampler parameters, including with `register` annotations
* Entry points with varying `in`/`out` parameters using semantics (including `SV_*` system-value semantics)
* The built-in templated resource types like `Texture2D<T>` with their object-oriented syntax for sampling operations
* Attributes like `[unroll]` are parsed, and passed along for HLSL/DXBC output, but dropped for other targets
* `struct` types that contain textures/samplers as well as ordinary uniform data, both as function parameters and in constant buffers
* The built-in functions up through Shader Model 6.0 (as documented on MSDN) are supported
New Features
------------
### Import Declarations
In order to support better software modularity, and also to deal with the issue of how to integrate shader libraries written in Slang into other languages, Slang introduces an `import` declaration construct.
The basic idea is that if you write a file `foo.slang` like this:
```hlsl
// foo.slang
float4 someFunc(float4 x) { return x; }
```
you can then import this code into another file in Slang, HLSL, or GLSL:
```hlsl
// bar.slang
import foo;
float4 someOtherFunc(float4 y) { return someFunc(y); }
```
The simplest way to think of it is that the `import foo` declaration instructs the compiler to look for `foo.slang` (in the same search paths it uses for `#include` files), and give an error if it isn't found.
If `foo.slang` is found, then the compiler will go ahead and parse and type-check that file, and make any declarations there visible to the original file (`bar.slang` in this example).
When it comes time to generate output code, Slang will output any declarations from `import`ed files that were actually used (it skips those that are never referenced), and it will cross-compile them as needed for the chosen target.
A few other details worth knowing about `import` declarations:
* The name you use on the `import` line gets translated into a file name with some very simple rules. An underscore (`_`) in the name turns into a dash (`-`) in the file name, and dot separators (`.`) turn into directory separators (`/`). After these substitutions, `.slang` is added to the end of the name.
* If there are multiple `import` declarations naming the same file, it will only be imported once. This is also true for nested imports.
* Currently importing does not imply any kind of namespacing; all global declarations still occupy a single namespace, and collisions between different imported files (or between a file and the code it imports) are possible. This is a bug.
* If file `A.slang` imports `B.slang`, and then some other file does `import A;`, then only the names from `A.slang` are brought into scope, not those from `B.slang`. This behavior can be controlled by having `A.slang` use `__exported import B;` to also re-export the declarations it imports from `B`.
* An import is *not* like a `#include`, and so the file that does the `import` can't see preprocessor macros defined in the imported file (and vice versa). Think of `import foo;` as closer to `using namespace foo;` in C++ (perhaps without the same baggage).
### Explicit Parameter Blocks
One of the most important new features of modern APIs like Direct3D 12 and Vulkan is an interface for providing shader parameters using efficient *parameter blocks* that can be stored in GPU memory (these are implemented as descriptor tables/sets in D3D12/Vulkan, and "attribute buffers" in Metal).
However, HLSL and GLSL don't support explicit syntax for parameter blocks, and so shader programmers are left to manually pack parameters into blocks either using `register`/`layout` modifiers, or with API-based remapping (in the D3D12 case).
Slang supports a simple and explicit syntax for exploiting parameter blocks:
```hlsl
struct ViewParams
{
float3 cameraPos;
float4x4 viewProj;
TextureCube envMap;
};
ParameterBlock<ViewParams> gViewParams;
```
In this example, the fields of `gViewParams` will be assigned to registers/bindings in a way that supports allocating them into a single parameter block.
For example, when generating GLSL for Vulkan, the Slang compiler will generate a single `uniform` block (for `cameraPos` and `viewProj`) and a global `textureCube` for `envMap`, both decorated with the same `layout(set = ...)`.
### Interfaces
Slang supports declaring `interface`s that user-defined `struct` types can implement.
For example, here is a simple interface for light sources:
```hlsl
// light.slang
struct LightSample { float3 intensity; float3 direction; };
interface ILight
{
LightSample sample(float3 position);
}
```
We can now define a simple user type that "conforms to" (implements) the `ILight` interface:
```hlsl
// point-light.slang
import light;
struct PointLight : ILight
{
float3 position;
float3 intensity;
LightSample sample(float3 hitPos)
{
float3 delta = hitPos - position;
float distance = length(delta);
LightSample sample;
sample.direction = delta / distance;
sample.intensity = intensity * falloff(distance);
return sample;
}
}
```
### Generics
Slang supports *generic* declarations, using the common angle-bracket (`<>`) syntax from languages like C#, Java, etc.
For example, here is a generic function that works with any type of light:
```hlsl
// diffuse.slang
import light;
float4 computeDiffuse<L : ILight>( float4 albedo, float3 P, float3 N, L light )
{
LightSample sample = light.sample(P);
float nDotL = max(0, dot(N, sample.direction));
return albedo * nDotL;
}
```
The `computeDiffuse` function works with any type `L` that implements the `ILight` interface.
Unlike with C++ templates, the `computeDiffuse` function can be compiled and type-checked once (you won't suddenly get unexpected error messages when plugging in a new type).
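For example (reusing the `PointLight` type from the previous section), the generic parameter can typically be inferred from the argument, so a call site looks like an ordinary function call. This wrapper function is a sketch, not part of the examples above:

```hlsl
// sketch: calling the generic function; `L` is inferred as `PointLight`
import light;
import point_light;   // resolves to point-light.slang per the import naming rules
import diffuse;

float4 shadePoint(float4 albedo, float3 P, float3 N, PointLight light)
{
    return computeDiffuse(albedo, P, N, light);
}
```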
#### Global-Scope Generic Parameters
Putting generic parameters directly on functions is helpful, but in many cases existing HLSL shaders declare their parameters at global scope.
For example, we might have a shader that uses a global declaration of material parameters:
```hlsl
Material gMaterial;
```
In order to allow such a shader to be converted to use a generic parameter for the material type (to allow for specialization), Slang supports declaring type parameters at the global scope:
```hlsl
type_param M : IMaterial;
M gMaterial;
```
Conceptually, you can think of this syntax as wrapping your entire shader program in a generic with parameter `<M : IMaterial>`.
This isn't beautiful syntax, but it may help when incrementally porting an existing HLSL codebase to use Slang's features.
### Associated Types
Sometimes it is difficult to define an interface because each type that implements it might need to make its own choice about some intermediate type.
As a concrete example, suppose we want to define an interface `IMaterial` for material surface shaders, where each material might use its own BRDF.
We want to support evaluating the *pattern* of the surface separate from the reflectance function.
```hlsl
// A reflectance function
interface IBRDF
{
float3 eval(float3 wi, float3 wo);
}
struct DisneyBRDF : IBRDF { ... };
struct KajiyaKay : IBRDF { ... };
// a surface pattern
interface IMaterial
{
??? evalPattern(float3 position, float2 uv);
}
```
What is the type `???` that `evalPattern` should return? We know that it needs to be a type that supports `IBRDF`, but *which* type?
One material might want to use `DisneyBRDF` while another wants to use `KajiyaKay`.
The solution in Slang, as in modern languages like Swift and Rust, is to use *associated types* to express the dependence of the BRDF type on the material type:
```hlsl
interface IMaterial
{
associatedtype B : IBRDF;
B evalPattern(float3 position, float2 uv);
}
struct MyCoolMaterial : IMaterial
{
typedef DisneyBRDF B;
B evalPattern(float3 position, float2 uv)
{ ... }
}
```
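A sketch of how generic code can make use of the associated type (assuming the `IMaterial` and `IBRDF` interfaces above; the `shade` function itself is hypothetical):

```hlsl
// sketch: shading code generic over the material type; the concrete BRDF
// type is recovered through the associated type `M.B`
float3 shade<M : IMaterial>(M material, float3 P, float2 uv, float3 wi, float3 wo)
{
    M.B brdf = material.evalPattern(P, uv);
    return brdf.eval(wi, wo);
}
```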
Associated types are an advanced concept, and we only recommend using them when they are needed to define a usable interface.
Future Extensions
-----------------
### Implicit Generics Syntax
The syntax for generics and interfaces in Slang is currently explicit, but verbose:
```hlsl
float4 computeDiffuse<L : ILight>( L light, ... )
{ ... }
```
As a future change, we would like to allow using an interface like `ILight` as an ordinary parameter type:
```hlsl
float4 computeDiffuse( ILight light, ... )
{ ... }
```
This simpler syntax would act like "syntactic sugar" for the existing explicit generics syntax, so it would retain all of the important performance properties.
### Returning a Value of Interface Type
While the above dealt with using an interface as a parameter type, we would eventually like to support using an interface as the *return* type of a function:
```hlsl
ILight getALightSource(Scene scene) { ... }
```
Implementing this case efficiently is more challenging. In most cases, an associated type can be used instead when an interface return type would be desired.
Not Supported
-------------
Some features of the current HLSL language are not supported, but probably will be given enough time/resources:
* Local variables of texture/sampler type (or that contain these)
* Matrix swizzles
* Explicit `packoffset` annotations on members of `cbuffer`s
Some things from HLSL are *not* planned to be supported, unless there is significant outcry from users:
* Pre-D3D10 and D3D11 syntax and operations
* The "effect" system, and the related `<>` annotation syntax
* Explicit `register` bindings on textures/samplers nested in `cbuffer`s
* Any further work towards making HLSL a subset of C++ (simply because implementing a full C++ compiler is way out of scope for the Slang project)

> Note: This document is a work in progress. It is both incomplete and, in many cases, inaccurate.
Slang Language Reference
========================
Contents
--------
* [Introduction](introduction.md)
* [Basic Concepts](basics.md)
* [Lexical Structure](lexical-structure.md)
* [Preprocessor](preprocessor.md)
* [Types](types.md)
* [Expressions](expressions.md)
* [Statements](statements.md)
* [Declarations](declarations.md)
* [Attributes](attributes.md)
* [Graphics Shaders and Compute Kernels](shaders-and-kernels.md)
* [Glossary](glossary.md)

> Note: This document is a work in progress. It is both incomplete and, in many cases, inaccurate.
Attributes
==========
> Note: This section is not yet complete.
## [[vk::spirv_instruction]]
**SPIR-V only**
This attribute is only available for Vulkan SPIR-V output.
The attribute allows access to SPIR-V intrinsics by supplying a function declaration that has the appropriate signature for the SPIR-V op and no body. The attribute takes a single parameter, which is the integer opcode of the SPIR-V instruction.
In the example below, the `add` function uses this mechanism to invoke the SPIR-V integer add op directly; its opcode is 128.
```hlsl
// 128 is OpIAdd in SPIR-V
[[vk::spirv_instruction(128)]]
uint add(uint a, uint b);
RWStructuredBuffer<uint> resultBuffer;
[numthreads(4,1,1)]
void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID)
{
uint threadId = dispatchThreadID.x;
resultBuffer[threadId] = add(threadId, threadId);
}
```

# Program Behavior
## Observable behavior {#observable}
TODO
## Classification of Behavior {#classification}
Slang classifies the observable behavior of a program as follows:
1. **Precisely defined.** The observable behavior is defined precisely for all targets. Examples of precisely
defined behavior:
- Basic [unsigned integer](types-fundamental.md#integer) operations such as addition, subtraction,
multiplication.
2. **Implementation-defined.** The observable behavior is defined by the target and it is documented. The target
consists of the shader compilation target, the declared extensions, and the target device with
drivers. Examples of implementation-defined behavior:
- Size of [bool](types-fundamental.md#boolean)
- Evaluation of [floating point](types-fundamental.md#floating) numbers. For example, whether the target
implements [IEEE 754-2019](https://doi.org/10.1109/IEEESTD.2019.8766229) standard or something else.
- Memory layout when composed from fundamental types
- Target capabilities
- Available texture types and operations
3. **Unspecified.** The observable behavior is defined by the target but documentation is not
required. Examples of unspecified behavior:
- The bit-exact formulae for texture sampling algorithms
- Memory layouts of opaque types and their underlying data
4. **Undefined.** The program behavior is undefined. No guarantees are made. Possible results include a
program crash; data corruption; and differing computational results depending on optimization level, target
language/driver/device, or timing. Examples of undefined behavior:
- Data race
- Out-of-bounds memory access
- Application use of Slang internal language features

# Execution Divergence and Reconvergence
Threads are said to be on a *uniform path* or *converged path* when either their execution has not diverged or
it has reconverged. When the threads are on a uniform path, the control flow is said to be *uniform*.
In structured control flow, divergence occurs when threads take different control flow
paths on conditional branches. Threads reconverge when the branches join.
Control flow uniformity is considered in the following scopes:
- *Thread-group-uniform path*: all threads in the thread group are on a uniform path.
- *Wave-uniform path*: all threads in the wave are on a uniform path.
In addition, a *mutually convergent* set of threads refers to the threads in a wave that are on a mutually
uniform path. When the execution has diverged, there is more than one such set.
> 📝 **Remark 1:** All threads start on uniform control flow at the shader entry point.
> 📝 **Remark 2:** In SPIR-V terminology: uniform control flow (or converged control flow) is the state when
> all threads execute the same
> [dynamic instance](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#DynamicInstance) of an
> instruction.
> 📝 **Remark 3:** Uniformity does not mean synchronicity. Even when the threads are on a uniform path, it
> does not mean that their progress is uniform. In particular, threads in a wave are not guaranteed to execute
> in lockstep, even if the programming model follows SIMT. Synchronization can be forced with a control flow
> barrier, but this usually incurs a performance overhead.
> 📝 **Remark 4:** Avoiding long divergent execution paths is often a good strategy to improve performance.
## Divergence and Reconvergence in Structured Control Flow {#divergence}
**`if` statements:**
Divergence occurs when some threads take the *then* branch and others take the *else*
branch. Reconvergence occurs when threads exit the *then* and *else* branches.
Example 1:
```hlsl
// divergence occurs when some threads evaluate
// the condition as `true` and others as `false`
if (cond)
{
// "then" path
}
else
{
// "else" path
}
// reconvergence
```
Example 2:
```hlsl
// divergence occurs when some threads evaluate
// the condition as `true` and others as `false`
if (cond)
{
// "then" path
}
// reconvergence
```
> 📝 **Remark**: There is no divergence when all threads take the same branch.
**`switch` statements:**
Divergence occurs when threads jump to different case groups. Reconvergence occurs when threads exit the
switch statement. Additionally, reconvergence between threads on adjacent case label groups occurs on a switch
case fall-through.
A case group is the set of case labels that precede the same non-empty statement.
Example 1:
```hlsl
// divergence occurs when threads jump to different
// case label groups:
switch (value)
{
// first case group
case 0:
case 2:
doSomething1();
break;
// second case group
case 1:
doSomething2();
break;
// third case group
case 3:
default:
doSomething3();
break;
}
// reconvergence
```
Example 2:
```hlsl
// divergence occurs when threads jump to different
// case label groups:
switch (value)
{
case 0: // first case group
doSomething1();
// fall-through
case 1: // second case group
// reconvergence between the first and
// the second case group
doSomething2();
// fall-through
default: // third case group
// reconvergence between the second and the third case group
//
// all threads are now on the same path
doSomething3();
break;
}
// no reconvergence here, since it already happened in
// the default case group.
```
**Loop statements:**
Divergence occurs when some threads exit a loop while the rest continue. Reconvergence occurs when all
threads have exited the loop.
Example 1:
```hlsl
[numthreads(128,1,1)]
void computeMain(uint3 threadId : SV_DispatchThreadID)
{
uint numLoops = 50 + (threadId.x & 1);
for (uint i = 0; i < numLoops; ++i)
{
// divergence after 50 iterations:
// - even-numbered threads exit the loop
// - odd-numbered threads continue for one more iteration
}
// reconvergence
}
```
## Thread-Group-Tangled Functions on Divergent Paths
Thread-group-tangled functions are supported only on thread-group-uniform paths. It is
[undefined behavior](basics-behavior.md#classification) to invoke a thread-group-tangled function on a
divergent path.
## Wave-Tangled Functions on Divergent Paths
The wave-tangled functions require special consideration when the execution within the wave has diverged:
1. Not all targets support wave-tangled functions on divergent paths. When unsupported, the results are
[undefined](basics-behavior.md#classification) when invoked on divergent paths. See
[target platforms](../target-compatibility.md) for details.
2. When supported, wave-tangled functions apply by default only to the mutually convergent set of
   threads. That is, synchronization occurs between those threads that are on the same path.
Example 1:
```hlsl
[numthreads(128,1,1)]
void computeMain(uint3 threadId : SV_DispatchThreadID)
{
uint minimumThreadId = 0;
// trigger divergence
if ((threadId.x & 1) == 0)
{
// smallest thread id that took the 'then' branch
minimumThreadId = WaveActiveMin(threadId.x);
}
else
{
// smallest thread id that took the 'else' branch
minimumThreadId = WaveActiveMin(threadId.x);
}
// reconvergence
}
```

# Memory Model
TODO

# Program Execution
At a high level, Slang program execution is defined as follows:
1. Workload for a Slang program is *dispatched* (compute) or *launched* (graphics).
2. The dispatched or launched workload is divided into entry point *invocations*, which are executed by
*threads*.
An individual entry point invocation handles one point of parametric input. The input parameters and the entry
point return value are specific to the types of the entry points. For example:
- A fragment shader entry point is invoked once per rasterized fragment. The set of entry point invocations is
determined by the rasterizer. Per-invocation inputs for the fragment shader come from the rasterizer and the
vertex shader stage. The output of a fragment shader is a per-fragment value. For example, a vector with
red/green/blue/alpha color components for an RGBA render target.
- A compute kernel entry point is invoked once per user-defined input parameter point. The inputs are the
*thread coordinates* that identify the invocation. A compute kernel has no intrinsic output. Instead, it
stores the results in output buffers.
Inputs and outputs of different graphics shaders and compute kernels are described in more detail in [graphics
shaders and compute kernels](shaders-and-kernels.md).
The graphics launches are determined by the draw calls and the graphics pipeline configuration. How a launch
is precisely divided into shader entry point invocations depends on the target.
A compute dispatch is explicit, and it has an application-defined subdivision structure. For a compute
dispatch, the application defines the input parameter space as a grid of thread groups as follows:
1. A compute dispatch is a user-specified 3-dimensional set of integer-valued points. The user passes a
3-dimensional grid dimension vector `grid_dim`, which specifies the grid points `g` such that
`0`&le;`g.x`&lt;`grid_dim.x`, `0`&le;`g.y`&lt;`grid_dim.y`, `0`&le;`g.z`&lt;`grid_dim.z`.
2. For every point in the grid, a thread group is instantiated. The thread group size is similarly specified
with a 3-dimensional vector `group_dim`. Within the thread group, individual thread invocations `b` are
instantiated such that `0`&le;`b.x`&lt;`group_dim.x`, `0`&le;`b.y`&lt;`group_dim.y`,
`0`&le;`b.z`&lt;`group_dim.z`. The thread group dimensions are typically specified in the compute entry
point as an attribute or as compute dispatch parameters.
3. An individual invocation is executed by an individual thread for every grid and thread group point
combination. There are a total of
`grid_dim.x`\*`grid_dim.y`\*`grid_dim.z`\*`group_dim.x`\*`group_dim.y`\*`group_dim.z` invocations per
dispatch.
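As a concrete sketch of this arithmetic, a dispatch with `grid_dim = (4, 2, 1)` over an entry point declaring an 8x8x1 thread group launches `4*2*1 = 8` thread groups of `8*8*1 = 64` threads each, for 512 invocations in total (the entry point below is illustrative):

```hlsl
// sketch: thread group dimensions declared as an attribute on the entry point;
// a dispatch of grid_dim = (4, 2, 1) yields 4*2*1 * 8*8*1 = 512 invocations
[numthreads(8, 8, 1)]
void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    // dispatchThreadID.xy ranges over [0, 32) x [0, 16)
}
```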
In both graphics launches and compute dispatches, individual invocations are grouped into waves. The wave size
is a power of two in the range [4, 128] and is defined by the target.
In graphics launches, waves are formed from the launch by a target-defined mechanism. They need not have
more in common than that they belong in the same pipeline stage using the same entry point. In particular, a
wave in a fragment stage may process fragments from different geometric primitives.
In compute dispatches, a wave is subdivided from a thread group in a target-defined manner. Usually, a wave
consists of adjacent invocations, but in general, the application should not make any assumptions about the
wave shapes.
Some waves may be only partially filled when the compute thread group or the graphics launch does not align
with the wave size. In compute dispatches, the thread group size should generally be a multiple of the wave
size for best utilization.
# Thread Group Execution Model
All threads within a thread group execute on the same set of execution resources. This allows a thread group
to share local memory allocated with the `groupshared` attribute. Related barriers include
`GroupMemoryBarrier()` and `GroupMemoryBarrierWithGroupSync()`.
The thread group execution model applies only to compute kernels.
# Wave Execution Model
All threads in a wave execute in the single instruction, multiple threads (SIMT) model.
Threads in a wave can synchronize and share data efficiently using *wave-tangled* functions such as ballots,
reductions, shuffling, control flow barriers with the wave scope, and similar operations. For example, atomic
memory accesses to the same memory location by multiple threads can often be coalesced within the wave and
then performed by a single thread. This can significantly reduce the number of atomic memory accesses, and
thus, increase performance.
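A minimal sketch of this coalescing pattern, assuming the threads are on a mutually convergent path (the buffer and function names are hypothetical):

```hlsl
// sketch: reduce per-thread contributions across the wave, then let a
// single lane perform one atomic add instead of one atomic per thread
RWStructuredBuffer<uint> gCounter;

void addToCounter(uint value)
{
    uint waveTotal = WaveActiveSum(value); // wave-tangled reduction
    if (WaveIsFirstLane())
        InterlockedAdd(gCounter[0], waveTotal);
}
```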
Wave-tangled functions operate over all participating threads in the wave. In general, the inputs for a
wave-tangled function are the inputs of all participating threads, and similarly, the outputs of a
wave-tangled function are distributed over the participating threads executing the function.
Usually, the participating threads are the active threads on a [mutually convergent
path](basics-execution-divergence-reconvergence.md).
The threads within a wave belong to one of the following classes:
- *active thread*---a thread that participates in producing a result.
- *inactive thread*---a thread that does not produce any side effects. A thread can be inactive for one of the
following reasons:
- The thread is not executing the [current path](basics-execution-divergence-reconvergence.md#divergence).
- The wave could not be fully utilized when assigning threads.
- The thread has executed a `discard` statement, which disables the thread (fragment shaders only).
- *helper thread*---a thread that is used to compute derivatives, typically for fragment quads. A helper
thread does not produce any other side effects, and it does not participate in wave-tangled functions unless
otherwise stated.
Despite the SIMT execution model, Slang does not require that wave invocations execute in lockstep, unless
they are on a mutually convergent control flow path and they are executing synchronizing functions such as
control flow barriers (*e.g.*, `GroupMemoryBarrierWithWaveSync()`).
> 📝 **Remark 1:** The actual execution hardware may or may not be implemented using an SIMT instruction
> set. In particular, a CPU target would generally not use SIMT instructions.
> 📝 **Remark 2:** In SPIR-V terminology, wave-tangled functions are called *tangled instructions* with the
> subgroup scope.

# Translation Overview
TODO

# Basic Concepts
TODO: Add overview
* [Translation overview](basics-translation-overview.md)
* [Program Execution](basics-program-execution.md)
* [Execution Divergence and Reconvergence](basics-execution-divergence-reconvergence.md)
* [Memory Model](basics-memory-model.md)
* [Program Behavior](basics-behavior.md)

> Note: This document is a work in progress. It is both incomplete and, in many cases, inaccurate.
Declarations
============
Modules
-------
A module consists of one or more source units that are compiled together.
The global declarations in those source units comprise the body of the module.
In general, the order of declarations within a source unit does not matter; declarations can refer to other declarations (of types, functions, variables, etc.) later in the same source unit.
Declarations (other than `import` declarations) may freely be defined in any source unit in a module; declarations in one source unit of a module may freely refer to declarations in other source units.
Imports
-------
An import declaration is introduced with the keyword `import`:
```hlsl
import Shadowing;
```
An import declaration searches for a module matching the name given in the declaration, and brings the declarations in that module into scope in the current source unit.
> Note: an `import` declaration only applies to the scope of the current source unit, and does *not* import the chosen module so that it is visible to other source units of the current module.
The name of the module being imported may use a compound name:
```hlsl
import MyApp.Shadowing;
```
The mechanism used to search for a module is implementation-specific.
> Note: The current Slang implementation searches for a module by translating the specified module name into a file path by:
>
> * Replacing any dot (`.`) separators in a compound name with path separators (e.g., `/`)
>
> * Replacing any underscores (`_`) in the name with hyphens (`-`)
>
> * Appending the extension `.slang`
>
> The implementation then looks for a file matching this path on any of its configured search paths.
> If such a file is found it is loaded as a module comprising a single source unit.
The declarations of an imported module become visible to the current module, but they are not made visible to code that later imports the current module.
> Note: An experimental feature exists for an "exported" import declaration:
>
> ```hlsl
> // inside A.slang
> __exported import Shadowing;
> ```
>
> This example imports the declarations from `Shadowing` into the current module (module `A`),
> and also sets up information so that if other code declares `import A` then it can see
> both the declarations in `A` and those in `Shadowing`.
> Note: Mixing `import` declarations and traditional preprocessor-based (`#include`) modularity
> in a codebase can lead to surprising results.
>
> Some things to be aware of:
>
> * Preprocessor definitions in your module do *not* affect the code of modules you `import`.
>
> * Preprocessor definitions in a module you `import` do *not* affect your code
>
> * The above caveats also apply to "include guards" and `#pragma once`, since they operate at the granularity of a source unit (not across modules)
>
> * If you `import` two modules, and then both `#include` the same file, then those two modules may end up with duplicate declarations with the same name.
>
> As a general rule, be wary of preprocessor use inside of code meant to be an `import`able module.
Variables
---------
Variables are declared using the keywords `let` and `var`:
```hlsl
let x = 7;
var y = 9.0;
```
A `let` declaration introduces an immutable variable, which may not be assigned to or used as the argument for an `in out` or `out` parameter.
A `var` declaration introduces a mutable variable.
An explicit type may be given for a variable by placing it after the variable name and a colon (`:`):
```hlsl
let x : int = 7;
var y : float = 9.0;
```
If no type is specified for a variable, then a type will be inferred from the initial-value expression.
It is an error to declare a variable that has neither a type specifier nor an initial-value expression.
It is an error to declare a variable with `let` without an initial-value expression.
A variable declared with `var` may be declared without an initial-value expression if it has an explicit type specifier:
```hlsl
var y : float;
```
In this case the variable is _uninitialized_ at the point of declaration, and must be explicitly initialized by assigning to it.
Code that uses the value of an uninitialized variable may produce arbitrary results, or even exhibit undefined behavior depending on the type of the variable.
Implementations *may* issue an error or warning for code that might make use of an uninitialized variable.
### Traditional Syntax
Variables may also be declared with traditional C-style syntax:
```hlsl
const int x = 7;
float y = 9.0;
```
For traditional variable declarations a type must be specified.
> Note: Slang does not support an `auto` type specifier like C++.
Traditional variable declarations are immutable if they are declared with the `const` modifier, and are otherwise mutable.
### Variables at Global Scope
Variables declared at global scope may be either a global constant, a static global variable, or a global shader parameter.
#### Global Constants
A variable declared at global scope and marked with `static` and `const` is a _global constant_.
A global constant must have an initial-value expression, and that initial-value expression must be a compile-time constant expression.
#### Static Global Variables
A variable declared at global scope and marked with `static` (but not with `const`) is a _static global variable_.
A static global variable provides storage for each invocation executing an entry point.
Assignments to a static global variable from one invocation do not affect the value seen by other invocations.
> Note: the semantics of static global variable are similar to a "thread-local" variable in other programming models.
A static global variable may include an initial-value expression; if an initial-value expression is included it is guaranteed to be evaluated and assigned to the variable before any other expression that references the variable is evaluated.
There is no guarantee that the initial-value expression for a static global variable is evaluated before entry point execution begins, or even that the initial-value expression is evaluated at all (in cases where the variable might not be referenced at runtime).
> Note: the above rules mean that an implementation may perform dead code elimination on static global variables, and may choose between eager and lazy initialization of those variables at its discretion.
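A minimal sketch of a static global variable (the names are hypothetical):

```hlsl
// Each invocation gets its own copy; writes are not visible to other invocations.
static int invocationCounter = 0;

void recordStep()
{
    invocationCounter++;
}
```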
#### Global Shader Parameters
A variable declared at global scope and not marked with `static` (even if marked with `const`) is a _global shader parameter_.
Global shader parameters are used to pass arguments from application code into invocations of an entry point.
The mechanisms for parameter passing are specific to each target platform.
> Note: Currently only global shader parameters of opaque types or arrays of opaque types are supported.
A global shader parameter may include an initial-value expression, but such an expression does not affect the semantics of the compiled program.
> Note: Initial-value expressions on global shader parameters are only useful to set up "default values" that can be read via reflection information and used by application code.
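For example, global shader parameters of opaque types might be declared as follows (the names are illustrative):

```hlsl
Texture2D    gDiffuseMap;
SamplerState gSampler;
```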
### Variables at Function Scope
Variables declared at _function scope_ (in the body of a function, initializer, subscript accessor, etc.) may be either a function-scope constant, function-scope static variable, or a local variable.
#### Function-Scope Constants
A variable declared at function scope and marked with both `static` and `const` is a _function-scope constant_.
Semantically, a function-scope constant behaves like a global constant except that its name is only visible in the local scope.
#### Function-Scope Static Variables
A variable declared at function scope and marked with `static` (but not `const`) is a _function-scope static variable_.
Semantically, a function-scope static variable behaves like a global static variable except that its name is only visible in the local scope.
The initial-value expression for a function-scope static variable may refer to non-static variables in the body of the function.
In these cases initialization of the variable is guaranteed not to occur until at least the first time the function body is evaluated for a given invocation.
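A sketch of a function-scope static variable whose initial-value expression reads a parameter of the function (the names are hypothetical):

```hlsl
void example(int input)
{
    // Guaranteed not to be initialized before the first time this
    // function body executes for a given invocation.
    static int firstInput = input;
    ...
}
```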
#### Local Variables
A variable declared at function scope and not marked with `static` (even if marked with `const`) is a _local variable_.
A local variable has unique storage for each _activation_ of a function by an invocation.
When a function is called recursively, each call produces a distinct activation with its own copies of local variables.
Functions
---------
Functions are declared using the `func` keyword:
```hlsl
func add(x: int, y: float) -> float { return float(x) + y; }
```
Parameters
----------
The parameters of the function are declared as `name: type` pairs.
Parameters may be given a _default value_ by including an initial-value-expression clause:
```hlsl
func add(x: int, y: float = 1.0f) { ... }
```
Parameters may be marked with a _direction_ which affects how data is passed between caller and callee:
```hlsl
func add(x: in out int, y: float) { x += ... }
```
The available directions are:
* `in` (the default) indicates typical pass-by-value (copy-in) semantics. The callee receives a *copy* of the argument passed by the caller.
* `out` indicates copy-out semantics. The callee writes to the parameter and then a copy of that value is assigned to the argument of the caller after the call returns.
* `in out` or `inout` indicates pass-by-value-result (copy-in and copy-out) semantics. The callee receives a copy of the argument passed by the caller, it may manipulate the copy, and then when the call returns the final value is copied back to the argument of the caller.
An implementation may assume that at every call site the arguments for `out` or `in out` parameters never alias.
Under those assumptions, the `out` and `inout` cases may be optimized to use pass-by-reference instead of copy-in and copy-out.
> Note: Applications that rely on the precise order in which write-back for `out` and `in out` parameters is performed are already on shaky semantic ground.
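The three directions can be sketched as follows (the function names are hypothetical):

```hlsl
func byValue(x: int) { ... }               // callee receives a copy
func byCopyOut(x: out int) { x = 1; }      // value is copied back to the caller's argument
func byValueResult(x: in out int) { x++; } // copied in, then copied back out
```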
Body
----
The _body_ of a function declaration consists of statements enclosed in curly braces `{}`.
In some cases a function declaration does not include a body, and in these cases the declaration must be terminated with a semicolon (`;`):
```hlsl
func getCount() -> int;
```
> Note: Slang does not require "forward declaration" of functions, although
> forward declarations are supported as a compatibility feature.
>
> The only place where a function declaration without a definition should be
> required is in the body of an `interface` declaration.
The result type of a function may be specified after the parameter list using a _result type clause_ consisting of an arrow (`->`) followed by a type.
If the function result type is `void`, the result type clause may be elided:
```hlsl
func modify(x: in out int) { x++; }
```
### Traditional Syntax
Functions can also be declared with traditional C-style syntax:
```hlsl
float add(int x, float y) { return float(x) + y; }
void modify(in out int x) { x ++; }
```
> Note: Currently traditional syntax must be used for shader entry point functions,
> because only the traditional syntax currently supports attaching semantics to
> parameters.
### Entry Points
An _entry point_ is a function that will be used as the starting point of execution for one or more invocations of a shader.
Structure Types
---------------
Structure types are declared using the `struct` keyword:
```hlsl
struct Person
{
var age : int;
float height;
int getAge() { return age; }
func getHeight() -> float { return this.height; }
static func getPopulation() -> int { ... }
}
```
The body of a structure type declaration may include variable, type, function, and initializer declarations.
### Fields
Variable declarations in the body of a structure type declaration are also referred to as _fields_.
A field that is marked `static` is shared between all instances of the type, and is semantically like a global variable marked `static`.
A non-`static` field is also called an _instance field_.
### Methods
Function declarations in the body of a structure type declaration are also referred to as _methods_.
A method declaration may be marked `static`.
A `static` method must be invoked on the type itself (e.g., `Person.getPopulation()`).
A non-`static` method is also referred to as an _instance method_.
Instance methods must be invoked on an instance of the type (e.g., `somePerson.getAge()`).
The body of an instance method has access to an implicit `this` parameter which refers to the instance on which the method was invoked.
By default the `this` parameter of an instance method acts as an immutable variable.
An instance method with the `[mutating]` attribute receives a mutable `this` parameter, and can only be invoked on a mutable value of the structure type.
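For example, a `[mutating]` method might look like this (the `Counter` type is illustrative):

```hlsl
struct Counter
{
    int count;

    // Receives a mutable `this`; may only be invoked on a mutable Counter value.
    [mutating] func increment() { count++; }
}
```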
### Inheritance
A structure type declaration may include an _inheritance clause_ that consists of a colon (`:`) followed by a comma-separated list of interface types that the structure type inherits from:
```
struct Person : IHasAge, IHasName
{ .... }
```
When a structure type declares that it inherits from an interface, the programmer asserts that the structure type implements the required members of the interface.
Currently only interface types may be named in the inheritance clause of a structure type.
A structure type may inherit from multiple interfaces.
> Note: In language versions prior to Slang 2026, struct-to-struct inheritance was permitted but generated a compiler warning.
> In Slang 2026 and later, struct-to-struct inheritance is not supported and generates an error.
### Syntax Details
A structure declaration does *not* need to be terminated with a semicolon:
```hlsl
// A terminating semicolon is allowed
struct Stuff { ... };
// The semicolon is not required
struct Things { ... }
```
When a structure declarations ends without a semicolon, the closing curly brace (`}`) must be the last non-comment, non-whitespace token on its line.
For compatibility with C-style code, a structure type declaration may be used as the type specifier in a traditional-style variable declaration:
```hlsl
struct Association
{
int from;
int to;
} associations[] =
{
{ 1, 1 },
{ 2, 4 },
{ 3, 9 },
};
```
If a structure type declaration will be used as part of a variable declaration, then the next token of the variable declaration must appear on the same line as the closing curly brace (`}`) of the structure type declaration.
The whole variable declaration must be terminated with a semicolon (`;`) as normal.
Enumeration Types
-----------------
Enumeration type declarations are introduced with the `enum` keyword:
```hlsl
enum Color
{
Red,
Green = 3,
Blue,
}
```
### Cases
The body of an enumeration type declaration consists of a comma-separated list of case declarations.
An optional trailing comma may terminate the list of cases.
A _case declaration_ consists of the name of the case, along with an optional initial-value expression that specifies the _tag value_ for that case.
If the first case declaration in the body elides an initial-value expression, the value `0` is used for the tag value.
If any other case declaration elides an initial-value expression, its tag value is one greater than the tag value of the immediately preceding case declaration.
An enumeration case is referred to as if it were a `static` member of the enumeration type (e.g., `Color.Red`).
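Applying these rules to the `Color` declaration above yields the following tag values:

```hlsl
// Color.Red   has tag value 0 (first case, no initial-value expression)
// Color.Green has tag value 3 (explicit initial-value expression)
// Color.Blue  has tag value 4 (one greater than the preceding case)
```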
### Inheritance
An enumeration type declaration may include an inheritance clause:
```hlsl
enum Color : uint
{ ... }
```
The inheritance clause of an enumeration declaration may currently only be used to specify a single type to be used as the _tag type_ of the enumeration type.
The tag type of an enumeration must be a built-in scalar integer type.
The tag value of each enumeration case will be a value of the tag type.
If no explicit tag type is specified, the type `int` is used instead.
> Note: The current Slang implementation has bugs that prevent explicit tag types from working correctly.
### Conversions
A value of an enumeration type can be implicitly converted to a value of its tag type:
```hlsl
int r = Color.Red;
```
Values of the tag type can be explicitly converted to the enumeration type:
```hlsl
Color red = Color(r);
```
Type Aliases
------------
A type alias is declared using the `typealias` keyword:
```hlsl
typealias Height = int;
```
A type alias defines a name that will be equivalent to the type to the right of `=`.
### Traditional Syntax
Type aliases can also be declared with traditional C-style syntax:
```hlsl
typedef int Height;
```
Constant Buffers and Texture Buffers
------------------------------------
As a compatibility feature, the `cbuffer` and `tbuffer` keywords can be used to introduce variable declarations.
A declaration of the form:
```hlsl
cbuffer Name
{
F field;
// ...
}
```
is equivalent to a declaration of the form:
```hlsl
struct AnonType
{
F field;
// ...
}
__transparent ConstantBuffer<AnonType> anonVar;
```
In this expansion, `AnonType` and `anonVar` are fresh names generated for the expansion that cannot collide with any name in user code, and the modifier `__transparent` makes it so that an unqualified reference to `field` can implicitly resolve to `anonVar.field`.
The keyword `tbuffer` uses an equivalent expansion, but with `TextureBuffer<T>` used instead of `ConstantBuffer<T>`.
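For example, given the expansion above, a field declared in a `cbuffer` can be referenced without qualification (the names here are illustrative):

```hlsl
cbuffer PerFrame
{
    float4x4 viewProjection;
}

float4 transformPosition(float4 p)
{
    // Implicitly resolves to the field of the anonymous
    // __transparent constant-buffer variable.
    return mul(viewProjection, p);
}
```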
Interfaces
----------
An interface is declared using the `interface` keyword:
```hlsl
interface IRandom
{
uint next();
}
```
The body of an interface declaration may contain function, initializer, subscript, and associated type declarations.
Each declaration in the body of an interface introduces a _requirement_ of the interface.
Types that declare conformance to the interface must provide matching implementations of the requirements.
Functions, initializers, and subscripts declared inside an interface must not have bodies; default implementations of interface requirements are not currently supported.
An interface declaration may have an inheritance clause:
```hlsl
interface IBase
{
int getBase();
}
interface IDerived : IBase
{
int getDerived();
}
```
The inheritance clause for an interface must only list other interfaces.
If an interface `I` lists another interface `J` in its inheritance clause, then `J` is a _base interface_ of `I`.
In order to conform to `I`, a type must also conform to `J`.
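A sketch of a type conforming to `IDerived` from the example above (the `Impl` type and the member bodies are hypothetical):

```hlsl
struct Impl : IDerived
{
    // Conforming to IDerived also requires satisfying its base interface IBase.
    int getBase()    { return 0; }
    int getDerived() { return 1; }
}
```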
Associated Types
----------------
An associated type declaration is introduced with `associatedtype`:
```hlsl
associatedtype Iterator;
```
An associated type declaration introduces a type into the signature of an interface, without specifying the exact concrete type to use.
An associated type is an interface requirement, and different implementations of an interface may provide different types that satisfy the same associated type interface requirement:
```
interface IContainer
{
associatedtype Iterator;
...
}
struct MyArray : IContainer
{
typealias Iterator = Int;
...
}
struct MyLinkedList : IContainer
{
struct Iterator { ... }
...
}
```
It is an error to declare an associated type anywhere other than the body of an interface declaration.
An associated type declaration may have an inheritance clause.
The inheritance clause of an associated type may only list interfaces; these are the _required interfaces_ for the associated type.
A concrete type that is used to satisfy an associated type requirement must conform to all of the required interfaces of the associated type.
Initializers
------------
An initializer declaration is introduced with the `__init` keyword:
```hlsl
struct MyVector
{
    float x, y;
__init(float s)
{
x = s;
y = s;
}
}
```
> Note: Initializer declarations are a non-finalized and unstable feature, as indicated by the double-underscore (`__`) prefix on the keyword.
> Arbitrary changes to the syntax and semantics of initializers may be introduced in future versions of Slang.
An initializer declaration may only appear in the body of an interface or a structure type.
An initializer defines a method for initializing an instance of the enclosing type.
> Note: A C++ programmer might think of an initializer declaration as similar to a C++ _constructor_.
An initializer has a parameter list and body just like a function declaration.
An initializer must not include a result type clause; the result type of an initializer is always the enclosing type.
An initializer is invoked by calling the enclosing type as if it were a function.
E.g., in the example above, the initializer in `MyVector` can be invoked as `MyVector(1.0f)`.
An initializer has access to an implicit `this` variable that is the instance being initialized; an initializer must not be marked `static`.
The `this` variable of an initializer is always mutable; an initializer need not, and must not, be marked `[mutating]`.
> Note: Slang currently does not enforce that a type with an initializer can only be initialized using its initializers.
> It is possible for user code to declare a variable of type `MyVector` above, and explicitly write to the `x` and `y` fields to initialize it.
> A future version of the language may close up this loophole.
> Note: Slang does not provide any equivalent to C++ _destructors_ which run automatically when an instance goes out of scope.
Subscripts
----------
A subscript declaration is introduced with the `__subscript` keyword:
```hlsl
struct MyVector
{
...
__subscript(int index) -> float
{
get { return index == 0 ? x : y; }
}
}
```
> Note: subscript declarations are a non-finalized and unstable feature, as indicated by the double-underscore (`__`) prefix on the keyword.
> Arbitrary changes to the syntax and semantics of subscript declarations may be introduced in future versions of Slang.
A subscript declaration introduces a way for a user-defined type to support subscripting with the `[]` braces:
```hlsl
MyVector v = ...;
float f = v[0];
```
A subscript declaration lists one or more parameters inside parentheses, followed by a result type clause starting with `->`.
The result type clause of a subscript declaration cannot be elided.
The body of a subscript declaration consists of _accessor declarations_.
Currently only `get` accessor declarations are supported for user code.
A `get` accessor declaration introduces a _getter_ for the subscript.
The body of a getter is a code block like a function body, and must return the appropriate value for a subscript operation.
The body of a getter can access the parameters of the enclosing subscript, as well as an implicit `this` parameter of the type that encloses the accessor.
The `this` parameter of a getter is immutable; `[mutating]` getters are not currently supported.
Extensions
----------
An extension declaration is introduced with the `extension` keyword:
```hlsl
extension MyVector
{
float getLength() { return sqrt(x*x + y*y); }
static int getDimensionality() { return 2; }
}
```
An extension declaration adds behavior to an existing type.
In the example above, the `MyVector` type is extended with an instance method `getLength()`, and a static method `getDimensionality()`.
An extension declaration names the type being extended after the `extension` keyword.
The body of an extension declaration may include type declarations, functions, initializers, and subscripts.
> Note: The body of an extension may *not* include variable declarations.
> An extension cannot introduce members that would change the in-memory layout of the type being extended.
The members of an extension are accessed through the type that is being extended.
For example, for the above extension of `MyVector`, the introduced methods are accessed as follows:
```hlsl
MyVector v = ...;
float f = v.getLength();
int n = MyVector.getDimensionality();
```
An extension declaration need not be placed in the same module as the type being extended; it is possible to extend a type from third-party or standard module code.
The members of an extension are only visible inside of modules that `import` the module declaring the extension;
extension members are *not* automatically visible wherever the type being extended is visible.
An extension declaration may include an inheritance clause:
```hlsl
extension MyVector : IPrintable
{
...
}
```
The inheritance clause of an extension declaration may only include interfaces.
When an extension declaration lists an interface in its inheritance clause, it asserts that the extension introduces a new conformance, such that the type being extended now conforms to the given interface.
The extension must ensure that the type being extended satisfies all the requirements of the interface.
Interface requirements may be satisfied by the members of the extension, members of the original type, or members introduced through other extensions visible at the point where the conformance was declared.
It is an error for overlapping conformances (that is, of the same type to the same interface) to be visible at the same point.
This includes cases where two extensions declare the same conformance, as well as those where the original type and an extension both declare the same conformance.
The conflicting conformances may come from the same module or different modules.
In order to avoid problems with conflicting conformances, when a module `M` introduces a conformance of type `T` to interface `I`, one of the following should be true:
* the type `T` is declared in module `M`, or
* the type `I` is declared in module `M`
Any conformance that does not follow these rules (that is, where both `T` and `I` are imported into module `M`) is called a _retroactive_ conformance, and there is no way to guarantee that another module `N` will not introduce the same conformance.
The runtime behavior of programs that include overlapping retroactive conformances is currently undefined.
Currently, extension declarations can only apply to structure types; extensions cannot apply to enumeration types or interfaces.
Generics
--------
Many kinds of declarations can be made _generic_: structure types, interfaces, extensions, functions, initializers, and subscripts.
A generic declaration introduces a _generic parameter list_ enclosed in angle brackets `<>`:
```hlsl
T myFunction<T>(T left, T right, bool condition)
{
return condition ? left : right;
}
```
### Generic Parameters
A generic parameter list can include one or more parameters separated by commas.
The allowed forms for generic parameters are:
* A single identifier like `T` is used to declare a _generic type parameter_ with no constraints.
* A clause like `T : IFoo` is used to introduce a generic type parameter `T` where the parameter is _constrained_ so that it must conform to the `IFoo` interface.
* A clause like `let N : int` is used to introduce a generic value parameter `N`, which takes on values of type `int`.
> Note: The syntax for generic value parameters is provisional and subject to possible change in the future.
Generic parameters may declare a default value with `=`:
```hlsl
T anotherFunction<T = float, let N : int = 4>(vector<T,N> v);
```
For generic type parameters, the default value is a type to use if no argument is specified.
For generic value parameters, the default value is a value of the same type to use if no argument is specified.
### Explicit Specialization
A generic is _specialized_ by applying it to _generic arguments_ listed inside angle brackets `<>`:
```hlsl
anotherFunction<int, 3>
```
Specialization produces a reference to the declaration with all generic parameters bound to concrete arguments.
When specializing a generic, generic type parameters must be matched with type arguments that conform to the constraints on the parameter, if any.
Generic value parameters must be matched with value arguments of the appropriate type, and that are specialization-time constants.
An explicitly specialized function, type, etc. may be used wherever a non-generic function, type, etc. is expected:
```hlsl
int i = anotherFunction<int,3>( int3(99) );
```
### Implicit Specialization
If a generic function/type/etc. is used where a non-generic function/type/etc. is expected, the compiler attempts _implicit specialization_.
Implicit specialization infers generic arguments from the context at the use site, as well as any default values specified for generic parameters.
For example, if a programmer writes:
```hlsl
int i = anotherFunction( int3(99) );
```
The compiler will infer the generic arguments `<int, 3>` from the way that `anotherFunction` was applied to a value of type `int3`.
> Note: Inference for generic arguments currently only takes the types of value arguments into account.
> The expected result type does not currently affect inference.
### Syntax Details
The following examples show how generic declarations of different kinds are written:
```
T genericFunction<T>(T value);
func genericFunction<T>(value: T) -> T;
__init<T>(T value);
__subscript<T>(T value) -> X { ... }
struct GenericType<T>
{
T field;
}
interface IGenericInterface<T> : IBase<T>
{
}
```
> Note: Currently there is no user-exposed syntax for writing a generic extension.
Member access operators
=======================
PLACEHOLDER:
- operator `[]`
- operator `.`
> Note: This document is a work in progress. It is both incomplete and, in many cases, inaccurate.
Expressions
===========
Expressions are terms that can be _evaluated_ to produce values.
This section provides a list of the kinds of expressions that may be used in a Slang program.
In general, the order of evaluation of a Slang expression proceeds from left to right.
Where specific expressions do not follow this order of evaluation, it will be noted.
Some expressions can yield _l-values_, which allows them to be used on the left-hand-side of assignment, or as arguments for `out` or `in out` parameters.
Literal Expressions
-------------------
Literal expressions are never l-values.
### Integer Literal Expressions
An integer literal expression consists of a single integer literal token:
```hlsl
123
```
An unsuffixed integer literal expression always has type `int`.
### Floating-Point Literal Expressions
A floating-point literal expression consists of a single floating-point literal token:
```hlsl
1.23
```
An unsuffixed floating-point literal expression always has type `float`.
### Boolean Literal Expressions
Boolean literal expressions use the keywords `true` and `false`.
### String Literal Expressions
A string literal expression consists of one or more string literal tokens in a row:
```hlsl
"This" "is one" "string"
```
Identifier Expression
---------------------
An _identifier expression_ consists of a single identifier:
```hlsl
someName
```
When evaluated, this expression looks up `someName` in the environment of the expression and yields the value of a declaration with a matching name.
An identifier expression is an l-value if the declaration it refers to is mutable.
### Overloading
It is possible for an identifier expression to be _overloaded_, such that it refers to one or more candidate declarations with the same name.
If the expression appears in a context where the correct declaration to use can be disambiguated, then that declaration is used as the result of the name expression; otherwise use of an overloaded name is an error at the use site.
### Implicit Lookup
It is possible for a name expression to refer to nested declarations in two ways:
* In the body of a method, a reference to `someName` may resolve to `this.someName`, using the implicit `this` parameter of the method
* When a global-scope `cbuffer` or `tbuffer` declaration is used, `someName` may refer to a field declared inside the `cbuffer` or `tbuffer`
Member Expression
-----------------
A _member expression_ consists of a base expression followed by a dot (`.`) and an identifier naming a member to be accessed:
```hlsl
base.m
```
When `base` is a structure type, this expression looks up the field or other member named by `m`.
Just as for an identifier expression, the result of a member expression may be overloaded, and might be disambiguated based on how it is used.
A member expression is an l-value if the base expression is an l-value and the member it refers to is mutable.
### Implicit Dereference
If the base expression of a member reference is a _pointer-like type_ such as `ConstantBuffer<T>`, then a member reference expression will implicitly dereference the base expression to refer to the pointed-to value (e.g., in the case of `ConstantBuffer<T>` this is the buffer contents of type `T`).
### Vector Swizzles
When the base expression of a member expression is of a vector type `vector<T,N>` then a member expression is a _vector swizzle expression_.
The member name must conform to these constraints:
* The member name must comprise between one and four ASCII characters
* The characters must come either from the set (`x`, `y`, `z`, `w`) or the set (`r`, `g`, `b`, `a`), corresponding to element indices of (0, 1, 2, 3)
* The element index corresponding to each character must be less than `N`
If the member name of a swizzle consists of a single character, then the expression has type `T` and is equivalent to a subscript expression with the corresponding element index.
If the member name of a swizzle consists of `M` characters, then the result is a `vector<T,M>` built from the elements of the base vector with the corresponding indices.
A vector swizzle expression is an l-value if the base expression was an l-value and the list of indices corresponding to the characters of the member name contains no duplicates.
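A few illustrative swizzles on a `float4` value (the variable names are hypothetical):

```hlsl
float4 v = float4(1, 2, 3, 4);
float  s  = v.x;     // single character: element 0, type float
float2 xy = v.xy;    // elements (0, 1), type float2
float3 c  = v.bgr;   // color-set names: elements (2, 1, 0)
v.yx = float2(9, 8); // valid l-value: no duplicate element indices
```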
### Matrix Swizzles
> Note: The Slang implementation currently doesn't support matrix swizzles.
### Static Member Expressions
When the base expression of a member expression is a type instead of a value, the result is a _static member expression_.
A static member expression can refer to a static field or static method of a structure type.
A static member expression can also refer to a case of an enumeration type.
A static member expression (but not a member expression in general) may use the token `::` instead of `.` to separate the base and member name:
```hlsl
// These are equivalent
Color.Red
Color::Red
```
This Expression
---------------
A _this expression_ consists of the keyword `this` and refers to the implicit instance of the enclosing type that is being operated on in instance methods, subscripts, and initializers.
The type of `this` is `This`.
Parenthesized Expression
----------------------
An expression wrapped in parentheses `()` is a _parenthesized expression_ and evaluates to the same value as the wrapped expression.
Call Expression
---------------
A _call expression_ consists of a base expression and a list of argument expressions, separated by commas and enclosed in `()`:
```hlsl
myFunction( 1.0f, 20 )
```
When the base expression (e.g., `myFunction`) is overloaded, a call expression can disambiguate the overloaded expression based on the number and types of the arguments present.
The base expression of a call may be a member reference expression:
```hlsl
myObject.myFunc( 1.0f )
```
In this case the base expression of the member reference (e.g., `myObject` in this case) is used as the argument for the implicit `this` parameter of the callee.
### Mutability
If a `[mutating]` instance method is being called, the argument for the implicit `this` parameter must be an l-value.
The argument expressions corresponding to any `out` or `in out` parameters of the callee must be l-values.
A call expression is never an l-value.
### Initializer Expressions
When the base expression of a call is a type instead of a value, the expression is an initializer expression:
```hlsl
float2(1.0f, 2.0f)
```
An initializer expression initializes an instance of the specified type using the given arguments.
An initializer expression with only a single argument is treated as a cast expression:
```hlsl
// these are equivalent
int(1.0f)
(int) 1.0f
```
Subscript Expression
--------------------
A _subscript expression_ consists of a base expression and a list of argument expressions, separated by commas and enclosed in `[]`:
```hlsl
myVector[someIndex]
```
A subscript expression invokes one of the subscript declarations in the type of the base expression. Which subscript declaration is invoked is resolved based on the number and types of the arguments.
A subscript expression is an l-value if the base expression is an l-value and if the subscript declaration it refers to has a setter or by-reference accessor.
Subscripts may be formed on the built-in vector, matrix, and array types.
Initializer List Expression
---------------------------
An _initializer list expression_ comprises zero or more expressions, separated by commas, enclosed in `{}`:
```
{ 1, "hello", 2.0f }
```
An initializer list expression may only be used directly as the initial-value expression of a variable or parameter declaration; initializer lists are not allowed as arbitrary sub-expressions.
> Note: This section will need to be updated with the detailed rules for how expressions in the initializer list are used to initialize values of each kind of type.
Cast Expression
---------------
A _cast expression_ attempts to coerce a single value (the base expression) to a desired type (the target type):
```hlsl
(int) 1.0f
```
A cast expression can perform both built-in type conversions and invoke any single-argument initializers of the target type.
### Compatibility Feature
As a compatibility feature for older code, Slang supports using a cast where the base expression is an integer literal zero and the target type is a user-defined structure type:
```hlsl
MyStruct s = (MyStruct) 0;
```
The semantics of such a cast are equivalent to initialization from an empty initializer list:
```hlsl
MyStruct s = {};
```
Assignment Expression
---------------------
An _assignment expression_ consists of a left-hand side expression, an equals sign (`=`), and a right-hand-side expression:
```hlsl
myVar = someValue
```
The semantics of an assignment expression are to:
* Evaluate the left-hand side to produce an l-value,
* Evaluate the right-hand side to produce a value
* Store the value of the right-hand side to the l-value of the left-hand side
* Yield the l-value of the left-hand-side
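Because an assignment expression yields the l-value of its left-hand side, assignments chain right to left; a sketch:

```hlsl
int a, b, c;
c = 1;
a = b = c; // evaluates as a = (b = c); both a and b become 1
```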
Operator Expressions
--------------------
### Prefix Operator Expressions
The following prefix operators are supported:
| Operator | Description |
|-----------|-------------|
| `+` | identity |
| `-` | arithmetic negation |
| `~` | bit-wise Boolean negation |
| `!` | Boolean negation |
| `++` | increment in place |
| `--` | decrement in place |
A prefix operator expression like `+val` is equivalent to a call expression to a function of the matching name `operator+(val)`, except that lookup for the function only considers functions marked with the `__prefix` keyword.
The built-in prefix `++` and `--` operators require that their operand is an l-value, and work as follows:
* Evaluate the operand to produce an l-value
* Read from the l-value to yield an _old value_
* Increment or decrement the value to yield a _new value_
* Write the new value to the l-value
* Yield the new value
### Postfix Operator Expressions
The following postfix operators are supported:
| Operator | Description |
|-----------|-------------|
| `++` | increment in place |
| `--` | decrement in place |
A postfix operator expression like `val++` is equivalent to a call expression to a function of the matching name `operator++(val)`, except that lookup for the function only considers functions marked with the `__postfix` keyword.
The built-in postfix `++` and `--` operators require that their operand is an l-value, and work as follows:
* Evaluate the operand to produce an l-value
* Read from the l-value to yield an _old value_
* Increment or decrement the value to yield a _new value_
* Write the new value to the l-value
* Yield the old value
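The difference between the prefix and postfix forms is which value the expression yields; for example:

```hlsl
int x = 5;
int a = ++x; // prefix: x is now 6, and a is the new value, 6

int y = 5;
int b = y++; // postfix: y is now 6, but b is the old value, 5
```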
### Infix Operator Expressions
The following infix binary operators are supported:
| Operator | Kind | Description |
|-----------|-------------|-------------|
| `*` | Multiplicative | multiplication |
| `/` | Multiplicative | division |
| `%` | Multiplicative | remainder of division |
| `+` | Additive | addition |
| `-` | Additive | subtraction |
| `<<` | Shift | left shift |
| `>>` | Shift | right shift |
| `<` | Relational | less than |
| `>` | Relational | greater than |
| `<=` | Relational | less than or equal to |
| `>=` | Relational | greater than or equal to |
| `==` | Equality | equal to |
| `!=` | Equality | not equal to |
| `&` | BitAnd | bitwise and |
| `^` | BitXor | bitwise exclusive or |
| `\|` | BitOr | bitwise or |
| `&&` | And | logical and |
| `\|\|` | Or | logical or |
| `+=` | Assignment | compound add/assign |
| `-=` | Assignment | compound subtract/assign |
| `*=` | Assignment | compound multiply/assign |
| `/=` | Assignment | compound divide/assign |
| `%=` | Assignment | compound remainder/assign |
| `<<=` | Assignment | compound left shift/assign |
| `>>=` | Assignment | compound right shift/assign |
| `&=` | Assignment | compound bitwise and/assign |
| `\|=` | Assignment | compound bitwise or/assign |
| `^=` | Assignment | compound bitwise xor/assign |
| `=` | Assignment | assignment |
| `,` | Sequencing | sequence |
With the exception of the assignment operator (`=`), an infix operator expression like `left + right` is equivalent to a call expression to a function of the matching name `operator+(left, right)`.
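Under these rules, an infix operator can be defined for a user type by declaring a function with the matching operator name. A sketch (the `Complex` struct and its fields are hypothetical):

```hlsl
struct Complex
{
    float re;
    float im;
};

Complex operator+(Complex left, Complex right)
{
    Complex result;
    result.re = left.re + right.re;
    result.im = left.im + right.im;
    return result;
}

// for values p and q of type Complex, `p + q` resolves to `operator+(p, q)`
```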
### Conditional Expression
The conditional operator, `?:`, is used to select between two expressions based on the value of a condition:
```hlsl
useNegative ? -1.0f : 1.0f
```
The condition may be either a single value of type `bool`, or a vector of `bool`.
When a vector of `bool` is used, the two values being selected between must be vectors, and selection is performed component-wise.
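In the vector case, each component of the condition selects the corresponding component of the two operands; a sketch:

```hlsl
bool3  mask = bool3(true, false, true);
float3 a    = float3(1, 2, 3);
float3 b    = float3(4, 5, 6);
float3 r    = mask ? a : b; // component-wise selection: (1, 5, 3)
```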
> Note: Unlike C, C++, GLSL, and most other C-family languages, Slang currently follows the precedent of HLSL where `?:` does not short-circuit.
>
> This decision may change (for the scalar case) in a future version of the language.
> Programmers are encouraged to write code that does not depend on whether or not `?:` short-circuits.

# Glossary
[Compute Dispatch](basics-program-execution.md)
: See Dispatch.
[Dispatch](basics-program-execution.md)
: A single dispatch of compute work. A dispatch is an explicit operation that specifies the input parameter
grid on which thread groups are instantiated.
Entry point (TODO: link)
: A designated function from which a thread begins execution.
[Graphics launch](basics-program-execution.md)
: See Launch.
[Implementation-defined behavior](basics-behavior.md#classification)
: The observable behavior is defined by the implementation, and it is documented in the [target platforms
documentation](../target-compatibility.md) or documentation provided by the implementation. Implementation
includes the target language, the device and its driver, and declared extensions and available capabilities.
[Launch](basics-program-execution.md)
: A single launch of graphics work. A graphics launch consists of an unspecified number of draw calls that
activate the graphics pipeline.
[Mutually convergent set of threads](basics-execution-divergence-reconvergence.md)
: A set of threads in a wave that are on the same uniform control flow path. When the execution has diverged,
there is more than one such set.
[Observable behavior](basics-behavior.md#observable)
: Program behavior observable over the execution interface. The interface includes resource variables,
shared memory, and execution control.
[Precisely defined behavior](basics-behavior.md#classification)
: The observable behavior is precisely defined for all targets.
Program (TODO: link)
: A program is a composition of units of linkable code. A program includes a set of entry points, which may be
invoked by a compute dispatch or by a graphics launch.
[Tangled function](basics-program-execution.md)
: A function in which a set of threads participates. The scope of a tangled function is either wave or thread
group. Tangled functions include synchronous operations such as control barriers and cooperative functions
that collect inputs from and distribute outputs to the participating threads.
[Thread](basics-program-execution.md)
: A sequential stream of executed instructions. In Slang, thread execution starts from an entry point
invocation. The thread terminates when it finishes executing the entry point, when it is discarded, or when
it exits abnormally.
[Thread group](basics-program-execution.md)
: The second-level group of threads in the execution hierarchy. The thread group size is determined by the
application within target-specified limits. A thread group executes on the same execution resources, and it
can communicate efficiently using shared memory (`groupshared` modifier).
[Thread-group-tangled function](basics-program-execution.md)
: A function in which all threads of a thread group participate. Examples include thread-group-level control
barriers. Unless otherwise stated, it is [undefined behavior](basics-behavior.md#classification) to invoke
thread-group-tangled functions on non-thread-group-uniform paths.
[Thread-group-uniform path](basics-execution-divergence-reconvergence.md)
: All threads in the thread group are on a uniform path.
[Undefined behavior](basics-behavior.md#classification)
: The observable behavior is not defined. Possible results include crashes, data corruption, and inconsistent
execution results across different optimization levels and different targets.
[Uniform control flow](basics-execution-divergence-reconvergence.md)
: All threads are on a uniform path.
[Uniform path](basics-execution-divergence-reconvergence.md)
: A control flow path is uniform when the control flow has not diverged or it has reconverged. Divergence
occurs when threads take different paths on conditional branches. Reconvergence occurs when the conditional
branches join. Control flow uniformity is usually considered in the thread-group and the wave scopes.
[Unspecified behavior](basics-behavior.md#classification)
: The observable behavior is unspecified but within boundaries. Documentation is not required.
[Wave](basics-program-execution.md)
: The smallest-granularity group of threads in the execution hierarchy. The wave size is a power of two in
the range [4, 128] defined by the target. Threads in a wave may participate in wave-tangled functions such as
wave ballots and wave reductions. For an example, see `WaveActiveMin()`.
[Wave-tangled function](basics-program-execution.md)
: A function in which a subset of threads of the wave participates. Typically, the subset consists of the
active and mutually convergent threads. Wave-tangled functions include reductions such as `WaveActiveMin()`,
ballots such as `WaveActiveBallot()`, and functions that imply a wave-level control flow barrier such as
`GroupMemoryBarrierWithWaveSync()`.
[Wave-uniform path](basics-execution-divergence-reconvergence.md)
: All threads in the wave are on a uniform path.

# Slang Language Goals
TODO

# Typographical Conventions
## Grammar
The Slang grammar in this document is presented using a variation of the
[Extended Backus-Naur form](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form) as follows:
**Terminal symbol**
> **`'terminal'`** --- terminal symbol spelled exactly as is within quotes<br>
> **\<regex\>** --- terminal symbol expressed by a regular expression
[Terminal symbols](https://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols#Terminal_symbols) are those
that can appear in the language. In the Slang language reference manual, two forms are used: literal strings that
appear exactly as defined and symbols defined by regular expressions. Whitespace characters between terminal symbols
are generally meaningless and ignored except where explicitly stated.
**Non-terminal symbol**
> *`nonterminal`*
[Non-terminal symbols](https://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols#Nonterminal_symbols) do
not appear in the language. They are used in defining production rules.
**Production Rule**
> *`lhs`* = \<expr\>
A production rule defines how the left-hand-side non-terminal *`lhs`* may be substituted with the
right-hand-side grammatical expression \<expr\>. The expression consists of non-terminals and terminals using
the grammar expression building blocks described below.
Sometimes more than one production rule definition is provided for the left-hand-side non-terminal. This
means that the non-terminal may be substituted with any of the definitions, which is grammatically equivalent
to alternation. Multiple production rules are used to associate semantics for individual production rules.
In case multiple production rules may be successfully used, the semantics of the rule introduced earlier
apply, unless explicit precedence (*i.e.*, priority) is provided.
Note that a production rule definition is not explicitly terminated with a semi-colon (;).
**Concatenation**
> *`lhs`* = *`symbol1`* *`symbol2`* ... *`symbolN`*
[Concatenation](https://en.wikipedia.org/wiki/Concatenation) expresses a sequence of symbols, and it is
expressed without a comma. Symbols may be terminal or non-terminal.
**Alternation**
> *`lhs`* = *`alternative1`* \| *`alternative2`*
[Alternation](https://en.wikipedia.org/wiki/Alternation_(formal_language_theory)) expresses alternative
productions. That is, one (and exactly one) alternative is used.
**Grouping**
> *`lhs`* = ( *`subexpr`* )
Grouping is used to denote the order of production.
**Optional**
> *`lhs`* = [ *`subexpr`* ]
An optional subexpression may occur zero or one times in the production.
**Repetition**
> *`lhs`* = *`expr`*\* &nbsp;&nbsp;&nbsp;&nbsp; --- 0 or more times repetition<br>
> *`lhs`* = *`expr`*+ &nbsp;&nbsp;&nbsp;&nbsp; --- 1 or more times repetition
A repeated expression occurs any number of times (* -repetition); or one or more times (+ -repetition).
(*`expr`*+) is equivalent to (*`expr`* *`expr`*\*).
**Precedence**
The following precedence list is used in the production rule expressions:
|**Precedence**| **Grammar expressions** | **Description**
|--------------|:---------------------------|:---------------------------------
|Highest | ( ... ) [ ... ] | grouping, optional
| | \* \+ | repetition
| | *`symbol1`* *`symbol2`* | concatenation (left-associative)
| | *`symbol1`* \| *`symbol2`* | alternation (left-associative)
|Lowest | = | production rule definition
For example, the following production rule definitions are equivalent:
> *`lhs`* = `expr1` `expr2`+ \| `expr3` [ `expr4` `expr5` ] `expr6` <br>
>
> *`lhs`* = (`expr1` `expr2`+) \| ((`expr3` [ (`expr4` `expr5`) ]) `expr6`) <br>
## Code Examples
Code examples are presented as follows.
**Example:**
```hlsl
struct ExampleStruct
{
int a, b;
}
```
## Remarks and Warnings
Remarks provide supplemental information such as recommendations, background information, rationale, and
clarifications. Remarks are non-normative.
> 📝 **Remark:** Remarks provide useful information.
Warnings provide important information that the user should be aware of. For example, warnings call out
experimental features that are subject to change and internal language features that should not be used in
user code.
> ⚠️ **Warning:** Avoid using Slang internal language features. These exist to support Slang internal modules
> such as `hlsl.meta.slang`. Internal features are generally undocumented. They are subject to change without
> notice, and they may have caveats or otherwise not work as expected.

> Note: This document is a work in progress. It is both incomplete and, in many cases, inaccurate.
# Introduction
Slang is a programming language primarily designed for use in *shader programming*, by which we mean
performance-oriented GPU programming for real-time graphics.
## General Topics
* [Language Goals (TODO)](introduction-goals.md)
* [Typographical Conventions](introduction-typographical-conventions.md)
## Purpose of this document
This document aims to provide a detailed reference for the Slang language and its supported constructs.
The Slang compiler *implementation* may deviate from the language as documented here in a few key ways:
* The implementation is necessarily imperfect and can have bugs.
* The implementation may not fully support constructs documented here, or their capabilities may not be as
complete as what is documented.
* The implementation may support certain constructs that are not properly documented. Constructs that are:
- *deprecated* --- These are called out with a ⚠️ **Warning**. Other documentation may be removed to
discourage use.
- *experimental* --- These are called out with a ⚠️ **Warning**. The constructs are subject to change and the
documentation may not be yet up to date.
- *internal* --- These are called out with a ⚠️ **Warning**. The constructs are often not otherwise
documented to discourage use.
Where possible, this document calls out known deviations between the language as defined here and the
implementation in the compiler, often including GitHub issue links.
## Terminology
> Note: This section is not yet complete.
>
> This section should detail how the document uses terms like "may" and "must," if we intend for those to be used in a manner consistent with [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).

> Note: This document is a work in progress. It is both incomplete and, in many cases, inaccurate.
Lexical Structure
=================
Source Units
------------
A _source unit_ comprises a sequence of zero or more _characters_ which for purposes of this document are defined as Unicode scalars (code points).
Encoding
--------
Implementations *may* accept source units stored as files on disk, buffers in memory, or any appropriate implementation-specified means.
When source units are stored as byte sequences, they *should* be encoded using UTF-8.
Implementations *may* support additional implementation-specified encodings.
Whitespace
----------
_Horizontal whitespace_ consists of space (U+0020) and horizontal tab (U+0009).
A _line break_ consists of a line feed (U+000A), carriage return (U+000D) or a carriage return followed by a line feed (U+000D, U+000A).
Line breaks are used as line separators rather than terminators; it is not necessary for a source unit to end with a line break.
Escaped Line Breaks
-------------------
An _escaped line break_ comprises a backslash (`\`, U+005C) followed immediately by a line break.
Comments
--------
A _comment_ is either a line comment or a block comment:
```hlsl
// a line comment
/* a block comment */
```
A _line comment_ comprises two forward slashes (`/`, U+002F) followed by zero or more characters that do not contain a line break.
A line comment extends up to, but does not include, a subsequent line break or the end of the source unit.
A _block comment_ begins with a forward slash (`/`, U+002F) followed by an asterisk (`*`, U+002A).
A block comment is terminated by the next instance of an asterisk followed by a forward slash (`*/`).
A block comment contains all characters between where it begins and where it terminates, including any line breaks.
Block comments do not nest.
It is an error if a block comment that begins in a source unit is not terminated in that source unit.
Phases
------
Compilation of a source unit proceeds _as if_ the following steps are executed in order:
1. Line numbering (for subsequent diagnostic messages) is noted based on the locations of line breaks
2. Escaped line breaks are eliminated. No new characters are inserted to replace them. Any new escaped line breaks introduced by this step are not eliminated.
3. Each comment is replaced with a single space (U+0020)
4. The source unit is _lexed_ into a sequence of tokens according to the lexical grammar in this chapter
5. The lexed sequence of tokens is _preprocessed_ to produce a new sequence of tokens (Chapter 3)
6. Subsequent processing is performed on the preprocessed sequence of tokens
Identifiers
-----------
An _identifier_ begins with an uppercase or lowercase ASCII letter (`A` through `Z`, `a` through `z`), or an underscore (`_`).
After the first character, ASCII digits (`0` through `9`) may also be used in an identifier.
The identifier consisting of a single underscore (`_`) is reserved by the language and must not be used by programs.
Otherwise, there are no fixed keywords or reserved words.
Words that name a built-in language construct can also be used as user-defined identifiers and will shadow the built-in definitions in the scope of their definition.
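For example, the shadowing rule described above allows a built-in name such as `min` to be reused as a variable (not recommended in practice; shown only to illustrate the rule):

```hlsl
// `min` normally names a built-in function; here it becomes a local variable
int min = 2;
int y = min + 1; // refers to the variable, so y is 3
```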
Literals
--------
### Integer Literals
An _integer literal_ consists of an optional radix specifier followed by digits and an optional suffix.
The _radix specifier_ may be:
* `0x` or `0X` to specify a hexadecimal literal (radix 16)
* `0b` or `0B` to specify a binary literal (radix 2)
When no radix specifier is present a radix of 10 is used.
Octal literals (radix 8) are not supported.
A `0` prefix on an integer literal does *not* specify an octal literal as it does in C.
Implementations *may* warn on integer literals with a `0` prefix in case users expect C behavior.
The _digits_ of an integer literal may include ASCII `0` through `9`.
In the case of a hexadecimal literal, digits may include the letters `A` through `F` (and `a` through `f`) which represent digit values of 10 through 15.
It is an error for an integer literal to include a digit with a value greater than or equal to the radix.
The digits of an integer literal may also include underscore (`_`) characters, which are ignored and have no semantic impact.
The _suffix_ on an integer literal may be used to indicate the desired type of the literal:
* A `u` suffix indicates the `uint` type
* An `l` or `ll` suffix indicates the `int64_t` type
* A `ul` or `ull` suffix indicates the `uint64_t` type
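Putting these pieces together, the following literals illustrate radix specifiers, digit separators, and suffixes:

```hlsl
int      a = 0x1F;       // hexadecimal, value 31
int      b = 0b1010;     // binary, value 10
int      c = 1_000_000;  // underscores are ignored: value 1000000
uint     d = 42u;        // `u` suffix: uint
int64_t  e = 42ll;       // `ll` suffix: int64_t
uint64_t f = 42ull;      // `ull` suffix: uint64_t
```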
### Floating-Point Literals
> Note: This section is not yet complete.
### String Literals
> Note: This section is not yet complete.
### Character Literals
> Note: This section is not yet complete.
Operators and Punctuation
-------------------------
> Note: This section is not yet complete.

> Note: This document is a work in progress. It is both incomplete and, in many cases, inaccurate.
Preprocessor
============
Slang supports a C-style preprocessor with the following directives:
* `#include`
* `#define`
* `#undef`
* `#if`, `#ifdef`, `#ifndef`
* `#else`, `#elif`
* `#endif`
* `#error`
* `#warning`
* `#line`
* `#pragma`
> Note: This section is not yet complete.

> Note: This document is a work in progress. It is both incomplete and, in many cases, inaccurate.
Graphics Shaders and Compute Kernels
====================================
This section describes the graphics and compute entry points, and their inputs and outputs.
> **TODO**
Graphics pipeline stage entry points: (aka graphics shaders)
- fragment
- vertex
- geometry
- hull
- domain
- raygeneration
- intersection
- anyhit
- closesthit
- miss
- callable
- task
- mesh
Compute kernel entry points:
- compute

> Note: This document is a work in progress. It is both incomplete and, in many cases, inaccurate.
Statements
==========
Statements are used to define the bodies of functions and determine order of evaluation and control flow for an entire program.
Statements are distinct from expressions in that statements do not yield results and do not have types.
This section lists the kinds of statements supported by Slang.
Expression Statement
--------------------
An expression statement consists of an expression followed by a semicolon:
```hlsl
doSomething();
a[10] = b + 1;
```
An implementation may warn on an expression statement that has no effect on the results of execution.
Declaration Statement
---------------------
A declaration may be used as a statement:
```hlsl
let x = 10;
var y = x + 1;
int z = y - x;
```
> Note: Currently only variable declarations are allowed in statement contexts, but other kinds of declarations may be enabled in the future.
Block Statement
---------------
A block statement consists of zero or more statements wrapped in curly braces `{}`:
```hlsl
{
int x = 10;
doSomething(x);
}
```
A block statement provides local scoping to declarations.
Declarations in a block are visible to later statements in the same block, but not to statements or expressions outside of the block.
Empty Statement
---------------
A single semicolon (`;`) may be used as an empty statement equivalent to an empty block statement `{}`.
Conditional Statements
----------------------
### If Statement
An _if statement_ consists of the `if` keyword and a conditional expression in parentheses, followed by a statement to execute if the condition is true:
```hlsl
if(somethingShouldHappen)
doSomething();
```
An if statement may optionally include an _else clause_ consisting of the keyword `else` followed by a statement to execute if the condition is false:
```hlsl
if(somethingShouldHappen)
doSomething();
else
doNothing();
```
### Switch Statement
A _switch statement_ consists of the `switch` keyword followed by an expression wrapped in parentheses and a _body statement_:
```hlsl
switch(someValue)
{
...
}
```
The body of a switch statement must be a block statement, and it must consist of switch case clauses.
A _switch case clause_ consists of one or more case labels or default labels, followed by one or more statements:
```hlsl
// this is a switch case clause
case 0:
case 1:
doBasicThing();
break;
// this is another switch case clause
default:
doAnotherThing();
break;
```
A _case label_ consists of the keyword `case` followed by an expressions and a colon (`:`).
The expression must evaluate to a compile-time constant integer.
A _default label_ consists of the keyword `default` followed by a colon (`:`).
It is an error for a case label or default label to appear anywhere other than the body of a `switch` statement.
It is an error for a statement to appear inside the body of a `switch` statement that is not part of a switch case clause.
Switch case clauses may either exit via a `break` or other control transfer statement, or "fall through" to the next case clause by omitting the `break`:
```hlsl
switch(value)
{
case 0:
x = 10;
// Fall through to case 1
case 1:
result = x + value;
break;
default:
result = -1;
break;
}
```
> **Note:** Some targets (FXC/D3D11 and WGSL) do not support fall-through natively. For these targets, the compiler restructures the code by duplicating the fall-through destination into each source case. This may affect wave/subgroup convergence if the duplicated code contains wave operations. Warning 41026 is emitted when this restructuring occurs.
Loop Statements
---------------
### For Statement
A _for statement_ uses the following form:
```hlsl
for( <initial statement> ; <condition expression> ; <side effect expression> ) <body statement>
```
The _initial statement_ is optional, but may declare a variable whose scope is limited to the for statement.
The _condition expression_ is optional. If present it must be an expression that can be coerced to type `bool`. If absent, a true value is used as the condition.
The _side effect expression_ is optional. If present, it will be executed for its effects before the condition is tested on every loop iteration after the first.
The _body statement_ is a statement that will be executed for each iteration of the loop.
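For example, a loop whose initial statement declares a counter scoped to the for statement:

```hlsl
int sum = 0;
for(int i = 0; i < 4; i++)
{
    sum += i;
}
// sum is 6; `i` is not visible here
```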
### While Statement
A _while statement_ uses the following form:
```hlsl
while( <condition expression> ) <body statement>
```
and is equivalent to a `for` loop of the form:
```hlsl
for( ; <condition expression> ; ) <body statement>
```
### Do-While Statement
A _do-while statement_ uses the following form:
```hlsl
do <body statement> while( <condition expression> );
```
and is equivalent to a `for` loop of the form:
```hlsl
for(;;)
{
<body statement>
if(<condition expression>) continue; else break;
}
```
Control Transfer Statements
---------------------------
### Break Statement
A `break` statement transfers control to after the end of the closest lexically enclosing switch statement or loop statement:
```hlsl
break;
```
### Continue Statement
A `continue` statement transfers control to the start of the next iteration of a loop statement.
In a for statement with a side effect expression, the side effect expression is evaluated when `continue` is used:
```hlsl
continue;
```
### Return Statement
A `return` statement transfers control out of the current function.
In the body of a function with a `void` result type, the `return` keyword may be followed immediately by a semicolon:
```hlsl
return;
```
Otherwise, the `return` keyword must be followed by an expression to use as the value to return to the caller:
```hlsl
return someValue;
```
The value returned must be coercible to the result type of the lexically enclosing function.
### Discard Statement
A `discard` statement can only be used in the context of a fragment shader, in which case it causes the current invocation to terminate and the graphics system to discard the corresponding fragment so that it does not get combined with the framebuffer pixel at its coordinates.
Operations with side effects that were executed by the invocation before a `discard` will still be performed and their results will become visible according to the rules of the platform.
Compile-Time For Statement
--------------------------
A _compile-time for statement_ is used as an alternative to preprocessor techniques for loop unrolling.
It looks like:
```hlsl
$for( <name> in Range(<initial-value>, <upper-bound>)) <body statement>
```
The _initial value_ and _upper bound_ expressions must be compile-time constant integers.
The semantics of a compile-time for statement are as if it were expanded into:
```hlsl
{
let <name> = <initial-value>;
<body statement>
}
{
let <name> = <initial-value> + 1;
<body statement>
}
...
{
let <name> = <upper-bound> - 1;
<body statement>
}
```
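As a concrete sketch, a compile-time for statement can unroll an accumulation over a fixed range (assuming a `weights` array is in scope):

```hlsl
float total = 0.0f;
$for(i in Range(0, 4))
{
    total += weights[i]; // expanded once each for i = 0, 1, 2, 3
}
```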

# Array Types
An *array type* specifies an array of contiguously allocated elements. The array size may be either known
at compile-time or determined at runtime. The array size is always fixed during the lifetime of the array
object.
## Declaration Syntax
```hlsl
// (1) 1-dimensional array of length N
var varName : ElementType[N];
ElementType[N] varName;
ElementType varName[N];
// (2) N-element array of M-element arrays
//
var varName : ElementType[M][N];
ElementType[M][N] varName;
ElementType varName[N][M]; // note the order of N, M
// (3) 1-dimensional array of unknown length
var varName : ElementType[];
ElementType[] varName;
ElementType varName[];
// (4) Unknown-length array of M-element arrays
var varName : ElementType[M][];
ElementType[M][] varName;
ElementType varName[][M];
// (5) Type alias for N-element array of M-element arrays
typealias ArrayType = ElementType[3][2];
```
where:
- `ElementType` is the type of the array element. The element type may not have an unknown length.
- This implies that only the outermost dimension may have an unknown length.
- Array length expressions `N` and `M` are specialization-time constant integers.
- When specified, array length must be non-negative.
- `varName` is the variable identifier
The declarations within each group are equivalent.
When using the `var` or `let` syntax for variable declaration, array length declarations may only appear in the
type.
An array with any dimension length of 0 is called a 0-length array. A 0-length array has 0
size. Instantiations of 0-length arrays are discarded. This includes variables, function parameters, and
struct data members. 0-length arrays may not be accessed at runtime using the subscript operator.
Restrictions for unknown-length arrays:
- When a non-const data member in a `struct` is an unknown-length array, it must be the last data member.
- An unknown-length array cannot be instantiated as a local variable unless the length can be inferred at
compile-time in which case it becomes a known-length array.
- A function parameter with an unknown-length array cannot be `out` or `inout`.
> 📝 **Remark 1:** Declaring an array as part of the type is recommended. For example:
> ```hlsl
> var arr : int[3][4];
> ```
> 📝 **Remark 2:** When using the C-style variable declaration syntax, array declarations binding to the variable
> identifier are applied from right to left. However, when binding to the type, the declarations are
> applied from left to right. Consider:
> ```hlsl
> int[2][3] arr[5][4];
> ```
> which is equivalent to:
> ```hlsl
> int[2][3][4][5] arr;
> ```
> 📝 **Remark 3:** The C++ equivalent of the `ElementType[N][M]` array type declaration would be
> `std::array<std::array<ElementType, N>, M>`.
> 📝 **Remark 4:** Unlike in C and C++, array types in Slang do not decay to pointer types. The implication is that
> array objects are always passed as values in assignment and function calls, similar to `std::array`. To
> avoid memory copies when possible, the compiler attempts to optimize these as pass by constant references or
> pointers when the target supports it.
> 📝 **Remark 5:** 0-length arrays can be used to disable data members in `struct` types. See [Generics (TODO)](TODO)
> for further information.
### Element Count Inference for Unknown-Length Array
When a variable is declared with an unknown-length array type and it also includes an initial-value expression:
```hlsl
int a[] = { 0xA, 0xB, 0xC, 0xD };
```
the compiler will attempt to infer the element count based on the type and/or structure of the initial-value expression.
In the above case, the compiler will infer an element count of 4 from the structure of the initializer-list expression.
Thus, the preceding declaration is equivalent to:
```hlsl
int a[4] = { 0xA, 0xB, 0xC, 0xD };
```
A variable declared in this fashion semantically has a known-length array type and not an unknown-length array
type. The use of an unknown-length array type for the declaration is a convenience feature.
## Memory Layout
### Natural Layout
The _stride_ of an array element type is the size of the element rounded up to the smallest multiple of its
alignment. The stride defines the byte offset difference between adjacent elements.
The natural layout rules for an array type `T[]` or `T[N]`:
* Element `i` of the array starts at a byte offset relative to the array base address that is `i` times the
element stride of the array.
* The alignment of the array type is the alignment of `T`.
* The size of an unknown-length array type is unknown.
* The size of a known-length array with zero elements is zero.
* The size of a known-length array with a nonzero element count `N` is the size of `T` plus `N - 1` times the element stride of the array.
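As a worked example of these rules, consider a hypothetical element type `T` with size 12 and alignment 16 (values chosen for illustration only):

```hlsl
// Suppose type T has size 12 and alignment 16.
// Stride of T = 12 rounded up to a multiple of 16 = 16
//
// For T[4] under natural layout:
//   element 0 at offset  0
//   element 1 at offset 16
//   element 2 at offset 32
//   element 3 at offset 48
//   size      = 12 + (4 - 1) * 16 = 60
//   alignment = 16 (same as T)
//
// Under C-style or D3D constant buffer layout, the size
// would instead be 4 * 16 = 64.
```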
### C-Style Layout
The C-style layout of an array type differs from the natural layout in that the array size is `N` times the
element stride.
### D3D Constant Buffer Layout
The D3D constant buffer layout of an array type differs from the natural layout in that the array size is `N`
times the element stride.


@@ -0,0 +1,6 @@
# Type Attributes
**TODO:** Describe user-visible type-related attributes:
- `open`, `sealed`
- `anyValueSize`
- `RequirePrelude`


@@ -0,0 +1,3 @@
# Classes
TODO


@@ -0,0 +1,3 @@
# Enumerations
TODO


@@ -0,0 +1,177 @@
# Type Extension
## Syntax
[Struct extension](#struct) declaration:
> **`'extension'`** *`type-expr`*<br>
> &nbsp;&nbsp;&nbsp;&nbsp;[**`':'`** *`bases-clause`*]<br>
> **`'{'`** *`member-list`* **`'}'`**
[Generic struct extension](#generic-struct) declaration:
> **`'extension'`** *`generic-params-decl`* *`type-expr`*<br>
> &nbsp;&nbsp;&nbsp;&nbsp;[**`':'`** *`bases-clause`*]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;(**`'where'`** *`where-clause`*)\*<br>
> **`'{'`** *`member-list`* **`'}'`**
### Parameters
- *`type-expr`* is the type to extend.
- *`generic-params-decl`* are the generic parameters for a [generic struct extension](#generic-struct).
- *`bases-clause`* is an optional list of [interface](types-interface.md) conformance specifications to be added.
- *`where-clause`* is an optional generic constraint expression. See [Generics (TODO)](TODO).
- *`member-list`* is a list of struct members to be added. A member is one of:
- *`var-decl`* is a member static variable declaration. See [Variables (TODO)](TODO)
- *`type-decl`* is a nested [type declaration](types.md).
- *`function-decl`* is a member function declaration. See [Functions (TODO)](TODO)
- *`constructor-decl`* is a [constructor declaration](types-struct.md#constructor).
- *`property-decl`* is a [property declaration](types-struct.md#property).
- *`subscript-op-decl`* is a [subscript operator declaration](types-struct.md#subscript-op).
- *`function-call-op-decl`* is a [function call operator declaration](types-struct.md#function-call-op).
## Description
An existing `struct` type or a set of `struct` types can be extended with one or more `extension`
declarations. An `extension` may be used to add static data members, member functions, constructors,
properties, subscript operators, and function call operators to an existing type. An `extension` may not
change the data layout of a `struct`, that is, it cannot be used to append non-static data members.
> 📝 **Remark:** An [interface](types-interface.md) type cannot be extended. This would add new requirements
> to all conforming types, which would invalidate existing conformances.
## Struct Extension {#struct}
A previously defined `struct` can be extended using an `extension` declaration. The declaration appends new
members to the `struct` definition.
**Example 1:**
```hlsl
struct ExampleStruct
{
uint32_t a;
uint32_t getASquared()
{
return a * a;
}
}
extension ExampleStruct
{
// add a member function to ExampleStruct
[mutating] void addToA(uint32_t x)
{
a = a + x;
}
}
```
An extension can also be used to provide interface requirements to a struct.
**Example 2:**
```hlsl
interface IReq
{
int requiredFunc();
}
struct TestClass : IReq
{
}
extension TestClass
{
int requiredFunc()
{
return 42;
}
}
[shader("compute")]
void main(uint3 id : SV_DispatchThreadID)
{
TestClass obj = { };
obj.requiredFunc();
}
```
And finally, an extension can add new interface conformances to a struct:
**Example 3:**
```hlsl
interface IReq
{
int requiredFunc();
}
struct TestClass
{
}
extension TestClass : IReq
{
int requiredFunc()
{
return 42;
}
}
[shader("compute")]
void main(uint3 id : SV_DispatchThreadID)
{
IReq obj = TestClass();
obj.requiredFunc();
}
```
> ⚠️ **Warning:** When an extension and the base structure contain a member with the same signature, it is
> currently undefined which member is effective. ([Issue #9660](https://github.com/shader-slang/slang/issues/9660))
## Generic Struct Extension {#generic-struct}
All structs conforming to an interface may be extended using a generic extension declaration. The generic
extension declaration adds new members to all conforming types. In case there are multiple declarations with
the same signature, the one in the concrete type takes precedence.
**Example:**
```hlsl
interface IBase
{
int getA();
}
struct ConcreteInt16 : IBase
{
int16_t a;
int getA()
{
return a;
}
}
struct ConcreteInt32 : IBase
{
int32_t a;
int getA()
{
return a;
}
}
extension<T : IBase> T
{
// added to all types conforming to
// interface IBase
int getASquared()
{
return getA() * getA();
}
}
```
See [Generics (TODO)](TODO) for further information on generics.


@@ -0,0 +1,80 @@
# Fundamental Types
The following types are collectively called the _fundamental types_:
- The `void` type
- The scalar Boolean type
- The scalar integer types
- The scalar floating point types
## Void Type {#void}
The type `void` contains no data and has a single unnamed value.
A function with return type `void` does not return a value.
Variables, array elements, or structure data members may not have type `void`.
## Scalar Types {#scalar}
### Boolean Type {#boolean}
Type `bool` is used to represent Boolean truth values: `true` and `false`.
The size of `bool` is target-defined. Similarly, the underlying bit patterns for `true` and `false` are
target-defined. The use of `bool` should be avoided when a specific in-memory layout of a data structure is
required. This includes data shared between different language targets even on the same device.
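For example, when a structure is shared through memory between targets, an explicitly sized integer can stand in for `bool` (a sketch; names are illustrative):

```hlsl
struct SharedFlags
{
    // Avoid `bool` here: its size and bit patterns are
    // target-defined. A uint has a precisely defined
    // 4-byte layout on all targets.
    uint isVisible; // 0 = false, nonzero = true
}
```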
### Integer Types {#integer}
The following integer types are defined:
| Name | Description |
|----------------------|-------------------------|
| `int8_t` | 8-bit signed integer |
| `int16_t` | 16-bit signed integer |
| `int`, `int32_t` | 32-bit signed integer |
| `int64_t` | 64-bit signed integer |
| `uint8_t` | 8-bit unsigned integer |
| `uint16_t` | 16-bit unsigned integer |
| `uint`, `uint32_t` | 32-bit unsigned integer |
| `uint64_t` | 64-bit unsigned integer |
All arithmetic operations on signed and unsigned integers wrap on overflow.
All target platforms support the `int`/`int32_t` and `uint`/`uint32_t` types. The support for other types depends on the target and target capabilities. See [target platforms](../target-compatibility.md) for details.
All integer types are stored in memory with their natural size and alignment on all targets that support them.
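The wrapping behavior on overflow can be illustrated as follows:

```hlsl
// Unsigned wraparound
uint u = 0xFFFFFFFF;  // largest uint32_t value
u = u + 1;            // u == 0

// Signed wraparound (two's complement)
int i = 2147483647;   // largest int32_t value
i = i + 1;            // i == -2147483648
```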
### Floating-Point Types {#floating}
The following floating-point types are defined:
| Name | Description | Precision (sign/exponent/significand bits) |
|-----------------------|------------------------------|--------------------------------------------|
| `half`, `float16_t` | 16-bit floating-point number | 1/5/10 |
| `float`, `float32_t` | 32-bit floating-point number | 1/8/23 |
| `double`, `float64_t` | 64-bit floating-point number | 1/11/52 |
Rules for rounding, denormals, infinite values, and not-a-number (NaN) values are generally
target-defined. IEEE 754 compliant targets adhere to the
[IEEE 754-2019](https://doi.org/10.1109/IEEESTD.2019.8766229) standard.
All targets support the `float`/`float32_t` type. Support for other types is target-defined. See
[target platforms](../target-compatibility.md) for details.
## Alignment and data layout
The size of a Boolean type is target-defined. All other fundamental types have precisely defined sizes.
All fundamental types are _naturally aligned_. That is, their alignment is the same as their size.
All fundamental types use [little-endian](https://en.wikipedia.org/wiki/Endianness) representation.
All signed integers use [two's complement](https://en.wikipedia.org/wiki/Two%27s_complement) representation.
> 📝 **Remark:** Fundamental types in other languages are not always naturally aligned. In particular, the alignment
> of C type `uint64_t` on x86-32 is typically 4 bytes.


@@ -0,0 +1,507 @@
# Interfaces
## Syntax
Interface declaration:
> [*`modifier-list`*]<br>
> **`'interface'`** *`identifier`* [*`generic-params-decl`*]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;[**`':'`** *`bases-clause`*]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;(**`'where'`** *`where-clause`*)\*<br>
> **`'{'`** *`member-list`* **`'}'`**
Associated named type declaration:
> *`associated-type-decl`* =<br>
> &nbsp;&nbsp;&nbsp;&nbsp;**`'associatedtype'`** *`identifier`*<br>
> &nbsp;&nbsp;&nbsp;&nbsp;[**`':'`** *`bases-clause`*]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;(**`'where'`** *`where-clause`*)\*<br>
> &nbsp;&nbsp;&nbsp;&nbsp;**`';'`**
### Parameters
- *`modifier-list`* is an optional list of modifiers (TODO: link)
- *`identifier`* is the name of the declared interface type
- *`generic-params-decl`* is an optional generic parameters declaration. See [Generics (TODO)](TODO).
- *`bases-clause`* is an optional list of inherited [interfaces](types-interface.md).
- *`where-clause`* is an optional generic constraint expression. See [Generics (TODO)](TODO).
- *`member-list`* is a list of interface members. A member is one of:
- *`var-decl`* is a `static` `const` member variable declaration of type
[int](types-fundamental.md#integer) or [bool](types-fundamental.md#boolean). See [Variables (TODO)](TODO)
- *`associated-type-decl`* is an associated type declaration. See below.
- *`function-decl`* is a member function declaration. See [Functions (TODO)](TODO)
- *`constructor-decl`* is a [constructor declaration](types-struct.md#constructor).
- *`property-decl`* is a [property declaration](types-struct.md#property).
- *`subscript-op-decl`* is a [subscript operator declaration](types-struct.md#subscript-op).
- *`function-call-op-decl`* is a [function call operator declaration](types-struct.md#function-call-op).
## Description
An `interface` specifies a set of member functions that a conforming type must provide. An interface can then
be used in place of a concrete type, allowing for the same code to use different concrete types via the
methods defined by the interface.
An interface consists of:
- Any number of member function prototypes, [constructors](types-struct.md#constructor), or
[function call operator declarations](types-struct.md#function-call-op) without a body. A concrete type
inheriting from the interface must provide compatible member function implementations with the same names.
- Any number of member functions or [function call operators](types-struct.md#function-call-op) with the
implementation body, which are added to the inheriting type (either a concrete `struct` or another
`interface`). Constructors are not allowed.
- Inheriting types may override member functions by using the `override` modifier with a compatible member
function declaration.
- Any number of [property](types-struct.md#property) or [subscript operator](types-struct.md#subscript-op)
declarations. An interface may declare either `get` or `set` or both methods without the implementation
body. A concrete type inheriting from the interface must provide implementations for the property and
subscript operator declarations.
- A property or a subscript operator declaration may be implemented with compatible `get` and `set` methods
as required by the interface.
- Alternatively, a property may be implemented by declaring a compatible variable with a matching name.
- Any number of *associated named types*, which a concrete inheriting type must provide. An associated named
type may be provided by:
- Declaring a nested [structure](types-struct.md) with the same name. OR
- Defining a type alias for the name with [typealias](types.md#alias) or
[typedef](types.md#alias) declarations.
- Any number of `static` `const` data members without initializers. A concrete inheriting type must provide
compatible static data members with the same names and types.
- The type of a `static` `const` member must be either `int` or `bool`.
The interface member functions may be static or non-static.
An object of a type conforming to an interface can be converted to the interface type.
An interface may also inherit from another interface. The inherited members are added to the inheriting interface.
A member function implementation is compatible with an interface member function when:
- The implementation function can be called with the parameter types of the interface; AND
- The implementation function return type can be converted to the interface function return type.
A member property (or variable) is compatible with an interface member property when the implementation
property (or variable) is convertible to the interface property and vice versa.
Interface members may be declared with access control specifiers `public` or `internal`. The default member
visibility is the same as the visibility of the interface. See [access control (TODO)](TODO) for further
information.
When a [structure](types-struct.md) implements an interface member requirement, the visibility of the member
may not be higher than the requirement. However, it can be lower.
**Example:**
```hlsl
interface IReq
{
}
interface ITest
{
// Static data member requirement
static const int staticDataMember;
// Static member function requirement
static float staticMethod(int a);
// Property requirement
property testProp : float
{
get; // must be readable
set; // must be writable
}
// Constructor requirement
__init(float f);
// Non-static member function requirement
float someMethod(int a);
// Overridable non-static member function
// with default implementation.
float someMethodWithDefaultImplementation(int a)
{
return testProp + float(a);
}
// Function call operator requirement
float operator () (uint x, uint y);
// Subscript operator requirement
__subscript (uint i0) -> float { get; set; }
// Associated type requirement
associatedtype AssocType;
// Associated type requirement, provided type must
// conform to IReq
associatedtype AssocTypeWithRequirement : IReq;
}
struct TestClass : ITest
{
// Required data member
static const int staticDataMember = 5;
// Required static member function
static float staticMethod(int a)
{
return float(a) * float(a);
}
float propUnderlyingValue;
float arr[10] = { };
// Required constructor
__init(float f)
{
propUnderlyingValue = f + 1.0f;
}
// Required property
//
// Note that alternatively, a data member
// "float testProp;" could have also been provided.
property testProp : float
{
get
{
return propUnderlyingValue - 1.0f;
}
set(float newVal)
{
propUnderlyingValue = newVal + 1.0f;
}
}
// Required non-static member function
//
// Note that the parameters and the return value
// are not required to match as long as they are
// compatible.
float someMethod(int64_t a)
{
return float(a) * propUnderlyingValue;
}
// Required function call operator
float operator () (uint x, uint y)
{
return float(x * y);
}
// Required subscript operator
__subscript (uint i0) -> float
{
get
{
return arr[i0];
}
set
{
arr[i0] = newValue;
}
}
// Required associated type provided by using a
// type alias.
typealias AssocType = int;
// Required associated type provided by a nested type.
struct AssocTypeWithRequirement : IReq
{
}
}
```
> 📝 **Remark 1:** The test for an inheriting member function compatibility is equivalent to whether a wrapper
> function with the interface member function signature may invoke the inheriting member function, passing the
> parameters and the return value as is.
>
> For example:
> ```hlsl
> interface IBase
> {
> // a member function that a concrete type
> // must provide
> int32_t someFunc(int8_t a, int16_t b);
> }
>
> struct Test : IBase
> {
> // Implementation of IBase.someFunc(). This is
> // compatible because the corresponding wrapper
> // is well-formed (see below)
> int16_t someFunc(int a, int32_t b)
> {
> return int16_t(a + b);
> }
> }
>
> // A wrapper invoking Test.someFunc() with the
> // parameters and the return value of the interface
> // member function declaration
> int32_t IBase_wrapper_Test_someFunc(
> Test obj, int8_t a, int16_t b)
> {
> return obj.someFunc(a, b);
> }
> ```
> 📝 **Remark 2:** An interface can also be parameterized using generics.
>
> For example:
>
> ```hlsl
> interface ITypedReq<T>
> {
> T someFunc(T param);
> }
>
> struct TestClass : ITypedReq<uint>
> {
> uint someFunc(uint param)
> {
> return 123 + param;
> }
> }
>
> [shader("compute")]
> void main(uint3 id : SV_DispatchThreadID)
> {
> TestClass obj = { };
>
> obj.someFunc(id.x);
> }
> ```
>
> See [Generics (TODO)](TODO) for further information on generics.
## Interface-Conforming Variants
> ⚠️ **Warning:** This language feature is experimental and subject to change.
A variable declared with an interface type is an *interface-conforming variant*, or *interface variant* for
short. An interface variant may have any type conforming to the interface. When an interface variant is
instantiated, the following restrictions apply:
- The types conforming to the interface type may not have data members with opaque types such as `Texture2D`
- The types conforming to the interface type may not have data members with non-copyable types
- The types conforming to the interface type may not have data members with unsized types
Further, invoking a member function of an interface variant has performance overhead due to dynamic
dispatching.
> 📝 **Remark 1:** Function parameters with interface types do not impose the above restrictions when invoked
> with variables with types known at compile time.
> 📝 **Remark 2:** Initializing an interface variant using the default initializer is deprecated. Invoking a
> default-initialized interface variant is undefined behavior.
> 📝 **Remark 3:** In `slangc`, an interface variant is said to have an [existential
> type](https://en.wikipedia.org/wiki/Type_system#Existential_types), meaning that there exists some concrete
> type, conforming to the specified interface, that the variant holds.
## Example
```hlsl
RWStructuredBuffer<int> outputBuffer;
interface IBase
{
// a member function that concrete types
// must define
int getA();
// a static const data member that concrete types
// must define
static const int8_t bias;
}
interface ITest : IBase
{
// Note: concrete types inheriting from this interface
// must define getA()
// a member function defined by the interface,
// to be overridden by ConcreteInt16
int someFunc()
{
return getA() + bias;
}
// a static member function that ConcreteInt16 and
// ConcreteInt32 must provide.
static int getUnderlyingWidth();
}
struct ConcreteInt32 : ITest
{
static const int8_t bias = 3;
int32_t a;
int32_t getA()
{
return a;
}
static int getUnderlyingWidth()
{
return 32;
}
}
struct ConcreteInt16 : ITest
{
static const int8_t bias = 1;
int16_t a;
int getA()
{
return a;
}
// override default implementation of someFunc()
override int someFunc()
{
return a + 5;
}
static int getUnderlyingWidth()
{
return 16;
}
}
// This function accepts any object conforming to
// interface ITest
int getValSquared(ITest i)
{
return i.getA() * i.getA();
}
// This function creates a concrete object and returns
// it as an interface variant.
ITest createConcrete(bool is32, uint initialValue)
{
if (is32)
return ConcreteInt32(initialValue);
else
return ConcreteInt16(initialValue);
}
[shader("compute")]
[numthreads(16, 16, 1)]
void main(uint3 id : SV_DispatchThreadID)
{
int ret;
// Pass an object to a function
// with interface-typed argument
if ((id.x & 1) == 1)
{
ConcreteInt32 val = { id.y };
// A copy of getValSquared() is specialized for
// ConcreteInt32.
ret = getValSquared(val);
}
else
{
ConcreteInt16 val = { id.y };
// A copy of getValSquared() is specialized for
// ConcreteInt16.
ret = getValSquared(val);
}
outputBuffer[id.x] = ret;
// An interface variant. Declaring this imposes
// restrictions on the types conforming to the
// interface.
ITest iobj;
if ((id.x & 1) == 1)
{
// create and assign a ConcreteInt32 object to the
// interface variant
iobj = createConcrete(true, id.y);
}
else
{
// create and assign a ConcreteInt16 object to the
// interface variant
iobj = createConcrete(false, id.y);
}
// Note: Invoking a member function of an interface variant
// has the overhead of dynamic dispatching.
outputBuffer[id.x] += iobj.someFunc();
// Dynamic dispatch overhead also here:
outputBuffer[id.x] += iobj.getUnderlyingWidth();
}
```
## Memory Layout and Dispatch Mechanism
The memory layout of an interface-conforming variant is unspecified. Type-based dynamic dispatching of a
member function invocation is unspecified. Both are subject to change in future versions of `slangc`.
### Non-Normative Description of Interface-Conforming Variants
> 📝 **Remark:** The contents of this section are informational only and subject to change.
In the current implementation, the layout of an interface-conforming variant is a tagged union, conceptually
as follows:
```hlsl
// Note: unions do not exist in Slang
union InterfaceConcreteObjectTypes
{
ConcreteType1 obj1;
ConcreteType2 obj2;
ConcreteType3 obj3;
// ...
}
struct InterfaceImplementationType
{
uint32_t typeTag;
InterfaceConcreteObjectTypes tuple;
}
```
However, since Slang does not have union types where the underlying data is reinterpreted as one of the union
types, union types are emulated. Emulation is performed by packing/unpacking the data for a concretely-typed
object to/from the underlying representation. This involves memory copies.
When the type is not known at compile-time, dynamic dispatch based on the type tag is performed to invoke
member functions. Internally, this is performed as follows:
1. A `switch` statement based on the type tag selects the correct type-specific implementation for the
subsequent steps.
2. The concretely-typed object from the union is unpacked (non-static member functions only).
3. The member function of the concrete type is invoked.
4. The concretely-typed object is packed back into the union (non-static mutating member functions only).
When the type is known at compile-time, the code using an interface type is specialized for the concrete
type. This avoids the performance overhead of dynamic dispatching and union types.
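The dispatch steps above can be sketched as hypothetical compiler-generated code (the function names, helper pack/unpack routines, and tag encoding are all illustrative, not the actual `slangc` output):

```hlsl
// Hypothetical dispatcher for ITest.someFunc() over two
// conforming types. Step numbers refer to the list above.
int ITest_someFunc_dispatch(inout InterfaceImplementationType v)
{
    switch (v.typeTag) // step 1: select implementation by tag
    {
    case 1:
        {
            ConcreteInt32 obj = unpackConcreteInt32(v); // step 2
            int result = obj.someFunc();                // step 3
            packConcreteInt32(v, obj);                  // step 4 (mutating only)
            return result;
        }
    case 2:
        {
            ConcreteInt16 obj = unpackConcreteInt16(v); // step 2
            int result = obj.someFunc();                // step 3
            packConcreteInt16(v, obj);                  // step 4 (mutating only)
            return result;
        }
    default:
        // e.g. a default-initialized variant; invoking it
        // is undefined behavior
        return 0;
    }
}
```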
### Non-Normative Description of Interface-typed Function Parameters
> 📝 **Remark:** The contents of this section are informational only and subject to change.
In the current implementation, when a function with an interface-typed parameter is invoked with a type known
at compile time, the function is specialized for the concrete type. This essentially creates a copy of the
function with the interface-typed parameter replaced with a concrete-typed parameter.
See also [Generic Functions](TODO-Generics.md).
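Conceptually, specialization produces a per-type copy of the function (a sketch; the generated names are illustrative):

```hlsl
// Original function with an interface-typed parameter:
// int getValSquared(ITest i) { return i.getA() * i.getA(); }

// Specialized copy for call sites passing a ConcreteInt32:
int getValSquared_ConcreteInt32(ConcreteInt32 i)
{
    return i.getA() * i.getA();
}

// Specialized copy for call sites passing a ConcreteInt16:
int getValSquared_ConcreteInt16(ConcreteInt16 i)
{
    return i.getA() * i.getA();
}
```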


@@ -0,0 +1,173 @@
# Pointer Types
A pointer to type `T` represents an address of an object of type `T`.
> ⚠️ **Warning:** Pointers are not yet fully implemented in `slangc`.
Current limitations include:
- Pointers to local memory are supported only on CUDA and CPU targets.
- Slang does not support pointers to opaque handle types such as `Texture2D`.
For handle pointers, use `DescriptorHandle<T>` instead.
- Slang does not currently support `const` pointers.
- Slang does not support custom alignment specification. Functions
[loadAligned()](../../../core-module-reference/global-decls/loadaligned-4.html) and
[storeAligned()](../../../core-module-reference/global-decls/storealigned-5.html) may be used for loads and
stores using pointers with known alignment.
- Pointers are not supported on all targets.
- Slang does not currently support inheritance with pointers. In particular, a pointer to a structure
conforming to interface `I` cannot be cast to a pointer to `I`.
See also GitHub issue [#9061](https://github.com/shader-slang/slang/issues/9061).
## Declaration Syntax {#syntax}
> *`simple-type-id-spec`* =<br>
> &nbsp;&nbsp;&nbsp;&nbsp;[*`modifier-list`*]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;*`type-identifier`*<br>
> &nbsp;&nbsp;&nbsp;&nbsp;[*`generic-params-decl`*]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;(**`'['`** [*`constant-index-expr`*] **`']'`** | **`'*'`** )*
See [type specifier syntax](types.md#syntax) for full type specifier syntax.
> 📝 **Remark 1:** A pointer type specified with the declaration syntax is equivalent to
> [generic pointer type](#generic-pointer) `Ptr<T, Access.ReadWrite, AddressSpace.Device>`.
> 📝 **Remark 2:** Pointers can also be declared using [variable declarations](declarations.md). In this case, a
> variable is declared as a pointer to a type, rather than the type itself being a pointer type.
### Parameters
- *`modifier-list`* is an optional list of modifiers (TODO: link)
- *`type-identifier`* is an identifier that names an existing type or a generic type. For example, this may be
a [fundamental type](types-fundamental.md), [vector/matrix generic type](types-vector-and-matrix.md),
user-defined type such as a named [structure type](types-struct.md), [interface type](types-interface.md),
[enumeration type](types-enum.md), type alias, or a type provided by a module.
- *`generic-params-decl`* is a generic parameters declaration. See [Generics (TODO)](TODO).
- **`'['`** [*`constant-index-expr`*] **`']'`** is an [array dimension declaration](types-array.md) with an
optional constant integral expression specifying the dimension length.
- **`'*'`** is a [pointer declaration](types-pointer.md).
## Generic Pointer Types {#generic-pointer}
Type aliases provided by the Slang standard library:
- A generic pointer type: [Ptr<T, AccessMode, AddressSpace>](../../../core-module-reference/types/ptr-0/index.html)
- Pointer to immutable data: [ImmutablePtr<T, AddressSpace>](../../../core-module-reference/types/immutableptr-09.html)
### Parameters
- *`T`* is the element type.
- *`AccessMode`* is the storage access mode.
- *`AddressSpace`* is the storage address space.
See [pointer traits](#traits).
## Description
The pointer declaration `*` applied to a base type creates a pointer type. The base type may be
any [addressable](types-traits.md) type including pointer and [array](types-array.md) types.
To obtain the address of an object, the *address-of* operator `&` is used. The address may be assigned to a
pointer variable with a matching type. Alternatively, `__getAddress(obj)` may be used.
To access the *pointed-to* object, the pointer dereference operator `*` is used. If the pointed-to type is a
[structure](types-struct.md) or a [class](types-class.md) type, the member access operators `.` or `->` may be
used to dereference the pointer and access a member.
When a pointer points to an array element, an integer value may be added to or subtracted from it. The
resulting pointer points to an element offset by that value.
A pointer value belongs to one of the following classes:
- a pointer to an object with matching [traits](#traits), including a pointer to an array element
- a pointer past the end of an object
- a null pointer, which is a special pointer value that points to nothing
- an invalid pointer, otherwise.
It is [undefined behavior](basics-behavior.md#classification) to dereference a pointer that does not point to
an object with matching traits.
For a comprehensive description, see [pointer expressions (TODO)](expressions.md).
> ⚠️ **Warning:** When a pointer is to an element in a multi-dimensional array, pointer arithmetic must
> always result in a pointer that is in the same innermost array or a pointer past the last object in the
> array (which may not be dereferenced). Any other result is
> [undefined behavior](basics-behavior.md#classification).
> 📝 **Remark 1:** Currently, there are no `const` pointers in Slang. Pointers to read-only data and immutable
> data may be declared with [generic pointer types](#generic-pointer).
> 📝 **Remark 2:** Consider the following pointer arithmetic:
>
> ```hlsl
> var arr : uint[10] = { };
> var ptr : uint *;
>
> ptr = &arr[9]; // OK: ptr points to the last element
> // of the array
>
> ptr++; // Still OK: ptr points to one past the
> // last element
>
> ptr++; // Pointer is now invalid
>
> ptr--; // No validity guarantees with invalid
> // pointers in pointer expressions;
> // dereferencing would be undefined behavior
> ```
## Pointer Traits {#traits}
A pointer type has the following traits:
- type of the pointed-to object
- access mode
- address space
A valid pointer may only point to objects with matching traits.
The default pointer address space is `AddressSpace.Device`, and the default access mode is
`Access.ReadWrite`. It is not possible to use different address spaces or access modes using the declaration
syntax.
Pointers for other address spaces and access modes may be declared by using type alias
[Ptr<T, AccessMode, AddressSpace>](../../../core-module-reference/types/ptr-0/index.html) provided
by the standard library. There is no implicit conversion from read-write to read-only pointers.
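For example, a read-only device pointer might be declared as follows (a sketch; the exact enumerator name `Access.Read` is an assumption — consult the core module reference for the authoritative list):

```hlsl
// Read-write device pointer; equivalent to `uint *`.
Ptr<uint, Access.ReadWrite, AddressSpace.Device> p0;

// Assumed read-only counterpart. Note that there is no
// implicit conversion from p0 (read-write) to p1 (read-only).
Ptr<uint, Access.Read, AddressSpace.Device> p1;
```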
## Examples {#examples}
### Pointers Denoting a Range
```hlsl
RWStructuredBuffer<uint> outputBuffer;
cbuffer Globals
{
Ptr<uint> g_inputData;
uint g_inputDataLen;
}
// Calculate sum of half-open range [start, end)
uint sumOfValues(uint *start, uint *end)
{
uint sum = 0;
for (uint *i = start; i != end; ++i)
{
sum = sum + *i;
}
return sum;
}
[numthreads(1, 1, 1)]
void main(uint3 id : SV_DispatchThreadID)
{
// Calculate sum of elements 0, 1, ..., 9 provided
// the input data buffer is big enough.
outputBuffer[id.x] +=
sumOfValues(
g_inputData, &g_inputData[min(g_inputDataLen, 10)]);
}
```


@@ -0,0 +1,22 @@
# Special Types
## Opaque Types {#opaque}
Opaque types are built-in types that have target-defined representation in memory. This includes the bit
representation, the underlying type, and characteristics such as size and alignment. Other shader languages
may refer to these as *opaque handle types*, *opaque descriptor types*, *resource object types*, or *resource
data types*.
The full list of opaque types supported by Slang can be found in the core module reference. Some important
examples:
* Texture types such as `Texture2D<T>`, `TextureCubeArray<T>`, `DepthTexture2D<T>`, and `RWTexture2DMS<T>`
* Combined texture-sampler types such as `Sampler2D<T>` and `Sampler2DShadow<T>`
* Sampler state types: `SamplerState` and `SamplerComparisonState`
* Buffer types like `ConstantBuffer<T>` and `StructuredBuffer<T>`
* Parameter blocks: `ParameterBlock<T>`
Slang makes no guarantees about layout rules or type conversion rules of opaque types.
**TODO**: Add other special types


@@ -0,0 +1,656 @@
# Structures
## Syntax
Struct *no-body* declaration:
> *`struct-decl`* =<br>
> &nbsp;&nbsp;&nbsp;&nbsp;[*`modifier-list`*]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;**`'struct'`** [*`identifier`*] [*`generic-params-decl`*]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[**`':'`** *`bases-clause`*] [**`'='`** *`type-expr`*] **`';'`**
Struct *with-members* declaration:
> *`struct-decl`* =<br>
> &nbsp;&nbsp;&nbsp;&nbsp;[*`modifier-list`*]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;**`'struct'`** [*`identifier`*] [*`generic-params-decl`*]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[**`':'`** *`bases-clause`*]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(**`'where'`** *`where-clause`*)\*<br>
> **`'{'`** *`member-list`* **`'}'`**
Struct *link-time extern type* declaration:
> *`struct-decl`* =<br>
> &nbsp;&nbsp;&nbsp;&nbsp;[*`modifier-list`*]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;**`'extern'`** **`'struct'`** [*`identifier`*] [*`generic-params-decl`*]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[**`':'`** *`bases-clause`*] [**`'='`** *`type-expr`*] **`';'`**
Struct *link-time export type alias* declaration:
> *`struct-decl`* =<br>
> &nbsp;&nbsp;&nbsp;&nbsp;[*`modifier-list`*]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;**`'export'`** **`'struct'`** [*`identifier`*] [*`generic-params-decl`*]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[**`':'`** *`bases-clause`*] **`'='`** *`type-expr`* **`';'`**
Member list:
> *`member-list`* =<br>
> &nbsp;&nbsp;&nbsp;&nbsp;( *`var-decl`*<br>
> &nbsp;&nbsp;&nbsp;&nbsp;| *`type-decl`*<br>
> &nbsp;&nbsp;&nbsp;&nbsp;| *`function-decl`*<br>
> &nbsp;&nbsp;&nbsp;&nbsp;| *`constructor-decl`*<br>
> &nbsp;&nbsp;&nbsp;&nbsp;| *`property-decl`*<br>
> &nbsp;&nbsp;&nbsp;&nbsp;| *`subscript-op-decl`*<br>
> &nbsp;&nbsp;&nbsp;&nbsp;| *`function-call-op-decl`* )*
### Parameters
- *`modifier-list`* is an optional list of modifiers (TODO: link)
- *`identifier`* is an optional name of the declared struct type
- *`generic-params-decl`* is an optional generic parameters declaration. See [Generics (TODO)](TODO).
- *`bases-clause`* is an optional list of inherited [interfaces](types-interface.md).
- *`type-expr`* is an optional type expression for an alias type. See [Modules (TODO)](TODO).
- *`where-clause`* is an optional generic constraint expression. See [Generics (TODO)](TODO).
- *`member-list`* is a list of struct members. A member is one of:
- *`var-decl`* is a member variable declaration. See [Variables (TODO)](TODO)
- *`type-decl`* is a nested [type declaration](types.md).
- *`function-decl`* is a member function declaration. See [Functions (TODO)](TODO)
- *`constructor-decl`* is a [constructor declaration](#constructor).
- *`property-decl`* is a [property declaration](#property).
- *`subscript-op-decl`* is a [subscript operator declaration](#subscript-op).
- *`function-call-op-decl`* is a [function call operator declaration](#function-call-op).
> ⚠️ **Warning:** `Slangc` currently accepts bracketed attributes right after the **`'struct'`** keyword. This
> is deprecated syntax and expected to be removed. Bracketed attributes should be added in *`modifier-list`*,
> instead. ([Issue #9691](https://github.com/shader-slang/slang/issues/9691))
## Description
A structure is a type consisting of an ordered sequence of members. A `struct` declaration has the following
forms:
- The *no-body* declaration states that a structure type with the specified name exists. This enables its use
in type expressions without the member declarations.
- The *with-members* declaration defines the structure type with a layout and an [extensible](types-extension.md)
list of non-layout members.
- The *link-time extern type* declaration specifies the existence of a structure type that is defined in
another module. See [Modules (TODO)](TODO).
- The *link-time export type* declaration specifies that a structure type is exported with a type alias. See
[Modules (TODO)](TODO).
When the *`identifier`* is specified, it is the name of the structure. Otherwise, the structure is anonymous,
which means that it is assigned an unspecified unique name. The main use of anonymous structures is in inline
type definition expressions. For example, `struct { int a; } obj;` defines variable `obj` with an anonymous
structure type that has field `int a;`. Anonymous structure declarations are meaningful only in the
*with-members* form.
A structure member is declared in the structure body and is one of the following:
- A static data member; declared as a variable with the `static` keyword.
- A non-static data member (*aka.* field); declared as a variable.
- A [constructor](#constructor).
- A [static member function](#static-member-function); declared as a function with the `static` modifier
keyword.
- A [non-static member function](#nonstatic-member-function); declared as a function without the `static`
modifier keyword.
- A nested type; declared as a type or a type alias.
- A [`property` declaration](#property).
- A [`__subscript` declaration](#subscript-op).
- A [function call operator declaration](#function-call-op).
A data member and a member function can be declared with the `static` keyword.
- The storage for a static data member is allocated from the global storage. A static member function may:
- Access static data members of the structure.
- Invoke other static member functions of the structure.
- The storage for a non-static data member is allocated as part of the structure. A non-static member function may:
- Access both the static and the non-static data members.
- Invoke both the static and the non-static member functions.
Data members may be assigned with a default initializer. The following rules apply:
- When an object is initialized using an initializer list, the default initializer of a non-static data member
specifies the initial value when the initializer list does not provide one.
- When an object is initialized using a constructor, the default initializer of a non-static data member
specifies the initial value of the data member. A constructor may override this, unless the member is
`const`.
- `static const` data members must have a default initializer.
For further information, see [Initialization (TODO)](TODO).
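A minimal sketch of these rules, with a hypothetical `Config` type:

```hlsl
struct Config
{
    int count = 4;               // default initializer
    float scale = 1.0f;          // default initializer
    static const int kMax = 16;  // static const must have one
}

Config c0 = { };     // c0.count = 4, c0.scale = 1.0f
Config c1 = { 8 };   // c1.count = 8, c1.scale = 1.0f (default kept)
```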
The non-static data members are allocated sequentially within the `struct` when a variable of this type is
allocated. See [Variables (TODO)](TODO).
A nested type is a regular type enclosed within the scope of the outer `struct`.
A structure may conform to one or more [interface](types-interface.md) types.
A structure may be extended with a [type extension](types-extension.md).
`struct` members may be declared with access control specifiers `public`, `internal`, or `private` (specified
in *`modifier-list`*). The default member visibility is `internal`. Nested `struct` members have access to
`private` members of the enclosing `struct`. See [access control (TODO)](TODO) for further information.
> ⚠️ **Warning:** Structure inheriting from another structure is deprecated. It may not work as expected.
## Objects {#object}
An object is an *instance* of a `struct`. An instance consists of all non-static data members defined in a
`struct`. The data members may be initialized using an initializer list or a constructor. For details, see
[variable declarations](declarations.md).
## Constructors {#constructor}
### Syntax
Declaration without body: (interfaces only)
> **`'__init'`** **`'('`** *`param-list`* **`')'`** (**`'where'`** *`where-clause`*)\* **`';'`**
Declaration with body:
> **`'__init'`** **`'('`** *`param-list`* **`')'`** (**`'where'`** *`where-clause`*)\*<br>
> &nbsp;&nbsp;&nbsp;&nbsp;**`'{'`** *`body-stmt`*\* **`'}'`**
### Description
When a user-provided constructor is defined for a `struct`, a constructor is executed on object
instantiation. A constructor can have any number of parameters and does not have a return type. More than one
constructor may be defined, in which case overload resolution selects the most appropriate constructor for the
given initialization parameters.
The constructor parameters are provided in the optional initializer list. When an initializer is not provided,
the no-parameter constructor is invoked.
If a non-static data member is not initialized by the constructor, it has an undefined state after object
instantiation.
`const` data members cannot be initialized by the constructor.
*`where-clause`* is an optional generic constraint expression, discussed in [Generics (TODO)](TODO).
**Example:**
```hlsl
struct TestClass
{
int a, b;
__init()
{
a = 1;
b = 2;
}
    __init(int _a)
    {
        a = _a;
        b = 2;
    }
__init(int _a, int _b)
{
a = _a;
b = _b;
}
}
TestClass obj1;
// obj1.a = 1;
// obj1.b = 2;
//
// Note: TestClass obj1 = { }; also calls the constructor
// without parameters
TestClass obj2 = { 42 };
// obj2.a = 42;
// obj2.b = 2;
TestClass obj3 = { 42, 43 };
// obj3.a = 42;
// obj3.b = 43;
```
When no user-provided constructor is defined, an aggregate initialization is performed, instead. In aggregate
initialization, an initializer list contains values for the `struct` non-static data members. If the
initializer list does not contain enough values, the remaining data members are default-initialized. If no
initializer list is provided, a structure without a user-provided constructor is instantiated in an undefined
state.
> 📝 **Remark 1:** When a structure without a user-provided constructor is instantiated without an initializer
> list, the object's initial state is undefined. This holds even for data members whose types have
> user-provided constructors.
>
> ```hlsl
> struct TestField
> {
> int x;
> __init() { x = 5; }
> }
>
> struct TestClass
> {
> int a, b;
> TestField f;
> }
>
> // note: obj is instantiated with an undefined state
> // regardless of TestField having a user-provided constructor.
>
> TestClass obj;
> ```
> 📝 **Remark 2:** Accessing data members that are in undefined state is undefined behavior.
## Static Member Functions {#static-member-function}
A static member function is a regular function enclosed within the `struct` name space. Static member
functions may access only static structure members.
Invocation of a static member function does not require an object.
## Non-static Member Functions {#nonstatic-member-function}
A non-static member function has a hidden parameter `this` that refers to an object. The hidden parameter
is used to reference the object data members and to invoke other non-static member functions.
In the function body, other members may be referenced using `this.`, although it is optional.
By default, only a read access to the object members is allowed by a member function. If write access is
required, the member function must be declared with the `[mutating]` attribute.
Non-static member functions cannot be invoked without an object.
> 📝 **Remark:** In C++ terminology, a member function is `const` by default. Attribute `[mutating]` makes it
> a non-`const` member function.
## Properties {#property}
### Syntax
Modern syntax, implicit `get` declaration: (interfaces only)
> **`'property'`** *`identifier`* **`':'`** *`type-expr`* **`';'`**
Modern syntax, explicit accessor declarations:
> **`'property'`** *`identifier`* **`':'`** *`type-expr`*<br>
> **`'{'`** *`accessor-decl`*\* **`'}'`**
Traditional syntax, implicit `get` declaration: (interfaces only)
> **`'property'`** *`traditional-var-decl`* **`';'`**
Traditional syntax, explicit accessor declarations:
> **`'property'`** *`traditional-var-decl`*<br>
> **`'{'`** *`accessor-decl`*\* **`'}'`**
Accessor declaration syntax, no body: (interfaces only)
> *`accessor-decl`* =<br>
> &nbsp;&nbsp;&nbsp;&nbsp;(**`'get'`** \| **`'set'`**) [**`'('`** *`param-list`* **`')'`**] **`';'`**
Accessor declaration syntax, with body:
> *`accessor-decl`* =<br>
> &nbsp;&nbsp;&nbsp;&nbsp;(**`'get'`** \| **`'set'`**) [**`'('`** *`param-list`* **`')'`**]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;**`'{'`** *`body-stmt`*\* **`'}'`**
### Description
A property is a non-static member that provides a data-member-like access interface. Properties of objects are
accessed similarly to data members: reads of a property are directed to its `get` accessor, and writes are
directed to its `set` accessor.
A property that only provides the `get` accessor is a read-only property. A property that only provides the
`set` accessor is a write-only property. A property that provides both is a read/write property.
The parentheses in the `get` accessor declaration are optional. The `get` accessor accepts no parameters.
The parentheses and the parameter in the `set` accessor declaration are optional. In case the parameter is not
specified in the declaration, parameter `newValue` with the same type as the property is provided to the `set`
body.
The property declaration forms without accessor or accessor body declarations are useful only in
[interface](types-interface.md) declarations.
> ⚠️ **Warning:** Property reference accessor `ref` is a Slang internal language feature. It is subject to
> change and may not work as expected.
**Example:**
```hlsl
struct TestClass
{
float m_val;
// automatically updated derivative of m_val
bool m_valIsPositive;
property someProp : float
{
get
{
return m_val;
}
set
{
m_val = newValue;
m_valIsPositive = (newValue > 0.0f);
}
}
}
[shader("compute")]
void main(uint3 id : SV_DispatchThreadID)
{
TestClass obj = { };
// this sets both obj.m_val and obj.m_valIsPositive
obj.someProp = 3.0f;
}
```
> 📝 **Remark 1:** A property can replace a non-`static` data member when additional logic needs to be applied
> systematically to data member accesses. This can avoid refactoring call sites.
> 📝 **Remark 2:** A non-static data member can be used to implement an interface property requirement. See
> [interfaces](types-interface.md) for details.
> 📝 **Remark 3:** In the example above, the property could have also been declared as:
>
> ```hlsl
> struct TestClass
> {
> // ...
>
> property someProp : float
> {
> get()
> {
> return m_val;
> }
>
> set(float newVal)
> {
> m_val = newVal;
> m_valIsPositive = (newVal > 0.0f);
> }
> }
> }
> ```
## Accessing Members and Nested Types
The static and non-static structure members and nested types are accessed using `.`.
> ⚠️ **Warning:** The C++-style scope resolution operator `::` is deprecated. It should not be used.
**Example:**
```hlsl
// struct type declaration
struct TestStruct
{
// data member
int a;
// static data member, initial value 5
static int b = 5;
// static constant data member, initial value 6
static const int c = 6;
// nested type
struct NestedStruct
{
static int c = 6;
int d;
}
// member function with read-only access
// to non-static data members
int getA()
{
// also just plain "return a" would do
return this.a;
}
// member function with read/write access
// to non-static data members
[mutating] int incrementAndReturnA()
{
// modification of data member
// requires [mutating]
a = a + 1;
return a;
}
// static member function
static int getB()
{
return b;
}
static int incrementAndReturnB()
{
// [mutating] not needed for
// modifying static data member
b = b + 1;
return b;
}
}
// instantiate an object of type TestStruct using defaults
TestStruct obj = { };
// instantiate an object of type NestedStruct
TestStruct.NestedStruct obj2 = { };
// access an object data member directly
obj.a = 42;
// access a static data member directly
int tmp0 = TestStruct.b + TestStruct.NestedStruct.c;
// invoke object member functions
int tmp1 = obj.getA();
int tmp2 = obj.incrementAndReturnA();
// invoke static member functions
// '.' can be used to resolve scope
int tmp3 = TestStruct.getB();
// '::' is equivalent to '.' for static member access,
// but '.' is recommended.
int tmp4 = TestStruct::incrementAndReturnB();
```
## Subscript operator {#subscript-op}
### Syntax
Implicit `get` declaration: (interfaces only)
> **`'__subscript'`** [**`'('`** *`param-list`* **`')'`**] **`'->'`** *`type-expr`* **`';'`**
Explicit accessor declarations:
> **`'__subscript'`** [**`'('`** *`param-list`* **`')'`**] **`'->'`** *`type-expr`*<br>
> **`'{'`** *`accessor-decl`*\* **`'}'`**
See [properties](#property) for *`accessor-decl`* syntax.
### Description
A subscript `[]` operator can be added in a structure using a `__subscript` declaration. It is conceptually
similar to a `property` with the main differences being that it operates on the instance of a `struct`
(instead of a member) and it accepts parameters.
A subscript declaration may have any number of parameters, including no parameters at all.
The `get` accessor of a `__subscript` declaration is invoked when the subscript operator is applied to an
object to return a value. The parentheses in the `get` accessor declaration are optional.
The `set` accessor of a `__subscript` declaration is invoked when the subscript operator is applied to an
object to assign a value. The parentheses and the parameter in the `set` accessor declaration are optional. In
case the parameter is not specified in the declaration, a parameter `newValue` with the same type as specified
for the subscript operator is provided to the `set` body.
Multiple `__subscript` declarations are allowed as long as the declarations have different
signatures. Overload resolution is the same as overload resolution with function invocations.
> ⚠️ **Warning:** Subscript operator reference accessor `ref` is a Slang internal language feature. It is
> subject to change and may not work as expected.
**Example:**
```hlsl
RWStructuredBuffer<float> outputBuffer;
struct TestStruct
{
var arr : float[10][10];
// declare a 0-parameter subscript operator
__subscript () -> float
{
get { return arr[0][0]; }
set { arr[0][0] = newValue; }
}
// declare a 1-parameter subscript operator
__subscript (int i) -> float
{
get { return arr[0][i]; }
set { arr[0][i] = newValue; }
}
    // declare a 2-parameter subscript operator
__subscript (int i0, int i1) -> float
{
get { return arr[i1][i0]; }
set { arr[i1][i0] = newValue; }
}
}
void main(uint3 id : SV_DispatchThreadID)
{
TestStruct x = { };
x[] = id.z;
x[id.y] = id.z;
x[id.x, id.y] = id.z;
outputBuffer[id.x] = x[];
outputBuffer[id.y] = x[id.x];
outputBuffer[id.z] = x[id.x, id.y];
}
```
## Function call operator {#function-call-op}
### Syntax
Declaration without body: (interfaces only)
> *`type-expr`* **`'operator'`** **`'(' ')'`** **`'('`** *`param-list`* **`')'`** **`';'`**
Declaration with body:
> *`type-expr`* **`'operator'`** **`'(' ')'`** **`'('`** *`param-list`* **`')'`**<br>
> **`'{'`** *`body-stmt`*\* **`'}'`**
### Description
A function call `()` operator can be added using an `operator ()` declaration. This allows applying parameters
to an object as if the object was a function.
Multiple declarations are allowed as long as the declarations have different signatures. Overload resolution
is the same as overload resolution with function invocations.
**Example:**
```hlsl
RWStructuredBuffer<float> outputBuffer;
struct TestStruct
{
float base;
float operator () ()
{
return base;
}
float operator () (uint x)
{
return base * float(x);
}
float operator () (uint x, uint y)
{
return base * float(x) * float(y);
}
}
void main(uint3 id : SV_DispatchThreadID)
{
TestStruct obj = { 42.0f };
outputBuffer[0] += obj();
outputBuffer[0] += obj(id.y);
outputBuffer[0] += obj(id.z, id.z * 2);
}
```
# Memory Layout
## Natural Layout
The *natural layout* for a structure type uses the following rules:
- The alignment of a structure is the maximum of 1, alignment of any member, and alignment of any parent type.
- The data is laid out in order of:
- Parent types
- Non-static data members
- Offset of the data items:
- The offset of the first data item is 0
- The offset of the *Nth* data item is the offset+size of the previous item rounded up to the alignment of
the item
- The size of the structure is the offset plus the size of the last item. That is, the structure is not
tail-padded: its size is not rounded up to the alignment of the structure.
The following algorithm may be used:
1. Initialize variables `size` and `alignment` to zero and one, respectively
2. For each field `f` of the structure type:
1. Update `alignment` to be the maximum of `alignment` and the alignment of `f`
2. Set `size` to the smallest multiple of the alignment of `f` not less than `size`
3. Set the offset of field `f` to `size`
4. Add the size of `f` to `size`
When this algorithm completes, `size` and `alignment` will be the size and alignment of the structure type.
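As a sketch, the algorithm can be written in Python, rounding each offset up to the item's alignment as the offset rules above state; the list of `(size, alignment)` pairs per field is a hypothetical input format:

```python
def natural_layout(fields):
    """Compute field offsets, total size, and alignment for a struct
    under the natural layout, given (size, alignment) per field."""
    size, alignment = 0, 1
    offsets = []
    for f_size, f_align in fields:
        alignment = max(alignment, f_align)
        # Round the running size up to the field's alignment; this
        # becomes the field's offset.
        size = (size + f_align - 1) // f_align * f_align
        offsets.append(size)
        size += f_size
    # Note: no tail padding -- size is not rounded up to alignment.
    return offsets, size, alignment

# Hypothetical fields: a 12-byte, 4-aligned float3, then two floats.
print(natural_layout([(12, 4), (4, 4), (4, 4)]))  # ([0, 12, 16], 20, 4)
```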
> 📝 **Remark:** Most target platforms do not use the natural layout directly, but it provides a baseline for
> defining other layouts. Any layout for a structure type must guarantee an alignment at least as large as the
> natural layout's.
## C-Style Layout
The C-style layout of a structure type differs from the natural layout in that the structure size is rounded
up to the structure alignment. This mirrors the layout rules used by typical C/C++ compilers.
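The only change from the natural layout is a final rounding step, sketched here as a hypothetical helper:

```python
def c_style_size(natural_size, alignment):
    # C-style layout: round the natural size up to the struct's
    # alignment (tail padding), as typical C/C++ compilers do.
    return (natural_size + alignment - 1) // alignment * alignment

# A struct with natural size 12 and alignment 8 is tail-padded to 16.
print(c_style_size(12, 8))  # 16
```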
## D3D Constant Buffer Layout
D3D constant buffer layout is similar to the natural layout, with two differences:
- The minimum alignment is 16 bytes.
- If a data member would cross a 16-byte boundary and its offset is not a multiple of 16, the offset is
  rounded up to the next multiple of 16. In HLSL, such a crossing is called an _improper straddle_.
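The boundary rule can be sketched with a hypothetical helper that adjusts a field's natural offset:

```python
def d3d_cbuffer_offset(offset, size):
    """Adjust a field offset for D3D constant buffer layout: a field
    whose start is not 16-byte aligned may not cross a 16-byte
    boundary (an 'improper straddle')."""
    crosses = offset // 16 != (offset + size - 1) // 16
    if offset % 16 != 0 and crosses:
        offset = (offset + 15) // 16 * 16
    return offset

print(d3d_cbuffer_offset(4, 12))  # 4  (ends at byte 15: no straddle)
print(d3d_cbuffer_offset(8, 12))  # 16 (would span bytes 8..19)
```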
# This Type
Within the body of a structure or interface declaration, the keyword `This` may be used to refer to the
enclosing type. Inside of a structure type declaration, `This` refers to the structure type itself. Inside
of an interface declaration, `This` refers to the concrete type that is conforming to the interface (that is,
the type of `this`).
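A minimal sketch of `This` inside a structure declaration (the `Counter` type is hypothetical):

```hlsl
struct Counter
{
    int value;

    // 'This' names the enclosing type, i.e. Counter.
    static This make(int start)
    {
        This result;
        result.value = start;
        return result;
    }
}
```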


@@ -0,0 +1,13 @@
# Type Traits
**TODO**: Describe all language-related type traits.
Incomplete list:
- copyable/non-copyable (do we have movable/non-movable?)
- opaque/non-opaque
- known size/unknown size. Note: this might need further classification into implementation-specified sizes
known by `slangc` vs sizes that only the target compiler knows (but sizeof() would still work)
- allowed storage duration: static (globals), block (function locals/params/return)
- serializable/non-serializable (types that can be assigned to interface-typed variables)
- addressable/non-addressable (types that are valid types for pointers)
- etc


@@ -0,0 +1,237 @@
# Vector and Matrix Types
## Vector Types
A `vector<T, N>` represents a vector of `N` elements of type `T` where:
- `T` is a [fundamental scalar type](types-fundamental.md)
- `N` is a [specialization-time constant integer](TODO-Generics.md) in range [1, 4] denoting the number of elements.
The default values for `T` and `N` are `float` and `4`. This is for backwards compatibility.
### Element Access
An element of a vector is accessed by the following means:
- Using the subscript operator `[]` (index `0` denotes the first element)
- Using the member of object operator `.` where the elements are named `x`, `y`, `z`, `w` corresponding to indexes `0`, `1`, `2`, `3`.
Example:
```hlsl
vector<int, 4> v = { 1, 2, 3, 4 };
int tmp;
tmp = v[0]; // tmp is 1
tmp = v.w; // tmp is 4
v[1] = 9; // v is { 1, 9, 3, 4 };
```
Multiple elements may be referenced by specifying two or more elements after the member access operator. This can be used to:
- Extract multiple elements. The resulting type is a vector with the size equal to the number of selected elements. The same element may be specified multiple times.
- Assign multiple elements using a vector with the size equal to the number of selected elements. The elements must be unique.
Example:
```hlsl
vector<int, 4> v = { 1, 2, 3, 4 };
int2 tmp2;
int3 tmp3;
tmp2 = v.xy; // tmp2 is { 1, 2 }
tmp3 = v.xww; // tmp3 is { 1, 4, 4 }
v.xz = vector<int, 2>(-1, -3); // v becomes { -1, 2, -3, 4 }
```
### Operators
When applying a unary arithmetic operator, the operator applies to all vector elements.
Example:
```hlsl
vector<int, 4> v = { 1, 2, 3, 4 };
vector<int, 4> tmp;
tmp = -v; // tmp is { -1, -2, -3, -4 };
```
When applying a binary arithmetic operator where the other operand is scalar, the operation applies to all vector elements with the scalar parameter.
Example:
```hlsl
vector<int, 4> v = { 1, 2, 3, 4 };
vector<int, 4> tmp;
tmp = v - 1; // tmp is { 0, 1, 2, 3 };
tmp = 4 - v; // tmp is { 4, 3, 2, 1 };
```
When applying a binary assignment operator where the right-hand operand is scalar, the assignment applies to all vector elements with the scalar parameter.
Example:
```hlsl
vector<int, 4> v = { 1, 2, 3, 4 };
v += 1; // v becomes { 2, 3, 4, 5 };
v = 42; // v becomes { 42, 42, 42, 42 };
```
When applying a binary arithmetic, assignment, or comparison operator to two vectors of the same length, the operator is applied element-wise.
Example:
```hlsl
vector<int, 4> v1 = { 1, 2, 3, 4 };
vector<int, 4> v2 = { 5, 6, 7, 8 };
vector<int, 4> tmp;
tmp = v1; // tmp is { 1, 2, 3, 4 };
tmp = v1 + v2; // tmp is { 6, 8, 10, 12 };
tmp = v1 * v2; // tmp is { 5, 12, 21, 32 };
vector<bool, 4> cmpResult;
cmpResult = (v1 == vector<int, 4>(1, 3, 2, 4));
// cmpResult is { true, false, false, true }
v1 -= v2; // v1 becomes { -4, -4, -4, -4 };
```
### Standard Type Aliases
Slang provides type aliases for all vectors between size 1 and 4 for fundamental scalar types. The type alias has name `<fundamental_type>N` where `<fundamental_type>` is one of the fundamental types and `N` is the vector length.
Example:
```hlsl
float4 v = { 1.0f, 2.0f, 3.0f, 4.0f }; // vector<float, 4>
int32_t2 i2 = { 1, 2 }; // vector<int, 2>
bool3 b3 = { true, false, false }; // vector<bool, 3>
```
### Memory Layout
The memory layout of a vector type is `N` contiguous values of type `T` with no padding.
The alignment of a vector type is target-defined. The alignment of `vector<T, N>` is at least the alignment of `T` and at most `N` times the alignment of `T`.
## Matrix Types
Type `matrix<T, R, C>` represents a `R`×`C` matrix of elements of type `T` where:
- `T` is a [fundamental scalar type](types-fundamental.md)
- `R` is a [specialization-time constant integer](TODO-Generics.md) in range [1, 4] denoting the number of rows.
- `C` is a [specialization-time constant integer](TODO-Generics.md) in range [1, 4] denoting the number of columns.
The default values for `T`, `R`, `C` are `float`, `4`, `4`. This is for backwards compatibility.
### Row and element access ###
A row of a matrix is accessed by the subscript operator `[]` (index `0` denotes the first row).
The element of a row is accessed by the following means:
- Using the subscript operator `[]` (index `0` denotes the first column)
- Using the member of object operator `.` where the columns are named `x`, `y`, `z`, `w` corresponding to column indexes `0`, `1`, `2`, `3`.
Example:
```hlsl
matrix<int, 3, 4> v = {
1, 2, 3, 4, // row index 0
5, 6, 7, 8, // row index 1
9, 10, 11, 12 // row index 2
};
int tmp1 = v[1][2]; // tmp1 is 7 (row index 1, column index 2)
int tmp2 = v[1].w; // tmp2 is 8 (row index 1, column index 3)
int4 tmp3 = v[2]; // tmp3 is { 9, 10, 11, 12 }
int2 tmp4 = v[0].yx; // tmp4 is { 2, 1 }
```
### Operators
When applying a unary operator, the operator applies to all matrix elements.
When applying a binary operator, it is applied element-wise. Both the left-hand side and the right-hand side operands must be matrices of the same dimensions.
The [matrix multiplication](https://en.wikipedia.org/wiki/Matrix_multiplication) is performed using function `mul()`, which has the following basic forms:
- matrix/matrix form `mul(m1, m2)` where `m1` is an `M`×`N` matrix and `m2` is an `N`×`P` matrix. The result is an `M`×`P` matrix.
- vector/matrix form `mul(v, m)` where `v` is a vector of length `N` and `m` is an `N`×`P` matrix. The result is a vector of length `P`.
- `v` is interpreted as a row vector, *i.e.*, a `1`×`N` matrix.
- matrix/vector form `mul(m, v)` where `m` is an `M`×`N` matrix and `v` is a vector of length `N`. The result is a vector of length `M`.
- `v` is interpreted as a column vector, *i.e.*, an `N`×`1` matrix.
> 📝 **Remark 1:** The operator `*` performs element-wise multiplication. It should be used only when the element-wise multiplication of same-sized matrices is desired.
> 📝 **Remark 2:** The operator `*` differs from GLSL, where it performs matrix multiplication. When porting code from GLSL to Slang, replace matrix multiplications using `*` with calls to `mul()`.
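The three basic forms can be modeled in plain Python, with a matrix as a list of row lists (a sketch of the semantics, not Slang's implementation):

```python
def mul(a, b):
    """Model of Slang's mul(): matrix*matrix, vector*matrix (row
    vector), and matrix*vector (column vector)."""
    if isinstance(a[0], list) and isinstance(b[0], list):
        # matrix (MxN) * matrix (NxP) -> MxP matrix
        return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
                 for j in range(len(b[0]))] for i in range(len(a))]
    if isinstance(b[0], list):
        # row vector (1xN) * matrix (NxP) -> vector of length P
        return [sum(a[k] * b[k][j] for k in range(len(a)))
                for j in range(len(b[0]))]
    # matrix (MxN) * column vector (Nx1) -> vector of length M
    return [sum(a[i][k] * b[k] for k in range(len(b)))
            for i in range(len(a))]

m = [[1, 2], [3, 4]]
print(mul(m, [1, 1]))  # [3, 7]   (matrix * column vector)
print(mul([1, 1], m))  # [4, 6]   (row vector * matrix)
```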
### Standard Type Aliases
Slang provides type aliases for all matrices between 1 and 4 rows and columns for fundamental scalar
types. The type alias has name `<fundamental_type>RxC` where `<fundamental_type>` is one of the fundamental
types, `R` is the number of rows, and `C` is the number of columns.
Example:
```hlsl
// matrix<float, 4, 3>
float4x3 m = {
1.1f, 1.2f, 1.3f,
2.1f, 2.2f, 2.3f,
3.1f, 3.2f, 3.3f,
4.1f, 4.2f, 4.3f,
};
```
### Memory Layout
Matrix types support both _row-major_ and _column-major_ memory layout.
Implementations may support command-line flags or API options to control the default layout to use for matrices.
Under row-major layout, a matrix is laid out in memory equivalently to an `R`-element array of `vector<T,C>` elements.
Under column-major layout, a matrix is laid out in memory equivalent to the row-major layout of its transpose.
That is, the layout is equivalent to a `C`-element array of `vector<T,R>` elements.
> 📝 **Remark 1:** Slang currently does *not* support the HLSL `row_major` and `column_major` modifiers to set the
> layout used for specific declarations.
The alignment of a matrix is target-specified. In general, it is at least the alignment of the element and at
most the size of the matrix rounded up to the next power of two.
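Assuming a tightly packed matrix with no per-row or per-column padding, the two layouts reduce to the following offset computation (a hypothetical helper):

```python
def matrix_element_offset(row, col, R, C, elem_size, row_major):
    """Byte offset of element [row][col] in a tightly packed R x C
    matrix (a sketch; real targets may add padding or alignment)."""
    if row_major:
        # R-element array of C-vectors: rows are contiguous.
        return (row * C + col) * elem_size
    # Column-major: C-element array of R-vectors (the row-major
    # layout of the transpose), so columns are contiguous.
    return (col * R + row) * elem_size

# float3x4 (3 rows, 4 columns), 4-byte elements, element [1][2]:
print(matrix_element_offset(1, 2, 3, 4, 4, row_major=True))   # 24
print(matrix_element_offset(1, 2, 3, 4, 4, row_major=False))  # 28
```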
### Important Note for OpenGL, Vulkan, Metal, and WebGPU Targets ###
Slang considers matrices as rows of vectors (row major), similar to HLSL and the usual mathematical
conventions. However, many graphics APIs including OpenGL, Vulkan, Metal, and WebGPU consider matrices as
columns of vectors (column major).
Summary of differences
| | Slang and HLSL | GLSL, SPIR-V, MSL, WGSL |
|------------------------------------|------------------|-------------------------|
| Initializer element ordering | Row major | Column major |
| type for float, 3 rows × 4 columns | `float3x4` | `mat4x3` (or similar) |
| Element access | `m[row][column]` | `m[column][row]` |
However, for efficient element access with the subscript operator `[]`, Slang reinterprets columns as rows and
vice versa on these targets. That is, a Slang `float3x4` matrix type maps to a `mat3x4` matrix type in
GLSL. This also applies to row-major and column-major memory layouts. A similar reinterpretation is also
performed by other compilers when compiling HLSL to SPIR-V.
Perhaps most notably, this reinterpretation results in swapped order in matrix multiplication in target
code. For example:
Slang source code:
```hlsl
float4 doMatMul(float4x3 m, float3 v)
{
return mul(m, v);
}
```
Translated GLSL target code:
```glsl
vec4 doMatMul_0(mat4x3 m_0, vec3 v_0)
{
return (((v_0) * (m_0)));
}
```


@@ -0,0 +1,177 @@
# Types
Slang types:
* [Fundamental Types](types-fundamental.md)
* [Vector and Matrix Types](types-vector-and-matrix.md)
* [Structures](types-struct.md) and [Classes](types-class.md)
* [Extensions](types-extension.md)
* [Array Types](types-array.md)
* [Pointers](types-pointer.md)
* [Interfaces](types-interface.md)
* [Special Types](types-special.md)
Other topics:
* [Type Traits](types-traits.md)
* [Type Attributes](types-attributes.md)
## Type Specifiers {#specifier}
A [type specifier](#specifier) names a type. Type specifiers are used in variable declarations, function
parameter and return type declarations, and anywhere else a type is required. Type specifiers are divided
into two categories:
- A **simple type specifier** is a type expression that names a type but never declares one. Simple type
specifiers are used in function parameter and return type declarations, modern variable declarations, type
constraints, and other places where the ability to declare new types is not expected. Two main forms
exist:
- *Simple type identifier specifier* based on a previously declared type, optionally with an array
declaration and generic parameters.
- *Simple function type specifier* specifying a function type.
- A **type specifier** is a type expression that names a type, possibly by declaring it. A simple type
specifier is a subset of the full type specifier. A type specifier is a part of the
[variable declaration](declarations.md) syntax, which is used to declare variables, as the name suggests.
### Syntax {#syntax}
Simple type specifier:
> *`simple-type-spec`* =<br>
> &nbsp;&nbsp;&nbsp;&nbsp;(*`simple-type-id-spec`*<br>
> &nbsp;&nbsp;&nbsp;&nbsp;|*`simple-type-func-spec`*)
Type specifier for named types, with optional array and pointer declarators:
> *`simple-type-id-spec`* =<br>
> &nbsp;&nbsp;&nbsp;&nbsp;[*`modifier-list`*]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;*`type-identifier`*<br>
> &nbsp;&nbsp;&nbsp;&nbsp;[*`generic-params-decl`*]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;(**`'['`** [*`constant-index-expr`*] **`']'`** | **`'*'`** )*
Type specifier for function types:
> *`simple-type-func-spec`* =<br>
> &nbsp;&nbsp;&nbsp;&nbsp;[*`modifier-list`*]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;**`'functype'`** **`'('`** *`param-list`* **`')'`** **`'->'`** *`simple-type-id-spec`*
Full type specifier, possibly declaring a new type:
> Simple type specifier:<br>
> *`type-spec`* = *`simple-type-spec`*
> <br><br>
> struct/class/enum type specifier:<br>
> *`type-spec`* =<br>
> &nbsp;&nbsp;&nbsp;&nbsp;(*`struct-decl`* | *`class-decl`* | *`enum-decl`*)<br>
> &nbsp;&nbsp;&nbsp;&nbsp;[*`generic-params-decl`*]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;(**`'['`** [*`constant-index-expr`*] **`']'`** | **`'*'`** )*<br>
#### Parameters
- *`modifier-list`* is an optional list of modifiers (TODO: link)
- *`type-identifier`* is an identifier that names an existing type or a generic type. For example, this may be
a [fundamental type](types-fundamental.md), [vector/matrix generic type](types-vector-and-matrix.md),
user-defined type such as a named [structure type](types-struct.md), [interface type](types-interface.md),
[enumeration type](types-enum.md), type alias, or a type provided by a module.
- *`generic-params-decl`* is a generic parameters declaration. See [Generics (TODO)](TODO).
- **`'['`** [*`constant-index-expr`*] **`']'`** is an [array dimension declaration](types-array.md) with an
optional constant integral expression specifying the dimension length.
- **`'*'`** is a [pointer declaration](types-pointer.md).
- *`param-list`* is a function parameter list. See [function parameter list (TODO)](TODO).
- *`struct-decl`* is a [structure](types-struct.md) type declaration, possibly also defining the type.
- *`class-decl`* is a [class (TODO)](types-class.md) type declaration, possibly also defining the type.
- *`enum-decl`* is an [enumeration (TODO)](types-enum.md) type declaration, possibly also defining the type.
### Description
A type specifier names a type and possibly also declares a new type. The named type is always a non-generic
type. If *`type-identifier`* specifies a generic type, generic parameters *`generic-params-decl`* must be
provided to fully specialize the type.
Simple type specifiers *`simple-type-spec`* only name types but never declare new types. Simple type
specifiers are used in:
- [modern variable (TODO)](TODO) declarations
- [function parameter (TODO)](TODO) declarations
- [function return value type (TODO)](TODO) declarations
- [structure property](types-struct.md#property)
- [structure subscript operator](types-struct.md#subscript-op)
- [generic type parameter declarations (TODO)](TODO)
- [typealias](#alias) declarations
Declaration of new types is allowed in:
- Global declaration statements (TODO: link)
- Function body declaration statements (TODO: link)
- Traditional variable declarations (TODO: link)
- [structure](types-struct.md) members declaring nested types
- [extension](types-extension.md) members declaring nested types
- [typedef](#alias) declarations
> 📝 **Remark 1:** *`simple-type-spec`* is a syntactic subset of the full *`type-expr`*. The subset only names
> a type but never declares one.
> 📝 **Remark 2:** The dual nature of type expressions---naming and possibly declaring a type---is a side
> effect of the C-style type expression grammar. This extends to traditional variable declarations where a
> single declaration can declare a type and one or more variables. (TODO: link)
> 📝 **Remark 3:** Unlike in C++, `const`, `inline`, `volatile`, and similar keywords are modifiers. This
> restricts their allowed placement to the left of the type specifier. For example, `const int a = 5;` is a
> valid variable declaration but `int const a = 5;` is not.
## Type Alias Declarations {#alias}
A [type alias](#alias) is a name that refers to a previously declared type.
### Syntax
Type alias declaration:
> **`'typealias'`** *`identifier`* **`'='`** *`simple-type-spec`* **`';'`**
Typedef declaration:
> **`'typedef'`** *`type-spec`* *`identifier`* **`';'`**
Generic type alias declaration:
> **`'typealias'`** *`identifier`*<br>
> &nbsp;&nbsp;&nbsp;&nbsp; *`generic-params-decl`* (**`'where'`** *`where-clause`*)\* **`'='`**<br>
> &nbsp;&nbsp;&nbsp;&nbsp;*`simple-type-spec`* [*`generic-params-decl`*] **`';'`**
### Description
A `typealias` declaration introduces a name for a type. A `typedef` declaration is an alternative syntax that
also allows declaring a new type.
A generic type alias declaration declares a parameterized alias for a generic type. This is described in
[Generics (TODO)](TODO).
## Complete and Incomplete Types {#incomplete}
A type is incomplete when it is declared but not defined. An incomplete type cannot be used to declare
variables. An incomplete type other than `void` may be completed with a subsequent definition. For further
information, see [declarations](declarations.md).
## Memory Layout
Types in Slang do not generally have identical memory layouts in different targets. Any unspecified details on
layout may depend on the target language, the target device, the declared extensions, the compiler options,
and the context in which a type is used.
## Known and Unknown Size
Every type has either a known or an unknown size. Types with unknown size generally stem from unknown-length
arrays:
* An unknown-length array type has an unknown size.
* The size of a structure type is unknown if it has a non-static data member with unknown size.
The use of types with unknown size is restricted as follows:
* A type with unknown size cannot be used as the element type of an array.
* A type with unknown size can only be used as the last field of a structure type.
* A type with unknown size cannot be used as a generic argument to specialize a user-defined type, function,
etc. Specific built-in generic types/functions may support unknown-size types, and this will be documented
on the specific type/function.
* A type with unknown size cannot be instantiated as a variable.
> 📝 **Remark:** Unknown size is different from unspecified or target-specified size. Many
> [special types](types-special.md) have target-specified sizes; sizes of [structures](types-struct.md) and
> [arrays](types-array.md) are subject to target-specific alignment rules; and certain
> [fundamental types](types-fundamental.md) such as `bool` have target-specified sizes. Types with unspecified
> or target-specified sizes are not subject to the restrictions of types with unknown sizes, although they may
> have other restrictions.

Parameter Layout Rules
======================
An important goal of the Slang project is that the rules for how shader parameters get assigned to `register`s/`binding`s is completely deterministic, so that users can rely on the compiler's behavior.
This document will attempt to explain the rules that Slang employs at a high level.
Eventually it might evolve into a formal specification of the expected behavior.
Guarantees
----------
The whole point of having a deterministic layout approach is the guarantees that it gives to users, so we will start by explicitly stating the guarantees that users can rely upon:
* A single top-level shader parameter will always occupy a contiguous range of bindings/registers for each resource type it consumes (e.g., a contiguous range of `t` registers, a contiguous range of bytes in a `cbuffer`, etc.).
* The amount of resources a parameter consumes depends only on its type and the top-level context in which it appears (e.g., is it in a `cbuffer`? an entry-point varying parameter? etc.).
* A shader parameter that is declared the same way in two different programs will get the same *amount* of resources (registers/bytes) allocated for it in both programs, but it might get a different starting offset/register.
* Changing the bodies of functions in shader code cannot change the layout of shader parameters. In particular, just because a shader parameter is "dead" does not mean it gets eliminated.
* If the user doesn't use explicit `register`/`layout` modifiers to bind parameters, then each module will get a contiguous range of bindings, and the overall program will always use a contiguous range starting from zero for each resource type.
Overview of the Layout Algorithm
--------------------------------
Layout is applied to a Slang *compile request* which comprises one or more *translation units* of user code, and zero or more `import`ed modules.
The compile request also specifies zero or more *entry points* to be compiled, where each entry point identifies a function and a profile to use.
Layout is always done with respect to a chosen *target*, and different targets might compute the resource usage for types differently, or apply different alignment.
Within a single target there may also be different layout rules (e.g., the difference between GLSL `std140` and `std430`).
Layout proceeds in four main phases:
1. Establish a global ordering on shader parameters
2. Compute the resource requirements of each shader parameter
3. Process shader parameters with fixed binding modifiers
4. Allocate bindings to parameters without fixed binding modifiers
Ordering (and Collapsing) Shader Parameters
-------------------------------------------
Shader parameters from the user's code always precede shader parameters from imported modules.
The order of parameters in the user's code is derived by "walking" through the code as follows:
* Walk through each translation unit in the order they were added via API (or the order they were listed on the command line)
* Walk through each source file of a translation unit in the order they were added/listed
* Walk through global-scope shader parameter declarations (global variables, `cbuffer`s, etc.) in the order they are listed in the (preprocessed) file.
* After all global parameters for a translation unit have been walked, walk through any entry points in the translation unit.
* When walking through an entry point, walk through all of its function parameters (both uniforms and varyings) in order, and then walk the function result as a varying output parameter.
When dealing with global-scope parameters in the user's code, it is possible for the "same" parameter to appear in multiple translation units.
Any two global shader parameters in user code with the same name are assumed to represent the same parameter, and will only be included in the global order at the first location where they are seen.
It is an error for the different declarations to have a mismatch in type, or conflicting explicit bindings.
Parameters from `import`ed modules are enumerated after the user code, using the order in which modules were first `import`ed.
The order of parameters within each module is the same as when the module was compiled, which matches the ordering given above.
Computing Resource Requirements
-------------------------------
Each shader parameter computes its resource requirements based on its type, and how it is declared.
* Global-scope parameters, entry point `uniform` parameters, and `cbuffer` declarations all use the "default" layout rules
* Entry point non-`uniform` parameters use "varying" layout rules, either input or output
* A few other special case rules exist (e.g., for laying out the elements of a `StructuredBuffer`), but most users will not need to worry about these
Note that the "default" rules are different for D3D and GL/Vulkan targets, because they have slightly different packing behavior.
### Plain Old Data
Under the default rules simple scalar types (`bool`, `int`, `float`, etc.) are laid out as "uniform" data (that is, bytes of ordinary memory).
In most cases, the size matches the expected data type size (although be aware that most targets treat `bool` as a synonym for `int`) and the alignment is the same as the size.
### Vectors
Vectors are laid out as N sequential scalars.
Under HLSL rules, a vector has the same alignment as its scalar type.
Under GLSL `std140` rules, a vector has an alignment that is its size rounded up to the next power of two (so a `float3` has `float4` alignment).
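The `std140` vector rule above can be sketched in Python. This is only a model of the rule as stated here; the function names are illustrative and not part of any Slang API.

```python
def round_up_pow2(n: int) -> int:
    # Smallest power of two that is >= n.
    p = 1
    while p < n:
        p *= 2
    return p

def std140_vector_alignment(scalar_size: int, count: int) -> int:
    # std140: a vector's alignment is its size rounded up to the next
    # power of two, so a float3 (12 bytes) gets float4 (16-byte) alignment.
    return round_up_pow2(scalar_size * count)
```

For example, `std140_vector_alignment(4, 3)` yields 16, matching the `float3`-aligned-as-`float4` case called out above.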
### Opaque Types
"Opaque" types include resource/sampler types like `Texture2D` and `SamplerState`.
These consume a single "slot" of the appropriate category for the chosen API.
Note that when compiling for D3D, a `Texture2D` and a `SamplerState` will consume different resources (`t` and `s` registers, respectively), but when compiling for Vulkan, they both consume the same resource ("descriptor table slot").
Opaque types currently all have an alignment of one.
### Structures
A structure is laid out by initializing a counter for each resource type, and then processing fields sequentially (in declaration order):
* Compute resource usage for the field's type
* Adjust counters based on the alignment of the field for each resource type where it has non-zero usage
* Assign an offset to the field for each resource type where it has non-zero usage
* Add the resource usage of the field to the counters
An important wrinkle is that when doing layout for HLSL, we must ensure that if a field with uniform data that is smaller than 16 bytes would straddle a 16-byte boundary, we advance to the next 16-byte aligned offset.
The overall alignment of a `struct` is the maximum alignment of its fields or the default alignment (if it is larger).
The default alignment is 16 for both D3D and Vulkan targets.
The final resource usage of a `struct` is rounded up to a multiple of the alignment for each resource type. Note that we allow a `struct` to consume zero bytes of uniform storage.
It is important to note that a `struct` type can use resources of many different kinds, so in general we cannot talk about the "size" of a type, but only its size for a particular kind of resource (uniform bytes, texture registers, etc.).
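The field-placement loop described above can be sketched in Python. This is a minimal model of the uniform-data portion of the D3D/HLSL rules only (it ignores the other resource kinds a field may consume); the function name and the `(name, size, alignment)` field encoding are illustrative.

```python
def hlsl_uniform_layout(fields):
    """fields: list of (name, size, alignment) tuples for uniform data.

    Returns ({name: offset}, total_size): align each field, bump a
    sub-16-byte field that would straddle a 16-byte boundary to the
    next boundary, and round the struct size up to the default
    16-byte alignment.
    """
    offset = 0
    offsets = {}
    for name, size, alignment in fields:
        # Advance to the field's own alignment.
        offset = (offset + alignment - 1) // alignment * alignment
        # A field smaller than 16 bytes must not straddle a 16-byte boundary.
        if size < 16 and offset // 16 != (offset + size - 1) // 16:
            offset = (offset + 15) // 16 * 16
        offsets[name] = offset
        offset += size
    # Round the total size up to the (default, 16-byte) alignment.
    total = (offset + 15) // 16 * 16
    return offsets, total

# struct { float3 pos; float2 uv; }: uv would occupy bytes 12..19,
# straddling a 16-byte boundary, so it is bumped to offset 16.
```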
### Sized Arrays
For uniform data, the size of the element type is rounded up to the target-specific minimum (e.g., 16 for D3D and Vulkan constant buffers) to arrive at the *stride* of the array. The total size of the array is then the stride times the element count.
For opaque resource types, the D3D case simply takes the stride to be the number of registers consumed by each element, and multiplies this by the element count.
For Vulkan, an array of resources uses only a single `binding`, so that the stride is always zero for these resource kinds, and the resource usage of an array is the same as its element type.
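The stride rule for uniform data in sized arrays can be sketched as follows (names are illustrative; this covers uniform bytes only, not register or binding usage):

```python
def uniform_array_layout(element_size, element_count, min_stride=16):
    # The element size is rounded up to the target-specific minimum
    # (16 bytes for D3D and Vulkan constant buffers) to get the stride;
    # the total size is stride * element_count, as described above.
    stride = (element_size + min_stride - 1) // min_stride * min_stride
    return stride, stride * element_count
```

So a `float3[4]` in a constant buffer gets a 16-byte stride and a 64-byte total size under this rule.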
### Unsized Arrays
The uniform part of an unsized array has the same stride as for the sized case, but an effectively infinite size.
For register/binding resource usage, a Vulkan unsized array is just like a sized one, while a D3D array will consume a full register *space* instead of individual registers.
### Constant Buffers
To determine the resource usage of a constant buffer (either a `cbuffer { ... }` declaration or a `ConstantBuffer<T>`) we look at the resource usage of its element type.
If the element uses any uniform data, the constant buffer will use at least one constant-buffer register (or whatever the target-specific resource is).
If the element uses any non-uniform data, that usage will be added to that of the constant buffer.
### Parameter Blocks
A parameter block is similar to a constant buffer.
If the element type uses any uniform data, we compute resource usage for a constant buffer.
We then add in any non-uniform resource usage for the element types.
If the target requires use of register spaces (e.g., for Vulkan), then a parameter block uses a single register space; otherwise it exposes the resource usage of its element type directly.
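The constant-buffer and parameter-block rules can be summarized in a small Python sketch. The resource-kind strings and the function signature are illustrative, not reflection-API names.

```python
def parameter_block_usage(uniform_bytes, other_usage, target_uses_spaces):
    """other_usage: dict of resource kind -> count for the element
    type's non-uniform usage (textures, samplers, ...)."""
    usage = dict(other_usage)
    if uniform_bytes > 0:
        # Any uniform data implies a constant-buffer register to hold it.
        usage["constant-buffer"] = usage.get("constant-buffer", 0) + 1
    if target_uses_spaces:
        # On targets with register spaces (e.g. Vulkan), the whole
        # block collapses into a single space.
        return {"space": 1}
    # Otherwise the block exposes its element's resource usage directly.
    return usage
```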
Processing Explicit Binding Modifiers
-------------------------------------
If the user put an explicit binding modifier on a parameter, and that modifier applies to the current target, then we use it and "reserve" space in the overall binding range.
Traditional HLSL `register` modifiers only apply for D3D targets.
Slang currently allows GLSL-style `layout(binding = ...)` modifiers to be attached to shader parameters, and will use those modifiers for GL/Vulkan targets.
If two parameters reserve overlapping ranges, we currently issue an error.
This may be downgraded to a warning for targets that support overlapping ranges.
Allocating Bindings to Parameters
---------------------------------
Once ranges have been reserved for parameters with explicit bindings, the compiler goes through all parameters again, in the global order and assigns them bindings based on their resource requirements.
For each resource type used by a parameter, it is allocated the first contiguous range of resources of that type that have not been reserved.
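A first-fit allocator matching this description might look like the following Python sketch, for a single resource kind (names are illustrative):

```python
def allocate_bindings(params, reserved):
    """params: list of (name, count) resource requirements, already in
    the global parameter order. reserved: set of register indices taken
    by explicit bindings. Returns {name: first_register}, giving each
    parameter the first contiguous unreserved range that fits."""
    assigned = {}
    taken = set(reserved)
    for name, count in params:
        start = 0
        # Scan forward until `count` consecutive registers are free.
        while any((start + i) in taken for i in range(count)):
            start += 1
        assigned[name] = start
        taken.update(start + i for i in range(count))
    return assigned
```

For example, with register 0 reserved, a parameter needing two registers lands at 1-2, and the next single-register parameter lands at 3.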
Splitting of Arrays
-------------------
In order to support `struct` types that mix uniform and non-uniform data, the Slang compiler always "splits" these types.
For example, given:
```hlsl
struct LightInfo { float3 pos; Texture2D shadowMap; };
LightInfo gLight;
```
Slang will generate code like:
```hlsl
float3 gLight_pos;
Texture2D gLight_shadowMap;
```
In a simple case like the above, this doesn't affect layout at all, but once arrays get involved, the layout can be more complicated. Consider this case:
```hlsl
struct Pair { Texture2D a; Texture2D b; };
Pair gPairs[8];
```
The output from the splitting step is equivalent to:
```hlsl
Texture2D gPairs_a[8];
Texture2D gPairs_b[8];
```
While this transformation is critical for having a type layout algorithm that applies across all APIs (and also it is pretty much required to work around various bugs in downstream compilers), it has the important down-side that the value `gPairs[0]` does not occupy a contiguous range of registers (although the top-level shader parameter `gPairs` *does*).
The Slang reflection API will correctly report the information about this situation:
* The "stride" of the `gPairs` array will be reported as one, because `gPairs[n+1].a` is always one register after `gPairs[n].a`.
* The offset of the `gPairs.b` field will be reported as 8, because `gPairs[0].b` will be 8 registers after the starting register for `gPairs`.
The Slang API tries to provide the best information it can in this case, but it is still important for users who mix arrays and complex `struct` types to know how the compiler will lay them out.
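The reported stride and offsets for the `gPairs` example can be reproduced with a small sketch (assuming `gPairs` starts at register 0; the helper is illustrative):

```python
def split_register(base, field_index, element_count, element_index):
    # After splitting, all copies of field `a` come first, then all
    # copies of `b`, so:
    #   register = base + field_index * element_count + element_index
    # which makes the per-element "stride" 1 and the offset of field
    # `b` equal to the element count (8 here).
    return base + field_index * element_count + element_index

# Pair { Texture2D a; Texture2D b; }; Pair gPairs[8];
```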
Generics
--------
Generic type parameters complicate these layout rules.
For example, we cannot compute the exact resource requirements for a `vector<T,3>` without knowing what the type `T` is.
When computing layouts for fully specialized types or programs, no special considerations are needed: the rules as described in this document still apply.
One important consequence to understand is that given a type like:
```hlsl
struct MyStuff<T>
{
int a;
T b;
int c;
}
```
the offset computed for the `c` field depends on the concrete type that gets plugged in for `T`.
We think this is the least surprising behavior for programmers who might be familiar with things like C++ template specialization.
In cases where confusion about a field like `c` getting different offsets in different specializations is a concern, users are encouraged to declare types so that all non-generic-dependent fields come before generic-dependent ones.

Slang LLVM Targets
==================
The LLVM targets are capable of creating LLVM IR and object code for arbitrary
target triples (`<machine>-<vendor>-<os>`, e.g. `x86_64-unknown-linux`). This
allows for highly performant and debuggable Slang code on almost any platform,
as long as an LLVM backend exists for it.
The current state is highly experimental and there are many missing features.
The feature also focuses heavily on CPUs for now.
## Targets
The HOST / SHADER split from the [CPU target](cpu-target.md) applies here as well.
The following targets always use LLVM:
* `llvm-ir` / `SLANG_HOST_LLVM_IR` generates LLVM IR in the text representation, suitable for free-standing functions.
* `llvm-shader-ir` / `SLANG_SHADER_LLVM_IR` generates LLVM IR for compute shader entry points.
The following targets use LLVM when `-emit-cpu-via-llvm` or `EmitCPUMethod=SLANG_EMIT_CPU_VIA_LLVM` is specified:
* `host-object-code` / `SLANG_HOST_OBJECT_CODE` generates position-independent object code, which can be
linked into an executable or a static or dynamic library.
* `shader-object-code` / `SLANG_OBJECT_CODE` generates object code for compute shader entry points.
* `SLANG_HOST_HOST_CALLABLE` and `SLANG_SHADER_HOST_CALLABLE` JIT compile the module.
Support for `exe` / `SLANG_HOST_EXECUTABLE` and `sharedlib` /
`SLANG_HOST_SHARED_LIBRARY` may be added later, once the LLVM target has
stabilized. For compiling to platforms other than the current CPU running the
Slang compiler, the following options are provided:
* `-llvm-target-triple <target-triple>`. The default is the host machine's triple.
* `-llvm-cpu <cpu-name>` sets the target CPU, similar to Clang's `-mcpu=<cpu-name>`.
* `-llvm-features <features>` sets the available features, similar to LLC's `-mattr=<features>`.
## Features
* Compile stand-alone programs in Slang for platforms supported by LLVM
* Focus on memory layout correctness: type layouts such as the scalar layout are handled correctly
* Does not depend on external compilers (although it currently depends on external linkers!)
* Works well with debuggers!
## Standalone programs
You can write functions decorated with `export` to make them visible from the
resulting object code, and `__extern_cpp` to leave their names unmangled. So, for a
standalone Slang application, the entry point is:
```slang
export __extern_cpp int main(int argc, NativeString* argv)
{
// Do whatever you want here!
return 0;
}
```
To cross-compile, you can use `-llvm-target-triple <target-triple>`. For now,
you'll need to compile into an object file and use a compiler or linker to turn
that into an executable, e.g. with `clang main.o -o main.exe`.
## Application Binary Interface
This section defines the ABI rules which code generated by the LLVM target
follows and expects of external code calling into it.
The default type layout aligns vectors to the next power of two, structures are
aligned and padded up to the largest alignment among their fields, and booleans
are a single byte.
If you specify a different layout with flags like `-fvk-use-c-layout` or
`-fvk-use-scalar-layout`, all structure and array types on the stack and heap
will follow those layout rules.
### Types and resources
* `StructuredBuffer` and `ByteAddressBuffer` are stored as `{ Type* data; intptr_t size; }`,
where `size` is the number of elements in `data`.
* Vectors are passed as LLVM vector types; there's no direct equivalent in standard C or C++.
* Matrix types are lowered into arrays of vectors. Column and row major matrices are supported as normal.
### Aggregate parameters
All aggregates (structs and arrays) are always passed by reference in Slang's
LLVM emitter. Other than that, the target platform's C calling conventions are
followed. This stems from [LLVM not handling aggregates correctly](https://discourse.llvm.org/t/passing-structs-to-c-functions/83938/8) in calling conventions, and requiring every frontend to painstakingly
reimplement the same per-target logic if they want full C compatibility.
This means that if you declare a function like this in Slang:
```slang
export __extern_cpp MyStruct func(MyStruct val);
```
It would have the following signature in C:
```c
void func(const MyStruct *val, MyStruct *returnval);
```
In other words, aggregate parameters are turned into pointers and aggregate
return values are turned into an additional pointer-typed parameter at the end
of the parameter list.
### C foreign functions
Due to the aggregate parameter passing limitation of LLVM, calling arbitrary C
functions from Slang is complicated, and a hypothetical binding generator would
need to generate calling convention adapter functions. A binding generator would
be a useful tool to include, but remains as future work.
## Limitations
The LLVM target support is work-in-progress, and there are currently many
limitations.
### CPU targets only
Currently, support is limited to conventional CPU targets. The emitted LLVM IR
is not compatible with LLVM's SPIR-V target, for example. At least resource
bindings and pointer address spaces would have to be accounted for to expand
support to GPU targets. Slang already has native emitters for GPU targets, so
you can use those instead of going through LLVM.
### Pointer size
Currently, the Slang compiler assumes that the size of pointers matches the
compiler's host platform. This means that on a 64-bit PC, only target triples
with 64-bit pointers generate correct code. This can be a difficulty if one
wants to build Slang programs for a 32-bit microcontroller, and should
hopefully be fixed eventually.
### Missing compute shader features
* No `groupshared`.
* No barriers.
* No atomics.
* No wave operations.
These limitations stem from the fact that work items / threads of a work group
are currently run serially instead of actually being in parallel. This may be
improved upon later.
### Limited vectorization
Vector instructions are vectorized in the way typical CPU math libraries
(e.g. GLM) vectorize, as long as the target CPU allows for vector instructions.
This is worse than how GPUs do it, where each work item / thread gets a SIMD
lane. This aspect may be improved upon later.
### Compatibility with prior CPU Slang features
There are limitations regarding features of the existing C++ based [CPU target](./cpu-target.md).
The following features are not yet supported:
* `String` type.
* `new`.
* `class`.
* COM interfaces.
The implementations of these rely on C++ features, and are not trivial to
implement in LLVM. Support for them may be added later.
### Missing types
* No texture or sampler types.
* No acceleration structures.
These are missing due to limitation of scope for the initial implementation,
and may be added later.
## Gotchas
### Out-of-bounds buffer access
Attempting to index past the end of any buffer type is undefined behaviour. It
is not guaranteed to return zero as in HLSL; segmentation faults and memory
corruption are more than likely to occur!
### `sizeof`
Slang's `sizeof` may appear to "lie" to you about structs that contain padding,
unless you specify `-fvk-use-scalar-layout`. That's because it queries layout
information without knowing about the actual layout being used. Use `__sizeOf`
instead to get accurate sizes from LLVM, e.g. for memory allocation purposes.

NVAPI Support
=============
Slang provides support for [NVAPI](https://developer.nvidia.com/nvapi) in several ways
* Slang allows the use of NVAPI directly, by the inclusion of the `#include "nvHLSLExtns.h"` header in your Slang code. Doing so will make all the NVAPI functions directly available and usable within your Slang source code.
* NVAPI is used to provide features implicitly for certain targets. For example support for [RWByteAddressBuffer atomics](target-compatibility.md) on HLSL based targets is supported currently via NVAPI.
* Direct and implicit NVAPI usage can be freely mixed.
Direct usage of NVAPI
=====================
Direct usage of NVAPI just requires the inclusion of the appropriate NVAPI header, typically with `#include "nvHLSLExtns.h"`, within your Slang source. As NVAPI requires, the slot (and perhaps space) usage must be specified before the `#include`. For example, a typical direct NVAPI usage inside a Slang source file might contain something like...
```
#define NV_SHADER_EXTN_SLOT u0
#include "nvHLSLExtns.h"
```
In order for the include to work, the include path must contain the folder that holds `nvHLSLExtns.h` and its associated headers.
Implicit usage of NVAPI
=======================
It is convenient and powerful to be able to use NVAPI calls directly, but doing so will only work on targets that support the mechanism, even if the functionality could be supported some other way.
Slang provides some cross-platform features on HLSL-based targets that are implemented via NVAPI. For example, RWByteAddressBuffer atomics are supported on Vulkan, DX12, and CUDA. On DX12 they are made available via NVAPI, whilst CUDA and Vulkan have direct support. When compiling Slang code that uses RWByteAddressBuffer atomics, Slang will emit HLSL code that uses NVAPI. In order for the downstream compiler to be able to compile this HLSL, it must be able to include the NVAPI header `nvHLSLExtns.h`.
It is worth discussing briefly how this mechanism works. Slang has a 'prelude' mechanism for different source targets. The prelude is a piece of text that is inserted before the source that is output from compiling the input Slang source code. There is a default prelude for HLSL that is something like
```
#ifdef SLANG_HLSL_ENABLE_NVAPI
#include "nvHLSLExtns.h"
#endif
```
If there are any implicit calls to NVAPI from Slang source, then the following is emitted before the prelude
```
#define SLANG_HLSL_ENABLE_NVAPI 1
#define NV_SHADER_EXTN_SLOT u0
#define NV_SHADER_EXTN_REGISTER_SPACE space0
```
This causes the prelude to include `nvHLSLExtns.h`, with the slot, and potentially the space, specified as its inclusion requires.
The actual values for the slot, and optionally the space, are determined by Slang examining those macros at the end of preprocessing the input Slang source files.
This means that if you compile Slang source that makes implicit use of NVAPI, the slot, and optionally the space, must be defined. This can be achieved with a command-line `-D`, through the API, or through having suitable `#define`s in the Slang source code.
It is worth noting that if you *replace* the default HLSL prelude and use NVAPI, it will be necessary to include something like the default HLSL prelude as part of your custom prelude.
Downstream Compiler Include
---------------------------
There is a subtle detail that is perhaps worth noting here around the downstream compiler and `#include`s. When Slang outputs HLSL it typically does not contain any `#include`, because all of the `#include` in the original source code have been handled by Slang. Slang then outputs everything required to compile to the downstream compiler *without* any `#include`. When NVAPI is used explicitly this is still the case - the NVAPI headers are consumed by Slang, and then Slang will output HLSL that does not contain any `#include`.
The astute reader may have noticed that the default Slang HLSL prelude *does* contain an include, which is enabled via SLANG_HLSL_ENABLE_NVAPI macro which Slang will set with implicit NVAPI use.
```
#ifdef SLANG_HLSL_ENABLE_NVAPI
#include "nvHLSLExtns.h"
#endif
```
This means that the *downstream* compiler (such as DXC and FXC) must be able to handle this include. Include paths can be specified for downstream compilers via the [-X mechanism](user-guide/08-compiling.md#downstream-arguments). So for example...
```
-Xfxc -IpathTo/nvapi -Xdxc -IpathTo/nvapi
```
In the explicit scenario where `nvHLSLExtns.h` is included in Slang source, the include path must be specified in Slang through the regular mechanisms.
In a scenario with both implicit and explicit use, both Slang *and* the downstream compiler need to have a suitable path specified. Things can be more complicated if there is mixed implicit/explicit NVAPI usage and in the Slang source the include path is set up such that NVAPI is included with
```
#include "nvapi/nvHLSLExtns.h"
```
Since Slang and the downstream compilers can specify different include paths, the downstream compiler include path can be such that `#include "nvHLSLExtns.h"` works with the default prelude.
Another way of working around this issue is to alter the prelude for downstream compilers such that it contains an absolute path for the `#include`. This is the mechanism that is currently used with the Slang test infrastructure.
Links
-----
More details on how this works can be found in the following PR
* [Simplify workflow when using NVAPI #1556](https://github.com/shader-slang/slang/pull/1556)
@@ -0,0 +1,90 @@
Slang Compilation Reproduction
==============================
Slang has both API and command line support for reproducing compilations - so-called 'repro' functionality.
One use of the feature is when a compilation fails, or produces an unexpected or wrong result: it provides a simple-to-use mechanism by which the compilation can be repeated or 'reproduced', most often on another machine. Instead of having to describe all the options, and make sure all of the files used are copied in a way that repeats the result, all that is required is for the compilation to be run on the host machine with repro capture enabled, and then for that 'repro' to be used for a compilation on the test machine. There are also mechanisms by which the contents of the original compilation can be altered.
The actual data saved is the contents of the SlangCompileRequest. Currently no state is saved from the SlangSession. Saving and loading a SlangCompileRequest into a new SlangCompileRequest should provide two SlangCompileRequests with the same state, and with the second compile request having access to all the files contents the original request had directly in memory.
There are a few command line options
* `-dump-repro [filename]` dumps the compilation's state (ie after attempting to compile) to the specified file
* `-extract-repro [filename]` extracts the contents of the repro file. The contained files are placed in a directory named the same as the repro file minus its extension. A 'manifest' is also written.
* `-load-repro [filename]` loads the repro and compiles using its options. Note this must be the last arg on the command line.
* `-dump-repro-on-error` if a compilation fails will attempt to save a repro (using a filename generated from first source filename)
* `-repro-file-system [filename]` makes the repro's file contents appear as the file system during a compilation. Does not set any compilation options.
* `-load-repro-directory [directory]` compiles all of the .slang-repro files found in `directory`
The `manifest` made available via `-extract-repro` provides some very useful information
* Provides an approximation of the command line that will produce the same compilation under [compile-line]
* A list of all the unique files held in the repro [files]. It specifies their 'unique name' (as used to identify them in the repro) and the unique identifier used by the file system.
* A list of how paths map to unique files. Listed as the path used to access the file, followed by the unique name used in the repro
First it is worth describing what is required to reproduce a compilation. Most straightforwardly the options set up for the compilation need to be stored. This would include any flags, defines, include paths, entry points, input filenames and so forth. Also needed will be the contents of any files that were specified. These might be files on the file system, but could also be 'files' specified as strings through the Slang API. Lastly we need any files that were referenced as part of the compilation - this could be include files, module source files and so forth. All of this information is bundled up together into a file that can later be loaded and compiled. This is broadly speaking all of the data that is stored within a repro file.
In order to capture a complete repro file, typically a compilation has to be attempted. The state before compilation can be recorded (through the API for example), but it may not be enough to repeat a compilation, as files referenced by the compilation would not yet have been accessed. The repro feature records all of these accesses and the contents of such files, so that the compilation can either be completed, or at least reach the same point as was reached on the host machine.
One of the more subtle issues around reproducing a compilation is filenames. Using the API, a client can specify source files without names, or multiple files with the same name. If files are loaded via `ISlangFileSystem`, they are typically part of a hierarchical file system. This could mean they are referenced relatively, so there can be distinct files with the same name, differentiated only by directory. These files may not be easily reconstructed back into a similar hierarchical file system - depending on the include paths (or perhaps other mechanisms), the 'files' and their contents could be arranged in a manner very hard to replicate. To work around this the repro feature does not attempt to replicate a hierarchical file system. Instead it gives every file a unique name based on its original name. If there are multiple files with the same name it will 'uniquify' them by appending an index. Doing so means that the contents of the file system can be held as a flat collection of files. This is not enough to enable repeating the compilation though, as Slang now needs to know which files to reference when they are requested - they are no longer part of a hierarchical file system, and their names may have been altered. To achieve this the repro functionality stores a map of all path requests to their contents (or lack thereof). Doing so means that the file system still appears to Slang as it did in the original compilation, even though all the files are actually stored using the simpler 'flat' arrangement.
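The 'uniquify by appending an index' scheme described above can be sketched as follows. This is an illustrative sketch only, not Slang's actual naming code; all names in it are hypothetical:

```python
def flatten(paths):
    """Map hierarchical paths to unique flat names (illustrative sketch,
    not Slang's actual algorithm)."""
    used = set()
    mapping = {}
    for path in paths:
        # Strip directories: only the base name is kept in the flat store.
        base = path.replace("\\", "/").rsplit("/", 1)[-1]
        name = base
        index = 1
        while name in used:
            # 'Uniquify' a clashing name by appending an index before the extension.
            stem, dot, ext = base.partition(".")
            name = f"{stem}-{index}{dot}{ext}"
            index += 1
        used.add(name)
        mapping[path] = name  # path request -> unique flat name
    return mapping
```

For example, `flatten(["a/foo.slang", "b/foo.slang"])` maps the first path to `foo.slang` and the second to `foo-1.slang`, so both can live side by side in a flat collection while the path-to-name map preserves how each was originally requested.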
This means that when a repro is 'extracted' it does so to a directory which holds the files with their unique 'flat' names. The name of the directory is the name of the repro file without its extension, or if it has no extension, with the postfix '-files'. This directory will be referred to from now on as the `repro directory`.
When a repro is loaded, before files are loaded from the repro itself, they will first be looked for via their unique names in the `repro directory`. If a file is found there, its contents will be used instead of the contents in the repro; otherwise the contents stored in the repro file will be used. This provides a simple mechanism for altering the source in a repro. The steps, more concretely, would be...
1) First extract the repro (say with `-extract-repro`)
2) Go to the `repro directory` and edit files that you wish to change. You can also just delete files that do not need changing, as they will be loaded from the repro.
3) Load the repro - it will now load any files requested from the `repro directory`
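The load-time lookup behind these steps amounts to a simple two-level fallback: prefer an (optionally edited) file in the `repro directory`, otherwise use the contents embedded in the repro. A minimal sketch, with hypothetical names and not Slang's actual implementation:

```python
import os

def load_file(unique_name, repro_dir, repro_contents):
    """Resolve a file by its unique flat name: prefer an edited copy in
    the repro directory, otherwise fall back to the bytes embedded in
    the repro itself. (Illustrative sketch only.)"""
    override = os.path.join(repro_dir, unique_name)
    if os.path.isfile(override):
        # An extracted (possibly edited) copy exists - it wins.
        with open(override, "rb") as f:
            return f.read()
    # No override on disk: use the contents stored inside the repro.
    return repro_contents[unique_name]
```

Deleting a file from the `repro directory` simply makes the lookup fall through to the embedded contents again, which is why step 2 above says unchanged files can just be deleted.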
Now you might want to change the compilation options. Using `-load-repro` will compile with the options as given; it is not possible to change those options as part of `-load-repro`. If you want to change the compilation options (and files), you can use `-extract-repro` and look at the manifest, which lists a command line that will typically repeat the compilation. You can then attach the repro as a file system, and set the command line options as appropriate, based on the command line listed in the manifest. Note! If there is a fairly complex directory hierarchy, it may be necessary to specify the input source paths *as if* they were held on the original file system. You can see how these map in the manifest.
Note that access to any new source files is currently disabled - they will be reported as `not found`. This behaviour could be changed such that the regular file system, or the `ISlangFileSystem` set on the API, is used as a fallback.
There currently isn't a mechanism to alter the options of a repro from the command line (other than altering the contents of the source). The reason for this is because of how command lines are processed currently in Slang. A future update could enable specifying a repro and then altering the command line options used. It can be achieved through the API though. Once the repro is loaded via the `spLoadRepro` function, options can be changed as normal. The two major places where option alteration may have surprising behavior are...
1) Altering the include paths - this may break the mechanism used to map paths to files stored in the repro file
2) Altering the `ISlangFileSystem`. To make the contents of the file system appear to be those of the repro, Slang uses an `ISlangFileSystemExt` backed by the contents of the repro file and/or the `repro directory`. If you replace the file system, this mechanism will no longer work.
There are currently several API calls for using the repro functionality
```
SLANG_API SlangResult spEnableReproCapture(
SlangCompileRequest* request);
SLANG_API SlangResult spLoadRepro(
SlangCompileRequest* request,
ISlangFileSystem* fileSystem,
const void* data,
size_t size);
SLANG_API SlangResult spSaveRepro(
SlangCompileRequest* request,
ISlangBlob** outBlob
);
SLANG_API SlangResult spExtractRepro(
SlangSession* session,
const void* reproData,
size_t reproDataSize,
ISlangFileSystemExt* fileSystem);
SLANG_API SlangResult spLoadReproAsFileSystem(
SlangSession* session,
const void* reproData,
size_t reproDataSize,
ISlangFileSystem* replaceFileSystem,
ISlangFileSystemExt** outFileSystem);
```
The `fileSystem` parameter passed to `spLoadRepro` provides a mechanism for client code to replace the files held within the repro. Note that the files will be loaded from this file system by their `unique names`, as if they are part of the flat file system. If an attempt to load a file fails, the file within the repro is used. `spLoadRepro` is typically performed on a new 'unused' SlangCompileRequest. After a call to `spLoadRepro`, the normal functions to alter the state of the SlangCompileRequest are available.
The function `spEnableReproCapture` should be called after any ISlangFileSystem has been set (if any), but before any compilation. It ensures that everything the ISlangFileSystem accesses will be correctly recorded. Note that if an ISlangFileSystem/ISlangFileSystemExt isn't explicitly set (ie the default is used), then the request will automatically be set up to record everything appropriate, and a call to this function isn't strictly required.
The function `spExtractRepro` allows for extracting the files used in a request (along with the associated manifest). The files and manifest are stored under their 'unique names' in the root of the user-provided ISlangFileSystemExt.
The function `spLoadReproAsFileSystem` creates a file system that can access the contents of the repro with the same paths that were used on the originating system. The ISlangFileSystemExt produced can be set on a request and used for compilation.
Repro files are currently stored in a binary format. This format is sensitive to changes in the API, as well as to internal state within a SlangCompileRequest. This means that the functionality can only be guaranteed to work with exactly the same version of Slang. In practice things are typically not so draconian, and future versions will aim to provide a clearer repro versioning system, and work will be done to make it more generally usable.
Finally, this version of the repro system does not take endianness into account at all. The system the repro is saved on must have the same endianness as the system it is loaded on.
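To see why endianness matters for a raw binary format like this, note that the same 32-bit value serializes to different byte sequences on little- and big-endian systems (illustrated here with Python's `struct` module, not Slang's serialization code):

```python
import struct

value = 0x12345678
little = struct.pack("<I", value)  # little-endian byte order
big = struct.pack(">I", value)     # big-endian byte order

# The same integer produces reversed byte sequences, so a repro written
# on one kind of system would be misread byte-for-byte on the other.
assert little == b"\x78\x56\x34\x12"
assert big == b"\x12\x34\x56\x78"
assert struct.unpack(">I", little)[0] != value  # wrong-endian read corrupts
```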
@@ -0,0 +1,235 @@
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
namespace toc
{
public class Builder
{
public static string getAnchorId(string title)
{
StringBuilder sb = new StringBuilder();
title = title.Trim().ToLower();
foreach (var ch in title)
{
if (ch >= 'a' && ch <= 'z' || ch >= '0' && ch <= '9'
|| ch == '-'|| ch =='_')
sb.Append(ch);
else if (ch == ' ' )
sb.Append('-');
}
return sb.ToString();
}
public class Node
{
public List<string> fileNamePrefix = new List<string>();
public string title;
public string shortTitle;
public string fileID;
public List<string> sections = new List<string>();
public List<string> sectionShortTitles = new List<string>();
public List<Node> children = new List<Node>();
}
public static void buildTOC(StringBuilder sb, Node n)
{
sb.AppendFormat("<li data-link=\"{0}\"><span>{1}</span>\n", n.fileID, n.shortTitle);
if (n.children.Count != 0)
{
sb.AppendLine("<ul class=\"toc_list\">");
foreach(var c in n.children)
buildTOC(sb, c);
sb.AppendLine("</ul>");
}
else if (n.sections.Count != 0)
{
sb.AppendLine("<ul class=\"toc_list\">");
for (int i = 0; i < n.sections.Count; i++)
{
var s = n.sections[i];
var shortTitle = n.sectionShortTitles[i];
sb.AppendFormat("<li data-link=\"{0}#{1}\"><span>{2}</span></li>\n", n.fileID, getAnchorId(s), shortTitle);
}
sb.AppendLine("</ul>");
}
sb.AppendLine("</li>");
}
public static string buildTOC(Node n)
{
StringBuilder sb = new StringBuilder();
sb.Append(@"<ul class=""toc_root_list"">");
buildTOC(sb, n);
sb.Append(@"</ul>");
return sb.ToString();
}
public static bool isChild(Node parent, Node child)
{
if (parent.fileNamePrefix.Count < child.fileNamePrefix.Count)
{
bool equal = true;
for (int k = 0; k < parent.fileNamePrefix.Count; k++)
{
if (parent.fileNamePrefix[k] != child.fileNamePrefix[k])
{
equal = false;
break;
}
}
return equal;
}
return false;
}
public static string getNextNonEmptyLine(string[] lines, int i)
{
i++;
while (i < lines.Length)
{
if (lines[i].Trim().Length != 0)
return lines[i];
i++;
}
return "";
}
const string shortTitlePrefix = "[//]: # (ShortTitle: ";
public static string maybeGetShortTitleImpl(string originalTitle, string[] lines, int line)
{
string nextLine = getNextNonEmptyLine(lines, line);
if (nextLine.StartsWith(shortTitlePrefix))
{
return nextLine.Substring(shortTitlePrefix.Length, nextLine.Length - shortTitlePrefix.Length - 1).Trim();
}
return originalTitle;
}
public static string escapeString(string input)
{
StringBuilder sb = new StringBuilder();
foreach (var ch in input)
{
if (ch == '<')
sb.Append("&lt;");
else if (ch == '>')
sb.Append("&gt;");
else
sb.Append(ch);
}
return sb.ToString();
}
public static string maybeGetShortTitle(string originalTitle, string[] lines, int line)
{
string title = maybeGetShortTitleImpl(originalTitle, lines, line);
return escapeString(title);
}
public static string Run(string path)
{
StringBuilder outputSB = new StringBuilder();
outputSB.AppendFormat("Building table of contents from {0}...\n", path);
var files = System.IO.Directory.EnumerateFiles(path, "*.md").OrderBy(f => System.IO.Path.GetFileName(f));
List<Node> nodes = new List<Node>();
foreach (var f in files)
{
var content = File.ReadAllLines(f);
Node node = new Node();
node.fileID = Path.GetFileNameWithoutExtension(f);
outputSB.AppendFormat(" {0}.md\n", node.fileID);
bool mainTitleFound = false;
for (int i = 1; i < content.Length; i++)
{
if (content[i].StartsWith("==="))
{
mainTitleFound = true;
node.title = content[i-1];
node.shortTitle = maybeGetShortTitle(node.title, content, i);
}
if (content[i].StartsWith("---"))
{
if (!mainTitleFound) continue;
node.sections.Add(content[i-1]);
node.sectionShortTitles.Add(maybeGetShortTitle(content[i - 1], content, i));
}
if (content[i].StartsWith("#") && !content[i].StartsWith("##") && node.title == null)
{
mainTitleFound = true;
node.title = content[i].Substring(1, content[i].Length - 1).Trim();
node.shortTitle = maybeGetShortTitle(node.title, content, i);
}
if (content[i].StartsWith("##") && !content[i].StartsWith("###"))
{
if (!mainTitleFound) continue;
var sectionStr = content[i].Substring(2, content[i].Length - 2).Trim();
node.sections.Add(sectionStr);
node.sectionShortTitles.Add(maybeGetShortTitle(sectionStr, content, i));
}
if (content[i].StartsWith("permalink:"))
{
var prefixLength = ("permalink:").Length;
var permaPath = content[i].Substring(prefixLength, content[i].Length - prefixLength).Trim();
node.fileID = Path.GetFileName(permaPath);
}
}
if (node.title == null)
{
outputSB.AppendFormat("Error: {0} does not define a title.\n", f);
node.title = "Untitled";
}
var titleSecs = Path.GetFileName(f).Split('-');
foreach (var s in titleSecs)
{
if (s.Length == 2 && s[0] >= '0' && s[0] <= '9' && s[1] >= '0' && s[1] <= '9')
{
node.fileNamePrefix.Add(s);
}
else
{
break;
}
}
// Find parent node.
Node parent=null;
for (int l = nodes.Count-1; l>=0; l--)
{
var n = nodes[l];
if (isChild(n, node))
{
parent = n;
break;
}
}
if (parent != null)
parent.children.Add(node);
else
{
// find child
foreach (var other in nodes)
{
if (isChild(node, other))
{
node.children.Add(other);
}
}
foreach (var c in node.children)
{
nodes.Remove(c);
}
nodes.Add(node);
}
}
var root = nodes.Find(x=>x.fileID=="index");
if (root != null)
{
var html = buildTOC(root);
var outPath = Path.Combine(path, "toc.html");
File.WriteAllText(outPath, html);
outputSB.AppendFormat("Output written to: {0}\n", outPath);
}
return outputSB.ToString();
}
}
}
@@ -0,0 +1,121 @@
#!/usr/bin/env bash
# This script generates a release note.
# It prints information about breaking-changes first and the rest.
# The content is mostly based on `git log --oneline --since 202X-YY-ZZ`.
# Usage: the script takes command-line arguments to specify the range of commits to include.
# You can use either:
# 1. Date-based range with --since: docs/scripts/release-note.sh --since 2025-08-06
# 2. Hash-based range with --previous-hash: docs/scripts/release-note.sh --previous-hash abc123
# 3. Legacy positional argument (deprecated): docs/scripts/release-note.sh 2024-07-01
# This script is supposed to work on all Windows based shell systems including WSL and git-bash.
# If you make any modifications, please test them, because CI doesn't test this script.
verbose=true
$verbose && echo "Reminder: PLEASE make sure your local repo is up-to-date before running the script." >&2
gh=""
for candidate in \
"$(which gh)" \
"$(which gh.exe)" \
"/mnt/c/Program Files/GitHub CLI/gh.exe" \
"/c/Program Files/GitHub CLI/gh.exe" \
"/cygdrive/c/Program Files/GitHub CLI/gh.exe"; do
if [ -x "$candidate" ]; then
gh="$candidate"
break
fi
done
if [ "$gh" = "" ] || ! [ -x "$gh" ]; then
echo "File not found: gh or gh.exe"
echo "GitHub CLI can be downloaded from https://cli.github.com"
exit 1
fi
$verbose && echo "gh is found from: $gh" >&2
# Parse command-line arguments
use_hash=false
since=""
previous_hash=""
while [[ $# -gt 0 ]]; do
case $1 in
--since)
since="$2"
use_hash=false
shift 2
;;
--previous-hash)
previous_hash="$2"
use_hash=true
shift 2
;;
*)
# Legacy positional argument support
if [ "$since" = "" ] && [ "$previous_hash" = "" ]; then
since="$1"
use_hash=false
else
echo "Too many arguments or mixed argument styles"
exit 1
fi
shift
;;
esac
done
# Validate arguments
if [ "$since" = "" ] && [ "$previous_hash" = "" ]; then
echo "This script requires either --since or --previous-hash option."
echo "Usage: $0 [--since DATE | --previous-hash HASH]"
echo " --since DATE Generate notes since the given date (e.g., 2025-08-06)"
echo " --previous-hash HASH Generate notes since the given commit hash"
echo ""
echo "Legacy usage (deprecated): $0 DATE"
exit 1
fi
# Get commits based on the specified range
if [ "$use_hash" = true ]; then
commits="$(git log --oneline "$previous_hash"..HEAD)"
else
commits="$(git log --oneline --since "$since")"
fi
commitsCount="$(echo "$commits" | wc -l)"
echo "=== Breaking changes ==="
breakingChanges=""
for i in $(seq "$commitsCount"); do
line="$(echo "$commits" | head -n "$i" | tail -1)"
# Get PR number from the git commit title
pr="$(echo "$line" | grep '#[1-9][0-9][0-9][0-9][0-9]*' | sed 's|.* (\#\([1-9][0-9][0-9][0-9][0-9]*\))|\1|')"
[ "$pr" = "" ] && continue
# Check if the PR is marked as a breaking change
if "$gh" issue view "$pr" --json labels | grep -q 'pr: breaking change'; then
breakingChanges+="$line"$'\n'
fi
done
if [ "$breakingChanges" = "" ]; then
echo "No breaking changes"
else
echo "$breakingChanges"
fi
echo "=== All changes for this release ==="
for i in $(seq "$commitsCount"); do
line="$(echo "$commits" | head -n "$i" | tail -1)"
result="$line"
# Get PR number from the git commit title
pr="$(echo "$line" | grep '#[1-9][0-9][0-9][0-9][0-9]*' | sed 's|.* (\#\([1-9][0-9][0-9][0-9][0-9]*\))|\1|')"
if [ "$pr" != "" ]; then
# Mark breaking changes with "[BREAKING]"
if "$gh" issue view "$pr" --json labels | grep -q 'pr: breaking change'; then
result="[BREAKING] $line"
fi
fi
echo "$result"
done
@@ -0,0 +1,896 @@
Shader Execution Reordering (SER)
=================================
Slang provides support for Shader Execution Reordering (SER) across multiple backends:
- **D3D12**: Via [NVAPI](nvapi-support.md) or native DXR 1.3 (SM 6.9)
- **Vulkan/SPIR-V**: Via [GL_NV_shader_invocation_reorder](https://github.com/KhronosGroup/GLSL/blob/master/extensions/nv/GLSL_NV_shader_invocation_reorder.txt) (NV) or [GL_EXT_shader_invocation_reorder](https://github.com/KhronosGroup/GLSL/blob/main/extensions/ext/GLSL_EXT_shader_invocation_reorder.txt) (cross-vendor EXT)
- **CUDA**: Via OptiX
## Platform Notes
### Vulkan (NV Extension)
With `GL_NV_shader_invocation_reorder`, `HitObject` variables have special allocation semantics with limitations around flow control and assignment.
### Vulkan (EXT Extension)
The cross-vendor `GL_EXT_shader_invocation_reorder` extension provides broader compatibility. Note that `MakeHit` and `MakeMotionHit` are **NV-only** and not available with the EXT extension.
### D3D12 (DXR 1.3)
Native DXR 1.3 support (SM 6.9) provides `HitObject` without requiring NVAPI.
## Links
* [SER white paper for NVAPI](https://developer.nvidia.com/sites/default/files/akamai/gameworks/ser-whitepaper.pdf)
# API Reference
The HitObject API provides cross-platform SER functionality. The API is based on the NvAPI/DXR 1.3 interface.
## Free Functions
* [ReorderThread](#reorder-thread)
## Fused Functions (Vulkan EXT only)
* [ReorderExecute](#reorder-execute)
* [TraceReorderExecute](#trace-reorder-execute)
* [TraceMotionReorderExecute](#trace-motion-reorder-execute)
--------------------------------------------------------------------------------
# `struct HitObject`
## Description
Immutable data type representing a ray hit or a miss. Can be used to invoke hit or miss shading,
or as a key in ReorderThread. Created by one of several methods described below. HitObject
and its related functions are available in raytracing shader types only.
## Methods
* [TraceRay](#trace-ray)
* [TraceMotionRay](#trace-motion-ray)
* [MakeMiss](#make-miss)
* [MakeHit](#make-hit)
* [MakeMotionHit](#make-motion-hit)
* [MakeMotionMiss](#make-motion-miss)
* [MakeNop](#make-nop)
* [FromRayQuery](#from-ray-query)
* [Invoke](#invoke)
* [IsMiss](#is-miss)
* [IsHit](#is-hit)
* [IsNop](#is-nop)
* [GetRayDesc](#get-ray-desc)
* [GetRayFlags](#get-ray-flags)
* [GetRayTMin](#get-ray-tmin)
* [GetRayTCurrent](#get-ray-tcurrent)
* [GetWorldRayOrigin](#get-world-ray-origin)
* [GetWorldRayDirection](#get-world-ray-direction)
* [GetShaderTableIndex](#get-shader-table-index)
* [SetShaderTableIndex](#set-shader-table-index)
* [GetInstanceIndex](#get-instance-index)
* [GetInstanceID](#get-instance-id)
* [GetGeometryIndex](#get-geometry-index)
* [GetPrimitiveIndex](#get-primitive-index)
* [GetHitKind](#get-hit-kind)
* [GetAttributes](#get-attributes)
* [GetTriangleVertexPositions](#get-triangle-vertex-positions)
* [GetWorldToObject](#get-world-to-object)
* [GetObjectToWorld](#get-object-to-world)
* [GetCurrentTime](#get-current-time)
* [GetObjectRayOrigin](#get-object-ray-origin)
* [GetObjectRayDirection](#get-object-ray-direction)
* [GetShaderRecordBufferHandle](#get-shader-record-buffer-handle)
* [GetClusterID](#get-cluster-id)
* [GetSpherePositionAndRadius](#get-sphere-position-and-radius)
* [GetLssPositionsAndRadii](#get-lss-positions-and-radii)
* [IsSphereHit](#is-sphere-hit)
* [IsLssHit](#is-lss-hit)
* [LoadLocalRootTableConstant](#load-local-root-table-constant)
--------------------------------------------------------------------------------
<a id="trace-ray"></a>
# `HitObject.TraceRay`
## Description
Executes ray traversal (including anyhit and intersection shaders) like TraceRay, but returns the
resulting hit information as a HitObject and does not trigger closesthit or miss shaders.
## Signature
```
static HitObject HitObject.TraceRay<payload_t>(
RaytracingAccelerationStructure AccelerationStructure,
uint RayFlags,
uint InstanceInclusionMask,
uint RayContributionToHitGroupIndex,
uint MultiplierForGeometryContributionToHitGroupIndex,
uint MissShaderIndex,
RayDesc Ray,
inout payload_t Payload);
```
--------------------------------------------------------------------------------
<a id="trace-motion-ray"></a>
# `HitObject.TraceMotionRay`
## Description
Executes motion ray traversal (including anyhit and intersection shaders) like TraceRay, but returns the
resulting hit information as a HitObject and does not trigger closesthit or miss shaders.
**Note**: Requires motion blur support. Available on Vulkan (NV/EXT) and CUDA.
## Signature
```
static HitObject HitObject.TraceMotionRay<payload_t>(
RaytracingAccelerationStructure AccelerationStructure,
uint RayFlags,
uint InstanceInclusionMask,
uint RayContributionToHitGroupIndex,
uint MultiplierForGeometryContributionToHitGroupIndex,
uint MissShaderIndex,
RayDesc Ray,
float CurrentTime,
inout payload_t Payload);
```
--------------------------------------------------------------------------------
<a id="make-hit"></a>
# `HitObject.MakeHit`
## Description
Creates a HitObject representing a hit based on values explicitly passed as arguments, without
tracing a ray. The primitive specified by AccelerationStructure, InstanceIndex, GeometryIndex,
and PrimitiveIndex must exist. The shader table index is computed using the formula used with
TraceRay. The computed index must reference a valid hit group record in the shader table. The
Attributes parameter must either be an attribute struct, such as
BuiltInTriangleIntersectionAttributes, or another HitObject to copy the attributes from.
**Note**: This function is **NV-only** and not available with the cross-vendor EXT extension.
## Signature
```
static HitObject HitObject.MakeHit<attr_t>(
RaytracingAccelerationStructure AccelerationStructure,
uint InstanceIndex,
uint GeometryIndex,
uint PrimitiveIndex,
uint HitKind,
uint RayContributionToHitGroupIndex,
uint MultiplierForGeometryContributionToHitGroupIndex,
RayDesc Ray,
attr_t attributes);
static HitObject HitObject.MakeHit<attr_t>(
uint HitGroupRecordIndex,
RaytracingAccelerationStructure AccelerationStructure,
uint InstanceIndex,
uint GeometryIndex,
uint PrimitiveIndex,
uint HitKind,
RayDesc Ray,
attr_t attributes);
```
--------------------------------------------------------------------------------
<a id="make-motion-hit"></a>
# `HitObject.MakeMotionHit`
## Description
See MakeHit but handles Motion.
**Note**: This function is **NV-only** and not available with the cross-vendor EXT extension.
## Signature
```
static HitObject HitObject.MakeMotionHit<attr_t>(
RaytracingAccelerationStructure AccelerationStructure,
uint InstanceIndex,
uint GeometryIndex,
uint PrimitiveIndex,
uint HitKind,
uint RayContributionToHitGroupIndex,
uint MultiplierForGeometryContributionToHitGroupIndex,
RayDesc Ray,
float CurrentTime,
attr_t attributes);
static HitObject HitObject.MakeMotionHit<attr_t>(
uint HitGroupRecordIndex,
RaytracingAccelerationStructure AccelerationStructure,
uint InstanceIndex,
uint GeometryIndex,
uint PrimitiveIndex,
uint HitKind,
RayDesc Ray,
float CurrentTime,
attr_t attributes);
```
--------------------------------------------------------------------------------
<a id="make-miss"></a>
# `HitObject.MakeMiss`
## Description
Creates a HitObject representing a miss based on values explicitly passed as arguments, without
tracing a ray. The provided shader table index must reference a valid miss record in the shader
table.
## Signature
```
static HitObject HitObject.MakeMiss(
uint MissShaderIndex,
RayDesc Ray);
```
--------------------------------------------------------------------------------
<a id="make-motion-miss"></a>
# `HitObject.MakeMotionMiss`
## Description
See MakeMiss but handles Motion. Available on Vulkan (NV and EXT extensions).
## Signature
```
static HitObject HitObject.MakeMotionMiss(
uint MissShaderIndex,
RayDesc Ray,
float CurrentTime);
```
--------------------------------------------------------------------------------
<a id="make-nop"></a>
# `HitObject.MakeNop`
## Description
Creates a HitObject representing “NOP” (no operation) which is neither a hit nor a miss. Invoking a
NOP hit object using HitObject::Invoke has no effect. Reordering by hit objects using
ReorderThread will group NOP hit objects together. This can be useful in some reordering
scenarios where future control flow for some threads is known to process neither a hit nor a
miss.
## Signature
```
static HitObject HitObject.MakeNop();
```
--------------------------------------------------------------------------------
<a id="from-ray-query"></a>
# `HitObject.FromRayQuery`
## Description
Creates a HitObject from a committed RayQuery hit. The RayQuery must have a committed hit
(either triangle or procedural). If the RayQuery has no committed hit, the resulting HitObject
will represent a miss or NOP depending on the query state.
**Note**: **DXR 1.3 only**. Also available on Vulkan EXT via `hitObjectRecordFromQueryEXT`.
## Signature
```
static HitObject HitObject.FromRayQuery<RayQuery_t>(
RayQuery_t Query);
static HitObject HitObject.FromRayQuery<RayQuery_t, attr_t>(
RayQuery_t Query,
uint CommittedCustomHitKind,
attr_t CommittedCustomAttribs);
```
--------------------------------------------------------------------------------
<a id="invoke"></a>
# `HitObject.Invoke`
## Description
Invokes closesthit or miss shading for the specified hit object. In case of a NOP HitObject, no
shader is invoked.
## Signature
```
static void HitObject.Invoke<payload_t>(
RaytracingAccelerationStructure AccelerationStructure,
HitObject HitOrMiss,
inout payload_t Payload);
// DXR 1.3 overload (without AccelerationStructure)
static void HitObject.Invoke<payload_t>(
HitObject HitOrMiss,
inout payload_t Payload);
```
--------------------------------------------------------------------------------
<a id="is-miss"></a>
# `HitObject.IsMiss`
## Description
Returns true if the HitObject encodes a miss, otherwise returns false.
## Signature
```
bool HitObject.IsMiss();
```
--------------------------------------------------------------------------------
<a id="is-hit"></a>
# `HitObject.IsHit`
## Description
Returns true if the HitObject encodes a hit, otherwise returns false.
## Signature
```
bool HitObject.IsHit();
```
--------------------------------------------------------------------------------
<a id="is-nop"></a>
# `HitObject.IsNop`
## Description
Returns true if the HitObject encodes a nop, otherwise returns false.
## Signature
```
bool HitObject.IsNop();
```
--------------------------------------------------------------------------------
<a id="get-ray-desc"></a>
# `HitObject.GetRayDesc`
## Description
Queries ray properties from HitObject. Valid if the hit object represents a hit or a miss.
## Signature
```
RayDesc HitObject.GetRayDesc();
```
--------------------------------------------------------------------------------
<a id="get-ray-flags"></a>
# `HitObject.GetRayFlags`
## Description
Returns the ray flags used when tracing the ray. Valid if the hit object represents a hit or a miss.
**Note**: **DXR 1.3 and Vulkan EXT**.
## Signature
```
uint HitObject.GetRayFlags();
```
--------------------------------------------------------------------------------
<a id="get-ray-tmin"></a>
# `HitObject.GetRayTMin`
## Description
Returns the minimum T value of the ray. Valid if the hit object represents a hit or a miss.
**Note**: **DXR 1.3 and Vulkan EXT**.
## Signature
```
float HitObject.GetRayTMin();
```
--------------------------------------------------------------------------------
<a id="get-ray-tcurrent"></a>
# `HitObject.GetRayTCurrent`
## Description
Returns the current T value (hit distance) of the ray. Valid if the hit object represents a hit or a miss.
**Note**: **DXR 1.3 and Vulkan EXT** (called `GetRayTMax` in GLSL/SPIR-V).
## Signature
```
float HitObject.GetRayTCurrent();
```
--------------------------------------------------------------------------------
<a id="get-world-ray-origin"></a>
# `HitObject.GetWorldRayOrigin`
## Description
Returns the ray origin in world space. Valid if the hit object represents a hit or a miss.
**Note**: **DXR 1.3 and Vulkan EXT**.
## Signature
```
float3 HitObject.GetWorldRayOrigin();
```
--------------------------------------------------------------------------------
<a id="get-world-ray-direction"></a>
# `HitObject.GetWorldRayDirection`
## Description
Returns the ray direction in world space. Valid if the hit object represents a hit or a miss.
**Note**: **DXR 1.3 and Vulkan EXT**.
## Signature
```
float3 HitObject.GetWorldRayDirection();
```
--------------------------------------------------------------------------------
<a id="get-shader-table-index"></a>
# `HitObject.GetShaderTableIndex`
## Description
Queries shader table index from HitObject. Valid if the hit object represents a hit or a miss.
## Signature
```
uint HitObject.GetShaderTableIndex();
```
--------------------------------------------------------------------------------
<a id="get-instance-index"></a>
# `HitObject.GetInstanceIndex`
## Description
Returns the instance index of a hit. Valid if the hit object represents a hit.
## Signature
```
uint HitObject.GetInstanceIndex();
```
--------------------------------------------------------------------------------
<a id="get-instance-id"></a>
# `HitObject.GetInstanceID`
## Description
Returns the instance ID of a hit. Valid if the hit object represents a hit.
## Signature
```
uint HitObject.GetInstanceID();
```
--------------------------------------------------------------------------------
<a id="get-geometry-index"></a>
# `HitObject.GetGeometryIndex`
## Description
Returns the geometry index of a hit. Valid if the hit object represents a hit.
## Signature
```
uint HitObject.GetGeometryIndex();
```
--------------------------------------------------------------------------------
<a id="get-primitive-index"></a>
# `HitObject.GetPrimitiveIndex`
## Description
Returns the primitive index of a hit. Valid if the hit object represents a hit.
## Signature
```
uint HitObject.GetPrimitiveIndex();
```
--------------------------------------------------------------------------------
<a id="get-hit-kind"></a>
# `HitObject.GetHitKind`
## Description
Returns the hit kind. Valid if the hit object represents a hit.
## Signature
```
uint HitObject.GetHitKind();
```
--------------------------------------------------------------------------------
<a id="get-attributes"></a>
# `HitObject.GetAttributes`
## Description
Returns the attributes of a hit. Valid if the hit object represents a hit or a miss.
## Signature
```
attr_t HitObject.GetAttributes<attr_t>();
```
--------------------------------------------------------------------------------
<a id="get-triangle-vertex-positions"></a>
# `HitObject.GetTriangleVertexPositions`
## Description
Returns the world-space vertex positions of the triangle that was hit. Valid if the hit object represents a triangle hit.
**Note**: **Vulkan EXT only**. Requires `SPV_KHR_ray_tracing_position_fetch` capability.
## Signature
```
void HitObject.GetTriangleVertexPositions(out float3 positions[3]);
```
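As a usage sketch (assuming `hit` is a previously constructed `HitObject`), the positions can be fetched into a local array after a successful trace:

```
float3 positions[3];
if (hit.IsHit())
{
    hit.GetTriangleVertexPositions(positions);
    // For example, compute the geometric normal of the hit triangle.
    float3 geomNormal = normalize(cross(positions[1] - positions[0],
                                        positions[2] - positions[0]));
}
```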
--------------------------------------------------------------------------------
<a id="load-local-root-table-constant"></a>
# `HitObject.LoadLocalRootTableConstant`
## Description
Loads a root constant from the local root table referenced by the hit object. Valid if the hit object
represents a hit or a miss. RootConstantOffsetInBytes must be a multiple of 4.
**Note**: **D3D12/HLSL only**.
## Signature
```
uint HitObject.LoadLocalRootTableConstant(uint RootConstantOffsetInBytes);
```
--------------------------------------------------------------------------------
<a id="set-shader-table-index"></a>
# `HitObject.SetShaderTableIndex`
## Description
Sets the shader table index of the hit object. Used to modify which shader gets invoked during HitObject.Invoke. **EXT extension only** (not available with NV extension).
## Signature
```
void HitObject.SetShaderTableIndex(uint RecordIndex);
```
--------------------------------------------------------------------------------
<a id="get-world-to-object"></a>
# `HitObject.GetWorldToObject`
## Description
Returns the world-to-object transformation matrix. Valid if the hit object represents a hit.
## Signature
```
float4x3 HitObject.GetWorldToObject();
// DXR 1.3 layout variants
float3x4 HitObject.GetWorldToObject3x4();
float4x3 HitObject.GetWorldToObject4x3();
```
--------------------------------------------------------------------------------
<a id="get-object-to-world"></a>
# `HitObject.GetObjectToWorld`
## Description
Returns the object-to-world transformation matrix. Valid if the hit object represents a hit.
## Signature
```
float4x3 HitObject.GetObjectToWorld();
// DXR 1.3 layout variants
float3x4 HitObject.GetObjectToWorld3x4();
float4x3 HitObject.GetObjectToWorld4x3();
```
--------------------------------------------------------------------------------
<a id="get-current-time"></a>
# `HitObject.GetCurrentTime`
## Description
Returns the current time for motion blur. Valid if the hit object represents a motion hit or miss.
**Note**: Requires motion blur support. Available on Vulkan (NV/EXT).
## Signature
```
float HitObject.GetCurrentTime();
```
--------------------------------------------------------------------------------
<a id="get-object-ray-origin"></a>
# `HitObject.GetObjectRayOrigin`
## Description
Returns the ray origin in object space. Valid if the hit object represents a hit.
## Signature
```
float3 HitObject.GetObjectRayOrigin();
```
--------------------------------------------------------------------------------
<a id="get-object-ray-direction"></a>
# `HitObject.GetObjectRayDirection`
## Description
Returns the ray direction in object space. Valid if the hit object represents a hit.
## Signature
```
float3 HitObject.GetObjectRayDirection();
```
--------------------------------------------------------------------------------
<a id="get-shader-record-buffer-handle"></a>
# `HitObject.GetShaderRecordBufferHandle`
## Description
Returns the shader record buffer handle. Valid if the hit object represents a hit or a miss.
## Signature
```
uint2 HitObject.GetShaderRecordBufferHandle();
```
--------------------------------------------------------------------------------
<a id="get-cluster-id"></a>
# `HitObject.GetClusterID`
## Description
Returns the cluster ID for cluster acceleration structures. Valid if the hit object represents a hit.
**Note**: **NV-only** (requires `GL_NV_cluster_acceleration_structure`).
## Signature
```
int HitObject.GetClusterID();
```
--------------------------------------------------------------------------------
<a id="get-sphere-position-and-radius"></a>
# `HitObject.GetSpherePositionAndRadius`
## Description
Returns the position and radius of a sphere primitive. Valid if the hit object represents a sphere hit.
**Note**: **NV-only**.
## Signature
```
float4 HitObject.GetSpherePositionAndRadius();
```
--------------------------------------------------------------------------------
<a id="get-lss-positions-and-radii"></a>
# `HitObject.GetLssPositionsAndRadii`
## Description
Returns the positions and radii of a linear swept sphere primitive. Valid if the hit object represents an LSS hit.
**Note**: **NV-only**.
## Signature
```
float2x4 HitObject.GetLssPositionsAndRadii();
```
--------------------------------------------------------------------------------
<a id="is-sphere-hit"></a>
# `HitObject.IsSphereHit`
## Description
Returns true if the HitObject represents a hit on a sphere primitive, otherwise returns false.
**Note**: **NV-only**.
## Signature
```
bool HitObject.IsSphereHit();
```
--------------------------------------------------------------------------------
<a id="is-lss-hit"></a>
# `HitObject.IsLssHit`
## Description
Returns true if the HitObject represents a hit on a linear swept sphere primitive, otherwise returns false.
**Note**: **NV-only**.
## Signature
```
bool HitObject.IsLssHit();
```
--------------------------------------------------------------------------------
<a id="reorder-thread"></a>
# `ReorderThread`
## Description
Reorders threads based on a coherence hint value. NumCoherenceHintBits indicates how many of
the least significant bits of CoherenceHint should be considered during reordering (max: 16).
Applications should set this to the lowest value required to represent all possible values in
CoherenceHint. For best performance, all threads should provide the same value for
NumCoherenceHintBits.
Where possible, reordering will also attempt to retain locality in the threads launch indices
(DispatchRaysIndex in DXR).
`ReorderThread(HitOrMiss)` is equivalent to
```
void ReorderThread( HitObject HitOrMiss, uint CoherenceHint, uint NumCoherenceHintBitsFromLSB );
```
called with `CoherenceHint` and `NumCoherenceHintBitsFromLSB` both set to 0, meaning the coherence hint is ignored.
## Signature
```
void ReorderThread(
uint CoherenceHint,
uint NumCoherenceHintBitsFromLSB);
void ReorderThread(
HitObject HitOrMiss,
uint CoherenceHint,
uint NumCoherenceHintBitsFromLSB);
void ReorderThread(HitObject HitOrMiss);
```
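As a sketch of the typical shader execution reordering pattern (the `Scene` binding, `MyPayload` struct, and the coherence-hint choice are illustrative assumptions, not part of the API), a ray generation shader traces into a `HitObject`, reorders by a hint, and then invokes shading:

```
struct MyPayload { float3 color; };

RaytracingAccelerationStructure Scene;

[shader("raygeneration")]
void rayGenMain()
{
    RayDesc ray;
    ray.Origin = float3(0, 0, 0);
    ray.Direction = float3(0, 0, 1);
    ray.TMin = 0.0;
    ray.TMax = 1e30;

    MyPayload payload;
    payload.color = float3(0, 0, 0);

    // Trace without invoking any hit/miss shading yet.
    HitObject hit = HitObject.TraceRay(Scene, RAY_FLAG_NONE, 0xff,
                                       0, 1, 0, ray, payload);

    // Reorder threads so that lanes with similar hits execute together.
    // Here the low 4 bits of the instance ID act as a coherence hint.
    uint hint = hit.IsHit() ? (hit.GetInstanceID() & 0xF) : 0;
    ReorderThread(hit, hint, 4);

    // Invoke the closest-hit or miss shader recorded in the hit object.
    HitObject.Invoke(Scene, hit, payload);
}
```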
--------------------------------------------------------------------------------
<a id="reorder-execute"></a>
# `ReorderExecute`
## Description
Fused operation that reorders threads by HitObject and then executes the shader. Equivalent to calling `ReorderThread` followed by `HitObject.Invoke`.
**Note**: **Vulkan EXT only**. Available via `hitObjectReorderExecuteEXT` in GLSL.
## Signature
```
// GLSL: hitObjectReorderExecuteEXT(hitObject, payload)
void ReorderExecute<payload_t>(
HitObject HitOrMiss,
inout payload_t Payload);
// GLSL: hitObjectReorderExecuteEXT(hitObject, hint, bits, payload)
void ReorderExecute<payload_t>(
HitObject HitOrMiss,
uint CoherenceHint,
uint NumCoherenceHintBitsFromLSB,
inout payload_t Payload);
```
--------------------------------------------------------------------------------
<a id="trace-reorder-execute"></a>
# `TraceReorderExecute`
## Description
Fused operation that traces a ray, reorders threads by the resulting HitObject, and executes the shader. Equivalent to calling `HitObject.TraceRay`, `ReorderThread`, and `HitObject.Invoke` in sequence.
**Note**: **Vulkan EXT only**. Available via `hitObjectTraceReorderExecuteEXT` in GLSL.
## Signature
```
// GLSL: hitObjectTraceReorderExecuteEXT(...)
void TraceReorderExecute<payload_t>(
RaytracingAccelerationStructure AccelerationStructure,
uint RayFlags,
uint InstanceInclusionMask,
uint RayContributionToHitGroupIndex,
uint MultiplierForGeometryContributionToHitGroupIndex,
uint MissShaderIndex,
RayDesc Ray,
inout payload_t Payload);
// With coherence hint
void TraceReorderExecute<payload_t>(
RaytracingAccelerationStructure AccelerationStructure,
uint RayFlags,
uint InstanceInclusionMask,
uint RayContributionToHitGroupIndex,
uint MultiplierForGeometryContributionToHitGroupIndex,
uint MissShaderIndex,
RayDesc Ray,
uint CoherenceHint,
uint NumCoherenceHintBitsFromLSB,
inout payload_t Payload);
```
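As a sketch, the fused call replaces the separate trace, reorder, and invoke steps with a single statement (`Scene`, `ray`, and `MyPayload` are illustrative assumptions, not part of the API):

```
MyPayload payload;
TraceReorderExecute(
    Scene,          // acceleration structure
    RAY_FLAG_NONE,  // ray flags
    0xff,           // instance inclusion mask
    0,              // ray contribution to hit group index
    1,              // geometry contribution multiplier
    0,              // miss shader index
    ray,            // RayDesc to trace
    payload);
```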
--------------------------------------------------------------------------------
<a id="trace-motion-reorder-execute"></a>
# `TraceMotionReorderExecute`
## Description
Fused operation for motion blur that traces a motion ray, reorders threads, and executes the shader.
**Note**: **Vulkan EXT only**. Available via `hitObjectTraceMotionReorderExecuteEXT` in GLSL. Requires motion blur support.
## Signature
```
void TraceMotionReorderExecute<payload_t>(
RaytracingAccelerationStructure AccelerationStructure,
uint RayFlags,
uint InstanceInclusionMask,
uint RayContributionToHitGroupIndex,
uint MultiplierForGeometryContributionToHitGroupIndex,
uint MissShaderIndex,
RayDesc Ray,
float CurrentTime,
uint CoherenceHint,
uint NumCoherenceHintBitsFromLSB,
inout payload_t Payload);
```

Using Slang on Shader Playground
================================
A fast and simple way to try out Slang is by using the [Shader Playground](http://shader-playground.timjones.io/) website. This site allows easy and interactive testing of shader code across several compilers including Slang without having to install anything on your local machine.
Using the Slang compiler is as simple as selecting 'Slang' from the drop-down box in the top-left corner of the [Shader Playground](http://shader-playground.timjones.io/) page. This selects the Slang language for input and the Slang compiler for compilation. The output of the compilation is shown in the right-hand panel.
The default 'Output format' is HLSL. For graphics shaders the 'Output format' can be changed to
* DXIL
* SPIR-V
* DXBC
* HLSL
* GLSL
Additionally, for compute-based shaders, it can be set to
* C++
* CUDA
* PTX
For binary formats (such as DXIL/SPIR-V/DXBC) the output will be displayed as the applicable disassembly.
Note that C++ and CUDA output include a 'prelude'. The prelude remains the same across compilations, with the code generated for the input Slang source placed at the very end of the output.

# Slang Core Module Documentation Generation Tool
Slang's core module reference (https://shader-slang.com/stdlib-reference) is generated by `slangc` from the source of the core module.
This page covers how `slangc` can be used to generate this documentation.
## Generating Documentation
Follow these steps to generate the core module reference documentation and view the generated markdown files locally:
```
# clone stdlib-reference repo
git clone https://github.com/shader-slang/stdlib-reference
cd stdlib-reference
# delete existing pages
rm -rf ./interfaces
rm -rf ./types
rm -rf ./global-decls
rm -rf ./attributes
# generate updated pages
slangc -compile-core-module -doc
# optional: move generated toc.html to `_includes`
mv toc.html ./_includes/stdlib-reference-toc.html
```
`slangc` will read the `config.txt` file in the stdlib-reference repository, and then generate all the markdown files
located in the `types`, `attributes`, `interfaces` and `global-decls` directories.
Note that the `index.md` in the root is not generated.
You should review the generated markdown files to make sure they are formatted correctly after making comment edits in the
`*.meta.slang` files.
## Writing and Updating Documentation
The core module documentation is done directly in comments inside `source/slang/*.meta.slang` files.
A documentation comment should be placed directly above the declaration, either inside a `/** */` comment block, or
after `///`. The following directives are allowed in comments:
- `@param paramName description` documents a parameter or a generic parameter.
- `@remarks` starts the remarks section.
- `@see` starts the "See also" section.
- `@return` starts the "Return value" section.
- `@example` starts the "Example" section.
- `@category categoryID Category Name` marks the decl to be in a category. The category name is only required for the first time `categoryID` is used, and omitted for the remaining `@category` lines.
- `@internal` marks the declaration as internal.
- `@experimental` marks the declaration as experimental.
- `@deprecated` marks the declaration as deprecated.
You can use markdown syntax in any part of the comment.
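For instance, a hypothetical declaration using several of these directives might be documented like this (the function and category names are illustrative only):

```
/// Clamps `x` to the range [0, 1].
///
/// @param x The value to clamp.
/// @return The clamped value.
/// @category math Math Functions
/// @see `clamp`, `min`, `max`
/// @remarks Behaves identically on all targets.
float mySaturate(float x);
```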
For overloaded functions, only document the first overload. List all parameters from all overloads in the same comment block for the first overload. Documentation on the remaining overloads will be ignored by the tool. If an overloaded decl has differing documentation on different overload candidates, the `slangc` tool will emit a warning.
The following code is an example of how `_Texture.Sample` is documented. Notice that only the first overload is documented, and it also includes documentation for parameters which are only present in subsequent overloads, such as `offset`.
```csharp
/// Samples the texture at the given location.
///
///@param s The `SamplerState` to use for the sampling operation. This parameter is omitted when `this` is a combined texture sampler type (`isCombined == 0`).
///@param location The location to sample the texture at.
///@param offset Texel offset to apply.
///@param clamp The max level of detail to use.
///@param[out] status The result status of the operation.
/// This parameter is currently only used when targeting HLSL.
/// For other targets, the result status is always 0.
///@return The sampled texture value.
///@see `SampleBias`, `SampleLevel`, `SampleGrad`, `SampleCmp`, `SampleCmpLevelZero`.
///@remarks
/// The `Sample` function is defined for all read-only texture types, including
/// `Texture1D`, `Texture2D`, `Texture3D`, `TextureCube`,
/// `Texture1DArray`, `Texture2DArray` and `TextureCubeArray`.
///
/// The function is not available for read-write texture types.
///
/// For HLSL/D3D targets, the texture element type must be a scalar or vector of float or half types.
///
[__readNone]
[ForceInline]
[require(cpp_cuda_glsl_hlsl_metal_spirv_wgsl, texture_sm_4_0_fragment)]
T Sample(vector<float, Shape.dimensions+isArray> location)
{
...
}
[__readNone]
[ForceInline]
[require(cpp_glsl_hlsl_metal_spirv_wgsl, texture_sm_4_0_fragment)]
T Sample(vector<float, Shape.dimensions+isArray> location, constexpr vector<int, Shape.planeDimensions> offset)
{
...
}
```
Note that, unlike doxygen, each directive marks the start of a new section and applies to all following paragraphs. You do not need to repeat a directive such as `@remarks` for each new paragraph.
## What to document
- Provide a brief description of the declaration in under three sentences.
- Document all nuances, including target specific behaviors in the remarks section.
- Include examples if needed in the examples section.
- Provide a see also section with links to related declarations.
After updating comments, build `slangc` and run `slangc -compile-core-module -doc` in the `stdlib-reference` directory to update the markdown files for preview.
Your PR only needs to include changes to `*.meta.slang` files. Once your PR is merged, Slang CI will run `slangc` and push the updated markdown files to
the `stdlib-reference` repo.
## Hiding a declaration
Use `// @hidden:` to hide all declarations after that line for documentation generation purposes.
Use `// @public:` to stop hiding declarations after that line. These two special lines act like
C++ visibility modifiers: they apply to everything that follows.
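A minimal sketch of how these markers partition a `*.meta.slang` file (the declarations are hypothetical):

```
// @hidden:
// Everything from here on is omitted from the generated docs.
float __helperImpl(float x);

// @public:
// Everything from here on is documented again.
/// Computes the (hypothetical) foo transform.
float foo(float x);
```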
## How to preview generated html page locally
To preview github pages locally, you need to follow instructions on setting up Jekyll:
https://docs.github.com/en/pages/setting-up-a-github-pages-site-with-jekyll/testing-your-github-pages-site-locally-with-jekyll
You will need to use Jekyll to create a Gem file before serving it.

# Slang Target Compatibility
Shader Model (SM) numbers are D3D Shader Model versions, unless explicitly stated otherwise.
OpenGL compatibility is not listed here, because OpenGL isn't an officially supported target.
Items marked with + indicate that the feature is anticipated to be added in the future.
Items marked with ^ indicate that there is some discussion about support for this target later in the document.
| Feature | D3D11 | D3D12 | VK | CUDA | Metal | CPU |
| ---------------------------------------------------- | ----- | --------- | ------- | -------------- | ----- | --------- |
| [Half Type](#half) | No | Yes ^ | Yes | Yes ^ | Yes | No + |
| Double Type | Yes | Yes | Yes | Yes | No | Yes |
| Double Intrinsics | No | Limited + | Limited | Most | No | Yes |
| [u/int8_t Type](#int8_t) | No | No | Yes ^ | Yes | Yes | Yes |
| [u/int16_t Type](#int16_t) | No | Yes ^ | Yes ^ | Yes | Yes | Yes |
| [u/int64_t Type](#int64_t) | No | Yes ^ | Yes | Yes | Yes | Yes |
| u/int64_t Intrinsics | No | No | Yes | Yes | Yes | Yes |
| [int matrix](#int-matrix) | Yes | Yes | No + | Yes | No | Yes |
| [tex.GetDimensions](#tex-get-dimensions) | Yes | Yes | Yes | No | Yes | Yes |
| [SM6.0 Wave Intrinsics](#sm6-wave) | No | Yes | Partial | Yes ^ | No | No |
| SM6.0 Quad Intrinsics | No | Yes | No + | No | No | No |
| [SM6.5 Wave Intrinsics](#sm6.5-wave) | No | Yes ^ | No + | Yes ^ | No | No |
| [WaveMask Intrinsics](#wave-mask) | Yes ^ | Yes ^ | Yes + | Yes | No | No |
| [WaveShuffle](#wave-shuffle) | No | Limited ^ | Yes | Yes | No | No |
| [Tessellation](#tesselation)                         | Yes ^ | Yes ^     | No +    | No             | No    | No        |
| [Graphics Pipeline](#graphics-pipeline) | Yes | Yes | Yes | No | Yes | No |
| [Ray Tracing DXR 1.0](#ray-tracing-1.0) | No | Yes ^ | Yes ^ | No | No | No |
| Ray Tracing DXR 1.1 | No | Yes | No + | No | No | No |
| [Native Bindless](#native-bindless) | No | No | No | Yes | No | Yes |
| [Buffer bounds](#buffer-bounds) | Yes | Yes | Yes | Limited ^ | No ^ | Limited ^ |
| [Resource bounds](#resource-bounds) | Yes | Yes | Yes | Yes (optional) | Yes | Yes |
| Group shared mem/Barriers | Yes | Yes | Yes | Yes | Yes | No + |
| [TextureArray.Sample float](#tex-array-sample-float) | Yes | Yes | Yes | No | Yes | Yes |
| [Separate Sampler](#separate-sampler) | Yes | Yes | Yes | No | Yes | Yes |
| [tex.Load](#tex-load) | Yes | Yes | Yes | Limited ^ | Yes | Yes |
| [Full bool](#full-bool) | Yes | Yes | Yes | No | Yes | Yes ^ |
| [Mesh Shader](#mesh-shader) | No | Yes | Yes | No | Yes | No |
| [`[unroll]`](#unroll)                                | Yes   | Yes       | Yes ^   | Yes            | No ^  | Limited + |
| Atomics | Yes | Yes | Yes | Yes | Yes | No + |
| [Atomics on RWBuffer](#rwbuffer-atomics) | Yes | Yes | Yes | No | Yes | No + |
| [Sampler Feedback](#sampler-feedback) | No | Yes | No + | No | No | Yes ^ |
| [RWByteAddressBuffer Atomic](#byte-address-atomic) | No | Yes ^ | Yes ^ | Yes | Yes | No + |
| [Shader Execution Reordering](#ser) | No | Yes ^ | Yes ^ | No | No | No |
| [debugBreak](#debug-break) | No | No | Yes | Yes | No | Yes |
| [realtime clock](#realtime-clock) | No | Yes ^ | Yes | Yes | No | No |
| [Switch Fall-Through](#switch-fallthrough) | No ^ | Yes | Yes | Yes | Yes | Yes |
<a id="half"></a>
## Half Type
There appears to be a problem writing to a StructuredBuffer containing half on D3D12. D3D12 also appears to have problems doing calculations with half.
In order for half to work in CUDA, NVRTC must be able to include `cuda_fp16.h` and related files. Please read the [CUDA target documentation](cuda-target.md) for more details.
<a id="int8_t"></a>
## u/int8_t Type
Not currently supported on D3D11/D3D12 because the type is not supported in HLSL/DXIL/DXBC.
Supported in Vulkan via the extensions `GL_EXT_shader_explicit_arithmetic_types` and `GL_EXT_shader_8bit_storage`.
<a id="int16_t"></a>
## u/int16_t Type
Requires SM6.2 which requires DXIL and therefore DXC and D3D12. For DXC this is discussed [here](https://github.com/Microsoft/DirectXShaderCompiler/wiki/16-Bit-Scalar-Types).
Supported in Vulkan via the extensions `GL_EXT_shader_explicit_arithmetic_types` and `GL_EXT_shader_16bit_storage`.
<a id="int64_t"></a>
## u/int64_t Type
Requires SM6.0 which requires DXIL for D3D12. Therefore not available with DXBC on D3D11 or D3D12.
<a id="int-matrix"></a>
## int matrix
Means that matrix types containing integer element types can be used.
<a id="tex-get-dimensions"></a>
## tex.GetDimensions
`tex.GetDimensions` is the `GetDimensions` method on 'texture' objects. It is not supported on CUDA, as CUDA has no equivalent functionality to query these values. `GetDimensions` does work on Buffer resource types on CUDA.
<a id="sm6-wave"></a>
## SM6.0 Wave Intrinsics
CUDA has preliminary support for Wave Intrinsics, introduced in [PR #1352](https://github.com/shader-slang/slang/pull/1352). Slang synthesizes the 'WaveMask' based on program flow and the implied 'programmer view' of execution. This support is built on top of the WaveMask intrinsics: Wave intrinsics are replaced with WaveMask intrinsic calls, with Slang generating the code to calculate the appropriate WaveMasks.
Please read [PR #1352](https://github.com/shader-slang/slang/pull/1352) for a better description of the status.
<a id="sm6.5-wave"></a>
## SM6.5 Wave Intrinsics
SM6.5 Wave Intrinsics are supported, but require a downstream DXC compiler that supports SM6.5. As it stands, the DXC shipping with Windows does not.
<a id="wave-mask"></a>
## WaveMask Intrinsics
In order to map better to the CUDA sync/mask model Slang supports 'WaveMask' intrinsics. They operate in broadly the same way as the Wave intrinsics, but require the programmer to specify the lanes that are involved. To write code that uses wave intrinsics across targets including CUDA, currently the WaveMask intrinsics must be used. For this to work, the masks passed to the WaveMask functions should exactly match the 'Active lanes' concept that HLSL uses, otherwise the result is undefined.
The WaveMask intrinsics are not part of HLSL and are only available on Slang.
<a id="wave-shuffle"></a>
## WaveShuffle
`WaveShuffle` and `WaveBroadcastLaneAt` are Slang specific intrinsic additions to expand the options available around `WaveReadLaneAt`.
To be clear, this means they will not compile directly on 'standard' HLSL compilers such as `dxc`, but Slang's HLSL _output_ (which will not contain these intrinsics) can be (and typically is) compiled via dxc.
The difference between them can be summarized as follows:
- `WaveBroadcastLaneAt` - `laneId` must be a compile-time constant
- `WaveReadLaneAt` - `laneId` can be dynamic, but _MUST_ be the same value across the Wave, i.e. 'dynamically uniform' across the Wave
- `WaveShuffle` - `laneId` can be truly dynamic (NOTE: this is not strictly available on all targets currently, specifically HLSL)
Other than the different restrictions on `laneId`, they act identically to `WaveReadLaneAt`.
`WaveBroadcastLaneAt` and `WaveReadLaneAt` will work on all targets that support wave intrinsics, with the only current restriction being that on GLSL targets, only scalars and vectors are supported.
`WaveShuffle` will always work on CUDA/Vulkan.
On HLSL based targets currently `WaveShuffle` will be converted into `WaveReadLaneAt`. Strictly speaking this means it _requires_ the `laneId` to be `dynamically uniform` across the Wave. In practice some hardware supports the loosened usage, and other hardware does not. In the future this may be fixed in Slang and/or HLSL to work across all hardware. For now, if you use `WaveShuffle` on HLSL based targets, it will be necessary to confirm that `WaveReadLaneAt` has the loosened behavior for all the hardware intended. If the target hardware does not support the loosened restrictions, its behavior is undefined.
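The three intrinsics can be contrasted in a sketch like the following (a compute shader fragment; the rotation pattern is purely illustrative):

```
[numthreads(32, 1, 1)]
void computeMain(uint3 tid : SV_DispatchThreadID)
{
    float value = float(tid.x);

    // laneId is a compile-time constant: valid for WaveBroadcastLaneAt.
    float fromLane0 = WaveBroadcastLaneAt(value, 0);

    // laneId is dynamic but uniform across the wave: valid for WaveReadLaneAt.
    uint uniformLane = WaveGetLaneCount() - 1;
    float fromLastLane = WaveReadLaneAt(value, uniformLane);

    // laneId differs per lane: only WaveShuffle allows this (on targets that
    // support truly dynamic lane indices).
    uint rotated = (WaveGetLaneIndex() + 1) % WaveGetLaneCount();
    float fromNeighbor = WaveShuffle(value, rotated);
}
```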
<a id="tesselation"></a>
## Tessellation
Although tessellation stages should work on D3D11 and D3D12, they are not tested within our test framework and may have problems.
<a id="native-bindless"></a>
## Native Bindless
Bindless is possible on targets that support it, but it is not the default behavior for those targets and typically requires significant effort in Slang code.
'Native Bindless' targets use a form of 'bindless' for all resource types. On CUDA this requires the target to use 'texture object' style binding, and the device must have 'compute capability 3.0' or higher.
<a id="resource-bounds"></a>
## Resource bounds
For CUDA this is optional, as it can be controlled via the `SLANG_CUDA_BOUNDARY_MODE` macro in `slang-cuda-prelude.h`. By default its behavior is `cudaBoundaryModeZero`.
<a id="buffer-bounds"></a>
## Buffer Bounds
This is the feature whereby accessing outside the bounds of a Buffer has well-defined behavior: reads return all 0s, and writes are ignored.
On CPU there is bounds checking only in debug compilations of the C++ code. This will assert if an access is out of range.
On CUDA, out-of-bounds accesses default to element 0 (!). The behavior can be controlled via the `SLANG_CUDA_BOUND_CHECK` macro in `slang-cuda-prelude.h`. This behavior may seem a little strange, and it requires a buffer to have at least one element to avoid doing something nasty. It is really a 'least worst' answer to a difficult problem, and is better than out-of-range reads or, worse, writes.
In Metal, accessing a buffer out of bounds is undefined behavior.
<a id="tex-array-sample-float"></a>
## TextureArray.Sample float
When using 'Sample' on a TextureArray, CUDA treats the array index parameter as an int, even though it is passed as a float.
<a id="separate-sampler"></a>
## Separate Sampler
This feature means that multiple Samplers can be used with a single Texture. In terms of HLSL code, this can be seen as the `SamplerState` being a parameter passed to the `Sample` method on a texture object.
On CUDA the SamplerState is ignored, because on this target a 'texture object' is the combination of a Texture and a Sampler.
<a id="graphics-pipeline"></a>
## Graphics Pipeline
CPU and CUDA only currently support compute shaders.
<a id="ray-tracing-1.0"></a>
## Ray Tracing DXR 1.0
Vulkan does not support a local root signature, but there is the concept of a 'shader record'. In Slang a single constant buffer can be marked as a shader record with the `[[vk::shader_record]]` attribute, for example:
```
[[vk::shader_record]]
cbuffer ShaderRecord
{
uint shaderRecordID;
}
```
In practice, to write shader code that works across D3D12 and VK, you should have a single constant buffer marked as 'shader record' for VK, and on D3D12 that constant buffer should be bound in the local root signature.
<a id="tex-load"></a>
## tex.Load
tex.Load is only supported on CUDA for Texture1D. Additionally, CUDA only allows such access for linear memory, meaning the bound texture also cannot have mip maps. Load _is_ allowed on RWTexture types of other dimensions, including 1D, on CUDA.
<a id="full-bool"></a>
## Full bool
Means fully-featured bool support. CUDA has issues around bool because there is no built-in vector bool type. Currently bool aliases to an int vector type.
On CPU there are some issues insofar as bool's size and alignment are not well defined. Most C++ compilers now use a byte to represent a bool, but in the past it has been backed by an int on some compilers.
<a id="unroll"></a>
## `[unroll]`
The unroll attribute allows for unrolling `for` loops. At the moment the feature depends on downstream compiler support, which is mixed. In the longer term the intention is for Slang to contain its own loop unroller, and therefore not depend on downstream compilers for this feature.
On C++ this attribute becomes `SLANG_UNROLL`, which is defined in the prelude. It can be predefined if there is a suitable mechanism; if there is no definition, `SLANG_UNROLL` will be an empty definition.
On GLSL and VK targets loop unrolling uses the [GL_EXT_control_flow_attributes](https://github.com/KhronosGroup/GLSL/blob/master/extensions/ext/GL_EXT_control_flow_attributes.txt) extension.
Metal Shading Language does not support loop unrolling.
Slang does have a cross target mechanism to [unroll loops](language-reference/06-statements.md), in the section `Compile-Time For Statement`.
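For reference, the attribute is applied directly before the loop (a minimal sketch):

```
float sum4(float4 v)
{
    float sum = 0.0;
    [unroll]
    for (int i = 0; i < 4; ++i)
        sum += v[i];
    return sum;
}
```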
<a id="rwbuffer-atomics"></a>
## Atomics on RWBuffer
For VK the GLSL output from Slang appears plausible, but VK binding fails in the test harness.
On CUDA RWBuffer becomes CUsurfObject, which is a 'texture' type and does not support atomics.
On the CPU atomics are not supported, but will be in the future.
<a id="sampler-feedback"></a>
## Sampler Feedback
The HLSL [sampler feedback feature](https://microsoft.github.io/DirectX-Specs/d3d/SamplerFeedback.html) is available for DirectX 12. The feature requires shader model 6.5 and therefore a version of [DXC](https://github.com/Microsoft/DirectXShaderCompiler) that supports that model or higher. The Shader Model 6.5 requirement also means only the DXIL binary format is supported.
There does not appear to be a similar feature available in Vulkan yet; when it is available, support should be added.
For CPU targets there is the IFeedbackTexture interface, which requires an implementation for use. Slang does not currently include CPU implementations for texture types.
<a id="byte-address-atomic"></a>
## RWByteAddressBuffer Atomic
The additional supported methods on RWByteAddressBuffer are...
```
void RWByteAddressBuffer::InterlockedAddF32(uint byteAddress, float valueToAdd, out float originalValue);
void RWByteAddressBuffer::InterlockedAddF32(uint byteAddress, float valueToAdd);
void RWByteAddressBuffer::InterlockedAddI64(uint byteAddress, int64_t valueToAdd, out int64_t originalValue);
void RWByteAddressBuffer::InterlockedAddI64(uint byteAddress, int64_t valueToAdd);
void RWByteAddressBuffer::InterlockedCompareExchangeU64(uint byteAddress, uint64_t compareValue, uint64_t value, out uint64_t outOriginalValue);
uint64_t RWByteAddressBuffer::InterlockedExchangeU64(uint byteAddress, uint64_t value);
uint64_t RWByteAddressBuffer::InterlockedMaxU64(uint byteAddress, uint64_t value);
uint64_t RWByteAddressBuffer::InterlockedMinU64(uint byteAddress, uint64_t value);
uint64_t RWByteAddressBuffer::InterlockedAndU64(uint byteAddress, uint64_t value);
uint64_t RWByteAddressBuffer::InterlockedOrU64(uint byteAddress, uint64_t value);
uint64_t RWByteAddressBuffer::InterlockedXorU64(uint byteAddress, uint64_t value);
```
On HLSL based targets this functionality is achieved using [NVAPI](https://developer.nvidia.com/nvapi). Support for NVAPI is described
in the separate [NVAPI Support](nvapi-support.md) document.
On Vulkan, for float the [`GL_EXT_shader_atomic_float`](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VK_EXT_shader_atomic_float.html) extension is required. For int64 the [`GL_EXT_shader_atomic_int64`](https://raw.githubusercontent.com/KhronosGroup/GLSL/master/extensions/ext/GL_EXT_shader_atomic_int64.txt) extension is required.
CUDA requires SM6.0 or higher for int64 support.
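A minimal usage sketch follows; the buffer name, the per-thread value, and the byte offset are illustrative, not part of the API:

```hlsl
RWByteAddressBuffer accumBuffer;

[shader("compute")]
[numthreads(64, 1, 1)]
void accumulate(uint3 tid : SV_DispatchThreadID)
{
    // Hypothetical per-thread value to accumulate.
    float contribution = float(tid.x) * 0.5;
    float previous;
    // Atomically add at byte offset 0; `previous` receives the prior value.
    accumBuffer.InterlockedAddF32(0, contribution, previous);
}
```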
<a id="mesh-shader"></a>
## Mesh Shader
There is preliminary [Mesh Shader support](https://github.com/shader-slang/slang/pull/2464).
<a id="ser"></a>
## Shader Execution Reordering
More information about [Shader Execution Reordering](shader-execution-reordering.md).
Currently support is available in D3D12 via NVAPI, and for Vulkan via the [GL_NV_shader_invocation_reorder](https://github.com/KhronosGroup/GLSL/blob/master/extensions/nv/GLSL_NV_shader_invocation_reorder.txt) extension.
<a id="debug-break"></a>
## Debug Break
Slang has preliminary support for the `debugBreak()` intrinsic. With the appropriate tooling, when `debugBreak` is hit it will cause execution to halt and display in the attached debugger.
This is not supported on HLSL, GLSL, SPIR-V or Metal backends. Note that on some targets, if there isn't an appropriate debugging environment attached, debugBreak might cause execution to fail or be ignored.
On C++ targets debugBreak is implemented using SLANG_BREAKPOINT defined in "slang-cpp-prelude.h". If there isn't a suitable intrinsic, this will default to attempting to write to `nullptr` leading to a crash.
Some additional details:
- If [slang-llvm](cpu-target.md#slang-llvm) is being used as the downstream compiler (as is typical with `host-callable`), it will crash into the debugger, but may not produce a usable stack trace.
- For "normal" C++ downstream compilers such as Clang/Gcc/Visual Studio, to break into readable source code, debug information is typically necessary. Disabling optimizations may be useful to break on the appropriate specific line, and have variables inspectable.
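A sketch of conditional use on a target that supports the intrinsic; the buffer and the invariant being checked are hypothetical:

```hlsl
RWStructuredBuffer<float> result;

[shader("compute")]
[numthreads(1, 1, 1)]
void computeMain(uint3 tid : SV_DispatchThreadID)
{
    // Halt in the attached debugger if an invariant is violated.
    if (result[tid.x] < 0.0)
        debugBreak();
}
```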
<a id="realtime-clock"></a>
## Realtime Clock
Realtime clock support is available via the API
```
// Get low 32 bits of realtime clock
uint getRealtimeClockLow();
// Get 64 bit realtime clock, with low bits in .x and high bits in .y
uint2 getRealtimeClock();
```
On D3D this is supported through NVAPI via `NvGetSpecial`.
On Vulkan this is supported via the [VK_KHR_shader_clock extension](https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_shader_clock.html).
On CUDA this is supported via [clock](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#time-function).
Currently this is not supported on CPU, although this will potentially be added in the future.
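A sketch of timing a region of a kernel; the output buffer and the measured work are hypothetical:

```hlsl
RWStructuredBuffer<uint> elapsedLow; // hypothetical output buffer

[shader("compute")]
[numthreads(1, 1, 1)]
void timedMain(uint3 tid : SV_DispatchThreadID)
{
    uint2 start = getRealtimeClock();
    // ... work being measured ...
    uint2 end = getRealtimeClock();
    // Low 32 bits are usually enough for short regions (beware wrap-around).
    elapsedLow[tid.x] = end.x - start.x;
}
```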
<a id="switch-fallthrough"></a>
## Switch Fall-Through
Switch fall-through allows code in one case to execute and then continue into the next case without a `break`:
```hlsl
switch(value)
{
case 0:
x = 10;
// Fall through to case 1
case 1:
result = x + value;
break;
}
```
This is natively supported on most targets. However, D3D11 (FXC/DXBC) and WGSL do not support fall-through in their switch statements.
For these targets, Slang restructures the code by duplicating the fall-through destination into each source case. This produces functionally correct results, but has implications:
- **Code size**: The generated code may be larger due to duplication.
- **Wave convergence**: If the duplicated code contains wave/subgroup operations, each copy executes independently, which may affect convergence behavior compared to native fall-through.
When restructuring occurs, Slang emits warning 41026 to alert developers to this behavior change.
To avoid restructuring, ensure each case ends with `break`, `return`, or another control transfer statement.
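Conceptually, the restructuring duplicates the fall-through destination into each source case; for the example above it is roughly equivalent to:

```hlsl
switch(value)
{
case 0:
    x = 10;
    result = x + value; // duplicated body of case 1
    break;
case 1:
    result = x + value;
    break;
}
```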
# Updating external spirv
There are three directories under `external` that are related to SPIR-V:
- external/spirv-headers
- external/spirv-tools
- external/spirv-tools-generated
In order to use the latest or custom SPIR-V, they need to be updated.
## Update SPIRV-Tools
On the Slang repo, you need to update to use the latest commit of SPIRV-Tools and SPIRV-Headers.
1. Create a branch for the update.
```
# This doc will use "update_spirv" as a branch name,
# but you can use a different name.
git checkout -b update_spirv
```
1. Synchronize and update submodules.
```
git submodule sync
git submodule update --init --recursive
```
1. Update the SPIRV-Tools submodule to the latest version.
```
git -C external/spirv-tools fetch
git -C external/spirv-tools checkout origin/main
```
## Build spirv-tools
The directory `external/spirv-tools-generated` holds a set of files generated from the spirv-tools build.
You need to build spirv-tools in order to generate them.
```
cd external
cd spirv-tools
python3.exe utils\git-sync-deps # this step may require you to register your ssh public key to gitlab.khronos.org
cmake.exe . -B build
cmake.exe --build build --config Release
# Go back to repository root
cd ../..
```
## Update SPIRV-Headers
1. Update the SPIRV-Headers submodule to what SPIRV-Tools uses
```
git -C external/spirv-headers fetch
git -C external/spirv-tools/external/spirv-headers log -1 --oneline
git -C external/spirv-headers checkout [commit hash from the previous command]
```
Alternatively you can get the hash value of spirv-headers with the following command,
```
grep spirv_headers_revision external/spirv-tools/DEPS
```
Note that the update of SPIRV-Headers should be done after running `python3.exe utils\git-sync-deps`, because the python script will update `external/spirv-tools/external/spirv-headers` to whichever commit the current SPIRV-Tools depends on.
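As a sketch, the DEPS lookup can be scripted. The helper below only extracts the pinned 40-hex-digit commit from a DEPS-style line; the file paths in the usage comment are the ones used in this repository:

```shell
# Sketch: pull the 40-hex-digit commit hash out of a DEPS-style line,
# e.g. `'spirv_headers_revision': '6bb105b6...',`
extract_rev() {
  # $1: key to look for, $2: DEPS file path
  grep "$1" "$2" | grep -o '[0-9a-f]\{40\}' | head -n 1
}

# Usage against a real checkout (paths as in this repo):
#   extract_rev spirv_headers_revision external/spirv-tools/DEPS
#   git -C external/spirv-headers rev-parse HEAD   # compare the two
```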
## Copy the generated files from `spirv-tools/build/` to `spirv-tools-generated/`
Copy the generated header files from `external/spirv-tools/build/` to `external/spirv-tools-generated/`.
```
rm external/spirv-tools-generated/*.h
rm external/spirv-tools-generated/*.inc
cp external/spirv-tools/build/*.h external/spirv-tools-generated/
cp external/spirv-tools/build/*.inc external/spirv-tools-generated/
```
## Build Slang and run slang-test
After SPIRV submodules are updated, you need to build and test.
```
# Make sure to clean up data generated from the previous SPIRV
rm -fr build
```
There are many ways to build Slang executables. Refer to the [document](https://github.com/shader-slang/slang/blob/master/docs/building.md) for more detail.
For a quick reference, you can build with the following commands,
```
cmake.exe --preset vs2022
cmake.exe --build --preset release
```
After building Slang executables, run `slang-test` to verify that all tests pass.
```
set SLANG_RUN_SPIRV_VALIDATION=1
build\Release\bin\slang-test.exe -use-test-server -server-count 8
```
It is often the case that some tests fail because of changes in SPIRV-Headers.
You need to resolve them properly before proceeding.
## Commit and create a Pull Request
After testing is done, you need to stage and commit the updated submodule references and any generated files.
Note that to record the new commit IDs of the submodules, you must stage the submodule directories themselves with `git add`.
```
git add external/spirv-headers
git add external/spirv-tools
git add external/spirv-tools-generated
# Add any other changes needed to resolve test failures
git commit -m "Update SPIRV-Tools and SPIRV-Headers to latest versions"
git push origin update_spirv # Use your own branch name as needed
```
Once all changes are pushed to GitHub, you can create a Pull Request on the main Slang repository.
## CI Validation
The Slang CI system includes an automated check that verifies the generated files in `external/spirv-tools-generated/` are up-to-date whenever changes are made to `external/spirv-tools` or `external/spirv-headers`.
### What the CI Check Does
When you create a Pull Request that modifies the SPIRV submodules, the CI will:
1. Detect changes to `external/spirv-tools` or `external/spirv-headers`
2. Verify that the `spirv-headers` commit matches what `spirv-tools/DEPS` expects
3. Automatically regenerate the files that should be in `external/spirv-tools-generated/`
4. Compare the regenerated files with the committed files
5. **Fail the CI** if there are any discrepancies or mismatches
This ensures that:
- The `spirv-headers` version is compatible with `spirv-tools`
- All generated files are correctly synchronized with the SPIRV-Tools version
### When Does the CI Check Run?
The check only runs when:
- A pull request modifies `external/spirv-tools/**`
- A pull request modifies `external/spirv-headers/**`
- A pull request modifies `external/spirv-tools-generated/**`
- The workflow or check script itself is modified
This path filtering prevents unnecessary builds and keeps CI fast.
### What Files Are Validated?
The check verifies that the following generated files are present and up-to-date:
- `*.inc` files (e.g., `build-version.inc`, `core_tables_body.inc`, etc.)
- `*.h` files (e.g., `DebugInfo.h`, `OpenCLDebugInfo100.h`, etc.)
The `README.md` file in `external/spirv-tools-generated/` is excluded from validation.
### If the CI Check Fails
If you see a failure from the "Check SPIRV Generated Files" job, it means the generated files are out of sync. The CI output will show:
- Which files are missing
- Which files have differences
- Which files are orphaned (should be removed)
To fix the issue, follow the instructions in the CI output, or re-run the steps in the [Copy the generated files](#copy-the-generated-files-from-spirv-toolsbuild-to-spirv-tools-generated) section above.
#### Common Failure Scenarios
**Scenario 1: Missing Files**
```
ERROR: Missing file in spirv-tools-generated: new_file.inc
```
This means a new file is now generated by SPIRV-Tools but hasn't been added to the repository.
**Fix:** Follow the copy steps in the error message to add the new file.
---
**Scenario 2: Outdated Files**
```
ERROR: File differs: build-version.inc
```
This means the file exists but its content doesn't match what SPIRV-Tools would generate.
**Fix:** Regenerate and replace the file following the instructions in the CI output.
---
**Scenario 3: Orphaned Files**
```
ERROR: Orphaned file in spirv-tools-generated: old_file.inc
```
This means a file exists in the repository but is no longer generated by SPIRV-Tools.
**Fix:** Remove the file with `git rm external/spirv-tools-generated/old_file.inc`
---
**Scenario 4: spirv-headers Commit Mismatch**
```
ERROR: spirv-headers commit mismatch!
ERROR: Expected (from spirv-tools/DEPS): 6bb105b6c4b3a246e1e6bb96366fe14c6dbfde83
ERROR: Actual (submodule): 1234567890abcdef1234567890abcdef12345678
```
This means the `spirv-headers` submodule commit doesn't match what `spirv-tools` expects in its DEPS file.
**Fix:** Update `spirv-headers` to the expected commit:
```bash
git -C external/spirv-headers fetch
git -C external/spirv-headers checkout 6bb105b6c4b3a246e1e6bb96366fe14c6dbfde83
git add external/spirv-headers
```
---
**Scenario 5: Submodule Not Updated**
If you updated generated files but forgot to update the submodule reference:
```bash
git add external/spirv-tools
git add external/spirv-tools-generated
```
### Testing Locally
You can run the same check locally before pushing to catch issues early:
```bash
bash extras/check-spirv-generated.sh
```
This script will verify that your generated files match what would be produced by building SPIRV-Tools with the current submodule version.
---
layout: user-guide
permalink: /user-guide/introduction
---
Introduction
============
Welcome to the _Slang User's Guide_, an introduction to the Slang language, compiler, and API.
Why use Slang?
--------------
The Slang system helps real-time graphics developers write cleaner and more maintainable GPU code, without sacrificing run-time performance.
Slang extends the HLSL language with thoughtfully selected features from modern general-purpose languages that support improved developer productivity and code quality.
These features have been carefully implemented with an understanding of GPU performance.
Some of the benefits of Slang include:
* Slang is backwards compatible with most existing HLSL code
* _Parameter blocks_ allow shader parameters to be grouped by update rate in order to take advantage of Direct3D 12 descriptor tables and Vulkan descriptor sets, without verbose and error-prone per-parameter markup
* _Interfaces_ and _generics_ provide first-class alternatives to hacky preprocessor-based or string-pasting shader specialization. Preprocessor hacks can be replaced with a well-understood language feature already used in Rust, Swift, C#, Java, and more.
* _Automatic differentiation_ greatly simplifies the implementation of learning-based techniques in shaders. Slang supports automatically generating both forward derivative and backward derivative propagation functions from forward computation code.
* Slang supports a first-class _module_ system, which enables true separate compilation and semantic checking of shader code.
* Slang supports compute, rasterization, and ray-tracing shaders
* The same Slang compiler can generate code for DX bytecode, DXIL, SPIR-V, HLSL, GLSL, CUDA, and more
* Slang provides a robust and feature-complete reflection API, which provides binding/offset/layout information about all shader parameters in a consistent format across all the supported targets
Who is Slang for?
-----------------
Slang aims to be the best language possible for real-time graphics developers who care about code quality, portability and performance.
### Real-Time Graphics Developers
Slang is primarily intended for developers creating real-time graphics applications that run on end-user/client machines, such as 3D games and digital content creation (DCC) tools.
Slang can still provide value in other scenarios -- offline rather than real-time rendering, non-graphics GPU programming, or for applications that run on a server instead of client machines -- but the system has been designed first and foremost around the requirements of real-time graphics.
### From Hobbyists to Professionals
The Slang language is simple and familiar enough for hobbyist developers to use, but scales up to the demands of professional development teams creating next-generation game renderers.
### Developers of Multi-Platform Applications
The Slang system builds for multiple OSes, supports many graphics APIs, and works with GPUs from multiple hardware vendors.
The project is completely open-source and patches to support additional platforms are welcome.
Even for developers who only care about a single target platform or graphics API, Slang can provide a better programming experience than the default/native GPU language for that API.
### Developers with an existing investment in HLSL code
One of Slang's key features is its high degree of compatibility with existing HLSL code.
Developers who are currently responsible for large HLSL codebases but find themselves chafing at the restrictions of that language can incrementally adopt the features of Slang to improve the quality of their codebase over time.
Developers who do not have an existing investment in HLSL code, or who already have a large codebase in some other language will need to carefully consider the trade-offs in migrating to a new language (whether Slang or something else).
Who is this guide for?
----------------------
The content of this guide is written for real-time graphics programmers with a moderate or higher experience level.
It assumes the reader has previously used a real-time shading language like HLSL, GLSL, or MetalSL together with an API like Direct3D 11/12, Vulkan, or Metal.
We also assume that the reader is familiar enough with C/C++ to understand code examples and API signatures in those languages.
If you are new to programming entirely, this guide is unlikely to be helpful.
If you are an experienced programmer but have never worked in real-time graphics with GPU shaders, you may find some of the terminology or concepts from the domain confusing.
If you've only ever used OpenGL or Direct3D 11 before, some references to concepts in "modern" graphics APIs like D3D12/Vulkan/Metal may be confusing.
This effect may be particularly pronounced for OpenGL users.
It may be valuable for a user with limited experience with "modern" graphics APIs to work with both this guide and a guide to their chosen API (e.g., Direct3D 12, Vulkan, or Metal) so that concepts in each can reinforce the other.
When introducing Slang language features, this guide may make reference to languages such as Swift, Rust, C#, or Java.
Readers who almost exclusively use C/C++ may find certain features surprising or confusing, especially if they insist on equating concepts with the closest thing in C++ (assuming "generics `==` templates").
Goals and Non-Goals
-------------------
The rest of this guide introduces the services provided by the Slang system and explains how to use them to solve challenges in real-time graphics programming.
When services are introduced one after another, it may be hard to glimpse the bigger picture: why these particular services? Why these implementations? Why these APIs?
Before we dive into actually _using_ Slang, let us step back and highlight some of the key design goals (and non-goals) that motivate the design:
* **Performance**: Real-time graphics demands high performance, which motivates the use of GPUs. Whenever possible, the benefits of using Slang must not come at the cost of performance. When a choice involves a performance trade-off the *user* of the system should be able to make that choice.
* **Productivity**: Modern GPU codebases are large and growing. Productivity in a large codebase is less about _writing_ code quickly, and more about having code that is understandable, maintainable, reusable, and extensible. Language concepts like "modularity" or "separate compilation" are valuable if they foster greater developer productivity.
* **Portability**: Real-time graphics developers need to support a wide variety of hardware, graphics APIs, and operating systems. These platforms differ greatly in the level of functionality they provide. Some systems hand-wave portability concerns out of existence by enforcing a "lowest common denominator" approach and/or raising their "min spec" to exclude older or less capable platforms; our goals differ greatly. We aspire to keep our "min spec" as low as is practical (e.g., supporting Direct3D 11 and not just Direct3D 12), while also allowing each target to expose its distinguishing capabilities.
* **Ease of Adoption**: A language feature or service is worthless if nobody can use it. When possible, the system should be compatible with existing code and approaches. New language features should borrow syntax and semantics from other languages users might be familiar with. APIs and tools might need to support complicated and detailed use-cases, but should also provide conveniences and short-cuts for the most common cases.
* **Predictability**: Code should do what it appears to, consistently, across as many platforms as possible. Whenever possible the compiler should conform to programmer expectation, even in the presence of "undefined behavior." Tools and optimization passes should keep their behavior as predictable as possible; simple tools empower the user to do smart things.
* **Limited Scope**: The Slang system is a language, compiler, and module. It is not an engine, not a renderer, and not a "framework." The Slang system explicitly does *not* assume responsibility for interacting with GPU APIs to load code, allocate resources, bind parameters, or kick off work. While a user *may* use the Slang runtime library in their application, they are not *required* to do so.
The ordering here is significant, with earlier goals generally being more important than later ones.
---
layout: user-guide
permalink: /user-guide/get-started
---
# Getting Started with Slang
Slang enables you to do many powerful things with shader code, including compiling shader code to many different platforms, obtaining reflection information, organizing your shader library in a modern modular fashion, controlling specialization and more. The following sections help you get started with the basics of Slang in a simple example. We will assume Windows as the operating system, but the steps performed here are similar for other platforms.
## Installation
The easiest way to start using Slang is to download a [binary release](https://github.com/shader-slang/slang/releases/) from the GitHub repository. Once you have downloaded and extracted the files from a release package, you can find the `slangc.exe` or `slangc` executable under `/bin`. In this tutorial we will use the `slangc` standalone Slang compiler included in a release package.
> #### Note: Required Dependencies ####
> For Windows, `slang-compiler.dll` and `slang-glslang.dll` must be placed in the same directory as `slangc.exe` as they are required by the standalone executable.
> #### Note: Multiple Slang Installations ####
> If you have multiple versions of Slang installed on your system (such as Slang from the Vulkan SDK), ensure that the correct dynamic libraries are being loaded. On Linux, the `LD_LIBRARY_PATH` environment variable will override the `RUNPATH` embedded in the `slangc` executable, causing it to load `libslang-compiler.so` from the path specified in `LD_LIBRARY_PATH` first. This can lead to version mismatches and unexpected behavior.
If you are interested in building from source, please refer to the [documentation on building Slang](../building.md).
## Your first Slang shader
In this section we demonstrate how to write a simple compute shader in Slang that adds numbers from two buffers and writes the results into a third buffer. To start, create a text file named `hello-world.slang` in any directory, and paste the following content in the newly created file:
```hlsl
// hello-world.slang
StructuredBuffer<float> buffer0;
StructuredBuffer<float> buffer1;
RWStructuredBuffer<float> result;
[shader("compute")]
[numthreads(1,1,1)]
void computeMain(uint3 threadId : SV_DispatchThreadID)
{
uint index = threadId.x;
result[index] = buffer0[index] + buffer1[index];
}
```
> #### Note ####
> Slang has official language extension support for both [Visual Studio](https://marketplace.visualstudio.com/items?itemName=shader-slang.slang-vs-extension) and [Visual Studio Code](https://marketplace.visualstudio.com/items?itemName=shader-slang.slang-language-extension). The extensions are powered by the Slang compiler to support a wide range of
> assisting features including auto-completion, function signature hinting, semantic highlighting and more.
As you can see, `hello-world.slang` is no different from a normal HLSL shader file. In fact, Slang is compatible with most HLSL code you would write. On top of HLSL, Slang has added many new language and compiler features that simplify various shader-code tasks, which we will cover in future chapters. For now we will demonstrate one key feature of Slang: cross-compiling to different platforms.
Slang supports compiling shaders into many different targets including Direct3D 11, Direct3D 12, Vulkan, CUDA and C++ (for execution on CPU). You can run `slangc` with the following command line to compile `hello-world.slang` into Vulkan SPIRV:
```bat
.\slangc.exe hello-world.slang -profile glsl_450 -target spirv -o hello-world.spv -entry computeMain
```
If you would like to see the equivalent GLSL of the generated SPIRV code, simply change the `-target` argument to `glsl`:
```bat
.\slangc.exe hello-world.slang -profile glsl_450 -target glsl -o hello-world.glsl -entry computeMain
```
The resulting `hello-world.glsl` generated by `slangc` is shown below:
```glsl
// hello-world.glsl (generated by slangc)
#version 450
layout(row_major) uniform;
layout(row_major) buffer;
#line 2 0
layout(std430, binding = 0) readonly buffer _S1 {
float _data[];
} buffer0_0;
#line 3
layout(std430, binding = 1) readonly buffer _S2 {
float _data[];
} buffer1_0;
#line 4
layout(std430, binding = 2) buffer _S3 {
float _data[];
} result_0;
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
void main()
{
#line 10
uint index_0 = gl_GlobalInvocationID.x;
float _S4 = ((buffer0_0)._data[(index_0)]);
#line 11
float _S5 = ((buffer1_0)._data[(index_0)]);
#line 11
float _S6 = _S4 + _S5;
#line 11
((result_0)._data[(index_0)]) = _S6;
#line 8
return;
}
```
As you can see, things are being translated just as expected to GLSL: the HLSL `StructuredBuffer` and `RWStructuredBuffer` types are mapped to shader storage objects, and the `[numthreads]` attribute is translated into the proper `layout(...) in` qualifier on the `main` entry point.
Note that in the generated GLSL code, all shader parameters are qualified with explicit binding layouts. This is because Slang guarantees that all parameters will have fixed bindings regardless of shader optimization. Without explicit binding layout qualifiers, the downstream compiler in the driver may change the binding of a parameter depending on whether any preceding parameters are eliminated during optimization passes. In practice this is a pain point in application code, where developers need to rely on run-time reflection to determine the binding locations of a compiled shader kernel, and the issue gets harder to manage when the application also needs to deal with shader specializations.

Since Slang always generates explicit binding locations in its output on all targets, as if no parameters were eliminated, parameters are guaranteed a deterministic binding location without the user writing any manual binding qualifiers in the Slang code. In fact, we strongly encourage users not to qualify their Slang code with explicit binding qualifiers and to let the Slang compiler lay out parameters. This is the best practice for maintaining code modularity and avoiding potential binding location conflicts between different shader modules.
## The full example
The full Vulkan example that sets up and runs the `hello-world.slang` shader is located in the [/examples/hello-world](https://github.com/shader-slang/slang/tree/master/examples/hello-world) directory of the Slang repository. The example code initializes a Vulkan context and runs the compiled SPIRV code. The example code demonstrates how to use the Slang API to load and compile shaders.
---
layout: user-guide
permalink: /user-guide/modules
---
Modules and Access Control
===========================
While the preprocessor `#include` is still supported, Slang provides a _module_ system for software engineering benefits such as clean expression of subcomponent boundaries and dependencies, hiding implementation details, and providing a path towards true separate compilation.
## Defining a Module
A module in Slang comprises one or more files. A module must have one and only one primary file that is used as the source-of-truth to uniquely identify the module. The primary file must start with a `module` declaration. For example, the following code defines a module named `scene`:
```
// scene.slang
module scene;
// ...
```
A module can contain more than one file. The additional files are pulled into the module with the `__include` syntax:
```
// scene.slang
module scene;
__include "scene-helpers";
```
```
// scene-helpers.slang
implementing scene;
// ...
```
The files being included into a module must start with an `implementing <module-name>` declaration.
Note that the `__include` syntax here has a different meaning than the preprocessor `#include`. `__include` has the following semantics:
1. The preprocessor state of the file performing the inclusion does not apply to the file being included, and the preprocessor state after parsing the included file is not visible to the outer "includer" file. For example, `#define`s before a `__include` are not visible to the included file, and `#define`s in the included file are not visible to the file that includes it.
2. A file will be included into the current module exactly once, no matter how many times a `__include` of that file is encountered.
3. Circular `__include`s are allowed, given (2).
4. All files that become part of a module via `__include` can access all other entities defined in the same module, regardless of the order of `__include`s.
This means that the following code is valid:
```
// a.slang
implementing m;
void f_a() {}
// b.slang
implementing "m"; // alternate syntax.
__include a; // pulls in `a` to module `m`.
void f_b() { f_a(); }
// c.slang
implementing "m.slang"; // alternate syntax.
void f_c()
{
// OK, `c.slang` is part of module `m` because it is `__include`'d by
// `m.slang`.
f_a(); f_b();
}
// m.slang
module m;
__include m; // OK, a file including itself is allowed and has no effect.
__include "b"; // Pulls in file b (alternate syntax), and transitively pulls in file a.
__include "c.slang"; // Pulls in file c, specifying the full file name.
void test() { f_a(); f_b(); f_c(); }
```
Note that `module`, `implementing`, and `__include` all support two flavors of syntax to refer to a module or a file: either via
normal identifier tokens or via string literals. For example, the following flavors are equivalent and will resolve to the same file:
```
__include dir.file_name; // `file_name` is translated to "file-name".
__include "dir/file-name.slang";
__include "dir/file-name";
```
Also note that a file is considered part of a module only if the file can be discovered
via transitive `__include`s from the primary module file. It is possible to have a dangling
file with an `implementing` declaration that is not `__include`'d by any other file in
the module. Such dangling files are not considered part of the module and will not
be compiled. The `implementing` declaration exists for verification and language-server code assistance, and does not carry any other semantics that affect compilation.
> #### Note ####
> When using the identifier token syntax, Slang will translate any underscores(`_`) to hyphens("-") to obtain the file name.
## Importing a Module
At the global scope of a Slang file, you can use the `import` keyword to import another module by name:
```hlsl
// MyShader.slang
import YourLibrary;
```
This `import` declaration will cause the compiler to look for a module named `YourLibrary` and make its declarations visible in the current scope. Similar to `__include`, `import` also supports both the identifier-token and the file-name string syntax.
You can only `import` a primary source file of a module. For example, given:
```
// m.slang
module m;
__include helper;
// helper.slang
implementing m;
// ...
```
It is only valid for user code to `import m`. Attempting to `import helper` will result in a compile-time error.
Multiple `import`s of the same module from different input files will only cause the module to be loaded once (there is no need for "include guards" or `#pragma once`).
Note that preprocessor definitions in the current file will not affect the compilation of `import`ed code, and the preprocessor definitions in the imported code are not visible to the current file.
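A minimal sketch of this isolation (file and macro names hypothetical):

```csharp
// lib.slang
module lib;
#ifdef MY_FLAG // Not defined here: macros from importing files do not leak in.
public int libValue() { return 1; }
#else
public int libValue() { return 0; }
#endif

// MyShader.slang
#define MY_FLAG 1 // Affects only this file's preprocessing.
import lib;       // `lib` is still compiled with MY_FLAG undefined, so libValue() returns 0.
```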
## Access Control
Slang supports access control modifiers: `public`, `internal` and `private`. The module boundary plays an important role in access control.
`public` symbols are accessible everywhere: from within the different types, different files or different modules.
`private` symbols are only visible to other symbols in the same type. The following example shows the scope of `private` visibility:
```csharp
struct MyType
{
private int member;
int f() { member = 5; return member; } // OK.
struct ChildType
{
int g(MyType t)
{
return t.member; // OK.
}
}
}
void outerFunc(MyType t)
{
t.member = 2; // Error, `member` is not visible here.
}
```
`internal` symbols are visible throughout the same module, regardless of whether they are referenced from the same type or the same file, but they are not visible to other modules. The following example shows the scope of `internal` visibility:
```csharp
// a.slang
module a;
__include b;
public struct PS
{
internal int internalMember;
public int publicMember;
}
internal void f() { f_b(); } // OK, f_b defined in the same module.
// b.slang
implementing a;
internal void f_b(); // Defines f_b in module `a`.
public void publicFunc();
// m.slang
module m;
import a;
void main()
{
f(); // Error, f is not visible here.
publicFunc(); // OK.
PS p; // OK.
p.internalMember = 1; // Error, internalMember is not visible.
p.publicMember = 1; // OK.
}
```
`internal` is the default visibility when no other access modifier is specified. An exception is `interface` members, whose default visibility is the visibility of the enclosing interface.
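A short sketch of these defaults (names hypothetical):

```csharp
public interface ICounter
{
    int next(); // No modifier: defaults to the interface's visibility, i.e. `public`.
}

struct Impl : ICounter // No modifier: the type defaults to `internal`.
{
    int state; // Defaults to `internal`.
    public int next() { return state++; }
}
```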
### Additional Validation Rules
The Slang compiler enforces the following rules regarding access control:
- A more visible entity should not expose less visible entities through its signature. For example, a `public` function cannot have a return type that is `internal`.
- A member of a `struct`, `interface`, or other aggregate type cannot have higher visibility than its parent.
- If a `struct` type has visibility `Vs`, and one of its members has visibility `Vm`, and the member is used to satisfy an interface requirement that has visibility `Vr`, then `Vm` must not be lower (less visible) than `min(Vs, Vr)`.
- Type definitions themselves cannot be `private`, for example, `private struct S {}` is not valid code.
- `interface` requirements cannot be `private`.
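These rules can be illustrated with a hedged sketch (all names hypothetical):

```csharp
internal struct Hidden { int x; }
public Hidden leak(); // Error: a `public` function exposes the `internal` type `Hidden`.

internal struct Parent
{
    public int member; // Error: member is more visible than its parent type.
}

private struct S {} // Error: type definitions cannot be `private`.

interface IThing
{
    private void f(); // Error: interface requirements cannot be `private`.
}
```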
## Organizing File Structure of Modules
Slang does not seek to impose any specific organization of modules. However, there are some conventions that have emerged as being useful.
### Module Organization Suggestions
- The top-level directory contains modules that would be `import`ed by user code.
- The implementation details of the modules are placed in files at lower levels of the tree.
This has the benefit that it is easy for a user to distinguish the public API from the implementation details.
### Module Organization Example
<img src="../assets/moduletree.png" width="300em" alt="Module organization tree diagram"/>
The above diagram shows a module organization example.
Top-level module files such as `utils.slang` are those that are directly `import`ed by user code. The implementation details of the module are placed in the lower levels of the tree, organized into similarly named subdirectories for clarity.
Modules like `utils.slang` needn't contain anything more than a module declaration and a list of included files, with optional `import` statement(s) to pull in any external dependencies, e.g.
```
module utils;
import slangpy;
__include "utils/accumlator.slang";
__include "utils/tonemap.slang";
__include "utils/fill.slang";
```
Here, all the public symbols defined in `accumlator.slang`, `tonemap.slang`, and `fill.slang` are visible to the user of the `utils` module, and these constituent helper files do not need to clutter the top-level file hierarchy.
## Legacy Modules
Slang used to not have support for access control, and all symbols were treated as having `public` visibility. To provide compatibility with existing code, the Slang compiler will detect if the module is written in the legacy language, and treat all symbols as `public` if so.
A module is determined to be written in legacy language if all the following conditions are met:
- The module is lacking `module` declaration at the beginning.
- There is no use of `__include`.
- There is no use of any visibility modifiers -- `public`, `private` or `internal`.
The user is advised that this legacy mode is for compatibility only. This mode may be deprecated in the future, and it is strongly recommended that new code should not rely on this compiler behavior.
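A sketch of a file that would be detected as a legacy module (file names hypothetical):

```csharp
// legacyUtils.slang
// No `module` declaration, no `__include`, and no visibility modifiers,
// so this file is compiled in legacy mode: every symbol is treated as `public`.
float4 scale(float4 v, float s) { return v * s; }

// user.slang
import legacyUtils; // `scale` is accessible as if it were declared `public`.
```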

---
layout: user-guide
permalink: /user-guide/capabilities
---
# Capabilities
One of the biggest challenges in maintaining cross-platform shader code is to manage the differences in hardware capabilities across different GPUs, graphics APIs, and shader stages.
Each graphics API or shader stage may expose operations that are not available on other platforms. Instead of restricting Slang's features to the lowest common denominator of different platforms,
Slang exposes operations from all target platforms to allow the user to take maximum advantage on a specific target.
A consequence of this approach is that the user is now responsible for maintaining compatibility of their code. For example, if the user writes code that uses a Vulkan extension currently not
available on D3D/HLSL, they will get an error when attempting to compile that code to D3D.
To help the user to maintain compatibility of their shader code on platforms that matter to their applications, Slang's type system can now infer and enforce capability requirements
to provide assurance that the shader code will be compatible with the specific set of platforms before compiling for that platform.
For example, `Texture2D.SampleCmp` is available on D3D and Vulkan, but not on CUDA. If the user intends to write cross-platform code that targets CUDA, they will
receive a type-checking error when attempting to use `SampleCmp` before the code generation stage of compilation. When using Slang's intellisense plugin, the programmer should
get a diagnostic message directly in their code editor.
As another example, `discard` is a statement that is only meaningful when used in fragment shaders. If a vertex shader contains a `discard` statement or calls a function that contains
a `discard` statement, it shall be a type-check error.
## Capability Atoms and Capability Requirements
Slang models code generation targets, shader stages, API extensions and hardware features as distinct capability atoms. For example, `GLSL_460` is a capability atom that stands for the GLSL 460 code generation target,
`compute` is an atom that represents the compute shader stage, `_sm_6_7` is an atom representing the shader model 6.7 feature set in D3D, `SPV_KHR_ray_tracing` is an atom representing the `SPV_KHR_ray_tracing` SPIR-V extension, and `spvShaderClockKHR` is an atom for the `ShaderClockKHR` SPIRV capability. For a complete list of capabilities supported by the Slang compiler, check the [capability definition file](https://github.com/shader-slang/slang/blob/master/source/slang/slang-capabilities.capdef).
A capability **requirement** can be a single capability atom, a conjunction of capability atoms, or a disjunction of conjunctions of capability atoms. A function can declare its
capability requirement with the following syntax:
```csharp
[require(spvShaderClockKHR)]
[require(glsl, GL_EXT_shader_realtime_clock)]
[require(hlsl_nvapi)]
uint2 getClock() {...}
```
Each `[require]` attribute declares a conjunction of capability atoms, and all `[require]` attributes form the final requirement of the `getClock()` function as a disjunction of capabilities:
```
(spvShaderClockKHR | glsl + GL_EXT_shader_realtime_clock | hlsl_nvapi)
```
A capability can __imply__ other capabilities. Here `spvShaderClockKHR` is a capability that implies `SPV_KHR_shader_clock`, which represents the SPIRV `SPV_KHR_shader_clock` extension, and the `SPV_KHR_shader_clock` capability implies `spirv_1_0`, which stands for the spirv code generation target.
When evaluating capability requirements, Slang will expand all implications. Therefore the final capability requirement for `getClock` is:
```
spirv_1_0 + SPV_KHR_shader_clock + spvShaderClockKHR
| glsl + _GL_EXT_shader_realtime_clock
| hlsl + hlsl_nvapi
```
Which means the function can be called from locations where the `spvShaderClockKHR` capability is available (when targeting SPIRV), or where the `GL_EXT_shader_realtime_clock` extension is available when targeting GLSL,
or where `nvapi` is available when targeting HLSL.
## Conflicting Capabilities
Certain groups of capabilities are mutually exclusive, meaning only one capability in the group may be present. For example, all stage capabilities are mutually exclusive: a requirement for both `fragment` and `vertex` is impossible to satisfy. Currently, capabilities that model different code generation targets (e.g. `hlsl`, `glsl`) or different shader stages (`vertex`, `fragment`, etc.) are mutually exclusive within
their corresponding groups.
If two capability requirements contain different atoms that are conflicting with each other, these two requirements are considered __incompatible__.
For example, requirement `spvShaderClockKHR + fragment` and requirement `spvShaderClockKHR + vertex` are incompatible, because `fragment` conflicts with `vertex`.
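For instance, the following sketch (function and entry-point names hypothetical) would fail to type-check because the two requirements are incompatible:

```csharp
[require(spvShaderClockKHR, fragment)]
void fragmentOnlyClock() { /* ... */ }

[shader("vertex")]
float4 vertMain() : SV_Position
{
    fragmentOnlyClock(); // Error: the callee's `fragment` atom conflicts with `vertex`.
    return float4(0);
}
```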
## Capabilities Between Parent and Members
The capability requirement of a member is always merged with the requirements declared in its parent(s). If the member declares requirements for additional compilation targets, they are added to the requirement set as a separate disjunction.
For example, given:
```csharp
[require(glsl)]
[require(hlsl)]
struct MyType
{
[require(hlsl, hlsl_nvapi)]
[require(spirv)]
static void method() { ... }
}
```
`MyType.method` will have requirement `glsl | hlsl + hlsl_nvapi | spirv`.
The `[require]` attribute can also be used on module declarations, so that the requirement will
apply to all members within the module. For example:
```csharp
[require(glsl)]
[require(hlsl)]
[require(spirv)]
module myModule;
// myFunc has requirement glsl|hlsl|spirv
public void myFunc()
{
}
```
## Capabilities Between Subtype and Supertype
For inheritance/implementing-interfaces the story is a bit different.
We require that the subtype (`Foo1`) declare a subset of the capabilities of its supertype (`IFoo1`).
For example:
```csharp
[require(sm_4_0)]
interface IFoo1
{
}
[require(sm_6_0)]
struct Foo1 : IFoo1
{
}
```
This is an error since `Foo1`'s capabilities are not a subset of `IFoo1`'s: `Foo1` requires `sm_6_0`, which includes capabilities that `sm_4_0` does not have.
```csharp
[require(sm_6_0)]
interface IFoo2
{
}
[require(sm_4_0)]
interface IFoo1
{
}
[require(sm_4_0)]
struct Foo1 : IFoo1, IFoo2
{
}
```
We do not error here since `IFoo2` and `IFoo1` are supersets of `Foo1`.
Additionally, any supertype to subtype relationship must share the same shader stage and shader target support.
```csharp
// Error, Foo1 is missing `spirv`
[require(hlsl)]
[require(spirv)]
interface IFoo1
{
}
[require(hlsl)]
struct Foo1 : IFoo1
{
}
// Error, IFoo1 is missing `hlsl`
[require(hlsl)]
interface IFoo1
{
}
[require(hlsl)]
[require(spirv)]
struct Foo1 : IFoo1
{
}
```
## Capabilities Between Requirement and Implementation
We require that an interface requirement's capabilities be a superset of its implementation's capabilities (this check applies only when capabilities are explicitly annotated).
```csharp
public interface IAtomicAddable_Pass
{
public static void atomicAdd(RWByteAddressBuffer buf, uint addr, This value);
}
public extension int64_t : IAtomicAddable_Pass
{
public static void atomicAdd(RWByteAddressBuffer buf, uint addr, int64_t value) { buf.InterlockedAddI64(addr, value); }
}
public interface IAtomicAddable_Error
{
[require(glsl, sm_4_0)]
public static void atomicAdd(RWByteAddressBuffer buf, uint addr, This value);
}
public extension uint : IAtomicAddable_Error
{
// Error: implementation has superset of capabilities, sm_6_0 vs. sm_4_0
// Note: sm_6_0 is inferred from `InterlockedAddI64`
public static void atomicAdd(RWByteAddressBuffer buf, uint addr, int64_t value) { buf.InterlockedAddI64(addr, value); }
}
```
The requirement and implementation must also share the same shader stage and shader target support.
```csharp
public interface IAtomicAddable_Error
{
[require(glsl)]
[require(hlsl)]
public static void atomicAdd(RWByteAddressBuffer buf, uint addr, This value);
}
public extension uint : IAtomicAddable_Error
{
[require(glsl)] // Error, missing `hlsl`
public static void atomicAdd(RWByteAddressBuffer buf, uint addr, int64_t value) { buf.InterlockedAddI64(addr, value); }
}
public interface IAtomicAddable_Error
{
[require(glsl)]
public static void atomicAdd(RWByteAddressBuffer buf, uint addr, This value);
}
public extension uint : IAtomicAddable_Error
{
[require(glsl)]
[require(hlsl)] // Error, has additional capability `hlsl`
public static void atomicAdd(RWByteAddressBuffer buf, uint addr, int64_t value) { buf.InterlockedAddI64(addr, value); }
}
```
## Capabilities of Functions
### Inference of Capability Requirements
By default, Slang will infer the capability requirements of a function given its definition, as long as the function has `internal` or `private` visibility. For example, given:
```csharp
void myFunc()
{
if (getClock().x % 1000 == 0)
discard;
}
```
Slang will automatically deduce that `myFunc` has capability
```
spirv_1_0 + SPV_KHR_shader_clock + spvShaderClockKHR + fragment
| glsl + _GL_EXT_shader_realtime_clock + fragment
| hlsl + hlsl_nvapi + fragment
```
The `fragment` atom is added to each conjunction because the `discard` statement requires the `fragment` capability.
### Inference on target_switch
A `__target_switch` statement will introduce disjunctions in its inferred capability requirement. For example:
```csharp
void myFunc()
{
__target_switch
{
case spirv: ...;
case hlsl: ...;
}
}
```
The capability requirement of `myFunc` is `(spirv | hlsl)`, meaning that the function can be called from a context where either `spirv` or `hlsl` capability
is available.
### Capability Incompatibilities
The function declaration must be a superset of the capabilities the function body uses **for any shader stage/target the function declaration implicitly/explicitly requires**.
```csharp
[require(sm_5_0)]
public void requires_sm_5_0()
{
}
[require(sm_4_0)]
public void logic_sm_5_0_error() // Error, missing `sm_5_0` support
{
requires_sm_5_0();
}
public void logic_sm_5_0__pass() // Pass, no requirements
{
requires_sm_5_0();
}
[require(hlsl, vertex)]
public void logic_vertex()
{
}
[require(hlsl, fragment)]
public void logic_fragment()
{
}
[require(hlsl, vertex, fragment)]
public void logic_stage_pass_1() // Pass, `vertex` and `fragment` supported
{
__stage_switch
{
case vertex:
logic_vertex();
case fragment:
logic_fragment();
}
}
[require(hlsl, vertex, fragment, mesh, hull, domain)]
public void logic_many_stages()
{
}
[require(hlsl, vertex, fragment)]
public void logic_stage_pass_2() // Pass, function only requires that the body implement the stages `vertex` & `fragment`; the rest are irrelevant
{
logic_many_stages();
}
[require(hlsl, any_hit)]
public void logic_stage_fail_1() // Error, function requires `any_hit`, body does not support `any_hit`
{
logic_many_stages();
}
```
## Capability Aliases
To make it easy to specify capabilities on different platforms, Slang also defines many aliases that can be used in `[require]` attributes.
For example, Slang declares in `slang-capabilities.capdef`:
```
alias sm_6_6 = _sm_6_6
| glsl_spirv_1_5 + sm_6_5
+ GL_EXT_shader_atomic_int64 + atomicfloat2
| spirv_1_5 + sm_6_5
+ GL_EXT_shader_atomic_int64 + atomicfloat2
+ SPV_EXT_descriptor_indexing
| cuda
| cpp;
```
So user code can write `[require(sm_6_6)]` to mean that the function requires shader model 6.6 on D3D or equivalent set of GLSL/SPIRV extensions when targeting GLSL or SPIRV.
Note that in the above definition, `GL_EXT_shader_atomic_int64` is also an alias that is defined as:
```
alias GL_EXT_shader_atomic_int64 = _GL_EXT_shader_atomic_int64 | spvInt64Atomics;
```
Here `_GL_EXT_shader_atomic_int64` is the atom that represents the actual `GL_EXT_shader_atomic_int64` GLSL extension.
The `GL_EXT_shader_atomic_int64` alias is defined as a disjunction of `_GL_EXT_shader_atomic_int64` and the `spvInt64Atomics` SPIRV capability so that
it can be used in both GLSL and SPIRV contexts.
When aliases are used in a `[require]` attribute, the compiler will expand the alias to evaluate the capability set, and remove all incompatible conjunctions.
For example, `[require(hlsl, sm_6_6)]` will be evaluated to `(hlsl+_sm_6_6)` because all other conjunctions in `sm_6_6` are incompatible with `hlsl`.
## Validation of Capability Requirements
Slang requires all public methods and interface methods to have explicit capability requirement declarations. Omitting a capability declaration on a public method means that the method does not require any
specific capability. Functions with explicit requirement declarations are verified by the compiler to ensure that they do not use any capability beyond what is declared.
Slang recommends but does not require explicit declaration of capability requirements for entrypoints. If explicit capability requirements are declared on an entrypoint, they will be used to validate the entrypoint the same way as other public methods, providing assurance that the function will work on all intended targets. If an entrypoint does not define explicit capability requirements, Slang will infer the requirements, and only issue a compiler error when the inferred capability is incompatible with the current code generation target.
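A sketch of how this validation plays out, reusing the hypothetical `getClock` function from the earlier example:

```csharp
[require(spvShaderClockKHR)]
[require(glsl, GL_EXT_shader_realtime_clock)]
[require(hlsl_nvapi)]
public uint2 declaredClock() { return getClock(); } // OK: the body's capability use is declared.

public uint2 undeclaredClock() { return getClock(); } // Error: a `public` function uses
                                                      // capabilities it does not declare.
```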

<html>
<head>
<meta http-equiv="refresh" content="0; url=https://shader-slang.com/slang/user-guide/autodiff" />
</head>
<body>
<p>This page has been relocated. <a href="https://shader-slang.com/slang/user-guide/autodiff">Click here for the new page.</a></p>
</body>
</html>

---
layout: user-guide
permalink: /user-guide/autodiff
---
# Automatic Differentiation
To support differentiable graphics systems such as Gaussian splatters, neural radiance fields, differentiable path tracers, and more,
Slang provides first class support for differentiable programming.
An overview:
- Slang supports the `fwd_diff` and `bwd_diff` operators that can generate the forward and backward-mode derivative propagation functions for any valid Slang function annotated with the `[Differentiable]` attribute.
- The `DifferentialPair<T>` built-in generic type is used to pass derivatives associated with each function input.
- The `IDifferentiable`, and the experimental `IDifferentiablePtrType`, interfaces denote differentiable value and pointer types respectively, and allow finer control over how types behave under differentiation.
- Further, Slang allows for user-defined derivative functions through the `[ForwardDerivative(custom_fn)]` and `[BackwardDerivative(custom_fn)]` attributes.
- All Slang features, such as control-flow, generics, interfaces, extensions, and more are compatible with automatic differentiation, though the bottom of this chapter documents some sharp edges & known issues.
## Auto-diff operations `fwd_diff` and `bwd_diff`
In Slang, `fwd_diff` and `bwd_diff` are higher-order operators used to transform Slang functions into their forward or backward derivative functions. To better understand what these methods do, here is a small refresher on differential calculus:
### Mathematical overview: Jacobian and its vector products
Forward and backward derivative methods are two different ways of computing a dot product with the Jacobian of a given function.
Parts of this overview are based on JAX's excellent auto-diff cookbook [here](https://jax.readthedocs.io/en/latest/notebooks/autodiff_cookbook.html#how-it-s-made-two-foundational-autodiff-functions). The relevant [wikipedia article](https://en.wikipedia.org/wiki/Automatic_differentiation) is also a great resource for understanding auto-diff.
The [Jacobian](https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant) (also called the total derivative) of a function $$\mathbf{f}(\mathbf{x})$$ is represented by $$D\mathbf{f}(\mathbf{x})$$.
For a general function with multiple scalar inputs and multiple scalar outputs, the Jacobian is a _matrix_ where $$D\mathbf{f}_{ij}$$ represents the [partial derivative](https://en.wikipedia.org/wiki/Partial_derivative) of the $$i^{th}$$ output element w.r.t the $$j^{th}$$ input element $$\frac{\partial f_i}{\partial x_j}$$
As an example, consider a polynomial function
$$ f(x, y) = x^3 + x^2 - y $$
Here, $$f$$ has 1 output and 2 inputs. $$Df$$ is therefore the row matrix:
$$ Df(x, y) = [\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}] = [3x^2 + 2x, -1] $$
Another, more complex example with a function that has multiple outputs (for clarity, denoted by $$f_1$$, $$f_2$$, etc..)
$$ \mathbf{f}(x, y) = \begin{bmatrix} f_0(x, y) & f_1(x, y) & f_2(x, y) \end{bmatrix} = \begin{bmatrix} x^3 & y^2x & y^3 \end{bmatrix} $$
Here, $$D\mathbf{f}$$ is a 3x2 matrix with each element containing a partial derivative:
$$ D\mathbf{f}(x, y) = \begin{bmatrix}
\partial f_0 / \partial x & \partial f_0 / \partial y \\
\partial f_1 / \partial x & \partial f_1 / \partial y \\
\partial f_2 / \partial x & \partial f_2 / \partial y
\end{bmatrix} =
\begin{bmatrix}
3x^2 & 0 \\
y^2 & 2yx \\
0 & 3y^2
\end{bmatrix} $$
Computing full Jacobians is often unnecessary and expensive. Instead, auto-diff offers ways to compute _products_ of the Jacobian with a vector, which is a much faster operation.
There are two basic ways to compute this product:
1. the Jacobian-vector product $$ \langle D\mathbf{f}(\mathbf{x}), \mathbf{v} \rangle $$, also called forward-mode auto-diff, which can be computed using the `fwd_diff` operator in Slang, and
2. the vector-Jacobian product $$ \langle \mathbf{v}^T, D\mathbf{f}(\mathbf{x}) \rangle $$, also called reverse-mode auto-diff, which can be computed using the `bwd_diff` operator in Slang. From a linear algebra perspective, this is the transpose of the forward-mode operator.
#### Propagating derivatives with forward-mode auto-diff
The products described above allow the _propagation_ of derivatives forward and backward through the function $$f$$
The forward-mode derivative (Jacobian-vector product) can convert a derivative of the inputs to a derivative of the outputs.
For example, let's say inputs $$\mathbf{x}$$ depend on some scalar $$\theta$$, and $$\frac{\partial \mathbf{x}}{\partial \theta}$$ is a vector of partial derivatives describing that dependency.
Invoking forward-mode auto-diff with $$\mathbf{v} = \frac{\partial \mathbf{x}}{\partial \theta}$$ converts this into a derivative of the outputs w.r.t the same scalar $$\theta$$.
This can be verified by expanding the Jacobian and applying the [chain rule](https://en.wikipedia.org/wiki/Chain_rule) of derivatives:
$$\langle D\mathbf{f}(\mathbf{x}), \frac{\partial \mathbf{x}}{\partial \theta} \rangle = \langle \begin{bmatrix} \frac{\partial f_0}{\partial x_0} & \frac{\partial f_0}{\partial x_1} & \cdots \\ \frac{\partial f_1}{\partial x_0} & \frac{\partial f_1}{\partial x_1} & \cdots \\ \cdots & \cdots & \cdots \end{bmatrix}, \begin{bmatrix} \frac{\partial x_0}{\partial \theta} \\ \frac{\partial x_1}{\partial \theta} \\ \cdots \end{bmatrix} \rangle = \begin{bmatrix} \frac{\partial f_0}{\partial \theta} \\ \frac{\partial f_1}{\partial \theta} \\ \cdots \end{bmatrix} = \frac{\partial \mathbf{f}}{\partial \theta}$$
#### Propagating derivatives with reverse-mode auto-diff
The reverse-mode derivative (vector-Jacobian product) can convert a derivative w.r.t outputs into a derivative w.r.t inputs.
For example, let's say we have some scalar $$\mathcal{L}$$ that depends on the outputs $$\mathbf{f}$$, and $$\frac{\partial \mathcal{L}}{\partial \mathbf{f}}$$ is a vector of partial derivatives describing that dependency.
Invoking reverse-mode auto-diff with $$\mathbf{v} = \frac{\partial \mathcal{L}}{\partial \mathbf{f}}$$ converts this into a derivative of the same scalar $$\mathcal{L}$$ w.r.t the inputs $$\mathbf{x}$$.
To provide more intuition for this, we can expand the Jacobian in the same way we did above:
$$\langle \frac{\partial \mathcal{L}}{\partial \mathbf{f}}^T, D\mathbf{f}(\mathbf{x}) \rangle = \langle \begin{bmatrix}\frac{\partial \mathcal{L}}{\partial f_0} & \frac{\partial \mathcal{L}}{\partial f_1} & \cdots \end{bmatrix}, \begin{bmatrix} \frac{\partial f_0}{\partial x_0} & \frac{\partial f_0}{\partial x_1} & \cdots \\ \frac{\partial f_1}{\partial x_0} & \frac{\partial f_1}{\partial x_1} & \cdots \\ \cdots & \cdots & \cdots \end{bmatrix} \rangle = \begin{bmatrix} \frac{\partial \mathcal{L}}{\partial x_0} & \frac{\partial \mathcal{L}}{\partial x_1} & \cdots \end{bmatrix} = \frac{\partial \mathcal{L}}{\partial \mathbf{x}}^T$$
This mode is the most popular, since machine learning systems often construct their differentiable pipeline with multiple inputs (which can number in the millions or billions), and a single scalar output often referred to as the 'loss' denoted by $$\mathcal{L}$$. The desired derivative can be constructed with a single reverse-mode invocation.
### Invoking auto-diff in Slang
With the mathematical foundations established, we can describe concretely how to compute derivatives using Slang.
In Slang derivatives are computed using `fwd_diff`/`bwd_diff` which each correspond to Jacobian-vector and vector-Jacobian products.
For forward-diff, to pass the vector $$\mathbf{v}$$ and receive the outputs, we use the `DifferentialPair<T>` type. We use pairs of inputs because every input element $$x_i$$ has a corresponding element $$v_i$$ in the vector, and each original output element has a corresponding output element in the product.
Example of `fwd_diff`:
```csharp
[Differentiable] // Auto-diff requires that functions are marked differentiable
float2 foo(float a, float b)
{
return float2(a * b * b, a * a);
}
void main()
{
DifferentialPair<float> dp_a = diffPair(
1.0, // input 'a'
1.0 // element of vector 'v' for the Jacobian-vector product (for 'a')
);
DifferentialPair<float> dp_b = diffPair(2.4, 0.0);
// fwd_diff to compute output and d_output w.r.t 'a'.
// Our output is also a differential pair.
//
DifferentialPair<float2> dp_output = fwd_diff(foo)(dp_a, dp_b);
// Extract output's primal part, which is just the standard output when foo is called normally.
// Can also use `.getPrimal()`
//
float2 output_p = dp_output.p;
// Extract output's derivative part. Can also use `.getDifferential()`
float2 output_d = dp_output.d;
printf("foo(1.0, 2.4) = (%f %f)\n", output_p.x, output_p.y);
printf("d(foo)/d(a) at (1.0, 2.4) = (%f, %f)\n", output_d.x, output_d.y);
}
```
Note that all the inputs and outputs to our function become 'paired'. This only applies to differentiable types, such as `float`, `float2`, etc. See the section on differentiable types for more info.
`diffPair<T>(primal_val, diff_val)` is a built-in utility function that constructs the pair from the primal and differential values.
Additionally, invoking forward-mode also computes the regular (or 'primal') output value (can be obtained from `output.getPrimal()` or `output.p`). The same is _not_ true for reverse-mode.
For reverse-mode, the example proceeds in a similar way, and we still use `DifferentialPair<T>` type. However, note that each input gets a corresponding _output_ and each output gets a corresponding _input_. Thus, all inputs become `inout` differential pairs, to allow the function to write into the derivative part (the primal part is still accepted as an input in the same pair data-structure).
The one extra rule is that the derivative corresponding to the return value of the function is accepted as the last argument (an extra input). This value does not need to be a pair.
Example:
```csharp
[Differentiable] // Auto-diff requires that functions are marked differentiable
float2 foo(float a, float b)
{
return float2(a * b * b, a * a);
}
void main()
{
DifferentialPair<float> dp_a = diffPair(
1.0 // input 'a'
); // Calling diffPair without a derivative part initializes to 0.
DifferentialPair<float> dp_b = diffPair(2.4);
// Derivatives of scalar L w.r.t output.
float2 dL_doutput = float2(1.0, 0.0);
// bwd_diff to compute dL_da and dL_db
// The derivative of the output is provided as an additional _input_ to the call
// Derivatives w.r.t inputs are written into dp_a.d and dp_b.d
//
bwd_diff(foo)(dp_a, dp_b, dL_doutput);
// Extract the derivatives of L w.r.t input
float dL_da = dp_a.d;
float dL_db = dp_b.d;
printf("If dL/dOutput = (1.0, 0.0), then (dL/da, dL/db) at (1.0, 2.4) = (%f, %f)", dL_da, dL_db);
}
```
## Differentiable Type System
Slang will only generate differentiation code for values that have a *differentiable* type.
Differentiable types are defined through conformance to one of two built-in interfaces:
1. `IDifferentiable`: For value types (e.g. `float`, structs of value types, etc..)
2. `IDifferentiablePtrType`: For buffer, pointer & reference types that represent locations rather than values.
### Differentiable Value Types
All basic types (`float`, `int`, `double`, etc..) and all aggregate types (i.e. `struct`) that use any combination of these are considered value types in Slang.
Slang uses the `IDifferentiable` interface to define differentiable types. Basic types that describe a continuous value (`float`, `double` and `half`) and their vector/matrix versions (`float3`, `half2x2`, etc..) are defined as differentiable by the standard library. For all basic types, the type used for the differential (obtainable as `T.Differential`) is the same as the primal type.
#### Builtin Differentiable Value Types
The following built-in types are differentiable:
- Scalars: `float`, `double` and `half`.
- Vector/Matrix: `vector` and `matrix` of `float`, `double` and `half` types.
- Arrays: `T[n]` is differentiable if `T` is differentiable.
- Tuples: `Tuple<each T>` is differentiable if `T` is differentiable.
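For instance, a function over an array of differentiable elements is itself differentiable (a sketch; the function name is hypothetical):

```csharp
[Differentiable]
float sumOfSquares(float[3] v)
{
    float s = 0.0;
    for (int i = 0; i < 3; i++)
        s += v[i] * v[i]; // Each element contributes 2*v[i] to the derivative.
    return s;
}
```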
#### User-defined Differentiable Value Types
However, it is easy to define your own differentiable types.
Typically, all you need is to implement the `IDifferentiable` interface.
```csharp
struct MyType : IDifferentiable
{
float x;
float y;
};
```
The main requirement of a type implementing `IDifferentiable` is the `Differential` associated type that the compiler uses to carry the corresponding derivative.
In most cases the `Differential` of a type can be itself, though it can be different if necessary.
You can access the differential of any differentiable type through `Type.Differential`
Example:
```csharp
MyType obj;
obj.x = 1.f;
MyType.Differential d_obj;
// Differentiable fields will have a corresponding field in the diff type
d_obj.x = 1.f;
```
Slang can automatically derive the `Differential` type in the majority of cases.
For instance, for `MyType`, Slang can infer the differential trivially:
```csharp
struct MyType : IDifferentiable
{
// Automatically inserted by Slang from the fact that
// MyType has 2 floats which are both differentiable
//
typealias Differential = MyType;
// ...
}
```
For more complex types that aren't fully differentiable, a new type is synthesized automatically:
```csharp
struct MyPartialDiffType : IDifferentiable
{
// Automatically inserted by Slang based on which fields are differentiable.
    typealias Differential = syn_MyPartialDiffType_Differential;
float x;
uint y;
};
// Synthesized
struct syn_MyPartialDiffType_Differential
{
// Only one field since 'y' does not conform to IDifferentiable
float x;
};
```
You can make existing types differentiable through Slang's extension mechanism.
For instance, `extension MyType : IDifferentiable { }` will make `MyType` differentiable retroactively.
See the `IDifferentiable` [reference documentation](https://shader-slang.org/stdlib-reference/interfaces/idifferentiable-01/index) for more information on how to override the default behavior.
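For instance, a type from an existing module can be retrofitted like this (a sketch; `LegacyVec` is a hypothetical type):

```csharp
// A pre-existing type, defined without differentiability in mind.
struct LegacyVec
{
    float x;
    float y;
};

// Retroactively conform it to IDifferentiable; Slang synthesizes the
// Differential type from the differentiable fields as usual.
extension LegacyVec : IDifferentiable {}
```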
#### DifferentialPair<T>: Pairs of differentiable value types
The `DifferentialPair<T>` type is used to pass derivatives to a derivative call by representing a pair of values of type `T` and `T.Differential`. Note that `T` must conform to `IDifferentiable`.
`DifferentialPair<T>` can either be created via constructor calls or the `diffPair` utility method.
Example:
```csharp
MyType obj = {1.f, 2.f};
MyType.Differential d_obj = {0.4f, 3.f};
// The differential part of a differentiable-pair is of the diff type.
DifferentialPair<MyType> dp_obj = diffPair(obj, d_obj);
// Use .p to extract the primal part
MyType new_p_obj = dp_obj.p;
// Use .d to extract the differential part
MyType.Differential new_d_obj = dp_obj.d;
```
### Differentiable Ptr types
Pointer types are any type that represents a location or reference to a value rather than the value itself.
Examples include resource types (`RWStructuredBuffer`, `Texture2D`), pointer types (`Ptr<float>`) and references.
The `IDifferentiablePtrType` interface can be used to denote types that need to transform into pairs during auto-diff. However, unlike
an `IDifferentiable` type, whose derivative portion is an _output_ under `bwd_diff`, the derivative part of an `IDifferentiablePtrType` remains an input. This is because only the value is returned as an output; the location where the derivative needs to be written is still effectively an input to the derivative methods.
> #### Note ####
> Support for `IDifferentiablePtrType` is still experimental. There are no built-in types conforming to this interface, though we plan to add stdlib support in the near future.
`IDifferentiablePtrType` only requires a `Differential` associated type to be specified.
#### DifferentialPtrPair<T>: Pairs of differentiable ptr types
For types conforming to `IDifferentiablePtrType`, the corresponding pair to use for passing the derivative counterpart is `DifferentialPtrPair<T>`, which represents a pair of `T` and `T.Differential`. Objects of this type can be created using a constructor.
#### Example of defining and using an `IDifferentiablePtrType` object.
Here is an example of creating a differentiable buffer pointer type and using it within a differentiable function.
You can find an interactive sample on the Slang playground [here](https://shader-slang.org/slang-playground/?target=WGSL&code=eJy1VF1v2kAQfPevWEWKYhfkmFdMkBrRSpHKhyBSpdIIHfgcTjFn9z4gEeK_d-_ONsZp1L6UF8MxOzszuz62K3KhoMjI27PINU9iTyqhNwrGb_c6TamY5YwrKqAPDyNmDihXjKwzOlPi8a2g3tED_NzewuOWQnKGZFAQpGYSCM_VFikYl4rwDYU8bdOHlkQhH8kYkTBq8ty10bFn4fPvC6tVC5q4_wdplhM1hLVOYwvRiMd2qaQq9k5Yhzq_Mf4CBDZaqnwHCRVsTxTbU295TzYvByKSUX3mI1-yWh-S4Mmz3GAO_HY4Rdd1Yjyhr0EZiaCojEMRopplEToV0HGgJ5Rj1UxyRUFtkRkzgnWpoCELJHvmxJg0WUrFsgwThRvGb9pxM8xxn7MEKtV-M0cc2Awhg5b44aX6LjifyVSryknbrmm7KpTAyRRh4pKuzqzb-kfLNHTuLLE1v7zcpypgqXfTdPFLE0HlIMPiCe4e9h2-T73SVxbmEgVFYVqux3JMXh8QJ_0JTs_icgG-s2qQMT4GMMFHpxNYgKM7U-5JtjJQO3SMiQVxjTDt0I6DfHJP9-_Ja84fcc7u-SXr9-efJyO_F6Guj5eY8UIrrL2s_PFlPl38rdRujym121AItDwmjPsfDdTN8ug6diE6OSO20L_6yoRU0IuMR01lH37yqzKIPybaixqRNniukz5cp1iMQXbLTJUwqQblyErgQu_MHSHdEpivaVtCyXOxLL1oaAhrtndra1Ip9_boInJeqxtsRiTeVvZFMk0LV4dH0g3D3VL4Xq3Mgvvt5oFfO_6X986Zr0UF3bq6F0Zlvs1UzreSEYc9dabgEIrQXR3_ZT61unJKp98JDfhi).
```csharp
struct MyBufferPointer : IDifferentiablePtrType
{
// The differential part is another instance of MyBufferPointer.
typealias Differential = MyBufferPointer;
RWStructuredBuffer<float> buf;
uint offset;
};
// Link a custom derivative
[BackwardDerivative(load_bwd)]
float load(MyBufferPointer p, uint index)
{
return p.buf[p.offset + index];
}
// Note that the backward derivative signature is still an 'in' differential pair.
void load_bwd(DifferentialPtrPair<MyBufferPointer> p, uint index, float dOut)
{
MyBufferPointer diff_ptr = p.d;
diff_ptr.buf[diff_ptr.offset + index] += dOut;
}
[Differentiable]
float sumOfSquares<let N : int>(MyBufferPointer p)
{
float sos = 0.f;
[MaxIters(N)]
for (uint i = 0; i < N; i++)
{
float val_i = load(p, i);
sos += val_i * val_i;
}
return sos;
}
RWStructuredBuffer<float> inputs;
RWStructuredBuffer<float> derivs;
void main()
{
MyBufferPointer ptr = {inputs, 0};
print("Sum of squares of first 10 values: ", sumOfSquares<10>(ptr));
MyBufferPointer deriv_ptr = {derivs, 0};
// Pass a pair of pointers as input.
bwd_diff(sumOfSquares<10>)(
DifferentialPtrPair<MyBufferPointer>(ptr, deriv_ptr),
1.0);
print("Derivative of result w.r.t the 10 values: \n");
for (uint i = 0; i < 10; i++)
print("%d: %f\n", i, load(deriv_ptr, i));
}
```
## User-Defined Derivative Functions
As an alternative to compiler-generated derivatives, you can choose to provide an implementation for the derivative, which the compiler will use instead of attempting to generate one.
This can be performed on a per-function basis by using the decorators `[ForwardDerivative(fwd_deriv_func)]` and `[BackwardDerivative(bwd_deriv_func)]` to reference the derivative from the primal function.
For instance, it often makes little sense to differentiate the body of a `sin(x)` implementation, when we know that the derivative is `cos(x) * dx`. In Slang, this can be represented in the following way:
```csharp
DifferentialPair<float> sin_fwd(DifferentialPair<float> dpx)
{
float x = dpx.p;
float dx = dpx.d;
return DifferentialPair<float>(dpx.p, cos(x) * dx);
}
// sin() is now considered differentiable (at least for forward-mode) since it provides
// a derivative implementation.
//
[ForwardDerivative(sin_fwd)]
float sin(float x)
{
// Calc sin(X) using Taylor series..
}
// Any use of sin() in a `[Differentiable]` function will automatically use the sin_fwd implementation when differentiated.
```
A similar example for a backward derivative.
```csharp
void sin_bwd(inout DifferentialPair<float> dpx, float dresult)
{
float x = dpx.p;
// Write-back the derivative to each input (the primal part must be copied over as-is)
dpx = DifferentialPair<float>(x, cos(x) * dresult);
}
[BackwardDerivative(sin_bwd)]
float sin(float x)
{
// Calc sin(X) using Taylor series..
}
```
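With that association in place, a backward-mode call over `sin` propagates the output derivative back into the input pair (a sketch; `useSin` is a hypothetical function):

```csharp
[Differentiable]
float useSin(float x)
{
    return sin(x); // resolves to sin_bwd during backward differentiation
}

void demo()
{
    // Seed the pair with the primal value; the initial derivative is ignored.
    var dpx = diffPair(0.5, 0.0);
    // Propagate an output derivative of 1.0; afterwards dpx.d holds cos(0.5).
    bwd_diff(useSin)(dpx, 1.0);
}
```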
> Note that the signature of the provided forward or backward derivative function must match the expected signature from invoking `fwd_diff(fn)`/`bwd_diff(fn)`
> For a full list of signature rules, see the reference section for the [auto-diff operators](#fwd_difff--slang_function---slang_function).
### Back-referencing User Derivative Attributes.
Sometimes, the original function's definition might be inaccessible, so it can be tricky to add an attribute to create the association.
For such cases, Slang provides the `[ForwardDerivativeOf(primal_fn)]` and `[BackwardDerivativeOf(primal_fn)]` attributes that can be used
on the derivative function and contain a reference to the function for which they are providing a derivative implementation.
As long as the derivative function is in scope, the primal function will be considered differentiable.
Example:
```csharp
// Module A
float sin(float x) { /* ... */ }
// Module B
import A;
[BackwardDerivativeOf(sin)] // Add a derivative implementation for sin() in module A.
void sin_bwd(inout DifferentialPair<float> dpx, float dresult) { /* ... */ }
```
User-defined derivatives also work for generic functions, member functions, accessors, and more.
See the reference section for the [`[ForwardDerivative(fn)]`](https://shader-slang.org/stdlib-reference/attributes/forwardderivative-07.html) and [`[BackwardDerivative(fn)]`](https://shader-slang.org/stdlib-reference/attributes/backwardderivative-08) attributes for more.
## Using Auto-diff with Generics
Automatic differentiation works seamlessly with generically-defined types and methods.
For generic methods, differentiability of a type is defined either through an explicit `IDifferentiable` constraint or any other
interface that extends `IDifferentiable`.
Example for generic methods:
```csharp
[Differentiable]
T calcFoo<T : IDifferentiable>(T x) { /* ... */ }
[Differentiable]
T calcBar<T : __BuiltinFloatingPointType>(T x) { /* ... */ }
[Differentiable]
void main()
{
DifferentialPair<float4> dpa = /* ... */;
// Can call with any type that is IDifferentiable. Generic parameters
// are inferred like any other call.
//
bwd_diff(calcFoo)(dpa, float4(1.f));
// But you can also be explicit with < >
bwd_diff(calcFoo<float4>)(dpa, float4(1.f));
// x is differentiable for calcBar because
// __BuiltinFloatingPointType : IDifferentiable
//
DifferentialPair<double> dpb = /* .. */;
bwd_diff(calcBar)(dpb, 1.0);
}
```
You can implement `IDifferentiable` on a generic type. Automatic synthesis still applies and will use
generic constraints to resolve whether a field is differentiable or not.
```csharp
struct Foo<T : IDifferentiable, U> : IDifferentiable
{
T t;
U u;
};
// The synthesized Foo<T, U>.Differential will contain a field for
// 't' but not 'u'.
//
```
## Using Auto-diff with Interface Requirements and Interface Types
For interface requirements, using the `[Differentiable]` attribute enforces that any implementation of that method must also be
differentiable. You can, of course, provide a manual derivative implementation to satisfy the requirement.
The following is a sample snippet. You can run the full sample on the playground [here](https://shader-slang.org/slang-playground/?target=HLSL&code=eJyVVMtu2zAQvOsrFgEKy4Wq1C7QQ1330AYBcujjnhbBWiRjphQpUJQjI8i_d0mRquLYASLYtLwc7sySs5R1Y6yDRuH-1ppOs1UmteNWYMXh6tKY7CEDeq4vpBDccu0kbhT_E4JCGXRQoary4bWfr7LHLGud7SoHtPqqbhR8miYagJTeGbvKQuj8HDyO15QdnTQadhIBO2dq-lsB-0_tZ8tXCQrxgdo_lrvOaujhLXwoBY1RCQTE43P1y5fk-8gLJdSoO1RqD401O8mkvgXGrdwRYseh5m5rWBvLuTT2Hi27GOdzX8aNuGfzobbrr1j9PQbZjJBXlb-clh-rDz-TjVW_UNrPIdcXSHryU4BTbP78PC7vy2YgLiJ7X7JRw_yJiJ2RDFJ5udSmcyeF9UWsnFnedsodyuhhfWqtl1SkdQebMgoiSd4BcMvdz81d3lGDgNnc3Ug2j66QAvIhAus1LA4FpEaQfljDw6L8KB5Xh9vkZxOlH7lq-fFEyzET6X05EWl_1inRJqZuOseTUwo4UtfxZv1mOToOSEy6dajppjACuHRbbpPEBZjxfRkWhi0U9F2njYxcsfdS9ou9xjp0fdugq7bgDGBDHdRY6Wln3hWzMsLTqh-GptzWqyUK6VquBMgWtNHv2JP6CxLORrYtesz0ilHA0GEBG3LcrJ8F9GwwyCytupdKkTut3U8aui3bqagdWoi-WntRZehLf0Nmk7MaEOHWDJanIrX7jlLn6QxOuZ41SInH3lqW76NhqWFufDiPJzzPCVrAgj6lSPSBJz97I37rs8LnKpm_u_8BU5nW2Q).
```csharp
interface IFoo
{
[Differentiable]
float calc(float x);
}
struct FooImpl : IFoo
{
// Implementation via automatic differentiation.
[Differentiable]
float calc(float x)
{ /* ... */ }
}
struct FooImpl2 : IFoo
{
// Implementation via manually providing derivative methods.
[ForwardDerivative(calc_fwd)]
[BackwardDerivative(calc_bwd)]
float calc(float x)
{ /* ... */ }
DifferentialPair<float> calc_fwd(DifferentialPair<float> x)
{ /* ... */ }
void calc_bwd(inout DifferentialPair<float> x, float dresult)
{ /* ... */ }
}
[Differentiable]
float compute(float x, uint obj_id)
{
    // Create an instance of either FooImpl or FooImpl2
IFoo foo = createDynamicObject<IFoo>(obj_id);
// Dynamic dispatch to appropriate 'calc'.
//
// Note that foo itself is non-differentiable, and
// has no differential data, but 'x' and 'result'
    // will carry derivatives.
//
var result = foo.calc(x);
return result;
}
```
### Differentiable Interface (and Associated) Types
> Note: This is an advanced use-case and support is currently experimental.
You can have an interface or an interface associated type extend `IDifferentiable` and use that in differentiable interface requirement functions. This is often important in large code-bases with modular components that are all differentiable (one example is the material system in large production renderers).
Here is a snippet of how to make an interface and associated type (and by consequence all its implementations) differentiable.
For a full working sample, check out the Slang playground [here](https://shader-slang.org/slang-playground/?target=WGSL&code=eJylVVFvmzAQfudXnCpVgoXRhK4vpdnLuodIq9pqe9umygGzuXPANaYjqvLfd8bgmIR0a-eHcHfcnT9_38WwlSilAsHJ-ocs6yJLPI8VisqcpBQWHwhPa04UKws4h8Uly3MqaaEYWXLqPXmA6-sw-r0N5rwkCogQfO0buw4SbzPs_kWSospLuTrYm1RVmTKiaKbWgsIOHMfFxgexuFUr8ov6HZJKyTpV8IkVlMibkq-LcsUI3-ncIekOlDjO0jgvIaEJ4AkkVbUsgMAbaGCCbWDjbSyc25pkEndOX4_IOOlznDwPzbfYgs5IhyANZwstpSi3glhBO4haNMIZqQYazPcod2ELNRu68cu0bcNme7321CUpAnjy1U9WRdgc3kJnzoLQmpvENujVSk1oVKpXEzEitjNUhwnZuiuW3Qn1XxSNTdxDy5JN0SvGUdCETeBda82QOm0ZBOEgdxvHpDObjuXDPAxbf5_zB5fzvbM514c-1vXy3q_xcgGWhR03j4TPHDsOOjVYDj7LYD6H6fi4DPXkTHNhmuk2-0A564HqX8oreojiYeeHnc4h-JplHUCaW8hwAqdRPsINe46b7gZA5Q0nev56JoTlRMS91fTUOKSWy3tE11NrOuhaEQfduA0-D3pgsCTqb1gHaxqZi6bRF6_nPZZIvpCI64qwwu-3boFeXV9-_Iyd-hkvJXSqYnCa4OPC5KA5meyqZw7TfmHEDU4clkRxcuB1jK9P3dctJP9oKNFVmVE4zsBv5tPoLDiH4_xbcRQCCw29-LT7bU0kVmf3ROnlSMRvCJMXLZr3kIkGgWT4Vkd9XZb8Q9HCOaQttkhe1CIeaxE7LZa_szud4OsTB_rIzv6uE2unCWEWTd2jd8RmvqRVzVVwkuEoWCaxIsqCPRncbH05O_l277_XxWN1sa3Df88fIn-viQ)
```csharp
interface IFoo : IDifferentiable
{
associatedtype BaseType : IDifferentiable;
[Differentiable]
BaseType foo(BaseType x);
};
[Differentiable]
float calc(float x)
{
// Note that since IFoo is differentiable,
// any data in the IFoo implementation is differentiable
// and will carry derivatives.
//
IFoo obj = makeObj(/* ... */);
return obj.foo(x);
}
```
Under the hood, Slang will automatically construct an anonymous abstract type to represent the differentials.
However, on targets that don't support true dynamic dispatch, these are lowered into tagged unions.
While we are working to improve the implementation, this union can currently include all active differential
types, rather than just the relevant ones. This can lead to increased memory use.
## Primal Substitute Functions
Sometimes it is desirable to replace a function with another when generating derivative code.
Most often, this is because many shader operations simply do not have a function body, such as hardware intrinsics for
texture sampling. In such cases, Slang provides a `[PrimalSubstitute(fn)]` attribute that can be used to provide
a reference implementation that Slang can differentiate to generate the derivative function.
The following is a small snippet with bilinear texture sampling. For a full example application that uses this concept, see the [texture differentiation sample](https://github.com/shader-slang/slang/tree/master/examples/autodiff-texture) in the Slang repository.
```csharp
[PrimalSubstitute(sampleTextureBilinear_reference)]
float4 sampleTextureBilinear(Texture2D<float4> x, float2 loc)
{
// HW-accelerated sampling intrinsics.
// Slang does not have access to body, so cannot differentiate.
//
    return x.Sample(/*...*/);
}
// Since the substitute is differentiable, so is `sampleTextureBilinear`.
[Differentiable]
float4 sampleTextureBilinear_reference(Texture2D<float4> x, float2 loc)
{
// Reference SW interpolation, that is differentiable.
}
[Differentiable]
float4 computePixel(Texture2D<float4> x, float a, float b)
{
    // Slang will use the HW-accelerated sampleTextureBilinear for an ordinary
    // function call, but differentiate the SW reference interpolation during backprop.
    //
    float4 sample1 = sampleTextureBilinear(x, float2(a, b));
    return sample1;
}
```
Similar to `[ForwardDerivativeOf(fn)]` and `[BackwardDerivativeOf(fn)]` attributes, Slang provides a `[PrimalSubstituteOf(fn)]` attribute that can be used on the substitute function to reference the primal one.
## Working with Mixed Differentiable and Non-Differentiable Code
Introducing differentiability to an existing system often involves dealing with code that mixes differentiable and non-differentiable logic.
Slang provides type checking and code analysis features that allow users to clarify their intention and guard against unexpected behavior regarding when derivatives are propagated through operations.
### Excluding Parameters from Differentiation
Sometimes we do not wish a parameter to be considered differentiable even though it has a differentiable type. We can use the `no_diff` modifier on the parameter to inform the compiler to treat the parameter as non-differentiable and skip generating differentiation code for the parameter. The syntax is:
```csharp
// Only differentiate this function with regard to `x`.
float myFunc(no_diff float a, float x);
```
The forward derivative and backward propagation functions of `myFunc` should have the following signature:
```csharp
DifferentialPair<float> fwd_derivative(float a, DifferentialPair<float> x);
void back_prop(float a, inout DifferentialPair<float> x, float dResult);
```
In addition, the `no_diff` modifier can also be used on the return type to indicate the return value should be considered non-differentiable. For example, the function
```csharp
no_diff float myFunc(no_diff float a, float x, out float y);
```
Will have the following forward derivative and backward propagation function signatures:
```csharp
float fwd_derivative(float a, DifferentialPair<float> x);
void back_prop(float a, inout DifferentialPair<float> x, float d_y);
```
By default, the implicit `this` parameter will be treated as differentiable if the enclosing type of the member method is differentiable. If you wish to exclude `this` parameter from differentiation, use `[NoDiffThis]` attribute on the method:
```csharp
struct MyDifferentiableType : IDifferentiable
{
[NoDiffThis] // Make `this` parameter `no_diff`.
float compute(float x) { ... }
}
```
### Excluding Struct Members from Differentiation
When using automatic `IDifferentiable` conformance synthesis for a `struct` type, Slang will by default treat all struct members that have a differentiable type as differentiable, and thus include a corresponding field in the generated `Differential` type for the struct.
For example, given the following definition
```csharp
struct MyType : IDifferentiable
{
float member1;
float2 member2;
}
```
Slang will generate:
```csharp
struct MyType.Differential : IDifferentiable
{
float member1; // derivative for MyType.member1
float2 member2; // derivative for MyType.member2
}
```
If the user does not want a certain member to be treated as differentiable even though it has a differentiable type, a `no_diff` modifier can be used on the struct member to exclude it from differentiation.
For example, the following code excludes `member1` from differentiation:
```csharp
struct MyType : IDifferentiable
{
no_diff float member1; // excluded from differentiation
float2 member2;
}
```
The generated `Differential` in this case will be:
```csharp
struct MyType.Differential : IDifferentiable
{
float2 member2;
}
```
### Assigning Differentiable Values into a Non-Differentiable Location
When a value with derivatives is being assigned to a location that is not differentiable, such as a struct member that is marked as `no_diff`, the derivative info is discarded and any derivative propagation is stopped at the assignment site.
This may lead to unexpected results. For example:
```csharp
struct MyType : IDifferentiable
{
no_diff float member;
float someOtherMember;
}
[Differentiable]
float f(float x)
{
MyType t;
t.member = x * x; // Error: assigning value with derivative into a non-differentiable location.
return t.member;
}
```
In this case, we are assigning the value `x*x`, which carries a derivative, into a non-differentiable location `MyType.member`, thus throwing away any derivative info. When `f` returns `t.member`, there will be no derivative associated with it, so the function will not propagate the derivative through. This code is most likely not intending to discard the derivative through the assignment. To help avoid this kind of unintentional behavior, Slang will treat any assignment of a value with derivative info into a non-differentiable location as a compile-time error. To eliminate this error, the user should either make `t.member` differentiable, or force the assignment by clarifying the intention to discard any derivatives using the built-in `detach` method.
The following code will compile, and the derivatives will be discarded:
```csharp
[Differentiable]
float f(float x)
{
MyType t;
// OK: the code has expressed clearly the intention to discard the derivative and perform the assignment.
t.member = detach(x * x);
return t.member;
}
```
### Calling Non-Differentiable Functions from a Differentiable Function
Calling a non-differentiable function from a differentiable function is allowed. However, derivatives will not be propagated through the call. The user is required to clarify the intention by prefixing the call with the `no_diff` keyword. An unclarified call to a non-differentiable function will result in a compile-time error.
For example, consider the following code:
```csharp
float g(float x)
{
return 2*x;
}
[Differentiable]
float f(float x)
{
// Error: implicit call to non-differentiable function g.
return g(x) + x * x;
}
```
The derivative will not propagate through the call to `g` in `f`. As a result, `fwd_diff(f)(diffPair(1.0, 1.0))` will return
`{3.0, 2.0}` instead of `{3.0, 4.0}`, as the derivative from `2*x` is lost through the non-differentiable call. To prevent unintended errors, it is treated as a compile-time error to call `g` from `f`. If such a non-differentiable call is intended, a `no_diff` prefix is required in the call:
```csharp
[Differentiable]
float f(float x)
{
// OK. The intention to call a non-differentiable function is clarified.
return no_diff g(x) + x * x;
}
```
However, the `no_diff` keyword is not required in a call if the non-differentiable function being called does not take any differentiable parameters, or if its result does not depend on any derivative being propagated through the call.
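For instance, the following call needs no `no_diff` prefix because the callee takes no differentiable parameters (a sketch; `hashCoord` is a hypothetical function):

```csharp
// Non-differentiable, but takes no differentiable parameters, so calls to it
// inside differentiable code need no `no_diff` prefix.
uint hashCoord(uint2 coord)
{
    return coord.x * 7919u + coord.y;
}

[Differentiable]
float f(float x, uint2 coord)
{
    // OK without `no_diff`: no derivative can flow into hashCoord.
    float scale = float(hashCoord(coord) & 0xFFu) / 255.0;
    return x * x * scale;
}
```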
### Treat Non-Differentiable Functions as Differentiable
Slang allows functions to be marked with a `[TreatAsDifferentiable]` attribute for them to be considered as differentiable functions by the type-system. When a function is marked as `[TreatAsDifferentiable]`, the compiler will not generate derivative propagation code from the original function body or perform any additional checking on the function definition. Instead, it will generate trivial forward and backward propagation functions that return zero derivatives.
This feature can be useful if the user marked an `interface` method as forward or backward differentiable, but only wishes to provide non-trivial derivative propagation functions for a subset of types that implement the interface. For other types that do not actually need differentiation, the user can simply put `[TreatAsDifferentiable]` on the method implementations to satisfy the interface requirement.
See the following code for an example of `[TreatAsDifferentiable]`:
```csharp
interface IFoo
{
[Differentiable]
float f(float v);
}
struct B : IFoo
{
[TreatAsDifferentiable]
float f(float v)
{
return v * v;
}
}
[Differentiable]
float use(IFoo o, float x)
{
return o.f(x);
}
// Test:
B obj;
float result = fwd_diff(use)(obj, diffPair(2.0, 1.0)).d;
// result == 0.0, since `[TreatAsDifferentiable]` causes a trivial derivative implementation
// being generated regardless of the original code.
```
## Higher-Order Differentiation
Slang supports generating higher order forward and backward derivative propagation functions. It is allowed to use `fwd_diff` and `bwd_diff` operators inside a forward or backward differentiable function, or to nest `fwd_diff` and `bwd_diff` operators. For example, `fwd_diff(fwd_diff(sin))` will have the following signature:
```csharp
DifferentialPair<DifferentialPair<float>> sin_diff2(DifferentialPair<DifferentialPair<float>> x);
```
The input parameter `x` contains four fields: `x.p.p`, `x.p.d`, `x.d.p`, `x.d.d`, where `x.p.p` specifies the original input value, both `x.p.d` and `x.d.p` store the first-order derivative of `x`, and `x.d.d` stores the second-order derivative of `x`. Calling `fwd_diff(fwd_diff(sin))` with `diffPair(diffPair(pi/2, 1.0), diffPair(1.0, 0.0))` will result in `{ { 1.0, 0.0 }, { 0.0, -1.0 } }`.
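That worked example can be written out as follows (a sketch):

```csharp
[Differentiable]
float f(float x)
{
    return sin(x);
}

void demo()
{
    float pi = 3.14159265f;
    // Nest fwd_diff to obtain second-order information.
    var r = fwd_diff(fwd_diff(f))(
        diffPair(diffPair(pi / 2, 1.0), diffPair(1.0, 0.0)));
    // r.p.p == sin(pi/2) == 1.0
    // r.d.d == -sin(pi/2) == -1.0 (the second derivative)
}
```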
User defined higher-order derivative functions can be specified by using `[ForwardDerivative]` or `[BackwardDerivative]` attribute on the derivative function, or by using `[ForwardDerivativeOf]` or `[BackwardDerivativeOf]` attribute on the higher-order derivative function.
## Restrictions and Known Issues
The compiler can generate forward derivative and backward propagation implementations for most uses of array and struct types, including arbitrary read and write access at dynamic array indices, and supports uses of all types of control flows, mutable parameters, generics and interfaces. This covers the set of operations that is sufficient for a lot of functions. However, the user needs to be aware of the following restrictions when using automatic differentiation:
- All operations on global resources, global variables and shader parameters, including texture reads or atomic writes, are treated as non-differentiable operations. Slang provides support for special data structures (such as `Tensor`) through libraries such as `SlangPy`, which come with custom derivative implementations.
- If a differentiable function contains calls that cause side-effects such as updates to global memory, there is currently no guarantee on how many times side-effects will occur during the resulting derivative function or back-propagation function.
- Loops: Loops must have a bounded number of iterations. If this cannot be inferred statically from the loop structure, the attribute `[MaxIters(<count>)]` can be used to specify a maximum number of iterations. This will be used by the compiler to allocate space to store intermediate data. If the actual number of iterations exceeds the provided maximum, the behavior is undefined. You can always mark a loop with the `[ForceUnroll]` attribute to instruct the Slang compiler to unroll the loop before generating derivative propagation functions. Unrolled loops will be treated the same way as ordinary code and are not subject to any additional restrictions.
- Double backward derivatives (higher-order differentiation): The compiler does not currently support multiple backward derivative calls such as `bwd_diff(bwd_diff(fn))`. The vast majority of higher-order derivative applications can be achieved more efficiently via multiple forward-derivative calls or a single layer of `bwd_diff` on functions that use one or more `fwd_diff` passes.
The above restrictions do not apply if a user-defined derivative or backward propagation function is provided.
## Reference
This section contains some additional information for operators that are not currently included in the [standard library reference](https://shader-slang.org/stdlib-reference/)
### `fwd_diff(f : slang_function) -> slang_function`
The `fwd_diff` operator can be used on a differentiable function to obtain the forward derivative propagation function.
A forward derivative propagation function computes the derivative of the result value with regard to a specific set of input parameters.
Given an original function, the signature of its forward propagation function is determined using the following rules:
- If the return type `R` implements `IDifferentiable` the forward propagation function will return a corresponding `DifferentialPair<R>` that consists of both the computed original result value and the (partial) derivative of the result value. Otherwise, the return type is kept unmodified as `R`.
- If a parameter has type `T` that implements `IDifferentiable`, it will be translated into a `DifferentialPair<T>` parameter in the derivative function, where the differential component of the `DifferentialPair` holds the initial derivatives of each parameter with regard to their upstream parameters.
- If a parameter has type `T` that implements `IDifferentiablePtrType`, it will be translated into a `DifferentialPtrPair<T>` parameter, where the differential component references the location holding the corresponding derivative.
- All parameter directions are unchanged. For example, an `out` parameter in the original function will remain an `out` parameter in the derivative function.
- Differentiable methods cannot have a type implementing `IDifferentiablePtrType` as an `out` or `inout` parameter, or a return type. Types implementing `IDifferentiablePtrType` can only be used for input parameters to a differentiable method. Marking such a method as `[Differentiable]` will result in a compile-time diagnostic error.
For example, given original function:
```csharp
[Differentiable]
R original(T0 p0, inout T1 p1, T2 p2, T3 p3);
```
Where `R`, `T0`, `T1 : IDifferentiable`, `T2` is non-differentiable, and `T3 : IDifferentiablePtrType`, the forward derivative function will have the following signature:
```csharp
DifferentialPair<R> derivative(DifferentialPair<T0> p0, inout DifferentialPair<T1> p1, T2 p2, DifferentialPtrPair<T3> p3);
```
This forward propagation function takes the initial primal value of `p0` in `p0.p`, and the partial derivative of `p0` with regard to some upstream parameter in `p0.d`. It takes the initial primal and derivative values of `p1` and updates `p1` to hold the newly computed value and propagated derivative. Since `p2` is not differentiable, it remains unchanged.
### `bwd_diff(f : slang_function) -> slang_function`
A backward derivative propagation function propagates the derivative of the function output to all the input parameters simultaneously.
Given an original function `f`, the general rule for determining the signature of its backward propagation function is that a differentiable output `o` becomes an input parameter holding the partial derivative of a downstream output with regard to the differentiable output, i.e. $$\partial y/\partial o$$; an input differentiable parameter `i` in the original function will become an output in the backward propagation function, holding the propagated partial derivative $$\partial y/\partial i$$; and any non-differentiable outputs are dropped from the backward propagation function. This means that the backward propagation function never returns any values computed in the original function.
More specifically, the signature of its backward propagation function is determined using the following rules:
- A backward propagation function always returns `void`.
- A differentiable `in` parameter of type `T : IDifferentiable` will become an `inout DifferentialPair<T>` parameter, where the original value part of the differential pair contains the original value of the parameter to pass into the back-prop function. The original value will not be overwritten by the backward propagation function. The propagated derivative will be written to the derivative part of the differential pair after the backward propagation function returns. The initial derivative value of the pair is ignored as input.
- A differentiable `out` parameter of type `T : IDifferentiable` will become an `in T.Differential` parameter, carrying the partial derivative of some downstream term with regard to the return value.
- A differentiable `inout` parameter of type `T : IDifferentiable` will become an `inout DifferentialPair<T>` parameter, where the original value of the argument, along with the downstream partial derivative with regard to the argument is passed as input to the backward propagation function as the original and derivative part of the pair. The propagated derivative with regard to this input parameter will be written back and replace the derivative part of the pair. The primal value part of the parameter will *not* be updated.
- A differentiable return value of type `R` will become an additional `in R.Differential` parameter at the end of the backward propagation function parameter list, carrying the result derivative of a downstream term with regard to the return value of the original function.
- A non-differentiable return value of type `NDR` will be dropped.
- A non-differentiable `in` parameter of type `ND` will remain unchanged in the backward propagation function.
- A non-differentiable `out` parameter of type `ND` will be removed from the parameter list of the backward propagation function.
- A non-differentiable `inout` parameter of type `ND` will become an `in ND` parameter.
- Types implementing `IDifferentiablePtrType` work the same way as in the forward-mode case. They can only be used with `in` parameters, and are converted into `DifferentialPtrPair` types. Their directions are **not** affected.
For example consider the following original function:
```csharp
struct T : IDifferentiable {...}
struct R : IDifferentiable {...}
struct P : IDifferentiablePtrType {...}
struct ND {} // Non differentiable
[Differentiable]
R original(T p0, out T p1, inout T p2, ND p3, out ND p4, inout ND p5, P p6);
```
The signature of its backward propagation function is:
```csharp
void back_prop(
inout DifferentialPair<T> p0,
T.Differential p1,
inout DifferentialPair<T> p2,
ND p3,
ND p5,
DifferentialPtrPair<P> p6,
R.Differential dResult);
```
Note that although `p2` is still `inout` in the backward propagation function, the backward propagation function will only write propagated derivative to `p2.d` and will not modify `p2.p`.
### Built-in Differentiable Functions
The following built-in functions are differentiable and both their forward and backward derivative functions are already defined in the standard library's core module:
- Arithmetic functions: `abs`, `max`, `min`, `sqrt`, `rcp`, `rsqrt`, `fma`, `mad`, `fmod`, `frac`, `radians`, `degrees`
- Interpolation and clamping functions: `lerp`, `smoothstep`, `clamp`, `saturate`
- Trigonometric functions: `sin`, `cos`, `sincos`, `tan`, `asin`, `acos`, `atan`, `atan2`
- Hyperbolic functions: `sinh`, `cosh`, `tanh`
- Exponential and logarithmic functions: `exp`, `exp2`, `pow`, `log`, `log2`, `log10`
- Vector functions: `dot`, `cross`, `length`, `distance`, `normalize`, `reflect`, `refract`
- Matrix transforms: `mul(matrix, vector)`, `mul(vector, matrix)`, `mul(matrix, matrix)`
- Matrix operations: `transpose`, `determinant`
- Legacy blending and lighting intrinsics: `dst`, `lit`
---
layout: user-guide
permalink: /user-guide/compiling
---
Compiling Code with Slang
=========================
This chapter presents the ways that the Slang system supports compiling and composing shader code.
We will start with a discussion of the mental model that Slang uses for compilation.
Next we will cover the command-line Slang compiler, `slangc`, and how to use it to perform offline compilation.
Finally we will discuss the Slang compilation API, which can be used to integrate Slang compilation into an application at runtime, or to build custom tools that implement application-specific compilation policy.
## Concepts
For simple scenarios it may be enough to think of a shader compiler as a box where source code goes in and compiled kernels come out.
Most real-time graphics applications end up needing more control over shader compilation, and/or more information about the results of compilation.
In order to make use of the services provided by the Slang compilation system, it is useful to start with a clear model of the concepts that are involved in compilation.
### Source Units
At the finest granularity, code is fed to the compiler in _source units_ which are most often stored as files on disk or strings of text in memory.
The compilation model largely does not care whether source units have been authored by human programmers or automatically assembled by other tools.
If multiple source units are specified as part of the same compile, they will be preprocessed and parsed independently.
However, a source unit might contain `#include` directives, so that the preprocessed text of that source unit includes the content of other files.
Note that the `#include`d files do not become additional source units; they are just part of the text of a source unit that was fed to the compiler.
### Translation Units and Modules
Source units (such as files) are grouped into _translation units_, and each translation unit will produce a single _module_ when compiled.
While the source units are all preprocessed and parsed independently, semantic checking is applied to a translation unit as a whole.
One source file in a translation unit may freely refer to declarations in another source file from the same translation unit without any need for forward declarations. For example:
```hlsl
// A.slang
float getFactor() { return 10.0; }
```
```hlsl
// B.slang
float scaleValue(float value)
{
    return value * getFactor();
}
```
In this example, the `scaleValue()` function in `B.slang` can freely refer to the `getFactor()` function in `A.slang` because they are part of the same translation unit.
It is allowed, and indeed common, for a translation unit to contain only a single source unit.
For example, when adapting an existing codebase with many `.hlsl` files, it is appropriate to compile each `.hlsl` file as its own translation unit.
A modernized codebase that uses the modular `include` feature as documented in [Modules and Access Control](modules) might decide to compile multiple `.slang` files in a single directory as a single translation unit.
The result of compiling a translation unit is a module in Slang's internal intermediate representation (IR). The compiled module can then be serialized to a `.slang-module` binary file. The binary file can then be loaded via the
`ISession::loadModuleFromIRBlob` function or `import`ed in Slang code the same way as modules written in `.slang` files.
### Entry Points
A translation unit / module may contain zero or more entry points.
Slang supports two models for identifying entry points when compiling.
#### Entry Point Attributes
By default, the compiler will scan a translation unit for function declarations marked with the `[shader(...)]` attribute; each such function will be identified as an entry point in the module.
Developers are encouraged to use this model because it directly documents intention and makes source code less dependent on external compiler configuration options.
#### Explicit Entry Point Options
For compatibility with existing code, the Slang compiler also supports explicit specification of entry point functions using configuration options external to shader source code.
When these options are used the compiler will *ignore* all `[shader(...)]` attributes and only use the explicitly-specified entry points instead.
### Shader Parameters
A translation unit / module may contain zero or more global shader parameters.
Similarly, each entry point may define zero or more entry-point `uniform` shader parameters.
The shader parameters of a module or entry point are significant because they describe the interface between host application code and GPU code.
It is important that both the application and generated GPU kernel code agree on how parameters are laid out in memory and/or how they are assigned to particular API-defined registers, locations, or other "slots."
### Targets
Within the Slang system a _target_ represents a particular platform and set of capabilities that output code can be generated for.
A target includes information such as:
* The _format_ that code should be generated in: SPIR-V, DXIL, etc.
* A _profile_ that specifies a general feature/capability level for the target: D3D Shader Model 5.1, GLSL version 4.60, etc.
* Optional _capabilities_ that should be assumed available on the target: for example, specific Vulkan GLSL extensions
* Options that impact code generation: floating-point strictness, level of debug information to generate, etc.
Slang supports compiling for multiple targets in the same compilation session.
When using multiple targets at a time, it is important to understand the distinction between the _front-end_ of the compiler, and the _back-end_:
* The compiler front-end comprises preprocessing, parsing, and semantic checking. The front-end runs once for each translation unit and its results are shared across all targets.
* The compiler back-end generates output code, and thus runs once per target.
> #### Note ####
> Because front-end actions, including preprocessing, only run once, across all targets, the Slang compiler does not automatically provide any target-specific preprocessor `#define`s that can be used for preprocessor conditionals.
> Applications that need target-specific `#define`s should always compile for one target at a time, and set up their per-target preprocessor state manually.
### Layout
While the front-end of the compiler determines what the shader parameters of a module or entry point are, the _layout_ for those parameters is dependent on a particular compilation target.
A `Texture2D` might consume a `t` register for Direct3D, a `binding` for Vulkan, or just plain bytes for CUDA.
The details of layout in Slang will come in a later chapter.
For the purposes of the compilation model it is important to note that the layout computed for shader parameters depends on:
* What modules and entry points are being used together; these define which parameters are relevant.
* Some well-defined ordering of those parameters; this defines which parameters should be laid out before which others.
* The rules and constraints that the target imposes on layout.
An important design choice in Slang is to give the user of the compiler control over these choices.
### Composition
The user of the Slang compiler communicates the modules and entry points that will be used together, as well as their relative order, using a system for _composition_.
A _component type_ is a unit of shader code composition; both modules and entry points are examples of component types.
A _composite_ component type is formed from a list of other component types (for example, one module and two entry points) and can be used to define a unit of shader code that is meant to be used together.
Once a programmer has formed a composite of all the code they intend to use together, they can query the layout of the shader parameters in that composite, or invoke the linking step to
resolve all cross module references.
### Linking
A user-composed program may have transitive module dependencies and cross references between module boundaries. The linking step in Slang is to resolve all the cross references in the IR and produce a
new self-contained IR module that has everything needed for target code generation. The user will have an opportunity to specialize precompiled modules or provide additional compiler backend options
at the linking step.
### Kernels
Once a program is linked, the user can request generation of the _kernel_ code for an entry point.
The same entry point can be used to generate many different kernels.
First, an entry point can be compiled for different targets, resulting in different kernels in the appropriate format for each target.
Second, different compositions of shader code can result in different layouts, which leads to different kernels being required.
## Command-Line Compilation with `slangc`
The `slangc` tool, included in binary distributions of Slang, is a command-line compiler that can handle most simple compilation tasks.
`slangc` is intended to be usable as a replacement for tools like `fxc` and `dxc`, and covers most of the same use cases.
### All Available Options
See [slangc command line reference](https://github.com/shader-slang/slang/blob/master/docs/command-line-slangc-reference.md) for a complete list of compiler options supported by the `slangc` tool.
### A Simple `slangc` Example
Here we will repeat the example used in the [Getting Started](01-get-started.md) chapter.
Given the following Slang code:
```hlsl
// hello-world.slang
StructuredBuffer<float> buffer0;
StructuredBuffer<float> buffer1;
RWStructuredBuffer<float> result;
[shader("compute")]
[numthreads(1,1,1)]
void computeMain(uint3 threadId : SV_DispatchThreadID)
{
    uint index = threadId.x;
    result[index] = buffer0[index] + buffer1[index];
}
```
we can compile the `computeMain()` entry point to SPIR-V using the following command line:
```bat
slangc hello-world.slang -target spirv -o hello-world.spv
```
> #### Note ####
> Some targets require additional parameters. See [`slangc` Entry Points](#slangc-entry-points) for details. For example, to target HLSL, the equivalent command is:
>
> ```bat
> slangc hello-world.slang -target hlsl -entry computeMain -o hello-world.hlsl
> ```
### Source Files and Translation Units
The `hello-world.slang` argument here is specifying an input file.
Each input file specified on the command line will be a distinct source unit during compilation.
Slang supports multiple file-name extensions for input files, but the most common ones will be `.hlsl` for existing HLSL code, and `.slang` for files written specifically for Slang.
If multiple source files are passed to `slangc`, they will be grouped into translation units using the following rules:
* If there are any `.slang` files, then all of them will be grouped into a single translation unit
* Each `.hlsl` file will be grouped into a distinct translation unit of its own.
* Each `.slang-module` file forms its own translation unit.
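The grouping rules above can be sketched in code. The following C++ fragment is purely illustrative (it is not part of the actual `slangc` implementation):

```c++
#include <string>
#include <vector>

// Return the extension of a path, including the leading dot.
static std::string extensionOf(const std::string& path)
{
    auto dot = path.rfind('.');
    return dot == std::string::npos ? std::string() : path.substr(dot);
}

// Group input files into translation units: all `.slang` files share one
// translation unit, while every other file (`.hlsl`, `.slang-module`)
// forms a translation unit of its own.
std::vector<std::vector<std::string>> groupIntoTranslationUnits(
    const std::vector<std::string>& files)
{
    std::vector<std::vector<std::string>> units;
    int sharedSlangUnit = -1; // index of the unit shared by `.slang` files
    for (const auto& file : files)
    {
        if (extensionOf(file) == ".slang")
        {
            if (sharedSlangUnit < 0)
            {
                sharedSlangUnit = (int)units.size();
                units.push_back({});
            }
            units[(size_t)sharedSlangUnit].push_back(file);
        }
        else
        {
            units.push_back({file});
        }
    }
    return units;
}
```

For example, passing `a.slang b.hlsl c.slang` would produce two translation units: one containing both `a.slang` and `c.slang`, and one containing only `b.hlsl`.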
### `slangc` Entry Points
When using `slangc`, you will typically want to identify which entry point(s) you intend to compile.
The `-entry computeMain` option selects an entry point to be compiled to output code in this invocation of `slangc`.
Because the `computeMain()` entry point in this example has a `[shader(...)]` attribute, the compiler is able to deduce that it should be compiled for the `compute` stage.
```bat
slangc hello-world.slang -target spirv -o hello-world.spv
```
In code that does not use `[shader(...)]` attributes, a `-entry` option should be followed by a `-stage` option to specify the stage of the entry point:
```bat
slangc hello-world.slang -entry computeMain -stage compute -target spirv -o hello-world.spv
```
> #### Note ####
> The `slangc` CLI [currently](https://github.com/shader-slang/slang/issues/5541) cannot automatically deduce the `-entry` and `-stage`/`-profile` options from `[shader(...)]` attributes when generating code for targets other than SPIRV, Metal, CUDA, or Optix. For targets such as HLSL, please continue to specify `-entry` and `-stage` options, even when compiling a file with the `[shader(...)]` attribute on its entry point.
### `slangc` Targets
Our example uses the option `-target spirv` to introduce a compilation target; in this case, code will be generated as SPIR-V.
The argument of a `-target` option specifies the format to use for the target; common values are `dxbc`, `dxil`, and `spirv`.
Additional options for a target can be specified after the `-target` option.
For example, a `-profile` option can be used to specify a profile that should be used.
Slang provides two main kinds of profiles for use with `slangc`:
* Direct3D "Shader Model" profiles have names like `sm_5_1` and `sm_6_3`
* GLSL versions can be used as profile with names like `glsl_430` and `glsl_460`
### `slangc` Kernels
A `-o` option indicates that kernel code should be written to a file on disk.
In our example, the SPIR-V kernel code for the `computeMain()` entry point will be written to the file `hello-world.spv`.
### Working with Multiples
It is possible to use `slangc` with multiple input files, entry points, or targets.
In these cases, the ordering of arguments on the command line becomes significant.
When an option modifies or relates to another command-line argument, it implicitly applies to the most recent relevant argument.
For example:
* If there are multiple input files, then an `-entry` option applies to the preceding input file
* If there are multiple entry points, then a `-stage` option applies to the preceding `-entry` option
* If there are multiple targets, then a `-profile` option applies to the preceding `-target` option
Kernel `-o` options are the most complicated case, because they depend on both a target and entry point.
A `-o` option applies to the preceding entry point, and the compiler will try to apply it to a matching target based on its file extension.
For example, a `.spv` output file will be matched to a `-target spirv`.
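The extension-based matching just described can be pictured with a small lookup table. The following C++ sketch is illustrative only; the table below is an assumption, and the real compiler recognizes more formats:

```c++
#include <map>
#include <string>

// Map an output file extension to a likely `-target` format name.
// The table here is a hypothetical subset, for illustration.
std::string inferTargetFromOutputPath(const std::string& outputPath)
{
    static const std::map<std::string, std::string> extensionToTarget = {
        {".spv", "spirv"},
        {".spv-asm", "spirv-asm"},
        {".dxbc", "dxbc"},
        {".dxil", "dxil"},
        {".hlsl", "hlsl"},
        {".glsl", "glsl"},
    };
    auto dot = outputPath.rfind('.');
    if (dot == std::string::npos)
        return std::string();
    auto it = extensionToTarget.find(outputPath.substr(dot));
    return it == extensionToTarget.end() ? std::string() : it->second;
}
```

With a table like this, an output file `hello-world.spv` pairs naturally with a `-target spirv` option.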
The compiler makes a best effort to support complicated cases with multiple files, entry points, and targets.
Users with very complicated compilation requirements will probably be better off using multiple `slangc` invocations or migrating to the compilation API.
### Additional Options
The main other options are:
* `-D<name>` or `-D<name>=<value>` can be used to introduce preprocessor macros.
* `-I<path>` or `-I <path>` can be used to introduce a _search path_ to be used when resolving `#include` directives and `import` declarations.
* `-g` can be used to enable inclusion of debug information in output files (where possible and implemented)
* `-O<level>` can be used to control optimization levels when the Slang compiler invokes downstream code generator
See [slangc command line reference](https://github.com/shader-slang/slang/blob/master/docs/command-line-slangc-reference.md) for a complete list of compiler options supported by the `slangc` tool.
### Downstream Arguments
`slangc` may leverage a 'downstream' tool like 'dxc', 'fxc', 'glslang', or 'gcc' for some target compilations. Rather than replicate every possible downstream option, arguments can be passed directly to the downstream tool using the "-X" option in `slangc`.
The mechanism used here is based on the `-X` mechanism used in GCC, to specify arguments to the linker.
```
-Xlinker option
```
When used, `option` is not interpreted by GCC, but is passed to the linker once compilation is complete. Slang extends this idea in several ways. First, there are many more 'downstream' stages available to Slang than just `linker`. These different stages are known as `SlangPassThrough` types in the API and have the following names:
* `fxc` - FXC HLSL compiler
* `dxc` - DXC HLSL compiler
* `glslang` - GLSLANG GLSL compiler
* `visualstudio` - Visual Studio C/C++ compiler
* `clang` - Clang C/C++ compiler
* `gcc` - GCC C/C++ compiler
* `genericcpp` - A generic C++ compiler (can be any one of visual studio, clang or gcc depending on system and availability)
* `nvrtc` - NVRTC CUDA compiler
The Slang command line allows you to specify an argument to these downstream compilers by using their name after `-X`. For example, to send the option `-Gfa` through to DXC you can use:
```
-Xdxc -Gfa
```
Note that if an option is available via normal Slang command line options then those should be used. This will generally work across multiple targets, and also avoids option clashes, whose behavior is currently undefined. The `-X` mechanism is best used for options that are unavailable through normal Slang mechanisms.
If you want to pass multiple options using this mechanism, the `-Xdxc` needs to be in front of every option. For example:
```
-Xdxc -Gfa -Xdxc -Vd
```
Would reach `dxc` as
```
-Gfa -Vd
```
This can get a little repetitive, especially if there are many parameters, so Slang adds a mechanism to pass multiple options by using an ellipsis `...`. The syntax is as follows:
```
-Xdxc... -Gfa -Vd -X.
```
The `...` at the end indicates all the following parameters should be sent to `dxc` until it reaches the matching terminating `-X.` or the end of the command line.
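The way arguments are routed to a downstream tool can be sketched as follows. This illustrative C++ fragment handles the single-argument form and the `...` group form; nested groups, which Slang also supports, are omitted for brevity, and this is not the actual implementation:

```c++
#include <string>
#include <vector>

// Collect the arguments destined for one downstream tool. Handles the
// `-X<tool> arg` form and the `-X<tool>... args -X.` group form; nested
// `-X...` groups are left out of this sketch.
std::vector<std::string> argsForTool(
    const std::vector<std::string>& args, const std::string& tool)
{
    std::vector<std::string> result;
    for (size_t i = 0; i < args.size(); i++)
    {
        if (args[i] == "-X" + tool && i + 1 < args.size())
        {
            // Single forwarded argument.
            result.push_back(args[++i]);
        }
        else if (args[i] == "-X" + tool + "...")
        {
            // Collect everything up to the terminating `-X.`, or the end
            // of the command line if no terminator is present.
            while (++i < args.size() && args[i] != "-X.")
                result.push_back(args[i]);
        }
    }
    return result;
}
```

Under this sketch, both `-Xdxc -Gfa -Xdxc -Vd` and `-Xdxc... -Gfa -Vd -X.` yield `-Gfa -Vd` for the `dxc` tool.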
It is also worth noting that `-X...` options can be nested. This would allow a GCC downstream compilation to control linking, for example with
```
-Xgcc -Xlinker --split -X.
```
In this example gcc would see
```
-Xlinker --split
```
And the linker would see (as passed through by gcc)
```
--split
```
Setting options for tools that aren't used in a Slang compilation has no effect. This allows `-X` options for all downstream tools to be set on a single command line; each is only used when a compilation actually invokes that tool.
NOTE! Not all tools that Slang uses downstream make command line argument parsing available. `FXC` and `GLSLANG` currently do not have any command line argument passing as part of their integration, although this could change in the future.
The `-X` mechanism is also supported by the render-test tool. In this usage `slang` becomes a downstream tool, so you can use the `dxc` option `-Gfa` in a render-test via:
```
-Xslang... -Xdxc -Gfa -X.
```
This means that the dxc compilation in the render test (assuming dxc is invoked) will receive:
```
-Gfa
```
Some options are made available via the same mechanism for all downstream compilers.
* Use `-I` to specify include path for downstream compilers
For example, to specify an include path "somePath" to DXC you can use:
```
-Xdxc -IsomePath
```
### Convenience Features
The `slangc` compiler provides a few conveniences for command-line compilation:
* Most options can appear out of order when they are unambiguous. For example, if there is only a single translation unit a `-entry` option can appear before or after any file.
* A `-target` option can be left out if it can be inferred from the only `-o` option present. For example, `-o hello-world.spv` already implies `-target spirv`.
* If a `-o` option is left out then kernel code will be written to the standard output. This output can be piped to a file, or can be printed to a console. In the latter case, the compiler will automatically disassemble binary formats for printing.
### Precompiled Modules
You can compile a `.slang` file into a binary IR module. For example, given the following source:
```hlsl
// my_library.slang
float myLibFunc() { return 5.0; }
```
You can compile it into `my_library.slang-module` with the following slangc command line:
```bat
slangc my_library.slang -o my_library.slang-module
```
This allows you to deploy just the `my_library.slang-module` file to users of the module, and it can be consumed in the user code with the same `import` syntax:
```hlsl
import my_library;
```
### Limitations
The `slangc` tool is meant to serve the needs of many developers, including those who are currently using `fxc`, `dxc`, or similar tools.
However, some applications will benefit from deeper integration of the Slang compiler into application-specific code and workflows.
Notable features that Slang supports which cannot be accessed from `slangc` include:
* Slang can provide _reflection_ information about shader parameters and their layouts for particular targets; this information is not currently output by `slangc`.
* Slang allows applications to control the way that shader modules and entry points are composed (which in turn influences their layout); `slangc` currently implements a single default policy for how to generate a composition of shader code.
Applications that need more control over compilation are encouraged to use the C++ compilation API described in the next section.
### Examples of `slangc` usage
#### Multiple targets and multiple entrypoints
In this example, there are two shader entrypoints defined in one source file.
```hlsl
// targets.slang
struct VertexOutput
{
    nointerpolation int a : SOME_VALUE;
    float4 b : SV_Position;
};

[shader("pixel")]
float4 psMain() : SV_Target
{
    return float4(1, 0, 0, 1);
}

[shader("vertex")]
VertexOutput vsMain()
{
    VertexOutput output;
    output.a = 0;
    output.b = float4(0, 1, 0, 1);
    return output;
}
```
A single entrypoint from the preceding shader can be compiled to both SPIR-V Assembly and HLSL targets in one command:
```bat
slangc targets.slang -entry psMain -target spirv-asm -o targets.spv-asm -target hlsl -o targets.hlsl
```
The following command compiles both entrypoints to SPIR-V:
```bat
slangc targets.slang -entry vsMain -entry psMain -target spirv -o targets.spv
```
#### Creating a standalone executable example
This example compiles and runs a CPU host-callable style Slang unit.
```hlsl
// cpu.slang
class MyClass
{
    int intMember;

    __init()
    {
        intMember = 0;
    }

    int method()
    {
        printf("method\n");
        return intMember;
    }
}
export __extern_cpp int main()
{
    MyClass obj = new MyClass();
    return obj.method();
}
```
Compile the above code as standalone executable, using -I option to find dependent header files:
```bat
slangc cpu.slang -target executable -o cpu.exe -Xgenericcpp -I./include -Xgenericcpp -I./external/unordered_dense/include/
```
Execute the resulting executable:
```bat
C:\slang> cpu
method
```
#### Compiling and linking slang-modules
This example demonstrates the compilation of a slang-module, and linking to a shader which uses that module.
Two scenarios are provided, one in which the entry-point is compiled in the same `slangc` invocation that links in the dependent slang-module, and another scenario where linking is a separate invocation.
```hlsl
// lib.slang
public int foo(int a)
{
    return a + 1;
}
```
```hlsl
// entry.slang
import "lib";
RWStructuredBuffer<int> outputBuffer;
[shader("compute")]
[numthreads(4, 1, 1)]
void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    int index = (int)dispatchThreadID.x;
    outputBuffer[index] = foo(index);
}
```
Compile lib.slang to lib.slang-module:
```bat
slangc lib.slang -o lib.slang-module
```
Scenario 1: Compile entry.slang and link lib and entry together in one step:
```bat
slangc entry.slang -target spirv -o program.spv # Compile and link
```
Scenario 2: Compile entry.slang to entry.slang-module and then link together lib and entry in a second invocation:
```bat
slangc entry.slang -o entry.slang-module # Compile
slangc lib.slang-module entry.slang-module -target spirv -o program.spv # Link
```
#### Compiling with debug symbols
Debug symbols can be added with the "-g<debug-level>" option.
Adding '-g1' (or higher) to a SPIR-V compilation will emit extended 'DebugInfo' instructions.
```bat
slangc vertex.slang -target spirv-asm -o v.spv-asm -g0 # Omit debug symbols
slangc vertex.slang -target spirv-asm -o v.spv-asm -g1 # Add debug symbols
```
#### Compiling with additional preprocessor macros
User-defined macros can be set on the command-line with the "-D<macro>" or "-D<macro>=<value>" option.
```hlsl
// macrodefine.slang
[shader("pixel")]
float4 psMain() : SV_Target
{
#if defined(mymacro)
    return float4(1, 0, 0, 1);
#else
    return float4(0, 1, 0, 1);
#endif
}
```
* Setting a user-defined macro "mymacro"
```bat
slangc macrodefine.slang -entry psMain -target spirv-asm -o targets.spvasm -Dmymacro
```
## Using the Compilation API
The C++ API provided by Slang is meant to provide more complete control over compilation for applications that need it.
The additional level of control means that some tasks require more individual steps than they would when using a one-size-fits-all tool like `slangc`.
### "COM-lite" Components
Many parts of the Slang C++ API use interfaces that follow the design of COM (the Component Object Model).
Some key Slang interfaces are binary-compatible with existing COM interfaces.
However, the Slang API does not depend on any runtime aspects of the COM system, even on Windows; the Slang system can be seen as a "COM-lite" API.
The `ISlangUnknown` interface is equivalent to (and binary-compatible with) the standard COM `IUnknown`.
Application code is expected to correctly maintain the reference counts of `ISlangUnknown` objects returned from API calls; the `Slang::ComPtr<T>` "smart pointer" type is provided as an optional convenience for applications that want to use it.
Many Slang API calls return `SlangResult` values; this type is equivalent to (and binary-compatible with) the standard COM `HRESULT` type.
As a matter of convention, Slang API calls return a zero value (`SLANG_OK`) on success, and a negative value on errors.
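This convention can be captured in two small predicates. The helpers below are a sketch mirroring the `SLANG_SUCCEEDED`/`SLANG_FAILED` macros the Slang headers provide; the names used here are illustrative, not the real API:

```c++
#include <cstdint>

// A stand-in for SlangResult: like HRESULT, zero or positive means
// success, negative means failure.
using ResultSketch = int32_t;

constexpr ResultSketch kOkSketch = 0; // corresponds to SLANG_OK

constexpr bool resultSucceeded(ResultSketch r) { return r >= 0; }
constexpr bool resultFailed(ResultSketch r) { return r < 0; }
```

Note that success is "non-negative" rather than "equal to zero", which leaves room for API calls to return positive success codes carrying extra information, just as COM does.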
> #### Note ####
> Slang API interfaces may be named with the suffix "_Experimental", indicating that the interface is not complete, may have known bugs, and may change or be removed between Slang API releases.
### Creating a Global Session
A Slang _global session_ uses the interface `slang::IGlobalSession`, and represents a connection from an application to a particular implementation of the Slang API.
A global session is created using the function `slang::createGlobalSession()`:
```c++
using namespace slang;
Slang::ComPtr<IGlobalSession> globalSession;
SlangGlobalSessionDesc desc = {};
createGlobalSession(&desc, globalSession.writeRef());
```
When a global session is created, the Slang system will load its internal representation of the _core module_ that the compiler provides to user code.
The core module can take a significant amount of time to load, so applications are advised to use a single global session if possible, rather than creating and then disposing of one for each compile.
If you want to enable GLSL compatibility mode, you need to set `SlangGlobalSessionDesc::enableGLSL` to `true` when calling `createGlobalSession()`. This will load the necessary GLSL intrinsic module
for compiling GLSL code. Without this setting, compiling GLSL code will result in an error.
> #### Note ####
> Currently, the global session type is *not* thread-safe.
> Applications that wish to compile on multiple threads will need to ensure that each concurrent thread compiles with a distinct global session.
> #### Note ####
> Currently, the global session should be freed after any objects created from it.
> See [issue 6344](https://github.com/shader-slang/slang/issues/6344).
### Creating a Session
A _session_ uses the interface `slang::ISession`, and represents a scope for compilation with a consistent set of compiler options.
In particular, all compilation with a single session will share:
* A list of enabled compilation targets (with their options)
* A list of search paths (for `#include` and `import`)
* A list of pre-defined macros
In addition, a session provides a scope for the loading and re-use of modules.
If two pieces of code compiled in a session both `import` the same module, then that module will only be loaded and compiled once.
To create a session, use the `IGlobalSession::createSession()` method:
```c++
SessionDesc sessionDesc;
/* ... fill in `sessionDesc` ... */
Slang::ComPtr<ISession> session;
globalSession->createSession(sessionDesc, session.writeRef());
```
The definition of `SessionDesc` structure is:
```C++
struct SessionDesc
{
    /** The size of this structure, in bytes.
     */
    size_t structureSize = sizeof(SessionDesc);

    /** Code generation targets to include in the session.
     */
    TargetDesc const* targets = nullptr;
    SlangInt targetCount = 0;

    /** Flags to configure the session.
     */
    SessionFlags flags = kSessionFlags_None;

    /** Default layout to assume for variables with matrix types.
     */
    SlangMatrixLayoutMode defaultMatrixLayoutMode = SLANG_MATRIX_LAYOUT_ROW_MAJOR;

    /** Paths to use when searching for `#include`d or `import`ed files.
     */
    char const* const* searchPaths = nullptr;
    SlangInt searchPathCount = 0;

    PreprocessorMacroDesc const* preprocessorMacros = nullptr;
    SlangInt preprocessorMacroCount = 0;

    ISlangFileSystem* fileSystem = nullptr;

    bool enableEffectAnnotations = false;
    bool allowGLSLSyntax = false;

    /** Pointer to an array of compiler option entries, whose size is compilerOptionEntryCount.
     */
    CompilerOptionEntry* compilerOptionEntries = nullptr;

    /** Number of additional compiler option entries.
     */
    uint32_t compilerOptionEntryCount = 0;
};
```
The user can specify a set of commonly used compiler options directly in the `SessionDesc` struct, such as `searchPaths` and `preprocessorMacros`.
Additional compiler options can be specified via the `compilerOptionEntries` field, which is an array of `CompilerOptionEntry` that defines a key-value
pair of a compiler option setting, see the [Compiler Options](#compiler-options) section.
#### Targets
The `SessionDesc::targets` array can be used to describe the list of targets that the application wants to support in a session.
Often, this will consist of a single target.
Each target is described with a `TargetDesc` which includes options to control code generation for the target.
The most important fields of the `TargetDesc` are the `format` and `profile`; most others can be left at their default values.
The `format` field should be set to one of the values from the `SlangCompileTarget` enumeration.
For example:
```c++
TargetDesc targetDesc;
targetDesc.format = SLANG_SPIRV;
```
The `profile` field must be set with the ID of one of the profiles supported by the Slang compiler.
The exact numeric value of the different profiles is not currently stable across compiler versions, so applications should look up a chosen profile using `IGlobalSession::findProfile`.
For example:
```c++
targetDesc.profile = globalSession->findProfile("glsl_450");
```
Once the chosen `TargetDesc`s have been initialized, they can be attached to the `SessionDesc`:
```c++
sessionDesc.targets = &targetDesc;
sessionDesc.targetCount = 1;
```
#### Search Paths
The search paths on a session provide the paths where the compiler will look when trying to resolve a `#include` directive or `import` declaration.
The search paths can be set in the `SessionDesc` as an array of `const char*`:
```c++
const char* searchPaths[] = { "myapp/shaders/" };
sessionDesc.searchPaths = searchPaths;
sessionDesc.searchPathCount = 1;
```
#### Pre-Defined Macros
The pre-defined macros in a session will be visible at the start of each source unit that is compiled, including source units loaded via `import`.
Each pre-defined macro is described with a `PreprocessorMacroDesc`, which has `name` and `value` fields:
```c++
PreprocessorMacroDesc fancyFlag = { "ENABLE_FANCY_FEATURE", "1" };
sessionDesc.preprocessorMacros = &fancyFlag;
sessionDesc.preprocessorMacroCount = 1;
```
#### More Options
You can specify other compiler options for the session or for a specific target through the `compilerOptionEntries` and `compilerOptionEntryCount` fields
of the `SessionDesc` or `TargetDesc` structures. See the [Compiler Options](#compiler-options) section for more details on how to encode such an array.
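Putting the pieces above together, a complete session setup might look like the following sketch. It simply combines the target, search path, and macro examples from earlier in this section:
```c++
SessionDesc sessionDesc;

// Target: SPIR-V with the glsl_450 profile (from the Targets example above).
TargetDesc targetDesc;
targetDesc.format = SLANG_SPIRV;
targetDesc.profile = globalSession->findProfile("glsl_450");
sessionDesc.targets = &targetDesc;
sessionDesc.targetCount = 1;

// Search paths (from the Search Paths example above).
const char* searchPaths[] = { "myapp/shaders/" };
sessionDesc.searchPaths = searchPaths;
sessionDesc.searchPathCount = 1;

// Pre-defined macros (from the Pre-Defined Macros example above).
PreprocessorMacroDesc fancyFlag = { "ENABLE_FANCY_FEATURE", "1" };
sessionDesc.preprocessorMacros = &fancyFlag;
sessionDesc.preprocessorMacroCount = 1;

Slang::ComPtr<ISession> session;
globalSession->createSession(sessionDesc, session.writeRef());
```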
### Loading a Module
The simplest way to load code into a session is with `ISession::loadModule()`:
```c++
IModule* module = session->loadModule("MyShaders");
```
Executing `loadModule("MyShaders")` in host C++ code is similar to using `import MyShaders` in Slang code.
The session will search for a matching module (usually in a file called `MyShaders.slang`) and will load and compile it (if it hasn't been done already).
Note that `loadModule()` does not provide any ways to customize the compiler configuration for that specific module.
The preprocessor environment, search paths, and targets will always be those specified for the session.
### Capturing Diagnostic Output
Compilers produce various kinds of _diagnostic_ output when compiling code.
This includes not only error messages when compilation fails, but also warnings and other helpful messages that may be produced even for successful compiles.
Many operations in Slang, such as `ISession::loadModule()` can optionally produce a _blob_ of diagnostic output.
For example:
```c++
Slang::ComPtr<IBlob> diagnostics;
Slang::ComPtr<IModule> module(session->loadModule("MyShaders", diagnostics.writeRef()));
```
In this example, if any diagnostic messages were produced when loading `MyShaders`, then the `diagnostics` pointer will be set to a blob that contains the textual content of those diagnostics.
The content of a blob can be accessed with `getBufferPointer()`, and the size of the content can be accessed with `getBufferSize()`.
Diagnostic blobs produced by the Slang compiler are always null-terminated, so that they can be used with C-style string APIs:
```c++
if(diagnostics)
{
fprintf(stderr, "%s\n", (const char*) diagnostics->getBufferPointer());
}
```
> #### Note ####
> The `slang::IBlob` interface is binary-compatible with the `ID3D10Blob` and `ID3DBlob` interfaces used by some Direct3D compilation APIs.
### Entry Points
When using `loadModule()` applications should ensure that entry points in their shader code are always marked with appropriate `[shader(...)]` attributes.
For example, if `MyShaders.slang` contained:
```hlsl
[shader("compute")]
void myComputeMain(...) { ... }
```
then the Slang system will automatically detect and validate this entry point as part of a `loadModule("MyShaders")` call.
After a module has been loaded, the application can look up entry points in that module using `IModule::findEntryPointByName()`:
```c++
Slang::ComPtr<IEntryPoint> computeEntryPoint;
module->findEntryPointByName("myComputeMain", computeEntryPoint.writeRef());
```
### Composition
An application might load any number of modules with `loadModule()`, and those modules might contain any number of entry points.
Before GPU kernel code can be generated it is first necessary to decide which pieces of GPU code will be used together.
Both `slang::IModule` and `slang::IEntryPoint` inherit from `slang::IComponentType`, because both can be used as components when composing a shader program.
A composition can be created with `ISession::createCompositeComponentType()`:
```c++
IComponentType* components[] = { module, entryPoint };
Slang::ComPtr<IComponentType> program;
session->createCompositeComponentType(components, 2, program.writeRef());
```
As discussed earlier in this chapter, the composition operation serves two important purposes.
First, it establishes which code is part of a compiled shader program and which is not.
Second, it establishes an ordering for the code in a program, which can be used for layout.
### Layout and Reflection
Some applications need to perform reflection on shader parameters and their layout, whether at runtime or as part of an offline compilation tool.
The Slang API allows layout to be queried on any `IComponentType` using `getLayout()`:
```c++
slang::ProgramLayout* layout = program->getLayout();
```
> #### Note ####
> In the current Slang API, the `ProgramLayout` type is not reference-counted.
> Currently, the lifetime of a `ProgramLayout` is tied to the `IComponentType` that returned it.
> An application must ensure that it retains the given `IComponentType` for as long as it uses the `ProgramLayout`.
Note that because both `IModule` and `IEntryPoint` inherit from `IComponentType`, they can also be queried for their layouts individually.
The layout for a module comprises just its global-scope parameters.
The layout for an entry point comprises just its entry-point parameters (both `uniform` and varying).
The details of how Slang computes layout, what guarantees it makes, and how to inspect the reflection information will be discussed in a later chapter.
Because the layout computed for shader parameters may depend on the compilation target, the `getLayout()` method actually takes a `targetIndex` parameter that is the zero-based index of the target for which layout information is being queried.
This parameter defaults to zero as a convenience for the common case where applications use only a single compilation target at runtime.
See [Using the Reflection API](reflection) chapter for more details on the reflection API.
### Linking
Before generating code, you must link the program to resolve all cross-module references. This can be done by calling
`IComponentType::link`, or `IComponentType::linkWithOptions` if you wish to specify additional compiler options for the program.
For example:
```c++
Slang::ComPtr<IComponentType> linkedProgram;
Slang::ComPtr<ISlangBlob> diagnosticBlob;
program->link(linkedProgram.writeRef(), diagnosticBlob.writeRef());
```
The linking step is also used to perform link-time specialization, which is the recommended alternative to preprocessor-based
specialization. Please see [Link-time Specialization and Precompiled Modules](10-link-time-specialization.md) for more details.
Any diagnostic messages related to linking (for example, if an external symbol cannot be resolved) will be written to `diagnosticBlob`.
### Kernel Code
Given a linked `IComponentType`, an application can extract kernel code for one of its entry points using `IComponentType::getEntryPointCode()`:
```c++
int entryPointIndex = 0; // only one entry point
int targetIndex = 0; // only one target
Slang::ComPtr<IBlob> kernelBlob;
linkedProgram->getEntryPointCode(
entryPointIndex,
targetIndex,
kernelBlob.writeRef(),
diagnostics.writeRef());
```
Any diagnostic messages related to back-end code generation (for example, if the chosen entry point requires features not available on the chosen target) will be written to `diagnostics`.
The `kernelBlob` output is a `slang::IBlob` that can be used to access the generated code (whether binary or textual).
In many cases `kernelBlob->getBufferPointer()` can be passed directly to the appropriate graphics API to load kernel code onto a GPU.
## Multithreading
The only functions that are currently thread-safe are:
```C++
SlangSession* spCreateSession(const char* deprecated);
SlangResult slang_createGlobalSession(SlangInt apiVersion, slang::IGlobalSession** outGlobalSession);
SlangResult slang_createGlobalSession2(const SlangGlobalSessionDesc* desc, slang::IGlobalSession** outGlobalSession);
SlangResult slang_createGlobalSessionWithoutCoreModule(SlangInt apiVersion, slang::IGlobalSession** outGlobalSession);
ISlangBlob* slang_getEmbeddedCoreModule();
SlangResult slang::createGlobalSession(slang::IGlobalSession** outGlobalSession);
const char* spGetBuildTagString();
```
This assumes Slang has been built with the C++ multithreaded runtime, as is the default.
All other functions and methods are not [reentrant](https://en.wikipedia.org/wiki/Reentrancy_(computing)) and can only execute on a single thread. More precisely, functions and methods can only be called on a *single* thread at *any one time*. This means for example a global session can be used across multiple threads, as long as some synchronization enforces that only one thread can be in a Slang call at any one time.
Much of the Slang API is available through [COM interfaces](https://en.wikipedia.org/wiki/Component_Object_Model). In strict COM, interfaces should be atomically reference counted. Currently *MOST* Slang API COM interfaces are *NOT* atomically reference counted. One exception is the `ISlangSharedLibrary` interface when produced from [host-callable](../cpu-target.md#host-callable): it is atomically reference counted, which allows it to persist beyond the original compilation and be freed on a different thread.
## Compiler Options
Both the `SessionDesc` and `TargetDesc` structures contain fields that encode a `CompilerOptionEntry` array for additional compiler options to apply to the session or the target. In addition,
the `IComponentType::linkWithOptions()` method allows you to specify additional compiler options when linking a program. All these places accept the same encoding of compiler options, which is
documented in this section.
The `CompilerOptionEntry` structure is defined as follows:
```c++
struct CompilerOptionEntry
{
CompilerOptionName name;
CompilerOptionValue value;
};
```
Where `CompilerOptionName` is an `enum` specifying the compiler option to set, and `value` encodes the value of the option.
`CompilerOptionValue` is a structure that allows you to encode up to two integer or string values for a compiler option:
```c++
enum class CompilerOptionValueKind
{
Int,
String
};
struct CompilerOptionValue
{
CompilerOptionValueKind kind = CompilerOptionValueKind::Int;
int32_t intValue0 = 0;
int32_t intValue1 = 0;
const char* stringValue0 = nullptr;
const char* stringValue1 = nullptr;
};
```
The meaning of each integer or string value is dependent on the compiler option. The following table lists all available compiler options that can be set, together with the meaning of their
`CompilerOptionValue` encodings.
|CompilerOptionName | Description |
|:------------------ |:----------- |
| MacroDefine | Specifies a preprocessor macro define entry. `stringValue0` encodes the macro name, `stringValue1` encodes the macro value. |
| Include | Specifies an additional search path. `stringValue0` encodes the additional path. |
| Language | Specifies the input language. `intValue0` encodes a value defined in `SlangSourceLanguage`. |
| MatrixLayoutColumn | Use column major matrix layout as default. `intValue0` encodes a bool value for the setting. |
| MatrixLayoutRow | Use row major matrix layout as default. `intValue0` encodes a bool value for the setting. |
| Profile | Specifies the target profile. `intValue0` encodes the raw profile representation returned by `IGlobalSession::findProfile()`. |
| Stage | Specifies the target entry point stage. `intValue0` encodes the stage defined in `SlangStage` enum. |
| Target | Specifies the target format. Has same effect as setting TargetDesc::format. |
| WarningsAsErrors | Specifies a list of warnings to be treated as errors. `stringValue0` encodes a comma separated list of warning codes or names, or can be "all" to indicate all warnings. |
| DisableWarnings | Specifies a list of warnings to disable. `stringValue0` encodes comma separated list of warning codes or names. |
| EnableWarning | Specifies a list of warnings to enable. `stringValue0` encodes comma separated list of warning codes or names. |
| DisableWarning | Specifies a single warning to disable. `stringValue0` encodes the warning code or name. |
| ReportDownstreamTime | Turn on/off downstream compilation time report. `intValue0` encodes a bool value for the setting. |
| ReportPerfBenchmark | Turn on/off reporting of time spent in different parts of the compiler. `intValue0` encodes a bool value for the setting. |
| SkipSPIRVValidation | Specifies whether or not to skip the validation step after emitting SPIRV. `intValue0` encodes a bool value for the setting. |
| Capability | Specify an additional capability available in the compilation target. `intValue0` encodes a capability defined in the `CapabilityName` enum. |
| DefaultImageFormatUnknown | Whether or not to use `unknown` as the image format when emitting SPIRV for a texture/image resource parameter without a format specifier. `intValue0` encodes a bool value for the setting. |
| DisableDynamicDispatch | (Internal use only) Disables generation of dynamic dispatch code. `intValue0` encodes a bool value for the setting. |
| DisableSpecialization | (Internal use only) Disables specialization pass. `intValue0` encodes a bool value for the setting. |
| FloatingPointMode | Specifies the floating point mode. `intValue0` encodes the floating point mode defined in the `SlangFloatingPointMode` enum. |
| DebugInformation | Specifies the level of debug information to include in the generated code. `intValue0` encodes a value defined in the `SlangDebugInfoLevel` enum. |
| LineDirectiveMode | Specifies the line directive mode to use in generated textual code such as HLSL or CUDA. `intValue0` encodes a value defined in the `SlangLineDirectiveMode` enum. |
| Optimization | Specifies the optimization level. `intValue0` encodes the value for the setting defined in the `SlangOptimizationLevel` enum. |
| Obfuscate | Specifies whether or not to turn on obfuscation. When obfuscation is on, Slang will strip variable and function names from the target code and replace them with hash values. `intValue0` encodes a bool value for the setting. |
| VulkanBindShift | Specifies the `-fvk-bind-shift` option. `intValue0` (higher 8 bits): kind, `intValue0` (lower bits): set; `intValue1`: shift. |
| VulkanBindGlobals | Specifies the `-fvk-bind-globals` option. `intValue0`: index, `intValue1`: set. |
| VulkanInvertY | Specifies the `-fvk-invert-y` option. `intValue0` specifies a bool value for the setting. |
| VulkanUseDxPositionW | Specifies the `-fvk-use-dx-position-w` option. `intValue0` specifies a bool value for the setting. |
| VulkanUseEntryPointName | When set, will keep the original name of entrypoints as they are defined in the source instead of renaming them to `main`. `intValue0` specifies a bool value for the setting. |
| VulkanUseGLLayout | When set, will use std430 layout instead of D3D buffer layout for raw buffer load/stores. `intValue0` specifies a bool value for the setting. |
| VulkanEmitReflection | Specifies the `-fspv-reflect` option. When set will include additional reflection instructions in the output SPIRV. `intValue0` specifies a bool value for the setting. |
| GLSLForceScalarLayout | Specifies the `-force-glsl-scalar-layout` option. When set will use `scalar` layout for all buffers when generating SPIRV. `intValue0` specifies a bool value for the setting. |
| EnableEffectAnnotations | When set will turn on compatibility mode to parse legacy HLSL effect annotation syntax. `intValue0` specifies a bool value for the setting. |
| EmitSpirvViaGLSL | When set will emit SPIRV by emitting GLSL first and then use glslang to produce the final SPIRV code. `intValue0` specifies a bool value for the setting. |
| EmitSpirvDirectly | When set will use Slang's direct-to-SPIRV backend to generate SPIRV directly from Slang IR. `intValue0` specifies a bool value for the setting. |
| SPIRVCoreGrammarJSON | When set will use the provided SPIRV grammar file to parse SPIRV assembly blocks. `stringValue0` specifies a path to the spirv core grammar json file. |
| IncompleteLibrary | When set will not issue an error when the linked program has unresolved extern function symbols. `intValue0` specifies a bool value for the setting. |
| DownstreamArgs | Provide additional arguments to the downstream compiler. `stringValue0` encodes the downstream compiler name, `stringValue1` encodes the argument list, one argument per line. |
| DumpIntermediates | When set will dump the intermediate source output. `intValue0` specifies a bool value for the setting. |
| DumpIntermediatePrefix | The file name prefix for the intermediate source output. `stringValue0` specifies a string value for the setting. |
| DebugInformationFormat | Specifies the format of debug info. `intValue0` a value defined in the `SlangDebugInfoFormat` enum. |
| VulkanBindShiftAll | Specifies the `-fvk-bind-shift` option for all spaces. `intValue0`: kind, `intValue1`: shift. |
| GenerateWholeProgram | When set will emit target code for the entire program instead of for a specific entrypoint. `intValue0` specifies a bool value for the setting. |
| UseUpToDateBinaryModule | When set will only load precompiled modules that are up-to-date with their source. `intValue0` specifies a bool value for the setting. |
| ValidateUniformity | When set will perform [uniformity analysis](a1-05-uniformity.md).|
## Debugging
Slang's SPIRV backend supports generating debug information using the [NonSemantic Shader DebugInfo Instructions](https://github.com/KhronosGroup/SPIRV-Registry/blob/main/nonsemantic/NonSemantic.Shader.DebugInfo.100.asciidoc).
To enable debugging information when targeting SPIRV, specify the `-emit-spirv-directly` and the `-g2` argument when using `slangc` tool, or set `EmitSpirvDirectly` to `1` and `DebugInformation` to `SLANG_DEBUG_INFO_LEVEL_STANDARD` when using the API.
Debugging support has been tested with RenderDoc.