author John Ericson <John.Ericson@Obsidian.Systems> 2019-03-20 18:21:00 -0400
committer John Ericson <git@JohnEricson.me> 2019-03-24 22:12:21 -0400
commit 5e5266f83fc2cce2b353601da0f29bd6805d4597
tree 5a3ab6a32817791808abb6d455e8be831a3f9ef2 /doc/cross-compilation.xml
parent 655a29ff9ccf9b27e52893de24f9535bda7e3cd2
manual: Document `pkgsFooBar` and more
There was a bunch of stuff in the cross section that hadn't had any
attention in a while. I might need to slim it down later, but this is
good for now.
Diffstat (limited to 'doc/cross-compilation.xml')
-rw-r--r-- doc/cross-compilation.xml 374
1 files changed, 291 insertions, 83 deletions
diff --git a/doc/cross-compilation.xml b/doc/cross-compilation.xml
index dbaf6f104ec..d97f12f2566 100644
--- a/doc/cross-compilation.xml
+++ b/doc/cross-compilation.xml
@@ -12,11 +12,12 @@
    computing power and memory to compile their own programs. One might think
    that cross-compilation is a fairly niche concern. However, there are
    significant advantages to rigorously distinguishing between build-time and
-   run-time environments! This applies even when one is developing and
-   deploying on the same machine. Nixpkgs is increasingly adopting the opinion
-   that packages should be written with cross-compilation in mind, and nixpkgs
-   should evaluate in a similar way (by minimizing cross-compilation-specific
-   special cases) whether or not one is cross-compiling.
+   run-time environments! Significant, because the benefits apply even when one
+   is developing and deploying on the same machine. Nixpkgs is increasingly
+   adopting the opinion that packages should be written with cross-compilation
+   in mind, and nixpkgs should evaluate in a similar way (by minimizing
+   cross-compilation-specific special cases) whether or not one is
+   cross-compiling.
   </para>
 
   <para>
@@ -30,7 +31,7 @@
  <section xml:id="sec-cross-packaging">
   <title>Packaging in a cross-friendly manner</title>
 
-  <section xml:id="sec-cross-platform-parameters">
+  <section xml:id="ssec-cross-platform-parameters">
    <title>Platform parameters</title>
 
    <para>
@@ -218,8 +219,20 @@
    </variablelist>
   </section>
 
-  <section xml:id="sec-cross-specifying-dependencies">
-   <title>Specifying Dependencies</title>
+  <section xml:id="ssec-cross-dependency-categorization">
+   <title>Theory of dependency categorization</title>
+
+   <note>
+    <para>
+     This is a rather philosophical description that isn't very
+     Nixpkgs-specific. For an overview of all the relevant attributes given to
+     <varname>mkDerivation</varname>, see
+     <xref
+     linkend="ssec-stdenv-dependencies"/>. For a description of how
+     everything is implemented, see
+     <xref linkend="ssec-cross-dependency-implementation" />.
+    </para>
+   </note>
 
    <para>
     In this section we explore the relationship between both runtime and
@@ -227,84 +240,98 @@
    </para>
 
    <para>
-    A runtime dependency between 2 packages implies that between them both the
-    host and target platforms match. This is directly implied by the meaning of
-    "host platform" and "runtime dependency": The package dependency exists
-    while both packages are running on a single host platform.
+    A run time dependency between two packages requires that their host
+    platforms match. This is directly implied by the meaning of "host platform"
+    and "runtime dependency": The package dependency exists while both packages
+    are running on a single host platform.
    </para>
 
    <para>
-    A build time dependency, however, implies a shift in platforms between the
-    depending package and the depended-on package. The meaning of a build time
-    dependency is that to build the depending package we need to be able to run
-    the depended-on's package. The depending package's build platform is
-    therefore equal to the depended-on package's host platform. Analogously,
-    the depending package's host platform is equal to the depended-on package's
-    target platform.
+    A build time dependency, however, involves a shift in platforms between
+    the depending package and the depended-on package. "Build time dependency"
+    means that to build the depending package we need to be able to run the
+    depended-on package. The depending package's build platform is therefore
+    equal to the depended-on package's host platform.
    </para>
 
    <para>
-    In this manner, given the 3 platforms for one package, we can determine the
-    three platforms for all its transitive dependencies. This is the most
-    important guiding principle behind cross-compilation with Nixpkgs, and will
-    be called the <wordasword>sliding window principle</wordasword>.
+    If neither the dependency nor the depending package is a compiler or other
+    machine-code-producing tool, we're done. And indeed
+    <varname>buildInputs</varname> and <varname>nativeBuildInputs</varname>
+    have covered these simpler run-time and build-time (respectively) cases
+    for many years. But if the dependency does produce machine code, we might
+    need to worry about its target platform too. In principle, that target
+    platform might be any of the depending package's build, host, or target
+    platforms, but we prohibit dependencies from a "later" platform to an
+    earlier platform to limit confusion because we've never seen a legitimate
+    use for them.
    </para>
 
    <para>
-    Some examples will make this clearer. If a package is being built with a
-    <literal>(build, host, target)</literal> platform triple of <literal>(foo,
-    bar, bar)</literal>, then its build-time dependencies would have a triple
-    of <literal>(foo, foo, bar)</literal>, and <emphasis>those
-    packages'</emphasis> build-time dependencies would have a triple of
-    <literal>(foo, foo, foo)</literal>. In other words, it should take two
-    "rounds" of following build-time dependency edges before one reaches a
-    fixed point where, by the sliding window principle, the platform triple no
-    longer changes. Indeed, this happens with cross-compilation, where only
-    rounds of native dependencies starting with the second necessarily coincide
-    with native packages.
+    Finally, if the depending package is a compiler or other
+    machine-code-producing tool, it might need dependencies that run at "emit
+    time". This is for compilers that (regrettably) insist on being built
+    together with their source languages' standard libraries. Assuming build !=
+    host != target, a run-time dependency of the standard library cannot be run
+    at the compiler's build time or run time, but only at the run time of code
+    emitted by the compiler.
    </para>
 
-   <note>
-    <para>
-     The depending package's target platform is unconstrained by the sliding
-     window principle, which makes sense in that one can in principle build
-     cross compilers targeting arbitrary platforms.
-    </para>
-   </note>
-
    <para>
-    How does this work in practice? Nixpkgs is now structured so that
-    build-time dependencies are taken from <varname>buildPackages</varname>,
-    whereas run-time dependencies are taken from the top level attribute set.
-    For example, <varname>buildPackages.gcc</varname> should be used at
-    build-time, while <varname>gcc</varname> should be used at run-time. Now,
-    for most of Nixpkgs's history, there was no
-    <varname>buildPackages</varname>, and most packages have not been
-    refactored to use it explicitly. Instead, one can use the six
-    (<emphasis>gasp</emphasis>) attributes used for specifying dependencies as
-    documented in <xref linkend="ssec-stdenv-dependencies"/>. We "splice"
-    together the run-time and build-time package sets with
-    <varname>callPackage</varname>, and then <varname>mkDerivation</varname>
-    for each of four attributes pulls the right derivation out. This splicing
-    can be skipped when not cross-compiling as the package sets are the same,
-    but is a bit slow for cross-compiling. Because of this, a
-    best-of-both-worlds solution is in the works with no splicing or explicit
-    access of <varname>buildPackages</varname> needed. For now, feel free to
-    use either method.
+    Putting this all together, that means we have dependencies in the form
+    "host → target", in at most the following six combinations:
+    <table>
+     <caption>Possible dependency types</caption>
+     <thead>
+      <tr>
+       <th>Dependency's host platform</th>
+       <th>Dependency's target platform</th>
+      </tr>
+     </thead>
+     <tbody>
+      <tr>
+       <td>build</td>
+       <td>build</td>
+      </tr>
+      <tr>
+       <td>build</td>
+       <td>host</td>
+      </tr>
+      <tr>
+       <td>build</td>
+       <td>target</td>
+      </tr>
+      <tr>
+       <td>host</td>
+       <td>host</td>
+      </tr>
+      <tr>
+       <td>host</td>
+       <td>target</td>
+      </tr>
+      <tr>
+       <td>target</td>
+       <td>target</td>
+      </tr>
+     </tbody>
+    </table>
    </para>
 
-   <note>
-    <para>
-     There is also a "backlink" <varname>targetPackages</varname>, yielding a
-     package set whose <varname>buildPackages</varname> is the current package
-     set. This is a hack, though, to accommodate compilers with lousy build
-     systems. Please do not use this unless you are absolutely sure you are
-     packaging such a compiler and there is no other way.
-    </para>
-   </note>
+   <para>
+    Some examples will make this table clearer. Suppose there's some package
+    that is being built with a <literal>(build, host, target)</literal>
+    platform triple of <literal>(foo, bar, baz)</literal>. If it has a
+    build-time library dependency, that would be a "build → *" dependency
+    with a triple of <literal>(foo, foo, *)</literal> (the target platform is
+    irrelevant). If it needs a compiler to be built, that would be a "build →
+    host" dependency with a triple of <literal>(foo, foo, bar)</literal> (the
+    compiler emits code for the package's host platform). That compiler would
+    be built with another compiler, also a "build → host" dependency, with a
+    triple of <literal>(foo, foo, foo)</literal>.
+   </para>
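+
+   <para>
+    As a minimal sketch of how these categories surface in a package
+    expression (the package name, version, and choice of inputs are
+    hypothetical, not taken from Nixpkgs), a tool that only runs during the
+    build goes in <varname>nativeBuildInputs</varname> ("build → host"), while
+    a library linked into the result goes in <varname>buildInputs</varname>
+    ("host → target"):
+<programlisting>
+{ stdenv, cmake, zlib }:
+
+stdenv.mkDerivation {
+  name = "example-0.1"; # hypothetical package
+  src = ./.;
+
+  # Runs on the build platform while building: "build → host".
+  nativeBuildInputs = [ cmake ];
+
+  # Linked into the output, so it runs on the host platform: "host → target".
+  buildInputs = [ zlib ];
+}
+</programlisting>
+   </para>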
   </section>
 
-  <section xml:id="sec-cross-cookbook">
+  <section xml:id="ssec-cross-cookbook">
    <title>Cross packaging cookbook</title>
 
    <para>
@@ -450,21 +477,202 @@ nix-build &lt;nixpkgs&gt; --arg crossSystem '{ config = "&lt;arch&gt;-&lt;os&gt;
  <section xml:id="sec-cross-infra">
   <title>Cross-compilation infrastructure</title>
 
-  <para>
-   To be written.
-  </para>
+  <section xml:id="ssec-cross-dependency-implementation">
+   <title>Implementation of dependencies</title>
 
-  <note>
    <para>
-    If one explores Nixpkgs, they will see derivations with names like
-    <literal>gccCross</literal>. Such <literal>*Cross</literal> derivations is
-    a holdover from before we properly distinguished between the host and
-    target platforms—the derivation with "Cross" in the name covered the
-    <literal>build = host != target</literal> case, while the other covered the
-    <literal>host = target</literal>, with build platform the same or not based
-    on whether one was using its <literal>.nativeDrv</literal> or
-    <literal>.crossDrv</literal>. This ugliness will disappear soon.
+    The categories of dependencies developed in
+    <xref
+    linkend="ssec-cross-dependency-categorization"/> are specified as
+    lists of derivations given to <varname>mkDerivation</varname>, as
+    documented in <xref linkend="ssec-stdenv-dependencies"/>. In short,
+    each list of dependencies for "host → target" of "foo → bar" is called
+    <varname>depsFooBar</varname>, with the exceptions for backwards
+    compatibility that <varname>depsBuildHost</varname> is instead called
+    <varname>nativeBuildInputs</varname> and <varname>depsHostTarget</varname>
+    is instead called <varname>buildInputs</varname>. Nixpkgs is now structured
+    so that each <varname>depsFooBar</varname> is automatically taken from
+    <varname>pkgsFooBar</varname>. (These <varname>pkgsFooBar</varname>s are
+    quite new, so there is no special case for
+    <varname>nativeBuildInputs</varname> and <varname>buildInputs</varname>.)
+    For example, <varname>pkgsBuildHost.gcc</varname> should be used at
+    build-time, while <varname>pkgsHostTarget.gcc</varname> should be used at
+    run-time.
    </para>
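+
+   <para>
+    For illustration, the two compilers can be selected explicitly from a
+    cross package set. This is only a sketch: the
+    <literal>aarch64-unknown-linux-gnu</literal> platform is an arbitrary
+    assumption, and most packages should not need to do this by hand.
+<programlisting>
+let
+  # Hypothetical cross package set; any foreign platform config would do.
+  pkgs = import &lt;nixpkgs&gt; {
+    crossSystem = { config = "aarch64-unknown-linux-gnu"; };
+  };
+in {
+  # Runs on the build platform and emits code for the host platform.
+  buildTimeCompiler = pkgs.pkgsBuildHost.gcc;
+
+  # Runs on the host platform and emits code for the target platform.
+  runTimeCompiler = pkgs.pkgsHostTarget.gcc;
+}
+</programlisting>
+   </para>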
-  </note>
+
+   <para>
+    Now, for most of Nixpkgs's history, there were no
+    <varname>pkgsFooBar</varname> attributes, and most packages have not been
+    refactored to use them explicitly. Prior to those, there were just
+    <varname>buildPackages</varname>, <varname>pkgs</varname>, and
+    <varname>targetPackages</varname>. Those are now redefined as aliases to
+    <varname>pkgsBuildHost</varname>, <varname>pkgsHostTarget</varname>, and
+    <varname>pkgsTargetTarget</varname>. It is fine, indeed if anything
+    recommended, to use them for libraries to show that the host platform is
+    irrelevant.
+   </para>
+
+   <para>
+    But before that, there was just <varname>pkgs</varname>, even though both
+    <varname>buildInputs</varname> and <varname>nativeBuildInputs</varname>
+    existed. [Cross barely worked, and those were implemented with some hacks
+    on <varname>mkDerivation</varname> to override dependencies.] What this
+    means is the vast majority of packages do not use any explicit package set
+    to populate their dependencies, just using whatever
+    <varname>callPackage</varname> gives them even if they do correctly sort
+    their dependencies into the multiple lists described above. And indeed,
+    asking that users both sort their dependencies, <emphasis>and</emphasis>
+    take them from the right attribute set, is both too onerous and redundant,
+    so the recommended approach (for now) is to continue just categorizing by
+    list and not using an explicit package set.
+   </para>
+
+   <para>
+    To make this work, we "splice" together the six
+    <varname>pkgsFooBar</varname> package sets and have
+    <varname>callPackage</varname> actually take its arguments from that. This
+    is currently implemented in <filename>pkgs/top-level/splice.nix</filename>.
+    <varname>mkDerivation</varname> then, for each dependency attribute, pulls
+    the right derivation out from the splice. This splicing can be skipped when
+    not cross-compiling as the package sets are the same, but still is a bit
+    slow for cross-compiling. We'd like to do something better, but haven't
+    come up with anything yet.
+   </para>
+  </section>
+
+  <section xml:id="ssec-bootstrapping">
+   <title>Bootstrapping</title>
+
+   <para>
+    Each of the package sets described above comes from a single bootstrapping
+    stage. While <filename>pkgs/top-level/default.nix</filename> coordinates
+    the composition of stages at a high level,
+    <filename>pkgs/top-level/stage.nix</filename> "ties the knot" (creates the
+    fixed point) of each stage. The package sets are defined per-stage however,
+    so they can be thought of as edges between stages (the nodes) in a graph.
+    Compositions like <literal>pkgsBuildTarget.targetPackages</literal> can be
+    thought of as paths in this graph.
+   </para>
+
+   <para>
+    While there are many package sets, and thus many edges, the stages can also
+    be arranged in a linear chain. In other words, many of the edges are
+    redundant as far as connectivity is concerned. This hinges on the type of
+    bootstrapping we do. Currently for cross it is:
+    <orderedlist>
+     <listitem>
+      <para>
+       <literal>(native, native, native)</literal>
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <literal>(native, native, foreign)</literal>
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <literal>(native, foreign, foreign)</literal>
+      </para>
+     </listitem>
+    </orderedlist>
+    In each stage, <varname>pkgsBuildHost</varname> refers to the previous
+    stage, <varname>pkgsBuildBuild</varname> refers to the one before that,
+    <varname>pkgsHostTarget</varname> refers to the current one, and
+    <varname>pkgsTargetTarget</varname> refers to the next one. When there is
+    no previous or next stage, they instead refer to the current stage. Note
+    how all the invariants about the mapping between dependency and depending
+    packages' build, host, and target platforms are preserved.
+    <varname>pkgsBuildTarget</varname> and <varname>pkgsHostHost</varname> are
+    more complex in that the stage fitting the requirements isn't always a
+    fixed chain of "prevs" and "nexts" away (modulo the "saturating"
+    self-references at the ends). We just special-case them instead. All the
+    primary edges are implemented in <filename>pkgs/stdenv/booter.nix</filename>,
+    and the secondary aliases in <filename>pkgs/top-level/stage.nix</filename>.
+   </para>
+
+   <note>
+    <para>
+     Note the native stages are bootstrapped in legacy ways that predate the
+     current cross implementation. This is why the bootstrapping stages
+     leading up to the final stages are ignored in the previous paragraph.
+    </para>
+   </note>
+
+   <para>
+    If one looks at the 3 platform triples, one can see that they overlap such
+    that one could put them together into a chain like:
+<programlisting>
+(native, native, native, foreign, foreign)
+</programlisting>
+    If one imagines the saturating self references at the end being replaced
+    with infinite stages, and then overlays those platform triples, one ends up
+    with the infinite tuple:
+<programlisting>
+(native..., native, native, native, foreign, foreign, foreign...)
+</programlisting>
+    One can then imagine any sequence of platforms such that there are
+    bootstrap stages with their 3 platforms determined by "sliding a window"
+    that is the 3-tuple through the sequence. This was the original model for
+    bootstrapping. Without a target platform (assume a better world where all
+    compilers are multi-target and all standard libraries are built in their
+    own derivation), this is sufficient. Conversely, if one wishes to cross
+    compile "faster", with a "Canadian Cross" bootstrapping stage where
+    <literal>build != host != target</literal>, more bootstrapping stages are
+    needed, since no sliding window provides the pesky
+    <varname>pkgsBuildTarget</varname> package set: it skips the Canadian
+    Cross stage's "host".
+   </para>
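+
+   <para>
+    The sliding window can be sketched in plain Nix. This is an illustration
+    of the idea only, not how <filename>pkgs/stdenv/booter.nix</filename> is
+    written; the names <varname>platforms</varname> and
+    <varname>window</varname> are made up for the example.
+<programlisting>
+let
+  # The chain of platforms from the cross bootstrap described above.
+  platforms = [ "native" "native" "native" "foreign" "foreign" ];
+
+  # The stage whose build platform sits at index i in the chain.
+  window = i: {
+    build = builtins.elemAt platforms i;
+    host = builtins.elemAt platforms (i + 1);
+    target = builtins.elemAt platforms (i + 2);
+  };
+in
+  # One stage per position of the 3-wide window.
+  builtins.genList window (builtins.length platforms - 2)
+</programlisting>
+    Evaluating this should give the three stages listed earlier, in order.
+   </para>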
+
+   <note>
+    <para>
+     It is much better to refer to <varname>buildPackages</varname> than
+     <varname>targetPackages</varname>, or more broadly package sets that do
+     not mention "target". There are three reasons for this.
+    </para>
+    <para>
+     First, it is because bootstrapping stages do not have a unique
+     <varname>targetPackages</varname>. For example a <literal>(x86-linux,
+     x86-linux, arm-linux)</literal> and <literal>(x86-linux, x86-linux,
+     x86-windows)</literal> package set both have a <literal>(x86-linux,
+     x86-linux, x86-linux)</literal> package set. Because there is no canonical
+     <varname>targetPackages</varname> for such a native (<literal>build ==
+     host == target</literal>) package set, we set their
+     <varname>targetPackages</varname>
+    </para>
+    <para>
+     Second, it is because this is a frequent source of hard-to-follow
+     "infinite recursions" / cycles. When only package sets that don't mention
+     target are used, the package set forms a directed acyclic graph. This
+     means that all cycles that exist are confined to one stage. This means
+     they are a lot smaller, so easier to follow in the code or a backtrace. It
+     also means they are present in native and cross builds alike, and so more
+     likely to be caught by CI and other users.
+    </para>
+    <para>
+     Third, it is because everything target-mentioning only exists to
+     accommodate compilers with lousy build systems that insist on the compiler
+     itself and standard library being built together. Of course that is bad
+     because bigger derivations mean longer rebuilds. It is also subpar because
+     it tends to make the standard libraries less like other libraries than
+     they could be, complicating code and build systems alike. Because of the
+     other problems, and because of these innate disadvantages, compilers ought
+     to be packaged another way where possible.
+    </para>
+   </note>
+
+   <note>
+    <para>
+     If one explores Nixpkgs, they will see derivations with names like
+     <literal>gccCross</literal>. Such <literal>*Cross</literal> derivations are
+     a holdover from before we properly distinguished between the host and
+     target platforms—the derivation with "Cross" in the name covered the
+     <literal>build = host != target</literal> case, while the other covered
+     the <literal>host = target</literal>, with build platform the same or not
+     based on whether one was using its <literal>.nativeDrv</literal> or
+     <literal>.crossDrv</literal>. This ugliness will disappear soon.
+    </para>
+   </note>
+  </section>
  </section>
 </chapter>