William Harrison
Introduction to Compiler Construction
University of Missouri, Spring 2013
These slides were graciously provided by Helmut Seidl.

0 Introduction

Principle of Interpretation:

    Program + Input  →  Interpreter  →  Output

Advantage: No precomputation on the program text ⟹ no/short startup time.
Disadvantages: Program parts are repeatedly analyzed during execution, and access to program variables is less efficient ⟹ slower execution speed.

Principle of Compilation:

    Program  →  Compiler  →  Code
    Input    →  Code      →  Output

Two Phases (at two different times):
• Translation of the source program into a machine program (at compile time);
• Execution of the machine program on input data (at run time).

Preprocessing of the source program provides for
• efficient access to the values of program variables at run time;
• global program transformations to increase execution speed.

Disadvantage: Compilation takes time.
Advantage: Program execution is sped up ⟹ compilation pays off for long-running or often-run programs.

Structure of a compiler:

    Source program
      →  Frontend
      →  Internal representation (syntax tree)
      →  Optimizations
      →  Internal representation
      →  Code generation
      →  Program for target machine

Subtasks in code generation:
The goal is a good exploitation of the hardware resources:
1. Instruction Selection: Selection of efficient, semantically equivalent instruction sequences;
2. Register Allocation: Best use of the available processor registers;
3. Instruction Scheduling: Reordering of the instruction stream to exploit intra-processor parallelism.

For several reasons, e.g. modularization of code generation and portability, code generation may be split into two phases:

    Intermediate representation  →  Code generation  →  abstract machine code
    abstract machine code        →  Compiler         →  concrete machine code
    alternatively:
    abstract machine code + Input  →  Interpreter  →  Output

Virtual machine
• idealized architecture,
• simple code generation,
• easily implemented on real hardware.
Advantages:
• Porting the compiler to a new target architecture is simpler,
• Modularization makes the compiler easier to modify,
• Translation of program constructs is separated from the exploitation of architectural features.

Virtual (or: abstract) machines for some programming languages:

    Pascal        →  P-machine
    Smalltalk     →  Bytecode
    Prolog        →  WAM (“Warren Abstract Machine”)
    SML, Haskell  →  STGM
    Java          →  JVM

We will consider the following languages and virtual machines:

    C                 →  CMa           // imperative
    PuF               →  MaMa          // functional
    Proll             →  WiM           // logic based
    C±                →  OMa           // object oriented
    multi-threaded C  →  threaded CMa  // concurrent

The Translation of C

1 The Architecture of the CMa

• Each virtual machine provides a set of instructions.
• Instructions are executed on the virtual hardware.
• This virtual hardware can be viewed as a set of data structures, which the instructions access
• ... and which are managed by the run-time system.

For the CMa we need:

The Data Store:

[Figure: the store S with cells numbered from 0 and the register SP pointing to the topmost cell]

• S is the (data) store, onto which new cells are allocated in a LIFO discipline ⟹ Stack.
• SP (≙ Stack Pointer) is a register which contains the address of the topmost allocated cell.
Simplification: All types of data fit into one cell of S.

The Code/Instruction Store:

[Figure: the code store C with cells 0, 1, ... and the register PC]

• C is the code store, which contains the program. Each cell of C can store exactly one virtual instruction.
• PC (≙ Program Counter) is a register which contains the address of the instruction to be executed next.
• Initially, PC contains the address 0 ⟹ C[0] contains the instruction to be executed first.

Execution of Programs:
• The machine loads the instruction in C[PC] into an Instruction Register IR and executes it.
• PC is incremented by 1 before the execution of the instruction:

    while (true) {
        IR = C[PC]; PC++;
        execute (IR);
    }

• The execution of the instruction may overwrite the PC (jumps).
• The Main Cycle of the machine is halted by executing the instruction halt, which returns control to the environment, e.g. the operating system.
• More instructions will be introduced on demand.

2 Simple expressions and assignments

Problem: evaluate the expression (1 + 7) ∗ 3!
This means: generate an instruction sequence which
• determines the value of the expression and
• pushes it on top of the stack.

Idea:
• first compute the values of the subexpressions,
• save these values on top of the stack,
• then apply the operator.

The general principle:
• instructions expect their arguments on top of the stack,
• execution of an instruction consumes its operands,
• results, if any, are stored on top of the stack.

    loadc q:    SP++;
                S[SP] = q;

The instruction loadc q needs no operand on top of the stack; it pushes the constant q onto the stack.
Note: the content of register SP is only implicitly represented, namely through the height of the stack.

    mul:    SP--;
            S[SP] = S[SP] ∗ S[SP+1];

[Figure: mul replaces the two topmost cells 3 and 8 by 24]

mul expects two operands on top of the stack, consumes both, and pushes their product onto the stack.
... the other binary arithmetic and logical instructions, add, sub, div, mod, and, or and xor, work analogously, as do the comparison instructions eq, neq, le, leq, gr and geq.

Example: The operator leq

[Figure: leq consumes the operands 3 and 7 and leaves 1]

Remark: 0 represents false, all other integers true.

The unary operators neg and not consume one operand and produce one result.

    neg:    S[SP] = −S[SP];

[Figure: neg replaces 8 by −8]

Example: Code for 1 + 7:

    loadc 1
    loadc 7
    add

Execution of this code sequence:

[Figure: after loadc 1 the stack holds 1; after loadc 7 it holds 1 and 7; after add it holds 8]

Variables are associated with cells in S:

[Figure: cells of S labeled x, y and z]

Code generation will be described by some translation functions, code, codeL, and codeR.
Arguments: a program construct and a function ρ. ρ delivers for each variable x the relative address of x. ρ is called the Address Environment.

Variables can be used in two different ways:
Example: x = y + 1
We are interested in the value of y, but in the address of x.
The syntactic position determines whether the L-value or the R-value of a variable is required.

    L-value of x  =  address of x
    R-value of x  =  content of x

    codeR e ρ    produces code to compute the R-value of e in the address environment ρ
    codeL e ρ    analogously for the L-value

Note: Not every expression has an L-value (Ex.: x + 1).

We define:

    codeR (e1 + e2) ρ  =  codeR e1 ρ
                          codeR e2 ρ
                          add
    ... analogously for the other binary operators

    codeR (−e) ρ       =  codeR e ρ
                          neg
    ... analogously for the other unary operators

    codeR q ρ          =  loadc q
    codeL x ρ          =  loadc (ρ x)
    ...
    codeR x ρ          =  codeL x ρ
                          load

The instruction load loads the contents of the cell whose address is on top of the stack.

    load:    S[SP] = S[S[SP]];

[Figure: effect of load on the stack]

    codeR (x = e) ρ    =  codeR e ρ
                          codeL x ρ
                          store

store writes the contents of the second topmost stack cell into the cell whose address is on top of the stack, and leaves the written value on top of the stack.
Note: this differs from the code generated by gcc ??

    store:    S[S[SP]] = S[SP-1];
              SP--;

[Figure: effect of store on the stack]

Example: Code for e ≡ x = y − 1 with ρ = {x ↦ 4, y ↦ 7}. codeR e ρ produces:

    loadc 7
    load
    loadc 1
    sub
    loadc 4
    store

Improvements: Introduction of special instructions for frequently used instruction sequences, e.g.,

    loada q   =  loadc q
                 load
    storea q  =  loadc q
                 store

3 Statements and Statement Sequences

If e is an expression, then e; is a statement.
Statements do not deliver a value. The contents of the SP before and after the execution of the generated code must therefore be the same.

    code e; ρ  =  codeR e ρ
                  pop

The instruction pop eliminates the top element of the stack.
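The translation functions codeR and codeL can be read as a recursive walk over a syntax tree. The following is a minimal sketch of that reading, not the slides' own implementation: the type `struct expr`, the helper `emit` and the function names `code_R`/`code_L` are hypothetical, only constants, variables, `+`, `−` and assignment are covered, and the address environment ρ is assumed to be already resolved, so that a variable node stores its address ρ x directly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical syntax tree; a VAR node carries the address rho(x). */
enum kind { E_CONST, E_VAR, E_ADD, E_SUB, E_ASSIGN };

struct expr {
    enum kind kind;
    int value;                        /* E_CONST: constant q, E_VAR: address */
    const struct expr *left, *right;  /* operands, or target and source */
};

/* Appends one instruction and a newline to the text buffer out. */
static void emit(char *out, size_t cap, const char *line) {
    assert(strlen(out) + strlen(line) + 2 <= cap);
    strcat(out, line);
    strcat(out, "\n");
}

/* codeL x rho = loadc (rho x); only variables have an L-value here. */
static void code_L(const struct expr *e, char *out, size_t cap) {
    char line[32];
    assert(e->kind == E_VAR);
    snprintf(line, sizeof line, "loadc %d", e->value);
    emit(out, cap, line);
}

/* codeR e rho, following the schemes for constants, variables,
   binary operators and assignment. */
static void code_R(const struct expr *e, char *out, size_t cap) {
    char line[32];
    switch (e->kind) {
    case E_CONST:
        snprintf(line, sizeof line, "loadc %d", e->value);
        emit(out, cap, line);
        break;
    case E_VAR:
        code_L(e, out, cap);
        emit(out, cap, "load");
        break;
    case E_ADD:
    case E_SUB:
        code_R(e->left, out, cap);
        code_R(e->right, out, cap);
        emit(out, cap, e->kind == E_ADD ? "add" : "sub");
        break;
    case E_ASSIGN:
        code_R(e->right, out, cap);   /* value first ... */
        code_L(e->left, out, cap);    /* ... then the target address */
        emit(out, cap, "store");
        break;
    }
}
```

Building the tree for x = y − 1 with x at address 4 and y at address 7 and calling `code_R` on it with an empty buffer reproduces the six instructions of the example above, one per line.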
    pop:    SP--;

The code for a statement sequence is the concatenation of the code for the statements of the sequence:

    code (s ss) ρ  =  code s ρ
                      code ss ρ
    code ε ρ       =  // empty sequence of instructions

4 Conditional and Iterative Statements

We need jumps to deviate from the serial execution of consecutive statements:

    jump A:     PC = A;

    jumpz A:    if (S[SP] == 0) PC = A;
                SP--;

[Figure: jump A sets PC to A; jumpz A sets PC to A if the topmost cell contains 0, continues with the next instruction otherwise, and removes the topmost cell in both cases]

For ease of comprehension, we use symbolic jump targets. They will later be replaced by absolute addresses.
Instead of absolute code addresses, one could generate relative addresses, i.e., relative to the actual PC.
Advantages:
• smaller addresses suffice most of the time;
• the code becomes relocatable, i.e., can be moved around in memory.

4.1 One-sided Conditional Statement

Let us first regard s ≡ if (e) s′.
Idea:
• Put code for the evaluation of e and s′ consecutively in the code store,
• Insert a conditional jump (jump on zero) in between.

    code s ρ  =     codeR e ρ
                    jumpz A
                    code s′ ρ
                A:  ...

4.2 Two-sided Conditional Statement

Let us now regard s ≡ if (e) s1 else s2. The same strategy yields:

    code s ρ  =     codeR e ρ
                    jumpz A
                    code s1 ρ
                    jump B
                A:  code s2 ρ
                B:  ...

Example: Let ρ = {x ↦ 4, y ↦ 7} and

    s ≡ if (x > y)          (i)
            x = x − y;      (ii)
        else y = y − x;     (iii)

code s ρ produces:

        loada 4       // (i)
        loada 7
        gr
        jumpz A
        loada 4       // (ii)
        loada 7
        sub
        storea 4
        pop
        jump B
    A:  loada 7       // (iii)
        loada 4
        sub
        storea 7
        pop
    B:  ...

4.3 while-Loops

Let us regard the loop s ≡ while (e) s′. We generate:

    code s ρ  =  A:  codeR e ρ
                     jumpz B
                     code s′ ρ
                     jump A
                 B:  ...

Example: Let ρ = {a ↦ 7, b ↦ 8, c ↦ 9} and s the statement:

    while (a > 0) {c = c + 1; a = a − b;}

code s ρ produces the sequence:

    A:  loada 7
        loadc 0
        gr
        jumpz B
        loada 9
        loadc 1
        add
        storea 9
        pop
        loada 7
        loada 8
        sub
        storea 7
        pop
        jump A
    B:  ...
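To see the main cycle and the jump instructions at work, the code generated for the while example can be executed on a small simulator. This is a sketch under stated assumptions: the opcode names, `struct instr`, `run` and `run_while_example` are hypothetical, the symbolic labels are resolved by hand (A = 0, B = 15), a halt instruction is placed at B, and the store has a fixed size of 64 cells.

```c
#include <assert.h>

/* Hypothetical encoding of a CMa subset; all names are illustrative. */
enum opcode { LOADC, LOADA, STOREA, ADD, SUB, GR, POP, JUMP, JUMPZ, HALT };
struct instr { enum opcode op; int arg; };

enum { STORE_SIZE = 64 };
static int S[STORE_SIZE];  /* data store */
static int SP;             /* address of the topmost occupied cell */

/* Main cycle: fetch C[PC], increment PC, execute; returns at halt. */
static void run(const struct instr *C) {
    int PC = 0;
    for (;;) {
        struct instr IR = C[PC];
        PC++;
        switch (IR.op) {
        case LOADC:  assert(SP + 1 < STORE_SIZE); SP++; S[SP] = IR.arg; break;
        case LOADA:  assert(SP + 1 < STORE_SIZE); SP++; S[SP] = S[IR.arg]; break;
        case STOREA: S[IR.arg] = S[SP]; break;  /* written value stays on top */
        case ADD:    SP--; S[SP] = S[SP] + S[SP + 1]; break;
        case SUB:    SP--; S[SP] = S[SP] - S[SP + 1]; break;
        case GR:     SP--; S[SP] = S[SP] > S[SP + 1]; break;
        case POP:    SP--; break;
        case JUMP:   PC = IR.arg; break;
        case JUMPZ:  if (S[SP] == 0) PC = IR.arg; SP--; break;
        case HALT:   return;
        }
    }
}

/* while (a > 0) { c = c + 1; a = a - b; } with a at 7, b at 8, c at 9. */
static const struct instr while_loop[] = {
    {LOADA, 7}, {LOADC, 0}, {GR, 0}, {JUMPZ, 15},                     /* A: */
    {LOADA, 9}, {LOADC, 1}, {ADD, 0}, {STOREA, 9}, {POP, 0},
    {LOADA, 7}, {LOADA, 8}, {SUB, 0}, {STOREA, 7}, {POP, 0},
    {JUMP, 0},
    {HALT, 0}                                                         /* B: */
};

/* Runs the loop for given a and b and returns the final value of c.
   b > 0 is assumed; otherwise the loop may not terminate. */
static int run_while_example(int a, int b) {
    S[7] = a; S[8] = b; S[9] = 0;
    SP = 9;                     /* cells 0..9 are already allocated */
    run(while_loop);
    return S[9];
}
```

Calling `run_while_example(10, 3)` from a `main` function executes the loop body four times. After the run, SP is back at 9, which illustrates the requirement that statements leave the stack height unchanged.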
4.4 for-Loops

The for-loop s ≡ for (e1; e2; e3) s′ is equivalent to the statement sequence

    e1; while (e2) {s′ e3;}

provided that s′ contains no continue-statement. We therefore translate:

    code s ρ  =      codeR e1 ρ
                     pop
                 A:  codeR e2 ρ
                     jumpz B
                     code s′ ρ
                     codeR e3 ρ
                     pop
                     jump A
                 B:  ...

4.5 The switch-Statement

Idea:
• Multi-target branching in constant time!
• Use a jump table, which contains at its i-th position the jump to the beginning of the i-th alternative.
• Realized by indexed jumps.

    jumpi B:    PC = B + S[SP];
                SP--;

[Figure: with q on top of the stack, jumpi B sets PC to B + q and removes q]

Simplification: We only regard switch-statements of the following form:

    s ≡ switch (e) {
          case 0: ss0 break;
          case 1: ss1 break;
          ...
          case k − 1: ssk−1 break;
          default: ssk
        }

s is then translated into the instruction sequence:

    code s ρ  =       codeR e ρ
                      check 0 k B
                 C0:  code ss0 ρ
                      jump D
                      ...
                 Ck:  code ssk ρ
                      jump D
                 B:   jump C0
                      ...
                      jump Ck
                 D:   ...

• The macro check 0 k B checks whether the R-value of e is in the interval [0, k], and executes an indexed jump into the table B.
• The jump table contains direct jumps to the respective alternatives.
• At the end of each alternative is an unconditional jump out of the switch-statement.

    check 0 k B  =      dup
                        loadc 0
                        geq
                        jumpz A
                        dup
                        loadc k
                        le
                        jumpz A
                        jumpi B
                    A:  pop
                        loadc k
                        jumpi B

• The R-value of e is still needed for indexing after the comparison. It is therefore copied before the comparison.
• This is done by the instruction dup.
• The R-value of e is replaced by k before the indexed jump is executed if it is less than 0 or greater than k.

    dup:    S[SP+1] = S[SP];
            SP++;

[Figure: dup duplicates the topmost cell 3]

Note:
• The jump table could be placed directly after the code for the macro check. This would save a few unconditional jumps. However, it may require searching the switch-statement twice.
• If the table starts with u instead of 0, we have to decrease the R-value of e by u before using it as an index.
• If all potential values of e are definitely in the interval [0, k], the macro check is not needed.
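The effect of the macro check 0 k B can be traced by executing its instructions one at a time. The sketch below does this on a small local stack and returns the index that the final jumpi B would add to B. The function name `table_index` is hypothetical, and le is read here as "less than"; reading it as "less or equal" would give the same index for every input, because the fall-back branch loads k anyway.

```c
#include <assert.h>

/* Returns the jump-table index that check 0 k B hands to jumpi B when the
   R-value of e is on top of the stack. Each line mirrors one instruction
   of the macro. */
static int table_index(int e, int k) {
    int S[4];
    int SP = 0;
    assert(k >= 0);
    S[SP] = e;                                   /* result of codeR e */

    S[SP + 1] = S[SP]; SP++;                     /* dup */
    SP++; S[SP] = 0;                             /* loadc 0 */
    SP--; S[SP] = S[SP] >= S[SP + 1];            /* geq */
    if (S[SP--] == 0) goto A;                    /* jumpz A */

    S[SP + 1] = S[SP]; SP++;                     /* dup */
    SP++; S[SP] = k;                             /* loadc k */
    SP--; S[SP] = S[SP] < S[SP + 1];             /* le */
    if (S[SP--] == 0) goto A;                    /* jumpz A */
    return S[SP];                                /* jumpi B: PC = B + S[SP] */

A:
    SP--;                                        /* pop the out-of-range value */
    SP++; S[SP] = k;                             /* loadc k: default alternative */
    return S[SP];                                /* jumpi B */
}
```

For k = 3, the values 0, 1 and 2 select their own table entries, while every other value, including 3 itself, selects entry 3, the jump to the default alternative.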
5 Storage Allocation for Variables

Goal: Associate statically, i.e. at compile time, with each variable x a fixed (relative) address ρ x.
Assumptions:
• variables of basic types, e.g. int, ... occupy one storage cell.
• variables are allocated in the store in the order in which they are declared, starting at address 1.
Consequently, we obtain for the declaration d ≡ t1 x1; ... tk xk; (ti basic type) the address environment ρ such that

    ρ xi = i,    i = 1, ..., k

5.1 Arrays

Example: int [11] a;
The array a consists of 11 components and therefore needs 11 cells. ρ a is the address of the component a[0].

[Figure: eleven consecutive cells a[0] ... a[10]]

We need a function sizeof (notation: |·|), computing the space requirement of a type:

    |t|  =  1           if t basic
    |t|  =  k · |t′|    if t ≡ t′[k]

Accordingly, we obtain for the declaration d ≡ t1 x1; ... tk xk;

    ρ x1  =  1
    ρ xi  =  ρ xi−1 + |ti−1|    for i > 1

Since |·| can be computed at compile time, ρ can also be computed at compile time.

Task: Extend codeL and codeR to expressions with accesses to array components.
Let t[c] a; be the declaration of an array a.
To determine the start address of a component a[i], we compute ρ a + |t| ∗ (R-value of i). In consequence:

    codeL a[e] ρ  =  loadc (ρ a)
                     codeR e ρ
                     loadc |t|
                     mul
                     add

... or more general:

    codeL e1[e2] ρ  =  codeR e1 ρ
                       codeR e2 ρ
                       loadc |t|
                       mul
                       add

Remark:
• In C, an array is a pointer. A declared array a is a pointer-constant, whose R-value is the start address of the array.
• Formally, we define for an array e:    codeR e ρ = codeL e ρ
• In C, the following are equivalent (as L-values):    2[a]    a[2]    ∗(a + 2)
Normalization: Array names and expressions evaluating to arrays occur in front of index brackets, index expressions inside the index brackets.

5.2 Structures

In Modula and Pascal, structures are called Records.
Simplification: Names of structure components are not used elsewhere. Alternatively, one could manage a separate environment ρst for each structure type st.

Let struct { int a; int b; } x; be part of a declaration list.
• x has as relative address the address of the first cell allocated for the structure.
• The components have addresses relative to the start address of the structure. In the example, these are a ↦ 0, b ↦ 1.

Let t ≡ struct {t1 c1; ... tk ck;}. We have

    |t|   =  |t1| + ... + |tk|
    ρ c1  =  0
    ρ ci  =  ρ ci−1 + |ti−1|    for i > 1

We thus obtain:

    codeL (e.c) ρ  =  codeL e ρ
                      loadc (ρ c)
                      add

Example: Let struct { int a; int b; } x; such that ρ = {x ↦ 13, a ↦ 0, b ↦ 1}. This yields:

    codeL (x.b) ρ  =  loadc 13
                      loadc 1
                      add

6 Pointer and Dynamic Storage Management

Pointers allow the access to anonymous, dynamically generated objects, whose life time is not subject to the LIFO-principle.
⟹ We need another potentially unbounded storage area H, the Heap.

[Figure: the store S from address 0 to MAX; the stack occupies the low addresses with SP and EP above it, the heap H occupies the high addresses with NP at its lower end]

    NP ≙ New Pointer; points to the lowest occupied heap cell.
    EP ≙ Extreme Pointer; points to the uppermost cell, to which SP can point (during execution of the actual function).

Idea:
• Stack and Heap grow toward each other in S, but must not collide (Stack Overflow).
• A collision may be caused by an increment of SP or a decrement of NP.
• EP saves us the check for collision at the stack operations.
• The checks at heap allocations are still necessary.

What can we do with pointers (pointer values)?
• set a pointer to a storage cell,
• dereference a pointer, i.e. access the value in a storage cell pointed to by a pointer.

There are two ways to set a pointer:
(1) A call malloc (e) reserves a heap area of the size of the value of e and returns a pointer to this area:

    codeR malloc (e) ρ  =  codeR e ρ
                           new

(2) The application of the address operator & to a variable returns a pointer to this variable, i.e. its address (≙ L-value). Therefore:

    codeR (&e) ρ  =  codeL e ρ

    new:    if (NP - S[SP] ≤ EP)
                S[SP] = NULL;
            else {
                NP = NP - S[SP];
                S[SP] = NP;
            }

[Figure: new replaces the size n on top of the stack by a pointer to a fresh heap area of n cells and lowers NP accordingly]

• NULL is a special pointer constant, identified with the integer constant 0.
• In the case of a collision of stack and heap the NULL-pointer is returned.

Dereferencing of Pointers:
The application of the operator ∗ to the expression e returns the contents of the storage cell whose address is the R-value of e:

    codeL (∗e) ρ  =  codeR e ρ

Example: Given the declarations

    struct t { int a[7]; struct t ∗b; };
    int i, j;
    struct t ∗pt;

and the expression ((pt → b) → a)[i + 1].
Because e → a ≡ (∗e).a, we have:

    codeL (e → a) ρ  =  codeR e ρ
                        loadc (ρ a)
                        add

[Figure: memory layout with the variables i, j and pt and two objects of type struct t, each with the components a and b]

Let ρ = {i ↦ 1, j ↦ 2, pt ↦ 3, a ↦ 0, b ↦ 7}. Then:

    codeL ((pt → b) → a)[i + 1] ρ
        =  codeR ((pt → b) → a) ρ
           codeR (i + 1) ρ
           loadc 1
           mul
           add
        =  codeR ((pt → b) → a) ρ
           loada 1
           loadc 1
           add
           loadc 1
           mul
           add

For arrays, their R-value equals their L-value. Therefore:

    codeR ((pt → b) → a) ρ
        =  codeR (pt → b) ρ
           loadc 0
           add
        =  loada 3
           loadc 7
           add
           load
           loadc 0
           add

In total, we obtain the instruction sequence:

    loada 3
    loadc 7
    add
    load
    loadc 0
    add
    loada 1
    loadc 1
    add
    loadc 1
    mul
    add

7 Conclusion

We tabulate the cases of the translation of expressions:

    codeL (e1[e2]) ρ   =  codeR e1 ρ
                          codeR e2 ρ
                          loadc |t|
                          mul
                          add            if e1 has type t∗ or t[]

    codeL (e.a) ρ      =  codeL e ρ
                          loadc (ρ a)
                          add

    codeL (∗e) ρ       =  codeR e ρ

    codeL x ρ          =  loadc (ρ x)

    codeR (&e) ρ       =  codeL e ρ

    codeR e ρ          =  codeL e ρ      if e is an array

    codeR (e1 □ e2) ρ  =  codeR e1 ρ
                          codeR e2 ρ
                          op             op instruction for operator ‘□’

    codeR q ρ          =  loadc q        q constant

    codeR (e1 = e2) ρ  =  codeR e2 ρ
                          codeL e1 ρ
                          store

    codeR e ρ          =  codeL e ρ
                          load           otherwise

Example: int a[10], (∗b)[10]; with ρ = {a ↦ 7, b ↦ 17}.
For the statement ∗a = 5;
we obtain:

    codeL (∗a) ρ  =  codeR a ρ  =  codeL a ρ  =  loadc 7

    code (∗a = 5;) ρ  =  loadc 5
                         loadc 7
                         store
                         pop

As an exercise translate: s1 ≡ b = (&a) + 2; and s2 ≡ ∗(b + 3)[0] = 5;

    code (s1 s2) ρ  =  loadc 7
                       loadc 2
                       loadc 10     // size of int[10]
                       mul          // scaling
                       add
                       loadc 17
                       store
                       pop          // end of s1
                       loadc 5
                       loadc 17
                       load
                       loadc 3
                       loadc 10     // size of int[10]
                       mul          // scaling
                       add
                       store
                       pop          // end of s2

8 Freeing Occupied Storage

Problems:
• The freed storage area is still referenced by other pointers (dangling references).
• After several deallocations, the storage could look like this (fragmentation):

[Figure: heap with several separate free blocks]

Potential Solutions:
• Trust the programmer. Manage freed storage in a particular data structure (free list) ⟹ malloc or free may become expensive.
• Do nothing, i.e.:

      code free (e); ρ  =  codeR e ρ
                           pop

  ⟹ simple and (in general) efficient.
• Use an automatic, potentially “conservative” Garbage-Collection, which occasionally collects certainly inaccessible heap space.

9 Functions

The definition of a function consists of
• a name, by which it can be called,
• a specification of the formal parameters;
• maybe a result type;
• a statement part, the body.

For C holds:

    codeR f ρ  =  loadc _f  =  starting address of the code for f

⟹ Function names must also be managed in the address environment!

Example:

    main () {
        int n;
        n = fac(2) + fac(1);
        printf ("%d", n);
    }

    int fac (int x) {
        if (x <= 0) return 1;
        else return x * fac(x - 1);
    }

At any time during the execution, several instances of one function may exist, i.e., may have started, but not finished execution.
An instance is created by a call to the function.
The recursion tree in the example:

[Figure: recursion tree with root main, whose children are the two calls of fac (each with its recursive calls of fac below it) and the call of printf]

We conclude: The formal parameters and local variables of the different instances of the same function must be kept separate.
Idea: Allocate a special storage area for each instance of a function.
In sequential programming languages these storage areas can be managed on a stack.
They are therefore called Stack Frames.

9.1 Storage Organization for Functions

[Figure: layout of a stack frame, from top to bottom: local variables (SP points to the topmost cell), formal parameters, the organizational cells PCold (FP points here), FPold and EPold, and the cell for the return value]

FP ≙ Frame Pointer; points to the last organizational cell and is used to address the formal parameters and the local variables.

The caller must be able to continue execution in its frame after the return from a function. Therefore, at a function call the following values have to be saved into organizational cells:
• the FP
• the continuation address after the call and
• the actual EP.
Simplification: The return value fits into one storage cell.

Translation tasks for functions:
• Generate code for the body!
• Generate code for calls!

9.2 Computing the Address Environment

We have to distinguish two different kinds of variables:
1. globals, which are defined externally to the functions;
2. locals/automatic (including formal parameters), which are defined internally to the functions.
⟹ The address environment ρ associates pairs (tag, a) ∈ {G, L} × N0 with their names.
Note:
• There exist more refined notions of visibility of (the defining occurrences of) variables, namely nested blocks.
• The translation of different program parts in general uses different address environments!

Example (1): The program points 0, 1 and 2 are marked in the following program:

    0   int i;
        struct list {
            int info;
            struct list ∗next;
        } ∗l;

    1   int ith (struct list ∗x, int i) {
            if (i <= 1) return x→info;
            else return ith (x→next, i - 1);
        }

    2   void main () {
            int k;
            scanf ("%d", &i);
            scanlist (&l);
            printf ("\n\t%d\n", ith (l,i));
        }

Address environment at 0:

    ρ0:  i ↦ (G, 1)
         l ↦ (G, 2)
         ith ↦ (G, _ith)
         main ↦ (G, _main)
         ...

Example (2): For the same program, the address environment at 1, inside of ith:

    ρ1:  i ↦ (L, 2)
         x ↦ (L, 1)
         l ↦ (G, 2)
         ith ↦ (G, _ith)
         ...
Example (3): For the same program, the address environment at 2, inside of main:

    ρ2:  i ↦ (G, 1)
         l ↦ (G, 2)
         k ↦ (L, 1)
         ith ↦ (G, _ith)
         main ↦ (G, _main)
         ...

9.3 Calling/Entering and Leaving Functions

Let f be the actual function, the Caller, and let f call the function g, the Callee.
The code for a function call has to be distributed among the Caller and the Callee. The distribution depends on who has which information.

Actions upon calling/entering g:

    Action                                          Instruction    Available in
    1. Saving FP, EP                                mark           f
    2. Computing the actual parameters                             f
    3. Determining the start address of g                          f
    4. Setting the new FP                           call           f
    5. Saving PC and jump to the beginning of g     call           f
    6. Setting the new EP                           enter          g
    7. Allocating the local variables               alloc          g

Actions upon leaving g:

    Action                                                  Instruction
    1. Restoring the registers FP, EP, SP                   return
    2. Returning to the code of f, i.e. restoring the PC    return

Altogether we generate for a call:

    codeR g(e1, ..., em) ρ  =  mark
                               codeR e1 ρ
                               ...
                               codeR em ρ
                               codeR g ρ
                               call n

where n = space for the actual parameters.

Note:
• Expressions occurring as actual parameters will be evaluated to their R-value ⟹ Call-by-Value parameter passing.
• Function g can also be an expression, whose R-value is the start address of the function to be called ...
• Function names are regarded as constant pointers to functions, similarly to declared arrays. The R-value of such a pointer is the start address of the function.
• For a variable int (∗)() g;, the two calls (∗g)() and g() are equivalent :-)
  Normalization: Dereferencing of a function pointer is ignored.
• Structures are copied when they are passed as parameters.
In consequence:

    codeR f ρ     =  loadc (ρ f)      f a function name
    codeR (∗e) ρ  =  codeR e ρ        e a function pointer
    codeR e ρ     =  codeL e ρ
                     move k           e a structure of size k

    move k:    for (i = k-1; i ≥ 0; i--)
                   S[SP+i] = S[S[SP]+i];
               SP = SP+k-1;

[Figure: move k replaces the address on top of the stack by the k cells it points to]

The instruction mark allocates space for the return value and for the organizational cells and saves the FP and EP.

    mark:    S[SP+2] = EP;
             S[SP+3] = FP;
             SP = SP + 4;

The instruction call n saves the continuation address and assigns FP, SP, and PC their new values.

    call n:    FP = SP - n - 1;
               S[FP] = PC;
               PC = S[SP];
               SP--;

Correspondingly, we translate a function definition:

    code t f (specs){V_defs ss} ρ  =
        _f:  enter q        // Setting the EP
             alloc k        // Allocating the local variables
             code ss ρf
             return         // leaving the function

where
    t     =  return type of f with |t| ≤ 1
    q     =  maxS + k
    maxS  =  maximal depth of the local stack
    k     =  space for the local variables
    ρf    =  address environment for f    // takes care of ρ, specs and V_defs

The instruction enter q sets EP to its new value. Program execution is terminated if not enough space is available.

    enter q:    EP = SP + q;
                if (EP ≥ NP)
                    Error (“Stack Overflow”);

The instruction alloc k reserves stack space for the local variables.

    alloc k:    SP = SP + k;

The instruction return pops the actual stack frame, i.e., it restores the registers PC, EP, SP, and FP and leaves the return value on top of the stack.

    return:    PC = S[FP];
               EP = S[FP-2];
               if (EP ≥ NP) Error (“Stack Overflow”);
               SP = FP-3;
               FP = S[SP+2];

9.4 Access to Variables and Formal Parameters, and Return of Values

Local variables and formal parameters are addressed relative to the current FP. We therefore modify codeL for the case of variable names.
For ρ x = (tag, j) we define

    codeL x ρ  =  loadc j     if tag = G
                  loadrc j    if tag = L

The instruction loadrc j computes the sum of FP and j.

    loadrc j:    SP++;
                 S[SP] = FP+j;
As an optimization one introduces the instructions loadr j and storer j. This is analogous to loada j and storea j.

    loadr j   =  loadrc j
                 load
    storer j  =  loadrc j
                 store

The code for return e; corresponds to an assignment to a variable with relative address −3.

    code return e; ρ  =  codeR e ρ
                         storer -3
                         return

Example: For the function

    int fac (int x) {
        if (x <= 0) return 1;
        else return x * fac (x - 1);
    }

we generate:

    _fac:  enter q
           alloc 0
           loadr 1
           loadc 0
           leq
           jumpz A
           loadc 1
           storer -3
           return
           jump B
    A:     loadr 1
           mark
           loadr 1
           loadc 1
           sub
           loadc _fac
           call 1
           mul
           storer -3
           return
    B:     return

where ρfac : x ↦ (L, 1) and q = 1 + 4 + 2 = 7.

10 Translation of Whole Programs

The state before program execution starts:

    SP = −1    FP = EP = 0    PC = 0    NP = MAX

Let p ≡ V_defs F_def1 ... F_defn be a program, where F_defi defines a function fi, of which one is named main.
The code for the program p consists of:
• Code for the function definitions F_defi;
• Code for allocating the global variables;
• Code for the call of main();
• the instruction halt.

We thus define:

    code p ∅  =        enter (k + 6)
                       alloc (k + 1)
                       mark
                       loadc _main
                       call 0
                       pop
                       halt
                 _f1:  code F_def1 ρ
                       ...
                 _fn:  code F_defn ρ

where
    ∅  ≙  empty address environment;
    ρ  ≙  global address environment;
    k  ≙  space for global variables;
    _main ∈ {_f1, ..., _fn}

The Translation of Functional Programming Languages

11 The language PuF

We only regard a mini-language PuF (“Pure Functions”).
We do not treat, as yet:
• Side effects;
• Data structures.

A program is an expression e of the form:

    e ::= b | x | (□1 e) | (e1 □2 e2)
        | (if e0 then e1 else e2)
        | (e′ e0 ... ek−1)
        | (fun x0 ... xk−1 → e)
        | (let x1 = e1 in e0)
        | (let rec x1 = e1 and ... and xn = en in e0)

An expression is therefore
• a basic value, a variable, the application of an operator, or
• a function-application, a function-abstraction, or
• a let-expression, i.e. an expression with locally defined variables, or
• a let-rec-expression, i.e.
an expression with simultaneously defined local variables.
For simplicity, we only allow int as basic type.

Example: The following well-known function computes the factorial of a natural number:

    let rec fac = fun x → if x ≤ 1 then 1
                          else x · fac (x − 1)
    in fac 7

As usual, we only use the minimal amount of parentheses.

There are two Semantics:
CBV: Arguments are evaluated before they are passed to the function (as in SML);
CBN: Arguments are passed unevaluated; they are only evaluated when their value is needed (as in Haskell).

12 Architecture of the MaMa:

We know already the following components:

[Figure: the code store C with cells 0, 1, ... and the register PC]

    C   =  Code-store: contains the MaMa-program; each cell contains one instruction;
    PC  =  Program Counter: points to the instruction to be executed next;

[Figure: the stack S with cells numbered from 0 and the registers SP and FP]

    S   =  Runtime-Stack: each cell can hold a basic value or an address;
    SP  =  Stack-Pointer: points to the topmost occupied cell; as in the CMa implicitly represented;
    FP  =  Frame-Pointer: points to the actual stack frame.

We also need a heap H:

[Figure: heap objects; the legend distinguishes Tag, Code Pointer, Value and Heap Pointer fields]

... it can be thought of as an abstract data type, being capable of holding data objects of the following form:

    Tag   Fields                   Kind
    B     v                        Basic Value (e.g. B −173)
    C     cp, gp                   Closure
    F     cp, ap, gp               Function
    V     n, v[0] ... v[n−1]       Vector

The instruction new (tag, args) creates a corresponding object (B, C, F, V) in H and returns a reference to it.

We distinguish three different kinds of code for an expression e:
• codeV e: (generates code that) computes the Value of e, stores it in the heap and returns a reference to it on top of the stack (the normal case);
• codeB e: computes the value of e, and returns it on the top of the stack (only for Basic types);
• codeC e: does not evaluate e, but stores a Closure of e in the heap and returns a reference to the closure on top of the stack.
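The four kinds of heap objects map naturally onto a tagged union. The following is only a sketch of how new (tag, args) could be modelled in C: the type `struct object`, the tag names and the four constructor functions are hypothetical, code pointers are represented as plain integers (addresses in C), and the host `malloc` stands in for the heap H.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical C rendering of the four kinds of MaMa heap objects. */
enum tag { TAG_B, TAG_C, TAG_F, TAG_V };

struct object {
    enum tag tag;
    union {
        struct { int v; } basic;                              /* B: value */
        struct { int cp; struct object *gp; } closure;        /* C: code, globals */
        struct { int cp; struct object *ap, *gp; } function;  /* F: code, args, globals */
        struct { int n; struct object **v; } vector;          /* V: n references */
    } u;
};

/* Allocates an object with the given tag; aborts if no memory is left. */
static struct object *new_object(enum tag tag) {
    struct object *o = malloc(sizeof *o);
    if (o == NULL) abort();
    o->tag = tag;
    return o;
}

static struct object *new_basic(int v) {
    struct object *o = new_object(TAG_B);
    o->u.basic.v = v;
    return o;
}

static struct object *new_closure(int cp, struct object *gp) {
    struct object *o = new_object(TAG_C);
    o->u.closure.cp = cp;
    o->u.closure.gp = gp;
    return o;
}

static struct object *new_function(int cp, struct object *ap, struct object *gp) {
    struct object *o = new_object(TAG_F);
    o->u.function.cp = cp;
    o->u.function.ap = ap;
    o->u.function.gp = gp;
    return o;
}

/* Creates a vector of n references, all initially NULL. */
static struct object *new_vector(int n) {
    struct object *o = new_object(TAG_V);
    assert(n >= 0);
    o->u.vector.n = n;
    o->u.vector.v = NULL;
    if (n > 0) {
        o->u.vector.v = malloc((size_t)n * sizeof *o->u.vector.v);
        if (o->u.vector.v == NULL) abort();
        for (int i = 0; i < n; i++) o->u.vector.v[i] = NULL;
    }
    return o;
}
```

A stack cell of the MaMa would then hold either a basic value or a `struct object *`. Freeing objects is left out on purpose, since the slides do not discuss it at this point.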
We start with the code schemata for the first two kinds:

13 Simple expressions

Expressions consisting only of constants, operator applications, and conditionals are translated like expressions in imperative languages:

    codeB b ρ sd            =  loadc b

    codeB (□1 e) ρ sd       =  codeB e ρ sd
                               op1

    codeB (e1 □2 e2) ρ sd   =  codeB e1 ρ sd
                               codeB e2 ρ (sd + 1)
                               op2

    codeB (if e0 then e1 else e2) ρ sd  =      codeB e0 ρ sd
                                               jumpz A
                                               codeB e1 ρ sd
                                               jump B
                                           A:  codeB e2 ρ sd
                                           B:  ...

Note:
• ρ denotes the actual address environment, in which the expression is translated.
• The extra argument sd, the stack difference, simulates the movement of the SP when instruction execution modifies the stack. It is needed later to address variables.
• The instructions op1 and op2 implement the operators □1 and □2, in the same way as the instructions neg and add implement negation resp. addition in the CMa.
• For all other expressions, we first compute the value in the heap and then dereference the returned pointer:

      codeB e ρ sd  =  codeV e ρ sd
                       getbasic

    getbasic:    if (H[S[SP]] != (B,_))
                     Error “not basic!”;
                 else
                     S[SP] = H[S[SP]].v;

[Figure: getbasic replaces the reference to the object B 17 on top of the stack by the value 17]

For codeV and simple expressions, we define analogously:

    codeV b ρ sd            =  loadc b
                               mkbasic

    codeV (□1 e) ρ sd       =  codeB e ρ sd
                               op1
                               mkbasic

    codeV (e1 □2 e2) ρ sd   =  codeB e1 ρ sd
                               codeB e2 ρ (sd + 1)
                               op2
                               mkbasic

    codeV (if e0 then e1 else e2) ρ sd  =      codeB e0 ρ sd
                                               jumpz A
                                               codeV e1 ρ sd
                                               jump B
                                           A:  codeV e2 ρ sd
                                           B:  ...

    mkbasic:    S[SP] = new (B,S[SP]);

[Figure: mkbasic replaces the value 17 on top of the stack by a reference to a new object B 17]

14 Accessing Variables

We must distinguish between local and global variables.
Example: Regard the function f:

    let c = 5
    in let f = fun a → let b = a ∗ a
                       in b + c
       in f c

The function f uses the global variable c and the local variables a (as formal parameter) and b (introduced by the inner let).
The binding of a global variable is determined when the function is constructed (static scoping!), and later only looked up.
Accessing Global Variables

• The bindings of global variables of an expression or a function are kept in a vector in the heap (Global Vector).
• They are addressed consecutively starting with 0.
• When an F-object or a C-object is constructed, the Global Vector for the function or the expression is determined and a reference to it is stored in the gp-component of the object.
• During the evaluation of an expression, the (new) register GP (Global Pointer) points to the actual Global Vector.
• In contrast, local variables should be administered on the stack ...
⟹ General form of the address environment:

    ρ : Vars → {L, G} × Z

Accessing Local Variables

Local variables are administered on the stack, in stack frames.
Let e ≡ e′ e0 ... em−1 be the application of a function e′ to arguments e0, ..., em−1.
Warning: The arity of e′ does not need to be m :-)
Let e′ evaluate to a function f of arity n with result type t:
• f may therefore receive fewer than n arguments (under supply);
• f may also receive more than n arguments, if t is a functional type (over supply).

Possible stack organisations:

[Figure: stack frame above FP holding the arguments e0, ..., em−1 and, on top, the reference to the F-object of e′]

+ Addressing of the arguments can be done relative to FP.
− The local variables of e′ cannot be addressed relative to FP.
− If e′ is an n-ary function with n < m, i.e., we have an over-supplied function application, the remaining m − n arguments will have to be shifted.
− If e′ evaluates to a function which has already been partially applied to the parameters a0, ..., ak−1, these have to be sneaked in underneath e0:

[Figure: the arguments a0, a1 inserted between FP and e0, below e0, ..., em−1]

Alternative:

[Figure: stack frame above FP holding the arguments in reverse order, em−1, ..., e0, with the reference to the F-object of e′ on top]

+ The further arguments a0, ..., ak−1 and the local variables can be allocated above the arguments.

[Figure: the arguments a1, a0 allocated above e0, with em−1 next to FP]

− Addressing of arguments and local variables relative to FP is no longer possible. (Remember: m is unknown when the function definition is translated.)

Way out:
• We address both, arguments and local variables, relative to the stack pointer SP !!!
• However, the stack pointer changes during program execution...

(Figure: stack with e0 at address sp0, em−1 below it down to FP, and SP lying sd cells above sp0.)

• The difference between the current value of SP and its value sp0 at the entry of the function body is called the stack distance, sd.
• Fortunately, this stack distance can be determined at compile time for each program point, by simulating the movement of the SP.
• The formal parameters x0, x1, x2, . . . successively receive the non-positive relative addresses 0, −1, −2, . . ., i.e., ρ xi = (L, −i).
• The absolute address of the i-th formal parameter consequently is

    sp0 − i = (SP − sd) − i

• The local let-variables y1, y2, y3, . . . will be successively pushed onto the stack:

(Figure: relative addresses in the frame: y3 at 3, y2 at 2, y1 at 1, x0 at 0 (= sp0), x1 at −1, . . . , xk−1 at −(k−1); SP lies sd cells above sp0.)

• The yi have positive relative addresses 1, 2, 3, . . ., that is: ρ yi = (L, i).
• The absolute address of yi is then

    sp0 + i = (SP − sd) + i

With CBN, we generate for the access to a variable:

    codeV x ρ sd  =  getvar x ρ sd
                     eval

The instruction eval checks whether the value has already been computed or whether its evaluation has yet to be done (==> will be treated later :-)
With CBV, we can just delete eval from the above code schema.

The (compile-time) macro getvar is defined by:

    getvar x ρ sd  =  let (t, i) = ρ x in
                      match t with
                        L → pushloc (sd − i)
                      | G → pushglob i
                      end

The access to local variables:

    pushloc n:
        S[SP+1] = S[SP - n]; SP++;

Correctness argument:
Let sp and sd be the values of the stack pointer resp. stack distance before the execution of the instruction. The value of the local variable with address i is loaded from S[a] with

    a = sp − (sd − i) = (sp − sd) + i = sp0 + i

... exactly as it should be :-)

The access to global variables is much simpler:

    pushglob i:
        SP = SP + 1;
        S[SP] = GP→v[i];

Example: Consider e ≡ (b + c) for ρ = {b ↦ (L, 1), c ↦ (G, 0)} and sd = 1.
With CBN, we obtain (the number in front of each instruction is the value of sd before the instruction):

    codeV e ρ 1  =  getvar b ρ 1    =    1  pushloc 0
                    eval                 2  eval
                    getbasic             2  getbasic
                    getvar c ρ 2         2  pushglob 0
                    eval                 3  eval
                    getbasic             3  getbasic
                    add                  3  add
                    mkbasic              2  mkbasic

15 let-Expressions

As a warm-up let us first consider the treatment of local variables :-)
Let e ≡ let y1 = e1 in . . . let yn = en in e0 be a nested let-expression.
The translation of e must deliver an instruction sequence that
• allocates local variables y1, . . . , yn;
• in the case of
  CBV: evaluates e1, . . . , en and binds the yi to their values;
  CBN: constructs closures for the e1, . . . , en and binds the yi to them;
• evaluates the expression e0 and returns its value.

Here, we consider the non-recursive case only, i.e. where yj only depends on y1, . . . , yj−1. We obtain for CBN:

    codeV e ρ sd  =  codeC e1 ρ sd
                     codeC e2 ρ1 (sd + 1)
                     ...
                     codeC en ρn−1 (sd + n − 1)
                     codeV e0 ρn (sd + n)
                     slide n                      // deallocates local variables

where ρj = ρ ⊕ { yi ↦ (L, sd + i) | i = 1, . . . , j }.

In the case of CBV, we use codeV for the expressions e1, . . . , en.

Warning! All the ei must be associated with the same binding for the global variables!

Example: Consider the expression

    e ≡ let a = 19 in let b = a ∗ a in a + b

for ρ = ∅ and sd = 0. We obtain (for CBV):

    0  loadc 19
    1  mkbasic
    1  pushloc 0
    2  getbasic
    2  pushloc 1
    3  getbasic
    3  mul
    2  mkbasic
    2  pushloc 1
    3  getbasic
    3  pushloc 1
    4  getbasic
    4  add
    3  mkbasic
    3  slide 2

The instruction slide k deallocates again the space for the locals:

    slide k:
        S[SP-k] = S[SP];
        SP = SP - k;

16 Function Definitions

The definition of a function f requires code that allocates a functional value for f in the heap. This happens in the following steps:
• Creation of a Global Vector with the binding of the free variables;
• Creation of an (initially empty) argument vector;
• Creation of an F-object, containing references to these vectors and the start address of the code for the body;

Separately, code for the body has to be generated.
Thus:

    codeV (fun x0 . . . xk−1 → e) ρ sd  =      getvar z0 ρ sd
                                               getvar z1 ρ (sd + 1)
                                               ...
                                               getvar zg−1 ρ (sd + g − 1)
                                               mkvec g
                                               mkfunval A
                                               jump B
                                           A:  targ k
                                               codeV e ρ′ 0
                                               return k
                                           B:  ...

where { z0, . . . , zg−1 } = free(fun x0 . . . xk−1 → e)
and ρ′ = { xi ↦ (L, −i) | i = 0, . . . , k − 1 } ∪ { zj ↦ (G, j) | j = 0, . . . , g − 1 }

The instruction mkvec g packs the g topmost stack cells into a vector in the heap:

    mkvec g:
        h = new (V, g);
        SP = SP - g + 1;
        for (i=0; i<g; i++)
            h→v[i] = S[SP + i];
        S[SP] = h;

The instruction mkfunval A creates an F-object with code address A, an empty argument vector, and the Global Vector on top of the stack:

    mkfunval A:
        a = new (V,0);
        S[SP] = new (F, A, a, S[SP]);

Example: Consider f ≡ fun b → a + b for ρ = { a ↦ (L, 1) } and sd = 1.
codeV f ρ 1 produces:

    1      pushloc 0
    2      mkvec 1
    2      mkfunval A
    2      jump B
    0  A:  targ 1
    0      pushglob 0
    1      eval
    1      getbasic
    1      pushloc 1
    2      eval
    2      getbasic
    2      add
    1      mkbasic
    1      return 1
    2  B:  ...

The secrets around targ k and return k will be revealed later :-)

17 Function Application

Function applications correspond to function calls in C.
The necessary actions for the evaluation of e′ e0 . . . em−1 are:
• Allocation of a stack frame;
• Transfer of the actual parameters, i.e. with:
  CBV: Evaluation of the actual parameters;
  CBN: Allocation of closures for the actual parameters;
• Evaluation of the expression e′ to an F-object;
• Application of the function.

Thus for CBN:

    codeV (e′ e0 . . . em−1) ρ sd  =      mark A                    // Allocation of the frame
                                          codeC em−1 ρ (sd + 3)
                                          codeC em−2 ρ (sd + 4)
                                          ...
                                          codeC e0 ρ (sd + m + 2)
                                          codeV e′ ρ (sd + m + 3)   // Evaluation of e′
                                          apply                     // corresponds to call
                                      A:  ...

To implement CBV, we use codeV instead of codeC for the arguments ei.

Example: For (f 42), ρ = { f ↦ (L, 2) } and sd = 2, we obtain with CBV:

    2      mark A
    5      loadc 42
    6      mkbasic
    6      pushloc 4
    7      apply
    3  A:  ...
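The bookkeeping of the stack distance in the application schema can be checked mechanically. The following is a minimal sketch of a CBV code generator for constants, variables and applications only, assuming a hypothetical tuple encoding of expressions and generated label names of the form A0, A1, ...; neither is taken from the slides.

```python
from itertools import count


def getvar(x, rho, sd):
    """Compile-time macro getvar: rho maps a name to ("L", i) or ("G", i)."""
    kind, i = rho[x]
    if kind == "L":
        return [f"pushloc {sd - i}"]
    return [f"pushglob {i}"]


def code_v(e, rho, sd, labels=None):
    """CBV translation of ("const", n), ("var", x) and ("app", fn, [args])."""
    if labels is None:
        labels = count()            # fresh label numbers per top-level call
    tag = e[0]
    if tag == "const":
        return [f"loadc {e[1]}", "mkbasic"]
    if tag == "var":
        return getvar(e[1], rho, sd)        # CBV: no eval needed
    if tag == "app":
        _, fn, args = e
        m = len(args)
        label = f"A{next(labels)}"
        code = [f"mark {label}"]            # pushes 3 organisational cells
        # arguments are translated from the last one to the first one
        for j, arg in enumerate(reversed(args)):
            code += code_v(arg, rho, sd + 3 + j, labels)
        code += code_v(fn, rho, sd + m + 3, labels)
        code += ["apply", f"{label}:"]
        return code
    raise ValueError(f"unsupported expression: {e!r}")


# (f 42) with rho = { f -> (L, 2) } and sd = 2
listing = code_v(("app", ("var", "f"), [("const", 42)]), {"f": ("L", 2)}, 2)
```

For this input the sketch yields mark, loadc 42, mkbasic, pushloc 4, apply, which agrees with the instruction sequence of the example above up to the label name.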
A Slightly Larger Example:

    let a = 17 in let f = fun b → a + b in f 42

For CBV and sd = 0 we obtain:

    0      loadc 17
    1      mkbasic
    1      pushloc 0
    2      mkvec 1
    2      mkfunval A
    2      jump B
    0  A:  targ 1
    0      pushglob 0
    1      getbasic
    1      pushloc 1
    2      getbasic
    2      add
    1      mkbasic
    1      return 1
    2  B:  mark C
    5      loadc 42
    6      mkbasic
    6      pushloc 4
    7      apply
    3  C:  slide 2

For the implementation of the new instruction, we must fix the organization of a stack frame:

(Figure: stack frame. FP points to the cell holding PCold (relative address 0); below it FPold (−1) and GPold (−2), i.e. 3 organisational cells; above FP the arguments, then the local stack up to SP.)

Different from the CMa, the instruction mark A already saves the return address:

    mark A:
        S[SP+1] = GP;
        S[SP+2] = FP;
        S[SP+3] = A;
        FP = SP = SP + 3;

The instruction apply unpacks the F-object, a reference to which (hopefully) resides on top of the stack, and continues execution at the address given there:

    apply:
        h = S[SP];
        if (H[h] != (F,_,_)) Error "no fun";
        else {
            GP = h→gp; PC = h→cp;
            for (i=0; i < h→ap→n; i++)
                S[SP+i] = h→ap→v[i];
            SP = SP + h→ap→n - 1;
        }

Warning:
• The last element of the argument vector is the last to be put onto the stack. This must be the first argument reference.
• This should be kept in mind when we treat the packing of arguments of an under-supplied function application into an F-object !!!

18 Over- and Undersupply of Arguments

The first instruction to be executed when entering a function body, i.e., after an apply, is targ k.
This instruction checks whether there are enough arguments to evaluate the body.
Only if this is the case, the execution of the code for the body is started.
Otherwise, i.e. in the case of under-supply, a new F-object is returned.
The test for the number of arguments uses: SP - FP

targ k is a complex instruction.
We decompose its execution in the case of under-supply into several steps:

    targ k  =  if (SP - FP < k) {
                   mkvec0;      // creating the argument vector
                   wrap;        // wrapping into an F-object
                   popenv;      // popping the stack frame
               }

The combination of these steps into one instruction is a kind of optimization :-)

The instruction mkvec0 takes all references from the stack above FP and stores them into a vector:

    mkvec0:
        g = SP - FP; h = new (V, g);
        SP = FP + 1;
        for (i=0; i<g; i++)
            h→v[i] = S[SP + i];
        S[SP] = h;

The instruction wrap wraps the argument vector together with the global vector and PC-1 into an F-object:

    wrap:
        S[SP] = new (F, PC-1, S[SP], GP);

The instruction popenv finally releases the stack frame:

    popenv:
        GP = S[FP-2];
        S[FP-2] = S[SP];
        PC = S[FP];
        SP = FP - 2;
        FP = S[FP-1];

Thus, we obtain for targ k in the case of under-supply:

(Figures: the machine state before mkvec0, after mkvec0, after wrap, and after popenv.)

• The stack frame can be released after the execution of the body if exactly the right number of arguments was available.
• If there is an oversupply of arguments, the body must evaluate to a function, which consumes the rest of the arguments ...
• The check for this is done by return k:

    return k  =  if (SP − FP == k + 1)
                     popenv;          // Done
                 else {               // There are more arguments
                     slide k;
                     apply;           // another application
                 }

The execution of return k results in:

(Figure, case "Done": popenv releases the frame and leaves the result on the stack.)
(Figure, case "Over-supply": slide k removes the k consumed arguments, then apply applies the resulting F-object to the remaining ones.)

19 let-rec-Expressions

Consider the expression e ≡ let rec y1 = e1 and . . . and yn = en in e0.
The translation of e must deliver an instruction sequence that
• allocates local variables y1, . . . , yn;
• in the case of
  CBV: evaluates e1, . . . , en and binds the yi to their values;
  CBN: constructs closures for the e1, . . .
, en and binds the yi to them;
• evaluates the expression e0 and returns its value.

Warning:
In a letrec-expression, the definitions can use variables that will be allocated only later! ==> Dummy values are put onto the stack before processing the definitions.

For CBN, we obtain:

    codeV e ρ sd  =  alloc n                   // allocates local variables
                     codeC e1 ρ′ (sd + n)
                     rewrite n
                     ...
                     codeC en ρ′ (sd + n)
                     rewrite 1
                     codeV e0 ρ′ (sd + n)
                     slide n                   // deallocates local variables

where ρ′ = ρ ⊕ { yi ↦ (L, sd + i) | i = 1, . . . , n }.

In the case of CBV, we also use codeV for the expressions e1, . . . , en.

Warning:
Recursive definitions of basic values are undefined with CBV!!!

Example: Consider the expression

    e ≡ let rec f = fun x y → if y ≤ 1 then x else f (x ∗ y) (y − 1) in f 1

for ρ = ∅ and sd = 0. We obtain (for CBV):

    0      alloc 1
    1      pushloc 0
    2      mkvec 1
    2      mkfunval A
    2      jump B
    0  A:  targ 2
    0      ...
    1      return 2
    2  B:  rewrite 1
    1      mark C
    4      loadc 1
    5      mkbasic
    5      pushloc 4
    6      apply
    2  C:  slide 1

The instruction alloc n reserves n cells on the stack and initialises them with n dummy nodes:

    alloc n:
        for (i=1; i<=n; i++)
            S[SP+i] = new (C,-1,-1);
        SP = SP + n;

The instruction rewrite n overwrites the contents of the heap cell pointed to by the reference at S[SP-n]:

    rewrite n:
        H[S[SP-n]] = H[S[SP]];
        SP = SP - 1;

• The reference S[SP - n] remains unchanged!
• Only its contents are changed!

20 Closures and their Evaluation

• Closures are needed for the implementation of CBN and for functional parameters.
• Before the value of a variable is accessed (with CBN), this value must be available.
• Otherwise, a stack frame must be created to determine this value.
• This task is performed by the instruction eval.
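The interplay of alloc n and rewrite n from the let-rec translation above can be made concrete with a small simulation. This is a sketch only, assuming a hypothetical Machine class in which the stack is a Python list of heap addresses and the heap is a dictionary; SP corresponds to the index of the last list element.

```python
class Machine:
    """Toy model of stack S and heap H (names are illustrative)."""

    def __init__(self):
        self.S = []        # stack of heap addresses; SP == len(S) - 1
        self.H = {}        # heap: address -> object (a tuple)
        self._next = 0     # next free heap address

    def new(self, obj):
        """Allocate obj in the heap and return its address."""
        addr = self._next
        self._next += 1
        self.H[addr] = obj
        return addr

    def alloc(self, n):
        """alloc n: push n references to fresh dummy closures (C, -1, -1)."""
        for _ in range(n):
            self.S.append(self.new(("C", -1, -1)))

    def rewrite(self, n):
        """rewrite n: H[S[SP-n]] = H[S[SP]]; SP = SP - 1."""
        self.H[self.S[-1 - n]] = self.H[self.S[-1]]
        self.S.pop()


m = Machine()
m.alloc(1)                        # dummy node for y1
ref = m.S[0]                      # the reference bound to y1
m.S.append(m.new(("B", 7)))       # the definition of y1 has been evaluated
m.rewrite(1)                      # overwrite the dummy node in place
```

After rewrite 1 the stack still holds the same reference ref, but the heap cell it points to now contains (B, 7). This is why earlier copies of the reference, for instance inside a Global Vector, see the final object.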
eval can be decomposed into small actions:

    eval  =  if (H[S[SP]] ≡ (C, _, _)) {
                 mark0;        // allocation of the stack frame
                 pushloc 3;    // copying of the reference
                 apply0;       // corresponds to apply
             }

• A closure can be understood as a parameterless function. Thus, there is no need for an ap-component.
• Evaluation of the closure thus means evaluation of an application of this function to 0 arguments.
• In contrast to mark A, mark0 dumps the current PC.
• The difference between apply and apply0 is that no argument vector is put on the stack.

    mark0:
        S[SP+1] = GP;
        S[SP+2] = FP;
        S[SP+3] = PC;
        FP = SP = SP + 3;

    apply0:
        h = S[SP]; SP--;
        GP = h→gp; PC = h→cp;

We thus obtain for the instruction eval:

(Figures: the machine state before mark0, after mark0, after pushloc 3, and after apply0.)

The construction of a closure for an expression e consists of:
• Packing the bindings for the free variables into a vector;
• Creation of a C-object, which contains a reference to this vector and to the code for the evaluation of e:

    codeC e ρ sd  =      getvar z0 ρ sd
                         getvar z1 ρ (sd + 1)
                         ...
                         getvar zg−1 ρ (sd + g − 1)
                         mkvec g
                         mkclos A
                         jump B
                     A:  codeV e ρ′ 0
                         update
                     B:  ...

where { z0, . . . , zg−1 } = free(e) and ρ′ = { zi ↦ (G, i) | i = 0, . . . , g − 1 }.

Example: Consider e ≡ a ∗ a with ρ = { a ↦ (L, 0) } and sd = 1. We obtain:

    1      pushloc 1
    2      mkvec 1
    2      mkclos A
    2      jump B
    0  A:  pushglob 0
    1      eval
    1      getbasic
    1      pushglob 0
    2      eval
    2      getbasic
    2      mul
    1      mkbasic
    1      update
    2  B:  ...

• The instruction mkclos A is analogous to the instruction mkfunval A.
• It generates a C-object, where the included code pointer is A.

    mkclos A:
        S[SP] = new (C, A, S[SP]);

In fact, the instruction update is the combination of the two actions:

    popenv
    rewrite 1

It overwrites the closure with the computed value.
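The combined effect of eval and update, evaluating a closure at most once and then overwriting it with its value, can be illustrated at a higher level. The following is a sketch under simplifying assumptions: the hypothetical Cell class stands for a heap cell, a Python callable stands for the code and Global Vector of a C-object, and the stack frame handling of mark0, apply0 and popenv is not modelled.

```python
class Cell:
    """Heap cell holding either a closure ("C") or a basic value ("B")."""

    def __init__(self, thunk):
        self.tag = "C"
        self.payload = thunk      # stands for the cp and gp components


def force(cell):
    """Analogue of eval followed by update on a single heap cell."""
    if cell.tag == "C":
        value = cell.payload()    # run the code of the closure
        cell.tag = "B"            # update: overwrite the closure ...
        cell.payload = value      # ... with the computed value
    return cell.payload


evaluations = []
square = Cell(lambda: evaluations.append("run") or 6 * 7)
first = force(square)     # evaluates the closure
second = force(square)    # finds a B-object, no second evaluation
```

The design point is that the cell is overwritten in place, so every reference to the former closure benefits from the single evaluation.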
(Figure: effect of update on the stack, the heap, and the registers FP, PC and GP.)

21 Optimizations I: Global Variables

Observation:
• Functional programs construct many F- and C-objects.
• This requires the inclusion of (the bindings of) all global variables.
Recall, e.g., the construction of a closure for an expression e ...

    codeC e ρ sd  =      getvar z0 ρ sd
                         getvar z1 ρ (sd + 1)
                         ...
                         getvar zg−1 ρ (sd + g − 1)
                         mkvec g
                         mkclos A
                         jump B
                     A:  codeV e ρ′ 0
                         update
                     B:  ...

where { z0, . . . , zg−1 } = free(e) and ρ′ = { zi ↦ (G, i) | i = 0, . . . , g − 1 }.

Idea:
• Reuse Global Vectors, i.e. share Global Vectors!
• Profitable in the translation of let-expressions or function applications: Build one Global Vector for the union of the free-variable sets of all let-definitions resp. all arguments.
• Allocate (references to) global vectors with multiple uses in the stack frame like local variables!
• Support the access to the current GP by an instruction copyglob:

    copyglob:
        SP++;
        S[SP] = GP;

• The optimization will cause Global Vectors to contain more components than just references to the free variables that occur in one expression ...

Disadvantage: Superfluous components in Global Vectors prevent the deallocation of already useless heap objects ==> Space Leaks :-(
Potential Remedy: Deletion of references at the end of their lifetime.

22 Optimizations II: Closures

In some cases, the construction of closures can be avoided, namely for
• Basic values,
• Variables,
• Functions.

Basic Values:
The construction of a closure for the value is at least as expensive as the construction of the B-object itself!
Therefore:

    codeC b ρ sd  =  codeV b ρ sd  =  loadc b
                                      mkbasic

This replaces:

        mkvec 0
        mkclos A
        jump B
    A:  loadc b
        mkbasic
        update
    B:  ...

Variables:
Variables are either bound to values or to C-objects. Constructing another closure is therefore superfluous.
Therefore:

    codeC x ρ sd  =  getvar x ρ sd

This replaces:

        getvar x ρ sd
        mkvec 1
        mkclos A
        jump B
    A:  pushglob 0
        eval
        update
    B:  ...

Example: e ≡ let rec a = b and b = 7 in a.
codeV e ∅ 0 produces:

    0  alloc 2
    2  pushloc 0
    3  rewrite 2
    2  loadc 7
    3  mkbasic
    3  rewrite 1
    2  pushloc 1
    3  eval
    3  slide 2

The execution of this instruction sequence should deliver the basic value 7 ...

(Figures: step-by-step execution of alloc 2, pushloc 0, rewrite 2, loadc 7, mkbasic, rewrite 1, pushloc 1 and eval. rewrite 2 copies the dummy node (C, −1, −1) of b into the heap cell of a; rewrite 1 then overwrites the node of b with (B, 7), but the node of a remains a dummy closure, so eval continues at code address −1.)

Segmentation Fault !!

Apparently, this optimization was not quite correct :-(

The Problem:
Binding of variable y to variable x before x's dummy node is replaced!!

==> The Solution:
cyclic definitions: reject sequences of definitions like let a = b; . . . b = a in . . ..
acyclic definitions: order the definitions y = x such that the dummy node for the right side x is already overwritten.

Functions:
Functions are values, which are not evaluated further.
Instead of generating code that constructs a closure for an F-object, we generate code that constructs the F-object directly.
Therefore:

    codeC (fun x0 . . . xk−1 → e) ρ sd  =  codeV (fun x0 . . . xk−1 → e) ρ sd

23 The Translation of a Program Expression

Execution of a program e starts with

    PC = 0        SP = FP = GP = −1

The expression e must not contain free variables.
The value of e should be determined and then a halt instruction should be executed.

    code e  =  codeV e ∅ 0
               halt

Remarks:
• The code schemata as defined so far produce Spaghetti code.
• Reason: Code for function bodies and closures is placed directly behind the instructions mkfunval resp. mkclos with a jump over this code.
• Alternative: Place this code somewhere else, e.g. following the halt-instruction:
  Advantage: Elimination of the direct jumps following mkfunval and mkclos.
  Disadvantage: The code schemata are more complex as they would have to accumulate the code pieces in a Code-Dump.
==> Solution: Disentangle the Spaghetti code in a subsequent optimization phase :-)

Example: let a = 17 in let f = fun b → a + b in f 42

Disentanglement of the jumps produces:

    0      loadc 17
    1      mkbasic
    1      pushloc 0
    2      mkvec 1
    2      mkfunval A
    2      mark B
    5      loadc 42
    6      mkbasic
    6      pushloc 4
    7      eval
    7      apply
    3  B:  slide 2
    1      halt
    0  A:  targ 1
    0      pushglob 0
    1      eval
    1      getbasic
    1      pushloc 1
    2      eval
    2      getbasic
    2      add
    1      mkbasic
    1      return 1

24 Structured Data

In the following, we extend our functional programming language by some datatypes.

24.1 Tuples

Constructors: (., . . . , .), k-ary with k ≥ 0;
Destructors: #j for j ∈ ℕ0 (Projections)

We extend the syntax of expressions correspondingly:

    e ::= . . . | (e0, . . . , ek−1) | #j e
        | let (x0, . . . , xk−1) = e1 in e0

• In order to construct a tuple, we collect a sequence of references on the stack. Then we construct a vector of these references in the heap using mkvec.
• For returning components we use an indexed access into the tuple.

    codeV (e0, . . .
, ek−1) ρ sd  =  codeC e0 ρ sd
                     codeC e1 ρ (sd + 1)
                     ...
                     codeC ek−1 ρ (sd + k − 1)
                     mkvec k

    codeV (#j e) ρ sd  =  codeV e ρ sd
                          get j
                          eval

In the case of CBV, we directly compute the values of the ei.

    get j:
        if (S[SP] == (V,g,v))
            S[SP] = v[j];
        else Error "Vector expected!";

Inversion: Accessing all components of a tuple simultaneously:

    e ≡ let (y0, . . . , yk−1) = e1 in e0

This is translated as follows:

    codeV e ρ sd  =  codeV e1 ρ sd
                     getvec k
                     codeV e0 ρ′ (sd + k)
                     slide k

where ρ′ = ρ ⊕ { yi ↦ (L, sd + i + 1) | i = 0, . . . , k − 1 }.

The instruction getvec k pushes the components of a vector of length k onto the stack:

    getvec k:
        if (S[SP] == (V,k,v)) {
            SP--;
            for (i=0; i<k; i++) {
                SP++; S[SP] = v[i];
            }
        } else Error "Vector expected!";

24.2 Lists

Lists are constructed by the constructors:

    []    "Nil", the empty list;
    ::    "Cons", right-associative, takes an element and a list.

Access to list components is possible by match-expressions ...

Example: The append function app:

    app = fun l y → match l with
                      [] → y
                    | h :: t → h :: (app t y)

Accordingly, we extend the syntax of expressions:

    e ::= . . . | [] | (e1 :: e2)
        | (match e0 with [] → e1 | h :: t → e2)

Additionally, we need new heap objects:

    (L, Nil)                   empty list
    (L, Cons, s[0], s[1])      non-empty list

24.3 Building Lists

The new instructions nil and cons are introduced for building list nodes.
We translate for CBN:

    codeV [] ρ sd  =  nil

    codeV (e1 :: e2) ρ sd  =  codeC e1 ρ sd
                              codeC e2 ρ (sd + 1)
                              cons

Note:
• With CBN: Closures are constructed for the arguments of "::";
• With CBV: Arguments of "::" are evaluated :-)

    nil:
        SP++; S[SP] = new (L,Nil);

    cons:
        S[SP-1] = new (L,Cons, S[SP-1], S[SP]);
        SP--;

24.4 Pattern Matching

Consider the expression e ≡ match e0 with [] → e1 | h :: t → e2.
Evaluation of e requires:
• evaluation of e0;
• check, whether resulting value v is an L-object;
• if v is the empty list, evaluation of e1 ...
• otherwise storing the two references of v on the stack and evaluation of e2. This corresponds to binding h and t to the two components of v.

In consequence, we obtain (for CBN as for CBV):

    codeV e ρ sd  =      codeV e0 ρ sd
                         tlist A
                         codeV e1 ρ sd
                         jump B
                     A:  codeV e2 ρ′ (sd + 2)
                         slide 2
                     B:  ...

where ρ′ = ρ ⊕ { h ↦ (L, sd + 1), t ↦ (L, sd + 2) }.

The new instruction tlist A does the necessary checks and (in the case of Cons) allocates two new local variables:

    tlist A:
        h = S[SP];
        if (H[h] != (L,...)) Error "no list!";
        if (H[h] == (_,Nil)) SP--;
        else {
            S[SP+1] = S[SP]→s[1];
            S[SP] = S[SP]→s[0];
            SP++; PC = A;
        }

Example: The (disentangled) body of the function app with app ↦ (G, 0):

    0      targ 2
    0      pushloc 0
    1      eval
    1      tlist A
    0      pushloc 1
    1      eval
    1      jump B
    2  A:  pushloc 1
    3      pushglob 0
    4      pushloc 2
    5      pushloc 6
    6      mkvec 3
    4      mkclos C
    4      cons
    3      slide 2
    1  B:  return 2
    0  C:  mark D
    3      pushglob 2
    4      pushglob 1
    5      pushglob 0
    6      eval
    6      apply
    1  D:  update

Note:
Datatypes with more than two constructors need a generalization of the tlist instruction, corresponding to a switch-instruction :-)

24.5 Closures of Tuples and Lists

The general schema for codeC can be optimized for tuples and lists:

    codeC (e0, . . . , ek−1) ρ sd  =  codeV (e0, . . . , ek−1) ρ sd  =  codeC e0 ρ sd
                                                                        codeC e1 ρ (sd + 1)
                                                                        ...
                                                                        codeC ek−1 ρ (sd + k − 1)
                                                                        mkvec k

    codeC [] ρ sd  =  codeV [] ρ sd  =  nil

    codeC (e1 :: e2) ρ sd  =  codeV (e1 :: e2) ρ sd  =  codeC e1 ρ sd
                                                        codeC e2 ρ (sd + 1)
                                                        cons

25 Last Calls

A function application is called a last call in an expression e if this application could deliver the value for e.
A last call usually is the outermost application of a defining expression.
A function definition is called tail recursive if all recursive calls are last calls.
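To illustrate the definition of tail recursion, the following sketch shows a function r that moves the elements of a list x in front of a list y, once with a recursive call in last-call position and once as the loop that such a call corresponds to. Python lists stand in for the heap lists of the slides, and the names r_rec and r_loop are illustrative only. Python itself does not eliminate last calls, so the recursive version is limited by the interpreter's recursion depth.

```python
def r_rec(x, y):
    """Tail recursive: the recursive call delivers the result directly."""
    if not x:
        return y
    h, t = x[0], x[1:]
    return r_rec(t, [h] + y)      # last call: nothing is left to do afterwards


def r_loop(x, y):
    """The same computation as a loop that reuses the variables x and y."""
    while x:
        h, t = x[0], x[1:]
        x, y = t, [h] + y         # rebind the parameters instead of calling
    return y


small = r_rec([1, 2, 3], [])
same = r_loop([1, 2, 3], [])
large = r_loop(list(range(2000)), [])   # too deep for r_rec by default
```

Because nothing remains to be done after the recursive call returns, the parameters can simply be overwritten, which is exactly what the loop version does.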
Examples:

    r t (h :: y) is a last call in      match x with [] → y | h :: t → r t (h :: y)
    f (x − 1) is not a last call in     if x ≤ 1 then 1 else x ∗ f (x − 1)

Observation: Last calls in a function body need no new stack frame!
==> Automatic transformation of tail recursion into loops!!!

The code for a last call l ≡ (e′ e0 . . . em−1) inside a function f with k arguments must
1. allocate the arguments ei and evaluate e′ to a function (note: all this inside f's frame!);
2. deallocate the local variables and the k consumed arguments of f;
3. execute an apply.

    codeV l ρ sd  =  codeC em−1 ρ sd
                     codeC em−2 ρ (sd + 1)
                     ...
                     codeC e0 ρ (sd + m − 1)
                     codeV e′ ρ (sd + m)       // Evaluation of the function
                     move r (m + 1)            // Deallocation of r cells
                     apply

where r = sd + k is the number of stack cells to deallocate.

Example: The body of the function

    r = fun x y → match x with [] → y | h :: t → r t (h :: y)

    0      targ 2
    0      pushloc 0
    1      eval
    1      tlist A
    0      pushloc 1
    1      eval
    1      jump B
    2  A:  pushloc 1
    3      pushloc 4
    4      cons
    3      pushloc 1
    4      pushglob 0
    5      eval
    5      move 4 3
           apply
           slide 2
    1  B:  return 2

Since the old stack frame is kept, return 2 will only be reached by the direct jump at the end of the []-alternative.

The instruction move r k moves the k topmost stack cells down by r cells:

    move r k:
        SP = SP - k - r;
        for (i=1; i≤k; i++)
            S[SP+i] = S[SP+i+r];
        SP = SP + k;

Threads

26 The Language ThreadedC

We extend C by a simple thread concept. In particular, we provide functions for:
• generating new threads: create();
• terminating a thread: exit();
• waiting for termination of a thread: join();
• mutual exclusion: lock(), unlock(); ...

In order to enable a parallel program execution, we extend the abstract machine (what else? :-)

27 Storage Organization

All threads share the same common code store and heap:

(Figure: code store C with the register PC; heap H with the register NP.)

...
similar to the CMa, we have:

    C   =  Code Store – contains the CMa program; every cell contains one instruction;
    PC  =  Program-Counter – points to the next executable instruction;
    H   =  Heap – every cell may contain a base value or an address; the globals are stored at the bottom;
    NP  =  New-Pointer – points to the first free cell.

For simplicity, we assume that the heap is stored in a separate segment. The function malloc() then fails whenever NP exceeds the topmost border.

Every thread on the other hand needs its own stack:

(Figure: the set of stacks SSet next to the heap H, with the registers SP and FP pointing into the current stack.)

In contrast to the CMa, we have:

    SSet  =  Set of Stacks – contains the stacks of the threads; every cell may contain a base value or an address;
    S     =  common address space for heap and the stacks;
    SP    =  Stack-Pointer – points to the current topmost occupied stack cell;
    FP    =  Frame-Pointer – points to the current stack frame.

Warning:
• If all references pointed into the heap, we could use separate address spaces for each stack. Besides SP and FP, we would have to record the number of the current stack :-)
• In the case of C, though, we must assume that all storage regions live within the same address space, only at different locations :-) SP and FP then uniquely identify storage locations.
• For simplicity, we omit the extreme-pointer EP.

28 The Ready-Queue

Idea:
• Every thread has a unique number tid.
• A table TTab allows to determine for every tid the corresponding thread.
• At every point in time, there can be several executable threads, but only one running thread (per processor :-)
• The tid of the currently running thread is kept in the register CT (Current Thread).
• The function tid self () returns the tid of the current thread. Accordingly:

    codeR self () ρ  =  self

... where the instruction self pushes the content of the register CT onto the (current) stack:

    self:
        S[++SP] = CT;

• The remaining executable threads (more precisely, their tid's) are maintained in the queue RQ (Ready-Queue).
• For queues, we need the functions:

    void enqueue (queue q, tid t),
    tid dequeue (queue q)

which insert a tid into a queue and return the first one, respectively ...

(Figures: CT, RQ and TTab before and after enqueue(RQ, 13), and before and after CT = dequeue(RQ).)

If a call to dequeue () fails, it returns a value < 0 :-)

The thread table must contain for every thread all information which is needed for its execution. In particular, it consists of the registers PC, SP and FP:

    entry 0: FP
    entry 1: PC
    entry 2: SP

Interrupting the current thread therefore requires saving these registers:

    void save () {
        TTab[CT][0] = FP;
        TTab[CT][1] = PC;
        TTab[CT][2] = SP;
    }

Analogously, we restore these registers by calling the function:

    void restore () {
        FP = TTab[CT][0];
        PC = TTab[CT][1];
        SP = TTab[CT][2];
    }

Thus, we can realize an instruction yield which causes a thread-switch:

    tid ct = dequeue ( RQ );
    if (ct ≥ 0) {
        save (); enqueue ( RQ, CT );
        CT = ct;
        restore ();
    }

Only if the ready-queue is non-empty, the current thread is replaced :-)

29 Switching between Threads

Problem:
We want to give each executable thread a fair chance to be completed.
==>
• Every thread must sooner or later be scheduled for running.
• Every thread must sooner or later be interrupted.

Possible Strategies:
• Thread switch only at explicit calls to a function yield() :-(
• Thread switch after every instruction ==> too expensive :-(
• Thread switch after a fixed number of steps ==> we must install a counter and execute yield at dynamically chosen points :-(

We insert thread switches at selected program points ...
• at the beginning of function bodies;
• before every jump whose target does not exceed the current PC ... ==> rare :-))

The modified scheme for loops s ≡ while (e) s′ then yields:

    code s ρ  =  A:  codeR e ρ
                     jumpz B
                     code s′ ρ
                     yield
                     jump A
                 B:  ...

Note:
• If-then-else-Statements do not necessarily contain thread switches.
• do-while-Loops require a thread switch at the end of the condition.
• Every loop should contain (at least) one thread switch :-)
• Loop-Unrolling reduces the number of thread switches.
• At the translation of switch-statements, we created a jump table behind the code for the alternatives. Nonetheless, we can avoid thread switches here.
• At freely programmed uses of jumpi as well as jumpz we should also insert thread switches before the jump (or at the jump target).
• If we want to reduce the number of executed thread switches even further, we could switch threads, e.g., only at every 100th call of yield ...

30 Generating New Threads

We assume that the expression s ≡ create (e0, e1) first evaluates the expressions ei to the values f, a and then creates a new thread which computes f (a).
If thread creation fails, s returns the value −1.
Otherwise, s returns the new thread's tid.

Tasks of the Generated Code:
• Evaluation of the ei;
• Allocation of a new run-time stack together with a stack frame for the evaluation of f (a);
• Generation of a new tid;
• Allocation of a new entry in the TTab;
• Insertion of the new tid into the ready-queue.

The translation of s then is quite simple:

    codeR s ρ  =  codeR e0 ρ
                  codeR e1 ρ
                  initStack
                  initThread

where we assume the argument value occupies 1 cell :-)

For the implementation of initStack we need a run-time function newStack() which returns a pointer onto the first element of a new stack:

(Figure: newStack() pushes the address of the new stack onto the current stack.)

If the creation of a new stack fails, the value 0 is returned.

    initStack:
        newStack();
        if (S[SP]) {
            S[S[SP]+1] = -1;
            S[S[SP]+2] = f;
            S[S[SP]+3] = S[SP-1];
            S[SP-1] = S[SP]; SP--;
        }
        else S[SP = SP - 2] = -1;

Note:
• The continuation address f points to the (fixed) code for the termination of threads.
• Inside the stack frame, we no longer allocate space for the EP ==> the return value has relative address −2.
• The bottom stack frame can be identified through FPold = −1 :-)

In order to create new thread ids, we introduce a new register TC (Thread Count).
Initially, the register TC (Thread Count) has the value 0 (corresponding to the tid of the initial thread).
Before thread creation, TC is incremented by 1.

(Figure: effect of initThread on the stack, the register TC, and the new TTab entry.)

    initThread:
        if (S[SP] ≥ 0) {
            tid = ++TC;
            TTab[tid][0] = S[SP]-1;
            TTab[tid][1] = S[SP-1];
            TTab[tid][2] = S[SP];
            S[--SP] = tid;
            enqueue ( RQ, tid );
        }

31 Terminating Threads

Termination of a thread (usually :-) returns a value. There are two (regular) ways to terminate a thread:
1. The initial function call has terminated. Then the return value is the return value of the call.
2. The thread executes the statement exit (e); Then the return value equals the value of e.

Warning:
• We want to return the return value in the bottom stack cell.
• exit may occur arbitrarily deeply nested inside a recursion. Then we de-allocate all stack frames ...
• ... and jump to the terminal treatment of threads at address f.

Therefore, we translate:

    code exit (e); ρ  =  codeR e ρ
                         exit
                         term
                         next

The instruction term is explained later :-)

The instruction exit successively pops all stack frames:

    exit:
        result = S[SP];
        while (FP ≠ -1) {
            SP = FP - 2;
            FP = S[FP - 1];
        }
        S[SP] = result;

The instruction next activates the next executable thread: in contrast to yield, the current thread is not inserted into RQ.

(Figure: next dequeues tid 13 from RQ, makes it the current thread, and loads its registers SP, PC and FP from TTab.)

If the queue RQ is empty, we additionally terminate the whole program:

    next:
        if (0 > (ct = dequeue ( RQ ))) halt;
        else {
            save ();
            CT = ct;
            restore ();
        }

32 Waiting for Termination

Occasionally, a thread may only continue with its execution if some other thread has terminated. For that, we have the expression join (e) where we assume that e evaluates to a thread id tid.
• If the thread with the given tid is already terminated, we return its return value.
• If it is not yet terminated, we interrupt the current thread execution.
• We insert the current thread into the queue of threads already waiting for the termination. We save the current registers and switch to the next executable thread.
• Threads waiting for termination are maintained in the table JTab.
• There, we also store the return values of threads :-)

Example:

[Figure: example contents of JTab, CT and RQ]

Thread 0 is running, thread 1 could run, threads 2 and 3 wait for the termination of 1, and thread 4 waits for the termination of 3.

Thus, we translate:

    codeR join (e) ρ = codeR e ρ
                       join
                       finalize

... where the instruction join is defined by:

    tid = S[SP];
    if (TTab[tid][1] ≥ 0) {
        enqueue ( JTab[tid][0], CT );
        next
    }

... and the instruction finalize, with tid again the thread id on top of the stack, accordingly by:

    S[SP] = JTab[tid][1];

[Figure: finalize replaces the tid 5 on top of the stack by the return value 42]

The instruction sequence:

    term
    next

is executed before a thread is terminated. Therefore, we store it at the location f.

The instruction next switches to the next executable thread. Before that, though,
• ... the last stack frame must be popped and the result be stored in the table JTab;
• ... the thread must be marked as terminated, e.g., by additionally setting the PC to −1;
• ... all threads which have waited for the termination must be notified.

For the instruction term this means:

    PC = -1;
    JTab[CT][1] = S[SP];
    freeStack (SP);
    while ((tid = dequeue ( JTab[CT][0] )) ≥ 0)
        enqueue ( RQ, tid );

The run-time function freeStack (int adr) removes the (one-element) stack at the location adr.

[Figure: freeStack(adr) removes the stack at adr]

33 Mutual Exclusion

A mutex is an (abstract) datatype (in the heap) which should allow the programmer to dedicate exclusive access to a shared resource (mutual exclusion).
The datatype supports the following operations:

    Mutex ∗ newMutex ();        // creates a new mutex;
    void lock (Mutex ∗ me);     // tries to acquire the mutex;
    void unlock (Mutex ∗ me);   // releases the mutex;

Warning: A thread is only allowed to release a mutex if it has owned it beforehand :-)

A mutex me consists of:
• the tid of the current owner (or −1 if there is no one);
• the queue BQ of blocked threads which want to acquire the mutex.

[Figure: layout of a mutex: owner at offset 0, BQ at offset 1]

Then we translate:

    codeR newMutex () ρ = newMutex

where:

[Figure: newMutex creates a mutex with owner −1]

Then we translate:

    code lock (e); ρ = codeR e ρ
                       lock

where:

[Figure: lock by thread 17 on a free mutex; afterwards the owner is 17]

If the mutex is already owned by someone, the current thread is interrupted:

[Figure: lock by thread 17 on a mutex owned by thread 5]

    if (S[S[SP]] < 0) S[S[SP--]] = CT;
    else {
        enqueue ( S[SP--]+1, CT );
        next;
    }

Accordingly, we translate:

    code unlock (e); ρ = codeR e ρ
                         unlock

where:

[Figure: unlock by thread 5 while thread 17 is blocked; afterwards the owner is 17]

If the queue BQ is empty, we release the mutex:

[Figure: unlock by thread 5 with empty BQ; afterwards the owner is −1]

    if (S[S[SP]] ≠ CT) Error ("Illegal unlock!");
    if ((tid = dequeue ( S[SP]+1 )) < 0) S[S[SP--]] = -1;
    else {
        S[S[SP--]] = tid;
        enqueue ( RQ, tid );
    }

34 Waiting for Better Weather

It may happen that a thread owns a mutex but must wait until some extra condition is true. Then we want the thread to remain inactive until it is told otherwise. For that, we use condition variables. A condition variable consists of a queue WQ of waiting threads :-)

[Figure: layout of a condition variable: WQ at offset 0]

For condition variables, we introduce the functions:

    CondVar ∗ newCondVar ();                // creates a new condition variable;
    void wait (CondVar ∗ v, Mutex ∗ me);    // enqueues the current thread;
    void signal (CondVar ∗ v);              // re-animates one waiting thread;
    void broadcast (CondVar ∗ v);           // re-animates all waiting threads.

Then we translate:

    codeR newCondVar () ρ = newCondVar

where:

[Figure: newCondVar creates a condition variable with empty WQ]

After enqueuing the current thread, we release the mutex. After re-animation, though, we must acquire the mutex again.
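The interplay of mutex and condition variable described so far can be replayed in a small C simulation. This is only a sketch under stated assumptions: the names (Queue, mutex_lock, mutex_unlock, cond_wait, cond_signal, wait_signal_scenario) are hypothetical, the queues are fixed-size arrays, and a thread is represented by its tid alone rather than by a real CMa stack and register set.

```c
#include <assert.h>

#define QMAX 16                      /* demo capacity, an assumption */

/* FIFO queue of tids; dequeue yields -1 when empty, as in the text. */
typedef struct { int item[QMAX]; int head, tail; } Queue;

void enqueue(Queue *q, int tid) {
    assert(q->tail < QMAX);
    q->item[q->tail++] = tid;
}

int dequeue(Queue *q) {
    return q->head == q->tail ? -1 : q->item[q->head++];
}

typedef struct { int owner; Queue bq; } Mutex;   /* owner == -1: free */
typedef struct { Queue wq; } CondVar;

Queue rq;                            /* ready-queue RQ, initially empty */

/* lock: take a free mutex, otherwise block the caller in BQ.
   Returns 1 if the caller may continue, 0 if it was blocked. */
int mutex_lock(Mutex *m, int ct) {
    if (m->owner < 0) { m->owner = ct; return 1; }
    enqueue(&m->bq, ct);
    return 0;
}

/* unlock: hand the mutex to the first blocked thread, or free it. */
void mutex_unlock(Mutex *m, int ct) {
    assert(m->owner == ct);          /* otherwise: illegal unlock */
    int tid = dequeue(&m->bq);
    m->owner = tid;                  /* -1 if nobody is blocked */
    if (tid >= 0) enqueue(&rq, tid);
}

/* wait: enqueue the caller in WQ, then release the mutex. */
void cond_wait(CondVar *v, Mutex *m, int ct) {
    assert(m->owner == ct);
    enqueue(&v->wq, ct);
    mutex_unlock(m, ct);
}

/* signal: move one waiting thread (if any) to the ready-queue. */
void cond_signal(CondVar *v) {
    int tid = dequeue(&v->wq);
    if (tid >= 0) enqueue(&rq, tid);
}

/* Thread 5 locks and waits; thread 17 locks, signals and unlocks.
   Returns the owner of the mutex at the end. */
int wait_signal_scenario(void) {
    Mutex m = { -1, { {0}, 0, 0 } };
    CondVar v = { { {0}, 0, 0 } };
    rq.head = rq.tail = 0;

    mutex_lock(&m, 5);
    cond_wait(&v, &m, 5);            /* 5 sleeps in WQ, mutex is free */
    assert(m.owner == -1);

    mutex_lock(&m, 17);
    cond_signal(&v);                 /* 5 moves from WQ to RQ */
    int woken = dequeue(&rq);        /* the scheduler resumes 5 ... */
    int got = mutex_lock(&m, woken); /* ... which must lock again */
    assert(woken == 5 && got == 0);  /* blocked: 17 still owns m */

    mutex_unlock(&m, 17);            /* ownership passes on to 5 */
    return m.owner;
}
```

In this run the re-animated thread 5 does not continue immediately: it first blocks in BQ and only becomes the owner once thread 17 releases the mutex, so wait_signal_scenario() returns 5.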
Since the mutex is released after enqueuing and re-acquired after re-animation, we translate:

    code wait (e0, e1); ρ = codeR e1 ρ
                            codeR e0 ρ
                            wait
                            dup
                            unlock
                            next
                            lock

where the instruction wait is defined by:

[Figure: wait executed by thread 5, which owns the mutex]

    if (S[S[SP-1]] ≠ CT) Error ("Illegal wait!");
    enqueue ( S[SP], CT ); SP--;

Accordingly, we translate:

    code signal (e); ρ = codeR e ρ
                         signal

[Figure: signal moves thread 17 from WQ into RQ]

    if ((tid = dequeue ( S[SP] )) ≥ 0) enqueue ( RQ, tid );
    SP--;

Analogously:

    code broadcast (e); ρ = codeR e ρ
                            broadcast

where the instruction broadcast enqueues all threads from the queue WQ into the ready-queue RQ:

    while ((tid = dequeue ( S[SP] )) ≥ 0) enqueue ( RQ, tid );
    SP--;

Warning: The re-animated threads are not blocked !!! When they become running, though, they first have to acquire their mutex :-)

35 Example: Semaphores

A semaphore is an abstract datatype which controls the access to a bounded number of (identical) resources.

Operations:

    Sema ∗ newSema (int n);     // creates a new semaphore;
    void Up (Sema ∗ s);         // increases the number of free resources;
    void Down (Sema ∗ s);       // decreases the number of available resources.

Therefore, a semaphore consists of:
• a counter of type int;
• a mutex for synchronizing the semaphore operations;
• a condition variable.

    typedef struct {
        Mutex ∗ me;
        CondVar ∗ cv;
        int count;
    } Sema;

    Sema ∗ newSema (int n) {
        Sema ∗ s;
        s = (Sema ∗) malloc (sizeof (Sema));
        s→me = newMutex ();
        s→cv = newCondVar ();
        s→count = n;
        return (s);
    }

The translation of the body amounts to:

    alloc 1
    loadc 3;  new;  storer 2;  pop                           // s = (Sema ∗) malloc (sizeof (Sema));
    newMutex;  loadr 2;  store;  pop                         // s→me = newMutex ();
    newCondVar;  loadr 2;  loadc 1;  add;  store;  pop       // s→cv = newCondVar ();
    loadr 1;  loadr 2;  loadc 2;  add;  store;  pop          // s→count = n;
    loadr 2;  storer -2;  return                             // return (s);

The function Down() decrements the counter.
If the counter becomes negative, wait is called:

    void Down (Sema ∗ s) {
        Mutex ∗ me;
        me = s→me;
        lock (me);
        s→count--;
        if (s→count < 0) wait (s→cv, me);
        unlock (me);
    }

The translation of the body amounts to:

        alloc 1
        loadr 1;  load;  storer 2                            // me = s→me;
        lock                                                 // lock (me);
        loadr 1;  loadc 2;  add;  load;  loadc 1;  sub       // s→count--;
        loadr 1;  loadc 2;  add;  store
        loadc 0;  le;  jumpz A                               // if (s→count < 0)
        loadr 2;  loadr 1;  loadc 1;  add;  load;  wait      //     wait (s→cv, me);
    A:  loadr 2;  unlock                                     // unlock (me);
        return

The function Up() increments the counter again. If it is still not positive afterwards, there must still be waiting threads. One of these is sent a signal:

    void Up (Sema ∗ s) {
        Mutex ∗ me;
        me = s→me;
        lock (me);
        s→count++;
        if (s→count ≤ 0) signal (s→cv);
        unlock (me);
    }

The translation of the body amounts to:

        alloc 1
        loadr 1;  load;  storer 2                            // me = s→me;
        lock                                                 // lock (me);
        loadr 1;  loadc 2;  add;  load;  loadc 1;  add       // s→count++;
        loadr 1;  loadc 2;  add;  store
        loadc 0;  leq;  jumpz A                              // if (s→count ≤ 0)
        loadr 1;  loadc 1;  add;  load;  signal              //     signal (s→cv);
    A:  loadr 2;  unlock                                     // unlock (me);
        return

36 Stack-Management

Problem:
• All threads live within the same storage.
• Every thread requires its own stack (at least conceptually).

1. Idea: Allocate for each new thread a fixed amount of storage space.
==⇒ Then we implement:

    void ∗ newStack () { return malloc (M); }
    void freeStack (void ∗ adr) { free (adr); }

Problem:
• Some threads consume much, some only little stack space.
• The necessary space is typically not known statically :-(

2. Idea:
• Maintain all stacks in one joint Frame-Heap FH :-)
• Take care that the space inside the stack frame is sufficient at least for the current function call.
• A global stack-pointer GSP points to the overall topmost stack cell ...
[Figure: the frame heap holding the frames of thread 1 and thread 2 below GSP]

Allocation and de-allocation of a stack frame make use of the run-time functions:

    int newFrame (int size) {
        int result = GSP;
        GSP = GSP+size;
        return result;
    }
    void freeFrame (int sp, int size);

Warning: The de-allocated block may reside inside the stack :-(
==⇒ We maintain a list of freed stack blocks :-)

[Figure: list of freed stack blocks]

This list supports a function

    void insertBlock (int max, int min)

which frees a single block.
• If the block is on top of the stack, we pop the stack immediately;
• ... together with the blocks below, given that these have already been marked as de-allocated.
• If the block is inside the stack, we merge it with neighboring free blocks:

[Figures: three examples of freeBlock(...) and its effect on GSP and on the list of freed blocks]

Approach: We allocate a fresh block for every function call ...

Problem: When ordering the block before the call, we do not yet know the space consumption of the called function :-(
==⇒ We order the new block after entering the function body!

[Figure: organisational cells as well as actual parameters must be allocated inside the old block ...]

[Figure: when entering the new function, we now allocate the new block ...]

[Figure: in particular, the local variables reside in the new block ...]

==⇒ We address ...
• the formal parameters relatively to the frame-pointer;
• the local variables relatively to the stack-pointer :-)
==⇒ We must re-organize the complete code generation ... :-(

Alternative: Passing of parameters in registers ... :-)

[Figure: the values of the actual parameters are determined in argument registers before allocation of the new stack frame.]

[Figure: the complete frame, including the organizational cells, is allocated inside the new block, plus the space for the current parameters.]
[Figure: inside the new block, though, we must store the old SP (possibly +1) in order to correctly return the result ... :-)]

3. Idea: Hybrid Solution
• For the first k threads, we allocate a separate stack area.
• For all further threads, we successively use one of the existing ones !!!
==⇒
• For few threads extremely simple and efficient;
• For many threads amortized storage usage :-))
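The frame-heap bookkeeping of the second idea can be sketched in C as follows. Only newFrame and the signatures of freeFrame and insertBlock are taken from the slides; everything else is an assumption made for illustration: a block is treated as the half-open cell range [min, max), the list of freed blocks is a small unsorted array, and the names freeMin, freeMax, nFree and MAXBLOCKS are hypothetical.

```c
#include <assert.h>

#define MAXBLOCKS 64                 /* demo capacity, an assumption */

int GSP = 0;                         /* first unused cell of the frame heap */

/* Freed blocks that still lie below GSP, each a range [min, max). */
int freeMin[MAXBLOCKS], freeMax[MAXBLOCKS], nFree = 0;

/* As on the slide: reserve size cells on top of the frame heap. */
int newFrame(int size) {
    int result = GSP;
    GSP = GSP + size;
    return result;
}

/* Free the block [min, max): merge it with adjacent freed blocks,
   then either pop it (if it ends at GSP) or remember it in the list. */
void insertBlock(int max, int min) {
    int i = 0;
    while (i < nFree) {
        if (freeMax[i] == min || freeMin[i] == max) {
            if (freeMax[i] == min) min = freeMin[i];   /* neighbour below */
            else                   max = freeMax[i];   /* neighbour above */
            freeMin[i] = freeMin[nFree - 1];           /* drop entry i */
            freeMax[i] = freeMax[nFree - 1];
            nFree--;
            i = 0;                   /* the grown block may touch another one */
        } else {
            i++;
        }
    }
    if (max == GSP) {
        GSP = min;                   /* on top of the stack: pop immediately */
    } else {
        assert(nFree < MAXBLOCKS);
        freeMin[nFree] = min;        /* inside the stack: keep for later */
        freeMax[nFree] = max;
        nFree++;
    }
}

/* De-allocate the frame of the given size that starts at sp. */
void freeFrame(int sp, int size) {
    insertBlock(sp + size, sp);
}
```

Because merging happens before the top-of-stack check, freeing the topmost frame also pops any already freed blocks directly below it. For example, after allocating frames of sizes 4, 3 and 5, freeing the middle one leaves GSP at 12 with one list entry; freeing the top one then drops GSP to 4 in a single step.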

© Copyright 2020