gcc optimization list
------------------------------------------------------------
http://www.redhat.com/en_us/USA/home/gnupro/technicaldetails/gccoptimization/


GCC Optimization


Alias analysis:
Used to determine when a storage location may be accessed in more than one way. Alias analysis itself is not an optimization; instead it computes data which other optimization passes need. In particular the information produced by alias analysis is used by Common Subexpression Elimination (CSE), Loop, scheduling and the register allocator.


Constant folding:
This involves eliminating instructions, which compute a value that is a known compile time constant. Many different parts of gcc perform constant folding -- it's not an optimization pass unto itself, but instead several of our optimizers call routines to perform constant folding.


Algebraic Simplifications & Reassociation:
This involves using algebraic properties to rearrange expressions to allow better optimization, rearrange expressions so that loop invariant parts of an expression are exposed, and eliminate those expressions which are unnecessary operations (i.e. a + 0 is the same as "a" for non- floating point code).


Value numbering:
GCC uses value numbering in its CSE pass.


Copy propagation:
This optimization tries to eliminate useless copy assignments in code. GCC has both local and global copy propagation.


Constant propagation:
This is similar to copy propagation, except that it applies to eliminating useless assignment operations involving constants. GCC has both local and global copy propagation.


Common Subexpression Elimination:
GCC has both a local and a global common subexpression elimination pass. The local CSE pass includes support for CSE on extended basic blocks and "around loops".


Partial Redundancy Elimination:
PRE is a more effective form of GCSE. In a nutshell it finds and eliminates more redundant expressions on a function-wide basis. The PRE pass also performs loop invariant code motion (see below) as well as critical edge splitting. Critical edge splitting is a technique to expose more redundant expressions in a program so that they can be eliminated.


Loop Invariant Code Motion:
GCC includes loop invariant code motion as part of its loop optimizer as well as in its partial redundancy elimination pass. This optimization removes instructions from loops, which compute a value which does not change throughout the lifetime of a loop.


Scalar Replacement of Aggregates:
This optimization involves treating fields in an array, structure or union as unique entities, which can be optimized like normal scalar variables. Of particular importance is the ability to hold aggregates in registers.


Procedure Integration:
GCC supports function inlining. It can inline specific functions based on their declaration or it can attempt to inline all function calls based on a command line switch. .


Tail call optimizations GCC does supports one special case of tail call optimizations -- tail recursion elimination.


Dead Code elimination:
GCC supports traditional dead code elimination -- i.e. it will remove code which it can determine at compile time that does not do anything useful (if not used to store into memory, as a function arg, return value, etc).


Induction variable elimination & Strength reduction GCC supports elimination of both basic induction variables and general induction variables. It also supports combination of related general induction variables. These optimizations remove unnecessary expressions from within loops and are vital to good generating highly optimized loops. And finally it supports strength reduction of expressions inside loops, including strength reduction of memory address calculations.


Loop unrolling:
GCC supports command line driven loop unrolling with register splitting (AKA variable expansion).


Live variable analysis:
This isn't an optimization in itself, though it does enable other optimization passes like dead code elimination to be more effective.


Leaf routine optimizations:
GCC supports leaf routine optimizations. Generally this allows gcc to perform better register allocation and generate more efficient prologues/epilogues.


Prologue/epilogue optimizations; GCC generates highly optimized prologues and epilogues; it also supports instruction scheduling and delay slot filling for prologue and epilogue sequences.


Caller-save optimizations:
GCC supports allocation of variables to call-clobbered registers in the presence of calls (with lazy save/restores around call points).


Local register allocation:
GCC uses weighted counts to allocate values into registers, which live in a single basic block. It also performs register tying to avoid unnecessary register-register copies (AKA register coalescing or subsumption).


Global register allocation:
GCC uses weighted counts to allocate values into registers, which are used in more than once basic block.


Spill code optimization GCC attempts to generate optimized spill code whenever possible. Including running a small CSE pass after register allocation and spilling to remove redundant spill code.


Instruction scheduling:
GCC supports both single-issue and multi-issue instruction scheduling. GCC also supports cross-block (critical path) scheduling. Both basic block and cross-block scheduling are performed using forward list scheduling algorithms.


Delay slot scheduling:
GCC supports delay slot scheduling for processors, which have branch/call delay slots.


Conditional move/predication:
GCC supports traditional conditional move instructions as well as simple instruction predication (nullification). The predication code is pretty simple, we will continue to improve it as more systems with predication appear.


Unreachable code elimination:
GCC supports removal of unreachable code.


Code straightening:
GCC supports code straightening by replacing conditional branch sequence with straight-line code, conditional moves, predicated instructions, etc.


If expression simplifications:
GCC supports most forms of "if simplifications"; these optimizations can reduce the number of conditional branches in a program. But more importantly, when an "if" expression can be simplified to a constant, then gcc can eliminate branches & dead code entirely.


Loop inversion:
GCC supports loop inversion, which eliminates some test, & branch code inside loops and, supports inversion of loop counter to count down to zero instead of up to a constant value. Saves cycles inside some loops.


Branch optimizations:
GCC supports elimination of branches to branches, branches around branches, etc etc.


Tail merging/cross jumping:
Supported by GCC.


Machine idioms & instruction combining:
Supported by GCC.


New regmove optimizations:
Improvements to reduce register pressure for 2 address machines such as the ia32, Hitachi SH, m68k, Hitachi H8 series, etc etc.


Reduced register pressure leads to more efficient code, particularly in routines where the number of user variables and compiler generated temporaries is larger than the physical register set for the target processor.


More reload inheritance:
Spill code moves data in/out of memory or in/out of special registers when a particular variable/compiler temporary did not get allocated to one of the target processor's registers.


Reload inheritance attempts to optimize spill code, particularly unnecessary memory loads/stores caused by register spilling.


Local spilling in reload:
GCC's register allocator works by allocating variables and compiler temporaries to the target machine's register set. Multiple variables and compiler temporaries can live in a single register as long as the variables/temporaries do not have overlapping lifetimes. ie, at any given point there can only be one variable or compiler temporary being held in a particular target register.


From time to time the compiler may need to use one of the target machine's registers. This requires making the desired target register free and emitting code to use it in the desired manner. This is known as register reloading.


Post reload flow analysis:
Post reload flow analysis allows the compiler to detect and eliminate dead code that was created or exposed by register allocation and reloading.


Constant equivalence handling for muliply-set pseudos:
The compiler identifies user variables and compiler temporaries that are always known to hold the same value throughout the entire function -- even if the user variable or temporary is set multiple times. This allows the compiler to substitute the equivalent expression in place of the user variable or temporary when it is profitable to do so.


PowerPC scheduling prologue & epilogue:
The compiler performs instruction scheduling on prologue and epilogue sequences for PowerPC targets. This can improve the performance of code for PowerPC targets.


Comparison optimizations:
The compiler more aggressively transforms combinations of comparisons into simpler comparisons. This is particularly useful on newer PowerPC targets.


New Functionality Recently Added
Target-independent register allocation enhancements:


    * New regmove optimizations
    * More reload inheritance
    * Local spilling in reload
    * Post reload flow analysis
    * Constant equivalence handling for multiple-times set pseudos
    * Equivalences of pseudos to stack slot addresses can now be handled
    * Support of PowerPC 750
    * PowerPC scheduling prologue & epilogue
    * Comparison optimizations (transformation of some combination of comparisons into simpler ones)
    * Data layout consistency testsuite

'KB > Win32/x86' 카테고리의 다른 글

sysinernals mark 블로그~  (0) 2007.04.30
컴파일러 설계강의 정리  (0) 2007.04.27
[펌] Windows Registry 데이터베이스 파일 분석  (0) 2007.04.10
Pfair (Proportionate fair) scheduling  (0) 2006.09.23
Operating System Inside  (0) 2006.09.20

+ Recent posts