책을 몇권 읽어보시고 각각의 항목에 해당하는 Tools 을 소스를 보시면 도움이 될겁니다.
ps. 몇몇 compiler tool 들은 문법( BNF, lex & yacc )만 정의하면 제한된 플랫픔(x86, VM, ... )에서 가능한 컴파일러를 만들어 줍니다.
Front end
The front end analyzes the source code to build an internal representation of the program, called the intermediate representation or IR. It also manages the symbol table, a data structure mapping each symbol in the source code to associated information such as location, type and scope. This is done over several phases, which includes some of the following:
1. Line reconstruction. Languages which strop their keywords or allow arbitrary spaces within identifiers require a phase before parsing, which converts the input character sequence to a canonical form ready for the parser. The top-down, recursive-descent, table-driven parsers used in the 1960s typically read the source one character at a time and did not require a separate tokenizing phase. Atlas Autocode, and Imp (and some implementations of Algol and Coral66) are examples of stropped languages whose compilers would have a Line Reconstruction phase.
2. Lexical analysis breaks the source code text into small pieces called tokens. Each token is a single atomic unit of the language, for instance a keyword, identifier or symbol name. The token syntax is typically a regular language, so a finite state automaton constructed from a regular expression can be used to recognize it. This phase is also called lexing or scanning, and the software doing lexical analysis is called a lexical analyzer or scanner.
3. Preprocessing. Some languages, e.g., C, require a preprocessing phase which supports macro substitution and conditional compilation. Typically the preprocessing phase occurs before syntactic or semantic analysis; e.g. in the case of C, the preprocessor manipulates lexical tokens rather than syntactic forms. However, some languages such as Scheme support macro substitutions based on syntactic forms.
4. Syntax analysis involves parsing the token sequence to identify the syntactic structure of the program. This phase typically builds a parse tree, which replaces the linear sequence of tokens with a tree structure built according to the rules of a formal grammar which define the language's syntax. The parse tree is often analyzed, augmented, and transformed by later phases in the compiler.
5. Semantic analysis is the phase in which the compiler adds semantic information to the parse tree and builds the symbol table. This phase performs semantic checks such as type checking (checking for type errors), or object binding (associating variable and function references with their definitions), or definite assignment (requiring all local variables to be initialized before use), rejecting incorrect programs or issuing warnings. Semantic analysis usually requires a complete parse tree, meaning that this phase logically follows the parsing phase, and logically precedes the code generation phase, though it is often possible to fold multiple phases into one pass over the code in a compiler implementation.
[edit] Back end
The term back end is sometimes confused with code generator because of the overlapped functionality of generating assembly code. Some literature uses middle end to distinguish the generic analysis and optimization phases in the back end from the machine-dependent code generators.
The main phases of the back end include the following:
1. Analysis: This is the gathering of program information from the intermediate representation derived from the input. Typical analyses are data flow analysis to build use-define chains, dependence analysis, alias analysis, pointer analysis, escape analysis etc. Accurate analysis is the basis for any compiler optimization. The call graph and control flow graph are usually also built during the analysis phase.
2. Optimization: the intermediate language representation is transformed into functionally equivalent but faster (or smaller) forms. Popular optimizations are inline expansion, dead code elimination, constant propagation, loop transformation, register allocation or even automatic parallelization.
3. Code generation: the transformed intermediate language is translated into the output language, usually the native machine language of the system. This involves resource and storage decisions, such as deciding which variables to fit into registers and memory and the selection and scheduling of appropriate machine instructions along with their associated addressing modes (see also Sethi-Ullman algorithm).
마법사 책 SICP (컴퓨터 프로그램의 구조와 해석) !!!!
http://insightbook.springnote.com/pages/253156
일단 이 책을 사서 언제나 들고 다니면서 읽으심이 좋을 것 같습니다.
임예진 팬클럽 ♡예진아씨♡ http://cafe.daum.net/imyejin
[예진아씨 피카사 웹앨범] 임예진 팬클럽 ♡예진아씨♡ http://cafe.daum.net/imyejin
감사합니다.
1년동안 열심히보고. 제대로 읽고 제대로 쓸 수 있다면 좋겠습니다.
매일 1억명이 사용하는 프로그램을 함께 만들어보고 싶습니다.
----------------------------------------------------------------------------
젊음'은 모든것을 가능하게 만든다.
매일 1억명이 사용하는 프로그램을 함께 만들어보고 싶습니다.
정규 근로 시간을 지키는. 야근 없는 회사와 거래합니다.
각 분야별. 좋은 책'이나 사이트' 블로그' 링크 소개 받습니다. shintx@naver.com
음...
학문적 접근을 한다면 책을 추천해드립니다. 종류도 몇권안되니 쉽게 찾아볼 수 있습니다.
구현적인 접근이라면 구글에서 "Compiler Tool" 로 검색해보시면 관련내용을 확인하실 수 있습니다.
그림과 같이 크게 Front-end, Back-end 로 나뉩니다.
( http://en.wikipedia.org/wiki/Image:Compiler.svg )
책을 몇권 읽어보시고 각각의 항목에 해당하는 Tools 을 소스를 보시면 도움이 될겁니다.
ps. 몇몇 compiler tool 들은 문법( BNF, lex & yacc )만 정의하면 제한된 플랫픔(x86, VM, ... )에서 가능한 컴파일러를 만들어 줍니다.
Hello World.
감사합니다.
아직은 감잡으려면 노력이 필요한것 같습니다.
매일 1억명이 사용하는 프로그램을 함께 만들어보고 싶습니다.
----------------------------------------------------------------------------
젊음'은 모든것을 가능하게 만든다.
매일 1억명이 사용하는 프로그램을 함께 만들어보고 싶습니다.
정규 근로 시간을 지키는. 야근 없는 회사와 거래합니다.
각 분야별. 좋은 책'이나 사이트' 블로그' 링크 소개 받습니다. shintx@naver.com
댓글 달기