nasmdoc.src
- # $Id: nasmdoc.src,v 2.3 1999/06/03 20:23:53 hpa Exp $
- #
- # Source code to NASM documentation
- #
- IR{-D} c{-D} option
- IR{-E} c{-E} option
- IR{-I} c{-I} option
- IR{-P} c{-P} option
- IR{-U} c{-U} option
- IR{-a} c{-a} option
- IR{-d} c{-d} option
- IR{-e} c{-e} option
- IR{-f} c{-f} option
- IR{-i} c{-i} option
- IR{-l} c{-l} option
- IR{-o} c{-o} option
- IR{-p} c{-p} option
- IR{-s} c{-s} option
- IR{-u} c{-u} option
- IR{-w} c{-w} option
- IR{!=} c{!=} operator
- IR{$ here} c{$} Here token
- IR{$$} c{$$} token
- IR{%} c{%} operator
- IR{%%} c{%%} operator
- IR{%+1} c{%+1} and c{%-1} syntax
- IA{%-1}{%+1}
- IR{%0} c{%0} parameter count
- IR{&} c{&} operator
- IR{&&} c{&&} operator
- IR{*} c{*} operator
- IR{..@} c{..@} symbol prefix
- IR{/} c{/} operator
- IR{//} c{//} operator
- IR{<} c{<} operator
- IR{<<} c{<<} operator
- IR{<=} c{<=} operator
- IR{<>} c{<>} operator
- IR{=} c{=} operator
- IR{==} c{==} operator
- IR{>} c{>} operator
- IR{>=} c{>=} operator
- IR{>>} c{>>} operator
- IR{?} c{?} MASM syntax
- IR{^} c{^} operator
- IR{^^} c{^^} operator
- IR{|} c{|} operator
- IR{||} c{||} operator
- IR{~} c{~} operator
- IR{%$} c{%$} and c{%$$} prefixes
- IA{%$$}{%$}
- IR{+ opaddition} c{+} operator, binary
- IR{+ opunary} c{+} operator, unary
- IR{+ modifier} c{+} modifier
- IR{- opsubtraction} c{-} operator, binary
- IR{- opunary} c{-} operator, unary
- IR{alignment, in bin sections} alignment, in c{bin} sections
- IR{alignment, in elf sections} alignment, in c{elf} sections
- IR{alignment, in win32 sections} alignment, in c{win32} sections
- IR{alignment, of elf common variables} alignment, of c{elf} common
- variables
- IR{alignment, in obj sections} alignment, in c{obj} sections
- IR{a.out, bsd version} c{a.out}, BSD version
- IR{a.out, linux version} c{a.out}, Linux version
- IR{autoconf} Autoconf
- IR{bitwise and} bitwise AND
- IR{bitwise or} bitwise OR
- IR{bitwise xor} bitwise XOR
- IR{block ifs} block IFs
- IR{borland pascal} Borland, Pascal
- IR{borland's win32 compilers} Borland, Win32 compilers
- IR{braces, after % sign} braces, after c{%} sign
- IR{bsd} BSD
- IR{c calling convention} C calling convention
- IR{c symbol names} C symbol names
- IA{critical expressions}{critical expression}
- IA{command line}{command-line}
- IA{case sensitivity}{case sensitive}
- IA{case-sensitive}{case sensitive}
- IA{case-insensitive}{case sensitive}
- IA{character constants}{character constant}
- IR{common object file format} Common Object File Format
- IR{common variables, alignment in elf} common variables, alignment
- in c{elf}
- IR{common, elf extensions to} c{COMMON}, c{elf} extensions to
- IR{common, obj extensions to} c{COMMON}, c{obj} extensions to
- IR{declaring structure} declaring structures
- IR{default-wrt mechanism} default-c{WRT} mechanism
- IR{devpac} DevPac
- IR{djgpp} DJGPP
- IR{dll symbols, exporting} DLL symbols, exporting
- IR{dll symbols, importing} DLL symbols, importing
- IR{dos} DOS
- IR{dos archive} DOS archive
- IR{dos source archive} DOS source archive
- IA{effective address}{effective addresses}
- IA{effective-address}{effective addresses}
- IR{elf shared libraries} c{elf} shared libraries
- IR{freebsd} FreeBSD
- IR{freelink} FreeLink
- IR{functions, c calling convention} functions, C calling convention
- IR{functions, pascal calling convention} functions, Pascal calling
- convention
- IR{global, aoutb extensions to} c{GLOBAL}, c{aoutb} extensions to
- IR{global, elf extensions to} c{GLOBAL}, c{elf} extensions to
- IR{got} GOT
- IR{got relocations} c{GOT} relocations
- IR{gotoff relocation} c{GOTOFF} relocations
- IR{gotpc relocation} c{GOTPC} relocations
- IR{linux elf} Linux ELF
- IR{logical and} logical AND
- IR{logical or} logical OR
- IR{logical xor} logical XOR
- IR{masm} MASM
- IA{memory reference}{memory references}
- IA{misc directory}{misc subdirectory}
- IR{misc subdirectory} c{misc} subdirectory
- IR{microsoft omf} Microsoft OMF
- IR{mmx registers} MMX registers
- IA{modr/m}{modr/m byte}
- IR{modr/m byte} ModR/M byte
- IR{ms-dos} MS-DOS
- IR{ms-dos device drivers} MS-DOS device drivers
- IR{multipush} c{multipush} macro
- IR{nasm version} NASM version
- IR{netbsd} NetBSD
- IR{omf} OMF
- IR{openbsd} OpenBSD
- IR{operating-system} operating system
- IR{os/2} OS/2
- IR{pascal calling convention} Pascal calling convention
- IR{passes} passes, assembly
- IR{perl} Perl
- IR{pic} PIC
- IR{pharlap} PharLap
- IR{plt} PLT
- IR{plt relocations} c{PLT} relocations
- IA{pre-defining macros}{pre-define}
- IR{qbasic} QBasic
- IA{rdoff subdirectory}{rdoff}
- IR{rdoff} c{rdoff} subdirectory
- IR{relocatable dynamic object file format} Relocatable Dynamic
- Object File Format
- IR{relocations, pic-specific} relocations, PIC-specific
- IA{repeating}{repeating code}
- IR{section alignment, in elf} section alignment, in c{elf}
- IR{section alignment, in bin} section alignment, in c{bin}
- IR{section alignment, in obj} section alignment, in c{obj}
- IR{section alignment, in win32} section alignment, in c{win32}
- IR{section, elf extensions to} c{SECTION}, c{elf} extensions to
- IR{section, win32 extensions to} c{SECTION}, c{win32} extensions to
- IR{segment alignment, in bin} segment alignment, in c{bin}
- IR{segment alignment, in obj} segment alignment, in c{obj}
- IR{segment, obj extensions to} c{SEGMENT}, c{obj} extensions to
- IR{segment names, borland pascal} segment names, Borland Pascal
- IR{shift command} c{shift} command
- IA{sib}{sib byte}
- IR{sib byte} SIB byte
- IA{standard section names}{standardised section names}
- IR{symbols, exporting from dlls} symbols, exporting from DLLs
- IR{symbols, importing from dlls} symbols, importing from DLLs
- IR{tasm} TASM
- IR{test subdirectory} c{test} subdirectory
- IR{tlink} TLINK
- IR{underscore, in c symbols} underscore, in C symbols
- IR{unix} Unix
- IR{unix source archive} Unix source archive
- IR{val} VAL
- IR{version number of nasm} version number of NASM
- IR{visual c++} Visual C++
- IR{www page} WWW page
- IR{win32} Win32
- IR{windows} Windows
- IR{windows 95} Windows 95
- IR{windows nt} Windows NT
- # IC{program entry point}{entry point, program}
- # IC{program entry point}{start point, program}
- # IC{MS-DOS device drivers}{device drivers, MS-DOS}
- # IC{16-bit mode, versus 32-bit mode}{32-bit mode, versus 16-bit mode}
- # IC{c symbol names}{symbol names, in C}
- C{intro} Introduction
- H{whatsnasm} What Is NASM?
- The Netwide Assembler, NASM, is an 80x86 assembler designed for
- portability and modularity. It supports a range of object file
- formats, including Linux c{a.out} and ELF, NetBSD/FreeBSD, COFF,
- Microsoft 16-bit OBJ and Win32. It will also output plain binary
- files. Its syntax is designed to be simple and easy to understand,
- similar to Intel's but less complex. It supports Pentium, P6 and MMX
- opcodes, and has macro capability.
- S{yaasm} Why Yet Another Assembler?
- The Netwide Assembler grew out of an idea on ic{comp.lang.asm.x86}
- (or possibly ic{alt.lang.asm} - I forget which), which was
- essentially that there didn't seem to be a good free x86-series
- assembler around, and that maybe someone ought to write one.
- b ic{a86} is good, but not free, and in particular you don't get any
- 32-bit capability until you pay. It's DOS only, too.
- b ic{gas} is free, and has been ported to DOS and Unix, but it's not very good,
- since it's designed to be a back end to ic{gcc}, which always feeds
- it correct code. So its error checking is minimal. Also, its syntax
- is horrible, from the point of view of anyone trying to actually
- e{write} anything in it. Plus you can't write 16-bit code in it
- (properly).
- b ic{as86} is Linux-specific, and (my version at least) doesn't seem to
- have much (or any) documentation.
- b i{MASM} isn't very good, and it's expensive, and it runs only under
- DOS.
- b i{TASM} is better, but still strives for i{MASM} compatibility, which
- means millions of directives and tons of red tape. And its syntax is
- essentially i{MASM}'s, with the contradictions and quirks that entails
- (although it sorts out some of those by means of Ideal mode). It's
- expensive too. And it's DOS-only.
- So here, for your coding pleasure, is NASM. At present it's
- still in prototype stage - we don't promise that it can outperform
- any of these assemblers. But please, e{please} send us bug reports,
- fixes, helpful information, and anything else you can get your hands
- on (and thanks to the many people who've done this already! You all
- know who you are), and we'll improve it out of all recognition.
- Again.
- S{legal} Licence Conditions
- Please see the file c{Licence}, supplied as part of any NASM
- distribution archive, for the i{licence} conditions under which you
- may use NASM.
- H{contact} Contact Information
- The current versions of NASM (since 0.98) are maintained by H. Peter
- Anvin, W{mailto:hpa@zytor.com}c{hpa@zytor.com}. If you want to report
- a bug, please read k{bugs} first.
- NASM has a i{WWW page} at
- W{http://www.cryogen.com/Nasm}c{http://www.cryogen.com/Nasm}.
- The original authors are i{e-mail}able as
- W{mailto:jules@earthcorp.com}c{jules@earthcorp.com} and
- W{mailto:anakin@pobox.com}c{anakin@pobox.com}.
- i{New releases} of NASM are uploaded to
- W{ftp://ftp.kernel.org/pub/software/devel/nasm/}ic{ftp.kernel.org},
- W{ftp://sunsite.unc.edu/pub/Linux/devel/lang/assemblers/}ic{sunsite.unc.edu},
- W{ftp://ftp.simtel.net/pub/simtelnet/msdos/asmutl/}ic{ftp.simtel.net}
- and
- W{ftp://ftp.coast.net/coast/msdos/asmutil/}ic{ftp.coast.net}.
- Announcements are posted to
- W{news:comp.lang.asm.x86}ic{comp.lang.asm.x86},
- W{news:alt.lang.asm}ic{alt.lang.asm},
- W{news:comp.os.linux.announce}ic{comp.os.linux.announce} and
- W{news:comp.archives.msdos.announce}ic{comp.archives.msdos.announce}
- (the last one is done automagically by uploading to
- W{ftp://ftp.simtel.net/pub/simtelnet/msdos/asmutl/}c{ftp.simtel.net}).
- If you don't have Usenet access, or would rather be informed by
- i{e-mail} when new releases come out, you can subscribe to the
- c{nasm-announce} email list by sending an email containing the line
- c{subscribe nasm-announce} to
- W{mailto:majordomo@linux.kernel.org}c{majordomo@linux.kernel.org}.
- If you want information about NASM beta releases, please subscribe to
- the c{nasm-beta} email list by sending an email containing the line
- c{subscribe nasm-beta} to
- W{mailto:majordomo@linux.kernel.org}c{majordomo@linux.kernel.org}.
- H{install} Installation
- S{instdos} i{Installing} NASM under MS-i{DOS} or Windows
- Once you've obtained the i{DOS archive} for NASM, ic{nasmXXX.zip}
- (where c{XXX} denotes the version number of NASM contained in the
- archive), unpack it into its own directory (for example
- c{c:\nasm}).
- The archive will contain four executable files: the NASM executable
- files ic{nasm.exe} and ic{nasmw.exe}, and the NDISASM executable
- files ic{ndisasm.exe} and ic{ndisasmw.exe}. In each case, the
- file whose name ends in c{w} is a i{Win32} executable, designed to
- run under i{Windows 95} or i{Windows NT} Intel, and the other one
- is a 16-bit i{DOS} executable.
- The only file NASM needs to run is its own executable, so copy
- (at least) one of c{nasm.exe} and c{nasmw.exe} to a directory on
- your PATH, or alternatively edit ic{autoexec.bat} to add the
- c{nasm} directory to your ic{PATH}. (If you're only installing the
- Win32 version, you may wish to rename it to c{nasm.exe}.)
- That's it - NASM is installed. You don't need the c{nasm} directory
- to be present to run NASM (unless you've added it to your c{PATH}),
- so you can delete it if you need to save space; however, you may
- want to keep the documentation or test programs.
- If you've downloaded the i{DOS source archive}, ic{nasmXXXs.zip},
- the c{nasm} directory will also contain the full NASM i{source
- code}, and a selection of i{Makefiles} you can (hopefully) use to
- rebuild your copy of NASM from scratch. The file c{Readme} lists the
- various Makefiles and which compilers they work with.
- Note that the source files c{insnsa.c}, c{insnsd.c}, c{insnsi.h}
- and c{insnsn.c} are automatically generated from the master
- instruction table c{insns.dat} by a Perl script; the file
- c{macros.c} is generated from c{standard.mac} by another Perl
- script. Although the NASM 0.98 distribution includes these generated
- files, you will need to rebuild them (and hence, will need a Perl
- interpreter) if you change c{insns.dat}, c{standard.mac} or the
- documentation. It is possible future source distributions may not
- include these files at all. Ports of i{Perl} for a variety of
- platforms, including DOS and Windows, are available from
- W{http://www.cpan.org/ports/}i{www.cpan.org}.
- S{instdos} Installing NASM under i{Unix}
- Once you've obtained the i{Unix source archive} for NASM,
- ic{nasm-X.XX.tar.gz} (where c{X.XX} denotes the version number of
- NASM contained in the archive), unpack it into a directory such
- as c{/usr/local/src}. The archive, when unpacked, will create its
- own subdirectory c{nasm-X.XX}.
- NASM is an I{Autoconf}Ic{configure}auto-configuring package: once
- you've unpacked it, c{cd} to the directory it's been unpacked into
- and type c{./configure}. This shell script will find the best C
- compiler to use for building NASM and set up i{Makefiles}
- accordingly.
- Once NASM has auto-configured, you can type ic{make} to build the
- c{nasm} and c{ndisasm} binaries, and then c{make install} to
- install them in c{/usr/local/bin} and install the i{man pages}
- ic{nasm.1} and ic{ndisasm.1} in c{/usr/local/man/man1}.
- Alternatively, you can give options such as c{--prefix} to the
- c{configure} script (see the file ic{INSTALL} for more details), or
- install the programs yourself.
- NASM also comes with a set of utilities for handling the RDOFF
- custom object-file format, which are in the ic{rdoff} subdirectory
- of the NASM archive. You can build these with c{make rdf} and
- install them with c{make rdf_install}, if you want them.
- If NASM fails to auto-configure, you may still be able to make it
- compile by using the fall-back Unix makefile ic{Makefile.unx}.
- Copy or rename that file to c{Makefile} and try typing c{make}.
- There is also a c{Makefile.unx} file in the c{rdoff} subdirectory.
- C{running} Running NASM
- H{syntax} NASM i{Command-Line} Syntax
- To assemble a file, you issue a command of the form
- c nasm -f <format> <filename> [-o <output>]
- For example,
- c nasm -f elf myfile.asm
- will assemble c{myfile.asm} into an ELF object file c{myfile.o}. And
- c nasm -f bin myfile.asm -o myfile.com
- will assemble c{myfile.asm} into a raw binary file c{myfile.com}.
- To produce a listing file, with the hex codes output from NASM
- displayed on the left of the original sources, use the c{-l} option
- to give a listing file name, for example:
- c nasm -f coff myfile.asm -l myfile.lst
- To get further usage instructions from NASM, try typing
- c nasm -h
- This will also list the available output file formats, and what they
- are.
- If you use Linux but aren't sure whether your system is c{a.out} or
- ELF, type
- c file nasm
- (in the directory in which you put the NASM binary when you
- installed it). If it says something like
- c nasm: ELF 32-bit LSB executable i386 (386 and up) Version 1
- then your system is ELF, and you should use the option c{-f elf}
- when you want NASM to produce Linux object files. If it says
- c nasm: Linux/i386 demand-paged executable (QMAGIC)
- or something similar, your system is c{a.out}, and you should use
- c{-f aout} instead. (Linux c{a.out} systems are considered obsolete,
- and are rare these days.)
- Like Unix compilers and assemblers, NASM is silent unless it
- goes wrong: you won't see any output at all, unless it gives error
- messages.
- S{opt-o} The ic{-o} Option: Specifying the Output File Name
- NASM will normally choose the name of your output file for you;
- precisely how it does this is dependent on the object file format.
- For Microsoft object file formats (ic{obj} and ic{win32}), it
- will remove the c{.asm} i{extension} (or whatever extension you
- like to use - NASM doesn't care) from your source file name and
- substitute c{.obj}. For Unix object file formats (ic{aout},
- ic{coff}, ic{elf} and ic{as86}) it will substitute c{.o}. For
- ic{rdf}, it will use c{.rdf}, and for the ic{bin} format it
- will simply remove the extension, so that c{myfile.asm} produces
- the output file c{myfile}.
- If the output file already exists, NASM will overwrite it, unless it
- has the same name as the input file, in which case it will give a
- warning and use ic{nasm.out} as the output file name instead.
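- The naming rules above can be sketched in a few lines of Python. This is
- only an illustration of the behaviour described in this section, not
- NASM's own code; the helper name c{default_output_name} and the table
- c{DEFAULT_EXTENSION} are made up for the example.

```python
import os

# Extension chosen per output format, as described in the text above.
# An empty string means "just strip the source extension" (the bin format).
DEFAULT_EXTENSION = {
    "obj": ".obj", "win32": ".obj",
    "aout": ".o", "coff": ".o", "elf": ".o", "as86": ".o",
    "rdf": ".rdf",
    "bin": "",
}


def default_output_name(source, fmt):
    """Hypothetical helper: guess the output name chosen when -o is absent."""
    stem, _ext = os.path.splitext(source)
    candidate = stem + DEFAULT_EXTENSION[fmt]
    # If the result would overwrite the input file, fall back to nasm.out.
    if candidate == source:
        return "nasm.out"
    return candidate
```

- Under these assumptions, c{default_output_name("myfile.asm", "elf")}
- gives c{myfile.o}, while a source file that has no extension and is
- assembled with c{-f bin} falls back to c{nasm.out}.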
- For situations in which this behaviour is unacceptable, NASM
- provides the c{-o} command-line option, which allows you to specify
- your desired output file name. You invoke c{-o} by following it
- with the name you wish for the output file, either with or without
- an intervening space. For example:
- c nasm -f bin program.asm -o program.com
- c nasm -f bin driver.asm -odriver.sys
- S{opt-f} The ic{-f} Option: Specifying the i{Output File Format}
- If you do not supply the c{-f} option to NASM, it will choose an
- output file format for you itself. In the distribution versions of
- NASM, the default is always ic{bin}; if you've compiled your own
- copy of NASM, you can redefine ic{OF_DEFAULT} at compile time and
- choose what you want the default to be.
- Like c{-o}, the intervening space between c{-f} and the output
- file format is optional; so c{-f elf} and c{-felf} are both valid.
- A complete list of the available output file formats can be given by
- issuing the command ic{nasm -h}.
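- As a rough sketch of the `optional space' rule shared by c{-o}, c{-f}
- and c{-l}, the following Python pairs each flag with its argument in
- either spelling. The function c{split_option_args} is hypothetical and
- is not how NASM itself parses its command line.

```python
def split_option_args(argv, flags=("-o", "-f", "-l")):
    """Hypothetical helper: pair each listed flag with its argument.

    Accepts both the attached form ("-felf") and the spaced form
    ("-f", "elf"). Anything else on the command line is skipped.
    """
    pairs = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg[:2] in flags:
            if len(arg) > 2:
                value = arg[2:]           # attached form: -felf
            elif i + 1 < len(argv):
                i += 1
                value = argv[i]           # spaced form: -f elf
            else:
                raise ValueError("missing argument after " + arg)
            pairs.append((arg[:2], value))
        i += 1
    return pairs
```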
- S{opt-l} The ic{-l} Option: Generating a i{Listing File}
- If you supply the c{-l} option to NASM, followed (with the usual
- optional space) by a file name, NASM will generate a
- i{source-listing file} for you, in which addresses and generated
- code are listed on the left, and the actual source code, with
- expansions of multi-line macros (except those which specifically
- request no expansion in source listings: see k{nolist}) on the
- right. For example:
- c nasm -f elf myfile.asm -l myfile.lst
- S{opt-E} The ic{-E} Option: Send Errors to a File
- Under MS-i{DOS} it can be difficult (though there are ways) to
- redirect the standard-error output of a program to a file. Since
- NASM usually produces its warning and i{error messages} on
- ic{stderr}, this can make it hard to capture the errors if (for
- example) you want to load them into an editor.
- NASM therefore provides the c{-E} option, taking a filename argument
- which causes errors to be sent to the specified file rather than
- standard error. Therefore you can I{redirecting errors}redirect
- the errors into a file by typing
- c nasm -E myfile.err -f obj myfile.asm
- S{opt-s} The ic{-s} Option: Send Errors to ic{stdout}
- The c{-s} option redirects i{error messages} to c{stdout} rather
- than c{stderr}, so it can be redirected under MS-i{DOS}. To
- assemble the file c{myfile.asm} and pipe its output to the c{more}
- program, you can type:
- c nasm -s -f obj myfile.asm | more
- See also the c{-E} option, k{opt-E}.
- S{opt-i} The ic{-i}Ic{-I} Option: Include File Search Directories
- When NASM sees the ic{%include} directive in a source file (see
- k{include}), it will search for the given file not only in the
- current directory, but also in any directories specified on the
- command line by the use of the c{-i} option. Therefore you can
- include files from a i{macro library}, for example, by typing
- c nasm -ic:\macrolib\ -f obj myfile.asm
- (As usual, a space between c{-i} and the path name is allowed, and
- optional).
- NASM, in the interests of complete source-code portability, does not
- understand the file naming conventions of the OS it is running on;
- the string you provide as an argument to the c{-i} option will be
- prepended exactly as written to the name of the include file.
- Therefore the trailing backslash in the above example is necessary.
- Under Unix, a trailing forward slash is similarly necessary.
- (You can use this to your advantage, if you're really i{perverse},
- by noting that the option c{-ifoo} will cause c{%include "bar.i"}
- to search for the file c{foobar.i}...)
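- The prepending rule is plain string concatenation, which a short Python
- sketch makes concrete. The helper c{include_candidates} is invented for
- this example, and it assumes that the current directory is tried before
- any c{-i} prefix; the text above does not state the search order.

```python
def include_candidates(name, prefixes):
    """Hypothetical helper: paths tried for an %include of `name`.

    Assumption: the bare name (current directory) comes first, then
    each -i argument prepended exactly as written, with no separator
    added between the prefix and the file name.
    """
    return [name] + [prefix + name for prefix in prefixes]
```

- With c{prefixes} set to c{["foo"]}, an include of c{bar.i} yields the
- candidates c{bar.i} and c{foobar.i}, which is why the trailing slash or
- backslash matters.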
- If you want to define a e{standard} i{include search path},
- similar to c{/usr/include} on Unix systems, you should place one or
- more c{-i} directives in the c{NASM} environment variable (see
- k{nasmenv}).
- For Makefile compatibility with many C compilers, this option can also
- be specified as c{-I}.
- S{opt-p} The ic{-p}Ic{-P} Option: I{pre-including files}Pre-Include a File
- Ic{%include}NASM allows you to specify files to be
- e{pre-included} into your source file, by the use of the c{-p}
- option. So running
- c nasm myfile.asm -p myinc.inc
- is equivalent to running c{nasm myfile.asm} and placing the
- directive c{%include "myinc.inc"} at the start of the file.
- For consistency with the c{-I}, c{-D} and c{-U} options, this
- option can also be specified as c{-P}.
- S{opt-d} The ic{-d}Ic{-D} Option: I{pre-defining macros} Pre-Define a Macro
- Ic{%define}Just as the c{-p} option gives an alternative to placing
- c{%include} directives at the start of a source file, the c{-d}
- option gives an alternative to placing a c{%define} directive. You
- could code
- c nasm myfile.asm -dFOO=100
- as an alternative to placing the directive
- c %define FOO 100
- at the start of the file. You can miss off the macro value, as well:
- the option c{-dFOO} is equivalent to coding c{%define FOO}. This
- form of the directive may be useful for selecting i{assembly-time
- options} which are then tested using c{%ifdef}, for example
- c{-dDEBUG}.
- For Makefile compatibility with many C compilers, this option can also
- be specified as c{-D}.
- S{opt-u} The ic{-u}Ic{-U} Option: I{Undefining macros} Undefine a Macro
- Ic{%undef}The c{-u} option undefines a macro that would otherwise
- have been pre-defined, either automatically or by a c{-p} or c{-d}
- option specified earlier on the command line.
- For example, the following command line:
- c nasm myfile.asm -dFOO=100 -uFOO
- would result in c{FOO} e{not} being a predefined macro in the
- program. This is useful to override options specified at a different
- point in a Makefile.
- For Makefile compatibility with many C compilers, this option can also
- be specified as c{-U}.
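- The interaction between c{-d} and c{-u} depends only on the order of
- the options. A minimal Python sketch, assuming the attached spelling
- (no space after the flag) and using a made-up helper name
- c{predefined_macros}, might look like this:

```python
def predefined_macros(options):
    """Hypothetical helper: apply -d/-D and -u/-U options left to right."""
    macros = {}
    for opt in options:
        flag, body = opt[:2], opt[2:]
        if flag in ("-d", "-D"):
            name, _sep, value = body.partition("=")
            macros[name] = value          # "-dFOO" defines FOO with an empty value
        elif flag in ("-u", "-U"):
            macros.pop(body, None)        # removing an undefined name does nothing
    return macros
```

- Here c{["-dFOO=100", "-uFOO"]} leaves no macro defined, whereas the
- reversed order c{["-uFOO", "-dFOO=100"]} leaves c{FOO} set to c{100}.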
- S{opt-e} The ic{-e} Option: Preprocess Only
- NASM allows the i{preprocessor} to be run on its own, up to a
- point. Using the c{-e} option (which requires no arguments) will
- cause NASM to preprocess its input file, expand all the macro
- references, remove all the comments and preprocessor directives, and
- print the resulting file on standard output (or save it to a file,
- if the c{-o} option is also used).
- This option cannot be applied to programs which require the
- preprocessor to evaluate I{preprocessor expressions}i{expressions}
- which depend on the values of symbols: so code such as
- c %assign tablesize ($-tablestart)
- will cause an error in i{preprocess-only mode}.
- S{opt-a} The ic{-a} Option: Don't Preprocess At All
- If NASM is being used as the back end to a compiler, it might be
- desirable to I{suppressing preprocessing}suppress preprocessing
- completely and assume the compiler has already done it, to save time
- and increase compilation speeds. The c{-a} option, requiring no
- argument, instructs NASM to replace its powerful i{preprocessor}
- with a i{stub preprocessor} which does nothing.
- S{opt-w} The ic{-w} Option: Enable or Disable Assembly i{Warnings}
- NASM can observe many conditions during the course of assembly which
- are worth mentioning to the user, but are not sufficiently severe
- to justify NASM refusing to generate an output file. These
- conditions are reported like errors, but come up with the word
- `warning' before the message. Warnings do not prevent NASM from
- generating an output file and returning a success status to the
- operating system.
- Some conditions are even less severe than that: they are only
- sometimes worth mentioning to the user. Therefore NASM supports the
- c{-w} command-line option, which enables or disables certain
- classes of assembly warning. Such warning classes are described by a
- name, for example c{orphan-labels}; you can enable warnings of
- this class by the command-line option c{-w+orphan-labels} and
- disable them with c{-w-orphan-labels}.
- The i{suppressible warning} classes are:
- b ic{macro-params} covers warnings about i{multi-line macros}
- being invoked with the wrong number of parameters. This warning
- class is enabled by default; see k{mlmacover} for an example of why
- you might want to disable it.
- b ic{orphan-labels} covers warnings about source lines which
- contain no instruction but define a label without a trailing colon.
- NASM does not warn about this somewhat obscure condition by default;
- see k{syntax} for an example of why you might want it to.
- b ic{number-overflow} covers warnings about numeric constants which
- don't fit in 32 bits (for example, it's easy to type one too many Fs
- and produce c{0x7ffffffff} by mistake). This warning class is
- enabled by default.
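- The three classes and their defaults can be modelled as a small state
- table that c{-w+name} and c{-w-name} options update in order. The
- Python below is an illustrative sketch only; c{DEFAULT_WARNINGS} and
- c{apply_warning_options} are names invented for the example.

```python
# Defaults as stated in the list above.
DEFAULT_WARNINGS = {
    "macro-params": True,
    "orphan-labels": False,
    "number-overflow": True,
}


def apply_warning_options(options):
    """Hypothetical helper: fold -w+name / -w-name options into a state table."""
    state = dict(DEFAULT_WARNINGS)
    for opt in options:
        if not opt.startswith("-w") or len(opt) < 4 or opt[2] not in "+-":
            continue                      # not a warning option; ignore it here
        name = opt[3:]
        if name not in state:
            raise ValueError("unknown warning class: " + name)
        state[name] = (opt[2] == "+")     # later options override earlier ones
    return state
```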
- S{nasmenv} The c{NASM} i{Environment} Variable
- If you define an environment variable called c{NASM}, the program
- will interpret it as a list of extra command-line options, which are
- processed before the real command line. You can use this to define
- standard search directories for include files, by putting c{-i}
- options in the c{NASM} variable.
- The value of the variable is split up at white space, so that the
- value c{-s -ic:\nasmlib} will be treated as two separate options.
- However, that means that the value c{-dNAME="my name"} won't do
- what you might want, because it will be split at the space and the
- NASM command-line processing will get confused by the two
- nonsensical words c{-dNAME="my} and c{name"}.
- To get round this, NASM provides a feature whereby, if you begin the
- c{NASM} environment variable with some character that isn't a minus
- sign, then NASM will treat this character as the i{separator
- character} for options. So setting the c{NASM} variable to the
- value c{!-s!-ic:\nasmlib} is equivalent to setting it to c{-s
- -ic:\nasmlib}, but c{!-dNAME="my name"} will work.
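- The two splitting modes can be illustrated with a short Python sketch.
- It follows the description above and nothing more; the function name
- c{split_nasm_env} is hypothetical, and edge cases such as leading
- white space are not covered by the text.

```python
def split_nasm_env(value):
    """Hypothetical helper: split the NASM environment variable into options."""
    if not value:
        return []
    if value[0] == "-":
        return value.split()              # ordinary case: split at white space
    separator = value[0]                  # any other first character is the separator
    return [part for part in value[1:].split(separator) if part]
```

- Splitting c{!-dNAME="my name"} this way keeps the quoted value in one
- option, while the same value without the leading c{!} is cut in two at
- the space.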
- H{qstart} i{Quick Start} for i{MASM} Users
- If you're used to writing programs with MASM, or with i{TASM} in
- MASM-compatible (non-Ideal) mode, or with ic{a86}, this section
- attempts to outline the major differences between MASM's syntax and
- NASM's. If you're not already used to MASM, it's probably worth
- skipping this section.
- S{qscs} NASM Is I{case sensitivity}Case-Sensitive
- One simple difference is that NASM is case-sensitive. It makes a
- difference whether you call your label c{foo}, c{Foo} or c{FOO}.
- If you're assembling to DOS or OS/2 c{.OBJ} files, you can invoke
- the ic{UPPERCASE} directive (documented in k{objfmt}) to ensure
- that all symbols exported to other code modules are forced to be
- upper case; but even then, e{within} a single module, NASM will
- distinguish between labels differing only in case.
- S{qsbrackets} NASM Requires i{Square Brackets} For i{Memory References}
- NASM was designed with simplicity of syntax in mind. One of the
- i{design goals} of NASM is that it should be possible, as far as is
- practical, for the user to look at a single line of NASM code
- and tell what opcode is generated by it. You can't do this in MASM:
- if you declare, for example,
- c foo equ 1
- c bar dw 2
- then the two lines of code
- c mov ax,foo
- c mov ax,bar
- generate completely different opcodes, despite having
- identical-looking syntaxes.
- NASM avoids this undesirable situation by having a much simpler
- syntax for memory references. The rule is simply that any access to
- the e{contents} of a memory location requires square brackets
- around the address, and any access to the e{address} of a variable
- doesn't. So an instruction of the form c{mov ax,foo} will
- e{always} refer to a compile-time constant, whether it's an c{EQU}
- or the address of a variable; and to access the e{contents} of the
- variable c{bar}, you must code c{mov ax,[bar]}.
- This also means that NASM has no need for MASM's ic{OFFSET}
- keyword, since the MASM code c{mov ax,offset bar} means exactly the
- same thing as NASM's c{mov ax,bar}. If you're trying to get
- large amounts of MASM code to assemble sensibly under NASM, you
- can always code c{%idefine offset} to make the preprocessor treat
- the c{OFFSET} keyword as a no-op.
- This issue is even more confusing in ic{a86}, where declaring a
- label with a trailing colon defines it to be a `label' as opposed to
- a `variable' and causes c{a86} to adopt NASM-style semantics; so in
- c{a86}, c{mov ax,var} has different behaviour depending on whether
- c{var} was declared as c{var: dw 0} (a label) or c{var dw 0} (a
- word-size variable). NASM is very simple by comparison:
- e{everything} is a label.
- NASM, in the interests of simplicity, also does not support the
- i{hybrid syntaxes} supported by MASM and its clones, such as
- c{mov ax,table[bx]}, where a memory reference is denoted by one
- portion outside square brackets and another portion inside. The
- correct syntax for the above is c{mov ax,[table+bx]}. Likewise,
- c{mov ax,es:[di]} is wrong and c{mov ax,[es:di]} is right.
- S{qstypes} NASM Doesn't Store i{Variable Types}
- NASM, by design, chooses not to remember the types of variables you
- declare. Whereas MASM will remember, on seeing c{var dw 0}, that
- you declared c{var} as a word-size variable, and will then be able
- to fill in the i{ambiguity} in the size of the instruction c{mov
- var,2}, NASM will deliberately remember nothing about the symbol
- c{var} except where it begins, and so you must explicitly code
- c{mov word [var],2}.
- For this reason, NASM doesn't support the c{LODS}, c{MOVS},
- c{STOS}, c{SCAS}, c{CMPS}, c{INS}, or c{OUTS} instructions,
- but only supports the forms such as c{LODSB}, c{MOVSW}, and
- c{SCASD}, which explicitly specify the size of the components of
- the strings being manipulated.
- S{qsassume} NASM Doesn't ic{ASSUME}
- As part of NASM's drive for simplicity, it also does not support the
- c{ASSUME} directive. NASM will not keep track of what values you
- choose to put in your segment registers, and will never
- e{automatically} generate a i{segment override} prefix.
- S{qsmodel} NASM Doesn't Support i{Memory Models}
- NASM also does not have any directives to support different 16-bit
- memory models. The programmer has to keep track of which functions
- are supposed to be called with a i{far call} and which with a
- i{near call}, and is responsible for putting the correct form of
- c{RET} instruction (c{RETN} or c{RETF}; NASM accepts c{RET}
- itself as an alternate form for c{RETN}); in addition, the
- programmer is responsible for coding c{CALL FAR} instructions where
- necessary when calling e{external} functions, and must also keep
- track of which external variable definitions are far and which are
- near.
- S{qsfpu} i{Floating-Point} Differences
- NASM uses different names to refer to floating-point registers from
- MASM: where MASM would call them c{ST(0)}, c{ST(1)} and so on, and
- ic{a86} would call them simply c{0}, c{1} and so on, NASM
- chooses to call them c{st0}, c{st1} etc.
- As of version 0.96, NASM now treats the instructions with
- i{`nowait'} forms in the same way as MASM-compatible assemblers.
- The idiosyncratic treatment employed by 0.95 and earlier was based
- on a misunderstanding by the authors.
- S{qsother} Other Differences
- For historical reasons, NASM uses the keyword ic{TWORD} where MASM
- and compatible assemblers use ic{TBYTE}.
- NASM does not declare i{uninitialised storage} in the same way as
- MASM: where a MASM programmer might use c{stack db 64 dup (?)},
- NASM requires c{stack resb 64}, intended to be read as `reserve 64
- bytes'. For a limited amount of compatibility, since NASM treats
- c{?} as a valid character in symbol names, you can code c{? equ 0}
- and then writing c{dw ?} will at least do something vaguely useful.
- Ic{RESB}ic{DUP} is still not a supported syntax, however.
- In addition to all of this, macros and directives work completely
- differently to MASM. See k{preproc} and k{directive} for further
- details.
- C{lang} The NASM Language
- H{syntax} Layout of a NASM Source Line
- As in most assemblers, each NASM source line contains (unless it
- is a macro, a preprocessor directive or an assembler directive: see
- k{preproc} and k{directive}) some combination of the four fields
- c label: instruction operands ; comment
- As usual, most of these fields are optional; the presence or absence
- of any combination of a label, an instruction and a comment is allowed.
- Of course, the operand field is either required or forbidden by the
- presence and nature of the instruction field.
- NASM places no restrictions on white space within a line: labels may
- have white space before them, or instructions may have no space
- before them, or anything. The i{colon} after a label is also
- optional. (Note that this means that if you intend to code c{lodsb}
- alone on a line, and type c{lodab} by accident, then that's still a
- valid source line which does nothing but define a label. Running
- NASM with the command-line option
- I{orphan-labels}c{-w+orphan-labels} will cause it to warn you if
- you define a label alone on a line without a i{trailing colon}.)
- i{Valid characters} in labels are letters, numbers, c{_}, c{$},
- c{#}, c{@}, c{~}, c{.}, and c{?}. The only characters which may
- be used as the e{first} character of an identifier are letters,
- c{.} (with special meaning: see k{locallab}), c{_} and c{?}.
- An identifier may also be prefixed with a I{$prefix}c{$} to
- indicate that it is intended to be read as an identifier and not a
- reserved word; thus, if some other module you are linking with
- defines a symbol called c{eax}, you can refer to c{$eax} in NASM
- code to distinguish the symbol from the register.
- The instruction field may contain any machine instruction: Pentium
- and P6 instructions, FPU instructions, MMX instructions and even
- undocumented instructions are all supported. The instruction may be
- prefixed by c{LOCK}, c{REP}, c{REPE}/c{REPZ} or
- c{REPNE}/c{REPNZ}, in the usual way. Explicit I{address-size
- prefixes}address-size and i{operand-size prefixes} c{A16},
- c{A32}, c{O16} and c{O32} are provided - one example of their use
- is given in k{mixsize}. You can also use the name of a I{segment
- override}segment register as an instruction prefix: coding
- c{es mov [bx],ax} is equivalent to coding c{mov [es:bx],ax}. We
- recommend the latter syntax, since it is consistent with other
- syntactic features of the language, but for instructions such as
- c{LODSB}, which have no operands and yet can require a segment
- override, there is no clean syntactic way to proceed apart from
- c{es lodsb}.
- An instruction is not required to use a prefix: prefixes such as
- c{CS}, c{A32}, c{LOCK} or c{REPE} can appear on a line by
- themselves, and NASM will just generate the prefix bytes.
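- A small sketch of what this permits: the two lines below should
- produce the same bytes as c{rep movsb} written on a single line.
- c rep ; emits just the REP prefix byte
- c movsb ; the instruction the prefix applies to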
- In addition to actual machine instructions, NASM also supports a
- number of pseudo-instructions, described in k{pseudop}.
- Instruction i{operands} may take a number of forms: they can be
- registers, described simply by the register name (e.g. c{ax},
- c{bp}, c{ebx}, c{cr0}: NASM does not use the c{gas}-style
- syntax in which register names must be prefixed by a c{%} sign), or
- they can be i{effective addresses} (see k{effaddr}), constants
- (k{const}) or expressions (k{expr}).
- For i{floating-point} instructions, NASM accepts a wide range of
- syntaxes: you can use two-operand forms like MASM supports, or you
- can use NASM's native single-operand forms in most cases. Details of
- all forms of each supported instruction are given in
- k{iref}. For example, you can code:
- c fadd st1 ; this sets st0 := st0 + st1
- c fadd st0,st1 ; so does this
- c
- c fadd st1,st0 ; this sets st1 := st1 + st0
- c fadd to st1 ; so does this
- Almost any floating-point instruction that references memory must
- use one of the prefixes ic{DWORD}, ic{QWORD} or ic{TWORD} to
- indicate what size of i{memory operand} it refers to.
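- For instance, assuming c{single_var}, c{double_var} and
- c{ext_var} are hypothetical variables of the appropriate sizes
- declared elsewhere, the size prefixes are used like this:
- c fld dword [single_var] ; 32-bit single precision
- c fld qword [double_var] ; 64-bit double precision
- c fstp tword [ext_var] ; 80-bit extended precision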
- H{pseudop} i{Pseudo-Instructions}
- Pseudo-instructions are things which, though not real x86 machine
- instructions, are used in the instruction field anyway because
- that's the most convenient place to put them. The current
- pseudo-instructions are ic{DB}, ic{DW}, ic{DD}, ic{DQ} and
- ic{DT}, their i{uninitialised} counterparts ic{RESB},
- ic{RESW}, ic{RESD}, ic{RESQ} and ic{REST}, the ic{INCBIN}
- command, the ic{EQU} command, and the ic{TIMES} prefix.
- S{db} c{DB} and friends: Declaring Initialised Data
- ic{DB}, ic{DW}, ic{DD}, ic{DQ} and ic{DT} are used, much
- as in MASM, to declare initialised data in the output file. They can
- be invoked in a wide range of ways:
- I{floating-point}I{character constant}I{string constant}
- c db 0x55 ; just the byte 0x55
- c db 0x55,0x56,0x57 ; three bytes in succession
- c db 'a',0x55 ; character constants are OK
- c db 'hello',13,10,'$' ; so are string constants
- c dw 0x1234 ; 0x34 0x12
- c dw 'a' ; 0x61 0x00 (it's just a number)
- c dw 'ab' ; 0x61 0x62 (character constant)
- c dw 'abc' ; 0x61 0x62 0x63 0x00 (string)
- c dd 0x12345678 ; 0x78 0x56 0x34 0x12
- c dd 1.234567e20 ; floating-point constant
- c dq 1.234567e20 ; double-precision float
- c dt 1.234567e20 ; extended-precision float
- c{DQ} and c{DT} do not accept i{numeric constants} or string
- constants as operands.
- S{resb} c{RESB} and friends: Declaring i{Uninitialised} Data
- ic{RESB}, ic{RESW}, ic{RESD}, ic{RESQ} and ic{REST} are
- designed to be used in the BSS section of a module: they declare
- e{uninitialised} storage space. Each takes a single operand, which
- is the number of bytes, words, doublewords or whatever to reserve.
- As stated in k{qsother}, NASM does not support the MASM/TASM syntax
- of reserving uninitialised space by writing Ic{?}c{DW ?} or
- similar things: this is what it does instead. The operand to a
- c{RESB}-type pseudo-instruction is a ie{critical expression}: see
- k{crit}.
- For example:
- c buffer: resb 64 ; reserve 64 bytes
- c wordvar: resw 1 ; reserve a word
- c realarray resq 10 ; array of ten reals
- S{incbin} ic{INCBIN}: Including External i{Binary Files}
- c{INCBIN} is borrowed from the old Amiga assembler i{DevPac}: it
- includes a binary file verbatim into the output file. This can be
- handy for (for example) including i{graphics} and i{sound} data
- directly into a game executable file. It can be called in one of
- these three ways:
- c incbin "file.dat" ; include the whole file
- c incbin "file.dat",1024 ; skip the first 1024 bytes
- c incbin "file.dat",1024,512 ; skip the first 1024, and
- c ; actually include at most 512
- S{equ} ic{EQU}: Defining Constants
- c{EQU} defines a symbol to have a given constant value: when
- c{EQU} is used, the source line must contain a label. The action
- of c{EQU} is
- to define the given label name to the value of its (only) operand.
- This definition is absolute, and cannot change later. So, for
- example,
- c message db 'hello, world'
- c msglen equ $-message
- defines c{msglen} to be the constant 12. c{msglen} may not then be
- redefined later. This is not a i{preprocessor} definition either:
- the value of c{msglen} is evaluated e{once}, using the value of
- c{$} (see k{expr} for an explanation of c{$}) at the point of
- definition, rather than being evaluated wherever it is referenced
- and using the value of c{$} at the point of reference. Note that
- the operand to an c{EQU} is also a i{critical expression}
- (k{crit}).
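- The contrast with a preprocessor definition can be sketched as
- follows. Here c{sofar} is a hypothetical single-line macro (see
- k{slmacro}) whose body is substituted as text wherever it is used,
- so its c{$} is measured afresh each time:
- c message db 'hello, world'
- c msglen equ $-message ; evaluated once: 12
- c %define sofar ($-message) ; just text, not a value
- c db 13,10 ; two more bytes
- c dw msglen ; stores 12
- c dw sofar ; stores 16: $ is measured on this line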
- S{times} ic{TIMES}: i{Repeating} Instructions or Data
- The c{TIMES} prefix causes the instruction to be assembled multiple
- times. This is partly present as NASM's equivalent of the ic{DUP}
- syntax supported by i{MASM}-compatible assemblers, in that you can
- code
- c zerobuf: times 64 db 0
- or similar things; but c{TIMES} is more versatile than that. The
- argument to c{TIMES} is not just a numeric constant, but a numeric
- e{expression}, so you can do things like
- c buffer: db 'hello, world'
- c times 64-$+buffer db ' '
- which will store exactly enough spaces to make the total length of
- c{buffer} up to 64. Finally, c{TIMES} can be applied to ordinary
- instructions, so you can code trivial i{unrolled loops} in it:
- c times 100 movsb
- Note that there is no effective difference between c{times 100 resb
- 1} and c{resb 100}, except that the latter will be assembled about
- 100 times faster due to the internal structure of the assembler.
- The operand to c{TIMES}, like that of c{EQU} and those of c{RESB}
- and friends, is a critical expression (k{crit}).
- Note also that c{TIMES} can't be applied to i{macros}: the reason
- for this is that c{TIMES} is processed after the macro phase, which
- allows the argument to c{TIMES} to contain expressions such as
- c{64-$+buffer} as above. To repeat more than one line of code, or a
- complex macro, use the preprocessor ic{%rep} directive.
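- As a minimal sketch of the c{%rep} alternative (the instructions
- chosen are only an illustration), the following repeats a two-line
- sequence four times, shifting the 32-bit value held in c{DX:AX}
- left by four bits:
- c %rep 4
- c shl ax,1 ; top bit of AX goes into the carry flag
- c rcl dx,1 ; carry flag rotates into the bottom of DX
- c %endrep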
- H{effaddr} Effective Addresses
- An i{effective address} is any operand to an instruction which
- I{memory reference}references memory. Effective addresses, in NASM,
- have a very simple syntax: they consist of an expression evaluating
- to the desired address, enclosed in i{square brackets}. For
- example:
- c wordvar dw 123
- c mov ax,[wordvar]
- c mov ax,[wordvar+1]
- c mov ax,[es:wordvar+bx]
- Anything not conforming to this simple system is not a valid memory
- reference in NASM, for example c{es:wordvar[bx]}.
- More complicated effective addresses, such as those involving more
- than one register, work in exactly the same way:
- c mov eax,[ebx*2+ecx+offset]
- c mov ax,[bp+di+8]
- NASM is capable of doing i{algebra} on these effective addresses,
- so that things which don't necessarily e{look} legal are perfectly
- all right:
- c mov eax,[ebx*5] ; assembles as [ebx*4+ebx]
- c mov eax,[label1*2-label2] ; ie [label1+(label1-label2)]
- Some forms of effective address have more than one assembled form;
- in most such cases NASM will generate the smallest form it can. For
- example, there are distinct assembled forms for the 32-bit effective
- addresses c{[eax*2+0]} and c{[eax+eax]}, and NASM will generally
- generate the latter on the grounds that the former requires four
- bytes to store a zero offset.
- NASM has a hinting mechanism which will cause c{[eax+ebx]} and
- c{[ebx+eax]} to generate different opcodes; this is occasionally
- useful because c{[esi+ebp]} and c{[ebp+esi]} have different
- default segment registers.
- However, you can force NASM to generate an effective address in a
- particular form by the use of the keywords c{BYTE}, c{WORD},
- c{DWORD} and c{NOSPLIT}. If you need c{[eax+3]} to be assembled
- using a double-word offset field instead of the one byte NASM will
- normally generate, you can code c{[dword eax+3]}. Similarly, you
- can force NASM to use a byte offset for a small value which it
- hasn't seen on the first pass (see k{crit} for an example of such a
- code fragment) by using c{[byte eax+offset]}. As special cases,
- c{[byte eax]} will code c{[eax+0]} with a byte offset of zero, and
- c{[dword eax]} will code it with a double-word offset of zero. The
- normal form, c{[eax]}, will be coded with no offset field.
- Similarly, NASM will split c{[eax*2]} into c{[eax+eax]} because
- that allows the offset field to be absent and space to be saved; in
- fact, it will also split c{[eax*2+offset]} into
- c{[eax+eax+offset]}. You can combat this behaviour by the use of
- the c{NOSPLIT} keyword: c{[nosplit eax*2]} will force
- c{[eax*2+0]} to be generated literally.
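- Gathering the cases described above into one sketch (32-bit
- addressing is assumed throughout):
- c mov eax,[eax+3] ; one-byte offset, chosen by NASM
- c mov eax,[dword eax+3] ; four-byte offset, forced
- c mov eax,[byte eax] ; [eax+0] with a one-byte zero offset
- c mov eax,[eax*2] ; split into [eax+eax]
- c mov eax,[nosplit eax*2] ; kept as [eax*2+0]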
- H{const} i{Constants}
- NASM understands four different types of constant: numeric,
- character, string and floating-point.
- S{numconst} i{Numeric Constants}
- A numeric constant is simply a number. NASM allows you to specify
- numbers in a variety of number bases, in a variety of ways: you can
- suffix c{H}, c{Q} and c{B} for i{hex}, i{octal} and i{binary},
- or you can prefix c{0x} for hex in the style of C, or you can
- prefix c{$} for hex in the style of Borland Pascal. Note, though,
- that the I{$prefix}c{$} prefix does double duty as a prefix on
- identifiers (see k{syntax}), so a hex number prefixed with a c{$}
- sign must have a digit after the c{$} rather than a letter.
- Some examples:
- c mov ax,100 ; decimal
- c mov ax,0a2h ; hex
- c mov ax,$0a2 ; hex again: the 0 is required
- c mov ax,0xa2 ; hex yet again
- c mov ax,777q ; octal
- c mov ax,10010011b ; binary
- S{chrconst} i{Character Constants}
- A character constant consists of up to four characters enclosed in
- either single or double quotes. The type of quote makes no
- difference to NASM, except of course that surrounding the constant
- with single quotes allows double quotes to appear within it and vice
- versa.
- A character constant with more than one character will be arranged
- with i{little-endian} order in mind: if you code
- c mov eax,'abcd'
- then the constant generated is not c{0x61626364}, but
- c{0x64636261}, so that if you were then to store the value into
- memory, it would read c{abcd} rather than c{dcba}. This is also
- the sense of character constants understood by the Pentium's
- ic{CPUID} instruction (see k{insCPUID}).
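- A short sketch of that last point, assuming a processor which
- supports c{CPUID}; c{not_intel} is a hypothetical label defined
- elsewhere. With c{EAX} set to zero, c{CPUID} returns the vendor
- string in c{EBX}, c{EDX} and c{ECX}, four characters to a
- register, so its first four characters can be tested with a
- character constant:
- c xor eax,eax ; CPUID function 0
- c cpuid ; vendor string in EBX, EDX, ECX
- c cmp ebx,'Genu' ; first four bytes of `GenuineIntel'
- c jne not_intel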
- S{strconst} String Constants
- String constants are only acceptable to some pseudo-instructions,
- namely the Ic{DW}Ic{DD}Ic{DQ}Ic{DT}ic{DB} family and
- ic{INCBIN}.
- A string constant looks like a character constant, only longer. It
- is treated as a concatenation of character constants of the maximum
- size that the pseudo-instruction in question can hold, with the last
- one padded with zero bytes if need be. So the following are
- equivalent:
- c db 'hello' ; string constant
- c db 'h','e','l','l','o' ; equivalent character constants
- And the following are also equivalent:
- c dd 'ninechars' ; doubleword string constant
- c dd 'nine','char','s' ; becomes three doublewords
- c db 'ninechars',0,0,0 ; and really looks like this
- Note that when used as an operand to c{db}, a constant like
- c{'ab'} is treated as a string constant despite being short enough
- to be a character constant, because otherwise c{db 'ab'} would have
- the same effect as c{db 'a'}, which would be silly. Similarly,
- three-character or four-character constants are treated as strings
- when they are operands to c{dw}.
- S{fltconst} I{floating-point, constants}Floating-Point Constants
- i{Floating-point} constants are acceptable only as arguments to
- ic{DD}, ic{DQ} and ic{DT}. They are expressed in the
- traditional form: digits, then a period, then optionally more
- digits, then optionally an c{E} followed by an exponent. The period
- is mandatory, so that NASM can distinguish between c{dd 1}, which
- declares an integer constant, and c{dd 1.0} which declares a
- floating-point constant.
- Some examples:
- c dd 1.2 ; an easy one
- c dq 1.e10 ; 10,000,000,000
- c dq 1.e+10 ; synonymous with 1.e10
- c dq 1.e-10 ; 0.000 000 000 1
- c dt 3.141592653589793238462 ; pi
- NASM cannot do compile-time arithmetic on floating-point constants.
- This is because NASM is designed to be portable - although it always
- generates code to run on x86 processors, the assembler itself can
- run on any system with an ANSI C compiler. Therefore, the assembler
- cannot guarantee the presence of a floating-point unit capable of
- handling the i{Intel number formats}, and so for NASM to be able to
- do floating arithmetic it would have to include its own complete set
- of floating-point routines, which would significantly increase the
- size of the assembler for very little benefit.
- H{expr} i{Expressions}
- Expressions in NASM are similar in syntax to those in C.
- NASM does not guarantee the size of the integers used to evaluate
- expressions at compile time: since NASM can compile and run on
- 64-bit systems quite happily, don't assume that expressions are
- evaluated in 32-bit registers and so try to make deliberate use of
- i{integer overflow}. It might not always work. The only thing NASM
- will guarantee is what's guaranteed by ANSI C: you always have e{at
- least} 32 bits to work in.
- NASM supports two special tokens in expressions, allowing
- calculations to involve the current assembly position: the
- I{$ here}c{$} and ic{$$} tokens. c{$} evaluates to the assembly
- position at the beginning of the line containing the expression; so
- you can code an i{infinite loop} using c{JMP $}. c{$$} evaluates
- to the beginning of the current section; so you can tell how far
- into the section you are by using c{($-$$)}.
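- For instance, a sketch which pads the current section out to a
- fixed size, assuming that everything before this line occupies
- fewer than 512 bytes of the section (the figure 512 is only an
- illustration):
- c times 512-($-$$) db 0 ; zero-fill up to offset 512
- This relies on the c{TIMES} prefix described in k{times}; its
- argument refers only to positions already known, so it is
- acceptable as a critical expression (k{crit}).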
- The arithmetic i{operators} provided by NASM are listed here, in
- increasing order of i{precedence}.
- S{expor} ic{|}: i{Bitwise OR} Operator
- The c{|} operator gives a bitwise OR, exactly as performed by the
- c{OR} machine instruction. Bitwise OR is the lowest-priority
- arithmetic operator supported by NASM.
- S{expxor} ic{^}: i{Bitwise XOR} Operator
- c{^} provides the bitwise XOR operation.
- S{expand} ic{&}: i{Bitwise AND} Operator
- c{&} provides the bitwise AND operation.
- S{expshift} ic{<<} and ic{>>}: i{Bit Shift} Operators
- c{<<} gives a bit-shift to the left, just as it does in C. So c{5<<3}
- evaluates to 5 times 8, or 40. c{>>} gives a bit-shift to the
- right; in NASM, such a shift is e{always} unsigned, so that
- the bits shifted in from the left-hand end are filled with zero
- rather than a sign-extension of the previous highest bit.
- S{expplmi} I{+ opaddition}c{+} and I{- opsubtraction}c{-}:
- i{Addition} and i{Subtraction} Operators
- The c{+} and c{-} operators do perfectly ordinary addition and
- subtraction.
- S{expmul} ic{*}, ic{/}, ic{//}, ic{%} and ic{%%}:
- i{Multiplication} and i{Division}
- c{*} is the multiplication operator. c{/} and c{//} are both
- division operators: c{/} is i{unsigned division} and c{//} is
- i{signed division}. Similarly, c{%} and c{%%} provide I{unsigned
- modulo}I{modulo operators}unsigned and
- i{signed modulo} operators respectively.
- NASM, like ANSI C, provides no guarantees about the sensible
- operation of the signed modulo operator.
- Since the c{%} character is used extensively by the macro
- i{preprocessor}, you should ensure that both the signed and unsigned
- modulo operators are followed by white space wherever they appear.
- S{expmul} i{Unary Operators}: I{+ opunary}c{+}, I{- opunary}c{-},
- ic{~} and ic{SEG}
- The highest-priority operators in NASM's expression grammar are
- those which only apply to one argument. c{-} negates its operand,
- c{+} does nothing (it's provided for symmetry with c{-}), c{~}
- computes the i{one's complement} of its operand, and c{SEG}
- provides the i{segment address} of its operand (explained in more
- detail in k{segwrt}).
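- A few sketches of how the precedence ordering above plays out; the
- values in the comments are simply what the listed ordering implies:
- c dw 1+2*3 ; 7: * binds more tightly than +
- c dw 1<<2+1 ; 8: + binds more tightly than <<
- c dw 6&3|8 ; 10: & binds more tightly than |
- c dw -2+5 ; 3: unary - applies to the 2 alone
- c dw 7 % 4 ; 3: note the white space after %
- When in doubt, parentheses can be used to make the intended
- grouping explicit.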
- H{segwrt} ic{SEG} and ic{WRT}
- When writing large 16-bit programs, which must be split into
- multiple i{segments}, it is often necessary to be able to refer to
- the I{segment address}segment part of the address of a symbol. NASM
- supports the c{SEG} operator to perform this function.
- The c{SEG} operator returns the ie{preferred} segment base of a
- symbol, defined as the segment base relative to which the offset of
- the symbol makes sense. So the code
- c mov ax,seg symbol
- c mov es,ax
- c mov bx,symbol
- will load c{ES:BX} with a valid pointer to the symbol c{symbol}.
- Things can be more complex than this: since 16-bit segments and
- i{groups} may I{overlapping segments}overlap, you might occasionally
- want to refer to some symbol using a different segment base from the
- preferred one. NASM lets you do this, by the use of the c{WRT}
- (With Reference To) keyword. So you can do things like
- c mov ax,weird_seg ; weird_seg is a segment base
- c mov es,ax
- c mov bx,symbol wrt weird_seg
- to load c{ES:BX} with a different, but functionally equivalent,
- pointer to the symbol c{symbol}.
- NASM supports far (inter-segment) calls and jumps by means of the
- syntax c{call segment:offset}, where c{segment} and c{offset}
- both represent immediate values. So to call a far procedure, you
- could code either of
- c call (seg procedure):procedure
- c call weird_seg:(procedure wrt weird_seg)
- (The parentheses are included for clarity, to show the intended
- parsing of the above instructions. They are not necessary in
- practice.)
- NASM supports the syntax Ic{CALL FAR}c{call far procedure} as a
- synonym for the first of the above usages. c{JMP} works identically
- to c{CALL} in these examples.
- To declare a i{far pointer} to a data item in a data segment, you
- must code
- c dw symbol, seg symbol
- NASM supports no convenient synonym for this, though you can always
- invent one using the macro processor.
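- One possible shape for such a synonym, as a sketch only:
- c{farptr} and c{ptr_to_symbol} are hypothetical names, and the
- definition uses the multi-line macro syntax described in
- k{mlmacro}.
- c %macro farptr 1
- c dw %1, seg %1
- c %endmacro
- c ptr_to_symbol: farptr symbol ; dw symbol, seg symbol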
- H{crit} i{Critical Expressions}
- A limitation of NASM is that it is a i{two-pass assembler}; unlike
- TASM and others, it will always do exactly two I{passes}i{assembly
- passes}. Therefore it is unable to cope with source files that are
- complex enough to require three or more passes.
- The first pass is used to determine the size of all the assembled
- code and data, so that the second pass, when generating all the
- code, knows all the symbol addresses the code refers to. So one
- thing NASM can't handle is code whose size depends on the value of a
- symbol declared after the code in question. For example,
- c times (label-$) db 0
- c label: db 'Where am I?'
- The argument to ic{TIMES} in this case could equally legally
- evaluate to anything at all; NASM will reject this example because
- it cannot tell the size of the c{TIMES} line when it first sees it.
- It will just as firmly reject the slightly I{paradox}paradoxical
- code
- c times (label-$+1) db 0
- c label: db 'NOW where am I?'
- in which e{any} value for the c{TIMES} argument is by definition
- wrong!
- NASM rejects these examples by means of a concept called a
- e{critical expression}, which is defined to be an expression whose
- value is required to be computable in the first pass, and which must
- therefore depend only on symbols defined before it. The argument to
- the c{TIMES} prefix is a critical expression; for the same reason,
- the arguments to the ic{RESB} family of pseudo-instructions are
- also critical expressions.
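- By contrast, a reference to a label which has already been defined
- poses no problem. A sketch reusing the first example with its two
- lines in the opposite order:
- c label: db 'Where am I?'
- c times 64-($-label) db 0 ; label is already known: accepted
- Here the c{TIMES} argument can be computed on the first pass, and
- the string is padded with zero bytes to 64 bytes in total.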
- Critical expressions can crop up in other contexts as well: consider
- the following code.
- c mov ax,symbol1
- c symbol1 equ symbol2
- c symbol2:
- On the first pass, NASM cannot determine the value of c{symbol1},
- because c{symbol1} is defined to be equal to c{symbol2} which NASM
- hasn't seen yet. On the second pass, therefore, when it encounters
- the line c{mov ax,symbol1}, it is unable to generate the code for
- it because it still doesn't know the value of c{symbol1}. On the
- next line, it would see the ic{EQU} again and be able to determine
- the value of c{symbol1}, but by then it would be too late.
- NASM avoids this problem by defining the right-hand side of an
- c{EQU} statement to be a critical expression, so the definition of
- c{symbol1} would be rejected in the first pass.
- There is a related issue involving i{forward references}: consider
- this code fragment.
- c mov eax,[ebx+offset]
- c offset equ 10
- NASM, on pass one, must calculate the size of the instruction c{mov
- eax,[ebx+offset]} without knowing the value of c{offset}. It has no
- way of knowing that c{offset} is small enough to fit into a
- one-byte offset field and that it could therefore get away with
- generating a shorter form of the i{effective-address} encoding; for
- all it knows, in pass one, c{offset} could be a symbol in the code
- segment, and it might need the full four-byte form. So it is forced
- to compute the size of the instruction to accommodate a four-byte
- address part. In pass two, having made this decision, it is now
- forced to honour it and keep the instruction large, so the code
- generated in this case is not as small as it could have been. This
- problem can be solved by defining c{offset} before using it, or by
- forcing byte size in the effective address by coding c{[byte
- ebx+offset]}.
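- The two remedies can be sketched as follows. Either define the
- symbol first:
- c offset equ 10 ; known before it is used
- c mov eax,[ebx+offset] ; NASM can pick the one-byte form
- or leave the definition where it is and state the size explicitly:
- c mov eax,[byte ebx+offset] ; one-byte form, forced
- c offset equ 10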
- H{locallab} i{Local Labels}
- NASM gives special treatment to symbols beginning with a i{period}.
- A label beginning with a single period is treated as a e{local}
- label, which means that it is associated with the previous non-local
- label. So, for example:
- c label1 ; some code
- c .loop ; some more code
- c jne .loop
- c ret
- c label2 ; some code
- c .loop ; some more code
- c jne .loop
- c ret
- In the above code fragment, each c{JNE} instruction jumps to the
- line immediately before it, because the two definitions of c{.loop}
- are kept separate by virtue of each being associated with the
- previous non-local label.
- This form of local label handling is borrowed from the old Amiga
- assembler i{DevPac}; however, NASM goes one step further, in
- allowing access to local labels from other parts of the code. This
- is achieved by means of e{defining} a local label in terms of the
- previous non-local label: the first definition of c{.loop} above is
- really defining a symbol called c{label1.loop}, and the second
- defines a symbol called c{label2.loop}. So, if you really needed
- to, you could write
- c label3 ; some more code
- c ; and some more
- c jmp label1.loop
- Sometimes it is useful - in a macro, for instance - to be able to
- define a label which can be referenced from anywhere but which
- doesn't interfere with the normal local-label mechanism. Such a
- label can't be non-local because it would interfere with subsequent
- definitions of, and references to, local labels; and it can't be
- local because the macro that defined it wouldn't know the label's
- full name. NASM therefore introduces a third type of label, which is
- probably only useful in macro definitions: if a label begins with
- the I{label prefix}special prefix ic{..@}, then it does nothing
- to the local label mechanism. So you could code
- c label1: ; a non-local label
- c .local: ; this is really label1.local
- c ..@foo: ; this is a special symbol
- c label2: ; another non-local label
- c .local: ; this is really label2.local
- c jmp ..@foo ; this will jump three lines up
- NASM has the capacity to define other special symbols beginning with
- a double period: for example, c{..start} is used to specify the
- entry point in the c{obj} output format (see k{dotdotstart}).
- C{preproc} The NASM i{Preprocessor}
- NASM contains a powerful i{macro processor}, which supports
- conditional assembly, multi-level file inclusion, two forms of macro
- (single-line and multi-line), and a `context stack' mechanism for
- extra macro power. Preprocessor directives all begin with a c{%}
- sign.
- H{slmacro} i{Single-Line Macros}
- S{define} The Normal Way: Ic{%idefine}ic{%define}
- Single-line macros are defined using the c{%define} preprocessor
- directive. The definitions work in a similar way to C; so you can do
- things like
- c %define ctrl 0x1F &
- c %define param(a,b) ((a)+(a)*(b))
- c mov byte [param(2,ebx)], ctrl 'D'
- which will expand to
- c mov byte [(2)+(2)*(ebx)], 0x1F & 'D'
- When the expansion of a single-line macro contains tokens which
- invoke another macro, the expansion is performed at invocation time,
- not at definition time. Thus the code
- c %define a(x) 1+b(x)
- c %define b(x) 2*x
- c mov ax,a(8)
- will evaluate in the expected way to c{mov ax,1+2*8}, even though
- the macro c{b} wasn't defined at the time of definition of c{a}.
- Macros defined with c{%define} are i{case sensitive}: after
- c{%define foo bar}, only c{foo} will expand to c{bar}: c{Foo} or
- c{FOO} will not. By using c{%idefine} instead of c{%define} (the
- `i' stands for `insensitive') you can define all the case variants
- of a macro at once, so that c{%idefine foo bar} would cause
- c{foo}, c{Foo}, c{FOO}, c{fOO} and so on all to expand to
- c{bar}.
- There is a mechanism which detects when a macro call has occurred as
- a result of a previous expansion of the same macro, to guard against
- i{circular references} and infinite loops. If this happens, the
- preprocessor will only expand the first occurrence of the macro.
- Hence, if you code
- c %define a(x) 1+a(x)
- c mov ax,a(3)
- the macro c{a(3)} will expand once, becoming c{1+a(3)}, and will
- then expand no further. This behaviour can be useful: see k{32c}
- for an example of its use.
- You can I{overloading, single-line macros}overload single-line
- macros: if you write
- c %define foo(x) 1+x
- c %define foo(x,y) 1+x*y
- the preprocessor will be able to handle both types of macro call,
- by counting the parameters you pass; so c{foo(3)} will become
- c{1+3} whereas c{foo(ebx,2)} will become c{1+ebx*2}. However, if
- you define
- c %define foo bar
- then no other definition of c{foo} will be accepted: a macro with
- no parameters prohibits the definition of the same name as a macro
- e{with} parameters, and vice versa.
- This doesn't prevent single-line macros being e{redefined}: you can
- perfectly well define a macro with
- c %define foo bar
- and then re-define it later in the same source file with
- c %define foo baz
- Then everywhere the macro c{foo} is invoked, it will be expanded
- according to the most recent definition. This is particularly useful
- when defining single-line macros with c{%assign} (see k{assign}).
- You can i{pre-define} single-line macros using the `-d' option on
- the NASM command line: see k{opt-d}.
- S{undef} Undefining macros: ic{%undef}
- Single-line macros can be removed with the c{%undef} command. For
- example, the following sequence:
- c %define foo bar
- c %undef foo
- c mov eax, foo
- will expand to the instruction c{mov eax, foo}, since after
- c{%undef} the macro c{foo} is no longer defined.
- Macros that would otherwise be pre-defined can be undefined using
- the `-u' option on the NASM command line: see k{opt-u}.
- S{assign} i{Preprocessor Variables}: ic{%assign}
- An alternative way to define single-line macros is by means of the
- c{%assign} command (and its I{case sensitive}case-insensitive
- counterpart ic{%iassign}, which differs from c{%assign} in
- exactly the same way that c{%idefine} differs from c{%define}).
- c{%assign} is used to define single-line macros which take no
- parameters and have a numeric value. This value can be specified in
- the form of an expression, and it will be evaluated once, when the
- c{%assign} directive is processed.
- Like c{%define}, macros defined using c{%assign} can be re-defined
- later, so you can do things like
- c %assign i i+1
- to increment the numeric value of a macro.
- c{%assign} is useful for controlling the termination of c{%rep}
- preprocessor loops: see k{rep} for an example of this. Another
- use for c{%assign} is given in k{16c} and k{32c}.
- The expression passed to c{%assign} is a i{critical expression}
- (see k{crit}), and must also evaluate to a pure number (rather than
- a relocatable reference such as a code or data address, or anything
- involving a register).
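- As a brief sketch of c{%assign} acting as a loop counter (the
- macro name c{i} is arbitrary), the following emits the four words
- 0, 1, 4 and 9:
- c %assign i 0
- c %rep 4
- c dw i*i
- c %assign i i+1
- c %endrep
- Each c{%assign} evaluates its expression at once, so c{i} holds a
- plain number on every pass through the loop rather than an
- ever-growing piece of text.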
- H{mlmacro} i{Multi-Line Macros}: Ic{%imacro}ic{%macro}
- Multi-line macros are much more like the type of macro seen in MASM
- and TASM: a multi-line macro definition in NASM looks something like
- this.
- c %macro prologue 1
- c push ebp
- c mov ebp,esp
- c sub esp,%1
- c %endmacro
- This defines a C-like function prologue as a macro: so you would
- invoke the macro with a call such as
- c myfunc: prologue 12
- which would expand to the three lines of code
- c myfunc: push ebp
- c mov ebp,esp
- c sub esp,12
- The number c{1} after the macro name in the c{%macro} line defines
- the number of parameters the macro c{prologue} expects to receive.
- The use of c{%1} inside the macro definition refers to the first
- parameter to the macro call. With a macro taking more than one
- parameter, subsequent parameters would be referred to as c{%2},
- c{%3} and so on.
- Multi-line macros, like single-line macros, are i{case-sensitive},
- unless you define them using the alternative directive c{%imacro}.
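- A minimal sketch of the difference, using a hypothetical macro
- called c{savebp}:
- c %imacro savebp 0
- c push ebp
- c %endmacro
- c savebp ; expands to push ebp
- c SAVEBP ; so does this, since %imacro ignores case
- Had the definition used c{%macro} instead, only the spelling
- c{savebp} would invoke the macro.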
- If you need to pass a comma as e{part} of a parameter to a
- multi-line macro, you can do that by enclosing the entire parameter
- in I{braces, around macro parameters}braces. So you could code
- things like
- c %macro silly 2
- c %2: db %1
- c %endmacro
- c silly 'a', letter_a ; letter_a: db 'a'
- c silly 'ab', string_ab ; string_ab: db 'ab'
- c silly {13,10}, crlf ; crlf: db 13,10
- S{mlmacover} i{Overloading Multi-Line Macros}
- As with single-line macros, multi-line macros can be overloaded by
- defining the same macro name several times with different numbers of
- parameters. This time, no exception is made for macros with no
- parameters at all. So you could define
- c %macro prologue 0
- c push ebp
- c mov ebp,esp
- c %endmacro
- to define an alternative form of the function prologue which
- allocates no local stack space.
- Sometimes, however, you might want to `overload' a machine
- instruction; for example, you might want to define
- c %macro push 2
- c push %1
- c push %2
- c %endmacro
- so that you could code
- c push ebx ; this line is not a macro call
- c push eax,ecx ; but this one is
- Ordinarily, NASM will give a warning for the first of the above two
- lines, since c{push} is now defined to be a macro, and is being
- invoked with a number of parameters for which no definition has been
- given. The correct code will still be generated. The warning can be
- disabled by the use of the c{-w-macro-params} command-line option
- (see k{opt-w}).
- S{maclocal} i{Macro-Local Labels}
- NASM allows you to define labels within a multi-line macro
- definition in such a way as to make them local to the macro call: so
- calling the same macro multiple times will use a different label
- each time. You do this by prefixing ic{%%} to the label name. So
- you can invent an instruction which executes a c{RET} if the c{Z}
- flag is set by doing this:
- c %macro retz 0
- c jnz %%skip
- c ret
- c %%skip:
- c %endmacro
- You can call this macro as many times as you want, and every time
- you call it NASM will make up a different `real' name to substitute
- for the label c{%%skip}. The names NASM invents are of the form
- c{..@2345.skip}, where the number 2345 changes with every macro
- call. The ic{..@} prefix prevents macro-local labels from
- interfering with the local label mechanism, as described in
- k{locallab}. You should avoid defining your own labels in this form
- (the c{..@} prefix, then a number, then another period) in case
- they interfere with macro-local labels.
- S{mlmacgre} i{Greedy Macro Parameters}
- Occasionally it is useful to define a macro which lumps its entire
- command line into one parameter definition, possibly after
- extracting one or two smaller parameters from the front. An example
- might be a macro to write a text string to a file in MS-DOS, where
- you might want to be able to write
- c writefile [filehandle],"hello, world",13,10
- NASM allows you to define the last parameter of a macro to be
- e{greedy}, meaning that if you invoke the macro with more
- parameters than it expects, all the spare parameters get lumped into
- the last defined one along with the separating commas. So if you
- code:
- c %macro writefile 2+
- c jmp %%endstr
- c %%str: db %2
- c %%endstr: mov dx,%%str
- c mov cx,%%endstr-%%str
- c mov bx,%1
- c mov ah,0x40
- c int 0x21
- c %endmacro
- then the example call to c{writefile} above will work as expected:
- the text before the first comma, c{[filehandle]}, is used as the
- first macro parameter and expanded when c{%1} is referred to, and
- all the subsequent text is lumped into c{%2} and placed after the
- c{db}.
- The greedy nature of the macro is indicated to NASM by the use of
- the I{+ modifier}c{+} sign after the parameter count on the
- c{%macro} line.
- If you define a greedy macro, you are effectively telling NASM how
- it should expand the macro given e{any} number of parameters from
- the actual number specified up to infinity; in this case, for
- example, NASM now knows what to do when it sees a call to
- c{writefile} with 2, 3, 4 or more parameters. NASM will take this
- into account when overloading macros, and will not allow you to
- define another form of c{writefile} taking 4 parameters (for
- example).
- Of course, the above macro could have been implemented as a
- non-greedy macro, in which case the call to it would have had to
- look like
- c writefile [filehandle], {"hello, world",13,10}
- NASM provides both mechanisms for putting i{commas in macro
- parameters}, and you choose which one you prefer for each macro
- definition.
- See k{sectmac} for a better way to write the above macro.
- S{mlmacdef} i{Default Macro Parameters}
- NASM also allows you to define a multi-line macro with a e{range}
- of allowable parameter counts. If you do this, you can specify
- defaults for i{omitted parameters}. So, for example:
- c %macro die 0-1 "Painful program death has occurred."
- c writefile 2,%1
- c mov ax,0x4c01
- c int 0x21
- c %endmacro
- This macro (which makes use of the c{writefile} macro defined in
- k{mlmacgre}) can be called with an explicit error message, which it
- will display on the error output stream before exiting, or it can be
- called with no parameters, in which case it will use the default
- error message supplied in the macro definition.
- In general, you supply a minimum and maximum number of parameters
- for a macro of this type; the minimum number of parameters is then
- required in the macro call, and then you provide defaults for the
- optional ones. So if a macro definition began with the line
- c %macro foobar 1-3 eax,[ebx+2]
- then it could be called with between one and three parameters, and
- c{%1} would always be taken from the macro call. c{%2}, if not
- specified by the macro call, would default to c{eax}, and c{%3} if
- not specified would default to c{[ebx+2]}.
- You may omit parameter defaults from the macro definition, in which
- case the parameter default is taken to be blank. This can be useful
- for macros which can take a variable number of parameters, since the
- ic{%0} token (see k{percent0}) allows you to determine how many
- parameters were really passed to the macro call.
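- For instance, a hypothetical macro c{setreg} (not part of NASM)
- could take an optional second parameter with no default, and use
- c{%0} to decide what to generate. This is only a sketch of the idea:
- c %macro setreg 1-2
- c %if %0 = 2
- c mov %1,%2 ; a value was supplied
- c %else
- c xor %1,%1 ; no value given: clear the register
- c %endif
- c %endmacro
- c setreg eax ; xor eax,eax
- c setreg ebx,5 ; mov ebx,5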
- This defaulting mechanism can be combined with the greedy-parameter
- mechanism; so the c{die} macro above could be made more powerful,
- and more useful, by changing the first line of the definition to
- c %macro die 0-1+ "Painful program death has occurred.",13,10
- The maximum parameter count can be infinite, denoted by c{*}. In
- this case, of course, it is impossible to provide a e{full} set of
- default parameters. Examples of this usage are shown in k{rotate}.
- S{percent0} ic{%0}: I{counting macro parameters}Macro Parameter Counter
- For a macro which can take a variable number of parameters, the
- parameter reference c{%0} will return a numeric constant giving the
- number of parameters passed to the macro. This can be used as an
- argument to c{%rep} (see k{rep}) in order to iterate through all
- the parameters of a macro. Examples are given in k{rotate}.
- S{rotate} ic{%rotate}: i{Rotating Macro Parameters}
- Unix shell programmers will be familiar with the I{shift
- command}c{shift} shell command, which allows the arguments passed
- to a shell script (referenced as c{$1}, c{$2} and so on) to be
- moved left by one place, so that the argument previously referenced
- as c{$2} becomes available as c{$1}, and the argument previously
- referenced as c{$1} is no longer available at all.
- NASM provides a similar mechanism, in the form of c{%rotate}. As
- its name suggests, it differs from the Unix c{shift} in that no
- parameters are lost: parameters rotated off the left end of the
- argument list reappear on the right, and vice versa.
- c{%rotate} is invoked with a single numeric argument (which may be
- an expression). The macro parameters are rotated to the left by that
- many places. If the argument to c{%rotate} is negative, the macro
- parameters are rotated to the right.
- I{iterating over macro parameters}So a pair of macros to save and
- restore a set of registers might work as follows:
- c %macro multipush 1-*
- c %rep %0
- c push %1
- c %rotate 1
- c %endrep
- c %endmacro
- This macro invokes the c{PUSH} instruction on each of its arguments
- in turn, from left to right. It begins by pushing its first
- argument, c{%1}, then invokes c{%rotate} to move all the arguments
- one place to the left, so that the original second argument is now
- available as c{%1}. Repeating this procedure as many times as there
- were arguments (achieved by supplying c{%0} as the argument to
- c{%rep}) causes each argument in turn to be pushed.
- Note also the use of c{*} as the maximum parameter count,
- indicating that there is no upper limit on the number of parameters
- you may supply to the ic{multipush} macro.
- It would be convenient, when using this macro, to have a c{POP}
- equivalent, which e{didn't} require the arguments to be given in
- reverse order. Ideally, you would write the c{multipush} macro
- call, then cut-and-paste the line to where the pop needed to be
- done, and change the name of the called macro to c{multipop}, and
- the macro would take care of popping the registers in the opposite
- order from the one in which they were pushed.
- This can be done by the following definition:
- c %macro multipop 1-*
- c %rep %0
- c %rotate -1
- c pop %1
- c %endrep
- c %endmacro
- This macro begins by rotating its arguments one place to the
- e{right}, so that the original e{last} argument appears as c{%1}.
- This is then popped, and the arguments are rotated right again, so
- the second-to-last argument becomes c{%1}. Thus the arguments are
- iterated through in reverse order.
- S{concat} i{Concatenating Macro Parameters}
- NASM can concatenate macro parameters on to other text surrounding
- them. This allows you to declare a family of symbols, for example,
- in a macro definition. If, for example, you wanted to generate a
- table of key codes along with offsets into the table, you could code
- something like
- c %macro keytab_entry 2
- c keypos%1 equ $-keytab
- c db %2
- c %endmacro
- c keytab:
- c keytab_entry F1,128+1
- c keytab_entry F2,128+2
- c keytab_entry Return,13
- which would expand to
- c keytab:
- c keyposF1 equ $-keytab
- c db 128+1
- c keyposF2 equ $-keytab
- c db 128+2
- c keyposReturn equ $-keytab
- c db 13
- You can just as easily concatenate text on to the other end of a
- macro parameter, by writing c{%1foo}.
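- As an illustration, a hypothetical macro c{counter} might use the
- parameter as a prefix for two related labels:
- c %macro counter 1
- c %1_count: dw 0
- c %1_limit: dw 100
- c %endmacro
- c counter retry ; retry_count: dw 0
- c ; retry_limit: dw 100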
- If you need to append a e{digit} to a macro parameter, for example
- defining labels c{foo1} and c{foo2} when passed the parameter
- c{foo}, you can't code c{%11} because that would be taken as the
- eleventh macro parameter. Instead, you must code
- I{braces, after % sign}c{%{1}1}, which will separate the first
- c{1} (giving the number of the macro parameter) from the second
- (literal text to be concatenated to the parameter).
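- A short sketch of the brace form, with an invented macro name
- c{pair}:
- c %macro pair 1
- c %{1}1: dd 0
- c %{1}2: dd 0
- c %endmacro
- c pair foo ; foo1: dd 0
- c ; foo2: dd 0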
- This concatenation can also be applied to other preprocessor in-line
- objects, such as macro-local labels (k{maclocal}) and context-local
- labels (k{ctxlocal}). In all cases, ambiguities in syntax can be
- resolved by enclosing everything after the c{%} sign and before the
- literal text in braces: so c{%{%foo}bar} concatenates the text
- c{bar} to the end of the real name of the macro-local label
- c{%%foo}. (This is unnecessary, since the form NASM uses for the
- real names of macro-local labels means that the two usages
- c{%{%foo}bar} and c{%%foobar} would both expand to the same
- thing anyway; nevertheless, the capability is there.)
- S{mlmaccc} i{Condition Codes as Macro Parameters}
- NASM can give special treatment to a macro parameter which contains
- a condition code. For a start, you can refer to the macro parameter
- c{%1} by means of the alternative syntax ic{%+1}, which informs
- NASM that this macro parameter is supposed to contain a condition
- code, and will cause the preprocessor to report an error message if
- the macro is called with a parameter which is e{not} a valid
- condition code.
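- For example, a hypothetical c{jump_if} macro could use c{%+1} so
- that a mistyped condition is caught when the macro is expanded. The
- last call is commented out because it is expected to be rejected:
- c %macro jump_if 2
- c j%+1 %2
- c %endmacro
- c jump_if ae,done ; jae done
- c ; jump_if foo,done ; error: foo is not a condition code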
- Far more usefully, though, you can refer to the macro parameter by
- means of ic{%-1}, which NASM will expand as the e{inverse}
- condition code. So the c{retz} macro defined in k{maclocal} can be
- replaced by a general i{conditional-return macro} like this:
- c %macro retc 1
- c j%-1 %%skip
- c ret
- c %%skip:
- c %endmacro
- This macro can now be invoked using calls like c{retc ne}, which
- will cause the conditional-jump instruction in the macro expansion
- to come out as c{JE}, or c{retc po} which will make the jump a
- c{JPE}.
- The c{%+1} macro-parameter reference is quite happy to interpret
- the arguments c{CXZ} and c{ECXZ} as valid condition codes;
- however, c{%-1} will report an error if passed either of these,
- because no inverse condition code exists.
- S{nolist} i{Disabling Listing Expansion}Ic{.nolist}
- When NASM is generating a listing file from your program, it will
- generally expand multi-line macros by means of writing the macro
- call and then listing each line of the expansion. This allows you to
- see which instructions in the macro expansion are generating what
- code; however, for some macros this clutters the listing up
- unnecessarily.
- NASM therefore provides the c{.nolist} qualifier, which you can
- include in a macro definition to inhibit the expansion of the macro
- in the listing file. The c{.nolist} qualifier comes directly after
- the number of parameters, like this:
- c %macro foo 1.nolist
- Or like this:
- c %macro bar 1-5+.nolist a,b,c,d,e,f,g,h
- H{condasm} i{Conditional Assembly}Ic{%if}
- Similarly to the C preprocessor, NASM allows sections of a source
- file to be assembled only if certain conditions are met. The general
- syntax of this feature looks like this:
- c %if<condition>
- c ; some code which only appears if <condition> is met
- c %elif<condition2>
- c ; only appears if <condition> is not met but <condition2> is
- c %else
- c ; this appears if neither <condition> nor <condition2> was met
- c %endif
- The ic{%else} clause is optional, as is the ic{%elif} clause.
- You can have more than one c{%elif} clause as well.
- S{ifdef} ic{%ifdef}: i{Testing Single-Line Macro Existence}
- Beginning a conditional-assembly block with the line c{%ifdef
- MACRO} will assemble the subsequent code if, and only if, a
- single-line macro called c{MACRO} is defined. If not, then the
- c{%elif} and c{%else} blocks (if any) will be processed instead.
- For example, when debugging a program, you might want to write code
- such as
- c ; perform some function
- c %ifdef DEBUG
- c writefile 2,"Function performed successfully",13,10
- c %endif
- c ; go and do something else
- Then you could use the command-line option c{-dDEBUG} to create a
- version of the program which produced debugging messages, and remove
- the option to generate the final release version of the program.
- You can test for a macro e{not} being defined by using
- ic{%ifndef} instead of c{%ifdef}. You can also test for macro
- definitions in c{%elif} blocks by using ic{%elifdef} and
- ic{%elifndef}.
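- A common use of c{%ifndef} is to supply a default for a macro that
- may already have been defined elsewhere, for example by the c{-d}
- option. In this sketch the names c{BUFSIZE} and c{buffer} are
- assumptions made for illustration:
- c %ifndef BUFSIZE
- c %define BUFSIZE 512 ; default, used only if nothing defined it
- c %endif
- c buffer: resb BUFSIZE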
- S{ifctx} ic{%ifctx}: i{Testing the Context Stack}
- The conditional-assembly construct c{%ifctx ctxname} will cause the
- subsequent code to be assembled if and only if the top context on
- the preprocessor's context stack has the name c{ctxname}. As with
- c{%ifdef}, the inverse and c{%elif} forms ic{%ifnctx},
- ic{%elifctx} and ic{%elifnctx} are also supported.
- For more details of the context stack, see k{ctxstack}. For a
- sample use of c{%ifctx}, see k{blockif}.
- S{if} ic{%if}: i{Testing Arbitrary Numeric Expressions}
- The conditional-assembly construct c{%if expr} will cause the
- subsequent code to be assembled if and only if the value of the
- numeric expression c{expr} is non-zero. An example of the use of
- this feature is in deciding when to break out of a c{%rep}
- preprocessor loop: see k{rep} for a detailed example.
- The expression given to c{%if}, and its counterpart ic{%elif}, is
- a critical expression (see k{crit}).
- c{%if} extends the normal NASM expression syntax, by providing a
- set of i{relational operators} which are not normally available in
- expressions. The operators ic{=}, ic{<}, ic{>}, ic{<=},
- ic{>=} and ic{<>} test equality, less-than, greater-than,
- less-or-equal, greater-or-equal and not-equal respectively. The
- C-like forms ic{==} and ic{!=} are supported as alternative
- forms of c{=} and c{<>}. In addition, low-priority logical
- operators ic{&&}, ic{^^} and ic{||} are provided, supplying
- i{logical AND}, i{logical XOR} and i{logical OR}. These work like
- the C logical operators (although C has no logical XOR), in that
- they always return either 0 or 1, and treat any non-zero input as 1
- (so that c{^^}, for example, returns 1 if exactly one of its inputs
- is zero, and 0 otherwise). The relational operators also return 1
- for true and 0 for false.
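- As a small sketch of these operators, suppose a single-line macro
- c{WORDSIZE} (a name invented here) selects the register width used
- by the rest of the file:
- c %assign WORDSIZE 32 ; hypothetical setting
- c %if WORDSIZE <> 16 && WORDSIZE <> 32
- c %error WORDSIZE must be 16 or 32
- c %elif WORDSIZE = 16
- c %define acc ax
- c %else
- c %define acc eax
- c %endif
- c mov acc,1 ; mov eax,1 with the setting above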
- S{ifidn} ic{%ifidn} and ic{%ifidni}: i{Testing Exact Text
- Identity}
- The construct c{%ifidn text1,text2} will cause the subsequent code
- to be assembled if and only if c{text1} and c{text2}, after
- expanding single-line macros, are identical pieces of text.
- Differences in white space are not counted.
- c{%ifidni} is similar to c{%ifidn}, but is i{case-insensitive}.
- For example, the following macro pushes a register or number on the
- stack, and allows you to treat c{IP} as a real register:
- c %macro pushparam 1
- c %ifidni %1,ip
- c call %%label
- c %%label:
- c %else
- c push %1
- c %endif
- c %endmacro
- Like most other c{%if} constructs, c{%ifidn} has a counterpart
- ic{%elifidn}, and negative forms ic{%ifnidn} and ic{%elifnidn}.
- Similarly, c{%ifidni} has counterparts ic{%elifidni},
- ic{%ifnidni} and ic{%elifnidni}.
- S{iftyp} ic{%ifid}, ic{%ifnum}, ic{%ifstr}: i{Testing Token
- Types}
- Some macros will want to perform different tasks depending on
- whether they are passed a number, a string, or an identifier. For
- example, a string output macro might want to be able to cope with
- being passed either a string constant or a pointer to an existing
- string.
- The conditional assembly construct c{%ifid}, taking one parameter
- (which may be blank), assembles the subsequent code if and only if
- the first token in the parameter exists and is an identifier.
- c{%ifnum} works similarly, but tests for the token being a numeric
- constant; c{%ifstr} tests for it being a string.
- For example, the c{writefile} macro defined in k{mlmacgre} can be
- extended to take advantage of c{%ifstr} in the following fashion:
- c %macro writefile 2-3+
- c %ifstr %2
- c jmp %%endstr
- c %if %0 = 3
- c %%str: db %2,%3
- c %else
- c %%str: db %2
- c %endif
- c %%endstr: mov dx,%%str
- c mov cx,%%endstr-%%str
- c %else
- c mov dx,%2
- c mov cx,%3
- c %endif
- c mov bx,%1
- c mov ah,0x40
- c int 0x21
- c %endmacro
- Then the c{writefile} macro can cope with being called in either of
- the following two ways:
- c writefile [file], strpointer, length
- c writefile [file], "hello", 13, 10
- In the first, c{strpointer} is used as the address of an
- already-declared string, and c{length} is used as its length; in
- the second, a string is given to the macro, which therefore declares
- it itself and works out the address and length for itself.
- Note the use of c{%if} inside the c{%ifstr}: this is to detect
- whether the macro was passed two arguments (so the string would be a
- single string constant, and c{db %2} would be adequate) or more (in
- which case, all but the first two would be lumped together into
- c{%3}, and c{db %2,%3} would be required).
- Ic{%ifnid}Ic{%elifid}Ic{%elifnid}Ic{%ifnnum}Ic{%elifnum}Ic{%elifnnum}Ic{%ifnstr}Ic{%elifstr}Ic{%elifnstr}
- The usual c{%elifXXX}, c{%ifnXXX} and c{%elifnXXX} versions exist
- for each of c{%ifid}, c{%ifnum} and c{%ifstr}.
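- For example, a hypothetical macro c{setcount} might combine
- c{%ifnum} and c{%elifid} to accept either a numeric constant or the
- name of a variable. This is a sketch only; it makes no attempt to
- tell a variable name from a register name:
- c %macro setcount 1
- c %ifnum %1
- c mov cx,%1 ; numeric constant: load it directly
- c %elifid %1
- c mov cx,[%1] ; identifier: treat it as a variable name
- c %else
- c %error setcount expects a number or an identifier
- c %endif
- c %endmacro
- c setcount 10 ; mov cx,10
- c setcount count ; mov cx,[count]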
- S{pperror} ic{%error}: Reporting i{User-Defined Errors}
- The preprocessor directive c{%error} will cause NASM to report an
- error if it occurs in assembled code. So if other users are going to
- try to assemble your source files, you can ensure that they define
- the right macros by means of code like this:
- c %ifdef SOME_MACRO
- c ; do some setup
- c %elifdef SOME_OTHER_MACRO
- c ; do some different setup
- c %else
- c %error Neither SOME_MACRO nor SOME_OTHER_MACRO was defined.
- c %endif
- Then any user who fails to understand the way your code is supposed
- to be assembled will be quickly warned of their mistake, rather than
- having to wait until the program crashes on being run and then not
- knowing what went wrong.
- H{rep} i{Preprocessor Loops}I{repeating code}: ic{%rep}
- NASM's c{TIMES} prefix, though useful, cannot be used to invoke a
- multi-line macro multiple times, because it is processed by NASM
- after macros have already been expanded. Therefore NASM provides
- another form of loop, this time at the preprocessor level: c{%rep}.
- The directives c{%rep} and ic{%endrep} (c{%rep} takes a numeric
- argument, which can be an expression; c{%endrep} takes no
- arguments) can be used to enclose a chunk of code, which is then
- replicated as many times as specified by the preprocessor:
- c %assign i 0
- c %rep 64
- c inc word [table+2*i]
- c %assign i i+1
- c %endrep
- This will generate a sequence of 64 c{INC} instructions,
- incrementing every word of memory from c{[table]} to
- c{[table+126]}.
- For more complex termination conditions, or to break out of a repeat
- loop part way along, you can use the ic{%exitrep} directive to
- terminate the loop, like this:
- c fibonacci:
- c %assign i 0
- c %assign j 1
- c %rep 100
- c %if j > 65535
- c %exitrep
- c %endif
- c dw j
- c %assign k j+i
- c %assign i j
- c %assign j k
- c %endrep
- c fib_number equ ($-fibonacci)/2
- This produces a list of all the Fibonacci numbers that will fit in
- 16 bits. Note that a maximum repeat count must still be given to
- c{%rep}. This is to prevent the possibility of NASM getting into an
- infinite loop in the preprocessor, which (on multitasking or
- multi-user systems) would typically cause all the system memory to
- be gradually used up and other applications to start crashing.
- H{include} i{Including Other Files}
- Using, once again, a very similar syntax to the C preprocessor,
- NASM's preprocessor lets you include other source files into your
- code. This is done by the use of the ic{%include} directive:
- c %include "macros.mac"
- will include the contents of the file c{macros.mac} into the source
- file containing the c{%include} directive.
- Include files are I{searching for include files}searched for in the
- current directory (the directory you're in when you run NASM, as
- opposed to the location of the NASM executable or the location of
- the source file), plus any directories specified on the NASM command
- line using the c{-i} option.
- The standard C idiom for preventing a file being included more than
- once is just as applicable in NASM: if the file c{macros.mac} has
- the form
- c %ifndef MACROS_MAC
- c %define MACROS_MAC
- c ; now define some macros
- c %endif
- then including the file more than once will not cause errors: the
- second time the file is included nothing will happen, because the
- macro c{MACROS_MAC} will already be defined.
- You can force a file to be included even if there is no c{%include}
- directive that explicitly includes it, by using the ic{-p} option
- on the NASM command line (see k{opt-p}).
- H{ctxstack} The i{Context Stack}
- Having labels that are local to a macro definition is sometimes not
- quite powerful enough: sometimes you want to be able to share labels
- between several macro calls. An example might be a c{REPEAT} ...
- c{UNTIL} loop, in which the expansion of the c{UNTIL} macro
- would need to be able to refer to a label which the c{REPEAT} macro
- had defined. However, for such a macro you would also want to be
- able to nest these loops.
- NASM provides this level of power by means of a e{context stack}.
- The preprocessor maintains a stack of e{contexts}, each of which is
- characterised by a name. You add a new context to the stack using
- the ic{%push} directive, and remove one using ic{%pop}. You can
- define labels that are local to a particular context on the stack.
- S{pushpop} ic{%push} and ic{%pop}: I{creating
- contexts}I{removing contexts}Creating and Removing Contexts
- The c{%push} directive is used to create a new context and place it
- on the top of the context stack. c{%push} requires one argument,
- which is the name of the context. For example:
- c %push foobar
- This pushes a new context called c{foobar} on the stack. You can
- have several contexts on the stack with the same name: they can
- still be distinguished.
- The directive c{%pop}, requiring no arguments, removes the top
- context from the context stack and destroys it, along with any
- labels associated with it.
- S{ctxlocal} i{Context-Local Labels}
- Just as the usage c{%%foo} defines a label which is local to the
- particular macro call in which it is used, the usage I{%$}c{%$foo}
- is used to define a label which is local to the context on the top
- of the context stack. So the c{REPEAT} and c{UNTIL} example given
- above could be implemented by means of:
- c %macro repeat 0
- c %push repeat
- c %$begin:
- c %endmacro
- c %macro until 1
- c j%-1 %$begin
- c %pop
- c %endmacro
- and invoked by means of, for example,
- c mov di,string
- c repeat
- c add di,3
- c scasb ; compares AL with the byte at ES:DI and steps DI
- c until e
- which would scan every fourth byte of a string in search of the byte
- in c{AL}.
- If you need to define, or access, labels local to the context
- e{below} the top one on the stack, you can use I{%$$}c{%$$foo}, or
- c{%$$$foo} for the context below that, and so on.
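- The following sketch shows two nested contexts with invented names
- c{outer} and c{inner}, each defining its own c{top} label. It is
- meant only to show which label each reference reaches, not to be
- useful code:
- c %push outer
- c %$top: ; label local to the context outer
- c %push inner
- c %$top: ; a different label, local to inner
- c jmp %$top ; refers to the label in inner
- c jmp %$$top ; refers to the label in outer
- c %pop
- c %pop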
- S{ctxdefine} i{Context-Local Single-Line Macros}
- NASM also allows you to define single-line macros which are local to
- a particular context, in just the same way:
- c %define %$localmac 3
- will define the single-line macro c{%$localmac} to be local to the
- top context on the stack. Of course, after a subsequent c{%push},
- it can then still be accessed by the name c{%$$localmac}.
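- A brief sketch of this, with invented context names c{first} and
- c{second}:
- c %push first
- c %define %$localmac 3
- c mov ax,%$localmac ; mov ax,3
- c %push second
- c mov bx,%$$localmac ; still 3, found in the context below
- c %pop
- c %pop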
- S{ctxrepl} ic{%repl}: I{renaming contexts}Renaming a Context
- If you need to change the name of the top context on the stack (in
- order, for example, to have it respond differently to c{%ifctx}),
- you can execute a c{%pop} followed by a c{%push}; but this will
- have the side effect of destroying all context-local labels and
- macros associated with the context that was just popped.
- NASM provides the directive c{%repl}, which e{replaces} a context
- with a different name, without touching the associated macros and
- labels. So you could replace the destructive code
- c %pop
- c %push newname
- with the non-destructive version c{%repl newname}.
- S{blockif} Example Use of the i{Context Stack}: i{Block IFs}
- This example makes use of almost all the context-stack features,
- including the conditional-assembly construct ic{%ifctx}, to
- implement a block IF statement as a set of macros.
- c %macro if 1
- c %push if
- c j%-1 %$ifnot
- c %endmacro
- c %macro else 0
- c %ifctx if
- c %repl else
- c jmp %$ifend
- c %$ifnot:
- c %else
- c %error "expected `if' before `else'"
- c %endif
- c %endmacro
- c %macro endif 0
- c %ifctx if
- c %$ifnot:
- c %pop
- c %elifctx else
- c %$ifend:
- c %pop
- c %else
- c %error "expected `if' or `else' before `endif'"
- c %endif
- c %endmacro
- This code is more robust than the c{REPEAT} and c{UNTIL} macros
- given in k{ctxlocal}, because it uses conditional assembly to check
- that the macros are issued in the right order (for example, not
- calling c{endif} before c{if}) and issues a c{%error} if they're
- not.
- In addition, the c{endif} macro has to be able to cope with the two
- distinct cases of either directly following an c{if}, or following
- an c{else}. It achieves this, again, by using conditional assembly
- to do different things depending on whether the context on top of
- the stack is c{if} or c{else}.
- The c{else} macro has to preserve the context on the stack, in
- order to have the c{%$ifnot} referred to by the c{if} macro be the
- same as the one defined by the c{else} macro itself, and the
- c{%$ifend} it jumps to be the same as the one defined by the
- c{endif} macro, but has to change the context's name so that
- c{endif} will know there was an intervening c{else}. It does this
- by the use of c{%repl}.
- A sample usage of these macros might look like:
- c cmp ax,bx
- c if ae
- c cmp bx,cx
- c if ae
- c mov ax,cx
- c else
- c mov ax,bx
- c endif
- c else
- c cmp ax,cx
- c if ae
- c mov ax,cx
- c endif
- c endif
- The block-c{IF} macros handle nesting quite happily, by means of
- pushing another context, describing the inner c{if}, on top of the
- one describing the outer c{if}; thus c{else} and c{endif} always
- refer to the last unmatched c{if} or c{else}.
- H{stdmac} i{Standard Macros}
- NASM defines a set of standard macros, which are already defined
- when it starts to process any source file. If you really need a
- program to be assembled with no pre-defined macros, you can use the
- ic{%clear} directive to empty the preprocessor of everything.
- Most i{user-level assembler directives} are implemented as macros
- which invoke primitive directives; these are described in
- k{directive}. The rest of the standard macro set is described here.
- S{stdmacver} ic{__NASM_MAJOR__} and ic{__NASM_MINOR__}: i{NASM
- Version}
- The single-line macros c{__NASM_MAJOR__} and c{__NASM_MINOR__}
- expand to the major and minor parts of the i{version number of
- NASM} being used. So, under NASM 0.96 for example,
- c{__NASM_MAJOR__} would be defined to be 0 and c{__NASM_MINOR__}
- would be defined as 96.
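- One possible use, sketched here with an arbitrarily chosen version
- threshold, is to refuse to assemble a source file with a version of
- NASM that is too old:
- c %if __NASM_MAJOR__ = 0 && __NASM_MINOR__ < 98
- c %error This source file needs NASM 0.98 or later
- c %endif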
- S{fileline} ic{__FILE__} and ic{__LINE__}: File Name and Line Number
- Like the C preprocessor, NASM allows the user to find out the file
- name and line number containing the current instruction. The macro
- c{__FILE__} expands to a string constant giving the name of the
- current input file (which may change through the course of assembly
- if c{%include} directives are used), and c{__LINE__} expands to a
- numeric constant giving the current line number in the input file.
- These macros could be used, for example, to communicate debugging
- information to a macro, since invoking c{__LINE__} inside a macro
- definition (either single-line or multi-line) will return the line
- number of the macro e{call}, rather than e{definition}. So to
- determine where in a piece of code a crash is occurring, for
- example, one could write a routine c{stillhere}, which is passed a
- line number in c{EAX} and outputs something like `line 155: still
- here'. You could then write a macro
- c %macro notdeadyet 0
- c push eax
- c mov eax,__LINE__
- c call stillhere
- c pop eax
- c %endmacro
- and then pepper your code with calls to c{notdeadyet} until you
- find the crash point.
- S{struc} ic{STRUC} and ic{ENDSTRUC}: i{Declaring Structure} Data Types
- The core of NASM contains no intrinsic means of defining data
- structures; instead, the preprocessor is sufficiently powerful that
- data structures can be implemented as a set of macros. The macros
- c{STRUC} and c{ENDSTRUC} are used to define a structure data type.
- c{STRUC} takes one parameter, which is the name of the data type.
- This name is defined as a symbol with the value zero, and also has
- the suffix c{_size} appended to it and is then defined as an
- c{EQU} giving the size of the structure. Once c{STRUC} has been
- issued, you are defining the structure, and should define fields
- using the c{RESB} family of pseudo-instructions, and then invoke
- c{ENDSTRUC} to finish the definition.
- For example, to define a structure called c{mytype} containing a
- longword, a word, a byte and a string of bytes, you might code
- c struc mytype
- c mt_long: resd 1
- c mt_word: resw 1
- c mt_byte: resb 1
- c mt_str: resb 32
- c endstruc
- The above code defines six symbols: c{mt_long} as 0 (the offset
- from the beginning of a c{mytype} structure to the longword field),
- c{mt_word} as 4, c{mt_byte} as 6, c{mt_str} as 7, c{mytype_size}
- as 39, and c{mytype} itself as zero.
- The reason why the structure type name is defined at zero is a side
- effect of allowing structures to work with the local label
- mechanism: if your structure members tend to have the same names in
- more than one structure, you can define the above structure like this:
- c struc mytype
- c .long: resd 1
- c .word: resw 1
- c .byte: resb 1
- c .str: resb 32
- c endstruc
- This defines the offsets to the structure fields as c{mytype.long},
- c{mytype.word}, c{mytype.byte} and c{mytype.str}.
- NASM, since it has no e{intrinsic} structure support, does not
- support any form of period notation to refer to the elements of a
- structure once you have one (except the above local-label notation),
- so code such as c{mov ax,[mystruc.mt_word]} is not valid.
- c{mt_word} is a constant just like any other constant, so the
- correct syntax is c{mov ax,[mystruc+mt_word]} or c{mov
- ax,[mystruc+mytype.word]}.
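- As a minimal sketch of how these constants are typically used, space
- for one c{mytype} can be reserved using c{mytype_size}, and a field
- reached by adding its offset to a base address. The label c{buffer},
- the use of c{BX} as a base register and the assumption of 16-bit code
- are illustrative choices, not part of the definitions above:
- c section .bss
- c buffer: resb mytype_size ; room for one mytype (39 bytes)
- c section .text
- c mov bx,buffer ; BX = base address of the structure
- c mov ax,[bx+mt_word] ; load the word field, at offset 4
- c mov byte [bx+mt_byte],'x' ; store into the byte field, at offset 6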
- S{istruc} ic{ISTRUC}, ic{AT} and ic{IEND}: Declaring
- i{Instances of Structures}
- Having defined a structure type, the next thing you typically want
- to do is to declare instances of that structure in your data
- segment. NASM provides an easy way to do this in the c{ISTRUC}
- mechanism. To declare a structure of type c{mytype} in a program,
- you code something like this:
- c mystruc: istruc mytype
- c at mt_long, dd 123456
- c at mt_word, dw 1024
- c at mt_byte, db 'x'
- c at mt_str, db 'hello, world', 13, 10, 0
- c iend
- The function of the c{AT} macro is to make use of the c{TIMES}
- prefix to advance the assembly position to the correct point for the
- specified structure field, and then to declare the specified data.
- Therefore the structure fields must be declared in the same order as
- they were specified in the structure definition.
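- To make the role of c{TIMES} concrete, the c{ISTRUC} block above
- behaves roughly like the following hand-written code. This is a
- sketch of the idea rather than the literal macro expansion, and the
- zero fill byte is an assumption:
- c mystruc:
- c times mt_long-($-mystruc) db 0 ; pad up to offset 0 (no bytes)
- c dd 123456
- c times mt_word-($-mystruc) db 0 ; pad up to offset 4 (no bytes)
- c dw 1024
- c times mt_byte-($-mystruc) db 0 ; pad up to offset 6 (no bytes)
- c db 'x'
- c times mt_str-($-mystruc) db 0 ; pad up to offset 7 (no bytes)
- c db 'hello, world', 13, 10, 0
- c times mytype_size-($-mystruc) db 0 ; pad out to 39 bytes (17 bytes)
- Each c{TIMES} count is the field offset minus the number of bytes
- emitted so far. If the fields were given out of order, the count
- would be negative, and c{TIMES} cannot move the assembly position
- backwards.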
- If the data to go in a structure field requires more than one source
- line to specify, the remaining source lines can easily come after
- the c{AT} line. For example:
- c at mt_str, db 123,134,145,156,167,178,189
- c db 190,100,0
- Depending on personal taste, you can also omit the code part of the
- c{AT} line completely, and start the structure field on the next
- line:
- c at mt_str
- c db 'hello, world'
- c db 13,10,0
- S{align} ic{ALIGN} and ic{ALIGNB}: Data Alignment
- The c{ALIGN} and c{ALIGNB} macros provide a convenient way to
- align code or data on a word, longword, paragraph or other boundary.
- (Some assemblers call this directive ic{EVEN}.) The syntax of the
- c{ALIGN} and c{ALIGNB} macros is
- c align 4 ; align on 4-byte boundary
- c align 16 ; align on 16-byte boundary
- c align 8,db 0 ; pad with 0s rather than NOPs
- c align 4,resb 1 ; align to 4 in the BSS
- c alignb 4 ; equivalent to previous line
- Both macros require their first argument to be a power of two; they
- both compute the number of additional bytes required to bring the
- length of the current section up to a multiple of that power of two,
- and then apply the c{TIMES} prefix to their second argument to
- perform the alignment.
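- As an illustration of that computation, c{align 4} behaves roughly
- like the line below. This is a sketch of the idea, not necessarily
- the exact text of the macro:
- c times ($$-$) & 3 nop ; ($$-$) & (4-1) = bytes needed to reach a multiple of 4
- Here c{$-$$} is the current length of the section. If it is 5, then
- c{$$-$} is -5, and c{-5 & 3} is 3, so three c{NOP}s are emitted and
- the section length becomes 8. The mask c{n-1} selects the right bits
- only when c{n} is a power of two.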
- If the second argument is not specified, the default for c{ALIGN}
- is c{NOP}, and the default for c{ALIGNB} is c{RESB 1}. So if the
- second argument is specified, the two macros are equivalent.
- Normally, you can just use c{ALIGN} in code and data sections and
- c{ALIGNB} in BSS sections, and never need the second argument
- except for special purposes.
- c{ALIGN} and c{ALIGNB}, being simple macros, perform no error
- checking: they cannot warn you if their first argument fails to be a
- power of two, or if their second argument generates more than one
- byte of code. In each of these cases they will silently do the wrong
- thing.
- c{ALIGNB} (or c{ALIGN} with a second argument of c{RESB 1}) can
- be used within structure definitions:
- c struc mytype2
- c mt_byte: resb 1
- c alignb 2
- c mt_word: resw 1
- c alignb 4
- c mt_long: resd 1
- c mt_str: resb 32
- c endstruc
- This will ensure that the structure members are sensibly aligned
- relative to the base of the structure.
- A final caveat: c{ALIGN} and c{ALIGNB} work relative to the
- beginning of the e{section}, not the beginning of the address space
- in the final executable. Aligning to a 16-byte boundary when the
- section you're in is only guaranteed to be aligned to a 4-byte
- boundary, for example, is a waste of effort, since the section itself
- may be placed at an address that is not a multiple of 16. Again, NASM does not
- check that the section's alignment characteristics are sensible for
- the use of c{ALIGN} or c{ALIGNB}.
- C{directive} i{Assembler Directives}
- NASM, though it attempts to avoid the bureaucracy of assemblers like
- MASM and TASM, is nevertheless forced to support a e{few}
- directives. These are described in this chapter.
- NASM's directives come in two types: I{user-level
- directives}e{user-level} directives and I{primitive
- directives}e{primitive} directives. Typically, each directive has a
- user-level form and a primitive form. In almost all cases, we
- recommend the user-level forms of the directives,
- which are implemented as macros that call the primitive forms.
- Primitive directives are enclosed in square brackets; user-level
- directives are not.
- In addition to the universal directives described in this chapter,
- each object file format can optionally supply extra directives in
- order to control particular features of that file format. These
- I{format-specific directives}e{format-specific} directives are
- documented along with the formats that implement them, in k{outfmt}.
- H{bits} ic{BITS}: Specifying Target i{Processor Mode}
- The c{BITS} directive specifies whether NASM should generate code
- I{16-bit mode, versus 32-bit mode}designed to run on a processor
- operating in 16-bit mode, or code designed to run on a processor
- operating in 32-bit mode. The syntax is c{BITS 16} or c{BITS 32}.
- In most cases, you should not need to use c{BITS} explicitly. The
- c{aout}, c{coff}, c{elf} and c{win32} object formats, which are
- designed for use in 32-bit operating systems, all cause NASM to
- select 32-bit mode by default. The c{obj} object format allows you
- to specify each segment you define as either c{USE16} or c{USE32},
- and NASM will set its operating mode accordingly, so the use of the
- c{BITS} directive is once again unnecessary.
- The most likely reason for using the c{BITS} directive is to write
- 32-bit code in a flat binary file; this is because the c{bin}
- output format defaults to 16-bit mode in anticipation of it being
- used most frequently to write DOS c{.COM} programs, DOS c{.SYS}
- device drivers and boot loader software.
- You do e{not} need to specify c{BITS 32} merely in order to use
- 32-bit instructions in a 16-bit DOS program; if you do, the
- assembler will generate incorrect code because it will be writing
- code targeted at a 32-bit platform, to be run on a 16-bit one.
- When NASM is in c{BITS 16} state, instructions which use 32-bit
- data are prefixed with an 0x66 byte, and those referring to 32-bit
- addresses have an 0x67 prefix. In c{BITS 32} state, the reverse is
- true: 32-bit instructions require no prefixes, whereas instructions
- using 16-bit data need an 0x66 and those working in 16-bit addresses
- need an 0x67.
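- A short sketch of this behaviour follows. The instructions are
- arbitrary examples, and the comments note which prefix NASM has to
- emit in each case:
- c bits 16
- c mov ax,bx ; 16-bit data: no prefix
- c mov eax,ebx ; 32-bit data: 0x66 prefix
- c mov ax,[ebx] ; 32-bit address: 0x67 prefix
- c bits 32
- c mov eax,ebx ; 32-bit data: no prefix
- c mov ax,bx ; 16-bit data: 0x66 prefix
- c mov eax,[bx] ; 16-bit address: 0x67 prefix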
- The c{BITS} directive has an exactly equivalent primitive form,
- c{[BITS 16]} and c{[BITS 32]}. The user-level form is a macro
- which has no function other than to call the primitive form.
- H{section} ic{SECTION} or ic{SEGMENT}: Changing and i{Defining
- Sections}
- I{changing sections}I{switching between sections}The c{SECTION}
- directive (c{SEGMENT} is an exactly equivalent synonym) changes
- which section of the output file the code you write will be
- assembled into. In some object file formats, the number and names of
- sections are fixed; in others, the user may make up as many as they
- wish. Hence c{SECTION} may sometimes give an error message, or may
- define a new section, if you try to switch to a section that does
- not (yet) exist.
- The Unix object formats, and the c{bin} object format, all support
- the i{standardised section names} c{.text}, c{.data} and c{.bss}
- for the code, data and uninitialised-data sections. The c{obj}
- format, by contrast, does not recognise these section names as being
- special, and indeed will strip off the leading period of any section
- name that has one.
- S{sectmac} The ic{__SECT__} Macro
- The c{SECTION} directive is unusual in that its user-level form
- functions differently from its primitive form. The primitive form,
- c{[SECTION xyz]}, simply switches the current target section to the
- one given. The user-level form, c{SECTION xyz}, however, first
- defines the single-line macro c{__SECT__} to be the primitive
- c{[SECTION]} directive which it is about to issue, and then issues
- it. So the user-level directive
- c SECTION .text
- expands to the two lines
- c %define __SECT__ [SECTION .text]
- c [SECTION .text]
- Users may find it useful to make use of this in their own macros.
- For example, the c{writefile} macro defined in k{mlmacgre} can be
- usefully rewritten in the following more sophisticated form:
- c %macro writefile 2+
- c [section .data]
- c %%str: db %2
- c %%endstr:
- c __SECT__
- c mov dx,%%str
- c mov cx,%%endstr-%%str
- c mov bx,%1
- c mov ah,0x40
- c int 0x21
- c %endmacro
- This form of the macro, once passed a string to output, first
- switches temporarily to the data section of the file, using the
- primitive form of the c{SECTION} directive so as not to modify
- c{__SECT__}. It then declares its string in the data section, and
- then invokes c{__SECT__} to switch back to e{whichever} section
- the user was previously working in. It thus avoids the need, in the
- previous version of the macro, to include a c{JMP} instruction to
- jump over the data, and also works correctly in a complicated
- c{OBJ} format module, where the user could be assembling the
- code in any of several separate code sections.
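- A hypothetical use of this macro might look like the following. The
- section name c{code2} and the file handle 1 (standard output under
- DOS) are assumptions chosen for illustration:
- c section code2 ; user-level form: defines __SECT__ as [SECTION code2]
- c writefile 1, "hello, world", 13, 10
- c ; the string is placed in the data section; code continues in code2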
- H{absolute} ic{ABSOLUTE}: Defining Absolute Labels
- The c{ABSOLUTE} directive can be thought of as an alternative form
- of c{SECTION}: it causes the subsequent code to be directed at no
- physical section, but at the hypothetical section starting at the
- given absolute address. The only instructions you can use in this
- mode are the c{RESB} family.
- c{ABSOLUTE} is used as follows:
- c absolute 0x1A
- c kbuf_chr resw 1
- c kbuf_free resw 1
- c kbuf resw 16
- This example describes a section of the PC BIOS data area, at
- segment address 0x40: the above code defines c{kbuf_chr} to be
- 0x1A, c{kbuf_free} to be 0x1C, and c{kbuf} to be 0x1E.
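- These labels are plain offsets, so they are used together with a
- segment register that has been loaded with 0x40. A minimal sketch in
- 16-bit code, where the choice of c{ES} and c{BX} is an assumption:
- c mov ax,0x40
- c mov es,ax ; ES = segment of the BIOS data area
- c mov bx,[es:kbuf_chr] ; read the word at 0040:001A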
- The user-level form of c{ABSOLUTE}, like that of c{SECTION},
- redefines the ic{__SECT__} macro when it is invoked.
- ic{STRUC} and ic{ENDSTRUC} are defined as macros which use
- c{ABSOLUTE} (and also c{__SECT__}).
- c{ABSOLUTE} doesn't have to take an absolute constant as an
- argument: it can take an expression (actually, a i{critical
- expression}: see k{crit}) and it can be a value in a segment. For
- example, a TSR can re-use its setup code as run-time BSS like this:
- c org 100h ; it's a .COM program
- c jmp setup ; setup code comes last
- c ; the resident part of the TSR goes here
- c setup: ; now write the code that installs the TSR here
- c absolute setup
- c runtimevar1 resw 1
- c runtimevar2 resd 20
- c tsr_end:
- This defines some variables `on top of' the setup code, so that
- after the setup has finished running, the space it took up can be
- re-used as data storage for the running TSR. The symbol `tsr_end'
- can be used to calculate the total size of the part of the TSR that
- needs to be made resident.
- H{extern} ic{EXTERN}: i{Importing Symbols} from Other Modules
- c{EXTERN} is similar to the MASM directive c{EXTRN} and the C
- keyword c{extern}: it is used to declare a symbol which is not
- defined anywhere in the module being assembled, but is assumed to be
- defined in some other module and needs to be referred to by this
- one. Not every object-file format can support external variables:
- the c{bin} format cannot.
- The c{EXTERN} directive takes as many arguments as you like. Each
- argument is the name of a symbol:
- c extern _printf
- c extern _sscanf,_fscanf
- Some object-file formats provide extra features to the c{EXTERN}
- directive. In all cases, the extra features are used by suffixing a
- colon to the symbol name followed by object-format specific text.
- For example, the c{obj} format allows you to declare that the
- default segment base of an external should be the group c{dgroup}
- by means of the directive
- c extern _variable:wrt dgroup
- The primitive form of c{EXTERN} differs from the user-level form
- only in that it can take only one argument at a time: the support
- for multiple arguments is implemented at the preprocessor level.
- You can declare the same variable as c{EXTERN} more than once: NASM
- will quietly ignore the second and later redeclarations. You can't
- declare a variable as c{EXTERN} as well as something else, though.
- H{global} ic{GLOBAL}: i{Exporting Symbols} to Other Modules
- c{GLOBAL} is the other end of c{EXTERN}: if one module declares a
- symbol as c{EXTERN} and refers to it, then in order to prevent
- linker errors, some other module must actually e{define} the
- symbol and declare it as c{GLOBAL}. Some assemblers use the name
- ic{PUBLIC} for this purpose.
- The c{GLOBAL} directive applying to a symbol must appear e{before}
- the definition of the symbol.
- c{GLOBAL} uses the same syntax as c{EXTERN}, except that it must
- refer to symbols which e{are} defined in the same module as the
- c{GLOBAL} directive. For example:
- c global _main
- c _main: ; some code
- c{GLOBAL}, like c{EXTERN}, allows object formats to define private
- extensions by means of a colon. The c{elf} object format, for
- example, lets you specify whether global data items are functions or
- data:
- c global hashlookup:function, hashtable:data
- Like c{EXTERN}, the primitive form of c{GLOBAL} differs from the
- user-level form only in that it can take only one argument at a
- time.
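- Putting c{EXTERN} and c{GLOBAL} together, a minimal two-module sketch
- might look like this. The file names c{counter.asm} and c{user.asm}
- and the symbol c{_counter} are hypothetical, and a 32-bit object
- format is assumed:
- c ; counter.asm: defines and exports the symbol
- c global _counter
- c section .data
- c _counter: dd 0
- c ; user.asm: imports the symbol and refers to it
- c extern _counter
- c section .text
- c inc dword [_counter]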
- H{common} ic{COMMON}: Defining Common Data Areas
- The c{COMMON} directive is used to declare ie{common variables}.
- A common variable is much like a global variable declared in the
- uninitialised data section, so that
- c common intvar 4
- is similar in function to
- c global intvar
- c section .bss
- c intvar resd 1
- The difference is that if more than one module defines the same
- common variable, then at link time those variables will be
- e{merged}, and references to c{intvar} in all modules will point
- at the same piece of memory.
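- As a brief sketch, two modules (with the hypothetical names
- c{first.asm} and c{second.asm}) can each declare the variable, and
- both end up sharing a single four-byte area. An object format that
- supports common variables is assumed:
- c ; first.asm
- c common intvar 4
- c ; second.asm
- c common intvar 4 ; merged with first.asm's intvar at link time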
- Like c{GLOBAL} and c{EXTERN}, c{COMMON} supports object-format
- specific extensions. For example, the c{obj} format allows common
- variables to be NEAR or FAR, and the c{elf} format allows you to
- specify the alignment requirements of a common variable:
- c common commvar 4:near ; works in OBJ
- c common intarray 100:4 ; works in ELF: 4 byte aligned
- Once again, like c{EXTERN} and c{GLOBAL}, the primitive form of
- c{COMMON} differs from the user-level form only in that it can take
- only one argument at a time.
- C{outfmt} i{Output Formats}
- NASM is a portable assembler, designed to be able to compile on any
- ANSI C-supporting platform and produce output to run on a variety of
- Intel x86 operating systems. For this reason, it has a large number
- of available output formats, selected using the ic{-f} option on
- the NASM i{command line}. Each of these formats, along with its
- extensions to the base NASM syntax, is detailed in this chapter.
- As stated in k{opt-o}, NASM chooses a i{default name} for your
- output file based on the input file name and the chosen output
- format. This will be generated by removing the i{extension}
- (c{.asm}, c{.s}, or whatever you like to use) from the input file
- name, and substituting an extension defined by the output format.
- The extensions are given with each format below.
- H{binfmt} ic{bin}: i{Flat-Form Binary}I{pure binary} Output
- The c{bin} format does not produce object files: it generates
- nothing in the output file except the code you wrote. Such `pure
- binary' files are used by i{MS-DOS}: ic{.COM} executables and
- ic{.SYS} device drivers are pure binary files. Pure binary output
- is also useful for i{operating-system} and i{boot loader}
- development.
- c{bin} supports the three i{standardised section names} ic{.text},
- ic{.data} and ic{.bss} only. The file NASM outputs will contain the
- contents of the c{.text} section first, followed by the contents of
- the c{.data} section, aligned on a four-byte boundary. The c{.bss}
- section is not stored in the output file at all, but is assumed to
- appear directly after the end of the c{.data} section, again
- aligned on a four-byte boundary.
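- As a worked sketch of this layout, consider the 16-bit fragment
- below. The labels and the c{ORG} value are chosen for illustration
- (c{ORG} is described in k{org}), and the addresses in the comments
- follow from the rules just given:
- c org 0x100
- c section .text
- c start: mov ax,[value] ; 3 bytes
- c ret ; 1 byte, so .text is 4 bytes long
- c section .data
- c value: dw 0x1234 ; .data follows .text, so value = 0x104
- c section .bss
- c buf: resb 16 ; .data is 2 bytes, rounded up to 4, so buf = 0x108
- The c{buf} area occupies no space in the output file.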
- If you specify no explicit c{SECTION} directive, the code you write
- will be directed by default into the c{.text} section.
- Using the c{bin} format puts NASM by default into 16-bit mode (see
- k{bits}). In order to use c{bin} to write 32-bit code such as an
- OS kernel, you need to explicitly issue the Ic{BITS}c{BITS 32}
- directive.
- c{bin} has no default output file name extension: instead, it
- leaves your file name as it is once the original extension has been
- removed. Thus, the default is for NASM to assemble c{binprog.asm}
- into a binary file called c{binprog}.
- S{org} ic{ORG}: Binary File i{Program Origin}
- The c{bin} format provides an additional directive to the list
- given in k{directive}: c{ORG}. The function of the c{ORG}
- directive is to specify the origin address which NASM will assume
- the program begins at when it is loaded into memory.
- For example, the following code will generate the longword
- c{0x00000104}:
- c org 0x100
- c dd label
- c label:
- Unlike the c{ORG} directive provided by MASM-compatible assemblers,
- which allows you to jump around in the object file and overwrite
- code you have already generated, NASM's c{ORG} does exactly what
- the directive says: e{origin}. Its sole function is to specify one
- offset which is added to all internal address references within the
- file; it does not permit any of the trickery that MASM's version
- does. See k{proborg} for further comments.
- S{binseg} c{bin} Extensions to the c{SECTION}
- DirectiveI{SECTION, bin extensions to}
- The c{bin} output format extends the c{SECTION} (or c{SEGMENT})
- directive to allow you to specify the alignment requirements of
- segments. This is done by appending the ic{ALIGN} qualifier to the
- end of the section-definition line. For example,