# $Id: nasmdoc.src,v 2.3 1999/06/03 20:23:53 hpa Exp $
#
# Source code to NASM documentation
#
IR{-D} c{-D} option
IR{-E} c{-E} option
IR{-I} c{-I} option
IR{-P} c{-P} option
IR{-U} c{-U} option
IR{-a} c{-a} option
IR{-d} c{-d} option
IR{-e} c{-e} option
IR{-f} c{-f} option
IR{-i} c{-i} option
IR{-l} c{-l} option
IR{-o} c{-o} option
IR{-p} c{-p} option
IR{-s} c{-s} option
IR{-u} c{-u} option
IR{-w} c{-w} option
IR{!=} c{!=} operator
IR{$ here} c{$} Here token
IR{$$} c{$$} token
IR{%} c{%} operator
IR{%%} c{%%} operator
IR{%+1} c{%+1} and c{%-1} syntax
IA{%-1}{%+1}
IR{%0} c{%0} parameter count
IR{&} c{&} operator
IR{&&} c{&&} operator
IR{*} c{*} operator
IR{..@} c{..@} symbol prefix
IR{/} c{/} operator
IR{//} c{//} operator
IR{<} c{<} operator
IR{<<} c{<<} operator
IR{<=} c{<=} operator
IR{<>} c{<>} operator
IR{=} c{=} operator
IR{==} c{==} operator
IR{>} c{>} operator
IR{>=} c{>=} operator
IR{>>} c{>>} operator
IR{?} c{?} MASM syntax
IR{^} c{^} operator
IR{^^} c{^^} operator
IR{|} c{|} operator
IR{||} c{||} operator
IR{~} c{~} operator
IR{%$} c{%$} and c{%$$} prefixes
IA{%$$}{%$}
IR{+ opaddition} c{+} operator, binary
IR{+ opunary} c{+} operator, unary
IR{+ modifier} c{+} modifier
IR{- opsubtraction} c{-} operator, binary
IR{- opunary} c{-} operator, unary
IR{alignment, in bin sections} alignment, in c{bin} sections
IR{alignment, in elf sections} alignment, in c{elf} sections
IR{alignment, in win32 sections} alignment, in c{win32} sections
IR{alignment, of elf common variables} alignment, of c{elf} common
variables
IR{alignment, in obj sections} alignment, in c{obj} sections
IR{a.out, bsd version} c{a.out}, BSD version
IR{a.out, linux version} c{a.out}, Linux version
IR{autoconf} Autoconf
IR{bitwise and} bitwise AND
IR{bitwise or} bitwise OR
IR{bitwise xor} bitwise XOR
IR{block ifs} block IFs
IR{borland pascal} Borland, Pascal
IR{borland's win32 compilers} Borland, Win32 compilers
IR{braces, after % sign} braces, after c{%} sign
IR{bsd} BSD
IR{c calling convention} C calling convention
IR{c symbol names} C symbol names
IA{critical expressions}{critical expression}
IA{command line}{command-line}
IA{case sensitivity}{case sensitive}
IA{case-sensitive}{case sensitive}
IA{case-insensitive}{case sensitive}
IA{character constants}{character constant}
IR{common object file format} Common Object File Format
IR{common variables, alignment in elf} common variables, alignment
in c{elf}
IR{common, elf extensions to} c{COMMON}, c{elf} extensions to
IR{common, obj extensions to} c{COMMON}, c{obj} extensions to
IR{declaring structure} declaring structures
IR{default-wrt mechanism} default-c{WRT} mechanism
IR{devpac} DevPac
IR{djgpp} DJGPP
IR{dll symbols, exporting} DLL symbols, exporting
IR{dll symbols, importing} DLL symbols, importing
IR{dos} DOS
IR{dos archive} DOS archive
IR{dos source archive} DOS source archive
IA{effective address}{effective addresses}
IA{effective-address}{effective addresses}
IR{elf shared libraries} c{elf} shared libraries
IR{freebsd} FreeBSD
IR{freelink} FreeLink
IR{functions, c calling convention} functions, C calling convention
IR{functions, pascal calling convention} functions, Pascal calling
convention
IR{global, aoutb extensions to} c{GLOBAL}, c{aoutb} extensions to
IR{global, elf extensions to} c{GLOBAL}, c{elf} extensions to
IR{got} GOT
IR{got relocations} c{GOT} relocations
IR{gotoff relocation} c{GOTOFF} relocations
IR{gotpc relocation} c{GOTPC} relocations
IR{linux elf} Linux ELF
IR{logical and} logical AND
IR{logical or} logical OR
IR{logical xor} logical XOR
IR{masm} MASM
IA{memory reference}{memory references}
IA{misc directory}{misc subdirectory}
IR{misc subdirectory} c{misc} subdirectory
IR{microsoft omf} Microsoft OMF
IR{mmx registers} MMX registers
IA{modr/m}{modr/m byte}
IR{modr/m byte} ModR/M byte
IR{ms-dos} MS-DOS
IR{ms-dos device drivers} MS-DOS device drivers
IR{multipush} c{multipush} macro
IR{nasm version} NASM version
IR{netbsd} NetBSD
IR{omf} OMF
IR{openbsd} OpenBSD
IR{operating-system} operating system
IR{os/2} OS/2
IR{pascal calling convention} Pascal calling convention
IR{passes} passes, assembly
IR{perl} Perl
IR{pic} PIC
IR{pharlap} PharLap
IR{plt} PLT
IR{plt relocations} c{PLT} relocations
IA{pre-defining macros}{pre-define}
IR{qbasic} QBasic
IA{rdoff subdirectory}{rdoff}
IR{rdoff} c{rdoff} subdirectory
IR{relocatable dynamic object file format} Relocatable Dynamic
Object File Format
IR{relocations, pic-specific} relocations, PIC-specific
IA{repeating}{repeating code}
IR{section alignment, in elf} section alignment, in c{elf}
IR{section alignment, in bin} section alignment, in c{bin}
IR{section alignment, in obj} section alignment, in c{obj}
IR{section alignment, in win32} section alignment, in c{win32}
IR{section, elf extensions to} c{SECTION}, c{elf} extensions to
IR{section, win32 extensions to} c{SECTION}, c{win32} extensions to
IR{segment alignment, in bin} segment alignment, in c{bin}
IR{segment alignment, in obj} segment alignment, in c{obj}
IR{segment, obj extensions to} c{SEGMENT}, c{obj} extensions to
IR{segment names, borland pascal} segment names, Borland Pascal
IR{shift command} c{shift} command
IA{sib}{sib byte}
IR{sib byte} SIB byte
IA{standard section names}{standardised section names}
IR{symbols, exporting from dlls} symbols, exporting from DLLs
IR{symbols, importing from dlls} symbols, importing from DLLs
IR{tasm} TASM
IR{test subdirectory} c{test} subdirectory
IR{tlink} TLINK
IR{underscore, in c symbols} underscore, in C symbols
IR{unix} Unix
IR{unix source archive} Unix source archive
IR{val} VAL
IR{version number of nasm} version number of NASM
IR{visual c++} Visual C++
IR{www page} WWW page
IR{win32} Win32
IR{windows} Windows
IR{windows 95} Windows 95
IR{windows nt} Windows NT
# IC{program entry point}{entry point, program}
# IC{program entry point}{start point, program}
# IC{MS-DOS device drivers}{device drivers, MS-DOS}
# IC{16-bit mode, versus 32-bit mode}{32-bit mode, versus 16-bit mode}
# IC{c symbol names}{symbol names, in C}
C{intro} Introduction
H{whatsnasm} What Is NASM?
The Netwide Assembler, NASM, is an 80x86 assembler designed for
portability and modularity. It supports a range of object file
formats, including Linux c{a.out} and ELF, NetBSD/FreeBSD, COFF,
Microsoft 16-bit OBJ and Win32. It will also output plain binary
files. Its syntax is designed to be simple and easy to understand,
similar to Intel's but less complex. It supports Pentium, P6 and MMX
opcodes, and has macro capability.
S{yaasm} Why Yet Another Assembler?
The Netwide Assembler grew out of an idea on ic{comp.lang.asm.x86}
(or possibly ic{alt.lang.asm} - I forget which), which was
essentially that there didn't seem to be a good free x86-series
assembler around, and that maybe someone ought to write one.
b ic{a86} is good, but not free, and in particular you don't get any
32-bit capability until you pay. It's DOS only, too.
b ic{gas} is free, and has been ported to DOS and Unix, but it's not very good,
since it's designed to be a back end to ic{gcc}, which always feeds
it correct code. So its error checking is minimal. Also, its syntax
is horrible, from the point of view of anyone trying to actually
e{write} anything in it. Plus you can't write 16-bit code in it
(properly).
b ic{as86} is Linux-specific, and (my version at least) doesn't seem to
have much (or any) documentation.
b i{MASM} isn't very good, and it's expensive, and it runs only under
DOS.
b i{TASM} is better, but still strives for i{MASM} compatibility, which
means millions of directives and tons of red tape. And its syntax is
essentially i{MASM}'s, with the contradictions and quirks that entails
(although it sorts out some of those by means of Ideal mode). It's
expensive too. And it's DOS-only.
So here, for your coding pleasure, is NASM. At present it's
still in prototype stage - we don't promise that it can outperform
any of these assemblers. But please, e{please} send us bug reports,
fixes, helpful information, and anything else you can get your hands
on (and thanks to the many people who've done this already! You all
know who you are), and we'll improve it out of all recognition.
Again.
S{legal} Licence Conditions
Please see the file c{Licence}, supplied as part of any NASM
distribution archive, for the i{licence} conditions under which you
may use NASM.
H{contact} Contact Information
The current versions of NASM (since 0.98) are maintained by H. Peter
Anvin, W{mailto:hpa@zytor.com}c{hpa@zytor.com}. If you want to report
a bug, please read k{bugs} first.
NASM has a i{WWW page} at
W{http://www.cryogen.com/Nasm}c{http://www.cryogen.com/Nasm}.
The original authors are i{e-mail}able as
W{mailto:jules@earthcorp.com}c{jules@earthcorp.com} and
W{mailto:anakin@pobox.com}c{anakin@pobox.com}.
i{New releases} of NASM are uploaded to
W{ftp://ftp.kernel.org/pub/software/devel/nasm/}ic{ftp.kernel.org},
W{ftp://sunsite.unc.edu/pub/Linux/devel/lang/assemblers/}ic{sunsite.unc.edu},
W{ftp://ftp.simtel.net/pub/simtelnet/msdos/asmutl/}ic{ftp.simtel.net}
and
W{ftp://ftp.coast.net/coast/msdos/asmutil/}ic{ftp.coast.net}.
Announcements are posted to
W{news:comp.lang.asm.x86}ic{comp.lang.asm.x86},
W{news:alt.lang.asm}ic{alt.lang.asm},
W{news:comp.os.linux.announce}ic{comp.os.linux.announce} and
W{news:comp.archives.msdos.announce}ic{comp.archives.msdos.announce}
(the last one is done automagically by uploading to
W{ftp://ftp.simtel.net/pub/simtelnet/msdos/asmutl/}c{ftp.simtel.net}).
If you don't have Usenet access, or would rather be informed by
i{e-mail} when new releases come out, you can subscribe to the
c{nasm-announce} email list by sending an email containing the line
c{subscribe nasm-announce} to
W{mailto:majordomo@linux.kernel.org}c{majordomo@linux.kernel.org}.
If you want information about NASM beta releases, please subscribe to
the c{nasm-beta} email list by sending an email containing the line
c{subscribe nasm-beta} to
W{mailto:majordomo@linux.kernel.org}c{majordomo@linux.kernel.org}.
H{install} Installation
S{instdos} i{Installing} NASM under MS-i{DOS} or Windows
Once you've obtained the i{DOS archive} for NASM, ic{nasmXXX.zip}
(where c{XXX} denotes the version number of NASM contained in the
archive), unpack it into its own directory (for example
c{c:\nasm}).
The archive will contain four executable files: the NASM executable
files ic{nasm.exe} and ic{nasmw.exe}, and the NDISASM executable
files ic{ndisasm.exe} and ic{ndisasmw.exe}. In each case, the
file whose name ends in c{w} is a i{Win32} executable, designed to
run under i{Windows 95} or i{Windows NT} on Intel hardware, and the other one
is a 16-bit i{DOS} executable.
The only file NASM needs to run is its own executable, so copy
(at least) one of c{nasm.exe} and c{nasmw.exe} to a directory on
your PATH, or alternatively edit ic{autoexec.bat} to add the
c{nasm} directory to your ic{PATH}. (If you're only installing the
Win32 version, you may wish to rename it to c{nasm.exe}.)
That's it - NASM is installed. You don't need the c{nasm} directory
to be present to run NASM (unless you've added it to your c{PATH}),
so you can delete it if you need to save space; however, you may
want to keep the documentation or test programs.
If you've downloaded the i{DOS source archive}, ic{nasmXXXs.zip},
the c{nasm} directory will also contain the full NASM i{source
code}, and a selection of i{Makefiles} you can (hopefully) use to
rebuild your copy of NASM from scratch. The file c{Readme} lists the
various Makefiles and which compilers they work with.
Note that the source files c{insnsa.c}, c{insnsd.c}, c{insnsi.h}
and c{insnsn.c} are automatically generated from the master
instruction table c{insns.dat} by a Perl script; the file
c{macros.c} is generated from c{standard.mac} by another Perl
script. Although the NASM 0.98 distribution includes these generated
files, you will need to rebuild them (and hence, will need a Perl
interpreter) if you change c{insns.dat}, c{standard.mac} or the
documentation. It is possible future source distributions may not
include these files at all. Ports of i{Perl} for a variety of
platforms, including DOS and Windows, are available from
W{http://www.cpan.org/ports/}i{www.cpan.org}.
S{instdos} Installing NASM under i{Unix}
Once you've obtained the i{Unix source archive} for NASM,
ic{nasm-X.XX.tar.gz} (where c{X.XX} denotes the version number of
NASM contained in the archive), unpack it into a directory such
as c{/usr/local/src}. The archive, when unpacked, will create its
own subdirectory c{nasm-X.XX}.
NASM is an I{Autoconf}Ic{configure}auto-configuring package: once
you've unpacked it, c{cd} to the directory it's been unpacked into
and type c{./configure}. This shell script will find the best C
compiler to use for building NASM and set up i{Makefiles}
accordingly.
Once NASM has auto-configured, you can type ic{make} to build the
c{nasm} and c{ndisasm} binaries, and then c{make install} to
install them in c{/usr/local/bin} and install the i{man pages}
ic{nasm.1} and ic{ndisasm.1} in c{/usr/local/man/man1}.
Alternatively, you can give options such as c{--prefix} to the
c{configure} script (see the file ic{INSTALL} for more details), or
install the programs yourself.
NASM also comes with a set of utilities for handling the RDOFF
custom object-file format, which are in the ic{rdoff} subdirectory
of the NASM archive. You can build these with c{make rdf} and
install them with c{make rdf_install}, if you want them.
If NASM fails to auto-configure, you may still be able to make it
compile by using the fall-back Unix makefile ic{Makefile.unx}.
Copy or rename that file to c{Makefile} and try typing c{make}.
There is also a c{Makefile.unx} file in the c{rdoff} subdirectory.
C{running} Running NASM
H{syntax} NASM i{Command-Line} Syntax
To assemble a file, you issue a command of the form
c nasm -f <format> <filename> [-o <output>]
For example,
c nasm -f elf myfile.asm
will assemble c{myfile.asm} into an ELF object file c{myfile.o}. And
c nasm -f bin myfile.asm -o myfile.com
will assemble c{myfile.asm} into a raw binary file c{myfile.com}.
To produce a listing file, with the hex codes output from NASM
displayed on the left of the original sources, use the c{-l} option
to give a listing file name, for example:
c nasm -f coff myfile.asm -l myfile.lst
To get further usage instructions from NASM, try typing
c nasm -h
This will also list the available output file formats, and what they
are.
If you use Linux but aren't sure whether your system is c{a.out} or
ELF, type
c file nasm
(in the directory in which you put the NASM binary when you
installed it). If it says something like
c nasm: ELF 32-bit LSB executable i386 (386 and up) Version 1
then your system is ELF, and you should use the option c{-f elf}
when you want NASM to produce Linux object files. If it says
c nasm: Linux/i386 demand-paged executable (QMAGIC)
or something similar, your system is c{a.out}, and you should use
c{-f aout} instead. (Linux c{a.out} systems are considered obsolete,
and are rare these days.)
Like Unix compilers and assemblers, NASM is silent unless it
goes wrong: you won't see any output at all, unless it gives error
messages.
S{opt-o} The ic{-o} Option: Specifying the Output File Name
NASM will normally choose the name of your output file for you;
precisely how it does this is dependent on the object file format.
For Microsoft object file formats (ic{obj} and ic{win32}), it
will remove the c{.asm} i{extension} (or whatever extension you
like to use - NASM doesn't care) from your source file name and
substitute c{.obj}. For Unix object file formats (ic{aout},
ic{coff}, ic{elf} and ic{as86}) it will substitute c{.o}. For
ic{rdf}, it will use c{.rdf}, and for the ic{bin} format it
will simply remove the extension, so that c{myfile.asm} produces
the output file c{myfile}.
If the output file already exists, NASM will overwrite it, unless it
has the same name as the input file, in which case it will give a
warning and use ic{nasm.out} as the output file name instead.
For situations in which this behaviour is unacceptable, NASM
provides the c{-o} command-line option, which allows you to specify
your desired output file name. You invoke c{-o} by following it
with the name you wish for the output file, either with or without
an intervening space. For example:
c nasm -f bin program.asm -o program.com
c nasm -f bin driver.asm -odriver.sys
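As a rough illustration of the naming rules above, the following shell sketch assembles one source file three ways. The file names (c{naming.asm} and friends) are made up for the example, and the c{command -v} guard is simply our way of skipping the run on a machine where NASM is not installed.

```shell
# Write a one-instruction source file; the name naming.asm is arbitrary.
cat > naming.asm <<'EOF'
        ret                     ; a single near return
EOF

# Skip the assembler runs if nasm is not on the PATH.
if command -v nasm >/dev/null 2>&1; then
    nasm -f bin naming.asm                  # default: extension removed, giving "naming"
    nasm -f bin naming.asm -o naming.com    # explicit name, with a space after -o
    nasm -f bin naming.asm -onaming2.com    # explicit name, without the space
    ls -l naming naming.com naming2.com
fi
```

All three output files should hold the same code; only the names differ.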
S{opt-f} The ic{-f} Option: Specifying the i{Output File Format}
If you do not supply the c{-f} option to NASM, it will choose an
output file format for you itself. In the distribution versions of
NASM, the default is always ic{bin}; if you've compiled your own
copy of NASM, you can redefine ic{OF_DEFAULT} at compile time and
choose what you want the default to be.
As with c{-o}, the intervening space between c{-f} and the output
file format is optional; so c{-f elf} and c{-felf} are both valid.
A complete list of the available output file formats can be given by
issuing the command ic{nasm -h}.
S{opt-l} The ic{-l} Option: Generating a i{Listing File}
If you supply the c{-l} option to NASM, followed (with the usual
optional space) by a file name, NASM will generate a
i{source-listing file} for you, in which addresses and generated
code are listed on the left, and the actual source code, with
expansions of multi-line macros (except those which specifically
request no expansion in source listings: see k{nolist}) on the
right. For example:
c nasm -f elf myfile.asm -l myfile.lst
S{opt-E} The ic{-E} Option: Send Errors to a File
Under MS-i{DOS} it can be difficult (though there are ways) to
redirect the standard-error output of a program to a file. Since
NASM usually produces its warning and i{error messages} on
ic{stderr}, this can make it hard to capture the errors if (for
example) you want to load them into an editor.
NASM therefore provides the c{-E} option, taking a filename argument,
which causes errors to be sent to the specified file rather than
standard error. Therefore you can I{redirecting errors}redirect
the errors into a file by typing
c nasm -E myfile.err -f obj myfile.asm
S{opt-s} The ic{-s} Option: Send Errors to ic{stdout}
The c{-s} option redirects i{error messages} to c{stdout} rather
than c{stderr}, so it can be redirected under MS-i{DOS}. To
assemble the file c{myfile.asm} and pipe its output to the c{more}
program, you can type:
c nasm -s -f obj myfile.asm | more
See also the c{-E} option, k{opt-E}.
S{opt-i} The ic{-i}Ic{-I} Option: Include File Search Directories
When NASM sees the ic{%include} directive in a source file (see
k{include}), it will search for the given file not only in the
current directory, but also in any directories specified on the
command line by the use of the c{-i} option. Therefore you can
include files from a i{macro library}, for example, by typing
c nasm -ic:\macrolib\ -f obj myfile.asm
(As usual, a space between c{-i} and the path name is allowed, and
optional).
NASM, in the interests of complete source-code portability, does not
understand the file naming conventions of the OS it is running on;
the string you provide as an argument to the c{-i} option will be
prepended exactly as written to the name of the include file.
Therefore the trailing backslash in the above example is necessary.
Under Unix, a trailing forward slash is similarly necessary.
(You can use this to your advantage, if you're really i{perverse},
by noting that the option c{-ifoo} will cause c{%include "bar.i"}
to search for the file c{foobar.i}...)
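The prepending rule can be tried out with a small Unix-style sketch like the one below. The directory c{macrolib}, the include file c{defs.inc}, the macro c{ANSWER} and the source file c{usesinc.asm} are all hypothetical names chosen for the example, and the c{nasm} call is skipped if no c{nasm} is found on the path.

```shell
# Hypothetical layout: one include file inside a directory called macrolib.
mkdir -p macrolib
cat > macrolib/defs.inc <<'EOF'
%define ANSWER 42
EOF

# A source file that includes defs.inc without naming its directory.
cat > usesinc.asm <<'EOF'
%include "defs.inc"
        db ANSWER               ; one byte, taken from the included macro
EOF

if command -v nasm >/dev/null 2>&1; then
    # The -i string is prepended as written, so the trailing slash
    # is what turns "defs.inc" into "macrolib/defs.inc".
    nasm -imacrolib/ -f bin usesinc.asm -o usesinc.bin
fi
```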
If you want to define a e{standard} i{include search path},
similar to c{/usr/include} on Unix systems, you should place one or
more c{-i} directives in the c{NASM} environment variable (see
k{nasmenv}).
For Makefile compatibility with many C compilers, this option can also
be specified as c{-I}.
S{opt-p} The ic{-p}Ic{-P} Option: I{pre-including files}Pre-Include a File
Ic{%include}NASM allows you to specify files to be
e{pre-included} into your source file, by the use of the c{-p}
option. So running
c nasm myfile.asm -p myinc.inc
is equivalent to running c{nasm myfile.asm} and placing the
directive c{%include "myinc.inc"} at the start of the file.
For consistency with the c{-I}, c{-D} and c{-U} options, this
option can also be specified as c{-P}.
S{opt-d} The ic{-d}Ic{-D} Option: I{pre-defining macros} Pre-Define a Macro
Ic{%define}Just as the c{-p} option gives an alternative to placing
c{%include} directives at the start of a source file, the c{-d}
option gives an alternative to placing a c{%define} directive. You
could code
c nasm myfile.asm -dFOO=100
as an alternative to placing the directive
c %define FOO 100
at the start of the file. You can omit the macro value, as well:
the option c{-dFOO} is equivalent to coding c{%define FOO}. This
form of the option may be useful for selecting i{assembly-time
options} which are then tested using c{%ifdef}, for example
c{-dDEBUG}.
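A minimal sketch of the c{-dDEBUG} idea follows. The source file name c{debugopt.asm}, the output names and the string it emits are our own inventions for the example; the sketch only runs c{nasm} if it can find one.

```shell
# A source file whose contents depend on whether DEBUG is defined.
cat > debugopt.asm <<'EOF'
%ifdef DEBUG
        db 'debug build'        ; only assembled when DEBUG is defined
%endif
        db 0                    ; always assembled
EOF

if command -v nasm >/dev/null 2>&1; then
    nasm -f bin debugopt.asm -o plain.bin           # DEBUG not defined
    nasm -f bin debugopt.asm -dDEBUG -o debug.bin   # DEBUG defined on the command line
    wc -c plain.bin debug.bin                       # the second file should be the larger
fi
```

The same source file thus yields two different outputs without being edited, which is the point of selecting options at assembly time.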
For Makefile compatibility with many C compilers, this option can also
be specified as c{-D}.
S{opt-u} The ic{-u}Ic{-U} Option: I{Undefining macros} Undefine a Macro
Ic{%undef}The c{-u} option undefines a macro that would otherwise
have been pre-defined, either automatically or by a c{-p} or c{-d}
option specified earlier on the command line.
For example, the following command line:
c nasm myfile.asm -dFOO=100 -uFOO
would result in c{FOO} e{not} being a predefined macro in the
program. This is useful to override options specified at a different
point in a Makefile.
For Makefile compatibility with many C compilers, this option can also
be specified as c{-U}.
S{opt-e} The ic{-e} Option: Preprocess Only
NASM allows the i{preprocessor} to be run on its own, up to a
point. Using the c{-e} option (which requires no arguments) will
cause NASM to preprocess its input file, expand all the macro
references, remove all the comments and preprocessor directives, and
print the resulting file on standard output (or save it to a file,
if the c{-o} option is also used).
This option cannot be applied to programs which require the
preprocessor to evaluate I{preprocessor expressions}i{expressions}
which depend on the values of symbols: so code such as
c %assign tablesize ($-tablestart)
will cause an error in i{preprocess-only mode}.
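By way of contrast, here is a small case that preprocess-only mode should cope with, since nothing in it depends on the value of a symbol. The file names c{preproc.asm} and c{preproc.i} and the macro c{COUNT} are assumptions made for this sketch, and the exact layout of the preprocessed text may vary between NASM versions.

```shell
# A source file with one single-line macro and one use of it.
cat > preproc.asm <<'EOF'
%define COUNT 3
        mov cx,COUNT            ; the macro reference is expanded by -e
EOF

if command -v nasm >/dev/null 2>&1; then
    nasm -e preproc.asm -o preproc.i    # preprocess only, saving the result in a file
    cat preproc.i                       # show what the assembler proper would have seen
fi
```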
S{opt-a} The ic{-a} Option: Don't Preprocess At All
If NASM is being used as the back end to a compiler, it might be
desirable to I{suppressing preprocessing}suppress preprocessing
completely and assume the compiler has already done it, to save time
and increase compilation speeds. The c{-a} option, requiring no
argument, instructs NASM to replace its powerful i{preprocessor}
with a i{stub preprocessor} which does nothing.
S{opt-w} The ic{-w} Option: Enable or Disable Assembly i{Warnings}
NASM can observe many conditions during the course of assembly which
are worth mentioning to the user, but which are not sufficiently
severe to justify NASM refusing to generate an output file. These
conditions are reported like errors, but with the word
`warning' before the message. Warnings do not prevent NASM from
generating an output file and returning a success status to the
operating system.
Some conditions are even less severe than that: they are only
sometimes worth mentioning to the user. Therefore NASM supports the
c{-w} command-line option, which enables or disables certain
classes of assembly warning. Such warning classes are described by a
name, for example c{orphan-labels}; you can enable warnings of
this class with the command-line option c{-w+orphan-labels} and
disable them with c{-w-orphan-labels}.
The i{suppressible warning} classes are:
b ic{macro-params} covers warnings about i{multi-line macros}
being invoked with the wrong number of parameters. This warning
class is enabled by default; see k{mlmacover} for an example of why
you might want to disable it.
b ic{orphan-labels} covers warnings about source lines which
contain no instruction but define a label without a trailing colon.
NASM does not warn about this somewhat obscure condition by default;
see k{syntax} for an example of why you might want it to.
b ic{number-overflow} covers warnings about numeric constants which
don't fit in 32 bits (for example, it's easy to type one too many Fs
and produce c{0x7ffffffff} by mistake). This warning class is
enabled by default.
S{nasmenv} The c{NASM} i{Environment} Variable
If you define an environment variable called c{NASM}, the program
will interpret it as a list of extra command-line options, which are
processed before the real command line. You can use this to define
standard search directories for include files, by putting c{-i}
options in the c{NASM} variable.
The value of the variable is split up at white space, so that the
value c{-s -ic:\nasmlib} will be treated as two separate options.
However, that means that the value c{-dNAME="my name"} won't do
what you might want, because it will be split at the space and the
NASM command-line processing will get confused by the two
nonsensical words c{-dNAME="my} and c{name"}.
To get round this, NASM provides a feature whereby, if you begin the
c{NASM} environment variable with some character that isn't a minus
sign, then NASM will treat this character as the i{separator
character} for options. So setting the c{NASM} variable to the
value c{!-s!-ic:\nasmlib} is equivalent to setting it to c{-s
-ic:\nasmlib}, but c{!-dNAME="my name"} will work.
H{qstart} i{Quick Start} for i{MASM} Users
If you're used to writing programs with MASM, or with i{TASM} in
MASM-compatible (non-Ideal) mode, or with ic{a86}, this section
attempts to outline the major differences between MASM's syntax and
NASM's. If you're not already used to MASM, it's probably worth
skipping this section.
S{qscs} NASM Is I{case sensitivity}Case-Sensitive
One simple difference is that NASM is case-sensitive. It makes a
difference whether you call your label c{foo}, c{Foo} or c{FOO}.
If you're assembling to DOS or OS/2 c{.OBJ} files, you can invoke
the ic{UPPERCASE} directive (documented in k{objfmt}) to ensure
that all symbols exported to other code modules are forced to be
upper case; but even then, e{within} a single module, NASM will
distinguish between labels differing only in case.
S{qsbrackets} NASM Requires i{Square Brackets} For i{Memory References}
NASM was designed with simplicity of syntax in mind. One of the
i{design goals} of NASM is that it should be possible, as far as is
practical, for the user to look at a single line of NASM code
and tell what opcode is generated by it. You can't do this in MASM:
if you declare, for example,
c foo equ 1
c bar dw 2
then the two lines of code
c mov ax,foo
c mov ax,bar
generate completely different opcodes, despite having
identical-looking syntaxes.
NASM avoids this undesirable situation by having a much simpler
syntax for memory references. The rule is simply that any access to
the e{contents} of a memory location requires square brackets
around the address, and any access to the e{address} of a variable
doesn't. So an instruction of the form c{mov ax,foo} will
e{always} refer to a compile-time constant, whether it's an c{EQU}
or the address of a variable; and to access the e{contents} of the
variable c{bar}, you must code c{mov ax,[bar]}.
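The three cases can be put side by side and inspected in a listing file, roughly as sketched here. The file names are made up for the example, the code is meant to be read in the listing rather than run, and c{nasm} is only invoked if it is installed.

```shell
# The definitions from the text, followed by the three kinds of access.
cat > brackets.asm <<'EOF'
foo     equ 1                   ; an assemble-time constant
bar     dw 2                    ; a word of storage; "bar" is its address

        mov ax,foo              ; loads the constant 1
        mov ax,bar              ; loads the address of bar
        mov ax,[bar]            ; loads the contents of bar
EOF

if command -v nasm >/dev/null 2>&1; then
    nasm -f bin brackets.asm -o brackets.bin -l brackets.lst
    cat brackets.lst            # compare the opcodes generated for each line
fi
```

In the listing, the first two c{mov} lines should share one opcode form (register loaded with an immediate value), and only the bracketed line should show a memory access.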
This also means that NASM has no need for MASM's ic{OFFSET}
keyword, since the MASM code c{mov ax,offset bar} means exactly the
same thing as NASM's c{mov ax,bar}. If you're trying to get
large amounts of MASM code to assemble sensibly under NASM, you
can always code c{%idefine offset} to make the preprocessor treat
the c{OFFSET} keyword as a no-op.
This issue is even more confusing in ic{a86}, where declaring a
label with a trailing colon defines it to be a `label' as opposed to
a `variable' and causes c{a86} to adopt NASM-style semantics; so in
c{a86}, c{mov ax,var} has different behaviour depending on whether
c{var} was declared as c{var: dw 0} (a label) or c{var dw 0} (a
word-size variable). NASM is very simple by comparison:
e{everything} is a label.
NASM, in the interests of simplicity, also does not support the
i{hybrid syntaxes} supported by MASM and its clones, such as
c{mov ax,table[bx]}, where a memory reference is denoted by one
portion outside square brackets and another portion inside. The
correct syntax for the above is c{mov ax,[table+bx]}. Likewise,
c{mov ax,es:[di]} is wrong and c{mov ax,[es:di]} is right.
S{qstypes} NASM Doesn't Store i{Variable Types}
NASM, by design, chooses not to remember the types of variables you
declare. Whereas MASM will remember, on seeing c{var dw 0}, that
you declared c{var} as a word-size variable, and will then be able
to resolve the i{ambiguity} in the size of the instruction c{mov
var,2}, NASM will deliberately remember nothing about the symbol
c{var} except where it begins, and so you must explicitly code
c{mov word [var],2}.
For this reason, NASM doesn't support the c{LODS}, c{MOVS},
c{STOS}, c{SCAS}, c{CMPS}, c{INS}, or c{OUTS} instructions,
but only supports the forms such as c{LODSB}, c{MOVSW}, and
c{SCASD}, which explicitly specify the size of the components of
the strings being manipulated.
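A short sketch of what this means in practice is given below: every store to memory states its own size, and every string instruction carries its size in its name. The file names and the choice of instructions are ours, the code is for inspection in the listing rather than for running, and the c{nasm} call is skipped if NASM is not installed.

```shell
cat > sizes.asm <<'EOF'
        mov word [var],2        ; store a 16-bit 2: the size comes from "word"
        mov byte [var],2        ; same address, but an 8-bit store this time
        lodsb                   ; byte-sized string load
        movsw                   ; word-sized string move
        scasd                   ; doubleword-sized string scan

var     dw 0                    ; NASM records only where var is, not its size
EOF

if command -v nasm >/dev/null 2>&1; then
    nasm -f bin sizes.asm -o sizes.bin -l sizes.lst
    cat sizes.lst
fi
```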
S{qsassume} NASM Doesn't ic{ASSUME}
As part of NASM's drive for simplicity, it also does not support the
c{ASSUME} directive. NASM will not keep track of what values you
choose to put in your segment registers, and will never
e{automatically} generate a i{segment override} prefix.
S{qsmodel} NASM Doesn't Support i{Memory Models}
NASM also does not have any directives to support different 16-bit
memory models. The programmer has to keep track of which functions
are supposed to be called with a i{far call} and which with a
i{near call}, and is responsible for ending each function with the
correct form of c{RET} instruction (c{RETN} or c{RETF}; NASM accepts
c{RET} itself as an alternate form for c{RETN}); in addition, the
programmer is responsible for coding c{CALL FAR} instructions where
necessary when calling e{external} functions, and must also keep
track of which external variable definitions are far and which are
near.
S{qsfpu} i{Floating-Point} Differences
NASM uses different names to refer to floating-point registers from
MASM: where MASM would call them c{ST(0)}, c{ST(1)} and so on, and
ic{a86} would call them simply c{0}, c{1} and so on, NASM
chooses to call them c{st0}, c{st1} etc.
As of version 0.96, NASM treats the instructions with
i{`nowait'} forms in the same way as MASM-compatible assemblers.
The idiosyncratic treatment employed by 0.95 and earlier was based
on a misunderstanding by the authors.
S{qsother} Other Differences
For historical reasons, NASM uses the keyword ic{TWORD} where MASM
and compatible assemblers use ic{TBYTE}.
NASM does not declare i{uninitialised storage} in the same way as
MASM: where a MASM programmer might use c{stack db 64 dup (?)},
NASM requires c{stack resb 64}, intended to be read as `reserve 64
bytes'. For a limited amount of compatibility, since NASM treats
c{?} as a valid character in symbol names, you can code c{? equ 0}
and then writing c{dw ?} will at least do something vaguely useful.
Ic{RESB}ic{DUP} is still not a supported syntax, however.
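For comparison, a NASM-style reservation might look like the sketch below. Putting the reserved space in a c{.bss} section, the second label c{buffer} and the file names are assumptions made for the example; the c{nasm} call is skipped if NASM is not installed.

```shell
cat > reserve.asm <<'EOF'
section .bss
stack   resb 64                 ; reserve 64 bytes, with no initial contents
buffer  resw 16                 ; reserve 16 words in the same way

section .text
        mov word [buffer],0     ; reserved space is addressed like any other label
EOF

if command -v nasm >/dev/null 2>&1; then
    nasm -f bin reserve.asm -o reserve.bin -l reserve.lst
    cat reserve.lst
fi
```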
In addition to all of this, macros and directives work completely
differently to MASM. See k{preproc} and k{directive} for further
details.
C{lang} The NASM Language
H{syntax} Layout of a NASM Source Line
As in most assemblers, each NASM source line contains (unless it
is a macro, a preprocessor directive or an assembler directive: see
k{preproc} and k{directive}) some combination of the four fields
c label: instruction operands ; comment
As usual, most of these fields are optional; the presence or absence
of any combination of a label, an instruction and a comment is allowed.
Of course, the operand field is either required or forbidden by the
presence and nature of the instruction field.
NASM places no restrictions on white space within a line: labels may
have white space before them, or instructions may have no space
before them, or anything. The i{colon} after a label is also
optional. (Note that this means that if you intend to code c{lodsb}
alone on a line, and type c{lodab} by accident, then that's still a
valid source line which does nothing but define a label. Running
NASM with the command-line option
I{orphan-labels}c{-w+orphan-labels} will cause it to warn you if
you define a label alone on a line without a i{trailing colon}.)
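As a small sketch of these layout rules (the label names c{start} and c{start2} are invented for illustration), each of the following is a valid source line:

```nasm
start:  lodsb                   ; label, colon, instruction
start2  lodsb                   ; the colon after the label is optional
        lodsb                   ; instruction with no label at all
lodab                           ; typo for lodsb: assembles no code and
                                ; silently defines a label called lodab
```

Assembled with c{-w+orphan-labels}, only the last line should draw the warning described above, since it is the only label alone on a line without a colon.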
i{Valid characters} in labels are letters, numbers, c{_}, c{$},
c{#}, c{@}, c{~}, c{.}, and c{?}. The only characters which may
be used as the e{first} character of an identifier are letters,
c{.} (with special meaning: see k{locallab}), c{_} and c{?}.
An identifier may also be prefixed with a I{$prefix}c{$} to
indicate that it is intended to be read as an identifier and not a
reserved word; thus, if some other module you are linking with
defines a symbol called c{eax}, you can refer to c{$eax} in NASM
code to distinguish the symbol from the register.
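A minimal sketch of the c{$} prefix in use. To keep the fragment self-contained, the symbol c{eax} is defined locally here instead of in another module, and c{ebx} is an arbitrary choice of destination:

```nasm
$eax:   dd 0                    ; defines a label literally named eax

        mov ebx,eax             ; copies the EAX register into EBX
        mov ebx,$eax            ; loads the address of the symbol eax
        mov ebx,[$eax]          ; loads the doubleword stored at that symbol
```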
The instruction field may contain any machine instruction: Pentium
and P6 instructions, FPU instructions, MMX instructions and even
undocumented instructions are all supported. The instruction may be
prefixed by c{LOCK}, c{REP}, c{REPE}/c{REPZ} or
c{REPNE}/c{REPNZ}, in the usual way. Explicit I{address-size
prefixes}address-size and i{operand-size prefixes} c{A16},
c{A32}, c{O16} and c{O32} are provided - one example of their use
is given in k{mixsize}. You can also use the name of a I{segment
override}segment register as an instruction prefix: coding
c{es mov [bx],ax} is equivalent to coding c{mov [es:bx],ax}. We
recommend the latter syntax, since it is consistent with other
syntactic features of the language, but for instructions such as
c{LODSB}, which has no operands and yet can require a segment
override, there is no clean syntactic way to proceed apart from
c{es lodsb}.
An instruction is not required to use a prefix: prefixes such as
c{CS}, c{A32}, c{LOCK} or c{REPE} can appear on a line by
themselves, and NASM will just generate the prefix bytes.
In addition to actual machine instructions, NASM also supports a
number of pseudo-instructions, described in k{pseudop}.
Instruction i{operands} may take a number of forms: they can be
registers, described simply by the register name (e.g. c{ax},
c{bp}, c{ebx}, c{cr0}: NASM does not use the c{gas}-style
syntax in which register names must be prefixed by a c{%} sign), or
they can be i{effective addresses} (see k{effaddr}), constants
(k{const}) or expressions (k{expr}).
For i{floating-point} instructions, NASM accepts a wide range of
syntaxes: you can use the two-operand forms that MASM supports, or you
can use NASM's native single-operand forms in most cases. Details of
all forms of each supported instruction are given in
k{iref}. For example, you can code:
c fadd st1 ; this sets st0 := st0 + st1
c fadd st0,st1 ; so does this
c
c fadd st1,st0 ; this sets st1 := st1 + st0
c fadd to st1 ; so does this
Almost any floating-point instruction that references memory must
use one of the prefixes ic{DWORD}, ic{QWORD} or ic{TWORD} to
indicate what size of i{memory operand} it refers to.
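For instance, the following sketch shows each of the three size prefixes on a memory operand. The variable names c{single_val}, c{double_val} and c{extended} are hypothetical:

```nasm
        fld dword [single_val]  ; load a 32-bit single-precision value
        fadd qword [double_val] ; add a 64-bit double-precision value
        fstp tword [extended]   ; store and pop as 80-bit extended precision

single_val dd 1.5
double_val dq 2.25
extended   dt 0.0
```

Without the prefix, NASM has no way to tell which of the memory forms of c{FLD} or c{FSTP} is meant, since the label itself carries no size.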
H{pseudop} i{Pseudo-Instructions}
Pseudo-instructions are things which, though not real x86 machine
instructions, are used in the instruction field anyway because
that's the most convenient place to put them. The current
pseudo-instructions are ic{DB}, ic{DW}, ic{DD}, ic{DQ} and
ic{DT}, their i{uninitialised} counterparts ic{RESB},
ic{RESW}, ic{RESD}, ic{RESQ} and ic{REST}, the ic{INCBIN}
command, the ic{EQU} command, and the ic{TIMES} prefix.
S{db} c{DB} and friends: Declaring Initialised Data
ic{DB}, ic{DW}, ic{DD}, ic{DQ} and ic{DT} are used, much
as in MASM, to declare initialised data in the output file. They can
be invoked in a wide range of ways:
I{floating-point}I{character constant}I{string constant}
c db 0x55 ; just the byte 0x55
c db 0x55,0x56,0x57 ; three bytes in succession
c db 'a',0x55 ; character constants are OK
c db 'hello',13,10,'$' ; so are string constants
c dw 0x1234 ; 0x34 0x12
c dw 'a' ; 0x61 0x00 (it's just a number)
c dw 'ab' ; 0x61 0x62 (character constant)
c dw 'abc' ; 0x61 0x62 0x63 0x00 (string)
c dd 0x12345678 ; 0x78 0x56 0x34 0x12
c dd 1.234567e20 ; floating-point constant
c dq 1.234567e20 ; double-precision float
c dt 1.234567e20 ; extended-precision float
c{DQ} and c{DT} do not accept i{numeric constants} or string
constants as operands.
S{resb} c{RESB} and friends: Declaring i{Uninitialised} Data
ic{RESB}, ic{RESW}, ic{RESD}, ic{RESQ} and ic{REST} are
designed to be used in the BSS section of a module: they declare
e{uninitialised} storage space. Each takes a single operand, which
is the number of bytes, words, doublewords or whatever to reserve.
As stated in k{qsother}, NASM does not support the MASM/TASM syntax
of reserving uninitialised space by writing Ic{?}c{DW ?} or
similar things: this is what it does instead. The operand to a
c{RESB}-type pseudo-instruction is a ie{critical expression}: see
k{crit}.
For example:
c buffer: resb 64 ; reserve 64 bytes
c wordvar: resw 1 ; reserve a word
c realarray resq 10 ; array of ten reals
S{incbin} ic{INCBIN}: Including External i{Binary Files}
c{INCBIN} is borrowed from the old Amiga assembler i{DevPac}: it
includes a binary file verbatim into the output file. This can be
handy for (for example) including i{graphics} and i{sound} data
directly into a game executable file. It can be called in one of
these three ways:
c incbin "file.dat" ; include the whole file
c incbin "file.dat",1024 ; skip the first 1024 bytes
c incbin "file.dat",1024,512 ; skip the first 1024, and
c ; actually include at most 512
S{equ} ic{EQU}: Defining Constants
c{EQU} defines a symbol to a given constant value: when c{EQU} is
used, the source line must contain a label. The action of c{EQU} is
to define the given label name to the value of its (only) operand.
This definition is absolute, and cannot change later. So, for
example,
c message db 'hello, world'
c msglen equ $-message
defines c{msglen} to be the constant 12. c{msglen} may not then be
redefined later. This is not a i{preprocessor} definition either:
the value of c{msglen} is evaluated e{once}, using the value of
c{$} (see k{expr} for an explanation of c{$}) at the point of
definition, rather than being evaluated wherever it is referenced
and using the value of c{$} at the point of reference. Note that
the operand to an c{EQU} is also a i{critical expression}
(k{crit}).
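The difference between an c{EQU} and a preprocessor definition can be sketched as follows. The macro name c{curlen} is invented for the comparison, and c{%define} itself is described in k{slmacro}:

```nasm
message db 'hello, world'
msglen  equ $-message           ; evaluated once, right here: 12

; curlen is a textual macro, so $ is looked up afresh at every use
%define curlen $-message

        db '!!!!'               ; four more bytes, so $ has moved on
        mov cx,msglen           ; CX = 12, wherever this line appears
        mov dx,curlen           ; expands to $-message: more than 12 here
```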
S{times} ic{TIMES}: i{Repeating} Instructions or Data
The c{TIMES} prefix causes the instruction to be assembled multiple
times. This is partly present as NASM's equivalent of the ic{DUP}
syntax supported by i{MASM}-compatible assemblers, in that you can
code
c zerobuf: times 64 db 0
or similar things; but c{TIMES} is more versatile than that. The
argument to c{TIMES} is not just a numeric constant, but a numeric
e{expression}, so you can do things like
c buffer: db 'hello, world'
c times 64-$+buffer db ' '
which will store exactly enough spaces to make the total length of
c{buffer} up to 64. Finally, c{TIMES} can be applied to ordinary
instructions, so you can code trivial i{unrolled loops} in it:
c times 100 movsb
Note that there is no effective difference between c{times 100 resb
1} and c{resb 100}, except that the latter will be assembled about
100 times faster due to the internal structure of the assembler.
The operand to c{TIMES}, like that of c{EQU} and those of c{RESB}
and friends, is a critical expression (k{crit}).
Note also that c{TIMES} can't be applied to i{macros}: the reason
for this is that c{TIMES} is processed after the macro phase, which
allows the argument to c{TIMES} to contain expressions such as
c{64-$+buffer} as above. To repeat more than one line of code, or a
complex macro, use the preprocessor ic{%rep} directive.
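As a sketch of the c{%rep} alternative (see k{rep} for the directive itself), the following repeats a two-line body, which c{TIMES} cannot do. The choice of instructions is only an illustration:

```nasm
; assemble the enclosed lines four times: shifts DX:AX left by four bits
%rep 4
        shl ax,1                ; top bit of the low word goes to carry
        rcl dx,1                ; carry is rotated into the high word
%endrep
```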
H{effaddr} Effective Addresses
An i{effective address} is any operand to an instruction which
I{memory reference}references memory. Effective addresses, in NASM,
have a very simple syntax: they consist of an expression evaluating
to the desired address, enclosed in i{square brackets}. For
example:
c wordvar dw 123
c mov ax,[wordvar]
c mov ax,[wordvar+1]
c mov ax,[es:wordvar+bx]
Anything not conforming to this simple system is not a valid memory
reference in NASM, for example c{es:wordvar[bx]}.
More complicated effective addresses, such as those involving more
than one register, work in exactly the same way:
c mov eax,[ebx*2+ecx+offset]
c mov ax,[bp+di+8]
NASM is capable of doing i{algebra} on these effective addresses,
so that things which don't necessarily e{look} legal are perfectly
all right:
c mov eax,[ebx*5] ; assembles as [ebx*4+ebx]
c mov eax,[label1*2-label2] ; ie [label1+(label1-label2)]
Some forms of effective address have more than one assembled form;
in most such cases NASM will generate the smallest form it can. For
example, there are distinct assembled forms for the 32-bit effective
addresses c{[eax*2+0]} and c{[eax+eax]}, and NASM will generally
generate the latter on the grounds that the former requires four
bytes to store a zero offset.
NASM has a hinting mechanism which will cause c{[eax+ebx]} and
c{[ebx+eax]} to generate different opcodes; this is occasionally
useful because c{[esi+ebp]} and c{[ebp+esi]} have different
default segment registers.
However, you can force NASM to generate an effective address in a
particular form by the use of the keywords c{BYTE}, c{WORD},
c{DWORD} and c{NOSPLIT}. If you need c{[eax+3]} to be assembled
using a double-word offset field instead of the one byte NASM will
normally generate, you can code c{[dword eax+3]}. Similarly, you
can force NASM to use a byte offset for a small value which it
hasn't seen on the first pass (see k{crit} for an example of such a
code fragment) by using c{[byte eax+offset]}. As special cases,
c{[byte eax]} will code c{[eax+0]} with a byte offset of zero, and
c{[dword eax]} will code it with a double-word offset of zero. The
normal form, c{[eax]}, will be coded with no offset field.
Similarly, NASM will split c{[eax*2]} into c{[eax+eax]} because
that allows the offset field to be absent and space to be saved; in
fact, it will also split c{[eax*2+offset]} into
c{[eax+eax+offset]}. You can combat this behaviour by the use of
the c{NOSPLIT} keyword: c{[nosplit eax*2]} will force
c{[eax*2+0]} to be generated literally.
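Collected in one place, the forms just described look like this. This is a sketch only: the destination c{ebx} is arbitrary, and the comments restate the text above; they are not verified encodings:

```nasm
        mov ebx,[eax]           ; normal form: no offset field
        mov ebx,[byte eax]      ; [eax+0] with a one-byte zero offset
        mov ebx,[dword eax]     ; [eax+0] with a double-word zero offset
        mov ebx,[eax+3]         ; one-byte offset, chosen by default
        mov ebx,[dword eax+3]   ; double-word offset forced
        mov ebx,[eax*2]         ; split into [eax+eax]
        mov ebx,[nosplit eax*2] ; kept literally as [eax*2+0]
```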
H{const} i{Constants}
NASM understands four different types of constant: numeric,
character, string and floating-point.
S{numconst} i{Numeric Constants}
A numeric constant is simply a number. NASM allows you to specify
numbers in a variety of number bases, in a variety of ways: you can
suffix c{H}, c{Q} and c{B} for i{hex}, i{octal} and i{binary},
or you can prefix c{0x} for hex in the style of C, or you can
prefix c{$} for hex in the style of Borland Pascal. Note, though,
that the I{$prefix}c{$} prefix does double duty as a prefix on
identifiers (see k{syntax}), so a hex number prefixed with a c{$}
sign must have a digit after the c{$} rather than a letter.
Some examples:
c mov ax,100 ; decimal
c mov ax,0a2h ; hex
c mov ax,$0a2 ; hex again: the 0 is required
c mov ax,0xa2 ; hex yet again
c mov ax,777q ; octal
c mov ax,10010011b ; binary
S{chrconst} i{Character Constants}
A character constant consists of up to four characters enclosed in
either single or double quotes. The type of quote makes no
difference to NASM, except of course that surrounding the constant
with single quotes allows double quotes to appear within it and vice
versa.
A character constant with more than one character will be arranged
with i{little-endian} order in mind: if you code
c mov eax,'abcd'
then the constant generated is not c{0x61626364}, but
c{0x64636261}, so that if you were then to store the value into
memory, it would read c{abcd} rather than c{dcba}. This is also
the sense of character constants understood by the Pentium's
ic{CPUID} instruction (see k{insCPUID}).
S{strconst} String Constants
String constants are only acceptable to some pseudo-instructions,
namely the Ic{DW}Ic{DD}Ic{DQ}Ic{DT}ic{DB} family and
ic{INCBIN}.
A string constant looks like a character constant, only longer. It
is treated as a concatenation of character constants, each of the
maximum size for the pseudo-instruction in use. So the following are equivalent:
c db 'hello' ; string constant
c db 'h','e','l','l','o' ; equivalent character constants
And the following are also equivalent:
c dd 'ninechars' ; doubleword string constant
c dd 'nine','char','s' ; becomes three doublewords
c db 'ninechars',0,0,0 ; and really looks like this
Note that when used as an operand to c{db}, a constant like
c{'ab'} is treated as a string constant despite being short enough
to be a character constant, because otherwise c{db 'ab'} would have
the same effect as c{db 'a'}, which would be silly. Similarly,
three-character or four-character constants are treated as strings
when they are operands to c{dw}.
S{fltconst} I{floating-point, constants}Floating-Point Constants
i{Floating-point} constants are acceptable only as arguments to
ic{DD}, ic{DQ} and ic{DT}. They are expressed in the
traditional form: digits, then a period, then optionally more
digits, then optionally an c{E} followed by an exponent. The period
is mandatory, so that NASM can distinguish between c{dd 1}, which
declares an integer constant, and c{dd 1.0} which declares a
floating-point constant.
Some examples:
c dd 1.2 ; an easy one
c dq 1.e10 ; 10,000,000,000
c dq 1.e+10 ; synonymous with 1.e10
c dq 1.e-10 ; 0.000 000 000 1
c dt 3.141592653589793238462 ; pi
NASM cannot do compile-time arithmetic on floating-point constants.
This is because NASM is designed to be portable - although it always
generates code to run on x86 processors, the assembler itself can
run on any system with an ANSI C compiler. Therefore, the assembler
cannot guarantee the presence of a floating-point unit capable of
handling the i{Intel number formats}, and so for NASM to be able to
do floating arithmetic it would have to include its own complete set
of floating-point routines, which would significantly increase the
size of the assembler for very little benefit.
H{expr} i{Expressions}
Expressions in NASM are similar in syntax to those in C.
NASM does not guarantee the size of the integers used to evaluate
expressions at compile time: since NASM can compile and run on
64-bit systems quite happily, don't assume that expressions are
evaluated in 32-bit registers and so try to make deliberate use of
i{integer overflow}. It might not always work. The only thing NASM
will guarantee is what's guaranteed by ANSI C: you always have e{at
least} 32 bits to work in.
NASM supports two special tokens in expressions, allowing
calculations to involve the current assembly position: the
I{$ here}c{$} and ic{$$} tokens. c{$} evaluates to the assembly
position at the beginning of the line containing the expression; so
you can code an i{infinite loop} using c{JMP $}. c{$$} evaluates
to the beginning of the current section; so you can tell how far
into the section you are by using c{($-$$)}.
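A short sketch of both tokens at work. The label c{hang} is invented, and the last line assumes that fewer than 64 bytes have been assembled in the section so far:

```nasm
hang:   jmp $                   ; jumps to its own address: an infinite loop
        dw $-$$                 ; stores this line's offset within the section
        times 64-($-$$) db 0    ; pads the section so far out to 64 bytes
```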
The arithmetic i{operators} provided by NASM are listed here, in
increasing order of i{precedence}.
S{expor} ic{|}: i{Bitwise OR} Operator
The c{|} operator gives a bitwise OR, exactly as performed by the
c{OR} machine instruction. Bitwise OR is the lowest-priority
arithmetic operator supported by NASM.
S{expxor} ic{^}: i{Bitwise XOR} Operator
c{^} provides the bitwise XOR operation.
S{expand} ic{&}: i{Bitwise AND} Operator
c{&} provides the bitwise AND operation.
S{expshift} ic{<<} and ic{>>}: i{Bit Shift} Operators
c{<<} gives a bit-shift to the left, just as it does in C. So c{5<<3}
evaluates to 5 times 8, or 40. c{>>} gives a bit-shift to the
right; in NASM, such a shift is e{always} unsigned, so that
the bits shifted in from the left-hand end are filled with zero
rather than a sign-extension of the previous highest bit.
S{expplmi} I{+ opaddition}c{+} and I{- opsubtraction}c{-}:
i{Addition} and i{Subtraction} Operators
The c{+} and c{-} operators do perfectly ordinary addition and
subtraction.
S{expmul} ic{*}, ic{/}, ic{//}, ic{%} and ic{%%}:
i{Multiplication} and i{Division}
c{*} is the multiplication operator. c{/} and c{//} are both
division operators: c{/} is i{unsigned division} and c{//} is
i{signed division}. Similarly, c{%} and c{%%} provide I{unsigned
modulo}I{modulo operators}unsigned and
i{signed modulo} operators respectively.
NASM, like ANSI C, provides no guarantees about the sensible
operation of the signed modulo operator.
Since the c{%} character is used extensively by the macro
i{preprocessor}, you should ensure that both the signed and unsigned
modulo operators are followed by white space wherever they appear.
S{expmul} i{Unary Operators}: I{+ opunary}c{+}, I{- opunary}c{-},
ic{~} and ic{SEG}
The highest-priority operators in NASM's expression grammar are
those which only apply to one argument. c{-} negates its operand,
c{+} does nothing (it's provided for symmetry with c{-}), c{~}
computes the i{one's complement} of its operand, and c{SEG}
provides the i{segment address} of its operand (explained in more
detail in k{segwrt}).
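The precedence ordering above can be checked with a few small expressions. This is an illustrative sketch; each comment gives the grouping implied by the ordering and the resulting byte:

```nasm
        db 1 | 2 ^ 3 & 4        ; parsed as 1 | (2 ^ (3 & 4)): 3
        db 1 << 2 + 1           ; + binds tighter than <<: 1 << 3 = 8
        db 2 + 3 * 4            ; * binds tighter than +: 14
        db 17 / 5               ; unsigned division: 3
        db 17 % 5               ; unsigned modulo: 2 (note the space after %)
        db ~0x0F & 0xFF         ; unary ~ is applied first: 0xF0
        db -(2 + 3) + 10        ; unary minus on a parenthesised value: 5
```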
H{segwrt} ic{SEG} and ic{WRT}
When writing large 16-bit programs, which must be split into
multiple i{segments}, it is often necessary to be able to refer to
the I{segment address}segment part of the address of a symbol. NASM
supports the c{SEG} operator to perform this function.
The c{SEG} operator returns the ie{preferred} segment base of a
symbol, defined as the segment base relative to which the offset of
the symbol makes sense. So the code
c mov ax,seg symbol
c mov es,ax
c mov bx,symbol
will load c{ES:BX} with a valid pointer to the symbol c{symbol}.
Things can be more complex than this: since 16-bit segments and
i{groups} may I{overlapping segments}overlap, you might occasionally
want to refer to some symbol using a different segment base from the
preferred one. NASM lets you do this, by the use of the c{WRT}
(With Reference To) keyword. So you can do things like
c mov ax,weird_seg ; weird_seg is a segment base
c mov es,ax
c mov bx,symbol wrt weird_seg
to load c{ES:BX} with a different, but functionally equivalent,
pointer to the symbol c{symbol}.
NASM supports far (inter-segment) calls and jumps by means of the
syntax c{call segment:offset}, where c{segment} and c{offset}
both represent immediate values. So to call a far procedure, you
could code either of
c call (seg procedure):procedure
c call weird_seg:(procedure wrt weird_seg)
(The parentheses are included for clarity, to show the intended
parsing of the above instructions. They are not necessary in
practice.)
NASM supports the syntax Ic{CALL FAR}c{call far procedure} as a
synonym for the first of the above usages. c{JMP} works identically
to c{CALL} in these examples.
To declare a i{far pointer} to a data item in a data segment, you
must code
c dw symbol, seg symbol
NASM supports no convenient synonym for this, though you can always
invent one using the macro processor.
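One possible such macro is sketched here. The name c{farptr} is invented and is not part of NASM, c{symbol} is assumed to be defined elsewhere, and multi-line macros are described in k{mlmacro}:

```nasm
; hypothetical helper: emits the offset word first, then the segment word
%macro farptr 1
        dw %1, seg %1
%endmacro

ptr_to_data: farptr symbol      ; expands to: dw symbol, seg symbol
```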
H{crit} i{Critical Expressions}
A limitation of NASM is that it is a i{two-pass assembler}; unlike
TASM and others, it will always do exactly two I{passes}i{assembly
passes}. Therefore it is unable to cope with source files that are
complex enough to require three or more passes.
The first pass is used to determine the size of all the assembled
code and data, so that the second pass, when generating all the
code, knows all the symbol addresses the code refers to. So one
thing NASM can't handle is code whose size depends on the value of a
symbol declared after the code in question. For example,
c times (label-$) db 0
c label: db 'Where am I?'
The argument to ic{TIMES} in this case could equally legally
evaluate to anything at all; NASM will reject this example because
it cannot tell the size of the c{TIMES} line when it first sees it.
It will just as firmly reject the slightly I{paradox}paradoxical
code
c times (label-$+1) db 0
c label: db 'NOW where am I?'
in which e{any} value for the c{TIMES} argument is by definition
wrong!
NASM rejects these examples by means of a concept called a
e{critical expression}, which is defined to be an expression whose
value is required to be computable in the first pass, and which must
therefore depend only on symbols defined before it. The argument to
the c{TIMES} prefix is a critical expression; for the same reason,
the arguments to the ic{RESB} family of pseudo-instructions are
also critical expressions.
Critical expressions can crop up in other contexts as well: consider
the following code.
c mov ax,symbol1
c symbol1 equ symbol2
c symbol2:
On the first pass, NASM cannot determine the value of c{symbol1},
because c{symbol1} is defined to be equal to c{symbol2} which NASM
hasn't seen yet. On the second pass, therefore, when it encounters
the line c{mov ax,symbol1}, it is unable to generate the code for
it because it still doesn't know the value of c{symbol1}. On the
next line, it would see the ic{EQU} again and be able to determine
the value of c{symbol1}, but by then it would be too late.
NASM avoids this problem by defining the right-hand side of an
c{EQU} statement to be a critical expression, so the definition of
c{symbol1} would be rejected in the first pass.
There is a related issue involving i{forward references}: consider
this code fragment.
c mov eax,[ebx+offset]
c offset equ 10
NASM, on pass one, must calculate the size of the instruction c{mov
eax,[ebx+offset]} without knowing the value of c{offset}. It has no
way of knowing that c{offset} is small enough to fit into a
one-byte offset field and that it could therefore get away with
generating a shorter form of the i{effective-address} encoding; for
all it knows, in pass one, c{offset} could be a symbol in the code
segment, and it might need the full four-byte form. So it is forced
to compute the size of the instruction to accommodate a four-byte
address part. In pass two, having made this decision, it is now
forced to honour it and keep the instruction large, so the code
generated in this case is not as small as it could have been. This
problem can be solved by defining c{offset} before using it, or by
forcing byte size in the effective address by coding c{[byte
ebx+offset]}.
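Both remedies can be sketched side by side. The second constant, c{later}, is a hypothetical name added for the comparison:

```nasm
offset  equ 10                  ; defined first, so pass one knows it is small
        mov eax,[ebx+offset]    ; can use the one-byte offset form

        mov eax,[byte ebx+later] ; forward reference, byte size forced by hand
later   equ 10                  ; only defined after the instruction using it
```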
H{locallab} i{Local Labels}
NASM gives special treatment to symbols beginning with a i{period}.
A label beginning with a single period is treated as a e{local}
label, which means that it is associated with the previous non-local
label. So, for example:
c label1 ; some code
c .loop ; some more code
c jne .loop
c ret
c label2 ; some code
c .loop ; some more code
c jne .loop
c ret
In the above code fragment, each c{JNE} instruction jumps to the
line immediately before it, because the two definitions of c{.loop}
are kept separate by virtue of each being associated with the
previous non-local label.
This form of local label handling is borrowed from the old Amiga
assembler i{DevPac}; however, NASM goes one step further, in
allowing access to local labels from other parts of the code. This
is achieved by means of e{defining} a local label in terms of the
previous non-local label: the first definition of c{.loop} above is
really defining a symbol called c{label1.loop}, and the second
defines a symbol called c{label2.loop}. So, if you really needed
to, you could write
c label3 ; some more code
c ; and some more
c jmp label1.loop
Sometimes it is useful - in a macro, for instance - to be able to
define a label which can be referenced from anywhere but which
doesn't interfere with the normal local-label mechanism. Such a
label can't be non-local because it would interfere with subsequent
definitions of, and references to, local labels; and it can't be
local because the macro that defined it wouldn't know the label's
full name. NASM therefore introduces a third type of label, which is
probably only useful in macro definitions: if a label begins with
the I{label prefix}special prefix ic{..@}, then it does nothing
to the local label mechanism. So you could code
c label1: ; a non-local label
c .local: ; this is really label1.local
c ..@foo: ; this is a special symbol
c label2: ; another non-local label
c .local: ; this is really label2.local
c jmp ..@foo ; this will jump three lines up
NASM has the capacity to define other special symbols beginning with
a double period: for example, c{..start} is used to specify the
entry point in the c{obj} output format (see k{dotdotstart}).
C{preproc} The NASM i{Preprocessor}
NASM contains a powerful i{macro processor}, which supports
conditional assembly, multi-level file inclusion, two forms of macro
(single-line and multi-line), and a `context stack' mechanism for
extra macro power. Preprocessor directives all begin with a c{%}
sign.
H{slmacro} i{Single-Line Macros}
S{define} The Normal Way: Ic{%idefine}ic{%define}
Single-line macros are defined using the c{%define} preprocessor
directive. The definitions work in a similar way to C; so you can do
things like
c %define ctrl 0x1F &
c %define param(a,b) ((a)+(a)*(b))
c mov byte [param(2,ebx)], ctrl 'D'
which will expand to
c mov byte [(2)+(2)*(ebx)], 0x1F & 'D'
When the expansion of a single-line macro contains tokens which
invoke another macro, the expansion is performed at invocation time,
not at definition time. Thus the code
c %define a(x) 1+b(x)
c %define b(x) 2*x
c mov ax,a(8)
will evaluate in the expected way to c{mov ax,1+2*8}, even though
the macro c{b} wasn't defined at the time of definition of c{a}.
Macros defined with c{%define} are i{case sensitive}: after
c{%define foo bar}, only c{foo} will expand to c{bar}: c{Foo} or
c{FOO} will not. By using c{%idefine} instead of c{%define} (the
`i' stands for `insensitive') you can define all the case variants
of a macro at once, so that c{%idefine foo bar} would cause
c{foo}, c{Foo}, c{FOO}, c{fOO} and so on all to expand to
c{bar}.
There is a mechanism which detects when a macro call has occurred as
a result of a previous expansion of the same macro, to guard against
i{circular references} and infinite loops. If this happens, the
preprocessor will only expand the first occurrence of the macro.
Hence, if you code
c %define a(x) 1+a(x)
c mov ax,a(3)
the macro c{a(3)} will expand once, becoming c{1+a(3)}, and will
then expand no further. This behaviour can be useful: see k{32c}
for an example of its use.
You can I{overloading, single-line macros}overload single-line
macros: if you write
c %define foo(x) 1+x
c %define foo(x,y) 1+x*y
the preprocessor will be able to handle both types of macro call,
by counting the parameters you pass; so c{foo(3)} will become
c{1+3} whereas c{foo(ebx,2)} will become c{1+ebx*2}. However, if
you define
c %define foo bar
then no other definition of c{foo} will be accepted: a macro with
no parameters prohibits the definition of the same name as a macro
e{with} parameters, and vice versa.
This doesn't prevent single-line macros being e{redefined}: you can
perfectly well define a macro with
c %define foo bar
and then re-define it later in the same source file with
c %define foo baz
Then everywhere the macro c{foo} is invoked, it will be expanded
according to the most recent definition. This is particularly useful
when defining single-line macros with c{%assign} (see k{assign}).
You can i{pre-define} single-line macros using the `-d' option on
the NASM command line: see k{opt-d}.
S{undef} Undefining macros: ic{%undef}
Single-line macros can be removed with the c{%undef} command. For
example, the following sequence:
c %define foo bar
c %undef foo
c mov eax, foo
will expand to the instruction c{mov eax, foo}, since after
c{%undef} the macro c{foo} is no longer defined.
Macros that would otherwise be pre-defined can be undefined using the
`-u' option on the NASM command line: see k{opt-u}.
S{assign} i{Preprocessor Variables}: ic{%assign}
An alternative way to define single-line macros is by means of the
c{%assign} command (and its i{case sensitive}case-insensitive
counterpart ic{%iassign}, which differs from c{%assign} in
exactly the same way that c{%idefine} differs from c{%define}).
c{%assign} is used to define single-line macros which take no
parameters and have a numeric value. This value can be specified in
the form of an expression, and it will be evaluated once, when the
c{%assign} directive is processed.
Like c{%define}, macros defined using c{%assign} can be re-defined
later, so you can do things like
c %assign i i+1
to increment the numeric value of a macro.
c{%assign} is useful for controlling the termination of c{%rep}
preprocessor loops: see k{rep} for an example of this. Another
use for c{%assign} is given in k{16c} and k{32c}.
The expression passed to c{%assign} is a i{critical expression}
(see k{crit}), and must also evaluate to a pure number (rather than
a relocatable reference such as a code or data address, or anything
involving a register).
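A minimal sketch of c{%assign} used as a loop counter, assuming the c{%rep} directive described in k{rep}. The macro name c{i} follows the example above:

```nasm
; i starts out as the number 0
%assign i 0
%rep 8
        db i*i                  ; emits 0, 1, 4, 9, 16, 25, 36, 49 in turn
; the expression is evaluated now, so i is redefined as a plain number
%assign i i+1
%endrep
```

Because c{%assign} evaluates its expression immediately, c{i} holds a number after each pass. A c{%define i i+1} would instead refer to itself and, as described in k{define}, expand only once.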
H{mlmacro} i{Multi-Line Macros}: Ic{%imacro}ic{%macro}
Multi-line macros are much more like the type of macro seen in MASM
and TASM: a multi-line macro definition in NASM looks something like
this.
c %macro prologue 1
c push ebp
c mov ebp,esp
c sub esp,%1
c %endmacro
This defines a C-like function prologue as a macro: so you would
invoke the macro with a call such as
c myfunc: prologue 12
which would expand to the three lines of code
c myfunc: push ebp
c mov ebp,esp
c sub esp,12
The number c{1} after the macro name in the c{%macro} line defines
the number of parameters the macro c{prologue} expects to receive.
The use of c{%1} inside the macro definition refers to the first
parameter to the macro call. With a macro taking more than one
parameter, subsequent parameters would be referred to as c{%2},
c{%3} and so on.
Multi-line macros, like single-line macros, are i{case-sensitive},
unless you define them using the alternative directive c{%imacro}.
If you need to pass a comma as e{part} of a parameter to a
multi-line macro, you can do that by enclosing the entire parameter
in I{braces, around macro parameters}braces. So you could code
things like
c %macro silly 2
c %2: db %1
c %endmacro
c silly 'a', letter_a ; letter_a: db 'a'
c silly 'ab', string_ab ; string_ab: db 'ab'
c silly {13,10}, crlf ; crlf: db 13,10
S{mlmacover} i{Overloading Multi-Line Macros}
As with single-line macros, multi-line macros can be overloaded by
defining the same macro name several times with different numbers of
parameters. This time, no exception is made for macros with no
parameters at all. So you could define
c %macro prologue 0
c push ebp
c mov ebp,esp
c %endmacro
to define an alternative form of the function prologue which
allocates no local stack space.
Sometimes, however, you might want to `overload' a machine
instruction; for example, you might want to define
c %macro push 2
c push %1
c push %2
c %endmacro
so that you could code
c push ebx ; this line is not a macro call
c push eax,ecx ; but this one is
Ordinarily, NASM will give a warning for the first of the above two
lines, since c{push} is now defined to be a macro, and is being
invoked with a number of parameters for which no definition has been
given. The correct code will still be generated, but the assembler
will give a warning. This warning can be disabled by the use of the
c{-w-macro-params} command-line option (see k{opt-w}).
S{maclocal} i{Macro-Local Labels}
NASM allows you to define labels within a multi-line macro
definition in such a way as to make them local to the macro call: so
calling the same macro multiple times will use a different label
each time. You do this by prefixing ic{%%} to the label name. So
you can invent an instruction which executes a c{RET} if the c{Z}
flag is set by doing this:
c %macro retz 0
c jnz %%skip
c ret
c %%skip:
c %endmacro
You can call this macro as many times as you want, and every time
you call it NASM will make up a different `real' name to substitute
for the label c{%%skip}. The names NASM invents are of the form
c{..@2345.skip}, where the number 2345 changes with every macro
call. The ic{..@} prefix prevents macro-local labels from
interfering with the local label mechanism, as described in
k{locallab}. You should avoid defining your own labels in this form
(the c{..@} prefix, then a number, then another period) in case
they interfere with macro-local labels.
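As an illustration, two successive calls to c{retz} might expand
along the following lines. The numbers 2345 and 2346 are made up for
this sketch; NASM chooses the actual values itself.
c retz ; first call, expanding to:
c ; jnz ..@2345.skip
c ; ret
c ; ..@2345.skip:
c retz ; second call, expanding to:
c ; jnz ..@2346.skip
c ; ret
c ; ..@2346.skip: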
S{mlmacgre} i{Greedy Macro Parameters}
Occasionally it is useful to define a macro which lumps its entire
command line into one parameter definition, possibly after
extracting one or two smaller parameters from the front. An example
might be a macro to write a text string to a file in MS-DOS, where
you might want to be able to write
c writefile [filehandle],"hello, world",13,10
NASM allows you to define the last parameter of a macro to be
e{greedy}, meaning that if you invoke the macro with more
parameters than it expects, all the spare parameters get lumped into
the last defined one along with the separating commas. So if you
code:
c %macro writefile 2+
c jmp %%endstr
c %%str: db %2
c %%endstr: mov dx,%%str
c mov cx,%%endstr-%%str
c mov bx,%1
c mov ah,0x40
c int 0x21
c %endmacro
then the example call to c{writefile} above will work as expected:
the text before the first comma, c{[filehandle]}, is used as the
first macro parameter and expanded when c{%1} is referred to, and
all the subsequent text is lumped into c{%2} and placed after the
c{db}.
The greedy nature of the macro is indicated to NASM by the use of
the I{+ modifier}c{+} sign after the parameter count on the
c{%macro} line.
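To make the substitution concrete, the example call
c{writefile [filehandle],"hello, world",13,10} would expand roughly
as follows. The labels are shown in the c{..@} form described in
k{maclocal}, and the number 17 is invented for this sketch.
c jmp ..@17.endstr
c ..@17.str: db "hello, world",13,10 ; %2, commas included
c ..@17.endstr: mov dx,..@17.str
c mov cx,..@17.endstr-..@17.str
c mov bx,[filehandle] ; %1
c mov ah,0x40
c int 0x21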
If you define a greedy macro, you are effectively telling NASM how
it should expand the macro given e{any} number of parameters from
the actual number specified up to infinity; in this case, for
example, NASM now knows what to do when it sees a call to
c{writefile} with 2, 3, 4 or more parameters. NASM will take this
into account when overloading macros, and will not allow you to
define another form of c{writefile} taking 4 parameters (for
example).
Of course, the above macro could have been implemented as a
non-greedy macro, in which case the call to it would have had to
look like
c writefile [filehandle], {"hello, world",13,10}
NASM provides both mechanisms for putting i{commas in macro
parameters}, and you choose which one you prefer for each macro
definition.
See k{sectmac} for a better way to write the above macro.
S{mlmacdef} i{Default Macro Parameters}
NASM also allows you to define a multi-line macro with a e{range}
of allowable parameter counts. If you do this, you can specify
defaults for i{omitted parameters}. So, for example:
c %macro die 0-1 "Painful program death has occurred."
c writefile 2,%1
c mov ax,0x4c01
c int 0x21
c %endmacro
This macro (which makes use of the c{writefile} macro defined in
k{mlmacgre}) can be called with an explicit error message, which it
will display on the error output stream before exiting, or it can be
called with no parameters, in which case it will use the default
error message supplied in the macro definition.
In general, you supply a minimum and maximum number of parameters
for a macro of this type; the minimum number of parameters is then
required in the macro call, and you provide defaults for the
optional ones. So if a macro definition began with the line
c %macro foobar 1-3 eax,[ebx+2]
then it could be called with between one and three parameters, and
c{%1} would always be taken from the macro call. c{%2}, if not
specified by the macro call, would default to c{eax}, and c{%3} if
not specified would default to c{[ebx+2]}.
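The following sketch shows what each parameter would hold for the
three permitted call forms of c{foobar}. The arguments c{ecx},
c{edx} and c{esi} are arbitrary choices made for illustration.
c foobar ecx ; %1 = ecx, %2 = eax, %3 = [ebx+2]
c foobar ecx,edx ; %1 = ecx, %2 = edx, %3 = [ebx+2]
c foobar ecx,edx,esi ; %1 = ecx, %2 = edx, %3 = esi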
You may omit parameter defaults from the macro definition, in which
case the parameter default is taken to be blank. This can be useful
for macros which can take a variable number of parameters, since the
ic{%0} token (see k{percent0}) allows you to determine how many
parameters were really passed to the macro call.
This defaulting mechanism can be combined with the greedy-parameter
mechanism; so the c{die} macro above could be made more powerful,
and more useful, by changing the first line of the definition to
c %macro die 0-1+ "Painful program death has occurred.",13,10
The maximum parameter count can be infinite, denoted by c{*}. In
this case, of course, it is impossible to provide a e{full} set of
default parameters. Examples of this usage are shown in k{rotate}.
S{percent0} ic{%0}: I{counting macro parameters}Macro Parameter Counter
For a macro which can take a variable number of parameters, the
parameter reference c{%0} will return a numeric constant giving the
number of parameters passed to the macro. This can be used as an
argument to c{%rep} (see k{rep}) in order to iterate through all
the parameters of a macro. Examples are given in k{rotate}.
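A minimal sketch of c{%0} on its own, using a hypothetical macro
c{argcount} which is not part of NASM, stores the number of
parameters it was given as a byte:
c %macro argcount 1-*
c db %0 ; number of parameters in this call
c %endmacro
c argcount eax ; assembles db 1
c argcount eax,ebx,ecx ; assembles db 3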
S{rotate} ic{%rotate}: i{Rotating Macro Parameters}
Unix shell programmers will be familiar with the I{shift
command}c{shift} shell command, which allows the arguments passed
to a shell script (referenced as c{$1}, c{$2} and so on) to be
moved left by one place, so that the argument previously referenced
as c{$2} becomes available as c{$1}, and the argument previously
referenced as c{$1} is no longer available at all.
NASM provides a similar mechanism, in the form of c{%rotate}. As
its name suggests, it differs from the Unix c{shift} in that no
parameters are lost: parameters rotated off the left end of the
argument list reappear on the right, and vice versa.
c{%rotate} is invoked with a single numeric argument (which may be
an expression). The macro parameters are rotated to the left by that
many places. If the argument to c{%rotate} is negative, the macro
parameters are rotated to the right.
I{iterating over macro parameters}So a pair of macros to save and
restore a set of registers might work as follows:
c %macro multipush 1-*
c %rep %0
c push %1
c %rotate 1
c %endrep
c %endmacro
This macro invokes the c{PUSH} instruction on each of its arguments
in turn, from left to right. It begins by pushing its first
argument, c{%1}, then invokes c{%rotate} to move all the arguments
one place to the left, so that the original second argument is now
available as c{%1}. Repeating this procedure as many times as there
were arguments (achieved by supplying c{%0} as the argument to
c{%rep}) causes each argument in turn to be pushed.
Note also the use of c{*} as the maximum parameter count,
indicating that there is no upper limit on the number of parameters
you may supply to the ic{multipush} macro.
It would be convenient, when using this macro, to have a c{POP}
equivalent, which e{didn't} require the arguments to be given in
reverse order. Ideally, you would write the c{multipush} macro
call, then cut-and-paste the line to where the pop needed to be
done, and change the name of the called macro to c{multipop}, and
the macro would take care of popping the registers in the opposite
order from the one in which they were pushed.
This can be done by the following definition:
c %macro multipop 1-*
c %rep %0
c %rotate -1
c pop %1
c %endrep
c %endmacro
This macro begins by rotating its arguments one place to the
e{right}, so that the original e{last} argument appears as c{%1}.
This is then popped, and the arguments are rotated right again, so
the second-to-last argument becomes c{%1}. Thus the arguments are
iterated through in reverse order.
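Used as a pair, the two macros would then behave as sketched below.
The register list is an arbitrary example.
c multipush eax,ebx,ecx ; push eax / push ebx / push ecx
c ; ... code which is free to alter the registers ...
c multipop eax,ebx,ecx ; pop ecx / pop ebx / pop eax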
S{concat} i{Concatenating Macro Parameters}
NASM can concatenate macro parameters on to other text surrounding
them. This allows you to declare a family of symbols, for example,
in a macro definition. If, for example, you wanted to generate a
table of key codes along with offsets into the table, you could code
something like
c %macro keytab_entry 2
c keypos%1 equ $-keytab
c db %2
c %endmacro
c keytab:
c keytab_entry F1,128+1
c keytab_entry F2,128+2
c keytab_entry Return,13
which would expand to
c keytab:
c keyposF1 equ $-keytab
c db 128+1
c keyposF2 equ $-keytab
c db 128+2
c keyposReturn equ $-keytab
c db 13
You can just as easily concatenate text on to the other end of a
macro parameter, by writing c{%1foo}.
If you need to append a e{digit} to a macro parameter, for example
defining labels c{foo1} and c{foo2} when passed the parameter
c{foo}, you can't code c{%11} because that would be taken as the
eleventh macro parameter. Instead, you must code
I{braces, after % sign}c{%{1}1}, which will separate the first
c{1} (giving the number of the macro parameter) from the second
(literal text to be concatenated to the parameter).
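For instance, a hypothetical macro c{twobytes} (the name and body
are invented for this sketch) could declare two numbered labels from
a single parameter:
c %macro twobytes 1
c %{1}1: db 0 ; parameter 1 followed by the digit 1
c %{1}2: db 0 ; parameter 1 followed by the digit 2
c %endmacro
c twobytes foo ; defines foo1: db 0 and foo2: db 0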
This concatenation can also be applied to other preprocessor in-line
objects, such as macro-local labels (k{maclocal}) and context-local
labels (k{ctxlocal}). In all cases, ambiguities in syntax can be
resolved by enclosing everything after the c{%} sign and before the
literal text in braces: so c{%{%foo}bar} concatenates the text
c{bar} to the end of the real name of the macro-local label
c{%%foo}. (This is unnecessary, since the form NASM uses for the
real names of macro-local labels means that the two usages
c{%{%foo}bar} and c{%%foobar} would both expand to the same
thing anyway; nevertheless, the capability is there.)
S{mlmaccc} i{Condition Codes as Macro Parameters}
NASM can give special treatment to a macro parameter which contains
a condition code. For a start, you can refer to the macro parameter
c{%1} by means of the alternative syntax ic{%+1}, which informs
NASM that this macro parameter is supposed to contain a condition
code, and will cause the preprocessor to report an error message if
the macro is called with a parameter which is e{not} a valid
condition code.
Far more usefully, though, you can refer to the macro parameter by
means of ic{%-1}, which NASM will expand as the e{inverse}
condition code. So the c{retz} macro defined in k{maclocal} can be
replaced by a general i{conditional-return macro} like this:
c %macro retc 1
c j%-1 %%skip
c ret
c %%skip:
c %endmacro
This macro can now be invoked using calls like c{retc ne}, which
will cause the conditional-jump instruction in the macro expansion
to come out as c{JE}, or c{retc po} which will make the jump a
c{JPE}.
The c{%+1} macro-parameter reference is quite happy to interpret
the arguments c{CXZ} and c{ECXZ} as valid condition codes;
however, c{%-1} will report an error if passed either of these,
because no inverse condition code exists.
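The calls below sketch how c{retc} would behave for a few
arguments. The numbers in the label names are invented, as in
k{maclocal}.
c retc ne ; je ..@31.skip / ret / ..@31.skip:
c retc po ; jpe ..@32.skip / ret / ..@32.skip:
c retc cxz ; rejected: CXZ has no inverse condition code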
S{nolist} i{Disabling Listing Expansion}Ic{.nolist}
When NASM is generating a listing file from your program, it will
generally expand multi-line macros by means of writing the macro
call and then listing each line of the expansion. This allows you to
see which instructions in the macro expansion are generating what
code; however, for some macros this clutters the listing up
unnecessarily.
NASM therefore provides the c{.nolist} qualifier, which you can
include in a macro definition to inhibit the expansion of the macro
in the listing file. The c{.nolist} qualifier comes directly after
the number of parameters, like this:
c %macro foo 1.nolist
Or like this:
c %macro bar 1-5+.nolist a,b,c,d,e,f,g,h
H{condasm} i{Conditional Assembly}Ic{%if}
Similarly to the C preprocessor, NASM allows sections of a source
file to be assembled only if certain conditions are met. The general
syntax of this feature looks like this:
c %if<condition>
c ; some code which only appears if <condition> is met
c %elif<condition2>
c ; only appears if <condition> is not met but <condition2> is
c %else
c ; this appears if neither <condition> nor <condition2> was met
c %endif
The ic{%else} clause is optional, as is the ic{%elif} clause.
You can have more than one c{%elif} clause as well.
S{ifdef} ic{%ifdef}: i{Testing Single-Line Macro Existence}
Beginning a conditional-assembly block with the line c{%ifdef
MACRO} will assemble the subsequent code if, and only if, a
single-line macro called c{MACRO} is defined. If not, then the
c{%elif} and c{%else} blocks (if any) will be processed instead.
For example, when debugging a program, you might want to write code
such as
c ; perform some function
c %ifdef DEBUG
c writefile 2,"Function performed successfully",13,10
c %endif
c ; go and do something else
Then you could use the command-line option c{-dDEBUG} to create a
version of the program which produced debugging messages, and remove
the option to generate the final release version of the program.
You can test for a macro e{not} being defined by using
ic{%ifndef} instead of c{%ifdef}. You can also test for macro
definitions in c{%elif} blocks by using ic{%elifdef} and
ic{%elifndef}.
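As a sketch, these variants might be combined as follows. c{TRACE}
is a hypothetical second macro name chosen for this example, and
c{writefile} is the macro from k{mlmacgre}.
c %ifdef DEBUG
c writefile 2,"Debug build",13,10
c %elifdef TRACE
c writefile 2,"Trace build",13,10
c %endif
c %ifndef DEBUG
c ; code which only appears when DEBUG is not defined
c %endif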
S{ifctx} ic{%ifctx}: i{Testing the Context Stack}
The conditional-assembly construct c{%ifctx ctxname} will cause the
subsequent code to be assembled if and only if the top context on
the preprocessor's context stack has the name c{ctxname}. As with
c{%ifdef}, the inverse and c{%elif} forms ic{%ifnctx},
ic{%elifctx} and ic{%elifnctx} are also supported.
For more details of the context stack, see k{ctxstack}. For a
sample use of c{%ifctx}, see k{blockif}.
S{if} ic{%if}: i{Testing Arbitrary Numeric Expressions}
The conditional-assembly construct c{%if expr} will cause the
subsequent code to be assembled if and only if the value of the
numeric expression c{expr} is non-zero. An example of the use of
this feature is in deciding when to break out of a c{%rep}
preprocessor loop: see k{rep} for a detailed example.
The expression given to c{%if}, and its counterpart ic{%elif}, is
a critical expression (see k{crit}).
c{%if} extends the normal NASM expression syntax, by providing a
set of i{relational operators} which are not normally available in
expressions. The operators ic{=}, ic{<}, ic{>}, ic{<=},
ic{>=} and ic{<>} test equality, less-than, greater-than,
less-or-equal, greater-or-equal and not-equal respectively. The
C-like forms ic{==} and ic{!=} are supported as alternative
forms of c{=} and c{<>}. In addition, low-priority logical
operators ic{&&}, ic{^^} and ic{||} are provided, supplying
i{logical AND}, i{logical XOR} and i{logical OR}. These work like
the C logical operators (although C has no logical XOR), in that
they always return either 0 or 1, and treat any non-zero input as 1
(so that c{^^}, for example, returns 1 if exactly one of its inputs
is zero, and 0 otherwise). The relational operators also return 1
for true and 0 for false.
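A small sketch of these operators in use follows. The macro
c{BUFSIZE} and the label c{buffer} are hypothetical names chosen
for the example.
c %define BUFSIZE 512
c %if BUFSIZE >= 256 && BUFSIZE <= 4096
c buffer: resb BUFSIZE ; this branch is taken for 512
c %elif BUFSIZE <> 0
c buffer: resb 256
c %else
c %error BUFSIZE must not be zero
c %endif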
S{ifidn} ic{%ifidn} and ic{%ifidni}: i{Testing Exact Text
Identity}
The construct c{%ifidn text1,text2} will cause the subsequent code
to be assembled if and only if c{text1} and c{text2}, after
expanding single-line macros, are identical pieces of text.
Differences in white space are not counted.
c{%ifidni} is similar to c{%ifidn}, but is i{case-insensitive}.
For example, the following macro pushes a register or number on the
stack, and allows you to treat c{IP} as a real register:
c %macro pushparam 1
c %ifidni %1,ip
c call %%label
c %%label:
c %else
c push %1
c %endif
c %endmacro
Like most other c{%if} constructs, c{%ifidn} has a counterpart
ic{%elifidn}, and negative forms ic{%ifnidn} and ic{%elifnidn}.
Similarly, c{%ifidni} has counterparts ic{%elifidni},
ic{%ifnidni} and ic{%elifnidni}.
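Calls to the c{pushparam} macro above would then expand as sketched
here. Because c{%ifidni} ignores case, c{IP} and c{ip} are treated
alike; the number in the label name is invented.
c pushparam eax ; push eax
c pushparam 1234 ; push 1234
c pushparam IP ; call ..@7.label / ..@7.label: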
S{iftyp} ic{%ifid}, ic{%ifnum}, ic{%ifstr}: i{Testing Token
Types}
Some macros will want to perform different tasks depending on
whether they are passed a number, a string, or an identifier. For
example, a string output macro might want to be able to cope with
being passed either a string constant or a pointer to an existing
string.
The conditional assembly construct c{%ifid}, taking one parameter
(which may be blank), assembles the subsequent code if and only if
the first token in the parameter exists and is an identifier.
c{%ifnum} works similarly, but tests for the token being a numeric
constant; c{%ifstr} tests for it being a string.
For example, the c{writefile} macro defined in k{mlmacgre} can be
extended to take advantage of c{%ifstr} in the following fashion:
c %macro writefile 2-3+
c %ifstr %2
c jmp %%endstr
c %if %0 = 3
c %%str: db %2,%3
c %else
c %%str: db %2
c %endif
c %%endstr: mov dx,%%str
c mov cx,%%endstr-%%str
c %else
c mov dx,%2
c mov cx,%3
c %endif
c mov bx,%1
c mov ah,0x40
c int 0x21
c %endmacro
Then the c{writefile} macro can cope with being called in either of
the following two ways:
c writefile [file], strpointer, length
c writefile [file], "hello", 13, 10
In the first, c{strpointer} is used as the address of an
already-declared string, and c{length} is used as its length; in
the second, a string is given to the macro, which therefore declares
it itself and works out the address and length for itself.
Note the use of c{%if} inside the c{%ifstr}: this is to detect
whether the macro was passed two arguments (so the string would be a
single string constant, and c{db %2} would be adequate) or more (in
which case, all but the first two would be lumped together into
c{%3}, and c{db %2,%3} would be required).
Ic{%ifnid}Ic{%elifid}Ic{%elifnid}Ic{%ifnnum}Ic{%elifnum}Ic{%elifnnum}Ic{%ifnstr}Ic{%elifstr}Ic{%elifnstr}
The usual c{%elifXXX}, c{%ifnXXX} and c{%elifnXXX} versions exist
for each of c{%ifid}, c{%ifnum} and c{%ifstr}.
S{pperror} ic{%error}: Reporting i{User-Defined Errors}
The preprocessor directive c{%error} will cause NASM to report an
error if it occurs in assembled code. So if other users are going to
try to assemble your source files, you can ensure that they define
the right macros by means of code like this:
c %ifdef SOME_MACRO
c ; do some setup
c %elifdef SOME_OTHER_MACRO
c ; do some different setup
c %else
c %error Neither SOME_MACRO nor SOME_OTHER_MACRO was defined.
c %endif
Then any user who fails to understand the way your code is supposed
to be assembled will be quickly warned of their mistake, rather than
having to wait until the program crashes on being run and then not
knowing what went wrong.
H{rep} i{Preprocessor Loops}I{repeating code}: ic{%rep}
NASM's c{TIMES} prefix, though useful, cannot be used to invoke a
multi-line macro multiple times, because it is processed by NASM
after macros have already been expanded. Therefore NASM provides
another form of loop, this time at the preprocessor level: c{%rep}.
The directives c{%rep} and ic{%endrep} (c{%rep} takes a numeric
argument, which can be an expression; c{%endrep} takes no
arguments) can be used to enclose a chunk of code, which is then
replicated as many times as specified by the preprocessor:
c %assign i 0
c %rep 64
c inc word [table+2*i]
c %assign i i+1
c %endrep
This will generate a sequence of 64 c{INC} instructions,
incrementing every word of memory from c{[table]} to
c{[table+126]}.
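Since c{i} is redefined by c{%assign} on every pass through the
loop, the lines handed on to the assembler would look roughly like
this sketch:
c inc word [table+2*0]
c inc word [table+2*1]
c inc word [table+2*2]
c ; ... and so on, up to ...
c inc word [table+2*63]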
For more complex termination conditions, or to break out of a repeat
loop part way along, you can use the ic{%exitrep} directive to
terminate the loop, like this:
c fibonacci:
c %assign i 0
c %assign j 1
c %rep 100
c %if j > 65535
c %exitrep
c %endif
c dw j
c %assign k j+i
c %assign i j
c %assign j k
c %endrep
c fib_number equ ($-fibonacci)/2
This produces a list of all the Fibonacci numbers that will fit in
16 bits. Note that a maximum repeat count must still be given to
c{%rep}. This is to prevent the possibility of NASM getting into an
infinite loop in the preprocessor, which (on multitasking or
multi-user systems) would typically cause all the system memory to
be gradually used up and other applications to start crashing.
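Working through the loop by hand suggests that it is equivalent to
the explicit table below: c{%exitrep} takes effect once c{j}
reaches 75025, so 24 words are emitted and c{fib_number} evaluates
to 24.
c fibonacci:
c dw 1, 1, 2, 3, 5, 8, 13, 21
c dw 34, 55, 89, 144, 233, 377, 610, 987
c dw 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368
c fib_number equ ($-fibonacci)/2 ; 24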
H{include} i{Including Other Files}
Using, once again, a very similar syntax to the C preprocessor,
NASM's preprocessor lets you include other source files into your
code. This is done by the use of the ic{%include} directive:
c %include "macros.mac"
will include the contents of the file c{macros.mac} into the source
file containing the c{%include} directive.
Include files are I{searching for include files}searched for in the
current directory (the directory you're in when you run NASM, as
opposed to the location of the NASM executable or the location of
the source file), plus any directories specified on the NASM command
line using the c{-i} option.
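A command line along the following lines would add one extra
directory to the search. The directory c{macros/} and the file name
c{prog.asm} are hypothetical, and the trailing slash is written out
on the assumption that NASM joins the given prefix directly on to
the name of the include file.
c nasm -f bin -imacros/ prog.asm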
The standard C idiom for preventing a file being included more than
once is just as applicable in NASM: if the file c{macros.mac} has
the form
c %ifndef MACROS_MAC
c %define MACROS_MAC
c ; now define some macros
c %endif
then including the file more than once will not cause errors: the
second time the file is included, the macro c{MACROS_MAC} will
already be defined, so nothing will happen.
You can force a file to be included even if there is no c{%include}
directive that explicitly includes it, by using the ic{-p} option
on the NASM command line (see k{opt-p}).
H{ctxstack} The i{Context Stack}
Having labels that are local to a macro definition is sometimes not
quite powerful enough: sometimes you want to be able to share labels
between several macro calls. An example might be a c{REPEAT} ...
c{UNTIL} loop, in which the expansion of the c{REPEAT} macro
would need to be able to refer to a label which the c{UNTIL} macro
had defined. However, for such a macro you would also want to be
able to nest these loops.
NASM provides this level of power by means of a e{context stack}.
The preprocessor maintains a stack of e{contexts}, each of which is
characterised by a name. You add a new context to the stack using
the ic{%push} directive, and remove one using ic{%pop}. You can
define labels that are local to a particular context on the stack.
S{pushpop} ic{%push} and ic{%pop}: I{creating
contexts}I{removing contexts}Creating and Removing Contexts
The c{%push} directive is used to create a new context and place it
on the top of the context stack. c{%push} requires one argument,
which is the name of the context. For example:
c %push foobar
This pushes a new context called c{foobar} on the stack. You can
have several contexts on the stack with the same name: they can
still be distinguished.
The directive c{%pop}, requiring no arguments, removes the top
context from the context stack and destroys it, along with any
labels associated with it.
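The effect on the stack can be sketched as follows; the context
names c{outer} and c{inner} are arbitrary.
c %push outer ; stack: outer
c %push inner ; stack: outer, inner (top)
c %pop ; stack: outer; labels local to inner are gone
c %pop ; stack is empty again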
S{ctxlocal} i{Context-Local Labels}
Just as the usage c{%%foo} defines a label which is local to the
particular macro call in which it is used, the usage I{%$}c{%$foo}
is used to define a label which is local to the context on the top
of the context stack. So the c{REPEAT} and c{UNTIL} example given
above could be implemented by means of:
c %macro repeat 0
c %push repeat
c %$begin:
c %endmacro
c %macro until 1
c j%-1 %$begin
c %pop
c %endmacro
and invoked by means of, for example,
c mov di,string
c repeat
c add di,3
c scasb
c until e
which would scan every fourth byte of a string in search of the byte
in c{AL}.
If you need to define, or access, labels local to the context
e{below} the top one on the stack, you can use I{%$$}c{%$$foo}, or
c{%$$$foo} for the context below that, and so on.
S{ctxdefine} i{Context-Local Single-Line Macros}
NASM also allows you to define single-line macros which are local to
a particular context, in just the same way:
c %define %$localmac 3
will define the single-line macro c{%$localmac} to be local to the
top context on the stack. Of course, after a subsequent c{%push},
it can then still be accessed by the name c{%$$localmac}.
S{ctxrepl} ic{%repl}: I{renaming contexts}Renaming a Context
If you need to change the name of the top context on the stack (in
order, for example, to have it respond differently to c{%ifctx}),
you can execute a c{%pop} followed by a c{%push}; but this will
have the side effect of destroying all context-local labels and
macros associated with the context that was just popped.
NASM provides the directive c{%repl}, which e{replaces} a context
with a different name, without touching the associated macros and
labels. So you could replace the destructive code
c %pop
c %push newname
with the non-destructive version c{%repl newname}.
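A short sketch of the difference, using a hypothetical context-local
macro c{%$depth}:
c %push if ; top context is named if
c %define %$depth 1 ; macro local to that context
c %repl else ; same context, now named else
c ; %$depth is still defined as 1 at this point
c %pop ; context and %$depth are destroyed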
S{blockif} Example Use of the i{Context Stack}: i{Block IFs}
This example makes use of almost all the context-stack features,
including the conditional-assembly construct ic{%ifctx}, to
implement a block IF statement as a set of macros.
c %macro if 1
c %push if
c j%-1 %$ifnot
c %endmacro
c %macro else 0
c %ifctx if
c %repl else
c jmp %$ifend
c %$ifnot:
c %else
c %error "expected `if' before `else'"
c %endif
c %endmacro
c %macro endif 0
c %ifctx if
c %$ifnot:
c %pop
c %elifctx else
c %$ifend:
c %pop
c %else
c %error "expected `if' or `else' before `endif'"
c %endif
c %endmacro
This code is more robust than the c{REPEAT} and c{UNTIL} macros
given in k{ctxlocal}, because it uses conditional assembly to check
that the macros are issued in the right order (for example, not
calling c{endif} before c{if}) and issues a c{%error} if they're
not.
In addition, the c{endif} macro has to be able to cope with the two
distinct cases of either directly following an c{if}, or following
an c{else}. It achieves this, again, by using conditional assembly
to do different things depending on whether the context on top of
the stack is c{if} or c{else}.
The c{else} macro has to preserve the context on the stack, so that
the c{%$ifnot} label it defines is the same one referred to by the
c{if} macro, and the c{%$ifend} label it jumps to is the same one
later defined by c{endif}; but it has to change the context's name
so that c{endif} will know there was an intervening c{else}. It
does this by the use of c{%repl}.
A sample usage of these macros might look like:
c cmp ax,bx
c if ae
c cmp bx,cx
c if ae
c mov ax,cx
c else
c mov ax,bx
c endif
c else
c cmp ax,cx
c if ae
c mov ax,cx
c endif
c endif
The block-c{IF} macros handle nesting quite happily, by means of
pushing another context, describing the inner c{if}, on top of the
one describing the outer c{if}; thus c{else} and c{endif} always
refer to the last unmatched c{if} or c{else}.
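As a rough trace of the simplest case, an c{if} followed directly
by c{endif}, the macros would act as annotated below. The inverse
of c{ae} is written here as c{JNAE}, which is the same instruction
as c{JB}.
c cmp ax,bx
c if ae ; %push if, then jnae %$ifnot
c mov ax,bx ; reached only when ax >= bx (unsigned)
c endif ; top context is if: defines %$ifnot:, then %pop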
H{stdmac} i{Standard Macros}
NASM defines a set of standard macros, which are already defined
when it starts to process any source file. If you really need a
program to be assembled with no pre-defined macros, you can use the
ic{%clear} directive to empty the preprocessor of everything.
Most i{user-level assembler directives} are implemented as macros
which invoke primitive directives; these are described in
k{directive}. The rest of the standard macro set is described here.
S{stdmacver} ic{__NASM_MAJOR__} and ic{__NASM_MINOR__}: i{NASM
Version}
The single-line macros c{__NASM_MAJOR__} and c{__NASM_MINOR__}
expand to the major and minor parts of the i{version number of
NASM} being used. So, under NASM 0.96 for example,
c{__NASM_MAJOR__} would be defined to be 0 and c{__NASM_MINOR__}
would be defined as 96.
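These macros can be tested with c{%if} (see k{if}). The sketch
below assumes, purely for the sake of example, a source file which
relies on features first present in NASM 0.96:
c %if __NASM_MAJOR__ = 0 && __NASM_MINOR__ < 96
c %error This source file needs NASM 0.96 or later.
c %endif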
S{fileline} ic{__FILE__} and ic{__LINE__}: File Name and Line Number
Like the C preprocessor, NASM allows the user to find out the file
name and line number containing the current instruction. The macro
c{__FILE__} expands to a string constant giving the name of the
current input file (which may change through the course of assembly
if c{%include} directives are used), and c{__LINE__} expands to a
numeric constant giving the current line number in the input file.
These macros could be used, for example, to communicate debugging
information to a macro, since invoking c{__LINE__} inside a macro
definition (either single-line or multi-line) will return the line
number of the macro e{call}, rather than e{definition}. So to
determine where in a piece of code a crash is occurring, for
example, one could write a routine c{stillhere}, which is passed a
line number in c{EAX} and outputs something like `line 155: still
here'. You could then write a macro
c %macro notdeadyet 0
c push eax
c mov eax,__LINE__
c call stillhere
c pop eax
c %endmacro
and then pepper your code with calls to c{notdeadyet} until you
find the crash point.
S{struc} ic{STRUC} and ic{ENDSTRUC}: i{Declaring Structure} Data Types
The core of NASM contains no intrinsic means of defining data
structures; instead, the preprocessor is sufficiently powerful that
data structures can be implemented as a set of macros. The macros
c{STRUC} and c{ENDSTRUC} are used to define a structure data type.
c{STRUC} takes one parameter, which is the name of the data type.
The name itself is defined as a symbol with the value zero, and the
same name with the suffix c{_size} appended is defined as an
c{EQU} giving the size of the structure. Once c{STRUC} has been
issued, you are defining the structure, and should define fields
using the c{RESB} family of pseudo-instructions, and then invoke
c{ENDSTRUC} to finish the definition.
For example, to define a structure called c{mytype} containing a
longword, a word, a byte and a string of bytes, you might code
c struc mytype
c mt_long: resd 1
c mt_word: resw 1
c mt_byte: resb 1
c mt_str: resb 32
c endstruc
The above code defines six symbols: c{mt_long} as 0 (the offset
from the beginning of a c{mytype} structure to the longword field),
c{mt_word} as 4, c{mt_byte} as 6, c{mt_str} as 7, c{mytype_size}
as 39, and c{mytype} itself as zero.
The reason why the structure type name is defined at zero is a side
effect of allowing structures to work with the local label
mechanism: if your structure members tend to have the same names in
more than one structure, you can define the above structure like this:
c struc mytype
c .long: resd 1
c .word: resw 1
c .byte: resb 1
c .str: resb 32
c endstruc
This defines the offsets to the structure fields as c{mytype.long},
c{mytype.word}, c{mytype.byte} and c{mytype.str}.
NASM, since it has no e{intrinsic} structure support, does not
support any form of period notation to refer to the elements of a
structure once you have one (except the above local-label notation),
so code such as c{mov ax,[mystruc.mt_word]} is not valid.
c{mt_word} is a constant just like any other constant, so the
correct syntax is c{mov ax,[mystruc+mt_word]} or c{mov
ax,[mystruc+mytype.word]}.
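As a sketch of how these constants fit together, the fragment below
reserves room for one c{mytype} and then reads three of its fields.
The label c{mybuf} is hypothetical; the field names are those of the
first definition of c{mytype} above.
c section .bss
c mybuf: resb mytype_size ; room for one mytype (39 bytes)
c section .text
c mov eax,[mybuf+mt_long] ; longword field, offset 0
c mov ax,[mybuf+mt_word] ; word field, offset 4
c mov al,[mybuf+mt_byte] ; byte field, offset 6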
S{istruc} ic{ISTRUC}, ic{AT} and ic{IEND}: Declaring
i{Instances of Structures}
Having defined a structure type, the next thing you typically want
to do is to declare instances of that structure in your data
segment. NASM provides an easy way to do this in the c{ISTRUC}
mechanism. To declare a structure of type c{mytype} in a program,
you code something like this:
c mystruc: istruc mytype
c at mt_long, dd 123456
c at mt_word, dw 1024
c at mt_byte, db 'x'
c at mt_str, db 'hello, world', 13, 10, 0
c iend
The function of the c{AT} macro is to make use of the c{TIMES}
prefix to advance the assembly position to the correct point for the
specified structure field, and then to declare the specified data.
Therefore the structure fields must be declared in the same order as
they were specified in the structure definition.
If the data to go in a structure field requires more than one source
line to specify, the remaining source lines can easily come after
the c{AT} line. For example:
c at mt_str, db 123,134,145,156,167,178,189
c db 190,100,0
Depending on personal taste, you can also omit the code part of the
c{AT} line completely, and start the structure field on the next
line:
c at mt_str
c db 'hello, world'
c db 13,10,0
S{align} ic{ALIGN} and ic{ALIGNB}: Data Alignment
The c{ALIGN} and c{ALIGNB} macros provide a convenient way to
align code or data on a word, longword, paragraph or other boundary.
(Some assemblers call this directive ic{EVEN}.) The syntax of the
c{ALIGN} and c{ALIGNB} macros is
c align 4 ; align on 4-byte boundary
c align 16 ; align on 16-byte boundary
c align 8,db 0 ; pad with 0s rather than NOPs
c align 4,resb 1 ; align to 4 in the BSS
c alignb 4 ; equivalent to previous line
Both macros require their first argument to be a power of two; they
both compute the number of additional bytes required to bring the
length of the current section up to a multiple of that power of two,
and then apply the c{TIMES} prefix to their second argument to
perform the alignment.
If the second argument is not specified, the default for c{ALIGN}
is c{NOP}, and the default for c{ALIGNB} is c{RESB 1}. So if the
second argument is specified, the two macros are equivalent.
Normally, you can just use c{ALIGN} in code and data sections and
c{ALIGNB} in BSS sections, and never need the second argument
except for special purposes.
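The padding arithmetic can be sketched as follows, assuming the
fragment starts at the very beginning of its section:
c db 1,2,3,4,5 ; section is now 5 bytes long
c align 4 ; pads with 3 NOPs, up to offset 8
c dd 0x12345678 ; this longword starts at offset 8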
c{ALIGN} and c{ALIGNB}, being simple macros, perform no error
checking: they cannot warn you if their first argument fails to be a
power of two, or if their second argument generates more than one
byte of code. In each of these cases they will silently do the wrong
thing.
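The padding arithmetic described above can be modelled outside the
assembler. The following Python sketch is an illustration only: the
function name c{align_padding} is invented here, and the masking
formula is an assumption about how such a macro could compute the
count, not a quotation of NASM's macro definition. It shows the count
for power-of-two boundaries, and how a boundary that is not a power of
two silently produces a wrong position.

```python
def align_padding(position, boundary):
    """Bytes needed to move `position` (an offset from the start of the
    section) up to the next multiple of `boundary`.

    Assumed formula: mask the negated offset with boundary - 1. This is
    only correct when `boundary` is a power of two.
    """
    return (-position) & (boundary - 1)


# Power-of-two boundaries behave as expected.
print(align_padding(5, 4))    # 3 bytes of padding, ending at offset 8
print(align_padding(16, 16))  # already aligned, so 0

# A boundary of 6 is not a power of two: no error is raised, but
# offset 3 is padded by 5 to offset 8, which is not a multiple of 6.
print(align_padding(3, 6))    # 5
```

A checked version would reject any boundary for which
c{boundary & (boundary - 1)} is non-zero; the sketch deliberately
omits that check to mirror the behaviour described above.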
c{ALIGNB} (or c{ALIGN} with a second argument of c{RESB 1}) can
be used within structure definitions:
c struc mytype2
c mt_byte: resb 1
c alignb 2
c mt_word: resw 1
c alignb 4
c mt_long: resd 1
c mt_str: resb 32
c endstruc
This will ensure that the structure members are sensibly aligned
relative to the base of the structure.
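To see what `sensibly aligned' means for c{mytype2}, the field
offsets can be worked out by hand or with a short script. The Python
sketch below is a model, not part of NASM: the helper c{layout} and
its tuple format are hypothetical, and an alignment of 1 stands for a
field with no c{alignb} in front of it.

```python
def layout(fields):
    """Return ({name: offset}, total_size) for a list of
    (name, size_in_bytes, alignment) tuples, padding before each field
    so that its offset is a multiple of its alignment."""
    offsets = {}
    position = 0
    for name, size, alignment in fields:
        position += (-position) % alignment  # padding added by alignb
        offsets[name] = position
        position += size
    return offsets, position


# mytype2 from the example above; an alignment of 1 means no alignb.
mytype2 = [
    ("mt_byte", 1, 1),   # resb 1
    ("mt_word", 2, 2),   # alignb 2, then resw 1
    ("mt_long", 4, 4),   # alignb 4, then resd 1
    ("mt_str", 32, 1),   # resb 32
]
offsets, size = layout(mytype2)
print(offsets)  # {'mt_byte': 0, 'mt_word': 2, 'mt_long': 4, 'mt_str': 8}
print(size)     # 40
```

Only the first c{alignb} actually inserts padding here (one byte after
c{mt_byte}); the second finds the position already on a 4-byte
boundary.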
A final caveat: c{ALIGN} and c{ALIGNB} work relative to the
beginning of the e{section}, not the beginning of the address space
in the final executable. Aligning to a 16-byte boundary when the
section you're in is only guaranteed to be aligned to a 4-byte
boundary, for example, is a waste of effort. Again, NASM does not
check that the section's alignment characteristics are sensible for
the use of c{ALIGN} or c{ALIGNB}.
C{directive} i{Assembler Directives}
NASM, though it attempts to avoid the bureaucracy of assemblers like
MASM and TASM, is nevertheless forced to support a e{few}
directives. These are described in this chapter.
NASM's directives come in two types: i{user-level
directives}e{user-level} directives and i{primitive
directives}e{primitive} directives. Typically, each directive has a
user-level form and a primitive form. In almost all cases, we
recommend the user-level forms of the directives,
which are implemented as macros that call the primitive forms.
Primitive directives are enclosed in square brackets; user-level
directives are not.
In addition to the universal directives described in this chapter,
each object file format can optionally supply extra directives in
order to control particular features of that file format. These
i{format-specific directives}e{format-specific} directives are
documented along with the formats that implement them, in k{outfmt}.
H{bits} ic{BITS}: Specifying Target i{Processor Mode}
The c{BITS} directive specifies whether NASM should generate code
I{16-bit mode, versus 32-bit mode}designed to run on a processor
operating in 16-bit mode, or code designed to run on a processor
operating in 32-bit mode. The syntax is c{BITS 16} or c{BITS 32}.
In most cases, you should not need to use c{BITS} explicitly. The
c{aout}, c{coff}, c{elf} and c{win32} object formats, which are
designed for use in 32-bit operating systems, all cause NASM to
select 32-bit mode by default. The c{obj} object format allows you
to specify each segment you define as either c{USE16} or c{USE32},
and NASM will set its operating mode accordingly, so the use of the
c{BITS} directive is once again unnecessary.
The most likely reason for using the c{BITS} directive is to write
32-bit code in a flat binary file; this is because the c{bin}
output format defaults to 16-bit mode in anticipation of it being
used most frequently to write DOS c{.COM} programs, DOS c{.SYS}
device drivers and boot loader software.
You do e{not} need to specify c{BITS 32} merely in order to use
32-bit instructions in a 16-bit DOS program; if you do, the
assembler will generate incorrect code because it will be writing
code targeted at a 32-bit platform, to be run on a 16-bit one.
When NASM is in c{BITS 16} state, instructions which use 32-bit
data are prefixed with an 0x66 byte, and those referring to 32-bit
addresses have an 0x67 prefix. In c{BITS 32} state, the reverse is
true: 32-bit instructions require no prefixes, whereas instructions
using 16-bit data need an 0x66 and those working in 16-bit addresses
need an 0x67.
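The prefix rule can be summarised as a small decision function. The
Python sketch below is illustrative only: c{size_prefixes} is a
hypothetical name, it covers just the 16-bit and 32-bit sizes
discussed here, and the order of the returned list carries no meaning.

```python
def size_prefixes(bits, operand_size, address_size):
    """Return the prefix bytes for an instruction assembled in BITS
    `bits` state that uses data of `operand_size` bits and addresses
    of `address_size` bits. All three values must be 16 or 32."""
    for value in (bits, operand_size, address_size):
        if value not in (16, 32):
            raise ValueError("only 16 and 32 are modelled")
    prefixes = []
    if operand_size != bits:
        prefixes.append(0x66)  # operand-size override
    if address_size != bits:
        prefixes.append(0x67)  # address-size override
    return prefixes


# 32-bit data in BITS 16 state needs the 0x66 prefix only.
print([hex(p) for p in size_prefixes(16, 32, 16)])  # ['0x66']
# 32-bit data and addresses in BITS 32 state need no prefix.
print(size_prefixes(32, 32, 32))                    # []
# 16-bit data and addresses in BITS 32 state need both.
print([hex(p) for p in size_prefixes(32, 16, 16)])  # ['0x66', '0x67']
```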
The c{BITS} directive has an exactly equivalent primitive form,
c{[BITS 16]} and c{[BITS 32]}. The user-level form is a macro
which has no function other than to call the primitive form.
H{section} ic{SECTION} or ic{SEGMENT}: Changing and i{Defining
Sections}
I{changing sections}I{switching between sections}The c{SECTION}
directive (c{SEGMENT} is an exactly equivalent synonym) changes
which section of the output file the code you write will be
assembled into. In some object file formats, the number and names of
sections are fixed; in others, the user may make up as many as they
wish. Hence c{SECTION} may sometimes give an error message, or may
define a new section, if you try to switch to a section that does
not (yet) exist.
The Unix object formats, and the c{bin} object format, all support
the i{standardised section names} c{.text}, c{.data} and c{.bss}
for the code, data and uninitialised-data sections. The c{obj}
format, by contrast, does not recognise these section names as being
special, and indeed will strip off the leading period of any section
name that has one.
S{sectmac} The ic{__SECT__} Macro
The c{SECTION} directive is unusual in that its user-level form
functions differently from its primitive form. The primitive form,
c{[SECTION xyz]}, simply switches the current target section to the
one given. The user-level form, c{SECTION xyz}, however, first
defines the single-line macro c{__SECT__} to be the primitive
c{[SECTION]} directive which it is about to issue, and then issues
it. So the user-level directive
c SECTION .text
expands to the two lines
c %define __SECT__ [SECTION .text]
c [SECTION .text]
Users may find it useful to make use of this in their own macros.
For example, the c{writefile} macro defined in k{mlmacgre} can be
usefully rewritten in the following more sophisticated form:
c %macro writefile 2+
c [section .data]
c %%str: db %2
c %%endstr:
c __SECT__
c mov dx,%%str
c mov cx,%%endstr-%%str
c mov bx,%1
c mov ah,0x40
c int 0x21
c %endmacro
This form of the macro, once passed a string to output, first
switches temporarily to the data section of the file, using the
primitive form of the c{SECTION} directive so as not to modify
c{__SECT__}. It then declares its string in the data section, and
then invokes c{__SECT__} to switch back to e{whichever} section
the user was previously working in. It thus avoids the need, in the
previous version of the macro, to include a c{JMP} instruction to
jump over the data, and it also keeps working when, in a complicated
c{OBJ} format module, the user may be assembling code in any of
several separate code sections.
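The interplay between the two forms of c{SECTION} and c{__SECT__}
can be followed with a toy model. The Python sketch below is not how
NASM is implemented: the class and method names are invented, the
section name c{code2} is a made-up example, and the initial value of
c{__SECT__} is assumed to name the initial section.

```python
class SectionState:
    """Toy model of the current section and the __SECT__ macro."""

    def __init__(self, initial):
        self.current = initial
        self.sect_macro = initial  # assumed initial value of __SECT__

    def primitive_section(self, name):
        # [SECTION name]: switch only; __SECT__ is left untouched.
        self.current = name

    def user_section(self, name):
        # SECTION name: record the switch in __SECT__, then perform it.
        self.sect_macro = name
        self.primitive_section(name)

    def invoke_sect(self):
        # __SECT__: replay the last user-level switch.
        self.primitive_section(self.sect_macro)


state = SectionState(".text")
state.user_section("code2")        # the user's own code section
state.primitive_section(".data")   # inside writefile: declare the string
print(state.current)               # .data
state.invoke_sect()                # inside writefile: __SECT__
print(state.current)               # code2
```

Had the macro used the user-level form to enter c{.data}, the final
step would have returned to c{.data} rather than to c{code2}.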
H{absolute} ic{ABSOLUTE}: Defining Absolute Labels
The c{ABSOLUTE} directive can be thought of as an alternative form
of c{SECTION}: it causes the subsequent code to be directed at no
physical section, but at the hypothetical section starting at the
given absolute address. The only instructions you can use in this
mode are the c{RESB} family.
c{ABSOLUTE} is used as follows:
c absolute 0x1A
c kbuf_chr resw 1
c kbuf_free resw 1
c kbuf resw 16
This example describes a section of the PC BIOS data area, at
segment address 0x40: the above code defines c{kbuf_chr} to be
0x1A, c{kbuf_free} to be 0x1C, and c{kbuf} to be 0x1E.
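The three values follow from adding up the reserved sizes, starting
at the address given to c{ABSOLUTE}. The Python sketch below repeats
that sum; c{absolute_labels} is a hypothetical helper, and the sizes
are the byte counts of c{resw 1} (2 bytes) and c{resw 16} (32 bytes).

```python
def absolute_labels(start, reservations):
    """Assign addresses to labels reserved one after another from
    `start`. `reservations` is a list of (label, size_in_bytes)."""
    labels = {}
    address = start
    for label, size in reservations:
        labels[label] = address
        address += size
    return labels


labels = absolute_labels(
    0x1A,
    [("kbuf_chr", 2), ("kbuf_free", 2), ("kbuf", 32)],
)
for name, address in labels.items():
    print(name, hex(address))
# kbuf_chr 0x1a
# kbuf_free 0x1c
# kbuf 0x1e
```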
The user-level form of c{ABSOLUTE}, like that of c{SECTION},
redefines the ic{__SECT__} macro when it is invoked.
ic{STRUC} and ic{ENDSTRUC} are defined as macros which use
c{ABSOLUTE} (and also c{__SECT__}).
c{ABSOLUTE} doesn't have to take an absolute constant as an
argument: it can take an expression (actually, a i{critical
expression}: see k{crit}) and it can be a value in a segment. For
example, a TSR can re-use its setup code as run-time BSS like this:
c org 100h ; it's a .COM program
c jmp setup ; setup code comes last
c ; the resident part of the TSR goes here
c setup: ; now write the code that installs the TSR here
c absolute setup
c runtimevar1 resw 1
c runtimevar2 resd 20
c tsr_end:
This defines some variables `on top of' the setup code, so that
after the setup has finished running, the space it took up can be
re-used as data storage for the running TSR. The symbol `tsr_end'
can be used to calculate the total size of the part of the TSR that
needs to be made resident.
H{extern} ic{EXTERN}: i{Importing Symbols} from Other Modules
c{EXTERN} is similar to the MASM directive c{EXTRN} and the C
keyword c{extern}: it is used to declare a symbol which is not
defined anywhere in the module being assembled, but is assumed to be
defined in some other module and needs to be referred to by this
one. Not every object-file format can support external variables:
the c{bin} format cannot.
The c{EXTERN} directive takes as many arguments as you like. Each
argument is the name of a symbol:
c extern _printf
c extern _sscanf,_fscanf
Some object-file formats provide extra features to the c{EXTERN}
directive. In all cases, the extra features are used by suffixing a
colon to the symbol name followed by object-format specific text.
For example, the c{obj} format allows you to declare that the
default segment base of an external should be the group c{dgroup}
by means of the directive
c extern _variable:wrt dgroup
The primitive form of c{EXTERN} differs from the user-level form
only in that it can take only one argument at a time: the support
for multiple arguments is implemented at the preprocessor level.
You can declare the same variable as c{EXTERN} more than once: NASM
will quietly ignore the second and later redeclarations. You can't
declare a variable as c{EXTERN} as well as something else, though.
H{global} ic{GLOBAL}: i{Exporting Symbols} to Other Modules
c{GLOBAL} is the other end of c{EXTERN}: if one module declares a
symbol as c{EXTERN} and refers to it, then in order to prevent
linker errors, some other module must actually e{define} the
symbol and declare it as c{GLOBAL}. Some assemblers use the name
ic{PUBLIC} for this purpose.
The c{GLOBAL} directive applying to a symbol must appear e{before}
the definition of the symbol.
c{GLOBAL} uses the same syntax as c{EXTERN}, except that it must
refer to symbols which e{are} defined in the same module as the
c{GLOBAL} directive. For example:
c global _main
c _main: ; some code
c{GLOBAL}, like c{EXTERN}, allows object formats to define private
extensions by means of a colon. The c{elf} object format, for
example, lets you specify whether global symbols are functions or
data:
c global hashlookup:function, hashtable:data
Like c{EXTERN}, the primitive form of c{GLOBAL} differs from the
user-level form only in that it can take only one argument at a
time.
H{common} ic{COMMON}: Defining Common Data Areas
The c{COMMON} directive is used to declare ie{common variables}.
A common variable is much like a global variable declared in the
uninitialised data section, so that
c common intvar 4
is similar in function to
c global intvar
c section .bss
c intvar resd 1
The difference is that if more than one module defines the same
common variable, then at link time those variables will be
e{merged}, and references to c{intvar} in all modules will point
at the same piece of memory.
Like c{GLOBAL} and c{EXTERN}, c{COMMON} supports object-format
specific extensions. For example, the c{obj} format allows common
variables to be NEAR or FAR, and the c{elf} format allows you to
specify the alignment requirements of a common variable:
c common commvar 4:near ; works in OBJ
c common intarray 100:4 ; works in ELF: 4 byte aligned
Once again, like c{EXTERN} and c{GLOBAL}, the primitive form of
c{COMMON} differs from the user-level form only in that it can take
only one argument at a time.
C{outfmt} i{Output Formats}
NASM is a portable assembler, designed to be able to compile on any
ANSI C-supporting platform and produce output to run on a variety of
Intel x86 operating systems. For this reason, it has a large number
of available output formats, selected using the ic{-f} option on
the NASM i{command line}. Each of these formats, along with its
extensions to the base NASM syntax, is detailed in this chapter.
As stated in k{opt-o}, NASM chooses a i{default name} for your
output file based on the input file name and the chosen output
format. This will be generated by removing the i{extension}
(c{.asm}, c{.s}, or whatever you like to use) from the input file
name, and substituting an extension defined by the output format.
The extensions are given with each format below.
H{binfmt} ic{bin}: i{Flat-Form Binary}I{pure binary} Output
The c{bin} format does not produce object files: it generates
nothing in the output file except the code you wrote. Such `pure
binary' files are used by i{MS-DOS}: ic{.COM} executables and
ic{.SYS} device drivers are pure binary files. Pure binary output
is also useful for i{operating-system} and i{boot loader}
development.
c{bin} supports the three i{standardised section names} ic{.text},
ic{.data} and ic{.bss} only. The file NASM outputs will contain the
contents of the c{.text} section first, followed by the contents of
the c{.data} section, aligned on a four-byte boundary. The c{.bss}
section is not stored in the output file at all, but is assumed to
appear directly after the end of the c{.data} section, again
aligned on a four-byte boundary.
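The resulting layout can be computed from the section sizes alone.
The Python sketch below models only the rule just stated; the helper
names c{round_up} and c{bin_layout} are hypothetical, offsets are
measured from the start of the c{.text} section, and any program
origin is left out.

```python
def round_up(value, boundary):
    """Round `value` up to the next multiple of `boundary`."""
    return value + (-value) % boundary


def bin_layout(text_size, data_size):
    """Return the start offset of each section, given the sizes in
    bytes of the .text and .data contents."""
    data_start = round_up(text_size, 4)
    bss_start = round_up(data_start + data_size, 4)
    return {".text": 0, ".data": data_start, ".bss": bss_start}


# 10 bytes of code and 5 bytes of data.
print(bin_layout(10, 5))  # {'.text': 0, '.data': 12, '.bss': 20}
```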
If you specify no explicit c{SECTION} directive, the code you write
will be directed by default into the c{.text} section.
Using the c{bin} format puts NASM by default into 16-bit mode (see
k{bits}). In order to use c{bin} to write 32-bit code such as an
OS kernel, you need to explicitly issue the Ic{BITS}c{BITS 32}
directive.
c{bin} has no default output file name extension: instead, it
leaves your file name as it is once the original extension has been
removed. Thus, the default is for NASM to assemble c{binprog.asm}
into a binary file called c{binprog}.
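The naming rule amounts to replacing the input file's extension. The
Python sketch below illustrates it under simplifying assumptions:
c{default_output_name} is a hypothetical helper, edge cases such as
an input name without an extension are not modelled, and c{.o} is
used purely as an example of a format-defined extension.

```python
import os


def default_output_name(input_name, format_extension):
    """Strip the input file's extension and append the extension
    defined by the output format (an empty string for bin)."""
    stem, _old_extension = os.path.splitext(input_name)
    return stem + format_extension


print(default_output_name("binprog.asm", ""))    # binprog
print(default_output_name("module.asm", ".o"))   # module.o
```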
S{org} ic{ORG}: Binary File i{Program Origin}
The c{bin} format provides an additional directive to the list
given in k{directive}: c{ORG}. The function of the c{ORG}
directive is to specify the origin address which NASM will assume
the program begins at when it is loaded into memory.
For example, the following code will generate the longword
c{0x00000104}:
c org 0x100
c dd label
c label:
Unlike the c{ORG} directive provided by MASM-compatible assemblers,
which allows you to jump around in the object file and overwrite
code you have already generated, NASM's c{ORG} does exactly what
the directive says: e{origin}. Its sole function is to specify one
offset which is added to all internal address references within the
file; it does not permit any of the trickery that MASM's version
does. See k{proborg} for further comments.
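The effect of c{ORG} on the example above is a single addition. The
Python sketch below repeats the calculation; c{label_address} is a
hypothetical helper, and the byte order shown is the little-endian
order used by x86 processors.

```python
def label_address(origin, offset_in_file):
    """Address of a label: the origin plus the label's offset from the
    start of the output file."""
    return origin + offset_in_file


# `dd label` occupies 4 bytes, so `label` sits at file offset 4.
value = label_address(0x100, 4)
print(f"0x{value:08X}")                   # 0x00000104
print(value.to_bytes(4, "little").hex())  # 04010000
```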
S{binseg} c{bin} Extensions to the c{SECTION}
DirectiveI{SECTION, bin extensions to}
The c{bin} output format extends the c{SECTION} (or c{SEGMENT})
directive to allow you to specify the alignment requirements of
segments. This is done by appending the ic{ALIGN} qualifier to the
end of the section-definition line. For example,