Resource description:
This is a rough draft and a work in progress. If/when I get through the first pass I will go back and prune, tweak and rewrite.

This is my implementation of the Ridiculously Simple Computer. The instruction set is defined here:

http://www.eng.umd.edu/~blj/RiSC/

I don't know how many times I have said this, but I think there is a lot that can be taught and learned using this architecture: assembly language, machine language, instruction set simulators, watching a program run inside a processor, whatever else I can think of...

A prerequisite for understanding and using this material is a working knowledge of the C programming language. If you have no programming experience at all, you will need to get some first; I am not teaching you programming concepts, just another language. Perhaps start with Learn Python the Hard Way by Zed A. Shaw. It isn't really hard or the hard way. The key to it is that you actually learn to program by typing the programs in first, then working out the typos and mistakes you made doing it (which we all do). Then, when you have actually typed it in right, compiled and run it to match the output in the book, you are told what the program does and why. You are spoon fed along a path from the basic concepts of the language into more complicated things. Other books will lead you along the same path, but they expect you to invent the programs yourself. Learn Python is his well known book, but it appears he has a Learn C book as well in some state of completion.

http://learncodethehardway.org/

Now that you know how to program in C we can move forward.

First, some terminology. Some of these terms have different definitions depending on who you are talking to, which can make it difficult at times.

Instruction Set Architecture. Sometimes abbreviated ISA; there are other things called ISA so don't get this confused. Also sometimes just called the instruction set. It is the set of instructions that a processor knows how to execute.
These instructions are also called machine language or machine code. The processor reads bits/bytes from memory; each instruction has a specific bit pattern, also called an opcode, that tells the processor what you want it to do. When you place specific instructions in a specific order you can get the processor to do complicated things using sequences of simple instructions.

Machine language or machine code. This is your program, sequences of processor instructions, in binary form. A bunch of bits that are hard for humans to read but easy for the processor.

Assembly language. Assembly language is the programming language that you use to create the machine code. Like other programming languages it is text, written using a text editor. Unlike other programming languages though, each instruction set can and often does have its own assembly language, different from that of any other instruction set. ARM assembly language, although similar in some respects, is different from x86 assembly language or mips assembly language, etc. All three have an add instruction, for example, and the word "add" is in the instruction, but the syntax can and will vary from one processor family to another. Since the machine code is the goal, and assembly language is just a way for us to program at this low level in a human readable/writeable form, it does happen from time to time that an assembly language changes or that there is more than one assembly language for an instruction set. Usually the company or individual that creates the processor also creates the instruction set, which means they define the machine code, each instruction, for that processor. They tend to also create the first assembly language definition for that processor. The document used to describe the instruction set will often contain both the assembly language (words like "and" and "xor" along with register names, etc.) and the bit definitions for the machine code.
Some companies are better than others. Also, in order to sell these chips they often create, or have someone create, an assembler (see below) for this instruction set. But so long as you create the right machine code and follow the rules for the processor, there is no reason why you can't make your own assembly language for a particular processor. x86 is a very well known example. Intel created the processor. Microsoft's assembler was at first the one most people used; no doubt Intel had their own, but I bet it was pricey. Borland also had an assembler, Turbo Assembler. Microsoft's masm and Borland's Turbo Assembler used very similar, but not identical, syntax. Today the gnu tools dominate, and the gnu folks completely messed up the Intel x86 assembly language. Their version is known as AT&T syntax, whereas the classic x86 assembly language is known as Intel syntax. nasm is a popular assembler that honors the real Intel assembly language, whereas gnu as defaults to AT&T.

Assembler. This word is a bit tricky. I try to use this word when I am talking about the program that takes assembly language and converts it to machine code. It is like a compiler, but the term compiler is for higher level programming languages, where there is not a one to one relationship between a line of code or operation and a machine code instruction. The confusion comes because the word assembler is also often used when referring to the assembly language itself. So when you hear folks say "you might have to look at the assembler for that program" or "some of that was written in assembler", they mean the programming language, assembly language, not the program that reads and converts assembly language to machine code.
Now, just as compilers, for example C compilers, may have compiler specific directives that you can put in the code, assembly language has language items that are specific to the assembler, to make the programming job easier or to be more exact about what you want the assembler to do.

Macro. You may from time to time come across the words macro assembler. All that means is that the assembler, and the assembly language it accepts, provides a mechanism to make macros, very similar to the macros you find in the C programming language (assembly language came first naturally, but if you are reading this you learned C first). Macros, as in C, have a function like feel but are inline. Using macros in your assembly language you can create instruction sequences that you may wish to repeat in your code and 1) not have to type them every time and 2) if you want to change the sequence, change it in one place instead of every place you used it.

Instruction set simulator. This is generally a program, software, that takes the bits and bytes of machine code for a particular processor and, like that processor, extracts the machine code instructions and executes them. Using variables or arrays or whatever, it pretends to have the same registers as the real processor and executes the instructions like the real processor, but the instruction set simulator software may not actually be written for, or compiled to run on, the same processor that it is simulating. This will make more sense when we get to it. There are two main reasons for instruction set simulators. One is to allow you to run programs that have been compiled for that processor, or more specifically a platform. For example a Gameboy Advance simulator or a Play Station simulator, etc.
Think about, or go look at, the mame project. That project contains many different instruction set simulators, and the goal when writing those simulators was to run fast: before our desktop and laptop computers were as fast as they are today, you wanted to play these games and have them run at the same speed as the arcade. An instruction set simulator can also be used for virtual machines. On Linux you can run Windows on qemu, for example, or you can run a Linux compiled for the ARM instruction set on an x86 computer using a qemu for the ARM instruction set. Now on a tangent, you may find that many virtual machines don't simulate every instruction but instead use features of the instruction set to let a fair percentage of the instructions run on the real processor of that machine. Then when the virtual machine, say, wants to send a network packet and talks to what it thinks is the network card, the emulator comes in and pretends to be a network card such that the operating system on the virtual machine doesn't know the difference. An instruction set simulator strives to resemble the hardware to the point that the program doesn't know that it is not the real hardware. How does a program "know" something like this? It doesn't. What I mean by that is the program will crash or otherwise not run correctly if the instruction set simulator is not close enough to the real thing.

Enough terminology for now.

In the processor world there are times, more in the past than in the present, where the core of the processor (the part that actually parses and runs the machine code) is designed one time and that is the processor everyone uses for that instruction set, no variations. Take the 6502 for example. At least originally, my understanding is you created those processors by drawing all of the silicon and metal layers in the chip by hand, all of the polygons on a drafting table.
Very much like the way a coin is created by an artist at a much larger scale than the actual coin, and from that model of the coin a machine is used to create the master dies that are used to stamp the coins at the proper size. For the old processors you made by hand what was essentially the master print or negative used in the photographic like process used to make a processor. With that much work involved, if you then wanted to take the generic 6502 processor and create the processor in the Vic-20 or in the Commodore 64, you would probably just re-use the same drawings and add more stuff around them, and not actually re-invent a clone of that original processor core.

We do nothing of the sort today. Hardware, logic, is designed using programming languages very similar to and inspired by the software programming languages we use today. To the point that there are the equivalent of compilers and linkers, which compile that high level language down into assembly like fundamental logic gates or blocks, and then different modules are glued together very much the way a linker glues together objects created by a compiler. Just as the C programming language can be compiled down into any number of different assembly languages, the same source into many different implementations, the hardware design languages can be compiled down into different mixtures of logic blocks depending on the target. First off you have programmable logic like cplds and fpgas. These are chips that are filled with various fundamental logic blocks arranged with a large network of interconnects; what makes them programmable is that the interconnects can be changed temporarily or permanently.
If I want to xor two bits together and have the result feed into some other logic block, then the tools for that target know how to take the high level hardware description language and describe the connections between logic blocks. Then it is a matter of having a utility program all of those little connections in the programmable device.

The chips we most often see today are not programmable logic; they are built from the ground up to serve a specific purpose, to implement a single design. If you were to build a car from scratch you would either have to invent every little thing yourself (an engine, the pistons in the engine, the header, cam, lifters, etc.) or you might just buy an existing engine built by someone else and design your frame so that engine fits in it. Likewise the transmission, the drive shaft, the gearboxes, brakes, etc. So if you are a company like Intel that builds the machinery, and the factories housing that machinery, from the ground up to make chips, you are basically designing your own engines and transmissions to go in your car. But most companies hire an Intel or some other chip maker to make the chips for them. Each chip maker has developed a cell library, a collection of different logic blocks, very much like machine code. The hardware description language is then converted into what are basically lists of connections between the inputs and outputs of these logic blocks. And just as we have assembly language to represent machine code, there are human readable ways to represent the cell library items for a particular foundry (chip factory).
Just as we don't have to create newspapers by taking, by hand, little metal letters and arranging them in rows of words to stamp the page, and instead use computers to create the light and dark spots on a master that becomes the letters on a printed page, we use computers to arrange the massive rat's nest of wires and logic blocks on a chip. And just like a professionally made magazine or other printed material, sometimes some hand tweaking is required to make the thing perfect.

So what was all of that about? Many of the processors you write programs for today are 1) created using a programming language. 2) That language is compiled down to create a chip using the generation of technology available at the foundries at the time the chip was made (and which remains available so long as that chip is still profitable enough to stay in production). 3) (Here it comes.) If that processor is successful enough to warrant new processors, then either the same hardware source code, or the same code with features added (new instructions, bugs fixed, etc.), may be used to create the new generation of that processor, or 4) enough changes are made, or completely new source code is created, such that it is similar enough to the prior processor to execute most of the same instructions but perhaps does it in a more efficient or different way for some reason (speed, power, etc.). 5) Those new processors, made a year or a few years later, may be implemented using newer generations of chip technology, making them perform faster for example.

Put all of that together and it is just like the evolution of a popular program like Microsoft Word: it may open documents saved by prior versions, but it also has new stuff that prior versions cannot handle. You will find most popular instruction sets evolve: new instructions, new features added to old instructions, old undesirable instructions or features possibly removed.
And the assembly language and assemblers have to choose either to evolve to handle everything from the old to the new, or to draw a line in the sand where one side is one tool and the other side is another. When you learn an assembly language that you didn't know before, as often as not you are going to have to be aware that some of the instructions may not work on the processor you are trying to write a program for.

There is no governing body to define the rules for assembly languages, much less the documentation. Each processor inventor and/or chip company has its own documentation, sometimes good, sometimes bad and often somewhere in the middle. You may wonder how a chip with bad docs actually survives in the market, but they can and do.

When reading an instruction set reference manual, you want to be looking for a few things. Obviously you want to pay attention to the syntax of the assembly language; that is your primary learning exercise when picking up this manual. But if there are bit definitions for the machine code you want to pay attention to those as well. You should not have to go to stackoverflow.com and ask why it is that on x86 I can load a 32 bit register with a 32 bit number like 0x12345678 in one instruction, but on mips I can only load 16 bits in a single instruction and on arm only 8 bits in a single instruction. Why can this jump instruction only jump 128 bytes but this other jump can jump anywhere in memory? The answers to all of those questions and more are, and always have been, right there in the processor vendor's documentation. If you are using some web page and not the processor vendor's documentation, go get the processor vendor's documentation. If the processor vendor's documentation only has the syntax for the assembly language and not the machine code definition, you might want to find another processor to use; this is going to be more painful than normal.

Now let us stretch this a bit more.
What about this RiSC16 instruction set? Do a little googling and you will find a number of places that use it in their computer science or engineering colleges. mips and dlx are also seen far and wide. In some of those classes your job is to either create an instruction set simulator or create the hardware design using a hardware design language. Now what does it mean to you if several thousand students every year take the same definition of something and are then sent off to create a program that matches or meets that definition? Sure, many are going to fail to get it right, that is a given, but even among the ones that did make a program that perfectly matches the instruction set definition, most of those implementations are going to be different, sometimes wildly different.

I have created my own RiSC16 simulator and hardware description language implementation based on the instruction set as defined at the link at the top of this document. If you read down into that page and follow the code and links provided you will see that, for such a simple instruction set, considerable work has been put into making programs and logic with caches, branch prediction and all kinds of stuff that seems extreme for something that on the surface is so simple. Most instruction sets that have that level of complexity have hundreds of instructions. My implementation is meant to be easy to use, easy to follow, something that you might learn something from. There is no interest in speed, as I don't expect you to actually write complicated programs that would warrant such speed. I expect you to learn some fundamentals here, then take that knowledge on to the next instruction set and the next and the next. I firmly believe you should learn a few instruction sets if you are going to bother to learn any. The second and third and Nth are significantly easier than the first, as the concepts and even syntax can be very similar from one to the next.
Just like learning new programming languages, the basics of programming don't change with a new language; more often than not it is a matter of learning a new syntax.

If and when you see my implementation of this processor, understand first that I do not write hardware design language for chips as a profession. As a profession I do look at a lot of that code from other people, debug that code, and sometimes propose fixes to that code, but I write software to test that code, or write code to boot that processor, or drivers to initialize peripherals within or around that processor. These github projects allow me to explore, with you, my own designs based on what I have learned from others. And the more important issue is that when you see this implementation, with its waveforms representing all of the interconnect signals, the bits in registers, etc., understand that every implementation is going to look different inside. You might get used to this one and then look at another running the same program and not have a clue as to what you are looking at, until you take the time to understand it. The same goes for my instruction set simulator. Many of the instruction set simulators I come across are trying to be fast or clever or both, and as a result can be very unreadable. I hope that this one, and others of mine that you may find, are in fact readable.

More definitions:

Register. When used in the context of assembly language, a register is very much like a variable in C or other programming languages. A difference though is that there is often a fixed number of registers, some may have specific rules or limitations, and very often the names for them have already been chosen. So you have to reuse these variable like registers with names like r0, r1, r2...

In a more general sense the term register is used a lot in programming. For example a video card might have a register that holds the value of the brightness level being output.
By writing a new value to that register you can change the brightness. Another register might be the contrast. Or perhaps it may take a number of registers to describe the brightness and/or contrast output by a video card. These latter types of registers are like the ones in a processor core, a chunk of bits somewhere that hold something, but this latter type is usually accessed by reading or writing a particular memory location. The former type, the type we are going to focus on when programming in assembly language, has an intimate relationship with the processor instructions themselves; in fact an instruction set relies heavily on the registers in that processor.

My understanding and implementation of the RiSC16 instruction set.

This processor has 8 so called general purpose registers. One is actually special, so 7 of them are general purpose. All of the instructions in this instruction set rely on and operate on these registers. My shorthand reference for this instruction set is as follows:

000aaabbb0000ccc  add  ra,rb,rc       ra = rb + rc
001aaabbbsssssss  addi ra,rb,simm     ra = rb + simm
010aaabbb0000ccc  nand ra,rb,rc       ra = ~(rb&rc)
011aaaiiiiiiiiii  lui  ra,imm         ra = imm<<6
100aaabbbsssssss  sw   ra,[rb+simm]
101aaabbbsssssss  lw   ra,[rb+simm]
110aaabbbsssssss  beq  ra,rb,simm
111aaabbb0000000  jalr ra,rb          ra = pc; j [rb]
1111111111111111  halt

If it doesn't make sense right off, don't worry, we are going to go through each of these in detail.

Understand that this processor, with 8 instructions, is not the norm. This processor is simple in the sense that you can easily wrap your head around all of its instructions, all that it does. To make useful programs, though, you as a programmer have to work harder than you would on other processors. A goal here is to worry less about what you can't easily do with this processor and instead focus on what you can do, and on understanding what the rules and limitations of each instruction are and why they exist.
When the instruction set is laid out as I have shown it above, you should very quickly notice that, with one exception, you can tell what instruction it is by looking at the top three bits. This certainly makes it easier for everyone to figure out what instruction they are looking at when presented with a bunch of bits. The one exception is something I added to the RiSC16 defined by Professor Jacob. Some processors will have a halt instruction, but in general a processor's job is to run forever so long as the power is on. The processors with halt instructions are often microcontrollers, and the halt is a temporary state: basically go to sleep and consume very little power until I wake you up, then wake up fast, do a few things and go back to sleep. Take your television remote control for example. In order to prevent the batteries from having to be replaced daily or weekly, the electronics in a device like that use very very little power when in sleep mode, then wake up, still using little power, do a quick task, then go back to sleep. Battery life is a primary design requirement across the board. I have a halt instruction because this processor is for educational purposes using a simulator: write a small amount of code, end with a halt, look at the output of the simulator. The simulator certainly can be left running a program forever, or for a long time, if you have a program that you want to run that way, not a problem. But many of the examples will rely on the halt to end the example and allow the output to be examined easily.

The second thing you might assume upon first inspection, or with experience see right away, is that this instruction set appears to be "fixed word length". Fixed word length means the length of the instructions, as measured in number of bits, is the same for all the instructions in the instruction set. All 9 instructions (the 8 from RiSC16 plus my halt) are exactly 16 bits, no more, no less.
Because it allows for simpler logic and a more deterministic nature, the relatively modern risc processors tend to use fixed word length instruction sets. Not all, and not all of the time, but compared to older processors like the 6502 and 8086, which are considered cisc, risc leans towards fixed. Variable word length instruction sets are found in processors like the 6502 and 8086 and many others. Variable word length means that some instructions use more bits than others; some might be 8 bits and some might be as many as 72 bits (9 bytes) or more, in modern 64 bit cisc processors, for a single instruction. A fixed word length processor knows what it needs to do when it reads and decodes that single instruction. It does not have to go back out and fetch, or wait for, the rest of the instruction to arrive; it is all there. A drawback is that many simple instructions you might want to have don't need all of those bits, so you are wasting space and bandwidth on the simpler instructions. With variable word length instructions you ideally want to make the commonly used instructions shorter and the less commonly used instructions longer. Many times the additional length is there for other reasons, as we will see shortly with the addi instruction.

With experience, another thing you notice at first glance is that this RiSC16 instruction set uses registers with the conditional branch. The thought is, this is MIPS like (MIPS is another well known instruction set). Upon further study you see that indeed the comparison and the conditional branch are done by the same instruction. The more popular way to do this is to have alu functions set flags, including a compare instruction, which is a subtract that does not save the result and only updates the flags. Then the conditional branch instruction(s) use the flags set by some prior instruction. For the RiSC16, the beq instruction compares the contents of the two registers.
If they match then the branch happens; if they do not match then the branch does not happen.

So now you ask yourself, if the branch is mips like, what else is mips like? Is there a delay slot after the branch, and what about r0? In this case, RiSC16, there is no branch delay slot. For pipelined processors (pipelining allows for higher execution performance), when a branch happens the pipeline needs to be flushed and re-filled, and this costs clock cycles. A branch delay slot or slots means that one or some of the instructions after the branch will be executed, recovering the cost of some of those lost clock cycles. Typically you would arrange instructions so that one of the last things you would have done before performing the branch (and that does not affect the branch) is placed after the branch. If you don't have an instruction you can move there then a nop is used, and you basically lose the clock cycle anyway.

So the RiSC16 does not have a delay slot, but r0 is mips like. What that means is r0 is special: the contents of r0 are always the value zero (0x0000). You can read/use r0 wherever you want, but if it is the destination register in an instruction its contents are not changed; it is always zero. As a programmer you could have done this with any register on your own, setting one to zero and leaving it, or setting a register to zero whenever you needed it; there is no need for hardware to make one zero. But processors like this one are designed such that you need a register to be zero to do useful things, so they might as well force one. In the spirit of the RiSC16 instruction set I have also forced r0 to zero in my implementations. Having a register contain zero makes a few of the instructions more powerful. Take add for example: if one of the two operand registers is r0 then the add becomes a move instruction, moving the contents from one register to the other. If both operands are r0 then it becomes move zero to the destination register.
If all three are r0 then it becomes a nop.

I am going to run through a reference for the instruction set, then some observations, and then we will start talking about machine and assembly language programming. Between now and then the text will read as if you already understand assembly language; it will serve as a reference once you do know assembly language.

ADD

000aaabbb0000ccc  add ra,rb,rc   ra = rb + rc

ra, rb, and rc are any of the 8 general purpose registers. The contents of rb and rc are added together and stored in register ra (if ra is not r0).

examples

add r1,r2,r3   r1 = r2 + r3
add r1,r2,r0   r1 = r2 + r0, since r0 = 0 : r1 = r2
add r3,r0,r0   r3 = r0 + r0, since r0 = 0 : r3 = 0x0000

ADDI

001aaabbbsssssss  addi ra,rb,simm   ra = rb + simm

ra and rb are any of the 8 general purpose registers. The contents of rb and the immediate value are added together and the result is stored in ra (if ra is not r0). The immediate for addi is a signed immediate. From the instruction encoding we see there are 7 bits available for the immediate value. Being signed means sign extended, so whatever is in bit 6 is copied into bits 7 to 15. For example:

001xxxxxx1xxxxxx  instruction encoding
1111111111xxxxxx  immediate value

001xxxxxx0xxxxxx  instruction encoding
0000000000xxxxxx  immediate value

001xxxxxx1000101  instruction encoding
1111111111000101  immediate value

001xxxxxx0100101  instruction encoding
0000000000100101  immediate value

So the valid immediate values are 0x0000 to 0x003F and 0xFFC0 to 0xFFFF.

Seeing this encoding and a description for the immediate, you should never need to ask "Why can't I use an immediate value of 0x1234 with addi?". You now know the reason: there is no way to encode that value in the instruction.

examples

addi r1,r2,0x0010   r1 = r2 + 0x0010
addi r1,r0,0x0034   r1 = r0 + 0x0034, since r0 = 0 : r1 = 0x0034

NAND

010aaabbb0000ccc  nand ra,rb,rc   ra = ~(rb&rc)

This is a NOT AND instruction.
ra, rb, and rc are any of the 8 general purpose registers. rb and rc are ANDed together (bit 0 ANDed with bit 0, bit 1 ANDed with bit 1, etc). The result of that AND operation is then inverted (bit 0 = NOT bit 0, bit 1 = NOT bit 1, etc) and stored in ra (if ra is not r0).

examples

nand r1,r2,r3   r1 = ~(r2&r3)
nand r5,r0,r7   r5 = ~(r0&r7) -> r5 = ~(0) = 0xFFFF

LUI

011aaaiiiiiiiiii  lui ra,imm   ra = imm<<6

The lui instruction is a load upper immediate. Since we need 3 bits for the opcode and 3 bits for the destination register, the largest immediate we can encode is 10 bits. This instruction zeros the lower 6 bits and places the immediate value encoded in the instruction in the upper 10 bits of the specified register. The valid immediate values that can be encoded in the instruction bits are 0x000 to 0x3FF. This also means that the values that can end up in the register are 0x0000, 0x0040, 0x0080, 0x00C0, 0x0100, 0x0140, etc. Any multiple of 0x40 from 0x0000 to 0xFFC0. This document and its examples are going to use an assembly language where you specify the immediate value you want. For example if you want 0x0040 you put 0x0040 in the assembly language. The instruction will be encoded with a 0x001; you don't put 0x001 to get 0x0040, you put 0x0040 to get 0x0040.

examples

lui r1,0x1200   r1 = 0x1200
lui r1,0xFFC0   r1 = 0xFFC0

SW

100aaabbbsssssss  sw ra,[rb+simm]

This instruction writes or stores a word to a location in memory. The myrisc16 implementation of memory addressing is in whole 16 bit words; you cannot address a byte within a word. Just like the addi instruction, the immediate value is a sign extended immediate value, and the same rule applies: the hardware only allows immediates 0x0000 to 0x003F or 0xFFC0 to 0xFFFF.

examples

sw r1,[r2+0x0034]   write r1 to address r2+0x0034

LW

101aaabbbsssssss  lw ra,[rb+simm]

Exactly like the store word instruction, but this is a load word: it reads from memory instead of writing. Everything else is the same; immediates are limited to 0x0000 to 0x003F or 0xFFC0 to 0xFFFF.
Only 16 bit words are addressable, the 8 bit bytes within a word are not.

BEQ

110aaabbbsssssss beq ra,rb,simm

This instruction compares the values in registers ra and rb. If they are equal then the next instruction executed is at pc + 1 + simm, where pc is the address of the beq instruction in question. If the contents of ra and rb are not equal then the next instruction after the beq is executed, no branch happens. beq uses a sign extended immediate value like addi, sw, and lw. So the immediate values are limited to 0x0000 to 0x003F and 0xFFC0 to 0xFFFF. Assemblers will generally allow you to use labels instead of having to add up the number of instructions.

JALR

111aaabbb0000000 jalr ra,rb  ra = pc + 1; j [rb]

Jump and link register. First the address of the instruction after the jalr instruction in question, pc+1, is stored in the ra register (if ra is not r0). Next the program control branches to the address in the rb register.

HALT

1111111111111111 halt

This instruction is not part of RiSC16, it is specific to myrisc16. Note the top three bits are the opcode for JALR. Note also that bits 0 to 6 in the jalr instruction are zero. This implies that if any of the lower 7 bits are non-zero it is not really a jalr but an undefined instruction. myrisc16 uses one of those undefined instructions to implement a halt instruction. The halt instruction causes the processor to halt, to stop executing instructions. This is atypical for processors but it makes it much easier to demonstrate simple programs for teaching purposes.

As mentioned before, risc processors tend to use fixed length instructions. And the fixed length instructions tend to be the same width as the registers and/or memory busses, etc. Which means that you don't have enough bits to both encode a load immediate instruction and have a full register's width of bits to put in that register. The lui instruction needs 3 bits for the opcode and 3 bits for the destination register. That leaves us with at most 10 bits we can load.
Now what they could have done is have a load high and a load low, for example two more bits next to the opcode could have been used:

011aaa00iiiiiiii lh ra,imm   ra = (imm<<8) | (ra&0x00FF)
011aaa01iiiiiiii lhz ra,imm  ra = (imm<<8) | 0x0000
011aaa10iiiiiiii ll ra,imm   ra = imm | (ra&0xFF00)
011aaa11iiiiiiii llz ra,imm  ra = imm | 0x0000

So to load the value 0x4567 into a register using this fantasy instruction encoding

lhz ra,0x4500
ll ra,0x0067

The RiSC16 processor does not do it in that form, but it does it in another form. The lui instruction allows you to set all of the bits in a register, with the lower 6 always zero and the upper 10 whatever you want. And if you think about it, once you have used lui to set the upper 10 bits of a register, and the lower 6 are zeros, you can use addi with that same register as the register operand to fill in the lower 6 bits. (Using r0 as the operand would throw away what lui loaded.) So using RiSC16 to load the value 0x4567 into a register

lui ra,0x4540
addi ra,ra,0x0027

Variable length instruction sets often allow any bit pattern because they often encode a full sized immediate as extra words in the instruction. A 32 bit x86 move of some immediate value into a register might be one to a few 8 bit bytes for the opcode, indicating this is a load immediate of size 32 bits into a specific register. Then the instruction would be followed by that 32 bit immediate, 4 bytes.

Why on earth would you have a nand instruction in a processor? Don't most have and instructions and not instructions separately?
Let's look at some truth tables:

not
c = not a
a c
0 1
1 0

or
c = a or b
a b c
0 0 0
0 1 1
1 0 1
1 1 1

and
c = a and b
a b c
0 0 0
0 1 0
1 0 0
1 1 1

nand
c = not ( a and b )
a b c
0 0 1
0 1 1
1 0 1
1 1 0

nor
c = not ( a or b )
a b c
0 0 1
0 1 0
1 0 0
1 1 0

xor
c = a xor b
a b c
0 0 0
0 1 1
1 0 1
1 1 0

Using only nand truth tables we can easily implement not. First notice in the and table that anything anded with itself is itself

and
c = a and b
a b c
0 0 0
1 1 1

so to get c = not a we need c = a nand a

a a c
0 0 1
1 1 0

Using only nand truth tables you can implement xor logic (from wikipedia)

d = a nand b
e = a nand d
f = b nand d
c = e nand f

a b d e f c
0 0 1 1 1 0
0 1 1 1 0 1
1 0 1 0 1 1
1 1 0 1 1 0

c = a xor b

De Morgan says that

a or b = not ((not a) and (not b))
       = (not a) nand (not b)
       = (a nand a) nand (b nand b)

d = a nand a
e = b nand b
c = d nand e

a b d e c
0 0 1 1 0
0 1 1 0 1
1 0 0 1 1
1 1 0 0 1

c = a or b

So a nor operation then would be c = not ( a or b ). Since we have done c = a or b and c = not a we know

d = a nand a
e = b nand b
f = d nand e
c = f nand f

a b d e f c
0 0 1 1 0 1
0 1 1 0 1 0
1 0 0 1 1 0
1 1 0 0 1 0

c = a nor b

I think you get the idea, you can make any other logic operation using nand operations (the same is true for nor). If you wander about wikipedia you will see that for the various boolean operations they show you how to implement that operation using nand gates.

http://en.wikipedia.org/wiki/OR_gate
http://en.wikipedia.org/wiki/NOR_gate
http://en.wikipedia.org/wiki/AND_gate
http://en.wikipedia.org/wiki/NAND_gate
http://en.wikipedia.org/wiki/XOR_gate

The beq instruction uses a sign extended immediate, this allows you to branch forward and backward but only by a limited amount. Just like the addi, sw, and lw instructions the sign extended immediate is limited to 0x0000 to 0x003F or 0xFFC0 to 0xFFFF, no other values are allowed. So what if we encoded a 0x7F into the simm bits?
That would give 0xFFFF, which if you are fast with your twos complement math you know is a minus one (-1). pc = pc + 1 + 0xFFFF means pc = pc + 1 - 1 = pc. An infinite loop.

So what are our actual branch limits? Our immediate is limited to two ranges, 0x0000 to 0x003F and 0xFFC0 to 0xFFFF. So we should try those four limits and see what happens. It is assumed that you understand twos complement. The value pc here is the address of the beq instruction itself.

pc = pc + 1 + 0xFFC0 = pc + 0xFFC1 = pc - 0x003F
pc = pc + 1 + 0xFFFF = pc + 0x0000
pc = pc + 1 + 0x0000 = pc + 0x0001
pc = pc + 1 + 0x003F = pc + 0x0040

So we can go anywhere from 63 (0x3F) instructions backward to 64 (0x40) instructions forward relative to the beq instruction itself.

Now typically when programming in assembly language you don't have to count instructions and set the immediate value, some assemblers might not even let you set the immediate. The use of labels is typical. For example

one:
;some code
;more code
beq r1,r0,one
; if r1 == r0 then branch to the first instruction after
; the label one
beq r1,r2,two
;some code
;more code
two:

The assembler keeps track of instructions and labels, so it can figure out the address of the beq and the address of the destination. The assembler will do the math and encode the right value in the instruction for you. If the computed value does not conform to the sign extended immediate limitations then the assembler will give you some flavor of error message. Hopefully not too cryptic.

At this point I normally would say that I have created an instruction set simulator and an assembler so that you can learn assembly language for this instruction set without tools that are painful to compile. This tutorial started off telling you that you need to know C.
In part because C and C syntax will be used to explain what the asm is doing, and second because you should be able to compile the tool or tools, written in C, using your C compiler of choice (within reason).

What I have not done, at least not so far, is create an assembler. At least not a traditional assembler that takes ascii files and makes machine code. I have borrowed a cool way to use a C compiler as an assembler. So what would normally look like this:

addi r1,r0,0x0020
nand r2,r0,r0
one:
add r1,r1,r2
beq r1,r0,two
beq r0,r0,one
two:

Looks like this:

declare(one);
declare(two);
...
addi(r1,r0,0x0020);
nand(r2,r0,r0);
label(one);
add(r1,r1,r2);
beq(r1,r0,two);
beq(r0,r0,one);
label(two);

You can see that I don't need to write a parser, the C language compiler does the parsing. We turn an asm instruction into a C function and just pass it the same parameters.

The key to this assembler is the file tinyasm.c. It takes advantage of C macros, which let you take whatever ascii is passed and shove that into the C program that the preprocessor hands on to the compiler. The registers are just variables that have been declared and initialized for you. Like the code I borrowed from, I start everything with a macro, but I have the macro call a full function so that I can easily do more in the function than I could with macros alone. I can check for sign extended immediate values for example.

So the add instruction

add ra,rb,rc

starts like this

#define add(ra,rb,rc) do_add(ra,rb,rc)

And the full function is this

void do_add ( unsigned int ra, unsigned int rb, unsigned int rc )
{
    if((ra>7)||(rb>7)||(rc>7))
    {
        printf("do_add limit fail pc = %u \n",__pc__);
        exit(1);
    }
    emit((0<<13)|(ra<<10)|(rb<<7)|(rc));
}

You can see this allows for some limit checking. Unlike a real assembler with a full parser, I am not able to do the complete job of syntax checking, so you can make mistakes and get away with stuff you normally wouldn't with a real assembler.
The tradeoff here is that you can very easily see the connection to machine code, and later you will be able, if you wish, to create your own pseudo instructions.

From our reference material the add instruction looks like this

000aaabbb0000ccc add ra,rb,rc  ra = rb + rc

16 bits, upper three bits are zero. Then nine other bits, three sets of three, define the general purpose registers used in the instruction. The tinyasm magic will put the numbers 0-7 in the variables ra, rb, rc passed to the do_add() function.

add r1,r2,r3

becomes this

int r1=1, r2=2, r3=3;
do_add(r1,r2,r3);

which is basically

do_add(1,2,3);

Using the instruction definition

000aaabbb0000ccc add ra,rb,rc  ra = rb + rc

we need to make sure that the register numbers are limited to three bits. If ra was an 8 for example and we didn't check for it, that fourth bit would wander into our opcode field and change the instruction from an add to an addi, which would be a problem. A quick and dirty method that works fine with the macro would be to simply and the incoming number with a 7, ensuring it is only 3 significant bits. But I wanted at least some warnings and errors so I limit check the incoming values. If a value is above 7 it is an error, which means that if it gets encoded the value is between 0 and 7.

The emit macro takes a word and emits it into the instruction stream in the binary. In most places it is used, that word is a machine instruction. So do_add, when it calls emit(), is converting from asm, a list of register numbers, to machine code: bits in the right place for the opcode and bits that indicate what registers are used as the operands and the destination.

emit((0<<13)|(ra<<10)|(rb<<7)|(rc));

Now let's dive in and learn some assembly. You will do the most for yourself by typing these lessons in manually. As you know from other programming experience an important part of programming is correctly typing in the language and debugging the code that you have written.
By cutting and pasting you lose that experience on code where you know the expected result, and then have to go through it for the first time on code you are not sure about.

---------------------------------------------------------------------
Lesson 0: Building the tools.

The first tool of interest is the instruction set simulator mr16sim.c. I have tried to make the code portable, hopefully it compiles for you, if not let me know. If using the gnu C compiler

gcc -o mr16sim.exe mr16sim.c
./mr16sim.exe
mr16sim filename.csv

And now an example lesson. Don't enter the lines starting with ----

---- lesson0.c
#include "tinyasm.c"
START("lesson0.csv");
lui(r1,0x0000);
addi(r1,r1,1);
addi(r1,r1,1);
halt();
END
---- lesson0.c

gcc -o lesson0.exe lesson0.c
./lesson0.exe
Pass 1 completed, starting output pass.
Assembly Succeeded.
./mr16sim.exe lesson0.csv
[0x0000] 0x6400 lui r1,0x000 (0x0000)
write_reg(r1,0x0000)
[0x0001] 0x2481 addi r1,r1,0x0001 (1)
write_reg(r1,0x0001)
[0x0002] 0x2481 addi r1,r1,0x0001 (1)
write_reg(r1,0x0002)
[0x0003] 0xFFFF halt
fetch_count 4
write_count 0
read_count 0

This lesson is not about what these instructions did but about making sure the tools work. If the tools don't work then stop here.

---------------------------------------------------------------------
Lesson 1: LUI

---- lesson1.c
#include "tinyasm.c"
START("lesson1.csv");
lui(r1,0x0000);
lui(r2,0x0000);
lui(r1,0xABC0);
lui(r5,0x1200);
halt();
END
---- lesson1.c

When simulated the output is:

[0x0000] 0x6400 lui r1,0x000 (0x0000)
write_reg(r1,0x0000)
[0x0001] 0x6800 lui r2,0x000 (0x0000)
write_reg(r2,0x0000)
[0x0002] 0x66AF lui r1,0x2AF (0xABC0)
write_reg(r1,0xABC0)
[0x0003] 0x7448 lui r5,0x048 (0x1200)
write_reg(r5,0x1200)
[0x0004] 0xFFFF halt
fetch_count 5
write_count 0
read_count 0

What does it do?

Lui stands for load upper immediate. An immediate is a constant that is encoded in the machine code instruction itself. If you look at the reference material above, the instruction is 16 bits.
The number of bits used to figure out what instruction this is, the opcode, is 3 bits, and the number of bits in the instruction to indicate what register is going to get the immediate is 3 bits. So 16-3-3 = 10 bits left over. Our registers are 16 bits wide but the biggest immediate we can have is 10 bits. The upper means load those 10 bits into the upper 10 bits of the register. The lui instruction sets the lower 6 bits to zero, so all 16 bits in the register are modified by this instruction. No, unfortunately there is no lli, load lower immediate. We will learn in the next lesson how to load all 16 bits of a register with a desired constant.

Why do we need to load immediate values into registers? Registers are just like variables in a high level language. At some point before you use those variables you have to put something in them, be it using constants or by reading from some input like memory or a file or the user. If the variable is passed into a function then somewhere above that function there is ultimately a variable or variables that are loaded from somewhere before being used. A number of the instructions use registers as operands, so before we can use one of those instructions we must load the operand registers with some value.

The simulator shows us each time a register is written with a value. You can see that in each register write both the register number and the value line up with the instructions in the program.
---------------------------------------------------------------------
Lesson 2: ADDI for immediates

---- lesson2a.c
#include "tinyasm.c"
START("lesson2a.csv");
lui(r1,0x1200);
addi(r1,r1,0x0034);
lui(r1,0x5600);
addi(r1,r1,0x0078);
lui(r1,0x5640);
addi(r1,r1,0x0038);
halt();
END
---- lesson2a.c

When building the csv file you should see

do_addi limit fail pc = 3

So change that line (0x0078 becomes 0xFFF8):

---- lesson2b.c
#include "tinyasm.c"
START("lesson2b.csv");
lui(r1,0x1200);
addi(r1,r1,0x0034);
lui(r1,0x5600);
addi(r1,r1,0xFFF8);
lui(r1,0x5640);
addi(r1,r1,0x0038);
halt();
END
---- lesson2b.c

When simulated the output is:

[0x0000] 0x6448 lui r1,0x048 (0x1200)
write_reg(r1,0x1200)
[0x0001] 0x24B4 addi r1,r1,0x0034 (52)
write_reg(r1,0x1234)
[0x0002] 0x6558 lui r1,0x158 (0x5600)
write_reg(r1,0x5600)
[0x0003] 0x24F8 addi r1,r1,0xFFF8 (-8)
write_reg(r1,0x55F8)
[0x0004] 0x6559 lui r1,0x159 (0x5640)
write_reg(r1,0x5640)
[0x0005] 0x24B8 addi r1,r1,0x0038 (56)
write_reg(r1,0x5678)
[0x0006] 0xFFFF halt
fetch_count 7
write_count 0
read_count 0

What does it do?

Well for starters you can see that with the first two instructions I am placing the constant 0x1234 into r1. We know that the lui instruction sets the upper 10 bits to match the immediate and the lower 6 bits to zero (which also matches the immediate because the assembler complains otherwise). Now think about the addi description in the reference material above. It adds the immediate value to the second register and stores the result in the first register. So after the first lui, where 0x1200 is in r1, we are adding

r1=r1+0x0034

which is

r1=0x1200+0x0034
r1=0x1234

The constant we wanted in r1.

Now why didn't that work with 0x5678, why did the assembler complain for starters?
A drawback to this tiny assembler is that it cannot tell you a line number (I imagine there is some C magic you can use to get that to work), so it shows you the address in memory instead. Addresses are zero based, so the first lui is address 0, the first addi is address 1 and so on. The offending addi is the addi(r1,r1,0x0078). Why is that a problem? Well the syntax enforced by this assembler is that you must show the full sign extended constant. Looking at the reference material for the addi instruction it is a 7 bit immediate (16 - 3 bits for opcode - 3 bits for dest register - 3 bits for operand register = 7 bits left over). The constant 0x0078 is 0b0000000001111000, and if you count them up 0b1111000 is seven significant bits, so what is the problem? This is a signed integer: if the 7th bit is set then all the bits above it must be set, and if the 7th bit is zero then all the upper bits must be zero. For 0x0078 we have the 7th bit set but the upper bits zero. That is wrong, so we go ahead and sign extend, right? 0b1111111111111000 = 0xFFF8 and try that. Well that gives 0x5600+0xFFF8 = 0x55F8, which is not the 0x5678 we were looking for.

But notice the third pair of instructions. The constant 0x0038 = 0b0000000000111000, the lower 7 bits are 0b0111000 and the upper bits match the 7th bit, so that is a valid constant. If we look at 0x5640, 0b0101011001000000, the lower 6 bits are zero, so that is a valid lui constant. And when we add 0x5640+0x0038 we get 0x5678, the 16 bit value we wanted in r1.
Although it would be nice to split constants on 8 bit boundaries

0x1234 -> 0x1200+0x0034
0x5678 -> 0x5600+0x0078
0xABCD -> 0xAB00+0x00CD

we really have to split them 10 bits on the left and 6 on the right

0x1234 -> 0b0001001000110100 -> 0b0001001000 110100 -> 0x1200+0x0034
0x5678 -> 0b0101011001111000 -> 0b0101011001 111000 -> 0x5640+0x0038
0xABCD -> 0b1010101111001101 -> 0b1010101111 001101 -> 0xABC0+0x000D

The left number, as it is called here, is the pattern used with lui and the right number is the pattern used with the addi that follows.

lui r1,0x1200
addi r1,r1,0x0034

lui r2,0x5640
addi r2,r2,0x0038

lui r3,0xABC0
addi r3,r3,0x000D

---------------------------------------------------------------------
Lesson 3: ADDI for addition

---- lesson3.c
#include "tinyasm.c"
START("lesson3.csv");
lui(r1,0x1200);
lui(r2,0x3400);
addi(r3,r1,0x0003);
addi(r3,r3,0x0004);
addi(r4,r2,0x0005);
addi(r5,r0,0x0006);
addi(r2,r2,0xFFFF);
addi(r6,r1,0xFFFE);
addi(r1,r1,0xFFFD);
halt();
END
---- lesson3.c

When simulated the output is:

[0x0000] 0x6448 lui r1,0x048 (0x1200)
write_reg(r1,0x1200)
[0x0001] 0x68D0 lui r2,0x0D0 (0x3400)
write_reg(r2,0x3400)
[0x0002] 0x2C83 addi r3,r1,0x0003 (3)
write_reg(r3,0x1203)
[0x0003] 0x2D84 addi r3,r3,0x0004 (4)
write_reg(r3,0x1207)
[0x0004] 0x3105 addi r4,r2,0x0005 (5)
write_reg(r4,0x3405)
[0x0005] 0x3406 addi r5,r0,0x0006 (6)
write_reg(r5,0x0006)
[0x0006] 0x297F addi r2,r2,0xFFFF (-1)
write_reg(r2,0x33FF)
[0x0007] 0x38FE addi r6,r1,0xFFFE (-2)
write_reg(r6,0x11FE)
[0x0008] 0x24FD addi r1,r1,0xFFFD (-3)
write_reg(r1,0x11FD)
[0x0009] 0xFFFF halt
fetch_count 10
write_count 0
read_count 0

What does it do?

001aaabbbsssssss addi ra,rb,simm  ra = rb + simm

addi ra,rb,simm means ra = rb + simm where ra and rb are registers and simm is a signed immediate. The signed immediate is 7 bits, which suggests 0x00 to 0x7F, but the most significant bit is sign extended which means that the possible range of values is 0x0000 to 0x003F and 0xFFC0 to 0xFFFF.
Viewed as unsigned numbers that is 0 to 63 or 65472 to 65535, viewed as signed numbers 0 to 63 or -64 to -1.

The example starts by using lui to put some values in a couple of registers. We should not assume that the registers already hold some value, we need to put values in them before using them. The next line, the first addi, is

addi(r3,r1,0x0003);  r3 = r1 + 0x0003 = 0x1200 + 0x0003 = 0x1203

Next, there is no reason why you can't use the same register as both an operand and the result.

addi(r3,r3,0x0004);  r3 = r3 + 0x0004 = 0x1203 + 0x0004 = 0x1207

Next

addi(r4,r2,0x0005);  r4 = r2 + 0x0005 = 0x3400 + 0x0005 = 0x3405

Note on this one that r0 is always zero when used as an operand. So this is like moving the constant into r5, since anything added to 0 is itself.

addi(r5,r0,0x0006);  r5 = r0 + 0x0006 = 0x0000 + 0x0006 = 0x0006

Now some negative numbers, or large unsigned numbers. The sum wraps around to 16 bits, so adding 0xFFFF is the same as subtracting 1.

addi(r2,r2,0xFFFF);  r2 = r2 + 0xFFFF = 0x3400 + 0xFFFF = 0x33FF
addi(r6,r1,0xFFFE);  r6 = r1 + 0xFFFE = 0x1200 + 0xFFFE = 0x11FE
addi(r1,r1,0xFFFD);  r1 = r1 + 0xFFFD = 0x1200 + 0xFFFD = 0x11FD

At this point it should be painfully obvious that this is like programming in any other language, except there are limits on what you can do. The registers are just like variables, this instruction limits you to register = register + constant, and the constant has limits. How you use it, though, is like programming in any other language.
---------------------------------------------------------------------
Lesson 4: ADD

---- lesson4.c
#include "tinyasm.c"
START("lesson4.csv");
lui(r1,0x1100);
lui(r2,0x2200);
lui(r3,0x3300);
addi(r3,r3,0x0033);
add(r4,r1,r2);
add(r5,r1,r3);
add(r6,r3,r1);
add(r7,r2,r5);
add(r1,r0,r0);
halt();
END
---- lesson4.c

When simulated the output is:

[0x0000] 0x6444 lui r1,0x044 (0x1100)
write_reg(r1,0x1100)
[0x0001] 0x6888 lui r2,0x088 (0x2200)
write_reg(r2,0x2200)
[0x0002] 0x6CCC lui r3,0x0CC (0x3300)
write_reg(r3,0x3300)
[0x0003] 0x2DB3 addi r3,r3,0x0033 (51)
write_reg(r3,0x3333)
[0x0004] 0x1082 add r4,r1,r2
write_reg(r4,0x3300)
[0x0005] 0x1483 add r5,r1,r3
write_reg(r5,0x4433)
[0x0006] 0x1981 add r6,r3,r1
write_reg(r6,0x4433)
[0x0007] 0x1D05 add r7,r2,r5
write_reg(r7,0x6633)
[0x0008] 0x0400 add r1,r0,r0
write_reg(r1,0x0000)
[0x0009] 0xFFFF halt
fetch_count 10
write_count 0
read_count 0

What does it do?

Just like addi, think about programming in C or any other language. The registers are like variables and this instruction limits you to a variable equals the sum of two variables.
So reading and understanding this code should be easy:

lui(r1,0x1100);      r1 = 0x1100
lui(r2,0x2200);      r2 = 0x2200
lui(r3,0x3300);      r3 = 0x3300
addi(r3,r3,0x0033);  r3 = r3 + 0x0033 = 0x3300 + 0x0033 = 0x3333
add(r4,r1,r2);       r4 = r1 + r2 = 0x1100 + 0x2200 = 0x3300
add(r5,r1,r3);       r5 = r1 + r3 = 0x1100 + 0x3333 = 0x4433
add(r6,r3,r1);       r6 = r3 + r1 = 0x3333 + 0x1100 = 0x4433
add(r7,r2,r5);       r7 = r2 + r5 = 0x2200 + 0x4433 = 0x6633
add(r1,r0,r0);       r1 = r0 + r0 = 0x0000 + 0x0000 = 0x0000
halt();

---------------------------------------------------------------------
Lesson 5: NAND

---- lesson5.c
#include "tinyasm.c"
START("lesson5.csv");
lui(r1,0x1100);
addi(r1,r1,0x0011);
nand(r1,r1,r1);
nand(r1,r1,r1);
addi(r2,r0,0x003F);
nand(r3,r1,r2);
nand(r3,r3,r3);
nand(r4,r0,r0);
lui(r2,0x2300);
nand(r3,r2,r4);
nand(r5,r2,r3);
nand(r6,r4,r3);
nand(r7,r5,r6);
addi(r2,r2,0x0004);
nand(r3,r1,r1);
nand(r4,r2,r2);
nand(r5,r3,r4);
halt();
END
---- lesson5.c

When simulated the output is:

[0x0000] 0x6444 lui r1,0x044 (0x1100)
write_reg(r1,0x1100)
[0x0001] 0x2491 addi r1,r1,0x0011 (17)
write_reg(r1,0x1111)
[0x0002] 0x4481 nand r1,r1,r1
write_reg(r1,0xEEEE)
[0x0003] 0x4481 nand r1,r1,r1
write_reg(r1,0x1111)
[0x0004] 0x283F addi r2,r0,0x003F (63)
write_reg(r2,0x003F)
[0x0005] 0x4C82 nand r3,r1,r2
write_reg(r3,0xFFEE)
[0x0006] 0x4D83 nand r3,r3,r3
write_reg(r3,0x0011)
[0x0007] 0x5000 nand r4,r0,r0
write_reg(r4,0xFFFF)
[0x0008] 0x688C lui r2,0x08C (0x2300)
write_reg(r2,0x2300)
[0x0009] 0x4D04 nand r3,r2,r4
write_reg(r3,0xDCFF)
[0x000A] 0x5503 nand r5,r2,r3
write_reg(r5,0xFFFF)
[0x000B] 0x5A03 nand r6,r4,r3
write_reg(r6,0x2300)
[0x000C] 0x5E86 nand r7,r5,r6
write_reg(r7,0xDCFF)
[0x000D] 0x2904 addi r2,r2,0x0004 (4)
write_reg(r2,0x2304)
[0x000E] 0x4C81 nand r3,r1,r1
write_reg(r3,0xEEEE)
[0x000F] 0x5102 nand r4,r2,r2
write_reg(r4,0xDCFB)
[0x0010] 0x5584 nand r5,r3,r4
write_reg(r5,0x3315)
[0x0011] 0xFFFF halt
fetch_count 18
write_count 0
read_count 0

What does it do?
010aaabbb0000ccc nand ra,rb,rc  ra = ~(rb&rc)

Maybe you are asking yourself what is up with this weird instruction? Make up your mind, AND or NOT, but both? Well what you may or may not know is that if you have a NOR or a NAND you can build all of the other logical operations: AND, NOT, OR, XOR. The reference material above describes how this is done using truth tables. Knowing from the description that the nand instruction performs the operation ra = ~(rb&rc), at this point you should have no problem reading this code. After that we will look at what it actually does.

lui(r1,0x1100);      r1 = 0x1100
addi(r1,r1,0x0011);  r1 = r1 + 0x0011 = 0x1100 + 0x0011 = 0x1111
nand(r1,r1,r1);      r1 = ~(r1&r1) = ~(0x1111&0x1111) = ~(0x1111) = 0xEEEE
nand(r1,r1,r1);      r1 = ~(r1&r1) = ~(0xEEEE&0xEEEE) = ~(0xEEEE) = 0x1111
addi(r2,r0,0x003F);  r2 = r0 + 0x003F = 0x0000 + 0x003F = 0x003F
nand(r3,r1,r2);      r3 = ~(r1&r2) = ~(0x1111&0x003F) = ~(0x0011) = 0xFFEE
nand(r3,r3,r3);      r3 = ~(r3&r3) = ~(0xFFEE&0xFFEE) = ~(0xFFEE) = 0x0011
nand(r4,r0,r0);      r4 = ~(r0&r0) = ~(0x0000&0x0000) = ~(0x0000) = 0xFFFF
lui(r2,0x2300);      r2 = 0x2300
nand(r3,r2,r4);      r3 = ~(r2&r4) = ~(0x2300&0xFFFF) = ~(0x2300) = 0xDCFF
nand(r5,r2,r3);      r5 = ~(r2&r3) = ~(0x2300&0xDCFF) = ~(0x0000) = 0xFFFF
nand(r6,r4,r3);      r6 = ~(r4&r3) = ~(0xFFFF&0xDCFF) = ~(0xDCFF) = 0x2300
nand(r7,r5,r6);      r7 = ~(r5&r6) = ~(0xFFFF&0x2300) = ~(0x2300) = 0xDCFF
addi(r2,r2,0x0004);  r2 = r2 + 0x0004 = 0x2300 + 0x0004 = 0x2304
nand(r3,r1,r1);      r3 = ~(r1&r1) = ~(0x1111&0x1111) = ~(0x1111) = 0xEEEE
nand(r4,r2,r2);      r4 = ~(r2&r2) = ~(0x2304&0x2304) = ~(0x2304) = 0xDCFB
nand(r5,r3,r4);      r5 = ~(r3&r4) = ~(0xEEEE&0xDCFB) = ~(0xCCEA) = 0x3315
halt();

So what kinds of things did we just see? The most obvious is a simple not operation. If the two operands are the same register then we know that anding something with itself gives itself. So then you take the bitwise not of that and you get the bitwise not of the input register.
nand(r1,r1,r1);  r1 = ~(r1&r1) = ~(0x1111&0x1111) = ~(0x1111) = 0xEEEE

NOT(0x1111) = 0xEEEE

The next thing was a simple and

nand(r3,r1,r2);  r3 = ~(r1&r2) = ~(0x1111&0x003F) = ~(0x0011) = 0xFFEE
nand(r3,r3,r3);  r3 = ~(r3&r3) = ~(0xFFEE&0xFFEE) = ~(0xFFEE) = 0x0011

With the first nand you and the two operands but your result is inverted, so then you nand the result of the first operation against itself, and that is a not. The net result is an AND operation

0x1111 & 0x003F = 0x0011

From the reference material above we know that an xor operation, a xor b, can be computed using this sequence of nand operations, where d, e, and f are temporary variables.

d = a nand b
e = a nand d
f = b nand d
c = e nand f

nand(r3,r2,r4);  r3 = ~(r2&r4) = ~(0x2300&0xFFFF) = ~(0x2300) = 0xDCFF
nand(r5,r2,r3);  r5 = ~(r2&r3) = ~(0x2300&0xDCFF) = ~(0x0000) = 0xFFFF
nand(r6,r4,r3);  r6 = ~(r4&r3) = ~(0xFFFF&0xDCFF) = ~(0xDCFF) = 0x2300
nand(r7,r5,r6);  r7 = ~(r5&r6) = ~(0xFFFF&0x2300) = ~(0x2300) = 0xDCFF

The net result is r2 xor r4, 0x2300 xor 0xFFFF = 0xDCFF

From the reference material above we also know that a or b can be implemented using these instructions, where d and e are temporary variables

d = a nand a
e = b nand b
c = d nand e

nand(r3,r1,r1);  r3 = ~(r1&r1) = ~(0x1111&0x1111) = ~(0x1111) = 0xEEEE
nand(r4,r2,r2);  r4 = ~(r2&r2) = ~(0x2304&0x2304) = ~(0x2304) = 0xDCFB
nand(r5,r3,r4);  r5 = ~(r3&r4) = ~(0xEEEE&0xDCFB) = ~(0xCCEA) = 0x3315

r1 or r2 = 0x1111 or 0x2304 = 0x3315

---------------------------------------------------------------------
Lesson 6: pseudo instruction OR

---- lesson6.c
#include "tinyasm.c"
START("lesson6.csv");
lui(r1,0x1100);
addi(r1,r1,0x0011);
lui(r2,0x2300);
addi(r2,r2,0x0004);
or(r5,r1,r2,r3,r4);
halt();
END
---- lesson6.c

When simulated the output is:

[0x0000] 0x6444 lui r1,0x044 (0x1100)
write_reg(r1,0x1100)
[0x0001] 0x2491 addi r1,r1,0x0011 (17)
write_reg(r1,0x1111)
[0x0002] 0x688C lui r2,0x08C (0x2300)
write_reg(r2,0x2300)
[0x0003] 0x2904 addi r2,r2,0x0004 (4)
write_reg(r2,0x2304)
[0x0004] 0x4C81 nand r3,r1,r1
write_reg(r3,0xEEEE)
[0x0005] 0x5102 nand r4,r2,r2
write_reg(r4,0xDCFB)
[0x0006] 0x5584 nand r5,r3,r4
write_reg(r5,0x3315)
[0x0007] 0xFFFF halt
fetch_count 8
write_count 0
read_count 0

What does it do?

From lesson 5 and the reference material we learned that the or operation can be computed using nand and two spare registers. Renaming the variables to match the register arguments, a = b or c becomes:

d = b nand b
e = c nand c
a = d nand e

This interesting tiny assembler we are using makes it very easy to create pseudo instructions: instructions with a different name or function or syntax that can be implemented using one or more other instructions. Examine tinyasm.c and you will see:

void do_or ( unsigned int ra, unsigned int rb, unsigned int rc, unsigned int rd, unsigned int re )
{
    //a = b or c:
    //d = b nand b
    //e = c nand c
    //a = d nand e
    do_nand(rd,rb,rb);
    do_nand(re,rc,rc);
    do_nand(ra,rd,re);
}
...
#define or(ra,rb,rc,rd,re) do_or(ra,rb,rc,rd,re)

You should try making your own pseudo instructions. AND, NOT, XOR.
---------------------------------------------------------------------
Lesson 7: BEQ

---- lesson7.c
#include "tinyasm.c"
START("lesson7.csv");
declare(loop_top);
declare(loop_done);
addi(r1,r0,0x0005);
addi(r2,r0,0x0000);
label(loop_top);
addi(r2,r2,0x0001);
beq(r2,r1,loop_done);
beq(r0,r0,loop_top);
label(loop_done);
halt();
END
---- lesson7.c

When simulated the output is:

[0x0000] 0x2405 addi r1,r0,0x0005 (5)
write_reg(r1,0x0005)
[0x0001] 0x2800 addi r2,r0,0x0000 (0)
write_reg(r2,0x0000)
[0x0002] 0x2901 addi r2,r2,0x0001 (1)
write_reg(r2,0x0001)
[0x0003] 0xC881 beq r2,r1,0x0005 (0x0001 0x0005)
[0x0004] 0xC07D beq r0,r0,0x0002 (0x0000 0x0000)
[0x0002] 0x2901 addi r2,r2,0x0001 (1)
write_reg(r2,0x0002)
[0x0003] 0xC881 beq r2,r1,0x0005 (0x0002 0x0005)
[0x0004] 0xC07D beq r0,r0,0x0002 (0x0000 0x0000)
[0x0002] 0x2901 addi r2,r2,0x0001 (1)
write_reg(r2,0x0003)
[0x0003] 0xC881 beq r2,r1,0x0005 (0x0003 0x0005)
[0x0004] 0xC07D beq r0,r0,0x0002 (0x0000 0x0000)
[0x0002] 0x2901 addi r2,r2,0x0001 (1)
write_reg(r2,0x0004)
[0x0003] 0xC881 beq r2,r1,0x0005 (0x0004 0x0005)
[0x0004] 0xC07D beq r0,r0,0x0002 (0x0000 0x0000)
[0x0002] 0x2901 addi r2,r2,0x0001 (1)
write_reg(r2,0x0005)
[0x0003] 0xC881 beq r2,r1,0x0005 (0x0005 0x0005)
[0x0005] 0xFFFF halt
fetch_count 17
write_count 0
read_count 0

What does it do?

You may be familiar with processors that use flags, where the z flag means equal or the c flag means carry. We don't have flags here, not for conditional branches. We have a branch if equal instruction which includes the operands to be compared. It does it all in one instruction, good, bad or otherwise.

110aaabbbsssssss beq ra,rb,simm

If the contents of registers ra and rb are equal then the program counter is modified as pc = (pc+1) + simm, where pc is the address of the beq instruction in question. For example the beq at address 0x0004 is encoded as 0xC07D. Its immediate field is 0x7D, which sign extends to 0xFFFD (-3), so the branch goes to 4 + 1 - 3 = 2, the top of the loop. Normally you don't encode the offset yourself in a branch instruction, you let the assembler compute it for you and you use labels.
This tiny assembler language is a little strange in its use of labels. In a normal C program you would just put the label:

unsigned int ra;
unsigned int rb;
ra=5;
rb=0;
loop_top:
rb=rb+1;
if(rb==ra) goto loop_done;
goto loop_top;
loop_done:
...

And using a normal assembler you would pretty much do the same thing. With this tiny assembler C thing, at some point before you use the label you declare it.

declare(loop_top);

Then at the place in the code where you want the label you put the label

label(loop_top);

This program does what the C code above does. We start by loading a 5 into register r1, and then a 0 into register r2. We add 1 to r2, then compare r1 and r2 and branch if they are equal. So when r2 counts to 5, r1 and r2 will be equal and the branch will happen. Otherwise you don't take that branch and you look at the next bran