Thursday, May 16, 2019

RISC & Pipelining

What is RISC Architecture?

* RISC stands for Reduced Instruction Set Computer.
* An instruction set is a set of instructions that helps the user construct machine language programs to do computable tasks.

History

* In the early days, CPUs consumed a lot of resources for processing.
* Because of this, in 1980 David Patterson of the University of California, Berkeley introduced the RISC concept.
* It included fewer instructions with simple constructs, which gave faster execution and lower memory usage by the CPU.
* It took approximately a year to design and fabricate RISC I in silicon.
* In 1983, Berkeley RISC II was produced. It was with RISC II that the RISC idea was opened to the industry.
* In later years it was incorporated into Intel processors.
* After some years, a revolution took place between the two instruction sets, whereby RISC started incorporating more complex instructions and CISC started to reduce the complexity of its instructions.
* By the mid-1990s some RISC processors had become more complex than CISC processors.
* Today the difference between RISC and CISC is blurred.

Characteristics and Comparisons

* As mentioned, the difference between RISC and CISC is disappearing, but these were the initial differences between the two.

RISC | CISC
Fewer instructions | More instructions (100-250)
More registers, hence more on-chip memory (faster) | Fewer registers
Operations done within the registers of the CPU | Operations can be done external to the CPU, e.g. in memory
Fixed-length instruction format, hence easily decoded | Variable-length instruction format
Instruction execution in one clock cycle, hence simpler instructions | Instruction execution in multiple clock cycles
Hardwired, hence faster | Microprogrammed
Fewer addressing modes | A variety of addressing modes

Addressing modes: register direct, immediate addressing, absolute addressing.

Give examples of one set of instructions for a particular operation, and of instruction formats: http://www-cs-faculty.stanford.edu/~eroberts/courses/soco/projects/2000-01/risc/risccisc/

Advantages and Disadvantages

* Speed of instruction execution is improved.
* Quicker time to market for the processors, since fewer instructions take less time to design and fabricate.
* Smaller chip size, because fewer transistors are involved.
* Consumes less power and hence dissipates less heat.
* Less expensive, because of fewer transistors.
* Because of the fixed length of the instructions, it does not use memory efficiently.
* For complex operations, the number of instructions will be larger.

Pipelining

The origin of pipelining is thought to be in the early 1940s. The processor has specialised units for executing each stage in the instruction cycle, and the instructions are performed concurrently. It is like an assembly line. (IF = instruction fetch, ID = instruction decode, OF = operand fetch, OE = operand execute, OS = operand store.)

Time steps (clocks)   1   2   3   4   5   6   7   8
Instruction 1         IF  ID  OF  OE  OS
Instruction 2             IF  ID  OF  OE  OS
Instruction 3                 IF  ID  OF  OE  OS
Instruction 4                     IF  ID  OF  OE  OS

Pipelining is used to increase the speed of the processor by overlapping various stages in the instruction cycle. It improves the instruction execution bandwidth. Each instruction takes 5 clock cycles to complete. When pipelining is used, the first instruction takes 5 clock cycles, but each following instruction finishes 1 clock cycle after the previous one.

Types of Pipelining

There are various types of pipelining. These include the arithmetic pipeline, the instruction pipeline, superpipelining, superscaling and vector processing.

Arithmetic pipeline: Used to deal with scientific problems like floating point operations and fixed point multiplications. These operations consist of different segments or sub-operations, which can be performed concurrently, leading to faster execution.

Instruction pipeline: This is the ordinary pipelining, which has been explained above.
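The timing described above (five clock cycles for the first instruction, then one more cycle for each following instruction) can be sketched in Python. This is only a minimal sketch that assumes an ideal pipeline with no stalls; the function names are made up for illustration and do not come from any library.

```python
STAGES = ["IF", "ID", "OF", "OE", "OS"]  # the five stages named in the text


def cycles_without_pipelining(n_instructions, n_stages=len(STAGES)):
    """Each instruction runs through every stage before the next one starts."""
    return n_instructions * n_stages


def cycles_with_pipelining(n_instructions, n_stages=len(STAGES)):
    """The first instruction fills the pipeline; each later one finishes
    one cycle after its predecessor (assumes no stalls)."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def timing_diagram(n_instructions, stages=STAGES):
    """Return one text row per instruction, shifted right by one cycle each."""
    rows = []
    for i in range(n_instructions):
        cells = ["  "] * i + stages  # i empty cells, then the stage names
        rows.append(" ".join(cells))
    return rows


if __name__ == "__main__":
    for row in timing_diagram(4):
        print(row)
    print(cycles_without_pipelining(4), "cycles without pipelining")
    print(cycles_with_pipelining(4), "cycles with pipelining")
```

For four instructions this gives 20 cycles without pipelining and 8 cycles with it, which matches the four-row diagram above.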
Pipeline Hazards

Data Dependency

This occurs when two or more instructions attempt to share the same data resource, that is, when an instruction is trying to access or edit data which is being modified by another instruction. There are three types of data dependency (instruction i comes before instruction j in program order):

RAW (Read After Write): This happens when instruction j reads before instruction i writes the data. This means that the value read is too old.

WAR (Write After Read): This happens when instruction j writes before instruction i reads the data. This means that the value read is too new.

WAW (Write After Write): This happens when instruction j writes before instruction i writes the data. This means that a wrong value is stored.

Solutions to Data Dependency

* Stall the pipeline: A data dependency is predicted and the subsequent instructions are not allowed to enter the pipeline. Special hardware is needed to predict the data dependency, and a time delay is caused.
* Flush the pipeline: When a data dependency occurs, all other instructions are removed from the pipeline. This also causes a time delay.
* Delayed load: No Operation (NOP) instructions are inserted between data dependent instructions. This is done by the compiler and it avoids the data dependency.

Clock cycle         1   2   3   4   5   6
1. Load R1          IF  OE  OS
2. Load R2              IF  OE  OS
3. Add R1 + R2              IF  OE  OS
4. Store R3                     IF  OE  OS

Clock cycle         1   2   3   4   5   6   7
1. Load R1          IF  OE  OS
2. Load R2              IF  OE  OS
3. NOP                      IF  OE  OS
4. Add R1 + R2                  IF  OE  OS
5. Store R3                         IF  OE  OS

Branch Dependency

This happens when one instruction in the pipeline branches to another instruction. Since the following instructions have already entered the pipeline when the branch occurs, a branch penalty is incurred.

Solutions to Branch Dependency

1. Branch prediction: The outcome of a branch is predicted and instructions are pipelined accordingly.
2. Branch target buffer
3. Delayed branch: The compiler predicts branch dependencies and rearranges the code in such a way that the branch dependency is avoided. No operation instructions can also be used.

1. LOAD MEM100 R1
2. INCREMENT R2
3. ADD R3 R3 + R4
4. SUB R6 R6-R5
5. BRA X

Clock cycle                 1   2   3   4   5   6   7   8   9
1. Load                     IF  OE  OS
2. Increment                    IF  OE  OS
3. Add                              IF  OE  OS
4. Subtract                             IF  OE  OS
5. Branch to X                              IF  OE  OS
6. Following instructions                       IF  OE  OS

Clock cycle                 1   2   3   4   5   6   7   8   9
1. Load                     IF  OE  OS
2. Increment                    IF  OE  OS
3. Add                              IF  OE  OS
4. Subtract                             IF  OE  OS
5. Branch to X                              IF  OE  OS
6. NOP                                          IF  OE  OS
7. Instructions in X                                IF  OE  OS
Adding NOP instructions

Clock cycle                 1   2   3   4   5   6   7   8
1. Load                     IF  OE  OS
2. Increment                    IF  OE  OS
3. Branch to X                      IF  OE  OS
4. Add                                  IF  OE  OS
5. Subtract                                 IF  OE  OS
6. Instructions in X                            IF  OE  OS
Rearranging the instructions

Intel Pentium 4 processors have 20-stage pipelines. Today, most of these circuits can be found embedded inside most microprocessors.

Superscaling

Superscaling is a form of parallelism combined with pipelining. It has a redundant execution unit which provides the parallelism.

Superscalar: 1984, Star Technologies, Roger Chen.

IF  ID  OF  OE  OS
IF  ID  OF  OE  OS
    IF  ID  OF  OE  OS
    IF  ID  OF  OE  OS
        IF  ID  OF  OE  OS
        IF  ID  OF  OE  OS
            IF  ID  OF  OE  OS
            IF  ID  OF  OE  OS

Superpipelining

Superpipelining is the implementation of longer pipelines, that is, pipelines with more stages. It is mainly useful when some stages in the pipeline take longer than the others. The longest stage determines the clock cycle, so if these long stages can be broken down into smaller stages, the clock cycle time can be reduced. This reduces the time wasted, which will be significant if a large number of instructions are performed. Superpipelining is simple because it does not need any additional hardware, unlike superscaling. However, there will be more side effects with superpipelining, since the number of stages in the pipeline is increased.
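The superpipelining claim that the longest stage determines the clock cycle can be illustrated with a small Python sketch. The stage latencies below are invented numbers chosen only for illustration, the function names are hypothetical, and the model assumes an ideal pipeline with no stalls.

```python
def clock_period(stage_latencies_ns):
    """The clock must be slow enough for the slowest stage to finish."""
    return max(stage_latencies_ns)


def total_time_ns(stage_latencies_ns, n_instructions):
    """Ideal pipelined run time: (stages + instructions - 1) cycles,
    each cycle lasting one clock period (assumes no stalls)."""
    if n_instructions == 0:
        return 0
    cycles = len(stage_latencies_ns) + n_instructions - 1
    return cycles * clock_period(stage_latencies_ns)


# Hypothetical latencies in nanoseconds: the third stage is the slow one.
base_pipeline = [1, 1, 2]
# Superpipelined variant: the 2 ns stage is split into two 1 ns stages.
deeper_pipeline = [1, 1, 1, 1]

if __name__ == "__main__":
    for name, stages in (("base", base_pipeline), ("deeper", deeper_pipeline)):
        print(name, clock_period(stages), "ns clock,",
              total_time_ns(stages, 100), "ns for 100 instructions")
```

With these made-up numbers the deeper pipeline needs one extra cycle to fill, but its shorter clock period roughly halves the total time for 100 instructions (103 ns against 204 ns).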
There will also be a longer delay when there is a data or branch dependency.

Vector Processing

Vector processors: 1970s.

Vector processors pipeline the data as well, not just the instructions. For example, if many numbers need to be added together, like adding 10 pairs of numbers, a normal processor adds one pair at a time. This means the same sequence of instruction fetching and decoding has to be carried out 10 times. In vector processing, since the data is also pipelined, the instruction fetch and decode only occur once and the 10 pairs of numbers (operands) are fetched altogether. Thus the time to process the instructions is reduced significantly.

C(1:10) = A(1:10) + B(1:10)

Vector processors are mainly used in specialised applications like long range weather forecasting, artificial intelligence systems, image processing etc. Analysing the performance limitations of the rather conventional CISC style architectures of the period, it was discovered very quickly that operations on vectors and matrices were among the most demanding CPU bound numerical computational problems faced.

RISC Pipelining

RISC has simple instructions. This simplicity is utilised to reduce the number of stages in the instruction pipeline. For example, the instruction decode stage is not necessary because the encoding in RISC architecture is simple. Operands are all stored in the registers, hence there is no need to fetch them from memory. This reduces the number of stages further. Therefore, for pipelining with RISC architecture, the stages in the pipeline are instruction fetch, operand execute and operand store. Because the instructions are of fixed length, each stage in the RISC pipeline can be executed in one clock cycle.

Questions

1. Is vector processing a type of pipelining?
2. RISC and pipelining

The simplest way to examine the advantages and disadvantages of RISC architecture is by contrasting it with its predecessor, CISC (Complex Instruction Set Computers) architecture.

Multiplying Two Numbers in Memory

On the right is a diagram representing the storage scheme for a generic computer. The main memory is divided into locations numbered from (row) 1: (column) 1 to (row) 6: (column) 4. The execution unit is responsible for carrying out all computations. However, the execution unit can only operate on data that has been loaded into one of the six registers (A, B, C, D, E, or F). Let's say we want to find the product of two numbers, one stored in location 2:3 and another stored in location 5:2, and then store the product back in location 2:3.

The CISC Approach

The primary goal of CISC architecture is to complete a task in as few lines of assembly as possible. This is achieved by building processor hardware that is capable of understanding and executing a series of operations. For this particular task, a CISC processor would come prepared with a specialised instruction (we'll call it MULT). When executed, this instruction loads the two values into separate registers, multiplies the operands in the execution unit, and then stores the product in the appropriate register. Thus, the entire task of multiplying two numbers can be completed with one instruction:

MULT 2:3, 5:2

MULT is what is known as a complex instruction. It operates directly on the computer's memory banks and does not require the programmer to explicitly call any loading or storing functions. It closely resembles a command in a higher level language. For instance, if we let a represent the value of 2:3 and b represent the value of 5:2, then this command is identical to the C statement a = a * b.
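As a rough illustration of what a complex instruction like MULT hides from the programmer, here is a small Python sketch. It is a toy model only: the memory is a dictionary keyed by (row, column), the stored values 6 and 7 are invented, and the `mult` function is a hypothetical stand-in that writes the product back to the first location, as in a = a * b.

```python
# Toy main memory keyed by (row, column); the values 6 and 7 are made up.
memory = {(2, 3): 6, (5, 2): 7}


def mult(memory, dest, src):
    """Hypothetical complex instruction: load both operands, multiply them,
    and store the product back, all in one programmer-visible step."""
    a = memory[dest]      # implicit load of the first operand
    b = memory[src]       # implicit load of the second operand
    memory[dest] = a * b  # implicit store of the product


mult(memory, (2, 3), (5, 2))  # corresponds to MULT 2:3, 5:2
print(memory[(2, 3)])         # prints 42
```

The single call does the loading, multiplying and storing internally, which is why the programmer only writes one line.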
One of the primary advantages of this system is that the compiler has to do very little work to translate a high-level language statement into assembly. Because the length of the code is relatively short, very little RAM is required to store instructions. The emphasis is put on building complex instructions directly into the hardware.

The RISC Approach

RISC processors only use simple instructions that can be executed within one clock cycle. Thus, the MULT command described above could be divided into three separate commands: LOAD, which moves data from the memory bank to a register; PROD, which finds the product of two operands located within the registers; and STORE, which moves data from a register to the memory banks. In order to perform the exact series of steps described in the CISC approach, a programmer would need to code four lines of assembly:

LOAD A, 2:3
LOAD B, 5:2
PROD A, B
STORE 2:3, A

At first, this may seem like a much less efficient way of completing the operation. Because there are more lines of code, more RAM is needed to store the assembly level instructions. The compiler must also perform more work to convert a high-level language statement into code of this form.

CISC | RISC
Emphasis on hardware | Emphasis on software
Includes multi-clock complex instructions | Single-clock, reduced instructions only
Memory-to-memory: LOAD and STORE incorporated in instructions | Register-to-register: LOAD and STORE are independent instructions
Small code sizes, high cycles per second | Low cycles per second, large code sizes
Transistors used for storing complex instructions | Spends more transistors on memory registers

However, the RISC strategy also brings some very important advantages.
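Before turning to those advantages, the four-line LOAD / PROD / STORE sequence can be sketched as a toy interpreter in Python. This is a hypothetical model written for illustration, not a real instruction set simulator: memory is a dictionary keyed by the "row:column" strings used in the text, and the stored values 6 and 7 are invented.

```python
def run(program, memory):
    """Execute (opcode, operand1, operand2) tuples on a toy load/store machine
    and return the final register contents."""
    registers = {}
    for op, x, y in program:
        if op == "LOAD":     # LOAD reg, addr: copy memory into a register
            registers[x] = memory[y]
        elif op == "PROD":   # PROD reg1, reg2: reg1 = reg1 * reg2
            registers[x] = registers[x] * registers[y]
        elif op == "STORE":  # STORE addr, reg: copy a register into memory
            memory[x] = registers[y]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return registers


# Toy memory; the values 6 and 7 are made up.
memory = {"2:3": 6, "5:2": 7}
program = [
    ("LOAD", "A", "2:3"),
    ("LOAD", "B", "5:2"),
    ("PROD", "A", "B"),
    ("STORE", "2:3", "A"),
]
registers = run(program, memory)
print(memory["2:3"], registers)
```

Each tuple does one small job, so the program is four instructions long instead of one, but every step only touches registers or performs a single memory transfer.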
Because each instruction requires only one clock cycle to execute, the entire program will execute in approximately the same amount of time as the multi-cycle MULT command. These RISC "reduced instructions" require fewer transistors of hardware space than the complex instructions, leaving more room for general purpose registers. Because all of the instructions execute in a uniform amount of time (i.e. one clock), pipelining is possible.

Separating the LOAD and STORE instructions actually reduces the amount of work that the computer must perform. After a CISC-style MULT command is executed, the processor automatically erases the registers. If one of the operands needs to be used for another computation, the processor must re-load the data from the memory bank into a register. In RISC, the operand will remain in the register until another value is loaded in its place.

The Performance Equation

The following equation is commonly used for expressing a computer's performance ability:

time/program = time/cycle x cycles/instruction x instructions/program

The CISC approach attempts to minimize the number of instructions per program, sacrificing the number of cycles per instruction. RISC does the opposite, reducing the cycles per instruction at the cost of the number of instructions per program.

RISC Roadblocks

Despite the advantages of RISC based processing, RISC chips took over a decade to gain a foothold in the commercial world. This was largely due to a lack of software support. Although Apple's Power Macintosh line featured RISC-based chips and Windows NT was RISC compatible, Windows 3.1 and Windows 95 were designed with CISC processors in mind. Many companies were unwilling to take a chance with the emerging RISC technology. Without commercial interest, processor developers were unable to manufacture RISC chips in large enough volumes to make their price competitive. Another major setback was the presence of Intel.
Although their CISC chips were becoming increasingly unwieldy and difficult to develop, Intel had the resources to plow through development and produce powerful processors. Although RISC chips might surpass Intel's efforts in specific areas, the differences were not great enough to persuade buyers to change technologies.

The Overall RISC Advantage

Today, the Intel x86 is arguably the only chip which retains CISC architecture. This is primarily due to advancements in other areas of computer technology. The price of RAM has decreased dramatically. In 1977, 1MB of DRAM cost about $5,000. By 1994, the same amount of memory cost only $6 (when adjusted for inflation). Compiler technology has also become more sophisticated, so that the RISC use of RAM and emphasis on software has become ideal.
