VLSI Knowledge Transfer: 2016

Thursday, December 22, 2016

Multiple-Commands-in-one-line

This is how to run multiple commands on one shell command line:

A; B Run A and then B, regardless of whether A succeeded

A && B Run B only if A succeeded

A || B Run B only if A failed

A & Run A in the background

Source : Internet

Friday, September 9, 2016

1. GLS (gate-level simulation) is a step in the design flow that checks the design still meets its functionality after placement and routing.

2. Inputs needed to perform GLS: the post-route netlist, the testbench, and the SDF (Standard Delay Format) file.

3. The SDF file holds the delay information for every cell and wire.

4. To generate the SDF, read in the routed netlist and the extracted parasitics file (SPEF, Standard Parasitic Exchange Format, produced by an extraction tool such as Synopsys StarRC).

5. Q: I have a doubt. Say I perform formal verification, that is, a logical equivalence check across the gate-level netlists (synthesized netlist versus post-route netlist). Do you still see a reason for GLS?

Answer: If we have verified that the synthesized netlist is functionally correct against the RTL, and the synthesized netlist is logically equivalent to the post-route netlist, then I think we may not require GLS after P&R (placement and routing) for functionality. But how do we ensure timing, sir? To my knowledge, a logical equivalence check does not perform timing checks and does not ensure that the design will work at the operating frequency, so I would still go for GLS on the post-route database.

6. Q: I partially agree. Say I perform static timing analysis (STA) after routing: I take the post-route netlist, the extracted parasitics file and the design timing constraints, and run all possible timing checks (setup, hold, clock gating, …). Do you still see a reason for GLS after post-route?

Answer: I agree that STA checks all the possible cases and corners, places the chip in different modes, and so on. But I still see GLS as a superset of STA.

If the designer has by mistake placed wrong timing exceptions such as false paths or multi-cycle paths, how do we ensure that the design will meet its timing requirements? I feel there should be some mechanism to validate them as a counter-check, so I still feel GLS is needed on the post-route design, sir. If the design is not synchronous-friendly, or is purely asynchronous, STA will not help us much. One more reason for GLS is to ensure that the design comes out of reset correctly and that the reset sequences, initialization sequences and boot-up are fine. So I feel GLS is mandatory, though it has limitations: ensuring the quality of the test vectors, ensuring that the vectors cover the complete design (what I mean is coverage analysis), simulation run time, and things like that.

GLS gives the "guarantee that the design meets its functionality".

Gate-level simulation represents a small slice of what should actually be tested for a tape-out. First, it offers a warm feeling that what you are going to get back will actually work; second, it offers some confidence that your static timing constraints are correct.

The common reasons to go for gate-level simulation are as follows:
To check that the reset release, initialization sequence and boot-up sequence are proper.
STA tools do not verify asynchronous interfaces.
Unintended dependencies on initial conditions can be found through GLS.
It is good for verifying the functionality and timing of circuits and paths that are not covered by STA tools.
Design changes can lead to incorrect false-path/multi-cycle-path entries in the design constraints.
It gives an excellent feeling that the design is implemented correctly.

So before shipping a design to tape-out, we run only a limited set of gate-level simulations, because GLS comes with some difficulties:
It takes a lot of setting up and debugging.
It takes a huge amount of computing resources (CPU time, and disk space for storing waveforms).
RTL simulations alone take multiple days of run time even for a single regression; GLS takes about 10 times longer.
Generating debug data (VCD, Debussy) is close to impossible with GLS because of the run time and the size of the dumps.

In my opinion, gate-level simulations are needed mainly to verify environment and initialization issues.

Source: Internet

Saturday, June 4, 2016

Programming block and Module

always begin
  @(posedge clk) $display("at the posedge of clk");
end

and

initial begin
  forever @(posedge clk) $display("at the posedge of clk");
end

The difference between the two is mainly one of intent. Some synthesis tools ignore all the code in an initial block, assuming it is for simulation only and does not describe hardware to be synthesized.

Technically, there are a few things you can do with a forever statement that you cannot do with an always block. As a looping statement, a forever loop can be exited with break, and if you name the statement, you can disable it. So you can terminate the process created by an initial block. There is no way to terminate the process created by an always block.
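As a small sketch of that difference, the testbench below stops a forever loop by disabling the named block that contains it. The module name tb_disable, the block name watcher and the delay values are made up for illustration.

```systemverilog
module tb_disable;
  reg clk = 1'b0;
  always #5 clk = ~clk;            // free-running clock, rising edges at 5, 15, 25, ...

  // Process started by an initial block; naming the block makes it disable-able
  initial begin : watcher
    forever begin
      @(posedge clk);
      $display("%0t: at the posedge of clk", $time);
    end
  end

  initial begin
    #32 disable watcher;           // ends the forever loop after the edge at time 25
    #20 $finish;                   // clk keeps toggling, but nothing more is printed
  end
endmodule
```

If the same named block sat inside an always block, disable would only end the current pass and the always process would start over, which is why the messages could not be stopped this way.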

Program Block :

The program block came from the Vera verification language that was donated to SystemVerilog. In Vera, a program was a single procedure that represented the "test". Your test was started at time 0 and when the test terminated, the program terminated the simulation. If you needed multiple test threads, you either had to use the fork statement to start it, or use multiple programs. When the last program terminated, the simulation terminated.

As part of the integration with SystemVerilog, the program was turned into a module-like construct with ports and initial blocks are now used to start the test procedure. Because an always block never terminates, it was kept out of the program block so the concept of test termination would still be there.

Today, most people do not utilize this termination feature because the OVM/UVM have their own test termination mechanisms. The program block is no longer a necessary feature of the language other than to help people converting over from Vera to SystemVerilog.

A module can contain always blocks; a program block cannot.

Example :

program test;
 initial 
   begin
     fork
       $display($time, " a");
       #10 $display($time, " b");
       #20 $display($time, " c");
       $display($time, " d");
     join_none
     $display($time, " e");
   end
endprogram

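Output (the program ends the simulation as soon as its initial block finishes, so the delayed b and c threads never print):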
0 e
0 a
0 d

module test;
 initial 
   begin
     fork
       $display($time, " a");
       #10 $display($time, " b");
       #20 $display($time, " c");
       $display($time, " d");
     join_none
     $display($time, " e");
   end
endmodule

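Output (a module does not end the simulation when its initial block finishes, so the delayed threads also print):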
0 e
0 a
0 d
10 b
20 c

Monday, May 30, 2016

Difference between $monitor, $display, $strobe and $write

$display : prints the values immediately, when the statement executes
- § 21.2.1 The display and write tasks
$strobe : prints the values at the end of the current time step
- § 21.2.2 Strobed monitoring
$monitor : prints the values at the end of the current time step whenever any of them changed. Only one $monitor can be active at a time; a later call replaces the previous one.
- § 21.2.3 Continuous monitoring
$write : same as $display but does not append a newline (\n)
- § 21.2.1 The display and write tasks

Example:

reg [3:0] a,b;
integer i;
initial begin
  $monitor("monitor a:%h b:%h @ %0t", a, b, $time);
  for(i=0; i<4; i=i+1) begin
    $strobe("strobe  a:%h b:%h @ %0t", a, b, $time);
    $display("display a:%h b:%h @ %0t", a, b, $time);
    case(i)
      0 : a = 4;
      1 : b = 1;
      2 : begin end // do nothing
      3 : {a,b} = 9;
    endcase
    $display("display a:%h b:%h @ %0t", a, b, $time);
    #1;
  end
end

Outputs: (notice the print order and that monitor is not displayed at time 2)

display a:x b:x @ 0
display a:4 b:x @ 0
monitor a:4 b:x @ 0
strobe a:4 b:x @ 0
display a:4 b:x @ 1
display a:4 b:1 @ 1
monitor a:4 b:1 @ 1
strobe a:4 b:1 @ 1
display a:4 b:1 @ 2
display a:4 b:1 @ 2
strobe a:4 b:1 @ 2
display a:4 b:1 @ 3
display a:0 b:9 @ 3
monitor a:0 b:9 @ 3
strobe a:0 b:9 @ 3

Saturday, May 28, 2016

AXI Protocol and related Interview Questions

AMBA AXI targets high-performance, high-frequency designs and is suitable for high-speed submicron interconnect.

Features:

1. Separate address/control and data phases

2. Support for unaligned data transfers using byte strobes

3. Backward compatibility with existing AHB and APB interfaces

Architecture:

1. The AXI protocol is burst based

2. Every transaction has address and control information on the address channel

3. There are 5 channels: read address, read data, write address, write data and write response.

1. Difference between AXI3 and AXI4

1. AXI3 supports burst lengths of up to 16 beats only, while AXI4 supports up to 256 beats (for INCR bursts).
2. AXI3 supports write interleaving; AXI4 does NOT.
3. AXI3 supports locked transfers; AXI4 does NOT.
4. AXI4 supports QoS signaling; AXI3 does NOT.

I have seen many IP providers, e.g. Synopsys, supporting burst lengths of up to 256 beats in AXI3.
I have also seen many IP providers, e.g. Synopsys, NOT supporting write interleaving in AXI3.

It looks like the industry norm is to use AXI3 with burst lengths of up to 256 beats and without support for write interleaving.

2. Why is there no separate response channel for read bursts?
I would guess it is because the VALID/READY handshake mechanism only allows for traffic flow in one direction, so for read transactions the traffic flow is slave to master for both data and response, sharing a VALID/READY handshake, whereas for write transactions the data is master to slave, but the response is slave to master, hence the response needing a separate channel to support the required VALID/READY controls.

Difference between AHB and AXI?

What is AXI Lite?

Name five special features of AXI?

Why is streaming supported, and what are its advantages?

Write an assertion on the handshake signals ready and valid: ready comes 5 cycles after valid goes high.

Explain AXI read transaction

What is the AXI capability of data interleaving?

Explain out-of-order transaction support on AXI?

Explain multiple outstanding address pending?

Any flow control mechanism in AXI?

How to ensure data integrity on AXI?

What is 'last' signal?

What are bursts and transfers?

Maximum size of a transfer?

Write response codes?

What is strobing in AXI?
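For the assertion question in the list above, one possible answer is sketched below. The question is ambiguous, so this assumes READY must be high exactly five clocks after VALID rises; the checker module, its port names and the property names are hypothetical. The second property encodes the AXI rule that VALID, once asserted, must stay asserted until the handshake.

```systemverilog
module handshake_checker (
  input logic aclk,
  input logic aresetn,   // active-low reset
  input logic valid,
  input logic ready
);
  // Assumed reading: READY is high exactly 5 clocks after VALID rises.
  // For "within 5 clocks", replace ##5 with ##[1:5].
  property p_ready_after_5;
    @(posedge aclk) disable iff (!aresetn)
      $rose(valid) |-> ##5 ready;
  endproperty

  // AXI rule: once asserted, VALID stays high until the handshake completes.
  property p_valid_held;
    @(posedge aclk) disable iff (!aresetn)
      (valid && !ready) |=> valid;
  endproperty

  a_ready_after_5 : assert property (p_ready_after_5)
    else $error("READY not high 5 cycles after VALID rose");

  a_valid_held : assert property (p_valid_held)
    else $error("VALID dropped before the handshake");
endmodule
```

A checker like this would typically be attached to the interface with a bind statement or instantiated in the testbench next to the DUT.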

1) Why is there no write response for each beat of a write burst, when there is a separate read response for each beat of a read burst?

Answer: All of the AXI channels pass information in only one direction (only the xREADY signal goes against the channel direction), so for a slave to give a response back to the master for a write transaction, a separate channel is needed.

I guess this channel could have been defined to include a BRESP for each write data item, but this would increase the bandwidth requirement for the channel. Since most applications simply repeat the complete transaction after a non-OKAY response, few would make use of the additional detail of which transfer in a write burst caused the failure.

You do get an RRESP response for each read data item because the read data channel already runs from slave to master at full data bandwidth, so the response rides along with each beat.

2) How do you terminate a read/write burst early? The specification says bursts cannot be stopped partway through.

Answer: Simple answer, you cannot.

As soon as the AXI master indicates that it will perform X transfers in a transaction, it must complete X transfers. There is no "Early Burst Termination" concept like there was in AHB.

For write transactions the master could complete the burst while driving all WSTRB bits to logic '0' (dummy accesses), so that no data is actually written to the slave. For read transactions there is no equivalent, so "real" read accesses will be completed.

In AHB, bursts can be terminated early either as a result of the arbiter removing HGRANT from a master partway through a burst, or after a slave returns a non-OKAY response to any beat of a burst. Note however that a master cannot decide to terminate a defined-length burst unless prompted to do so by the arbiter or by slave responses.

All AHB masters, slaves and arbiters must be designed to support Early Burst Termination.

3) Can a master assert WLAST in the middle of a burst transfer?

Answer: No. WLAST can only be asserted while WVALID is high and the final WDATA of a burst is being transferred. Indicating WLAST (and WVALID) too early in a burst would be a protocol violation.
Also, many slave designs do not use the WLAST input and simply count the data items coming in, so this would not be a safe (or legal) method of terminating a burst.

4) In the same way, what if a slave asserts RLAST before the completion of a burst read?

Answer: If the slave drives RLAST (and RVALID) too early, this too is a protocol violation, and just as for WLAST, some masters might not be monitoring RLAST, so this illegal use could be missed anyway.

5) If WLAST and RLAST cannot be used as above, what is the special use of WLAST and RLAST, given that we get individual beat responses anyway?

Answer: WLAST and RLAST can be used by masters and slaves that need to be told when the final data item in a burst is being transferred.
Most masters and slaves count the data coming in against the number of transfers indicated on AWLEN and ARLEN, so in these designs the xLAST inputs are not required.

However, to support all master and slave designs, masters must always drive WLAST when appropriate, and slaves must always drive RLAST.
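A minimal sketch of how a master might derive WLAST by counting beats against AWLEN is shown below. The module name, the start pulse and the 8-bit length (as in AXI4) are assumptions for illustration, and the sketch assumes start never coincides with a data handshake.

```systemverilog
module wlast_gen_sketch (
  input  logic       aclk,
  input  logic       aresetn,
  input  logic       start,     // hypothetical pulse: a new write burst begins
  input  logic [7:0] awlen,     // number of beats minus 1, as on AWLEN
  input  logic       wvalid,
  input  logic       wready,
  output logic       wlast
);
  logic [7:0] beats_left;       // beats still to send after the current one

  always_ff @(posedge aclk or negedge aresetn) begin
    if (!aresetn)
      beats_left <= '0;
    else if (start)
      beats_left <= awlen;                       // load the burst length
    else if (wvalid && wready && beats_left != 0)
      beats_left <= beats_left - 1'b1;           // one beat transferred
  end

  // WLAST accompanies the final beat only, and only while WVALID is high
  assign wlast = wvalid && (beats_left == 0);
endmodule
```

With awlen = 3 the counter steps 3, 2, 1, 0 over the first three handshakes, so wlast is high on the fourth and final beat.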

6) What is the exact use of an exclusive read and write pair? Where exactly will these be used?

Answer: Semaphore passing. Semaphore passing is a software requirement, whereas my background is hardware, so please forgive any vagueness in the following answer.
If you have a shared area of memory used for passing control information between masters (or processes running on a master), you want to make sure that you complete the READ/WRITE sequence without another master changing the shared location.
If your master read the shared memory location, and it was changed by another master before your master could complete the subsequent write to that location, the interim write from the other master would be lost, which could have an impact on how your system works (control information lost).

So exclusive accesses are a hardware mechanism to support the software, indicating to the master whether it had uninterrupted access to the shared location, meaning that no write accesses from other masters will be accidentally overwritten.

7) Is there a possibility that A Read transaction can complete in One Cycle?
Section 3.1.4 on 3.4 says that "A default ARREADY value of LOW is possible but not recommended, because it implies that the transfer takes at least two cycles, one to assert ARVALID and another to assert ARREADY"

No.

It would take a minimum of 1 clock cycle to pass the address from the master to the slave (assumes ARREADY was high when ARVALID was asserted), and then a minimum of 1 clock cycle to pass the data from the slave to the master (assumes RREADY was high when RVALID was asserted).

If ARREADY is initially low when an address is signaled on ARVALID, it will take one clock cycle for the slave to sample this ARVALID and then assert ARREADY (if it can accept the address), and the address handshake then completes on the next clock rising edge (when both ARREADY and ARVALID are high). So 2 clock cycles just to pass the address from master to slave if ARREADY defaults to LOW.

It would then take at least a further clock cycle before the read data could be returned to the master.

-------------------------------------------------------------------------------------------------------------------------
Exclusive access in AXI protocol

This mechanism enables semaphore-type operations without requiring the bus to remain locked to a particular master for the duration of the operation.
The advantage of exclusive access is that semaphore-type operations impact neither critical bus access latency nor the maximum achievable bandwidth.

There are four atomic access encodings (AXI3): normal, exclusive, locked and reserved.
The signals that carry the atomic access type are AWLOCK and ARLOCK.

Consider a system in which two AXI masters use shared memory. As the system designer, you must make sure that one master never unknowingly overwrites data written by the other.

Consider that AXI Master 1 (M1) has initiated an exclusive read transaction for address locations 12'h100 to 12'h10F. The slave now starts monitoring these addresses for the ARID given by M1. Until M1 performs the exclusive write, the slave keeps monitoring; if another master M2 writes to that address range in the meantime, the slave signals an exclusive access failure on M1's exclusive write (it returns OKAY instead of EXOKAY) and the memory is not updated by M1.

How it works:
In the scenario above, the slave has virtually reserved a memory resource for M1 in response to the exclusive read request. When M1 comes back with the exclusive write to that location, the slave allows the write only if no other master has written to the resource in the meantime; otherwise the data is not written.

So EXCLUSIVE access in AXI avoids the overwrite problem for shared memory.
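The monitoring described above can be sketched as a single-entry exclusive monitor. This is a simplified illustration, not the monitor of any particular slave: it tracks one address rather than a range, and the module name and every port name are hypothetical.

```systemverilog
module excl_monitor_sketch #(
  parameter int AW  = 12,
  parameter int IDW = 4
) (
  input  logic           clk,
  input  logic           rst_n,
  input  logic           ex_rd,    // exclusive read accepted this cycle
  input  logic [IDW-1:0] rd_id,    // ARID of that read
  input  logic [AW-1:0]  rd_addr,
  input  logic           wr,       // any write accepted this cycle
  input  logic           wr_excl,  // the write is an exclusive write
  input  logic [IDW-1:0] wr_id,    // AWID of that write
  input  logic [AW-1:0]  wr_addr,
  output logic           exokay    // 1: exclusive write succeeds (EXOKAY)
);
  logic           resv_valid;      // a reservation is currently held
  logic [IDW-1:0] resv_id;
  logic [AW-1:0]  resv_addr;
  logic           hit, updates;

  assign hit     = resv_valid && (wr_addr == resv_addr);
  // An exclusive write succeeds only if the reservation survived and the ID matches
  assign exokay  = wr && wr_excl && hit && (wr_id == resv_id);
  // Memory is updated by normal writes and by successful exclusive writes
  assign updates = wr && (!wr_excl || exokay);

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      resv_valid <= 1'b0;
      resv_id    <= '0;
      resv_addr  <= '0;
    end else begin
      if (updates && hit)
        resv_valid <= 1'b0;        // the location changed: reservation is lost
      if (ex_rd) begin
        resv_valid <= 1'b1;        // a new exclusive read takes the reservation
        resv_id    <= rd_id;
        resv_addr  <= rd_addr;
      end
    end
  end
endmodule
```

In the M1/M2 scenario, M1's exclusive read sets the reservation, M2's write to the same address clears it, and M1's later exclusive write sees exokay low, so the slave would respond OKAY and leave the memory unchanged.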

Thursday, May 19, 2016

Extern Keyword in System Verilog

Class methods and constraints can be defined in the following places:

inside the class.

outside the class, in the same file.

outside the class, in a separate file.

Declaring an out-of-block method or constraint involves:

declaring the method prototype or constraint inside the class declaration with the extern qualifier.

defining the full method or constraint outside the class body.

The extern qualifier indicates that the body of the method (its implementation) or constraint block is to be found outside the declaration.

NOTE: the class scope resolution operator :: must be used in the out-of-block definition.

EXAMPLE:
class B;
  extern task printf();   // prototype only; the body is defined outside the class
endclass

task B::printf();
  $display(" Hi ");
endtask

program main;
  initial
  begin
    B b;
    b = new();
    b.printf();
  end
endprogram

RESULT:

Hi

Source: testbench.in
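The example above covers only a method. An out-of-block constraint follows the same pattern, sketched here with a hypothetical Packet class; the class, field and constraint names are made up, and the explicit extern constraint form assumes a simulator that supports IEEE 1800-2009 or later.

```systemverilog
class Packet;
  rand bit [7:0] len;
  extern constraint c_len;            // constraint prototype only
  extern function void show();        // method prototype only
endclass

// Out-of-block constraint body, qualified with the class scope operator
constraint Packet::c_len { len inside {[1:16]}; }

// Out-of-block method body
function void Packet::show();
  $display("len = %0d", len);
endfunction

module tb_extern;
  Packet p;
  initial begin
    p = new();
    if (!p.randomize())
      $error("randomize failed");
    p.show();                         // prints a length between 1 and 16
  end
endmodule
```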

Sunday, April 24, 2016

Clock Generator

1. A clock is the main synchronizing event to which all other signals are referenced.
2. Some testbenches need more than one clock generator.
3. Some testbenches need clocks with different phases.
4. Others need a clock generator with jitter
(jitter affects the effective cycle time).

module Tb();
reg clock;
integer no_of_clocks;

parameter CLOCK_PERIOD = 5;
initial no_of_clocks = 0;
initial clock = 1'b0;

always #(CLOCK_PERIOD/2) clock = ~clock;

always@(posedge clock)
no_of_clocks = no_of_clocks +1 ;

initial
begin
#50000;
$display("End of simulation time is %d , total number of clocks seen is %d expected is %d",$time,no_of_clocks,($time/5));
$finish;
end
endmodule
RESULTS:

End of simulation time is 50000 , total number of clocks seen is 12500 expected is 10000

Note: 1. There are 25% more clocks than expected. The reason is that the half clock period is 2 instead of 2.5, because CLOCK_PERIOD/2 is an integer division.
2. Make sure that CLOCK_PERIOD is evenly divisible by two.
3. If CLOCK_PERIOD is odd, the remainder is truncated and the frequency of the generated clock is not what is expected.

4. If integer division is replaced by real division, the result is rounded according to the specified time precision.

Example for the 4th point:

module Tb();
reg clock;
integer no_of_clocks;

parameter CLOCK_PERIOD = 5;

initial no_of_clocks = 0;
initial clock = 1'b0;

always #(CLOCK_PERIOD/2.0) clock = ~clock;

always@(posedge clock)
no_of_clocks = no_of_clocks +1 ;

initial
begin
#50000;
$display("End of simulation time is %d , total number of clocks seen is %d expected is %d",$time,no_of_clocks,($time/5));
$finish;
end
endmodule

RESULTS:

End of simulation time is 50000 , total number of clocks seen is 8333 expected is 10000

Look at the result: the total number of clocks seen is 8333, so where have the rest of the clocks gone? This is closer than the earlier example, but the result is still wrong, and the cause is the time precision. With a timescale of 1ns/1ns, half of the clock period is 2.5, which is rounded to 3. The total period is therefore 6, giving 8333 clocks (50000/6) instead of 10000 (50000/5). Whether 2.5 becomes 3 or 2 depends on how the simulator rounds, so try this example on your tool; you may see 12500.
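One way to get the expected count is to give the simulator enough precision to represent the 2.5 ns half period. The variant below is a sketch that assumes a `timescale of 1ns/1ps and a real-valued parameter; the module name Tb_precise is made up.

```systemverilog
`timescale 1ns/1ps
module Tb_precise();
  reg clock;
  integer no_of_clocks;

  parameter real CLOCK_PERIOD = 5.0;   // ns

  initial no_of_clocks = 0;
  initial clock = 1'b0;

  // 2.5 ns is exactly 2500 ps, so no rounding takes place
  always #(CLOCK_PERIOD/2.0) clock = ~clock;

  always @(posedge clock)
    no_of_clocks = no_of_clocks + 1;

  initial begin
    #50000;
    $display("total number of clocks seen is %0d", no_of_clocks);
    $finish;
  end
endmodule
```

Rising edges now fall at 2.5, 7.5, ..., 49997.5 ns, which is 10000 edges in 50000 ns, matching the expected value.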

Timescale And Precision Enlightenment:

The delay unit is specified using `timescale, which is declared as `timescale time_unit base / precision base
--time_unit is the amount of time a delay of 1 represents. The number must be 1, 10 or 100
--base is the time base for each unit, ranging from seconds to femtoseconds, and must be one of: s ms us ns ps fs
--precision and its base set how finely delays are resolved; delays are rounded to this precision, and it cannot be coarser than the time unit.

Time precision plays a major role in clock generators. For example, the following code, meant to generate a clock with roughly a 30% duty cycle (high for one third of the period) and a period of 5 ns, has an error.

EXAMPLE:
`timescale 1ns/100ps
module Tb();
reg clock;
integer no_of_clocks;

parameter CLOCK_PERIOD = 5;
initial clock = 1'b0;
always
begin
#(CLOCK_PERIOD/3.0) clock = 1'b0;
#(CLOCK_PERIOD - CLOCK_PERIOD/3.0) clock = 1'b1;
end

initial no_of_clocks = 0;

always@(posedge clock)
no_of_clocks = no_of_clocks +1 ;

initial
begin
#50000;
$display(" End of simulation time is %d , total number of clocks seen is %d expected is %d",$time,no_of_clocks,($time/5));
$finish;
end
endmodule
RESULTS:

End of simulation time is 50000 , total number of clocks seen is 9999 expected is 10000

Now CLOCK_PERIOD/3.0 is 5/3, which is 1.666. As the time unit is 1.0 ns, the delay is 1.666 ns. But the precision is 100 ps, so 1.666 ns is rounded to 1.700 ns.
When (CLOCK_PERIOD - CLOCK_PERIOD/3.0) is evaluated, the delay is 3.300 ns instead of 3.333 ns. The overall period is still 5 ns, but the high and low times are not what was asked for. The count is 9999 rather than 10000 because the rising edges fall at 5, 10, ... ns, so the last one coincides with the end of the run at 50000 ns. If a clock generator is implemented without taking proper care of this, it will be the biggest bug in the testbench.

All the clock generators above have a hard-coded duty cycle. The following example shows clock generation with a parameterizable duty cycle. By changing the DUTY_CYCLE parameter, different clocks can be generated. It is better to use parameters to represent the delays than to hard-code them. If a single testbench needs more than one clock with different duty cycles, passing duty cycle values to the clock generator instances is easier than hard-coding them.

NOTE: Simulation with `timescale 1ns/1ns is faster than with `timescale 1ns/10ps.
Simulations with `timescale 10ns/10ns and with `timescale 1ns/1ns take the same time.

EXAMPLE:
parameter CLK_PERIOD = 10;
parameter DUTY_CYCLE = 60; //60% duty cycle
parameter TCLK_HI = (CLK_PERIOD*DUTY_CYCLE/100);
parameter TCLK_LO = (CLK_PERIOD-TCLK_HI);

reg clk;

initial
clk = 0;

always
begin
#TCLK_LO;
clk = 1'b1;
#TCLK_HI;
clk = 1'b0;
end

Make sure that the parameter values divide evenly. The following example demonstrates how the parameter calculations turn out. A is 3, and when it is divided by 2 with integer division the result is 1 (B). If integer division is replaced by real division, the result is 1.5 (C); used as a delay, it would then be rounded according to the time precision.

EXAMPLE:
module Tb();

parameter A = 3;
parameter B = A/2;
parameter C = A/2.0;

initial
begin
$display(" A is %e ,B is %e ,C is %e ",A,B,C);
end

endmodule
RESULTS:

A is 3.000000e+00 ,B is 1.000000e+00 ,C is 1.500000e+00

The examples above help avoid generating a buggy clock.
For more information about jitter and clock multipliers, visit the link below; all the examples above are taken from there:

http://www.testbench.in/TB_08_CLOCK_GENERATOR.html
