One of the most frustrating things that can happen while running a tool is a crash.
In Specman you would usually see something like:
*** Error: OS signal 11 (segmentation violation) received
See the stack trace in ./specman.err
o Rerun the same test with the same seed in interpreted mode, after
setting "break on error". Load also any previously compiled modules.
** One user module is compiled.
o For help on debugging e code, see "Debugger Commands" and "Source
Code Debugging" in the online help.
o If any user C code is linked into the Specman environment, try
debugging it using a C debugger to ensure that you are not accessing
any null pointers or memories not allocated by the C program.
If the problem is still not resolved, please send to Cadence support :
1. Description of how you have tried to resolve the problem
(Error itself might vary, as we will see shortly)
However, there are some things you can do in such cases that will help you to either resolve the issue, or narrow it down so that Cadence Support can find a faster resolution.
When Specman crashes, it creates a specman.err file. Your overall goal is to determine the cause of the crash and then correct or eliminate it.
This post explains the elements of the Specman error report and how to use its details to identify the cause and decide what actions to take.
Error report layout
When Specman crashes, it creates an error report file (specman.err) in the run directory. This file contains the information that can help you identify the cause of the error, so that you can take steps to correct or eliminate the problem.
This specman.err file includes the following parts:
- Top section - specifies the error
- Raw stack trace - stack that contains the interpreted symbols of the compiled code (generated when the compiler is compiling Specman). You might not find this stack trace very useful, but its contents can be very important to Cadence Support and R&D, especially in cases where the crash was caused by a bug in an internal Specman module.
- Interpreted stack trace - provides interpretation of the symbols found in the raw stack trace. It essentially translates C-functions to their corresponding Specman internal methods.
- User View stack trace - Stack trace at the time of the failure, provided in user recognizable terms. This is the stack trace that you are likely to find most useful, because it generally identifies the module and line in the user code where the problem started.
- # of compiled modules - single line that specifies how many files are compiled in this testbench.
- Bottom section (environment data) - the bottom of the error file provides environment data such as the platform you are running on, the patches included in the simulation, packages loaded (including Specman and UVCs, and their versions).
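The layout above can be explored with a small script. The following is a minimal Python sketch that splits a report into its sections, assuming that each stack trace section starts with a header line of the form "***** <name>:" as in the excerpts shown in this post. The function name split_sections and the sample text are illustrative only; real reports may contain headers that this pattern does not cover.

```python
import re

# Header pattern assumed from the excerpts in this post,
# e.g. "***** User View stack trace:".
HEADER = re.compile(r"^\*{5}\s*(.+?):\s*$")


def split_sections(text):
    """Return {section name: [lines]}; lines before the first header go under 'top'."""
    sections = {"top": []}
    current = "top"
    for line in text.splitlines():
        match = HEADER.match(line.strip())
        if match:
            # A new section starts; later lines belong to it.
            current = match.group(1)
            sections[current] = []
        else:
            sections[current].append(line)
    return sections


# Shortened, illustrative sample built from lines quoted in this post.
sample = """OS signal 11 (segmentation violation) received
***** Interpreted stack trace:
( 0) 0x85f2601 ahb_create_instance + 0x9 [./libahb_mytb.so]
***** User View stack trace:
( 0x0) simulator
"""

report = split_sections(sample)
for name, lines in report.items():
    print(name, "->", len(lines), "line(s)")
```

In practice you would read the text from ./specman.err instead of the inline sample.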
Identifying the problem
Following a Specman crash, the first thing you should do is examine the specman.err file, proceeding in the following sequence:
- 1. Examine the error message(s) in the top section to determine the error type (see the list below).
- 2. Examine the User View stack trace and identify to which module and line it points.
- 3. Open the identified module and examine the contents of the identified line. It should provide important details regarding the nature of the problem.
- 4. Handle the problem appropriately, according to the error type (listed below).
This post covers problem identification and handling guidelines for the following error types:
- 1. OS11
- 2. OS11 during garbage collection
- 3. Unhandled OS11
- 4. Memory exceeded absolute_max_size
- 5. Specman internal error
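As a rough illustration of step 1, the Python sketch below maps the text of the top section to one of the five error types. The matched substrings are taken from the sample messages quoted later in this post. The function name classify_error and the returned labels are made up for this example, and matching "internal error" for the last type is an assumption, since the exact wording of such messages can vary.

```python
def classify_error(top_section):
    """Guess the error type from the top section of specman.err (heuristic)."""
    text = top_section.lower()
    # Check the memory-limit message first: it is also reported as an "Internal Error".
    if "exceeds get_config(memory,absolute_max_size)" in text:
        return "memory limit exceeded"
    if "os signal 11" in text:
        if "during garbage collection" in text:
            return "OS11 during garbage collection"
        if "unhandled os signal 11" in text:
            return "unhandled OS11"
        return "OS11"
    # Assumption: remaining internal errors carry the words "internal error".
    if "internal error" in text:
        return "Specman internal error"
    return "unknown"


print(classify_error("OS signal 11 (segmentation violation) received"))
print(classify_error(
    "Internal Specman Error: unhandled OS signal 11 (segmentation violation)."
))
```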
1. OS11 errors
Let's say you got an OS signal 11 error:
OS signal 11 (segmentation violation) at Tue Aug 10 10:46:37 2010
OS signal 11 (segmentation violation) received
As stated in the error message, the first thing to check is whether the crash happened in compiled mode (that is, whether the e-testbench is compiled):
- If so, you should run it in interpreted mode (that is, with the e-testbench loaded), and see if you can get a meaningful error message in this mode.
- If you cannot run in interpreted mode, you should recompile the environment with the -debug flag ('sn_compile.sh -debug ...' or 'irun -sncompargs '-debug' ...'). Specman will then add additional checks, which might provide more meaningful error messages than those provided by the crash.
If the crash persists after trying one of the above and you still don't get a meaningful message, you will need to analyze the crash.
- 1. Examine the User View stack trace and identify to which module and line it points.
- 2. Open the module and examine the line to which the User View stack trace points. See if it contains a call to a C routine.
- 3. Check Interpreted stack trace and see if it points to a user library. For example:
***** Interpreted stack trace:
( 0) 0x85f2601 ahb_create_instance + 0x9 [./libahb_mytb.so]
If (2) or (3) is true, the error probably comes from your C code, and you now need to debug your own library/C code. In this case:
- 4. You might try setting the environment variable SN_HANDLE_ALL to none ('setenv SN_HANDLE_ALL none') and rerunning. This instructs Specman not to catch any signal, which allows other tools to catch their signals and might produce a more informative stack.
- 5. In either case, to debug your C code using gdb, you should:
- Recompile your C code with -g.
- Rerun with gdb attached and debug your C code.
For more information regarding debugging C Interface code, please refer to "Incisive Enterprise Specman Elite Testbench Integrators Guide", Chapter 1.16 - Debugging C/C++ Code.
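To check quickly whether the Interpreted stack trace points to a user library, you can pull out the name in square brackets at the end of each resolved frame. The Python sketch below assumes that frames look like the ones quoted in this post ("( index) address symbol + offset [owner]"). The function name frame_owners and the regular expression are illustrative and may need adjusting for other report variants.

```python
import re

# Frame format assumed from the excerpts in this post, e.g.
# "( 0) 0x85f2601 ahb_create_instance + 0x9 [./libahb_mytb.so]".
# Unresolved frames such as "( 6) 0x86a4f5f" carry no symbol or owner.
FRAME = re.compile(
    r"\(\s*(?P<index>\d+)\)\s+(?P<addr>0x[0-9a-fA-F]+)"
    r"(?:\s+(?P<symbol>\S+)\s+\+\s+0x[0-9a-fA-F]+\s+\[(?P<owner>[^\]]+)\])?"
)


def frame_owners(lines):
    """Return (frame index, symbol, owner) for every frame that names its owner."""
    owners = []
    for line in lines:
        match = FRAME.search(line)
        if match and match.group("owner"):
            owners.append(
                (int(match.group("index")), match.group("symbol"), match.group("owner"))
            )
    return owners


trace = [
    "( 0) 0x85f2601 ahb_create_instance + 0x9 [./libahb_mytb.so]",
    "( 1) 0x80c4a0e",
]
for index, symbol, owner in frame_owners(trace):
    print(f"frame {index}: {symbol} in {owner}")
```

An owner that is one of your own shared libraries (as in the example) suggests that the C code is the place to start debugging.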
2. OS signal 11 during garbage collection
The following is an example of an OS signal 11 message issued during garbage collection (GC). This error hints that GC probably found an inconsistency in the e types, which might have been caused by memory corruption.
Internal Error at Wed Nov 9 04:40:14 2011
: Fatal error
OS signal 11 (segmentation violation) during garbage collection - must exit.
- Set 'config mem -check_consistency=TRUE', and rerun. Specman will print the path to the corrupted struct, if one exists. This information might help Cadence Support to debug the issue.
- If the consistency check does not report anything, try to get more information on the memory consumption by turning on the memory debug flags and rerunning. The recommended flags are 'config mem -show_mem_raw=TRUE' and 'config mem -print_debug_msgs=TRUE'. Then send the log to Cadence Support.
3. Unhandled OS signal 11
The following is an example of an Unhandled OS signal 11 message:
OS signal 11 (segmentation violation) at Wed Mar 10 13:26:05 2010
Internal Specman Error: unhandled OS signal 11 (segmentation violation).
An unhandled OS signal 11 message often indicates that Specman caught an OS signal which it should not have - for example, a signal that was sent to a third-party tool or to the simulator.
You should examine the User View stack trace and the Interpreted stack trace and see where they point.
In the following example, the User View stack trace clearly points to the simulator:
***** User View stack trace:
( 0x0) simulator
In another example, the Interpreted stack trace points to the simulator:
***** Interpreted stack trace:
( 0) 0x80c42bc
( 1) 0x80c4a0e
( 2) 0x80c5ef5
( 3) 0x80c635e
( 4) 0x839d775
( 5) 0x86a3612 TclInvokeStringCommand + 0x73 [ncsim]
( 6) 0x86a4f5f
( 7) 0x86a597f Tcl_EvalEx + 0x36e [ncsim]
( 8) 0x86a617f Tcl_EvalObjEx + 0x19f [ncsim]
To identify the actual source of the crash, set the environment variable SN_HANDLE_ALL to none ('setenv SN_HANDLE_ALL none'), and then rerun. This instructs Specman not to catch any signal, which allows other tools to catch their signals and might also produce a more informative stack. Note: This flag should only be used for debug purposes.
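Because the variable is meant for debug runs only, one option is to set it for a single rerun instead of for the whole shell session. The Python sketch below builds such an environment. The helper name debug_env is made up for this example, and the commented-out command ./run_test.sh is a hypothetical placeholder for whatever script launches your simulation.

```python
import os


def debug_env(base_env=None):
    """Return a copy of the environment with SN_HANDLE_ALL set to none."""
    env = dict(os.environ if base_env is None else base_env)
    env["SN_HANDLE_ALL"] = "none"  # tells Specman not to catch any signal
    return env


env = debug_env()
print("SN_HANDLE_ALL =", env["SN_HANDLE_ALL"])

# Hypothetical usage: rerun the failing test with the modified environment.
# import subprocess
# subprocess.run(["./run_test.sh"], env=env, check=False)
```

The caller's own environment is left untouched, so later regular runs do not inherit the debug setting.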
4. Memory requested from the operating system exceeds limit
The following is an example of the messages issued when Specman reaches the user-defined maximum memory allowance granted to Specman (absolute_max_size):
Internal Error at Mon Nov 7 22:29:34 2011
: Fatal error
Total memory requested from operating system
exceeds get_config(memory,absolute_max_size) (5242880000)
You should try to determine why the memory's absolute_max_size was reached. For example, it might be that
- this memory setting is not high enough for this environment and you need to increase the absolute_max_size value (config memory -absolute_max_size=<higher value>).
- the environment's memory consumption is too high and you need to profile your testbench.
To resolve the latter memory consumption problem, search for "Specman memory consumption is too high" in the Specman documentation, and ascertain what steps you should take to tackle the problem.
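When deciding between these two options, it helps to know which limit was actually in effect. The Python sketch below extracts the value reported in the message shown above. The function name reported_limit is illustrative, and interpreting the number as bytes is an assumption that you should confirm against the Specman documentation.

```python
import re

# Pattern taken from the message quoted in this post.
LIMIT = re.compile(r"exceeds get_config\(memory,absolute_max_size\)\s*\((\d+)\)")


def reported_limit(text):
    """Return the absolute_max_size value found in the message, or None."""
    match = LIMIT.search(text)
    return int(match.group(1)) if match else None


message = (
    "Total memory requested from operating system\n"
    "exceeds get_config(memory,absolute_max_size) (5242880000)"
)
limit = reported_limit(message)
# Assumption: the value is in bytes; 2**20 bytes make one MiB.
print(limit, "->", limit / 2**20, "MiB")
```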
5. Specman internal error
Although there is little you can do if a bug in a Specman module causes an internal error, there are some steps you can take in order to help Cadence Support identify the issue faster.
Internal errors are generally accompanied by additional information that helps to narrow the suspected area of code. If an internal error ((a) in the message below) is accompanied by a message that points to a line in an internal Specman module (d), it generally indicates that the error was caused by a bug in that Specman module (which is detailed in (b) and (c)).
The accompanying User View stack trace shows the calls which led to the crash:
***** User View stack trace:
( 0) analyzing constraint at line 39 in @intc_basic_types
( 1) gen context #145 tb_env_config_s
( 2) generation static analysis (finalize)
( 3) generation static analysis
( 4) pre specman run
( 5) specman
When you encounter a Specman internal error, you should send the specman.err file to Cadence Support, along with the module file to which the User View stack trace points. If you cannot send the module file, try sending the code around the line in that module to which the User View stack trace points.
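To find the module and line to send, you can scan the User View stack trace for the first frame that names a source location. The Python sketch below assumes the "at line N in @module" wording seen in the example above. The function name first_source_location is illustrative, and other frames may phrase locations differently.

```python
import re

# Wording assumed from the example frame:
# "( 0) analyzing constraint at line 39 in @intc_basic_types"
LOCATION = re.compile(r"at line (\d+) in @(\w+)")


def first_source_location(lines):
    """Return (module, line number) from the first frame that names one, else None."""
    for line in lines:
        match = LOCATION.search(line)
        if match:
            return match.group(2), int(match.group(1))
    return None


user_view = [
    "( 0) analyzing constraint at line 39 in @intc_basic_types",
    "( 1) gen context #145 tb_env_config_s",
    "( 2) generation static analysis (finalize)",
]
print(first_source_location(user_view))
```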
Semadar Sadeh & Avi Farjoun