
2 PGI Workstation 4.0
Release Notes

This document describes changes between PGI Workstation 4.0 and previous releases, as well as late-breaking information not included in the current printing of the PGI User's Guide.

2.1 Workstation 4.0 Contents

The PGI Workstation 4.0 includes the following components:

Depending on the product you purchased, you may not have received all of the above components.

2.2 Supported Systems and Licensing

PGI Workstation 4.0 is supported on systems using the Intel Pentium or Pentium Pro/II/III/4 or compatible processors, including AMD Athlon/AthlonXP, and running Linux with a kernel version of 2.2.10 or above, Solaris 7 for Intel or higher, or Win32 operating systems including NT 4.0, Win98, Win2K, and WinXP. This includes versions of Linux that use glibc2.2.x, such as Red Hat 6.0 to 7.3, and SuSE 6.1 to 8.0. For more information about release levels and operating systems supported, go to http://www.pgroup.com/faq/install.htm.

The compilers and tools are license-managed. For Workstation products using PGI-style licensing (the default), a single user can run as many simultaneous copies of the compiler as desired, on a single system, and no license daemon or Ethernet card is required. However, usage of the compilers and tools is restricted to a pre-specified username. If you would like our compilers and tools to be usable under any username, you must request FLEXlm-style license keys and use FLEXlm-style licensing. See section 1, The PGI Workstation 4.0 Installation Notes, for a more detailed description of licensing. http://www.pgroup.com/faq/install.htm#install1m covers many of the online license generation questions.

The PGI Workstation 4.0 is supported on systems using Intel IA32 (e.g. Pentium or Pentium Pro/II/III/4) or AMD Athlon processors running Linux with a kernel version of 2.2.10 or above, Solaris 7 for Intel or higher, or Win32 operating systems including NT 4.0, Win98, and Win2K. The latest 7.x releases of Linux, like Redhat 7.0 and SuSE 7.1, which use glibc2.2.x, are now supported as well. Our Workstation 4.0 extends support to Red Hat 7.2 and SuSE 7.3 versions, as well as Windows XP.

2.3 New Features

The following new features are included in PGI Workstation 4.0:

2.4 New Compiler Options

2.4.1 New Generic Options

Several new or updated generic compiler options (options which apply to all of our compilers) are present in release 4.0. These are in addition to those previously documented in the 3.3 Release Notes (http://www.pgroup.com/docs.htm). Some prior switches are mentioned here as well, to provide context or clarify their use.

2.4.2 New Win32 Compiler Options

No new Win32 specific compiler options are present in 4.0.

2.5 OpenMP Directives and Pragmas

Full support for the OpenMP Fortran Application Program Interface, Version 1.1 is included in release 4.0 of the PGF77 and PGF90 compilers. The PGI User's Guide, Chapter 10, contains a complete description of the OpenMP directives, functions, and environment variables supported by the PGF77 and PGF90 compilers.

Full support for the OpenMP C and C++ Application Program Interface, Version 1.0 is included in release 4.0 of the PGCC ANSI C and C++ compilers. The PGI User's Guide, Chapter 11, contains a complete description of the OpenMP pragmas, functions, and environment variables supported by the PGCC compilers.

For more information on the OpenMP programming model or to obtain copies of the OpenMP API specifications, see the URL http://www.openmp.org.

2.6 Pentium III, Pentium 4, Athlon, and AthlonXP Support

2.6.1 Pentium III Support

The 4.0 compilers support the Pentium III SSE and prefetch instructions where possible. The compilers also support the 3DNow! Professional Instructions of the AthlonXP. See the 3.3 Release Notes at http://www.pgroup.com/docs.htm for more information about this capability.

2.6.2 Pentium 4 Support

Release 4.0 of the PGI compilers supports the Pentium 4 SSE and SSE2 instructions when the compiler switch -Mvect=sse is used on a Pentium 4, or in conjunction with the new -tp p7 cpu-type code generation switch.

With Release 4.0, the new switches -Mscalarsse and -fastsse are introduced. -Mscalarsse uses, where possible, the SSE and SSE2 instructions of chips such as the Pentium 4. -fastsse combines several switches (currently `-fast -Mscalarsse -Mvect=sse -Mcache_align -Mflushz') that should work together to improve performance. Note that some older assemblers do not accept SSE2 instructions; users who wish to use these features are responsible for obtaining an assembler that accepts the source the compiler produces.

Examples of improved performance using optimizations are available at http://www.pgroup.com/faq/perf_notes.htm.

2.6.3 AthlonXP 3DNow! Professional Instructions

Release 4.0 supports the 3DNow! Professional Instructions of the AthlonXP cpu-type. These instructions are used when -Mvect=sse, -Mscalarsse, or -fastsse is specified on an AthlonXP-based platform. The same switches combined with -tp athlonxp will work on any platform running the compilers. See http://www.pgroup.com/faq/perf_notes.htm for examples.

2.7 New Fortran Features

2.7.1 OPEN specifier for Byte Swapping

All Fortran compilers (pgf77, pgf90, pghpf) now support an OPEN statement specifier that allows byte-swapping I/O to be performed on specific logical units. Previously, byte-swapping I/O was enabled only by the command-line option -byteswapio, and was applied to all unformatted I/O operations appearing in files compiled with -byteswapio. The new OPEN specifier is

   CONVERT=<char expr>
where <char expr> is one of 'BIG_ENDIAN', 'LITTLE_ENDIAN', or 'NATIVE'.

The value 'BIG_ENDIAN' is used to convert big-endian format data files produced by most RISC workstations and high-end servers to the little-endian format used on Intel Architecture systems on-the-fly during file reads/writes. This value assumes that the record layouts of unformatted sequential access and direct access files are the same on the systems.

For the values 'LITTLE_ENDIAN' and 'NATIVE', byte-swapping is not performed during file reads/writes, since the little-endian format is used on Intel Architecture systems.
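As an illustration of what byte-swapping involves, the following C sketch reverses the byte order of a 32-bit value and assembles an integer from four bytes stored in big-endian order. It is not the PGI runtime implementation; the function names swap32 and be32_to_host are hypothetical and chosen only for this example.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value
   (big-endian <-> little-endian). */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}

/* Build an integer from 4 bytes stored most-significant byte first,
   as they would appear in a big-endian data file. The result is
   correct regardless of the host's own byte order. */
uint32_t be32_to_host(const unsigned char bytes[4])
{
    return ((uint32_t)bytes[0] << 24) |
           ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] <<  8) |
            (uint32_t)bytes[3];
}
```

Applying swap32 twice returns the original value, which is why the same conversion serves for both reads and writes.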

2.7.2 OPEN Form specifier for Byte Streams

All Fortran compilers (pgf77, pgf90, pghpf) now support a new value, 'BINARY', for the FORM specifier in the OPEN statement, which allows binary unformatted I/O to be performed on specific logical units. An unformatted file whose form is 'BINARY' is viewed simply as a byte-stream file, such as a file created by fwrite() calls in a C program; the data in the file is not organized into records.
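To make the byte-stream layout concrete, the following C sketch writes an array of integers with fwrite() and reports the resulting file size. The file contains exactly n * sizeof(int) bytes, with no record markers, which is the kind of file the 'BINARY' form is described as handling. The helper names write_stream and stream_size are hypothetical.

```c
#include <stdio.h>

/* Write n ints as a raw byte stream; no record markers are added.
   Returns 0 on success, -1 on error. */
int write_stream(const char *path, const int *data, size_t n)
{
    FILE *fp = fopen(path, "wb");
    size_t written;

    if (fp == NULL)
        return -1;
    written = fwrite(data, sizeof(int), n, fp);
    if (fclose(fp) != 0)
        return -1;
    return written == n ? 0 : -1;
}

/* Return the size of a file in bytes, or -1 on error. */
long stream_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long size;

    if (fp == NULL)
        return -1;
    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return -1;
    }
    size = ftell(fp);
    fclose(fp);
    return size;
}
```

A caller would pass a file name and an array, for example write_stream("data.bin", values, 3), and could then read the same file from Fortran through a unit opened with FORM='BINARY'.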

2.8 -Mlfs - Large File Support

2.8.1 -Mlfs

For Linux kernels 2.4.x and above, files larger than 2GB are now accessible. Users who program in C for file I/O will need to call the new routines, but for pgf77/pgf90/pghpf, where I/O is part of the language, support for large files is handled by adding an additional directory at the link stage. Previously, for example,

% pgf77 a.f b.f -o c

would limit you to files smaller than 2GB. To incorporate large file support, use

% pgf77 a.f b.f -o c -Mlfs

which incorporates the large file support versions. Note that using -Mlfs on small files will also work, but it may be slower than the standard library I/O. pgcc and pgCC do not have integrated I/O like Fortran, and the user must call the special stdio functions that Linux provides to access large files.
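For C programs, one common way to reach the large-file interfaces on Linux with glibc is to define _FILE_OFFSET_BITS as 64 before any system header, so that off_t becomes a 64-bit type and fseeko()/ftello() accept offsets beyond 2GB. The sketch below assumes a glibc-based system; whether a particular pgcc release and its headers honor this macro is an assumption to verify locally. The helper name file_size64 is hypothetical.

```c
#define _FILE_OFFSET_BITS 64   /* must come before any system header */
#include <stdio.h>
#include <sys/types.h>

/* Return the size of a file in bytes using off_t offsets,
   or (off_t)-1 on error. With _FILE_OFFSET_BITS set to 64,
   off_t can represent sizes larger than 2GB. */
off_t file_size64(const char *path)
{
    FILE *fp = fopen(path, "rb");
    off_t size;

    if (fp == NULL)
        return (off_t)-1;
    if (fseeko(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return (off_t)-1;
    }
    size = ftello(fp);
    fclose(fp);
    return size;
}
```

The design point is that positions are carried in off_t rather than long, since long is only 32 bits on IA32 and cannot hold an offset past 2GB.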

2.9 IPA - Inter-procedural Analysis

2.9.1 -Mipa

Inter-procedural analysis (IPA) optimization technology has been added to pgf90.

The following table describes the command line switch options associated with -Mipa optimizations. To learn more about -Mipa, see the 3.3 Release Notes at http://www.pgroup.com/docs.htm.

-Mipa arguments

No option: same as -Mipa=const
Constant propagation
Pointer disambiguation
Recognize cache-line aligned pointers
Removes arguments replaced by -Mipa=ptr,const
Eliminates functions that are not called
Optimizes references to globals
Short for: -Mipa=const,globals,ptr,vestigial
Assume references without .ip* files are `safe'
Run phase one of IPA (data collection), without generating code
The default, removes stale object files at IPA link time; norm prevents this behavior

2.10 Building with -Mpf

The profile feedback capability in release 4.0 is substantially new, and the performance improvement it provides is very limited.

The pgf77/pgcc/pgf90 compilers perform profile-directed feedback optimizations using the -Mpf command-line switch in combination with the -Mipa command-line switch. The steps in this process are as follows:

1. Compile and link an executable including the -Mprof=lines compile/link option, together with any other command-line options already in use.

2. Run the resulting executable; a pgprof.out tracefile is automatically generated and saved in the current working directory.

3. Re-compile and link the program using the -Mipa and -Mpf=pgprof.out compile/link command-line switches; the resulting executable is optimized using IPA and optimizations derived from profile data in the pgprof.out tracefile.

An executable instrumented for line profiling will run quite slowly relative to one compiled without -Mprof=lines.

2.11 Debugging with PGDBG

2.11.1 PGDBG 4.0 New Features

PGDBG 4.0 can debug SMP OpenMP (or Linux pthread) programs. The PGI license file restricts the total number of threads that PGDBG will debug.

PGDBG's parallel debug capabilities are extensively documented in the PGDBG User's Guide, available from http://www.pgroup.com/docs.htm or $PGI/doc/index.htm. This documentation is intended to supplement Chapter 15 of the PGI User's Guide.

The following enhancements are included in PGDBG 4.0:

2.11.2 PGDBG 4.0 Technical Information

Here are a number of details not documented in the PGDBG User's Guide.

Threads and Signals

PGDBG intercepts all signals sent to any of the threads in a multi-threaded program, and passes them on according to that signal's disposition maintained by PGDBG (see the catch, ignore commands).

If a thread runs into a busy loop, or if the program runs into deadlock, type control-C in the debugging command line to interrupt the threads. This causes SIGINT to be sent to all threads. By default PGDBG does not relay SIGINT to any of the threads, so in most cases program behavior is not affected.

Sending a SIGINT (control-C) to a program while it is in the middle of initializing its threads (calling omp_set_num_threads(), or entering a parallel region ) may kill some of the threads if the signal is sent before each thread is fully initialized. Avoid sending SIGINT in these situations. When the number of threads employed by a program is large, thread initialization may take a while.

Signals Used Internally by PGDBG

SIGTRAP indicates a breakpoint has been hit. A message is displayed whenever a thread hits a breakpoint. SIGSTOP is used internally by PGDBG. Its use is mostly invisible to the user. Changing the disposition of these signals in PGDBG will result in undefined behavior.

Reserved Signals: On Linux86, the thread library uses SIGRT1, SIGRT3 to communicate among threads internally. In the absence of real-time signals in the kernel, SIGUSR1, SIGUSR2 are used. Changing the disposition of these signals in PGDBG will result in undefined behavior.


Nested Subroutines

To reference a nested subroutine you must qualify its name with the name of its enclosing function using the scoping operator @.

For example:

subroutine subtest (ndim)
integer(4), intent(in) :: ndim
integer, dimension(ndim) :: ijk
call subsubtest ()
contains
subroutine subsubtest ()
integer :: I
ijk(1) = 1
end subroutine subsubtest
subroutine subsubtest2 ()
ijk(1) = 1
end subroutine subsubtest2
end subroutine subtest
program testscope
integer(4), parameter :: ndim = 4
call subtest (ndim)
end program testscope

pgdbg> break subtest@subsubtest
breakpoint set at: subsubtest line: 8 in "ex.f90" address: 0x80494091
pgdbg> names subtest@subsubtest
i = 0
pgdbg> decls subtest@subsubtest
integer*4 i;
pgdbg> whereis subsubtest
function: "ex.f90"@subtest@subsubtest

Fortran 90 Modules

To access a member mm of a Fortran 90 module M you must qualify mm with M using the scoping operator @. If the current scope is M the qualification can be omitted.

For example:

module M
implicit none
real mm
contains
subroutine stub
print *,mm
end subroutine stub
end module M

program test
use M
implicit none
call stub()
print *,mm
end program test

pgdbg> Stopped at 0x80494e3, function MAIN, file M.f90, line 13
#13: call stub()
pgdbg> which mm
pgdbg> print "M.f90"@m@mm
pgdbg> names m
mm = 0
stub = "M.f90"@m@stub
pgdbg> decls m
real*4 mm;
subroutine stub();
pgdbg> print m@mm
pgdbg> break stub
breakpoint set at: stub line:6 in "M.f90" address: 0x8049446 1
pgdbg> c
Stopped at 0x8049446, function stub, file M.f90, line 6
Warning: Source file M.f90 has been modified more recently than object file
#6: print *,mm
pgdbg> print mm

Lexical Blocks

Line numbers are used to name lexical blocks. The line number of the first instruction contained by a lexical block indicates the start scope of the lexical block.

Below, variable var is declared in the lexical block starting at line 5. The lexical block has the unique name "lex.c"@main@5. The variable var declared in "lex.c"@main@5 has the unique name "lex.c"@main@5@var.

For example:

int main()
{
int var = 0;
{
int var = 1;
printf("var %d\n",var);
}
printf("var %d\n",var);
}

pgdbg> n
Stopped at 0x8048b10, function main, file
/home/pete/pgdbg/bugs/workon3/ctest/lex.c, line 6
#6: printf("var %d\n",var);
pgdbg> print var
pgdbg> which var
pgdbg> whereis var
variable: "lex.c"@main@var
variable: "lex.c"@main@5@var
pgdbg> names "lex.c"@main@5
var = 1

Private Variables

PGDBG understands private variables with some restrictions. In particular, inspecting private variables while debugging FORTRAN programs is not supported.

Private variables in C must be declared in the enclosing lexical block of the parallel region in order for them to be visible using PGDBG.

For example:

#pragma omp parallel
{
int i;
/* i is private to 'this' thread */
}

In the above case, i would be visible inside PGDBG for each thread. However, in the following example, i is not visible inside PGDBG:

int i;
#pragma omp parallel private(i)
{
/* i is private to 'this' thread
but not visible within PGDBG */
}

A private variable of a Thread A is accessed by switching the current thread to A, and by using the name (qualified if necessary) of the private variable.
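A complete function in the first, debuggable form might look like the following sketch. The private variable i is declared inside the block that forms the parallel region, so each thread has its own copy. The function name count_threads is hypothetical; when the code is compiled without OpenMP enabled, the pragmas are ignored and the function simply returns 1.

```c
/* Count the threads that execute the parallel region. */
int count_threads(void)
{
    int total = 0;   /* shared by all threads */

#pragma omp parallel
    {
        int i;       /* declared inside the region: private to each thread */

        i = 1;
#pragma omp atomic
        total += i;  /* each thread contributes exactly once */
    }
    return total;
}
```

Declaring i inside the region, rather than listing an outer variable in a private clause, keeps the source in the form described above as visible to PGDBG.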

Miscellaneous GUI Issues

Setting the Font

Use the xlsfonts command to list all fonts installed on your system, then choose one you like. For this example, we choose a sony font that is completely specified by the following string:


There are two ways to set the font that your PGDBG GUI uses.

  1. Use your .Xresources file:

    Xpgdbg*font : <chosen font>
    pgdbg*font : <chosen font>

    For example:

    pgdbg*font : -sony-fixed-medium-r-normal--24-230-75-75-c-120-iso8859-1  

    You will have to merge these changes into your X environment for them to take effect. You can use the following command:

    % xrdb -merge $HOME/.Xresources

  2. Use the command line option -fn <font>. For example:

    % pgdbg -fn -sony-fixed-medium-r-normal--0-0-100-100-c-0-jisx0201.1976-0...

Control-C from GUI

The active window must be the command window (upper window) where the PGDBG prompt appears for control-C to interrupt the program being debugged.

Shared Object Files

PGDBG supports debugging of dynamically linked executables that reference shared object files created using the compilers. If the executable being debugged is dynamically linked, PGDBG will report when each shared object is loaded and/or unloaded.

For example:

pgdbg> ...
pgdbg> n
Stopped at 0x8048bee, function main, file
dynload.c, line 36
#36: handle = dlopen("libpetesSO2.so",RTLD_NOW);
pgdbg> n
libpetesSO2.so loaded by ld-linux.so.2.
Stopped at 0x8048c31, function main, file
dynload.c, line 41
#41: if (handle){
pgdbg> n
Stopped at 0x8048c37, function main, file
dynload.c, line 42
#42: dlclose(handle);
pgdbg> n
libpetesSO2.so unloaded by ld-linux.so.2.
Stopped at 0x8048c42, function main, file
dynload.c, line 45
#45: }
pgdbg> ...

The global symbols defined by a dynamically linked shared object are visible during a PGDBG debug session. These symbols are currently available only without type and line number information. The machine level PGDBG commands (breaki, dump, hwatch, disasm, etc) are useful for inspecting these symbols. Each symbol is available with respect to the load status of its defining shared object.

For example, dynamically-linkable Position Independent Code (PIC) is implemented using a Procedure Linkage Table (PLT) and Global Offset Table (GOT). Each PIC function is bound lazily at run-time. If a function has not been linked dynamically, PGDBG reports the address of its PLT entry as its address. If a function has been linked dynamically, PGDBG reports the virtual address of the function itself. So, PGDBG reports the current or "effective" address of symbols with respect to dynamic linking and loading. PGDBG treats global symbols defined in shared objects in a similar way. The address of a global variable may be the address of its GOT entry or an absolute address, depending in part on its load status.

2.11.3 Debugging with gdb on Win32 Systems

The PGI Workstation 4.0 compilers for Win32 support generation of GNU STABS format debug information under control of the -g and -Mstabs compile/link switches. This enables debugging of compiled programs using the version of gdb included in mingw32, which is the UNIX-like environment included with our Workstation for Win32 software package.

Once you have created an executable (for example a.out) using the above switches, simply invoke gdb as follows:

% gdb a.out

within a PGI Workstation 4.0 shell window.

Note that there are shortcomings in gdb with respect to its ability to debug Fortran - in particular it doesn't support COMPLEX data types and cannot examine data included in Fortran COMMON blocks. Also, on Win32 gdb doesn't understand the 'drive' (C:\) syntax of path names, so you must use gdb commands to set the source directory paths. The Win32 version of gdb does allow you to set and run to function and line breakpoints, examine variables, list source lines, and examine stack traces.

2.12 Profiling with PGPROF

The PGPROF profiler is a tool that analyzes tracefiles generated during execution of specially compiled C, C++, F77, F90 and HPF programs. It allows programmers to discover which functions and lines were executed, how often they were executed and how much of the total execution time they consumed.

On multiprocessor systems, the PGPROF profiler also allows you to view information on a processor-by-processor basis for HPF programs and on a thread-by-thread basis for OpenMP programs. You can view a summary of minimum or maximum execution times for each program unit or line, or view performance data for each individual processor or thread. This information can be used to identify communication patterns in HPF programs, load balancing problems in HPF or OpenMP programs, and identify the portions of a program that will benefit the most from performance tuning.

2.12.1 Analyzing Scalability of Parallel Programs

PGPROF 4.0 now supports scaling analysis of HPF and OpenMP parallel programs. If you have not used PGPROF previously, read through Chapter 14 of the User's Guide for a description of the capabilities of PGPROF and how it is used. Once you are familiar with PGPROF, follow these steps to use PGPROF 4.0 scaling analysis:

  1. Compile your parallel program using the appropriate parallelizing compiler - PGHPF for HPF programs, PGF90 for F90 OpenMP programs, PGF77 for F77 OpenMP programs, PGCC for OpenMP C programs or PGC++ for OpenMP C++ programs. In addition to the options you normally use, add the option -Mprof=func during compilation and linking.
  2. Run the resulting executable on a single processor. See section 1.4 of the User's Guide for a brief introduction to running OpenMP parallel and HPF parallel programs on 1 or more processors.
  3. At the completion of the single-processor run, a PGPROF tracefile named pgprof.out is automatically written to your current working directory. Rename pgprof.out to (for example) pgprof.out.1.
  4. Rerun the executable on (for example) 2 processors. At the completion of the 2 processor run, a PGPROF tracefile named pgprof.out is again automatically written to your current working directory. Rename pgprof.out to (for example) pgprof.out.2.
  5. Invoke PGPROF using the following command:
    % pgprof -scale pgprof.out.1 pgprof.out.2

PGPROF opens a window for each pgprof.out file; the first one listed is taken to be the base run against which scaling is computed. In this example, the base run is on 1 processor, but it could be on any number of processors. A scaling metric is displayed in the window for each subsequent pgprof.out file, comparing the time values against those of the base run. Two or more pgprof.out files can be specified - a separate PGPROF window will be opened for each one. Negative scaling indicates the program slows down with additional processors, positive scaling indicates program speedup.

Alternatively, this can be done with the PGPROF GUI menus. Open a pgprof.out tracefile for the base run as usual, and subsequent files under the File menu using the Scalability Comparison option.

Performance scaling can be analyzed at the function level, or even at the line level if you compile a given program unit using the -Mprof=lines option. However, -Mprof=lines can sometimes incur substantial execution overhead. For this reason, it is advisable to compile only selected program units with this option rather than compiling your entire application with line profiling enabled.

2.13 LAPACK, the BLAS and FFTs

2.13.1 Pre-compiled BLAS and LAPACK Math Libraries

Precompiled versions of the BLAS and LAPACK math libraries are included for all target systems in the files $PGI/<target>/lib/libblas.a and $PGI/<target>/lib/liblapack.a. These can be linked in to your applications by simply placing the -llapack -lblas options on the link line:

% pgf77 myprog.F -llapack -lblas

Note that these libraries are compiled with switches that are relatively optimal but fully portable across the various IA32 architectures. In particular, they do not take advantage of Pentium III/4 SSE/SSE2 instructions, Pentium III prefetch instructions, Athlon prefetch instructions, or the AthlonXP/AthlonMP 3DNow! Professional instructions. If you would like to rebuild libblas.a and liblapack.a on a Pentium III/4, we recommend using the following options:

-fast -pc 64 -Mvect=sse -Mcache_align -Kieee

NOTE: slamch.f and dlamch.f must be compiled with -O0!

If you would like to rebuild libblas.a and liblapack.a on an AMD AthlonXP, we recommend using the following options:

-fast -pc 64 -Mvect=sse -Mcache_align -Kieee

As on the Pentium III, slamch.f and dlamch.f must be compiled with -O0.

2.13.2 Assembly-coded Math Libraries

We are no longer allowed to bundle the Intel libmkl.a routines that were available for Win32. However, users can download newer assembly-coded BLAS and FFT routines at a site found by going to


These libraries are available for both Win32 and Linux machines.

2.14 Fortran calling conventions on Win32

The pgf77/pgf90/pghpf compilers support all Microsoft calling conventions including Fortran STDCALL. In addition, the Fortran compilers support UNIX-style calling conventions on Win32. This allows simple porting of mixed Fortran/C applications from UNIX to Win32.

IMPORTANT: Object files compiled using release 1.7-6 or prior of the pgf77/pgf90/pghpf compilers for Win32 are not compatible with object files compiled using releases 3.0 or 4.0.

Section 6.14 of the User's Guide contains a detailed description of all supported Fortran calling conventions under Win32.

2.15 OpenMP Tutorial

A self-guided online tutorial is available to help you become familiar with how to use OpenMP parallelization directives. In particular, the tutorial takes the user step by step through the process of parallelizing the NAS FT benchmark using OpenMP directives. The tutorial can be found at:


You can download this file using a web browser, and unpack the file using the following commands:

% gunzip fftpde.tar.gz
% tar xvf fftpde.tar

Change directories to the fftpde sub-directory, and follow the instructions in the README file.

2.16 PGCC C and C++ Compiler Notes

The Rogue Wave Standard Template Library has been replaced with STLport, version 4.5. Enhancements with the STLport include:

Users should look at the STLport license for any usage issues.

2.17 The PGI Workstation 4.0 and glibc

Release 4.0 of the PGI Workstation compilers and tools is built and validated under Linux kernels 2.2.10 through 2.4.x. Linux distributions from Red Hat 6.0 to 7.3 and SuSE 6.1 to 8.0 incorporate revision 2.2.10 or greater of the Linux kernel and glibc2.1.x or greater. If you are using a version of Linux that is supported by our current release, the PGI installation script will automatically detect it. Your installation will be modified as appropriate for these systems.

2.18 The PGI Workstation 4.0 for Win32

2.18.1 Workstation Shell Environment

On Win32, a UNIX-like shell environment is bundled with the Workstation. After installation, a double-left-click on the Workstation icon on your desktop will launch a bash shell command window with pre-initialized environment settings. Most familiar UNIX commands are available (vi, emacs, sed, grep, awk, make, etc). If you are unfamiliar with the bash shell, refer to the user's guide included with the online HTML documentation.

Alternatively, you can launch a standard Win32 command window pre-initialized for usage of the compilers by selecting the appropriate option from the Workstation program group accessed in the usual way through the "Start" button.

Except where noted in the User's Guide, the command-level compilers and tools on Win32 function identically to their UNIX counterparts. You can customize your command window (white background with black text, add a scroll bar, etc.) by right-clicking on the top border of the PGI Workstation command window, selecting "Properties", and making the appropriate modifications. When the changes are complete, Win32 will allow you to apply the modifications globally to any command window launched using the Workstation desktop icon.

2.18.2 PGI Compilers for Win32 in MKS Toolkit

The PGI Workstation 4.0 no longer supports the MKS toolkit.
