Consider a Single Program Multiple Data (SPMD) program that is running on multiple processors but which does I/O to a single shared filesystem. The typical I/O model when SPMD is implemented using a communication library, such as MPI or SHMEM, is independent I/O from each image. If each image reads and writes distinct files there is no problem, but typically data from multiple images is transferred to and from a single file. Fortran 95 does not allow the same file to be connected to two external file units at the same time. Even if the SPMD images are assumed to have totally independent I/O spaces, Fortran 95's I/O model certainly assumes that exclusive access has been granted to connected files. For example, it assumes that no other process is writing into an open file. Therefore, the only completely safe SPMD Fortran I/O model is never to have the same file connected to more than one image at a time. Shared files are handled by a single image doing all Fortran I/O and transferring the data to and from the other images using communication library calls. Single-image I/O obviously leaves no room for application-directed parallel I/O to the file.
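As an illustration of the single-image model, here is a minimal sketch using MPI, in which rank 0 plays the role of the one image that does all Fortran I/O. The program name, array size, unit number and file name are assumptions made for the example and do not come from any particular application.

```fortran
program gather_write
  use mpi
  implicit none
  integer, parameter :: n = 4          ! illustrative local array size
  integer :: ierr, rank, nprocs
  real :: a(n)
  real, allocatable :: all_a(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  a = real(rank)                       ! each process fills its local data

  ! Only the root needs a full-size receive buffer.
  if (rank == 0) then
    allocate(all_a(n*nprocs))
  else
    allocate(all_a(1))                 ! ignored by MPI_Gather on non-root ranks
  end if

  ! Collect every process's local array on rank 0.
  call MPI_Gather(a, n, MPI_REAL, all_a, n, MPI_REAL, 0, MPI_COMM_WORLD, ierr)

  ! Rank 0 is the only process that ever connects the file.
  if (rank == 0) then
    open(unit=11, file='gathered.dat', status='replace', action='write', &
         form='unformatted')
    write(unit=11) all_a
    close(unit=11)
  end if

  call MPI_Finalize(ierr)
end program gather_write
```

The file is touched by one process only, so the program stays within the Fortran 95 I/O model, but all the data has to pass through rank 0 before it reaches the file.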
In practice it is possible to take advantage of extensions (often undocumented) to the Fortran standard for I/O on a given machine, but the resulting program may not be portable. Generally, files with ACTION='READ' can safely be accessed from multiple images but files with ACTION='WRITE' must be handled with care and ACTION='READWRITE' is never safe. Direct access writes from multiple images to distinct records may be safe, and may produce parallel I/O, on some machines. However, vendors rarely document this fact. Typically it requires at a minimum that Fortran I/O libraries be made "SPMD safe", so that CLOSE does not truncate the file to the highest record written by that image. On some systems simultaneous direct access writes are safe and produce parallel I/O only for certain record sizes and/or only if the file is identified to the operating system as "parallel" at about the time it is opened (e.g. by using non-standard system library calls).
One of the major limitations of the initial Co-Array Fortran implementation in CF90 for the Cray T3E is that it does not support Co-Array Fortran's I/O extensions to Fortran 95, using instead the independent I/O from each image approach that was already available on the T3E for SHMEM or MPI. Cray has an extensive set of vendor-specific extensions to Fortran I/O based around the assign command, which specifies properties of a file before it is opened. It is usually possible to make independent I/O to the same file work on the T3E by specifying the appropriate assign options for the file. However, the resulting program may not be portable to any other machine type (not even to the Origin 2000, also from SGI/Cray).
Co-Array Fortran can be implemented either by mapping images onto processes (e.g. CF90 on the Cray T3E), or by mapping images onto threads (e.g. Subset Co-Array Fortran translated to OpenMP Fortran on an SGI Origin or Sun E10000). Threaded I/O introduces a different set of problems. All threads share the same I/O space, so only one thread at a time can do I/O to a unit, and if two threads do a sequential read to the same file they will get different records. If any multi-image I/O to the same file is involved, it is unlikely that a program using standard Fortran I/O will be portable between a machine that uses an independent I/O space per image and a machine that uses a single I/O space for all images.
Co-Array Fortran I/O is only a minor extension to Fortran 95 I/O, but it can be implemented using either thread-based images or process-based images and it has none of the portability problems of Fortran 95 I/O on multiple images. Here is the complete definition of the I/O extensions:
Most of the time, each image executes its own read and write statements without regard for the execution of other images. However, Fortran 95 input and output processing cannot be used from more than one image without restrictions unless the images reference distinct file systems. Co-Array Fortran assumes that all images reference the same file system, but it avoids the problems that this can cause by specifying a single set of I/O units shared by all images and by extending the file connection statements to identify which images have access to the unit.
It is possible for several images to be connected on the same unit for direct-access input/output. The intrinsic sync_file may be used to ensure that any changed records in buffers that the image is using are copied to the file itself or to a replication of the file that other images access. This intrinsic plays the same role for I/O buffers as the intrinsic sync_memory does for temporary copies of co-array data. Execution of sync_file also has the effect of requiring the reloading of I/O buffers in case the file has been altered by another image. Because of the overheads of I/O, sync_file applies to a single file.
It is possible for several images to be connected on the same unit for sequential output. The processor shall ensure that while one image is transferring the data of a record to the file, no other image transfers data to the file. Thus, each record in an external file arises from a single image. The processor is permitted to hold the data in a buffer and transfer several whole records on execution of sync_file.
The I/O keyword TEAM is used to specify an integer rank-one array, connect_team, for the images that are associated with the given unit. All elements of connect_team shall have values between 1 and num_images() and there shall be no repeated values. One element shall have the value this_image(). The default connect_team is (/this_image()/).
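The three conditions on connect_team can be checked mechanically. The following is a sketch in standard Fortran; valid_connect_team is a hypothetical helper, not part of Co-Array Fortran, and its arguments me and nimg stand in for this_image() and num_images() so that it can be tried on a single image.

```fortran
module team_check
  implicit none
contains
  ! Hypothetical helper: .true. if team obeys the connect_team rules.
  logical function valid_connect_team(team, me, nimg) result(ok)
    integer, intent(in) :: team(:)   ! candidate connect_team
    integer, intent(in) :: me        ! stands in for this_image()
    integer, intent(in) :: nimg      ! stands in for num_images()
    integer :: i
    ! All values in range, and the calling image is a member.
    ok = all(team >= 1 .and. team <= nimg) .and. any(team == me)
    ! No value may appear more than once.
    do i = 1, size(team)
      if (count(team == team(i)) /= 1) ok = .false.
    end do
  end function valid_connect_team
end module team_check

program team_demo
  use team_check
  implicit none
  print *, valid_connect_team((/ 1, 2, 3 /), 2, 4)   ! in range, distinct, contains image 2
  print *, valid_connect_team((/ 1, 1, 3 /), 1, 4)   ! repeated value
  print *, valid_connect_team((/ 1, 3 /), 2, 4)      ! calling image is missing
  print *, valid_connect_team((/ 2 /), 2, 4)         ! the default team on image 2
end program team_demo
```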
The keyword TEAM is a connection specifier for the OPEN statement. All images in connect_team, and no others, shall invoke OPEN with an identical connection-spec-list. There is an implied call to sync_team with the single argument connect_team before and after the OPEN statement. The OPEN statement connects the file on the invoking images only, and the unit becomes unavailable on all other images. If the OPEN statement is associated with a processor dependent file, the file is the same for all images in connect_team. If connect_team contains more than one image, the OPEN shall have ACCESS=DIRECT or ACTION=WRITE.
An OPEN on a unit already connected to a file must have the same connect_team as currently in effect.
A file shall not be connected to more than one unit, even if the connect_teams for the units have no images in common.
Pre-connected units that allow sequential read shall be accessible on the first image only. All other pre-connected units have a connect_team containing all the images. Note: The input unit identified by "*" is therefore only available on image 1.
CLOSE has a TEAM= specifier. If the unit exists and is connected on more than one image, the CLOSE statement must have the same connect_team as currently in effect. There is an implied call to sync_file for the unit before CLOSE. There are implied calls to sync_team with single argument connect_team before and after the implied sync_file and before and after the CLOSE.
BACKSPACE, REWIND, and ENDFILE have a TEAM= specifier. If the unit exists and is connected on at least one image, the file positioning statement must have the same connect_team as currently in effect. There is an implied call to sync_file for the unit before the file positioning statement. There are implied calls to sync_team with single argument connect_team before and after the implied sync_file and before and after the file positioning statement.
sync_file(unit) is a subroutine for marking the progress of input-output on a unit. unit is an INTENT(IN) scalar argument of type integer and specifies the unit. The subroutine affects only the data for the file connected to the unit. If the unit is not connected on this image or does not exist, the subroutine has no effect. Before return from the subroutine, any file records that are held by the image in temporary storage and for which WRITE statements have been executed since the previous call of sync_file on the image (or since execution of OPEN in the case of the first sync_file call) shall be placed in the file itself or a replication of the file that other images access. The first subsequent access by the image to file data in temporary storage shall be preceded by data recovery from the file itself or its replication. If the unit is connected for sequential access, the previous WRITE statement shall have been for advancing input/output.
sync_file is required only when a record written by one image is read by another or when the relative order of writes from images is important. Without a sync_file all writes could be buffered locally until the file is closed. The sync_file intrinsic may require local buffers to be flushed to the actual file, and therefore have the effect of the flush subroutine available in many I/O libraries. However, if global buffers are used sync_file need only flush data from local buffers to global buffers. Since threaded implementations typically have only global I/O buffers, sync_file may be a null operation in this case.
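Here is a sketch of the first case, a record written by one image and read by another. It is written in the Co-Array Fortran dialect defined above, so it assumes an implementation that supports TEAM= and sync_file; the program name, unit number, file name and array size are illustrative only.

```fortran
      PROGRAM RING_EXCHANGE
      INTEGER, PARAMETER :: N = 1000        ! illustrative size
      REAL A(N), B(N)
      INTEGER LA, I, NEXT
      A = REAL(THIS_IMAGE())
      INQUIRE(IOLENGTH=LA) A
      ! Direct access, so a multi-image team may use ACTION='READWRITE'.
      OPEN( UNIT=31,FILE='fort.31',STATUS='NEW',ACTION='READWRITE',&
            FORM='UNFORMATTED',ACCESS='DIRECT',RECL=LA, &
            TEAM=(/ (I, I=1,NUM_IMAGES()) /) )
      WRITE(UNIT=31,REC=THIS_IMAGE()) A
      CALL SYNC_FILE(31)   ! make this image's record visible to the others
      CALL SYNC_ALL()      ! wait until every image has done the same
      ! Read the record written by the next image in the ring.
      NEXT = MOD(THIS_IMAGE(),NUM_IMAGES()) + 1
      READ(UNIT=31,REC=NEXT) B
      CLOSE(UNIT=31, TEAM=(/ (I, I=1,NUM_IMAGES()) /) )
      END PROGRAM RING_EXCHANGE
```

sync_file alone does not order the images; it is the sync_all that guarantees the neighbouring record has been written and flushed before it is read.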
Sequential I/O from multiple images is severely restricted in Co-Array Fortran. Sequential read from multiple images is not allowed at all, but see a recommended extension below. In addition, the compiler or I/O library automatically serializes any attempt at simultaneous sequential write from multiple images (at the file level; simultaneous writes to local I/O buffers are still possible). The advantage of these restrictions is that the operations can be guaranteed to be safe and can be implemented without difficulty using either threads or processes.
On the other hand, the less severe restrictions on direct access files have been designed to facilitate parallel I/O. A language definition does not typically require performance; it can only put in place the elements that allow optimizations which may lead to high performance. In the case of direct access, the I/O library can take advantage of two facts: all images connected to a given file/unit are using exactly the same record length, and the programmer guarantees that if an image is writing to a particular record no other image is simultaneously reading or writing that same record. High quality implementations of Co-Array Fortran will take full advantage of these factors to produce optimized direct access read and write performance. For example:
      REAL A(N)[*]
      ! LA is the record length needed for one image's part of A
      INQUIRE(IOLENGTH=LA) A
      ! Unit 11: image 1 writes the whole co-array as one long record
      IF (THIS_IMAGE().EQ.1) THEN
        OPEN( UNIT=11,FILE='fort.11',STATUS='NEW',ACTION='WRITE',&
              FORM='UNFORMATTED',ACCESS='DIRECT',RECL=LA*NUM_IMAGES())
        WRITE(UNIT=11,REC=1) A(:)[:]
        CLOSE(UNIT=11)
      ENDIF
      ! Unit 21: every image writes its local part to its own record
      OPEN( UNIT=21,FILE='fort.21',STATUS='NEW',ACTION='WRITE',&
            FORM='UNFORMATTED',ACCESS='DIRECT',RECL=LA, &
            TEAM=(/ (I, I=1,NUM_IMAGES()) /) )
      WRITE(UNIT=21,REC=THIS_IMAGE()) A
      CLOSE(UNIT=21, TEAM=(/ (I, I=1,NUM_IMAGES()) /) )
The co-array A is written identically to units 11 and 21. Unit 11 is open on the first image only, and the co-array is written as one long record. Since A is a co-array, the communication necessary to access remote image data occurs as part of the write statement. Unit 21 is open on all images and each image writes its local piece of the co-array to the file. Since each image is writing to a different record, there is no need for sync_file and the writes can occur simultaneously. Both the OPEN and CLOSE have implied synchronization, so the WRITE on one image cannot overlap the OPEN or CLOSE on another image.
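The equivalence of the two file layouts can be tried on a single image with standard Fortran. In the sketch below the columns of an ordinary two-dimensional array stand in for the per-image parts of the co-array; the sizes and file names are assumptions made for the example. Re-reading the first file with the shorter record length relies on direct access unformatted files carrying no record markers, which is typical but processor dependent.

```fortran
program direct_layout
  implicit none
  integer, parameter :: n = 4, nimg = 3   ! illustrative sizes
  real :: a(n, nimg), b(n), c(n)
  integer :: la, img, i
  logical :: same

  ! Column img stands in for the local part of the co-array on image img.
  do img = 1, nimg
    do i = 1, n
      a(i, img) = real(100*img + i)
    end do
  end do

  inquire(iolength=la) a(:, 1)

  ! Layout 1: one long record holding every image's data.
  open(unit=11, file='layout11.dat', status='replace', action='write', &
       form='unformatted', access='direct', recl=la*nimg)
  write(unit=11, rec=1) a
  close(unit=11)

  ! Layout 2: one short record per image.
  open(unit=21, file='layout21.dat', status='replace', action='write', &
       form='unformatted', access='direct', recl=la)
  do img = 1, nimg
    write(unit=21, rec=img) a(:, img)
  end do
  close(unit=21)

  ! Re-read both files record by record with the short length and compare.
  open(unit=11, file='layout11.dat', status='old', action='read', &
       form='unformatted', access='direct', recl=la)
  open(unit=21, file='layout21.dat', status='old', action='read', &
       form='unformatted', access='direct', recl=la)
  same = .true.
  do img = 1, nimg
    read(unit=11, rec=img) b
    read(unit=21, rec=img) c
    if (any(b /= c)) same = .false.
  end do
  close(unit=11, status='delete')
  close(unit=21, status='delete')

  print *, 'layouts identical: ', same
end program direct_layout
```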
The total restriction on multi-image sequential READ can make porting existing SPMD codes to Co-Array Fortran difficult, because a very common way of initializing a SPMD program is for each image to read the same setup data from the same sequential file. This restriction also makes new programs harder to write, since a lot of boiler-plate code can be required just to get each image into an identical initial state. The following extension to Co-Array Fortran I/O fixes this problem and is very easy to implement at the compiler or I/O library level:
Allow ACTION=READ on an OPEN with a multi-image connect_team.
Each sequential read operation on a unit connected on more than one
image must include a TEAM= specifier and shall have the same
connect-team as currently in effect. All the images in
connect-team, and no others, shall invoke the read with an
identical
READ(11,TEAM=(/(I,I=1,NUM_IMAGES())/)) A,B,C
This is otherwise impossible to do without a lot of boiler-plate code:
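      REAL BUF(3)[*]
      ! Only image 1 reads the record
      IF (THIS_IMAGE().EQ.1) THEN
        READ(11) BUF
      ENDIF
      ! Wait for the read, then copy the values from image 1
      CALL SYNC_ALL()
      A = BUF(1)[1]
      B = BUF(2)[1]
      C = BUF(3)[1]
      ! Keep BUF on image 1 unchanged until every image has its copy
      CALL SYNC_ALL()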
The boiler-plate isn't difficult to write, but it is precisely the kind of make-work that languages and compilers should shield us from. Note that multi-image sequential read is still not available for pre-connected units such as stdin.
The Subset Co-Array Fortran translator already supports this extension. CF90 3.1 does not yet support any of the Co-Array Fortran I/O extensions to Fortran 95. If you use CF90 on the T3E and want to continue to use multi-image sequential read when the full Co-Array Fortran language is implemented, please ask Cray to implement the above extension. Procurements that specify Co-Array Fortran should ask for the full language, as defined in ACM Fortran Forum, volume 17, no. 2 (with a list of corrections), but in addition they should require support for multi-image sequential read exactly as defined above.