This is an HTML version of Section 3 of R.W. Numrich and J.K. Reid (1998). Co-Array Fortran for parallel programming. Fortran Forum, volume 17, no. 2. Please refer to the full paper in technical citations.
Each image has its own set of data objects, all of which may be accessed in the normal Fortran way. Some objects are declared with co-dimensions in square brackets immediately following dimensions in parentheses or in place of them, for example:
real, dimension(20)[20,*] :: a
real :: c[*], d[*]
character :: b(20)[20,0:*]
integer :: ib(10)[*]
type(interval) :: s
dimension :: s[20,*]
Unless the array is allocatable (Section 3.6), the form for the dimensions in square brackets is the same as that for the dimensions in parentheses for an assumed-size array.
The set of objects on all the images is itself an array, called a co-array, which can be addressed with array syntax using subscripts in square brackets following any subscripts in parentheses (round brackets), for example:
a(5)[3,7] = ib(5)[3]
d[3] = c
a(:)[2,3] = c[1]
We call any object whose designator includes square brackets a co-array subobject; it may be a co-array element, a co-array section, or a co-array structure component.
The subscripts in square brackets are mapped to images in the same way as Fortran array subscripts in parentheses are mapped to memory locations in a Fortran 95 program. The co-subscripts that correspond to data on the current image are available from the intrinsic this_image with the co-array name as its argument.
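As a rough illustration of this mapping, the Python sketch below converts a co-subscript list to a 1-based image index in Fortran's column-major element order. It is a model only, not Co-Array Fortran, and the function name image_index is hypothetical. It assumes the co-bounds are given as the lower bound of every co-dimension plus the extents of every co-dimension except the last, whose upper bound is always an asterisk. It does not check the separate requirement that the resulting index must not exceed the number of images.

```python
from typing import Sequence


def image_index(cosubs: Sequence[int], lower: Sequence[int],
                extents: Sequence[int]) -> int:
    """Return the 1-based image index selected by a co-subscript list.

    cosubs:  co-subscripts, e.g. (3, 7) for a(5)[3,7]
    lower:   lower co-bound of every co-dimension (length = co-rank)
    extents: extents of all co-dimensions except the last (length = co-rank - 1)
    """
    if len(lower) != len(cosubs) or len(extents) != len(cosubs) - 1:
        raise ValueError("co-rank mismatch")
    index, stride = 1, 1
    for k, (s, lo) in enumerate(zip(cosubs, lower)):
        last = k == len(cosubs) - 1
        # Every co-subscript must be >= its lower bound; all but the last
        # must also be <= the upper bound lo + extent - 1.
        if s < lo or (not last and s >= lo + extents[k]):
            raise ValueError(f"co-subscript {s} out of range in co-dimension {k + 1}")
        index += (s - lo) * stride      # column-major: first co-subscript varies fastest
        if not last:
            stride *= extents[k]
    return index


# real, dimension(20)[20,*] :: a  -> a(5)[3,7] lives on image 1 + 2 + 6*20
print(image_index((3, 7), lower=(1, 1), extents=(20,)))        # 123
# real :: a(10,20)[20,0:5,*]      -> co-subscripts [1,0,1] select image 1
print(image_index((1, 0, 1), lower=(1, 0, 1), extents=(20, 6)))  # 1
```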
The rank, extents, size, and shape of a co-array or co-array subobject are given as for Fortran 95 except that we include both the data in parentheses and the data in square brackets. The local rank, local extents, local size, and local shape are given by ignoring the data in square brackets. The co-rank, co-extents, co-size, and co-shape are given from the data in square brackets. For example, given the co-array declared thus
real, dimension(10,20)[20,5,*] :: a
a(:,:)[:,:,1:15] has rank 5, local rank 2, co-rank 3, shape (/10,20,20,5,15/), local shape (/10,20/), and co-shape (/20,5,15/).
The co-size of a co-array is always equal to the number of images. If the co-rank is one, the co-array has a co-extent equal to the number of images and it has co-shape (/num_images()/). If the co-rank is greater than one, the co-array has no final extent, no final upper bound, and no co-shape (and hence no shape).
The local rank and the co-rank are each limited to seven. The syntax automatically ensures that these are the same on all images. The rank of a co-array subobject (sum of local rank and co-rank) must not exceed seven.
For a co-array subobject, square brackets may never precede parentheses.
A co-array must have the same bounds (and hence the same extents) on all images. For example, the subroutine
subroutine solve(n,a,b)
integer :: n
real :: a(n)[*], b(n)
must not be called on one image with n having the value 1000 and on another with n having the value 1001.
A co-array may be allocatable:
subroutine solve(n,a,b)
integer :: n
real :: a(n)[*], b(n)
real,allocatable :: work(:)[:]
Allocatable arrays are discussed in Section 3.6.
There is no mechanism for assumed-co-shape arrays. A co-array is not permitted to be a pointer. Automatic co-arrays are not permitted; for example, the co-array work in the above code fragment is not permitted to be declared thus
subroutine solve(n,a,b)
integer :: n
real :: a(n)[*], b(n)
real :: work(n)[*] ! Not permitted
A co-array is not permitted to be a constant.
A DATA statement initializes only local data. Therefore, co-array subobjects are not permitted in DATA statements. For example:
real :: a(10)[*]
data a(1) /0.0/ ! Permitted
data a(1)[2] /0.0/ ! Not permitted
Unless it is allocatable or a dummy argument, a co-array always has the SAVE attribute.
The image indices of a co-array always form a sequence, without any gaps, commencing at one. This is true for any lower bounds. For example, for the array declared as
real :: a(10,20)[20,0:5,*]
a(:,:)[1,0,1] refers to the rank-two array a(:,:) in image one.
Co-arrays may be of derived type but components of derived types are not permitted to be co-arrays.
Each object exists on every image, whether or not it is a co-array. In an expression, a reference without square brackets is always a reference to the object on the invoking image. For example, size(b) for co-array b declared as
character :: b(20)[20,0:*]
returns its local size, which is 20.
The subscript order value of the co-subscript list must never exceed the number of images. For example, if there are 16 images and the co-array a is declared thus
real :: a(10)[5,*]
a(:)[1,4] is valid since it has co-subscript order value 16, but a(:)[2,4] is invalid.
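The validity rule can be checked mechanically. The following Python sketch is illustrative only, and the names order_value and is_valid are hypothetical. It computes the co-subscript order value in column-major order and compares it with the number of images. It also shows why a co-array with co-rank greater than one has no final extent: with 16 images and co-bounds [5,*], the largest valid final co-subscript depends on the first co-subscript.

```python
from math import prod
from typing import Sequence


def order_value(cosubs: Sequence[int], lower: Sequence[int],
                extents: Sequence[int]) -> int:
    """1-based subscript order value of a co-subscript list (column-major)."""
    strides = [prod(extents[:k]) for k in range(len(cosubs))]
    return 1 + sum((s - lo) * st for s, lo, st in zip(cosubs, lower, strides))


def is_valid(cosubs: Sequence[int], lower: Sequence[int],
             extents: Sequence[int], num_images: int) -> bool:
    """True if every co-subscript is in bounds and the order value <= num_images."""
    for k, (s, lo) in enumerate(zip(cosubs, lower)):
        if s < lo or (k < len(extents) and s >= lo + extents[k]):
            return False
    return order_value(cosubs, lower, extents) <= num_images


# real :: a(10)[5,*] with 16 images
print(order_value((1, 4), (1, 1), (5,)), is_valid((1, 4), (1, 1), (5,), 16))  # 16 True
print(order_value((2, 4), (1, 1), (5,)), is_valid((2, 4), (1, 1), (5,), 16))  # 17 False

# The upper limit of the final co-subscript varies with the first co-subscript.
for first in (1, 2):
    top = max(s for s in range(1, 20) if is_valid((first, s), (1, 1), (5,), 16))
    print(first, top)   # 1 4, then 2 3
```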
Two arrays conform if they have the same shape. Co-array subobjects may be used in intrinsic operations and assignments in the usual way, for example,
b(:,1:m) = a[:,1:m]*c(:)[1:m] ! All have rank two.
b(j,:) = a[:,k] ! Both have rank one.
c[1:p:3] = d(1:p:3)[2] ! Both have rank one.
Square brackets attached to objects in an expression or an assignment alert the reader to communication between images. Unless square brackets appear explicitly, all expressions and assignments refer to the invoking image. Communication may take place, however, within a procedure that is referenced, which might be a defined operation or assignment.
The rank of the result of an intrinsic operation is derived from the ranks of its operands by the usual rules, disregarding the distinction between local rank and co-rank. The local rank of the result is equal to the rank. The co-rank is zero. Similarly, a parenthesized co-array subobject has co-rank zero. For example, 2.0*d(1:p:3)[2] and (d(1:p:3)[2]) each have rank 1, local rank 1, and co-rank 0.
Note that the statement
p[6] = 1
is executed by every image, not just image 6.
If code is to be executed selectively, the Fortran IF or CASE statement is needed. For example, the code
real :: p[*]
...
if (this_image(p) == 1) then
read(*,*) p ! image 1 reads the value
p[:] = p ! and copies it to p on every image
end if
call sync_all
employs the first image to read data and broadcast it to other images.
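The broadcast pattern can be mimicked with threads standing in for images. In this Python sketch, which is a model only and uses hypothetical names, the list p holds one copy of the co-array per image. The input value is passed in rather than read from a unit, and a threading.Barrier plays the role of sync_all, so no image reads its copy before image 1 has finished writing all of them.

```python
import threading


def broadcast(num_images: int, value: float) -> list:
    p = [0.0] * num_images               # p[k-1] models the copy of p on image k
    seen = [None] * num_images           # value each image reads after the sync
    sync_all = threading.Barrier(num_images)

    def image(me: int) -> None:
        if me == 1:                      # if (this_image(p) == 1) then
            p[0] = value                 #   read p  (input simulated)
            for k in range(num_images):  #   p[:] = p
                p[k] = p[0]
        sync_all.wait()                  # call sync_all
        seen[me - 1] = p[me - 1]         # plain reference to p on image me

    threads = [threading.Thread(target=image, args=(me,))
               for me in range(1, num_images + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


print(broadcast(4, 2.5))   # [2.5, 2.5, 2.5, 2.5]
```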
A co-array subobject is permitted only in intrinsic operations, intrinsic assignments, and input/output lists.
If a dummy argument has co-rank zero, the value of a co-array subobject may be passed by using parentheses to make an expression, for example,
c(1:p:2) = sin( (d[1:p:2]) )
If a dummy argument has nonzero co-rank, the co-array properties are defined afresh and are completely independent of those of the actual argument. The interface must be explicit. The actual argument must be the name of a co-array or a subobject of a co-array without any square brackets, vector-valued subscripts, or pointer component selection; any subscript expressions must have the same value on all images. If the dummy argument has nonzero local rank and its local shape is not assumed, the actual argument shall not be an array section, involve component selection, be an assumed-shape array, or be a subobject of an assumed-shape array.
A function result is not permitted to be a co-array.
A pure or elemental procedure is not permitted to contain any Co-Array Fortran extensions.
The rules for resolving generic procedure references remain unchanged.
COMMON and EQUIVALENCE statements are permitted for co-arrays and specify how the storage is arranged on each image (the same for every one). Therefore, co-array subobjects are not permitted in an EQUIVALENCE statement. For example,
equivalence (a[10],b[7]) ! Not allowed (compile-time constraint)
is not permitted. Appearing in a COMMON or EQUIVALENCE statement has no effect on whether an object is a co-array; it is a co-array only if declared with square brackets. An EQUIVALENCE statement is not permitted to associate a co-array with an object that is not a co-array. For example,
integer :: a,b[*]
equivalence (a,b) ! Not allowed (compile-time constraint)
is not permitted. A COMMON block that contains a co-array always has the SAVE attribute. Which objects in the COMMON block are co-arrays may vary between scoping units. Since blank COMMON may vary in size between scoping units, co-arrays are not permitted in blank COMMON.
A co-array may be allocatable. The ALLOCATE statement is extended so that the co-extents can be specified, for example,
real, allocatable :: a(:)[:], s[:,:]
...
allocate ( a(10)[*], s[34,*] )
The upper bound for the final co-dimension must always be given as an asterisk, and the values of all the other bounds are required to be the same on all images. For example, the following are not permitted:
allocate(a(num_images())) ! Not allowed (compile-time constraint)
allocate(a(this_image())[*]) ! Not allowed (run-time constraint)
There is implicit synchronization of all images in association with each ALLOCATE statement that involves one or more co-arrays. Images do not commence executing subsequent statements until all images finish execution of an ALLOCATE statement for the same set of co-arrays. Similarly, for DEALLOCATE, all images delay making the deallocations until they are all about to execute a DEALLOCATE statement for the same set of co-arrays.
An allocatable co-array without the SAVE attribute must not have the status of currently allocated if it goes out of scope when a procedure is exited by execution of a RETURN or END statement.
When an image executes an allocate statement, no communication is involved apart from any required for synchronization. The image allocates the local part and records how the corresponding parts on other images are to be addressed. The compiler, except perhaps in debug mode, is not required to enforce the rule that the bounds are the same on all images. Nor is the compiler responsible for detecting or resolving deadlock problems. For allocation of a co-array that is local to a recursive procedure, each image must descend to the same level of recursion or deadlock may occur.
A co-array is not permitted to be a pointer.
A co-array may be of a derived type with pointer components. For example, if p is a pointer component, z[i]%p is a reference to the target of component p of z on image i. To avoid references with co-array syntax to data that is not in a co-array, we limit each pointer component of a co-array to the behaviour of an allocatable component of a co-array:
To avoid hidden references to co-arrays, the target in a pointer assignment statement is not permitted to be any part of a co-array. For example,
q => z[i]%p ! Not allowed (compile-time constraint)
is not permitted.
Intrinsic assignments are not permitted for co-array subobjects of a derived type that has a pointer component, since they would involve a disallowed pointer assignment for the component:
z[i] = z ! Not allowed if z has a pointer component (compile-time constraint)
z = z[i] ! Not allowed if z has a pointer component (compile-time constraint)
Similarly, it is legal to allocate a co-array of a derived type that has pointer components, but it is illegal to allocate one of those pointer components on another image:
type(something), allocatable :: t[:]
...
allocate(t[*]) ! Allowed
allocate(t%ptr(n)) ! Allowed
allocate(t[q]%ptr(n)) ! Not allowed (compile-time constraint)
Most of the time, each image executes on its own as a Fortran 95 program without regard to the execution of other images. It is the programmer's responsibility to ensure that whenever an image alters co-array data, no other image might still need the old value, and that whenever an image accesses co-array data, it does not obtain an old value that another image has yet to update. The programmer uses invocations of the intrinsic synchronization procedures to do this and should make no assumptions about the execution timing on different images. This obligation on the programmer provides the compiler with scope for optimization. When constructing code for execution on an image, the compiler may assume that the image is the only image in execution until the next invocation of one of the intrinsic synchronization procedures, and thus it may use all the optimization techniques available to a standard Fortran 95 compiler.
In particular, if the compiler employs temporary memory such as cache or registers (or even packets in transit between images) to hold co-array data, it must copy any such data it has defined to memory that can be accessed by another image to make it visible to it. Also, if another image changes the co-array data, the executing image must recover the data from global memory to the temporary memory it is using. The intrinsic procedure sync_memory is provided for both purposes. It is concerned only with data held in temporary memory on the executing image for co-arrays in the local scope. Given this fundamental intrinsic procedure, the other synchronization procedures can be programmed in Co-Array Fortran, but the intrinsic versions, which we describe next, are likely to be more efficient. In addition, the programmer may use it to express customized synchronization operations in Co-Array Fortran.
If data calculated on one image are to be accessed on another, the first image must call sync_memory after the calculation is complete and the second must call sync_memory before accessing the data. Synchronization is needed to ensure that sync_memory is called on the first before sync_memory is called on the second.
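The division of labour between the two sync_memory calls can be seen in a toy, single-threaded Python model. This is a sketch under strong assumptions, not a description of how a compiler implements sync_memory. Each image keeps temporary copies of co-array data, writes stay local until sync_memory publishes them, and sync_memory also discards cached values so that the next access reloads them. The class name Image and the dictionary standing in for globally accessible storage are hypothetical. The sequential order of the statements stands in for the synchronization that, on real images, must ensure that the first sync_memory precedes the second.

```python
class Image:
    """Toy model of one image's temporary storage for co-array data."""

    def __init__(self, shared: dict):
        self.shared = shared      # storage that other images access
        self.cache = {}           # temporary copies (registers, cache, ...)
        self.dirty = set()        # names defined here but not yet published

    def read(self, name):
        if name not in self.cache:              # first access: recover from shared storage
            self.cache[name] = self.shared[name]
        return self.cache[name]

    def write(self, name, value):
        self.cache[name] = value                # defined only in temporary storage
        self.dirty.add(name)

    def sync_memory(self):
        for name in self.dirty:                 # publish data defined on this image
            self.shared[name] = self.cache[name]
        self.dirty.clear()
        self.cache.clear()                      # force reload on next access


shared = {"x": 0}
producer, consumer = Image(shared), Image(shared)
consumer.read("x")                 # consumer now holds a temporary copy of x
producer.write("x", 42)
before = consumer.read("x")        # 0: nothing published or reloaded yet
producer.sync_memory()             # first image publishes after the calculation
stale = consumer.read("x")         # still 0: consumer has not called sync_memory
consumer.sync_memory()             # second image discards its temporary copy
after = consumer.read("x")         # 42
print(before, stale, after)        # 0 0 42
```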
The subroutine sync_team provides synchronization for a team of images. The subroutine sync_all (see Section 3.10) provides a shortened call for the important case where the team contains all the images. Each invocation of sync_team or sync_all has the effect of sync_memory. The subroutine sync_all is not discussed further in this section.
For each invocation of sync_team on one image of a team, there shall be a corresponding invocation of sync_team on every other image of the team. The n-th invocation for the team on one image corresponds to the n-th invocation for the team on each other image of the team, n=1,2,... . The team is specified in an obligatory argument team.
The subroutine also has an optional argument wait. If this argument is absent from a call on one image it must be absent from all the corresponding calls on other images of the team. If wait is absent, each image of the team waits for all the other images of the team to make corresponding calls. If wait is present, the image is required to wait only for the images specified in wait to make corresponding calls.
For example, consider the following code:
me = this_image()
if(me>1) call sync_team( me-1 )
p[6] = p[6] + 1
if(me<num_images()) call sync_team( me+1 )
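Here the images increment p[6] one at a time, in order of image index: each image other than the first synchronizes with its predecessor before updating p[6], and each image other than the last synchronizes with its successor afterwards. Without a further call of sync_memory, the full result is available only on the last image.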
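A thread-based Python model of this pipeline is sketched below. It is illustrative only and its names are hypothetical. Each pair of matching sync_team calls between neighbours is replaced by a one-way threading.Event handoff. That is enough to reproduce the ordering, although a real sync_team without wait is a two-way rendezvous. The model assumes at least six images so that p[6] exists, and it records the value each image sees immediately after its own update.

```python
import threading


def pipelined_increment(num_images: int) -> list:
    if num_images < 6:
        raise ValueError("p[6] needs at least six images")
    p6 = [0]                          # the element p[6], initially zero
    observed = [0] * num_images       # p[6] as seen by each image after its update
    done = [threading.Event() for _ in range(num_images)]  # done[k-1]: image k finished

    def image(me: int) -> None:
        if me > 1:
            done[me - 2].wait()       # if(me>1) call sync_team( me-1 )
        p6[0] = p6[0] + 1             # p[6] = p[6] + 1
        observed[me - 1] = p6[0]
        if me < num_images:
            done[me - 1].set()        # if(me<num_images()) call sync_team( me+1 )

    threads = [threading.Thread(target=image, args=(me,))
               for me in range(1, num_images + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed


print(pipelined_increment(8))   # [1, 2, 3, 4, 5, 6, 7, 8]
```

Only the last entry equals the full count, which matches the observation that the complete result is available only on the last image.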
Teams are permitted to overlap, but the following rule is needed to avoid any possibility of deadlock. If a call for one team is made ahead of a call for another team on a single image, the corresponding calls shall be in the same order on all images in common to the two teams.
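The ordering rule for overlapping teams can be checked offline. This Python sketch is illustrative, and both the function name and the input format are assumptions. For each image, it takes the sequence of teams that the image passes to sync_team, and it pairs the n-th call for a team on one image with the n-th call for that team on the other images. It then reports whether any two images make those calls in different orders, which is the situation that can lead to deadlock.

```python
from itertools import combinations


def team_order_consistent(calls: dict) -> bool:
    """calls maps image index -> list of teams (frozensets) in call order."""
    positions = {}
    for img, seq in calls.items():
        count, pos = {}, {}
        for step, team in enumerate(seq):
            if img not in team:
                raise ValueError(f"image {img} is not in team {sorted(team)}")
            count[team] = count.get(team, 0) + 1
            pos[(team, count[team])] = step   # position of the n-th call for this team
        positions[img] = pos
    for (_, pi), (_, pj) in combinations(positions.items(), 2):
        common = pi.keys() & pj.keys()        # corresponding calls made by both images
        for a, b in combinations(common, 2):
            if (pi[a] < pi[b]) != (pj[a] < pj[b]):
                return False
    return True


t12, t123 = frozenset({1, 2}), frozenset({1, 2, 3})
ok = {1: [t12, t123], 2: [t12, t123], 3: [t123]}
bad = {1: [t12, t123], 2: [t123, t12], 3: [t123]}   # images 1 and 2 disagree
print(team_order_consistent(ok), team_order_consistent(bad))   # True False
```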
The intrinsic sync_file plays a similar role for file data to that of sync_memory for co-array data. Because of the high overheads associated with file operations, sync_team does not have the effect of sync_file. If data written by one image to a file is to be read by another image without closing the connection and re-opening it on the other image, calls of sync_file on both images are needed (details in Section 3.9).
To avoid the need for the programmer to place invocations of sync_memory around many procedure invocations, these are implicitly placed around any procedure invocation that might involve any reference to sync_memory. Formally, we define a caf procedure as
Exceptionally, it may be necessary to limit execution of a piece of code to one image at a time. Such code is called a critical section. We provide the subroutine start_critical to mark the commencement of a critical region and the subroutine end_critical to mark its completion. Both have the effect of sync_memory. Each image maintains an integer called its critical count. Initially, all these counts are zero. On entry to start_critical, the image waits for the system to give it permission to continue, which will only happen when all other images have zero critical counts. The image then increments its critical count by one and returns. Having these counts permits nesting of critical regions. On entry to end_critical, the image decrements its critical count by one and returns.
For example, consider the following code:
me = this_image()
call start_critical
p[6] = p[6] + 1
call end_critical
if (me==1) then
call sync_all( (/ (i, i=1,num_images()) /) )
else
call sync_all( me )
endif
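Here the critical region guarantees an atomic update of p[6], but the sync_all is required to make the full result available on image 1. Image 1 waits for all images, whereas every other image passes its own index as wait and so does not wait for any other image.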
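The following Python model is a sketch under stated assumptions, and Critical and critical_increment are hypothetical names. It mimics start_critical and end_critical with a re-entrant lock, whose per-thread recursion depth plays the part of an image's critical count. Nested regions therefore work, and a thread can enter only when no other thread holds the lock. Here p6 stands for p[6]. To make a race likely if the lock were missing, each image increments it many times inside a nested region. Image 1 then waits for the others, as with sync_all over all images, while the other images do not wait.

```python
import threading
import time


class Critical:
    """start_critical/end_critical modelled by a re-entrant lock."""

    def __init__(self):
        self._lock = threading.RLock()
        self._count = threading.local()       # per-thread critical count

    def start(self):
        self._lock.acquire()                  # blocks while another thread holds the lock
        self._count.n = getattr(self._count, "n", 0) + 1

    def end(self):
        if getattr(self._count, "n", 0) <= 0:
            raise RuntimeError("end_critical with a zero critical count")
        self._count.n -= 1
        self._lock.release()


def critical_increment(num_images: int, reps: int = 100) -> int:
    crit = Critical()
    p6 = [0]
    arrived = threading.Semaphore(0)
    result = {}

    def image(me: int) -> None:
        for _ in range(reps):
            crit.start()
            crit.start()                      # nested region: count becomes 2
            old = p6[0]
            time.sleep(0)                     # invite a thread switch mid-update
            p6[0] = old + 1
            crit.end()
            crit.end()
        if me == 1:
            for _ in range(num_images - 1):   # image 1 waits for every other image
                arrived.acquire()
            result["p6"] = p6[0]
        else:
            arrived.release()                 # other images do not wait

    threads = [threading.Thread(target=image, args=(me,))
               for me in range(1, num_images + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["p6"]


print(critical_increment(4))   # 400
```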
The effect of a STOP statement is to cause all images to cease execution. If a delay is required until other images have completed execution, a synchronization statement should be employed.
Most of the time, each image executes its own read and write statements without regard for the execution of other images. However, Fortran 95 input and output processing cannot be used from more than one image without restrictions unless the images reference distinct file systems. Co-Array Fortran assumes that all images reference the same file system, but it avoids the problems that this can cause by specifying a single set of I/O units shared by all images and by extending the file connection statements to identify which images have access to the unit.
It is possible for several images to be connected on the same unit for direct-access input/output. The intrinsic sync_file may be used to ensure that any changed records in buffers that the image is using are copied to the file itself or to a replication of the file that other images access. This intrinsic plays the same role for I/O buffers as the intrinsic sync_memory does for temporary copies of co-array data. Execution of sync_file also has the effect of requiring the reloading of I/O buffers in case the file has been altered by another image. Because of the overheads of I/O, sync_file applies to a single file.
It is possible for several images to be connected on the same unit for sequential output. The processor shall ensure that while one image is transferring the data of a record to the file, no other image transfers data to the file. Thus, each record in an external file arises from a single image. The processor is permitted to hold the data in a buffer and transfer several whole records on execution of sync_file.
The I/O keyword TEAM is used to specify an integer rank-one array, connect_team, for the images that are associated with the given unit. All elements of connect_team shall have values between 1 and num_images() and there shall be no repeated values. One element shall have the value this_image(). The default connect_team is (/this_image()/).
The keyword TEAM is a connection specifier for the OPEN statement. All images in connect_team, and no others, shall invoke OPEN with an identical connection-spec-list. There is an implied call to sync_team with the single argument connect_team before and after the OPEN statement. The OPEN statement connects the file on the invoking images only, and the unit becomes unavailable on all other images. If the OPEN statement is associated with a processor dependent file, the file is the same for all images in connect_team. If connect_team contains more than one image, the OPEN shall have ACCESS=DIRECT or ACTION=WRITE.
An OPEN on a unit already connected to a file must have the same connect_team as currently in effect.
A file shall not be connected to more than one unit, even if the connect_teams for the units have no images in common.
Pre-connected units that allow sequential read shall be accessible on the first image only. All other pre-connected units have a connect_team containing all the images.
CLOSE has a TEAM= specifier. If the unit exists and is connected on more than one image, the CLOSE statement must have the same connect_team as currently in effect. There is an implied call to sync_file for the unit before CLOSE. There are implied calls to sync_team with single argument connect_team before and after the implied sync_file and before and after the CLOSE.
BACKSPACE, REWIND, and ENDFILE have a TEAM= specifier. If the unit exists and is connected on at least one image, the file positioning statement must have the same connect_team as currently in effect. There is an implied call to sync_file for the unit before the file positioning statement. There are implied calls to sync_team with single argument connect_team before and after the implied sync_file and before and after the file positioning statement.
Co-Array Fortran adds the following intrinsic procedures. Only num_images, log2_images, and rem_images are permitted in specification expressions. None are permitted in initialization expressions. Square brackets around arguments, as in sync_all([wait]), indicate optional arguments and are not part of the call syntax.
end_critical() is a subroutine for limiting parallel execution. Each image holds an integer called its critical count. On entry, the count for the image shall be positive. The subroutine decrements this count by one. end_critical has the effect of sync_memory.
log2_images() returns the base-2 logarithm of the number of images, truncated to an integer. It is an inquiry function whose result is a scalar of type default integer.
num_images() returns the number of images. It is an inquiry function whose result is a scalar of type default integer.
rem_images() returns mod(num_images(),2**log2_images()). It is an inquiry function whose result is a scalar of type default integer.
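The two arithmetic inquiry functions can be written out directly. In this Python sketch, the number of images is passed as an argument so that the functions can be tried for any image count. That argument is an assumption of the sketch, since the intrinsics take none. int.bit_length gives the truncated base-2 logarithm exactly, without floating-point rounding.

```python
def log2_images(n: int) -> int:
    """Truncated base-2 logarithm of the number of images n (n >= 1)."""
    if n < 1:
        raise ValueError("there is always at least one image")
    return n.bit_length() - 1


def rem_images(n: int) -> int:
    """mod(num_images(), 2**log2_images()) for n images."""
    return n % 2 ** log2_images(n)


for n in (1, 7, 12, 16):
    print(n, log2_images(n), rem_images(n))
# 1 0 0
# 7 2 3
# 12 3 4
# 16 4 0
```

With 12 images, for instance, 8 = 2**3 images form the largest power-of-two subset and 4 remain. An algorithm based on recursive doubling might presumably treat those remaining images separately.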
start_critical() is a subroutine for limiting parallel execution. Each image holds an integer called its critical count. Initially all these counts are zero. The image waits for the system to give it permission to continue, which will only happen when all other images have zero critical counts. The image then increments its critical count by one and returns. start_critical has the effect of sync_memory.
sync_all([wait]) is a subroutine that synchronizes all images. sync_all() is treated as sync_team(all) and sync_all(wait) is treated as sync_team(all,wait), where all has the value (/ (I,I=1,num_images()) /).
sync_all([wait]) has the effect of sync_memory.
sync_file(unit) is a subroutine for marking the progress of input-output on a unit. unit is an INTENT(IN) scalar argument of type integer and specifies the unit. The subroutine affects only the data for the file connected to the unit. If the unit is not connected on this image or does not exist, the subroutine has no effect. Before return from the subroutine, any file records that are held by the image in temporary storage and for which WRITE statements have been executed since the previous call of sync_file on the image (or since execution of OPEN in the case of the first sync_file call) shall be placed in the file itself or a replication of the file that other images access. The first subsequent access by the image to file data in temporary storage shall be preceded by data recovery from the file itself or its replication. If the unit is connected for sequential access, the previous WRITE statement shall have been for advancing input/output.
sync_team(team [,wait]) is a subroutine that synchronizes images. team is an INTENT(IN) argument that is of type integer and is scalar or of rank one. The scalar case is treated as if the argument were the array (/this_image(),team/); in this case, team must not have the value this_image(). All elements of team shall have values in the range 1<=team(i)<=num_images() and there shall be no repeated values. One element of team shall have the value this_image(). wait is an optional INTENT(IN) argument that is of type integer and is scalar or of rank one. Each element, if any, of wait shall have a value equal to that of an element of team. The scalar case is treated as if the argument were the array (/wait/).
The argument team specifies a team of images that includes the invoking image. For each invocation of sync_team on one image, there shall be a corresponding invocation of sync_team for the same team on every other image of the team. The n-th invocation for the team on one image corresponds to the n-th invocation for the team on each other image of the team, n=1, 2, ... . If a call for one team is made ahead of a call for another team on a single image, the corresponding calls shall be in the same order on all images in common to the two teams.
If wait is absent on one image it must be absent in all the corresponding calls on the other images of the team. In this case, wait is treated as if it were equal to team and all images of the team wait until all other images of the team are executing corresponding calls. If wait is present, the image waits for all the images specified by wait to execute corresponding calls.
sync_team(team[,wait]) has the effect of sync_memory.
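The argument rules for sync_team amount to a normalization followed by a few checks. This Python sketch is illustrative. normalize_team is a hypothetical helper, and me and num_images are passed explicitly in place of this_image() and num_images(). It converts scalar arguments to arrays as described above and raises ValueError for arguments that the rules forbid.

```python
from typing import Optional, Sequence, Union

Team = Union[int, Sequence[int]]


def normalize_team(team: Team, wait: Optional[Team], me: int,
                   num_images: int) -> tuple:
    """Return (team, wait) as lists after applying the sync_team argument rules."""
    if isinstance(team, int):
        if team == me:
            raise ValueError("scalar team must not equal this_image()")
        team = [me, team]                 # treated as (/this_image(),team/)
    team = list(team)
    if any(not 1 <= t <= num_images for t in team):
        raise ValueError("team element out of range 1..num_images()")
    if len(set(team)) != len(team):
        raise ValueError("repeated value in team")
    if me not in team:
        raise ValueError("team must contain this_image()")
    if wait is None:
        wait = list(team)                 # absent wait: wait for the whole team
    elif isinstance(wait, int):
        wait = [wait]                     # treated as (/wait/)
    else:
        wait = list(wait)
    if any(w not in team for w in wait):
        raise ValueError("every element of wait must be an element of team")
    return team, wait


print(normalize_team(3, None, me=2, num_images=4))           # ([2, 3], [2, 3])
print(normalize_team([1, 2, 3, 4], 2, me=1, num_images=4))   # ([1, 2, 3, 4], [2])
```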
sync_memory() is a subroutine for marking the progress of the execution sequence. Before return from the subroutine, any co-array data that is accessible in the scoping unit of the invocation and is held by the image in temporary storage and has been defined there shall be placed in the storage that other images access. The first subsequent access by the image to co-array data in this temporary storage shall be preceded by data recovery from the storage that other images access.
this_image([array[,dim]]) returns the index of the invoking image, or the set of co-subscripts of array that denotes data on the invoking image. The type of the result is always default integer. There are four cases:
Case (i). If array is absent, the result is a scalar with value equal to the index of the invoking image. It is in the range 1, 2, ..., num_images().
Case (ii). If array is present with co-rank 1 and dim is absent, the result is a scalar with value equal to the co-subscript of the element of array that resides on the invoking image.
Case (iii). If array is present with co-rank greater than 1 and dim is absent, the result is an array of size equal to the co-rank of array. Element k of the result has value equal to co-subscript k of the element of array that resides on the invoking image.
Case (iv). If array and dim are present, the result is a scalar with value equal to co-subscript dim of the element of array that resides on the invoking image.
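Cases (ii) to (iv) invert the mapping from co-subscripts to image indices. The Python sketch below is illustrative, and the function name and argument layout are assumptions. It recovers the co-subscripts for a given image index from the lower co-bounds and the extents of all co-dimensions except the last. Case (i) is simply the image index itself.

```python
from typing import List, Optional, Sequence, Union


def this_image_cosubs(image: int, lower: Sequence[int], extents: Sequence[int],
                      dim: Optional[int] = None) -> Union[int, List[int]]:
    """Co-subscripts of the element of a co-array that resides on `image`.

    lower:   lower co-bound of every co-dimension (length = co-rank)
    extents: extents of all co-dimensions but the last (length = co-rank - 1)
    """
    q = image - 1
    subs = []
    for lo, e in zip(lower[:-1], extents):
        subs.append(lo + q % e)        # first co-subscript varies fastest
        q //= e
    subs.append(lower[-1] + q)         # the last co-dimension absorbs the rest
    if dim is not None:                # case (iv)
        return subs[dim - 1]
    return subs[0] if len(subs) == 1 else subs   # case (ii) or case (iii)


print(this_image_cosubs(123, lower=(1, 1), extents=(20,)))          # [3, 7]
print(this_image_cosubs(123, lower=(1, 1), extents=(20,), dim=2))   # 7
print(this_image_cosubs(1, lower=(1, 0, 1), extents=(20, 6)))       # [1, 0, 1]
print(this_image_cosubs(5, lower=(1,), extents=()))                 # 5
```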
All the following changes are expressed with respect to the original Fortran Forum article and all have been applied to the body of the text above. They are included here for completeness.