Investigation of the Effects of Input Uncertainty on Population Forecasting

Phil Rees and Ian Turton
School of Geography, University of Leeds, Leeds, LS2 9JT, UK
Email: p.rees@geog.leeds.ac.uk, i.turton@geog.leeds.ac.uk



Abstract

Cohort-component models have been used for many years to project future populations under a variety of scenarios. However, little work has been carried out on the effects of uncertainty in the model inputs on projection outcomes. The paper investigates these effects using a projection model for 12 European Community countries and 71 regions. Scenarios for fertility, mortality, extra-European Community migration, inter-country and inter-region migration drive the projections. The projection inputs are based on past data and expert opinion. The projection model program was modified to run on a parallel computer, which allowed thousands of model runs to be implemented, randomly perturbing all or some of the inputs. The simulation of input uncertainty allows error bounds to be placed on projection outputs. The results for individual inputs show how much each contributes to output uncertainty. The analysis concludes with an evaluation of the approach used to handle uncertainty and makes suggestions for improvements and further experimentation.

1. The problem

Population forecasting is now a routine activity for National Statistical Offices (NSOs) and International Bodies (UN, Eurostat). The activity involves predicting the distribution of the population in the future across key demographic characteristics such as age and sex for national populations. These predictions are used in a variety of planning contexts, the most important of which include state pension provision, meeting the demand for education and organising housing supply and infrastructure development. Population forecasting also extends in most developed countries to sub-national populations.

In this paper we look at population forecasts in a new way that has been pioneered recently by researchers in a variety of countries (Lutz et al. 1996, Lutz et al. 1997, Lutz and Scherbov 1998, Alho and Spencer 1985, Alho and Spencer 1995, Alho 1997, Keilman 1990, Lee and Tuljapurkar 1994). The idea is to estimate, in one of a variety of ways, uncertainty in the inputs to population forecasts and then to derive from these estimates probability distributions of projection outcomes. In the rest of this section of the paper the basic ideas used in population forecasting are briefly summarized. In section 2 strategies for dealing with uncertainty are reviewed and one is selected for experimentation. Section 3 contains a description of the projection model and system for the experimentation, while section 4 outlines how parallel processing is used to run the projection model thousands of times with different, randomly selected departures from component scenarios. Results are evaluated for a set of twelve countries and seventy-one regions in section 5. Section 6 assesses the experiments and draws out lessons for further exploration of methods for generating probabilistic population projections.

1.1 Forecasting methods

The main methodology used for population forecasting is the demographic cohort-component model, routinely expanded into multiregional form (Rogers 1995) when forecasts are required for populations exchanging substantial numbers of migrants, such as major world regions (Lutz 1994) or sub-national regions (Van Imhoff et al. 1994, Van der Gaag et al. 1997). The cohort-component model applies age-specific intensities of fertility, mortality and migration to base populations disaggregated suitably by age and sex.

1.2 Methods of handling migration

The migration component can be treated as one, two or three sets of variables. Most national population forecasts have one set of external migration inputs, usually the gross or net flows between that country and the rest of the world (e.g. OPCS 1995). Many sub-national population forecasts incorporate a second set of migration variables: the flows between the regions for which forecasts are prepared (Van Imhoff et al. 1994, Van der Gaag et al. 1997). A third method for treating migration involves a system of many countries divided into regions: migration flows between countries are represented explicitly and their totals distributed to regions alongside interregional migration (Rees, Stillwell and Convey 1992, Rees 1996).

1.3 Forecasting assumptions

Each population forecast is driven by a set of assumptions about how the demographic behaviour of the populations under study will change in future. Although there are a large number of exogenous input variables needed in any forecast, it is usual to reduce these to a small number of leading indicators. Future fertility trajectories may be projected by assumptions about the total fertility rate (number of children per woman); future mortality may be projected by assumptions about life expectancy at birth; future migration may be projected by altering a general level of migration variable (total numbers or gross migration rates) or by changing the attractiveness of destinations.

1.4 Forecasting scenarios

Through analysis of past trends or of factors influencing demographic behaviour, views are taken of the likely development of leading indicators. Collections of trajectories of component leading indicators that have common features are called scenarios.

It is very common to prepare several demographic scenarios termed low, middle and high. The middle or central scenario usually assumes continuation of current levels of fertility, mortality and migration or continuation of current trends in the short term with adoption of some stationary level thereafter. For example, mortality rates are often assumed to decline at a steady or diminishing relative rate (e.g. 1% per annum) down to some limit when they cease to decline (e.g. OPCS 1995). A low scenario is one in which the trends in vital and migration indicators lead to lower forecast populations. A high scenario produces the opposite result from a combination of higher fertility, lower mortality and higher in-migration. When low, middle and high projections are offered, users usually plump for the middle, in the absence of any guidance as to the likelihood of each. Occasionally, high and low projections are referred to as upper and lower confidence limits, by analogy with inferential statistics based on sampling. This implies that only 5 per cent of the time is the projected population likely to move outside these limits.

It is not obvious that the low, middle and high scenarios on the three components of change should necessarily coincide. For example, a high fertility population might experience excellent health and experience a low mortality scenario. Lutz et al. (1994) carry out projections for world regions using combinations of low, central and high scenarios: in theory, there are 27 combinations (3 fertility scenarios x 3 mortality scenarios x 3 migration scenarios) but only a subset of these are presented. A plethora of new information results from alternative scenarios and we need ways of organising and using this information.
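The count of 27 combinations follows directly from crossing three levels over three components. As a minimal sketch (the names `LEVELS`, `COMPONENTS` and `scenario_combinations` are illustrative assumptions, not part of any projection program cited here), the full set can be enumerated as follows:

```python
from itertools import product

# Illustrative labels only; they are not taken from any cited projection system.
LEVELS = ("low", "central", "high")
COMPONENTS = ("fertility", "mortality", "migration")


def scenario_combinations():
    """Return every assignment of one level to each component."""
    return [
        dict(zip(COMPONENTS, combo))
        for combo in product(LEVELS, repeat=len(COMPONENTS))
    ]


combos = scenario_combinations()
print(len(combos))  # 27 = 3 x 3 x 3
```

The high fertility and low mortality pairing mentioned above is simply one of these 27 entries, which shows why presenting only the low-low-low, central and high-high-high combinations discards most of the scenario space.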

The development of scenario projections suggests that there is considerable uncertainty about the assumptions input to the process and that users of projection information would like clearer guidance on the degree of uncertainty associated with projections.

2. Strategies for dealing with uncertainty

What can we do about uncertainty?

2.1 Ignore uncertainty

We could ignore it and use just one central projection. NSOs argue that when many public and private bodies incorporate NSO projections in their planning, the same set of forecasts should be used by all. This can only be characterised as the approach of an ostrich burying its head in the sand.

2.2 Develop variant projections

The conventional alternative is to produce variants. These still have rather imprecise meanings: what do "high" and "low" really mean? Most conventional variants focus on fertility variation and hold mortality and migration constant (Lutz and Scherbov 1998). However, when fertility appears stuck at low sub-replacement levels, as is the case in most European countries (Rees 1997), then continuing improvements in mortality and higher in-migration than in the past are postponing the point at which European country populations decline. Additionally, more favourable mortality experience results in increased numbers at the older ages, for example.

A second problem with variant projections is that they usually assume that the populations involved follow the same trend – high or low. Sometimes such assumptions are justified: fertility trends in European Union countries since the 1960s have shared remarkable coincidence; within those countries regional fertility trends have closely shadowed national. Often the assumption of correlation of trends across the units of population is not justified. This is necessarily the case for trends in inter-regional migration within a country: if some regions are experiencing increasing in-migration, other regions must be contributing those migrants and so be experiencing increasing out-migration. In extending the concept of high and low scenarios to regional projections, Van der Gaag et al. (1997) redefined regional migration scenarios in terms of convergence and divergence.

These are system wide properties and address the experience of individual regional populations in an indirect way.

2.3 A statistical approach

Uncertainty is a phenomenon which has long been studied in the field of statistics, particularly when inferences are required about true population characteristics from sample observations. How do the ideas of inductive statistics apply to populations when, strictly speaking, we are not in a sample taking situation? Hagood and Price (1952: 286-294) argue that, even when all units in a limited universe have been observed (e.g. when we measure demographic indicators for a population), we can regard those population measures as samples from a hypothetical superuniverse of possibilities. There are all sorts of ways in which population measures can vary: boundary changes, changes in questions, chance events, the calendar and so on make the population measure rather like a sample measure. When a population measure is "guessed" for the future, there is yet more possibility of variation. With Hagood and Price (1952), we argue that the use of standard errors and statistical tests of hypotheses is justified in a "fully measured" situation.

2.4 Estimate standard errors

2.4.1 Time series methods

This view then faces a difficult question: how can the standard errors of the input indicators to a population projection be estimated? The most obvious approach of estimating the standard error from an observed time series of the indicators has been tried (Passel 1975) but changes in trend are found to carry the indicator rapidly outside the historic 95% confidence interval. It is extremely difficult to predict shifts in demographic régime. Zelinsky (1971), for example, failed to predict "counter-urbanisation" in his mobility transition proposal. Which observers predicted the dramatic crash in fertility and sorrowful rise in mortality in former Soviet Union states after 1989?

2.4.2 Expert judgement

Alho (1997) has applied statistical methods combined with some expert judgement to arrive at an estimate of the standard error of the world region projections carried out by the International Institute for Applied Systems Analysis (IIASA) (Lutz 1994). He interpreted the high-low interval in the IIASA forecasts of the world population as roughly an 85% prediction interval. However, he recognised the confidence interval estimates as "relatively crude" and "back of an envelope".

Lutz (1994) assembled a task force of experts in aspects of population forecasting and asked them to draw up a set of high and low scenarios for fertility, mortality and net interregional migration for 13 world regions. In a revised edition of this work (Lutz 1996), Lutz et al. (1996) interpret the expert-derived high and low assumptions as corresponding roughly to a 90% confidence interval. In other words, in 9 out of 10 future "superuniverses" the scenario indicator values would lie within the high-low range but 5% of the time values would exceed the high scenario and 5% of the time values would fall below the low scenario. The assumption is then made that these subjective probability distributions are normal, with the central projection values of the indicators being the mean of the distributions.
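Treating the expert high-low range as a 90% interval of a normal distribution fixes the standard deviation: the range spans 2 x 1.645 = 3.29 standard deviations. The sketch below illustrates that conversion using Python's standard library; the function name and the total fertility rate values are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def sd_from_interval(low, high, coverage=0.90):
    """Standard deviation of a normal distribution whose central
    `coverage` interval runs from `low` to `high`."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)  # about 1.645 for 90%
    return (high - low) / (2.0 * z)


# Hypothetical expert assumptions for a total fertility rate.
central, low, high = 1.7, 1.3, 2.1
sd = sd_from_interval(low, high)
dist = NormalDist(mu=central, sigma=sd)

# Probability mass between the low and high scenarios.
print(round(dist.cdf(high) - dist.cdf(low), 3))  # 0.9
```

With these assumptions 5% of draws fall below the low scenario and 5% above the high scenario, which is the interpretation described in the text.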

How should this expert knowledge of the probability distribution of the different leading indicators be used in combination? Lutz et al. (1996) reintroduce sampling into the process. The projection model is run one thousand times, randomly selecting the component indicators from normal distributions, the 90% confidence limits of which have been supplied by the experts consulted. The projection runs produce one thousand values of each output population from which a probability distribution can be constructed. It is then possible to derive statements such as

"...we find that there is a probability of two-thirds that the world's population will not double in the twenty-first century". (Lutz et al. 1997).

To drive these probabilistic simulations, a number of detailed assumptions must be made concerning whether fresh or repeated random numbers should be applied to each time interval, demographic component and population unit in the projection. The approach used by Lutz et al. (1996) for time intervals is as follows. They choose a random number (using a library routine linked to their DIALOG projection model program) and use it to select an indicator value at three points in the projection time sequence. Let $Z_i$ be the ith random value of a standard normal deviate drawn from a normal distribution, $v_{ij}$ the ith value of the component indicator for time interval j and $\bar{v}_j$ the central value of the component indicator for time interval j. Let $d_j$ be the difference between the expert determined high and low values of the indicator.

Then

$$v_{ij} = \bar{v}_j + Z_i \, \frac{d_j}{3.29} \qquad (1)$$

where 3.29 is the difference between the upper and lower bounds of the 90% confidence interval, measured in standard deviations (2 x 1.645). Three time intervals are chosen; the same random value is used for each interval and intermediate values are interpolated; and values beyond the third interval are held constant. This approach is called "random line" sampling where, if you start at a low value of an indicator, you persist on that low trajectory, and vice versa when starting with high values. The alternative sampling approach of independently selecting a value from the normal distribution for each time interval can be called a "random walk" approach. No knowledge of previous history is assumed and the indicator "lurches drunkenly" through future time. This is the approach adopted in our experiments described in section 5.
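The difference between the two sampling schemes can be sketched in a few lines. The sketch assumes the reading of equation (1) in which an indicator value equals the central value plus a standard normal deviate times the high-low range divided by 3.29. It omits the interpolation between the three chosen time points, and every function name is hypothetical.

```python
import random


def indicator_value(central, high_low_range, z):
    """Shift the central value by z standard deviations,
    where one standard deviation is range / 3.29."""
    return central + z * high_low_range / 3.29


def random_line(centrals, ranges, rng):
    """One deviate is drawn and reused for every time interval."""
    z = rng.gauss(0.0, 1.0)
    return [indicator_value(c, d, z) for c, d in zip(centrals, ranges)]


def random_walk(centrals, ranges, rng):
    """A fresh, independent deviate is drawn for each time interval."""
    return [
        indicator_value(c, d, rng.gauss(0.0, 1.0))
        for c, d in zip(centrals, ranges)
    ]


rng = random.Random(1)
centrals = [1.5, 1.5, 1.5]   # hypothetical central total fertility rates
ranges = [0.6, 0.6, 0.6]     # hypothetical high minus low values
print(random_line(centrals, ranges, rng))
print(random_walk(centrals, ranges, rng))
```

A random line trajectory stays on the same side of the central value throughout, whereas the independently drawn trajectory can cross it at any interval.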

Another decision needed in this simulation is whether the same random draw should be used to determine the projection indicator value for all components and for all separate population units. Lutz et al. (1996) carry out simulations where fertility and mortality are assumed either independent or correlated. Correlated projections involve using the same random number. In this case correlated projections result in a narrower range of projected populations than uncorrelated projections because of the way the high-high and low-low scenarios for fertility and mortality tend to cancel out. This compensation is also achieved by uncorrelated regional projections compared with correlated. In our experiments in section 5, we avoid such complications by sticking to independent sampling across time intervals, components and population units (countries and regions).
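The compensation across population units can be illustrated with a toy calculation. This is only a sketch under stated assumptions: ten hypothetical regions of equal size, a 5% relative standard deviation, and region totals that respond linearly to the random deviate. None of these values comes from the experiments reported here.

```python
import random
import statistics


def total_population(region_means, relative_sd, rng, correlated):
    """Sum of perturbed regional populations for one simulated run."""
    shared_z = rng.gauss(0.0, 1.0)
    total = 0.0
    for mean in region_means:
        # Correlated: every region reuses the same deviate.
        # Independent: every region gets its own deviate.
        z = shared_z if correlated else rng.gauss(0.0, 1.0)
        total += mean * (1.0 + relative_sd * z)
    return total


def spread(correlated, n_runs=2000, seed=7):
    """Standard deviation of the simulated totals."""
    rng = random.Random(seed)
    means = [100.0] * 10  # ten hypothetical regions of equal size
    totals = [
        total_population(means, 0.05, rng, correlated) for _ in range(n_runs)
    ]
    return statistics.pstdev(totals)


print(f"correlated:  {spread(True):.1f}")
print(f"independent: {spread(False):.1f}")
```

Under these assumptions the independent draws partly cancel when regions are summed, so the spread of the total is smaller than when all regions share one deviate.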

In the next section we describe the projection model and baseline assumptions used in our simulation experiments. The experiments of Lutz et al. (1996) used thirteen world regions but only some of the flow balances, between developing and developed regions but not within either set of regions, were incorporated. Lutz and Scherbov (1997) carry out probabilistic projections for one country, Austria. Our experiments project populations of the 12 European Community member states (called EUR12) in 1990 and their 71 top tier (NUTS 1) regions.

3. The population projection model and baseline assumptions

The probabilistic projections described in section 5 use an existing projection model and forecasting assumptions for European countries and regions. Details are given in Rees, Stillwell and Convey (1992) and Rees (1996). It should be stressed that the model has features needing improvement and that the assumptions based on knowledge at the end of the 1980s are in need of revision. However, these features provide the opportunity to assess the revealed error in the short term (1990-95) in the projections and to determine whether the projected populations fall inside the simulated confidence limits. For more robust and up to date forecasts of European Economic Area regional populations see Van der Gaag et al. 1997.

3.1 The projection model

The European Community POPulation model called ECPOP adopts a two tier structure of member states and regions within member states, and a simplified multiregional cohort-component projection model is used with each tier (Rees et al. 1992, Rees 1996). The two tiers are connected by distributing projected migration flows for the top tier countries to regions within countries. Fertility and mortality are treated conventionally. Age-specific fertility and mortality rates for countries are adjusted to match observed regional births, and national total period fertility rates and life expectancies are used as leading indicators for developing scenarios. No region specific trends are considered.

Migration is decomposed into three sets of flows: (1) extra-community migration (i.e. flows from or to countries outside the EUR12), (2) inter-member state migration (i.e. flows which go from one country to another within the EUR12) and (3) inter-regional migration (i.e. flows which go from one region to another within a member state). The idea behind this three fold distinction is that different determinants are at work influencing each flow. Extra-Community migration is subject to national immigration laws and policies; inter-member state migration of labour is barrier-free in theory but still constrained by linguistic differences and national residence requirements; inter-region migration is subject to few formal barriers though regional variation in things like higher education entry or housing policy can limit migration.

The equation that projects the population for a typical age group of a member state is as follows:

$$P^{m}_{a+1}(t+1) = P^{m}_{a}(t) - d^{m}_{a} P^{m}_{a}(t) + g_{a} N^{m} - \sum_{n \neq m} r_{a} \, m^{mn} P^{m}_{a}(t) + \sum_{n \neq m} r_{a} \, m^{nm} P^{n}_{a}(t) \qquad (2)$$

where

$P^{m}_{a}$: population of member state m in age group a (five year age group)
$t, t+1$: successive points in time, 5 years apart
$d^{m}_{a}$: mortality probability for age a (period-cohort) in member state m
$g_{a}$: proportion of the sum of model migration schedule probabilities occurring at age a (period-cohort)
$r_{a}$: ratio of the model schedule migration probability for age a to the model schedule average
$m^{mn}$: the probability of migration from member state m to member state n over the 5 year interval
$N^{m}$: net external migrants to member state m

Equation (2) applies to both sexes. The terms on the right hand side of equation (2), following the term for the initial population stock, project, respectively: deaths, net extra-Community migration, total out-migration to other member states and total in-migration from other member states. The population of a region in member state m for a typical age group a is projected as follows:

$$P^{j(m)}_{a+1}(t+1) = P^{j(m)}_{a}(t) - d^{j(m)}_{a} P^{j(m)}_{a}(t) + s^{j(m)}_{N} g_{a} N^{m} - s^{j(m)}_{O} \sum_{n \neq m} r_{a} \, m^{mn} P^{m}_{a}(t) + s^{j(m)}_{I} \sum_{n \neq m} r_{a} \, m^{nm} P^{n}_{a}(t) - \sum_{i \neq j} r_{a} \, m^{j(m)i(m)} P^{j(m)}_{a}(t) + \sum_{i \neq j} r_{a} \, m^{i(m)j(m)} P^{i(m)}_{a}(t) \qquad (3)$$

where

$j(m)$: the region j in member state m
$i(m)$: the region i in member state m
$s^{j(m)}_{N}$: the share of in-migrants from outside the EUR12 to region j(m)
$s^{j(m)}_{O}$: the share of out-migrants to other member states from region j(m)
$s^{j(m)}_{I}$: the share of in-migrants from other member states to region j(m)

and quantities carrying regional superscripts are the regional counterparts of the member state quantities in equation (2).

The populations of the regions are used to compute shares in each case, but in principle better observed data can be used directly. The equation applies to both sexes. The terms on the right hand side of equation (3), after the starting population stock, project respectively: regional deaths, regional net extra-Community migration, regional out-migration to other member states, regional in-migration from other member states, regional total out-migration to other regions in the same member state and regional total in-migration from other regions in the same member state.

Births are projected very simply as

$$B^{m} = \sum_{a} f^{m}_{a} P^{mw}_{a}(t) \qquad (4)$$

where $f^{m}_{a}$ is the age-specific (period-cohort) fertility rate for age group a and member state m and w refers to women, and the corresponding regional equation is

$$B^{j(m)} = \sum_{a} f^{j(m)}_{a} P^{j(m)w}_{a}(t) \qquad (5)$$

A sex proportion at birth is applied to the births totals and the totals are entered as starting populations in equations (2) and (3) for survival from birth to the end of the time interval.
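The member state accounting described in this section (starting stock, less deaths, plus net extra-Community migrants, less out-migration to other member states, plus in-migration from them) can be sketched for a single age group. This is an illustration under assumed data structures, not the ECPOP code: the function name, argument names and the two-state example values are all hypothetical.

```python
def project_age_group(pop, death_prob, age_share, age_ratio,
                      mig_prob, net_external):
    """Project one age group of every member state over one interval.

    pop[m]           start population of member state m in the age group
    death_prob[m]    mortality probability for the age group
    age_share        share of net external migrants falling in this age group
    age_ratio        ratio of the age-specific to the average migration probability
    mig_prob[m][n]   all-age probability of migrating from m to n
    net_external[m]  net extra-Community migrants to m (all ages)
    """
    states = list(pop)
    end = {}
    for m in states:
        deaths = death_prob[m] * pop[m]
        external = age_share * net_external[m]
        out_mig = sum(age_ratio * mig_prob[m][n] * pop[m]
                      for n in states if n != m)
        in_mig = sum(age_ratio * mig_prob[n][m] * pop[n]
                     for n in states if n != m)
        end[m] = pop[m] - deaths + external - out_mig + in_mig
    return end


# Hypothetical two-state system.
pop = {"A": 1000.0, "B": 2000.0}
result = project_age_group(
    pop,
    death_prob={"A": 0.01, "B": 0.01},
    age_share=0.1,
    age_ratio=1.0,
    mig_prob={"A": {"B": 0.01}, "B": {"A": 0.005}},
    net_external={"A": 100.0, "B": 0.0},
)
print(result)
```

Because every out-migrant from one member state is an in-migrant to another, the inter-member state terms cancel in the system total, which changes only through deaths and net extra-Community migration.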

3.2 Scenarios

For European regions Eurostat (1991) has developed two long term demographic perspectives, a high and a low scenario. The high scenario sees fertility recover towards replacement level, mortality improve at a steady but not spectacular rate, and external in-migration remain quite high for most countries, though lower than during the 1989-91 transition period. The low scenario envisages a small further decline in fertility, virtually static mortality levels and a return to the lower in-migration levels of the 1980s. Table 1 shows the total fertility rates (first panel) and life expectancies (second panel) for the first and last periods of the projection in this low scenario. Also included in the table are more recent statistics for 1995. The total fertility rates for 1995 are lower in Belgium, France, Germany, Greece, Ireland, Italy, the Netherlands, Spain and the United Kingdom than assumed for the first projection period, 1990-94. This confirms that the low fertility scenario is representative of the 1990s.

Table 1: A pessimistic scenario for EUR-12 member states

| Country | Fertility (TFR) 1990-94 | Fertility (TFR) 2015-19 | Fertility (TFR) 1995 |
|---|---|---|---|
| Belgium | 1.585 | 1.500 | 1.55 |
| Denmark | 1.645 | 1.510 | 1.81 |
| France | 1.785 | 1.705 | 1.70 |
| Germany | 1.410 | 1.300 | 1.25 |
| Greece | 1.425 | 1.570 | 1.32 |
| Ireland | 2.050 | 1.700 | 1.87 |
| Italy | 1.315 | 1.300 | 1.22 |
| Luxembourg | 1.555 | 1.405 | 1.67 |
| Netherlands | 1.610 | 1.510 | 1.53 |
| Portugal | 1.380 | 1.500 | 1.41 |
| Spain | 1.340 | 1.395 | 1.17 |
| United Kingdom | 1.765 | 1.680 | 1.71 |

| Country | Life expectancy (e0), males 1990-94 | Males 2015-19 | Males 1995 | Life expectancy (e0), females 1990-94 | Females 2015-19 | Females 1995 |
|---|---|---|---|---|---|---|
| Belgium | 72.80 | 73.5 | 73.9 | 79.40 | 80.0 | 80.6 |
| Denmark | 72.15 | 72.5 | 72.6 | 77.80 | 78.0 | 77.8 |
| France | 72.80 | 73.5 | 73.4 | 80.95 | 81.5 | 81.6 |
| Germany | 72.20 | 72.5 | 73.0 | 78.70 | 79.0 | 79.5 |
| Greece | 73.75 | 74.0 | 75.0 | 78.95 | 79.5 | 80.2 |
| Ireland | 71.95 | 72.5 | 72.3 | 77.50 | 78.0 | 77.9 |
| Italy | 73.75 | 74.5 | 74.1 | 80.25 | 81.0 | 80.5 |
| Luxembourg | 71.75 | 72.5 | 72.6 | 78.75 | 79.5 | 79.1 |
| Netherlands | 73.80 | 74.0 | 74.7 | 78.75 | 79.5 | 79.1 |
| Portugal | 71.55 | 72.5 | 71.3 | 80.15 | 80.5 | 80.3 |
| Spain | 73.65 | 74.0 | 73.4 | 80.10 | 80.5 | 81.3 |
| United Kingdom | 72.95 | 73.5 | 73.9 | 78.45 | 79.0 | 79.2 |

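| Country | Extra-Community migration (1000s pa) 1990-94 | Extra-Community migration (1000s pa) 2015-19 | Net int'l migration (1000s pa) 1995 | Inter-member state migration scenarios: destination attractions 1990-94 | Destination attractions 2015-19 |
|---|---|---|---|---|---|
| Belgium | 9 | 9 | 13 | 104 | 115 |
| Denmark | 3 | 3 | 29 | 104 | 115 |
| France | 58 | 48 | 41 | 104 | 115 |
| Germany | 301 | 75 | 401 | 105 | 118 |
| Greece | 24 | 20 | 21 | 102 | 112 |
| Ireland | -4 | 6 | -6 | 102 | 112 |
| Italy | 62 | 32 | 92 | 104 | 115 |
| Luxembourg | -1 | -2 | 5 | 105 | 118 |
| Netherlands | 28 | 19 | 15 | 104 | 115 |
| Portugal | 14 | 17 | 5 | 102 | 112 |
| Spain | 31 | 40 | 47 | 102 | 112 |
| United Kingdom | 5 | -6 | 100 | 102 | 115 |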

Sources:

  1. Fertility and life expectancy scenarios for 1990-94 & 2015-19: Eurostat 1991
  2. Fertility and life expectancy for 1995: Council of Europe 1997
  3. Migration scenarios: Rees, Stillwell & Convey 1992

Mortality levels assumed in the low scenario are, however, too pessimistic. In 8 of the 12 countries life expectancies in 1995 were already higher than the low scenario values for 2015-19.

Migration into the EUR12 countries looks like being higher than assumed in the low scenario, as the third panel of Table 1 reveals. Immigration is markedly higher in Denmark, Germany, Italy, Spain and the United Kingdom.

The final columns in the third panel of Table 1 list the growth factors applied to inter-member state migration. These will have a rather modest effect on final destination populations, both because less than 1% of the EUR12 population migrates between member states in a five year period and because increasing migration also means fewer "stay-at-homes", cancelling out much of the growth effect of increased in-migration.

Four inter-regional migration scenarios are set for 71 regions by Rees, Stillwell and Convey (1992) following different hypotheses: (1) that migration intensities will remain constant, (2) that migration intensities will be at a zero level (used to assess the impact of inter-region migration on forecast populations), (3) that migration inflows will increase to rich regions and decrease to poor regions, the income hypothesis and (4) that migration inflows will increase to lower density regions and decrease to higher density regions, the counter-urbanization hypothesis. In the experiments of section 5, only the constant assumption is used. However, in future such a range of hypotheses might be used to establish standard errors for migration indicators. A second reason for sticking to the constant scenario is that a better model of inter-regional migration needs to be incorporated in the projection model. Champion et al. (1998) present a scheme for feeding in knowledge of migration determinants into spatial interaction sub-models that would be modules inside a multi-regional projection model, but the scheme awaits implementation.

3.3 Guessed standard errors

To implement the sampled population projections, standard errors of the scenario indicators are needed. Lutz et al. (1996) used a panel of experts to establish 90% confidence limits for each world region for the three components. Here we draw on a panel of two (the authors) to propose some plausible guesses as to the relative standard error. The guessed values are set out in Table 2. The standard errors are expressed as percentages of the scenario value.

Table 2: Guesstimated standard errors and confidence limits for the baseline projections

| Component | % Standard Error (SE) | Lower Confidence Level (-2SE) | Upper Confidence Level (+2SE) | Scenario indicator |
|---|---|---|---|---|
| Fertility | 25 | -50 | +50 | Total fertility rate |
| Mortality | 3 | Initial: -6; Shifted: -8 | Initial: +6; Shifted: +4 | Life expectancy at birth |
| Extra-Community Migration | | Initial: -300; Shifted: -250 | Initial: +300; Shifted: +350 | Net migrants in 1000s per annum |
| Inter-Member State Migration | 20 | -40 | +40 | Total migration rates |
| Inter-Regional Migration | 10 | -20 | +20 | Total migration rates |

Standard errors and confidence levels are expressed as a % of the scenario indicator.

For fertility we assume quite a lot of potential volatility with a standard error of 25% with upper and lower 95% confidence limits of +50% and -50%. For a country or region with a total fertility rate of 1.5, for example, we expect fertility to lie between 0.75 and 2.25 children per woman 19 times out of 20.
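The arithmetic behind this example is a relative standard error converted to an absolute one and then doubled in each direction. A small sketch (the function name is an illustrative assumption) reproduces the fertility case:

```python
def confidence_limits(value, se_percent, lower_mult=-2.0, upper_mult=2.0):
    """Limits at lower_mult and upper_mult standard errors around value,
    with the standard error given as a percentage of the value."""
    se = value * se_percent / 100.0
    return value + lower_mult * se, value + upper_mult * se


# Total fertility rate of 1.5 with a 25% standard error.
print(confidence_limits(1.5, 25))  # (0.75, 2.25)
```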

Mortality we considered much less volatile and adopt a standard error of 3%. However, we shift the confidence limits to the negative side of the death rate indicators believing that decreases in death rates are much more likely than increases. Past projections of the United Kingdom populations have consistently underestimated improvements in mortality experience, for instance (OPCS 1995).

Extra-Community migration we considered to be the most volatile of the five components, adopting a standard error of 150%. Using a standard error greater than 100% provides a chance of current positive balances becoming negative. We shifted the confidence interval to the positive side, believing that increases in extra-Community net in-migration were more likely than decreases.

Fairly conservative standard errors were assumed for both inter-member state migration and inter-region migration. We have relatively little knowledge of migration variation over time in European countries collectively as no time series of relevant statistics are published. Because these standard errors are used with migration rates for all flows, the overall volume of migration is raised or lowered, but it is probable that this will have relatively little effect on net exchanges between member states or regions. Because the Table 2 standard errors can be regarded only as informed guesses, we do report some sensitivity analyses in section 5.

A final comment on sources of error in projections relates to the starting populations used. Often these themselves are only estimates and are subject to revision. Column 1 of Table 3 shows the country populations for 1990 used in the projections reported in this paper, while the later source reported in column 2 shows how some country populations have since been revised. This occurs particularly in countries which rely on censuses rather than registers: after each census the annual population estimates made since the previous census are revised. In the case of Portugal, for example, there is a downward revision of 418 thousand.

Table 3: Evaluation of the ECPOP population projections for 2020

| Country | (1) 1990 ECPOP | (2) 1990 COE97 | (3) 1996 COE97 | (4) 2020 ECPOP Constant | (5) 2020 ECPOP Pessimistic Scenario: Mean | (6) 2020 ECPOP Pessimistic Scenario: .025 | (7) 2020 ECPOP Pessimistic Scenario: .975 | (8) 2020 CBS/NIDI Baseline | (9) 2020 CBS/NIDI Low | (10) 2020 CBS/NIDI High |
|---|---|---|---|---|---|---|---|---|---|---|
| Belgium | 9948 | 9948 | 10143 | 9861 | 9375 | 8993 | 9757 | 10658 | 9898 | 11270 |
| Denmark | 5135 | 5135 | 5251 | 4980 | 4864 | 4710 | 5018 | 5526 | 5075 | 5950 |
| France | 56305 | 56577 | 58255 | 60416 | 57563 | 56285 | 58841 | 62831 | 59307 | 66896 |
| Germany | 79106 | 79112 | 81818 | 73885 | 69674 | 67504 | 71844 | 84670 | 79073 | 91559 |
| Greece | 10019 | 10121 | 10465 | 10023 | 9087 | 8708 | 9466 | 11269 | 10450 | 11901 |
| Ireland | 3507 | 3507 | 3616 | 3515 | 3338 | 2811 | 3865 | - | - | - |
| Italy | 57579 | 56694 | 57333 | 55305 | 52552 | 51369 | 53735 | 56544 | 52753 | 60334 |
| Luxembourg | 378 | 378 | 413 | 430 | 393 | 319 | 467 | - | - | - |
| Netherlands | 14893 | 14893 | 15494 | 15823 | 15246 | 14787 | 15705 | 17205 | 15819 | 18319 |
| Portugal | 10337 | 9919 | 9921 | 10465 | 9996 | 9787 | 10205 | 10513 | 9808 | 11265 |
| Spain | 38924 | 38826 | 39242 | 38937 | 37337 | 36576 | 38098 | 40308 | 37809 | 43504 |
| United Kingdom | 57310 | 57459 | 58694 | 59868 | 56752 | 55090 | 58414 | 61038 | 58013 | 65326 |

Notes:

All populations are in 1000s.

Sources:

  1. Eurostat
  2. Council of Europe 1997
  3. Council of Europe 1997
  4. Rees 1996
  5. Authors' computations
  6. Authors' computations
  7. Authors' computations
  8. Van der Gaag et al. 1997
  9. Van der Gaag et al. 1997
  10. Van der Gaag et al. 1997

4. Implementation of confidence intervals: thousands of randomly perturbed projections

This section of the paper describes the way in which the projection model was adapted to run on a high performance computer (HPC) thousands of times.
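The overall experiment, many perturbed runs summarised by a mean and by .025 and .975 quantiles as in Table 3, can be sketched with a deliberately simple stand-in for the projection model. The toy growth model, the parameter values and all names below are assumptions made for illustration; they are not the ECPOP model or its inputs.

```python
import random


def toy_projection(start_pop, growth_rates):
    """Stand-in for a projection model: compound growth over each interval."""
    pop = start_pop
    for g in growth_rates:
        pop *= 1.0 + g
    return pop


def simulate(n_runs=1000, n_steps=6, start_pop=1000.0,
             mean_g=0.0, sd_g=0.02, seed=42):
    """Run the toy model n_runs times with independently perturbed inputs."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_runs):
        rates = [rng.gauss(mean_g, sd_g) for _ in range(n_steps)]
        outcomes.append(toy_projection(start_pop, rates))
    return sorted(outcomes)


def quantile(sorted_values, q):
    """Nearest-rank style quantile of an already sorted list."""
    index = round(q * (len(sorted_values) - 1))
    index = min(len(sorted_values) - 1, max(0, index))
    return sorted_values[index]


outcomes = simulate()
mean = sum(outcomes) / len(outcomes)
lower, upper = quantile(outcomes, 0.025), quantile(outcomes, 0.975)
print(f"mean {mean:.0f}, .025 {lower:.0f}, .975 {upper:.0f}")
```

Each run is independent of the others, which is what makes this kind of experiment straightforward to spread over many processors.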

4.1 Parallelising the ECPOP model

There are two basic strategies that can be adopted when converting an existing serial program to run on a parallel machine. The first is to divide the program up across the available processors; the second is to run the model in full on each processor, with a single processor being designated to collect the results and handle passing out the data. A programmer is also faced with the choice between shared data programming and a message passing approach. Shared data programming is easier for a first time parallel programmer. However, it is a very limited paradigm if the programmer wishes to take advantage of the full flexibility of a parallel machine. Message passing has a steeper learning curve but is much more flexible and often results in faster and more elegant programs. In the case of the ECPOP model it was decided to make use of the message passing interface (MPI) to replicate the model on each processor. The existence of a portable library of MPI routines means that, even though a supercomputer and a group of workstations use different methods to communicate, a single program can run in both environments, as Gropp et al. (1994) explain:

"The primary aim of the MPI specification is to demonstrate that users need not compromise among efficiency, portability and functionality. This means that one can write portable programs that can still take advantage of the specialised hardware and software offered by individual vendors."

This triple goal of efficiency, portability and functionality is a key design feature. MPI is an attempt to keep the best features of many existing message passing systems by offering a form of layered message passing. Subroutines use lower level libraries to perform the specified tasks. It is this feature that ensures the portability (because the specification of the interface is fixed) and moderate to good efficiency (since the low level subroutines can be made specific to particular hardware and when locally optimised should be highly efficient).

The ECPOP model is too small to parallelise over a large number of processors, since the cost of passing data from one processor to another would dominate the time taken to calculate a model. Instead, each model was assigned to a processor, which allows 128, 256 or 512 models to run simultaneously, depending on the number of processors available. Each of these models starts with a slightly different set of parameters. This provides a very efficient parallel implementation: each model takes the same amount of time to run, so there are no load balancing problems provided the total number of models to be run is an integer multiple of the number of processors.
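The load balancing condition can be illustrated with a small sketch. It is not the authors' Fortran77/MPI shell; the function names are hypothetical, and it assumes, as the text does, that every model run takes the same time.

```python
def assign_runs(total_runs: int, n_procs: int) -> list[int]:
    """Split total_runs whole-model runs across n_procs workers as evenly as possible."""
    base, extra = divmod(total_runs, n_procs)
    # The first `extra` workers take one additional run each.
    return [base + (1 if rank < extra else 0) for rank in range(n_procs)]


def idle_fraction(total_runs: int, n_procs: int) -> float:
    """Fraction of processor time left idle, assuming every run takes equal time."""
    counts = assign_runs(total_runs, n_procs)
    rounds = max(counts)  # wall-clock time, in units of one model run
    return 1.0 - total_runs / (rounds * n_procs)


balanced = assign_runs(128_000, 128)     # integer multiple of the processor count
unbalanced = assign_runs(128_100, 128)   # not an integer multiple
print(set(balanced), idle_fraction(128_000, 128))
print(sorted(set(unbalanced)), idle_fraction(128_100, 128))
```

When the total is an integer multiple of the processor count, every worker receives the same number of runs and no processor waits for another; otherwise some processors sit idle during the final round.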

This is also a very easy style of parallelisation to apply to an existing serial program, since the existing code is simply wrapped in a parallel shell. This shell was written using the message passing interface (MPI) library and Fortran77. The only changes required to the original code were to add perturbations to the input parameters and to collect results at the end of each projection run.

4.2 Adding uncertainty to the model

To model the effects of uncertainty, each time the model was run each of its parameters was sampled from a normal distribution centred on the original parameter value. A normally distributed pseudo-random number with a mean of 0.0 and a standard deviation of 1.0 was generated using the RANLIB library from netlib (Ahrens and Dieter, 1973). This number was then multiplied by the required standard error and added to the parameter.

This differs from the approach of Lutz et al. (1996), who chose a single random perturbation of a model parameter and applied this change to the parameter at each time step, giving what they describe as a random path through the model. The approach adopted in this paper is to modify the parameter at each time step with a different random number from the distribution. This method could be extended to allow a different range of uncertainty to be applied at each time step, to reflect the increasing difficulty of predicting the value of variables further into the future. The Lutz et al. (1996) method also assumes that the uncertainty in the projection is a bias rather than a reflection of possible error; that is to say, an expert will always guess high or low for a given parameter and will not be randomly wrong over time.
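The two sampling strategies can be contrasted in a few lines. This is a minimal sketch, not the ECPOP code: it uses Python's standard generator rather than RANLIB, the function names and the fertility values are hypothetical, and it assumes the standard error is expressed relative to the parameter value.

```python
import random


def independent_draws(base, rel_se, rng):
    """Perturb each time step with its own N(0, 1) draw (the approach in this paper)."""
    return [p + rng.gauss(0.0, 1.0) * rel_se * p for p in base]


def random_path(base, rel_se, rng):
    """Perturb every time step with one shared N(0, 1) draw (after Lutz et al. 1996)."""
    z = rng.gauss(0.0, 1.0)
    return [p + z * rel_se * p for p in base]


rng = random.Random(42)
tfr = [1.5, 1.5, 1.5, 1.5, 1.5, 1.5]  # hypothetical fertility scenario, six time steps
ind = independent_draws(tfr, 0.05, rng)
path = random_path(tfr, 0.05, rng)
print([round(v, 3) for v in ind])
print([round(v, 3) for v in path])
```

With a constant scenario the random path stays uniformly above or below the scenario value for the whole projection, whereas the independent draws scatter on both sides of it.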

The model was then run and the results stored. Once the model had been run a thousand times on each processor, the program calculated the arithmetic mean and standard deviation of the outputs. To reduce the cost of storage and calculation for the standard deviation, the following formula was used.

(6)

This avoids the need to store the entire data set and to make repeated passes through it to calculate the mean and the standard deviation. By using this method the control processor simply has to keep a running sum of the values and a running sum of their squares, and then apply this formula at the end of the run.
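A one-pass calculation of this kind can be sketched as follows. The exact form of equation (6) is not reproduced here; the sketch assumes the common population form, the square root of (sum of squares / n) minus the squared mean, and the class name and sample values are hypothetical.

```python
import math


class RunningStats:
    """Accumulate count, sum and sum of squares; no individual values are stored."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def add(self, x: float) -> None:
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def mean(self) -> float:
        return self.total / self.n

    def std(self) -> float:
        # Assumed population form: sqrt(sum(x^2)/n - mean^2). max() guards
        # against tiny negative values caused by floating-point rounding.
        m = self.mean()
        return math.sqrt(max(self.total_sq / self.n - m * m, 0.0))


stats = RunningStats()
for value in [56752.0, 55090.0, 58414.0, 57000.0]:  # hypothetical outputs, 1000s
    stats.add(value)
print(stats.mean(), stats.std())
```

Only three numbers per output variable need to be held or passed between processors. One caveat: this form can lose precision when the standard deviation is very small relative to the mean, in which case an incremental update such as Welford's method is a safer alternative.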

5. Results and their interpretation

So, adopting a pessimistic scenario for the future of national and regional populations in the European Community, what range of projected populations results from executing each scenario 128 thousand times?

5.1 Results for a pessimistic scenario and guessed standard errors

Figure 1 plots the population trajectories and percentile ranges for the EUR12 countries. The central line for each country plots the mean of the 128000 projections. The bottom line plots the 2.5% percentile of the projection distribution, that is, only 2.5% of the projections fall below this line and 97.5% above; the top line correspondingly plots the 97.5% percentile. The lines just below and just above the mean projection plot the mean minus one standard error and the mean plus one standard error. For a normal distribution these correspond to the 15.9% and 84.1% percentiles. The graphs for the top nine countries use the same population scale but the bottom three have larger scales for visibility.
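The relationship between these lines and percentiles can be sketched with the standard normal distribution. This is a minimal illustration, assuming the outputs are approximately normal and that the outer lines sit at the mean plus or minus about 1.96 standard deviations; the function name and the example mean and standard deviation are hypothetical.

```python
from statistics import NormalDist


def plotted_lines(mean: float, sd: float) -> dict[str, float]:
    """Values of the five plotted lines at one date, under a normal assumption."""
    z95 = NormalDist().inv_cdf(0.975)  # about 1.96
    return {
        "p2.5": mean - z95 * sd,
        "mean-1sd": mean - sd,
        "mean": mean,
        "mean+1sd": mean + sd,
        "p97.5": mean + z95 * sd,
    }


std_normal = NormalDist()
# Percentiles (in %) that correspond to one standard deviation either side of the mean.
print(round(100 * std_normal.cdf(-1.0), 1), round(100 * std_normal.cdf(1.0), 1))
print(plotted_lines(1000.0, 10.0))  # hypothetical mean and standard deviation
```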

Figure 1: Future population sizes of selected European countries (EUR12) using the ECPOP model with a pessimistic scenario and guessed standard errors

[Figure panels: Germany, France, Italy, United Kingdom, Spain, Netherlands, Portugal, Belgium, Greece, Denmark, Ireland, Luxembourg; with key]

It is clear from these pessimistic projections that the EUR12 populations are approaching their high water mark and all tip over into population decline before 2020, most by 2000. Population declines in Germany and Italy, in particular, are very marked. Germany is projected to have nearly 10 million fewer inhabitants in 2020 than in 1995, and Italy five million fewer. In France, the UK and the Netherlands declines are postponed until the 21st century and it is only after 2020 that populations become substantially lower than their 1990 starting values. Population momentum carries the populations of Spain and Portugal forward to 2000 but quite rapid decline then sets in. Decline in the population of Greece sets in from 1990 and that of Belgium from 1995. Luxembourg postpones decline until 2005 because of net in-migration that is large relative to its population size. Driving these population declines under this scenario are the low levels of fertility, insufficiently compensated for by slight mortality improvements and net in-migration.

The 95% confidence limits around these projections are surprisingly modest. For example, for the UK the upper confidence limit is 58.4 millions and the lower confidence limit 55.1 millions, around a mean of 56.8 millions (Table 3). Overall these results suggest that the EUR12 population will lie in 2020 between an upper bound of 335 millions and a lower bound of 317 millions.

How do these bounds compare with alternative projections? Table 3 reports the pessimistic scenario in columns 5 to 7, and alternative projections in columns 4 and 8 to 10. Choosing the ECPOP constant scenario (which averages the EUROSTAT high and low scenarios for countries) produces projected populations well above the pessimistic scenario's 2020 upper limit. When more recent projections (Van der Gaag et al. 1997), which incorporate substantial knowledge of demographic régimes in the 1990s, are considered, the poor performance of our confidence intervals is even clearer. The upper confidence limits of the ECPOP pessimistic scenarios in 2020 fall below the low projections by CBS/NIDI. For the ECPOP pessimistic scenario 95% confidence interval to include the CBS/NIDI high scenario outcome would require increasing the standard deviation of projection outcomes by 5 to 10 times. Even if we had started from the ECPOP constant scenario, standard errors would need to be increased by 3 to 8 times.
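The size of the required increase can be checked against the Table 3 rows for Portugal, Spain and the United Kingdom. This is a sketch only: the column roles are an assumption inferred from the text (column 4 as the ECPOP constant scenario, columns 5 and 7 as the pessimistic mean and upper limit, column 10 as the CBS/NIDI high scenario), and the function name is hypothetical.

```python
# Table 3 projected populations (1000s). Column roles are an assumption:
# 4 = ECPOP constant, 5 = pessimistic mean, 7 = pessimistic upper limit,
# 10 = CBS/NIDI high scenario.
TABLE3 = {
    "Portugal": {"constant": 10465, "mean": 9996, "upper": 10205, "high": 11265},
    "Spain": {"constant": 38937, "mean": 37337, "upper": 38098, "high": 43504},
    "United Kingdom": {"constant": 59868, "mean": 56752, "upper": 58414, "high": 65326},
}


def inflation_factor(centre, half_width, target):
    """How many times wider the interval must be for `target` to reach its upper limit."""
    return (target - centre) / half_width


factors = {}
for country, row in TABLE3.items():
    half_width = row["upper"] - row["mean"]
    from_pessimistic = inflation_factor(row["mean"], half_width, row["high"])
    from_constant = inflation_factor(row["constant"], half_width, row["high"])
    factors[country] = (from_pessimistic, from_constant)
    print(f"{country}: {from_pessimistic:.1f}x from pessimistic, "
          f"{from_constant:.1f}x from constant")
```

Under these assumptions the three countries fall inside the ranges quoted in the text, which suggests the column interpretation is at least plausible.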

Where you start from in terms of scenario trajectories is clearly vital. In the 1990s, as Table 1's statistics showed, substantial mortality improvements and much higher immigration have occurred which, if continued into the twenty-first century, will postpone population decline for several decades compared with a pessimistic scenario. There is, however, the strong possibility that choosing to carry out the simulation sampling independently across time periods, demographic components and countries/regions substantially dampens variability compared with the random paths approach. The experiments carried out by Lutz et al. (1996: 419-420) suggest that this might be the case, indicating that modification of our methods and code is needed.
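The dampening effect can be illustrated with a toy simulation. It does not run a projection model: it assumes six time steps and simply compares, across many runs, the spread of the time-averaged standard normal perturbation under independent sampling and under a single shared draw. The function name and run counts are hypothetical.

```python
import random
import statistics


def mean_perturbation_sd(n_steps, n_runs, shared_draw, rng):
    """SD across runs of the time-averaged N(0, 1) perturbation."""
    averages = []
    for _ in range(n_runs):
        if shared_draw:
            z = rng.gauss(0.0, 1.0)
            draws = [z] * n_steps  # random path: one draw reused at every step
        else:
            draws = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
        averages.append(sum(draws) / n_steps)
    return statistics.pstdev(averages)


rng = random.Random(1)
sd_independent = mean_perturbation_sd(6, 20_000, shared_draw=False, rng=rng)
sd_path = mean_perturbation_sd(6, 20_000, shared_draw=True, rng=rng)
print(sd_independent)  # close to 1/sqrt(6), since independent errors average out
print(sd_path)         # close to 1, since the same error persists in every step
```

In this simplified setting independent sampling shrinks the average perturbation by a factor of about the square root of the number of time steps, which is the kind of dampening the paragraph above describes.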

Bearing in mind these caveats, what do the outcomes look like at regional level? Figure 2 displays results for eight upper tier regions in France. Regions in the ECPOP projections receive information from the national projections about fertility and mortality levels, which control regional vital rates, plus shares of national extra-Community and inter-member state migration, which are distributed to regions on the basis of population. However, no constraint is imposed that regional projections sum to the national: the French region projections produce higher populations as regional population shares move towards faster growing regions. The northern French regions have higher than average fertility, which results in continued population growth in the Île de France and Nord-Pas-de-Calais regions despite migration losses. The Bassin Parisien sees strongest growth because of high levels of migration. The Sud-Ouest region, for example, shows least growth because of its lower than average fertility levels.

Figure 2: Future population sizes of French regions using the ECPOP model with a pessimistic scenario and guessed standard errors

[Figure panels: Île de France, Bassin Parisien, Nord-Pas-de-Calais, Est, Ouest, Sud-Ouest, Centre-Est, Mediterranee; with key]

The graphs (Figure 2) show considerable variation around mean trajectories, particularly for the largest regions. However, a comparison of these projections with those of Van der Gaag et al. (1997) shows that their baseline (central) scenario is contained within the 95% confidence intervals of the pessimistic scenario at regional level. Further investigation is needed of why this should be so when a different result occurs at national scale.

5.2 Sensitivity analysis

To gauge how sensitive the results might be to the size of the relative error assumed, a variety of experiments were carried out, using the standard scenario for all but one component and sampling with different standard errors for the one component of interest.

Figure 3: Future population sizes of selected European countries (EUR12) using the ECPOP model with a pessimistic scenario and a 10% error in the death rate

[Figure panels: Germany, France, Italy, United Kingdom, Spain, Netherlands, Portugal, Belgium, Greece, Denmark, Ireland, Luxembourg; with key]

Figure 3 shows the probabilistic projections using a 10% standard error for mortality (compared with 3% in the earlier projections). Rather small variation in projection outcomes results. Figure 4 displays the range of projection outcomes for United Kingdom regions when a 10% standard error is used for mortality. The range of variation is quite narrow. Some comments on the plausibility of these projections are in order. The variation between regions in trajectory is driven by the inter-regional migration pattern. The rates are based on data for the late 1980s, when the pattern of migration departed considerably from that earlier in the decade and in the 1990s. The UK was experiencing an economic boom led by finance sector expansion in the South East. Housing prices rose rapidly in the South East and pushed more London workers to seek residences beyond the region's rim, in East Anglia, the East Midlands, the South West and in southern Yorkshire and Humberside, and net out-migration from the South East resulted. Prior to 1987-89 and after 1990 net in-migration to the South East was prevalent.

These observations suggest that to properly account for uncertainty at regional level, scenarios must be based on long term patterns and that the error for each individual region to region flow needs to be considered. When there is a large number of regions this is infeasible and instead scenarios and error estimates should be developed for total outflows and total inflows, reducing the number of variables to be determined from N² to 2N.
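The reduction from individual flows to totals can be sketched as follows. The three-region flow matrix is hypothetical and the function name is an illustrative assumption; the count at the end uses the 11 UK regions listed for Figure 4.

```python
def margins(flows):
    """Collapse an N x N origin-destination matrix to N outflow and N inflow totals."""
    n = len(flows)
    outflows = [sum(flows[i][j] for j in range(n) if j != i) for i in range(n)]
    inflows = [sum(flows[i][j] for i in range(n) if i != j) for j in range(n)]
    return outflows, inflows


# Hypothetical flows between three regions (rows = origins, columns = destinations).
flows = [
    [0, 12, 7],
    [9, 0, 4],
    [5, 6, 0],
]
outflows, inflows = margins(flows)
print(outflows, inflows)

n_regions = 11  # UK regions listed for Figure 4
print(n_regions ** 2, 2 * n_regions)  # variables to specify: N squared versus 2N
```

Total outflows and total inflows must sum to the same number of migrants, so scenarios set on the margins remain internally consistent only if that constraint is respected.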

Figure 4: Future population sizes of United Kingdom regions using the ECPOP model with a pessimistic scenario and a 10% error in the death rate

[Figure panels: North, Yorkshire and Humberside, East Midlands, East Anglia, South-East, South-West, West Midlands, North-West, Wales, Scotland, Northern Ireland; with key]

Table 4 shows what happens if we apply 1%, 5% and 10% error, respectively, to each component, using the single scenario for the other components. The component that generates the widest confidence range at each error level is fertility. At the 5% and 10% levels it is followed by inter-member state migration and then mortality; at the 1% level mortality produces a slightly wider range than inter-member state migration. Error in extra-Community net inflows or inter-regional migration produces very little variation in outcomes.

Table 4: Sensitivity of projected population ranges to estimates of standard errors

Population, 1 January 2020 (1000s)

| Component | Input SE | EUR12 mean | EUR12 SE | EUR12 0.025 | EUR12 0.975 | UK mean | UK SE | UK 0.025 | UK 0.975 |
|---|---|---|---|---|---|---|---|---|---|
| Mortality | 1% | 324743 | 246 | 324497 | 324989 | 56813 | 41 | 56772 | 56854 |
| | 5% | 324607 | 993 | 323614 | 325600 | 56790 | 164 | 56626 | 56954 |
| | 10% | 324184 | 2041 | 322143 | 326225 | 56719 | 336 | 56383 | 57055 |
| Fertility | 1% | 324748 | 656 | 324092 | 325404 | 56813 | 131 | 56682 | 56944 |
| | 5% | 325096 | 6058 | 319038 | 331154 | 56879 | 1182 | 55697 | 58061 |
| | 10% | 325011 | 58789 | 266222 | 383800 | 56860 | 10183 | 46677 | 67043 |
| Extra-Community migration | 1% | 324744 | 5 | 324739 | 324749 | 56813 | 0 | 56813 | 56813 |
| | 5% | 324744 | 46 | 324698 | 324790 | 56813 | 1 | 56812 | 56814 |
| | 10% | 324744 | 88 | 324656 | 324832 | 56813 | 2 | 56811 | 56815 |
| Inter-member state migration | 1% | 324744 | 206 | 324538 | 324950 | 56813 | 36 | 56777 | 56849 |
| | 5% | 324744 | 1788 | 322956 | 326532 | 56811 | 314 | 56497 | 57125 |
| | 10% | 324745 | 3593 | 321152 | 328338 | 56808 | 633 | 56175 | 57441 |
| Inter-region migration | 1% | 324744 | 0 | 324744 | 324744 | 56813 | 0 | 56813 | 56813 |
| | 5% | 324744 | 46 | 324698 | 324790 | 56813 | 1 | 56812 | 56814 |
| | 10% | 324744 | 88 | 324656 | 324832 | 56813 | 2 | 56811 | 56815 |

The mortality variation has less influence than might be expected, probably because the highest mortality rates are applied to the smallest age group populations. Until age 60 mortality can virtually be ignored as a component in EUR12 populations. Fertility rates, by contrast, are applied to large populations in the fertile age range. The differences in sensitivity to errors between the three migration components are probably due to the way each flow is represented in the projection model. Extra-Community migration involves the net addition of a small absolute number relative to the whole population. Inter-regional migration rates are raised or lowered in the sampling process but the effects tend to cancel out because higher/lower in-migration is balanced by higher/lower out-migration when a constant rates scenario is used. Inter-member state migration is, surprisingly, more sensitive because it is the destination attractiveness factors (see Table 1) which are altered and these can cause quite wide swings.
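The ordering of components by their effect on output uncertainty can be read directly from the EUR12 standard errors in Table 4. The sketch below simply sorts those published values; the variable and function names are hypothetical.

```python
# EUR12 standard errors of the 2020 population (1000s), copied from Table 4.
EUR12_SE = {
    "Mortality": {"1%": 246, "5%": 993, "10%": 2041},
    "Fertility": {"1%": 656, "5%": 6058, "10%": 58789},
    "Extra-Community migration": {"1%": 5, "5%": 46, "10%": 88},
    "Inter-member state migration": {"1%": 206, "5%": 1788, "10%": 3593},
    "Inter-region migration": {"1%": 0, "5%": 46, "10%": 88},
}


def ranking(level):
    """Components ordered from largest to smallest output SE at one input error level."""
    return sorted(EUR12_SE, key=lambda name: EUR12_SE[name][level], reverse=True)


for level in ("1%", "5%", "10%"):
    print(level, ranking(level))
```

Fertility ranks first at every level, while mortality and inter-member state migration swap places between the 1% level and the higher levels.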

6. Conclusions

What has been accomplished in our analysis?

We have discussed how uncertainty about the driving forces behind population projection is being handled by demographers and statisticians. In the absence of a reliable way of measuring error, guesstimates of the standard errors of projection parameters are used and simulations are carried out using randomly sampled values of those parameters.

Following the lead of Lutz et al. (1996), we have shown how probabilistic projections can be generated on a high performance computer and how hundreds of thousands of projections can be computed, not just thousands. We have also demonstrated that the method can be applied at subnational scale to regional populations within countries as well as to sets of inter-connected countries.

The work has, however, revealed that results are still very sensitive to the scenario adopted for sampling around and to the assumed error levels. If we were to repeat the analysis, we would seek to update the scenario trajectory for EUR12 countries to incorporate improved longevity and higher immigration.

Decisions about the degree to which time periods, demographic components and country/region populations should be regarded as independent or correlated are also very important. Our approach of independent sampling across all three dimensions needs to be assessed against the Lutz et al. (1996) random paths approach. Independent sampling tends to reduce variation because of averaging of different randomly selected parameter values.

The experiments have exposed difficulties in the formulation and operation of an existing complex projection model, pointing to a need for disassembly, overhaul and reconstruction. One particular component that needs attention is the inter-regional migration system. The model and our experiments assume that the inter-region migration rates vary together, but it is more likely that socio-economic developments affect regions in different ways. A new sub-model of migration is indicated as part of an extensive agenda of research into methods of dealing with uncertainty in population forecasting which is now emerging.

References

Ahrens, J.H. and Dieter, U. (1973) Extensions of Forsythe's method for random sampling from the normal distribution, Mathematics of Computation 27(124): 927-937

Alho, J. (1997) Scenarios, uncertainty and conditional forecasts of the world, Journal of the Royal Statistical Society, Series A 160 (1): 71-85

Alho, J. and Spencer, B.D. (1985) Uncertain population forecasting, Journal of the American Statistical Association 80: 306-314

Alho, J. and Spencer, B.D. (1995) The practical specification of the expected error of the population forecasts. Paper presented at the symposium on Analysis of errors in demographic forecasts with implications for policy, Koli, Finland, March 30-April 2

Champion, A., Fotheringham, S., Rees, P., Boyle, P. and Stillwell, J. (1998) The determinants of migration flows in England. A review of the existing data and evidence (Newcastle upon Tyne: Department of Geography, University of Newcastle)

Council of Europe (1997) Recent demographic developments in Europe 1997 (Strasbourg: Council of Europe Publishing)

Eurostat (1991) Two long term population scenarios in the European Community: principal assumptions and results. Scenarios prepared for the Congress on "Human resources in Europe at the dawn of the 21st Century" Luxembourg, 27-29 November (Luxembourg: Statistical office of the European Communities, Social and Regional Statistics Directorate)

Eurostat (1992) Demographic statistics 1992 (Luxembourg: Statistical Office of the European Communities)

Gropp, W., Lusk, E. and Skjellum, A. (1994) Using MPI: portable parallel programming with the message passing interface (Cambridge, Mass.: MIT Press)

Hagood, M. and Price, D. (1952) Statistics for sociologists Revised Edition (New York: Holt, Rinehart and Winston)

Keilman, N. (1990) Uncertainty in national population forecasting. NIDI CBGS Publications (Amsterdam: Swets and Zeitlinger)

Lee, R. and Tuljapurkar, S. (1994) Stochastic population projections for the United States: beyond high, medium and low. Journal of the American Statistical Association 89 (428): 1175-1189

Lutz, W. (ed.) (1994) The future population of the world. What can we assume today? (London: Earthscan Publications)

Lutz, W. (ed.) (1996) The future population of the world. What can we assume today? Revised and updated edition (London: Earthscan Publications)

Lutz, W., Sanderson, W. and Scherbov, S. (1996) Probabilistic population projections based on expert opinion. Chapter 16 in Lutz W (ed.) The future population of the world. What can we assume today? Revised and updated edition. (London: Earthscan Publications) 397-428

Lutz, W., Sanderson, W. and Scherbov, S. (1997) Doubling of world population unlikely. Nature 387 (6635): 803-805

Lutz, W. and Scherbov, S. (1998) An expert-based framework for the probabilistic national projections: the example of Austria. European Journal of Population, forthcoming

OPCS (1995) 1992-based population projections. OPCS Series PP2, no.19 (London: OPCS)

Passel, J.S. (1975) Population projections utilising age-specific and age-parity specific birth rates, predicted with time series model: projections with confidence limits. PhD Thesis, Johns Hopkins University, Baltimore, Maryland, USA. 262p

Rees, P. (1996) Projecting the national and regional populations of the European Union using migration information. Chapter 18 in Rees, P., Stillwell, J., Convey, A. and Kupiszewski, M. (eds.) Population migration in the European Union (Chichester: Wiley) 331-364

Rees, P. (1997) The second demographic transition: what does it mean for the future of Europe’s population? Environment and Planning A 29(3): 381-390

Rees, P., Stillwell, J. and Convey, A. (1992) Intra-Community migration and its impact on the demographic structure at the regional level. Working paper 92/1 (Leeds: School of Geography, University of Leeds)

Rogers, A. (1995) Multiregional demography: principles, methods and extensions (Chichester: Wiley)

Van der Gaag, N., Van Imhoff, E. and Van Wissen, L. (1997) Internal Migration in the Countries of the European Union (The Hague: Netherlands Interdisciplinary Institute and Luxembourg: Eurostat Working Paper). Report to the Statistical Office of the European Communities and the European Commission, Directorate-General XVI

Van Imhoff, E., Van Wissen, L. and Spiess, K. (1994) Regional population projections in the countries of the European Economic Area. NIDI CBGS Publications (Lisse: Swets and Zeitlinger)

Zelinsky, W. (1971) The hypothesis of the mobility transition. Geographical Review 61: 219-249