Chapter 5. Panel data

Table of Contents
Panel structure
Dummy variables
Using lagged values with panel data
Pooled estimation
Illustration: the Penn World Table

Panel structure

Panel data (pooled cross-section and time-series) require special care. Here are some pointers.

Consider a data set composed of observations on each of n cross-sectional units (countries, states, persons or whatever) in each of T periods. Let each observation comprise the values of m variables of interest. The data set then contains mnT values.

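The data should be arranged "by observation": each row represents an observation; each column contains the values of a particular variable. The data matrix then has nT rows and m columns. That leaves open the matter of how the rows should be arranged. There are two possibilities.[1]

Stacked time series (rows grouped by unit): the data matrix consists of n blocks of T rows each, one block per cross-sectional unit, holding that unit's observations for every period.

Stacked cross sections (rows grouped by period): the data matrix consists of T blocks of n rows each, one block per period, holding that period's observations on every unit.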
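As a rough illustration of the two row orderings (rows grouped by unit, or rows grouped by period), the Python sketch below lists the (unit, period) label of each row under both schemes. This is not gretl code; the unit names, the years and the function names are hypothetical, chosen only to make the layout visible.

```python
from itertools import product


def stacked_time_series(units, periods):
    """Rows grouped by unit: every period for unit 1, then every period for unit 2, ..."""
    return [(u, t) for u, t in product(units, periods)]


def stacked_cross_sections(units, periods):
    """Rows grouped by period: every unit in period 1, then every unit in period 2, ..."""
    return [(u, t) for t, u in product(periods, units)]


units = ["A", "B", "C"]   # n = 3 hypothetical cross-sectional units
periods = [2001, 2002]    # T = 2 hypothetical periods

by_unit = stacked_time_series(units, periods)
by_period = stacked_cross_sections(units, periods)

# Either way the data matrix has n*T rows; only the row order differs.
for row_unit, row_period in zip(by_unit, by_period):
    print(row_unit, row_period)
```

Both lists contain the same nT labels, so the choice between them affects only how rows are located, not what data are stored.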

You may use whichever arrangement is more convenient. The first is perhaps easier to keep straight. If you use the second, you must ensure that the cross-sectional units appear in the same order in each of the period data blocks. Under gretl's Sample menu you will find an item "Restructure panel", which converts the data from stacked cross-section form to stacked time-series form.
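To see what such a restructuring amounts to, here is a minimal Python sketch of the row reordering. It is not gretl's implementation; the function name and the sample row labels are hypothetical. It assumes, as required above, that the units appear in the same order within every period block, so that (counting from zero) unit i in period t sits at row t*n + i of the period-grouped data.

```python
def to_stacked_time_series(rows, n, T):
    """Reorder rows grouped by period (T blocks of n rows)
    into rows grouped by unit (n blocks of T rows)."""
    if len(rows) != n * T:
        raise ValueError("expected exactly n*T rows")
    # Source position of unit i, period t in period-grouped data: t*n + i.
    # Emitting i in the outer loop and t in the inner loop groups rows by unit.
    return [rows[t * n + i] for i in range(n) for t in range(T)]


# Hypothetical panel: n = 3 units, T = 2 periods, rows grouped by period.
period_grouped = ["A-2001", "B-2001", "C-2001",
                  "A-2002", "B-2002", "C-2002"]

unit_grouped = to_stacked_time_series(period_grouped, n=3, T=2)
print(unit_grouped)
```

If the units were ordered differently from one period block to the next, the index arithmetic would silently mix observations from different units, which is why the consistent ordering matters.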

In either case you can use the frequency field in the observations line of the data header file (see Chapter 4) to make life a little easier.

If you construct a panel data set in a spreadsheet program and then import it into gretl, the program may not at first recognize the panel structure of the data. You can fix this with the setobs command (see Chapter 10) or the GUI menu item "Sample, Set frequency, startobs…".

Notes

[1]

If you don't intend to make any conceptual or statistical distinction between cross-sectional and temporal variation in the data you can arrange the rows arbitrarily, but this is probably wasteful of information.