POL 200 Lab 4: Comparing Using Crosstabs and Mean Comparisons

In previous labs, we have covered describing our data and simple data cleaning and variable creation in Stata. Now let’s move on to comparing values of two variables together. We will focus on two simple methods, cross-tabulations and mean comparisons. Both methods are used when you have independent variables with categorical or ordinal values. Crosstabs are used when the dependent variable is also categorical or ordinal, while mean comparisons can be used when the dependent variable is continuous/interval or is a dummy/dichotomous variable.

For this lab, we will use the July 2020 AP-NORC Poll, available from the Roper Center. See the instructions for downloading and accessing the data from the previous lab.

First hypothesis: respondents exposed to the coronavirus are more likely to support closing bars and restaurants than are those who have not been exposed.

Second hypothesis: respondents worried about coronavirus infection are more likely to say the country is headed in the wrong direction.

Third hypothesis: respondents experiencing economic hardship are more likely to say the country is headed in the wrong direction.

.  * Change the file path below to the appropriate working directory for your machine
.  
.  cd h:\POL200\labs
h:\POL200\labs

.  use 31117583.dta, clear 
.  * Recode the variables we'll use in the analysis, making sure to code
.  *   missing data as periods (.)
.  *   We can also specify value labels directly in the recode command if
.  *   we are creating a new variable using the "gen" option
. 
. codebook CUR1 

--------------------------------------------------------------------------------------------------------------------------------------------------------------
CUR1                                                                          CUR1: Generally speaking, would you say things in this country are heading in th
--------------------------------------------------------------------------------------------------------------------------------------------------------------

                  Type: Numeric (byte)
                 Label: CUR1

                 Range: [1,99]                        Units: 1
         Unique values: 3                         Missing .: 0/1,057

            Tabulation: Freq.   Numeric  Label
                          197         1  (1) Right direction
                          851         2  (2) Wrong direction
                            9        99  (99) DON'T KNOW/SKIPPED ON
                                         WEB/REFUSED (VOL)

. recode CUR1 (1=1 "Right direction")(2=0 "Wrong direction") ///
>                         (99=.), gen(rightdir)
(860 differences between CUR1 and rightdir)
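
Before moving on, it is worth confirming that a recode did what we intended. One common check, sketched here with the two variables from the command above, is to cross-tabulate the original variable against the new one. The missing option makes Stata show the missing (.) category as well.

```stata
* Check the recode: each CUR1 value should map to exactly one rightdir value
* (1 -> 1, 2 -> 0, 99 -> .); the missing option displays the . column
tab CUR1 rightdir, missing
```

If the recode worked, each row of this table has exactly one nonzero cell.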

. codebook politics B2AB

--------------------------------------------------------------------------------------------------------------------------------------------------------------
politics                                                                      POLITICS: Do you consider yourself a Democrat, a Republican, an independent or n
--------------------------------------------------------------------------------------------------------------------------------------------------------------

                  Type: Numeric (byte)
                 Label: POLITICS

                 Range: [1,99]                        Units: 1
         Unique values: 5                         Missing .: 0/1,057

            Tabulation: Freq.   Numeric  Label
                          347         1  (1) Democrat
                          324         2  (2) Republican
                          258         3  (3) Independent
                          119         4  (4) None of these
                            9        99  (99) DON'T KNOW/SKIPPED ON
                                         WEB/REFUSED (VOL)

--------------------------------------------------------------------------------------------------------------------------------------------------------------
B2AB                                                                          B2AB: And how would you describe the financial situation in your own household t
--------------------------------------------------------------------------------------------------------------------------------------------------------------

                  Type: Numeric (byte)
                 Label: B2AB

                 Range: [1,7]                         Units: 1
         Unique values: 7                         Missing .: 0/1,057

            Tabulation: Freq.   Numeric  Label
                          168         1  (1) Very good
                          359         2  (2) Somewhat good
                          188         3  (3) Lean toward good
                            1         4  (4) Neither good nor poor
                          133         5  (5) Lean toward poor
                          146         6  (6) Somewhat poor
                           62         7  (7) Very poor

. 
. codebook VIRUS2A

--------------------------------------------------------------------------------------------------------------------------------------------------------------
VIRUS2A                                                                       VIRUS2A: [The coronavirus] How worried are you about you or someone in your fami
--------------------------------------------------------------------------------------------------------------------------------------------------------------

                  Type: Numeric (byte)
                 Label: VIRUS2A

                 Range: [1,99]                        Units: 1
         Unique values: 6                         Missing .: 0/1,057

            Tabulation: Freq.   Numeric  Label
                          266         1  (1) Extremely worried
                          239         2  (2) Very worried
                          329         3  (3) Somewhat worried
                          138         4  (4) Not too worried
                           82         5  (5) Not at all worried
                            3        99  (99) DON'T KNOW/SKIPPED ON
                                         WEB/REFUSED (VOL)

. recode VIRUS2A (1=5 "Extremely Worried")(2=4)(3=3 "Somewhat worried") ///
>                                 (4=2)(5=1 "Not at all worried")(99=.), gen(worried)
(728 differences between VIRUS2A and worried)

.                                 
. codebook VIRUS7A 

--------------------------------------------------------------------------------------------------------------------------------------------------------------
VIRUS7A                                                                       VIRUS7A: [Requiring bars and restaurants to close] In response to the coronaviru
--------------------------------------------------------------------------------------------------------------------------------------------------------------

                  Type: Numeric (byte)
                 Label: VIRUS7A

                 Range: [1,99]                        Units: 1
         Unique values: 6                         Missing .: 0/1,057

            Tabulation: Freq.   Numeric  Label
                          290         1  (1) Strongly favor
                          272         2  (2) Somewhat favor
                          170         3  (3) Neither favor nor oppose
                          193         4  (4) Somewhat oppose
                          127         5  (5) Strongly oppose
                            5        99  (99) DON'T KNOW/SKIPPED ON
                                         WEB/REFUSED (VOL)

. recode VIRUS7A (1=5 "Strongly favor")(2=4)(3=3 "Neither favor nor oppose") ///
>                                 (4=2)(5=1 "Strongly Oppose")(99=.), gen(closebars)
(887 differences between VIRUS7A and closebars)

.                                 
. recode VIRUS14 (1=1 "Yes")(2=0 "No")(99=.), gen(gotcorona)
(777 differences between VIRUS14 and gotcorona)

. 
.  * To make our tables easier to read later, let's change a few variable labels:
. 
. label var worried "How worried are you about Covid-19"  

. label var closebars "Requiring bars/restaurants to close" 

. label var gotcorona "Resp or close friend has had Covid-19"

. label var rightdir "Is country going in right direction"
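
A quick way to confirm the new labels is sketched below. describe lists each variable with its variable label, and label list shows the value labels that recode created. This assumes recode named each value label after the new variable, which is its default behavior when no label() option is given.

```stata
* Variable labels appear in the last column of the describe output
describe worried closebars gotcorona rightdir

* Value labels created by recode; values 2 and 4 of worried and closebars
* were left unlabeled, so they show up as bare numbers in tables
label list worried closebars
```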

Crosstabs

A crosstab is a two-way frequency table that shows how your observations are jointly distributed across two variables. We can use it to evaluate the relationship between X and Y by seeing how the values of the Y variable become more (or less) likely as we move across categories of the X variable. Several commands can generate a crosstab; a quick one is tab. Be sure to specify the col option to calculate column percentages. With the DV in the rows and the IV in the columns, interpret the table by comparing the percentages across columns within a row.

.  * SYNTAX: tab dv iv, col
. tab closebars gotcorona, col

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

            Requiring | Resp or close friend
  bars/restaurants to |   has had Covid-19
                close |        No        Yes |     Total
----------------------+----------------------+----------
      Strongly Oppose |       100         27 |       127 
                      |     13.09       9.64 |     12.16 
----------------------+----------------------+----------
                    2 |       142         50 |       192 
                      |     18.59      17.86 |     18.39 
----------------------+----------------------+----------
Neither favor nor opp |       130         39 |       169 
                      |     17.02      13.93 |     16.19 
----------------------+----------------------+----------
                    4 |       194         76 |       270 
                      |     25.39      27.14 |     25.86 
----------------------+----------------------+----------
       Strongly favor |       198         88 |       286 
                      |     25.92      31.43 |     27.39 
----------------------+----------------------+----------
                Total |       764        280 |     1,044 
                      |    100.00     100.00 |    100.00 

This is a nice little table, and the syntax was easy to call up. But it has a big downside: there is no way to automatically export this table to a program that will let you share your findings with others. So instead, let’s turn to Stata’s table command, which we’ve used before.

.  * CROSSTAB SYNTAX: table dv iv, stat(percent, across(dv)) stat(freq)
. table closebars gotcorona, stat(percent, across(closebars)) stat(freq)

--------------------------------------------------------------------------------
                                    |    Resp or close friend has had Covid-19  
                                    |            No            Yes         Total
------------------------------------+-------------------------------------------
Requiring bars/restaurants to close |                                           
  Strongly Oppose                   |                                           
    Percent                         |         13.09           9.64         12.16
    Frequency                       |           100             27           127
  2                                 |                                           
    Percent                         |         18.59          17.86         18.39
    Frequency                       |           142             50           192
  Neither favor nor oppose          |                                           
    Percent                         |         17.02          13.93         16.19
    Frequency                       |           130             39           169
  4                                 |                                           
    Percent                         |         25.39          27.14         25.86
    Frequency                       |           194             76           270
  Strongly favor                    |                                           
    Percent                         |         25.92          31.43         27.39
    Frequency                       |           198             88           286
  Total                             |                                           
    Percent                         |        100.00         100.00        100.00
    Frequency                       |           764            280         1,044
--------------------------------------------------------------------------------

As always with the table command, we can combine it with collect for exporting.

. collect table closebars gotcorona, stat(percent, across(closebars)) stat(freq)

--------------------------------------------------------------------------------
                                    |    Resp or close friend has had Covid-19  
                                    |            No            Yes         Total
------------------------------------+-------------------------------------------
Requiring bars/restaurants to close |                                           
  Strongly Oppose                   |                                           
    Percent                         |         13.09           9.64         12.16
    Frequency                       |           100             27           127
  2                                 |                                           
    Percent                         |         18.59          17.86         18.39
    Frequency                       |           142             50           192
  Neither favor nor oppose          |                                           
    Percent                         |         17.02          13.93         16.19
    Frequency                       |           130             39           169
  4                                 |                                           
    Percent                         |         25.39          27.14         25.86
    Frequency                       |           194             76           270
  Strongly favor                    |                                           
    Percent                         |         25.92          31.43         27.39
    Frequency                       |           198             88           286
  Total                             |                                           
    Percent                         |        100.00         100.00        100.00
    Frequency                       |           764            280         1,044
--------------------------------------------------------------------------------

. collect export crosstab1.xlsx, replace
(collection Table exported to file crosstab1.xlsx)
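
To see where the column percentages come from, the arithmetic can be reproduced with display, using frequencies taken from the crosstab above. This is only a sketch of the calculation. The nofreq option at the end is an optional way to print the percentages without the frequencies.

```stata
* Column percentage = 100 * cell frequency / column total

* "Strongly favor" among respondents in the Yes column: displays 31.43
display %5.2f 100*88/280

* "Strongly favor" among respondents in the No column: displays 25.92
display %5.2f 100*198/764

* Optional: show only the column percentages, without frequencies
tab closebars gotcorona, col nofreq
```

Comparing these two numbers within the "Strongly favor" row (31.43 versus 25.92) is the kind of across-column reading that the first hypothesis calls for.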

Mean Comparisons

Mean comparison tests follow a similar logic. What happens to the mean of the dependent variable when we change categories of the independent variable? Does the average value of the DV change in the hypothesized way? We can also conduct a mean comparison test using the tab command, this time with the sum() option. The IV should be categorical or ordinal, and the DV should be continuous or a dummy variable.

. tab rightdir

     Is country |
 going in right |
      direction |      Freq.     Percent        Cum.
----------------+-----------------------------------
Wrong direction |        851       81.20       81.20
Right direction |        197       18.80      100.00
----------------+-----------------------------------
          Total |      1,048      100.00

. 
. * SYNTAX: tab iv, sum(dv)
. tab worried, sum(rightdir)

How worried |
    are you |   Summary of Is country going in
      about |           right direction
   Covid-19 |        Mean   Std. dev.       Freq.
------------+------------------------------------
  Not at al |       .2375     .428236          80
          2 |   .30434783   .46180692         138
  Somewhat  |   .22769231    .4199896         325
          4 |   .13445378   .34185816         238
  Extremely |   .10984848   .31329473         264
------------+------------------------------------
      Total |   .18755981   .39054716       1,045

Again, perfectly nice table except that we can’t use it easily in a report or presentation. Let’s use table instead.

. * MEAN COMPARISON SYNTAX: table iv, stat(mean dv)
. table worried, stat(mean rightdir) 

----------------------------------------------
                                   |      Mean
-----------------------------------+----------
How worried are you about Covid-19 |          
  Not at all worried               |     .2375
  2                                |  .3043478
  Somewhat worried                 |  .2276923
  4                                |  .1344538
  Extremely Worried                |  .1098485
  Total                            |  .1875598
----------------------------------------------

. 
. * You could specify a couple more statistics if you wanted, and then export
. * the table using collect
. collect table worried, stat(mean rightdir) stat(sd rightdir) stat(count rightdir)

--------------------------------------------------------------------------------------------------
                                   |      Mean   Standard deviation   Number of non-missing values
-----------------------------------+--------------------------------------------------------------
How worried are you about Covid-19 |                                                              
  Not at all worried               |     .2375              .428236                             80
  2                                |  .3043478             .4618069                            138
  Somewhat worried                 |  .2276923             .4199896                            325
  4                                |  .1344538             .3418582                            238
  Extremely Worried                |  .1098485             .3132947                            264
  Total                            |  .1875598             .3905472                          1,045
--------------------------------------------------------------------------------------------------

. collect export meancomp1.xlsx, replace
(collection Table exported to file meancomp1.xlsx)
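
Because rightdir is coded 0/1, each mean in these tables is simply the proportion of respondents in that category who answered "Right direction". The sketch below illustrates this with the frequencies from the tab rightdir output above, then shows that a crosstab with row percentages carries the same information.

```stata
* Mean of a 0/1 variable = share of 1s
* 197 "Right direction" answers out of 1,048 non-missing: displays 0.1880
display %6.4f 197/1048

* Row percentages from a crosstab match the means (times 100):
* read the "Right direction" column for each category of worried
tab worried rightdir, row nofreq
```

The 0.1880 matches the 18.80 percent reported by tab rightdir.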

Now let’s create another mean comparison table and add it to our existing Excel file using collect export.

. 
.  * We can tell Stata to modify an existing Excel file 
.  *  and to write the table starting at a specific cell
.  *  using the modify and cell() options
.  
. collect table B2AB, stat(mean rightdir) ///
>                                         stat(sd rightdir) ///
>                                         stat(count rightdir)

------------------------------------------------------------------------------------------------------------------------------------------------
                                                                                 |      Mean   Standard deviation   Number of non-missing values
---------------------------------------------------------------------------------+--------------------------------------------------------------
B2AB: And how would you describe the financial situation in your own household t |                                                              
  (1) Very good                                                                  |   .327381             .4706604                            168
  (2) Somewhat good                                                              |   .220339             .4150619                            354
  (3) Lean toward good                                                           |  .1621622             .3695998                            185
  (4) Neither good nor poor                                                      |         1                    .                              1
  (5) Lean toward poor                                                           |  .0977444             .2980914                            133
  (6) Somewhat poor                                                              |  .0758621             .2656951                            145
  (7) Very poor                                                                  |  .1451613              .355139                             62
  Total                                                                          |  .1879771             .3908804                          1,048
------------------------------------------------------------------------------------------------------------------------------------------------

. collect export meancomp1.xlsx, modify cell(A13)                                 
(collection Table exported to file meancomp1.xlsx)

Accounting for Confounding Variable Z

There are several ways to “control” for a confounding variable. In a crosstab or mean comparison, we could hold the categories of the Z variable constant and look at the relationship between X and Y inside each category of Z. Let’s do this for both the crosstab test (controlling for gender) and the mean comparison (controlling for political party).

.  * Perhaps the simplest way to control for Z is to run the 
.  *   crosstab command multiple times, each time selecting 
.  *   different categories of Z:
. 
.  * Let's look at the values of Z
. codebook gender

--------------------------------------------------------------------------------------------------------------------------------------------------------------
gender                                                                                                                                          GENDER: Gender
--------------------------------------------------------------------------------------------------------------------------------------------------------------

                  Type: Numeric (byte)
                 Label: GENDER

                 Range: [1,2]                         Units: 1
         Unique values: 2                         Missing .: 0/1,057

            Tabulation: Freq.   Numeric  Label
                          423         1  (1) Male
                          634         2  (2) Female

.  * Now let's re-run our crosstab, once for men and once for women,
.  *    using "if" to select each category
. table closebars gotcorona if gender==1, ///
>                                 stat(percent, across(closebars)) stat(freq)

--------------------------------------------------------------------------------
                                    |    Resp or close friend has had Covid-19  
                                    |            No            Yes         Total
------------------------------------+-------------------------------------------
Requiring bars/restaurants to close |                                           
  Strongly Oppose                   |                                           
    Percent                         |         15.43          11.46         14.52
    Frequency                       |            50             11            61
  2                                 |                                           
    Percent                         |         18.83          17.71         18.57
    Frequency                       |            61             17            78
  Neither favor nor oppose          |                                           
    Percent                         |         14.20          21.88         15.95
    Frequency                       |            46             21            67
  4                                 |                                           
    Percent                         |         26.85          19.79         25.24
    Frequency                       |            87             19           106
  Strongly favor                    |                                           
    Percent                         |         24.69          29.17         25.71
    Frequency                       |            80             28           108
  Total                             |                                           
    Percent                         |        100.00         100.00        100.00
    Frequency                       |           324             96           420
--------------------------------------------------------------------------------

. table closebars gotcorona if gender==2, ///
>                                 stat(percent, across(closebars)) stat(freq)

--------------------------------------------------------------------------------
                                    |    Resp or close friend has had Covid-19  
                                    |            No            Yes         Total
------------------------------------+-------------------------------------------
Requiring bars/restaurants to close |                                           
  Strongly Oppose                   |                                           
    Percent                         |         11.36           8.70         10.58
    Frequency                       |            50             16            66
  2                                 |                                           
    Percent                         |         18.41          17.93         18.27
    Frequency                       |            81             33           114
  Neither favor nor oppose          |                                           
    Percent                         |         19.09           9.78         16.35
    Frequency                       |            84             18           102
  4                                 |                                           
    Percent                         |         24.32          30.98         26.28
    Frequency                       |           107             57           164
  Strongly favor                    |                                           
    Percent                         |         26.82          32.61         28.53
    Frequency                       |           118             60           178
  Total                             |                                           
    Percent                         |        100.00         100.00        100.00
    Frequency                       |           440            184           624
--------------------------------------------------------------------------------
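
The two if-restricted commands above can also be run in one step with the by prefix. This is a sketch that uses tab rather than table, so the output is for viewing on screen, not for exporting.

```stata
* Run the same crosstab separately within each category of gender
* bysort sorts the data by gender first, then repeats the command per group
bysort gender: tab closebars gotcorona, col
```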

We can also control for Z using a single, exportable table command. Notice the addition of the variable gender in the command below. Now, the columns of gotcorona will be nested inside the categories of gender.

. 
.  * SYNTAX: table dv (z iv), stat(percent, across(dv)) stat(freq)
.  *   To suppress the Total rows and columns, add the nototals option
. table closebars (gender gotcorona), stat(percent, across(closebars)) stat(freq)

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                    |                                                            GENDER: Gender                                                         
                                    |                   (1) Male                                   (2) Female                                    Total                  
                                    |    Resp or close friend has had Covid-19       Resp or close friend has had Covid-19       Resp or close friend has had Covid-19  
                                    |            No            Yes         Total             No            Yes         Total             No            Yes         Total
------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------
Requiring bars/restaurants to close |                                                                                                                                   
  Strongly Oppose                   |                                                                                                                                   
    Percent                         |         15.43          11.46         14.52          11.36           8.70         10.58          13.09           9.64         12.16
    Frequency                       |            50             11            61             50             16            66            100             27           127
  2                                 |                                                                                                                                   
    Percent                         |         18.83          17.71         18.57          18.41          17.93         18.27          18.59          17.86         18.39
    Frequency                       |            61             17            78             81             33           114            142             50           192
  Neither favor nor oppose          |                                                                                                                                   
    Percent                         |         14.20          21.88         15.95          19.09           9.78         16.35          17.02          13.93         16.19
    Frequency                       |            46             21            67             84             18           102            130             39           169
  4                                 |                                                                                                                                   
    Percent                         |         26.85          19.79         25.24          24.32          30.98         26.28          25.39          27.14         25.86
    Frequency                       |            87             19           106            107             57           164            194             76           270
  Strongly favor                    |                                                                                                                                   
    Percent                         |         24.69          29.17         25.71          26.82          32.61         28.53          25.92          31.43         27.39
    Frequency                       |            80             28           108            118             60           178            198             88           286
  Total                             |                                                                                                                                   
    Percent                         |        100.00         100.00        100.00         100.00         100.00        100.00         100.00         100.00        100.00
    Frequency                       |           324             96           420            440            184           624            764            280         1,044
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
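
The Total rows and columns make this table quite wide. A sketch of the same command with the table command's nototals option, which drops the marginal totals, is shown below. It assumes Stata 17 or later, as does the rest of the table syntax in this lab.

```stata
* Same nested crosstab, without the Total rows and columns
table closebars (gender gotcorona), ///
    stat(percent, across(closebars)) stat(freq) nototals
```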
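
The same approach works for mean comparisons. Adding Z as the column variable shows the mean of the DV for each category of the IV within each category of Z.

.  * SYNTAX: table iv z, stat(mean dv)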
. table worried politics, stat(mean rightdir)

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                   |                         POLITICS: Do you consider yourself a Democrat, a Republican, an independent or n                      
                                   |  (1) Democrat   (2) Republican   (3) Independent   (4) None of these   (99) DON'T KNOW/SKIPPED ON WEB/REFUSED (VOL)      Total
-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------
How worried are you about Covid-19 |                                                                                                                               
  Not at all worried               |             0         .2619048              .125            .3846154                                              1      .2375
  2                                |      .1818182         .3648649          .2941176            .1666667                                              0   .3043478
  Somewhat worried                 |      .1348315         .3583333          .0857143            .2727273                                             .5   .2276923
  4                                |      .0384615         .3111111          .1363636                  .2                                       .3333333   .1344538
  Extremely Worried                |      .0526316         .2439024          .1538462            .0833333                                              0   .1098485
  Total                            |      .0724638          .326087          .1474104            .2184874                                           .375   .1875598
-------------------------------------------------------------------------------------------------------------------------------------------------------------------

This table is hard to read, with five party columns plus a total. Let’s restrict the analysis to just Democrats and Republicans.

.  * the "<=" below means "less than or equal to"
.  * alternatively, we could have done: politics == 1 | politics == 2
.  * where the "|" means "or"
.  
. table worried politics if politics <= 2, stat(mean rightdir)

--------------------------------------------------------------------------------------------------------------------------
                                   |    POLITICS: Do you consider yourself a Democrat, a Republican, an independent or n  
                                   |                 (1) Democrat                  (2) Republican                    Total
-----------------------------------+--------------------------------------------------------------------------------------
How worried are you about Covid-19 |                                                                                      
  Not at all worried               |                            0                        .2619048                      .22
  2                                |                     .1818182                        .3648649                 .3411765
  Somewhat worried                 |                     .1348315                        .3583333                 .2631579
  4                                |                     .0384615                        .3111111                 .1208054
  Extremely Worried                |                     .0526316                        .2439024                 .0977011
  Total                            |                     .0724638                         .326087                 .1949025
--------------------------------------------------------------------------------------------------------------------------
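The restriction above can be written several ways. The sketch below assumes politics takes only the codes shown in the earlier table (1, 2, 3, 4, and 99); under that assumption all three conditions keep the same respondents. Stata treats missing values (.) as larger than any number, so "politics <= 2" also drops respondents with missing party.

```stata
* Three equivalent ways to keep only Democrats (1) and Republicans (2)
table worried politics if politics <= 2, stat(mean rightdir)
table worried politics if politics == 1 | politics == 2, stat(mean rightdir)
table worried politics if inlist(politics, 1, 2), stat(mean rightdir)

* Common mistake: "politics == 1 | 2" is always true, because Stata
* treats the bare 2 as nonzero (true), so no respondents are dropped
* table worried politics if politics == 1 | 2, stat(mean rightdir)
```

The inlist() form is the easiest to extend if you later want to keep a set of codes that are not adjacent, for example inlist(politics, 1, 3).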

Ok, one more mean comparison. Let’s repeat our table command, this time using B2AB (financial situation of household) as the IV. Comparing this table with the previous one, what do we learn about how worry about Covid-19 and personal financial situation each relate to right/wrong direction views among Democrats and Republicans in 2020?

. table B2AB politics if politics <= 2, stat(mean rightdir)

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                                                 |    POLITICS: Do you consider yourself a Democrat, a Republican, an independent or n  
                                                                                 |                 (1) Democrat                  (2) Republican                    Total
---------------------------------------------------------------------------------+--------------------------------------------------------------------------------------
B2AB: And how would you describe the financial situation in your own household t |                                                                                      
  (1) Very good                                                                  |                      .106383                        .4683544                 .3333333
  (2) Somewhat good                                                              |                     .1010101                        .3357664                 .2372881
  (3) Lean toward good                                                           |                     .0508475                        .2413793                 .1452991
  (5) Lean toward poor                                                           |                     .0434783                        .1578947                 .0769231
  (6) Somewhat poor                                                              |                     .0322581                              .2                 .0804598
  (7) Very poor                                                                  |                       .09375                              .2                 .1081081
  Total                                                                          |                     .0724638                        .3281734                 .1961078
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
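Before interpreting the table above, it can help to know how many respondents sit behind each mean. Round values such as the two .2 entries in the Republican column may come from small cells, though the output above does not show cell sizes. The sketch below assumes Stata 17 or later (the version whose table command accepts the stat() syntax used in this lab). It adds the count of nonmissing rightdir responses to each cell and rounds the means to three decimals.

```stata
* Sketch: show each cell's mean of rightdir plus the number of
* nonmissing rightdir responses the mean is based on.
* nformat(%5.3f mean) rounds only the means, leaving counts as integers.
table B2AB politics if politics <= 2, stat(mean rightdir) stat(count rightdir) nformat(%5.3f mean)
```

If rightdir is a 0/1 dummy, as the means between 0 and 1 suggest, each mean is the proportion of respondents in that cell coded 1. A cell with only a few respondents can swing widely, so read those means with caution.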