POL 200 Lab 1: Introduction to Stata and Data Management

Introduction to Stata

Overview:

Windows: Results, Variables, Review, Variable Viewer, Command line, Viewer, Data Browser, Data Editor

Do files

Do files are text files that contain a list of Stata commands. When you run a do file, Stata executes each command in order. Using do files is a great way to keep track of your commands and is vital to doing reproducible social science.

  • To start a new do file, click on the do file icon (the one with the notepad and pencil)
  • If you have already created a do file, you can run one using: do filename.do
  • Do files also allow you to save only working commands, create and close log files, and open and save datasets automatically
  • Do files are also helpful because you are using Stata on a lab computer. Any customizations (installing extra packages and commands, changing default preferences for how Stata looks, etc.) will not be saved once you log off the computer. You may therefore want to create a do file that sets these up automatically every time you use Stata
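
As a rough illustration of the last two points, a setup do file might look like the sketch below. The file names lab1_setup.do and lab1_session.log, the s1mono graph scheme, and the estout package are examples chosen for illustration, not requirements of this lab.

```stata
* lab1_setup.do: hypothetical setup file, run at the start of each lab session
version 17                 // interpret the commands below as Stata 17 would
set more off               // do not pause when output fills the Results window
set scheme s1mono          // example preference: a plain black-and-white graph scheme
capture log close          // close any log left open by an earlier run

* Example of reinstalling a user-written package (needs an internet connection):
* ssc install estout, replace

log using "lab1_session.log", replace text   // keep a plain-text record of this session
display "Setup finished on " c(current_date)
log close                  // stop logging; the log file stays in the working directory
```

If the file is saved in the working directory, typing do lab1_setup.do in the command line runs every line in order. Note that // comments work inside do files but not in the command line.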

Accessing Datasets

Stata datasets always have the .dta file extension. If the filename ends in .dta, Stata should be able to read it. To open a Stata dataset, do one of the following:

  • File > Open, then browse to your data
  • Type use [filename.dta], clear in the command line. Replace [filename.dta] with the filepath of your dataset. You can also open datasets directly from the web by replacing the filename with the URL of the file
    • The clear option tells Stata to drop whatever data are currently in memory before opening the new dataset
    • Example: use http://www.sampledata/filename.dta
    • Try it: Download the file gss2012_small.dta from Canvas (Files > Datasets). Note where you save the file on your computer, and then open it with the use command
  • Be careful: Stata does not automatically save your data. If you make changes to your dataset, you need to save it by typing save filename.dta, replace
  • As a data management strategy, I suggest keeping a clean data file in its natural state. Using do files, you can make changes to your data (recoding variables, creating new variables, etc). At the end of the do file, you can save the file under a different name (say, filename_recoded.dta)
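
The open, change, and save-under-a-new-name strategy can be sketched as follows. To keep the example runnable on any computer, it uses auto, an example dataset that ships with Stata, as a stand-in for your own file; the variable price_thousands and the file name auto_recoded.dta are made up for illustration.

```stata
sysuse auto, clear                        // stand-in for: use "yourfile.dta", clear
generate price_thousands = price / 1000   // an example change to the data
save "auto_recoded.dta", replace          // new name, so the original file is untouched

use "auto_recoded.dta", clear             // reopen the saved copy to check that it worked
describe price_thousands                  // the new variable is part of the saved file
```

With your own data, the first line would instead be a use command pointing at your clean file, and the save command would be the only line that writes anything to disk.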

Getting Help

Stata has a nontrivial learning curve: you may need to hunt down how to perform certain functions. You should familiarize yourself with Stata’s help options.

  • Syntax: help [statacommand]
  • Example: type help codebook in the command line and hit enter
  • If you are repeatedly receiving error messages when you attempt to enter a command, look up the help file for that command
  • You may also want to try running the command through the drop-down menu rather than through the command line, to ensure your syntax is correct

Examining Your Data

  • codebook command: a good way to get a sense of a variable. The inspect and describe commands are also useful
  • tab1 command: one of the most useful commands in Stata; it shows basic frequencies for a variable. Use the nolabel option to see numeric values instead of value labels, and nolabel missing to see numeric values without labels along with the number of missing cases (the number of observations for which no data are available for that variable)
  • table command: substantially reworked in Stata 17, it allows easy-ish exporting of tables to Word, Excel, Markdown, LaTeX, and other document formats. We’ll mostly use table to create our tables, so that we can automate producing them for whatever publication we are writing
    • Syntax of a one-way frequency table: table [varname], statistic(freq) statistic(percent)
  • list command allows you to display your data in the results window.
    • Syntax: list [varnames]
    • This command is only useful when you have a small number of observations (no more than 100). An alternative to list is browse, which opens the Data Browser window so you can see your raw data. browse can also be combined with a list of variable names if you want to view just some of your data
  • sort command: allows you to change the order of observations; useful in combination with list (and other commands) or for inspecting the dataset in the Data Editor
    • Syntax: sort [varnames]. Can sort by multiple variables if desired
  • Running commands on subsets of observations: you can run Stata commands on subsets of your dataset by using the if qualifier
    • Syntax: tab1 [var1] if [var2] == value.
    • The above command says: Run the tab1 command on var1 for just those observations which also have value as the value for var2
    • == means “is equal to” (a single = is used to assign values, so use the double == when testing for equality)
    • Variations on ==: > (greater than), < (less than), != (not equal to)
    • if always goes before the comma and [options] for the command
    • Always specify the values, not labels, when using if
    • If you want to select values of a string variable (text variable), enclose the value in quotation marks. Syntax: if state == "NJ"
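
The commands in this section can be tried together on auto, an example dataset that ships with Stata. This is only a sketch: the variable names (make, price, rep78, foreign) belong to that dataset, not to the GSS file used in this lab, and the table line requires Stata 17 or later.

```stata
sysuse auto, clear                 // example dataset that ships with Stata

codebook rep78                     // summary of one variable, including missing values
describe                           // variable names, storage types, and labels

tab1 foreign rep78                 // one-way frequencies, one table per variable
tab1 foreign, nolabel              // show numeric values instead of value labels
tab1 rep78, missing                // include missing values as their own row

table foreign, statistic(frequency) statistic(percent)   // one-way table, Stata 17+

sort price                         // order observations from lowest to highest price
list make price if price > 12000   // list only the most expensive cars

tab1 rep78 if foreign == 1             // use the value 1, not the label "Foreign"
tab1 rep78 if foreign == 1, missing    // if goes before the comma and options
list make price if make == "Volvo 260" // string values go in quotation marks
```

Running tab1 foreign with and without nolabel is a quick way to find the numeric value behind each label before writing an if condition.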

Practice

Write a do file that opens the dataset gss2012_small.dta, runs the table command on the variable earthsun with options to include frequencies and percentages, sorts the data by age, runs the table command again on just those respondents older than 50, and saves the dataset to your H:/ drive. Be sure to save your do file so you will have access to it later.