POL 200 Lab 1: Introduction to Stata and Data Management
Introduction to Stata
Overview:
Windows: Results, Variables, Review, Variable Viewer, Command line, Viewer, Data Browser, Data Editor
Do files
Do files are text files that contain a list of Stata commands. When you run a do file, Stata executes each command in order. Using do files is a great way to keep track of your commands and is vital to doing reproducible social science.
- To open the Do-file Editor, click the do-file icon (the one with the notepad and pencil), or type doedit in the command line
- If you have already created a do file, you can run it by typing:
do filename.do
- Do files also let you keep only the commands that work, open and close log files, and open and save datasets automatically
- Do files are also helpful because you are using Stata on a lab computer. Any customizations (installing extra packages and commands, changing default preferences for how Stata looks, etc.) will not be saved once you log off the computer. You may therefore want to write a do file that redoes these things automatically every time you use Stata
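A minimal do-file skeleton might look like the sketch below. It assumes you are working in your current working directory; the file names lab1.log and auto_lab1.dta are hypothetical, and Stata's built-in example dataset auto.dta (loaded with sysuse) stands in for your own data.

```stata
* lab1.do: a minimal do-file skeleton (file names are hypothetical)
clear all                            // start from an empty memory
capture log close                    // close any open log; capture ignores the error if none is open
log using "lab1.log", replace text   // record all output in a plain-text log

sysuse auto, clear                   // built-in example data, standing in for your own dataset
describe                             // a first look at the variables

save "auto_lab1.dta", replace        // save a copy under a new name
log close                            // stop logging
```

Save this as lab1.do and run it with do lab1.do; rerunning it reproduces the same log and the same saved dataset.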
Accessing Datasets
Stata datasets always have the .dta file extension. If a filename ends in .dta, Stata should be able to read it. To open a Stata dataset, do one of the following:
- File > Open, then browse to your data
- Type
use [filename.dta], clear
in the command line. Replace [filename.dta] with the file path of your dataset. You can also open a dataset directly from the web by replacing the filename with the URL of the file.
- The clear option tells Stata to open the new dataset even if another one is already in memory, discarding the data currently loaded (including any unsaved changes)
- Example:
use http://www.sampledata/filename.dta, clear
- Try it: Download the file gss2012_small.dta from Canvas (Files > Datasets). Note where you save the file on your computer, and then open the file with the use command
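- Be careful: Stata does not automatically save your data. If you make changes to your dataset, you need to save them by typing
save filename.dta, replace
- The replace option lets Stata overwrite an existing file with the same name; without it, save stops with an error if the file already exists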
- As a data management strategy, I suggest keeping a clean data file in its original state. Use do files to make changes to your data (recoding variables, creating new variables, etc.), and at the end of the do file, save the file under a different name (say, filename_recoded.dta) so that the clean original is never overwritten
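The clean-file strategy can be sketched as follows. The built-in auto.dta stands in for gss2012_small.dta, the file names auto_clean.dta and auto_recoded.dta are hypothetical, and the new variable price_k is only an illustrative change.

```stata
* Sketch of the clean-file strategy (file and variable names are hypothetical)
sysuse auto, clear                 // stand-in for: use [filepath]/gss2012_small.dta, clear
save "auto_clean.dta", replace     // pretend this is your untouched original

use "auto_clean.dta", clear        // always start from the clean copy
generate price_k = price / 1000    // example change: price in thousands of dollars
label variable price_k "Price in thousands of dollars"
save "auto_recoded.dta", replace   // new name, so auto_clean.dta is never modified
```

Because every change lives in the do file, you can always rebuild the recoded file from the clean one.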
Getting Help
Stata has a nontrivial learning curve: you may need to hunt down how to perform certain tasks. You should familiarize yourself with Stata's help files.
- Syntax:
help [statacommand]
- Example: type
help codebook
in the command line and hit Enter
- If you repeatedly receive error messages when you attempt to enter a command, look up the help file for that command
- You may also want to try running the command through the drop-down menus rather than the command line, to make sure your syntax is correct. Stata prints the command the menu generates in the Results window, so you can copy it into your do file
Examining Your Data
The codebook command is a good way to get a sense of a variable. The inspect and describe commands are also useful.
The tab1 command is one of the most useful commands in Stata and shows basic frequencies of a variable. Use the nolabel option to see numeric values instead of value labels, and nolabel missing to see numeric values without labels together with the number of missing cases (the observations for which no data are available on that variable).
The table command was reconfigured in Stata 17 and allows easy-ish exporting of tables to Word, Excel, Markdown, LaTeX, or other document-creation programs. We'll mostly use table to automate the creation of tables for the various publications we might want to produce.
- Syntax of a one-way frequency table:
table [varname], statistic(frequency) statistic(percent)
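The commands above can be tried on Stata's built-in auto.dta (a stand-in for the lab dataset). The table call requires Stata 17 or later, and the export file name foreign_table.docx is hypothetical; collect export writes out the collection that table creates.

```stata
sysuse auto, clear                 // built-in example data

describe rep78 foreign             // storage type, variable label, value label
codebook rep78                     // range, unique values, number missing
inspect rep78                      // quick summary of the distribution

tab1 foreign                       // frequencies shown with value labels (Domestic/Foreign)
tab1 foreign, nolabel              // same table showing the numeric codes 0/1
tab1 rep78, nolabel missing        // also counts the observations with missing rep78

* Stata 17+: one-way table with counts and percentages, then export it to Word
table foreign, statistic(frequency) statistic(percent)
collect export "foreign_table.docx", replace
```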
The list command displays your data in the Results window.
- Syntax:
list [varnames]
- This command is only useful when you have a small number of observations (no more than 100). An alternative to list is browse, which opens the Data Browser window so you can see your raw data. browse can also be combined with a list of variable names if you want to view just some of your data.
- Syntax:
browse [varnames]
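A short sketch on the built-in auto.dta (a stand-in for your own data). The in qualifier limits the listing to the first five observations; browse is left as a comment because it opens an interactive window.

```stata
sysuse auto, clear
list make price mpg in 1/5         // three variables, first five observations only
* browse make price mpg            // interactive: opens the Data Browser with these variables
```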
The sort command changes the order of observations; it is useful in combination with list (and other commands) or for inspecting the dataset in the Data Editor.
- Syntax:
sort [varnames]
- You can sort by multiple variables if desired: Stata sorts by the first variable, then by the second variable within ties of the first, and so on
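A sketch of sort combined with list, again on the built-in auto.dta as a stand-in:

```stata
sysuse auto, clear
sort mpg                           // ascending by miles per gallon
list make mpg in 1/3               // the three lowest-mileage cars
sort foreign mpg                   // by origin first, then by mpg within each origin
list make foreign mpg in 1/3       // domestic cars (foreign == 0) come first
```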
Running commands on subsets of observations: you can run Stata commands on a subset of your dataset by using the if qualifier.
- Syntax:
tab1 [var1] if [var2] == value
- The above command says: "Run the tab1 command on var1 for just those observations that have value as their value of var2."
- == means "is equal to": it tests whether two values are equal. A single = assigns values (for example, with generate) and is not used for comparisons
- Variations on ==: > (greater than), < (less than), >= (greater than or equal to), <= (less than or equal to), != (not equal to)
- if always goes before the comma and any [options] for the command
- Always specify the values, not the labels, when using if
- If you want to select values of a string (text) variable, enclose the value in quotation marks. Syntax:
if state == "NJ"
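The if qualifier in action on the built-in auto.dta (a stand-in for the lab data). The last lines illustrate a Stata rule worth knowing before the practice exercise: a missing numeric value (.) counts as larger than any number, so a condition such as "> 3" also selects missing observations unless you exclude them.

```stata
sysuse auto, clear
tab1 rep78 if foreign == 1                 // repair record, foreign cars only
tab1 rep78 if price > 10000, missing       // if comes before the comma and the options
list make price if make == "Honda Civic"   // string values go in double quotes

* Missing values (.) are larger than any number, so this also picks up missing rep78:
count if rep78 > 3
* Exclude missing values explicitly:
count if rep78 > 3 & !missing(rep78)
```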
Practice
Write a do-file that opens the dataset gss2012_small.dta, runs the table command with options to include frequencies and percentages on the variable earthsun, sorts the data by age, runs the table command again on just those older than 50, and saves the dataset to your H: drive. Be sure to save your do-file so you will have access to it later.