Currently, haven can read and write logical, integer, numeric, character, and factor variables. See labelled() for how labelled variables in Stata are handled in R.
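The sketch below checks the supported types by writing a small data frame to a temporary .dta file and reading it back. The data frame and its column names are made up for illustration. We expect numeric columns to come back as doubles and the factor to come back as a labelled vector, which as_factor() converts back into a factor, but treat those types as assumptions rather than guarantees.

```r
library(haven)

# Hypothetical data frame with one column of each supported type
df <- data.frame(
  x_lgl = c(TRUE, FALSE, TRUE),
  x_int = c(1L, 2L, 3L),
  x_num = c(1.5, 2.5, 3.5),
  x_chr = c("a", "b", "c"),
  x_fct = factor(c("low", "high", "low"), levels = c("low", "high")),
  stringsAsFactors = FALSE
)

tmp <- tempfile(fileext = ".dta")
write_dta(df, tmp)
back <- read_dta(tmp)

# The factor is expected back as a labelled vector; as_factor() restores the levels
back$x_fct <- as_factor(back$x_fct)
str(back)
```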

read_dta(file, encoding = NULL)

read_stata(file, encoding = NULL)

write_dta(data, path, version = 14)

Arguments

file

Either a path to a file, a connection, or literal data (either a single string or a raw vector).

Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.

Literal data is most useful for examples and tests. It must contain at least one new line to be recognised as data (instead of a path).

encoding

The character encoding used for the file. This is generally only needed for Stata 13 files and earlier. See the Character encoding section for details.

data

Data frame to write.

path

Path to a file where the data will be written.

version

File version to use. Supports versions 8-15.
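For example, if you share a file with someone on an older Stata release, you might write a version 13 file instead of the default 14. The choice of 13 here is only illustrative:

```r
library(haven)

tmp <- tempfile(fileext = ".dta")
# Write an older file format instead of the default version 14
write_dta(mtcars, tmp, version = 13)
old <- read_dta(tmp)
dim(old)
```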

Value

A tibble, a data frame variant with nice defaults.

Variable labels are stored in the "label" attribute of each variable. It is not printed on the console, but the RStudio viewer will show it.
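Here is a minimal sketch of that round trip. It assumes that a "label" attribute set on a column before writing is stored as the Stata variable label and restored on reading. The label text is arbitrary:

```r
library(haven)

df <- data.frame(mpg = mtcars$mpg)
# Attach a variable label through the "label" attribute
attr(df$mpg, "label") <- "Miles per gallon"

tmp <- tempfile(fileext = ".dta")
write_dta(df, tmp)
back <- read_dta(tmp)

# The label is not printed with the tibble, but it is there as an attribute
attr(back$mpg, "label")
```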

write_dta() returns the input data invisibly.
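Because the return value is invisible, nothing prints at the console. Base R's withVisible() makes this observable. The sketch assumes the returned object is the unmodified input:

```r
library(haven)

tmp <- tempfile(fileext = ".dta")
# withVisible() records both the value and whether it would auto-print
res <- withVisible(write_dta(mtcars, tmp))
res$visible
identical(res$value, mtcars)
```

Returning the input this way means write_dta() can sit in the middle of a pipeline, saving a copy of the data while passing it on unchanged.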

Character encoding

Prior to Stata 14, files did not declare a text encoding, and the default encoding differed across platforms. If encoding = NULL, haven assumes the encoding is windows-1252, the text encoding used by Stata on Windows. Unfortunately, Stata on Mac and Linux uses a different default encoding, "latin1". If you encounter an error such as "Unable to convert string to the requested encoding", try encoding = "latin1".
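This advice can be wrapped in a small helper. read_dta_fallback() is a hypothetical name and is not part of haven. It retries with encoding = "latin1" whenever the default read fails, so it will also retry on errors that have nothing to do with encoding. The bundled iris.dta contains only ASCII text, so here the first attempt succeeds:

```r
library(haven)

path <- system.file("examples", "iris.dta", package = "haven")

# Hypothetical helper: try the default (windows-1252 for pre-14 files),
# then fall back to latin1 if that read raises any error
read_dta_fallback <- function(file) {
  tryCatch(
    read_dta(file),
    error = function(e) read_dta(file, encoding = "latin1")
  )
}

iris_dta <- read_dta_fallback(path)
iris_dta
```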

For Stata 14 and later, you should not need to specify the encoding manually unless it was recorded incorrectly in the source file.

Examples

path <- system.file("examples", "iris.dta", package = "haven")
read_dta(path)
#> # A tibble: 150 x 5
#>    sepallength sepalwidth petallength petalwidth species
#>          <dbl>      <dbl>       <dbl>      <dbl> <chr>
#>  1        5.10       3.5         1.40      0.200 setosa
#>  2        4.90       3           1.40      0.200 setosa
#>  3        4.70       3.20        1.30      0.200 setosa
#>  4        4.60       3.10        1.5       0.200 setosa
#>  5        5          3.60        1.40      0.200 setosa
#>  6        5.40       3.90        1.70      0.400 setosa
#>  7        4.60       3.40        1.40      0.300 setosa
#>  8        5          3.40        1.5       0.200 setosa
#>  9        4.40       2.90        1.40      0.200 setosa
#> 10        4.90       3.10        1.5       0.100 setosa
#> # ... with 140 more rows
tmp <- tempfile(fileext = ".dta")
write_dta(mtcars, tmp)
read_dta(tmp)
#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ... with 22 more rows
read_stata(tmp)
#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ... with 22 more rows