CRAN release: 2022-08-22
write_*() now accept functions as well as strings in the .name_repair argument, in line with the documentation. Previously they only supported string values (#684).
CRAN release: 2022-04-15
- @gorcha is now a haven author in recognition of his significant and sustained contributions.
write_ functions can now write custom variable widths by setting the
When writing files, the minimum width for character variables is now 1. This fixes issues with statistical software reading blank character variables with width 0 (#650).
write_dta() now uses strL when strings are too long to be stored in an str# variable (#437). strL is used when strings are longer than 2045 characters by default, which matches Stata’s behaviour, but this can be reduced with the
write_sav() now supports all 3 SPSS compression modes, specified as a character string: “byte”, “none” and “zsav” (#614).
TRUE and FALSE can be used for backwards compatibility, and correspond to the “zsav” and “none” options respectively.
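As a rough illustration of the compression modes listed above, the following sketch writes one small data frame three times and reads each file back. The data frame, the temporary paths and the choice of a ".zsav" extension for the zlib mode are assumptions made for the example.

```r
library(haven)

# Hypothetical example data; any data frame would do.
df <- data.frame(id = 1:100, score = rep(c(1.5, 2.5), 50))

# Write the same data once per compression mode.
modes <- c("byte", "none", "zsav")
paths <- vapply(modes, function(mode) {
  # ".zsav" is the usual extension for zlib-compressed SPSS files.
  ext <- if (mode == "zsav") ".zsav" else ".sav"
  path <- tempfile(fileext = ext)
  write_sav(df, path, compress = mode)
  path
}, character(1))

# Every file should read back with the same dimensions.
dims <- lapply(paths, function(p) dim(read_sav(p)))

# File sizes differ by mode; compare them to see the effect on your data.
sizes <- file.size(paths)
```

Which mode is smallest depends on the data, so it is worth checking sizes on a real file before settling on one.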
POSIXct and POSIXlt values with no time component (e.g. “2010-01-01”) were being converted to NA when attempting to convert the output timezone to UTC. These now output successfully (#634).
Fix bug in output timezone conversion that was causing variable labels and other variable attributes to disappear (#624).
Updated to ReadStat 1.1.8 RC.
vctrs casting and coercion generics now do less work when working with two identical labelled() vectors. This significantly improves performance when working with labelled() vectors in grouped data frames (#658).
Errors and warnings now use
R 3.4 is now the minimum supported version, in line with tidyverse policy.
cli >= 3.0.0 has been added to Imports to support new error messaging.
lifecycle has been added to Imports, and is now used to manage deprecations.
CRAN release: 2021-08-02
Updated to ReadStat 1.1.7 RC (#620).
CRAN release: 2021-04-23
- Fix buglet when combining labelled() with identical labels.
CRAN release: 2021-04-14
labelled() and labelled_spss() gain full vctrs support thanks to the hard work of @gorcha (#527, #534, #538, #557). This means that they should now work seamlessly in dplyr 1.0.0, tidyr 1.0.0 and other packages that use vctrs.
Date-times are no longer forced to UTC, but instead converted to the equivalent UTC (#555). This should ensure that you see the same date-time in R and in Stata/SPSS/SAS.
Updated to ReadStat 1.1.5. Most importantly this includes support for SAS binary compression.
as_factor(levels = "values") preserves values of unlabelled elements (#570).
write_*() now validate file and variable metadata with ReadStat. This should prevent many invalid files from being written (#408). Additionally, validation failures now provide more details about the source of the problem (e.g. the column name of the problem) (#463).
write_sav(compress = FALSE) now uses SPSS bytecode compression instead of the rarely-used uncompressed mode. compress = TRUE continues to use the newer (and not universally supported, but more compact) zlib format (@oliverbock, #544).
CRAN release: 2020-06-01
CRAN release: 2020-05-24
CRAN release: 2019-11-08
Thanks to the hard work of @mikmart, all read_*() functions gain three new arguments that allow you to read in only part of a large file:
col_select: selects columns to read with a tidyselect interface (#248).
skip: skips rows before reading data (#370).
n_max: limits the number of rows to read.
This also brings with it a deprecation:
The cols_only argument of read_sas() has been deprecated in favour of the new col_select argument.
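The three partial-reading arguments can be combined in one call. The sketch below is a minimal illustration: it writes a small, made-up data frame to a temporary .sav file and reads back only two columns and three rows. The column names and file path are hypothetical.

```r
library(haven)

# Hypothetical data written to a temporary SPSS file.
df <- data.frame(id = 1:10, x = (1:10) / 2, y = (1:10) * 10)
path <- tempfile(fileext = ".sav")
write_sav(df, path)

# Keep only `id` and `y`, skip the first two rows, then read at most three.
part <- read_sav(path, col_select = c(id, y), skip = 2, n_max = 3)

names(part)
nrow(part)
```

Because col_select uses the tidyselect interface, helpers such as starts_with() should also work there.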
write_ functions gain a .name_repair argument that controls what happens when the input dataset has repeated column names (#436).
write_ functions can now write labelled vectors with
CRAN release: 2019-02-19
labelled objects get pretty printing that shows the labels and NA values when inside of a tbl_df. Turn this behaviour off using options(haven.show_pillar_labels = FALSE) (#340, @gergness).
Updated to latest ReadStat from @evanmiller:
read_por() can now read files from SPSS 25 (#412)
read_por() now uses base-30 instead of base-10 for the exponent (#413)
read_sas() can read zero-column files (#420)
read_sav() reads long strings (#381)
read_sav() has a greater memory limit, allowing it to read more labels (#418)
read_spss() reads long variable labels (#422)
write_sav() no longer creates incorrect column names when >10k columns (#410)
write_sav() no longer crashes when writing long label names (#395)
CRAN release: 2018-11-21
labelled() and labelled_spss() now produce objects with class “haven_labelled” and “haven_labelled_spss”. Previously, the “labelled” class name clashed with the labelled class defined by Hmisc (#329).
Unfortunately I couldn’t come up with a way to fix this problem except to change the class name; it seems reasonable that haven should be the one to change names given that Hmisc has been around much longer. This will require some changes to packages that use haven, but shouldn’t affect user code.
labelled() and labelled_spss() now support adding the label attribute to the resulting object. The label is a short, human-readable description of the object. It is now also used when printing, and can be easily removed using the new zap_label() function (#362, @huftis).
Previously, the label attribute was supported both when reading and writing SPSS files, but it was not possible to actually create objects in R having the label attribute using the constructors labelled() or labelled_spss().
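A short sketch of the label attribute in practice, assuming a made-up survey item; the values and label text are illustrative only.

```r
library(haven)

# A labelled vector with value labels plus a variable-level label.
x <- labelled(
  c(1, 2, 1),
  labels = c(Yes = 1, No = 2),
  label = "Agrees with statement"
)

# The variable label is stored as the "label" attribute.
attr(x, "label")

# zap_label() drops the variable label but keeps the value labels.
y <- zap_label(x)
attr(y, "label")
attr(y, "labels")
```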
CRAN release: 2018-06-27
haven can read and write non-ASCII paths in R 3.5 (#371).
write_dta() allows non-ASCII variable labels for version 14 and above (#383). It also uses a less strict check for integers so that a labelled double containing only integer values can be written (#343).
Update to latest readstat.
CRAN release: 2018-01-18
Update to latest readstat. Includes:
write_*() correctly measures lengths of non-ASCII labels (#258): this fixes the cryptic error “A provided string value was longer than the available storage size of the specified column.”
CRAN release: 2017-07-09
Update to latest readstat. Includes:
All write methods now check that you’re trying to write a data frame (#287).
write_* functions turn ordered factors into labelled vectors (#285)
CRAN release: 2016-09-23
Update to latest ReadStat (#65). Includes:
Added support for reading and writing variable formats. Similarly to variable labels, formats are stored as an attribute on the vector. Use zap_formats() if you want to remove these attributes. (@gorcha, #119, #123).
Added support for reading file “label” and “notes”. These are not currently printed, but are stored in the attributes if you need to access them (#186).
Added support for “tagged” missing values (in Stata these are called “extended” and in SAS these are called “special”) which carry an extra byte of information: a character label from “a” to “z”. The downside of this change is that all integer columns are now converted to doubles, to support the encoding of the tag in the payload of a NaN.
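To make the tagged missing values above concrete, here is a minimal sketch using haven's tagged_na(), is_tagged_na() and na_tag() helpers. These function names come from the package's exported API, not from this changelog entry, and the values are made up.

```r
library(haven)

# A double vector mixing a real value, two tagged NAs and a regular NA.
x <- c(1, tagged_na("a"), tagged_na("z"), NA)

# Base R sees every kind of missing value as NA.
is.na(x)

# haven can tell tagged NAs apart from regular ones and recover the tag.
is_tagged_na(x)
na_tag(x)

# The tag lives in the payload of a double NaN, hence the double type.
typeof(x)
```

This is why integer columns are read as doubles: an integer NA has no spare bits to carry the tag.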
labelled_spss() is a subclass of labelled() that can model user missing values from SPSS. These can either be a set of distinct values, or for numeric vectors, a range.
zap_labels() strips labels, and replaces user-defined missing values with NA.
zap_missing() just replaces user-defined missing values with NA.
labelled_spss() is potentially dangerous to work with in R because base functions don’t know about labelled_spss() objects, so they will return the wrong result in the presence of user-defined missing values. For this reason, they are only created when user_na = TRUE (normally user-defined missings are converted to NA).
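The difference between the two zap functions can be sketched as follows. The vector, its "Refused" label and the missing code 99 are hypothetical, and the behaviour shown assumes a recent haven release.

```r
library(haven)

# 99 is declared as a user-defined missing value.
x <- labelled_spss(
  c(1, 2, 99, 3),
  labels = c(Refused = 99),
  na_values = 99
)

# haven's is.na() method treats the user-defined missing value as missing.
is.na(x)

# zap_missing() converts user-defined missings to NA and keeps the labels.
m <- zap_missing(x)

# zap_labels() also converts them to NA (by default) and drops the labels.
z <- zap_labels(x)
is.labelled(z)
```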
as_factor() with levels = "default" or levels = "both" preserves unused labels (implicit missing) when converting (#172, @itsdalmo). Labels (and the resulting factor levels) are always sorted by values.
as_factor() gains a new levels = "default" mechanism. This uses the labels where present, and otherwise uses the values. This is now the default, as it seems to map better to the semantics of labelled values in other statistical packages (#81). You can also use levels = "both" to combine the value and the label into a single string (#82). It also gains a method for data frames, so you can easily convert every labelled column to a factor in one function call.
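The following sketch shows how the levels options play out on a small, made-up vector in which one value has no label and one label is unused. The expected levels in the comment assume a recent haven release.

```r
library(haven)

# Value 2 has no label; the label "Unused" has no matching value.
x <- labelled(c(1, 2, 3), labels = c(Low = 1, High = 3, Unused = 9))

# Default: labels where present, values otherwise; unused labels are kept
# and levels are sorted by value: "Low", "2", "High", "Unused".
f_default <- as_factor(x, levels = "default")
levels(f_default)

# "both" combines the value and the label into a single string.
f_both <- as_factor(x, levels = "both")
levels(f_both)

# The data frame method converts every labelled column in one call.
df <- data.frame(id = 1:3)
df$rating <- x
df_factors <- as_factor(df)
class(df_factors$rating)
```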
vignette("semantics", package = "haven") discusses the semantics of missing values and labelling in SAS, SPSS, and Stata, and how they are translated into R.
labelled() is less strict with its checks: you can mix double and integer values and labels (#86, #110, @lionel-), and is.labelled() is now exported (#124). Putting a labelled vector in a data frame now generates the correct column name (#193).
read_dta() now recognises “%d” and custom date types (#80, #130). It also gains an encoding parameter which you can use to override the default encoding. This is particularly useful for Stata 13 and below which did not store the encoding used in the file (#163).
read_sav() now correctly recognises EDATE and JDATE formats as dates (#72). Variables with format DATE, ADATE, EDATE, JDATE or SDATE are imported as Date variables instead of POSIXct. You can now set user_na = TRUE to preserve user defined missing values: they will be given class labelled_spss.
Added a type_sum() method for labelled objects so they print nicely in tibbles.
write_dta() now verifies that variable names are valid Stata variables (#132), and throws an error if you attempt to save a labelled vector that is not an integer (#144). You can choose which version of Stata’s file format to output (#217).
write_sas() allows you to write data frames out to sas7bdat files. This is still somewhat experimental.
write_dta() and write_sav() support writing date and date/times (#25, #139, #145). Labelled values are always converted to UTF-8 before being written out (#87). Infinite values are now converted to missing values since SPSS and Stata don’t support them (#149). Both use a better test for missing values (#70).
zap_labels() has been completely overhauled. It now works (@markriseley, #69), and only drops label attributes; it no longer replaces labelled values with NAs. It also gains a data frame method that zaps the labels from every column.
print.labelled() and print.labelled_spss() now display the type.
CRAN release: 2015-04-09
Fixed a bug in as_factor.labelled, which generated <NA>s and wrong labels for integer labels.
zap_labels() now leaves unlabelled vectors unchanged, making it easier to apply to all columns.
Updates from ReadStat, including fixes for various parsing bugs, more encodings, and better support for large files.
hms objects deal better with missings when printing.
Fixed bug causing labels for numeric variables to be read in as integers and associated error:
Error: `x` and `labels` must be same type