| Title: | Streamlining Complex Survey Estimation and Reliability Assessment in R |
|---|---|
| Description: | Short and understandable commands that generate tabulated, formatted, and rounded survey estimates. Mostly a wrapper for the 'survey' package (Lumley (2004) <doi:10.18637/jss.v009.i08> <https://CRAN.R-project.org/package=survey>) that identifies low-precision estimates using the National Center for Health Statistics (NCHS) presentation standards (Parker et al. (2017) <https://www.cdc.gov/nchs/data/series/sr_02/sr02_175.pdf>, Parker et al. (2023) <doi:10.15620/cdc:124368>). |
| Authors: | Alex Strashny [aut, cre] (ORCID: <https://orcid.org/0000-0002-6408-7745>) |
| Maintainer: | Alex Strashny <[email protected]> |
| License: | Apache License (>= 2) |
| Version: | 0.9.10.9000 |
| Built: | 2026-05-16 09:03:17 UTC |
| Source: | https://github.com/cdcgov/surveytable |
Coerce a surveytable table to a data frame. To restructure tables to make them
easier to process programmatically, see restructure(). Also see set_opts(output = "raw").
If a tabulation function produces multiple tables, that group of tables is a list,
with each element of the list being an individual table. To convert one of these tables
to a data.frame, use [[. For example, in the following code, we generate
3 tables, and then convert the third table to a data.frame.
set_survey(namcs2019sv)
mytables = tab("MDDO", "SPECCAT", "MSA")
mydf = as.data.frame(mytables[[3]])
## S3 method for class 'surveytable_table'
as.data.frame(x, ...)
| Argument | Description |
|---|---|
| x | a table produced by a tabulation function |
| ... | ignored |
A data frame.
Other print:
print.surveytable_table(),
restructure(),
set_opts()
set_survey(namcs2019sv)
as.data.frame( tab("AGER") )
Create a codebook for the survey
codebook(all = FALSE)
| Argument | Description |
|---|---|
| all | tabulate all the variables? |
A list of tables.
set_survey(namcs2019sv)
codebook()
Selected variables from a data system of visits to office-based physicians. Note that the unit of observation is the visit, not the patient. This distinction is important because a single patient can make multiple visits.
namcs2019sv
namcs2019sv_df
An object of class survey.design2 (inherits from survey.design) with 8250 rows and 33 columns.
An object of class data.frame with 8250 rows and 33 columns.
namcs2019sv_df is a data frame.
namcs2019sv is a survey object created from namcs2019sv_df
using survey::svydesign().
SAS data: https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NAMCS/sas/namcs2019_sas.zip
Survey design variables: https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NAMCS/sas/readme2019-sas.txt
SAS formats: https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NAMCS/sas/nam19for.txt
Documentation: https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NAMCS/doc2019-508.pdf
National Summary Tables: https://www.cdc.gov/nchs/data/ahcd/namcs_summary/2019-namcs-web-tables-508.pdf
The National Health Interview Survey (NHIS) is a national survey that monitors the health of the U.S. population. This survey object contains selected variables and is limited to adults aged 18 and over.
nhis2024a
An object of class survey.design2 (inherits from survey.design) with 32577 rows and 12 columns.
Data ("Sample adult interview"): https://www.cdc.gov/nchs/nhis/documentation/2024-nhis.html
If a tabulation function is called from the top level, it prints its table(s)
automatically. If it is called from somewhere else, such as from within a loop
or another function, you need to call print() explicitly. For example:
set_survey(namcs2019sv)
for (vr in c("AGER", "SEX")) {
print( tab_subset(vr, "MAJOR", "Preventive care") )
}
## S3 method for class 'surveytable_table'
print(x, ...)
## S3 method for class 'surveytable_list'
print(x, ...)
| Argument | Description |
|---|---|
| x | an object of class |
| ... | passed to helper functions. |
The package used to produce the tables can be changed – see the output argument
of set_opts() for details. By default, the table-making package huxtable is used.
Returns x invisibly.
Other print:
as.data.frame.surveytable_table(),
restructure(),
set_opts()
set_survey(namcs2019sv)
table1 = tab("AGER")
print(table1)
table_many = tab("MDDO", "SPECCAT", "MSA")
print(table_many)
A data system of residential care community (RCC) residents.
rccsu2018
An object of class survey.design2 (inherits from survey.design) with 904 rows and 81 columns.
SAS data: https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NPALS/
Documentation: https://www.cdc.gov/nchs/npals/RCCresident-readme03152021vr.pdf
Codebook: https://www.cdc.gov/nchs/data/npals/final2018rcc_su_puf_codebook.pdf
Restructure the output of the tabulation functions to make it easier to process programmatically.
restructure(tab_output, lvls = c())
| Argument | Description |
|---|---|
| tab_output | output from a tabulation function. An object of class |
| lvls | (optional) only show these levels. |
Also see as.data.frame.surveytable_table() and set_opts(output = "raw").
data.frame
Other print:
as.data.frame.surveytable_table(),
print.surveytable_table(),
set_opts()
set_survey(namcs2019sv, mode = "nchs")
## total() |> restructure()
restructure( total() )
## tab_subset("MAJOR", "AGER") |> restructure(lvls = c("Pre-surgery", "Post-surgery"))
mytables = tab_subset("MAJOR", "AGER")
restructure(mytables, lvls = c("Pre-surgery", "Post-surgery"))
set_opts() sets certain package options. To view these options, use show_opts().
For more detailed customization, advanced users can also use options() and
show_options(); see surveytable-options for further information.
set_opts(
  reset = NULL,
  mode = NULL,
  adj = NULL,
  output = NULL,
  file = NULL,
  .file_temp = NULL,
  count = NULL,
  lpe = NULL,
  drop_na = NULL,
  max_levels = NULL
)
show_opts()
| Argument | Description |
|---|---|
| reset | reset all options to their default values? |
| mode | |
| adj | adjustment to the Korn and Graubard confidence intervals for proportions. See |
| output | specify how the output is printed: |
| file | file name (see |
| .file_temp | place |
| count | round counts to the nearest integer ( |
| lpe | identify low-precision estimates? |
| drop_na | drop missing values ( |
| max_levels | a categorical variable can have at most this many levels. Used to avoid printing huge tables. |
If you are not setting a particular option, leave it as NULL.
mode can be either "general" or "NCHS" and has the following meaning:
"general":
Round counts to the nearest integer – same as count = "int".
Do not look for low-precision estimates – same as lpe = FALSE.
Retain missing values – same as drop_na = FALSE.
Percentage CIs: use standard Korn-Graubard CIs – same as adj = "none".
"nchs":
Round counts to the nearest 1,000 – same as count = "1k".
Identify low-precision estimates – same as lpe = TRUE.
Drop missing values – same as drop_na = TRUE.
Percentage CIs: adjust Korn-Graubard CIs for the number of degrees of
freedom, matching the SUDAAN calculation – same as adj = "nchs". This
is appropriate for some, but not all, NCHS data systems. For some NCHS
data systems, such as NHIS, you might need to set adj to one of the other values.
adj specifies the adjustment to the Korn and Graubard confidence intervals for
proportions. See svyciprop_adjusted() for details.
output determines how the output is printed:
"auto" (default): automatically select the table-making package, depending on the
destination (such as screen, HTML, or PDF / LaTeX).
"huxtable", "gt", "kableExtra", "flextable": use this table-making package. Be sure
that this package is installed.
"raw": unformatted / raw output. This is useful for getting lots of significant digits.
Also see as.data.frame.surveytable_table() and restructure().
"Excel", "Excel_v1": print to an Excel workbook. Please specify the name of an Excel
file using the file argument. Before using Excel printing, please be sure to install these
packages: openxlsx2 and mschart.
"Word": print to a Word document. Please specify the name of a Word
file using the file argument. Before using Word printing, please be sure to install these
packages: flextable and officer.
"CSV": print to a comma-separated values (CSV) file. Please specify the name of a
CSV file using the file argument.
(Nothing.)
Other options:
set_survey(),
show_options(),
surveytable-options
Other print:
as.data.frame.surveytable_table(),
print.surveytable_table(),
restructure()
set_survey(namcs2019sv)
# Round counts to the nearest one thousand:
set_opts(count = "1k")
tab("AGER")
set_opts(count = "int")
show_opts()
You must specify a survey before other functions, such as tab(), will work.
To convert a data.frame or similar object to a survey object, see survey::svydesign()
or survey::svrepdesign().
set_survey(design, ...)
| Argument | Description |
|---|---|
| design | a survey object, created with |
| ... | arguments to |
Optionally, the survey can have an attribute called label, which is the
long name of the survey. Optionally, each variable in the survey can have an
attribute called label, which is the variable's long name.
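As a minimal base R sketch, the label attributes described above can be attached with attr() before the survey object is created. The helper add_labels(), the example data, and the label text are hypothetical; the svydesign() call is shown only as a comment and is not run here.

```r
# Hypothetical helper: attach a "label" attribute to selected columns
add_labels = function(df, labels) {
  for (nm in names(labels)) {
    # Only the attribute changes; the column values stay the same
    attr(df[[nm]], "label") = labels[[nm]]
  }
  df
}

# Hypothetical example data
mydata = data.frame(AGE = c(34, 71, 5),
                    SEX = factor(c("Female", "Male", "Female")))
mydata = add_labels(mydata, c(AGE = "Patient age (years)", SEX = "Patient sex"))
attr(mydata$AGE, "label")

# Not run: build the design, then give the survey a long name
# mydesign = survey::svydesign(ids = ~1, data = mydata)
# attr(mydesign, "label") = "My example survey"
```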
info about the survey
Other options:
set_opts(),
show_options(),
surveytable-options
set_survey(namcs2019sv)
set_survey(namcs2019sv, mode = "general")
See surveytable-options for a discussion of some of the options.
show_options(sw = "surveytable")
| Argument | Description |
|---|---|
| sw | starting characters |
List of options and their values.
Other options:
set_opts(),
set_survey(),
surveytable-options
show_options()
Subset a survey, while preserving variable labels
survey_subset(design, subset, label)
| Argument | Description |
|---|---|
| design | a survey object |
| subset | an expression specifying the sub-population |
| label | survey label of the newly created survey object |
a new survey object
children = survey_subset(namcs2019sv, AGE < 18, "Children < 18")
set_survey(children)
tab("AGER")
This article describes certain package options and is intended for more advanced
users. Typical users should see set_opts() and show_opts() to set and show certain options.
To view all available options, use show_options(). Below is a description
of some noteworthy options.
By default, all estimates are rounded in a certain way. The user can change how the rounding is performed.
The following options are the names of functions that control rounding:
surveytable.tx_count (for estimates of counts), surveytable.tx_prct (for estimates
of percentages), surveytable.tx_rate (for estimates of rates), and
surveytable.tx_numeric (for estimates of numeric variables). To turn off all
rounding, set each one of these options to ".tx_none".
Each function takes one argument, a data.frame with the following columns:
x (point estimates), s (standard errors), and ll and ul (lower and upper confidence limits).
Each function outputs a data.frame with the same column names. For examples of
how this works, see the internal functions surveytable:::.tx_count_int (counts,
rounded to the nearest integer), surveytable:::.tx_count_1k (counts, rounded
to the nearest one thousand), surveytable:::.tx_prct (percentages), surveytable:::.tx_rate
(rates), and surveytable:::.tx_numeric (numeric variables).
You can set the above options to your own custom functions. You might also want
to adjust the following options, which are the names of
columns in the printed tables: surveytable.names_count (by default, this
changes when rounding counts to the nearest one thousand) and surveytable.names_prct.
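A minimal sketch of a custom rounding function, assuming only the interface described above: it receives a data.frame with columns x, s, ll, and ul and returns one with the same columns. The name my_tx_count_100 and the choice to round to the nearest hundred are hypothetical. The commented options() call assumes that surveytable finds a user-defined function by its name, as the ".tx_none" example suggests for internal functions.

```r
# Hypothetical custom rounding for count estimates:
# round point estimates, SEs, and CI limits to the nearest hundred.
my_tx_count_100 = function(df) {
  for (cc in c("x", "s", "ll", "ul")) {
    df[[cc]] = round(df[[cc]], -2)  # negative digits round to tens, hundreds, ...
  }
  df
}

# Try it on a small made-up data frame
est = data.frame(x = 123456, s = 2345.6, ll = 118900.2, ul = 128049.9)
my_tx_count_100(est)

# Assumption: surveytable looks the function up by name
# options(surveytable.tx_count = "my_tx_count_100")
```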
The tabulation functions return objects of class surveytable_table (for a single
table) or surveytable_list (for multiple tables, which is just a list of surveytable_table
objects). A surveytable_table object is just a data.frame with the following
attributes: title, footer, and num, which gives the indices of the columns that
should be formatted as numbers.
Naturally, these objects can be printed using a variety of packages. surveytable
can use huxtable, gt, kableExtra, or flextable. See the output
argument of set_opts().
You can supply custom code to use another table-making package or to use one of these
table-making packages, but in a different way. The surveytable.print option
is the name of a function with the following arguments: x and ..., where x is
either a surveytable_table or a surveytable_list object. The function prints this
object. For an example of this, see the internal function surveytable:::.print_huxtable().
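A minimal base R sketch of a custom print function with the (x, ...) signature described above. The name my_print is hypothetical, and so is the assumption that a surveytable_table carries the classes surveytable_table and data.frame. It prints the title attribute, the table as a plain data.frame, and then the footer attribute.

```r
# Hypothetical custom printer for surveytable output
my_print = function(x, ...) {
  # Wrap a single table in a list so that both cases share one loop
  tables = if (inherits(x, "surveytable_table")) list(x) else x
  for (tbl in tables) {
    title = attr(tbl, "title")
    footer = attr(tbl, "footer")
    if (!is.null(title)) writeLines(as.character(title))
    df = tbl
    class(df) = "data.frame"  # drop the surveytable class to avoid recursion
    print(df, row.names = FALSE)
    if (!is.null(footer)) writeLines(as.character(footer))
  }
  invisible(x)
}

# A made-up object shaped like a surveytable_table (assumed class layout)
tbl = structure(data.frame(Level = c("A", "B"), n = c(10L, 20L)),
                title = "Example table", footer = "Source: made-up data",
                class = c("surveytable_table", "data.frame"))
my_print(tbl)

# Assumption: surveytable looks the printer up by name
# options(surveytable.print = "my_print")
```

A list of such tables, standing in for a surveytable_list, is printed one table after another by the same loop.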
Optionally, all of the tabulation functions can identify low-precision estimates.
Turn on this functionality using any of the following: set_opts(lpe = TRUE),
set_opts(mode = "nchs"), set_survey(*, mode = "nchs"), or options(surveytable.find_lpe = TRUE).
By default, low-precision estimates are identified using National Center for Health Statistics (NCHS) algorithms. However, this can be changed, as described below.
Here is a description of the options related to the identification of low-precision estimates.
surveytable.find_lpe: should the tabulation functions look for low-precision
estimates? You can change this directly with options() or with either set_opts()
or set_survey().
surveytable.lpe_n, surveytable.lpe_counts, surveytable.lpe_percents: names
of 3 functions.
The argument for surveytable.lpe_n is a vector of the number of observations
for each level of the variable.
The argument for surveytable.lpe_counts is a data frame with count-related estimates.
Specifically, the data frame has the following variables:
x: point estimates of counts
s: SE
ll, ul: CI
samp.size: effective sample size
counts: actual sample size
degf: degrees of freedom
The argument for surveytable.lpe_percents is a data frame with percent-related
estimates. Specifically, the data frame has the following variables:
Proportion: point estimates of proportions (between 0 and 1)
SE: SE
LL, UL: CI
n numerator: the number of observations for which the variable is TRUE
n denominator: the total number of observations
Each of these functions must return a list with the following elements:
id: the name of the algorithm used, such as "NCHS presentation standards"
flags: a vector. For each level of the variable, short codes indicating the presence of
low-precision estimates.
has.flag: a vector of short codes that are present in flags.
descriptions: a named vector. The names must be the short codes, the values are
the longer descriptions.
For example, if a variable has 3 levels, flags might be c("", "A1 A2", ""). This
indicates that for the first and third level, nothing was found, whereas for the second
level, two different things were found, indicated by short codes A1 and A2. In
this case, has.flag = c("A1", "A2"), descriptions = c(A1 = "A1: something", A2 = "A2: something else").
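A minimal sketch of a replacement for surveytable.lpe_n that returns a list with the structure described above. The function names codes_in() and my_lpe_n(), the short code "N30", and the rule itself (flag any level with fewer than 30 observations) are hypothetical; this is not the NCHS algorithm.

```r
# Collect the distinct short codes that appear in a vector of flags
codes_in = function(flags) {
  codes = unique(unlist(strsplit(flags, " ", fixed = TRUE)))
  codes[!is.na(codes) & codes != ""]
}

# Hypothetical rule: flag any level with fewer than 30 observations
my_lpe_n = function(n) {
  flags = ifelse(n < 30, "N30", "")
  list(
    id = "Hypothetical minimum-n rule",
    flags = flags,
    has.flag = codes_in(flags),
    descriptions = c(N30 = "N30: fewer than 30 observations")
  )
}

codes_in(c("", "A1 A2", ""))   # the example above: "A1" "A2"
res = my_lpe_n(c(120, 12, 45))
res$flags                      # "" "N30" ""
res$has.flag                   # "N30"

# Assumption: the option holds the function's name
# options(surveytable.lpe_n = "my_lpe_n")
```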
Other options:
set_opts(),
set_survey(),
show_options()
A version of survey::svyciprop( method = "beta" ) that adjusts for the degrees of freedom.
svyciprop_adjusted(formula, design, level = 0.95, adj = "none", ...)
| Argument | Description |
|---|---|
| formula | see |
| design | see |
| level | see |
| adj | adjustment to the Korn and Graubard confidence intervals: |
| ... | see |
adj specifies the adjustment to the Korn and Graubard confidence intervals.
"none": No adjustment is performed. Produces standard Korn and Graubard confidence intervals,
same as survey::svyciprop( method = "beta" ).
"NCHS": Adjustment that might be required by some (though not all) NCHS data systems. With
this adjustment, the degrees of freedom is set to degf(design). Consult the documentation
for the data system that you are analyzing to determine if this is the appropriate
adjustment.
"NHIS": Adjustment that might be required by NHIS. With this adjustment, the degrees
of freedom is set to nrow(design) - 1. Consult the documentation
for the data system that you are analyzing to determine if this is the appropriate
adjustment.
To use these adjustments in surveytable tabulations, call set_survey() or set_opts() with the
appropriate mode or adj argument.
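To make the role of the degrees of freedom concrete, here is a minimal base R sketch of a Korn and Graubard interval computed from a proportion p, its standard error se, a sample size n, and a chosen number of degrees of freedom df. The function kg_ci() is hypothetical and uses the common effective-sample-size formulation; it is not necessarily how svyciprop_adjusted() is implemented, and how n and df are chosen for each adj value follows the descriptions above rather than this sketch.

```r
# Hypothetical sketch: Korn-Graubard style CI for a proportion
#   p  : estimated proportion (strictly between 0 and 1 here)
#   se : its standard error
#   n  : number of observations
#   df : degrees of freedom used for the adjustment
kg_ci = function(p, se, n, df, level = 0.95) {
  alpha = 1 - level
  n_eff = p * (1 - p) / se^2                   # effective sample size
  # Degrees-of-freedom adjustment: squared ratio of t quantiles
  n_df = n_eff * (qt(1 - alpha / 2, n - 1) / qt(1 - alpha / 2, df))^2
  x = p * n_df
  # Clopper-Pearson (beta) limits with n_df in place of the sample size
  c(ll = qbeta(alpha / 2, x, n_df - x + 1),
    ul = qbeta(1 - alpha / 2, x + 1, n_df - x))
}

# With df = n - 1 the ratio is 1, so no adjustment is made;
# a smaller df shrinks n_df and widens the interval.
unadj = kg_ci(p = 0.2, se = 0.02, n = 1000, df = 999)
adj   = kg_ci(p = 0.2, se = 0.02, n = 1000, df = 30)
rbind(unadj, adj)
```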
Originally written by Makram Talih in 2019.
The point estimate of the proportion, with the confidence interval as an attribute.
set_survey(namcs2019sv)
set_opts(adj = "NCHS")
tab("AGER")
set_opts(adj = "none")
Tabulate categorical (factor or character), logical, or numeric variables.
tab(
  ...,
  test = FALSE,
  alpha = 0.05,
  p_adjust = FALSE,
  drop_na = getOption("surveytable.drop_na"),
  max_levels = getOption("surveytable.max_levels")
)
| Argument | Description |
|---|---|
| ... | names of variables (in quotes) |
| test | perform hypothesis tests? |
| alpha | significance level for tests |
| p_adjust | adjust p-values for multiple comparisons? |
| drop_na | drop missing values ( |
| max_levels | a categorical variable can have at most this many levels. Used to avoid printing huge tables. |
For categorical and logical variables, for each category, this function presents the following:
the number of observations (n);
the estimated count (Number), with its standard error (SE) and confidence
interval (LL and UL); and
the estimated percentage (Percent), with its standard error (SE) and confidence
interval (LL and UL).
Optionally, this function identifies low-precision estimates and flags
them if, according to the guidelines (such as the NCHS presentation standards), they should
be suppressed, footnoted, or reviewed by an analyst. To enable this functionality,
see set_opts() with arguments lpe = TRUE or mode = "NCHS".
For numeric variables, this function presents the following:
percentage of observations with known values (% known);
the mean of known values (Mean), with its standard error (SEM) and confidence
interval (LL and UL); and
the standard deviation (SD).
Confidence intervals (CIs) are calculated at the 95% confidence level. CIs for
count estimates are the log Student's t CIs, with adaptations
for complex surveys. CIs for percentage estimates are
the Korn and Graubard CIs, with optional adjustments. See set_opts() argument
adj. CIs for estimates of means are the Wald CIs.
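The log Student's t interval for counts can be sketched as follows, assuming the common delta-method form in which the standard error of log(x) is approximated by s / x and the t quantile uses a given number of degrees of freedom. log_t_ci() is a hypothetical helper; the complex-survey adaptations that tab() applies are not reproduced here.

```r
# Hypothetical sketch: log Student's t CI for an estimated count
#   x  : point estimate of the count (must be positive)
#   s  : its standard error
#   df : degrees of freedom (for example, the design degrees of freedom)
log_t_ci = function(x, s, df, level = 0.95) {
  tq = qt(1 - (1 - level) / 2, df)
  se_log = s / x                       # delta-method SE of log(x)
  c(ll = exp(log(x) - tq * se_log),
    ul = exp(log(x) + tq * se_log))
}

ci = log_t_ci(x = 5e6, s = 1e6, df = 20)
ci
```

Because the interval is symmetric on the log scale, it is asymmetric around x on the original scale and its lower limit is always positive.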
A list of tables or a single table.
Other tables:
tab_cross(),
tab_rate(),
tab_subset_rate(),
total(),
total_rate()
set_survey(namcs2019sv)
tab("AGER")
tab("MDDO", "SPECCAT", "MSA")
# Numeric variables
tab("NUMMED")
# Hypothesis testing with categorical variables
tab("AGER", test = TRUE)
Create subsets of the survey using one variable, and tabulate another variable within each of the subsets. Interact two variables and tabulate.
tab_cross(vr, vrby, max_levels = getOption("surveytable.max_levels"))
tab_subset(
  vr,
  vrby,
  lvls = c(),
  test = FALSE,
  alpha = 0.05,
  p_adjust = FALSE,
  drop_na = getOption("surveytable.drop_na"),
  max_levels = getOption("surveytable.max_levels")
)
| Argument | Description |
|---|---|
| vr | variable to tabulate |
| vrby | use this variable to subset the survey |
| max_levels | a categorical variable can have at most this many levels. Used to avoid printing huge tables. |
| lvls | (optional) only show these levels of |
| test | if |
| alpha | significance level for tests |
| p_adjust | adjust p-values for multiple comparisons? |
| drop_na | drop missing values ( |
tab_subset() creates subsets using the levels of vrby, and tabulates
vr in each subset. Optionally, only use the lvls levels of vrby.
vr can be categorical (factor or character), logical, or numeric.
tab_cross() crosses or interacts vr and vrby and tabulates the new
variable. Tables created using tab_subset() and tab_cross() have the same
counts but different percentages. With tab_subset(), percentages within each
subset add up to 100%. With tab_cross(), percentages across the entire
population add up to 100%. Also see var_cross().
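The difference between the two kinds of percentages can be illustrated with plain arithmetic on a made-up table of estimated counts. No survey weights, standard errors, or surveytable functions are involved; column-wise proportions mimic tab_subset(), and whole-table proportions mimic tab_cross().

```r
# Made-up estimated counts: rows are levels of vr, columns are levels of vrby
counts = cbind(Female = c(A = 30, B = 70),
               Male   = c(A = 60, B = 40))

# Like tab_subset(): percentages within each level of vrby (columns sum to 100)
within_subset = 100 * prop.table(counts, margin = 2)

# Like tab_cross(): percentages of the whole table (all cells sum to 100)
overall = 100 * prop.table(counts)

within_subset
overall
```

The counts are identical in both cases; only the denominator changes, so level A among males is 60% of males but 30% of the whole population.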
test = TRUE performs a test of association between the two variables. It also
performs t-tests for all pairs of levels of vr and vrby.
test = "{LEVEL}", where {LEVEL} is a level of vr, performs a
conditional independence test to compare the proportion of
vr = "{LEVEL}" for different values of vrby.
A list of tables or a single table.
Other tables:
tab(),
tab_rate(),
tab_subset_rate(),
total(),
total_rate()
set_survey(namcs2019sv)
# For each SEX, tabulate AGER
tab_subset("AGER", "SEX")
# Same counts as tab_subset(), but different percentages.
tab_cross("AGER", "SEX")
# Numeric variables
tab_subset("NUMMED", "AGER")
# Hypothesis testing
tab_subset("NUMMED", "AGER", test = TRUE)
Calculate the rates for categorical (factor) or logical variables.
tab_rate(
  vr,
  pop,
  per = getOption("surveytable.rate_per"),
  drop_na = getOption("surveytable.drop_na"),
  max_levels = getOption("surveytable.max_levels")
)
| Argument | Description |
|---|---|
| vr | variable to tabulate |
| pop | either a single number or a |
| per | calculate rate per this many items in the population |
| drop_na | drop missing values ( |
| max_levels | a categorical variable can have at most this many levels. Used to avoid printing huge tables. |
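Conceptually, a rate divides an estimated count by the corresponding population and scales the result by per. The arithmetic sketch below uses made-up numbers and hypothetical level names; it does not assume the default value of the surveytable.rate_per option. Scaling the standard error by the same factor assumes that the population is treated as a fixed, known quantity.

```r
# Made-up inputs for two levels of a variable
est_count  = c(Metro = 900e6, NonMetro = 120e6)  # estimated counts
se_count   = c(Metro = 50e6,  NonMetro = 15e6)   # their standard errors
population = c(Metro = 270e6, NonMetro = 46e6)   # population sizes
per = 100  # hypothetical choice: rate per 100 people

rate = est_count / population * per
# Assumption: population is fixed, so the SE scales by the same factor
se_rate = se_count / population * per
round(cbind(rate, se_rate), 1)
```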
A list of tables or a single table.
Other tables:
tab(),
tab_cross(),
tab_subset_rate(),
total(),
total_rate()
set_survey(namcs2019sv)
# pop is a data frame
tab_rate("MSA", uspop2019$MSA)
# pop is a single number
tab_rate("MDDO", uspop2019$total)
Create subsets of the survey using one variable, and tabulate the rates of another variable within each of the subsets.
tab_subset_rate(
  vr,
  vrby,
  pop,
  lvls = c(),
  per = getOption("surveytable.rate_per"),
  drop_na = getOption("surveytable.drop_na"),
  max_levels = getOption("surveytable.max_levels")
)
| Argument | Description |
|---|---|
| vr | variable to tabulate |
| vrby | use this variable to subset the survey |
| pop | a |
| lvls | (optional) only show these levels of |
| per | calculate rate per this many items in the population |
| drop_na | drop missing values ( |
| max_levels | a categorical variable can have at most this many levels. Used to avoid printing huge tables. |
A list of tables or a single table.
Other tables:
tab(),
tab_cross(),
tab_rate(),
total(),
total_rate()
set_survey(namcs2019sv)
tab_subset_rate("AGER", "SEX", uspop2019$`AGER x SEX`)
Total count
total()
A table
Other tables:
tab(),
tab_cross(),
tab_rate(),
tab_subset_rate(),
total_rate()
set_survey(namcs2019sv)
total()
Overall rate
total_rate(pop, per = getOption("surveytable.rate_per"))
| Argument | Description |
|---|---|
| pop | population |
| per | calculate rate per this many items in the population |
A table
Other tables:
tab(),
tab_cross(),
tab_rate(),
tab_subset_rate(),
total()
set_survey(namcs2019sv)
total_rate(uspop2019$total)
Population estimates of the civilian non-institutional population of the
United States as of July 1, 2019. Used for calculating rates. For usage
examples, see the *_rate functions.
uspop2019
An object of class list of length 7.
Create a new variable which is true if all of the variables in a list of variables are true.
var_all(newvr, vrs)
| Argument | Description |
|---|---|
| newvr | name of the new variable to be created |
| vrs | vector of logical variables |
Survey object
Other variables:
var_any(),
var_case(),
var_collapse(),
var_copy(),
var_cross(),
var_cut(),
var_not()
set_survey(namcs2019sv)
var_all("Medicare and Medicaid", c("PAYMCARE", "PAYMCAID"))
tab("Medicare and Medicaid")
Create a new variable which is true if any of the variables in a list of variables are true.
var_any(newvr, vrs)
| Argument | Description |
|---|---|
| newvr | name of the new variable to be created |
| vrs | vector of logical variables |
Survey object
Other variables:
var_all(),
var_case(),
var_collapse(),
var_copy(),
var_cross(),
var_cut(),
var_not()
set_survey(namcs2019sv)
var_any("Imaging services"
  , c("ANYIMAGE", "BONEDENS", "CATSCAN", "ECHOCARD", "OTHULTRA"
  , "MAMMO", "MRI", "XRAY", "OTHIMAGE"))
tab("Imaging services")
Convert factor to logical
var_case(newvr, vr, cases, retain_na = TRUE)
| Argument | Description |
|---|---|
| newvr | name of the new logical variable to be created |
| vr | factor variable |
| cases | one or more levels of |
| retain_na | for the observations where |
Survey object
Other variables:
var_all(),
var_any(),
var_collapse(),
var_copy(),
var_cross(),
var_cut(),
var_not()
set_survey(namcs2019sv)
var_case("Preventive care visits", "MAJOR", "Preventive care")
tab("Preventive care visits")
var_case("Surgery-related visits"
  , "MAJOR"
  , c("Pre-surgery", "Post-surgery"))
tab("Surgery-related visits")
var_case("Non-primary"
  , "SPECCAT.bad"
  , c("Surgical care specialty", "Medical care specialty"))
tab("Non-primary")
tab("Non-primary", drop_na = TRUE)
Collapse two or more levels of a factor variable into a single level.
var_collapse(vr, newlevel, oldlevels)
| Argument | Description |
|---|---|
| vr | factor variable |
| newlevel | name of the new level |
| oldlevels | vector of old levels |
Survey object
Other variables:
var_all(),
var_any(),
var_case(),
var_copy(),
var_cross(),
var_cut(),
var_not()
set_survey(namcs2019sv)
tab("PRIMCARE")
var_collapse("PRIMCARE", "Unknown if PCP", c("Blank", "Unknown"))
tab("PRIMCARE")
Create a new variable that is a copy of another variable. You can modify the copy, while the original remains unchanged. See examples.
var_copy(newvr, vr)
| Argument | Description |
|---|---|
| newvr | name of the new variable to be created |
| vr | variable |
Survey object
Other variables:
var_all(),
var_any(),
var_case(),
var_collapse(),
var_cross(),
var_cut(),
var_not()
set_survey(namcs2019sv)
var_copy("Age group", "AGER")
var_collapse("Age group", "65+", c("65-74 years", "75 years and over"))
var_collapse("Age group", "25-64", c("25-44 years", "45-64 years"))
tab("AGER", "Age group")
Create a new variable which is an interaction of two other variables. Also
see tab_cross().
var_cross(newvr, vr, vrby)
| Argument | Description |
|---|---|
| newvr | name of the new variable to be created |
| vr | first variable |
| vrby | second variable |
Survey object
Other variables:
var_all(),
var_any(),
var_case(),
var_collapse(),
var_copy(),
var_cut(),
var_not()
set_survey(namcs2019sv)
var_cross("Age x Sex", "AGER", "SEX")
tab("Age x Sex")
Create a new categorical variable based on a numeric variable.
var_cut(newvr, vr, breaks, labels)
| Argument | Description |
|---|---|
| newvr | name of the new factor variable to be created |
| vr | numeric variable |
| breaks | see |
| labels | see |
Survey object
Other variables:
var_all(),
var_any(),
var_case(),
var_collapse(),
var_copy(),
var_cross(),
var_not()
set_survey(namcs2019sv)
# In some data systems, variables might contain "special values". For example,
# negative values might indicate unknowns (which should be coded as `NA`).
# In this particular data set, however, there are no unknowns.
var_cut("Age group"
  , "AGE"
  , c(-Inf, -0.1, 0, 4, 14, 64, Inf)
  , c(NA, "Under 1", "1-4", "5-14", "15-64", "65 and over"))
tab("Age group")
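To see how the breaks in the var_cut() example map onto intervals, here is the same breaks vector applied with base R's cut() to a few made-up ages. This assumes that var_cut() follows cut()'s default of right-closed intervals. The "Unknown" label is a stand-in for the NA label in the example, which this sketch does not try to reproduce.

```r
ages = c(-1, 0, 3, 4, 5, 70)
age_group = cut(ages,
                breaks = c(-Inf, -0.1, 0, 4, 14, 64, Inf),
                labels = c("Unknown", "Under 1", "1-4", "5-14", "15-64", "65 and over"))
# Right-closed intervals: (-Inf,-0.1], (-0.1,0], (0,4], (4,14], (14,64], (64,Inf]
data.frame(ages, age_group)
```

With whole-year ages, the interval (0, 4] covers ages 1 through 4, which is why its label reads "1-4", and an age of exactly 0 falls in "Under 1".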
List variables in a survey.
var_list(sw = "", all = FALSE)
| Argument | Description |
|---|---|
| sw | starting characters in variable name (case insensitive) |
| all | print all variables? |
A table
set_survey(namcs2019sv)
var_list("age")
Logical NOT
var_not(newvr, vr)
| Argument | Description |
|---|---|
| newvr | name of the new variable to be created |
| vr | a logical variable |
Survey object
Other variables:
var_all(),
var_any(),
var_case(),
var_collapse(),
var_copy(),
var_cross(),
var_cut()
set_survey(namcs2019sv)
var_not("Private insurance not used", "PAYPRIV")