Package 'censobr' reference manual

Title:	Download Data from Brazil's Population Census
Description:	Easy access to data from Brazil's population censuses. The package provides a simple and efficient way to download and read the data sets and documentation of all the population censuses taken in and after 1960 in the country. The package is built on top of the 'Arrow' platform <https://arrow.apache.org/docs/r/>, which allows users to work with larger-than-memory census data using familiar 'dplyr' functions <https://arrow.apache.org/docs/r/articles/arrow.html#analyzing-arrow-data-with-dplyr>.
Authors:	Rafael H. M. Pereira [aut, cre], Rogério J. Barbosa [aut], Diego Rabatone Oliveira [ctb], Neal Richardson [ctb], Ipea - Institute for Applied Economic Research [cph, fnd]
Maintainer:	Rafael H. M. Pereira <[email protected]>
License:	MIT + file LICENSE
Version:	0.4.1
Built:	2025-02-28 12:40:22 UTC
Source:	https://github.com/ipeagit/censobr

Manage cached files from the censobr package

Description

Manage cached files from the censobr package

Usage

censobr_cache(list_files = TRUE, delete_file = NULL)

Arguments

`list_files`	Logical. Whether to print a message with the address of all censobr data sets cached locally. Defaults to `TRUE`.
`delete_file`	String. The file name (basename) of a censobr data set cached locally that should be deleted. Defaults to `NULL`, so that no file is deleted. If `delete_file = "all"`, then all cached censobr files are deleted.

Value

A message indicating which files exist and/or which ones have been deleted from the local cache directory.

Examples


# list all files cached
censobr_cache(list_files = TRUE)

# delete particular file
censobr_cache(delete_file = '2010_deaths')
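
To make the semantics of `list_files` and `delete_file` concrete, the following base-R sketch mimics them on a temporary directory. The helper `manage_cache_sketch()`, the `.parquet` file names, and the rule of matching names with the extension ignored are all assumptions made for illustration; this is not the code censobr runs internally.

```r
# Hypothetical helper that mimics the documented behaviour of
# `list_files` and `delete_file`; NOT the censobr implementation.
manage_cache_sketch <- function(cache_dir, list_files = TRUE, delete_file = NULL) {
  files <- list.files(cache_dir, full.names = TRUE)

  if (!is.null(delete_file)) {
    if (identical(delete_file, "all")) {
      to_delete <- files
    } else {
      # Match on the basename, with or without the file extension (assumption)
      stems <- tools::file_path_sans_ext(basename(files))
      to_delete <- files[stems == delete_file | basename(files) == delete_file]
    }
    unlink(to_delete)
    files <- setdiff(files, to_delete)
  }

  if (isTRUE(list_files)) {
    message("Files currently cached:\n", paste(files, collapse = "\n"))
  }
  invisible(files)
}

# Toy demonstration with two placeholder files in a temporary directory
demo_dir <- file.path(tempdir(), "cache_sketch")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("2010_deaths.parquet", "2010_population.parquet")))

remaining <- manage_cache_sketch(demo_dir, delete_file = "2010_deaths")
basename(remaining)
```

Passing `delete_file = "all"` to the same helper empties the directory, mirroring the documented option.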


Data dictionary of Brazil's census data

Description

Open the data dictionary of Brazil's census data in a browser.

Usage

data_dictionary(year = 2010, dataset = NULL, showProgress = TRUE, cache = TRUE)

Arguments

`year`	Numeric. Year of reference in the format `yyyy`. Defaults to `2010`.
`dataset`	Character. The dataset whose data dictionary should be opened. Options include `c("population", "households", "families", "mortality", "emigration", "tracts")`.
`showProgress`	Logical. Whether to display a download progress bar. Defaults to `TRUE`. The progress bar reflects only the download time, not the time to load the data into memory.
`cache`	Logical. Whether the function should read the data cached locally, which is much faster. Defaults to `TRUE`. The first time the user runs the function, `censobr` downloads the file and stores it locally, so the file only needs to be downloaded once. If `FALSE`, the function downloads the data again and overwrites the local file.

Value

Returns `NULL` and opens an .html or .pdf file in the browser.

Examples


# Open data dictionary on browser
data_dictionary(year = 2010,
                dataset = 'population',
                showProgress = FALSE)

data_dictionary(year = 1980,
                dataset = 'households',
                showProgress = FALSE)

data_dictionary(year = 2010,
                dataset = 'tracts',
                showProgress = FALSE)
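
The `cache` argument follows a download-once pattern: fetch the file if it is not stored locally, otherwise reuse the local copy, and re-download when `cache = FALSE`. The base-R sketch below illustrates that logic only. The names `read_with_cache()` and `fake_fetch()` are hypothetical, and the "download" just writes a placeholder file, so this should not be read as the package's actual implementation.

```r
# Hypothetical sketch of a download-once cache; `fetch` stands in for
# whatever actually downloads the file.
read_with_cache <- function(file_name, cache_dir, fetch, cache = TRUE) {
  local_file <- file.path(cache_dir, file_name)

  # Download when the file is missing or when the caller bypasses the cache
  if (!cache || !file.exists(local_file)) {
    fetch(local_file)
  }
  local_file
}

# Toy demonstration: count how many times the "download" runs
cache_dir <- file.path(tempdir(), "download_once_sketch")
dir.create(cache_dir, showWarnings = FALSE)
unlink(file.path(cache_dir, "dictionary.html"))  # start from an empty cache

n_downloads <- 0
fake_fetch <- function(dest) {
  n_downloads <<- n_downloads + 1
  writeLines("census data placeholder", dest)
}

read_with_cache("dictionary.html", cache_dir, fake_fetch)                 # downloads
read_with_cache("dictionary.html", cache_dir, fake_fetch)                 # reuses the local copy
read_with_cache("dictionary.html", cache_dir, fake_fetch, cache = FALSE)  # downloads again and overwrites
n_downloads
```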


Interview manual of the data collection of Brazil's censuses

Description

Open the interview manual of the data collection of Brazil's censuses in a browser.

Usage

interview_manual(year = NULL, showProgress = TRUE, cache = TRUE)

Arguments

`year`	Numeric. Year of reference in the format `yyyy`. Defaults to `2010`.
`showProgress`	Logical. Whether to display a download progress bar. Defaults to `TRUE`. The progress bar reflects only the download time, not the time to load the data into memory.
`cache`	Logical. Whether the function should read the data cached locally, which is much faster. Defaults to `TRUE`. The first time the user runs the function, `censobr` downloads the file and stores it locally, so the file only needs to be downloaded once. If `FALSE`, the function downloads the data again and overwrites the local file.

Value

Opens a .pdf file in the browser.

Examples


# Open interview manual on browser
interview_manual(year = 2010, showProgress = FALSE)


Questionnaires used in the data collection of Brazil's censuses

Description

Open the questionnaire used in the data collection of Brazil's censuses in a browser.

Usage

questionnaire(year = 2010, type = NULL, showProgress = TRUE, cache = TRUE)

Arguments

`year`	Numeric. Year of reference in the format `yyyy`. Defaults to `2010`.
`type`	Character. The type of questionnaire used in the survey, whether the `"long"` one used in the sample component of the census, or the `"short"` one, which is answered by more households. Options include `c("long", "short")`.
`showProgress`	Logical. Whether to display a download progress bar. Defaults to `TRUE`. The progress bar reflects only the download time, not the time to load the data into memory.
`cache`	Logical. Whether the function should read the data cached locally, which is much faster. Defaults to `TRUE`. The first time the user runs the function, `censobr` downloads the file and stores it locally, so the file only needs to be downloaded once. If `FALSE`, the function downloads the data again and overwrites the local file.

Value

Opens a .pdf file in the browser.

Examples


library(censobr)

# Open questionnaire on browser
questionnaire(year = 2010, type = 'long', showProgress = FALSE)


Download microdata of emigration records from Brazil's census

Description

Download microdata of emigration records from Brazil's census. The data were collected in the sample component of the questionnaire.

Usage

read_emigration(
  year = 2010,
  columns = NULL,
  add_labels = NULL,
  merge_households = FALSE,
  as_data_frame = FALSE,
  showProgress = TRUE,
  cache = TRUE
)

Arguments

`year`	Numeric. Year of reference in the format `yyyy`. Defaults to `2010`.
`columns`	String. A vector of column names to keep. The remaining columns are not read. Defaults to `NULL`, which reads all columns.
`add_labels`	Character. Whether the function should add labels to the responses of categorical variables. When `add_labels = "pt"`, the function adds labels in Portuguese. Defaults to `NULL`.
`merge_households`	Logical. Indicate whether the function should merge household variables to the output data. Defaults to `FALSE`.
`as_data_frame`	Logical. When `FALSE` (the default), the function returns an Arrow Dataset, which allows users to work with larger-than-memory data. If `TRUE`, the function returns a `data.frame`.
`showProgress`	Logical. Whether to display a download progress bar. Defaults to `TRUE`. The progress bar reflects only the download time, not the time to load the data into memory.
`cache`	Logical. Whether the function should read the data cached locally, which is much faster. Defaults to `TRUE`. The first time the user runs the function, `censobr` downloads the file and stores it locally, so the file only needs to be downloaded once. If `FALSE`, the function downloads the data again and overwrites the local file.

Value

An arrow Dataset or a "data.frame" object.

Examples


# return data as arrow Dataset
df <- read_emigration(year = 2010,
                      showProgress = FALSE)


# return data as data.frame
df <- read_emigration(year = 2010,
                      as_data_frame = TRUE,
                      showProgress = FALSE)



Download microdata of family records from Brazil's census

Description

Download microdata of family records from Brazil's census. The data were collected in the sample component of the questionnaire.

Usage

read_families(
  year = 2000,
  columns = NULL,
  add_labels = NULL,
  as_data_frame = FALSE,
  showProgress = TRUE,
  cache = TRUE
)

Arguments

`year`	Numeric. Year of reference in the format `yyyy`. Defaults to `2000`.
`columns`	String. A vector of column names to keep. The remaining columns are not read. Defaults to `NULL`, which reads all columns.
`add_labels`	Character. Whether the function should add labels to the responses of categorical variables. When `add_labels = "pt"`, the function adds labels in Portuguese. Defaults to `NULL`.
`as_data_frame`	Logical. When `FALSE` (the default), the function returns an Arrow Dataset, which allows users to work with larger-than-memory data. If `TRUE`, the function returns a `data.frame`.
`showProgress`	Logical. Whether to display a download progress bar. Defaults to `TRUE`. The progress bar reflects only the download time, not the time to load the data into memory.
`cache`	Logical. Whether the function should read the data cached locally, which is much faster. Defaults to `TRUE`. The first time the user runs the function, `censobr` downloads the file and stores it locally, so the file only needs to be downloaded once. If `FALSE`, the function downloads the data again and overwrites the local file.

Value

An arrow Dataset or a "data.frame" object.

Examples


# return data as arrow Dataset
df <- read_families(year = 2000,
                    showProgress = FALSE)



Download microdata of household records from Brazil's census

Description

Download microdata of household records from Brazil's census. The data were collected in the sample component of the questionnaire.

Usage

read_households(
  year = 2010,
  columns = NULL,
  add_labels = NULL,
  as_data_frame = FALSE,
  showProgress = TRUE,
  cache = TRUE
)

Arguments

`year`	Numeric. Year of reference in the format `yyyy`. Defaults to `2010`.
`columns`	String. A vector of column names to keep. The remaining columns are not read. Defaults to `NULL`, which reads all columns.
`add_labels`	Character. Whether the function should add labels to the responses of categorical variables. When `add_labels = "pt"`, the function adds labels in Portuguese. Defaults to `NULL`.
`as_data_frame`	Logical. When `FALSE` (the default), the function returns an Arrow Dataset, which allows users to work with larger-than-memory data. If `TRUE`, the function returns a `data.frame`.
`showProgress`	Logical. Whether to display a download progress bar. Defaults to `TRUE`. The progress bar reflects only the download time, not the time to load the data into memory.
`cache`	Logical. Whether the function should read the data cached locally, which is much faster. Defaults to `TRUE`. The first time the user runs the function, `censobr` downloads the file and stores it locally, so the file only needs to be downloaded once. If `FALSE`, the function downloads the data again and overwrites the local file.

Value

An arrow Dataset or a "data.frame" object.

1960 Census

The 1960 microdata version available in censobr is a combination of two versions of the Demographic Census sample. The 25% sample data from the 1960 Census was never fully processed by IBGE: several states did not have their questionnaires digitized. Currently, this dataset only has data from 16 states of the Federation (and from a contested border region between Minas Gerais and Espírito Santo called Serra dos Aimorés). Information is missing for the states of the former Northern Region, Maranhão, Piauí, Guanabara, Santa Catarina, and Espírito Santo. In 1965, IBGE decided to draw a probabilistic sub-sample of approximately 1.27% of the population, including all units of the federation. With this data, IBGE produced several official reports at the time. The data from censobr is the combination of these two datasets.

We pre-processed the 1.27% sample data to ensure data consistency, given that the original data was partially corrupted. We also created a sample weight variable to correct for unbalanced data and to expand the sample to the total population. For the data from the 25% sample, the weights expand to the municipal totals. Meanwhile, for the data from the 1.27% sample, the weights expand to the state totals. Additionally, we constructed a few variables that allow for the approximate incorporation of the complex sample design, enabling the proper calculation of standard errors and confidence intervals.

You can read more about the 1960 Census and find thorough documentation of how this dataset was processed at https://github.com/antrologos/ConsistenciaCenso1960Br.
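
As a rough illustration of what it means for sample weights to "expand" a sample to population totals, the base-R sketch below sums weights by state on a toy table. The column names `state`, `weight`, and `age`, and all of the values, are made up for this example and do not come from the 1960 microdata.

```r
# Toy sample; the column names and values are hypothetical
toy_sample <- data.frame(
  state  = c("SP", "SP", "RJ", "RJ", "RJ"),
  weight = c(80, 75, 78.5, 81, 79),
  age    = c(30, 45, 12, 60, 25)
)

# Each record stands for `weight` people, so the sum of the weights
# within a state estimates that state's population
pop_by_state <- tapply(toy_sample$weight, toy_sample$state, sum)
pop_by_state

# Weighted statistics use the same weights, e.g. a weighted mean age
mean_age <- weighted.mean(toy_sample$age, toy_sample$weight)
mean_age
```

Point estimates like these ignore the complex sample design; standard errors and confidence intervals need the design variables mentioned above.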

Examples


# return data as arrow Dataset
df <- read_households(year = 2010,
                      showProgress = FALSE)



Download microdata of death records from Brazil's census

Description

Download microdata of death records from Brazil's census. The data were collected in the sample component of the questionnaire.

Usage

read_mortality(
  year = 2010,
  columns = NULL,
  add_labels = NULL,
  merge_households = FALSE,
  as_data_frame = FALSE,
  showProgress = TRUE,
  cache = TRUE
)

Arguments

`year`	Numeric. Year of reference in the format `yyyy`. Defaults to `2010`.
`columns`	String. A vector of column names to keep. The remaining columns are not read. Defaults to `NULL`, which reads all columns.
`add_labels`	Character. Whether the function should add labels to the responses of categorical variables. When `add_labels = "pt"`, the function adds labels in Portuguese. Defaults to `NULL`.
`merge_households`	Logical. Indicate whether the function should merge household variables to the output data. Defaults to `FALSE`.
`as_data_frame`	Logical. When `FALSE` (the default), the function returns an Arrow Dataset, which allows users to work with larger-than-memory data. If `TRUE`, the function returns a `data.frame`.
`showProgress`	Logical. Whether to display a download progress bar. Defaults to `TRUE`. The progress bar reflects only the download time, not the time to load the data into memory.
`cache`	Logical. Whether the function should read the data cached locally, which is much faster. Defaults to `TRUE`. The first time the user runs the function, `censobr` downloads the file and stores it locally, so the file only needs to be downloaded once. If `FALSE`, the function downloads the data again and overwrites the local file.

Value

An arrow Dataset or a "data.frame" object.

Examples


library(censobr)

# return data as arrow Dataset
df <- read_mortality(year = 2010,
                     showProgress = FALSE)

# dplyr::glimpse(df)

# return data as data.frame
df <- read_mortality(year = 2010,
                     as_data_frame = TRUE,
                     showProgress = FALSE)

# dplyr::glimpse(df)
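
With `merge_households = TRUE`, household variables are merged into the output. Conceptually this resembles a left join of death records onto household records through a household identifier. The base-R sketch below shows that idea on toy tables; the key `id_household` and the other column names are hypothetical, and how censobr performs the merge internally is not shown here.

```r
# Toy tables; every column name here is hypothetical
deaths <- data.frame(
  id_household = c(1, 1, 3),
  age_at_death = c(70, 2, 55)
)
households <- data.frame(
  id_household = c(1, 2, 3),
  urban        = c(TRUE, FALSE, TRUE)
)

# Left join: keep every death record and attach its household's variables
deaths_hh <- merge(deaths, households, by = "id_household", all.x = TRUE)
deaths_hh
```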


Download microdata of population records from Brazil's census

Description

Download microdata of population records from Brazil's census. The data were collected in the sample component of the questionnaire.

Usage

read_population(
  year = 2010,
  columns = NULL,
  add_labels = NULL,
  as_data_frame = FALSE,
  showProgress = TRUE,
  cache = TRUE
)

Arguments

`year`	Numeric. Year of reference in the format `yyyy`. Defaults to `2010`.
`columns`	String. A vector of column names to keep. The remaining columns are not read. Defaults to `NULL`, which reads all columns.
`add_labels`	Character. Whether the function should add labels to the responses of categorical variables. When `add_labels = "pt"`, the function adds labels in Portuguese. Defaults to `NULL`.
`as_data_frame`	Logical. When `FALSE` (the default), the function returns an Arrow Dataset, which allows users to work with larger-than-memory data. If `TRUE`, the function returns a `data.frame`.
`showProgress`	Logical. Whether to display a download progress bar. Defaults to `TRUE`. The progress bar reflects only the download time, not the time to load the data into memory.
`cache`	Logical. Whether the function should read the data cached locally, which is much faster. Defaults to `TRUE`. The first time the user runs the function, `censobr` downloads the file and stores it locally, so the file only needs to be downloaded once. If `FALSE`, the function downloads the data again and overwrites the local file.

Value

An arrow Dataset or a "data.frame" object.

1960 Census

You can read more about the 1960 Census and find thorough documentation of how this dataset was processed at https://github.com/antrologos/ConsistenciaCenso1960Br.

Examples


# return data as arrow Dataset
df <- read_population(year = 2010,
                      showProgress = FALSE)
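
Because the default return value is an Arrow Dataset, a typical workflow builds a 'dplyr' query lazily and only pulls the result into memory with `collect()`. The sketch below assumes the 2010 population data has columns named `abbrev_state` and `V0010` (a sample weight); check the data dictionary before relying on those names. The helper `population_by_state()` is hypothetical, and the download step is wrapped in `interactive()` because it needs an internet connection on first use.

```r
library(dplyr)

# Works on an Arrow Dataset or a plain data.frame; the column names
# `abbrev_state` and `V0010` are assumptions to verify in the data dictionary
population_by_state <- function(pop) {
  pop |>
    group_by(abbrev_state) |>
    summarise(population = sum(V0010)) |>
    collect()  # results are only pulled into memory at this step
}

# Reading only the needed columns keeps the query light
if (interactive() && requireNamespace("censobr", quietly = TRUE)) {
  pop <- censobr::read_population(year = 2010,
                                  columns = c("abbrev_state", "V0010"),
                                  showProgress = FALSE)
  population_by_state(pop)
}
```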



Download census tract-level data from Brazil's censuses

Description

Download census tract-level aggregate data from Brazil's censuses.

Usage

read_tracts(
  year = 2010,
  dataset = NULL,
  as_data_frame = FALSE,
  showProgress = TRUE,
  cache = TRUE
)

Arguments

`year`	Numeric. Year of reference in the format `yyyy`. Defaults to `2010`.
`dataset`	Character. The dataset to be opened. Options currently include `c("Basico", "Domicilio", "DomicilioRenda", "Responsavel", "ResponsavelRenda", "Pessoa", "PessoaRenda", "Entorno")`. Preliminary results of the 2022 census are available with `"Preliminares"`.
`as_data_frame`	Logical. When `FALSE` (the default), the function returns an Arrow Dataset, which allows users to work with larger-than-memory data. If `TRUE`, the function returns a `data.frame`.
`showProgress`	Logical. Whether to display a download progress bar. Defaults to `TRUE`. The progress bar reflects only the download time, not the time to load the data into memory.
`cache`	Logical. Whether the function should read the data cached locally, which is much faster. Defaults to `TRUE`. The first time the user runs the function, `censobr` downloads the file and stores it locally, so the file only needs to be downloaded once. If `FALSE`, the function downloads the data again and overwrites the local file.

Value

An arrow Dataset or a "data.frame" object.

Examples


library(censobr)

# return data as arrow Dataset
df <- read_tracts(year = 2010,
                  dataset = 'PessoaRenda',
                  showProgress = FALSE)

# return data as data.frame
df <- read_tracts(year = 2010,
                  dataset = 'Basico',
                  as_data_frame = TRUE,
                  showProgress = FALSE)
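
Since `dataset` accepts only a fixed set of names, a script may want to validate the name before triggering a download. The sketch below uses base R's `match.arg()` with the options listed in the argument description. The helper `check_tract_dataset()` is hypothetical and not part of censobr, and it does not check which names are valid for which census year.

```r
# Dataset names copied from the `dataset` argument description
tract_datasets <- c("Basico", "Domicilio", "DomicilioRenda", "Responsavel",
                    "ResponsavelRenda", "Pessoa", "PessoaRenda", "Entorno",
                    "Preliminares")

# Hypothetical helper; match.arg() stops with an error listing the valid
# choices when the name does not match (unique prefixes are also accepted)
check_tract_dataset <- function(dataset) {
  match.arg(dataset, choices = tract_datasets)
}

check_tract_dataset("PessoaRenda")
```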



Set custom cache directory for censobr files

Description

Set custom directory for caching files from the censobr package. If users want to set a custom cache directory, the function needs to be run again in each new R session.

Usage

set_censobr_cache_dir(path = NULL)

Arguments

`path`	String. The path to an existing directory. Defaults to `NULL`, which uses the default directory.

Value

A message indicating the directory where censobr files are cached.

Examples


# Set custom cache directory
tempd <- tempdir()
set_censobr_cache_dir(path = tempd)

# back to default path
set_censobr_cache_dir(path = NULL)
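
Because `path` must point to an existing directory and the setting does not persist across sessions, a script can create the directory if needed and then set it near the top of the file. The sketch below is one way to do that. The helper `ensure_dir()` and the directory name are hypothetical, and the censobr call is skipped when the package is not installed.

```r
# Hypothetical helper: create the directory if it does not exist yet
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  normalizePath(path)
}

# A temporary directory is used here; a real script would use a stable location
my_cache <- ensure_dir(file.path(tempdir(), "censobr_cache_example"))

# Must be repeated in each new R session
if (requireNamespace("censobr", quietly = TRUE)) {
  censobr::set_censobr_cache_dir(path = my_cache)
}
```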

