Title: | Download Data from Brazil's Population Census |
---|---|
Description: | Easy access to data from Brazil's population censuses. The package provides a simple and efficient way to download and read the data sets and the documentation of all the population censuses taken in and after 1960 in the country. The package is built on top of the 'Arrow' platform <https://arrow.apache.org/docs/r/>, which allows users to work with larger-than-memory census data using familiar 'dplyr' functions <https://arrow.apache.org/docs/r/articles/arrow.html#analyzing-arrow-data-with-dplyr>. |
Authors: | Rafael H. M. Pereira [aut, cre], Rogério J. Barbosa [aut], Diego Rabatone Oliveira [ctb], Neal Richardson [ctb], Ipea - Institute for Applied Economic Research [cph, fnd] |
Maintainer: | Rafael H. M. Pereira <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.4.0999 |
Built: | 2024-10-24 02:19:22 UTC |
Source: | https://github.com/ipeagit/censobr |
Manage cached files from the censobr package
censobr_cache(list_files = TRUE, delete_file = NULL)
list_files |
Logical. Whether to print a message with the address of all
censobr data sets cached locally. Defaults to |
delete_file |
String. The file name (basename) of a censobr data set
cached locally that should be deleted. Defaults to |
A message indicating which files exist and/or which ones have been deleted from the local cache directory.
Other Cache data:
set_censobr_cache_dir()
# list all files cached
censobr_cache(list_files = TRUE)

# delete particular file
censobr_cache(delete_file = '2010_deaths')
Open the data dictionary of Brazil's census data in a browser.
data_dictionary(year = 2010, dataset = NULL, showProgress = TRUE, cache = TRUE)
year |
Numeric. Year of reference in the format |
dataset |
Character. The dataset of data dictionary to be opened. Options
include |
showProgress |
Logical. Defaults to |
cache |
Logical. Whether the function should read the data cached
locally, which is much faster. Defaults to |
Returns NULL and opens an .html or .pdf file in the browser.
Other Census documentation:
interview_manual()
# Open data dictionary on browser
data_dictionary(year = 2010, dataset = 'population', showProgress = FALSE)
data_dictionary(year = 1980, dataset = 'households', showProgress = FALSE)
data_dictionary(year = 2010, dataset = 'tracts', showProgress = FALSE)
Open the interview manual used in the data collection of Brazil's censuses in a browser.
interview_manual(year = NULL, showProgress = TRUE, cache = TRUE)
year |
Numeric. Year of reference in the format |
showProgress |
Logical. Defaults to |
cache |
Logical. Whether the function should read the data cached
locally, which is much faster. Defaults to |
Opens a .pdf file in the browser.
Other Census documentation:
data_dictionary()
# Open interview manual on browser
interview_manual(year = 2010, showProgress = FALSE)
Open the questionnaire used in the data collection of Brazil's censuses in a browser.
questionnaire(year = 2010, type = NULL, showProgress = TRUE, cache = TRUE)
year |
Numeric. Year of reference in the format |
type |
Character. The type of questionnaire used in the survey, whether
the |
showProgress |
Logical. Defaults to |
cache |
Logical. Whether the function should read the data cached
locally, which is much faster. Defaults to |
Opens a .pdf file in the browser.
library(censobr)

# Open questionnaire on browser
questionnaire(year = 2010, type = 'long', showProgress = FALSE)
Download microdata of emigration records from Brazil's census. Data collected in the sample component of the questionnaire.
read_emigration(
  year = 2010,
  columns = NULL,
  add_labels = NULL,
  merge_households = FALSE,
  as_data_frame = FALSE,
  showProgress = TRUE,
  cache = TRUE
)
year |
Numeric. Year of reference in the format |
columns |
String. A vector of column names to keep. The rest of the
columns are not read. Defaults to |
add_labels |
Character. Whether the function should add labels to the
responses of categorical variables. When |
merge_households |
Logical. Indicate whether the function should merge
household variables to the output data. Defaults to |
as_data_frame |
Logical. When |
showProgress |
Logical. Defaults to |
cache |
Logical. Whether the function should read the data cached
locally, which is much faster. Defaults to |
An arrow Dataset or a "data.frame" object.
Other Microdata: read_families(), read_households(), read_mortality(), read_population()
# return data as arrow Dataset
df <- read_emigration(year = 2010, showProgress = FALSE)

# return data as data.frame
df <- read_emigration(year = 2010, as_data_frame = TRUE, showProgress = FALSE)
Download microdata of family records from Brazil's census. Data collected in the sample component of the questionnaire.
read_families(
  year = 2000,
  columns = NULL,
  add_labels = NULL,
  as_data_frame = FALSE,
  showProgress = TRUE,
  cache = TRUE
)
year |
Numeric. Year of reference in the format |
columns |
String. A vector of column names to keep. The rest of the
columns are not read. Defaults to |
add_labels |
Character. Whether the function should add labels to the
responses of categorical variables. When |
as_data_frame |
Logical. When |
showProgress |
Logical. Defaults to |
cache |
Logical. Whether the function should read the data cached
locally, which is much faster. Defaults to |
An arrow Dataset or a "data.frame" object.
Other Microdata: read_emigration(), read_households(), read_mortality(), read_population()
# return data as arrow Dataset
df <- read_families(year = 2000, showProgress = FALSE)
Download microdata of household records from Brazil's census. Data collected in the sample component of the questionnaire.
read_households(
  year = 2010,
  columns = NULL,
  add_labels = NULL,
  as_data_frame = FALSE,
  showProgress = TRUE,
  cache = TRUE
)
year |
Numeric. Year of reference in the format |
columns |
String. A vector of column names to keep. The rest of the
columns are not read. Defaults to |
add_labels |
Character. Whether the function should add labels to the
responses of categorical variables. When |
as_data_frame |
Logical. When |
showProgress |
Logical. Defaults to |
cache |
Logical. Whether the function should read the data cached
locally, which is much faster. Defaults to |
An arrow Dataset or a "data.frame" object.
The 1960 microdata available in censobr combines two versions of the Demographic Census sample. The 25% sample from the 1960 Census was never fully processed by IBGE: several states did not have their questionnaires digitized. Currently, this dataset only has data from 16 states of the Federation (and from a contested border region between Minas Gerais and Espírito Santo called Serra dos Aimorés). Information is missing for the states of the former Northern Region, Maranhão, Piauí, Guanabara, Santa Catarina, and Espírito Santo. In 1965, IBGE decided to draw a probabilistic sub-sample of approximately 1.27% of the population, including all units of the federation. With this data, IBGE produced several official reports at the time. The data in censobr is the combination of these two datasets.
We pre-processed the 1.27% sample data to ensure data consistency, given that the original data was partially corrupted. We also created a sample weight variable to correct for unbalanced data and to expand the sample to the total population. For the data from the 25% sample, the weights expand to the municipal totals. Meanwhile, for the data from the 1.27% sample, the weights expand to the state totals. Additionally, we constructed a few variables that allow for the approximate incorporation of the complex sample design, enabling the proper calculation of standard errors and confidence intervals.
You can read more about the 1960 Census and find thorough documentation of how this dataset was processed at https://github.com/antrologos/ConsistenciaCenso1960Br.
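The role of the sample weights described above can be illustrated with a small, self-contained sketch. It does not use censobr data: the data frame, its column names (`state`, `weight`) and its values are all hypothetical, chosen only because a 1.27% sample implies weights of roughly 1 / 0.0127, or about 79 people per record. Under that assumption, summing the weights within a state gives the estimated population of that state.

```r
# Toy records standing in for 1.27% sample rows.
# Column names and values are hypothetical, not taken from censobr.
toy <- data.frame(
  state  = c("BA", "BA", "RS", "RS", "RS"),
  weight = c(80, 78.5, 79, 81, 77.5)
)

# Each record "represents" `weight` people, so the estimated
# population of a state is the sum of the weights of its records.
pop_est <- tapply(toy$weight, toy$state, sum)

# Share of the estimated total population living in each state.
pop_share <- pop_est / sum(pop_est)

print(pop_est)
print(round(pop_share, 3))
```

With the real data, the name of the weight variable should be taken from the data dictionary rather than assumed, and standard errors would additionally require the sample design variables mentioned above.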
Other Microdata: read_emigration(), read_families(), read_mortality(), read_population()
# return data as arrow Dataset
df <- read_households(year = 2010, showProgress = FALSE)
Download microdata of death records from Brazil's census. Data collected in the sample component of the questionnaire.
read_mortality(
  year = 2010,
  columns = NULL,
  add_labels = NULL,
  merge_households = FALSE,
  as_data_frame = FALSE,
  showProgress = TRUE,
  cache = TRUE
)
year |
Numeric. Year of reference in the format |
columns |
String. A vector of column names to keep. The rest of the
columns are not read. Defaults to |
add_labels |
Character. Whether the function should add labels to the
responses of categorical variables. When |
merge_households |
Logical. Indicate whether the function should merge
household variables to the output data. Defaults to |
as_data_frame |
Logical. When |
showProgress |
Logical. Defaults to |
cache |
Logical. Whether the function should read the data cached
locally, which is much faster. Defaults to |
An arrow Dataset or a "data.frame" object.
Other Microdata: read_emigration(), read_families(), read_households(), read_population()
library(censobr)

# return data as arrow Dataset
df <- read_mortality(year = 2010, showProgress = FALSE)
# dplyr::glimpse(df)

# return data as data.frame
df <- read_mortality(year = 2010, as_data_frame = TRUE, showProgress = FALSE)
# dplyr::glimpse(df)
Download microdata of population records from Brazil's census. Data collected in the sample component of the questionnaire.
read_population(
  year = 2010,
  columns = NULL,
  add_labels = NULL,
  as_data_frame = FALSE,
  showProgress = TRUE,
  cache = TRUE
)
year |
Numeric. Year of reference in the format |
columns |
String. A vector of column names to keep. The rest of the
columns are not read. Defaults to |
add_labels |
Character. Whether the function should add labels to the
responses of categorical variables. When |
as_data_frame |
Logical. When |
showProgress |
Logical. Defaults to |
cache |
Logical. Whether the function should read the data cached
locally, which is much faster. Defaults to |
An arrow Dataset or a "data.frame" object.
The 1960 microdata available in censobr combines two versions of the Demographic Census sample. The 25% sample from the 1960 Census was never fully processed by IBGE: several states did not have their questionnaires digitized. Currently, this dataset only has data from 16 states of the Federation (and from a contested border region between Minas Gerais and Espírito Santo called Serra dos Aimorés). Information is missing for the states of the former Northern Region, Maranhão, Piauí, Guanabara, Santa Catarina, and Espírito Santo. In 1965, IBGE decided to draw a probabilistic sub-sample of approximately 1.27% of the population, including all units of the federation. With this data, IBGE produced several official reports at the time. The data in censobr is the combination of these two datasets.
We pre-processed the 1.27% sample data to ensure data consistency, given that the original data was partially corrupted. We also created a sample weight variable to correct for unbalanced data and to expand the sample to the total population. For the data from the 25% sample, the weights expand to the municipal totals. Meanwhile, for the data from the 1.27% sample, the weights expand to the state totals. Additionally, we constructed a few variables that allow for the approximate incorporation of the complex sample design, enabling the proper calculation of standard errors and confidence intervals.
You can read more about the 1960 Census and find thorough documentation of how this dataset was processed at https://github.com/antrologos/ConsistenciaCenso1960Br.
Other Microdata: read_emigration(), read_families(), read_households(), read_mortality()
# return data as arrow Dataset
df <- read_population(year = 2010, showProgress = FALSE)
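Because the read functions return an arrow object by default, a typical workflow is to describe the query with 'dplyr' verbs and only bring the (small) result into memory with `collect()`. The sketch below illustrates that pattern on a tiny in-memory Arrow table so that it runs without downloading anything. The table and its column names (`state`, `age`, `weight`) are hypothetical stand-ins, not actual census variables; with real data, `toy` would be replaced by the object returned from a call such as `read_population()`.

```r
library(arrow)
library(dplyr)

# Tiny in-memory Arrow table used as a stand-in for census microdata.
# Column names and values are hypothetical.
toy <- arrow::arrow_table(
  state  = c("RJ", "RJ", "SP", "SP", "SP"),
  age    = c(34L, 7L, 51L, 19L, 62L),
  weight = c(10.5, 9.5, 12, 8, 10)
)

# The dplyr verbs build a query on the Arrow object;
# nothing is materialised as a data.frame until collect() is called.
adults_by_state <- toy |>
  filter(age >= 18) |>
  group_by(state) |>
  summarise(adults = sum(weight)) |>
  arrange(state) |>
  collect()

print(adults_by_state)
```

Keeping `collect()` as the last step means filtering and aggregation happen in Arrow, which is what makes larger-than-memory data manageable.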
Download census tract-level aggregate data from Brazil's censuses.
read_tracts(
  year = 2010,
  dataset = NULL,
  as_data_frame = FALSE,
  showProgress = TRUE,
  cache = TRUE
)
year |
Numeric. Year of reference in the format |
dataset |
Character. The dataset to be opened. Options currently include
|
as_data_frame |
Logical. When |
showProgress |
Logical. Defaults to |
cache |
Logical. Whether the function should read the data cached
locally, which is much faster. Defaults to |
An arrow Dataset or a "data.frame" object.
library(censobr)

# return data as arrow Dataset
df <- read_tracts(year = 2010, dataset = 'PessoaRenda', showProgress = FALSE)

# return data as data.frame
df <- read_tracts(year = 2010, dataset = 'Basico', as_data_frame = TRUE, showProgress = FALSE)
Set custom directory for caching files from the censobr package. If users want to set a custom cache directory, the function needs to be run again in each new R session.
set_censobr_cache_dir(path = NULL)
path |
String. The path to an existing directory. It defaults to |
A message indicating the directory where censobr files are cached.
Other Cache data:
censobr_cache()
# Set custom cache directory
tempd <- tempdir()
set_censobr_cache_dir(path = tempd)

# back to default path
set_censobr_cache_dir(path = NULL)
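Since `path` must point to an existing directory and the setting does not persist across R sessions, one possible pattern is to create the folder if needed and then set it at the top of each script. This is only a sketch: the folder name `censobr_cache` and its location under `tempdir()` are hypothetical choices for illustration, and the call to censobr is skipped when the package is not installed.

```r
# Hypothetical location for the cache; any writable folder would do.
cache_dir <- file.path(tempdir(), "censobr_cache")

# `path` is documented as the path to an existing directory,
# so create the directory first if it is not there yet.
if (!dir.exists(cache_dir)) {
  dir.create(cache_dir, recursive = TRUE)
}

# Only call censobr when it is installed, so the sketch runs anywhere.
if (requireNamespace("censobr", quietly = TRUE)) {
  censobr::set_censobr_cache_dir(path = cache_dir)
}
```

For a cache that should survive between sessions, a permanent folder would be used instead of one under `tempdir()`, which R removes when the session ends.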