Functions

Multi-threading support

The following functions use multiple threads, when available, on tables with at least 2 columns and 1 million rows:

  • CleanTable constructor when copycols=true
  • All compact functions
  • delete_const_columns, delete_const_columns! and delete_const_columns_ROT
  • reinfer_schema, reinfer_schema! and reinfer_schema_ROT
  • get_all_repeated
  • level_distribution
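Whether several threads are available depends on how Julia was started, for example with `julia --threads auto` or the `JULIA_NUM_THREADS` environment variable. The following is a minimal sketch of the kind of size check described above, combined with a per-column threaded loop. `should_thread` and `count_missing_per_column` are hypothetical helpers written for illustration; they are not part of Cleaner and do not claim to reproduce its internals.

```julia
# Hypothetical helper mirroring the documented threshold (not Cleaner's code).
should_thread(ncols, nrows) =
    Threads.nthreads() > 1 && ncols >= 2 && nrows >= 1_000_000

# Count missing values per column, processing columns in parallel when worthwhile.
function count_missing_per_column(cols::AbstractVector)
    counts = zeros(Int, length(cols))
    nrows = isempty(cols) ? 0 : length(first(cols))
    if should_thread(length(cols), nrows)
        # Each iteration writes to its own slot, so no locking is needed.
        Threads.@threads for j in eachindex(cols)
            counts[j] = count(ismissing, cols[j])
        end
    else
        for j in eachindex(cols)
            counts[j] = count(ismissing, cols[j])
        end
    end
    return counts
end

cols = AbstractVector[[1, missing, 3], [missing, missing, "x"]]
counts = count_missing_per_column(cols)
```

With a table this small the serial branch runs; the threaded branch only becomes relevant for large inputs and a multi-threaded Julia session.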

Summarize information

Cleaner.sizeFunction
size(table::CleanTable)

Returns a tuple containing the number of rows and columns of the given CleanTable.

source
Cleaner.get_all_repeatedFunction
get_all_repeated(table, columns::Vector{Symbol})

Returns a CleanTable that contains a row index column, only the selected columns, and only the rows that are repeated.
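To make "repeated rows" concrete, here is a minimal plain-Julia sketch that finds the indexes of rows whose values in the selected columns occur more than once. `repeated_rows` is a hypothetical helper operating on a NamedTuple of vectors; it only illustrates the idea and is not how Cleaner implements get_all_repeated.

```julia
# Hypothetical illustration, not Cleaner's implementation.
function repeated_rows(cols::NamedTuple, selected::Vector{Symbol})
    nrows = length(first(cols))
    # One tuple per row, built from the selected columns only.
    keys_per_row = [Tuple(cols[c][i] for c in selected) for i in 1:nrows]
    counts = Dict{Any,Int}()
    for k in keys_per_row
        counts[k] = get(counts, k, 0) + 1
    end
    # Keep the index of every row whose key occurs more than once.
    return [i for i in 1:nrows if counts[keys_per_row[i]] > 1]
end

data = (id = [1, 2, 1, 3, 2], name = ["a", "b", "a", "c", "x"])
by_id = repeated_rows(data, [:id])            # rows sharing an id
by_both = repeated_rows(data, [:id, :name])   # rows matching on both columns
```

Selecting more columns makes the match stricter: rows 2 and 5 share an id but differ in name, so they drop out of the second result.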

source
Cleaner.category_distributionFunction
category_distribution(table, columns::Vector{Symbol}; round_digits=1, bottom_prct=0, top_prct=0)

Returns a CleanTable that considers only the selected columns and contains each unique row together with the percentage of the total rows it represents. Percentages are rounded to up to round_digits decimal digits. Set bottom_prct to group the least represented categories, up to bottom_prct percent, into Bottom_other. Set top_prct to group the most represented categories, up to top_prct percent, into Top_other.
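The percentage computation can be sketched in plain Julia as follows. `percent_distribution` is a hypothetical helper for a single column of values; it covers only the counting and rounding step and leaves out the bottom_prct and top_prct grouping.

```julia
# Hypothetical illustration of the percentage step, not Cleaner's implementation.
function percent_distribution(values; round_digits=1)
    counts = Dict{Any,Int}()
    for v in values
        counts[v] = get(counts, v, 0) + 1
    end
    total = length(values)
    # Share of each distinct value, expressed as a rounded percentage.
    return Dict(k => round(100 * n / total; digits=round_digits) for (k, n) in counts)
end

dist = percent_distribution(["a", "b", "a", "c"])
thirds = percent_distribution([1, 2, 3]; round_digits=1)
```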

source
Cleaner.compare_table_columnsFunction
compare_table_columns(tables...; dupe_sanitize=true)

Returns a CleanTable comparing all column names and column types from the tables passed. By default, duplicated column names found in the same table are sanitized; pass the keyword argument dupe_sanitize=false to opt out of this behavior.

source

Working with column names

Cleaner.renameFunction
rename(table, names::Vector{Symbol})

Creates a CleanTable with copied columns and changes its column names to be names.

source
Cleaner.rename!Function
rename!(ct::CleanTable, names::Vector{Symbol})

Changes in-place the column names of a CleanTable to be names.

source
Cleaner.rename_ROTFunction
rename_ROT(table, names::Vector{Symbol})

Returns a new table of the original table type where its column names have been changed to be names.

source
Cleaner.generate_polished_namesFunction
generate_polished_names(names; style::Symbol=:snake_case)

Returns a vector of symbols containing new names that are unique and formatted using the selected style.
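As a rough idea of what such polishing involves, the sketch below lowercases names, collapses runs of other characters into underscores, and makes duplicates unique. `to_snake` and `unique_snake_names` are hypothetical helpers, and the numeric suffix used for duplicates is an assumption made for this example; the exact names Cleaner generates may differ.

```julia
# Hypothetical illustration; Cleaner's actual formatting rules may differ.
function to_snake(name)
    s = lowercase(strip(String(name)))
    s = replace(s, r"[^a-z0-9]+" => "_")   # collapse spaces, punctuation, etc.
    return String(strip(s, '_'))           # no leading or trailing underscores
end

function unique_snake_names(names)
    seen = Dict{String,Int}()
    out = Symbol[]
    for n in names
        base = to_snake(n)
        seen[base] = get(seen, base, 0) + 1
        # Assumed scheme: append _2, _3, ... to later duplicates.
        push!(out, Symbol(seen[base] == 1 ? base : string(base, "_", seen[base])))
    end
    return out
end

polished = unique_snake_names([" Some bad Name", "Another_weird name ", "some bad name"])
```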

source
Cleaner.polish_namesFunction
polish_names(table; style=:snake_case)

Creates and returns a CleanTable with copied columns whose column names have been replaced with names that are unique and formatted using the selected style.

Styles

  • snake_case
  • camelCase
source
Cleaner.polish_names!Function
polish_names!(table::CleanTable; style::Symbol=:snake_case)

Replaces in-place the column names of a CleanTable with names that are unique and formatted using the selected style, and returns the CleanTable.

Styles

  • snake_case
  • camelCase
source
Cleaner.polish_names_ROTFunction
polish_names_ROT(table; style::Symbol=:snake_case)

Returns a new table of the original table type where column names have been replaced with names that are unique and formatted using the selected style.

Styles

  • snake_case
  • camelCase
source
Cleaner.row_as_namesFunction
row_as_names(table, i::Int; remove::Bool=true)

Creates a CleanTable with copied columns, renames it using row i as the new column names, and removes from the copy all the rows above row i if remove=true.

Default behavior is to remove rows above row i.

source
Cleaner.row_as_names!Function
row_as_names!(table::CleanTable, i::Int; remove::Bool=true)

Renames the table using row i as new names and removes in-place all the rows above row i if remove=true.

Default behavior is to remove rows above row i.

source
Cleaner.row_as_names_ROTFunction
row_as_names_ROT(table, i::Int; remove::Bool=true)

Returns a new table of the original table type that has been renamed using row i as the new column names, with all the rows above row i removed if remove=true.

source

Row/Column removal

Cleaner.compact_columnsFunction
compact_columns(table; empty_values::Vector=[])

Creates a CleanTable with copied columns and removes from it all columns filled entirely by missing and empty_values.
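The role of empty_values can be illustrated with a small plain-Julia sketch: a column is dropped only when every value is either missing or one of the listed empty values. `drop_empty_columns` is a hypothetical helper working on a NamedTuple of vectors, not Cleaner's implementation.

```julia
# Hypothetical illustration of the "entirely empty column" rule.
function drop_empty_columns(cols::NamedTuple; empty_values=[])
    is_empty_value(x) = ismissing(x) || any(e -> isequal(x, e), empty_values)
    # Keep a column unless all of its values count as empty.
    keep = [k for k in keys(cols) if !all(is_empty_value, cols[k])]
    return NamedTuple{Tuple(keep)}(cols)
end

data = (a = [missing, missing], b = [1, missing], c = ["", "NA"])
default_result = drop_empty_columns(data)
custom_result = drop_empty_columns(data; empty_values=["", "NA"])
```

Column `c` survives the default call because its strings are not missing; it is only removed once `""` and `"NA"` are declared empty.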

source
Cleaner.compact_columns!Function
compact_columns!(table::CleanTable; empty_values::Vector=[])

Removes in-place from a CleanTable all columns filled entirely by missing and empty_values.

source
Cleaner.compact_columns_ROTFunction
compact_columns_ROT(table; empty_values::Vector=[])

Returns a new table of the original table type where all columns filled entirely by missing and empty_values have been removed.

source
Cleaner.compact_rowsFunction
compact_rows(table; empty_values::Vector=[])

Creates a CleanTable with copied columns and removes from it all rows filled entirely by missing and empty_values.
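The row-wise rule mirrors the column-wise one: a row is removed only if all of its values are missing or listed in empty_values. A minimal sketch under that reading, using a hypothetical `drop_empty_rows` helper on a NamedTuple of vectors:

```julia
# Hypothetical illustration of the "entirely empty row" rule.
function drop_empty_rows(cols::NamedTuple; empty_values=[])
    is_empty_value(x) = ismissing(x) || any(e -> isequal(x, e), empty_values)
    nrows = length(first(cols))
    # A row is kept as soon as one of its values is not empty.
    keep = [i for i in 1:nrows if !all(col -> is_empty_value(col[i]), values(cols))]
    return map(col -> col[keep], cols)
end

data = (a = [1, missing, 3], b = ["x", "", missing])
compacted = drop_empty_rows(data; empty_values=[""])
```

Row 2 is dropped because both of its values count as empty, while row 3 stays since `a` still holds a value.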

source
Cleaner.compact_rows!Function
compact_rows!(table::CleanTable; empty_values::Vector=[])

Removes in-place from a CleanTable all rows filled entirely by missing and empty_values.

source
Cleaner.compact_rows_ROTFunction
compact_rows_ROT(table; empty_values::Vector=[])

Returns a new table of the original table type where all rows filled entirely by missing and empty_values have been removed.

source
Cleaner.compact_tableFunction
compact_table(table; empty_values::Vector=[])

Creates a CleanTable with copied columns and removes from it all rows and columns filled entirely by missing and empty_values.

source
Cleaner.compact_table!Function
compact_table!(table::CleanTable; empty_values::Vector=[])

Removes in-place from a CleanTable all rows and columns filled entirely by missing and empty_values.

source
Cleaner.compact_table_ROTFunction
compact_table_ROT(table; empty_values::Vector=[])

Returns a new table of the original table type where all rows and columns filled entirely by missing and empty_values have been removed.

source
Cleaner.delete_const_columnsFunction
delete_const_columns(table)

Creates a CleanTable with copied columns and removes each column filled with just a constant value.
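A constant column is one where every value equals the first. The sketch below shows that check in plain Julia; `is_constant` and `drop_constant_columns` are hypothetical helpers, and treating an all-missing column as constant is an assumption of this example rather than documented behavior.

```julia
# Hypothetical illustration; isequal is used so that missing compares equal to missing.
is_constant(col) = !isempty(col) && all(x -> isequal(x, first(col)), col)

function drop_constant_columns(cols::NamedTuple)
    keep = Tuple(k for k in keys(cols) if !is_constant(cols[k]))
    return NamedTuple{keep}(cols)
end

data = (id = [1, 2, 3], country = ["PE", "PE", "PE"], flag = [missing, missing, missing])
trimmed = drop_constant_columns(data)
```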

source
Cleaner.delete_const_columns_ROTFunction
delete_const_columns_ROT(table)

Returns a new table of the original table type where all columns filled with just a constant value have been removed.

source
Cleaner.drop_missingFunction
drop_missing(table; missing_values::Vector=[])

Creates a CleanTable with copied columns and removes from it all rows where missing or missing_values have been found.
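Unlike the compact functions, which require a row to be entirely empty, dropping missing rows removes a row as soon as any single value is missing or listed in missing_values. A minimal sketch of that difference, with a hypothetical `drop_rows_with_missing` helper on a NamedTuple of vectors:

```julia
# Hypothetical illustration of the "any missing value" rule.
function drop_rows_with_missing(cols::NamedTuple; missing_values=[])
    is_missing_value(x) = ismissing(x) || any(m -> isequal(x, m), missing_values)
    nrows = length(first(cols))
    # A row is dropped if at least one of its values counts as missing.
    keep = [i for i in 1:nrows if !any(col -> is_missing_value(col[i]), values(cols))]
    return map(col -> col[keep], cols)
end

data = (a = [1, missing, 3], b = ["x", "y", "NA"])
default_drop = drop_rows_with_missing(data)
custom_drop = drop_rows_with_missing(data; missing_values=["NA"])
```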

source
Cleaner.drop_missing!Function
drop_missing!(table::CleanTable; missing_values::Vector=[])

Removes in-place from a CleanTable all rows where missing or missing_values have been found.

source
Cleaner.drop_missing_ROTFunction
drop_missing_ROT(table; missing_values::Vector=[])

Returns a new table of the original table type from which all rows where missing or missing_values have been found were removed.

source

Modifying table schema

Cleaner.reinfer_schemaFunction
reinfer_schema(table; max_types::Int=3)

Creates a CleanTable with copied columns and tries to minimize the number of element types for each column without making the column type Any.

To do so, it tries to give the column a Union type with up to max_types types and internally uses Base.promote_typejoin on all numeric types. If that is not possible, it leaves the column as-is.

source
Cleaner.reinfer_schema!Function
reinfer_schema!(table::CleanTable; max_types::Int=3)

Tries to minimize the number of element types for each column without making the column type Any.

To do so, it tries to give the column a Union type with up to max_types types and internally uses Base.promote_typejoin on all numeric types. If that is not possible, it leaves the column as-is.
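The sketch below illustrates this idea in plain Julia: collect the concrete types found in a column, join the numeric ones with `Base.promote_typejoin`, and convert to a `Union` only when the number of resulting types stays within max_types. `narrow_column` is a hypothetical helper written for this example; it is an approximation of the described behavior, not Cleaner's code.

```julia
# Hypothetical illustration of schema re-inference for a single column.
function narrow_column(col::AbstractVector; max_types::Int=3)
    types = unique(map(typeof, col))
    numeric = [T for T in types if T <: Number]
    others = Any[T for T in types if !(T <: Number)]
    # All numeric types are merged into one common supertype.
    merged = isempty(numeric) ? others :
             Any[others..., reduce(Base.promote_typejoin, numeric)]
    # Too many distinct types: leave the column untouched.
    length(merged) > max_types && return col
    return convert(Vector{Union{merged...}}, col)
end

joined = Base.promote_typejoin(Int, Float64)      # common supertype of both
ints = narrow_column(Any[1, 2, 3])
mixed = narrow_column(Any[1, 2.5, "a"])
too_many = narrow_column(Any[1, "a", :b, 'c'])
```

In this sketch a column of `Any` holding only integers becomes a `Vector{Int}`, a mix of numbers and strings becomes a small `Union`, and a column with more than max_types distinct types keeps its original element type.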

source
Cleaner.reinfer_schema_ROTFunction
reinfer_schema_ROT(table; max_types::Int=3)

Returns a new table of the original table type after trying to minimize the number of element types for each column without making the column type Any.

To do so, it tries to give the column a Union type with up to max_types types and internally uses Base.promote_typejoin on all numeric types. If that is not possible, it leaves the column as-is.

source
Cleaner.add_indexFunction
add_index(table)

Creates a CleanTable with copied columns and adds to it a new column holding the row index of the table passed.

source
Cleaner.add_index!Function
add_index!(table::CleanTable)

Adds in-place to the CleanTable a new column holding its row index.

source
Cleaner.add_index_ROTFunction
add_index_ROT(table)

Returns a new table of the original table type to which a new column holding the row index of the table passed has been added.

source