Functions
Multi-threading support
The following functions will try to use multiple threads if possible when there are at least 2 columns and 1 million rows:
- CleanTable constructor when copycols=true
- All compact functions
- delete_const_columns, delete_const_columns! and delete_const_columns_ROT
- reinfer_schema, reinfer_schema! and reinfer_schema_ROT
- get_all_repeated
- level_distribution
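Multiple threads can only be used if the Julia session itself was started with more than one. The following is a minimal check, assuming a Julia version that supports the --threads command-line flag (1.5 or later); it does not call any Cleaner function.

```julia
# Start Julia with several threads, for example:
#   julia --threads 4
# or set the JULIA_NUM_THREADS environment variable before launching.

# Number of threads available to the current session.
nthreads = Threads.nthreads()

if nthreads == 1
    println("Running single-threaded; no work can be spread across threads.")
else
    println("Running with $nthreads threads.")
end
```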
Index
- Cleaner.add_index
- Cleaner.add_index!
- Cleaner.add_index_ROT
- Cleaner.category_distribution
- Cleaner.compact_columns
- Cleaner.compact_columns!
- Cleaner.compact_columns_ROT
- Cleaner.compact_rows
- Cleaner.compact_rows!
- Cleaner.compact_rows_ROT
- Cleaner.compact_table
- Cleaner.compact_table!
- Cleaner.compact_table_ROT
- Cleaner.compare_table_columns
- Cleaner.delete_const_columns
- Cleaner.delete_const_columns!
- Cleaner.delete_const_columns_ROT
- Cleaner.drop_missing
- Cleaner.drop_missing!
- Cleaner.drop_missing_ROT
- Cleaner.generate_polished_names
- Cleaner.get_all_repeated
- Cleaner.polish_names
- Cleaner.polish_names!
- Cleaner.polish_names_ROT
- Cleaner.reinfer_schema
- Cleaner.reinfer_schema!
- Cleaner.reinfer_schema_ROT
- Cleaner.rename
- Cleaner.rename!
- Cleaner.rename_ROT
- Cleaner.row_as_names
- Cleaner.row_as_names!
- Cleaner.row_as_names_ROT
- Cleaner.size
Summarize information
Cleaner.size — Function
size(table::CleanTable)
Returns a tuple containing the number of rows and columns of the given CleanTable.

Cleaner.get_all_repeated — Function
get_all_repeated(table, columns::Vector{Symbol})
Returns a CleanTable with row indexes, containing only the selected columns and only the rows that are repeated.

Cleaner.category_distribution — Function
category_distribution(table, columns::Vector{Symbol}; round_digits=1, bottom_prct=0, top_prct=0)
Returns a CleanTable that takes into account only the selected columns and contains the unique rows together with the percentage of the total rows that each one represents. Percentages are rounded to at most round_digits digits. Set bottom_prct to group the least represented categories, up to bottom_prct percent, into Bottom_other. Set top_prct to group the most represented categories, up to top_prct percent, into Top_other.

Cleaner.compare_table_columns — Function
compare_table_columns(tables...; dupe_sanitize=true)
Returns a CleanTable comparing all column names and column types from the tables passed. By default, duplicated column names found within the same table are sanitized; pass the keyword argument dupe_sanitize=false to opt out of this behavior.
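As a rough illustration of the summary functions, here is a minimal sketch built only from the signatures listed above. The sample data is hypothetical, and the sketch assumes that the functions are exported by `using Cleaner`, that a NamedTuple of vectors is accepted as a Tables.jl-compatible source, and that `CleanTable(data)` can be called with its default keyword arguments.

```julia
using Cleaner

# Hypothetical sample data: a NamedTuple of vectors is used here
# to avoid extra dependencies (assumed to be an accepted table source).
data = (
    city = ["Lima", "Quito", "Lima", "Lima"],
    year = [2020, 2020, 2020, 2021],
)

# Rows whose (city, year) combination appears more than once.
repeated = get_all_repeated(data, [:city, :year])

# Share of rows for each distinct city, rounded to at most 1 digit.
distribution = category_distribution(data, [:city]; round_digits=1)

# Wrap the source in a CleanTable to query its dimensions.
ct = CleanTable(data)
rows, cols = Cleaner.size(ct)
```

`Cleaner.size` is written with the module prefix so the call does not depend on how the name interacts with `Base.size`.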
Working with column names
Cleaner.rename — Function
rename(table, names::Vector{Symbol})
Creates a CleanTable with copied columns and changes its column names to names.

Cleaner.rename! — Function
rename!(ct::CleanTable, names::Vector{Symbol})
Changes in-place the column names of a CleanTable to names.

Cleaner.rename_ROT — Function
rename_ROT(table, names::Vector{Symbol})
Returns a new table of the original table type whose column names have been changed to names.

Cleaner.generate_polished_names — Function
generate_polished_names(names; style::Symbol=:snake_case)
Returns a vector of symbols containing new names that are unique and formatted using the selected style.

Cleaner.polish_names — Function
polish_names(table; style=:snake_case)
Creates and returns a CleanTable with copied columns whose column names have been replaced so they are unique and formatted using the selected style.
Styles
- snake_case
- camelCase
Cleaner.polish_names! — Function
polish_names!(table::CleanTable; style::Symbol=:snake_case)
Replaces in-place the column names of a CleanTable so they are unique and formatted using the selected style, and returns the CleanTable.
Styles
- snake_case
- camelCase
Cleaner.polish_names_ROT — Function
polish_names_ROT(table; style::Symbol=:snake_case)
Returns a new table of the original table type whose column names have been replaced so they are unique and formatted using the selected style.
Styles
- snake_case
- camelCase
Cleaner.row_as_names — Function
row_as_names(table, i::Int; remove::Bool=true)
Creates a CleanTable with copied columns, renames its columns using row i as the new names, and removes all the rows above row i if remove=true.
Default behavior is to remove rows above row i.

Cleaner.row_as_names! — Function
row_as_names!(table::CleanTable, i::Int; remove::Bool=true)
Renames the columns of the table using row i as the new names and removes in-place all the rows above row i if remove=true.
Default behavior is to remove rows above row i.

Cleaner.row_as_names_ROT — Function
row_as_names_ROT(table, i::Int; remove::Bool=true)
Returns a new table of the original table type whose columns have been renamed using row i as the new names, with all the rows above row i removed if remove=true.
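The naming functions come in three flavors: a copying variant, an in-place `!` variant and a `_ROT` variant. The sketch below contrasts them using the signatures listed above. The table and its column names are hypothetical, and the sketch assumes the functions are exported and that a NamedTuple of vectors is accepted as input.

```julia
using Cleaner

# Hypothetical table with messy column names.
messy_names = (Symbol("First Name"), Symbol("Last Name"))
data = NamedTuple{messy_names}((["Ana", "Luis"], ["Paz", "Rey"]))

# Copying variant: new CleanTable, the source is left untouched.
polished = polish_names(data; style=:snake_case)

# In-place variant: operates on an existing CleanTable.
ct = CleanTable(data)
polish_names!(ct; style=:camelCase)

# _ROT variant: described above as returning a table of the source's own type.
renamed = rename_ROT(data, [:first, :last])
```

The copying variant suits a one-off cleanup, while the in-place variant avoids extra copies when several steps run on the same CleanTable.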
Row/Column removal
Cleaner.compact_columns — Function
compact_columns(table; empty_values::Vector=[])
Creates a CleanTable with copied columns and removes from it all columns filled entirely by missing and empty_values.

Cleaner.compact_columns! — Function
compact_columns!(table::CleanTable; empty_values::Vector=[])
Removes in-place from a CleanTable all columns filled entirely by missing and empty_values.

Cleaner.compact_columns_ROT — Function
compact_columns_ROT(table; empty_values::Vector=[])
Returns a new table of the original table type where all columns filled entirely by missing and empty_values have been removed.

Cleaner.compact_rows — Function
compact_rows(table; empty_values::Vector=[])
Creates a CleanTable with copied columns and removes from it all rows filled entirely by missing and empty_values.

Cleaner.compact_rows! — Function
compact_rows!(table::CleanTable; empty_values::Vector=[])
Removes in-place from a CleanTable all rows filled entirely by missing and empty_values.

Cleaner.compact_rows_ROT — Function
compact_rows_ROT(table; empty_values::Vector=[])
Returns a new table of the original table type where all rows filled entirely by missing and empty_values have been removed.

Cleaner.compact_table — Function
compact_table(table; empty_values::Vector=[])
Creates a CleanTable with copied columns and removes from it all rows and columns filled entirely by missing and empty_values.

Cleaner.compact_table! — Function
compact_table!(table::CleanTable; empty_values::Vector=[])
Removes in-place from a CleanTable all rows and columns filled entirely by missing and empty_values.

Cleaner.compact_table_ROT — Function
compact_table_ROT(table; empty_values::Vector=[])
Returns a new table of the original table type where all rows and columns filled entirely by missing and empty_values have been removed.
Cleaner.delete_const_columns — Function
delete_const_columns(table)
Creates a CleanTable with copied columns and removes each column filled with just a constant value.

Cleaner.delete_const_columns! — Function
delete_const_columns!(table::CleanTable)
Removes in-place from a CleanTable each column filled with just a constant value.

Cleaner.delete_const_columns_ROT — Function
delete_const_columns_ROT(table)
Returns a new table of the original table type where all columns filled with just a constant value have been removed.
Cleaner.drop_missing — Function
drop_missing(table; missing_values::Vector=[])
Creates a CleanTable with copied columns and removes from it all rows where missing or any of the missing_values is found.

Cleaner.drop_missing! — Function
drop_missing!(table::CleanTable; missing_values::Vector=[])
Removes in-place from a CleanTable all rows where missing or any of the missing_values is found.

Cleaner.drop_missing_ROT — Function
drop_missing_ROT(table; missing_values::Vector=[])
Returns a new table of the original table type where all rows containing missing or any of the missing_values have been removed.
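To make the removal functions more concrete, here is a minimal sketch based on the signatures above. The data is hypothetical, and the sketch assumes the functions are exported, that a NamedTuple of vectors is accepted as input, and that the empty string can be passed through empty_values and missing_values.

```julia
using Cleaner

# Hypothetical table with an "empty" column and a constant column.
data = (
    id        = [1, 2, missing],
    note      = [missing, "", missing],   # holds only missing and ""
    const_col = ["x", "x", "x"],          # same value in every row
)

# Copying variant: drop columns made up only of missing and "".
no_empty_cols = compact_columns(data; empty_values=[""])

# Drop columns that hold a single constant value.
no_const = delete_const_columns(data)

# Drop every row containing missing or one of the listed values.
complete_rows = drop_missing(data; missing_values=[""])

# In-place variants require a CleanTable.
ct = CleanTable(data)
compact_table!(ct; empty_values=[""])
```

Note the difference in scope: the compact functions only remove rows or columns that are empty throughout, while drop_missing removes a row as soon as a single value in it is missing.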
Modifying table schema
Cleaner.reinfer_schema — Function
reinfer_schema(table; max_types::Int=3)
Creates a CleanTable with copied columns and tries to minimize the number of element types for each column without making the column type Any.
To do so, it tries to give the column a Union element type with up to max_types types and internally uses Base.promote_typejoin on all numeric types. If that is not possible, the column is left as-is.

Cleaner.reinfer_schema! — Function
reinfer_schema!(table::CleanTable; max_types::Int=3)
Tries to minimize the number of element types for each column without making the column type Any.
To do so, it tries to give the column a Union element type with up to max_types types and internally uses Base.promote_typejoin on all numeric types. If that is not possible, the column is left as-is.

Cleaner.reinfer_schema_ROT — Function
reinfer_schema_ROT(table; max_types::Int=3)
Returns a new table of the original table type where an attempt has been made to minimize the number of element types for each column without making the column type Any.
To do so, it tries to give the column a Union element type with up to max_types types and internally uses Base.promote_typejoin on all numeric types. If that is not possible, the column is left as-is.
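The idea of narrowing a column to a small Union can be sketched in plain Julia. This is a conceptual illustration only, not Cleaner's implementation: the variable names are hypothetical, and the handling of numeric types described above is reduced to a single call to Base.promote_typejoin.

```julia
# Conceptual sketch only; this is not Cleaner's implementation.
# A column stored as Vector{Any} that in practice holds few concrete types.
column = Any[1, 2, missing, 4]

# Collect the distinct concrete types found in the column.
found_types = unique(typeof.(column))

max_types = 3  # mirrors the keyword argument described above

narrowed = if length(found_types) <= max_types
    # Rebuild the column with a small Union element type.
    convert(Vector{Union{found_types...}}, column)
else
    column  # too many types: leave the column as-is
end

# Base.promote_typejoin widens two types to a shared type
# (a common supertype, or a Union when Missing or Nothing is involved).
joined = Base.promote_typejoin(Int, Float64)
```

A narrow Union element type lets Julia generate more specialized code than it can for a Vector{Any}, which is the motivation for avoiding Any as the column type.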
Cleaner.add_index — Function
add_index(table)
Creates a CleanTable with copied columns and adds to it a new column containing the row index of the table passed.

Cleaner.add_index! — Function
add_index!(table::CleanTable)
Adds in-place a column containing the row index to the CleanTable table.

Cleaner.add_index_ROT — Function
add_index_ROT(table)
Returns a new table of the original table type to which a new column containing the row index of the table passed has been added.