Functions
Multi-threading support
The following functions will try to use multiple threads if possible when there are at least 2 columns and 1 million rows:
- CleanTable constructor when copycols=true
- All compact functions
- delete_const_columns, delete_const_columns! and delete_const_columns_ROT
- reinfer_schema, reinfer_schema! and reinfer_schema_ROT
- get_all_repeated
- level_distribution
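Multi-threading can only kick in when the Julia session itself has more than one thread. The following is a minimal sketch for checking that before working with a large table. `meets_threshold` is a hypothetical helper written for this illustration, not part of Cleaner; it only restates the size condition quoted above.

```julia
# Number of threads available to this session.
# Start Julia with `julia --threads=auto` (or set JULIA_NUM_THREADS) to get more than one.
nthreads = Threads.nthreads()

# Hypothetical helper (not part of Cleaner): restates the documented size condition.
meets_threshold(nrows, ncols) = ncols >= 2 && nrows >= 1_000_000

println("threads available: ", nthreads)
println("1_000_000 x 2 table qualifies: ", meets_threshold(1_000_000, 2))
println("999_999 x 5 table qualifies: ", meets_threshold(999_999, 5))
```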
Index
Cleaner.add_index
Cleaner.add_index!
Cleaner.add_index_ROT
Cleaner.category_distribution
Cleaner.compact_columns
Cleaner.compact_columns!
Cleaner.compact_columns_ROT
Cleaner.compact_rows
Cleaner.compact_rows!
Cleaner.compact_rows_ROT
Cleaner.compact_table
Cleaner.compact_table!
Cleaner.compact_table_ROT
Cleaner.compare_table_columns
Cleaner.delete_const_columns
Cleaner.delete_const_columns!
Cleaner.delete_const_columns_ROT
Cleaner.drop_missing
Cleaner.drop_missing!
Cleaner.drop_missing_ROT
Cleaner.generate_polished_names
Cleaner.get_all_repeated
Cleaner.polish_names
Cleaner.polish_names!
Cleaner.polish_names_ROT
Cleaner.reinfer_schema
Cleaner.reinfer_schema!
Cleaner.reinfer_schema_ROT
Cleaner.rename
Cleaner.rename!
Cleaner.rename_ROT
Cleaner.row_as_names
Cleaner.row_as_names!
Cleaner.row_as_names_ROT
Cleaner.size
Summarize information
Cleaner.size (Function)
size(table::CleanTable)
Returns a tuple containing the number of rows and columns of the given CleanTable.
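A short usage sketch, assuming a CleanTable can be built from any Tables.jl-compatible source such as a NamedTuple of vectors. The sample data is made up for illustration.

```julia
import Cleaner
using Cleaner: CleanTable

# Illustrative data: 3 rows, 2 columns.
ct = CleanTable((name = ["Ana", "Bob", "Eve"], age = [31, 45, 27]))

# Called with the module prefix so it cannot be confused with Base.size.
nrows, ncols = Cleaner.size(ct)
println("rows = ", nrows, ", columns = ", ncols)
```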
Cleaner.get_all_repeated (Function)
get_all_repeated(table, columns::Vector{Symbol})
Returns a CleanTable with the row indexes and only the selected columns, keeping only the rows that were repeated.
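As an illustration, the sketch below looks for rows that repeat on a subset of columns. The table contents are invented; the first and third rows share the same name and city, so both are expected to be reported.

```julia
import Cleaner
using Cleaner: get_all_repeated

# Illustrative data: rows 1 and 3 repeat on (:name, :city).
visits = (
    name = ["Ana", "Bob", "Ana"],
    city = ["Lima", "Quito", "Lima"],
    day = [1, 2, 3],
)

# Only :name and :city are compared; :day is ignored.
repeated = get_all_repeated(visits, [:name, :city])

nrows, _ = Cleaner.size(repeated)
println("repeated rows found: ", nrows)
```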
Cleaner.category_distribution (Function)
category_distribution(table, columns::Vector{Symbol}; round_digits=1, bottom_prct=0, top_prct=0)
Returns a CleanTable that takes into account only the selected columns and contains the unique rows together with the percentage each one represents out of the total rows. The percentage is rounded to at most round_digits digits. bottom_prct can be specified to have the least represented categories, up to bottom_prct percent, become Bottom_other. top_prct can be specified to have the most represented categories, up to top_prct percent, become Top_other.
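A minimal sketch with invented data: four rows spread over three categories, so three unique rows are expected in the result. Only `round_digits` is varied here; `bottom_prct` and `top_prct` are left at their defaults.

```julia
import Cleaner
using Cleaner: category_distribution

# Illustrative data: "a" appears twice, "b" and "c" once each.
orders = (kind = ["a", "a", "b", "c"], amount = [10, 20, 30, 40])

# One row per unique value of :kind, with its share of the total rows.
dist = category_distribution(orders, [:kind])

# Same distribution, with percentages rounded to at most 2 digits.
dist_fine = category_distribution(orders, [:kind]; round_digits=2)

println("categories found: ", first(Cleaner.size(dist)))
```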
Cleaner.compare_table_columns (Function)
compare_table_columns(tables...; dupe_sanitize=true)
Returns a CleanTable comparing all column names and column types from the tables passed. By default, duplicated column names found in the same table are sanitized; pass the keyword argument dupe_sanitize=false to opt out of this behavior.
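For example, two sources that should share a schema can be compared before joining them. This is a sketch with invented tables; the exact layout of the returned comparison is not described above, so the code only builds it and reports its size.

```julia
import Cleaner
using Cleaner: compare_table_columns

# Illustrative tables: :id has a different element type in each one,
# and :price exists only in the second.
january = (id = [1, 2], name = ["a", "b"])
february = (id = ["1", "2"], name = ["c", "d"], price = [9.5, 3.0])

comparison = compare_table_columns(january, february)
println("comparison size (rows, columns): ", Cleaner.size(comparison))
```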
Working with column names
Cleaner.rename (Function)
rename(table, names::Vector{Symbol})
Creates a CleanTable with copied columns and changes its column names to names.
Cleaner.rename! (Function)
rename!(ct::CleanTable, names::Vector{Symbol})
Changes in-place the column names of a CleanTable to names.
Cleaner.rename_ROT (Function)
rename_ROT(table, names::Vector{Symbol})
Returns a new table of the same type as the original table, with its column names changed to names.
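The three variants differ only in what they return and whether they mutate. The sketch below assumes Tables.jl is installed alongside Cleaner and uses it to read the resulting names back; the input data is invented.

```julia
using Tables
using Cleaner: rename, rename!, rename_ROT

# Illustrative input: a NamedTuple of vectors with placeholder names.
raw = (x1 = [1, 2], x2 = ["a", "b"])

ct = rename(raw, [:id, :label])        # new CleanTable, `raw` is left untouched
rename!(ct, [:key, :value])            # mutates the CleanTable created above
nt = rename_ROT(raw, [:id, :label])    # new table of the same type as `raw`

println(collect(Tables.columnnames(ct)))
println(collect(Tables.columnnames(Tables.columns(nt))))
```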
Cleaner.generate_polished_names (Function)
generate_polished_names(names; style::Symbol=:snake_case)
Returns a vector of symbols containing new names that are unique and formatted using the selected style.
Cleaner.polish_names (Function)
polish_names(table; style=:snake_case)
Creates and returns a CleanTable with copied columns whose column names have been replaced to be unique and formatted using the selected style.
Styles
- snake_case
- camelCase
Cleaner.polish_names! (Function)
polish_names!(table::CleanTable; style::Symbol=:snake_case)
Replaces in-place the column names of a CleanTable so that they are unique and formatted using the selected style, and returns the CleanTable.
Styles
- snake_case
- camelCase
Cleaner.polish_names_ROT (Function)
polish_names_ROT(table; style::Symbol=:snake_case)
Returns a new table of the same type as the original table, with its column names replaced to be unique and formatted using the selected style.
Styles
- snake_case
- camelCase
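A sketch of the polishing functions on deliberately messy names, assuming Tables.jl is installed and that the camelCase style listed above is selected with `style=:camelCase`. The names and data are invented, and the exact polished spellings are printed rather than assumed.

```julia
using Tables
using Cleaner: generate_polished_names, polish_names, polish_names_ROT

# Illustrative column names with stray spaces and mixed case.
messy = [Symbol(" Some bad Name"), Symbol("Another_weird name ")]
raw = NamedTuple{Tuple(messy)}(([1, 2], ["a", "b"]))

# Only compute the new names (default style is :snake_case).
suggested = generate_polished_names(messy)

# Copy into a CleanTable with polished names.
ct = polish_names(raw)

# Keep the original table type and use the other documented style.
camel = polish_names_ROT(raw; style=:camelCase)

println(suggested)
println(collect(Tables.columnnames(ct)))
println(collect(Tables.columnnames(Tables.columns(camel))))
```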
Cleaner.row_as_names (Function)
row_as_names(table, i::Int; remove::Bool=true)
Creates a CleanTable with copied columns, renames it using row i as the new names, and removes all the rows above row i if remove=true.
Default behavior is to remove rows above row i.
Cleaner.row_as_names! (Function)
row_as_names!(table::CleanTable, i::Int; remove::Bool=true)
Renames the table using row i as the new names and removes in-place all the rows above row i if remove=true.
Default behavior is to remove rows above row i.
Cleaner.row_as_names_ROT (Function)
row_as_names_ROT(table, i::Int; remove::Bool=true)
Returns a new table of the same type as the original table that has been renamed using row i as the new names, with all the rows above row i removed if remove=true.
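This is typically useful when a file has a title line above the real header. A minimal sketch with invented data, assuming Tables.jl is installed and that string cells become the corresponding symbols when used as names:

```julia
using Tables
using Cleaner: row_as_names

# Illustrative raw import: row 1 is a title, row 2 holds the real header.
raw = (
    x1 = ["Quarterly report", "name", "Ana"],
    x2 = ["", "age", "31"],
)

# Use row 2 as column names; rows above it are dropped by default.
ct = row_as_names(raw, 2)

println(collect(Tables.columnnames(ct)))
```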
Row/Column removal
Cleaner.compact_columns (Function)
compact_columns(table; empty_values::Vector=[])
Creates a CleanTable with copied columns and removes from it all columns filled entirely by missing and empty_values.
Cleaner.compact_columns! (Function)
compact_columns!(table::CleanTable; empty_values::Vector=[])
Removes in-place from a CleanTable all columns filled entirely by missing and empty_values.
Cleaner.compact_columns_ROT (Function)
compact_columns_ROT(table; empty_values::Vector=[])
Returns a new table of the same type as the original table where all columns filled entirely by missing and empty_values have been removed.
Cleaner.compact_rows (Function)
compact_rows(table; empty_values::Vector=[])
Creates a CleanTable with copied columns and removes from it all rows filled entirely by missing and empty_values.
Cleaner.compact_rows! (Function)
compact_rows!(table::CleanTable; empty_values::Vector=[])
Removes in-place from a CleanTable all rows filled entirely by missing and empty_values.
Cleaner.compact_rows_ROT (Function)
compact_rows_ROT(table; empty_values::Vector=[])
Returns a new table of the same type as the original table where all rows filled entirely by missing and empty_values have been removed.
Cleaner.compact_table (Function)
compact_table(table; empty_values::Vector=[])
Creates a CleanTable with copied columns and removes from it all rows and columns filled entirely by missing and empty_values.
Cleaner.compact_table! (Function)
compact_table!(table::CleanTable; empty_values::Vector=[])
Removes in-place from a CleanTable all rows and columns filled entirely by missing and empty_values.
Cleaner.compact_table_ROT (Function)
compact_table_ROT(table; empty_values::Vector=[])
Returns a new table of the same type as the original table where all rows and columns filled entirely by missing and empty_values have been removed.
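The compact functions share the `empty_values` keyword, so one sketch covers the idea. Here the empty string is treated as empty in addition to missing; the data is invented so that exactly one column and one row qualify for removal.

```julia
import Cleaner
using Cleaner: compact_table

# Illustrative data:
#   column :b holds only "" and missing, so it counts as an empty column;
#   row 2 holds only missing and "", so it counts as an empty row.
raw = (
    a = [1, missing, 3],
    b = ["", missing, ""],
    c = ["x", "", "z"],
)

ct = compact_table(raw; empty_values=[""])

println("size after compacting (rows, columns): ", Cleaner.size(ct))
```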
Cleaner.delete_const_columns (Function)
delete_const_columns(table)
Creates a CleanTable with copied columns and removes each column filled with just a constant value.
Cleaner.delete_const_columns! (Function)
delete_const_columns!(table::CleanTable)
Removes in-place from a CleanTable each column filled with just a constant value.
Cleaner.delete_const_columns_ROT (Function)
delete_const_columns_ROT(table)
Returns a new table of the same type as the original table where all columns filled with just a constant value have been removed.
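A small sketch with invented data, where one column repeats the same value in every row and is therefore expected to be dropped:

```julia
import Cleaner
using Cleaner: delete_const_columns

# Illustrative data: :source is "web" in every row.
raw = (id = [1, 2, 3], source = ["web", "web", "web"])

ct = delete_const_columns(raw)

println("size after removal (rows, columns): ", Cleaner.size(ct))
```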
Cleaner.drop_missing (Function)
drop_missing(table; missing_values::Vector=[])
Creates a CleanTable with copied columns and removes from it all rows where missing or missing_values have been found.
Cleaner.drop_missing! (Function)
drop_missing!(table::CleanTable; missing_values::Vector=[])
Removes in-place from a CleanTable all rows where missing or missing_values have been found.
Cleaner.drop_missing_ROT (Function)
drop_missing_ROT(table; missing_values::Vector=[])
Returns a new table of the same type as the original table where all rows containing missing or missing_values have been removed.
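Unlike the compact functions, a single offending cell is enough to drop a row here. The sketch below uses invented data and treats the placeholder string "n/a" as missing through `missing_values`.

```julia
import Cleaner
using Cleaner: drop_missing

# Illustrative data: row 2 contains `missing`, row 3 contains "n/a".
raw = (a = [1, missing, 3], b = ["x", "y", "n/a"])

ct = drop_missing(raw; missing_values=["n/a"])

println("size after dropping (rows, columns): ", Cleaner.size(ct))
```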
Modifying table schema
Cleaner.reinfer_schema (Function)
reinfer_schema(table; max_types::Int=3)
Creates a CleanTable with copied columns and tries to minimize the number of element types for each column without making the column type Any.
For this, it tries to give the column a Union type with up to max_types types and internally uses Base.promote_typejoin on all numeric types. If that is not possible, the column is left as-is.
Cleaner.reinfer_schema! (Function)
reinfer_schema!(table::CleanTable; max_types::Int=3)
Tries to minimize in-place the number of element types for each column without making the column type Any.
For this, it tries to give the column a Union type with up to max_types types and internally uses Base.promote_typejoin on all numeric types. If that is not possible, the column is left as-is.
Cleaner.reinfer_schema_ROT (Function)
reinfer_schema_ROT(table; max_types::Int=3)
Returns a new table of the same type as the original table, after trying to minimize the number of element types for each column without making the column type Any.
For this, it tries to give the column a Union type with up to max_types types and internally uses Base.promote_typejoin on all numeric types. If that is not possible, the column is left as-is.
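A sketch of the typical case: columns that arrived as Vector{Any} even though their contents are more specific. It assumes Tables.jl is installed for reading the columns back. The data is invented, and the inferred types are printed rather than assumed.

```julia
using Tables
using Cleaner: reinfer_schema

# Illustrative data: both columns are typed Any on purpose.
raw = (
    a = Any[1, 2, 3],          # only integers
    b = Any[1.5, "x", 2],      # a mix of three concrete types
)

ct = reinfer_schema(raw)

# Show the element type chosen for each column.
for name in Tables.columnnames(ct)
    println(name, " => ", eltype(Tables.getcolumn(ct, name)))
end
```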
Cleaner.add_index (Function)
add_index(table)
Creates a CleanTable with copied columns and adds to it a new column containing the row index of the table passed.
Cleaner.add_index! (Function)
add_index!(table::CleanTable)
Adds in-place to the CleanTable table a new column containing the row index.
Cleaner.add_index_ROT (Function)
add_index_ROT(table)
Returns a new table of the same type as the original table where a new column containing the row index of the table passed has been added.
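A minimal sketch with invented data. The name and position of the added column are not stated above, so the code only checks that the table gained one column.

```julia
import Cleaner
using Cleaner: add_index

# Illustrative data: 3 rows, 2 columns.
raw = (name = ["Ana", "Bob", "Eve"], age = [31, 45, 27])

ct = add_index(raw)

println("size with index column (rows, columns): ", Cleaner.size(ct))
```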