Getting the dirt out

No value, not kept

Adhering to this philosophy, usually we don't want to keep rows or columns filled just with empty values in our table. Empty values can quickly become a big problem to handle when they come in different standards such as Julia's missing, Python's None, R's NA and a diversity of common strings like "", ' ', etc.

As an easy way to handle this common problems we got the compact functions, being them compact_table, compact_columns and compact_rows with their mutating in-place and ROT variants i.e. compact_table!, compact_table_ROT et al.

They all recieve a table as first argument and an optional keyword argument empty_values where you can pass a vector of what you consider being empty values present in your table. By default Julia's missing is always considered an empty value.

julia> using Cleaner

julia> ct = CleanTable([:A, :B, :C], [[missing, missing, missing], [1, missing, 3], ["x", "", "z"]])
┌─────────┬─────────┬────────┐
│       A │       B │      C │
│ Missing │  Int64? │ String │
├─────────┼─────────┼────────┤
│ missing │       1 │      x │
│ missing │ missing │        │
│ missing │       3 │      z │
└─────────┴─────────┴────────┘


julia> compact_columns(ct)
┌─────────┬────────┐
│       B │      C │
│  Int64? │ String │
├─────────┼────────┤
│       1 │      x │
│ missing │        │
│       3 │      z │
└─────────┴────────┘


julia> compact_rows(ct; empty_values=[""])
┌─────────┬────────┬────────┐
│       A │      B │      C │
│ Missing │ Int64? │ String │
├─────────┼────────┼────────┤
│ missing │      1 │      x │
│ missing │      3 │      z │
└─────────┴────────┴────────┘


julia> compact_table(ct; empty_values=[""])
┌────────┬────────┐
│      B │      C │
│ Int64? │ String │
├────────┼────────┤
│      1 │      x │
│      3 │      z │
└────────┴────────┘

You might also feel that columns filled with just a constant value are not adding any value to your table and may prefer to remove them, for those cases we got the delete_const_columns, delete_const_columns! and delete_const_columns_ROT functions.

julia> ct = CleanTable([:A, :B, :C], [[4, 5, 6], [1, 1, 1], String["7", "8", "9"]])
┌───────┬───────┬────────┐
│     A │     B │      C │
│ Int64 │ Int64 │ String │
├───────┼───────┼────────┤
│     4 │     1 │      7 │
│     5 │     1 │      8 │
│     6 │     1 │      9 │
└───────┴───────┴────────┘


julia> delete_const_columns(ct)
┌───────┬────────┐
│     A │      C │
│ Int64 │ String │
├───────┼────────┤
│     4 │      7 │
│     5 │      8 │
│     6 │      9 │
└───────┴────────┘

One missing, remove em all

A more radical aproach can be taken when desired by using drop_missing, drop_missing! or drop_missing_ROT to remove all rows where at least one missing or missing_values has been found.

julia> ct = CleanTable([:A, :B], [[1, missing, 3], ["x", "y", "z"]])
┌─────────┬────────┐
│       A │      B │
│  Int64? │ String │
├─────────┼────────┤
│       1 │      x │
│ missing │      y │
│       3 │      z │
└─────────┴────────┘


julia> drop_missing(ct)
┌────────┬────────┐
│      A │      B │
│ Int64? │ String │
├────────┼────────┤
│      1 │      x │
│      3 │      z │
└────────┴────────┘