Categorical Representation of Data in Julia
Last Updated: 15 Sep, 2021
Julia is a high-level, high-performance dynamic programming language that is easy to learn. When working with data in Julia, we often encounter arrays whose values are drawn from a small number of distinct levels, as in the example below.
Julia
a = ["Geeks", "For", "Geeks", "Useful", "For", "Everybody"]
Here every element is stored as a full string, even though only four distinct values occur.
Categorical Data
Converting the array to the CategoricalArray type, provided by the CategoricalArrays.jl package, gives a more compact representation that also simplifies later tasks such as sorting and grouping. Instead of storing each string in full, a CategoricalArray stores every element as an integer index into a pool of levels.
Julia
using CategoricalArrays  # provides CategoricalArray, categorical, levels, ...
cat = CategoricalArray(a)
In the example above, the indices are stored as UInt32 by default, so the array can represent roughly 2^32 levels.
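To make the index representation concrete, the following minimal sketch prints the levels of an array and the integer code of each element. It assumes the CategoricalArrays.jl package is installed; the variable names `words`, `lv` and `codes` are only illustrative.

```julia
using CategoricalArrays

words = CategoricalArray(["Geeks", "For", "Geeks", "Useful", "For", "Everybody"])

# Distinct values of the array; string levels are sorted by default
lv = levels(words)

# Position of each element within levels(words)
codes = levelcode.(words)

println(lv)
println(codes)
```

Each repeated string maps to the same code, which is why the categorical form is cheaper than storing the strings themselves.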
A CategoricalArray can also hold missing values, as shown below:
Julia
cat = CategoricalArray(["Geeks", "For", "Geeks",
                        missing, missing, "Everybody"])
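As a small sketch of how missing values behave, the snippet below builds the same kind of array under the illustrative name `with_gaps` and inspects it. It assumes CategoricalArrays.jl is installed.

```julia
using CategoricalArrays

with_gaps = CategoricalArray(["Geeks", "For", "Geeks",
                              missing, missing, "Everybody"])

# missing is kept as an element but is not listed among the levels
lv = levels(with_gaps)

# Count how many entries are missing
n_missing = count(ismissing, with_gaps)

println(lv)
println(n_missing)
```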
Levels of the Array
Because the data contains repeated values, it is useful to know which distinct levels are valid. The levels() function takes the array as its argument and returns them.
We can change the order of the levels with the levels!() function, which becomes useful later for sorting and comparisons.
Julia
levels!(cat, ["Geeks", "For", "Everybody"]);
levels(cat)
Sorting the array then follows the new order of the levels rather than alphabetical order.
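A minimal sketch of sorting by a custom level order, assuming CategoricalArrays.jl is installed. The variable names `v` and `sorted` are illustrative and not part of the original example.

```julia
using CategoricalArrays

v = CategoricalArray(["Geeks", "For", "Geeks", "Everybody"])

# Replace the default (alphabetical) order with a custom one
levels!(v, ["Geeks", "For", "Everybody"])

# sort uses the level order, not the string order
sorted = sort(v)

println(sorted)
```

With the default levels the result would start with "Everybody"; after levels!() the "Geeks" entries come first.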
Compression of levels
By default a CategoricalArray stores its indices as UInt32 and can therefore hold roughly 2^32 levels. When that many levels are not required, the compress() function switches to the smallest integer type that can hold the existing levels. For an array with only a handful of levels this is UInt8, which allows roughly 2^8 levels.
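The effect of compress() can be sketched as follows, assuming CategoricalArrays.jl is installed. The names `wide` and `narrow` are illustrative. The third type parameter printed by typeof() is the integer type used for the indices.

```julia
using CategoricalArrays

wide = CategoricalArray(["Geeks", "For", "Geeks", "Everybody"])

# compress returns a copy; `wide` itself is left unchanged
narrow = compress(wide)

println(typeof(wide))    # index type is UInt32 by default
println(typeof(narrow))  # index type shrinks to UInt8 for so few levels
```

The two arrays still compare equal element by element; only the storage of the indices differs.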
Categorical function
Instead of calling the CategoricalArray constructor, we can use the categorical() function, which accepts keyword arguments. For example, setting the compress keyword to true compresses the array as it is created.
Julia
cat2 = categorical(["Geeks", "For", "Geeks"], compress = true)
The ordered keyword is used in the same way: setting it to true marks the levels of the array as ordered, so that its elements can be compared.
Julia
cat3 = categorical(["Geeks", "For", "Geeks"], ordered = true)
Order of the levels
We can compare elements of a categorical array with operators such as <. When the array is not ordered, the comparison throws an error.
When the array is ordered, the comparison returns true or false according to the order of the levels.
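The two cases can be sketched as below, assuming CategoricalArrays.jl is installed. The size labels and the names `sizes`, `unordered`, `is_smaller` and `raised` are hypothetical and chosen only because their order is easy to follow.

```julia
using CategoricalArrays

# Ordered array with an explicit level order
sizes = categorical(["small", "large", "medium"], ordered = true)
levels!(sizes, ["small", "medium", "large"])

# "small" comes before "large" in the level order
is_smaller = sizes[1] < sizes[2]

# Unordered array: comparing with < is expected to throw
unordered = categorical(["small", "large"])
raised = try
    unordered[1] < unordered[2]
    false
catch
    true
end

println(is_smaller)
println(raised)
```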
We can check whether an array is ordered with the isordered() function.
We can change an unordered array to an ordered one, and vice versa, with the ordered!() function.
Once the array is ordered, comparisons between its elements work.
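A minimal sketch of switching the ordered flag and testing it, assuming CategoricalArrays.jl is installed. The data and the names `grades`, `before`, `after` and `comparison` are illustrative.

```julia
using CategoricalArrays

grades = categorical(["B", "A", "C", "A"])

# Arrays are unordered unless requested otherwise
before = isordered(grades)

# Mark the array as ordered in place
ordered!(grades, true)
after = isordered(grades)

# With the default sorted levels, "A" comes before "B"
comparison = grades[2] < grades[1]

# Switch back to an unordered array
ordered!(grades, false)

println((before, after, comparison, isordered(grades)))
```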
Categorical data in a DataFrame
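In DataFrames.jl releases before 1.0, the categorical!() function converts one or more columns of a DataFrame to categorical in place. Its first argument is the DataFrame, the optional second argument selects the columns to convert, and keyword arguments such as compress may follow. The function was deprecated in DataFrames.jl 0.22 and is not available in 1.0 and later, where transform!() combined with categorical() does the same job.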
Julia
using DataFrames
df = DataFrame(A = ["A", "A", "A", "B", "B", "C"],
               B = ["D", "E", "E", "F", "G", "G"])
We can convert a specific column of the DataFrame to a categorical type.
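One way to convert a single column that works without categorical!() is to assign the result of categorical() back to the column. This is a sketch that assumes DataFrames.jl and CategoricalArrays.jl are installed; the name `people` is illustrative.

```julia
using DataFrames, CategoricalArrays

people = DataFrame(A = ["A", "A", "A", "B", "B", "C"],
                   B = ["D", "E", "E", "F", "G", "G"])

# Replace column A with its categorical version; column B stays as strings
people.A = categorical(people.A)

println(typeof(people.A))
println(typeof(people.B))
```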
If we don't specify the columns, every column with an AbstractString element type is converted to categorical. Passing compress = true applies compression to all of the converted columns.
Julia
categorical!(df, compress = true)
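On DataFrames.jl 1.0 and later, where categorical!() is not available, a similar result can be sketched with transform!(). This is one possible equivalent rather than the original author's code. It assumes DataFrames.jl and CategoricalArrays.jl are installed, and the names `table`, `string_cols` and the extra column `n` are illustrative.

```julia
using DataFrames, CategoricalArrays

table = DataFrame(A = ["A", "A", "A", "B", "B", "C"],
                  B = ["D", "E", "E", "F", "G", "G"],
                  n = 1:6)

# Names of the columns whose element type is a string type
string_cols = names(table, AbstractString)

# Convert each of them in place, keeping the original column names
transform!(table,
           string_cols .=> (col -> categorical(col, compress = true)),
           renamecols = false)

println(typeof(table.A))
println(typeof(table.n))
```

The numeric column `n` is not selected by names(table, AbstractString) and is therefore left untouched.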
We can check the element types of the DataFrame's columns with the eltype() function.
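A short sketch of that check, assuming DataFrames.jl and CategoricalArrays.jl are installed. The name `frame` and its columns are illustrative; eltype is broadcast over the columns returned by eachcol().

```julia
using DataFrames, CategoricalArrays

frame = DataFrame(A = categorical(["A", "A", "B"]),
                  B = ["D", "E", "E"],
                  n = [1, 2, 3])

# Element type of a single column
a_type = eltype(frame.A)

# Element types of all columns, in column order
col_types = eltype.(eachcol(frame))

println(a_type)
println(col_types)
```

A categorical column reports a CategoricalValue element type, while the untouched columns report String and Int.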