
Descriptive Statistics in Julia

Last Updated : 12 Oct, 2020

Julia is well suited to data analysis. Its Statistics standard library and packages such as StatsBase.jl provide functions for descriptive statistics, which summarize the main characteristics of a dataset and give a quick overview of it.

Packages required for performing Descriptive Statistics in Julia:

  • Distributions.jl: It provides a large collection of probability distributions and related functions such as sampling, moments, entropy, probability density, logarithm, maximum likelihood estimation, distribution composition, etc.
  • StatsBase.jl: It provides basic support for statistics. It consists of various statistics-related functions, such as scalar statistics, high-order moment computation, counting, ranking, covariances, sampling, and empirical density estimation.
  • CSV.jl: It is used for reading and writing Comma Separated Values (CSV) files.
  • DataFrames.jl: It is used for creating and manipulating tabular data structures (data frames).
  • StatsPlots.jl: It is used to draw various statistical plots.

Steps to perform Descriptive Statistics in Julia:

Step 1: Installing Required Packages

The following commands can be used to install the required packages:

using Pkg
Pkg.add("Distributions")
Pkg.add("StatsBase")
Pkg.add("CSV")
Pkg.add("DataFrames")
Pkg.add("StatsPlots")

Step 2: Importing the Required Packages

Julia




# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions 
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV 
  
# For creation of Data Structures 
using DataFrames  
  
# For representing various plots
using StatsPlots


Step 3: Creating simulated Data (Random Variables)

Let’s create two variables filled with random values: a numerical one (Age) and a categorical one (BloodGrp).

Example:

Julia




# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions 
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV 
  
# For creation of Data Structures 
using DataFrames  
  
# For representing various plots
using StatsPlots 
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Blood groups drawn uniformly from four categories
BloodGrp = rand(["A", "B", "O", "AB"], 100);
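
The draws above give every blood group the same probability. When unequal probabilities are wanted, one possible approach is to combine a distribution object from Distributions.jl with weighted sampling from StatsBase.jl. The sketch below is only illustrative: the weights and the seed are made-up values, and the lowercase variable names are chosen here, not taken from the steps above.

```julia
using Random          # standard library, used for a reproducible seed
using Distributions   # DiscreteUniform
using StatsBase       # sample, Weights

Random.seed!(42)      # arbitrary seed so repeated runs give the same data

# Ages drawn from a discrete uniform distribution on 10..95
age = rand(DiscreteUniform(10, 95), 100)

# Blood groups drawn with unequal probabilities (hypothetical weights)
groups   = ["A", "B", "O", "AB"]
weights  = Weights([0.30, 0.20, 0.45, 0.05])
bloodgrp = sample(groups, weights, 100)
```

With these weights, "O" should appear far more often than "AB" in bloodgrp, whereas rand(groups, 100) treats all four groups alike.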


Step 4: Performing Descriptive statistics

Common statistical functions in Julia include mean(), median(), var(), and std(), which compute the mean, median, variance, and standard deviation of the data, respectively. For a fuller summary in a single call, the StatsBase package provides describe() and summarystats().

Example:

Julia




# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions  
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV  
  
# For creation of Data Structures  
using DataFrames  
  
# For representing various plots
using StatsPlots  
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Blood groups drawn uniformly from four categories
BloodGrp = rand(["A", "B", "O", "AB"], 100); 
  
# mean of Age variable
mean(Age)
  
# median of Age variable
median(Age)
  
# Variance of Age variable
var(Age)
  
# Standard deviation of Age variable
std(Age)
  
# Descriptive statistics of Age variable
describe(Age)
  
# summarystats function excludes type
summarystats(Age)
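
Because Age is random, the numbers printed by the calls above change from run to run. To make the definitions concrete, here is a small sketch on a fixed vector (the values are chosen purely for illustration) using only the Statistics standard library. It also shows that var() and std() divide by n - 1 by default, and that the corrected keyword switches to the population versions.

```julia
using Statistics   # standard library: mean, median, var, std

x = [2, 4, 4, 4, 5, 5, 7, 9]

m   = mean(x)                     # 5.0
med = median(x)                   # 4.5
v   = var(x)                      # sample variance, divides by n - 1 = 7
s   = std(x)                      # square root of the sample variance

# Population versions divide by n instead of n - 1
vp = var(x; corrected = false)    # 4.0
sp = std(x; corrected = false)    # 2.0

# The same sample variance written out from its definition
v_manual = sum((x .- m) .^ 2) / (length(x) - 1)
```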



Step 5: Creating data frames from the simulated data

Storing the simulated data in a data frame makes it easier to inspect, filter, and manipulate.

Example:

Julia




# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions  
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV  
  
# For creation of Data Structures  
using DataFrames  
  
# For representing various plots
using StatsPlots  
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Blood groups drawn uniformly from four categories
BloodGrp = rand(["A", "B", "O", "AB"], 100);  
  
# Creation of data frame
DF = DataFrame(AGE = Age, BGRP = BloodGrp);
  
# number of rows and columns
size(DF)
  
# First 5 rows (first() replaces head() from older DataFrames.jl releases)
first(DF, 5)
  
# Last 5 rows (last() replaces tail())
last(DF, 5)
  
# Selecting specific data only
# Rows in which BGRP == "AB"
DFAB = DF[DF.BGRP .== "AB", :]
  
# Rows in which AGE > 50
DF50 = DF[DF.AGE .> 50, :]
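
Once the data is in a data frame, CSV.jl can save it to disk and read it back. The following is a minimal sketch, assuming a small hand-written data frame in place of the simulated one; the file name simulated.csv and the temporary directory are hypothetical choices made for this example.

```julia
using CSV
using DataFrames

# Small hand-written data frame standing in for the simulated one
df = DataFrame(AGE = [23, 67, 45], BGRP = ["A", "O", "AB"])

# Write to a temporary directory; "simulated.csv" is a hypothetical file name
path = joinpath(mktempdir(), "simulated.csv")
CSV.write(path, df)

# Read the file back into a new data frame
df2 = CSV.read(path, DataFrame)
```

The second argument of CSV.read names the type that should hold the parsed table, here a DataFrame.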



Step 6: Descriptive Statistics using DataFrame Objects

  • describe() function can be used to perform descriptive statistics of the data objects.

Example:

Julia




# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions  
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV  
  
# For creation of Data Structures  
using DataFrames  
  
# For representing various plots
using StatsPlots  
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Blood groups drawn uniformly from four categories
BloodGrp = rand(["A", "B", "O", "AB"], 100);  
  
# Creation of data frame
DF = DataFrame(AGE = Age, BGRP = BloodGrp);
  
# Perform descriptive statistics of data frame
describe(DF)



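  • The number of rows in each category of a categorical variable can be counted by grouping the data frame on that variable. The by() function did this in older DataFrames.jl releases; current releases use groupby() together with combine() instead.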

Example:

Julia




# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions  
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV  
  
# For creation of Data Structures  
using DataFrames  
  
# For representing various plots
using StatsPlots  
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Blood groups drawn uniformly from four categories
BloodGrp = rand(["A", "B", "O", "AB"], 100);  
  
# Creation of data frame
DF = DataFrame(AGE = Age, BGRP = BloodGrp);
  
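# Counting the number of rows 
# with blood groups A, B, O, AB
# (groupby + combine replace by() from older DataFrames.jl releases)
combine(groupby(DF, :BGRP), sdf -> DataFrame(Total = size(sdf, 1)))
  
# Counting the number of rows
# with blood groups A, B, O, AB 
# using the built-in nrow function
combine(groupby(DF, :BGRP), nrow => :Total)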
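
For a plain frequency table, StatsBase.jl also offers countmap() and proportionmap(), which work directly on a vector. The sketch below uses a small hand-written data frame (illustrative values only) so that the counts are easy to check by eye, and compares them with the grouped count.

```julia
using StatsBase    # countmap, proportionmap
using DataFrames   # groupby, combine, nrow

df = DataFrame(AGE = [23, 67, 45, 31, 58],
               BGRP = ["A", "O", "A", "AB", "O"])

counts = countmap(df.BGRP)        # Dict: value => number of occurrences
shares = proportionmap(df.BGRP)   # Dict: value => relative frequency

# The same counts as a data frame, one row per group
totals = combine(groupby(df, :BGRP), nrow => :Total)
```

Here counts["A"] is 2 and shares["A"] is 0.4, since "A" appears in two of the five rows.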



  • The descriptive statistics of different numerical variables can be calculated after separating them by categorical variables.

Example:

Julia




# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions  
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV  
  
# For creation of Data Structures  
using DataFrames  
  
# For representing various plots
using StatsPlots  
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Blood groups drawn uniformly from four categories
BloodGrp = rand(["A", "B", "O", "AB"], 100);  
  
# Creation of data frame
DF = DataFrame(AGE = Age, BGRP = BloodGrp);
  
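# Mean AGE of Blood groups A, B, AB, O
# (groupby + combine replace by() from older DataFrames.jl releases)
combine(groupby(DF, :BGRP), :AGE => mean => :MeanAge)
  
# Using the describe function 
# we can get the complete descriptive statistics of each group
for sub in groupby(DF, :BGRP)
    println("Blood group: ", first(sub.BGRP))
    describe(sub.AGE)
end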
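
Several summary statistics per group can also be collected into a single table by passing more than one column => function => name pair to combine(). This is a sketch on a small hand-written data frame; the output column names (N, Min, Mean, and so on) are arbitrary labels chosen for this example.

```julia
using Statistics   # mean, median, std
using DataFrames   # groupby, combine, nrow

df = DataFrame(AGE = [20, 40, 30, 50, 60],
               BGRP = ["A", "A", "O", "O", "O"])

# One row per blood group, one column per statistic
stats = combine(groupby(df, :BGRP),
    nrow            => :N,
    :AGE => minimum => :Min,
    :AGE => mean    => :Mean,
    :AGE => median  => :Median,
    :AGE => maximum => :Max,
    :AGE => std     => :Std)
```

For group "A" (ages 20 and 40) this gives N = 2, Min = 20, Mean = 30.0, Median = 30.0, and Max = 40.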



Step 7: Visualizing Data using Plots

The StatsPlots package works with DataFrames through the @df macro, which lets a plotting call refer to the columns of a data frame by name (for example, :AGE). In the following examples:

  • Let’s analyze the Age distribution of the Blood groups A, B, AB, O:

Example:

Julia




# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions  
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV  
  
# For creation of Data Structures  
using DataFrames  
  
# For representing various plots
using StatsPlots  
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Blood groups drawn uniformly from four categories
BloodGrp = rand(["A", "B", "O", "AB"], 100);  
  
# Creation of data frame
DF = DataFrame(AGE = Age, BGRP = BloodGrp);
  
# Plotting density plot
@df DF density(
   :AGE,
   group = :BGRP,
   xlab = "Age",
   ylab = "Distribution"    
)



  • Let’s create a box-and-whisker plot of Age:

Example:

Julia




# Descriptive Statistics in Julia
# Importing required packages to perform descriptive statistics
  
# For random variable creation
using Distributions  
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV  
  
# For creation of Data Structures  
using DataFrames  
  
# For representing various plots
using StatsPlots  
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Blood groups drawn uniformly from four categories
BloodGrp = rand(["A", "B", "O", "AB"], 100);  
  
# Creation of data frame
DF = DataFrame(AGE = Age, BGRP = BloodGrp);
  
# Plotting Box plot
@df DF boxplot(
  :AGE,
  xlab = "Age",
  ylab = "Distribution"    
)
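
The box plot above pools all ages into a single box. To compare blood groups side by side, one option is to pass the categorical column as the first argument so that each group gets its own box. This is a sketch under the same simulated-data assumptions; the lowercase names df and plt are chosen here for illustration.

```julia
using DataFrames
using StatsPlots

# Simulated data, generated the same way as in the earlier steps
df = DataFrame(AGE = rand(10:95, 100),
               BGRP = rand(["A", "B", "O", "AB"], 100))

# One box per blood group
plt = @df df boxplot(
    :BGRP,                   # categorical variable on the x-axis
    :AGE,                    # numerical variable summarized by each box
    xlabel = "Blood group",
    ylabel = "Age",
    legend = false
)
```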




