Git – GC (Garbage Collection)

The git gc command is used to maintain a repository. “gc” stands for garbage collection: git gc tells Git to clean up the clutter that accumulates in the current repository over time. The term comes from programming languages with dynamic memory allocation, where garbage collection reclaims memory that was allocated but is no longer used by the program. git gc runs a number of housekeeping tasks within the current repository, such as removing unreachable objects that may have been created by earlier git add calls, compressing file revisions (to save disk space and improve performance), pruning the reflog, rerere metadata, and stale working trees, and packing refs. It may also update ancillary indexes such as the commit-graph.
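To see the packing part of this housekeeping, the sketch below creates a throwaway repository, makes one commit, and compares git count-objects -v before and after git gc. It assumes git is installed and on PATH; the file name, identity, and commit message are arbitrary placeholders.

```shell
#!/bin/sh
set -eu

# Create a throwaway repository in a temporary directory.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# One commit creates three loose objects: a blob, a tree and a commit.
echo "hello" > hello.txt
git add hello.txt
git commit -q -m "Add hello.txt"

# "count" is the number of loose objects, "in-pack" the number of packed ones.
loose_before=$(git count-objects -v | sed -n 's/^count: //p')
git count-objects -v

# gc packs the reachable loose objects and deletes the now-redundant loose copies.
git gc --quiet

loose_after=$(git count-objects -v | sed -n 's/^count: //p')
in_pack_after=$(git count-objects -v | sed -n 's/^in-pack: //p')
git count-objects -v
```

After the commit there should be three loose objects; after git gc the loose count should drop to zero and all three objects should be in a pack.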

Running git gc manually should only be needed if you add objects to a repository without regularly running porcelain commands, want a one-off repository optimization, or need to clean up a suboptimal mass import. Various kinds of garbage accumulate in a Git repository, and most of it consists of orphaned, or unreachable, commits. Commits can become unreachable after history-rewriting commands such as git reset or git rebase. Git does not discard such detached commits immediately, in order to preserve history and protect data: a detached commit can still be found (for example through git reflog), inspected, and cherry-picked.
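The following sketch reproduces that situation in a throwaway repository: git reset moves the branch back one commit, so the dropped commit no longer appears in git log, yet git reflog and git show can still reach it. It assumes git is on PATH; file names and commit messages are placeholders.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"

echo "two" >> file.txt
git commit -q -am "Second commit"
dropped=$(git rev-parse HEAD)

# Move the branch back one commit; "Second commit" is no longer on any branch.
git reset -q --hard HEAD~1

# The commit no longer appears in the branch history...
git log --oneline
# ...but the reflog still records it, so it can be inspected or recovered.
git reflog
git show --stat "$dropped"
```

To recover the commit, a new branch can be pointed at it (for example with git branch, using the hash shown by git reflog), or it can be cherry-picked onto the current branch.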

Whenever common commands that create objects are run, Git checks whether the repository has grown substantially since the last maintenance and, if so, runs git gc automatically.

In addition to cleaning up detached commits, git gc compresses stored Git objects to free up valuable disk space. It gathers loose objects into a “pack”, delta-compressing related objects (such as successive versions of the same file) against each other. Packs are roughly the Git equivalent of zip files, and they reside in the .git/objects/pack directory of a repository.
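A minimal sketch that makes a few commits in a throwaway repository, runs git gc, and then looks inside .git/objects/pack. It assumes git is on PATH; the file name and messages are placeholders.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# A few commits so there is something to pack.
for n in 1 2 3; do
  echo "line $n" >> notes.txt
  git add notes.txt
  git commit -q -m "Commit $n"
done

git gc --quiet

# Each pack is a .pack file (the compressed objects) plus a .idx index;
# newer Git versions may also write a .rev reverse-index file.
ls .git/objects/pack
# verify-pack -v lists every object stored in the pack and checks its integrity.
git verify-pack -v .git/objects/pack/pack-*.idx
```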

git gc reads several git config settings before running. Knowing these values helps in understanding the rest of git gc’s responsibilities.

Git gc config:

gc.reflogExpire

This optional variable defaults to 90 days. It specifies how long entries in a branch’s reflog are kept.

gc.reflogExpireUnreachable

This optional variable defaults to 30 days. It specifies how long reflog entries that are no longer reachable from the current branch tip are kept.

gc.aggressiveWindow

This optional variable defaults to 250. It sets the window size used in the delta-compression phase of object packing when git gc is run with the --aggressive option. A larger window lets Git compare each object against more candidates, so it finds better deltas but spends more time doing so.

gc.aggressiveDepth

This optional variable defaults to 50. It specifies the maximum delta depth used by git repack during git gc --aggressive.

gc.pruneExpire

This optional setting defaults to “2 weeks ago”. It determines how long an unreachable loose object is kept before it is pruned.

gc.worktreePruneExpire

This optional setting defaults to “3 months ago”. It specifies how long a stale working tree is kept before it is removed.
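Each of these settings is an ordinary git config key. The sketch below sets them in a throwaway repository and lists them back; the values are examples only (some match the defaults described above, others are deliberately changed), and git is assumed to be on PATH.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q

# Example values only; leave a key unset to fall back to Git's default.
git config gc.reflogExpire "180 days"
git config gc.reflogExpireUnreachable "60 days"
git config gc.aggressiveWindow 250
git config gc.aggressiveDepth 50
git config gc.pruneExpire "1 week ago"
git config gc.worktreePruneExpire "3 months ago"

# List every gc.* key stored in this repository's config
# (Git prints the key names in lower case).
git config --get-regexp '^gc\.'
```

Adding --global to the git config calls would store the same settings for all of the current user's repositories instead of just this one.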

git gc exec:

Behind the scenes, git gc runs a number of other subcommands, such as git pack-refs, git reflog expire, git repack, git prune, git worktree prune, and git rerere gc. Their job is to find any Git objects (and related data such as refs and reflog entries) that fall outside the limits set by the git gc configuration. Once located, these items are compressed (packed) or pruned as needed.
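As a rough approximation of that work, the subcommands can be run by hand in a throwaway repository. This is a sketch, not Git's exact internal sequence: the order and flags git gc actually uses depend on the Git version and configuration, and the two-week expiry below simply mirrors the default gc.pruneExpire. git is assumed to be on PATH.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "data" > data.txt
git add data.txt
git commit -q -m "Initial commit"

# 1. Move loose refs into the packed-refs file.
git pack-refs --all --prune
# 2. Drop reflog entries older than gc.reflogExpire / gc.reflogExpireUnreachable.
git reflog expire --all
# 3. Pack loose objects into one pack and delete the now-redundant loose copies.
git repack -d -l -A
# 4. Delete unreachable loose objects older than the grace period.
git prune --expire=2.weeks.ago
# 5. Remove administrative data for working trees that no longer exist.
git worktree prune
# 6. Forget old recorded conflict resolutions (does nothing if rerere is unused).
git rerere gc
```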

What is the significance of git gc aggressive?

git gc can be run with the --aggressive option, which tells it to put more effort into optimizing the repository. This makes git gc run much slower, but it saves more disk space once it is finished. The effects of --aggressive are long-lasting, so it is usually only worth running after a substantial number of changes have been made to a repository.
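A sketch of running --aggressive in a throwaway repository that holds twenty small revisions of one file. The -c overrides set gc.aggressiveWindow and gc.aggressiveDepth to their documented defaults only to show where those settings plug in; the file name and revision count are arbitrary, and git is assumed to be on PATH.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Build some history: twenty revisions of the same file.
n=1
while [ "$n" -le 20 ]; do
  echo "revision $n" >> history.txt
  git add history.txt
  git commit -q -m "Revision $n"
  n=$((n + 1))
done

# Window and depth can be overridden for a single invocation with -c.
git -c gc.aggressiveWindow=250 -c gc.aggressiveDepth=50 gc --aggressive --quiet

git count-objects -v
```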

How is git prune different from git gc?

git gc is a parent command and git prune is one of its children: git gc invokes git prune as part of its work. git prune deletes loose Git objects that are unreachable and older than the expiry time set by the git gc configuration (gc.pruneExpire).
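The sketch below writes a blob that nothing references with git hash-object -w, confirms with git fsck that it is unreachable, previews the deletion with git prune --dry-run, and then prunes it. --expire=now is used so the freshly written blob is not protected by the usual grace period; the repository is a throwaway one and git is assumed to be on PATH.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "kept" > kept.txt
git add kept.txt
git commit -q -m "Initial commit"

# Write a blob straight into the object database without referencing it anywhere.
orphan=$(echo "nobody points at me" | git hash-object -w --stdin)

# fsck reports it as unreachable.
git fsck --unreachable

# --dry-run only lists what would be removed; --expire=now skips the grace period.
would_prune=$(git prune --dry-run --expire=now)
echo "$would_prune"

# Actually remove it.
git prune --expire=now
```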

What is the meaning of git gc auto?

The git gc --auto variant first checks whether the repository needs any maintenance, and exits without doing any work if it determines that cleanup is not required. Several Git commands run git gc --auto after they finish, to clean up any loose objects they have created. Before doing anything, git gc --auto compares the repository against threshold settings for the number of loose objects and the number of packs; these values can be set with git config. git gc --auto only does its work if the repository exceeds one of these housekeeping thresholds.
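The sketch below shows git gc --auto deciding that nothing needs doing in a small throwaway repository, then disables automatic housekeeping with gc.auto=0. The thresholds are written out with their documented defaults (6700 loose objects for gc.auto, 50 packs for gc.autoPackLimit) purely for illustration; git is assumed to be on PATH.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "a" > a.txt
git add a.txt
git commit -q -m "Initial commit"

# Housekeeping thresholds, shown here with their documented default values.
git config gc.auto 6700
git config gc.autoPackLimit 50

# Three loose objects are far below gc.auto, so --auto exits without packing.
git gc --auto
loose_after_auto=$(git count-objects -v | sed -n 's/^count: //p')

# gc.auto=0 disables automatic housekeeping entirely.
git config gc.auto 0
git gc --auto
```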

git gc options

$ git gc --aggressive

By default, git gc runs fairly quickly while still giving good disk-space efficiency and performance. The --aggressive option optimizes the repository more thoroughly, at the cost of a much slower run. Because its results are long-lasting, it only needs to be used occasionally.

$ git gc --auto

This option makes git gc check whether any housekeeping is required; if none is, it exits without doing anything. Some Git commands run git gc --auto after performing operations that could create many loose objects. Whether housekeeping is actually triggered depends on configuration variables such as gc.auto and gc.autoPackLimit.

$ git gc --prune=<date>

This option prunes loose objects older than <date>, much like the git prune command. The default is two weeks ago, which can be changed with the gc.pruneExpire configuration variable. --prune=now prunes loose objects regardless of their age, which increases the risk of corruption if another process is writing to the repository at the same time.

$ git gc --no-prune

This option tells git gc not to prune any loose objects.
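To contrast --no-prune with --prune=now, the following sketch writes an unreferenced blob into a throwaway repository and checks with git cat-file -e whether it survives each variant. The blob content and file names are placeholders; git is assumed to be on PATH.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "tracked" > tracked.txt
git add tracked.txt
git commit -q -m "Initial commit"

# An unreachable blob that nothing refers to.
orphan=$(echo "scratch" | git hash-object -w --stdin)

# --no-prune: repack, but never delete unreachable objects.
git gc --quiet --no-prune
if git cat-file -e "$orphan" 2>/dev/null; then kept_after_no_prune=yes; else kept_after_no_prune=no; fi

# --prune=now: delete unreachable objects regardless of their age.
git gc --quiet --prune=now
if git cat-file -e "$orphan" 2>/dev/null; then kept_after_prune_now=yes; else kept_after_prune_now=no; fi

echo "kept after --no-prune: $kept_after_no_prune; kept after --prune=now: $kept_after_prune_now"
```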

$ git gc --quiet

This option suppresses all progress reports.

$ git gc --force

This option forces git gc to run even if another git gc instance may already be running on the repository.

$ git gc --keep-largest-pack

Normally, git gc repacks the repository’s objects into a single pack. With this option, all packs except the largest one are consolidated into a single pack, while the largest pack is left as it is. When this option is used, the gc.bigPackThreshold setting is ignored.
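A sketch of --keep-largest-pack in a throwaway repository: after a first git gc there is exactly one pack, and after a second commit git gc --keep-largest-pack leaves that pack file in place and writes the new objects to a separate pack. File names are placeholders and git is assumed to be on PATH.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first" > first.txt
git add first.txt
git commit -q -m "First commit"
git gc --quiet
# Right after a full gc there is a single pack; remember its file name.
first_pack=$(ls .git/objects/pack/pack-*.pack)

echo "second" > second.txt
git add second.txt
git commit -q -m "Second commit"

# The largest (here, the only) existing pack is kept untouched;
# the new loose objects end up in a separate pack.
git gc --quiet --keep-largest-pack
ls .git/objects/pack/pack-*.pack
```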

git gc tries very hard not to delete objects that are referenced anywhere in the repository. In particular, it keeps not only objects referenced by the current branches and tags, but also objects referenced by the index, remote-tracking branches, reflogs (which may reference commits in branches that were later amended or rewound), and anything else in the refs/* namespace. Note that a note created with git notes does not keep the object it is attached to alive. If you expect some objects to be deleted and they are not, check all of these locations and decide whether it makes sense in your case to remove those references.
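The effect of reflog references can be reproduced in a throwaway repository. In this sketch a commit is dropped from the branch with git reset, survives git gc --prune=now because the reflog still points at it, and disappears once the reflog entries are expired with git reflog expire --expire=now --all. File names and messages are placeholders; git is assumed to be on PATH.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "Base"
echo "experiment" >> file.txt
git commit -q -am "Experiment"
experiment=$(git rev-parse HEAD)

# Rewind the branch so the commit is no longer on it.
git reset -q --hard HEAD~1
# git reset also records the old tip in ORIG_HEAD; delete it so that
# the reflog is the only remaining reference to the commit.
git update-ref -d ORIG_HEAD

# Even with --prune=now, gc keeps the commit because the reflog references it.
git gc --quiet --prune=now
if git cat-file -e "$experiment" 2>/dev/null; then kept_by_reflog=yes; else kept_by_reflog=no; fi

# Expire every reflog entry, then gc again: nothing references the commit now.
git reflog expire --expire=now --all
git gc --quiet --prune=now
if git cat-file -e "$experiment" 2>/dev/null; then gone=no; else gone=yes; fi

echo "kept while in reflog: $kept_by_reflog; gone after expiring reflog: $gone"
```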

On the other hand, when git gc runs at the same time as another process, there is a risk of it deleting an object that the other process is using but has not yet created a reference to. This may simply cause the other process to fail, or it may corrupt the repository if the other process later adds a reference to the deleted object. Git has two features that significantly mitigate this problem.

These two features are:

  1. Any object whose modification time is newer than the --prune date is kept, along with everything reachable from it.
  2. Most operations that add an object to the database update the modification time of the object if it is already present, so that the first rule applies to it.

However, these features fall short of a complete solution, so users who run commands concurrently have to accept some risk of corruption (which seems to be low in practice).
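The grace period based on modification time can be observed with git prune --dry-run. In the sketch below, a freshly written, unreferenced blob stands in for an object another process has just created; it is not listed for pruning with the default two-week cut-off, but it is listed with --expire=now. The repository is a throwaway one and git is assumed to be on PATH.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q

# A freshly written object that nothing references yet, standing in for an
# object that another process is about to create a ref for.
fresh=$(echo "work in progress" | git hash-object -w --stdin)

# With the default grace period the object is newer than the cut-off, so it is kept.
with_grace=$(git prune --dry-run --expire=2.weeks.ago)
# With --expire=now there is no grace period and the object would be deleted.
without_grace=$(git prune --dry-run --expire=now)

echo "2.weeks.ago would prune: ${with_grace:-nothing}"
echo "now would prune: ${without_grace:-nothing}"
```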


Last Updated : 20 Apr, 2022