Before diving deep into any tool or technology, it's essential to understand why it was developed and what problem it solves. This is equally true for Version Control Systems like Git. Knowing the "why" behind a tool is as important as learning the "how."
So, in this special blog, I will walk you through why and how version control evolved over time up to the present day. We will also cover a few important fundamentals of version control (names and definitions), which you will need in order to understand anything about version control beyond this point.
This blog is the first in the series: Git Basic To Advanced
If you are not interested in learning Git the easy way, or in learning the why of version control, then this blog is not for you. You can stop reading here, check out my other blogs, and thank me for saving your time.
Before digging any further, let's start with the why:
In the beginning, software was developed without any version control system, and developers quickly ran into problems with versions, i.e. tracking changes to files so that the code could be updated or reverted to an earlier version. A crude, manual way of solving this problem was to create a timestamped directory or file at each major milestone, but this was inconvenient and error prone: a developer had to read through the code in those timestamped copies to understand what had changed, and it could take weeks to track down a bug even in a small program. Sometimes people made things even worse while trying to fix them. The solution to this problem came from simple Local Version Control Systems.
Local Version Control:
It was a very simple solution: an automated version of the manual model discussed above, built around a simple database that kept all the changes to files. One of the oldest such systems is RCS (Revision Control System), which was written in C and introduced in 1982. RCS can be called the first generation of VCSs. It is operated through Unix commands and keeps the differences between versions of a file, which makes it possible to recover older versions as needed. RCS stored these differences, i.e. patches, on disk in a special format, and could re-create what any file looked like at any point in time by adding up the patches.
Centralized Version Controls
Although RCS solved the problem of maintaining and managing versions, as software grew there was a need to make collaboration across teams easier, which RCS did not address. So Centralized Version Control Systems were introduced, including CVS (1990), Perforce (1995), and Subversion (2000). These can be called the second generation of version control, and most were developed around the 1990s. A Centralized Version Control System consists of a central server that holds all the versions of the files and the history of changes, from which developers check out files. These tools had some advantages over RCS: each team member could see the work of others, and administrators could monitor teams and projects to a degree. On the downside, centralized version control systems had a serious problem: a single point of failure. If the central server went down for some time, developers could not collaborate during that period. Worse, if the disk holding the central repository got corrupted and no proper backups had been kept, there was no way to recover the project and you lost everything.
Distributed Version Controls
To cope with the above challenges, Distributed Version Control Systems (such as Git, Mercurial, or Darcs) were introduced in the 2000s, between 2003 and 2005. With a DVCS, clients don't just check out the latest snapshot of the files; they fully mirror the repository, including its full history. Every client therefore has a complete copy of the repository and its history from day one, which makes the system resilient: if the server is lost, any client repository can be used to restore it. In other words, each clone is a full backup of the project. Distributed version control also lets teams collaborate from anywhere in the world with internet connectivity, using remote repositories. This allows you to set up several types of workflows that aren't possible in centralized systems, such as hierarchical models.
Now let's move on to the question: what is version control?
Version Control:
Version control, sometimes called revision control or source control, refers to the management of changes to documents, computer programs, large websites, or other collections of information over time so that you can recall specific versions later.
Benefits of using version control:
Additionally, the following functionalities make it more appealing to teams:
What actually is git?
Git is a distributed version control system that helps you track changes to files and collaborate with others on software development projects. It allows multiple developers to work on the same project simultaneously, managing and merging their changes efficiently.
Before moving on to other Git concepts, it's necessary to understand the three states of Git:
Files in Git are mainly in one of the following three states: modified, staged, or committed.
As shown in the diagram, a Git project has three areas that correspond to these states:
Working Directory: The working directory is the current state of your project on your local machine. It includes all the files and directories that make up your project at a particular point in time. It is where you make changes to files before adding them to the staging area and committing them. The working directory is the directory outside the .git directory where you do your work.
Staging Area (Index): The staging area is an intermediate step between your working directory and a commit in the .git directory. It acts as a holding area where you choose which changes to include in the next commit. You can selectively add changes to or remove changes from the staging area before creating a commit, staging either all the modified files or only specific ones.
A good rule of thumb for which files to add to your staging area: add only the files that are relevant to the current commit message, so that the commit explains which files were changed and for what purpose.
The Git directory: This is where Git stores the metadata and object database for your project. It is the most important part of Git, and it is what is copied when you clone a repository from another computer. Once you commit your changes, the snapshot is stored in the .git directory.
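To summarize, the following three steps are involved in recording a history of changes: you modify files in the working directory, you stage the changes you want in the next commit, and you commit, which stores the staged snapshot in the .git directory.
Other basic definitions required to understand more about Git: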
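The modify, stage, and commit cycle can be sketched with a throwaway repository. This is a minimal illustration, not a recommended setup: the temporary directory, the file name notes.txt, and the "Demo User" identity are all made up for the demo.

```shell
# Create a throwaway repository (path and identity are placeholders).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# 1. Modify: create a file in the working directory.
echo "hello" > notes.txt
git status --short        # notes.txt is listed as untracked

# 2. Stage: add the file to the staging area (index).
git add notes.txt
git status --short        # notes.txt is now listed as staged

# 3. Commit: store the staged snapshot in the .git directory.
git commit -q -m "Add notes.txt"
git log --oneline         # shows the single commit just created
```

Running git status between the steps is a cheap way to see which of the three areas a change currently lives in.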
Repository:
A Git repository is a folder or directory where your project's files and revision history are stored. It contains all the commits, branches, and tags associated with your project.
Commits:
A commit in Git represents a snapshot of your project at a specific point in time. It captures the changes made to the files in your repository and includes a unique identifier called a commit hash. Commits are like milestones in your project's history.
Stash:
The git stash command is used to temporarily store uncommitted work in order to clean out your working directory without having to commit unfinished work on a branch.
Branches:
Branches in Git allow you to work on different versions of your project simultaneously. The default branch is usually named "master" or "main." You can create new branches to add features, fix bugs, or experiment without affecting the main branch. Branches provide isolation for different lines of development and can be merged back into the main branch when ready. You can change branches using the git checkout command.
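As a rough sketch of how branches isolate work, the commands below create a branch, commit on it, and switch back. The branch name feature, the file app.txt, and the demo identity are hypothetical; the symbolic-ref line only makes sure the first branch is called main regardless of local defaults.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the initial branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Create a branch called "feature" and switch to it.
git checkout -q -b feature
echo "experimental change" >> app.txt
git commit -q -am "Try an experiment"

# Switch back: main still has the original content.
git checkout -q main
cat app.txt               # prints: v1
git branch                # lists both branches, with main marked as current
```

The experimental commit exists only on feature until you decide to merge it.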
Staged Changes:
Staged changes are modifications to files that have been explicitly marked to be included in the next commit. When you use the command git add
Unstaged Changes:
Unstaged changes are modifications made to files that have not been included in the staging area. These changes are present in your working directory but have not been explicitly marked to be part of the next commit. Unstaged changes can include new files, modified files, or deleted files that have not been staged yet.
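To see the difference between staged and unstaged changes, one option is to compare git diff --cached with plain git diff. The sketch below assumes a scratch repository with a made-up file.txt; the same file ends up with one staged edit and one unstaged edit.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "line 1" > file.txt
git add file.txt
git commit -q -m "Add file.txt"

echo "line 2" >> file.txt      # first edit
git add file.txt               # the first edit is now staged
echo "line 3" >> file.txt      # second edit stays unstaged

git diff --cached              # staged changes: the added "line 2"
git diff                       # unstaged changes: the added "line 3"
```

If you committed at this point, only "line 2" would be part of the commit.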
Working Directory:
The working directory is the current state of your project on your local machine. It includes all the files and directories that make up your project. You make changes to files in the working directory before staging and committing them.
Staging Area (Index):
The staging area is an intermediate step between your working directory and a commit. It acts as a holding area where you can choose which changes to include in the next commit. You selectively add or remove changes from the staging area before creating a commit.
Remote Repositories:
Git allows you to work with remote repositories hosted on servers like GitHub, GitLab, or Bitbucket. Remote repositories enable collaboration and provide a centralized location for sharing code with others. You can push your local commits to a remote repository and pull changes made by others.
Fetch:
Fetch communicates with a remote repository, downloads all the information in that repository that is not in your current one, and stores it in your local Git database. It does not modify your working directory or your local branches.
Clone:
Cloning a repository means creating a local copy of a remote repository on your machine. It downloads the entire revision history and branches, allowing you to work on the project locally. The git clone command is actually something of a wrapper around several other commands. It creates a new directory, goes into it and runs git init to make it an empty Git repository, adds a remote (git remote add) to the URL that you pass it (by default named origin), runs a git fetch from that remote repository, and then checks out the latest commit into your working directory with git checkout.
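A small experiment can confirm that a clone carries the full history. In this sketch a local directory named upstream stands in for a hosted repository URL; that name, the file a.txt, and the identity are assumptions for illustration only.

```shell
work="$(mktemp -d)"
cd "$work"

# Build a small repository to clone from (a stand-in for a remote URL).
git init -q upstream
git -C upstream config user.name "Demo User"
git -C upstream config user.email "demo@example.com"
echo "one" > upstream/a.txt
git -C upstream add a.txt
git -C upstream commit -q -m "First"
echo "two" >> upstream/a.txt
git -C upstream commit -q -am "Second"

# Clone it: the copy receives every commit, not just the latest files.
git clone -q upstream copy
git -C copy log --oneline      # both commits are present in the clone
git -C copy remote -v          # the source is registered as "origin"
```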
Pull:
Pulling is the process of fetching changes from a remote repository and merging them into your local branch. It is used to update your local branch with the latest commits made by others.
Push:
Pushing is the process of sending your local commits to a remote repository. It updates the remote branch with your changes, making them accessible to others.
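Push, fetch, and pull can be tried out locally without any hosting service. The sketch below uses a bare repository (shared.git) as a stand-in for a remote such as GitHub, plus two made-up collaborators, alice and bob; all names and paths are hypothetical.

```shell
work="$(mktemp -d)"
cd "$work"

# A bare repository plays the role of the hosted remote.
git init -q --bare shared.git
git -C shared.git symbolic-ref HEAD refs/heads/main

# Alice creates a repository, commits, and pushes to the remote.
git init -q alice
git -C alice symbolic-ref HEAD refs/heads/main
git -C alice config user.name "Alice"
git -C alice config user.email "alice@example.com"
echo "first" > alice/readme.txt
git -C alice add readme.txt
git -C alice commit -q -m "First commit"
git -C alice remote add origin "$work/shared.git"
git -C alice push -q -u origin main

# Bob clones, then Alice pushes another commit.
git clone -q "$work/shared.git" bob
echo "second" >> alice/readme.txt
git -C alice commit -q -am "Second commit"
git -C alice push -q origin main

# Fetch downloads the new commit but leaves Bob's branch untouched.
git -C bob fetch -q origin
git -C bob log --oneline main..origin/main   # the commit Bob does not have yet

# Pull fetches and then merges it into Bob's current branch.
git -C bob pull -q --ff-only origin main
```

The --ff-only flag is a cautious choice for the demo: the pull succeeds only if Bob's branch can simply move forward, which is the case here because Bob made no commits of his own.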
Merge:
Git allows you to combine changes from different branches by merging them. Merging takes the contents of a source branch and integrates them into a target branch. It combines the commit history of both branches, creating a new merge commit when both branches have new commits (if the target branch has no new commits of its own, Git can simply fast-forward it instead). Merging is commonly used to incorporate feature branches into the main branch or to combine changes made by multiple developers.
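Here is a minimal merge sketch, assuming a scratch repository in which both main and a hypothetical feature branch gain one commit each. Because both sides moved, Git records a merge commit.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base commit"

# Work on a feature branch.
git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Add feature.txt"

# Meanwhile, main also moves forward.
git checkout -q main
echo "main work" > main.txt
git add main.txt
git commit -q -m "Add main.txt"

# Merge feature into main; --no-edit accepts the default merge message.
git merge -q --no-edit feature
git log --oneline --graph      # shows the two lines of history joining
```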
Conflict:
A conflict occurs when Git is unable to automatically merge changes from different branches. It typically happens when two branches have modified the same lines within a file in different ways. Resolving conflicts requires manual intervention, where you choose which changes to keep and how to reconcile the conflicting differences.
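A conflict is easy to provoke on purpose. In this sketch two branches change the same line of a made-up settings.txt; the merge stops, and the conflict is resolved by hand by writing the final content, staging it, and committing. The file name, branch name, and values are illustrative only.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "color = blue" > settings.txt
git add settings.txt
git commit -q -m "Base"

git checkout -q -b feature
echo "color = green" > settings.txt
git commit -q -am "Use green"

git checkout -q main
echo "color = red" > settings.txt
git commit -q -am "Use red"

# Both branches changed the same line, so the merge stops with a conflict.
git merge --no-edit feature || echo "merge stopped: conflict in settings.txt"
cat settings.txt               # contains <<<<<<<, =======, >>>>>>> markers

# Resolve by hand: write the final content, stage it, conclude the merge.
echo "color = green" > settings.txt
git add settings.txt
git commit -q --no-edit
```

In real projects you would edit the file between the conflict markers rather than overwrite it, but the stage-then-commit step that concludes the merge is the same.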
Remote Tracking Branches:
When you clone a remote repository, Git creates remote tracking branches that keep track of the state of branches in the remote repository. These branches are prefixed with the name of the remote repository (e.g., origin/main). Remote tracking branches allow you to easily see the differences between your local branch and the corresponding branch on the remote repository.
Pull Request:
A pull request is a feature commonly found in web-based Git hosting platforms like GitHub, GitLab, and Bitbucket. It is a way to propose changes to a repository. When you create a pull request, you are asking the repository owner to review and merge your changes into their repository. Pull requests facilitate collaboration, code review, and discussion among team members.
Rebase:
Rebase is an alternative to merging that allows you to integrate changes from one branch onto another. It rewrites the commit history, making it appear as if the changes on the rebased branch were made directly on the target branch. Rebase can create a cleaner, linear history but requires caution when used on shared branches.
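For comparison with merging, the following sketch rebases a hypothetical feature branch onto main in a scratch repository. After the rebase the history is linear and contains no merge commit.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base commit"

git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Add feature.txt"

git checkout -q main
echo "main work" > main.txt
git add main.txt
git commit -q -m "Add main.txt"

# Replay the feature commit on top of the latest main.
git checkout -q feature
git rebase -q main
git log --oneline --graph      # a single straight line of three commits
```

The rebased feature commit gets a new hash, which is why rebasing commits that others have already pulled calls for caution.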
Tag:
A tag in Git is a label assigned to a specific commit. It is commonly used to mark important points in your project's history, such as releases or major milestones. Tags provide a way to easily reference specific commits in a more meaningful way than using commit hashes.
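Tagging takes one command. This sketch creates an annotated tag in a scratch repository; the tag name v1.0.0 and its message are example values, not a convention required by Git.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "release notes" > CHANGELOG.txt
git add CHANGELOG.txt
git commit -q -m "Prepare first release"

# Create an annotated tag on the current commit.
git tag -a v1.0.0 -m "First release"
git tag                              # lists: v1.0.0
git rev-parse "v1.0.0^{commit}"      # the commit hash the tag points to
```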
Git Stash
Git stash temporarily shelves (or stashes) changes you've made to your working copy, storing them separately from your working directory so that you can work on something else and then come back and re-apply them later. Stashing is handy if you need to quickly switch context and work on something else, but you're midway through a code change and aren't quite ready to commit.
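A short sketch of the stash round trip, assuming a scratch repository with a made-up work.txt: the uncommitted edit disappears from the working directory when stashed and comes back when popped.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "stable" > work.txt
git add work.txt
git commit -q -m "Stable state"

echo "half-finished idea" >> work.txt   # uncommitted change
git stash -q                   # shelve it; the working directory is clean again
git status --short             # prints nothing
git stash list                 # shows one stash entry

git stash pop -q               # re-apply the change and drop the stash entry
git status --short             # work.txt is listed as modified again
```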
Gitignore:
The .gitignore file is used to specify files and directories that should be ignored by Git. You can define patterns to exclude certain files, such as build artifacts, logs, or temporary files, from being tracked in the repository. The .gitignore file helps keep your repository clean and focused on the essential code and resources.
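As an illustration, the sketch below writes a two-pattern .gitignore and checks its effect. The patterns (*.log and build/) and the file names are examples chosen for the demo; a real project would pick patterns that match its own build outputs.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Ignore all .log files and everything under build/.
printf '%s\n' '*.log' 'build/' > .gitignore

mkdir build
echo "compiled output" > build/app.bin
echo "debug info" > debug.log
echo "print('hi')" > main.py

git status --short                            # lists only .gitignore and main.py
git check-ignore -v debug.log build/app.bin   # shows which pattern matched each path
```

The .gitignore file itself is normally committed so that everyone working on the repository shares the same rules.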
Other blogs from this Series: