Version Control Systems - Part 1

Nayan Pahuja - Sep 15 '23 - Dev Community

Millions of developers today use Git or GitHub. Committing the day's work has become a routine part of almost every programmer's day. If you were applying for a development position today, you would be expected to understand Git and GitHub, or some other kind of version control system.

I first ran into this two years ago, in my first year of college, when I started out with basic development and coding. A VCS is something developers can't imagine working without today, and yet, in my experience, college courses don't explicitly cover it.

I will try to make this a two-part series on Version Control Systems and, subsequently, on Git, today's de facto standard VCS.
In this post I will cover how a VCS works and what exactly happens to your data, so let's get started.

Version Control Systems:

What are Version Control Systems?

Developers have been using VCSs for a long time now. To put it as a definition:

Version Control Systems are tools used to track changes to source code, or to other collections of files and folders.

As the name implies, Version Control Systems help us maintain a history of changes while also facilitating collaboration.

Gone are the days when we had to manually apply changes to our codebase line by line when working with another team member. A VCS does all of that and more for us today.

VCSs track changes to a folder and its contents in a series of snapshots, where each snapshot encapsulates the entire state of files/folders within a top-level directory. VCSs also maintain metadata like who created each snapshot, messages associated with each snapshot, and so on.

Why Version Control Systems?

Well, we have all played games, and we rarely finish them in one sitting (well, sometimes). So what do we do? We save the game at the last checkpoint, log in the next day, and continue from the same spot. If we make a mistake, we go back to the last checkpoint and start again. Why could we not do that with our code? That is exactly what VCSs do. We can easily track where we went wrong and where things were still right, or try new things without breaking the existing working code.

Types of Version Control Systems and their History:

  1. Local Version Control Systems:

    • History: Local VCSs were some of the earliest forms of version control. They operated on a single computer and kept track of changes made to files using a simple database or the file system.
  2. Centralized Version Control Systems (CVCS):

    • Examples: Concurrent Versions System (CVS), Subversion (SVN).
    • History: CVCSs introduced the concept of a central server that stores the repository. Developers check out code from this central location and commit their changes back to it. While this improved collaboration compared to local VCSs, it had limitations in scalability and offline access, and concurrent edits could lead to conflicts more easily.
  3. Distributed Version Control Systems (DVCS):

    • Examples: Git, Mercurial, Bazaar.
    • History: DVCSs represent a significant evolution in version control. Instead of relying on a central server, each developer has their own local repository, which is a complete copy of the entire project history. Developers can work offline, branch easily, and commit locally before pushing changes to a shared central repository.

Features of Modern VCSs

A modern VCS is an invaluable tool for seeing what other people have changed, as well as for resolving conflicts in concurrent development. It also lets you answer questions such as:

  • Who authored this module?
  • When was this specific line in the particular file last modified, and who made the changes? What was the reason behind the modification?
  • Within the past 100 revisions, when and for what reason did a specific unit test cease to function properly?

While many VCSs exist, Git is the de facto standard for version control today.

This fun xkcd comic sums up how half of the developer community uses Git when starting out, and I have certainly done this as well.

Git Comic


Git

Let's use Git to understand a little of how modern DVCSs work, and explore how Git works internally.

When I began learning Git, I found that starting with the command-line interface was quite confusing. Instead of grasping the underlying concepts, I ended up memorizing a handful of commands and relied on them like magic spells whenever I ran into problems.
I am fairly confident I am not the only one who has done this.

Git may not have the prettiest interface, but its core design and ideas are elegant. Instead of just memorizing commands, it helps to start from the basics: understand Git's data model first, and then get into the command line. Once you grasp the data model, Git commands become logical operations on that model instead of magical incantations that somehow work.

Git Data Model:

While there are numerous ad-hoc approaches to version control, Git stands out with its carefully designed model that underpins essential version control features such as history maintenance, branch support, and collaborative capabilities.

Snapshots:

Imagine you are working on something in parts, but the catch is that you can't remember exactly where you left off.
What you can do is take a picture of where you were working last time. When you come back, you can refer to that picture and see what has changed since.

Every time you commit, Git takes a snapshot (not literally a picture; the example was just for understanding).
Git can also compare the current state of your code with the last snapshot and show your changes as the differences between the two.
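To make "changes as differences between snapshots" concrete, here is a minimal sketch using Python's standard `difflib` module. It only stands in for the idea; Git's own diff machinery is more involved. The two snapshots and the added line are made up for illustration.

```python
import difflib

# Two hypothetical states of the same file, as lists of lines.
last_snapshot = ["Hey This Side Nayan\n"]
working_copy = ["Hey This Side Nayan\n", "Welcome to my blog\n"]

# The change is not stored anywhere; it is computed by comparing the two states.
diff = list(difflib.unified_diff(
    last_snapshot,
    working_copy,
    fromfile="index.txt (last snapshot)",
    tofile="index.txt (now)",
))
print("".join(diff), end="")
```

Lines starting with `+` exist only in the newer state, and lines starting with `-` exist only in the older one.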

In Git terminology, a file is called a “blob” (Binary Large Object), and it’s just a bunch of bytes. A directory is called a “tree”, and it maps names to blobs or trees (so directories can contain other directories). A snapshot is the top-level tree that is being tracked.

Let's make some files and folders to understand this in a better way.

~$ mkdir blogExample
~$ cd blogExample
~/blogExample$ echo "Hey This Side Nayan" > index.txt
~/blogExample$ git init
~/blogExample$ git add index.txt
~/blogExample$ git commit -m "First commit, added introduction."



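Because we ran git init inside blogExample, that directory is the top-level tree of the snapshot. After the commit, the snapshot looks like this:

<root> (tree, the blogExample directory)
|
+- index.txt (blob, contents = "Hey This Side Nayan")
|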

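If it helps to see the definitions as data types, here is a minimal sketch of blobs and trees in Python. The class and function names are my own for illustration, not Git's actual implementation. It builds the snapshot for the commit above, where the only tracked file is index.txt.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Blob:
    """A file: just a bunch of bytes."""
    data: bytes


@dataclass(frozen=True)
class Tree:
    """A directory: maps names to blobs or to other trees."""
    entries: Dict[str, Union["Blob", "Tree"]]


def describe(node: Union[Blob, Tree], name: str = "<root>", depth: int = 0) -> List[str]:
    """Render a tree as indented lines, one per blob or tree."""
    pad = "  " * depth
    if isinstance(node, Blob):
        return [f"{pad}{name} (blob, {len(node.data)} bytes)"]
    lines = [f"{pad}{name} (tree)"]
    for child_name, child in sorted(node.entries.items()):
        lines.extend(describe(child, child_name, depth + 1))
    return lines


# A snapshot is simply the top-level tree being tracked.
# echo adds a trailing newline, so the blob holds 20 bytes.
snapshot = Tree({"index.txt": Blob(b"Hey This Side Nayan\n")})
print("\n".join(describe(snapshot)))
```

Since a tree can contain other trees, nested directories need nothing extra: they are just more entries.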

Version History:

How should a VCS relate snapshots to one another? Version history is one of the most crucial parts of a VCS. The simplest model is a linear history, something like this:

Linear Commit History

But Git doesn't work exactly this way. It uses a common data structure known as a directed acyclic graph (DAG) to relate the snapshots.

What this essentially means in Git is that each snapshot I create has a reference to a set of "parents," which are the snapshots that came before it. It's a set of parents, not just a single one (as you would have in a linear history), because a snapshot can be connected to multiple parents. This can happen, for instance, when I merge two parallel branches of development.

DAG

Though I have used the word commit in the images, what those circles essentially represent are snapshots taken by Git.

A fork in the graph might correspond to, for example, two separate features being developed in parallel, independently of each other. In the future, these branches may be merged to create a new snapshot that incorporates both features.

This is essentially what branching is.
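Here is a minimal sketch of that graph in Python, assuming a commit is nothing more than a message plus a tuple of parents. The names and messages are hypothetical, and real Git commits carry more data (the snapshot itself, author, and so on).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Commit:
    message: str
    parents: Tuple["Commit", ...] = ()  # empty for the very first commit


def history(tip: Commit) -> List[Commit]:
    """Every commit reachable from tip by following parents, each listed once."""
    seen, order, stack = set(), [], [tip]
    while stack:
        commit = stack.pop()
        if id(commit) in seen:
            continue  # already visited via another branch
        seen.add(id(commit))
        order.append(commit)
        stack.extend(commit.parents)
    return order


root = Commit("First commit, added introduction.")
feature_a = Commit("Add feature A", (root,))               # two branches that
feature_b = Commit("Add feature B", (root,))               # share one parent
merge = Commit("Merge features A and B", (feature_a, feature_b))

print([c.message for c in history(merge)])
```

Edges only ever point from a commit to its parents, so the graph has no cycles, and the merge commit is the only one here with two parents.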

Commits in Git are immutable. This doesn’t mean that mistakes can’t be corrected, however; it’s just that “edits” to the commit history are actually creating entirely new commits, and references are updated to point to the new ones.
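A small sketch of what that means in practice, assuming a reference is just a name that maps to a commit. The branch name "main" and the commit messages are made up, and this single-parent `Commit` is a simplification for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional


@dataclass(frozen=True)
class Commit:
    message: str
    parent: Optional["Commit"] = None


first = Commit("First commit, added introduction.")
typo = Commit("Fix tpyo", parent=first)

# References are the mutable part: a plain name -> commit mapping.
refs = {"main": typo}

try:
    typo.message = "Fix typo"  # editing a commit in place is not allowed
    edited_in_place = True
except FrozenInstanceError:
    edited_in_place = False

# "Editing" means creating a replacement commit and moving the reference.
fixed = Commit("Fix typo", parent=typo.parent)
refs["main"] = fixed

print(edited_in_place, refs["main"].message, "| old commit still says:", typo.message)
```

The old commit is untouched; it simply stops being what the reference points to.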

GitHub and Origin:

You can create a Git repository on your own computer and manage it locally, taking advantage of Git's features such as branches and commits. However, if you intend to collaborate with others on the project, this approach won't suffice. One alternative is to copy the entire project to another computer and regularly export changes, sending them back and forth for updates, but this is far from ideal. What you will end up wanting is a central location where the project data is stored, a hub for downloading and uploading changes and commits. By convention, this central location is referred to as "origin." You can host it on your own server, but there are also many free hosting services, such as GitHub, that you can use for this purpose.


Conclusion Part 1:

While VCSs (Git and others) offer numerous features, such as committing, pushing changes, and merging, the main takeaway from this post should be a fundamental understanding of what Git is. For those encountering version control systems for the first time, the initial confusion can be overwhelming. However, I hope this post has given you a clear idea of how VCSs operate and why they are invaluable tools in the world of software development.

In the next part, I will cover more of Git's object data model: how it uses SHA-1 hashing to store data, how a commit points to a snapshot rather than containing it, and more, including Git commands.
