1. GIT - BASIC CONCEPTS

Version Control System

A Version Control System (VCS) is software that helps developers work together and maintain a complete history of their work.

Listed below are the functions of a VCS:

- Allows developers to work simultaneously.
- Prevents developers from overwriting each other's changes.
- Maintains a history of every version.

Following are the types of VCS:

- Centralized version control system (CVCS).
- Distributed/Decentralized version control system (DVCS).

In this chapter, we will concentrate only on distributed version control systems, and especially on Git, which is one of them.

Distributed Version Control System

A centralized version control system (CVCS) uses a central server to store all files and enable team collaboration. The major drawback of a CVCS is its single point of failure, i.e., failure of the central server. If the central server goes down for an hour, then during that hour no one can collaborate at all. In the worst case, if the disk of the central server gets corrupted and a proper backup has not been taken, you lose the entire history of the project. This is where a distributed version control system (DVCS) comes into the picture.

DVCS clients not only check out the latest snapshot of the directory, they also fully mirror the repository. If the server goes down, the repository from any client can be copied back to the server to restore it. Every checkout is a full backup of the repository. Git does not rely on the central server, which is why you can perform many operations while you are offline: you can commit changes, create branches, view logs, and so on. You need a network connection only to publish your changes and to fetch the latest changes.
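
The restore scenario described above can be sketched with a few commands. This is only an illustration under stated assumptions: a bare repository in a temporary directory stands in for the central server, and the directory names (server.git, client), the file name, and the identity tom@example.com are hypothetical.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# A bare repository plays the role of the central server (hypothetical name).
git init -q --bare server.git

# A client clones it, commits locally, and publishes the commit.
git clone -q server.git client
cd client
git symbolic-ref HEAD refs/heads/master   # pin the branch name used in this sketch
echo "hello" > readme.txt
git add readme.txt
git -c user.name="Tom" -c user.email="tom@example.com" commit -q -m "Initial commit"
git push -q origin master
cd ..

# Simulate losing the server.
rm -rf server.git

# Restore the server from the client's full copy of the repository.
git clone -q --bare client server.git

original=$(git --git-dir=client/.git rev-parse master)
restored=$(git --git-dir=server.git rev-parse master)
echo "client:  $original"
echo "server:  $restored"
```

Both lines print the same commit hash, because the client held the complete history and not just a snapshot of the files.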

Advantages of Git

Free and open source

Git is released under the GNU General Public License (GPL), an open source license. It is freely available over the internet. You can use Git to manage proprietary projects without paying a single penny. As it is open source, you can download its source code and modify it according to your requirements.

Fast and small

As most operations are performed locally, Git gives a huge benefit in terms of speed. Git does not rely on the central server, so there is no need to interact with the remote server for every operation. The core part of Git is written in C, which avoids the runtime overhead associated with higher-level languages. Though Git mirrors the entire repository, the size of the data on the client side is small. This illustrates how efficiently Git compresses and stores data on the client side.

Implicit backup

The chances of losing data are very low when there are multiple copies of it. The data present on any client mirrors the repository, hence it can be used in the event of a crash or disk corruption.

Security

Git uses a common cryptographic hash function, the Secure Hash Algorithm 1 (SHA-1), to name and identify objects within its database. Every file and commit is checksummed and retrieved by its checksum at the time of checkout. This means that it is impossible to change a file, a date, a commit message, or any other data in the Git database without Git detecting it.

No need of powerful hardware

In the case of a CVCS, the central server needs to be powerful enough to serve the requests of the entire team. For smaller teams this is not an issue, but as the team size grows, the hardware limitations of the server can become a performance bottleneck. In the case of a DVCS, developers do not interact with the server unless they need to push or pull changes. All the heavy lifting happens on the client side, so the server hardware can be very simple indeed.

Easier branching

CVCS uses a cheap copy mechanism. Creating a new branch copies all the code to the new branch, which is time-consuming and inefficient. Deleting and merging branches in a CVCS is also complicated and time-consuming. Branch management with Git, by contrast, is very simple. It takes only a few seconds to create, delete, and merge branches.
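
As a rough illustration of the branch operations mentioned above, the following sketch creates, merges, and deletes a branch in a throwaway repository. The branch name feature, the file contents, and the identity are assumptions made for this example only.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name used in this sketch
git config user.name "Tom"                # hypothetical identity, local to this repo
git config user.email "tom@example.com"

echo "int main(void) { return 0; }" > main.c
git add main.c
git commit -q -m "Initial commit"

# Create a branch and switch to it.
git branch feature
git checkout -q feature

# Work on the branch.
echo "/* sort */" > sort.c
git add sort.c
git commit -q -m "Added sort operation"

# Merge the branch back into master, then delete it.
git checkout -q master
git merge -q feature
git branch -d feature

branches=$(git branch | tr -d ' *')
echo "$branches"
```

After the merge, sort.c is present on master and only the master branch remains.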

DVCS Terminologies

Local Repository

Every VCS tool provides a private workplace as a working copy. Developers make changes in their private workplace and, after a commit, these changes become a part of the repository. Git takes it one step further by providing each developer with a private copy of the whole repository. Users can perform many operations with this repository, such as adding, removing, renaming, and moving files, committing changes, and many more.

Working Directory and Staging Area or Index

The working directory is the place where files are checked out. In a CVCS, developers generally make modifications and commit their changes directly to the repository. Git uses a different strategy: it does not automatically include every modified file in a commit. Whenever you perform a commit operation, Git looks at the files present in the staging area. Only the files present in the staging area are considered for the commit, not all the modified files.

Let us see the basic workflow of Git.

Step 1: You modify a file in the working directory.
Step 2: You add the file to the staging area.
Step 3: You perform a commit operation, which moves the files from the staging area into the local repository. After a push operation, the changes are stored permanently in the Git repository.

Suppose you modified two files, namely "sort.c" and "search.c", and you want a separate commit for each change. You can add one file to the staging area and commit it. After the first commit, repeat the same procedure for the other file.

    # First commit
    [bash] git add sort.c
    # adds file to the staging area
    [bash] git commit -m "Added sort operation"

    # Second commit
    [bash] git add search.c
    # adds file to the staging area
    [bash] git commit -m "Added search operation"

Blobs

Blob stands for Binary Large Object. Each version of a file is represented by a blob. A blob holds the file data but does not contain any metadata about the file. It is a binary file, and in the Git database it is named by the SHA-1 hash of its content. In Git, files are not addressed by names. Everything is content-addressed.

Trees

A tree is an object that represents a directory. It holds blobs as well as other sub-directories. A tree is a binary file that stores references to blobs and trees, and it is likewise named by the SHA-1 hash of the tree object.

Commits

A commit holds the current state of the repository. A commit is also named by its SHA-1 hash. You can consider a commit object as a node of a linked list: every commit object has a pointer to its parent commit object. From a given commit, you can traverse back by following the parent pointer to view the history of the commit. If a commit has multiple parent commits, then that particular commit was created by merging two branches.

Branches

Branches are used to create another line of development. By default, Git has a master branch, which is the same as trunk in Subversion. Usually, a branch is created to work on a new feature. Once the feature is completed, the branch is merged back into the master branch and then deleted. Each branch has a head, a reference to the latest commit on that branch. Whenever you make a commit, the head of the current branch is updated to point to it.

Tags

A tag assigns a meaningful name to a specific version in the repository. Tags are very similar to branches, but the difference is that tags are immutable. In other words, a tag is a branch that nobody intends to modify. Once a tag is created for a particular commit, it is not updated even if you create a new commit. Usually, developers create tags for product releases.

Clone

The clone operation creates a new instance of the repository. It not only checks out the working copy, but also mirrors the complete repository. Users can perform many operations with this local repository. The only time networking gets involved is when the repository instances are being synchronized.

Pull

The pull operation copies changes from a remote repository instance to a local one. It is used for synchronization between two repository instances. This is the same as the update operation in Subversion.

Push

The push operation copies changes from a local repository instance to a remote one. It is used to store the changes permanently in the Git repository. This is the same as the commit operation in Subversion.

HEAD

HEAD is a pointer to the latest commit on the current branch. Whenever you make a commit, HEAD moves to that new commit. The heads of the branches are stored in the .git/refs/heads/ directory.

    [CentOS] ls -1 .git/refs/heads/
    master
    [CentOS] cat .git/refs/heads/master
    ...4502188b0c49
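
The objects and references described in this section can be inspected directly. The following is a small sketch, assuming a fresh throwaway repository; the file content and the identity are made up for the example, and it assumes the default on-disk layout in which branch heads are plain files under .git/refs/heads/.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name used in this sketch
git config user.name "Tom"                # hypothetical identity, local to this repo
git config user.email "tom@example.com"

printf 'hello\n' > sort.c
git add sort.c
git commit -q -m "Added sort operation"

# A blob is named by the hash of its content, not by its file name.
blob=$(git hash-object sort.c)
blob_type=$(git cat-file -t "$blob")

# The commit points to a tree, and the tree lists the blob under its file name.
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
git ls-tree HEAD

# HEAD names the current branch; the branch head stores the latest commit hash.
head_ref=$(git symbolic-ref HEAD)
tip=$(cat .git/refs/heads/master)
echo "$blob_type $tree_type $commit_type"
echo "$head_ref -> $tip"
```

The file name sort.c appears only in the tree listing, which matches the point that blobs carry content but no metadata.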

Revision

A revision represents a version of the source code. Revisions in Git are represented by commits. These commits are identified by SHA-1 secure hashes.

URL

The URL represents the location of the Git repository. The Git URL is stored in the config file.

    [[email protected] tom_repo] pwd
    /home/tom/tom_repo
    [[email protected] tom_repo] cat .git/config
    [core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
    [remote "origin"]
    url = [email protected]:project.git
    fetch = refs/heads/*:refs/remotes/origin/*
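
To see how a URL ends up in the config file, the sketch below registers a remote in a throwaway repository and reads the value back. The URL git@example.com:project.git is a placeholder chosen for this example, not the address of a real server, and no network access takes place.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Register a remote named "origin" with a hypothetical URL.
git remote add origin git@example.com:project.git

# Read the URL back through git config, then look at the file itself.
url=$(git config --get remote.origin.url)
echo "$url"
cat .git/config
```

The config file now contains a [remote "origin"] section with the url entry and a fetch entry that Git adds alongside it.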

2. GIT - ENVIRONMENT SETUP

Before you can use Git, you have to install it and make some basic configuration changes. Below are the steps to install the Git client on Ubuntu and CentOS Linux.

Installation of Git Client

If you are using a Debian-based GNU/Linux distribution, the apt-get command is all you need.

    [ubuntu ] sudo apt-get install git-core
    [sudo] password for ubuntu:
    [ubuntu ] git --version
    git version

If you are using an RPM-based GNU/Linux distribution, then use the yum command as given.

    [CentOS ] su
    Password:
    [CentOS ]# yum -y install gi