Mikael Kalms

Tips for making a Git->Plastic sync of a large Git repo less painful and fragile?


Hi,

I am replicating the UE4 Git repository (https://github.com/EpicGames/UnrealEngine.git) to a repo in Plastic Cloud. It is a large repository: 130k changesets, many of which are large. Is there something I can do to make this particular sync less painful and fragile?

 

My experience so far for the UE4 repository is:

- The initial download step was quick.

- The "Processing Objects" step took ~30-60 mins to complete. During this step cm.exe uses 100% of 1 logical core.

- Writing 130k changesets takes a long time; I estimate 7-10 days. The workstation doing the job restarted itself after 3 days (50k changesets done) thanks to Windows Update, which left the repo in a bad state (a second sync failed after 5k changesets when it encountered some form of bad data).

- I have disabled Windows Update restarts, created another empty repository, and restarted the sync process.  Fingers crossed.

 

Mikael


I have made another attempt. 49k changesets in two days, then a Redis-related timeout occurred:

Quote

The changeset '9babb945edf3c08e406047a77b270793e3a3932e' could not be imported. Error: There has been an unexpected error "Timeout performing SET falldamage:52738:trlck, inst: 18, mgr: Inactive, err: never, queue: 186, qu: 0, qs: 186, qc: 0, wr: 0, wq: 0, in: 19508, ar: 0, clientName: azureserver_IN_1, serverEndpoint: Unspecified/prodplasticscmcache.redis.cache.windows.net:6380, keyHashSlot: 14246, IOCP: (Busy=4,Free=996,Min=4,Max=1000), WORKER: (Busy=1,Free=32766,Min=4,Max=32767) (Please take a look at this article for some common client-side issues that can cause timeouts: http://stackexchange.github.io/StackExchange.Redis/Timeouts)". For more information check the server log. Please contact with the support team.

When I tried restarting it,

1) GitSync claimed that there were 90k changesets remaining to pull, and 1 changeset remaining to push

2) The import process failed early on, with the message:

Quote

The changeset '9babb945edf3c08e406047a77b270793e3a3932e' could not be imported. Error: The object is currently locked. Try later. RepId:52738 ChangesetId:49123 BranchId:1304815. Please contact with the support team.

 

Based on the above it seems to me that GitSync is not performant/robust enough to be used for Git repos the size of the UE4 repo. I'm not going to try again. Instead, we will import a few individual snapshots of the UE4 Git repo, without the history.


(For the future, my main concern is not performance. It is robustness. I am concerned when I see that the Git->Plastic sync can leave the Plastic repository in an inconsistent state when interrupted.)


Hi Mikael,

- Regarding the robustness: if you stop the process or kill the operation via Ctrl+C, it should finish the sync of the current changeset and stop the operation. Then, you should be able to re-launch it. The problem is when there is a power outage or an uncontrolled shutdown. In that case, the sync may be broken.

- @manu performed the same gitsync against a local server (a few months ago) and he reports that it took just a few hours. I'm not sure why it's taking so long to sync to your cloud organization. Could you try installing a hosted Plastic server on your machine? If the operation is much faster there, you can then push from the local Plastic server to the cloud.

Regards,

Carlos.

 



I'm trying out the sync against a local server. After 24 hours it has processed 70k of 133k changesets. Quicker than against Plastic Cloud _but_ it will still take multiple days to complete.

I am using a reasonably high-end workstation.

CPU: AMD Ryzen 7 1700X 8-core @ 3.4GHz

Disk: two SSDs (boot partition: Samsung 960 EVO, repo partition: Samsung 860 QVO)

RAM: 32GB

OS: Windows 10 Home.

 

Most time seems to be spent in LibGit2Sharp:

[Image: profiler view showing most time spent in LibGit2Sharp]

I have disabled logging for cm.exe, but that did not seem to make any major performance difference.

 

The only attribute which stands out for the cm process is memory usage:

[Image: memory usage of the cm process]

 

Breaking into it with a debugger, the bulk of time is spent inside of ForeignScm.dll!Codice.Foreign.GitPuller.GetForeignTree. That av9.b() method spends a lot of time constructing a tree -- just by breakpointing I would notice that sometimes the while-loops in there could take 1+ second to complete:

internal TreeForeignNode b(Tree A_0)
{
    Stack<TreeForeignNode> stack = new Stack<TreeForeignNode>();
    Stack<Tree> stack2 = new Stack<Tree>();
    TreeForeignNode treeForeignNode = new TreeForeignNode(A_0.Sha, string.Empty, TreeForeignNode.Mode.Directory, false);
    stack.Push(treeForeignNode);
    stack2.Push(A_0);
    TreeEntryInfo treeEntryInfo = new TreeEntryInfo();
    while (stack2.Count > 0)
    {
        TreeForeignNode treeForeignNode2 = stack.Pop();
        Tree a_ = stack2.Pop();
        TreeIterator treeIterator = this.a(a_);
        if (treeIterator != null)
        {
            while (this.a(treeIterator, treeEntryInfo))
            {
                if (av9.h(treeEntryInfo))
                {
                    TreeForeignNode treeForeignNode3;
                    if (this.a(treeEntryInfo, out treeForeignNode3))
                    {
                        treeForeignNode2.AddChild(treeForeignNode3);
                    }
                    else
                    {
                        treeForeignNode3 = this.g(treeEntryInfo);
                        if (treeForeignNode3 != null)
                        {
                            treeForeignNode2.AddChild(treeForeignNode3);
                            if (treeForeignNode3.IsDirectory)
                            {
                                Tree tree = this.b(treeEntryInfo);
                                if (!(tree == null) && tree.Count != 0)
                                {
                                    stack.Push(treeForeignNode3);
                                    stack2.Push(tree);
                                }
                            }
                        }
                    }
                }
            }
        }
    }
    this.a.a(treeForeignNode);
    return treeForeignNode;
}
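
For readers who find the obfuscated names hard to follow: the method above reads like an iterative depth-first walk driven by two parallel stacks, one for output nodes and one for the source trees still to be expanded. Here is a minimal Python sketch of that pattern with readable names. The dict-based tree layout, the name build_tree, and the cache lookup are all assumptions made for illustration; the decompiled names do not reveal what the real lookup (this.a(treeEntryInfo, out ...)) actually does.

```python
def build_tree(root, cache=None):
    """Copy a nested tree iteratively, using explicit stacks instead of recursion.

    `root` is a hypothetical dict {"sha": str, "name": str, "children": [...]};
    file entries simply have no "children" key. `cache` is an assumed
    sha -> node lookup that lets already-built nodes be reused.
    """
    cache = {} if cache is None else cache
    out_root = {"sha": root["sha"], "name": "", "is_dir": True, "children": []}
    node_stack = [out_root]   # output nodes waiting for their children
    tree_stack = [root]       # source trees to expand, kept in step with node_stack
    while tree_stack:
        out_node = node_stack.pop()
        tree = tree_stack.pop()
        for entry in tree.get("children", []):
            cached = cache.get(entry["sha"])
            if cached is not None:
                # Known node: attach it and skip walking its subtree.
                out_node["children"].append(cached)
                continue
            child = {
                "sha": entry["sha"],
                "name": entry["name"],
                "is_dir": "children" in entry,
                "children": [],
            }
            cache[entry["sha"]] = child
            out_node["children"].append(child)
            if entry.get("children"):
                # Non-empty directory: schedule it for expansion.
                node_stack.append(child)
                tree_stack.append(entry)
    return out_root


# Hypothetical two-level tree, not data from the UE4 repository.
example = {
    "sha": "r", "name": "", "children": [
        {"sha": "a", "name": "src", "children": [{"sha": "b", "name": "main.c"}]},
        {"sha": "c", "name": "README"},
    ],
}
tree = build_tree(example)
```

In a walk of this shape, every directory that is not found in the lookup has to be expanded entry by entry, so the cost of one call grows with the number of uncached entries in the tree.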

 

I really don't know why this completes in a few hours for Manu but takes >24h for me. I could imagine it is one of these three things happening:

1) The CLR is somehow running a lot slower on my workstation than it does on manu's workstation, in general...?

2) There are a lot more changesets in the UnrealEngine repo than there used to be (... but, the repo has existed for 4 years, it doesn't make sense)?

3) Interop is a lot slower on my machine than on yours, because of ... configuration reasons on my machine?


I have stopped the replication for the time being.

 

If you have ideas on what to test, let me know. I would like to find a solution for this in the next month(s) but it is not urgent for us.


Hi,

I started the same clone operation this morning on my local PC. Around 7 hours running:

Importing... \ 45784/138997
I'm using the "cm sync" command (next time, please also use "cm" so we can get more feedback on the operation). I'm on the office network and using my SSD drive.
It doesn't seem to be as fast as Manu reported, but it should finish in less than 24 hours (faster than your test).
 
Anyway, the initial operation is the one that should take a long time (this is a big repo). After that, syncing new commits should be very fast.
 
Regards,
Carlos.


I think you are missing the full story by not running the entire test. My machine reached 45k imported changesets after 2.5 hours - it is around the 50k-60k mark that I begin to see real performance problems.
 

 

Here are some tools for profiling the situation: https://github.com/kalmalyzer/profile-gitsync in case you are interested.

I have let the sync step run for approx 12 hours now. I am syncing to a local server. I interrupted it after it had imported 72k changesets out of 140k. Here is where the time has been spent:

Stage                Duration (seconds)
Compressing objects  6.402
Downloading          219.25
Processing objects   2823.818
Importing            42205.704

 

This graph shows the average time taken (in seconds) to import each changeset. The light-blue line represents individual values; the dark-blue line is a trend line, a moving average over the 10 nearest samples:

[Image: graph of import time per changeset]
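
For anyone wanting to reproduce a trend line like this from their own timings, here is a minimal sketch of a moving average over the 10 nearest samples. The function name, the centered window, and the way the window shrinks at the edges are assumptions; the sample values are made up and are not measurements from this sync.

```python
def moving_average(values, window=10):
    """Centered moving average; the window shrinks near both ends of the list."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half)
        chunk = values[lo:hi]  # never empty, because hi > lo for every i
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical per-changeset import times in seconds (illustrative only).
samples = [0.4, 0.5, 3.0, 0.6, 0.5, 0.7, 2.5, 0.6, 0.8, 0.9, 1.0, 4.0]
trend = moving_average(samples)
```

Smoothing like this hides the spikes from individual large changesets, which makes a slow upward drift easier to see.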

What this means is, importing the remaining 68k changesets will take a long time.

- Even if the trend suddenly stopped and settled at 2 seconds/changeset, it would take another 37 hours to import the remaining changesets.

- More likely, the trend continues to rise by 1 second per 10k changesets, and it will take 4 days and 20 hours to import the remaining changesets.
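
The two estimates above can be sketched as a small calculation. The function name and parameters are hypothetical. The flat case uses the figures from the post (68k changesets at 2 seconds each). The post does not state the starting rate used for the rising case, so the start_rate in the second call is an assumption and its result need not match the 4 days and 20 hours quoted above.

```python
def remaining_hours(remaining, start_rate, growth_per_10k=0.0):
    """Estimate hours needed to import `remaining` changesets.

    start_rate: seconds per changeset at the start of the remaining work.
    growth_per_10k: increase in seconds/changeset per 10,000 changesets imported.
    """
    # With linear growth, the mean rate is the start rate plus half the total increase.
    total_increase = growth_per_10k * remaining / 10_000
    mean_rate = start_rate + total_increase / 2
    return remaining * mean_rate / 3600


# Flat scenario from the post: 68k changesets at 2 seconds each.
flat = remaining_hours(68_000, start_rate=2.0)

# Rising scenario; start_rate=2.0 is an assumption, not a figure from the post.
rising = remaining_hours(68_000, start_rate=2.0, growth_per_10k=1.0)
```

The rising estimate is sensitive to the assumed starting rate, so it is worth re-running with whatever rate the graph shows at the point of interruption.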


I will disable Windows Update for a month and continue running this on my machine, just to see what the trend looks like.


After another hiccup (needed to restart machine, forgot I had the sync running in the background), I restarted the sync, and it completed quicker than I had expected - in ~41 hours.

Statistics:

Stage                Duration (seconds)
Compressing objects  5
Downloading          203
Processing objects   2816
Importing            145680
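
As a quick cross-check of the figures above, here is a short sketch that sums the stages and converts the total to hours. It assumes the durations are in seconds, as in the earlier table; the variable names are illustrative.

```python
# Stage durations from the table above, assumed to be in seconds.
stages = {
    "Compressing objects": 5,
    "Downloading": 203,
    "Processing objects": 2816,
    "Importing": 145680,
}

total_seconds = sum(stages.values())
total_hours = total_seconds / 3600
import_share = stages["Importing"] / total_seconds

print(f"total: {total_hours:.1f} h, importing: {import_share:.1%} of the run")
```

Under that assumption the total comes to a little over 41 hours, which agrees with the ~41 hours mentioned above, and the import stage accounts for almost all of it.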

 

Avg time per changeset for importing:

[Image: graph of average import time per changeset]

From the look of this, it appears that time spent per changeset is a function of two things:

1) the previous number of changesets (because there is more history to walk in the git tree) - causing a gradual increase in processing time, compare 0..50k vs 80k-115k.

2) the content of individual changesets (perhaps many changed files -> more tree walks need to be done?) - causing the two major bumps at 60k-80k and 115k-140k.

Anyway -- my fourth, or fifth, attempt to sync a very large Git repository to Plastic has been successful.


Wow! Thank you so much for the update and the detailed analysis! Good to know that the operation was able to complete. 

The sync duration certainly depends on the content of the individual commits, but I'm not sure why it also seems to depend on the previously synced changesets.

Regards,

Carlos.


My guess: I think it has to do with the algorithm used to convert a single commit into a changeset.

I don't think it depends on previously synced changesets on the Plastic side. Rather, it depends on how much history walking happens within the Git library. I presume that the amount of Git-side history walking needed for Plastic to convert a commit into a changeset increases gradually, either because there are more calls to the Git library or because each individual call results in longer walks on average.


Update:

I have a complete replica of the UE4 Git repository on my workstation's local Plastic server. However, my local UE4 repo cannot be replicated in its entirety.

 

/main can be replicated to Plastic Cloud just fine. When I attempt to replicate all branches from my local server to Plastic Cloud, however, I encounter this error:

Quote

 

The sync process was unable to replicate the following branch:

Branch: /4.4
Operation: Push
Source repository: UE4@local
Destination repository: UE4@<org>@cloud

Error description: 
The source branch has two heads. Merge from cs:16316cb0-9a9c-48d8-b3bc-cad4cbe2dad0 to br:/4.4 to unify the heads.

 

... and this is the problematic location on the '4.4' branch:

[Image: the two heads on the 4.4 branch]

As you can see above, there are indeed two heads in the branch. The changesets have been created by the 'cm sync' command, I haven't touched the repository myself.

 

I will proceed by replicating all remaining branches, and see how far that takes me.


If you are using GitSync, you should perform future syncs from the same client (the one where the original mappings were created). Also, a GitSync operation is performed between one specific Plastic repo and a Git repo.

If you want to involve a totally new repo in the sync, it needs to be empty, so that the local mappings are generated in your client as the repo is populated.

I'm guessing that you are replicating a repo in a client where you don't have the original GitSync mappings, and that this is duplicating all the repo history. We need to be careful with this workflow because it could break the repo history.

Regards,

Carlos.

 

