Mikael Kalms

Tips for making a Git->Plastic sync of a large Git repo less painful and fragile?


Hi,

I am replicating the UE4 Git repository (https://github.com/EpicGames/UnrealEngine.git) to a repo in Plastic Cloud. It is a large repository; 130k changesets, many of which are large. Is there something I can do to make this particular sync less painful and fragile?

 

My experience so far for the UE4 repository is:

- The initial download step was quick.

- The "Processing Objects" step took ~30-60 mins to complete. During this step cm.exe uses 100% of 1 logical core.

- Writing 130k changesets takes a long time: I estimate 7-10 days. The workstation doing the job restarted itself after 3 days (50k changesets done) due to Windows Update, and that left the repo in a bad state (a second sync then failed after 5k changesets when it encountered some form of bad data).

- I have disabled Windows Update restarts, created another empty repository, and restarted the sync process.  Fingers crossed.

 

Mikael


I have made another attempt. 49k changesets in two days, then a Redis-related timeout occurred:


The changeset '9babb945edf3c08e406047a77b270793e3a3932e' could not be imported. Error: There has been an unexpected error "Timeout performing SET falldamage:52738:trlck, inst: 18, mgr: Inactive, err: never, queue: 186, qu: 0, qs: 186, qc: 0, wr: 0, wq: 0, in: 19508, ar: 0, clientName: azureserver_IN_1, serverEndpoint: Unspecified/prodplasticscmcache.redis.cache.windows.net:6380, keyHashSlot: 14246, IOCP: (Busy=4,Free=996,Min=4,Max=1000), WORKER: (Busy=1,Free=32766,Min=4,Max=32767) (Please take a look at this article for some common client-side issues that can cause timeouts: http://stackexchange.github.io/StackExchange.Redis/Timeouts)". For more information check the server log. Please contact with the support team.

When I tried restarting it,

1) GitSync claimed that there were 90k changesets remaining to pull, and 1 changeset remaining to push

2) The import process failed early on, with the message:


The changeset '9babb945edf3c08e406047a77b270793e3a3932e' could not be imported. Error: The object is currently locked. Try later. RepId:52738 ChangesetId:49123 BranchId:1304815. Please contact with the support team.

 

Based on the above, it seems to me that GitSync is not performant or robust enough to be used for Git repos the size of the UE4 repo. I'm not going to try again. Instead, we will import a few individual snapshots of the UE4 Git repo, without the history.


(For the future, my main concern is not performance. It is robustness. I am concerned when I see that the Git->Plastic sync can leave the Plastic repository in an inconsistent state when interrupted.)


Hi Mikael,

- Regarding the robustness: if you stop the process or kill the operation via Ctrl+C, it should finish syncing the current changeset and then stop the operation. Then you should be able to re-launch it. The problem is when there is a power outage or an uncontrolled shutdown; in that case, the sync may be broken. (See the sketch after this list.)

- @manu performed the same gitsync against a local server (a few months ago) and reports that it took just a few hours. I'm not sure why it's taking so long to sync to your cloud organization. Could you try installing a local Plastic server on your machine? If that operation is much faster, you can then push from the local Plastic server to the cloud.
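
To illustrate that stop-on-a-changeset-boundary behavior, here is a minimal sketch; the names and structure are illustrative, not Plastic's actual internals:

using System;
using System.Collections.Generic;

class SyncLoopSketch
{
    static bool cancelRequested;

    static void Main()
    {
        Console.CancelKeyPress += (sender, e) =>
        {
            e.Cancel = true;          // keep the process alive for now...
            cancelRequested = true;   // ...and stop once the current changeset is done
        };

        foreach (int cs in PendingChangesets())
        {
            ImportChangeset(cs);      // each changeset completes as a unit
            if (cancelRequested)
            {
                Console.WriteLine("Stopped on a changeset boundary; safe to re-launch.");
                break;
            }
        }
    }

    // Placeholders standing in for the real GitSync machinery.
    static IEnumerable<int> PendingChangesets() { for (int i = 0; i < 130000; i++) yield return i; }
    static void ImportChangeset(int cs) { }
}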

Regards,

Carlos-

 



I'm trying out the sync against a local server. After 24 hours it has processed 70k of 133k changesets. That's quicker than against Plastic Cloud, but it will still take multiple days to complete.

I am using a reasonably high-end workstation.

CPU: AMD Ryzen 7 1700X 8-core @ 3.4GHz

Disk: two SSDs (boot partition: Samsung 960 EVO, repo partition: Samsung 860 QVO)

RAM: 32GB

OS: Windows 10 Home.

 

Most time seems to be spent in LibGit2Sharp:

[screenshot: CPU profile, with most of the time inside LibGit2Sharp]

I have disabled logging for cm.exe, but that did not seem to make any major performance difference.

 

The only attribute which stands out for the cm process is memory usage:

[screenshot: Task Manager, cm.exe memory usage]

 

Breaking into it with a debugger, the bulk of the time is spent inside ForeignScm.dll!Codice.Foreign.GitPuller.GetForeignTree. The av9.b() method there spends a lot of time constructing a tree; just from breakpointing I noticed that the while-loops in there could sometimes take 1+ second to complete:

// Decompiled (obfuscated) method: an iterative depth-first walk that
// converts a LibGit2Sharp Tree into Plastic's TreeForeignNode hierarchy.
internal TreeForeignNode b(Tree A_0)
{
    // Two parallel stacks: the Plastic-side node and the Git-side tree.
    Stack<TreeForeignNode> stack = new Stack<TreeForeignNode>();
    Stack<Tree> stack2 = new Stack<Tree>();
    TreeForeignNode treeForeignNode = new TreeForeignNode(A_0.Sha, string.Empty, TreeForeignNode.Mode.Directory, false);
    stack.Push(treeForeignNode);
    stack2.Push(A_0);
    TreeEntryInfo treeEntryInfo = new TreeEntryInfo();
    while (stack2.Count > 0)
    {
        TreeForeignNode treeForeignNode2 = stack.Pop();
        Tree a_ = stack2.Pop();
        TreeIterator treeIterator = this.a(a_);   // presumably: iterator over the tree's entries
        if (treeIterator != null)
        {
            // Inner loop: one pass per tree entry; this is where the time goes.
            while (this.a(treeIterator, treeEntryInfo))
            {
                if (av9.h(treeEntryInfo))
                {
                    TreeForeignNode treeForeignNode3;
                    if (this.a(treeEntryInfo, out treeForeignNode3))   // presumably a cache lookup
                    {
                        treeForeignNode2.AddChild(treeForeignNode3);
                    }
                    else
                    {
                        treeForeignNode3 = this.g(treeEntryInfo);      // build a new node
                        if (treeForeignNode3 != null)
                        {
                            treeForeignNode2.AddChild(treeForeignNode3);
                            if (treeForeignNode3.IsDirectory)
                            {
                                // Push non-empty subdirectories for later traversal.
                                Tree tree = this.b(treeEntryInfo);
                                if (!(tree == null) && tree.Count != 0)
                                {
                                    stack.Push(treeForeignNode3);
                                    stack2.Push(tree);
                                }
                            }
                        }
                    }
                }
            }
        }
    }
    this.a.a(treeForeignNode);
    return treeForeignNode;
}
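
For anyone who wants to reproduce this outside of cm.exe, here is a rough sketch that walks every commit's tree with LibGit2Sharp directly; it mirrors the traversal above (the repo path is an assumption, and the entry counting is mine, not GitSync's code):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using LibGit2Sharp;

class TreeWalkProfiler
{
    static void Main()
    {
        // Assumed local clone path; adjust as needed.
        using (var repo = new Repository(@"C:\src\UnrealEngine"))
        {
            foreach (Commit commit in repo.Commits)
            {
                var sw = Stopwatch.StartNew();
                int entries = CountEntries(commit.Tree);
                Console.WriteLine($"{commit.Sha} {entries} entries in {sw.ElapsedMilliseconds} ms");
            }
        }
    }

    // Iterative depth-first walk, mirroring the decompiled av9.b() above.
    static int CountEntries(Tree root)
    {
        int count = 0;
        var stack = new Stack<Tree>();
        stack.Push(root);
        while (stack.Count > 0)
        {
            foreach (TreeEntry entry in stack.Pop())
            {
                count++;
                if (entry.TargetType == TreeEntryTargetType.Tree)
                    stack.Push((Tree)entry.Target);
            }
        }
        return count;
    }
}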

 

I really don't know why this completes in a few hours for Manu but takes >24h for me. I can imagine three possible explanations:

1) The CLR is somehow running a lot slower on my workstation than it does on Manu's, in general...?

2) There are a lot more changesets in the UnrealEngine repo than there used to be (but the repo has existed for 4 years, so that doesn't make sense)?

3) Interop is a lot slower on my machine than on yours, because of... configuration reasons on my machine?


I have stopped the replication for the time being.

 

If you have ideas on what to test, let me know. I would like to find a solution for this in the next month(s) but it is not urgent for us.


Hi,

I started the same clone operation this morning on my local PC. It has been running for around 7 hours:

Importing... \ 45784/138997
I'm using the "cm sync" command (next time, please also use "cm" so we can get more feedback on the operation). I'm using the office network and my SSD drive.
It doesn't seem to be as fast as Manu reported, but it should finish in less than 24 hours (faster than your test).

Anyway, the initial operation is the one that should take a long time (this is a big repo). After that, the sync for new commits should be very fast.
 
Regards,
Carlos.


I think you are missing the full story by not running the entire test. My machine reached 45k imported changesets after 2.5 hours; it is around the 50k-60k mark that I begin to see real performance problems.
Here are some tools for profiling the situation: https://github.com/kalmalyzer/profile-gitsync in case you are interested.
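
The core of the measurement is simple. A minimal sketch of the approach (assuming the progress lines carry a "done/total" counter like the "Importing... \ 45784/138997" output quoted earlier; the repo linked above does this more thoroughly):

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Pipe cm's progress output through this to log seconds per imported changeset.
class ImportTimer
{
    static void Main()
    {
        var counter = new Regex(@"(\d+)/(\d+)");   // matches "45784/138997"
        var sw = Stopwatch.StartNew();
        long lastMs = 0;
        int lastDone = -1;
        string line;
        while ((line = Console.ReadLine()) != null)
        {
            Match m = counter.Match(line);
            if (!m.Success) continue;
            int done = int.Parse(m.Groups[1].Value);
            if (lastDone >= 0 && done > lastDone)
            {
                double secsPer = (sw.ElapsedMilliseconds - lastMs) / 1000.0 / (done - lastDone);
                Console.WriteLine($"{done},{secsPer:F3}");   // CSV: changeset,seconds
            }
            lastMs = sw.ElapsedMilliseconds;
            lastDone = done;
        }
    }
}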

I have let the sync step run for approx 12 hours now. I am syncing to a local server. I interrupted it after it had imported 72k changesets out of 140k. Here is where the time has been spent:

Stage                Duration (seconds)
Compressing objects       6.402
Downloading             219.25
Processing objects     2823.818
Importing             42205.704

 

This graph shows the time taken (in seconds) to import each changeset. The light-blue line represents individual values; the dark-blue line is a trend line, a moving average over the 10 nearest samples:

[graph: seconds per imported changeset vs. changeset number, with 10-sample moving average]

What this means is that importing the remaining 68k changesets will take a long time.

- Even if the trend were to suddenly stop and settle at 2 seconds/changeset, it would take another 37 hours to import the remaining changesets.

- More likely, the trend continues to rise by 1 second per 10k changesets, and it will take 4 days and 20 hours to import the remaining changesets (see the back-of-envelope sketch after this list).
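
The back-of-envelope math behind those two numbers, as a sketch (the 2.75 s/changeset starting rate is read off the graph, so treat it as an assumption):

using System;

class Extrapolate
{
    static void Main()
    {
        const double remaining = 68000;

        // Optimistic case: the trend stops dead at 2 s/changeset.
        Console.WriteLine($"flat: {remaining * 2.0 / 3600.0:F1} h");          // ~37.8 h

        // Trend case: +1 s per 10k changesets on top of the current rate.
        double now = 2.75;                                  // assumed current rate, from the graph
        double mean = now + (remaining / 10000.0) / 2.0;    // midpoint of a linear ramp
        double hours = remaining * mean / 3600.0;
        Console.WriteLine($"trend: {hours:F0} h (~{hours / 24.0:F1} days)");  // ~116 h ≈ 4 days 20 h
    }
}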


I will disable Windows Update for a month and continue running this on my machine, just to see what the trend looks like.

