Mikael Kalms

Mikael Kalms last won the day on August 12

Mikael Kalms had the most liked content!

Community Reputation

2 Neutral

About Mikael Kalms

  • Rank
    Advanced Member

  1. I think you are missing the full story by not running the entire test. My machine reached 45k imported changesets after 2.5 hours; it is around the 50k-60k mark that I begin to see real performance problems. In case you are interested, here are some tools for profiling the situation: https://github.com/kalmalyzer/profile-gitsync

     I have let the sync step run for approx 12 hours now. I am syncing to a local server. I interrupted it after it had imported 72k changesets out of 140k. Here is where the time has been spent:

     Stage                  Duration (seconds)
     Compressing objects             6.402
     Downloading                   219.25
     Processing objects           2823.818
     Importing                   42205.704

     This graph shows the average time taken (in seconds) for importing each changeset. The light-blue line represents individual values; the dark-blue line is a trend line, formed with a moving average over the 10 nearest samples.

     What this means is that importing the remaining 68k changesets will take a long time.
     - Even if the trend were to stop suddenly and settle at 2 seconds/changeset, it would take another 37 hours to import the remaining changesets.
     - More likely, the trend continues to rise by 1 second per 10k changesets, and it will take 4 days and 20 hours to import the remaining changesets.

     I will disable Windows Update for a month and continue running this on my machine, just to see what the trend looks like.
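The two estimates in this post are plain arithmetic over a per-changeset import time. Here is a minimal sketch of that arithmetic, assuming the import time grows linearly with the number of changesets. The function names are hypothetical. The post does not state the rate at which the rising case starts, so `start_rate` is left as a parameter and the 4 days 20 hours figure is not reproduced here.

```python
def remaining_seconds(remaining, start_rate, slope_per_changeset=0.0):
    """Estimate the time needed to import `remaining` changesets.

    start_rate: seconds per changeset at the point where the import resumes.
    slope_per_changeset: growth of the per-changeset time for each imported
    changeset (0.0 models a trend that has flattened out).
    """
    # Area under a straight line: rectangle plus triangle.
    return remaining * start_rate + 0.5 * slope_per_changeset * remaining ** 2


def as_days_hours(seconds):
    """Convert seconds to (days, hours), dropping any partial hour."""
    hours = int(seconds // 3600)
    return divmod(hours, 24)


# Flat case from the post: 68k changesets at 2 seconds each.
print(as_days_hours(remaining_seconds(68_000, 2.0)))  # (1, 13)
```

The flat case comes out at 1 day and 13 hours, which is the 37 hours quoted in the post. For the rising case, a slope of 1 second per 10k changesets corresponds to `slope_per_changeset=1 / 10_000`.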
  2. I'm trying out the sync against a local server. After 24 hours it has processed 70k of 133k changesets. That is quicker than against Plastic Cloud, _but_ it will still take multiple days to complete.

     I am using a reasonably high-end workstation:
     CPU: AMD Ryzen 7 1700X 8-core @ 3.4GHz
     Disk: two SSDs (boot partition: Samsung 960 EVO, repo partition: Samsung 860 QVO)
     RAM: 32GB
     OS: Windows 10 Home

     Most time seems to be spent in LibGit2Sharp. I have disabled logging for cm.exe, but that did not seem to make any major performance difference. The only attribute which stands out for the cm process is memory usage.

     Breaking into it with a debugger, the bulk of time is spent inside of ForeignScm.dll!Codice.Foreign.GitPuller.GetForeignTree. That av9.b() method spends a lot of time constructing a tree -- just by breakpointing I would notice that sometimes the while-loops in there could take 1+ second to complete:

         internal TreeForeignNode b(Tree A_0)
         {
             Stack<TreeForeignNode> stack = new Stack<TreeForeignNode>();
             Stack<Tree> stack2 = new Stack<Tree>();
             TreeForeignNode treeForeignNode = new TreeForeignNode(A_0.Sha, string.Empty, TreeForeignNode.Mode.Directory, false);
             stack.Push(treeForeignNode);
             stack2.Push(A_0);
             TreeEntryInfo treeEntryInfo = new TreeEntryInfo();
             while (stack2.Count > 0)
             {
                 TreeForeignNode treeForeignNode2 = stack.Pop();
                 Tree a_ = stack2.Pop();
                 TreeIterator treeIterator = this.a(a_);
                 if (treeIterator != null)
                 {
                     while (this.a(treeIterator, treeEntryInfo))
                     {
                         if (av9.h(treeEntryInfo))
                         {
                             TreeForeignNode treeForeignNode3;
                             if (this.a(treeEntryInfo, out treeForeignNode3))
                             {
                                 treeForeignNode2.AddChild(treeForeignNode3);
                             }
                             else
                             {
                                 treeForeignNode3 = this.g(treeEntryInfo);
                                 if (treeForeignNode3 != null)
                                 {
                                     treeForeignNode2.AddChild(treeForeignNode3);
                                     if (treeForeignNode3.IsDirectory)
                                     {
                                         Tree tree = this.b(treeEntryInfo);
                                         if (!(tree == null) && tree.Count != 0)
                                         {
                                             stack.Push(treeForeignNode3);
                                             stack2.Push(tree);
                                         }
                                     }
                                 }
                             }
                         }
                     }
                 }
             }
             this.a.a(treeForeignNode);
             return treeForeignNode;
         }

     I really don't know why this completes in a few hours for Manu but takes >24h for me. I could imagine it is one of these three things happening:
     1) The CLR is somehow running a lot slower on my workstation than it does on Manu's workstation, in general...?
     2) There are a lot more changesets in the UnrealEngine repo than there used to be (... but the repo has existed for 4 years, so it doesn't make sense)?
     3) Interop is a lot slower on my machine than on yours, because of ... configuration reasons on my machine?

     I have stopped the replication for the time being. If you have ideas on what to test, let me know. I would like to find a solution for this in the next month(s) but it is not urgent for us.
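The decompiled method in this post is hard to read because of the obfuscated names. As a readable analogue only (not the actual Plastic SCM code), the sketch below shows the same shape of traversal: two stacks pushed and popped in lockstep, one holding output nodes and one holding the matching source directories. The `Node` class and the nested-dict input are assumptions made purely for illustration.

```python
class Node:
    def __init__(self, name, is_directory):
        self.name = name
        self.is_directory = is_directory
        self.children = []


def build_tree(root_entries):
    """Copy a nested dict (dict = directory, anything else = file) into Nodes."""
    root = Node("", True)
    nodes = [root]            # output nodes still waiting for their children
    sources = [root_entries]  # source directory matching each output node
    while sources:
        parent = nodes.pop()
        entries = sources.pop()
        for name, value in entries.items():
            child = Node(name, isinstance(value, dict))
            parent.children.append(child)
            # Only non-empty directories are queued for a later pass.
            if child.is_directory and value:
                nodes.append(child)
                sources.append(value)
    return root
```

Every entry of every directory is visited once per call, so the cost of one call grows with the total size of the tree being rebuilt.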
  3. Hi, I was looking to change the database path for my local server, but had quite a bit of trouble finding the appropriate documentation. It took me three or four rounds of searching until I had connected the dots. Here are things that would have helped me get an answer more quickly:

     In the https://www.plasticscm.com/documentation/administration/plastic-scm-version-control-administrator-guide document...
     * In "Chapter 16: Configuration files", the links in the 'db.conf' section are broken (they lead to anchors in the guide that have been renamed since - #Chapter8:Databasesetup vs #Chapter8:Repositorystorageconfiguration). It would have helped me if 1) this section had correct links, and 2) it mentioned that db.conf is for configuring SQL-type databases.
     * In "Chapter 16: Configuration files", there is no mention of the 'jet.conf' file (there is a section for it in Chapter 8, however). It would have helped me if 1) there was a separate section about jet.conf in Chapter 16, and 2) it mentioned that jet.conf is for configuring Jet-type databases.
     * It would have helped me if there was a default 'jet.conf' present in "c:\Program Files\PlasticSCM5\server", as that would have allowed me to discover the setting(s) by looking through all the server's config files.

     Mikael
  4. It seems I need to have a trailing slash when listing the root of a folder xlink:

         cm ls /Folder1/Folder2/Folder3 --tree=<treeref>
         cm ls /Folder1/Folder2/XlinkFolder/ --tree=<treeref>    <===== notice trailing slash required
         cm ls /Folder1/Folder2/XlinkFolder/Folder4 --tree=<treeref>

     That trailing slash sometimes being required threw me off when I was testing out cm ls earlier.

     It seems that cm ls within an xlink outputs the branch/changeset info from within that repository. It makes sense, but I think it is different from what I need (I will need to translate changeset numbers within xlink repos into terms of the base repo).

     Thanks for the help so far. Will report back if I make some progress.
  5. This helps, thanks.

     cm ls <folder in workspace> shows contents within the folder. I can tweak that to list contents within the parent folder, and find the item I'm after. The {checkout} information is particularly interesting. If I run it against a local workspace, I get checkout information and it seems to descend into xlinks as well.

     cm ls <folder/tree ref on server> will unfortunately not follow xlinks as far as I can tell; that makes it less useful to me.

     At this point I have two options:
     1. Create a local workspace, update workspace, perform one cm ls per query. When I'm testing this on my workstation with the repo in Plastic Cloud, a cm ls takes about half a second.
     2. Build a custom indexing service, which listens to checkins, uses cm diff to find out which files were affected in each changeset, expands xlinks manually (how? I'm not sure, haven't found how I can get details of an xlink via the command line - do you have any tips?), incrementally updates a DAG whose sole purpose is to respond to these queries, and handles bulk queries against the data structure.

     I suspect that, given the size of the repos/subrepos, and the query performance I'm after, I will need to build that indexing service.
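The core of such an indexing service could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not a design the post commits to: all names are hypothetical, the list of changed paths is assumed to have been extracted already (for example from cm diff output, whose parsing is not shown), xlinks are not expanded, and the history is treated as a single branch where a higher changeset number means a later change.

```python
import posixpath


class FolderIndex:
    """Maps each folder path to the latest changeset that touched it."""

    def __init__(self):
        self.latest = {}

    def record_checkin(self, changeset_id, changed_paths):
        # Walk from each changed file up to the root, so that every
        # ancestor folder learns about the changeset.
        for path in changed_paths:
            folder = posixpath.dirname(path)
            while True:
                if self.latest.get(folder, -1) < changeset_id:
                    self.latest[folder] = changeset_id
                if folder in ("/", ""):
                    break
                folder = posixpath.dirname(folder)

    def latest_changeset(self, folder):
        """Return the latest changeset for `folder`, or None if unknown."""
        return self.latest.get(folder)
```

Each checkin costs one dictionary update per ancestor folder of each changed file, and each query is a single dictionary lookup, which is what makes bulk queries cheap.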
  6. (For the future, my main concern is not performance. It is robustness. I am concerned when I see that the Git->Plastic sync can leave the Plastic repository in an inconsistent state when interrupted.)
  7. I have made another attempt: 49k changesets in two days, then a Redis-related timeout occurred.

     When I tried restarting it,
     1) GitSync claimed that there were 90k changesets remaining to pull, and 1 changeset remaining to push
     2) The import process failed early on, with an error message.

     Based on the above it seems to me that GitSync is not performant/robust enough to be used for Git repos the size of the UE4 repo. I'm not going to try again. Instead, we will import a few individual snapshots of the UE4 Git repo, without the history.
  8. Hi, I am replicating the UE4 Git repository (https://github.com/EpicGames/UnrealEngine.git) to a repo in Plastic Cloud. It is a large repository: 130k changesets, many of which are large. Is there something I can do to make this particular sync less painful and fragile?

     My experience so far for the UE4 repository is:
     - The initial download step was quick.
     - The "Processing Objects" step took ~30-60 mins to complete. During this step cm.exe uses 100% of 1 logical core.
     - Writing 130k changesets takes a long time; I estimate 7-10 days. The workstation which was doing the job restarted itself after 3 days (50k changesets done) thanks to Windows Update, and that left the repo in a bad state (so a second sync failed after 5k changesets when it encountered some form of bad data).
     - I have disabled Windows Update restarts, created another empty repository, and restarted the sync process. Fingers crossed.

     Mikael
  9. Over a year later: we have realized that another workflow that works is to use GitSync between the GitHub repo and the corresponding repo in Plastic Cloud. That way, anyone on the team can perform the GitSync if necessary. We have stopped using local repos; everyone works directly against Plastic Cloud. We notice no performance problems, and collaboration is easier than when using a push-pull workflow. We intend to explore using Xlinks again.
  10. Thanks for the response! Testing testing...

      Both these commands can be used to produce a list with all the changesets that have affected the given folder:

          cm find "revision where item='<path_to_folder>' on repositories '<repository name>@<server name>'" --format="{changeset}" --nototal
          cm history serverpath:<path_to_folder>#br:/<branch>@<repository name>@<server name> --format="{changesetid}"

      After testing these commands a bit, I realize that I am looking for something a bit different: All these will look for changes across all branches. They do not provide the same result as the Changeset numbers in the workspace explorer.

      Let me rephrase the question: "From the aspect of changeset X, what is the latest changeset that has affected folder Y?"

      Here is how I make that check manually, in Plastic SCM:
      - Check out changeset X
      - Locate folder Y in the Workspace Explorer
      - Observe the Changeset number for the folder

      I would love to find a way to answer this without needing to have a local workspace. However, I haven't found any way to use cm find / cm history to produce a similar result. The output has either been confined to changesets on a single branch (which is too narrow) or changesets across all branches, regardless of whether or not they have merged into the branch on which changeset X resides (which is too wide).

      (Note: a similar command in Git would be: git log --max-count 1 --format="format:"%H"" [revision range] <path_to_folder> -- it prints the most recent SHA1 associated with the folder. I'm not 100% sure it gives the kind of result I'm after but it looks promising.)
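The rephrased question amounts to a search over the ancestors of changeset X, following merge links as well as branch parents. The sketch below illustrates that idea on an in-memory graph. It is not something cm offers: the function and both dictionaries are hypothetical, and it assumes that a higher changeset number always means a later changeset.

```python
def latest_affecting(changeset, parents, touched, folder):
    """Return the newest ancestor of `changeset` (itself included) that
    touched `folder`, or None if no ancestor did.

    parents: dict of changeset id -> list of parent ids (merge sources included)
    touched: dict of changeset id -> set of folders changed by that changeset
    """
    seen = set()
    pending = [changeset]
    best = None
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        if folder in touched.get(current, ()) and (best is None or current > best):
            best = current
        pending.extend(parents.get(current, ()))
    return best


# Main branch: 1 -> 2 -> 4 -> 6. Side branch: 1 -> 3 -> 5.
# Changeset 6 merges changeset 3 (but not 5) into main.
parents = {1: [], 2: [1], 3: [1], 4: [2], 5: [3], 6: [4, 3]}
touched = {2: {"/Y"}, 3: {"/Y"}, 5: {"/Y"}}
print(latest_affecting(6, parents, touched, "/Y"))  # 3
```

In this example a query confined to the main branch would answer 2 (too narrow) and a query across all branches would answer 5 (too wide), while the ancestor walk answers 3.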
  11. Hi,

      Let's say that I have the UE4 engine repo (5.5GB data, 120k files, 130k changesets) mirrored into a Plastic repository. I would like to answer the question "what is the changeset ID of the latest change that affected the folder 'Engine/Source/Programs/UnrealBuildTool' or any files within it?"

      Is there a command-line query (using "cm find") that answers this efficiently? Can this be extended to consider several folders at the same time?

      I am hoping to make 50-100 of these queries whenever there is a checkin to the repository. The results will be used as a fast and coarse dependency check: which of the 50-100 modules (executables/dlls) that reside within the repository will need to go through the full rebuild process?

      (If it is not feasible to get the condensed results via 'cm find' then my backup plan is to list the changeset ID + path for every single file in the repository, and condense the results myself in C# code.)
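The backup plan of condensing a full per-file listing can be sketched as follows. This is written in Python rather than the C# mentioned in the post, and it is only an illustration: the function names and the input format (pairs of path and changeset ID) are assumptions, and it treats a higher changeset ID as a later change.

```python
def latest_per_module(file_changesets, module_folders):
    """Condense (path, changeset_id) pairs into the highest changeset ID
    found at or below each module folder."""
    latest = {folder: None for folder in module_folders}
    for path, changeset_id in file_changesets:
        for folder in module_folders:
            # Match on a path boundary so that 'Tool' does not match 'ToolExtras'.
            if path == folder or path.startswith(folder.rstrip("/") + "/"):
                if latest[folder] is None or changeset_id > latest[folder]:
                    latest[folder] = changeset_id
    return latest


def modules_to_rebuild(latest, last_built):
    """Modules whose latest change is newer than the changeset last built."""
    return sorted(
        folder
        for folder, changeset_id in latest.items()
        if changeset_id is not None and changeset_id > last_built.get(folder, -1)
    )
```

With 50-100 module folders and 120k files this is a few million string comparisons per run, which is likely acceptable for a coarse check. A sorted path list or a trie would reduce that if it ever became a bottleneck.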
  12. v2.22 of the PlasticSCM Jenkins plugin includes the fix. Thanks! See JIRA issue JENKINS-50284. Confirmed that this works for us: we are again using Plastic for hosting our Jenkins Shared Library.
  13. Hi, we have been using Plastic Cloud + Jenkins + Unity up until a few months ago. Support staff may have better answers, but hopefully this will get you going:

      This is happening because Jenkins is by default not running as your local user, but under the Local System account. C:\WINDOWS\system32\config\systemprofile\appdata is the %APPDATA% folder for the Local System account.

      You can change which account is used when launching Jenkins if you want to. Change these settings by opening "Services" in Windows, locating the Jenkins entry in the long list, and editing its properties. Ideally you should create a separate account (probably named "Jenkins") and run the Jenkins service under that. The config file location will then be C:\Users\Jenkins\AppData\Local\plastic4\client.conf.

      You need to ensure that two files exist in the config dir: "cryptedservers.conf" and one "*.key" file. You can copy the cryptedservers.conf / *.key file pair from your local user's Plastic folder. It is also possible to create these files programmatically, but that's a fair bit more work.

      1+ year ago there were no hooks available in Plastic Cloud. We configured Jenkins to poll at 2-minute intervals; that was good enough for our purposes. The syntax to do so is to give a schedule like "H/2 * * * *" in Jenkins.

      ----

      You may also find these references useful:
      Automation scripts for setting up and deleting Jenkins build slaves, for a Plastic/Jenkins/Unity build system: https://github.com/falldamagestudio/JenkinsAutomation
      Performance profiling results for a Jenkins build system: https://blog.falldamagestudio.com/posts/performance-optimizing-a-small-build-system-for-unity/
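Copying the cryptedservers.conf / *.key pair can be scripted. The sketch below is one way to do it and is only an illustration: the function name is hypothetical, and both directories are passed in by the caller rather than assumed.

```python
import shutil
from pathlib import Path


def copy_plastic_credentials(source_dir, target_dir):
    """Copy cryptedservers.conf and any *.key files from source_dir to
    target_dir, and return the names of the files that were copied."""
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    candidates = [source / "cryptedservers.conf", *sorted(source.glob("*.key"))]
    copied = []
    for path in candidates:
        if path.is_file():
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied
```

The source would be the local user's Plastic config folder and the target the config folder of the account running Jenkins, for example C:\Users\Jenkins\AppData\Local\plastic4 as in the post. Other files such as client.conf are deliberately left alone.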
  14. ... We just gave up on Jira's Next-Gen projects, and switched to Classic projects instead. This was due to various other features that were lacking in Next-Gen projects.
  15. Hi, I have been testing how well Plastic SCM's Jira integration works with Jira Cloud and Jira's Next-Gen projects. Here is what I have so far:

      * Configuring the Jira integration using the Plastic UI behaved weirdly: If I went into Preferences / Issuetrackers and configured a Jira integration and pressed OK, the configuration would be active, there would be a jira.conf file written to my hard drive, but if I opened up Preferences / Issuetrackers again I would not see the currently-active configuration in the GUI. Solution: Configure once (to get a jira.conf file), then proceed from there by modifying the jira.conf file manually.
      * For Jira Cloud, the 'username / password' should be an email address / an API token.
      * It is reasonably straightforward to set up Plastic SCM to be able to create branches from issues. This works fine with Next-Gen projects.
      * When creating a custom field for Plastic SCM information, it is difficult to find the ID of the custom field in a Next-Gen project. I needed to talk directly to Jira's REST API to find it.
      * Plastic SCM cannot write information about check-ins to Jira issues in Next-Gen projects. This is because Next-Gen projects do not yet support Text (multi-line) fields; the only text field that is available currently is Text (single-line), which is limited to max 255 chars in length; this is not sufficient for a single entry from Plastic SCM. This is the kind of error that you will see in the log if you try this:

            ERROR jiraextensionrest - There was a problem putting to '/rest/api/2/issue/JIR-8': The remote server returned an error: (400) Bad Request.
            ERROR jiraextensionrest - Response from the server: {"errorMessages":[],"errors":{"customfield_10105":"The entered text is too long. It exceeds the allowed limit of 255 characters."}}

      * Plastic SCM can transition Jira ticket statuses based on commit message keywords. This is easy to set up and works with Next-Gen projects.
      * There are some recurring error messages in the logs, but I don't know what impact they have:

            ERROR jiraextensionrest - There was a problem getting '/rest/api/2/mypreferences': The remote server returned an error: (404) Not Found.
            ERROR jiraextensionrest - Response from the server: {"errorMessages":["key not found: 'plastic.diffchangeset.url'"],"errors":{}}
            ERROR jiraextensionrest - There was a problem getting '/rest/api/2/mypreferences': The remote server returned an error: (404) Not Found.
            ERROR jiraextensionrest - Response from the server: {"errorMessages":["key not found: 'plastic.formatdata'"],"errors":{}}

      Another observation: In the "Create new child branch from task" dialog, the "Mark as open in issue tracker" checkbox is cleared by default, its setting is not remembered between multiple branch creations, and the Plastic admin cannot control this centrally. I'm not sure that this is the default that you want. I also suspect that the default of this setting would be useful to have in <issuetracker>.conf.

      From here on, we will either transition to Jira Classic projects (so that we can enable Plastic SCM to write check-in information to the issues), or we will not use the Plastic-Jira integration.
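The post does not say which REST call was used to find the custom field ID. One endpoint that lists fields together with their IDs is GET /rest/api/2/field, so the sketch below uses that as an assumption. The function names and the field name "Plastic SCM" in the usage note are hypothetical. Authentication follows the email address / API token pairing described in the post.

```python
import base64
import json
import urllib.request


def find_custom_field_ids(fields, name):
    """Return the IDs of custom fields whose display name equals `name`.

    `fields` is the decoded JSON list returned by GET /rest/api/2/field.
    """
    return [f["id"] for f in fields if f.get("custom") and f.get("name") == name]


def fetch_fields(base_url, email, api_token):
    """Download the field list from a Jira Cloud site using basic auth."""
    credentials = base64.b64encode(f"{email}:{api_token}".encode()).decode()
    request = urllib.request.Request(
        base_url.rstrip("/") + "/rest/api/2/field",
        headers={
            "Authorization": "Basic " + credentials,
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `find_custom_field_ids(fetch_fields(site_url, email, token), "Plastic SCM")` would then return IDs of the form `customfield_10105`, the form seen in the error log above.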