
Trimming to reduce size of the database


Meceka


Hi Manu, thank you very much for the steps to follow. I'll try this approach to clean my repository.
And yes, maybe next time I'll try not to push baked files, even though when working in a team it is often convenient to have these files already built.


  • 5 months later...

We are interested in this too. That a feature like this is missing, and that even after 2 years there is no visible progress on it, is staggering. This should have been a must-have from day one. I really like how Perforce handles this.

Currently we only have our projects with the finished assets in Plastic. We want to move all our work files into version control too. To do this we need an auto-purge feature like the one in Perforce. Doing this manually is too much work on bigger teams.

Moving to an on-premises server might help, but even there storage is limited at some point.


  • 1 month later...
On 1/5/2021 at 6:18 PM, manu said:

Hi Marco, the only option right now is the one Pablo mentioned up there:

I'm sorry I can't offer something better.

"Trimming the data" repository consists in running the "cm archive" repository to preserve the history but remove the data. So the steps would be:

1. You identify that the repository is very big.

2. Pull the cloud repository to a local repository.

3. Create a new workspace to work with the newly pulled local repo. You don't need to update the workspace, just create a new, empty one. Open a command line in the new workspace path and run the following command to archive all the file revisions larger than 30 MB (30000000 bytes) that are not the HEAD revision:

cm find "revs where size > 30000000 and parent!=-1" |  cm archive -c="volume00" --file="volume00" -

You can of course add a grep command between the find and archive commands to match only certain file extensions or certain directories.

4. Push the local repository to a new cloud repository.

5. Delete the old-big-repository.

6. Start using the new cloud repo until it gets big again.

Finally, have you considered removing the baked files from the repository and baking & using them during the CI step?

@Marco, this is one way to do it; we are aware there should be a better way.

This doesn't seem to work at all as the output from "cm find" is not recognized by "cm archive", which aborts with error "Incorrect object specification <...>".

If I use '--format="{item}#cs:{changeset}"' with find then things work better, at least until it comes across a file that was later moved. Then archive says "<...> does not exist." and grinds to a halt.

Could you please tell me the proper working syntax for how to achieve this? I imagine using {id} in some fashion might work?

This is on windows btw, if that matters.

I've noticed another weird thing with your windows client: cm find outputs paths with backslash as separator if you run it from inside a workspace, but outside a workspace (using the 'on repository' parameter) it instead outputs paths with slash as separator. This seems like terribly inconsistent behaviour and caused a lot of headaches yesterday until I realised what was going on.


On 12/10/2021 at 12:22 PM, mklasson said:

This doesn't seem to work at all as the output from "cm find" is not recognized by "cm archive", which aborts with error "Incorrect object specification <...>". [...] Could you please tell me the proper working syntax for how to achieve this?

For anyone else struggling, I finally got this working. The first step for me was to use --format="rev:revid:{id} {item}" with cm find and then pipe the output through another filter that matches on the item part but strips it before passing the spec on.

For whatever reason, though, cm archive doesn't seem to understand which repo the id refers to, despite running from inside a workspace. It just said "rev:revid:17 does not exist" and quit. Adding the repo spec as well fixed that, e.g. adding "@rep:Dicey@local" after the id.
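
Putting those pieces together, a rough sketch of the whole pipeline as described above could look like this (assuming a unix-style shell such as Git Bash for grep and sed; the Dicey@local repo, the 30 MB threshold, the .fbx filter and the volume00 archive name are just placeholders from my setup):

cm find "revs where size > 30000000 and parent!=-1" --format="rev:revid:{id}@rep:Dicey@local {item}" --nototal | grep "\.fbx" | sed "s/ .*//" | cm archive - --file="volume00"

The grep matches on the item path, and the sed then strips everything from the first space onwards so only the revision spec reaches cm archive.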

Let me add my voice to the chorus of people asking for an easier way of doing this.

And if I'm misunderstanding something or being needlessly complicated I'd appreciate hearing about it.


  • 5 weeks later...

That's dope @calbzam!

However, the interface is not that friendly compared to other solutions (at least for the documented part). What I would expect from that system (based on the requirements of my DevOps team):

* To be able to provide a maximum number of revisions per file type and/or per file path.

* The Plastic SCM server to take care automatically of archiving data based on the provided setup.

Are there any plans to provide that (or similar) interface?

 


4 hours ago, David Cañadas said:

However, the interface is not that friendly compared to other solutions (at least for the documented part). [...] Are there any plans to provide that (or similar) interface?

I agree with what David said.


On our side, we need something more like "keep the last X revisions" and erase the older ones. The archiving feature is still interesting for cases where we need to keep the old revisions, but our use case is large raw data for which we don't really need a long history.


16 hours ago, Patrice Beauvais said:

On our side, we need something more like "keep the last X revisions" and erase the older ones. [...]

Just to add another voice, archiving does not solve the issue this topic is about. We'd like files/paths matching a pattern (e.g. *.pdb, *.dll, etc.) that are older than N revisions to be purged from the repository by the server, automatically. Not backed up, not moved to external archive storage or anything like that; we don't need them ever again (they are usually automatically generated and we only need the newest version of them so that people can work within the repository without building anything themselves).

If they didn't affect the repository size (only the current data), we wouldn't mind them being there (well, I speak for myself; I just assume it's true for others in the same situation), but since these large binaries multiply server costs while having no value whatsoever, it is a problem.


Hi,

Thanks for your feedback. In this thread there were multiple comments, but most of them had something in common: having a way to reduce the size of the repository so you can control the storage growth of your repos. For now, we adapted the existing "cm archive" command to be supported in the cloud.

I will share your feedback with the product team so your comments are considered for future improvements.

Regards,

Carlos.


  • 4 weeks later...
On 1/14/2022 at 7:12 AM, calbzam said:

For now, we adapted the existing "cm archive" command to be supported in the cloud. [...]

Today I tested archiving a 1 GB binary file on the cloud, and I've yet to see the dashboard show the 1 GB drop in my usage. How long should it take to be reflected? Archiving files should reduce cloud usage, correct?


Is there any update on the possibility of the feature KristOfMorva suggested? Ease of repo management is critical. We're on the fence about converting to Plastic completely, and this is the last thing preventing it!

If something easier is coming in the future, we can get by with archive for now. Could you share an archive script that would archive all .fbx/.png files (any extension list, really) except the last N revisions?

thanks!


Hi @cloudwalker

For now I'm afraid this is the only feature to reduce the size of the cloud repos:

https://www.plasticscm.com/download/releasenotes/10.0.16.6241

All platforms - Cloud: Archiving revisions is now available in Cloud!

You can reduce the size (and the costs :)) of your cloud repositories by archiving revisions to an external storage.

Regards,

Carlos.


  • 1 month later...
On 1/5/2021 at 5:18 PM, manu said:

"Trimming" the repository consists of running the "cm archive" command to preserve the history but remove the data. [...]

cm find "revs where size > 30000000 and parent!=-1" |  cm archive -c="volume00" --file="volume00" -

[...]

As others have noted, I had difficulties using this command, but one issue that I did not see mentioned elsewhere is that parent!=-1 does not seem to exclude the HEAD revision, but rather the root/initial revision. This makes intuitive sense to me, as only this revision would not have a parent revision.

I've instead used the following command to archive all revisions above 1 MB that are not the HEAD revision on a cloud repo:

cm find "revs where size > 1000000 and returnparent = 'true' on repository 'repo@org@cloud'" --format=rev:revid:{id}@repo@org@cloud --nototal | paste -d " " - | cm archive - --file=/external/archive

I'm using returnparent = 'true' instead, which ensures that any hits on HEAD revisions use the parent revisions instead. It should be noted that this does not guarantee those parent revisions are larger than 1 MB, and it could theoretically miss some revisions where the size of the child is below the threshold, but this was sufficient for my needs for now.

I would rather exclude the latest N revisions in a totally reliable way, as others have requested in this thread, but my dusty bash skills were not up to that. If anyone has CLI commands to do this, I'd love to see them!

I'll also add my voice to those asking for an automated way to maintain only the N most recent revisions per type/folder. This is one of the main things I'm missing after the switch from Perforce.
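
For anyone who wants to experiment, here is a very rough, untested sketch of one way to approximate "archive everything except the newest N revisions per file", using only the cm find fields already shown above ({item}, {changeset}, {id}) plus standard unix tools. It assumes the literal ";" separator never appears in your paths, treats a higher changeset number as "newer" (only an approximation across branches), and the repo@org@cloud spec, the 1 MB threshold, the archive path and N=3 are placeholders, so treat it as a starting point rather than the reliable solution I'm after:

cm find "revs where size > 1000000 on repository 'repo@org@cloud'" --format="{item};{changeset};rev:revid:{id}@repo@org@cloud" --nototal | sort -t ";" -k1,1 -k2,2nr | awk -F ";" '{ if (++seen[$1] > 3) print $3 }' | cm archive - --file=/external/archive

The sort groups revisions by item and orders each group newest-first by changeset number; the awk step skips the first three revisions of every item and prints the revision specs of the rest, which are then piped into cm archive.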


  • 2 weeks later...
On 2/22/2022 at 6:19 AM, calbzam said:

For now I'm afraid this is the only feature to reduce the size of the cloud repos:

https://www.plasticscm.com/download/releasenotes/10.0.16.6241

[...]

To be honest, it's a little unbelievable to me that it's not a priority for this source control to have ANY real method of reducing repository size from within Plastic itself. The fact it's been on the "roadmap" for so long tells me it's not a priority because you fear you'll get less money if people can more easily manage their repo sizes. Why would it take so long to implement something simple like: right-click a branch and get the option to squash changes, so that all changesets that aren't the head or the parent of another branch are deleted? It's not like you're at risk of making people lose their data, because they can choose which branches they want to squash. I can tell you that I can't reliably use your service under these circumstances, as storage is getting out of hand.


  • 5 weeks later...

Hello,

I would also like to add my voice here.
I just migrated from Perforce: I'm in love with Plastic, but in two days my repo grew from 170 GB to 200 GB.

I was very surprised there is no way to limit revisions. The archiving command is a good point, but it doesn't really help in practice.
Unfortunately, it will be hard to stay on Plastic without this feature, as it is simply not economically viable for large projects.

Can we have a quick update on this situation? Is this planned, or not at all?
Thanks!


  • 4 weeks later...
Hi,
 
After checking with the product team, there should be some more improvements regarding this topic for Q3.
 
Our product team is currently working on the design of this feature so we can make it more user-friendly than the current option (cm archive), and we are also considering adding more functionality (e.g. limiting the number of file revisions in the database).
 
Regards,
Carlos.

  • 2 months later...

Hi,

This thread came up in my Google search for "plastic scm permanently delete huge files in database".

I just want to voice my opinion that this would be amazing to have and I'd consider paying for your cloud services if this gets implemented.

Hopefully there will be a user-friendly GUI in the future to list the biggest assets and right-click delete them from the database if they are no longer required.


Hi,

On 8/26/2022 at 5:48 AM, Akyoto said:

Hopefully there will be a user-friendly GUI in the future to list the biggest assets and right-click delete them from the database if they are no longer required.

As Carlos said, now in Q3 we are working on adding this feature to the new PlasticX GUI; once we release it, I will update this thread with more information.


Regards,
 
Rafael
Unity Plastic SCM Support
Virtualize your Workspace. Make it dynamic.


  • 4 weeks later...

Just adding my voice here.

We are exploring which SCM to use for our UE5 project and this specific problem has been identified as a blocker for us. Without it we'll be forced to use Git or Perforce.

I'll be watching for this feature to be implemented in Plastic, and I hope it comes out before we fully commit to other software.


  • 1 month later...

I am sorry to tell you that the GUI feature for reducing storage size has been postponed based on priority. Unfortunately, there's no estimated time at this point. As an alternative, you can use the feature called "Archiving Revisions". You will need Plastic SCM version 10.0.16.6241 or above. Click here to see the feature and how to use it.

I'm sorry for any inconvenience this might cause.
 

