
Trimming to reduce size of the database


Meceka


Hi Manu, thank you very much for the steps to follow. I'll try this approach to clean my repository.
And yes, maybe next time I'll try not to push baked files, even though when working in a team it is often convenient to have these files already built.


  • 5 months later...

We are interested in this too. That a feature like this is missing, and that even after 2 years there is no visible progress on it, is staggering. This should have been a must-have from day one. I really like how Perforce handles this.

Currently we only have our projects with the finished assets in Plastic. We want to move all our work files into version control too. To do this we need an auto-purge feature like the one in Perforce. Doing this manually is too much work on bigger teams.

Moving to an on-premises server might help, but even there storage is limited at some point.


  • 1 month later...
On 1/5/2021 at 6:18 PM, manu said:

Hi Marco, the only option right now is the one Pablo mentioned up there:

I'm sorry I can't offer something better.

"Trimming the data" repository consists in running the "cm archive" repository to preserve the history but remove the data. So the steps would be:

1. You identify that the repository is very big.

2. Pull the cloud repository to a local repository.

3. Create a new workspace to work with the newly pulled local repo. You don't need to update the workspace, just create a new, empty one. Open a command line in the new workspace path and run the following command to archive all the file revisions larger than 30 MB (30000000 bytes) that are not the HEAD revision:

cm find "revs where size > 30000000 and parent!=-1" |  cm archive -c="volume00" --file="volume00" -

You can of course add a grep command between the find and archive commands to match only certain file extensions or certain directories.

4. Push the local repository to a new cloud repository.

5. Delete the old-big-repository.

6. Start using the new cloud repo until it gets big again.

Finally, have you considered removing the baked files from the repository and baking & using them during the CI step?

@Marco, this is one way to do it; we are aware there should be a better way.

This doesn't seem to work at all as the output from "cm find" is not recognized by "cm archive", which aborts with error "Incorrect object specification <...>".

If I use '--format="{item}#cs:{changeset}"' with find then things work better, at least until it comes across a file that was later moved. Then archive says "<...> does not exist." and grinds to a halt.

Could you please tell me the proper working syntax for how to achieve this? I imagine using {id} in some fashion might work?

This is on windows btw, if that matters.

I've noticed another weird thing with your windows client: cm find outputs paths with backslash as separator if you run it from inside a workspace, but outside a workspace (using the 'on repository' parameter) it instead outputs paths with slash as separator. This seems like terribly inconsistent behaviour and caused a lot of headaches yesterday until I realised what was going on.


On 12/10/2021 at 12:22 PM, mklasson said:

This doesn't seem to work at all as the output from "cm find" is not recognized by "cm archive", which aborts with error "Incorrect object specification <...>". [...] Could you please tell me the proper working syntax for how to achieve this?

For anyone else struggling, I finally got this working. The first step for me was to use --format="rev:revid:{id} {item}" with cm find and then pipe the output through another filter that matches on the item part but strips it before passing the spec on.

For whatever reason, though, cm archive doesn't seem to understand which repo the id refers to, despite running from inside a workspace. It just said "rev:revid:17 does not exist" and quit. Adding the repo spec as well fixed that, e.g. adding "@rep:Dicey@local" after the id.
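
Putting those pieces together, a rough sketch of the whole pipeline as described above could look like this (assuming a unix-style shell such as Git Bash for grep and sed; the Dicey@local repo, the 30 MB threshold, the .fbx filter and the volume00 archive name are just placeholders from my setup):

cm find "revs where size > 30000000 and parent!=-1" --format="rev:revid:{id}@rep:Dicey@local {item}" --nototal | grep "\.fbx" | sed "s/ .*//" | cm archive - --file="volume00"

The grep matches on the item path, and the sed then strips everything from the first space onwards so only the revision spec reaches cm archive.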

Let me add my voice to the chorus of people asking for an easier way of doing this.

And if I'm misunderstanding something or being needlessly complicated I'd appreciate hearing about it.


  • 5 weeks later...

That's dope @calbzam!

However, the interface is not that friendly compared to other solutions (at least for the documented part). What I would expect from that system (based on the requirements of my DevOps team):

* To be able to provide a maximum number of revisions per file type and/or per file path.

* The Plastic SCM server to take care automatically of archiving data based on the provided setup.

Are there any plans to provide that (or similar) interface?

 


4 hours ago, David Cañadas said:

However, the interface is not that friendly compared to other solutions (at least for the documented part). [...] Are there any plans to provide that (or similar) interface?

I agree with what David said.


On our side, we need something more like "keep the last X revisions" and erase the older ones. The archiving feature is still interesting for cases where we need to keep the old revisions, but our use case is large raw data for which we don't really need a long history.


16 hours ago, Patrice Beauvais said:

On our side, we need something more like "keep the last X revisions" and erase the older ones. [...]

Just to add another voice, archiving does not solve the issue this topic is about. We'd like files/paths matching a pattern (e.g. *.pdb, *.dll, etc.) that are older than N revisions to be purged from the repository by the server, automatically. Not backed up, not moved to external archive storage or anything like that; we don't need them ever again (they are usually automatically generated and we only need the newest version of them so that people can work within the repository without building anything themselves).

If they didn't affect the repository size (only the current data), we wouldn't mind them being there (well, I speak for myself; I just assume it's true for others in the same situation), but since these large binaries multiply server costs while having no value whatsoever, it is a problem.


Hi,

Thanks for your feedback. In this thread there were multiple comments, but most of them had something in common: having a way to reduce the size of the repository so you can control the storage growth of your repos. For now, we adapted the existing "cm archive" command to be supported in the cloud.

I will share your feedback with the product team so your comments are considered for future improvements.

Regards,

Carlos.


  • 4 weeks later...
On 1/14/2022 at 7:12 AM, calbzam said:

For now, we adapted the existing "cm archive" command to be supported in the cloud. [...]

Today I tested archiving a 1 GB binary file on the cloud, and I've yet to see the dashboard show the 1 GB drop in my usage. How long should it take to be reflected? Archiving files should reduce cloud usage, correct?


Is there any update on the possibility of the feature KristOfMorva suggested? Ease of repo management is critical. We're on the fence about converting to Plastic completely, and this is the last thing preventing it!

If something easier is coming in the future, we can get by with archive for now. Could you share an archive script that would archive all .fbx/.png files (any extension list, really) except the last N revisions?

thanks!


Hi @cloudwalker

For now I'm afraid this is the only feature to reduce the size of the cloud repos:

https://www.plasticscm.com/download/releasenotes/10.0.16.6241

All platforms - Cloud: Archiving revisions is now available in Cloud!

You can reduce the size (and the costs :)) of your cloud repositories by archiving revisions to an external storage.

Regards,

Carlos.


  • 1 month later...
On 1/5/2021 at 5:18 PM, manu said:

"Trimming" the repository consists of running the "cm archive" command to preserve the history but remove the data. [...]

cm find "revs where size > 30000000 and parent!=-1" |  cm archive -c="volume00" --file="volume00" -

[...]

As others have noted, I had difficulties using this command, but one issue that I did not see mentioned elsewhere is that parent!=-1 does not seem to exclude the HEAD revision, but rather the root/initial revision. This makes intuitive sense to me, as only this revision would not have a parent revision.

I've instead used the following command to archive all revisions above 1 MB that are not the HEAD revision on a cloud repo:

cm find "revs where size > 1000000 and returnparent = 'true' on repository 'repo@org@cloud'" --format=rev:revid:{id}@repo@org@cloud --nototal | paste -d " " - | cm archive - --file=/external/archive

I'm using returnparent = 'true' instead, which ensures that any hits on HEAD revisions use the parent revisions instead. It should be noted that this does not guarantee those parent revisions are larger than 1 MB, and it could theoretically miss some revisions where the size of the child is below the threshold, but this was sufficient for my needs for now.

I would rather exclude the latest N revisions in a totally reliable way, as others have requested in this thread, but my dusty bash skills were not up to that. If anyone has CLI commands to do this, I'd love to see them!

I'll also add my voice to those asking for an automated way to maintain only the N most recent revisions per type/folder. This is one of the main things I'm missing after the switch from Perforce.
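
For anyone who wants to experiment, here is a very rough, untested sketch of one way to approximate "archive everything except the newest N revisions per file", using only the cm find fields already shown above ({item}, {changeset}, {id}) plus standard unix tools. It assumes the literal ";" separator never appears in your paths, treats a higher changeset number as "newer" (only an approximation across branches), and the repo@org@cloud spec, the 1 MB threshold, the archive path and N=3 are placeholders, so treat it as a starting point rather than the reliable solution I'm after:

cm find "revs where size > 1000000 on repository 'repo@org@cloud'" --format="{item};{changeset};rev:revid:{id}@repo@org@cloud" --nototal | sort -t ";" -k1,1 -k2,2nr | awk -F ";" '{ if (++seen[$1] > 3) print $3 }' | cm archive - --file=/external/archive

The sort groups revisions by item and orders each group newest-first by changeset number; the awk step skips the first three revisions of every item and prints the revision specs of the rest, which are then piped into cm archive.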


  • 2 weeks later...
On 2/22/2022 at 6:19 AM, calbzam said:

For now I'm afraid this is the only feature to reduce the size of the cloud repos:

https://www.plasticscm.com/download/releasenotes/10.0.16.6241

[...]

To be honest, it's a little unbelievable to me that it's not a priority for this source control to have ANY real method of reducing repository size from within Plastic itself. The fact it's been on the "roadmap" for so long tells me it's not a priority because you fear you'll get less money if people can more easily manage their repo sizes. Why would it take so long to implement something simple like: right-click a branch and get the option to squash changes, so that all changesets that aren't the head or the parent of another branch are deleted? It's not like you're at risk of making people lose their data, because they can choose which branches they want to squash. I can tell you that I can't reliably use your service under these circumstances, as storage is getting out of hand.


  • 5 weeks later...

Hello,

I would also like to add my voice here.
I just migrated from Perforce: I'm in love with Plastic, but in two days my repo grew from 170 GB to 200 GB.

I was very surprised there is no way to limit revisions. The archiving command is a good point, but it doesn't really help in practice.
Unfortunately, it will be hard to stay on Plastic without this feature, as it is simply not economically viable for large projects.

Can we have a quick update on this situation? Is this planned, or not at all?
Thanks!


  • 4 weeks later...
Hi,
 
After checking with the product team, there should be some more improvements regarding this topic for Q3.
 
Our product team is currently working on the design of this feature so we can make it more user-friendly than the current option (cm archive), and we are also considering adding more functionality (e.g. limiting the number of file revisions in the database).
 
Regards,
Carlos.

  • 2 months later...

Hi,

This thread came up in my Google search for "plastic scm permanently delete huge files in database".

I just want to voice my opinion that this would be amazing to have and I'd consider paying for your cloud services if this gets implemented.

Hopefully there will be a user-friendly GUI in the future to list the biggest assets and right-click delete them from the database if they are no longer required.


Hi,

On 8/26/2022 at 5:48 AM, Akyoto said:

Hopefully there will be a user-friendly GUI in the future to list the biggest assets and right-click delete them from the database if they are no longer required.

As Carlos said, now in Q3 we are working on adding this feature to the new PlasticX GUI; once we release it, I will update this thread with more information.


Regards,
 
Rafael
Unity Plastic SCM Support
Virtualize your Workspace. Make it dynamic.


  • 4 weeks later...

Just adding my voice here.

We are exploring which SCM to use for our UE5 project and this specific problem has been identified as a blocker for us. Without it we'll be forced to use Git or Perforce.

I'll be watching for this feature to be implemented in Plastic, and I hope it comes out before we fully commit to other software.


  • 1 month later...

I am sorry to tell you that the GUI feature for reducing storage size has been postponed based on priority. Unfortunately, there's no estimated time at this point. As an alternative, you can use the feature called "Archiving Revisions". You will need Plastic SCM version 10.0.16.6241 or above. Click here to see the feature and how to use it.

I'm sorry for any inconvenience this might cause.
 

