
Trimming to reduce size of the database


Meceka


Hello. 

We have been using Plastic SCM Cloud with Unity for 1.5 years now, and the database size keeps growing over time, even though the size of our repository content doesn't increase at all.

This happens because the database (and cloud storage) keeps multiple versions of all assets. For example, we have 50+ versions of our scene assets, and they are really big. We almost never need the older versions of those scenes.

We no longer need the history of most assets from a year ago. We currently have 600 changesets, but changesets 1 to 400 aren't required anymore.

As far as I know, there used to be a trim command to delete old changesets from the database, but I couldn't find any info about it in the documentation.

For example, deleting all changesets from 1 to 400 so that the database looks as if it started at changeset 401, keeping all changesets up to 600.

And I guess we could back up the original database files from disk (from this directory: C:\Program Files\PlasticSCM5\server\jet) and keep them locally. If we ever need a file from those early changesets, we could access it with Plastic using those backups.

Am I correct about this? And if such an operation is available, what is it called?

Are there any best practices for handling scene files like these that change often but whose old versions aren't important? What should we do to keep them from growing the database so quickly?

Thank you,


Hello,

I'm afraid we still don't have a feature to support this kind of workflow and easily remove old changesets (repo history). I'm not sure if you are referring to the "cm archive" command, but I think it won't be useful in this scenario, as it won't release the space used by the databases.

I'm sharing your feedback with the team, because we do have in mind supporting and providing a solution for this kind of workflow. I will let you know if we schedule something in the near future.

Sorry for the inconvenience,

Carlos.


Hello,

We have the same problem. We've been using Plastic Cloud for more than 2 years, and we love it, but the data size in the cloud is starting to pile up, and as an indie studio the cost of storage is something we'd like to reduce.

We work with Unreal Engine, and we have a lot of content (textures and binaries, for instance) that we want to keep in source control but whose very old versions we don't really care about, and this stuff takes up a lot of space.

I do understand it's also part of your business model to sell storage, but I wouldn't mind paying more for the license if we had a way to control what's kept in the database....

Thanks.


  • 2 weeks later...

Hi @Meceka and @Marc Audouy

As Carlos said, right now we don't have a way to trim databases, but this is something we'll be working on soon because it does make a lot of sense.

There's a workaround you can use, but it is not trivial: it involves cloning the full repo locally, trimming the data, pushing the new repo, and then deleting the old one.

Let me share the idea with the team and I'll get back to you.

 

pablo


Hello,

My team just asked me about this yesterday. Just thought I'd add another voice to the feature request; we'd also like to be able to purge old versions of large assets, since the vast majority of our DB size is made up of non-mergeable assets. I'm glad you're working on a feature for this. FWIW, we'd love to have a way to specify something like "keep only the latest N versions" for individual files or groups of files.

Thanks,

Jeremy


Personally I would like to be able to do a cutoff by date: basically nuke the history of everything older than a specific date.

Ideally per subfolder as well, since we want to keep the full history of source code files but not of binaries and assets. But I guess, since csets can affect many folders, the simplest approach, something like "delete all csets before cs:xxxx", would probably do.


On 11/22/2019 at 10:34 AM, psantosl said:

There's a workaround you can use, but it is not trivial: it involves cloning the full repo locally, trimming the data, pushing the new repo, and then deleting the old one.

Can you please briefly explain how I can do this workaround now? I am not an expert.

I guess I already have a cloned repo as I am a Plastic Cloud user working distributed. But how do I trim the data?

Thank you.


  • 4 months later...

I've used Perforce to manage my Unreal Engine projects for some time now. In the type definitions file, I can write something like this:

binary+wS5 //....exe

This tells Perforce, on the server side, to keep only 5 revisions of a particular file type (the '+w' makes that file type always writable, and the '+S5' dictates the number of revisions stored). Whenever a new revision is pushed, the oldest one, now the 6th, is deleted.
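
To illustrate, a typemap excerpt along these lines would extend the same limit to the heavy Unreal formats (the .uasset/.umap entries here are just an illustration, not my exact depot setup; adjust the depot patterns to your own layout):

# hypothetical typemap entries: keep only the 5 most recent stored revisions of heavy binaries
binary+wS5 //....exe
binary+wS5 //....dll
binary+lS5 //....uasset
binary+lS5 //....umap

Plain text filetypes (source code) are left untouched, so their full history is still kept.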

 

For those on the team who only deal with art, I exclude all extraneous files like source files and intermediates and push only executables and assets to their branch. They still have several revisions available if something breaks. If I weren't able to dictate a revision restriction, the repository would grow out of proportion (with a single working copy taking up 300 GB, there would eventually have been terabytes of binary files stored on the server).

 

I'm trying out Plastic at the moment, as Perforce is quite unwieldy at times, and I love the suite so far, but not being able to limit stored revisions will likely keep me on Perforce for the time being. It's the same reason SVN is an absolute no-go for me, as is Hg.

 

Plastic seems like the perfect solution for small teams and inordinately large repositories (as game projects tend to be), but this feature is important to me. Storage really can be expensive for small teams and lone developers (and there's no way I'm investing in crazy cloud storage when I have my own NAS), so having the tools to optimize my storage is essential.


Hi Carlos,

I would also like to ask for a way to get rid of deleted assets that take up space in the database.

For example, textures that were deleted in 2018 still take up space in the database, including all their revisions. We should be given a choice to remove them and their revisions from the database. In our case, we could get rid of all of them, since they are that old.

For example, there is a demo scene (including textures and models) from a Unity asset that we added to version control by mistake. We deleted it in some changeset, but now there isn't an easy way to remove it from the database, and there is no reason for us to keep it.

Thanks,
Mehmet


  • 3 months later...
  • 2 weeks later...
  • 1 month later...

One more voice to the crowd. Our DB is currently so large that I have to purge literally everything else from the machine just to squeeze out a few hundred more GB. The project has grown to over 3 TB over the course of 2 years, and 90% of that is probably useless at this point.


  • 3 weeks later...

I'd like to add another voice to this feature request. Being able to do this would help us greatly, as we currently have a lot of unnecessary pipelines in place to keep the bloat to a minimum. Frankly, it's quite a hassle as it is, so the sooner some rudimentary system for this is in place, the better. A way to do it, even an awkward one, is still better than no option :).


  • 1 month later...
  • 4 weeks later...

Hi @manu, thank you for the response. We are a team of 2 people and the project is very small... but every time we rebuild maps and lighting (which generates big files), our storage size increases, and we can't sustain the expense every month. Do you know if there is any workaround?


Hi Marco, the only option right now is the one Pablo mentioned up there:

On 11/22/2019 at 10:34 AM, psantosl said:

There's a workaround you can use, but it is not trivial: it involves cloning the full repo locally, trimming the data, pushing the new repo, and then deleting the old one.

I'm sorry I can't offer something better.

"Trimming the data" repository consists in running the "cm archive" repository to preserve the history but remove the data. So the steps would be:

1. You identify that the repository has become very big.

2. Pull the cloud repository to a local repository.

3. Create a new workspace that works against the newly pulled local repo; you don't need to update the workspace, just create a new, empty one. Open a command line in the new workspace path and run the following command to archive all revisions larger than 30000000 bytes (roughly 30 MB) that are not the head revision:

cm find "revs where size > 30000000 and parent!=-1" |  cm archive -c="volume00" --file="volume00" -

You can obviously add a grep command between the find and the archive commands to filter for certain file extensions or certain directories only (there is a concrete example of such a pipeline after these steps).

4. Push the local repository to a new cloud repository.

5. Delete the old, big repository.

6. Start using the new cloud repo until it gets big again.
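
To make step 3 a bit more concrete, here is a rough sketch of that pipeline with a grep filter in the middle, limited to Unity scene files. The comment text, the "volume00" file prefix and the .unity extension are just placeholders, and the grep assumes a unix-style shell (on plain Windows cmd, findstr would play the same role):

# archive every non-head revision over ~30 MB, but only for .unity scene files
cm find "revs where size > 30000000 and parent!=-1" | grep -i "\.unity" | cm archive -c="old scene revisions" --file="volume00" -

Keep the generated volume00 files somewhere safe: they hold the archived data in case you ever need one of those old revisions back.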

Finally, have you considered removing the baked files from the repository and baking & consuming them during the CI step instead?

@Marco, this is one way to do it; we are aware there should be a better way.

