ruben Posted March 5, 2014

Hello again Andy,

I want to share this optimization with you because I'm sure you'll like it: we have made the number of compression threads (in the compression pipeline stage) configurable. I used the original zlib library so that the compression time is just as significant as it is in your scenario.

RESULTS (Alien Arena)

1 compression thread
UploadData - ReadFileContent: 1234 ms
UploadData - CompressFileContent: 31000 ms
UploadData - SetRevisionData: 16539 ms
UploadData - CalcHashCode: 3817 ms
Total time uploading data: 31078 ms

2 compression threads
UploadData - ReadFileContent: 1235 ms
UploadData - CompressFileContent: 32485 ms
UploadData - SetRevisionData: 18002 ms
UploadData - CalcHashCode: 3936 ms
Total time uploading data: 20913 ms

3 compression threads
UploadData - ReadFileContent: 1639 ms
UploadData - CompressFileContent: 35278 ms
UploadData - SetRevisionData: 15967 ms
UploadData - CalcHashCode: 4279 ms
Total time uploading data: 17656 ms

4 compression threads
UploadData - ReadFileContent: 1345 ms
UploadData - CompressFileContent: 39053 ms
UploadData - SetRevisionData: 14936 ms
UploadData - CalcHashCode: 4252 ms
Total time uploading data: 15875 ms

5 compression threads
UploadData - ReadFileContent: 1609 ms
UploadData - CompressFileContent: 42719 ms
UploadData - SetRevisionData: 16279 ms
UploadData - CalcHashCode: 4448 ms
Total time uploading data: 16813 ms

As you can see, performance improves hugely going from one thread to 2 or 3, and there is almost no difference between 3, 4, and 5 compression threads in this scenario. I should point out that the CompressFileContent time is the sum of the times of all the compression threads; that is why, even though the summed compression time is higher with more threads, the total time is lower: the threads run in parallel.

And... yes! We'll probably also make the rest of the pipeline stages configurable, to speed up heavy upload processes in every possible environment. We like to make real performance enthusiasts happy!

Rubén.
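To illustrate the effect described above, here is a minimal Python sketch of the general technique (not Plastic's actual implementation; the chunk size and function names are made up): per-chunk compression spread over a thread pool. CPython's zlib releases the GIL while compressing, so the worker threads really do run in parallel.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical 4 MB chunk size

def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Split the file content into fixed-size chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def compress_parallel(data, num_threads=3):
    """Compress each chunk on a worker thread.

    map() preserves chunk order, so the compressed chunks can be
    stored and later reassembled in sequence.
    """
    chunks = split_into_chunks(data)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(zlib.compress, chunks))

def decompress(compressed_chunks):
    """Reassemble the original content from the compressed chunks."""
    return b"".join(zlib.decompress(c) for c in compressed_chunks)
```

The wall-clock compression time then approaches the longest single thread's share of the work rather than the sum, which matches the timings above.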
Andy22 Posted March 5, 2014 (Author)

> And... yes! we'll probably also make configurable the rest of the pipeline stages in order to improve heavy upload processes in every possible environment. We like to make real performance enthusiasts happy! Rubén.

Nice tests! By the way, are the threads per block or per stream? Will this help with single large files too?

We will probably go with the "no compression" Plastic version, or switch to whatever lightweight/fast compression support you add in the future. We will always prefer an algorithm geared towards low CPU usage, since we have a green server and laptop clients. We really don't need compression to save disk space; we need maximum performance. Space is cheap, and even the ultra-fast compression libraries that deliver 500-1000 MB/s produce file sizes only 10-30% larger than gzip's.

thx
Andy
ruben Posted March 5, 2014

Yes, of course it helps. Big files are compressed in chunks, so it doesn't matter whether it's one big file, a lot of small ones, or a combination of both: all of them take advantage of parallel compression.

RESULTS - movie.avi - 700 MB compressed video

1 compression thread
UploadData - ReadFileContent: 502 ms
UploadData - CompressFileContent: 26531 ms
UploadData - SetRevisionData: 9813 ms
UploadData - CalcHashCode: 1502 ms
Total time uploading data: 26625 ms

3 compression threads
UploadData - ReadFileContent: 516 ms
UploadData - CompressFileContent: 29249 ms
UploadData - SetRevisionData: 11593 ms
UploadData - CalcHashCode: 1544 ms
Total time uploading data: 13203 ms

I completely agree with you about using a better compression algorithm; we'll do it for sure. This functionality just adds the possibility of using multiple compressors to perform faster checkin operations (even once a better compression algorithm is in place, this will still improve performance).

Rubén.
Andy22 Posted March 5, 2014 (Author)

> Yes, of course it helps. The compression of the big files is done in chunks. Thus, it doesn't matter if it's a big file / a lot of small or a combination of both. All of them will take advantage of the possible parallel compression. ... This functionality just adds the possibility of using a multiple compressors to really perform faster checkin operations (although the best compression algorithm is used, this will still improve the performance),

Ah nice. By the way, is decompression done on the server or the client? I wonder, since both the client and server folders have zlibs in them. Will those changes be in the next release, or when can we expect to use them?

thx
Andy

PS: If I'm happy with our nas4free Plastic setup, I might write a tutorial. For example, the general opinion that SQLite is for testing only, or slower than the server backends, is often wrong. In fact, SQLite3 has always outperformed all the other backends in our test cases, even when using ridiculous memory cache settings for SQL Server or MySQL. The term "not production ready" is often used and is actually misleading, since SQLite is probably the most stable and well-tested DB backend you can pick. I guess "hard to scale" would be more accurate.
ruben Posted March 6, 2014

Hello Andy,

Decompression is always done on the client side. Client and server both have zlib libraries because they are also used for some communication calls.

These changes will probably be available next week. Anyway, if you are interested in testing them right now, could you please write to support? We'd build a labs release with this functionality and answer you with the link.

About SQLite: yes, it really has great performance, although it scales awfully.

Rubén.
Andy22 Posted March 6, 2014 (Author)

> About sqlite, yes, it really has a great performance, although it scales awful. Rubén.

While this is true, and scaling was never the goal of SQLite, most tests and information about its performance/usability/scalability refer to the ancient SQLite2 or pre-3.7 versions. In 3.7, SQLite implemented WAL (write-ahead logging), which resolves many of those concurrency criticisms, even for a normal big source repository and a huge dev base of 100+.

I can only speak from a game-dev point of view, and here you only have a handful of actual programmers (10-50) that need access to the source repo, which SQLite3 + WAL fully supports. The time spent in exclusive write locks for code is so short that you normally won't notice them. You would actually need 10+ devs all checking in at the exact same time to notice a short block, which is just a constructed scenario.

The artist side is where SQLite falls a little short, since WAL only works nicely up to around 100 MB file size. So you can either disable journaling, or use the normal mode and try to minimize the time spent in the write locks (i.e. improve checkin speed). Disabling journaling is not that crazy an idea: you can set up a ZFS NAS + UPS, which provides an extremely resilient filesystem. The cheap alternative is to use something like CrashPlan with a second local server + cloud and store continuous backups every 10 minutes. Because SQLite is so simple, you can easily just roll back the broken database file. This of course needs manual admin interaction and blocks potential checkin/checkout operations until the bad repo is repaired.

SQLite3 with WAL works nicely even for larger groups, since as a source repository you never actually have the requirement that 100+ users all check in at the exact same moment. One actually bad scenario for SQLite3 is the checkin of very large assets (textures, sound, max/psd files) over a slow LAN/WAN connection, which blocks all other outstanding writes to the same repo. Readers can of course still work nicely, but the writers block each other. The most valid argument against SQLite3 is normally the lack of fine-grained user access control, but here Plastic already adds this feature.

bye
Andy

PS: Btw, that's why you should update your SQLite3 version and test the WAL option.
Andy22 Posted March 18, 2014 (Author)

@Rubén, any update on when those changes will make it into an actual official release version?

thx
Andy
psantosl Posted April 23, 2014

> While this is true and was never the goal of SQLite, but most tests/info regarding performance/useability/scalebility are for the ancient SQLite2 or pre 3.7 versions. In 3.7 SQLite implements WAL, which resolves many of those critics concerning concurrency problems even for a normal big source repository and a huge dev base 100+. I can only speak from a game dev. point of view and here u only have a handful of actual programmers (10-50) that need access to the source repo, which SQLite3 + WAL fully supports. The time spend in exclusive write locks for code is so short, that u normally wont notice them. U actually would need 10+ dev's to all checkin at the exact same time to notice a short block, which is just a constructed scenario. PS: Btw thats why u should update your SQLite3 version and test the WAL option.

Hi Andy,

I was making some changes and tests today to support the newest SQLite + WAL. It really works great for read operations, since WAL doesn't lock out concurrent reads. But unfortunately it won't solve any write problems. As the documentation clearly states (http://www.sqlite.org/draft/wal.html):

> Writers merely append new content to the end of the WAL file. Because writers do nothing that would interfere with the actions of readers, writers and readers can run at the same time. However, since there is only one WAL file, there can only be one writer at a time.

And my tests confirm that: try to check in with several clients at the same time and you'll end up with locks all the time. I made some changes to System.Data.SQLite.dll so that it returns immediately when you're in a transaction and get a BUSY return code (as the doc states), which at least improves responsiveness, but the locks are still there.

I had great expectations for WAL because I wanted to make SQLite the default backend, but unfortunately it is not possible at this time.

Thanks,
pablo
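The single-writer limitation described above is easy to reproduce. A small sketch with Python's sqlite3 module: one connection takes the write lock, a second writer hits SQLITE_BUSY, while a reader keeps working throughout (the table name and timeouts are arbitrary choices for the demo):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "repo.db")

def connect(p):
    # autocommit mode, so transactions are controlled explicitly below;
    # a short busy timeout makes the contention visible quickly
    return sqlite3.connect(p, timeout=0.1, isolation_level=None)

writer1 = connect(path)
writer1.execute("PRAGMA journal_mode=WAL")         # persistent, set once
writer1.execute("CREATE TABLE revisions(data BLOB)")
writer1.execute("BEGIN IMMEDIATE")                 # take the single write lock
writer1.execute("INSERT INTO revisions VALUES (x'00')")

writer2 = connect(path)
blocked = False
try:
    writer2.execute("BEGIN IMMEDIATE")             # second writer: SQLITE_BUSY
except sqlite3.OperationalError:
    blocked = True                                 # "database is locked"

# A reader is unaffected: WAL lets reads proceed while the write lock
# is held; it sees the last committed snapshot (no rows yet).
reader = connect(path)
count = reader.execute("SELECT count(*) FROM revisions").fetchone()[0]

writer1.execute("COMMIT")                          # release the write lock
```

This is exactly the behavior Pablo measured: concurrent readers are fine, concurrent writers serialize on one lock per database file.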
Andy22 Posted April 23, 2014 (Author)

Yeah, that's how SQLite3 WAL works, but as I noted, in reality you mostly need many readers and only a few writers (in a gaming environment). If we are also talking about code/text checkins, the time spent in those locks is so short that the lock does not matter for each writer. Btw, the write lock should be per database/Plastic repo, and not global?

So one of the real-world examples I can think of is Plastic being used by artists to check in large binary max/psd/animation files. Then again, even in this example, in reality you need 2 artists trying to check in in the exact same timeframe, which is rather unrealistic. If you finally add the checkin optimizations, we are talking about 60-120 MB/s checkin speed for binary files, so even several gigabytes only take a minute. So I don't see this as a big drawback for SQLite3, since the write lock is hard to reproduce/notice even in a midsize/big gaming company.

bye
Andy
Andy22 Posted September 18, 2014 (Author)

Just getting back to this: which of the discussed changes made it into the current 5.4 release?

1) Is the multi-threaded zlib compression enabled by default?
2) Can we specify zlib "no compression" or a different compression algorithm per file type (extension)?
3) Does 5.4 ship with a current SQLite 3.8.6 lib, and is WAL used by default?
4) Is 5.4 now compatible with FreeBSD 9.2 x64, and was the install script fixed to work on FreeBSD?

PS: Just found the 5.4 feature list, and the multi-threaded checkin seems to have made it in officially. So thanks for this!
calbzam Posted September 29, 2014

Hi Andy,

I will try to answer your questions in this post:

- You are right, the SQLite library is not yet updated.

- Compression (from the release notes): Added a new configuration file called "compression.conf" that enables a custom configuration of the compression method used to store the new revision of a file in the database. Currently there are two types of compression methods supported:

none
zip

Each line of the compression.conf file defines a compression type, followed by a space ' ', and a rule to match against the file path. Example:

none .jpg
zip .txt

By default the compression type of any file is "zip". There are 4 types of rules that can be specified, and the order of application is the following:

1.- File path rule
2.- File name rule
3.- File extension rule
4.- Wildcard rule

Examples:

1.- /dir/foo.png
2.- foo.png
3.- .png
4.- /**/foo*.???

If a file path matches a path rule, that will be the chosen compression type; if not, it will try to match a file name rule, and so on.

The compression.conf file can be defined in the following locations:

- Root of the workspace (will be valid only for that workspace)
- User config folder (usually \Users\<username>\AppData\Local\plastic4), which will apply to all workspaces

If both files exist, their rules are combined.

- Threads: The number of compression threads during the upload processes (checkin, fast-import, etc.) can be configured to take advantage of multi-CPU environments. This number can be between 1 and 10; with any other number it will work as single-threaded. The client.conf key to configure it is UploadCompressorsNumber:

<UploadCompressorsNumber>2</UploadCompressorsNumber>

- WAL: Please review Pablo's previous post.

Regards,
Carlos
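The four-level rule precedence described above can be sketched as follows. This is an illustration under assumptions, not Plastic's actual matching code: the function names are made up, and the exact matching semantics (e.g. how wildcards are interpreted) are guesses based on the release-note examples.

```python
import fnmatch
import os

def parse_compression_conf(text):
    """Parse lines of the form '<method> <rule>', e.g. 'none .jpg'."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        method, _, rule = line.partition(" ")
        rules.append((method, rule.strip()))
    return rules

def classify(rule):
    """Order of application: 1 path, 2 file name, 3 extension, 4 wildcard."""
    if any(ch in rule for ch in "*?"):
        return 4                      # e.g. /**/foo*.???
    if rule.startswith("/"):
        return 1                      # e.g. /dir/foo.png
    if rule.startswith("."):
        return 3                      # e.g. .png
    return 2                          # e.g. foo.png

def compression_for(path, rules, default="zip"):
    """Pick the compression method for a path, trying rule types in order."""
    for wanted in (1, 2, 3, 4):
        for method, rule in rules:
            if classify(rule) != wanted:
                continue
            if wanted == 1 and path == rule:
                return method
            if wanted == 2 and os.path.basename(path) == rule:
                return method
            if wanted == 3 and path.endswith(rule):
                return method
            if wanted == 4 and fnmatch.fnmatch(path, rule):
                return method
    return default                    # "zip" is the documented default
```

With the release-note example file (`none .jpg` / `zip .txt`), a .jpg would be stored uncompressed and everything else would fall through to the "zip" default.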
This topic is now archived and is closed to further replies.