cidico, posted November 24, 2011: Hi guys! I just did a fast-export to avoid creating a lot of replication files, and I noticed that the fast-export tool didn't handle some of my branch names very well. They contain characters like "ã", "ç" and "õ". When I ran the fast-import command, the branch names came through with "??" instead of "cã", "çõ", etc. Just to clarify: I'm in Brazil and we use a lot of those characters here. I believe you have "strange" characters in Spain too...
psantosl, posted November 24, 2011: We do! I'll share it with the team. Thanks.
cidico (author), posted November 24, 2011: Well, I ran some tests and found a "big" problem. I created a replication package from one branch with a wrongly encoded name, and Plastic "duplicated" the branch: it now shows both branches, one with the correct characters and the other with "??". Luckily I did this with only one replication package file; I guess that using the fast-export and fast-import commands would have replicated all the wrongly named branches. Is there a "dark way" to delete this branch? The normal way says I can't, because it has revisions...
manu, posted November 29, 2011: Hi cidico, well, you can delete the changesets on the branch one by one and then delete the branch, but if the branch contains a lot of changesets this can be annoying... maybe a short script?
cidico (author), posted November 29, 2011: Thank god I chose a small branch! Just deleting the changesets solved the problem: the branch was deleted automatically once all its changesets were gone! Thanks, Manuel!
cidico (author), posted January 16, 2012: Hello guys! I need to give you another update about special characters... It seems the problem also happens with usernames. At home, my user is "Plácido Bisneto". I just imported my code from home and my username appears as "Pl?cido Bisneto"... Just updating!
cidico (author), posted January 20, 2012: Hi! It seems that in version 4.0.239.0 the problem with special characters in branch names is still there. My project isn't as big as others, but unfortunately I can't rename the branches right now... Because of the mistakes my team and I made, I can't use the fast-export/fast-import feature. :( :( :( Since fast-import duplicates branches whose names contain special characters, replacing characters like "ã", "ç" and "õ" with "??", I guess using it would get me into a lot of trouble when fast-exporting/importing incrementally.
manu, posted January 20, 2012: Hello cidico, I'll add a task to our bug tracker for this encoding issue. We will try to fix it ASAP. Sorry for the inconvenience. Manu.
cidico (author), posted April 25, 2012: Hello my fellow friends from Spain! I know I'm being an "ant at a picnic" about this, but it seems this particular bug was not fixed in 4.1. Even if it isn't an "urgent" bug, have you at least scheduled the fix? I do have a workaround, but I feel kind of "dirty", like I'm "cheating", when I use it. Please forgive me for using such filthy characters! hehehe
Soho, posted April 26, 2012: Hi, I described a very similar problem here: http://www.plasticscm.net/index.php?/topic/930-importing-tfs-project-into-plastic/ I tried using different encodings for the "author" and "committer" tags in the fast-export with a Perl script, but special characters always appear as "?" in the changeset "Created by" column in the Plastic GUI. With ISO-8859-1 my name is "S?ren"; with UTF-8 it is "S??ren". I double-checked the encoding in an encoding-aware text editor. For the record, I used 4.0.239.24.
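A minimal Python sketch of one plausible mechanism behind the one-"?" versus two-"?" pattern (an assumption for illustration; the thread never confirms how the importer works internally): if the importer treats every byte as a separate single-byte character and then stores the name somewhere that can only hold ASCII, ISO-8859-1 "ø" (one byte, F8) becomes one "?", while UTF-8 "ø" (two bytes, C3 B8) becomes two. The function name `lossy_roundtrip` is hypothetical.

```python
def lossy_roundtrip(name: str, wire_encoding: str) -> str:
    """Hypothetical model of a lossy import path, for illustration only."""
    # Bytes as they would be written into the export file.
    raw = name.encode(wire_encoding)
    # Assumed importer bug: read each byte as one character (Latin-1 style).
    one_char_per_byte = raw.decode("latin-1")
    # Forcing the result into ASCII turns each non-ASCII character into "?".
    return one_char_per_byte.encode("ascii", errors="replace").decode("ascii")


print(lossy_roundtrip("Søren", "iso-8859-1"))  # S?ren
print(lossy_roundtrip("Søren", "utf-8"))       # S??ren
```

Under this model the number of "?" marks equals the number of bytes the character occupies in the export, which is why the UTF-8 export looks "worse" even though its bytes are correct.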
cidico (author), posted April 26, 2012: So it seems I'm not the only one affected by this problem. I really don't know what the encoding is; I suppose it's UTF-8, since Plastic itself generates the file. One thing I noticed a long time ago, when I first found this issue: when exporting, Plastic shows all the branch names correctly in cmd. The error seems to happen only when importing. Is it the same for you?
Soho, posted April 26, 2012: I am importing from a fast-export made with Git. The Git repository comes from a git-tfs export from Microsoft TFS. The fast-export tags that Plastic uses to fill in "Created by" are either "author" or "committer", which are identical in all cases for me. These are the tags I have tried re-encoding with a simple Perl script.
manu, posted May 2, 2012: Maybe cidico can use your script to import new repositories.
Aaron K, posted May 8, 2012: I had a similar problem to Soho's when importing from VSS -> Git -> Plastic. My file paths with UTF-8 characters were exported as octal escape strings (/235 etc.) and spaces were not quoted correctly. In the end I wrote a utility to fix it up. I'm not sure why your user names are coming out wrong, though. Have you looked at a binary dump of the data? Is it actually UTF-8 or some other encoding?
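Git's fast-import format allows a path to be written as a C-style quoted string, and with Git's default settings fast-export typically escapes each non-ASCII byte of such a path as a backslash followed by three octal digits. A minimal Python sketch of turning such a path back into text, assuming well-formed input and UTF-8 path bytes; `unquote_c_style` is a hypothetical helper, not Aaron's utility and not a Plastic API.

```python
# Single-character escapes allowed in C-style quoted strings.
_SIMPLE_ESCAPES = {
    "a": 0x07, "b": 0x08, "f": 0x0C, "n": 0x0A,
    "r": 0x0D, "t": 0x09, "v": 0x0B, '"': 0x22, "\\": 0x5C,
}


def unquote_c_style(quoted: str) -> str:
    """Decode a path like "caf\\303\\251.txt" (quotes included) to 'café.txt'."""
    if not (len(quoted) >= 2 and quoted[0] == quoted[-1] == '"'):
        return quoted  # unquoted paths are used verbatim
    body = quoted[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
            continue
        nxt = body[i + 1]
        if nxt in "01234567":
            # Three octal digits encode one raw byte of the UTF-8 path.
            out.append(int(body[i + 1:i + 4], 8))
            i += 4
        else:
            out.append(_SIMPLE_ESCAPES[nxt])
            i += 2
    # The collected bytes are the original path; decode them as UTF-8.
    return out.decode("utf-8")


print(unquote_c_style('"caf\\303\\251.txt"'))  # café.txt
```

Working on bytes until the very end matters: a single character such as "é" is split across two octal escapes (\303 and \251), so decoding each escape on its own would produce garbage.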
Soho, posted May 9, 2012: Yes, it is UTF-8; I have checked. If I look at the git fast-export with a hex editor, the Nordic o-slash (ø) is encoded as "c3 b8", which is the correct UTF-8 code. This appears as "??" in Plastic. I have also tried ISO-8859-1, with the same result (only one "?" though).
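To check every name in a large export rather than spot-checking it in a hex editor, a sketch like the one below reads the stream as raw bytes, skips the `data <count>` payloads (file contents and commit messages) so they are not mistaken for headers, and reports whether each `author`, `committer` or `tagger` line is valid UTF-8. It assumes the exact-byte-count form of `data` that git fast-export writes; the delimited `data <<` form is not handled. The function names and the sample stream are hypothetical.

```python
import io
from typing import BinaryIO, Iterator, List, Tuple

# Header lines in a fast-export stream that carry a person's name.
IDENT_PREFIXES = (b"author ", b"committer ", b"tagger ")


def ident_lines(stream: BinaryIO) -> Iterator[bytes]:
    """Yield author/committer/tagger lines, skipping 'data <count>' payloads."""
    while True:
        line = stream.readline()
        if not line:
            return
        if line.startswith(b"data ") and not line.startswith(b"data <<"):
            # Skip exactly <count> bytes of file content or commit message.
            stream.read(int(line[5:].strip()))
        elif line.startswith(IDENT_PREFIXES):
            yield line.rstrip(b"\n")


def check_utf8_idents(stream: BinaryIO) -> List[Tuple[str, bool]]:
    """Return (readable text, is_valid_utf8) for each ident line."""
    results = []
    for raw in ident_lines(stream):
        try:
            results.append((raw.decode("utf-8"), True))
        except UnicodeDecodeError:
            # Latin-1 maps every byte to a character, so this never fails.
            results.append((raw.decode("latin-1"), False))
    return results


sample = (
    b"commit refs/heads/main\n"
    b"author S\xc3\xb8ren <s@example.com> 1336000000 +0200\n"
    b"committer S\xf8ren <s@example.com> 1336000000 +0200\n"
    b"data 12\n"
    b"author fake\n"  # commit message text, must not be reported
)
for text, ok in check_utf8_idents(io.BytesIO(sample)):
    print("OK " if ok else "BAD", text)
```

For a real file, pass `open(path, "rb")` instead of the in-memory sample; the stream is read line by line, so even a multi-gigabyte export is not loaded into memory at once.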
wvd_vegt, posted May 9, 2012: Hi, I had some issues with UTF-8 encoding in software I developed a few weeks ago. UTF-8 does not require a 3-byte BOM (byte order mark) at the start of the text file, but omitting it makes it impossible for applications to guess the encoding. To check, look at the first bytes of the file and see whether they are 0xEF, 0xBB, 0xBF (see http://en.wikipedia.org/wiki/Byte_order_mark). Without it, UTF-8 basically looks like a one-byte encoding with an occasional two-byte code (hence the two "?" marks where you expect a single character).
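For reference: the UTF-8 BOM is the three bytes EF BB BF and is optional; UTF-8 itself is a variable-length encoding (1 to 4 bytes per character) whether or not a BOM is present, and "ø" is the two-byte sequence C3 B8, which matches the two "?" marks. A minimal Python sketch of the check described above, using the standard `codecs.BOM_UTF8` constant; `has_utf8_bom` is a hypothetical helper name.

```python
import codecs


def has_utf8_bom(path: str) -> bool:
    """Return True if the file at `path` starts with the bytes EF BB BF."""
    with open(path, "rb") as f:
        # Only the first three bytes are read, so this is cheap on huge files.
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8
```

Called on an export produced by git fast-export, this would be expected to return `False`, since Git does not write a BOM.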
Soho, posted May 11, 2012: It is correct that Git does not prepend a BOM to the fast-export. The question is: will it make a difference to the Plastic importer? And if the BOM is not present, does Plastic use a default encoding?
wvd_vegt, posted May 15, 2012: Hi, other tools I used had exactly this problem. I don't know about Plastic SCM, but without the BOM it's virtually impossible for software to detect the encoding (and .NET code is no exception). wvd_vegt
Soho, posted May 16, 2012: The importer could have a default encoding, since Git does not write the BOM. Actually, I am not sure that Git uses a specific encoding. It could also just write the author names in whatever bytes represent the author string internally in Git. But since Git does not write the BOM, I would expect the Plastic importer to use some kind of default encoding, and it would be nice to know what that encoding is.
wvd_vegt, posted May 16, 2012: If you read http://en.wikipedia.org/wiki/Unicode, it says that it's usually a bad idea to leave the BOM out unless you are absolutely sure you know the encoding. My strategy would be to prepend it and try again to see if it makes any difference. wvd_vegt
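A Python sketch of trying that suggestion on a large export without loading it into memory: write the BOM, then stream the rest of the file in 1 MiB chunks with `shutil.copyfileobj`. Whether the Plastic importer honours a BOM is exactly the open question in this thread, so treat this as an experiment and keep the original export untouched; `prepend_utf8_bom` is a hypothetical helper name.

```python
import codecs
import shutil


def prepend_utf8_bom(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, adding a UTF-8 BOM unless one is present.

    The rest of the file is streamed in 1 MiB chunks, so a multi-gigabyte
    export never has to fit in memory.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        head = src.read(len(codecs.BOM_UTF8))
        if head != codecs.BOM_UTF8:
            dst.write(codecs.BOM_UTF8)
        # Re-emit the bytes already read (the existing BOM, or the file start).
        dst.write(head)
        shutil.copyfileobj(src, dst, 1024 * 1024)
```

Writing to a separate destination file, instead of patching in place, means a failed experiment costs nothing but disk space.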
Soho, posted May 16, 2012: My point is still that while it may be a good idea to include a BOM, Git does not do so, even though the Git documentation recommends UTF-8 and Git possibly encodes author names with it (I haven't verified that). Perhaps I could patch my (8 GB+) fast-export with a BOM and Plastic would read the author names correctly, but it is a hassle, and other users will probably run into the same problems. If Git uses UTF-8 internally, I would suggest that Plastic default to that encoding, BOM or not. If Git just stores the author names in whatever encoding the committer used, then it would make more sense to prepend a BOM to the fast-export file, but it would still be nice to know which encoding Plastic uses by default when a BOM is absent. It should be possible to import a git fast-export into Plastic without patching the fast-export file. That is not the case now (see my other post for a couple of other issues with the importer). It is a hard selling point to TFS'ers that if they want to convert to Plastic they should reserve a lot of time and patience before they get a clean import.
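A minimal Python sketch of the kind of default suggested above, offered as an assumption for illustration rather than a description of what Plastic does: decode each name as UTF-8 and fall back to ISO-8859-1 only when the bytes are not valid UTF-8. The trade-off is that a Latin-1 name whose bytes happen to form valid UTF-8 would be misread, which is rare for real names. `decode_ident` is a hypothetical name.

```python
def decode_ident(raw: bytes) -> str:
    """Decode a name as UTF-8, falling back to ISO-8859-1 if that fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte is a valid ISO-8859-1 character, so this cannot fail.
        return raw.decode("iso-8859-1")


print(decode_ident(b"S\xc3\xb8ren"))  # Søren (UTF-8 bytes)
print(decode_ident(b"S\xf8ren"))      # Søren (ISO-8859-1 bytes)
```

With this policy both of the encodings Soho tried would come out as "Søren", and no BOM is needed.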
cidico (author), posted May 16, 2012: Mother of god. I didn't know this problem went so deep. Why the hell doesn't Git include the BOM? Does anyone here actually use Git, or used to? But Soho has a point when he says: "It is a hard selling point to TFS'ers that if they want to convert to Plastic they should reserve a lot of time and patience before they get a clean import." Not everybody would easily accept this.
vsanchezm, posted May 29, 2012: Hi all!! I took a look at this and indeed, we handle paths well, but we haven't covered the case of special characters in branch names and in the "author" and "committer" tags. I'm going to take care of it. Sorry for the inconvenience.
Soho, posted May 31, 2012: [Quote from vsanchezm: I was taking a look to this and indeed, we do well for paths but we don't have covered the case of special characters in branch names, "author" and "committer" tags. I'm going to take care of it.] Assuming you are a Plastic developer, please also take a look at the other reported problems (mixed casing in paths, path quoting, etc.).
vsanchezm, posted June 1, 2012: Hi Soho! Yes, I'm a member of the Plastic development team. I can tell you that I have already fixed cidico's issues with branch and user names, but I would like to be sure that I have also fixed the rest of the reported problems. Could you send me an export file that Plastic does not import correctly? Or at least some cases you know we fail on during the import? I would like to get this fixed for everyone before closing the task! Thanks in advance.
Archived
This topic is now archived and is closed to further replies.