psantosl Posted September 3, 2013

Hi all,

Delphi support is among the top 5 requests in our User Voice: Delphi support in SemanticMerge. So we're eager to add Object Pascal to the list of supported languages.

To do so, we've developed a way to plug in external parsers, so if you can develop a Delphi language parser, it will be very simple to get it invoked from Semantic.

If you're interested in joining our "Delphi Parser" effort, please join this thread and we will send you the required tools.

Right now, all that plugging in a parser requires is:

* Create a standalone executable.
* It must be able to receive some data as arguments.
* It must be able to export the "tree" of the file in YAML format.

Of course you'll need all the details, but this is just an intro to what it takes.

We've also developed tools to help test the parsers, like a "directory parser" that loops through a code tree, parsing each file (invoking your parser) and then rebuilding the source file, making sure the original and the regenerated ones match.

We're eager to get this started!

pablo

cidico Posted September 3, 2013

Wow! Top 5?? Too bad I'm not working with Delphi anymore. I really would like to help you! Keep up the good work!!

rafrancoso Posted September 3, 2013

What's the YAML output definition?

cyberflex Posted September 3, 2013

Hi,

I'm interested in helping out as a tester/reviewer. Unfortunately my job is pretty full-on and I don't expect to be able to spend a lot of time coding on this project, but I will definitely get involved and help as much as I can.

I took a look at the YAML spec (never seen it before) and it seems someone is going to have to write a parser for Delphi. We use the Delphi SuperObject for our JSON implementation, and something like that might be adaptable.

Please let me know how I can get involved.

Rick.

psantosl Posted September 3, 2013 (Author)

Hi!

Here comes an example, just to give you an idea of what it looks like, OK? I'll be publishing more detailed information early tomorrow.

Considering this Java file:

    package com.codicesoftware.parser.java;

    import org.eclipse.core.runtime.IProgressMonitor;
    import org.eclipse.jdt.core.dom.*;
    import java.util.*;

    public class PlasticVisitor extends ASTVisitor
    {
        static final String NODE = "%NODE%";

        /**
         * Visits the given AST node prior to the type-specific visit.
         * The default implementation does nothing. Subclasses may reimplement.
         * @param node the node to visit
         */
        public void preVisit(ASTNode node)
        {
            System.out.println(getNodeAsString(node));

            // Comments that have null parent are not visited and do not show in the Tree.
            // If you would like to see them under the Compilation Unit node, uncomment the
            // code below:
            if (node instanceof CompilationUnit)
                visitComments((CompilationUnit)node);
        }

        // Normal (not javadoc) leading comment
        private void visitComments(CompilationUnit node)
        {
            List comments = ((CompilationUnit)node).getCommentList();
            if (comments != null)
            {
                for (int i=0; i < comments.size(); ++i)
                {
                    Comment comment = (Comment) comments.get(i);
                    if (comment != null && comment.getParent() == null)
                        comment.accept(this);
                }
            }
        }

        static private String getNodeAsString(ASTNode node)
        {
            String className = node.getClass().getName();
            int index = className.lastIndexOf(".");
            if (index > 0)
                className = className.substring(index+1);
            if (node instanceof Comment)
                return className;
            String value = "";
            return value;
        } //Trailing Comment
    }

The resulting "tree" in YAML format is going to be something like the following:

    ---
    type : file
    name : /path/to/file
    locationSpan : {start: [1,0], end: [52,0]}
    footerSpan : [1752, 1751]
    parsingErrorsDetected : true
    children :

      - type : package
        name : com.codicesoftware.parser.java
        locationSpan : {start: [1,0], end: [2,1]}
        span : [0, 40]

      - type : import
        name : org.eclipse.core.runtime.IProgressMonitor
        locationSpan : {start: [3,0], end: [3,50]}
        span : [41, 93]

      - type : import
        name : org.eclipse.jdt.core.dom.*
        locationSpan : {start: [4,0], end: [4,35]}
        span : [94, 129]

      - type : import
        name : java.util.*
        locationSpan : {start: [5,0], end: [6,1]}
        span : [130, 150]

      - type : class
        name : PlasticVisitor
        locationSpan : {start: [7,0], end: [52,0]}
        headerSpan : [151, 202]
        footerSpan : [1751, 1751]
        children :

          - type : field
            name : NODE
            locationSpan : {start: [9,4], end: [11,3]}
            span : [203, 250]

          - type : method
            name : preVisit
            locationSpan : {start: [11,4], end: [27,3]}
            span : [251, 865]

          - type : method
            name : visitComments
            locationSpan : {start: [27,4], end: [40,3]}
            span : [866, 1340]

          - type : method
            name : getNodeAsString
            locationSpan : {start: [40,4], end: [51,5]}
            span : [1341, 1750]

    parsingError :
      - location: [2,3]
        message: "Error before {"
      - location: [4,5]
        message: "Error other"

As I said, I'll be posting a link to the SemanticMerge version that can deal with external parsers, and so on.

I think we should probably create a GitHub repo to hold the examples plus source code. What do you think?

Thanks!

pablo

A. Mussche Posted September 4, 2013

Can you post the link to the tools? And some more specifications about the YAML format? (It seems simple enough, but the example above is not complete?)

Then I will try to make at least a PoC and set up a GitHub repo.

At least I found a YAML parser for Delphi (don't know how good it is):

https://bitbucket.org/OCTAGRAM/delphi-yaml/wiki/Home

Next is a Delphi syntax parser. I found some (don't know which is the best):

https://code.google.com/p/dwscript/ (supports various syntaxes like Prism/Oxygene too?)
https://github.com/jacobthurman/Castalia-Delphi-Parser
http://jedicodeformat.sourceforge.net/
http://wiki.freepascal.org/fcl-passrc

A. Mussche Posted September 4, 2013

I have an initial PoC working, which parses the following file:

    unit Unit1;

    interface

    type
      TTest = class(TObject)
        procedure Test;
      end;

    implementation

    { TTest }

    procedure TTest.Test;
    begin
      //
    end;

    end.

The output it generates is:

    ---
    type : file
    name : test
    locationSpan : {start: [0,0], end: [19,0]}
    footerSpan : [173,173]
    children :
      - type : class
        name : TTest
        locationSpan : {start: [5,0], end: [7,6]}
        headerSpan : [35,59]
        footerSpan : [88,88]
        children :
          - type : method
            name : Test
            locationSpan : {start: [6,0], end: [6,19]}
            span : [61,80]

Do you have a good name for the GitHub repo?

Note: in Delphi/Pascal a function/procedure/method has a definition and an implementation. How will this be handled?

A. Mussche Posted September 4, 2013

Created GitHub repo: https://github.com/andremussche/SemanticMergeDelphi/

Jeroen Vandezande Posted September 4, 2013

I am also looking into parsing Oxygene .pas files and I have a couple of questions...

- Why YAML? Why not standard XML?
- Do you have a .NET lib for writing YAML? (Or the C# code that you used for reading the YAML?)
- As someone else already asked: what about the method definitions and the method implementations? Are you going to add a new type, like 'method definition'?

Best Regards,

Jeroen Vandezande

psantosl Posted September 4, 2013 (Author)

Hi all,

The version you should use is this one: 0.9.38 (or higher) - http://plasticscm.com/releases/SemanticMerge-0.9.38.0-installer.exe

Update: we just released version 0.9.38, replacing 0.9.37, with a change in how the file paths are sent to the parser: each path now goes on its own line, which makes the input much easier to parse.

The protocol is as follows:

1) SemanticMerge will invoke your external parser tool (specified as -ep=yourtool.exe on the Semantic CLI, see below).
2) SemanticMerge will pass you two arguments: "shell" (meaning you should loop instead of just exiting) and a "flag file" (meaning "write to this file when you're ready to start parsing").
3) For each file to parse, Semantic will write two lines to your stdin: the path of the file to parse, then the path of the file where you should write the output.
4) When you have written the output file, write OK to stdout.

So, a typical session will look like the following:

1) Semantic invokes your app as: yourapp shell flagfile
2) Your stdin will typically look like:

    file1.pas
    output1.pas
    file2.pas
    output2.pas
    file3.pas
    output3.pas
    end

(Normally the "end" will only arrive when Semantic is about to be closed, because it leaves the external parser open just in case it needs to parse something else.)

And your output for the input above would be:

    OK
    OK
    OK

If, for whatever reason, parsing of a file goes wrong, write KO instead.

I wrote a small C# example (will rewrite it in Delphi in a few minutes) to better explain how to interact with SemanticMerge.

    using System;
    using System.IO;

    namespace emptyparser
    {
        class Program
        {
            static void Main(string[] args)
            {
                // There are two arguments to consider:
                // 1) "shell", saying you must run in "shell mode":
                //    basically don't exit, and wait for commands
                // 2) a "flag file": write it when you're done
                //    initializing, just in case you need initialization
                //    (like starting up the Java VM)
                string shell = args[0];
                string flagFile = args[1];

                // Write the "flagfile" when you're ready
                File.WriteAllText(flagFile, "READY");

                string line;

                // Loop until Semantic writes "end" (or closes stdin)
                while ((line = Console.In.ReadLine()) != null && line != "end")
                {
                    // read the file to parse first
                    string fileToParse = line;

                    // then where to put the resulting tree
                    string outputFile = Console.In.ReadLine();

                    // Parse the "fileToParse"
                    try
                    {
                        // Write the result to "outputFile"
                        // (overwrite it if it already exists)
                        File.Copy(@"base.tree.txt", outputFile, true);

                        // write OK when you're done or KO if it didn't work
                        Console.WriteLine("OK");
                    }
                    catch (Exception)
                    {
                        Console.WriteLine("KO");
                    }
                }
            }
        }
    }

The "base.tree.txt" file that I'm always returning (empty parser) is like this:

    ---
    type : file
    name : /path/to/file
    locationSpan : {start: [1,0], end: [52,0]}
    footerSpan : [1752, 1751]
    parsingErrorsDetected : false
    children :

      - type : package
        name : com.codicesoftware.parser.java
        locationSpan : {start: [1,0], end: [2,1]}
        span : [0, 40]

      - type : import
        name : org.eclipse.core.runtime.IProgressMonitor
        locationSpan : {start: [3,0], end: [3,50]}
        span : [41, 93]

      - type : import
        name : org.eclipse.jdt.core.dom.*
        locationSpan : {start: [4,0], end: [4,35]}
        span : [94, 129]

      - type : import
        name : java.util.*
        locationSpan : {start: [5,0], end: [6,1]}
        span : [130, 150]

      - type : class
        name : PlasticVisitor
        locationSpan : {start: [7,0], end: [52,0]}
        headerSpan : [151, 202]
        footerSpan : [1751, 1751]
        children :

          - type : field
            name : NODE
            locationSpan : {start: [9,4], end: [11,3]}
            span : [203, 250]

          - type : method
            name : preVisit
            locationSpan : {start: [11,4], end: [27,3]}
            span : [251, 865]

          - type : method
            name : visitComments
            locationSpan : {start: [27,4], end: [40,3]}
            span : [866, 1340]

          - type : method
            name : getNodeAsString
            locationSpan : {start: [40,4], end: [51,5]}
            span : [1341, 1750]

And I invoke semanticmerge this way:

    semanticmergetool.exe -s src.pas -d dst.pas -b base.pas -r output.pas -emt=default -ep=emptyparser.exe

And it works!!

(Please note the input files are not named .java, because otherwise the internal Java parser would be triggered. That is what happened to me when I initially wrote this entry, so I have just fixed it.)

The file I'm using is a very simple Java file like this (I modified the "preVisit" method on both contributors):

    package com.codicesoftware.parser.java;

    import org.eclipse.core.runtime.IProgressMonitor;
    import org.eclipse.jdt.core.dom.*;
    import java.util.*;

    public class PlasticVisitor extends ASTVisitor
    {
        static final String NODE = "%NODE%";

        /**
         * Visits the given AST node prior to the type-specific visit.
         * The default implementation does nothing. Subclasses may reimplement.
         * @param node the node to visit
         */
        public void preVisit(ASTNode node)
        {
            System.out.println(getNodeAsString(node));

            // Comments that have null parent are not visited and do not show in the Tree.
            // If you would like to see them under the Compilation Unit node, uncomment the
            // code below:
            if (node instanceof CompilationUnit)
                visitComments((CompilationUnit)node);
        }

        // Normal (not javadoc) leading comment
        private void visitComments(CompilationUnit node)
        {
            List comments = ((CompilationUnit)node).getCommentList();
            if (comments != null)
            {
                for (int i=0; i < comments.size(); ++i)
                {
                    Comment comment = (Comment) comments.get(i);
                    if (comment != null && comment.getParent() == null)
                        comment.accept(this);
                }
            }
        }

        static private String getNodeAsString(ASTNode node)
        {
            String className = node.getClass().getName();
            int index = className.lastIndexOf(".");
            if (index > 0)
                className = className.substring(index+1);
            if (node instanceof Comment)
                return className;
            String value = "";
            return value;
        } //Trailing Comment
    }

So, if you plug in a REAL Delphi parser, you should be able to get it into Semantic.

The release I sent you should work, but it won't let you save the result because you still don't have the "external parser" license. We will be activating this license for everyone on this thread and everyone who has already reached me by email. But you'll be able to see the tool running even without a license.

Hope it helps!

pablo

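For anyone prototyping outside C#, the same shell-mode loop can be sketched in Python. This is only an illustration of the protocol as described in this post (0.9.38 behaviour: two stdin lines per request, OK/KO on stdout), not code shipped with SemanticMerge. The names `run_shell_loop` and `write_placeholder_tree` are made up for the sketch, and the placeholder tree it writes is a stand-in for real parser output.

```python
import sys


def write_placeholder_tree(source_path, output_path):
    """Hypothetical stand-in for a real parser: writes a minimal file node."""
    with open(output_path, "w", encoding="utf-8") as out:
        out.write("---\n")
        out.write("type : file\n")
        out.write(f"name : {source_path}\n")
        out.write("footerSpan : [0, -1]\n")
        out.write("parsingErrorsDetected : false\n")


def run_shell_loop(stdin, stdout, parse):
    """Serve parse requests until the "end" line (or end of input) arrives."""
    while True:
        first = stdin.readline()
        if first == "" or first.strip() == "end":
            break
        file_to_parse = first.strip()
        # The second line of each request is where the tree must be written.
        output_file = stdin.readline().strip()
        try:
            parse(file_to_parse, output_file)
            stdout.write("OK\n")
        except Exception:
            stdout.write("KO\n")
        # Flush so the caller sees the answer before sending the next request.
        stdout.flush()


def main(argv):
    # argv[1] is "shell"; argv[2] is the flag file to write once ready.
    with open(argv[2], "w", encoding="utf-8") as flag:
        flag.write("READY")
    run_shell_loop(sys.stdin, sys.stdout, write_placeholder_tree)


if __name__ == "__main__" and len(sys.argv) == 3 and sys.argv[1] == "shell":
    main(sys.argv)
```

Passing the streams and the parse function in as parameters keeps the loop testable without launching SemanticMerge: feed it a string buffer and check the OK/KO lines that come back.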
psantosl Posted September 4, 2013 (Author)

Hi Jeroen and André,

Jeroen Vandezande wrote:
> I am also looking into parsing Oxygene .pas files and I have a couple of questions...
> - Why YAML? Why not standard XML?
> - Do you have a .NET lib for writing YAML? (Or the C# code that you used for reading the YAML?)
> - As someone else already asked: what about the method definitions and the method implementations? Are you going to add a new type, like 'method definition'?

Yes, we must add the "method definition" concept to our internal mechanism. We'll discuss it early tomorrow and let you know. It is something we already had in mind (the same holds true for .h files in C and many more, and we had Delphi in mind).

André, thanks for the GitHub repo. You know you can use Plastic SCM GitSync to access it, right?

rafrancoso Posted September 4, 2013

Does this YAML support output from a parsed DFM?

psantosl Posted September 4, 2013 (Author)

You mean the interface definition files? We do not support those yet, just source code... unless you have a way to describe the DFM structure with the element types we have defined so far, which I'm not sure is possible in the current state.

Jeroen Vandezande Posted September 5, 2013

Hi,

Can you give us a list of element types that can be defined in the YAML file?

Best Regards,

Jeroen

miryamgsm Posted September 5, 2013

Hi all,

Why YAML?

We considered different formats: plain text, XML, JSON and YAML. In the end we decided to use YAML because it has the following advantages:

- There are YAML parsers for most languages.
- YAML is a superset of JSON. So, if someone wants to write the descriptor in JSON, we will be able to parse it with the YAML parser.
- It is human readable.

It won on merit.

YAML help

Regarding the .NET lib for YAML, we evaluated some libraries and "yamldotnet" was the one chosen.

URL: http://www.aaubry.net/page/YamlDotNet
Documentation: http://www.aaubry.net/page/YamlDotNet-Documentation

Where to get YamlDotNet?

The most up-to-date version can always be found in the following NuGet packages:

http://nuget.org/packages/YamlDotNet.Core
http://nuget.org/packages/YamlDotNet.RepresentationModel

Steps:

1. Install NuGet Package Manager: http://visualstudiogallery.msdn.microsoft.com/27077b70-9dad-4c64-adcf-c7cf6bc9970c
2. Find and install a NuGet package using the Package Manager Console: http://docs.nuget.org/docs/start-here/using-the-package-manager-console
3. Run:

    Install-Package YamlDotNet.Core
    Install-Package YamlDotNet.RepresentationModel

YAML structure

File encoding: UTF-8

Three kinds of structures can be defined: file, container and terminal node.

Structure: file

It is unique and required.

- "type": file.
- "name": path of the file.
- "locationSpan": row and column where the file starts and ends. (optional)
- "footerSpan": start and end char of the footer of the file.
- "parsingErrorsDetected": flag saying whether or not the file contains parsing errors.
- "children": set of containers and/or terminal nodes that it contains. (If there are no children, it must not be specified.)
- "parsingError": set of parsing errors. (optional)

Structure: container

- "type": relevant name of the element type in the current programming language.
- "name": name of the element.
- "locationSpan": row and column where the container starts and ends. (optional)
- "headerSpan": start and end char of the header of the container.
- "footerSpan": start and end char of the footer of the container.
- "children": set of containers and/or terminal nodes that it contains. (If there are no children, it must not be specified.)

Structure: terminal node

- "type": relevant name of the element type in the current programming language.
- "name": name of the element.
- "locationSpan": row and column where the node starts and ends. (optional)
- "span": start and end char of the node.

Remarks:

- If there are no children, do not specify the "children" field.
- Optional fields: locationSpan and parsingError.

The structure of parsingError is the following (check the provided example):

- "location": row and column where the error is located.
- "message": error message.

Regarding the question "Are you going to add a new type, like 'method definition'?": you can define any type you need (procedure, procedure declaration...). We take care of them.

Hope it helps!

Best regards,

Míryam

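To make the three structures concrete, here is a small Python sketch that models them with one class and prints them in the `key : value` layout used in the examples of this thread. It is an assumption-laden illustration, not the SemanticMerge implementation: the `Node` class, `emit` and `emit_file` are invented names, the optional `locationSpan` and `parsingError` fields are left out, and names containing YAML special characters are not escaped (a real parser would more likely use a YAML library).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start char, end char), both inclusive


@dataclass
class Node:
    type: str
    name: str
    span: Optional[Span] = None          # terminal nodes
    header_span: Optional[Span] = None   # containers
    footer_span: Optional[Span] = None   # containers and the file node
    parsing_errors_detected: Optional[bool] = None  # file node only
    children: List["Node"] = field(default_factory=list)


def node_fields(node):
    """Return (key, text) pairs for the fields that are set on this node."""
    pairs = [("type", node.type), ("name", node.name)]
    if node.span is not None:
        pairs.append(("span", str(list(node.span))))
    if node.header_span is not None:
        pairs.append(("headerSpan", str(list(node.header_span))))
    if node.footer_span is not None:
        pairs.append(("footerSpan", str(list(node.footer_span))))
    if node.parsing_errors_detected is not None:
        flag = "true" if node.parsing_errors_detected else "false"
        pairs.append(("parsingErrorsDetected", flag))
    return pairs


def emit(node, indent=0, as_list_item=False):
    """Return the descriptor lines for a node and everything below it."""
    first = " " * indent + ("- " if as_list_item else "")
    rest = " " * indent + ("  " if as_list_item else "")
    lines = []
    for i, (key, text) in enumerate(node_fields(node)):
        lines.append(f"{first if i == 0 else rest}{key} : {text}")
    if node.children:  # "children" must be omitted when there are none
        lines.append(f"{rest}children :")
        for child in node.children:
            lines.extend(emit(child, len(rest) + 2, True))
    return lines


def emit_file(root):
    """Return the whole descriptor as text, starting with the '---' marker."""
    return "\n".join(["---"] + emit(root)) + "\n"
```

Whether a node is a container or a terminal node is decided only by which span fields are filled in, which mirrors the field lists above: `span` for terminal nodes, `headerSpan` plus `footerSpan` for containers.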
A. Mussche Posted September 5, 2013

I just encountered a small bug in the example below, because the "filetoparse" and the "outputfile" are sent in a single line:

    "C:\Users\amussche\Documents\GitHub\SemanticMergeDelphi\test\base.pas" "C:\Users\amussche\AppData\Local\Temp\\1165c576-c794-4fbc-b753-2d896190bd83.tree"

So one should not use ReadLine twice!

    ...
    // Loop until Semantic writes "end"
    while( (line = Console.In.ReadLine() ) != "end" )
    {
        // read the file to parse first
        string fileToParse = line;

        // then where to put the resulting tree
        string outputFile = Console.In.ReadLine();
    ...

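A parser that has to cope with both layouts (both quoted paths on one line, as reported in this post, or one path per line, as in the protocol description) could read requests tolerantly. The Python sketch below is just one way to do that; `read_request` is a hypothetical helper, and it assumes the one-line form always wraps each path in double quotes, as in the line quoted above.

```python
import re

# Matches the contents of each double-quoted chunk on a line.
_QUOTED = re.compile(r'"([^"]*)"')


def read_request(stream):
    """Return (file_to_parse, output_file), or None on "end" or end of input."""
    line = stream.readline().strip()
    if line == "" or line == "end":
        return None
    quoted = _QUOTED.findall(line)
    if len(quoted) == 2:
        # One-line form: "source path" "output path"
        return quoted[0], quoted[1]
    # Two-line form: the output path arrives on the next line.
    source = quoted[0] if quoted else line
    second = stream.readline().strip()
    second_quoted = _QUOTED.findall(second)
    output = second_quoted[0] if second_quoted else second
    return source, output
```

A regular expression is used instead of a shell-style splitter because Windows paths contain backslashes, which shell-style splitting would treat as escape characters.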
A. Mussche Posted September 5, 2013

I have committed an initial version of the command line parser:

https://github.com/andremussche/SemanticMergeDelphi/tree/master/test

Note: change the fixed local path of my PC in the .bat file.
Note 2: see debug.log for more info (received data, parsing done, etc.).

However, semanticmergetool.exe keeps waiting, even though parsing is done and "OK" is written back...

A. Mussche Posted September 5, 2013

Jeroen Vandezande wrote:
> I am also looking into parsing Oxygene .pas files and I have a couple of questions...

I hope we can change the parser a bit and support all kinds of Pascal dialects (Oxygene, FPC, SmartMobileStudio).

psantosl Posted September 5, 2013 (Author)

Hi all,

We just released version 0.9.38 and I have updated my post above.

As André said, my sample code did not match what Semantic was sending. Semantic is now fixed, so it matches the sample code I wrote.

miryamgsm Posted September 5, 2013

Here you have the descriptor tree for your example file. Hope it helps (check the attached images).

    ---
    type : file
    name : /path/to/file
    locationSpan : {start: [1,0], end: [19,4]}
    footerSpan : [0, -1]
    parsingErrorsDetected : false
    children :

      - type : unit
        name : Unit1
        locationSpan : {start: [1,0], end: [1,13]}
        span : [0, 12]

      - type : interface
        name : interface
        locationSpan : {start: [2,0], end: [9,0]}
        headerSpan : [13, 25]
        footerSpan : [0, -1]
        children :

          - type : type
            name : type
            locationSpan : {start: [4,0], end: [9,0]}
            headerSpan : [26, 33]
            footerSpan : [0, -1]
            children :

              - type : class
                name : TTest
                locationSpan : {start: [6,0], end: [9,0]}
                headerSpan : [34, 59]
                footerSpan : [81, 88]
                children :

                  - type : procedure declaration
                    name : Test
                    locationSpan : {start: [7,0], end: [7,21]}
                    span : [60, 80]

      - type : implementation
        name : implementation
        locationSpan : {start: [9,0], end: [19,4]}
        headerSpan : [89, 106]
        footerSpan : [164, 169]
        children :

          - type : procedure
            name : TTest.Test
            locationSpan : {start: [11,0], end: [18,0]}
            span : [107, 163]

Explanation of the example:

It is easy to get the spans if you "translate" the code as follows:

    Terminal node: Unit1
      Text: "unit Unit1;\r\n"
    Container: interface
      Header text: "\r\ninterface\r\n"
      Footer text: empty
      Container: type
        Header text: "\r\ntype\r\n"
        Footer text: empty
        Container: TTest
          Header text: "  TTest = class(TObject)\r\n"
          Footer text: "  end;\r\n"
          Terminal node: Test
            Text: "    procedure Test;\r\n"
    Container: implementation
      Header text: "\r\nimplementation\r\n"
      Footer text: "\r\nend."
      Terminal node: TTest.Test
        Text: "\r\n{ TTest }\r\n\r\nprocedure TTest.Test;\r\nbegin\r\n  //\r\nend;\r\n"

Remarks:

- The end of line is in the Windows format: "\r\n".
- When there is no footer, footerSpan should be specified as [0, -1]. For example, in the "type" container.

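The "translation" above amounts to walking the pieces of text in file order and keeping a running character offset. The following Python sketch reproduces the spans of this descriptor that way. It is an illustration only: `assign_spans` and `DELPHI_PIECES` are made-up names, offsets are counted in characters with inclusive end positions (which is what the numbers in the descriptor suggest), and the leading spaces inside the strings are inferred from the span lengths.

```python
def assign_spans(pieces):
    """Map each label to an inclusive (start, end) character span.

    `pieces` is a list of (label, text) pairs in the order the text appears
    in the source file. Empty text gets (0, -1), the "no footer" marker.
    """
    spans = {}
    offset = 0
    for label, text in pieces:
        if not text:
            spans[label] = (0, -1)
            continue
        spans[label] = (offset, offset + len(text) - 1)
        offset += len(text)
    return spans


# The pieces of the example unit, in file order.
DELPHI_PIECES = [
    ("unit", "unit Unit1;\r\n"),
    ("interface.header", "\r\ninterface\r\n"),
    ("type.header", "\r\ntype\r\n"),
    ("TTest.header", "  TTest = class(TObject)\r\n"),
    ("Test", "    procedure Test;\r\n"),
    ("TTest.footer", "  end;\r\n"),
    ("type.footer", ""),
    ("interface.footer", ""),
    ("implementation.header", "\r\nimplementation\r\n"),
    ("TTest.Test", "\r\n{ TTest }\r\n\r\nprocedure TTest.Test;\r\nbegin\r\n  //\r\nend;\r\n"),
    ("implementation.footer", "\r\nend."),
]

if __name__ == "__main__":
    for label, span in assign_spans(DELPHI_PIECES).items():
        print(label, list(span))
```

Because every character of the file belongs to exactly one piece, concatenating the pieces gives back the original source, which is the property the round-trip testing tools rely on.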
A. Mussche Posted September 6, 2013

miryamgsm wrote:
> Here you have the descriptor tree for your example file. Hope it helps (check the attached image)

Thanks! Some remarks:

- "interface" is one block, but "implementation" is a separate block (not a child of "interface" but at the same level in the unit). I don't think it's a problem if I change that.
- Do I understand correctly that I can choose my own types? Semantic merge "only" compares nodes of the same type, no matter what the type is called? (So no hard predefined names, but dynamic ones.)
- Indeed, some kind of link between the "interface" procedure and the "implementation" procedure would be nice (in case of renames?).

miryamgsm Posted September 6, 2013

Hi all,

A. Mussche wrote:
> Thanks! Some remarks:
> - "interface" is one block, but "implementation" is a separate block (not a child of "interface" but at the same level in the unit). I don't think it's a problem if I change that.
> - Do I understand correctly that I can choose my own types? Semantic merge "only" compares nodes of the same type, no matter what the type is called? (So no hard predefined names, but dynamic ones.)
> - Indeed, some kind of link between the "interface" procedure and the "implementation" procedure would be nice (in case of renames?).

- My previous post was updated: I completed the descriptor and added a small explanation to clarify the example. I fixed the image too. Thanks, André, for your kind cooperation.
- The containers and terminal nodes have a "type" field, which specifies the relevant name of the element type in the current programming language. It is used by SM; you do not need to worry about the internal mechanism.
- Yes. We think linking the "definition" to the "implementation" would be nice in some scenarios. We will think about it.

For your information, we are thinking about making the spans optional, as we see they are quite difficult to get and even for us to explain. We will discuss it on Monday and let you know. Would it be easier if we requested just the line and column? Is it difficult to get the span (chars) from the parsers? We would like to know your opinion so we can make the structure simpler. Also, headerSpan and footerSpan could be joined into a single span... well, tell us your opinions!

Your suggestions are welcome at any time!!

Best regards,

Míryam

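On the question of line/column versus character spans: if a parser only reports line and column, the character offset can be derived from a table of line start offsets. The Python sketch below assumes 1-based lines and 0-based columns for start positions, which is how the `start` values in the descriptors of this thread appear to be written. The thread does not pin down the convention for `end` positions, so only start positions are handled here, and the function names are invented for the example.

```python
def line_starts(text):
    """Return the character offset at which each line of `text` begins."""
    starts = [0]
    for index, char in enumerate(text):
        if char == "\n":
            starts.append(index + 1)
    return starts


def to_offset(starts, line, column):
    """Convert a (1-based line, 0-based column) position to a char offset."""
    return starts[line - 1] + column
```

For the example unit, whose first lines are "unit Unit1;", an empty line and "interface", the position [2,0] maps to offset 13, which matches the start of the "interface" headerSpan in the descriptor above.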
A. Mussche Posted September 9, 2013

Thanks for the updated post, it makes more sense now. I hope to update the PoC today or tomorrow.

The spans are only confusing the first time you write the parser, and I have the information anyway (but making them optional may make things easier for other parsers later?). The exact format doesn't really matter (like joining header/footer) once you know what it should be.

miryamgsm wrote:
> - My previous post was updated: I completed the descriptor and added a small explanation to clarify the example. I fixed the image too. Thanks, André, for your kind cooperation.
> - The containers and terminal nodes have a "type" field, which specifies the relevant name of the element type in the current programming language. It is used by SM; you do not need to worry about the internal mechanism.
> - Yes. We think linking the "definition" to the "implementation" would be nice in some scenarios. We will think about it.
>
> For your information, we are thinking about making the spans optional, as we see they are quite difficult to get and even for us to explain. We will discuss it on Monday and let you know. Would it be easier if we requested just the line and column? Is it difficult to get the span (chars) from the parsers? We would like to know your opinion so we can make the structure simpler. Also, headerSpan and footerSpan could be joined into a single span... well, tell us your opinions!
>
> Your suggestions are welcome at any time!!
>
> Best regards,
> Míryam

A. Mussche Posted September 10, 2013

Cool! A simple file works now.

However... when a locationSpan ends at the start of a line, the "i" of "implementation" is selected too. E.g.:

    locationSpan : {start: [3, 0], end: [10, 0]}

Second: the empty lines are not copied! Or should these be included within the "spans"?

miryamgsm Posted September 11, 2013

A. Mussche wrote:
> Cool! A simple file works now.
> However... when a locationSpan ends at the start of a line, the "i" of "implementation" is selected too. E.g.: locationSpan : {start: [3, 0], end: [10, 0]}
> Second: the empty lines are not copied! Or should these be included within the "spans"?

Congrats André!

In your example, "interface", "type" and "TTest" should all end at the same location: the last char of the "  end;\r\n" line.

Second: yes, the empty lines should be included in the structure. In general, it must be possible to rebuild the source file correctly from the spans.

Míryam

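One quick way to catch the "empty lines are not copied" problem before handing a tree to Semantic is to check that the spans cover every character of the source. The Python sketch below does that on a descriptor already loaded into nested dictionaries (for instance by a YAML library). `collect_spans` and `find_gaps` are hypothetical helpers, and the check assumes inclusive character offsets and that a span whose end is smaller than its start, such as [0, -1], means "empty".

```python
def collect_spans(node):
    """Gather every non-empty span, headerSpan and footerSpan in the tree."""
    spans = []
    for key in ("span", "headerSpan", "footerSpan"):
        if key in node:
            start, end = node[key]
            if end >= start:  # [0, -1] and similar mark an empty span
                spans.append((start, end))
    for child in node.get("children", []):
        spans.extend(collect_spans(child))
    return spans


def find_gaps(tree, source_length):
    """Return inclusive (start, end) ranges of characters no span covers."""
    gaps = []
    expected = 0
    for start, end in sorted(collect_spans(tree)):
        if start > expected:
            gaps.append((expected, start - 1))
        expected = max(expected, end + 1)
    if expected < source_length:
        gaps.append((expected, source_length - 1))
    return gaps
```

An empty result means the file can be rebuilt by concatenating the spans in order. Any range returned points at text, often blank lines or comments between declarations, that still has to be attached to a neighbouring header, footer or terminal node.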