psantosl

Delphi parser development


Hi all,

 

Delphi support is among the top 5 requests on our UserVoice: "Delphi support in SemanticMerge". So we're eager to add Object Pascal to the list of supported languages.

 

To do so, we've developed a way to plug in external parsers, so if you can develop a Delphi language parser, it will be very simple to get it invoked from Semantic.

 

 

 

If you're interested in joining our "Delphi Parser" effort, please join this thread and we will send you the required tools.

 

Right now, all that plugging in a parser requires is:

* Create a standalone executable.

* It must be able to receive some data as arguments.

* It must be able to export the "tree" of the file in YAML format.

 

Of course you'll need all the details but this is just an intro of what it takes.

 

We've also developed tools to help test the parsers, like a "directory parser" that loops through a code tree, parses each file (invoking your parser), and then rebuilds the source file to make sure the original and the regenerated one match.

 

We're eager to get this started!

 

pablo

Wow! Top 5?? :o

 

Too bad I'm not working with Delphi anymore. :(

I really would like to help you! :)

 

Keep up the good work!!


Hi,

I'm interested in helping out as a tester/reviewer. Unfortunately my job is pretty full-on and I don't see that I'll be able to spend a lot of time coding on this project, but will definitely get involved and help as much as I can.

I took a look at the YAML spec (never seen it before) and it seems someone is going to have to write a parser for Delphi. We use the Delphi SuperObject for JSON implementation and something like that might be able to be adapted.

Please let me know how I can get involved.

Rick.


Hi!

 

Here comes an example, just to get you used to what it looks like, OK? I'll be publishing more detailed information early tomorrow.

 

Considering this Java file:

 

 

package com.codicesoftware.parser.java;


import org.eclipse.core.runtime.IProgressMonitor;
import org.eclipse.jdt.core.dom.*;
import java.util.*;


public class PlasticVisitor extends ASTVisitor {
    
    static final String NODE = "%NODE%";
    
    /**
     * Visits the given AST node prior to the type-specific visit.
     * The default implementation does nothing. Subclasses may reimplement.
     * @param node the node to visit
     */
    public void preVisit(ASTNode node)
    {
        System.out.println(getNodeAsString(node));


        // Comments that have null parent are not visited and do not show in the Tree. 
        // If you would like to see them under the Compilation Unit node, uncomment the
        // code below:
        if (node instanceof CompilationUnit) 
            visitComments((CompilationUnit)node);
    }
    
    // Normal (not javadoc) leading comment
    private void visitComments(CompilationUnit node)
    {
        List comments = ((CompilationUnit)node).getCommentList();
        if (comments != null) {
            for (int i=0; i < comments.size(); ++i) {
                Comment comment = (Comment) comments.get(i);
                if (comment != null && comment.getParent() == null) 
                    comment.accept(this);
            }
        }
    }
    
    static private String getNodeAsString(ASTNode node)
    {
        String className = node.getClass().getName();
        int index = className.lastIndexOf(".");
        if (index > 0)
            className = className.substring(index+1);
        
        if (node instanceof Comment) 
            return className;
        String value = "";
        return value;
    } //Trailing Comment
}

The resulting "tree" in YAML format is going to be something like the following:

 

 

---
type : file
name : /path/to/file
locationSpan : {start: [1,0], end: [52,0]}
footerSpan : [1752, 1751]
parsingErrorsDetected : true
children :


  - type : package
    name : com.codicesoftware.parser.java
    locationSpan : {start: [1,0], end: [2,1]}
    span : [0, 40]


  - type : import
    name : org.eclipse.core.runtime.IProgressMonitor
    locationSpan : {start: [3,0], end: [3,50]}
    span : [41, 93]


  - type : import
    name : org.eclipse.jdt.core.dom.*
    locationSpan : {start: [4,0], end: [4,35]}
    span : [94, 129]


  - type : import
    name : java.util.*
    locationSpan : {start: [5,0], end: [6,1]}
    span : [130, 150]


  - type : class
    name : PlasticVisitor
    locationSpan : {start: [7,0], end: [52,0]}
    headerSpan : [151, 202]
    footerSpan : [1751, 1751]
    children :


     - type : field
       name : NODE
       locationSpan : {start: [9,4], end: [11,3]}
       span : [203, 250]


     - type : method
       name : preVisit 
       locationSpan : {start: [11,4], end: [27,3]}
       span : [251, 865]


     - type : method
       name : visitComments 
       locationSpan : {start: [27,4], end: [40,3]}
       span : [866, 1340]


     - type : method
       name : getNodeAsString 
       locationSpan : {start: [40,4], end: [51,5]}
       span : [1341, 1750]


parsingError :


  - location: [2,3]
    message: "Error before {"


  - location: [4,5]
    message: "Error other"

As I said, I'll be posting a link to the SemanticMerge version able to deal with the external parsers and so on.

 

I think we should probably create a github repo to contain the examples plus source code, what do you think?

 

Thanks!

 

pablo


Can you post the link to the tools? 

And some more specifications about the yaml format? (seems simple enough, but above example is not complete?)

Then I will try to make at least a PoC and put it in a GitHub repo.

 

At least I found a YAML parser for Delphi:

https://bitbucket.org/OCTAGRAM/delphi-yaml/wiki/Home

(don't know how good it is)

 

Next is a delphi syntax parser, found some:

https://code.google.com/p/dwscript/ (supports various syntaxes like Prism/Oxygene too?)

https://github.com/jacobthurman/Castalia-Delphi-Parser

http://jedicodeformat.sourceforge.net/

http://wiki.freepascal.org/fcl-passrc

(don't know which is the best)


I have an initial PoC working, which parses the following file:

unit Unit1;

interface

type
  TTest = class(TObject)
    procedure Test;
  end;

implementation

{ TTest }

procedure TTest.Test;
begin
  //
end;

end.

The output it generates is:

---
type : file
name : test
locationSpan : {start: [0,0], end: [19,0]}
footerSpan : [173,173]
children :

  - type : class
    name : TTest
    locationSpan : {start: [5,0], end: [7,6]}
    headerSpan : [35,59]
    footerSpan : [88,88]
    children :
  
    - type : method
      name : Test
      locationSpan : {start: [6,0], end: [6,19]}
      span : [61,80]
    

Do you have a good name for the github repo?

 

Note: in Delphi/Pascal a function/procedure/method has a definition and an implementation: how will this be handled? 


I am also looking into parsing Oxygene .pas files and I have a couple of questions...

- Why YAML? why not standard XML?

- Do you have a .NET lib for writing YAML? (or C# code that you used for reading the YAML)

- As Someone else already asked: what about the method definitions and the method implementation?

  Are you going to add a new type? like 'method definition'?

 

Best Regards,

 

Jeroen Vandezande


Hi all,

 

The version you should use is this one: 0.9.38 (or higher) - http://plasticscm.com/releases/SemanticMerge-0.9.38.0-installer.exe

 

Update: we just released version 0.9.38, replacing 0.9.37. It changes the way files are sent to the parser: each file path now goes on its own line, which makes the input much easier to parse.

 

The protocol is as follows:

 

1) SemanticMerge will invoke your external parser tool (specified as -ep=yourtool.exe on the Semantic CLI (see below)).

2) SemanticMerge will send you two params: "shell" (saying you should loop instead of just exiting) and a "flag file" telling you "write to this file when you're ready to start parsing"

 

3) For each file to parse Semantic will write 2 values in your stdin: the path of the file to parse, the path of the file where you should write the output

 

4) When you have written the output file, write OK in the stdout

 

 

So, a typical session will look like the following:

 

1) semantic invokes your app as yourapp shell flagfile

2) Your stdin will typically look like

file1.pas
output1.pas
file2.pas
output2.pas
file3.pas
output3.pas
end

(Normally the "end" will only arrive when semantic is about to be closed because it will leave the external parser open just in case it needs to parse something else).

 

And your output for the input above would be:

OK
OK
OK

If, for whatever reason parsing of a file goes wrong, then you'll write KO instead.
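The session above can be sketched as a small loop. This is only an illustration in Python (any language that produces a standalone executable would do); `serve`, `run_shell` and `parse_file` are hypothetical names, and the "READY" text written to the flag file is simply what the C# sample in this thread writes.

```python
import io
import sys


def serve(stdin, stdout, parse_file):
    """Answer parse requests until "end" (or end of input) arrives.

    parse_file(source_path, output_path) is a hypothetical callback that
    writes the YAML tree and raises an exception when parsing fails.
    """
    while True:
        source_line = stdin.readline()
        # readline() returns "" at end of input; stop instead of spinning.
        if source_line == "" or source_line.strip() == "end":
            break
        output_path = stdin.readline().strip()
        try:
            parse_file(source_line.strip(), output_path)
            stdout.write("OK\n")
        except Exception:
            stdout.write("KO\n")
        # Flush so the caller sees each answer right away.
        stdout.flush()


def run_shell(argv, parse_file):
    """Entry point for: yourapp shell flagfile (not called in this demo)."""
    flag_file = argv[2]
    with open(flag_file, "w", encoding="utf-8") as handle:
        handle.write("READY")
    serve(sys.stdin, sys.stdout, parse_file)


# Simulated session: two requests, then "end".
fake_stdin = io.StringIO("file1.pas\noutput1.pas\nfile2.pas\noutput2.pas\nend\n")
fake_stdout = io.StringIO()
serve(fake_stdin, fake_stdout, parse_file=lambda src, dst: None)
print(fake_stdout.getvalue(), end="")
```

Passing the streams in as parameters keeps the loop testable without launching SemanticMerge; a real parser would call `run_shell(sys.argv, your_parse_function)`.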

 

 

 

I wrote a small C# example (will rewrite it in Delphi in a few minutes) to better explain how to interact with SemanticMerge.

using System;
using System.IO;

namespace emptyparser
{
    class Program
    {
        static void Main(string[] args)
        {
            // There are two arguments to consider:
            // 1) "shell", saying you must run in "shell mode":
            //    basically, don't exit and wait for commands.
            // 2) A "flag file": write it once you're ready, in case
            //    you need some initialization first (like starting
            //    up the Java VM).
            string shell = args[0];
            string flagFile = args[1];

            // Write the "flagfile" when you're ready
            File.WriteAllText(flagFile, "READY");

            string line;

            // Loop until Semantic writes "end" (or closes stdin,
            // in which case ReadLine returns null)
            while ((line = Console.In.ReadLine()) != null && line != "end")
            {
                // read the file to parse first
                string fileToParse = line;

                // then where to put the resulting tree
                string outputFile = Console.In.ReadLine();

                try
                {
                    // Parse "fileToParse" and write the result to
                    // "outputFile". This empty parser always returns
                    // the same tree; overwrite the output file in
                    // case it already exists.
                    File.Copy(@"base.tree.txt", outputFile, true);

                    // write OK when you're done or KO if it didn't work
                    Console.WriteLine("OK");
                }
                catch (Exception)
                {
                    Console.WriteLine("KO");
                }
            }
        }
    }
}

The "base.tree.txt" file that I'm always returning (empty parser) is like this:

---
type : file
name : /path/to/file
locationSpan : {start: [1,0], end: [52,0]}
footerSpan : [1752, 1751]
parsingErrorsDetected : false
children :

  - type : package
    name : com.codicesoftware.parser.java
    locationSpan : {start: [1,0], end: [2,1]}
    span : [0, 40]

  - type : import
    name : org.eclipse.core.runtime.IProgressMonitor
    locationSpan : {start: [3,0], end: [3,50]}
    span : [41, 93]

  - type : import
    name : org.eclipse.jdt.core.dom.*
    locationSpan : {start: [4,0], end: [4,35]}
    span : [94, 129]

  - type : import
    name : java.util.*
    locationSpan : {start: [5,0], end: [6,1]}
    span : [130, 150]

  - type : class
    name : PlasticVisitor
    locationSpan : {start: [7,0], end: [52,0]}
    headerSpan : [151, 202]
    footerSpan : [1751, 1751]
    children :

     - type : field
       name : NODE
       locationSpan : {start: [9,4], end: [11,3]}
       span : [203, 250]

     - type : method
       name : preVisit
       locationSpan : {start: [11,4], end: [27,3]}
       span : [251, 865]

     - type : method
       name : visitComments
       locationSpan : {start: [27,4], end: [40,3]}
       span : [866, 1340]

     - type : method
       name : getNodeAsString
       locationSpan : {start: [40,4], end: [51,5]}
       span : [1341, 1750]

And I invoke semanticmerge this way:

semanticmergetool.exe -s src.pas -d dst.pas -b base.pas -r output.pas -emt=default -ep=emptyparser.exe

And it works!!

 

(Please note the entry files are not .java because otherwise the internal java parser will be triggered - which is what happened to me when I initially wrote this entry, so I just have fixed it now).

 

The file I'm using is a very simple Java file like this: (I modified the "preVisit" method on both contributors)

 

 

package com.codicesoftware.parser.java;


import org.eclipse.core.runtime.IProgressMonitor;
import org.eclipse.jdt.core.dom.*;
import java.util.*;


public class PlasticVisitor extends ASTVisitor {
    
    static final String NODE = "%NODE%";
    
    /**
     * Visits the given AST node prior to the type-specific visit.
     * The default implementation does nothing. Subclasses may reimplement.
     * @param node the node to visit
     */
    public void preVisit(ASTNode node)
    {
        System.out.println(getNodeAsString(node));


        // Comments that have null parent are not visited and do not show in the Tree. 
        // If you would like to see them under the Compilation Unit node, uncomment the
        // code below:
        if (node instanceof CompilationUnit) 
            visitComments((CompilationUnit)node);
    }
    
    // Normal (not javadoc) leading comment
    private void visitComments(CompilationUnit node)
    {
        List comments = ((CompilationUnit)node).getCommentList();
        if (comments != null) {
            for (int i=0; i < comments.size(); ++i) {
                Comment comment = (Comment) comments.get(i);
                if (comment != null && comment.getParent() == null) 
                    comment.accept(this);
            }
        }
    }
    
    static private String getNodeAsString(ASTNode node)
    {
        String className = node.getClass().getName();
        int index = className.lastIndexOf(".");
        if (index > 0)
            className = className.substring(index+1);
        
        if (node instanceof Comment) 
            return className;
        String value = "";
        return value;
    } //Trailing Comment
}

So, if you plug in a REAL Delphi parser, you should be able to get it working with Semantic.

 

The release I sent you should work but it won't let you save the result because you still don't have the "external parser" license. We will be activating this license for everyone on this thread or everyone who already reached me by email. But you'll be able to see the tool running even without license.

 

Hope it helps!

 

pablo


Hi Jeroen and André,

 


 

Yes, we must add the "method definition" thing to our internal mechanism. We'll be discussing it early tomorrow and let you know. It is something we had in mind already (same holds true for .h in C and many more, and we had Delphi in mind).

 

André, thanks for the GitHub repo. You know you can use Plastic SCM GitSync to access it, right? :P


You mean the interface definition files?

We do not support those yet, just source code... Unless you have a way to define the DFM structure inside the element types we defined so far... which I'm not sure you can at the current status.


Hi all,

 

Why YAML?
 
We thought about different formats: plain text, XML, JSON and YAML.
Finally, we decided to use YAML because it has the following advantages:
- There are YAML parsers for all languages
- YAML is a superset of JSON. So, if someone wants to write the descriptor in JSON, we will be able to parse it with the YAML parser
- It is human readable
 
It won meritoriously ;)
 
YAML help
 
Regarding the .NET lib for YAML, we evaluated several libraries and chose "YamlDotNet".
 
Where to get YamlDotNet?
The most up-to-date version can always be found in the following NuGet packages:
 
Steps:
2. Finding and Installing a NuGet Package Using the Package Manager Console: http://docs.nuget.org/docs/start-here/using-the-package-manager-console
3. Run:
Install-Package YamlDotNet.Core
Install-Package YamlDotNet.RepresentationModel
 
YAML structure
 
File encoding: UTF-8
Three kinds of structures can be defined: file, container, and terminal node.
 
Structure: file
It is unique and required.
 - "type": file.
 - "name": path of the file.
 - "locationSpan": row and column where the file starts and ends. (optional)
 - "footerSpan": start and end char of the file footer.
 - "parsingErrorsDetected": flag indicating whether the file contains parsing errors.
 - "children": set of containers and/or terminal nodes that it contains. (If there are no children, it must not be specified)
 - "parsingError": set of parsing errors. (optional)
 
Structure: container
 - "type": relevant name of the element type in the current programming language.
 - "name": name of the element.
 - "locationSpan": row and column where the container starts and ends. (optional)
 - "headerSpan": start and end char of the container header.
 - "footerSpan": start and end char of the container footer.
 - "children": set of containers and/or terminal nodes that it contains. (If there are no children, it must not be specified)
 
Structure: terminal node
 - "type": relevant name of the element type in the current programming language.
 - "name": name of the element.
 - "locationSpan": row and column where the node starts and ends. (optional)
 - "span": start and end char of the node.
 
Remarks:
If there are no children, do not specify "children" field. 
Optional fields: locationSpan and parsingError.
The structure of parsingError is the following: (check the provided example)
 - "location" : row and column where the error is located.
 - "message" : error message.

Regarding the question "Are you going to add a new type, like 'method definition'?": you can define any type you need: procedure, procedure declaration... We take care of them ;)
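To make the structure above more concrete, here is a minimal Python sketch that writes a descriptor from a nested dictionary. The dictionary layout and the names `emit_node` and `emit_document` are assumptions made for this illustration, not part of SemanticMerge. The optional "locationSpan" and "parsingError" fields are left out, and names are written unquoted, so a name containing YAML-special characters would need extra care.

```python
def emit_node(node, indent=0, list_item=False):
    """Return the descriptor lines for one file, container or terminal node."""
    fields = [("type", node["type"]), ("name", node["name"])]
    for key in ("span", "headerSpan", "footerSpan"):
        if key in node:
            fields.append((key, "[{}, {}]".format(node[key][0], node[key][1])))
    if "parsingErrorsDetected" in node:
        flag = "true" if node["parsingErrorsDetected"] else "false"
        fields.append(("parsingErrorsDetected", flag))

    lines = []
    for position, (key, value) in enumerate(fields):
        if not list_item:
            prefix = ""
        elif position == 0:
            prefix = "- "
        else:
            prefix = "  "
        lines.append("{}{}{} : {}".format(" " * indent, prefix, key, value))

    # "children" is only written when there is at least one child.
    children = node.get("children")
    if children:
        child_base = indent + (2 if list_item else 0)
        lines.append(" " * child_base + "children :")
        for child in children:
            lines.append("")
            lines.extend(emit_node(child, child_base + 2, list_item=True))
    return lines


def emit_document(root):
    return "---\n" + "\n".join(emit_node(root)) + "\n"


tree = {
    "type": "file", "name": "/path/to/file",
    "footerSpan": (0, -1), "parsingErrorsDetected": False,
    "children": [
        {"type": "unit", "name": "Unit1", "span": (0, 12)},
        {"type": "implementation", "name": "implementation",
         "headerSpan": (89, 106), "footerSpan": (164, 169),
         "children": [
             {"type": "procedure", "name": "TTest.Test", "span": (107, 163)},
         ]},
    ],
}
print(emit_document(tree), end="")
```

The "type" values here are free-form strings, in line with the answer above that any type name can be defined.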

 

Hope it helps!

 

Best regards,

Míryam


I just encountered a small bug in the example below, because the "filetoparse" and the "outputfile" are sent on a single line:

"C:\Users\amussche\Documents\GitHub\SemanticMergeDelphi\test\base.pas" "C:\Users\amussche\AppData\Local\Temp\\1165c576-c794-4fbc-b753-2d896190bd83.tree"

So one should not use ReadLine twice!

 

...

            // Loop until Semantic writes "end"
            while( (line = Console.In.ReadLine() ) != "end" )
            {
                // read the file to parse first
                string fileToParse = line;


                // then where to put the resulting tree
                string outputFile = Console.In.ReadLine();

...
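A parser that has to cope with both behaviours could read requests defensively. The Python sketch below is an assumption-laden illustration: it supposes the single-line form always consists of two double-quoted paths, as in the line quoted above, and that otherwise the output path follows on its own line. `read_request` is a hypothetical helper name.

```python
import io
import re

# Matches each double-quoted chunk on a line.
QUOTED_PATH = re.compile(r'"([^"]*)"')


def read_request(stdin):
    """Return (source_path, output_path), or None on "end" or end of input."""
    line = stdin.readline()
    if line == "" or line.strip() == "end":
        return None
    quoted = QUOTED_PATH.findall(line)
    if len(quoted) == 2:
        # Both quoted paths arrived on a single line.
        return quoted[0], quoted[1]
    # Otherwise the output path is expected on the next line.
    source_path = line.strip().strip('"')
    output_path = stdin.readline().strip().strip('"')
    return source_path, output_path


# Hypothetical paths, used only to exercise both input shapes.
single_line = io.StringIO(r'"C:\work\base.pas" "C:\temp\base.tree"' + "\n")
two_lines = io.StringIO("file1.pas\noutput1.pas\nend\n")
print(read_request(single_line))
print(read_request(two_lines))
print(read_request(two_lines))
```

A regular expression is used instead of shell-style splitting so that backslashes in Windows paths are passed through untouched.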


I have committed an initial version of the command line parser:

https://github.com/andremussche/SemanticMergeDelphi/tree/master/test

 

note: change the fixed local path of my pc in the .bat file :)

note2: see debug.log for more info (received data, parsing done, etc)

 

However, the semanticmergetool.exe keeps waiting, but parsing is done and "OK" is written back...


I am also looking into parsing Oxygene .pas files and I have a couple of questions...

 

I hope we can change the parser a bit and support all kinds of pascal dialects (Oxygene, FPC, SmartMobileStudio)


Hi all,

 

We just released version 0.9.38 and I have updated my blog post above. As André said my sample code was wrong. Now Semantic is fixed so it matches the sample code I wrote.

Here you have the descriptor tree for your example file. Hope it helps :) (check the attached images)
 
---
type : file
name : /path/to/file
locationSpan : {start: [1,0], end: [19,4]}
footerSpan : [0, -1]
parsingErrorsDetected : false
children :

  - type : unit
    name : Unit1
    locationSpan : {start: [1,0], end: [1,13]}
    span : [0, 12]

  - type : interface
    name : interface
    locationSpan : {start: [2,0], end: [9,0]}
    headerSpan : [13, 25]
    footerSpan : [0, -1]
    children :

      - type : type
        name : type
        locationSpan : {start: [4,0], end: [9,0]}
        headerSpan : [26, 33]
        footerSpan : [0, -1]
        children :

          - type : class
            name : TTest
            locationSpan : {start: [6,0], end: [9,0]}
            headerSpan : [34, 59]
            footerSpan : [81, 88]
            children :

              - type : procedure declaration
                name : Test
                locationSpan : {start: [7,0], end: [7,21]}
                span : [60, 80]

  - type : implementation
    name : implementation
    locationSpan : {start: [9,0], end: [19,4]}
    headerSpan : [89, 106]
    footerSpan : [164, 169]
    children :

      - type : procedure 
        name : TTest.Test
        locationSpan : {start: [11,0], end: [18,0]}
        span : [107, 163]

 

Explanation of the example:

 

It is easy to get the spans if you "translate" the code as follows:

 

Terminal node: Unit1

Text: "unit Unit1;\r\n"

 

Container: interface

Header text: "\r\ninterface\r\n"

Footer text: empty

 

Container: type

Header text: "\r\ntype\r\n"

Footer text: empty

 

Container: TTest

Header text: "  TTest = class(TObject)\r\n"
Footer text: "  end;\r\n"

 

Terminal node: Test

Text: "    procedure Test;\r\n"

 

Container: implementation

Header text: "\r\nimplementation\r\n"
Footer text: "\r\nend."

 

Terminal node: TTest.Test

Text: "\r\n{ TTest }\r\n\r\nprocedure TTest.Test;\r\nbegin\r\n  //\r\nend;\r\n"

 

Remarks:

- The end of line is in the windows format: "\r\n"

- When there is no footer, footerSpan should be specified as [0, -1]. For example, in the "type" container.
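The char spans in the descriptor can be reproduced from the pieces of text listed in the explanation: walk the pieces in file order, keep a running offset, and give each piece an inclusive [start, end] range. The Python sketch below does exactly that; `assign_spans` and the labels are names made up for this illustration.

```python
def assign_spans(segments):
    """Map each label to an inclusive [start, end] char span.

    Empty pieces (containers without a footer) get [0, -1].
    """
    spans = {}
    offset = 0
    for label, text in segments:
        if text:
            spans[label] = [offset, offset + len(text) - 1]
        else:
            spans[label] = [0, -1]
        offset += len(text)
    return spans


# The pieces from the explanation, in the order they appear in the file.
SEGMENTS = [
    ("Unit1", "unit Unit1;\r\n"),
    ("interface header", "\r\ninterface\r\n"),
    ("type header", "\r\ntype\r\n"),
    ("TTest header", "  TTest = class(TObject)\r\n"),
    ("Test", "    procedure Test;\r\n"),
    ("TTest footer", "  end;\r\n"),
    ("type footer", ""),
    ("interface footer", ""),
    ("implementation header", "\r\nimplementation\r\n"),
    ("TTest.Test",
     "\r\n{ TTest }\r\n\r\nprocedure TTest.Test;\r\nbegin\r\n  //\r\nend;\r\n"),
    ("implementation footer", "\r\nend."),
]

for label, span in assign_spans(SEGMENTS).items():
    print(label, span)
```

Each "\r\n" counts as two chars, which is why the end-of-line format matters for the offsets.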

post-9801-0-26909400-1378472294_thumb.png

post-9801-0-46186100-1378472305_thumb.jpg


 


 

thanks!

some remarks:

- "interface" is one block, but "implementation" is a separate block (not a child of interface but at the same level in the unit), but I don't think it's a problem if I change that

- do I understand correctly that I can choose my own types? the semantic merge "only" compares same types, no matter what name it has? (so no hard predefined names but dynamic)

- indeed, some kind of linking the "interface" procedure to the "implementation" procedure would be nice (in case of renames?)


Hi all,

 

 


 

- My previous post was updated: I completed the descriptor and added a short explanation to clarify the example. I also fixed the image. Thanks, André, for your kind cooperation.

- The containers and terminal nodes have a "type" field, which specifies the relevant name of the element type in the current programming language. It is used by SM; you don't need to worry about the internal mechanism :)
- Yes. We think linking the "definition" to the "implementation" would be nice in some scenarios. We will think about it.
 
For your information, we are thinking about making the spans optional, as we see they are quite difficult to get and even for us to explain. We will discuss it on Monday and let you know.
 
Would it be easier if we requested just the line and column? Is it difficult to get the span (chars) from the parsers? We would like to know your opinion so we can simplify the structure.
 
Also, headerSpan and footerSpan could be joined into a single span... well, tell us your opinions! Your suggestions are welcome at any time!!
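For parsers that only report one of the two forms, converting between a char offset and a line/column pair is mechanical once the line start offsets are known. The Python sketch below assumes 1-based rows and 0-based columns, which is what the start positions in the example descriptor suggest; that convention and the function names are assumptions, not a confirmed part of the format.

```python
import bisect

# The Unit1 sample from this thread, with Windows line endings.
SOURCE = "\r\n".join([
    "unit Unit1;", "", "interface", "", "type",
    "  TTest = class(TObject)", "    procedure Test;", "  end;", "",
    "implementation", "", "{ TTest }", "", "procedure TTest.Test;",
    "begin", "  //", "end;", "", "end.",
])


def line_starts(text):
    """Return the char offset at which each line begins."""
    starts = [0]
    for index, char in enumerate(text):
        if char == "\n":
            starts.append(index + 1)
    return starts


def offset_to_location(starts, offset):
    """Convert a char offset to [row, column] (row 1-based, column 0-based)."""
    line_index = bisect.bisect_right(starts, offset) - 1
    return [line_index + 1, offset - starts[line_index]]


def location_to_offset(starts, row, column):
    """Convert [row, column] back to a char offset."""
    return starts[row - 1] + column


STARTS = line_starts(SOURCE)
print(offset_to_location(STARTS, 60))    # start of the "procedure Test;" node
print(location_to_offset(STARTS, 11, 0)) # start of the TTest.Test node
```

The end positions in the example descriptor are less clear-cut than the start positions, so this sketch only relies on the starts.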
 

Best regards,

Míryam


thanks for the updated post, it makes more sense now 

I hope to update the PoC today or tomorrow

 

spans are only confusing the first time you write the parser, and I have the information anyway (but making them optional could make things easier for other parsers later? :) )

the exact format doesn't really matter (like joining header/footer) once you just know what it should be :)

 

 



Cool! A simple file works now :)

post-1393-0-11175500-1378840598_thumb.png
 

However... when a locationSpan ends on start of a line, the "i" of "implementation" is selected too.

E.g.: locationSpan : {start: [3, 0], end: [10, 0]}

 

Second: the empty lines are not copied! Or should these be included within the "spans"?

 



 
Congrats André!
 
In your example, "interface", "type" and "TTest" should end at the same locationSpan position: the last char of the "  end;\r\n" line.
Second: Yes, the empty lines should be included in the structure. In general, it should be possible to rebuild the source file correctly from the spans.
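One way to check this before handing a descriptor to Semantic is to rebuild the file from the spans and compare it with the original. The Python sketch below is a rough stand-in for that idea, not the actual testing tool: `rebuild` is a hypothetical helper, and the spans are the ones from the Unit1 descriptor earlier in this thread.

```python
# The Unit1 sample from this thread, with Windows line endings.
SOURCE = "\r\n".join([
    "unit Unit1;", "", "interface", "", "type",
    "  TTest = class(TObject)", "    procedure Test;", "  end;", "",
    "implementation", "", "{ TTest }", "", "procedure TTest.Test;",
    "begin", "  //", "end;", "", "end.",
])

# Every span, headerSpan and footerSpan from the descriptor.
SPANS = [
    (0, 12), (13, 25), (26, 33), (34, 59), (60, 80), (81, 88),
    (0, -1), (0, -1), (89, 106), (107, 163), (164, 169),
]


def rebuild(source, spans):
    """Concatenate the text covered by the spans, in file order.

    Raises ValueError when two consecutive spans leave a gap or overlap,
    for example when empty lines were left out of the structure.
    """
    pieces = []
    expected_start = 0
    # Empty spans such as (0, -1) cover no text and are skipped.
    for start, end in sorted(s for s in spans if s[1] >= s[0]):
        if start != expected_start:
            raise ValueError("gap or overlap at offset {}".format(expected_start))
        pieces.append(source[start:end + 1])
        expected_start = end + 1
    return "".join(pieces)


print(rebuild(SOURCE, SPANS) == SOURCE)
```

Comparing the result with the original also catches a missing trailing span, which the gap check alone would not notice.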
 
Míryam
