My top-down parser

In my last post I said that I wanted to have a go at creating my own parser for checking input against a BNF grammar. That’s been my occasional goal for the last couple of weeks, and although at the moment the grammar is only a small subset of anything that could be considered useful, I’m quite happy that I’ve got the method sorted, which is the most important part. At the moment the BNF grammar looks like this:

main_program : OPEN_TOKEN statements CLOSE_TOKEN;

statements : statement statements
           | statement
           ;

statement : assignment_statement
          | if_statement
          ;

assignment_statement : operand EQUALS operand SEMICOLON;

if_statement : IF OPEN_BRACKET comparison CLOSE_BRACKET SEMICOLON;

comparison : operand comparator operand;

comparator : MORE_THAN
           | LESS_THAN
           ;
operand : NUMBER
        | VARIABLE
        ;

I’m aware that it’s functionally useless (who’d want an if statement without any consequences, and why aren’t the brackets in the comparison rule?), but it suited my purpose for testing the method.

Since I wanted to focus exclusively on this part of the compilation process, for the moment I’ve decided to manually stick in a list of tokens to be matched against instead of tokenising input text. If I extend this to the point that I want to include the other compilation steps (and I might, because it’s been an interesting little project), a tokenising phase can be added with relatively little hassle.

The basic method is quite simple, and looking back I’ve written comparatively little code considering how long it took me to get it working, but that’s part and parcel of learning things. A grammar is comprised of terminals and non-terminals. Non-terminals can be comprised of terminals and other non-terminals; terminals can only be matched and cannot be split down any further. In my method we take the list of tokens and pass it to a rule (our non-terminal) and attempt to match it against what makes up that rule. Tokens can be matched to that rule’s tokens, but if a token cannot be matched directly then the program can check whether that part of the rule is itself a rule and try to match against it. Whenever a sub-rule is called, a subset of the token stream is created, running from the current token to the end of the stream, and the number of tokens the sub-rule matches is recorded so the rule it was called from can continue from that point.

Sorry, that’s the simplest way I can explain it, and I appreciate that I’ve used the word token to the extent that seeing it again will likely infuriate you. In short: try to match a token against a rule’s token; if that’s not possible, see if there is a sub-rule of the rule that will match the next part of the token list. If so, continue as normal; if not, look for the rule’s alternatives. That’s what’s happening in the code below.

if (PassedTokenStream[TokenStreamCounter].Name != RuleComprised[i].Name)
{   //Go here if we must match a nonterminal in the rule
    try
    {
        List<string> RetrievedRuleList = new List<string>();
        bool BrokenInRule = (RuleComprised[i] as Rule).CheckMatchinRule(PassedTokenStream.GetRange(TokenStreamCounter, PassedTokenStream.Count - TokenStreamCounter), ref matchednodes, RetrievedRuleList);

The matched nodes number records the number of tokens the rule has matched. Once the method returns, this number is added to the counter so that the calling rule can continue from that point. Below is the code which handles the returns from any sub-rules.

if (!BrokenInRule) 
{ //Go here if the nonterminal was matched 
    for (int j = 0; j < RetrievedRuleList.Count; j++) 
        ReturningStrings.Add(RetrievedRuleList[j]); 
    TokenStreamCounter = TokenStreamCounter + matchednodes; 
    Broken = false; 
} 
else 
{ //Go here if the nonterminal wasn't matched 
    if (IsRecursiveRule) 
    { 
        BrokenRule = RuleComprised[i].Name; 
    } 
    Broken = true; 
    break; //At this point the token stream cannot be matched to the initial rule. 
}

If the sub-rule didn’t cause the parse to break then we have to do two things: add the number of tokens the sub-rule matched to the token counter, and add the name of any matched rules within the sub-rule to the list of rules that this rule matched. And, like the token thing earlier, I’m sure the overuse of the word rule is annoying if not confusing. The reason for this list of rules is that I wanted to be able to print out the statements that have been matched when the parse is done. To make sure that only the rules which are correct are added to the output list, each rule gets its own list, and that rule’s list will ultimately only be added to the output if the rule it belongs to doesn’t break. For example, if I have an if statement which contains a comparison statement, and the comparison statement parses correctly but the rest of the if statement is broken, the comparison’s list of matched statements will never reach the greater list because the if statement would break and its own list would never be added. This is the method I’d use to build up a parse tree: simply use node objects as opposed to lists of strings.

(Screenshot: output of a successful parse)

Admittedly, the code to write things out still needs some tinkering; you can see from the above output that the recursive statements rule hasn’t been output for the following token stream:

OPEN_TOKEN, IF_TOKEN, OPEN_BRACKET, NUMBER_TOKEN, MORE_THAN_TOKEN, NUMBER_TOKEN, CLOSE_BRACKET, SEMICOLON, VARIABLE_TOKEN, EQUALS, NUMBER_TOKEN, SEMICOLON.

It’s a small problem that I’ll have to go rooting around for before switching to building a parse tree; I didn’t put as much effort into this simple output as I did the main parsing bit (don’t judge me too harshly when you see the code).
One of the biggest pains I’ve had when creating this was coming up with a way to handle recursive rules; without that there’s really no point to it. My initial thought was to make the recursive alternative of a recursive rule the first one checked, falling back on the non-recursive alternatives. This makes sense because the parser would first check against the recursive alternative and would then just check for a standard statement once the recursive part no longer matched. Unfortunately, with my implementation this became too tiresome, so I came up with a better method: check if the rule is recursive and, if the rule that the parse breaks on is the recursive one, go back a token, accept that it’s matched, and then continue as normal.

//"Token" stands in for the shared base type (exposing Name) of terminals and rules.
public bool CheckMatchinRule(List<Token> PassedTokenStream, ref int matchednumber, List<string> FoundNodesList)
{
    bool IsRecursiveRule = false;

    if (RuleComprised != null)
    {
        for (int i = 0; i < RuleComprised.Count; i++)
        {
            if (RuleComprised[i].Name == Name)
            {
                IsRecursiveRule = true;
                break;
            }
        }
    }

    bool Broken = true;
    int matchednodes = 0;
    int TokenStreamCounter = -1;
    string BrokenRule = "";

    List<string> ReturningStrings = new List<string>();

    if (RuleComprised != null)
    {
        for (int i = 0; i < RuleComprised.Count; i++)
        {
            TokenStreamCounter += 1;
            if (PassedTokenStream[TokenStreamCounter].Name != RuleComprised[i].Name)
            {   //Go here if we must match a nonterminal in the rule
                try
                {
                    //Want to append the strings of the matched rules if the rule matched.
                    //Don't bother appending if it didn't.
                    List<string> RetrievedRuleList = new List<string>();

                    bool BrokenInRule = (RuleComprised[i] as Rule).CheckMatchinRule(PassedTokenStream.GetRange(TokenStreamCounter, PassedTokenStream.Count - TokenStreamCounter), ref matchednodes, RetrievedRuleList);
                    if (!BrokenInRule)
                    {   //Go here if the nonterminal was matched
                        for (int j = 0; j < RetrievedRuleList.Count; j++)
                            ReturningStrings.Add(RetrievedRuleList[j]);

                        TokenStreamCounter = TokenStreamCounter + matchednodes;
                        Broken = false;
                    }
                    else
                    {   //Go here if the nonterminal wasn't matched
                        if (IsRecursiveRule)
                        {
                            BrokenRule = RuleComprised[i].Name;
                        }
                        Broken = true;
                        break; //At this point the token stream cannot be matched to the initial rule.
                    }
                }
                catch
                {
                    if (IsRecursiveRule && RuleComprised[i].Name == Name)
                    {
                        Broken = false;
                        BrokenRule = RuleComprised[i].Name;
                    }
                    else
                    {
                        Broken = true;
                    }
                    break; //If the token cannot be treated as a rule then the rule is broken.
                }
            }
            Broken = false;
        }
        matchednumber = TokenStreamCounter;
    }

    if (!Broken)
    {   //If the rule could be matched then return the fact that it's matched. If it couldn't, try alternate rules.
        for (int i = 0; i < ReturningStrings.Count; i++)
            FoundNodesList.Add(ReturningStrings[i]);
        return Broken;
    }

    if (IsRecursiveRule && BrokenRule == Name)
    {   //If it's a recursive rule and it broke on the recursive part then stop parsing
        for (int i = 0; i < ReturningStrings.Count; i++)
            FoundNodesList.Add(ReturningStrings[i]);
        matchednumber = TokenStreamCounter - 1; //If we broke on the recursive part then set the token back 1 to the point it successfully matched
        return false;
    }

    if (MatchingRules != null)
    {
        for (int i = 0; i < MatchingRules.Count; i++)
        {
            List<string> RetrievedRuleList = new List<string>();
            Broken = MatchingRules[i].CheckMatchinRule(PassedTokenStream, ref matchednodes, RetrievedRuleList);

            if (!Broken)
            {
                FoundNodesList.Add(Name);
                FoundNodesList.Add(MatchingRules[i].Name);
                for (int j = 0; j < RetrievedRuleList.Count; j++)
                    FoundNodesList.Add(RetrievedRuleList[j]);

                matchednumber = matchednodes;
                break;
            }
        }
    }

    return Broken;
}
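To show the shape of the method without all the bookkeeping, here’s a minimal sketch of the same matching idea, written in Java for brevity and covering only the comparison rule. The names are illustrative and it’s not a translation of the code above; just the core pattern of matching a rule’s parts against a token stream and reporting how far we got.

```java
import java.util.Arrays;
import java.util.List;

public class TinyMatcher {
    // comparison : operand comparator operand
    // Returns the number of tokens matched, or -1 if the rule breaks.
    static int matchComparison(List<String> tokens, int start) {
        int pos = start;
        if (!matchOperand(tokens, pos)) return -1;
        pos++;
        if (!matchComparator(tokens, pos)) return -1;
        pos++;
        if (!matchOperand(tokens, pos)) return -1;
        pos++;
        return pos - start; // the caller continues from start + this result
    }

    // operand : NUMBER | VARIABLE
    static boolean matchOperand(List<String> tokens, int pos) {
        return pos < tokens.size()
                && (tokens.get(pos).equals("NUMBER") || tokens.get(pos).equals("VARIABLE"));
    }

    // comparator : MORE_THAN | LESS_THAN
    static boolean matchComparator(List<String> tokens, int pos) {
        return pos < tokens.size()
                && (tokens.get(pos).equals("MORE_THAN") || tokens.get(pos).equals("LESS_THAN"));
    }

    public static void main(String[] args) {
        List<String> stream = Arrays.asList("NUMBER", "MORE_THAN", "VARIABLE");
        System.out.println(matchComparison(stream, 0)); // prints 3
    }
}
```

The returned match count plays the same role as the matchednodes counter above: it tells the enclosing rule how many tokens to skip before carrying on.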

 


My simple compiler

During my Languages module at university we all made a simple compiler for a basic language. The compiler takes an input file and outputs the equivalent code in C. I did fairly well with this module; my compiler outputted everything correctly, had some pretty good type checking, and employed basic optimisations like constant folding. Recently I’ve tried to do something similar, but I wanted to employ some different data-control mechanisms and give the user the ability to create lists of dynamic length by default. This isn’t the most fleshed-out language, but overall I think I’ve accomplished my goals.

Flex and Bison were used to create this solution because, although I find them a pain to set up and get working together, they do their respective jobs very well, and since they’re pretty old, well-documented tools, someone will have a solution for whatever dodgy error message inevitably crops up. That, and I quite like using C but can rarely find a task to justify its use.

In the language the user is able to create an individual variable or a list of characters, integers, or floats. When a list variable is declared, my compiler generates three methods for lists of that type: an add method which appends a value of the required type to the end of the list, RetrieveValue which finds a value in the list based on the subscript position, and DeleteValues which loops through the list and releases each node in the chain from beginning to end. For example, the program immediately below results in the output C code below it.

:( 
     int list IntegerList; 
     IntegerList<-1; 
     IntegerList<-2; 
     IntegerList<-3; 
     IntegerList<-4; 
     IntegerList<-5; 

     output<-IntegerList[1]; 
     output<-IntegerList[2]; 
     output<-IntegerList[3]; 
     output<-IntegerList[4]; 
     output<-IntegerList[5];
:)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct intnode{
     int StoredValue;
     struct intnode *Next;
} typedef IntNode;

void AddIntValue(IntNode *RootNode, int passedvalue)
{
     if(RootNode->Next != NULL)
     {
          AddIntValue(RootNode->Next, passedvalue);
     }
     else
     {
          IntNode *NewIntNode = malloc(sizeof(IntNode));
          NewIntNode->StoredValue = passedvalue;
          NewIntNode->Next = NULL;
          RootNode->Next = NewIntNode;
     }
}

int RetrieveValue(IntNode *RootNode, int PassedIndicator, int PassedCurrentNode)
{
     if(PassedIndicator == PassedCurrentNode)
     {
          return RootNode->StoredValue;
     }
     else if(RootNode->Next != NULL)
     {
          return RetrieveValue(RootNode->Next, PassedIndicator, PassedCurrentNode + 1);
     }
     return 0; /* index beyond the end of the list */
}

void DeleteValues(IntNode *RootNode)
{
     if(RootNode->Next != NULL)
     {
          DeleteValues(RootNode->Next);
     }
     free(RootNode->Next);
}

void main()
{
     IntNode *IntegerList = malloc(sizeof(IntNode));
     IntegerList->Next = NULL;

     AddIntValue(IntegerList, 1);
     AddIntValue(IntegerList, 2);
     AddIntValue(IntegerList, 3);
     AddIntValue(IntegerList, 4);
     AddIntValue(IntegerList, 5);

     printf("%d", RetrieveValue(IntegerList->Next, 1, 1));
     printf("%d", RetrieveValue(IntegerList->Next, 2, 1));
     printf("%d", RetrieveValue(IntegerList->Next, 3, 1));
     printf("%d", RetrieveValue(IntegerList->Next, 4, 1));
     printf("%d", RetrieveValue(IntegerList->Next, 5, 1));

     DeleteValues(IntegerList);
     free(IntegerList);
}

While this isn’t exactly a high-end compiler, I did at least make some marginal optimisations. The three methods for each list type are only generated once the code has been checked to make sure that the user has actually declared a list of that type. Hardly groundbreaking stuff, but it’s at least somewhat considerate.

Checks for variable type and variable presence are always accomplished with a recursive run through every child branch of the current node in the tree. Once a node that could hold a variable has been reached, the node’s value (in my implementation, which is heavily influenced by the projects undertaken throughout my university course) is used to retrieve the name of the variable or its type from a global array.
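As a rough sketch of that traversal (in Java here; the node kinds and the global name table are stand-ins, not my actual structures):

```java
import java.util.ArrayList;
import java.util.List;

public class AstWalk {
    // Hypothetical global array of names, indexed by the value stored in variable nodes.
    static final String[] SYMBOL_NAMES = { "Counter", "Total" };

    static class Node {
        String kind;          // e.g. "VARIABLE", "ASSIGN", "BLOCK"
        int symbolIndex = -1; // index into SYMBOL_NAMES for variable nodes
        List<Node> children = new ArrayList<>();
    }

    // Recursively visit every child branch of the current node, collecting
    // the name of each variable encountered along the way.
    static void collectVariables(Node node, List<String> found) {
        if (node.kind.equals("VARIABLE"))
            found.add(SYMBOL_NAMES[node.symbolIndex]);
        for (Node child : node.children)
            collectVariables(child, found);
    }
}
```

A presence check is then just “does the collected list contain this name”, and a type check reads a parallel array the same way.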

In addition, I thought I’d take a crack at making a control mechanism in which the user can supply either a logical comparison or a numeric stopping point. In the following code the first loop creates a standard for loop to execute the enclosed code a set number of times, and the second creates a while loop with the comparison included.

:(
	int Counter;

	Counter = 0;
	until(5)
	->
		Counter = Counter + 1;
		output<-Counter;
	<-

	Counter = 0;
	until(Counter > 5)
	->
		Counter = Counter + 1;
		output<-Counter;
	<-
:)

The below code is the compiled output for the above loops.

#include <stdio.h>
#include <stdlib.h>

void main()
{
     int Counter;
     Counter = 0;
     for(int i = 0; i < 5; i++)
     {
          Counter = Counter + 1;
          printf("%d", Counter);
     }

     Counter = 0;
     while((Counter > 5))
     {
          Counter = Counter + 1;
          printf("%d", Counter);
     }
}

This compiler doesn’t have quite as extensive a type-checking mechanism as my coursework’s; I wanted to see how easy or difficult it would be to include dynamic lists as a feature, so that’s an aspect I put off for the time being. Overall I’m happy with how the implementation worked out, though I suppose it could cause some memory problems if the output program crashes part-way through. As I said, not exactly a fleshed-out language.

This hasn’t been a bad experiment, but what I want to do next is make an attempt at my own top-down parser. It’s useful to use Flex and Bison, but it would be quite an accomplishment to build my own parser. Just as a quick aside, I did briefly consider using Irony.NET instead of Flex and Bison; I used Irony.NET for certain parts of my final year project and it seems a worthwhile tool, but I opted for the more familiar tools this time.

A simple mechanism for storing code change

One of the things I’ve looked at making is a simple way to document all of the changes someone makes to code. What I wanted was a quick way of taking a snapshot of the code I write so that I could come back to it in the event that I do something stupid. The interesting part of my mechanism is that I wanted it to store divergent changes so that I could travel so far down one route and then change to another when I decide that I’ve made a mistake; a sort of lazy local change control.

The structure I’ve created is a branching chain of nodes. Each node can have a previous node, a subsequent node, a change node indicating the start of a new branch, and a deviating node indicating the node from which this chain starts. In addition to these four pointers, each node contains a copy of the text at the point it was created. The diagram below indicates this setup.

(Diagram: the branching chain of change nodes)

In the example diagram above, saves were made at points 2 and 3. At point 3 it was decided that it would have been better to go a different route, so the text was reverted and a new route was taken, creating the branch 4 through 7. At point 7 it was decided that the first choice really was better after all, so the whole text was reverted to text 3 and the further saves 8 and 9 were created. It’s a bit of a strange example, but I think it fairly illustrates the kind of functionality this can provide.

Each new node is given its own integer id to check against when trying to retrieve the necessary text. This comes in handy when doing the tree traversal.

My initial idea for traversing the structure was to simply start at the current node and run through accordingly. The thinking was that the user would likely want to change to a recent save, so starting at the current node would probably minimise the number of nodes that have to be traversed. The problem is that the desired save could be on a different branch, and because it is possible to make new saves on old branches it is impossible to know which branch it is on without travelling down it. The simplest solution is to create a list of entry points at which the search can start. In my program the start of each new branch serves as an entry point, ensuring that it is only necessary to traverse down to the end of each branch. By using the desired save number, and assuming that the user wants to move to a save close to the current one, the number of traversals could be minimised by ordering the branches according to their start value when searching.
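That ordering idea could be sketched like this (in Java, with illustrative names; my actual class simply walks the branch list in order):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BranchOrdering {
    // Given the id of the first save on each branch, order the branches so
    // that those starting closest to the current save are searched first.
    static List<Integer> orderEntryPoints(List<Integer> branchStarts, int currentSave) {
        List<Integer> ordered = new ArrayList<>(branchStarts);
        ordered.sort(Comparator.comparingInt(start -> Math.abs(start - currentSave)));
        return ordered;
    }
}
```

Searching branches in this order means that, under the “recent save” assumption, the wanted node tends to turn up in one of the first few traversals.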

At this point a valid criticism would be that there is no need for the branches to be connected; if we’re just going to do single traversals down each route, why bother with the overhead? The advantage of connecting the branches is extensibility. My understanding is that real change-control mechanisms store only the differences between the last save and the current one in order to save memory. Since it’s simple to get the complete route from any save node back to the original text, this kind of mechanism could be implemented if I put in a way to document the differences between two texts. The very basic test screen I’ve created uses this exact method to list individual nodes in its listbox, which admittedly seems a tad useless without some indication of what the changes are, but I’m focusing on the data structure rather than the (however crappy) interface.

(Screenshot: the test screen listing branch nodes)

While I did intend this to be a branching storage mechanism, it should be noted that it can easily be changed to overwrite previous saves: when checking for a subsequent change, a new subsequent node can be assigned as opposed to a branching node. The complete code for the mechanism can be found below.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace RecordingChanges
{
    class StoredChange
    {
        public enum StorageMode
        {
            NonBranchingChange, BranchingChange
        }

        static StoredChange InitialStorage;
        static int Currentcounter;
        static List<StoredChange> BranchNodes = new List<StoredChange>();

        string CurrentlyStoredText;
        int PositionCode;

        public List<StoredChange> GetRouteToInitial()
        {
            List<StoredChange> ReturningList = new List<StoredChange>();
            StoredChange CurrentlyStoredChange = this;

            while(CurrentlyStoredChange.PreviousChange != null || CurrentlyStoredChange.DeviatingChange != null)
            {
                ReturningList.Add(CurrentlyStoredChange);
                if (CurrentlyStoredChange.PreviousChange != null)
                    CurrentlyStoredChange = CurrentlyStoredChange.PreviousChange;
                else if (CurrentlyStoredChange.DeviatingChange != null)
                    CurrentlyStoredChange = CurrentlyStoredChange.DeviatingChange;
            }

            return ReturningList;
        }

        public static List<StoredChange> GetBranchEnds()
        {
            List<StoredChange> ReturningListEnds = new List<StoredChange>();

            for(int i = 0; i < BranchNodes.Count; i++)
            {
                StoredChange CurrentStorageChange = BranchNodes[i];
                //Gets the base of the branch node
                while(CurrentStorageChange.SubsequentChange != null)
                {
                    CurrentStorageChange = CurrentStorageChange.SubsequentChange;
                }   //Advance to the end of each branch and store it
                ReturningListEnds.Add(CurrentStorageChange);
            }

            return ReturningListEnds;
        }

        public static void AddBranchStart(StoredChange PassedStoredChange)
        {
            BranchNodes.Add(PassedStoredChange);
        }

        public static StoredChange GetStoredChange(int RequiredStoredChange)
        {
            StoredChange ReturningChange = null;
            for(int i = 0; i < BranchNodes.Count; i++)
            {
                ReturningChange = BranchNodes[i];

                while(ReturningChange.ChangeCode != RequiredStoredChange && ReturningChange.SubsequentChange != null)
                {
                    ReturningChange = ReturningChange.SubsequentChange;
                }

                //At this point we've hit the end of the chain or the right node
                if (ReturningChange.ChangeCode == RequiredStoredChange)
                    return ReturningChange;
                //If that wasn't the correct code then we go to the start of the next branch
            }

            //At this point the input node hasn't been found
            return null;
        }

        public static StoredChange Initial
        {
            get
            {
                return InitialStorage;
            }
            set
            {
                InitialStorage = value;
            }
        }

        StoredChange NextChange, LastChange, NewBranch, DeviatingPoint;

        public StoredChange(string PassedCurrentlyStoredText)
        {
            PositionCode = Currentcounter;
            CurrentlyStoredText = PassedCurrentlyStoredText;
            Currentcounter += 1;
        } 

        public int ChangeCode
        {
            get
            {
                return PositionCode;
            }
        }

        public string ChangeText
        {
            get
            {
                return CurrentlyStoredText;
            }
        }

        private StoredChange AddSubsequentChange(StoredChange PassedSubsequentChange)
        {
            SubsequentChange = PassedSubsequentChange;
            PassedSubsequentChange.PreviousChange = this;
            return PassedSubsequentChange;
        }

        private StoredChange AddBranchingChange(StoredChange PassedBranchingChange)
        {
            ChangedBranch = PassedBranchingChange;
            PassedBranchingChange.DeviatingChange = this;
            return PassedBranchingChange;
        }

        public StoredChange AddChange(StorageMode PassedStorageMode, StoredChange PassedStorageChange)
        {
            switch (PassedStorageMode)
            {
                case StorageMode.NonBranchingChange:
                    PassedStorageChange = AddSubsequentChange(PassedStorageChange);
                    break;
                case StorageMode.BranchingChange:
                    if (SubsequentChange != null)
                        PassedStorageChange = AddBranchingChange(PassedStorageChange);
                    else
                        PassedStorageChange = AddSubsequentChange(PassedStorageChange);
                    break;
            }
            return PassedStorageChange;
        }

        public StoredChange SubsequentChange
        {
            get
            {
                return NextChange;
            }
            set
            {
                NextChange = value;
            }
        }

        public StoredChange PreviousChange
        {
            get
            {
                return LastChange;
            }
            set
            {
                LastChange = value;
            }
        }

        public StoredChange ChangedBranch
        {
            get
            {
                return NewBranch;
            }
            set
            {
                NewBranch = value;
                BranchNodes.Add(value);
            }
        }

        public StoredChange DeviatingChange
        {
            get
            {
                return DeviatingPoint;
            }
            set
            {
                DeviatingPoint = value;
            }
        }
    }
}

SUVAT question application

One of the problems I keep facing is that, although I can write software pretty well, I keep struggling to find things to make. The advice always seems to be to come up with something useful, so this is my first attempt at doing just that.

Since I spent a few days looking for new SUVAT questions when I was practising them at college, I figured I’d put together the application I’d like to have had at the time. The application presents the user with one SUVAT question from a back-end list of problems. The user can attempt to answer it, look through the solution, and request another problem once they’re done.

(Screenshots: the question and solution views of the application)

Since I want to be able to add new problems to the application without making any laborious changes, I decided it would be best to store the problems in a back-end XML file. Luckily Java has a pretty good XML parser, albeit one I still took too long to get working due to my misunderstanding of how the read methods advance the parser’s position.

Sticking everything in XML makes editing easier but does leave me with a tiny technical problem, one I don’t think is quite worth fixing at the moment. Normal strings stored as Android string constants can be swapped according to the language of the target device; I couldn’t do the same with the XML file contents without some manual checking.

Despite it only being in English and very basic-looking, I like to think that someone might have a genuine use for this application. I think I’ll publish it, maybe once it looks better and I’ve exhausted every method of not paying for a developer account on the various app stores.

Code Editor: My Final Year Project

Since the academic year is over and I’ve finally graduated, I think I can now write about my final year project without it jeopardising any marking.

My project was to create a code editor with syntax highlighting and autocomplete. In addition to these features required by the brief, I created further functionality to expand the editor: error feedback, project spaces, compiler integration, and simple source-control mechanisms (commit, revert, and update). The editor targeted the C language, specifically the ANSI C standard.


Figure 1. The main layout of the editor

The above figure shows the main layout of the editor that the user interacts with. It’s a fairly simple layout, devoting most of the screen space to the coding section and making the editor’s functionality available in the toolbar on the left.

Regarding the toolbar, the choice was largely based on another editor’s interface, one made by Robert Crocombe a couple of years earlier. The difference is that he made a very elegant toolbar which expands to give more information about each tool; in mine I relied on familiarity with common icons to convey the functionality of each button (something I’ll have to change given the user feedback I got).

Overall I managed to achieve the majority of the goals for the project, but couldn’t get all of the source-control features done in time.

Syntax highlighting

The syntax highlighting feature was made to highlight each different lexeme in the C language. This included all 32 standard keywords, operators, other lexemes such as numbers and pointers, and any user-defined types.

Thankfully, a fair few people have undertaken the code editor project before me, so a pretty clear method for drawing the text on the screen had already been established: inherit from the standard text box and override the rendering method to highlight the text.

Like the people who did the project before me, I managed to make a few optimisations to the rendering code: before generating any of the formatted text objects to be displayed, first check whether the line to be rendered would be on-screen. This way the same number of lines is rendered regardless of the length of the code, and there is negligible difference in rendering times because the visibility check is a relatively inexpensive mathematical comparison.
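The visibility check really is just a couple of comparisons. Here’s a sketch of the arithmetic (the parameter names are assumptions for illustration, not my actual fields):

```java
public class VisibleLines {
    // A line is rendered only if any part of it falls inside the viewport.
    // lineIndex is the zero-based line number; scrollOffset is how far the
    // view has been scrolled down, in the same units as lineHeight.
    static boolean isLineVisible(int lineIndex, double lineHeight,
                                 double scrollOffset, double viewportHeight) {
        double lineTop = lineIndex * lineHeight - scrollOffset;
        return lineTop + lineHeight >= 0 && lineTop <= viewportHeight;
    }
}
```

Every line fails or passes this test in constant time, so the cost of rendering is bounded by the viewport size rather than the document size.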

Autocomplete


Figure 2. The autocomplete feature suggesting two valid variables


Figure 3. The autocomplete feature suggesting method signatures alongside suggestions


Figure 4. The autocomplete feature ensuring the method signature remains displayed until the closing bracket of the method call has been completed.

An autocomplete feature suggests possible tokens based on the partial string the user has entered. For example, if the user had entered the text ‘d’, the autocomplete feature would suggest the words ‘do’, ‘double’, and any variables starting with ‘d’.

One of the aspects of my editor that I’m most proud of is its ability to account for variable scope. When I was looking at other editors I didn’t want one that suggested variable names regardless of scope (Notepad++ has the type of autocomplete I wanted to avoid; not that Notepad++ is a bad editor). To accomplish this scope awareness I built up some classes into a kind of symbol table. A MethodScope object is created for each method, containing the name of the method and its start and end points in the text. Each MethodScope object contains a list of variable objects, comprised of the variable name, its type, and the point in the text beyond which it is applicable. Using these objects it is possible to see which variables are applicable by comparing the caret position to each MethodScope’s start and end positions and to the applicable point of each variable within that scope.
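A stripped-down sketch of that check (the class and field names mirror the description above but are reconstructions for illustration, not the project code):

```java
import java.util.ArrayList;
import java.util.List;

public class ScopeFilter {
    static class Variable {
        String name;
        int applicableFrom; // text position after which the variable exists
        Variable(String name, int applicableFrom) {
            this.name = name;
            this.applicableFrom = applicableFrom;
        }
    }

    static class MethodScope {
        int start, end; // span of the method body in the text
        List<Variable> variables = new ArrayList<>();
        MethodScope(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Suggest only variables whose enclosing method contains the caret and
    // which were declared before the caret position.
    static List<String> applicableVariables(List<MethodScope> scopes, int caret) {
        List<String> names = new ArrayList<>();
        for (MethodScope scope : scopes)
            if (caret >= scope.start && caret <= scope.end)
                for (Variable v : scope.variables)
                    if (caret >= v.applicableFrom)
                        names.add(v.name);
        return names;
    }
}
```

Anything the filter returns is then run through the usual prefix matching before being shown as a suggestion.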

The autocomplete feature also allows the user a degree of error before completely removing suggestions. To do this I’ve used the edit distance algorithm to check that the entered text is within a set number of changes of a variable before deciding that it cannot be suggested. For example, if the code contains a variable ‘LocalVariable’ and the user enters ‘localVariabl’, the variable would still be suggested because only 2 characters need to change to reach the actual suggestion (2 was the acceptable number I decided on, but this could easily be changed). I should mention that this was an idea suggested by people who had previously undertaken the code editor project.
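The distance check is a standard Levenshtein computation; a Python sketch (the threshold of 2 matches the one described above):

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def still_suggested(entered, candidate, threshold=2):
    return edit_distance(entered, candidate) <= threshold

# 'localVariabl' is 2 edits away from 'LocalVariable', so it still matches.
print(still_suggested("localVariabl", "LocalVariable"))  # → True
```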

Error-feedback

My code editor contains syntactic error-feedback. This is accomplished via the Irony.NET parsing technology, which I chose because it allows the BNF grammar to be coded into the program and doesn’t require any kind of additional runtime environment. Incidentally, the autocomplete and syntax highlighting features could also be accomplished via this parsing technology by checking the end nodes of the generated parse tree for their token types and then colouring and storing them. This approach was not used because the first syntactic error causes the parsing technology to stop parsing and I didn’t have the time to implement any error-recovery mechanism.

The parsing technology stops on the first syntactic error, but an ideal error-feedback feature will highlight as many errors as possible. Using the MethodScope objects it is possible to extract the text of each method and parse it individually; thus my editor can get the first syntactic error in each method and highlight multiple errors throughout the code.
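As a sketch, the per-method loop looks something like this in Python (first_error stands in for the Irony parse call, which I haven't reproduced; its heuristic here is purely for illustration):

```python
def first_error(source):
    """Hypothetical stand-in for the real parser: report the first line
    that doesn't end in a statement terminator, or None if there is none."""
    for lineno, line in enumerate(source.splitlines(), 1):
        if line.strip() and not line.rstrip().endswith((";", "{", "}")):
            return (lineno, line.strip())
    return None

def all_method_errors(text, method_scopes):
    """Parse each method's text separately, so the parser stopping at the
    first error in one method doesn't hide the errors in the others."""
    errors = []
    for start, end in method_scopes:
        error = first_error(text[start:end])
        if error is not None:
            errors.append(error)
    return errors
```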

Project spaces

This is the most basic of the features I’ve implemented. Each file in the project should be located in the same space in secondary memory. Once a program is saved through the editor both the back-end file and the XML project file are updated. The XML project file contains a list of each file in the project, the location of any SVN repository, and any project-specific compiler arguments that might have been entered.

The XML format was chosen because this back-end file has to be human-readable in the event that edits have to be made outside of the editor. JSON was another viable format option that’s quicker to parse, but XML was chosen because speed of execution isn’t vital to this feature and because the .NET library comes with built-in XML tools, whereas a JSON approach would require 3rd-party tools, which I wanted to limit the use of for clarity’s sake.

Below is an example of a typical project XML file which is loaded and saved in order to store the necessary information about the project:

<Project/>
<File>Main.c</File>
<File>Person.h</File>
<File>Person.c</File>

To conform with good standards that Project tag really should encompass all of the file tags, but this worked well enough.

Compiler integration

Compilation and run program settings

Figure 5. The compiler and run-program sections of the settings page.

Compilation is a necessary part of C development. Since I wanted my editor to provide all of the tools needed for writing C programs, the user has to be able to compile and run the output program through the editor. Above is the compiler and run-program section of the settings page, which the user feedback indicates could use some work.

The editor allows the user to specify any compiler on the system and any number of compiler directives. Once specified, a compiler setup is saved into secondary memory and can be used again on multiple programs. Having multiple applicable compilers was a feature I adapted from Code::Blocks; it seemed useful, but I didn’t think it was readily accessible in most of the larger editors I’d looked at.

Ensuring the compiler setup is applicable to multiple projects means I’ve had to set up a mechanism to evaluate the setup arguments and produce the output string to be passed to the compiler. For example, the GCC compiler can output an executable with a specified name using the -o argument. If the user specifies this in my editor (as shown in the figure below), the EvaluateArgument method produces the output string -o followed by the project name.

Standard GCC setup

Figure 6. The standard GCC argument for outputting the executable stored in the C editor.

        public string EvaluateArgument(Project PassedProject)
        {
            string ReturningString = CompilerArgumentSpecifier;

            switch(ChosenArgumentType)
            {
                case ArgumentType.UseAllFiles:
                    //Cache the file list rather than re-fetching it on every iteration
                    List<string> ProjectFiles = PassedProject.ReturnProjectFiles();
                    for(int i = 0; i < ProjectFiles.Count; i++)
                    {
                        if(ProjectFiles[i].EndsWith(".c"))
                        {
                            ReturningString += ("\"" + PassedProject.ProjectLocation + @"\" + ProjectFiles[i] + "\"");
                            ReturningString += " ";
                        }
                    }
                    break;
                case ArgumentType.ProjectName:
                    ReturningString += "\"" + PassedProject.Name + "\"";
                    break;
                case ArgumentType.ProjectLocation:
                    ReturningString += "\"" + PassedProject.ProjectLocation + "\"";
                    break;
                case ArgumentType.EntireProjectAddress:
                    ReturningString += ("\"" + PassedProject.ProjectLocation + @"\" + PassedProject.Name + "\"");
                    break;
                case ArgumentType.ManualEntry:
                    ReturningString += "\"" + OptionalArgumentType + "\"";
                    break;
                case ArgumentType.Ended:
                    //Should be used when the argument is a single variable
                    break;
            }
            return ReturningString;
        }

One of the things I should have done, and might in the future, is have each compiler include all of the standard directives by default; when I implemented the feature I wanted the user to be able to input the necessary directives in the (admittedly unlikely) event that someone was using an unconventional compiler or variation of standard, but including all standard directives by default would save significant time.

Running the compiler and running the output program use very similar code. The .NET framework’s System.Diagnostics namespace contains the code to launch an external process and specify any input arguments. This works quite simply for individual program arguments, but the compiler arguments required me to build a mechanism to produce the output arguments based on their type.
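The equivalent launch-and-wait shape looks like this in Python (a hedged sketch; the editor itself uses Process from System.Diagnostics, and the compiler path and arguments below are invented):

```python
import subprocess

def run_external(program, argument_string):
    """Launch an external process with a pre-built argument string and
    wait for it to finish, capturing anything it prints."""
    result = subprocess.run(
        program + " " + argument_string,
        shell=True,           # the argument string is already fully assembled
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr

# e.g. run_external("gcc", '-o "MyProject" "Main.c"') for a compile,
# then run_external("MyProject", "") to run the produced executable.
```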

Source-control mechanisms

I wanted the editor to provide commit, update, and revert-to options to the user. Unfortunately, due to time constraints and a lack of familiarity with the API (SharpSVN, which was the best tool I could find for the job), it wasn’t possible to implement the revert-to feature within the time given for the project.

The editor allows the user to both commit and update their work via the Tortoise source-control software, but there are some caveats. I wasn’t able to implement the functionality to commit new folders to the repository, so if the user wants to commit to and update from a repository it is first necessary to check out the folder from the repository and then save the project in the checked-out folder.

I should mention that someone quite generously put up working code for the commit and update functions of the API, which meant I could adapt the code to implement these features more quickly. These can be located here (accessed 28 October 2016).

Overall

My editor works very well considering the functionality I tried to implement, the timespan I had, and the new tools I had to work with. If I’m going to return to this in the future then there are a fair few simple updates to be implemented:

  • Update the source-control mechanism to allow the user to create folders on the repository side
  • Include the revert-to functionality in the source-control functionality
  • Change the compiler directives settings to include all standard directives by default

There’s quite a bit more that could be done; however, these are relatively simple changes (hopefully simple in the case of the SVN work) that could be made in the immediate future.

This blog post is simply a quick look at the project; it would take a great deal longer to talk about all of it. While I’ve just done a quick run-through of the features here, it’s fair to say that I made some unorthodox design choices and had to come up with strange solutions for obscure problems, which might make pretty good posts later on.

Creating a better management system

ManagementSystemServerSide ManagementSystemClientSide

Figure 1 (left). The server application which services each client

Figure 2 (right). A simple client to communicate via the implemented protocol

This week I’ve been working on a better management system. This was an interesting project because it incorporates many different approaches to make it accessible and secure.

The basis for the project is very simple. A user can connect to and log into the server via the client application (this could be changed to a web-server or mobile application later with a little careful tweaking). Once logged in the client can place a new order or look at the list of their previous orders.

Once connected all interactions take place over a persistent connection. In this way, a user’s individual Id can be associated with the connection (incidentally it would be possible to extend this further later on in order to implement some kind of privilege mechanism to distinguish customers and staff in this hypothetical company).

The actual protocol implemented is very basic with the first line sent denoting the command necessary and subsequent lines supplying the information for each command.
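A sketch of that framing in Python (the command name below is invented for illustration; my real protocol's commands aren't reproduced here):

```python
def parse_request(raw):
    """First line names the command; each subsequent line is one argument."""
    lines = raw.rstrip("\n").split("\n")
    return lines[0], lines[1:]

command, args = parse_request("PLACE_ORDER\nproduct_42\n3\n")
print(command, args)  # → PLACE_ORDER ['product_42', '3']
```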

Multi-threading

A server has to handle many connections concurrently (or it should, at least, if it’s going to be considered any good), so my application hands each connection off to a separate thread. In my application the TcpListener object listens on a background thread (which is necessary because the AcceptTcpClient method is a blocking call) and launches a new thread, passing the accepted TcpClient to it.

            while (true)
            {
                TcpClient Connection = ConnectionListener.AcceptTcpClient();
                Thread t = new Thread(() => { ServiceConnection(Connection); });
                t.Start();
                //Notion of using delegate to pass object to thread provided at the following source
                //https://stackoverflow.com/questions/3988514/how-to-pass-refrence-and-get-return-in-thread
                //Accessed 26/06/2017
            }

The idea of using a delegate method to pass the TcpClient to the thread was provided here by user sumit_programmer (accessed 26/07/2017).

Salting hashes

While creating this application I’ve attempted to follow best practice when storing passwords, meaning that users with the same passwords should not have the same hashes associated with them. This is generally accomplished by appending random data (a salt) to the password so that the resulting hash becomes vastly different.

private string GenerateSalt()
{
     string GeneratedSalt = "";

     for(int i = 0; i < SaltLength; i++)
     {
         //Restrict to printable ASCII (33-126); byte values above 127 aren't
         //valid ASCII and would be mangled by GetString. Note that System.Random
         //is only pseudo-random; a cryptographic RNG would be better for salts.
         GeneratedSalt += System.Text.Encoding.ASCII.GetString(new byte[] { (byte)rnd.Next(33, 127) });
     }

     return GeneratedSalt;
}

In my system this is done by generating a random string of fixed length and appending it to the entered password. I’ve accomplished this by simply using the .NET Random object and feeding the result into the hash. This has worked pretty well so far, but because the Random object draws from a pseudo-random sequence it is entirely possible for the same character to be selected at each position of the salt if they are generated quickly enough. Fortunately, this is unlikely to happen throughout the salt and very unlikely to produce the same salts for the same passwords. As an example, users 1 and 2 both have the same password (‘password’) but vastly different hashes to be matched.
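The overall store-and-verify flow can be sketched like this in Python (using the standard library's secrets module for the salt, which sidesteps the pseudo-randomness caveat above; the hash algorithm and salt length are my own choices for the sketch, not necessarily what the server uses):

```python
import hashlib
import secrets

SALT_LENGTH = 16

def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt per user means two users
    with the same password end up with different stored hashes."""
    if salt is None:
        salt = secrets.token_hex(SALT_LENGTH)
    digest = hashlib.sha256((salt + password).encode()).hexdigest()
    return salt, digest

def verify_password(password, salt, stored_digest):
    """Re-hash the attempt with the stored salt and compare."""
    return hash_password(password, salt)[1] == stored_digest

salt1, hash1 = hash_password("password")
salt2, hash2 = hash_password("password")
print(hash1 != hash2)  # → True: same password, different stored hashes
```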

ChangedHashes

Figure 3. The difference in hashes for the same password

Hash-based check-sums

Another of the security measures I’ve implemented is to ensure that the message received hasn’t been tampered with en-route. To do this a hash of the concatenated message contents is sent along with the message. Then a new hash is recreated upon receipt; if the newly created hash and received hash are equal then there has been no tampering, but if they differ the message has been altered.
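A minimal sketch of the check-sum exchange (the hashing and framing here are simplified relative to my actual protocol):

```python
import hashlib

def attach_checksum(message):
    """Sender side: pair the message with a hash of its contents."""
    return message, hashlib.sha256(message.encode()).hexdigest()

def verify_checksum(message, received_checksum):
    """Receiver side: recompute the hash; a mismatch means tampering."""
    return hashlib.sha256(message.encode()).hexdigest() == received_checksum

message, checksum = attach_checksum("PLACE_ORDER\nproduct_42")
print(verify_checksum(message, checksum))                    # → True
print(verify_checksum("PLACE_ORDER\nproduct_43", checksum))  # → False
```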

This leaves me with something of a weird question: if the message has been intercepted and changed, do I want to inform the client of it? If I do, wouldn’t this alert the attacker to the fact that they’ve been discovered? Because this is unlikely to occur in my system I’ve simply resolved to make sure that nothing happens in the event that the two hashes don’t match, but this is something I’ll have to look into later.

      int ChecksumLength = int.Parse(ConnectionInputStream.ReadLine());

      string CheckSum = "";

      for(int i = 0; i < ChecksumLength; i++)
      {
           CheckSum += (char)ConnectionInputStream.Read();
      }
      ConnectionInputStream.ReadLine();

An interesting problem that I didn’t think of accounting for before the start of the project was that the hashes would inevitably include special characters (see above). This meant that the length of the hash would have to be sent to know how many individual characters to read in; this is the downside of using something low-level like ports.

Conclusion

This hasn’t been a bad project. I’ve made a reasonably useful protocol that could be extended further, and I’ve implemented some fairly reasonable security measures throughout it. It may well have been a better idea to implement it with some higher-level communication technology, but overall I’m reasonably happy with it.

Creating my simple XML setup file

At the end of the last blog post I said that it’d be a good idea to create some kind of config file in order to change the functions the program provides without having to keep recompiling it. To fulfil this hastily decided-upon requirement I’ve created an XML config file with each of the queries encapsulated within its own tag (as seen in the example snippet immediately below) and a method which creates a XAML button for each query stored. The appropriate query can be run upon that button’s click.

<SQLQuery name="…" query="…" result="…"/>

There are a few problems to overcome for this to work. Since the number of queries provided isn’t known until run-time, every button click must run the same method, because the buttons are created in a loop rather than being written out individually. And since each SQL query returns a different mix of columns (and consequently types), this has to be accounted for when retrieving the rows.

Upon successfully loading in an XML query element, a ‘StoredDynamicQuery’ object is created to store the query, its name, and the format of the results. To keep things simple the content of the button is set to the query name; once the button is clicked, its content is used to retrieve the corresponding ‘StoredDynamicQuery’ object.

result="CustomerID:int,Name:string,Age:int,Address:string,PostCode:string"

To account for the changes in the end results I’ve come up with a results mechanism that could generously be described as JSON-like (as seen above). It is written as key-value pairs, with the column name and type separated by colons, and the results string can contain any number of these pairs split by commas. The code to account for this can be seen below.

StoredDynamicQuery RetrievedQuery = RetrieveStoredQuery(QueryName);

using (SqlConnection connection = new SqlConnection(DataString))
{
     using (SqlCommand command = new SqlCommand(RetrievedQuery.QueryCommand, connection))
     {
          List<string> IndividualResults = RetrievedQuery.Results.Split(',').ToList();
          connection.Open();

          using (SqlDataReader reader = command.ExecuteReader())
          {
               if (reader.HasRows)
               {
                    while(reader.Read())
                    {
                         string ConcatenatedString = "";
                         for (int i = 0; i < IndividualResults.Count; i++)
                         {
                              string QueryPart = IndividualResults[i].Split(':')[0];
                              string ResultType = IndividualResults[i].Split(':')[1];

                              switch (ResultType)
                              {
                                   case "string":
                                       ConcatenatedString += QueryPart + ": " + reader.GetString(reader.GetOrdinal(QueryPart)).TrimEnd() + " ";
                                       break;
                                   case "int":
                                       ConcatenatedString += QueryPart + ": " + reader.GetInt32(reader.GetOrdinal(QueryPart)) + " ";
                                       break;
                               }
                          }
                          OutputBox.Items.Add(ConcatenatedString);
                     }
                }
           }
      }
}

In the above code QueryPart and ResultType represent the column name and type taken from the key-value pairs of the results string. In this way I can basically add in any new queries without the need for recompilation. For examples see the two images below.

LoadedQueriesExample1   LoadedQueriesExample2

Figure 1. The initial functionality.

Figure 2. The expanded functionality based upon the updated XML file.

Figure 1 (on the left) is created with the first XML layout below. The layout in figure 2 (on the right) is an extension of the first layout. In my implementation it was possible to add an entirely new piece of functionality by just adding an additional XML line.

More work would have to be put in to account for queries that insert and update data (some way of knowing which visual element provides which bit of data for the query would have to be agreed on), but overall it provides what I wanted.

Using SQL connections in C#

Last time I used an SQL database with C# it all went through the LINQ plugin. It’s useful, but it always takes a good while to set up and requires me to remember a good chunk of code (like where to put the NotifyChanging and NotifyChanged parameters, what attributes are needed, etc.). One of the things I discovered courtesy of this page (accessed 13/06/2017) is that the .NET framework provides some APIs to link to a back-end database.

DatabaseManagementOpenPage

Using these I’ve created a simple database management system in order to query and update a database through a WPF GUI application. The database I’ve created is comprised of 3 tables: a customer table, an order table, and a product table. The application allows the user to query the data in the database, update the data, and insert new records.

The actual code involves first creating a data connection, then a command, and finally executing the command. It’s the ability to manually stick in the SQL string you want to execute that makes it the most useful to me, but since each query returns rows comprised of different columns it means having to come up with some alternate mechanism to account for the results of each row. This means that the finished program is a lot longer than a LINQ alternative.
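The same connect, command, execute shape can be sketched with Python's sqlite3 module (an in-memory database and a made-up table stand in for the real back-end):

```python
import sqlite3

# Create the data connection (an in-memory database as a stand-in)
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Customer (CustomerID INTEGER, Name TEXT)")
connection.execute("INSERT INTO Customer VALUES (1, 'Alice'), (2, 'Bob')")

# Create and execute the command: the SQL string is written in by hand
cursor = connection.execute("SELECT CustomerID, Name FROM Customer")

# Each returned row then has to be unpacked according to its column types
rows = cursor.fetchall()
for customer_id, name in rows:
    print(customer_id, name)

connection.close()
```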

I think I like this approach over the LINQ option; it means writing the SQL queries in by hand, but I think this is a fair trade-off for having a simpler setup. I suppose if I’d known about this simple API beforehand then I’d probably have used back-end databases in some of my older projects.

One possible change to be made to this program would be to store the SQL queries in some external file so I don’t have to recompile the thing every time I want alternative functionality. The idea I’ve come up with is to use an external XML file and then use some kind of regular-expression replacement mechanism (which has become something of a go-to for string replacement lately; worrying maybe, but it works) in order to programmatically alter the subject of the search. This would be necessary as most of the queries inherent in the project require either data insertion or mild changes to the same query.

Creating a simple Chrome extension

Working Extension

Image 1. The current popup frame for my extension

With the help of some online tutorials and pretty good documentation I’ve managed to make a simple Google Chrome extension. Built mainly to experiment with the kind of functionality extensions offer, mine does some pretty basic stuff, but it’s interesting to know how these things can be made to target multiple web pages.

My extension currently has 4 bits of functionality: get the page to display a simple alert, remove all of the images on a page, replace all of the images on a page, and display all of the hypertext links on the page in an alert.

Before  Removed Images

Image 2. The original image search.

Image 3. The image search with sources removed through my extension.

Replaced Images LinkRetrieval

Image 4. The images replaced with a separate PNG image through my extension (original image here)

Image 5. The links retrieved from each <a> tag on the page

As it turns out, you can make the popup with a simple HTML file. This was a pretty nice surprise, as HTML and CSS are simple enough and I can make a fairly nice interface using them. The problem is that an extension doesn’t support inline JavaScript (so StackOverflow says), meaning that you have to add event handlers to each different button.

One interesting mechanism I’ve had to learn about is the message-passing mechanism Chrome extensions have to implement for communication between content scripts (ones that are embedded into the page) and the scripts running in the extension. The messaging code involves sending an appropriate code to the recipient; it is up to the recipient to pick an appropriate action based upon this code. It reminds me of Android’s intent mechanism.

I’ve used messaging in two places in this extension: to trigger the functionality necessary from the extension, and to initially update the extension with the number of images and links on the page.

This hasn’t been a bad experiment, now I just have to think of some kind of missing functionality to base an extension around.

I learnt a great deal about how to get started from this helpful tutorial (Getting Started, accessed June 6th 2017), without which I wouldn’t have been able to create the manifest file, and based most of my messaging around the tutorials found on the official messaging page (Message Passing, accessed June 6th 2017).

3 Thing Game 2017 Entry

Every year the University of Hull’s Computer Science department hosts a contest where teams get assigned three random words and then have to create a game around those words. This year team Error 303 (a holdover name from our first year) created Mine of Aether, which Kristian has made available here.

Mine of Aether 1  Mine of Aether 2

Why Mine of Aether? The three words we got were: miner, ethereal, and caution. Since the point of the contest is to base the game around the words, the overall concept was as follows: you play as a miner navigating his way through tunnels in an attempt to find jewels (or treasure chests in this case), all the while avoiding the enemies present in the mines and using a temporary ghost mode to reach areas that would otherwise be inaccessible.

The entire thing was created in Unity, which, thankfully, the other members of the team have far more experience with, and, miraculously, has its own kind of source-control mechanism so you can work across machines.

Unlike our usual 3 Thing Game performance, where we stay up most of the night and get too tired to make much progress at about 8 or 9am, this year we managed to last the full 24 hours. It would be fair to say that the game still needs work and we didn’t get to put in everything we wanted, but it’s definitely worth a play-through.