Java JFrame

Recently a co-worker said that he preferred Java over C# because he’d rather not be stuck under Microsoft’s thumb. It wasn’t a bad point, and when he inadvertently reminded me that Java runs on Linux I thought I’d take another crack at using it. Now I have a familiar, reasonably high-level language to run on my Ubuntu installation; I could never be bothered fiddling with Xamarin installations to get them to work.

With this in mind I picked a task to have a go at Java and the Swing library. I think it’s pretty cool that these come built in with the JDK. I had a great deal of fun creating this program, but didn’t get nearly as much done as I’d have liked.

Figure 1. The program running on Windows

Figure 2. The program running on Ubuntu

Basically, I decided to build a simple server that responds to HTTP GET requests and serves any of a list of specified HTML pages. I’ve managed this by setting up a simple config file which specifies the list of HTML files that can be retrieved, and which one to default to when the user doesn’t specify a page.
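
The page lookup could be driven by something like the following Java sketch. The actual config format isn’t shown in this post, so this assumes one filename per line with the default page on the first line; the class and method names here are illustrative, not the real ones.

```java
import java.util.*;

// Illustrative sketch: a parsed config listing servable pages.
// Assumption: one filename per line, default page on the first line.
class ServerConfig {
    final String defaultPage;
    final Set<String> pages = new LinkedHashSet<>();

    ServerConfig(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) pages.add(trimmed);
        }
        defaultPage = pages.iterator().next();
    }

    // Resolve a GET request path to a servable file,
    // falling back to the default page for unknown paths.
    String resolve(String requestPath) {
        String name = requestPath.startsWith("/") ? requestPath.substring(1) : requestPath;
        if (name.isEmpty()) return defaultPage;
        return pages.contains(name) ? name : defaultPage;
    }
}
```

Anything not in the list falls back to the default page rather than producing a 404, which matches the behaviour described above.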

Figure 3. The contents box showing every file loaded in from the config file

As part and parcel of using any new technology I’ve gone and made some rookie mistakes: notably forgetting that the GUI wouldn’t scale properly at a different resolution, and forgetting that Linux systems use different line endings.

I realise this wasn’t much of a blog post; more a minor update as an example of what I’ve been doing. Hopefully, the next time I post one of these I’ll have something a bit more impressive to show off.

jQuery-based quiz

I’ve started learning jQuery. I’m not a web developer, but I’m realising that I should probably keep expanding my skill set. Once again I’ve struggled to think of anything to create, but I’ve settled on a simple quiz because this is the kind of thing I used to build in HTML applications when I was first learning to program.

So here’s my basic setup: a simple HTML multiple-choice quiz in which the user selects the correct answer and the score is tallied up at the end. Naturally the questions are stuff I’m interested in (what’s the point of making a quiz if I end up screwing up capital cities again?).

Figure 1. An overview of my quiz

So here’s my quiz. As you can see it’s a perfect blend of UI design and intellectual challenge.

There’s a joke in The Simpsons where Marge gets onto the internet and Lisa makes a crack about her finally discovering what millions of people have already gotten around to. I realise that I’m pretty much in the same boat here.

The ‘back-end’ of my quiz is set up so that the correct answers are stored in a simple array, which is iterated through when setting up and checking the answers (as shown in the code below). This is done so that I can quickly add and edit the answers, and because it would be simple to set up some kind of server-based answer supply (probably through AJAX, which I do seem to keep coming back to) so that the answers wouldn’t have to be hard-coded into the JavaScript.

for(var i = 0; i < QuizAnswers.length; i++)
{
	var QuestionID = 'q' + (i + 1);
	var AnswerID = "a" + QuizAnswers[i];
	$('[name="' + QuestionID + '"]#' + AnswerID).data("Answer","Correct");
}

One of the things I think is pretty ingenious about the jQuery framework is how you can select DOM elements by searching attributes. So far I’ve just been using this to programmatically split the group of radio buttons into individual groups, but as a few sources have demonstrated I could have saved myself a lot of grief by just including :checked in the selector and checking whether the checked answer was correct. The more you know.

I think one of the main reasons I’ve taken this roundabout, all-too-wordy approach is that I was intrigued by the .data method. It isn’t something I’d come across before, so I’ve used this as a learning exercise, associating the correct HTML radio button with a key-value pair via the data method. When it comes time to score the quiz I simply loop through the checked radio buttons and see whether the associated key-value pair is present.

All in all I appreciate that this isn’t the most exciting project. It was done more for my personal learning than as any kind of viable product or project. While I sincerely doubt I’ll be doing much front-end stuff where I’m employed (thankfully) it was still interesting to make inroads with a new library.

Very simple Prolog bot

In a University course we were introduced to the Prolog programming language which we used to create a grammar parser. Today I’ve just had a bit of fun making a not-so-sophisticated bot.

My agent looks for very basic parts of an input sentence’s structure and spits out a very basic message. It’s essentially an Eliza bot that takes into account the order of the words in a sentence: an Eliza bot would just look for keywords anywhere in the input and spit out a generic response, whereas I’ve tried to anticipate the way the sentence will be structured.

In truth this was basically just an excuse to break out Prolog again and have another play around with it; I think it’s a great language to work with, even if I can think of very few uses for it personally (hence this flimsy excuse).

Basic agent functionality

In the above we can see my bot responding to very basic inputs. These are governed by my Prolog ‘response’ rules, which dictate what responses are output for the user. My responses are built up by providing a list of words in the order they appear in a sentence, followed by a series of actions to be executed as a consequence of matching the input to those words. E.g.

response(["name", "is", X], ( write('Hello '), write(X), nl)).

One of the things I really like about Prolog is how much can be done with so little code. Being able to pass new executable code in as an argument this easily is something that would be great in more languages.

As you can see, this bot really would have a lot of problems: my code only cares whether these words are present in the input in order, and doesn’t account for anything that might occur between them (e.g. if I’d input the string ‘the name of my dog is Jack’ it would happily welcome me as a dog and assume such a being was capable of operating a keyboard). And if there is no rule containing these words then the bot will simply fail, since I haven’t implemented any bog-standard fallback response.

agent:-
    read(X), split_string(X, " ", "", Arguments), vet_input(Arguments), agent.

vet_input(Arguments):-
    response(ResponseArguments, ActionableResponse),
    my_list_contains(Arguments, ResponseArguments),
    ActionableResponse.

my_list_contains([Head|Tail], [Head1|Tail1]):-
    (Head = Head1, my_list_contains(Tail, Tail1));
    my_list_contains(Tail, [Head1|Tail1]).
/* If head matches append it to the List, if it doesn't go to the next argument. */

my_list_contains(_, []).
/* The above rule is matched when all response words have been located in the input */

response(["hello"], (write('Hello, nice to talk to you.'), nl)).
response(["name", "is", X], ( write('Hello '), write(X), nl)).
response(["worried", "about", X], (write('What''s so troubling about '), write(X), write('?'), nl)).

Not much could come from this project; an Eliza bot (or a lazy bastardisation of one) doesn’t have much practical use outside of a side-project, but it was fun to play around with Prolog again. I’m also amazed at how much you can do in comparatively few lines of code.

Furthering my compiler

In the last blog post (admittedly ages ago) I’d built my own method for parsing a BNF grammar. Since then I’ve updated the program a great deal: the user can now input their own code, have it tokenised and parsed, and then produce output code in a language of their choice. It’s taken me a great while to manage this due to work and holidays, and this blog post only represents a small update; the program itself really needs a lot more testing, but if I waited until everything I intended to do was finished I’d never post anything, and this blog seems barren enough as it is. I’m reasonably happy with what I’ve made so far, inevitable overlooked bugs be damned.

Figure 1. A BNF parse of 3 statements


#include <stdio.h>

 

#include <stdlib.h>
void main()
{if(Var> 3 ){
}
Var =3;Var =3;

 

}


Figure 2. The output C code from the parsed input still containing some formatting issues.

Though the BNF I’m parsing is admittedly moronic (an if statement without any consequent statement is entirely useless, as is the inability to declare variables), you can see from the above parse that my rule successfully parses the recursive BNF statements rule.

Since the input code’s variables and operators are needed in the output, I’ve decided to store the input tokens in the tree node which corresponds to that input; for example, in the parse tree above you can see Var and 3 in the VARIABLE_RULE and NUMBER_RULE nodes, and the = and ; symbols in the ASSIGNMENT_STATEMENT node. Going forward I’m going to use a flag to stop some of these tokens being stored: only a small subset of the input code really needs to be kept for the output, and storing everything has just provided a temporary sanity check while the thing was being built.
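
As a sketch of the idea (in Java rather than the C# the project uses, with illustrative names, and with the planned flag already in place):

```java
import java.util.*;

// Illustrative sketch of a parse-tree node that stores the input tokens it
// matched, gated by a per-rule flag. Names are mine, not the project's.
class ParseNode {
    final String ruleName;
    final boolean keepTokens;                        // per-rule flag: keep matched tokens?
    final List<String> storedTokens = new ArrayList<>();
    final List<ParseNode> children = new ArrayList<>();

    ParseNode(String ruleName, boolean keepTokens) {
        this.ruleName = ruleName;
        this.keepTokens = keepTokens;
    }

    // Called as the parser consumes a token under this node;
    // only rules that need their tokens in the output store them.
    void addToken(String token) {
        if (keepTokens) storedTokens.add(token);
    }
}
```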

The tokenisation method I’ve built is a bit more elaborate than it has to be, but since I’ve set up the token and rule classes so that they can quickly be changed, I wanted the tokenisation method to be configurable too. To do this I’ve built a sort of basic ‘bridging’ class which contains the regex that can be used to match the input and the token that must be returned as part of the token stream.
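
The idea can be sketched roughly like this (Java rather than the project’s C#; all names here are illustrative, not the real class names):

```java
import java.util.*;
import java.util.regex.*;

// Illustrative 'bridging' entry: a regex paired with the token it contributes.
class TokenDefinition {
    final Pattern pattern;
    final String tokenName;
    TokenDefinition(String regex, String tokenName) {
        this.pattern = Pattern.compile(regex);
        this.tokenName = tokenName;
    }
}

class Tokeniser {
    // Walk the input, trying each definition at the current position.
    static List<String> tokenise(String input, List<TokenDefinition> defs) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) { pos++; continue; }
            boolean matched = false;
            for (TokenDefinition def : defs) {
                Matcher m = def.pattern.matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt()) {                 // match must start at the current position
                    tokens.add(def.tokenName);
                    pos = m.end();
                    matched = true;
                    break;
                }
            }
            if (!matched) throw new IllegalArgumentException("Unrecognised input at " + pos);
        }
        return tokens;
    }
}
```

Swapping the token stream’s behaviour is then just a matter of supplying a different list of definitions.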


List<string> StatementsAcceptableNodes1 = new List<string>();
StatementsAcceptableNodes1.Add("STATEMENT");

List<string> StatementsCodeOutput1 = new List<string>();
StatementsCodeOutput1.Add("<N1>");

RuleCodeGeneration StatementsOutput1 = new RuleCodeGeneration("STATEMENTS", StatementsCodeOutput1, StatementsAcceptableNodes1);
CodeOutputRules.Add(StatementsOutput1);

List<string> StatementsAcceptableNodes2 = new List<string>();
StatementsAcceptableNodes2.Add("STATEMENT");
StatementsAcceptableNodes2.Add("STATEMENTS");

List<string> StatementsCodeOutput2 = new List<string>();
StatementsCodeOutput2.Add("\n<N1>\n<N2>\n");

RuleCodeGeneration StatementsOutput2 = new RuleCodeGeneration("STATEMENTS", StatementsCodeOutput2, StatementsAcceptableNodes2);
CodeOutputRules.Add(StatementsOutput2);


Figure 3. The creation of code generation rules set up for the different possible STATEMENTS variations

I’ve also employed this type of method in the code generation phase (see figure 3 above). Once the tree’s been built it would be relatively simple to recurse through the parse tree and just output the compiled code; however, I’ve made a sort of bridge again so that the output code that is produced can be changed dynamically. When recursing through the parse tree, a list of these output generation rules is passed to each node. The node type is checked against each rule, and the node’s children are checked against a list of acceptable nodes for that rule.

Once the right rule has been determined, the output code for that rule is selected and vetted; each rule can contain parameters denoting whether that node’s children or stored variables have to be inserted. These are denoted by tags in the code output strings: <N(NodeNumber)> or <V(VariableNumber)>.
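
A rough sketch of that substitution step (Java with illustrative names; the naive replacement below also assumes fewer than ten children per node, which the real implementation wouldn’t):

```java
import java.util.*;

// Illustrative tag substitution: children are the already-generated output of
// child nodes, variables are the tokens stored in the node.
class TemplateFiller {
    static String fill(String template, List<String> children, List<String> variables) {
        String result = template;
        // Naive: replacing "<N1>" would also hit "<N10>", so this sketch
        // assumes fewer than ten children/variables per node.
        for (int i = 0; i < children.size(); i++)
            result = result.replace("<N" + (i + 1) + ">", children.get(i));
        for (int i = 0; i < variables.size(); i++)
            result = result.replace("<V" + (i + 1) + ">", variables.get(i));
        return result;
    }
}
```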

While I’m happy with this system, there is a downside when employing it for code generation: it’s necessary to add multiple entries for each possible variation of a rule. For example, in figure 3 I’ve had to build up two STATEMENTS code generation rules: one for a STATEMENTS rule containing a single STATEMENT, and one for a STATEMENTS rule that contains further STATEMENTS. While I don’t intend to change this at the moment because it functions as intended, it’s worth acknowledging that it adds additional work for what might only be slight variations in output.

Another slight problem is that I don’t currently have a way to change the output code depending on the value of a variable. The code supplied to the bridging class is ultimately what is put in the output, and this is currently the only place to specify any kind of vetting of a variable; that vetting would have to be embedded in the output code itself, which could be detrimental. This is something I’ll have to devote some thinking to.

The system I’ve come up with is unconventional, but I’m happy that I’d be able to create a compiler for a basic grammar with the classes I’ve created. It works in principle but needs a great deal of work. Next I intend to come up with a more complete grammar and start ironing out the bugs.

My top-down parser

In the last post I uploaded I said that I wanted to have a go at creating my own parser for checking input against a BNF grammar. That’s been my occasional goal for the last couple of weeks, and although at the moment the grammar is only a small subset of anything that could be considered useful, I’m quite happy that I’ve got the method sorted, which is the most important part. At the moment the BNF grammar looks like this:

main_program : OPEN_TOKEN statements CLOSE_TOKEN;

statements : statement statements
           | statement
           ;

statement : assignment_statement
          | if_statement
          ;

assignment_statement : operand EQUALS operand SEMICOLON;

if_statement : IF OPEN_BRACKET comparison CLOSE_BRACKET SEMICOLON;

comparison : operand comparator operand;

comparator : MORE_THAN
           | LESS_THAN
           ;
operand : NUMBER
        | VARIABLE
        ;

I’m aware that it’s functionally useless (who’d want an if statement without any consequences, and why aren’t the brackets in the comparison rule?) but it suited my purpose for testing the method.

Since I wanted to focus on this part of the compilation process exclusively, for the moment I’ve just decided to manually stick in a list of tokens to be matched against instead of tokenising input text. If I do extend this to the point that I want to include the other compilation steps (and I might because it’s been an interesting little project) a tokenising phase can be added with relatively little hassle.

The basic method is quite simple, and looking back I’ve written comparatively little code considering how long it took me to get it working, but that’s part and parcel of learning things. A grammar is comprised of terminals and non-terminals: non-terminals can be comprised of terminals and other non-terminals, while terminals can only be matched and cannot be split down any further. In my method we take the list of tokens, pass it to a rule (our non-terminal), and attempt to match it against what makes up that rule. Tokens can be matched to that rule’s tokens, but if a token cannot be matched then the program checks whether that element of the rule is itself a rule and tries to match against it. Whenever a sub-rule is called, a subset of the token stream is created, from the token we matched up to the end of the stream, and the number of tokens a rule matches is recorded so any rule it came from can continue from that point.

Sorry, that’s the simplest way I can explain it, and I appreciate that I’ve used the word token to the extent that seeing it again will likely infuriate you. In short: try to match a token to a rule’s token; if that’s not possible, see if there is a sub-rule that can be matched against the next part of the token list. If so, continue as normal; if not, look for the rule’s alternatives. That’s what’s happening in the code below.

if (PassedTokenStrean[TokenStreamCounter].Name != RuleComprised[i].Name)
{   //Go here if must match a nonterminal in the rule
    try
    {
        List<string> RetrievedRuleList = new List<string>();
        bool BrokenInRule = (RuleComprised[i] as Rule).CheckMatchinRule(PassedTokenStrean.GetRange(TokenStreamCounter, PassedTokenStrean.Count - TokenStreamCounter), ref matchednodes,  RetrievedRuleList);

The matchednodes number records the amount of tokens that the rule has matched. Once the method has returned, this number is added to the counter so that the calling rule can continue from that point. Below is the code which handles the returns from any sub-rules.

if (!BrokenInRule) 
{ //Go here if the nonterminal was matched 
    for (int j = 0; j < RetrievedRuleList.Count; j++) 
        ReturningStrings.Add(RetrievedRuleList[j]); 
    TokenStreamCounter = TokenStreamCounter + matchednodes; 
    Broken = false; 
} 
else 
{ //Go here if the nonterminal wasn't matched 
    if (IsRecursiveRule) 
    { 
        BrokenRule = RuleComprised[i].Name; 
    } 
    Broken = true; 
    break; //At this point the token stream cannot be matched to the initial rule. 
}

If the sub-rule didn’t cause the parse to break then we have to do two things: add the number of tokens the sub-rule matched to the token counter, and add the name of any rules matched within the sub-rule to the list of rules that this rule matched. And, like the token thing earlier, I’m sure the overuse of the word rule is annoying if not confusing. The reason for this list of rules is that I wanted to be able to print out the statements that have been matched when the parse is done. To make sure that only the correct rules are added to the output, each rule gets its own list, and that list is only added to the output if the rule it belongs to doesn’t break. For example, if I have an if statement which contains a comparison, and the comparison parses correctly but the rest of the if statement is broken, the comparison’s matched statements will never reach the greater list because the if statement’s rule breaks and its list is discarded. This is the method I’d use to build up a parse tree: simply use node objects as opposed to lists of strings.

Figure 1. The successful parse output

Admittedly, the code to write things out still needs some tinkering; you can see from the above output that the recursive statements rule hasn’t been output for the following token stream:

OPEN_TOKEN, IF_TOKEN, OPEN_BRACKET, NUMBER_TOKEN, MORE_THAN_TOKEN, NUMBER_TOKEN, CLOSE_BRACKET, SEMICOLON, VARIABLE_TOKEN, EQUALS, NUMBER_TOKEN, SEMICOLON.

It’s a small problem that I’ll have to go rooting around for before switching to building a parse tree; I didn’t put as much effort into this simple output as I did the main parsing bit (don’t judge me too harshly when you see the code).

One of the biggest pains in creating this was coming up with a way to handle recursive rules; without those there’s really no point to it. My initial thought was to make the recursive alternative the first part of a recursive rule and fall back on the non-recursive parts. This makes sense because it would first check against any recursive statement and would then just check for a standard statement once the recursive part no longer holds. Unfortunately, due to my own implementation this became too tiresome to implement, so I came up with another method: check if the rule is recursive, and if the rule the parse breaks on is the recursive one, go back a token, accept that it’s matched, and continue as normal.

public bool CheckMatchinRule(List<Token> PassedTokenStrean, ref int matchednumber, List<string> FoundNodesList)
{
    bool IsRecursiveRule = false;

    if (RuleComprised != null)
    {
        for (int i = 0; i < RuleComprised.Count; i++)
        {
            if (RuleComprised[i].Name == Name)
            {
                IsRecursiveRule = true;
                break;
            }
        }
    }

    bool Broken = true;
    int matchednodes = 0;
    int TokenStreamCounter = -1;
    string BrokenRule = "";

    List<string> ReturningStrings = new List<string>();

    if (RuleComprised != null)
    {
        for (int i = 0; i < RuleComprised.Count; i++)
        {
            TokenStreamCounter += 1;
            if (PassedTokenStrean[TokenStreamCounter].Name != RuleComprised[i].Name)
            {   //Go here if must match a nonterminal in the rule
                try
                {
                    //Want to append the strings of the matched rules if the rule matched.
                    //Don't bother appending if they didn't
                    List<string> RetrievedRuleList = new List<string>();

                    bool BrokenInRule = (RuleComprised[i] as Rule).CheckMatchinRule(PassedTokenStrean.GetRange(TokenStreamCounter, PassedTokenStrean.Count - TokenStreamCounter), ref matchednodes, /*FoundNodesList*/ RetrievedRuleList);
                    if (!BrokenInRule)
                    {   //Go here if the nonterminal was matched
                        for (int j = 0; j < RetrievedRuleList.Count; j++)
                            ReturningStrings.Add(RetrievedRuleList[j]);
                                
                        TokenStreamCounter = TokenStreamCounter + matchednodes;
                        Broken = false;
                    }
                    else
                    {   //Go here if the nonterminal wasn't matched
                        if (IsRecursiveRule)
                        {
                            BrokenRule = RuleComprised[i].Name;
                        }
                        Broken = true;
                        break; //At this point the token stream cannot be matched to the initial rule.
                    }
                }
                catch
                {
                     if (IsRecursiveRule && RuleComprised[i].Name == Name)
                     {
                         Broken = false;
                         BrokenRule = RuleComprised[i].Name;
                     }
                     else
                     {
                         Broken = true;
                     }
                     break; //If the token cannot be treated as a rule then
                 }
             }
             Broken = false;
         }
         matchednumber = TokenStreamCounter;
     }

     if (!Broken)
     {   //If the rule could be matched then return the fact that it's matched. If it couldn't try alternate rules.
         for (int i = 0; i < ReturningStrings.Count; i++)
             FoundNodesList.Add(ReturningStrings[i]);
         return Broken;
     }

     if(IsRecursiveRule && BrokenRule == Name)
     { //If it's a recursive rule and it broke on the recursive part then stop parsing
         for (int i = 0; i < ReturningStrings.Count; i++)
             FoundNodesList.Add(ReturningStrings[i]);
         matchednumber = TokenStreamCounter - 1; //If broke on recursive rule then set the token back 1 to the point it successfully matched
         return false;
     }

     //Here have to check if this is a recursive rule and if the recursive part was what we broke on

     if (MatchingRules != null)
     {
         for (int i = 0; i < MatchingRules.Count; i++)
         {
             List<string> RetrievedRuleList = new List<string>();
             Broken = MatchingRules[i].CheckMatchinRule(PassedTokenStrean, ref matchednodes, /*FoundNodesList*/ RetrievedRuleList);

             if (!Broken)
             {
                 FoundNodesList.Add(Name);
                 FoundNodesList.Add(MatchingRules[i].Name);
                 for (int j = 0; j < RetrievedRuleList.Count; j++)
                     FoundNodesList.Add(RetrievedRuleList[j]);

                 matchednumber = matchednodes;
                 break;
             }
         }
     }
            
     return Broken;
}

 

My simple compiler

During my Languages module at university we all made a simple compiler for a basic language, taking an input file and outputting the equivalent code in C. I did fairly well in this module; my compiler output everything correctly, had some pretty good type checking, and employed basic optimisations like constant folding. Recently I’ve tried to do something similar, but wanted to employ some different control mechanisms and give the user the ability to create lists of dynamic length by default. This isn’t the most fleshed-out language, but overall I think I’ve accomplished my goals.

Flex and Bison were used to create this solution because, although I find them a pain to set up and use together, they do their respective jobs very well; and since they’re pretty old, well-documented tools, someone will have a solution for whatever dodgy error message inevitably crops up. That, and I quite like using C but can rarely find a task to justify its use.

In the language the user is able to create an individual variable or a list of characters, integers, or floats. When a list variable is declared, my compiler generates three methods to work with lists of that type: an add-value method which appends a value of the required type to the end of the list, a RetrieveValue method which finds a value in the list based on the subscript position, and a DeleteValues method which loops through the list and releases each node in the chain from beginning to end. For example, the program immediately below results in the output C code below it.

:( 
     int list IntegerList; 
     IntegerList<-1; 
     IntegerList<-2; 
     IntegerList<-3; 
     IntegerList<-4; 
     IntegerList<-5; 

     output<-IntegerList[1]; 
     output<-IntegerList[2]; 
     output<-IntegerList[3]; 
     output<-IntegerList[4]; 
     output<-IntegerList[5];
:)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct intnode{
     int StoredValue;
     struct intnode *Next;
} typedef IntNode;

void AddIntValue(IntNode *RootNode, int passedvalue)
{
     if(RootNode->Next != NULL)
     {
          AddIntValue(RootNode->Next, passedvalue);
     }
     else
     {
          IntNode *NewIntNode = malloc(sizeof(IntNode));
          NewIntNode->StoredValue = passedvalue;
          NewIntNode->Next = NULL;
          RootNode->Next = NewIntNode;
     }
}

int RetrieveValue(IntNode *RootNode, int PassedIndicator, int PassedCurrentNode)
{
     if(PassedIndicator == PassedCurrentNode)
     {
          return RootNode->StoredValue;
     }
     else if(RootNode->Next != NULL)
     {
          return RetrieveValue(RootNode->Next, PassedIndicator, PassedCurrentNode + 1);
     }
}

void DeleteValues(IntNode *RootNode)
{
     if(RootNode->Next != NULL)
     {
          DeleteValues(RootNode->Next);
     }
     free(RootNode->Next);
}

void main()
{
     IntNode *IntegerList = malloc(sizeof(IntNode));
     IntegerList->Next = NULL;

     AddIntValue(IntegerList, 1);
     AddIntValue(IntegerList, 2);
     AddIntValue(IntegerList, 3);
     AddIntValue(IntegerList, 4);
     AddIntValue(IntegerList, 5);

     printf("%d", RetrieveValue(IntegerList->Next, 1, 1));
     printf("%d", RetrieveValue(IntegerList->Next, 2, 1));
     printf("%d", RetrieveValue(IntegerList->Next, 3, 1));
     printf("%d", RetrieveValue(IntegerList->Next, 4, 1));
     printf("%d", RetrieveValue(IntegerList->Next, 5, 1));

     DeleteValues(IntegerList);
     free(IntegerList);
}

While this isn’t exactly a high-end compiler, I did at least make some marginal optimisations: the three methods for each list type are only generated once the code has been checked to make sure the user has actually declared a list of that type. Hardly groundbreaking stuff, but it’s at least somewhat considerate.

Checks for variable type and variable presence are always accomplished with a recursive run through every child branch of the current node in the tree. Once a node that could hold a variable has been reached, the node’s value (in my implementation, heavily influenced by the projects undertaken throughout my university course) is used to retrieve the name of the variable or its type from a global array.
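
A minimal sketch of that kind of check (Java, with an assumed node structure rather than the project’s actual Flex/Bison-generated classes):

```java
import java.util.*;

// Illustrative tree node: stores an index into a global symbol table.
class TreeNode {
    int symbolIndex = -1;              // index into the symbol table, -1 if none
    List<TreeNode> children = new ArrayList<>();
}

class SymbolCheck {
    static String[] symbolNames;       // global array of declared variable names

    // Recursively walk every child branch looking for a node that
    // references the given variable name.
    static boolean containsVariable(TreeNode node, String name) {
        if (node.symbolIndex >= 0 && symbolNames[node.symbolIndex].equals(name))
            return true;
        for (TreeNode child : node.children)
            if (containsVariable(child, name)) return true;
        return false;
    }
}
```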

In addition, I thought I’d take a crack at making a control mechanism in which the user can use either a logical comparison or a numeric stopping point. In the following code the first loop creates a standard for loop to execute the contained code a set number of times, and the second should create a while loop with the comparison included.

:(
	int Counter;

	Counter = 0;
	until(5)
	->
		Counter = Counter + 1;
		output<-Counter;
	<-

	Counter = 0;
	until(Counter > 5)
	->
		Counter = Counter + 1;
		output<-Counter;
	<-
:)

The below code is the compiled output for the above loops.

#include <stdio.h>
#include <stdlib.h>

void main()
{
     int Counter;
     Counter = 0;
     for(int i = 0; i < 5; i++)
     {
          Counter = Counter + 1;
          printf("%d", Counter);
     }

     Counter = 0;            
     while((Counter > 5))
     {
          Counter = Counter + 1;
          printf("%d", Counter);
     }
}

This compiler doesn’t have quite as extensive a type-checking mechanism as my coursework’s; I wanted to see how easy or difficult it would be to include dynamic lists as a feature, so that’s an aspect I put off for the time being. Overall I’m happy with how the implementation worked out, though I suppose it could cause some memory problems if the output program crashes partway through. As I said, not exactly a fleshed-out language.

This hasn’t been a bad experiment, but what I want to do next is make an attempt at my own top-down parser. It’s useful to use Flex and Bison, but it would be quite an accomplishment to build my own. As a quick aside, I did briefly consider using the Irony.NET tool instead of Flex and Bison; I used Irony.NET for certain parts of my final year project and it seems a worthwhile tool, but I opted for the more familiar tools this time.

A simple mechanism for storing code change

One of the things I’ve looked at making is a simple way to document all of the changes made to a piece of code. What I wanted was a quick way of taking snapshots of my code so that I could come back to an earlier version in the event that I do something stupid. The interesting part of my mechanism is that I wanted it to store divergent changes, so that I could travel so far down one route and then switch to another when I decide I’ve made a mistake; a sort of lazy local change control.

The structure I’ve created is a branching chain of nodes. Each node can have a previous node, a subsequent node, a change node indicating the start of a new branch, and a deviating node indicating the node from which this chain starts. In addition to these four pointers, each node contains a copy of the text at the point it was created. The diagram below indicates this setup.

Change control diagram 1

In the example diagram above, saves were made at points 2 and 3. At point 3 it was decided that it would have been better to go a different route, so the text was reverted and a new route was taken, creating the branch 4 through 7. At point 7 it was decided that the first choice really was better after all, so the whole text was reverted to text 3 and the further saves 8 and 9 were created. It’s a bit of a strange example, but I think it fairly illustrates the kind of functionality this can provide.

Each new node is given its own int id number to check against when trying to retrieve the necessary text. This can come in handy when doing the tree traversal.

My initial idea for traversing the structure was to simply start at the current node and run through accordingly. The thinking was that the user would likely want to change to a recent save, so starting at the current node would probably minimise the number of nodes to traverse. The problem is that the desired save could be on a different branch, and because it is possible to make new saves on old branches it is impossible to know which branch it is on without travelling down it. The simplest solution is to keep a list of entry points at which the search can start. In my program the start of each new branch serves as an entry point, ensuring that it is only necessary to traverse down to the end of each branch. By using the desired save number, and assuming the user wants to move to a save close to the current one, the number of traversals could be minimised by ordering the branches according to their start value when searching.
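
The entry-point search might look roughly like this (a Java sketch with illustrative names; the project itself is C#, and this omits the previous/deviating pointers for brevity):

```java
import java.util.*;

// Illustrative save node: id, text snapshot, and a link to the next
// save on the same branch.
class ChangeNode {
    final int id;
    final String text;                  // snapshot of the document at this save
    ChangeNode next;                    // subsequent save on the same branch
    ChangeNode(int id, String text) { this.id = id; this.text = text; }
}

class ChangeHistory {
    // The start of each branch is registered as a search entry point.
    final List<ChangeNode> entryPoints = new ArrayList<>();

    // Walk forward from each entry point until the wanted save is found.
    ChangeNode find(int wantedId) {
        for (ChangeNode entry : entryPoints)
            for (ChangeNode n = entry; n != null; n = n.next)
                if (n.id == wantedId) return n;
        return null;                    // no such save
    }
}
```

Ordering entryPoints by each branch’s starting id would give the “search nearby branches first” behaviour described above.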

At this point a valid criticism would be that there is no need for the branches to be connected; if we're just going to make single traversals down each route, why bother with the overhead? The advantage of connecting the branches is extensibility. My understanding is that real change-control mechanisms record only the differences between the last save and the current one in order to save memory. Since it's simple to get the complete route from any save node back to the original text, this kind of mechanism could be implemented if I added a way to record the differences between two texts. The very basic test screen I've created uses this exact method to list individual nodes in its listbox (which admittedly seems a tad useless without some indication of what the changes are, but I'm focusing on the data structure rather than the interface, however crappy).

Displaying branches 1

While I did intend this to be a branching storage mechanism, it should be noted that this can easily be changed to overwrite previous saves; when checking for a subsequent change a new subsequent node can be assigned as opposed to a branching node. The complete code for the control mechanism can be found below.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace RecordingChanges
{
    class StoredChange
    {
        public enum StorageMode
        {
            NonBranchingChange, BranchingChange
        }

        static StoredChange InitialStorage;
        static int Currentcounter;
        static List<StoredChange> BranchNodes = new List<StoredChange>();

        string CurrentlyStoredText;
        int PositionCode;

        public List<StoredChange> GetRouteToInitial()
        {
            List<StoredChange> ReturningList = new List<StoredChange>();
            StoredChange CurrentlyStoredChange = this;

            while(CurrentlyStoredChange.PreviousChange != null || CurrentlyStoredChange.DeviatingChange != null)
            {
                ReturningList.Add(CurrentlyStoredChange);
                if (CurrentlyStoredChange.PreviousChange != null)
                    CurrentlyStoredChange = CurrentlyStoredChange.PreviousChange;
                else if (CurrentlyStoredChange.DeviatingChange != null)
                    CurrentlyStoredChange = CurrentlyStoredChange.DeviatingChange;
            }

            return ReturningList;
        }

        public static List<StoredChange> GetBranchEnds()
        {
            List<StoredChange> ReturningListEnds = new List<StoredChange>();

            for(int i = 0; i < BranchNodes.Count; i++)
            {
                StoredChange CurrentStorageChange = BranchNodes[i];
                //Gets the base of the branch node
                while(CurrentStorageChange.SubsequentChange != null)
                {
                    CurrentStorageChange = CurrentStorageChange.SubsequentChange;
                }   //Advance to the end of each branch and store it
                ReturningListEnds.Add(CurrentStorageChange);
            }

            return ReturningListEnds;
        }

        public static void AddBranchStart(StoredChange PassedStoredChange)
        {
            BranchNodes.Add(PassedStoredChange);
        }

        public static StoredChange GetStoredChange(int RequiredStoredChange)
        {
            StoredChange ReturningChange = null;
            for(int i = 0; i < BranchNodes.Count; i++)
            {
                ReturningChange = BranchNodes[i];

                while(ReturningChange.ChangeCode != RequiredStoredChange && ReturningChange.SubsequentChange != null)
                {
                    ReturningChange = ReturningChange.SubsequentChange;
                }

                //At this point we've hit the end of the chain or the right node
                if (ReturningChange.ChangeCode == RequiredStoredChange)
                    return ReturningChange;
                //If that wasn't the correct code then we go to the start of the next branch
            }

            //At this point the input node hasn't been found
            return null;
        }

        public static StoredChange Initial
        {
            get
            {
                return InitialStorage;
            }
            set
            {
                InitialStorage = value;
            }
        }

        StoredChange NextChange, LastChange, NewBranch, DeviatingPoint;

        public StoredChange(string PassedCurrentlyStoredText)
        {
            PositionCode = Currentcounter;
            CurrentlyStoredText = PassedCurrentlyStoredText;
            Currentcounter += 1;
        } 

        public int ChangeCode
        {
            get
            {
                return PositionCode;
            }
        }

        public string ChangeText
        {
            get
            {
                return CurrentlyStoredText;
            }
        }

        private StoredChange AddSubsequentChange(StoredChange PassedSubsequentChange)
        {
            SubsequentChange = PassedSubsequentChange;
            PassedSubsequentChange.PreviousChange = this;
            return PassedSubsequentChange;
        }

        private StoredChange AddBranchingChange(StoredChange PassedBranchingChange)
        {
            ChangedBranch = PassedBranchingChange;
            PassedBranchingChange.DeviatingChange = this;
            return PassedBranchingChange;
        }

        public StoredChange AddChange(StorageMode PassedStorageMode, StoredChange PassedStorageChange)
        {
            switch (PassedStorageMode)
            {
                case StorageMode.NonBranchingChange:
                    PassedStorageChange = AddSubsequentChange(PassedStorageChange);
                    break;
                case StorageMode.BranchingChange:
                    if (SubsequentChange != null)
                        PassedStorageChange = AddBranchingChange(PassedStorageChange);
                    else
                        PassedStorageChange = AddSubsequentChange(PassedStorageChange);
                    break;
            }
            return PassedStorageChange;
        }

        public StoredChange SubsequentChange
        {
            get
            {
                return NextChange;
            }
            set
            {
                NextChange = value;
            }
        }

        public StoredChange PreviousChange
        {
            get
            {
                return LastChange;
            }
            set
            {
                LastChange = value;
            }
        }

        public StoredChange ChangedBranch
        {
            get
            {
                return NewBranch;
            }
            set
            {
                NewBranch = value;
                BranchNodes.Add(value);
            }
        }

        public StoredChange DeviatingChange
        {
            get
            {
                return DeviatingPoint;
            }
            set
            {
                DeviatingPoint = value;
            }
        }
    }
}

SUVAT question application

One of the problems I keep facing is that, although I can write software pretty well, I keep struggling to find things to make. The advice always seems to be come up with something useful so this is my first attempt at doing just that.

Since I spent a few days looking for new SUVAT questions while practising for exams at college, I figured I'd put together the application I'd have liked to have had at the time. The application presents the user with one SUVAT question from a back-end list of problems. The user can attempt to answer it, look through the solution, and request another problem once they're done.

Screenshot_2017-08-23-19-42-35 Screenshot_2017-08-23-19-42-43

Since I want to be able to add new problems to the application without making any laborious changes, I decided it would be best to store the problems in a back-end XML file. Luckily Java has a pretty good XML parser, albeit one that still took me too long to get working because I misunderstood how the read methods advance the parser's position.

Sticking everything in XML makes editing easier but does leave me with a tiny technical problem, one I don't think is quite worth fixing at the moment. Normal strings stored as Android string resources can be swapped according to the language of the target device; I couldn't do the same with the XML file contents without some manual checking.
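If I did decide to fix it, the manual check could be as simple as picking a per-language XML file at load time. A hypothetical Java sketch (the questions_<language>.xml naming scheme is my own invention for illustration, not something Android provides):

```java
import java.util.Locale;
import java.util.Set;

class QuestionFileResolver {
    // Hypothetical naming scheme: questions_<language>.xml, falling back to
    // the default English file when no translated file is available.
    static String resolve(Locale locale, Set<String> availableFiles) {
        String candidate = "questions_" + locale.getLanguage() + ".xml";
        return availableFiles.contains(candidate) ? candidate : "questions_en.xml";
    }
}
```

A device set to French would load questions_fr.xml when present and fall back to the English file otherwise.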

Despite it being English-only and very basic-looking, I like to think that someone might have a genuine use for this application. I think I'll publish it, maybe once it looks better and I've exhausted every method of not paying for a developer account on the various app stores.

Code Editor: My Final Year Project

Since the academic year is over and I finally graduated I think I can now write about my final year project without it jeopardizing any marking.

My project was to create a code editor with syntax highlighting and autocomplete. Beyond the features required by the brief I also added error-feedback, project spaces, compiler integration, and simple source-control mechanisms (commit, revert, and update). The editor targeted the C language, specifically the ANSI C standard.

Standard editor script

Figure 1. The main layout of the editor

The above figure shows the main layout of the editor that the user interacts with. It’s a fairly simple layout devoting the most screen space to the coding section and making the editor functionality available in the toolbar to the left.

The toolbar design was largely based on another editor's interface, one made by Robert Crocombe a couple of years earlier. The difference is that his was a very elegant toolbar which expands to give more information about each tool; in mine I relied on familiarity with common icons to convey the functionality of each button (this is something I'll have to change according to the user feedback I received).

Overall I achieved the majority of the project's goals, but couldn't get all of the source-control features done in time.

Syntax highlighting

The syntax highlighting feature was made to highlight each different lexeme in the C language. This included all 32 standard keywords, operators, other lexemes (numbers, pointers, etc.), and any user-defined types.

Thankfully, a fair few people have undertaken the code editor project before me, so a pretty clear method for drawing the text on screen had already been established: inherit from the standard text box and override the rendering method to highlight the text.

Like the people who did the project before me, I made a few optimisations to the rendering code: before generating any of the formatted text objects to be displayed, first check whether the line to be rendered would actually be on screen. In this way the same number of lines is rendered regardless of the length of the code, and rendering times stay effectively constant because the visibility check is a relatively inexpensive mathematical comparison.
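The visibility check amounts to a couple of divisions. A rough Java sketch of the arithmetic (assuming a fixed line height in pixels; the real editor is a .NET text box, so the names here are purely illustrative):

```java
class VisibleLines {
    // Given the scroll offset and viewport height (both in pixels) and a
    // fixed line height, compute the inclusive range of line indices worth
    // rendering; everything outside this range can be skipped entirely.
    static int[] range(int scrollOffset, int viewportHeight, int lineHeight, int totalLines) {
        int first = Math.max(0, scrollOffset / lineHeight);
        int last = Math.min(totalLines - 1, (scrollOffset + viewportHeight) / lineHeight);
        return new int[] { first, last };
    }
}
```

Whatever the file length, the range spans at most viewportHeight / lineHeight + 1 lines.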

Autocomplete

Code editor autocomplete example 1

Figure 2. The autocomplete feature suggesting two valid variables

Code editor autocomplete example 2

Figure 3. The autocomplete feature suggesting method signatures alongside suggestions

Code editor autocomplete example 3

Figure 4. The autocomplete feature ensuring the method signature remains displayed until the closing bracket of the method call has been completed.

An autocomplete feature is one which suggests any of the different possible tokens to be entered based on the partial string the user has entered. For example, if the user had entered the text ‘d’ the autocomplete feature would suggest the words ‘do’, ‘double’, and any variables starting with the string ‘d’.

One of the aspects of my editor that I'm most proud of is its ability to account for variable scope. When I was looking at other editors I didn't want one that suggested variable names regardless of scope (Notepad++ has the type of autocomplete I wanted to avoid, not that Notepad++ is a bad editor). To accomplish this scope awareness I built a set of classes forming a kind of symbol table. A MethodScope object is created for each method, containing the method's name and its start and end points in the text. Each MethodScope object holds a list of variable objects, each comprising the variable's name, its type, and the point in the text beyond which it is applicable. Using these objects it is possible to see which variables are in scope by comparing the caret position to each MethodScope's start and end positions and to the applicable point of each variable within that scope.
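A stripped-down Java sketch of the idea (MethodScope is the name used in the project, but the fields and methods here are a simplification of the real classes, which also track names and types):

```java
import java.util.*;

class Variable {
    final String name;
    final int declaredAt;      // caret position from which the variable applies

    Variable(String name, int declaredAt) {
        this.name = name;
        this.declaredAt = declaredAt;
    }
}

class MethodScope {
    final int start, end;      // character positions of the method body
    final List<Variable> variables = new ArrayList<>();

    MethodScope(int start, int end) { this.start = start; this.end = end; }

    // A variable is suggestible only if the caret is inside this scope and
    // past the point at which the variable was declared.
    List<String> applicable(int caret) {
        List<String> names = new ArrayList<>();
        if (caret < start || caret > end) return names;
        for (Variable v : variables)
            if (caret >= v.declaredAt) names.add(v.name);
        return names;
    }
}
```

Checking each method's MethodScope against the caret position in this way yields exactly the variables that are legal to suggest.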

The autocomplete feature also allows the user a degree of error before removing suggestions. To do this I used the edit distance algorithm to check that the entered text is within a set number of changes of a variable name before deciding it cannot be suggested. For example, if the code contains a variable 'LocalVariable' and the user enters 'localVariabl', the variable would still be suggested because two character changes get from the entered text to the actual name (two was the tolerance I settled on, but it could easily be changed). I should mention that this was an idea suggested by people who had previously undertaken the code editor project.
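The check itself is the standard Levenshtein dynamic-programming algorithm. A Java sketch, with the tolerance passed in as a parameter rather than fixed at two:

```java
class EditDistance {
    // Standard Levenshtein distance: the minimum number of single-character
    // insertions, deletions, or substitutions needed to turn a into b.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // deletion
                                            d[i][j - 1] + 1),  // insertion
                                   d[i - 1][j - 1] + cost);    // substitution
            }
        return d[a.length()][b.length()];
    }

    static boolean suggest(String typed, String candidate, int tolerance) {
        return distance(typed, candidate) <= tolerance;
    }
}
```

With a tolerance of two, 'localVariabl' still matches 'LocalVariable' (one substitution for the capital L plus one insertion for the trailing e).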

Error-feedback

My code editor provides syntactic error-feedback. This is accomplished via the Irony .NET parsing library, which I chose because it allows the BNF grammar to be coded directly into the program and doesn't require any additional runtime environment. Incidentally, the autocomplete and syntax highlighting features could also have been built on this parser by checking the leaf nodes of the generated parse tree for their token types and then colouring and storing them. I didn't take that approach because the first syntactic error causes the parser to stop, and I didn't have the time to implement any error-recovery mechanism.

The parser stops on the first syntactic error, but an ideal error-feedback feature will highlight as many errors as possible. Using the MethodScope objects it is possible to extract the text of each method and parse the methods individually; my editor can therefore report the first syntactic error in each method and so highlight multiple errors throughout the code.

Project spaces

This is the most basic of the features I implemented. Each file in a project is expected to live in the same location in secondary storage. When a program is saved through the editor, both the back-end file and the XML project file are updated. The XML project file lists each file in the project, the location of any SVN repository, and any project-specific compiler arguments that have been entered.

The XML format was chosen because this back-end file has to be human-readable in case edits have to be made outside the editor. JSON was another viable format, and one that's quicker to parse, but XML won out because speed of execution isn't vital to this feature and because the .NET library comes with built-in XML tools, whereas JSON would have required third-party tools, which I wanted to limit for clarity's sake.

Below is an example of a typical project XML file which is loaded and saved in order to store the necessary information about the project:

    <Project />
    <File>Main.c</File>
    <File>Person.h</File>
    <File>Person.c</File>

To conform with good standards that Project tag really should encompass all of the file tags, but this worked well enough.

Compiler integration

Compilation and run program settings

Figure 5. The compiler and run-program sections of the settings page.

Compilation is a necessary part of C development. Since I wanted my editor to provide all of the tools needed for writing C programs, the user has to be able to compile and run the output program through the editor. Above is the compiler and run-program section of the settings page, which the user feedback indicates could use some work.

The editor allows the user to specify any compiler on the system and any number of compiler directives. Once specified, a compiler setup is saved to secondary storage and can be reused across multiple programs. Having multiple applicable compilers was a feature I adapted from Code::Blocks; it seemed useful but didn't appear to be readily accessible in most of the larger editors I'd looked at.

Making a compiler setup applicable to multiple projects meant building a mechanism to evaluate the setup arguments and produce the output string passed to the compiler. For example, the GCC compiler can output an executable with a specified name using the -o argument. If the user specifies this in my editor (as shown in the figure below), the output string '-o' followed by the project name is produced by the EvaluateArgument method.

Standard GCC setup

Figure 6. The standard GCC argument for outputting the executable stored in the C editor.

        public string EvaluateArgument(Project PassedProject)
        {
            string ReturningString = CompilerArgumentSpecifier;

            switch(ChosenArgumentType)
            {
                case ArgumentType.UseAllFiles:
                    for(int i = 0; i < PassedProject.ReturnProjectFiles().Count; i++)
                    {
                        if(PassedProject.ReturnProjectFiles()[i].EndsWith(".c"))
                        {
                            ReturningString += ("\"" + PassedProject.ProjectLocation + @"\" + PassedProject.ReturnProjectFiles()[i] + "\"");
                            ReturningString += " ";
                        }
                    }
                    break;
                case ArgumentType.ProjectName:
                    ReturningString += "\"" + PassedProject.Name + "\"";
                    break;
                case ArgumentType.ProjectLocation:
                    ReturningString += "\"" + PassedProject.ProjectLocation + "\"";
                    break;
                case ArgumentType.EntireProjectAddress:
                    ReturningString += ("\"" + PassedProject.ProjectLocation + @"\" + PassedProject.Name + "\"");
                    break;
                case ArgumentType.ManualEntry:
                    ReturningString += "\"" + OptionalArgumentType + "\"";
                    break;
                case ArgumentType.Ended:
                    //Should be used when the argument is a single variable
                    break;
            }
            return ReturningString;
        }

One of the things I should have done, and might do in the future, is have each compiler include all of the standard directives by default. When I implemented the feature I wanted the user to be able to input the necessary directives in the (admittedly unlikely) event that someone was using an unconventional compiler or variant of the standard, but including all the standard directives by default would save significant time.

Running the compiler and running the output program use very similar code. The .NET System.Diagnostics namespace contains the classes needed to launch an external process and specify any input arguments. This works quite simply for individual program arguments, but the compiler arguments required me to build a mechanism to produce the output arguments based on their type.

Source-control mechanisms

I wanted the editor to provide commit, update, and revert-to options to the user. Unfortunately, due to time constraints and a lack of familiarity with the API (SharpSVN, the best tool I could find for the job), it wasn't possible to implement the revert-to feature within the time given for the project.

The editor lets the user both commit and update their work via the TortoiseSVN source-control software, but with some caveats. Unfortunately, I wasn't able to implement committing new folders to the repository, so if the user wants to commit to and update from a repository it is first necessary to check out a folder from the repository and then save the project into the checked-out folder.

I should mention that someone quite generously put up working code for the commit and update functions of the API, which meant I could adapt it to implement these features more quickly. It can be located here (Accessed 28 October 2016).

Overall

My editor works very well considering the functionality I tried to implement, the timespan I had, and the new tools I had to work with. If I’m going to return to this in the future then there are a fair few simple updates to be implemented:

  • Update the source-control mechanism to allow the user to create folders on the repository side
  • Include the revert-to functionality in the source-control functionality
  • Change the compiler directives settings to include all standard directives by default

There’s quite a bit more that could be done, but these are relatively simple changes (hopefully simple, in the case of the SVN work) that could be made in the immediate future.

This blog post is just a quick look at the project; it would take a great deal longer to cover all of it. While I've only run through the features here, it's fair to say I made some unorthodox design choices and had to come up with strange solutions to obscure problems, which might make pretty good posts later on.

Creating a better management system

ManagementSystemServerSide ManagementSystemClientSide

Figure 1 (left). The server application which services each client

Figure 2 (right). A simple client to communicate via the implemented protocol

This week I’ve been working on a better management system. This was an interesting project because it incorporates many different approaches to make it accessible and secure.

The basis for the project is very simple. A user can connect to and log into the server via the client application (this could be changed to a web-server or mobile application later with a little careful tweaking). Once logged in the client can place a new order or look at the list of their previous orders.

Once connected, all interactions take place over a persistent connection. This way, a user's individual id can be associated with the connection (incidentally, this could be extended later to implement some kind of privilege mechanism distinguishing customers and staff in this hypothetical company).

The actual protocol implemented is very basic, with the first line sent denoting the command and subsequent lines supplying the information for that command.
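Parsing a message in this style is trivial: the first line is the command and every following line is an argument. A Java sketch (the LOGIN command below is a made-up example, not necessarily one of my server's actual commands):

```java
import java.util.*;

class ProtocolMessage {
    final String command;
    final List<String> arguments;

    ProtocolMessage(String command, List<String> arguments) {
        this.command = command;
        this.arguments = arguments;
    }

    // First line names the command; each subsequent line is one argument.
    static ProtocolMessage parse(String raw) {
        String[] lines = raw.split("\n");
        List<String> args = new ArrayList<>(Arrays.asList(lines).subList(1, lines.length));
        return new ProtocolMessage(lines[0], args);
    }
}
```

For example, parsing "LOGIN\nalice\nsecret" yields the command LOGIN with the arguments alice and secret.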

Multi-threading

A server has to handle many connections concurrently (or at least it should, if it's going to be considered any good), so my application relegates each connection to a separate thread. The TcpListener object listens on a background thread (necessary because the AcceptTcpClient method is a blocking call) and launches a new thread, passing the accepted TcpClient to it.

            while (true)
            {
                TcpClient Connection = ConnectionListener.AcceptTcpClient();
                Thread t = new Thread(() => { ServiceConnection(Connection); });
                t.Start();
                //Notion of using delegate to pass object to thread provided at the following source
                //https://stackoverflow.com/questions/3988514/how-to-pass-refrence-and-get-return-in-thread
                //Accessed 26/06/2017
            }

The idea of using a delegate method to pass the TcpClient to the thread was provided here by user sumit_programmer (accessed 26/07/2017).

Salting hashes

While creating this application I've attempted to follow best practice when storing passwords, meaning that users with the same password should not end up with the same hash. This is generally accomplished by appending random data (a salt) to the password so that the resulting hash is vastly different.

private string GenerateSalt()
{
     string GeneratedSalt = "";

     for(int i = 0; i < SaltLength; i++)
     {
         //Restrict to printable ASCII (32-126): bytes above 127 are not
         //valid ASCII and would all decode to '?', weakening the salt
         GeneratedSalt += (char)rnd.Next(32, 127);
     }

     return GeneratedSalt;
}

In my system this is done by generating a random string of fixed length and appending it to the entered password. I've accomplished this simply by using the .NET Random object and using its output for the salt characters. This has worked pretty well so far, but because the Random object only draws from a pseudo-random sequence seeded by the clock, salts generated in quick succession could in principle repeat characters or even whole sequences. Fortunately, this is unlikely to happen across a whole salt and very unlikely to produce the same salts for the same passwords. As an example, users 1 and 2 both have the same password ('password') yet have vastly different hashes to be matched.

ChangedHashes

Figure 3. The difference in hashes for the same password
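For anything beyond a toy system, the pseudo-random generator should really be swapped for a cryptographically secure one (in .NET, something like RNGCryptoServiceProvider). A Java sketch of the idea; restricting the salt to printable characters also avoids bytes that wouldn't survive ASCII encoding:

```java
import java.security.SecureRandom;

class SaltGenerator {
    // SecureRandom is seeded from an OS entropy source, so back-to-back
    // salts won't repeat the way clock-seeded generators can.
    private static final SecureRandom rng = new SecureRandom();

    // Generate a salt restricted to printable ASCII so that it survives
    // being stored and concatenated onto the password as a string.
    static String generate(int length) {
        StringBuilder salt = new StringBuilder(length);
        for (int i = 0; i < length; i++)
            salt.append((char) (33 + rng.nextInt(94)));  // '!' (33) to '~' (126)
        return salt.toString();
    }
}
```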

Hash-based check-sums

Another of the security measures I've implemented ensures that a received message hasn't been tampered with en route. A hash of the concatenated message contents is sent along with the message, and a new hash is computed on receipt; if the newly created hash and the received hash are equal then there has been no tampering, but if they differ the message has been altered. (Strictly, without a shared secret an attacker who alters the message could also recompute the hash, so this really detects corruption rather than deliberate tampering; a keyed hash would be needed for the latter.)
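The recompute-and-compare step can be sketched in Java as follows; SHA-256 here is an assumption for illustration, not necessarily the hash my server actually uses:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class MessageChecksum {
    // Hash the full message body; the receiver recomputes and compares.
    static String compute(String message) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(message.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);  // SHA-256 is always available
        }
    }

    static boolean verify(String message, String receivedChecksum) {
        return compute(message).equals(receivedChecksum);
    }
}
```

Any change to the message body, however small, produces a completely different digest and so fails verification.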

This leaves me with something of a weird question: if the message has been intercepted and changed, do I want to inform the client? Wouldn't that alert the attacker to the fact that they've been discovered? Because this is unlikely to occur in my system I've simply resolved that nothing happens when the two hashes don't match, but it's something I'll have to look into later.

      int ChecksumLength = int.Parse(ConnectionInputStream.ReadLine());

      string CheckSum = "";

      for(int i = 0; i < ChecksumLength; i++)
      {
           CheckSum += (char)ConnectionInputStream.Read();
      }
      ConnectionInputStream.ReadLine();

An interesting problem that I didn't think to account for before the start of the project was that the hashes would inevitably include special characters (see above). This meant that the length of the hash had to be sent first so the receiver knows how many individual characters to read in; this is the downside of working with something as low-level as sockets.
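The workaround amounts to length-prefixed framing: send the character count first so the reader never has to guess where the payload ends. A minimal Java sketch of the framing, separate from any actual socket code:

```java
class Framing {
    // Prefix the payload with its character count and a newline so the
    // reader knows exactly how many characters to consume, even when the
    // payload itself contains newlines or other special characters.
    static String frame(String payload) {
        return payload.length() + "\n" + payload;
    }

    static String unframe(String framed) {
        int newline = framed.indexOf('\n');
        int length = Integer.parseInt(framed.substring(0, newline));
        return framed.substring(newline + 1, newline + 1 + length);
    }
}
```

Framing and unframing are inverses, so a payload containing a newline round-trips intact.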

Conclusion

This hasn’t been a bad project. I’ve made a reasonably useful protocol that could be extended further, and implemented some fairly reasonable security measures throughout. It may well have been better to build it on some higher-level communication technology, but overall I’m reasonably happy with it.