Preetha's Summer 2015 PhD Preliminary Research Project
Problem Statement and Motivation:
Quorum is a new programming language designed especially for visually impaired programmers. Our research goal is to summarize loops in Quorum, that is, to identify the higher-level abstraction of the action each loop performs. Many algorithmic steps need more than one statement to implement but are not big enough to be a method; these steps are generally implemented by loops. Our idea is to mine the characteristics of a given loop structure over the repository of Quorum source code, map them to an (already developed) action identification model, and thus identify the action performed by the loop. This could help human readers and coders, support automatic comment generation for loop structures (only a very small percentage of code is well documented for new users), facilitate automatic code search tools, and help blind programmers get a quick high-level view of code segments that are otherwise tedious and difficult to understand.
Summary of background (from Xiaoran's paper):
Motivation for this project comes from earlier work by Xiaoran Wang, Dr. Lori Pollock and Dr. Vijay Shanker, “Developing a Model of Loop Actions by Mining Loop Characteristics from a Large Code Corpus”. That project identified the higher-level abstraction of the action performed by a particular loop structure in Java, based on the loop's structure, data flow and linguistic characteristics. They focused on loops that contain exactly one conditional statement (an if-statement) that is also the last lexical statement within the loop body, which they call a loop-if structure. The Java loop formats considered were for, enhanced-for, while and do-while (nested loops were not considered).
Their approach was first to identify action units (code blocks consisting of a sequence of consecutive statements that logically implement a high-level action) that are implemented by loop structures. They then characterized the loops as feature-value pairs to generate a loop feature vector, and developed a model (the action identification model) that associates actions with loops based on their loop feature vectors.
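As a minimal sketch of what "feature-value pairs" can look like in practice, the snippet below parses the F1..F8 notation used for the samples later in these notes into a dictionary. The meaning of each feature is defined in Xiaoran's paper and is not restated here, so values are kept as plain strings; the function name parse_feature_vector is hypothetical.

```python
import re

# Matches "F<n>: <value>", where the value runs up to the next "F<n>:" or the end.
# A value may itself contain a comma (e.g. "F8: 1,2") or be a method name.
_FEATURE = re.compile(r"F(\d+):\s*(.*?)\s*(?=,\s*F\d+:|$)")


def parse_feature_vector(line):
    """Turn 'F1:1, F2:0, ..., F8: 1,2' into {'F1': '1', ..., 'F8': '1,2'}."""
    return {f"F{num}": value for num, value in _FEATURE.findall(line)}


example = parse_feature_vector("F1:1, F2:0, F3:0, F4:2, F5:4, F6:0, F7:0, F8: 1,2")
print(example)
```

Keeping the values as strings avoids guessing at types: F2 is sometimes a number and sometimes a method name in the samples below.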
The problem with respect to the new language environment and motivation:
Problems:
1. Loop structures in Quorum differ from those in Java: there is no for loop. Instead, the loop structures in Quorum are:
repeat <expression> times
repeat while <expression>
repeat until <expression>
2. As Quorum is a comparatively new programming language:
a. The repository of source code for sample projects is very small.
b. Apart from the language compiler and the standard library files, there is practically no source of complex code on which to run our tests.
c. The language is written by very few (one or two) developers, so we do not get to evaluate on different coding styles.
d. We might not be able to find many loop-ifs in the sample code.
3. Identifying the language grammar and generating ASTs with ANTLR4 seems challenging.
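The three repeat forms listed under problem 1 can be told apart from the loop header alone. The sketch below is a line-based heuristic for illustration only, not a substitute for a real parser or AST; the form labels ("repeat-while" and so on) and the function name classify_repeat are hypothetical.

```python
import re

# Order matters: "while" and "until" are checked before the generic "times" form.
REPEAT_FORMS = [
    ("repeat-while", re.compile(r"^repeat\s+while\s+(.+)$")),
    ("repeat-until", re.compile(r"^repeat\s+until\s+(.+)$")),
    ("repeat-times", re.compile(r"^repeat\s+(.+?)\s+times$")),
]


def classify_repeat(header):
    """Return (form, expression) for a Quorum loop header, or None if it is not a repeat."""
    text = header.strip()
    for form, pattern in REPEAT_FORMS:
        match = pattern.match(text)
        if match:
            return form, match.group(1).strip()
    return None


print(classify_repeat("repeat while actions:HasNext()"))
print(classify_repeat("repeat GetNumberGenerics() times"))
```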
Motivation:
As we already have an action identification model developed for Java source code, it will be interesting to find out whether the same model can be applied to another programming language such as Quorum.
Proposed ideas for addressing the problem(s) and justification:
1. Map Java for loops to repeat loops in Quorum.
2. After the project is implemented, we would like to run a question-and-answer survey with different users and developers to evaluate the correctness of our work. Each question would show a piece of Quorum code containing a loop-if, and the participant would select, out of the four options provided, the action that in their view correctly describes the code. This way the evaluation would not be biased.
3. Learn ANTLR4 to generate the ASTs. This might be useful because, as of the last release, Quorum uses ANTLR4 as its back end.
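One simple way the survey in idea 2 could be scored is the fraction of participants whose choice matches the action our approach identified. This is only a sketch under that assumption; the notes do not fix a scoring method, and the responses below are made up for illustration.

```python
def agreement_rate(responses, identified_action):
    """Fraction of survey responses that chose the same action as the model."""
    if not responses:
        raise ValueError("no responses to score")
    matches = sum(1 for choice in responses if choice == identified_action)
    return matches / len(responses)


# Hypothetical responses for one loop-if whose identified action is "add".
print(agreement_rate(["add", "get", "add", "add"], "add"))
```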
The research activity list and order:
1. Manually identify loop-if structures from sample Quorum code for at least 25-50 loops.
2. Manually extract the feature vectors for the identified loop-ifs.
3. Map the identified loop-ifs with the already developed action identification model.
4. Validate if the identified action is correct.
5. Identify the Quorum grammar.
6. Feed the grammar to ANTLR4 to generate abstract syntax trees (ASTs).
7. Write a program to automatically extract feature vectors from Quorum code using the generated ASTs.
8. Use Xiaoran's code to map the feature vectors to the action identification model.
9. Identify the high level action implemented by a given loop.
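Steps 3, 8 and 9 amount to a table lookup from a feature vector to an action. The sketch below illustrates that idea with a tiny table built only from vector/action pairs that appear in the samples later in these notes. It is not Xiaoran's model or code, and the names OBSERVED_ENTRIES and identify_action are hypothetical.

```python
# Vector/action pairs copied from worked samples in these notes (F1..F8 in order).
# This is NOT the full action identification model, only an illustration.
OBSERVED_ENTRIES = {
    ("1", "0", "0", "2", "4", "0", "0", "1,2"): "find",    # sample 1
    ("0", "0", "0", "1", "4", "0", "0", "2"): "find",      # sample 26
    ("5", "1", "0", "2", "0", "0", "0", "2"): "get",       # sample 14, Xiaoran's answer
    ("0", "0", "0", "0", "3", "0", "0", "2"): "determine",  # sample 27, Xiaoran's answer
}


def identify_action(vector, table=OBSERVED_ENTRIES):
    """Look up an 8-value feature vector; vectors without an entry are not summarized."""
    key = tuple(str(value) for value in vector)
    return table.get(key, "not identified")


print(identify_action([0, 0, 0, 0, 3, 0, 0, 2]))
print(identify_action([1, 0, 0, 2, 0, 0, 0, 2]))
```

Returning "not identified" for a missing entry mirrors how unmatched loop-ifs are recorded in the samples.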
Work to discuss for meeting on June 9, 2015:
Short Introduction to Quorum syntax:
For more help on Quorum syntax refer to http://quorumlanguage.com/syntax.php
Data Types:
1. Integer
2. Number
3. Boolean
4. Text
Type Casting: <type> <variable> = cast(<type to convert to>, <value to be converted>)
Arrays:
use Libraries.Containers.Array
Array<integer> a
a:Set(0, 10) //put value=10 in the first slot in the array, which is named 0
a:Get(1) //return value at index 1
a:SetSize(12) //setting the size of the array to 12, by default it is 10.
Loops(Repeats):
1. repeat <expression> times
integer a = complicatedMathAction()
integer b = anotherComplexAction()
repeat (b / a - (b + 5)) times
end
2. repeat while <expression>
integer a = 0
repeat while a < 15
    a = a + 1
end
//this syntax tells Quorum to continue executing as long as a happens to be smaller than 15.
3. repeat until <expression>
integer a = 0
repeat until a < 15
    a = a + 1
end
//Since a is less than 15 this loop will execute 0 times.
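To make the difference between the while and until forms concrete, the sketch below simulates the two examples above and counts iterations. It assumes that repeat until checks its condition before the first iteration, which is consistent with the comment that the loop executes 0 times; the function names are hypothetical.

```python
def count_repeat_while(a, limit):
    """Iterations of: repeat while a < limit / a = a + 1 / end."""
    iterations = 0
    while a < limit:
        a = a + 1
        iterations += 1
    return iterations


def count_repeat_until(a, limit):
    """Iterations of: repeat until a < limit / a = a + 1 / end."""
    iterations = 0
    while not (a < limit):  # "until" stops as soon as the condition holds
        a = a + 1
        iterations += 1
    return iterations


print(count_repeat_while(0, 15))  # 15
print(count_repeat_until(0, 15))  # 0
```

Note that the until version would never terminate for a starting value of 15 or more, since a only grows.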
If Conditional:
integer a = 1
integer b = 1000
integer c = 0
if a > 100
    c = 1
elseif b = 100
    c = 2
else
    c = 3
end
Methods(Actions):
action main
    integer addedNumbers = add(5,10)
end
action add(integer a, integer b) returns integer
    return a + b
end
Classes:
class BankAccount
    integer money = 0
    action deposit(integer amount)
        money = money + amount
    end
    action withdraw(integer amount)
        money = money - amount
    end
end

use BankAccount
class Main
    action Main
        BankAccount account
        account:deposit(10)
    end
end
Inheritance:
Class A
Class B
Class C is A,B
    action <name>
        parent:A.<method-name>(arguments)
        parent:B.<method-name>(arguments)
    end
end
class Main
    action main
        C <object name>
        <object name>:<action name>
    end
end
Generics:
In computer programming, we often want to create a class that generically specifies an algorithm. For example, we may want to create a list of objects, like a list of people in a phone book. Or, we may want to create a list of numbers, integers, or anything else. There are two ways we could accomplish this: 1) we could create separate implementations of our list for each data type (e.g., one for numbers, one for integers), or 2) we could create a “generic” class.
class List<Type>
    action add(Type item)
        //add our list item
    end
end
List<integer> list
list:add(1)
list:add(2)
list:add(3)
//all of the following are fine
List<string> list1
List<Dog> list2 //assuming there is a Dog class somewhere
List<boolean> list3
Autoboxing:
Mutator function:
Integer int
int:SetValue(15)
Accessor function:
text t
t = int:GetTextValue()
Comparison function:
Integer int1
Integer int2
int1:SetValue(5)
int2:SetValue(7)
integer lessThan = int1:CompareTo(int2)
integer greaterThan = int2:CompareTo(int1)
boolean equalTo = (int1:CompareTo(int2) = 0)
After executing the above code:
lessThan will equal -1, because 5 is less than 7, greaterThan will equal 1 because 7 is greater than 5, and equalTo will equal false because 5 is not equal to 7, and thus the result of the function is not 0.
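The comparison results described above follow the usual three-way comparison convention. The sketch below reproduces the same three values; it only mirrors the behaviour stated in these notes and is not Quorum's implementation.

```python
def compare_to(left, right):
    """Return -1, 0 or 1 when left is less than, equal to, or greater than right."""
    return (left > right) - (left < right)


less_than = compare_to(5, 7)        # -1, because 5 is less than 7
greater_than = compare_to(7, 5)     # 1, because 7 is greater than 5
equal_to = compare_to(5, 7) == 0    # False, because the result is not 0
print(less_than, greater_than, equal_to)
```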
Error Handling:
use Libraries.Language.Errors.Error
use Libraries.Language.Errors.DivideByZeroError
class Main
    action Main
        number result = 0
        check
            text userInput = input("divide by?")
            integer divisor = cast(integer, userInput)
            result = 12/divisor
        detect e is Error or DivideByZeroError
            say e:GetErrorMessage()
            result = 12/1
        always
            say "calculating result"
        end
        say "The result is " + result
    end
end
Sample loop-ifs from Quorum Source Code:
Repo → Quorum3 → SourceCode → Compiler.quorum
1. Line No: 486-496
private action GetActionAtIndex(Class clazz, integer index) returns Action
    Action act = undefined
    File file = clazz:GetFile()
    text classPath = file:GetAbsolutePath()
    Iterator<Action> actions = clazz:GetActions()
    repeat while actions:HasNext()
        Action a = actions:Next()
        integer start = a:GetIndex()
        integer finish = a:GetIndexEnd()
        File f = a:GetFile()
        text actionPath = f:GetAbsolutePath()
        if index >= start and index <= finish and classPath = actionPath
            act = a
            return act
        end
    end
    return act
end
Feature Vector: F1:1, F2:0, F3:0, F4:2, F5:4, F6:0, F7:0, F8: 1,2
Action Identified: find
Repo → Quorum3 → SourceCode → Compiler.quorum
2. Line No: 658-672
private action AddSubpackagesAndClasses(CodeCompletionRequest request, CodeCompletionResult result, text name)
    //first find all of the packages/sub-packages
    Iterator<text> packs = sandboxSymbolTable:GetSubpackageNames(name)
    repeat while packs:HasNext()
        text pack = packs:Next()
        integer size = name:GetSize() + 1
        text subpackName = pack:GetSubtext(0, size)
        text filteredName = subpackName + result:filter
        subpackName = pack:GetSubtext(size, pack:GetSize())
        if pack:StartsWith(filteredName)
            CodeCompletionItem item = GetPackageCompletionItem(subpackName)
            text filter = result:filter
            integer size = filter:GetSize()
            item:dotOffset = request:caretLocation - size
            result:Add(item)
        end
    end
Feature Vector: F1:5, F2:2, F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
Action Identified: add
According to Xiaoran's work:
Feature Vector: F1:5, F2:1, F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
Action Identified: get
Need to discuss whether get is the correct action; in my view it should be add.
Questions for meeting on June 9, 2015:
1. We should not consider nested loops, right? Xiaoran's work did not include nested loops.
2. Xiaoran's action identification model contains entries for the 100 most frequent loops in Java source code, so there are some loop-ifs in Quorum for which I cannot find a valid entry in the table.
3. Show Quorum grammar file and take feedback on how to proceed.
4. What if we don't find enough loop-ifs? How many would be sufficient for the project?
Meeting summary, June 9, 2015:
1. We won't consider any kind of nested loops, only loop-ifs.
2. If we find loop-ifs that cannot be mapped to Xiaoran's action identification model, we have to leave that case. We cannot summarize such loops.
3. The only repository we have is the Quorum source code, so we will have to work with fewer loop-ifs if we don't find many. We can automate the process of generating the feature vector from any Quorum code.
4. After manually extracting feature vectors for 25-50 loop-ifs, prepare a test survey for validation (Xiaoran can possibly help).
5. For generating the AST, we might not need ANTLR4. The first suggestion is to go through the parser in Quorum and try to understand the code. There is a tree walker available in the code; we can use that and might not need to write one of our own.
We can try to stop the compiler execution after it generates the AST. The AST might not be visible or printed, but it should be in memory.
Then, while running the tree walker, we could print at nodes when we hit a loop, etc.
Parser:/Users/preethac/Desktop/Research/Summer Project 2015/stefika-quorum-language-25186e0d2fa6/plugins/ParserPlugin/src/plugins/quorum/Libraries/Language/Compile
Compiler: /Users/preethac/Desktop/Research/Summer Project 2015/stefika-quorum-language-25186e0d2fa6/Quorum3/SourceCode
Work to discuss-Meeting on June 17, 2015:
More Sample loop-ifs from Quorum Source Code:
Repo → Quorum3 → SourceCode → Compiler.quorum
3. Line No: 727-735
private action ReformPackage(Array<text> packs) returns text
    text result = ""
    integer i = 0
    repeat while i < packs:GetSize()
        text value = packs:Get(i)
        if i = 0
            result = result + value
        else
            result = result + value + "."
        end
        i = i + 1
    end
    return result
end
Feature Vector: F1:1, F2:0, F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
Action Identified: could not be identified (no entry in action identification table)
Very close to find, except that F5 is 0, not 1, 2 or 4.
Repo → Quorum3 → SourceCode → Compiler.quorum
4. Line No: 1572-1580
action TypeResolution(SymbolTable table, TypeChecker types, CompilerErrorManager errors)
    Iterator<Class> classes = table:GetClasses()
    repeat while classes:HasNext()
        Class next = classes:Next()
        next:ResolveUseStatements(table, errors)
    end
    classes = table:GetClasses()
    repeat while classes:HasNext()
        Class next = classes:Next()
        //because of the way the parent flattening algorithm works,
        //classes may already be resolved
        if not next:IsResolved()
            next:ResolveAllTypes(table, errors)
        end
    end
    classes = table:GetClasses()
    repeat while classes:HasNext()
        Class next = classes:Next()
        next:ComputeVirtualActionTable(errors)
        next:ComputeGenericsTables(errors)
        types:Add(next)
    end
end
Feature Vector: F1:5, F2:ResolveAllTypes(), F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
Action Identified: not identified as method name not listed in action identification table
Repo → Quorum3 → SourceCode → Compiler.quorum
5. Line No: 1784-1792
private action ResolveQualifiedNames(SymbolTable table, Iterator<QualifiedName> names, HashTable<text, File> parsed, HashTable<text, File> unparsed, Class clazz)
    repeat while names:HasNext()
        QualifiedName qn = names:Next()
        if qn:IsAll()
            ResolveAllClassesInContainer(table, qn, parsed, unparsed, clazz)
        else
            text key = qn:GetStaticKey()
            ResolveClass(table, key, clazz, parsed, unparsed)
        end
    end
end
Feature Vector: F1:4, F2:ResolveClass(), F3:0, F4:1, F5:0, F6:0, F7:0, F8:2
Action Identified: Not identified as method name not listed in action identification table
Xiaoran's answer:
vector: 4, method name, 0,1,0,0,0,2 action: not identified
Repo → Quorum3 → SourceCode → Compiler.quorum
6. Line no: 1834-1841
private action GetStandardLibraryFolderForPackage(text key) returns File
    if standardLibraryFolder = undefined
        File build
        build:SetPath(DEFAULT_STANDARD_LIBRARY_FOLDER)
        standardLibraryFolder = build
    end
    text loc = standardLibraryFolder:GetWorkingDirectory()
    text loc2 = standardLibraryFolder:GetAbsolutePath()
    File file
    file:SetWorkingDirectory(loc2)
    Array<text> values = key:Split("\.")
    text location = "/"
    Iterator<text> it = values:GetIterator()
    repeat while it:HasNext()
        text next = it:Next()
        location = location + next
        if it:HasNext()
            location = location + "/"
        end
    end
    file:SetPath(location)
    if file:Exists()
        return file
    end
    return undefined
end
Feature Vector: F1:1, F2:0, F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
Action Identified: Not identified(no entry in action identification table)
Xiaoran's answer:
vector: 1,0,0,2,0,0,0,2 action: not identified
Repo→ Quorum3 → SourceCode → Action.quorum
7. Line no.- 108-122
action GetCodeCompletionItem(Variable variable) returns CodeCompletionItem
    ...
    text signature = GetStaticKey()
    Iterator<Variable> variables = GetParameterIterator()
    text result = "("
    integer i = 0
    repeat while variables:HasNext()
        Variable param = variables:Next()
        Type type = param:GetType()
        text typeKey = ""
        text key = param:GetStaticKey()
        text name = param:GetName()
        typeKey = GetTypeName(type, variable)
        text value = typeKey + " " + name
        result = result + value
        if variables:HasNext()
            result = result + ", "
        end
        i = i + 1
    end
    result = result + ")"
    Type returnType = GetReturnType()
    //if the return type is void, ignore it. Otherwise, include it
    if not returnType:IsVoid()
        item:rightDisplayText = GetTypeName(returnType, variable)
    end
    ...
end
Feature Vector: F1:1, F2:0, F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
Action Identified: Not identified(no entry in action identification table)
Repo→ Quorum3 → SourceCode → ActionOpcode.quorum
8. Line no.- 163-175
action WriteParameterDebuggingTable(JavaBytecodeLabel start, JavaBytecodeLabel finish)
    //now get all the variables in this block and write the debugging
    //table
    JavaBytecodeMethodWriter writer = GetMethodWriter()
    Iterator<Variable> variables = method:GetParameterIterator()
    repeat while variables:HasNext()
        Variable variable = variables:Next()
        if not variable:IsField() and variable:IsVisibleToDebugger()
            Type type = variable:GetType()
            text description = type:ConvertTypeToBytecodeSignatureInterface()
            text signature = type:ConvertTypeToBytecodeSignatureInterface()
            integer location = variable:GetBytecodeLocation()
            writer:VisitLocalVariable(variable:GetName(), description, signature, start, finish, location)
        end
    end
    writer:VisitLocalVariable("this", clazz:ConvertStaticKeyToBytecodePathTypeName(), clazz:ConvertStaticKeyToBytecodePathTypeName(), start, finish, 0)
end
Feature Vector: F1:5, F2:VisitLocalVariable(), F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
Action Identified: Not identified (method name not listed in action identification model)
Question: Do we handle loop-ifs like this one, where the if block ends with more than one statement?
Answer: We consider the last statement.
Repo→ Quorum3 → SourceCode → ActionOpcode.quorum
9. Line no.- 163-175
action WriteJavaScript returns text
    text result = "this." + method:ConvertActionToJavaScriptName() + " = function " + "("
    Iterator<Variable> parameters = method:GetParameterIterator()
    integer position = 1
    repeat while parameters:HasNext()
        Variable param = parameters:Next()
        result = result + param:GetName()
        if parameters:HasNext()
            result = result + ", "
        end
    end
    result = result + ") {
"
    Iterator<QuorumOpcode> iterator = opcodeList:GetIterator()
    repeat while iterator:HasNext()
        QuorumOpcode opcode = iterator:Next()
        result = result + opcode:WriteJavaScript()
    end
    result = result + "};
"
    return result
end
Feature Vector: F1:1, F2:0, F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
Action Identified: Not identified(no entry in action identification table)
Repo→ Quorum3 → SourceCode → Class.quorum
10. Line no.- 292-299
action GetGenericList returns text
    text value = ""
    i = 0
    repeat GetNumberGenerics() times
        if i = 0
            value = value + GetGeneric(i)
        else
            value = value + ", " + GetGeneric(i)
        end
        i = i + 1
    end
    return value
end
Feature Vector: F1:1, F2:0, F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
Action Identified: Not identified (no entry in action identification table)
Very close to find except that F5 is 0, not 1,2 or 4.
Repo→ Quorum3 → SourceCode → Class.quorum
11. Line no.- 850-870
private action ResolveActions(SymbolTable table, CompilerErrorManager errors)
    Iterator<Action> actionIterator = me:GetActions()
    Array<Action> temp
    repeat while actionIterator:HasNext()
        Action next = actionIterator:Next()
        temp:Add(next)
        //first, we resolve the parameters
        Iterator<Variable> parameters = next:GetParameterIterator()
        repeat while parameters:HasNext()
            Variable param = parameters:Next()
            ResolveParameterType(param, param:GetType(), table, errors)
        end
        Type returnType = next:GetReturnType()
        if not returnType:IsVoid()
            ResolveParameterType(next, returnType, table, errors)
        end
        //next we resolve all of the variables in its blocks
        if next:GetBlock() not= undefined
            ResolveBlock(next:GetBlock(), table, errors)
        end
    end
Question: Do we consider repeat with multiple ifs inside and analyze the last if only?
Answer: We only consider loops with one if and the if is also the last statement in the loop.
Loop discarded
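The selection rule stated in the answer above (exactly one if, which is also the last statement in the loop, and no nested loops) can be written as a small predicate. This is a sketch over a simplified view of the loop body, where each top-level statement is reduced to a kind label; the labels ("assign", "call", "loop", "if", "update") and the allow_trailing_update option are assumptions introduced here. The option reflects the note under sample 15 that a counter update such as i = i + 1 may follow the if.

```python
def is_loop_if(body_kinds, allow_trailing_update=False):
    """Check the loop-if rule on the top-level statement kinds of a loop body.

    body_kinds lists the kinds in source order, e.g. ["assign", "call", "if"].
    Statements nested inside the if block are not part of the list.
    """
    kinds = list(body_kinds)
    # Optionally ignore a final counter update such as "i = i + 1".
    if allow_trailing_update and kinds and kinds[-1] == "update":
        kinds = kinds[:-1]
    if not kinds:
        return False
    has_single_if = kinds.count("if") == 1
    has_nested_loop = "loop" in kinds
    return has_single_if and not has_nested_loop and kinds[-1] == "if"


# Sample 11 above: a nested repeat and two ifs, so the loop is discarded.
print(is_loop_if(["assign", "call", "assign", "loop", "assign", "if", "if"]))
# Sample 12: two declarations followed by a single if.
print(is_loop_if(["assign", "assign", "if"]))
```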
Repo→ Quorum3 → SourceCode → Class.quorum
12. Line no.- 883-889
private action ResolveActions(SymbolTable table, CompilerErrorManager errors)
    Iterator<Action> actionIterator = me:GetActions()
    Array<Action> temp
    ...
    repeat while actionIterator:HasNext()
        Action next = actionIterator:Next()
        CompilerError error = Add(next)
        if error not= undefined
            errors:Add(error)
        end
    end
end
Feature Vector: F1:5, F2:1, F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
Action Identified: get
Need to discuss whether get is the correct action; in my view it should be add, with F2:2.
Repo→ Quorum3 → SourceCode → Class.quorum
13. Line no.- 894-900
action ResolveFrames
    Iterator<Action> actionIterator = me:GetActions()
    repeat while actionIterator:HasNext()
        Action next = actionIterator:Next()
        if not next:IsSystem() and not next:IsBlueprint()
            Block b = next:GetBlock()
            b:AssignBytecodeLocations()
        end
    end
    if constructor not= undefined
        Block b = constructor:GetBlock()
        b:AssignBytecodeLocations()
    end
end
Feature Vector: F1:5, F2:AssignBytecodeLocations(), F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
Action Identified: not identified
Question: What about multiple ending statements in the if block?
Answer: We consider only the last statement.
Repo→ Quorum3 → SourceCode → Class.quorum
14. Line no.- 1307-1314
action ResolveParents(SymbolTable table, CompilerErrorManager errors)
    isResolvingParents = true
    Iterator<QualifiedName> names = GetUnresolvedParents()
    repeat while names:HasNext()
        QualifiedName name = names:Next()
        Class myParent = GetClass(name:GetStaticKey(), table, errors)
        if myParent not= undefined
            //if it is defined, add it as a valid parent
            parents:Add(myParent:GetStaticKey(), myParent)
        end
        //if it is not, ignore it, as an error has already been issued
    end
    ...
end
Feature Vector: F1:5, F2:1, F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
Action Identified: get
Need to discuss whether get is the correct action; in my view it should be add, with F2:2.
Xiaoran's answer:
vector: 5,1,0,2,0,0,0,2 action: get
Repo→ Quorum3 → SourceCode → Class.quorum
15. Line no.- 1789-1802
Array<Action> finalMatches
integer minPoints = 0
minPoints = minPoints:GetMaximumValue()
i = 0
repeat matches:GetSize() times
    Action act = matches:Get(i)
    integer points = scores:Get(i)
    if points < minPoints
        minPoints = points
        finalMatches:Empty()
        finalMatches:Add(act)
    elseif points = minPoints
        finalMatches:Add(act)
    end
    i = i + 1
end
Feature Vector: F1:5, F2:2, F3:0, F4:2, F5:0, F6:0, F7:0, F8:1 (as we consider only the if block and its last statement)
Action Identified: add
Xiaoran's answer:
vector: 5,1,0,2,0,0,0,1 action: get
Note: the last statement is i=i+1, not an if. It seems that this is the syntax of loop updater in Quorum.
Repo→ Quorum3 → SourceCode → ClassOpcode.quorum
16. Line no.- 188-204
Iterator<ActionOpcode> actionIterator = actions:GetIterator()
repeat while actionIterator:HasNext()
    ActionOpcode act = actionIterator:Next()
    Action method = act:GetAction()
    act:Write()
    if not clazz:IsError()
        //write to the interface
        text name = method:GetName()
        text params = method:ConvertActionToBytecodeParameterInterfaceSignature()
        JavaBytecodeMethodWriter interfaceMethodWriter = interfaceWriter:VisitMethod(opcodes:GetPublic() + opcodes:GetAbstract(), name, params, null, undefined)
        interfaceMethodWriter:VisitEnd()
    end
end
Feature Vector: F1:5, F2:VisitEnd(), F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
Action Identified: cannot be determined as method name not listed in action identification table
Repo→ Quorum3 → SourceCode → ClassOpcode.quorum
17. Line no.- 207-213
Iterator<Action> it = clazz:GetParentActions()
repeat while it:HasNext()
    Action act = it:Next()
    Class parentOfMethod = act:GetParentClass()
    if not parentOfMethod:IsError() or clazz:GetStaticKey() = QUORUM_LIBRARY_LANGUAGE_ERRORS_ERROR
        WriteParentActionBytecode(act)
    end
end
Feature Vector: F1:4, F2:WriteParentActionBytecode(), F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
Action Identified: cannot be determined as method name not listed in action identification table
Repo→ Quorum3 → SourceCode → ClassOpcode.quorum
18. Line no.- 229-263
private action WriteParentGetActions
    Iterator<Class> parents = clazz:GetParentClasses()
    text classBytecodeName = converter:ConvertStaticKeyToBytecodePath(staticKey)
    //initialize all of the parent objects as fields
    repeat while parents:HasNext()
        Class p = parents:Next()
        text parentKey = p:GetStaticKey()
        if parentKey not= QUORUM_LIBRARY_LANGUAGE_ERRORS_ERROR
            text parentName = p:ConvertStaticKeyToParentFieldName()
            text converted = p:ConvertStaticKeyToBytecodePath()
            text convertedParentNameType = p:ConvertStaticKeyToBytecodePathTypeName()
            text parentActionName = p:ConvertStaticKeyToParentActionName()
            text null = undefined
            JavaBytecodeMethodWriter parentWriter = classWriter:VisitMethod(opcodes:GetPublic(), parentActionName, "()" + p:ConvertStaticKeyToBytecodePathTypeName(), null, undefined)
            parentWriter:VisitCode()
            //load the ME pointer
            parentWriter:VisitVariable(opcodes:GetObjectLoad(), ME)
            //load the parent variable
            parentWriter:VisitField(opcodes:GetField(), clazz:ConvertStaticKeyToBytecodePath(), p:ConvertStaticKeyToParentFieldName(), p:ConvertStaticKeyToBytecodePathTypeName())
            //return the parent variable
            parentWriter:VisitInstruction(opcodes:GetObjectReturn())
            parentWriter:VisitMaxSize(0,0)
            parentWriter:VisitEnd()
            JavaBytecodeMethodWriter parentWriterInterface = interfaceWriter:VisitMethod(opcodes:GetPublic()+ opcodes:GetAbstract(), parentActionName, "()" + p:ConvertStaticKeyToBytecodePathTypeName(), null, undefined)
            parentWriterInterface:VisitEnd()
        end
    end
Feature Vector: F1:5, F2:VisitEnd(), F3:0, F4:0, F5:0, F6:0, F7:0, F8:2
Action Identified: cannot be determined as method name not listed in action identification table
Repo→ Quorum3 → SourceCode → ClassOpcode.quorum
19. Line no.- 674-697
private action WriteParents(JavaBytecodeMethodWriter methodWriter)
    Iterator<Class> parents = clazz:GetParentClasses()
    text classBytecodeName = converter:ConvertStaticKeyToBytecodePath(staticKey)
    //initialize all of the parent objects as fields
    repeat while parents:HasNext()
        Class p = parents:Next()
        text parentKey = p:GetStaticKey()
        if parentKey not= QUORUM_LIBRARY_LANGUAGE_ERRORS_ERROR
            text parentName = p:ConvertStaticKeyToParentFieldName()
            text converted = p:ConvertStaticKeyToBytecodePath()
            text convertedParentNameType = p:ConvertStaticKeyToBytecodePathTypeName()
            methodWriter:VisitVariable(opcodes:GetObjectLoad(), ME)
            methodWriter:VisitType(opcodes:GetNew(), converted)
            methodWriter:VisitInstruction(opcodes:GetDuplicate())
            //push false on the parents and call a separate constructor
            methodWriter:VisitInstruction(opcodes:GetIntegerZero())
            methodWriter:VisitMethodInvoke(opcodes:GetInvokeSpecial(), converted, CONSTRUCTOR_JAVA_NAME, "(Z)V", false)
            methodWriter:VisitField(opcodes:GetPutField(), classBytecodeName, parentName, convertedParentNameType)
            text convertedHiddenType = p:ConvertStaticKeyToBytecodePathTypeNameInterface()
            methodWriter:VisitVariable(opcodes:GetObjectLoad(), ME)
            methodWriter:VisitField(opcodes:GetGetField(), clazz:ConvertStaticKeyToBytecodePath(), parentName, p:ConvertStaticKeyToBytecodePathTypeName())
            methodWriter:VisitVariable(opcodes:GetObjectLoad(), ME)
            methodWriter:VisitField(opcodes:GetPutField(), converted, p:GetHiddenVariableName(), convertedHiddenType)
        end
    end
Feature Vector: F1:5, F2: , F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
Action Identified: cannot be determined
Question: Do we deal with nested method calls, such as a method call passed as an argument of another method call? Which method name should we select in this case?
Answer: If the parameter of a method invocation is another method invocation, we consider only the outermost method call. The verb is “visit” in this particular example.
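The "outermost call" rule in the answer above can be illustrated with a small text heuristic: take the first call that appears in the statement, then take the first word of its camel-case name as the verb. This is a sketch on the statement text only, under the assumption that the first call textually is the outermost one; a real implementation would read the call from the AST. The function names are hypothetical.

```python
import re

# An optional "receiver:" prefix, then the method name, then an opening parenthesis.
_CALL = re.compile(r"(?:\w+:)?(\w+)\s*\(")
# The first word of a camel-case name, e.g. "Visit" in "VisitField".
_FIRST_WORD = re.compile(r"[A-Z]?[a-z]+")


def outermost_call_name(statement):
    """Name of the first (outermost) method call in a statement, or None."""
    match = _CALL.search(statement)
    return match.group(1) if match else None


def leading_verb(method_name):
    """Lower-cased first word of a camel-case method name, or None."""
    match = _FIRST_WORD.match(method_name)
    return match.group(0).lower() if match else None


statement = ("methodWriter:VisitField(opcodes:GetPutField(), converted, "
             "p:GetHiddenVariableName(), convertedHiddenType)")
name = outermost_call_name(statement)
print(name, leading_verb(name))
```

For the last statement of sample 19 this yields VisitField and the verb "visit", which agrees with the answer above.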
Repo→ Quorum3 → SourceCode → CompilerProfiler.quorum
20. Line no.- 85-97
action OutputTimes
    i = 0
    number start = 0
    number previous = 0
    repeat timing:GetSize() times
        number value = timing:Get(i)
        if i = 0
            output START_TIME_NAME + ": 0"
            start = value
            previous = start
        else
            name = phaseHash:GetValue(i)
            output name + ": " + (value - previous)
            previous = value
        end
        i = i + 1
    end
end
Feature Vector: F1:1, F2:0, F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
Action Identified: not identified as corresponding entry not found in action identification table
Repo→ Quorum3 → SourceCode → QualifiedName.quorum
21. Line no.- 61-68
action GetStaticKey returns text
    key = ""
    i = 0
    repeat GetSize() times
        key = key + names:Get(i)
        if i < GetSize() - 1
            key = key + "."
        end
        i = i + 1
    end
    return key
end
Feature Vector: F1:1, F2:0, F3:0, F4:0, F5:0, F6:0, F7:0, F8:1
Action Identified: not identified
Xiaoran's answer:
vector: 1,0,0,2,0,0,0,1 action: not identified. This is very close to the action find, except that F5 is 0, not 1, 2 or 4.
Repo→ Quorum3 → SourceCode → QuorumByteCodeListener.quorum
22. Line no.- 1962-1974
private action ComputeFinalParameters(Array<QuorumOpcode> parameterOpcodes, Array<QuorumOpcode> conversions) returns Array<QuorumOpcode>
    //handle any casts, if there are any
    Array<QuorumOpcode> finalResolvedParameters
    integer next = 0
    repeat parameterOpcodes:GetSize() times
        QuorumOpcode previous = parameterOpcodes:Get(next)
        QuorumOpcode possibleCast = conversions:Get(next)
        if possibleCast not= undefined
            ExplicitCastOpcode caster = cast(ExplicitCastOpcode, possibleCast)
            caster:SetMethodWriter(methodWriter)
            caster:SetOpcodeToCast(previous)
            finalResolvedParameters:Add(caster)
        else
            //nothing to do with this opcode, just push it
            finalResolvedParameters:Add(previous)
        end
        next = next + 1
    end
    return finalResolvedParameters
end
Feature Vector: F1:5, F2:2, F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
Action Identified: add
Xiaoran's answer:
vector: 5,1,0,2,0,0,0,2 action: get
Completed looking for all loop-ifs in Repo→ Quorum3 → SourceCode →
Repo→ Quorum3 → Library → Standard → Libraries→ Containers → Array.quorum
23. Line no.- 470-476
action Has(Type value) returns boolean
    integer count = 0
    integer size = GetSize()
    repeat while count < size
        Type item = Get(count)
        if value:Equals(item)
            return true
        end
        count = count + 1
    end
    return false
end
Feature Vector: F1:0, F2:0, F3:0, F4:0, F5:3, F6:0, F7:0, F8:2
Action Identified: determine
Repo→ Quorum3 → Library → Standard → Libraries→ Containers → Array.quorum
24. Line no.- 500-507
action Remove(Type value) returns boolean
    integer count = 0
    integer size = GetSize()
    repeat while count < size
        Type item = Get(count)
        if value:Equals(item)
            RemoveAt(count)
            return true
        end
        count = count + 1
    end
    return false
end
Feature Vector: F1:4, F2:5, F3:0, F4:1, F5:3, F6:0, F7:0, F8:2
Action Identified: could not be identified
Note: This action should have been very easily identified as remove; need to discuss.
Xiaoran's answer:
vector: 4,5,0,1,3,0,0,2 action: not identified
* We do not recognize cases where the last statement is a method invocation.
Repo→ Quorum3 → Library → Standard → Libraries→ Containers → Array.quorum
25. Line no.- 532-541
action RemoveAll(Type value) returns boolean
    boolean hasRemoved = false
    integer count = 0
    integer size = GetSize()
    repeat while count < size
        Type item = Get(count)
        if value:Equals(item)
            RemoveAt(count)
            hasRemoved = true
            size = size - 1
        else
            count = count + 1
        end
    end
    return hasRemoved
end
Feature Vector: F1:1, F2:0, F3:0, F4:1, F5:0, F6:0, F7:0, F8:2
Action Identified: could not be identified (Should have been identified as remove)
Repo→ Quorum3 → Library → Standard → Libraries→ Containers → HashTable.quorum
26. Line no.- 82-87
action GetValue(Key key) returns Value
    integer index = ComputeIndex(key)
    HashNode<Key, Value> node = array:Get(index)
    if node = undefined or key = undefined
        return undefined
    end
    if key:Equals(node:key)
        return node:value
    end
    HashNode<Key, Value> temp = node:next
    repeat while temp not= undefined
        if key:Equals(temp:key)
            return temp:value
        end
        temp = temp:next
    end
    return undefined
end
Feature Vector: F1:0, F2:0, F3:0, F4:1, F5:4, F6:0, F7:0, F8:2
Action Identified: find
Repo→ Quorum3 → Library → Standard → Libraries→ Containers → HashTable.quorum
27. Line no.- 490-495
action HasKey(Key key) returns boolean
    integer index = ComputeIndex(key)
    HashNode<Key, Value> node = array:Get(index)
    if node = undefined or key = undefined
        return false
    end
    if key:Equals(node:key)
        return true
    end
    HashNode<Key, Value> temp = node:next
    repeat while temp not= undefined
        if key:Equals(temp:key)
            return true
        end
        temp = temp:next
    end
    return false
end
Feature Vector: F1:0, F2:0, F3:0, F4:0, F5:3, F6:0, F7:0, F8:2
Action Identified: determine
Xiaoran's answer:
vector: 0,0,0,0,3,0,0,2 action: determine
Repo→ Quorum3 → Library → Standard → Libraries→ Containers → List.quorum
28. Line no.- 254-263
action GetFirstLocation(Type value) returns integer
    boolean found = false
    integer i = 0
    ListNode<Type> current = head
    repeat while current not= undefined
        Type type = current:value
        if value:Equals(type)
            current = undefined
            found = true
        else
            current = current:next
            i = i + 1
        end
    end
    if found
        return i
    else
        return invalidIndex
    end
end
Feature Vector: F1:6, F2:0, F3:0, F4:0, F5:0, F6:0, F7:0, F8:2
Action Identified: determine
Repo→ Quorum3 → Library → Standard → Libraries→ Containers → List.quorum
29. Line no.- 292-301
action GetLastLocation(Type value) returns integer
    boolean found = false
    integer i = size - 1
    ListNode<Type> current = tail
    repeat while current not= undefined
        Type type = current:value
        if value:Equals(type)
            current = undefined
            found = true
        else
            current = current:previous
            i = i - 1
        end
    end
    if found
        return i
    else
        return invalidIndex
    end
end
Feature Vector: F1:6, F2:0, F3:0, F4:0, F5:0, F6:0, F7:0, F8:2
Action Identified: determine (Same as last one)
Repo→ Quorum3 → Library → Standard → Libraries→ Containers → List.quorum
30. Line no.- 405-411
action Has(Type value) returns boolean
    ListNode<Type> current = head
    repeat while current not= undefined
        Type type = current:value
        if value:Equals(type)
            return true
        end
        current = current:next
    end
    return false
end
Feature Vector: F1:0, F2:0, F3:0, F4:0, F5:3, F6:0, F7:0, F8:2
Action Identified: determine
Repo→ Quorum3 → Library → Standard → Libraries→ Containers → List.quorum
31. Line no.- 448-455
action Remove(Type value) returns boolean
    ListNode<Type> current = head
    repeat while current not= undefined
        Type type = current:value
        if value:Equals(type)
            RemoveNode(current)
            return true
        end
        current = current:next
    end
    return false
end
Feature Vector: F1:5, F2:5, F3:0, F4:1, F5:3, F6:0, F7:0, F8:2
Action Identified: not identified
Should have been identified as remove
Same for the following code:
action RemoveAll(Type value) returns boolean
    ListNode<Type> current = head
    boolean wasRemoved = false
    repeat while current not= undefined
        Type type = current:value
        if value:Equals(type)
            RemoveNode(current)
            wasRemoved = true
        end
        current = current:next
    end
    return wasRemoved
end
Repo→ Quorum3 → Library → Standard → Libraries→ Containers → List.quorum
32. Line no.- 532-543
private action GetFirstNode(Type value) returns ListNode<Type>
    boolean found = false
    integer i = 0
    ListNode<Type> current = head
    ListNode<Type> result = undefined
    repeat while current not= undefined
        Type type = current:value
        integer hash = type:GetHashCode()
        if value:GetHashCode() = hash
            result = current
            current = undefined
            found = true
        else
            current = current:next
            i = i + 1
        end
    end
    if found
        return result
    else
        return undefined
    end
end
Feature Vector: F1:6, F2:0, F3:0, F4:0, F5:0, F6:0, F7:0, F8:2
Action Identified: determine
Repo→ Quorum3 → Library → Standard → Libraries→ Containers → Support → HashTableIterator.quorum
33. Line no.- 34-41
private action GoToNextNode
    if node not= undefined
        node = node:next
    end
    if node not= undefined
        return now //we've found the next one
    else
        position = position + 1
    end
    //the current one still needs to be found, it's currently undefined
    repeat while position < array:GetSize()
        HashNode<Key, Value> newNode = array:Get(position)
        if newNode not= undefined
            node = newNode
            return now //we're finished searching
        end
        position = position + 1
    end
end
Feature Vector: F1:1, F2:0, F3:0, F4:2, F5:4, F6:0, F7:0, F8:2
Action Identified: find
Xiaoran's answer:
vector: 1,0,0,2,4,0,0,2 action: find
Repo→ Quorum3 → Library → Standard → Libraries→ Containers → Table.quorum
34. Line no.- 595-602
action Has(Type value) returns boolean
    integer count = 0
    integer size = table:GetSize()
    repeat while count < size
        Array<Type> item = table:Get(count)
        if item:Has(value)
            return true
        end
        count = count + 1
    end
    return false
end
Feature Vector: F1:0, F2:0, F3:0, F4:0, F5:3, F6:0, F7:0, F8:2
Action Identified: determine
Repo→ Quorum3 → Library → Standard → Libraries→ Containers → Table.quorum
35. Line no.- 625-632
action Remove(Type value) returns boolean
    integer count = 0
    integer size = table:GetSize()
    repeat while count < size
        Array<Type> item = table:Get(count)
        boolean removedItem = item:Remove(value)
        if removedItem
            return true
        end
        count = count + 1
    end
    return false
end
Feature Vector: F1:0, F2:0, F3:0, F4:0, F5:3, F6:0, F7:0, F8:2
Action Identified: determine
Xiaoran's answer:
vector: 0,0,0,0, 3,0,0,2 action: determine
*The action is identified based on the model table. However, reading the code shows that the identified action is not correct. The model cannot get this action right because such situations do not occur often in Java. They may not occur often in Quorum either.
Repo→ Quorum3 → Library → Standard → Libraries→ Data → Formats → DocumentTypeDefinition.quorum
36. Line no.- 161-167
public action GetEntityValue(text name) returns text
    Iterator<DocumentTypeDefinitionEntity> els = entities:GetIterator()
    DocumentTypeDefinitionEntity e
    repeat while els:HasNext()
        e = els:Next()
        entityName = e:GetName()
        if entityName:Equals(name)
            return e:GetValue()
        end
    end
    return name
end
Feature Vector: F1:5, F2:GetValue(), F3:0, F4:2, F5:4, F6:0, F7:0, F8:2
Action Identified: not identified, because the method name is not listed in the action identification table.
Repo→ Quorum3 → Library → Standard → Libraries→ Games → Interface → Panel.quorum
37. Line no.- 110-116
action ItemAtPoint(number x, number y) returns Item
    integer index = items:GetSize() - 1
    Item temp = undefined
    repeat items:GetSize() times
        temp = items:Get(index)
        if temp:GetX() <= x and temp:GetX() + temp:GetWidth() >= x and temp:GetY() <= y and temp:GetY() + temp:GetHeight() >= y
            return temp
        end
        index = index - 1
    end
    return undefined
end
Feature Vector: F1:0, F2:0, F3:0, F4:2, F5:4, F6:0, F7:1, F8:1
Action Identified: find
Repo→ Quorum3 → Library → Standard → Libraries→ Games → Keys.quorum
38. Line no.- 494-499
action InitializeKeyNames
    integer index = 0
    repeat while index < 256
        text name = ToText(index)
        if name not= undefined
            keyNames:put(index, index)
        end
    end
end
Feature Vector: F1:5, F2:3, F3:0, F4:1, F5:0, F6:0, F7:0, F8:2
Action Identified: not identified
Should be identified as something like add
Xiaoran's answer:
vector: 5,3,0,1,0,0,0,2 action: not identified
Question to ask Xiaoran: do we consider loops with both if and else, or only loops with a single if (which is also the last lexical statement of the block)? If we consider loops with if and else, which ending statement do we consider: the if's or the else's?
Answer: We consider loop-ifs with both if and else blocks, but to determine the action, we only consider the last statement in the if block.
Repo→ Quorum3 → Library → Standard → Libraries→ Language → Types → Text.quorum
39. Line no.- 580-619
Complex example: we can only take the last statement of the if block, so our model might not work well here.
action Split(text delimiter) returns Array<text>
    // TODO: Rewrite this. it is horrid. There are much more clever ways to do this.
    Array<text> results
    integer pos = 0
    integer length = me:GetSize()
    Text delim
    delim:SetValue(delimiter)
    integer delimLength = delim:GetSize()
    text newString = ""
    boolean afterDelim = false
    repeat while pos < length
        // If this doesn't start the delimiter, add it to the new string.
        if me:GetCharacter(pos) not= delim:GetCharacter(0)
            newString = newString + me:GetCharacter(pos)
            pos = pos + 1
            afterDelim = false
        else
            // Matches start of delimiter. Keep track and bail if we don't match the delimiter
            text tmpString = ""
            integer delimPos = 0
            text currentChar = me:GetCharacter(pos)
            text currentDelimChar = delim:GetCharacter(0)
            // As long as the values read continue to match the delimiter...
            repeat while delimPos < delimLength and currentChar = currentDelimChar and pos < length
                tmpString = tmpString + me:GetCharacter(pos)
                delimPos = delimPos + 1 // keep going through delimiter
                pos = pos + 1 // and keep moving ahead in main string
                if pos < length
                    currentChar = me:GetCharacter(pos)
                end
                if delimPos < delimLength
                    currentDelimChar = delim:GetCharacter(delimPos)
                end
            end
            if delimPos not= delimLength
                // We didn't reach the end of the delimiter, so add this temporary string and keep moving.
                tmpString = tmpString + newString
                afterDelim = false
            else
                // Delimiter hit. Store the result.
                results:Add(newString)
                newString = ""
                afterDelim = true
            end
        end
    end
    ...
end
Completed searching for loop-ifs up to Repo→ Quorum3 → Library → Standard → Libraries
Sent 10 loop-ifs to Xiaoran for evaluation.
Manual Loop Extraction Report
1. Identified 37 loop-ifs in the Quorum language repository.
2. Actions were identified for 17 loop-ifs by mapping their feature vectors to the existing action identification model.
3. Details of the actions identified: find (4 loop-ifs; sample loop nos. 1, 26, 33, 37), get (5 loop-ifs; sample loop nos. 2, 12, 14, 15, 22), and determine (8 loop-ifs; sample loop nos. 23, 27, 28, 29, 30, 32, 34, 35). Only these 3 actions were identified.
4. 20 loop-ifs could not be identified: 11 had no corresponding entry in the action identification model, and 9 (sample loop nos. 4, 5, 8, 13, 16, 17, 18, 19, 36) had a method call as the last statement of the if block whose method name is not listed in the action identification model.
5. Of the 11 loop-ifs with no corresponding entry in the action identification model, a few came very close to being identified as find (sample loop nos. 3, 10, 21).
6. Of the 17 loop-ifs with an identified action, one was identified incorrectly (loop no. 35); it should have been remove.
7. One issue that I think is very important: all the loop-ifs that should be identified as add are being identified as get (sample loop nos. 2, 12, 14, 15, 22). This is because, according to Xiaoran, these loops should have F2=1, whereas I thought it should be F2=2. With F2=2, the feature vector corresponds to the correct action (add) in the action identification model.
Spoke to Xiaoran; he says it is just the label, and get actually means add. (Previously get and add were named add1 and add2; later, to avoid confusion, one was relabelled get.)
Not sure how to handle this. Should I just treat those as get and move on? I need your input. Please let me know if you need further details about the manual extraction process.
Note: Loop no.s sent to Xiaoran for evaluation - 5,6,14,15,21,22,24,27,33,35,38. All details of his comments and mine are documented above for reference.
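The mapping step in the report above (feature vector to action) can be pictured as a table lookup. The Java sketch below is only an illustration under stated assumptions: the class and method names (ActionLookup, key, identify) are hypothetical, and the table holds only the five vector/action pairs recorded in these notes, not Xiaoran's full action identification model. It also treats every feature as an integer, so vectors whose F2 is a method name (as in loop no. 36) are outside its scope.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class ActionLookup {
    // Hypothetical excerpt of the model: only pairs recorded in these notes.
    private static final Map<String, String> MODEL = new HashMap<>();
    static {
        MODEL.put(key(0, 0, 0, 1, 4, 0, 0, 2), "find");      // loop no. 26
        MODEL.put(key(0, 0, 0, 0, 3, 0, 0, 2), "determine"); // loop no. 27
        MODEL.put(key(6, 0, 0, 0, 0, 0, 0, 2), "determine"); // loop no. 28
        MODEL.put(key(1, 0, 0, 2, 4, 0, 0, 2), "find");      // loop no. 33
        MODEL.put(key(0, 0, 0, 2, 4, 0, 1, 1), "find");      // loop no. 37
    }

    // Joins F1..F8 into a string key such as "0,0,0,1,4,0,0,2".
    static String key(int... features) {
        return Arrays.stream(features)
                .mapToObj(Integer::toString)
                .collect(Collectors.joining(","));
    }

    // Returns the action for a feature vector, or "not identified"
    // when the table has no matching entry.
    static String identify(int... features) {
        if (features.length != 8) {
            throw new IllegalArgumentException("expected 8 features F1..F8");
        }
        return MODEL.getOrDefault(key(features), "not identified");
    }

    public static void main(String[] args) {
        System.out.println(identify(0, 0, 0, 0, 3, 0, 0, 2)); // HasKey, loop no. 27
        System.out.println(identify(5, 3, 0, 1, 0, 0, 0, 2)); // InitializeKeyNames, loop no. 38
    }
}
```

A string key keeps the lookup exact: a vector either matches a table row or falls through to "not identified", which mirrors how the manual extraction treated vectors with no corresponding entry.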
COMPILER AND AST
Reply of Dr. Stefik on Quorum Compiler in a private list(Google group):
The compiler has changed a lot in the last year and we haven't posted documentation on how to use it yet, largely because our official release date isn't until June 30th (12 days).
That said, since this is a private list, here's the easiest way to build it (I am writing the instructions soon anyway):
1. Download NetBeans 8.02 or later from Oracle and the latest of JDK 8
2. Go to Tools → Plugins. Under settings, add the following as an update center:
http://quorumlanguage.com/updates/quorum/updates.xml
3. This will give you access to two plugins, Quorum 3 and Accessibility.
4. Restart NetBeans and you should be able to open/create Quorum projects.
The compiler is in a separate project, not the standard library, but can be found here: https://bitbucket.org/stefika/quorum-language under the folder Quorum3.
5. If you grabbed the compiler, you should be able to click Build and it will work as expected.
Now, if you have the compiler, you can use the Compiler class to build source, or you can make your own QuorumSourceListener subclass and then do what you want in processing source. Finally, if you just can't get any of that to work, you can always copy the Antlr grammar for Quorum directly and then tweak it to your heart's content. That is located here:
https://bitbucket.org/stefika/quorum-language/src/269c256f6cd4f8b85d1c850d0f3098a10335fe62/plugins/ParserPlugin/src/plugins/quorum/Libraries/Language/Compile/Quorum.g4?at=master
HTH, Stefik
My updates on compiler and AST:
Successfully followed steps 1-5.
The Compiler class is located at /Users/preethac/Desktop/Research/Summer Project 2015/Quorum_Repo_New/Quorum3/Build/quorum/Libraries/Language/ and is not visible from Sodbeans (the IDE for Quorum). I do not yet know how to build source with it.
The class QuorumSourceListener is defined in the file "QuorumSourceListener.quorum".
There are also no clear instructions on how to run Quorum from the command line (I tried installing it but failed), so for now I am running the whole compiler project from Sodbeans to figure out the flow (working on it now). The Quorum3 folder is the whole compiler project.
Preethas-MacBook-Pro:SourceCode preethac$ grep "class QuorumSourceListener" *.quorum
QuorumSourceListener.quorum:class QuorumSourceListener
Preethas-MacBook-Pro:SourceCode preethac$ grep QuorumSourceListener *.quorum
Compiler.quorum: private action ParseSandbox(text source, QuorumSourceListener listener, SymbolTable table, CompilerErrorManager errors,
Compiler.quorum: action Parse(File file, QuorumSourceListener listener)
Compiler.quorum: private system action ParseNative(File file, QuorumSourceListener listener)
Compiler.quorum: action Parse(text source, QuorumSourceListener listener)
Compiler.quorum: private system action ParseNative(text source, QuorumSourceListener listener)
Parser.quorum:class Parser is QuorumSourceListener
QuorumBytecodeListener.quorum:class QuorumBytecodeListener is QuorumSourceListener
QuorumJavascriptListener.quorum:class QuorumJavascriptListener is QuorumSourceListener
QuorumSourceListener.quorum:class QuorumSourceListener
TypeCheckListener.quorum:class TypeCheckListener is QuorumSourceListener
Examining pieces of code that I think are useful:
Compiler.quorum(line no-2121-2139) at https://bitbucket.org/stefika/quorum-language/src/caf90e22e4f17863ee3d4a6a0aa9a2f669856bef/Quorum3/SourceCode/Compiler.quorum?at=master:
action Parse(File file, QuorumSourceListener listener)
    listener:SetSymbolTable(symbolTable)
    listener:SetCompilerErrorManager(compilerErrorManager)
    listener:SetFile(file)
    listener:SetTypeChecker(checker)
    ParseNative(file, listener)
end

private system action ParseNative(File file, QuorumSourceListener listener)

action Parse(text source, QuorumSourceListener listener)
    listener:SetSymbolTable(symbolTable)
    listener:SetCompilerErrorManager(compilerErrorManager)
    listener:SetTypeChecker(checker)
    ParseNative(source, listener)
end

private system action ParseNative(text source, QuorumSourceListener listener)
QuorumSourceListener.quorum at https://bitbucket.org/stefika/quorum-language/src/caf90e22e4f17863ee3d4a6a0aa9a2f669856bef/Quorum3/SourceCode/QuorumSourceListener.quorum?at=master
Parser.quorum at https://bitbucket.org/stefika/quorum-language/src/caf90e22e4f17863ee3d4a6a0aa9a2f669856bef/Quorum3/SourceCode/Parser.quorum?at=master
Parser related .class files at : /Users/preethac/Desktop/Research/Summer Project 2015/Quorum_Repo_New/Quorum3/Library/Standard/Plugins/org/antlr/v4 (for my reference)
or, in general at: https://bitbucket.org/stefika/quorum-language/src/caf90e22e4f17863ee3d4a6a0aa9a2f669856bef/Quorum3/Library/Standard/Plugins/org/antlr/v4/runtime/?at=master
Antlr grammar for Quorum :
https://bitbucket.org/stefika/quorum-language/src/269c256f6cd4f8b85d1c850d0f3098a10335fe62/plugins/ParserPlugin/src/plugins/quorum/Libraries/Language/Compile/Quorum.g4?at=master
When I tried running this grammar in ANTLR, I got the following errors (I have not been able to figure them out yet):
Preethas-MacBook-Pro:tmp preethac$ antlr4 Quorum.g4
warning(155): Quorum.g4:317:32: rule NEWLINE contains a lexer command with an unrecognized constant value; lexer interpreter may produce incorrect output
warning(155): Quorum.g4:318:38: rule WS contains a lexer command with an unrecognized constant value; lexer interpreters may produce incorrect output
warning(155): Quorum.g4:322:34: rule COMMENTS contains a lexer command with an unrecognized constant value; lexer interpreters may produce incorrect output
Preethas-MacBook-Pro:tmp preethac$ javac Quorum*.java
QuorumParser.java:1002: error: package quorum.Libraries.Language.Compile.Context does not exist
    public quorum.Libraries.Language.Compile.Context.ActionContext actionContext;
    ^
QuorumParser.java:1150: error: package quorum.Libraries.Language.Compile does not exist
    public quorum.Libraries.Language.Compile.QualifiedName qualifiedName;
    ^
QuorumParser.java:2242: error: package quorum.Libraries.Language.Compile.Symbol does not exist
    public quorum.Libraries.Language.Compile.Symbol.Type type;
    ^
3 errors

These errors result from the following lines of code in the grammar file.
Line no:63-66
method_shared returns [quorum.Libraries.Language.Compile.Context.ActionContext actionContext]
    : ACTION ID (LEFT_PAREN (formal_parameter (COMMA formal_parameter)*)? RIGHT_PAREN)? (RETURNS return_type = assignment_declaration )?
    ;
Line no:72-76
qualified_name returns [quorum.Libraries.Language.Compile.QualifiedName qualifiedName]
    : ids+=ID (PERIOD ids+=ID)*
    ;
block : statement* ;
Line no:150-156
assignment_declaration returns [quorum.Libraries.Language.Compile.Symbol.Type type]
    : qualified_name generic_statement? #GenericAssignmentDeclaration
    | INTEGER_KEYWORD #IntegerAssignmentDeclaration
    | NUMBER_KEYWORD #NumberAssignmentDeclaration
    | TEXT #TextAssignmentDeclaration
    | BOOLEAN_KEYWORD #BooleanAssignmentDeclaration
    ;
Directories to be searched for the missing packages:
1. Desktop/Research/Summer Project 2015/Quorum_Latest/quorum/plugins/quorum/Libraries/Language/Compile/
Tried running antlr4 from the above path; it shows the same error. Copied all directories of "quorum/Libraries/Language/Compile/..." to a local directory, with Quorum.g4 in the same directory; it still gives the same error.
2. Desktop/Research/Summer Project 2015/Quorum_Latest/plugins/ParserPlugin/src/plugins/quorum/Libraries/Language/Compile
Same results as above.
What does returns mean in an ANTLR grammar?
Reference: http://bkiers.blogspot.com/2011/03/2-introduction-to-antlr.html
<rule> returns [<datatype> <variablename>] : ... ;
Instead of using temporary storage to share data between event methods (as in listener/visitor implementations), we can store those values in the parse tree itself.
The grammar temporarily works if the following changes are made in the grammar file (but this is supposedly not correct):
method_shared returns [quorum.Libraries.Language.Compile.Context.ActionContext actionContext]
changed to:
method_shared returns [ActionContext actionContext]

qualified_name returns [quorum.Libraries.Language.Compile.QualifiedName qualifiedName]
changed to:
qualified_name returns [String qualifiedName] (as QualifiedName qualifiedName doesn't work)

assignment_declaration returns [quorum.Libraries.Language.Compile.Symbol.Type type]
changed to:
assignment_declaration returns [String type] (as Type type doesn't work)

Currently experimenting on /Users/preethac/Desktop/Research/Summer Project 2015/Antlr/quorum/plugins/Quorum.g4

Preethas-MacBook-Pro:plugins preethac$ grun Quorum statement -tree
if a > 100
    c = 1
elseif b = 100
    c = 2
else
    c = 3
end
(statement (if_statement if (expression (expression (action_call a)) > (expression 100)) (block (statement (assignment_statement c = (expression 1)))) elseif (expression (expression (action_call b)) = (expression 100)) (block (statement (assignment_statement c = (expression 2)))) else (block (statement (assignment_statement c = (expression 3)))) end))

Preethas-MacBook-Pro:plugins preethac$ grun Quorum loop_statement -tree
repeat until a < 15
    a = a + 1
end
(loop_statement repeat until (expression (expression (action_call a)) < (expression 15)) (block (statement (assignment_statement a = (expression (expression (action_call a)) + (expression 1))))) end)

Preethas-MacBook-Pro:plugins preethac$ grun Quorum statement -tree
repeat while index < 256
    text name = ToText(index)
    if name not= undefined
        keyNames:put(index, index)
    end
end
(statement (loop_statement repeat while (expression (expression (action_call index)) < (expression 256)) (block (statement (assignment_statement (assignment_declaration text) name = (expression (action_call ToText ( (function_expression_list (expression (action_call index))) ))))) (statement (if_statement if (expression (expression (action_call name)) not= (expression undefined)) (block (statement (solo_method_call (qualified_name keyNames) : put ( (expression (action_call index)) , (expression (action_call index)) )))) end))) end))
Running Quorum from Console
/Users/preethac/Desktop/Research/Summer Project 2015/Quorum_Console/Quorum3
Preethas-MacBook-Pro:Quorum3 preethac$ ls -lrt
total 4336
-rwxr-xr-x@ 1 preethac staff 2212531 Jun 30 11:18 Quorum.jar
drwxr-xr-x@ 5 preethac staff 170 Jul 4 19:55 Library
-rw-r--r-- 1 preethac staff 15 Jul 6 22:27 Hello.quorum
drwxr-xr-x 3 preethac staff 102 Jul 6 22:28 Run
drwxr-xr-x 3 preethac staff 102 Jul 6 22:28 Build
Preethas-MacBook-Pro:Quorum3 preethac$ java -jar Quorum.jar -name MyProgram -compile Hello.quorum
Quorum 3.0 Build Successful
Preethas-MacBook-Pro:Run preethac$ pwd
/Users/preethac/Desktop/Research/Summer Project 2015/Quorum_Console/Quorum3/Run
Preethas-MacBook-Pro:Run preethac$ java -jar MyProgram.jar
Hello
Running with -help
Preethas-MacBook-Pro:Quorum3 preethac$ java -jar Quorum.jar -help Quorum Quorum 3.0
Quorum is a computer programming language, which you can use either from the console using flags (the program you just ran) or from a development environment (like NetBeans). For this version, the commands that Quorum knows take the following format:
java -jar Quorum.jar (-flag value*)*
What this means is that you can pass a flag to Quorum combined with any number of values. The legal flags are listed as follows:
-name This flag tells Quorum to change the name the file is outputs.
-compile This flag tells Quorum to compile a set of files.
-test This flag tells Quorum to run its test suite on itself.
-help This flag tells Quorum to output this help screen.
Here are a few examples of how you can use this program:
java -jar Quorum.jar Hello.quorum
This would request that Quorum compiles the source file Hello.quorum.
java -jar Quorum.jar -compile Hello.quorum
java -jar Quorum.jar -name MyProgram -compile Hello.quorum
This would cause Quorum to compile Hello.quorum and name the output MyProgram.
java -jar Quorum.jar Hello.quorum Goodbye.quorum
This would cause Quorum to Compile two source code files. The first file must have a Main action.
For more information on writing programs in Quorum, visit www.quorumlanguage.com.
Build.sh
Build.sh creates a temporary directory and copies into it the source code for the compiler (all files in source code), the library files, and the files under Run. It compiles all of the latest (Quorum3) compiler files in Quorum2 (using Default.jar) and creates Quorum.jar. Using this jar file, it re-compiles Quorum3 in Quorum3. It then creates a folder named Quorum and copies the final Quorum.jar and the Library files into it. Finally, it runs the whole test suite (library files) on Quorum.jar. This Quorum.jar is the actual compiler, which we can now use to compile our Quorum programs.
wifi-roaming-128-4-156-36:Quorum preethac$ java -jar Quorum.jar Hello.quorum
Quorum 3.0 Build Successful
wifi-roaming-128-4-156-36:Quorum preethac$ ls -lrt
total 3592
-rw-r--r-- 1 preethac staff 1833024 Jul 7 16:36 Quorum.jar
drwxr-xr-x@ 4 preethac staff 136 Jul 7 16:37 Library
-rw-r--r-- 1 preethac staff 15 Jul 7 16:42 Hello.quorum
drwxr-xr-x 3 preethac staff 102 Jul 7 16:42 Run
drwxr-xr-x 3 preethac staff 102 Jul 7 16:42 Build
wifi-roaming-128-4-156-36:Quorum preethac$ cd Run/
wifi-roaming-128-4-156-36:Run preethac$ ls -lrt
total 512
-rw-r--r-- 1 preethac staff 258607 Jul 7 16:42 Default.jar
wifi-roaming-128-4-156-36:Run preethac$ java -jar Default.jar
Hello
Antlr Error
Couldn't fix the error. See https://github.com/antlr/antlr4/issues/497 and http://stackoverflow.com/questions/22027175/why-am-i-getting-an-error-when-assigning-tokens-to-a-channel. I could not figure out the cause, and I am not sure whether we need to edit the grammar file to run ANTLR. I compared the Quorum.tokens file generated by running antlr4 Quorum.g4 with the given Quorum.token file; they look similar.
.class files in the library, e.g. parser.java, lexer.java, Token.java, AbstractParseTreeVisitor.java, SyntaxTree.java, etc.: I decoded the .class files to .java files with Sodbeans, but I am not able to understand the compiled-code sections in them.
ANTLR Notes
Running ANTLR
Add antlr-4.5-complete.jar to your CLASSPATH:
$ export CLASSPATH=".:/usr/local/lib/antlr-4.5-complete.jar:$CLASSPATH"
It's also a good idea to put this in your .bash_profile or whatever your startup script is.
Create aliases for the ANTLR Tool, and TestRig.
$ alias antlr4='java -Xmx500M -cp "/usr/local/lib/antlr-4.5-complete.jar:$CLASSPATH" org.antlr.v4.Tool'
$ alias grun='java org.antlr.v4.runtime.misc.TestRig'
ANTLR provides a flexible testing tool in the runtime library called TestRig. It can display lots of information about how a recognizer matches input from a file or standard input. TestRig uses Java reflection to invoke compiled recognizers. The test rig takes a grammar name, a starting rule name kind of like a main() method, and various options that dictate the output we want. Let’s say we’d like to print the tokens created during recognition. Tokens are vocabulary symbols like keyword hello and identifier parrt.
grammar Hello; // Define a grammar called Hello
r : 'hello' ID ; // match keyword hello followed by an identifier
ID : [a-z]+ ; // match lower-case identifiers
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines, \r (Windows)

$ antlr4 Hello.g4 # Generate parser and lexer using antlr4 alias from before
$ ls
Hello.g4 HelloLexer.java HelloParser.java Hello.tokens HelloLexer.tokens HelloBaseListener.java HelloListener.java
$ javac *.java # Compile ANTLR-generated code
$ grun Hello r -tokens # start the TestRig on grammar Hello at rule r
➾ hello parrt # input for the recognizer that you type
➾ EOF # type ctrl-D on Unix or Ctrl+Z on Windows
❮ [@0,0:4='hello',<1>,1:0] # these three lines are output from grun
[@1,6:10='parrt',<2>,1:6]
[@2,12:11='<EOF>',<-1>,2:0]
➾ $ grun Hello r -tree
➾ hello parrt
➾ EOF
❮ (r hello parrt)
$ grun Hello r -trace
hello parrt
enter r, LT(1)=hello
consume [@0,0:4='hello',<1>,1:0] rule r
consume [@1,6:10='parrt',<2>,1:6] rule r
exit r, LT(1)=<EOF>
$ grun
java org.antlr.v4.runtime.misc.TestRig GrammarName startRuleName
    [-tokens] [-tree] [-gui] [-ps file.ps] [-encoding encodingname]
    [-trace] [-diagnostics] [-SLL]
    [input-filename(s)]
Use startRuleName='tokens' if GrammarName is a lexer grammar.
Omitting input-filename makes rig read from stdin.
-tokens prints out the token stream.
-tree prints out the parse tree in LISP form.
-gui displays the parse tree visually in a dialog box.
-ps file.ps generates a visual representation of the parse tree in PostScript and stores it in file.ps. The parse tree figures in this chapter were generated with -ps.
-encoding encodingname specifies the test rig input file encoding if the current locale would not read the input properly. For example, we need this option to parse a Japanese-encoded XML file in Section 12.4, Parsing and Lexing XML, on page 224.
-trace prints the rule name and current token upon rule entry and exit.
-diagnostics turns on diagnostic messages during parsing. This generates messages only for unusual situations such as ambiguous input phrases.
-SLL uses a faster but slightly weaker parsing strategy.
ANTLR Parsing
The process of grouping characters into words or symbols (tokens) is called lexical analysis or simply tokenizing. We call a program that tokenizes the input a lexer. The lexer can group related tokens into token classes, or token types, such as INT (integers), ID (identifiers), FLOAT (floating-point numbers), and so on. The lexer groups vocabulary symbols into types when the parser cares only about the type, not the individual symbols. Tokens consist of at least two pieces of information: the token type (identifying the lexical structure) and the text matched for that token by the lexer.
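To make the "token type plus matched text" idea concrete, here is a small hand-written tokenizer in Java. It is only a sketch under assumptions, not the lexer that ANTLR generates: the class name MiniLexer, the token type names (INT, ID, OP), and the regular expression are all hypothetical and cover just enough syntax for a statement like count = count + 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MiniLexer {
    // A token carries two pieces of information: its type and the matched text.
    static final class Token {
        final String type;
        final String text;

        Token(String type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + ":" + text;
        }
    }

    // One named group per (hypothetical) token type; WS is matched but skipped.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<INT>[0-9]+)|(?<ID>[A-Za-z_][A-Za-z0-9_]*)|(?<OP>[=+\\-<>])|(?<WS>\\s+)");

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // The next token must start exactly at the current position.
            if (!m.find(pos) || m.start() != pos) {
                throw new IllegalArgumentException("unexpected character at " + pos);
            }
            if (m.group("INT") != null) {
                tokens.add(new Token("INT", m.group()));
            } else if (m.group("ID") != null) {
                tokens.add(new Token("ID", m.group()));
            } else if (m.group("OP") != null) {
                tokens.add(new Token("OP", m.group()));
            } // whitespace is dropped, like "-> skip" in a grammar
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("count = count + 1"));
    }
}
```

A parser working on this output would only need to look at the type of each token (ID, OP, INT) to check the shape of the statement, which is why the lexer groups symbols into types.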
The second stage is the actual parser and feeds off of these tokens to recognize the sentence structure, in this case an assignment statement. By default, ANTLR generated parsers build a data structure called a parse tree or syntax tree that records how the parser recognized the structure of the input sentence and its component phrases.
Lexers process characters and pass tokens to the parser, which in turn checks syntax and creates a parse tree. The corresponding ANTLR classes are CharStream, Lexer, Token, Parser, and ParseTree. The “pipe” connecting the lexer and parser is called a TokenStream.
The ParseTree subclasses RuleNode and TerminalNode correspond to subtree roots and leaf nodes. RuleNode has familiar methods such as getChild() and getParent(), but RuleNode isn’t specific to a particular grammar. To better support access to the elements within specific nodes, ANTLR generates a RuleNode subclass for each rule.
ANTLR generates tree walking mechanisms automatically.
By default, ANTLR generates a parse-tree listener interface that responds to events triggered by the built-in tree walker. To walk a tree and trigger calls into a listener, ANTLR’s runtime provides class ParseTreeWalker. To make a language application, we build a ParseTreeListener implementation containing application-specific code that typically calls into a larger surrounding application.
ANTLR generates a ParseTreeListener subclass specific to each grammar with enter and exit methods for each rule.
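The listener mechanism described above could be applied to our loop-if search roughly as follows. This Java sketch deliberately does not use the ANTLR runtime: Node, Listener, walk, and LoopIfListener are hypothetical stand-ins for ParseTree, ParseTreeListener, ParseTreeWalker, and a generated listener, so that the idea can be run without the Quorum grammar. The rule names (loop_statement, block, statement, if_statement) are the ones that appear in the grun output for Quorum.g4. The check is a simplification of the loop-if definition: exactly one if statement in the loop block, and it must be the last statement.

```java
import java.util.Arrays;
import java.util.List;

public class LoopIfWalk {
    // Hypothetical miniature parse-tree node: a rule name plus child nodes.
    static final class Node {
        final String rule;
        final List<Node> children;

        Node(String rule, Node... children) {
            this.rule = rule;
            this.children = Arrays.asList(children);
        }
    }

    // Callbacks fired by the walker, analogous to per-rule enter/exit methods.
    interface Listener {
        void enter(Node node);
        void exit(Node node);
    }

    // Depth-first walk that triggers the listener on entry and exit.
    static void walk(Listener listener, Node node) {
        listener.enter(node);
        for (Node child : node.children) {
            walk(listener, child);
        }
        listener.exit(node);
    }

    // Counts loops whose block has exactly one if statement, placed last.
    static final class LoopIfListener implements Listener {
        int loopIfCount = 0;

        @Override
        public void enter(Node node) {
            if (node.rule.equals("loop_statement") && isLoopIf(node)) {
                loopIfCount++;
            }
        }

        @Override
        public void exit(Node node) {
            // nothing to do on exit in this sketch
        }

        private static boolean isLoopIf(Node loop) {
            for (Node child : loop.children) {
                if (!child.rule.equals("block")) {
                    continue;
                }
                List<Node> statements = child.children;
                int ifCount = 0;
                for (Node s : statements) {
                    if (isIf(s)) {
                        ifCount++;
                    }
                }
                return ifCount == 1 && isIf(statements.get(statements.size() - 1));
            }
            return false;
        }

        private static boolean isIf(Node statement) {
            return !statement.children.isEmpty()
                    && statement.children.get(0).rule.equals("if_statement");
        }
    }

    // Helper: wraps a rule in a "statement" node, as in the grun trees.
    static Node statement(String rule) {
        return new Node("statement", new Node(rule));
    }

    static int countLoopIfs(Node root) {
        LoopIfListener listener = new LoopIfListener();
        walk(listener, root);
        return listener.loopIfCount;
    }

    public static void main(String[] args) {
        // Shape of the InitializeKeyNames loop: an assignment, then an if.
        Node loop = new Node("loop_statement",
                new Node("block",
                        statement("assignment_statement"),
                        statement("if_statement")));
        System.out.println(countLoopIfs(loop));
    }
}
```

With the real grammar, the same check would presumably live in a listener generated from Quorum.g4 (or in a QuorumSourceListener subclass), using the generated context classes instead of Node.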
Option -visitor asks ANTLR to generate a visitor interface from a grammar with a visit method per rule.
ParseTree tree = ... ; // tree is result of parsing
MyVisitor v = new MyVisitor();
v.visit(tree);
There are two key ANTLR components: the ANTLR tool itself and the ANTLR runtime (parse-time) API. When we say “run ANTLR on a grammar,” we’re talking about running the ANTLR tool, class org.antlr.v4.Tool. Running ANTLR generates code (a parser and a lexer) that recognizes sentences in the language described by the grammar. A lexer breaks up an input stream of characters into tokens and passes them to a parser that checks the syntax. The runtime is a library of classes and methods needed by that generated code such as Parser, Lexer, and Token. First we run ANTLR on a grammar and then compile the generated code against the runtime classes in the jar. Ultimately, the compiled application runs in conjunction with the runtime classes.
/** Grammars always start with a grammar header. This grammar is called
 *  ArrayInit and must match the filename: ArrayInit.g4
 */
grammar ArrayInit;

/** A rule called init that matches comma-separated values between {...}. */
init : '{' value (',' value)* '}' ; // must match at least one value

/** A value can be either a nested array/struct or a simple integer (INT) */
value : init
      | INT
      ;

// parser rules start with lowercase letters, lexer rules with uppercase
INT : [0-9]+ ; // Define token INT as one or more digits
WS : [ \t\r\n]+ -> skip ; // Define whitespace rule, toss it out
ArrayInitParser.java
This file contains the parser class definition specific to grammar ArrayInit that recognizes our array language syntax.
public class ArrayInitParser extends Parser { … }
It contains a method for each rule in the grammar as well as some support code.
ArrayInitLexer.java
ANTLR automatically extracts a separate parser and lexer specification from our grammar. This file contains the lexer class definition, which ANTLR generated by analyzing the lexical rules INT and WS as well as the grammar literals '{', ',', and '}'. Recall that the lexer tokenizes the input, breaking it up into vocabulary symbols. Here’s the class outline:
public class ArrayInitLexer extends Lexer { … }
ArrayInit.tokens
ANTLR assigns a token type number to each token we define and stores these values in this file. It’s needed when we split a large grammar into multiple smaller grammars so that ANTLR can synchronize all the token type numbers. See Importing Grammars, on page 36.
ArrayInitListener.java, ArrayInitBaseListener.java
By default, ANTLR parsers build a tree from the input. By walking that tree, a tree walker can fire “events” (callbacks) to a listener object that we provide. ArrayInitListener is the interface that describes the callbacks we can implement. ArrayInitBaseListener is a set of empty default implementations. This class makes it easy for us to override just the callbacks we’re interested in. ANTLR can also generate tree visitors for us with the -visitor command-line option.
Integrating a Generated Parser into a Java Program
// import ANTLR's runtime libraries
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;

public class Test {
    public static void main(String[] args) throws Exception {
        // create a CharStream that reads from standard input
        ANTLRInputStream input = new ANTLRInputStream(System.in);
        // create a lexer that feeds off of input CharStream
        ArrayInitLexer lexer = new ArrayInitLexer(input);
        // create a buffer of tokens pulled from the lexer
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        // create a parser that feeds off the tokens buffer
        ArrayInitParser parser = new ArrayInitParser(tokens);
        ParseTree tree = parser.init(); // begin parsing at init rule
        System.out.println(tree.toStringTree(parser)); // print LISP-style tree

        // Create a generic parse tree walker that can trigger callbacks
        ParseTreeWalker walker = new ParseTreeWalker();
        // Walk the tree created during the parse, trigger callbacks
        // (ShortToUnicodeString is our own listener class, defined separately)
        walker.walk(new ShortToUnicodeString(), tree);
        System.out.println(); // print a \n after translation
    }
}
➾ $ javac ArrayInit*.java Test.java
➾ $ java Test
➾ {1,{2,3},4}
➾EOF
❮ (init { (value 1) , (value (init { (value 2) , (value 3) })) , (value 4) })
We can split a large grammar file into logical chunks, such as, parser rules and lexical rules, and one can import the other.
With the -visitor option, ANTLR automatically generates a visitor interface and a base class with blank method implementations.
Modifications to the grammar - need to label the alternatives of the rules. (The labels can be any identifier that doesn’t collide with a rule name.) Without labels on the alternatives, ANTLR generates only one visitor method per rule.
LabeledExpr.g4
stat: expr NEWLINE          # printExpr
    | ID '=' expr NEWLINE   # assign
    | NEWLINE               # blank
    ;

expr: expr op=('*'|'/') expr  # MulDiv
    | expr op=('+'|'-') expr  # AddSub
    | INT                     # int
    | ID                      # id
    | '(' expr ')'            # parens
    ;

MUL : '*' ; // assigns token name to '*' used above in grammar
DIV : '/' ;
ADD : '+' ;
SUB : '-' ;
ANTLR Visitor
➾ $ antlr4 -no-listener -visitor LabeledExpr.g4
First, ANTLR generates a visitor interface with a method for each labeled alternative name.
public interface LabeledExprVisitor<T> {
    T visitId(LabeledExprParser.IdContext ctx);         // from label id
    T visitAssign(LabeledExprParser.AssignContext ctx); // from label assign
    T visitMulDiv(LabeledExprParser.MulDivContext ctx); // from label MulDiv
    ...
}
ANTLR also generates a default visitor implementation called LabeledExprBaseVisitor that we can subclass.
ANTLR Listener
The key “interface” between the grammar and our listener object is called JavaListener, and ANTLR automatically generates it for us. It defines all of the methods that class ParseTreeWalker from ANTLR’s runtime can trigger as it traverses the parse tree.
The biggest difference between the listener and visitor mechanisms is that listener methods are called by the ANTLR-provided walker object, whereas visitor methods must walk their children with explicit visit calls. Forgetting to invoke visit() on a node’s children means those subtrees don’t get visited. ANTLR generates a default implementation called JavaBaseListener. Our interface extractor can then subclass JavaBaseListener and override the methods of interest.
ANTLR Grammar
Channels: The secret to preserving but ignoring comments and whitespace is to send those tokens to the parser on a “hidden channel.” The parser tunes to only a single channel and so we can pass anything we want on the other channels.
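A rough sketch of the channel idea in plain Java, without the ANTLR runtime: every token carries a channel number, and the parser's view is simply the tokens on the channel it is tuned to. The class ChannelSketch, its Tok type and the sample tokens are hypothetical stand-ins for illustration, not ANTLR classes. In an ANTLR 4 grammar the same effect comes from the lexer command -> channel(HIDDEN) in place of -> skip.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not ANTLR runtime classes.
class ChannelSketch {
    static final int DEFAULT_CHANNEL = 0; // the channel the parser listens to
    static final int HIDDEN_CHANNEL = 1;  // comments and whitespace are sent here

    static final class Tok {
        final String text;
        final int channel;
        Tok(String text, int channel) { this.text = text; this.channel = channel; }
    }

    // The view of the token stream for a consumer tuned to one channel.
    static List<String> visibleTo(List<Tok> all, int channel) {
        List<String> out = new ArrayList<>();
        for (Tok t : all) {
            if (t.channel == channel) out.add(t.text);
        }
        return out;
    }

    // Tokens for the input "x = 1// set x"; nothing is thrown away by the lexer.
    static List<Tok> sample() {
        List<Tok> toks = new ArrayList<>();
        toks.add(new Tok("x", DEFAULT_CHANNEL));
        toks.add(new Tok(" ", HIDDEN_CHANNEL));
        toks.add(new Tok("=", DEFAULT_CHANNEL));
        toks.add(new Tok(" ", HIDDEN_CHANNEL));
        toks.add(new Tok("1", DEFAULT_CHANNEL));
        toks.add(new Tok("// set x", HIDDEN_CHANNEL));
        return toks;
    }

    public static void main(String[] args) {
        System.out.println("parser sees: " + visibleTo(sample(), DEFAULT_CHANNEL));
        System.out.println("preserved:   " + visibleTo(sample(), HIDDEN_CHANNEL));
    }
}
```

The hidden tokens stay in the stream, so a tool that generates comments for loops could still recover the original comments and layout if it needs them.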
ANTLR generates a function for each rule in your grammar.
Introduction to grammar:
Grammars consist of a header that names the grammar and a set of rules that can invoke each other.
grammar MyG;
rule1 : «stuff» ;
rule2 : «more stuff» ;
...
The nouns on the right side are typically references to either tokens or yet-to-be-defined rules.
Pattern: Sequence
For example, suppose the retrieve command in a grammar is a keyword followed by an integer followed by a newline token. To specify such a sequence, we simply list the elements in order. In ANTLR notation, the retrieve command is just the sequence 'RETR' INT '\n', where INT represents the integer token type.
retr : 'RETR' INT '\n' ; // match keyword integer newline sequence
We are labeling the RETR sequence as the retr rule. Elsewhere in the grammar, we can refer to the RETR sequence with the rule name as a shorthand.
To encode a sequence of one or more elements, we use the + subrule operator. Eg: INT+
To specify that a list can be empty, we use the zero-or-more * operator: INT*
// match 'rule-name :' followed by at least one alternative followed by
// zero or more alternatives separated by '|' symbols followed by ';'
rule : ID ':' alternative ('|' alternative )* ';' ;
Pattern: Choice
To express the notion of choice in a language, we use | as the “or” operator in ANTLR rules to separate grammatical choices called alternatives or productions. Grammars are full of choices.
stmt: node_stmt | edge_stmt | attr_stmt | id '=' id | subgraph ;
Pattern: Token Dependency
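The presence of one token requires its counterpart elsewhere in the phrase, such as matching curly braces.
object : '{' pair (',' pair)* '}'
       | '{' '}' // empty object
       ;
pair: STRING ':' value ;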
Pattern: Nested Phrase
If the pseudocode for a rule references itself, we are going to need a recursive (self-referencing) rule.
Let’s see how nesting works for code blocks. A while statement is the keyword while followed by a condition expression in parentheses followed by a statement. We can also treat multiple statements as a single block statement by wrapping them in curly braces. Expressing that grammatically looks like this:
stat: 'while' '(' expr ')' stat  // match WHILE statement
    | '{' stat* '}'              // match block of statements in curlies
    ...                          // and other kinds of statements
    ;
ANTLR resolves operator precedence in favor of the alternative given first.
expr : expr '^'<assoc=right> expr  // ^ operator is right associative
     | expr '*' expr               // match subexpressions joined with '*' operator
     | expr '+' expr               // match subexpressions joined with '+' operator
     | INT                         // matches simple integer atom
     ;
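The ^ operator gets the highest priority, then *, and so forth. By default, ANTLR associates operators left to right, as we’d expect for * and +. Some operators, like exponentiation, group right to left, so we have to specify the associativity manually with the assoc option, written here on the operator token. (In ANTLR 4.2 and later the option goes at the start of the alternative instead: <assoc=right> expr '^' expr.)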
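To see the grouping these rules imply, here is a small hand-written precedence-climbing evaluator. It is only a sketch of the same precedence and associativity choices, not the code ANTLR generates, and the class name PrecedenceSketch and its methods are made up for this note. It accepts unsigned integers joined by ^, * and + with optional spaces, and does no error handling.

```java
// Hypothetical illustration of the precedence/associativity rules above;
// this is a hand-written evaluator, not ANTLR-generated code.
class PrecedenceSketch {
    private final String src;
    private int pos = 0;

    private PrecedenceSketch(String src) { this.src = src.replace(" ", ""); }

    static long eval(String text) {
        return new PrecedenceSketch(text).expr(1);
    }

    // Binding strength mirrors the order of alternatives: '^' first, then '*', then '+'.
    private static int prec(char op) {
        switch (op) {
            case '^': return 3;
            case '*': return 2;
            case '+': return 1;
            default:  return -1; // not an operator
        }
    }

    private long expr(int minPrec) {
        long left = atom();
        while (pos < src.length() && prec(src.charAt(pos)) >= minPrec) {
            char op = src.charAt(pos++);
            int p = prec(op);
            // Right-associative '^' lets its right operand absorb another '^';
            // left-associative operators only accept strictly tighter operators.
            long right = expr(op == '^' ? p : p + 1);
            left = apply(op, left, right);
        }
        return left;
    }

    private long atom() {
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        return Long.parseLong(src.substring(start, pos));
    }

    private static long pow(long base, long exp) {
        long result = 1;
        for (long i = 0; i < exp; i++) result *= base;
        return result;
    }

    private static long apply(char op, long a, long b) {
        switch (op) {
            case '^': return pow(a, b);
            case '*': return a * b;
            default:  return a + b;
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("2^3^2")); // groups as 2^(3^2)
        System.out.println(eval("1+2*3")); // groups as 1+(2*3)
    }
}
```

With left associativity, 2^3^2 would group as (2^3)^2 = 64; the right-associative grouping gives 2^(3^2) = 512, which is what the assoc option asks for.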
In grammar pseudocode, a string is a sequence of any characters between double quotes.
STRING : '"' .*? '"' ; // match anything in "..."; a pattern for matching stuff inside quotes or other delimiters
When a lexer matches the tokens we’ve defined so far, it emits them via the token stream to the parser. The parser then checks the grammatical structure of the stream.
Token Category: Description and Examples

Punctuation: The easiest way to handle operators and punctuation is to directly reference them in parser rules.
    call : ID '(' exprList ')' ;
  Some programmers prefer to define token labels such as LP (left parenthesis) instead.
    call : ID LP exprList RP ;
    LP : '(' ;
    RP : ')' ;

Keywords: Keywords are reserved identifiers, and we can either reference them directly or define token types for them.
    returnStat : 'return' expr ';' ;

Identifiers: Identifiers look almost the same in every language, with some variation about what the first character can be and whether Unicode characters are allowed.
    ID : ID_LETTER (ID_LETTER | DIGIT)* ; // From C language
    fragment ID_LETTER : 'a'..'z'|'A'..'Z'|'_' ;
    fragment DIGIT : '0'..'9' ;

Numbers: These are definitions for integers and simple floating-point numbers.
    INT : DIGIT+ ;
    FLOAT : DIGIT+ '.' DIGIT*
          | '.' DIGIT+
          ;

Strings: Match double-quoted strings.
    STRING : '"' (ESC|.)*? '"' ;
    fragment ESC : '\\' [btnr"\\] ; // \b, \t, \n etc...

Comments: Match and discard comments.
    LINE_COMMENT : '//' .*? '\n' -> skip ;
    COMMENT : '/*' .*? '*/' -> skip ;

Whitespace: Match and discard whitespace.
    WS : [ \t\n\r]+ -> skip ;
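As a quick check on what the ID, INT and FLOAT rules accept, the same patterns can be written as java.util.regex expressions. This is only a sketch that classifies whole lexemes; it is not a lexer (no longest-match scanning over an input stream), and the class name TokenPatternSketch and its classify method are made up for this note.

```java
import java.util.regex.Pattern;

// Hypothetical helper that mirrors the ID, INT and FLOAT lexer rules as regexes.
class TokenPatternSketch {
    // ID : ID_LETTER (ID_LETTER | DIGIT)* ;
    static final Pattern ID = Pattern.compile("[a-zA-Z_][a-zA-Z_0-9]*");
    // INT : DIGIT+ ;
    static final Pattern INT = Pattern.compile("[0-9]+");
    // FLOAT : DIGIT+ '.' DIGIT* | '.' DIGIT+ ;
    static final Pattern FLOAT = Pattern.compile("[0-9]+\\.[0-9]*|\\.[0-9]+");

    // Classifies a complete lexeme; returns "UNKNOWN" when no rule matches it fully.
    static String classify(String lexeme) {
        if (ID.matcher(lexeme).matches()) return "ID";
        if (INT.matcher(lexeme).matches()) return "INT";
        if (FLOAT.matcher(lexeme).matches()) return "FLOAT";
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"keyNames", "256", "3.", ".5", "1x"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Note that "3." and ".5" both count as FLOAT under the rule above, while a lone "." matches nothing.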
ANTLR Listeners and Visitors
A listener is an object that responds to rule entry and exit events (phrase recognition events) triggered by a parse-tree walker as it discovers and finishes nodes. To support situations where an application must control how a tree is walked, ANTLR-generated parse trees also support the well-known tree visitor pattern.
The biggest difference between listeners and visitors is that listener methods aren’t responsible for explicitly calling methods to walk their children. Visitors, on the other hand, must explicitly trigger visits to child nodes to keep the tree traversal going. Visitors get to control the order of traversal and how much of the tree gets visited because of these explicit calls to visit children.
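The difference can be sketched in plain Java on a tiny hand-built tree. Everything here (TraversalSketch, Node, Listener, walk, visit) is a hypothetical miniature written for this note, not the ANTLR runtime API. In the listener style the walker owns the traversal and fires enter/exit events for every node; in the visitor style the visit method itself decides whether to descend.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical miniature tree and traversal code; not the ANTLR runtime API.
class TraversalSketch {
    static final class Node {
        final String name;
        final List<Node> children;
        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Listener style: callbacks only react, they never drive the walk.
    interface Listener {
        void enter(Node n);
        void exit(Node n);
    }

    // The walker visits every node and fires enter/exit around its children.
    static void walk(Listener listener, Node n) {
        listener.enter(n);
        for (Node c : n.children) walk(listener, c);
        listener.exit(n);
    }

    // Visitor style: the visit itself must recurse, or the subtrees are skipped.
    static List<String> visit(Node n, boolean descend) {
        List<String> seen = new ArrayList<>();
        seen.add(n.name);
        if (descend) {
            for (Node c : n.children) seen.addAll(visit(c, true));
        }
        return seen;
    }

    static List<String> listenerEvents(Node root) {
        final List<String> events = new ArrayList<>();
        walk(new Listener() {
            public void enter(Node n) { events.add("enter " + n.name); }
            public void exit(Node n) { events.add("exit " + n.name); }
        }, root);
        return events;
    }

    // A loop containing an assignment and an if, which contains a call.
    static Node sample() {
        return new Node("loop", new Node("assign"), new Node("if", new Node("call")));
    }

    public static void main(String[] args) {
        System.out.println(listenerEvents(sample()));
        System.out.println(visit(sample(), true));
        System.out.println(visit(sample(), false)); // forgot to visit children
    }
}
```

The last call shows the failure mode of forgetting to visit children: only the root is seen.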
From the grammar, ANTLR generates <filename>Parser, which automatically builds the parse tree.
Once we have a parse tree, we can use ParseTreeWalker to visit all of the nodes, triggering enter and exit methods.
ANTLR’s ParseTreeWalker triggers enter and exit methods for each rule subtree as it discovers and finishes nodes, respectively.
Listener
ANTLR generates a <filename>Listener from the grammar file.
ANTLR also generates class PropertyFileBaseListener, which implements PropertyFileListener with empty default methods.
PropertyFile.g4
grammar PropertyFile;
file : prop+ ;
prop : ID '=' STRING '\n' ;
ID : [a-z]+ ;
STRING : '"' .*? '"' ;

t.properties
user="parrt"
machine="maniac"

PropertyFileListener.java
import org.antlr.v4.runtime.tree.*;
import org.antlr.v4.runtime.Token;

public interface PropertyFileListener extends ParseTreeListener {
    void enterFile(PropertyFileParser.FileContext ctx);
    void exitFile(PropertyFileParser.FileContext ctx);
    void enterProp(PropertyFileParser.PropContext ctx);
    void exitProp(PropertyFileParser.PropContext ctx);
}
Because there are only two parser rules in grammar PropertyFile, there are four methods in the interface.
TestPropertyFile.java
import org.antlr.v4.misc.OrderedHashMap;
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;
import java.io.*;
import java.util.Map;

public class TestPropertyFile {
    public static class PropertyFileLoader extends PropertyFileBaseListener {
        Map<String,String> props = new OrderedHashMap<String, String>();
        public void exitProp(PropertyFileParser.PropContext ctx) {
            String id = ctx.ID().getText(); // prop : ID '=' STRING '\n' ;
            String value = ctx.STRING().getText();
            props.put(id, value);
        }
    }

    public static void main(String[] args) throws Exception {
        String inputFile = null;
        if ( args.length>0 ) inputFile = args[0];
        InputStream is = System.in;
        if ( inputFile!=null ) {
            is = new FileInputStream(inputFile);
        }
        ANTLRInputStream input = new ANTLRInputStream(is);
        PropertyFileLexer lexer = new PropertyFileLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        PropertyFileParser parser = new PropertyFileParser(tokens);
        ParseTree tree = parser.file();

        // create a standard ANTLR parse tree walker
        ParseTreeWalker walker = new ParseTreeWalker();
        // create listener then feed to walker
        PropertyFileLoader loader = new PropertyFileLoader();
        walker.walk(loader, tree);        // walk parse tree
        System.out.println(loader.props); // print results
    }
}
Interface ParseTreeListener is in the ANTLR runtime library and dictates that every listener respond to events visitTerminal(), enterEveryRule(), exitEveryRule(), and (upon syntax errors) visitErrorNode(). ANTLR generates interface PropertyFileListener from grammar PropertyFile and default implementations for all methods in class PropertyFileBaseListener. The only thing that we’re building is the PropertyFileLoader, which inherits all of the blank functionality from PropertyFileBaseListener. Method exitProp() has access to the rule context object, PropContext, associated with rule prop. That context object has methods for each of the elements mentioned in rule prop (ID and STRING). Because those elements are token references in the grammar, the methods return TerminalNode parse-tree nodes. We can either directly access the text of the token payload via getText(), as we’ve done here, or get the Token payload first via getSymbol().
$ antlr4 PropertyFile.g4
$ ls PropertyFile*.java
PropertyFileBaseListener.java  PropertyFileListener.java
PropertyFileLexer.java         PropertyFileParser.java
$ javac TestPropertyFile.java PropertyFile*.java
$ cat t.properties
user="parrt"
machine="maniac"
$ java TestPropertyFile t.properties
{user="parrt", machine="maniac"}
Our test program successfully reconstitutes the property assignments from the file into a map data structure in memory.
Visitor
To use a visitor instead of a listener, we ask ANTLR to generate a visitor interface, implement that interface, and then create a test rig that calls visit() on the parse tree.
When we use the -visitor option on the command line, ANTLR generates interface PropertyFileVisitor and class PropertyFileBaseVisitor, which has the following default implementations:
public class PropertyFileBaseVisitor<T> extends AbstractParseTreeVisitor<T>
        implements PropertyFileVisitor<T> {
    @Override public T visitFile(PropertyFileParser.FileContext ctx) { ... }
    @Override public T visitProp(PropertyFileParser.PropContext ctx) { ... }
}
TestPropertyFileVisitor.java
import org.antlr.v4.misc.OrderedHashMap;
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;
import java.io.*;
import java.util.Map;

public class TestPropertyFileVisitor {
    public static class PropertyFileVisitor extends PropertyFileBaseVisitor<Void> {
        Map<String,String> props = new OrderedHashMap<String, String>();
        public Void visitProp(PropertyFileParser.PropContext ctx) {
            String id = ctx.ID().getText(); // prop : ID '=' STRING '\n' ;
            String value = ctx.STRING().getText();
            props.put(id, value);
            return null; // Java says must return something even when Void
        }
    }

    public static void main(String[] args) throws Exception {
        String inputFile = null;
        if ( args.length>0 ) inputFile = args[0];
        InputStream is = System.in;
        if ( inputFile!=null ) {
            is = new FileInputStream(inputFile);
        }
        ANTLRInputStream input = new ANTLRInputStream(is);
        PropertyFileLexer lexer = new PropertyFileLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        PropertyFileParser parser = new PropertyFileParser(tokens);
        ParseTree tree = parser.file();

        PropertyFileVisitor loader = new PropertyFileVisitor();
        loader.visit(tree);
        System.out.println(loader.props); // print results
    }
}
Visitors walk parse trees by explicitly calling interface ParseTreeVisitor’s visit() method on child nodes. That method is implemented in AbstractParseTreeVisitor. In this case, the nodes created for prop invocations don’t have children, so visitProp() doesn’t have to call visit().
The biggest difference between a visitor and listener test rig (such as TestPropertyFile) is that visitor test rigs don’t need a ParseTreeWalker. They just ask the visitor to visit the tree created by the parser.
Questions about Proposal Writing
1. Do I need to include all the citations that Xiaoran used in his paper? No, only 6, 4, 26, 27, the Quorum website, Stefik’s paper; probably 12 or 13 in total.
2. Cited Xiaoran's paper, hope that is alright.
3. What do I need to cite in reference to Quorum? Any paper, their website, etc.?
4. Cited ANTLR4 book, hope that is okay.
5. To describe the project outline, can I include a sample Quorum code snippet and explain the corresponding feature vector identified for that loop?
6. Do I include the challenges and motivation with respect to the new environment and how to deal with them?
7. What else do I need to add? Should I add the feature vector table from Xiaoran's paper, or should I explain a little about the work related to Quorum compilers?
Margin should be 1 inch on left and right - done
Change verbs in project steps - done
After the first line, cite the Quorum website, say how many blind programmers there are (to be found somewhere on his website), and refer to his paper - done
At the end, before the references, present a summary of results, how it would help programmers, etc. - done
Sent for 2nd review to Dr. Pollock.
Added Sakai site for PhD Prelims.
Application on Quorum:
To print out the parse tree of a sample Quorum loop-if (typed in at the command line) using a Java program
TestQuorum.java
// import ANTLR's runtime libraries
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;

public class TestQuorum {
    public static void main(String[] args) throws Exception {
        // create a CharStream that reads from standard input
        ANTLRInputStream input = new ANTLRInputStream(System.in);
        // create a lexer that feeds off of input CharStream
        QuorumLexer lexer = new QuorumLexer(input);
        // create a buffer of tokens pulled from the lexer
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        // create a parser that feeds off the tokens buffer
        QuorumParser parser = new QuorumParser(tokens);
        ParseTree tree = parser.start(); // begin parsing at start rule
        //ParseTree tree = parser.statement(); //begin parsing at statement rule
        System.out.println(tree.toStringTree(parser)); // print LISP-style tree
    }
}
With parser.start()
Preethas-MacBook-Pro:Antlr preethac$ java TestQuorum
repeat while index < 256
    text name = ToText(index)
    if name not= undefined
        keyNames:put(index, index)
    end
end
(start (class_declaration (no_class_stmnts (statement (loop_statement repeat while (expression (expression (action_call index)) < (expression 256)) (block (statement (assignment_statement (assignment_declaration text) name = (expression (action_call ToText ( (function_expression_list (expression (action_call index))) ))))) (statement (if_statement if (expression (expression (action_call name)) not= (expression undefined)) (block (statement (solo_method_call keyNames : (solo_method_required_method_part put ( (function_expression_list (expression (action_call index)) , (expression (action_call index))) ))))) end))) end)))) <EOF>)

With parser.statement()
Preethas-MacBook-Pro:Antlr preethac$ java TestQuorum
repeat while index < 256
    text name = ToText(index)
    if name not= undefined
        keyNames:put(index, index)
    end
end
(statement (loop_statement repeat while (expression (expression (action_call index)) < (expression 256)) (block (statement (assignment_statement (assignment_declaration text) name = (expression (action_call ToText ( (function_expression_list (expression (action_call index))) ))))) (statement (if_statement if (expression (expression (action_call name)) not= (expression undefined)) (block (statement (solo_method_call keyNames : (solo_method_required_method_part put ( (function_expression_list (expression (action_call index)) , (expression (action_call index))) ))))) end))) end))
Implementing a listener and walking a parse tree on a Quorum print statement: output "Hello"
import org.antlr.v4.misc.OrderedHashMap;
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;
import java.io.*;
import java.util.Map;

public class TestQuorumFile {
    public static class QuorumFileLoader extends QuorumBaseListener {
        //Map<String,String> props = new OrderedHashMap<String, String>();
        public String value;
        public void exitPrint_statement(QuorumParser.Print_statementContext ctx) {
            //String id = ctx.getText();
            //String value = ctx.getText();
            //props.put(id, value);
            value = ctx.getText();
        }
    }

    public static void main(String[] args) throws Exception {
        String inputFile = null;
        if ( args.length>0 ) inputFile = args[0];
        InputStream is = System.in;
        if ( inputFile!=null ) {
            is = new FileInputStream(inputFile);
        }
        ANTLRInputStream input = new ANTLRInputStream(is);
        QuorumLexer lexer = new QuorumLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        QuorumParser parser = new QuorumParser(tokens);
        //ParseTree tree = parser.file();
        ParseTree tree = parser.start(); // begin parsing at start rule
        System.out.println("\nParse tree is:\n"+tree.toStringTree(parser));

        // create a standard ANTLR parse tree walker
        ParseTreeWalker walker = new ParseTreeWalker();
        // create listener then feed to walker
        QuorumFileLoader loader = new QuorumFileLoader();
        walker.walk(loader, tree); // walk parse tree
        //System.out.println(loader.props); // print results
        System.out.println("\nInput is: \n"+loader.value); // print results
    }
}
Preethas-MacBook-Pro:Antlr preethac$ antlr4 Quorum.g4
Preethas-MacBook-Pro:Antlr preethac$ ls Quorum*.java
QuorumBaseListener.java QuorumLexer.java QuorumListener.java QuorumParser.java
Preethas-MacBook-Pro:Antlr preethac$ javac TestQuorumFile.java Quorum*.java
Preethas-MacBook-Pro:Antlr preethac$ cat q.properties
output "Hello"
Preethas-MacBook-Pro:Antlr preethac$ java TestQuorumFile q.properties

Parse tree is:
(start (class_declaration (no_class_stmnts (statement (print_statement output (expression "Hello"))))) <EOF>)

Input is:
output"Hello"
Implementing listener and walking a parse tree on Quorum loop-if statement:
import org.antlr.v4.misc.OrderedHashMap;
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;
import java.io.*;
import java.util.Map;

public class TestQuorumFileNew {
    public static class QuorumFileLoader extends QuorumBaseListener {
        // Map<String,String> props = new OrderedHashMap<String, String>();
        public String value1;
        public String value2;
        public String value3, value4, value5, value6;

        public void exitPrint_statement(QuorumParser.Print_statementContext ctx) {
            // String id = ctx.getText();
            // String value = ctx.getText();
            // props.put(id, value);
            value1 = ctx.getText();
        }

        public void exitIf_statement(QuorumParser.If_statementContext ctx) {
            value2 = ctx.getText();
            value5 = ctx.block().getText();
            System.out.println("\nParent is:" +ctx.getParent()); //not sure what the output means
            System.out.println("\nChild of if statement is:" +ctx.getChild(0));
            //System.out.println("Object of if statement is:"+ctx.value()); //gives error
            //System.out.println("Object of if statement is:"+ctx.object()); //gives error
        }

        public void exitLoop_statement(QuorumParser.Loop_statementContext ctx) {
            value3 = ctx.getText();
            value4 = ctx.expression().getText();
            value6 = ctx.block().getText();
            System.out.println("\nChild of loop statement is:" +ctx.getChild(4));
        }

        public void visitTerminal(TerminalNode node) {
            Token symbol = node.getSymbol();
            // System.out.println(node);
            if ( symbol.getType()==QuorumParser.IF ) {
                System.out.println("\nTerminal node is: "+symbol);
            }
            // {
            //     stack.push( Integer.valueOf(symbol.getText()) );
            // }
        }
    }

    public static void main(String[] args) throws Exception {
        String inputFile = null;
        if ( args.length>0 ) inputFile = args[0];
        InputStream is = System.in;
        if ( inputFile!=null ) {
            is = new FileInputStream(inputFile);
        }
        ANTLRInputStream input = new ANTLRInputStream(is);
        QuorumLexer lexer = new QuorumLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        QuorumParser parser = new QuorumParser(tokens);
        ParseTree tree = parser.start(); // begin parsing at start rule
        System.out.println("\nParse tree is:\n"+tree.toStringTree(parser));

        // create a standard ANTLR parse tree walker
        ParseTreeWalker walker = new ParseTreeWalker();
        // create listener then feed to walker
        QuorumFileLoader loader = new QuorumFileLoader();
        walker.walk(loader, tree); // walk parse tree
        System.out.println("\nInside Print Statement: \n"+loader.value1); // print results
        System.out.println("\nInside If Statement: \n"+loader.value2); // print results
        System.out.println("\nInside Loop Statement: \n"+loader.value3); // print results
        System.out.println("\nBlock Inside Loop Statement: \n"+loader.value6); // print results
    }
}
Preethas-MacBook-Pro:Antlr preethac$ javac TestQuorumFileNew.java Quorum*.java
Preethas-MacBook-Pro:Antlr preethac$ java TestQuorumFileNew qnew.properties

Parse tree is:
(start (class_declaration (no_class_stmnts (statement (loop_statement repeat while (expression (expression (action_call index)) < (expression 256)) (block (statement (assignment_statement (assignment_declaration text) name = (expression (action_call ToText ( (function_expression_list (expression (action_call index))) ))))) (statement (if_statement if (expression (expression (action_call name)) not= (expression undefined)) (block (statement (solo_method_call keyNames : (solo_method_required_method_part put ( (function_expression_list (expression (action_call index)) , (expression (action_call index))) ))))) end))) end)))) <EOF>)

Terminal node is: [@22,71:72='if',<58>,3:8]

Parent is:[212 425 221 123 120 96]

Child of if statement is:if

Child of loop statement is:end

Inside Print Statement:
null

Inside If Statement:
ifnamenot=undefinedkeyNames:put(index,index)end

Inside Loop Statement:
repeatwhileindex<256textname=ToText(index)ifnamenot=undefinedkeyNames:put(index,index)endend

Block Inside Loop Statement:
textname=ToText(index)ifnamenot=undefinedkeyNames:put(index,index)end
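Once the loop's block is available, the loop-if test from Xiaoran's definition (exactly one if-statement, and it is the last statement in the loop body) could be checked along these lines. This is a minimal sketch, not the tool's actual code: it assumes the listener has already collected the grammar rule names of the block's direct child statements in order (if_statement, assignment_statement and loop_statement are rule names that appear in the parse tree above), it ignores ifs nested deeper in the tree, and it carries over the exclusion of nested loops from the Java work as an assumption. The class and method names are made up.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the loop-if test; not the project's actual implementation.
class LoopIfSketch {
    // bodyStatements: rule names of the loop block's direct child statements, in order.
    static boolean isLoopIf(List<String> bodyStatements) {
        if (bodyStatements.isEmpty()) return false;
        int ifCount = 0;
        for (String kind : bodyStatements) {
            if (kind.equals("if_statement")) ifCount++;
            // Assumption carried over from the Java study: nested loops are not considered.
            if (kind.equals("loop_statement")) return false;
        }
        String last = bodyStatements.get(bodyStatements.size() - 1);
        // Exactly one if, and it must be the last statement of the loop body.
        return ifCount == 1 && last.equals("if_statement");
    }

    public static void main(String[] args) {
        // The sample loop above: an assignment followed by an if.
        System.out.println(isLoopIf(Arrays.asList("assignment_statement", "if_statement")));
        // Two ifs in the body: the loop would be discarded.
        System.out.println(isLoopIf(Arrays.asList("if_statement", "if_statement")));
    }
}
```

For the sample loop in qnew.properties the body is an assignment followed by an if, so the check accepts it.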
Update on Aug 18,2015:
Code (to generate feature vectors and identify action from a Quorum sample code) is ready.
Next step is to do further testing on the code for accuracy - Completed
Results - Testing of tool on identified Quorum loop-ifs:
- Feature Vector: F1:1, F2:0, F3:0, F4:2, F5:4, F6:0, F7:0, F8:2
- Action Identified: find
- Feature Vector: F1:5, F2:1, F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
- Action Identified: get
- Feature Vector: F1:1, F2:0, F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
- Action Identified: could not be identified(no entry in action identification table)
- Feature Vector: F1:5, F2:ResolveAllTypes(), F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
- Action Identified: not identified as method name not listed in action identification table
- Feature Vector: F1:4, F2:ResolveClass(), F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
- Action Identified: Not identified as method name not listed in action identification table
- Feature Vector: F1:1, F2:0, F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
- Action Identified: not identified
- Feature Vector: F1:1, F2:0, F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
- Action Identified: Not identified(no entry in action identification table)
- Feature Vector: F1:5, F2:VisitLocalVariable(), F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
- Action Identified: Not identified(as method name not listed in action identification model)
- Feature Vector: F1:1, F2:0, F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
- Action Identified: Not identified(no entry in action identification table)
- Feature Vector: F1:1, F2:0, F3:0, F4:0, F5:0, F6:0, F7:0, F8:2
- Action Identified: Not identified (no entry in action identification table)
- Multiple ifs, loop discarded.
- Feature Vector: F1:5, F2:1, F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
- Action Identified: get
- Feature Vector: F1:5, F2:AssignBytecodeLocations(), F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
- Action Identified: not identified
- Feature Vector: F1:5, F2:1, F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
- Action Identified: get
- Feature Vector: F1:5, F2:1, F3:0, F4:2, F5:0, F6:0, F7:0, F8:1 (as we consider only the if loop and its last statement)
- Action Identified: get
- Feature Vector: F1:5, F2:VisitEnd(), F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
- Action Identified: cannot be determined as method name not listed in action identification table
- Feature Vector: F1:4, F2:WriteParentActionBytecode(), F3:0, F4:2, F5:0, F6:1, F7:0, F8:2
- Action Identified: cannot be determined as method name not listed in action identification table
- Feature Vector: F1:5, F2:VisitEnd(), F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
- Action Identified: cannot be determined as method name not listed in action identification table
- Feature Vector: F1:5, F2: undefined, F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
- Action Identified: cannot be determined
- Feature Vector: F1:1, F2:0, F3:0, F4:2, F5:0, F6:0, F7:0, F8:2
- Action Identified: not identified as corresponding entry not found in action identification table
- Feature Vector: F1:1, F2:0, F3:0, F4:0, F5:0, F6:0, F7:0, F8:1
- Action Identified: not identified
- Feature Vector: F1:5, F2:1, F3:0, F4:0, F5:0, F6:0, F7:0, F8:2
- Action Identified: Not identified
- Feature Vector: F1:0, F2:0, F3:0, F4:0, F5:3, F6:0, F7:0, F8:2
- Action Identified: determine
- Feature Vector: F1:4, F2:5, F3:0, F4:1, F5:3, F6:0, F7:0, F8:2
- Action Identified: could not be identified
- Feature Vector: F1:3, F2:0, F3:0, F4:0, F5:0, F6:0, F7:0, F8:2
- Action Identified: could not be identified
- Feature Vector: F1:0, F2:0, F3:0, F4:0, F5:4, F6:1, F7:0, F8:2
- Action Identified: find
- Feature Vector: F1:0, F2:0, F3:0, F4:0, F5:3, F6:1, F7:0, F8:2
- Action Identified: determine
- Feature Vector: F1:6, F2:0, F3:0, F4:0, F5:0, F6:0, F7:0, F8:2
- Action Identified: determine
- Feature Vector: F1:6, F2:0, F3:0, F4:0, F5:0, F6:0, F7:0, F8:2
- Action Identified: determine
- Feature Vector: F1:0, F2:0, F3:0, F4:0, F5:3, F6:0, F7:0, F8:2
- Action Identified: determine
- (a) Feature Vector: F1:4, F2:5, F3:0, F4:1, F5:3, F6:0, F7:0, F8:2
- Action Identified: not identified
- (b) Feature Vector: F1:6, F2:0, F3:0, F4:0, F5:0, F6:0, F7:0, F8:2
- Action Identified: determine
- Feature Vector: F1:0, F2:0, F3:0, F4:0, F5:3, F6:0, F7:0, F8:2
- Action Identified: determine
- Feature Vector: F1:1, F2:0, F3:0, F4:2, F5:4, F6:0, F7:0, F8:2
- Action Identified: find
- Feature Vector: F1:0, F2:0, F3:0, F4:0, F5:3, F6:0, F7:0, F8:2
- Action Identified: determine
- Feature Vector: F1:0, F2:0, F3:0, F4:0, F5:3, F6:0, F7:0, F8:2
- Action Identified: determine
- Feature Vector: F1:0, F2:0, F3:0, F4:0, F5:4, F6:0, F7:0, F8:2
- Action Identified: find
- Feature Vector: F1:0, F2:0, F3:0, F4:0, F5:4, F6:0, F7:0, F8:2
- Action Identified: find
- Feature Vector: F1:5, F2:3, F3:0, F4:1, F5:0, F6:0, F7:0, F8:2
- Action Identified: not identified
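The lookup step behind these results can be pictured as a table from feature vectors to actions. The sketch below is only an illustration, not the tool's code: it uses an exact-match table filled with three vector/action pairs copied from the results above, whereas the real action identification model comes from Xiaoran's work and may match on a subset of features (for example, the results show more than one vector mapping to determine). The class and method names are hypothetical. Features are kept as strings because F2 sometimes holds a method name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical exact-match lookup; the real action identification model may differ.
class ActionLookupSketch {
    static final Map<String, String> TABLE = new LinkedHashMap<>();
    static {
        // Pairs copied from the results listed above (F1..F8 in order).
        TABLE.put("1,0,0,2,4,0,0,2", "find");
        TABLE.put("5,1,0,2,0,0,0,2", "get");
        TABLE.put("0,0,0,0,3,0,0,2", "determine");
    }

    // Joins the eight feature values into the table key.
    static String key(String... features) {
        return String.join(",", features);
    }

    // Returns the action for a feature vector, or "not identified" if there is no entry.
    static String identify(String... features) {
        return TABLE.getOrDefault(key(features), "not identified");
    }

    public static void main(String[] args) {
        System.out.println(identify("1", "0", "0", "2", "4", "0", "0", "2"));
        // F2 is a method name that has no entry in the table.
        System.out.println(identify("5", "ResolveAllTypes()", "0", "2", "0", "0", "0", "2"));
    }
}
```

A vector whose F2 is an unlisted method name simply falls through to "not identified", which is the pattern seen in several of the results.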
Outline of prelims report
1. Introduction
2. Contributions
3. “Developing a Model of Loop Actions”
Cite Xiaoran's paper, Summary of that paper, can use Fig:1 from that paper and the limitations of the model
4. Quorum Language
Introduction, Loop Structures
5. Java to Quorum(Something like that)
In general how to do
Challenges in doing this for Quorum
Research Process/Methodology - Examples, Algorithms, Pictures, ANTLR-how used, parse trees, what changes for each feature vector-syntactic or semantic
6. Results- Accuracy, evaluation by Xiaoran(author of that paper)
7. Discussion - Implications of results, how general is the process, how easy to apply to other languages.
8. Threats to Validity(My limitations of this project)
Small number of Quorum programs
Limited variety of loops
Only code from the developers of Quorum is used; other programmers' code is not available yet.
To minimize the threats:-
a. Collected as many Quorum loops as possible.
b. Real code examples are used, and not sample ones.
c. Possible future work: check how our tool works on unseen code from other programmers.
As per department instructions - The written report should not exceed 20 pages in length, and should be structured as a scientific publication, including: Title, Abstract, Introduction, Related Work, appropriate sections that describe the work conducted, the methods used and the results obtained, followed by Analysis/Conclusions and Future Work. The report must also contain a bibliography which is not included in the 20 pages length limit. The first page of the report must include the student name, the title of the work, and the names of the advisor and the committee member.
Tested the tool on standard library test files (This is separate from the initial set of loop-ifs used for the project)
Of the 13 loop-ifs identified, actions were identified for 3; all three were MAX/MIN.
For the remaining 10 loop-ifs, no actions were identified.
Details of the loop code and feature vector values are attached: testing_the_tool_on_standard_library_test_files.pdf
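For the report, the identification rate on this test set follows directly from the counts above: 3 of 13 loop-ifs, roughly 23%. A tiny sketch of the arithmetic, with a made-up class name:

```java
import java.util.Locale;

// Hypothetical helper for reporting the share of loop-ifs with an identified action.
class IdentificationRate {
    // Percentage of loop-ifs for which an action was identified.
    static double rate(int identified, int total) {
        if (total <= 0) throw new IllegalArgumentException("total must be positive");
        return 100.0 * identified / total;
    }

    public static void main(String[] args) {
        // Counts from the standard library test files: 3 identified out of 13.
        System.out.println(String.format(Locale.ROOT, "%.1f%%", rate(3, 13)));
    }
}
```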