resarch:nlpa:paper_4, revised 2014/10/30 15:11 by preethac (previous revision: 2014/10/30 11:24 by preethac)
  
**Problem:** Mine archived emails to support program comprehension activities and provide views of a software system that are alternative and complementary to those offered by the source code.

**Importance:** Programmers who need to know the design rationale behind an implementation have to communicate with other developers, because the information stored in different artifacts (e.g., source code, design documents, bug reports, chat logs) emphasizes different aspects of the system's evolution.

**Approach:**
\\ __Classify emails containing source code__, based on:
\\ 1. the number of occurrences of Java keywords and special characters;
\\ 2. the end of the line (lines ending with a semicolon);
\\ 3. method call patterns, checked with regular expressions;
\\ 4. the beginning of a block.
\\ __Extract the source code pieces inside the emails.__
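
The four heuristics and the extraction step can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword subset, the special-character set, the method-call regular expression, the threshold of 3 and all function names (keyword_score, looks_like_code, extract_fragments, contains_code) are hypothetical. A line is treated as code when any heuristic fires, and consecutive code lines are grouped into one fragment.

```python
import re

# Hypothetical subset of Java keywords (the notes do not give the actual list).
JAVA_KEYWORDS = {
    "boolean", "break", "catch", "class", "else", "extends", "final", "for",
    "if", "implements", "import", "int", "new", "null", "package", "private",
    "protected", "public", "return", "static", "throw", "throws", "try",
    "void", "while",
}
# Hypothetical set of "special characters" typical of Java code.
SPECIAL_CHARS = set("{}();=<>[]")
# Assumed method-call pattern: an identifier (optionally dotted) directly followed by "(...)".
METHOD_CALL = re.compile(r"\b[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*\([^()]*\)")


def keyword_score(line: str) -> int:
    """Heuristic 1: number of Java keywords plus special characters on the line."""
    words = re.findall(r"[A-Za-z_]\w*", line)
    return sum(w in JAVA_KEYWORDS for w in words) + sum(c in SPECIAL_CHARS for c in line)


def looks_like_code(line: str, threshold: int = 3) -> bool:
    """Return True if any of the four heuristics classifies the line as code."""
    s = line.strip()
    if not s:
        return False
    return (
        keyword_score(s) >= threshold            # 1. keywords / special characters
        or s.endswith(";")                       # 2. line ends with a semicolon
        or METHOD_CALL.search(s) is not None     # 3. method call pattern
        or s.endswith("{")                       # 4. beginning of a block
    )


def extract_fragments(body: str) -> list[str]:
    """Group consecutive code lines of an email body into fragments."""
    fragments, current = [], []
    for line in body.splitlines():
        if looks_like_code(line):
            current.append(line)
        elif current:
            fragments.append("\n".join(current))
            current = []
    if current:
        fragments.append("\n".join(current))
    return fragments


def contains_code(body: str) -> bool:
    """An email is classified as containing code if at least one fragment is found."""
    return bool(extract_fragments(body))
```

Because the checks are OR-ed, a single noisy signal (e.g., a prose sentence ending in a semicolon) is enough to mark a line as code; a lone closing brace, on the other hand, matches none of the four checks and therefore ends a fragment.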

**Previous work:**
Work by Bettenburg used an island parser to extract code snippets from bug reports and gave almost perfect results (P=0.98, R=0.99).
However, using a parser to extract source code from emails involves:
\\ 1. high computational effort;
\\ 2. possible difficulty scaling up to whole archives;
\\ 3. more noise, since mailing lists are natural language documents.

Hence, the authors devised lightweight, easy-to-implement approaches.

**Benchmark:**
Emails from 5 open source Java projects with different development paradigms. They randomly picked emails, labeled them manually, and report in a table the percentage of emails containing code.

**Work done:**
They developed a custom web application, 'Miler', with:
\\ a) Systems: the list of software systems loaded and to be analyzed;
\\ b) Mails: the number of emails that have been read;
\\ c) retrieval of any email by its id;
\\ d) the email header and body;
\\ e) annotated code fragments.

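The features listed for Miler imply a small data model. The sketch below is hypothetical (Miler's real code is not described here): the classes Email and MailStore, their fields, and the choice to mark an email as read when it is retrieved are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Email:
    """One archived email: header fields, body and annotated code fragments."""
    email_id: str
    system: str                      # software system (project) the email belongs to
    header: dict[str, str]
    body: str
    read: bool = False
    # Annotations as (first_line, last_line) pairs, 0-based and inclusive.
    code_fragments: list[tuple[int, int]] = field(default_factory=list)


class MailStore:
    """Hypothetical in-memory backend for a Miler-like viewer."""

    def __init__(self) -> None:
        self._emails: dict[str, Email] = {}

    def add(self, email: Email) -> None:
        self._emails[email.email_id] = email

    def systems(self) -> list[str]:
        """a) Systems: software systems loaded for analysis."""
        return sorted({e.system for e in self._emails.values()})

    def read_count(self, system: str) -> int:
        """b) Mails: number of emails of a system that have been read."""
        return sum(e.read for e in self._emails.values() if e.system == system)

    def get(self, email_id: str) -> Email:
        """c) and d) Retrieve an email (header and body) by its id and mark it as read."""
        email = self._emails[email_id]
        email.read = True
        return email

    def annotate(self, email_id: str, first_line: int, last_line: int) -> None:
        """e) Record a code fragment spanning body lines first_line..last_line."""
        self._emails[email_id].code_fragments.append((first_line, last_line))

    def fragment_text(self, email_id: str) -> list[str]:
        """Return the annotated fragments of an email as text."""
        email = self._emails[email_id]
        lines = email.body.splitlines()
        return ["\n".join(lines[a:b + 1]) for a, b in email.code_fragments]
```
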

**Evaluation:**
Precision: fraction of retrieved lines that actually contain code (according to the benchmark).
\\ Recall: fraction of the code lines in the benchmark that are retrieved.
\\ Instead of emphasizing either P or R, they use a β value to weight precision against recall (a weighted F-measure), which I find very sensible.
\\ To assess the effectiveness of the approach, they also use the Levenshtein distance (an edit distance function), which gives the minimum number of changes (in lines) between the text labeled as source code in the benchmark and the extracted fragments.
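
The evaluation measures can be computed at line level as sketched below. Assumptions: lines are identified by their index in the email body, the benchmark gives the set of true code lines, and the Levenshtein distance is taken over sequences of lines (inserting, deleting or substituting a whole line costs 1). The weighted measure is the standard F_β = (1 + β²)·P·R / (β²·P + R), where β above 1 favours recall and β below 1 favours precision. Function names are illustrative.

```python
def precision_recall(extracted: set[int], benchmark: set[int]) -> tuple[float, float]:
    """Line-level precision and recall.

    precision = |extracted & benchmark| / |extracted|
    recall    = |extracted & benchmark| / |benchmark|
    """
    hits = len(extracted & benchmark)
    p = hits / len(extracted) if extracted else 0.0
    r = hits / len(benchmark) if benchmark else 0.0
    return p, r


def f_beta(p: float, r: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of P and R; beta > 1 weights recall more."""
    b2 = beta * beta
    denom = b2 * p + r
    return (1 + b2) * p * r / denom if denom else 0.0


def line_levenshtein(a: list[str], b: list[str]) -> int:
    """Minimum number of line insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, line_a in enumerate(a, start=1):
        curr = [i]
        for j, line_b in enumerate(b, start=1):
            cost = 0 if line_a == line_b else 1
            curr.append(min(prev[j] + 1,          # delete line_a
                            curr[j - 1] + 1,      # insert line_b
                            prev[j - 1] + cost))  # substitute (or keep) the line
        prev = curr
    return prev[-1]
```
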

**Critique:**
\\ 1. Bad: They rely only on the number of occurrences of Java keywords or Java special characters, which restricts the approach to emails discussing the development of Java projects. It also means they need a predefined list (database) of Java keywords and special characters.
\\ 2. Good: Since P and R trade off against each other, they devise an approach in which, by varying a threshold, they can obtain either perfect P or perfect R.
\\ 3. Question: How do they provide alternative views of the software system? They only extract source code pieces from developer emails.
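
Point 2 can be illustrated with a toy threshold sweep. The scores and labels below are invented (they are not results from the paper), and the score stands for any per-line code-likeness measure such as a keyword count: a line is predicted as code when its score reaches the threshold.

```python
def sweep(scores: list[int], is_code: list[bool]) -> list[tuple[int, float, float]]:
    """For each candidate threshold, return (threshold, precision, recall)."""
    total_code = sum(is_code)
    results = []
    for t in sorted(set(scores)):
        predicted = [s >= t for s in scores]
        tp = sum(p and c for p, c in zip(predicted, is_code))
        n_pred = sum(predicted)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / total_code if total_code else 0.0
        results.append((t, precision, recall))
    return results


# Toy data (invented): code-likeness score per line and whether the line is really code.
scores = [0, 1, 1, 2, 3, 4, 5, 6]
is_code = [False, False, True, False, True, True, True, True]

for t, p, r in sweep(scores, is_code):
    print(f"threshold={t}  P={p:.2f}  R={r:.2f}")
```

With these toy values, thresholds 0 and 1 give perfect recall and thresholds of 3 or more give perfect precision, while no single threshold achieves both.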