Differences
This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
resarch:nlpa:paper_4 [2014/10/30 12:40] preethac |
resarch:nlpa:paper_4 [2014/10/30 15:11] (current) preethac |
||
---|---|---|---|
Line 9: | Line 9: | ||
**Approach:** | **Approach:** | ||
- | \\ Classify emails containing source code | + | \\ __Classify emails containing source code__, using: |
- | \\ Extract the source code pieces inside the emails. | + | \\ 1. Number of occurrences of Java keywords/special characters |
+ | \\ 2. End of line (ends with semicolon) | ||
+ | \\ 3. Method-call patterns, matched with regular expressions. | ||
+ | \\ 4. Beginning of block. | ||
+ | \\ __Extract the source code pieces inside the emails.__ | ||
**Previous work:** | **Previous work:** | ||
Line 32: | Line 36: | ||
\\ d) Email header and body | \\ d) Email header and body | ||
\\ e) Annotated code fragments | \\ e) Annotated code fragments | ||
+ | |||
+ | |||
+ | **Evaluation:** | ||
+ | \\ Precision: fraction of retrieved lines that contain code. | ||
+ | \\ Recall: fraction of the actual code lines that are retrieved. | ||
+ | \\ Instead of emphasizing either P or R, they use a beta value to weight precision against recall, which I find really logical. | ||
+ | \\ To assess the effectiveness of the approach, they also use the Levenshtein distance (an edit distance function), which gives the minimum number of changes (in lines) between the text labeled as source code in the benchmark and the extracted fragments. | ||
+ | |||
+ | **Critique:** | ||
+ | \\ 1. Bad: They rely only on the number of occurrences of Java keywords or Java special characters, which restricts the approach to emails discussing the development of Java projects. It also means they must already have a database of Java keywords/special characters. | ||
+ | \\ 2. Good: As P and R trade off against each other, they devise an approach in which, by varying the threshold, they can obtain either perfect P or perfect R. | ||
+ | \\ 3. Question: How are they providing alternative views of the software system? They are just extracting source code pieces from developer emails. | ||
+ | |||
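
The four line-level checks listed under Approach, and the precision/recall/beta weighting listed under Evaluation, can be sketched roughly as below. This is only an illustration of the idea, not the paper's implementation: the keyword list, the regular expression, the thresholds, the function names and the sample email are all assumptions made for this note, and the paper may combine the checks differently. The F-beta formula is the standard one.

```python
import re

# Assumed, deliberately incomplete keyword list (the paper's list is not given in these notes).
JAVA_KEYWORDS = {"public", "private", "class", "static", "void", "int",
                 "return", "new", "if", "else", "for", "while", "import"}
SPECIAL_CHARS = set("{}();=")

# Assumed pattern: an identifier (possibly dotted) immediately followed by "(".
METHOD_CALL = re.compile(r"\b[A-Za-z_][\w.]*\(")


def heuristics(line, min_hits=1):
    """Return which of the four checks fire for one line of an email."""
    s = line.strip()
    words = re.findall(r"[A-Za-z_]\w*", s)
    hits = sum(w in JAVA_KEYWORDS for w in words) + sum(c in SPECIAL_CHARS for c in s)
    return {
        "keywords": hits >= min_hits,                    # 1. keywords/special characters
        "end_of_line": s.endswith(";"),                  # 2. ends with semicolon
        "method_call": METHOD_CALL.search(s) is not None,  # 3. method-call pattern
        "block_start": s.endswith("{"),                  # 4. beginning of block
    }


def is_code(line, threshold=2):
    """Label a line as code when at least `threshold` checks fire."""
    return sum(heuristics(line).values()) >= threshold


def f_beta(predicted, actual, beta=1.0):
    """Return (precision, recall, F-beta) for two parallel lists of booleans."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    retrieved, relevant = sum(predicted), sum(actual)
    precision = tp / retrieved if retrieved else 0.0
    recall = tp / relevant if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    return precision, recall, (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up email and hand labels (True = line is source code).
email_lines = [
    "Hi all,",
    "I think the bug is in the parser (see below).",
    "public void parse(String s) {",
    "    int n = s.length();",
    "}",
    "Thanks, Bob",
]
labels = [False, False, True, True, True, False]

for threshold in (1, 2):
    predicted = [is_code(line, threshold) for line in email_lines]
    print(threshold, f_beta(predicted, labels, beta=1.0))
```

On this toy email the loose threshold also picks up the prose line with parentheses (lower P, full R), while the stricter threshold drops the lone closing brace (full P, lower R), which is the trade-off noted in Critique point 2.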