Differences

This shows you the differences between two versions of the page.

--- resarch:nlpa:paper_2 [2014/09/18 18:22]
preethac created
+++ resarch:nlpa:paper_2 [2014/09/25 16:10] (current)
preethac
@@ Line 4: / Line 4: @@
 [[http://serl.clarkson.edu/site/wp-content/uploads/2013/04/icpc2013.pdf|Link]]
+**Problem:** identify problematic API design features automatically
+**Importance/Applications of the technique:**
+    Enable speed reading of forums for problems
+    Enable search queries for problematic features
+    Estimate API hot topics
+**Approach:**
+    assumption is that topics discussed frequently in forums about a given API (hot topics) are problematic API features
+    assumption is that problematic API features are discussed in negative sentences and their neighbors
+    approach was to identify negative sentences, then extract features from each negative sentence and its +2 and -1 neighboring sentences
+    negative sentences are identified with Sentiment140, which categorizes sentences as negative, positive, or neutral
+    then use Stanford NLP tools to extract word phrases that contain API dictionary words
+    create dictionary from the Swing tutorial to get the specific API features - closed world assumption
+    Developed a tool called Haystack
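The approach steps above can be sketched roughly as follows. This is a minimal illustration, not Haystack itself: the tiny negative-word lexicon is a hypothetical stand-in for Sentiment140, plain substring matching is a stand-in for Stanford NLP phrase extraction, the dictionary entries are made up, and the window of 1 preceding and 2 following sentences is one reading of "+2 and -1".

```python
import re

# Hypothetical stand-ins: a tiny negative-word lexicon replaces Sentiment140,
# and plain substring matching replaces Stanford NLP phrase extraction.
NEGATIVE_WORDS = {"not", "cannot", "fails", "wrong", "problem", "broken"}
# Closed-world dictionary; the notes say it was built from the Swing tutorial.
# These entries are illustrative only.
API_DICTIONARY = {"jtable", "jbutton", "layout manager", "actionlistener"}


def split_sentences(post):
    """Split a forum post into sentences on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", post) if s.strip()]


def is_negative(sentence):
    """Toy sentiment test: true if any lexicon word appears in the sentence."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return bool(words & NEGATIVE_WORDS)


def windows(sentences, before=1, after=2):
    """Yield each negative sentence together with its neighboring sentences."""
    for i, sentence in enumerate(sentences):
        if is_negative(sentence):
            yield sentences[max(0, i - before): i + after + 1]


def extract_features(post):
    """Return dictionary terms mentioned in or near negative sentences."""
    found = set()
    for window in windows(split_sentences(post)):
        text = " ".join(window).lower()
        found.update(term for term in API_DICTIONARY if term in text)
    return found


post = (
    "I am building a small editor. My JTable is inside a scroll pane. "
    "The table does not repaint after I add a row. "
    "I already tried calling revalidate. I also use a custom layout manager. "
    "Thanks in advance."
)
print(sorted(extract_features(post)))
```

Because the dictionary is closed, any problematic feature that is not in it can never be extracted, which is the cost of the closed world assumption noted above.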
+**Evaluation:**
+    measured precision against gold set from Swing API
+    showed high precision
+    did not measure recall
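To make the evaluation gap concrete, here is a small sketch of precision and recall against a gold set. The feature names and both sets are invented for illustration and are not the paper's data; the function names are hypothetical.

```python
def precision(extracted, gold):
    """Fraction of extracted features that appear in the gold set."""
    extracted, gold = set(extracted), set(gold)
    return len(extracted & gold) / len(extracted) if extracted else 0.0


def recall(extracted, gold):
    """Fraction of gold-set features that were extracted."""
    extracted, gold = set(extracted), set(gold)
    return len(extracted & gold) / len(gold) if gold else 0.0


# Made-up example sets, not results from the paper.
extracted = {"jtable", "jbutton", "layout manager"}
gold = {"jtable", "layout manager", "actionlistener", "jframe"}

print(precision(extracted, gold))
print(recall(extracted, gold))
```

In this made-up case two of the three extracted features are correct, but only half of the gold set is found. Precision alone cannot show how many problematic features a tool misses, which is why the missing recall measurement matters.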
+**Critique**
+    Strengths:
+        important problem of extracting information about developers' issues with API usage
+        use of existing sentiment analysis techniques to identify negative sentences
+        interesting approach and categorization of sentences into several categories - maybe useful for us to automate
+    Weaknesses:
+        first author created gold set and second author confirmed it
+        precision was measured with an inappropriate match between gold set and extracted feature words in sentence sets, so the high precision numbers reported are not reliable
+        the premise and title, that the extracted items are problematic API features, are not really accurate; the items should be relabeled as things such as how-tos, problems, etc.