Extracting Problematic API Features from Forum Discussions
Yingying Zhang
Daqing Hou
Problem: identify problematic API design features automatically
Importance/Applications of the technique:
Enable speed reading of forums for problems
Enable queries that search for problematic features
Estimate API hot topics
Approach:
assumption is that frequently discussed hot topics in forums about a given API are problematic API features
assumption is that problematic API features are discussed in negative sentences and their neighbors
approach was to identify negative sentences, then extract features from each negative sentence and its neighbors (+2 and -1 sentences)
negative sentences are identified with Sentiment140, which categorizes sentences as negative, positive, or neutral
then use Stanford NLP tools to extract word phrases that contain API dictionary words
create the dictionary from the Swing tutorial to get the specific API features (closed-world assumption)
developed a tool called Haystack
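To make the pipeline in these notes concrete, here is a minimal sketch of the windowing and dictionary-matching steps. It is not the Haystack implementation: the keyword-based `is_negative` is a hypothetical stand-in for the Sentiment140 classifier, plain token matching stands in for the Stanford NLP phrase extraction, and `NEGATIVE_CUES`, `API_DICTIONARY`, and the sample post are all invented for illustration. The window is assumed to be one sentence before and two after each negative sentence.

```python
import re
from collections import Counter

# Hypothetical stand-in for a sentiment classifier such as Sentiment140.
NEGATIVE_CUES = {"not", "cannot", "problem", "error", "fails", "wrong"}

# Hypothetical closed-world dictionary (the paper builds its own from the Swing tutorial).
API_DICTIONARY = {"jtable", "jbutton", "jscrollpane", "layout", "listener"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    """Lowercase word tokens of a sentence."""
    return re.findall(r"[a-z']+", sentence.lower())


def is_negative(sentence):
    """Assumption: a sentence is negative if it contains any cue word."""
    return any(t in NEGATIVE_CUES for t in tokens(sentence))


def window_indices(n, i, before=1, after=2):
    """Indices of sentence i plus `before` preceding and `after` following ones."""
    return range(max(0, i - before), min(n, i + after + 1))


def extract_features(text):
    """Count dictionary terms in negative sentences and their neighbors."""
    sentences = split_sentences(text)
    selected = set()  # a set, so overlapping windows are not double counted
    for i, sentence in enumerate(sentences):
        if is_negative(sentence):
            selected.update(window_indices(len(sentences), i))
    counts = Counter()
    for i in sorted(selected):
        counts.update(t for t in tokens(sentences[i]) if t in API_DICTIONARY)
    return counts


post = (
    "I added a JTable to a JScrollPane. The header does not show up. "
    "I tried setting the layout manually. "
    "The listener is attached in the constructor. My JButton code is fine."
)
features = extract_features(post)
print(features.most_common())
```

In this invented post only the second sentence is flagged as negative, so the window covers the first four sentences and `jbutton` in the fifth sentence is left out. Counting term frequency across many posts would then approximate the "hot topics" assumption above.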
Evaluation:
measured precision against a gold set from the Swing API
showed high precision
did not measure recall
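As a reminder of what is and is not being measured here, the sketch below computes set-based precision and recall with exact matching. The feature names and both sets are made up for illustration and are not the paper's data; the paper's actual matching procedure is not reproduced.

```python
def precision(extracted, gold):
    """Fraction of extracted features that appear in the gold set."""
    extracted, gold = set(extracted), set(gold)
    if not extracted:
        return 0.0
    return len(extracted & gold) / len(extracted)


def recall(extracted, gold):
    """Fraction of gold-set features that were extracted."""
    extracted, gold = set(extracted), set(gold)
    if not gold:
        return 0.0
    return len(extracted & gold) / len(gold)


# Hypothetical data: two of the four extracted features are in the gold set.
extracted = {"jtable", "layout", "listener", "jbutton"}
gold = {"jtable", "layout", "jscrollpane"}

p = precision(extracted, gold)  # 2 / 4 = 0.5
r = recall(extracted, gold)     # 2 / 3
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision alone says nothing about how many problematic features were missed, which is why the absence of a recall measurement matters.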
Critique
Strengths:
important problem of extracting information about developers' issues with API usage
use of existing sentiment analysis techniques to identify negative sentences
interesting approach and categorization of sentences into several categories; may be useful for us to automate
Weaknesses:
first author created the gold set and second author confirmed it
precision was measured with an inappropriate match between the gold set and the extracted feature words in sentence sets, so the high precision numbers reported are not really accurate
the premise and title, that the kinds of things being extracted are problematic API features, are not really accurate; the extracted items should be relabeled as things such as how-tos, problems, etc.