Extracting Problematic API Features from Forum Discussions
Yingying Zhang
Daqing Hou

Problem: identify problematic API design features automatically

Importance/Applications of the technique:

  Enable speed reading of forums for problems
  Enable search queries for problematic features
  Estimate API hot topics

Approach:

  assumption is that hot topics in forums about a given API, i.e. those discussed frequently, are problematic API features

  assumption is that problematic API features are discussed in negative sentences and their neighbors
  approach was to identify negative sentences, then extract features from each negative sentence and its neighbors (the +2 and -1 sentences)
  Sentiment140 is used to categorize sentences as negative, positive, or neutral
  Stanford NLP tools are then used to extract phrases that contain API dictionary words
  the dictionary of specific API features is built from the Swing tutorial (closed-world assumption)
  Developed a tool called Haystack
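
The steps above can be sketched roughly as follows. This is a minimal illustration, not Haystack itself: the cue-word check is a hypothetical stand-in for Sentiment140, plain substring matching stands in for the Stanford NLP phrase extraction, and the term list is a made-up sample rather than the Swing tutorial dictionary. The window of one sentence before and two after is one reading of the "+2 and -1" note.

```python
import re
from collections import Counter

# Hypothetical stand-ins (assumptions, not from the paper):
# a tiny cue-word list instead of Sentiment140, and a sample
# closed-world dictionary instead of one built from the Swing tutorial.
NEGATIVE_CUES = {"not", "cannot", "can't", "doesn't", "fails", "problem", "wrong", "error"}
API_TERMS = {"jtable", "jbutton", "layout manager", "cell renderer"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_negative(sentence):
    """Stand-in classifier: negative if any cue word occurs in the sentence."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return bool(words & NEGATIVE_CUES)


def window_indices(sentences, before=1, after=2):
    """Indices of each negative sentence plus its neighbors, clipped to bounds."""
    keep = set()
    for i, sentence in enumerate(sentences):
        if is_negative(sentence):
            lo = max(0, i - before)
            hi = min(len(sentences) - 1, i + after)
            keep.update(range(lo, hi + 1))
    return sorted(keep)


def extract_features(text, before=1, after=2):
    """Count dictionary terms that occur inside the selected sentence windows."""
    sentences = split_sentences(text)
    counts = Counter()
    for i in window_indices(sentences, before, after):
        lowered = sentences[i].lower()
        for term in API_TERMS:
            if term in lowered:
                counts[term] += 1
    return counts
```

For a post such as "I am using a JTable. The cell renderer does not repaint. I tried revalidate. Nothing changed. Later I added a JButton. It works fine.", only the second sentence is flagged, so the window covers the first four sentences and the counts include "jtable" and "cell renderer" but not "jbutton". Summing these counts over many posts gives a rough hot-topic ranking.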

Evaluation:

  measured precision against a gold set from the Swing API
  showed high precision
  did not measure recall
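
To make the distinction concrete, here is a small sketch of how precision and recall against a gold set could be computed. Exact matching after case and whitespace normalization is an assumption; the notes do not say how the paper matched extracted features to the gold set. Recall is shown only for contrast, since it was not measured.

```python
def normalize(term):
    """Lowercase and collapse whitespace so 'Cell  Renderer' == 'cell renderer'."""
    return " ".join(term.lower().split())


def precision(extracted, gold):
    """Fraction of extracted features that also appear in the gold set."""
    extracted_set = {normalize(t) for t in extracted}
    gold_set = {normalize(t) for t in gold}
    if not extracted_set:
        return 0.0
    return len(extracted_set & gold_set) / len(extracted_set)


def recall(extracted, gold):
    """Fraction of gold-set features that the extractor found."""
    extracted_set = {normalize(t) for t in extracted}
    gold_set = {normalize(t) for t in gold}
    if not gold_set:
        return 0.0
    return len(extracted_set & gold_set) / len(gold_set)
```

With hypothetical inputs `["JTable", "cell renderer", "JFrame"]` as extracted features and `["jtable", "Cell Renderer", "layout manager", "JButton"]` as the gold set, two of three extracted features match, while only two of four gold features are found. High precision alone therefore says nothing about how many problematic features were missed.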

Critique:

  Strengths:
      addresses an important problem: extracting information about developers' issues with API usage
      reuses existing sentiment analysis techniques to identify negative sentences
      interesting approach and categorization of sentences into several categories; may be useful for us to automate

  Weaknesses:
      the gold set was created by the first author and only confirmed by the second author
      precision was measured with an inappropriate match between the gold set and the extracted feature words in sentence sets, so the reported high precision numbers are not reliable
      the premise and title are not really accurate: what is extracted is not necessarily problematic API features and should be relabeled as things such as how-tos, problems, etc.