**Extracting Problematic API Features from Forum Discussions**
\\ Yingying Zhang
\\ Daqing Hou

[[http://serl.clarkson.edu/site/wp-content/uploads/2013/04/icpc2013.pdf|Link]]

**Problem:** identify problematic API design features automatically

**Importance/Applications of the technique:**

    Enable speed reading of forums for problems
    Enable search queries for problematic features
    Estimate API hot topics

**Approach:**

    assumption 1: hot topics, i.e. those discussed frequently in forums about a given API, are **problematic API features**
    assumption 2: problematic API features are discussed in negative sentences and their neighbors
    approach: identify negative sentences, then extract features from each negative sentence together with the 1 sentence before it and the 2 sentences after it
    sentences are classified as negative, positive, or neutral using Sentiment140
    Stanford NLP tools then extract phrases that contain API dictionary words
    the dictionary of specific API features is built from the Swing tutorial (closed-world assumption)
    Developed a tool called Haystack
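
A minimal sketch of the windowing idea described above, not the authors' Haystack implementation. Assumptions: the keyword-based `is_negative` is a hypothetical stand-in for Sentiment140, `API_DICTIONARY` is a made-up stand-in for the dictionary built from the Swing tutorial, and single-token matching replaces the phrase extraction done with the Stanford NLP tools.

```python
import re

# Hypothetical stand-in for the dictionary built from the Swing tutorial
# (closed world: only these terms can ever be reported as features).
API_DICTIONARY = {"jtable", "jbutton", "jframe", "layout", "listener"}

# Hypothetical cue words; the paper uses Sentiment140, not a keyword list.
NEGATIVE_CUES = {"not", "cannot", "can't", "fails", "error", "problem", "wrong"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    """Lowercased word tokens of a sentence."""
    return re.findall(r"[a-z']+", sentence.lower())


def is_negative(sentence):
    """Stand-in sentiment check: True if any negative cue word occurs."""
    return any(tok in NEGATIVE_CUES for tok in tokens(sentence))


def candidate_window(sentences, i, before=1, after=2):
    """Sentence i plus `before` preceding and `after` following sentences."""
    return sentences[max(0, i - before): i + after + 1]


def extract_features(text):
    """Collect dictionary terms found in the window around negative sentences."""
    sentences = split_sentences(text)
    features = set()
    for i, sentence in enumerate(sentences):
        if is_negative(sentence):
            for neighbor in candidate_window(sentences, i):
                features.update(t for t in tokens(neighbor) if t in API_DICTIONARY)
    return features


post = (
    "I am building a form with a JTable. "
    "The table does not refresh after I update the model. "
    "I added a listener to the model. "
    "I also tried a different layout. "
    "Thanks for any help with my JButton."
)
print(sorted(extract_features(post)))  # ['jtable', 'layout', 'listener']
```

Only the second sentence is flagged as negative here, so terms come from sentences 1 to 4; `jbutton` in the fifth sentence falls outside the window and is not reported.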

**Evaluation:**

    measured precision against a gold set from the Swing API
    showed high precision
    did not measure recall
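
To illustrate why reporting precision alone is incomplete, here is a small sketch of both metrics over sets of feature terms. The `extracted` and `gold` sets are invented for illustration and are not data from the paper; exact set membership is assumed as the matching criterion.

```python
def precision(extracted, gold):
    """Fraction of extracted features that appear in the gold set."""
    extracted, gold = set(extracted), set(gold)
    if not extracted:
        return 0.0
    return len(extracted & gold) / len(extracted)


def recall(extracted, gold):
    """Fraction of gold-set features that were extracted."""
    extracted, gold = set(extracted), set(gold)
    if not gold:
        return 0.0
    return len(extracted & gold) / len(gold)


# Hypothetical example data, not taken from the paper.
extracted = {"jtable", "listener"}
gold = {"jtable", "listener", "jscrollpane", "renderer"}

print(f"precision = {precision(extracted, gold):.2f}")  # precision = 1.00
print(f"recall    = {recall(extracted, gold):.2f}")     # recall    = 0.50
```

In this made-up case every extracted feature is correct, yet half of the gold-set features are missed, which is exactly what an evaluation without recall cannot reveal.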

**Critique**

    Strengths:
        addresses the important problem of extracting information about developers' issues with API usage
        reuses existing sentiment analysis techniques to identify negative sentences
        interesting approach and categorization of sentences into several categories - maybe useful for us to automate

    Weaknesses:
        first author created the gold set and the second author confirmed it
        precision was measured with an inappropriate match between the gold set and the extracted feature words in the sentence sets, so the reported high precision numbers are not reliable
        the premise and title are not really accurate: the extracted items are not necessarily problematic API features and should be relabeled as, e.g., how-tos, problems, etc.