Preetha's Page
Notes from Fall 2016 Meetings
Work on code example mining from research articles:
To use for mining code segment comments/descriptions:
Could pull out the sentences in an article that relate to its code segments. Look for sentences whose subject is a method name, using chunking (just partial parsing) to get the subject phrases and verb phrases. Could use the Stanford parser because these are regular text sentences, not code.
Change the dictionary creation from code segments to also estimate the role of each name in the code: is it a method name, a variable name, etc.?
Look at verbs that are in the present or future tense. These sentences are probably describing the code above.
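The two ideas above (a name dictionary that records roles, and picking out sentences whose subject is a method name) can be sketched roughly as below. This is only a minimal sketch under loud assumptions: the regular expressions, the Java-like input, and the function names `build_role_dictionary` and `sentences_about_methods` are all hypothetical, and the "first word of the sentence" check is a crude stand-in for real chunking with a parser.

```python
import re

# Assumption: rough patterns for Java-like code; a real parser would be more reliable.
METHOD_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
VARIABLE_RE = re.compile(
    r"\b(?:int|long|double|boolean|String)\s+([A-Za-z_]\w*)\s*(?=[=;,)])"
)
# Control-flow keywords that look like calls because they are followed by "(".
KEYWORDS = {"if", "for", "while", "switch", "return", "catch", "new"}


def build_role_dictionary(code):
    """Map each name found in a code segment to an estimated role."""
    roles = {}
    for name in VARIABLE_RE.findall(code):
        roles[name] = "variable"
    for name in METHOD_RE.findall(code):
        if name not in KEYWORDS:
            roles[name] = "method"
    return roles


def sentences_about_methods(text, roles):
    """Return sentences whose first word is a known method name.

    Assumption: the first word approximates the subject; chunking would
    replace this check in a real pipeline.
    """
    hits = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = sentence.split()
        if not words:
            continue
        first = words[0].strip("(),.:;")
        if roles.get(first) == "method":
            hits.append(sentence)
    return hits


code = "int count = 0;\ncount = parseInput(line);\nif (count > 0) { printResult(count); }"
text = (
    "parseInput reads one line and returns a count. "
    "The loop is omitted here. "
    "printResult() will print the total."
)
roles = build_role_dictionary(code)
matches = sentences_about_methods(text, roles)
```

The tense filter (present or future verbs) is not covered here; it would need part-of-speech tags from the parser.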
To use for all buggy code examples and the kinds of bugs:
Need to figure out which sentences describe the bug.
Work on identifying definitions from research articles:
From Vijay's bio lit work:
- googlism - who, what, where, when - is a relation
- find what is most common
- from genes, looked for 'is a' in bio literature
- Look at Marti Hearst's 1992 paper - first work on extracting definitions, etc.
- Today, people are trying word embeddings to get relations and comparing them to her approach
Contributions: (A Miner of Definitions from Research Articles)
- Apply this existing tool for “is a” to research articles in a subfield to find terms and their definitions, and where first defined.
- Potential users - a dictionary for software engineering research and NL tools
- Can find tools and what they are used for
Approach:
- googlism approach - generalize beyond looking for 'is a'
- Read Marti Hearst's paper to get ideas
- identify a set of key phrases, positives, etc.
- Samir Gupta has code to do this. Apply his code to a set of ICSE papers and see what you get.
- “such as”, “including a, b, and c”, ... tells me about a, b, and c
- Need to show it can be done and it is scalable to millions
- Goal: make it scalable, not every sentence of every paper.
- Start with 'is a' and look for those sentences. Where do they come from?
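A minimal sketch of the pattern-based approach above: start with 'is a', then generalize to cue phrases such as “such as” and “including”. This is not Samir Gupta's code or Hearst's exact patterns. The regular expressions, the function names, and the example sentences are assumptions made up for illustration. The cheap substring check in `mine` is one possible way to avoid fully processing every sentence of every paper.

```python
import re

# Assumption: a definition sentence starts with the term and uses "is a"/"is an".
IS_A_RE = re.compile(r"^(?P<term>[A-Z][\w\- ]*?) is (?:a|an) (?P<definition>[^.]+)\.")
# Cue phrases that introduce a list of examples.
CUE_RE = re.compile(r",?\s+(?:such as|including)\s+")
# Separators inside the list: ", ", ", and ", " and ", " or ".
ITEM_SEP_RE = re.compile(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def extract_is_a(sentence):
    """Return (term, definition) for an 'X is a Y.' sentence, else None."""
    m = IS_A_RE.match(sentence.strip())
    return (m.group("term"), m.group("definition")) if m else None


def extract_examples(sentence):
    """Return (text before the cue, list of items after it), else None."""
    parts = CUE_RE.split(sentence.strip().rstrip("."), maxsplit=1)
    if len(parts) != 2:
        return None
    head, tail = parts
    items = [i.strip() for i in ITEM_SEP_RE.split(tail) if i.strip()]
    return head, items


def mine(sentences):
    """Collect definitions and example lists, skipping sentences without a cue."""
    definitions, examples = [], []
    for s in sentences:
        # Cheap filter first, so most sentences never reach the regexes.
        if " is a" not in s and "such as" not in s and "including" not in s:
            continue
        found = extract_is_a(s)
        if found:
            definitions.append(found)
        listed = extract_examples(s)
        if listed:
            examples.append(listed)
    return definitions, examples


# Hypothetical sentences, not taken from any paper.
sentences = [
    "A code clone is a fragment of source code that is similar to another fragment.",
    "We evaluated several tools, including a, b, and c.",
    "The results are shown in Table 2.",
]
definitions, examples = mine(sentences)
```

The text before the cue is returned whole; narrowing it to the noun phrase that names the category would need chunking.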
Spring/Summer 2016:
Identifying/Characterizing Facts and Advice From Mixed Text-Code Artifacts:
Fall 2015:
Summer 2015 Preliminary Research Project
Spring 2015:
Literature/Bib on Text Analysis of Software Engineering
Fall 2014:
- Summaries of papers read: