Differences
This shows you the differences between two versions of the page.
|
resarch:nlpa:preetha [2016/08/31 10:07] pollock |
resarch:nlpa:preetha [2017/02/16 14:27] (current) preethac |
||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== Preetha's Page ====== | ====== Preetha's Page ====== | ||
| - | ====== Notes from Fall 2016 Meetings ====== | + | ** |
| - | + | Spring 2017:** | |
| - | Work on code example mining from research articles: | + | |
| - | + | ||
| - | To use for mining code segment comments/descriptions: | + | |
| - | + | ||
| - | Could pull out sentences related to the code segments in the articles. | + | |
| - | Look for sentences whose subject is a method name, by chunking (partial parsing only) to get the subject phrases and verb phrases. Could use the Stanford parser because these are regular text sentences, not code. | + | |
| - | + | ||
| - | Change the dictionary creation from code segments to also estimate the role of each name in the code: is it a method name, a variable name, etc.? | + | |
| - | + | ||
| - | Look at verbs in the present or future tense. These probably describe the code above. | + | |
| - | + | ||
| - | To use for all buggy code examples and the kinds of bugs: | + | |
| - | + | ||
| - | Need to figure out which sentences tell you the bug. | + | |
| - | + | ||
| - | Work on identifying definitions from research articles: | + | |
| - | + | ||
| - | From Vijay's bio lit work: | + | |
| - | * googlism - who, what, where, when - is a relation | + | |
| - | * find what is most common | + | |
| - | * from genes, looked for 'is a' in bio literature | + | |
| - | * Look at Marti Hearst - 1992 first work on extracting definitions, etc | + | |
| - | * Today, people are trying word embeddings to get relations and comparing them to her approach | + | |
| - | + | ||
| - | Contributions: (A Miner of Definitions from Research Articles) | + | |
| - | * Apply this existing tool for "is a" to research articles in a subfield to find terms and their definitions, and where first defined. | + | |
| - | * Potential users - dictionary for software engineering research and NL tools | + | |
| - | * Can find tools and what they are used for | + | |
| - | Approach: | + | [[http://hiper.cis.udel.edu/udsacl/doku.php/research/nlpa/preethadissertation|Dissertation]] |
| - | * googlism approach - generalize beyond looking for 'is a' | + | |
| - | * Read Marti Hearst's paper to get ideas | + | |
| - | * identify a set of key phrases, positives, etc. | + | |
| - | * Samir Gupta has code to do this. Apply his code to a set of ICSE papers and see what you get. | + | |
| - | * "such as", ..., "including a, b, and c" tells me about a, b, and c | + | |
| - | * Need to show it can be done and it is scalable to millions | + | |
| - | * Goal: make it scalable; do not process every sentence of every paper. | + | |
| - | * Start with 'is a' and look for those sentences. Where do they come from? | + | |
| ** | ** | ||
| - | Spring/Summer 2016:** | + | Spring/Summer/Fall 2016:** |
| Identifying/Characterizing Facts and Advice From Mixed Text-Code Artifacts: | Identifying/Characterizing Facts and Advice From Mixed Text-Code Artifacts: | ||