Differences

This shows you the differences between two versions of the page.

resarch:nlpa:preetha [2016/08/31 10:07]
pollock
resarch:nlpa:preetha [2017/02/16 14:27] (current)
preethac
Line 1: Line 1:
 ====== Preetha's Page ====== ====== Preetha's Page ======
  
-====== Notes from Fall 2016 Meetings ====== +** 
- +Spring 2017:**
-Work on code example mining from research articles: +
- +
-To use for mining code segment comments/​descriptions:​ +
- +
-Could pull out sentences related to the code segments in the articles. +
-Look for sentences whose subject is a method name, by performing chunking to get the subject phrases and verb phrases (just partial parsing). Could use the Stanford parser because these are regular text sentences, not code. +
- +
-Change the dictionary creation from code segments to also estimate role of the name in the code. Are they method names, variable names, etc? +
- +
-Look at the verbs that are present or future tense. ​ These are probably describing the code above. +
- +
-To use for all buggy code examples and the kinds of bugs: +
- +
-Need to figure out which sentences tell you the bug. +
- +
-Work on identifying definitions from research articles: +
- +
-From Vijay'​s bio lit work: +
-  * googlism - who, what, where, when - is a relation +
-  * find what is most common +
-  * from genes, looked for 'is a' in bio literature +
-  * Look at Marti Hearst - 1992 first work on extracting definitions,​ etc +
-  * Now, today people are trying word embeddings to get relations and compare to her approach +
- +
-Contributions (A Miner of Definitions from Research Articles) +
-  * Apply this existing tool for "is a" to research articles in a subfield to find terms and their definitions, and where first defined. +
-  * Potential users - dictionary for software engineering research and nl tools +
-  * Can find tools and what used for+
  
-Approach +[[http://hiper.cis.udel.edu/udsacl/doku.php/research/nlpa/preethadissertation|Dissertation]]
-  * googlism approach - generalize beyond looking for 'is a' +
-  * Read Marti Hearst'​s paper to get ideas +
-  * identify a set of key phrases, positives, etc. +
-  * Samir Gupta has code to do this. Apply his code to a set of ICSE papers and see what you get. +
-  * "such as", ... "including a, b, and c" tells me about a, b, and c +
-  * Need to show it can be done and it is scalable to millions +
-  * Goal: make it scalable, not every sentence of every paper. +
-  * Start with 'is a', look for those sentences. where do they come from?+
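
As an illustration of the cue-phrase idea in the Approach notes above (start with 'is a', then generalize to cues like "such as" and "including"), here is a minimal Python sketch. It is not Samir Gupta's code and not Hearst's published method. The regular expressions, function names, and sample sentences are all assumptions made up for this example, and the category is crudely taken to be the single word before the cue, where a real miner would use chunking to recover the full noun phrase.

```python
import re

# Hypothetical patterns, for illustration only.
# "X is a Y." -> X is the term, Y is the text of its definition.
IS_A = re.compile(r"^(?P<term>[A-Z][\w\- ]*?) is an? (?P<definition>[^.]+)\.")

# "Y such as a, b, and c." / "Y including a, b, and c."
# Only the single word before the cue is captured as the category.
LIST_CUE = re.compile(
    r"\b(?P<category>[\w\-]+),? (?:such as|including) (?P<items>[^.]+)\."
)

# Splits "a, b, and c" or "a and b" into its members.
ITEM_SEP = re.compile(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def find_is_a(sentence):
    """Return (term, definition) if the sentence opens with 'X is a Y.', else None."""
    match = IS_A.search(sentence)
    if match is None:
        return None
    return match.group("term").strip(), match.group("definition").strip()


def find_list_members(sentence):
    """Return (category, [members]) for a 'such as' / 'including' list, else None."""
    match = LIST_CUE.search(sentence)
    if match is None:
        return None
    items = [part.strip() for part in ITEM_SEP.split(match.group("items"))]
    return match.group("category"), [item for item in items if item]


if __name__ == "__main__":
    print(find_is_a("Slicing is a technique for extracting relevant statements."))
    print(find_list_members(
        "We studied static analyses such as slicing, taint tracking, and alias analysis."
    ))
```

Running each sentence through cheap filters like these before any heavier parsing is one way to avoid processing every sentence of every paper, which is the scalability goal stated in the notes.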
  
 ** **
-Spring/Summer 2016:**+Spring/Summer/Fall 2016:**
  
 Identifying/Characterizing Facts and Advice From Mixed Text-Code Artifacts: Identifying/Characterizing Facts and Advice From Mixed Text-Code Artifacts:
resarch/nlpa/preetha.1472652445.txt.gz · Last modified: 2016/08/31 10:07 by pollock