

NLPA: Natural Language Program Analysis

Throughout the life cycle of an application, between 60% and 90% of resources are devoted to modifying the application to meet new requirements and to fix faults [1]. Building effective software tools is important to reduce these high maintenance costs. In our research, we have observed strong evidence that program literals, identifiers, and comments contain many natural language clues that could be leveraged to increase the effectiveness of software tools.

Our research group has been investigating how best to extract and use natural language clues from code. We call this kind of analysis Natural Language Program Analysis (NLPA) because it combines natural language processing techniques with traditional program analysis to extract natural language information from the identifiers, literals, and comments of a program. Using NLPA, we have developed techniques and integrated tools that assist in software maintenance tasks, including program understanding, navigation, debugging, and aspect mining.
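
As an illustration of the kind of raw material NLPA starts from, the sketch below turns identifiers and comments into lowercase words. It is a minimal, hypothetical example: the function names `split_identifier` and `comment_words` and the simple camelCase/underscore splitting heuristic are assumptions for illustration, not the group's published splitting technique.

```python
import re

# Hypothetical helper: split a camelCase or snake_case identifier into words.
def split_identifier(identifier: str) -> list[str]:
    words = []
    # First break on underscores, non-word characters, and digits.
    for part in re.split(r"[_\W\d]+", identifier):
        # Then break camelCase runs; an all-caps run such as "HTML"
        # is kept together when it precedes a capitalized word.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part))
    return [w.lower() for w in words]

# Hypothetical helper: pull plain words out of a source comment.
def comment_words(comment: str) -> list[str]:
    return [w.lower() for w in re.findall(r"[A-Za-z]+", comment)]

if __name__ == "__main__":
    print(split_identifier("addAuctionItem"))   # ['add', 'auction', 'item']
    print(split_identifier("parseHTMLFile"))    # ['parse', 'html', 'file']
    print(comment_words("// Adds an item to the auction"))
```

Real identifiers are often abbreviated or written without case boundaries (e.g. `additem`), which this heuristic does not handle; it only shows where the natural language words come from.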

Thus far, we have focused on NLPA tools that identify scattered code segments that are related to one another, whether the goal is to search through code to understand how a particular concern is implemented, to mine aspects, or to isolate the location of a bug. Our existing NLPA tools combine program structure information, such as calling relationships and code clone analysis, with the natural language of comments, identifiers, and maintenance requests. Although we have only begun to explore the potential of NLPA, our experimental results motivate further investigation of NLPA for software tools.
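
To make the idea of combining calling relationships with natural language concrete, here is a toy sketch. It assumes a call graph given as a dict mapping each caller to its callees and a maintenance request given as a plain string. The word-overlap score, the 0.3 threshold, and the names `relevance` and `related_methods` are all hypothetical; this is not the scoring used by the group's tools (such as Dora), only an illustration of walking call edges while filtering by shared words.

```python
import re
from collections import deque

# Hypothetical: words in a method name (simple camelCase split).
def name_words(method: str) -> set[str]:
    return {w.lower() for w in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", method)}

# Hypothetical score: fraction of query words that appear in the method name.
def relevance(query: str, method: str) -> float:
    q = {w.lower() for w in re.findall(r"[A-Za-z]+", query)}
    return len(q & name_words(method)) / len(q) if q else 0.0

def related_methods(query: str, call_graph: dict[str, list[str]],
                    seed: str, threshold: float = 0.3) -> set[str]:
    # Treat call edges as undirected so both callers and callees are explored.
    neighbors: dict[str, set[str]] = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            neighbors.setdefault(caller, set()).add(callee)
            neighbors.setdefault(callee, set()).add(caller)
    # Breadth-first walk from the seed, keeping only neighbors whose
    # names share enough words with the maintenance request.
    result = {seed}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        for n in sorted(neighbors.get(current, ())):
            if n not in result and relevance(query, n) >= threshold:
                result.add(n)
                queue.append(n)
    return result

if __name__ == "__main__":
    cg = {"main": ["addAuctionItem"],
          "addAuctionItem": ["validateItem", "logMessage"],
          "validateItem": ["checkItemPrice"]}
    print(sorted(related_methods("add auction item", cg, "addAuctionItem")))
```

With this toy input, `logMessage` and `main` are reachable through calls but share no words with the request, so only `validateItem` and `checkItemPrice` are added to the seed. Structure alone would return all five methods; words alone would miss how they connect.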

We believe that NLPA can be used to (a) increase the accuracy of software search tools by providing a natural language description of program artifacts to search, (b) increase the ability of program navigation tools to recommend related procedures by providing natural language clues, and (c) increase the accuracy of other program analyses by providing access to natural language information.

References
[1] L. Erlikh. Leveraging legacy system dollars for e-business. IT Professional, 2(3):17-23, 2000.

Selected Publications

Emily Hill, Lori Pollock, and K. Vijay-Shanker, “Exploring the Neighborhood with Dora to Expedite Software Maintenance”, International Conference on Automated Software Engineering (ASE 2007), November 2007.

Lori Pollock, K. Vijay-Shanker, David Shepherd, Emily Hill, Zachary P. Fry, and Kishen Maloor, “Introducing Natural Language Program Analysis”, Research Group Presentation at the Workshop on Program Analysis for Software Tools and Engineering (PASTE 2007), June 2007.

David Shepherd, Zachary P. Fry, Emily Hill, Lori Pollock, and K. Vijay-Shanker, “Using Natural Language Program Analysis to Locate and Understand Action-Oriented Concerns”, International Conference on Aspect-Oriented Software Development (AOSD 2007), March 2007.

David Shepherd, Lori Pollock, and K. Vijay-Shanker, “Towards Supporting On-Demand Virtual Remodularization Using Program Graphs”, International Conference on Aspect-Oriented Software Development (AOSD 2006), March 2006.


Contributors

Last modified: 2009/07/06 09:40 by pollock