Overview

Research in the Software Analysis and Compilation Lab at the University of Delaware is funded through research grants and student fellowships. Over the years, funding has been provided by:

  • NSF - The National Science Foundation
  • CTA - Army Research Lab Collaborative Technology Alliance
  • ARL - The Army Research Laboratory
  • CRA-W - The Computing Research Association's Committee on the Status of Women in Computing Distributed Mentoring Program (DMP, DREU)

Summaries of Current Research Grants

NSF 0702401 Applying and Integrating Natural Language Processing Analysis of Programs to Aid in Software Maintenance and Evolution

Due to the unprecedented size and complexity of modern software and increased code reuse, 60-90% of resources are devoted to modifying an application to meet new requirements and to fix discovered bugs. To locate bugs or modify an application, developers must identify the high-level idea, or concept, to be changed and then locate, comprehend, and carefully modify the concept's concern, or implementation, in the code. Software engineers increasingly rely on available software tools to automate maintenance tasks as much as possible; however, despite all of the available automated support, recent studies have shown that more development time is spent reading, locating, and comprehending code than actually writing code.

We believe that software maintenance tools can be significantly improved by adapting natural language processing (NLP) to source code analysis, and integrating information retrieval (IR), NLP, and traditional program analysis techniques to manage program complexity. Our research focuses on exploiting the natural language information that is embedded in the identifiers, literals, comments, and bug reports of a program in order to develop analyses and integrated tools to assist software maintenance, including program understanding, debugging, and aspect mining.

At the University of Delaware, we have successfully developed a natural language-based program representation and used this representation as the basis for analysis driving a concern location and understanding tool. In an experimental study comparing our NLP-based approach with a state-of-the-art lexical search tool and a commercial IR tool across nine concern location tasks derived from open source bug reports, our tool produced more effective queries more consistently than either competing search tool with similar user effort. However, we now believe that NLP analysis can be used even more effectively if combined with IR techniques and program analysis. Based on the frequency of occurrences of words in files, IR techniques have been applied to locate concepts and reconstruct documentation traceability in source code. However, IR methods do not extract or consider crucial information regarding relationships between terms that NLP analysis can provide. Program navigation through structural program links has been shown to perform well at refining a set of modules in a mostly discovered concern, but has difficulties in discovering an entirely new concern. Therefore, our observations raised two important questions: By improving the extraction of natural language information from source code and integrating NLP with IR and traditional program analysis, (1) how much improvement ultimately can be made in search and navigation tools, and (2) how can this integration be adapted to improve other software maintenance and evolution tools?
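As a rough illustration of the frequency-based IR side only, the sketch below splits identifiers into words and ranks methods by how often the query words occur in them. It is not the lab's tool: the function names, the sample methods, and the scoring are hypothetical, and it deliberately ignores the relationships between terms that NLP analysis is meant to capture.

```python
import re
from collections import Counter


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def words_in(source):
    """Collect the words of every identifier-like token in a code fragment."""
    words = []
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        words.extend(split_identifier(token))
    return words


def rank(query, methods):
    """Order method names by the summed frequency of the query words.

    `methods` maps a method name to its body text (hypothetical input shape).
    Ties are broken alphabetically so the result is deterministic.
    """
    query_words = query.lower().split()
    scores = {}
    for name, body in methods.items():
        counts = Counter(words_in(name + " " + body))
        scores[name] = sum(counts[w] for w in query_words)
    return sorted(scores, key=lambda n: (-scores[n], n))


# Hypothetical sample input, not taken from any studied system.
sample_methods = {
    "addAuctionItem": "items.add(newItem); itemCount++;",
    "printReport": "out.println(reportText);",
}
ranking = rank("add item", sample_methods)
```

A pure word-count score like this cannot tell "add item" from "item add", which is the kind of gap between terms and their relationships that the paragraph above attributes to IR methods.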

Intellectual Merit. We will contribute to the state of the art by developing analyses and integrated tools for easing program maintenance and evolution tasks by applying NLP to source code analysis and integrating NLP, IR, and traditional program analysis. We will focus on the set of tools that share a common challenge of identifying scattered code segments that are somehow related. The research community agrees that object-oriented programming, which can be viewed as noun-oriented, causes certain concerns of interest during program maintenance to become scattered, and we argue that many of these scattered concerns are action-oriented because of the natural tension between objects and actions. The major challenge is to extract useful clues about relationships among scattered code segments, beyond the capability of traditional static and dynamic program analysis. Our previous work in NLP analysis over source code demonstrates how action-oriented concerns can be located with natural language clues extracted from the code. The intellectual contributions of this research will be:

  • advanced techniques for comprehensive extraction of natural language information from source code,
  • a novel program model exposing NLP-based & IR-based relationships between code segments of a program,
  • automatic techniques that integrate NLP, IR, and traditional program analyses specialized to concern location, bug localization, and aspect mining,
  • the application of machine learning to account for multiple features in making aspect mining decisions, and
  • evaluation including user studies involving professional programmers at a local R&D company.

Broader Impacts. The proposed research will lead to the advancement of our theory and development of practical tools that provide automatic or semiautomatic assistance in software development and maintenance. This, in turn, should decrease maintenance time and help to increase the quality of software. The results, collection of concern benchmarks, and developed tools will be disseminated through publications and a website. A new unit on code maintenance using our techniques will be developed for the undergraduate Java course. Open-ended projects with our tools also will be incorporated into the software engineering course. Undergraduates will be trained in research through Science and Engineering Scholars, Honors theses, independent studies, summer research, and the CRA Distributed Mentor Project for women, in all of which Pollock has a long track record of participation.

funding.1246448643.txt.gz · Last modified: 2009/07/01 07:44 by pollock
  • 213 Smith Hall   •   Computer & Information Sciences   •   Newark, DE 19716  •   USA
    Phone: 302-831-6339  •   Fax: 302-831-8458