
Funding Overview

The research in the Software Analysis and Compilation Lab at the University of Delaware is funded through research grants and student fellowships. Throughout the years, funding has been provided by:

Current Research Grants

NSF 0702401 Applying and Integrating Natural Language Processing Analysis of Programs to Aid in Software Maintenance and Evolution

PI: Lori Pollock; Co-PI: Vijay Shanker

Due to the unprecedented size and complexity of modern software and increased code reuse, between 60% and 90% of resources are devoted to modifying an application to meet new requirements and to fix discovered bugs. To locate bugs or modify an application, developers must identify the high-level idea, or concept, to be changed and then locate, comprehend, and carefully modify the concept's concern, or implementation, in the code. Software engineers increasingly rely on available software tools to automate maintenance tasks as much as possible; however, despite all of the available automated support, recent studies have shown that more development time is spent reading, locating, and comprehending code than actually writing code.

We believe that software maintenance tools can be significantly improved by adapting natural language processing (NLP) to source code analysis, and integrating information retrieval (IR), NLP, and traditional program analysis techniques to manage program complexity. Our research focuses on exploiting the natural language information that is embedded in the identifiers, literals, comments, and bug reports of a program in order to develop analyses and integrated tools to assist software maintenance, including program understanding, debugging, and aspect mining.

At the University of Delaware, we have successfully developed a natural language-based program representation and used this representation as the basis for analysis driving a concern location and understanding tool. In an experimental study comparing our NLP-based approach with a state-of-the-art lexical search tool and a commercial IR tool across nine concern location tasks derived from open source bug reports, our tool produced more effective queries more consistently than either competing search tool with similar user effort. However, we now believe that NLP analysis can be used even more effectively if combined with IR techniques and program analysis. Based on the frequency of occurrences of words in files, IR techniques have been applied to locate concepts and reconstruct documentation traceability in source code. However, IR methods do not extract or consider crucial information regarding relationships between terms that NLP analysis can provide. Program navigation through structural program links has been shown to perform well at refining a set of modules in a mostly discovered concern, but has difficulties in discovering an entirely new concern. Therefore, our observations raised two important questions: By improving the extraction of natural language information from source code and integrating NLP with IR and traditional program analysis, (1) how much improvement ultimately can be made in search and navigation tools, and (2) how can this integration be adapted to improve other software maintenance and evolution tools?

Intellectual Merit. We will contribute to the state of the art by developing analyses and integrated tools for easing program maintenance and evolution tasks by applying NLP to source code analysis and integrating NLP, IR, and traditional program analysis. We will focus on the set of tools that share a common challenge of identifying scattered code segments that are somehow related. The research community agrees that object-oriented programming, which can be viewed as noun-oriented, causes certain concerns of interest during program maintenance to become scattered, and we argue that many of these scattered concerns are action-oriented because of the natural tension between objects and actions. The major challenge is to extract useful clues about relationships among scattered code segments, beyond the capability of traditional static and dynamic program analysis. Our previous work in NLP analysis over source code demonstrates how action-oriented concerns can be located with natural language clues extracted from the code. The intellectual contributions of this research will be:

  • advanced techniques for comprehensive extraction of natural language information from source code,
  • a novel program model exposing NLP-based and IR-based relationships between code segments of a program,
  • automatic techniques that integrate NLP, IR, and traditional program analyses specialized to concern location, bug localization, and aspect mining,
  • the application of machine learning to account for multiple features in making aspect mining decisions, and
  • evaluation including user studies involving professional programmers at a local R&D company.

Broader Impacts. The proposed research will lead to the advancement of our theory and development of practical tools that provide automatic or semiautomatic assistance in software development and maintenance. This, in turn, should decrease maintenance time and help to increase the quality of software. The results, collection of concern benchmarks, and developed tools will be disseminated through publications and a website. A new unit on code maintenance using our techniques will be developed for the undergraduate Java course. Open-ended projects with our tools also will be incorporated into the software engineering course. Undergraduates will be trained in research through Science and Engineering Scholars, Honors theses, independent studies, summer research, and the CRA Distributed Mentor project for women, in all of which Pollock has a long track record of participation.

REU Supplements

03/05/2009 2 Undergraduates

11/06/2007 2 Undergraduates

NSF 0509170 CSR - AES: An Integrated Approach to Improving Communication Performance in Clusters

PI: Martin Swany; Co-PI: Lori Pollock

The project will develop an integrated approach to improving communication performance in clusters. Cluster computing has become a common, cost-effective means of parallel computing. Although adding more CPUs increases the cluster's maximum processing power, real applications often cannot efficiently use very large numbers of CPUs, due to lack of scalability. In regular codes, the main impediment to achieving scalability is the communication overhead, which increases as the number of CPUs increases. Most of the optimization methods proposed target specialized hardware or programming languages, require specialized knowledge from the domain scientist, or are not enough to provide a comprehensive solution on their own, and they do not adequately address the challenges of the layers of communication software between the sender processes and the receiver processes. The potential for improved overall performance of these applications remains largely untapped due to (1) the need for knowledge of the context of the communication operations to exploit the sophisticated network technology fully, and (2) the low-level nature of the programming needed within the application program context to achieve that potential. In particular, performance can often be improved through increasing the use of lightweight asynchronous communication. Unfortunately, programming with asynchronous communication is difficult and error-prone, even for the most experienced programmers.

The project will pursue a vertically integrated approach, where a set of optimizations in the compiler, network, and operating system can enable legacy parallel applications to scale to a much larger number of CPUs, even if written without any knowledge of our techniques. An experimental prototype and preliminary experiments with real scientific applications show that significant performance improvements are possible with a vertically integrated approach where knowledge of the context of communication operations is joined with knowledge of the network and cluster details to provide a fine-grained strategy for overlapping communication and computation. Based on these initial promising results, the overall goal of this proposed research is to create a means for scalable cluster computing through enabling integrated knowledge and cooperation between the source optimizer, operating system, and network technology of the cluster, without relying on the programmer to learn about the low-level details of the cluster communications system.

NSF 0720712 Collaborative: CSR-AES: System Support for Auto-tuning MPI Applications

PI: Martin Swany; Co-PI: Lori Pollock

The AToMS (Automatic Tuning of MPI Software) project is investigating a software system that can automatically improve the performance of large-scale scientific applications. Scientific codes that demand more and more computing resources are critical to modern science, but too often scientists must spend time constructing programs that run fast at the expense of doing their primary research. As computers contain an increasing number of computing elements, the problem worsens. The goal of the AToMS project is to begin to address this issue by applying automatic application tuning.

An optimizing compiler transforms programs into semantically equivalent ones that perform better. When dealing with any complicated architecture, it is difficult to know which transformations will improve performance. Auto-tuning takes the approach of trying many transformations and empirically evaluating the resulting versions. AToMS performs this auto-tuning with a combination of a static-analysis-based code transformation engine (called ASPhALT) and runtime support in the OpenMPI library. The combination of compile-time and run-time support allows for code restructuring to overlap computation and communication and the creation of optimized data-packing routines. In addition, code can be generated to take advantage of multicore processor architectures.
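The general idea of empirical auto-tuning described above can be illustrated with a minimal sketch. This is not the ASPhALT engine or any part of the AToMS system; the three variant functions and the `autotune` helper are hypothetical names invented for illustration, and the "transformations" are hand-written equivalent versions of a toy computation rather than compiler-generated code.

```python
import timeit


# Three hand-written, semantically equivalent variants of one computation
# (hypothetical stand-ins for compiler-generated program versions).
def sum_squares_loop(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_builtin(n):
    return sum(i * i for i in range(n))


def sum_squares_formula(n):
    # Closed form for 0^2 + 1^2 + ... + (n-1)^2.
    return (n - 1) * n * (2 * n - 1) // 6


VARIANTS = [sum_squares_loop, sum_squares_builtin, sum_squares_formula]


def autotune(variants, arg, repeats=3, number=5):
    """Time each variant on `arg` and return (fastest_variant, best_time)."""
    reference = variants[0](arg)
    best_fn, best_time = None, float("inf")
    for fn in variants:
        # Skip any variant that disagrees with the reference result
        # on this input; it is not a valid replacement.
        if fn(arg) != reference:
            continue
        # Take the minimum over several runs to reduce timing noise.
        elapsed = min(timeit.repeat(lambda: fn(arg), repeat=repeats, number=number))
        if elapsed < best_time:
            best_fn, best_time = fn, elapsed
    return best_fn, best_time


best, seconds = autotune(VARIANTS, 10_000)
print(f"fastest variant: {best.__name__} ({seconds:.6f} s)")
```

Which variant wins depends on the machine and input size, which is the point of measuring rather than predicting.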

Intellectual Merit: The merit of the proposed project is in gaining understanding about what is required to support automatically tunable MPI programs. Broader Impacts: This project will impact the high-performance and scientific computing community and users of parallel computers by making it easier to achieve good performance.

funding.1246537976.txt.gz · Last modified: 2009/07/02 08:32 by sprenkle
  • 213 Smith Hall   •   Computer & Information Sciences   •   Newark, DE 19716  •   USA
    Phone: 302-831-6339  •   Fax: 302-831-8458