Extracting Source Code from E-Mails
Authors: Alberto Bacchelli, Marco D'Ambros, Michele Lanza
Problem: Mine archived emails to support program comprehension activities and provide views of a software system that are alternative and complementary to those offered by the source code.
Importance: Programmers who need to know the design rationale behind an implementation have to communicate with other developers, because the information stored in different artifacts (e.g., source code, design documents, bug reports, chat logs) emphasizes different aspects of the system's evolution.
Approach:
Classify emails that contain source code.
Extract the source code pieces inside the emails.
Previous work:
Work by Bettenburg: used an island parser to extract code snippets from bug reports, with almost perfect results (precision P = 0.98, recall R = 0.99).
However, using a parser to extract source code from emails has drawbacks:
1. High computational effort.
2. Scaling up to entire archives might be difficult.
3. Mailing list messages, being natural language documents, are more prone to noise.
Hence, the authors devised lightweight, easy-to-implement approaches.
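To give a feel for what a lightweight approach of this kind might look like, the sketch below flags a line as code when it ends with a character typical of Java statements, and treats an email as code-bearing if at least one line is flagged. The end-of-line heuristic, the character set, and the names `CODE_LINE_END`, `extract_code_lines` and `contains_code` are assumptions made for this illustration; these notes do not specify the authors' exact techniques.

```python
import re
from typing import List

# Assumption: Java-like lines usually end with ';', '{' or '}'.
CODE_LINE_END = re.compile(r"[;{}]\s*$")


def extract_code_lines(body: str) -> List[str]:
    """Return the lines of an email body that look like source code."""
    return [line for line in body.splitlines() if CODE_LINE_END.search(line)]


def contains_code(body: str) -> bool:
    """Classify an email as code-bearing if at least one line looks like code."""
    return bool(extract_code_lines(body))


# Hypothetical email body, used only to exercise the heuristic.
email = """Hi all,
I think the bug is in this method:
public int size() {
    return count;
}
Any thoughts?"""

print(contains_code(email))
print(extract_code_lines(email))
```

A heuristic like this needs only one pass over each line, which is what makes it cheap compared to parsing, but it will misfire on prose that happens to end in a semicolon, so its output would still need to be checked against manually labeled emails.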
Benchmark:
Five open source Java projects with different development paradigms. Emails were randomly picked from each project and inspected manually; a table shows the percentage of emails containing code.
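The sampling and counting step can be sketched roughly as follows. The function names, the fixed seed and the example labels are hypothetical and serve only to show the arithmetic; they are not figures or procedures taken from the paper.

```python
import random
from typing import Dict, Iterable, List


def sample_email_ids(all_ids: Iterable[int], k: int, seed: int = 0) -> List[int]:
    """Pick k distinct email ids at random (seeded so the sample is repeatable)."""
    rng = random.Random(seed)
    return rng.sample(list(all_ids), k)


def percent_with_code(labels: Dict[int, bool]) -> float:
    """Percentage of manually labeled emails that were marked as containing code."""
    if not labels:
        return 0.0
    return 100.0 * sum(labels.values()) / len(labels)


# Hypothetical manual annotations: email id -> "contains code?"
manual_labels = {101: True, 102: False, 103: False, 104: True}

sampled = sample_email_ids(range(1000), 10)
print(len(sampled))
print(percent_with_code(manual_labels))
```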
Work done:
Developed a custom web application, 'Miler', with:
a) Systems - list of software systems loaded and to be analyzed
b) Mails - number of emails that have been read
c) Retrieve any email by its id
d) Email header and body
e) Annotated code fragments