Useful Links

The sources below describe techniques for identifying code elements within mixed documents:

Bacchelli et al. 2011: Uses naming conventions and capitalization, such as camel casing, to identify fragments. The authors state that they use a context-free grammar to identify structured fragments, but do not specify how, nor do they give example entries from their CFG.

Dagenais & Robillard 2012: A “code-like term” is defined as “a series of characters that matches a pattern associated with a type of code element”, e.g., parentheses for functions, camel casing for types, anchors for XML elements. There are also “code-like term lists”, which are sequences of code-like terms, and “code snippets”, which are “small regions of source code that can be further divided into a list of code-like terms”. Code-like terms, and the lists and snippets built from them, are identified with lightweight techniques based on regular expressions.
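
To make the idea of pattern-per-element-type concrete, here is a minimal sketch of how the three cues mentioned above (parentheses, camel casing, anchors) could be written as regular expressions. The patterns, the labels and the function name find_code_like_terms are assumptions made for illustration; they are not taken from Dagenais & Robillard, who may use different expressions.

```python
import re

# Illustrative patterns only (assumed, not the paper's own expressions).
CODE_LIKE_PATTERNS = [
    # Identifier directly followed by a parenthesised argument list: parse()
    ("function", re.compile(r"\b[A-Za-z_]\w*\([^()\n]*\)")),
    # Name wrapped in angle brackets: <plugin>
    ("xml_element", re.compile(r"<[A-Za-z][\w.-]*>")),
    # Two or more capitalised word parts: HttpClient
    ("type", re.compile(r"\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+\b")),
]


def find_code_like_terms(text):
    """Return (kind, term) pairs in order of appearance in the text."""
    hits = []
    for kind, pattern in CODE_LIKE_PATTERNS:
        for match in pattern.finditer(text):
            hits.append((match.start(), kind, match.group()))
    # Sort by start offset so the result follows reading order.
    return [(kind, term) for _, kind, term in sorted(hits)]


if __name__ == "__main__":
    sample = "Call HttpClient.execute() and add a <plugin> element."
    for kind, term in find_code_like_terms(sample):
        print(kind, term)
```

A sequence of adjacent hits returned by such a function would correspond roughly to a “code-like term list” in the terminology above.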

Rigby & Robillard 2013: Uses naming conventions, camel casing and lightweight techniques based on regular expressions, as in Bacchelli et al. 2010. Uses “regular expressions approximated following constructs in the Java Language Specification” to cover qualified terms, package names, variable declarations, qualified variables, method chains, class definitions including inheritance, declarations, method overrides, inner classes, constructors, stack traces, annotations, and exceptions. The regular expressions are ordered from most precise to most flexible.
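
The ordering from most precise to most flexible can be sketched as a list of patterns that is tried top to bottom, where the first match wins. The expressions below are rough, hypothetical approximations of a few of the constructs listed above (stack traces, qualified terms, method chains, annotations); the labels, the dotted_name fallback and the function name classify are assumptions for illustration, not the expressions used by Rigby & Robillard.

```python
import re

# Hypothetical approximations, ordered from most precise to most flexible.
ORDERED_PATTERNS = [
    # Java stack trace frame: at org.example.Foo.bar(Foo.java:42)
    ("stack_trace",
     re.compile(r"at (?:[a-z_]\w*\.)+[A-Z]\w*\.\w+\(\w+\.java:\d+\)")),
    # Package-qualified type: java.util.List
    ("qualified_term", re.compile(r"(?:[a-z_]\w*\.)+[A-Z]\w*")),
    # Receiver followed by one or more calls: builder.append(x).toString()
    ("method_chain", re.compile(r"[a-z_]\w*(?:\.\w+\([^()]*\))+")),
    # Annotation: @Override
    ("annotation", re.compile(r"@[A-Z]\w*")),
    # Camel-cased name: ArrayList
    ("camel_case", re.compile(r"[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+")),
    # Most flexible fallback (assumed): any dotted name, e.g. foo.bar
    ("dotted_name", re.compile(r"\w+(?:\.\w+)+")),
]


def classify(token):
    """Return the label of the first pattern matching the whole token."""
    for label, pattern in ORDERED_PATTERNS:
        if pattern.fullmatch(token):
            return label
    return None


if __name__ == "__main__":
    for token in ["java.util.List", "foo.bar", "@Override", "hello"]:
        print(token, "->", classify(token))
```

In this sketch the token java.util.List would also match the flexible dotted_name pattern, but because the more precise qualified_term pattern comes first, it gets the more specific label.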