Useful Links

Here is how relevant sources identify code elements within mixed documents:

Bacchelli et al. 2011: use naming conventions and capitalization (e.g., camel casing) to identify code fragments. They state that they use a context-free grammar to identify structured fragments, but do not specify how, and give no example entries from their CFG.

Dagenais & Robillard 2012: A “code-like term” is defined as “a series of characters that matches a pattern associated with a type of code element”, e.g., parentheses for functions, camel casing for types, anchors for XML elements. There are also “code-like term lists”, which are sequences of code-like terms, and “code snippets”, which are “small regions of source code that can be further divided into a list of code-like terms”. Code-like terms, and the lists and snippets built from them, are identified with lightweight techniques based on regular expressions.
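
To make the regular-expression idea concrete, here is a minimal sketch in Python. The three patterns and the function name `find_code_like_terms` are our own assumptions for illustration; neither paper publishes its actual expressions, and we read "anchors" as angle brackets around an element name.

```python
import re

# Hypothetical patterns, one per kind of code element.
# These are illustrative guesses, not the expressions used in the cited papers.
PATTERNS = [
    # An identifier immediately followed by empty parentheses, e.g. execute()
    ("function", re.compile(r"\b[A-Za-z_]\w*\(\)")),
    # Upper camel case with at least two capitalized parts, e.g. HttpClient
    ("type", re.compile(r"\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+\b")),
    # An opening or closing tag without attributes, e.g. <timeout>
    ("xml_element", re.compile(r"</?[A-Za-z][\w.-]*>")),
]


def find_code_like_terms(text):
    """Return (kind, term) pairs for every pattern match, in order of appearance."""
    matches = []
    for kind, pattern in PATTERNS:
        for m in pattern.finditer(text):
            matches.append((m.start(), kind, m.group()))
    matches.sort()  # order by position in the text
    return [(kind, term) for _, kind, term in matches]


sentence = "Call execute() on the HttpClient and check the <timeout> element."
print(find_code_like_terms(sentence))
```

Patterns this simple will miss some terms and flag some ordinary words (a product name written in camel case, for instance), so a real tool would likely need extra filtering on top.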