Wolfram Language

Text Capitalization

Capitalize now supports several capitalization schemes for text input.

Each scheme uses different heuristics to determine whether a given word should be capitalized.


"TitleCase" capitalization uses The Chicago Manual of Style as a basis and takes into account word position and part of speech.

This capitalization scheme is widely used and is consistent with the titles of many published works. For example, its output can be compared against the capitalization actually used in book titles, such as those of the Nancy Drew Mystery Stories.
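As a minimal sketch of the scheme applied to a single title, the input below passes "TitleCase" as the second argument of Capitalize. The variable name sample and the lowercase string are illustrative choices, not the input from the original example.

```wolfram
(* sample is an illustrative string chosen for this sketch *)
sample = "the secret of the old clock";

(* Apply the "TitleCase" scheme described above *)
Capitalize[sample, "TitleCase"]
```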

ToLowerCase can be used to ensure that all words are being capitalized from a lowercase baseline.

EditDistance can then compare the capitalization of the original title against the recapitalized form. It returns an integer giving the number of single-character edits needed to turn one string into the other, so a result of 0 means the two match exactly.
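The comparison might be sketched as follows. The three titles are a small hand-typed sample standing in for the full series list used in the original example, and the names titles, recapitalized, and distances are assumptions made for this sketch.

```wolfram
(* Small illustrative sample; the original example works on the full series *)
titles = {"The Secret of the Old Clock", "The Hidden Staircase", "The Bungalow Mystery"};

(* Lowercase first so every title is recapitalized from the same baseline *)
recapitalized = Capitalize[ToLowerCase[#], "TitleCase"] & /@ titles;

(* One edit distance per title; 0 means the capitalization matches exactly *)
distances = MapThread[EditDistance, {titles, recapitalized}];

(* Tally how many titles fall at each distance *)
Counts[distances]
```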

There are 170 cases of edit distance 0 and 5 cases of edit distance 1.

Selecting those titles that do not match perfectly, one can use Style to highlight the difference in capitalization between the two.
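One possible way to do the highlighting is sketched below. The helper highlightDifferences is hypothetical, as are the sample titles, one of which is deliberately capitalized against convention so that there is something to highlight. The helper compares the strings character by character, so it assumes both have the same length, which holds when only the case of letters changes.

```wolfram
(* Hypothetical sample; the second title is deliberately capitalized unconventionally *)
titles = {"The Hidden Staircase", "The Clue In The Diary"};
recapitalized = Capitalize[ToLowerCase[#], "TitleCase"] & /@ titles;

(* Hypothetical helper: show the recapitalized title, styling each character
   that differs from the original; defined only for equal-length strings *)
highlightDifferences[original_String, new_String] /;
   StringLength[original] == StringLength[new] :=
  Row @ MapThread[
    If[#1 === #2, #1, Style[#2, Red, Bold]] &,
    {Characters[original], Characters[new]}];

(* Keep only the pairs that do not match perfectly, then highlight them *)
mismatched = Select[Transpose[{titles, recapitalized}], #[[1]] =!= #[[2]] &];
highlightDifferences @@@ mismatched
```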

In this case, there are minor differences in the capitalization, due to the original title capitalizing specific short words, as well as a different capitalization of "E-Mail."

The same idea can be used on a larger collection of books, such as those published between 1990 and 2000.

Use a logarithmic vertical axis to see all results.
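A sketch of such a plot is shown below. The list distances is made-up placeholder data standing in for the per-title edit distances, not the results from the original example. The height specification {"Log", "Count"} puts the counts on a logarithmic scale, so bins holding only a few titles remain visible next to the dominant bin at 0.

```wolfram
(* Placeholder data for illustration only, not the original results *)
distances = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 5};

(* Bin width 1, with log-scaled counts on the vertical axis *)
Histogram[distances, {1}, {"Log", "Count"},
  AxesLabel -> {"edit distance", "titles"}]
```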

Once again, the capitalization closely matches. Selecting the cases with the largest number of differences shows that they are typically caused either by unusual capitalization in the original title or by the original title capitalizing specific short words against convention.
