
Proposal for Apache Commons Text Package

(0) Rationale

Algorithms for processing text, such as edit distance or string similarity, are outside the scope of the standard Java libraries. The Commons Text package provides these additional methods.

(1) Scope of the Package

This proposal is to create a package of Java utility classes implementing well-known string algorithms and metrics.
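To make the intended scope concrete, here is a minimal sketch of one such metric, the Levenshtein edit distance, written against the plain JDK. The class name EditDistanceSketch and the method name levenshtein are hypothetical and chosen only for illustration; they are not taken from this proposal or from any existing Commons API.

```java
// Illustrative sketch only: the class and method names are hypothetical
// and do not describe the API of the proposed package.
final class EditDistanceSketch {

    private EditDistanceSketch() {
        // static utility, not meant to be instantiated
    }

    /**
     * Returns the Levenshtein distance between two character sequences,
     * i.e. the minimum number of single-character insertions, deletions
     * and substitutions needed to turn one into the other.
     */
    static int levenshtein(CharSequence left, CharSequence right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("inputs must not be null");
        }
        int n = left.length();
        int m = right.length();

        // Only two rows of the dynamic-programming table are kept in memory.
        int[] previous = new int[m + 1];
        int[] current = new int[m + 1];

        // Distance from the empty prefix of 'left' to each prefix of 'right'.
        for (int j = 0; j <= m; j++) {
            previous[j] = j;
        }

        for (int i = 1; i <= n; i++) {
            current[0] = i; // deleting i characters yields the empty prefix
            for (int j = 1; j <= m; j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                int insertion = current[j - 1] + 1;
                int deletion = previous[j] + 1;
                int substitution = previous[j - 1] + cost;
                current[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // The row just computed becomes the previous row for the next step.
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[m];
    }
}
```

With this sketch, EditDistanceSketch.levenshtein("kitten", "sitting") returns 3. Keeping only two rows of the table limits memory use to the length of the second input, which is one reasonable design choice among several.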

(1.5) Interaction With Other Packages

Commons Text relies only on standard JDK 7 (or later) APIs for production deployment. It uses the JUnit unit testing framework and the Hamcrest matcher library for developing and executing unit tests, but this is of interest only to developers of the component. Commons Text may be a dependency for several existing open source components that implement higher-order text processing.

No external configuration files are utilized.

(2) Initial Source of the Package

The initial classes came from the Commons Lang and Commons Codec subprojects.

The proposed package name for the new component is org.apache.commons.text.

(3) Required Apache Commons Resources

  • Git Repository - New repository commons-text.
  • Mailing List - Discussions will take place on the general dev@commons.apache.org mailing list. To help list subscribers identify messages of interest, it is suggested that the subject line of messages about this component be prefixed with [text].
  • Jira - New component "Commons Text" under the "Commons Sandbox" product.
  • Confluence FAQ - New category "commons-text" (when available).

(4) Initial Committers

The initial committers on the Commons Text component shall be as follows:

  • Benedikt Ritter (britter)
  • Bruno P. Kinoshita (kinow)