I maintain a large on-line bibliography on automated text categorization (ATC). Click here if you want to access a fully searchable on-line version, and here if you want to download it as a whole (in BibTeX format).
ATC is the activity of automatically building, by means of machine learning techniques, automated text classifiers, i.e. systems capable of assigning to a text document one or more thematic categories (or labels) from a predefined set.
Everyone is welcome to send me additional references, or corrections and additions to the existing ones (e.g. URLs and abstracts, where they are not already present). In general, only references specific to ATC are considered pertinent to this bibliography; in particular, references that are considered pertinent are:
publications that discuss novel ATC methods, new experiments with previously known methods, or resources for ATC experimentation;
publications that discuss applications of ATC (e.g. automated indexing for Boolean IR systems, filtering, etc.).
References that are not considered pertinent are:
publications that discuss techniques in principle useful for ATC (e.g. machine learning techniques, information retrieval techniques) but do not explicitly discuss their application to ATC;
publications that discuss related topics sometimes confused with ATC; these include, in particular, text clustering (i.e. text classification by unsupervised learning) and text indexing;
technical reports and workshop papers. Only papers that have been the object of formal publication (i.e. conferences and journals) are to be included in the bibliography, so as to avoid its explosion and the inclusion of material bound to obsolescence.
Concerning URLs from which to download on-line copies of the papers, where possible I have included URLs with unrestricted access (e.g. home pages of authors). When no such URL was available, a URL with restricted access (e.g. the ACM Digital Library or the IEEE Computer Society Digital Library, which are accessible to subscribers only) is sometimes indicated instead. When this is the case, if you know of a URL with unrestricted access from which the paper is also available, please let me know and I will substitute the link.