The impact of metadata on the accuracy of automated patent classification

Richter, G. & MacFarlane, A. (2005). The impact of metadata on the accuracy of automated patent classification. World Patent Information, 27(1), 13–26. doi: 10.1016/j.wpi.2004.08.001

Abstract

During the last decade, advances in machine-learning tools and algorithms have brought considerable progress in the automated classification of documents. However, many classifiers base their decisions solely on document text and ignore metadata (such as authors, publication date and author affiliation). In this project, automated classifiers based on the k-Nearest Neighbour (kNN) algorithm were developed to classify patents into two different classification systems. Classifiers using metadata (here, inventor names, applicant names and International Patent Classification (IPC) codes) were compared with classifiers that ignored it. For one classification system, metadata significantly improved classification, raising accuracy from 70.8% to 75.4%, a highly statistically significant gain. For the other classification system, however, the results were inconclusive. In some experiments metadata improved the classifier (recall rose from 66.0% to 68.9%, a small but still significant improvement). In experiments with different parameters, it degraded performance (recall fell as low as 61.0%). The study shows that metadata can be highly useful for patent classification. However, it should not be used indiscriminately, and its usefulness must first be carefully evaluated.
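The abstract names the k-Nearest Neighbour algorithm and three kinds of metadata, but it does not say how text and metadata were combined. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. Each patent is assumed to be a bag of words plus field-prefixed metadata tokens. Similarity is assumed to be cosine over raw term counts, and the k nearest training patents vote with their similarity scores. The record fields (`text`, `inventors`, `applicants`, `ipc`, `label`), the `meta_weight` parameter and the toy records are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical patent record layout (illustrative, not from the paper):
#   {"text": str, "inventors": [str], "applicants": [str], "ipc": [str], "label": str}
METADATA_FIELDS = ("inventors", "applicants", "ipc")


def features(doc, use_metadata=True, meta_weight=1.0):
    """Term-count vector for a patent. Metadata tokens are prefixed with their
    field name so that an inventor 'Smith' never merges with the word 'smith'."""
    vec = Counter(doc["text"].lower().split())
    if use_metadata:
        for field in METADATA_FIELDS:
            for value in doc.get(field, []):
                vec[f"{field}:{value.lower()}"] += meta_weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors (0.0 if either is empty)."""
    dot = sum(w * b[t] for t, w in a.items())  # missing keys in a Counter count as 0
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, training, k=3, use_metadata=True, meta_weight=1.0):
    """Label the query by a similarity-weighted vote of its k nearest neighbours."""
    q = features(query, use_metadata, meta_weight)
    neighbours = sorted(
        ((cosine(q, features(d, use_metadata, meta_weight)), d["label"]) for d in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = Counter()
    for similarity, label in neighbours:
        votes[label] += similarity
    return votes.most_common(1)[0][0] if votes else None


# Toy data (hypothetical): the query's words lean towards "power",
# its metadata towards "energy".
training = [
    {"text": "battery cell electrode", "inventors": ["Smith"],
     "applicants": ["Acme"], "ipc": ["H01M"], "label": "energy"},
    {"text": "battery charging circuit", "inventors": ["Jones"],
     "applicants": ["Volt"], "ipc": ["H02J"], "label": "power"},
    {"text": "charging circuit regulator", "inventors": ["Jones"],
     "applicants": ["Volt"], "ipc": ["H02J"], "label": "power"},
]
query = {"text": "battery charging circuit electrode", "inventors": ["Smith"],
         "applicants": ["Acme"], "ipc": ["H01M"]}

print(knn_classify(query, training, k=1, use_metadata=False))  # power
print(knn_classify(query, training, k=1, use_metadata=True))   # energy
```

Switching `use_metadata` on the same toy data changes the k = 1 decision. This shows that metadata can alter a kNN outcome, but it does not show that the change is an improvement. As the abstract cautions, whether metadata helps or hurts on real data, and at what weight, has to be established by evaluation.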

Item Type: Article
Additional Information: NOTICE: this is the author’s version of a work that was accepted for publication in World Patent Information. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in World Patent Information, Volume 27, Issue 1, March 2005, Pages 13–26.
Uncontrolled Keywords: Automated classification, Metadata, Inventors, International Patent Classification, Bibliographic data, Classifier committee, Patent classification
Subjects: Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
Divisions: School of Informatics > Centre for Human Computer Interaction Design
URI: http://openaccess.city.ac.uk/id/eprint/4499
