Research Scholar, Indian Institute of Technology, Guwahati
The objective is to decide the authorship of a handwritten document from a set of enrolled writers.
Biometrics refers to the identification of a person based on physical or behavioral characteristics. A physical biometric uses data obtained from direct measurement of a part of the individual, whereas a behavioral biometric uses measurements derived from an action performed by the user, and thus measures some characteristic of the individual indirectly. Writer identification belongs to the latter category and falls under the broader domain of automatic handwriting recognition. Given a document of unknown authorship, a writer identification system ranks the likely writers from the reference set. Identification through handwriting is accepted in many government, legal and commercial transactions as a method of personal authentication. It is a non-invasive and non-threatening process, and can overcome some of the privacy problems present in other authentication systems such as passwords and fingerprints.
Research in writer identification can be divided into two approaches: (i) text dependent and (ii) text independent. Text-dependent methods require handwriting based on a specific text, or assume the availability of a handwriting recognizer, to identify the writer. Signature verification is a popular instance of text-dependent writer identification. In general, knowledge of the content increases the accuracy of such systems. However, they fail in scenarios where documents with different textual content need to be compared. In such scenarios, text-independent systems capture the style information of the handwriting and can identify the writer irrespective of the textual content.
On the basis of data acquisition, writer identification techniques fall into two categories: offline and online. Recent advances in technology have enabled hand-held devices in which data entry is captured through an electronic pen or stylus. The recorded data captures the dynamic information present in the trace of the handwriting, such as (x, y) coordinates, azimuth, and time stamp. The term 'online' handwriting refers to data of this nature. A document in online handwriting is represented as a set of strokes, each of which consists of a sequence of points recorded between a pen-down and a pen-up signal. In contrast, a document in the offline setting is a scanned image containing the handwritten data.
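The stroke-based representation of online handwriting described above can be made concrete with a short Python sketch. The sample layout `(x, y, t, pen_down)` and the names `PenPoint` and `split_into_strokes` are assumptions chosen for illustration; real devices and datasets use their own formats and often record extra channels such as azimuth.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PenPoint:
    x: float
    y: float
    t: float  # time stamp in seconds


def split_into_strokes(samples):
    """Group raw (x, y, t, pen_down) samples into strokes.

    A stroke is the run of points recorded between a pen-down
    and the next pen-up.
    """
    strokes: List[List[PenPoint]] = []
    current: List[PenPoint] = []
    for x, y, t, pen_down in samples:
        if pen_down:
            current.append(PenPoint(x, y, t))
        elif current:
            # Pen lifted: close the stroke collected so far.
            strokes.append(current)
            current = []
    if current:
        strokes.append(current)
    return strokes


# Hypothetical raw trace: two strokes separated by one pen-up sample.
raw = [
    (0.0, 0.0, 0.00, True),
    (1.0, 0.5, 0.01, True),
    (1.0, 0.5, 0.02, False),  # pen lifted
    (2.0, 0.0, 0.05, True),
    (2.5, 1.0, 0.06, True),
    (3.0, 2.0, 0.07, True),
]
document = split_into_strokes(raw)
print([len(stroke) for stroke in document])  # [2, 3]
```

An offline document, by contrast, would be only a pixel array, with no stroke order or timing to recover.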
Abstract: Most existing online writer-identification systems require that the text content is supplied in advance and rely on separately designed features and classifiers. The identifications are based on lines of text, entire paragraphs, or entire documents; however, these materials are not always available. In this paper, we introduce a path-signature feature to an end-to-end text-independent writer-identification system with a deep convolutional neural network (DCNN). Because deep models require a considerable amount of data to achieve good performance, we propose a data-augmentation method named DropStroke to enrich personal handwriting. Experiments were conducted on online handwritten Chinese characters from the CASIA-OLHWDB1.0 dataset, which consists of 3,866 classes from 420 writers. For each writer, we only used 200 samples for training and the remaining 3,666 for testing. The results reveal that the path-signature feature is useful for writer identification, and the proposed DropStroke technique enhances the generalization and significantly improves performance.
Pub.: 19 May '15, Pinned: 22 Aug '17
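The abstract above names DropStroke but does not spell out the procedure. As a rough sketch only, assuming the augmentation amounts to randomly removing some strokes from an online character to create new training variants, it might look like the following. The function name `drop_stroke`, the `drop_prob` parameter, and the rule of always keeping at least one stroke are illustrative assumptions, not the authors' implementation.

```python
import random


def drop_stroke(strokes, drop_prob=0.3, rng=None):
    """Return a copy of a character with some strokes randomly removed.

    strokes   -- list of strokes, each a list of (x, y) points
    drop_prob -- assumed probability of discarding each stroke
    rng       -- optional random.Random instance for reproducibility
    """
    if rng is None:
        rng = random.Random()
    kept = [s for s in strokes if rng.random() >= drop_prob]
    if not kept and strokes:
        # Assumption: never return an empty character.
        kept = [rng.choice(strokes)]
    return kept


# Hypothetical three-stroke character.
character = [
    [(0, 0), (1, 1)],
    [(1, 0), (0, 1)],
    [(2, 0), (2, 2)],
]

# Several augmented variants of the same character, one per seed.
variants = [drop_stroke(character, 0.3, random.Random(seed)) for seed in range(5)]
for v in variants:
    print(len(v), "strokes kept")
```

Each variant keeps the surviving strokes in their original writing order, so the temporal structure of the trace is preserved.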
Abstract: Allograph prototype approaches for writer identification have been gaining popularity recently due to their simplicity and promising identification rates. Character prototypes that are used as allographs produce a consistent set of templates that model the handwriting styles of writers, thereby allowing high accuracies to be attained. We hypothesize that the alphabet knowledge inherent in such character prototypes can provide additional writer information pertaining to their styles of writing and their identities. This paper utilizes a character prototype approach to establish evidence that knowledge of the alphabet offers additional clues which help in the writer identification process. This paper then introduces an alphabet information coefficient (AIC) to better exploit such alphabet knowledge for writer identification. Our experiments showed an increase in writer identification accuracy from 66.0 to 87.0% on a database of 200 reference writers when alphabet knowledge was used. Experiments related to the reduction in dimensionality of the writer identification system are also reported. Our results show that the discriminative power of the alphabet can be used to reduce the complexity while maintaining the same level of performance for the writer identification system.
Pub.: 16 Jan '10, Pinned: 22 Aug '17
Abstract: This paper describes a strategy to identify the authorship of online handwritten documents. We treat the task as a retrieval problem and adapt the so-called codebook-based Vector of Locally Aggregated Descriptors (VLAD), which has been promising for object retrieval in image processing. The codebook comprises a set of code vectors with associated Voronoi cells computed from a clustering algorithm on a set of feature vectors along the online trace. However, we show that the VLAD formulation, at times, cannot effectively discriminate between writers when their respective feature vectors are not linearly separable in the Voronoi cell of the code vectors. To overcome this problem, we propose a novel descriptor that improves upon the VLAD formulation. In addition, we explore a normalization for the feature vectors prior to the generation of the VLAD. Our method differs from min-max and z-score normalization in that it ensures that the code vectors are not influenced by the presence of outliers in the data. The performance of our proposed descriptor with the new feature normalization is evaluated on two publicly available online handwriting databases: the IAM and IBM-UB1. The results show a marked improvement over the VLAD.
Pub.: 15 Dec '16, Pinned: 22 Aug '17
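For orientation, the baseline VLAD that the abstract above sets out to improve can be sketched in a few lines of plain Python. This is the standard formulation (assign each feature vector to its nearest code vector, sum the residuals per Voronoi cell, concatenate, L2-normalize), not the improved descriptor or the outlier-robust normalization proposed in the paper. The toy codebook and feature values are made up for illustration.

```python
import math


def nearest(vec, codebook):
    """Index of the code vector closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, codebook[k])),
    )


def vlad(features, codebook):
    """Standard VLAD: per-cell sums of residuals, concatenated and L2-normalized."""
    dim = len(codebook[0])
    residual_sums = [[0.0] * dim for _ in codebook]
    for f in features:
        k = nearest(f, codebook)
        for d in range(dim):
            residual_sums[k][d] += f[d] - codebook[k][d]
    flat = [v for cell in residual_sums for v in cell]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy example: two code vectors, three 2-D feature vectors from one document.
codebook = [[0.0, 0.0], [10.0, 10.0]]
features = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
descriptor = vlad(features, codebook)
print(len(descriptor))  # 4
```

In a retrieval setting, one such descriptor would be computed per document, and enrolled writers would be ranked by the similarity of their descriptors to that of the query document.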