Authorship Attribution Using Principal Component Analysis and Nearest Neighbor Rule for Neural Networks
Abstract
Feature extraction is a common problem in statistical pattern recognition. It refers to a process whereby a data space is transformed into a feature space that, in theory, has exactly the same dimension as the original data space. However, the transformation is designed in such a way that the data set may be represented by a reduced number of "effective" features and yet retain most of the intrinsic information content of the data; in other words, the data set undergoes a dimensionality reduction. Principal component analysis (PCA) is one such process. In this paper, the data collected by counting selected syntactic characteristics in around a thousand paragraphs of each of the sample books underwent a principal component analysis. For comparison, the original data were also processed. The authors of the texts were identified with higher success by the competitive neural networks that use the principal components. The process was repeated on another group of authors, and similar results were obtained.
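The pipeline described in the abstract (count syntactic features per paragraph, reduce them with PCA, then assign each paragraph to an author) can be sketched as follows. This is a minimal illustration, not the paper's method. It assumes NumPy. The six "syntactic marker" counts are synthetic Poisson data for two hypothetical authors. PCA is computed by SVD of the mean-centred data. A plain 1-nearest-neighbour rule stands in for the paper's competitive neural network. The function names `pca_fit`, `pca_transform` and `nearest_neighbor`, the choice of two components, and the 300/100 train/test split are all illustrative assumptions.

```python
import numpy as np

def pca_fit(X, k):
    """Return column means, the top-k principal axes, and their variance shares.

    Rows of X are paragraphs; columns are feature counts.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return mean, Vt[:k], explained[:k]

def pca_transform(X, mean, components):
    """Project centred data onto the retained principal axes."""
    return (X - mean) @ components.T

def nearest_neighbor(train_Z, train_labels, query_Z):
    """1-NN rule: each query point takes the label of its closest training point."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = ((query_Z[:, None, :] - train_Z[None, :, :]) ** 2).sum(axis=2)
    return train_labels[np.argmin(d2, axis=1)]

rng = np.random.default_rng(0)
# Hypothetical per-paragraph counts of six syntactic markers for two authors.
rates_a = np.array([5, 3, 8, 2, 1, 4])
rates_b = np.array([2, 6, 3, 5, 4, 1])
X = np.vstack([rng.poisson(rates_a, size=(200, 6)),
               rng.poisson(rates_b, size=(200, 6))]).astype(float)
y = np.array([0] * 200 + [1] * 200)

# Shuffle, then hold out a quarter of the paragraphs for testing.
idx = rng.permutation(len(y))
train, test = idx[:300], idx[300:]

mean, comps, explained = pca_fit(X[train], k=2)
pred = nearest_neighbor(pca_transform(X[train], mean, comps), y[train],
                        pca_transform(X[test], mean, comps))
accuracy = np.mean(pred == y[test])
print(f"variance kept by 2 PCs: {explained.sum():.2f}, test accuracy: {accuracy:.2f}")
```

To mirror the comparison in the abstract, run the same `nearest_neighbor` call on `X[train]` and `X[test]` directly, skipping the projection. The numbers this synthetic example prints say nothing about the results reported in the paper.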
DOI: http://dx.doi.org/10.21533/scjournal.v1i2.59
Copyright (c) 2015 SouthEast Europe Journal of Soft Computing
ISSN 2233-1859
Digital Object Identifier DOI: 10.21533/scjournal
This work is licensed under a Creative Commons Attribution 4.0 International License