A team of researchers from the National Research Nuclear University, the National Research Center Kurchatov Institute, and Voronezh State University in Russia has developed an algorithm that analyses a sample of text and uses a neural network to identify the gender of the writer. So far it has achieved an accuracy approaching 80%.

This is an example of Artificial Narrow Intelligence (ANI), or weak AI: a system built to perform just one specific task, which it may do significantly better than a human. A chess-playing computer is a familiar example. The project was funded by the Russian Science Foundation, and the report is published in the journal Procedia Computer Science.

Many scientific studies have demonstrated that writing style can reveal certain characteristics of the writer, including gender, psychological traits and level of education. Speech patterns also appear to convey useful psycho-diagnostic information and, together with handwriting analysis, are routinely used by HR staff in recruitment, particularly for the security services. Analysing speech has also been found to reveal characteristics that indicate conditions such as dementia, depression and even suicidal states of mind. The determination of personality traits from samples of text also has great potential.

In the age of big data, accurately identifying a target demographic is important for optimising marketing resources. For this reason, researchers are concentrating their efforts on extracting specific information from text. Using mathematical models that assign values to specific parameters occurring in the text, it is possible to identify certain personality traits of the writer. Neural networks were employed to analyse the effectiveness of diverse machine learning algorithms for text analysis.
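To make the idea of "values assigned to specific parameters" concrete, here is a minimal Python sketch. It is not the researchers' model: the three features, the function names `extract_features` and `score`, and the numbers in `WEIGHTS` are all assumptions chosen purely for illustration. A real system would learn its weights from a large set of labelled texts.

```python
import math
import re

FIRST_PERSON = {"i", "me", "my", "we", "our"}

def extract_features(text: str) -> dict[str, float]:
    """Turn raw text into a few numeric stylistic parameters (illustrative only)."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    return {
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "first_person_rate": sum(w in FIRST_PERSON for w in words) / n_words,
    }

# Hypothetical weights, NOT taken from the study; a trained model would
# estimate these from labelled examples.
WEIGHTS = {
    "avg_word_length": -0.2,
    "avg_sentence_length": 0.05,
    "first_person_rate": 1.5,
}
BIAS = 0.0

def score(features: dict[str, float]) -> float:
    """Combine the weighted features into a value between 0 and 1 (logistic model)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

features = extract_features("I like my dog. We walk.")
print(features, score(features))
```

The output of `score` can be read as the model's confidence in one of two labels. With made-up weights like these it carries no real meaning; the point is only the shape of the pipeline, from text to parameters to a single number.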

Results indicated that a deep learning convolutional neural network (CNN) was most effective at identifying the writer’s gender. The research team is also using similar techniques to identify a writer’s age group from a sample of text.
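For readers unfamiliar with how a CNN reads text, the core operation can be sketched in a few lines of Python. This is a generic illustration, not the architecture used in the study: the function name `conv1d_max_pool`, the toy two-dimensional word vectors and the filter values are all assumptions. A single filter slides across consecutive word vectors, and only its strongest response is kept.

```python
def conv1d_max_pool(embeddings, kernel, bias=0.0):
    """Slide one filter over consecutive token vectors, apply ReLU, keep the maximum.

    embeddings: list of token vectors, one per word
    kernel: list of weight vectors; its length is the filter width in tokens
    """
    width = len(kernel)
    if len(embeddings) < width:
        raise ValueError("text is shorter than the filter width")
    activations = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        total = bias + sum(
            w * x
            for k_row, e_row in zip(kernel, window)
            for w, x in zip(k_row, e_row)
        )
        activations.append(max(0.0, total))  # ReLU non-linearity
    return max(activations)  # max-over-time pooling

# Toy input: three words, each a hypothetical 2-dimensional vector.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# One hypothetical filter spanning two consecutive words.
filt = [[1.0, 0.0], [0.0, 1.0]]
print(conv1d_max_pool(tokens, filt))
```

A full network would apply many such filters of different widths, learn their weights during training, and feed the pooled values into a final classification layer.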