Why do we need word embeddings: their properties and future directions

Authors

  • Alina Karpenko, Vasyl’ Stus Donetsk National University

Abstract

Natural language processing (hereinafter – NLP) combined with deep learning is a powerful pairing in the modern world. Using word vector representations and embedding layers, one can train recurrent neural networks that achieve outstanding performance across a wide variety of industries.
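As a minimal sketch of that pipeline (assuming PyTorch; the vocabulary size, dimensionality and class count below are illustrative placeholders, not values from this article), an embedding layer maps word indices to dense vectors, which a recurrent network then consumes for a downstream task such as text classification:

import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    """Embedding layer -> LSTM -> linear classifier (illustrative sizes)."""
    def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)   # word index -> dense vector
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        vectors = self.embedding(token_ids)     # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.rnn(vectors)      # final hidden state summarizes each sequence
        return self.classifier(hidden[-1])      # unnormalized class scores

batch = torch.randint(0, 10000, (2, 5))        # two toy sequences of five word indices
logits = TextClassifier()(batch)               # shape: (2, 2)

The embedding layer in such a model can be trained from scratch or initialized with pre-trained word vectors such as word2vec or GloVe.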

Word embeddings (hereinafter – WE), which encode the meanings of words in low-dimensional vector spaces, have become very popular due to their state-of-the-art performance on many NLP tasks. Word embeddings are remarkably successful at capturing semantic relations among words, so a meaningful semantic structure must be present in the respective vector spaces. In many cases, however, this semantic structure is broadly and heterogeneously distributed across the embedding dimensions, which makes interpreting individual dimensions a real challenge.

The goal of word-embedding algorithms is therefore to embed words so that their meaning is reflected in their similarity or relationship to other words. Word embeddings make a wide range of applications possible, and we can also learn from the way researchers have approached the problem of deciphering natural language for machines.
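As a toy illustration of this idea (NumPy only, with made-up four-dimensional vectors rather than real learned embeddings), similarity is read off the space with cosine similarity, and simple vector offsets can encode relations such as the well-known king – man + woman ≈ queen analogy:

import numpy as np

def cosine(u, v):
    """Cosine similarity: close to 1 for related words, near 0 for unrelated ones."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings; real vectors have hundreds of dimensions.
vectors = {
    "king":  np.array([0.8, 0.65, 0.1, 0.05]),
    "queen": np.array([0.8, 0.05, 0.1, 0.65]),
    "man":   np.array([0.3, 0.70, 0.2, 0.05]),
    "woman": np.array([0.3, 0.10, 0.2, 0.65]),
    "apple": np.array([0.0, 0.05, 0.9, 0.10]),
}

print(cosine(vectors["king"], vectors["queen"]))   # comparatively high: related words
print(cosine(vectors["king"], vectors["apple"]))   # low: unrelated words

# Vector-offset analogy: king - man + woman lands closest to queen.
analogy = vectors["king"] - vectors["man"] + vectors["woman"]
print(max(vectors, key=lambda w: cosine(analogy, vectors[w])))   # "queen"

Real embedding models (word2vec, GloVe and their successors) learn such vectors from co-occurrence statistics in large corpora rather than by hand.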

Author Biography

Alina Karpenko, Vasyl’ Stus Donetsk National University

4th year student, Faculty of Philology, Specialism “Applied Linguistics”

