Why we need word embeddings, their properties, and future directions
Abstract
Natural language processing (hereinafter – NLP) combined with deep learning is an important pairing in the modern world. Using word vector representations and embedding layers, recurrent neural networks can be trained to achieve outstanding performance across a wide variety of industries.
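As a minimal sketch of this setup (assuming PyTorch and a hypothetical vocabulary of 10,000 token ids; the class name, layer sizes, and dummy batch are illustrative, not taken from any cited work), the snippet below shows how an embedding layer maps word indices to dense vectors that feed a recurrent network:

```python
import torch
import torch.nn as nn

class SentimentRNN(nn.Module):
    """Sketch of an embedding layer feeding a recurrent network (hypothetical sizes)."""
    def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        # Each word id is mapped to a dense, low-dimensional vector.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):            # token_ids: (batch, seq_len)
        vectors = self.embedding(token_ids)  # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.rnn(vectors)   # hidden: (1, batch, hidden_dim)
        return self.fc(hidden[-1])           # (batch, num_classes)

model = SentimentRNN()
dummy_batch = torch.randint(0, 10000, (4, 12))  # 4 "sentences" of 12 token ids
print(model(dummy_batch).shape)                 # torch.Size([4, 2])
```

The embedding layer here can be trained from scratch or initialized with pretrained word vectors; in either case the rest of the network only ever sees the dense vectors, not the raw words.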
Word embeddings (hereinafter – WE), which encode the meanings of words into low-dimensional vector spaces, have become very popular due to their state-of-the-art performance in many NLP tasks. Word embeddings are substantially successful in capturing semantic relations among words, so a meaningful semantic structure must be present in the respective vector spaces. However, in many cases this semantic structure is broadly and heterogeneously distributed across the embedding dimensions, making the interpretation of individual dimensions a real challenge.
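To illustrate how semantic relations surface as geometry in the vector space, here is a toy example with hand-made 3-dimensional vectors; the words and values are invented for illustration only, whereas real embeddings such as word2vec or GloVe are learned from text and typically have 100–300 dimensions:

```python
import numpy as np

# Hand-made 3-dimensional vectors, invented purely for illustration;
# real embeddings are learned from corpora and have far more dimensions.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.8, 0.0, 0.0]),
}

def cosine(a, b):
    """Cosine similarity: close to 1 for vectors pointing in the same direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Semantically related words end up close together in the space.
print(cosine(vectors["king"], vectors["queen"]))

# Linguistic regularity: king - man + woman lands nearest to queen.
target = vectors["king"] - vectors["man"] + vectors["woman"]
best = max((w for w in vectors if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(target, vectors[w]))
print(best)  # "queen" with these toy vectors
```

The same cosine-similarity and vector-arithmetic queries are what make embeddings useful as features for downstream NLP tasks; with learned embeddings, the answers reflect regularities observed in the training corpus rather than hand-crafted values.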
The goal of word-embedding algorithms is, therefore, to embed each word with a meaning derived from its similarity or relationship to other words. Many applications are made possible by word embeddings, and we can also learn from the way researchers have approached the problem of deciphering natural language for machines.