volume 34 issue 05 pages 8480-8487

Robust Named Entity Recognition with Truecasing Pretraining

Stephen D Mayhew 1
Neelesh Kumar Gupta 1
Dan Roth 1
Publication type: Journal Article
Publication date: 2020-04-03
General Medicine
Abstract

Although modern named entity recognition (NER) systems show impressive performance on standard datasets, they perform poorly when presented with noisy data. In particular, capitalization is a strong signal for entities in many languages, and even state-of-the-art models overfit to this feature, with drastically lower performance on uncapitalized text. In this work, we address the robustness of NER systems on data with noisy or uncertain casing, using a pretraining objective that predicts casing in text (a truecaser) and leverages unlabeled data. The pretrained truecaser is combined with a standard BiLSTM-CRF model for NER by appending its output distributions to character embeddings. In experiments over several datasets of varying domain and casing quality, we show that our new model improves performance on uncased text, even adding value to uncased BERT embeddings. Our method achieves a new state of the art on the WNUT17 shared task dataset.
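The combination step described in the abstract, appending the truecaser's output distributions to character embeddings, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_truecaser` is a hand-written rule standing in for a pretrained model, `char_embeddings` is a random lookup table, and the two-way [p_upper, p_lower] distribution and embedding size are hypothetical choices, not details taken from the paper.

```python
import random


def toy_truecaser(text):
    """Hypothetical stand-in for a pretrained truecaser.

    Returns one distribution [p_upper, p_lower] per character. The rule
    (favor uppercase at word starts) is invented for illustration; a real
    truecaser would be a model trained on unlabeled text.
    """
    dists = []
    prev_is_space = True
    for ch in text:
        if ch.isalpha() and prev_is_space:
            dists.append([0.7, 0.3])
        else:
            dists.append([0.05, 0.95])
        prev_is_space = ch.isspace()
    return dists


def char_embeddings(text, dim=4, seed=0):
    """Hypothetical embedding lookup: one fixed random vector per character."""
    rng = random.Random(seed)
    table = {}
    vectors = []
    for ch in text:
        if ch not in table:
            table[ch] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        vectors.append(table[ch])
    return vectors


def append_casing(text, dim=4):
    """Concatenate each character embedding with its casing distribution."""
    embeddings = char_embeddings(text, dim)
    casing = toy_truecaser(text)
    return [emb + dist for emb, dist in zip(embeddings, casing)]


# Uncased input: each character now carries a casing prediction as extra features.
features = append_casing("dan roth")
```

In this sketch each of the 8 characters ends up with a 6-dimensional vector (4 embedding values plus 2 casing probabilities), which is the kind of input a downstream character-level encoder would then consume.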


Journals

Communications in Computer and Information Science
2 publications, 11.11%
ACM Transactions on Asian and Low-Resource Language Information Processing
1 publication, 5.56%
Transactions of the Association for Computational Linguistics
1 publication, 5.56%
PLoS ONE
1 publication, 5.56%
Procedia Computer Science
1 publication, 5.56%
Springer Proceedings in Mathematics and Statistics
1 publication, 5.56%
Lecture Notes in Computer Science
1 publication, 5.56%
Expert Systems with Applications
1 publication, 5.56%
Methods
1 publication, 5.56%
IEEE Access
1 publication, 5.56%

Publishers

Institute of Electrical and Electronics Engineers (IEEE)
5 publications, 27.78%
Association for Computing Machinery (ACM)
4 publications, 22.22%
Springer Nature
4 publications, 22.22%
Elsevier
3 publications, 16.67%
MIT Press
1 publication, 5.56%
Public Library of Science (PLoS)
1 publication, 5.56%
  • We do not take into account publications without a DOI.
  • Statistics recalculated weekly.

Metrics
18
GOST
Mayhew S. D., Gupta N. K., Roth D. Robust Named Entity Recognition with Truecasing Pretraining // Proceedings of the AAAI Conference on Artificial Intelligence. 2020. Vol. 34. No. 05. pp. 8480-8487.
RIS
TY - JOUR
DO - 10.1609/aaai.v34i05.6368
UR - https://doi.org/10.1609/aaai.v34i05.6368
TI - Robust Named Entity Recognition with Truecasing Pretraining
T2 - Proceedings of the AAAI Conference on Artificial Intelligence
AU - Mayhew, Stephen D
AU - Gupta, Neelesh Kumar
AU - Roth, Dan
PY - 2020
DA - 2020/04/03
PB - Association for the Advancement of Artificial Intelligence (AAAI)
SP - 8480
EP - 8487
IS - 05
VL - 34
SN - 2159-5399
SN - 2374-3468
ER -
BibTeX
@article{2020_Mayhew,
author = {Stephen D Mayhew and Neelesh Kumar Gupta and Dan Roth},
title = {Robust Named Entity Recognition with Truecasing Pretraining},
journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
year = {2020},
volume = {34},
publisher = {Association for the Advancement of Artificial Intelligence (AAAI)},
month = {apr},
url = {https://doi.org/10.1609/aaai.v34i05.6368},
number = {05},
pages = {8480--8487},
doi = {10.1609/aaai.v34i05.6368}
}
MLA
Mayhew, Stephen D., et al. “Robust Named Entity Recognition with Truecasing Pretraining.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 05, Apr. 2020, pp. 8480-8487. https://doi.org/10.1609/aaai.v34i05.6368.