A Family Portrait of Natural Language Processing: An Overview of Current NLP Tasks, Data, Models, and Papers

Source: GitHub

Compiled by Heart of the Machine (Jiqizhixin)

Contributors: Siyuan, Xiaokun

Natural language processing has many subfields, and many of them have not yet reached satisfactory performance. This article tracks progress in natural language processing (NLP) research and briefly introduces the current state of the art and the relevant datasets for the most common NLP tasks. In the project, author Sebastian Ruder covers core traditional NLP tasks such as dependency parsing and part-of-speech tagging, as well as more recent tasks such as reading comprehension and natural language inference. The main goal is to give readers a quick overview of the benchmark datasets and state-of-the-art results for the tasks they are interested in, as a stepping stone for future research.

Project page: https://github.com/sebastianruder/NLP-progress

Reference blog post: http://ruder.io/tracking-progress-nlp/

Contents (tasks and corresponding datasets)

1. CCG supertagging

CCGBank

2. Chunking

Penn Treebank

3. Constituency parsing

Penn Treebank

4. Coreference resolution

CoNLL 2012

5. Dependency parsing

Penn Treebank

6. Dialog

Second Dialog State Tracking Challenge

7. Domain adaptation

Multi-Domain Sentiment Dataset

8. Language modeling

Penn Treebank, WikiText-2

9. Machine translation

WMT 2014 EN-DE, WMT 2014 EN-FR

10. Multi-task learning

GLUE

11. Named entity recognition

CoNLL 2003

12. Natural language inference

SNLI, MultiNLI, SciTail

13. Part-of-speech tagging

UD, WSJ

14. Reading comprehension

ARC, CNN/Daily Mail, QAngaroo, RACE, SQuAD, Story Cloze Test, Winograd Schema Challenge

15. Semantic textual similarity

SentEval, Quora Question Pairs

16. Sentiment analysis

IMDb, Sentihood, SST, Yelp

17. Semantic parsing

WikiSQL

18. Semantic role labeling

OntoNotes

19. Automatic summarization

CNN/Daily Mail

20. Text classification

AG News, DBpedia, TREC

CCG supertagging

Combinatory Categorial Grammar (CCG; Steedman, 2000) is a highly lexicalized formalism. The standard parsing model of Clark and Curran (2007) uses over 400 lexical categories (or supertags), whereas typical parsers use only about 50 part-of-speech tags.

Example: [figure omitted]

CCGBank

CCGBank is a corpus of CCG derivations and dependency structures extracted from the Penn Treebank by Hockenmaier and Steedman (2007). Sections 2-21 are used for training, section 00 for development, and section 23 as the in-domain test set. Performance is computed only on the 425 most frequent labels. Models are evaluated by accuracy.

Chunking

Chunking is a shallow form of parsing that identifies continuous spans of tokens forming syntactic units, such as noun phrases or verb phrases.

Example: [figure omitted]

Penn Treebank (chunking)

The Penn Treebank is commonly used to evaluate chunking. Sections 15-18 are used for training, section 19 for development, and section 20 for testing. Models are evaluated by F1.

Constituency parsing

Constituency parsing aims to extract from a sentence a constituency-based parse tree that represents its syntactic structure according to a phrase structure grammar.

Example: [figure omitted]

A recently developed approach ("Grammar as a Foreign Language") converts the parse tree into a sequence by depth-first traversal, so that sequence-to-sequence models can be applied to it. The linearized version of the example parse tree is: (S (N) (VP V N)).
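The depth-first linearization described here can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the tree encoding (bare strings for tags, `(label, children)` pairs for bracketed constituents) and the sample tree are assumptions chosen to reproduce the string quoted in the text.

```python
from typing import List, Tuple, Union

# A node is either a bare tag string (e.g. "V") or a (label, children) pair
# for a bracketed constituent. This encoding is an assumption of the sketch.
Node = Union[str, Tuple[str, List["Node"]]]


def linearize(node: Node) -> str:
    """Return the bracketed depth-first linearization of a parse tree."""
    if isinstance(node, str):
        return node
    label, children = node
    parts = [label] + [linearize(child) for child in children]
    return "(" + " ".join(parts) + ")"


# Hypothetical tree chosen to match the linearized string in the text.
tree: Node = ("S", [("N", []), ("VP", ["V", "N"])])
print(linearize(tree))  # (S (N) (VP V N))
```

A sequence-to-sequence model would then be trained to emit this token sequence from the input sentence.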

Penn Treebank (constituency parsing)

The Wall Street Journal section of the Penn Treebank is used to evaluate constituency parsers. Section 22 is used for development and section 23 for evaluation. Models are evaluated by F1. Most of the reported models incorporate external data or features. For a comparison of single models trained only on WSJ, see "Constituency Parsing with a Self-Attentive Encoder".

Coreference resolution

Coreference resolution is the task of clustering mentions in a text that refer to the same underlying real-world entity.

Example: [figure omitted]

In the example, "I", "my", and "she" belong to the same cluster, and "Obama" and "he" belong to the same cluster.

CoNLL 2012

Experiments are conducted on the data of the CoNLL-2012 shared task, which uses OntoNotes coreference annotations. Papers report precision, recall, and F1 for the MUC, B3, and CEAFφ4 metrics using the official CoNLL-2012 evaluation scripts. The main evaluation metric is the average F1 of the three metrics.

Dependency parsing

Dependency parsing is the task of extracting a dependency parse of a sentence that represents its grammatical structure and defines the relationships between head words and the words that modify those heads.

Example: [figure omitted]

Relations among the words are shown above the sentence as directed, labeled arcs from head to dependent; + indicates the dependent.

Penn Treebank (dependency parsing)

Models are evaluated on the Stanford Dependencies conversion of the Penn Treebank (described in the "Stanford Typed Dependencies Manual") with predicted POS tags. The evaluation metrics are unlabeled attachment score (UAS) and labeled attachment score (LAS).
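To make the two metrics concrete, here is a small sketch of how UAS and LAS can be computed for one sentence. The data layout (one `(head_index, relation_label)` pair per token, with head 0 for the root) and the toy arcs are assumptions for illustration; official evaluation scripts may apply further conventions that are not modeled here.

```python
from typing import List, Tuple

# Each token is described by (head_index, relation_label); head 0 is the root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as fractions of tokens."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    # UAS: the predicted head is correct, regardless of the label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: both the head and the relation label are correct.
    labeled_hits = sum(g == p for g, p in zip(gold, pred))
    return head_hits / len(gold), labeled_hits / len(gold)


# Toy three-token sentence; heads and labels are made up for illustration.
gold = [(2, "nsubj"), (0, "root"), (2, "dobj")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.3f} LAS={las:.3f}")
```

In this toy case every head is right but one label is wrong, so LAS is lower than UAS; LAS can never exceed UAS.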

Dialog

Dialog tasks are notoriously hard to evaluate. Earlier approaches have relied on human evaluation.

Second Dialog State Tracking Challenge

For goal-oriented dialog, the dataset of the Second Dialog State Tracking Challenge (DSTC2) is a common evaluation dataset. Dialog state tracking consists of determining, at each turn of a dialog, the full representation of what the user wants at that point in the dialog, which comprises a goal constraint, a set of requested slots, and the user's dialog act. DSTC2 focuses on the restaurant search domain. Models are evaluated by accuracy on both individual and joint slot tracking.

Domain adaptation

Multi-Domain Sentiment Dataset

The Multi-Domain Sentiment Dataset is a common evaluation dataset for domain adaptation in sentiment analysis. It contains product reviews from different Amazon product categories, which are treated as different domains. The reviews carry star ratings (1 to 5 stars) that are usually converted into binary labels. Models are typically evaluated on a target domain that differs from the source domain they were trained on, with access only to unlabeled examples from the target domain (unsupervised domain adaptation). The evaluation metric is accuracy, with scores averaged across domains.

Language modeling

Language modeling is the task of predicting the next word in a text. * indicates models that use dynamic evaluation.

Penn Treebank (language modeling)

A common evaluation dataset for language modeling is the Penn Treebank, as preprocessed by Mikolov et al. ("Recurrent Neural Network Based Language Model"). The dataset consists of 929k training words, 73k validation words, and 82k test words. As part of the preprocessing, words were lowercased, numbers were replaced with N, newlines were replaced with an end-of-sentence token (eos), and all other punctuation was removed. The vocabulary is the 10k most frequent words, and all remaining tokens are replaced by an unknown-word token (unk). Models are evaluated by perplexity, the exponentiated average negative per-word log-probability (lower is better).
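As a worked illustration of the metric, the sketch below computes perplexity from the probabilities a model assigned to each token of a held-out text. The function name and the toy probabilities are assumptions for illustration only.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Perplexity = exp of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)


# A model that spreads its probability evenly over four choices at every
# step assigns 0.25 to each observed token.
uniform = [0.25] * 8
print(perplexity(uniform))
```

Intuitively, a perplexity of k means the model is as uncertain as if it were choosing uniformly among k words at each position, which is why lower values are better.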

WikiText-2

WikiText-2 ("Pointer Sentinel Mixture Models") is a more realistic language modeling benchmark than the Penn Treebank. It consists of about 2 million words extracted from Wikipedia articles.

Machine translation

Machine translation is the task of translating a sentence from a source language into a different target language. Results marked with * are mean test scores over the window of 21 consecutive evaluations with the best average validation-set BLEU score, as reported by Chen et al. in "The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation".

WMT 2014 EN-DE

Models are evaluated on the English-German dataset of the Ninth Workshop on Statistical Machine Translation (WMT 2014) by BLEU score.

WMT 2014 EN-FR

Similarly, models are evaluated on the English-French dataset of the Ninth Workshop on Statistical Machine Translation (WMT 2014) by BLEU score.

Multi-task learning

Multi-task learning aims to learn multiple different tasks simultaneously while maximizing performance on one or all of the tasks.

GLUE

The General Language Understanding Evaluation benchmark (GLUE) is a tool for evaluating and analyzing the performance of models across a diverse range of existing natural language understanding tasks. Models are evaluated by their average accuracy across all tasks.

Named entity recognition

Named entity recognition (NER) is the task of tagging entities in text with their corresponding type. Approaches typically use BIO notation, which differentiates the beginning (B) and the inside (I) of entities. O is used for non-entity tokens.

Example: [figure omitted]

CoNLL 2003

The CoNLL 2003 NER task consists of newswire text from the Reuters RCV1 corpus tagged with four entity types (PER, LOC, ORG, MISC). Models are evaluated by span-based F1.
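The BIO notation and the span-based F1 mentioned above can be illustrated with a short sketch. It scores a single sentence, whereas benchmark scores are computed over a whole test set; the handling of stray I- tags (they are ignored) and the toy tag sequences are assumptions of this sketch, not the official scorer's behavior.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence such as B-PER, I-PER, O."""
    spans: Set[Span] = set()
    start, kind = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and kind == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, kind))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


def span_f1(gold: List[str], pred: List[str]) -> float:
    """A predicted entity counts only if its boundaries and type both match."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Made-up tags for a four-token sentence.
gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "O"]
print(bio_to_spans(gold), span_f1(gold, pred))
```

Because scoring is per span rather than per token, a prediction that gets only part of a multi-token entity right earns no credit for it.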

Natural language inference

Natural language inference is the task of determining whether a "hypothesis" is true (entailment), false (contradiction), or undetermined (neutral) given a "premise".

Example: [figure omitted]

SNLI

The Stanford Natural Language Inference (SNLI) corpus contains around 550k hypothesis/premise pairs. Models are evaluated by accuracy.

State-of-the-art results can be found on the SNLI website: https://nlp.stanford.edu/projects/snli/

MultiNLI

The Multi-Genre Natural Language Inference (MultiNLI) corpus contains around 433k hypothesis/premise pairs. It is similar to the SNLI corpus but covers a range of genres of spoken and written text and supports cross-genre evaluation. The data can be downloaded from the MultiNLI website: https://www.nyu.edu/projects/bowman/multinli/

Public leaderboards for in-genre (matched) and cross-genre (mismatched) evaluation are available, although the entries do not correspond to published papers:

https://www.kaggle.com/c/multinli-matched-open-evaluation/leaderboard

https://www.kaggle.com/c/multinli-mismatched-open-evaluation/leaderboard

SciTail

The SciTail entailment dataset ("SciTail: A Textual Entailment Dataset from Science Question Answering") consists of 27k examples. In contrast to SNLI and MultiNLI, it was not crowdsourced but built from sentences that already exist "in the wild". Hypotheses were created from science questions and the corresponding answer candidates, while relevant web sentences from a large corpus were used as premises. Models are evaluated by accuracy.

Part-of-speech tagging

Part-of-speech tagging (POS tagging) is the task of tagging each word in a text with its part of speech. A part of speech is a category of words that share similar grammatical properties.

Example: [figure omitted]

UD

Universal Dependencies (UD) is a framework for cross-linguistic grammatical annotation that contains more than 100 treebanks in over 60 languages. Models are typically evaluated by average test accuracy across 28 languages.

Penn Treebank (POS tagging)

The standard dataset for POS tagging is the Wall Street Journal (WSJ) portion of the Penn Treebank, which contains 45 different POS tags. Sections 0-18 are used for training, sections 19-21 for validation, and sections 22-24 for testing. Models are typically evaluated by accuracy.

Reading comprehension / question answering

Question answering is the task of answering a question automatically. Most current datasets frame this task as reading comprehension, where the question is about a paragraph or document and the answer is usually a span in the document. The Machine Reading group at UCL also provides an overview of reading comprehension tasks: https://uclmr.github.io/ai4exams/data.html.

ARC

The AI2 Reasoning Challenge (ARC) dataset is a question answering dataset containing 7,787 genuine grade-school level, multiple-choice science questions. The dataset is partitioned into a Challenge Set and an Easy Set; the Challenge Set contains only questions that both a retrieval-based algorithm and a word co-occurrence algorithm answered incorrectly. Models are evaluated by accuracy.

Public ARC leaderboard: http://data.allenai.org/arc/

QAngaroo

QAngaroo consists of two reading comprehension datasets that require combining multiple steps of inference across multiple documents. The first, WikiHop, is an open-domain dataset focused on Wikipedia articles; the second, MedHop, is based on PubMed paper abstracts.

The leaderboard for these datasets is available at: http://qangaroo.cs.ucl.ac.uk/leaderboard.html

RACE

The RACE dataset is a reading comprehension dataset collected from English examinations for middle school and high school students in China. It consists of more than 28,000 passages and nearly 100,000 questions. Models are evaluated by accuracy on the middle school examinations (RACE-m), the high school examinations (RACE-h), and the full dataset (RACE).

Dataset download: http://www.cs.cmu.edu/~glai1/data/race/

SQuAD

The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answer to each question is a segment of text from the corresponding reading passage. SQuAD 2.0 was released recently; it adds unanswerable questions written to look similar to the answerable ones in SQuAD 1.1, which makes it harder than SQuAD 1.1. In addition, the SQuAD 2.0 paper received the ACL 2018 best short paper award.

Story Cloze Test

The Story Cloze Test is a dataset for story understanding that provides four-sentence stories and two possible endings; systems try to choose the correct ending.

Winograd Schema Challenge

The Winograd Schema Challenge is a dataset for commonsense reasoning. It uses Winograd schema questions that require resolving anaphora: the system must identify the antecedent of an ambiguous pronoun in a statement. Models are evaluated by accuracy.

Example: [figure omitted]

Semantic textual similarity

Semantic textual similarity deals with determining how similar two pieces of text are, for example by assigning a score from 1 to 5. Related tasks are paraphrase identification and duplicate identification.

SentEval

SentEval is a toolkit for evaluating sentence representations. It includes 17 downstream tasks, among them common semantic textual similarity tasks. The semantic textual similarity (STS) benchmark tasks from 2012 to 2016 (STS12, STS13, STS14, STS15, STS16, STS-B) measure the relatedness of two sentences based on the cosine similarity of their two representations. The evaluation criterion is Pearson correlation.
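The STS evaluation described above (cosine similarity between two sentence representations, then Pearson correlation against human ratings) can be sketched as follows. This is a plain-Python illustration, not the SentEval API; the two-dimensional vectors and gold ratings are made up.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two sentence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sentence-pair embeddings and human similarity ratings (1 to 5).
pairs = [
    ([1.0, 0.0], [1.0, 0.0]),
    ([1.0, 0.0], [1.0, 1.0]),
    ([1.0, 0.0], [0.0, 1.0]),
]
gold_scores = [5.0, 3.0, 1.0]
predicted = [cosine(u, v) for u, v in pairs]
score = pearson(predicted, gold_scores)
print(round(score, 3))
```

Pearson correlation is insensitive to the scale of the predictions, so cosine values in [-1, 1] can be compared directly against ratings on a 1 to 5 scale.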

The SICK relatedness (SICK-R) task trains a linear model to output a score from 1 to 5 indicating the relatedness of two sentences. The same dataset (SICK-E) can be treated as a three-class classification problem using the entailment labels (entailment, contradiction, neutral). The evaluation metric for SICK-R is also Pearson correlation, and SICK-E is measured by classification accuracy.

The Microsoft Research Paraphrase Corpus (MRPC) is a dataset for paraphrase identification, where systems aim to identify whether two sentences are paraphrases of each other. The evaluation metrics are classification accuracy and F1 score.
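For a binary task such as paraphrase identification, accuracy and F1 can differ noticeably when the classes are imbalanced. The sketch below computes both from 0/1 labels; treating label 1 as "paraphrase" and computing F1 on that class is an assumption of the sketch, and the label lists are made up.

```python
from typing import List, Tuple


def accuracy_and_f1(gold: List[int], pred: List[int]) -> Tuple[float, float]:
    """Accuracy over all pairs and F1 on the positive (paraphrase = 1) class."""
    tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))
    fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))
    fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    # F1 = 2TP / (2TP + FP + FN), defined as 0 when there are no true positives.
    f1 = 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)
    return accuracy, f1


# Made-up labels for five sentence pairs.
gold = [1, 1, 0, 1, 0]
pred = [1, 0, 0, 1, 1]
acc, f1 = accuracy_and_f1(gold, pred)
print(acc, round(f1, 3))
```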

Quora Question Pairs

The Quora Question Pairs dataset consists of 400,000 pairs of Quora questions; systems must identify whether one question is a duplicate of the other. Models are evaluated by accuracy.

Sentiment analysis

Sentiment analysis is the task of classifying the polarity (positive or negative) of a given text.

IMDb

IMDb is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb), each labeled as positive or negative. Models are evaluated by accuracy.

Sentihood

Sentihood is a dataset for targeted aspect-based sentiment analysis (TABSA), which aims to identify fine-grained polarity towards a specific aspect. The dataset consists of 5,215 sentences, 3,862 of which contain a single target, while the rest contain multiple targets. F1 is used to evaluate aspect detection, and accuracy is used to evaluate sentiment classification.

SST

The Stanford Sentiment Treebank contains 215,154 phrases with fine-grained sentiment labels in the parse trees of 11,855 sentences from movie reviews. Models are evaluated by accuracy on both fine-grained and binary classification.

Yelp

The Yelp Review dataset consists of more than 500,000 Yelp reviews. There is both a binary and a fine-grained (five-class) version of the dataset. Models are evaluated by error rate (1 - accuracy; lower is better).

Semantic parsing

Semantic parsing is the task of translating natural language into a formal meaning representation. The representation may be an executable language such as SQL or a more abstract representation such as Abstract Meaning Representation (AMR).

WikiSQL

The WikiSQL dataset consists of 87,673 examples of questions, SQL queries, and database tables built from 26,521 tables. Training, development, and test splits are provided such that each table appears in only one split. Models are evaluated by accuracy on execution result matches.
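Execution-based accuracy can be sketched with Python's built-in `sqlite3` module: a predicted query counts as correct if running it returns the same result as the gold query, even when the two query strings differ. The table, the queries, and the decision to compare results as unordered row collections are assumptions for illustration, not the official WikiSQL evaluation code.

```python
import sqlite3
from typing import List, Tuple


def execution_match(conn: sqlite3.Connection, gold_sql: str, pred_sql: str) -> bool:
    """True if both queries run and return the same rows, ignoring row order."""
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as wrong
    gold_rows = conn.execute(gold_sql).fetchall()
    return sorted(pred_rows, key=repr) == sorted(gold_rows, key=repr)


def execution_accuracy(conn: sqlite3.Connection, examples: List[Tuple[str, str]]) -> float:
    """Fraction of (gold, predicted) query pairs whose results match."""
    return sum(execution_match(conn, g, p) for g, p in examples) / len(examples)


# Hypothetical table standing in for one WikiSQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("Ann", "Red", 30), ("Bo", "Blue", 12), ("Cy", "Red", 7)],
)

examples = [
    # Different SQL text, same result: counted as correct.
    ("SELECT name FROM players WHERE points > 10",
     "SELECT name FROM players WHERE points >= 12"),
    # Missing WHERE clause changes the count: counted as wrong.
    ("SELECT COUNT(*) FROM players WHERE team = 'Red'",
     "SELECT COUNT(*) FROM players"),
]
accuracy = execution_accuracy(conn, examples)
print(accuracy)
```

One known weakness of this style of metric is that a wrong query can occasionally return the right result by coincidence, which is why results are sometimes reported alongside exact query match.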

Example: [figure omitted]

Semantic role labeling

Semantic role labeling aims to model the predicate-argument structure of a sentence and is often described as answering "Who did what to whom". BIO notation is typically used for semantic role labeling.

Example: [figure omitted]

OntoNotes (semantic role labeling)

Models are typically evaluated by F1 on the OntoNotes benchmark ("Towards Robust Linguistic Analysis Using OntoNotes").

Automatic summarization

Automatic summarization is the task of condensing the meaning of a source text into a shorter text.

CNN / Daily Mail

The CNN / Daily Mail dataset, as processed and released by Nallapati et al. (2016), has been used for evaluating summarization. The dataset contains online news articles (781 tokens on average) paired with multi-sentence summaries (3.75 sentences or 56 tokens on average). The processed version contains 287,226 training pairs, 13,368 validation pairs, and 11,490 test pairs. Models are evaluated by ROUGE-1, ROUGE-2, and ROUGE-L; * indicates models that were trained and evaluated on the anonymized version of the dataset.
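ROUGE-1 and ROUGE-2 are based on unigram and bigram overlap between a system summary and a reference summary. The following is a simplified sketch of that idea (lowercasing and whitespace tokenization only, single reference, no stemming); it is not the official ROUGE toolkit, and its numbers will not match published scores exactly. The example sentences are made up.

```python
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference: str, candidate: str, n: int = 1) -> Dict[str, float]:
    """Simplified ROUGE-N: clipped n-gram overlap as recall, precision, and F1."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
r1 = rouge_n(reference, candidate, 1)
r2 = rouge_n(reference, candidate, 2)
print(r1, r2)
```

ROUGE-L, the third metric named above, is based on the longest common subsequence rather than fixed-length n-grams and is not covered by this sketch.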

Text classification

Text classification is the task of assigning a sentence or document to an appropriate category. The categories depend on the chosen dataset and can range over different topics.

AG News

The AG News corpus consists of news articles from "AG's Corpus of News Articles" belonging to the four largest classes. The dataset contains 30,000 training examples and 1,900 test examples per class. Models are evaluated by error rate.

DBpedia

The DBpedia Ontology dataset contains 14 non-overlapping classes, each with 40,000 training examples and 5,000 test examples. Models are evaluated by error rate.

TREC

TREC ("The TREC-8 Question Answering Track Evaluation") is a dataset for question classification consisting of open-domain, fact-based questions divided into broad semantic categories. It has a six-class version (TREC-6) and a fifty-class version (TREC-50). Both have 4,300 training examples, but TREC-50 has finer-grained labels. Models are evaluated by accuracy.
