Text data can be used to tease out even the subtlest emotions. In the digital world, by establishing correlations and causal relationships, and by drawing on massive data and continuously updated algorithms, one can understand "sentiment", grasp the driving forces behind people's behavior, and then explain why they buy or sell.
By staff reporter Zhang Jiaxing
Trump won the presidential election, and shares of Chuanda Zhisheng rose more than 6%; Hillary lost, and Xiyi shares fell 9%; the actor Wen Zhang had an affair, yet Yili shares rose because his wife is Ma Yili... The "bizarre" behavior of the Chinese stock market leaves people exclaiming that they "can't make sense of it".
Do these seemingly irrational events fall outside the laws of economics? Shen Yan, a professor of economics at Peking University's National School of Development (hereafter "NSD"), thinks not.
NSD recently released the China Investor Sentiment Index. "An investor sentiment index derived from online big data through complex modeling and algorithms can help us assess the macroeconomic situation and understand capital markets," Shen Yan said. In other words, with enough data and sufficiently deep analysis, the causes of "bizarre" events can be identified and judgments made.
With big data as the medium, Peking University economists have teamed up with a big data company to look for the economic laws behind the "irrational". "This is only the beginning. The team will go on to use online big data and advanced artificial intelligence methods to develop a series of indices that help us understand domestic capital markets, especially the stock market," said Yao Yang, dean of NSD. The index draws on many advanced methods from deep learning and machine learning on big data, and the research received technical support from Percent, a professional big data company, making it a good exploration of industry-academia research collaboration.
Big data "sees through" the share-price clues "hidden" in sentiment
"Part of the reason Yili rose is that public opinion and morality sided with Ma Yili," Shen Yan said. On some websites, Ma Yili's supporters could be seen declaring: we support Ma Yili; if you also detest the "other woman", please buy Yili shares.
Investors who buy shares for reasons other than the value of the company itself are what researchers call "noise traders". "Their sentiment is clearly irrational, but it often affects share prices," Shen Yan said. To measure this kind of influence, the research team collected more than 100 million pieces of online data that reflect investor sentiment and used deep learning to measure it.
Sentiment-driven investing is not an isolated case. In the first few days after an event, its effect on share prices is often carried along by sentiment. Su Meng, chairman and CEO of Percent, gave an example: the US Federal Reserve's rate hike on the 13th and the sharp fall in bitcoin a few days earlier triggered emotional reactions on related forums and websites, and also pulled the Dow Jones index down.
"Panic can spread," Su Meng pointed out. Once a sentiment has a "mass base", it becomes data that can be analyzed, with share-price clues "hidden" inside it. He explained: "When bitcoin plunges or the Fed raises rates, for example, we can see a great deal of sentiment information on Weibo and on forums. We use natural language processing to handle this scattered information so that algorithms can recognize and analyze it."
According to Su Meng, such judgments rest on the accuracy of word segmentation and of entity recognition. Percent's word segmentation accuracy has reached 98.97%, and its entity recognition accuracy 91.45%.
"Text data can be used to tease out even the subtlest emotions," Su Meng said. In the digital world, by establishing correlations and causal relationships, and by drawing on massive data and continuously updated algorithms, one can understand "sentiment", grasp the driving forces behind people's behavior, and then explain why they buy or sell.
"One-size-fits-all" algorithms fall short; a Chinese financial sentiment dictionary supplies the annotations
When it comes to reading sentiment from massive amounts of information, the biggest challenge is accuracy.
"'Well, isn't this just great': in most Chinese contexts, that phrase carries a sour, sarcastic tone," Shen Yan said. The team faced more than 100 million pieces of scattered, messy information scraped from the web by crawlers. Getting a computer to interpret so many varied expressions, from people with different speech habits, in a way that matches "without deviation" what the speaker actually meant was the hardest part.
Clearly, mature text-analysis algorithms from abroad can serve only as a reference. Used for the "computation", they struggle to render the specific meanings of Chinese faithfully, and may even be left "dizzy". "Most algorithms were developed in English-language contexts. We ran a test showing that when they are applied to Chinese capital markets, the accuracy is far from satisfactory," Shen Yan said. Much as one would teach a child, the team first gave the machine an accurate "dictionary" for its specific task. "We first built a usable product aimed specifically at sentiment in the Chinese stock market, called the Chinese Financial Sentiment Dictionary."
Chen Zuo, who holds a doctorate in economics from NSD, said that English vocabulary is finite, whereas the word combinations of Chinese are endless. An English dictionary for the same financial domain can be carried over, but beyond the dictionary method the team also used techniques such as synonym matching to help the machine understand words outside the dictionary and work out which dictionary terms they resemble in sentiment.
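The dictionary-plus-synonyms idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the seed lexicon, the synonym table, the English stand-in tokens and the function names `expand_lexicon` and `score_post` are hypothetical and are not the team's Chinese Financial Sentiment Dictionary or its algorithm.

```python
# Hypothetical seed lexicon: +1 for positive terms, -1 for negative terms.
SEED_LEXICON = {"rally": 1, "profit": 1, "crash": -1, "loss": -1}

# Hypothetical synonym table linking out-of-dictionary words to seed entries.
SYNONYMS = {"surge": "rally", "plunge": "crash", "slump": "crash"}


def expand_lexicon(seed, synonyms):
    """Give each out-of-dictionary word the polarity of its seed synonym."""
    expanded = dict(seed)
    for word, known in synonyms.items():
        if known in seed and word not in expanded:
            expanded[word] = seed[known]
    return expanded


def score_post(tokens, lexicon):
    """Sum the polarities of the tokens that appear in the lexicon."""
    return sum(lexicon.get(token, 0) for token in tokens)


lexicon = expand_lexicon(SEED_LEXICON, SYNONYMS)
post = ["shares", "plunge", "after", "heavy", "loss"]  # already segmented
print(score_post(post, SEED_LEXICON))  # prints -1: "plunge" is not in the seed dictionary
print(score_post(post, lexicon))       # prints -2: the synonym is now recognized
```

The gap between the two scores shows why a dictionary alone undercounts sentiment when users keep coining new words.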
In addition, to make the machine's reading of sentiment more accurate, the team hand-picked some terms for the machine to learn from, using a small labeled set (small relative to the scraped sample) to refine the machine's understanding of the full dataset. "For example, 'banker' has a specific meaning in an economics context, but netizens often write it as 'crops', a homophone in Chinese, and it may also simply be mistyped," Shen Yan said. "Based on the surrounding context, our algorithm can also pick out words that on the surface have nothing to do with 'banker'. That is one reason we have been able to make some breakthroughs."
"The quality of manual labeling is very important. Some dictionaries do not work well because the quality of their labels cannot be guaranteed," said Du Xiaomeng, chief scientist of Percent Group. How accurate an algorithm's results are depends closely on the data the machine learns from.
The way language changes over time is another factor the team has to consider. "In recent years our language has changed very quickly," Chen Zuo said. "The language used by the post-2000 generation hardly seems to belong to the same system as ours. The way they express positive and negative sentiment is also very different from traditional vocabulary."
The machine has been trained with this in mind. "The algorithm uses sentence structure to work out meaning," Chen Zuo explained. "After learning for a while, the machine can judge from information such as position, without looking at the word itself. Take 'cutting leeks' (slang for fleecing retail investors): it later turned into 'shiitake mushrooms', 'tomatoes' and so on. As long as the structure is similar, the sentiment in the sentence can still be identified."
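A toy sketch can show what judging by position rather than by the word itself might look like. It is only an illustration under stated assumptions: the team's model is described as deep learning, whereas this stand-in merely memorizes the neighboring tokens around a known slang term. The sample sentences, labels and the function names `neighbors`, `learn_contexts` and `classify` are all hypothetical.

```python
from collections import Counter, defaultdict


def neighbors(tokens, i):
    """Return the tokens just before and just after position i."""
    prev_tok = tokens[i - 1] if i > 0 else "<s>"
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return prev_tok, next_tok


def learn_contexts(labeled_sentences, known_terms):
    """Record which sentiment label goes with the slot around a known term."""
    contexts = defaultdict(Counter)
    for tokens, label in labeled_sentences:
        for i, token in enumerate(tokens):
            if token in known_terms:
                contexts[neighbors(tokens, i)][label] += 1
    return contexts


def classify(tokens, contexts):
    """Vote using every slot in the sentence that matches a learned context."""
    votes = Counter()
    for i in range(len(tokens)):
        votes.update(contexts.get(neighbors(tokens, i), Counter()))
    return votes.most_common(1)[0][0] if votes else "unknown"


# Hypothetical labeled sample; English tokens stand in for segmented Chinese.
labeled = [(["they", "cut", "leeks", "again"], "negative")]
contexts = learn_contexts(labeled, known_terms={"leeks"})

print(classify(["they", "cut", "mushrooms", "again"], contexts))   # prints negative
print(classify(["they", "cook", "mushrooms", "today"], contexts))  # prints unknown
```

The substituted word "mushrooms" was never labeled, yet the first sentence is still classified because it occupies the same slot as the known term.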
Can AI guide investment? Its predictive power is still "in training"
The research team tested the usability of the investor sentiment index by matching it against historical events. The model built by the algorithm took online information from 2008 to 2018 as its database, computed the index, and presented it as a line chart. The turning points the chart marks coincide with the timing of historical events.
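How per-post scores might be rolled up into a daily series and scanned for turning points can be sketched as follows. The article does not give the index formula, so the daily average used here is an assumption, as are the sample scores, the day labels and the function names `daily_index` and `turning_points`.

```python
from collections import defaultdict


def daily_index(scored_posts):
    """Average the per-post sentiment scores for each day (assumed formula)."""
    by_day = defaultdict(list)
    for day, score in scored_posts:
        by_day[day].append(score)
    return {day: sum(scores) / len(scores) for day, scores in sorted(by_day.items())}


def turning_points(index):
    """Return the days on which the series switches between rising and falling."""
    days = list(index)
    values = [index[day] for day in days]
    points = []
    for i in range(1, len(values) - 1):
        rising_before = values[i] > values[i - 1]
        rising_after = values[i + 1] > values[i]
        if rising_before != rising_after:
            points.append(days[i])
    return points


# Hypothetical (day, score) pairs, for example produced by a lexicon scorer.
posts = [("d1", 0), ("d1", 0), ("d2", 1), ("d2", 1), ("d3", -1), ("d3", 0), ("d4", 1)]
index = daily_index(posts)
print(index)                  # {'d1': 0.0, 'd2': 1.0, 'd3': -0.5, 'd4': 1.0}
print(turning_points(index))  # ['d2', 'd3']
```

The dates returned by `turning_points` are what one would then compare with a calendar of known market events.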
A comparison chart presented by Shen Yan shows that the investor sentiment index lost the most value between mid-June and early July 2015; stock market records show that the market index fell by 30% over that period. The index also tracks other landmark events accurately. An artificial intelligence (AI) model that computes investor sentiment can not only react to what has already happened but also forecast future market information. "To a certain extent, sentiment can reflect investors' expectations for the market," Shen Yan said, "so it has some power to predict returns, volatility and trading volume, and to gauge how likely these outcomes are."
"The forecasting accuracy of the investor sentiment index compares quite well with similar work abroad," Shen Yan said. When the team used the model trained with deep learning to predict stock prices that had not yet occurred at the time the data was collected, accuracy exceeded 80%.
The research team has now collected relevant text data from across the web for all A-share listed companies and can produce sentiment measures for individual companies and industries. Shen Yan said the index is expected to be released on a continuing basis. Whether it can offer investors guidance, and how much predictive power it really has, still needs to be quantified and studied further.
Shen Yan stressed that however useful the index proves in practice, as academic research it offers scholars a new perspective for understanding the so-called "bizarre" behavior of the Chinese stock market, and it can serve as a reference for financial institutions and asset management departments.
Science and Technology Daily