Last week I wrote an article about keeping the news on your site in sync with Sina. Some readers were interested, so I decided to share the pseudo-original system mentioned there and explain the principle behind it. The system is also introduced at my studio, sisphus.
A search engine is a machine. Changing the title, replacing some words, shuffling some paragraphs, inserting some links, and similar tricks are enough to make an article pass as pseudo-original. The pseudo-original tools available on the Internet work in much the same way, but they need manual operation to generate each article. So I wanted to build an automatic, unattended pseudo-original system. Combined with an automatic collection program, it implements the whole pipeline of collection -> storage -> pseudo-original processing, with no human management, in real time.
To change the wording without affecting the meaning, the best approach is synonym replacement, so the first step is to build a thesaurus. After searching online for a ready-made database without success, I decided to collect the data from a relevant site. Kingsoft met my needs well, and by collecting from it I built a thesaurus with tens of thousands of entries.
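As a rough illustration of the thesaurus-building step, here is a minimal Python sketch. It assumes the collected data arrives as (word, synonym) pairs. The function name build_thesaurus and the sample words are hypothetical and not taken from the original system, which keeps its thesaurus in database tables.

```python
from collections import defaultdict


def build_thesaurus(pairs):
    """Group (word, synonym) pairs into a word -> list-of-synonyms mapping.

    Hypothetical helper for illustration only.
    """
    thesaurus = defaultdict(list)
    for word, synonym in pairs:
        word, synonym = word.strip(), synonym.strip()
        # Skip empty entries and words listed as their own synonym.
        if not word or not synonym or word == synonym:
            continue
        # Keep each synonym once, in the order it was first collected.
        if synonym not in thesaurus[word]:
            thesaurus[word].append(synonym)
    return dict(thesaurus)


# Made-up sample data standing in for the collected entries.
sample_pairs = [
    ("高兴", "开心"),
    ("高兴", "愉快"),
    ("高兴", "开心"),  # duplicate, ignored
    ("美丽", "漂亮"),
]
sample_thesaurus = build_thesaurus(sample_pairs)
```

A plain dictionary like this makes each lookup a single key access, which is why a key-value layout suits a thesaurus with tens of thousands of entries.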
The next question is which keywords to replace. My idea is to first segment the article into words, take the words longer than two Chinese characters, look each one up in the thesaurus, and replace it if a match is found. I implemented this process in Python. To speed up synonym lookup, key-value storage can be used. Some of the key code is as follows:
def getnewword(text, list):
    # Look up the word's ID in the thesaurus (parameterized to avoid quoting problems).
    cxn.execute("select id from tool_words where name=%s limit 1", (text,))
    result = cxn.fetchone()
    if result is not None:
        # Pick one random synonym for that word.
        cxn.execute("select name from tool_wordslike where wid=%s order by rand() limit 1", (result[0],))
        result4 = cxn.fetchone()
        if result4 is not None: