As part of my doctoral research, I faced an interesting classification problem while working with a dataset extracted from the Usenet archive. On Usenet, content is automatically classified into newsgroups; my challenge was to find representative phrases for a given post based on its primary classification and its content.
Because we are dealing with technical text, we built our own list of stop words to ignore while extracting representative phrases. The code below reports n-grams of length at most 2, but it can easily be changed to report longer phrases (a short sketch after the second listing shows how).
[Figure: Identifying topics]
package com.avasthi.research.fpmi.tacitknowledge.contentparsers;

import com.aliasi.lm.TokenizedLM;
import com.aliasi.tokenizer.EnglishStopTokenizerFactory;
import com.aliasi.tokenizer.IndoEuropeanTokenizerFactory;
import com.aliasi.tokenizer.LowerCaseTokenizerFactory;
import com.aliasi.tokenizer.PorterStemmerTokenizerFactory;
import com.aliasi.tokenizer.RegExFilteredTokenizerFactory;
import com.aliasi.tokenizer.TokenizerFactory;
import com.aliasi.tokenizer.WhitespaceNormTokenizerFactory;
import com.aliasi.util.ScoredObject;

import java.io.IOException;
import java.util.SortedSet;
import java.util.regex.Pattern;

/**
 *
 * @author vavasthi
 */
public class TacitKnowledgeInterestingPhraseDetector {

  private final TokenizedLM model;

  private static final int NGRAM = 20;                      // max n-gram order stored in the model
  private static final int MIN_COUNT = 5;                   // ignore phrases seen fewer than 5 times
  private static final int MAX_NGRAM_REPORTING_LENGTH = 2;  // report phrases of at most 2 tokens
  private static final int NGRAM_REPORTING_LENGTH = 2;
  private static final int MAX_COUNT = 500;                 // report at most 500 phrases

  public TacitKnowledgeInterestingPhraseDetector() {
    // Build the tokenizer chain: tokenize, keep only alphabetic or numeric
    // tokens, normalize whitespace, lowercase, drop stop words (our custom
    // set, then the standard English set), and stem with the Porter stemmer.
    TokenizerFactory tokenizerFactory = IndoEuropeanTokenizerFactory.INSTANCE;
    tokenizerFactory = new RegExFilteredTokenizerFactory(tokenizerFactory, Pattern.compile("[a-zA-Z]+|[0-9]+"));
    tokenizerFactory = new WhitespaceNormTokenizerFactory(tokenizerFactory);
    tokenizerFactory = new LowerCaseTokenizerFactory(tokenizerFactory);
    tokenizerFactory = new TacitKnowledgeEnglishStopTokenizerFactory(tokenizerFactory);
    tokenizerFactory = new EnglishStopTokenizerFactory(tokenizerFactory);
    tokenizerFactory = new PorterStemmerTokenizerFactory(tokenizerFactory);
    model = new TokenizedLM(tokenizerFactory, NGRAM);
  }

  /**
   * Print the collocations (statistically interesting phrases) seen so far.
   */
  public void report() throws IOException {
    try {
      SortedSet<ScoredObject<String[]>> coll
          = model.collocationSet(NGRAM_REPORTING_LENGTH,
                                 MIN_COUNT,
                                 MAX_COUNT);
      // Prune sequences seen fewer than 3 times to bound memory
      // before further incremental training.
      model.sequenceCounter().prune(3);
      report(coll);
    }
    catch (IllegalArgumentException iaex) {
      System.out.println("Invalid model");
      iaex.printStackTrace();
    }
  }

  public void incrementalTrain(String text) throws IOException {
    model.handle(text);
  }

  private void report(SortedSet<ScoredObject<String[]>> nGrams) {
    for (ScoredObject<String[]> nGram : nGrams) {
      double score = nGram.score();
      String[] toks = nGram.getObject();
      reportFilter(score, toks);
    }
  }

  private void reportFilter(double score, String[] toks) {
    StringBuilder accum = new StringBuilder();
    for (String tok : toks) {
      accum.append(' ').append(tok);
    }
    System.out.println("Score: " + score + " with :" + accum);
  }
}
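To see the detector in action, here is a minimal driver sketch. PhraseDetectorDemo is a hypothetical class name and the hard-coded strings are stand-ins for real Usenet post bodies; with a corpus this small you would need to lower MIN_COUNT below 5 for any collocation to clear the threshold.

package com.avasthi.research.fpmi.tacitknowledge.contentparsers;

import java.io.IOException;

// Hypothetical driver; not part of the original code.
public class PhraseDetectorDemo {

  public static void main(String[] args) throws IOException {
    TacitKnowledgeInterestingPhraseDetector detector =
        new TacitKnowledgeInterestingPhraseDetector();

    // Stand-ins for real Usenet post bodies.
    String[] posts = {
        "The garbage collector pauses the virtual machine during a full collection.",
        "Tuning the garbage collector reduces pause times on the virtual machine.",
        "A generational garbage collector splits the heap into young and old regions."
    };
    for (String post : posts) {
      detector.incrementalTrain(post);   // feed each post into the language model
    }
    detector.report();                   // print the scored bigram collocations
  }
}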
package com.avasthi.research.fpmi.tacitknowledge.contentparsers;

import com.aliasi.tokenizer.StopTokenizerFactory;
import com.aliasi.tokenizer.TokenizerFactory;

import java.io.Serializable;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 *
 * @author vavasthi
 */
public class TacitKnowledgeEnglishStopTokenizerFactory
    extends StopTokenizerFactory
    implements Serializable {

  static final long serialVersionUID = 4616272325206021322L;

  public TacitKnowledgeEnglishStopTokenizerFactory(TokenizerFactory factory) {
    super(factory, STOP_SET);
  }

  /**
   * The set of stop words, all lowercased: single letters and digits
   * (which carry no topical meaning in technical posts), common English
   * function words, and domain-specific noise such as titles, corporate
   * suffixes and top-level domains.
   */
  static final Set<String> STOP_SET = new HashSet<String>(Arrays.asList(
      // single letters and digits
      "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
      "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
      "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
      // common English function words
      "you", "your", "what", "when", "how", "where", "which", "who",
      "be", "had", "it", "only", "she", "was", "about", "because",
      "has", "its", "of", "some", "we", "after", "been", "have",
      "last", "on", "such", "were", "all", "but", "he", "more", "one",
      "than", "also", "by", "her", "most", "or", "that", "an", "can",
      "his", "other", "the", "any", "if", "out", "their", "will",
      "and", "in", "over", "there", "with", "are", "could", "they",
      "would", "as", "for", "into", "no", "so", "this", "up", "at",
      "from", "is", "not", "says", "to", "my",
      // titles, corporate suffixes and top-level domains
      "mr", "mrs", "ms", "mz", "co", "corp", "inc",
      "com", "net", "edu"));
}
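Raising the phrase length is just a matter of asking collocationSet() for longer n-grams, as long as the requested length does not exceed the model's n-gram order (NGRAM = 20 here). A hypothetical method that could be added to TacitKnowledgeInterestingPhraseDetector:

// Hypothetical addition: report trigram collocations instead of bigrams.
public void reportTrigrams() throws IOException {
  SortedSet<ScoredObject<String[]>> trigrams
      = model.collocationSet(3,          // phrase length: 3 tokens
                             MIN_COUNT,
                             MAX_COUNT);
  report(trigrams);
}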