Abstract:
Urdu is a morphologically rich, low-resource language. Distinguishing features such as free word order, context-sensitive orthography, flexible grammar rules and complex morphology make the representation of Urdu a difficult problem. In handwritten Urdu text, words are written without spaces between them. A machine-readable text file, however, needs a separator when a word ends with a non-joiner character; without these separators, words join with one another and the text is no longer understandable to native speakers. Chunking is a basic technique used for entity detection that segments and labels multi-token sequences, and it supports many natural language processing (NLP) applications. Chunking is a mature field for languages such as Hindi, English, Chinese and Turkish, but it still requires researchers' attention for Urdu, which has more than 70 million native speakers. This study addresses noun and verb phrase chunking in Urdu, and its aim is to explore the accuracy achieved on a corpus for noun and verb phrase chunking. Chunking is an NLP task that splits a text into syntactically linked, non-overlapping and non-exhaustive word groups, i.e. a word can belong to only one chunk, but not every word belongs to a chunk. Several experiments are conducted using tag sets with different input and output schemes under the same methodology. First, a corpus is selected and preprocessed. Next, part-of-speech tags and IOB tags are assigned to the corpus. Noun and verb phrases are then detected using neural networks and machine learning techniques. Finally, the results are evaluated using precision, recall and F-measure.
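To make the IOB labelling and the evaluation measures concrete, the following is a minimal sketch rather than the pipeline used in this study. It assumes the common B-/I-/O tagging variant and chunk-level exact matching (a predicted chunk counts as correct only if its type and boundaries both match the gold chunk); the abstract does not state which variant or matching criterion is used. The function names `iob_to_chunks` and `precision_recall_f1` and the tag sequences are hypothetical and chosen for illustration only.

```python
from typing import List, Set, Tuple

# A chunk is (chunk type, start index, end index exclusive). Assumed representation.
Chunk = Tuple[str, int, int]


def iob_to_chunks(tags: List[str]) -> Set[Chunk]:
    """Convert a sequence of IOB tags (e.g. B-NP, I-NP, B-VP, O) into chunk spans."""
    chunks: Set[Chunk] = set()
    start, ctype = None, None
    # A trailing "O" sentinel closes any chunk that runs to the end of the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and ctype == tag[2:]
        if continues:
            continue
        if ctype is not None:
            chunks.add((ctype, start, i))
        if tag == "O":
            start, ctype = None, None
        else:
            # B-X, or a stray I-X with no matching open chunk, starts a new chunk.
            start, ctype = i, tag[2:]
    return chunks


def precision_recall_f1(gold: Set[Chunk], pred: Set[Chunk]) -> Tuple[float, float, float]:
    """Chunk-level precision, recall and F-measure (F1) under exact matching."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted tag sequences for one five-token sentence.
gold_tags = ["B-NP", "I-NP", "B-VP", "O", "B-NP"]
pred_tags = ["B-NP", "I-NP", "B-VP", "O", "O"]

gold_chunks = iob_to_chunks(gold_tags)
pred_chunks = iob_to_chunks(pred_tags)
p, r, f = precision_recall_f1(gold_chunks, pred_chunks)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this toy case the fourth token is tagged O and belongs to no chunk, which illustrates the non-exhaustive property, and the missed final noun phrase lowers recall while leaving precision unaffected.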