Identification of Units and Other Terms in Czech Medical Records

    Zvára Jr., Karel - Kašpar, Václav
    Identification of Units and Other Terms in Czech Medical Records.
    natural language processing * healthcare documentation * medical reports * EHR * finite-state machine * regular expression
    Healthcare documentation in the Czech Republic usually takes the form of free text formatted using only spaces, tabs and line breaks. Extracting information from such documentation is a challenge. Meeting it would allow physicians with no knowledge of the Czech language to use Czech medical reports, and would allow the information to be transferred into a structured form. The task can be approached with finite-state machines, with linguistic analysis, or with statistical methods. This article summarizes our findings obtained using finite-state machines and commonly used code lists. Excerpts from real medical reports are translated into English in a way that demonstrates the same or similar problems as in the Czech language. The original Czech excerpts are available in the Czech version of this article.
