Find-Health-Articles.com - making medical research available to everyone
Research article summary (published 20 Jul 2009):

Text mining and natural language processing approaches for automatic categorization of lay requests to web-based expert forums.

Full Abstract

BACKGROUND: Both healthy and sick people increasingly use electronic media to obtain medical information and advice. For example, Internet users may send requests to Web-based expert forums, or so-called "ask the doctor" services.

OBJECTIVE: To automatically classify lay requests to an Internet medical expert forum using a combination of different text-mining strategies.

METHODS: We first manually classified a sample of 988 requests directed to an involuntary childlessness forum on the German website "Rund ums Baby" ("Everything about Babies") into one or more of 38 categories belonging to two dimensions ("subject matter" and "expectations"). After creating start and synonym lists, we calculated the average Cramér's V statistic for the association of each word with each category. We also used principal component analysis and singular value decomposition as further text-mining strategies. With these measures we trained regression models and, on the basis of the best-performing models, determined for each request the probability of belonging to each of the 38 categories, applying a cutoff of 50%. Recall and precision on a test sample were calculated as measures of the quality of the automatic classification.

RESULTS: According to the manual classification of 988 documents, 102 (10%) documents fell into the category "in vitro fertilization (IVF)," 81 (8%) into the category "ovulation," 79 (8%) into "cycle," and 57 (6%) into "semen analysis." These were the four most frequent categories in the subject matter dimension (consisting of 32 categories). The expectation dimension comprised six categories; we classified 533 documents (54%) as "general information" and 351 (36%) as a wish for "treatment recommendations." The generation of indicator variables based on the chi-square analysis and Cramér's V proved to be the best approach for automatic classification in about half of the categories. In combination with the two other approaches, 100% precision and 100% recall were achieved in 18 (47%) of the 38 categories in the test sample. For 35 (92%) categories, precision and recall exceeded 80%. For some categories, the input variables (that is, "words") also included variables from other categories, most often with a negative sign. For example, absence of words predictive for "menstruation" was a strong indicator for the category "pregnancy test."

CONCLUSIONS: Our approach suggests a way of automatically classifying and analyzing unstructured information in Internet expert forums. The technique can perform a preliminary categorization of new requests and help Internet medical experts to better handle the mass of information and to give professional feedback.
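
Illustrative sketch (not from the article): the abstract names two quantities that are easy to state concretely, namely Cramér's V for the association of a word with a category and precision and recall after a 50% probability cutoff. The Python below shows one plain way to compute both. The function names, the 2x2 table layout, the choice to treat a probability of exactly 0.5 as a positive, and all example numbers are assumptions made for illustration; they are not the authors' code or data.

```python
import math


def cramers_v_2x2(n11, n10, n01, n00):
    """Cramér's V for a 2x2 table of word presence vs. category membership.

    n11: word present, request in category
    n10: word present, request not in category
    n01: word absent,  request in category
    n00: word absent,  request not in category
    """
    n = n11 + n10 + n01 + n00
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    if 0 in (row1, row0, col1, col0):
        return 0.0  # degenerate table: no association can be measured
    # Pearson chi-square statistic for a 2x2 table (no continuity correction).
    chi2 = n * (n11 * n00 - n10 * n01) ** 2 / (row1 * row0 * col1 * col0)
    # For a 2x2 table min(rows - 1, cols - 1) = 1, so V = sqrt(chi2 / n).
    return math.sqrt(chi2 / n)


def precision_recall(probabilities, labels, cutoff=0.5):
    """Precision and recall for one category after thresholding probabilities.

    Assumption: a request is assigned to the category when its predicted
    probability is at least `cutoff`.
    """
    predicted = [p >= cutoff for p in probabilities]
    tp = sum(1 for y_hat, y in zip(predicted, labels) if y_hat and y)
    fp = sum(1 for y_hat, y in zip(predicted, labels) if y_hat and not y)
    fn = sum(1 for y_hat, y in zip(predicted, labels) if not y_hat and y)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy counts only; these are not figures from the study.
    print(cramers_v_2x2(10, 0, 0, 10))
    print(precision_recall([0.9, 0.6, 0.4, 0.2], [True, False, True, False]))
```

In a setup like the one described, such a statistic would be computed for every word and category pair, and the thresholded probabilities would come from the fitted regression models rather than from hand-written lists.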

 

Author information

Authors: Himmel, Wolfgang (W); Reincke, Ulrich (U); Michelmann, Hans Wilhelm (HW)

Affiliation: Department of General Practice/Family Medicine, University of Göttingen, Humboldtallee 38, 37070 Göttingen, Germany. whimmel(-atsign-)gwdg.de

Journal and publication information

Publication Type: Journal Article

Journal: Journal of Medical Internet Research (J Med Internet Res), published in United States. (Language: eng)

Reference: 2009; vol 11 (issue 3): e25

Dates: Created 2009/07/27; Completed 2009/10/23;

PMID: 19632978, status: MEDLINE (last retrieval date: 10/23/2009)

Sourced from the National Library of Medicine. Abstract text and other information may be subject to copyright.

