Research article summary (published 26 Jul 2009):
Multilayer SOM with tree-structured data for efficient document retrieval and plagiarism detection.
Full Abstract
This paper proposes a new document retrieval (DR) and plagiarism detection (PD) system using multilayer self-organizing map (MLSOM). A document is modeled by a rich tree-structured representation, and a SOM-based system is used as a computationally effective solution. Instead of relying on keywords/lines, the proposed scheme compares a full document as a query for performing retrieval and PD. The tree-structured representation hierarchically includes document features as document, pages, and paragraphs. Thus, it can reflect underlying context that is difficult to acquire from the currently used word-frequency information. We show that the tree-structured data is effective for DR and PD. To handle tree-structured representation in an efficient way, we use an MLSOM algorithm, which was previously developed by the authors for the application of image retrieval. In this study, it serves as an effective clustering algorithm. Using the MLSOM, local matching techniques are developed for comparing text documents. Two novel MLSOM-based PD methods are proposed. Detailed simulations are conducted and the experimental results corroborate that the proposed approach is computationally efficient and accurate for DR and PD.
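As a rough illustration of the ideas in the abstract, the sketch below builds a document/page/paragraph tree and maps the paragraph-level features onto a small self-organizing map. It is a minimal sketch under stated assumptions, not the authors' method: the vocabulary, the normalised term-frequency features, the `Node` and `SOM` classes, and all parameter values are hypothetical, and a plain single-layer 1-D SOM stands in for the MLSOM, whose details are not given in the abstract.

```python
from collections import Counter
from dataclasses import dataclass, field
import math
import random

# Hypothetical toy vocabulary; the abstract does not specify the real features.
VOCAB = ["neural", "network", "map", "document", "retrieval", "plagiarism"]

def term_vector(text):
    """Return a unit-length term-frequency vector over VOCAB (zeros if no match)."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

@dataclass
class Node:
    level: str                      # "document", "page" or "paragraph"
    features: list
    children: list = field(default_factory=list)

def build_tree(pages):
    """Build a three-level tree from a list of pages (each a list of paragraph strings)."""
    page_nodes = []
    for paragraphs in pages:
        kids = [Node("paragraph", term_vector(p)) for p in paragraphs]
        page_nodes.append(Node("page", term_vector(" ".join(paragraphs)), kids))
    full_text = " ".join(p for page in pages for p in page)
    return Node("document", term_vector(full_text), page_nodes)

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class SOM:
    """Plain single-layer 1-D SOM; an assumed stand-in, not the authors' MLSOM."""

    def __init__(self, n_units, dim, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]

    def bmu(self, x):
        """Index of the best-matching unit (closest weight vector) for x."""
        return min(range(len(self.weights)), key=lambda i: sq_dist(self.weights[i], x))

    def train(self, data, epochs=50, lr0=0.5, radius0=1.0):
        for epoch in range(epochs):
            frac = 1.0 - epoch / epochs          # linear decay of rate and radius
            lr, radius = lr0 * frac, max(radius0 * frac, 1e-3)
            for x in data:
                winner = self.bmu(x)
                for i, w in enumerate(self.weights):
                    # Gaussian neighbourhood around the winning unit
                    h = math.exp(-((i - winner) ** 2) / (2 * radius ** 2))
                    for d in range(len(w)):
                        w[d] += lr * h * (x[d] - w[d])

# Usage: two pages with two paragraphs each
doc = build_tree([
    ["neural network map", "document retrieval"],
    ["plagiarism document", "neural network"],
])
paragraphs = [para.features for page in doc.children for para in page.children]
som = SOM(n_units=4, dim=len(VOCAB))
som.train(paragraphs)
print([som.bmu(p) for p in paragraphs])   # map unit assigned to each paragraph
```

In this sketch only the paragraph level is clustered. A multilayer scheme would presumably pass the lower-level map positions upward as part of the page-level and document-level inputs, but that step is an assumption here and is left out.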
Author information
Author/s: Chow, Tommy W S (TW); Rahman, M K M (MK);
Affiliation: Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong. eetchow(-atsign-)cityu.edu.hk
Journal and publication information
Publication Type: Journal Article; Research Support, Non-U.S. Gov't
Journal: IEEE transactions on neural networks / a publication of the IEEE Neural Networks Council (IEEE Trans Neural Netw), published in United States. (Language: eng)
Reference: Sep 2009; vol 20 (issue 9): pp 1385-1402
Dates: Created 2009/09/03; Completed 2009/11/04;
PMID: 19643706, status: MEDLINE (last retrieved date: 11/4/2009)
Sourced from the National Library of Medicine. Abstract text and other information may be subject to copyright.
Related articles
These are the most related articles currently in our database:
- Detecting plagiarism: Google could be the way forward. (28 Sep 2006)
- Duplicate publication and 'paper inflation' in the Fractals literature. (29 Jun 2006)
- Bioinformatics code must enforce citation. (4 Jun 2002)
- Special report: taking on the cheats. (17 May 2005)
- Chain letters & evolutionary histories. (30 May 2003)