Please use this identifier to cite or link to this item:
https://rfos.fon.bg.ac.rs/handle/123456789/657
Title: | Projektovanje procesa klasterovanja pomoću paterna / Designing clustering process with reusable components |
Authors: | Kirchner, Kathrin; Delibašić, Boris; Vukićević, Milan |
Keywords: | patterns; data mining; clustering; CRISP-DM |
Issue Date: | 2010 |
Publisher: | Univerzitet u Beogradu - Fakultet organizacionih nauka, Beograd |
Abstract (translated from Serbian): | A typical data mining process (hereafter DM), according to the CRISP-DM methodology, consists of several phases, starting with understanding the business process and the data and continuing through preprocessing, modeling and evaluation. For each of these phases, several generic tasks are presented that have to be carried out. When solving practical problems, it is very hard to decide which specialized task best fits the generic phase in question. There are at least three reasons for this. First, a great many specialized tasks exist in the literature and are implemented in DM software. Second, many of these tasks are encapsulated in algorithms and cannot be executed independently of the algorithm. Third, specialized tasks (reusable components, hereafter RCs) are not well organized. For example, it is not easy to select the appropriate RC for a generic task (sub-problem) of a concrete business problem. In this paper, we present a proposal for a modeling methodology, based on the 'white box' principle, that supports the DM process. We also present the methodology for data grouping problems (hereafter clustering), together with a proposal of concrete patterns, based on the use of RCs, for sub-problems that frequently arise in data clustering, preprocessing and post-processing. |
Abstract (English): | A typical data mining process, as described e.g. in the CRISP-DM approach, consists of several phases, starting with business and data understanding and proceeding with preprocessing, modeling and evaluation. For each of these phases, several generic tasks are described that have to be carried out. In practice, however, it is difficult to decide which specialized task best solves a generic task. There are at least three reasons for this. First, a large number of specialized tasks have been proposed in the literature and are available in data mining software. Second, many of these tasks are encapsulated in algorithms and cannot be used independently of the algorithm. Third, specialized tasks (reusable components, RCs) are not well organized, i.e. it is not easy to select the appropriate RC for a generic task (sub-problem). In this paper, we propose a white box modeling methodology that supports the design of the data mining process. Our paper concentrates on clustering algorithms only. Thus, we propose RCs for commonly appearing sub-problems in clustering, as well as pre- and post-processing RCs. |
URI: | https://rfos.fon.bg.ac.rs/handle/123456789/657 |
ISSN: | 1451-4397 |
Appears in Collections: | Radovi istraživača / Researchers’ publications |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.