JiwonDev

# Information Retrieval Retrospective

by JiwonDev

1. Differences between information retrieval and databases

2. The synonym and polysemy problem in natural-language search systems

3. Criteria for judging relevance, and cases that are hard to judge

4. The big picture of an IR system

Indexing module, inverted-file index (posting lists), retrieval module
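
To make the three parts concrete, here is a minimal sketch of an indexing step that builds posting lists and a retrieval step that intersects them. The function names, the whitespace tokenizer, and the three toy documents are assumptions made for illustration, not the course's actual code.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Indexing module: map each term to a posting list of (doc_id, term_frequency)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # assumption: whitespace tokenization
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    # Sort each posting list by document id.
    return {term: sorted(postings.items()) for term, postings in index.items()}


def search(index, query):
    """Retrieval module: ids of documents containing every query term (Boolean AND)."""
    posting_sets = [
        {doc_id for doc_id, _ in index.get(term, [])}
        for term in query.lower().split()
    ]
    return sorted(set.intersection(*posting_sets)) if posting_sets else []


# Hypothetical toy collection.
docs = {1: "new home sales", 2: "home sales rise", 3: "new car prices"}
index = build_inverted_index(docs)
print(index["home"])              # [(1, 1), (2, 1)]
print(search(index, "new home"))  # [1]
```

A real system would rank the matching documents instead of returning a plain Boolean result; the scoring items below cover that part.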

5. Computing query-document similarity (TF, IDF, CF)

6. Query-document similarity formula (TF-IDF / Length(D))
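
A minimal sketch of the TF-IDF / Length(D) score follows. Several details are assumptions, since the notes do not spell them out: IDF is taken as log10(N / df), Length(D) is taken as the number of tokens in the document, and the three documents are invented for the example.

```python
import math
from collections import Counter


def score(query, doc_id, docs):
    """Sum TF * IDF over the query terms, then divide by the document length."""
    tokens = docs[doc_id].split()
    if not tokens:
        return 0.0
    tf = Counter(tokens)
    total = 0.0
    for term in query.split():
        # Document frequency: number of documents that contain the term.
        df = sum(1 for text in docs.values() if term in text.split())
        if df == 0:
            continue  # the term appears nowhere in the collection
        idf = math.log10(len(docs) / df)  # assumption: log base 10
        total += tf[term] * idf
    return total / len(tokens)  # assumption: Length(D) = token count


# Hypothetical toy collection.
docs = {
    "d1": "car insurance auto insurance",
    "d2": "best car best auto",
    "d3": "car",
}
ranking = sorted(docs, key=lambda d: score("auto insurance", d, docs), reverse=True)
print(ranking)  # ['d1', 'd2', 'd3']
```

If the course defines Length(D) as the Euclidean norm of the weight vector instead, only the final division changes.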

7. Indexing unit (2-gram)

For Korean, character 2-grams behave much like morpheme-based index terms, even without morphological analysis.
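
As an illustration, character 2-grams can be extracted with a few lines of code. This is a sketch under assumptions: the function name is made up, n-grams are taken within each whitespace-delimited word, and the sample strings are my own.

```python
def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of each whitespace-delimited word."""
    grams = []
    for word in text.split():
        if len(word) < n:
            grams.append(word)  # keep words shorter than n as they are
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


print(char_ngrams("정보검색 시스템"))
# ['정보', '보검', '검색', '시스', '스템']
```

The 2-gram 검색 ("search") is also produced from 검색엔진 ("search engine"), so the two words match on that index term without any morphological analyzer.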

8. Indexing procedure

English -> [stopword removal, stemming] are the most important steps.

9. Effect of stemming and n-grams in English

Even in English, character-level 4-grams and 5-grams are surprisingly effective. (Stemming works at the word level.)
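
One way to see why this can work: related word forms share most of their character 4-grams, so they match without a stemmer. The sketch below measures that overlap with the Dice coefficient; the function names and the choice of Dice are assumptions for illustration.

```python
def char_ngrams(word, n):
    """Set of overlapping character n-grams of a single word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice(a, b, n=4):
    """Dice coefficient between the n-gram sets of two words."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


shared = char_ngrams("automated", 4) & char_ngrams("automation", 4)
print(sorted(shared))                   # ['auto', 'omat', 'toma', 'utom']
print(dice("automated", "automation"))  # 8/13, about 0.615
print(dice("automated", "operation"))   # 0.0
```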

 

10. Document representation

Bag-of-words document representation

The standard way of representing documents in information retrieval
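
A short sketch of what a bag of words keeps and what it throws away. The helper name and the sample sentences are assumptions chosen for illustration.

```python
from collections import Counter


def bag_of_words(text):
    """Represent a text as term -> count, ignoring word order."""
    return Counter(text.lower().split())


a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")
print(a["the"])  # 2
print(a == b)    # True
```

The two sentences mean different things but get the same representation, which is the price of discarding word order.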

 

11. Four information retrieval models

Boolean, vector space, probabilistic, language model

 

12. Vector space model

BIM (Binary Independence Model) -> documents interpreted as binary vectors, cosine similarity

Be sure to work through the cosine calculation by hand.
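
For checking hand calculations, cosine similarity can be sketched as follows. The vocabulary and the two binary vectors are made-up examples.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_u * norm_v)


# Hypothetical vocabulary order: car, auto, insurance, best
query = [0, 1, 1, 0]
doc = [1, 1, 1, 0]
print(round(cosine(query, doc), 3))  # 0.816
```

Here the dot product is 2 and the lengths are sqrt(2) and sqrt(3), so the result is 2 / sqrt(6).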

13. Binary vector representation, TF vector representation, TF-IDF vector representation
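
The three representations differ only in the weight stored per term. A minimal sketch, assuming IDF = log10(N / df) and a made-up three-document collection:

```python
import math
from collections import Counter

# Hypothetical toy collection.
docs = ["car insurance auto insurance", "best car best auto", "car"]
vocab = sorted({term for doc in docs for term in doc.split()})
# vocab == ['auto', 'best', 'car', 'insurance']

df = {t: sum(1 for doc in docs if t in doc.split()) for t in vocab}
idf = {t: math.log10(len(docs) / df[t]) for t in vocab}  # assumption: log base 10


def vectors(doc):
    """Return the binary, TF, and TF-IDF vectors of one document over vocab."""
    tf = Counter(doc.split())
    binary = [1 if tf[t] else 0 for t in vocab]
    tf_vec = [tf[t] for t in vocab]
    tfidf = [tf[t] * idf[t] for t in vocab]
    return binary, tf_vec, tfidf


binary, tf_vec, tfidf = vectors(docs[0])
print(binary)  # [1, 0, 1, 1]
print(tf_vec)  # [1, 0, 1, 2]
print(tfidf)   # 'car' occurs in every document, so its weight is 0.0
```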

 

14. SMART vector space model (no need to memorize the table)

lnc.ltc, bnn.bnn, etc.
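
In SMART notation the three letters before the dot describe document weighting and the three after it describe query weighting, each as (term frequency, document frequency, normalization). So lnc.ltc means log TF, no IDF, cosine normalization for documents, and log TF, IDF, cosine normalization for queries. The sketch below follows that reading; the function names, the collection size, and the document frequencies are illustrative assumptions.

```python
import math
from collections import Counter


def log_tf(tf):
    """'l': logarithmic term frequency, 1 + log10(tf)."""
    return 1 + math.log10(tf) if tf > 0 else 0.0


def cosine_normalize(weights):
    """'c': divide every weight by the Euclidean length of the vector."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else weights


def lnc(doc_tokens):
    """Document weighting: log TF, no IDF, cosine normalization."""
    return cosine_normalize({t: log_tf(tf) for t, tf in Counter(doc_tokens).items()})


def ltc(query_tokens, df, n_docs):
    """Query weighting: log TF times IDF, cosine normalization."""
    weights = {
        t: log_tf(tf) * math.log10(n_docs / df[t])
        for t, tf in Counter(query_tokens).items()
        if t in df  # terms unseen in the collection are skipped
    }
    return cosine_normalize(weights)


def score(doc_vec, query_vec):
    """Dot product of the two normalized vectors."""
    return sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())


# Hypothetical collection statistics.
N_DOCS = 1_000_000
DF = {"auto": 5_000, "best": 50_000, "car": 10_000, "insurance": 1_000}

doc = lnc("car insurance auto insurance".split())
query = ltc("best car insurance".split(), DF, N_DOCS)
print(round(score(doc, query), 2))  # 0.8
```

Switching to bnn.bnn would mean replacing `log_tf` with a 0/1 indicator and dropping both the IDF factor and the normalization on each side.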

15. Information retrieval evaluation

Precision (P), recall (R), F1 (the harmonic mean; think "half-and-half chicken"): F1 = 2PR / (P + R)
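
A small sketch of the three set-based measures. The function name and the document ids are made up for the example.

```python
def precision_recall_f1(retrieved, relevant):
    """Compute precision, recall, and F1 from retrieved and relevant document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# 4 documents retrieved, 3 relevant in total, 2 in common.
p, r, f1 = precision_recall_f1(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(p, round(r, 3), round(f1, 3))  # 0.5 0.667 0.571
```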

 

16. Evaluation using the precision-recall curve (PRC) graph and tables

- P@k (precision at k), R-precision, MAP, NDCG
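
The ranked-list measures can be sketched as below. Assumptions: relevance is binary for P@k, R-precision, and average precision; NDCG uses the log2(rank + 1) discount and takes the ideal ordering from the gains of the ranked list itself; the ranking and relevance judgments are invented. MAP is the mean of the average precision over all queries.

```python
import math


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k


def r_precision(ranking, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    return precision_at_k(ranking, relevant, len(relevant)) if relevant else 0.0


def average_precision(ranking, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranking, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def ndcg_at_k(gains, k):
    """DCG of the ranking divided by the DCG of the ideal (sorted) ranking."""
    def dcg(gs):
        return sum(g / math.log2(i + 1) for i, g in enumerate(gs[:k], start=1))
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0


# Hypothetical ranked result list and relevance judgments.
ranking = ["d3", "d7", "d1", "d9", "d5"]
relevant = {"d1", "d3", "d5"}
gains = [1 if d in relevant else 0 for d in ranking]  # [1, 0, 1, 0, 1]

print(round(precision_at_k(ranking, relevant, 3), 3))  # 0.667
print(round(r_precision(ranking, relevant), 3))        # 0.667
print(round(average_precision(ranking, relevant), 3))  # 0.756
print(round(ndcg_at_k(gains, 5), 3))                   # 0.885
```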

 

17. Python

PythonBasics.zip

 

IR_00_split (eojeol units, i.e. whitespace-delimited words)

000_test.py

#1. Install Python
#- https://www.python.org/ => download and install (select the option to add Python to PATH on the first installer screen)
 

#2. Install the nltk package
#- Open a command prompt and enter the following command to install nltk (see: https://pypi.org/project/nltk/)
#pip install nltk 

#3. Save the following code as test.py and run "python test.py" from the command prompt; it prints beauti, the stemming result for beautiful

# Import only the class that is used instead of a wildcard import
from nltk.stem.porter import PorterStemmer
stemmer = PorterStemmer()
print(stemmer.stem('beautiful'))

#4. Following step 3, modify test.py so that it prints the stemming result for each of the following words.
'''
automate
automates
automated
automatic
automatical
automatically
automating
automation
operate
operating
operates
operation
operative
operatives
operational
'''

IR_00_ngram (bi-gram)

IR_ngram_lnc.ltc (lnc.ltc)

IR_ngram_lncltc.py
IR_00_ngram.py

 

Simple modifications to the code (e.g., try changing it to lnc.ltc!).
