Embedding으로 Llama2 응답 보정하기 (feat. langchain)

로컬머신에서 LlamaCpp 로 Llama2를 구동하고 Llama2 가 부족한 정보를 벡터디비로 보완해서 성능을 최적화 하는 과정을 정리해본다.

Embedding으로 Llama2 응답 보정하기 (feat. langchain)
Photo by Joakim Honkasalo / Unsplash

로컬머신에서 LlamaCpp 로 Llama2를 구동하고 Llama2 가 부족한 정보를 벡터디비로 보완해서 성능을 최적화 하는 과정을 정리해본다.

1. Llama2 설치하기

1) GGUF 버전의 Llama2 모델 다운로드 (LlamaCpp 변경사항 수정)

llama-2-7b-chat.Q4_K_M.gguf · TheBloke/Llama-2-7b-Chat-GGUF at main
We’re on a journey to advance and democratize artificial intelligence through open source and open science.

하위 과정을 진행하기 위해 "TheBloke/Llama-2-7b-Chat-GGUF" 버전을 다운로드 하도록 한다.

2) LlamaCpp 사용하기

llama.cpp의 주요 목표 중 하나는 MacBook에서 4비트 정수 양자화(feat. GGML)를 사용하여 LLaMA 모델을 실행하는 것이라고 한다.

Llama.cpp | 🦜️🔗 Langchain
llama-cpp-python is a Python binding for llama.cpp.

3) Llama2 기본 모델 테스트 하기

from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain


template = """Question: {question}
Answer: Let's work this out in a step by step way to be sure we have the right answer."""
prompt = PromptTemplate(template=template, input_variables=["question"])

llm = LlamaCpp(
	# model_path: 로컬머신에 다운로드 받은 모델의 위치
    model_path="./llama-2-7b-chat.Q4_K_M.gguf",
    temperature=0.0,
    top_p=1,
    max_tokens=8192,
    verbose=True,
    # n_ctx: 모델이 한 번에 처리할 수 있는 최대 컨텍스트 길이
    n_ctx=4096 
)

llm_chain = LLMChain(prompt=prompt, llm=llm)

prompt = """
Among Korean dramas, please recommend 3 medical dramas about hospital life. 
When recommending, classify it by number and title, and describe the release year and cast.
"""

response = llm_chain.run(prompt)
print(response)
TaskVibes: ADHD Simple Manage - Apps on Google Play
ADHD 를 겪는 사람들을 위한 일정관리 앱

3-1) 기본모델에 한국 드라마 관련 질문하기 (영문)

Here are three Korean medical dramas that you might enjoy:

1. "White Tower" (2010) - This drama is set in a busy hospital and follows the lives of its doctors, nurses, and patients. It stars Lee Dong Wook, Kim Min Jung, and Jo Yeo Jeong.
2. "Dr. Jin" (2012) - This period medical drama takes place during the Joseon dynasty and follows a skilled physician who is transported to modern-day Korea. It stars Song Il Gok, Park Yoo Ha, and Kim Jae Woong.
3. "Good Doctor" (2013) - This drama tells the story of a young doctor with autism who struggles to connect with his colleagues and patients. It stars Lee Sun Kyun, Cho Seung Ki, and Park Jung Min.

All three dramas were well-received by audiences and critics alike, so you can't go wrong with any of them!

질문 : Among Korean dramas, please recommend 3 medical dramas about hospital life. 

3-2) 기본모델에 한국 드라마 관련 질문하기 (한글)

Here are three medical dramas from South Korea that I recommend, along with their titles, release years, and cast information:

1. "White Tower" (2010) - This drama is set in a busy hospital and follows the lives of its doctors and nurses as they navigate the challenges of their profession. The series stars Lee Dong-Wook, Kim Min-Jung, and Jo Yeo-Joo.
Release Year: 2010
Cast: Lee Dong-Wook, Kim Min-Jung, Jo Yeo-Joo
2. "Dr. Foster" (2017) - This drama is a psychological thriller that centers around a family's dark past and their struggles to move on from it. The series stars Yoon Shi-Yoon, Kim Hye-Ja, and Lee Sun-Kyun.
Release Year: 2017
Cast: Yoon Shi-Yoon, Kim Hye-Ja, Lee Sun-Kyun
3. "The Good Doctor" (2013) - This drama follows the story of a young surgeon with autism and Savant syndrome as he navigates his residency at a prestigious hospital. The series stars Park Shin-Hye, Kim Min-Jae, and Lee Sun-Kyun.
Release Year: 2013
Cast: Park Shin-Hye, Kim Min-Jae, Lee Sun-Kyun

I hope this helps! Let me know if you have any other questions.

질문 : 한국 드라마 중에서도 병원 생활을 다룬 의학 드라마 3편을 추천해줘

3-3) 기본모델이 알고 있는 한국드라마의 릴리즈 연도는 어떻게 될까

Here are the 3 most recently released Korean dramas based on my research:

1. "Vagabond" (2019) - This drama is about a stuntman who becomes embroiled in a conspiracy involving a plane crash that killed his wife and daughter. It stars Lee Seung-Gi, Bae Suzy, and Shin Sung-Rok.
2. "Hotel Del Luna" (2019) - This drama is about a hotel for ghosts, demons, and other supernatural beings. It stars IU, Yeo Jin-Goo, and Kim Dae-Myung.
3. "Strong Girl Bong-soon" (2017) - This drama is about a young woman with superhuman strength who becomes a bodyguard for a wealthy CEO. It stars Park Bo-Young, Park Hyung-Sik, and Ahn Hyo-Seop.

So there you have it! These are the 3 most recently released Korean dramas based on my research. Let me know if you need anything else!

질문 : List the titles of the 3 most recently released Korean dramas

TaskVibes: ADHD Simple Manage - Apps on Google Play
ADHD 를 겪는 사람들을 위한 일정관리 앱

2. Embedding

드라마 데이터셋 Embedding 하고 VectorDB FAISS 에 저장하기

1) 데이터셋 정의

Top 250 Korean Dramas (KDrama) Dataset
This CSV file contains data of the top 250 kdrama’s on the MyDramaList website.

지난 글에서 사용한 케글에 공개된 한국드라마 250건에 대한 데이터를 Embedding 하기로 했다.

from langchain.document_loaders import CSVLoader
from langchain.text_splitter import CharacterTextSplitter

loader = CSVLoader("./kdrama.csv")
documents = loader.load()

# 데이터를 불러와서 텍스트를 일정한 수로 나누고 구분자로 연결하는 작업
text_splitter = CharacterTextSplitter(
	chunk_size=1000, 
    chunk_overlap=0, 
    separator="\n"
    )
texts = text_splitter.split_documents(documents)

print(len(texts))

2) HuggingFace, Embeding 모델 (feat.intfloat/multilingual-e5-large)

intfloat/multilingual-e5-large · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.

intfloat/multilingual-e5-large Embedding 모델은 94개국어의 텍스트를 임베딩하는 모델이다.

3) Embeding 하고 FAISS VectorDB 에 저장/로드 하기

from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings

# 임베딩 모델 로드
embeddings = HuggingFaceEmbeddings(model_name="intfloat/multilingual-e5-large")

# 문서에 있는 텍스트를 임베딩하고 FAISS 에 인덱스를 구축함
index = FAISS.from_documents(
	documents=texts,
	embedding=embeddings,
	)

# faiss_db 로 로컬에 저장하기
index.save_local("faiss_db")
# faiss_db 로 로컬에 로드하기
docsearch = FAISS.load_local("faiss_db", embeddings)
TaskVibes: ADHD Simple Manage - Apps on Google Play
ADHD 를 겪는 사람들을 위한 일정관리 앱

3. Langchain, RetrievalQA 로 질의하기

VectorDB 로 Embedding 한 Llama2 모델에게 한국 드라마 질문하기

from langchain.retrievers.document_compressors import EmbeddingsFilter
from langchain.retrievers import ContextualCompressionRetriever
from langchain.chains import RetrievalQA

# 유사도 0.7로 임베딩 필터를 저장
# 유사도에 맞추어 대상이 되는 텍스트를 임베딩함
embeddings_filter = EmbeddingsFilter(
    embeddings=embeddings, 
    similarity_threshold=0.70
)
# 압축 검색기 생성
compression_retriever = ContextualCompressionRetriever(
	# embeddings_filter 설정
    base_compressor=embeddings_filter, 
    # retriever 를 호출하여 검색쿼리와 유사한 텍스트를 찾음
    base_retriever=docsearch.as_retriever()
)
# RetrievalQA 클래스의 from_chain_type이라는 클래스 메서드를 호출하여 질의응답 객체를 생성
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=compression_retriever)

3-1) 기본모델에 했던 한국 드라마 관련 질문 다시하기 (영문)

 Sure! Based on your provided information, here are three medical dramas about hospital life that I can recommend:
 
1. Hospital Playlist (2020) - This drama aired in 2020 and stars Jo Jung Suk, Yoo Yeon Seok, Jung Kyung Ho, Kim Dae Myung, Jeon Mi Do, and Shin Hyun Bin. It has a rating of 9.1 and is classified as a friendship, romance, life, medical drama with strong female leads and multiple couples.
2. Hospital Playlist 2 (2021) - This drama aired in 2021 and stars Jo Jung Suk, Yoo Yeon Seok, Jung Kyung Ho, Kim Dae Myung, Jeon Mi Do, and Shin Hyun Bin. It has a rating of 9.1 and is classified as a workplace, friendship, romance, life, medical drama with strong female leads and multiple couples.
3. Hospital Ship (2017) - This drama aired in 2017 and stars Ji Chang Wook, Choi Soo Young, Sung Dong Il, Won Ji An, Yang Hee Kyung, and Kil Hae Yeon. It has a rating of 8.6 and is classified as a romance, life, drama, medical drama with a tearjerker plot and strong female lead.

I hope this helps! Let me know if you have any other questions.

질문 : Among Korean dramas, please recommend 3 medical dramas about hospital life.

Embedding 한 데이터를 중심으로 20~21년 드라마도 잘 보인다.

3-2) 기본모델에 했던 한국 드라마 관련 질문 다시하기 (한글)

Yes, I can definitely recommend three Korean medical dramas that focus on hospital life. Here are my recommendations, categorized by number and title:

1. Hospital Playlist (2020) - This drama follows the lives of five doctors working at a hospital, exploring themes of friendship, romance, and the challenges of working in a high-pressure medical environment. Starring Jo Jung Suk, Yoo Yeon Seok, Jung Kyung Ho, Kim Dae Myung, Jeon Mi Do, and Shin Hyun Bin. Released in 2020.
2. Doctor Prisoner (2019) - This drama centers around an ace doctor who becomes embroiled in a medical malpractice incident and is subsequently suspended from practicing medicine. He then decides to work at a prison, where he plots to gain allies and eventually exact revenge on the hospital that wronged him. Starring Joo Jin Mo, Lee Jung Jae, and Kim Byeong Chul. Released in 2019.
3. Hospital Playlist 2 (2021) - This is a sequel to the original Hospital Playlist, following the same group of doctors as they face new challenges and struggles in their personal and professional lives. Starring Jo Jung Suk, Yoo Yeon Seok, Jung Kyung Ho, Kim Dae Myung, Jeon Mi Do, and Shin Hyun Bin. Released in 2021.

I hope these recommendations are helpful! Let me know if you have any other questions.

질문 : 한국 드라마 중에서도 병원 생활을 다룬 의학 드라마 3편을 추천해줘

영문 질의와 결과가 조금 다르긴 한데, 한글도 Embedding 한 데이터를 중심으로 20~21년 드라마도 잘 보인다.

3-3) Embedding 된 한국드라마의 릴리즈 연도는 어떻게 될까

Embedding 된 데이터가 2022 가 최신이라 해당 질의에 대한 응답도 2022 년도 작품이 최신으로 나온다.

TaskVibes: ADHD Simple Manage - Apps on Google Play
ADHD 를 겪는 사람들을 위한 일정관리 앱
뉴스 기사 수집해서 llama2 에 질의하기 (feat. Silicon 맥북 with GPU)
M1/M2 같은 Silicon Mac 계열에서 GPU 를 사용하여 llama2 를 파인튜닝 해보고, 뉴스기사를 Embedding 하여 얼마나 만족스러운 답변을 하는지를 테스트 해봤다.