yeon's

[Programmers: Using Regular Expressions]

yeonjins — Fri, 9 May 2025 15:15:54 +0900

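Basic String Handling (문자열 다루기 기본)

Check whether the string has length 4 or 6 and consists only of digits.

[My solution]

- ^ : anchors the match at the start of the string

- \d{4}|\d{6} : matches exactly four digits or exactly six digits (\d is one digit)

- $ : anchors the match at the end of the string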

import re
def solution(s):
    return bool(re.search(r'^(\d{4}|\d{6})$', s))

[Other solution]

- Instead of anchoring re.search with ^ and $, use re.fullmatch, which succeeds only when the whole string matches

import re
def solution(s):
    return bool(re.fullmatch(r'\d{4}|\d{6}',s))

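Babbling (1) (옹알이 (1))

Check whether each string can be built by combining the four sounds "aya", "ye", "woo", and "ma".

[My solution]

- re.sub : replace every matching sound with an empty string

- If the whole string is empty after the replacement, it can be built from the four sounds, so add 1 to the count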

import re
def solution(babbling):
    answer = 0
    for bab in babbling:
        if not re.sub("(aya|ye|woo|ma)","",bab):
            answer += 1
    
    return answer

+ Notes on the re library

1. To replace regardless of upper or lower case, pass flags=re.IGNORECASE

re.sub('(aya|ye|woo|ma)','',bab, flags=re.IGNORECASE)

2. To limit the number of replacements to two, pass count=2

re.sub('(aya|ye|woo|ma)','',bab, count=2)
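The two options above can be compared side by side. This is a minimal sketch; the helper name strip_sounds and the sample strings are made up for illustration and are not part of the original solution.

```python
import re

PATTERN = "(aya|ye|woo|ma)"

def strip_sounds(word, **kwargs):
    """Remove the four babbling sounds; extra keyword arguments go straight to re.sub."""
    return re.sub(PATTERN, "", word, **kwargs)

# Without flags only the lowercase "ye" matches.
print(strip_sounds("AYAye"))
# With re.IGNORECASE the uppercase "AYA" is removed as well.
print(strip_sounds("AYAye", flags=re.IGNORECASE))
# With count=2 only the first two matches ("aya", "ye") are removed.
print(strip_sounds("ayayewoo", count=2))
```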

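New ID Recommendation (신규 아이디 추천)

A problem where you simply implement the steps exactly as stated.

[My solution]

- To match "two or more dots", wrap the dot in [ ]: a bare . matches any single character, so write [.]

- Keep the quantifier {2,}, which gives the repeat count, outside the [ ]

(inside [ ], the characters of {,} are each treated as a literal character)

- Every character inside [ ] is matched as an individual character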

import re
def solution(new_id):
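    new_id = new_id.lower()                          # step 1: lowercase
    new_id = re.sub(r"[^a-z0-9\-_.]", '', new_id)    # step 2: drop disallowed characters
    new_id = re.sub("[.]{2,}", ".", new_id)          # step 3: collapse repeated dots
    new_id = re.sub("^[.]|[.]$", "", new_id)         # step 4: strip a leading or trailing dot
    if not new_id : new_id = 'a'                     # step 5: an empty string becomes "a"
    if len(new_id)>=16 : new_id = new_id[:15]        # step 6: keep at most 15 characters
    new_id = re.sub("[.]$","",new_id)                #         and strip a trailing dot again
    if len(new_id)<=2:                               # step 7: pad to length 3 with the last character
        while len(new_id)<3:
            new_id = new_id + new_id[-1]
    return new_id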

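[3rd round] File Name Sorting ([3차] 파일명 정렬)

A problem about sorting file names.

Numbers must not end up ordered as 1, 10, 2, 3, 4, and files whose numbers are equal (such as 01 and 1) must keep their input order.

[My solution]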

import re
def solution(files):
    allFiles = []
    for file in files:
        tmpFile = []
        tmpFile.append(re.search('^[^0-9]+',file).group())
        tmpFile.append(re.search('[0-9]{1,5}',file).group())
        tmpFile.append(file[len(tmpFile[0])+len(tmpFile[1]):])
        allFiles.append(tmpFile)
        
    allFiles.sort(key=lambda x: (x[0].lower(), int(x[1])))
    allFiles = [''.join(f) for f in allFiles]
    return allFiles

[Programmers level 2] Problems with a solve rate of 70% or higher

yeonjins — Fri, 2 May 2025 03:15:31 +0900

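Long Jump (멀리뛰기)

[My solution]

1. Working out the first few cases by hand shows that the counts follow the Fibonacci sequence

- Watch out for n = 1 and n = 2 (for n = 1, arr[2] would be out of range), so return early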

def solution(n):
    if n==1:
        return 1
    if n==2:
        return 2
    
    arr = [0]*(n+1)
    arr[1], arr[2] = 1, 2
    
    for i in range(3, n+1):
        arr[i] = arr[i-1]+arr[i-2]
    return arr[n]%1234567

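Least Common Multiple of N Numbers (N개의 최소공배수)

[My solution]

1. Among the multiples of the largest value in the array, find the first one that every other value divides evenly

- Generate the multiples by multiplying the largest value by 1, 2, 3, ... in order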

def solution(arr):
    max_num = max(arr)
    n = 0
    while True:
        n += 1                      # try max_num*1, max_num*2, ...
        candidate = max_num * n
        # the first multiple that every element divides evenly is the LCM
        if all(candidate % i == 0 for i in arr):
            return candidate
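As an alternative to scanning multiples, the LCM can be folded pairwise with the identity lcm(a, b) = a * b / gcd(a, b). This is a sketch of that swapped-in technique, not the approach used above; the names lcm and lcm_of_list are chosen here for illustration.

```python
from functools import reduce
from math import gcd

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

def lcm_of_list(numbers):
    """Fold lcm over the list: lcm(lcm(lcm(n0, n1), n2), ...)."""
    return reduce(lcm, numbers)

print(lcm_of_list([2, 6, 8, 14]))
```

This avoids the open-ended loop, so the running time no longer depends on how far the answer is from the largest element.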

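Next Big Number (다음 큰 숫자)

[My solution]

1. Convert n to binary and record the number of 1s,

2. then add 1 at a time in decimal, convert each value to binary, and compare its number of 1s

+ Counting with findall from the re module timed out on the efficiency tests. str.count is much faster!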

def solution(n):
    nb = format(n,'b')
    while True:
        n += 1
        if nb.count('1') == format(n,'b').count('1'):
            return n

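Repeat Binary Transformation (이진 변환 반복하기)

[My solution]

1. Remove the 0s from the binary string → add the number removed to a running total

2. Convert the remaining length to binary

3. Repeat the steps above until the string is "1"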

def solution(s):
    cnt_0 = 0
    cnt = 0
    while s != '1':
        # remove the 0s
        prev = len(s)
        s = s.replace('0','')
        len_s = len(s)
        cnt_0 += (prev-len_s)
        # convert the remaining length to binary
        s = format(len_s,'b')
        cnt += 1
    
    return [cnt, cnt_0]

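Making the Minimum Value (최솟값 만들기)

[Someone else's solution]

- Pair each large value with a small one: sort A ascending and B descending, then sum the products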

def solution(A,B):
    return sum([i*j for i,j in zip(sorted(A),sorted(B,reverse=True))])

Choosing Tangerines (귤 고르기)

[My solution]

1. Time limit exceeded → accuracy score: 82.4

def solution(k, tangerine):
    dic={}
    answer = 0
    for t in tangerine:
        dic[t] = dic.get(t,0) + 1
        
        if dic[t] > k:
            return 1
    while k>0:
        max_key = max(dic, key=dic.get)
        k -= dic[max_key]
        del dic[max_key]
        answer += 1
        if k <=0:
            return answer

2. Sorting once up front passes every test

The first version probably timed out because it repeats a full max search over the dictionary on every iteration.

def solution(k, tangerine):
    dic={}

    for t in tangerine:
        dic[t] = dic.get(t,0) + 1
        
        if dic[t] > k:
            return 1
    
    lst = sorted(list(dic.values()), reverse=True)
    answer, cnt = 0, 0
    for l in lst:
        answer += 1
        cnt += l
        
        if cnt >= k:
            return answer
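The same "count, sort by frequency, take the most common sizes first" idea can be written with collections.Counter. This is a sketch under the assumption that k never exceeds the number of tangerines; the function name min_kinds is made up here.

```python
from collections import Counter

def min_kinds(k, tangerine):
    """Smallest number of distinct sizes needed to pick k tangerines."""
    picked = 0
    # most_common() yields (size, count) pairs, largest count first
    for kinds, (_, count) in enumerate(Counter(tangerine).most_common(), start=1):
        picked += count
        if picked >= k:
            return kinds

print(min_kinds(6, [1, 3, 2, 5, 4, 5, 2, 3]))
```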

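Fibonacci Number (피보나치 수)

[My solution]

- The Fibonacci pattern was already covered once above

Filling a table bottom-up like this seems to be the most convenient way to solve Fibonacci problems.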

def solution(n):
    arr = [0]*(n+1)
    arr[0], arr[1] = 0, 1
    for i in range(2, n+1):
        arr[i] = arr[i-1]+arr[i-2]   
    return arr[n]%1234567
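The table is not strictly needed, since each step only reads the previous two values. A minimal sketch that keeps two variables and applies the modulus at every step (fib_mod is a name chosen here; 1234567 is the modulus from the problem):

```python
def fib_mod(n, mod=1234567):
    """Return F(n) % mod using constant memory."""
    prev, curr = 0, 1                      # F(0), F(1)
    for _ in range(n):
        prev, curr = curr, (prev + curr) % mod
    return prev                            # after n steps prev holds F(n) % mod

print(fib_mod(3), fib_mod(5))
```

Taking the remainder inside the loop gives the same result as taking it once at the end, while keeping the intermediate numbers small.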

Carpet (카펫)

[My solution]

def solution(brown, yellow):
    # x + y, where x and y are the height and width of the yellow area
    x_y = (brown-4)//2
    # yellow is a single row: x + y = yellow + 1
    if x_y == yellow+1:
        return [yellow+2,3]
    
    # two or more rows: x * y = yellow
    else: 
        for x in range(2, x_y//2+1):
            if (x*(x_y-x)==yellow):
                return [x_y-x+2, x+2]

[Programmers level 1] Past exam problems

yeonjins — Wed, 30 Apr 2025 21:14:08 +0900

[PCCE past exam] Problem 9 / Neighboring Cells (이웃한 칸)

[My solution]

def solution(board, h, w):
    answer = 0
    # color of the current cell
    cur_color = board[h][w]
    dx = [0, 0, -1, 1]
    dy = [-1, 1, 0, 0]
    for i in range(4):
        nh, nw = h+dx[i], w+dy[i]
        if 0 <= nh < len(board) and 0 <= nw < len(board[0]):
            if board[nh][nw] == cur_color:
                answer+=1
    
    return answer

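[PCCE past exam] Problem 9 / Folding the Bill (지폐 접기)

[My solution]

- While the bill's shorter side is longer than the wallet's shorter side, or the bill's longer side is longer than the wallet's longer side, fold the bill's longer side in half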

def solution(wallet, bill):
    answer = 0
    while (min(bill)>min(wallet)) or (max(bill)>max(wallet)):
        if bill[0]>bill[1]:
            bill[0] = bill[0]//2
        else:  # also covers bill[0] == bill[1], which would otherwise loop forever
            bill[1] = bill[1]//2
        answer += 1
    return answer

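[1st round] Secret Map ([1차] 비밀지도)

[My solution]

- Up to the first if statement: convert decimal → binary with format(decimal value, 'b')

Rows that come out too short need '0's prepended

- The following for loop: a cell is a blank only when both maps have 0 there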

def solution(n, arr1, arr2):
    lenmap = n  # every row must be exactly n cells wide, even when all values are small
    answer = []
    for a1, a2 in zip(arr1, arr2):
        a1 = format(a1,'b')
        a2 = format(a2,'b')
        if (len(a1) != lenmap) | (len(a2) != lenmap):
            a1 = '0'*(lenmap-len(a1)) + a1
            a2 = '0'*(lenmap-len(a2)) + a2
        
        row = ''
        for idx in range(len(a1)):
            if (a1[idx]=='0') and (a2[idx]=='0'):
                row += ' '
            else:
                row += '#'
        answer.append(row)
        
    return answer
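Because a cell is a wall when either map has a 1 there, the two rows can also be merged with a bitwise OR before converting to binary. This is a sketch of that alternative; decode_map is a name chosen here for illustration.

```python
def decode_map(n, arr1, arr2):
    """Overlay two maps: a cell is '#' if either map has a wall there."""
    rows = []
    for a, b in zip(arr1, arr2):
        bits = format(a | b, "b").zfill(n)    # OR the rows, pad to width n
        rows.append(bits.replace("1", "#").replace("0", " "))
    return rows

for row in decode_map(5, [9, 20, 28, 18, 11], [30, 1, 21, 17, 28]):
    print(row)
```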

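[PCCE past exam] Problem 10 / Data Analysis (데이터 분석)

[My solution]

- Find the column indexes for the filter key and the sort key

- Keep the rows whose filter column is below the threshold

- Sort them by the sort column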

def solution(data, ext, val_ext, sort_by):
    val = ['code','date','maximum','remain']
    val_idx = val.index(ext)
    sort_by_idx = val.index(sort_by)
    
    answer = []
    for d in data:
        if d[val_idx] < val_ext:
            answer.append(d)
    
    answer.sort(key = lambda x: x[sort_by_idx])
    
    return answer

[PCCE past exam] Problem 10 / Park (공원)

[GPT's solution,,]

- Use all( ) to return size only when every cell inside the size x size square is '-1'

def solution(mats, park):
    n, m = len(park), len(park[0])
    mats.sort(reverse=True)  # try the largest mat first

    for size in mats:
        for i in range(n - size + 1):
            for j in range(m - size + 1):
                if all(park[x][y] == '-1' for x in range(i, i+size) for y in range(j, j+size)):
                    return size
    return -1

[PCCP past exam] Problem 1 / Bandaging (붕대 감기)

[My solution]

- There is probably a more efficient way to write this,,

def solution(bandage, health, attacks):
    curr_h = health
    # time of the last attack
    all_time = attacks[-1][0]
    cnt = 0
    for time in range(all_time+1):
        # attacked at this second
        if time == attacks[0][0]:
            minus = attacks.pop(0)
            curr_h -= minus[1]
            cnt = 0
        # not attacked
        else:
            # below max health: heal
            if curr_h < health:
                curr_h += bandage[1]
                cnt += 1
                if cnt == bandage[0]:
                    curr_h += bandage[2]
                    cnt = 0
                # never exceed max health, or the next attack starts from too high a value
                curr_h = min(curr_h, health)
            # already at max health: nothing to do
            else:
                curr_h = health
                
        # dead once health drops to 0 or below
        if curr_h <= 0:
            return -1
        
    return curr_h

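Taking Out a Parcel Box (택배 상자 꺼내기)

[My code]

- I solved it, but I am not sure why it passes every test..

A brief outline of the approach:

1. Find the floor that holds the chosen box

2. Find how many floors of boxes there are in total

2-1. If nothing is left over on the top floor (it is full): return

2-2. If the top floor is only partly filled: build a list for the top floor only, such as [ 1 1 0 0 0 ]

Odd floors fill from the left, even floors fill from the right

3. Find the position of the chosen box (counted from the left)

On an odd floor it is the remainder as is; on an even floor it is (w+1) minus the remainder

4. Return depending on whether the top-floor list from above holds 0 or 1 at the chosen box's position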

def solution(n, w, num):
    # 고르는 택배가 몇 층에 있는지
    person_floor = num//w if num%w==0 else (num//w)+1
    
    # 총 몇 층인지
    if n%w==0: 
        floor = n//w
        return (floor-person_floor)+1 #층수가 나눠 떨어지면 return
    else: # 나머지가 있으면
        floor = (n//w)+1
        upper_floor = [0 for _ in range(w)]
        if floor%2 != 0: # 왼쪽부터 채우기
            upper_floor[:n%w] = [1]*(n%w)
        else: # 오른쪽부터 채우기
            upper_floor[:n%w] = [1]*(n%w)
            upper_floor = upper_floor[::-1]
    
    # 고르는 택배가 왼쪽부터 몇 번째에 있는지
    if person_floor%2!=0: # 홀수층: 왼부터
        pick_floor = w if num%w==0 else num%w
    else: # 짝수층 : 우부터
        pick_floor = 1 if num%w==0 else (w+1)-(num%w)
    
    if upper_floor[pick_floor-1] == 0: # 맨 위층에는 택배가 없으면
        return floor-person_floor
    elif upper_floor[pick_floor-1] == 1:  # 맨 위층에 택배가 있으면
        return floor-person_floor+1
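One way to check why the floor arithmetic above works is to compute each box's column directly and count the boxes stacked on top of the chosen one. This is a sketch of that swapped-in simulation, not the author's method; the names boxes_to_remove and column are made up here, and it assumes boxes are numbered from 1 and laid out in a zigzag that starts from the left on the first floor.

```python
def boxes_to_remove(n, w, num):
    """Count box `num` plus every box stacked above it in the same column."""
    def column(k):
        row, pos = divmod(k - 1, w)      # 0-based floor and position within the floor
        # even 0-based floors run left to right, odd ones right to left
        return pos if row % 2 == 0 else w - 1 - pos

    target = column(num)
    # every box above `num` has a larger number, so scanning num..n is enough
    return sum(1 for k in range(num, n + 1) if column(k) == target)

print(boxes_to_remove(22, 6, 8))
```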

[Programmers level 1] Problems with a 70~74% solve rate

yeonjins — Tue, 29 Apr 2025 23:50:37 +0900

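1. Nearest Same Letter (가장 가까운 같은 글자)

[My solution]

- lst.index('a') : returns the index of the first element equal to 'a', so searching the reversed prefix gives the distance to the nearest earlier occurrence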

def solution(s):
    answer = []
    for idx, alph in enumerate(s):
        try:
            answer.append(list(s[:idx][::-1]).index(alph)+1)
        except ValueError:  # the letter has not appeared before
            answer.append(-1)
    return answer

[Other solutions]

- Using a dictionary: store the most recent index of each letter and return the difference from the current index

def solution(s):
    answer = []
    d = {}
    
    for idx, alph in enumerate(s):
        if alph not in d:
            answer.append(-1)
        else:
            answer.append(idx-d[alph])
        d[alph] = idx

    return answer

- Using a double for loop: compare the current element with every element before it; the last match found is the nearest one

def solution(s):
    answerList = [-1]*len(s)
    for i in range(len(s)):
        for j in range(i):
            if (s[i] == s[j]):
                answerList[i] = i-j          
    return answerList

2. Caesar Cipher (시저 암호)

[My solution]

- Limited: it only handles wrapping around the alphabet at most once

def solution(s, n):
    answer = []
    for alph in s:
        if alph.isupper(): # 65~90
            if ord(alph)+n > ord('Z'):
                modify = chr(ord('A') + n - (ord('Z')-ord(alph)) - 1)
            else:
                modify = chr(ord(alph)+n)
            
        elif alph.islower(): # 97~122
            if ord(alph)+n > ord('z'):
                modify = chr(ord('a') + n - (ord('z')-ord(alph)) - 1)
            else:
                modify = chr(ord(alph)+n)
        else:
            modify = ' '
        answer.append(modify)
    return ''.join(answer)

[Other solution]

- This one handles any number of wraps around the alphabet

def solution(s, n):
    s = list(s)
    
    for idx in range(len(s)):
        if s[idx].isupper():
            s[idx] = chr((ord(s[idx]) + n - ord('A')) % 26 + ord('A'))
        elif s[idx].islower():
            s[idx] = chr((ord(s[idx]) + n - ord('a')) % 26 + ord('a'))
    
    return ''.join(s)

3. Pick Two and Add (두 개 뽑아서 더하기)

[My solution]

def solution(numbers):
    all_sum = []
    for idx in range(len(numbers)):
        all_sum.extend(numbers[idx]+n for n in numbers[idx+1:])
    return sorted(set(all_sum))

4. K-th Number (K번째 수)

[My solution]

def solution(array, commands):
    answer = []
    for problems in commands:
        problems_array = array[problems[0]-1 : problems[1]]
        problems_array = sorted(problems_array)
        answer.append(problems_array[problems[2]-1])
    return answer

[Other solution]

def solution(array, commands):
    answer = []
    for c in commands:
        arr = array[c[0]-1:c[1]]
        arr.sort()
        answer.append(arr[c[2]-1])
    return answer

5. Number Strings and English Words (숫자 문자열과 영단어)

[My solution]

def solution(s):
    dic = {'zero':0,'one':1,'two':2,'three':3,'four':4,'five':5,
          'six':6,'seven':7,'eight':8,'nine':9}
    for d in dic:
        s = s.replace(d,str(dic[d]))
    return int(s)

Airbnb host sign-up referral link (40,000 KRW bonus)

yeonjins — Sun, 5 Jan 2025 02:15:46 +0900

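I have been busy running an Airbnb listing, and only now found out that when you refer someone who is signing up as a host for the first time, both the new host and the referrer receive a bonus,,

I am leaving my Airbnb host referral link below!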

http://www.airbnb.co.kr/r/247d762

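If you are about to start hosting on Airbnb, sign up through this link and receive an extra 40,000 KRW!

1. Register as a host and list your place through the link above,

2. get a booking within 90 days of registering as a host, and once 14 days have passed after that first stay is completed,

3. the bonus is reportedly paid to your account!

New hosts who sign up through the link above can receive 40,000 KRW.

+ One more thing I learned: regardless of the number of nights, the bonus is reportedly paid only when the first booking is worth at least 130,000 KRW (100 USD).

Leave a comment and let's share information between hosts~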


ValueError: The following `model_kwargs` are not used by the model: ['image_sizes'] (note: typos in the generate arguments will also show up in this list): how I fixed it for huggingface llava

yeonjins — Wed, 30 Oct 2024 20:38:00 +0900

I was using the official LLaVA checkpoint on Hugging Face, llava-hf/llava-v1.6-vicuna-7b-hf, and got an error during model generate.

ValueError: The following `model_kwargs` are not used by the model: ['image_sizes'] (note: typos in the generate arguments will also show up in this list)

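There seem to be two ways to fix it.

1. For the first, see the link below.

https://github.com/haotian-liu/LLaVA/issues/1131

In short, it says to use the model provided by liuhaotian instead of the checkpoint provided by llava-hf. (Apparently the llava model path has a bug.)

But I... had to match the experimental setup described in the paper exactly (my excuse), and since I had already downloaded the model locally, my patience for waiting had run out. So downloading a new one was not an option.

I therefore went with the second approach.

2. Fixing the errors one at a time

1) The message says the model does not accept ['image_sizes'], so I deleted it myself.

That is, I removed the image_sizes key from the values produced by the processor (the values fed to the model as input).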

inputs = processor(images=image1, text=text_prompt, return_tensors="pt").to("cuda:0")
for k,v in inputs.items():
    print(k,v.shape)
    
input_ids torch.Size([1, 38])
attention_mask torch.Size([1, 38])
pixel_values torch.Size([1, 5, 3, 336, 336])
image_sizes torch.Size([1, 2])

del inputs['image_sizes']

With that fixed, the next error appeared.

RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [1, 5, 3, 336, 336]

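2) At a glance.. the input should be [batch, channel, height, width], but there is one extra dimension.

So I went into the installed package and edited it directly.

The path looks roughly like this. It is probably similar for most people.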

.../miniconda3/envs/yeonjin/lib/python3.8/site-packages/transformers/models/llava_next/image_processing_llava_next.py

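Looking inside, I found that at the end of the get_image_patches function, where the patches are appended, 4 extra images get added.

So I dropped the patches and changed the function to return only resized_original_image. (I only changed the return statement..)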

resized_original_image = resize(
    image,
    size=size,
    resample=resample,
    data_format=data_format,
    input_data_format=input_data_format,
)
print('### ---- number of patches >> ', len(patches))

image_patches = [resized_original_image] + patches

return [resized_original_image]   # <- this used to return image_patches

3) Finally, only the dimensions of pixel_values need to be fixed.

[1, 1, 3, 336, 336] → [1, 3, 336, 336]

inputs['pixel_values']=inputs['pixel_values'][0]

Thankfully, it seems to run fine now ㅎ.ㅎ

USER:
What is shown in this image? ASSISTANT: The image appears to be a scatter plot or a heat map that compares the performance of different models or algorithms on a task. The x-axis represents different models or algorithms, and the y-axis represents some sort of performance metric, which could be accuracy, F1 score, or another evaluation metric.

Each point on the plot represents the performance of a specific model or algorithm on the task. The color intensity indicates the level of performance, with warmer colors (reds and yellows) suggesting better performance and cooler colors (blues) suggesting worse performance.

Warning: Permanently added 'node000' (000) to the list of known hosts. How to fix it

yeonjins — Mon, 30 Sep 2024 23:01:20 +0900

Our department's server allocates GPUs through SLURM.

I connect to my allocated node with ssh [NODELIST(REASON)], and the first time I got the message below.

Warning: Permanently added 'node00' (000) to the list of known hosts.

Apparently this message appears when connecting to an instance over SSH... it reportedly happens because the OpenSSH client compiled for Windows does not check the known_hosts file.

The fix is to add the line below to the ~/.ssh/config file.
UserKnownHostsFile ~/.ssh/known_hosts

Baekjoon basics (Python)

yeonjins — Thu, 19 Sep 2024 20:08:35 +0900

Problem 15552

Reading with map(int,sys.stdin.readline().split()) instead of input() is faster

import sys

N = int(sys.stdin.readline())

for _ in range(N):
  a, b = map(int,sys.stdin.readline().split())
  print(a+b)

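Problem 10989: Sorting Numbers 3 (수 정렬하기 3)

Using the built-in sort exceeds the memory limit

Size of an int = 4 bytes (a C-style estimate; Python int objects are larger)

Assuming up to 10,000,000 numbers, that is 40,000,000 bytes = 40MB

But the memory limit is 8MB

So counting sort has to be used

(Reference: https://kill-xxx.tistory.com/entry/python-%EA%B3%84%EC%88%98%EC%A0%95%EB%A0%AC )

Counting sort: uses the indexes of a list as the values

1. Create a list whose elements are all 0

2. For each incoming number, add 1 at that index (= the number) of the list

3. Printing: if an element is 0, that index value never appeared, so skip it; otherwise print the index as many times as its count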

import sys
N = int(sys.stdin.readline())
lst=[0]*10001

for _ in range(N):
    lst[int(sys.stdin.readline())]+=1

for idx in range(len(lst)):
    if lst[idx]!=0:
        for _ in range(lst[idx]):
            print(idx)

Problem 2577

Number of Digits (숫자의 개수)

Solved with the same counting approach

import sys
n=1
for _ in range(3):
    n*=int(sys.stdin.readline())


lst=[0]*10
for idx in str(n):
  lst[int(idx)]+=1

for i in range(10):
  print(lst[i])

Another approach

import sys
n=1
for _ in range(3):
    n*=int(sys.stdin.readline())


result = list(str(n))
for i in range(0,10):
    count = 0
    for num in result:
        if i == int(num):
            count += 1
    print(count)

Both use the same memory and time

So rather than the counting list, which can be confusing, the second approach seems better..

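Stack, Queue

Problem 1158: Josephus Problem (요세푸스 문제)

It is a simple problem, so I do not know why it took me so long three months ago.

A plain Python list gave me a runtime error, so deque is a must.

My solution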

from collections import deque
N, K = map(int,input().split())
lst=deque([i+1 for i in range(N)])

s=1

print('<',end='')
while lst:
  if s%K==0:
    if len(lst)==1:
        print(f'{lst.popleft()}>')
    else:
      print(f'{lst.popleft()}, ', end='')
  else:
    lst.append(lst.popleft())
  s+=1

Textbook(?) solution

from collections import deque

N, k = map(int, input().split())

lst = deque([i+1 for i in range(N)])
answer = []

while lst:
    for _ in range(k-1):
        lst.append(lst.popleft())
    answer.append(lst.popleft())
print(str(answer).replace('[', '<').replace(']', '>'))
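The inner loop that moves k-1 people to the back is exactly what deque.rotate does in a single call. A minimal sketch (the function name josephus is chosen here; the output formatting follows the problem's `<3, 6, ...>` style):

```python
from collections import deque

def josephus(n, k):
    """Return the removal order when every k-th person is removed from a circle of n."""
    people = deque(range(1, n + 1))
    order = []
    while people:
        people.rotate(-(k - 1))          # move the first k-1 people to the back
        order.append(people.popleft())   # the k-th person leaves the circle
    return order

print("<" + ", ".join(map(str, josephus(7, 3))) + ">")
```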

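Problem 9012: Parentheses (괄호)

It takes the same time as the textbook(?) solution.

My solution

I kept getting errors because I did not assign the result of w.replace back to w.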

N = int(input())

for _ in range(N):
  w = input()
  
  # an odd length can never be balanced: NO
  if len(w)%2 != 0:
    print('NO')
  else:
    for _ in range(len(w)//2):
      w = w.replace('()','')
    if w != '':
      print('NO')
    else:
      print('YES')

Textbook(?) solution

import sys
num = int(sys.stdin.readline())

for _ in range(num):
    inp = sys.stdin.readline().rstrip()  # strip the trailing \n
    
    while '()' in inp:
        inp = inp.replace('()','')
    if inp:
        print('NO')
    else:
        print('YES')
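Both versions above rely on repeatedly deleting "()". Since this post sits under the stack heading, here is a sketch of the stack-style check as well; because there is only one bracket type, a depth counter can stand in for the stack. The name is_vps is chosen here, and the input is assumed to contain only '(' and ')'.

```python
def is_vps(s):
    """True if s is a valid parenthesis string."""
    depth = 0                      # number of currently open '('
    for ch in s:
        if ch == "(":
            depth += 1
        else:
            depth -= 1
            if depth < 0:          # a ')' appeared with nothing to close
                return False
    return depth == 0              # everything opened must be closed

print("YES" if is_vps("(()())((()))") else "NO")
```

This makes a single pass over the string, whereas each replace call rescans and copies it.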

Problem 9093: Reversing Words (단어 뒤집기)

My old solution is much more concise, yet it took more than three times as long.

My solution

N = int(input())

for _ in range(N):
  sent = input().split()
  result=''
  for i in sent:
    result += i[::-1] + ' '
  print(result)

Old solution

n = int(input())

for _ in range(n):
    a = input()
    answer = ' '.join([i[::-1] for i in a.split()])
    print(answer)

DFS / BFS

Problem 1260: DFS and BFS (DFS와 BFS)

N,M,V = map(int,input().split())
graph = [[] for _ in range(N+1)]
for _ in range(M):
  a, b = map(int,input().split())
  graph[a].append(b)
  graph[b].append(a)

# sort each adjacency list so smaller vertex numbers are visited first
for i in graph:
  i.sort()

# DFS code
visited=[0]*(N+1)
def dfs(graph, v, visited):
  print(v, end=' ')
  visited[v]=True
  
  for i in graph[v]:
    if visited[i] != True:
      dfs(graph, i, visited)
dfs(graph, V, visited)
print()

# BFS code
from collections import deque
visited=[0]*(N+1)
def bfs(graph, start, visited):
  cur = deque([start])
  visited[start]=True
  while cur:
    v = cur.popleft()
    print(v, end=' ')
    for i in graph[v]:
      if visited[i] != True:
        cur.append(i)
        visited[i]=True
bfs(graph, V, visited)

I found cleaner code

My code takes 292ms and this code takes 60ms..

It turns out that switching input() to sys in my own code also brought it down to 60ms.

A lesson not to forget: sys.stdin.readline().split()

import sys
from collections import deque

# input
n, m, v = map(int, sys.stdin.readline().split())
graph = [[] for _ in range(n+1)]
visited = [False] * (n + 1)
# build the adjacency list
for _ in range(m):
    a, b = map(int, sys.stdin.readline().split())
    graph[a].append(b)
    graph[b].append(a)
# sorting
for i in range(1, n+1):
    graph[i].sort()
    
def dfs(n):
    print(n, end=' ')
    visited[n] = True
    for i in graph[n]:
        if not visited[i]:
            dfs(i)

def bfs(n):
    visited[n] = True
    queue = deque([n])
    while queue:
        v = queue.popleft()
        print(v, end= ' ')
        for i in graph[v]:
            if not visited[i]:
                queue.append(i)
                visited[i] = True


dfs(v)
visited = [False] * (n + 1)
print()
bfs(v)

The version below gave me a runtime error... why?

N,M,V = map(int,input().split())
graph = [[] for _ in range(N+1)]

for _ in range(M):
  a, b = map(int,input().split())
  graph[a].append(b)
  graph[b].append(a)

for i in graph:
  i.sort()

visited=[False]*(N+1)
def dfs(v):
  print(v, end=' ')
  visited[v]=True
  for i in graph[v]:
    if not visited[i]: #방문하지 않았다면
      dfs(i)
dfs(V)

print('')
from collections import deque
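def bfs(v):
  # the original used deque([n]) and visited[n] here; n is not defined anywhere, which raises NameError
  cur = deque([v])
  visited[v]=True
  while cur:
    v = cur.popleft()
    print(v, end=' ')
    for i in graph[v]:
      if not visited[i]: # not visited yet
        cur.append(i)
        visited[i]=True
        
visited=[False]*(N+1)
# the original call was bfs(graph, V, visited), but bfs takes a single argument (TypeError)
bfs(V)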

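Problem 2667: Numbering Housing Complexes (단지번호붙이기)

Problem 1012: Organic Cabbage (유기농 배추)

Similar to the "freezing ice in an ice tray" problem

Adjacent cells that are 1 are connected and merged into one region → count how many merged regions there are

Am I the only one who finds DFS/BFS problems hard..?

Before I knew the code for setting the recursion depth, I got runtime errors... (Python's default limit is 1000)

Always call sys.setrecursionlimit(10000)..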

import sys
sys.setrecursionlimit(10000) # raise the recursion depth limit

for _ in range(int(sys.stdin.readline())):
  M,N,K=map(int,sys.stdin.readline().split())
  graph=[[0]*M for i in range(N)]
  
  for _ in range(K):
    X,Y= map(int,sys.stdin.readline().split())
    graph[Y][X]=1
  
  def dfs(y,x):
    if x<0 or y<0 or x>=M or y>=N:
      return False
    
    if graph[y][x]==1:
      graph[y][x]=0
      dfs(y-1,x)
      dfs(y,x-1)
      dfs(y+1,x)
      dfs(y,x+1)
      return True
    else:
      return False
  
  count=0
  for i in range(N):
    for j in range(M):
      if dfs(i,j)==True:
        count+=1
  
  print(count)
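The recursion limit can be sidestepped entirely by exploring each region with a queue. The sketch below is an iterative BFS variant of the same flood-fill count; count_regions and the sample grid are made up for illustration, and the grid is assumed to be a non-empty list of equal-length rows of 0s and 1s.

```python
from collections import deque

def count_regions(grid):
    """Count groups of 1-cells connected up, down, left, or right."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            regions += 1                     # found an unvisited region
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:                     # flood-fill the whole region
                y, x = queue.popleft()
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions

sample = [[1, 1, 0, 0],
          [0, 1, 0, 1],
          [0, 0, 0, 1],
          [1, 0, 0, 0]]
print(count_regions(sample))
```

Unlike the recursive version, this leaves the input grid unmodified because visited cells are tracked in a separate table.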

This one is code borrowed from someone who is really good at this..

I am so bad at this ㅜ0ㅠ I should quit.. no, keep going...

import sys
sys.setrecursionlimit(10000) # raise the recursion depth limit

T = int(sys.stdin.readline())

dx = [-1,1,0,0]
dy = [0,0,-1,1]
def dfs(y,x):
  if x<0 or y<0 or x>=M or y>=N:
    return False

  if graph[y][x]==1:
    graph[y][x]=0
    for i in range(4): # explore all four directions
      dfs(y+dy[i],x+dx[i])      
    return True
  else:
    return False

for _ in range(T):
  M,N,K=map(int,sys.stdin.readline().split())
  graph=[[0]*M for i in range(N)]
  
  for _ in range(K):
    X,Y= map(int,sys.stdin.readline().split())
    graph[Y][X]=1
  
  count=0
  for i in range(N):
    for j in range(M):
      if dfs(i,j)==True:
        count+=1
  
  print(count)

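Problem 11724: Number of Connected Components (연결 요소의 개수)

Like 1012, treat the parts of the graph that are connected to each other as one group → count how many such groups there are

I think I can at least write DFS code now

But I am still really bad at the variations..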

import sys
sys.setrecursionlimit(1000000) # raise the recursion depth limit

N,M=map(int,sys.stdin.readline().split())
graph=[[] for _ in range(N+1)]
for _ in range(M):
  a,b= map(int,sys.stdin.readline().split())
  graph[a].append(b)
  graph[b].append(a)

visited=[0]*(N+1)
def dfs(v):
  visited[v]+=1
  for i in graph[v]:
    if visited[i]==0:
      dfs(i)

cnt=0
for i in range(1,N+1):
  if visited[i]==0:
    dfs(i)
    cnt+=1  # +1 each time a dfs finishes

print(cnt)

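Problem 2178: Maze Exploration (미로 탐색)

Find the smallest number of cells needed to reach position (N, M)

Passable cells are marked 1.

Each newly visited cell is overwritten with the previous cell's value plus 1, so it ends up holding its distance from the start.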

import sys
from collections import deque

N,M=map(int,sys.stdin.readline().split())
graph=[]
for _ in range(N):
  a = list(map(int,sys.stdin.readline().rstrip()))
  graph.append(a)

dx = [-1,1,0,0]
dy = [0,0,1,-1]

def bfs(x,y):
  queue = deque()
  queue.append((x,y))
  while queue:
    x,y=queue.popleft()
    for i in range(4):
      nx, ny = x+dx[i], y+dy[i]
      if nx<0 or ny<0 or nx>=N or ny>=M:
        continue
      if graph[nx][ny]==0:
        continue
      if graph[nx][ny]==1:
        queue.append((nx,ny))
        graph[nx][ny]=graph[x][y]+1
  return graph[N-1][M-1]

print(bfs(0,0))

CLIP code breakdown (1): the text part

yeonjins — Tue, 10 Sep 2024 21:40:31 +0900

This post refers to the Hugging Face source.

https://huggingface.co/transformers/v4.6.0/_modules/transformers/models/clip/modeling_clip.html

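Usually, when I need a model, I take it from Hugging Face as is and fine-tune it, or modify only the tail end of the model and tune it again. But when using a finished, packaged model as is, changing the parts I want to change is not easy, and above all I often ended up using models without fully understanding their internal structure.

So this time, wanting to change the inside of a model in my own direction as well, I took the source code of CLIP, one of the representative models, directly from Hugging Face and analyzed it.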

Loading CLIP from Hugging Face shows the following overall structure.

In both the text and vision parts, the encoder stacks 12 layers of identical structure (indexed 0 to 11).

- CLIPModel

    - (text_model): CLIPTextTransformer(
        - (embeddings): token_embedding(49408, 512), position_embedding(77, 512)
        - (encoder): 12 layers
                self_attn
                layer_norm1
                mlp
                layer_norm2
        - (final_layer_norm)
    )
        
    - (vision_model): CLIPVisionTransformer(
        - (embeddings): patch_embedding(Conv2d), position_embedding(50, 768)
        - (pre_layrnorm) 
        - (encoder): 12 layers
                self_attn
                layer_norm1
                mlp
                layer_norm2
        - (post_layernorm)
    )
        
    - (visual_projection)
    
    - (text_projection)

Let's look at the code for the text part.

1. Embeddings

1. Token embedding : creates one embedding per vocabulary entry (vocab size). [vocab size, 512]

self.token_embedding = nn.Embedding(args.vocab_size, embed_dim)

2. Position embedding : creates a position embedding for each input token position. The default maximum input length of CLIP is 77.

self.position_embedding = nn.Embedding(args.max_position_embeddings, embed_dim)

3. Other

1) position_ids

: a tensor holding the index values 0 through max_position_embeddings - 1 in order

: shape : [77] → expand((1, -1)) → [1, 77]

self.register_buffer("position_ids", torch.arange(args.max_position_embeddings).expand((1, -1)))

[ register_buffer ]

Used for tensors that should be saved in the state_dict but receive no backpropagation and are not updated by the optimizer

Reference : https://aigong.tistory.com/429

4. Execution

I had already downloaded the tokenizer locally, so I used the local_files_only=True argument.

When the input has no position_ids and no inputs_embeds,

1) look up the token embedding that matches each token's input_ids,

inputs_embeds = self.token_embedding(input_ids)

2) look up the position embedding that matches each token position in position_ids,

position_ids = self.position_ids[:, :seq_length]
position_embeddings = self.position_embedding(position_ids)

3) and add the two to produce the text embedding.

from torch import nn
import torch
import argparse


class CLIPTextEmbeddings(nn.Module):
    def __init__(self, args):
        super().__init__()
        embed_dim = args.hidden_size

        self.token_embedding = nn.Embedding(args.vocab_size, embed_dim)
        self.position_embedding = nn.Embedding(args.max_position_embeddings, embed_dim)

        # position_ids (1, len position emb) is contiguous in memory and exported when serialized
        self.register_buffer("position_ids", torch.arange(args.max_position_embeddings).expand((1, -1)))

    def forward(self, input_ids=None, position_ids=None, inputs_embeds=None):
        # sequence length (number of tokens)
        seq_length = input_ids.shape[-1] if input_ids is not None else inputs_embeds.shape[-2]

        if position_ids is None:
            position_ids = self.position_ids[:, :seq_length]

        if inputs_embeds is None:
            inputs_embeds = self.token_embedding(input_ids)

        position_embeddings = self.position_embedding(position_ids)
        embeddings = inputs_embeds + position_embeddings

        return embeddings

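2. Self-attention

1. Multi-head : uses the multi-head attention from the Transformer paper

1) head_dim

The embedding dimension is 512 for text and 768 for images, and the number of heads the computation is split across is 8 for text and 12 for images

So the size handled by one head (head_dim) is 64 in both cases

2) Set the value used to scale the Query before computing attention

For text, head_dim ^ -0.5 = 64 ** -0.5 = 1/8, so the query is scaled by multiplying it by 0.125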

self.head_dim = self.embed_dim // self.num_heads # text 512//8 = 64, vision 768//12 = 64
# assert that embed_dim divides evenly by num_heads
assert (
            self.head_dim * self.num_heads == self.embed_dim
        ), f"embed_dim must be divisible by num_heads (got `embed_dim`: {self.embed_dim} and `num_heads`: {self.num_heads})."
self.scale = self.head_dim ** -0.5
self.dropout = args.attention_dropout
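The arithmetic above can be checked without PyTorch. The sketch below recomputes head_dim and the scale for both towers, and also shows that scaling the query first gives the same score as dividing the query-key dot product by sqrt(d), as in the Transformer paper. The helper name head_dim_and_scale and the sample vectors q and k are made up for illustration.

```python
import math

def head_dim_and_scale(embed_dim, num_heads):
    """Return (head_dim, scale) the same way the attention module derives them."""
    head_dim = embed_dim // num_heads
    assert head_dim * num_heads == embed_dim, "embed_dim must be divisible by num_heads"
    return head_dim, head_dim ** -0.5

print(head_dim_and_scale(512, 8))     # text tower
print(head_dim_and_scale(768, 12))    # vision tower

# Scaling q before the dot product equals dividing the dot product by sqrt(d).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.0, 0.5, -0.5, 4.0]
d = len(q)
scaled_first = sum((qi * d ** -0.5) * ki for qi, ki in zip(q, k))
scaled_after = sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
print(math.isclose(scaled_first, scaled_after))
```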

2. Query, Key, Value

Create the query, key, and value used to compute attention

For text, both the input and the output of each Linear layer are 512-dimensional

self.k_proj = nn.Linear(self.embed_dim, self.embed_dim)
self.v_proj = nn.Linear(self.embed_dim, self.embed_dim)
self.q_proj = nn.Linear(self.embed_dim, self.embed_dim)
self.out_proj = nn.Linear(self.embed_dim, self.embed_dim)

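3. Other

1) Adjusting the shape

When transpose adjusts the shape, the elements do not move in memory; only the way they are indexed changes

So contiguous( ) is used to actually rearrange the elements in memory

The shape changes from [num_sentences, seq_len, 512] → [num_sentences, num_heads, seq_len, head_dim]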

def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
        return tensor.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2).contiguous()
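To see which numbers end up in which head, the same view + transpose can be mimicked with plain nested lists. This is only an illustration of the index mapping, not how PyTorch stores tensors; split_heads and the toy input are hypothetical.

```python
def split_heads(hidden, num_heads):
    """[batch][seq_len][embed_dim] -> [batch][num_heads][seq_len][head_dim] on nested lists."""
    out = []
    for sentence in hidden:                           # sentence: [seq_len][embed_dim]
        head_dim = len(sentence[0]) // num_heads
        heads = []
        for h in range(num_heads):
            # head h takes the h-th slice of every token's embedding
            heads.append([tok[h * head_dim:(h + 1) * head_dim] for tok in sentence])
        out.append(heads)
    return out

# 1 sentence, 3 tokens, embed_dim 4, split into 2 heads of size 2
toy = [[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]]
print(split_heads(toy, 2))
```

Each head receives a contiguous slice of every token's embedding, which is why the last dimension shrinks from embed_dim to head_dim.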

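4. Execution

Now the actual attention computation begins.

The flow of the attention computation is explained in this post too, but the meaning of the Query, Key, Value computation is written up in detail in my Transformer paper review below.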

https://yeonjins.tistory.com/entry/Paper-Attention-Is-All-You-NeedNIPS-2017


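The original Transformer paper scales the query-key product by dividing by the square root of the dimension d; here the same scaling is applied by multiplying the query by self.scale in step 1) below.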

1) Attention computation

- Key and Value are used after reshaping

For text: [num_sentences, seq_len, 512] → reshape → [num_sentences, 8, seq_len, 64] → reshape (proj_shape) → [num_sentences x 8, seq_len, 64]

- Query is scaled first, then reshaped

scale → [num_sentences, seq_len, 512] → reshape → [num_sentences, 8, seq_len, 64] → reshape (proj_shape) → [num_sentences x 8, seq_len, 64]

bsz, tgt_len, embed_dim = hidden_states.size() # num_sentences, seq_len, dimension 512

proj_shape = (bsz * self.num_heads, -1, self.head_dim)


query_states = self.q_proj(hidden_states) * self.scale
query_states = self._shape(query_states, tgt_len, bsz).view(*proj_shape)

key_states = self._shape(self.k_proj(hidden_states), -1, bsz)
key_states = key_states.view(*proj_shape)

value_states = self._shape(self.v_proj(hidden_states), -1, bsz)
value_states = value_states.view(*proj_shape)

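2) Matrix product of Query and Key : attention weight

- torch.bmm : [B, N, M] x [B, M, P] = [B, N, P]

Given 3D tensors, that is, when both operands are batched, it performs one matrix product per batch entry

Here the batch dimension is num_sentences x 8 (one entry per sentence and head)

So the key has to be transposed so that the query and key can be multiplied

Query : [num_sentences x 8, seq_len, 64]

Key : [num_sentences x 8, seq_len, 64] → [num_sentences x 8, 64, seq_len]

Passing these two through torch.bmm (multiplying them) gives → [num_sentences x 8, seq_len, seq_len]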

src_len = key_states.size(1)  # [num_sentences x 8, seq_len, 64] -> seq_len
attn_weights = torch.bmm(query_states, key_states.transpose(1, 2)) # batched matrix product of two 3D tensors

# check that the shape is as expected
if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
        raise ValueError(
            f"Attention weights should be of size {(bsz * self.num_heads, tgt_len, src_len)}, but is {attn_weights.size()}"
        )
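The shape rule [B, N, M] x [B, M, P] = [B, N, P] can be traced by hand on small nested lists. The sketch below is a pure-Python stand-in for torch.bmm, meant only to show the shapes and the role of the key transpose; bmm, transpose_last_two, shape, and the toy tensors are all hypothetical names for this illustration.

```python
def bmm(a, b):
    """Batched matrix product on nested lists: [B][N][M] x [B][M][P] -> [B][N][P]."""
    return [
        [[sum(x * y for x, y in zip(row, col)) for col in zip(*mat_b)] for row in mat_a]
        for mat_a, mat_b in zip(a, b)
    ]

def transpose_last_two(t):
    """Swap the last two dimensions: [B][N][M] -> [B][M][N]."""
    return [[list(col) for col in zip(*mat)] for mat in t]

def shape(t):
    return (len(t), len(t[0]), len(t[0][0]))

# batch of 2 (standing in for num_sentences x heads), seq_len 3, head_dim 2
query = [[[1, 0], [0, 1], [1, 1]] for _ in range(2)]
key = [[[1, 0], [0, 1], [1, 1]] for _ in range(2)]

weights = bmm(query, transpose_last_two(key))
print(shape(query), shape(transpose_last_two(key)), shape(weights))
```

Each row of a weights matrix holds one query token's dot products with every key token, which is why the result is seq_len x seq_len per batch entry.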

3) The attention mask is applied here

if causal_attention_mask is not None:
    if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
        raise ValueError(
            f"Attention mask should be of size {(bsz, 1, tgt_len, src_len)}, but is {causal_attention_mask.size()}"
        )
    attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len, src_len) + causal_attention_mask
    attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len)

if attention_mask is not None:
    if attention_mask.size() != (bsz, 1, tgt_len, src_len):
        raise ValueError(
            f"Attention mask should be of size {(bsz, 1, tgt_len, src_len)}, but is {attention_mask.size()}"
        )
    attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len, src_len) + attention_mask
    attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len)

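4) Softmax

Softmax turns the weights into attention scores

Multiplying the query and key above gives how similar they are, and the softmax function normalizes the values into the 0~1 range so that the similarities become probabilities

That is, each query gets a similarity to every key, and the probabilities over the keys for one query sum to 1

→ [num_sentences x 8, seq_len, seq_len]

dropout

→ [num_sentences x 8, seq_len, seq_len] (dropout does not change the shape)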

attn_weights = F.softmax(attn_weights, dim=-1)

if output_attentions:
    # when True, the post-softmax attention scores are also returned to the caller
    # the source comments say the double reshape is needed so that attn_weights keeps its gradient
    attn_weights_reshaped = attn_weights.view(bsz, self.num_heads, tgt_len, src_len)
    attn_weights = attn_weights_reshaped.view(bsz * self.num_heads, tgt_len, src_len)
else:
    attn_weights_reshaped = None

attn_probs = F.dropout(attn_weights, p=self.dropout, training=self.training)
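To make the "probabilities for one query sum to 1" point concrete, here is a minimal pure-Python softmax over a single row. The three scores are made-up values standing in for one query against three keys.

```python
import math

def softmax(row):
    # subtract the max first for numerical stability
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores of one query against three keys
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
print(probs, sum(probs))
```

F.softmax(attn_weights, dim=-1) applies the same operation to every row of the last axis, so each query row becomes a distribution over keys.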

5) Matrix product of the attention scores and Value

Attention score : [num_sentences x 8, seq_len, seq_len]

Value : [num_sentences x 8, seq_len, 64]

Passing these two through torch.bmm (multiplying them) gives → [num_sentences x 8, seq_len, 64]

attn_output = torch.bmm(attn_probs, value_states)

# check the shape
if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
    raise ValueError(
        f"`attn_output` should be of size {(bsz * self.num_heads, tgt_len, self.head_dim)}, but is {attn_output.size()}"
    )

6) Reshaping the output

[num_sentences x 8, seq_len, 64] → [num_sentences, 8, seq_len, 64] → [num_sentences, seq_len, 8, 64] → [num_sentences, seq_len, 512]

After the final linear projection, one attention computation is complete

# attn_output : [num_sentences x 8, seq_len, 64]
attn_output = attn_output.view(bsz, self.num_heads, tgt_len, self.head_dim)  # [num_sentences, 8, seq_len, 64]
attn_output = attn_output.transpose(1, 2)   # [num_sentences, seq_len, 8, 64]
attn_output = attn_output.reshape(bsz, tgt_len, embed_dim)  # [num_sentences, seq_len, 512]

attn_output = self.out_proj(attn_output)
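The sketch below, using random data and the assumed sizes from this post, checks what the view / transpose / reshape sequence does to the layout: the 512 features of each token end up as the 8 heads' 64-dim outputs laid side by side.

```python
import torch

bsz, num_heads, tgt_len, head_dim = 2, 8, 7, 64
embed_dim = num_heads * head_dim  # 512

attn_output = torch.randn(bsz * num_heads, tgt_len, head_dim)

merged = attn_output.view(bsz, num_heads, tgt_len, head_dim)  # [2, 8, 7, 64]
merged = merged.transpose(1, 2)                               # [2, 7, 8, 64]
merged = merged.reshape(bsz, tgt_len, embed_dim)              # [2, 7, 512]

# Features 64..127 of token 3 in sentence 0 are exactly head 1's output for that token
head1_of_token3 = attn_output.view(bsz, num_heads, tgt_len, head_dim)[0, 1, 3]
print(merged.shape, torch.equal(merged[0, 3, 64:128], head1_of_token3))
```

The last step uses reshape rather than view because the transposed tensor is no longer contiguous in memory.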

Full code

class CLIPAttention(nn.Module):
    # uses the multi-head attention from 'Attention Is All You Need'

    def __init__(self, args):
        super().__init__()
        self.embed_dim = args.hidden_size
        self.num_heads = args.num_attention_heads
        self.head_dim = self.embed_dim // self.num_heads  # text: 512 // 8 = 64, vision: 768 // 12 = 64
        # assert that embed_dim divides evenly by num_heads
        assert (
            self.head_dim * self.num_heads == self.embed_dim
        ), f"embed_dim must be divisible by num_heads (got `embed_dim`: {self.embed_dim} and `num_heads`: {self.num_heads})."
        self.scale = self.head_dim ** -0.5
        self.dropout = args.attention_dropout

        self.k_proj = nn.Linear(self.embed_dim, self.embed_dim)
        self.v_proj = nn.Linear(self.embed_dim, self.embed_dim)
        self.q_proj = nn.Linear(self.embed_dim, self.embed_dim)
        self.out_proj = nn.Linear(self.embed_dim, self.embed_dim)

    def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
        return tensor.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2).contiguous()

    def forward(self, hidden_states, attention_mask, causal_attention_mask, output_attentions):
        """Input shape: Batch x Time x Channel"""

        bsz, tgt_len, embed_dim = hidden_states.size()  # num_sentences, seq_len, hidden size (512)

        # get query proj
        proj_shape = (bsz * self.num_heads, -1, self.head_dim)

        query_states = self.q_proj(hidden_states) * self.scale
        query_states = self._shape(query_states, tgt_len, bsz).view(*proj_shape)

        key_states = self._shape(self.k_proj(hidden_states), -1, bsz)
        key_states = key_states.view(*proj_shape)

        value_states = self._shape(self.v_proj(hidden_states), -1, bsz)
        value_states = value_states.view(*proj_shape)

        src_len = key_states.size(1)   # [num_sentences x 8, seq_len, 64] -> seq_len

        ### matrix product of query and key ###
        attn_weights = torch.bmm(query_states, key_states.transpose(1, 2))  # batched matmul of 3D tensors

        if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
            raise ValueError(
                f"Attention weights should be of size {(bsz * self.num_heads, tgt_len, src_len)}, but is {attn_weights.size()}"
            )

        # apply the masks if they are given
        if causal_attention_mask is not None:
            if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
                raise ValueError(
                    f"Attention mask should be of size {(bsz, 1, tgt_len, src_len)}, but is {causal_attention_mask.size()}"
                )
            attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len, src_len) + causal_attention_mask
            attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len)

        if attention_mask is not None:
            if attention_mask.size() != (bsz, 1, tgt_len, src_len):
                raise ValueError(
                    f"Attention mask should be of size {(bsz, 1, tgt_len, src_len)}, but is {attention_mask.size()}"
                )
            attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len, src_len) + attention_mask
            attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len)

        ### softmax: attention scores ###
        attn_weights = F.softmax(attn_weights, dim=-1)

        # output_attentions: optionally also return the attention scores computed above
        if output_attentions:
            attn_weights_reshaped = attn_weights.view(bsz, self.num_heads, tgt_len, src_len)
            attn_weights = attn_weights_reshaped.view(bsz * self.num_heads, tgt_len, src_len)
        else:
            attn_weights_reshaped = None

        attn_probs = F.dropout(attn_weights, p=self.dropout, training=self.training)

        ### matrix product of attention scores and values ###
        attn_output = torch.bmm(attn_probs, value_states)

        if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
            raise ValueError(
                f"`attn_output` should be of size {(bsz * self.num_heads, tgt_len, self.head_dim)}, but is {attn_output.size()}"
            )

        attn_output = attn_output.view(bsz, self.num_heads, tgt_len, self.head_dim)
        attn_output = attn_output.transpose(1, 2)
        attn_output = attn_output.reshape(bsz, tgt_len, embed_dim)

        attn_output = self.out_proj(attn_output)

        # attention output; the second value is None when output_attentions is False
        return attn_output, attn_weights_reshaped

3. CLIPMLP

1. GELU activation function

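Hugging Face's collection of activation functions is in the GitHub file below.

https://github.com/huggingface/transformers/blob/main/src/transformers/activations.py#L90

BERT, GPT, and ViT use GELU rather than ReLU as the activation function of the MLP in the encoder.

- ReLU : self-gating on the sign of the input x (deterministically multiplies by 1 or 0 depending on the sign)

- Dropout : stochastically multiplies by 1 or 0

- GELU : combines the two ideas. The smaller x is, the more likely it is to be dropped

GELU is differentiable everywhere and is not monotonically increasing, which helps it model complex functions, in line with the purpose of a nonlinear activation function

(Reference: https://hongl.tistory.com/236)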

GELU vs ReLU
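A small numeric comparison can stand in for a plot here. This is a sketch in plain Python: `gelu` uses the exact normal-CDF form, and `quick_gelu` is the sigmoid approximation with the 1.702 constant that appears in the QuickGELUActivation code in this post.

```python
import math

def relu(x):
    return max(0.0, x)

def gelu(x):
    # exact GELU: x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def quick_gelu(x):
    # sigmoid approximation: x * sigmoid(1.702 * x)
    return x / (1.0 + math.exp(-1.702 * x))

for x in (-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0):
    print(f"{x:5.1f}  relu={relu(x):.4f}  gelu={gelu(x):.4f}  quick_gelu={quick_gelu(x):.4f}")
```

Unlike ReLU, GELU gives small negative outputs for negative inputs, and the output at -1 is lower than at -3, which is the non-monotonic dip mentioned above.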

from collections import OrderedDict

class QuickGELUActivation(nn.Module):
    def forward(self, input):
        return input * torch.sigmoid(1.702 * input)

class ClassInstantier(OrderedDict):
    def __getitem__(self, key):
        content = super().__getitem__(key)
        cls, kwargs = content if isinstance(content, tuple) else (content, {})
        return cls(**kwargs)

ACT2CLS = {"quick_gelu": QuickGELUActivation}
ACT2FN = ClassInstantier(ACT2CLS)
> ClassInstantier([('quick_gelu', __main__.QuickGELUActivation)])
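To see what ClassInstantier does on lookup, the sketch below repeats the class and fills it with a stand-in callable. `Scale` and the registry keys are hypothetical and only there so the example runs without torch.

```python
from collections import OrderedDict

class ClassInstantier(OrderedDict):
    def __getitem__(self, key):
        content = super().__getitem__(key)
        cls, kwargs = content if isinstance(content, tuple) else (content, {})
        return cls(**kwargs)

# Hypothetical stand-in for an activation class
class Scale:
    def __init__(self, factor=1.0):
        self.factor = factor

    def __call__(self, x):
        return self.factor * x

registry = ClassInstantier({
    "identity": Scale,                   # bare class: instantiated with no kwargs
    "double": (Scale, {"factor": 2.0}),  # (class, kwargs) tuple
})

a = registry["double"]
b = registry["double"]
print(a(3.0), a is b)  # 6.0 False
```

Each lookup builds a new instance, so ACT2FN["quick_gelu"] hands every module its own activation object instead of a shared one.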

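2. Running it

[num_sentences, seq_len, 512] → linear (fc1) → [num_sentences, seq_len, 2048] → activation function → [num_sentences, seq_len, 2048] → linear (fc2) → [num_sentences, seq_len, 512]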

class CLIPMLP(nn.Module):
    def __init__(self, args):
        super().__init__()
        self.activation_fn = ACT2FN[args.hidden_act]
        self.fc1 = nn.Linear(args.hidden_size, args.intermediate_size) # intermediate_size : 2048
        self.fc2 = nn.Linear(args.intermediate_size, args.hidden_size)

    def forward(self, hidden_states):
        hidden_states = self.fc1(hidden_states)
        hidden_states = self.activation_fn(hidden_states)
        hidden_states = self.fc2(hidden_states)
        return hidden_states

4. Encoder layer

embedding → layer norm 1 → self-attention → residual connection (add the input back) → layer norm 2 → MLP → residual connection

class CLIPEncoderLayer(nn.Module):
    def __init__(self, args):
        super().__init__()
        self.embed_dim = args.hidden_size
        self.self_attn = CLIPAttention(args)
        self.layer_norm1 = nn.LayerNorm(self.embed_dim)
        self.mlp = CLIPMLP(args)
        self.layer_norm2 = nn.LayerNorm(self.embed_dim)

    def forward(self, hidden_states, attention_mask,causal_attention_mask, output_attentions):
        residual = hidden_states

        hidden_states = self.layer_norm1(hidden_states)
        hidden_states, attn_weights = self.self_attn(
            hidden_states=hidden_states,
            attention_mask=attention_mask,
            causal_attention_mask=causal_attention_mask,
            output_attentions=output_attentions,
        )
        hidden_states = residual + hidden_states

        residual = hidden_states
        hidden_states = self.layer_norm2(hidden_states)
        hidden_states = self.mlp(hidden_states)
        hidden_states = residual + hidden_states

        outputs = (hidden_states,)

        if output_attentions:
            outputs += (attn_weights,)

        return outputs

Printed model structure

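5. Encoder layer stack

1. nn.ModuleList

For the text model, 12 encoder layers are stacked.

1) nn.Sequential vs nn.ModuleList

- Sequential

Used when there is a single input and the data passes through each layer in order

- ModuleList

A list-like container with no forward of its own, so each layer can be called with whatever inputs it needs (here the masks as well as the hidden states)

The forward function has to loop over it with a for statement

Reference: https://michigusa-nlp.tistory.com/26

I had always wondered why stacked layers are stored in nn.ModuleList rather than a plain Python list. It turns out ModuleList registers the modules it holds, so their parameters are visible when the optimizer is defined.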

self.layers = nn.ModuleList([CLIPEncoderLayer(args) for _ in range(args.num_hidden_layers)])
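The difference from a plain Python list can be checked by counting parameters. The two tiny classes below are hypothetical and use nn.Linear in place of CLIPEncoderLayer to keep the sketch short.

```python
import torch
from torch import nn

class WithList(nn.Module):
    def __init__(self):
        super().__init__()
        # plain list: the layers are not registered as submodules
        self.layers = [nn.Linear(4, 4) for _ in range(3)]

class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])

    def forward(self, x):
        # ModuleList has no forward of its own, so loop over it
        for layer in self.layers:
            x = layer(x)
        return x

n_list = len(list(WithList().parameters()))
n_modulelist = len(list(WithModuleList().parameters()))
print(n_list, n_modulelist)  # 0 6

out = WithModuleList()(torch.randn(2, 4))
print(out.shape)
```

With the plain list, an optimizer built from model.parameters() would receive nothing to update, which is why the stack is wrapped in nn.ModuleList.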

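2. Running it

1) Run the encoder layers held in nn.ModuleList with a for loop

2) The built-in function getattr

getattr(object, attribute, default)

- object : the object (a class or module also works) to look in

- attribute : the name, as a string, of the function or attribute inside it

- default : the value returned when the object has no attribute with that name

ex. np.array([1]) → getattr(np, 'array')([1])

So the if statement below runs when gradient_checkpointing is True and the model is in training mode; if the attribute does not exist, False is used

Reference : https://chancoding.tistory.com/188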

if getattr(self.config, "gradient_checkpointing", False) and self.training:
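A short sketch of the three-argument form, using the standard library only. The `config` object here is a hypothetical stand-in for the model config or args.

```python
import math
from types import SimpleNamespace

# getattr(obj, "name") is the same as obj.name
sqrt = getattr(math, "sqrt")
print(sqrt(16.0))  # 4.0

# Hypothetical config that has no gradient_checkpointing attribute
config = SimpleNamespace(hidden_size=512)
use_ckpt = getattr(config, "gradient_checkpointing", False)
print(use_ckpt)  # False: the default is returned instead of raising AttributeError

# Once the attribute exists and training is on, the condition becomes True
config.gradient_checkpointing = True
training = True
enabled = getattr(config, "gradient_checkpointing", False) and training
print(enabled)  # True
```

Using the default this way lets older configs that never defined the option keep working.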

Full code

class CLIPEncoder(nn.Module):
    def __init__(self, args):
        super().__init__()
        # stack the encoder layers
        self.layers = nn.ModuleList([CLIPEncoderLayer(args) for _ in range(args.num_hidden_layers)])
        # self.output_attentions = args.output_attentions
        # self.output_hidden_states = args.output_hidden_states
        # self.use_return_dict = args.use_return_dict
        self.args = args

    def forward(
        self, inputs_embeds, attention_mask=None, causal_attention_mask=None, output_attentions=None, 
        output_hidden_states=None, return_dict=None,
    ):
        
        # output_attentions: whether to return the attention weights, default=False
        output_attentions = output_attentions if output_attentions is not None else self.args.output_attentions
        all_attentions = () if output_attentions else None

        # output_hidden_states: whether to return the hidden state of every layer, default=False
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.args.output_hidden_states)
        encoder_states = () if output_hidden_states else None

        return_dict = return_dict if return_dict is not None else self.args.use_return_dict


        # run the layers in order
        hidden_states = inputs_embeds
        for idx, encoder_layer in enumerate(self.layers):

            # only when output_hidden_states is True
            if output_hidden_states:
                encoder_states = encoder_states + (hidden_states,)

            # with gradient checkpointing during training / otherwise
            if getattr(self.args, "gradient_checkpointing", False) and self.training:

                def create_custom_forward(module):
                    def custom_forward(*inputs):
                        return module(*inputs, output_attentions) 

                    return custom_forward

                layer_outputs = torch.utils.checkpoint.checkpoint(
                    create_custom_forward(encoder_layer),
                    hidden_states,
                    attention_mask,
                    causal_attention_mask,
                )
            else:
                layer_outputs = encoder_layer(
                    hidden_states,
                    attention_mask,
                    causal_attention_mask,
                    output_attentions=output_attentions,
                )

            # both branches must update hidden_states, so this sits outside the if/else
            hidden_states = layer_outputs[0]

            if output_attentions:
                all_attentions = all_attentions + (layer_outputs[1],)

        if output_hidden_states:
            encoder_states = encoder_states + (hidden_states,)

        # when return_dict is False
        if not return_dict:
            return tuple(v for v in [hidden_states, encoder_states, all_attentions] if v is not None)
        # when return_dict is True
        return BaseModelOutput(
            last_hidden_state=hidden_states, hidden_states=encoder_states, attentions=all_attentions
        )

Run code

from clip_textencoder import CLIPTextEmbeddings, CLIPAttention, CLIPMLP, CLIPEncoderLayer, CLIPEncoder
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--vocab_size', type=int, default=49408, help='vocab_size')
parser.add_argument('--hidden_size', type=int, default=512, help='hidden_size')
parser.add_argument('--max_position_embeddings', type=int, default=77, help='max_position_embeddings')
parser.add_argument('--num_attention_heads', type=int, default=8, help='num_attention_heads')
parser.add_argument('--attention_dropout', type=float, default=0.0, help='attention_dropout')
parser.add_argument('--hidden_act', type=str, default="quick_gelu", help='hidden_act')
parser.add_argument('--intermediate_size', type=int, default=2048, help='intermediate_size')

parser.add_argument('--num_hidden_layers', type=int, default=12, help='num_hidden_layers')

# boolean options are flags: absent means False, present means True
parser.add_argument('--gradient_checkpointing', action='store_true', help='gradient_checkpointing')
parser.add_argument('--output_attentions', action='store_true', help='output_attentions')
parser.add_argument('--output_hidden_states', action='store_true', help='output_hidden_states')
parser.add_argument('--use_return_dict', action='store_true', help='use_return_dict')

args, _ = parser.parse_known_args()

from transformers import AutoProcessor
# from transformers import CLIPProcessor
repo = "/home/leeanna/2024_practice/clip/clip-vit-base-patch32"
processor = AutoProcessor.from_pretrained(repo, local_files_only=True)
inputs = processor(text=["a photo of a cat", "a photo of a dog"], return_tensors="pt", 
                   padding=True)


# 1. embedding
embedding_class = CLIPTextEmbeddings(args)
text_embedding = embedding_class(input_ids=inputs.input_ids, position_ids=None, inputs_embeds=None)

print(text_embedding.shape)
> torch.Size([2, 7, 512])

# layer norm 1
residual = text_embedding

from torch import nn
layer_norm1 = nn.LayerNorm(512)
hidden_states = layer_norm1(text_embedding)
print('layer_norm1 : ', hidden_states.size())
> layer_norm1 :  torch.Size([2, 7, 512])

# 2. attention
self_attn = CLIPAttention(args)
hidden_states, attn_weights = self_attn(
            hidden_states=hidden_states,
            attention_mask=None,
            causal_attention_mask=None,
            output_attentions=None,
        )
print('after attention hidden_states : ',hidden_states.shape)
> after attention hidden_states :  torch.Size([2, 7, 512])

hidden_states = residual + hidden_states
residual = hidden_states

# layer norm 2
layer_norm2 = nn.LayerNorm(512)
hidden_states = layer_norm2(hidden_states)
print('layer_norm2 : ', hidden_states.size())
> layer_norm2 :  torch.Size([2, 7, 512])

# 3. mlp
mlp = CLIPMLP(args)
hidden_states = mlp(hidden_states)
print('mlp : ', hidden_states.size())
> mlp :  torch.Size([2, 7, 512])

hidden_states = residual + hidden_states

# result
outputs = (hidden_states,)

# 4. run one encoder layer
one_encoder = CLIPEncoderLayer(args)
print(one_encoder)
one_encoder_output = one_encoder(hidden_states, attention_mask=None, causal_attention_mask=None, output_attentions=None)
print(one_encoder_output[0].shape) 
> torch.Size([2, 7, 512])

# 5. encoder stack
# layers = nn.ModuleList([CLIPEncoderLayer(args) for _ in range(args.num_hidden_layers)])

# hidden_states = text_embedding
# for idx, encoder_layer in enumerate(layers): 
#     layer_outputs = encoder_layer(hidden_states, attention_mask=None, causal_attention_mask=None, output_attentions=None)
#     hidden_states = layer_outputs[0]

# print(hidden_states.shape)

all_encoder = CLIPEncoder(args)
all_encoder_output = all_encoder(text_embedding, attention_mask=None, causal_attention_mask=None, output_attentions=None, output_hidden_states=None, return_dict=None)
print(all_encoder_output[0].shape)
> torch.Size([2, 7, 512])
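A note on the argparse setup at the top of the run code: a value given on the command line is kept as a string unless `type` is set, so numeric options are safer with an explicit type. The sketch below simulates a command line with a hand-written argument list; the `--extra` option is made up to show what parse_known_args does with unknown arguments.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--hidden_size", default=512)                 # no type given
parser.add_argument("--num_hidden_layers", type=int, default=12)  # explicit type
parser.add_argument("--output_attentions", action="store_true")   # boolean flag

# Simulated command line; unknown options are returned separately
args, unknown = parser.parse_known_args(
    ["--hidden_size", "256", "--num_hidden_layers", "6", "--extra", "1"]
)
print(repr(args.hidden_size), repr(args.num_hidden_layers), args.output_attentions)
print(unknown)
```

Here hidden_size comes back as the string '256', which would break nn.Linear, while num_hidden_layers is the integer 6. parse_known_args is also what lets the script run inside a notebook, where extra arguments are passed in by the kernel.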

Using Hugging Face

yeonjins — Thu, 30 Mar 2023 17:38:24 +0900

Transformer-related models

Transformer : June 2017
GPT : June 2018, the first pretrained transformer model, designed to be fine-tuned for each task.
BERT : October 2018, also a pretrained model; it was built at a size similar to GPT for comparison and shown to perform better.
GPT-2 : February 2019
DistilBERT : October 2019, cuts memory use by 40% and runs 60% faster while keeping 97% of BERT's performance.
BART, T5 : October 2019, pretrained models that use the same architecture as the original Transformer
GPT-3 : May 2020, a model capable of zero-shot learning, handling a variety of tasks without fine-tuning

The GPT family are auto-regressive transformer models built on the decoder part of the Transformer (CTRL, GPT, GPT-2, Transformer XL) and are suited to text generation.

The BERT family are auto-encoding transformer models built on the encoder part (ALBERT, BERT, DistilBERT, ELECTRA, RoBERTa) and are suited to sentence classification, NER, and extractive QA.

BART and T5 are seq2seq transformer models (BART, mBART, Marian, T5) and work well for summarization, translation, and generative QA.

Pipeline

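The pipeline in Hugging Face's transformers library runs the three steps of data preprocessing, feeding the model, and postprocessing in one call, which is very convenient. A checkpoint is passed when the object is created, and even without naming a model class, the architecture that fits the chosen checkpoint is picked automatically.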

from transformers import pipeline
classifier = pipeline("sentiment-analysis")

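As above, sentiment-analysis is used for sentiment analysis, and the other pipelines are used in the same way for their tasks.

- zero-shot-classification : a zero-shot classifier that can be used with newly specified labels.

After creating the object, pass the labels you want with candidate_labels when calling it.

- text-generation : given a prompt, the model generates the rest of the text.

After creating the object, max_length sets the total length of the text when calling it.

- fill-mask : when a word in the middle is replaced by the <mask> token, the model predicts the words most likely to fill that position.

After creating the object, top_k sets the number of candidate words when calling it.

- ner : recognizes named entities in the input text, such as PER for people, ORG for organizations, and LOC for locations. (grouped_entities)

- question-answering : gives an answer to a question.

After creating the object, provide the context together with the question when calling it.

- summarization : summarizes the input text.

- translation

A specific model can also be chosen for a task.

generator = pipeline("text-generation", model="distilgpt2")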

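Running without Pipeline

A pipeline handles input preprocessing, passing the input to the model, and output postprocessing all at once.

But sometimes each of these steps has to be handled separately.

1. Preprocessing the input with a tokenizer

AutoTokenizer from the transformers library is used to convert the input into a form the model can take.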

from transformers import AutoTokenizer

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

inputs = ["sentence 1", "sentence 2"]  # placeholders for the actual input sentences

# return_tensors sets the type of the returned tensors
inputs = tokenizer(inputs, padding=True, truncation=True, return_tensors="pt")
inputs
> {'input_ids': tensor([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,  2607,  2026,  2878,  2166,  1012,   102],
        [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,     0,     0,     0,     0,     0,     0,     0]]),
   'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]])}

The tokenizer's output is a dictionary containing input_ids and attention_mask. The tensor type can be set with return_tensors.

This output is what gets passed to the model.
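What padding=True does to a batch can be sketched in plain Python. `pad_batch` is a hypothetical helper written for this illustration, not a transformers function, and the token ids are made up.

```python
def pad_batch(batch, pad_id=0):
    # hypothetical helper: pad every sequence to the longest one in the batch
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Made-up token ids of different lengths
encoded = pad_batch([[101, 7, 8, 9, 102], [101, 7, 102]])
print(encoded["input_ids"])       # [[101, 7, 8, 9, 102], [101, 7, 102, 0, 0]]
print(encoded["attention_mask"])  # [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]
```

This mirrors the printed tokenizer output above, where the shorter sentence is filled with 0s and its attention_mask is 0 at those positions.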

2. Model

AutoModel from the transformers library is used.

from transformers import AutoModel

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModel.from_pretrained(checkpoint)

outputs = model(**inputs)

outputs.last_hidden_state.shape
> torch.Size([2, 16, 768])

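The model output has shape (batch size (number of sentences), sequence length, hidden size (the dimension of the vector for each token)).

This output is the hidden state, which becomes the input of the model head.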

3. Head

The hidden state is passed to the head, which outputs the number of dimensions the task needs.

from transformers import AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

outputs = model(**inputs)
outputs
> SequenceClassifierOutput(loss=None, logits=tensor([[-1.5607,  1.6123],
        [ 4.1692, -3.3464]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

outputs.logits.shape
> torch.Size([2, 2])

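Two sentences were passed in and there are two classes. So each sentence gets a logit for each of the two classes, and the output is a 2x2 tensor.

4. Postprocessing

The result from the head is a logit, not a probability. So softmax has to be applied to convert it to probabilities.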

outputs.logits
> tensor([[-1.5607,  1.6123],
        [ 4.1692, -3.3464]], grad_fn=<AddmmBackward0>)
        
# convert the logits to probabilities
import torch
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predictions
> tensor([[4.0195e-02, 9.5980e-01],
        [9.9946e-01, 5.4418e-04]], grad_fn=<SoftmaxBackward0>)


# config.id2label : the label for each class index
model.config.id2label
> {0: 'NEGATIVE', 1: 'POSITIVE'}
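Putting the last two steps together, the sketch below turns the logits printed above into a label and a probability per sentence. It uses plain Python instead of torch, and the `results` format is just one possible way to present the output.

```python
import math

def softmax(row):
    # subtract the max first for numerical stability
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

logits = [[-1.5607, 1.6123], [4.1692, -3.3464]]  # values printed above
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

results = []
for row in logits:
    probs = softmax(row)
    best = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability
    results.append((id2label[best], round(probs[best], 4)))
print(results)
```

The first sentence comes out as POSITIVE and the second as NEGATIVE, in line with the probabilities in the predictions tensor.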