
💡 Google Speech-to-Text Challenge Lab: Complete Walkthrough


The goal of this challenge lab is to use Google Cloud's Speech-to-Text, Text-to-Speech, and Translation APIs. Work through the guide below one step at a time.

🛠️ Task 1. Create an API key

Every API call in this lab must be authenticated, so start by creating an API key.

Open the Google Cloud Console.

In the left navigation menu, go to APIs & Services > **Credentials**.

Click **+ CREATE CREDENTIALS** at the top and select **API key** from the drop-down menu.

Once the new API key is generated, copy its value and keep it somewhere handy for later use.

Restricting the key under the API restrictions tab to only the APIs you need (for example, Cloud Translation API) is good security practice, but it is not required for this lab.

🗣️ Task 2. Synthesize speech with the Text-to-Speech API

In this task you convert a text file into an audio file (.mp3).

Connect to the lab-vm instance over SSH.

Activate the virtual environment:

source venv/bin/activate

Create the request file by running nano synthesize-text.json, then paste in the following JSON.

JSON

{
    "input":{
        "text":"Cloud Text-to-Speech API allows developers to include natural-sounding, synthetic human speech as playable audio in their applications. The Text-to-Speech API converts text or Speech Synthesis Markup Language (SSML) input into audio data like MP3 or LINEAR16 (the encoding used in WAV files)."
    },
    "voice":{
        "languageCode":"en-gb",
        "name":"en-GB-Standard-A",
        "ssmlGender":"FEMALE"
    },
    "audioConfig":{
        "audioEncoding":"MP3"
    }
}
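Pasting JSON by hand is where curly quotes and stray line breaks tend to creep in, so it can help to generate the request body programmatically. The following is a minimal sketch, not part of the lab itself: the helper names build_tts_request and render_request are hypothetical, and the default values simply mirror the request shown above.

```python
import json


def build_tts_request(text, language_code="en-gb",
                      voice_name="en-GB-Standard-A",
                      gender="FEMALE", encoding="MP3"):
    """Return a request body with the same fields as synthesize-text.json."""
    return {
        "input": {"text": text},
        "voice": {
            "languageCode": language_code,
            "name": voice_name,
            "ssmlGender": gender,
        },
        "audioConfig": {"audioEncoding": encoding},
    }


def render_request(body):
    """Serialize the body; json.dumps always emits straight quotes."""
    return json.dumps(body, indent=4, ensure_ascii=False)
```

Writing the string returned by render_request to synthesize-text.json should give a file that any JSON parser accepts.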

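Call the Text-to-Speech REST endpoint with curl, sending the request file, and save the result to synthesize-text.txt.

Bash

curl -s -X POST -H "Content-Type: application/json; charset=utf-8" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-d @synthesize-text.json \
"https://texttospeech.googleapis.com/v1/text:synthesize" > synthesize-text.txt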

Create the Python script by running nano tts_decode.py, then paste in the code provided by the lab.

Run the following command to decode the .txt file into an .mp3 file.

Bash

python tts_decode.py --input "synthesize-text.txt" --output "synthesize-text-audio.mp3"
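The lab's own tts_decode.py is not reproduced in this guide. As a rough idea of what the decoding step involves, here is a minimal sketch that assumes the saved response is JSON with a base64-encoded audioContent field, which is how the Text-to-Speech REST API returns audio. The function names decode_audio and convert are hypothetical and are not taken from the lab script.

```python
import base64
import json


def decode_audio(response_text):
    """Return the raw audio bytes held in the response's audioContent field."""
    payload = json.loads(response_text)
    return base64.b64decode(payload["audioContent"])


def convert(input_path, output_path):
    """Read a saved API response and write the decoded audio to output_path."""
    with open(input_path, encoding="utf-8") as src:
        audio = decode_audio(src.read())
    with open(output_path, "wb") as dst:
        dst.write(audio)
    return len(audio)  # number of bytes written
```

If the response contains an error object instead of audioContent, decode_audio raises a KeyError, which is a quick signal that the API call itself failed.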

Once synthesize-text-audio.mp3 has been created, download it from the VM's SSH session and listen to it.

🎙️ Task 3. Transcribe speech with the Speech-to-Text API

In this task you transcribe a .flac audio file to text. The command below reflects fixes for errors hit in earlier attempts.

You do not need to create a speech_request_fr.json file to complete this task. The gcloud command takes its arguments as flags rather than reading a request file.

The file gs://cloud-samples-data/speech/corbeau_renard.flac is in French and, according to its header, is **single-channel (mono)**. These details matter when you call the API.
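If you want to check the header information yourself, the sample rate and channel count sit in the STREAMINFO block at the very start of a FLAC file. The following is a minimal sketch of reading them; the function name parse_flac_streaminfo is hypothetical, and the code assumes STREAMINFO is the first metadata block, as the FLAC format requires.

```python
def parse_flac_streaminfo(header: bytes) -> dict:
    """Parse sample rate, channels and bit depth from the first bytes of a FLAC file."""
    if len(header) < 26:
        raise ValueError("need at least the first 26 bytes of the file")
    if header[:4] != b"fLaC":
        raise ValueError("missing fLaC marker; not a FLAC stream")
    # Byte 4 holds a 1-bit last-block flag and a 7-bit block type (0 = STREAMINFO).
    if header[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # The STREAMINFO body starts at byte 8. Its first 10 bytes are block and
    # frame sizes; the next 64 bits pack: 20 bits sample rate,
    # 3 bits (channels - 1), 5 bits (bits per sample - 1), 36 bits total samples.
    packed = int.from_bytes(header[18:26], "big")
    return {
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
    }
```

To use it, copy the file to the VM first (for example with gsutil cp) and pass in the first 26 bytes read in binary mode. The values it reports are the ones to supply to the API call.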

λ‹€μŒ λͺ…λ Ήμ–΄λ₯Ό μ‚¬μš©ν•˜μ—¬ Speech-to-Text APIλ₯Ό ν˜ΈμΆœν•˜κ³  κ²°κ³Όλ₯Ό speech_response.json νŒŒμΌμ— μ €μž₯ν•©λ‹ˆλ‹€. 이 λͺ…λ Ήμ–΄λŠ” gcloudκ°€ μš”κ΅¬ν•˜λŠ” μ •ν™•ν•œ ν˜•μ‹μž…λ‹ˆλ‹€.

Bash

gcloud ml speech recognize-long-running gs://cloud-samples-data/speech/corbeau_renard.flac --language-code=fr-FR --encoding=flac --sample-rate=44100 > speech_response.json

recognize-long-running: the command for long audio files of 60 seconds or more.

--language-code: set to fr-FR so that French is recognized.

--encoding and --sample-rate: specify the audio file's encoding and sample rate exactly.

> speech_response.json: saves the command output to speech_response.json.
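To confirm that the transcription worked, you can pull the transcript text out of the saved response. This is a minimal sketch that assumes speech_response.json follows the usual Speech-to-Text response shape, a results list whose entries each carry an alternatives list. The function name extract_transcripts is hypothetical.

```python
import json


def extract_transcripts(response_text):
    """Return the top transcript of every result in a recognition response."""
    payload = json.loads(response_text)
    transcripts = []
    for result in payload.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            # The first alternative is the most likely hypothesis.
            transcripts.append(alternatives[0].get("transcript", ""))
    return transcripts
```

An empty list means the response contained no results, which usually points to a wrong language code, encoding, or sample rate.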

🌐 Task 4. Translate text with the Translation API

In this task you translate a Japanese sentence into English.

Use the following curl command to call the Cloud Translation API and save the result to translated_response.txt.

Bash

curl -s -X POST -H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://translation.googleapis.com/language/translate/v2" \
-d '{
  "q": "これは日本語です。",
  "target": "en"
}' > translated_response.txt

"q": the text to translate.

"target": the language code to translate into (en is English).
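The saved response is JSON, so the translated sentence can be read back with a few lines of Python. This is a minimal sketch assuming the v2 response shape, with translations nested under data.translations. The function name extract_translations is hypothetical, and html.unescape is included because v2 returns HTML-escaped text by default.

```python
import html
import json


def extract_translations(response_text):
    """Return the translated strings from a Translation API v2 response."""
    payload = json.loads(response_text)
    return [
        html.unescape(item["translatedText"])
        for item in payload["data"]["translations"]
    ]
```

Calling it on the contents of translated_response.txt should return a one-element list, since the request sent a single "q" value.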

❓ Task 5. Detect a language with the Translation API

In this task you detect which language a given sentence is written in.

Use the following curl command to call the Cloud Translation API's language detection endpoint and save the result to detection_response.txt.

Bash

curl -s -X POST -H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://translation.googleapis.com/language/translate/v2/detect" \
-d '{
  "q": "Este é japonês."
}' > detection_response.txt

"q": the text whose language should be detected. The sentence is Portuguese for "This is Japanese."
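To read the detected language out of the saved response, something like the following sketch can be used. It assumes the v2 detect response shape, where data.detections is a list of lists of candidates, each with a language code and a confidence score. The function name best_detection is hypothetical.

```python
import json


def best_detection(response_text):
    """Return the highest-confidence candidate from a v2 detect response."""
    payload = json.loads(response_text)
    # detections holds one inner list per input string; flatten them all.
    candidates = [
        candidate
        for group in payload["data"]["detections"]
        for candidate in group
    ]
    return max(candidates, key=lambda c: c.get("confidence", 0.0))
```

The language field of the returned candidate holds the detected language code, which you can compare against the language you expect for the input sentence.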

Hopefully this guide helps you complete the challenge lab. Follow every step in order, and double-check each command for typos before you run it.
