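Adding speech recognition to an iOS app

I'm going to be developing an iOS app with speech recognition at work, so I'm writing down some notes.

There are plenty of options

First I looked into which technologies are available for implementing speech recognition, and collected the ones that looked useful.

DOCOMO speech input API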

https://dev.smt.docomo.ne.jp/?p=docs.api.page&api_name=speech_recognition&p_name=sdk

How to use VocalKit, a speech recognition library for the iOS SDK

http://d.hatena.ne.jp/shu223/20110227/1299368179

How to use OpenEars, a free speech recognition / speech synthesis library for iOS

http://qiita.com/shu223/items/eda02dc7d334c339ff64

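Implementation

So, on to the integration steps!

Steps

As practice, I decided to start by building a minimal sample app with OpenEars.

The finished product is here. The library is checked in as well, so it should run as soon as you clone it.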

totzYuta/OpenEars-sample · GitHub

I set up OpenEars by following the official tutorial and an article by shu-san, whose posts always help me out.

OpenEars Tutorials - Politepix

Speech recognition with OpenEars 1.6 - Over&Out その後

1. Add the framework

Download the package from the OpenEars site below:

OpenEars - iPhone Voice Recognition and Text-To-Speech

Then drag the directory named framework from the package into your Xcode project.

Copy items into destination group's folder (if needed)

Create groups for any added folders

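Make sure both of these options are checked.

See the following article for what these options actually do.

[Xcode] The difference between yellow folders and blue folders - 黒ごまプリンの雑記帳

Then, in Xcode, go to TARGETS -> Build Phases -> Link Binary With Libraries -> + and add the following frameworks.

AudioToolbox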
AVFoundation

2. Import the header files

Import the headers in the .m file.

// ViewController.m
#import <OpenEars/OELanguageModelGenerator.h>
#import <OpenEars/OEPocketsphinxController.h>
#import <OpenEars/OEAcousticModel.h>

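At this point I got an error saying the header files could not be found. I worked through the article below and checked various settings, but the headers still weren't found...

[XCode] Three fixes for when an added and imported header file cannot be found | Exception Code.

In the end, removing the framework and adding it again fixed the problem.

Unrelated, but while researching header imports I came across the article below and learned a lot from it. Recommended.

Circular references in Objective-C - webとかmacとかやってみようか R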

3. Set up OELanguageModelGenerator

Once the properties are declared, add the following.

- (void)createLanguageModel {
    OELanguageModelGenerator *lmGenerator = [[OELanguageModelGenerator alloc] init];
    
    NSArray *words = [NSArray arrayWithObjects:@"WORD", @"STATEMENT", @"OTHER WORD", @"A PHRASE", nil];
    NSString *name = @"NameIWantForMyLanguageModelFiles";
    NSError *err = [lmGenerator generateLanguageModelFromArray:words withFilesNamed:name forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]]; // Set Language Model to English
    
    NSString *lmPath = nil;
    NSString *dicPath = nil;
    
    if(err == nil) {
        
        lmPath = [lmGenerator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:name];
        dicPath = [lmGenerator pathToSuccessfullyGeneratedDictionaryWithRequestedName:name];
        
    } else {
        NSLog(@"Error: %@", [err localizedDescription]);
    }
}

4. Set up OEPocketsphinxController

Import the header files.

#import <OpenEars/OEPocketsphinxController.h>
#import <OpenEars/OEAcousticModel.h>

Add the following at whatever point you want listening to start.

[[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil];
[[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:lmPath dictionaryAtPath:dicPath acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"] languageModelIsJSGF:NO]; // Change "AcousticModelEnglish" to "AcousticModelSpanish" to perform Spanish recognition instead of English.
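One thing to watch: in the step 3 snippet, lmPath and dicPath are local variables, so the step 4 call cannot see them from another method. A minimal sketch of one way to connect the two steps is below, assuming the paths are kept in properties. The property names languageModelPath and dictionaryPath and the method name startListening are my own hypothetical choices, not part of OpenEars; the OpenEars calls are the same ones used above.

```objc
// ViewController.m (sketch only; needs the OpenEars framework to build)
#import "ViewController.h"
#import <OpenEars/OELanguageModelGenerator.h>
#import <OpenEars/OEPocketsphinxController.h>
#import <OpenEars/OEAcousticModel.h>

@interface ViewController ()
// Hypothetical property names (not part of OpenEars) that keep the
// generated file paths available after createLanguageModel returns.
@property (copy, nonatomic) NSString *languageModelPath;
@property (copy, nonatomic) NSString *dictionaryPath;
@end

@implementation ViewController

- (void)createLanguageModel {
    OELanguageModelGenerator *lmGenerator = [[OELanguageModelGenerator alloc] init];
    NSArray *words = @[@"WORD", @"STATEMENT", @"OTHER WORD", @"A PHRASE"];
    NSString *name = @"NameIWantForMyLanguageModelFiles";
    NSString *acousticModel = [OEAcousticModel pathToModel:@"AcousticModelEnglish"];

    NSError *err = [lmGenerator generateLanguageModelFromArray:words
                                                withFilesNamed:name
                                        forAcousticModelAtPath:acousticModel];
    if (err == nil) {
        // Store the paths instead of dropping them at the end of the method.
        self.languageModelPath = [lmGenerator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:name];
        self.dictionaryPath = [lmGenerator pathToSuccessfullyGeneratedDictionaryWithRequestedName:name];
    } else {
        NSLog(@"Error: %@", [err localizedDescription]);
    }
}

// Hypothetical method name; call it after createLanguageModel.
- (void)startListening {
    if (self.languageModelPath == nil || self.dictionaryPath == nil) {
        NSLog(@"The language model has not been generated yet.");
        return;
    }
    [[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil];
    [[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:self.languageModelPath
                                                                    dictionaryAtPath:self.dictionaryPath
                                                                 acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]
                                                                 languageModelIsJSGF:NO];
}

@end
```

With this layout, viewDidLoad would call createLanguageModel first and startListening second.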

5. Set up OEEventsObserver

What is OEEventsObserver? According to the documentation:

OEEventsObserver is the class which keeps you continuously updated about the status of your listening session, among other things, via delegate callbacks.

So that is what it does.

// ViewController.h
#import <OpenEars/OEEventsObserver.h>

Set up the interface declaration as follows. Recognition results are handed over through a delegate, so the class has to adopt the delegate protocol.

@interface ViewController : UIViewController <OEEventsObserverDelegate>

Define the property.

@property (strong, nonatomic) OEEventsObserver *openEarsEventsObserver;

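@property defines the accessor methods for you automatically. It is the equivalent of attr_accessor in Ruby.

The following article explains clearly how @property expands into getter/setter methods.

[iOS][Objective-C] @property basics | てくめも@ecoop.net

The following article summarizes property attributes extremely clearly, so I highly recommend it.

Guidelines for Objective-C property attributes - Qiita

strong can be omitted (it is the default for object properties under ARC), and apparently nonatomic should be used on iOS for performance reasons.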
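To make the attr_accessor comparison concrete, here is a small Foundation-only sketch showing that a declared property gives you a getter and a setter, and that dot syntax is just another way of calling them. The Recorder class and its label property are hypothetical names made up for this illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical class used only to illustrate @property.
@interface Recorder : NSObject
// Declares the -label getter and the -setLabel: setter.
@property (copy, nonatomic) NSString *label;
@end

@implementation Recorder
// Nothing to write here: the compiler synthesizes both accessors.
@end

int main(void) {
    @autoreleasepool {
        Recorder *recorder = [[Recorder alloc] init];

        recorder.label = @"mic";            // same as [recorder setLabel:@"mic"]
        NSLog(@"%@", [recorder label]);     // same as recorder.label

        [recorder setLabel:@"headset"];     // explicit setter call
        NSLog(@"%@", recorder.label);       // dot syntax calls the getter
    }
    return 0;
}
```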

Then run the following in the .m file before any other OpenEars method is called. I put it in viewDidLoad.

self.openEarsEventsObserver = [[OEEventsObserver alloc] init];
[self.openEarsEventsObserver setDelegate:self];

Define the following set of delegate methods.

- (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore utteranceID:(NSString *)utteranceID {
    NSLog(@"The received hypothesis is %@ with a score of %@ and an ID of %@", hypothesis, recognitionScore, utteranceID);
}

- (void) pocketsphinxDidStartListening {
    NSLog(@"Pocketsphinx is now listening.");
}

- (void) pocketsphinxDidDetectSpeech {
    NSLog(@"Pocketsphinx has detected speech.");
}

- (void) pocketsphinxDidDetectFinishedSpeech {
    NSLog(@"Pocketsphinx has detected a period of silence, concluding an utterance.");
}

- (void) pocketsphinxDidStopListening {
    NSLog(@"Pocketsphinx has stopped listening.");
}

- (void) pocketsphinxDidSuspendRecognition {
    NSLog(@"Pocketsphinx has suspended recognition.");
}

- (void) pocketsphinxDidResumeRecognition {
    NSLog(@"Pocketsphinx has resumed recognition."); 
}

- (void) pocketsphinxDidChangeLanguageModelToFile:(NSString *)newLanguageModelPathAsString andDictionary:(NSString *)newDictionaryPathAsString {
    NSLog(@"Pocketsphinx is now using the following language model: \n%@ and the following dictionary: %@",newLanguageModelPathAsString,newDictionaryPathAsString);
}

- (void) pocketSphinxContinuousSetupDidFailWithReason:(NSString *)reasonForFailure {
    NSLog(@"Listening setup wasn't successful and returned the failure reason: %@", reasonForFailure);
}

- (void) pocketSphinxContinuousTeardownDidFailWithReason:(NSString *)reasonForFailure {
    NSLog(@"Listening teardown wasn't successful and returned the failure reason: %@", reasonForFailure);
}

- (void) testRecognitionCompleted {
    NSLog(@"A test file that was submitted for recognition is now complete.");
}
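The callbacks above only log. In a real app, pocketsphinxDidReceiveHypothesis is probably where you decide what to do with the recognized text. As a rough sketch of that step, the Foundation-only helper below checks a hypothesis string against the vocabulary from step 3. MatchedPhrase is a hypothetical function of my own, not part of OpenEars, and it assumes you only care about an exact match, ignoring case and surrounding whitespace.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of OpenEars: returns the vocabulary entry that
// equals the hypothesis (ignoring case and surrounding whitespace), or nil.
static NSString *MatchedPhrase(NSString *hypothesis, NSArray *vocabulary) {
    NSCharacterSet *blanks = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSString *trimmed = [hypothesis stringByTrimmingCharactersInSet:blanks];
    if (trimmed.length == 0) {
        return nil; // nil or empty hypothesis: nothing to match
    }
    for (NSString *phrase in vocabulary) {
        if ([trimmed caseInsensitiveCompare:phrase] == NSOrderedSame) {
            return phrase;
        }
    }
    return nil;
}

int main(void) {
    @autoreleasepool {
        NSArray *words = @[@"WORD", @"STATEMENT", @"OTHER WORD", @"A PHRASE"];
        NSString *hit = MatchedPhrase(@" other word ", words);
        NSLog(@"Matched: %@", hit);
    }
    return 0;
}
```

Inside the delegate method you would pass the hypothesis argument to this helper and branch on the result. A hypothesis that contains several vocabulary entries in a row would not match here, so a real app may need something smarter.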