Hallucination is an apparent perception in the absence of real external
sensory stimuli. An auditory hallucination is a perception of hearing sounds
that are not real. A common form of auditory hallucination is hearing voices in
the absence of any speaker, which is known as Auditory Verbal Hallucination
(AVH). AVHs are fragments of the mind's creation that mostly occur in people
diagnosed with mental illnesses such as bipolar disorder and schizophrenia.
Assessing the valence of hallucinated voices (i.e., how negative or positive
voices are) can help measure the severity of a mental illness. We study N=435
individuals who experience hearing voices to assess auditory verbal
hallucination. Four times a day for a month, participants report the valence of
the voices they hear through ecological momentary assessments, answering
questions on a four-point scale from ``not at all'' to ``extremely''. We collect
these self-reports via a mobile application and use them as the valence
supervision for AVH events. Using the application, participants also record audio diaries to
describe the content of hallucinated voices verbally. In addition, we passively
collect mobile sensing data as contextual signals. We then evaluate how
predictive the linguistic and contextual cues from the audio diaries and mobile
sensing data are of an auditory verbal hallucination event. Finally, using
transfer learning and data fusion techniques, we train a neural network model
that predicts the valence of AVH with F1 scores of 54\% (top-1) and 72\% (top-2).