{"id":87,"date":"2014-06-25T15:36:00","date_gmt":"2014-06-25T06:36:00","guid":{"rendered":"http:\/\/lr-www.pi.titech.ac.jp\/wp_en\/?page_id=87"},"modified":"2015-03-09T18:48:24","modified_gmt":"2015-03-09T09:48:24","slug":"research","status":"publish","type":"page","link":"https:\/\/www.lr.first.iir.isct.ac.jp\/wp_en\/?page_id=87","title":{"rendered":"Research"},"content":{"rendered":"<p><script type=\"text\/javascript\">\/\/ <![CDATA[\nfunction doToggleClassName(obj, onClassName, offClassName){\n  obj.className = (obj.className != onClassName) ? onClassName : offClassName;\n}\nfunction getParentObj(obj){\n  return obj.parentElement || obj.parentNode;\n}\n\nfunction toggleNext(element) {\n  var content = element.nextSibling;\n  while(content.nodeType != 1)\n    content = content.nextSibling;\n  if(content.style.display != 'block')\n    content.style.display = 'block';\n  else\n    content.style.display = 'none';\n}\n\/\/ ]]><\/script><\/p>\n<div class=\"general-caption\">\n<p>Text Summarization<\/p>\n<\/div>\n<p class=\"general-paragraph\">Instead of reading a long document, we would sometimes like to read<br \/> its concise summary.<\/p>\n<p> In our research group, we are working on text summarization; we develop<br \/> methods for generating a summary.<\/p>\n<p> We are specifically interested in mathematical models that emulate how<br \/> summaries are generated from the input documents.<\/p>\n<div class=\"general-indent\">\n<p class=\"general-subsection general-click\">Maximum Coverage Summarization Model<\/p>\n<p class=\"general-hidden\">We formulate text summarization as maximum coverage problem; we represent each sentence as a set of conceptual units (words, in our setting) and generate a summary that contains as many conceptual units as possible.<br \/> [EACL2009]<\/p>\n<\/div>\n<div class=\"general-indent\">\n<p class=\"general-subsection general-click\">Summarization Model based on Facility Location Problem<\/p>\n<p class=\"general-hidden\">We proposed a 
text summarization model based on the facility location problem, in which a summary is generated so that the whole input document is entailed by the summary. This method makes good use of entailment relations between sentences.<br \/> [CIKM2010]<\/p>\n<\/div>\n<div class=\"general-indent\">\n<p class=\"general-subsection general-click\">Twitter Summarization<\/p>\n<p class=\"general-hidden\">We also work on Twitter summarization, where tweets on a certain topic are collected and a summary of those tweets is generated. In particular, we develop a method for automatically generating live sports updates (e.g., for soccer) from the numerous tweets on a match.<br \/> [ECIR2011]<\/p>\n<\/div>\n<div class=\"general-indent\">\n<p class=\"general-subsection general-click\">Summarization Model based on Sentence Compression and Sentence Selection<\/p>\n<p class=\"general-hidden\">One limitation of sentence selection approaches to text summarization is that each selected sentence can have unimportant parts. We proposed a method for generating a summary by simultaneously performing sentence compression and sentence selection; specifically, we formulated the summarization problem as the extraction of dependency subtrees. We also proposed a new summarization method that makes use of the rhetorical structure of the document.<br \/> [ACL2013, ACL2014]<\/p>\n<\/div>\n<div class=\"general-caption\">\n<p>Sentiment Analysis<\/p>\n<\/div>\n<p class=\"general-paragraph\">Reputation risk has changed its character with the advent of the internet. Rumors used to be propagated from person to person individually; groundless stories tended not to spread because each person judged their reliability. Nowadays, however, it is surprisingly easy to spread a rumor to the public by means of email, web bulletin boards, or social networking services. 
If somebody writes on Twitter that \"computers made by *** corporation easily break down\", it can decrease sales of those computers. There can also be offensive posts about other organizations, such as schools and universities, or about individuals. Of course, there can be favorable posts as well. We need to protect ourselves against groundless negative reputations and, at the same time, make good use of the opinions on the internet.<\/p>\n<p class=\"general-paragraph\">In our research group, we work on sentiment analysis, or opinion analysis. We are developing methods for collecting, classifying (into \"positive\" or \"negative\"), and summarizing opinions on the internet.<\/p>\n<div class=\"general-indent\">\n<p class=\"general-subsection general-click\">Sentiment Polarity of Words<\/p>\n<p class=\"general-hidden\">A fundamental resource for sentiment analysis is the sentiment polarities of words. In order to construct such a resource, we took an approach based on statistical mechanics. In our method, we first collect word pairs that are likely to have the same polarity from a dictionary, a thesaurus, and a corpus. We then construct a large lexical network consisting of word nodes by connecting those pairs. By regarding this network as an Ising spin model, where the polarity of each word corresponds to the spin of an electron, we estimate the state of the network using the mean-field approximation and extract the sentiment polarities of words.<br \/> [SIGNL-166, NLP2005, ACL2005, NLP2011]<\/p>\n<\/div>\n<div class=\"general-indent\">\n<p class=\"general-subsection general-click\">Sentiment Polarity of Phrases<\/p>\n<p class=\"general-hidden\">In this work, we deal with the sentiment polarities of phrases consisting of a noun and an adjective. This is a little more complex than for single words, because the sentiment polarity of such a phrase is not the mere sum of the polarities of the two words. 
Consider the example \"risk is low\": its polarity is positive, although the polarity of \"risk\" is negative. We use a latent variable model to represent the polarities of phrases, which provides a computational framework for phrase polarity.<br \/> [SIGNL-168, EACL2006, NAACL2007]<\/p>\n<\/div>\n<div class=\"general-indent\">\n<p class=\"general-subsection general-click\">Sentiment Polarity of Documents<\/p>\n<p class=\"general-hidden\">The task of finding the sentiment polarity of a document is also called the sentiment classification of documents. We proposed to use word subsequences and dependency subtrees extracted from the input documents as features for supervised classifiers. We also proposed a model that represents the polarity shift of words, the phenomenon in which the polarity of a word changes from negative to positive (or vice versa) depending on the context.<br \/> [PAKDD'05, IJCNLP2008]<\/p>\n<\/div>\n<div class=\"general-indent\">\n<p class=\"general-subsection general-click\">Use of Sound Symbolism for Sentiment Classification<\/p>\n<p class=\"general-hidden\">We presented a method for estimating the sentiment polarity of Japanese sentences that include onomatopoeic words, using the vocal sound features of onomatopoeic words as features for supervised sentiment classification.<br \/> [PRICAI 2012]<\/p>\n<\/div>\n<p>\u3000<\/p>\n<p class=\"general-paragraph\">We are also working on a number of other tasks in sentiment analysis, such as the extraction of evaluative objects and evaluative attributes, and the agreement\/disagreement classification of opinions.<\/p>\n<div class=\"general-caption\">\n<p>Toward Natural Language Understanding<\/p>\n<\/div>\n<p class=\"general-paragraph\">There are many kinds of relations among words, clauses, sentences, and documents in natural language texts. In order to understand the meaning of such texts, these relations have to be recognized. 
In our research group, we work on the recognition of such relations.<\/p>\n<div class=\"general-indent\">\n<p class=\"general-subsection general-click\">Anaphora Resolution<\/p>\n<p class=\"general-hidden\">In linguistics, anaphora is a phenomenon in which the meaning of one expression, called an anaphor, depends on another expression in the context. The correct interpretation of anaphora is vital for natural language understanding. We have tackled anaphora resolution in Japanese texts, especially zero anaphora and associative anaphora, on the basis of knowledge acquired from a large corpus.<br \/> [<a href=\"\/~sasano\/paper\/EMNLP09.pdf\" target=\"_blank\">EMNLP09<\/a> , <a href=\"http:\/\/69.195.124.161\/~aclwebor\/anthology\/\/I\/I11\/I11-1085.pdf\" target=\"_blank\">IJCNLP2011<\/a>]<\/p>\n<\/div>\n<div class=\"general-indent\">\n<p class=\"general-subsection general-click\">Coherence Model<\/p>\n<p class=\"general-hidden\">To understand a text, we have to recognize not only relations between words or sentences but also topic coherence. In addition, techniques for evaluating local coherence are useful for text correction and proofreading. We therefore proposed a local coherence model for Japanese text that leverages the tendencies of syntactic role transitions of textual entities.<br \/> [CICLing 2010]<\/p>\n<\/div>\n<div class=\"general-indent\">\n<p class=\"general-subsection general-click\">Cross-Document Relations between Sentences<\/p>\n<p class=\"general-hidden\">A pair of sentences in different newspaper articles on an event can stand in one of several relations, such as the relation between two sentences that convey the same information about the event (equivalence) and the relation between two sentences that convey the same information except for the values of numeric attributes (transition). 
We focused on these two relations and proposed methods for identifying them.<br \/> [<a href=\"http:\/\/69.195.124.161\/~aclwebor\/anthology\/\/I\/I08\/I08-1019.pdf\">IJCNLP2008<\/a>]<\/p>\n<\/div>\n<div class=\"general-indent\">\n<p class=\"general-subsection general-click\">Knowledge Acquisition for Case Alternation<\/p>\n<p class=\"general-hidden\">Predicate-argument structure analysis is one of the fundamental techniques for many natural language applications. In Japanese, the relationship between a predicate and its argument is usually represented by case particles. However, since case particles vary depending on the voice, we have to take case alternation into account to represent predicate-argument structure. We therefore work on automatic knowledge acquisition for case alternation between the passive\/causative and active voices, which leverages large-scale lexical case frames obtained from a large Web corpus together with several alternation patterns.<br \/> [<a href=\"http:\/\/69.195.124.161\/~aclwebor\/anthology\/\/D\/D13\/D13-1121.pdf\">EMNLP2013<\/a>]<\/p>\n<\/div>\n<div class=\"general-caption\">\n<p>Text Mining on Social Media<\/p>\n<\/div>\n<p class=\"general-paragraph\">Reputations now spread quickly on the Web, because anyone can easily send a message to the world using social media such as blogs and Twitter. We are therefore developing methods to find out what information attracts people's attention and what opinions people hold. We have developed a system characterized by the following technologies: automatic blog collection and monitoring, trend analysis and sentiment analysis of blogs, and attribute identification of bloggers.<\/p>\n<div class=\"general-indent\">\n<p class=\"general-subsection general-click\">Generating Live Sports Updates from Twitter<\/p>\n<p class=\"general-hidden\">Many Twitter users post their opinions, impressions, and statuses of televised events such as sports events. 
However, since the volume of such posts is extremely large, it takes a lot of time and effort to understand what happened during an event. We propose a method of<br \/> generating live sports updates from Twitter posts on an event. Our method selects descriptive and prompt tweets that are posted within a short time after important subevents by exploiting users called good reporters, who promptly explain what is happening at each moment throughout the event.<br \/> [<a href=\"http:\/\/www.anlp.jp\/proceedings\/annual_meeting\/2013\/pdf_dir\/D2-2.pdf\">WI2013<\/a>]<\/p>\n<\/div>\n<div class=\"general-indent\">\n<p class=\"general-subsection general-click\">Attribute Identification of Bloggers<\/p>\n<p class=\"general-hidden\">Blog classification (e.g., identifying bloggers' gender or age) is one of the most interesting current problems in blog analysis. Although this problem is usually solved by applying supervised learning techniques, the large labeled dataset required for training is not always available. In contrast, unlabeled blogs can easily be collected from the web. We therefore proposed a semi-supervised learning method for blog classification that effectively uses unlabeled data. In this method, entries from the same blog are assumed to have the same characteristics. With this assumption, the proposed method captures the characteristics of each blog, such as writing style and topic, and uses these characteristics to improve the classification accuracy.<br \/> [<a>Proceedings of the 23rd national conference on Artificial intelligence<br \/> - Volume 2, pp.1156--1161, 2008<\/a>]<\/p>\n<\/div>\n<div class=\"general-indent\">\n<p class=\"general-subsection general-click\">Detecting Bursty Words from Blogs<\/p>\n<p class=\"general-hidden\">We proposed a method for extracting 'bursts' of words related to popular topics in a document stream. 
A document stream is defined as a sequence of documents that arrive in temporal order; we regard blogs and BBSs as document streams and apply the method originally<br \/> proposed by Kleinberg. However, since Kleinberg's algorithm cannot be applied to document streams in which documents do not arrive uniformly, we extended the method so that it can be applied to blogs and BBSs.<br \/> [<a href=\"http:\/\/ci.nii.ac.jp\/naid\/110002911698\/\">First International Workshop on Knowledge Discovery on Data Streams, 2004<\/a>]<br \/> Furthermore, we have tackled several projects targeting community-based question-answering services and microblogs.<\/p>\n<\/div>\n<div class=\"general-caption\">\n<p>Others<\/p>\n<\/div>\n<p class=\"general-paragraph\">We are working on the following themes in addition to the above.<\/p>\n<div class=\"general-indent\">\n<p class=\"general-subsection general-click\">Automatic Generation of Distinctive Explanations for Kanji<\/p>\n<p class=\"general-hidden\">The phonetic alphabet enables people to dictate letters of the alphabet accurately by using representative words, e.g., A for Alpha. Japanese kanji (ideographic Chinese characters) vastly outnumber the letters of the Roman alphabet, and thus Japanese requires an explanatory reading, i.e., a distinctive explanation, analogous to a phonetic alphabet. We propose a corpus-based method for automatically generating distinctive explanations for kanji, in which information about the familiarity and homophones of kanji is taken into consideration. 
<br \/> [<a href=\"http:\/\/69.195.124.161\/~aclwebor\/anthology\/\/C\/C12\/C12-1086.pdf\">COLING2012<\/a>]<\/p>\n<\/div>\n<div class=\"general-indent\">\n<p class=\"general-subsection general-click\">Morphological Analysis for Noisy Text<\/p>\n<p class=\"general-hidden\">In recent years, consumer-generated media (CGM) such as blogs and social networking services (SNS) have become prevalent, and we thus have to deal with texts written by a wide variety of authors. Since these texts contain many types of non-standard tokens, such as abbreviations and phonetic substitutions, conventional text analysis tools do not perform well on them. To alleviate this problem, we proposed a simple but effective approach to unknown word processing in Japanese morphological analysis, which handles unknown words derived from words in a pre-defined lexicon as well as unknown onomatopoeias.<br \/> [<a href=\"http:\/\/69.195.124.161\/~aclwebor\/anthology\/\/I\/I13\/I13-1019.pdf\">IJCNLP2013<\/a>]<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Text Summarization Instead of reading a long document, we would sometimes like to read its concise summary. In our research group, we are working on text summarization; we develop methods for generating a summary. We are specifically interested in mathematical models that emulate how summaries are generated from the input documents. 
Maximum Coverage Summarization Model [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/www.lr.first.iir.isct.ac.jp\/wp_en\/index.php?rest_route=\/wp\/v2\/pages\/87"}],"collection":[{"href":"https:\/\/www.lr.first.iir.isct.ac.jp\/wp_en\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.lr.first.iir.isct.ac.jp\/wp_en\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.lr.first.iir.isct.ac.jp\/wp_en\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.lr.first.iir.isct.ac.jp\/wp_en\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=87"}],"version-history":[{"count":14,"href":"https:\/\/www.lr.first.iir.isct.ac.jp\/wp_en\/index.php?rest_route=\/wp\/v2\/pages\/87\/revisions"}],"predecessor-version":[{"id":123,"href":"https:\/\/www.lr.first.iir.isct.ac.jp\/wp_en\/index.php?rest_route=\/wp\/v2\/pages\/87\/revisions\/123"}],"wp:attachment":[{"href":"https:\/\/www.lr.first.iir.isct.ac.jp\/wp_en\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=87"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}